
Research Skill Workflow

This is the canonical operator runbook for the experimental page-content research workflow that landed in docs-v2-dev. Use this page for:
  • operational usage
  • source-of-truth boundaries
  • readiness status
  • maintenance and improvement workflow
Do not use rollout reports or tests/README.md as the canonical narrative source.

What Is Canonical

Canonical sources:
  • skill behavior: ai-tools/ai-skills/templates/*.template.md
  • fact storage: tasks/research/claims/
  • adjudication ledger: tasks/research/adjudication/page-content-research-outcomes.json
  • registry validation: tools/scripts/docs-fact-registry.js
  • manual fact runner: tools/scripts/docs-page-research.js
  • PR advisory runner: tools/scripts/docs-page-research-pr-report.js
  • packet runner: tools/scripts/docs-research-packet.js
  • adjudication workflow: tools/scripts/docs-research-adjudication.js
  • research-to-plan template: docs-guide/tooling/research-to-implementation-plan-template.md
  • local/manual PR prep integration: tools/scripts/create-codex-pr.js --advisory-research
  • packet planning template: docs-guide/tooling/research-review-packet-plan-template.md
  • forward plan: tasks/plan/future/page-content-research-trust-roadmap.md
Generated or derived:
  • local installed Codex skills under $CODEX_HOME/skills
  • saved advisory reports and validation artifacts
Historical only:
  • rollout evidence and exploratory pilots under tasks/plan/repo-ops-reports/
Legacy:
  • route-centric claim-ledger advisory helpers are retained only as legacy comparison tooling and are not the active PR advisory path
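Before starting a run, you can confirm that a checkout actually contains the canonical paths listed above. The sketch below is hypothetical: `missing_canonical_paths` is an illustrative name and is not part of the tooling. The path list is copied from this section. The template entry is a glob, so it is checked with `Path.glob`.

```python
from pathlib import Path

# Hypothetical pre-flight check. Paths are copied from "Canonical sources" above.
CANONICAL_PATHS = [
    "tasks/research/claims",
    "tasks/research/adjudication/page-content-research-outcomes.json",
    "tools/scripts/docs-fact-registry.js",
    "tools/scripts/docs-page-research.js",
    "tools/scripts/docs-page-research-pr-report.js",
    "tools/scripts/docs-research-packet.js",
    "tools/scripts/docs-research-adjudication.js",
    "docs-guide/tooling/research-to-implementation-plan-template.md",
    "tools/scripts/create-codex-pr.js",
    "docs-guide/tooling/research-review-packet-plan-template.md",
    "tasks/plan/future/page-content-research-trust-roadmap.md",
]
# The skill templates are listed as a glob, so they are checked separately.
TEMPLATE_GLOB = "ai-tools/ai-skills/templates/*.template.md"


def missing_canonical_paths(repo_root: str) -> list:
    """Return canonical paths (or the template glob) that are absent under repo_root."""
    root = Path(repo_root)
    missing = [p for p in CANONICAL_PATHS if not (root / p).exists()]
    # No glob matches means the skill template bundle is missing entirely.
    if not any(root.glob(TEMPLATE_GLOB)):
        missing.append(TEMPLATE_GLOB)
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    for path in missing_canonical_paths("."):
        print(f"missing: {path}")
```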

Workflow Model

The workflow is claim-led rather than route-led. It tracks individual factual claims across pages instead of reviewing one page route at a time. It is responsible for:
  • extracting material factual claims
  • checking evidence sources
  • detecting contradictions across related pages
  • classifying claims by confidence and freshness risk
  • producing a propagation queue for dependent pages
It is not responsible for:
  • MDX syntax validation
  • style-guide compliance
  • link and import integrity
  • generic navigation cleanup
Route or ownership issues only belong here when they change factual ownership or contradiction resolution.
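"Claim-led" can be made concrete with a small sketch. Instead of walking pages one route at a time, you index every extracted claim by a claim identifier and compare the asserted values across pages. Pages that disagree form a contradiction. Every page that repeats a claim becomes a propagation target. This is a simplified illustration, not the runner's implementation. The input shape (page path mapped to claim id and asserted value) and the name `index_claims` are assumptions.

```python
from collections import defaultdict


def index_claims(pages: dict):
    """Group claims by claim id across pages (hypothetical, simplified model).

    pages maps a page path to {claim_id: asserted_value}.
    Returns (contradictions, propagation_queue):
      contradictions: claim_id -> {value: [pages asserting that value]}
      propagation_queue: claim_id -> sorted pages that repeat the claim
    """
    by_claim = defaultdict(lambda: defaultdict(list))
    for page, claims in pages.items():
        for claim_id, value in claims.items():
            by_claim[claim_id][value].append(page)

    contradictions = {}
    propagation_queue = {}
    for claim_id, values in by_claim.items():
        all_pages = sorted(p for ps in values.values() for p in ps)
        if len(all_pages) > 1:
            # The claim repeats, so fixing one page alone would leave others stale.
            propagation_queue[claim_id] = all_pages
        if len(values) > 1:
            # Pages assert different values for the same claim.
            contradictions[claim_id] = {v: sorted(ps) for v, ps in values.items()}
    return contradictions, propagation_queue
```

A claim that appears on only one page produces neither a contradiction nor a propagation item.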

Readiness Status

Current status:
  • Codex-ready: yes
  • Cross-agent-ready: portable with minor work
  • Operating mode: experimental and advisory-first
Interpretation:
  • Codex skills can sync from the canonical template bundle immediately.
  • The public and internal docs now explain how to use the workflow.
  • Cross-agent portability exists structurally, but broader packaging and operator guidance still need hardening before claiming equal readiness across all agents.

Operator Commands

Validate the registry:
node tools/scripts/docs-fact-registry.js --validate --registry tasks/research/claims
Run a single-page pass:
node tools/scripts/docs-page-research.js \
  --page v2/orchestrators/guides/deployment-details/setup-options.mdx \
  --report-md /tmp/docs-page-research.md \
  --report-json /tmp/docs-page-research.json
Run a cluster review:
node tools/scripts/docs-page-research.js \
  --files v2/orchestrators/guides/deployment-details/setup-options.mdx,v2/orchestrators/setup/rcs-requirements.mdx,v2/orchestrators/guides/operator-considerations/business-case.mdx \
  --report-md /tmp/docs-page-research-cluster.md \
  --report-json /tmp/docs-page-research-cluster.json
Run a nav-based research packet:
node tools/scripts/docs-research-packet.js \
  --nav tools/config/scoped-navigation/docs-gate-work.json \
  --version v2 \
  --language en \
  --tab Orchestrators \
  --group Guides \
  --out tasks/reports/orchestrator-guides-review/research-guides-review
Run a files and folders packet:
node tools/scripts/docs-research-packet.js \
  --folders v2/gateways/guides/payments-and-pricing,v2/gateways/guides/monitoring-and-tooling \
  --files v2/gateways/guides/support-and-operations/funding-and-support.mdx \
  --split-by dir \
  --out tasks/reports/gateway-guides-review/research-guides-review
Run a manifest-defined packet:
node tools/scripts/docs-research-packet.js \
  --manifest tasks/reports/repo-ops/research-packet-manifest.json \
  --out tasks/reports/repo-ops/research-packet
Run the PR advisory helper:
node tools/scripts/docs-page-research-pr-report.js \
  --files v2/orchestrators/guides/deployment-details/setup-options.mdx,v2/orchestrators/setup/rcs-requirements.mdx,v2/orchestrators/guides/operator-considerations/business-case.mdx \
  --report-md /tmp/page-content-research-pr.md \
  --report-json /tmp/page-content-research-pr.json
Run PR prep with advisory research:
node tools/scripts/create-codex-pr.js \
  --advisory-research \
  --changed-files v2/orchestrators/guides/deployment-details/setup-options.mdx,v2/orchestrators/setup/rcs-requirements.mdx,v2/orchestrators/guides/operator-considerations/business-case.mdx
Validate the adjudication ledger:
node tools/scripts/docs-research-adjudication.js \
  --validate \
  --ledger tasks/research/adjudication/page-content-research-outcomes.json
Record one adjudicated outcome from a report artifact:
node tools/scripts/docs-research-adjudication.js \
  --record \
  --ledger tasks/research/adjudication/page-content-research-outcomes.json \
  --report-json tasks/reports/repo-ops/2026-03-16-page-content-research-pilot-gateway-trust-hardening.json \
  --reviewer codex \
  --claim-id gw-startup-program-current \
  --human-verdict time-sensitive \
  --outcome-class true_positive \
  --cause-tag wording_only_conflict \
  --action "keep advisory and continue current verification"
Record a missing-coverage outcome that was not detected in the report:
node tools/scripts/docs-research-adjudication.js \
  --record \
  --ledger tasks/research/adjudication/page-content-research-outcomes.json \
  --report-json tasks/reports/repo-ops/2026-03-16-page-content-research-pilot-gateway-trust-hardening.json \
  --reviewer codex \
  --claim-family gateway-support-contact-channel \
  --human-verdict time-sensitive \
  --outcome-class false_negative \
  --cause-tag missing_coverage \
  --action "expand claim-family coverage for the support contact channel"
Write a trust summary from adjudicated outcomes:
node tools/scripts/docs-research-adjudication.js \
  --summary \
  --ledger tasks/research/adjudication/page-content-research-outcomes.json \
  --report-md /tmp/page-content-research-adjudication.md \
  --report-out-json /tmp/page-content-research-adjudication.json
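If you script these commands, for example as a local pre-PR check, it helps to build the argument lists in one place. The sketch below uses only the flags shown above. It prints each command and, when `dry_run` is off, runs it with `subprocess.run`. The helper names are hypothetical. The sketch assumes it runs from the repository root with `node` on `PATH`.

The two `--record` examples also suggest that an outcome targets either `--claim-id` or `--claim-family`, but not both. The sketch enforces that, but this rule is inferred from the examples and is not documented behavior.

```python
import shlex
import subprocess
from typing import List, Optional

LEDGER = "tasks/research/adjudication/page-content-research-outcomes.json"


def validate_commands() -> List[List[str]]:
    """Registry and ledger validation commands, copied from the runbook."""
    return [
        ["node", "tools/scripts/docs-fact-registry.js",
         "--validate", "--registry", "tasks/research/claims"],
        ["node", "tools/scripts/docs-research-adjudication.js",
         "--validate", "--ledger", LEDGER],
    ]


def record_command(report_json: str, reviewer: str, verdict: str,
                   outcome_class: str, cause_tag: str, action: str,
                   claim_id: Optional[str] = None,
                   claim_family: Optional[str] = None) -> List[str]:
    """Build a hypothetical --record invocation for exactly one claim or family."""
    if (claim_id is None) == (claim_family is None):
        raise ValueError("pass exactly one of claim_id or claim_family")
    target = (["--claim-id", claim_id] if claim_id is not None
              else ["--claim-family", claim_family])
    return ["node", "tools/scripts/docs-research-adjudication.js",
            "--record", "--ledger", LEDGER,
            "--report-json", report_json, "--reviewer", reviewer,
            *target,
            "--human-verdict", verdict, "--outcome-class", outcome_class,
            "--cause-tag", cause_tag, "--action", action]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command with shell quoting, then run it unless dry_run is set."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)
```

For example, `run_all(validate_commands(), dry_run=False)` validates the registry and the ledger in sequence and stops at the first failure.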
Use packet mode when:
  • the request covers a full nav section or several logical tranches
  • the findings need reusable packet artifacts for later fix execution
  • you want page-run and PR-run views preserved together across a larger scope
Use a single page or cluster run when:
  • the request is limited to one page or one tight claim family
  • a packet root would add more operational overhead than value
  • the next action is immediate page editing rather than section-wide reporting
Use the research-to-plan handoff when:
  • the research output is complete but the fixes span content, registry, and runner behavior
  • another agent needs a decision-complete implementation plan before execution
  • the next task is sequencing, not more source verification
The dedicated follow-on skill is docs-research-to-implementation-plan. It consumes page reports, PR advisory reports, or research packets and turns them into a planning-only implementation artifact.
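The three sets of rules above can be read as a small decision function. This is a hypothetical sketch for illustration only. The boolean inputs paraphrase the bullets and are not options of any script.

```python
def choose_run_mode(research_complete: bool,
                    fixes_span_multiple_layers: bool,
                    full_nav_section_or_tranches: bool,
                    needs_reusable_artifacts: bool) -> str:
    """Pick a run mode following the runbook's rules of thumb (hypothetical helper)."""
    # Research is done and the fixes span content, registry, and runner
    # behavior: the next task is sequencing, so hand off to
    # docs-research-to-implementation-plan.
    if research_complete and fixes_span_multiple_layers:
        return "research-to-plan"
    # A broad scope or a need for reusable artifacts: use packet mode
    # (docs-research-packet.js).
    if full_nav_section_or_tranches or needs_reusable_artifacts:
        return "packet"
    # Otherwise use a single-page or tight cluster run (docs-page-research.js).
    return "page-or-cluster"
```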

Expected Outputs

Every substantive run should surface some combination of:
  • verified claims
  • conflicted claims
  • time-sensitive claims
  • unresolved or historical-only claims
  • cross-page contradictions
  • propagation queue items
  • explicit evidence sources
  • trust summary counts
Operators should prefer conservative interpretation:
  • if evidence is weak, treat the claim as unresolved
  • if wording is stronger than the evidence, downgrade the wording
  • if the same claim appears elsewhere, queue propagation work instead of fixing one page in isolation
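The conservative-interpretation rules can be sketched as a function that returns the actions to take for one claim. The 0 to 3 strength scale and the "weak" threshold are assumptions made for illustration. The runner does not necessarily score claims this way.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClaimReview:
    claim_id: str
    evidence_strength: int  # hypothetical scale: 0 = none ... 3 = strong and current
    wording_strength: int   # hypothetical scale: how firmly the page asserts the claim
    other_pages: List[str] = field(default_factory=list)  # pages repeating the claim


WEAK_EVIDENCE = 1  # assumed threshold: at or below this, evidence counts as weak


def conservative_actions(review: ClaimReview) -> List[str]:
    """Apply the runbook's conservative rules to one claim (hypothetical helper)."""
    actions = []
    if review.evidence_strength <= WEAK_EVIDENCE:
        actions.append("mark unresolved")
    if review.wording_strength > review.evidence_strength:
        actions.append("downgrade wording")
    # Queue every repeat instead of fixing one page in isolation.
    actions.extend(f"queue propagation: {p}" for p in review.other_pages)
    return actions
```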

Discovery Boundaries

The runner can now discover supporting evidence beyond explicit evidence_refs, but the ranking stays strict:
  • active repo files and official pages remain the highest-ranked default sources for current-state claims
  • v1/** is a historical lineage lane, not a silent current-state override
  • _contextData/**, _plans-and-research/**, _workspace/research/**, and v2/x-archived/** are context lanes only
  • GitHub discovery is strongest for implementation-status and support-status families
  • DeepWiki is corroboration only and should not become primary evidence for current product truth
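One way to picture these boundaries is as a lookup that assigns each evidence record to a lane. Only the "current" lane may prove current state. The sketch below uses only the path prefixes listed above. The lane names and the `repo`/`official`/`github`/`deepwiki` kind labels are assumptions. The sketch does not model GitHub's family-specific strength, and it does not reflect how the runner actually ranks evidence.

```python
# Context lanes from the list above: useful background, never current-state proof.
CONTEXT_PREFIXES = ("_contextData/", "_plans-and-research/",
                    "_workspace/research/", "v2/x-archived/")


def classify_source(kind: str, path: str = "") -> str:
    """Return a lane for an evidence record (hypothetical helper).

    kind is an assumed label: "repo", "official", "github", or "deepwiki".
    """
    if kind == "deepwiki":
        return "corroboration"  # never primary evidence for current product truth
    if kind in ("official", "github"):
        return "current"
    if path.startswith("v1/"):
        return "historical"     # lineage only, no silent current-state override
    if path.startswith(CONTEXT_PREFIXES):
        return "context"
    return "current"            # an active repo file


def can_support_current_state(kind: str, path: str = "") -> bool:
    return classify_source(kind, path) == "current"
```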

Trust Summary

Each report now includes a compact trust summary with:
  • unresolved_claims: how many tracked claims still lack strong enough evidence
  • contradiction_groups: how many factual collisions the run found across reviewed pages
  • evidence_sources: how many evidence records were actually checked
  • explicit_page_targets: how many propagation targets came from explicit registry ownership or dependencies
  • inferred_page_targets: how many propagation targets came from IA/path inference
Interpretation:
  • higher contradiction_groups usually means a real review problem, not report noise
  • higher unresolved_claims means the registry or evidence adapters still need work before trusting wording changes
  • higher inferred_page_targets is acceptable when path inference is covering current siblings, but it should not dominate stable high-confidence families
  • low evidence_sources on a broad review usually means claim-family coverage or source mapping is still too thin
The trust summary is a proxy only. Trust-promotion decisions should come from adjudicated review outcomes, not from raw report counts alone.
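The interpretation notes can be turned into a rough triage helper for a report's trust summary. The field names match the summary keys above. The thresholds are assumptions for illustration only, including treating "dominate" as more than half of all propagation targets. As noted above, these signals are a proxy and not a trust decision.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrustSummary:
    unresolved_claims: int
    contradiction_groups: int
    evidence_sources: int
    explicit_page_targets: int
    inferred_page_targets: int


def triage(s: TrustSummary, broad_review: bool,
           min_sources_for_broad: int = 10) -> List[str]:
    """Turn trust-summary counts into review notes (hypothetical helper)."""
    notes = []
    if s.contradiction_groups > 0:
        notes.append("review contradiction groups: usually a real problem")
    if s.unresolved_claims > 0:
        notes.append("registry or evidence adapters need work before wording changes")
    total_targets = s.explicit_page_targets + s.inferred_page_targets
    # Assumed reading of "dominate": more than half of all propagation targets.
    if total_targets and s.inferred_page_targets / total_targets > 0.5:
        notes.append("inferred targets dominate: check path inference")
    if broad_review and s.evidence_sources < min_sources_for_broad:
        notes.append("thin evidence: expand claim-family coverage or source mapping")
    return notes
```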

Source-of-Truth Boundaries

Use this split consistently:
  • public contributor usage: v2/resources/documentation-guide/
  • internal operator workflow: this runbook in docs-guide/frameworks/
  • rollout/adoption record: tasks/plan/repo-ops-reports/
  • future hardening plan: tasks/plan/future/
  • executable behavior: scripts, templates, tests, and claim registries
If these sources disagree:
  1. scripts and tests define runtime behavior
  2. template bundles define skill behavior
  3. this runbook defines operator workflow and readiness
  4. public documentation guide pages summarize contributor usage
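The precedence list amounts to "take the statement from the highest-ranked source". The minimal sketch below assumes each conflicting statement is tagged with one of the four source kinds above. The kind labels are hypothetical, and real conflicts may also need judgment about which domain (runtime, skill, or workflow) a statement belongs to.

```python
from typing import Dict, Tuple

# Hypothetical labels, ordered as in the precedence list above.
PRECEDENCE = [
    "scripts-and-tests",   # 1. runtime behavior
    "template-bundles",    # 2. skill behavior
    "operator-runbook",    # 3. operator workflow and readiness
    "public-guide-pages",  # 4. contributor usage summary
]


def resolve(statements: Dict[str, str]) -> Tuple[str, str]:
    """Return (winning_source_kind, statement) from conflicting statements."""
    for kind in PRECEDENCE:
        if kind in statements:
            return kind, statements[kind]
    raise ValueError("no statement from a ranked source")
```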

Maintenance Workflow

When improving the research skill:
  1. expand claim-family coverage in tasks/research/claims/
  2. improve evidence matching and classification logic in the runner
  3. validate on real orchestrator and gateway page clusters
  4. run PR advisory on tracked factual docs pages
  5. update this runbook when the operator contract changes

Operator Review Rubric

Use this rubric when deciding whether a run was useful enough to trust:
  • useful:
    • primary evidence is current and from the right source class
    • contradiction groups are concrete and explainable
    • propagation queue points at pages that really repeat or depend on the claim
  • noisy:
    • weak sources outrank stronger official or GitHub evidence
    • contradiction groups collapse unrelated wording into one family
    • propagation is mostly speculative sibling fan-out
  • expand a claim family when:
    • the same fact keeps recurring across active pages
    • reviewers repeatedly need to verify the same current-state claim manually
    • source classes and canonical ownership are clear enough to defend
  • narrow a claim family when:
    • wording overlap keeps producing false contradictions
    • the claim is really style guidance, not factual truth
    • evidence quality is too weak to classify reliably
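A reviewer can apply the useful/noisy part of the rubric as a checklist. The sketch below encodes those signals as booleans that the reviewer fills in after reading a report. The field names are hypothetical. The "inconclusive" result is also an assumption, as is the rule that any noisy signal outweighs the useful ones (chosen to match the runbook's conservative stance).

```python
from dataclasses import dataclass


@dataclass
class RunAssessment:
    # "useful" signals from the rubric
    evidence_current_and_right_class: bool
    contradictions_concrete: bool
    propagation_targets_real: bool
    # "noisy" signals from the rubric
    weak_sources_outrank_strong: bool
    unrelated_wording_collapsed: bool
    propagation_mostly_speculative: bool


def rate_run(a: RunAssessment) -> str:
    """Rate a run as useful, noisy, or inconclusive (hypothetical helper)."""
    noisy = (a.weak_sources_outrank_strong or a.unrelated_wording_collapsed
             or a.propagation_mostly_speculative)
    useful = (a.evidence_current_and_right_class and a.contradictions_concrete
              and a.propagation_targets_real)
    if noisy:
        return "noisy"
    return "useful" if useful else "inconclusive"
```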

Adjudication Workflow

Adjudicate runs when:
  • a report is used to make or block a real content decision
  • a contradiction group looks noisy or unexpectedly broad
  • a reviewer had to manually rediscover facts that should have been tracked
  • a gateway status claim is being considered for stronger PR-time trust
Classify outcomes like this:
  • true_positive: the report surfaced a real issue or useful current-state warning
  • false_positive: the report surfaced a claim family that was not actually useful or was misleadingly grouped
  • false_negative: the reviewer had to manually verify a factual claim that the system failed to track
  • needs_split: one family is collapsing multiple concepts and needs to be divided
  • needs_narrowing: the family exists but its matching or propagation logic is too broad
  • needs_more_sources: the family is valid but current source coverage is too weak
Treat these as the default family-status interpretations:
  • stable: repeated adjudications show the family is useful and low-noise
  • advisory-only: keep reporting, but do not move toward stronger PR behavior yet
  • needs-split: separate mixed concepts before trusting the family further
  • needs-narrowing: reduce matching or inference breadth before trusting the family further
  • needs-more-sources: expand or improve source coverage before trusting the family further
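The runbook defines the outcome classes and family statuses but not the exact mapping between them. The sketch below shows one possible way to roll a family's adjudicated outcomes up into a status. Three parts of it are assumptions for illustration: the ordering among the `needs_*` outcomes, the requirement that every outcome be `true_positive` for `stable`, and the minimum of three such outcomes.

```python
from collections import Counter
from typing import List

OUTCOME_CLASSES = {"true_positive", "false_positive", "false_negative",
                   "needs_split", "needs_narrowing", "needs_more_sources"}

MIN_TRUE_POSITIVES_FOR_STABLE = 3  # assumed threshold for "repeated adjudications"


def family_status(outcomes: List[str]) -> str:
    """Derive a family status from adjudicated outcomes (hypothetical mapping)."""
    unknown = set(outcomes) - OUTCOME_CLASSES
    if unknown:
        raise ValueError(f"unknown outcome classes: {sorted(unknown)}")
    counts = Counter(outcomes)
    # Corrective outcomes block further trust first (assumed ordering).
    if counts["needs_split"]:
        return "needs-split"
    if counts["needs_narrowing"]:
        return "needs-narrowing"
    if counts["needs_more_sources"]:
        return "needs-more-sources"
    # "Useful and low-noise": only true positives, seen repeatedly.
    if (counts["true_positive"] >= MIN_TRUE_POSITIVES_FOR_STABLE
            and counts["true_positive"] == len(outcomes)):
        return "stable"
    return "advisory-only"
```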

Trust Tiers

Trust tiers are metadata only in the current phase:
  • experimental: not enough adjudicated evidence yet
  • advisory: usable, but still too noisy or under-evidenced for stronger handling
  • advisory-high-confidence: a narrow family with low noise and strong current source fit
  • not-eligible: outside the current trust-candidate slice
Current trust-candidate slice:
  • clearinghouse-public-readiness
  • remote-signer-current-scope
  • programme-availability
  • community-signer-testing-surface
  • gateway-support-contact-channel
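Tooling can use a guard to avoid labeling a family outside the current slice as a trust candidate. In the hypothetical sketch below, the slice is copied from the list above and tiers are plain metadata strings, as this phase requires. Two rules are assumptions: a family with no adjudicated status stays `experimental`, and only a `stable` family in the slice may be labeled `advisory-high-confidence`.

```python
from typing import Optional

# Copied from the current trust-candidate slice above.
TRUST_CANDIDATE_SLICE = frozenset({
    "clearinghouse-public-readiness",
    "remote-signer-current-scope",
    "programme-availability",
    "community-signer-testing-surface",
    "gateway-support-contact-channel",
})


def trust_tier(family: str, family_status: Optional[str]) -> str:
    """Return a metadata-only trust tier for a claim family (hypothetical rules)."""
    if family not in TRUST_CANDIDATE_SLICE:
        return "not-eligible"
    if family_status is None:
        # No adjudicated evidence yet.
        return "experimental"
    if family_status == "stable":
        return "advisory-high-confidence"
    return "advisory"
```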
Do not treat any other family as eligible for stronger PR-time trust until adjudicated outcomes say otherwise.

Guardrails

Do not:
  • widen the workflow back into generic navigation QA
  • let tests/README.md become the primary narrative home again
  • treat exploratory reports as canonical instructions

Related Pages

  • Public contributor page: /v2/resources/documentation-guide/research-and-fact-checking
  • AI tools index: /docs-guide/catalog/ai-tools
  • Source of truth policy: /docs-guide/policies/source-of-truth-policy
  • Trust roadmap: tasks/plan/future/page-content-research-trust-roadmap.md
Last modified on March 16, 2026