Research Skill Workflow
This is the canonical operator runbook for the experimental page-content research workflow that landed in docs-v2-dev.
Use this page for:
- operational usage
- source-of-truth boundaries
- readiness status
- maintenance and improvement workflow
Do not use rollout reports or tests/README.md as the canonical narrative source.
What Is Canonical
Canonical sources:
- skill behavior: ai-tools/ai-skills/templates/*.template.md
- fact storage: tasks/research/claims/
- adjudication ledger: tasks/research/adjudication/page-content-research-outcomes.json
- registry validation: tools/scripts/docs-fact-registry.js
- manual fact runner: tools/scripts/docs-page-research.js
- PR advisory runner: tools/scripts/docs-page-research-pr-report.js
- packet runner: tools/scripts/docs-research-packet.js
- adjudication workflow: tools/scripts/docs-research-adjudication.js
- research-to-plan template: docs-guide/tooling/research-to-implementation-plan-template.md
- local/manual PR prep integration: tools/scripts/create-codex-pr.js --advisory-research
- packet planning template: docs-guide/tooling/research-review-packet-plan-template.md
- forward plan: tasks/plan/future/page-content-research-trust-roadmap.md
Generated or derived:
- locally installed Codex skills under $CODEX_HOME/skills
- saved advisory reports and validation artifacts
Historical only:
- rollout evidence and exploratory pilots under tasks/plan/repo-ops-reports/
Legacy:
- route-centric claim-ledger advisory helpers are retained only as legacy comparison tooling and are not the active PR advisory path
Workflow Model
The workflow is claim-led rather than route-led.
It is responsible for:
- extracting material factual claims
- checking evidence sources
- detecting contradictions across related pages
- classifying claims by confidence and freshness risk
- producing a propagation queue for dependent pages
It is not responsible for:
- MDX syntax validation
- style-guide compliance
- link and import integrity
- generic navigation cleanup
Route or ownership issues belong here only when they change factual ownership or contradiction resolution.
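To make the claim-led model concrete, the sketch below shows one way a run's claims and propagation queue could be pictured. It is illustrative only: the field names (`id`, `confidence`, `freshnessRisk`, `dependentPages`) and the `buildPropagationQueue` helper are hypothetical and are not the schema used by the registry under tasks/research/claims/.

```javascript
// Hypothetical claim records. Field names are illustrative only and are
// not the schema used by the registry under tasks/research/claims/.
const exampleClaims = [
  {
    id: 'example-claim-a',
    confidence: 'high',
    freshnessRisk: 'low',
    dependentPages: ['v2/example/page-b.mdx', 'v2/example/page-c.mdx'],
  },
  {
    id: 'example-claim-b',
    confidence: 'medium',
    freshnessRisk: 'high',
    dependentPages: ['v2/example/page-b.mdx'],
  },
];

// Group dependent pages so each page appears once, with every claim that affects it.
function buildPropagationQueue(claims) {
  const byPage = new Map();
  for (const claim of claims) {
    for (const page of claim.dependentPages) {
      if (!byPage.has(page)) byPage.set(page, []);
      byPage.get(page).push(claim.id);
    }
  }
  return [...byPage].map(([page, claimIds]) => ({ page, claimIds }));
}

console.log(buildPropagationQueue(exampleClaims));
```

Keying the queue by page rather than by claim reflects the goal of fixing a dependent page once for all claims that touch it.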
Readiness Status
Current status:
- Codex-ready: yes
- Cross-agent-ready: portable with minor work
- Operating mode: experimental and advisory-first
Interpretation:
- Codex skills can sync from the canonical template bundle immediately.
- The public and internal docs now explain how to use the workflow.
- Cross-agent portability exists structurally, but broader packaging and operator guidance still need hardening before claiming equal readiness across all agents.
Operator Commands
Validate the registry:
node tools/scripts/docs-fact-registry.js --validate --registry tasks/research/claims
Run a single-page pass:
node tools/scripts/docs-page-research.js \
--page v2/orchestrators/guides/deployment-details/setup-options.mdx \
--report-md /tmp/docs-page-research.md \
--report-json /tmp/docs-page-research.json
Run a cluster review:
node tools/scripts/docs-page-research.js \
--files v2/orchestrators/guides/deployment-details/setup-options.mdx,v2/orchestrators/setup/rcs-requirements.mdx,v2/orchestrators/guides/operator-considerations/business-case.mdx \
--report-md /tmp/docs-page-research-cluster.md \
--report-json /tmp/docs-page-research-cluster.json
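If cluster runs are scripted, the comma-separated --files value can be assembled from an array. The helper below is a hypothetical sketch: `buildClusterArgs` does not exist in the repo, and it assumes only the flags shown in the cluster command above.

```javascript
// Build the argument list for a cluster run of docs-page-research.js.
// The flags are the ones shown in the command above; the helper itself is hypothetical.
function buildClusterArgs(pages, reportBase) {
  if (pages.length === 0) throw new Error('at least one page is required');
  return [
    'tools/scripts/docs-page-research.js',
    '--files', pages.join(','),
    '--report-md', `${reportBase}.md`,
    '--report-json', `${reportBase}.json`,
  ];
}

const clusterArgs = buildClusterArgs(
  [
    'v2/orchestrators/guides/deployment-details/setup-options.mdx',
    'v2/orchestrators/setup/rcs-requirements.mdx',
  ],
  '/tmp/docs-page-research-cluster',
);
console.log(['node', ...clusterArgs].join(' '));
```

The resulting array can be passed to Node's `child_process.spawnSync('node', clusterArgs)` when the run should be launched from another script.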
Run a nav-based research packet:
node tools/scripts/docs-research-packet.js \
--nav tools/config/scoped-navigation/docs-gate-work.json \
--version v2 \
--language en \
--tab Orchestrators \
--group Guides \
--out tasks/reports/orchestrator-guides-review/research-guides-review
Run a files and folders packet:
node tools/scripts/docs-research-packet.js \
--folders v2/gateways/guides/payments-and-pricing,v2/gateways/guides/monitoring-and-tooling \
--files v2/gateways/guides/support-and-operations/funding-and-support.mdx \
--split-by dir \
--out tasks/reports/gateway-guides-review/research-guides-review
Run a manifest-defined packet:
node tools/scripts/docs-research-packet.js \
--manifest tasks/reports/repo-ops/research-packet-manifest.json \
--out tasks/reports/repo-ops/research-packet
Run the PR advisory helper:
node tools/scripts/docs-page-research-pr-report.js \
--files v2/orchestrators/guides/deployment-details/setup-options.mdx,v2/orchestrators/setup/rcs-requirements.mdx,v2/orchestrators/guides/operator-considerations/business-case.mdx \
--report-md /tmp/page-content-research-pr.md \
--report-json /tmp/page-content-research-pr.json
Run PR prep with advisory research:
node tools/scripts/create-codex-pr.js \
--advisory-research \
--changed-files v2/orchestrators/guides/deployment-details/setup-options.mdx,v2/orchestrators/setup/rcs-requirements.mdx,v2/orchestrators/guides/operator-considerations/business-case.mdx
Validate the adjudication ledger:
node tools/scripts/docs-research-adjudication.js \
--validate \
--ledger tasks/research/adjudication/page-content-research-outcomes.json
Record one adjudicated outcome from a report artifact:
node tools/scripts/docs-research-adjudication.js \
--record \
--ledger tasks/research/adjudication/page-content-research-outcomes.json \
--report-json tasks/reports/repo-ops/2026-03-16-page-content-research-pilot-gateway-trust-hardening.json \
--reviewer codex \
--claim-id gw-startup-program-current \
--human-verdict time-sensitive \
--outcome-class true_positive \
--cause-tag wording_only_conflict \
--action "keep advisory and continue current verification"
Record a missing-coverage outcome that was not detected in the report:
node tools/scripts/docs-research-adjudication.js \
--record \
--ledger tasks/research/adjudication/page-content-research-outcomes.json \
--report-json tasks/reports/repo-ops/2026-03-16-page-content-research-pilot-gateway-trust-hardening.json \
--reviewer codex \
--claim-family gateway-support-contact-channel \
--human-verdict time-sensitive \
--outcome-class false_negative \
--cause-tag missing_coverage \
--action "expand claim-family coverage for the support contact channel"
Write a trust summary from adjudicated outcomes:
node tools/scripts/docs-research-adjudication.js \
--summary \
--ledger tasks/research/adjudication/page-content-research-outcomes.json \
--report-md /tmp/page-content-research-adjudication.md \
--report-out-json /tmp/page-content-research-adjudication.json
Use packet mode when:
- the request covers a full nav section or several logical tranches
- the findings need reusable packet artifacts for later fix execution
- you want page-run and PR-run views preserved together across a larger scope
Use a single page or cluster run when:
- the request is limited to one page or one tight claim family
- a packet root would add more operational overhead than value
- the next action is immediate page editing rather than section-wide reporting
Use the research-to-plan handoff when:
- the research output is complete but the fixes span content, registry, and runner behavior
- another agent needs a decision-complete implementation plan before execution
- the next task is sequencing, not more source verification
The dedicated follow-on skill is docs-research-to-implementation-plan. It consumes page reports, PR advisory reports, or research packets and turns them into a planning-only implementation artifact.
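The mode guidance above can be read as a small decision helper. This is a sketch under stated assumptions: the request fields (`pageCount`, `coversNavSection`, `needsReusableArtifacts`, `researchComplete`, `fixesSpanMultipleSurfaces`) and the returned labels are hypothetical, and the final call remains an operator judgment.

```javascript
// Hypothetical request shape; none of these field names come from the scripts.
function chooseRunMode(request) {
  // Research is done and the fixes span content, registry, and runner behavior:
  // hand off to planning instead of running more verification.
  if (request.researchComplete && request.fixesSpanMultipleSurfaces) {
    return 'research-to-plan';
  }
  // Full nav sections or reusable artifacts justify the packet overhead.
  if (request.coversNavSection || request.needsReusableArtifacts) {
    return 'packet';
  }
  // Otherwise stay small: one page, or one tight claim family across a few pages.
  return request.pageCount === 1 ? 'single-page' : 'cluster';
}

console.log(chooseRunMode({ pageCount: 1 }));
console.log(chooseRunMode({ pageCount: 12, coversNavSection: true }));
```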
Expected Outputs
Every substantive run should surface some combination of:
- verified claims
- conflicted claims
- time-sensitive claims
- unresolved or historical-only claims
- cross-page contradictions
- propagation queue items
- explicit evidence sources
- trust summary counts
Operators should prefer conservative interpretation:
- if evidence is weak, treat the claim as unresolved
- if wording is stronger than the evidence, downgrade the wording
- if the same claim appears elsewhere, queue propagation work instead of fixing one page in isolation
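The conservative rules above can be expressed as a checklist function. This is a minimal sketch, assuming a hypothetical four-step strength scale and hypothetical claim fields (`evidenceStrength`, `wordingStrength`, `otherPages`); the runner's real classification logic may differ.

```javascript
// Hypothetical strength scale, weakest first.
const STRENGTH_SCALE = ['none', 'weak', 'moderate', 'strong'];

function conservativeActions(claim) {
  const actions = [];
  // Unknown labels map to -1, so they are treated as weak evidence.
  const evidence = STRENGTH_SCALE.indexOf(claim.evidenceStrength);
  const wording = STRENGTH_SCALE.indexOf(claim.wordingStrength);
  if (evidence <= STRENGTH_SCALE.indexOf('weak')) actions.push('treat-as-unresolved');
  if (wording > evidence) actions.push('downgrade-wording');
  if (claim.otherPages.length > 0) actions.push('queue-propagation');
  return actions;
}

console.log(conservativeActions({
  evidenceStrength: 'weak',
  wordingStrength: 'strong',
  otherPages: ['v2/example/other-page.mdx'],
}));
```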
Discovery Boundaries
The runner can now discover supporting evidence beyond explicit evidence_refs, but the ranking stays strict:
- active repo files and official pages remain the highest default sources for current-state claims
- v1/** is a historical lineage lane, not a silent current-state override
- _contextData/**, _plans-and-research/**, _workspace/research/**, and v2/x-archived/** are context lanes only
- GitHub discovery is strongest for implementation-status and support-status families
- DeepWiki is corroboration only and should not become primary evidence for current product truth
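The path-based lanes above lend themselves to a simple prefix classifier. The sketch below is hypothetical: the lane names, the numeric ranks, and the assumption that the globs are repo-root relative are illustrative choices, and the relative order of the historical and context lanes is an assumption rather than something this runbook specifies.

```javascript
// Prefixes come from the list above. Lane names and rank numbers are assumptions.
const SOURCE_LANES = [
  { lane: 'historical', rank: 2, prefixes: ['v1/'] },
  {
    lane: 'context',
    rank: 3,
    prefixes: ['_contextData/', '_plans-and-research/', '_workspace/research/', 'v2/x-archived/'],
  },
];

// Anything outside the historical and context prefixes is treated as an active repo file.
function classifySource(repoPath) {
  for (const { lane, rank, prefixes } of SOURCE_LANES) {
    if (prefixes.some((prefix) => repoPath.startsWith(prefix))) return { lane, rank };
  }
  return { lane: 'active', rank: 1 };
}

// Order candidate evidence so active files come first (lower rank wins).
function rankSources(paths) {
  return [...paths].sort((a, b) => classifySource(a).rank - classifySource(b).rank);
}

console.log(rankSources(['v1/a.mdx', '_contextData/b.md', 'v2/c.mdx']));
```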
Trust Summary
Each report now includes a compact trust summary with:
- unresolved_claims: how many tracked claims still lack strong enough evidence
- contradiction_groups: how many factual collisions the run found across reviewed pages
- evidence_sources: how many evidence records were actually checked
- explicit_page_targets: how many propagation targets came from explicit registry ownership or dependencies
- inferred_page_targets: how many propagation targets came from IA/path inference
Interpretation:
- higher contradiction_groups usually means a real review problem, not report noise
- higher unresolved_claims means the registry or evidence adapters still need work before trusting wording changes
- higher inferred_page_targets is acceptable when path inference is covering current siblings, but it should not dominate stable high-confidence families
- low evidence_sources on a broad review usually means claim-family coverage or source mapping is still too thin
The trust summary is a proxy only. Trust-promotion decisions should come from adjudicated review outcomes, not from raw report counts alone.
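As a reading aid, the interpretation rules can be turned into review notes. The sketch below uses the five summary keys listed above; the `reviewTrustSummary` helper and its thresholds (`minEvidenceSources`, `maxInferredShare`) are hypothetical values chosen for illustration, not numbers used by the runner.

```javascript
// Thresholds are illustrative assumptions, not values from the scripts.
function reviewTrustSummary(summary, thresholds = { minEvidenceSources: 3, maxInferredShare: 0.5 }) {
  const notes = [];
  if (summary.contradiction_groups > 0) {
    notes.push('review contradiction groups as real findings');
  }
  if (summary.unresolved_claims > 0) {
    notes.push('improve registry or evidence adapters before trusting wording changes');
  }
  const totalTargets = summary.explicit_page_targets + summary.inferred_page_targets;
  if (totalTargets > 0 && summary.inferred_page_targets / totalTargets > thresholds.maxInferredShare) {
    notes.push('inferred targets dominate the propagation queue');
  }
  if (summary.evidence_sources < thresholds.minEvidenceSources) {
    notes.push('claim-family coverage or source mapping may be too thin');
  }
  return notes;
}

console.log(reviewTrustSummary({
  unresolved_claims: 2,
  contradiction_groups: 1,
  evidence_sources: 1,
  explicit_page_targets: 1,
  inferred_page_targets: 4,
}));
```

Notes like these are prompts for review only; they do not replace adjudicated outcomes.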
Source-of-Truth Boundaries
Use this split consistently:
- public contributor usage: v2/resources/documentation-guide/
- internal operator workflow: this runbook in docs-guide/frameworks/
- rollout/adoption record: tasks/plan/repo-ops-reports/
- future hardening plan: tasks/plan/future/
- executable behavior: scripts, templates, tests, and claim registries
If these sources disagree:
- scripts and tests define runtime behavior
- template bundles define skill behavior
- this runbook defines operator workflow and readiness
- public documentation guide pages summarize contributor usage
Maintenance Workflow
When improving the research skill:
- expand claim-family coverage in tasks/research/claims/
- improve evidence matching and classification logic in the runner
- validate on real orchestrator and gateway page clusters
- run PR advisory on tracked factual docs pages
- update this runbook when the operator contract changes
Operator Review Rubric
Use this rubric when deciding whether a run was useful enough to trust:
- useful:
  - primary evidence is current and from the right source class
  - contradiction groups are concrete and explainable
  - propagation queue points at pages that really repeat or depend on the claim
- noisy:
  - weak sources outrank stronger official or GitHub evidence
  - contradiction groups collapse unrelated wording into one family
  - propagation is mostly speculative sibling fan-out
- expand a claim family when:
  - the same fact keeps recurring across active pages
  - reviewers repeatedly need to verify the same current-state claim manually
  - source classes and canonical ownership are clear enough to defend
- narrow a claim family when:
  - wording overlap keeps producing false contradictions
  - the claim is really style guidance, not factual truth
  - evidence quality is too weak to classify reliably
Adjudication Workflow
Adjudicate runs when:
- a report is used to make or block a real content decision
- a contradiction group looks noisy or unexpectedly broad
- a reviewer had to manually rediscover facts that should have been tracked
- a gateway status claim is being considered for stronger PR-time trust
Classify outcomes like this:
- true_positive: the report surfaced a real issue or useful current-state warning
- false_positive: the report surfaced a claim family that was not actually useful or was misleadingly grouped
- false_negative: the reviewer had to manually verify a factual claim that the system failed to track
- needs_split: one family is collapsing multiple concepts and needs to be divided
- needs_narrowing: the family exists but its matching or propagation logic is too broad
- needs_more_sources: the family is valid but current source coverage is too weak
Treat these as the default family-status interpretations:
- stable: repeated adjudications show the family is useful and low-noise
- advisory-only: keep reporting, but do not move toward stronger PR behavior yet
- needs-split: separate mixed concepts before trusting the family further
- needs-narrowing: reduce matching or inference breadth before trusting the family further
- needs-more-sources: expand or improve source coverage before trusting the family further
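One way to connect outcome classes to family statuses is a small tally over a family's adjudicated outcomes. This is a hypothetical sketch: the entry shape (`outcome_class`), the `suggestFamilyStatus` helper, the precedence order, and the `minRepeats` threshold are assumptions, and the real ledger format in tasks/research/adjudication/ may differ.

```javascript
// outcomes: adjudicated entries for ONE claim family (hypothetical entry shape).
function suggestFamilyStatus(outcomes, minRepeats = 3) {
  const counts = {};
  for (const entry of outcomes) {
    counts[entry.outcome_class] = (counts[entry.outcome_class] || 0) + 1;
  }
  // Structural problems take precedence over any positive signal.
  if (counts.needs_split) return 'needs-split';
  if (counts.needs_narrowing) return 'needs-narrowing';
  if (counts.needs_more_sources) return 'needs-more-sources';
  // "Stable" needs repeated useful outcomes and no recorded noise or misses.
  const noise = (counts.false_positive || 0) + (counts.false_negative || 0);
  if ((counts.true_positive || 0) >= minRepeats && noise === 0) return 'stable';
  return 'advisory-only';
}

console.log(suggestFamilyStatus([
  { outcome_class: 'true_positive' },
  { outcome_class: 'true_positive' },
  { outcome_class: 'true_positive' },
]));
```

A suggestion like this is only an input to review; the status decision still belongs to the adjudicating operator.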
Trust Tiers
Trust tiers are metadata only in the current phase:
- experimental: not enough adjudicated evidence yet
- advisory: usable, but still too noisy or under-evidenced for stronger handling
- advisory-high-confidence: a narrow family with low noise and strong current source fit
- not-eligible: outside the current trust-candidate slice
Current trust-candidate slice:
- clearinghouse-public-readiness
- remote-signer-current-scope
- programme-availability
- community-signer-testing-surface
- gateway-support-contact-channel
Do not treat any other family as eligible for stronger PR-time trust until adjudicated outcomes say otherwise.
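The eligibility rule can be sketched as a guard around whatever tier metadata a family carries. The family names come from the slice listed above; the `effectiveTier` helper and the fallback to `experimental` for an eligible family with no recorded tier are assumptions.

```javascript
// Families named in the current trust-candidate slice of this runbook.
const TRUST_CANDIDATE_SLICE = new Set([
  'clearinghouse-public-readiness',
  'remote-signer-current-scope',
  'programme-availability',
  'community-signer-testing-surface',
  'gateway-support-contact-channel',
]);

// Hypothetical helper: families outside the slice are never promoted,
// regardless of the tier recorded for them.
function effectiveTier(family, recordedTier) {
  if (!TRUST_CANDIDATE_SLICE.has(family)) return 'not-eligible';
  return recordedTier || 'experimental';
}

console.log(effectiveTier('programme-availability', 'advisory'));
console.log(effectiveTier('some-other-family', 'advisory-high-confidence'));
```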
Do not:
- widen the workflow back into generic navigation QA
- let tests/README.md become the primary narrative home again
- treat exploratory reports as canonical instructions
Related references:
- Public contributor page: /v2/resources/documentation-guide/research-and-fact-checking
- AI tools index: /docs-guide/catalog/ai-tools
- Source of truth policy: /docs-guide/policies/source-of-truth-policy
- Trust roadmap: tasks/plan/future/page-content-research-trust-roadmap.md
Last modified on March 16, 2026