# AI Contributor Audit Model
An AI Contributor audit should make five things explicit:
- The repository state being audited.
- The specification and tooling version used for the audit.
- The evidence collected from that repository state.
- The status derived from that evidence.
- The judgment that remains for a human reviewer.
The goal is not full automation. The goal is a repeatable audit trail where automated facts, derived conclusions, and human judgment are separated clearly.
## Plain-language summary

An audit should answer: which commit was checked, which version of the rules was used, what evidence was collected, what status follows from that evidence, and what a human still needs to judge.
The checklist is not trusted because an agent filled it. It is trusted only when each status points back to evidence that another reviewer can inspect.
## Core Principles

### Pin The Target

The audit is tied to `audited_commit`, not to a caller’s current worktree.
The collector should inspect a clean extraction of the audited commit. Dirty local files may be recorded for traceability, but they must not affect the audit unless the auditor explicitly chooses a working-tree audit mode.
### Pin The Policy

The audit is also tied to `spec_source`, the immutable source of the specification, checklist, audit-log template, validator, and collector used for that run.
Audits should not depend on a mutable branch name. If a branch is used as the human input, it must first be resolved to a full commit SHA and that SHA must be recorded.
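Resolving a branch name to a pinned SHA can be sketched as below. This is an illustrative helper, not part of any real audit tool: `resolveRef` shells out to the `git` CLI (assumed to be on `PATH`), and `assertPinnedSha` enforces that only a full commit SHA is ever recorded.

```typescript
// Sketch: pin a mutable ref to an immutable commit SHA before auditing.
// Illustrative only; not the real collector's API.
import { execSync } from "node:child_process";

const FULL_SHA = /^[0-9a-f]{40}$/;

// Accept only a full 40-hex commit SHA as a pinned audit target.
function assertPinnedSha(value: string): string {
  if (!FULL_SHA.test(value)) {
    throw new Error(`not a pinned commit SHA: ${value}`);
  }
  return value;
}

// Resolve a branch or tag name to the SHA that gets recorded.
function resolveRef(ref: string, cwd = "."): string {
  const out = execSync(`git rev-parse "${ref}^{commit}"`, { cwd });
  return assertPinnedSha(out.toString().trim());
}
```

A caller would record the return value of `resolveRef("main")` as `audited_commit`, never the branch name itself.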
### Collect Before Judging

Evidence collection happens before checklist rendering.

The durable machine-readable artifact is `.ai-contributor-audit/AI-CONTRIBUTOR-EVIDENCE.json`. The checklist and audit log are human-readable projections of that evidence plus any additional manual evidence and judgment.
### Separate Measurement From Policy

Collectors and probes measure observable facts. They do not define the policy.
The policy lives in the specification and checklist. A probe may say that branch protection requires pull requests, or that a lockfile installs with frozen dependencies. The checklist decides whether those facts satisfy a rule.
### Prefer Small Probes

Small facts are easier to reproduce than broad conclusions.
Instead of one opaque “repository is protected” result, a collector should prefer smaller observations such as:
- default branch detected
- branch-protection endpoint readable
- pull requests required
- status checks required
- bypass actors present
- token tier used for hosted API evidence
Checklist status can then be derived from those smaller facts where the predicate is crisp.
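A derivation over such small facts can be sketched as follows. The field and status names here are illustrative, not the real evidence schema:

```typescript
// Sketch: derive a checklist status from small probe facts rather than one
// opaque "repository is protected" result. Field names are invented.
type BranchProtectionFacts = {
  defaultBranchDetected: boolean;
  protectionEndpointReadable: boolean;
  pullRequestsRequired: boolean;
  statusChecksRequired: boolean;
  bypassActorsPresent: boolean;
};

type DerivedStatus = "Fulfilled" | "Warning" | "judgment_required";

function deriveBranchProtectionStatus(f: BranchProtectionFacts): DerivedStatus {
  // No readable evidence: do not guess; leave the row to the auditor.
  if (!f.defaultBranchDetected || !f.protectionEndpointReadable) {
    return "judgment_required";
  }
  // Crisp predicate: both controls on and no bypass actors.
  if (f.pullRequestsRequired && f.statusChecksRequired && !f.bypassActorsPresent) {
    return "Fulfilled";
  }
  return "Warning";
}
```

Each input fact stays independently reproducible; only the final predicate encodes policy.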
### Keep Judgment First-Class

Not every rule should be automated.

Some statuses can be derived mechanically. Others require human review because the evidence is semantic, contextual, or policy-dependent. In those cases the collector should record evidence and set the collector-side `judgment_required` flag, rather than pretending that the answer is deterministic.
## Evidence Flow

The audit flow is:

`spec_source` + `audited_commit` -> bootstrap (materializes the pinned runbook before the audit) -> audit-run (umbrella command for the normal cycle) -> audit-collect (writes `.ai-contributor-audit/AI-CONTRIBUTOR-EVIDENCE.json`) -> audit-stamp (writes derivable content into checklist + audit-log) -> auditor (fills judgment-required rows) -> audit-stamp (refreshes sections derived from row statuses) -> audit-validate (read-only structural checks)

The conceptual flow is the same when run phase by phase:
`spec_source` + `audited_commit` -> audit-collect -> audit-stamp -> auditor (fills judgment-required rows) -> audit-stamp -> audit-validate

## Artifact And Field Ownership
The audit artifacts have different jobs. No single rendered Markdown file is the whole source of truth.
| Artifact or script | Role | Owner |
|---|---|---|
| `bootstrap.ts` | Materializes the audit runbook from the pinned `spec_source` before collection starts. | Runbook bootstrap |
| `AI-CONTRIBUTOR-RUNBOOK-MANIFEST.json` | Records the exact runbook source and files materialized by bootstrap. | Runbook bootstrap |
| `audit-run.ts` | Orchestrates collect, initial stamp, edit pause, final stamp, and validate. | Runner |
| `audit-collect.ts` | Gathers repository and hosted-setting evidence from the pinned `audited_commit`. | Collector |
| `.ai-contributor-audit/AI-CONTRIBUTOR-EVIDENCE.json` | Machine-readable source for collector-derived facts, command results, derived statuses, and collector-side verification gaps. | Collector |
| `audit-summary.ts` | Read-only one-shot summary of an existing evidence JSON file. | Reader |
| `audit-stamp.ts` | Writes every derivable field into the rendered audit artifacts. Idempotent. | Stamper |
| `.ai-contributor-audit/AI-CONTRIBUTOR-CHECKLIST.md` | Human-readable conformance and scoring surface; contains stamped rows plus auditor judgment rows. | Stamper + auditor |
| `.ai-contributor-audit/AI-CONTRIBUTOR-AUDIT-LOG.md` | Command and manual evidence trace; contains stamped collector rows plus auditor-run evidence rows. | Stamper + auditor |
| `AI-CONTRIBUTOR-AUDIT.md` | Root-level projection of the checklist conformance summary and backlog. | Stamper |
| `audit-validate.ts` | Read-only structural consistency checks over the rendered artifacts. | Validator |
The frontmatter fields are owned as follows:
| Field or content | Owner |
|---|---|
| `spec_version` | Template and release process. Bump only with the specification version. |
| `spec_source`, `audited_commit`, `auditor`, `runner_agent`, `runner_model` | Supplied once by stamp flags, environment, evidence, manifest, or pinned runbook path; stamped identically into the checklist and audit log. |
| `assessment_started_at`, `assessment_completed_at`, `assessment_duration`, `validator_version`, `collector_version`, `conformance_level` | Stamped automatically by `audit-stamp.ts`. |
| Collector-derived checklist A / Status / Comment, stamped audit-log collector rows, verification-gap stamped rows, conformance summary Status, root summary, and backlog derived columns | Stamped automatically by `audit-stamp.ts`. |
| Judgment-required checklist Status / Comment, manual audit-log rows, conformance summary Notes, backlog Next action / Owner / Target date, and manual verification-gap rows | Auditor-owned. |
| Date reached | Hybrid: the auditor enters it when a level is first claimed as reached; the stamper preserves it while reached and clears it when the level drops. |
The validator checks whether the rendered audit is internally consistent. It does not rerun the audit, does not write to files, and does not decide whether an arbitrary claim is true.
## Conceptual Split

The audit has four layers, each with a distinct responsibility:
- Collector — observable facts and deterministic thresholds.
- Stamper — mechanical writes of derivable content into the rendered artifacts.
- Auditor (human or agent) — applicability, meaning, and explanation for rules that need judgment.
- Validator — structural integrity of the rendered artifacts.
## When to generate, stamp, and validate

Generate when checked-in authoring content is fully derivable from canonical source files. `AI-CONTRIBUTOR-SPECIFICATION.md` is generated from the specification template plus `AI-CONTRIBUTOR-RULE-CATALOG.json`. `.ai-contributor-audit/AI-CONTRIBUTOR-CHECKLIST.md` is generated from the checklist template plus the same catalog. `AI-CONTRIBUTOR-COVERAGE.md` is generated from its coverage template plus catalog rule, pillar, and level metadata. `AI-CONTRIBUTOR-AUDIT.md` and `.ai-contributor-audit/AI-CONTRIBUTOR-AUDIT-LOG.md` are generated from their audit templates plus catalog-owned spec version and conformance-level metadata.

In the checklist, the conformance-level values, conformance-level summary rows, quick-reference level bullets, row grouping, scope, requirement summary, pillar, and `AIC-*` IDs belong to the catalog; the instructions, stamped-block markers, status semantics, backlog policy, evidence rules, and auditor workflow prose belong to the checklist template. In the audit-log template, the catalog owns the spec version and accepted conformance-level values; the template owns frontmatter field comments, evidence-row instructions, and stamped collector-row markers. The audit run-specific A, Status, and Comment cells remain blank until the stamper or auditor fills them for a repository audit.
Stamp when an audit run can mechanically derive content from the current
evidence JSON, current checklist state, and stamp flags. This includes
frontmatter, automated checklist A / Status / Comment cells, stamped
audit-log collector rows, the conformance summary, root summary, backlog
derived columns, verification-gap rows for collector-flagged gaps, stamped
block checksums, and cross-file equality fields. Stamping is audit-run output;
it must not redefine catalog-owned rule metadata or overwrite auditor-owned
judgment.
Validate when the artifact still contains auditor-owned fields or cross-artifact consistency constraints. The checklist and audit log are validated after stamping because judgment-required rows, manual audit-log rows, summary notes, and backlog next actions cannot be generated from the catalog or evidence without human or accountable-auditor judgment. Validation checks structure, allowed status values, current-run evidence linkage, generated projection freshness, and cross-file consistency; it does not decide whether a semantic audit claim is convincing.
The collector records what exists, what commands produced, what hosted APIs returned, and which statuses can be derived without judgment.
The stamper turns collector output and current checklist state into rendered
checklist and audit-log content. It writes timestamps, mechanical frontmatter
(spec_source, audited_commit, auditor identity fields), version fields,
derived row statuses and comments, the conformance summary, the conformance
level, the backlog, drift rows, judgment-required navigation hints,
verification-gap rows for collector-flagged gaps, audit-log evidence rows for
collector-derived rules, stamped-block checksum sentinels, and cross-file
equality fields. Anything the stamper writes is, by definition, not the
auditor’s judgment.
The auditor decides whether a repository ships an artifact, whether a policy is meaningful, whether a boundary is actually covered, and how to describe remaining verification gaps. The auditor may be a human or an agent, but it only classifies, connects, and explains evidence that is visible in the repository, collector output, hosted-setting snapshots, or audit log; it never invents evidence.
The validator checks that the filled checklist and audit log are structurally consistent, tied to current-run evidence, and free of known anti-patterns such as blank statuses or unlogged command citations. It does not write, does not call external APIs, and does not decide whether cited evidence is semantically convincing.
Rules with stable, observable thresholds are collector-derived and stamped. Rules whose evaluation depends on meaning or context are auditor judgment. The split keeps every judgment anchored in recorded evidence.
## Role of LLMs in the Audit

Scripts own facts and derivable content. LLMs interpret evidence and draft a reviewable explanation for the rules that need judgment. Humans approve the result.
Scripts are the right place for facts:
- files exist or are missing
- commands pass or fail
- lockfiles validate
- hosted APIs return specific settings
- generated projections are current
- checklist and audit-log structure is consistent
LLMs are useful where audit evidence requires interpretation:
- connecting repository-specific evidence and controls to checklist rows the collector leaves for judgment
- reading policy documents and judging whether they are specific enough
- explaining why evidence supports `Warning` instead of `Fulfilled`
- turning gaps into useful backlog actions
- recognizing equivalent controls that do not match the default stack
- reviewing `Not relevant` claims for weak assumptions
- summarizing hosted or API evidence for human reviewers
The LLM must not be the source of truth. It should classify, connect, and explain current-run evidence only. If evidence is missing, indirect, or unclear, the correct output is `Warning`, `Alarm`, `Not relevant` with a cited reason, or `judgment_required` in collector output.
## Determinism Boundary

Repository-local evidence can be close to deterministic when the target commit, tooling source, and execution environment are pinned.
Examples:
- tracked files
- lockfiles
- static configuration
- local scripts
- type-check or lint commands with pinned dependencies
Hosted or external evidence cannot be fully reproduced from Git alone.
Examples:
- branch protection
- repository rulesets
- hosted secret scanning settings
- dependency-alert state
- CI provider settings
- organization or team membership
For external evidence, the audit should record a snapshot:
- timestamp
- provider
- endpoint or command
- actor or token tier
- output excerpt or structured result
- whether access was complete, partial, or unavailable
The snapshot is evidence for the audit time. It is not a guarantee that the external setting will remain unchanged.
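The snapshot fields listed above can be modeled as a record like the following. The type and example are a sketch; field names and the sample endpoint are illustrative, not the real evidence schema:

```typescript
// Sketch: the shape of a hosted-evidence snapshot. Illustrative field names.
type AccessLevel = "complete" | "partial" | "unavailable";

interface HostedSnapshot {
  timestamp: string;   // ISO 8601, recorded at audit time
  provider: string;    // e.g. "github" (hypothetical value)
  endpoint: string;    // endpoint or command that produced the result
  tokenTier: string;   // actor or token tier used for the call
  excerpt: unknown;    // output excerpt or structured result
  access: AccessLevel; // was access complete, partial, or unavailable?
}

const snapshot: HostedSnapshot = {
  timestamp: new Date().toISOString(),
  provider: "github",
  endpoint: "GET /repos/{owner}/{repo}/branches/{branch}/protection",
  tokenTier: "fine-grained, read-only",
  excerpt: { required_pull_request_reviews: { required_approving_review_count: 1 } },
  access: "complete",
};
```

The `access` field matters as much as the excerpt: a partial or unavailable snapshot is still evidence of what the collector could and could not see.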
## Applicability

Technology-neutral audits depend on explicit applicability.
The collector should first detect repository capabilities, then run relevant probes:
detect stack and provider -> decide applicability -> run matching probes -> derive status when possible -> mark `judgment_required` when not possible

A missing TypeScript configuration should not automatically fail a repository that does not use TypeScript. A missing GitHub ruleset should not automatically fail a repository hosted somewhere else. The audit must distinguish between:
- applicable and fulfilled
- applicable and not fulfilled
- not applicable
- unknown because the collector lacks access or support
- judgment required because the rule depends on meaning or context
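A classifier over those five outcomes can be sketched as below, using a typed-stack strictness rule as the example. The function and field names are invented for illustration:

```typescript
// Sketch: decide applicability before deriving a status. The outcome
// vocabulary mirrors the list above; names are illustrative.
type Applicability =
  | "applicable_fulfilled"
  | "applicable_not_fulfilled"
  | "not_applicable"
  | "unknown"            // collector lacks access or support
  | "judgment_required";

function classifyTypedStackStrictness(repo: {
  usesTypescript: boolean;
  tsconfigReadable: boolean;
  strict: boolean | null; // null: probe could not evaluate the setting
}): Applicability {
  if (!repo.usesTypescript) return "not_applicable";
  if (!repo.tsconfigReadable) return "unknown";
  if (repo.strict === null) return "judgment_required";
  return repo.strict ? "applicable_fulfilled" : "applicable_not_fulfilled";
}
```

Note that "not applicable" is decided first, so a missing configuration never counts against a repository the rule does not apply to.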
## Derived Statuses

A derived status is appropriate when the rule has a clear condition and the collector has enough evidence to evaluate it.
Good candidates:
- lockfile integrity
- strict type-checking for a detected typed stack
- branch protection when hosted settings are readable
- required CI checks when provider settings are readable
- objective file/config presence checks where presence itself satisfies the rule
Poor candidates:
- whether instructions are clear enough for a new agent
- whether a committed policy or instruction file is specific enough
- whether a fallback plan is credible
- whether a human review process is meaningful in practice
- whether a threat model covers the right risks
For judgment-heavy rules, the collector should gather evidence and explain why the auditor must decide.
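For a good candidate such as lockfile integrity, the derivation is a crisp predicate over collected facts; a sketch (with invented names, not the real collector output) might look like:

```typescript
// Sketch: a collector-derived status for lockfile integrity. When evidence
// is missing, the row falls back to judgment rather than a guessed status.
type RowStatus = "Fulfilled" | "Alarm" | "judgment_required";

function lockfileRow(probe: {
  lockfilePresent: boolean;
  frozenInstallExitCode: number | null; // null: command was not run
}): { status: RowStatus; comment: string } {
  if (!probe.lockfilePresent || probe.frozenInstallExitCode === null) {
    return { status: "judgment_required", comment: "no lockfile evidence collected" };
  }
  return probe.frozenInstallExitCode === 0
    ? { status: "Fulfilled", comment: "frozen install succeeded" }
    : { status: "Alarm", comment: "frozen install failed" };
}
```

Contrast this with "is the fallback plan credible?", where no exit code exists and only the auditor can decide.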
## Validator Boundary

The validator should enforce file consistency, not semantic truth.
It should check:
- matching frontmatter across audit files
- valid status values
- required comments
- no stale placeholders
- audit-log rows are structured
- command citations resolve to current audit-log rows
- collector-derived `Fulfilled` rows cite `.ai-contributor-audit/AI-CONTRIBUTOR-EVIDENCE.json`
- conformance summary matches checklist rows
It should not:
- call external APIs
- rerun build or test commands
- decide whether prose evidence is convincing
- infer repository policy from unstructured comments
- silently change an audit result
This boundary keeps validation deterministic and cheap.
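One such deterministic check, matching frontmatter across audit files, can be sketched as a pure function. This is an illustration of the validator's style, not its real implementation:

```typescript
// Sketch: a validator-style check. Pure and read-only: it reports mismatches
// and never rewrites anything. Parameter names are illustrative.
function frontmatterMismatches(
  files: Record<string, Record<string, string>>,
  sharedFields: string[],
): string[] {
  const problems: string[] = [];
  const names = Object.keys(files);
  for (const field of sharedFields) {
    // A shared field must carry the same value in every rendered file.
    const values = new Set(names.map((n) => files[n][field]));
    if (values.size > 1) {
      problems.push(`${field} differs across ${names.join(", ")}`);
    }
  }
  return problems;
}
```

Because the check is a pure function of file contents, it is cheap to rerun and cannot silently change an audit result.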
## Technology Neutrality

The core model should be independent of language, package manager, CI system, repository host, and AI provider.
That means the core collector should be organized around adapters:
- repo inventory
- git facts
- file probes
- ecosystem adapters
- provider adapters
- optional external tools

Adapters may be specific. The schema should stay generic.
For example, a TypeScript adapter can parse `tsconfig.json`, but the evidence schema should describe the result as a toolchain-specific strictness probe, not as a universal assumption that every repository has TypeScript.
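That split can be sketched as a generic probe record produced by a stack-specific adapter. The `ProbeResult` shape and probe id are invented for illustration:

```typescript
// Sketch: adapter-specific measurement, generic evidence schema. Only the
// adapter knows about tsconfig.json; the record stays toolchain-neutral.
interface ProbeResult {
  probe: string;      // generic probe id, e.g. "toolchain.strictness"
  adapter: string;    // which adapter produced the observation
  applicable: boolean;
  observed: unknown;  // adapter-specific detail, kept as plain data
}

function typescriptStrictnessProbe(
  tsconfig: { compilerOptions?: { strict?: boolean } } | null,
): ProbeResult {
  return {
    probe: "toolchain.strictness",
    adapter: "typescript",
    applicable: tsconfig !== null,
    observed: tsconfig?.compilerOptions?.strict ?? null,
  };
}
```

A Rust or Go adapter could emit the same `toolchain.strictness` probe id with its own `observed` payload, keeping the schema stack-agnostic.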
## Optional External Tools

External tools can be useful, but they should not be required for the core audit unless the project deliberately accepts that dependency.
When an external tool is used, record:
- tool name
- version or commit
- binary or container digest when available
- command
- exit code
- output excerpt or artifact path
- rules that consumed the result
External tools should provide supporting evidence. The audit should still make its own rule mapping explicit.
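The record fields listed above can be modeled as follows. The shape, the tool name, and the `AIC-*` rule id in the example are all hypothetical placeholders, not values from the real catalog:

```typescript
// Sketch: recording an optional external tool run. Field list mirrors the
// bullets above; the example values are hypothetical.
interface ExternalToolRecord {
  tool: string;
  version: string;           // version or commit
  digest?: string;           // binary or container digest when available
  command: string;
  exitCode: number;
  excerpt: string;           // output excerpt or artifact path
  consumedByRules: string[]; // rules that consumed the result
}

const record: ExternalToolRecord = {
  tool: "example-scanner",            // hypothetical tool name
  version: "1.0.0",
  command: "example-scanner --lockfile package-lock.json",
  exitCode: 0,
  excerpt: "no findings",
  consumedByRules: ["AIC-XXX-00"],    // hypothetical rule id
};
```

Listing `consumedByRules` keeps the rule mapping explicit, so the tool output supports the audit without becoming its hidden policy.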
## Operating Rules

Auditors and agents follow these rules:
- Start from pinned `spec_source` and `audited_commit`; let the stamper record mechanical frontmatter from evidence, the bootstrap manifest, flags, or environment variables.
- Run the collector before any rendering. `.ai-contributor-audit/AI-CONTRIBUTOR-EVIDENCE.json` is the source of truth for collector-derived findings.
- Run the stamper to populate every machine-derivable field. Do not hand-edit content the stamper owns.
- Fill judgment-required rows from current-run evidence. Use audit-log rows for commands and manual evidence not captured by the collector.
- Re-run the stamper after judgment edits so derived sections (summary, backlog, drift, judgment hints, verification gaps) reflect the final evidence and row statuses.
- Do not copy prior filled audit content into a new run.
- Cite current-run evidence in every non-blank checklist row.
- Mark unsupported or ambiguous cases as `judgment_required` or `Warning` rather than inventing certainty.
- Record hosted evidence as a timestamped snapshot with token/access context.
- Keep provider- and stack-specific logic inside adapters.
- Let the validator check consistency, not truth.
## North Star

A deterministic audit is not one where every answer is automated.
A deterministic audit is one where the target, policy, evidence, derivation, and remaining judgment are pinned and explicit enough that another reviewer can reconstruct how the audit reached each row.