Session Audit

A forensic self-audit over your AI's work records that the producing system cannot grade itself on: pinned inputs, mechanical extraction, blind adversarial review.

session-audit.md

Session Audit: a forensic self-audit your AI cannot grade itself on
An Ena Pragma method.

WHAT IT NEEDS
Your agent's work records (session logs, transcripts, tickets, PR history) and
the dates your correction notes / rules were written (version control is ideal).

THE PIPELINE
1. PIN THE INPUT. The N most recent substantive work sessions (we use 20), listed
   to a file before anything is read. No cherry-picking after the fact.
2. EXTRACT, DON'T CONCLUDE. Run extraction passes over the sessions with a fixed
   schema and a quote required for every entry. Forbid recommendations. Schema:
   - ASKED: what the operator requested
   - SHIPPED: what was actually delivered
   - CORRECTIONS: every correction the operator gave, and what triggered it
   - VERIFICATIONS: checks run (what they caught) vs checks skipped
   - REWORK: do-overs and wrong first attempts, with root cause
   - TOOL_FRICTION: failures and workarounds
   - MANUAL_SEQUENCES: repeated multi-step routines (automation candidates)
   - WINS: what went right and the visible mechanism why
3. FIND PATTERNS WITH A BAR. A pattern needs evidence in 2+ distinct sessions.
   Single occurrences go to a watchlist, not the findings.
4. CROSS-REFERENCE WITH DATES. For every "it repeated a known lesson" claim,
   check the lesson's recorded date PRECEDES the repeat. Memory is not evidence;
   timestamps are.
5. BLIND ADVERSARIAL REVIEW. Give the draft findings, WITHOUT your reasoning, to
   fresh reviewers instructed to refute: does the cited evidence exist? does it
   mean what is claimed? are the counts right? Verdict per finding:
   confirmed / weakened (state what to cut) / refuted. Cut what does not survive.
6. SHIP THE DELTA. Publish findings WITH their verdicts, and keep the pre-review
   draft. The corrections the reviewers forced are your proof the audit is honest.

THE RULE THAT MAKES IT LEGITIMATE
The producer never certifies its own work. The extraction is mechanical, the
dates are from version control, and the judgment calls belong to reviewers that
never saw the producing reasoning. A self-audit that comes back clean is telling
you about the auditor, not the audit.

WHAT TO DO WITH THE FINDINGS
Anything that repeated after being written down goes to the mechanize checklist.
Do not respond to a repeated failure with a stronger memo.

This method is published in full in the post Why Does Your AI Keep Making the Same Mistake? We Audited Ours to Find Out., which covers the evidence behind it and when to reach for it.