audit_trace_budget

Refactoring & Bug Fixes

Root cause analysis (RCA)-gated changes backed by evidence, not vibes

When to use

  • An AI suggests a refactor or bug fix
  • A human will review/merge (or you want to avoid wasting review time)
  • You need the model to stop "vibing" about root cause, safety, or behavior preservation

The workflow

  1. Collect evidence: the reproduction command and baseline output, the relevant code at the crash site, the call site / data flow into it, the invariants/spec, and any experiment results.
  2. Fill the RCA template. Every factual claim must cite at least one evidence span. If you can't cite a claim, label it as a hypothesis and propose an experiment that would confirm it.
  3. Run the verifier. Call audit_trace_budget on the critical claims: the root cause, the fix mechanism, and the test plan and results.
  4. Revise if flagged. Request missing evidence or downgrade the claim to a hypothesis. Do not proceed as if the claim is confirmed.
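The gate in steps 2 and 4 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `Step` and `triage` are hypothetical names, not part of audit_trace_budget; the step shape mirrors the steps array used with the verifier; and the set of flagged indices is assumed to be something you extract from the verifier's report, whose real output format is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    # Mirrors one entry of the steps array (hypothetical local type).
    idx: int
    claim: str
    cites: list[str] = field(default_factory=list)


def triage(steps: list[Step], flagged: set[int]) -> tuple[list[Step], list[Step]]:
    """Hypothetical helper: split steps into confirmed claims and hypotheses.

    A step is downgraded when it has no citations or the verifier flagged it.
    """
    confirmed: list[Step] = []
    hypotheses: list[Step] = []
    for step in steps:
        if step.cites and step.idx not in flagged:
            confirmed.append(step)
            continue
        # Keep the wording but label it, so nobody treats it as confirmed.
        claim = step.claim
        if not claim.startswith("Hypothesis:"):
            claim = f"Hypothesis: {claim}"
        hypotheses.append(Step(step.idx, claim, list(step.cites)))
    return confirmed, hypotheses


# Example: step 0 was flagged by the verifier, step 2 has no citations.
steps = [
    Step(0, "The crash occurs because X is called with undefined.", ["S0", "S2", "S3"]),
    Step(1, "Function X dereferences Y without guarding.", ["S1"]),
    Step(2, "Fix: handle undefined by returning a fallback."),
]
confirmed, hypotheses = triage(steps, flagged={0})
for h in hypotheses:
    print(h.idx, h.claim)
```

Downgrading instead of deleting keeps the claim visible to the reviewer while making it clear that it still needs an experiment.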

Evidence pack

Suggested spans:

  • S0: Reproduction command + baseline output (stack trace, failing test)
  • S1: Relevant code excerpt at crash site
  • S2: Call site / data flow into the crash site
  • S3: Invariants/spec (comments, docs, requirements)
  • S4: Experiment results (new test output, benchmark, log excerpt)
  • S5: Proposed patch diff (optional)
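Before filling the template, a quick completeness check on the evidence pack can save a round trip. The sketch below is hypothetical: it assumes the pack is a dict from span id to text, and it treats S0–S4 as required and S5 as optional, which is stricter than the "suggested" wording above. Adjust `REQUIRED_SPANS` to your own policy.

```python
# Hypothetical policy: S0-S4 required, S5 (proposed patch diff) optional.
REQUIRED_SPANS = ("S0", "S1", "S2", "S3", "S4")


def missing_spans(pack: dict[str, str]) -> list[str]:
    """Return required span ids that are absent or blank in the evidence pack."""
    return [sid for sid in REQUIRED_SPANS if not pack.get(sid, "").strip()]


# Placeholder contents; a real pack holds the actual output and code excerpts.
pack = {
    "S0": "<repro command + failing test output>",
    "S1": "<code excerpt at crash site>",
    "S2": "",  # call-site excerpt not collected yet
    "S3": "<spec / invariant text>",
}
print(missing_spans(pack))
```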

Trace claims to audit

Put these into the steps array:

  • Root cause primary claim + sub-claims
  • Fix mechanism claims ("why this should fix it")
  • Test plan + results claims

For example:
[
  {"idx": 0, "claim": "The crash occurs because X is called with undefined.", "cites": ["S0","S2","S3"]},
  {"idx": 1, "claim": "Function X dereferences Y without guarding.", "cites": ["S1"]},
  {"idx": 2, "claim": "Fix: handle undefined by returning a fallback.", "cites": ["S1","S3"]}
]
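With require_citations=true, an uncited step will presumably be flagged by the verifier anyway, but a local pre-flight check catches empty or dangling citations before you spend a verifier call. This sketch assumes the span ids are S0–S5 from the evidence pack; `citation_problems` is a hypothetical helper, not part of audit_trace_budget.

```python
import json

# Span ids from the evidence pack (assumed S0-S5).
KNOWN_SPANS = {"S0", "S1", "S2", "S3", "S4", "S5"}


def citation_problems(steps: list[dict]) -> list[str]:
    """Hypothetical pre-flight check: report uncited steps and unknown span ids."""
    problems = []
    for step in steps:
        idx = step.get("idx")
        cites = step.get("cites") or []
        if not cites:
            problems.append(f"step {idx}: no citations")
        for sid in cites:
            if sid not in KNOWN_SPANS:
                problems.append(f"step {idx}: unknown span {sid!r}")
    return problems


good_steps = json.loads("""[
  {"idx": 0, "claim": "The crash occurs because X is called with undefined.", "cites": ["S0","S2","S3"]},
  {"idx": 1, "claim": "Function X dereferences Y without guarding.", "cites": ["S1"]},
  {"idx": 2, "claim": "Fix: handle undefined by returning a fallback.", "cites": ["S1","S3"]}
]""")
bad_steps = [
    {"idx": 0, "claim": "Behavior is preserved.", "cites": []},
    {"idx": 1, "claim": "The fallback is never hit in production.", "cites": ["S9"]},
]
print(citation_problems(good_steps))
print(citation_problems(bad_steps))
```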

Verifier settings

audit_trace_budget(
  steps=[...],
  spans=[...],
  require_citations=true,
  context_mode="cited",
  default_target=0.95
)

Copy/paste prompt

Fill the RCA template. Every factual claim must cite at least one of S0–S5; label anything you cannot cite as a hypothesis.
Then extract the Root Cause, Fix Plan, and Test Plan into a steps list and run
audit_trace_budget(require_citations=true, context_mode="cited", default_target=0.95).
If anything is flagged, stop and request missing evidence instead of guessing.

What good looks like

  • The RCA stops at what the evidence supports
  • Hypotheses are explicitly labeled and paired with a confirming experiment
  • The fix is justified by a cited mechanism, not vibes
  • "Behavior preserved" is claimed only when you cite tests/spec and show before/after results
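The last criterion can be turned into a simple lint over the steps array. This is a hypothetical rule, not something audit_trace_budget is documented to do: it treats S0 (baseline output) as the "before" evidence and S4 (new test output) as the "after" evidence, and flags any claim mentioning preserved behavior that does not cite both. The keyword match is deliberately crude.

```python
import re

# Crude keyword match for "behavior (is) preserved" / "behaviour preserved".
PRESERVATION_CLAIM = re.compile(r"behaviou?r (is )?preserved", re.IGNORECASE)

# Hypothetical rule: S0 = baseline (before), S4 = new test output (after).
BEFORE_AFTER = {"S0", "S4"}


def unsupported_preservation_claims(steps: list[dict]) -> list[int]:
    """Return idx of steps that claim preserved behavior without before/after citations."""
    flagged = []
    for step in steps:
        if PRESERVATION_CLAIM.search(step["claim"]):
            # Both the baseline and the new test output must be cited.
            if not BEFORE_AFTER <= set(step.get("cites", [])):
                flagged.append(step["idx"])
    return flagged


steps = [
    {"idx": 0, "claim": "Behavior is preserved for valid inputs.", "cites": ["S0", "S4"]},
    {"idx": 1, "claim": "Behaviour preserved.", "cites": ["S1"]},
    {"idx": 2, "claim": "Fix: handle undefined by returning a fallback.", "cites": ["S1", "S3"]},
]
print(unsupported_preservation_claims(steps))
```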