audit_trace_budget
Verify explicit reasoning steps before they become patches
What it does
audit_trace_budget verifies a structured trace of reasoning steps. Each step is a short claim with citations. The tool scores each step and flags the ones that are not supported by the cited evidence.
Use it when the output is a plan, decision log, root-cause analysis, or any reasoning you want to validate before code changes.
Inputs
- `steps` (array): `[{ claim: "...", cites: ["S0"] }]`
- `spans` (array): evidence spans, `[{ sid: "S0", text: "..." }]`
- `verifier_model` (string, default `gpt-4o-mini`)
- `default_target` (float, default `0.95`)
- `require_citations` (bool, default `false`)
- `context_mode` (string, default `"cited"`; accepts `"all"`)
- `timeout_s` (float, default `60.0`)
Each step can optionally include `confidence` to override the default target for that specific claim.
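The input schema above can be sketched as a plain payload. This is a minimal illustration using only the fields documented on this page; the span texts and claims are placeholders:

```python
# Hypothetical payload mirroring the documented input schema.
payload = {
    "steps": [
        {"claim": "Auth validates issuer via JWT_ISSUER", "cites": ["S0"]},
        # Optional per-step "confidence" overrides default_target for this claim:
        {"claim": "Tokens expire after 15 minutes", "cites": ["S1"], "confidence": 0.99},
    ],
    "spans": [
        {"sid": "S0", "text": "..."},
        {"sid": "S1", "text": "..."},
    ],
    "verifier_model": "gpt-4o-mini",
    "default_target": 0.95,
    "require_citations": False,
    "context_mode": "cited",
    "timeout_s": 60.0,
}
```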
How it works
- Use explicit steps. You provide the claims and their citations, rather than letting the tool split sentences.
- Select context. If `context_mode` is `"cited"`, only the cited spans are considered for each step.
- Score each step. The tool computes whether each claim is supported by the evidence and reports a budget gap when it is not.
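The context-selection step can be sketched as follows. This is an assumption about the behavior described above, not the tool's actual implementation:

```python
def select_context(step, spans, context_mode="cited"):
    """Return the spans a step is scored against.

    In "cited" mode only the spans the step cites are considered;
    in "all" mode every span is. Sketch of the documented behavior.
    """
    by_sid = {s["sid"]: s for s in spans}
    if context_mode == "cited":
        return [by_sid[sid] for sid in step.get("cites", []) if sid in by_sid]
    return list(spans)

spans = [
    {"sid": "S0", "text": "JWT_ISSUER is checked on login."},
    {"sid": "S1", "text": "Sessions last 15 minutes."},
]
step = {"claim": "Auth validates the issuer", "cites": ["S0"]}
select_context(step, spans)         # only the cited S0 span
select_context(step, spans, "all")  # every span
```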
Verifier behavior
- Uses strict textual entailment, not world knowledge
- Only declarative assertions in the context can support a claim
- Questions or instructions do not count as evidence
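As a rough illustration of the "declarative assertions only" rule, a crude filter might exclude questions and imperatives. The real verifier is model-based; this heuristic and its verb list are purely illustrative:

```python
def is_declarative(text):
    """Toy check: reject questions and a few obvious imperatives."""
    t = text.strip()
    if not t or t.endswith("?"):
        return False  # questions are not evidence
    first = t.split(maxsplit=1)[0].lower()
    # Crude imperative check (illustrative word list only).
    return first not in {"run", "check", "verify", "please"}
```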
Outputs
The response includes:
- `flagged`: whether any step failed verification
- `under_budget`: mirrors `flagged` for this tool
- `summary`: counts + verifier metadata
- `details`: one entry per step with budgets and flags
{
"flagged": false,
"under_budget": false,
"summary": {
"steps_scored": 4,
"flagged_steps": 0,
"units": "bits",
"verifier_model": "gpt-4o-mini",
"backend": "openai"
},
"details": [
{
"idx": 0,
"claim": "...",
"cites": ["S0"],
"flagged": false,
"required": { "min": 4.2, "max": 6.1, "units": "bits" },
"observed": { "min": 5.0, "max": 7.2, "units": "bits" },
"budget_gap": { "min": -1.0, "max": -0.2, "units": "bits" }
}
]
}
How to read the report
- `required`: evidence budget needed to hit the target
- `observed`: evidence budget actually observed
- `budget_gap`: positive means under-supported, negative means enough support
- `missing_citations`: set when `require_citations=true` and a step has no cites
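A caller might triage the report like this. The field names follow the report schema on this page; the helper itself is a hypothetical sketch:

```python
def under_supported(report):
    """Return indices of steps that failed: flagged, positive budget_gap
    (under-supported), or missing citations."""
    bad = set()
    for d in report["details"]:
        if d.get("flagged") or d.get("budget_gap", {}).get("min", 0) > 0:
            bad.add(d["idx"])
        if d.get("missing_citations"):
            bad.add(d["idx"])
    return sorted(bad)
```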
Recommended settings
- For merge gates: `default_target=0.95` + `require_citations=true`
- For exploratory planning: `default_target=0.90`
- For strict evidence only: `context_mode="cited"`
Operational requirements
- `OPENAI_API_KEY` is required for authentication
- `BERRY_SERVICE_URL` can override the default service endpoint
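A minimal pre-flight check for these variables, as a sketch (the default endpoint is not documented here, so the helper just returns any override):

```python
import os

def check_env(env=os.environ):
    """Fail fast if the required credential is missing; return the
    optional service-endpoint override, or None to use the default."""
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required for authentication")
    return env.get("BERRY_SERVICE_URL")
```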
Example call
audit_trace_budget(
steps=[
{ idx: 0, claim: "Auth validates issuer via JWT_ISSUER", cites: ["S0"] }
],
spans=[{ sid: "S0", text: "..." }],
require_citations=true,
context_mode="cited",
default_target=0.95
)
When to use
- Plans, decision traces, RCA reports, or reasoning steps
- Any workflow where you want to verify reasoning before editing files
When not to use
- Final prose answers (use `detect_hallucination`)
- Outputs with no citations (unless you want them flagged)