# detect_hallucination

Per-claim verification for cited answers.
## What it does

`detect_hallucination` scores each claim in a written answer and checks whether the cited evidence actually supports it. The output is a diagnostic report that marks unsupported or weakly supported claims.

It is designed for answers that are already written in natural language and include bracket citations like `[S0]`.
## Inputs

- `answer` (string): the full answer text with citations
- `spans` (array): evidence spans, e.g. `[{ sid: "S0", text: "..." }]`
- `verifier_model` (string, default `gpt-4o-mini`)
- `default_target` (float, default `0.95`)
- `max_claims` (int, default `25`)
- `require_citations` (bool, default `false`)
- `context_mode` (string, default `"cited"`; accepts `"all"`)
- `timeout_s` (float, default `60.0`)
## How it works

1. **Split into claims.** The answer is split into sentence-sized claims, up to `max_claims`.
2. **Extract citations.** Each claim is scanned for citations and mapped to known span IDs.
3. **Verify entailment.** Each claim is checked for support from the selected context spans. If `context_mode` is `"cited"`, only cited spans are considered.
4. **Return a report.** The tool returns a per-claim report with budget metrics and a `flagged` status.
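Steps 1 and 2 can be sketched as follows. This is an illustrative sketch only: the sentence-splitting regex and the `[S0]` citation pattern are assumptions, not the tool's actual implementation.

```python
import re

# Illustrative sketch of steps 1-2 (NOT the tool's real splitter).
CITE_RE = re.compile(r"\[(S\d+)\]")

def split_claims(answer: str, max_claims: int = 25) -> list[str]:
    # Split on sentence enders, keeping trailing bracket citations
    # attached to the sentence they follow.
    pattern = r"[^.!?]+[.!?](?:\s*\[S\d+\])*"
    return [m.strip() for m in re.findall(pattern, answer)][:max_claims]

def extract_citations(claim: str, known_sids: set[str]) -> list[str]:
    # Map bracket citations like [S0] to known span IDs only.
    return [sid for sid in CITE_RE.findall(claim) if sid in known_sids]

answer = "Auth validates issuer via JWT_ISSUER. [S0] Tokens expire hourly."
claims = split_claims(answer)
print([(c, extract_citations(c, {"S0"})) for c in claims])
```

Note that the second claim carries no citation, so with `context_mode="cited"` it would be verified against no spans at all.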
## Verifier behavior
- Uses strict textual entailment, not world knowledge
- Only declarative assertions in the context can support a claim
- Questions or instructions do not count as evidence
## Outputs

The response includes:

- `flagged`: whether any claim failed verification
- `under_budget`: mirrors `flagged` for this tool
- `summary`: counts plus verifier metadata
- `details`: one entry per claim with budgets and flags
```json
{
  "flagged": true,
  "under_budget": true,
  "summary": {
    "claims_scored": 5,
    "flagged_claims": 2,
    "flagged_idxs": [1, 4],
    "units": "bits",
    "verifier_model": "gpt-4o-mini",
    "backend": "openai"
  },
  "details": [
    {
      "idx": 1,
      "claim": "...",
      "cites": ["S0"],
      "flagged": true,
      "required": { "min": 10.1, "max": 12.4, "units": "bits" },
      "observed": { "min": 2.0, "max": 3.2, "units": "bits" },
      "budget_gap": { "min": 7.1, "max": 10.2, "units": "bits" },
      "has_any_citations": true,
      "missing_citations": false
    }
  ]
}
```

## How to read the report
- `prior_yes`: probability of the claim without the cited evidence
- `post_yes`: probability of the claim with the cited evidence
- `required`: how much evidence is needed to hit the target
- `observed`: how much evidence was actually observed
- `budget_gap`: positive means under-supported; negative means enough support
- `missing_citations`: set when `require_citations=true` and a claim has no cites
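Consuming the report programmatically might look like the sketch below. `summarize_report` is a hypothetical helper; the field names simply follow the example output shown earlier.

```python
def summarize_report(report: dict) -> list[str]:
    """One human-readable line per flagged claim (hypothetical helper)."""
    lines = []
    for d in report.get("details", []):
        if not d.get("flagged"):
            continue  # claim passed verification
        gap = d.get("budget_gap", {})
        note = " (no citations)" if d.get("missing_citations") else ""
        lines.append(
            f"claim {d['idx']}: under-supported by "
            f"{gap.get('min')}-{gap.get('max')} {gap.get('units', 'bits')}{note}"
        )
    return lines

report = {  # trimmed version of the example output above
    "details": [
        {"idx": 1, "flagged": True, "missing_citations": False,
         "budget_gap": {"min": 7.1, "max": 10.2, "units": "bits"}},
        {"idx": 2, "flagged": False},
    ],
}
print(summarize_report(report))  # ['claim 1: under-supported by 7.1-10.2 bits']
```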
## Recommended settings

- For strict verification: `require_citations=true` + `context_mode="cited"`
- For exploration: lower `default_target` (e.g. `0.90`)
- For long answers: adjust `max_claims` and keep sentences short
## Operational requirements

- `OPENAI_API_KEY` is required for authentication
- `BERRY_SERVICE_URL` can override the default service endpoint
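A minimal pre-flight check, assuming only the two environment variable names above (`check_env` itself is a hypothetical helper, not part of the tool):

```python
import os

def check_env() -> dict:
    """Fail fast if required credentials are missing (hypothetical helper)."""
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required for authentication")
    # BERRY_SERVICE_URL is optional; None means the tool's default endpoint.
    return {"api_key": api_key,
            "service_url": os.environ.get("BERRY_SERVICE_URL")}
```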
## Example call

```python
detect_hallucination(
    answer="Auth validates issuer via JWT_ISSUER. [S0]",
    spans=[{"sid": "S0", "text": "..."}],
    require_citations=True,
    context_mode="cited",
    default_target=0.95,
)
```

## When to use
- Q&A, documentation, or analysis that is already written in sentences
- Any answer that should be verifiable claim-by-claim
## When not to use

- Plan or reasoning traces (use `audit_trace_budget` instead)
- Outputs without citations (unless you want them flagged)