detect_hallucination

Per-claim verification for cited answers

What it does

detect_hallucination scores each claim in a written answer and checks whether the cited evidence actually supports it. The output is a diagnostic report that marks unsupported or weakly supported claims.

It is designed for answers that are already written in natural language and include bracket citations like [S0].

Inputs

  • answer (string): the full answer text with citations
  • spans (array): evidence spans [{ sid: "S0", text: "..." }]
  • verifier_model (string, default gpt-4o-mini)
  • default_target (float, default 0.95)
  • max_claims (int, default 25)
  • require_citations (bool, default false)
  • context_mode (string, default "cited"; accepts "all")
  • timeout_s (float, default 60.0)

How it works

  1. Split into claims. The answer is split into sentence-sized claims up to max_claims.
  2. Extract citations. Each claim is scanned for citations and mapped to known span IDs.
  3. Verify entailment. Each claim is checked for support from the selected context spans. If context_mode is "cited", only cited spans are considered; with "all", every provided span is used.
  4. Return a report. The tool returns a per-claim report with budget metrics and a flagged status.
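
The flow can be pictured with a short sketch. This is illustrative only: split_claims and extract_cites are hypothetical helper names, not part of the tool's API, and the splitting heuristic simply assumes the bracket-citation format shown above.

import re

CITE_RE = re.compile(r"\[(S\d+)\]")

def split_claims(answer: str, max_claims: int = 25) -> list[str]:
    # Naive sentence-sized split that keeps a trailing "[S0]" attached
    # to its sentence; the tool's real splitter may differ.
    parts = re.split(r"(?<=[.!?\]])\s+(?!\[)", answer.strip())
    return [p for p in parts if p][:max_claims]

def extract_cites(claim: str, known_sids: set[str]) -> list[str]:
    # Keep only bracket citations that map to a known span ID.
    return [sid for sid in CITE_RE.findall(claim) if sid in known_sids]

claims = split_claims("Auth validates issuer via JWT_ISSUER. [S0]")
print(extract_cites(claims[0], {"S0"}))  # ['S0']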

Verifier behavior

  • Uses strict textual entailment, not world knowledge
  • Only declarative assertions in the context can support a claim
  • Questions or instructions do not count as evidence
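
For example (span texts invented for illustration):

claim = "The service validates the token issuer via JWT_ISSUER."

# Declarative assertion: can support the claim.
evidence = {"sid": "S0", "text": "The service validates the token issuer against JWT_ISSUER."}

# Question: asserts nothing, so it cannot support the claim,
# even though it mentions the same terms.
non_evidence = {"sid": "S1", "text": "Does the service validate the token issuer?"}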

Outputs

The response includes:

  • flagged: whether any claim failed verification
  • under_budget: mirrors flagged for this tool
  • summary: counts + verifier metadata
  • details: one entry per claim with budgets and flags

Example response:

{
  "flagged": true,
  "under_budget": true,
  "summary": {
    "claims_scored": 5,
    "flagged_claims": 2,
    "flagged_idxs": [1, 4],
    "units": "bits",
    "verifier_model": "gpt-4o-mini",
    "backend": "openai"
  },
  "details": [
    {
      "idx": 1,
      "claim": "...",
      "cites": ["S0"],
      "flagged": true,
      "required": { "min": 10.1, "max": 12.4, "units": "bits" },
      "observed": { "min": 2.0, "max": 3.2, "units": "bits" },
      "budget_gap": { "min": 7.1, "max": 10.2, "units": "bits" },
      "has_any_citations": true,
      "missing_citations": false
    }
  ]
}
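
Downstream code usually needs only the flagged entries. A minimal sketch of consuming a report shaped like the example above, where report is the parsed JSON:

def print_flagged_claims(report: dict) -> None:
    # Walk the per-claim details and surface the ones that failed.
    for d in report["details"]:
        if d["flagged"]:
            gap = d["budget_gap"]
            print(f"claim {d['idx']}: cites={d['cites']} "
                  f"short by {gap['min']}-{gap['max']} {gap['units']}")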

How to read the report

  • prior_yes: the verifier's probability that the claim is true without the cited evidence
  • post_yes: the verifier's probability that the claim is true with the cited evidence
  • required: how much evidence is needed to hit the target
  • observed: how much evidence was actually observed
  • budget_gap: positive means the claim is under-supported; zero or negative means it has enough support
  • missing_citations: set when require_citations=true and a claim has no cites
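
One way to turn the budget fields into a pass/fail signal (an assumed reading of the semantics above, not a documented contract):

def has_enough_support(detail: dict) -> bool:
    # A positive budget_gap means observed evidence fell short of required.
    # Requiring gap.max <= 0 is the strict reading; gap.min <= 0 is lenient.
    return detail["budget_gap"]["max"] <= 0.0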

Recommended settings

  • For strict verification: require_citations=true + context_mode="cited"
  • For exploration: lower default_target (e.g. 0.90)
  • For long answers: adjust max_claims and keep sentences short

Operational requirements

  • OPENAI_API_KEY is required for authentication
  • BERRY_SERVICE_URL can override the default service endpoint
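
A minimal pre-flight check (plain environment-variable access, nothing tool-specific):

import os

if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY must be set for the verifier backend")

# Optional: point at a non-default service endpoint.
service_url = os.environ.get("BERRY_SERVICE_URL")  # None -> use the default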

Example call

detect_hallucination(
  answer="Auth validates issuer via JWT_ISSUER. [S0]",
  spans=[{"sid": "S0", "text": "..."}],
  require_citations=True,
  context_mode="cited",
  default_target=0.95
)

When to use

  • Q&A, documentation, or analysis that is already written in sentences
  • Any answer that should be verifiable claim-by-claim

When not to use

  • Plan or reasoning traces (use audit_trace_budget)
  • Outputs without citations (unless you want them flagged)