detect_hallucination

Per-claim verification for cited answers

What it does

detect_hallucination scores each claim in a written answer and checks whether the cited evidence actually supports it. The output is a diagnostic report that marks unsupported or weakly supported claims.

It is designed for answers that are already written in natural language and include bracket citations like [S0].

Inputs

  • answer (string): the full answer text with citations
  • spans (array): evidence spans [{ sid: "S0", text: "..." }]
  • verifier_model (string, default gpt-4o-mini)
  • default_target (float, default 0.95)
  • max_claims (int, default 25)
  • require_citations (bool, default false)
  • context_mode (string, default "cited"; accepts "all")
  • timeout_s (float, default 60.0)

How it works

  1. Split into claims. The answer is split into sentence-sized claims up to max_claims.
  2. Extract citations. Each claim is scanned for citations and mapped to known span IDs.
  3. Verify entailment. Each claim is checked for support from the selected context spans. If context_mode is "cited", only cited spans are considered; with "all", every provided span is used.
  4. Return a report. The tool returns a per-claim report with budget metrics and a flagged status.
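
The flow can be pictured with a short sketch. This is illustrative only: split_claims and extract_cites are hypothetical helper names, not part of the tool's API, and the splitting heuristic simply assumes the bracket-citation format shown above.

import re

CITE_RE = re.compile(r"\[(S\d+)\]")

def split_claims(answer: str, max_claims: int = 25) -> list[str]:
    # Naive sentence-sized split that keeps a trailing "[S0]" attached
    # to its sentence; the tool's real splitter may differ.
    parts = re.split(r"(?<=[.!?\]])\s+(?!\[)", answer.strip())
    return [p for p in parts if p][:max_claims]

def extract_cites(claim: str, known_sids: set[str]) -> list[str]:
    # Keep only bracket citations that map to a known span ID.
    return [sid for sid in CITE_RE.findall(claim) if sid in known_sids]

claims = split_claims("Auth validates issuer via JWT_ISSUER. [S0]")
print(extract_cites(claims[0], {"S0"}))  # ['S0']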

Verifier behavior

  • Uses strict textual entailment, not world knowledge
  • Only declarative assertions in the context can support a claim
  • Questions or instructions do not count as evidence
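
For example (span texts invented for illustration):

claim = "The service validates the token issuer via JWT_ISSUER."

# Declarative assertion: can support the claim.
evidence = {"sid": "S0", "text": "The service validates the token issuer against JWT_ISSUER."}

# Question: asserts nothing, so it cannot support the claim,
# even though it mentions the same terms.
non_evidence = {"sid": "S1", "text": "Does the service validate the token issuer?"}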

Outputs

The response includes:

  • flagged: whether any claim failed verification
  • under_budget: mirrors flagged for this tool
  • summary: counts + verifier metadata
  • details: one entry per claim with budgets and flags

Example response:

{
  "flagged": true,
  "under_budget": true,
  "summary": {
    "claims_scored": 5,
    "flagged_claims": 2,
    "flagged_idxs": [1, 4],
    "units": "bits",
    "verifier_model": "gpt-4o-mini",
    "backend": "openai"
  },
  "details": [
    {
      "idx": 1,
      "claim": "...",
      "cites": ["S0"],
      "flagged": true,
      "required": { "min": 10.1, "max": 12.4, "units": "bits" },
      "observed": { "min": 2.0, "max": 3.2, "units": "bits" },
      "budget_gap": { "min": 7.1, "max": 10.2, "units": "bits" },
      "has_any_citations": true,
      "missing_citations": false
    }
  ]
}
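
Downstream code usually needs only the flagged entries. A minimal sketch of consuming a report shaped like the example above, where report is the parsed JSON:

def print_flagged_claims(report: dict) -> None:
    # Walk the per-claim details and surface the ones that failed.
    for d in report["details"]:
        if d["flagged"]:
            gap = d["budget_gap"]
            print(f"claim {d['idx']}: cites={d['cites']} "
                  f"short by {gap['min']}-{gap['max']} {gap['units']}")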

How to read the report

  • prior_yes: the verifier's probability that the claim is true without the cited evidence
  • post_yes: the verifier's probability that the claim is true with the cited evidence
  • required: how much evidence is needed to hit the target
  • observed: how much evidence was actually observed
  • budget_gap: positive means the claim is under-supported; zero or negative means it has enough support
  • missing_citations: set when require_citations=true and a claim has no cites
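
One way to turn the budget fields into a pass/fail signal (an assumed reading of the semantics above, not a documented contract):

def has_enough_support(detail: dict) -> bool:
    # A positive budget_gap means observed evidence fell short of required.
    # Requiring gap.max <= 0 is the strict reading; gap.min <= 0 is lenient.
    return detail["budget_gap"]["max"] <= 0.0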

Recommended settings

  • For strict verification: require_citations=true + context_mode="cited"
  • For exploration: lower default_target (e.g. 0.90)
  • For long answers: adjust max_claims and keep sentences short

Operational requirements

  • OPENAI_API_KEY is required for authentication
  • BERRY_SERVICE_URL can override the default service endpoint
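
A minimal pre-flight check (plain environment-variable access, nothing tool-specific):

import os

if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY must be set for the verifier backend")

# Optional: point at a non-default service endpoint.
service_url = os.environ.get("BERRY_SERVICE_URL")  # None -> use the default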

Example call

detect_hallucination(
  answer="Auth validates issuer via JWT_ISSUER. [S0]",
  spans=[{"sid": "S0", "text": "..."}],
  require_citations=True,
  context_mode="cited",
  default_target=0.95
)

When to use

  • Q&A, documentation, or analysis that is already written in sentences
  • Any answer that should be verifiable claim-by-claim

When not to use

  • Plan or reasoning traces (use audit_trace_budget)
  • Outputs without citations (unless you want them flagged)