Workflow Playbooks

Verification-first workflows for common AI-assisted coding tasks

These playbooks show how to use Berry as a verification step inside a real loop. Each workflow follows the same pattern: start a run, store evidence as spans, pick one unresolved claim, run the smallest next experiment, record an attempt, audit, repeat.

Two verification tools, seven workflows. Pick the one that matches what you're doing.

The two verification tools

detect_hallucination

Verifies a cited answer sentence-by-sentence. Use for Q&A, documentation, and any task where the output is text with factual claims.

Used in: Search & Learn, Greenfield, Objective Optimization

Read the deep dive

claude-code
> Give me information about Project Atlas, then run detect_hallucination on your answer.

claude-code: ... gathering evidence and framing the claim

• Called
  └ hallucination-detector.detect_hallucination({
        "answer":"Project Atlas launched in 2020 and uses the Apollo API for
        analytics...",
        ...
      })
    {"flagged": true}
claude-code: The verifier flagged the answer. The first sentence is supported, but the second is contradicted by the source, so it's flagged as a hallucination. The model needs to revise its claim, or you can provide (or let me gather) more evidence. If you want, I can run the same demo with multiple spans or mixed citations per claim.
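The gate pattern above can be sketched in a few lines. This is a minimal illustration, not the tool's full API: the `answer` field and `{"flagged": ...}` response shape come from the transcript (its other fields are elided there), and `call_tool` is a hypothetical stand-in for whatever invoke method your MCP client exposes.

```python
def verification_gate(answer: str, call_tool) -> str:
    """Surface an answer only if the verifier does not flag it."""
    result = call_tool(
        "hallucination-detector.detect_hallucination",
        {"answer": answer},  # real calls also carry evidence fields, elided here
    )
    if result["flagged"]:
        # Flagged: revise the claim or gather more evidence spans, then re-verify.
        return "needs-revision"
    return "ok"


# Stub client for illustration: flags any answer mentioning the Apollo API.
fake_client = lambda name, args: {"flagged": "Apollo API" in args["answer"]}
```

With the stub, `verification_gate("Project Atlas uses the Apollo API.", fake_client)` returns `"needs-revision"`, while an unflagged answer passes through as `"ok"`.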

audit_trace_budget

Verifies a structured trace of reasoning steps—each step is a claim plus citations. Use when you want to catch "almost right" reasoning before it becomes a confident patch.

Used in: RCA Fix, Boilerplate, Inline, Plan & Execute

Read the deep dive

claude-code
> "Build me a tool that helps my users calculate their tax."

claude-code: Easy: "Use Next.js with TypeScript and Prisma for the frontend and API layer..." Wait. Before I start, let me audit that plan.

• Called
  └ hallucination-detector.audit_trace_budget({
        "steps":[
          {"idx":0,"claim":"Use Next.js with TypeScript and Prisma..."}
        ]
      })
    {"flagged": true}

claude-code: The user didn't actually specify a stack, so my plan lacks the evidence budget to continue. Better to ask for more context.
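Building the trace payload can be sketched as below. The `steps`, `idx`, and `claim` fields appear in the transcript; the `citations` field is an assumption based on the description above ("each step is a claim plus citations"), so treat this as a shape sketch rather than the definitive schema.

```python
def build_trace(claims_with_citations):
    """Turn (claim, citations) pairs into an audit_trace_budget payload."""
    return {
        "steps": [
            {"idx": i, "claim": claim, "citations": citations}
            for i, (claim, citations) in enumerate(claims_with_citations)
        ]
    }


trace = build_trace([
    # A step with no citations has no evidence budget behind its claim,
    # which is the situation the audit is designed to catch.
    ("Use Next.js with TypeScript and Prisma...", []),
])
```

Passing `trace` to `audit_trace_budget` in a case like this is where you would expect a flagged result, since the stack choice is not backed by anything the user said.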

Workflow prompts in practice

Each workflow runs as a persisted loop with an evidence ledger and attempt ledger.

Search & Learn

"Explain how this repo handles auth end-to-end."

RCA Fix Agent

"Root cause why the like button broke between v1.0.0 and v1.0.1."

Greenfield Prototyping

"Prototype an events ingestion API."

Plan & Execute

"Plan and implement a new auth middleware."

Search & Learn

detect_hallucination

Q&A, repo exploration, API understanding. Iterative evidence loop with cited answers.

RCA Fix Agent

audit_trace_budget

Full debugging loop: baseline, hypotheses, experiments, keep/revert decisions, verified claims.

Greenfield Prototyping

detect_hallucination

Facts vs Decisions vs Assumptions inside a persisted evidence loop.

Objective Optimization

audit_trace_budget

Baseline, hypothesis, smallest experiment, measurement, keep/revert, verify.

Plan & Execute

audit_trace_budget

Verified planning loop + post-approval autonomous execution loop.

Generate Boilerplate

audit_trace_budget

Tests, docs, migrations, configs. Verify constraints and decisions before generating.

Inline Completions

audit_trace_budget

Spot-check high-impact tab-complete with a 3-6 step micro-trace.

Client tips

These playbooks are written as skills. In practice, MCP clients vary in how strictly they execute the sequence.

  • Codex: Best adherence. Tends to follow the skill end-to-end without deviating.
  • Claude: Start in /plan mode and ask it to create a plan for the workflow. Then execute the plan step-by-step, requiring the verifier call before the final answer.
  • Other clients: May skip tool calls or drift. Pin the "copy/paste prompt" from each playbook as a system instruction.