Workflow Playbooks

Verification-first workflows for common AI-assisted coding tasks

These playbooks show how to use Berry as a verification step inside a real loop. Each workflow follows the same pattern: start a run, store evidence as spans, pick one unresolved claim, run the smallest next experiment, record an attempt, audit, repeat.

Two verification tools, seven workflows. Pick the one that matches what you're doing.

The two verification tools

detect_hallucination

Verifies a cited answer sentence-by-sentence. Use for Q&A, documentation, and any task where the output is text with factual claims.

Used in: Search & Learn, Greenfield, Objective Optimization

Read the deep dive

claude-code
> Give me information about Project Atlas, then run detect_hallucination on your answer.

claude-code: ... gathering evidence and framing the claim

• Called
  └ hallucination-detector.detect_hallucination({
        "answer":"Project Atlas launched in 2020 and uses the Apollo API for
        analytics...",
        ...
      })
    {"flagged": true}
claude-code: The verifier flagged the answer. The first sentence is supported, but the second is contradicted by the source, so it's flagged as a hallucination. The model needs to revise its claim, or you can provide (or let me gather) more evidence. If you want, I can run the same demo with multiple spans or mixed citations per claim.
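The gate pattern above can be sketched in a few lines. This is a minimal illustration, not the tool's full API: the `answer` field and `{"flagged": ...}` response shape come from the transcript (its other fields are elided there), and `call_tool` is a hypothetical stand-in for whatever invoke method your MCP client exposes.

```python
def verification_gate(answer: str, call_tool) -> str:
    """Surface an answer only if the verifier does not flag it."""
    result = call_tool(
        "hallucination-detector.detect_hallucination",
        {"answer": answer},  # real calls also carry evidence fields, elided here
    )
    if result["flagged"]:
        # Flagged: revise the claim or gather more evidence spans, then re-verify.
        return "needs-revision"
    return "ok"


# Stub client for illustration: flags any answer mentioning the Apollo API.
fake_client = lambda name, args: {"flagged": "Apollo API" in args["answer"]}
```

With the stub, `verification_gate("Project Atlas uses the Apollo API.", fake_client)` returns `"needs-revision"`, while an unflagged answer passes through as `"ok"`.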

audit_trace_budget

Verifies a structured trace of reasoning steps—each step is a claim plus citations. Use when you want to catch "almost right" reasoning before it becomes a confident patch.

Used in: RCA Fix, Boilerplate, Inline, Plan & Execute

Read the deep dive

claude-code
> "Build me a tool that helps my users calculate their tax."

claude-code: Easy: "Use Next.js with TypeScript and Prisma for the frontend and API layer..." Wait. Before I start, let me audit that plan.

• Called
  └ hallucination-detector.audit_trace_budget({
        "steps":[
          {"idx":0,"claim":"Use Next.js with TypeScript and Prisma..."}
        ]
      })
    {"flagged": true}

claude-code: The user didn't actually specify a stack, so my plan lacks the evidence budget to continue. Better to ask for more context.
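Building the trace payload can be sketched as below. The `steps`, `idx`, and `claim` fields appear in the transcript; the `citations` field is an assumption based on the description above ("each step is a claim plus citations"), so treat this as a shape sketch rather than the definitive schema.

```python
def build_trace(claims_with_citations):
    """Turn (claim, citations) pairs into an audit_trace_budget payload."""
    return {
        "steps": [
            {"idx": i, "claim": claim, "citations": citations}
            for i, (claim, citations) in enumerate(claims_with_citations)
        ]
    }


trace = build_trace([
    # A step with no citations has no evidence budget behind its claim,
    # which is the situation the audit is designed to catch.
    ("Use Next.js with TypeScript and Prisma...", []),
])
```

Passing `trace` to `audit_trace_budget` in a case like this is where you would expect a flagged result, since the stack choice is not backed by anything the user said.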

Workflow prompts in practice

Each workflow runs as a persisted loop with an evidence ledger and attempt ledger.

Search & Learn

"Explain how this repo handles auth end-to-end."

RCA Fix Agent

"Root cause why the like button broke between v1.0.0 and v1.0.1."

Greenfield Prototyping

"Prototype an events ingestion API."

Plan & Execute

"Plan and implement a new auth middleware."

Search & Learn

detect_hallucination

Q&A, repo exploration, API understanding. Iterative evidence loop with cited answers.

RCA Fix Agent

audit_trace_budget

Full debugging loop: baseline, hypotheses, experiments, keep/revert decisions, verified claims.

Greenfield Prototyping

detect_hallucination

Facts vs Decisions vs Assumptions inside a persisted evidence loop.

Objective Optimization

audit_trace_budget

Baseline, hypothesis, smallest experiment, measurement, keep/revert, verify.

Plan & Execute

audit_trace_budget

Verified planning loop + post-approval autonomous execution loop.

Generate Boilerplate

audit_trace_budget

Tests, docs, migrations, configs. Verify constraints and decisions before generating.

Inline Completions

audit_trace_budget

Spot-check high-impact tab-complete with a 3-6 step micro-trace.

Client tips

These playbooks are written as skills. In practice, MCP clients vary in how strictly they execute the sequence.

  • Codex: Best adherence. Tends to follow the skill end-to-end without deviating.
  • Claude: Start in /plan mode and ask it to create a plan for the workflow. Then execute the plan step-by-step, requiring the verifier call before the final answer.
  • Other clients: May skip tool calls or drift. Pin the "copy/paste prompt" from each playbook as a system instruction.