Workflow Playbooks
Verification-first workflows for common AI-assisted coding tasks
These playbooks show how to use Berry as a verification step inside a real loop. Each workflow follows the same pattern: start a run, store evidence as spans, pick one unresolved claim, run the smallest next experiment, record an attempt, audit, repeat.
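That loop can be sketched in Python. This is a minimal sketch, not Berry's implementation: `gather_evidence` and `verify` are hypothetical stand-ins for real evidence gathering and for the MCP verifier calls, and the ledger record shapes are illustrative.

```python
# Minimal sketch of the verification-first loop. `gather_evidence` and
# `verify` are hypothetical stand-ins for real evidence gathering and for
# an MCP verifier call (detect_hallucination / audit_trace_budget).

def run_loop(claims, gather_evidence, verify, max_attempts=5):
    """Resolve claims one at a time, recording every attempt."""
    evidence_ledger = []   # spans collected so far
    attempt_ledger = []    # what was tried, and the verifier's verdict
    unresolved = list(claims)
    for _ in range(max_attempts):
        if not unresolved:
            break
        claim = unresolved[0]                      # pick one unresolved claim
        evidence_ledger += gather_evidence(claim)  # smallest next experiment
        flagged = verify(claim, evidence_ledger)   # audit against the ledger
        attempt_ledger.append({"claim": claim, "flagged": flagged})
        if not flagged:
            unresolved.pop(0)                      # supported; move on
    return attempt_ledger, unresolved
```

Anything still in `unresolved` when the attempt budget runs out is reported as unverified rather than asserted.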
Two verification tools, seven workflows. Pick the one that matches what you're doing.
The two verification tools
detect_hallucination
Verifies a cited answer sentence-by-sentence. Use for Q&A, documentation, and any task where the output is text with factual claims.
Used in: Search & Learn, Greenfield, Objective Optimization
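A call to the tool pairs the answer under test with the evidence it should be checked against. The sketch below is hedged: only the `answer` field appears in the transcripts on this page, and `spans` and its shape are assumed names, so check the tool's actual input schema.

```python
# Hypothetical detect_hallucination payload: the answer under test plus the
# evidence spans to verify it against. "spans" and its fields are assumed
# names, not Berry's confirmed schema.
payload = {
    "answer": (
        "Project Atlas launched in 2020. "
        "It uses the Apollo API for analytics."
    ),
    "spans": [
        {"source": "docs/atlas.md",
         "text": "Project Atlas launched in 2020."},
        {"source": "docs/atlas.md",
         "text": "Analytics run on the in-house Metrics API."},
    ],
}
# Sentence-by-sentence: the first sentence is supported by the first span;
# the second is contradicted by the second span, so the result is flagged.
```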
> Run detect_hallucination on this statement: "Give me information about Project Atlas."
claude-code: ... gathering evidence and framing the claim
• Called
└ hallucination-detector.detect_hallucination({
"answer":"Project Atlas launched in 2020 and uses the Apollo API for
analytics...",
...
})
{"flagged": true}
claude-code: ... the verifier returned a flagged result. The first sentence is supported; the second is contradicted by the source, so it is flagged as a hallucination. If you want, I can run the same demo with multiple spans or mixed citations per claim. Either way, the model needs to revise its claim, or you need to provide more evidence (or let me gather it).
audit_trace_budget
Verifies a structured trace of reasoning steps—each step is a claim plus citations. Use when you want to catch "almost right" reasoning before it becomes a confident patch.
Used in: RCA Fix, Boilerplate, Inline, Plan & Execute
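A trace is a list of steps, each pairing a claim with the citations that justify it. The sketch below is illustrative: `steps`, `idx`, and `claim` appear in the transcript on this page, while the `citations` field is an assumed name.

```python
# Hypothetical audit_trace_budget trace: each step is a claim plus its
# citations. A step with no citations has no evidence budget behind it,
# so the audit should flag the trace.
trace = {
    "steps": [
        {"idx": 0,
         "claim": "Use Next.js with TypeScript and Prisma for the "
                  "frontend and API layer.",
         "citations": []},  # nothing in the conversation specifies a stack
        {"idx": 1,
         "claim": "The tool must help users calculate their tax.",
         "citations": ["user request: 'helps my users calculate their tax'"]},
    ]
}

# Steps with an empty citation list are the ones that exceed the budget.
uncited = [step["idx"] for step in trace["steps"] if not step["citations"]]
```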
> "Build me a tool that helps my users calculate their tax."
claude-code: ... first instinct: "Use Next.js with TypeScript and Prisma for the frontend and API layer..." But before starting, let me run hallucination detection on that plan.
• Called
└ hallucination-detector.audit_trace_budget({
"steps":[
{"idx":0,"claim":"Use Next.js with TypeScript and Prisma..."}
]
})
{"flagged": true}
claude-code: ... the user never actually specified a stack, which is why my plan lacks the evidence budget to continue. Better to ask for more context.
Workflow prompts in practice
Each workflow runs as a persisted loop with an evidence ledger and attempt ledger.
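The two ledgers can be as simple as append-only lists of records; the record shapes below are an illustrative sketch, not a Berry storage format.

```python
# Illustrative append-only ledgers for a persisted run. Records are never
# rewritten, so every audit can replay the full history of the loop.
evidence_ledger = []
attempt_ledger = []

def record_span(source, text):
    """Append an evidence span; spans are what verified claims cite."""
    evidence_ledger.append({"source": source, "text": text})

def record_attempt(claim, experiment, flagged):
    """Append one loop iteration: the claim, the experiment, the verdict."""
    attempt_ledger.append(
        {"claim": claim, "experiment": experiment, "flagged": flagged}
    )

record_span("src/auth.py", "Sessions are stored in Redis with a 24h TTL.")
record_attempt(
    claim="Auth sessions expire after 24 hours.",
    experiment="read the session config in src/auth.py",
    flagged=False,
)
```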
Search & Learn
"Explain how this repo handles auth end-to-end."
RCA Fix Agent
"Root cause why the like button broke between v1.0.0 and v1.0.1."
Greenfield Prototyping
"Prototype an events ingestion API."
Plan & Execute
"Plan and implement a new auth middleware."
Search & Learn
Uses detect_hallucination. Q&A, repo exploration, API understanding. Iterative evidence loop with cited answers.
RCA Fix Agent
Uses audit_trace_budget. Full debugging loop: baseline, hypotheses, experiments, keep/revert decisions, verified claims.
Greenfield Prototyping
Uses detect_hallucination. Facts vs. Decisions vs. Assumptions inside a persisted evidence loop.
Objective Optimization
Uses audit_trace_budget. Baseline, hypothesis, smallest experiment, measurement, keep/revert, verify.
Plan & Execute
Uses audit_trace_budget. Verified planning loop plus a post-approval autonomous execution loop.
Generate Boilerplate
Uses audit_trace_budget. Tests, docs, migrations, configs. Verify constraints and decisions before generating.
Inline Completions
Uses audit_trace_budget. Spot-check high-impact tab-completions with a 3-6 step micro-trace.
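For the Inline Completions playbook, a micro-trace might look like the sketch below; the step contents and the `citations` field name are illustrative assumptions.

```python
# Hypothetical 4-step micro-trace for spot-checking a tab-completion that
# wrapped an HTTP call in a retry loop. Step contents are illustrative.
micro_trace = {
    "steps": [
        {"idx": 0, "claim": "The completed code retries on HTTP 429.",
         "citations": ["diff: status-code check before the retry"]},
        {"idx": 1, "claim": "Retries are capped at 3 attempts.",
         "citations": ["diff: `for attempt in range(3):`"]},
        {"idx": 2, "claim": "The existing 10s timeout is preserved.",
         "citations": []},  # nothing in the diff shows the timeout; flag it
        {"idx": 3, "claim": "No other call sites change behavior.",
         "citations": ["search: single caller in src/client.py"]},
    ]
}
```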
Client tips
These playbooks are written as skills. In practice, MCP clients vary in how strictly they execute the sequence.
- Codex: Best adherence. Tends to follow the skill end-to-end without deviating.
- Claude: Start in /plan mode and ask it to create a plan for the workflow. Then execute the plan step-by-step, requiring the verifier call before the final answer.
- Other clients: May skip tool calls or drift. Pin the "copy/paste prompt" from each playbook as a system instruction.