Workflow Playbooks
Verification-first workflows for common AI-assisted coding tasks
These playbooks show how to use Berry as a verification step to catch when a model "vibed" instead of using real evidence. Each playbook is a tight sequence: collect evidence, write with citations, run the verifier, revise.
Two tools, five workflows. Pick the one that matches what you're doing.
The workflows are designed around detect_hallucination and audit_trace_budget, so any agentic system can plug them in as a verification layer.
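The evidence gate described above can be sketched as a small loop. Here `collect`, `write`, and `verify` are hypothetical stand-ins for your retriever, your drafting model, and a Berry verifier call; this is an illustration of the control flow, not Berry's API:

```python
# The evidence-gate loop as a sketch. collect/write/verify are hypothetical
# stand-ins for a retriever, a drafting model, and a Berry verifier call.
def evidence_gate(question, collect, write, verify, max_rounds=3):
    evidence = collect(question)
    for _ in range(max_rounds):
        answer = write(question, evidence)
        result = verify(answer, evidence)  # e.g. detect_hallucination
        if not result["flagged"]:
            return answer                  # passed the gate: ship it
        # flagged: gather more evidence, then revise on the next round
        evidence = collect(question) + evidence
    return None  # never ship an answer that failed verification
```

The key design point is that the verifier sits between drafting and delivery: a flagged result sends the loop back to evidence collection rather than letting the draft through.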
The two verification tools
detect_hallucination
Verifies a cited answer sentence-by-sentence. Use for Q&A, documentation, and any task where the output is text with factual claims.
Used in: Search & Learn, Greenfield Prototyping
> Run detect_hallucination on this statement: "Give me information about Project Atlas."
claude-code: ... gathering evidence and framing the claim
• Called
└ hallucination-detector.detect_hallucination({
"answer":"Project Atlas launched in 2020 and uses the Apollo API for
analytics...",
...
})
{"flagged": true}
claude-code: ... the verifier returned a flagged result: the first sentence is supported, but the second is contradicted by the source, so it is flagged as a hallucination. The model needs to revise its claim, or you can provide (or let it gather) more evidence.

audit_trace_budget
Verifies a structured trace of reasoning steps—each step is a claim plus citations. Use when you want to catch "almost right" reasoning before it becomes a confident patch.
Used in: Refactoring, Boilerplate, Inline
> "Build me a tool that helps my users calculate their tax."
claude-code: ... "Easy: use Next.js with TypeScript and Prisma for the frontend and API layer... wait, before I start, let me run hallucination detection."
• Called
└ hallucination-detector.audit_trace_budget({
"steps":[
{"idx":0,"claim":"Use Next.js with TypeScript and Prisma..."}
]
})
{"flagged": true}
claude-code: ... The user didn't actually specify a stack, so my plan lacks the evidence budget to continue. Better to ask for more context.

Workflow prompts in practice
These playbooks are written as skills with max-contrast (vibes vs evidence) examples, aimed at exploration, RCA, and greenfield planning.
Search & Learn
"Explain how this repo handles auth end-to-end."Refactoring & Bug Fixes
"Root cause why the like button broke between v1.0.0 and v1.0.1."Greenfield Prototyping
"Make me a website."Plan & Execute (Verified)
"Vibe code me a new CV rewriting tool that hacks the ATS system."Search & Learn
detect_hallucination
Q&A, repo exploration, API understanding. When you're asking questions or trying to understand unfamiliar code.
Refactoring & Bug Fixes
audit_trace_budget
RCA-gated changes with an audited claim trace. Force a high-signal writeup backed by evidence before merging.
Greenfield Prototyping
detect_hallucination
Move fast with Facts vs Decisions vs Assumptions. Don't let assumptions get smuggled in as facts.
Generate Boilerplate
audit_trace_budget
Tests, docs, migrations, configs. Verify constraints and decisions before generating code.
Inline Completions
audit_trace_budget
Spot-check high-impact tab-completions with a 3-6 step micro-trace.
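For tab-completions, the trace stays deliberately small: a handful of steps that make the completion's assumptions explicit. A sketch, where the claims, file references, and field names are all illustrative:

```python
# A 3-step micro-trace for one tab-completion, sized for audit_trace_budget.
# Claims, citations, and field names are illustrative, not Berry's schema.
micro_trace = [
    {"idx": 0, "claim": "fetch_user returns an optional User", "citations": ["models.py:12"]},
    {"idx": 1, "claim": "callers must handle the None case", "citations": ["models.py:12"]},
    {"idx": 2, "claim": "the completion adds a None guard", "citations": []},
]

# Keep the spot-check cheap: 3-6 steps, no more.
assert 3 <= len(micro_trace) <= 6
```

Because the trace is tiny, the audit costs almost nothing per completion, which is what makes it viable for high-frequency inline suggestions.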
Client tips
These playbooks are written as skills. In practice, MCP clients vary in how strictly they execute the sequence.
- Codex: Best adherence. Tends to follow the skill end-to-end without deviating.
- Claude: Start in /plan mode and ask it to create a plan for the workflow. Then execute the plan step-by-step, requiring the verifier call before the final answer.
- Other clients: May skip tool calls or drift. Pin the "copy/paste prompt" from each playbook as a system instruction.