Why this matters
The shift: from supervising tasks to directing an agent that checks itself
Part one
Anatomy of a /goal: a crisp definition-of-done
/goal is a Claude Code command (v2.1.139+). You give it a completion condition and Claude keeps working across turns toward it, so you don't re-prompt each step. After every turn, a separate small fast model (Haiku by default) reads the conversation and decides yes or no: is the condition met? A 'no' sends Claude back to work with the reason as guidance for the next turn. A 'yes' clears the goal. Setting a goal starts a turn immediately, because the condition itself is the directive. This is the 'loops until done' behaviour from the video: the agent doesn't clock off at 5pm; it keeps going until your definition-of-done holds.
- Set it: `/goal all tests in test/auth pass and the lint step is clean`. The condition is the prompt; no separate message is needed.
- Make the condition provable from Claude's own output. The evaluator doesn't run commands or read files itself; it only judges what Claude has surfaced in the transcript. 'npm test exits 0' works because Claude runs it and the result lands in the conversation.
- Give it a measurable end state (a test result, a build exit code, a file count, an empty queue), a stated check (how to prove it), and constraints that must not change on the way (e.g. 'no other test file is modified').
- Bound the run: add a clause like `or stop after 20 turns` so a goal that can't converge doesn't burn tokens forever. The condition can be up to 4,000 characters.
- Check or stop: `/goal` (no argument) shows the condition, turns evaluated, token spend, and the evaluator's latest reason; `/goal clear` ends it early.
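The loop described above can be sketched in a few lines of Python. This is a toy model, not Claude Code's implementation: `run_goal`, `Verdict`, `fake_worker`, and `fake_evaluator` are hypothetical names, and the evaluator here is a plain function standing in for the small fast model. It shows the three moving parts: a worker turn, a yes/no verdict based only on the transcript, and the reason fed back as guidance.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    met: bool     # the evaluator's yes/no answer
    reason: str   # explanation, fed back to the worker on a "no"


def run_goal(
    condition: str,
    worker: Callable[[str, Optional[str]], str],
    evaluator: Callable[[str, List[str]], Verdict],
    max_turns: int = 20,
) -> tuple:
    """Toy goal loop: work, evaluate the transcript, repeat until met or capped."""
    transcript: List[str] = []
    guidance: Optional[str] = None
    for turn in range(1, max_turns + 1):
        # The worker acts on the condition plus the evaluator's last reason.
        transcript.append(worker(condition, guidance))
        # The evaluator sees only the transcript, never the file system.
        verdict = evaluator(condition, transcript)
        if verdict.met:
            return True, turn, verdict.reason
        guidance = verdict.reason
    return False, max_turns, guidance or ""


# Hypothetical stand-ins: the worker "fixes" one failing test per turn.
failing = ["test_login", "test_logout"]


def fake_worker(condition: str, guidance: Optional[str]) -> str:
    if failing:
        failing.pop()
    return f"npm test: {len(failing)} failing"


def fake_evaluator(condition: str, transcript: List[str]) -> Verdict:
    last = transcript[-1]
    if last == "npm test: 0 failing":
        return Verdict(True, "test output shows 0 failing")
    return Verdict(False, f"latest output was '{last}'")


result = run_goal("all tests in test/auth pass", fake_worker, fake_evaluator)
print(result)
```

Note that the evaluator can only say yes because the worker put the test result into the transcript, which is why the condition needs to be provable from Claude's own output.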
Part two
Anatomy of a workflow: verify each part, diff against spec
ultracode effort setting and let Claude decide when the task is big enough to warrant one.
- Verify each part, not the whole. Ask the workflow to verify each part of the plan separately: a per-component check catches the one module that drifted, where a single 'does it work?' pass hides it.
- Demand a diff against spec. The high-leverage instruction is 'report what was implemented and if anything differed.' That report is where you spend your attention now — not re-reading every line, just the delta between what you asked for and what shipped.
- Let refutation do the QA. Subagents checking each other's findings is the mechanism. You don't have to design it — you invoke it by asking for a workflow and stating what 'correct' means.
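One way to picture the per-part check and the diff report is as a small data structure. The sketch below is illustrative only: `PartResult` and `diff_report` are hypothetical names, the sample parts are made up, and in a real run the subagents produce these findings, not a hand-written list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartResult:
    part: str           # component or plan step that was checked
    spec: str           # what the spec asked for
    implemented: str    # what actually shipped
    check_passed: bool  # result of that part's own verification


def diff_report(results: List[PartResult]) -> List[str]:
    """Return one line per part that failed its check or differs from spec."""
    lines = []
    for r in results:
        if not r.check_passed:
            lines.append(f"FAILED {r.part}: check did not pass")
        elif r.implemented != r.spec:
            lines.append(f"DIFFERS {r.part}: spec '{r.spec}', got '{r.implemented}'")
    return lines


results = [
    PartResult("login", "returns 401 on bad password", "returns 401 on bad password", True),
    PartResult("logout", "clears session cookie", "clears session cookie and cache", True),
    PartResult("signup", "sends verification email", "sends verification email", False),
]

report = diff_report(results)
for line in report:
    print(line)
```

Parts that match the spec and pass their check produce no output, so the report contains only the deltas you need to read.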
The pattern
The copy-paste template: spec → goal → workflow → report
- Write the spec first. Even three bullet points. The diff report is only as useful as the spec it's diffing against — vague spec, useless report.
- Set the goal: `/goal implement [SPEC or @spec.md] fully: every acceptance criterion holds and the test suite exits 0, without modifying [files/areas that must stay untouched], or stop after [N] turns`
- Then, in the run, ask for the workflow: 'Use a workflow to verify each part of the plan against the spec, and prepare a report of what was implemented and whether anything differed from the spec; flag every deviation with the file and the reason.'
- Read the report, not the code. Your job is now the deltas: what differed, and whether each deviation is acceptable. Approve, or feed a correction back as a new turn — the goal is still active, so it keeps working.
- For unattended runs, pair it with auto mode (approves tool calls within a turn) so each goal turn runs without per-tool prompts. /goal also works non-interactively: `claude -p "/goal …"` runs the whole loop in one invocation.
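If you reuse the template often, a small helper can fill in the slots. This is a minimal sketch in Python, assuming a hypothetical `build_goal` helper and made-up example paths; the 4,000-character limit is the one stated in Part one, and everything else is illustrative.

```python
from typing import List

MAX_CONDITION_CHARS = 4000  # condition length limit stated earlier in this guide


def build_goal(spec: str, protected: List[str], max_turns: int) -> str:
    """Fill the goal template: spec reference, untouched areas, and a turn bound."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    condition = (
        f"implement {spec} fully: every acceptance criterion holds "
        "and the test suite exits 0"
    )
    if protected:
        condition += ", without modifying " + ", ".join(protected)
    condition += f", or stop after {max_turns} turns"
    if len(condition) > MAX_CONDITION_CHARS:
        raise ValueError(
            f"condition is {len(condition)} characters; limit is {MAX_CONDITION_CHARS}"
        )
    return "/goal " + condition


goal = build_goal("@spec.md", ["test/", "package.json"], 20)
print(goal)
```

Making the turn bound a required argument means a goal built this way always carries a stop clause.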
Where to point it
Three jobs that fit this shape
- The overnight refactor. Goal: 'migrate every call site off the old API until the project compiles and all tests pass, touching no test files.' Workflow: verify each migrated module compiles and its tests pass, report any call site it couldn't migrate and why. You read the exceptions list in the morning, not the diff.
- The spec build. Goal: 'implement @design-doc.md until all acceptance criteria hold.' Workflow: verify each criterion independently, report which are met and where the implementation diverged from the doc. The diff is your acceptance review.
- Batch QA. Goal: 'work through the labelled issue backlog until the queue is empty.' Workflow: for each item, verify the fix against the issue's stated repro and report any it closed without a verifiable check. You audit the unverifiable ones.
Don't get burned
Failure modes: where this quietly breaks
- Vague done-criteria. 'Make the code better' has no measurable end state, so the evaluator can never confidently say yes — the loop runs until your turn-cap, burning tokens (and at $10/$50 per million on Fable 5, that adds up). Always give a test result, exit code, count, or empty-queue condition.
- No verifiable success check. The /goal evaluator only judges what's in the transcript — it doesn't run commands or open files. If your condition is 'the feature works' but Claude never runs the feature, the evidence isn't there to evaluate. State how Claude should prove it ('npm test exits 0', 'git status is clean').
- Verifying the whole instead of each part. A single end-of-run 'does it work?' check hides the one component that drifted. Ask the workflow to verify each part and diff each against spec — the deviation you care about is usually local.
- A spec the report can't diff against. If the spec lives only in your head, 'what differed from the spec' has nothing to compare to. Write it down, even briefly, and reference it (`@spec.md`) so the diff is real.
- No turn or time bound. A goal with no `or stop after N turns` clause and a condition that can't converge will keep looping. Bound every long run.
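Some of these failure modes can be caught before the run starts. The sketch below is a rough pre-flight check, not part of Claude Code: `lint_condition` is a hypothetical helper, and its keyword list is an assumption that will miss plenty of valid conditions and accept some vague ones.

```python
import re
from typing import List

# Heuristic hints that a condition names something measurable (assumed list).
MEASURABLE_HINTS = ("exits 0", "pass", "is clean", "is empty", "compiles")
TURN_BOUND = re.compile(r"stop after \d+ turns", re.IGNORECASE)


def lint_condition(condition: str) -> List[str]:
    """Return warnings for a /goal condition; an empty list means none were found."""
    warnings = []
    if len(condition) > 4000:
        warnings.append("condition exceeds 4,000 characters")
    if not TURN_BOUND.search(condition):
        warnings.append("no 'stop after N turns' bound")
    lowered = condition.lower()
    if not any(hint in lowered for hint in MEASURABLE_HINTS):
        warnings.append("no measurable end state found")
    return warnings


vague = lint_condition("make the code better")
bounded = lint_condition("npm test exits 0 and git status is clean, or stop after 20 turns")
print(vague)
print(bounded)
```

A check like this cannot tell whether Claude will actually surface the evidence in the transcript; it only flags conditions that are missing the obvious ingredients.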
Frequently asked questions
What's the exact difference between /goal and a workflow?
/goal (Claude Code v2.1.139+) sets a completion condition and keeps Claude working across turns until a separate fast model (Haiku by default) confirms the condition holds. That is the 'loop until done' part. A dynamic workflow is Claude writing an orchestration script that fans work across parallel subagents and checks the results before they reach you. That is the 'verify and report' part. The two-part prompt sets a goal (definition-of-done) and asks for a workflow (per-part verification + diff against spec).
Are /goal and workflows specific to Fable 5, or do they work on other models?
/goal is a command and dynamic workflows are an orchestration capability, both available in recent Claude Code. They shine on Fable 5 because it's built for long-horizon agentic runs (minutes to hours, self-verifying), which is exactly what a looping goal plus a verifying workflow exercises. You can use the pattern with other capable models; the payoff scales with how long and autonomously the model can run.
How do I write a /goal condition the evaluator can actually judge?
Is Claude Fable 5 free, and for how long?
Can I run this unattended, like the overnight job in the video?
Pair /goal with auto mode so tool calls are approved within each turn; then every goal turn runs without per-tool prompts. /goal also works in non-interactive mode: `claude -p "/goal CHANGELOG.md has an entry for every PR merged this week"` runs the loop to completion in a single invocation (Ctrl+C to stop early). A goal still active when a session ends is restored on --resume/--continue, though the turn count and token baseline reset. That's the 'leave it running on the Mac Mini overnight' workflow, made reproducible.
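For scheduled runs, the non-interactive form can be wrapped in a script. The sketch below assumes the `claude` CLI is on your PATH and uses only the `-p` form quoted above; `build_command` and `run_goal` are hypothetical helper names, and the timeout is an added safety net on the Python side, not a Claude Code feature.

```python
import subprocess
from typing import List


def build_command(condition: str) -> List[str]:
    """Build the argv for a non-interactive goal run: claude -p "/goal <condition>"."""
    if not condition.strip():
        raise ValueError("condition must not be empty")
    return ["claude", "-p", f"/goal {condition.strip()}"]


def run_goal(condition: str, timeout_seconds: int = 8 * 60 * 60) -> int:
    """Run the goal loop and return the process exit code.

    Raises subprocess.TimeoutExpired if the run outlives the timeout.
    """
    completed = subprocess.run(build_command(condition), timeout=timeout_seconds)
    return completed.returncode


command = build_command("CHANGELOG.md has an entry for every PR merged this week")
print(command)
```

Passing the arguments as a list avoids shell quoting problems when the condition contains quotes or other special characters.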