Fast waste, the dominant failure of AI-era software delivery, is not a speed problem. It is a direction problem. An agent sprinting confidently in the wrong direction produces waste exactly as fast as it would produce working code in the right one. The sprint is the same; only the output changes. Part 1 named this failure and introduced the method that addresses it: AI-Driven Development (ADD), eight steps, looping. This part goes deep on the first two, Specify and Scenarios: the steps that pin down the what before any machine-led work begins.
Both steps are human-led. The AI drafts; you decide. Together they produce the artifacts that make the agent’s job bounded — not just described.
Specify: naming what the feature must and must not do
The specification is not a description of intent. It is a set of rules written in the form the next three steps — Scenarios, Contract, Tests — can be derived from directly. A rule that cannot be turned into a scenario and a test is not precise enough to build from.
A complete specification has four parts:
- Must — the behaviors the feature is required to perform.
- Reject — the inputs or situations it must refuse, each paired with a named error code.
- After — the state that is true once it succeeds.
- Assumptions, lowest-confidence first: the things taken for granted, ranked so the most-likely-wrong comes first, with a flag on that one explaining why it is uncertain and what it costs if wrong.
The ordering of these parts is deliberate. Must and Reject together define the behavioral surface — the complete answer to “what does this do and what does it refuse?” After anchors the success condition as a concrete state change, not a feeling. Assumptions, ranked and flagged, direct a reviewer’s attention where it pays, not where it is easy.
Here is a representative specification for a money-transfer feature:
Feature: Transfer money between my own accounts
Framings weighed: synchronous single-currency transfer (chosen) · queued transfer · multi-currency with FX
Must:
  - move an amount from one of my accounts to another of mine
  - amount > 0
  - source and destination are different accounts
  - source has enough balance
After:
  - source balance -= amount, destination balance += amount
Reject:
  - amount <= 0 -> "amount_invalid"
  - source == destination -> "same_account"
  - balance < amount -> "insufficient_funds"
  - account not mine -> "forbidden"
Assumptions, lowest-confidence first:
  ⚠ same currency only (no FX) in v1. Lowest confidence because the ticket never said. If wrong: the amount/rounding model changes and this contract is wrong
  - [x] no daily limit in v1. Confirmed: out of scope for v1

Several things earn their place in this format and are worth naming explicitly.
The Framings weighed line records what was considered and dropped. The single-currency choice is not a default; it is a decision. Recording what was weighed prevents the alternative from being re-litigated later, once the agent is already building.
Named error codes, not prose. “Reject bad amounts” is an instruction to guess. amount <= 0 -> "amount_invalid" is a rule that produces a testable scenario, a defined contract response, and — when production emits it — an alert you can write a monitor for. The code is the unit; prose is commentary on it.
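To show what "the code is the unit" can look like in an implementation, here is a minimal sketch that gives every Reject rule exactly one named code. `TransferError` and `check_transfer` are hypothetical names. Integer amounts in minor units are also an assumption. So is checking ownership first: the spec lists the Reject rules but does not say in which order they are evaluated.

```python
from enum import Enum
from typing import Dict, Optional, Set


class TransferError(str, Enum):
    # One member per Reject rule; the value is the code production emits.
    AMOUNT_INVALID = "amount_invalid"
    SAME_ACCOUNT = "same_account"
    INSUFFICIENT_FUNDS = "insufficient_funds"
    FORBIDDEN = "forbidden"


def check_transfer(amount: int, source: str, destination: str,
                   balances: Dict[str, int], mine: Set[str]) -> Optional[TransferError]:
    """Return the named error for the first Reject rule that applies, else None.

    Evaluating ownership first is an assumed order, not one the spec states.
    """
    if source not in mine or destination not in mine:
        return TransferError.FORBIDDEN
    if amount <= 0:
        return TransferError.AMOUNT_INVALID
    if source == destination:
        return TransferError.SAME_ACCOUNT
    if balances[source] < amount:
        return TransferError.INSUFFICIENT_FUNDS
    return None


# A is mine with 20, B is mine with 0: moving 50 hits the balance rule.
print(check_transfer(50, "A", "B", {"A": 20, "B": 0}, {"A", "B"}).value)  # insufficient_funds
```

Because `TransferError` subclasses `str`, each member compares equal to its code, so a monitor or a test can match on the plain string. Note that the evaluation order is itself an ambiguity. A transfer of 0 from someone else's account could be "forbidden" or "amount_invalid", and only a scenario can settle which.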
The ⚠ flag. The AI ranks its own uncertainty and leads with what it is most likely wrong about, not a flat list of equal-looking assumptions. The product owner reads the flagged line first — the assumption most likely to be wrong and most expensive if it is — and confirms or corrects it in one sentence. In ai-proxy, this catch happened at the argon2 key-hashing decision: argon2 was the right choice for key storage, but the spec flag surfaced that it would add 50–200 ms to the hot authentication path. Caught at spec time, addressed at zero cost, because no code existed yet.
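Once each assumption carries an explicit confidence, ranking lowest-confidence first and choosing which line gets the ⚠ flag is mechanical. The sketch below holds the four-part spec in a small data structure. The class names and the 0.0 to 1.0 confidence scale are hypothetical, illustration only, and not part of the method as described.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Assumption:
    text: str
    confidence: float   # hypothetical scale: 0.0 = pure guess, 1.0 = confirmed
    if_wrong: str = ""  # what it costs if this assumption turns out wrong


@dataclass
class Spec:
    feature: str
    must: List[str]
    reject: Dict[str, str]  # rejected situation -> named error code
    after: List[str]
    assumptions: List[Assumption] = field(default_factory=list)

    def ranked_assumptions(self) -> List[Assumption]:
        # Lowest confidence first, so the reviewer reads the riskiest line first.
        return sorted(self.assumptions, key=lambda a: a.confidence)

    def flagged(self) -> Optional[Assumption]:
        # The single assumption that gets the warning flag, if there is one.
        ranked = self.ranked_assumptions()
        return ranked[0] if ranked else None


spec = Spec(
    feature="Transfer money between my own accounts",
    must=[
        "move an amount from one of my accounts to another of mine",
        "amount > 0",
        "source and destination are different accounts",
        "source has enough balance",
    ],
    reject={
        "amount <= 0": "amount_invalid",
        "source == destination": "same_account",
        "balance < amount": "insufficient_funds",
        "account not mine": "forbidden",
    },
    after=["source balance -= amount, destination balance += amount"],
    assumptions=[
        Assumption("no daily limit in v1", confidence=1.0),
        Assumption("same currency only (no FX) in v1", confidence=0.3,
                   if_wrong="the amount/rounding model changes"),
    ],
)

print(spec.flagged().text)  # same currency only (no FX) in v1
```

The assumptions are deliberately listed in the "wrong" order; sorting puts the FX assumption first regardless, which is the property the product owner relies on when reading the flagged line before anything else.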
If you cannot write the spec, you do not yet understand the feature well enough to build it. The inability to specify is information — not an obstacle to push past.
This property is not incidental. A feature that resists precise statement is usually a feature whose stakeholders disagree about what it means. The resistance surfaces that disagreement before the agent builds one interpretation while the stakeholder expected another.
Co-specification: how the spec gets made
The spec is not dictated by one side. The process is collaborative and has three moves. First, the agent surfaces the decision space: the genuine framings, and the open questions it would otherwise resolve by guessing. You react and redirect. Second, it drafts, ranking its uncertainty lowest-confidence first. Third, you validate, reading the flag before anything else. The brainstorm leaves a light trace: what was chosen becomes a rule; what was weighed and dropped becomes the Framings weighed note; what stayed uncertain becomes the flag. There is nothing new to maintain.
The agent’s instinct is to fill gaps silently and present a confident wall. The method forces those gaps into the open. The defining instruction is plain: if a requirement is unclear, ask — do not resolve it by guessing — and of the things you must assume, say where your confidence is lowest.
Scenarios: making “correct” concrete
Rules are still open to interpretation. “Source must have enough balance” leaves open: enough for what, exactly? What happens to the balances when it is not enough? A scenario removes the interpretation by pinning a specific situation to a specific expected result.
Scenarios occupy a unique position in the method: they are readable by people and checkable by machines at the same time. A product owner can confirm a scenario is what they meant; a test can be generated directly from it. This makes scenarios the bridge between the human-led and machine-led halves of the flow. They are arguably the highest-leverage artifact in the method, because everything downstream derives from them: the tests, and through the tests, the build's definition of success.
The form is Given/When/Then, plus an And clause for what must remain unchanged. Success scenarios can omit it; rejection scenarios cannot:
Scenario: successful transfer
  Given A has 100 and B has 0, both mine
  When I transfer 30 from A to B
  Then A has 70 and B has 30

Scenario: insufficient funds
  Given A has 20 and B has 0, both mine
  When I transfer 50 from A to B
  Then it is rejected "insufficient_funds"
  And no balance changes

Scenario: not my account
  Given account C is not mine
  When I transfer 10 from C to B
  Then it is rejected "forbidden"
  And no balance changes

The And no balance changes clause is doing real work. It specifies that a rejected transfer must leave the world untouched, a property the agent could easily violate by deducting the amount before every check has passed. Every rejection scenario needs this clause; without it, a partial, corrupting failure can pass all tests.
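Because scenarios are checkable by machines, each one maps almost line for line onto a test. This minimal sketch runs the three transfer scenarios against a hypothetical in-memory `Ledger` with a hypothetical `Rejected` exception; neither comes from the source. The tests are plain pytest-style functions, and the And clause becomes a before/after snapshot comparison.

```python
from typing import Dict, Set


class Rejected(Exception):
    """Raised with the named error code from the spec's Reject list."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


class Ledger:
    """Hypothetical in-memory ledger, just enough to make the scenarios executable."""

    def __init__(self, balances: Dict[str, int], mine: Set[str]):
        self.balances = dict(balances)
        self.mine = set(mine)

    def transfer(self, amount: int, source: str, destination: str) -> None:
        # Every Reject rule is checked before any balance is touched,
        # so a rejection cannot leave a partially applied transfer behind.
        if source not in self.mine or destination not in self.mine:
            raise Rejected("forbidden")
        if amount <= 0:
            raise Rejected("amount_invalid")
        if source == destination:
            raise Rejected("same_account")
        if self.balances[source] < amount:
            raise Rejected("insufficient_funds")
        self.balances[source] -= amount
        self.balances[destination] += amount


def expect_rejection(ledger: Ledger, code: str, amount: int, source: str, destination: str) -> None:
    """Then it is rejected <code>, And no balance changes."""
    before = dict(ledger.balances)
    try:
        ledger.transfer(amount, source, destination)
    except Rejected as err:
        assert err.code == code, f"expected {code!r}, got {err.code!r}"
    else:
        raise AssertionError(f"expected rejection {code!r}, but the transfer succeeded")
    assert ledger.balances == before, "a rejected transfer changed balances"


def test_successful_transfer():
    ledger = Ledger({"A": 100, "B": 0}, mine={"A", "B"})  # Given A has 100 and B has 0, both mine
    ledger.transfer(30, "A", "B")                           # When I transfer 30 from A to B
    assert ledger.balances == {"A": 70, "B": 30}            # Then A has 70 and B has 30


def test_insufficient_funds():
    ledger = Ledger({"A": 20, "B": 0}, mine={"A", "B"})    # Given A has 20 and B has 0, both mine
    expect_rejection(ledger, "insufficient_funds", 50, "A", "B")


def test_not_my_account():
    ledger = Ledger({"A": 0, "B": 0, "C": 100}, mine={"A", "B"})  # Given account C is not mine
    expect_rejection(ledger, "forbidden", 10, "C", "B")
```

Suppose a ledger deducted the amount and only then raised "insufficient_funds". It would fail `test_insufficient_funds` on the final comparison, and that failure is the And clause doing its job.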
What scenarios surface
The move from spec to scenarios is where ambiguity that survived the spec — ambiguity small enough to hide in a rule — becomes visible. A rule can seem precise and still leave the agent room to choose: which HTTP status code carries “insufficient_funds”? Does a rejection roll back a partially applied state, or is the check atomic? Scenarios pin those choices by making the observable result concrete.
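To see how a scenario pins the HTTP status question, look at the sketch below. Every status in it is a hypothetical choice made for illustration, and the spec decides none of them. The point is only that once a scenario states the observable result, the mapping stops being the agent's call. `ERROR_STATUS` and `error_response` are hypothetical names.

```python
from http import HTTPStatus
from typing import Dict, Tuple

# Hypothetical mapping: illustrative choices a scenario would pin, not recommendations.
ERROR_STATUS: Dict[str, HTTPStatus] = {
    "amount_invalid": HTTPStatus.UNPROCESSABLE_ENTITY,  # 422
    "same_account": HTTPStatus.UNPROCESSABLE_ENTITY,    # 422
    "insufficient_funds": HTTPStatus.CONFLICT,          # 409
    "forbidden": HTTPStatus.FORBIDDEN,                  # 403
}


def error_response(code: str) -> Tuple[int, Dict[str, str]]:
    """Render a named error code as an (HTTP status, JSON body) pair."""
    return int(ERROR_STATUS[code]), {"error": code}


def test_insufficient_funds_is_pinned():
    # Then it is rejected "insufficient_funds", with one specific, observable status.
    assert error_response("insufficient_funds") == (409, {"error": "insufficient_funds"})


def test_every_reject_code_has_a_status():
    # Every Reject rule from the spec must resolve to some status.
    assert set(ERROR_STATUS) == {"amount_invalid", "same_account",
                                 "insufficient_funds", "forbidden"}
```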
In ai-proxy, the scenarios for the SigV4 authentication feature included a scenario asserting that the secret access key must never appear in the returned headers or in the repr of the credentials object. That constraint existed implicitly in the Must rules. Making it a standalone scenario — Given any credentials, When sign_request returns, Then the secret substring appears in neither the headers nor the repr — turned an implicit expectation into an explicit test target. The Build step had something checkable; the Verify step had something to adversarially probe.
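The secret-never-leaks scenario translates directly into a test. The sketch below is not SigV4. `sign_request` here is a deliberately simplified HMAC-SHA256 stand-in, and `Credentials`, the header names and the Authorization format are all hypothetical. They exist only so the scenario's property can be shown end to end. Real SigV4 builds a canonical request and derives a signing key through a chain of HMACs.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Credentials:
    access_key_id: str
    secret_access_key: str = field(repr=False)  # excluded from the generated repr


def sign_request(creds: Credentials, method: str, path: str, amz_date: str) -> Dict[str, str]:
    """Hypothetical, simplified signer: HMAC-SHA256 over a toy string-to-sign."""
    string_to_sign = f"{method}\n{path}\n{amz_date}"
    signature = hmac.new(creds.secret_access_key.encode(), string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    return {
        "X-Amz-Date": amz_date,
        "Authorization": f"HMAC-SHA256 Credential={creds.access_key_id}, Signature={signature}",
    }


def test_secret_never_leaks():
    for secret in ["s3cr3t/Key+Example", "another-SECRET_value_9"]:
        creds = Credentials("AKIDEXAMPLE", secret)                      # Given any credentials
        headers = sign_request(creds, "GET", "/", "20240101T000000Z")  # When sign_request returns
        rendered = " ".join(f"{k}: {v}" for k, v in headers.items())
        # Then the secret substring appears in neither the headers nor the repr
        assert secret not in rendered
        assert secret not in repr(creds)
```

The sample secrets contain characters that cannot occur in a hex digest, so a pass means something. A very short secret such as "ab" could match inside the signature by chance. A property-based testing library could widen "any credentials" to many generated values.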
The exit check
A scenario set is ready when every Must rule has at least one scenario, every Reject rule has at least one scenario, every result is a specific observable fact (not “it works”), and every rejection scenario asserts what must stay unchanged. A rule with no scenario is a rule that will never be tested — a rule in name only. Either write the missing scenario or remove the rule from the spec.
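The exit check is a set of structural conditions, so most of it can be automated. Below is a minimal sketch, assuming each scenario records the rule it covers, its observable result, and its And clause. `Scenario` and `exit_check` are hypothetical names. The "specific observable fact" test is only a crude heuristic, and a human still has to judge that condition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    name: str
    covers: str            # the spec rule this scenario exercises
    result: str            # the observable fact the Then clause asserts
    unchanged: str = ""    # the And clause; required on rejections
    rejection: bool = False


def exit_check(must: List[str], reject: List[str], scenarios: List[Scenario]) -> List[str]:
    """Return a list of problems; an empty list means the scenario set is ready."""
    problems = []
    covered = {s.covers for s in scenarios}
    for rule in must + reject:
        if rule not in covered:
            problems.append(f"no scenario for rule: {rule}")
    for s in scenarios:
        # Crude heuristic for "a specific observable fact, not 'it works'".
        if s.result.strip().lower() in {"", "it works"}:
            problems.append(f"{s.name}: result is not a specific observable fact")
        if s.rejection and not s.unchanged:
            problems.append(f"{s.name}: rejection does not assert what stays unchanged")
    return problems


must = ["move an amount from one of my accounts to another of mine"]
reject = ["balance < amount -> insufficient_funds", "account not mine -> forbidden"]
scenarios = [
    Scenario("successful transfer", must[0], "A has 70 and B has 30"),
    Scenario("insufficient funds", reject[0], 'rejected "insufficient_funds"',
             unchanged="no balance changes", rejection=True),
]
print(exit_check(must, reject, scenarios))
# ['no scenario for rule: account not mine -> forbidden']
```

The reported gap is exactly the "rule in name only" case: either add the missing scenario or drop the rule.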
Why Specify and Scenarios kill fast waste
The agent’s failure mode — sprinting confidently in the wrong direction — depends on ambiguity. Every ambiguity is a gap the agent fills with a plausible guess. The guess looks finished. It survives a quick read. It surfaces in production.
An ambiguity that is not in the spec is not a gap in the agent’s knowledge — it is a decision the agent makes for you, silently, at build time.
Specify eliminates ambiguity in the rules. Scenarios eliminate ambiguity in what the rules mean in concrete situations. Together they hand the agent a domain it cannot expand: the Must list bounds what it must build; the Reject list bounds what it must refuse; the After-state bounds what success looks like; and the scenarios make every boundary checkable.
This is why both steps come before the Contract, before the Tests, before a single line of code. The cheapest moment to remove an ambiguity is in a sentence, before anything depends on it. A correction at Specify costs one conversation turn; the same correction at Build costs a re-generation and a re-test; at production, it costs an incident.
| | Vague prompt | Specified + Scenarios |
|---|---|---|
| What the agent builds against | Its own interpretation of the intent | An explicit Must/Reject/After rule set |
| How rejections are handled | Agent chooses error format, message, status code | Named error codes from the spec, bound in scenarios |
| Edge cases | Discovered in production when the agent guessed wrong | Surfaced at Specify, pinned in scenarios before any code |
| The After-state | Implicit — agent infers what success looks like | Explicit — stated as a concrete state change |
| What a test can assert | That the agent produced *something* | That the observable result matches a specific scenario |
| Where direction failures are caught | At production (expensive) | At Specify (free) |
The contrast in the last row is the one that matters. Fast waste is a direction failure caught late. Specify and Scenarios are where direction gets set — and where the failure gets caught early, when catching it is still free.
The artifact is the asset
The spec and the scenarios are not scaffolding. They do not disappear when the code is shipped. They are the durable record of what was decided and why — why this rejects “forbidden” rather than “not_found,” what a successful transfer guarantees about balances, what was weighed and dropped at the framing stage. In ADD, the code is disposable. The spec and scenarios are what you protect.
Specify and Scenarios are still editable
Both steps produce artifacts, not verdicts. The spec can be corrected. A scenario can be added. If the Build step exposes a missing rule — a case the spec did not anticipate — the flow sends you back to Specify to add the rule and back to Scenarios to add its scenario. This is the method working, not failing. Backward correction is always allowed; forward-skipping is what is forbidden.
What locks the spec and scenarios in place is not the step itself but the step that follows: the Contract freeze. Once the contract is frozen — the external interface, the data shapes, the error codes, locked and checksummed — the spec and scenarios it was derived from are effectively fixed as well. Changes after the freeze are not corrections; they are change requests that return the whole bundle to Specify.
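As a hedged illustration of what "checksummed" can mean, here is a minimal sketch. It is not ADD's actual mechanism. It assumes the contract lives in a single text artifact and that its SHA-256 digest is recorded at freeze time. The function names are hypothetical. Hashing the full text means that any edit, a comment included, changes the digest.

```python
import hashlib


def contract_checksum(contract_text: str) -> str:
    """SHA-256 over the contract text, with CRLF normalized to LF."""
    normalized = contract_text.replace("\r\n", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_freeze(contract_text: str, frozen_checksum: str) -> None:
    """Raise if the contract differs in any way from the frozen version."""
    actual = contract_checksum(contract_text)
    if actual != frozen_checksum:
        raise RuntimeError(
            f"contract changed after freeze ({actual[:12]} != {frozen_checksum[:12]}); "
            "route it as a change request back to Specify"
        )


contract = "POST /transfers\nerrors: amount_invalid, same_account, insufficient_funds, forbidden\n"
frozen = contract_checksum(contract)  # recorded once, at freeze time
check_freeze(contract, frozen)        # unchanged contract: passes silently
```

Normalizing line endings is a design choice. It keeps a checkout on a different operating system from tripping the check falsely, while any change to content or comments still does.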
That gate — the Contract freeze — is where Part 4 picks up.
Next in the series: The Frozen Contract — the one human gate in the default flow, why freezing the interface before building is what gives the agent real autonomy, and how ADD enforces the freeze mechanically so even a pseudocode comment change trips the alarm.