
The Frozen Contract: ADD's One Human Gate

The frozen contract is ADD's one human gate — the interface, data shapes, and error codes, locked and checksummed before the agent builds. Why it is the single approval that earns the agent its autonomy.

Tin Dang

Part 1 established the central tension of AI-era software delivery: trust-by-inspection breaks down. AI code is often plausible and wrong, and reading the diff and finding it reasonable is not the same as knowing it is correct. Yet ADD does not eliminate human judgment — it concentrates it. The entire method has exactly one human gate in its default flow. Understanding why it lives where it does, and what it makes possible downstream, is what this part is about.

The gate is the frozen contract: the interface, data shapes, and error codes, locked and checksummed before the agent writes a single line of implementation. It sits at Step 3, between the human-led specification work and the machine-led build. And that placement is not an accident. It is the method’s key structural insight: the one moment when inspection is still cheap and the decision is still decisive is the moment before any code exists.

What a contract freezes

The spec (Step 1) and scenarios (Step 2) define the what in prose and examples: what the feature must do, what it must reject, and what the world looks like after it succeeds. They are human-readable, deliberately expressive, and necessarily incomplete as a machine-checkable artifact. A scenario says “the transfer must fail with an error”; it does not say which field carries the error code or what its exact string value is.

The contract makes those things explicit and locks them. Four things go in:

  • Interfaces — the endpoints, functions, or messages, with their exact inputs and outputs.
  • Data structures — the request and response shapes, and the persistent schema.
  • Names — drawn from the project glossary, so one concept has one name everywhere: in the spec, the contract, the tests, and the code.
  • Error cases — the defined failures, each using the error code from the spec’s Reject rules.

Here is what that looks like in concrete form, using the transfer example from the ADD source material:

contracts/transfers.md
POST /transfers body: { fromAccountId, toAccountId, amount }
200 -> { transferId, fromBalance, toBalance }
400 -> { error: "amount_invalid" | "same_account" | "insufficient_funds" }
403 -> { error: "forbidden" }
Schema: accounts.balance (read + write, must be transactional)
Status: FROZEN @ v1

Every error code traces back to a rejection rule in the spec. The schema note — must be transactional — flags the one place where correctness depends on more than shape, a hint the verification step will follow up on. The contract is minimal on purpose: it says nothing about how fromBalance is calculated, nothing about what database query runs, nothing about retry logic. Implementation is deliberately out of scope. Only the boundary is fixed.
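To make "only the boundary is fixed" concrete, here is a minimal sketch of how a test helper might check a response against the transfers contract above. It is an illustration under assumptions, not part of ADD or ai-proxy: the names ERRORS_BY_STATUS, SUCCESS_KEYS, and conforms are hypothetical, and the check looks only at status codes, key sets, and error strings because those are all the contract specifies.

```python
# Sketch only: these names are illustrative, not from the ADD source material.
# Status code -> error codes the frozen contract allows for that status.
ERRORS_BY_STATUS = {
    400: {"amount_invalid", "same_account", "insufficient_funds"},
    403: {"forbidden"},
}
SUCCESS_KEYS = {"transferId", "fromBalance", "toBalance"}


def conforms(status: int, body: dict) -> bool:
    """Return True if (status, body) matches the frozen transfers contract."""
    if status == 200:
        # Exact key set: no extras, no omissions.
        return set(body) == SUCCESS_KEYS
    allowed = ERRORS_BY_STATUS.get(status)
    if allowed is None:
        return False  # this status code is not in the contract
    error = body.get("error")
    return set(body) == {"error"} and isinstance(error, str) and error in allowed
```

Nothing in the helper says how a balance is computed or which query runs. It can stay unchanged across any number of regenerations of the implementation, which is the point of keeping the contract minimal.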

This is the distinction that matters. The spec and scenarios said what; the contract makes the what machine-checkable and locked. Before freezing, the shape is negotiable. After freezing, it is the floor.

Why freeze — and why checksum

A frozen contract is only useful if “frozen” means something. The word alone is aspirational; the discipline is mechanical.

In ai-proxy, the gateway project described later in this part, the freeze is enforced by an md5 tripwire over the frozen body. Any edit to the contract, even a pseudocode comment changed after the snapshot, trips the alarm. This is not paranoia; it is a recognition of how AI agents behave under pressure. An agent told to make a failing test pass will, if left unchecked, take the path of least resistance, and that path is sometimes to quietly adjust the interface rather than write the implementation that satisfies it. The checksum closes that path: the agent can regenerate every line of code inside the contract boundary, but it cannot touch the boundary itself without tripping the alarm.
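The tripwire idea can be sketched in a few lines of Python. The article says only that an md5 checksum covers the frozen body; how ai-proxy records the digest or runs the check is not described, so the function names and the sample contract text below are assumptions made for illustration.

```python
import hashlib


def checksum(frozen_body: str) -> str:
    """MD5 hex digest of the contract text, encoded as UTF-8."""
    return hashlib.md5(frozen_body.encode("utf-8")).hexdigest()


def contract_intact(current_body: str, recorded_digest: str) -> bool:
    """True only if the text is byte-for-byte what was frozen."""
    return checksum(current_body) == recorded_digest


# Hypothetical frozen text; the digest is recorded once, at the freeze.
frozen = (
    "POST /transfers body: { fromAccountId, toAccountId, amount }\n"
    "Status: FROZEN @ v1\n"
)
digest = checksum(frozen)

# Any later edit, however small, changes the digest.
edited = frozen.replace("amount", "amountCents")
print(contract_intact(frozen, digest))  # True
print(contract_intact(edited, digest))  # False
```

MD5 serves here as a cheap change detector, not as a security control. The check catches accidental or opportunistic edits; it is not meant to resist a deliberate attacker.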

The value of this property compounds. When a downstream component integrates against the contract, it does not need to track the implementation — it only needs to know the contract is frozen and the checksum still holds. When a test suite asserts the response shape, it asserts against something stable, not a moving target. When you want to throw away the entire implementation and regenerate it — which is exactly what ADD’s “internals are disposable” principle invites you to do — you can, because nothing depends on any internal detail. Only the boundary matters, and the boundary has not moved.

A frozen, checksummed contract is the precondition for granting the agent real autonomy in the build step. Without it, every regeneration risks silently changing an interface that another part of the system depends on.

This is the core asymmetry the method exploits. The what (the boundary) must be stable so the agent can be free with the how. Stability and freedom reinforce each other rather than trade off: the more firmly you fix the contract, the more freely the agent can move inside it. Loose contracts produce cautious builds; tight contracts produce confident ones.

The one gate that earns autonomy

In ADD’s default flow, the agent drafts the spec, scenarios, contract, and failing tests as a single specification bundle. A person gives one approval at the contract freeze. That is the one human gate of the bundle — not the third of three sign-offs. Reject any part of the bundle and the whole package returns to draft; that is backward correction, not failure. Approve it and the agent is released to build.

This placement is deliberate. Before the contract exists, there is no code to read, no implementation to second-guess, no behavioral risk yet materialized. The question is simple: does this interface describe the right thing? A person can answer that question for a one-page contract in minutes. The same person could not answer it as reliably, or as quickly, for 500 lines of implementation. Inspection is cheap here; it becomes expensive the moment implementation begins.

The one human gate earns the agent its autonomy because it bounds everything that follows. Every piece of code written in Build, every assertion in Tests, every integration in the broader system — all of it lives inside the contract’s shape. Approve the shape and you approve the scope of what the agent can produce. You are not reviewing the code; you are reviewing the ceiling.

This is the resolution to the paradox Part 1 named. Trust-by-inspection breaks down for code — but inspection still works for a one-page contract written before any code exists. ADD does not eliminate inspection. It relocates it to the last moment when it is still decisive and still cheap.

There is a corollary: once the contract is frozen and the build is running, additional inspection of intermediate diffs is not just unnecessary — it is a distraction from the verification that actually matters. The correct verification is adversarial (did the tests really earn green, or did the agent cheat?), not readerly (does this code look reasonable?). Freezing the contract is what makes that shift possible.

The ai-proxy grounding

ai-proxy — the 23-milestone multi-tenant AI gateway built end to end with ADD in six days — provides a concrete example of what the frozen contract prevented.

The gateway’s authentication layer included a SigV4 signer for AWS Bedrock. The contract for the signing function was frozen at:

contracts/bedrock-sigv4-auth.md
sign_request(*, method, url, body, service, region, credentials, timestamp)
-> { "x-amz-date", "x-amz-content-sha256", "Authorization": "AWS4-HMAC-SHA256 ..." }
# PURE · TOTAL · DETERMINISTIC (timestamp injected, no IO, no globals)
AwsCredentials(access_key_id, secret_access_key) # secret_access_key: repr=False
Status: FROZEN @ v1

Three properties are locked here that an unfrozen implementation might have drifted on. First, the function is pure and deterministic — no IO, no globals, timestamp injected as an argument. This is not how a developer writing from scratch would necessarily approach it; the natural path is to call datetime.now() inside the function. That would have made the function non-deterministic and untestable against known-answer vectors. The contract locked the injected-timestamp design before a line of code existed. Second, secret_access_key carries repr=False — the secret may not appear in string representations, logs, or tracebacks. That constraint is trivial to express in a contract field annotation and nearly invisible in implementation review. Third, the return shape is the exact set of headers — no extras, no omissions. Every downstream caller assembling the HTTP request depends on that shape being stable.

The md5 tripwire over that frozen body meant that the agent, across multiple regeneration cycles during the Build step, could not quietly normalize the function to accept datetime.now() internally, could not drop the repr=False annotation to simplify the credentials class, and could not add an extra header “for completeness.” The boundary held. The internals changed freely; the surface did not move.
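Two of the locked properties are easy to show in isolation. The sketch below is deliberately not a SigV4 signer: it omits the Authorization header entirely, and the helper name date_and_payload_headers is hypothetical. It only illustrates what repr=False does on a credentials dataclass and why injecting the timestamp makes the output reproducible.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AwsCredentials:
    access_key_id: str
    # repr=False keeps the secret out of repr(), so it does not leak into
    # logs or tracebacks that print the object.
    secret_access_key: str = field(repr=False)


def date_and_payload_headers(*, body: bytes, timestamp: datetime) -> dict[str, str]:
    """Hypothetical helper: the two headers that need no secret.

    Pure and deterministic: the timestamp is an argument, so the same
    inputs always give the same headers. Expects a timezone-aware datetime.
    """
    utc = timestamp.astimezone(timezone.utc)
    return {
        "x-amz-date": utc.strftime("%Y%m%dT%H%M%SZ"),
        "x-amz-content-sha256": hashlib.sha256(body).hexdigest(),
    }


creds = AwsCredentials("AKIDEXAMPLE", "not-a-real-secret")
print(creds)  # AwsCredentials(access_key_id='AKIDEXAMPLE')

ts = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(date_and_payload_headers(body=b"", timestamp=ts)["x-amz-date"])  # 20240102T030405Z
```

Had the helper called datetime.now() internally, the last line could not carry a fixed expected value, and a known-answer test would be impossible to write.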

| Aspect | Unfrozen contract | Frozen contract |
|---|---|---|
| Interface names | May drift between spec, code, and caller | Locked to the project glossary: one name everywhere |
| Error codes | Invented per-implementation | Traced back to spec rejection rules: every failure has a contracted response |
| Agent's freedom in Build | Bounded by whatever the agent infers from prose | Wide open inside the locked boundary: internals are disposable |
| Test assertions | Assert against a shape that can still move | Assert against a stable floor: tests cannot become vacuous by drift |
| Regeneration safety | Every regeneration may silently change an interface | Contract is checksummed: any drift trips an alarm |
| Integration cost | Callers must track implementation changes | Callers depend on the frozen boundary only |

What the frozen contract buys downstream

The cascade flows in one direction: from the frozen boundary outward.

Regenerable internals. Because no caller depends on an internal detail — only on the boundary — the implementation is genuinely disposable. Throw it away and regenerate; the contract is still there, the tests are still red-for-the-right-reason, the build runs again. This is what separates ADD’s “internals are disposable” claim from wishful thinking: it is only true when the boundary is actually stable.

Stable tests. Tests written against a frozen contract assert something real. A test that pins the response to { error: "insufficient_funds" } is not fragile — the contract guarantees that string will not change without a versioned change request. The frozen contract is what makes the failing-tests-first discipline trustworthy: a red test means the behavior is missing, not that the interface changed under it.

Bounded autonomy. The build instruction in Step 5 is: “make every test pass; do not change the tests or the contract; stop and ask if any requirement is unclear — do not guess.” That instruction has teeth only because the contract is already checksummed. Without the freeze, “do not change the contract” is an aspiration; with it, the tripwire enforces it mechanically.

One approval scales. Across ai-proxy’s 23 milestones, each contract received one human review at the freeze point. The review was fast because it examined a page of interface descriptions rather than an implementation. Later verification did not need to re-examine the shape, only whether the implementation satisfied it. One approval, made when inspection was still cheap, bounded everything the agent was permitted to build.

Common mistakes

Three failure modes illustrate precisely what freezing is protecting against.

Inconsistent names. If the contract calls a field fromAccountId and the schema calls it src_acct, the agent produces mismatches at every seam. Contract names must come from the glossary — one name per concept, everywhere. This is where the DDD competency earns its place: the vocabulary the contract uses is the vocabulary everything else must use.
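A naming check of this kind is simple to automate. The sketch below assumes the glossary is available as a plain set of names; the glossary contents and the function name are hypothetical, since the article does not show the project's actual glossary.

```python
# Hypothetical glossary: the terms are taken from the transfers contract.
GLOSSARY = {
    "fromAccountId", "toAccountId", "amount",
    "transferId", "fromBalance", "toBalance", "error",
}


def names_outside_glossary(names, glossary=GLOSSARY):
    """Return, sorted, every name that the glossary does not define."""
    return sorted(set(names) - set(glossary))


# A schema that renamed a field would be caught before the freeze.
print(names_outside_glossary(["fromAccountId", "src_acct", "amount"]))  # ['src_acct']
```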

Undefined errors. Every rejection rule in the spec must have a contracted response shape. Without it, callers cannot handle failures gracefully and the agent has no stable target — it will invent the error shape, and the invention will differ between regenerations.
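The same kind of pre-freeze check applies to errors. This sketch assumes the spec's Reject rules and the contract's error codes are both available as plain data; the function name and the extra rule account_frozen are hypothetical and appear only to show a gap being reported.

```python
# Error codes per status, as listed in the transfers contract.
CONTRACT_ERRORS = {
    400: {"amount_invalid", "same_account", "insufficient_funds"},
    403: {"forbidden"},
}


def uncontracted_rules(reject_rules, contract_errors):
    """Return, sorted, the reject rules with no contracted error response."""
    contracted = set().union(*contract_errors.values())
    return sorted(set(reject_rules) - contracted)


# "account_frozen" is an invented rule that the contract does not cover.
spec_rules = {"amount_invalid", "same_account", "insufficient_funds",
              "forbidden", "account_frozen"}
print(uncontracted_rules(spec_rules, CONTRACT_ERRORS))  # ['account_frozen']
```

An empty result means every rejection rule has a contracted response, so the agent has nothing left to invent.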

Freezing too late. The contract must be frozen before code is written against it. Freezing after implementation means the contract describes what was built rather than what was agreed — and the two may differ. The protection only exists when the freeze precedes the build.

With the contract frozen, the tests can be red

The rest of the method flows from the frozen boundary. Step 4 writes the failing suite — tests that pin the exact behaviors the contract promises, running against a mock that returns the contracted shapes. Those tests must fail, because no implementation exists yet. A test that passes before any code is written is testing nothing; it is a false reassurance that will later wave bad code through.

With the contract frozen, a failing test has a precise meaning: the behavior is missing. Not “the interface changed,” not “the error code was renamed,” not “the agent quietly added a field.” The boundary is stable; the only question is whether the code inside it satisfies the behavior the contract promises. That is the question Step 5 answers, under the constraint that neither the tests nor the contract may be touched.


Next in the series: Red Tests and the Build Loop — how the failing suite is written from the frozen contract, what “red for the right reason” actually means, and how the build instruction keeps the agent honest.
