
ADD for Engineering Leadership: Owning Direction and Verification

When agents and teams produce more than any leader can read, line-by-line review collapses. ADD reframes the job: the ADR and its fitness functions are the frozen contract, fitness checks are the red tests, and production evidence — not a clean diagram — is how direction is verified.

Tin Dang

An AI drafts a service mesh proposal in twelve minutes. The RFC is forty pages. The diagrams are clean. The tradeoffs section reads confidently. The engineering lead skims it, finds no obvious errors, approves it — and three months later the team discovers the design requires cross-AZ synchronous calls on every write, a constraint that was in the original SLO document but never made it into the prompt. The latency budget breaks. A quarter of rework follows.

This is fast waste at the architecture level: the output looks finished, so the error survives a quick read and surfaces later when it is expensive. The leader’s seniority did not protect them. Speed did not cause the mistake. The absence of a frozen direction — with its constraints made explicit before any design work began — did.

AI is now a credible producer of architecture proposals, ADRs, RFC drafts, tech strategy decks, and prioritization rationales. It can write the entire paper. What it cannot do is know which constraints are load-bearing, which prior decisions are still operative, and which SLOs will be quietly violated by a plausible design. That judgment is the engineering leader’s job — and the volume of AI-authored proposals means it has to be exercised upstream, at the direction level, rather than distributed across every line of a forty-page document.

This is Part 7 of the ADD Across the Org series. The software-native version of this method lives at /blog/how-add-fixes-ai-era-sdlc; the framework overview explains the universal translation. Here the domain is engineering leadership — and the translation surfaces a reframe most leaders resist until they have been burned: your job is no longer to review every artifact; it is to own direction and verify by evidence.

The four failures, for engineering leaders

The same four AI-era failures show up in every domain. At the leadership level they have specific shapes.

Fast waste in architecture — a well-written design that is wrong at scale or misaligned with a constraint nobody stated. The proposal looks complete, so the error passes review. Months later, production corrects it expensively.

Context rot in decisions — the constraint lives in the lead’s head, or in a closed Slack thread, or in an ADR from two years ago that nobody re-reads. Every AI session starts cold and re-guesses the settled decisions. Prior ADRs get quietly reversed by plausible proposals that do not know they existed.

Trust-by-inspection breaks down — an RFC is readable and internally consistent without being deployable within budget, team shape, or latency constraints. “The diagram looks clean” is not evidence the design satisfies its fitness functions.

Verification becomes the ceiling — when the org’s AI-accelerated teams produce more proposals, ADRs, and strategy documents than the lead can evaluate, the excess is not throughput. It is unreviewed direction risk accumulating.

The fourth failure is the one most leaders underestimate. At the code level, unreviewed output risks defects. At the architecture level, it risks structural decisions that take quarters to unwind.

The loop, translated for engineering leadership

ADD’s eight steps map cleanly onto the leadership workflow. The key shift: the frozen contract is the ADR and its fitness functions, the red tests are the fitness checks the design must satisfy before the team builds against it, and the green build is implementation — which is now fully delegated.

Ground: load the actual constraints

Before any design work begins, the leader loads the real context: SLOs and their current baselines, budget ceilings, team shape and review capacity, prior ADRs still in force, and explicit non-goals. This is not documentation theater — it is the artifact the AI producer reads before drafting anything.

A grounding file is short on purpose. If it takes more than a page to describe the binding constraints, it will not be read. The goal is to make the load-bearing facts impossible to miss.

Specify: the direction brief with named refusals

The direction brief is ADD’s Specify step applied to an architecture or strategy decision. It states what the design must achieve (the target quality attributes), what it must not do (each with a named reason code), and the after-state — what “a good decision” looks like when the work is done.

The refusal codes matter. They turn a list of preferences into a set of enforceable constraints that an AI producer can check against before submitting a proposal.

# DIRECTION-BRIEF: payments-service-decomposition
## Context
Monolith handles $X ARR in transactions. Current p99 checkout latency: 420ms (SLO: <200ms).
Team shape: 6 engineers, 2 with distributed systems depth.
Budget ceiling: +$Y/month infra. No headcount growth approved this half.
## Must achieve
- Checkout p99 < 200ms end-to-end (measured at API gateway)
- Zero downtime migration path from monolith
- No new operational surface requiring a dedicated on-call rotation
## Must NOT — with refusal codes
- VIOLATES-CONSTRAINT: design that requires cross-AZ synchronous calls on the write path
  (breaks latency SLO under AZ degradation)
- UNSTATED-TRADEOFF: any approach that trades consistency for latency without explicit
  acknowledgment that we are a payments processor and consistency is non-negotiable
- REVERSES-PRIOR-ADR: ADR-2023-04 chose Postgres as the single durable store;
  proposals introducing a second durable store require a new ADR, not an implementation
- SCOPE-BEYOND-MANDATE: full microservices decomposition is a separate initiative;
  this decision is scoped to the checkout critical path only
## After-state
An ADR that a senior engineer can implement against without asking clarifying questions
about latency, consistency, or migration strategy.
## Lowest-confidence assumption (flag first)
The migration path assumes the monolith can emit dual-writes cleanly. If event sourcing
is required to decouple writes, scope expands and team depth may be insufficient.
Flag this assumption explicitly in any proposal.

The lead reads the lowest-confidence assumption first. If it cannot be confirmed, direction is not ready to hand off.
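One way to make the refusal codes checkable before a proposal reaches the lead is a small pre-submission screen. The sketch below is illustrative only: the `Proposal` fields and the `refusals` function are hypothetical names, and it assumes the proposal author (human or agent) declares these facts honestly. Only the four code names and the constraints behind them come from the brief.

```python
from dataclasses import dataclass
from enum import Enum


class RefusalCode(Enum):
    """The four refusal codes named in the direction brief."""
    VIOLATES_CONSTRAINT = "VIOLATES-CONSTRAINT"
    UNSTATED_TRADEOFF = "UNSTATED-TRADEOFF"
    REVERSES_PRIOR_ADR = "REVERSES-PRIOR-ADR"
    SCOPE_BEYOND_MANDATE = "SCOPE-BEYOND-MANDATE"


@dataclass
class Proposal:
    """Hypothetical self-declared summary attached to a design proposal."""
    sync_cross_az_on_write_path: bool
    trades_consistency_for_latency: bool
    durable_stores: set[str]
    services_in_scope: set[str]


def refusals(proposal: Proposal) -> list[RefusalCode]:
    """Return every refusal code the proposal triggers, in brief order."""
    found: list[RefusalCode] = []
    if proposal.sync_cross_az_on_write_path:
        found.append(RefusalCode.VIOLATES_CONSTRAINT)
    if proposal.trades_consistency_for_latency:
        found.append(RefusalCode.UNSTATED_TRADEOFF)
    # ADR-2023-04: Postgres is the single durable store.
    if proposal.durable_stores - {"postgres"}:
        found.append(RefusalCode.REVERSES_PRIOR_ADR)
    # The brief scopes this decision to the checkout critical path only.
    if proposal.services_in_scope - {"checkout"}:
        found.append(RefusalCode.SCOPE_BEYOND_MANDATE)
    return found
```

A proposal that returns an empty list has only cleared the named refusals; it still has to address the lowest-confidence assumption and the fitness functions.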

Scenarios: model case, edge, failure

Three scenarios make the constraints concrete before any proposal is written.

Model case — a standard decomposition decision: extract the checkout service behind a well-defined interface, keeping Postgres as the durable store, migrating traffic gradually with feature flags. This is the baseline the AI should target.

Edge case — a scale or cost boundary: what if checkout volume spikes 10x during a seasonal peak? Does the proposed design hold its latency SLO without requiring a same-day infra intervention?

Failure case — a plausible proposal that quietly breaks an SLO: an event-driven design with an async queue on the write path that achieves the latency number in steady state but degrades to 2,000ms under backpressure, violating the SLO in the exact conditions where it matters most.

The failure scenario is not pessimism. It is the case most likely to survive a casual review because the diagram looks clean and the happy-path numbers are correct.
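The arithmetic behind the failure case can be sketched with a deliberately crude fluid model of a queue. All numbers below are hypothetical and chosen only to show the shape of the problem: the same design that looks fine in steady state accumulates a backlog once arrivals exceed the drain rate, and every queued message adds wait time.

```python
def write_latency_ms(arrival_per_s: float, service_per_s: float,
                     overload_seconds: float, base_ms: float) -> float:
    """Estimate latency for a write arriving after `overload_seconds`.

    Simplified model (an assumption, not a measurement): the backlog grows
    linearly while arrivals exceed the drain rate, and a new message waits
    for the whole backlog to drain before it is processed.
    """
    backlog = max(0.0, (arrival_per_s - service_per_s) * overload_seconds)
    queue_wait_ms = backlog / service_per_s * 1000.0
    return base_ms + queue_wait_ms


# Steady state: arrivals below capacity, so no backlog forms.
steady = write_latency_ms(800, 1000, 10, base_ms=50)

# Backpressure: a 10-second burst at 20% over capacity.
burst = write_latency_ms(1200, 1000, 10, base_ms=50)
```

Under these assumed inputs the steady-state figure sits comfortably inside a 200ms budget while the burst figure is roughly ten times over it, which is why the scenario has to be written down before the proposal is.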

The frozen contract: the ADR and its fitness functions

The frozen contract in engineering leadership is the approved ADR — the architecture decision, its rationale, its constraints, and the fitness functions that the implementation must satisfy. This is the one human gate.

The leader signs off on direction: what the design must achieve and what it must not do. The how (the specific implementation, library choices, migration sequence) is delegated. Once the ADR is frozen, it does not change casually. A team that needs to violate a constraint files a change request that reopens the brief, not an implementation PR that quietly reverses the decision.
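One possible way to make "frozen" mechanically visible is to record a fingerprint of the approved ADR text and compare against it later. This is a sketch of one mechanism, not something the method prescribes; the function names are hypothetical, and it assumes the ADR lives as plain text.

```python
import hashlib


def fingerprint(adr_text: str) -> str:
    """Hash the ADR text, ignoring trailing whitespace on each line."""
    normalized = "\n".join(
        line.rstrip() for line in adr_text.strip().splitlines()
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_unchanged(adr_text: str, frozen_fingerprint: str) -> bool:
    """True when the current text still matches the approved version."""
    return fingerprint(adr_text) == frozen_fingerprint
```

A mismatch does not say whether the change was legitimate. It only signals that the contract moved, so the change request path should have been used.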

Acceptance checks: architecture fitness functions

The fitness functions are the “red tests” — the checkable definition of done that the design and its implementation must satisfy. They are written before any code exists.

# FITNESS FUNCTIONS: payments-service-decomposition
## Latency
[ ] p99 checkout latency < 200ms at API gateway (measured, not modeled)
[ ] p99 holds under synthetic AZ degradation (one zone removed from rotation)
[ ] No synchronous cross-AZ calls on write path (verifiable from service mesh telemetry)
## Consistency
[ ] Zero inconsistency events in dual-write shadow period (7-day minimum)
[ ] Rollback path tested: traffic reverted to monolith in < 5 minutes
## Cost
[ ] Infra cost delta within approved ceiling (measured post-migration, not estimated)
## Coupling
[ ] Checkout service has no compile-time dependency on monolith internals
[ ] ADR-2023-04 honored: Postgres remains single durable store
## Non-goals respected
[ ] No new operational surface requiring dedicated on-call
[ ] Scope limited to checkout critical path; no other services decomposed

A design proposal that cannot specify how each fitness function will be measured is not ready for approval. A proposal that assumes a fitness function away — “we expect p99 to be fine under AZ failure” — has an UNSTATED-TRADEOFF and should be returned.
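The readiness rule in the paragraph above can be sketched as a small review function. The `FitnessFunction` shape and the `NOT-READY` and `OK` labels are assumptions made for illustration; only `UNSTATED-TRADEOFF` is a code from the brief.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FitnessFunction:
    """Hypothetical record of how a proposal treats one fitness function."""
    name: str
    measurement: Optional[str]  # how it will be measured; None if unspecified
    assumed: bool = False       # True if the outcome is asserted, not measured


def review(functions: list[FitnessFunction]) -> dict[str, str]:
    """Give each fitness function a verdict for the approval decision."""
    verdicts: dict[str, str] = {}
    for f in functions:
        if f.assumed:
            # "We expect p99 to be fine" is an assumption, not a measurement.
            verdicts[f.name] = "UNSTATED-TRADEOFF"
        elif not f.measurement:
            verdicts[f.name] = "NOT-READY"
        else:
            verdicts[f.name] = "OK"
    return verdicts


def ready_for_approval(functions: list[FitnessFunction]) -> bool:
    """A proposal is ready only when every fitness function is measurable."""
    return all(v == "OK" for v in review(functions).values())
```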

Produce: delegate the how

Constrain the what — the quality attributes, the constraints, the fitness functions. Leave the how — the migration sequence, the service boundaries, the library choices — to the teams and agents doing the work.

Once the ADR is frozen, implementation is fully delegated. The leader does not review every PR. They do not approve every library choice. The engineering team implements against the direction; agents assist the implementation. The how is disposable — it can be revised without reopening the ADR, as long as it continues to satisfy the fitness functions.

This is the reframe most leaders find uncomfortable: the artifacts — the brief, the ADR, the fitness functions — are the asset. The implementation is the output, and it is replaceable.

| | Review every line | Own direction and verify |
| --- | --- | --- |
| Where judgment is applied | Distributed across every artifact produced | Concentrated at direction (brief + ADR) and verification (fitness evidence) |
| What the leader approves | Each RFC, PR, and design doc | The frozen ADR and its fitness functions, once |
| How correctness is established | "The diagram looks clean" / "reads well" | Fitness-function results + production signal |
| What happens when volume scales | Leader becomes the bottleneck on throughput | Throughput scales; verification capacity is the managed ceiling |
| What is the durable asset | The code and the documents | The ADR, the fitness functions, the decision log |
| Risk of AI-authored proposals | Plausible-wrong survives review | VIOLATES-CONSTRAINT and UNSTATED-TRADEOFF caught at brief stage |

Verify by evidence: production and fitness results — not the diagram

This is where the analogy to software ADD is most direct, and where most engineering leaders underinvest.

Verification is not a design review meeting where the team presents a clean slide deck. It is a structured check of whether the implementation satisfies the fitness functions, in production or in a production-equivalent environment.

Evidence means: the p99 latency number from service mesh telemetry, measured under realistic load, not modeled. The AZ-degradation test, run, not estimated. The dual-write inconsistency count from the shadow period, actual, not projected. The dependency graph, generated from the running service, not from the architecture diagram.
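To make "measured, not modeled" concrete, here is a minimal sketch of computing p99 from raw latency samples with the nearest-rank method. The sample values are invented for illustration, and real telemetry systems may use a different percentile definition.

```python
def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("p99 needs at least one sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.99 * n, avoiding floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical samples: 98 fast requests and 2 slow ones.
samples = [100.0] * 98 + [900.0] * 2
mean_ms = sum(samples) / len(samples)
tail_ms = p99(samples)
```

With these invented samples the mean stays under a 200ms target while the p99 does not, which is why the fitness function names the percentile and the measurement point rather than an average.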

The adversarial move is a refute-read of the fitness evidence: treat the implementation as having not earned its green, and look for the fitness function that was either not measured or measured in conditions too favorable to reflect production. Common findings: latency measured at the service, not at the API gateway (misses network overhead). Consistency measured over two hours, not seven days (misses intermittent failure modes). Cost estimated from list price, not from actual billing data.
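The three common findings above can be turned into a mechanical first pass of the refute-read. The `Evidence` fields and string values here are hypothetical; a real evidence record would also carry the timestamp, methodology, and sign-off. The sketch only flags favorable measurement conditions and does not judge the results themselves.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of how the fitness evidence was gathered."""
    latency_measured_at: str        # e.g. "api_gateway" or "service"
    consistency_window_hours: int   # length of the dual-write shadow period
    cost_source: str                # e.g. "billing" or "list_price"


def refute_read(evidence: Evidence) -> list[str]:
    """List the ways the evidence was gathered under too-favorable conditions."""
    findings: list[str] = []
    if evidence.latency_measured_at != "api_gateway":
        findings.append("latency not measured at the API gateway")
    if evidence.consistency_window_hours < 7 * 24:
        findings.append("consistency window shorter than 7 days")
    if evidence.cost_source != "billing":
        findings.append("cost not taken from actual billing data")
    return findings
```

An empty list means the evidence was gathered under the stated conditions, not that the implementation has passed.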

A clean diagram is not a passing fitness check. A passing fitness check is a passing fitness check — with a timestamp, a measurement methodology, and a sign-off.

Verification capacity is the real ceiling on the org’s throughput. An engineering leader can approve more direction — more ADRs, more strategy decisions — than their organization can implement and verify. The output beyond verification capacity is not velocity. It is unreviewed direction debt, accumulating interest in production.

Observe and fold: production signal into the next brief

After migration, the fitness functions become the dashboards. The p99 target, the coupling constraints, the cost ceiling — each is now a production metric. If any regresses, it re-enters the brief as a named constraint violation, not as a vague concern in a retrospective.
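A rough sketch of that fold-back step: compare observed production metrics against the limits from the fitness functions and emit named violations rather than free-text concerns. The metric names and the `NOT-MEASURED` label are assumptions for illustration, and every limit is treated as a strict upper bound.

```python
def violations(limits: dict[str, float], observed: dict[str, float]) -> list[str]:
    """Turn metric regressions into named entries for the next brief."""
    found: list[str] = []
    for metric, limit in limits.items():
        value = observed.get(metric)
        if value is None:
            # A fitness function nobody measures is no longer a living check.
            found.append(f"NOT-MEASURED: {metric}")
        elif value >= limit:
            found.append(
                f"VIOLATES-CONSTRAINT: {metric} observed {value} (limit < {limit})"
            )
    return found


# Hypothetical limits derived from the fitness functions.
limits = {"checkout_p99_ms": 200, "sync_cross_az_write_calls": 1}
```

Each returned string is specific enough to paste into the "Must NOT" section of the next direction brief.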

The decision log stays living. When the next decomposition question arrives — a different service, a different team — the lead does not reconstruct the rationale from memory or from a Slack thread. The ADR records what was decided, why the VIOLATES-CONSTRAINT codes were chosen, and what the migration surface looked like. Later decisions inherit proven patterns by name instead of re-deriving them.

ADRs that are archived and not revisited become context rot in slow motion. An ADR that stays operationally true, meaning its fitness functions are still measured, its constraints are still accurate, and new proposals are checked against it, is a living decision, not a historical document.

What does not transfer

ADD’s analogy to software is productive but not perfect. Three things stay irreducible, and a fourth point is a caution about the method itself.

Org judgment cannot be specified. The direction brief structures a decision, but it cannot capture the political weight of a constraint, the morale cost of a technical direction, or the long-term career implications of asking senior engineers to maintain a system they did not choose. These are leadership judgments that no brief or fitness function can hold.

Mentoring and taste. An architect who approves well-structured ADRs without developing the engineers who proposed them is optimizing the wrong thing. The fitness-function mindset can crowd out the slower work of building engineering judgment across the org — which is what makes the next generation of ADRs trustworthy.

Accountability for outcomes. The leader who delegates the how and verifies by evidence still owns the outcome. A fitness check that passes in staging and fails in production is not a process failure to point at — it is the leader’s failure to choose fitness functions that reflected production reality. The method raises the quality of the process; it does not transfer accountability.

Over-proceduralizing is its own failure mode. A team that writes perfect ADRs and fitness functions but ships nothing has applied the form without the judgment. The brief and the fitness functions are tools for concentrating judgment at the right moments — they are not a substitute for moving.

Next in the series

Part 8 brings ADD to DevOps and SRE: how pipeline gates replace manual review, how the observe step becomes an automated feedback loop, and why the verification ceiling looks different when the verifier is a deployment pipeline rather than a person. Read it at /blog/add-for-devops.
