Series in progress

Anatomy of an AI Harness

An eight-part deep dive into the production architecture that turns a raw language model into a reliable engineering tool — dissected from real Claude Code source code. Covers tools, permissions, context, orchestration, skills, tasks, state, and cost.

8 parts · 1h 11m total

In this series

1 Anatomy of an AI Harness: What Lives Between You and the Model 9 min read
Everyone debates which AI model is best. The real engineering happens in the harness — the production system of tools, permissions, memory, and orchestration that makes any model actually useful. This is a map of that system, drawn from real source code.
2 The Tool System: How an AI Gets Hands 9 min read
A language model without tools is an expensive autocomplete. This post dissects how a production AI harness defines, registers, validates, and executes 40+ tools — from file reads to shell commands to MCP integrations — with type safety, concurrency control, and deferred loading.
3 The Permission Boundary: Human-in-the-Loop at Scale 10 min read
An AI with shell access and no guardrails will eventually destroy something you care about. This post dissects how a production harness implements layered permissions, hooks, dangerous pattern detection, and trust boundaries — balancing safety with usability.
4 Context Engineering: Building the Model's World 9 min read
The model is only as good as the context it receives. This post dissects how a production AI harness constructs system prompts, loads project instructions, manages persistent memory, and compresses context when the window fills — all to give the model the right information at the right time.
5 The Orchestration Loop: Where Everything Converges 9 min read
The orchestration loop is the heart of an AI harness — a state machine that coordinates API calls, streaming responses, concurrent tool execution, error recovery, and token budgets. This post traces the complete data flow from user input to final response.
6 Skills: Packaging AI Workflows as Code 8 min read
Ad-hoc prompting is fine for one-off questions. Repeatable workflows deserve structure. This post dissects how a production harness defines, discovers, loads, and executes skills — reusable AI workflows that turn tribal knowledge into executable automation.
7 Tasks and Concurrency: Background Agents at Work 8 min read
A production AI harness is not single-threaded. Background agents explore codebases, shell commands execute, remote agents run on cloud infrastructure — all while the main conversation continues. This post dissects the task system that manages this concurrency.
8 State, Cost, and the Production Surface 9 min read
The invisible foundation beneath every layer of an AI harness: centralized state management, per-model cost tracking, rate limit handling, a custom React-to-terminal renderer, and multiple entry points. This post covers what it takes to get from 'works in a demo' to 'works in production.'