Bonus post — an appendix to From Chat to Agent. Post 8 closed the series with the move from one agent to a production platform. This is the pattern that quietly glues that platform together once it gets past a handful of tasks.
The finale argued that the difference between a one-off agent and a production platform is scaffolding — evaluations, tracing, guardrails, cost control. That’s true. But there’s another thing that shows up in every production agent I’ve ever looked at, and it’s not on the seven-rung ladder because it isn’t a new capability. It’s a way of reusing the capabilities already there.
It’s called a skill.
If you’ve used Claude Code, or Anthropic’s newer Agent SDK, or certain flavors of autonomous coding tool, you’ve already benefited from skills without necessarily naming them. The concept is small. The consequences, once a product starts to grow, are large.
The problem skills solve
Imagine an agent that can do three things well: deploy code, summarise PDFs, and write weekly status emails. Each of those tasks has its own distinct recipe: a specific sequence of tool calls, a specific tone, a specific set of guardrails. At three tasks, you can put all three recipes in the system prompt. The prompt is long but fine.
Now imagine the agent can do thirty. The system prompt becomes enormous, most of it irrelevant to any given request. Every turn pays for every recipe, even when the user is only asking for one of them. The agent gets confused about which recipe applies. Performance degrades. Cost goes up.
The cheap, wrong answer: shrink the prompt by rewriting it “more efficiently.” You end up with a prompt that is brittle and hard to edit.
The right answer: take each recipe out of the system prompt and put it in a file on a shelf. Teach the agent to look at the shelf when a user asks for something, pick the matching recipe, and load only that one into context for the turn. Put the rest back when done.
That’s a skill.
What a skill actually is
Mechanically, a skill is a small markdown file with a name, a description, and a body of instructions. Something like:
    ---
    name: deploy
    description: Ship code to production safely, following the company deploy checklist.
    triggers: ["deploy", "ship", "release", "push to prod"]
    tools: [run_tests, bump_version, open_pr, wait_for_ci, merge]
    ---

    # Deploy Skill

    When the user asks to deploy, follow this recipe:

    1. Run the full test suite. If anything fails, stop and report.
    2. Bump the version in package.json according to semver rules.
    3. Open a pull request titled "Release vX.Y.Z".
    4. Wait for CI to pass.
    5. Merge and tag.

    Guardrails:
    - Never deploy on a Friday after 3pm.
    - Never skip the test suite, even if the user says "just ship it".
    - Always confirm the version bump with the user before merging.

    Example:
    User: "ship 0.4.1"
    Assistant: "Running tests for 0.4.1 deploy. I'll confirm the version bump once CI passes."

That file lives on disk. The agent doesn’t see it by default. It’s discoverable (the agent knows a skill named deploy exists and roughly what it’s for), but the full body is loaded only when needed.
Three things make this file a skill, not just a prompt:
- A name the agent can reach for.
- A trigger or description that tells the agent when to reach for it.
- A self-contained body — instructions, tools it uses, guardrails, examples — that the agent can follow without needing the rest of the prompt.
Strip any of those and you have something less useful: a prompt fragment, a tool description, a chunk of documentation. Together, you have a reusable capability the agent can summon by name.
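To make the anatomy concrete, here is a minimal sketch of reading such a file into its three parts. The `Skill` class and `parse_skill` function are hypothetical names invented for this illustration, and the front matter parsing is deliberately simplified: it handles only `key: value` lines and one-line bracketed lists, which is all the example file uses. It is not a full YAML parser, and real skill loaders may work quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    triggers: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    body: str = ""


def _parse_value(raw: str):
    """Parse a scalar or a one-line [a, b, c] list. Not a full YAML parser."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        items = raw[1:-1].split(",")
        return [item.strip().strip("\"'") for item in items if item.strip()]
    return raw.strip("\"'")


def parse_skill(text: str) -> Skill:
    """Split a skill file into front matter (metadata) and body (instructions)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' front matter block")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("front matter is missing its closing '---'") from None

    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")  # split on the first colon only
            meta[key.strip()] = _parse_value(value)

    # Without a name or a description, the file is just a prompt fragment.
    for required in ("name", "description"):
        if not meta.get(required):
            raise ValueError(f"skill is missing required field: {required}")

    return Skill(
        name=meta["name"],
        description=meta["description"],
        triggers=meta.get("triggers", []),
        tools=meta.get("tools", []),
        body="\n".join(lines[end + 1:]).strip(),
    )


DEPLOY_MD = """---
name: deploy
description: Ship code to production safely, following the company deploy checklist.
triggers: ["deploy", "ship", "release", "push to prod"]
tools: [run_tests, bump_version, open_pr, wait_for_ci, merge]
---
# Deploy Skill

When the user asks to deploy, follow this recipe:
1. Run the full test suite. If anything fails, stop and report.
"""

skill = parse_skill(DEPLOY_MD)
print(skill.name, skill.triggers, skill.tools)
```

The useful property is the split itself: the name, description, and triggers are cheap metadata the agent can always see, while `body` is the part that stays on the shelf until it is needed.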
How the agent uses a skill
The loop is short. Something like this, per user turn:
- See the request. User says “ship 0.4.1”.
- Match a skill. The agent scans the shelf (a tiny index of skill names + descriptions) and matches the request against them. deploy’s triggers include “ship” and “release”, so it is the likely match.
- Load into context. The body of the deploy skill is inserted into the prompt for this turn. Other skills stay on the shelf.
- Run the recipe. The agent follows the loaded instructions, calls the listed tools, enforces the guardrails, and completes the turn.
When the turn ends, the skill body drops back out. Next turn starts fresh.
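The loop above can be sketched in a few lines. Everything here is an assumption made for illustration: the `Skill` class, the `SHELF` contents, the base prompt text, and especially the matching strategy. This sketch matches on trigger words with a regular expression; a real agent might instead show the model the index and let it choose.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    triggers: list[str]
    body: str


BASE_PROMPT = "You are a helpful engineering assistant."

# Hypothetical shelf with abbreviated bodies.
SHELF = [
    Skill("deploy", "Ship code to production safely.",
          ["deploy", "ship", "release", "push to prod"],
          "When the user asks to deploy, follow this recipe: run the tests first."),
    Skill("summarise-pdf", "Summarise a PDF document.",
          ["summarise", "summarize", "pdf"],
          "When the user asks for a summary, read the whole document first."),
]


def build_index(shelf):
    """The always-visible part: one line per skill, never the bodies."""
    return "\n".join(f"- {s.name}: {s.description}" for s in shelf)


def match_skill(request, shelf):
    """Return the first skill with a trigger appearing as a whole word or phrase."""
    text = request.lower()
    for skill in shelf:
        for trigger in skill.triggers:
            if re.search(rf"\b{re.escape(trigger)}\b", text):
                return skill
    return None


def build_prompt(request, shelf):
    """Assemble the prompt for one turn: base + index + at most one skill body."""
    parts = [BASE_PROMPT, "Available skills:\n" + build_index(shelf)]
    skill = match_skill(request, shelf)
    if skill is not None:
        parts.append(f"Active skill: {skill.name}\n{skill.body}")
    return "\n\n".join(parts)


print(build_prompt("ship 0.4.1", SHELF))
```

Because `build_prompt` is called fresh each turn, nothing needs to "unload" the skill afterwards: the next turn simply assembles a new prompt from the base, the index, and whatever matches then.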
This is not a new kind of capability. The agent already had tool use (rung 5), a reasoning loop (rung 6), and memory (rung 7). A skill is just instructions — the same medium as the system prompt. What’s new is the lazy, triggered loading, which saves context, saves cost, and dramatically improves focus.
Why this matters more than it sounds
On the surface, a skill is “a prompt in a file.” That sounds unremarkable. The reason skills matter in practice is a set of second-order effects.
1. Context hygiene. The agent’s prompt stays small. Small prompts mean faster generations, lower cost, and fewer distractions for the model. An agent with fifty skills but one loaded per turn costs nearly the same per turn as an agent with one skill; the only extra is the short index of names and descriptions.
2. Authoring is cheaper than fine-tuning. You can ship a new skill by writing a markdown file. No training, no deploy pipeline for weights, no uncertainty about whether the behavior took. Want to teach the agent how to handle a new kind of request? Write a skill. Ship it. Done.
3. Skills are readable. A fine-tuned model is a black box. A skill is a file you can open, read, edit, and diff. This matters enormously for audit, for safety review, for handing work to a teammate. “What exactly will this agent do when someone asks to deploy?” is a question you can answer by opening deploy.md.
4. Skills compose. A weekly-status-report skill can invoke a summarise-pdf skill as part of its recipe. Building new behaviors becomes the act of snapping existing skills together, the way Unix pipelines snap commands together.
5. Skills transfer. A skill written for one agent can, with minimal adjustment, be used by another — as long as both speak the same tool vocabulary. Some production systems (Claude Code is the canonical example) ship with a shared skill format so that the same deploy.md works across different product surfaces.
Most of these are invisible until you’re running an agent at enough scale that the alternative — one enormous system prompt — has collapsed under its own weight. At that point, skills are obvious.
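A back-of-envelope calculation shows where the context savings come from. Every number below is made up for illustration (a 300-token base prompt, 400-token skill bodies, 20-token index entries), and `prompt_tokens` is a hypothetical helper, not a measurement of any real system.

```python
def prompt_tokens(base, n_skills, body_tokens, index_entry_tokens, lazy):
    """Rough prompt size per turn, in tokens, under the stated assumptions."""
    if lazy:
        # Index line for every skill, plus the one body loaded this turn.
        return base + n_skills * index_entry_tokens + body_tokens
    # Monolithic system prompt: every recipe is present on every turn.
    return base + n_skills * body_tokens


monolithic = prompt_tokens(300, 50, 400, 20, lazy=False)
lazy_loaded = prompt_tokens(300, 50, 400, 20, lazy=True)
single_skill = prompt_tokens(300, 1, 400, 20, lazy=True)

print(monolithic)    # 20300
print(lazy_loaded)   # 1700
print(single_skill)  # 720
```

With lazy loading, the only term that grows with the number of skills is the index. Adding a fifty-first skill costs one more index line per turn, not one more recipe.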
Skills vs the rungs below
This is a bonus post partly because “is a skill a new rung?” is the wrong question. A skill is a pattern built on top of the existing rungs. It is useful to see how it relates to three of them.
Vs. system prompts (rung 3). A system prompt is a single always-on instruction block. A skill is a lazy-loaded instruction block with a trigger. If your “system prompt” is growing past ~2,000 words and starting to feel like a Frankenstein of unrelated recipes, it wants to be split into skills.
Vs. tool use (rung 5). A tool is one callable function. A skill is a recipe that may call several tools in a specific order, with specific checks in between. Tools are atomic; skills are composite. You often add tools first, notice a pattern of which tool sequences work for which requests, and then codify those patterns as skills.
Vs. procedural memory (rung 7). This is the subtlest comparison. Procedural memory, as introduced in post 7, is “how-to” knowledge an agent has learned and stored. Skills are very close to procedural memory — with two differences. Skills are authored, not learned (someone wrote the file). And skills are shared, not private to one agent (they’re on a shelf anyone on the team can read).
You can think of skills as the explicit, shareable, authored version of procedural memory. In many production systems, the line between the two is deliberately blurry — a new skill might start as a learned procedure that someone later wrote down and promoted to a file.
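One practical consequence of the tool comparison above: because a skill only names the tools its recipe relies on, an agent can check before loading a skill that it actually has those tools. The sketch below assumes a hypothetical registry that maps tool names to callables; the stub functions and the `missing_tools` helper are invented for this example.

```python
from typing import Callable


# Hypothetical atomic tools: each one is a single callable.
def run_tests() -> bool:
    return True


def open_pr(title: str) -> str:
    return f"opened PR: {title}"


def merge() -> str:
    return "merged"


# The agent's tool vocabulary.
TOOLS: dict[str, Callable] = {
    "run_tests": run_tests,
    "open_pr": open_pr,
    "merge": merge,
}


def missing_tools(skill_tools: list[str], registry: dict[str, Callable]) -> list[str]:
    """Return the tools a skill needs that this agent does not provide."""
    return [name for name in skill_tools if name not in registry]


deploy_tools = ["run_tests", "bump_version", "open_pr", "wait_for_ci", "merge"]
print(missing_tools(deploy_tools, TOOLS))  # ['bump_version', 'wait_for_ci']
```

An empty result means the skill can run on this agent as written. A non-empty one says exactly which tools would have to be added, or which steps of the recipe adjusted, before the skill transfers.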
When to write your first skill
A signal list, in rough order:
- Your system prompt is past 1,500 words and has at least two “sections” that apply to different tasks.
- You catch yourself copy-pasting the same instructions into every user prompt for a particular task.
- You want to let a non-engineer teammate change the agent’s behavior for one specific task without rewriting a prompt template they don’t understand.
- Your agent handles more than ~5 distinct kinds of requests, and each kind has its own small recipe.
- Your evaluations reveal that the model is inconsistently following a particular sub-procedure (“sometimes it runs tests, sometimes it doesn’t”).
Any of those is cause to take thirty minutes, pick the most common recipe, and extract it into a named file.
When not to reach for skills
Two honest warnings.
Don’t build skills for everything. If your agent has one job and the recipe is stable, a good system prompt is still the right tool. Skills are a response to breadth. An agent with one well-shaped job doesn’t need them. Over-using skills is a way of pretending every task is a separate one, which is rarely true.
Don’t use a skill as a security boundary. Nothing about putting a recipe in a file enforces it in code. A determined user, or a confused model, can ignore a skill’s guardrails just as easily as they can ignore a system prompt’s. If a rule really must not be broken, enforce it in code that checks the model’s output before anything acts on it. Skills are about organizing instructions, not about guaranteeing them.
Everything from post 3 on system prompts about the distinction between rules the model should follow and rules the code must enforce still applies here. The file format changes; the underlying honesty does not.
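As a sketch of what enforcing a rule in code can look like, take two guardrails from the deploy skill and check them inside the tool itself, so that a model which ignores the instructions still cannot break them. The names `GuardrailViolation`, `check_deploy_window`, and `merge_tool` are hypothetical, and the sketch assumes `now` is already in whatever time zone the Friday rule refers to.

```python
from datetime import datetime


class GuardrailViolation(Exception):
    """Raised when a hard rule would be broken, whatever the model asked for."""


def check_deploy_window(now: datetime) -> None:
    """Hard rule from the deploy skill: never deploy on a Friday after 3pm."""
    # datetime.weekday() counts Monday as 0, so Friday is 4.
    # This sketch treats 15:00 onward as "after 3pm".
    if now.weekday() == 4 and now.hour >= 15:
        raise GuardrailViolation("deploys are blocked on Fridays after 3pm")


def merge_tool(version: str, tests_passed: bool, now: datetime) -> str:
    """Hypothetical wrapper the runtime calls when the model requests a merge."""
    check_deploy_window(now)
    if not tests_passed:
        raise GuardrailViolation("test suite has not passed")
    return f"merged and tagged v{version}"


# Thursday afternoon: allowed.
print(merge_tool("0.4.1", tests_passed=True, now=datetime(2024, 1, 4, 16, 0)))

# Friday afternoon: refused, even if the prompt said "just ship it".
try:
    merge_tool("0.4.1", tests_passed=True, now=datetime(2024, 1, 5, 16, 0))
except GuardrailViolation as err:
    print(f"refused: {err}")
```

The skill file still states the rule, because that is what lets the agent explain the refusal politely. The code is what makes the rule hold.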
Where to read deeper
If this post has you curious about how to actually build skills — the file layout, the discovery mechanism, the composition patterns, the gotchas — there’s a companion series on this blog that goes deep on exactly that:
- AI Skills — What They Are, How They Work — six posts covering the concept, the anatomy of a skill file, building your first one, composing skills, and real workflow patterns.
Where From Chat to Agent is the ladder, AI Skills is the tool belt you hang from the top of it. If you’ve followed this series to here, you have the context to skim it in an evening.
Closing
An agent with skills is not smarter than an agent without them. It is more organized. Its system prompt stays small. Its behavior stays predictable. Its capabilities grow additively — one new file at a time — instead of entropically, with every new task mutating the central prompt.
For a small agent, this is overkill. For a growing one, it is the pattern that separates a demo from a system that keeps working as its scope triples.
That’s the appendix.
If you came in through post 1, you’ve now walked the whole ladder — plus this one side trip. Go build something. The rungs are there; the playbook is up to you.