Ai Ml

Tool Use — Giving the Model Hands

Text in, text out — until you let the model call functions. This is the moment a chatbot stops explaining how to do things and starts actually doing them. Here's what function calling really is, the typed contract that makes it work, and why this is the hinge rung of the whole ladder.

Tin Dang avatar
Tin Dang
Hand-drawn hub-and-spoke diagram with an LLM cloud at the center and six tool cards — read file, search web, send email, query database, run code, call API — radiating outward as labeled spokes

Every rung below this one has been about improving what the model says. System prompts shape how it talks. RAG controls what it knows. Chat history lets it remember what you said. But on every one of those rungs, the model is still a text-in, text-out function. It tells you how to boil water. It does not boil water.

This rung — tool use — is where that changes.

It is the hinge of the entire ladder. Everything above it (agent loops, memory, multi-agent platforms) assumes tool use is in place. Everything below it is a fundamentally different kind of product. The moment your AI can reach out and press a real button, it stops being a chatbot and starts being an assistant. The same underlying model, given tools, becomes a qualitatively different thing.

Here’s how it actually works.

Hand-drawn circular four-step tool-use cycle — offer schemas, model decides, harness executes, result feeds back — with the key insight: the model asks for the tool to be run
The four-step tool cycle. The model does not run the tool — the model asks for the tool to be run.

The conceptual shift

The core trick is so small it feels like cheating.

You show the model a menu of tools. For each tool, you describe: what it’s called, what it does, and what inputs it needs. You then tell the model — in the prompt — that when it wants to use a tool, it should emit a specific structured response (a small piece of JSON) naming the tool and filling in the inputs. Your harness watches for that response, runs the tool, and feeds the result back to the model as another message.

The model doesn’t run the tool. The model asks for the tool to be run.

That distinction is easy to gloss over and worth pausing on. The language model is still just producing text. What changed is that the harness around it now knows how to interpret certain pieces of that text as requests, execute them, and return the results. The model gained hands the way a person gains a phone: by being given a new way to affect the world beyond itself.

Anatomy of a tool call

A single tool call involves four steps. Let’s walk through one concretely.

Step 1 — The harness gives the model a tool schema.

{
"name": "send_email",
"description": "Send an email on behalf of the user.",
"input_schema": {
"type": "object",
"properties": {
"to": {"type": "string", "format": "email"},
"subject": {"type": "string"},
"body": {"type": "string"}
},
"required": ["to", "subject", "body"]
}
}

This lives alongside the system prompt. The model sees all available tools every turn.

Step 2 — The model decides to call it. Given a user message like “Email my boss and let her know I’ll be late,” the model’s response is not prose. It’s a structured call:

{
"tool": "send_email",
"input": {
"to": "jane@company.com",
"subject": "Running late this morning",
"body": "Hi Jane — quick note, I'll be about 15 minutes late..."
}
}

The model filled in plausible values based on context (it’s seen earlier in the conversation that “my boss” is Jane).

Step 3 — The harness validates and executes. The harness checks the input against the schema (is to really a valid email? is body within a sane length?), asks the user for permission if the tool is flagged as destructive, and — only then — actually calls whatever backend function sends an email. The tool returns a result:

{ "status": "sent", "id": "msg_0184a..." }

Step 4 — The harness feeds the result back to the model. The tool result is appended to the conversation as a new message, flagged as a tool response. The model can now see what happened and decide what to say next — usually, “Sent. I told Jane you’ll be 15 minutes late.”

That four-step cycle — offer → decide → execute → feed back — is the entire mechanism. Everything else is scale.

The typed contract

The single most important thing about a tool is its schema. Not the function it calls, not its implementation — the schema the model sees.

A good schema includes:

  • A name that tells the model when to reach for this tool.
  • A description that is genuinely a description, not a one-word label.
  • Input fields with clear types, required/optional flags, and explanations.
  • Optional examples of good calls.

A schema that nails these produces dramatically better tool use than a schema that’s been copy-pasted from an OpenAPI spec with no thought. The model is not reading your code; it is reading your description. Write it for a reader.

Behind the schema, the actual tool function can be anything: a database query, an HTTP call, a file read, a shell command, a call to another language model. The harness treats them uniformly; the model doesn’t know or care.

When to give the model tools — and when not to

The temptation, once tools are wired up, is to wire up every tool. Don’t.

Each tool you expose:

  • Costs context window (the schemas take tokens).
  • Costs latency (the model has to consider more options each turn).
  • Costs clarity (the model may pick the wrong tool from a crowded menu).
  • Costs safety (more tools, more ways things can go wrong).

A product with three well-chosen tools almost always beats a product with thirty sloppy ones. A useful rule of thumb: expose a tool only if you’ve watched a real user ask for exactly that action more than once. Everything else stays a possibility for later.

When the tool count does grow past what fits comfortably in context, the standard move is deferred loading: the model sees tool names and one-line descriptions, and must call a meta-tool (search_tools) to fetch the full schema of the few it wants to use. This scales to hundreds of tools without drowning the context window.

Tool choice: letting the model decide

Most APIs now let you control how the model chooses a tool. There are three common modes:

  • Auto — the model picks any tool (or none) based on the user’s message. This is the default and what you usually want.
  • Required — the model must call some tool. Use this when you’re running a workflow where text-only replies are wrong (e.g., an extraction pipeline).
  • Specific — you force the model to call exactly one named tool, passing all the choice back to your code. Useful for structured outputs in disguise.

The common mistake is using “required” when “auto” would do, and getting tool calls on turns where a plain reply would have been fine. Give the model the option to just answer.

Handling errors, gracefully

Tools fail. APIs time out. Inputs are invalid. Permissions get denied. The question is not whether your tools will fail — they will — but how gracefully the model recovers.

The trick is to return the error to the model as a tool result, not to crash or swallow it silently. If send_email returns {"error": "recipient not found"}, the model sees that and can respond naturally: “I couldn’t find that recipient — which email address should I use?” It will not do this if the harness swallows the error and returns null.

The same rule applies to validation failures. If the model calls a tool with a missing required field, the cleanest move is to return a structured error describing what’s missing, and let the model try again. Models are remarkably good at fixing their own calls when they can see what went wrong.

One call vs a loop

A single tool call is just a function call with a nicer API. It doesn’t yet make a product “agentic” — it makes it scripted.

The real power comes when the model can chain calls: “first look up the flight, then check the calendar for conflicts, then ask the user, then book it.” That chaining is the next rung, the agent loop, and it’s where this series’ story pivots from augmented chat to genuine autonomy. Post 6 is where we go.

One important design note before we leave tools. Everything in this post has been local: a harness, a model, and a set of tools you hand-wrote. But tools can also come from elsewhere — from an external server that advertises what it offers. That’s the Model Context Protocol (MCP), a growing standard that lets a harness discover and call tools published by other services, without the harness author having to write any code. It’s the same four-step cycle dressed in a network protocol. The existence of MCP is why “tool use” is rapidly becoming less about writing tools and more about connecting to them.

The honest limits

Tools are not a fix for every weakness of the model. Two in particular:

1. Hallucinated tool calls. The model will, occasionally, invent a tool that doesn’t exist or pass invalid arguments. Good harnesses validate before execution. Great harnesses feed the error back and let the model retry. A hallucinated call that gets blocked is a minor annoyance; one that gets executed is a disaster.

2. Overconfident tool use. Given tools, a model will sometimes use them when it shouldn’t — calling search_web for a question it could answer from its own training, for instance. The fix is in the system prompt: give the model explicit guidance on when not to use tools. “If the answer is a basic fact you already know, just answer.”

The broader lesson is that tool use multiplies both the model’s power and its failure modes. Every rung above tool use — agent loops especially — is partly a story of how to keep that multiplication controlled.

Where we go next

Tool use gave the model hands. The next rung gives it the capacity to use those hands more than once, in sequence, with checking along the way. That’s the agent loop: think → act → observe → reflect, until the goal is met or the budget is spent.

Read next: The Agent Loop — ReAct, Plan-Act-Observe.

0

Next in this series

The Agent Loop — ReAct, Plan-Act-Observe

Continue reading