
The Agent Loop — ReAct, Plan-Act-Observe

One tool call is an API. A loop of tool calls with reasoning in between is an agent. This post walks through the four-step cycle that turns one-shot chat into step-by-step work — and the surprisingly tricky question of when to stop.

Tin Dang April 17, 2026 10 min read


At the last rung, we gave the model hands. It can now call a tool, read a file, send an email, query a database, run a piece of code. But every example in that post was a single call — the model decides to do one thing, the harness does it, the result comes back, and the model composes a reply.

Most real work is not one thing. It is a sequence.

“Find me a flight to Tokyo next Wednesday under $800 that leaves after 9 AM.” That request, on inspection, is four tool calls long and requires looking at results between each one. If the flight search returns no matches, the model should notice, widen the date range, and try again. If the budget’s too tight, it should say so rather than silently giving up. The work is not a call. It is a loop.

This rung is that loop.

Figure: The agent loop (think, act, observe, reflect) runs until the goal is met or the budget is spent. The hand-drawn wheel shows the four quadrants joined by clockwise arrows, an exit arrow labeled Done, and four termination safeguards around the perimeter.

From one call to a loop

If the previous post’s mental model was offer → decide → execute → feed back, the agent loop is that cycle running multiple times per user turn, until the goal is genuinely done.

The model is not “an agent” by virtue of being smart. It’s an agent by virtue of being embedded in this outer loop. The same model that behaves as a chatbot when called once behaves as an agent when called in a loop. The difference is not in the model; it is in the scaffolding.

A minimal agent loop, in pseudocode:

def run_agent(model, tools, system_prompt, user_goal):
    context = [system_prompt, user_goal]

    while True:
        response = model.generate(context, tools=tools)

        if response.is_tool_call():
            result = run_tool(response.tool, response.input)
            context.append(response)        # record the call
            context.append(result)          # record the result
            continue                        # loop back for the next step

        # No tool call: the model is replying, with either a final answer
        # or a clarifying question for the user. Either way, the turn ends.
        return response.text

That’s the whole thing. An agent is literally a few dozen lines of Python around the model call from post 5.
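To watch the loop actually run, here is a self-contained toy version that replays the flight example from the introduction. Everything around the loop is faked: `Response`, `ScriptedModel`, and the canned `search_flights` tool are hypothetical stand-ins invented for this sketch, not a real SDK. The fake model searches Wednesday, sees zero results, widens the date range, and then replies in plain text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    # Hypothetical response shape: a tool call (tool is set) or a plain-text reply.
    text: str = ""
    tool: Optional[str] = None
    input: dict = field(default_factory=dict)

    def is_tool_call(self) -> bool:
        return self.tool is not None


def search_flights(dates: str) -> str:
    # Fake tool with canned data: nothing on Wednesday alone, one flight if widened.
    return "0 flights" if dates == "Wed" else "1 flight: Thu 10:15, $742"


TOOLS = {"search_flights": search_flights}


class ScriptedModel:
    """Stand-in for a real model that reacts to the latest tool result."""

    def generate(self, context, tools):
        # Tool results are the dict entries in the context.
        results = [m["content"] for m in context if isinstance(m, dict)]
        if not results:
            return Response(tool="search_flights", input={"dates": "Wed"})
        if results[-1] == "0 flights":
            # Observe and reflect: no matches, so widen the date range and retry.
            return Response(tool="search_flights", input={"dates": "Tue-Thu"})
        return Response(text=f"Best option: {results[-1]}")


def run_tool(tools, name, args):
    return {"role": "tool", "content": tools[name](**args)}


def run_agent(model, tools, system_prompt, user_goal):
    context = [system_prompt, user_goal]
    while True:
        response = model.generate(context, tools=tools)
        if response.is_tool_call():
            context.append(response)                                        # record the call
            context.append(run_tool(tools, response.tool, response.input))  # record the result
            continue
        return response.text  # plain text ends the turn


print(run_agent(ScriptedModel(), TOOLS, "You find flights.", "Tokyo, next Wednesday, under $800"))
# prints "Best option: 1 flight: Thu 10:15, $742"
```

Two model calls were spent on tool requests and a third on the reply; a real model would make the widen-and-retry decision itself rather than follow a script.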

The four-step cycle, named

The loop above has a name in the AI literature: ReAct (short for “Reason + Act”), introduced in a 2022 paper and since adopted as the default mental model for simple agents. The idea is that the model interleaves two kinds of outputs:

Thoughts — short natural-language reasoning steps: “the user wants a flight; first I should search.”
Actions — tool calls that turn thoughts into effects.

The model then observes the result and repeats. That makes four steps per iteration:

1. Think. The model writes a short reasoning trace: “What do I know? What do I need? What’s the best next step?”
2. Act. It picks a tool and issues the call.
3. Observe. The tool result is appended to the context. The model reads it.
4. Reflect. The model evaluates: did that work? Am I closer to the goal? What’s next?

Steps 1 and 4 often blur together in modern models — a single generation pass can include both reasoning-about-what’s-next and reasoning-about-what-just-happened. But conceptually, four steps is the right mental model.
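In the original ReAct prompting style, thoughts and actions are written as labeled lines of text that the harness parses. The sketch below assumes a simplified format, a `Thought:` line followed by an `Action: tool_name[argument]` line, loosely modeled on the paper's style. Modern APIs usually return structured tool calls instead, so treat the format and the hypothetical `parse_react_step` helper as illustrative.

```python
import re
from typing import NamedTuple, Optional


class ReActStep(NamedTuple):
    thought: str
    tool: Optional[str]      # None when the model gave no action (e.g. a final answer)
    argument: Optional[str]


# Assumed format: one "Thought:" line and one "Action: tool_name[argument]" line.
THOUGHT_RE = re.compile(r"^Thought:\s*(.*)$", re.MULTILINE)
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def parse_react_step(text: str) -> ReActStep:
    """Split one model generation into its Think and Act parts."""
    thought_match = THOUGHT_RE.search(text)
    thought = thought_match.group(1).strip() if thought_match else ""
    action_match = ACTION_RE.search(text)
    if action_match is None:
        return ReActStep(thought, None, None)
    return ReActStep(thought, action_match.group(1), action_match.group(2))


def observation_line(result: str) -> str:
    """Format a tool result for the Observe step, ready to append to the context."""
    return f"Observation: {result}"


step = parse_react_step("Thought: the user wants a flight; first I should search.\n"
                        "Action: search_flights[Tokyo, Wednesday]")
print(step.tool, "|", step.argument)  # search_flights | Tokyo, Wednesday
```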

Why the loop beats one giant call

It’s tempting to ask: why bother with a loop? Why not just write a bigger prompt that tells the model to do everything in one go — “first search flights, then check my calendar, then book it” — and let it produce the answer?

Two reasons, and both are practical.

First, you can’t predict what the tools will return. The flight search might return zero results, or exorbitant prices, or a schedule conflict the model didn’t know about. In a one-shot prompt, the model has to guess what the world will say and write a plan that handles every branch in advance. In a loop, it handles each branch as it comes, with the actual data in front of it. The loop is just the world’s cheapest form of reactive programming.

Second, errors compound. In a one-shot plan, a small mistake early — wrong date format, misread parameter, a tool that returned an unexpected shape — cascades through everything downstream. In a loop, the model sees the error, reads the context, and adjusts. It’s the same reason humans solve problems step-by-step rather than writing out the whole plan before starting.
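The model can only "see the error" if the harness feeds failures back instead of crashing. A minimal sketch, assuming tools are plain Python callables: the hypothetical `safe_run_tool` helper catches the exception and returns it as an observation, so the next iteration can read it and adjust (here, by reformatting a date).

```python
from datetime import date


def safe_run_tool(tools: dict, name: str, args: dict) -> dict:
    """Run a tool, but turn any failure into an observation instead of a crash."""
    if name not in tools:
        return {"role": "tool", "name": name, "is_error": True,
                "content": f"Unknown tool '{name}'. Available: {sorted(tools)}"}
    try:
        return {"role": "tool", "name": name, "is_error": False,
                "content": str(tools[name](**args))}
    except Exception as exc:  # deliberately broad: the model decides what to do next
        return {"role": "tool", "name": name, "is_error": True,
                "content": f"{type(exc).__name__}: {exc}"}


def search_flights(depart: str) -> str:
    # Hypothetical tool that insists on ISO dates (YYYY-MM-DD).
    day = date.fromisoformat(depart)
    return f"3 flights on {day.isoformat()}"


tools = {"search_flights": search_flights}
print(safe_run_tool(tools, "search_flights", {"depart": "next Wednesday"})["content"])  # a ValueError message
print(safe_run_tool(tools, "search_flights", {"depart": "2026-04-22"})["content"])      # 3 flights on 2026-04-22
```

The error text becomes part of the context like any other result, which is what lets the loop recover from the "wrong date format" class of mistake.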

The loop is not always better. For very short, linear tasks (one tool call, a single reply), loop overhead is wasted effort. But for anything resembling real work, the loop is dramatically more robust.

Termination: the problem nobody warns you about

The first agent loop you build will, almost without exception, either terminate too early (stopping before the job is done) or too late (spinning in circles at your expense).

Termination is surprisingly hard. A few strategies that work:

1. Goal-based termination. The model is told, in the system prompt, what “done” looks like: “You are finished when the user has a confirmed booking or an explicit failure explanation.” The loop ends when the model produces a plain-text reply instead of a tool call — a natural signal that it thinks it’s done.

2. Step budget. A hard cap — say, 20 iterations — on how many loop passes are allowed per user turn. If the cap is hit, the harness stops and returns whatever the model has gathered, with a note that the budget was exhausted. This is your insurance against infinite loops.

3. No-progress detection. If three consecutive iterations produce the same tool call with the same arguments, something’s wrong. Break the loop. This catches the most common failure mode: the model repeating itself when it’s genuinely stuck.

4. Time budget. For user-facing agents, a wall-clock timeout (say, 60 seconds) that ends the loop and surfaces partial progress to the user. Better to say “I couldn’t finish in time — here’s what I have so far” than to silently hang.

Production agents use all four, combined. The loop is the easy part. Stopping it gracefully is where most “autonomous agent” products quietly ship their worst code.
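One way to combine all four guards in a single loop is sketched below. The default limits (20 steps, 60 seconds, three identical calls) are the example values from the list above. The `model` object and its responses follow the same hypothetical interface as the minimal loop, `run_tool` is passed in as a `(name, args)` callable, and `AgentResult` is a type invented for this sketch.

```python
import json
import time
from dataclasses import dataclass


@dataclass
class AgentResult:
    text: str
    stop_reason: str  # "done", "step_budget", "no_progress", or "time_budget"
    steps: int


def run_guarded_agent(model, run_tool, context, tools,
                      max_steps=20, max_seconds=60.0, max_repeats=3):
    # The caller's `context` list is mutated in place, so it still holds
    # everything gathered so far whichever way the loop stops.
    started = time.monotonic()
    last_call, repeats = None, 0

    for step in range(1, max_steps + 1):
        # 4. Time budget: checked before spending another model call.
        if time.monotonic() - started > max_seconds:
            return AgentResult("I couldn't finish in time. Here is what I have so far.",
                               "time_budget", step - 1)

        response = model.generate(context, tools=tools)

        # 1. Goal-based termination: a plain-text reply means the model thinks it's done.
        if not response.is_tool_call():
            return AgentResult(response.text, "done", step)

        # 3. No-progress detection: the same tool with the same arguments, consecutively.
        call = (response.tool, json.dumps(response.input, sort_keys=True))
        repeats = repeats + 1 if call == last_call else 1
        last_call = call
        if repeats >= max_repeats:
            return AgentResult(f"I seem to be stuck repeating {response.tool}.",
                               "no_progress", step)

        context.append(response)
        context.append(run_tool(response.tool, response.input))

    # 2. Step budget: the loop ran out of iterations without a final reply.
    return AgentResult("Step budget exhausted before the task was finished.",
                       "step_budget", max_steps)
```

Returning a `stop_reason` alongside the text lets the product decide how to present each kind of stop, rather than treating every exit as success.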

The cost of a loop

Every iteration of the loop is a full model call, with all the accumulated context sent again. A 15-iteration loop means paying for 15 model calls, each one carrying a longer context than the last. Costs add up faster than people expect.

Back-of-the-envelope math: a single call on a frontier model costs maybe $0.02 for a simple turn. A 15-iteration loop on the same model might cost $0.50 — or several dollars, if the context has grown large. Multiply that by a few thousand user turns a day and the economics change.
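The growth is worse than linear because each iteration re-sends everything before it. If call i carries base + i·growth input tokens, the total over n calls is n·base + growth·n(n−1)/2, which grows roughly with the square of the iteration count. The numbers below (a 2,000-token start, 1,500 tokens added per step, $3 per million input tokens) are made-up placeholders, and output tokens are ignored.

```python
def total_input_tokens(iterations: int, base_tokens: int, growth_per_step: int) -> int:
    """Input tokens summed over a loop where call i re-sends base + i * growth tokens."""
    # Closed form of sum(base + i * growth for i in range(iterations));
    # n * (n - 1) is always even, so the integer division is exact.
    return iterations * base_tokens + growth_per_step * iterations * (iterations - 1) // 2


def loop_cost_usd(iterations, base_tokens, growth_per_step, usd_per_million_tokens):
    tokens = total_input_tokens(iterations, base_tokens, growth_per_step)
    return tokens * usd_per_million_tokens / 1_000_000


# Placeholder numbers, not real prices.
print(total_input_tokens(1, 2_000, 1_500))     # 2000
print(total_input_tokens(15, 2_000, 1_500))    # 187500
print(loop_cost_usd(15, 2_000, 1_500, 3.0))    # 0.5625
```

Under these assumptions the 15-step loop sends about 94 times the input tokens of a single call, not 15 times.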

Two practical mitigations:

Use smaller models for the inner loop. A cheap, fast model for routine steps; a strong model only when genuinely needed. Many production systems have a “router” pattern where the main loop runs on a small model and delegates tricky reasoning to a larger one.
Summarize instead of re-sending. When the loop gets long, replace old tool calls and results with a compact summary. The model can still see the gist of what happened without paying for every detail again.
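The summarize-instead-of-re-send idea can be sketched as a function that keeps the first few messages (system prompt, user goal) and the most recent ones verbatim, and folds everything in between into a single summary message. The `summarize` callable is an assumption: in practice it might be a call to a cheap model, but here it is any function from a list of messages to a string.

```python
from typing import Callable, List


def compact_context(context: List[str], keep_head: int, keep_tail: int,
                    summarize: Callable[[List[str]], str]) -> List[str]:
    """Replace the middle of a long context with one summary message.

    keep_head: leading messages to keep verbatim (e.g. system prompt and goal).
    keep_tail: most recent messages to keep verbatim.
    """
    if len(context) <= keep_head + keep_tail + 1:
        return list(context)  # too short to be worth compacting
    cut = len(context) - keep_tail  # explicit index, so keep_tail=0 also works
    head, middle, tail = context[:keep_head], context[keep_head:cut], context[cut:]
    return head + [f"Summary of earlier steps: {summarize(middle)}"] + tail


ctx = ["system", "goal", "call 1", "result 1", "call 2", "result 2", "call 3", "result 3"]
print(compact_context(ctx, 2, 2, lambda msgs: f"{len(msgs)} earlier messages"))
# ['system', 'goal', 'Summary of earlier steps: 4 earlier messages', 'call 3', 'result 3']
```

Many chat APIs expect each tool result to sit right after the tool call that produced it, so a production version would choose its cut points on call/result boundaries rather than at arbitrary indices.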

This is the same context-management game we’ll see again at the memory rung. Agents and memory share a budget: the context window.

Planner-executor: one more pattern

ReAct is the simplest agent loop. A common elaboration is planner-executor: split the agent into two models (or two calls of the same model), one that plans and one that executes.

The planner reads the user goal and produces a plan — a list of steps in natural language, without running anything yet.
The executor then runs through the plan step by step, calling tools for each, adjusting as results come in.
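A minimal sketch of the split, assuming the planner and executor are plain callables. In a real system `plan` would be one model call that returns a list of steps and `execute_step` would be a small agent loop per step; here both are stubbed with hypothetical functions so the control flow is visible, including the optional approval gate on the plan.

```python
from typing import Callable, List


def plan_and_execute(goal: str,
                     plan: Callable[[str], List[str]],
                     execute_step: Callable[[str, List[str]], str],
                     approve: Callable[[List[str]], bool] = lambda steps: True) -> List[str]:
    """Plan once up front, optionally get the plan approved, then execute step by step."""
    steps = plan(goal)          # planner: natural-language steps, nothing run yet
    if not approve(steps):      # the legible plan can be shown to the user first
        return []
    results: List[str] = []
    for step in steps:
        # The executor sees earlier results, so it can adjust as they come in.
        results.append(execute_step(step, results))
    return results


def fake_plan(goal: str) -> List[str]:
    return ["search flights", "check calendar", "book the best option"]


def fake_execute(step: str, prior: List[str]) -> str:
    return f"{step}: ok ({len(prior)} prior)"


print(plan_and_execute("Tokyo next Wednesday", fake_plan, fake_execute))
```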

The advantage: the plan is legible. You can show it to the user (“here’s what I’m going to do — okay?”), cache it, reuse it, or hand it to a different agent to execute. You also get clearer debugging when something goes wrong: you can inspect the plan and see where it diverged.

The cost is latency: you pay for an extra model call to produce the plan before any work starts. For short tasks, that's overkill. For long or consequential ones, it's often worth it.

What makes a loop feel “intelligent”

After you’ve built a few agent loops, you’ll notice that the ones that feel most intelligent to users share a few properties:

They narrate. A good loop tells the user what it’s doing, at the right granularity. Not “I just called tool_17” — “Checking your calendar for Wednesday.”
They ask before acting on anything irreversible. Permission prompts for destructive actions are not a nuisance; they’re a trust-building feature.
They know how to stop. They finish cleanly when done. They say they’re stuck when they’re stuck.
They don’t over-plan small things. A one-step request gets a one-step answer. A ten-step request gets a plan. Matching the response shape to the task is itself a design choice.
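Two of those behaviors, narration and asking before irreversible actions, can live in the harness rather than in the model. A minimal sketch: the tool names, the `NARRATION` templates, and the `IRREVERSIBLE` set are hypothetical, and `narrate` and `confirm` stand in for whatever status line and approval prompt the product actually uses.

```python
from typing import Callable, Dict

# Hypothetical: human-readable status lines instead of raw tool names.
NARRATION: Dict[str, str] = {
    "check_calendar": "Checking your calendar for {day}.",
    "search_flights": "Searching flights to {to}.",
    "book_flight": "Booking flight {flight_id}.",
}

# Hypothetical: tools whose effects can't be undone need explicit approval.
IRREVERSIBLE = {"book_flight", "send_email", "delete_file"}


def gated_call(name: str, args: dict, tools: Dict[str, Callable],
               narrate: Callable[[str], None],
               confirm: Callable[[str], bool]) -> str:
    """Narrate a tool call, asking for permission first if it is irreversible."""
    message = NARRATION.get(name, f"Running {name}.").format(**args)
    if name in IRREVERSIBLE and not confirm(message):
        # A refusal goes back to the model as an observation; nothing is raised.
        return f"User declined: {message}"
    narrate(message)
    return str(tools[name](**args))


log = []
tools = {"check_calendar": lambda day: f"free on {day}",
         "book_flight": lambda flight_id: f"booked {flight_id}"}
print(gated_call("check_calendar", {"day": "Wednesday"}, tools, log.append, lambda m: False))
print(log)  # ['Checking your calendar for Wednesday.']
```

Putting the gate in the harness means a model that forgets to ask still cannot book anything without approval.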

Building agents is mostly engineering these behaviors. The model is the engine; the loop, the budget, the narration, the termination — these are the car around it.

What loops still cannot do

The loop only runs during a single user turn. Once the turn ends, everything in the context is gone — unless something else preserves it. The model has no notion that you came back. It does not know that last Tuesday it helped you debug the same issue.

That’s the next rung. Memory. Not the fake memory the chat baseline provides by re-sending the transcript — real, persistent, retrievable memory that survives across sessions and builds over time.

Next in this series

Memory — How Agents Build Continuity