Multi-Step Agent Loops
When we solve a problem that requires multiple steps (booking a flight, writing an essay, diagnosing a fault), we rarely get it right in one attempt. We plan, try something, observe the result, decide whether to continue or adjust, and then loop back to the next step. Agents work the same way.
The simplest agent answers a single question: "What is the capital of France?" You ask, it retrieves a fact, it answers. No loop. But the moment a problem requires dependent steps, where the answer to step 2 depends on what step 1 produced and step 3 depends on the result of step 2, a single turn is not enough. The agent must loop: cycle back through plan → act → observe → decide, each time using new information from the previous iteration.
The Plan-Act-Observe-Decide Cycle
Consider an agent building a meal plan for a person with a nut allergy and a preference for Korean food. The agent cannot answer in one shot. It must:
- Plan: Decide what to do first. "I should search for Korean dishes that are naturally nut-free." Or: "I should ask how many meals they need, and whether they have other allergies."
- Act: Execute that step. Call a recipe API, or send back a clarifying question.
- Observe: Read and interpret the result. If it was a search, parse the recipes. If it was a question, wait for the user's answer.
- Decide: Look at what you now know. Is the goal met? "I have three recipes. Is that enough?" If yes, synthesise and return the answer. If no, loop back to Plan and work out what to do next.
Each loop iteration uses the output of the previous iteration. This dependency chain is what makes multi-step systems powerful, and fragile.
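The cycle above can be sketched as a small driver function. This is a minimal illustration, not a reference implementation: `LoopState`, `run_agent_loop`, the three callbacks, and the toy `CATALOG` are all hypothetical names and invented data, standing in for a real planner and a real recipe API.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    goal: str
    observations: list = field(default_factory=list)


def run_agent_loop(state, plan, act, is_done, max_steps):
    """Repeat plan -> act -> observe -> decide until done or out of budget."""
    for _ in range(max_steps):
        action = plan(state)               # Plan: choose the next action
        result = act(action)               # Act: execute it (tool call, question, ...)
        state.observations.append(result)  # Observe: keep the result for later steps
        if is_done(state):                 # Decide: stop here, or go round again
            break
    return state


# Invented catalogue standing in for a recipe API (hypothetical data).
CATALOG = [
    {"name": "dish-1", "contains_nuts": False},
    {"name": "dish-2", "contains_nuts": True},
    {"name": "dish-3", "contains_nuts": False},
    {"name": "dish-4", "contains_nuts": False},
    {"name": "dish-5", "contains_nuts": False},
]


def plan(state):
    # The next index to fetch depends on how many results were already observed.
    return len(state.observations)


def act(index):
    return CATALOG[index]


def is_done(state):
    safe = [r for r in state.observations if not r["contains_nuts"]]
    return len(safe) >= 3


final = run_agent_loop(
    LoopState(goal="three nut-free recipes"), plan, act, is_done,
    max_steps=len(CATALOG),
)
safe_names = [r["name"] for r in final.observations if not r["contains_nuts"]]
print(safe_names)
```

Note the `max_steps` budget: a loop that depends on its own output needs an explicit stopping condition besides "goal met", or a bad plan can run forever.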
Why Single-Turn Systems Fail on Dependent Tasks
Sewell Setzer III was 14 and lived in Orlando, Florida. He used Character.AI, speaking to a persona modelled on Daenerys Targaryen. Over months, the conversations escalated. He died by suicide in February 2024.
The system was optimised for character fidelity (staying in role, keeping the persona consistent), but it operated in single turns. Each message arrived fresh, with no memory of the conversation's emotional arc. A system that looped (reviewed previous messages, detected an escalating pattern of risk across turns, and paused to act differently) would have a mechanism to break the cycle. A single-turn system has no such mechanism. It replies in isolation.
This is not an indictment of the company alone. It's a structural truth: agents that need to understand context, detect drift, or accumulate evidence across time must loop. A single turn is insufficient when the stakes are high.
Dependent Steps in Practice
Imagine an agent auditing code for security vulnerabilities:
- Plan: "I'll scan the file for SQL injection patterns."
- Act: Run a regex search or AST parser over the code.
- Observe: Get a list of potential vulnerabilities.
- Decide: Are there enough findings? Have I checked all the necessary files? If not, loop: Plan to check the next file, or Plan to look for a different class of vulnerability (XSS, buffer overflow, etc.).
Without the loop, the agent scans once and stops, missing the second file and the second vulnerability type.
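A rough sketch of that audit loop follows. It assumes an in-memory `FILES` mapping and two deliberately naive regex `CHECKS`; both are invented for illustration, and a real scanner would rely on AST or data-flow analysis rather than patterns like these.

```python
import re

# Naive illustrative patterns (assumptions, not production-grade detectors).
CHECKS = {
    "sql_injection": re.compile(r"execute\(\s*[\"'].*[\"']\s*\+"),
    "xss": re.compile(r"innerHTML\s*="),
}

# Hypothetical file contents standing in for a real repository.
FILES = {
    "db.py": 'cursor.execute("SELECT * FROM users WHERE id = " + user_id)\n',
    "view.js": "el.innerHTML = userInput;\n",
    "util.py": "def add(a, b):\n    return a + b\n",
}


def audit(files, checks):
    # Work queue of every (file, vulnerability class) pair still to examine.
    pending = [(path, name) for path in files for name in checks]
    findings = []
    while pending:                        # Decide: is anything left unchecked?
        path, name = pending.pop(0)       # Plan: pick the next file and check
        lines = files[path].splitlines()
        for lineno, line in enumerate(lines, start=1):  # Act: scan the file
            if checks[name].search(line):
                findings.append((path, lineno, name))   # Observe: record the hit
    return findings


findings = audit(FILES, CHECKS)
for path, lineno, name in findings:
    print(f"{path}:{lineno}: possible {name}")
```

The queue makes the stopping rule explicit: the loop ends when every file has been checked for every class, not after the first scan.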
Consider scheduling a meeting across five time zones:
- Plan: "I need to find three time slots that work for all five participants."
- Act: Query each participant's calendar.
- Observe: Get back five different schedules.
- Decide: Do I have a slot that works for all five? If no, loop: Plan to suggest three candidate times and ask the group to vote, or Plan to ask who can shift their schedule. If yes, book the meeting and stop.
Again, the loop is essential. Without it, the agent would book a time based on participant 1's calendar alone.
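As a sketch of the scheduling example, the function below narrows the set of shared free hours one calendar at a time and falls back to proposing candidates when nothing overlaps. The participant names, the hour-granularity UTC availability sets, and the `find_slot` helper are all hypothetical simplifications; a real agent would query a calendar service and handle time-zone conversion.

```python
from collections import Counter

# Hypothetical availability: participant -> free hours, already converted to UTC.
AVAILABILITY = {
    "ana": {13, 14, 15, 16},
    "ben": {14, 15, 20},
    "chen": {7, 14, 15},
    "dee": {15, 16, 17},
    "eli": {6, 15, 22},
}


def find_slot(availability):
    common = None
    for free in availability.values():      # Act: one calendar per iteration
        # Observe: narrow the shared hours using the new calendar.
        common = set(free) if common is None else common & free
        if not common:                       # Decide: no shared slot can remain
            break
    if common:
        return {"status": "book", "hour_utc": min(common)}
    # No slot suits everyone: propose the three hours that suit the most people.
    counts = Counter(h for free in availability.values() for h in free)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {"status": "ask_group", "candidates_utc": [h for h, _ in ranked[:3]]}


booked = find_slot(AVAILABILITY)

# Same group, but one participant loses their only matching hour.
conflicted = dict(AVAILABILITY, eli={6, 22})
fallback = find_slot(conflicted)
print(booked, fallback)
```

Each pass through the loop depends on the intersection built so far, which is why looking at the first calendar alone gives the wrong answer.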
The Cost of Missing Loops
When agents lack looping, they:
- Fail on sequential reasoning. "Find the three smallest files in the folder" requires fetching all files, sorting them, then selecting: three dependent steps.
- Can't handle failures gracefully. If tool call 1 returns an error, a non-looping agent crashes. A looping agent can observe the error and decide to retry, or switch to a different tool.
- Miss emergent patterns. A deepfake detection bot that checks images one at a time will flag individual fakes. A looping system that reviews distribution patterns ("Are all the fakes of the same person? Are they clustered on the same Telegram channel?") can infer intent and escalation.
- Operate without memory. Each turn is amnesia. The agent cannot learn from its mistakes in previous iterations.
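The failure-handling point can be illustrated with a small retry-and-fallback loop. The tool functions and the `call_with_fallback` helper here are hypothetical; the broad `except Exception` is a shortcut for the sketch, and real code would catch only the errors a tool is known to raise.

```python
def call_with_fallback(tools, query, max_attempts_per_tool=2):
    """Try each tool in order, retrying on error, and remember every failure."""
    errors = []
    for name, tool in tools:
        for attempt in range(1, max_attempts_per_tool + 1):
            try:
                result = tool(query)           # Act
            except Exception as exc:           # Observe: the call failed
                errors.append((name, attempt, str(exc)))
                continue                       # Decide: retry, then switch tools
            return {"tool": name, "result": result, "errors": errors}
    return {"tool": None, "result": None, "errors": errors}


# Hypothetical tools: the primary always times out, the backup succeeds.
def primary_search(query):
    raise TimeoutError("timed out")


def backup_search(query):
    return f"results for {query!r}"


outcome = call_with_fallback(
    [("primary", primary_search), ("backup", backup_search)],
    "nut-free recipes",
)
print(outcome["tool"], outcome["result"])
```

The `errors` list is the loop's memory: the agent keeps a record of what failed on earlier iterations instead of starting each attempt from nothing.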
Looping is not optional. It is how systems handle anything that requires dependent, sequential reasoning, which is most real problems.