Multi-Step Agent Loops

When a human solves a problem that requires multiple steps—booking a flight, writing an essay, diagnosing a fault—we rarely get it right in one attempt. We plan, try something, observe the result, decide whether to continue or adjust, and then loop back to the next step. Agents work the same way.

The simplest agent answers a single question: "What is the capital of France?" You ask, it retrieves a fact, it answers. No loop. But the moment a problem requires dependent steps—where the answer to step 2 depends on what step 1 produced, and step 3 depends on the result of step 2—a single turn is not enough. The agent must loop: cycle back through plan → act → observe → decide, each time using new information from the previous iteration.

The Plan-Act-Observe-Decide Cycle

Consider an agent building a meal plan for a person with a nut allergy and a preference for Korean food. The agent cannot answer in one shot. It must:

Plan: Decide what to do first. "I should search for Korean dishes that are naturally nut-free." Or: "I should ask how many meals they need, and whether they have other allergies."
Act: Execute that step. Call a recipe API, or send back a clarifying question.
Observe: Read and interpret the result. If it was a search, parse the recipes. If it was a question, wait for the user's answer.
Decide: Look at what you now know. Is the goal met? "I have three recipes. Is that enough?" If yes, synthesise and return the answer. If no, loop back to Plan and work out what to do next.

Each loop iteration uses the output of the previous iteration. This dependency chain is what makes multi-step systems powerful—and fragile.
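The meal-plan cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production agent: the `search_recipes` tool, the `RECIPES` data and the page size are hypothetical stand-ins for a real recipe API, and Plan is reduced to "request the next page". It does show the dependency chain. Each iteration's search offset and its stopping decision depend on what earlier iterations observed.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory data standing in for a real recipe API.
RECIPES = [
    {"name": "Bibimbap", "cuisine": "korean", "contains_nuts": False},
    {"name": "Japchae", "cuisine": "korean", "contains_nuts": False},
    {"name": "Hotteok with peanut filling", "cuisine": "korean", "contains_nuts": True},
    {"name": "Kimchi jjigae", "cuisine": "korean", "contains_nuts": False},
    {"name": "Pad thai", "cuisine": "thai", "contains_nuts": True},
]

def search_recipes(cuisine: str, offset: int, limit: int) -> list[dict]:
    """Hypothetical paged search tool: up to `limit` matches starting at `offset`."""
    matches = [r for r in RECIPES if r["cuisine"] == cuisine]
    return matches[offset:offset + limit]

@dataclass
class State:
    offset: int = 0
    safe_recipes: list[str] = field(default_factory=list)

def run_agent(goal_count: int, page_size: int = 2, max_steps: int = 10) -> list[str]:
    state = State()
    for _ in range(max_steps):  # hard cap so the loop always terminates
        # Plan: the next step is to fetch the page after the last one we saw.
        # Act: call the (hypothetical) search tool.
        page = search_recipes("korean", state.offset, page_size)
        # Observe: keep only nut-free dishes and remember how far we got.
        state.safe_recipes += [r["name"] for r in page if not r["contains_nuts"]]
        state.offset += len(page)
        exhausted = len(page) < page_size
        # Decide: stop once the goal is met or there is nothing left to search.
        if len(state.safe_recipes) >= goal_count or exhausted:
            break
    return state.safe_recipes[:goal_count]

print(run_agent(goal_count=3))
```

With these sample recipes the first page yields two safe dishes, which is not enough, so the agent loops; the second page contains the peanut-filled hotteok, which is filtered out, and the jjigae completes the goal.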

Why Single-Turn Systems Fail on Dependent Tasks

Sewell Setzer III was a 14-year-old in Orlando, Florida. Over several months, he talked with a Character.AI chatbot persona modelled on Daenerys Targaryen, and the conversations escalated. In February 2024, he died by suicide.

The system was optimised for character fidelity: staying in role and keeping the persona consistent. Each reply was generated to continue the scene, not to assess the emotional arc of the conversation as a whole. A system that looped, periodically reviewing previous messages, detecting an escalating pattern of risk across turns, and pausing to act differently, would have had a mechanism to break the cycle. A system that only responds turn by turn has no such mechanism. It replies to each message in isolation.

This is not an indictment of the company alone. It's a structural truth: agents that need to understand context, detect drift, or accumulate evidence across time must loop. A single turn is insufficient when the stakes are high.
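The cross-turn review described above can be sketched as a small monitor. This is a hypothetical illustration, not taken from any real product: it assumes each message already has a risk score between 0 and 1 from some separate classifier, and the class name, window size and thresholds are illustrative values only. The point it shows is structural. A per-turn check only fires on a single clearly high-risk message, while a sliding window can catch sustained, moderate risk that no single turn reveals.

```python
from collections import deque

class EscalationMonitor:
    """Hypothetical cross-turn check over per-message risk scores (0.0 to 1.0),
    assumed to be produced by a separate classifier."""

    def __init__(self, window: int = 5, turn_threshold: float = 0.8,
                 window_threshold: float = 0.5):
        self.recent: deque[float] = deque(maxlen=window)  # only the last `window` turns
        self.turn_threshold = turn_threshold
        self.window_threshold = window_threshold

    def observe(self, risk_score: float) -> bool:
        """Record one turn; return True if the agent should pause and change behaviour."""
        self.recent.append(risk_score)
        if risk_score >= self.turn_threshold:
            return True  # a single clearly high-risk message
        if len(self.recent) == self.recent.maxlen:
            average = sum(self.recent) / len(self.recent)
            return average >= self.window_threshold  # sustained risk across turns
        return False  # not enough history yet to judge a pattern

monitor = EscalationMonitor()
flags = [monitor.observe(s) for s in [0.2, 0.5, 0.55, 0.6, 0.6, 0.65]]
print(flags)  # [False, False, False, False, False, True]
```

No message in the example crosses the per-turn threshold of 0.8; only the window average, which rises from 0.49 to 0.58 once the early low score drops out, flags the pattern. What the agent does after a flag, such as stepping out of character and pointing to support resources, is a separate design decision.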

Dependent Steps in Practice

Imagine an agent auditing code for security vulnerabilities:

Plan: "I'll scan the file for SQL injection patterns."
Act: Run a regex search or AST parser over the code.
Observe: Get a list of potential vulnerabilities.
Decide: Have I checked every file in scope, and every vulnerability class I set out to cover? If not, loop: Plan to check the next file, or Plan to look for a different class of vulnerability (XSS, buffer overflow, etc.).

Without the loop, the agent scans once and stops—missing the second file, missing the second vulnerability type.
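A minimal sketch of that audit loop, assuming the files are already loaded into a dict of name to source text. The two regexes in `CHECKS` are hypothetical, deliberately crude patterns; real scanners rely on parsers and data-flow analysis rather than line-by-line regex matching. The work queue is the part that matters: the agent keeps cycling until every (file, vulnerability class) pair has been covered.

```python
import re
from collections import deque

# Illustrative, regex-only checks (hypothetical; far cruder than a real scanner).
CHECKS = {
    # execute( followed by an f-string, or a string literal joined with + or %
    "sql_injection": re.compile(r"""execute\(\s*(f["']|["'][^"']*["']\s*[+%])"""),
    # direct assignment to innerHTML
    "xss": re.compile(r"\.innerHTML\s*="),
}

def audit(files: dict[str, str]) -> list[tuple[str, int, str]]:
    # Plan: queue every (file, vulnerability class) pair that still needs checking.
    queue = deque((name, check) for name in files for check in CHECKS)
    findings = []
    while queue:  # Decide: keep looping until nothing is left to check
        name, check = queue.popleft()
        # Act: run one check over one file, line by line.
        for lineno, line in enumerate(files[name].splitlines(), start=1):
            # Observe: record each match with its location.
            if CHECKS[check].search(line):
                findings.append((name, lineno, check))
    return findings

files = {
    "app.py": (
        'uid = request.args["id"]\n'
        'cursor.execute(f"SELECT * FROM users WHERE id = {uid}")\n'
        'cursor.execute("SELECT * FROM users WHERE id = %s", (uid,))'
    ),
    "view.js": "el.innerHTML = userInput;",
}
print(audit(files))  # [('app.py', 2, 'sql_injection'), ('view.js', 1, 'xss')]
```

Dropping the loop is equivalent to processing only the first queue entry: one file, one vulnerability class. Note that the parameterised query on line 3 is correctly left alone.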

Consider scheduling a meeting across five time zones:

Plan: "I need to find a time slot that works for all five participants."
Act: Query each participant's calendar.
Observe: Get back five different schedules.
Decide: Do I have a slot that works for all five? If no, loop: Plan to suggest three candidate times and ask the group to vote, or Plan to ask who can shift their schedule. If yes, book the meeting and stop.

Again, the loop is essential. Without it, the agent would book a time based on participant 1's calendar alone.
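The scheduling loop can be sketched as a narrowing search. This sketch assumes each calendar query has already been reduced to a set of free whole hours in UTC; that representation, the `schedule` function and the participant names are hypothetical. Each iteration reads one calendar and narrows the candidates left by the previous iteration. If the candidates ever run out, the agent switches plans and proposes the hours that suit the most people for a vote.

```python
from collections import Counter

def schedule(calendars: dict[str, set[int]], n_candidates: int = 3):
    candidates = set(range(24))  # every UTC hour is possible before any calendar is read
    for person, free_hours in calendars.items():
        # Act: query one participant's calendar (a dict lookup stands in for a real API).
        # Observe: keep only the hours this person is also free.
        narrowed = candidates & free_hours
        # Decide: if no common slot survives, change plan and ask the group to vote.
        if not narrowed:
            counts = Counter(h for hours in calendars.values() for h in hours)
            ranked = sorted(counts, key=lambda h: (-counts[h], h))
            return ("vote", ranked[:n_candidates])
        candidates = narrowed
    # A slot works for everyone: book the earliest one and stop.
    return ("book", min(candidates))

calendars = {
    "ana": {13, 14, 15, 16},
    "ben": {14, 15, 16, 17},
    "chen": {15, 16, 22},
    "dara": {6, 15, 16},
    "eli": {15, 20},
}
print(schedule(calendars))  # ('book', 15)
```

If `eli` were free only at 20:00 and 21:00 UTC, the fifth iteration would empty the candidate set and the function would return `("vote", [15, 16, 14])`, the three hours shared by the most participants.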

The Cost of Missing Loops

When agents lack looping, they:

Fail on sequential reasoning. "Find the three smallest files in the folder" requires fetching all files, sorting them, then selecting—three dependent steps.
Can't handle failures gracefully. If tool call 1 returns an error, a non-looping agent has no later step in which to recover: it fails outright or passes the error straight back. A looping agent can observe the error and decide to retry, or switch to a different tool.
Miss emergent patterns. A deepfake detection bot that checks images one at a time will flag individual fakes. A looping system that reviews distribution patterns ("Are all the fakes of the same person? Are they clustered on the same Telegram channel?") can infer intent and escalation.
Operate without memory. Each turn is amnesia. The agent cannot learn from its mistakes in previous iterations.
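The failure-handling and memory points can be sketched together. This is a minimal illustration under assumptions: `ToolError`, `call_with_fallback` and the two example tools are hypothetical names, and each tool is just a Python function taking a query string. The loop observes each error, retries with a growing pause, switches to the next tool when one keeps failing, and keeps a `history` of what went wrong across iterations.

```python
import time
from typing import Callable

class ToolError(Exception):
    """Hypothetical error type raised by a failing tool call."""

def call_with_fallback(tools: list[tuple[str, Callable[[str], str]]], query: str,
                       max_attempts: int = 2, backoff_s: float = 0.0) -> tuple[str, list[str]]:
    history: list[str] = []  # memory of failures, carried across loop iterations
    for name, tool in tools:
        for attempt in range(1, max_attempts + 1):
            try:
                return tool(query), history  # Act, then Decide: success ends the loop
            except ToolError as err:
                history.append(f"{name} attempt {attempt}: {err}")  # Observe the error
                if attempt < max_attempts:
                    time.sleep(backoff_s * attempt)  # Decide: wait longer, then retry
        # Decide: this tool keeps failing, so fall through to the next one.
    raise ToolError("all tools failed: " + "; ".join(history))

def flaky_search(query: str) -> str:
    raise ToolError("timeout")

def backup_search(query: str) -> str:
    return f"results for {query!r}"

result, history = call_with_fallback(
    [("primary", flaky_search), ("backup", backup_search)], "nut-free bibimbap"
)
print(result)   # results for 'nut-free bibimbap'
print(history)  # ['primary attempt 1: timeout', 'primary attempt 2: timeout']
```

Here the history only records failures; a fuller agent could feed it back into its Plan step, for example to stop calling a tool that has already timed out repeatedly.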

Looping is not optional. It is how systems handle anything that requires dependent, sequential reasoning—which is most real problems.