The five patterns behind every
multi-agent system that actually works
Design around context boundaries, not org charts. Start simple, earn complexity.
Not everything needs multiple agents. That sounds obvious, but the current hype cycle makes it easy to forget. Teams read about orchestrator-worker patterns and multi-agent collaboration, then immediately try to build a fleet of specialized agents for a problem that one well-prompted agent handles perfectly.
We see this constantly. A founder wants an AI that can research competitors, draft outbound messages, and summarize support tickets. The instinct is to build three agents and connect them. The right move, almost always, is to start with one agent that does all three sequentially, then split only when you hit a real bottleneck.
This article is a practical guide to making that call. When does one agent stop being enough? What are the actual patterns for multi-agent systems? And what goes wrong when you add complexity before you've earned it?
The single-agent baseline
A single agent with a good prompt, the right tools, and clear context can do far more than most teams realize. It can chain multiple steps, call APIs, read files, make decisions, and produce structured output. The limitation is not capability. The limitation is context.
Every model has a finite context window. As you pack more instructions, more tool outputs, and more intermediate results into that window, the agent's performance degrades. It starts forgetting earlier instructions. It conflates information from different sources. It makes connections that aren't there.
This is the only reason to use multiple agents: you have hit a context boundary that a single agent cannot manage. If you haven't hit that boundary, stay with one agent. It is simpler to debug, cheaper to run, and easier to maintain.
Redis put it well: single agents excel in low-latency sequential workflows. If your tasks run one after another and the output of each step fits comfortably in context alongside everything else, a single agent is the right architecture. Full stop.
Two modes: sub-agents vs. agent teams
When you do need multiple agents, there are two fundamentally different approaches. They solve different problems, and choosing the wrong one creates more friction than it removes.
Sub-agents: isolated workers
A sub-agent is a worker that receives a compressed task, runs in isolation, and returns a compressed result. The parent agent never shares its full context with the sub-agent. The sub-agent never sees what the other sub-agents are doing.
This is the pattern you want for embarrassingly parallel work. Research ten companies simultaneously. Analyze twenty files at once. Run code fixes across separate modules. Each task is independent. No sub-agent needs information from another sub-agent to do its job.
The key benefit is context protection. The parent agent stays lean. It sends out focused prompts, gets back summaries, and never pollutes its own context with the full details of each sub-task. This is how you scale without degrading quality.
Agent teams: persistent collaboration
An agent team is a group of agents that share a task list and communicate with each other. When Agent A discovers something relevant to Agent B's work, it updates the shared state. Agent B adjusts accordingly. Coordination is ongoing.
This is the pattern you want when discoveries reshape parallel tasks. You're building a complex feature and the database agent realizes the schema needs to change, which affects the API agent, which affects the frontend agent. They need to know about each other's decisions in real time.
Agent teams are powerful, but they are significantly harder to build and debug. The coordination overhead is real. Shared state introduces race conditions, stale reads, and conflicting updates. Use this pattern only when the work genuinely requires it.
The five orchestration patterns
Anthropic's research on building effective agents identified five core patterns that cover the vast majority of multi-agent use cases. These are not theoretical. We use variations of all five in production.
1. Prompt chaining
The simplest pattern. Agent A produces output, which becomes the input for Agent B, which produces output for Agent C. Strictly sequential. Each agent has a focused role and a clean context.
Use this when your workflow is naturally sequential and each step benefits from a fresh context. Example: one agent researches a topic, another structures the findings, a third writes the final output. Each step is self-contained.
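A prompt chain can be sketched in a few lines. Here `call_llm` is a stand-in for a real model client (OpenAI, Anthropic, etc.) and is stubbed for illustration; the pipeline prompts are hypothetical examples.

```python
# Minimal prompt-chaining sketch. `call_llm` is a placeholder for a
# real model API call, stubbed here so the example is self-contained.
def call_llm(system_prompt: str, user_input: str) -> str:
    # Placeholder: a real implementation would call a model API.
    return f"[{system_prompt}] processed: {user_input}"

def chain(steps: list[str], initial_input: str) -> str:
    """Run agents strictly in sequence; each output feeds the next step."""
    result = initial_input
    for system_prompt in steps:
        result = call_llm(system_prompt, result)  # fresh, focused context
    return result

pipeline = [
    "Research the topic and list key findings.",
    "Structure the findings into an outline.",
    "Write the final article from the outline.",
]
article = chain(pipeline, "agent orchestration patterns")
```

The point of the structure is that each step sees only its own instructions plus the previous output, never the full history of the chain.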
2. Routing
A classifier agent examines each incoming request and routes it to the right specialist. Customer support is a common use case: billing questions go to the billing agent, technical questions go to the tech agent, account changes go to the account agent.
Routing works because different tasks often require conflicting system prompts. A billing agent needs detailed knowledge of pricing tiers and refund policies. A tech agent needs access to documentation and error logs. Trying to stuff all of that into one agent degrades performance on everything.
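A routing layer can be sketched as a cheap classifier that selects among specialist prompts. Keyword matching stands in for the classifier model call here, and the specialist prompts are illustrative, not a real configuration.

```python
# Routing sketch: a cheap classifier picks the specialist system prompt.
# The keyword matcher is a stand-in for a small, fast classifier model.
SPECIALISTS = {
    "billing": "You know pricing tiers and refund policy in detail.",
    "tech": "You have access to documentation and error logs.",
    "account": "You handle account changes.",
}

def classify(request: str) -> str:
    """Stand-in for a classifier LLM call; returns a route label."""
    text = request.lower()
    if any(w in text for w in ("invoice", "refund", "charge")):
        return "billing"
    if any(w in text for w in ("error", "crash", "bug")):
        return "tech"
    return "account"

def route(request: str) -> tuple[str, str]:
    label = classify(request)
    return label, SPECIALISTS[label]  # each specialist keeps its own prompt

label, prompt = route("I was charged twice, need a refund")
```

Because the classifier only needs a label, it can run on the smallest model you have; the specialists never see each other's conflicting instructions.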
3. Parallelization
Multiple agents work simultaneously on independent tasks. Results are collected and merged. This is the sub-agent pattern in its purest form.
The critical requirement: tasks must be genuinely independent. If Agent A's output could change what Agent B should do, parallelization will produce inconsistent results. Test for independence before choosing this pattern.
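Because the tasks are independent, the fan-out can be as simple as a thread pool overlapping API latency. `run_subagent` below is a placeholder for an isolated model call:

```python
# Parallel sub-agents sketch: independent tasks fan out, results merge.
# `run_subagent` is a placeholder for a real model call in a clean context.
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Placeholder for an isolated LLM call with its own context.
    return f"summary of {task}"

def parallelize(tasks: list[str]) -> list[str]:
    with ThreadPoolExecutor(max_workers=5) as pool:
        return list(pool.map(run_subagent, tasks))  # order is preserved

results = parallelize(["Acme Corp", "Globex", "Initech"])
```

Note that `pool.map` returns results in input order, so the merge step stays deterministic even though execution order is not.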
4. Orchestrator-worker
A central orchestrator agent receives a complex task, breaks it into sub-tasks, delegates each to a worker agent, collects results, and synthesizes a final output. This is the most common pattern for complex workflows.
OpenAI calls this the “Manager pattern” in their Agents SDK. The manager uses tool calls to spawn workers. Each worker runs to completion and returns a result. The manager decides what to do next based on the aggregate output.
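The skeleton of the pattern, with all three model calls stubbed and a hard-coded decomposition for illustration, looks like this:

```python
# Orchestrator-worker sketch: plan, delegate, synthesize.
# All three functions are placeholders for real model calls.
def decompose(task: str) -> list[str]:
    # Stand-in for the orchestrator's planning call.
    return [f"{task}: part {i}" for i in range(1, 4)]

def worker(subtask: str) -> str:
    # Stand-in for a worker agent that runs to completion.
    return f"result({subtask})"

def orchestrate(task: str) -> str:
    subtasks = decompose(task)                # orchestrator plans
    results = [worker(s) for s in subtasks]   # workers run independently
    return " | ".join(results)                # orchestrator synthesizes

report = orchestrate("market analysis")
```

In a real system the worker calls would typically run in parallel and the synthesis step would be another LLM call, but the control flow is the same.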
5. Evaluator-optimizer
One agent generates, another evaluates. The output loops back for improvement until the evaluator is satisfied. Google's ADK documentation calls this the “Generator-Critic” pattern.
This is powerful for quality-sensitive outputs: code review, content editing, test validation. The generator agent doesn't need to be perfect on the first pass. It just needs to be good enough for the evaluator to refine.
When to use sub-agents
The decision to use sub-agents comes down to three questions:
- Is the parent's context getting too large? If your single agent is processing so much information that its performance is degrading, sub-agents let you offload work while keeping the parent focused.
- Are the tasks truly independent? If task A's output would never change how you approach task B, they can run in parallel as sub-agents. If they might, you need coordination.
- Do you need conflicting prompts? Sometimes the same workflow requires agents with fundamentally different instructions. A research agent should explore broadly. A summarization agent should be concise and opinionated. Putting both personas in one prompt creates tension.
The key mechanism is context compression. The parent sends a focused prompt to the sub-agent. The sub-agent does its work in isolation. The result gets compressed back into a summary before the parent sees it. The parent's context never bloats with the raw details.
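That boundary can be made explicit in code. In this sketch the sub-agent's raw output never crosses back into the parent; `summarize` is a naive truncation standing in for a real summarization call.

```python
# Context-compression sketch at the parent/sub-agent boundary.
# `summarize` stands in for a summarization LLM call; here it truncates.
def summarize(text: str, max_chars: int = 200) -> str:
    return text[:max_chars]

def run_subagent(task: str) -> str:
    # Placeholder: produces a long raw result the parent never sees.
    return ("raw finding about " + task + ". ") * 50

def delegate(task: str) -> str:
    raw = run_subagent(task)   # stays inside the sub-agent
    return summarize(raw)      # only the compressed result returns

compressed = delegate("competitor pricing")
```

The invariant to enforce is that `raw` is unbounded but `compressed` is capped, so the parent's context grows by a fixed amount per delegation regardless of how much work the sub-agent did.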
When to use agent teams
Agent teams make sense in a narrow set of situations:
- Discoveries change the plan. You are running a multi-part investigation and what Agent A finds in step two should change what Agent B does in step three. Without real-time coordination, Agent B will do wasted work.
- Shared state is essential. The agents need to read and write to the same data structure. A shared task list, a document they are co-editing, a database schema they are co-designing. This requires coordination primitives that sub-agents don't have.
- Quality depends on iteration between agents. One agent writes code, another reviews it, the first agent fixes the review comments. This back-and-forth loop is a team behavior, not a sub-agent behavior.
OpenAI's Agents SDK offers two patterns here. The “Manager pattern” keeps control centralized. The “Handoff pattern” transfers execution entirely to the next agent, like passing a baton. Handoff is simpler but gives up central oversight. Choose based on how much control you need.
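The minimum viable coordination primitive for a team is a shared task list plus a discovery log, guarded against concurrent writes. The class below is an illustrative in-memory sketch; production systems usually back this with Redis or a database.

```python
# Shared-state sketch for an agent team. A lock guards the shared
# structures so concurrent agents don't lose each other's updates.
import threading

class SharedState:
    """Task list and discovery log visible to every agent on the team."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.tasks: dict[str, str] = {}     # task -> status
        self.discoveries: list[str] = []    # findings other agents read

    def claim(self, task: str, agent: str) -> None:
        with self._lock:
            self.tasks[task] = f"claimed by {agent}"

    def publish(self, finding: str) -> None:
        with self._lock:  # avoid lost updates from concurrent agents
            self.discoveries.append(finding)

state = SharedState()
state.claim("design schema", "db-agent")
state.publish("schema needs a tenant_id column")
```

Even this toy version shows where the complexity comes from: every agent now needs to poll or subscribe to `discoveries`, and stale reads become possible the moment two agents act on the same finding.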
The decision framework
When you are standing at the whiteboard trying to decide how many agents you need, run through this checklist:
Start with one agent. Build the simplest version that works. Run it. Measure where it breaks. If it handles the workload within its context window without quality degradation, you are done.
Split when context is the bottleneck. If the agent's context window is overflowing, or if performance drops because it is juggling too many instructions, split the work into sub-agents. Keep the parent lean. Compress results on the way back.
Coordinate only when forced. If parallel tasks genuinely depend on each other's output, you need agent teams with shared state. But accept the cost: more complexity, harder debugging, higher token usage.
Andrew Ng's framework for agentic patterns maps well here. Start with tool use (single agent, multiple tools). Add reflection (evaluator-optimizer). Layer in planning (orchestrator-worker). Only reach for multi-agent collaboration when the simpler patterns are insufficient.
Microsoft's research reinforces this: they recommend starting single-agent to validate ROI before scaling to multi-agent architectures. Their analysis suggests that 40% or more of agentic projects may be canceled by 2027, often because teams over-architected before proving the basic value.
Three failure modes everyone hits
We have seen these in our own work and in every team we advise. They are predictable and preventable.
1. Vague task descriptions
The most common failure. You tell a sub-agent to “research the competitive landscape” and get back a generic summary that could apply to any industry. The sub-agent had no constraints, no specific questions to answer, no format requirements. It did its best with a vague prompt and produced vague output.
The fix: every sub-agent task should specify exactly what to find, where to look, what format to return, and what counts as done. Treat it like a work ticket for a contractor who has never seen your project before.
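One way to enforce that discipline is to make the work ticket a typed structure rather than free text. The field names and example values below are illustrative, not a standard schema.

```python
# Structured task spec sketch: the "work ticket" rendered as a prompt.
# Field names are illustrative, not a standard schema.
from dataclasses import dataclass

@dataclass
class SubAgentTask:
    objective: str        # exactly what to find
    sources: list[str]    # where to look
    output_format: str    # what shape to return
    done_when: str        # what counts as done

    def to_prompt(self) -> str:
        return (
            f"Objective: {self.objective}\n"
            f"Sources: {', '.join(self.sources)}\n"
            f"Return format: {self.output_format}\n"
            f"Done when: {self.done_when}"
        )

task = SubAgentTask(
    objective="List the top 5 competitors in dev-tools CI, with pricing",
    sources=["vendor pricing pages", "G2 category listings"],
    output_format="markdown table: name, price, target segment",
    done_when="5 rows, every cell filled or marked unknown",
)
prompt = task.to_prompt()
```

A dataclass like this also makes vagueness visible at construction time: you cannot spawn the sub-agent without filling in every field.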
2. Verification shortcuts
You build a multi-agent pipeline and skip verification between steps. Agent A produces output, Agent B consumes it without checking, Agent C builds on top of that. If Agent A made a subtle error, it compounds through the entire chain. By the time you see the final output, the error is buried three layers deep.
The fix: add verification at every handoff point. This can be as simple as a schema check on the output or as thorough as a dedicated evaluator agent. The cost of checking is always less than the cost of debugging a corrupted pipeline.
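The cheapest version of that check is a type-and-presence validation between agents. The required keys below are illustrative; a real pipeline might use JSON Schema or Pydantic instead.

```python
# Handoff verification sketch: a cheap schema check between agents.
def verify_handoff(output: dict, required: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the handoff is clean."""
    problems = []
    for key, expected_type in required.items():
        if key not in output:
            problems.append(f"missing field: {key}")
        elif not isinstance(output[key], expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}")
    return problems

schema = {"company": str, "findings": list, "confidence": float}
good = {"company": "Acme", "findings": ["pricing page"], "confidence": 0.9}
bad = {"company": "Acme", "confidence": "high"}

clean = verify_handoff(good, schema)
errors = verify_handoff(bad, schema)   # caught before Agent B consumes it
```

The key property is that `errors` surfaces at the boundary where the mistake happened, not three agents downstream.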
3. Token cost explosion
Multi-agent systems multiply token usage. Every agent call includes a system prompt, tool descriptions, and context. If you have an orchestrator that spawns five sub-agents, each making three tool calls, you are looking at 15+ LLM round trips per request. At production scale, this adds up fast.
The fix: measure token usage from day one. Set budgets per agent and per workflow. Use smaller, faster models for simple routing and classification. Reserve the expensive models for tasks that actually need advanced reasoning. Anthropic's guidance is clear: start with the simplest solution and only increase complexity when the measured benefit justifies the cost.
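A per-workflow budget can be enforced with a small accounting object that fails loudly when a request overruns. The token numbers below are illustrative, not real usage figures.

```python
# Per-workflow token budget sketch: charge each agent call against a
# hard cap and fail fast on overrun. Numbers are illustrative.
class TokenBudget:
    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, agent: str, tokens: int) -> None:
        self.used += tokens
        if self.used > self.max_tokens:
            raise RuntimeError(
                f"budget exceeded after {agent}: {self.used}/{self.max_tokens}"
            )

budget = TokenBudget(max_tokens=50_000)
budget.charge("router", 800)          # small model, cheap call
budget.charge("orchestrator", 6_000)
budget.charge("worker-1", 12_000)
remaining = budget.max_tokens - budget.used
```

Wiring this into every agent call gives you the "measure from day one" data for free, since `used` per agent is exactly the number you need to decide which calls to move to a cheaper model.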
Cost and token considerations
The economics of multi-agent systems are often underestimated. A single-agent workflow that costs $0.02 per request can easily become $0.15 per request when you add an orchestrator, three sub-agents, and a verification step. At 10,000 requests per day, that is the difference between $200 and $1,500 per day.
Practical strategies we use:
- Model tiering. Use fast, cheap models for routing and classification. Use capable models for generation and reasoning. Not every agent in the pipeline needs the most powerful model.
- Context compression at every boundary. When a sub-agent returns results, summarize before passing to the parent. When an orchestrator delegates, send only what the worker needs.
- Caching. If multiple requests trigger the same sub-agent with similar inputs, cache the results. This is especially effective for research and classification tasks where the underlying data changes slowly.
- Early termination. Build exit conditions into your loops. If the evaluator-optimizer pattern is running and the output is good enough after one iteration, stop. Don't burn tokens chasing marginal improvements.
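The caching strategy above can be sketched as a small TTL cache keyed on the sub-agent's input. The TTL value and key scheme are illustrative; swap in your own invalidation rules.

```python
# Sub-agent result cache sketch with a TTL, for slow-changing research
# data. TTL and key scheme are illustrative choices.
import time

class SubAgentCache:
    def __init__(self, ttl_seconds: float = 3600) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get_or_run(self, key: str, run) -> str:
        now = time.monotonic()
        if key in self._store:
            stamp, value = self._store[key]
            if now - stamp < self.ttl:
                return value            # cache hit: zero tokens spent
        value = run(key)                # cache miss: real sub-agent call
        self._store[key] = (now, value)
        return value

calls = []
def research(company: str) -> str:
    calls.append(company)               # track real LLM invocations
    return f"profile of {company}"

cache = SubAgentCache(ttl_seconds=3600)
first = cache.get_or_run("Acme", research)
second = cache.get_or_run("Acme", research)  # served from cache
```

For research-style tasks, a one-hour TTL often cuts token spend dramatically because the same handful of entities gets queried repeatedly within a session.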
Start simple, earn complexity
The core thesis is straightforward. Design around context boundaries, not around roles or org charts. The question is never “how many agents should we build?” The question is “where does context need to be isolated, compressed, or shared?”
If you are building your first agentic system, start with a single agent. Give it good tools, clear instructions, and structured context. Run it in production. Watch where it struggles.
When it struggles because the context is too large, split into sub-agents. When it struggles because tasks depend on each other, introduce coordination. When it struggles because quality isn't consistent, add evaluation loops.
Each step adds complexity. Each step should be justified by a specific, measured problem that the simpler architecture could not solve.
The teams that build the best multi-agent systems are the ones that resisted building multi-agent systems until they had no other choice. They understood the single-agent case deeply. They knew exactly where the boundaries needed to be. And when they split, they split cleanly.
Start simple. Earn complexity. The architecture will tell you when it's time.