Agent Orchestration & Handoffs
How you wire agents together determines whether your system is a reliable product or a demo that falls apart under real load -- the orchestration pattern is the most consequential architectural decision in any agentic system.
The Core Decision: Single vs. Multi-Agent
Before reaching for multi-agent, ask: can a single agent with good tools solve this? Most production systems that shipped in 2024-2025 are single-agent with tool use. Multi-agent adds latency, cost, debugging complexity, and failure modes. Use it when you genuinely need specialized personas, parallel work streams, or domain isolation.
Go single-agent when: the task is linear, tools are well-defined, and context fits in one window. Go multi-agent when: tasks require different system prompts/personas, you need parallel execution, or context separation improves reliability (e.g., a coding agent should not see customer PII from the support agent).
Pattern 1: Single Agent with Tools
The simplest pattern. One LLM, one system prompt, multiple tools. The agent decides which tool to call and when.
```
User --> [Agent] --> Tool A
                 --> Tool B
                 --> Tool C
```
When to use: Most internal tools, chatbots, copilots. Covers 70%+ of real use cases. Weakness: Falls apart when the agent needs contradictory instructions for different subtasks, or when context window fills up.
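The loop behind this pattern is small. A minimal sketch, with a stub standing in for the real LLM call and hypothetical tool names:

```python
# Minimal single-agent tool loop (sketch). `fake_model` stands in for a
# real LLM call; the tool names are hypothetical.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "get_time": lambda city: f"12:00 in {city}",
}

def fake_model(messages):
    """Stub LLM: requests a tool on the first turn, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather", "args": {"city": "Berlin"}}
    return {"answer": messages[-1]["content"]}

def run_agent(user_msg):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(10):  # hard cap on tool-call iterations
        decision = fake_model(messages)
        if "answer" in decision:
            return decision["answer"]
        # Execute the requested tool and feed the result back to the model
        result = TOOLS[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not converge")
```

The hard iteration cap matters in production: it is the single-agent equivalent of the handoff-depth limits discussed later.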
Pattern 2: Supervisor (Router)
A central “supervisor” agent receives the user request, decides which specialist agent handles it, and routes accordingly. The supervisor collects results and synthesizes a final response.
```
User --> [Supervisor] --> Agent A (research)
                      --> Agent B (code)
                      --> Agent C (data analysis)
     <-- [Supervisor] aggregates results
```
LangGraph implementation: the companion langgraph-supervisor package provides a create_supervisor helper. The supervisor is itself an LLM that emits structured routing decisions (handoff tool calls). Each sub-agent runs as a separate graph node with its own state, tools, and system prompt.
```python
# LangGraph supervisor pattern (conceptual; assumes the companion
# langgraph-supervisor package and pre-defined model/tool objects)
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

research_agent = create_react_agent(model, tools=[search, scrape], name="research")
code_agent = create_react_agent(model, tools=[execute_code, file_read], name="code")

supervisor = create_supervisor(
    [research_agent, code_agent],
    model=model,
    prompt="Route to the appropriate specialist agent.",
)
graph = supervisor.compile()
```
Strengths: Clean separation of concerns, each agent gets a focused system prompt, easy to add/remove specialists. Weaknesses: Supervisor is a single point of failure. Adds one extra LLM call per turn. Supervisor can misroute.
Pattern 3: Hierarchical Teams
Supervisors managing supervisors. A top-level orchestrator delegates to team leads, each of which manages a group of specialist agents. This is the pattern for complex enterprise workflows.
```
User --> [Orchestrator]
         --> [Team Lead: Research]    --> Searcher, Summarizer
         --> [Team Lead: Engineering] --> Coder, Reviewer, Deployer
         --> [Team Lead: QA]          --> Tester, Security Scanner
```
When to use: Large-scope tasks like “build and deploy a feature” or “research, draft, review, and publish a report.” Enterprise AI platforms with distinct functional domains.
Key challenge: Error propagation. A failure three levels deep is hard to surface meaningfully. Build explicit error channels – don’t rely on LLM summarization of failures.
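One way to build such a channel is to wrap every delegated call in a structured error envelope that records the full delegation path, so the orchestrator sees the original error verbatim instead of an LLM's paraphrase of it. A sketch (all names are illustrative):

```python
# Explicit error channel for hierarchical delegation (sketch).
# Failures bubble up as structured data, never as LLM-summarized prose.
from dataclasses import dataclass, field

@dataclass
class AgentError:
    agent: str                                 # which agent failed
    error: str                                 # original error message, verbatim
    path: list = field(default_factory=list)   # delegation chain to the failure

def run_with_error_channel(agent_name, fn, parent_path=()):
    """Run one agent's work; on failure, return a structured envelope."""
    try:
        return {"ok": True, "result": fn()}
    except Exception as e:
        return {"ok": False,
                "error": AgentError(agent=agent_name, error=str(e),
                                    path=[*parent_path, agent_name])}
```

A failure three levels deep then surfaces as `path=["orchestrator", "qa_lead", "tester"]` with the raw exception text attached.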
Pattern 4: Handoffs (OpenAI Agents SDK / Anthropic Pattern)
Instead of a supervisor routing from above, agents hand off control to each other laterally. The active agent decides “I’m done, Agent B should take over” and transfers context.
OpenAI Agents SDK made this a first-class primitive:
```python
from agents import Agent, handoff

# billing_agent and technical_agent would normally be defined elsewhere;
# minimal definitions are shown so the snippet is self-contained.
billing_agent = Agent(name="Billing", instructions="Resolve billing questions.")
technical_agent = Agent(name="Technical", instructions="Resolve technical issues.")

triage_agent = Agent(
    name="Triage",
    instructions="Determine the user's intent and hand off.",
    handoffs=[
        handoff(billing_agent, tool_description_override="Billing questions"),
        handoff(technical_agent, tool_description_override="Technical support"),
    ],
)
```
The handoff transfers the conversation history (or a filtered subset) to the target agent. The target agent becomes the active agent and can hand off further or respond to the user.
Anthropic’s recommended pattern is similar: use tool calls to transfer control. The orchestrator loop checks if the agent called a transfer_to_X tool and swaps the active agent.
```python
# Anthropic-style handoff via tool use
tools = [{
    "name": "transfer_to_billing",
    "description": "Hand off to the billing specialist",
    "input_schema": {
        "type": "object",
        "properties": {"summary": {"type": "string"}},
    },
}]
# The orchestrator loop detects this tool call and swaps the active agent
```
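The surrounding loop might look like the sketch below, where `call_model` stands in for a real API call and the `transfer_to_*` naming convention is an assumption, not an SDK feature:

```python
# Orchestrator loop that watches for transfer_to_* tool calls and swaps
# the active agent (sketch; `call_model` is a stand-in for a real LLM call).
TRANSFER_PREFIX = "transfer_to_"

def orchestrate(agents, call_model, user_msg, max_handoffs=5):
    active = "triage"
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_handoffs + 1):
        response = call_model(agents[active], messages)
        tool_call = response.get("tool_call")
        if tool_call and tool_call["name"].startswith(TRANSFER_PREFIX):
            # Swap the active agent; carry the handoff summary forward
            active = tool_call["name"][len(TRANSFER_PREFIX):]
            messages.append({"role": "assistant",
                             "content": tool_call["input"].get("summary", "")})
            continue
        return active, response["text"]
    raise RuntimeError("handoff limit exceeded")
```

Note the hard cap on handoffs: without it, two agents that each believe the other owns the task will ping-pong forever.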
Strengths: Natural conversation flow, no extra supervisor LLM call, agents self-organize. Weaknesses: Agents can enter handoff loops (A hands to B, B hands back to A). Requires handoff guards – max handoff depth, cycle detection, or a “no-return” flag.
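A minimal guard combines a depth cap with cycle detection (sketch; names are illustrative):

```python
# Handoff guard: depth cap plus cycle detection (sketch).
def guard_handoff(history, target, max_depth=5):
    """Decide whether a handoff to `target` is allowed.

    `history` is the ordered list of agents that have held control so far.
    Returns (allowed, reason).
    """
    if len(history) >= max_depth:
        return False, "max handoff depth reached"
    if target in history:
        return False, f"cycle: {target} already held control"
    return True, "ok"
```

The cycle check implements the "no-return" flag in its strictest form: an agent that has already held control can never receive it again in the same conversation. Relax it per-agent if legitimate round-trips (e.g. back to triage) are expected.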
Pattern 5: Swarm / Mesh
All agents can communicate with any other agent. No hierarchy. Agents broadcast messages or publish to shared state, and any agent can pick up work.
When to use: Rare in production. Useful for brainstorming/debate patterns (multiple agents argue different positions) or simulation. CrewAI uses a lightweight version of this.
Why it’s risky in production: Non-deterministic execution order, hard to debug, unpredictable cost. Every agent potentially triggers every other agent.
Handoff Protocol Design
Regardless of pattern, handoffs need a protocol. Key decisions:
| Decision | Options | Recommendation |
|---|---|---|
| Context transfer | Full history, summary only, structured handoff object | Structured object for production (control token count) |
| State preservation | Shared memory, passed in handoff, reconstructed | Shared memory store (Redis/DB) for enterprise |
| Error handling | Retry, escalate, fallback agent | Escalate to human after 2 retries |
| Handoff trigger | Agent decides, orchestrator decides, rule-based | Agent decides with orchestrator guardrails |
| Max depth | Unlimited, fixed cap | Fixed cap (3-5 handoffs typical) |
Structured Handoff Object
```json
{
  "from_agent": "triage",
  "to_agent": "billing",
  "summary": "Customer wants to dispute charge from March 15",
  "context": {
    "customer_id": "cust_123",
    "relevant_facts": ["charge_amount: 49.99", "dispute_reason: duplicate"]
  },
  "constraints": {
    "max_resolution_time": "5min",
    "allowed_actions": ["refund", "escalate_to_human"]
  }
}
```
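It pays to validate an incoming handoff object before activating the target agent, so a malformed transfer fails loudly at the boundary rather than mid-conversation. A minimal sketch against the fields above (the specific checks are illustrative):

```python
# Validate a structured handoff object at the receiving boundary (sketch).
REQUIRED_FIELDS = {"from_agent", "to_agent", "summary", "context", "constraints"}

def validate_handoff(obj):
    missing = REQUIRED_FIELDS - obj.keys()
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    if not obj["constraints"].get("allowed_actions"):
        raise ValueError("handoff must whitelist allowed_actions")
    return obj
```

In production you would likely reach for a schema library (e.g. pydantic) instead, but the principle is the same: the handoff object is a contract, and contracts get checked.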
When to Use Which Pattern
| Scenario | Pattern | Why |
|---|---|---|
| Customer support bot | Handoffs | Natural routing between triage, billing, tech |
| Code generation pipeline | Supervisor | Clear stages: plan, code, review, test |
| Enterprise AI platform (MMS-scale) | Hierarchical | Multiple domains, teams, governance layers |
| Side venture MVP | Single agent | Ship fast, add complexity only when needed |
| Research + synthesis | Supervisor | Parallelize research, centralize synthesis |
| Async overnight tasks | Supervisor + queue | Supervisor dispatches, agents work from queue |
Production Considerations
Observability: Every handoff and agent invocation must emit a trace. Use OpenTelemetry spans per agent turn. Without this, debugging multi-agent systems is impossible.
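The per-turn bookkeeping looks roughly like this stdlib-only stand-in (a sketch; in production you would emit real OpenTelemetry spans via the opentelemetry SDK rather than append to a list):

```python
# Per-agent-turn "span" using only the stdlib (sketch; stand-in for
# real OpenTelemetry spans).
import time
import uuid
from contextlib import contextmanager

TRACE_LOG = []  # in production: an OTel exporter, not an in-memory list

@contextmanager
def agent_span(agent_name, handoff_from=None):
    span = {"span_id": uuid.uuid4().hex, "agent": agent_name,
            "handoff_from": handoff_from, "start": time.monotonic()}
    try:
        yield span
    finally:
        # Record duration even if the agent turn raised
        span["duration_s"] = time.monotonic() - span["start"]
        TRACE_LOG.append(span)
```

Wrapping every agent turn in such a span gives you the two things multi-agent debugging lives on: who held control, and for how long.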
Determinism: Multi-agent systems are inherently less deterministic. For enterprise, pin model versions, use temperature=0 for routing decisions, and log every intermediate state.
Latency: Each agent hop adds 1-5 seconds. A 4-agent chain is 4-20 seconds. For real-time UX, parallelize where possible and stream intermediate results.
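The fan-out is a plain asyncio.gather over independent agent calls, collapsing a sequential chain into roughly one hop of wall-clock latency (sketch; `call_agent` stands in for a real LLM call):

```python
# Parallelize independent agent calls with asyncio (sketch).
import asyncio

async def call_agent(name, query):
    await asyncio.sleep(0.01)  # stands in for an LLM/network round trip
    return f"{name}: {query}"

async def parallel_research(query):
    # Three independent agents run concurrently; gather preserves order
    return await asyncio.gather(
        call_agent("searcher", query),
        call_agent("summarizer", query),
        call_agent("fact_checker", query),
    )
```

This only works when the calls are genuinely independent; if agent B needs agent A's output, you are back to sequential hops, which is exactly why the supervisor should plan the dependency graph before dispatching.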
Testing: Test individual agents in isolation first. Then test handoff pairs. Full end-to-end multi-agent tests are slow and flaky – use them sparingly, mostly for regression.
Key Frameworks & Their Patterns
| Framework | Primary Pattern | Handoff Support | Production-Ready |
|---|---|---|---|
| LangGraph | Supervisor, hierarchical | Via graph edges | Yes |
| OpenAI Agents SDK | Handoffs | First-class | Yes |
| Anthropic Claude Agent SDK | Single agent, handoffs via tools | Via tool calls | Yes |
| CrewAI | Role-based (mesh-lite) | Implicit | Maturing |
| AutoGen (Microsoft) | Conversation-based multi-agent | Chat-based | Research-grade |
| Google ADK | Agent-to-agent, hierarchical | Built-in delegation | Early |
References
- Building effective agents – Anthropic (2024), argues for simplicity, single-agent-first
- OpenAI Agents SDK documentation – handoff primitives and Swarm predecessor
- LangGraph documentation – supervisor and hierarchical patterns
- Harrison Chase on agent orchestration – “Multi-agent is not a framework problem, it’s a design problem” (2024)
- Andrew Ng: Agentic Design Patterns – series (2024), reflection, tool use, planning, multi-agent. Also available as a DeepLearning.AI course