Agent Orchestration & Handoffs
How you wire agents together determines whether your system is a reliable product or a demo that falls apart under real load -- the orchestration pattern is the most consequential architectural decision in any agentic system.
The Core Decision: Single vs. Multi-Agent
Before reaching for multi-agent, ask: can a single agent with good tools solve this? Most production systems that shipped in 2024-2025 are single-agent with tool use. Multi-agent adds latency, cost, debugging complexity, and failure modes. Use it when you genuinely need specialized personas, parallel work streams, or domain isolation.
Go single-agent when: the task is linear, tools are well-defined, and context fits in one window. Go multi-agent when: tasks require different system prompts/personas, you need parallel execution, or context separation improves reliability (e.g., a coding agent should not see customer PII from the support agent).
Pattern 1: Single Agent with Tools
The simplest pattern. One LLM, one system prompt, multiple tools. The agent decides which tool to call and when.
```
User --> [Agent] --> Tool A
                 --> Tool B
                 --> Tool C
```
When to use: Most internal tools, chatbots, copilots. Covers 70%+ of real use cases. Weakness: Falls apart when the agent needs contradictory instructions for different subtasks, or when context window fills up.
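The loop behind this pattern is small. A minimal sketch, with a stub standing in for the real LLM call and hypothetical tool names:

```python
# Minimal single-agent tool loop (sketch). `fake_model` stands in for a
# real LLM call; the tool names are hypothetical.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "get_time": lambda city: f"12:00 in {city}",
}

def fake_model(messages):
    """Stub LLM: requests a tool on the first turn, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather", "args": {"city": "Berlin"}}
    return {"answer": messages[-1]["content"]}

def run_agent(user_msg):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(10):  # hard cap on tool-call iterations
        decision = fake_model(messages)
        if "answer" in decision:
            return decision["answer"]
        # Execute the requested tool and feed the result back to the model
        result = TOOLS[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not converge")
```

The hard iteration cap matters in production: it is the single-agent equivalent of the handoff-depth limits discussed later.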
Pattern 2: Supervisor (Router)
A central “supervisor” agent receives the user request, decides which specialist agent handles it, and routes accordingly. The supervisor collects results and synthesizes a final response.
```
User --> [Supervisor] --> Agent A (research)
                      --> Agent B (code)
                      --> Agent C (data analysis)
     <-- [Supervisor] aggregates results
```
LangGraph implementation: the companion langgraph-supervisor package provides a create_supervisor helper. The supervisor is itself an LLM that emits structured routing decisions (handoff tool calls). Each sub-agent runs as a separate graph node with its own state, tools, and system prompt.
```python
# LangGraph supervisor pattern (conceptual; assumes the companion
# langgraph-supervisor package and pre-defined model/tool objects)
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

research_agent = create_react_agent(model, tools=[search, scrape], name="research")
code_agent = create_react_agent(model, tools=[execute_code, file_read], name="code")

supervisor = create_supervisor(
    [research_agent, code_agent],
    model=model,
    prompt="Route to the appropriate specialist agent.",
)
graph = supervisor.compile()
```
Strengths: Clean separation of concerns, each agent gets a focused system prompt, easy to add/remove specialists. Weaknesses: Supervisor is a single point of failure. Adds one extra LLM call per turn. Supervisor can misroute.
Pattern 3: Hierarchical Teams
Supervisors managing supervisors. A top-level orchestrator delegates to team leads, each of which manages a group of specialist agents. This is the pattern for complex enterprise workflows.
```
User --> [Orchestrator]
         --> [Team Lead: Research]    --> Searcher, Summarizer
         --> [Team Lead: Engineering] --> Coder, Reviewer, Deployer
         --> [Team Lead: QA]          --> Tester, Security Scanner
```
When to use: Large-scope tasks like “build and deploy a feature” or “research, draft, review, and publish a report.” Enterprise AI platforms with distinct functional domains.
Key challenge: Error propagation. A failure three levels deep is hard to surface meaningfully. Build explicit error channels – don’t rely on LLM summarization of failures.
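One way to build such a channel is to wrap every delegated call in a structured error envelope that records the full delegation path, so the orchestrator sees the original error verbatim instead of an LLM's paraphrase of it. A sketch (all names are illustrative):

```python
# Explicit error channel for hierarchical delegation (sketch).
# Failures bubble up as structured data, never as LLM-summarized prose.
from dataclasses import dataclass, field

@dataclass
class AgentError:
    agent: str                                 # which agent failed
    error: str                                 # original error message, verbatim
    path: list = field(default_factory=list)   # delegation chain to the failure

def run_with_error_channel(agent_name, fn, parent_path=()):
    """Run one agent's work; on failure, return a structured envelope."""
    try:
        return {"ok": True, "result": fn()}
    except Exception as e:
        return {"ok": False,
                "error": AgentError(agent=agent_name, error=str(e),
                                    path=[*parent_path, agent_name])}
```

A failure three levels deep then surfaces as `path=["orchestrator", "qa_lead", "tester"]` with the raw exception text attached.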
Pattern 4: Handoffs (OpenAI Agents SDK / Anthropic Pattern)
Instead of a supervisor routing from above, agents hand off control to each other laterally. The active agent decides “I’m done, Agent B should take over” and transfers context.
OpenAI Agents SDK made this a first-class primitive:
```python
from agents import Agent, handoff

# billing_agent and technical_agent would normally be defined elsewhere;
# minimal definitions are shown so the snippet is self-contained.
billing_agent = Agent(name="Billing", instructions="Resolve billing questions.")
technical_agent = Agent(name="Technical", instructions="Resolve technical issues.")

triage_agent = Agent(
    name="Triage",
    instructions="Determine the user's intent and hand off.",
    handoffs=[
        handoff(billing_agent, tool_description_override="Billing questions"),
        handoff(technical_agent, tool_description_override="Technical support"),
    ],
)
```
The handoff transfers the conversation history (or a filtered subset) to the target agent. The target agent becomes the active agent and can hand off further or respond to the user.
Anthropic’s recommended pattern is similar: use tool calls to transfer control. The orchestrator loop checks if the agent called a transfer_to_X tool and swaps the active agent.
```python
# Anthropic-style handoff via tool use
tools = [{
    "name": "transfer_to_billing",
    "description": "Hand off to the billing specialist",
    "input_schema": {
        "type": "object",
        "properties": {"summary": {"type": "string"}},
    },
}]
# The orchestrator loop detects this tool call and swaps the active agent
```
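The surrounding loop might look like the sketch below, where `call_model` stands in for a real API call and the `transfer_to_*` naming convention is an assumption, not an SDK feature:

```python
# Orchestrator loop that watches for transfer_to_* tool calls and swaps
# the active agent (sketch; `call_model` is a stand-in for a real LLM call).
TRANSFER_PREFIX = "transfer_to_"

def orchestrate(agents, call_model, user_msg, max_handoffs=5):
    active = "triage"
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_handoffs + 1):
        response = call_model(agents[active], messages)
        tool_call = response.get("tool_call")
        if tool_call and tool_call["name"].startswith(TRANSFER_PREFIX):
            # Swap the active agent; carry the handoff summary forward
            active = tool_call["name"][len(TRANSFER_PREFIX):]
            messages.append({"role": "assistant",
                             "content": tool_call["input"].get("summary", "")})
            continue
        return active, response["text"]
    raise RuntimeError("handoff limit exceeded")
```

Note the hard cap on handoffs: without it, two agents that each believe the other owns the task will ping-pong forever.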
Strengths: Natural conversation flow, no extra supervisor LLM call, agents self-organize. Weaknesses: Agents can enter handoff loops (A hands to B, B hands back to A). Requires handoff guards – max handoff depth, cycle detection, or a “no-return” flag.
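A minimal guard combines a depth cap with cycle detection (sketch; names are illustrative):

```python
# Handoff guard: depth cap plus cycle detection (sketch).
def guard_handoff(history, target, max_depth=5):
    """Decide whether a handoff to `target` is allowed.

    `history` is the ordered list of agents that have held control so far.
    Returns (allowed, reason).
    """
    if len(history) >= max_depth:
        return False, "max handoff depth reached"
    if target in history:
        return False, f"cycle: {target} already held control"
    return True, "ok"
```

The cycle check implements the "no-return" flag in its strictest form: an agent that has already held control can never receive it again in the same conversation. Relax it per-agent if legitimate round-trips (e.g. back to triage) are expected.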
Pattern 5: Swarm / Mesh
All agents can communicate with any other agent. No hierarchy. Agents broadcast messages or publish to shared state, and any agent can pick up work.
When to use: Rare in production. Useful for brainstorming/debate patterns (multiple agents argue different positions) or simulation. CrewAI uses a lightweight version of this.
Why it’s risky in production: Non-deterministic execution order, hard to debug, unpredictable cost. Every agent potentially triggers every other agent.
Handoff Protocol Design
Regardless of pattern, handoffs need a protocol. Key decisions:
| Decision | Options | Recommendation |
|---|---|---|
| Context transfer | Full history, summary only, structured handoff object | Structured object for production (control token count) |
| State preservation | Shared memory, passed in handoff, reconstructed | Shared memory store (Redis/DB) for enterprise |
| Error handling | Retry, escalate, fallback agent | Escalate to human after 2 retries |
| Handoff trigger | Agent decides, orchestrator decides, rule-based | Agent decides with orchestrator guardrails |
| Max depth | Unlimited, fixed cap | Fixed cap (3-5 handoffs typical) |
Structured Handoff Object
```json
{
  "from_agent": "triage",
  "to_agent": "billing",
  "summary": "Customer wants to dispute charge from March 15",
  "context": {
    "customer_id": "cust_123",
    "relevant_facts": ["charge_amount: 49.99", "dispute_reason: duplicate"]
  },
  "constraints": {
    "max_resolution_time": "5min",
    "allowed_actions": ["refund", "escalate_to_human"]
  }
}
```
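It pays to validate an incoming handoff object before activating the target agent, so a malformed transfer fails loudly at the boundary rather than mid-conversation. A minimal sketch against the fields above (the specific checks are illustrative):

```python
# Validate a structured handoff object at the receiving boundary (sketch).
REQUIRED_FIELDS = {"from_agent", "to_agent", "summary", "context", "constraints"}

def validate_handoff(obj):
    missing = REQUIRED_FIELDS - obj.keys()
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    if not obj["constraints"].get("allowed_actions"):
        raise ValueError("handoff must whitelist allowed_actions")
    return obj
```

In production you would likely reach for a schema library (e.g. pydantic) instead, but the principle is the same: the handoff object is a contract, and contracts get checked.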
When to Use Which Pattern
| Scenario | Pattern | Why |
|---|---|---|
| Customer support bot | Handoffs | Natural routing between triage, billing, tech |
| Code generation pipeline | Supervisor | Clear stages: plan, code, review, test |
| Enterprise AI platform (MMS-scale) | Hierarchical | Multiple domains, teams, governance layers |
| Side venture MVP | Single agent | Ship fast, add complexity only when needed |
| Research + synthesis | Supervisor | Parallelize research, centralize synthesis |
| Async overnight tasks | Supervisor + queue | Supervisor dispatches, agents work from queue |
Production Considerations
Observability: Every handoff and agent invocation must emit a trace. Use OpenTelemetry spans per agent turn. Without this, debugging multi-agent systems is impossible.
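The per-turn bookkeeping looks roughly like this stdlib-only stand-in (a sketch; in production you would emit real OpenTelemetry spans via the opentelemetry SDK rather than append to a list):

```python
# Per-agent-turn "span" using only the stdlib (sketch; stand-in for
# real OpenTelemetry spans).
import time
import uuid
from contextlib import contextmanager

TRACE_LOG = []  # in production: an OTel exporter, not an in-memory list

@contextmanager
def agent_span(agent_name, handoff_from=None):
    span = {"span_id": uuid.uuid4().hex, "agent": agent_name,
            "handoff_from": handoff_from, "start": time.monotonic()}
    try:
        yield span
    finally:
        # Record duration even if the agent turn raised
        span["duration_s"] = time.monotonic() - span["start"]
        TRACE_LOG.append(span)
```

Wrapping every agent turn in such a span gives you the two things multi-agent debugging lives on: who held control, and for how long.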
Determinism: Multi-agent systems are inherently less deterministic. For enterprise, pin model versions, use temperature=0 for routing decisions, and log every intermediate state.
Latency: Each agent hop adds 1-5 seconds. A 4-agent chain is 4-20 seconds. For real-time UX, parallelize where possible and stream intermediate results.
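The fan-out is a plain asyncio.gather over independent agent calls, collapsing a sequential chain into roughly one hop of wall-clock latency (sketch; `call_agent` stands in for a real LLM call):

```python
# Parallelize independent agent calls with asyncio (sketch).
import asyncio

async def call_agent(name, query):
    await asyncio.sleep(0.01)  # stands in for an LLM/network round trip
    return f"{name}: {query}"

async def parallel_research(query):
    # Three independent agents run concurrently; gather preserves order
    return await asyncio.gather(
        call_agent("searcher", query),
        call_agent("summarizer", query),
        call_agent("fact_checker", query),
    )
```

This only works when the calls are genuinely independent; if agent B needs agent A's output, you are back to sequential hops, which is exactly why the supervisor should plan the dependency graph before dispatching.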
Testing: Test individual agents in isolation first. Then test handoff pairs. Full end-to-end multi-agent tests are slow and flaky – use them sparingly, mostly for regression.
Key Frameworks & Their Patterns
| Framework | Primary Pattern | Handoff Support | Production-Ready |
|---|---|---|---|
| LangGraph | Supervisor, hierarchical | Via graph edges | Yes |
| OpenAI Agents SDK | Handoffs | First-class | Yes |
| Anthropic Claude Agent SDK | Single agent, handoffs via tools | Via tool calls | Yes |
| CrewAI | Role-based (mesh-lite) | Implicit | Maturing |
| AutoGen (Microsoft) | Conversation-based multi-agent | Chat-based | Research-grade |
| Google ADK | Agent-to-agent, hierarchical | Built-in delegation | Early |
References
- Building effective agents – Anthropic (2024), argues for simplicity, single-agent-first
- OpenAI Agents SDK documentation – handoff primitives and Swarm predecessor
- LangGraph documentation – supervisor and hierarchical patterns
- Harrison Chase on agent orchestration – “Multi-agent is not a framework problem, it’s a design problem” (2024)
- Andrew Ng: Agentic Design Patterns – series (2024), reflection, tool use, planning, multi-agent. Also available as a DeepLearning.AI course