Post

Agent Orchestration & Handoffs

How you wire agents together determines whether your system is a reliable product or a demo that falls apart under real load -- the orchestration pattern is the most consequential architectural decision in any agentic system.

Agent Orchestration & Handoffs

How you wire agents together determines whether your system is a reliable product or a demo that falls apart under real load – the orchestration pattern is the most consequential architectural decision in any agentic system.


The Core Decision: Single vs. Multi-Agent

Before reaching for multi-agent, ask: can a single agent with good tools solve this? Most production systems that shipped in 2024-2025 are single-agent with tool use. Multi-agent adds latency, cost, debugging complexity, and failure modes. Use it when you genuinely need specialized personas, parallel work streams, or domain isolation.

Go single-agent when: the task is linear, tools are well-defined, and context fits in one window. Go multi-agent when: tasks require different system prompts/personas, you need parallel execution, or context separation improves reliability (e.g., a coding agent should not see customer PII from the support agent).


Pattern 1: Single Agent with Tools

The simplest pattern. One LLM, one system prompt, multiple tools. The agent decides which tool to call and when.

1
2
3
User --> [Agent] --> Tool A
                 --> Tool B
                 --> Tool C

When to use: Most internal tools, chatbots, copilots. Covers 70%+ of real use cases. Weakness: Falls apart when the agent needs contradictory instructions for different subtasks, or when context window fills up.


Pattern 2: Supervisor (Router)

A central “supervisor” agent receives the user request, decides which specialist agent handles it, and routes accordingly. The supervisor collects results and synthesizes a final response.

1
2
3
4
User --> [Supervisor] --> Agent A (research)
                      --> Agent B (code)
                      --> Agent C (data analysis)
         <-- [Supervisor] aggregates results

LangGraph implementation: The langgraph library provides a first-class Supervisor node type. The supervisor is itself an LLM that outputs structured routing decisions. Each sub-agent runs as a separate graph node with its own state, tools, and system prompt.

1
2
3
4
5
6
7
8
9
10
11
12
# LangGraph supervisor pattern (conceptual)
from langgraph.prebuilt import create_react_agent, Supervisor

research_agent = create_react_agent(model, tools=[search, scrape])
code_agent = create_react_agent(model, tools=[execute_code, file_read])

supervisor = Supervisor(
    agents=[research_agent, code_agent],
    model=model,
    prompt="Route to the appropriate specialist agent."
)
graph = supervisor.compile()

Strengths: Clean separation of concerns, each agent gets a focused system prompt, easy to add/remove specialists. Weaknesses: Supervisor is a single point of failure. Adds one extra LLM call per turn. Supervisor can misroute.


Pattern 3: Hierarchical Teams

Supervisors managing supervisors. A top-level orchestrator delegates to team leads, each of which manages a group of specialist agents. This is the pattern for complex enterprise workflows.

1
2
3
4
User --> [Orchestrator]
            --> [Team Lead: Research] --> Searcher, Summarizer
            --> [Team Lead: Engineering] --> Coder, Reviewer, Deployer
            --> [Team Lead: QA] --> Tester, Security Scanner

When to use: Large-scope tasks like “build and deploy a feature” or “research, draft, review, and publish a report.” Enterprise AI platforms with distinct functional domains.

Key challenge: Error propagation. A failure three levels deep is hard to surface meaningfully. Build explicit error channels – don’t rely on LLM summarization of failures.


Pattern 4: Handoffs (OpenAI Agents SDK / Anthropic Pattern)

Instead of a supervisor routing from above, agents hand off control to each other laterally. The active agent decides “I’m done, Agent B should take over” and transfers context.

OpenAI Agents SDK made this a first-class primitive:

1
2
3
4
5
6
7
8
9
10
from agents import Agent, handoff

triage_agent = Agent(
    name="Triage",
    instructions="Determine the user's intent and hand off.",
    handoffs=[
        handoff(target=billing_agent, description="Billing questions"),
        handoff(target=technical_agent, description="Technical support"),
    ]
)

The handoff transfers the conversation history (or a filtered subset) to the target agent. The target agent becomes the active agent and can hand off further or respond to the user.

Anthropic’s recommended pattern is similar: use tool calls to transfer control. The orchestrator loop checks if the agent called a transfer_to_X tool and swaps the active agent.

1
2
3
4
5
6
7
# Anthropic-style handoff via tool use
tools = [{
    "name": "transfer_to_billing",
    "description": "Hand off to billing specialist",
    "input_schema": {"type": "object", "properties": {"summary": {"type": "string"}}}
}]
# Orchestrator loop detects this tool call and swaps agents

Strengths: Natural conversation flow, no extra supervisor LLM call, agents self-organize. Weaknesses: Agents can enter handoff loops (A hands to B, B hands back to A). Requires handoff guards – max handoff depth, cycle detection, or a “no-return” flag.


Pattern 5: Swarm / Mesh

All agents can communicate with any other agent. No hierarchy. Agents broadcast messages or publish to shared state, and any agent can pick up work.

When to use: Rare in production. Useful for brainstorming/debate patterns (multiple agents argue different positions) or simulation. CrewAI uses a lightweight version of this.

Why it’s risky in production: Non-deterministic execution order, hard to debug, unpredictable cost. Every agent potentially triggers every other agent.


Handoff Protocol Design

Regardless of pattern, handoffs need a protocol. Key decisions:

Decision Options Recommendation
Context transfer Full history, summary only, structured handoff object Structured object for production (control token count)
State preservation Shared memory, passed in handoff, reconstructed Shared memory store (Redis/DB) for enterprise
Error handling Retry, escalate, fallback agent Escalate to human after 2 retries
Handoff trigger Agent decides, orchestrator decides, rule-based Agent decides with orchestrator guardrails
Max depth Unlimited, fixed cap Fixed cap (3-5 handoffs typical)

Structured Handoff Object

1
2
3
4
5
6
7
8
9
10
11
12
13
{
  "from_agent": "triage",
  "to_agent": "billing",
  "summary": "Customer wants to dispute charge from March 15",
  "context": {
    "customer_id": "cust_123",
    "relevant_facts": ["charge_amount: 49.99", "dispute_reason: duplicate"]
  },
  "constraints": {
    "max_resolution_time": "5min",
    "allowed_actions": ["refund", "escalate_to_human"]
  }
}

When to Use Which Pattern

Scenario Pattern Why
Customer support bot Handoffs Natural routing between triage, billing, tech
Code generation pipeline Supervisor Clear stages: plan, code, review, test
Enterprise AI platform (MMS-scale) Hierarchical Multiple domains, teams, governance layers
Side venture MVP Single agent Ship fast, add complexity only when needed
Research + synthesis Supervisor Parallelize research, centralize synthesis
Async overnight tasks Supervisor + queue Supervisor dispatches, agents work from queue

Production Considerations

Observability: Every handoff and agent invocation must emit a trace. Use OpenTelemetry spans per agent turn. Without this, debugging multi-agent systems is impossible.

Determinism: Multi-agent systems are inherently less deterministic. For enterprise, pin model versions, use temperature=0 for routing decisions, and log every intermediate state.

Latency: Each agent hop adds 1-5 seconds. A 4-agent chain is 4-20 seconds. For real-time UX, parallelize where possible and stream intermediate results.

Testing: Test individual agents in isolation first. Then test handoff pairs. Full end-to-end multi-agent tests are slow and flaky – use them sparingly, mostly for regression.


Key Frameworks & Their Patterns

Framework Primary Pattern Handoff Support Production-Ready
LangGraph Supervisor, hierarchical Via graph edges Yes
OpenAI Agents SDK Handoffs First-class Yes
Anthropic Claude Agent SDK Single agent, handoffs via tools Via tool calls Yes
CrewAI Role-based (mesh-lite) Implicit Maturing
AutoGen (Microsoft) Conversation-based multi-agent Chat-based Research-grade
Google ADK Agent-to-agent, hierarchical Built-in delegation Early

References

This post is licensed under CC BY 4.0 by the author.