OpenAI Agents SDK
OpenAI's production framework for building agentic applications -- evolved from the experimental Swarm project into a lightweight SDK with first-class primitives for agents, handoffs, guardrails, and tracing, tightly integrated with the Responses API.
From Swarm to Agents SDK
Swarm (late 2024) was OpenAI’s experimental, educational framework demonstrating multi-agent patterns – lightweight, stateless, and explicitly “not for production.” The Agents SDK (March 2025) is its production successor, keeping Swarm’s elegant primitives while adding the infrastructure needed for real deployments: tracing, guardrails, streaming, and model-agnostic support.
Key Philosophy:
- Few primitives, lots of composability. Three core concepts: Agents, Handoffs, Guardrails
- Convention over configuration. Sensible defaults, escape hatches when you need them
- Tracing built in. Every agent run is automatically traced for debugging and evaluation
- OpenAI-native but not locked in. Works best with OpenAI models but supports any model provider
Status:
- Open-source (MIT license)
- Python SDK stable; Node.js SDK available
- Active development, tightly coupled to OpenAI’s model releases
- Backed by the Responses API (successor to Chat Completions for agentic use)
Core Primitives
Agent
An agent wraps a model with instructions, tools, and behavioral configuration.
```python
from agents import Agent, Runner

# Define an agent
research_agent = Agent(
    name="Research Assistant",
    instructions="""You are a senior research analyst specializing in enterprise
    technology trends. When asked to research a topic:
    1. Search for recent data and reports
    2. Identify key trends and patterns
    3. Present findings with specific data points
    Always cite your sources.""",
    model="gpt-4o",
    tools=[web_search, read_document],
)

# Run the agent
result = await Runner.run(
    research_agent,
    input="What are the top enterprise AI platform trends in European retail?"
)
print(result.final_output)
```
Tools
Tools are Python functions decorated with type hints. The SDK auto-generates the JSON schema for the model.
```python
import json

from agents import Agent, function_tool

@function_tool
def search_products(query: str, category: str = "all", max_results: int = 10) -> str:
    """Search the product catalog for matching items.

    Args:
        query: Search terms for finding products
        category: Product category to filter by
        max_results: Maximum number of results to return
    """
    # implementation
    return json.dumps(results)

@function_tool
def create_order(product_id: str, quantity: int, customer_id: str) -> str:
    """Create a new order for a customer.

    Args:
        product_id: The product to order
        quantity: Number of items
        customer_id: The customer placing the order
    """
    # implementation
    return f"Order created: {order_id}"

agent = Agent(
    name="Sales Agent",
    instructions="Help customers find and order products.",
    tools=[search_products, create_order]
)
```
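The mapping from type hints to a tool schema can be sketched with the standard library alone. This is a simplified illustration of the idea, not the SDK's actual implementation; `build_schema` and `PY_TO_JSON` are names invented here:

```python
import inspect
import typing

# Illustrative mapping from Python annotations to JSON-schema type names
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_schema(fn):
    """Build a minimal JSON-schema-style tool spec from a function signature."""
    hints = typing.get_type_hints(fn)
    hints.pop("return", None)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": PY_TO_JSON.get(hints.get(name, str), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    doc = (fn.__doc__ or "").strip()
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def search_products(query: str, category: str = "all", max_results: int = 10) -> str:
    """Search the product catalog for matching items."""
    return "[]"

schema = build_schema(search_products)
# schema["parameters"]["required"] == ["query"]
```

The docstring becomes the tool description and the first parameter without a default becomes required, which is why descriptive docstrings and precise type hints directly improve tool-selection quality.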
Handoffs
Handoffs enable agents to transfer control to other specialized agents. This is the core multi-agent primitive – clean, explicit delegation.
```python
from agents import Agent, Runner, handoff

# Specialist agents
billing_agent = Agent(
    name="Billing Specialist",
    instructions="You handle billing inquiries, refunds, and payment issues.",
    tools=[lookup_invoice, process_refund]
)

technical_agent = Agent(
    name="Technical Support",
    instructions="You troubleshoot technical issues with products and services.",
    tools=[check_device_status, create_ticket]
)

# Triage agent with handoffs to specialists
triage_agent = Agent(
    name="Customer Service Triage",
    instructions="""You are the first point of contact for customer service.
    Determine the customer's issue and hand off to the appropriate specialist:
    - Billing issues -> Billing Specialist
    - Technical issues -> Technical Support
    Ask clarifying questions if the issue category is unclear.""",
    handoffs=[
        handoff(billing_agent, description="Hand off billing and payment issues"),
        handoff(technical_agent, description="Hand off technical and product issues"),
    ]
)

result = await Runner.run(triage_agent, input="My payment was charged twice")
# The triage agent routes to billing_agent, which handles the issue
```
Handoff behavior: When an agent performs a handoff, the conversation history transfers to the target agent. The target agent picks up the conversation seamlessly. Handoffs can be chained (agent A -> B -> C) or circular (B can hand back to A).
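The transfer semantics can be illustrated with a minimal SDK-free sketch. All names below are invented for illustration (the real SDK manages this inside the Runner): every agent sees the full accumulated history, and a handoff simply switches which agent responds next.

```python
# Toy simulation of handoff semantics: shared history, explicit transfer.
def run_with_handoffs(start_agent, agents, user_input, max_turns=5):
    history = [("user", user_input)]
    current = start_agent
    for _ in range(max_turns):
        reply = agents[current](history)      # agent sees the *full* history
        if reply.startswith("HANDOFF:"):      # agent chose to transfer control
            current = reply.split(":", 1)[1]
            history.append(("system", f"transferred to {current}"))
            continue
        history.append((current, reply))
        return current, reply, history
    raise RuntimeError("max_turns exceeded")

# Stand-in "agents": plain functions over the shared history
agents = {
    "triage": lambda h: "HANDOFF:billing" if "charged" in h[0][1] else "How can I help?",
    "billing": lambda h: "I see the duplicate charge; a refund has been issued.",
}

final_agent, reply, history = run_with_handoffs(
    "triage", agents, "My payment was charged twice"
)
# final_agent == "billing"
```

Because the history list is shared, the billing agent answers with full context from the triage turn, which is exactly why handoffs feel like one seamless conversation to the user.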
Guardrails
Guardrails run validation checks on inputs and outputs, blocking or modifying agent behavior when constraints are violated.
```python
from agents import Agent, GuardrailFunctionOutput, input_guardrail, output_guardrail

@input_guardrail
async def check_for_pii(input: str, context) -> GuardrailFunctionOutput:
    """Block requests that contain personally identifiable information."""
    # Use a classifier or regex to detect PII
    contains_pii = detect_pii(input)
    return GuardrailFunctionOutput(
        output_info={"pii_detected": contains_pii},
        tripwire_triggered=contains_pii  # blocks execution if True
    )

@output_guardrail
async def check_for_hallucination(output: str, context) -> GuardrailFunctionOutput:
    """Flag outputs that may contain hallucinated information."""
    hallucination_score = await check_factuality(output, context.input)
    return GuardrailFunctionOutput(
        output_info={"hallucination_score": hallucination_score},
        tripwire_triggered=hallucination_score > 0.7
    )

agent = Agent(
    name="Customer Agent",
    instructions="Help customers with their accounts.",
    input_guardrails=[check_for_pii],
    output_guardrails=[check_for_hallucination],
    tools=[lookup_account]
)
```
When a guardrail triggers, the SDK raises a GuardrailTripwireTriggered exception that you handle in your application layer.
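The tripwire pattern itself is easy to sketch without the SDK. Everything below is an illustrative simplification (the exception class, `run_guarded`, and the toy PII check are invented here, not SDK names):

```python
# SDK-free sketch of the tripwire pattern: guardrails run first,
# the runner raises when one trips, and the app layer catches it.
class GuardrailTripwireTriggered(Exception):
    def __init__(self, info):
        self.info = info

def run_guarded(agent_fn, guardrails, user_input):
    for guardrail in guardrails:
        tripped, info = guardrail(user_input)
        if tripped:
            raise GuardrailTripwireTriggered(info)  # agent never runs
    return agent_fn(user_input)

def check_for_pii(text):
    contains_pii = "@" in text  # toy stand-in for a real PII classifier
    return contains_pii, {"pii_detected": contains_pii}

try:
    answer = run_guarded(lambda x: f"echo: {x}", [check_for_pii], "mail me at a@b.com")
except GuardrailTripwireTriggered as exc:
    answer = "I can't process messages containing personal data."
```

The key design point: the exception carries the guardrail's `output_info`, so the application layer can log the violation and return a policy-compliant fallback instead of the blocked response.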
The Responses API
The Agents SDK uses OpenAI’s Responses API (not the older Chat Completions API). The Responses API is designed for agentic use cases:
| Feature | Chat Completions | Responses API |
|---|---|---|
| Tool use | Supported | Supported + better handling |
| Web search | Not available | Built-in tool |
| File search | Not available | Built-in tool |
| Computer use | Not available | Supported |
| Structured outputs | JSON mode | Native schema enforcement |
| Reasoning | Not exposed | o3, o4-mini with visible reasoning |
```python
# The SDK handles the Responses API calls internally;
# you interact through the Agent/Runner abstractions
result = await Runner.run(agent, input="Find me headphones under 100 euros")

# But you can access raw API details if needed
for item in result.raw_responses:
    print(item.usage)  # token usage per API call
```
Tracing
Every agent run is automatically traced. Traces capture the full execution: agent calls, tool invocations, handoffs, guardrail checks, and model responses.
```python
from agents import Runner, set_trace_processors, trace

# Traces are automatic: just run your agent
result = await Runner.run(agent, input="Help me with my order")

# Custom trace spans for your own logic
with trace("custom_processing"):
    processed = preprocess(data)
    result = await Runner.run(agent, input=processed)

# Traces go to OpenAI's dashboard by default,
# or configure a custom trace processor
class CustomTraceProcessor:
    def process(self, trace_data):
        # Send to your observability stack (Datadog, etc.)
        send_to_datadog(trace_data)

set_trace_processors([CustomTraceProcessor()])
```
Trace data includes: latency per step, token usage, tool call inputs/outputs, handoff decisions, guardrail results.
Runner: Execution Engine
The Runner manages the agent loop: call model, execute tools, handle handoffs, check guardrails, repeat.
```python
from agents import Runner

# Simple run (blocks until complete)
result = await Runner.run(agent, input="What are today's deals?")

# Streaming run (yields events as they happen)
async for event in Runner.run_streamed(agent, input="Analyze this data"):
    if event.type == "agent_updated":
        print(f"Now talking to: {event.agent.name}")
    elif event.type == "tool_called":
        print(f"Tool: {event.tool_name}({event.tool_input})")
    elif event.type == "text_delta":
        print(event.delta, end="")

# Inspect the run result
print(result.final_output)   # final text response
print(result.last_agent)     # which agent handled the final response
print(result.input_guardrail_results)
print(result.output_guardrail_results)
```
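The loop the Runner manages can be sketched in plain Python. This is a deliberate simplification with invented names (`agent_loop`, `scripted_model`), not SDK internals:

```python
# Simplified agent loop: call model, dispatch tool calls, stop on final output.
def agent_loop(model_step, tools, user_input, max_turns=10):
    history = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        action = model_step(history)            # the "model" decides the next step
        if action["type"] == "tool_call":
            result = tools[action["name"]](**action["args"])
            history.append({"role": "tool", "content": result})
            continue                            # loop again with the tool result
        return action["content"]                # final text output ends the loop
    raise RuntimeError("max_turns exceeded")

# A scripted stand-in for the model: first call a tool, then answer
def scripted_model(history):
    if history[-1]["role"] == "user":
        return {"type": "tool_call", "name": "get_deals", "args": {}}
    return {"type": "final", "content": f"Today's deals: {history[-1]['content']}"}

output = agent_loop(
    scripted_model,
    {"get_deals": lambda: "20% off headphones"},
    "What are today's deals?",
)
# output == "Today's deals: 20% off headphones"
```

The real Runner adds handoff switching and guardrail checks to this loop, but the core shape (model step, tool dispatch, turn cap) is the same, which is why `max_turns` is the natural safety valve.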
Multi-Agent Patterns
Pattern 1: Triage and Route
User -> Triage Agent -> [Billing | Technical | Sales] -> User
The triage agent uses handoffs (shown above). This is the most common pattern.
Pattern 2: Pipeline
User -> Research Agent -> Analysis Agent -> Report Agent -> User
```python
from agents import Agent, Runner, handoff

report_agent = Agent(
    name="Report Writer",
    instructions="Write executive reports from analyzed data.",
)

analysis_agent = Agent(
    name="Data Analyst",
    instructions="Analyze research data and identify key insights.",
    handoffs=[handoff(report_agent)]
)

research_agent = Agent(
    name="Researcher",
    instructions="Research the given topic thoroughly.",
    tools=[web_search],
    handoffs=[handoff(analysis_agent)]
)

# Start with the researcher; control flows through analysis to the report
result = await Runner.run(research_agent, input="Enterprise AI adoption in retail")
```
Pattern 3: Agent as Tool
Instead of handoffs, use an agent as a tool for another agent. The calling agent stays in control.
```python
from agents import Agent, Runner, function_tool

translator = Agent(
    name="Translator",
    instructions="Translate the given text to German.",
)

@function_tool
async def translate_to_german(text: str) -> str:
    """Translate text to German."""
    result = await Runner.run(translator, input=text)
    return result.final_output

main_agent = Agent(
    name="International Support",
    instructions="Help customers. Translate responses to German when needed.",
    tools=[translate_to_german, search_products]
)
```
Key Properties
| Property | OpenAI Agents SDK |
|---|---|
| Primary model | Agent handoffs and guardrails |
| Core primitives | Agent, Handoff, Guardrail, Tool |
| Orchestration | Handoff-based routing |
| Tracing | Built-in, automatic |
| Guardrails | First-class (input + output) |
| Streaming | Full event stream |
| Model support | OpenAI-native; any model via adapters |
| Human-in-the-loop | Via application-layer control |
| State/persistence | Conversation context; no built-in checkpointing |
| Languages | Python, Node.js |
| License | MIT |
Agents SDK vs Alternatives
| Dimension | Agents SDK | LangGraph | CrewAI |
|---|---|---|---|
| Abstraction | Medium (agents + handoffs) | Low (graph primitives) | High (roles + backstories) |
| Multi-agent model | Handoff chains | Graph routing | Team delegation |
| Guardrails | First-class | External | External |
| Tracing | Built-in | Via LangSmith | Via callbacks |
| Persistence | Conversation only | Full checkpointing | Memory DB |
| Code execution | Via tools | Via tools | Via tools |
| Vendor coupling | OpenAI-optimized | Model-agnostic | Model-agnostic |
| Setup time | Fast | Slow | Fast |
| Best for | OpenAI-centric production apps | Complex stateful workflows | Quick role-based prototypes |
When to Use
Choose the Agents SDK when:
- You are primarily using OpenAI models (GPT-4o, o3, o4-mini)
- Your multi-agent pattern is handoff-based (triage, escalation, pipeline)
- You want guardrails as a first-class concept, not bolted on
- Built-in tracing matters and you want it without additional infrastructure
- You prefer a minimal SDK over a heavyweight framework
Avoid when:
- You need model-agnostic support as a hard requirement (LangGraph, CrewAI are better)
- Your workflow requires complex cycles, parallel branches, or graph-based orchestration
- You need durable checkpointing and crash recovery (no built-in persistence layer)
- You are building on Anthropic/Google models primarily (use their native SDKs)
- You need code execution sandboxing as a core feature (use AutoGen)
Practical Tips for Enterprise Use
- Guardrails are your compliance layer. Use input guardrails for PII detection and output guardrails for content policy. They run on every invocation automatically.
- Trace everything, export to your stack. The built-in tracing is good for dev; for production, write a custom trace processor to forward to Datadog/Grafana.
- Handoff descriptions matter. The model uses the handoff description to decide when to route. Be specific: “Hand off billing issues including refunds, invoices, and payment failures” beats “Hand off to billing.”
- Use agent-as-tool for control. Handoffs transfer full control. If the calling agent needs to stay in charge and just get a result, wrap the sub-agent as a tool instead.
- Set max_turns on the Runner. Prevent runaway agent loops: `Runner.run(agent, input=..., max_turns=10)`.
References
- Agents SDK docs: https://openai.github.io/openai-agents-python/
- GitHub: https://github.com/openai/openai-agents-python
- Responses API: https://platform.openai.com/docs/api-reference/responses
- Swarm (predecessor): https://github.com/openai/swarm
- Blog post: https://openai.com/index/new-tools-for-building-agents/