OpenAI Agents SDK
OpenAI's production framework for building agentic applications -- evolved from the experimental Swarm project into a lightweight SDK with first-class primitives for agents, handoffs, guardrails, and tracing, tightly integrated with the Responses API.
From Swarm to Agents SDK
Swarm (late 2024) was OpenAI’s experimental, educational framework demonstrating multi-agent patterns – lightweight, stateless, and explicitly “not for production.” The Agents SDK (March 2025) is its production successor, keeping Swarm’s elegant primitives while adding the infrastructure needed for real deployments: tracing, guardrails, streaming, and model-agnostic support.
Key Philosophy:
- Few primitives, lots of composability. Three core concepts: Agents, Handoffs, Guardrails
- Convention over configuration. Sensible defaults, escape hatches when you need them
- Tracing built in. Every agent run is automatically traced for debugging and evaluation
- OpenAI-native but not locked in. Works best with OpenAI models but supports any model provider
Status:
- Open-source (MIT license)
- Python SDK stable; Node.js SDK available
- Active development, tightly coupled to OpenAI’s model releases
- Backed by the Responses API (successor to Chat Completions for agentic use)
Core Primitives
Agent
An agent wraps a model with instructions, tools, and behavioral configuration.
```python
from agents import Agent, Runner

# Define an agent
research_agent = Agent(
    name="Research Assistant",
    instructions="""You are a senior research analyst specializing in enterprise
    technology trends. When asked to research a topic:
    1. Search for recent data and reports
    2. Identify key trends and patterns
    3. Present findings with specific data points
    Always cite your sources.""",
    model="gpt-4o",
    tools=[web_search, read_document],
)

# Run the agent
result = await Runner.run(
    research_agent,
    input="What are the top enterprise AI platform trends in European retail?"
)
print(result.final_output)
```
Tools
Tools are Python functions decorated with type hints. The SDK auto-generates the JSON schema for the model.
```python
import json

from agents import Agent, function_tool

@function_tool
def search_products(query: str, category: str = "all", max_results: int = 10) -> str:
    """Search the product catalog for matching items.

    Args:
        query: Search terms for finding products
        category: Product category to filter by
        max_results: Maximum number of results to return
    """
    # implementation
    return json.dumps(results)

@function_tool
def create_order(product_id: str, quantity: int, customer_id: str) -> str:
    """Create a new order for a customer.

    Args:
        product_id: The product to order
        quantity: Number of items
        customer_id: The customer placing the order
    """
    # implementation
    return f"Order created: {order_id}"

agent = Agent(
    name="Sales Agent",
    instructions="Help customers find and order products.",
    tools=[search_products, create_order]
)
```
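The mapping from type hints to a tool schema can be sketched with the standard library alone. This is a simplified illustration of the idea, not the SDK's actual implementation; `build_schema` and `PY_TO_JSON` are names invented here:

```python
import inspect
import typing

# Illustrative mapping from Python annotations to JSON-schema type names
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_schema(fn):
    """Build a minimal JSON-schema-style tool spec from a function signature."""
    hints = typing.get_type_hints(fn)
    hints.pop("return", None)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": PY_TO_JSON.get(hints.get(name, str), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    doc = (fn.__doc__ or "").strip()
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def search_products(query: str, category: str = "all", max_results: int = 10) -> str:
    """Search the product catalog for matching items."""
    return "[]"

schema = build_schema(search_products)
# schema["parameters"]["required"] == ["query"]
```

The docstring becomes the tool description and the first parameter without a default becomes required, which is why descriptive docstrings and precise type hints directly improve tool-selection quality.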
Handoffs
Handoffs enable agents to transfer control to other specialized agents. This is the core multi-agent primitive – clean, explicit delegation.
```python
from agents import Agent, Runner, handoff

# Specialist agents
billing_agent = Agent(
    name="Billing Specialist",
    instructions="You handle billing inquiries, refunds, and payment issues.",
    tools=[lookup_invoice, process_refund]
)

technical_agent = Agent(
    name="Technical Support",
    instructions="You troubleshoot technical issues with products and services.",
    tools=[check_device_status, create_ticket]
)

# Triage agent with handoffs to specialists
triage_agent = Agent(
    name="Customer Service Triage",
    instructions="""You are the first point of contact for customer service.
    Determine the customer's issue and hand off to the appropriate specialist:
    - Billing issues -> Billing Specialist
    - Technical issues -> Technical Support
    Ask clarifying questions if the issue category is unclear.""",
    handoffs=[
        handoff(billing_agent, description="Hand off billing and payment issues"),
        handoff(technical_agent, description="Hand off technical and product issues"),
    ]
)

result = await Runner.run(triage_agent, input="My payment was charged twice")
# The triage agent routes to billing_agent, which handles the issue
```
Handoff behavior: When an agent performs a handoff, the conversation history transfers to the target agent. The target agent picks up the conversation seamlessly. Handoffs can be chained (agent A -> B -> C) or circular (B can hand back to A).
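The transfer semantics can be illustrated with a minimal SDK-free sketch. All names below are invented for illustration (the real SDK manages this inside the Runner): every agent sees the full accumulated history, and a handoff simply switches which agent responds next.

```python
# Toy simulation of handoff semantics: shared history, explicit transfer.
def run_with_handoffs(start_agent, agents, user_input, max_turns=5):
    history = [("user", user_input)]
    current = start_agent
    for _ in range(max_turns):
        reply = agents[current](history)      # agent sees the *full* history
        if reply.startswith("HANDOFF:"):      # agent chose to transfer control
            current = reply.split(":", 1)[1]
            history.append(("system", f"transferred to {current}"))
            continue
        history.append((current, reply))
        return current, reply, history
    raise RuntimeError("max_turns exceeded")

# Stand-in "agents": plain functions over the shared history
agents = {
    "triage": lambda h: "HANDOFF:billing" if "charged" in h[0][1] else "How can I help?",
    "billing": lambda h: "I see the duplicate charge; a refund has been issued.",
}

final_agent, reply, history = run_with_handoffs(
    "triage", agents, "My payment was charged twice"
)
# final_agent == "billing"
```

Because the history list is shared, the billing agent answers with full context from the triage turn, which is exactly why handoffs feel like one seamless conversation to the user.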
Guardrails
Guardrails run validation checks on inputs and outputs, blocking or modifying agent behavior when constraints are violated.
```python
from agents import Agent, GuardrailFunctionOutput, input_guardrail, output_guardrail

@input_guardrail
async def check_for_pii(input: str, context) -> GuardrailFunctionOutput:
    """Block requests that contain personally identifiable information."""
    # Use a classifier or regex to detect PII
    contains_pii = detect_pii(input)
    return GuardrailFunctionOutput(
        output_info={"pii_detected": contains_pii},
        tripwire_triggered=contains_pii  # blocks execution if True
    )

@output_guardrail
async def check_for_hallucination(output: str, context) -> GuardrailFunctionOutput:
    """Flag outputs that may contain hallucinated information."""
    hallucination_score = await check_factuality(output, context.input)
    return GuardrailFunctionOutput(
        output_info={"hallucination_score": hallucination_score},
        tripwire_triggered=hallucination_score > 0.7
    )

agent = Agent(
    name="Customer Agent",
    instructions="Help customers with their accounts.",
    input_guardrails=[check_for_pii],
    output_guardrails=[check_for_hallucination],
    tools=[lookup_account]
)
```
When a guardrail triggers, the SDK raises a GuardrailTripwireTriggered exception that you handle in your application layer.
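The tripwire pattern itself is easy to sketch without the SDK. Everything below is an illustrative simplification (the exception class, `run_guarded`, and the toy PII check are invented here, not SDK names):

```python
# SDK-free sketch of the tripwire pattern: guardrails run first,
# the runner raises when one trips, and the app layer catches it.
class GuardrailTripwireTriggered(Exception):
    def __init__(self, info):
        self.info = info

def run_guarded(agent_fn, guardrails, user_input):
    for guardrail in guardrails:
        tripped, info = guardrail(user_input)
        if tripped:
            raise GuardrailTripwireTriggered(info)  # agent never runs
    return agent_fn(user_input)

def check_for_pii(text):
    contains_pii = "@" in text  # toy stand-in for a real PII classifier
    return contains_pii, {"pii_detected": contains_pii}

try:
    answer = run_guarded(lambda x: f"echo: {x}", [check_for_pii], "mail me at a@b.com")
except GuardrailTripwireTriggered as exc:
    answer = "I can't process messages containing personal data."
```

The key design point: the exception carries the guardrail's `output_info`, so the application layer can log the violation and return a policy-compliant fallback instead of the blocked response.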
The Responses API
The Agents SDK uses OpenAI’s Responses API (not the older Chat Completions API). The Responses API is designed for agentic use cases:
| Feature | Chat Completions | Responses API |
|---|---|---|
| Tool use | Supported | Supported + better handling |
| Web search | Not available | Built-in tool |
| File search | Not available | Built-in tool |
| Computer use | Not available | Supported |
| Structured outputs | JSON mode | Native schema enforcement |
| Reasoning | Not exposed | o3, o4-mini with visible reasoning |
```python
# The SDK handles the Responses API calls internally;
# you interact through the Agent/Runner abstractions
result = await Runner.run(agent, input="Find me headphones under 100 euros")

# But you can access raw API details if needed
for item in result.raw_responses:
    print(item.usage)  # token usage per API call
```
Tracing
Every agent run is automatically traced. Traces capture the full execution: agent calls, tool invocations, handoffs, guardrail checks, and model responses.
```python
from agents import Runner, set_trace_processors, trace

# Traces are automatic: just run your agent
result = await Runner.run(agent, input="Help me with my order")

# Custom trace spans for your own logic
with trace("custom_processing"):
    processed = preprocess(data)
    result = await Runner.run(agent, input=processed)

# Traces go to OpenAI's dashboard by default,
# or configure a custom trace processor
class CustomTraceProcessor:
    def process(self, trace_data):
        # Send to your observability stack (Datadog, etc.)
        send_to_datadog(trace_data)

set_trace_processors([CustomTraceProcessor()])
```
Trace data includes: latency per step, token usage, tool call inputs/outputs, handoff decisions, guardrail results.
Runner: Execution Engine
The Runner manages the agent loop: call model, execute tools, handle handoffs, check guardrails, repeat.
```python
from agents import Runner

# Simple run (blocks until complete)
result = await Runner.run(agent, input="What are today's deals?")

# Streaming run (yields events as they happen)
async for event in Runner.run_streamed(agent, input="Analyze this data"):
    if event.type == "agent_updated":
        print(f"Now talking to: {event.agent.name}")
    elif event.type == "tool_called":
        print(f"Tool: {event.tool_name}({event.tool_input})")
    elif event.type == "text_delta":
        print(event.delta, end="")

# Inspect the run result
print(result.final_output)   # final text response
print(result.last_agent)     # which agent handled the final response
print(result.input_guardrail_results)
print(result.output_guardrail_results)
```
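The loop the Runner manages can be sketched in plain Python. This is a deliberate simplification with invented names (`agent_loop`, `scripted_model`), not SDK internals:

```python
# Simplified agent loop: call model, dispatch tool calls, stop on final output.
def agent_loop(model_step, tools, user_input, max_turns=10):
    history = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        action = model_step(history)            # the "model" decides the next step
        if action["type"] == "tool_call":
            result = tools[action["name"]](**action["args"])
            history.append({"role": "tool", "content": result})
            continue                            # loop again with the tool result
        return action["content"]                # final text output ends the loop
    raise RuntimeError("max_turns exceeded")

# A scripted stand-in for the model: first call a tool, then answer
def scripted_model(history):
    if history[-1]["role"] == "user":
        return {"type": "tool_call", "name": "get_deals", "args": {}}
    return {"type": "final", "content": f"Today's deals: {history[-1]['content']}"}

output = agent_loop(
    scripted_model,
    {"get_deals": lambda: "20% off headphones"},
    "What are today's deals?",
)
# output == "Today's deals: 20% off headphones"
```

The real Runner adds handoff switching and guardrail checks to this loop, but the core shape (model step, tool dispatch, turn cap) is the same, which is why `max_turns` is the natural safety valve.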
Multi-Agent Patterns
Pattern 1: Triage and Route
User -> Triage Agent -> [Billing | Technical | Sales] -> User
The triage agent uses handoffs (shown above). This is the most common pattern.
Pattern 2: Pipeline
User -> Research Agent -> Analysis Agent -> Report Agent -> User
```python
from agents import Agent, Runner, handoff

report_agent = Agent(
    name="Report Writer",
    instructions="Write executive reports from analyzed data.",
)

analysis_agent = Agent(
    name="Data Analyst",
    instructions="Analyze research data and identify key insights.",
    handoffs=[handoff(report_agent)]
)

research_agent = Agent(
    name="Researcher",
    instructions="Research the given topic thoroughly.",
    tools=[web_search],
    handoffs=[handoff(analysis_agent)]
)

# Start with the researcher; control flows through analysis to the report
result = await Runner.run(research_agent, input="Enterprise AI adoption in retail")
```
Pattern 3: Agent as Tool
Instead of handoffs, use an agent as a tool for another agent. The calling agent stays in control.
```python
from agents import Agent, Runner, function_tool

translator = Agent(
    name="Translator",
    instructions="Translate the given text to German.",
)

@function_tool
async def translate_to_german(text: str) -> str:
    """Translate text to German."""
    result = await Runner.run(translator, input=text)
    return result.final_output

main_agent = Agent(
    name="International Support",
    instructions="Help customers. Translate responses to German when needed.",
    tools=[translate_to_german, search_products]
)
```
Key Properties
| Property | OpenAI Agents SDK |
|---|---|
| Primary model | Agent handoffs and guardrails |
| Core primitives | Agent, Handoff, Guardrail, Tool |
| Orchestration | Handoff-based routing |
| Tracing | Built-in, automatic |
| Guardrails | First-class (input + output) |
| Streaming | Full event stream |
| Model support | OpenAI-native; any model via adapters |
| Human-in-the-loop | Via application-layer control |
| State/persistence | Conversation context; no built-in checkpointing |
| Languages | Python, Node.js |
| License | MIT |
Agents SDK vs Alternatives
| Dimension | Agents SDK | LangGraph | CrewAI |
|---|---|---|---|
| Abstraction | Medium (agents + handoffs) | Low (graph primitives) | High (roles + backstories) |
| Multi-agent model | Handoff chains | Graph routing | Team delegation |
| Guardrails | First-class | External | External |
| Tracing | Built-in | Via LangSmith | Via callbacks |
| Persistence | Conversation only | Full checkpointing | Memory DB |
| Code execution | Via tools | Via tools | Via tools |
| Vendor coupling | OpenAI-optimized | Model-agnostic | Model-agnostic |
| Setup time | Fast | Slow | Fast |
| Best for | OpenAI-centric production apps | Complex stateful workflows | Quick role-based prototypes |
When to Use
Choose the Agents SDK when:
- You are primarily using OpenAI models (GPT-4o, o3, o4-mini)
- Your multi-agent pattern is handoff-based (triage, escalation, pipeline)
- You want guardrails as a first-class concept, not bolted on
- Built-in tracing matters and you want it without additional infrastructure
- You prefer a minimal SDK over a heavyweight framework
Avoid when:
- You need model-agnostic support as a hard requirement (LangGraph, CrewAI are better)
- Your workflow requires complex cycles, parallel branches, or graph-based orchestration
- You need durable checkpointing and crash recovery (no built-in persistence layer)
- You are building on Anthropic/Google models primarily (use their native SDKs)
- You need code execution sandboxing as a core feature (use AutoGen)
Practical Tips for Enterprise Use
- Guardrails are your compliance layer. Use input guardrails for PII detection and output guardrails for content policy. They run on every invocation automatically.
- Trace everything, export to your stack. The built-in tracing is good for dev; for production, write a custom trace processor to forward to Datadog/Grafana.
- Handoff descriptions matter. The model uses the handoff description to decide when to route. Be specific: “Hand off billing issues including refunds, invoices, and payment failures” beats “Hand off to billing.”
- Use agent-as-tool for control. Handoffs transfer full control. If the calling agent needs to stay in charge and just get a result, wrap the sub-agent as a tool instead.
- Set max_turns on the Runner. Prevent runaway agent loops: `Runner.run(agent, input=..., max_turns=10)`.
References
- Agents SDK docs: https://openai.github.io/openai-agents-python/
- GitHub: https://github.com/openai/openai-agents-python
- Responses API: https://platform.openai.com/docs/api-reference/responses
- Swarm (predecessor): https://github.com/openai/swarm
- Blog post: https://openai.com/index/new-tools-for-building-agents/