The OpenAI API provides REST endpoints, streaming, function calling, and structured outputs; the new Agents SDK enables multi-agent systems with handoffs, guardrails, and full observability for production AI.
API Fundamentals
The OpenAI API is REST-based with these core capabilities: Chat Completions, Streaming, Function Calling, Structured Outputs, Vision, Audio (Whisper + TTS), Embeddings, and Fine-tuning.
Basic Chat Completion Flow
```python
import openai

client = openai.OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement to a 5-year-old."}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response.choices[0].message.content)
```
Streaming for Real-Time Responses
```python
stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Function Calling
Function calling lets the model request that you execute a tool, then incorporate the result back into its reasoning.
```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price for a ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker symbol (e.g., AAPL, TSLA)"
                    }
                },
                "required": ["ticker"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What is AAPL trading at right now?"}]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
```
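That call only takes you as far as the model *requesting* the tool. To complete the round trip, you run the tool yourself and send the result back as a `role: "tool"` message. A minimal sketch, assuming a stubbed local `get_stock_price` implementation (the stub data and helper name are illustrative, not part of the API):

```python
import json

# Hypothetical local implementation of the tool the model may call.
def get_stock_price(ticker: str) -> float:
    prices = {"AAPL": 189.25, "TSLA": 242.10}  # stub data for this sketch
    return prices.get(ticker.upper(), 0.0)

def handle_tool_calls(response, messages):
    """If the model requested a tool, run it and append the result."""
    message = response.choices[0].message
    if not message.tool_calls:
        return messages  # model answered directly; nothing to execute
    messages.append(message)  # keep the assistant turn containing tool_calls
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = get_stock_price(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps({"price": result}),
        })
    return messages
```

You then pass the updated `messages` list into a second `client.chat.completions.create` call so the model can fold the price into its final answer.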
Structured Outputs: Type-Safe JSON
Structured outputs guarantee that the model's response matches your JSON schema exactly: no parsing errors, no unexpected fields.
```python
from pydantic import BaseModel

class EmailAnalysis(BaseModel):
    sentiment: str
    urgency: str
    category: str
    action_required: bool
    summary: str

response = client.beta.chat.completions.parse(
    model="gpt-5.4",
    messages=[{
        "role": "user",
        "content": "Classify this email: 'Hi, my account was charged twice and I need a refund ASAP!'"
    }],
    response_format=EmailAnalysis
)
result = response.choices[0].message.parsed
# Example output: sentiment='negative', urgency='high'
```
Why structured outputs matter:
- No parsing errors. Guaranteed valid JSON matching your schema.
- Type safety. No surprise values.
- Cost savings. 10-15% token savings at scale.
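The "no surprise values" point can be pushed into the schema itself: constraining string fields to a closed vocabulary with `Literal` means invalid labels are rejected at validation time rather than discovered downstream. A sketch, assuming the same Pydantic-based `EmailAnalysis` model (the specific label sets are illustrative):

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

# Closed vocabularies: the schema, not post-hoc string checks,
# rules out values like sentiment="angry" or urgency="ASAP".
class EmailAnalysis(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    urgency: Literal["low", "medium", "high"]
    category: str
    action_required: bool
    summary: str

# A value outside the allowed set fails validation immediately:
try:
    EmailAnalysis(sentiment="angry", urgency="high", category="billing",
                  action_required=True, summary="Double charge, refund requested")
except ValidationError:
    print("rejected")
```

With structured outputs, the constrained schema is also sent to the model, so generation itself is steered toward the allowed labels.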
OpenAI Agents SDK (New, 2025)
The Agents SDK is OpenAI’s Python library for building multi-agent systems with handoffs, guardrails, and full observability.
Core Concepts
- Agent: An AI system with an LLM, instructions, tools, and the ability to hand off to other agents
- Handoff: Transfer control to another agent with context
- Guardrails: Input/output validation to ensure agents stay within bounds
- Runner: Executes the agent loop
- Tracing: Every LLM call, tool invocation, and handoff is logged
Multi-Agent Architecture Example
```python
from openai.agents import Agent, Handoff, Runner
from openai.agents.tools import WebSearch, FileSearch

research_agent = Agent(
    name="Research",
    instructions="You are a research analyst. Use WebSearch to find current information.",
    tools=[WebSearch()],
    model="gpt-5.4"
)

writing_agent = Agent(
    name="Writer",
    instructions="You are a technical writer. Synthesize research into clear documents.",
    tools=[FileSearch()],
    model="gpt-5.4"
)

orchestrator_agent = Agent(
    name="Orchestrator",
    instructions="""
    Manage this research-to-document workflow:
    1. Hand off to Research agent to gather information
    2. Hand off to Writer agent to synthesize into a draft
    3. Compile final document and return to user
    """,
    handoffs=[
        Handoff(target=research_agent),
        Handoff(target=writing_agent)
    ],
    model="gpt-5.4"
)

runner = Runner(agent=orchestrator_agent)
result = runner.run(user_input="Write a technical report on vector databases")
print(f"Trace ID: {result.trace_id}")
```
Guardrails Example
```python
from openai.agents import Guardrail

input_guard = Guardrail(
    name="topic_boundary",
    check=lambda user_input: (
        "confidential" not in user_input.lower() and
        "secret" not in user_input.lower()
    ),
    error_message="I can't process requests involving confidential information."
)

agent = Agent(
    name="SafeAgent",
    instructions="You are a helpful assistant.",
    guardrails=[input_guard],
    model="gpt-5.4"
)
```
Pricing and Cost Optimization
Current Pricing (April 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| o3 (low effort) | $2 | $4 |
| o3 (high effort) | $8 | $16 |
| o3-mini | $0.40 | $0.80 |
| GPT-5.4 | $3 | $12 |
| GPT-5.2 Instant | $0.10 | $0.40 |
Cost Optimization Strategies
1. Tiered models by task: Use GPT-5.2 Instant ($0.10/M) for classification, GPT-5.4 ($3/M) only for complex analysis.
2. Structured outputs to reduce tokens: a free-form answer mixes prose with JSON (~300 tokens), while a structured output returns only the schema fields (~20 tokens, roughly 15x fewer output tokens).
3. Caching: 90% cost savings on cached tokens for repeated queries on the same document.
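Strategy 1 can be sketched as a tiny router. The heuristic below (keyword match plus prompt length as a complexity proxy) is purely illustrative; in production you would more likely route on a task type known ahead of time:

```python
# Hypothetical router for the tiered-model strategy: cheap model for
# classification-style tasks, the flagship only for complex work.
CHEAP_MODEL = "gpt-5.2-instant"   # $0.10/M input
FLAGSHIP_MODEL = "gpt-5.4"        # $3/M input

SIMPLE_KEYWORDS = ("classify", "label", "categorize", "yes or no", "extract")

def pick_model(prompt: str) -> str:
    """Crude routing heuristic: keyword match + length as a complexity proxy."""
    text = prompt.lower()
    if any(k in text for k in SIMPLE_KEYWORDS) and len(prompt) < 500:
        return CHEAP_MODEL
    return FLAGSHIP_MODEL

pick_model("Classify this ticket as billing or technical")  # → cheap model
pick_model("Write a detailed migration plan for our database")  # → flagship
```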
API vs. ChatGPT vs. Azure OpenAI
| Use Case | API | ChatGPT | Azure OpenAI |
|---|---|---|---|
| Prototype in web UI | No | Best | Needs infra |
| Production application | Recommended | Not designed for this | Enterprise choice |
| Compliance (GDPR/HIPAA) | No (data sent to OpenAI) | No | Yes (data residency) |
| Custom fine-tuning | Yes | No | Yes |
Error Handling and Resilience
```python
import time
from openai import RateLimitError, APIConnectionError

def call_openai_with_retry(prompt, max_retries=3, model="gpt-5.4"):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                timeout=30
            )
            return response.choices[0].message.content
        except (RateLimitError, APIConnectionError):
            # Exponential backoff: 1s, 2s, 4s, ...
            time.sleep(2 ** attempt)
    raise Exception(f"Failed after {max_retries} retries")
```
Key Properties and SLAs
| Property | Specification |
|---|---|
| Latency (GPT-5.4) | <1s median, <2s p99 |
| Availability | 99.9% SLA |
| Rate limit | Tier 1: 10K TPM, 200 RPM |
| Max context | 200K tokens (o-series); 128K (GPT-5.4) |