OpenAI API and Agents SDK

The OpenAI API provides REST endpoints, streaming, function calling, and structured outputs; the new Agents SDK enables multi-agent systems with handoffs, guardrails, and full observability for production AI.


API Fundamentals

The OpenAI API is REST-based with these core capabilities: Chat Completions, Streaming, Function Calling, Structured Outputs, Vision, Audio (Whisper + TTS), Embeddings, and Fine-tuning.

Basic Chat Completion Flow

import openai

client = openai.OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement to a 5-year-old."}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Streaming for Real-Time Responses

stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Function Calling (Tool Use)

Function calling lets the model request that you execute a tool, then incorporate the result back into reasoning.

import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price for a ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker symbol (e.g., AAPL, TSLA)"
                    }
                },
                "required": ["ticker"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the current price of AAPL?"}]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
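When the model decides to use a tool, the response carries `tool_calls` rather than text; you execute the tool yourself and send the result back in a follow-up request. A minimal sketch of the local dispatch step, with a stubbed `get_stock_price` implementation (the stub prices are illustrative, not real data):

```python
import json

def get_stock_price(ticker: str) -> float:
    """Hypothetical local implementation backing the get_stock_price tool."""
    prices = {"AAPL": 189.50, "TSLA": 242.10}  # stub data for illustration
    return prices.get(ticker.upper(), 0.0)

def execute_tool_call(name: str, arguments_json: str) -> str:
    """Run the tool the model requested and return a JSON string result."""
    args = json.loads(arguments_json)
    if name == "get_stock_price":
        return json.dumps({"price": get_stock_price(args["ticker"])})
    raise ValueError(f"Unknown tool: {name}")

# Feeding results back (sketch): append the assistant message, then one
# "tool" message per call, and make a second API request.
# for call in response.choices[0].message.tool_calls:
#     messages.append(response.choices[0].message)
#     messages.append({
#         "role": "tool",
#         "tool_call_id": call.id,
#         "content": execute_tool_call(call.function.name, call.function.arguments),
#     })
# followup = client.chat.completions.create(model="gpt-5.4", messages=messages, tools=tools)
```

The dispatch function is deliberately separate from the API plumbing so it can be unit-tested without network access.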

Structured Outputs: Type-Safe JSON

Structured outputs guarantee that the model’s response matches your JSON schema exactly – no parsing errors, no unexpected fields.

from pydantic import BaseModel

class EmailAnalysis(BaseModel):
    sentiment: str
    urgency: str
    category: str
    action_required: bool
    summary: str

response = client.beta.chat.completions.parse(
    model="gpt-5.4",
    messages=[{
        "role": "user",
        "content": "Classify this email: 'Hi, my account was charged twice and I need a refund ASAP!'"
    }],
    response_format=EmailAnalysis
)

result = response.choices[0].message.parsed
# result is an EmailAnalysis instance, e.g. result.sentiment == "negative",
# result.urgency == "high"

Why structured outputs matter:

  1. No parsing errors. Guaranteed valid JSON matching your schema.
  2. Type safety. No surprise values.
  3. Cost savings. 10-15% token savings at scale.
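Point 2 can be pushed further: declaring `Literal` fields in the Pydantic schema makes the allowed values part of the contract, so an out-of-vocabulary label fails at parse time instead of slipping into your pipeline. A sketch (the field vocabularies here are illustrative):

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class StrictEmailAnalysis(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    urgency: Literal["low", "medium", "high"]
    action_required: bool

# A conforming payload parses into a typed object:
good = StrictEmailAnalysis.model_validate_json(
    '{"sentiment": "negative", "urgency": "high", "action_required": true}'
)

# An out-of-vocabulary label is rejected outright:
try:
    StrictEmailAnalysis.model_validate_json(
        '{"sentiment": "angry", "urgency": "high", "action_required": true}'
    )
except ValidationError:
    pass  # "angry" is not in the declared sentiment vocabulary
```

With structured outputs, the same `Literal` constraints are compiled into the JSON schema sent to the model, so the model is steered toward the allowed values rather than merely checked after the fact.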

OpenAI Agents SDK (New, 2025)

The Agents SDK is OpenAI’s Python library for building multi-agent systems with handoffs, guardrails, and full observability.

Core Concepts

  • Agent: An AI system with an LLM, instructions, tools, and the ability to hand off to other agents
  • Handoff: Transfer control to another agent with context
  • Guardrails: Input/output validation to ensure agents stay within bounds
  • Runner: Executes the agent loop
  • Tracing: Every LLM call, tool invocation, and handoff is logged

Multi-Agent Architecture Example

from openai.agents import Agent, Handoff, Runner
from openai.agents.tools import WebSearch, FileSearch, CodeInterpreter

research_agent = Agent(
    name="Research",
    instructions="You are a research analyst. Use WebSearch to find current information.",
    tools=[WebSearch()],
    model="gpt-5.4"
)

writing_agent = Agent(
    name="Writer",
    instructions="You are a technical writer. Synthesize research into clear documents.",
    tools=[FileSearch()],
    model="gpt-5.4"
)

orchestrator_agent = Agent(
    name="Orchestrator",
    instructions="""
    Manage this research-to-document workflow:
    1. Hand off to Research agent to gather information
    2. Hand off to Writer agent to synthesize into a draft
    3. Compile final document and return to user
    """,
    handoffs=[
        Handoff(target=research_agent),
        Handoff(target=writing_agent)
    ],
    model="gpt-5.4"
)

runner = Runner(agent=orchestrator_agent)
result = runner.run(user_input="Write a technical report on vector databases")
print(f"Trace ID: {result.trace_id}")

Guardrails Example

from openai.agents import Guardrail

input_guard = Guardrail(
    name="topic_boundary",
    check=lambda user_input: (
        "confidential" not in user_input.lower() and
        "secret" not in user_input.lower()
    ),
    error_message="I can't process requests involving confidential information."
)

agent = Agent(
    name="SafeAgent",
    instructions="You are a helpful assistant.",
    guardrails=[input_guard],
    model="gpt-5.4"
)

Pricing and Cost Optimization

Current Pricing (April 2026)

Model               Input     Output
o3 (low effort)     $2/M      $4/M
o3 (high effort)    $8/M      $16/M
o3-mini             $0.40/M   $0.80/M
GPT-5.4             $3/M      $12/M
GPT-5.2 Instant     $0.10/M   $0.40/M

Cost Optimization Strategies

1. Tiered models by task: Use GPT-5.2 Instant ($0.10/M) for classification, GPT-5.4 ($3/M) only for complex analysis.
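A minimal routing sketch for strategy 1. The task labels and the `"gpt-5.2-instant"` model identifier are assumptions for illustration; check the model list for the exact strings:

```python
CHEAP_TASKS = {"classification", "extraction", "routing", "sentiment"}

def pick_model(task: str) -> str:
    """Send high-volume, low-complexity tasks to the budget tier (30x cheaper
    on input per the pricing table above); reserve the flagship for the rest."""
    return "gpt-5.2-instant" if task in CHEAP_TASKS else "gpt-5.4"
```

Usage is a one-line change at the call site: `client.chat.completions.create(model=pick_model("classification"), ...)`.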

2. Structured output to reduce tokens:

# Bad: model generates prose + JSON (~300 tokens)
# Good: structured output forces concise response (~20 tokens, 15x savings!)

3. Caching: 90% cost savings on cached tokens for repeated queries on the same document.
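Prompt caching keys on a matching prompt prefix (in the real OpenAI API it kicks in automatically once the prompt exceeds a minimum length), so the practical rule is: put the large, unchanging content first and the varying question last. A sketch of that ordering; the cache-hit accounting itself happens server-side:

```python
def build_messages(document: str, question: str) -> list:
    """Stable content first: repeated calls over the same document then share
    a cacheable prefix, and only the trailing user message varies."""
    return [
        {"role": "system",
         "content": "Answer questions about the document below.\n\n" + document},
        {"role": "user", "content": question},
    ]

doc = "..."  # the large document, identical across requests
m1 = build_messages(doc, "What is the refund policy?")
m2 = build_messages(doc, "Who is the author?")
# m1[0] == m2[0]: identical prefix, eligible for caching; only the final
# user message differs between the two requests.
```

The anti-pattern is the reverse order (question first, document last), which makes every prompt prefix unique and defeats the cache.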


API vs. ChatGPT vs. Azure OpenAI

Use Case                  API                       ChatGPT                Azure OpenAI
Prototype in web UI       No                        Best                   Need infra
Production application    Recommended               Not designed for this  Enterprise choice
Compliance (GDPR/HIPAA)   No (data sent to OpenAI)  No                     Yes (data residency)
Custom fine-tuning        Yes                       No                     Yes

Error Handling and Resilience

import time
from openai import RateLimitError, APIConnectionError

def call_openai_with_retry(prompt, max_retries=3, model="gpt-5.4"):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                timeout=30
            )
            return response.choices[0].message.content
        except (RateLimitError, APIConnectionError):
            # Transient errors: back off exponentially (1s, 2s, 4s, ...),
            # but re-raise once the retry budget is exhausted.
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
    raise RuntimeError(f"Failed after {max_retries} retries")

Key Properties and SLAs

Property            Specification
Latency (GPT-5.4)   <1s median, <2s p99
Availability        99.9% SLA
Rate limit          Tier 1: 10K TPM, 200 RPM
Max context         200K tokens (o-series); 128K (GPT-5.4)

This post is licensed under CC BY 4.0 by the author.