The OpenAI API provides REST endpoints, streaming, function calling, and structured outputs; the new Agents SDK enables multi-agent systems with handoffs, guardrails, and full observability for production AI.
API Fundamentals
The OpenAI API is REST-based with these core capabilities: Chat Completions, Streaming, Function Calling, Structured Outputs, Vision, Audio (Whisper + TTS), Embeddings, and Fine-tuning.
Basic Chat Completion Flow
```python
import openai

client = openai.OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement to a 5-year-old."}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response.choices[0].message.content)
```
Streaming for Real-Time Responses
```python
stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Function Calling
Function calling lets the model request that you execute a tool, then incorporate the result back into its reasoning.
```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price for a ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker symbol (e.g., AAPL, TSLA)"
                    }
                },
                "required": ["ticker"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What is AAPL trading at right now?"}]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
```
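That call only takes you as far as the model *requesting* the tool. To complete the round trip, you run the tool yourself and send the result back as a `role: "tool"` message. A minimal sketch, assuming a stubbed local `get_stock_price` implementation (the stub data and helper name are illustrative, not part of the API):

```python
import json

# Hypothetical local implementation of the tool the model may call.
def get_stock_price(ticker: str) -> float:
    prices = {"AAPL": 189.25, "TSLA": 242.10}  # stub data for this sketch
    return prices.get(ticker.upper(), 0.0)

def handle_tool_calls(response, messages):
    """If the model requested a tool, run it and append the result."""
    message = response.choices[0].message
    if not message.tool_calls:
        return messages  # model answered directly; nothing to execute
    messages.append(message)  # keep the assistant turn containing tool_calls
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = get_stock_price(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps({"price": result}),
        })
    return messages
```

You then pass the updated `messages` list into a second `client.chat.completions.create` call so the model can fold the price into its final answer.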
Structured Outputs: Type-Safe JSON
Structured outputs guarantee that the model's response matches your JSON schema exactly: no parsing errors, no unexpected fields.
```python
from pydantic import BaseModel

class EmailAnalysis(BaseModel):
    sentiment: str
    urgency: str
    category: str
    action_required: bool
    summary: str

response = client.beta.chat.completions.parse(
    model="gpt-5.4",
    messages=[{
        "role": "user",
        "content": "Classify this email: 'Hi, my account was charged twice and I need a refund ASAP!'"
    }],
    response_format=EmailAnalysis
)
result = response.choices[0].message.parsed
# Example output: sentiment='negative', urgency='high'
```
Why structured outputs matter:
- No parsing errors. Guaranteed valid JSON matching your schema.
- Type safety. No surprise values.
- Cost savings. 10-15% token savings at scale.
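The "no surprise values" point can be pushed into the schema itself: constraining string fields to a closed vocabulary with `Literal` means invalid labels are rejected at validation time rather than discovered downstream. A sketch, assuming the same Pydantic-based `EmailAnalysis` model (the specific label sets are illustrative):

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

# Closed vocabularies: the schema, not post-hoc string checks,
# rules out values like sentiment="angry" or urgency="ASAP".
class EmailAnalysis(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    urgency: Literal["low", "medium", "high"]
    category: str
    action_required: bool
    summary: str

# A value outside the allowed set fails validation immediately:
try:
    EmailAnalysis(sentiment="angry", urgency="high", category="billing",
                  action_required=True, summary="Double charge, refund requested")
except ValidationError:
    print("rejected")
```

With structured outputs, the constrained schema is also sent to the model, so generation itself is steered toward the allowed labels.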
OpenAI Agents SDK (New, 2025)
The Agents SDK is OpenAI’s Python library for building multi-agent systems with handoffs, guardrails, and full observability.
Core Concepts
- Agent: An AI system with an LLM, instructions, tools, and the ability to hand off to other agents
- Handoff: Transfer control to another agent with context
- Guardrails: Input/output validation to ensure agents stay within bounds
- Runner: Executes the agent loop
- Tracing: Every LLM call, tool invocation, and handoff is logged
Multi-Agent Architecture Example
```python
from openai.agents import Agent, Handoff, Runner
from openai.agents.tools import WebSearch, FileSearch

research_agent = Agent(
    name="Research",
    instructions="You are a research analyst. Use WebSearch to find current information.",
    tools=[WebSearch()],
    model="gpt-5.4"
)

writing_agent = Agent(
    name="Writer",
    instructions="You are a technical writer. Synthesize research into clear documents.",
    tools=[FileSearch()],
    model="gpt-5.4"
)

orchestrator_agent = Agent(
    name="Orchestrator",
    instructions="""
    Manage this research-to-document workflow:
    1. Hand off to Research agent to gather information
    2. Hand off to Writer agent to synthesize into a draft
    3. Compile final document and return to user
    """,
    handoffs=[
        Handoff(target=research_agent),
        Handoff(target=writing_agent)
    ],
    model="gpt-5.4"
)

runner = Runner(agent=orchestrator_agent)
result = runner.run(user_input="Write a technical report on vector databases")
print(f"Trace ID: {result.trace_id}")
```
Guardrails Example
```python
from openai.agents import Guardrail

input_guard = Guardrail(
    name="topic_boundary",
    check=lambda user_input: (
        "confidential" not in user_input.lower() and
        "secret" not in user_input.lower()
    ),
    error_message="I can't process requests involving confidential information."
)

agent = Agent(
    name="SafeAgent",
    instructions="You are a helpful assistant.",
    guardrails=[input_guard],
    model="gpt-5.4"
)
```
Pricing and Cost Optimization
Current Pricing (April 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| o3 (low effort) | $2 | $4 |
| o3 (high effort) | $8 | $16 |
| o3-mini | $0.40 | $0.80 |
| GPT-5.4 | $3 | $12 |
| GPT-5.2 Instant | $0.10 | $0.40 |
Cost Optimization Strategies
1. Tiered models by task: Use GPT-5.2 Instant ($0.10/M) for classification, GPT-5.4 ($3/M) only for complex analysis.
2. Structured outputs to reduce tokens: a free-form answer mixes prose with JSON (~300 tokens), while a structured output returns only the schema fields (~20 tokens, roughly 15x fewer output tokens).
3. Caching: 90% cost savings on cached tokens for repeated queries on the same document.
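Strategy 1 can be sketched as a tiny router. The heuristic below (keyword match plus prompt length as a complexity proxy) is purely illustrative; in production you would more likely route on a task type known ahead of time:

```python
# Hypothetical router for the tiered-model strategy: cheap model for
# classification-style tasks, the flagship only for complex work.
CHEAP_MODEL = "gpt-5.2-instant"   # $0.10/M input
FLAGSHIP_MODEL = "gpt-5.4"        # $3/M input

SIMPLE_KEYWORDS = ("classify", "label", "categorize", "yes or no", "extract")

def pick_model(prompt: str) -> str:
    """Crude routing heuristic: keyword match + length as a complexity proxy."""
    text = prompt.lower()
    if any(k in text for k in SIMPLE_KEYWORDS) and len(prompt) < 500:
        return CHEAP_MODEL
    return FLAGSHIP_MODEL

pick_model("Classify this ticket as billing or technical")  # → cheap model
pick_model("Write a detailed migration plan for our database")  # → flagship
```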
API vs. ChatGPT vs. Azure OpenAI
| Use Case | API | ChatGPT | Azure OpenAI |
|---|---|---|---|
| Prototype in web UI | No | Best | Needs infra |
| Production application | Recommended | Not designed for this | Enterprise choice |
| Compliance (GDPR/HIPAA) | No (data sent to OpenAI) | No | Yes (data residency) |
| Custom fine-tuning | Yes | No | Yes |
Error Handling and Resilience
```python
import time
from openai import RateLimitError, APIConnectionError

def call_openai_with_retry(prompt, max_retries=3, model="gpt-5.4"):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                timeout=30
            )
            return response.choices[0].message.content
        except (RateLimitError, APIConnectionError):
            # Exponential backoff: 1s, 2s, 4s, ...
            time.sleep(2 ** attempt)
    raise Exception(f"Failed after {max_retries} retries")
```
Key Properties and SLAs
| Property | Specification |
|---|---|
| Latency (GPT-5.4) | <1s median, <2s p99 |
| Availability | 99.9% SLA |
| Rate limit | Tier 1: 10K TPM, 200 RPM |
| Max context | 200K tokens (o-series); 128K (GPT-5.4) |