Cloudflare AI Gateway
Cloudflare AI Gateway is the easiest on-ramp if you’re already on Cloudflare – add a URL prefix and you get caching, analytics, and rate limiting for free. The tradeoff: SaaS-only, no self-hosting, and limited advanced features compared to purpose-built gateways.
What It Is
Cloudflare AI Gateway is a managed proxy layer that sits between your application and LLM providers, running on Cloudflare’s global edge network. It provides analytics, caching, rate limiting, retries, and fallbacks for AI API calls. It is part of Cloudflare’s Developer Platform (Workers ecosystem).
Unlike Kong, Portkey, or LiteLLM, Cloudflare AI Gateway is SaaS-only – there is no self-hosted option. Your LLM traffic routes through Cloudflare’s network.
Key Features
Analytics Dashboard
Track requests, tokens, costs, errors, and latency across all providers. See cache hit rates, error breakdowns, and usage patterns over time. Useful for understanding usage but not as deep as Portkey or dedicated observability tools.
Caching
Serves identical requests from Cloudflare’s global cache:
- Reduces latency by up to 90% for repeated queries
- Exact-match caching (not semantic)
- Configurable cache TTL (up to 1 month)
- Per-request cache control via HTTP headers
- 25MB max request size
Rate Limiting
Control request volume per time window:
- Sliding window or fixed window
- Per-gateway rate limits
- Protects against abuse and cost overruns
Retries & Fallbacks
Automatic retries on provider failures. Configure fallback chains to route to alternative providers when the primary is down.
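Fallbacks can also be expressed through the gateway’s Universal Endpoint, which accepts an ordered list of provider steps and tries each one until a request succeeds. A hedged sketch of building such a payload – the field names reflect the documented Universal Endpoint shape, but the keys, models, and credentials here are illustrative, so verify against the current docs:

```python
def fallback_payload(prompt: str) -> list:
    """Build a request body for the gateway's Universal Endpoint: an
    ordered list of provider steps, tried in sequence until one
    succeeds. Field names assumed from Cloudflare's documented shape -
    double-check against the current Universal Endpoint reference."""
    return [
        {
            "provider": "openai",
            "endpoint": "chat/completions",
            "headers": {"Authorization": "Bearer sk-...",
                        "Content-Type": "application/json"},
            "query": {"model": "gpt-4o-mini",
                      "messages": [{"role": "user", "content": prompt}]},
        },
        {   # Fallback step: tried only if the OpenAI step fails
            "provider": "anthropic",
            "endpoint": "v1/messages",
            "headers": {"x-api-key": "sk-ant-...",
                        "anthropic-version": "2023-06-01",
                        "Content-Type": "application/json"},
            "query": {"model": "claude-3-5-haiku-latest",
                      "max_tokens": 512,
                      "messages": [{"role": "user", "content": prompt}]},
        },
    ]
```

POST the JSON-encoded list to `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}` and the gateway walks the chain in order.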
Logging
Request/response logging for debugging and audit. Logs include prompts, completions, tokens, latency, and cost.
Provider Support
Works with OpenAI, Anthropic, Google AI Studio, HuggingFace, Azure OpenAI, AWS Bedrock, Groq, Perplexity, Cohere, Mistral, and more. Integration is URL-based – prefix your provider URL with your gateway endpoint.
Architecture
Client App
|
v
+------------------------------------------+
| Cloudflare Edge Network (global) |
| |
| +-------------------------------------+ |
| | AI Gateway | |
| | | |
| | Rate Limiting | |
| | | | |
| | Cache Check | |
| | | | |
| | Retry / Fallback Logic | |
| | | | |
| | Logging & Analytics | |
| +-------------------------------------+ |
+------------------------------------------+
|
v
LLM Provider (OpenAI, Anthropic, etc.)
Integration
Integration is URL-based – no SDK changes needed:
from openai import OpenAI

# Before
client = OpenAI(api_key="sk-...")

# After – just change the base URL
client = OpenAI(
    api_key="sk-...",
    base_url="https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai",
)
For Workers AI (Cloudflare’s own inference), integration is native and automatic.
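Outside a Worker, Workers AI models can also be reached over plain REST through the same gateway prefix pattern used for third-party providers. The URL shape below is assumed from that pattern – verify it against Cloudflare’s Workers AI documentation:

```python
def workers_ai_url(account_id: str, gateway_id: str, model: str) -> str:
    """Gateway URL for calling a Workers AI model over REST, following
    the same provider-prefix pattern as the other providers (assumed
    shape - confirm against Cloudflare's current docs)."""
    return (
        "https://gateway.ai.cloudflare.com/v1/"
        f"{account_id}/{gateway_id}/workers-ai/{model}"
    )
```

POST the inference payload to this URL with your Cloudflare API token as a bearer `Authorization` header.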
Self-Hosting
Not available. Cloudflare AI Gateway is SaaS-only. All traffic routes through Cloudflare’s network.
This is the critical constraint for enterprises with data sovereignty requirements. If your compliance posture requires that LLM traffic never leaves your network, Cloudflare AI Gateway is not an option.
Pricing / Cost Model
| Component | Cost |
|---|---|
| Gateway proxy | Free (no per-request fee) |
| Analytics & logging | Free up to 100K logs/month (Workers Free) |
| Workers Paid | $5/month for 1M logs/month |
| Caching | Free (included) |
| Rate limiting | Free (included) |
Cloudflare AI Gateway is effectively free for most use cases. The only charge arises if you exceed the free logging limit, which requires upgrading to Workers Paid ($5/month). There are no per-request gateway fees, no per-token charges, and no markup on provider costs.
This makes it the most cost-effective gateway option – but the low cost comes at the price of control (no self-hosting, no advanced guardrails, no virtual keys).
Limitations
- No self-hosting – SaaS-only, traffic must go through Cloudflare
- No guardrails – no PII detection, content filtering, or prompt injection protection at the gateway level
- No virtual keys – no per-team/per-user key management or budget isolation
- No semantic caching – exact match only
- Basic observability – dashboard is useful but not comparable to Portkey, Helicone, or Langfuse
- No model routing – you choose the provider explicitly per request; no automatic routing by complexity or cost
- Vendor lock-in – tied to Cloudflare’s platform
When to Use
Strong fit:
- Already on Cloudflare and want quick AI observability with zero setup
- Side projects, MVPs, or low-volume apps where cost must be near-zero
- Simple use cases: caching + rate limiting + basic analytics
- Teams that route to a single provider and just want a safety layer
Weak fit:
- Enterprise with data sovereignty or compliance requirements (no self-hosting)
- Teams needing guardrails, PII detection, or content filtering
- Multi-provider routing strategies with automatic fallback logic
- High-volume production where you need per-team budgets and virtual keys