Cloudflare AI Gateway
Cloudflare AI Gateway is the easiest on-ramp if you’re already on Cloudflare – add a URL prefix and you get caching, analytics, and rate limiting for free. The tradeoff: SaaS-only, no self-hosting, and limited advanced features compared to purpose-built gateways.
What It Is
Cloudflare AI Gateway is a managed proxy layer that sits between your application and LLM providers, running on Cloudflare’s global edge network. It provides analytics, caching, rate limiting, retries, and fallbacks for AI API calls. It is part of Cloudflare’s Developer Platform (Workers ecosystem).
Unlike Kong, Portkey, or LiteLLM, Cloudflare AI Gateway is SaaS-only – there is no self-hosted option. Your LLM traffic routes through Cloudflare’s network.
Key Features
Analytics Dashboard
Track requests, tokens, costs, errors, and latency across all providers. See cache hit rates, error breakdowns, and usage patterns over time. Useful for understanding usage but not as deep as Portkey or dedicated observability tools.
Caching
Serves identical requests from Cloudflare’s global cache:
- Reduces latency by up to 90% for repeated queries
- Exact-match caching (not semantic)
- Configurable cache TTL (up to 1 month)
- Per-request cache control via HTTP headers
- 25MB max request size
Rate Limiting
Control request volume per time window:
- Sliding window or fixed window
- Per-gateway rate limits
- Protects against abuse and cost overruns
Retries & Fallbacks
Automatic retries on provider failures. Configure fallback chains to route to alternative providers when the primary is down.
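Fallbacks can also be expressed through the gateway’s Universal Endpoint, which accepts an ordered list of provider steps and tries each one until a request succeeds. A hedged sketch of building such a payload – the field names reflect the documented Universal Endpoint shape, but the keys, models, and credentials here are illustrative, so verify against the current docs:

```python
def fallback_payload(prompt: str) -> list:
    """Build a request body for the gateway's Universal Endpoint: an
    ordered list of provider steps, tried in sequence until one
    succeeds. Field names assumed from Cloudflare's documented shape -
    double-check against the current Universal Endpoint reference."""
    return [
        {
            "provider": "openai",
            "endpoint": "chat/completions",
            "headers": {"Authorization": "Bearer sk-...",
                        "Content-Type": "application/json"},
            "query": {"model": "gpt-4o-mini",
                      "messages": [{"role": "user", "content": prompt}]},
        },
        {   # Fallback step: tried only if the OpenAI step fails
            "provider": "anthropic",
            "endpoint": "v1/messages",
            "headers": {"x-api-key": "sk-ant-...",
                        "anthropic-version": "2023-06-01",
                        "Content-Type": "application/json"},
            "query": {"model": "claude-3-5-haiku-latest",
                      "max_tokens": 512,
                      "messages": [{"role": "user", "content": prompt}]},
        },
    ]
```

POST the JSON-encoded list to `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}` and the gateway walks the chain in order.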
Logging
Request/response logging for debugging and audit. Logs include prompts, completions, tokens, latency, and cost.
Provider Support
Works with OpenAI, Anthropic, Google AI Studio, HuggingFace, Azure OpenAI, AWS Bedrock, Groq, Perplexity, Cohere, Mistral, and more. Integration is URL-based – prefix your provider URL with your gateway endpoint.
Architecture
Client App
|
v
+------------------------------------------+
| Cloudflare Edge Network (global) |
| |
| +-------------------------------------+ |
| | AI Gateway | |
| | | |
| | Rate Limiting | |
| | | | |
| | Cache Check | |
| | | | |
| | Retry / Fallback Logic | |
| | | | |
| | Logging & Analytics | |
| +-------------------------------------+ |
+------------------------------------------+
|
v
LLM Provider (OpenAI, Anthropic, etc.)
Integration
Integration is URL-based – no SDK changes needed:
from openai import OpenAI

# Before
client = OpenAI(api_key="sk-...")

# After – just change the base URL
client = OpenAI(
    api_key="sk-...",
    base_url="https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai",
)
For Workers AI (Cloudflare’s own inference), integration is native and automatic.
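Outside a Worker, Workers AI models can also be reached over plain REST through the same gateway prefix pattern used for third-party providers. The URL shape below is assumed from that pattern – verify it against Cloudflare’s Workers AI documentation:

```python
def workers_ai_url(account_id: str, gateway_id: str, model: str) -> str:
    """Gateway URL for calling a Workers AI model over REST, following
    the same provider-prefix pattern as the other providers (assumed
    shape - confirm against Cloudflare's current docs)."""
    return (
        "https://gateway.ai.cloudflare.com/v1/"
        f"{account_id}/{gateway_id}/workers-ai/{model}"
    )
```

POST the inference payload to this URL with your Cloudflare API token as a bearer `Authorization` header.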
Self-Hosting
Not available. Cloudflare AI Gateway is SaaS-only. All traffic routes through Cloudflare’s network.
This is the critical constraint for enterprises with data sovereignty requirements. If your compliance posture requires that LLM traffic never leaves your network, Cloudflare AI Gateway is not an option.
Pricing / Cost Model
| Component | Cost |
|---|---|
| Gateway proxy | Free (no per-request fee) |
| Analytics & logging | Free up to 100K logs/month (Workers Free) |
| Workers Paid | $5/month for 1M logs/month |
| Caching | Free (included) |
| Rate limiting | Free (included) |
Cloudflare AI Gateway is effectively free for most use cases. The only charge arises if you exceed the free logging limit, which requires upgrading to Workers Paid ($5/month). There are no per-request gateway fees, no per-token charges, and no markup on provider costs.
This makes it the most cost-effective gateway option – but the low cost comes at the price of control (no self-hosting, no advanced guardrails, no virtual keys).
Limitations
- No self-hosting – SaaS-only, traffic must go through Cloudflare
- No guardrails – no PII detection, content filtering, or prompt injection protection at the gateway level
- No virtual keys – no per-team/per-user key management or budget isolation
- No semantic caching – exact match only
- Basic observability – dashboard is useful but not comparable to Portkey, Helicone, or Langfuse
- No model routing – you choose the provider explicitly per request; no automatic routing by complexity or cost
- Vendor lock-in – tied to Cloudflare’s platform
When to Use
Strong fit:
- Already on Cloudflare and want quick AI observability with zero setup
- Side projects, MVPs, or low-volume apps where cost must be near-zero
- Simple use cases: caching + rate limiting + basic analytics
- Teams that route to a single provider and just want a safety layer
Weak fit:
- Enterprise with data sovereignty or compliance requirements (no self-hosting)
- Teams needing guardrails, PII detection, or content filtering
- Multi-provider routing strategies with automatic fallback logic
- High-volume production where you need per-team budgets and virtual keys