# Envoy AI Gateway
Envoy AI Gateway is the CNCF-backed open-source option for AI traffic management -- built on the battle-tested Envoy proxy, it brings LLM routing, credential management, and inference-aware load balancing to Kubernetes-native deployments.
## What It Is
Envoy AI Gateway (EAIGW) is an open-source project built on top of Envoy Gateway (the CNCF Kubernetes Gateway API implementation). It provides unified access to LLM providers and self-hosted models with enterprise concerns: authentication, rate limiting, cost tracking, and intelligent inference routing.
Backed by Tetrate and Bloomberg, it was the first CNCF-backed AI gateway project (v0.1 released February 2025).
## Key Features

### Unified LLM Access
Single entry point for multiple LLM providers (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, etc.) with provider-agnostic routing.
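As a sketch of what this looks like in practice, the route below dispatches requests by model name to either a hosted provider or an in-cluster backend. The resource kinds match the project's v1alpha1 CRDs, but field names and backend names here are illustrative; check the current API reference before use.

```yaml
# Illustrative sketch: one OpenAI-compatible entry point, routed by model.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route
spec:
  schema:
    name: OpenAI                      # clients speak the OpenAI API shape
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model     # model name extracted from the request
              value: gpt-4o
      backendRefs:
        - name: openai-backend        # AIServiceBackend for api.openai.com
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: llama-3-70b
      backendRefs:
        - name: self-hosted-vllm      # AIServiceBackend for in-cluster vLLM
```

Clients keep using their existing OpenAI-compatible SDKs and simply point the base URL at the gateway; the gateway translates to each provider's native API as needed.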
### Two-Tier Gateway Architecture
```
Tier 1 Gateway (ingress)
 - Authentication, global rate limiting
 - Top-level routing to provider or self-hosted cluster
            |
            v
Tier 2 Gateway (model serving)
 - Fine-grained model routing
 - Inference-aware load balancing
 - KV-cache-aware endpoint selection
```
Tier 1 handles external traffic (auth, quotas, provider routing). Tier 2 handles internal traffic to self-hosted model serving clusters with inference-specific intelligence.
### Intelligent Inference Routing (EPP)
Endpoint Picker (EPP) integration enables routing decisions based on real-time inference metrics:
- KV-cache utilization per GPU
- Queued requests per endpoint
- LoRA adapter availability
- GPU memory pressure
This is a distinguishing strength of Envoy AI Gateway and critical for organizations running self-hosted models (vLLM, llm-d, TGI).
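The EPP builds on the Gateway API Inference Extension, which groups model-serving pods into a pool and delegates per-request endpoint selection to a picker service. The sketch below shows the shape of such a pool; the group/version and field names follow the extension's alpha API and may differ by release, and the names are illustrative.

```yaml
# Illustrative sketch of an inference pool fronted by an Endpoint Picker.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama-pool
spec:
  selector:
    app: vllm-llama           # pods serving the model
  targetPortNumber: 8000      # port the model server listens on
  extensionRef:
    name: llama-epp           # EPP service consulted for each request,
                              # using KV-cache, queue depth, and LoRA metrics
```

The gateway asks the EPP which pod should receive each request, so scheduling decisions reflect live GPU state rather than round-robin or least-connections heuristics.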
### MCP Support

In 2025, EAIGW added Model Context Protocol (MCP) support: spec-compliant proxying of MCP servers with OAuth authentication and minimal deployment overhead.
### Credential Management
Centralized credential storage and rotation for LLM provider API keys. Applications never handle raw credentials.
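Concretely, a provider key lives in a Kubernetes Secret and is attached at the gateway, so it is injected into upstream requests rather than distributed to applications. The resource kind matches the project's v1alpha1 CRDs, but the field layout and names here are illustrative; verify against the current reference.

```yaml
# Illustrative sketch: provider API key held in a Secret, attached at
# the gateway so applications never handle the raw credential.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-apikey-policy
spec:
  type: APIKey
  apiKey:
    secretRef:
      name: openai-apikey     # Kubernetes Secret holding the provider key
```

Rotating the key then means updating one Secret, not redeploying every consuming application.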
### Cost & Quota Enforcement
Token-based rate limiting and cost tracking per consumer, per model, per team.
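The mechanism is that token usage extracted from LLM responses is published as request metadata, which rate-limit policies can then count against quotas. A hedged fragment of what this configuration looks like, with field names per the v1alpha1 CRD and subject to change:

```yaml
# Illustrative fragment: expose per-request token usage as metadata that
# an Envoy Gateway rate-limit policy can meter per consumer or team.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: metered-llm-route
spec:
  llmRequestCosts:
    - metadataKey: llm_total_token   # key a rate-limit policy reads
      type: TotalToken               # input + output tokens per request
```

Because limits are expressed in tokens rather than requests, a single expensive completion counts proportionally more than a short one.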
### Kubernetes-Native
Built on Kubernetes Gateway API standard. Configured via CRDs, integrates with Istio service mesh, and runs as a standard K8s deployment.
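At the base sit the standard Gateway API resources, a minimal sketch of which is below; the `controllerName` value shown is Envoy Gateway's conventional one, but confirm it against your installed version.

```yaml
# Minimal sketch of the standard Gateway API resources EAIGW builds on.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-ai-gateway
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-gateway
spec:
  gatewayClassName: envoy-ai-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
```

AI-specific routes and policies then attach to this Gateway, so existing Gateway API tooling (and Istio, if present) keeps working unchanged.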
## Architecture
```
External Clients / Agents
            |
            v
+---------------------------+
| Tier 1: Envoy AI Gateway  |
|  - Auth (JWT, API key)    |
|  - Global rate limiting   |
|  - Provider routing       |
|  - Cost tracking          |
+---------------------------+
            |
       +----+----+
       v         v
   [OpenAI]  [Self-hosted cluster]
             +---------------------------+
             | Tier 2: Inference Gateway |
             |  - EPP (endpoint picker)  |
             |  - KV-cache-aware routing |
             |  - LoRA adapter routing   |
             |  - GPU-aware balancing    |
             +---------------------------+
                        |
                  +-----+-----+
                  v     v     v
              [vLLM] [llm-d] [TGI]
```
## Self-Hosting & Pricing
Fully open-source (Apache 2.0). No enterprise license required for any feature. Self-hosted on Kubernetes.
| Component | License | Cost |
|---|---|---|
| Envoy AI Gateway | Apache 2.0 | Free |
| Envoy Gateway (dependency) | Apache 2.0 | Free |
| Envoy Proxy (foundation) | Apache 2.0 | Free |
Tetrate offers commercial support, but the open-source project is fully functional on its own.
## Limitations
- Kubernetes-only – no bare-metal or Docker Compose deployment. Requires K8s + Gateway API.
- Younger project – v0.1 in Feb 2025, still rapidly evolving. Less battle-tested than Kong.
- No built-in guardrails – no PII detection, content filtering. Relies on external guardrail services.
- No virtual keys / budget management UI – more infrastructure-level than application-level.
- Agent-to-agent routing is basic – primarily LLM and inference routing. Not A2A-protocol-aware like Kong or agentgateway.
## When to Use
Strong fit:
- Kubernetes-native organizations that want open-source AI gateway with zero licensing cost
- Running self-hosted models (vLLM, llm-d) and need inference-aware routing (EPP)
- Want to extend existing Envoy/Istio service mesh with AI traffic management
- CNCF-aligned infrastructure strategy
Weak fit:
- Not on Kubernetes – can’t use it
- Need application-level features (virtual keys, budgets, guardrails) – use Portkey or Kong
- Need mature A2A agent-to-agent routing – use agentgateway or Kong
- Small team that wants simple setup – LiteLLM or Cloudflare is easier