Kong AI Gateway

If you already run Kong for API management, Kong AI Gateway is the path of least resistance to enterprise LLM governance – it extends what you have rather than adding another proxy layer.


What It Is

Kong AI Gateway is an extension of Kong Gateway (the open-source API gateway built on Nginx/OpenResty) that adds AI-specific capabilities: LLM proxy, token-based rate limiting, semantic caching, multi-model routing, guardrails, and cost controls. It treats LLM APIs as first-class traffic, not just passthrough HTTP.

Kong is not a standalone LLM gateway – it is AI functionality layered onto a general-purpose API gateway. This is both its strength (one gateway for everything) and its constraint (heavier than purpose-built LLM proxies).


Key Features

LLM Proxy (AI Proxy Plugin)

Standardizes API signatures across providers (OpenAI, Anthropic, Azure OpenAI, Cohere, Mistral, Bedrock, etc.). Your application calls one endpoint; Kong translates to the target provider’s format.
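To make the normalization concrete, here is a toy sketch of the kind of translation the AI Proxy plugin performs on your behalf: an OpenAI-style chat request reshaped into Anthropic's message format. The function and field handling are illustrative assumptions, not Kong's actual code (Kong does this inside the gateway, configured declaratively).

```python
def to_anthropic(openai_request: dict) -> dict:
    """Illustrative: translate an OpenAI-style chat request into
    Anthropic's message format, as a gateway would on your behalf."""
    system_parts = [m["content"] for m in openai_request["messages"]
                    if m["role"] == "system"]
    return {
        "model": openai_request["model"],
        "max_tokens": openai_request.get("max_tokens", 1024),
        # Anthropic takes the system prompt as a top-level field,
        # not as a message in the list.
        "system": "\n".join(system_parts),
        "messages": [m for m in openai_request["messages"]
                     if m["role"] != "system"],
    }

request = {
    "model": "claude-sonnet-4",
    "messages": [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Summarize Kong AI Gateway."},
    ],
}
translated = to_anthropic(request)
```

The point is that your application only ever emits one shape; the per-provider quirks live in the gateway.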

Token-Based Rate Limiting

Rate limits based on token consumption, not just request count. Critical for LLM workloads where one request can consume 100K tokens while another uses 500.
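A minimal sketch of the idea, assuming a simple sliding window (Kong's actual plugin supports multiple window types and backing stores; this is not its implementation):

```python
import time

class TokenRateLimiter:
    """Toy sliding-window limiter keyed on tokens consumed, not request
    count -- the core idea behind token-based rate limiting."""

    def __init__(self, tokens_per_minute: int):
        self.limit = tokens_per_minute
        self.events = []  # (timestamp, tokens) pairs

    def allow(self, tokens: int, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop usage events outside the 60-second window.
        self.events = [(t, n) for t, n in self.events if now - t < 60]
        used = sum(n for _, n in self.events)
        if used + tokens > self.limit:
            return False  # reject: would exceed the token budget
        self.events.append((now, tokens))
        return True

limiter = TokenRateLimiter(tokens_per_minute=100_000)
limiter.allow(500, now=0.0)       # small request: fits
limiter.allow(100_000, now=1.0)   # huge request: rejected, window nearly full
```

A request-count limiter would have treated those two calls identically; a token-based one sees a 200x difference.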

Semantic Caching

Caches LLM responses based on semantic similarity of prompts (not exact match). Reduces redundant LLM calls and cuts costs for repetitive queries.
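The mechanism, reduced to a sketch: compare the new prompt's embedding against cached ones and return a stored response above a similarity threshold. Kong backs this with Redis and real embedding models; here the embeddings are hand-supplied toy vectors.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

class SemanticCache:
    """Toy semantic cache: serve a stored response when a new prompt's
    embedding is close enough to a cached one."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries = []  # (embedding, response) pairs

    def get(self, embedding):
        for cached_emb, response in self.entries:
            if cosine(embedding, cached_emb) >= self.threshold:
                return response  # cache hit: skip the LLM call entirely
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0, 0.1], "Paris is the capital of France.")
hit = cache.get([0.99, 0.01, 0.11])   # near-duplicate phrasing: hit
miss = cache.get([0.0, 1.0, 0.0])     # unrelated prompt: miss
```

The threshold is the key tuning knob: too loose and users get stale or wrong answers, too tight and the hit rate collapses to exact-match caching.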

Guardrails & Content Safety

  • Prompt template enforcement – lock down what users can send
  • PII stripping before requests reach the LLM
  • Content filtering on responses with semantic understanding
  • Integration with third-party guardrail providers
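As a feel for the PII-stripping step, here is a minimal regex-based scrubber. The two patterns (email, US-style SSN) are assumptions for demonstration; Kong's guardrail plugins are configurable and cover far more categories, with semantic rather than purely pattern-based detection.

```python
import re

# Illustrative patterns only -- not Kong's detection rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def strip_pii(prompt: str) -> str:
    """Redact recognizable PII before the prompt leaves the gateway."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"<{label}>", prompt)
    return prompt

clean = strip_pii("Contact jane.doe@example.com, SSN 123-45-6789.")
# "Contact <EMAIL>, SSN <SSN>."
```

Running this at the gateway rather than in each application means the policy is enforced once, uniformly, for every team calling an LLM.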

Dollar-Based Quotas

Set budget caps in actual currency, not just tokens. Prevents cost overruns at the organizational level.
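The arithmetic behind a dollar cap is simple: translate token usage into spend via per-model prices and reject calls once the cap is reached. The prices below are example figures, not current provider pricing, and the class is a sketch of the concept rather than Kong's plugin.

```python
# Example per-million-token prices -- illustrative, not real pricing.
PRICE_PER_MTOK = {"gpt-4o": {"in": 2.50, "out": 10.00}}

class DollarQuota:
    """Toy org-level budget gate: tokens -> dollars -> allow/deny."""

    def __init__(self, monthly_cap_usd: float):
        self.cap = monthly_cap_usd
        self.spent = 0.0

    def charge(self, model: str, tokens_in: int, tokens_out: int) -> bool:
        p = PRICE_PER_MTOK[model]
        cost = tokens_in / 1e6 * p["in"] + tokens_out / 1e6 * p["out"]
        if self.spent + cost > self.cap:
            return False  # over budget: block the request
        self.spent += cost
        return True

quota = DollarQuota(monthly_cap_usd=100.0)
quota.charge("gpt-4o", tokens_in=200_000, tokens_out=50_000)  # costs $1.00
```

Token quotas alone break down across models with different prices; a dollar quota normalizes them into the unit finance actually cares about.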

Multi-Model Routing

Route requests to different models based on rules: complexity, cost tier, latency requirements, or custom headers.
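A routing policy of this kind might look like the following sketch. The rule order, tier names, and model choices are assumptions for illustration; in Kong this is expressed as gateway configuration, not application code.

```python
def route(request: dict) -> str:
    """Illustrative model-routing rules, evaluated in priority order."""
    if request.get("x-cost-tier") == "economy":
        return "gpt-4o-mini"        # custom header: cheapest model wins
    if request.get("prompt_tokens", 0) > 50_000:
        return "claude-sonnet-4"    # long-context requests
    if request.get("latency_sensitive"):
        return "gpt-4o-mini"        # small model for low latency
    return "gpt-4o"                 # default

route({"x-cost-tier": "economy"})   # -> "gpt-4o-mini"
route({"prompt_tokens": 80_000})    # -> "claude-sonnet-4"
route({})                           # -> "gpt-4o"
```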

MCP Gateway (Enterprise)

As of 2025, Kong added MCP (Model Context Protocol) gateway support for managing agent-to-tool communication, authentication, and governance at the protocol level.


Architecture

Client App
    |
    v
+---------------------------+
| Kong Gateway              |
|  +---------------------+  |
|  | AI Proxy Plugin     |  |  <-- Provider normalization
|  +---------------------+  |
|  | Token Rate Limiter  |  |  <-- Token-aware throttling
|  +---------------------+  |
|  | Semantic Cache      |  |  <-- Redis-backed similarity cache
|  +---------------------+  |
|  | Guardrails Plugin   |  |  <-- PII, content filtering
|  +---------------------+  |
|  | Analytics / Logging |  |  <-- Cost, latency, token metrics
|  +---------------------+  |
+---------------------------+
    |
    v
LLM Providers (OpenAI, Anthropic, Azure, Bedrock, etc.)

Kong runs as a reverse proxy (Nginx-based) with plugins executed in order per request. AI plugins slot into the same pipeline as existing auth, rate-limit, and logging plugins.
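The pipeline semantics are worth spelling out: each plugin sees the request in turn and may short-circuit (rate limit exceeded, cache hit, guardrail violation) before anything reaches the upstream provider. A toy chain, simplifying Kong's Nginx phase handlers, with hypothetical plugin functions:

```python
def run_pipeline(request, plugins, upstream):
    """Apply plugins in order; any plugin may return a response early."""
    for plugin in plugins:
        request, response = plugin(request)
        if response is not None:
            return response  # short-circuit: provider is never called
    return upstream(request)

def auth(req):
    if not req.get("api_key"):
        return req, {"status": 401}
    return req, None

def cache(req):
    if req.get("cached"):
        return req, {"status": 200, "body": "cached answer"}
    return req, None

provider = lambda r: {"status": 200, "body": "from provider"}
resp = run_pipeline({"api_key": "k", "cached": True}, [auth, cache], provider)
# cache hit: {"status": 200, "body": "cached answer"}
```

This is why plugin ordering matters in practice: putting the semantic cache after auth but before the rate limiter, say, changes whether cache hits count against a consumer's token budget.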


Self-Hosting

Kong Gateway OSS is fully self-hostable (Apache 2.0 license). However, most AI-specific plugins (AI Proxy, semantic caching, guardrails, MCP Gateway) require Kong Gateway Enterprise or Kong Konnect.

Deployment Options

| Mode | Description |
|------|-------------|
| Self-hosted (Enterprise) | Full control, runs in your infra (K8s, VMs, bare metal). Requires enterprise license. |
| Hybrid (Konnect) | Control plane in Kong's cloud, data plane in your infra. LLM traffic never leaves your network. |
| Dedicated Cloud (Konnect) | Fully managed by Kong in a dedicated environment. |
| Serverless (Konnect) | Pay-per-use, Kong-managed. |

For enterprises with data sovereignty requirements (like MMS), self-hosted Enterprise or hybrid Konnect are the relevant options.


Pricing / Cost Model

| Tier | Cost | AI Features |
|------|------|-------------|
| OSS | Free | Basic proxying, no AI plugins |
| Enterprise | Custom (contact sales) | All AI plugins, self-hosted, enterprise support |
| Konnect Plus | ~$1,050/mo per service | AI Gateway plugins, hybrid deployment |
| Konnect Enterprise | Custom | Full feature set, dedicated support, SLAs |

Pricing is not transparent – enterprise licensing requires sales conversations. This is the main friction point for teams evaluating Kong vs open-source alternatives. Expect annual contracts in the $50K-200K+ range for enterprise deployments depending on scale.


Performance

Kong published benchmarks (2024) showing:

  • 200%+ throughput vs Portkey
  • 800%+ throughput vs LiteLLM
  • Sub-millisecond added latency per request in proxy mode

Take vendor benchmarks with a grain of salt, but Kong’s Nginx foundation gives it a genuine performance advantage for high-throughput scenarios.


When to Use

Strong fit:

  • You already run Kong API Gateway and want to extend it for LLM traffic (MMS case)
  • Enterprise with existing API governance that needs to cover AI endpoints
  • High-throughput production workloads where proxy latency matters
  • Need a single gateway for both traditional APIs and LLM APIs

Weak fit:

  • Startup/small team – enterprise pricing is overkill
  • You only need LLM routing with cost tracking – LiteLLM or Portkey are simpler
  • You want fully open-source with no enterprise license dependency

This post is licensed under CC BY 4.0 by the author.