MCP Gateway — Requirements Summary
Purpose
Centralized, secure entry point for AI agents to invoke MCP tools exposed by product teams across MMS. Routes all MCP traffic through one gateway to enforce consistent authN/authZ, audit logging, and observability — so individual product teams don’t each rebuild those concerns.
Stakeholders
| Stakeholder | Role | Key contacts |
|---|---|---|
| AI Platform | Governance + drives AI initiatives | Kirsten Sons (VP), Felix Meyner (Platform Owner), Erik Neurohr (Principal), Imrul Sheikh (EM) |
| Product Teams | Expose capabilities via MCP | Ervis Duraj (Principal) |
| Cyber Security | Production-grade security review | Dimitrios Dimitriadis (Lead), Orest Horodetskyy Nagaychuk, Nuwan Rathnayaka |
| API Platform | Potential operator + self-service onboarding | Thomas Lee (Senior) |
Request flow
```mermaid
flowchart LR
  Agent[AI Agent] -->|OAuth 2.1 / API key| GW[MCP Gateway]
  GW --> Catalogue[(MCP Catalogue)]
  GW --> Auth[AuthN/AuthZ<br/>Entra ID]
  GW --> Audit[(Audit Log<br/>7d retention)]
  GW --> Metrics[(Metrics<br/>Prometheus-style)]
  GW -->|Streamable HTTP / SSE| MCP1[MCP Server A]
  GW -->|Streamable HTTP / SSE| MCP2[MCP Server B]
  GW -->|Streamable HTTP / SSE| MCPN[MCP Server N ...]
  MCP1 --> RBAC1[Resource RBAC]
  MCP2 --> RBAC2[Resource RBAC]
```
Functional requirements (F1–F4)
| # | Requirement | Essence |
|---|---|---|
| F1 | Discoverable | Central catalogue; agents enumerate tools at runtime via stable API — no manual coordination with product teams. |
| F2 | Access Control | OAuth 2.1 for users (Entra ID), API-keys for agents. Identity propagated downstream for resource-level RBAC. No human access to raw MCP endpoints. |
| F3 | Traffic Routing | Subpath → upstream host mapping, GitOps-managed. Versioned upstreams. Stage-specific routing allowed. |
| F4 | Proxy Behavior | Transparent proxy. Streamable HTTP (primary), HTTP+SSE (backward compat). Payload transforms for masking and extra auth. |
Non-functional requirements (NF1–NF9)
```mermaid
flowchart TB
  subgraph Performance
    NF1[NF1 Latency<br/>P99 < 100ms overhead]
    NF2[NF2 Scalability<br/>100+ upstreams,<br/>horizontal, independent onboarding]
    NF3[NF3 Availability<br/>99.XXXX%, multi-region,<br/>isolated from upstream health]
    NF8[NF8 Instance Separation<br/>MCP traffic isolated<br/>from classic API gateway]
  end
  subgraph Observability
    NF4[NF4 Monitoring<br/>requests, bytes, duration<br/>+ MCP tags]
    NF5[NF5 Auditability<br/>per-invocation trace,<br/>7-day retention]
  end
  subgraph Governance
    NF6[NF6 Self-Service<br/>onboard/decommission/<br/>access mgmt]
    NF7[NF7 Freeze Bypass<br/>criteria + escalation +<br/>VP/Platform-Owner decision]
    NF9[NF9 Security<br/>CyberSec review, OWASP,<br/>rate limits, approval gates]
  end
```
Monitoring tags (NF4): `response_code`, `upstream_response_code`, `mcp_method`, `tool_name`, `resource_uri`, `prompt_name`.
Audit payload (NF5): `timestamp`, `user_id`, `agent_identity`, `mcp_server`, `mcp_method`, plus a method-specific field (`tool_name` / `resource_uri` / `prompt_name`), `response_code`, `upstream_response_code`.
Rate limiting (NF9): company / MCP server / tool / session level.
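For illustration, one NF5 audit entry with the fields above might be assembled like this (field names are taken from the doc; the helper and the example values are invented):

```python
import json
from datetime import datetime, timezone

def audit_record(user_id, agent_identity, mcp_server, mcp_method,
                 response_code, upstream_response_code, **method_specific):
    """Build one NF5 audit entry. The method-specific key is tool_name,
    resource_uri, or prompt_name, depending on mcp_method."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "agent_identity": agent_identity,
        "mcp_server": mcp_server,
        "mcp_method": mcp_method,
        "response_code": response_code,
        "upstream_response_code": upstream_response_code,
        **method_specific,
    }

rec = audit_record("u123", "agent-billing", "billing-mcp", "tools/call",
                   200, 200, tool_name="create_invoice")
print(json.dumps(rec, indent=2))
```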
Open questions (from source doc)
- AI Platform: foreseeable number of MCP tools? (drives sizing for NF2)
- API Platform: appropriate MCP SLA?
- API Platform: use case for auto-transforming OAS → MCP Server?
Gaps & critical challenges to raise in the team session
Items below are based on the current MCP threat landscape and production learnings (April 2026). Not exhaustive — each one is a concrete question you can push on.
Security & identity (highest priority)
- Token passthrough is forbidden — but F2 implies it. F2 says “identities should be propagated to MCP Servers.” The MCP Authorization spec explicitly prohibits the gateway from forwarding the user’s access token downstream (confused-deputy problem). The gateway must do an OAuth 2.1 token exchange (RFC 8693 / On-Behalf-Of flow) and mint a new, audience-scoped token per upstream. Ask: is this the intended design, and is it called out?
- Resource Indicators (RFC 8707) missing. MCP clients are required to include the `resource` parameter in authorization and token requests so a token stolen from server A cannot be replayed against server B. Not mentioned in F2.
- Audience claim (`aud`) validation per upstream. Each MCP server must reject tokens whose `aud` isn’t its own identifier. Where is this enforced — gateway, upstream, or both?
- Tool poisoning / description injection. Tool descriptions are prompt context for agents; malicious descriptions can hijack the model. Who reviews tool descriptions? Is there static validation at registration? Signed manifests? (Not covered by the F1 catalogue or NF9.)
- Supply-chain risk on MCP server images/packages. 2025–26 incidents: the postmark-mcp npm backdoor, the WhatsApp MCP rug-pull, CVEs in `mcp-server-git`. NF9 mentions OWASP checks — do those cover dependency provenance, image signing, SBOM?
- Indirect prompt injection in tool responses. F4 mentions masking sensitive data on egress, but not scanning upstream responses for injected instructions that could hijack the calling agent. Is there content inspection or response sanitization?
- MCP sampling attack surface. If any MCP server uses the sampling capability (server asks the agent’s model to run inference), malicious servers can drain compute quotas or exfiltrate via model output. Is sampling allowed? If yes, where’s the guardrail?
- Fail-open vs fail-closed when the gateway can’t reach the IdP / catalogue / audit sink. Not specified. Security-critical: a silent fail-open is a compliance incident.
- Egress / data-exfiltration controls. No mention of URL allow-listing or network egress policy on MCP servers. An injected tool call with a malicious `resource_uri` can leak data.
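To make the token-exchange point in the first bullet concrete: an RFC 8693 exchange is a plain form-encoded POST to the IdP’s token endpoint. A minimal sketch of the request parameters (the grant-type and token-type URNs are the real RFC 8693 values; the token and audience are placeholders):

```python
from urllib.parse import urlencode

def build_token_exchange_request(subject_token: str, upstream_audience: str) -> dict:
    """Form parameters for an RFC 8693 token-exchange request. The gateway
    sends this to the IdP's token endpoint instead of forwarding the user's
    own access token downstream."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        # RFC 8707 resource indicator: scopes the minted token to ONE upstream
        "resource": upstream_audience,
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
    }

params = build_token_exchange_request("eyJ...", "https://mcp-server-a.internal.example")
body = urlencode(params)  # POST body for the IdP token endpoint
```

The key property to demand in review: the `resource` value differs per upstream, so the minted token is useless against any other MCP server.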
Scalability & session management
- Streamable HTTP is stateful. Horizontal scaling (NF2) fights with session affinity. How are sessions externalized — sticky routing, shared session store, or are sessions fully stateless at the gateway? Well-known MCP production pain point.
- SSE deprecation timeline. F4 supports HTTP+SSE “for backward compatibility.” When does it sunset? Security/ops risk if it lingers indefinitely.
- Circuit breakers & backpressure on upstreams. NF3 says gateway health is independent of upstream health — but how? Per-upstream concurrency limits, circuit breakers, budgeted timeouts? Without this, one slow upstream eats gateway threads.
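For the circuit-breaker question above, the mechanism itself is small; a minimal per-upstream breaker sketch (thresholds are illustrative, not tuned values):

```python
import time

class CircuitBreaker:
    """Minimal per-upstream breaker: after `max_failures` consecutive
    failures, reject calls for `cooldown` seconds instead of tying up
    gateway workers on a dead or slow upstream."""
    def __init__(self, max_failures: int = 5, cooldown: float = 30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None  # half-open: let one probe request through
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
```

A real deployment would pair this with per-upstream concurrency limits and budgeted timeouts; the breaker alone only stops the bleeding after failures start.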
Cost & quota (gap — fully missing)
- No token/compute budget enforcement. Rate limiting (NF9) ≠ cost governance. A runaway agent can burn through LLM + tool-execution spend in minutes. Missing: per-tenant, per-agent, per-tool cost budgets, checked before dispatch. Industry benchmark: Unity AI Gateway, Portkey, Bifrost all treat this as table-stakes.
- Tool-level quotas. Rate limits at “session level” (NF9) are too coarse. Needs per-agent × per-tool quotas with burst handling.
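A pre-dispatch budget check of the kind described above can be sketched as follows (the granularity and the fail-closed default are assumptions to confirm, not requirements from the source doc):

```python
from collections import defaultdict

class BudgetEnforcer:
    """Pre-dispatch budget check at (tenant, agent, tool) granularity.
    The estimate is charged BEFORE the call is dispatched, so a runaway
    agent is stopped rather than billed after the fact."""
    def __init__(self, budgets: dict[tuple[str, str, str], float]):
        self.budgets = budgets              # (tenant, agent, tool) -> hard limit
        self.spent = defaultdict(float)

    def try_dispatch(self, tenant: str, agent: str, tool: str,
                     estimated_cost: float) -> bool:
        key = (tenant, agent, tool)
        limit = self.budgets.get(key)
        if limit is None:
            return False                    # fail closed: no budget, no call
        if self.spent[key] + estimated_cost > limit:
            return False
        self.spent[key] += estimated_cost
        return True
```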
Observability & governance
- End-to-end tracing across tool chains. NF5 audits single invocations. When agent A calls tool X which internally triggers tool Y on another MCP server, is there a correlating trace ID? This is the most-cited observability gap in MCP production deployments.
- Tool versioning at consumer level. F3 says upstreams are versioned, but not how an agent pins to a version or what happens on schema breakage. Deprecation policy? Compatibility window?
- Catalogue trust model. F1 says “agents query the catalogue at runtime.” Is the catalogue signed? Can a compromised catalogue register a rogue tool and be enumerated by every agent in the company?
- 7-day audit retention — is that enough for compliance? German data regulations and incident forensics often need ≥90 days. Who signed off on 7d? Is there cold storage?
- PII in audit logs. `user_id` and `resource_uri` can contain PII. GDPR impact? Log access controls and redaction policy?
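On the PII point, one common mitigation is keyed pseudonymisation of identifying fields before audit records leave the gateway; a sketch (the key handling is a placeholder, a real key belongs in a secret store with rotation):

```python
import hashlib
import hmac

PSEUDONYM_KEY = b"rotate-me"  # placeholder; fetch from a secret store in practice

def pseudonymise(value: str) -> str:
    """Keyed hash: records stay correlatable per user without storing the
    raw identifier, and destroying the key anonymises historical logs."""
    return hmac.new(PSEUDONYM_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def redact_audit_record(record: dict) -> dict:
    """Return a copy of an NF5 record with PII-bearing fields pseudonymised."""
    out = dict(record)
    for field in ("user_id", "resource_uri"):
        if field in out:
            out[field] = pseudonymise(out[field])
    return out
```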
Operational readiness
- Tool onboarding quality gates. NF6 is about self-service speed. NF9 mentions approval — but what are the quality criteria? Contract tests? Golden evals? Schema validation? Red-team review? Without these, “fast onboarding” just means fast garbage.
- Testing framework for MCP servers. No mention of how product teams test their MCP server before go-live. Sandbox environment? Test agent? Conformance suite?
- Noisy-neighbor isolation between tenants. NF8 separates MCP from classic APIs, but not tenant-from-tenant within MCP. One abusive team can degrade everyone.
- Deprecation and decommissioning flow. NF6 lists decommissioning as self-service — but what about consumer migration, sunset notices, and enforcement dates? Missing lifecycle policy.
Suggested talking-points priority for your session
- Must-fix before build: #1 (token exchange), #2 (resource indicators), #4 (tool poisoning), #13 (cost budgets), #20 (onboarding quality gates).
- Design decisions to lock in: #8 (fail-open/closed), #10 (session state), #15 (tracing), #18 (audit retention).
- Nice-to-clarify: the rest.
Kong POC — questions for the engineer to take into the call
Kong released AI Gateway 3.14 / Kong Agent Gateway in April 2026 with explicit MCP support. On paper, they cover most of the requirements. The POC should focus on validating depth of coverage, not just presence. Questions below are grouped by requirement area so the engineer can walk through them with Kong systematically.
0. Framing — ask Kong to map their features to our requirements
“We have F1–F4 and NF1–NF9. Please show us concretely which Kong plugin / Konnect feature implements each one, and where the gaps are.”
This forces them to own the mapping. Do not accept hand-waving on any single requirement.
1. Identity & access control (F2) — critical
- Token exchange flow. Kong claims the `ai-mcp-oauth2` plugin does RFC 8693 token exchange and does not pass the user token upstream by default. Ask for a live demo: show the inbound token, the exchange call, and the exchanged token going to the upstream. Confirm `aud` is rewritten per upstream.
- Entra ID integration. Kong’s `openid-connect` plugin supports any OIDC IdP. Ask specifically: any known issues with Entra ID (B2B, conditional access, multi-tenant)? Which Entra grant flows are tested?
- Resource Indicators (RFC 8707). Does `ai-mcp-oauth2` send and enforce the `resource` parameter end-to-end?
- Service-account / API-key auth for agents. Which plugin, and does it compose cleanly with the OAuth flow when an agent acts on behalf of a user? Dual-credential request semantics?
- Downstream identity propagation. Kong forwards claims as headers to upstream MCP servers. Which headers? Are they signed (preventing upstream spoofing)? Or do upstreams need to re-validate via JWKS?
- No-human-direct-access (F2). How is this enforced — a deny rule on human UA? Per-route ACL? What prevents a curl with a stolen agent API-key?
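On the identity-propagation question above: wherever the enforcement ends up, the `aud` check each MCP server should run after signature verification is small. A sketch (signature validation via JWKS is assumed to have already happened):

```python
def validate_audience(claims: dict, my_identifier: str) -> bool:
    """Reject any token whose `aud` doesn't name this MCP server.
    Per RFC 7519 the `aud` claim may be a single string or a list.
    This check runs AFTER the token signature has been verified."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == my_identifier
    if isinstance(aud, list):
        return my_identifier in aud
    return False  # missing or malformed aud: fail closed
```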
2. Prompt injection & tool poisoning (gap #4, #6, #7)
- AI Prompt Guard & Semantic Prompt Guard. What do they actually scan — only `user`-role messages, or also tool responses? We need response-side scanning to catch indirect prompt injection from upstream tools.
- Lakera Guard integration. Cost model? Latency impact on NF1 (<100ms P99)? Is Lakera data-residency compliant for the EU?
- Tool description validation at registration. When a product team onboards an MCP server, does Kong statically scan tool descriptions for poisoning patterns, or is that a separate process? If no — that’s a gap we own.
- MCP sampling. Does Kong’s MCP proxy support the MCP sampling capability? If yes, what controls prevent a malicious server from exhausting our LLM quota?
- Invisible-character / homoglyph detection. Kong claims coverage — confirm in demo with a zero-width unicode injection test.
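The zero-width test in the last bullet is easy to run ourselves regardless of what Kong demos; a trivial pre-filter sketch (the character list is non-exhaustive and would need extending for homoglyphs):

```python
# Code points commonly abused for invisible prompt injection (non-exhaustive).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def contains_invisible_chars(text: str) -> bool:
    """Flag tool descriptions or responses carrying zero-width code points,
    a cheap first gate before semantic prompt-guard scanning."""
    return any(ch in ZERO_WIDTH for ch in text)
```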
3. MCP-specific features (F1, F3, F4)
- MCP Registry / catalogue (F1). Is the catalogue in Kong Konnect signed? Who can add or modify entries? Is there an audit trail on catalogue changes? Can agents query it via a standard MCP `tools/list` through the gateway?
- Streamable HTTP support (F4). Confirmed supported. How are long-lived streams handled under horizontal scaling — sticky sessions, externalized state, or fully stateless? This is a known MCP pain point.
- HTTP+SSE backward compat (F4). Supported, and what’s Kong’s own deprecation plan for SSE?
- Payload transform for masking (F4). Which plugin — `ai-prompt-decorator` / `response-transformer` or a dedicated MCP-aware one? Can it mask inside streaming responses, not just single payloads?
- OAS → MCP auto-generation. This is Kong’s differentiator and directly addresses open question #3 from the source doc. Ask for a demo on a real OAS spec, and push on: what happens when the OAS is incomplete or ambiguous? Manual override? Versioning of the generated MCP server?
- GitOps routing config (F3). Kong supports decK / Konnect-as-code. What’s the actual workflow — git PR → CI → Konnect sync? Drift detection?
4. Scalability, resilience, multi-region (NF1, NF2, NF3, NF8)
- P99 <100ms overhead (NF1). Ask for Kong’s own benchmark with ai-mcp-oauth2 + prompt-guard + rate-limiting enabled (not a naked proxy pass-through). That’s our real config. Request the latency distribution, not just the median.
- Horizontal scaling with streaming MCP sessions (NF2). How does Kong DP handle long-lived streams across pod restarts, rolling updates, autoscaling events?
- Multi-region failover (NF3). Active-active or active-passive? RPO/RTO for Konnect control-plane failure? What breaks when the control plane is unreachable — data plane keeps serving from last cached config, fails open, fails closed?
- Circuit breakers & upstream isolation (NF3). How are slow upstreams prevented from starving the gateway? Per-upstream thread pool? Concurrency limits per route?
- Instance separation from classic APIs (NF8). Do we run a separate Kong cluster for MCP, or a separate workspace in Konnect? Resource isolation guarantees?
5. Cost control & quotas (gap #13, #14) — critical
- AI Rate Limiting Advanced plugin. Kong does token-based rate limiting with input/output/total token limits per user/app/time window. Confirm:
  - Enforcement before upstream dispatch (not after — else a runaway has already spent the tokens).
  - Per-agent × per-tool granularity (not just per-route).
  - Does this work for MCP tool calls the same way it works for raw LLM calls? (MCP calls don’t always have a token concept — how is cost calculated?)
- Budget vs. rate limit. Can we set a monthly hard budget per tenant, or only rolling rate limits? Alerting at 80/90/100%?
- Cost reporting. What dashboards come out of the box in Konnect? Can we export to our BigQuery / FinOps tooling?
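On the “how is cost calculated for MCP calls” question above: one pragmatic stopgap (an assumption to pressure-test, not Kong’s documented method) is to charge by serialized payload size, rounded up to billing units:

```python
import json

def estimate_mcp_cost_units(request: dict, chars_per_unit: int = 4000) -> int:
    """Crude cost proxy for MCP calls that have no native token count:
    bill by serialized request size, rounded up to whole units."""
    size = len(json.dumps(request, separators=(",", ":")))
    return -(-size // chars_per_unit)  # ceiling division
```

This under-counts tool calls whose *responses* are large; a production scheme would charge a provisional unit pre-dispatch and reconcile on response size.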
6. Observability & audit (NF4, NF5, gap #15, #18)
- Metrics. Does Kong emit the MCP tags we listed (`mcp_method`, `tool_name`, `resource_uri`, `prompt_name`) out of the box, or do we need custom plugins?
- OpenTelemetry. End-to-end tracing when one tool call triggers another MCP call through a different agent — does trace propagation just work, or is it on us?
- Audit log structure. Does Kong’s audit plugin emit exactly the NF5 schema, or do we need a post-processor? Where do logs land — Konnect, our sink, both?
- Retention. NF5 says 7d. Kong Konnect stores how long? Can we stream to our own long-term sink (GCS, BigQuery) for compliance?
- PII redaction. GDPR. Built-in claim redaction on audit records? Pseudonymisation options?
7. Onboarding & governance (NF6, NF7, NF9, gap #20)
- Self-service MCP onboarding. What does the product-team experience look like end-to-end? PR to a repo → automatic catalogue entry? Or a Konnect UI workflow? Who approves?
- Quality gates at onboarding. Can we plug in custom validation (contract tests, schema lint, red-team eval) as a gate before an MCP server goes live? Or is onboarding pure config?
- Freeze bypass (NF7). Nothing Kong-specific here — it’s our own process. But: does Kong support feature flags / staged rollouts so a bypass change can be reverted fast?
- Security review hooks (NF9). Can a new MCP server be registered in “canary” mode (limited consumers) before full rollout? OWASP check integration?
8. Agent-to-agent / future-proofing
- A2A support. Kong announced agent-to-agent traffic support in 2026. What’s the MMS relevance — do we have agent-to-agent use cases in our roadmap that we’d want to use the same gateway for, or is this premature?
- MCP spec cadence. MCP is moving fast. What’s Kong’s SLA on supporting new spec versions? Breaking-change policy?
9. Commercial & organisational fit
- Licensing model. Kong Enterprise vs. Konnect SaaS vs. self-hosted Konnect — which tier do we need for MCP features, AI plugins, and what are the cost drivers (data planes, requests, tokens)?
- Data residency. Can Konnect control plane be EU-hosted? Any telemetry leaving EU?
- Support SLA. Response times, dedicated TAM, hotfix process?
- Open-source vs. enterprise. Which of the plugins we need (ai-mcp-oauth2, ai-rate-limiting-advanced, lakera-guard, prompt-guard) are OSS vs. paid? Lock-in risk?
- Reference customers. Who’s running Kong MCP Gateway in production at our scale? Preferably EU, regulated, 100+ upstream services.
Demo scenarios to request from Kong
Don’t accept slides — ask them to demonstrate these live or in a shared sandbox:
- End-to-end OAuth flow: Entra ID user → gateway → token exchange → two different MCP servers with distinct audiences. Show the tokens at each hop.
- Prompt injection: send a malicious tool description through the onboarding flow; show it being blocked. Then a successful tool call with a poisoned response; show Prompt Guard catching it.
- Cost control: burn through a tenant’s token budget in a loop; show rate limit kicking in before upstream dispatch.
- Streaming session under rolling update: open a long-lived MCP session; restart a data plane pod; observe client behaviour.
- Upstream failure isolation: hang one upstream MCP server; show gateway remains healthy and other upstreams unaffected.
- Onboarding workflow: from a product team’s perspective, add a new MCP server via their self-service path, end-to-end, in <15 minutes.
Non-negotiables to confirm before signing
- ✅ Token exchange (RFC 8693), no passthrough
- ✅ Entra ID production validation
- ✅ Token-based cost budgets enforced pre-dispatch
- ✅ Response-side prompt-injection scanning
- ✅ Multi-region deployment with documented failover
- ✅ Audit log export to our sink for ≥90d retention
- ✅ P99 <100ms with our full plugin stack enabled (not a naked config)
- ✅ EU data residency
References
- MCP Spec — Authorization
- MCP Spec — Transports
- ADR-0003 — MCP Gateway architectural pattern
- Introducing Kong’s Enterprise MCP Gateway — Kong
- Kong AI Gateway 3.14 Release — Agent Gateway, A2A, MCP
- Kong AI/MCP Gateway and MCP Server Technical Breakdown
- Kong AI MCP OAuth2 Plugin Docs (token exchange, JWKS)
- Kong AI Prompt Guard Plugin
- Kong AI Semantic Prompt Guard Plugin
- Kong AI Rate Limiting Advanced Plugin (token budgets)
- Kong Token Exchange On-Behalf-Of Pattern (NTTDATA)
- MCP OAuth 2.1, Token Delegation, and the Confused Deputy Problem — FlowHunt
- The New MCP Authorization Specification (Resource Indicators RFC 8707)
- The State of MCP Security 2026 — PipeLab
- Top 10 MCP Security Risks — Prompt Security
- New Prompt Injection Attack Vectors Through MCP Sampling — Palo Alto Unit 42
- Protecting against indirect prompt injection in MCP — Microsoft
- What it takes to run MCP in production — ByteBridge
- Enterprise-Grade Security for MCP: Frameworks and Mitigation (arXiv)
- Rate Limiting in AI Gateway — TrueFoundry
- MCP Horror Stories: GitHub Prompt Injection Data Heist — Docker