MCP Gateway — Detailed Specification

Technical specification for a centralized MCP Gateway with detailed functional and non-functional requirements, security considerations, and stakeholder sign-off matrix.

The MCP Gateway provides a secure, observable, and self-service entry point for AI agents to invoke Model Context Protocol (MCP) tools exposed by engineering teams across the company. By routing all MCP traffic through a centralized gateway, the platform enforces consistent authentication, access control, and audit logging — without burdening individual product teams with those concerns.

Stakeholders

| Stakeholder | Description | Contact Persons |
| --- | --- | --- |
| AI Platform | Governs and drives AI-related initiatives | Sons, Kirsten / VP; Meyner, Felix / Platform Owner; Neurohr, Erik / Principal Engineer; Sheikh, Imrul Hassan / Engineering Manager |
| Product Team | Exposes existing or new capabilities via MCP | Duraj, Ervis / Principal Engineer |
| Cyber Security | Ensures that the gateway and the deployed MCP tools are production-grade from a security standpoint | Dimitriadis, Dimitrios / Lead; Horodetskyy Nagaychuk, Orest / Exp. Engineer; Rathnayaka, Nuwan / Exp. Engineer |
| API Platform | Potential operator of the gateway; provides automated self-service and guidance for onboarding a new MCP tool | Thomas, Lee / Senior Engineer |

Functional Requirements

F1 Discoverable

  • A central MCP catalogue lists all registered MCP Servers and their available tools.
  • AI agents can query the catalogue at runtime to enumerate available tools without prior knowledge of upstream server addresses.
  • The catalogue is accessible to authorized consumers via a stable API; no manual coordination with product teams is required to discover tools.
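
As a minimal sketch of the catalogue idea, the following in-memory registry lists tools by gateway path while keeping upstream addresses private. The class names, field names, and example addresses are hypothetical and not part of this specification.

```python
from dataclasses import dataclass, field


@dataclass
class McpServerEntry:
    """One registered MCP Server; all field names are illustrative."""
    name: str
    gateway_path: str   # stable path on the gateway that agents call
    upstream_url: str   # internal address, never returned to agents
    tools: list[str] = field(default_factory=list)


class Catalogue:
    def __init__(self) -> None:
        self._servers: dict[str, McpServerEntry] = {}

    def register(self, entry: McpServerEntry) -> None:
        self._servers[entry.name] = entry

    def list_tools(self) -> list[dict[str, str]]:
        """Return every tool with its gateway path, hiding upstream addresses."""
        return [
            {"server": s.name, "tool": t, "gateway_path": s.gateway_path}
            for s in sorted(self._servers.values(), key=lambda s: s.name)
            for t in s.tools
        ]


catalogue = Catalogue()
catalogue.register(
    McpServerEntry("orders", "/mcp/orders", "http://orders.internal:8080", ["get_order"])
)
```

An agent that calls `catalogue.list_tools()` sees only the server name, tool name, and gateway path, so it needs no knowledge of where the upstream server is hosted.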

F2 Access Control

  • The gateway controls who has access to which MCP tools.
  • The gateway must implement token-based authentication:
    • Support authentication per the OAuth 2.1 specification to identify the user behind the AI agent, ideally through the company-standard OAuth provider (Entra ID).
    • The gateway should also support authentication for external services, via either an internal or an external OAuth provider.
  • The gateway should support authentication with a service-account API key to identify AI agents.
  • Identities should then be propagated to the MCP Servers for further role-based access control on specific resources (e.g., a user viewing their own webshop account data).
  • No human should call the exposed MCP endpoint on the gateway directly.
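
The identification and propagation steps above could look roughly like the sketch below. It assumes the API key arrives in an `X-Api-Key` header and that identities are forwarded as `X-Agent-Identity` and `X-User-Id` headers; these header names are hypothetical. The `verify_token` callable is a stand-in for real OAuth token validation against the provider.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Identity:
    agent_identity: Optional[str]   # resolved from a service-account API key
    user_id: Optional[str]          # resolved from a validated OAuth access token


class Unauthenticated(Exception):
    pass


def resolve_identity(
    headers: Mapping[str, str],
    api_keys: Mapping[str, str],
    verify_token: Callable[[str], Optional[str]],
) -> Identity:
    """Identify the agent (API key) and/or the user (bearer token).

    verify_token returns the user id, or None if the token is invalid.
    """
    agent = api_keys.get(headers.get("X-Api-Key", ""))
    user = None
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        user = verify_token(auth[len("Bearer "):])
    if agent is None and user is None:
        raise Unauthenticated("no valid API key or bearer token")
    return Identity(agent_identity=agent, user_id=user)


def propagation_headers(identity: Identity) -> dict[str, str]:
    """Headers forwarded upstream so the MCP Server can apply its own RBAC."""
    out: dict[str, str] = {}
    if identity.agent_identity:
        out["X-Agent-Identity"] = identity.agent_identity
    if identity.user_id:
        out["X-User-Id"] = identity.user_id
    return out
```

A production gateway would validate token signature, audience, and expiry, and would store API keys hashed rather than in a plain mapping; the sketch only illustrates the flow.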

F3 Traffic Routing

  • The gateway must route each request to the correct upstream MCP Server based on a pre-configured mapping from subpath to upstream hosts and routes.
  • Each upstream MCP Server should follow the company-standard versioning scheme.
  • Routing setups may differ per development stage where necessary.
  • Routing configuration is managed declaratively (GitOps).

F4 Proxy Related

  • The gateway should act as a transparent MCP proxy.
  • The gateway must support Streamable HTTP as the transport protocol and should also support HTTP+SSE for backward compatibility.
  • The gateway must be able to transform the payload in specific cases:
    • Removing or masking sensitive information
    • Performing additional authentication
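
The subpath mapping and the masking transformation can be illustrated with a short sketch. The route table, the host names, and the choice to mask e-mail addresses are assumptions made for the example; a real deployment would load the mapping from its Git-managed configuration and define its own masking rules.

```python
import re
from typing import Optional

# Hypothetical declarative mapping from gateway subpath to upstream route.
ROUTES = {
    "/mcp/orders/v1": "http://orders.internal:8080/mcp",
    "/mcp/orders/v2": "http://orders-v2.internal:8080/mcp",
    "/mcp/catalog/v1": "http://catalog.internal:8080/mcp",
}


def resolve_upstream(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream URL for the longest matching subpath prefix."""
    best = None
    for prefix in routes:
        # Match whole path segments only, so "/v1" never matches "/v10".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return None
    return routes[best] + path[len(best):]


EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def mask_sensitive(value):
    """Recursively mask e-mail addresses in a JSON-like payload."""
    if isinstance(value, str):
        return EMAIL.sub("***", value)
    if isinstance(value, list):
        return [mask_sensitive(v) for v in value]
    if isinstance(value, dict):
        return {k: mask_sensitive(v) for k, v in value.items()}
    return value
```

Matching on whole path segments keeps versioned subpaths unambiguous, and applying the mask recursively covers nested tool results without the gateway needing to know each tool's schema.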

Non-Functional Requirements

NF1 Latency

  • P99 gateway overhead for a tool-call request must be < 100 ms, excluding upstream MCP Server processing time. This budget includes features such as response masking and additional authentication.
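
To make the metric concrete, the sketch below computes gateway overhead as end-to-end time minus upstream time and takes the 99th percentile with the nearest-rank method. The sample timings are invented for illustration, and nearest-rank is only one of several common percentile definitions.

```python
import math


def gateway_overhead_ms(total_ms: float, upstream_ms: float) -> float:
    """Overhead = time measured at the gateway minus upstream processing time."""
    return total_ms - upstream_ms


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# (total_ms, upstream_ms) pairs; the numbers are made up for illustration.
samples = [(120.0, 100.0)] * 98 + [(300.0, 210.0), (400.0, 250.0)]
overheads = [gateway_overhead_ms(t, u) for t, u in samples]
p99 = percentile(overheads, 99)
meets_target = p99 < 100
```

With these samples, a single slow request (150 ms of overhead out of 100 requests) does not breach the target, because P99 tolerates the slowest 1% of requests.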

NF2 Scalability

  • The gateway must handle 100+ backend MCP Servers across all domains.
  • The gateway must be horizontally scalable with no single point of failure.
  • The onboarding process should be scalable and independent, i.e., onboarding speed must not be limited by the team that owns the gateway.

NF3 Availability

  • The gateway must be 99.XXXX% available.
  • The gateway must be hosted in multiple regions.
  • Gateway health must be independent of the degradation or outage of any upstream MCP Server.
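
One possible way to decouple gateway health from upstream health is a per-upstream circuit breaker combined with a health check that ignores upstream state. This is a sketch of that approach, not a design mandated by this specification; the thresholds and upstream names are hypothetical.

```python
import time


class CircuitBreaker:
    """Per-upstream breaker: opens after max_failures consecutive failures."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # After the cool-down, let a trial request through (half-open state).
        return self._clock() - self._opened_at >= self.reset_after_s

    def record(self, success: bool) -> None:
        if success:
            self._failures = 0
            self._opened_at = None
            return
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()


breakers = {"orders": CircuitBreaker(), "catalog": CircuitBreaker()}


def gateway_health() -> dict:
    """Gateway status ignores upstream state; upstreams are reported separately."""
    return {
        "status": "ok",
        "upstreams": {
            name: ("closed" if b.allow() else "open") for name, b in breakers.items()
        },
    }
```

Because the breaker fails fast for an unhealthy upstream, requests to other MCP Servers keep flowing and the gateway itself continues to report healthy.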

NF4 Monitoring

The gateway should expose appropriate metrics and tags, similar to any standard REST API:

Standard Metrics:

  • requests_total
  • request_body_bytes_total
  • response_body_bytes_total
  • request_duration_seconds

Standard Tags:

  • response_code
  • upstream_response_code

MCP-Specific Tags:

  • mcp_method
  • tool_name
  • resource_uri
  • prompt_name
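
The MCP-specific tags can be derived from the JSON-RPC request body, since MCP messages carry the method name and its parameters. The sketch below assumes a single (non-batched) request object and uses a plain `Counter` as a stand-in for a real metrics client; the function names are hypothetical.

```python
import json
from collections import Counter


def mcp_tags(body: bytes) -> dict[str, str]:
    """Derive MCP-specific tags from a single JSON-RPC request body."""
    msg = json.loads(body)
    method = msg.get("method", "")
    params = msg.get("params") or {}
    tags = {"mcp_method": method}
    if method == "tools/call":
        tags["tool_name"] = params.get("name", "")
    elif method == "resources/read":
        tags["resource_uri"] = params.get("uri", "")
    elif method == "prompts/get":
        tags["prompt_name"] = params.get("name", "")
    return tags


# Stand-in for the requests_total counter of a real metrics library.
requests_total: Counter = Counter()


def record_request(body: bytes, response_code: int, upstream_response_code: int) -> None:
    tags = mcp_tags(body)
    tags["response_code"] = str(response_code)
    tags["upstream_response_code"] = str(upstream_response_code)
    # A sorted tuple of (tag, value) pairs serves as the hashable series key.
    requests_total[tuple(sorted(tags.items()))] += 1
```

Tags such as `resource_uri` can take many distinct values, so it may be worth normalizing or truncating them before using them as metric labels.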

NF5 Auditability

Each invocation of an MCP Server should emit an audit entry with the following metadata:

  • timestamp
  • user_id (when applicable)
  • agent_identity
  • mcp_server
  • mcp_method (tools/list, tools/call…)
  • For tool invocation: tool_name
  • For resource invocation: resource_uri
  • For prompt invocation: prompt_name
  • response_code
  • upstream_response_code

The audit traces should be available for 7 days in production environments.
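
A possible shape for the audit entry is sketched below as a dataclass serialized to one JSON line. The field names follow the list above; the serialization format and the reading of "7 days" as a purge threshold are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=7)  # assumed: entries older than this may be purged


@dataclass
class AuditEntry:
    timestamp: str                 # ISO 8601 with UTC offset
    agent_identity: str
    mcp_server: str
    mcp_method: str
    response_code: int
    upstream_response_code: Optional[int] = None
    user_id: Optional[str] = None
    tool_name: Optional[str] = None
    resource_uri: Optional[str] = None
    prompt_name: Optional[str] = None

    def to_json(self) -> str:
        # Drop unset optional fields so each line carries only applicable metadata.
        data = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(data, sort_keys=True)


def is_expired(entry: AuditEntry, now: datetime) -> bool:
    """True once the entry is older than the retention window."""
    return now - datetime.fromisoformat(entry.timestamp) > RETENTION


entry = AuditEntry(
    timestamp=datetime(2025, 1, 1, tzinfo=timezone.utc).isoformat(),
    agent_identity="agent-a",
    mcp_server="orders",
    mcp_method="tools/call",
    response_code=200,
    upstream_response_code=200,
    tool_name="get_order",
)
```

Emitting one self-describing JSON object per invocation keeps the entries easy to ship to whatever log store ends up enforcing the retention period.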

NF6 Self Service

The gateway owner should provide self-service, on a best-effort basis, for:

  • Onboarding a new MCP Server
  • Decommissioning an MCP Server
  • Adding a new consumer
  • Modifying access control
  • Modifying an upstream host

NF7 Approval Bypass in Freeze Period for MCP Gateway

A clear bypass process should be defined for last-minute changes shortly before or during the freeze period. It should include:

  • Critical criteria
  • Escalation path
  • Final decision maker (e.g., platform owner and VP)

NF8 Separation of MCP Gateway Instance

  • Traffic towards MCP Servers should not affect the performance of the gateway that serves classic APIs.

NF9 Security

  • The final gateway implementation should undergo a complete security assessment by the cyber security team.
  • Each MCP Server should pass basic OWASP security checks as part of the go-live checklist.
  • An approval process must be defined for onboarding new APIs and for granting access.
  • Rate limiting must be implemented to mitigate AI-driven attacks. Limits can be applied at the company, MCP Server, tool, and session levels.
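
Multi-level rate limiting could be sketched with one token bucket per level and key, where a request must pass every level it belongs to. Token bucket is one common algorithm chosen here for illustration; the level names, capacities, and refill rates are hypothetical.

```python
import time


class TokenBucket:
    def __init__(self, capacity: float, refill_per_s: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def available(self) -> bool:
        """Refill according to elapsed time, then report whether a token is free."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_s)
        self._last = now
        return self._tokens >= 1

    def take(self) -> None:
        self._tokens -= 1


class MultiLevelLimiter:
    """One bucket per (level, key); a request must pass every level."""

    def __init__(self, limits: dict, clock=time.monotonic):
        self._limits = limits      # level -> (capacity, refill_per_s)
        self._clock = clock
        self._buckets: dict = {}

    def allow(self, keys: dict) -> bool:
        buckets = []
        for level, key in keys.items():
            if (level, key) not in self._buckets:
                capacity, rate = self._limits[level]
                self._buckets[(level, key)] = TokenBucket(capacity, rate, self._clock)
            buckets.append(self._buckets[(level, key)])
        # Check all levels first so a rejected request consumes no tokens anywhere.
        if not all(b.available() for b in buckets):
            return False
        for b in buckets:
            b.take()
        return True
```

A call such as `limiter.allow({"company": "acme", "tool": "orders/get_order", "session": "s-1"})` then succeeds only while every listed level still has tokens. In a horizontally scaled gateway, the bucket state would need to live in a shared store rather than in process memory.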

Stakeholder Sign-Offs

| # | Requirement | AI Platform | Cyber Security | API Platform |
| --- | --- | --- | --- | --- |
| F1 | Discoverable | ✅/❌ | ✅/❌ | ✅/❌ |
| F2 | Access Control | ✅/❌ | ✅/❌ | ✅/❌ |
| F3 | Traffic Routing | ✅/❌ | ✅/❌ | ✅/❌ |
| F4 | Proxy Related | ✅/❌ | ✅/❌ | ✅/❌ |
| NF1 | Latency | ✅/❌ | ✅/❌ | ✅/❌ |
| NF2 | Scalability | ✅/❌ | ✅/❌ | ✅/❌ |
| NF3 | Availability | ✅/❌ | ✅/❌ | ✅/❌ |
| NF4 | Monitoring | ✅/❌ | ✅/❌ | ✅/❌ |
| NF5 | Auditability | ✅/❌ | ✅/❌ | ✅/❌ |
| NF6 | Self Service | ✅/❌ | ✅/❌ | ✅/❌ |
| NF7 | Approval Bypass in Freeze Period for MCP Gateway | ✅/❌ | ✅/❌ | ✅/❌ |
| NF8 | Separation of MCP Gateway Instance | ✅/❌ | ✅/❌ | ✅/❌ |
| NF9 | Security | ✅/❌ | ✅/❌ | ✅/❌ |

Points to Clarify

  1. AI Platform: What is the foreseeable number of MCP tools?
  2. API Platform: What is an appropriate SLA for MCP?
  3. API Platform: Do we have a use case for seamlessly transforming an OpenAPI Specification (OAS) into an MCP Server?

This post is licensed under CC BY 4.0 by the author.