Helicone AI Gateway

Helicone is what you pick when observability is your primary concern and you want a fast, lightweight gateway that doubles as your LLM monitoring layer – Rust-based, open-source, and deployable as a single binary.


What It Is

Helicone is an open-source AI gateway and observability platform built in Rust. It started as an LLM observability tool (YC W23) and evolved to include gateway capabilities: routing, caching, rate limiting, failovers, and load balancing. The gateway is a lightweight Rust binary inspired by NGINX – minimal latency overhead with high throughput.

Helicone’s differentiator is that it treats observability as a first-class concern, not an add-on. Every request through the gateway is automatically logged, traced, and available for analysis.


Key Features

Rust-Based Performance

  • 8ms P50 latency
  • Single binary deployment
  • Horizontally scalable
  • Minimal resource footprint compared to Python-based alternatives (LiteLLM)

Multi-Provider Routing

Supports OpenAI, Anthropic, Google, AWS Bedrock, Azure OpenAI, and 20+ more providers. Smart routing based on:

  • Provider uptime awareness
  • Rate limit awareness
  • Cost optimization
  • Latency optimization
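To make the routing criteria above concrete, here is a minimal sketch of uptime-, latency-, and cost-aware provider selection. This is a toy scoring function for illustration only – the provider names, metrics, and weights are assumptions, not Helicone's actual routing algorithm.

```python
# Toy smart-router sketch: filter unhealthy providers, then pick the best
# blend of normalized latency and cost. Illustrative only; not Helicone's code.
providers = [
    {"name": "openai",    "up": True,  "p50_ms": 820, "usd_per_1k_tokens": 0.0100},
    {"name": "anthropic", "up": True,  "p50_ms": 640, "usd_per_1k_tokens": 0.0080},
    {"name": "bedrock",   "up": False, "p50_ms": 700, "usd_per_1k_tokens": 0.0075},
]

def pick_provider(providers, latency_weight=0.5):
    """Drop providers that are down, then score the rest (lower is better)."""
    healthy = [p for p in providers if p["up"]]
    max_lat = max(p["p50_ms"] for p in healthy)
    max_cost = max(p["usd_per_1k_tokens"] for p in healthy)

    def score(p):
        # Blend normalized latency and normalized cost into one number.
        return (latency_weight * p["p50_ms"] / max_lat
                + (1 - latency_weight) * p["usd_per_1k_tokens"] / max_cost)

    return min(healthy, key=score)["name"]

print(pick_provider(providers))  # 'anthropic' (bedrock is down; anthropic is cheaper and faster)
```

A real gateway would refresh the uptime and latency figures continuously from health checks rather than hard-coding them.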

Built-In Observability

  • Request/response logging with full prompt and completion capture
  • Cost tracking per request, per user, per model
  • Latency distributions and percentile tracking
  • Error rate monitoring by provider
  • Token usage analytics
  • OpenTelemetry support for integration with existing observability stacks

Caching

Response caching to reduce costs and latency for repeated queries. Supports configurable TTL and cache strategies.
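The core idea – hash the request, store the response, serve it until the TTL expires – can be sketched in a few lines. This is a conceptual toy, not Helicone's implementation; cache key composition and eviction strategy are assumptions.

```python
import hashlib
import time

class TTLCache:
    """Toy LLM response cache keyed on (model, prompt). Illustrative only."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, response)

    def _key(self, model, prompt):
        # Hash the request so identical queries map to the same entry.
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def get(self, model, prompt):
        entry = self.store.get(self._key(model, prompt))
        if entry and entry[0] > time.time():
            return entry[1]  # cache hit: no provider call, no token cost
        return None

    def put(self, model, prompt, response):
        self.store[self._key(model, prompt)] = (time.time() + self.ttl, response)

cache = TTLCache(ttl_seconds=60)
cache.put("gpt-4o", "hello", "cached answer")
print(cache.get("gpt-4o", "hello"))  # cached answer
```

Every hit served from cache is a provider call (and its tokens) you never pay for, which is why caching sits before routing in the request pipeline.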

Rate Limiting

Protect against abuse and cost overruns. Configurable per-key and per-user rate limits.
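A per-key limit of this kind is commonly implemented as a sliding window over recent request timestamps. The sketch below shows the concept only – window size, limits, and data structures are assumptions, not Helicone's internals.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Toy per-key sliding-window rate limiter. Illustrative only."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # api_key -> recent request timestamps

    def allow(self, api_key, now=None):
        now = time.time() if now is None else now
        q = self.hits[api_key]
        while q and q[0] <= now - self.window:
            q.popleft()  # evict timestamps that fell out of the window
        if len(q) < self.max_requests:
            q.append(now)
            return True
        return False  # a gateway would answer HTTP 429 here

limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
print(limiter.allow("key-a"), limiter.allow("key-a"), limiter.allow("key-a"))
# True True False
```

Because limits are tracked per key, one noisy client exhausts only its own budget instead of starving every caller behind the gateway.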

Failovers

Automatic failover to alternative providers when the primary is down or rate-limited. Configurable fallback chains.
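A fallback chain is just an ordered list of providers tried until one succeeds. The sketch below illustrates that control flow; the provider functions and exception type are hypothetical stand-ins, not Helicone's API.

```python
class ProviderDown(Exception):
    """Stand-in for an outage or rate-limit error from a provider."""

def call_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return the first success."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderDown as exc:
            errors[name] = str(exc)  # a real gateway would also log/trace this
    raise RuntimeError(f"all providers failed: {errors}")

# Hypothetical providers: the primary is rate-limited, the fallback is healthy.
def flaky_openai(prompt):
    raise ProviderDown("rate limited")

def healthy_anthropic(prompt):
    return f"answer to {prompt!r}"

chain = [("openai", flaky_openai), ("anthropic", healthy_anthropic)]
print(call_with_fallback("ping", chain))  # ('anthropic', "answer to 'ping'")
```

In the actual gateway the chain is declared in configuration rather than code, and failover decisions feed back into the smart router's uptime awareness.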

Experiments & Evaluation

Built-in support for A/B testing prompts and models. Track which configurations perform better across cost, latency, and quality metrics.
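For A/B results to be meaningful, each user should be assigned to a variant deterministically, so repeat requests stay in the same bucket. A minimal sketch of that assignment, under the assumption of hash-based bucketing (not necessarily how Helicone assigns variants):

```python
import hashlib

def ab_bucket(user_id, variants=("prompt_v1", "prompt_v2")):
    """Deterministic variant assignment: the same user always gets the
    same variant, and users split roughly evenly across variants."""
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return variants[digest % len(variants)]

print(ab_bucket("alice"))  # stable across calls for the same user_id
```

The gateway can then tag each logged request with its variant, letting the analytics layer compare cost, latency, and quality per configuration.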


Architecture

Client App
    |
    v
+----------------------------------+
| Helicone AI Gateway (Rust)        |
|  Single binary / Docker           |
|                                   |
|  Rate Limiting                    |
|       |                           |
|  Cache Check                      |
|       |                           |
|  Smart Router                     |
|    (uptime, latency, cost aware)  |
|       |                           |
|  Provider API Call                |
|       |                           |
|  Logging & Tracing                |
|    (automatic, every request)     |
|       |                           |
|  Analytics Engine                 |
+----------------------------------+
    |
    v
LLM Providers

Integration Options

Proxy mode (recommended): Change your base URL to point at Helicone.

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://oai.helicone.ai/v1"  # or your self-hosted URL
)

Header-based mode: Add Helicone headers to existing requests (for async logging without proxying).
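A sketch of what header-based integration looks like. The header names below are drawn from Helicone's documented proxy headers, but treat them as assumptions and verify against the current docs before relying on them.

```python
# Hypothetical Helicone header set; check Helicone's docs for current names.
helicone_headers = {
    "Helicone-Auth": "Bearer <HELICONE_API_KEY>",  # authenticates logging
    "Helicone-User-Id": "user-123",                # attributes cost per user
    "Helicone-Cache-Enabled": "true",              # opt this request into caching
}

# These would be attached to every outgoing request, e.g. via the OpenAI
# SDK's default_headers parameter:
#   client = OpenAI(api_key="sk-...", default_headers=helicone_headers)
```

The trade-off versus proxy mode: requests still go straight to the provider (no gateway in the hot path), but you give up routing, caching, and failover, keeping only the logging.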


Self-Hosting

Helicone is fully open-source and self-hostable:

Component   Details
---------   -------
Gateway     Single Rust binary, deploy via Docker, K8s, or bare metal
Platform    Helicone observability platform (also open source)
Storage     ClickHouse for analytics, PostgreSQL for metadata
# Self-hosted deployment
docker run -p 8080:8080 helicone/ai-gateway

Runs on AWS, GCP, Azure, on-prem, Kubernetes, or bare metal. No license key, no phone-home.


Pricing / Cost Model

Tier         Cost                 Features
-----------  -------------------  --------
Free         $0                   10K requests/month, core features
Growth       $20/mo per seat      1M+ requests, advanced analytics, team features
Enterprise   Custom               SSO, dedicated support, SLAs, custom deployment
Self-hosted  Free (open source)   Full feature set, no per-request fees

Self-hosted Helicone costs nothing beyond your own infrastructure. The cloud-hosted version has a generous free tier and reasonable scaling costs.

No per-request gateway fees, no token markup. You pay for the platform/analytics, not the proxying.


Limitations

  • Smaller ecosystem than LiteLLM – fewer providers supported (20+ vs 100+)
  • Less mature guardrails – no built-in PII detection or content filtering (unlike Portkey or Kong)
  • No virtual keys / budget management – you manage API keys directly, no per-team budget isolation at the gateway level
  • Observability-first, gateway-second – the routing and caching features are good but not as configurable as Portkey or LiteLLM
  • Younger project – less battle-tested at very large scale compared to Kong or LiteLLM

When to Use

Strong fit:

  • Teams where observability is the primary concern (want logging + analytics built into the gateway)
  • Performance-sensitive workloads – Rust binary is the fastest option
  • Organizations that want open-source self-hosted with no licensing cost
  • Teams already using Helicone for observability that want to add gateway capabilities

Weak fit:

  • Need 100+ provider support – LiteLLM covers more
  • Need advanced guardrails (PII, content filtering) – use Portkey or Kong
  • Need per-team budget isolation and virtual keys – use LiteLLM or Portkey
  • Large enterprise that needs vendor support and SLAs without enterprise contract

This post is licensed under CC BY 4.0 by the author.