Helicone AI Gateway
Helicone is what you pick when observability is your primary concern and you want a fast, lightweight gateway that doubles as your LLM monitoring layer -- Rust-based, open-source, and deployable as a single binary.
What It Is
Helicone is an open-source AI gateway and observability platform built in Rust. It started as an LLM observability tool (YC W23) and evolved to include gateway capabilities: routing, caching, rate limiting, failovers, and load balancing. The gateway is a lightweight Rust binary inspired by NGINX – minimal latency overhead with high throughput.
Helicone’s differentiator is that it treats observability as a first-class concern, not an add-on. Every request through the gateway is automatically logged, traced, and available for analysis.
Key Features
Rust-Based Performance
- 8ms P50 latency
- Single binary deployment
- Horizontally scalable
- Minimal resource footprint compared to Python-based alternatives (LiteLLM)
Multi-Provider Routing
Supports OpenAI, Anthropic, Google, AWS Bedrock, Azure OpenAI, and 20+ more providers. Smart routing based on:
- Provider uptime awareness
- Rate limit awareness
- Cost optimization
- Latency optimization
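Conceptually, the smart router picks from the healthy providers using signals like these. The sketch below is purely illustrative (it is not Helicone's actual routing algorithm, and the provider stats are made-up numbers): choose the cheapest provider that is up and within a latency budget.

```python
# Illustrative sketch of cost/latency-aware routing -- NOT Helicone's
# actual algorithm. Provider stats below are invented example values.
def pick_provider(providers, max_p50_ms=500):
    """Return the cheapest healthy provider within the latency budget."""
    healthy = [p for p in providers if p["up"] and p["p50_ms"] <= max_p50_ms]
    if not healthy:
        return None
    return min(healthy, key=lambda p: p["cost_per_1k_tokens"])["name"]

providers = [
    {"name": "openai",    "up": True,  "p50_ms": 420, "cost_per_1k_tokens": 0.60},
    {"name": "anthropic", "up": True,  "p50_ms": 380, "cost_per_1k_tokens": 0.45},
    {"name": "bedrock",   "up": False, "p50_ms": 300, "cost_per_1k_tokens": 0.40},  # down
]
print(pick_provider(providers))  # anthropic: cheapest healthy option under budget
```

A real router also has to account for rate-limit headroom and in-flight load, but the core selection logic is a filter-then-rank step like this one.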
Built-In Observability
- Request/response logging with full prompt and completion capture
- Cost tracking per request, per user, per model
- Latency distributions and percentile tracking
- Error rate monitoring by provider
- Token usage analytics
- OpenTelemetry support for integration with existing observability stacks
Caching
Response caching to reduce costs and latency for repeated queries. Supports configurable TTL and cache strategies.
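Caching is opted into per request via headers. A minimal sketch, assuming the header names from Helicone's cloud documentation (`Helicone-Cache-Enabled`, with `Cache-Control` for the TTL) -- verify them against your gateway version:

```python
# Sketch: per-request cache headers. Header names follow Helicone's cloud
# docs (Helicone-Cache-Enabled, Cache-Control); confirm for your deployment.
def cache_headers(ttl_seconds=3600):
    """Build headers that opt a request into response caching."""
    return {
        "Helicone-Cache-Enabled": "true",           # enable caching for this call
        "Cache-Control": f"max-age={ttl_seconds}",  # configurable TTL
    }

headers = cache_headers(ttl_seconds=600)
```

Pass these as extra headers on your OpenAI-compatible request; identical prompts within the TTL are then served from cache.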
Rate Limiting
Protect against abuse and cost overruns. Configurable per-key and per-user rate limits.
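Rate-limit policies can be expressed per request as a compact header value. The sketch below assumes the `Helicone-RateLimit-Policy` header and its `quota;w=window[;s=segment]` format from Helicone's documentation -- treat the exact syntax as an assumption to verify:

```python
# Sketch: composing a Helicone-RateLimit-Policy value. The "quota;w=seconds
# [;s=segment]" format is assumed from Helicone's docs; verify before use.
def rate_limit_policy(quota, window_seconds, segment=None):
    """Return a policy string limiting `quota` requests per `window_seconds`."""
    policy = f"{quota};w={window_seconds}"
    if segment:
        policy += f";s={segment}"  # e.g. segment by user for per-user limits
    return policy

headers = {"Helicone-RateLimit-Policy": rate_limit_policy(1000, 3600, segment="user")}
```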
Failovers
Automatic failover to alternative providers when the primary is down or rate-limited. Configurable fallback chains.
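The gateway performs this failover itself, but the fallback-chain behavior is easy to picture in code. A conceptual sketch (provider names and the error type are stand-ins, not Helicone APIs):

```python
# Conceptual illustration of a fallback chain -- the gateway does this for
# you; this is not a Helicone API. Names and errors are examples only.
def call_with_fallback(prompt, providers, send):
    """Try each provider in order; return the first successful response."""
    last_error = RuntimeError("no providers configured")
    for provider in providers:
        try:
            return send(provider, prompt)
        except RuntimeError as err:  # stand-in for outage / rate-limit errors
            last_error = err
    raise last_error

def fake_send(provider, prompt):
    """Simulated transport: the primary is rate-limited, the fallback works."""
    if provider == "openai":
        raise RuntimeError("rate limited")
    return f"{provider}: ok"

print(call_with_fallback("hi", ["openai", "anthropic"], fake_send))  # anthropic: ok
```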
Experiments & Evaluation
Built-in support for A/B testing prompts and models. Track which configurations perform better across cost, latency, and quality metrics.
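For A/B tests to produce comparable metrics, each user should see the same variant consistently. One common approach (illustrative only, not a Helicone API) is deterministic hash-based bucketing:

```python
import hashlib

# Illustrative A/B bucketing sketch -- not a Helicone API. Deterministically
# assign a user to a prompt variant so repeat requests stay in one bucket.
def assign_variant(user_id, variants):
    """Hash the user id and map it to one of the variants."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("user-42", ["prompt_a", "prompt_b"]))
```

Because the assignment is a pure function of the user id, cost, latency, and quality metrics can be sliced by variant after the fact.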
Architecture
```
        Client App
            |
            v
+----------------------------------+
|  Helicone AI Gateway (Rust)      |
|  Single binary / Docker          |
|                                  |
|  Rate Limiting                   |
|       |                          |
|  Cache Check                     |
|       |                          |
|  Smart Router                    |
|  (uptime, latency, cost aware)   |
|       |                          |
|  Provider API Call               |
|       |                          |
|  Logging & Tracing               |
|  (automatic, every request)      |
|       |                          |
|  Analytics Engine                |
+----------------------------------+
            |
            v
      LLM Providers
```
Integration Options
Proxy mode (recommended): Change your base URL to point at Helicone.
```python
client = OpenAI(
    api_key="sk-...",
    base_url="https://oai.helicone.ai/v1"  # or your self-hosted URL
)
```
Header-based mode: Add Helicone headers to existing requests (for async logging without proxying).
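Either way, Helicone metadata rides along as headers. A minimal sketch of building them, assuming the documented `Helicone-Auth` and `Helicone-User-Id` header names (confirm against your Helicone version):

```python
# Sketch: Helicone metadata headers for an existing request. Helicone-Auth
# and Helicone-User-Id follow Helicone's documented conventions; verify
# against your deployment before relying on them.
def helicone_headers(helicone_api_key, user_id=None):
    """Headers that attach Helicone logging metadata to a request."""
    headers = {"Helicone-Auth": f"Bearer {helicone_api_key}"}
    if user_id:
        headers["Helicone-User-Id"] = user_id  # enables per-user cost tracking
    return headers

extra = helicone_headers("sk-helicone-...", user_id="user-42")
```

With the OpenAI Python SDK these can be supplied via `default_headers` when constructing the client, or per call as `extra_headers`.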
Self-Hosting
Helicone is fully open-source and self-hostable:
| Component | Details |
|---|---|
| Gateway | Single Rust binary, deploy via Docker, K8s, or bare metal |
| Platform | Helicone observability platform (also open source) |
| Storage | ClickHouse for analytics, PostgreSQL for metadata |
```bash
# Self-hosted deployment
docker run -p 8080:8080 helicone/ai-gateway
```
Runs on AWS, GCP, Azure, on-prem, Kubernetes, or bare metal. No license key, no phone-home.
Pricing / Cost Model
| Tier | Cost | Features |
|---|---|---|
| Free | $0 | 10K requests/month, core features |
| Growth | $20/mo per seat | 1M+ requests, advanced analytics, team features |
| Enterprise | Custom | SSO, dedicated support, SLAs, custom deployment |
| Self-hosted | Free (open source) | Full feature set, no per-request fees |
Self-hosted Helicone has zero cost beyond your own infrastructure. The cloud-hosted version has a generous free tier and predictable scaling costs.
No per-request gateway fees, no token markup. You pay for the platform/analytics, not the proxying.
Limitations
- Smaller ecosystem than LiteLLM – fewer providers supported (20+ vs 100+)
- Less mature guardrails – no built-in PII detection or content filtering (unlike Portkey or Kong)
- No virtual keys / budget management – you manage API keys directly, no per-team budget isolation at the gateway level
- Observability-first, gateway-second – the routing and caching features are good but not as configurable as Portkey or LiteLLM
- Younger project – less battle-tested at very large scale compared to Kong or LiteLLM
When to Use
Strong fit:
- Teams where observability is the primary concern (want logging + analytics built into the gateway)
- Performance-sensitive workloads – Rust binary is the fastest option
- Organizations that want open-source self-hosted with no licensing cost
- Teams already using Helicone for observability that want to add gateway capabilities
Weak fit:
- Need 100+ provider support – LiteLLM covers more
- Need advanced guardrails (PII, content filtering) – use Portkey or Kong
- Need per-team budget isolation and virtual keys – use LiteLLM or Portkey
- Large enterprises that need vendor support and SLAs without an enterprise contract