LiteLLM
LiteLLM is the default choice when you need a self-hosted, open-source LLM proxy with zero licensing cost: it does one thing well (a unified LLM API) and stays out of your way.
What It Is
LiteLLM is an open-source (MIT license) Python SDK and proxy server that provides an OpenAI-compatible API to 100+ LLM providers. Any application, SDK, or tool that speaks OpenAI format can use LiteLLM as a drop-in proxy to route requests to Anthropic, Google, AWS Bedrock, Azure OpenAI, Cohere, HuggingFace, vLLM, Ollama, and dozens more.
It is both a Python library (for direct SDK use) and a standalone proxy server (for team-wide deployment). The proxy is the more common production use case.
Key Features
OpenAI-Compatible Proxy
Drop-in replacement for OpenAI’s API. Supports:
- /chat/completions
- /completions
- /embeddings
- /images
- /audio
- /batches
- /rerank
- /responses (Responses API)
- /messages (Anthropic native)
- /a2a (Agent-to-Agent)
Applications point their OpenAI SDK at LiteLLM’s URL – no code changes needed.
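As a minimal sketch of what "no code changes" means in practice: the request below is a standard OpenAI chat-completions payload sent to the proxy instead of api.openai.com. The proxy URL (LiteLLM's default port is 4000), the virtual key, and the model name are all placeholders; real applications usually just set `base_url` on their existing OpenAI SDK client instead of using raw `urllib`.

```python
"""Sketch: calling a LiteLLM proxy with a plain OpenAI-format request.

The proxy URL, virtual key, and model name are placeholders --
substitute values from your own deployment.
"""
import json
import urllib.request

PROXY_URL = "http://localhost:4000/chat/completions"  # hypothetical local proxy
VIRTUAL_KEY = "sk-my-virtual-key"                     # placeholder virtual key

# Standard OpenAI chat-completions payload; LiteLLM translates it to
# whichever provider this model_name is mapped to in the proxy config.
payload = {
    "model": "claude-sonnet",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}

request = urllib.request.Request(
    PROXY_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {VIRTUAL_KEY}",
        "Content-Type": "application/json",
    },
)

# With a running proxy, uncomment to send the request:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format is unchanged, swapping the backing provider is a proxy-side config edit, not an application change.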
100+ Provider Support
Unified interface across OpenAI, Anthropic, Google Vertex/Gemini, AWS Bedrock, Azure OpenAI, Cohere, Mistral, HuggingFace, Ollama, vLLM, NVIDIA NIM, AWS SageMaker, and more.
Cost Tracking & Budget Management
- Track spend per virtual key, per user, per team
- Set budget caps that block requests when exceeded
- Cost alerts and monitoring for unusual patterns
- Built-in cost tables for major providers (auto-calculated)
Virtual Keys
Generate proxy API keys with:
- Budget limits
- Model access restrictions
- Rate limits
- Team/user assignment
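Virtual keys are created through the proxy's key-management API (or the admin UI). The sketch below builds a request to the /key/generate endpoint; the field names follow LiteLLM's key-management API, but treat the exact set as an assumption and check the docs for your version. The URL and master key are placeholders.

```python
"""Sketch: creating a scoped virtual key via /key/generate.

Field names and the endpoint shape are assumptions based on LiteLLM's
key-management API; verify against the docs for your proxy version.
"""
import json
import urllib.request

body = {
    "models": ["gpt-4", "claude-sonnet"],  # model access restriction
    "max_budget": 25.0,                    # USD cap; requests blocked once exceeded
    "duration": "30d",                     # key expires after 30 days
    "team_id": "backend-team",             # spend rolls up to this team
    "rpm_limit": 100,                      # rate limit in requests per minute
}

request = urllib.request.Request(
    "http://localhost:4000/key/generate",            # hypothetical local proxy
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": "Bearer sk-my-master-key",  # admin/master key
        "Content-Type": "application/json",
    },
)

# With a running proxy, uncomment to send; the response contains the
# generated virtual key to hand to the consuming team:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["key"])
```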
Load Balancing & Fallbacks
Configure multiple deployments of the same model. LiteLLM load-balances across them and falls back on failure. Supports weighted routing, least-latency routing, and round-robin.
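Routing behavior is set in the proxy config under router_settings. A sketch with assumed key names and values (strategy names and the placement of fallbacks vary by LiteLLM version, so check the routing docs before copying):

```yaml
router_settings:
  routing_strategy: latency-based-routing   # or e.g. simple-shuffle, least-busy
  num_retries: 2                            # retry a failed deployment before falling back
  fallbacks:
    - gpt-4: ["claude-sonnet"]              # if every gpt-4 deployment fails, try claude-sonnet
```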
Guardrails
- PII masking via Presidio plugin
- Content moderation
- Custom pre/post-processing hooks
- Prompt injection detection (via plugins)
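Guardrails are wired up declaratively in the proxy config. A minimal sketch for the Presidio PII-masking plugin, with assumed key names based on LiteLLM's guardrails config format:

```yaml
guardrails:
  - guardrail_name: mask-pii
    litellm_params:
      guardrail: presidio   # PII masking plugin
      mode: pre_call        # run before the request reaches the provider
```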
Admin UI
Web-based dashboard for managing models, virtual keys, users, budgets, and viewing usage analytics.
Architecture
```
Client App (any OpenAI-compatible SDK)
                  |
                  v
+----------------------------------+
|       LiteLLM Proxy Server       |
|        (Python / Docker)         |
|                                  |
| Authentication (API keys)        |
|               |                  |
| Rate Limiting                    |
|               |                  |
| Router (model mapping,           |
|   load balancing, fallbacks)     |
|               |                  |
| Pre-processing (PII, guardrails) |
|               |                  |
| Provider API Call                |
|               |                  |
| Post-processing                  |
|               |                  |
| Cost Tracking & Logging          |
|               |                  |
| PostgreSQL (logs, budgets, keys) |
+----------------------------------+
                  |
                  v
            LLM Providers
```
Configuration
LiteLLM is configured via YAML:
```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-turbo
      api_base: https://my-azure.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4-turbo
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: sk-my-master-key
  database_url: postgresql://user:pass@db:5432/litellm
```
Defining multiple entries under the same model_name enables automatic load balancing and fallback between those deployments.
Self-Hosting
LiteLLM is fully self-hosted with no SaaS dependency. Typical deployment:
```shell
docker compose up -d   # LiteLLM proxy + PostgreSQL
```
| Component | Purpose |
|---|---|
| LiteLLM Proxy | Python server (FastAPI) |
| PostgreSQL | Persistent storage for keys, budgets, logs |
| Redis (optional) | Caching, rate limiting |
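A compose file backing that one-liner might look like the following. This is a minimal sketch: the image tag, environment variable names, and port are assumptions, so verify them against the LiteLLM deployment docs for your version.

```yaml
# docker-compose.yml -- minimal sketch, not a hardened deployment
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest   # assumed image tag
    ports:
      - "4000:4000"
    volumes:
      - ./config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
    environment:
      DATABASE_URL: postgresql://litellm:litellm@db:5432/litellm
      LITELLM_MASTER_KEY: sk-my-master-key
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: litellm
      POSTGRES_PASSWORD: litellm
      POSTGRES_DB: litellm
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```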
Runs on Kubernetes, Docker, bare metal, or any cloud. No phone-home, no license key, no vendor lock-in.
Enterprise (LiteLLM Enterprise)
BerriAI offers an enterprise tier with:
- SSO / SAML
- Advanced analytics
- Premium support
- Additional enterprise features
But the core proxy is fully functional in the open-source version.
Pricing / Cost Model
| Tier | Cost | Features |
|---|---|---|
| Open Source | Free (MIT) | Full proxy, 100+ providers, cost tracking, virtual keys, admin UI |
| Enterprise | Custom | SSO, advanced analytics, premium support |
LiteLLM is the only major LLM gateway where the full feature set is available for free. No per-request fees, no log-based billing, no feature gating.
Limitations
- Python-only – the proxy is a Python/FastAPI server. Higher resource consumption than Rust/Go alternatives (Helicone, Kong). At very high throughput, this matters.
- 8ms P95 latency at 1K RPS – good but not best-in-class. Kong and Helicone are faster.
- Observability is basic – built-in logging works but is not as deep as Portkey or Helicone. Most teams pair LiteLLM with Langfuse or similar.
- No semantic caching out of the box (exact match only, or integrate Redis).
- Community support – no SLA unless you buy enterprise.
When to Use
Strong fit:
- Teams that want a free, self-hosted LLM proxy with no licensing cost
- Startups and side projects where budget matters
- Organizations that already have observability (Langfuse, Datadog) and just need routing
- Dev teams that want OpenAI-compatible API across all providers
- Air-gapped or regulated environments where SaaS is not an option
Weak fit:
- High-throughput enterprise workloads where proxy latency is critical – Kong or Helicone are faster
- Teams that want integrated guardrails + observability in one tool – Portkey is more complete
- Organizations that need enterprise support and SLAs without paying for enterprise tier