LiteLLM

LiteLLM is the default choice when you need a self-hosted, open-source LLM proxy with zero licensing cost -- it does one thing well (unified LLM API) and stays out of your way.


What It Is

LiteLLM is an open-source (MIT license) Python SDK and proxy server that provides an OpenAI-compatible API to 100+ LLM providers. Any application, SDK, or tool that speaks OpenAI format can use LiteLLM as a drop-in proxy to route requests to Anthropic, Google, AWS Bedrock, Azure OpenAI, Cohere, HuggingFace, vLLM, Ollama, and dozens more.

It is both a Python library (for direct SDK use) and a standalone proxy server (for team-wide deployment). The proxy is the more common production use case.


Key Features

OpenAI-Compatible Proxy

Drop-in replacement for OpenAI’s API. Supports:

  • /chat/completions, /completions, /embeddings
  • /images, /audio, /batches, /rerank
  • /responses (Responses API)
  • /messages (Anthropic native)
  • /a2a (Agent-to-Agent)

Applications point their OpenAI SDK at LiteLLM’s URL – no code changes needed.
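
To make the "no code changes" claim concrete, here is a sketch of the request a client sends the proxy, built with only the standard library (the URL, port 4000, and the virtual key are placeholders for your own deployment; the request is constructed but not sent):

```python
import json
import urllib.request

# Assumed values: 4000 is the proxy's default port; the virtual key is a
# placeholder issued by your LiteLLM deployment.
LITELLM_PROXY_URL = "http://localhost:4000"
VIRTUAL_KEY = "sk-my-virtual-key"

# A standard OpenAI chat-completions payload; "gpt-4" is whatever alias
# you defined under model_list in the proxy config.
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    f"{LITELLM_PROXY_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {VIRTUAL_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; an OpenAI SDK configured with
# base_url=LITELLM_PROXY_URL emits the same request shape.
print(req.get_method(), req.full_url)
```

Because the wire format is identical to OpenAI's, swapping the base URL in any OpenAI-compatible SDK is the only change an application needs.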

100+ Provider Support

Unified interface across OpenAI, Anthropic, Google Vertex/Gemini, AWS Bedrock, Azure OpenAI, Cohere, Mistral, HuggingFace, Ollama, vLLM, NVIDIA NIM, Sagemaker, and more.
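
Providers are selected by a "provider/model" prefix on the model string (e.g. anthropic/claude-sonnet-4-20250514, bedrock/...). The helper below is purely illustrative of that naming convention, not LiteLLM's internal router; the default-to-OpenAI behavior for bare names reflects my understanding of the convention:

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider/model-id' into (provider, model_id).

    A bare model name (no slash) is treated as an OpenAI model, matching
    the convention that unprefixed names default to the OpenAI provider.
    """
    provider, sep, model_id = model.partition("/")
    if not sep:
        return ("openai", model)
    return (provider, model_id)

print(split_model_string("anthropic/claude-sonnet-4-20250514"))
print(split_model_string("bedrock/anthropic.claude-3-sonnet"))
print(split_model_string("gpt-4o"))
```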

Cost Tracking & Budget Management

  • Track spend per virtual key, per user, per team
  • Set budget caps that block requests when exceeded
  • Cost alerts and monitoring for unusual patterns
  • Built-in cost tables for major providers (auto-calculated)

Virtual Keys

Generate proxy API keys with:

  • Budget limits
  • Model access restrictions
  • Rate limits
  • Team/user assignment
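
Keys are minted through the proxy's /key/generate endpoint, authenticated with the master key. A sketch of the request (URL, master key, and team_id are placeholder values; the request is constructed but not sent):

```python
import json
import urllib.request

# Assumed deployment values; the master key comes from general_settings
# in the proxy config.
LITELLM_PROXY_URL = "http://localhost:4000"
MASTER_KEY = "sk-my-master-key"

# Request a virtual key scoped to two model aliases, with a budget cap
# and a rate limit.
payload = {
    "models": ["gpt-4", "claude-sonnet"],
    "max_budget": 25.0,       # USD; requests are blocked once exceeded
    "rpm_limit": 100,         # requests per minute
    "team_id": "search-team", # hypothetical team identifier
}

req = urllib.request.Request(
    f"{LITELLM_PROXY_URL}/key/generate",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {MASTER_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the new virtual key, which
# clients then use in place of a raw provider API key.
print(req.full_url)
```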

Load Balancing & Fallbacks

Configure multiple deployments of the same model. LiteLLM load-balances across them and falls back on failure. Supports weighted routing, least-latency routing, and round-robin.
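
Routing behavior is set in the same YAML config the proxy already uses. A sketch (strategy names follow LiteLLM's router options as I understand them; the model aliases and values are illustrative):

```yaml
router_settings:
  routing_strategy: least-busy   # or: simple-shuffle (weighted), latency-based-routing
  num_retries: 2
  fallbacks:
    - gpt-4: ["claude-sonnet"]   # if every gpt-4 deployment fails, retry on claude-sonnet
```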

Guardrails

  • PII masking via Presidio plugin
  • Content moderation
  • Custom pre/post-processing hooks
  • Prompt injection detection (via plugins)

Admin UI

Web-based dashboard for managing models, virtual keys, users, budgets, and viewing usage analytics.


Architecture

Client App (any OpenAI-compatible SDK)
    |
    v
+----------------------------------+
| LiteLLM Proxy Server              |
|  (Python / Docker)                |
|                                   |
|  Authentication (API keys)        |
|       |                           |
|  Rate Limiting                    |
|       |                           |
|  Router (model mapping,           |
|    load balancing, fallbacks)     |
|       |                           |
|  Pre-processing (PII, guardrails) |
|       |                           |
|  Provider API Call                |
|       |                           |
|  Post-processing                  |
|       |                           |
|  Cost Tracking & Logging          |
|       |                           |
|  PostgreSQL (logs, budgets, keys) |
+----------------------------------+
    |
    v
LLM Providers

Configuration

LiteLLM is configured via YAML:

model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-turbo
      api_base: https://my-azure.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4-turbo
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: sk-my-master-key
  database_url: postgresql://user:pass@db:5432/litellm

Multiple entries under the same model_name enable automatic load balancing and fallback between them.


Self-Hosting

LiteLLM is fully self-hosted with no SaaS dependency. Typical deployment:

docker compose up -d  # LiteLLM proxy + PostgreSQL

Component          Purpose
LiteLLM Proxy      Python server (FastAPI)
PostgreSQL         Persistent storage for keys, budgets, logs
Redis (optional)   Caching, rate limiting

Runs on Kubernetes, Docker, bare metal, or any cloud. No phone-home, no license key, no vendor lock-in.
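
A minimal compose file for that stack might look like the following; the image tag, credentials, and mounted config path are illustrative placeholders, not a pinned recommendation:

```yaml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest  # pin a release tag in production
    ports:
      - "4000:4000"
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/litellm
      LITELLM_MASTER_KEY: sk-my-master-key
    volumes:
      - ./config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: litellm
```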

Enterprise (LiteLLM Enterprise)

BerriAI offers an enterprise tier with:

  • SSO / SAML
  • Advanced analytics
  • Premium support
  • Additional enterprise features

But the core proxy is fully functional in the open-source version.


Pricing / Cost Model

Tier          Cost         Features
Open Source   Free (MIT)   Full proxy, 100+ providers, cost tracking, virtual keys, admin UI
Enterprise    Custom       SSO, advanced analytics, premium support

LiteLLM is notable among major LLM gateways in that the full feature set is available for free. No per-request fees, no log-based billing, no feature gating.


Limitations

  • Python-only – the proxy is a Python/FastAPI server, with higher resource consumption than gateways built in Rust or on NGINX (Helicone, Kong). At very high throughput, this matters.
  • 8ms P95 latency at 1K RPS – good but not best-in-class. Kong and Helicone are faster.
  • Observability is basic – built-in logging works but is not as deep as Portkey or Helicone. Most teams pair LiteLLM with Langfuse or similar.
  • No semantic caching out of the box (exact match only, or integrate Redis).
  • Community support – no SLA unless you buy enterprise.

When to Use

Strong fit:

  • Teams that want a free, self-hosted LLM proxy with no licensing cost
  • Startups and side projects where budget matters
  • Organizations that already have observability (Langfuse, Datadog) and just need routing
  • Dev teams that want OpenAI-compatible API across all providers
  • Air-gapped or regulated environments where SaaS is not an option

Weak fit:

  • High-throughput enterprise workloads where proxy latency is critical – Kong or Helicone are faster
  • Teams that want integrated guardrails + observability in one tool – Portkey is more complete
  • Organizations that need enterprise support and SLAs without paying for enterprise tier

This post is licensed under CC BY 4.0 by the author.