Load Balancing Algorithms
Distribute incoming traffic across multiple servers to maximize throughput, minimize latency, and prevent overload: least connections for general workloads, consistent hashing for stateful services, Maglev hashing for massive scale.
Key Properties Table
| Algorithm | Time Complexity | State Required | Allows Burst | Session Affinity | Load Balance Quality | Failure Handling |
|---|---|---|---|---|---|---|
| Round Robin | O(1) | None | ✅ | ❌ | Fair (equal) | Poor (slow server still gets same) |
| Weighted Round Robin | O(1) | Per-server weights | ✅ | ❌ | Fair (weighted) | Good (respects capacity ratios) |
| Random | O(1) | None | ✅ | ❌ | Fair (statistical) | Poor (slow server gets equal) |
| Least Connections | O(log n) | Active count/server | ✅ | ❌ | Excellent (dynamic) | Excellent (adapts to varying latency) |
| Least Response Time | O(log n) | Latency + connections | ✅ | ❌ | Excellent (latency-aware) | Excellent (proactive) |
| Consistent Hashing | O(log n) lookup | Ring state | ✅ | ✅ | Fair (hash-dependent) | Minimal reshuffling on change |
| Maglev Hashing | O(n) init, O(1) lookup | Lookup table | ✅ | ✅ | Excellent (< 0.5% variance) | Minimal reshuffling on change |
When to Use / Avoid
Round Robin
- ✅ Use when: Homogeneous stateless servers, identical latency, simple setup needed, equal CPU/memory
- ❌ Avoid: Mixed request sizes, varying server capacity, latency-sensitive applications
Weighted Round Robin
- ✅ Use when: Servers with different specs (e.g., 2 large + 1 small), blue-green deployments, canary releases
- ❌ Avoid: Variable request latency (weights don’t account for in-flight time)
Random
- ✅ Use when: Extremely simple load balancer, no state tracking possible, edge cases (rarely used)
- ❌ Avoid: Production-critical systems (distribution is only fair in expectation; short runs can be badly imbalanced)
Least Connections
- ✅ Use when: Mixed request types (short + long), HTTP/2, WebSocket, variable latency
- ❌ Avoid: Stateless services with equal requests (overkill complexity)
Least Response Time
- ✅ Use when: Latency-critical services, varying backend performance, strict SLO requirements
- ❌ Avoid: Simple workloads (adds observability overhead)
Consistent Hashing (Session Affinity)
- ✅ Use when: Caching services, stateful sessions, shopping carts, in-process client state, minimal reshuffling needed during scaling
- ❌ Avoid: Stateless services (affinity adds complexity for no benefit); skewed traffic can create hot spots
Maglev Hashing
- ✅ Use when: Ultra-scale load balancing (billions of flows), minimal traffic loss on failover, network-level LB
- ❌ Avoid: Layer-7 load balancing (overkill), small deployments
Real Systems Table
| System | Algorithm | Config | Production Detail |
|---|---|---|---|
| AWS ALB | Round Robin (default) + LOR | Default RR, optional LOR | Handles millions of concurrent connections with health checks |
| AWS NLB | Flow Hash (5-tuple) | Deterministic per flow | All packets in a flow go to same backend, millions of RPS |
| Nginx | Round Robin (default), Least Conn | 5+ algorithms available | upstream module, keepalive connections, quick failover |
| Envoy / Istio | Ring Hash, Least Request | Locality-weighted, zone-aware | Service mesh, automatic retries, circuit breaking |
| HAProxy | Round Robin, Least Conn, Source IP | Stick tables for persistence | 2M+ concurrent connections, L4/L7 switching |
| Cloudflare | Weighted Round Robin + Maglev | Weighted per endpoint | Health checks, DDoS-aware selection per PoP |
| Google Cloud LB | Maglev consistent hashing | Lookup table, 128 backends | <0.5% deviation from perfect distribution, sub-microsecond lookup |
How Real Systems Use This
AWS Application Load Balancer (ALB)
AWS ALB implements load balancing using round-robin by default, with an optional least outstanding requests (LOR) algorithm. The default round-robin distributes requests evenly across healthy targets in a target group, cycling through them in order. For workloads with variable request processing time, ALB supports LOR (available since November 2019), which routes each new request to the target with the fewest in-flight requests. ALB can handle millions of concurrent connections and supports cross-zone load balancing to distribute traffic evenly across all availability zones. Health checks run on a configurable interval (30 seconds by default); after a configurable number of consecutive failures a target is removed from rotation, and it is re-added after enough consecutive successes. ALB also supports weighted target groups for canary deployments: route 10% to the new version (weight 1) and 90% to the stable version (weight 9), gradually shifting weight as confidence increases.
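The weighted target-group behaviour can be sketched as a weighted random choice; the target-group names below are illustrative, not real ALB identifiers:

```python
import random

def pick_target_group(rng=random):
    """Route ~90% of requests to the stable target group and ~10% to the
    canary, mirroring ALB weights of 9 (stable) and 1 (canary)."""
    groups = ["stable-tg", "canary-tg"]   # hypothetical target group names
    weights = [9, 1]                      # relative traffic shares
    return rng.choices(groups, weights=weights, k=1)[0]
```

Shifting traffic is then just adjusting the weights, e.g. 9:1 to 7:3 to 5:5, until the canary takes all traffic.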
AWS Network Load Balancer (NLB)
NLB operates at Layer 4 and uses a flow hash algorithm based on the 5-tuple: source IP, source port, destination IP, destination port, and protocol. This ensures all packets in a single TCP connection route to the same backend, which is critical for stateful protocols. NLB can sustain millions of requests per second and maintain millions of concurrent connections. Unlike ALB’s round-robin, NLB’s hash-based routing means traffic distribution depends on the actual source/destination distribution: a single client establishing multiple connections may hash to different backends. Because routing is deterministic per flow, NLB needs minimal per-connection bookkeeping, which enables very high packet-processing rates.
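The 5-tuple flow hash can be illustrated with a deterministic hash over the tuple; this is a sketch of the idea, not NLB's actual internal hash function:

```python
import hashlib

def flow_hash_backend(src_ip: str, src_port: int, dst_ip: str,
                      dst_port: int, protocol: str, backends: list) -> str:
    """Map a flow's 5-tuple to a backend deterministically, so every
    packet of one TCP connection reaches the same server."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}"
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return backends[h % len(backends)]
```

A client opening a second connection from a different source port forms a new flow and may hash to a different backend.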
Nginx upstream Module
Nginx provides several load-balancing methods via the upstream block: round-robin (default), weighted round-robin, least_conn (fewest active connections), ip_hash (sticky routing by client IP), hash (generic key hashing, with an optional consistent/ketama flag), and least_time (NGINX Plus only, combining connection count and response time). A typical production config, upstream backend { least_conn; server srv1.example.com weight=3; server srv2.example.com weight=1; }, uses least-connections selection with a 3:1 weighting, so srv1 receives roughly 75% of new connections when per-connection load is even. Nginx maintains per-server connection counts in memory and selects the server with the minimum weighted active connections. With keepalive enabled inside the upstream block (keepalive 32), Nginx reuses backend connections, reducing handshake overhead. Each worker process can handle thousands of concurrent connections, and Nginx scales across CPU cores via multiple workers.
Envoy / Istio Service Mesh
Envoy implements ring hash (consistent hashing) and least request algorithms. In service mesh deployments, Envoy proxies add locality-weighted load balancing: prefer backends in the same zone/region before falling back to remote zones. For example, when a client calls a service, Envoy first routes to an instance in the same availability zone; only if local instances are unhealthy or overloaded does it route cross-zone. Envoy also supports automatic retries with exponential backoff: if a request fails, Envoy retries it on a different backend. Istio adds fault injection for chaos testing: inject 10% errors or 500ms delays to test client resilience. Ring hash ensures session affinity: the same user ID always routes to the same pod, useful for caching or session storage.
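The zone-preference-then-least-request behaviour can be sketched as follows (simplified: Envoy's real locality weighting degrades gradually rather than hard-failing over):

```python
def pick_backend(backends: list, local_zone: str) -> dict:
    """Prefer healthy backends in the caller's zone; fall back to any
    healthy backend, then apply least-request within the chosen pool."""
    healthy = [b for b in backends if b["healthy"]]
    local = [b for b in healthy if b["zone"] == local_zone]
    pool = local or healthy              # fall back cross-zone only if needed
    if not pool:
        raise RuntimeError("no healthy backends")
    return min(pool, key=lambda b: b["active_requests"])
```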
HAProxy
HAProxy supports 8+ load balancing algorithms and can handle 2M+ concurrent connections in production deployments. The most common: balance roundrobin (default), balance leastconn, balance source (source IP hash for affinity), and balance url_param (sticky sessions by URL parameter). HAProxy uses stick tables for persistence, storing client→server mappings in shared memory so requests from the same client always route to the same backend; with the peers protocol, these mappings can be synchronized across HAProxy instances and survive reloads. Health checks are configurable (HTTP GET, TCP, custom scripts) and typically run every 1-5 seconds. HAProxy’s Layer 4/Layer 7 switching allows different algorithms per backend: some services might use roundrobin while others use source IP hashing.
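Conceptually, a stick table is an expiring map from client key to chosen backend; here is a minimal in-memory sketch (HAProxy's real stick tables live in shared memory with richer expiry and counters):

```python
import time
from typing import Optional

class StickTable:
    """Expiring map from client key (e.g. source IP) to a backend."""

    def __init__(self, pick_server, expire_seconds: float = 1800.0):
        self.pick_server = pick_server        # fallback balancing algorithm
        self.expire_seconds = expire_seconds
        self.entries = {}                     # client_key -> (server, last_seen)

    def route(self, client_key: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        entry = self.entries.get(client_key)
        if entry is not None and now - entry[1] < self.expire_seconds:
            server = entry[0]                 # sticky hit: reuse prior backend
        else:
            server = self.pick_server()       # miss or expired: pick fresh
        self.entries[client_key] = (server, now)
        return server
```

On a sticky hit the prior backend is reused; on a miss or expiry the configured balancing algorithm picks a fresh one.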
Cloudflare Unimog L4 Load Balancer
Cloudflare’s Unimog is an L4 load balancer using weighted round robin with Maglev-inspired hashing for ultra-scale. Instead of per-connection state, Unimog uses a deterministic hash table (permutation-based) to select backends — this avoids connection state synchronization across distributed LB machines. Each Cloudflare PoP independently load balances traffic using the same hash function, so if a customer’s requests arrive at different edge locations, they may route to different backends (good for cache distribution). Unimog handles 45M+ requests per second globally by avoiding centralized coordination. Health checks are integrated — if a backend goes down, Cloudflare’s hash tables are recalculated and future flows redistribute with minimal disruption.
Google Maglev Load Balancer
Google’s Maglev is a software load balancer running on commodity Linux servers, serving Google’s traffic since 2008. Maglev uses consistent hashing with a permutation-based lookup table that distributes traffic with <0.5% deviation from a perfect split, substantially better than classic ring hashing, which needs many virtual nodes to approach that balance. During initialization, Maglev builds a prime-sized lookup table (e.g., 65,537 entries): each backend generates a permutation of table positions from an offset and a skip value, and backends take turns claiming their next preferred empty slot. On lookup, a flow’s connection 5-tuple hashes to a table position in O(1) time. When a backend fails, only the affected table entries are remapped, roughly 1/N of traffic, so impact is minimal. Maglev saturates a 10 Gbps link with small packets (line-rate throughput) and is used at Google for front-end load balancing with hundreds of backends.
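The paper's permutation-based table population can be sketched as follows; the hash function and the tiny prime table size here are toy stand-ins for Maglev's real choices:

```python
import hashlib

def _h(key: str, salt: str) -> int:
    """Deterministic 64-bit hash (stand-in for Maglev's hash functions)."""
    return int.from_bytes(hashlib.sha256(f"{salt}:{key}".encode()).digest()[:8], "big")

def build_maglev_table(backends: list, table_size: int = 13) -> list:
    """Fill a prime-sized lookup table so each backend owns ~1/N of entries.

    Each backend derives a permutation of table slots from an offset and a
    skip; backends take turns claiming their next preferred empty slot,
    so entry counts differ by at most one.
    """
    n = len(backends)
    offset = [_h(b, "offset") % table_size for b in backends]
    skip = [_h(b, "skip") % (table_size - 1) + 1 for b in backends]
    nxt = [0] * n                       # next permutation index per backend
    table = [None] * table_size
    filled = 0
    while filled < table_size:
        for i in range(n):
            c = (offset[i] + nxt[i] * skip[i]) % table_size
            while table[c] is not None:         # slot taken: next preference
                nxt[i] += 1
                c = (offset[i] + nxt[i] * skip[i]) % table_size
            table[c] = backends[i]
            nxt[i] += 1
            filled += 1
            if filled == table_size:
                break
    return table

def maglev_lookup(table: list, flow_key: str) -> str:
    """O(1) lookup: hash the flow key and index the table."""
    return table[_h(flow_key, "lookup") % len(table)]
```

Because the table size is prime and the skip is nonzero, each backend's probe sequence visits every slot, so the fill loop always terminates.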
Implementation
Round Robin Load Balancer
```python
class RoundRobinLoadBalancer:
    """
    Simple round-robin load balancer.

    Cycles through servers in order, ensuring equal distribution
    when all requests have equal cost. O(1) per request.

    Used by: AWS ALB (default), Nginx (default), HAProxy (default)
    """

    def __init__(self, servers: list):
        """
        Initialize the load balancer.

        Args:
            servers: List of backend server addresses (strings or objects).
        """
        self.servers = servers
        self.current_index = 0

    def select_server(self):
        """
        Select the next server in round-robin order.

        Returns:
            The selected server address.
        """
        server = self.servers[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.servers)
        return server


class WeightedRoundRobinLoadBalancer:
    """
    Smooth weighted round-robin load balancer (the scheme Nginx uses).

    Each server has a weight; servers with higher weight get more requests.
    For example, weights=[3, 1] sends 3 requests to server[0] per 1 to
    server[1], interleaved rather than in a burst.

    Used by: Nginx, HAProxy, Cloudflare, AWS ALB for canary deployments.
    """

    def __init__(self, servers: list, weights: list):
        """
        Initialize weighted round-robin balancer.

        Args:
            servers: List of backend servers.
            weights: Integer weight for each server (higher = more traffic).
        """
        self.servers = servers
        self.weights = weights
        self.total_weight = sum(weights)
        # Per-server "current weight", adjusted on every selection
        self.current_weights = [0] * len(servers)

    def select_server(self):
        """
        Select a server using smooth weighted round-robin: add each
        server's weight to its current weight, pick the server with the
        highest current weight, then subtract the total weight from the
        winner. This spreads picks evenly over time, e.g. weights=[3, 1]
        yields srv0, srv0, srv1, srv0, repeating.

        Returns:
            The selected server address.
        """
        for i in range(len(self.servers)):
            self.current_weights[i] += self.weights[i]
        selected = max(range(len(self.servers)),
                       key=lambda i: self.current_weights[i])
        self.current_weights[selected] -= self.total_weight
        return self.servers[selected]


# Example usage
if __name__ == "__main__":
    # Round robin: 3 identical servers
    rr = RoundRobinLoadBalancer(["srv1", "srv2", "srv3"])
    print("Round Robin (3 servers):")
    for i in range(9):
        print(f"  Request {i+1}: {rr.select_server()}")

    print("\nWeighted Round Robin (weights=[3, 1]):")
    wrr = WeightedRoundRobinLoadBalancer(["srv_large", "srv_small"], [3, 1])
    for i in range(8):
        print(f"  Request {i+1}: {wrr.select_server()}")
```
Least Connections Load Balancer
```python
import bisect
import hashlib
import heapq
from typing import List


class LeastConnectionsLoadBalancer:
    """
    Least connections load balancer using a min-heap.

    Routes each request to the server with the fewest active connections
    and updates state when connections close.

    Used by: Nginx, HAProxy, AWS ALB (LOR variant), Envoy
    Time: O(log n) to select + update
    Space: O(n) for the heap
    """

    def __init__(self, servers: List[str]):
        """
        Initialize the least connections balancer.

        Args:
            servers: List of backend server addresses.
        """
        self.servers = servers
        # Min-heap of (active_connections, server_id, server_name)
        self.heap = [(0, i, server) for i, server in enumerate(servers)]
        heapq.heapify(self.heap)

    def select_server(self) -> str:
        """
        Select the server with the fewest active connections.

        Returns:
            The selected server address.
        """
        # Pop the server with minimum connections, increment, push back
        active_conns, server_id, server_name = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (active_conns + 1, server_id, server_name))
        return server_name

    def connection_closed(self, server_name: str):
        """
        Notify the balancer that a connection to a server has closed,
        decrementing its active connection count. In production this is
        called when the backend TCP connection closes.

        Args:
            server_name: The server to decrement.
        """
        # Rebuild the heap after a connection closes.
        # (In production, use a separate tracking structure for efficiency.)
        new_heap = []
        for conns, sid, name in self.heap:
            if name == server_name and conns > 0:
                conns -= 1
            new_heap.append((conns, sid, name))
        heapq.heapify(new_heap)
        self.heap = new_heap


class ConsistentHashingLoadBalancer:
    """
    Consistent hashing load balancer for session affinity.

    Routes the same key (user_id, session_id) to the same server.
    Useful for caching, session storage, stateful services.
    When servers change, only ~1/N of keys are reshuffled.

    Used by: Nginx (hash ... consistent), HAProxy (source IP), Envoy (ring hash), Memcached clients
    Time: O(log n) per lookup via binary search on the sorted ring
    """

    def __init__(self, servers: List[str], replicas: int = 3):
        """
        Initialize consistent hashing balancer.

        Args:
            servers: List of backend servers.
            replicas: Virtual nodes per server (higher = better distribution).
        """
        self.servers = servers
        self.replicas = replicas
        self.ring = {}
        self.sorted_keys = []
        self._build_ring()

    def _hash(self, key: str) -> int:
        """Deterministic 32-bit hash. (Python's built-in hash() is salted
        per process, so it must not be used for a shared ring.)"""
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big")

    def _build_ring(self):
        """Build the hash ring with virtual nodes and a sorted key index."""
        self.ring = {}
        for server in self.servers:
            for i in range(self.replicas):
                virtual_key = f"{server}:{i}"
                self.ring[self._hash(virtual_key)] = server
        self.sorted_keys = sorted(self.ring)

    def select_server(self, key: str) -> str:
        """
        Select the server for a given key (e.g., user_id), routing the
        same key to the same server consistently.

        Args:
            key: The key to hash (user_id, session_id, IP, etc.).

        Returns:
            The selected server address.
        """
        if not self.ring:
            raise RuntimeError("no servers on the ring")
        hash_value = self._hash(key)
        # Binary search for the first ring position >= hash_value,
        # wrapping around to the start of the ring
        idx = bisect.bisect_left(self.sorted_keys, hash_value) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

    def add_server(self, server: str):
        """Add a new server and rebuild the ring."""
        self.servers.append(server)
        self._build_ring()

    def remove_server(self, server: str):
        """Remove a server and rebuild the ring."""
        self.servers.remove(server)
        self._build_ring()


# Example usage
if __name__ == "__main__":
    # Least connections
    lc = LeastConnectionsLoadBalancer(["srv1", "srv2", "srv3"])
    print("Least Connections (select 6 requests):")
    for i in range(6):
        print(f"  Request {i+1}: {lc.select_server()}")

    print("\nConsistent Hashing (same user_id routes to same server):")
    ch = ConsistentHashingLoadBalancer(["cache1", "cache2", "cache3"], replicas=3)
    for uid in ["user_42", "user_99", "user_42", "user_100", "user_99"]:
        print(f"  {uid}: {ch.select_server(uid)}")
```
Comparison & Failure Handling
Health Checks (Critical Companion)
All production load balancers use health checks to detect failed backends:
- TCP health check: open a connection to port 80/443 → if it fails, mark down
- HTTP health check: GET /health, expect 200 OK → if it fails, mark down
- Deep health check: verify DB, cache, and dependencies → if any fail, mark down
Typical flow:
- Health check runs every 3-5 seconds per backend
- If 3 consecutive checks fail → mark DOWN, remove from rotation
- If 5 consecutive checks succeed → mark UP, re-add to rotation
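The threshold flow above amounts to a small per-backend state machine; here is a sketch using the 3-failure/5-success thresholds from the flow:

```python
class HealthChecker:
    """Track consecutive health-check results for one backend, flipping
    its state after 3 consecutive failures (DOWN) or 5 consecutive
    successes (UP)."""
    FAIL_THRESHOLD = 3
    SUCCESS_THRESHOLD = 5

    def __init__(self):
        self.healthy = True
        self.consecutive_failures = 0
        self.consecutive_successes = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health-check result; return current health state."""
        if check_passed:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if not self.healthy and self.consecutive_successes >= self.SUCCESS_THRESHOLD:
                self.healthy = True           # re-add to rotation
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.healthy and self.consecutive_failures >= self.FAIL_THRESHOLD:
                self.healthy = False          # remove from rotation
        return self.healthy
```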
Handling Slow Servers
| Algorithm | Handles Slow Server | How |
|---|---|---|
| Round Robin | ❌ Poor | Slow server still gets equal share → blocks other requests |
| Least Connections | ✅ Excellent | Adapts: slow server accumulates connections, fewer new requests routed there |
| Weighted RR | ✅ Good | Admin can reduce weight, but manual intervention needed |
| Least Response Time | ✅ Excellent | Automatically deprioritizes slow backends based on observed latency |
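A least-response-time selector can be sketched with an exponentially weighted moving average (EWMA) of observed latency per backend; the score formula and alpha=0.3 below are illustrative choices, not any specific product's algorithm:

```python
class LeastResponseTimeBalancer:
    """Pick the backend with the lowest score, where score combines the
    latency EWMA with the number of in-flight requests."""

    def __init__(self, servers: list, alpha: float = 0.3):
        self.alpha = alpha                           # EWMA smoothing factor
        self.ewma = {s: 0.0 for s in servers}        # smoothed latency (ms)
        self.in_flight = {s: 0 for s in servers}

    def select_server(self) -> str:
        # score = smoothed latency * (in-flight + 1), so a busy-but-fast
        # backend and an idle-but-slow one compete on comparable footing
        server = min(self.ewma,
                     key=lambda s: self.ewma[s] * (self.in_flight[s] + 1))
        self.in_flight[server] += 1
        return server

    def record_response(self, server: str, latency_ms: float):
        """Fold an observed latency into the server's EWMA."""
        self.in_flight[server] -= 1
        self.ewma[server] = (self.alpha * latency_ms
                             + (1 - self.alpha) * self.ewma[server])
```

A consistently slow backend accumulates a high EWMA and is automatically deprioritized without any manual weight changes.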
References
- Maglev: A Fast and Reliable Software Network Load Balancer — Eisenbud et al., Google (NSDI 2016) — Consistent hashing achieving <0.5% distribution variance via permutation-based lookup tables, handles billions of flows, in production since 2008.
- ByteByteGo — Load Balancing — Visual walkthrough of algorithms with trade-offs and production examples.
- AWS ALB Load Balancing Algorithms — Round-robin (default), Least Outstanding Requests (Nov 2019+).
- Nginx Upstream Module Documentation — Round robin, weighted, least_conn, ip_hash, least_time algorithms.
- HAProxy Configuration Manual — 8+ algorithms, stick tables, 2M+ concurrent connection examples.
- Google Cloud Load Balancing Architecture — Maglev-based LB for Google Cloud, minimal traffic loss on failover.
- Envoy Proxy Load Balancing Documentation — Ring hash, least request, locality-weighted selection in service mesh.