Load Balancing Algorithms
Distribute incoming traffic across multiple servers to maximize throughput, minimize latency, and prevent overload: least connections for general workloads, consistent hashing for stateful services, Maglev hashing for massive scale.
Key Properties Table
| Algorithm | Time Complexity | State Required | Allows Burst | Session Affinity | Load Balance Quality | Failure Handling |
|---|---|---|---|---|---|---|
| Round Robin | O(1) | None | ✅ | ❌ | Fair (equal) | Poor (slow server still gets same) |
| Weighted Round Robin | O(1) | Per-server weights | ✅ | ❌ | Fair (weighted) | Good (respects capacity ratios) |
| Random | O(1) | None | ✅ | ❌ | Fair (statistical) | Poor (slow server gets equal) |
| Least Connections | O(log n) | Active count/server | ✅ | ❌ | Excellent (dynamic) | Excellent (adapts to varying latency) |
| Least Response Time | O(log n) | Latency + connections | ✅ | ❌ | Excellent (latency-aware) | Excellent (proactive) |
| Consistent Hashing | O(log n) lookup | Ring state | ✅ | ✅ | Fair (hash-dependent) | Minimal reshuffling on change |
| Maglev Hashing | O(n) init, O(1) lookup | Lookup table | ✅ | ✅ | Excellent (< 0.5% variance) | Minimal reshuffling on change |
When to Use / Avoid
Round Robin
- ✅ Use when: Homogeneous stateless servers, identical latency, simple setup needed, equal CPU/memory
- ❌ Avoid: Mixed request sizes, varying server capacity, latency-sensitive applications
Weighted Round Robin
- ✅ Use when: Servers with different specs (e.g., 2 large + 1 small), blue-green deployments, canary releases
- ❌ Avoid: Variable request latency (weights don’t account for in-flight time)
Random
- ✅ Use when: Extremely simple load balancer, no state tracking possible, edge cases (rarely used)
- ❌ Avoid: Production-critical systems (distribution is only fair in expectation; short runs can be badly imbalanced)
Least Connections
- ✅ Use when: Mixed request types (short + long), HTTP/2, WebSocket, variable latency
- ❌ Avoid: Stateless services with equal requests (overkill complexity)
Least Response Time
- ✅ Use when: Latency-critical services, varying backend performance, strict SLO requirements
- ❌ Avoid: Simple workloads (adds observability overhead)
Consistent Hashing (Session Affinity)
- ✅ Use when: Caching services, stateful sessions, shopping carts, in-process client state, minimal reshuffling needed during scaling
- ❌ Avoid: Stateless services (affinity adds complexity for no benefit); skewed traffic can create hot spots
Maglev Hashing
- ✅ Use when: Ultra-scale load balancing (billions of flows), minimal traffic loss on failover, network-level LB
- ❌ Avoid: Layer-7 load balancing (overkill), small deployments
Real Systems Table
| System | Algorithm | Config | Production Detail |
|---|---|---|---|
| AWS ALB | Round Robin (default) + LOR | Default RR, optional LOR | Handles millions of concurrent connections with health checks |
| AWS NLB | Flow Hash (5-tuple) | Deterministic per flow | All packets in a flow go to same backend, millions of RPS |
| Nginx | Round Robin (default), Least Conn | 5+ algorithms available | upstream module, keepalive connections, quick failover |
| Envoy / Istio | Ring Hash, Least Request | Locality-weighted, zone-aware | Service mesh, automatic retries, circuit breaking |
| HAProxy | Round Robin, Least Conn, Source IP | Stick tables for persistence | 2M+ concurrent connections, L4/L7 switching |
| Cloudflare | Weighted Round Robin + Maglev | Weighted per endpoint | Health checks, DDoS-aware selection per PoP |
| Google Cloud LB | Maglev consistent hashing | Lookup table, 128 backends | <0.5% deviation from perfect distribution, sub-microsecond lookup |
How Real Systems Use This
AWS Application Load Balancer (ALB)
AWS ALB implements load balancing using round-robin by default, with an optional least outstanding requests (LOR) algorithm. The default round-robin distributes requests evenly across healthy targets in a target group, cycling through them in order. For workloads with variable request processing time, ALB supports LOR (available since November 2019), which routes each new request to the target with the fewest in-flight requests. ALB can handle millions of concurrent connections and supports cross-zone load balancing to distribute traffic evenly across all availability zones. Health checks run on a configurable interval (30 seconds by default); after a configurable number of consecutive failures a target is removed from rotation, and it is re-added after enough consecutive successes. ALB also supports weighted target groups for canary deployments: route 10% to the new version (weight 1) and 90% to the stable version (weight 9), gradually shifting weight as confidence increases.
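The weighted target-group behaviour can be sketched as a weighted random choice; the target-group names below are illustrative, not real ALB identifiers:

```python
import random

def pick_target_group(rng=random):
    """Route ~90% of requests to the stable target group and ~10% to the
    canary, mirroring ALB weights of 9 (stable) and 1 (canary)."""
    groups = ["stable-tg", "canary-tg"]   # hypothetical target group names
    weights = [9, 1]                      # relative traffic shares
    return rng.choices(groups, weights=weights, k=1)[0]
```

Shifting traffic is then just adjusting the weights, e.g. 9:1 to 7:3 to 5:5, until the canary takes all traffic.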
AWS Network Load Balancer (NLB)
NLB operates at Layer 4 and uses a flow hash algorithm based on the 5-tuple: source IP, source port, destination IP, destination port, and protocol. This ensures all packets in a single TCP connection route to the same backend, which is critical for stateful protocols. NLB can sustain millions of requests per second and maintain millions of concurrent connections. Unlike ALB’s round-robin, NLB’s hash-based routing means traffic distribution depends on the actual source/destination distribution: a single client establishing multiple connections may hash to different backends. Because routing is deterministic per flow, NLB needs minimal per-connection bookkeeping, which enables very high packet-processing rates.
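The 5-tuple flow hash can be illustrated with a deterministic hash over the tuple; this is a sketch of the idea, not NLB's actual internal hash function:

```python
import hashlib

def flow_hash_backend(src_ip: str, src_port: int, dst_ip: str,
                      dst_port: int, protocol: str, backends: list) -> str:
    """Map a flow's 5-tuple to a backend deterministically, so every
    packet of one TCP connection reaches the same server."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}"
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return backends[h % len(backends)]
```

A client opening a second connection from a different source port forms a new flow and may hash to a different backend.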
Nginx upstream Module
Nginx provides several load-balancing methods via the upstream block: round-robin (default), weighted round-robin, least_conn (fewest active connections), ip_hash (sticky routing by client IP), hash (generic key hashing, with an optional consistent/ketama flag), and least_time (NGINX Plus only, combining connection count and response time). A typical production config, upstream backend { least_conn; server srv1.example.com weight=3; server srv2.example.com weight=1; }, uses least-connections selection with a 3:1 weighting, so srv1 receives roughly 75% of new connections when per-connection load is even. Nginx maintains per-server connection counts in memory and selects the server with the minimum weighted active connections. With keepalive enabled inside the upstream block (keepalive 32), Nginx reuses backend connections, reducing handshake overhead. Each worker process can handle thousands of concurrent connections, and Nginx scales across CPU cores via multiple workers.
Envoy / Istio Service Mesh
Envoy implements ring hash (consistent hashing) and least request algorithms. In service mesh deployments, Envoy proxies add locality-weighted load balancing: prefer backends in the same zone/region before falling back to remote zones. For example, when a client calls a service, Envoy first routes to an instance in the same availability zone; only if local instances are unhealthy or overloaded does it route cross-zone. Envoy also supports automatic retries with exponential backoff: if a request fails, Envoy retries it on a different backend. Istio adds fault injection for chaos testing: inject 10% errors or 500ms delays to test client resilience. Ring hash ensures session affinity: the same user ID always routes to the same pod, useful for caching or session storage.
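The zone-preference-then-least-request behaviour can be sketched as follows (simplified: Envoy's real locality weighting degrades gradually rather than hard-failing over):

```python
def pick_backend(backends: list, local_zone: str) -> dict:
    """Prefer healthy backends in the caller's zone; fall back to any
    healthy backend, then apply least-request within the chosen pool."""
    healthy = [b for b in backends if b["healthy"]]
    local = [b for b in healthy if b["zone"] == local_zone]
    pool = local or healthy              # fall back cross-zone only if needed
    if not pool:
        raise RuntimeError("no healthy backends")
    return min(pool, key=lambda b: b["active_requests"])
```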
HAProxy
HAProxy supports 8+ load balancing algorithms and can handle 2M+ concurrent connections in production deployments. The most common: balance roundrobin (default), balance leastconn, balance source (source IP hash for affinity), and balance url_param (sticky sessions by URL parameter). HAProxy uses stick tables for persistence, storing client→server mappings in shared memory so requests from the same client always route to the same backend; with the peers protocol, these mappings can be synchronized across HAProxy instances and survive reloads. Health checks are configurable (HTTP GET, TCP, custom scripts) and typically run every 1-5 seconds. HAProxy’s Layer 4/Layer 7 switching allows different algorithms per backend: some services might use roundrobin while others use source IP hashing.
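Conceptually, a stick table is an expiring map from client key to chosen backend; here is a minimal in-memory sketch (HAProxy's real stick tables live in shared memory with richer expiry and counters):

```python
import time
from typing import Optional

class StickTable:
    """Expiring map from client key (e.g. source IP) to a backend."""

    def __init__(self, pick_server, expire_seconds: float = 1800.0):
        self.pick_server = pick_server        # fallback balancing algorithm
        self.expire_seconds = expire_seconds
        self.entries = {}                     # client_key -> (server, last_seen)

    def route(self, client_key: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        entry = self.entries.get(client_key)
        if entry is not None and now - entry[1] < self.expire_seconds:
            server = entry[0]                 # sticky hit: reuse prior backend
        else:
            server = self.pick_server()       # miss or expired: pick fresh
        self.entries[client_key] = (server, now)
        return server
```

On a sticky hit the prior backend is reused; on a miss or expiry the configured balancing algorithm picks a fresh one.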
Cloudflare Unimog L4 Load Balancer
Cloudflare’s Unimog is an L4 load balancer using weighted round robin with Maglev-inspired hashing for ultra-scale. Instead of per-connection state, Unimog uses a deterministic hash table (permutation-based) to select backends — this avoids connection state synchronization across distributed LB machines. Each Cloudflare PoP independently load balances traffic using the same hash function, so if a customer’s requests arrive at different edge locations, they may route to different backends (good for cache distribution). Unimog handles 45M+ requests per second globally by avoiding centralized coordination. Health checks are integrated — if a backend goes down, Cloudflare’s hash tables are recalculated and future flows redistribute with minimal disruption.
Google Maglev Load Balancer
Google’s Maglev is a software load balancer running on commodity Linux servers, serving Google’s traffic since 2008. Maglev uses consistent hashing with a permutation-based lookup table that distributes traffic with <0.5% deviation from a perfect split, substantially better than classic ring hashing, which needs many virtual nodes to approach that balance. During initialization, Maglev builds a prime-sized lookup table (e.g., 65,537 entries): each backend generates a permutation of table positions from an offset and a skip value, and backends take turns claiming their next preferred empty slot. On lookup, a flow’s connection 5-tuple hashes to a table position in O(1) time. When a backend fails, only the affected table entries are remapped, roughly 1/N of traffic, so impact is minimal. Maglev saturates a 10 Gbps link with small packets (line-rate throughput) and is used at Google for front-end load balancing with hundreds of backends.
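The paper's permutation-based table population can be sketched as follows; the hash function and the tiny prime table size here are toy stand-ins for Maglev's real choices:

```python
import hashlib

def _h(key: str, salt: str) -> int:
    """Deterministic 64-bit hash (stand-in for Maglev's hash functions)."""
    return int.from_bytes(hashlib.sha256(f"{salt}:{key}".encode()).digest()[:8], "big")

def build_maglev_table(backends: list, table_size: int = 13) -> list:
    """Fill a prime-sized lookup table so each backend owns ~1/N of entries.

    Each backend derives a permutation of table slots from an offset and a
    skip; backends take turns claiming their next preferred empty slot,
    so entry counts differ by at most one.
    """
    n = len(backends)
    offset = [_h(b, "offset") % table_size for b in backends]
    skip = [_h(b, "skip") % (table_size - 1) + 1 for b in backends]
    nxt = [0] * n                       # next permutation index per backend
    table = [None] * table_size
    filled = 0
    while filled < table_size:
        for i in range(n):
            c = (offset[i] + nxt[i] * skip[i]) % table_size
            while table[c] is not None:         # slot taken: next preference
                nxt[i] += 1
                c = (offset[i] + nxt[i] * skip[i]) % table_size
            table[c] = backends[i]
            nxt[i] += 1
            filled += 1
            if filled == table_size:
                break
    return table

def maglev_lookup(table: list, flow_key: str) -> str:
    """O(1) lookup: hash the flow key and index the table."""
    return table[_h(flow_key, "lookup") % len(table)]
```

Because the table size is prime and the skip is nonzero, each backend's probe sequence visits every slot, so the fill loop always terminates.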
Implementation
Round Robin Load Balancer
```python
class RoundRobinLoadBalancer:
    """
    Simple round-robin load balancer.

    Cycles through servers in order, ensuring equal distribution
    when all requests have equal cost. O(1) per request.

    Used by: AWS ALB (default), Nginx (default), HAProxy (default)
    """

    def __init__(self, servers: list):
        """
        Initialize the load balancer.

        Args:
            servers: List of backend server addresses (strings or objects).
        """
        self.servers = servers
        self.current_index = 0

    def select_server(self):
        """
        Select the next server in round-robin order.

        Returns:
            The selected server address.
        """
        server = self.servers[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.servers)
        return server


class WeightedRoundRobinLoadBalancer:
    """
    Smooth weighted round-robin load balancer (the scheme Nginx uses).

    Each server has a weight; servers with higher weight get more requests.
    For example, weights=[3, 1] sends 3 requests to server[0] per 1 to
    server[1], interleaved rather than in a burst.

    Used by: Nginx, HAProxy, Cloudflare, AWS ALB for canary deployments.
    """

    def __init__(self, servers: list, weights: list):
        """
        Initialize weighted round-robin balancer.

        Args:
            servers: List of backend servers.
            weights: Integer weight for each server (higher = more traffic).
        """
        self.servers = servers
        self.weights = weights
        self.total_weight = sum(weights)
        # Per-server "current weight", adjusted on every selection
        self.current_weights = [0] * len(servers)

    def select_server(self):
        """
        Select a server using smooth weighted round-robin: add each
        server's weight to its current weight, pick the server with the
        highest current weight, then subtract the total weight from the
        winner. This spreads picks evenly over time, e.g. weights=[3, 1]
        yields srv0, srv0, srv1, srv0, repeating.

        Returns:
            The selected server address.
        """
        for i in range(len(self.servers)):
            self.current_weights[i] += self.weights[i]
        selected = max(range(len(self.servers)),
                       key=lambda i: self.current_weights[i])
        self.current_weights[selected] -= self.total_weight
        return self.servers[selected]


# Example usage
if __name__ == "__main__":
    # Round robin: 3 identical servers
    rr = RoundRobinLoadBalancer(["srv1", "srv2", "srv3"])
    print("Round Robin (3 servers):")
    for i in range(9):
        print(f"  Request {i+1}: {rr.select_server()}")

    print("\nWeighted Round Robin (weights=[3, 1]):")
    wrr = WeightedRoundRobinLoadBalancer(["srv_large", "srv_small"], [3, 1])
    for i in range(8):
        print(f"  Request {i+1}: {wrr.select_server()}")
```
Least Connections Load Balancer
```python
import bisect
import hashlib
import heapq
from typing import List


class LeastConnectionsLoadBalancer:
    """
    Least connections load balancer using a min-heap.

    Routes each request to the server with the fewest active connections
    and updates state when connections close.

    Used by: Nginx, HAProxy, AWS ALB (LOR variant), Envoy
    Time: O(log n) to select + update
    Space: O(n) for the heap
    """

    def __init__(self, servers: List[str]):
        """
        Initialize the least connections balancer.

        Args:
            servers: List of backend server addresses.
        """
        self.servers = servers
        # Min-heap of (active_connections, server_id, server_name)
        self.heap = [(0, i, server) for i, server in enumerate(servers)]
        heapq.heapify(self.heap)

    def select_server(self) -> str:
        """
        Select the server with the fewest active connections.

        Returns:
            The selected server address.
        """
        # Pop the server with minimum connections, increment, push back
        active_conns, server_id, server_name = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (active_conns + 1, server_id, server_name))
        return server_name

    def connection_closed(self, server_name: str):
        """
        Notify the balancer that a connection to a server has closed,
        decrementing its active connection count. In production this is
        called when the backend TCP connection closes.

        Args:
            server_name: The server to decrement.
        """
        # Rebuild the heap after a connection closes.
        # (In production, use a separate tracking structure for efficiency.)
        new_heap = []
        for conns, sid, name in self.heap:
            if name == server_name and conns > 0:
                conns -= 1
            new_heap.append((conns, sid, name))
        heapq.heapify(new_heap)
        self.heap = new_heap


class ConsistentHashingLoadBalancer:
    """
    Consistent hashing load balancer for session affinity.

    Routes the same key (user_id, session_id) to the same server.
    Useful for caching, session storage, stateful services.
    When servers change, only ~1/N of keys are reshuffled.

    Used by: Nginx (hash ... consistent), HAProxy (source IP), Envoy (ring hash), Memcached clients
    Time: O(log n) per lookup via binary search on the sorted ring
    """

    def __init__(self, servers: List[str], replicas: int = 3):
        """
        Initialize consistent hashing balancer.

        Args:
            servers: List of backend servers.
            replicas: Virtual nodes per server (higher = better distribution).
        """
        self.servers = servers
        self.replicas = replicas
        self.ring = {}
        self.sorted_keys = []
        self._build_ring()

    def _hash(self, key: str) -> int:
        """Deterministic 32-bit hash. (Python's built-in hash() is salted
        per process, so it must not be used for a shared ring.)"""
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big")

    def _build_ring(self):
        """Build the hash ring with virtual nodes and a sorted key index."""
        self.ring = {}
        for server in self.servers:
            for i in range(self.replicas):
                virtual_key = f"{server}:{i}"
                self.ring[self._hash(virtual_key)] = server
        self.sorted_keys = sorted(self.ring)

    def select_server(self, key: str) -> str:
        """
        Select the server for a given key (e.g., user_id), routing the
        same key to the same server consistently.

        Args:
            key: The key to hash (user_id, session_id, IP, etc.).

        Returns:
            The selected server address.
        """
        if not self.ring:
            raise RuntimeError("no servers on the ring")
        hash_value = self._hash(key)
        # Binary search for the first ring position >= hash_value,
        # wrapping around to the start of the ring
        idx = bisect.bisect_left(self.sorted_keys, hash_value) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

    def add_server(self, server: str):
        """Add a new server and rebuild the ring."""
        self.servers.append(server)
        self._build_ring()

    def remove_server(self, server: str):
        """Remove a server and rebuild the ring."""
        self.servers.remove(server)
        self._build_ring()


# Example usage
if __name__ == "__main__":
    # Least connections
    lc = LeastConnectionsLoadBalancer(["srv1", "srv2", "srv3"])
    print("Least Connections (select 6 requests):")
    for i in range(6):
        print(f"  Request {i+1}: {lc.select_server()}")

    print("\nConsistent Hashing (same user_id routes to same server):")
    ch = ConsistentHashingLoadBalancer(["cache1", "cache2", "cache3"], replicas=3)
    for uid in ["user_42", "user_99", "user_42", "user_100", "user_99"]:
        print(f"  {uid}: {ch.select_server(uid)}")
```
Comparison & Failure Handling
Health Checks (Critical Companion)
All production load balancers use health checks to detect failed backends:
- TCP health check: open a connection to port 80/443 → if it fails, mark down
- HTTP health check: GET /health, expect 200 OK → if it fails, mark down
- Deep health check: verify DB, cache, and dependencies → if any fail, mark down
Typical flow:
- Health check runs every 3-5 seconds per backend
- If 3 consecutive checks fail → mark DOWN, remove from rotation
- If 5 consecutive checks succeed → mark UP, re-add to rotation
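The threshold flow above amounts to a small per-backend state machine; here is a sketch using the 3-failure/5-success thresholds from the flow:

```python
class HealthChecker:
    """Track consecutive health-check results for one backend, flipping
    its state after 3 consecutive failures (DOWN) or 5 consecutive
    successes (UP)."""
    FAIL_THRESHOLD = 3
    SUCCESS_THRESHOLD = 5

    def __init__(self):
        self.healthy = True
        self.consecutive_failures = 0
        self.consecutive_successes = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health-check result; return current health state."""
        if check_passed:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if not self.healthy and self.consecutive_successes >= self.SUCCESS_THRESHOLD:
                self.healthy = True           # re-add to rotation
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.healthy and self.consecutive_failures >= self.FAIL_THRESHOLD:
                self.healthy = False          # remove from rotation
        return self.healthy
```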
Handling Slow Servers
| Algorithm | Handles Slow Server | How |
|---|---|---|
| Round Robin | ❌ Poor | Slow server still gets equal share → blocks other requests |
| Least Connections | ✅ Excellent | Adapts: slow server accumulates connections, fewer new requests routed there |
| Weighted RR | ✅ Good | Admin can reduce weight, but manual intervention needed |
| Least Response Time | ✅ Excellent | Automatically deprioritizes slow backends based on observed latency |
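A least-response-time selector can be sketched with an exponentially weighted moving average (EWMA) of observed latency per backend; the score formula and alpha=0.3 below are illustrative choices, not any specific product's algorithm:

```python
class LeastResponseTimeBalancer:
    """Pick the backend with the lowest score, where score combines the
    latency EWMA with the number of in-flight requests."""

    def __init__(self, servers: list, alpha: float = 0.3):
        self.alpha = alpha                           # EWMA smoothing factor
        self.ewma = {s: 0.0 for s in servers}        # smoothed latency (ms)
        self.in_flight = {s: 0 for s in servers}

    def select_server(self) -> str:
        # score = smoothed latency * (in-flight + 1), so a busy-but-fast
        # backend and an idle-but-slow one compete on comparable footing
        server = min(self.ewma,
                     key=lambda s: self.ewma[s] * (self.in_flight[s] + 1))
        self.in_flight[server] += 1
        return server

    def record_response(self, server: str, latency_ms: float):
        """Fold an observed latency into the server's EWMA."""
        self.in_flight[server] -= 1
        self.ewma[server] = (self.alpha * latency_ms
                             + (1 - self.alpha) * self.ewma[server])
```

A consistently slow backend accumulates a high EWMA and is automatically deprioritized without any manual weight changes.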
References
- Maglev: A Fast and Reliable Software Network Load Balancer — Eisenbud et al., Google (NSDI 2016) — Consistent hashing achieving <0.5% distribution variance via permutation-based lookup tables, handles billions of flows, in production since 2008.
- ByteByteGo — Load Balancing — Visual walkthrough of algorithms with trade-offs and production examples.
- AWS ALB Load Balancing Algorithms — Round-robin (default), Least Outstanding Requests (Nov 2019+).
- Nginx Upstream Module Documentation — Round robin, weighted, least_conn, ip_hash, least_time algorithms.
- HAProxy Configuration Manual — 8+ algorithms, stick tables, 2M+ concurrent connection examples.
- Google Cloud Load Balancing Architecture — Maglev-based LB for Google Cloud, minimal traffic loss on failover.
- Envoy Proxy Load Balancing Documentation — Ring hash, least request, locality-weighted selection in service mesh.