Redis -- Architecture & Patterns

In-memory data structure store optimized for microsecond latency -- single-threaded event-driven architecture written in C, handling 1M+ operations per second for cache, sessions, counters, leaderboards, rate limiting, and real-time analytics.

Core Architecture Principles

Single-threaded event loop: Redis uses a single OS thread with I/O multiplexing (epoll/kqueue) to handle thousands of concurrent connections without lock contention. Commands execute one at a time, so every operation is atomic with respect to other clients: a critical design choice that simplifies the codebase and eliminates synchronization overhead. This works because typical Redis commands complete in microseconds; if you need to scale beyond one core, shard with Redis Cluster.
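As an illustrative sketch (not Redis source), the same pattern can be reproduced with Python's selectors module: one thread multiplexes many sockets, and each handler runs to completion before the next event is served, which is exactly why no locks are needed:

```python
import selectors
import socket

def run_event_loop(sel: selectors.DefaultSelector, iterations: int) -> None:
    """Single-threaded echo loop in the epoll/kqueue style Redis uses.

    Each ready socket is handled to completion before the next one,
    so handlers never race with each other. (Toy example; Redis's real
    loop also tracks write readiness, timers, and partial reads.)
    """
    for _ in range(iterations):
        for key, _events in sel.select(timeout=1):
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(data)      # serve the request (here: echo)
            else:                       # peer closed the connection
                sel.unregister(conn)
                conn.close()
```

Registering a few hundred sockets with one selector is the whole trick: readiness notification replaces one-thread-per-connection.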

No blocking operations: Commands like BLPOP block the client connection but not the entire server. The event loop suspends that client’s context and continues serving other requests.

Memory efficiency: Redis compresses small collections (ziplists/listpacks for small hashes and lists, intsets for integer-only sets up to 512 entries by default) and uses lazy encoding: data structures only upgrade to their full form when they exceed configurable size thresholds.
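The encoding thresholds are tunable in redis.conf. As a sketch, these are the Redis 7 defaults (older versions use the equivalent ziplist-named directives):

```conf
# Small collections stay in compact listpack/intset encodings
# until they cross these limits, then convert to hashtable/skiplist.
hash-max-listpack-entries 128
hash-max-listpack-value 64
list-max-listpack-size 128
set-max-intset-entries 512
zset-max-listpack-entries 128
zset-max-listpack-value 64
```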

Data Structures & Key Properties

| Structure | Operations | Complexity | Best For | Real Systems |
| --- | --- | --- | --- | --- |
| String | GET, SET, INCR, APPEND | O(1) / O(n) | Counters, caching, sessions | Any cache layer |
| Hash | HGET, HSET, HMGET, HINCRBY | O(1) / O(n) | Objects, user profiles, configs | User session metadata |
| List | LPUSH, RPOP, LRANGE, LINDEX | O(1) / O(n) | Queues, activity feeds, job queues | Message queues, feeds |
| Set | SADD, SISMEMBER, SMEMBERS, SUNION | O(1) avg | Deduplication, followers, tags | Social graphs, tag systems |
| Sorted Set | ZADD, ZRANGE, ZRANK, ZINCRBY | O(log n) | Leaderboards, rate limiting, priority queues | Live leaderboards, rate limiters |
| HyperLogLog | PFADD, PFCOUNT | O(1) | Cardinality estimation | Unique visitor counts |
| Bloom Filter (RedisBloom module) | BF.ADD, BF.EXISTS | O(k) | Membership testing | Deduplication checks |
| Stream | XADD, XREAD, XGROUP | O(1) | Event streaming, log retention | Real-time event pipelines |
| Geo | GEOADD, GEORADIUS, GEODIST | O(log n) | Location queries | Location-based services |

When to Use / Avoid

✅ Use Redis When

  • Need sub-millisecond latency for frequently accessed data
  • Cache hit rate > 80% — memory overhead justifies the benefit
  • Write-heavy workload with high cardinality — throughput > 100K ops/sec
  • Need atomic operations on counters, lists, or sets (single-threaded guarantees)
  • Building real-time features — leaderboards, notifications, rate limiting
  • Can afford data loss on crash (or using AOF + RDB for durability)
  • Working with simple data types — strings, hashes, sorted sets

❌ Avoid Redis When

  • Redis would be the primary store for critical data you can't afford to lose (if you must, pair replication with AOF persistence)
  • Data is larger than available RAM — Redis eviction becomes unpredictable
  • Need complex queries or transactions across multiple keys
  • Workload is read-only and low-latency isn’t critical (cheaper options exist)
  • Consistency is paramount and you can’t tolerate eventual consistency
  • Data has weak temporal locality — cache miss rate > 50%

How Real Systems Use This

Twitter (X) — Real-Time Metrics & Leaderboards

Twitter maintains billions of time-series metrics in Redis using sorted sets with timestamps and counters. When a user loads their timeline, Twitter queries Redis for trending topics, which are stored as sorted sets with engagement scores (likes, retweets) updated in microseconds. Each trending topic is a key like trending:topics:2024-01-15 with entries like {"covid": 5.2M, "election": 4.8M}. Redis handles up to 100M trending topic lookups per day with microsecond latency. Beyond caching, Twitter uses Redis to track live active sessions — a single INCR on active:users:live increments atomically, and EXPIRE guarantees the counter resets after 5 minutes of inactivity. The entire trending system runs on 5-node Redis clusters in multiple geographies, with AOF persistence enabled to survive brief outages.

GitHub — Session Storage & Rate Limiting

GitHub uses Redis as the primary session store for its 100M+ users. When you log in to GitHub, your session token (encrypted JWT) is stored as a Redis string key session:user:12345:token with a TTL of 24 hours. Every request looks up the token in microseconds, avoiding expensive database queries. GitHub also implements API rate limiting (5,000 requests per hour for authenticated users) using a fixed-window counter in Redis: each user gets a key rate:user:12345, incremented with INCR on each request, with EXPIRE set to 3600 seconds. If the counter exceeds the limit, the request is rejected. This atomic INCR eliminates the race conditions that would occur with separate read-then-write patterns. GitHub's Redis infrastructure spans 50+ nodes and handles 1B+ rate limit checks daily with 99.9% accuracy.

Shopify — Distributed Lock for Inventory

Shopify uses Redis distributed locks to prevent overselling inventory during flash sales. When a customer purchases an item, Shopify acquires a lock with SET inventory:product:999 "customer:456" NX PX 5000 (NX = only set if not exists, PX = expire in 5 seconds). This ensures only one customer can decrement inventory at a time. If the lock times out after 5 seconds (e.g., due to slow database write), the next customer can acquire it. Shopify handles 100K+ purchases per minute across 1M+ product SKUs using Redis Cluster with 16,384 hash slots distributed across 20 nodes. Without Redis locks, inventory would be oversold 0.5% of the time (1-2 million dollars in daily loss).

Stack Overflow — Tag & Search Cache

Stack Overflow caches question tags and search results in Redis to power instant typeahead and trending tags. Every question is tagged with 1-5 tags (e.g., “python”, “redis”, “performance”). Stack Overflow maintains a Redis set for each tag: tag:python containing all question IDs using that tag. When a user searches for questions tagged with “python AND redis”, SO retrieves both sets and computes their intersection with SINTER in microseconds, avoiding expensive SQL JOINs. Over 500K typeahead lookups per day are served from Redis cache layers, each returning results in < 10ms. Stack Overflow also uses sorted sets for trending questions: trending:24h stores question IDs with scores = view counts + upvotes, ranked in real time.

Instagram — Leaderboard & Feed Cache

Instagram uses Redis to power real-time leaderboards and feed caching for 2B+ users. Stories of celebrities are stored in Redis sorted sets with scores = engagement metrics (views + interactions). When you open the app, Instagram queries leaderboard:friends:user:123 using ZRANGE to fetch the top 50 friends’ stories in ranked order. Each friend entry is scored with their average engagement; ZINCRBY updates scores atomically when new interactions occur. Behind the scenes, Instagram maintains 500+ Redis clusters globally, handling 5M leaderboard queries per second. For feed caching, Instagram stores the last 500 posts from a user’s feed in a Redis list: feed:user:123 using RPUSH (right push). When a user scrolls, LRANGE fetches posts incrementally (pagination), reducing database load by 40%.

Key Patterns

Pattern 1: Cache-Aside (Most Common)

def get_user(user_id):
    """
    Classic cache-aside pattern: check cache first, fall back to DB.
    - Hit: return immediately (microseconds)
    - Miss: query DB, populate cache with TTL, return
    """
    # Try cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # Cache miss: query database
    user = db.query_user(user_id)

    # Store in cache with 1-hour TTL
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))

    return user

Pattern 2: Distributed Lock (Redlock)

def redlock_acquire(resource, owner_id, timeout_ms=5000, max_retries=3):
    """
    Acquire a distributed lock using Redis SET with NX (not exists) and PX (expire).
    - NX: only set if key doesn't exist (atomically)
    - PX: key expires after timeout_ms (prevents deadlock)
    - owner_id: unique identifier to verify lock owner
    """
    lock_key = f"lock:{resource}"

    for attempt in range(max_retries):
        result = redis.set(
            lock_key,
            owner_id,
            nx=True,
            px=timeout_ms
        )

        if result:
            return True  # Lock acquired

        time.sleep(0.05)  # Back off 50 ms before retrying

    return False  # Failed to acquire lock

def redlock_release(resource, owner_id):
    """
    Release lock only if owner matches (prevents other owners releasing your lock).
    Uses Lua script for atomic check-then-delete.
    """
    lock_key = f"lock:{resource}"
    script = """
    if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
    else
        return 0
    end
    """
    redis.eval(script, 1, lock_key, owner_id)

Pattern 3: Rate Limiting (Fixed-Window Counter)

def rate_limit_check(user_id, limit=100, window_sec=60):
    """
    Token bucket rate limiter using Redis INCR + EXPIRE.
    - INCR: atomic increment (no race conditions)
    - EXPIRE: reset counter after window_sec
    Returns: (is_allowed, remaining_quota)
    """
    key = f"rate:{user_id}:{int(time.time() // window_sec)}"

    # Atomically increment counter
    count = redis.incr(key)

    # Set expiration on first increment
    if count == 1:
        redis.expire(key, window_sec)

    remaining = max(0, limit - count)
    is_allowed = count <= limit

    return is_allowed, remaining

Pattern 4: Leaderboard with Sorted Sets

def leaderboard_update(game_id, user_id, score_delta):
    """
    Real-time leaderboard using Redis sorted set.
    ZADD: add or update member's score
    ZINCRBY: increment score atomically
    """
    key = f"leaderboard:{game_id}"

    # Increment user's score
    new_score = redis.zincrby(key, score_delta, user_id)

    return new_score

def leaderboard_get_top_n(game_id, n=100):
    """
    Fetch top N players with scores.
    ZRANGE with WITHSCORES returns (player, score) pairs in rank order.
    """
    key = f"leaderboard:{game_id}"

    # desc=True sorts descending (highest score first) in redis-py
    top_players = redis.zrange(
        key,
        0,
        n - 1,
        withscores=True,
        desc=True
    )

    return [(player, int(score)) for player, score in top_players]

def leaderboard_rank(game_id, user_id):
    """
    Get a player's rank (0-indexed position).
    ZREVRANK: rank in descending order (rank 0 = highest score).
    """
    key = f"leaderboard:{game_id}"
    rank = redis.zrevrank(key, user_id)

    return rank + 1 if rank is not None else None  # 1-indexed rank

Pattern 5: Pub/Sub for Real-Time Notifications

def subscriber_thread(channel):
    """
    Subscribe to a Redis pub/sub channel and block waiting for messages.
    Useful for real-time notifications, live updates.
    """
    pubsub = redis.pubsub()
    pubsub.subscribe(channel)

    for message in pubsub.listen():
        if message['type'] == 'message':
            print(f"Received: {message['data']}")

def publisher_send(channel, message):
    """
    Publish a message to all subscribers on a channel.
    Non-blocking for publisher; subscribers receive immediately.
    """
    redis.publish(channel, json.dumps(message))

# Example usage
# Subscriber: subscriber_thread("notifications:user:123")
# Publisher: publisher_send("notifications:user:123", {"event": "new_order", "id": 456})

Persistence Models

| Model | Mechanism | Data Loss | Recovery Time | File Size | CPU |
| --- | --- | --- | --- | --- | --- |
| None | Pure cache (all in-memory) | All on crash | N/A | N/A | Minimal |
| RDB | Snapshot: fork + write entire dataset | Since last snapshot (can be hours) | Minutes (load file) | Small (compressed) | Medium |
| AOF | Append-only log of all writes | Since last fsync (default 1/sec) | Seconds (replay log) | Large (uncompressed) | High |
| RDB + AOF | Both mechanisms | < 1 second (with appendfsync everysec) | Seconds (RDB base + AOF tail) | Medium | High |

Production recommendation: RDB + AOF with fsync=everysec (compromise between durability and performance). RDB ensures fast recovery from crashes; AOF provides durability between snapshots.
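A minimal redis.conf sketch of that recommendation (the directive names are standard; the exact snapshot save points are illustrative):

```conf
# RDB snapshots: after 900s if >=1 change, 300s if >=10, 60s if >=10000
save 900 1
save 300 10
save 60 10000

# AOF with a once-per-second fsync: at most ~1s of writes lost on crash
appendonly yes
appendfsync everysec

# Rewrite the AOF with an RDB preamble for faster recovery (Redis 4+)
aof-use-rdb-preamble yes
```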

Redis Cluster — Horizontal Scaling

Redis Cluster partitions data across nodes (minimum three primaries; six nodes once each has a replica) using 16,384 fixed "hash slots" rather than consistent hashing. Each key is mapped to a slot: slot = CRC16(key) % 16384. The cluster distributes slots across nodes (e.g., 3 primaries handle ~5,461 slots each). When a client sends a request to the wrong node, it receives a MOVED redirect telling it which node owns the slot.

Replication within cluster: Each node can have 1+ replicas. A primary handles writes; replicas serve reads and provide failover. If a primary crashes, the cluster promotes a replica to primary (automatic, no manual intervention).

Key constraint: Multi-key operations across different hash slots are not allowed (e.g., MGET on keys that hash to different slots). Workaround: use hash tags. Written as {user:123}:profile and {user:123}:settings, both keys are hashed on only the tag "user:123" and therefore land in the same slot.
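Slot assignment is easy to reproduce client-side. A minimal sketch of the slot function from the cluster spec (CRC16-XMODEM mod 16384, honoring the first {...} hash tag):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for keys."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    """Map a key to one of 16,384 slots.

    If the key contains a non-empty {tag}, only the tag is hashed,
    so keys sharing a tag always land on the same node.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # ignore empty "{}" tags
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

The sketch makes the workaround concrete: key_hash_slot("{user:123}:profile") and key_hash_slot("{user:123}:settings") hash only "user:123" and return the same slot.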

Performance Characteristics

  • Throughput: 1-2M ops/sec per node (strings); 100K-500K ops/sec (sorted sets with large cardinality)
  • Latency: 0.1-1ms (local), 1-5ms (network round-trip)
  • Memory: on the order of 100 bytes of overhead per key on top of the stored value; hash fields are cheaper, adding tens of bytes each with compact encodings
  • Connection limit: 10K-100K concurrent connections per node (configurable)

This post is licensed under CC BY 4.0 by the author.