System Design & Infrastructure — Reading Order

A structured path through 19 posts, from algorithmic building blocks through distributed systems to production reliability. Built for backend engineers, platform engineers, and anyone preparing for system design interviews.

A structured path through my system design posts. The progression moves from fundamental data structures and algorithms through distributed systems primitives to reliability patterns and production operations. Each layer builds on the one before it.

1. Algorithmic Building Blocks

The primitives that show up everywhere in system design — caching layers, indexing, query optimization, streaming analytics. You don’t need to implement these from scratch, but you need to know when and why to reach for them.

  1. Sorting Algorithms — O(n log n) vs O(n+k) vs external merge sort — choosing by data shape
  2. Hashing Algorithms — integrity, identity, distribution, authentication
  3. Bloom Filters — probabilistic membership testing at scale
  4. Binary Search & Variations — O(log n) on any sorted or monotonic space
  5. Dynamic Programming Patterns — resource allocation, query optimization, capacity planning
  6. Graph Algorithms — BFS, Dijkstra, topological sort — modeling relationships and dependencies
  7. Sliding Window & Two Pointer — streaming analytics, rate limiting, network protocols
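To make "binary search on a monotonic space" from item 4 concrete, here is a minimal sketch. It is an illustration only, not code from the linked post: the helper `first_true` and the capacity question in `min_servers` are hypothetical names, and the sketch assumes the predicate flips from False to True exactly once over the range.

```python
from typing import Callable


def first_true(lo: int, hi: int, pred: Callable[[int], bool]) -> int:
    """Return the smallest x in [lo, hi] where pred(x) is True.

    Assumes pred is monotonic (False ... False, True ... True).
    Returns hi + 1 if pred is False over the whole range.
    """
    while lo <= hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid - 1   # mid works; look for something smaller
        else:
            lo = mid + 1   # mid fails; the answer is to the right
    return lo


def min_servers(requests_per_sec: int, per_server_capacity: int) -> int:
    """Hypothetical capacity question: fewest servers that cover the load."""
    return first_true(
        1,
        requests_per_sec,
        lambda n: n * per_server_capacity >= requests_per_sec,
    )
```

Nothing here is sorted in the array sense. The search works because "n servers are enough" stays true once it becomes true, which is the property that lets the same loop answer capacity and threshold questions in O(log n) checks.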

2. Distributed Systems Primitives

How data moves, how nodes agree, how state gets distributed. These are the concepts that separate single-machine thinking from distributed-systems thinking.

  1. Consistent Hashing — distributing data so adding/removing nodes doesn’t reshuffle everything
  2. Consensus — Paxos & Raft — how distributed nodes agree despite failures
  3. Kafka Deep Dive — partitioned log architecture for messaging at scale
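The consistent hashing claim in item 1 (adding a node should not reshuffle everything) can be checked with a small sketch. This is a simplified ring under stated assumptions, not the implementation from the linked post: `HashRing`, the `vnodes` parameter, and the use of MD5 as the placement hash are all choices made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only for placement on the ring, not for security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point not in self._owner:   # skip the unlikely collision
                bisect.insort(self._points, point)
                self._owner[point] = node

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key, wrapping around.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

A quick way to see the property: record the owner of a batch of keys, add one node, and compare. Every key that changes owner should land on the new node, and the rest stay where they were.

```python
ring = HashRing(["a", "b", "c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(len(moved), "of", len(keys), "keys moved, all to node d")
```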

3. Reliability & Traffic Patterns

The patterns that keep systems alive under real-world conditions — traffic spikes, cascading failures, downstream outages.

  1. Load Balancing Algorithms — least connections, consistent hashing, Maglev
  2. Rate Limiting Algorithms — token bucket, sliding window — protecting services from overload
  3. Circuit Breaker & Bulkhead — failing fast and isolating blast radius
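As a preview of the token bucket mentioned in item 2, here is a minimal single-process sketch. It is an assumption-laden illustration rather than the version in the linked post: the class name `TokenBucket`, its parameters, and the injectable `clock` are hypothetical, and it leaves out locking and any distributed state.

```python
import time
from typing import Callable


class TokenBucket:
    """Toy token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so behavior can be tested
        self._tokens = capacity      # start full, which permits an initial burst
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False                 # caller should reject or queue the request
```

The two parameters separate concerns: `capacity` bounds how large a burst gets through, while `rate` bounds sustained throughput. Passing a fake clock keeps the behavior deterministic in tests.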

4. System Design in Practice

Putting it all together — frameworks for reasoning about large-scale systems.

  1. System Design Framework — the 4-step method: requirements, capacity, design, deep dive

5. Production Operations

Designing systems is half the job. Running them is the other half. These posts bridge architecture and operations.

  1. On-Call & Incident Management — designing on-call as a system, not a people problem
  2. Engineering Excellence & Quality — systems and culture where quality is the default
  3. Developer Experience & Productivity — the multiplier across every engineer
  4. Platform Engineering & Self-Service — internal platforms as products
  5. Cloud Cost Optimization — FinOps discipline for engineering leaders

Where to Go Next

If you’re building AI systems on top of this infrastructure, continue to the AI & Agents Roadmap. If you’re leading the teams that build and operate these systems, continue to the Engineering Leadership Roadmap.

This post is licensed under CC BY 4.0 by the author.