System Design Framework

The 4-step method for any system design interview: Requirements, Capacity, Design, Deep Dive.

The 4-Step Flow

flowchart LR
    A["1️⃣ Requirements<br/>5–7 min"] --> B["2️⃣ Capacity<br/>3–5 min"]
    B --> C["3️⃣ High-Level Design<br/>15–20 min"]
    C --> D["4️⃣ Deep Dive<br/>10–15 min"]

Step 1 – Requirements (5-7 min)

Functional Requirements (what the system does)

  • List 3-5 core features only – resist scope creep
  • Clarify: read-heavy or write-heavy? Real-time or async?
  • Ask: mobile/web? Global or regional? API or UI?

Non-Functional Requirements (how well it does it)

| NFR | Typical Target | Questions to Ask |
|-----|----------------|------------------|
| Availability | 99.9% - 99.99% | Tolerate downtime? Active-active? |
| Latency | P99 < 200ms | Which operations are latency-sensitive? |
| Throughput | X req/sec | Peak vs average? |
| Consistency | Strong / Eventual | Can users see stale data? |
| Durability | No data loss | RPO / RTO targets? |
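
An availability target is easier to discuss once it is converted into a downtime budget. The sketch below does that arithmetic for the two targets in the table; the function name and the 365-day year are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% availability allows about {minutes:.1f} min of downtime per year")
```

Under these assumptions, 99.9% leaves roughly 8.8 hours of downtime per year and 99.99% leaves roughly 53 minutes, which is a useful way to frame the "tolerate downtime?" question.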

Step 2 – Capacity Estimation (3-5 min)

flowchart TD
    A["Users/DAU"] --> B["Requests/sec<br/>(read + write)"]
    B --> C["Storage/year"]
    B --> D["Bandwidth<br/>(MB/s)"]
    C --> E["Infrastructure<br/>sketch"]
    D --> E

Key numbers to derive:

  • QPS = DAU x actions per user per day / 86,400 (seconds per day)
  • Peak QPS = avg QPS x 2-10 (spiky traffic)
  • Storage/year = write QPS x record size x seconds/year (about 31.5 million)
  • Bandwidth = read QPS x response size
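
The formulas above can be worked through as a quick back-of-the-envelope script. This is a minimal sketch: the function name, the input numbers (10M DAU, 20 reads and 2 writes per user per day, 1 KB records, 5 KB responses, 3x peak factor) are all made up for illustration and are not from any real system.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365


def estimate_capacity(dau, reads_per_user, writes_per_user,
                      record_bytes, response_bytes, peak_factor):
    """Apply the four capacity formulas and return the derived numbers."""
    read_qps = dau * reads_per_user / SECONDS_PER_DAY
    write_qps = dau * writes_per_user / SECONDS_PER_DAY
    return {
        "read_qps": read_qps,
        "write_qps": write_qps,
        "peak_read_qps": read_qps * peak_factor,
        "storage_bytes_per_year": write_qps * record_bytes * SECONDS_PER_YEAR,
        "bandwidth_bytes_per_sec": read_qps * response_bytes,
    }


# Hypothetical inputs, chosen only to make the arithmetic concrete.
est = estimate_capacity(
    dau=10_000_000, reads_per_user=20, writes_per_user=2,
    record_bytes=1_000, response_bytes=5_000, peak_factor=3,
)
print(f"read QPS:      {est['read_qps']:,.0f}")
print(f"write QPS:     {est['write_qps']:,.0f}")
print(f"peak read QPS: {est['peak_read_qps']:,.0f}")
print(f"storage/year:  {est['storage_bytes_per_year'] / 1e12:.1f} TB")
print(f"bandwidth:     {est['bandwidth_bytes_per_sec'] / 1e6:.1f} MB/s")
```

With these inputs the estimate comes out to roughly 2,300 read QPS, 230 write QPS, about 7.3 TB of new storage per year, and about 11.6 MB/s of read bandwidth. In an interview, rounding to one significant figure is usually enough.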

Step 3 – High-Level Design (15-20 min)

flowchart LR
    Client["Client"] --> CDN["CDN"]
    CDN --> LB["Load Balancer"]
    LB --> API["API Servers<br/>(stateless)"]
    API --> Cache["Cache<br/>(Redis)"]
    API --> MQ["Message Queue<br/>(Kafka)"]
    API --> DB["Primary DB"]
    DB --> Replica["Read Replicas"]
    MQ --> Worker["Background Workers"]

Draw these components in order:

  1. Client -> entry point (CDN, API gateway)
  2. Stateless app servers behind load balancer
  3. Data stores (which DB? why?)
  4. Async components (queues, workers)
  5. Supporting services (cache, search, blob store)
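
To make the request path concrete, here is a toy, in-process sketch of how a stateless API server might use the cache, primary DB, and queue from the diagram. The dictionaries and `queue.Queue` are stand-ins for Redis, a database, and Kafka; every name here is hypothetical, and cache-aside with invalidate-on-write is just one possible strategy.

```python
import queue

db = {}               # stand-in for the primary DB
cache = {}            # stand-in for Redis
jobs = queue.Queue()  # stand-in for a Kafka topic
processed = []        # records the events the worker has handled


def handle_write(key, value):
    """Write path: persist, invalidate the cached copy, enqueue async work."""
    db[key] = value
    cache.pop(key, None)
    jobs.put(("updated", key))


def handle_read(key):
    """Read path (cache-aside): try the cache, fall back to the DB, then fill."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value


def run_worker():
    """Background worker: drain whatever is currently queued."""
    while True:
        try:
            event = jobs.get_nowait()
        except queue.Empty:
            break
        processed.append(event)
        jobs.task_done()


handle_write("post:1", "hello")
print(handle_read("post:1"))  # cache miss, served from the DB, then cached
print(handle_read("post:1"))  # served from the cache
run_worker()
print(processed)
```

The handlers keep no per-request state of their own, which is what lets the load balancer send any request to any API server.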

Step 4 – Deep Dive (10-15 min)

Pick 2-3 areas to go deep – let the interviewer guide:

| Deep Dive Area | Key Questions |
|----------------|---------------|
| Scaling reads | Caching strategy? Cache invalidation? Read replicas? |
| Scaling writes | Sharding? Write-ahead log? CQRS? |
| Fault tolerance | What fails? Circuit breaker? Retry with backoff? |
| Consistency | Which consistency model? Trade-offs? |
| Unique constraint | e.g. short URL uniqueness, ID generation |
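
As one example of a fault-tolerance deep dive, "retry with backoff" could be sketched as below. This is an illustrative helper, not a library API: `retry_with_backoff`, its parameters, and the `flaky_call` demo are all hypothetical, and it uses exponential backoff with full jitter as one common variant.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep):
    """Call operation(); on failure, wait with exponential backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:  # real code should catch only transient errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay cap doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the cap so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, cap))


calls = {"count": 0}


def flaky_call():
    """Hypothetical dependency that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Pass a no-op sleep so the demo finishes instantly.
result = retry_with_backoff(flaky_call, sleep=lambda seconds: None)
print(result, "after", calls["count"], "attempts")
```

Retries only help with transient failures and idempotent operations; for a dependency that stays down, a circuit breaker is the usual next step to discuss.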

Common Mistakes to Avoid

  • Jumping to design before clarifying requirements
  • Over-engineering from the start – start simple, then scale
  • Ignoring failure modes – always ask “what if X fails?”
  • Forgetting to justify your choices – explain the why
  • Designing for perfection from day one – name the trade-offs instead
This post is licensed under CC BY 4.0 by the author.