
The Core Decision Framework

The “right” database depends on three dimensions:

  1. Data shape — structured (schema-enforced) vs flexible (schema-less)
  2. Access pattern — how data is read/written (random vs sequential, point vs range, graph traversal)
  3. Scale — throughput (requests/sec), latency, and data volume

Data Shape

| Shape | Definition | Examples |
| --- | --- | --- |
| Structured | Fixed schema, enforced at write | Financial transactions, user accounts, orders |
| Semi-structured | JSON documents, flexible fields | User profiles, logs, events, configs |
| Unstructured | Binary files, no schema | Images, videos, PDFs, text documents |
| Graph | Nodes + edges, relationships matter | Social networks, fraud rings, recommendation graphs |
| Time-series | Timestamp-ordered, high volume | Metrics, events, IoT sensor readings |

Access Pattern Analysis

Point Lookups: “Get user with ID=123”

  • SQL: SELECT * FROM users WHERE id = 123
  • NoSQL: db.get("user:123")
  • Optimal: Hash index (O(1)) or B+ tree (O(log n))

Range Queries: “Get all users aged 25–35”

  • SQL: SELECT * FROM users WHERE age BETWEEN 25 AND 35
  • NoSQL: Often not native; typically needs a sort key (e.g., a DynamoDB range key) or application-side filtering
  • Optimal: B+ tree (O(log n) + range size)

Aggregations: “Sum of all purchases by country”

  • SQL: SELECT country, SUM(amount) FROM purchases GROUP BY country
  • NoSQL: Usually requires full table scan + application aggregation
  • Optimal: Columnar DB (OLAP) for billion-row tables
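
These three relational patterns can be tried end-to-end with SQLite standing in for any relational database (table names and data here are illustrative):

```python
import sqlite3

# In-memory database standing in for any relational store
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER)")
conn.execute("CREATE INDEX idx_age ON users (age)")  # B+ tree index on age
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, 22), (2, 30), (3, 34), (4, 41)])

conn.execute("CREATE TABLE purchases (country TEXT, amount REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)",
                 [("US", 10.0), ("US", 5.0), ("DE", 7.5)])

# Point lookup: served by the primary-key index
row = conn.execute("SELECT * FROM users WHERE id = 3").fetchone()

# Range query: served by the B+ tree index on age
in_range = conn.execute(
    "SELECT id FROM users WHERE age BETWEEN 25 AND 35 ORDER BY id").fetchall()

# Aggregation: GROUP BY buckets rows and sums per bucket
totals = dict(conn.execute(
    "SELECT country, SUM(amount) FROM purchases GROUP BY country"))
```

The same schema scales the same way conceptually: the index choice, not the database brand, determines which of these queries is cheap.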

Traversals: “Find all friends of friends of friends”

  • SQL: 3+ JOINs, exponential complexity
  • Graph DB: Linear in number of edges
  • Optimal: Graph DB (O(edges)) vs SQL (O(n³))
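
A graph store answers this with a breadth-first walk over adjacency lists, visiting each edge at most once — which is where the "linear in edges" claim comes from. A minimal sketch in plain Python (the adjacency dict stands in for a real graph database):

```python
from collections import deque

# Adjacency list: each node maps to its direct friends
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}

def friends_within(graph, start, depth):
    """Everyone reachable from `start` in at most `depth` hops, excluding start."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = set()
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # don't expand past the requested depth
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                frontier.append((neighbor, d + 1))
    return found
```

In SQL, each extra hop is another self-JOIN on the friendships table; here it is just one more layer of the same queue.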

Text Search: “Find all articles containing ‘machine learning’”

  • SQL: SELECT * FROM articles WHERE content LIKE '%machine learning%' (slow full table scan)
  • Search Engine: Inverted index, instant
  • Optimal: Elasticsearch (O(k) where k = matching docs)
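
The inverted index behind that speed is simple to sketch: map each term to the set of documents containing it, then intersect the postings for a multi-word query. (Whitespace tokenization is a deliberate simplification; real engines add stemming, relevance scoring, and positional data.)

```python
from collections import defaultdict

docs = {
    1: "machine learning improves search ranking",
    2: "databases index data for fast lookups",
    3: "deep learning is a branch of machine learning",
}

# Build the inverted index: term -> set of doc IDs containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(index, query):
    """AND-query: docs containing every term; cost scales with the
    matching postings, not with the total corpus size."""
    postings = [index.get(term, set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()
```

`search(index, "machine learning")` touches two small postings sets instead of scanning every document's text, which is exactly what `LIKE '%…%'` cannot avoid.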

Similarity Search: “Find similar documents to query embedding”

  • SQL: Compute distance to all N documents, sort (O(N log N))
  • Vector DB: HNSW indexing, k-nearest neighbors (O(log N))
  • Optimal: Vector DB for embeddings
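
The brute-force baseline a vector DB replaces looks like this: score every vector, sort, take the top k. An HNSW index returns approximately the same answer without touching all N vectors (toy 3-dimensional embeddings for illustration; real ones have hundreds of dimensions):

```python
import math

vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def knn(vectors, query, k):
    """Exact k-NN: score all N vectors, sort descending. This is the
    O(N) scan that an HNSW graph index approximates in ~O(log N)."""
    scored = sorted(vectors, key=lambda name: cosine(vectors[name], query),
                    reverse=True)
    return scored[:k]
```

At a few thousand vectors the scan is fine; at hundreds of millions it is not, which is the entire reason the vector DB category exists.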

Database Categories & Trade-offs

1. Relational (SQL)

Best for: Structured data + complex queries + transactions

| Property | Strength |
| --- | --- |
| ACID transactions | ✅ Full ACID (Atomicity, Consistency, Isolation, Durability) |
| Schema | Enforced at write (prevents garbage data) |
| Joins | Native, optimized by the query planner |
| Scaling | Vertical (add RAM/CPU) plus read replicas (read-only copies); sharding requires consistent hashing |
| Consistency | Strong (read-your-own-writes) |
| Query flexibility | Arbitrary queries across tables |

Examples:

  • PostgreSQL — Feature-rich, JSON support, full-text search extension
  • MySQL — Web workloads, stable, wide tooling (WordPress, Laravel)
  • MariaDB — MySQL fork, open-source, drop-in replacement
  • Aurora (AWS) — Managed PostgreSQL/MySQL with auto-scaling read replicas

When to use:

  • Complex business logic (invoicing, reservations, financial transfers)
  • Multiple tables with cross-table queries
  • Data integrity is critical (foreign keys, constraints)
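
The transactional guarantee is the whole point: a transfer either fully commits or fully rolls back. A runnable sketch with SQLite, using a CHECK constraint to stand in for a business rule (no negative balances):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except sqlite3.IntegrityError:  # CHECK violation: insufficient funds
        return False
```

If the debit would go negative, the CHECK constraint fires, the transaction rolls back, and neither account changes — the property that is painful to reproduce by hand in an eventually consistent store.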

When NOT to use:

  • Need horizontal scaling (sharding with consistent hashing is complex)
  • Write-heavy workload (> 50% writes) — use LSM-based DB
  • Unstructured data (JSON documents are fine; but binary blobs are expensive)

2. Document (NoSQL)

Best for: Flexible schema + hierarchical data + horizontal scaling

| Property | Benefit |
| --- | --- |
| Schema-less | No schema enforcement (add fields freely) |
| JSON storage | Native nested objects (no multi-table JOINs needed) |
| Horizontal scaling | Sharding built-in (data partitioned by ID) |
| Eventual consistency | Faster writes (trade immediate consistency) |

Examples:

  • MongoDB — Document-oriented, aggregation pipeline, wide adoption
  • Firestore (Google Cloud) — Managed Firebase DB, real-time sync
  • CouchDB — Multi-master replication, offline-first
  • RethinkDB — Real-time feeds, changefeeds

When to use:

  • Flexible schema (fields vary per document)
  • Hierarchical data (user + profile + preferences in one doc)
  • Horizontal scaling needed (millions of users)
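
The "hierarchical data in one doc" point can be shown with plain dicts standing in for a document collection: one nested document replaces a users + profiles + preferences three-table join, and documents with different fields coexist in the same collection.

```python
import json

# One self-contained document instead of a three-table join
user_doc = {
    "_id": "user:123",
    "name": "Ada",
    "profile": {"bio": "Engineer", "location": "London"},
    "preferences": {"theme": "dark", "notifications": True},
}

# A sibling document with entirely different fields — no migration required
other_doc = {"_id": "user:456", "name": "Lin", "beta_features": ["new_feed"]}

collection = {doc["_id"]: doc for doc in (user_doc, other_doc)}

# A point lookup returns the whole hierarchy in one read, no joins
fetched = collection["user:123"]
theme = fetched["preferences"]["theme"]

# Round-trips cleanly as JSON, which is how document stores persist it
assert json.loads(json.dumps(fetched)) == fetched
```

The trade-off appears the moment you need to update `user:123` and `user:456` together atomically — which is why the "When NOT to use" list below starts with multi-document transactions.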

When NOT to use:

  • Complex multi-document transactions needed (use relational DB)
  • Heavy aggregations across entire collection (use OLAP/data warehouse)

3. Key-Value (In-Memory / Cache)

Best for: Simple lookups + extreme speed + caching

| Property | Benefit |
| --- | --- |
| O(1) lookup | Microsecond latency |
| In-memory | RAM speed, not disk |
| Horizontal scaling | Consistent hashing, sharding |

Examples:

  • Redis — In-memory data structures, caching, sessions, leaderboards
  • Memcached — Simple key-value cache
  • DynamoDB (AWS) — Managed key-value, auto-scaling

When to use:

  • Caching hot data (user sessions, frequently accessed records)
  • Counters and real-time aggregations
  • Rate limiting, leaderboards, distributed locks

When NOT to use:

  • Primary data store for critical data (use with replication)
  • Complex queries needed (use relational DB)
  • Data larger than available RAM

4. Wide-Column (NoSQL)

Best for: Write-heavy time-series + massive scale

| Property | Benefit |
| --- | --- |
| Column families | Group related data (e.g., user metrics, user metadata) |
| Sorted by row key | Range queries on row key (e.g., timestamp range) |
| Horizontal scaling | Built-in sharding, consistent hashing |

Examples:

  • Cassandra — Facebook-origin, 1M+ writes/sec per cluster, eventual consistency
  • HBase — Hadoop ecosystem, serious operations overhead
  • Bigtable (Google) — Proprietary, powers Google Analytics, Search

When to use:

  • Time-series data (metrics, logs, events)
  • Write-heavy workload (> 100K writes/sec)
  • Can tolerate eventual consistency
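
The partition-key / clustering-key layout can be sketched with a dict of sorted lists: each partition holds rows ordered by timestamp, so a time-range query touches one partition and binary-searches it. (Names like `sensor_id` are illustrative; in Cassandra this would be `PRIMARY KEY (sensor_id, timestamp)`.)

```python
import bisect
from collections import defaultdict

# Partition key -> rows kept sorted by clustering key (timestamp),
# mirroring how a wide-column store lays out a partition on disk
table = defaultdict(list)  # sensor_id -> [(timestamp, value), ...]

def insert(table, sensor_id, timestamp, value):
    # Keep each partition sorted on insert, like an LSM memtable
    bisect.insort(table[sensor_id], (timestamp, value))

def range_query(table, sensor_id, t_start, t_end):
    """All readings for one sensor in [t_start, t_end): touches a single
    partition and binary-searches the sorted clustering key."""
    rows = table[sensor_id]
    lo = bisect.bisect_left(rows, (t_start,))
    hi = bisect.bisect_left(rows, (t_end,))
    return rows[lo:hi]
```

Note what is cheap and what is not: any query that starts from the partition key is fast; a query across all sensors for one timestamp would have to visit every partition.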

When NOT to use:

  • Need strong consistency (use relational DB)
  • Complex queries (use relational or OLAP)

5. Search Engine

Best for: Full-text search + relevance ranking

| Property | Benefit |
| --- | --- |
| Inverted index | Instant full-text search |
| Relevance scoring | TF-IDF, BM25 scoring |
| Faceted search | Filter by category, tag, etc. |

Examples:

  • Elasticsearch — Distributed Lucene, REST API, rich aggregations
  • OpenSearch — AWS fork of Elasticsearch
  • Solr — Mature Lucene wrapper
  • Meilisearch — Developer-friendly, typo-tolerant search

When to use:

  • Full-text search (users type keywords)
  • Relevance ranking needed
  • Fast filtering + sorting (e.g., e-commerce search)

When NOT to use:

  • Structured queries only (use relational DB)
  • Exact-match lookups only (use key-value cache)

6. Graph

Best for: Relationship traversal + complex connections

| Property | Benefit |
| --- | --- |
| Adjacency lists | Sub-millisecond traversal |
| Path queries | “Find all friends of friends” |
| Pattern matching | Cypher language for graph queries |

Examples:

  • Neo4j — Leading graph DB, Cypher language, ACID transactions
  • Amazon Neptune — Managed graph, Gremlin/SPARQL APIs
  • TigerGraph — Large-scale graph analytics

When to use:

  • Social networks (friend connections, recommendations)
  • Fraud detection (find connected accounts)
  • Knowledge graphs (entity relationships)

When NOT to use:

  • No relationship queries (use relational DB)
  • Graph too large for memory (distributed graphs have latency overhead)

7. Time-Series

Best for: Metrics + IoT + append-only sequential data

| Property | Benefit |
| --- | --- |
| Timestamp ordering | Data is naturally sorted |
| Compression | Delta encoding, dictionary → 40–50× compression |
| Auto-TTL | Automatic expiration of old data |
| High ingest | 1M+ metrics/sec per node |
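
The delta encoding mentioned above is easy to demonstrate: regularly spaced timestamps collapse into a stream of small, repetitive deltas that downstream compressors handle far better than raw epoch values. (A simplified sketch; production engines layer further tricks such as delta-of-delta and XOR encoding on top.)

```python
def delta_encode(timestamps):
    """Store the first value plus successive differences; regular
    intervals collapse into tiny, highly compressible deltas."""
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Invert delta_encode by running a cumulative sum."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out
```

Four 10-digit epoch timestamps sampled every 10 seconds become one anchor value plus three copies of `10` — which is why time-series engines report such aggressive compression ratios on metrics data.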

Examples:

  • InfluxDB — Purpose-built time-series, Flux query language
  • Prometheus — Pull-based metrics scraping, local TSDB (15-day default retention)
  • TimescaleDB — PostgreSQL extension, hypertable partitioning
  • ClickHouse — Columnar, 100M+ events/sec

When to use:

  • Metrics (CPU, memory, network)
  • Events (user actions, system logs)
  • High-volume sequential data

When NOT to use:

  • Complex multi-dimensional queries (use OLAP)
  • Primary relational data (use SQL)

8. Vector

Best for: Semantic search + embeddings + similarity

| Property | Benefit |
| --- | --- |
| HNSW indexing | Fast nearest-neighbor search |
| Embedding storage | Store LLM embeddings natively |
| Semantic search | “Find documents similar to query” |

Examples:

  • Pinecone — Fully managed vector DB
  • pgvector (PostgreSQL extension) — Vector support in Postgres
  • Weaviate — Open-source vector DB with GraphQL

When to use:

  • LLM-powered search (embeddings from OpenAI, etc.)
  • Semantic similarity (not keyword matching)
  • Approximate nearest neighbors (k-NN)

When NOT to use:

  • Structured queries only (use relational DB)
  • Data is low-dimensional (< 10 dimensions — a B+ tree or k-d tree index suffices)

9. Data Warehouse (OLAP)

Best for: Analytics + aggregations + historical data

| Property | Benefit |
| --- | --- |
| Columnar storage | 50–100× compression, column-only scans |
| Massive parallelism | Query billion-row tables in seconds |
| Complex aggregations | GROUP BY, WINDOW functions |
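
The columnar advantage in one picture: store each column as its own array, and an aggregation reads only the columns it names. (A toy sketch; real warehouses add per-column compression and vectorized execution on top of this layout.)

```python
from collections import defaultdict

# Row layout: SUM(amount) drags every column through memory
rows = [
    {"country": "US", "amount": 10.0, "product": "book"},
    {"country": "US", "amount": 5.0, "product": "pen"},
    {"country": "DE", "amount": 7.5, "product": "mug"},
]

# Columnar layout: each column is a contiguous, independently stored array
columns = {
    "country": ["US", "US", "DE"],
    "amount": [10.0, 5.0, 7.5],
    "product": ["book", "pen", "mug"],
}

# SUM(amount): scan one array, skip every other column entirely
total = sum(columns["amount"])

# GROUP BY country: still touches only the two columns involved
totals = defaultdict(float)
for country, amount in zip(columns["country"], columns["amount"]):
    totals[country] += amount
```

With a hundred columns and a billion rows, scanning one array instead of one hundred is where the "seconds, not hours" query times come from.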

Examples:

  • BigQuery (Google) — Fully managed, petabyte scale
  • Redshift (AWS) — PostgreSQL-based, columnar, massive scale
  • Snowflake — Cloud-native, sharing, automatic scaling
  • ClickHouse — Fast columnar, open-source

When to use:

  • Analytics on historical data (last 1–5 years)
  • Aggregations across millions/billions of rows
  • Ad-hoc reporting and exploration

When NOT to use:

  • Real-time transactional updates (use OLTP)
  • Complex multi-table transactions (use relational DB)

10. NewSQL

Best for: SQL + horizontal scaling + strong consistency

| Property | Benefit |
| --- | --- |
| SQL interface | Familiar SQL syntax |
| Horizontal scaling | Partition data across nodes |
| Strong consistency | ACID across shards (distributed transactions) |
| High availability | Automatic failover, replicas |

Examples:

  • Google Spanner — Globally distributed, ACID, TrueTime API for globally consistent timestamps
  • CockroachDB — PostgreSQL-compatible, auto-sharding
  • TiDB — MySQL-compatible, distributed ACID

When to use:

  • Need both SQL + horizontal scaling
  • Global distribution (data centers across continents)
  • Strong consistency is non-negotiable

When NOT to use:

  • Single-datacenter (overhead not worth it)
  • Throughput < 10K req/sec (PostgreSQL replicas cheaper)

Interview Decision Template

When asked “what database would you choose?”, use this structure:

Step 1: Identify Data Shape

Question: “Is the data structured (fixed schema) or flexible (JSON)?”

Answer: “The primary data is [structured/semi-structured/time-series/graph] because [reason].”

Example: “The primary data is structured because transactions must enforce referential integrity (orders must reference valid users, products, etc.), so I’d use a relational database.”

Step 2: Identify Access Patterns

Question: “What are the 3 most common queries?”

Answer: List top 3 access patterns + their complexity

Example:

  • Pattern 1: “Get order details by order ID” — point lookup, O(1) hash index
  • Pattern 2: “List orders placed between date X and Y, filtered by customer” — range + equality, O(log n) B+ tree
  • Pattern 3: “Sum order totals by region” — aggregation, O(n) full table scan

Step 3: Identify Scale

Question: “What’s your throughput and data volume?”

Answer: State throughput (req/sec), latency target, and data size

Example: “1M users, 100K purchases/day (1 req/sec average, 50 req/sec peak), <100ms latency requirement. Data: 100GB after 5 years (manageable).”

Step 4: Identify Consistency Requirements

Question: “Can you tolerate eventual consistency or do you need strong consistency?”

Answer: State consistency model + reasoning

Example: “We need strong consistency because users must see their updated profile immediately after editing. We cannot tolerate a 5-minute eventual consistency delay.”

Step 5: Name 2 Options & Trade-offs

Answer: Choose 2 databases, compare pros/cons

Example:

Option 1: PostgreSQL (relational)

  • ✅ ACID transactions (prevent duplicate orders)
  • ✅ JOIN queries (fetch order + user + product data in one query)
  • ✅ Constraints (foreign keys ensure data integrity)
  • ❌ Scaling limited to read replicas (sharding is manual, complex)
  • ❌ For 10M orders, single primary becomes bottleneck

Option 2: MongoDB (document)

  • ✅ Horizontal scaling (shard by customer_id, distribute 10M orders across nodes)
  • ✅ Flexible schema (add fields per order without migration)
  • ❌ No multi-document ACID (if updating order + inventory in one transaction, must handle rollback manually)
  • ❌ JOINs are application-level (fetch order, then fetch user separately)

Final recommendation: “Given the consistency requirement, I’d choose PostgreSQL with read replicas for reads (checkout, order history) and writes going to primary. If order volume exceeds 10M and becomes a bottleneck, I’d shard by customer_id and run multiple PostgreSQL primaries, each owning a customer range.”


Worked Example: E-Commerce Platform

Requirements

  • 1M users
  • 10M orders (5 years history)
  • 50 req/sec average, 500 req/sec peak
  • Read-heavy (80% reads: view orders, 20% writes: checkout)
  • Cross-user analytics: “top 100 products by revenue”

Analysis

Data shape: Structured (orders, products, users have fixed schemas)

Access patterns:

  1. “Get my orders” — range + equality, O(log n)
  2. “Get product details” — point lookup, O(1)
  3. “Top 100 products by revenue” — aggregation, O(10M) full scan

Scale: 10M orders, 500 req/sec peak

Consistency: Strong (orders must be accurate)

Decision

OLTP (transactional): PostgreSQL

  • Primary handles writes from checkout (~10/sec average, ~100/sec at the 500 req/sec peak)
  • Read replicas handle 400 reads/sec (view orders, product details)
  • Queries 1 & 2 use B+ tree indexes on (customer_id, date) and (product_id)

Analytics (query 3): Separate ClickHouse or Redshift cluster

  • Mirror orders table from PostgreSQL → ClickHouse nightly
  • “Top 100 products by revenue” query runs in 100ms on billion-row table
  • Reporting dashboard queries the warehouse, not production DB

Caching layer: Redis

  • Cache hot products (top 100) — 1-hour TTL
  • Cache user sessions
  • Handle rate limiting per user

Architecture summary:

```
Users
  ↓
 [Application]
  ├→ PostgreSQL (Orders, Products, Users)
  │  ├→ Primary (writes)
  │  └→ Read Replicas (reads)
  │
  ├→ Redis (cache, sessions)
  │
  └→ ClickHouse (analytics, offline)
```

Quick Reference: Common Scenarios

| Scenario | Best DB | Why |
| --- | --- | --- |
| User authentication | Redis (cache) + Postgres (authoritative) | Sessions in Redis (fast), users in Postgres (reliable) |
| Shopping cart | Redis (session) + Postgres (orders) | Cart is short-lived (session), orders are permanent |
| Real-time metrics | InfluxDB or TimescaleDB | Time-series optimized, auto-TTL, high compression |
| Recommendations | Graph (Neo4j) or Vector DB | Traversals (graph) or semantic (vector) |
| Product catalog | Elasticsearch + Postgres | Search (ES), updates (Postgres), denormalize catalog to ES |
| Analytics dashboard | BigQuery or Redshift | OLAP, columnar, petabyte scale |
| Logs & events | ClickHouse or ELK stack | High-volume sequential, aggregations |
| Chat messages | MongoDB or Firestore | Flexible schema, easy horizontal scaling |
| Financial transactions | Postgres (only) | ACID non-negotiable, no eventual consistency |

How Real Systems Use This — Polyglot Persistence at Scale

Airbnb — MySQL + Cassandra + Elasticsearch + Redis

Airbnb’s database architecture is a textbook example of polyglot persistence, designed to handle 1M+ listings, billions of searches, and complex reservations.

The Problem: Airbnb must support fast reservation lookups (strong consistency), full-text search over listings (relevance ranking), time-series metrics (availability calendars), and high-throughput data pipeline ingestion.

The Solution:

  • MySQL (Amazon RDS): Core transactional database for reservations, bookings, user accounts, and listing metadata. Strong ACID guarantees ensure a confirmed booking can’t be double-booked. Primary-replica setup handles read replicas for scaling; writes go to primary.
  • Cassandra: Distributed storage for user search history, availability calendars (time-series data per listing), and inbox messages. Write-heavy workload (millions of calendar updates/day), so Cassandra’s LSM write efficiency is ideal. Partition by listing_id; availability data is naturally sorted by date.
  • Elasticsearch: Full-text search over listing descriptions, review text, and location data. Inverted index powers faceted search (“private rooms, pool, patio, near downtown”). Relevance scoring based on review count, booking frequency, and host rating.
  • Redis: Session storage, rate limiting, real-time availability cache (hot listings), and leaderboards (trending listings by booking count).

Why This Mix: MySQL guarantees booking accuracy (critical), Cassandra absorbs massive write volume (calendars, messages), Elasticsearch powers user experience (search), Redis handles throughput (caching, sessions). No single database could serve all four requirements equally well.

Concrete numbers: Airbnb ingests ~100K bookings/day (consistent load on MySQL), ~1M calendar updates/day (perfect for Cassandra), ~1B search queries/month (ES cluster with 100+ nodes), and 10M+ concurrent sessions (Redis with cluster replication).

Lessons:

  1. Specialized tools beat generalists. Elasticsearch search is orders of magnitude faster than LIKE '%pattern%' in MySQL.
  2. Consistency tiers matter. Bookings need strong consistency; calendar availability can be eventually consistent.
  3. Hot path optimization. Redis caches hot listings; most users search the same popular destinations repeatedly.

Uber — MySQL → Cassandra → Schemaless KV Evolution

Uber’s database journey shows how architecture evolves as scale changes. They moved from MySQL (single database) to Cassandra (distributed writes) to a custom schemaless KV store (maximum flexibility and control).

Phase 1: MySQL (Early Stage) When Uber was smaller (2011–2013), a single MySQL instance held all data: rides, users, locations, payments. Queries were straightforward: “Get all active rides in a city,” “Fetch user by ID.” Replication handled reads. Sharding by city_id allowed basic horizontal scaling.

Problem: At 10K rides/day and growing, the MySQL primary became a bottleneck: write volume to the rides table exceeded what a single primary could handle, and resharding was manual and painful.

Phase 2: Cassandra (Growth Phase) As Uber scaled (2013–2016), they migrated ride data to Cassandra. Cassandra’s distributed architecture and LSM writes were perfect for time-series data (ride history is append-mostly). Partition by rider_id + date; querying “All rides by user in Q3” scans one partition per node instead of running a full table scan.

Cassandra’s replication factor 3 gave them fault tolerance: losing one node didn’t impact availability. Write latency stayed low even at 100K rides/day (peak RPS > 1K writes/sec).

Problem: Cassandra’s fixed schema and strict column family structure forced schema migrations to be coordinated. Adding a new field to ride data required compaction and coordination across all nodes. Operational complexity grew with cluster size.

Phase 3: Schemaless KV (Maturity Phase) By 2016+, Uber had built custom infrastructure: Schemaless, an append-only key-value datastore (publicly documented by Uber Engineering, layered over sharded MySQL) that stores schema-free JSON blobs with Cassandra-style distribution. Data: { ride_id: "abc123", user_id: "user456", origin: {...}, destination: {...}, driver_id: "driver789", status: "completed", metadata: { ... } }.

Advantages:

  • No schema versioning. Add fields without coordination; old clients ignore new fields.
  • Lower operational overhead. Owned and tuned for Uber’s exact access patterns (point lookups by ride_id dominate).
  • Better compression. Schema-free JSON compresses better than fixed-column Cassandra format.
  • Faster, simpler writes. Cells are append-only and immutable once written, which keeps the write path cheap and makes replication straightforward.

Trade-off: Uber lost Cassandra’s battle-tested operational tooling. They now manage the custom KV store’s replication, failover, and disaster recovery themselves. But at their scale, the gains (lower latency, fewer operational surprises, full control) outweighed the engineering cost.

Concrete numbers: Uber stores trillions of ride records (compressing to petabytes with their KV store). Write throughput is >100K ops/sec across the cluster. Read latency is <10ms p95 (critical for user app showing “Your driver is 5 min away”).

Lessons:

  1. Evolution is natural. Start with proven databases (MySQL, Cassandra); optimize at scale.
  2. Schema flexibility matters. Schemaless systems let product teams iterate faster (new ride metadata, new driver attributes) without database migrations.
  3. Operational burden increases with customization. Building your own KV store is a last resort, justified only by extreme scale.

This post is licensed under CC BY 4.0 by the author.