Time-Series & Specialized Databases
When standard relational or document databases don't fit -- specialized engines optimized for specific data shapes (time-ordered metrics, graph traversals, vector embeddings, columnar analytics). Each trades general-purpose flexibility for domain-specific performance gains (10-100x throughput or latency improvements).
Time-Series Databases
The Time-Series Problem
Time-series data differs fundamentally from transactional data:
- Access pattern: Always sorted by timestamp (time-ordered writes)
- Retention: Automatic expiration (metrics from 90 days ago often discarded)
- Aggregation: Range queries (`SELECT avg(cpu) FROM metrics WHERE timestamp >= now - 1h`)
- Cardinality: Millions of metric series (e.g., metrics from 10K servers, 100 metrics each = 1M series)
Why not PostgreSQL?
- PostgreSQL enforces ACID + complex query logic, adding overhead for append-only workloads
- No native TTL (time-to-live) for automatic data expiration
- Random I/O during compaction interferes with real-time write latency
- Compression is generic (10-15x); time-series compression can achieve 40-50x using delta encoding + dictionary
Time-Series Storage Model: InfluxDB TSM
Time-Structured Merge-tree (TSM) – InfluxDB’s storage engine. Similar to LSM but optimized for time-series:
- Write path: Incoming metrics are buffered in memory; once the buffer reaches ~10MB, it is flushed to an immutable TSM file on disk.
- Compression: Delta encoding (store differences between consecutive timestamps, not absolute values). Example: `[1000, 1001, 1003, 1002]` stores as `[1000, +1, +2, -1]`, then variable-byte encodes to ~2 bytes per point vs 8.
- Compaction: Merges old TSM files, removing deleted points and compressing further. Can achieve 40-50x compression on raw metrics.
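The delta step above can be sketched in a few lines of Python (a simplified illustration, not InfluxDB's actual encoder; the variable-byte packing of the deltas is omitted):

```python
def delta_encode(timestamps):
    """Store the first value absolutely, then differences between neighbors."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Reverse: a cumulative sum reconstructs the original series."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

print(delta_encode([1000, 1001, 1003, 1002]))  # [1000, 1, 2, -1]
```

The small deltas are what make the subsequent variable-byte encoding effective: most differences fit in one or two bytes instead of eight.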
Concrete numbers (InfluxDB Cloud):
- Compression ratio: 40-50x (1B points at 8 bytes raw, ~8GB, compress to roughly 160-200MB)
- Write throughput: 1M metrics/sec per node
- Query latency: 50-200ms for 24-hour range query across 1M series
- Retention: Auto-expire old data (configurable, typically 7-90 days)
TimescaleDB – PostgreSQL Extension
What it is: Built on top of PostgreSQL using "hypertables" – transparent partitioning by time. Writes still go through PostgreSQL but are routed to the correct time partition automatically.
Hypertable concept:
```sql
-- Create a regular table, then convert it to a hypertable partitioned by time
CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    host TEXT,
    cpu  DOUBLE PRECISION
);
SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day');

-- TimescaleDB then transparently handles:
--   - routing each insert into the correct time chunk (partition)
--   - automatic expiry of old chunks:
--     SELECT add_retention_policy('metrics', INTERVAL '90 days');
--   - compression within each chunk
```
Trade-offs:
- ✅ SQL queries (PostgreSQL syntax)
- ✅ Reuses PostgreSQL infrastructure (backups, replication)
- ❌ Slower than purpose-built systems (InfluxDB, Prometheus)
- ❌ Must manage partitions manually if not using auto-drop
Concrete numbers (TimescaleDB benchmarks):
- Compression ratio: 10-15x (worse than InfluxDB due to PostgreSQL overhead)
- Write throughput: 500K metrics/sec per node
- Query latency: 200-500ms for same 24-hour query
Prometheus – Pull-Based Metrics
Architecture: Prometheus scrapes metrics FROM applications (pull model) vs applications pushing to metric store (push model).
Storage: Custom TSDB (time-series database) in Prometheus:
- In-memory: Last 2 hours of metrics in RAM
- Long-term: Exports to external storage (S3, Thanos, Cortex, VictoriaMetrics)
- Blocks: 2-hour blocks written to disk; each block contains ~1M time-series x 7200 samples (2 hours at 1-second granularity)
Concrete numbers (Prometheus community data):
- In-memory footprint: ~1KB per active time-series (metric x labels)
- Scrape interval: Typically 15 seconds (4 samples/min per series)
- Block size: ~1 MB per 2-hour block (heavily compressed)
- Query latency: 10-100ms for PromQL queries (simpler than SQL)
Trade-off: Operationally simpler than InfluxDB (PromQL instead of a full SQL dialect), but local retention is limited (15 days by default) and durable long-term storage requires setting up external systems (Thanos, Cortex, etc.).
ClickHouse – Columnar Analytics
What it is: Columnar OLAP database optimized for analytics queries. Unlike row-oriented databases (PostgreSQL, MySQL) that store row by row, ClickHouse stores column by column.
Why columnar matters:
- Query: “Find average CPU over 1 day” – only reads CPU column (not host, timestamp columns)
- Row-oriented: Must load all columns for all rows, then filter
- Columnar: Loads only CPU column -> 50-100x faster for this query
Real-world:
- Cloudflare: Processes 100M+ HTTP request logs per second. ClickHouse ingests raw logs, aggregates into time-series, stores compressed in S3. Queries like “top 100 countries by request count” run in 100ms on 30-day history.
- Compression: Dictionary encoding (repeated values compressed), delta encoding (for timestamps), LZ4 compression -> 5-10x total
Concrete numbers (Cloudflare analytics):
- Ingest rate: 100M+ events/sec
- Retention: 30 days hot (SSD), older -> S3
- Query latency: 50-500ms for aggregations over billion-row tables
- Compression: 5-10x
Graph Databases
Why Graphs > SQL Joins
SQL problem:
```sql
-- Find friends-of-friends-of-friends (3 levels deep)
SELECT DISTINCT u4.id
FROM users u1
JOIN friendships f1 ON u1.id = f1.user_id
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN friendships f3 ON f2.friend_id = f3.user_id
JOIN users u4 ON f3.friend_id = u4.id
WHERE u1.id = 'alice'
  AND u4.id != 'alice';
```
Result: three self-joins x index lookups x nested loops = O(n^3) in the worst case. The real query can take seconds.
Neo4j Cypher:
```cypher
MATCH (alice:User {name: "alice"})-[:FRIENDS_WITH*1..3]-(foaf)
WHERE foaf <> alice
RETURN DISTINCT foaf
```
Result: each hop is a direct pointer dereference, so the traversal is O(k) where k = number of edges visited. The real query takes milliseconds.
Graph Storage Model
Graphs store nodes and edges directly:
- Node: ID, properties (name, email, etc.), outgoing edges
- Edge: Source node ID, target node ID, edge type (FRIENDS_WITH, WORKS_AT), properties
Adjacency list representation (in-memory):
```python
node_alice = {
    "id": "user:123",
    "label": "User",
    "properties": {"name": "alice", "email": "alice@example.com"},
    "edges": {
        "FRIENDS_WITH": ["user:456", "user:789"],  # fast traversal
        "WORKS_AT": ["company:999"],
    },
}
```
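Traversal over such adjacency lists is plain pointer-chasing. A minimal breadth-first search over a toy in-memory graph (illustrative only, not Neo4j internals; the names and edges are made up):

```python
from collections import deque

# Hypothetical graph: node id -> list of FRIENDS_WITH neighbors
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}

def friends_within(graph, start, max_depth):
    """Return all nodes reachable from start within max_depth hops (excluding start)."""
    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # don't expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                result.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return result

print(sorted(friends_within(graph, "alice", 3)))  # ['bob', 'carol', 'dave', 'erin']
```

The cost is proportional to the edges actually visited, which is why bounded-depth traversals stay fast even on huge graphs.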
Real Systems Using Graphs
- LinkedIn Economic Graph: 1B+ profiles + relationships. Neo4j powers recommendations (“People you may know”).
- Facebook Social Graph: 3B+ users + friendships. Custom graph engine (not Neo4j) for scale.
- Fraud Detection: Banks use graphs to detect fraud rings. Query: “Find all connected accounts that share phone number or email” – linear in graph size, instant detection.
When to Use Graph
| Use Case | SQL Complexity | Graph Efficiency |
|---|---|---|
| “Is Alice connected to Bob?” (distance <= 5) | 5 JOINs | Single traversal |
| “Who are Alice’s mutual friends?” | 2 JOINs | Intersection of adjacency lists |
| “Recommend products Alice’s friends bought” | 4 JOINs | Breadth-first search |
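The mutual-friends row above really is just a set intersection over adjacency lists (toy data for illustration):

```python
# Hypothetical adjacency sets
friends = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
}

# Mutual friends of alice and bob: intersect the two adjacency sets
mutual = friends["alice"] & friends["bob"]
print(sorted(mutual))  # ['carol']
```

No joins, no index lookups: one intersection over two already-materialized neighbor sets.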
Vector Databases – Emerging Category
The Vector Problem
With LLMs and embeddings (e.g., OpenAI embeddings):
- Text -> fixed-size vector (1536 dimensions)
- “Find similar documents to query” = find vectors closest in space
Naive approach: Store vectors in PostgreSQL, compute distance to all N vectors -> O(N). Vector DB: Use spatial indexing (HNSW, IVF) -> O(log N) or O(sqrt N).
HNSW Indexing (Hierarchical Navigable Small World)
Idea: Build multi-level graph where nodes (vectors) are connected to nearby neighbors. Query starts at top level, jumps down to nearest neighbor, repeat. Mimics graph navigation.
Concrete numbers:
- 1M vectors (1536 dimensions each)
- HNSW index: ~2GB in memory (2KB per vector for graph pointers)
- Query latency: 1-10ms for top-10 nearest neighbors
Real systems:
- Pinecone: Fully managed vector DB, HNSW + GPU-accelerated search
- pgvector (PostgreSQL extension): IVFFlat and HNSW indexing inside PostgreSQL
- Weaviate: Open-source vector DB with HNSW
A brute-force baseline for comparison:
```python
class SimpleVectorDB:
    def __init__(self, dimension=1536):
        self.vectors = {}  # id -> vector
        self.dimension = dimension

    def insert(self, doc_id, vector):
        """Add vector for document."""
        self.vectors[doc_id] = vector

    def search(self, query_vector, k=10):
        """Find k nearest neighbors using brute-force (O(N))."""
        distances = []
        for doc_id, vector in self.vectors.items():
            # Cosine similarity (dot product for normalized vectors)
            sim = sum(a * b for a, b in zip(query_vector, vector))
            distances.append((doc_id, sim))
        # Sort by similarity, descending
        distances.sort(key=lambda x: x[1], reverse=True)
        return distances[:k]
```
Production HNSW details:
- Hierarchical layers (0, 1, 2, ...) where each higher layer is sparser
- Layer 0: contains every node (densest; the final, precise search happens here)
- Layer 1: roughly 1 in 2 nodes (faster to traverse)
- Layer 2: roughly 1 in 4 nodes (fastest, sparsest; coarse routing)
- Query: Start at layer 2, greedy nearest neighbor descent -> layer 1 -> layer 0
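The descent can be sketched as a greedy search over hand-built layers (a toy, deterministic illustration; real HNSW assigns layers probabilistically and keeps candidate lists rather than a single entry point):

```python
import math

def greedy_layer_search(vectors, layer, entry, query):
    """Within one layer, repeatedly hop to any neighbor closer to the query."""
    current = entry
    improved = True
    while improved:
        improved = False
        for neighbor in layer.get(current, []):
            if math.dist(vectors[neighbor], query) < math.dist(vectors[current], query):
                current = neighbor
                improved = True
    return current

def hnsw_search(vectors, layers, entry, query):
    """Start at the sparsest (top) layer and descend, refining the entry point."""
    for layer in layers:  # ordered top (sparse) -> bottom (dense)
        entry = greedy_layer_search(vectors, layer, entry, query)
    return entry

# Toy 1-D example: 8 points on a line, two layers
vectors = {i: (float(i),) for i in range(8)}
layers = [
    {0: [4], 4: [0]},  # top layer: only every 4th node
    {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)},  # chain
]
print(hnsw_search(vectors, layers, entry=0, query=(6.2,)))  # 6
```

The top layer makes a long jump toward the right region; the bottom layer finishes with short, precise hops. That is the whole reason the hierarchy turns O(N) scans into roughly logarithmic searches.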
Search Engines (Full-Text)
Elasticsearch – Distributed Lucene
What it is: Distributed wrapper around Lucene (Java full-text indexing library). Scales search across multiple nodes.
Inverted index structure:
```
Term         Postings list (compressed, variable-byte encoding)
"machine"    [1, 5, 7, 42, 100, ...]   (document IDs)
"learning"   [1, 2, 4, 5, 8, ...]
```
Query execution: Search “machine learning” -> find intersection of postings lists for “machine” AND “learning” -> return doc IDs 1, 5.
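The AND-query intersection can be modeled in a few lines (a minimal in-memory sketch; Lucene stores postings compressed on disk and intersects them with skip lists):

```python
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(list)  # term -> doc IDs in insertion order

    def add(self, doc_id, text):
        """Index a document: record its ID under each distinct term."""
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)

    def search_and(self, *terms):
        """AND query: intersect the postings lists of all terms."""
        lists = [set(self.postings[t]) for t in terms]
        return sorted(set.intersection(*lists)) if lists else []

idx = InvertedIndex()
idx.add(1, "machine learning basics")
idx.add(2, "deep learning")
idx.add(5, "machine learning in production")
idx.add(7, "machine vision")
print(idx.search_and("machine", "learning"))  # [1, 5]
```

Matching the example above: the postings for "machine" and "learning" intersect at documents 1 and 5.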
Real systems:
- Search queries: Web search, documentation search
- Log analysis: ELK stack (Elasticsearch, Logstash, Kibana) – parse logs, index, query
- Metrics: Elasticsearch also used for analytics dashboards
Concrete numbers (typical Elasticsearch cluster):
- Index size: 1 billion documents ~ 500 GB (depends on document size + compression)
- Query latency: 10-100ms for full-text search
- Indexing rate: 100K docs/sec per node
OLAP / Data Warehouse
Row vs Column Storage
Row-oriented (PostgreSQL, MySQL):
```
[user:1, alice, 100]
[user:2, bob,   200]
[user:3, carol, 150]
```
Query “SELECT SUM(balance) FROM users” must load all 3 columns for all rows.
Column-oriented (ClickHouse, BigQuery, Redshift):
```
user:    [1, 2, 3]
name:    [alice, bob, carol]
balance: [100, 200, 150]
```
Same query only loads balance column -> 3x faster (1/3 data read).
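A toy model of the two layouts (illustrative only, not any engine's actual format):

```python
# Row-oriented: each record stored together; SUM(balance) touches every field
rows = [
    {"user": 1, "name": "alice", "balance": 100},
    {"user": 2, "name": "bob", "balance": 200},
    {"user": 3, "name": "carol", "balance": 150},
]
row_total = sum(r["balance"] for r in rows)  # must walk whole records

# Column-oriented: each column stored contiguously; SUM reads one array
columns = {
    "user": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "balance": [100, 200, 150],
}
col_total = sum(columns["balance"])  # reads only the balance column

print(row_total, col_total)  # 450 450
```

Same answer either way; the difference is how many bytes had to be read to produce it, which is what dominates at billion-row scale.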
Concrete Numbers (BigQuery, Google’s OLAP)
BigQuery analytics at scale:
- Data scale: Petabytes (1000+ TB per project)
- Query latency: 1-10 seconds for billion-row aggregations
- Cost: Charged per GB scanned (not per query)
- Compression: 10-50x (highly compressible analytical data)
Use case: “How many purchases happened last week, grouped by country?” -> scans purchase table (5TB), groups by country column, returns 200-row result in 2 seconds.
Summary – Specialized Database Decision
| Data Type | Engine Category | Best Systems | Characteristic |
|---|---|---|---|
| Time-series metrics | Time-series DB | InfluxDB, Prometheus, TimescaleDB | Timestamp-ordered, auto-TTL, compression |
| Graph relationships | Graph DB | Neo4j, Amazon Neptune, TigerGraph | Traversal queries, millions of edges |
| Full-text search | Search Engine | Elasticsearch, Solr, Meilisearch | Inverted index, term relevance, Boolean queries |
| Vector similarity | Vector DB | Pinecone, pgvector, Weaviate | HNSW indexing, embeddings, semantic search |
| Analytics / OLAP | Data Warehouse | BigQuery, Redshift, ClickHouse | Columnar storage, billion-row aggregations |
| Relational + scale | NewSQL | Spanner, CockroachDB | SQL + horizontal partitioning, strong consistency |
When to Use / Avoid
✅ Use Time-Series DB When
- Append-only workload – data rarely updated or deleted
- High-volume metrics – 1M+ series across millions of data points
- Automatic TTL needed – old data should expire
- Time-range queries are primary – “stats over last hour/day/week”
- Compression is critical – disk I/O is a bottleneck
❌ Avoid When
- Complex queries across non-time dimensions – use ClickHouse
- Need complex transactions – use PostgreSQL
✅ Use Graph DB When
- Relationships are first-class – traversal speed matters
- Queries are “find connected nodes” – fraud detection, recommendations
- Depth of traversal is constant (not scaling with data)
❌ Avoid When
- No relationship queries – use relational DB
- Graph is too large to fit in memory – distributed graphs have latency
✅ Use Vector DB When
- Similarity search on embeddings – LLM semantic search
- High-dimensional data (1000+ dimensions)
- Need approximate nearest neighbors (exact is too slow)
❌ Avoid When
- Exact search only – use relational DB with B+ tree index
- Data is low-dimensional (< 10 dimensions)
✅ Use OLAP When
- Analytical queries on historical data – aggregations, reporting
- Data is rarely updated – append-only or batch updates
- Queries touch small % of rows/columns – columnar compression helps
❌ Avoid When
- Real-time transactional updates – use OLTP (PostgreSQL, MySQL)
- Small dataset (< 1GB) – overhead not worth it
References
- The Log-Structured Merge-Tree – O’Neil et al. (1996) – Foundation for time-series storage (RocksDB, InfluxDB)
- Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs – Malkov & Yashunin (2018) – Vector indexing algorithm for similarity search
- InfluxDB Documentation – TSM Storage Engine – Time-series compression details
- TimescaleDB Hypertables – Docs – PostgreSQL time-series partitioning
- Prometheus Documentation – Metrics scraping, retention, long-term storage
- Neo4j Cypher Language Guide – Graph query language, traversals
- Lucene Inverted Index – Elasticsearch foundation
- ByteByteGo – Time-Series Databases – Compression, TSM storage, real systems
- PostgreSQL vs ClickHouse – Comparison – Row vs column storage benchmarks
- Google BigQuery Documentation – OLAP architecture, columnar storage