
Time-Series & Specialized Databases

When standard relational or document databases don't fit -- specialized engines optimized for specific data shapes (time-ordered metrics, graph traversals, vector embeddings, columnar analytics). Each trades general-purpose flexibility for domain-specific performance gains (10-100x throughput or latency improvements).


Time-Series Databases

The Time-Series Problem

Time-series data differs fundamentally from transactional data:

  • Access pattern: Always sorted by timestamp (time-ordered writes)
  • Retention: Automatic expiration (metrics from 90 days ago often discarded)
  • Aggregation: Range queries (`SELECT avg(cpu) FROM metrics WHERE timestamp >= now - 1h`)
  • Cardinality: Millions of metric series (e.g., metrics from 10K servers, 100 metrics each = 1M series)

Why not PostgreSQL?

  • PostgreSQL enforces ACID + complex query logic, adding overhead for append-only workloads
  • No native TTL (time-to-live) for automatic data expiration
  • Random I/O during compaction interferes with real-time write latency
  • Compression is generic (10-15x); time-series compression can achieve 40-50x using delta encoding + dictionary

Time-Series Storage Model: InfluxDB TSM

Time-Structured Merge-tree (TSM) – InfluxDB’s storage engine. Similar to LSM but optimized for time-series:

  1. Write path: Incoming metrics land in an in-memory cache (backed by a write-ahead log); once the cache reaches a size threshold (~10MB), it is flushed to an immutable TSM file on disk.
  2. Compression: Delta encoding (store differences between consecutive timestamps, not absolute values). Example: `[1000, 1001, 1003, 1002]` stores as `[1000, +1, +2, -1]`, then variable-byte encodes to ~2 bytes per point vs 8.
  3. Compaction: Merges old TSM files, removing deleted points and compressing further. Can achieve 40-50x compression on raw metrics.
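
The delta-encoding step above can be sketched in Python. This is a toy illustration of the idea, not InfluxDB's actual implementation, which follows the deltas with variable-byte and run-length packing:

```python
def delta_encode(timestamps):
    """Store the first value, then differences between consecutive values."""
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Reverse: a running sum reconstructs the original series."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

points = [1000, 1001, 1003, 1002]
encoded = delta_encode(points)
print(encoded)  # [1000, 1, 2, -1]
assert delta_decode(encoded) == points
```

The small deltas are what make the subsequent variable-byte step effective: values near zero fit in one or two bytes instead of eight.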

Concrete numbers (InfluxDB Cloud):

  • Compression ratio: 40-50x on raw metric data
  • Write throughput: 1M metrics/sec per node
  • Query latency: 50-200ms for 24-hour range query across 1M series
  • Retention: Auto-expire old data (configurable, typically 7-90 days)

TimescaleDB – PostgreSQL Extension

What it is: Built on top of PostgreSQL, using "hypertables": transparent partitioning by time. Writes still go through PostgreSQL but are routed to time-based partitions ("chunks") automatically.

Hypertable concept:

```sql
-- Create a regular table, then convert it into a hypertable
-- partitioned ("chunked") by time, one chunk per day
CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    host TEXT,
    cpu  FLOAT
);
SELECT create_hypertable('metrics', 'time',
                         chunk_time_interval => INTERVAL '1 day');

-- TimescaleDB then transparently handles:
--   - Routing inserts into the correct chunk
--   - Automatic TTL (e.g., add_retention_policy('metrics', INTERVAL '90 days'))
--   - Compression within each chunk
```

Trade-offs:

  • ✅ SQL queries (PostgreSQL syntax)
  • ✅ Reuses PostgreSQL infrastructure (backups, replication)
  • ❌ Slower than purpose-built systems (InfluxDB, Prometheus)
  • ❌ Must manage partitions manually if not using auto-drop

Concrete numbers (TimescaleDB benchmarks):

  • Compression ratio: 10-15x (worse than InfluxDB due to PostgreSQL overhead)
  • Write throughput: 500K metrics/sec per node
  • Query latency: 200-500ms for same 24-hour query

Prometheus – Pull-Based Metrics

Architecture: Prometheus scrapes metrics from applications (pull model), rather than having applications push to a central metric store (push model).

Storage: Custom TSDB (time-series database) in Prometheus:

  • In-memory: Last 2 hours of metrics in RAM
  • Long-term: Exports to external storage (S3, Thanos, Cortex, VictoriaMetrics)
  • Blocks: 2-hour blocks written to disk; each block contains ~1M time-series x 7200 samples (2 hours at 1-second granularity)

Concrete numbers (Prometheus community data):

  • In-memory footprint: ~1KB per active time-series (metric x labels)
  • Scrape interval: Typically 15 seconds (4 samples/min per series)
  • Block size: ~1 MB per 2-hour block (heavily compressed)
  • Query latency: 10-100ms for PromQL queries (simpler than SQL)
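
The numbers above imply a simple capacity estimate. A rough sketch, assuming ~1KB of RAM per active series and the 15-second scrape interval mentioned above:

```python
def prometheus_memory_gb(active_series, bytes_per_series=1024):
    """Rough RAM estimate: ~1 KB per active series (metric x label set)."""
    return active_series * bytes_per_series / 1024**3

def samples_per_block(scrape_interval_s=15, block_hours=2):
    """Samples one series contributes to a single 2-hour block."""
    return block_hours * 3600 // scrape_interval_s

print(prometheus_memory_gb(1_000_000))  # just under 1 GB for 1M active series
print(samples_per_block())              # 480 samples per series per block
```

These are back-of-the-envelope figures; actual usage depends on label churn and chunk encoding.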

Trade-off: Operationally simpler than InfluxDB, but only the last ~2 hours are hot in memory, and durable long-term storage (Thanos, Cortex) must be set up separately.

ClickHouse – Columnar Analytics

What it is: Columnar OLAP database optimized for analytics queries. Unlike row-oriented databases (PostgreSQL, MySQL) that store row by row, ClickHouse stores column by column.

Why columnar matters:

  • Query: “Find average CPU over 1 day” – only reads CPU column (not host, timestamp columns)
  • Row-oriented: Must load all columns for all rows, then filter
  • Columnar: Loads only CPU column -> 50-100x faster for this query

Real-world:

  • Cloudflare: Processes 100M+ HTTP request logs per second. ClickHouse ingests raw logs, aggregates into time-series, stores compressed in S3. Queries like “top 100 countries by request count” run in 100ms on 30-day history.
  • Compression: Dictionary encoding (repeated values compressed), delta encoding (for timestamps), LZ4 compression -> 5-10x total

Concrete numbers (Cloudflare analytics):

  • Ingest rate: 100M+ events/sec
  • Retention: 30 days hot (SSD), older -> S3
  • Query latency: 50-500ms for aggregations over billion-row tables
  • Compression: 5-10x

Graph Databases

Why Graphs > SQL Joins

SQL problem:

```sql
-- Find friends-of-friends-of-friends (3 levels deep)
SELECT DISTINCT u4.id
FROM users u1
JOIN friendships f1 ON u1.id = f1.user_id
JOIN users u2 ON f1.friend_id = u2.id
JOIN friendships f2 ON u2.id = f2.user_id
JOIN users u3 ON f2.friend_id = u3.id
JOIN friendships f3 ON u3.id = f3.user_id
JOIN users u4 ON f3.friend_id = u4.id
WHERE u1.id = 'alice'
  AND u4.id <> 'alice';
```

Result: three levels of chained JOINs, each multiplying index lookups and nested loops, O(n^3) in the worst case. The real query takes seconds.

Neo4j Cypher:

```cypher
MATCH (alice:User {name: "alice"})-[:FRIENDS_WITH*1..3]-(foaf)
WHERE foaf <> alice
RETURN DISTINCT foaf
```

Result: a direct graph traversal, O(k) where k is the number of edges actually touched (not the table size). The real query returns in milliseconds.

Graph Storage Model

Graphs store nodes and edges directly:

  • Node: ID, properties (name, email, etc.), outgoing edges
  • Edge: Source node ID, target node ID, edge type (FRIENDS_WITH, WORKS_AT), properties

Adjacency list representation (in-memory):

```python
node_alice = {
    "id": "user:123",
    "label": "User",
    "properties": {"name": "alice", "email": "alice@example.com"},
    "edges": {
        "FRIENDS_WITH": ["user:456", "user:789"],  # Fast traversal
        "WORKS_AT": ["company:999"],
    },
}
```
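
Traversal over such an adjacency list is just breadth-first search. A minimal sketch with a hypothetical five-node friendship graph:

```python
from collections import deque

# Toy adjacency list: node id -> neighbor ids (FRIENDS_WITH edges only)
graph = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave":  ["bob", "carol", "erin"],
    "erin":  ["dave"],
}

def friends_within(graph, start, max_depth):
    """BFS up to max_depth hops; cost is O(edges traversed), no joins."""
    seen, result = {start}, set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                result.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return result

print(sorted(friends_within(graph, "alice", 3)))  # ['bob', 'carol', 'dave', 'erin']
```

This is the in-memory equivalent of the Cypher `*1..3` pattern: each hop follows stored edge pointers directly instead of probing an index per JOIN.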

Real Systems Using Graphs

  • LinkedIn Economic Graph: 1B+ profiles + relationships. Neo4j powers recommendations (“People you may know”).
  • Facebook Social Graph: 3B+ users + friendships. Custom graph engine (not Neo4j) for scale.
  • Fraud Detection: Banks use graphs to detect fraud rings. Query: “Find all connected accounts that share phone number or email” – linear in graph size, instant detection.
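
The fraud-ring query above amounts to clustering accounts that transitively share an attribute. A sketch with hypothetical account records, using a small union-find for the transitive grouping:

```python
from collections import defaultdict

# Hypothetical account records
accounts = [
    {"id": "A1", "phone": "555-0001", "email": "x@a.com"},
    {"id": "A2", "phone": "555-0002", "email": "x@a.com"},  # shares email with A1
    {"id": "A3", "phone": "555-0002", "email": "y@b.com"},  # shares phone with A2
    {"id": "A4", "phone": "555-0009", "email": "z@c.com"},
]

def connected_groups(accounts, keys=("phone", "email")):
    """Cluster accounts that transitively share any attribute value."""
    by_value = defaultdict(list)
    for acct in accounts:
        for key in keys:
            by_value[(key, acct[key])].append(acct["id"])

    # Union-find over account ids
    parent = {a["id"]: a["id"] for a in accounts}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for ids in by_value.values():
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    clusters = defaultdict(set)
    for a in accounts:
        clusters[find(a["id"])].add(a["id"])
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)

print(connected_groups(accounts))  # [['A1', 'A2', 'A3']]
```

The work is linear in accounts plus shared-attribute edges, which is why graph-shaped fraud checks stay fast as data grows.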

When to Use Graph

| Use Case | SQL Complexity | Graph Efficiency |
| --- | --- | --- |
| "Is Alice connected to Bob?" (distance <= 5) | 5 JOINs | Single traversal |
| "Who are Alice's mutual friends?" | 2 JOINs | Intersection of adjacency lists |
| "Recommend products Alice's friends bought" | 4 JOINs | Breadth-first search |

Vector Databases – Emerging Category

The Vector Problem

With LLMs and embeddings (e.g., OpenAI embeddings):

  • Text -> fixed-size vector (1536 dimensions)
  • “Find similar documents to query” = find vectors closest in space

Naive approach: Store vectors in PostgreSQL, compute distance to all N vectors -> O(N). Vector DB: Use spatial indexing (HNSW, IVF) -> O(log N) or O(sqrt N).

HNSW Indexing (Hierarchical Navigable Small World)

Idea: Build multi-level graph where nodes (vectors) are connected to nearby neighbors. Query starts at top level, jumps down to nearest neighbor, repeat. Mimics graph navigation.

Concrete numbers:

  • 1M vectors (1536 dimensions each)
  • HNSW index: ~8GB in memory (~6KB per raw float32 vector plus ~2KB of graph pointers)
  • Query latency: 1-10ms for top-10 nearest neighbors

Real systems:

  • Pinecone: Fully managed vector DB, HNSW + GPU-accelerated search
  • pgvector (PostgreSQL extension): IVFFlat and HNSW indexing inside PostgreSQL
  • Weaviate: Open-source vector DB with HNSW

A brute-force baseline, i.e., what a vector DB replaces with an approximate index:

```python
class SimpleVectorDB:
    def __init__(self, dimension=1536):
        self.vectors = {}  # doc_id -> vector
        self.dimension = dimension

    def insert(self, doc_id, vector):
        """Add vector for document."""
        self.vectors[doc_id] = vector

    def search(self, query_vector, k=10):
        """Find k nearest neighbors using brute force (O(N))."""
        distances = []
        for doc_id, vector in self.vectors.items():
            # Cosine similarity (dot product for normalized vectors)
            sim = sum(a * b for a, b in zip(query_vector, vector))
            distances.append((doc_id, sim))

        # Sort by similarity, descending
        distances.sort(key=lambda x: x[1], reverse=True)
        return distances[:k]
```

Production HNSW details:

  • Hierarchical layers (0, 1, 2, …) where each higher layer is sparser
  • Layer 0: contains every vector (densest, most accurate)
  • Layer 1: roughly 1 in 2 nodes
  • Layer 2: roughly 1 in 4 nodes (sparsest, fastest to traverse)
  • Query: start at the top layer, greedy nearest-neighbor descent -> layer 1 -> layer 0
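
The "1 in 2, 1 in 4" layer structure falls out of HNSW's standard layer-assignment rule. A sketch with the parameter mL chosen so each successive layer keeps about half the nodes:

```python
import math, random

def assign_layer(m_l=1 / math.log(2), rng=random.random):
    """HNSW layer assignment: layer = floor(-ln(U) * mL), U uniform in (0, 1).
    With mL = 1/ln(2), each node reaches the next layer up with probability 1/2."""
    return int(-math.log(rng()) * m_l)

random.seed(0)
layers = [assign_layer() for _ in range(100_000)]
share_layer0 = sum(l == 0 for l in layers) / len(layers)
print(round(share_layer0, 2))  # close to 0.5: about half the nodes live only in layer 0
```

A node assigned layer L is inserted into layers 0 through L, so the top layers stay sparse and the greedy descent starts with only a handful of candidates.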

Search Engines (Full-Text)

Elasticsearch – Distributed Lucene

What it is: Distributed wrapper around Lucene (Java full-text indexing library). Scales search across multiple nodes.

Inverted index structure:

```
Term        Postings list (compressed, variable-byte encoding)
"machine"   [1, 5, 7, 42, 100, ...]   (document IDs)
"learning"  [1, 2, 4, 5, 8, ...]
```

Query execution: Search “machine learning” -> find intersection of postings lists for “machine” AND “learning” -> return doc IDs 1, 5.
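
The intersection step can be sketched as a linear merge over the two sorted postings lists, the same idea Lucene uses (real engines add skip pointers and compression):

```python
def intersect_postings(a, b):
    """Merge-intersect two sorted doc-id lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

machine = [1, 5, 7, 42, 100]
learning = [1, 2, 4, 5, 8]
print(intersect_postings(machine, learning))  # [1, 5]
```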

Real systems:

  • Search queries: Web search, documentation search
  • Log analysis: ELK stack (Elasticsearch, Logstash, Kibana) – parse logs, index, query
  • Metrics: Elasticsearch also used for analytics dashboards

Concrete numbers (typical Elasticsearch cluster):

  • Index size: 1 billion documents ~ 500 GB (depends on document size + compression)
  • Query latency: 10-100ms for full-text search
  • Indexing rate: 100K docs/sec per node

OLAP / Data Warehouse

Row vs Column Storage

Row-oriented (PostgreSQL, MySQL):

```
[user:1, alice, 100]
[user:2, bob,   200]
[user:3, carol, 150]
```

Query “SELECT SUM(balance) FROM users” must load all 3 columns for all rows.

Column-oriented (ClickHouse, BigQuery, Redshift):

```
user:    [1, 2, 3]
name:    [alice, bob, carol]
balance: [100, 200, 150]
```

Same query only loads balance column -> 3x faster (1/3 data read).
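
The same contrast in Python, with the toy three-row table from above. Both layouts return the same sum; the difference is how much data each one has to touch:

```python
# Row-oriented: each record carries every column
rows = [
    {"user": 1, "name": "alice", "balance": 100},
    {"user": 2, "name": "bob",   "balance": 200},
    {"user": 3, "name": "carol", "balance": 150},
]

# Column-oriented: one contiguous array per column
columns = {
    "user":    [1, 2, 3],
    "name":    ["alice", "bob", "carol"],
    "balance": [100, 200, 150],
}

# SELECT SUM(balance): the row store must read every field of every row...
row_sum = sum(r["balance"] for r in rows)
# ...while the column store scans a single contiguous array
col_sum = sum(columns["balance"])

print(row_sum, col_sum)  # 450 450
```

At billions of rows, the contiguous column array also compresses far better and streams sequentially from disk, which is where the real speedup comes from.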

Concrete Numbers (BigQuery, Google’s OLAP)

BigQuery analytics:

  • Data scale: Petabytes (1000+ TB per project)
  • Query latency: 1-10 seconds for billion-row aggregations
  • Cost: Charged per GB scanned (not per query)
  • Compression: 10-50x (highly compressible analytical data)

Use case: “How many purchases happened last week, grouped by country?” -> scans purchase table (5TB), groups by country column, returns 200-row result in 2 seconds.


Summary – Specialized Database Decision

| Data Type | Engine Category | Best Systems | Characteristic |
| --- | --- | --- | --- |
| Time-series metrics | Time-series DB | InfluxDB, Prometheus, TimescaleDB | Timestamp-ordered, auto-TTL, compression |
| Graph relationships | Graph DB | Neo4j, Amazon Neptune, TigerGraph | Traversal queries, millions of edges |
| Full-text search | Search engine | Elasticsearch, Solr, Meilisearch | Inverted index, term relevance, Boolean queries |
| Vector similarity | Vector DB | Pinecone, pgvector, Weaviate | HNSW indexing, embeddings, semantic search |
| Analytics / OLAP | Data warehouse | BigQuery, Redshift, ClickHouse | Columnar storage, billion-row aggregations |
| Relational + scale | NewSQL | Spanner, CockroachDB | SQL + horizontal partitioning, strong consistency |

When to Use / Avoid

✅ Use Time-Series DB When

  • Append-only workload – data rarely updated or deleted
  • High-volume metrics – 1M+ series across millions of data points
  • Automatic TTL needed – old data should expire
  • Time-range queries are primary – “stats over last hour/day/week”
  • Compression is critical – disk I/O is a bottleneck

❌ Avoid When

  • Complex queries across non-time dimensions – use ClickHouse
  • Need complex transactions – use PostgreSQL

✅ Use Graph DB When

  • Relationships are first-class – traversal speed matters
  • Queries are “find connected nodes” – fraud detection, recommendations
  • Depth of traversal is constant (not scaling with data)

❌ Avoid When

  • No relationship queries – use relational DB
  • Graph is too large to fit in memory – distributed graphs have latency

✅ Use Vector DB When

  • Similarity search on embeddings – LLM semantic search
  • High-dimensional data (1000+ dimensions)
  • Need approximate nearest neighbors (exact is too slow)

❌ Avoid When

  • Exact search only – use relational DB with B+ tree index
  • Data is low-dimensional (< 10 dimensions)

✅ Use OLAP When

  • Analytical queries on historical data – aggregations, reporting
  • Data is rarely updated – append-only or batch updates
  • Queries touch small % of rows/columns – columnar compression helps

❌ Avoid When

  • Real-time transactional updates – use OLTP (PostgreSQL, MySQL)
  • Small dataset (< 1GB) – overhead not worth it


This post is licensed under CC BY 4.0 by the author.