B-Tree & B+ Tree
Self-balancing tree keeping data sorted with O(log n) reads/writes -- the dominant index structure in relational databases.
B-Tree Node Structure
- Every node holds multiple keys (minimises tree height -> fewer disk reads)
- All nodes store data pointers (B-Tree) or only leaves do (B+ Tree)
B-Tree vs B+ Tree
| Dimension | B-Tree | B+ Tree |
|---|---|---|
| Data in internal nodes | Yes | No (keys only) |
| Range queries | Slower | Fast (leaf linked list) |
| Tree height | Slightly shorter | Slightly taller |
| Used in | General purpose | Databases (MySQL InnoDB, Postgres) |
Key Properties
| Property | Value |
|---|---|
| Search | O(log n) |
| Insert | O(log n) + rebalance |
| Delete | O(log n) + rebalance |
| Order (t) | Max 2t-1 keys per node |
| Fan-out | High (fits many keys per disk page) |
When to Use / Avoid
- Use for read-heavy workloads with mixed reads/writes
- Use for range queries (B+ Tree leaf linked list)
- Use for standard relational DB indexing
- Avoid for write-heavy append workloads (use LSM Tree)
- Avoid for in-memory only structures (use Red-Black Tree / Skip List)
Implementation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
class BTreeNode:
"""Single B-Tree node (order 3 for simplicity: max 2 keys per node)."""
def __init__(self, order=3, is_leaf=True):
self.keys = [] # List of keys
self.children = [] # List of child pointers (empty if leaf)
self.order = order # Max keys = order - 1
self.is_leaf = is_leaf
def is_full(self):
"""Check if node has reached max keys."""
return len(self.keys) >= self.order - 1
def split(self, child_idx, child):
"""Split full child at index child_idx into two nodes."""
mid_idx = (self.order - 1) // 2
mid_key = child.keys[mid_idx]
# Create right sibling
right = BTreeNode(self.order, child.is_leaf)
right.keys = child.keys[mid_idx + 1:]
if not child.is_leaf:
right.children = child.children[mid_idx + 1:]
# Shrink left (original) child
child.keys = child.keys[:mid_idx]
if not child.is_leaf:
child.children = child.children[:mid_idx + 1]
# Insert mid_key into this node
self.keys.insert(child_idx, mid_key)
self.children.insert(child_idx + 1, right)
def insert_non_full(self, key):
"""Insert into non-full leaf node."""
idx = len(self.keys) - 1
if self.is_leaf:
self.keys.append(None)
while idx >= 0 and key < self.keys[idx]:
self.keys[idx + 1] = self.keys[idx]
idx -= 1
self.keys[idx + 1] = key
else:
# Find child to recurse into
while idx >= 0 and key < self.keys[idx]:
idx -= 1
idx += 1
if idx < len(self.children):
child = self.children[idx]
if child.is_full():
self.split(idx, child)
if key > self.keys[idx]:
idx += 1
self.children[idx].insert_non_full(key)
class BTree:
"""Simple B-Tree with order 3 (max 2 keys per node)."""
def __init__(self, order=3):
self.root = BTreeNode(order)
self.order = order
def search(self, key, node=None):
"""Search for key in B-Tree. Return node and index if found, else None."""
if node is None:
node = self.root
idx = 0
# Find position of key in current node
while idx < len(node.keys) and key > node.keys[idx]:
idx += 1
if idx < len(node.keys) and key == node.keys[idx]:
return (node, idx) # Found
if node.is_leaf:
return None # Not found
# Recurse into child
return self.search(key, node.children[idx])
def insert(self, key):
"""Insert key into B-Tree, splitting root if necessary."""
root = self.root
if root.is_full():
# Create new root if old root is full
new_root = BTreeNode(self.order, is_leaf=False)
new_root.children.append(root)
self.root = new_root
new_root.split(0, root)
self._insert_non_full(self.root, key)
def _insert_non_full(self, node, key):
"""Helper: insert into non-full node."""
idx = len(node.keys) - 1
if node.is_leaf:
# Insert directly into sorted leaf
node.keys.append(None)
while idx >= 0 and key < node.keys[idx]:
node.keys[idx + 1] = node.keys[idx]
idx -= 1
node.keys[idx + 1] = key
else:
# Find child to recurse into
while idx >= 0 and key < node.keys[idx]:
idx -= 1
idx += 1
child = node.children[idx]
if child.is_full():
node.split(idx, child)
if key > node.keys[idx]:
idx += 1
self._insert_non_full(node.children[idx], key)
def range_query(self, start_key, end_key):
"""Return all keys in [start_key, end_key]."""
results = []
def traverse(node):
idx = 0
for i, key in enumerate(node.keys):
if key >= start_key:
idx = i
break
else:
idx = len(node.keys)
if not node.is_leaf:
traverse(node.children[idx])
for i in range(idx, len(node.keys)):
key = node.keys[i]
if key <= end_key:
results.append(key)
if not node.is_leaf:
traverse(node.children[i + 1])
else:
return
traverse(self.root)
return sorted(set(results))
# Example usage
btree = BTree(order=3)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
btree.insert(key)
print(btree.search(6)) # Found
print(btree.search(100)) # Not found
print(btree.range_query(5, 20)) # Range [5, 20]
Algorithm Diagram:
MySQL InnoDB
MySQL InnoDB uses B+ Tree for both clustered primary key indexes and secondary indexes. The clustered index stores the entire row data in leaf nodes, while secondary indexes store only the primary key (leaf nodes point to rows via PK lookup). Default page size is 16KB; with ~500 keys per node, a 3-level B+ Tree can address millions of rows with just 3 disk reads. Leaf nodes are linked (doubly-linked list), enabling fast range scans and ORDER BY operations without re-traversing the tree.
PostgreSQL
PostgreSQL defaults to B-Tree indexing but supports other index types (HASH, BRIN, GiST). It extends B-Tree with partial indexes (WHERE clause) to index only relevant rows, and covering indexes (INCLUDE clause) to add columns without making them searchable—reducing table lookups. For sequential data (e.g., timestamps), PostgreSQL recommends BRIN (Block Range Index) to save space. Index page size matches table page size (8KB default); administrators tune shared_buffers to keep frequently accessed index pages in cache.
SQLite
SQLite uses B-Tree as the primary storage format for both table data and indexes. The entire database is a single file divided into pages (4096 bytes default). Every table is internally a B-Tree keyed by ROWID; every index is a separate B-Tree. The simple, embedded design means no separate index daemon—compaction and page management happen during queries. Configuration: PRAGMA page_size and PRAGMA cache_size tuning is common for embedded applications.
MongoDB
MongoDB’s WiredTiger engine (default since v3.2) implements B-Tree indexing. Compound indexes follow the leftmost prefix rule: an index on (a, b, c) is useful for queries on a, (a, b), but NOT for b alone. WiredTiger automatically selects the best index via a query planner. Sparse indexes (index only non-null values) save space for optional fields. Configuration: wiredTigerCacheSizeGB allocates memory for hot index/data pages.
Filesystem Implementations (ext4, NTFS)
Modern filesystems use B-Tree variants for directory structures and extent management. ext4 uses HTree (Hash Tree) for directory indexing, mapping filenames to inodes in O(log n). Extents (contiguous block ranges) are stored in extent trees to handle large files efficiently. NTFS uses a similar B-Tree for Master File Table (MFT) and file attributes. This O(log n) lookup is why filesystem performance degrades gracefully even with millions of files.
References
- Bayer & McCreight (1972) — Organization and Maintenance of Large Ordered Indexes — Original B-Tree paper introducing self-balancing property and fan-out optimization
- CLRS Introduction to Algorithms, Chapter 18 — Rigorous treatment of B-Tree insertions, deletions, and complexity analysis
- CMU Database Systems Lecture: B-Trees — Walkthrough of split, balance, range scan operations
- Use The Index, Luke! — B-Tree Indexing — Practical database indexing patterns and common mistakes
- Wikipedia: B-Tree — Quick reference with visual examples of splits and node structure