Post

B-Tree & B+ Tree

Self-balancing tree keeping data sorted with O(log n) reads/writes -- the dominant index structure in relational databases.

B-Tree & B+ Tree

B-Tree Node Structure

  • Every node holds multiple keys (minimises tree height -> fewer disk reads)
  • All nodes store data pointers (B-Tree) or only leaves do (B+ Tree)

B-Tree vs B+ Tree

Dimension B-Tree B+ Tree
Data in internal nodes Yes No (keys only)
Range queries Slower Fast (leaf linked list)
Tree height Slightly shorter Slightly taller
Used in General purpose Databases (MySQL InnoDB, Postgres)

Key Properties

Property Value
Search O(log n)
Insert O(log n) + rebalance
Delete O(log n) + rebalance
Order (t) Max 2t-1 keys per node
Fan-out High (fits many keys per disk page)

When to Use / Avoid

  • Use for read-heavy workloads with mixed reads/writes
  • Use for range queries (B+ Tree leaf linked list)
  • Use for standard relational DB indexing
  • Avoid for write-heavy append workloads (use LSM Tree)
  • Avoid for in-memory only structures (use Red-Black Tree / Skip List)

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
class BTreeNode:
    """Single B-Tree node (order 3 for simplicity: max 2 keys per node)."""

    def __init__(self, order=3, is_leaf=True):
        self.keys = []  # List of keys
        self.children = []  # List of child pointers (empty if leaf)
        self.order = order  # Max keys = order - 1
        self.is_leaf = is_leaf

    def is_full(self):
        """Check if node has reached max keys."""
        return len(self.keys) >= self.order - 1

    def split(self, child_idx, child):
        """Split full child at index child_idx into two nodes."""
        mid_idx = (self.order - 1) // 2
        mid_key = child.keys[mid_idx]

        # Create right sibling
        right = BTreeNode(self.order, child.is_leaf)
        right.keys = child.keys[mid_idx + 1:]
        if not child.is_leaf:
            right.children = child.children[mid_idx + 1:]

        # Shrink left (original) child
        child.keys = child.keys[:mid_idx]
        if not child.is_leaf:
            child.children = child.children[:mid_idx + 1]

        # Insert mid_key into this node
        self.keys.insert(child_idx, mid_key)
        self.children.insert(child_idx + 1, right)

    def insert_non_full(self, key):
        """Insert into non-full leaf node."""
        idx = len(self.keys) - 1
        if self.is_leaf:
            self.keys.append(None)
            while idx >= 0 and key < self.keys[idx]:
                self.keys[idx + 1] = self.keys[idx]
                idx -= 1
            self.keys[idx + 1] = key
        else:
            # Find child to recurse into
            while idx >= 0 and key < self.keys[idx]:
                idx -= 1
            idx += 1

            if idx < len(self.children):
                child = self.children[idx]
                if child.is_full():
                    self.split(idx, child)
                    if key > self.keys[idx]:
                        idx += 1
                self.children[idx].insert_non_full(key)

class BTree:
    """Simple B-Tree with order 3 (max 2 keys per node)."""

    def __init__(self, order=3):
        self.root = BTreeNode(order)
        self.order = order

    def search(self, key, node=None):
        """Search for key in B-Tree. Return node and index if found, else None."""
        if node is None:
            node = self.root

        idx = 0
        # Find position of key in current node
        while idx < len(node.keys) and key > node.keys[idx]:
            idx += 1

        if idx < len(node.keys) and key == node.keys[idx]:
            return (node, idx)  # Found

        if node.is_leaf:
            return None  # Not found

        # Recurse into child
        return self.search(key, node.children[idx])

    def insert(self, key):
        """Insert key into B-Tree, splitting root if necessary."""
        root = self.root

        if root.is_full():
            # Create new root if old root is full
            new_root = BTreeNode(self.order, is_leaf=False)
            new_root.children.append(root)
            self.root = new_root
            new_root.split(0, root)

        self._insert_non_full(self.root, key)

    def _insert_non_full(self, node, key):
        """Helper: insert into non-full node."""
        idx = len(node.keys) - 1

        if node.is_leaf:
            # Insert directly into sorted leaf
            node.keys.append(None)
            while idx >= 0 and key < node.keys[idx]:
                node.keys[idx + 1] = node.keys[idx]
                idx -= 1
            node.keys[idx + 1] = key
        else:
            # Find child to recurse into
            while idx >= 0 and key < node.keys[idx]:
                idx -= 1
            idx += 1

            child = node.children[idx]
            if child.is_full():
                node.split(idx, child)
                if key > node.keys[idx]:
                    idx += 1

            self._insert_non_full(node.children[idx], key)

    def range_query(self, start_key, end_key):
        """Return all keys in [start_key, end_key]."""
        results = []

        def traverse(node):
            idx = 0
            for i, key in enumerate(node.keys):
                if key >= start_key:
                    idx = i
                    break
            else:
                idx = len(node.keys)

            if not node.is_leaf:
                traverse(node.children[idx])

            for i in range(idx, len(node.keys)):
                key = node.keys[i]
                if key <= end_key:
                    results.append(key)
                    if not node.is_leaf:
                        traverse(node.children[i + 1])
                else:
                    return

        traverse(self.root)
        return sorted(set(results))

# Example usage
btree = BTree(order=3)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    btree.insert(key)

print(btree.search(6))  # Found
print(btree.search(100))  # Not found
print(btree.range_query(5, 20))  # Range [5, 20]

Algorithm Diagram:

MySQL InnoDB

MySQL InnoDB uses B+ Tree for both clustered primary key indexes and secondary indexes. The clustered index stores the entire row data in leaf nodes, while secondary indexes store only the primary key (leaf nodes point to rows via PK lookup). Default page size is 16KB; with ~500 keys per node, a 3-level B+ Tree can address millions of rows with just 3 disk reads. Leaf nodes are linked (doubly-linked list), enabling fast range scans and ORDER BY operations without re-traversing the tree.

PostgreSQL

PostgreSQL defaults to B-Tree indexing but supports other index types (HASH, BRIN, GiST). It extends B-Tree with partial indexes (WHERE clause) to index only relevant rows, and covering indexes (INCLUDE clause) to add columns without making them searchable—reducing table lookups. For sequential data (e.g., timestamps), PostgreSQL recommends BRIN (Block Range Index) to save space. Index page size matches table page size (8KB default); administrators tune shared_buffers to keep frequently accessed index pages in cache.

SQLite

SQLite uses B-Tree as the primary storage format for both table data and indexes. The entire database is a single file divided into pages (4096 bytes default). Every table is internally a B-Tree keyed by ROWID; every index is a separate B-Tree. The simple, embedded design means no separate index daemon—compaction and page management happen during queries. Configuration: PRAGMA page_size and PRAGMA cache_size tuning is common for embedded applications.

MongoDB

MongoDB’s WiredTiger engine (default since v3.2) implements B-Tree indexing. Compound indexes follow the leftmost prefix rule: an index on (a, b, c) is useful for queries on a, (a, b), but NOT for b alone. WiredTiger automatically selects the best index via a query planner. Sparse indexes (index only non-null values) save space for optional fields. Configuration: wiredTigerCacheSizeGB allocates memory for hot index/data pages.

Filesystem Implementations (ext4, NTFS)

Modern filesystems use B-Tree variants for directory structures and extent management. ext4 uses HTree (Hash Tree) for directory indexing, mapping filenames to inodes in O(log n). Extents (contiguous block ranges) are stored in extent trees to handle large files efficiently. NTFS uses a similar B-Tree for Master File Table (MFT) and file attributes. This O(log n) lookup is why filesystem performance degrades gracefully even with millions of files.

References

This post is licensed under CC BY 4.0 by the author.