B-Tree & B+ Tree

Self-balancing tree keeping data sorted with O(log n) reads/writes -- the dominant index structure in relational databases.

Posted Aug 1, 2023

7 min read

B-Tree & B+ Tree

B-Tree Node Structure

Every node holds multiple keys (minimises tree height -> fewer disk reads)
All nodes store data pointers (B-Tree) or only leaves do (B+ Tree)

B-Tree vs B+ Tree

Dimension	B-Tree	B+ Tree
Data in internal nodes	Yes	No (keys only)
Range queries	Slower	Fast (leaf linked list)
Tree height	Slightly shorter	Slightly taller
Used in	General purpose	Databases (MySQL InnoDB, Postgres)

Key Properties

Property	Value
Search	O(log n)
Insert	O(log n) + rebalance
Delete	O(log n) + rebalance
Order (t)	Max 2t-1 keys per node
Fan-out	High (fits many keys per disk page)

When to Use / Avoid

Use for read-heavy workloads with mixed reads/writes
Use for range queries (B+ Tree leaf linked list)
Use for standard relational DB indexing
Avoid for write-heavy append workloads (use LSM Tree)
Avoid for in-memory only structures (use Red-Black Tree / Skip List)

Implementation

        
      
class BTreeNode:
    """Single B-Tree node (order 3 for simplicity: max 2 keys per node)."""

    def __init__(self, order=3, is_leaf=True):
        self.keys = []  # List of keys
        self.children = []  # List of child pointers (empty if leaf)
        self.order = order  # Max keys = order - 1
        self.is_leaf = is_leaf

    def is_full(self):
        """Check if node has reached max keys."""
        return len(self.keys) >= self.order - 1

    def split(self, child_idx, child):
        """Split full child at index child_idx into two nodes."""
        mid_idx = (self.order - 1) // 2
        mid_key = child.keys[mid_idx]

        # Create right sibling
        right = BTreeNode(self.order, child.is_leaf)
        right.keys = child.keys[mid_idx + 1:]
        if not child.is_leaf:
            right.children = child.children[mid_idx + 1:]

        # Shrink left (original) child
        child.keys = child.keys[:mid_idx]
        if not child.is_leaf:
            child.children = child.children[:mid_idx + 1]

        # Insert mid_key into this node
        self.keys.insert(child_idx, mid_key)
        self.children.insert(child_idx + 1, right)

    def insert_non_full(self, key):
        """Insert into non-full leaf node."""
        idx = len(self.keys) - 1
        if self.is_leaf:
            self.keys.append(None)
            while idx >= 0 and key < self.keys[idx]:
                self.keys[idx + 1] = self.keys[idx]
                idx -= 1
            self.keys[idx + 1] = key
        else:
            # Find child to recurse into
            while idx >= 0 and key < self.keys[idx]:
                idx -= 1
            idx += 1

            if idx < len(self.children):
                child = self.children[idx]
                if child.is_full():
                    self.split(idx, child)
                    if key > self.keys[idx]:
                        idx += 1
                self.children[idx].insert_non_full(key)

class BTree:
    """Simple B-Tree with order 3 (max 2 keys per node)."""

    def __init__(self, order=3):
        self.root = BTreeNode(order)
        self.order = order

    def search(self, key, node=None):
        """Search for key in B-Tree. Return node and index if found, else None."""
        if node is None:
            node = self.root

        idx = 0
        # Find position of key in current node
        while idx < len(node.keys) and key > node.keys[idx]:
            idx += 1

        if idx < len(node.keys) and key == node.keys[idx]:
            return (node, idx)  # Found

        if node.is_leaf:
            return None  # Not found

        # Recurse into child
        return self.search(key, node.children[idx])

    def insert(self, key):
        """Insert key into B-Tree, splitting root if necessary."""
        root = self.root

        if root.is_full():
            # Create new root if old root is full
            new_root = BTreeNode(self.order, is_leaf=False)
            new_root.children.append(root)
            self.root = new_root
            new_root.split(0, root)

        self._insert_non_full(self.root, key)

    def _insert_non_full(self, node, key):
        """Helper: insert into non-full node."""
        idx = len(node.keys) - 1

        if node.is_leaf:
            # Insert directly into sorted leaf
            node.keys.append(None)
            while idx >= 0 and key < node.keys[idx]:
                node.keys[idx + 1] = node.keys[idx]
                idx -= 1
            node.keys[idx + 1] = key
        else:
            # Find child to recurse into
            while idx >= 0 and key < node.keys[idx]:
                idx -= 1
            idx += 1

            child = node.children[idx]
            if child.is_full():
                node.split(idx, child)
                if key > node.keys[idx]:
                    idx += 1

            self._insert_non_full(node.children[idx], key)

    def range_query(self, start_key, end_key):
        """Return all keys in [start_key, end_key]."""
        results = []

        def traverse(node):
            idx = 0
            for i, key in enumerate(node.keys):
                if key >= start_key:
                    idx = i
                    break
            else:
                idx = len(node.keys)

            if not node.is_leaf:
                traverse(node.children[idx])

            for i in range(idx, len(node.keys)):
                key = node.keys[i]
                if key <= end_key:
                    results.append(key)
                    if not node.is_leaf:
                        traverse(node.children[i + 1])
                else:
                    return

        traverse(self.root)
        return sorted(set(results))

# Example usage
btree = BTree(order=3)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    btree.insert(key)

print(btree.search(6))  # Found
print(btree.search(100))  # Not found
print(btree.range_query(5, 20))  # Range [5, 20]

Algorithm Diagram:

MySQL InnoDB

MySQL InnoDB uses B+ Tree for both clustered primary key indexes and secondary indexes. The clustered index stores the entire row data in leaf nodes, while secondary indexes store only the primary key (leaf nodes point to rows via PK lookup). Default page size is 16KB; with ~500 keys per node, a 3-level B+ Tree can address millions of rows with just 3 disk reads. Leaf nodes are linked (doubly-linked list), enabling fast range scans and ORDER BY operations without re-traversing the tree.

PostgreSQL

PostgreSQL defaults to B-Tree indexing but supports other index types (HASH, BRIN, GiST). It extends B-Tree with partial indexes (WHERE clause) to index only relevant rows, and covering indexes (INCLUDE clause) to add columns without making them searchable—reducing table lookups. For sequential data (e.g., timestamps), PostgreSQL recommends BRIN (Block Range Index) to save space. Index page size matches table page size (8KB default); administrators tune shared_buffers to keep frequently accessed index pages in cache.

SQLite

SQLite uses B-Tree as the primary storage format for both table data and indexes. The entire database is a single file divided into pages (4096 bytes default). Every table is internally a B-Tree keyed by ROWID; every index is a separate B-Tree. The simple, embedded design means no separate index daemon—compaction and page management happen during queries. Configuration: PRAGMA page_size and PRAGMA cache_size tuning is common for embedded applications.

MongoDB

MongoDB’s WiredTiger engine (default since v3.2) implements B-Tree indexing. Compound indexes follow the leftmost prefix rule: an index on (a, b, c) is useful for queries on a, (a, b), but NOT for b alone. WiredTiger automatically selects the best index via a query planner. Sparse indexes (index only non-null values) save space for optional fields. Configuration: wiredTigerCacheSizeGB allocates memory for hot index/data pages.

Filesystem Implementations (ext4, NTFS)

Modern filesystems use B-Tree variants for directory structures and extent management. ext4 uses HTree (Hash Tree) for directory indexing, mapping filenames to inodes in O(log n). Extents (contiguous block ranges) are stored in extent trees to handle large files efficiently. NTFS uses a similar B-Tree for Master File Table (MFT) and file attributes. This O(log n) lookup is why filesystem performance degrades gracefully even with millions of files.

References

Bayer & McCreight (1972) — Organization and Maintenance of Large Ordered Indexes — Original B-Tree paper introducing self-balancing property and fan-out optimization
CLRS Introduction to Algorithms, Chapter 18 — Rigorous treatment of B-Tree insertions, deletions, and complexity analysis
CMU Database Systems Lecture: B-Trees — Walkthrough of split, balance, range scan operations
Use The Index, Luke! — B-Tree Indexing — Practical database indexing patterns and common mistakes
Wikipedia: B-Tree — Quick reference with visual examples of splits and node structure

Software Architecture, Data Structures

algorithms

This post is licensed under CC BY 4.0 by the author.