Post

Trie & Prefix Tree

A tree where each path from root to node represents a string prefix -- O(L) lookups where L = string length, independent of number of strings.

Trie & Prefix Tree

A tree where each path from root to node represents a string prefix — O(L) lookups where L = string length, independent of number of strings.

Structure

  • marks end-of-word nodes
  • Shared prefixes = shared nodes (space efficient for large dictionaries)

Compressed Trie (Radix Tree)

Collapses single-child chains → less memory, same lookup.

Key Properties

Property Value
Search O(L) — L = key length
Insert O(L)
Delete O(L)
Prefix search O(P + results)
Space O(ALPHABET × N × L) — can be large
vs Hash Map Slower exact lookup, but supports prefix queries

Highlights

  • Prefix queries: Find all words starting with “car” in O(P) time
  • Lexicographic order: In-order traversal gives sorted words
  • Space sharing: “cat”, “card”, “care” share “ca” prefix node
  • Fast spellcheck: Store dictionary, match char-by-char

When to Use ✅ / Avoid ❌

✅ Autocomplete / search suggestions ✅ Spell checking ✅ IP prefix matching (routing tables) ✅ Word games, dictionary lookups ❌ Simple exact-match key-value (use hash map) ❌ Memory constrained + large alphabet (use DAWG / DAFSA)

Real Systems

System How Used
Google Search Autocomplete suggestions
Aho-Corasick Multi-pattern string matching
IP Routers Longest-prefix matching (CIDR)
DNS Resolvers Domain name hierarchy lookup
Elasticsearch Term dictionary for text search

Variants

  • Ternary Search Trie: Balances space with trie lookup benefits
  • DAWG / DAFSA: Shared tails + heads (more compression)
  • Suffix Trie: Index all suffixes of a string

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
class TrieNode:
    """
    A single node in the trie.
    - children: dict mapping character to child node
    - is_end: True if this node marks the end of a valid word
    """
    def __init__(self):
        self.children = {}  # char → TrieNode
        self.is_end = False

class Trie:
    """
    Trie (prefix tree) data structure for efficient prefix queries and word storage.
    """

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """
        Insert a word into the trie.
        Time: O(L) where L = length of word
        Space: O(L) if word is not already present
        """
        node = self.root
        # Walk through each character in the word
        for char in word:
            # If this character doesn't exist as a child, create a new node
            if char not in node.children:
                node.children[char] = TrieNode()
            # Move down to the child node
            node = node.children[char]
        # Mark the final node as the end of a valid word
        node.is_end = True

    def search(self, word: str) -> bool:
        """
        Check if a complete word exists in the trie.
        Time: O(L) where L = length of word
        """
        node = self.root
        # Walk through each character
        for char in word:
            # If character doesn't exist, word is not in trie
            if char not in node.children:
                return False
            node = node.children[char]
        # Word is in trie only if the final node marks end of word
        return node.is_end

    def starts_with(self, prefix: str) -> list:
        """
        Find all words in trie that start with the given prefix.
        Time: O(P + M) where P = prefix length, M = total chars in results
        Returns: list of words with the given prefix
        """
        node = self.root
        # Walk to the end of the prefix
        for char in prefix:
            if char not in node.children:
                return []  # Prefix doesn't exist
            node = node.children[char]

        # DFS from this prefix node to collect all words below it
        words = []
        self._dfs_collect(node, prefix, words)
        return words

    def _dfs_collect(self, node: TrieNode, current: str, words: list) -> None:
        """
        Helper: DFS traversal to collect all words from current node.
        """
        if node.is_end:
            words.append(current)  # Found a complete word

        # Recursively explore all children
        for char, child in node.children.items():
            self._dfs_collect(child, current + char, words)

    def delete(self, word: str) -> bool:
        """
        Delete a word from the trie.
        Prunes empty nodes on the way back (optimization).
        Time: O(L) where L = length of word
        Returns: True if word was deleted, False if not found
        """
        def _delete_helper(node: TrieNode, word: str, idx: int) -> bool:
            # Base case: reached end of word
            if idx == len(word):
                if not node.is_end:
                    return False  # Word doesn't exist
                node.is_end = False  # Mark as not a word ending
                return len(node.children) == 0  # Return True if this node can be deleted

            char = word[idx]
            if char not in node.children:
                return False  # Word path doesn't exist

            child = node.children[char]
            # Recursively delete; should_delete = True if child has no more children
            should_delete_child = _delete_helper(child, word, idx + 1)

            # If child should be deleted, remove it
            if should_delete_child:
                del node.children[char]

            # Return True if this node has no children left
            return len(node.children) == 0 and not node.is_end

        return _delete_helper(self.root, word, 0)

# Example usage:
if __name__ == "__main__":
    trie = Trie()
    trie.insert("car")
    trie.insert("card")
    trie.insert("care")
    trie.insert("cat")

    print(trie.search("car"))  # True
    print(trie.search("ca"))   # False (prefix, not a word)
    print(trie.starts_with("ca"))  # ["car", "card", "care", "cat"]

    trie.delete("car")
    print(trie.search("car"))  # False
    print(trie.starts_with("ca"))  # ["card", "care", "cat"]

Algorithm Flow

INSERT “cat” into trie containing “car”, “cap”:

AUTOCOMPLETE “ca”: Find all words starting with “ca”:

Google Search / Bing Autocomplete

Tries power the prefix-completion backend for search suggestions. Each prefix node stores the top-K suggestions by query frequency, pre-computed so a single prefix walk returns instant results. In practice, a compressed DAFSA (Directed Acyclic Word Graph) is used to share suffix nodes too, reducing memory by 5-10x compared to a standard trie.

Linux Kernel — IP Routing (LC-Trie)

The kernel routing table uses a Level-Compressed Trie (LC-Trie) on 32-bit IP addresses. Binary trie nodes represent IP prefix bits; longest-prefix-match is O(W) where W=32. This replaced hash tables because routing policies require prefix matching (finding the longest matching CIDR block), not exact matching.

Built on a trie with added “failure links”. Allows searching for 1000 malware signatures simultaneously in one O(n) scan — used in Snort IDS, antivirus engines (ClamAV), and grep with -F flag. The trie backbone lets you build the automaton in O(sum of pattern lengths), making it essential for intrusion detection systems.

Elasticsearch Completion Suggester

Uses FST (Finite State Transducer), a trie variant that also compresses shared suffixes. Loaded entirely in JVM off-heap memory. Handles 100k+ suggest queries/sec because the entire FST fits in L3 cache for typical dictionaries, enabling sub-millisecond latency for autocomplete.

DNS Zone Trees

DNS is a distributed trie — domain labels are nodes (com → google → mail). Each DNS server owns a subtree (zone). NXDOMAIN response means the requested trie path terminates before the leaf, signaling the domain doesn’t exist.

References

This post is licensed under CC BY 4.0 by the author.