Trie & Prefix Tree
A tree where each path from root to node represents a string prefix -- O(L) lookups where L = string length, independent of number of strings.
A tree where each path from root to node represents a string prefix — O(L) lookups where L = string length, independent of number of strings.
Structure
- ✓ marks end-of-word nodes
- Shared prefixes = shared nodes (space efficient for large dictionaries)
Compressed Trie (Radix Tree)
Collapses single-child chains → less memory, same lookup.
Key Properties
| Property | Value |
|---|---|
| Search | O(L) — L = key length |
| Insert | O(L) |
| Delete | O(L) |
| Prefix search | O(P + results) |
| Space | O(ALPHABET × N × L) — can be large |
| vs Hash Map | Slower exact lookup, but supports prefix queries |
Highlights
- Prefix queries: Find all words starting with “car” in O(P) time
- Lexicographic order: In-order traversal gives sorted words
- Space sharing: “cat”, “card”, “care” share “ca” prefix node
- Fast spellcheck: Store dictionary, match char-by-char
When to Use ✅ / Avoid ❌
✅ Autocomplete / search suggestions ✅ Spell checking ✅ IP prefix matching (routing tables) ✅ Word games, dictionary lookups ❌ Simple exact-match key-value (use hash map) ❌ Memory constrained + large alphabet (use DAWG / DAFSA)
Real Systems
| System | How Used |
|---|---|
| Google Search | Autocomplete suggestions |
| Aho-Corasick | Multi-pattern string matching |
| IP Routers | Longest-prefix matching (CIDR) |
| DNS Resolvers | Domain name hierarchy lookup |
| Elasticsearch | Term dictionary for text search |
Variants
- Ternary Search Trie: Balances space with trie lookup benefits
- DAWG / DAFSA: Shared tails + heads (more compression)
- Suffix Trie: Index all suffixes of a string
Implementation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
class TrieNode:
"""
A single node in the trie.
- children: dict mapping character to child node
- is_end: True if this node marks the end of a valid word
"""
def __init__(self):
self.children = {} # char → TrieNode
self.is_end = False
class Trie:
"""
Trie (prefix tree) data structure for efficient prefix queries and word storage.
"""
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
"""
Insert a word into the trie.
Time: O(L) where L = length of word
Space: O(L) if word is not already present
"""
node = self.root
# Walk through each character in the word
for char in word:
# If this character doesn't exist as a child, create a new node
if char not in node.children:
node.children[char] = TrieNode()
# Move down to the child node
node = node.children[char]
# Mark the final node as the end of a valid word
node.is_end = True
def search(self, word: str) -> bool:
"""
Check if a complete word exists in the trie.
Time: O(L) where L = length of word
"""
node = self.root
# Walk through each character
for char in word:
# If character doesn't exist, word is not in trie
if char not in node.children:
return False
node = node.children[char]
# Word is in trie only if the final node marks end of word
return node.is_end
def starts_with(self, prefix: str) -> list:
"""
Find all words in trie that start with the given prefix.
Time: O(P + M) where P = prefix length, M = total chars in results
Returns: list of words with the given prefix
"""
node = self.root
# Walk to the end of the prefix
for char in prefix:
if char not in node.children:
return [] # Prefix doesn't exist
node = node.children[char]
# DFS from this prefix node to collect all words below it
words = []
self._dfs_collect(node, prefix, words)
return words
def _dfs_collect(self, node: TrieNode, current: str, words: list) -> None:
"""
Helper: DFS traversal to collect all words from current node.
"""
if node.is_end:
words.append(current) # Found a complete word
# Recursively explore all children
for char, child in node.children.items():
self._dfs_collect(child, current + char, words)
def delete(self, word: str) -> bool:
"""
Delete a word from the trie.
Prunes empty nodes on the way back (optimization).
Time: O(L) where L = length of word
Returns: True if word was deleted, False if not found
"""
def _delete_helper(node: TrieNode, word: str, idx: int) -> bool:
# Base case: reached end of word
if idx == len(word):
if not node.is_end:
return False # Word doesn't exist
node.is_end = False # Mark as not a word ending
return len(node.children) == 0 # Return True if this node can be deleted
char = word[idx]
if char not in node.children:
return False # Word path doesn't exist
child = node.children[char]
# Recursively delete; should_delete = True if child has no more children
should_delete_child = _delete_helper(child, word, idx + 1)
# If child should be deleted, remove it
if should_delete_child:
del node.children[char]
# Return True if this node has no children left
return len(node.children) == 0 and not node.is_end
return _delete_helper(self.root, word, 0)
# Example usage:
if __name__ == "__main__":
trie = Trie()
trie.insert("car")
trie.insert("card")
trie.insert("care")
trie.insert("cat")
print(trie.search("car")) # True
print(trie.search("ca")) # False (prefix, not a word)
print(trie.starts_with("ca")) # ["car", "card", "care", "cat"]
trie.delete("car")
print(trie.search("car")) # False
print(trie.starts_with("ca")) # ["card", "care", "cat"]
Algorithm Flow
INSERT “cat” into trie containing “car”, “cap”:
AUTOCOMPLETE “ca”: Find all words starting with “ca”:
Google Search / Bing Autocomplete
Tries power the prefix-completion backend for search suggestions. Each prefix node stores the top-K suggestions by query frequency, pre-computed so a single prefix walk returns instant results. In practice, a compressed DAFSA (Directed Acyclic Word Graph) is used to share suffix nodes too, reducing memory by 5-10x compared to a standard trie.
Linux Kernel — IP Routing (LC-Trie)
The kernel routing table uses a Level-Compressed Trie (LC-Trie) on 32-bit IP addresses. Binary trie nodes represent IP prefix bits; longest-prefix-match is O(W) where W=32. This replaced hash tables because routing policies require prefix matching (finding the longest matching CIDR block), not exact matching.
Aho-Corasick (Multi-Pattern Search)
Built on a trie with added “failure links”. Allows searching for 1000 malware signatures simultaneously in one O(n) scan — used in Snort IDS, antivirus engines (ClamAV), and grep with -F flag. The trie backbone lets you build the automaton in O(sum of pattern lengths), making it essential for intrusion detection systems.
Elasticsearch Completion Suggester
Uses FST (Finite State Transducer), a trie variant that also compresses shared suffixes. Loaded entirely in JVM off-heap memory. Handles 100k+ suggest queries/sec because the entire FST fits in L3 cache for typical dictionaries, enabling sub-millisecond latency for autocomplete.
DNS Zone Trees
DNS is a distributed trie — domain labels are nodes (com → google → mail). Each DNS server owns a subtree (zone). NXDOMAIN response means the requested trie path terminates before the leaf, signaling the domain doesn’t exist.
References
- 📄 Fredkin 1960 — Trie Memory — the original paper coining the term “trie”
- 📄 Aho & Corasick 1975 — Efficient String Matching — trie + failure links for multi-pattern matching
- 🎥 ByteByteGo — Trie Explained — visual walkthrough with autocomplete use case
- 🎥 NeetCode — Implement Trie — coding implementation walkthrough
- 📖 Wikipedia: Trie — variants: PATRICIA, DAWG, HAT-trie
- 📖 Wikipedia: Aho-Corasick — multi-pattern extension