Trie & Prefix Tree

A tree where each path from root to node represents a string prefix -- O(L) lookups where L = string length, independent of number of strings.

Posted Nov 1, 2023

6 min read

Trie & Prefix Tree

A tree where each path from root to node represents a string prefix — O(L) lookups where L = string length, independent of number of strings.

Structure

✓ marks end-of-word nodes
Shared prefixes = shared nodes (space efficient for large dictionaries)

Compressed Trie (Radix Tree)

Collapses single-child chains → less memory, same lookup.

Key Properties

Property	Value
Search	O(L) — L = key length
Insert	O(L)
Delete	O(L)
Prefix search	O(P + results)
Space	O(ALPHABET × N × L) — can be large
vs Hash Map	Slower exact lookup, but supports prefix queries

Highlights

Prefix queries: Find all words starting with “car” in O(P) time
Lexicographic order: In-order traversal gives sorted words
Space sharing: “cat”, “card”, “care” share “ca” prefix node
Fast spellcheck: Store dictionary, match char-by-char

When to Use ✅ / Avoid ❌

✅ Autocomplete / search suggestions ✅ Spell checking ✅ IP prefix matching (routing tables) ✅ Word games, dictionary lookups ❌ Simple exact-match key-value (use hash map) ❌ Memory constrained + large alphabet (use DAWG / DAFSA)

Real Systems

System	How Used
Google Search	Autocomplete suggestions
Aho-Corasick	Multi-pattern string matching
IP Routers	Longest-prefix matching (CIDR)
DNS Resolvers	Domain name hierarchy lookup
Elasticsearch	Term dictionary for text search

Variants

Ternary Search Trie: Balances space with trie lookup benefits
DAWG / DAFSA: Shared tails + heads (more compression)
Suffix Trie: Index all suffixes of a string

Implementation

        
      
class TrieNode:
    """
    A single node in the trie.
    - children: dict mapping character to child node
    - is_end: True if this node marks the end of a valid word
    """
    def __init__(self):
        self.children = {}  # char → TrieNode
        self.is_end = False

class Trie:
    """
    Trie (prefix tree) data structure for efficient prefix queries and word storage.
    """

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """
        Insert a word into the trie.
        Time: O(L) where L = length of word
        Space: O(L) if word is not already present
        """
        node = self.root
        # Walk through each character in the word
        for char in word:
            # If this character doesn't exist as a child, create a new node
            if char not in node.children:
                node.children[char] = TrieNode()
            # Move down to the child node
            node = node.children[char]
        # Mark the final node as the end of a valid word
        node.is_end = True

    def search(self, word: str) -> bool:
        """
        Check if a complete word exists in the trie.
        Time: O(L) where L = length of word
        """
        node = self.root
        # Walk through each character
        for char in word:
            # If character doesn't exist, word is not in trie
            if char not in node.children:
                return False
            node = node.children[char]
        # Word is in trie only if the final node marks end of word
        return node.is_end

    def starts_with(self, prefix: str) -> list:
        """
        Find all words in trie that start with the given prefix.
        Time: O(P + M) where P = prefix length, M = total chars in results
        Returns: list of words with the given prefix
        """
        node = self.root
        # Walk to the end of the prefix
        for char in prefix:
            if char not in node.children:
                return []  # Prefix doesn't exist
            node = node.children[char]

        # DFS from this prefix node to collect all words below it
        words = []
        self._dfs_collect(node, prefix, words)
        return words

    def _dfs_collect(self, node: TrieNode, current: str, words: list) -> None:
        """
        Helper: DFS traversal to collect all words from current node.
        """
        if node.is_end:
            words.append(current)  # Found a complete word

        # Recursively explore all children
        for char, child in node.children.items():
            self._dfs_collect(child, current + char, words)

    def delete(self, word: str) -> bool:
        """
        Delete a word from the trie.
        Prunes empty nodes on the way back (optimization).
        Time: O(L) where L = length of word
        Returns: True if word was deleted, False if not found
        """
        def _delete_helper(node: TrieNode, word: str, idx: int) -> bool:
            # Base case: reached end of word
            if idx == len(word):
                if not node.is_end:
                    return False  # Word doesn't exist
                node.is_end = False  # Mark as not a word ending
                return len(node.children) == 0  # Return True if this node can be deleted

            char = word[idx]
            if char not in node.children:
                return False  # Word path doesn't exist

            child = node.children[char]
            # Recursively delete; should_delete = True if child has no more children
            should_delete_child = _delete_helper(child, word, idx + 1)

            # If child should be deleted, remove it
            if should_delete_child:
                del node.children[char]

            # Return True if this node has no children left
            return len(node.children) == 0 and not node.is_end

        return _delete_helper(self.root, word, 0)

# Example usage:
if __name__ == "__main__":
    trie = Trie()
    trie.insert("car")
    trie.insert("card")
    trie.insert("care")
    trie.insert("cat")

    print(trie.search("car"))  # True
    print(trie.search("ca"))   # False (prefix, not a word)
    print(trie.starts_with("ca"))  # ["car", "card", "care", "cat"]

    trie.delete("car")
    print(trie.search("car"))  # False
    print(trie.starts_with("ca"))  # ["card", "care", "cat"]

Algorithm Flow

INSERT “cat” into trie containing “car”, “cap”:

AUTOCOMPLETE “ca”: Find all words starting with “ca”:

Google Search / Bing Autocomplete

Tries power the prefix-completion backend for search suggestions. Each prefix node stores the top-K suggestions by query frequency, pre-computed so a single prefix walk returns instant results. In practice, a compressed DAFSA (Directed Acyclic Word Graph) is used to share suffix nodes too, reducing memory by 5-10x compared to a standard trie.

Linux Kernel — IP Routing (LC-Trie)

The kernel routing table uses a Level-Compressed Trie (LC-Trie) on 32-bit IP addresses. Binary trie nodes represent IP prefix bits; longest-prefix-match is O(W) where W=32. This replaced hash tables because routing policies require prefix matching (finding the longest matching CIDR block), not exact matching.

Aho-Corasick (Multi-Pattern Search)

Built on a trie with added “failure links”. Allows searching for 1000 malware signatures simultaneously in one O(n) scan — used in Snort IDS, antivirus engines (ClamAV), and grep with -F flag. The trie backbone lets you build the automaton in O(sum of pattern lengths), making it essential for intrusion detection systems.

Elasticsearch Completion Suggester

Uses FST (Finite State Transducer), a trie variant that also compresses shared suffixes. Loaded entirely in JVM off-heap memory. Handles 100k+ suggest queries/sec because the entire FST fits in L3 cache for typical dictionaries, enabling sub-millisecond latency for autocomplete.

DNS Zone Trees

DNS is a distributed trie — domain labels are nodes (com → google → mail). Each DNS server owns a subtree (zone). NXDOMAIN response means the requested trie path terminates before the leaf, signaling the domain doesn’t exist.

References

📄 Fredkin 1960 — Trie Memory — the original paper coining the term “trie”
📄 Aho & Corasick 1975 — Efficient String Matching — trie + failure links for multi-pattern matching
🎥 ByteByteGo — Trie Explained — visual walkthrough with autocomplete use case
🎥 NeetCode — Implement Trie — coding implementation walkthrough
📖 Wikipedia: Trie — variants: PATRICIA, DAWG, HAT-trie
📖 Wikipedia: Aho-Corasick — multi-pattern extension

Software Architecture, Data Structures

algorithms

This post is licensed under CC BY 4.0 by the author.