Post

Heap & Priority Queue

A complete binary tree where the parent is always <= (min-heap) or >= (max-heap) its children -- gives O(1) access to the min or max element.

Heap & Priority Queue

Structure

Array representation: [5, 10, 12, 11, 22, 24, 89] Parent of index i = (i-1)/2 | Children = 2i+1, 2i+2

Operations

Operation Complexity How
Peek min/max O(1) Root element
Insert O(log n) Insert at end, bubble up
Extract min/max O(log n) Swap root <-> last, remove, sift down
Build heap O(n) Heapify from bottom up
Heap sort O(n log n) Build + N extractions

When to Use / Avoid

  • Use when you need repeated access to min or max element
  • Use for Top-K problems on streams / large datasets
  • Use for priority-based scheduling
  • Use for graph algorithms (Dijkstra, Prim)
  • Avoid when you need arbitrary random access
  • Avoid when you need to search by value (use BST or hash map)

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
class MinHeap:
    def __init__(self):
        """Initialize an empty min-heap."""
        self.heap = []

    def push(self, val):
        """Insert a value into the heap."""
        self.heap.append(val)
        self._sift_up(len(self.heap) - 1)

    def pop(self):
        """Remove and return the minimum element."""
        if not self.heap:
            return None

        # Swap root with last element
        min_val = self.heap[0]
        self.heap[0] = self.heap[-1]
        self.heap.pop()

        # Restore heap property by sifting down
        if self.heap:
            self._sift_down(0)

        return min_val

    def peek(self):
        """Return minimum element without removing it."""
        return self.heap[0] if self.heap else None

    def _sift_up(self, index):
        """Move element up the heap until heap property is restored."""
        while index > 0:
            parent_idx = (index - 1) // 2

            # If parent is greater, swap and continue
            if self.heap[parent_idx] > self.heap[index]:
                self.heap[parent_idx], self.heap[index] = (
                    self.heap[index],
                    self.heap[parent_idx],
                )
                index = parent_idx
            else:
                break

    def _sift_down(self, index):
        """Move element down the heap until heap property is restored."""
        while True:
            smallest = index
            left_child = 2 * index + 1
            right_child = 2 * index + 2

            # Compare with left child
            if left_child < len(self.heap) and self.heap[left_child] < self.heap[smallest]:
                smallest = left_child

            # Compare with right child
            if right_child < len(self.heap) and self.heap[right_child] < self.heap[smallest]:
                smallest = right_child

            # If smallest is not current index, swap and continue
            if smallest != index:
                self.heap[index], self.heap[smallest] = self.heap[smallest], self.heap[index]
                index = smallest
            else:
                break

    def __len__(self):
        return len(self.heap)

# Example usage
if __name__ == "__main__":
    heap = MinHeap()
    values = [5, 10, 3, 20, 1, 12]

    print("Inserting:", values)
    for v in values:
        heap.push(v)

    print("Extracting in order:")
    while len(heap) > 0:
        print(heap.pop(), end=" ")
    # Output: 1 3 5 10 12 20

Algorithm Flow:

OS Process Scheduler

Operating systems use priority queues to manage process scheduling. The Linux kernel’s Completely Fair Scheduler (CFS) uses a red-black tree for efficiency, but many real-time operating systems (RTOSes) and embedded systems use binary heaps. When an interrupt fires or a process yields the CPU, the OS extracts the highest-priority ready process from the heap in O(log N) time. Multi-level priority queues (e.g., VxWorks) maintain separate heaps for each priority level, ensuring low-latency scheduling for critical tasks.

Dijkstra’s Algorithm

Dijkstra’s shortest-path algorithm uses a min-heap to always expand the closest unvisited node next. When exploring a graph with V vertices and E edges, the algorithm extracts the minimum-distance node from the heap V times and relaxes up to E edges. With a binary heap, this achieves O((V + E) log V) complexity — far better than the naive O(V²) approach. Most production routing engines (Google Maps, OpenStreetMap) use heaps or advanced priority queue variants (Fibonacci heaps) for this reason.

Kafka Consumer Groups

Kafka brokers use priority queues internally to manage message consumption and partition assignment. When multiple consumers in a group request messages, the broker uses a heap to serve consumers in fairness order, ensuring no single consumer starves others. Additionally, Kafka maintains timestamp-based heaps to track lease expiration and group rebalancing events. The in-memory heap avoids expensive disk lookups during high-throughput scenarios.

Timer Wheels & Netty

High-performance networking libraries like Netty (used in Kafka, Cassandra, etc.) use hierarchical timer wheels to schedule future events (connection timeouts, request deadlines). A timer wheel is an array of “buckets,” each representing a time slice. For events that overflow (e.g., a deadline 1 hour in the future), Netty uses a heap as an overflow bucket. This design achieves O(1) insertion for typical timeouts and O(log N) for rare overflow cases — much faster than a single global heap.

Streaming Top-K Elements

Finding the K largest elements in a stream of N elements efficiently requires a min-heap of size K. The algorithm maintains a heap of the K largest values seen so far. For each new element, if it’s larger than the heap’s minimum, remove the minimum and insert the new element. This achieves O(N log K) time and O(K) space, far better than sorting all N elements. Companies like Netflix and Uber use this pattern to track top products, top users, or top error rates in real-time analytics pipelines.

References

This post is licensed under CC BY 4.0 by the author.