Heap & Priority Queue
A complete binary tree where the parent is always <= (min-heap) or >= (max-heap) its children -- gives O(1) access to the min or max element.
Structure
Array representation: [5, 10, 12, 11, 22, 24, 89]
Parent of index i = (i-1)/2 | Children = 2i+1, 2i+2
Operations
| Operation | Complexity | How |
|---|---|---|
| Peek min/max | O(1) | Root element |
| Insert | O(log n) | Insert at end, bubble up |
| Extract min/max | O(log n) | Swap root <-> last, remove, sift down |
| Build heap | O(n) | Heapify from bottom up |
| Heap sort | O(n log n) | Build + N extractions |
When to Use / Avoid
- Use when you need repeated access to min or max element
- Use for Top-K problems on streams / large datasets
- Use for priority-based scheduling
- Use for graph algorithms (Dijkstra, Prim)
- Avoid when you need arbitrary random access
- Avoid when you need to search by value (use BST or hash map)
Implementation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
class MinHeap:
def __init__(self):
"""Initialize an empty min-heap."""
self.heap = []
def push(self, val):
"""Insert a value into the heap."""
self.heap.append(val)
self._sift_up(len(self.heap) - 1)
def pop(self):
"""Remove and return the minimum element."""
if not self.heap:
return None
# Swap root with last element
min_val = self.heap[0]
self.heap[0] = self.heap[-1]
self.heap.pop()
# Restore heap property by sifting down
if self.heap:
self._sift_down(0)
return min_val
def peek(self):
"""Return minimum element without removing it."""
return self.heap[0] if self.heap else None
def _sift_up(self, index):
"""Move element up the heap until heap property is restored."""
while index > 0:
parent_idx = (index - 1) // 2
# If parent is greater, swap and continue
if self.heap[parent_idx] > self.heap[index]:
self.heap[parent_idx], self.heap[index] = (
self.heap[index],
self.heap[parent_idx],
)
index = parent_idx
else:
break
def _sift_down(self, index):
"""Move element down the heap until heap property is restored."""
while True:
smallest = index
left_child = 2 * index + 1
right_child = 2 * index + 2
# Compare with left child
if left_child < len(self.heap) and self.heap[left_child] < self.heap[smallest]:
smallest = left_child
# Compare with right child
if right_child < len(self.heap) and self.heap[right_child] < self.heap[smallest]:
smallest = right_child
# If smallest is not current index, swap and continue
if smallest != index:
self.heap[index], self.heap[smallest] = self.heap[smallest], self.heap[index]
index = smallest
else:
break
def __len__(self):
return len(self.heap)
# Example usage
if __name__ == "__main__":
heap = MinHeap()
values = [5, 10, 3, 20, 1, 12]
print("Inserting:", values)
for v in values:
heap.push(v)
print("Extracting in order:")
while len(heap) > 0:
print(heap.pop(), end=" ")
# Output: 1 3 5 10 12 20
Algorithm Flow:
OS Process Scheduler
Operating systems use priority queues to manage process scheduling. The Linux kernel’s Completely Fair Scheduler (CFS) uses a red-black tree for efficiency, but many real-time operating systems (RTOSes) and embedded systems use binary heaps. When an interrupt fires or a process yields the CPU, the OS extracts the highest-priority ready process from the heap in O(log N) time. Multi-level priority queues (e.g., VxWorks) maintain separate heaps for each priority level, ensuring low-latency scheduling for critical tasks.
Dijkstra’s Algorithm
Dijkstra’s shortest-path algorithm uses a min-heap to always expand the closest unvisited node next. When exploring a graph with V vertices and E edges, the algorithm extracts the minimum-distance node from the heap V times and relaxes up to E edges. With a binary heap, this achieves O((V + E) log V) complexity — far better than the naive O(V²) approach. Most production routing engines (Google Maps, OpenStreetMap) use heaps or advanced priority queue variants (Fibonacci heaps) for this reason.
Kafka Consumer Groups
Kafka brokers use priority queues internally to manage message consumption and partition assignment. When multiple consumers in a group request messages, the broker uses a heap to serve consumers in fairness order, ensuring no single consumer starves others. Additionally, Kafka maintains timestamp-based heaps to track lease expiration and group rebalancing events. The in-memory heap avoids expensive disk lookups during high-throughput scenarios.
Timer Wheels & Netty
High-performance networking libraries like Netty (used in Kafka, Cassandra, etc.) use hierarchical timer wheels to schedule future events (connection timeouts, request deadlines). A timer wheel is an array of “buckets,” each representing a time slice. For events that overflow (e.g., a deadline 1 hour in the future), Netty uses a heap as an overflow bucket. This design achieves O(1) insertion for typical timeouts and O(log N) for rare overflow cases — much faster than a single global heap.
Streaming Top-K Elements
Finding the K largest elements in a stream of N elements efficiently requires a min-heap of size K. The algorithm maintains a heap of the K largest values seen so far. For each new element, if it’s larger than the heap’s minimum, remove the minimum and insert the new element. This achieves O(N log K) time and O(K) space, far better than sorting all N elements. Companies like Netflix and Uber use this pattern to track top products, top users, or top error rates in real-time analytics pipelines.
References
- Introduction to Algorithms (CLRS) — Chapter 6: Heapsort — Canonical reference covering heap structure, operations (insert, delete-min, heapify), and O(n log n) heapsort.
- MIT OpenCourseWare — Priority Queues and Binary Heaps — Visual walkthrough of sift-up/sift-down with complexity analysis.
- Wikipedia: Binary Heap — Quick reference with array indexing formulas, pseudo-code, and variants (d-ary heaps, Fibonacci heaps).
- ByteByteGo — Heap Data Structure — Step-by-step animation of insert and extract-min operations.
- Netty Documentation: Timer Wheel Implementation — Real-world use case: how Netty combines timer wheels with heaps for efficient timeout management.