Sorting Algorithms
Choosing the right sort determines latency, memory usage, and scalability: O(n log n) comparison sorts for general in-memory data, O(n + k) linear-time sorts for structured keys, and external merge sort for data that exceeds RAM.
Key Properties
| Algorithm | Time (Avg) | Time (Worst) | Space | Stable? | In-Place? | Best For |
|---|---|---|---|---|---|---|
| Insertion Sort | O(n²) | O(n²) | O(1) | ✅ | ✅ | Small arrays, nearly sorted data, base case for hybrid sorts |
| Merge Sort | O(n log n) | O(n log n) | O(n) | ✅ | ❌ | External sort, linked lists, guaranteed performance, distributed sort |
| Quick Sort | O(n log n) | O(n²) | O(log n) | ❌ | ✅ | In-memory, cache-friendly, average fastest, most production default |
| Heap Sort | O(n log n) | O(n log n) | O(1) | ❌ | ✅ | In-place, guaranteed O(n log n), kernel/embedded systems |
| Tim Sort | O(n log n) (O(n) best) | O(n log n) | O(n) | ✅ | ❌ | Real-world data with runs, Python/Java default, adaptive sorting |
| Intro Sort | O(n log n) | O(n log n) | O(log n) | ❌ | ✅ | C++ std::sort, hybrid speed + guaranteed performance |
| Counting Sort | O(n + k) | O(n + k) | O(k) | ✅ | ❌ | Small integer range (age 1–120, scores 1–100) |
| Radix Sort | O(nk) | O(nk) | O(n+k) | ✅ | ❌ | Fixed-length integers/strings, no comparisons needed |
| Bucket Sort | O(n + k) | O(n²) | O(n + k) | ✅ | ❌ | Uniformly distributed data, float ranges |
When to Use / Avoid
✅ Use Comparison-Based Sort (O(n log n))
- General-purpose sorting — no constraints on data distribution
- Mixed data types — need to define comparator once
- Guaranteed performance — can accept O(n log n) worst-case
- Small datasets — performance difference negligible below ~10K elements
❌ Avoid Comparison-Based Sort
- Integers with small range (1–1000) — switch to counting/radix sort for O(n)
- Strict memory limit — merge sort needs O(n) extra space; use heap sort O(1) instead
- Linked lists — quick sort performs poorly; use merge sort
- Worst-case critical — quick sort has O(n²) worst case; use heap sort or intro sort
✅ Use Merge Sort
- External sorting — data doesn’t fit in RAM; optimal for k-way merge
- Stable sort required — deduplicate while sorting, preserve original order
- Linked lists — naturally suited, no random access needed
- Distributed sorting — MapReduce, Spark; built-in shuffle phase uses external merge
❌ Avoid Merge Sort
- Strict space constraints — needs O(n) temporary storage
- In-memory only — quick sort faster on arrays with good cache locality
- Partially sorted data — TimSort adapts better, runs in O(n) best case
✅ Use Tim Sort
- Real-world data — contains natural runs (already-sorted sequences)
- Language runtime — Python sorted() / list.sort(), Java Arrays.sort(objects)
- Adaptive sorting — often ~50% faster than non-adaptive O(n log n) sorts on partially sorted data
- Stability + speed — best of both worlds for production systems
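The stability point is easy to demonstrate with the built-in sorted():

```python
# Stability in action: Python's sorted() (TimSort) keeps records that
# compare equal in their original order -- here, ties on score.
records = [("alice", 90), ("bob", 85), ("carol", 90), ("dave", 85)]
by_score = sorted(records, key=lambda r: r[1])
print(by_score)
# [('bob', 85), ('dave', 85), ('alice', 90), ('carol', 90)]
```

An unstable sort (e.g. a plain quicksort) makes no such guarantee for the two 85s or the two 90s.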
✅ Use Linear-Time Sort (Counting/Radix)
- Bounded integer range — ages (1–120), scores (0–100), digits (0–9)
- Need O(n) performance — comparison sort bottleneck unacceptable
- Not comparing objects — only keys of fixed width
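A minimal counting-sort sketch, assuming non-negative integer keys bounded by k (illustrative, not a production implementation):

```python
def counting_sort(arr, k):
    """Stable counting sort for integers in [0, k]: O(n + k) time, O(n + k) space."""
    counts = [0] * (k + 1)
    for x in arr:
        counts[x] += 1
    # Prefix sums turn counts into final positions
    for i in range(1, k + 1):
        counts[i] += counts[i - 1]
    out = [0] * len(arr)
    for x in reversed(arr):  # reverse pass keeps the sort stable
        counts[x] -= 1
        out[counts[x]] = x
    return out

print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], k=9))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

Radix sort applies the same stable pass digit by digit, which is why stability of the inner sort is essential.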
✅ Use Heap Sort (Kernel/Embedded)
- Guaranteed O(n log n) — security-critical; prevents algorithmic-complexity (DoS) attacks via adversarial inputs
- Constant space O(1) — embedded systems, kernel code
- No extra allocations — strictly in-place, no malloc calls
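For reference, an in-place heapsort can be sketched in a few lines (a simplified Python illustration, not the kernel's C implementation):

```python
def heap_sort(arr):
    """In-place heapsort: O(n log n) worst case, O(1) extra space."""
    n = len(arr)

    def sift_down(root, end):
        # Push arr[root] down until the max-heap property holds in arr[:end]
        while (child := 2 * root + 1) < end:
            if child + 1 < end and arr[child + 1] > arr[child]:
                child += 1  # pick the larger child
            if arr[root] >= arr[child]:
                return
            arr[root], arr[child] = arr[child], arr[root]
            root = child

    # Build a max-heap bottom-up, then repeatedly move the max to the end
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end)
    return arr
```

No allocation happens after the input list exists, which is the property the kernel use case cares about.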
How Real Systems Use This
Python & Java Runtime (Tim Sort — Adaptive)
Python’s list.sort() and sorted() use TimSort, designed by Tim Peters in 2002. The algorithm identifies natural “runs” (maximal monotonic subsequences) in the input and merges them intelligently. For partially sorted data (which dominates real-world usage), TimSort approaches O(n) instead of O(n log n), typically a 2-3x speedup. Java 7+ uses the same algorithm for Arrays.sort(Object[]). The minimum run length (minrun) is computed from n and falls between 32 and 64; runs shorter than minrun are extended with insertion sort, reducing comparisons. Real-world benchmark: sorting a list of 1 million pre-sorted integers takes ~5ms with TimSort vs. ~50ms with quicksort. Reference: Tim Peters’ original TimSort analysis.
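To make “runs” concrete, here is a minimal sketch of run detection for non-decreasing runs (TimSort also handles strictly descending runs by reversing them in place, which this simplification omits):

```python
def find_runs(arr):
    """Split arr into maximal non-decreasing runs -- the structure TimSort exploits."""
    runs, start = [], 0
    for i in range(1, len(arr) + 1):
        # A run ends at the array boundary or where order breaks
        if i == len(arr) or arr[i] < arr[i - 1]:
            runs.append(arr[start:i])
            start = i
    return runs

print(find_runs([1, 2, 5, 3, 4, 9, 8]))  # [[1, 2, 5], [3, 4, 9], [8]]
```

A fully sorted input is a single run, which is why the best case collapses to a linear scan.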
MapReduce — External Merge Sort (Distributed)
MapReduce’s shuffle-and-sort phase uses external merge sort to handle datasets larger than any single machine’s RAM. The mapper sorts (key, value) pairs in memory using quicksort and writes the sorted spills to local disk. During the reduce phase, MapReduce k-way merges hundreds of sorted segments from different mappers into a final sorted stream. For a 1TB dataset with a 256MB sort buffer per mapper and 100 mappers: each mapper handles ~10GB and spills ~40 sorted 256MB runs, so the reducers merge on the order of 4,000 sorted segments. This keeps I/O sequential (fast) and memory bounded. Complexity: O(n log n) for the initial sort + O(n log k) for the k-way merge, where k is the number of sorted chunks. Reference: MapReduce Shuffle & Sort Architecture.
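The k-way merge primitive is easy to demonstrate in miniature with Python's heapq.merge, using in-memory lists in place of on-disk spill files:

```python
import heapq

# k-way merge of already-sorted runs -- the same primitive the reduce
# phase uses, here on small in-memory lists instead of spill files.
runs = [[1, 4, 9], [2, 3, 8], [5, 6, 7]]
merged = list(heapq.merge(*runs))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

heapq.merge consumes its inputs lazily, so only one element per run needs to be in memory at a time, which is exactly what makes the disk-backed version memory-bounded.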
PostgreSQL — Internal Quick Sort + External Merge for ORDER BY
PostgreSQL sorts rows using quicksort for in-memory data (O(n log n) average, with recursion depth limited to prevent the O(n²) worst case). When the sort exceeds work_mem (default 4MB), PostgreSQL spills sorted runs to disk and performs an external merge sort with an automatic k-way merge. For ORDER BY revenue DESC on a 1TB table: if the rows fit within work_mem, it quicksorts in memory; otherwise it spills to disk and k-way merges the sorted runs. Modern PostgreSQL (13+) adds incremental sort — if the input is already partially sorted by an index, it sorts only the unsorted segments, reducing work. Reference: PostgreSQL ORDER BY Documentation.
Apache Spark — Sort-Merge Join (Distributed)
Spark’s sortByKey() and sort-merge join use external merge sort across distributed nodes. Each partition is sorted in-memory; if it exceeds executor memory, Spark spills to disk and performs external merge. For a join of two 500GB tables: Spark sorts both sides by join key using external sort, then streams through both sorted inputs simultaneously in O(n) merge phase. This avoids bringing full tables into memory (unlike hash join). Total complexity: O(n log n) sort + O(n) merge across distributed data. Reference: Apache Spark Sort & Shuffle Architecture.
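The O(n) merge phase of a sort-merge join can be sketched as a two-pointer walk over both sorted inputs (a simplified single-machine illustration, assuming unique join keys on each side):

```python
def sort_merge_join(left, right):
    """Inner-join two (key, value) lists already sorted by key: O(n + m)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk == rk:
            out.append((lk, left[i][1], right[j][1]))
            i += 1
            j += 1
        elif lk < rk:
            i += 1  # advance the side with the smaller key
        else:
            j += 1
    return out
```

Neither input is ever held fully in memory by the streaming version of this loop, which is the advantage over hash join for very large tables.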
Linux Kernel — Heap Sort (Safety-Critical)
The Linux kernel uses heapsort in lib/sort.c, not quicksort, because the kernel prioritizes guaranteed O(n log n) worst-case performance and constant O(1) extra space. Quicksort’s O(n²) worst-case could be exploited by malicious userspace to trigger kernel slowdown (denial-of-service). HeapSort provides an upper bound guarantee and is fully in-place, requiring no malloc (critical in kernel context). For sorting 1000 inode numbers during a filesystem operation, heapsort guarantees completion in ~10,000 comparisons; quicksort could require 1,000,000. Reference: Linux Kernel Sort Implementation.
C++ Standard Library — Intro Sort (Hybrid for Speed + Safety)
C++’s std::sort() uses introsort (introspective sort), a hybrid of quicksort + heapsort + insertion sort. It starts with quicksort (fast average case, cache-friendly), monitors recursion depth, and switches to heapsort if depth exceeds 2*log(n) (preventing O(n²) worst-case). Base case (arrays <16 elements) uses insertion sort (fewer comparisons, better cache). Real-world impact: sorting 10 million random integers runs in ~500ms with introsort vs. ~600ms with pure quicksort. Reference: IntroSort Overview.
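A toy version of the depth-limit idea: quicksort that falls back to a heap-based sort once recursion depth exceeds 2·log2(n). This is a sketch, not std::sort — the insertion-sort base case is omitted, and the fallback sorts a copied slice via heapq rather than heapifying in place:

```python
import heapq
import math

def intro_sort(arr):
    """Introsort sketch: depth-limited quicksort with a heapsort fallback."""
    def heap_sort_slice(lo, hi):
        # Fallback path: guaranteed O(n log n) regardless of input order
        items = arr[lo:hi + 1]
        heapq.heapify(items)
        arr[lo:hi + 1] = [heapq.heappop(items) for _ in range(len(items))]

    def sort(lo, hi, depth):
        if lo >= hi:
            return
        if depth == 0:
            heap_sort_slice(lo, hi)  # recursion got too deep: bail out
            return
        pivot = arr[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:  # Hoare-style partition around the pivot value
            while arr[i] < pivot:
                i += 1
            while arr[j] > pivot:
                j -= 1
            if i <= j:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
                j -= 1
        sort(lo, j, depth - 1)
        sort(i, hi, depth - 1)

    if arr:
        sort(0, len(arr) - 1, 2 * int(math.log2(len(arr))))
    return arr
```

Adversarial inputs that would drive plain quicksort to O(n²) instead hit the depth limit and finish in O(n log n).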
Implementation
Merge Sort with External Sort Support
```python
def merge_sort(arr, left=0, right=None):
    """
    Stable merge sort with O(n log n) guaranteed time and O(n) space.

    Args:
        arr: list to sort (modified in place)
        left: start index
        right: end index (inclusive)

    Why merge sort:
    - Guaranteed O(n log n) — no worst-case degradation
    - Stable — preserves order of equal elements (important for deduplication)
    - Natural for external sort — two sorted files merge trivially
    """
    if right is None:
        right = len(arr) - 1
    if left < right:
        mid = (left + right) // 2
        merge_sort(arr, left, mid)
        merge_sort(arr, mid + 1, right)
        merge(arr, left, mid, right)


def merge(arr, left, mid, right):
    """
    Merge two sorted subarrays arr[left:mid+1] and arr[mid+1:right+1].
    Time: O(n), Space: O(n) for temporary storage.
    """
    left_part = arr[left:mid + 1]
    right_part = arr[mid + 1:right + 1]
    i = j = 0
    k = left
    # Two-pointer merge: pick the smaller element each iteration
    while i < len(left_part) and j < len(right_part):
        if left_part[i] <= right_part[j]:  # <= maintains stability
            arr[k] = left_part[i]
            i += 1
        else:
            arr[k] = right_part[j]
            j += 1
        k += 1
    # Copy remaining elements — use a bounded slice to avoid overwriting
    # beyond the merge region (critical when sorting subarrays)
    remaining = left_part[i:] + right_part[j:]
    arr[k:k + len(remaining)] = remaining


# Example usage
if __name__ == "__main__":
    data = [38, 27, 43, 3, 9, 82, 10]
    merge_sort(data)
    print(f"Sorted: {data}")  # [3, 9, 10, 27, 38, 43, 82]
```
External Merge Sort (Big Data)
```python
import heapq
import os


def external_merge_sort(input_file, output_file, chunk_size=1000, temp_dir="/tmp"):
    """
    External merge sort for files larger than RAM.

    Algorithm:
    1. Read chunk_size lines, sort in-memory, write to a temp file
    2. Perform a k-way merge on all temp files using a min-heap

    Time: O(n log n), Space: O(chunk_size)

    Args:
        input_file: path to unsorted data (one number per line)
        output_file: path to sorted output
        chunk_size: lines to hold in memory (tune based on available RAM)
        temp_dir: directory for temporary sorted chunks
    """
    # Phase 1: Split and sort chunks
    chunk_files = []
    with open(input_file, 'r') as f:
        chunk_num = 0
        chunk = []
        for line in f:
            chunk.append(int(line.strip()))
            if len(chunk) >= chunk_size:
                # Sort chunk in-memory and write to disk
                chunk.sort()
                chunk_path = os.path.join(temp_dir, f"chunk_{chunk_num}.txt")
                with open(chunk_path, 'w') as cf:
                    cf.write('\n'.join(map(str, chunk)))
                chunk_files.append(chunk_path)
                chunk_num += 1
                chunk = []
        # Don't forget the last partial chunk
        if chunk:
            chunk.sort()
            chunk_path = os.path.join(temp_dir, f"chunk_{chunk_num}.txt")
            with open(chunk_path, 'w') as cf:
                cf.write('\n'.join(map(str, chunk)))
            chunk_files.append(chunk_path)

    # Phase 2: K-way merge using a min-heap of (value, chunk_index, file_handle)
    file_handles = [open(f, 'r') for f in chunk_files]
    heap = []
    # Initialize the heap with the first element from each chunk
    for idx, fh in enumerate(file_handles):
        line = fh.readline().strip()
        if line:
            heapq.heappush(heap, (int(line), idx, fh))
    # Merge: repeatedly pop the minimum, then push the next element from that chunk
    with open(output_file, 'w') as out:
        while heap:
            val, chunk_idx, fh = heapq.heappop(heap)
            out.write(f"{val}\n")
            line = fh.readline().strip()
            if line:
                heapq.heappush(heap, (int(line), chunk_idx, fh))

    # Cleanup
    for fh in file_handles:
        fh.close()
    for chunk_file in chunk_files:
        os.remove(chunk_file)
    print(f"External sort complete: {output_file}")
```
Quick Sort for In-Memory Performance
```python
def quick_sort(arr, left=0, right=None):
    """
    Quick sort with average O(n log n) and cache-friendly performance.
    Worst-case O(n²) mitigated by median-of-three pivot selection.

    Why quicksort:
    - Fastest average case for in-memory data
    - In-place (O(log n) extra for the recursion stack)
    - Excellent cache locality (partitioning is sequential)
    - Production default: std::sort in C++, Arrays.sort(primitives) in Java
    """
    if right is None:
        right = len(arr) - 1
    if left < right:
        # Partition and get the pivot index
        pivot_idx = partition(arr, left, right)
        # Recursively sort the left and right partitions
        quick_sort(arr, left, pivot_idx - 1)
        quick_sort(arr, pivot_idx + 1, right)


def partition(arr, left, right):
    """
    Partition the array around a pivot (median-of-three to avoid the worst case).
    Time: O(n), single pass through the subarray.
    """
    # Median-of-three: choose the pivot from first, middle, last
    mid = (left + right) // 2
    pivot_idx = median_of_three_idx(arr, left, mid, right)
    # Swap the pivot to the end
    arr[pivot_idx], arr[right] = arr[right], arr[pivot_idx]
    pivot_val = arr[right]
    # Partition: elements < pivot on the left, >= on the right
    i = left - 1
    for j in range(left, right):
        if arr[j] < pivot_val:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    # Place the pivot in its final position
    arr[i + 1], arr[right] = arr[right], arr[i + 1]
    return i + 1


def median_of_three_idx(arr, a, b, c):
    """Return the index of the median of arr[a], arr[b], arr[c]."""
    if arr[a] <= arr[b]:
        if arr[b] <= arr[c]:
            return b
        elif arr[a] <= arr[c]:
            return c
        else:
            return a
    else:
        if arr[a] <= arr[c]:
            return a
        elif arr[b] <= arr[c]:
            return c
        else:
            return b
```
Sorting Decision Tree
Problem: I need to sort N elements. Which algorithm?
- Does data fit in memory?
- No → External merge sort (MapReduce, Spark, databases)
- Yes → Continue
- Are the keys small integers in a bounded range [0, k]?
- Yes, k < 1000 → Counting sort O(n + k)
- Yes, fixed-width → Radix sort O(nk)
- No → Continue
- Can I afford O(n) extra space?
- Yes → Tim sort (Python/Java) or Merge sort (stable, linked lists)
- No → Continue
- Worst-case guarantee critical? (kernel, security, embedded)
- Yes → Heap sort O(n log n), O(1) space
- No → Intro sort (C++ std::sort) or Quick sort
- Data naturally partially sorted?
- Yes → Tim sort (O(n) best case)
- No → Quick sort (O(n log n) average, cache-friendly)
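The tree above can be condensed into a small helper (purely illustrative; the function name and flags are made up for this sketch, and it mirrors the tree's flow exactly):

```python
def choose_sort(fits_in_memory, small_int_range, extra_space_ok,
                worst_case_critical, partially_sorted):
    """Encode the sorting decision tree as successive early returns."""
    if not fits_in_memory:
        return "external merge sort"
    if small_int_range:
        return "counting/radix sort"
    if extra_space_ok:
        return "timsort or merge sort"
    if worst_case_critical:
        return "heap sort"
    return "timsort" if partially_sorted else "introsort or quicksort"
```

For example, choose_sort(True, False, False, True, False) lands on heap sort, matching the kernel/embedded branch.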
References
- Tim Peters’ TimSort Algorithm Analysis — Original design document; explains run detection, galloping mode, and why O(n) best case is possible.
- On the Worst-Case Complexity of TimSort — Munro & Wild (2018) — Mathematical proof of TimSort’s O(n log n) worst-case; practical implications for production sorting.
- A Fast Sorting Algorithm — Musser (1997, IntroSort) — Introduces introsort; explains hybrid quicksort+heapsort approach used by C++ std::sort.
- ByteByteGo — Sorting Algorithms Explained — Visual walkthrough of merge sort, quick sort, and when to use each; production examples.
- Wikipedia: Sorting Algorithm — Comprehensive comparison table; stable vs. unstable, best/worst/average complexity.
- Google MapReduce Paper: Shuffle and Sort Phase — Describes external merge sort in distributed context; k-way merge optimization.
- Linux Kernel Sort Implementation — Heapsort in kernel; explains why guaranteed O(n log n) is security-critical.
- PostgreSQL ORDER BY Documentation — Incremental sort, spill to disk, work_mem tuning.