Binary Search & Variations
Divide search space in half each step -- O(log n) on any sorted or monotonic space. Far more powerful than sorted array lookups: query optimization, version control, logging systems, and rate limiting.
Classic Binary Search
Problem: Find a target value in a sorted array.
Algorithm:
1
2
3
4
5
6
7
8
9
10
left, right = 0, len(arr) - 1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
return mid # Found!
elif arr[mid] < target:
left = mid + 1 # Search right half
else:
right = mid - 1 # Search left half
return -1 # Not found
| Complexity: O(log n) time | O(1) space |
Key invariant: After each step, target (if it exists) is in [left, right].
Variations — When to Use Each
| Variation | What It Solves | Key Change | Example |
|---|---|---|---|
| Classic | Find exact value in sorted array | Return on == |
Find key in B-Tree node |
| Leftmost | Find first occurrence (duplicates) | When found: right = mid - 1 |
Find start of range in database |
| Rightmost | Find last occurrence (duplicates) | When found: left = mid + 1 |
Find end of range in database |
| On answer space | “Minimum/maximum X satisfying condition” | Binary search on result range, not array | Minimum bandwidth to serve load |
| Rotated array | Sorted array rotated at unknown pivot | Check which half is sorted | Search in rotated sorted array |
| 2D matrix | Row × col as linear index | mid maps to [mid//cols][mid%cols] |
Search in sorted matrix |
Binary Search on Answer Space (Most Useful in System Design)
Pattern: You want to find the minimum (or maximum) value of X such that some condition holds.
Approach:
- Define the search space: [lo, hi] = possible range of X values
- For candidate X, write a feasibility check: can we achieve the goal with this X?
- If feasibility check returns True, try smaller X (right = mid - 1); else try larger X (left = mid + 1)
- Answer is the leftmost (or rightmost) X that passes
Example: Minimum Bandwidth for Request Service
Problem: Given N requests arriving over time T, what is the minimum bandwidth B needed to serve all requests?
Approach:
1
2
3
4
5
6
7
8
9
10
lo = 0, hi = MAX_BANDWIDTH
while lo < hi:
mid = (lo + hi) // 2
if can_serve_all_requests(requests, T, mid):
# mid bandwidth is enough; try smaller
hi = mid
else:
# mid bandwidth insufficient; need more
lo = mid + 1
answer = lo
Key Properties Table
| Property | Value |
|---|---|
| Time Complexity | O(log n) for search + O(f(n)) for feasibility check per iteration |
| Space Complexity | O(1) if iterative |
| Requires | Sorted array or monotonic predicate |
| Works on | Integer arrays, answer spaces, ranges |
When to Use / Avoid
✅ Use binary search when:
- Data is sorted or sorted order is easy to check
- Need to find a value, range boundary, or satisfy a condition
- Time budget is tight (log n is vastly better than linear)
- Can define a clear predicate: “does X work?” → true/false
✅ Use binary search on answer space when:
- Goal is to minimize/maximize a value X
- Can efficiently check if a candidate X satisfies the goal
- Answer space is large (linear search infeasible)
- Problem has monotonic property (if X works, all larger X work; if X fails, all smaller X fail)
❌ Avoid binary search when:
- Data is unsorted and sorting cost exceeds benefit
- Need all occurrences (linear scan may be simpler)
- Feasibility check is O(n) — total becomes O(n log n) (may not beat sorting once)
- Predicate is not monotonic (binary search correctness fails)
System Design Applications
| Use Case | How Binary Search Applies | Real System | Impact |
|---|---|---|---|
| Database Index Seek | B-Tree node search is binary search within each node | PostgreSQL, MySQL, RocksDB | O(log n) row lookups instead of O(n) table scan |
| Version Control Debugging | Binary search commit history for first bad version | Git bisect | Find regression in 1000 commits in ~10 tests instead of 500 |
| Kafka Log Lookup | Binary search offset index to find message by timestamp | Apache Kafka | Find message in 400GB+ broker logs in microseconds |
| Rate Limiter Window | Binary search request timestamps to find window boundary | Cloudflare, Stripe | Determine if request exceeds rate limit in O(log n) |
| Linux Memory Management | Binary search virtual address to find page table entry | Linux kernel page tables | Translate virtual→physical address in 4-5 memory lookups |
| Capacity Planning | Binary search fleet size: “minimum servers to handle load” | AWS Auto-Scaling | Find optimal instance count without manual tuning |
How Real Systems Use This
PostgreSQL & RocksDB — B-Tree Index Seeks
When you query SELECT * FROM users WHERE id = 42 in PostgreSQL, the database uses a B-Tree index to locate the row. The B-Tree structure is hierarchical: the root node points to child nodes, which point to grandchild nodes, and so on. At each level, PostgreSQL performs binary search within the node to find which child pointer to follow. For a typical B-Tree with 8KB page size and 100-byte entries, each node holds ~80 keys. A 1-billion-row table has a B-Tree depth of 4-5 levels. Binary search in each node takes log₂(80) ≈ 7 comparisons per level; total = ~35 comparisons to locate one row. Without the index, a full table scan would require 1 billion row comparisons. Real impact: indexed queries complete in microseconds; sequential scans take seconds. PostgreSQL’s implementation uses the bsearch() family of functions at each B-Tree node to find the correct child pointer.
Git Bisect — Finding First Bad Commit
A developer notices production is broken but doesn’t know when it broke. Git has 2000 commits in master branch since the last known-good version. Using git bisect, the developer runs: git bisect start, git bisect bad (current HEAD), git bisect good v2.1.0 (last known-good tag). Git binary searches the commit history: checks out the commit at the midpoint (commit 1000), and asks “is this good or bad?” Developer tests and replies. Git eliminates 1000 commits. On next iteration, checks commit 500, eliminates 500 more. After ~log₂(2000) ≈ 11 bisections, Git pinpoints the exact commit that introduced the bug. Without bisect, developer would manually test commits sequentially: ~1000 tests. With bisect: 11 tests. Real time savings: 2-3 hours of manual testing reduced to 15 minutes. For massive codebases (Linux kernel with 1M+ commits), bisect becomes indispensable.
Apache Kafka — Binary Search on Log Offsets
Kafka brokers store messages in immutable log files (segments). A consumer wants to read all messages produced after timestamp T (e.g., “give me messages from the last hour”). The broker maintains two indices per segment: an offset index (maps logical offset → byte position in log file) and a timestamp index (maps timestamp → offset). When a consumer requests messages after time T, Kafka uses binary search on the timestamp index to find the corresponding offset, then binary searches the offset index to find the byte position in the log file. Real numbers: a Kafka broker with 50 partitions, each with 48-hour retention, stores ~100GB per partition. Each segment is ~1GB. With 100 segments per partition, binary search finds a message in log₂(100) ≈ 7 comparisons instead of scanning linearly. For 1 billion messages per day per partition, this is the difference between microsecond lookups and multi-second scans. The index.interval.bytes parameter (default 4096) controls index density: lower density = fewer index entries = faster binary search, but coarser granularity within segments.
Git Bisect — Automated Regression Testing
More sophisticated: developers write a test script test-regression.sh that returns 0 (pass) or 1 (fail). git bisect run ./test-regression.sh automates the entire bisect: Git checks out commits and runs the script, interpreting results automatically. For a 1000-commit regression: manual bisect = 11 human iterations × 2 minutes each = 22 minutes. Automated bisect = 11 script runs × 30 seconds each = 5.5 minutes. Real systems (Kubernetes, Linux kernel) rely on bisect for continuous integration regression detection.
Linux Kernel — Binary Search in Page Tables
Virtual address translation: when a CPU accesses a virtual address, the kernel must translate it to a physical address. Modern x86-64 systems use a 5-level page table hierarchy. To translate address 0x7fff800000, the kernel navigates: L1 table → L2 table → L3 table → L4 table → L5 table. Each table is stored in memory (4KB page, 512 entries per page). Instead of linear search, the kernel uses binary search within each page table entry array to find the correct pointer. Real impact: address translation happens on every memory access (billions per second). Without binary search, CPU would need linear scans. Binary search (log₂(512) ≈ 9 comparisons) vs linear (avg 256 comparisons) per memory access = massive speedup. Most modern processors use a Translation Lookaside Buffer (TLB) cache to avoid this lookup, but when TLB misses occur, binary search is critical.
Cloudflare Rate Limiting — Window Boundary Detection
Cloudflare rate limits requests per IP address per time window (e.g., 100 requests/minute). For each IP, the system maintains a log of request timestamps over the last minute. When a new request arrives at timestamp T, the system must check: “how many requests occurred in [T-60s, T]?” Using linear scan, this requires iterating through all requests in the window (potentially thousands). With binary search on the timestamp log, Cloudflare finds the leftmost timestamp ≥ (T - 60s) in O(log n) time. Real-world scenario: DDoS attack sends 50,000 requests/second from different IPs. For each request, rate limiter must check if the IP is over quota. Binary search on the timestamp log makes this O(log n) per request; linear scan would be O(n) = O(10,000s) of computations per second, causing the rate limiter to become the bottleneck. Binary search enables Cloudflare to handle 50,000 RPS per edge server.
Amazon DynamoDB — Partition Key Binary Search
DynamoDB partitions data across shards by hash of the partition key. When a query comes in with partition key K, DynamoDB must determine which shard owns K. The shard range is divided into tokens: [0, 2^64-1]. Each shard owns a contiguous range of tokens. DynamoDB maintains a sorted list of shard ranges; binary search finds which shard owns K in O(log S) time (S = number of shards). Without binary search, lookup is O(S). At scale (Amazon runs millions of DynamoDB partitions), this matters: binary search keeps per-request latency at microseconds.
Implementation
Classic Binary Search
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
def binary_search(arr, target):
"""
Find target in sorted array.
Args:
arr: sorted array
target: value to find
Returns:
index of target, or -1 if not found
Time: O(log n)
Space: O(1)
"""
left, right = 0, len(arr) - 1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1
def binary_search_leftmost(arr, target):
"""
Find leftmost (first) occurrence of target.
Useful when array has duplicates.
"""
left, right = 0, len(arr) - 1
result = -1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
result = mid
right = mid - 1 # Continue searching left
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return result
def binary_search_rightmost(arr, target):
"""
Find rightmost (last) occurrence of target.
"""
left, right = 0, len(arr) - 1
result = -1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
result = mid
left = mid + 1 # Continue searching right
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return result
# Example usage
if __name__ == "__main__":
arr = [1, 3, 3, 3, 5, 7, 9, 11, 13]
print(binary_search(arr, 3)) # Output: 2 (any occurrence)
print(binary_search_leftmost(arr, 3)) # Output: 1 (first)
print(binary_search_rightmost(arr, 3)) # Output: 3 (last)
print(binary_search(arr, 6)) # Output: -1 (not found)
Binary Search on Answer Space
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
def min_bandwidth_for_requests(requests, time_limit, throughput_options):
"""
Find minimum throughput (requests/sec) needed to serve all requests
within time_limit.
Args:
requests: list of request sizes (bytes)
time_limit: time window (seconds)
throughput_options: sorted list of available throughputs
Returns:
minimum throughput from options that can serve all requests
Time: O(log T * n) where T = number of throughput options, n = number of requests
"""
def can_serve(throughput):
"""Check if given throughput can serve all requests in time_limit."""
time_used = sum(req / throughput for req in requests)
return time_used <= time_limit
left, right = 0, len(throughput_options) - 1
result = -1
while left <= right:
mid = (left + right) // 2
if can_serve(throughput_options[mid]):
result = throughput_options[mid]
right = mid - 1 # Try smaller throughput
else:
left = mid + 1 # Need larger throughput
return result
def min_servers_for_load(num_requests, request_distribution, time_window,
requests_per_server):
"""
Find minimum number of servers needed to handle peak load.
Args:
num_requests: total requests in time_window
request_distribution: peak requests/sec expected
time_window: time window (seconds)
requests_per_server: max RPS one server can handle
Returns:
minimum number of servers
Binary search on answer space: [0, num_requests]
"""
def can_handle(num_servers):
"""Can num_servers handle the peak load?"""
total_capacity = num_servers * requests_per_server
return total_capacity >= request_distribution
left, right = 1, num_requests
while left < right:
mid = (left + right) // 2
if can_handle(mid):
right = mid # Try fewer servers
else:
left = mid + 1 # Need more servers
return left
# Example usage
if __name__ == "__main__":
# Minimum throughput
requests = [100, 200, 150, 300] # sizes in bytes
time_limit = 1.0 # 1 second
throughputs = [100, 200, 500, 1000] # bytes/sec
min_tp = min_bandwidth_for_requests(requests, time_limit, throughputs)
print(f"Minimum throughput: {min_tp} bytes/sec")
# Minimum servers
peak_rps = 10000 # 10k requests/sec peak
rps_per_server = 1000 # each server handles 1k RPS
min_svr = min_servers_for_load(100000, peak_rps, 100, rps_per_server)
print(f"Minimum servers: {min_svr}")
Common Pitfalls
❌ Overflow: mid = (left + right) // 2 in languages like C/C++ can overflow if left and right are very large integers. Use mid = left + (right - left) // 2 instead. (Python handles big integers automatically, so this is not a concern in Python.)
❌ Off-by-one errors: The condition left <= right vs left < right differs by variant. For leftmost/rightmost searches, track what your loop invariant means at termination.
❌ Assuming monotonicity: Binary search only works if the predicate is monotonic: “if X works, all larger X work” or “if X fails, all smaller X fail.” If the predicate has a “valley” (works, then fails, then works again), binary search will fail.
❌ Not verifying the predicate: Always write a clear feasibility check can_achieve(X) and test it independently before using in binary search.
References
- The Art of Computer Programming, Volume 3 — Donald Knuth (1973) — Section 6.2.1 on binary search; foundational reference on correctness invariants and variants.
- Introduction to Algorithms (CLRS) — Chapter 12 — Comprehensive treatment of search in binary search trees and binary search correctness proofs.
- MIT OpenCourseWare — Binary Search Lecture — Clear explanation of classic and answer-space binary search with invariant analysis.
- ByteByteGo — Binary Search Interview Problems — Visual walkthrough of classic variants and real-world applications.
- PostgreSQL B-Tree Index Documentation — Details on binary search within B-Tree nodes; implementation uses bsearch().
- Git Bisect Documentation — Official guide with examples of automated bisect for regression testing.
- Kafka Log Indexing & Offset Lookup — Explanation of binary search on timestamp and offset indices.
- Cloudflare Rate Limiting Architecture — Production case study on efficient rate limiting with timestamp binary search.
- Linux Kernel Page Table Internals — Details on hierarchical page table structure and address translation (binary search context).
- Wikipedia: Binary Search — Quick reference with pseudocode, complexity analysis, and variant comparison.