Graph Algorithms
Graphs model relationships -- social networks, dependencies, routing, recommendations. Knowing when to apply each traversal is the key system design skill: BFS for unweighted paths, Dijkstra for weighted, topological sort for DAGs.
Graph Representations
| Aspect | Adjacency List | Adjacency Matrix |
|---|---|---|
| Space | O(V + E) | O(V²) |
| Check edge (u, v) | O(degree(u)) | O(1) |
| Find all neighbors | O(degree(u)) | O(V) |
| Add edge | O(1) | O(1) |
| Best for | Sparse graphs (V=1M, E=5M) | Dense graphs (E ≈ V²) |
| Cache locality | Poor (pointer chasing) | Good (sequential memory) |
Default: Use adjacency list for most problems. Switch to matrix only if E > V² / 10 or edge lookup is a bottleneck.
Key Properties by Algorithm
BFS — Breadth-First Search
| Property | Value |
|---|---|
| Time | O(V + E) |
| Space | O(V) for queue |
| Works on | Unweighted, weighted (ignores weights) |
| Finds | Shortest path in unweighted graph |
| Explores | Level by level (all neighbors before going deeper) |
| Data structure | Queue |
DFS — Depth-First Search
| Property | Value |
|---|---|
| Time | O(V + E) |
| Space | O(V) for stack/recursion |
| Works on | Unweighted, weighted |
| Finds | Connected components, cycles, topological order |
| Explores | Deep into a path before backtracking |
| Data structure | Stack or recursion |
Dijkstra’s Algorithm
| Property | Value |
|---|---|
| Time | O((V + E) log V) with binary heap |
| Time | O(V²) with simple array |
| Space | O(V) for priority queue |
| Works on | Non-negative edge weights only |
| Finds | Shortest path from source to all nodes |
| Doesn’t work | Negative weights, negative cycles |
| Data structure | Min-heap (priority queue) |
Topological Sort
| Property | Value |
|---|---|
| Time | O(V + E) |
| Space | O(V) |
| Works on | Directed acyclic graphs (DAG) only |
| Finds | Linear ordering respecting all edges |
| Use cases | Build dependency order, task scheduling, precedence |
| Algorithms | DFS post-order or Kahn’s (BFS-based) |
When to Use / Avoid
✅ Use BFS
- Shortest path in unweighted graph — guaranteed shortest by distance count
- Web crawler — visit pages level by level (hop distance)
- Social networks — “degrees of separation”, “2nd connections”
- Bipartite check — 2-color a graph layer by layer
- Virus spread simulation — infect neighbors, then their neighbors
❌ Avoid BFS
- Weighted shortest path — ignores edge weights; use Dijkstra
- Large graphs with sparse querying — BFS must explore entire component; use directed DFS instead
- Memory-critical — uses O(V) queue space; DFS uses O(depth) recursion
✅ Use DFS
- Cycle detection — mark visiting/visited; back edges indicate cycles
- Topological sort — DFS post-order gives valid ordering
- Connected components — flood-fill style component labeling
- Strongly connected components — Tarjan or Kosaraju algorithms (both use DFS)
- Maze solving — backtrack when stuck (natural stack behavior)
❌ Avoid DFS
- Shortest path (unweighted) — BFS is simpler and guaranteed optimal
- Level-order traversal — BFS is natural; DFS requires explicit level tracking
- Stack overflow risk — deep graphs trigger recursion depth limit; use iterative BFS instead
✅ Use Dijkstra
- Shortest path with positive weights — GPS routing, network delay
- Single-source shortest paths to all nodes — build distance table once
- Network routing protocols — OSPF, IS-IS compute tree from each router
- Game pathfinding — A* improves Dijkstra with heuristics
❌ Avoid Dijkstra
- Negative edge weights — Dijkstra’s greedy assumption (“the shortest known path to a settled node won’t improve”) breaks when negative edges exist, because a longer path through a negative edge can yield a shorter total. Use Bellman-Ford O(VE) instead, which relaxes all edges V-1 times
- All-pairs shortest path — use Floyd-Warshall O(V³) or Johnson’s O(VE log V)
- Very dense graphs — O(V²) simple array version might be faster than heap
✅ Use Topological Sort
- Build system task ordering — Make, Gradle, Maven; respect dependencies
- Microservice startup order — start services with no deps first, then dependents
- Instruction scheduling — compiler reorders instructions while preserving dependencies
- Course prerequisite ordering — compute valid enrollment sequence
❌ Avoid Topological Sort
- Cyclic dependencies — algorithm produces no valid ordering; detect and break cycle first
- Unordered data — not a DAG; use DFS or BFS for connectivity instead
How Real Systems Use This
Google Maps & Uber — Dijkstra’s Algorithm with Contraction Hierarchies
Google Maps solves “shortest driving route from A to B” using Dijkstra’s algorithm optimized with contraction hierarchies (CH), a technique that precomputes a compressed graph. On a naive Dijkstra of the full US road graph (~20M edges), finding a route between distant cities would explore millions of nodes and take several seconds. Contraction Hierarchies work by preprocessing: repeatedly remove low-importance nodes (small streets), store shortcuts, building a hierarchy. At query time, Dijkstra expands upward through major roads first (getting a “zoomed-out” view), then zooms in for local details. Real numbers: USA road graph with CH reduces query time from ~2 seconds (Dijkstra alone) to ~10-50 milliseconds. Google Maps also uses A* (Dijkstra + heuristic to bias search toward destination), further cutting explored nodes by 50-70%. Distance metric: edge weight = predicted driving time (not road length), incorporating traffic patterns, speed limits, and turn penalties. Reference: From A to B: Algorithms Powering Google Maps Navigation.
LinkedIn — BFS for “Degrees of Separation” & Recommendations
LinkedIn’s “2nd connections” feature uses breadth-first search to find mutual connections. When you visit a profile, LinkedIn queries: “How far is this person from me?” BFS expands your 1st-degree connections (all people you follow), then their 1st-degree connections (your 2nd-degree), up to depth 3. Time complexity: O(V + E) per query, but E is huge (~10 billion edges in the full graph). To avoid full BFS on every query, LinkedIn maintains cached “common connections” for top profiles. Recommendation engine uses graph traversal: “show jobs of companies that my 2nd-degree connections work at.” Practical numbers: computing 2nd-degree connections for 1 million users takes ~5 minutes on distributed BFS (Spark) vs. 3 hours with naive implementation. Reference: LinkedIn Graph Data Architecture.
Gradle & Maven — Topological Sort for Build Dependency Order
Gradle and Maven compile projects with topological sort on the dependency DAG. A project declares dependencies in build.gradle (Gradle) or pom.xml (Maven). Before compiling module C, both B and A must be built. Topological sort determines: build A, then B, then C (any valid topological ordering is correct). Gradle uses a task scheduler that parses build.gradle, constructs a DAG of tasks (compile, test, package), and performs topological sort (Kahn’s algorithm). For a project with 200 modules: naive compilation order could violate deps; topological sort guarantees valid order. Practical: a missing dependency detected at sort time prevents late-stage compile failures. Reference: Gradle Dependency Resolution & Task Ordering.
Kubernetes — Topological Sort for Pod Dependency Resolution
Kubernetes schedules pods respecting pod dependencies using topological sorting. In a microservices deployment, Pod B (web service) depends on Pod A (database); Kubernetes must schedule A before B. The scheduler analyzes depends_on or init containers, builds a DAG, and performs topological sort to compute startup order. For a cluster deploying 500 microservices with deep dependency chains: naive scheduling might start a service before its database; topological sort guarantees valid order. Complexity: O(V + E) where V = pods, E = dependency edges. Real deployment: orchestrating a data pipeline with stages (ingest → clean → analyze → export) uses topological sort to ensure stages run in order. Reference: Kubernetes Scheduling & Pod Dependencies.
OSPF/BGP — Dijkstra for Network Routing
The Open Shortest Path First (OSPF) routing protocol uses Dijkstra’s algorithm to compute shortest paths in a network. Each router runs Dijkstra on the network graph: nodes = routers, edges = links with cost (hop count or latency). Every router floods link-state updates to all neighbors (I’m connected to these neighbors with this cost), building identical local graph copies. Each router runs Dijkstra to compute: “shortest path from me to every other router.” Forwarding table populated with: “to reach router X, send packet via neighbor Y.” Cost metric: usually hop count (cost=1 per link) but can be latency, bandwidth, or custom. Real network: autonomous system with 1000 routers, 5000 links; each Dijkstra run O(E log V) ≈ 50,000 log 1000 ≈ 500,000 ops, computed every 30 seconds or on topology change. BGP (inter-AS routing) uses variants of Dijkstra optimized for scalability. Reference: OSPF Specification — RFC 2328.
Neo4j — Graph Traversal for Pattern Matching & Recommendations
Neo4j’s graph database uses BFS and DFS internally for pattern matching and recommendation queries. Query “find all friends of my friends who live in NYC” uses BFS: traverse 1-hop (friends), 2-hops (friends of friends), filter by location. Query “find longest path from me to a celebrity” uses DFS: explore all paths, track longest. Neo4j stores a property graph: nodes have labels (Person, Company), edges have types (FOLLOWS, WORKS_AT, LIVES_IN). Traversal algorithms exploit this structure. Real use case: recommendation engine queries “users who follow topics I follow” using BFS on the FOLLOWS graph. For a graph with 10M users, 500M FOLLOWS edges: naive traversal (checking all users) takes O(10M) time; BFS from me explores only reachable users (~1000), completing in milliseconds. Neo4j optimizes BFS with index lookups (fast neighbor retrieval) and query planner (reorder traversal steps). Reference: Neo4j Graph Algorithms Library.
PageRank — Graph-Wide Centrality for Relevance
PageRank uses iterative matrix multiplication (power iteration) on the web graph: nodes = web pages, directed edges = links. PageRank(page) = importance score computed as: “what fraction of random walks end at this page?” Approximates: probability that a random web surfer lands here. Algorithm: repeatedly multiply the graph’s transition matrix by a vector of page scores; after ~20-30 iterations, scores converge. PageRank score is used as one signal in Google Search ranking (along with text relevance, links, freshness). Real graph: 50+ billion web pages, trillions of links. Computing PageRank naively O(T * (V + E)) where T = 30 iterations, is infeasible. Google distributes PageRank computation across MapReduce clusters: split graph into partitions, compute local PageRank, iterate. Damping factor (default 0.85) prevents dangling nodes (pages with no outlinks) from accumulating all rank. Reference: PageRank Algorithm — Stanford CS.
Implementation
BFS — Breadth-First Search
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
from collections import deque
def bfs(graph, start, goal=None):
"""
Breadth-first search: explore neighbors level by level.
Args:
graph: adjacency list {node: [neighbors]}
start: starting node
goal: target node (optional); returns path if specified
Returns:
distance dict {node: shortest_distance} or path to goal
Time: O(V + E)
Space: O(V) for queue
Why BFS:
- Finds shortest path in UNWEIGHTED graphs
- Explores by distance: 1 hop, 2 hops, etc.
- Guarantees first reach to goal is shortest
"""
visited = {start}
queue = deque([start])
distance = {start: 0}
parent = {start: None}
while queue:
node = queue.popleft()
# Early termination if goal found
if goal and node == goal:
# Reconstruct path
path = []
current = goal
while current is not None:
path.append(current)
current = parent[current]
return path[::-1]
# Explore all unvisited neighbors
for neighbor in graph.get(node, []):
if neighbor not in visited:
visited.add(neighbor)
distance[neighbor] = distance[node] + 1
parent[neighbor] = node
queue.append(neighbor)
return distance if not goal else None # Return all distances if no goal
# Example: social network degrees of separation
if __name__ == "__main__":
network = {
'Alice': ['Bob', 'Charlie'],
'Bob': ['Alice', 'Diana'],
'Charlie': ['Alice', 'Eve'],
'Diana': ['Bob'],
'Eve': ['Charlie', 'Frank'],
'Frank': ['Eve']
}
# Find shortest path from Alice to Frank
path = bfs(network, 'Alice', 'Frank')
print(f"Path from Alice to Frank: {path}")
# Output: ['Alice', 'Charlie', 'Eve', 'Frank']
# Distance from Alice to all reachable nodes
distances = bfs(network, 'Alice')
print(f"Distances from Alice: {distances}")
# Output: {'Alice': 0, 'Bob': 1, 'Charlie': 1, 'Diana': 2, 'Eve': 2, 'Frank': 3}
DFS — Depth-First Search
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
def dfs_iterative(graph, start):
"""
Iterative DFS using explicit stack (avoids recursion depth limit).
Args:
graph: adjacency list {node: [neighbors]}
start: starting node
Returns:
postorder: list of nodes in post-order (used for topological sort)
Time: O(V + E)
Space: O(V) for stack + recursion
Why DFS:
- Detects cycles (back edges in directed graphs)
- Topological sort via post-order traversal
- Connected components via flood fill
- Requires less memory than BFS queue (especially for sparse trees)
"""
visited = set()
stack = [start]
postorder = []
# Iterative DFS with explicit stack
while stack:
node = stack[-1] # Peek at top
if node not in visited:
visited.add(node)
# Check if any unvisited neighbors remain
has_unvisited = False
for neighbor in graph.get(node, []):
if neighbor not in visited:
has_unvisited = True
stack.append(neighbor)
break # Push one neighbor; continue outer loop
# All neighbors visited or no neighbors: pop and add to postorder
if not has_unvisited:
postorder.append(stack.pop())
return postorder
def dfs_recursive(graph, start, visited=None, postorder=None):
"""
Recursive DFS (simpler but risks stack overflow on very deep graphs).
"""
if visited is None:
visited = set()
if postorder is None:
postorder = []
visited.add(start)
for neighbor in graph.get(start, []):
if neighbor not in visited:
dfs_recursive(graph, neighbor, visited, postorder)
postorder.append(start)
return postorder
# Example: detect cycle in directed graph
if __name__ == "__main__":
# Graph with a cycle: A -> B -> C -> A
graph_cyclic = {
'A': ['B'],
'B': ['C'],
'C': ['A'], # Creates cycle
}
def has_cycle(graph):
"""Check if directed graph has cycle using DFS coloring."""
color = {} # 0=white (unvisited), 1=gray (visiting), 2=black (visited)
def visit(node):
if node in color:
return color[node] == 1 # Cycle if visiting a gray node
color[node] = 1 # Mark as visiting (gray)
for neighbor in graph.get(node, []):
if visit(neighbor):
return True
color[node] = 2 # Mark as visited (black)
return False
for node in graph:
if node not in color:
if visit(node):
return True
return False
print(f"Cyclic graph has cycle: {has_cycle(graph_cyclic)}") # True
print(f"DAG has cycle: {has_cycle({'X': ['Y'], 'Y': ['Z'], 'Z': []})}") # False
Dijkstra’s Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
import heapq
def dijkstra(graph, start):
"""
Dijkstra's shortest path algorithm (greedy with min-heap priority queue).
Args:
graph: adjacency list {node: [(neighbor, weight), ...]}
start: source node
Returns:
distance: {node: shortest_distance_from_start}
parent: {node: previous_node_on_shortest_path}
Time: O((V + E) log V) with binary heap
Space: O(V)
Key insight:
- Always expand the closest unvisited node (greedy)
- Min-heap ensures we process nodes in distance order
- Fails on negative weights because the greedy "settle closest node" assumption breaks
"""
distance = {node: float('inf') for node in graph}
distance[start] = 0
parent = {start: None}
# Min-heap of (distance, node)
heap = [(0, start)]
while heap:
curr_dist, node = heapq.heappop(heap)
# Skip if already processed with better distance
if curr_dist > distance[node]:
continue
# Relax edges to neighbors
for neighbor, weight in graph.get(node, []):
new_dist = distance[node] + weight
if new_dist < distance[neighbor]:
distance[neighbor] = new_dist
parent[neighbor] = node
heapq.heappush(heap, (new_dist, neighbor))
return distance, parent
def reconstruct_path(parent, target):
"""Reconstruct path from start to target using parent pointers."""
path = []
current = target
while current is not None:
path.append(current)
current = parent[current]
return path[::-1]
# Example: GPS routing
if __name__ == "__main__":
# Graph: (source, [(destination, distance), ...])
graph = {
'A': [('B', 4), ('C', 2)],
'B': [('C', 1), ('D', 5)],
'C': [('D', 8), ('E', 10)],
'D': [('E', 2)],
'E': []
}
distance, parent = dijkstra(graph, 'A')
print(f"Shortest distances from A: {distance}")
# Output: {'A': 0, 'B': 4, 'C': 2, 'D': 9, 'E': 11}
print(f"Shortest path A -> E: {reconstruct_path(parent, 'E')}")
# Output: ['A', 'C', 'D', 'E'] with total distance 11
Algorithm Decision Guide
| Problem | Algorithm | Complexity | When to Use |
|---|---|---|---|
| Shortest path (unweighted) | BFS | O(V + E) | Social networks, web crawling |
| Shortest path (weighted, positive) | Dijkstra | O((V+E) log V) | GPS routing, network protocols |
| Shortest path (negative weights) | Bellman-Ford | O(VE) | Currency arbitrage detection |
| All-pairs shortest paths | Floyd-Warshall | O(V³) | Small networks, all routes needed |
| Detect cycle (undirected) | BFS/DFS + union-find | O(V + E) | Dependency detection |
| Detect cycle (directed) | DFS coloring | O(V + E) | Build system cycle detection |
| Task ordering / dependencies | Topological sort | O(V + E) | Gradle, Make, Kubernetes |
| Minimum spanning tree | Prim / Kruskal | O(E log V) | Network design, clustering |
| Strongly connected components | Tarjan / Kosaraju | O(V + E) | Condensation graph analysis |
| Bipartite check | BFS 2-coloring | O(V + E) | Matching problems |
| Centrality / importance | PageRank / Betweenness | O(iterations × E) | Ranking, influence |
References
- Dijkstra’s Shortest Path Algorithm — Dijkstra (1959) — Original paper; foundational for shortest path computation.
- Introduction to Algorithms (CLRS) — Cormen, Leiserson, Rivest, Stein — Chapters 22–24 cover BFS, DFS, shortest paths, topological sort; industry standard reference.
- MIT 6.006 — Graph Algorithms — Video lectures on BFS, DFS, Dijkstra, topological sort with visualizations.
- The Google PageRank Algorithm — Page, Brin, et al. (1998) — Explains PageRank as random walk on web graph; iterative computation.
- From A to B: Algorithms Powering Google Maps Navigation — Details on Dijkstra optimizations (contraction hierarchies), A* search, real-world routing.
- Neo4j Graph Data Science — BFS/DFS Algorithms — Built-in graph traversal in production graph database; query patterns.
- OSPF Specification — RFC 2328 — Dijkstra’s algorithm in network routing; link-state flooding, shortest path tree.
- ByteByteGo — Graph Algorithms — Visual explanation of traversal algorithms and real-world applications; practical examples.