Post

Graph Algorithms

Graphs model relationships -- social networks, dependencies, routing, recommendations. Knowing when to apply each traversal is the key system design skill: BFS for unweighted paths, Dijkstra for weighted, topological sort for DAGs.

Graph Algorithms

Graph Representations

Aspect Adjacency List Adjacency Matrix
Space O(V + E) O(V²)
Check edge (u, v) O(degree(u)) O(1)
Find all neighbors O(degree(u)) O(V)
Add edge O(1) O(1)
Best for Sparse graphs (V=1M, E=5M) Dense graphs (E ≈ V²)
Cache locality Poor (pointer chasing) Good (sequential memory)

Default: Use adjacency list for most problems. Switch to matrix only if E > V² / 10 or edge lookup is a bottleneck.

Key Properties by Algorithm

Property Value
Time O(V + E)
Space O(V) for queue
Works on Unweighted, weighted (ignores weights)
Finds Shortest path in unweighted graph
Explores Level by level (all neighbors before going deeper)
Data structure Queue
Property Value
Time O(V + E)
Space O(V) for stack/recursion
Works on Unweighted, weighted
Finds Connected components, cycles, topological order
Explores Deep into a path before backtracking
Data structure Stack or recursion

Dijkstra’s Algorithm

Property Value
Time O((V + E) log V) with binary heap
Time O(V²) with simple array
Space O(V) for priority queue
Works on Non-negative edge weights only
Finds Shortest path from source to all nodes
Doesn’t work Negative weights, negative cycles
Data structure Min-heap (priority queue)

Topological Sort

Property Value
Time O(V + E)
Space O(V)
Works on Directed acyclic graphs (DAG) only
Finds Linear ordering respecting all edges
Use cases Build dependency order, task scheduling, precedence
Algorithms DFS post-order or Kahn’s (BFS-based)

When to Use / Avoid

✅ Use BFS

  • Shortest path in unweighted graph — guaranteed shortest by distance count
  • Web crawler — visit pages level by level (hop distance)
  • Social networks — “degrees of separation”, “2nd connections”
  • Bipartite check — 2-color a graph layer by layer
  • Virus spread simulation — infect neighbors, then their neighbors

❌ Avoid BFS

  • Weighted shortest path — ignores edge weights; use Dijkstra
  • Large graphs with sparse querying — BFS must explore entire component; use directed DFS instead
  • Memory-critical — uses O(V) queue space; DFS uses O(depth) recursion

✅ Use DFS

  • Cycle detection — mark visiting/visited; back edges indicate cycles
  • Topological sort — DFS post-order gives valid ordering
  • Connected components — flood-fill style component labeling
  • Strongly connected components — Tarjan or Kosaraju algorithms (both use DFS)
  • Maze solving — backtrack when stuck (natural stack behavior)

❌ Avoid DFS

  • Shortest path (unweighted) — BFS is simpler and guaranteed optimal
  • Level-order traversal — BFS is natural; DFS requires explicit level tracking
  • Stack overflow risk — deep graphs trigger recursion depth limit; use iterative BFS instead

✅ Use Dijkstra

  • Shortest path with positive weights — GPS routing, network delay
  • Single-source shortest paths to all nodes — build distance table once
  • Network routing protocols — OSPF, IS-IS compute tree from each router
  • Game pathfinding — A* improves Dijkstra with heuristics

❌ Avoid Dijkstra

  • Negative edge weights — Dijkstra’s greedy assumption (“the shortest known path to a settled node won’t improve”) breaks when negative edges exist, because a longer path through a negative edge can yield a shorter total. Use Bellman-Ford O(VE) instead, which relaxes all edges V-1 times
  • All-pairs shortest path — use Floyd-Warshall O(V³) or Johnson’s O(VE log V)
  • Very dense graphs — O(V²) simple array version might be faster than heap

✅ Use Topological Sort

  • Build system task ordering — Make, Gradle, Maven; respect dependencies
  • Microservice startup order — start services with no deps first, then dependents
  • Instruction scheduling — compiler reorders instructions while preserving dependencies
  • Course prerequisite ordering — compute valid enrollment sequence

❌ Avoid Topological Sort

  • Cyclic dependencies — algorithm produces no valid ordering; detect and break cycle first
  • Unordered data — not a DAG; use DFS or BFS for connectivity instead

How Real Systems Use This

Google Maps & Uber — Dijkstra’s Algorithm with Contraction Hierarchies

Google Maps solves “shortest driving route from A to B” using Dijkstra’s algorithm optimized with contraction hierarchies (CH), a technique that precomputes a compressed graph. On a naive Dijkstra of the full US road graph (~20M edges), finding a route between distant cities would explore millions of nodes and take several seconds. Contraction Hierarchies work by preprocessing: repeatedly remove low-importance nodes (small streets), store shortcuts, building a hierarchy. At query time, Dijkstra expands upward through major roads first (getting a “zoomed-out” view), then zooms in for local details. Real numbers: USA road graph with CH reduces query time from ~2 seconds (Dijkstra alone) to ~10-50 milliseconds. Google Maps also uses A* (Dijkstra + heuristic to bias search toward destination), further cutting explored nodes by 50-70%. Distance metric: edge weight = predicted driving time (not road length), incorporating traffic patterns, speed limits, and turn penalties. Reference: From A to B: Algorithms Powering Google Maps Navigation.

LinkedIn — BFS for “Degrees of Separation” & Recommendations

LinkedIn’s “2nd connections” feature uses breadth-first search to find mutual connections. When you visit a profile, LinkedIn queries: “How far is this person from me?” BFS expands your 1st-degree connections (all people you follow), then their 1st-degree connections (your 2nd-degree), up to depth 3. Time complexity: O(V + E) per query, but E is huge (~10 billion edges in the full graph). To avoid full BFS on every query, LinkedIn maintains cached “common connections” for top profiles. Recommendation engine uses graph traversal: “show jobs of companies that my 2nd-degree connections work at.” Practical numbers: computing 2nd-degree connections for 1 million users takes ~5 minutes on distributed BFS (Spark) vs. 3 hours with naive implementation. Reference: LinkedIn Graph Data Architecture.

Gradle & Maven — Topological Sort for Build Dependency Order

Gradle and Maven compile projects with topological sort on the dependency DAG. A project declares dependencies in build.gradle (Gradle) or pom.xml (Maven). Before compiling module C, both B and A must be built. Topological sort determines: build A, then B, then C (any valid topological ordering is correct). Gradle uses a task scheduler that parses build.gradle, constructs a DAG of tasks (compile, test, package), and performs topological sort (Kahn’s algorithm). For a project with 200 modules: naive compilation order could violate deps; topological sort guarantees valid order. Practical: a missing dependency detected at sort time prevents late-stage compile failures. Reference: Gradle Dependency Resolution & Task Ordering.

Kubernetes — Topological Sort for Pod Dependency Resolution

Kubernetes schedules pods respecting pod dependencies using topological sorting. In a microservices deployment, Pod B (web service) depends on Pod A (database); Kubernetes must schedule A before B. The scheduler analyzes depends_on or init containers, builds a DAG, and performs topological sort to compute startup order. For a cluster deploying 500 microservices with deep dependency chains: naive scheduling might start a service before its database; topological sort guarantees valid order. Complexity: O(V + E) where V = pods, E = dependency edges. Real deployment: orchestrating a data pipeline with stages (ingest → clean → analyze → export) uses topological sort to ensure stages run in order. Reference: Kubernetes Scheduling & Pod Dependencies.

OSPF/BGP — Dijkstra for Network Routing

The Open Shortest Path First (OSPF) routing protocol uses Dijkstra’s algorithm to compute shortest paths in a network. Each router runs Dijkstra on the network graph: nodes = routers, edges = links with cost (hop count or latency). Every router floods link-state updates to all neighbors (I’m connected to these neighbors with this cost), building identical local graph copies. Each router runs Dijkstra to compute: “shortest path from me to every other router.” Forwarding table populated with: “to reach router X, send packet via neighbor Y.” Cost metric: usually hop count (cost=1 per link) but can be latency, bandwidth, or custom. Real network: autonomous system with 1000 routers, 5000 links; each Dijkstra run O(E log V) ≈ 50,000 log 1000 ≈ 500,000 ops, computed every 30 seconds or on topology change. BGP (inter-AS routing) uses variants of Dijkstra optimized for scalability. Reference: OSPF Specification — RFC 2328.

Neo4j — Graph Traversal for Pattern Matching & Recommendations

Neo4j’s graph database uses BFS and DFS internally for pattern matching and recommendation queries. Query “find all friends of my friends who live in NYC” uses BFS: traverse 1-hop (friends), 2-hops (friends of friends), filter by location. Query “find longest path from me to a celebrity” uses DFS: explore all paths, track longest. Neo4j stores a property graph: nodes have labels (Person, Company), edges have types (FOLLOWS, WORKS_AT, LIVES_IN). Traversal algorithms exploit this structure. Real use case: recommendation engine queries “users who follow topics I follow” using BFS on the FOLLOWS graph. For a graph with 10M users, 500M FOLLOWS edges: naive traversal (checking all users) takes O(10M) time; BFS from me explores only reachable users (~1000), completing in milliseconds. Neo4j optimizes BFS with index lookups (fast neighbor retrieval) and query planner (reorder traversal steps). Reference: Neo4j Graph Algorithms Library.

PageRank — Graph-Wide Centrality for Relevance

PageRank uses iterative matrix multiplication (power iteration) on the web graph: nodes = web pages, directed edges = links. PageRank(page) = importance score computed as: “what fraction of random walks end at this page?” Approximates: probability that a random web surfer lands here. Algorithm: repeatedly multiply the graph’s transition matrix by a vector of page scores; after ~20-30 iterations, scores converge. PageRank score is used as one signal in Google Search ranking (along with text relevance, links, freshness). Real graph: 50+ billion web pages, trillions of links. Computing PageRank naively O(T * (V + E)) where T = 30 iterations, is infeasible. Google distributes PageRank computation across MapReduce clusters: split graph into partitions, compute local PageRank, iterate. Damping factor (default 0.85) prevents dangling nodes (pages with no outlinks) from accumulating all rank. Reference: PageRank Algorithm — Stanford CS.

Implementation

BFS — Breadth-First Search

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
from collections import deque

def bfs(graph, start, goal=None):
    """
    Breadth-first search: explore neighbors level by level.

    Args:
        graph: adjacency list {node: [neighbors]}
        start: starting node
        goal: target node (optional); returns path if specified

    Returns:
        distance dict {node: shortest_distance} or path to goal

    Time: O(V + E)
    Space: O(V) for queue

    Why BFS:
    - Finds shortest path in UNWEIGHTED graphs
    - Explores by distance: 1 hop, 2 hops, etc.
    - Guarantees first reach to goal is shortest
    """
    visited = {start}
    queue = deque([start])
    distance = {start: 0}
    parent = {start: None}

    while queue:
        node = queue.popleft()

        # Early termination if goal found
        if goal and node == goal:
            # Reconstruct path
            path = []
            current = goal
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]

        # Explore all unvisited neighbors
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                distance[neighbor] = distance[node] + 1
                parent[neighbor] = node
                queue.append(neighbor)

    return distance if not goal else None  # Return all distances if no goal

# Example: social network degrees of separation
if __name__ == "__main__":
    network = {
        'Alice': ['Bob', 'Charlie'],
        'Bob': ['Alice', 'Diana'],
        'Charlie': ['Alice', 'Eve'],
        'Diana': ['Bob'],
        'Eve': ['Charlie', 'Frank'],
        'Frank': ['Eve']
    }

    # Find shortest path from Alice to Frank
    path = bfs(network, 'Alice', 'Frank')
    print(f"Path from Alice to Frank: {path}")
    # Output: ['Alice', 'Charlie', 'Eve', 'Frank']

    # Distance from Alice to all reachable nodes
    distances = bfs(network, 'Alice')
    print(f"Distances from Alice: {distances}")
    # Output: {'Alice': 0, 'Bob': 1, 'Charlie': 1, 'Diana': 2, 'Eve': 2, 'Frank': 3}

DFS — Depth-First Search

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
def dfs_iterative(graph, start):
    """
    Iterative DFS using explicit stack (avoids recursion depth limit).

    Args:
        graph: adjacency list {node: [neighbors]}
        start: starting node

    Returns:
        postorder: list of nodes in post-order (used for topological sort)

    Time: O(V + E)
    Space: O(V) for stack + recursion

    Why DFS:
    - Detects cycles (back edges in directed graphs)
    - Topological sort via post-order traversal
    - Connected components via flood fill
    - Requires less memory than BFS queue (especially for sparse trees)
    """
    visited = set()
    stack = [start]
    postorder = []

    # Iterative DFS with explicit stack
    while stack:
        node = stack[-1]  # Peek at top

        if node not in visited:
            visited.add(node)

        # Check if any unvisited neighbors remain
        has_unvisited = False
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                has_unvisited = True
                stack.append(neighbor)
                break  # Push one neighbor; continue outer loop

        # All neighbors visited or no neighbors: pop and add to postorder
        if not has_unvisited:
            postorder.append(stack.pop())

    return postorder

def dfs_recursive(graph, start, visited=None, postorder=None):
    """
    Recursive DFS (simpler but risks stack overflow on very deep graphs).
    """
    if visited is None:
        visited = set()
    if postorder is None:
        postorder = []

    visited.add(start)

    for neighbor in graph.get(start, []):
        if neighbor not in visited:
            dfs_recursive(graph, neighbor, visited, postorder)

    postorder.append(start)
    return postorder

# Example: detect cycle in directed graph
if __name__ == "__main__":
    # Graph with a cycle: A -> B -> C -> A
    graph_cyclic = {
        'A': ['B'],
        'B': ['C'],
        'C': ['A'],  # Creates cycle
    }

    def has_cycle(graph):
        """Check if directed graph has cycle using DFS coloring."""
        color = {}  # 0=white (unvisited), 1=gray (visiting), 2=black (visited)

        def visit(node):
            if node in color:
                return color[node] == 1  # Cycle if visiting a gray node

            color[node] = 1  # Mark as visiting (gray)
            for neighbor in graph.get(node, []):
                if visit(neighbor):
                    return True
            color[node] = 2  # Mark as visited (black)
            return False

        for node in graph:
            if node not in color:
                if visit(node):
                    return True
        return False

    print(f"Cyclic graph has cycle: {has_cycle(graph_cyclic)}")  # True
    print(f"DAG has cycle: {has_cycle({'X': ['Y'], 'Y': ['Z'], 'Z': []})}")  # False

Dijkstra’s Algorithm

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
import heapq

def dijkstra(graph, start):
    """
    Dijkstra's shortest path algorithm (greedy with min-heap priority queue).

    Args:
        graph: adjacency list {node: [(neighbor, weight), ...]}
        start: source node

    Returns:
        distance: {node: shortest_distance_from_start}
        parent: {node: previous_node_on_shortest_path}

    Time: O((V + E) log V) with binary heap
    Space: O(V)

    Key insight:
    - Always expand the closest unvisited node (greedy)
    - Min-heap ensures we process nodes in distance order
    - Fails on negative weights because the greedy "settle closest node" assumption breaks
    """
    distance = {node: float('inf') for node in graph}
    distance[start] = 0
    parent = {start: None}

    # Min-heap of (distance, node)
    heap = [(0, start)]

    while heap:
        curr_dist, node = heapq.heappop(heap)

        # Skip if already processed with better distance
        if curr_dist > distance[node]:
            continue

        # Relax edges to neighbors
        for neighbor, weight in graph.get(node, []):
            new_dist = distance[node] + weight
            if new_dist < distance[neighbor]:
                distance[neighbor] = new_dist
                parent[neighbor] = node
                heapq.heappush(heap, (new_dist, neighbor))

    return distance, parent

def reconstruct_path(parent, target):
    """Reconstruct path from start to target using parent pointers."""
    path = []
    current = target
    while current is not None:
        path.append(current)
        current = parent[current]
    return path[::-1]

# Example: GPS routing
if __name__ == "__main__":
    # Graph: (source, [(destination, distance), ...])
    graph = {
        'A': [('B', 4), ('C', 2)],
        'B': [('C', 1), ('D', 5)],
        'C': [('D', 8), ('E', 10)],
        'D': [('E', 2)],
        'E': []
    }

    distance, parent = dijkstra(graph, 'A')
    print(f"Shortest distances from A: {distance}")
    # Output: {'A': 0, 'B': 4, 'C': 2, 'D': 9, 'E': 11}

    print(f"Shortest path A -> E: {reconstruct_path(parent, 'E')}")
    # Output: ['A', 'C', 'D', 'E'] with total distance 11

Algorithm Decision Guide

Problem Algorithm Complexity When to Use
Shortest path (unweighted) BFS O(V + E) Social networks, web crawling
Shortest path (weighted, positive) Dijkstra O((V+E) log V) GPS routing, network protocols
Shortest path (negative weights) Bellman-Ford O(VE) Currency arbitrage detection
All-pairs shortest paths Floyd-Warshall O(V³) Small networks, all routes needed
Detect cycle (undirected) BFS/DFS + union-find O(V + E) Dependency detection
Detect cycle (directed) DFS coloring O(V + E) Build system cycle detection
Task ordering / dependencies Topological sort O(V + E) Gradle, Make, Kubernetes
Minimum spanning tree Prim / Kruskal O(E log V) Network design, clustering
Strongly connected components Tarjan / Kosaraju O(V + E) Condensation graph analysis
Bipartite check BFS 2-coloring O(V + E) Matching problems
Centrality / importance PageRank / Betweenness O(iterations × E) Ranking, influence

References

This post is licensed under CC BY 4.0 by the author.