Graph Algorithms

Graphs model relationships -- social networks, dependencies, routing, recommendations. Knowing when to apply each traversal is the key system design skill: BFS for unweighted paths, Dijkstra for weighted, topological sort for DAGs.

Posted Jul 25, 2023

16 min read

Graph Algorithms

Graph Representations

Aspect	Adjacency List	Adjacency Matrix
Space	O(V + E)	O(V²)
Check edge (u, v)	O(degree(u))	O(1)
Find all neighbors	O(degree(u))	O(V)
Add edge	O(1)	O(1)
Best for	Sparse graphs (V=1M, E=5M)	Dense graphs (E ≈ V²)
Cache locality	Poor (pointer chasing)	Good (sequential memory)

Default: Use adjacency list for most problems. Switch to matrix only if E > V² / 10 or edge lookup is a bottleneck.

Key Properties by Algorithm

BFS — Breadth-First Search

Property	Value
Time	O(V + E)
Space	O(V) for queue
Works on	Unweighted, weighted (ignores weights)
Finds	Shortest path in unweighted graph
Explores	Level by level (all neighbors before going deeper)
Data structure	Queue

DFS — Depth-First Search

Property	Value
Time	O(V + E)
Space	O(V) for stack/recursion
Works on	Unweighted, weighted
Finds	Connected components, cycles, topological order
Explores	Deep into a path before backtracking
Data structure	Stack or recursion

Dijkstra’s Algorithm

Property	Value
Time	O((V + E) log V) with binary heap
Time	O(V²) with simple array
Space	O(V) for priority queue
Works on	Non-negative edge weights only
Finds	Shortest path from source to all nodes
Doesn’t work	Negative weights, negative cycles
Data structure	Min-heap (priority queue)

Topological Sort

Property	Value
Time	O(V + E)
Space	O(V)
Works on	Directed acyclic graphs (DAG) only
Finds	Linear ordering respecting all edges
Use cases	Build dependency order, task scheduling, precedence
Algorithms	DFS post-order or Kahn’s (BFS-based)

When to Use / Avoid

✅ Use BFS

Shortest path in unweighted graph — guaranteed shortest by distance count
Web crawler — visit pages level by level (hop distance)
Social networks — “degrees of separation”, “2nd connections”
Bipartite check — 2-color a graph layer by layer
Virus spread simulation — infect neighbors, then their neighbors

❌ Avoid BFS

Weighted shortest path — ignores edge weights; use Dijkstra
Large graphs with sparse querying — BFS must explore entire component; use directed DFS instead
Memory-critical — uses O(V) queue space; DFS uses O(depth) recursion

✅ Use DFS

Cycle detection — mark visiting/visited; back edges indicate cycles
Topological sort — DFS post-order gives valid ordering
Connected components — flood-fill style component labeling
Strongly connected components — Tarjan or Kosaraju algorithms (both use DFS)
Maze solving — backtrack when stuck (natural stack behavior)

❌ Avoid DFS

Shortest path (unweighted) — BFS is simpler and guaranteed optimal
Level-order traversal — BFS is natural; DFS requires explicit level tracking
Stack overflow risk — deep graphs trigger recursion depth limit; use iterative BFS instead

✅ Use Dijkstra

Shortest path with positive weights — GPS routing, network delay
Single-source shortest paths to all nodes — build distance table once
Network routing protocols — OSPF, IS-IS compute tree from each router
Game pathfinding — A* improves Dijkstra with heuristics

❌ Avoid Dijkstra

Negative edge weights — Dijkstra’s greedy assumption (“the shortest known path to a settled node won’t improve”) breaks when negative edges exist, because a longer path through a negative edge can yield a shorter total. Use Bellman-Ford O(VE) instead, which relaxes all edges V-1 times
All-pairs shortest path — use Floyd-Warshall O(V³) or Johnson’s O(VE log V)
Very dense graphs — O(V²) simple array version might be faster than heap

✅ Use Topological Sort

Build system task ordering — Make, Gradle, Maven; respect dependencies
Microservice startup order — start services with no deps first, then dependents
Instruction scheduling — compiler reorders instructions while preserving dependencies
Course prerequisite ordering — compute valid enrollment sequence

❌ Avoid Topological Sort

Cyclic dependencies — algorithm produces no valid ordering; detect and break cycle first
Unordered data — not a DAG; use DFS or BFS for connectivity instead

How Real Systems Use This

Google Maps & Uber — Dijkstra’s Algorithm with Contraction Hierarchies

Google Maps solves “shortest driving route from A to B” using Dijkstra’s algorithm optimized with contraction hierarchies (CH), a technique that precomputes a compressed graph. On a naive Dijkstra of the full US road graph (~20M edges), finding a route between distant cities would explore millions of nodes and take several seconds. Contraction Hierarchies work by preprocessing: repeatedly remove low-importance nodes (small streets), store shortcuts, building a hierarchy. At query time, Dijkstra expands upward through major roads first (getting a “zoomed-out” view), then zooms in for local details. Real numbers: USA road graph with CH reduces query time from ~2 seconds (Dijkstra alone) to ~10-50 milliseconds. Google Maps also uses A* (Dijkstra + heuristic to bias search toward destination), further cutting explored nodes by 50-70%. Distance metric: edge weight = predicted driving time (not road length), incorporating traffic patterns, speed limits, and turn penalties. Reference: From A to B: Algorithms Powering Google Maps Navigation.

LinkedIn — BFS for “Degrees of Separation” & Recommendations

LinkedIn’s “2nd connections” feature uses breadth-first search to find mutual connections. When you visit a profile, LinkedIn queries: “How far is this person from me?” BFS expands your 1st-degree connections (all people you follow), then their 1st-degree connections (your 2nd-degree), up to depth 3. Time complexity: O(V + E) per query, but E is huge (~10 billion edges in the full graph). To avoid full BFS on every query, LinkedIn maintains cached “common connections” for top profiles. Recommendation engine uses graph traversal: “show jobs of companies that my 2nd-degree connections work at.” Practical numbers: computing 2nd-degree connections for 1 million users takes ~5 minutes on distributed BFS (Spark) vs. 3 hours with naive implementation. Reference: LinkedIn Graph Data Architecture.

Gradle & Maven — Topological Sort for Build Dependency Order

Gradle and Maven compile projects with topological sort on the dependency DAG. A project declares dependencies in build.gradle (Gradle) or pom.xml (Maven). Before compiling module C, both B and A must be built. Topological sort determines: build A, then B, then C (any valid topological ordering is correct). Gradle uses a task scheduler that parses build.gradle, constructs a DAG of tasks (compile, test, package), and performs topological sort (Kahn’s algorithm). For a project with 200 modules: naive compilation order could violate deps; topological sort guarantees valid order. Practical: a missing dependency detected at sort time prevents late-stage compile failures. Reference: Gradle Dependency Resolution & Task Ordering.

Kubernetes — Topological Sort for Pod Dependency Resolution

Kubernetes schedules pods respecting pod dependencies using topological sorting. In a microservices deployment, Pod B (web service) depends on Pod A (database); Kubernetes must schedule A before B. The scheduler analyzes depends_on or init containers, builds a DAG, and performs topological sort to compute startup order. For a cluster deploying 500 microservices with deep dependency chains: naive scheduling might start a service before its database; topological sort guarantees valid order. Complexity: O(V + E) where V = pods, E = dependency edges. Real deployment: orchestrating a data pipeline with stages (ingest → clean → analyze → export) uses topological sort to ensure stages run in order. Reference: Kubernetes Scheduling & Pod Dependencies.

OSPF/BGP — Dijkstra for Network Routing

The Open Shortest Path First (OSPF) routing protocol uses Dijkstra’s algorithm to compute shortest paths in a network. Each router runs Dijkstra on the network graph: nodes = routers, edges = links with cost (hop count or latency). Every router floods link-state updates to all neighbors (I’m connected to these neighbors with this cost), building identical local graph copies. Each router runs Dijkstra to compute: “shortest path from me to every other router.” Forwarding table populated with: “to reach router X, send packet via neighbor Y.” Cost metric: usually hop count (cost=1 per link) but can be latency, bandwidth, or custom. Real network: autonomous system with 1000 routers, 5000 links; each Dijkstra run O(E log V) ≈ 50,000 log 1000 ≈ 500,000 ops, computed every 30 seconds or on topology change. BGP (inter-AS routing) uses variants of Dijkstra optimized for scalability. Reference: OSPF Specification — RFC 2328.

Neo4j — Graph Traversal for Pattern Matching & Recommendations

Neo4j’s graph database uses BFS and DFS internally for pattern matching and recommendation queries. Query “find all friends of my friends who live in NYC” uses BFS: traverse 1-hop (friends), 2-hops (friends of friends), filter by location. Query “find longest path from me to a celebrity” uses DFS: explore all paths, track longest. Neo4j stores a property graph: nodes have labels (Person, Company), edges have types (FOLLOWS, WORKS_AT, LIVES_IN). Traversal algorithms exploit this structure. Real use case: recommendation engine queries “users who follow topics I follow” using BFS on the FOLLOWS graph. For a graph with 10M users, 500M FOLLOWS edges: naive traversal (checking all users) takes O(10M) time; BFS from me explores only reachable users (~1000), completing in milliseconds. Neo4j optimizes BFS with index lookups (fast neighbor retrieval) and query planner (reorder traversal steps). Reference: Neo4j Graph Algorithms Library.

PageRank — Graph-Wide Centrality for Relevance

PageRank uses iterative matrix multiplication (power iteration) on the web graph: nodes = web pages, directed edges = links. PageRank(page) = importance score computed as: “what fraction of random walks end at this page?” Approximates: probability that a random web surfer lands here. Algorithm: repeatedly multiply the graph’s transition matrix by a vector of page scores; after ~20-30 iterations, scores converge. PageRank score is used as one signal in Google Search ranking (along with text relevance, links, freshness). Real graph: 50+ billion web pages, trillions of links. Computing PageRank naively O(T * (V + E)) where T = 30 iterations, is infeasible. Google distributes PageRank computation across MapReduce clusters: split graph into partitions, compute local PageRank, iterate. Damping factor (default 0.85) prevents dangling nodes (pages with no outlinks) from accumulating all rank. Reference: PageRank Algorithm — Stanford CS.

Implementation

BFS — Breadth-First Search

        
      
from collections import deque

def bfs(graph, start, goal=None):
    """
    Breadth-first search: explore neighbors level by level.

    Args:
        graph: adjacency list {node: [neighbors]}
        start: starting node
        goal: target node (optional); returns path if specified

    Returns:
        distance dict {node: shortest_distance} or path to goal

    Time: O(V + E)
    Space: O(V) for queue

    Why BFS:
    - Finds shortest path in UNWEIGHTED graphs
    - Explores by distance: 1 hop, 2 hops, etc.
    - Guarantees first reach to goal is shortest
    """
    visited = {start}
    queue = deque([start])
    distance = {start: 0}
    parent = {start: None}

    while queue:
        node = queue.popleft()

        # Early termination if goal found
        if goal and node == goal:
            # Reconstruct path
            path = []
            current = goal
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]

        # Explore all unvisited neighbors
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                distance[neighbor] = distance[node] + 1
                parent[neighbor] = node
                queue.append(neighbor)

    return distance if not goal else None  # Return all distances if no goal

# Example: social network degrees of separation
if __name__ == "__main__":
    network = {
        'Alice': ['Bob', 'Charlie'],
        'Bob': ['Alice', 'Diana'],
        'Charlie': ['Alice', 'Eve'],
        'Diana': ['Bob'],
        'Eve': ['Charlie', 'Frank'],
        'Frank': ['Eve']
    }

    # Find shortest path from Alice to Frank
    path = bfs(network, 'Alice', 'Frank')
    print(f"Path from Alice to Frank: {path}")
    # Output: ['Alice', 'Charlie', 'Eve', 'Frank']

    # Distance from Alice to all reachable nodes
    distances = bfs(network, 'Alice')
    print(f"Distances from Alice: {distances}")
    # Output: {'Alice': 0, 'Bob': 1, 'Charlie': 1, 'Diana': 2, 'Eve': 2, 'Frank': 3}

DFS — Depth-First Search

        
      
def dfs_iterative(graph, start):
    """
    Iterative DFS using explicit stack (avoids recursion depth limit).

    Args:
        graph: adjacency list {node: [neighbors]}
        start: starting node

    Returns:
        postorder: list of nodes in post-order (used for topological sort)

    Time: O(V + E)
    Space: O(V) for stack + recursion

    Why DFS:
    - Detects cycles (back edges in directed graphs)
    - Topological sort via post-order traversal
    - Connected components via flood fill
    - Requires less memory than BFS queue (especially for sparse trees)
    """
    visited = set()
    stack = [start]
    postorder = []

    # Iterative DFS with explicit stack
    while stack:
        node = stack[-1]  # Peek at top

        if node not in visited:
            visited.add(node)

        # Check if any unvisited neighbors remain
        has_unvisited = False
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                has_unvisited = True
                stack.append(neighbor)
                break  # Push one neighbor; continue outer loop

        # All neighbors visited or no neighbors: pop and add to postorder
        if not has_unvisited:
            postorder.append(stack.pop())

    return postorder

def dfs_recursive(graph, start, visited=None, postorder=None):
    """
    Recursive DFS (simpler but risks stack overflow on very deep graphs).
    """
    if visited is None:
        visited = set()
    if postorder is None:
        postorder = []

    visited.add(start)

    for neighbor in graph.get(start, []):
        if neighbor not in visited:
            dfs_recursive(graph, neighbor, visited, postorder)

    postorder.append(start)
    return postorder

# Example: detect cycle in directed graph
if __name__ == "__main__":
    # Graph with a cycle: A -> B -> C -> A
    graph_cyclic = {
        'A': ['B'],
        'B': ['C'],
        'C': ['A'],  # Creates cycle
    }

    def has_cycle(graph):
        """Check if directed graph has cycle using DFS coloring."""
        color = {}  # 0=white (unvisited), 1=gray (visiting), 2=black (visited)

        def visit(node):
            if node in color:
                return color[node] == 1  # Cycle if visiting a gray node

            color[node] = 1  # Mark as visiting (gray)
            for neighbor in graph.get(node, []):
                if visit(neighbor):
                    return True
            color[node] = 2  # Mark as visited (black)
            return False

        for node in graph:
            if node not in color:
                if visit(node):
                    return True
        return False

    print(f"Cyclic graph has cycle: {has_cycle(graph_cyclic)}")  # True
    print(f"DAG has cycle: {has_cycle({'X': ['Y'], 'Y': ['Z'], 'Z': []})}")  # False

Dijkstra’s Algorithm

        
      
import heapq

def dijkstra(graph, start):
    """
    Dijkstra's shortest path algorithm (greedy with min-heap priority queue).

    Args:
        graph: adjacency list {node: [(neighbor, weight), ...]}
        start: source node

    Returns:
        distance: {node: shortest_distance_from_start}
        parent: {node: previous_node_on_shortest_path}

    Time: O((V + E) log V) with binary heap
    Space: O(V)

    Key insight:
    - Always expand the closest unvisited node (greedy)
    - Min-heap ensures we process nodes in distance order
    - Fails on negative weights because the greedy "settle closest node" assumption breaks
    """
    distance = {node: float('inf') for node in graph}
    distance[start] = 0
    parent = {start: None}

    # Min-heap of (distance, node)
    heap = [(0, start)]

    while heap:
        curr_dist, node = heapq.heappop(heap)

        # Skip if already processed with better distance
        if curr_dist > distance[node]:
            continue

        # Relax edges to neighbors
        for neighbor, weight in graph.get(node, []):
            new_dist = distance[node] + weight
            if new_dist < distance[neighbor]:
                distance[neighbor] = new_dist
                parent[neighbor] = node
                heapq.heappush(heap, (new_dist, neighbor))

    return distance, parent

def reconstruct_path(parent, target):
    """Reconstruct path from start to target using parent pointers."""
    path = []
    current = target
    while current is not None:
        path.append(current)
        current = parent[current]
    return path[::-1]

# Example: GPS routing
if __name__ == "__main__":
    # Graph: (source, [(destination, distance), ...])
    graph = {
        'A': [('B', 4), ('C', 2)],
        'B': [('C', 1), ('D', 5)],
        'C': [('D', 8), ('E', 10)],
        'D': [('E', 2)],
        'E': []
    }

    distance, parent = dijkstra(graph, 'A')
    print(f"Shortest distances from A: {distance}")
    # Output: {'A': 0, 'B': 4, 'C': 2, 'D': 9, 'E': 11}

    print(f"Shortest path A -> E: {reconstruct_path(parent, 'E')}")
    # Output: ['A', 'C', 'D', 'E'] with total distance 11

Algorithm Decision Guide

Problem	Algorithm	Complexity	When to Use
Shortest path (unweighted)	BFS	O(V + E)	Social networks, web crawling
Shortest path (weighted, positive)	Dijkstra	O((V+E) log V)	GPS routing, network protocols
Shortest path (negative weights)	Bellman-Ford	O(VE)	Currency arbitrage detection
All-pairs shortest paths	Floyd-Warshall	O(V³)	Small networks, all routes needed
Detect cycle (undirected)	BFS/DFS + union-find	O(V + E)	Dependency detection
Detect cycle (directed)	DFS coloring	O(V + E)	Build system cycle detection
Task ordering / dependencies	Topological sort	O(V + E)	Gradle, Make, Kubernetes
Minimum spanning tree	Prim / Kruskal	O(E log V)	Network design, clustering
Strongly connected components	Tarjan / Kosaraju	O(V + E)	Condensation graph analysis
Bipartite check	BFS 2-coloring	O(V + E)	Matching problems
Centrality / importance	PageRank / Betweenness	O(iterations × E)	Ranking, influence

References

Dijkstra’s Shortest Path Algorithm — Dijkstra (1959) — Original paper; foundational for shortest path computation.
Introduction to Algorithms (CLRS) — Cormen, Leiserson, Rivest, Stein — Chapters 22–24 cover BFS, DFS, shortest paths, topological sort; industry standard reference.
MIT 6.006 — Graph Algorithms — Video lectures on BFS, DFS, Dijkstra, topological sort with visualizations.
The Google PageRank Algorithm — Page, Brin, et al. (1998) — Explains PageRank as random walk on web graph; iterative computation.
From A to B: Algorithms Powering Google Maps Navigation — Details on Dijkstra optimizations (contraction hierarchies), A* search, real-world routing.
Neo4j Graph Data Science — BFS/DFS Algorithms — Built-in graph traversal in production graph database; query patterns.
OSPF Specification — RFC 2328 — Dijkstra’s algorithm in network routing; link-state flooding, shortest path tree.
ByteByteGo — Graph Algorithms — Visual explanation of traversal algorithms and real-world applications; practical examples.

Software Architecture, Algorithms

algorithms

This post is licensed under CC BY 4.0 by the author.

Graph Representations

Key Properties by Algorithm

BFS — Breadth-First Search

DFS — Depth-First Search

Dijkstra’s Algorithm

Topological Sort

When to Use / Avoid

✅ Use BFS

❌ Avoid BFS

✅ Use DFS

❌ Avoid DFS

✅ Use Dijkstra

❌ Avoid Dijkstra

✅ Use Topological Sort

❌ Avoid Topological Sort

How Real Systems Use This

Google Maps & Uber — Dijkstra’s Algorithm with Contraction Hierarchies

LinkedIn — BFS for “Degrees of Separation” & Recommendations

Gradle & Maven — Topological Sort for Build Dependency Order

Kubernetes — Topological Sort for Pod Dependency Resolution

OSPF/BGP — Dijkstra for Network Routing

Neo4j — Graph Traversal for Pattern Matching & Recommendations

PageRank — Graph-Wide Centrality for Relevance

Implementation

BFS — Breadth-First Search

DFS — Depth-First Search

Dijkstra’s Algorithm

Algorithm Decision Guide

References

Trending Tags