Topological Sort
Introduction
Concept: Topological sort produces a linear ordering of nodes in a directed acyclic graph (DAG) such that each node comes before all nodes it points to. If there’s an edge U → V, then U appears earlier than V. This is only possible if the graph has no cycles.
Typical Use Cases: Scheduling problems (e.g. course prerequisite ordering, build systems), task dependency resolution. Interview problems like Course Schedule or Build Order are classic examples.
Keywords/Cues: “Prerequisites,” “dependency,” “precede,” “order of tasks,” “before/after relationship,” or explicit mention of a DAG.
Core Idea / Mental Model
Think of ordering tasks by prerequisites – always pick a task with no remaining prerequisites (no incoming edges), do it, then remove it and update prerequisites for others. This repeats until all tasks are ordered. A mental model is “repeatedly remove nodes with no incoming edges.”
Analogy: Building a house: you can’t paint walls before the foundation is laid. Maintain a list of tasks ready to do, do one, and update which new tasks become free.
Common methods:
- Kahn’s Algorithm (BFS): Track in-degrees. Pick nodes with in-degree 0, output, remove their outgoing edges (reducing neighbors' in-degrees). Detects cycles if no nodes have in-degree 0 but unprocessed nodes remain.
- DFS-based: Perform DFS. When finishing a node, add to a stack (or front of list). Produces reverse post-order, a topological order.
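The DFS-based approach can be sketched as follows. This is a minimal version (function name topo_sort_dfs is my own) that uses three-state coloring so it also detects cycles; the graph format matches the Kahn's template below, a dict of adjacency lists.

```python
def topo_sort_dfs(graph):
    # graph: dict mapping node -> list of adjacent nodes
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, in progress, finished
    color = {node: WHITE for node in graph}
    order = []

    def dfs(node):
        color[node] = GRAY
        for nei in graph[node]:
            if color[nei] == GRAY:  # back edge to an in-progress node
                raise ValueError("Graph is not a DAG - cycle detected")
            if color[nei] == WHITE:
                dfs(nei)
        color[node] = BLACK
        order.append(node)  # append on finish -> post-order

    for node in graph:  # handles disconnected components
        if color[node] == WHITE:
            dfs(node)
    return order[::-1]  # reverse post-order is a topological order

# topo_sort_dfs({"A":["B","C"], "B":["D"], "C":["D"], "D":[]})
# gives one valid order, e.g. ['A', 'C', 'B', 'D']
```

Note the difference from Kahn's: nodes are recorded when the DFS finishes them, so the list must be reversed at the end.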
Base Template (Python - Kahn's Algorithm)
from collections import deque
def topo_sort(graph):
    # graph: dict mapping node -> list of adjacent nodes
    indegree = {node: 0 for node in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    queue = deque([node for node in graph if indegree[node] == 0])
    topo_order = []
    while queue:
        node = queue.popleft()
        topo_order.append(node)
        for nei in graph[node]:
            indegree[nei] -= 1
            if indegree[nei] == 0:
                queue.append(nei)
    if len(topo_order) != len(graph):
        raise ValueError("Graph is not a DAG - cycle detected")
    return topo_order
# Example: graph = {"A":["B","C"], "B":["D"], "C":["D"], "D":[]}
# order = topo_sort(graph) -> ['A', 'B', 'C', 'D'] (or A,C,B,D)
Example walkthrough with graph = {"A":["B","C"], "B":["D"], "C":["D"], "D":[]}:
- In-degrees: {A:0, B:1, C:1, D:2}. Queue: [A]. Order: [].
- Dequeue A. Order: [A]. B in-degree: 0, C in-degree: 0. Enqueue B, C. Queue: [B,C].
- Dequeue B. Order: [A,B]. D in-degree: 1. Queue: [C].
- Dequeue C. Order: [A,B,C]. D in-degree: 0. Enqueue D. Queue: [D].
- Dequeue D. Order: [A,B,C,D]. Queue: [].
- Result: [A,B,C,D].
Time and Space Complexity
Time: O(V + E) for both Kahn’s and DFS-based; each node and edge is visited once.
Space: O(V + E) for graph, in-degree map/visited set O(V), queue/stack O(V).
Common Pitfalls
- Forgetting cycle checks (topo sort only for DAGs).
- Not initializing all nodes (especially isolated ones).
- Queue/stack misuse (e.g., using wrong data structure or wrong DFS order).
- Incorrectly updating in-degrees.
- Assuming a unique topological order (multiple can exist).
- Handling disconnected graphs (ensure all components are processed).
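On the non-uniqueness pitfall: when a problem asks for the lexicographically smallest topological order, Kahn's algorithm can be run with a min-heap in place of the FIFO queue, so the smallest available node is always processed first. A minimal sketch (function name topo_sort_smallest is my own):

```python
import heapq

def topo_sort_smallest(graph):
    # Kahn's algorithm, but always process the smallest zero in-degree node
    indegree = {node: 0 for node in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    heap = [node for node in graph if indegree[node] == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        node = heapq.heappop(heap)  # smallest available node
        order.append(node)
        for nei in graph[node]:
            indegree[nei] -= 1
            if indegree[nei] == 0:
                heapq.heappush(heap, nei)
    if len(order) != len(graph):
        raise ValueError("Graph is not a DAG - cycle detected")
    return order

# {"A":["D"], "C":["B"], "B":["D"], "D":[]} has several valid orders
# (e.g. C,A,B,D); the heap picks ['A', 'C', 'B', 'D']
```

The only changes from the base template are heapify/heappush/heappop in place of deque operations; complexity becomes O((V + E) log V).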
Dijkstra’s Algorithm
Introduction
Concept: Finds shortest path distances from a single source to all other nodes in a weighted graph (non-negative edge weights). Produces a shortest-path tree.
Typical Use Cases: Pathfinding in maps (GPS), network routing, puzzle/game AI. Problems like minimum cost path in a grid or cheapest flight itinerary.
Keywords/Cues: “Shortest path,” “least cost,” “minimum distance,” weighted graph with positive weights. If negative weights, Dijkstra is not suitable (use Bellman-Ford).
Core Idea / Mental Model
A greedy extension of BFS for weighted graphs. Explores outward from source, always taking the next “closest” unvisited node. Maintains tentative distances (initially ∞, source 0). Pick node with smallest current distance, finalize it, relax (update) neighbors’ distances.
Mental Model: Water rippling out from source, reaching closest nodes first.
Priority Queue (Min-Heap): Efficiently selects the next closest node (distance, node).
Relaxation: If dist[U] + weight(U,V) < dist[V], update dist[V].
Base Template (Python)
import heapq
def dijkstra(graph, start):
    # graph: node -> list of (neighbor, weight)
    dist = {node: float('inf') for node in graph}
    dist[start] = 0
    pq = [(0, start)]  # (distance, node)
    visited = set()
    while pq:
        cur_dist, node = heapq.heappop(pq)
        if node in visited:
            continue  # stale entry, node already finalized
        visited.add(node)
        for neighbor, weight in graph[node]:
            new_dist = cur_dist + weight
            if new_dist < dist[neighbor]:
                dist[neighbor] = new_dist
                heapq.heappush(pq, (new_dist, neighbor))
    return dist
# graph = {'A':[('B',1),('C',4)], 'B':[('C',2),('D',5)], 'C':[('D',1)], 'D':[]}
# dijkstra(graph, 'A') -> {'A':0, 'B':1, 'C':3, 'D':4}
- Init: dist={A:0, B:inf, C:inf, D:inf}. PQ=[(0,A)]. Visited={}.
- Pop (0,A). Visited={A}. Relax A's neighbors: dist[B]=1, PQ=[(1,B)]; dist[C]=4, PQ=[(1,B),(4,C)].
- Pop (1,B). Visited={A,B}. Relax B's neighbors: dist[C]=min(4, 1+2=3), PQ=[(3,C),(4,C)]; dist[D]=1+5=6, PQ=[(3,C),(4,C),(6,D)].
- Pop (3,C). Visited={A,B,C}. Relax C's neighbors: dist[D]=min(6, 3+1=4), PQ=[(4,C),(4,D),(6,D)].
- Pop (4,C) (stale, C visited).
- Pop (4,D). Visited={A,B,C,D}. No unvisited neighbors. PQ=[(6,D)].
- Pop (6,D) (stale, D visited). PQ empty.
- Result: {A:0, B:1, C:3, D:4}.
Time and Space Complexity
Time: O((V + E) log V) with a binary min-heap, often quoted as O(E log V) for connected graphs. Without a heap (linear scan for the minimum), O(V²).
Space: O(V + E) for graph, dist map O(V), priority queue O(V) or O(E), visited set O(V).
Common Pitfalls
- Using with negative edge weights (incorrect results).
- Not using a heap (O(V²) instead of O(E log V)).
- Handling stale entries in priority queue incorrectly.
- Improper initialization of distances (source to 0, others to ∞).
- Early stopping logic errors if only one target is needed.
- Forgetting to reconstruct path if required (need predecessor map).
- Not considering disconnected graph components.
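Two of the pitfalls above, path reconstruction and early stopping for a single target, can be handled with a small variant of the base template. A sketch (function name dijkstra_path is my own): a predecessor map records how each node was reached, and the loop can break as soon as the target is popped, since its distance is final at that point.

```python
import heapq

def dijkstra_path(graph, start, target):
    # graph: node -> list of (neighbor, weight); returns (distance, path)
    dist = {node: float('inf') for node in graph}
    dist[start] = 0
    prev = {start: None}  # predecessor map for path reconstruction
    pq = [(0, start)]
    visited = set()
    while pq:
        cur_dist, node = heapq.heappop(pq)
        if node in visited:
            continue  # stale entry
        visited.add(node)
        if node == target:
            break  # safe early stop: target's distance is now final
        for neighbor, weight in graph[node]:
            new_dist = cur_dist + weight
            if new_dist < dist[neighbor]:
                dist[neighbor] = new_dist
                prev[neighbor] = node
                heapq.heappush(pq, (new_dist, neighbor))
    if target not in visited:
        return float('inf'), []  # target unreachable from start
    path = []
    node = target
    while node is not None:  # walk predecessors back to the source
        path.append(node)
        node = prev[node]
    return dist[target], path[::-1]

# graph = {'A':[('B',1),('C',2)], 'B':[('C',2),('D',5)], 'C':[('D',1)], 'D':[]}
# dijkstra_path(graph, 'A', 'D') -> shortest distance and one shortest path
```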
Minimum Spanning Tree (MST)
Introduction
Concept: A subset of edges in a weighted, undirected, connected graph that connects all vertices with the minimum possible total edge weight. It's a spanning tree (no cycles, all vertices connected).
Typical Use Cases: Network design (minimal cost cabling), VLSI chip design, clustering. Problems like "connect all points with minimum total distance."
Keywords/Cues: “Connect all... with minimum cost,” “network connecting all nodes,” “minimum infrastructure,” “least total distance.” Do not confuse with shortest path.
Core Idea / Mental Model
MST algorithms are greedy. Two main types:
- Kruskal’s Algorithm: Sort all edges by weight. Add edges from smallest weight, skipping if it forms a cycle (use Union-Find). Continue until V-1 edges are added.
- Prim’s Algorithm: Start from an arbitrary vertex. Grow the tree by adding the smallest weight edge that connects a new vertex to the current tree. Repeat until all vertices are included.
Mental Model: Pick cheapest connections while avoiding loops, until all nodes are connected. Kruskal's is global; Prim's expands locally.
Base Template (Python - Prim's Algorithm)
import heapq
def prim_mst(graph, start):
    # graph: node -> list of (neighbor, weight) for undirected graph
    visited = set([start])
    mst_edges = []
    total_cost = 0
    edges_heap = []  # (weight, u, v) where u in visited, v not
    for neighbor, weight in graph[start]:
        heapq.heappush(edges_heap, (weight, start, neighbor))
    while edges_heap and len(visited) < len(graph):
        weight, u, v = heapq.heappop(edges_heap)
        if v in visited:
            continue  # avoid cycle
        visited.add(v)
        mst_edges.append((u, v, weight))
        total_cost += weight
        for neighbor_of_v, weight_v_n in graph[v]:
            if neighbor_of_v not in visited:
                heapq.heappush(edges_heap, (weight_v_n, v, neighbor_of_v))
    return mst_edges, total_cost
# graph = {1:[(2,2),(3,10)], 2:[(1,2),(3,3)], 3:[(1,10),(2,3)]}
# prim_mst(graph, 1) -> ([(1,2,2),(2,3,3)], 5)
- Visited={1}. Heap=[(2,1,2), (10,1,3)]. MST=[], Cost=0.
- Pop (2,1,2). Add 2 to visited. MST=[(1,2,2)], Cost=2. Add edges from 2: (3,2,3) to heap. Heap=[(3,2,3), (10,1,3)].
- Pop (3,2,3). Add 3 to visited. MST=[(1,2,2),(2,3,3)], Cost=5. Add edges from 3: (10,3,1) - 1 is visited, skip. Heap=[(10,1,3)].
- All vertices visited, so the loop stops; the stale edge (10,1,3) left in the heap is never used.
- Result: Edges [(1,2,2),(2,3,3)], Cost 5.
Time and Space Complexity
Time: O(E log V) for Prim’s (with heap) and Kruskal’s (sorting edges).
Space: O(V + E) for graph, heap/edge list, visited set/union-find structure.
Common Pitfalls
- Forgetting cycle checks (Kruskal: union-find; Prim's: visited set).
- Handling disconnected graphs (a spanning tree only exists for a connected graph; applied per component, the result is a minimum spanning forest).
- Edge cases like single/two-node graphs.
- Not using weights correctly (must pick min-weight valid edge).
- Confusing MST with Shortest Path problems.
- Incorrect Union-Find implementation for Kruskal.
- Ending with incorrect number of edges (MST has V-1 edges).
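For contrast with the Prim's template, the Kruskal's approach from the Core Idea section can be sketched with a minimal Union-Find (path compression only; union by rank omitted for brevity; function names are my own):

```python
def kruskal_mst(num_vertices, edges):
    # edges: list of (weight, u, v) for an undirected graph,
    # vertices numbered 0..num_vertices-1
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression (halving)
            x = parent[x]
        return x

    mst_edges = []
    total_cost = 0
    for weight, u, v in sorted(edges):  # consider edges from smallest weight
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # u and v already connected: edge would form a cycle
        parent[ru] = rv  # union the two components
        mst_edges.append((u, v, weight))
        total_cost += weight
        if len(mst_edges) == num_vertices - 1:
            break  # an MST has exactly V-1 edges
    return mst_edges, total_cost

# Same graph as the Prim's example, with vertices 1..3 renumbered 0..2:
# kruskal_mst(3, [(2,0,1), (3,1,2), (10,0,2)]) -> ([(0,1,2), (1,2,3)], 5)
```

If fewer than V-1 edges are collected by the end, the graph is disconnected and no spanning tree exists (last pitfall above).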