Union-Find (Disjoint Set Union) - Complete Guide
Union-Find (Disjoint Set Union) - Complete Guide
Introduction
The Union-Find data structure, also known as the Disjoint Set Union (DSU), is a powerful tool for managing a collection of disjoint sets. It is particularly useful in graph algorithms where we need to dynamically track connected components or detect cycles in an undirected graph.
Unlike traditional graph traversals (BFS/DFS) which can be expensive if run repeatedly on the same graph, Union-Find allows us to perform these connectivity checks in near-constant time using Path Compression and Union by Rank optimizations.
Core Concepts
1. The Disjoint Set
A disjoint set is a set of elements partitioned into a number of non-overlapping subsets. In the context of a graph, each subset represents a connected component.
2. The Operations
The DSU supports two primary operations:
- Find(x): Determines which subset a particular element
xis in. It returns the representative (or root) of the subset containingx. - Union(x, y): Merges the subsets containing elements
xandy. If they are already in the same subset, this operation does nothing.
Naive Implementation
In a naive implementation, we can represent the sets using a tree structure where each node points to its parent. The root node points to itself.
function find(x: number): number {
if (x === parent[x]) return x;
return find(parent[x]);
}
function union(x: number, y: number): void {
const rootX = find(x);
const rootY = find(y);
if (rootX !== rootY) {
parent[rootX] = rootY;
}
}The Problem: If we simply attach one tree to another, the tree can become skewed (degenerate into a linked list). In the worst case, find operations can take O(N) time, making the structure inefficient for large graphs.
Optimizations
To ensure near-constant time complexity, we implement two optimizations:
1. Path Compression
Path compression flattens the structure of the tree during the find operation. When we find the root of a node, we update the parent of every node along the path to point directly to the root.
Visual:
Before: 4 -> 3 -> 1
After: 4 -> 1, 3 -> 1
This makes future find operations on these nodes instant.
2. Union by Rank (or Size)
Union by Rank ensures that the tree remains balanced. When merging two sets, we attach the tree with the smaller height (rank) under the tree with the larger height.
- Rank: An upper bound on the height of the tree.
- Size: The number of nodes in the tree.
This prevents the tree from becoming skewed during the union operation.
Complete Implementation (TypeScript)
Here is a robust implementation of the Union-Find data structure using both optimizations.
class UnionFind {
private parent: number[];
private rank: number[];
constructor(size: number) {
this.parent = new Array(size);
this.rank = new Array(size).fill(0);
// Initialize each element as its own parent (self-loop)
for (let i = 0; i < size; i++) {
this.parent[i] = i;
}
}
/**
* Finds the root of the set containing x.
* Implements Path Compression.
*/
find(x: number): number {
if (this.parent[x] !== x) {
// Path Compression: Make x point directly to the root
this.parent[x] = this.find(this.parent[x]);
}
return this.parent[x];
}
/**
* Unions the sets containing x and y.
* Implements Union by Rank.
* Returns true if they were in different sets, false otherwise.
*/
union(x: number, y: number): boolean {
const rootX = this.find(x);
const rootY = this.find(y);
// If they are already in the same set, no union needed
if (rootX === rootY) return false;
// Union by Rank: Attach smaller rank tree under larger rank tree
if (this.rank[rootX] < this.rank[rootY]) {
this.parent[rootX] = rootY;
} else if (this.rank[rootX] > this.rank[rootY]) {
this.parent[rootY] = rootX;
} else {
// If ranks are equal, attach one to the other and increment rank
this.parent[rootY] = rootX;
this.rank[rootX]++;
}
return true;
}
/**
* Checks if two elements are in the same set.
*/
connected(x: number, y: number): boolean {
return this.find(x) === this.find(y);
}
}
// Usage Example
const uf = new UnionFind(5); // 5 elements: 0, 1, 2, 3, 4
uf.union(0, 1); // Merge 0 and 1
uf.union(2, 3); // Merge 2 and 3
console.log(uf.connected(0, 1)); // true
console.log(uf.connected(0, 2)); // false
uf.union(1, 2); // Merge 1 and 2 (connects the two components)
console.log(uf.connected(0, 2)); // trueComplexity Analysis
With both optimizations applied, the amortized time complexity for both find and union operations is O(α(N)), where α(N) is the Inverse Ackermann function.
- α(N) grows extremely slowly. For all practical purposes (even for N up to 10^6 or 10^7), α(N) is less than 5.
- Therefore, we treat the time complexity as O(1).
Applications in Graph Algorithms
1. Kruskal's Algorithm (Minimum Spanning Tree)
Kruskal's algorithm sorts all edges by weight and adds them to the MST one by one. Before adding an edge (u, v), we must check if u and v are already connected. If they are, adding the edge would create a cycle. Union-Find is the perfect data structure for this check.
2. Detecting Cycles in an Undirected Graph
While processing edges, if find(u) === find(v) for an edge (u, v), it means u and v are already connected, forming a cycle.
3. Number of Connected Components
Initialize a Union-Find structure with N nodes. Iterate through all edges and perform union. Finally, count the number of unique roots (elements where parent[i] === i).
4. Social Networks (Friends of Friends)
Used to determine if two people are connected through a chain of friends (directly or indirectly) or to find the number of friend groups in a social network.
Common Mistakes to Avoid
- Forgetting Path Compression: Without path compression, the
findoperation degrades to O(N) in the worst case, making the structure useless for large graphs. - Incorrect Rank Updates: When merging two trees of equal rank, ensure you increment the rank of the new root. If you forget this, the tree height might grow unnecessarily.
- 1-based vs 0-based Indexing: Be consistent. If your graph nodes are labeled 1 to N, ensure your Union-Find array size is N+1 and you initialize index 0 as well.
Summary
The Union-Find data structure is an essential tool for any competitive programmer or software engineer dealing with graphs. Its ability to handle dynamic connectivity problems with near-constant time complexity makes it superior to naive recursive or iterative approaches for problems involving cycle detection and connected components.