Union-Find (Disjoint Set Union) - Complete Guide

Introduction

The Union-Find data structure, also known as the Disjoint Set Union (DSU), is a powerful tool for managing a collection of disjoint sets. It is particularly useful in graph algorithms where we need to dynamically track connected components or detect cycles in an undirected graph.

Unlike traditional graph traversals (BFS/DFS) which can be expensive if run repeatedly on the same graph, Union-Find allows us to perform these connectivity checks in near-constant time using Path Compression and Union by Rank optimizations.

Core Concepts

1. The Disjoint Set

A disjoint set is a set of elements partitioned into a number of non-overlapping subsets. In the context of a graph, each subset represents a connected component.

2. The Operations

The DSU supports two primary operations:

Find(x): Determines which subset a particular element x is in. It returns the representative (or root) of the subset containing x.
Union(x, y): Merges the subsets containing elements x and y. If they are already in the same subset, this operation does nothing.

Naive Implementation

In a naive implementation, we can represent the sets using a tree structure where each node points to its parent. The root node points to itself.

function find(x: number): number {
  if (x === parent[x]) return x;
  return find(parent[x]);
}

function union(x: number, y: number): void {
  const rootX = find(x);
  const rootY = find(y);
  if (rootX !== rootY) {
    parent[rootX] = rootY;
  }
}

The Problem: If we simply attach one tree to another, the tree can become skewed (degenerate into a linked list). In the worst case, find operations can take O(N) time, making the structure inefficient for large graphs.

Optimizations

To ensure near-constant time complexity, we implement two optimizations:

1. Path Compression

Path compression flattens the structure of the tree during the find operation. When we find the root of a node, we update the parent of every node along the path to point directly to the root.

Visual:

Before: 4 -> 3 -> 1

After: 4 -> 1, 3 -> 1

This makes future find operations on these nodes instant.

2. Union by Rank (or Size)

Union by Rank ensures that the tree remains balanced. When merging two sets, we attach the tree with the smaller height (rank) under the tree with the larger height.

Rank: An upper bound on the height of the tree.
Size: The number of nodes in the tree.

This prevents the tree from becoming skewed during the union operation.

Complete Implementation (TypeScript)

Here is a robust implementation of the Union-Find data structure using both optimizations.

class UnionFind {
  private parent: number[];
  private rank: number[];

  constructor(size: number) {
    this.parent = new Array(size);
    this.rank = new Array(size).fill(0);

    // Initialize each element as its own parent (self-loop)
    for (let i = 0; i < size; i++) {
      this.parent[i] = i;
    }
  }

  /**
   * Finds the root of the set containing x.
   * Implements Path Compression.
   */
  find(x: number): number {
    if (this.parent[x] !== x) {
      // Path Compression: Make x point directly to the root
      this.parent[x] = this.find(this.parent[x]);
    }
    return this.parent[x];
  }

  /**
   * Unions the sets containing x and y.
   * Implements Union by Rank.
   * Returns true if they were in different sets, false otherwise.
   */
  union(x: number, y: number): boolean {
    const rootX = this.find(x);
    const rootY = this.find(y);

    // If they are already in the same set, no union needed
    if (rootX === rootY) return false;

    // Union by Rank: Attach smaller rank tree under larger rank tree
    if (this.rank[rootX] < this.rank[rootY]) {
      this.parent[rootX] = rootY;
    } else if (this.rank[rootX] > this.rank[rootY]) {
      this.parent[rootY] = rootX;
    } else {
      // If ranks are equal, attach one to the other and increment rank
      this.parent[rootY] = rootX;
      this.rank[rootX]++;
    }

    return true;
  }

  /**
   * Checks if two elements are in the same set.
   */
  connected(x: number, y: number): boolean {
    return this.find(x) === this.find(y);
  }
}

// Usage Example
const uf = new UnionFind(5); // 5 elements: 0, 1, 2, 3, 4

uf.union(0, 1); // Merge 0 and 1
uf.union(2, 3); // Merge 2 and 3

console.log(uf.connected(0, 1)); // true
console.log(uf.connected(0, 2)); // false

uf.union(1, 2); // Merge 1 and 2 (connects the two components)
console.log(uf.connected(0, 2)); // true

Complexity Analysis

With both optimizations applied, the amortized time complexity for both find and union operations is O(α(N)), where α(N) is the Inverse Ackermann function.

α(N) grows extremely slowly. For all practical purposes (even for N up to 10^6 or 10^7), α(N) is less than 5.
Therefore, we treat the time complexity as O(1).

Applications in Graph Algorithms

1. Kruskal's Algorithm (Minimum Spanning Tree)

Kruskal's algorithm sorts all edges by weight and adds them to the MST one by one. Before adding an edge (u, v), we must check if u and v are already connected. If they are, adding the edge would create a cycle. Union-Find is the perfect data structure for this check.

2. Detecting Cycles in an Undirected Graph

While processing edges, if find(u) === find(v) for an edge (u, v), it means u and v are already connected, forming a cycle.

3. Number of Connected Components

Initialize a Union-Find structure with N nodes. Iterate through all edges and perform union. Finally, count the number of unique roots (elements where parent[i] === i).

4. Social Networks (Friends of Friends)

Used to determine if two people are connected through a chain of friends (directly or indirectly) or to find the number of friend groups in a social network.

Common Mistakes to Avoid

Forgetting Path Compression: Without path compression, the find operation degrades to O(N) in the worst case, making the structure useless for large graphs.
Incorrect Rank Updates: When merging two trees of equal rank, ensure you increment the rank of the new root. If you forget this, the tree height might grow unnecessarily.
1-based vs 0-based Indexing: Be consistent. If your graph nodes are labeled 1 to N, ensure your Union-Find array size is N+1 and you initialize index 0 as well.

Summary

The Union-Find data structure is an essential tool for any competitive programmer or software engineer dealing with graphs. Its ability to handle dynamic connectivity problems with near-constant time complexity makes it superior to naive recursive or iterative approaches for problems involving cycle detection and connected components.