Substring Search Algorithms | KMP & Rabin-Karp | String

Overview

Substring search (also called pattern matching or string searching) is the problem of finding all occurrences of a pattern string P within a text string T. This is fundamental to many applications: text editors, search engines, DNA sequence analysis, and more.

While the naive approach works, several advanced algorithms optimize this problem:

Naive/Brute Force — Simple but O(nm) time
KMP (Knuth-Morris-Pratt) — O(n + m) time using failure function
Rabin-Karp — O(n + m) average time using rolling hash
Z-Algorithm — O(n + m) time using Z-array
Boyer-Moore — Often sublinear, skips characters

Pattern 1: Naive/Brute Force

The simplest approach: check every possible starting position.

function naiveSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;

for (let i = 0; i <= n - m; i++) {
let j = 0;
while (j < m && text[i + j] === pattern[j]) {
j++;
}
if (j === m) {
indices.push(i);
}
}

return indices;
}

Time: O(n * m), Space: O(1)

When to use: Short patterns, simple problems, or when you need a quick implementation.

Pattern 2: KMP (Knuth-Morris-Pratt)

KMP avoids re-checking characters by building a "failure function" (LPS array) that tells us where to resume matching after a mismatch.

Key insight: If we've matched k characters and then mismatch, we can skip ahead based on the longest proper prefix that's also a suffix of the matched portion.

function kmpSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;

// Build LPS (Longest Proper Prefix which is also Suffix) array
const lps = buildLPS(pattern);

let i = 0; // index for text
let j = 0; // index for pattern

while (i < n) {
if (text[i] === pattern[j]) {
i++;
j++;
}

if (j === m) {
indices.push(i - j); // Found match
j = lps[j - 1]; // Continue searching
} else if (i < n && text[i] !== pattern[j]) {
if (j !== 0) {
j = lps[j - 1]; // Don't match lps[0..lps[j-1]] characters
} else {
i++;
}
}
}

return indices;
}

function buildLPS(pattern: string): number[] {
const m = pattern.length;
const lps = new Array(m).fill(0);
let len = 0; // Length of previous longest prefix suffix
let i = 1;

while (i < m) {
if (pattern[i] === pattern[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len !== 0) {
len = lps[len - 1]; // Don't increment i
} else {
lps[i] = 0;
i++;
}
}
}

return lps;
}

Time: O(n + m), Space: O(m)

When to use: When you need guaranteed linear time, or when searching for the same pattern multiple times.

Pattern 3: Rabin-Karp

Uses rolling hash to compare substrings efficiently. If hash matches, verify with actual comparison.

function rabinKarpSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;

if (m > n) return indices;

const base = 256; // Base for hash
const mod = 1000000007; // Large prime

// Calculate hash of pattern and first window of text
let patternHash = 0;
let textHash = 0;
let h = 1; // h = base^(m-1) % mod

for (let i = 0; i < m - 1; i++) {
h = (h * base) % mod;
}

for (let i = 0; i < m; i++) {
patternHash = (base * patternHash + pattern.charCodeAt(i)) % mod;
textHash = (base * textHash + text.charCodeAt(i)) % mod;
}

// Slide the window
for (let i = 0; i <= n - m; i++) {
if (patternHash === textHash) {
// Verify actual match
let match = true;
for (let j = 0; j < m; j++) {
if (text[i + j] !== pattern[j]) {
match = false;
break;
}
}
if (match) {
indices.push(i);
}
}

// Calculate hash for next window
if (i < n - m) {
textHash = (base * (textHash - text.charCodeAt(i) * h) + text.charCodeAt(i + m)) % mod;
if (textHash < 0) textHash += mod;
}
}

return indices;
}

Time: O(n + m) average, O(n * m) worst case (rare), Space: O(1)

When to use: When searching for multiple patterns, or when you need a simple rolling hash approach.

Pattern 4: Z-Algorithm

Builds a Z-array where Z[i] is the length of the longest substring starting at i that matches the prefix of the string.

function zAlgorithmSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const combined = pattern + '$' + text; // $ is a separator not in text/pattern
const z = buildZArray(combined);
const m = pattern.length;

for (let i = m + 1; i < combined.length; i++) {
if (z[i] === m) {
indices.push(i - m - 1); // Adjust for separator
}
}

return indices;
}

function buildZArray(s: string): number[] {
const n = s.length;
const z = new Array(n).fill(0);
let l = 0, r = 0; // Z-box boundaries

for (let i = 1; i < n; i++) {
if (i <= r) {
z[i] = Math.min(r - i + 1, z[i - l]);
}

while (i + z[i] < n && s[z[i]] === s[i + z[i]]) {
z[i]++;
}

if (i + z[i] - 1 > r) {
l = i;
r = i + z[i] - 1;
}
}

return z;
}

Time: O(n + m), Space: O(n + m)

When to use: When you need the Z-array for other purposes, or for pattern matching with preprocessing.

Pattern 5: Simplified Boyer-Moore

Boyer-Moore skips characters by matching from right to left and using bad character rule.

function boyerMooreSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;

// Build bad character table
const badChar = new Map();
for (let i = 0; i < m; i++) {
badChar.set(pattern[i], i);
}

let s = 0; // Shift

while (s <= n - m) {
let j = m - 1;

// Match from right to left
while (j >= 0 && pattern[j] === text[s + j]) {
j--;
}

if (j < 0) {
indices.push(s);
// Shift by pattern length (or use good suffix rule for better performance)
s += (s + m < n) ? m - (badChar.get(text[s + m]) ?? -1) : 1;
} else {
// Shift by bad character rule
const badCharShift = badChar.get(text[s + j]);
s += Math.max(1, j - (badCharShift ?? -1));
}
}

return indices;
}

Time: O(n * m) worst case, often sublinear in practice, Space: O(m)

When to use: When pattern is long and alphabet is large (allows more skips).

When to Use Each Algorithm

Algorithm	Best For	Time Complexity	Space Complexity
Naive	Short patterns, simple problems	O(n × m)	O(1)
KMP	Guaranteed linear time, repeated searches	O(n + m)	O(m)
Rabin-Karp	Multiple patterns, rolling hash needed	O(n + m) avg	O(1)
Z-Algorithm	Need Z-array, pattern preprocessing	O(n + m)	O(n + m)
Boyer-Moore	Long patterns, large alphabet	O(n × m) worst, often sublinear	O(m)

General Rule

Pattern Length	Recommendation
Short (< 10 chars)	Use naive or built-in `indexOf`
Medium, single search	Use KMP or Rabin-Karp
Multiple patterns	Use Rabin-Karp or build a Trie
Long patterns, large alphabet	Consider Boyer-Moore

Common Variations

Variation 1: Find All Occurrences

Return all starting indices where pattern appears.

function findAllOccurrences(text: string, pattern: string): number[] {
return kmpSearch(text, pattern); // or any algorithm above
}

Variation 2: Check if Pattern Exists

Return boolean indicating if pattern is found.

function containsPattern(text: string, pattern: string): boolean {
const indices = kmpSearch(text, pattern);
return indices.length > 0;
}

Variation 3: Count Occurrences

Return count of how many times pattern appears.

function countOccurrences(text: string, pattern: string): number {
return kmpSearch(text, pattern).length;
}

Variation 4: Replace All Occurrences

Replace all occurrences of pattern with replacement string.

function replaceAll(text: string, pattern: string, replacement: string): string {
const indices = kmpSearch(text, pattern);
if (indices.length === 0) return text;

const result: string[] = [];
let lastIndex = 0;

for (const idx of indices) {
result.push(text.substring(lastIndex, idx));
result.push(replacement);
lastIndex = idx + pattern.length;
}

result.push(text.substring(lastIndex));
return result.join('');
}

Template (KMP)

function substringSearchTemplate(text: string, pattern: string): number[] {
const indices: number[] = [];
const lps = buildLPS(pattern);
let i = 0, j = 0;

while (i < text.length) {
if (text[i] === pattern[j]) {
i++;
j++;
}

if (j === pattern.length) {
indices.push(i - j);
j = lps[j - 1];
} else if (i < text.length && text[i] !== pattern[j]) {
if (j !== 0) j = lps[j - 1];
else i++;
}
}

return indices;
}

Time and Space Complexity Summary

Naive: O(n * m) time, O(1) space
KMP: O(n + m) time, O(m) space
Rabin-Karp: O(n + m) average, O(n * m) worst, O(1) space
Z-Algorithm: O(n + m) time, O(n + m) space
Boyer-Moore: O(n * m) worst, often sublinear, O(m) space

Practice Tips

Start with naive — Understand the problem first
Learn KMP — Most commonly asked in interviews
Understand LPS array — Key to KMP algorithm
Practice rolling hash — Useful for many string problems
Handle edge cases — Empty pattern, pattern longer than text, all matches

Related Concepts

String Matching — Core problem solved by these algorithms
Rolling Hash — Technique used in Rabin-Karp
Dynamic Programming — LPS array construction uses DP-like thinking
Trie — Alternative for multiple pattern matching

Substring search algorithms are fundamental to many string processing tasks. Master KMP and Rabin-Karp, and you'll be well-prepared for interview questions involving pattern matching.