Substring Search
Overview
Substring search (also called pattern matching or string searching) is the problem of finding all occurrences of a pattern string P within a text string T. This is fundamental to many applications: text editors, search engines, DNA sequence analysis, and more.
While the naive approach works, several advanced algorithms optimize this problem:
- Naive/Brute Force β Simple but O(nm) time
- KMP (Knuth-Morris-Pratt) β O(n + m) time using failure function
- Rabin-Karp β O(n + m) average time using rolling hash
- Z-Algorithm β O(n + m) time using Z-array
- Boyer-Moore β Often sublinear, skips characters
Pattern 1: Naive/Brute Force
The simplest approach: check every possible starting position.
function naiveSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;
for (let i = 0; i <= n - m; i++) {
let j = 0;
while (j < m && text[i + j] === pattern[j]) {
j++;
}
if (j === m) {
indices.push(i);
}
}
return indices;
}Time: O(n * m), Space: O(1)
When to use: Short patterns, simple problems, or when you need a quick implementation.
Pattern 2: KMP (Knuth-Morris-Pratt)
KMP avoids re-checking characters by building a "failure function" (LPS array) that tells us where to resume matching after a mismatch.
Key insight: If we've matched k characters and then mismatch, we can skip ahead based on the longest proper prefix that's also a suffix of the matched portion.
function kmpSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;
// Build LPS (Longest Proper Prefix which is also Suffix) array
const lps = buildLPS(pattern);
let i = 0; // index for text
let j = 0; // index for pattern
while (i < n) {
if (text[i] === pattern[j]) {
i++;
j++;
}
if (j === m) {
indices.push(i - j); // Found match
j = lps[j - 1]; // Continue searching
} else if (i < n && text[i] !== pattern[j]) {
if (j !== 0) {
j = lps[j - 1]; // Don't match lps[0..lps[j-1]] characters
} else {
i++;
}
}
}
return indices;
}
function buildLPS(pattern: string): number[] {
const m = pattern.length;
const lps = new Array(m).fill(0);
let len = 0; // Length of previous longest prefix suffix
let i = 1;
while (i < m) {
if (pattern[i] === pattern[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len !== 0) {
len = lps[len - 1]; // Don't increment i
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}Time: O(n + m), Space: O(m)
When to use: When you need guaranteed linear time, or when searching for the same pattern multiple times.
Pattern 3: Rabin-Karp
Uses rolling hash to compare substrings efficiently. If hash matches, verify with actual comparison.
function rabinKarpSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;
if (m > n) return indices;
const base = 256; // Base for hash
const mod = 1000000007; // Large prime
// Calculate hash of pattern and first window of text
let patternHash = 0;
let textHash = 0;
let h = 1; // h = base^(m-1) % mod
for (let i = 0; i < m - 1; i++) {
h = (h * base) % mod;
}
for (let i = 0; i < m; i++) {
patternHash = (base * patternHash + pattern.charCodeAt(i)) % mod;
textHash = (base * textHash + text.charCodeAt(i)) % mod;
}
// Slide the window
for (let i = 0; i <= n - m; i++) {
if (patternHash === textHash) {
// Verify actual match
let match = true;
for (let j = 0; j < m; j++) {
if (text[i + j] !== pattern[j]) {
match = false;
break;
}
}
if (match) {
indices.push(i);
}
}
// Calculate hash for next window
if (i < n - m) {
textHash = (base * (textHash - text.charCodeAt(i) * h) + text.charCodeAt(i + m)) % mod;
if (textHash < 0) textHash += mod;
}
}
return indices;
}Time: O(n + m) average, O(n * m) worst case (rare), Space: O(1)
When to use: When searching for multiple patterns, or when you need a simple rolling hash approach.
Pattern 4: Z-Algorithm
Builds a Z-array where Z[i] is the length of the longest substring starting at i that matches the prefix of the string.
function zAlgorithmSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const combined = pattern + '$' + text; // $ is a separator not in text/pattern
const z = buildZArray(combined);
const m = pattern.length;
for (let i = m + 1; i < combined.length; i++) {
if (z[i] === m) {
indices.push(i - m - 1); // Adjust for separator
}
}
return indices;
}
function buildZArray(s: string): number[] {
const n = s.length;
const z = new Array(n).fill(0);
let l = 0, r = 0; // Z-box boundaries
for (let i = 1; i < n; i++) {
if (i <= r) {
z[i] = Math.min(r - i + 1, z[i - l]);
}
while (i + z[i] < n && s[z[i]] === s[i + z[i]]) {
z[i]++;
}
if (i + z[i] - 1 > r) {
l = i;
r = i + z[i] - 1;
}
}
return z;
}Time: O(n + m), Space: O(n + m)
When to use: When you need the Z-array for other purposes, or for pattern matching with preprocessing.
Pattern 5: Simplified Boyer-Moore
Boyer-Moore skips characters by matching from right to left and using bad character rule.
function boyerMooreSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;
// Build bad character table
const badChar = new Map();
for (let i = 0; i < m; i++) {
badChar.set(pattern[i], i);
}
let s = 0; // Shift
while (s <= n - m) {
let j = m - 1;
// Match from right to left
while (j >= 0 && pattern[j] === text[s + j]) {
j--;
}
if (j < 0) {
indices.push(s);
// Shift by pattern length (or use good suffix rule for better performance)
s += (s + m < n) ? m - (badChar.get(text[s + m]) ?? -1) : 1;
} else {
// Shift by bad character rule
const badCharShift = badChar.get(text[s + j]);
s += Math.max(1, j - (badCharShift ?? -1));
}
}
return indices;
}Time: O(n * m) worst case, often sublinear in practice, Space: O(m)
When to use: When pattern is long and alphabet is large (allows more skips).
When to Use Each Algorithm
| Algorithm | Best For | Time Complexity | Space Complexity |
|---|---|---|---|
| Naive | Short patterns, simple problems | O(n Γ m) | O(1) |
| KMP | Guaranteed linear time, repeated searches | O(n + m) | O(m) |
| Rabin-Karp | Multiple patterns, rolling hash needed | O(n + m) avg | O(1) |
| Z-Algorithm | Need Z-array, pattern preprocessing | O(n + m) | O(n + m) |
| Boyer-Moore | Long patterns, large alphabet | O(n Γ m) worst, often sublinear | O(m) |
General Rule
| Pattern Length | Recommendation |
|---|---|
| Short (< 10 chars) | Use naive or built-in indexOf |
| Medium, single search | Use KMP or Rabin-Karp |
| Multiple patterns | Use Rabin-Karp or build a Trie |
| Long patterns, large alphabet | Consider Boyer-Moore |
Common Variations
Variation 1: Find All Occurrences
Return all starting indices where pattern appears.
function findAllOccurrences(text: string, pattern: string): number[] {
return kmpSearch(text, pattern); // or any algorithm above
}Variation 2: Check if Pattern Exists
Return boolean indicating if pattern is found.
function containsPattern(text: string, pattern: string): boolean {
const indices = kmpSearch(text, pattern);
return indices.length > 0;
}Variation 3: Count Occurrences
Return count of how many times pattern appears.
function countOccurrences(text: string, pattern: string): number {
return kmpSearch(text, pattern).length;
}Variation 4: Replace All Occurrences
Replace all occurrences of pattern with replacement string.
function replaceAll(text: string, pattern: string, replacement: string): string {
const indices = kmpSearch(text, pattern);
if (indices.length === 0) return text;
const result: string[] = [];
let lastIndex = 0;
for (const idx of indices) {
result.push(text.substring(lastIndex, idx));
result.push(replacement);
lastIndex = idx + pattern.length;
}
result.push(text.substring(lastIndex));
return result.join('');
}Template (KMP)
function substringSearchTemplate(text: string, pattern: string): number[] {
const indices: number[] = [];
const lps = buildLPS(pattern);
let i = 0, j = 0;
while (i < text.length) {
if (text[i] === pattern[j]) {
i++;
j++;
}
if (j === pattern.length) {
indices.push(i - j);
j = lps[j - 1];
} else if (i < text.length && text[i] !== pattern[j]) {
if (j !== 0) j = lps[j - 1];
else i++;
}
}
return indices;
}Time and Space Complexity Summary
- Naive: O(n * m) time, O(1) space
- KMP: O(n + m) time, O(m) space
- Rabin-Karp: O(n + m) average, O(n * m) worst, O(1) space
- Z-Algorithm: O(n + m) time, O(n + m) space
- Boyer-Moore: O(n * m) worst, often sublinear, O(m) space
Practice Tips
- Start with naive β Understand the problem first
- Learn KMP β Most commonly asked in interviews
- Understand LPS array β Key to KMP algorithm
- Practice rolling hash β Useful for many string problems
- Handle edge cases β Empty pattern, pattern longer than text, all matches
Related Concepts
- String Matching β Core problem solved by these algorithms
- Rolling Hash β Technique used in Rabin-Karp
- Dynamic Programming β LPS array construction uses DP-like thinking
- Trie β Alternative for multiple pattern matching
Substring search algorithms are fundamental to many string processing tasks. Master KMP and Rabin-Karp, and you'll be well-prepared for interview questions involving pattern matching.