Topicsβ€ΊStringβ€ΊSubstring Search
πŸ“–

Substring Search

String
~20 min read
5 practice problems

Overview

Substring search (also called pattern matching or string searching) is the problem of finding all occurrences of a pattern string P within a text string T. This is fundamental to many applications: text editors, search engines, DNA sequence analysis, and more.

While the naive approach works, several advanced algorithms optimize this problem:

  • Naive/Brute Force β€” Simple but O(nm) time
  • KMP (Knuth-Morris-Pratt) β€” O(n + m) time using failure function
  • Rabin-Karp β€” O(n + m) average time using rolling hash
  • Z-Algorithm β€” O(n + m) time using Z-array
  • Boyer-Moore β€” Often sublinear, skips characters

Pattern 1: Naive/Brute Force

The simplest approach: check every possible starting position.

function naiveSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;

for (let i = 0; i <= n - m; i++) {
let j = 0;
while (j < m && text[i + j] === pattern[j]) {
j++;
}
if (j === m) {
indices.push(i);
}
}

return indices;
}

Time: O(n * m), Space: O(1)

When to use: Short patterns, simple problems, or when you need a quick implementation.


Pattern 2: KMP (Knuth-Morris-Pratt)

KMP avoids re-checking characters by building a "failure function" (LPS array) that tells us where to resume matching after a mismatch.

Key insight: If we've matched k characters and then mismatch, we can skip ahead based on the longest proper prefix that's also a suffix of the matched portion.

function kmpSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;

// Build LPS (Longest Proper Prefix which is also Suffix) array
const lps = buildLPS(pattern);

let i = 0; // index for text
let j = 0; // index for pattern

while (i < n) {
if (text[i] === pattern[j]) {
i++;
j++;
}

if (j === m) {
indices.push(i - j); // Found match
j = lps[j - 1]; // Continue searching
} else if (i < n && text[i] !== pattern[j]) {
if (j !== 0) {
j = lps[j - 1]; // Don't match lps[0..lps[j-1]] characters
} else {
i++;
}
}
}

return indices;
}

function buildLPS(pattern: string): number[] {
const m = pattern.length;
const lps = new Array(m).fill(0);
let len = 0; // Length of previous longest prefix suffix
let i = 1;

while (i < m) {
if (pattern[i] === pattern[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len !== 0) {
len = lps[len - 1]; // Don't increment i
} else {
lps[i] = 0;
i++;
}
}
}

return lps;
}

Time: O(n + m), Space: O(m)

When to use: When you need guaranteed linear time, or when searching for the same pattern multiple times.


Pattern 3: Rabin-Karp

Uses rolling hash to compare substrings efficiently. If hash matches, verify with actual comparison.

function rabinKarpSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;

if (m > n) return indices;

const base = 256; // Base for hash
const mod = 1000000007; // Large prime

// Calculate hash of pattern and first window of text
let patternHash = 0;
let textHash = 0;
let h = 1; // h = base^(m-1) % mod

for (let i = 0; i < m - 1; i++) {
h = (h * base) % mod;
}

for (let i = 0; i < m; i++) {
patternHash = (base * patternHash + pattern.charCodeAt(i)) % mod;
textHash = (base * textHash + text.charCodeAt(i)) % mod;
}

// Slide the window
for (let i = 0; i <= n - m; i++) {
if (patternHash === textHash) {
// Verify actual match
let match = true;
for (let j = 0; j < m; j++) {
if (text[i + j] !== pattern[j]) {
match = false;
break;
}
}
if (match) {
indices.push(i);
}
}

// Calculate hash for next window
if (i < n - m) {
textHash = (base * (textHash - text.charCodeAt(i) * h) + text.charCodeAt(i + m)) % mod;
if (textHash < 0) textHash += mod;
}
}

return indices;
}

Time: O(n + m) average, O(n * m) worst case (rare), Space: O(1)

When to use: When searching for multiple patterns, or when you need a simple rolling hash approach.


Pattern 4: Z-Algorithm

Builds a Z-array where Z[i] is the length of the longest substring starting at i that matches the prefix of the string.

function zAlgorithmSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const combined = pattern + '$' + text; // $ is a separator not in text/pattern
const z = buildZArray(combined);
const m = pattern.length;

for (let i = m + 1; i < combined.length; i++) {
if (z[i] === m) {
indices.push(i - m - 1); // Adjust for separator
}
}

return indices;
}

function buildZArray(s: string): number[] {
const n = s.length;
const z = new Array(n).fill(0);
let l = 0, r = 0; // Z-box boundaries

for (let i = 1; i < n; i++) {
if (i <= r) {
z[i] = Math.min(r - i + 1, z[i - l]);
}

while (i + z[i] < n && s[z[i]] === s[i + z[i]]) {
z[i]++;
}

if (i + z[i] - 1 > r) {
l = i;
r = i + z[i] - 1;
}
}

return z;
}

Time: O(n + m), Space: O(n + m)

When to use: When you need the Z-array for other purposes, or for pattern matching with preprocessing.


Pattern 5: Simplified Boyer-Moore

Boyer-Moore skips characters by matching from right to left and using bad character rule.

function boyerMooreSearch(text: string, pattern: string): number[] {
const indices: number[] = [];
const n = text.length;
const m = pattern.length;

// Build bad character table
const badChar = new Map();
for (let i = 0; i < m; i++) {
badChar.set(pattern[i], i);
}

let s = 0; // Shift

while (s <= n - m) {
let j = m - 1;

// Match from right to left
while (j >= 0 && pattern[j] === text[s + j]) {
j--;
}

if (j < 0) {
indices.push(s);
// Shift by pattern length (or use good suffix rule for better performance)
s += (s + m < n) ? m - (badChar.get(text[s + m]) ?? -1) : 1;
} else {
// Shift by bad character rule
const badCharShift = badChar.get(text[s + j]);
s += Math.max(1, j - (badCharShift ?? -1));
}
}

return indices;
}

Time: O(n * m) worst case, often sublinear in practice, Space: O(m)

When to use: When pattern is long and alphabet is large (allows more skips).


When to Use Each Algorithm

AlgorithmBest ForTime ComplexitySpace Complexity
NaiveShort patterns, simple problemsO(n Γ— m)O(1)
KMPGuaranteed linear time, repeated searchesO(n + m)O(m)
Rabin-KarpMultiple patterns, rolling hash neededO(n + m) avgO(1)
Z-AlgorithmNeed Z-array, pattern preprocessingO(n + m)O(n + m)
Boyer-MooreLong patterns, large alphabetO(n Γ— m) worst, often sublinearO(m)

General Rule

Pattern LengthRecommendation
Short (< 10 chars)Use naive or built-in indexOf
Medium, single searchUse KMP or Rabin-Karp
Multiple patternsUse Rabin-Karp or build a Trie
Long patterns, large alphabetConsider Boyer-Moore

Common Variations

Variation 1: Find All Occurrences

Return all starting indices where pattern appears.

function findAllOccurrences(text: string, pattern: string): number[] {
return kmpSearch(text, pattern); // or any algorithm above
}

Variation 2: Check if Pattern Exists

Return boolean indicating if pattern is found.

function containsPattern(text: string, pattern: string): boolean {
const indices = kmpSearch(text, pattern);
return indices.length > 0;
}

Variation 3: Count Occurrences

Return count of how many times pattern appears.

function countOccurrences(text: string, pattern: string): number {
return kmpSearch(text, pattern).length;
}

Variation 4: Replace All Occurrences

Replace all occurrences of pattern with replacement string.

function replaceAll(text: string, pattern: string, replacement: string): string {
const indices = kmpSearch(text, pattern);
if (indices.length === 0) return text;

const result: string[] = [];
let lastIndex = 0;

for (const idx of indices) {
result.push(text.substring(lastIndex, idx));
result.push(replacement);
lastIndex = idx + pattern.length;
}

result.push(text.substring(lastIndex));
return result.join('');
}

Template (KMP)

function substringSearchTemplate(text: string, pattern: string): number[] {
const indices: number[] = [];
const lps = buildLPS(pattern);
let i = 0, j = 0;

while (i < text.length) {
if (text[i] === pattern[j]) {
i++;
j++;
}

if (j === pattern.length) {
indices.push(i - j);
j = lps[j - 1];
} else if (i < text.length && text[i] !== pattern[j]) {
if (j !== 0) j = lps[j - 1];
else i++;
}
}

return indices;
}

Time and Space Complexity Summary

  • Naive: O(n * m) time, O(1) space
  • KMP: O(n + m) time, O(m) space
  • Rabin-Karp: O(n + m) average, O(n * m) worst, O(1) space
  • Z-Algorithm: O(n + m) time, O(n + m) space
  • Boyer-Moore: O(n * m) worst, often sublinear, O(m) space

Practice Tips

  1. Start with naive β€” Understand the problem first
  2. Learn KMP β€” Most commonly asked in interviews
  3. Understand LPS array β€” Key to KMP algorithm
  4. Practice rolling hash β€” Useful for many string problems
  5. Handle edge cases β€” Empty pattern, pattern longer than text, all matches

Related Concepts

  • String Matching β€” Core problem solved by these algorithms
  • Rolling Hash β€” Technique used in Rabin-Karp
  • Dynamic Programming β€” LPS array construction uses DP-like thinking
  • Trie β€” Alternative for multiple pattern matching

Substring search algorithms are fundamental to many string processing tasks. Master KMP and Rabin-Karp, and you'll be well-prepared for interview questions involving pattern matching.