TopicsStringAdvanced String Matching
📖

Advanced String Matching

String
~20 min read
5 practice problems

Overview

Advanced String Matching involves sophisticated algorithms for finding patterns in text efficiently. Beyond naive approaches, these algorithms use preprocessing, hashing, automata, and clever optimizations to achieve better time complexity.

Key algorithms:

  • KMP (Knuth-Morris-Pratt) — Uses failure function for O(n + m)
  • Rabin-Karp — Rolling hash for multiple patterns
  • Z-Algorithm — Z-array for pattern matching
  • Boyer-Moore — Right-to-left scanning with bad character rule
  • Aho-Corasick — Multiple pattern matching with automata
  • Suffix Automaton — Linear-time construction for substring queries

Common advanced matching problems include:

  • Multiple pattern search
  • Pattern matching with wildcards
  • Approximate string matching
  • Regular expression matching
  • Longest common subsequence
  • Edit distance

Pattern 1: KMP Algorithm (Knuth-Morris-Pratt)

Efficient single pattern matching using failure function.

function kmpSearch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const lps = buildLPS(pattern);
  const results: number[] = [];
  
  let i = 0; // text index
  let j = 0; // pattern index
  
  while (i < n) {
    if (text[i] === pattern[j]) {
      i++;
      j++;
    }
    
    if (j === m) {
      results.push(i - j);
      j = lps[j - 1];
    } else if (i < n && text[i] !== pattern[j]) {
      if (j !== 0) {
        j = lps[j - 1];
      } else {
        i++;
      }
    }
  }
  
  return results;
}

function buildLPS(pattern: string): number[] {
  const m = pattern.length;
  const lps: number[] = new Array(m);
  let len = 0;
  let i = 1;
  
  lps[0] = 0;
  
  while (i < m) {
    if (pattern[i] === pattern[len]) {
      len++;
      lps[i] = len;
      i++;
    } else {
      if (len !== 0) {
        len = lps[len - 1];
      } else {
        lps[i] = 0;
        i++;
      }
    }
  }
  
  return lps;
}

Time: O(n + m), Space: O(m)


Pattern 2: Rabin-Karp Algorithm

Pattern matching using rolling hash for multiple patterns.

function rabinKarpSearch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const base = 256;
  const mod = 1000000007;
  
  const results: number[] = [];
  
  // Calculate hash of pattern and first window
  let patternHash = 0;
  let textHash = 0;
  let h = 1;
  
  // h = base^(m-1) % mod
  for (let i = 0; i < m - 1; i++) {
    h = (h * base) % mod;
  }
  
  // Calculate initial hashes
  for (let i = 0; i < m; i++) {
    patternHash = (base * patternHash + pattern.charCodeAt(i)) % mod;
    textHash = (base * textHash + text.charCodeAt(i)) % mod;
  }
  
  // Slide window and check
  for (let i = 0; i <= n - m; i++) {
    if (patternHash === textHash) {
      // Verify (handle hash collision)
      let match = true;
      for (let j = 0; j < m; j++) {
        if (text[i + j] !== pattern[j]) {
          match = false;
          break;
        }
      }
      if (match) {
        results.push(i);
      }
    }
    
    // Calculate hash for next window
    if (i < n - m) {
      textHash = (base * (textHash - text.charCodeAt(i) * h) + text.charCodeAt(i + m)) % mod;
      if (textHash < 0) {
        textHash += mod;
      }
    }
  }
  
  return results;
}

Time: O(n + m) average, O(n * m) worst case, Space: O(1)

Multiple patterns:

function rabinKarpMultiplePatterns(text: string, patterns: string[]): Map<string, number[]> {
  const results = new Map<string, number[]>();
  const base = 256;
  const mod = 1000000007;
  
  for (const pattern of patterns) {
    results.set(pattern, rabinKarpSearch(text, pattern));
  }
  
  return results;
}

Pattern 3: Z-Algorithm

Pattern matching using Z-array.

function zAlgorithmSearch(text: string, pattern: string): number[] {
  const combined = pattern + '$' + text;
  const z = buildZArray(combined);
  const m = pattern.length;
  const results: number[] = [];
  
  for (let i = m + 1; i < z.length; i++) {
    if (z[i] === m) {
      results.push(i - m - 1);
    }
  }
  
  return results;
}

function buildZArray(s: string): number[] {
  const n = s.length;
  const z: number[] = new Array(n);
  let l = 0, r = 0;
  
  z[0] = 0;
  
  for (let i = 1; i < n; i++) {
    if (i > r) {
      // No Z-box, expand manually
      l = r = i;
      while (r < n && s[r - l] === s[r]) {
        r++;
      }
      z[i] = r - l;
      r--;
    } else {
      // Inside Z-box
      const k = i - l;
      if (z[k] < r - i + 1) {
        z[i] = z[k];
      } else {
        // Expand beyond Z-box
        l = i;
        while (r < n && s[r - l] === s[r]) {
          r++;
        }
        z[i] = r - l;
        r--;
      }
    }
  }
  
  return z;
}

Time: O(n + m), Space: O(n + m)


Pattern 4: Boyer-Moore Algorithm

Right-to-left scanning with bad character rule.

function boyerMooreSearch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const badChar = buildBadCharTable(pattern);
  const results: number[] = [];
  
  let s = 0; // shift
  
  while (s <= n - m) {
    let j = m - 1;
    
    // Match from right to left
    while (j >= 0 && pattern[j] === text[s + j]) {
      j--;
    }
    
    if (j < 0) {
      // Pattern found
      results.push(s);
      s += (s + m < n) ? m - badChar.get(text[s + m]) || m : 1;
    } else {
      // Shift based on bad character
      const badCharShift = j - (badChar.get(text[s + j]) || -1);
      s += Math.max(1, badCharShift);
    }
  }
  
  return results;
}

function buildBadCharTable(pattern: string): Map<string, number> {
  const table = new Map<string, number>();
  const m = pattern.length;
  
  for (let i = 0; i < m; i++) {
    table.set(pattern[i], i);
  }
  
  return table;
}

Time: O(n * m) worst case, O(n / m) best case, Space: O(m)


Pattern 5: Aho-Corasick Algorithm

Multiple pattern matching using automata.

class AhoCorasickNode {
  children: Map<string, AhoCorasickNode>;
  fail: AhoCorasickNode | null;
  output: string[];
  
  constructor() {
    this.children = new Map();
    this.fail = null;
    this.output = [];
  }
}

class AhoCorasick {
  root: AhoCorasickNode;
  
  constructor(patterns: string[]) {
    this.root = new AhoCorasickNode();
    this.buildTrie(patterns);
    this.buildFailureLinks();
  }
  
  buildTrie(patterns: string[]): void {
    for (const pattern of patterns) {
      let node = this.root;
      
      for (const char of pattern) {
        if (!node.children.has(char)) {
          node.children.set(char, new AhoCorasickNode());
        }
        node = node.children.get(char)!;
      }
      
      node.output.push(pattern);
    }
  }
  
  buildFailureLinks(): void {
    const queue: AhoCorasickNode[] = [];
    
    // Initialize root's children
    for (const [char, child] of this.root.children) {
      child.fail = this.root;
      queue.push(child);
    }
    
    while (queue.length > 0) {
      const node = queue.shift()!;
      
      for (const [char, child] of node.children) {
        let fail = node.fail;
        
        while (fail !== null && !fail.children.has(char)) {
          fail = fail.fail;
        }
        
        child.fail = fail?.children.get(char) || this.root;
        child.output.push(...child.fail.output);
        queue.push(child);
      }
    }
  }
  
  search(text: string): Map<string, number[]> {
    const results = new Map<string, number[]>();
    let node = this.root;
    
    for (let i = 0; i < text.length; i++) {
      const char = text[i];
      
      while (node !== this.root && !node.children.has(char)) {
        node = node.fail!;
      }
      
      node = node.children.get(char) || this.root;
      
      for (const pattern of node.output) {
        if (!results.has(pattern)) {
          results.set(pattern, []);
        }
        results.get(pattern)!.push(i - pattern.length + 1);
      }
    }
    
    return results;
  }
}

Time: O(n + m + z) where z is number of matches, Space: O(m)


Pattern 6: Regular Expression Matching

Match pattern with '.' and '*' wildcards.

function regexMatch(s: string, p: string): boolean {
  const memo = new Map<string, boolean>();
  
  const dp = (i: number, j: number): boolean => {
    const key = `${i},${j}`;
    if (memo.has(key)) return memo.get(key)!;
    
    if (j === p.length) {
      return i === s.length;
    }
    
    const firstMatch = i < s.length && (p[j] === s[i] || p[j] === '.');
    
    if (j + 1 < p.length && p[j + 1] === '*') {
      // Zero or more matches
      const result = dp(i, j + 2) || (firstMatch && dp(i + 1, j));
      memo.set(key, result);
      return result;
    } else {
      // Single match
      const result = firstMatch && dp(i + 1, j + 1);
      memo.set(key, result);
      return result;
    }
  };
  
  return dp(0, 0);
}

Time: O(n m), Space: O(n m)

*Wildcard matching with '?' and '':**

function wildcardMatch(s: string, p: string): boolean {
  const n = s.length;
  const m = p.length;
  const dp: boolean[][] = Array(n + 1)
    .fill(null)
    .map(() => Array(m + 1).fill(false));
  
  dp[0][0] = true;
  
  // Handle leading '*'
  for (let j = 1; j <= m && p[j - 1] === '*'; j++) {
    dp[0][j] = true;
  }
  
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      if (p[j - 1] === '*') {
        dp[i][j] = dp[i - 1][j] || dp[i][j - 1];
      } else if (p[j - 1] === '?' || p[j - 1] === s[i - 1]) {
        dp[i][j] = dp[i - 1][j - 1];
      }
    }
  }
  
  return dp[n][m];
}

Pattern 7: Approximate String Matching

Find pattern with allowed mismatches (Hamming distance).

function approximateMatch(text: string, pattern: string, k: number): number[] {
  const n = text.length;
  const m = pattern.length;
  const results: number[] = [];
  
  for (let i = 0; i <= n - m; i++) {
    let mismatches = 0;
    
    for (let j = 0; j < m; j++) {
      if (text[i + j] !== pattern[j]) {
        mismatches++;
        if (mismatches > k) break;
      }
    }
    
    if (mismatches <= k) {
      results.push(i);
    }
  }
  
  return results;
}

Time: O(n * m), Space: O(1)

Using rolling hash for efficiency:

function approximateMatchHash(text: string, pattern: string, k: number): number[] {
  // Similar to Rabin-Karp but allow k mismatches
  const results: number[] = [];
  const n = text.length;
  const m = pattern.length;
  
  for (let i = 0; i <= n - m; i++) {
    let mismatches = 0;
    
    // Quick check with hash first
    const textHash = hash(text.substring(i, i + m));
    const patternHash = hash(pattern);
    
    if (textHash === patternHash) {
      // Verify
      let match = true;
      for (let j = 0; j < m; j++) {
        if (text[i + j] !== pattern[j]) {
          mismatches++;
          if (mismatches > k) {
            match = false;
            break;
          }
        }
      }
      if (match) results.push(i);
    } else {
      // Count mismatches
      for (let j = 0; j < m && mismatches <= k; j++) {
        if (text[i + j] !== pattern[j]) mismatches++;
      }
      if (mismatches <= k) results.push(i);
    }
  }
  
  return results;
}

function hash(s: string): number {
  let h = 0;
  for (const char of s) {
    h = ((h * 31) + char.charCodeAt(0)) % 1000000007;
  }
  return h;
}

Pattern 8: Longest Common Subsequence (LCS)

Find longest common subsequence between strings.

function longestCommonSubsequence(s1: string, s2: string): string {
  const m = s1.length;
  const n = s2.length;
  const dp: number[][] = Array(m + 1)
    .fill(null)
    .map(() => Array(n + 1).fill(0));
  
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (s1[i - 1] === s2[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1] + 1;
      } else {
        dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
      }
    }
  }
  
  // Reconstruct LCS
  let i = m, j = n;
  const lcs: string[] = [];
  
  while (i > 0 && j > 0) {
    if (s1[i - 1] === s2[j - 1]) {
      lcs.unshift(s1[i - 1]);
      i--;
      j--;
    } else if (dp[i - 1][j] > dp[i][j - 1]) {
      i--;
    } else {
      j--;
    }
  }
  
  return lcs.join('');
}

Time: O(m n), Space: O(m n)

Space-optimized:

function longestCommonSubsequenceOptimized(s1: string, s2: string): number {
  const m = s1.length;
  const n = s2.length;
  let prev = Array(n + 1).fill(0);
  let curr = Array(n + 1).fill(0);
  
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (s1[i - 1] === s2[j - 1]) {
        curr[j] = prev[j - 1] + 1;
      } else {
        curr[j] = Math.max(prev[j], curr[j - 1]);
      }
    }
    [prev, curr] = [curr, prev];
  }
  
  return prev[n];
}

Time: O(m * n), Space: O(min(m, n))


Pattern 9: Edit Distance (Levenshtein)

Calculate minimum edits to transform one string to another.

function editDistance(s1: string, s2: string): number {
  const m = s1.length;
  const n = s2.length;
  const dp: number[][] = Array(m + 1)
    .fill(null)
    .map(() => Array(n + 1).fill(0));
  
  // Base cases
  for (let i = 0; i <= m; i++) dp[i][0] = i;
  for (let j = 0; j <= n; j++) dp[0][j] = j;
  
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (s1[i - 1] === s2[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1];
      } else {
        dp[i][j] = 1 + Math.min(
          dp[i - 1][j],     // delete
          dp[i][j - 1],     // insert
          dp[i - 1][j - 1]  // replace
        );
      }
    }
  }
  
  return dp[m][n];
}

Time: O(m n), Space: O(m n)

Space-optimized:

function editDistanceOptimized(s1: string, s2: string): number {
  const m = s1.length;
  const n = s2.length;
  
  if (m < n) return editDistanceOptimized(s2, s1);
  
  let prev = Array(n + 1).fill(0);
  let curr = Array(n + 1).fill(0);
  
  for (let j = 0; j <= n; j++) prev[j] = j;
  
  for (let i = 1; i <= m; i++) {
    curr[0] = i;
    for (let j = 1; j <= n; j++) {
      if (s1[i - 1] === s2[j - 1]) {
        curr[j] = prev[j - 1];
      } else {
        curr[j] = 1 + Math.min(prev[j], curr[j - 1], prev[j - 1]);
      }
    }
    [prev, curr] = [curr, prev];
  }
  
  return prev[n];
}

Time: O(m * n), Space: O(min(m, n))


Pattern 10: Suffix Automaton

Linear-time construction for substring queries.

class SuffixAutomatonNode {
  len: number;
  link: SuffixAutomatonNode | null;
  next: Map<string, SuffixAutomatonNode>;
  
  constructor() {
    this.len = 0;
    this.link = null;
    this.next = new Map();
  }
}

class SuffixAutomaton {
  last: SuffixAutomatonNode;
  root: SuffixAutomatonNode;
  
  constructor(s: string) {
    this.root = new SuffixAutomatonNode();
    this.last = this.root;
    
    for (const char of s) {
      this.extend(char);
    }
  }
  
  extend(char: string): void {
    const cur = new SuffixAutomatonNode();
    cur.len = this.last.len + 1;
    
    let p: SuffixAutomatonNode | null = this.last;
    
    while (p !== null && !p.next.has(char)) {
      p.next.set(char, cur);
      p = p.link;
    }
    
    if (p === null) {
      cur.link = this.root;
    } else {
      const q = p.next.get(char)!;
      
      if (p.len + 1 === q.len) {
        cur.link = q;
      } else {
        const clone = new SuffixAutomatonNode();
        clone.len = p.len + 1;
        clone.next = new Map(q.next);
        clone.link = q.link;
        
        q.link = clone;
        cur.link = clone;
        
        while (p !== null && p.next.get(char) === q) {
          p.next.set(char, clone);
          p = p.link;
        }
      }
    }
    
    this.last = cur;
  }
  
  contains(pattern: string): boolean {
    let node = this.root;
    
    for (const char of pattern) {
      if (!node.next.has(char)) {
        return false;
      }
      node = node.next.get(char)!;
    }
    
    return true;
  }
}

Time: O(n) construction, O(m) query, Space: O(n)


Pattern 11: Multiple Pattern Matching

Find all patterns in text efficiently.

function multiplePatternMatch(text: string, patterns: string[]): Map<string, number[]> {
  // Use Aho-Corasick for multiple patterns
  const ac = new AhoCorasick(patterns);
  return ac.search(text);
}

// Alternative: Use suffix array
function multiplePatternMatchSuffixArray(text: string, patterns: string[]): Map<string, number[]> {
  const results = new Map<string, number[]>();
  const suffixArray = buildSuffixArray(text);
  
  for (const pattern of patterns) {
    results.set(pattern, searchPattern(text, pattern, suffixArray));
  }
  
  return results;
}

Time: O(n + m + z) with Aho-Corasick, Space: O(m)


Pattern 12: Pattern Matching with Constraints

Match pattern with additional constraints.

Minimum length constraint:

function matchWithMinLength(text: string, pattern: string, minLength: number): number[] {
  const matches = kmpSearch(text, pattern);
  return matches.filter(pos => pos + pattern.length >= minLength);
}

Non-overlapping matches:

function nonOverlappingMatches(text: string, pattern: string): number[] {
  const matches = kmpSearch(text, pattern);
  const results: number[] = [];
  let lastEnd = -1;
  
  for (const pos of matches) {
    if (pos > lastEnd) {
      results.push(pos);
      lastEnd = pos + pattern.length - 1;
    }
  }
  
  return results;
}

Maximum matches with overlap limit:

function maxMatchesWithOverlap(text: string, pattern: string, maxOverlap: number): number[] {
  const matches = kmpSearch(text, pattern);
  const results: number[] = [];
  
  for (let i = 0; i < matches.length; i++) {
    const pos = matches[i];
    
    if (results.length === 0 || pos - results[results.length - 1] >= pattern.length - maxOverlap) {
      results.push(pos);
    }
  }
  
  return results;
}

When to Use Advanced String Matching

Use advanced algorithms when:

  • Single pattern — KMP for O(n + m)
  • Multiple patterns — Aho-Corasick or Rabin-Karp
  • Approximate matching — Edit distance, Hamming distance
  • Wildcard patterns — Dynamic programming
  • Regular expressions — State machine or DP
  • Substring queries — Suffix automaton or suffix array
  • Longest common subsequence — Dynamic programming

Template (KMP)

function kmpTemplate(text: string, pattern: string): number[] {
  const lps = buildLPS(pattern);
  const results: number[] = [];
  let i = 0, j = 0;
  
  while (i < text.length) {
    if (text[i] === pattern[j]) {
      i++;
      j++;
    }
    
    if (j === pattern.length) {
      results.push(i - j);
      j = lps[j - 1];
    } else if (i < text.length && text[i] !== pattern[j]) {
      j = j !== 0 ? lps[j - 1] : i++;
    }
  }
  
  return results;
}

Template (Rabin-Karp)

function rabinKarpTemplate(text: string, pattern: string): number[] {
  const base = 256;
  const mod = 1000000007;
  const results: number[] = [];
  
  // Calculate hashes and search
  // Handle rolling hash updates
  
  return results;
}

Time and Space Complexity Summary

  • KMP: O(n + m) time, O(m) space
  • Rabin-Karp: O(n + m) average, O(n * m) worst, O(1) space
  • Z-Algorithm: O(n + m) time, O(n + m) space
  • Boyer-Moore: O(n * m) worst, O(n / m) best, O(m) space
  • Aho-Corasick: O(n + m + z) time, O(m) space
  • Edit Distance: O(m * n) time, O(min(m, n)) space
  • LCS: O(m * n) time, O(min(m, n)) space

Practice Tips

  1. Choose algorithm — KMP for single pattern, Aho-Corasick for multiple
  2. Understand preprocessing — LPS, failure links, hash functions
  3. Handle edge cases — Empty strings, single characters
  4. Optimize space — Use rolling arrays for DP
  5. Hash collisions — Always verify matches in Rabin-Karp
  6. Preprocessing time — Consider one-time cost vs query frequency

Common Mistakes

  1. Off-by-one errors — Careful with array indices
  2. Hash collisions — Not verifying matches in Rabin-Karp
  3. Failure links — Incorrect Aho-Corasick construction
  4. LPS calculation — Wrong prefix/suffix matching
  5. State transitions — Incorrect automata transitions

Related Concepts

  • String Algorithms — Basic string operations
  • Dynamic Programming — For edit distance, LCS
  • Automata Theory — For pattern matching
  • Hashing — For Rabin-Karp

Advanced string matching algorithms are essential for efficient pattern search. Master these patterns, and you'll be well-prepared for complex string algorithm interview questions.