All topics·Topic 01 / 09

String fundamentals

Overview

Most string interview questions are not about memorizing library methods—they are about maintaining a small amount of state while you scan a character sequence. The failure mode I see repeatedly is choosing the right idea (two pointers, sliding window, frequency map) but losing the thread on the invariant: what must stay true about s[left..right] after every move? When that invariant is vague, nested loops feel inevitable, and you end up with an O(n^2) brute force that times out on long inputs. This guide frames strings as contiguous ranges, symmetric peeling from both ends, and multiset bookkeeping, with concrete traces and implementation pitfalls you can recognize under pressure.

Key concepts

  • Substring versus subsequence: A substring is a contiguous slice of the original string; a subsequence allows gaps. Sliding window techniques apply to substrings because the valid region is always a single interval on the index line. If the problem allows skipping characters, a window that only shrinks from the left cannot express the solution space—you usually need a different model (single-direction two pointers, greedy checks, or dynamic programming). Mixing these definitions is the fastest way to implement something that passes toy tests and fails on permutations and repeated characters.
  • Scan, boundaries, invariants: A linear scan updates cheap aggregate state (last index seen, running count, parity). Two explicit indices define boundaries—either independent cursors moving toward a meeting point, or a window [left, right] whose meaning you can state in one sentence before you code. The invariant is the contract: "all characters in the window are unique," "the window contains every required character with sufficient multiplicity," "the outside pair has been validated after skipping ignorable symbols." If you cannot write that contract, pause and re-read the prompt.
  • Representation and alphabet assumptions: Interview constraints often limit you to lowercase English letters, which makes a length-26 counter array attractive for constant-factor speed and clarity. When the alphabet is Unicode-sized or unknown, a map from character to count is the safe default. The tradeoff is memory and constant overhead, not asymptotic complexity for typical n. Always confirm whether comparisons are case-sensitive and whether non-alphanumeric characters matter—those details change skip logic and equality tests.
  • Mutation and copying costs: Building a new string inside a tight loop can devolve into quadratic copying in languages with immutable strings. Interview solutions usually keep indices into the original text, update counts or membership structures, and defer materialization until the end. That discipline also makes complexity arguments cleaner: you are not hiding an extra linear factor inside repeated concatenation.

How it works

  • Two pointers from both ends: Initialize left = 0 and right = n - 1. Each iteration advances one or both pointers inward after applying local rules (skip spaces, compare case-insensitively, ignore punctuation). Terminate when pointers cross. Correctness rests on the observation that each index moves monotonically toward the center, so each character is examined a constant number of times. Guard every skip loop with left < right so purely non-alphanumeric inputs cannot walk past each other and throw.
  • Sliding window expansion and contraction: Fix an expanding edge right. When the window becomes invalid (duplicate, too many distinct letters, sum too large), move left forward until validity returns. The inner while looks dangerous, but it is amortized linear: every index enters the window once and exits once when you use membership or counts tied strictly to the window. The difference between while and a single if is not stylistic—it is the difference between a correct window state and a silent invariant break on inputs like "abba".
  • Frequency maps for constrained windows: Maintain counts for characters "owed" by the problem (pattern multiset) and counts for the current window. Track how many required character types are fully satisfied so you can decide when the window is "complete" without scanning the whole alphabet each step. When shrinking, decrement carefully: a requirement is lost only when a count drops below its target, not merely when it changes.
  • Complexity habits: Two-pointer palindrome-style scans are O(n) time and O(1) extra space beyond the input string. A sliding window with a hash set or map is typically O(n) time with space O(min(n, ÎŁ)) where ÎŁ is the effective alphabet size in the window. State those bounds in terms of pointer movement and structure size, not vibes.

Worked trace: longest substring without repeating characters

Take s = "pwwkew". Maintain a set of characters currently inside s[left..right] with invariant "all characters are unique." Expand right. On duplicate, shrink left until the duplicate leaves the set, then insert the new character. Indices: at "pw" length 2; the third w forces shrinking past p then the first w, leaving window "w"; grow through "wk", "wke" for best 3; the final w evicts the leading w and keeps best 3. Notice left never moves backward—total work stays linear.

Common patterns and techniques

  • Fixed-size window: Slide a length-k panel across the text for anagram checks, rolling hashes, or max-sum subarrays of fixed length. Update counts by removing the outgoing character and adding the incoming one in O(1) per step.
  • Variable-size window for optimization: Grow until feasible, then shrink while feasible to minimize length (minimum window substring). The feasibility predicate is the same class of map logic as expansion-heavy variants, but the objective flips from maximize to minimize.
  • Last-seen index optimization: For "no repeats" variants you can store the last index of each character and jump left to max(left, lastIndex[ch] + 1) instead of deleting from a set one by one. Same asymptotics, fewer churn operations—useful when you want a compact inner loop.
  • Advanced matching (when asked explicitly): Knuth–Morris–Pratt preprocesses the pattern to skip redundant comparisons after mismatches; Rabin–Karp rolls a hash over windows. Reach for these when the problem is about many pattern occurrences or large alphabets with tight time budgets—not for every substring question.

Code: palindrome check with skips

function isPalindrome(s: string): boolean {
  let left = 0;
  let right = s.length - 1;
  const isAlphaNum = (c: string) => /[0-9A-Za-z]/.test(c);
  while (left < right) {
    while (left < right && !isAlphaNum(s[left])) left++;
    while (left < right && !isAlphaNum(s[right])) right--;
    if (s[left].toLowerCase() !== s[right].toLowerCase()) return false;
    left++;
    right--;
  }
  return true;
}

This implementation deliberately avoids allocating a filtered copy of the string. Filtering into a new array or string costs extra space and often an additional pass; keeping two indices preserves O(1) auxiliary space and keeps the comparison logic co-located with the skipping logic. The double while guards are not redundant decoration—they prevent the pointers from crossing while one side is still skipping, which is exactly where off-by-one failures appear on inputs that are all punctuation or mostly spaces. The symmetry choice—advancing both pointers after a successful match—matches the palindrome definition: you have certified the outermost valid pair, so both ends must move inward together. If you only advance one pointer after a match, you either re-compare the same characters or break the "outside-in" order and risk false negatives. Case normalization happens at comparison time so you do not need to mutate the input; if the problem disallows regex character classes, replace isAlphaNum with explicit range checks while keeping the same guard structure. Changing the skip loops to omit left < right can throw on some languages when pointers run past the ends after heavy punctuation. In live interviews, state aloud that each pointer moves monotonically inward, so each index is touched a constant number of times—that is your O(n) time argument.

Code: longest substring without repeating characters

function lengthOfLongestSubstring(s: string): number {
  const seen = new Set<string>();
  let left = 0;
  let best = 0;
  for (let right = 0; right < s.length; right++) {
    while (seen.has(s[right])) {
      seen.delete(s[left]);
      left++;
    }
    seen.add(s[right]);
    best = Math.max(best, right - left + 1);
  }
  return best;
}

The set models membership of the active window, and the invariant is "no duplicates inside [left, right]." You expand with right and contract with left until the incoming character is novel for the window. The while is the critical piece: after removing s[left] once, the window might still contain another copy of s[right], so you must keep shrinking until the duplicate is gone. Replacing that loop with a single if is one of the most common implementation bugs I have watched break otherwise correct candidates—"abba" is the classic witness where one eviction is insufficient. Choosing a set here encodes uniqueness directly; if the problem changes to "at most k distinct characters" or "permutation of pattern," you swap the set for a frequency map and adjust the validity test, but the pointer economics stay the same. The length update uses right - left + 1 because both bounds are inclusive; forgetting the +1 hides on short strings and fails exactly when the answer is length 1. Alternatives include tracking last indices to jump left instead of deleting from a set; that can reduce constant factors but demands careful reasoning about stale indices outside the window. Deleting s[right] by mistake disconnects the structure from the window boundary and produces flaky behavior that is hard to debug mid-interview—always mutate state on the edge that actually moved.

Practice tips

  • Write the window invariant as a single sentence before coding; if you cannot, you are still fuzzy on substring versus subsequence or on what "valid" means.
  • Test with repeated characters at both ends, all identical characters, length 0/1, and patterns like "abba" specifically for contraction loops.
  • When counts matter, use a map or fixed array; when only presence matters, a set suffices—do not pay for multiset logic you will not use.
  • After you state time complexity, tie it to pointer movement: each index advances a limited number of times across the whole run.

Related topics

Arrays reuse the same sliding-window and two-pointer idioms on contiguous memory: Array. Pointer discipline and invariants carry directly to node-based structures next: Linked List.