Topicsβ€ΊStringβ€ΊString Parsing
πŸ“–

String Parsing

String
~20 min read
5 practice problems

Overview

String parsing involves breaking down a string into meaningful components (tokens, words, segments) and processing them systematically. This technique is essential for problems involving file paths, expressions, URLs, CSV data, and formatted text.

Parsing typically involves:

  1. Tokenization β€” Splitting the string into tokens
  2. Validation β€” Checking if tokens meet certain criteria
  3. Transformation β€” Converting tokens into desired format
  4. Reconstruction β€” Building new strings from processed tokens

Common Parsing Patterns

Pattern 1: Split and Process

Use built-in split methods or manual parsing to break strings into segments.

Example: Reverse Words in a String

function reverseWords(s: string): string {
  return s.trim()
    .split(/\s+/)  // Split on whitespace
    .reverse()
    .join(' ');
}

Manual parsing version:

function reverseWordsManual(s: string): string {
  const words: string[] = [];
  let word = '';
  
  for (let i = 0; i < s.length; i++) {
    if (s[i] === ' ') {
      if (word) {
        words.push(word);
        word = '';
      }
    } else {
      word += s[i];
    }
  }
  
  if (word) words.push(word);
  
  return words.reverse().join(' ');
}

Time: O(n), Space: O(n)

Pattern 2: Path Parsing

Parse file paths, URLs, or hierarchical structures.

Example: Simplify Path

function simplifyPath(path: string): string {
  const stack: string[] = [];
  const parts = path.split('/');
  
  for (const part of parts) {
    if (part === '' || part === '.') {
      continue;
    } else if (part === '..') {
      if (stack.length > 0) {
        stack.pop();
      }
    } else {
      stack.push(part);
    }
  }
  
  return '/' + stack.join('/');
}

Example: Extract Domain from URL

function extractDomain(url: string): string {
  // Remove protocol
  let domain = url.replace(/^https?:\/\//, '');
  
  // Remove path, query, fragment
  domain = domain.split('/')[0];
  domain = domain.split('?')[0];
  domain = domain.split('#')[0];
  
  // Remove port
  domain = domain.split(':')[0];
  
  return domain;
}

Pattern 3: Expression Parsing

Parse mathematical expressions or nested structures.

Example: Basic Calculator (addition and subtraction)

function calculate(s: string): number {
  let result = 0;
  let sign = 1; // 1 for +, -1 for -
  let num = 0;
  const stack: number[] = [];
  
  for (let i = 0; i < s.length; i++) {
    const char = s[i];
    
    if (char >= '0' && char <= '9') {
      num = num * 10 + (char.charCodeAt(0) - '0'.charCodeAt(0));
    } else if (char === '+' || char === '-') {
      result += sign * num;
      num = 0;
      sign = char === '+' ? 1 : -1;
    } else if (char === '(') {
      stack.push(result);
      stack.push(sign);
      result = 0;
      sign = 1;
    } else if (char === ')') {
      result += sign * num;
      num = 0;
      result *= stack.pop()!; // sign
      result += stack.pop()!; // previous result
    }
  }
  
  return result + sign * num;
}

Pattern 4: Delimiter-Based Parsing

Parse strings with specific delimiters (CSV, key-value pairs, etc.).

Example: Parse Key-Value Pairs

function parseKeyValue(s: string, delimiter: string = '&'): Map<string, string> {
  const map = new Map<string, string>();
  const pairs = s.split(delimiter);
  
  for (const pair of pairs) {
    const [key, value] = pair.split('=');
    if (key && value) {
      map.set(decodeURIComponent(key), decodeURIComponent(value));
    }
  }
  
  return map;
}

Example: Parse CSV Line

function parseCSVLine(line: string): string[] {
  const fields: string[] = [];
  let current = '';
  let inQuotes = false;
  
  for (let i = 0; i < line.length; i++) {
    const char = line[i];
    
    if (char === '"') {
      inQuotes = !inQuotes;
    } else if (char === ',' && !inQuotes) {
      fields.push(current);
      current = '';
    } else {
      current += char;
    }
  }
  
  fields.push(current);
  return fields;
}

Advanced Patterns

Pattern 5: Valid Number Parsing

Parse and validate number formats (integers, decimals, scientific notation).

function isNumber(s: string): boolean {
  let i = 0;
  const n = s.length;
  
  // Skip leading whitespace
  while (i < n && s[i] === ' ') i++;
  
  // Optional sign
  if (i < n && (s[i] === '+' || s[i] === '-')) i++;
  
  let hasDigits = false;
  let hasDot = false;
  
  // Parse integer part
  while (i < n && s[i] >= '0' && s[i] <= '9') {
    hasDigits = true;
    i++;
  }
  
  // Optional decimal point
  if (i < n && s[i] === '.') {
    hasDot = true;
    i++;
    while (i < n && s[i] >= '0' && s[i] <= '9') {
      hasDigits = true;
      i++;
    }
  }
  
  // Must have at least one digit
  if (!hasDigits) return false;
  
  // Optional exponent
  if (i < n && (s[i] === 'e' || s[i] === 'E')) {
    i++;
    if (i < n && (s[i] === '+' || s[i] === '-')) i++;
    let expDigits = false;
    while (i < n && s[i] >= '0' && s[i] <= '9') {
      expDigits = true;
      i++;
    }
    if (!expDigits) return false;
  }
  
  // Skip trailing whitespace
  while (i < n && s[i] === ' ') i++;
  
  return i === n;
}

Pattern 6: Nested Structure Parsing

Parse nested structures like JSON, XML, or parentheses.

function isValidNested(s: string, open: string, close: string): boolean {
  const stack: string[] = [];
  
  for (const char of s) {
    if (char === open) {
      stack.push(char);
    } else if (char === close) {
      if (stack.length === 0 || stack.pop() !== open) {
        return false;
      }
    }
  }
  
  return stack.length === 0;
}

Parsing Techniques

Technique 1: State Machine

Use a state machine for complex parsing logic.

enum ParseState {
  START,
  IN_WORD,
  IN_NUMBER,
  ERROR
}

function parseWithStateMachine(s: string): string[] {
  const tokens: string[] = [];
  let state = ParseState.START;
  let current = '';
  
  for (const char of s) {
    switch (state) {
      case ParseState.START:
        if (/[a-zA-Z]/.test(char)) {
          state = ParseState.IN_WORD;
          current = char;
        } else if (/[0-9]/.test(char)) {
          state = ParseState.IN_NUMBER;
          current = char;
        }
        break;
        
      case ParseState.IN_WORD:
        if (/[a-zA-Z0-9]/.test(char)) {
          current += char;
        } else {
          tokens.push(current);
          current = '';
          state = ParseState.START;
        }
        break;
        
      case ParseState.IN_NUMBER:
        if (/[0-9]/.test(char)) {
          current += char;
        } else {
          tokens.push(current);
          current = '';
          state = ParseState.START;
        }
        break;
    }
  }
  
  if (current) tokens.push(current);
  return tokens;
}

Technique 2: Recursive Descent

Parse recursively for nested structures.

function parseExpression(s: string, index: { value: number }): number {
  let result = parseTerm(s, index);
  
  while (index.value < s.length && (s[index.value] === '+' || s[index.value] === '-')) {
    const op = s[index.value++];
    const term = parseTerm(s, index);
    result = op === '+' ? result + term : result - term;
  }
  
  return result;
}

function parseTerm(s: string, index: { value: number }): number {
  let result = parseFactor(s, index);
  
  while (index.value < s.length && (s[index.value] === '*' || s[index.value] === '/')) {
    const op = s[index.value++];
    const factor = parseFactor(s, index);
    result = op === '*' ? result * factor : result / factor;
  }
  
  return result;
}

function parseFactor(s: string, index: { value: number }): number {
  if (s[index.value] === '(') {
    index.value++; // skip '('
    const result = parseExpression(s, index);
    index.value++; // skip ')'
    return result;
  }
  
  let num = 0;
  while (index.value < s.length && s[index.value] >= '0' && s[index.value] <= '9') {
    num = num * 10 + (s.charCodeAt(index.value++) - '0'.charCodeAt(0));
  }
  
  return num;
}

When to Use String Parsing

Use parsing techniques when you see these signals:

  • "Split" or "Parse" β€” Break string into components
  • "Path" or "URL" β€” Hierarchical structures
  • "Expression" or "Formula" β€” Mathematical or logical expressions
  • "CSV" or "Delimited" β€” Structured data with separators
  • "Validate format" β€” Check if string matches a pattern
  • "Extract" β€” Pull specific parts from formatted strings

Common Challenges

  1. Whitespace handling β€” Trim, normalize, or preserve?
  2. Edge cases β€” Empty strings, single tokens, trailing delimiters
  3. Escaped characters β€” Handle quotes, backslashes, special chars
  4. Nested structures β€” Use stack or recursion
  5. Performance β€” Avoid unnecessary string concatenation (use array + join)

Template

function parseTemplate(s: string): string[] {
  const tokens: string[] = [];
  let current = '';
  
  for (let i = 0; i < s.length; i++) {
    const char = s[i];
    
    if (/* delimiter or separator */) {
      if (current) {
        tokens.push(current);
        current = '';
      }
    } else {
      current += char;
    }
  }
  
  if (current) tokens.push(current);
  
  return tokens;
}

Time and Space Complexity

  • Time: O(n) for simple parsing, O(nΒ²) if string concatenation is used inefficiently
  • Space: O(n) for storing tokens, O(1) if processing in-place

Practice Tips

  1. Use built-in methods β€” split(), trim(), replace() when appropriate
  2. Avoid string concatenation in loops β€” Use array and join() instead
  3. Handle edge cases β€” Empty strings, all delimiters, no delimiters
  4. Consider regex β€” For pattern matching, but be mindful of performance
  5. Use stack for nested structures β€” Parentheses, brackets, etc.

Related Concepts

  • Regular Expressions β€” Pattern matching for parsing
  • Stack β€” Essential for nested structure parsing
  • State Machines β€” For complex parsing logic
  • Recursion β€” For recursive structures

String parsing is a fundamental skill that appears in many real-world applications. Master these patterns, and you'll be able to handle a wide variety of string processing problems.