String Parsing
Overview
String parsing involves breaking down a string into meaningful components (tokens, words, segments) and processing them systematically. This technique is essential for problems involving file paths, expressions, URLs, CSV data, and formatted text.
Parsing typically involves:
- Tokenization β Splitting the string into tokens
- Validation β Checking if tokens meet certain criteria
- Transformation β Converting tokens into desired format
- Reconstruction β Building new strings from processed tokens
Common Parsing Patterns
Pattern 1: Split and Process
Use built-in split methods or manual parsing to break strings into segments.
Example: Reverse Words in a String
function reverseWords(s: string): string {
return s.trim()
.split(/\s+/) // Split on whitespace
.reverse()
.join(' ');
}Manual parsing version:
function reverseWordsManual(s: string): string {
const words: string[] = [];
let word = '';
for (let i = 0; i < s.length; i++) {
if (s[i] === ' ') {
if (word) {
words.push(word);
word = '';
}
} else {
word += s[i];
}
}
if (word) words.push(word);
return words.reverse().join(' ');
}Time: O(n), Space: O(n)
Pattern 2: Path Parsing
Parse file paths, URLs, or hierarchical structures.
Example: Simplify Path
function simplifyPath(path: string): string {
const stack: string[] = [];
const parts = path.split('/');
for (const part of parts) {
if (part === '' || part === '.') {
continue;
} else if (part === '..') {
if (stack.length > 0) {
stack.pop();
}
} else {
stack.push(part);
}
}
return '/' + stack.join('/');
}Example: Extract Domain from URL
function extractDomain(url: string): string {
// Remove protocol
let domain = url.replace(/^https?:\/\//, '');
// Remove path, query, fragment
domain = domain.split('/')[0];
domain = domain.split('?')[0];
domain = domain.split('#')[0];
// Remove port
domain = domain.split(':')[0];
return domain;
}Pattern 3: Expression Parsing
Parse mathematical expressions or nested structures.
Example: Basic Calculator (addition and subtraction)
function calculate(s: string): number {
let result = 0;
let sign = 1; // 1 for +, -1 for -
let num = 0;
const stack: number[] = [];
for (let i = 0; i < s.length; i++) {
const char = s[i];
if (char >= '0' && char <= '9') {
num = num * 10 + (char.charCodeAt(0) - '0'.charCodeAt(0));
} else if (char === '+' || char === '-') {
result += sign * num;
num = 0;
sign = char === '+' ? 1 : -1;
} else if (char === '(') {
stack.push(result);
stack.push(sign);
result = 0;
sign = 1;
} else if (char === ')') {
result += sign * num;
num = 0;
result *= stack.pop()!; // sign
result += stack.pop()!; // previous result
}
}
return result + sign * num;
}Pattern 4: Delimiter-Based Parsing
Parse strings with specific delimiters (CSV, key-value pairs, etc.).
Example: Parse Key-Value Pairs
function parseKeyValue(s: string, delimiter: string = '&'): Map<string, string> {
const map = new Map<string, string>();
const pairs = s.split(delimiter);
for (const pair of pairs) {
const [key, value] = pair.split('=');
if (key && value) {
map.set(decodeURIComponent(key), decodeURIComponent(value));
}
}
return map;
}Example: Parse CSV Line
function parseCSVLine(line: string): string[] {
const fields: string[] = [];
let current = '';
let inQuotes = false;
for (let i = 0; i < line.length; i++) {
const char = line[i];
if (char === '"') {
inQuotes = !inQuotes;
} else if (char === ',' && !inQuotes) {
fields.push(current);
current = '';
} else {
current += char;
}
}
fields.push(current);
return fields;
}Advanced Patterns
Pattern 5: Valid Number Parsing
Parse and validate number formats (integers, decimals, scientific notation).
function isNumber(s: string): boolean {
let i = 0;
const n = s.length;
// Skip leading whitespace
while (i < n && s[i] === ' ') i++;
// Optional sign
if (i < n && (s[i] === '+' || s[i] === '-')) i++;
let hasDigits = false;
let hasDot = false;
// Parse integer part
while (i < n && s[i] >= '0' && s[i] <= '9') {
hasDigits = true;
i++;
}
// Optional decimal point
if (i < n && s[i] === '.') {
hasDot = true;
i++;
while (i < n && s[i] >= '0' && s[i] <= '9') {
hasDigits = true;
i++;
}
}
// Must have at least one digit
if (!hasDigits) return false;
// Optional exponent
if (i < n && (s[i] === 'e' || s[i] === 'E')) {
i++;
if (i < n && (s[i] === '+' || s[i] === '-')) i++;
let expDigits = false;
while (i < n && s[i] >= '0' && s[i] <= '9') {
expDigits = true;
i++;
}
if (!expDigits) return false;
}
// Skip trailing whitespace
while (i < n && s[i] === ' ') i++;
return i === n;
}Pattern 6: Nested Structure Parsing
Parse nested structures like JSON, XML, or parentheses.
function isValidNested(s: string, open: string, close: string): boolean {
const stack: string[] = [];
for (const char of s) {
if (char === open) {
stack.push(char);
} else if (char === close) {
if (stack.length === 0 || stack.pop() !== open) {
return false;
}
}
}
return stack.length === 0;
}Parsing Techniques
Technique 1: State Machine
Use a state machine for complex parsing logic.
enum ParseState {
START,
IN_WORD,
IN_NUMBER,
ERROR
}
function parseWithStateMachine(s: string): string[] {
const tokens: string[] = [];
let state = ParseState.START;
let current = '';
for (const char of s) {
switch (state) {
case ParseState.START:
if (/[a-zA-Z]/.test(char)) {
state = ParseState.IN_WORD;
current = char;
} else if (/[0-9]/.test(char)) {
state = ParseState.IN_NUMBER;
current = char;
}
break;
case ParseState.IN_WORD:
if (/[a-zA-Z0-9]/.test(char)) {
current += char;
} else {
tokens.push(current);
current = '';
state = ParseState.START;
}
break;
case ParseState.IN_NUMBER:
if (/[0-9]/.test(char)) {
current += char;
} else {
tokens.push(current);
current = '';
state = ParseState.START;
}
break;
}
}
if (current) tokens.push(current);
return tokens;
}Technique 2: Recursive Descent
Parse recursively for nested structures.
function parseExpression(s: string, index: { value: number }): number {
let result = parseTerm(s, index);
while (index.value < s.length && (s[index.value] === '+' || s[index.value] === '-')) {
const op = s[index.value++];
const term = parseTerm(s, index);
result = op === '+' ? result + term : result - term;
}
return result;
}
function parseTerm(s: string, index: { value: number }): number {
let result = parseFactor(s, index);
while (index.value < s.length && (s[index.value] === '*' || s[index.value] === '/')) {
const op = s[index.value++];
const factor = parseFactor(s, index);
result = op === '*' ? result * factor : result / factor;
}
return result;
}
function parseFactor(s: string, index: { value: number }): number {
if (s[index.value] === '(') {
index.value++; // skip '('
const result = parseExpression(s, index);
index.value++; // skip ')'
return result;
}
let num = 0;
while (index.value < s.length && s[index.value] >= '0' && s[index.value] <= '9') {
num = num * 10 + (s.charCodeAt(index.value++) - '0'.charCodeAt(0));
}
return num;
}When to Use String Parsing
Use parsing techniques when you see these signals:
- "Split" or "Parse" β Break string into components
- "Path" or "URL" β Hierarchical structures
- "Expression" or "Formula" β Mathematical or logical expressions
- "CSV" or "Delimited" β Structured data with separators
- "Validate format" β Check if string matches a pattern
- "Extract" β Pull specific parts from formatted strings
Common Challenges
- Whitespace handling β Trim, normalize, or preserve?
- Edge cases β Empty strings, single tokens, trailing delimiters
- Escaped characters β Handle quotes, backslashes, special chars
- Nested structures β Use stack or recursion
- Performance β Avoid unnecessary string concatenation (use array + join)
Template
function parseTemplate(s: string): string[] {
const tokens: string[] = [];
let current = '';
for (let i = 0; i < s.length; i++) {
const char = s[i];
if (/* delimiter or separator */) {
if (current) {
tokens.push(current);
current = '';
}
} else {
current += char;
}
}
if (current) tokens.push(current);
return tokens;
}Time and Space Complexity
- Time: O(n) for simple parsing, O(nΒ²) if string concatenation is used inefficiently
- Space: O(n) for storing tokens, O(1) if processing in-place
Practice Tips
- Use built-in methods β
split(),trim(),replace()when appropriate - Avoid string concatenation in loops β Use array and
join()instead - Handle edge cases β Empty strings, all delimiters, no delimiters
- Consider regex β For pattern matching, but be mindful of performance
- Use stack for nested structures β Parentheses, brackets, etc.
Related Concepts
- Regular Expressions β Pattern matching for parsing
- Stack β Essential for nested structure parsing
- State Machines β For complex parsing logic
- Recursion β For recursive structures
String parsing is a fundamental skill that appears in many real-world applications. Master these patterns, and you'll be able to handle a wide variety of string processing problems.