The 2201B error occurs when PostgreSQL encounters malformed regex syntax. This typically involves unescaped special characters, unbalanced parentheses, or invalid escape sequences in pattern matching operations.
The `2201B: invalid_regular_expression` error is PostgreSQL's way of reporting that a regular expression pattern used in a query contains invalid syntax. This error is thrown when using regex operators like `~`, `~*`, `!~`, `!~*` or functions like `regexp_matches()`, `regexp_replace()`, or `regexp_split_to_array()` with a pattern that doesn't conform to POSIX regular expression syntax. PostgreSQL uses a variant of the POSIX regular expression engine (specifically, a superset of POSIX ERE - Extended Regular Expressions - with additional Advanced RE features). Unlike PCRE (Perl Compatible Regular Expressions) used in many programming languages, PostgreSQL's regex engine has different syntax requirements and escape rules. Common mistakes include using backslash escape sequences that are valid in other regex flavors but illegal in PostgreSQL, mismatched parentheses or brackets, or forgetting to double-escape backslashes in SQL string literals. The error is typically caught at query execution time when PostgreSQL's regex parser attempts to compile the pattern. The error message will include the sqlstate code `2201B` and usually indicates the specific issue, such as "invalid escape sequence" or "parentheses () not balanced".
First, check the exact error message from PostgreSQL - it often indicates the specific problem:
-- Example error output
ERROR: invalid regular expression: parentheses () not balanced
ERROR: invalid regular expression: invalid escape \\ sequenceReview your regex pattern for common syntax issues:
- Unmatched opening/closing parentheses: (, )
- Unmatched brackets: [, ]
- Unmatched braces: {, }
- Invalid escape sequences
Test your pattern against PostgreSQL's POSIX regex syntax requirements. Remember that PostgreSQL regex is NOT the same as JavaScript, Python, or PCRE regex.
In PostgreSQL SQL strings, backslashes must be doubled to represent a single backslash in the regex pattern:
-- WRONG: Single backslash gets consumed by SQL string parsing
SELECT email FROM users WHERE email ~ '@\w+\.com';
-- ERROR: invalid regular expression: invalid escape \\ sequence
-- CORRECT: Double backslash for literal backslash in regex
SELECT email FROM users WHERE email ~ '@\\w+\\.com';
-- Pattern seen by regex engine: @\w+\.comAlternatively, use the PostgreSQL escape string syntax E'...' and double the backslashes:
SELECT email FROM users WHERE email ~ E'@\\w+\\.com';Or use dollar-quoted strings to avoid escaping issues:
SELECT email FROM users WHERE email ~ $$@\w+\.com$$;If you want to match literal characters that have special meaning in regex, escape them with backslash:
Special regex metacharacters: . ^ $ * + ? { } [ ] \\ | ( )
-- WRONG: Dot matches any character
SELECT * FROM products WHERE code ~ 'ABC.123';
-- Matches: ABC-123, ABCX123, etc.
-- CORRECT: Escape the dot to match literal period
SELECT * FROM products WHERE code ~ 'ABC\\.123';
-- Matches only: ABC.123
-- WRONG: Parentheses create capture group
SELECT regexp_matches('test(123)', 'test(123)');
-- ERROR: invalid regular expression: parentheses () not balanced
-- CORRECT: Escape parentheses for literal match
SELECT regexp_matches('test(123)', 'test\\(123\\)');
-- Returns: {test(123)}For user-submitted input, create an escaping function:
CREATE OR REPLACE FUNCTION escape_regex(text) RETURNS text AS $$
SELECT regexp_replace($1, '([.^$*+?{}\[\]\\|()])', '\\\1', 'g');
$$ LANGUAGE SQL IMMUTABLE;
-- Use it to sanitize user input
SELECT * FROM articles
WHERE content ~ escape_regex('What is $cost?');Ensure all grouping constructs are properly closed:
-- WRONG: Unclosed parenthesis
SELECT * FROM logs WHERE message ~ '(ERROR|WARN';
-- ERROR: invalid regular expression: parentheses () not balanced
-- CORRECT: Balanced parentheses
SELECT * FROM logs WHERE message ~ '(ERROR|WARN)';
-- WRONG: Unclosed bracket
SELECT * FROM codes WHERE value ~ '[A-Z0-9';
-- ERROR: invalid regular expression: brackets [] not balanced
-- CORRECT: Closed bracket
SELECT * FROM codes WHERE value ~ '[A-Z0-9]';
-- WRONG: Unclosed brace in repetition
SELECT * FROM ids WHERE id ~ '^[0-9]{5$';
-- ERROR: invalid regular expression: braces {} not balanced
-- CORRECT: Properly closed brace
SELECT * FROM ids WHERE id ~ '^[0-9]{5}$';Avoid PCRE-specific features not supported in POSIX regex:
Not supported in PostgreSQL:
- Named capture groups: (?P<name>...) or (?<name>...)
- Non-capturing groups: (?:...) - PostgreSQL doesn't have this, all groups capture
- Lookbehind/lookahead: (?<=...), (?<!...), (?=...), (?!...)
- \K (reset match start)
- Possessive quantifiers: *+, ++, ?+
Use PostgreSQL alternatives:
-- WRONG: PCRE non-capturing group
SELECT regexp_matches('test123', '(?:test)(\d+)');
-- ERROR: invalid regular expression: invalid embedded option
-- CORRECT: Just use capturing group (or don't group if not needed)
SELECT regexp_matches('test123', 'test(\d+)');
-- WRONG: Lookbehind
SELECT regexp_matches('$100', '(?<=\$)\d+');
-- ERROR: invalid regular expression
-- CORRECT: Use capturing group and extract what you need
SELECT (regexp_matches('$100', '\$(\d+)'))[1];
-- Returns: 100Refer to PostgreSQL's official pattern matching documentation for supported features: https://www.postgresql.org/docs/current/functions-matching.html
Use regexp_match() to test and debug your patterns before using them in complex queries:
-- Test if pattern is valid and matches as expected
SELECT regexp_match('test string', 'your\\pattern\\here');
-- If this works, the pattern is valid
SELECT regexp_match('[email protected]', '^[\\w.-]+@[\\w.-]+\\.[a-z]{2,}$');
-- Returns: {[email protected]}
-- Test character classes
SELECT regexp_match('ABC123', '[A-Z]+');
-- Returns: {ABC}
-- Verify escape sequences work correctly
SELECT regexp_match('file.txt', '\\w+\\.\\w+');
-- Returns: {file.txt}For complex patterns, build them incrementally:
-- Start simple
SELECT regexp_match('2024-01-15', '\\d{4}'); -- {2024}
-- Add more
SELECT regexp_match('2024-01-15', '\\d{4}-\\d{2}'); -- {2024-01}
-- Complete pattern
SELECT regexp_match('2024-01-15', '\\d{4}-\\d{2}-\\d{2}'); -- {2024-01-15}Character Class Ranges: In bracket expressions, ranges must be specified in ascending order. [z-a] is invalid; use [a-z] instead. Case matters: [A-z] includes more than just letters because it spans ASCII codes 65-122.
Octal vs Back References: PostgreSQL resolves ambiguity between octal escapes (\040 for space) and back references (\1 for first capture group) using specific rules: a leading zero indicates octal, a single non-zero digit is always a back reference, and multi-digit sequences are back references if they match a valid capture group number, otherwise octal.
ARE vs ERE modes: By default, PostgreSQL uses Advanced Regular Expressions (AREs). You can switch modes using the flags parameter in regex functions. For example, regexp_matches(string, pattern, 'b') forces basic RE mode. Most users should stick with the default ARE mode.
Performance considerations: Complex regex patterns with many alternations or nested groups can be slow on large datasets. PostgreSQL's regex engine doesn't optimize as aggressively as specialized search engines. For performance-critical searches on large tables, consider using full-text search (tsvector/tsquery) or LIKE patterns when appropriate.
Database migration pitfalls: If migrating from MySQL, be aware that MySQL's REGEXP uses a different engine (based on Henry Spencer's library but with differences). Patterns may need adjustment. From Oracle, REGEXP_LIKE and similar functions have different syntax. Always test regex patterns after migration.
Standard SQL vs PostgreSQL: The SQL standard defines pattern matching with LIKE and SIMILAR TO operators. PostgreSQL's ~ operator and regex functions are extensions. For maximum portability, use SIMILAR TO (which uses SQL:1999 regex syntax), though it's less powerful than full POSIX regex.
ERROR: syntax error at end of input
Syntax error at end of input in PostgreSQL
Bind message supplies N parameters but prepared statement requires M
Bind message supplies N parameters but prepared statement requires M in PostgreSQL
Multidimensional arrays must have sub-arrays with matching dimensions
Multidimensional arrays must have sub-arrays with matching dimensions
ERROR: value too long for type character varying
Value too long for type character varying
insufficient columns in unique constraint for partition key
How to fix "insufficient columns in unique constraint for partition key" in PostgreSQL