How to fix 2201B: invalid_regular_expression in PostgreSQL

PostgreSQL | INTERMEDIATE | MEDIUM

The 2201B error occurs when PostgreSQL encounters malformed regex syntax. This typically involves unescaped special characters, unbalanced parentheses, or invalid escape sequences in pattern matching operations.

What this error means

The `2201B: invalid_regular_expression` error means that a regular expression pattern used in a query could not be compiled. It is raised by the regex operators `~`, `~*`, `!~`, and `!~*`, and by functions such as `regexp_matches()`, `regexp_replace()`, and `regexp_split_to_array()`, when the pattern is not valid in PostgreSQL's regex syntax.

PostgreSQL's engine implements POSIX regular expressions plus its own extensions: the default flavor, Advanced Regular Expressions (ARE), is a superset of POSIX Extended Regular Expressions (ERE). It is not PCRE (Perl Compatible Regular Expressions), so some syntax and escapes that work in other languages are rejected here. Common mistakes include escape sequences that are valid in other regex flavors but not in PostgreSQL, mismatched parentheses or brackets, and using the wrong number of backslashes for the kind of SQL string literal that holds the pattern.

The error is raised when PostgreSQL compiles the pattern, which normally happens at query execution time. The message carries the SQLSTATE code `2201B` and usually names the specific problem, such as "invalid escape \ sequence" or "parentheses () not balanced".
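To see the SQLSTATE and message together, the error can be trapped in a PL/pgSQL block. This is a minimal sketch: the sample string and the deliberately broken pattern are illustrative, and `invalid_regular_expression` is the condition name PostgreSQL associates with `2201B`.

```sql
DO $$
BEGIN
    -- The unbalanced "(" makes the pattern fail to compile.
    PERFORM 'abc' ~ '(abc';
EXCEPTION
    WHEN invalid_regular_expression THEN
        -- SQLSTATE and SQLERRM describe the error that was just caught.
        RAISE NOTICE 'caught SQLSTATE %: %', SQLSTATE, SQLERRM;
END
$$;
```

The notice should show `2201B` followed by the same "invalid regular expression: ..." text that a plain query would report.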

How to fix "2201B: invalid_regular_expression"

1. Validate your regex pattern syntax

First, check the exact error message from PostgreSQL - it often indicates the specific problem:

sql

-- Example error output
ERROR:  invalid regular expression: parentheses () not balanced
ERROR:  invalid regular expression: invalid escape \ sequence

Review your regex pattern for common syntax issues:
- Unmatched opening/closing parentheses: (, )
- Unmatched brackets: [, ]
- Unmatched braces: {, }
- Invalid escape sequences

Test your pattern against PostgreSQL's POSIX regex syntax requirements. Remember that PostgreSQL regex is NOT the same as JavaScript, Python, or PCRE regex.
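One way to check a pattern before it reaches a real query is to let PostgreSQL try to compile it and report success or failure. The helper below is a sketch under assumptions: the name `is_valid_regex` and its parameter `p_pattern` are hypothetical, not built-in PostgreSQL functions.

```sql
-- Hypothetical helper: true if PostgreSQL can compile the pattern.
CREATE OR REPLACE FUNCTION is_valid_regex(p_pattern text) RETURNS boolean AS $$
BEGIN
    -- Matching against an empty string forces the pattern to be compiled.
    PERFORM '' ~ p_pattern;
    RETURN true;
EXCEPTION
    WHEN invalid_regular_expression THEN
        RETURN false;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

SELECT is_valid_regex('(ERROR|WARN');   -- false: unbalanced parenthesis
SELECT is_valid_regex('(ERROR|WARN)');  -- true
```

Because the function is declared STRICT, a NULL pattern yields NULL rather than true or false.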

2. Fix backslash escaping in SQL string literals

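How many backslashes a pattern needs depends on the kind of string literal. With standard_conforming_strings = on (the default since PostgreSQL 9.1), an ordinary '...' literal passes backslashes to the regex engine unchanged, so each regex escape is written with a single backslash. Doubling it changes the pattern, and a backslash that ends up dangling or in front of an invalid escape character raises the error:

sql

-- WRONG: Doubled backslashes in an ordinary literal reach the regex engine as-is
SELECT email FROM users WHERE email ~ '@\\w+\\.com';
-- Pattern seen by regex engine: @\\w+\\.com (literal backslashes, so nothing useful matches)

-- WRONG: A trailing backslash is an incomplete escape
SELECT email FROM users WHERE email ~ '@\w+\';
-- ERROR: invalid regular expression: invalid escape \ sequence

-- CORRECT: Single backslash per regex escape
SELECT email FROM users WHERE email ~ '@\w+\.com';
-- Pattern seen by regex engine: @\w+\.com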

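In an escape string E'...' (and in ordinary literals when standard_conforming_strings is off), the SQL parser consumes one level of backslashes before the regex engine sees the pattern, so double them there: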

sql

SELECT email FROM users WHERE email ~ E'@\\w+\\.com';

Or use dollar-quoted strings to avoid escaping issues:

sql

SELECT email FROM users WHERE email ~ $$@\w+\.com$$;
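A quick way to confirm what the regex engine will actually receive is to select the literal itself. The sketch below assumes standard_conforming_strings is on; the column aliases are illustrative.

```sql
-- Check the setting first; the comparison below assumes it reports "on".
SHOW standard_conforming_strings;

SELECT '@\w+\.com'    AS plain_literal,
       E'@\\w+\\.com' AS escape_string,
       $$@\w+\.com$$  AS dollar_quoted;
-- All three columns should contain the same pattern text: @\w+\.com
```

If the columns differ, the literal form in use is adding or removing backslashes before the pattern is compiled.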

3. Escape special regex metacharacters

To match a character literally when it has a special meaning in regex, escape it with a backslash:

Special regex metacharacters: . ^ $ * + ? { } [ ] \ | ( )

sql

-- WRONG: Dot matches any character
SELECT * FROM products WHERE code ~ 'ABC.123';
-- Matches: ABC-123, ABCX123, etc.

-- CORRECT: Escape the dot to match literal period
SELECT * FROM products WHERE code ~ 'ABC\.123';
-- Matches only: ABC.123

-- WRONG: Parentheses create capture group
SELECT regexp_matches('test(123)', 'test(123)');
-- Returns no rows: the pattern matches "test123", not "test(123)"

-- CORRECT: Escape parentheses for literal match
SELECT regexp_matches('test(123)', 'test\(123\)');
-- Returns: {test(123)}

For user-submitted input, create an escaping function:

sql

CREATE OR REPLACE FUNCTION escape_regex(text) RETURNS text AS $$
  SELECT regexp_replace($1, '([.^$*+?{}\[\]\\|()])', '\\\1', 'g');
$$ LANGUAGE SQL IMMUTABLE;

-- Use it to sanitize user input
SELECT * FROM articles
WHERE content ~ escape_regex('What is $cost?');
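To see what the escaping step produces, the same regexp_replace() expression can be run directly on a sample value. This sketch assumes standard_conforming_strings is on; the sample strings are illustrative. It also shows strpos() as an alternative when the goal is only a literal substring search, which avoids regex compilation entirely.

```sql
-- The expression used inside escape_regex(), applied to a sample input.
SELECT regexp_replace('What is $cost?', '([.^$*+?{}\[\]\\|()])', '\\\1', 'g') AS escaped;
-- escaped: What is \$cost\?

-- Alternative for plain substring search: no regex, so nothing to escape.
SELECT strpos('Q: What is $cost? A: 5', 'What is $cost?') > 0 AS found;
-- found: t
```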

4. Check for balanced parentheses and brackets

Ensure all grouping constructs are properly closed:

sql

-- WRONG: Unclosed parenthesis
SELECT * FROM logs WHERE message ~ '(ERROR|WARN';
-- ERROR: invalid regular expression: parentheses () not balanced

-- CORRECT: Balanced parentheses
SELECT * FROM logs WHERE message ~ '(ERROR|WARN)';

-- WRONG: Unclosed bracket
SELECT * FROM codes WHERE value ~ '[A-Z0-9';
-- ERROR: invalid regular expression: brackets [] not balanced

-- CORRECT: Closed bracket
SELECT * FROM codes WHERE value ~ '[A-Z0-9]';

-- WRONG: Unclosed brace in repetition
SELECT * FROM ids WHERE id ~ '^[0-9]{5$';
-- ERROR: invalid regular expression (reported as "invalid repetition count(s)" or "braces {} not balanced", depending on where the bound breaks off)

-- CORRECT: Properly closed brace
SELECT * FROM ids WHERE id ~ '^[0-9]{5}$';

5. Use PostgreSQL-compatible regex syntax

Avoid PCRE-specific features that PostgreSQL's regex engine does not support:

Not supported in PostgreSQL:
- Named capture groups: (?P<name>...) or (?<name>...)
- \K (reset match start)
- Possessive quantifiers: *+, ++, ?+

Supported in PostgreSQL, although often assumed not to be:
- Non-capturing groups: (?:...)
- Lookahead: (?=...), (?!...)
- Lookbehind: (?<=...), (?<!...) (PostgreSQL 9.6 and later)

Use PostgreSQL alternatives:

sql

-- WRONG: PCRE named capture group
SELECT regexp_matches('test123', 'test(?<num>\d+)');
-- ERROR: invalid regular expression (typically "quantifier operand invalid")

-- CORRECT: Use a plain capturing group and refer to it by position
SELECT regexp_matches('test123', 'test(\d+)');
-- Returns: {123}

-- OK: Non-capturing groups are supported
SELECT regexp_matches('test123', '(?:test)(\d+)');
-- Returns: {123}

-- Lookbehind needs PostgreSQL 9.6 or later; older versions reject it
SELECT regexp_matches('$100', '(?<=\$)\d+');
-- Returns: {100}

-- PORTABLE: Use a capturing group and extract what you need
SELECT (regexp_matches('$100', '\$(\d+)'))[1];
-- Returns: 100

Refer to PostgreSQL's official pattern matching documentation for supported features: https://www.postgresql.org/docs/current/functions-matching.html
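Since named capture groups are unavailable, one workaround is to capture by position and attach names with column aliases. The sketch below assumes PostgreSQL 10 or later (for regexp_match()) and standard_conforming_strings = on; the aliases t, m, year, month, and day are illustrative.

```sql
-- Capture by position, then name each piece with a column alias.
SELECT m[1] AS year,
       m[2] AS month,
       m[3] AS day
FROM regexp_match('2024-01-15', '(\d{4})-(\d{2})-(\d{2})') AS t(m);
-- year: 2024, month: 01, day: 15
```

The groups are still numbered in the pattern itself, so the names exist only in the query output.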

6. Test patterns with regexp_match for debugging

Use regexp_match() (available in PostgreSQL 10 and later) to test and debug your patterns before using them in complex queries:

sql

-- Test if pattern is valid and matches as expected
SELECT regexp_match('test string', 'your_pattern_here');

-- If this works, the pattern is valid
SELECT regexp_match('user@example.com', '^[\w.-]+@[\w.-]+\.[a-z]{2,}$');
-- Returns: {user@example.com}

-- Test character classes
SELECT regexp_match('ABC123', '[A-Z]+');
-- Returns: {ABC}

-- Verify escape sequences work correctly
SELECT regexp_match('file.txt', '\w+\.\w+');
-- Returns: {file.txt}

For complex patterns, build them incrementally:

sql

-- Start simple
SELECT regexp_match('2024-01-15', '\d{4}');  -- {2024}

-- Add more
SELECT regexp_match('2024-01-15', '\d{4}-\d{2}');  -- {2024-01}

-- Complete pattern
SELECT regexp_match('2024-01-15', '\d{4}-\d{2}-\d{2}');  -- {2024-01-15}
