This error occurs when PostgreSQL encounters data that doesn't conform to UTF-8 encoding. Common causes include importing data from non-UTF8 sources, null bytes in text fields, or client/server encoding mismatches. Ensure data is properly encoded or use appropriate binary formats.
PostgreSQL is configured to use UTF-8 encoding but received byte sequences that are invalid in UTF-8. UTF-8 is a variable-length character encoding where characters are represented by sequences of one to four bytes. Invalid sequences occur when byte patterns don't match valid UTF-8 structure. This often happens during data import (COPY, INSERT), when loading data from sources in a different encoding, or when the data contains null bytes, which PostgreSQL cannot store in text fields. PostgreSQL strictly validates encoding to prevent data corruption and ensure consistent behavior across systems.
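You can reproduce the validation yourself with convert_from, which decodes a bytea value using a given encoding. Here, 0xc3 0xbc is the valid UTF-8 encoding of 'ü', while a lone 0xfc (its Latin-1 encoding) is rejected:

-- Valid two-byte UTF-8 sequence for 'ü'
SELECT convert_from('\xc3bc'::bytea, 'UTF8');
-- The same character as a single Latin-1 byte fails validation
SELECT convert_from('\xfc'::bytea, 'UTF8');
-- ERROR: invalid byte sequence for encoding "UTF8": 0xfc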
Verify your database is configured for UTF-8:
# Connect to PostgreSQL
psql -U username -d databasename
-- Check database encoding
\l databasename
-- Or query directly
SELECT datname, pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = 'databasename';

If the database is not UTF-8, you'll need to recreate it with UTF-8 encoding (the encoding cannot be changed on an existing database). Otherwise, note the current encoding and ensure your data source matches.
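If you do need to recreate the database, a minimal sketch (the database name is a placeholder; template0 is used so the encoding is not inherited from template1):

-- Create a new UTF-8 database
-- If your default locale is not UTF-8, also set LC_COLLATE/LC_CTYPE to a UTF-8 locale
CREATE DATABASE databasename_utf8
  WITH ENCODING 'UTF8'
       TEMPLATE template0;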
The error message usually includes context showing which row caused the issue:
ERROR: invalid byte sequence for encoding "UTF8": 0xfc
CONTEXT: COPY mytable, line 42

This tells you the problem is at line 42 of your data file. Check that specific line (see the command after this list) for:
- Special or binary characters
- Null bytes (0x00)
- Characters from a different encoding (like 0xfc from Latin-1 'ü')
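To pull out exactly the line the error reported (line 42 in this example) and examine its bytes:

# Show line 42 of the file as a hex dump
sed -n '42p' yourfile.csv | hexdump -C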
You can inspect a file's encoding with:
# Check file encoding
file -i yourfile.csv
# Show hex dump of problematic area
hexdump -C yourfile.csv | head -50

If the source file is in a different encoding, convert it to UTF-8 using iconv:
# Detect source encoding (e.g., Latin-1, Windows-1252, ISO-8859-1)
file -i yourfile.csv
# Convert to UTF-8
iconv -f ISO-8859-1 -t UTF-8 yourfile.csv > yourfile_utf8.csv
# Or for Windows-1252
iconv -f WINDOWS-1252 -t UTF-8 yourfile.csv > yourfile_utf8.csv
# Verify the conversion
file -i yourfile_utf8.csv

Then use the converted file for import.
If the error shows 0x00 (null byte), the data contains null characters which UTF-8 text fields cannot store. Remove them:
# Remove null bytes from file
tr -d '\0' < yourfile.csv > yourfile_clean.csv

Or, in the COPY command, you can use FORMAT BINARY to handle binary data:
COPY mytable FROM STDIN WITH (FORMAT BINARY);

However, if you're loading text data with null bytes, consider:
- Storing the data in bytea (binary) column instead of text
- Replacing null bytes with a placeholder (space, dash, etc.; see the example after this list)
- Filtering them out during preprocessing
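For example, to replace each null byte with a space instead of deleting it (so that column positions in fixed-width data are preserved):

# Replace each null byte with a space
tr '\0' ' ' < yourfile.csv > yourfile_clean.csv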
Ensure the client encoding matches your data when using COPY:
# Connect to database
psql -U username -d databasename
-- Check current client encoding
\encoding
-- Change client encoding if needed
\encoding UTF8
# Or set environment variable before connecting
export PGCLIENTENCODING=UTF8
psql -U username -d databasename
-- Then run COPY
COPY mytable FROM '/path/to/file.csv' WITH (FORMAT csv);

You can also specify the encoding directly in the COPY command:

COPY mytable FROM STDIN WITH (FORMAT csv, ENCODING 'UTF8');
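The ENCODING option can also name the source file's encoding, letting PostgreSQL convert the data to UTF-8 during the load instead of requiring a separate iconv step. For example, for a Latin-1 file:

-- Convert from Latin-1 on the fly during import
COPY mytable FROM '/path/to/file.csv' WITH (FORMAT csv, ENCODING 'LATIN1');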
For large imports on PostgreSQL 17 or later, the ON_ERROR option can skip rows whose column values fail conversion:

-- PostgreSQL 17+: skip rows with malformed values
COPY mytable FROM '/path/to/file.csv'
WITH (FORMAT csv, ON_ERROR ignore);
-- For older versions, use a wrapper script
-- or preprocess the file to remove bad rows

This allows partial imports where only the rows that parse cleanly are loaded. Note that ON_ERROR covers errors converting column values to their data types; encoding errors detected earlier in the pipeline may still abort the COPY, so preprocessing remains the more reliable fix for invalid byte sequences.
Alternatively, preprocess the file to sanitize encoding:
# Remove non-UTF8 sequences (iconv -c discards characters it cannot convert)
iconv -f UTF-8 -t UTF-8 -c yourfile.csv > file_clean.csv
# Or use Python:
python3 -c "print(open('file.csv', encoding='utf-8', errors='ignore').read(), end='')" > file_clean.csv

If you need to store binary data (like null bytes), use bytea columns instead of text:
-- Define column as bytea, not text
CREATE TABLE mytable (
id SERIAL PRIMARY KEY,
data bytea
);
-- Or alter an existing text column (a USING clause is required to convert the values)
ALTER TABLE mytable ALTER COLUMN data TYPE bytea USING convert_to(data, 'UTF8');
-- Insert binary data ('Hello' in hex)
INSERT INTO mytable (data) VALUES ('\x48656c6c6f'::bytea);
-- Or from a file
COPY mytable (data) FROM STDIN WITH (FORMAT BINARY);

bytea columns store raw bytes and are not subject to encoding validation, so every byte sequence is preserved.
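To move between hex text and bytea in queries, the built-in encode and decode functions are handy. A small sketch, reusing the mytable definition above:

-- Insert binary data from a hex string
INSERT INTO mytable (data) VALUES (decode('48656c6c6f', 'hex'));
-- Read it back as a hex string
SELECT encode(data, 'hex') FROM mytable;   -- 48656c6c6f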
After fixing the data, test with a small subset before importing everything:
# Extract first 100 lines
head -100 yourfile_utf8.csv > sample.csv
# Try importing the sample
psql -U username -d databasename -c "COPY mytable FROM STDIN WITH (FORMAT csv)" < sample.csv

If the sample imports successfully, proceed with the full file:
psql -U username -d databasename -c "COPY mytable FROM STDIN WITH (FORMAT csv)" < yourfile_utf8.csv

For large files, consider using a staging table first:
CREATE TEMP TABLE staging (LIKE mytable);
COPY staging FROM STDIN WITH (FORMAT csv);
-- is_valid() is a placeholder for whatever validation check fits your data
INSERT INTO mytable SELECT * FROM staging WHERE is_valid();

Encoding Validation: PostgreSQL strictly enforces UTF-8 at the byte level. Invalid sequences can come from incomplete multi-byte sequences (e.g., lone continuation bytes) or out-of-order bytes. The specific hex value in the error (0xfc, 0x00) tells you exactly which byte violated UTF-8 rules.

Null Byte Limitation: PostgreSQL text fields cannot store null bytes because historically PostgreSQL used C-style null-terminated strings internally. This limitation applies to VARCHAR, TEXT, and CHAR types. If you need null bytes, use bytea.

Binary COPY Format: The FORMAT BINARY option in COPY tells PostgreSQL the data is already in the server's native binary format, bypassing text-encoding validation. This is fast but requires the data to be in the exact format PostgreSQL expects.

SQL_ASCII Compatibility: Databases created with SQL_ASCII encoding are special: they don't validate encoding at all, treating each byte independently. If migrating from SQL_ASCII, bytes 128-255 may be invalid UTF-8. Use iconv or encoding detection tools.

pg_dump Encoding: When using pg_dump to back up a non-UTF8 database, the dump file will be in that encoding. Restore it to a UTF8 database with: PGCLIENTENCODING=original_encoding pg_restore -d newdb dump.custom.

Performance Note: Encoding validation has minimal performance impact. Most slowness comes from I/O and parsing, not character set checking.
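A quick way to see the null-byte limitation from the notes above in practice:

-- Text types reject the null character
SELECT chr(0);              -- ERROR: null character not permitted
-- bytea stores it without complaint
SELECT '\x00'::bytea;       -- \x00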