This error occurs when PostgreSQL encounters data that doesn't conform to UTF-8 encoding. Common causes include importing data from non-UTF8 sources, null bytes in text fields, or client/server encoding mismatches. Ensure data is properly encoded or use appropriate binary formats.
PostgreSQL is configured to use UTF-8 encoding but received byte sequences that are invalid in UTF-8. UTF-8 is a variable-length character encoding where characters are represented by sequences of one to four bytes. Invalid sequences occur when byte patterns don't match valid UTF-8 structure. This often happens during data import (COPY, INSERT), when loading data from sources in a different encoding, or when the data contains null bytes, which PostgreSQL cannot store in text fields. PostgreSQL strictly validates encoding to prevent data corruption and ensure consistent behavior across systems.
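You can reproduce the validation yourself with convert_from, which decodes a bytea value using a given encoding. Here, 0xc3 0xbc is the valid UTF-8 encoding of 'ü', while a lone 0xfc (its Latin-1 encoding) is rejected:

-- Valid two-byte UTF-8 sequence for 'ü'
SELECT convert_from('\xc3bc'::bytea, 'UTF8');
-- The same character as a single Latin-1 byte fails validation
SELECT convert_from('\xfc'::bytea, 'UTF8');
-- ERROR: invalid byte sequence for encoding "UTF8": 0xfc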
Verify your database is configured for UTF-8:
# Connect to PostgreSQL
psql -U username -d databasename
-- Check database encoding
\l databasename
-- Or query directly
SELECT datname, pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = 'databasename';

If the database is not UTF-8, you'll need to recreate it with UTF-8 encoding (the encoding cannot be changed on an existing database). Otherwise, note the current encoding and ensure your data source matches.
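If you do need to recreate the database, a minimal sketch (the database name is a placeholder; template0 is used so the encoding is not inherited from template1):

-- Create a new UTF-8 database
-- If your default locale is not UTF-8, also set LC_COLLATE/LC_CTYPE to a UTF-8 locale
CREATE DATABASE databasename_utf8
  WITH ENCODING 'UTF8'
       TEMPLATE template0;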
The error message usually includes context showing which row caused the issue:
ERROR: invalid byte sequence for encoding "UTF8": 0xfc
CONTEXT: COPY mytable, line 42

This tells you the problem is at line 42 of your data file. Check that specific line (see the command after this list) for:
- Special or binary characters
- Null bytes (0x00)
- Characters from a different encoding (like 0xfc from Latin-1 'ü')
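To pull out exactly the line the error reported (line 42 in this example) and examine its bytes:

# Show line 42 of the file as a hex dump
sed -n '42p' yourfile.csv | hexdump -C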
You can inspect a file's encoding with:
# Check file encoding
file -i yourfile.csv
# Show hex dump of problematic area
hexdump -C yourfile.csv | head -50

If the source file is in a different encoding, convert it to UTF-8 using iconv:
# Detect source encoding (e.g., Latin-1, Windows-1252, ISO-8859-1)
file -i yourfile.csv
# Convert to UTF-8
iconv -f ISO-8859-1 -t UTF-8 yourfile.csv > yourfile_utf8.csv
# Or for Windows-1252
iconv -f WINDOWS-1252 -t UTF-8 yourfile.csv > yourfile_utf8.csv
# Verify the conversion
file -i yourfile_utf8.csv

Then use the converted file for import.
If the error shows 0x00 (null byte), the data contains null characters which UTF-8 text fields cannot store. Remove them:
# Remove null bytes from file
tr -d '\0' < yourfile.csv > yourfile_clean.csv

Or, in the COPY command, you can use FORMAT BINARY to handle binary data:
COPY mytable FROM STDIN WITH (FORMAT BINARY);

However, if you're loading text data with null bytes, consider:
- Storing the data in bytea (binary) column instead of text
- Replacing null bytes with a placeholder (space, dash, etc.; see the example after this list)
- Filtering them out during preprocessing
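For example, to replace each null byte with a space instead of deleting it (so that column positions in fixed-width data are preserved):

# Replace each null byte with a space
tr '\0' ' ' < yourfile.csv > yourfile_clean.csv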
Ensure the client encoding matches your data when using COPY:
# Connect to database
psql -U username -d databasename
-- Check current client encoding
\encoding
-- Change client encoding if needed
\encoding UTF8
# Or set environment variable before connecting
export PGCLIENTENCODING=UTF8
psql -U username -d databasename
-- Then run COPY
COPY mytable FROM '/path/to/file.csv' WITH (FORMAT csv);

You can also specify the encoding directly in the COPY command:

COPY mytable FROM STDIN WITH (FORMAT csv, ENCODING 'UTF8');
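The ENCODING option can also name the source file's encoding, letting PostgreSQL convert the data to UTF-8 during the load instead of requiring a separate iconv step. For example, for a Latin-1 file:

-- Convert from Latin-1 on the fly during import
COPY mytable FROM '/path/to/file.csv' WITH (FORMAT csv, ENCODING 'LATIN1');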
For large imports on PostgreSQL 17 or later, the ON_ERROR option can skip rows whose column values fail conversion:

-- PostgreSQL 17+: skip rows with malformed values
COPY mytable FROM '/path/to/file.csv'
WITH (FORMAT csv, ON_ERROR ignore);
-- For older versions, use a wrapper script
-- or preprocess the file to remove bad rows

This allows partial imports where only the rows that parse cleanly are loaded. Note that ON_ERROR covers errors converting column values to their data types; encoding errors detected earlier in the pipeline may still abort the COPY, so preprocessing remains the more reliable fix for invalid byte sequences.
Alternatively, preprocess the file to sanitize encoding:
# Remove non-UTF8 sequences (iconv -c discards characters it cannot convert)
iconv -f UTF-8 -t UTF-8 -c yourfile.csv > file_clean.csv
# Or use Python:
python3 -c "print(open('file.csv', encoding='utf-8', errors='ignore').read(), end='')" > file_clean.csv

If you need to store binary data (like null bytes), use bytea columns instead of text:
-- Define column as bytea, not text
CREATE TABLE mytable (
id SERIAL PRIMARY KEY,
data bytea
);
-- Or alter an existing text column (a USING clause is required to convert the values)
ALTER TABLE mytable ALTER COLUMN data TYPE bytea USING convert_to(data, 'UTF8');
-- Insert binary data ('Hello' in hex)
INSERT INTO mytable (data) VALUES ('\x48656c6c6f'::bytea);
-- Or from a file
COPY mytable (data) FROM STDIN WITH (FORMAT BINARY);

bytea columns store raw bytes and are not subject to encoding validation, so every byte sequence is preserved.
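To move between hex text and bytea in queries, the built-in encode and decode functions are handy. A small sketch, reusing the mytable definition above:

-- Insert binary data from a hex string
INSERT INTO mytable (data) VALUES (decode('48656c6c6f', 'hex'));
-- Read it back as a hex string
SELECT encode(data, 'hex') FROM mytable;   -- 48656c6c6f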
After fixing the data, test with a small subset before importing everything:
# Extract first 100 lines
head -100 yourfile_utf8.csv > sample.csv
# Try importing the sample
psql -U username -d databasename -c "COPY mytable FROM STDIN WITH (FORMAT csv)" < sample.csv

If the sample imports successfully, proceed with the full file:
psql -U username -d databasename -c "COPY mytable FROM STDIN WITH (FORMAT csv)" < yourfile_utf8.csv

For large files, consider using a staging table first:
CREATE TEMP TABLE staging (LIKE mytable);
COPY staging FROM STDIN WITH (FORMAT csv);
-- is_valid() is a placeholder for whatever validation check fits your data
INSERT INTO mytable SELECT * FROM staging WHERE is_valid();

Encoding Validation: PostgreSQL strictly enforces UTF-8 at the byte level. Invalid sequences can come from incomplete multi-byte sequences (e.g., lone continuation bytes) or out-of-order bytes. The specific hex value in the error (0xfc, 0x00) tells you exactly which byte violated UTF-8 rules.

Null Byte Limitation: PostgreSQL text fields cannot store null bytes because historically PostgreSQL used C-style null-terminated strings internally. This limitation applies to VARCHAR, TEXT, and CHAR types. If you need null bytes, use bytea.

Binary COPY Format: The FORMAT BINARY option in COPY tells PostgreSQL the data is already in the server's native binary format, bypassing text-encoding validation. This is fast but requires the data to be in the exact format PostgreSQL expects.

SQL_ASCII Compatibility: Databases created with SQL_ASCII encoding are special: they don't validate encoding at all, treating each byte independently. If migrating from SQL_ASCII, bytes 128-255 may be invalid UTF-8. Use iconv or encoding detection tools.

pg_dump Encoding: When using pg_dump to back up a non-UTF8 database, the dump file will be in that encoding. Restore it to a UTF8 database with: PGCLIENTENCODING=original_encoding pg_restore -d newdb dump.custom.

Performance Note: Encoding validation has minimal performance impact. Most slowness comes from I/O and parsing, not character set checking.
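A quick way to see the null-byte limitation from the notes above in practice:

-- Text types reject the null character
SELECT chr(0);              -- ERROR: null character not permitted
-- bytea stores it without complaint
SELECT '\x00'::bytea;       -- \x00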