This streaming replication error occurs when a standby server requests a WAL segment that has been deleted from the primary server. It indicates the standby has fallen too far behind and cannot catch up without rebuilding or using WAL archiving.
PostgreSQL uses Write-Ahead Log (WAL) files to ensure data durability and to enable streaming replication. When a standby (replica) server requests a specific WAL segment, the primary may have already removed that segment if the standby has fallen too far behind, i.e. when the replication lag exceeds the amount of WAL the primary is configured to retain. The error means replication has failed because the standby cannot fetch the needed WAL segment from the primary: it has been recycled according to the wal_keep_size (PostgreSQL 13+) or wal_keep_segments (older versions) setting, or, when replication slots are in use, because the max_slot_wal_keep_size limit was exceeded.
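Before changing anything, it can help to see how much WAL the primary is currently allowed to retain. A minimal check, run on the primary (wal_keep_segments only exists on PostgreSQL 12 and earlier, wal_keep_size and max_slot_wal_keep_size on 13 and later, so some rows may be absent):
-- Current WAL retention settings on the primary
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('wal_keep_size', 'wal_keep_segments', 'max_slot_wal_keep_size', 'max_wal_size');
A max_slot_wal_keep_size of -1 means WAL retained for replication slots is unlimited.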
Connect to the primary server and check how far the standby has fallen behind:
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
If using streaming replication without slots, check the pg_stat_replication view:
SELECT client_addr, state, flush_lsn
FROM pg_stat_replication;
A restart_lsn (for a slot) or flush_lsn (in pg_stat_replication) far behind pg_current_wal_lsn(), or no row at all for the standby, indicates it is severely behind or disconnected.
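On PostgreSQL 13 and later, pg_replication_slots also exposes wal_status and safe_wal_size, which answer the question directly; a quick check on the primary:
-- 'lost' means WAL required by this slot has already been removed
SELECT slot_name, active, wal_status, safe_wal_size
FROM pg_replication_slots;
A wal_status of 'lost' confirms the standby can no longer catch up by streaming alone.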
If using PostgreSQL 13 or later, increase wal_keep_size (in MB) in postgresql.conf:
wal_keep_size = 4096   # Retain 4 GB of WAL; adjust based on your replication lag tolerance
For PostgreSQL 12 and earlier, use wal_keep_segments (in 16 MB segments):
wal_keep_segments = 256   # Retain 256 * 16 MB = 4 GB of WAL
Reload the configuration:
sudo systemctl reload postgresql
# or
sudo -u postgres pg_ctl reload -D /var/lib/postgresql/data
Be cautious: setting this too high can exhaust disk space on the primary.
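To pick a sensible value, one rough approach is to measure how much WAL the primary generates over a known interval and multiply by the longest standby outage you want to survive. A sketch, where the LSN literal is a placeholder for the value captured in the first step:
-- Step 1: record the current WAL insert position, then wait e.g. one hour
SELECT pg_current_wal_lsn();
-- Step 2: compute MB of WAL generated since the recorded position
-- (replace '0/3000000' with the LSN from step 1)
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/3000000') / 1024 / 1024 AS wal_generated_mb;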
Replication slots make the primary retain WAL until the standby has consumed it (indefinitely, unless capped), instead of relying on a guessed wal_keep_size value. Create a slot on the primary:
SELECT * FROM pg_create_physical_replication_slot('standby_slot');
Then configure the standby to use this slot in recovery.conf (PostgreSQL 11 and earlier) or postgresql.conf (PostgreSQL 12+):
primary_slot_name = 'standby_slot'
Optionally, limit WAL buildup with max_slot_wal_keep_size (PostgreSQL 13+) to prevent disk exhaustion:
max_slot_wal_keep_size = 4096   # In MB
If the standby has fallen too far behind and you don't have WAL archiving set up, the only reliable recovery is to rebuild it:
# Stop the standby
sudo systemctl stop postgresql
# Remove the old data directory
sudo rm -rf /var/lib/postgresql/data/*
# On the standby, take a fresh base backup from the primary
sudo -u postgres pg_basebackup -h <primary_ip> -D /var/lib/postgresql/data -U replication -v -P -W
# Start the standby
sudo systemctl start postgresql
For pg_basebackup, use the -Xs flag (streaming) instead of -Xf (fetch) to avoid this issue in the future:
sudo -u postgres pg_basebackup -h <primary_ip> -D /var/lib/postgresql/data -U replication -v -P -W -Xs
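The -U replication option assumes a role with replication privileges already exists on the primary. If it does not, a minimal sketch for creating one (role name and password are placeholders, and the primary's pg_hba.conf needs a matching replication entry for the standby's address):
-- Run on the primary: dedicated role for streaming replication (placeholder password)
CREATE ROLE replication WITH REPLICATION LOGIN PASSWORD 'change_me';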
If you have an archive location (S3, NFS, etc.), enable WAL archiving on the primary to give standby servers a fallback source:
# postgresql.conf on primary
wal_level = replica
archive_mode = on
archive_command = 'aws s3 cp %p s3://my-bucket/wal-archive/%f'
archive_timeout = 300   # Archive every 5 minutes regardless of size
Enabling archive_mode (and changing wal_level) requires a restart, not just a reload:
sudo systemctl restart postgresql
On the standby, configure recovery to fetch from the archive:
# recovery.conf (PostgreSQL 11 and earlier) or postgresql.conf (PostgreSQL 12+)
restore_command = 'aws s3 cp s3://my-bucket/wal-archive/%f %p'
This allows the standby to recover missing WAL segments without requiring a full rebuild.
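To confirm that archiving actually works after the restart, one way is to watch the archiver statistics on the primary; a growing failed_count or a stale last_archived_time usually points at a broken archive_command:
-- Archiver health: successful vs. failed archive attempts
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;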
After recovery, monitor to prevent future issues:
-- View replication slot status
SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) / 1024.0 / 1024.0 AS retained_wal_mb
FROM pg_replication_slots;
-- View standby progress
SELECT pid, usename, state, write_lsn, flush_lsn, replay_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) / 1024.0 / 1024.0 AS replication_lag_mb
FROM pg_stat_replication;
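The byte-based lag above is measured on the primary. On the standby, lag can also be expressed as time, which is often easier to reason about (returns NULL until the standby has replayed at least one transaction):
-- Run on the standby: how far behind the last replayed transaction is
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;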
Based on the replication lag observed, set wal_keep_size (or wal_keep_segments) to a value that accommodates normal slowdowns plus a safety margin:
# Set based on peak replication lag observed + 1-2 GB buffer
wal_keep_size = 6144   # 6 GB
Replication slots come with a caveat: if a slot stalls and max_slot_wal_keep_size is set, the primary will still remove WAL beyond that limit and invalidate the slot (its wal_status becomes 'lost') to prevent disk exhaustion. Monitor pg_replication_slots (the active flag, and inactive_since on PostgreSQL 17+) to identify stalled slots.
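An abandoned slot, for example one left over after a standby was decommissioned, retains WAL forever unless max_slot_wal_keep_size caps it. A minimal cleanup sketch; only drop a slot once you are sure no standby still needs it:
-- List slots that are not currently streaming
SELECT slot_name, active, restart_lsn FROM pg_replication_slots WHERE NOT active;
-- Irreversibly drop an unneeded slot (here the slot created earlier)
SELECT pg_drop_replication_slot('standby_slot');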
For Greenplum Database (GPDB) environments, run gprecoverseg -F to perform a full recovery of the affected mirror segments.
When using pg_basebackup, the difference between -Xf (fetch) and -Xs (stream) matters: -Xf collects the required WAL only at the end of the backup, which fails if the primary has already recycled those segments, while -Xs streams WAL over a second connection while the backup runs, so the needed segments cannot disappear underneath it.
On cloud platforms like AWS RDS, the equivalent to replication slots is often managed automatically, but you can still tune max_slot_wal_keep_size via parameter groups. Azure Database for PostgreSQL also supports replication slots.
If running in a Kubernetes environment with operators like Zalando postgres-operator or Patroni, ensure the operator is configured with appropriate wal_keep_size and uses replication slots to prevent this failure mode during node failovers.