How to fix FATAL: the database system is in recovery mode in PostgreSQL

PostgreSQLINTERMEDIATEHIGH

The 'FATAL: the database system is in recovery mode' error occurs when PostgreSQL is recovering from an unexpected shutdown, crash, or operating as a read-only standby/replica server. This typically happens after server restarts, crashes, or during replication scenarios.

What this error means

The "FATAL: the database system is in recovery mode" error indicates that PostgreSQL is currently in a special recovery state where it's either: 1. **Crash Recovery**: The database is replaying transaction logs (WAL files) after an unclean shutdown or crash to restore the database to a consistent state 2. **Standby/Replica Mode**: The server is running as a read-only standby that continuously receives WAL records from a primary server 3. **Archive Recovery**: The server is recovering from a backup and applying archived WAL files to reach a specific point in time During recovery mode, PostgreSQL only accepts read-only connections and DDL (CREATE, ALTER, DROP) statements are not permitted. Write operations and most administrative commands will fail. The recovery process can take anywhere from seconds to hours depending on the amount of data and the crash recovery scope.

How to fix "FATAL: the database system is in recovery mode"

1Check PostgreSQL logs to understand the recovery state

Look at the PostgreSQL logs to see what recovery process is happening:

bash

# On Linux, logs are typically here:
tail -f /var/log/postgresql/postgresql-*.log

# Or check systemd journal
sudo journalctl -u postgresql -f

# Look for messages like:
# LOG: starting crash recovery
# LOG: redo starts at ...
# LOG: redo done at ...
# LOG: MultiXact member wraparound protections are now enabled
# LOG: database system is ready to accept read-only connections

# Or for standby mode:
# LOG: entering standby mode
# LOG: consistent recovery state reached
# LOG: database system is ready to accept read-only connections

These messages indicate the recovery is proceeding normally. The recovery can take significant time on large databases.

2Wait for recovery to complete if the server recently restarted

If PostgreSQL recently crashed or restarted, recovery happens automatically on startup. Simply wait for it to finish:

sql

-- Check if recovery is still in progress
SELECT pg_is_in_recovery();
-- Returns: true (still recovering) or false (recovery complete, ready to write)

-- On a standby, this will always return true
-- On a primary that finished recovery, this returns false

Monitoring recovery progress:

bash

# Watch PostgreSQL logs in real-time
tail -f /var/log/postgresql/postgresql-*.log | grep -E "redo|recovery|ready"

# Monitor with ps to see crash recovery status
watch 'ps aux | grep postgres | grep -v grep'

For crash recovery on large databases, the process can take:
- Small database (< 1 GB): Minutes
- Medium database (1-100 GB): 10-60 minutes
- Large database (> 100 GB): Hours

Do not restart PostgreSQL during recovery or you'll extend the recovery time.

3Verify you're connected to the correct server (primary vs standby)

If you have a replication setup, ensure you're connecting to the primary server for write operations:

sql

-- Check if this is a standby (read-only) or primary server
SELECT pg_is_in_recovery();
-- standby: returns true
-- primary: returns false (after recovery completes)

-- For PostgreSQL 10+, use this for more details:
SELECT
  pg_is_wal_replay_paused() AS replay_paused,
  CASE WHEN pg_is_in_recovery() THEN 'STANDBY' ELSE 'PRIMARY' END AS server_role,
  pg_last_xlog_receive_location() AS receive_location,
  pg_last_xlog_replay_location() AS replay_location;

If connected to a standby:

You can either:
1. Connect to the primary server instead for write operations
2. Promote the standby to become a primary (if planned failover):

bash

# To promote a standby to primary:
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/14/main

# Or from SQL (PostgreSQL 10+):
# SELECT pg_promote();

4Check if WAL files are corrupted or missing

Corruption or missing WAL files can cause recovery to fail or hang:

bash

# Check WAL directory
ls -lh /var/lib/postgresql/14/main/pg_wal/

# Look for files like: 00000001000000000000000C
# Each file is normally 16MB

# If you see very small files or gaps in the sequence, it indicates corruption

# Check PostgreSQL logs for WAL-related errors:
grep -i "wal|redo|corrupt|invalid" /var/log/postgresql/postgresql-*.log

If WAL files are missing:

Option 1: Wait for replication to catch up (if standby)
Option 2: Perform PITR from a backup (if you have one)
Option 3: Use pg_resetwal as a last resort (see Advanced Notes)

bash

# Check if you have archived WAL files for recovery
ls -lh /path/to/wal-archive/

5Ensure adequate disk space for recovery

Recovery requires temporary disk space. If the disk is full, recovery will fail:

bash

# Check available disk space
df -h /var/lib/postgresql/

# Check if the data directory is full
du -sh /var/lib/postgresql/14/main/

# Typical requirement: at least 10-20% free space of database size

If disk is full:

1. Stop PostgreSQL gracefully first:

bash

sudo systemctl stop postgresql

2. Free up disk space:

bash

# Remove old WAL files (if properly archived)
rm /var/lib/postgresql/14/main/pg_wal/0000000*[01]

# Or delete old logs
rm /var/log/postgresql/*.log.1

3. Restart PostgreSQL:

bash

sudo systemctl start postgresql

6Restart PostgreSQL gracefully if recovery seems stuck

If recovery appears to be hung (no progress in logs for extended time), restart gracefully:

bash

# Check current recovery process
ps aux | grep postgres | grep -v grep

# Graceful shutdown (waits for recovery to finish or kill long queries)
sudo systemctl stop postgresql
# or
sudo -u postgres pg_ctl stop -D /var/lib/postgresql/14/main -m smart

# Wait for graceful shutdown
sleep 10

# Start PostgreSQL again
sudo systemctl start postgresql

# Monitor logs
tail -f /var/log/postgresql/postgresql-*.log

Important: Never use -m immediate with pg_ctl stop as it will require recovery again on next startup:

bash

# AVOID THIS - will cause recovery on next startup:
pg_ctl stop -m immediate

# Use these instead:
pg_ctl stop -m smart   # Waits for clients to disconnect
pg_ctl stop -m fast    # Disconnects clients (still cleaner than immediate)

7Monitor recovery completion and database availability

Once recovery progresses, monitor the database becoming available:

bash

# Wait for PostgreSQL to be ready
while ! sudo -u postgres psql -c "SELECT 1" > /dev/null 2>&1; do
  echo "Waiting for PostgreSQL to become available..."
  sleep 5
done

echo "PostgreSQL is now accepting connections!"

sql

-- Once connected, verify recovery is complete
SELECT pg_is_in_recovery() AS in_recovery;
-- Should return: false (for primary) or true (for standby)

-- Check connection status
SELECT datname, usename, state FROM pg_stat_activity LIMIT 5;

-- Verify database integrity (after recovery completes)
REINDEX DATABASE postgres;

-- Check for any remaining replication lag (if standby)
SELECT slot_name, slot_type, restart_lsn FROM pg_replication_slots;

bash

# Test application connectivity
psql -U postgres -d postgres -c "SELECT version();"

# Verify data integrity with a simple query
psql -U postgres -d postgres -c "SELECT count(*) FROM information_schema.tables;"

Advanced notes

Understanding PostgreSQL Recovery Mode

PostgreSQL uses WAL (Write-Ahead Logging) for crash recovery and replication. When the server starts after a crash:

1. PostgreSQL reads the control file (pg_control) to find the last checkpoint
2. It replays all WAL records from that checkpoint onward
3. Once all WAL is replayed, it transitions to normal operation (or standby mode if configured)

The recovery process is automatic and essential for data consistency.

Recovery Types:

1. Crash Recovery (REDO): Replays committed transactions from WAL after an unexpected shutdown. Time depends on data size and shutdown method.

2. Standby Recovery (Hot Standby): A replica server continuously receives WAL from the primary and applies it. The standby remains in recovery mode indefinitely, accepting read-only connections.

3. Point-in-Time Recovery (PITR): Restores from a backup and applies WAL up to a specific timestamp or transaction.

When Recovery Takes Too Long:

Large databases may take hours to recover. PostgreSQL 9.4+ introduced parallel crash recovery (wal_level = replica):

bash

# Check current wal_level setting
sudo -u postgres psql -c "SHOW wal_level;"

# To enable parallel recovery (requires restart)
# Edit postgresql.conf:
max_parallel_workers_for_maintenance = 4  # Adjust based on CPU cores

Avoiding Recovery in the Future:

1. Always shut down cleanly:

bash

# Good - clean shutdown
sudo systemctl stop postgresql
pg_ctl stop -D /path -m smart

# Bad - forces recovery on next start
pg_ctl stop -D /path -m immediate
kill -9 $(cat /tmp/pg.pid)

2. Enable data checksums for corruption detection:

bash

# Must be done on a fresh cluster (requires initdb with -k flag)
initdb -k /var/lib/postgresql/14/main

3. Use replication slots to prevent WAL file loss:

sql

-- On primary server
SELECT * FROM pg_create_physical_replication_slot('standby_slot');

-- In standby recovery.conf
primary_slot_name = 'standby_slot'

pg_resetwal - Last Resort Only:

bash

# WARNING: Only use if data files are confirmed to be in a consistent state
# and you've exhausted all other recovery options
pg_resetwal -D /var/lib/postgresql/14/main

# BACKUP YOUR DATA FIRST:
pg_dump -Fc postgres > backup.dump

This utility forcefully resets the WAL pointer, bypassing recovery. Use only when:
- WAL files are permanently lost/corrupted
- Recovery hangs indefinitely with no progress
- You have a confirmed backup of your data

Using pg_resetwal without a backup can result in permanent data loss.

Replication-Specific Issues:

If a standby is stuck in recovery mode:

1. Check replication lag:

sql

SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;

2. Check standby connection to primary:

bash

# On primary, check connected standbys
sudo -u postgres psql -c "SELECT client_addr, replay_lsn FROM pg_stat_replication;"

3. If primary WAL files are deleted before standby received them:

bash

# Full resync needed - backup and restore the standby from primary
pg_basebackup -h primary_host -D /var/lib/postgresql/14/main -U replication -v -P

How to fix FATAL: the database system is in recovery mode in PostgreSQL

What this error means

Typical symptoms

Common causes

How to fix "FATAL: the database system is in recovery mode"

Advanced notes

Related errors

Official resources & further reading