Redis Sentinel logs "+try-failover: New failover in progress" when multiple failover attempts overlap, usually because a previous failover is still underway or quorum cannot be reached. This prevents duplicate promotions and ensures only one master election occurs at a time. To resolve, wait for the current failover to complete, verify Sentinel quorum and network connectivity, then retry if needed.
The "+try-failover: New failover in progress" message appears in Redis Sentinel logs when a Sentinel instance attempts to initiate a failover while another failover is already underway. Redis Sentinel uses a distributed consensus protocol to elect a new master when the current master fails, and it prevents concurrent failovers to avoid split-brain scenarios and data inconsistency. This message indicates one of the following:

1. A failover is currently in progress and hasn't completed yet
2. Multiple Sentinels are trying to initiate failovers simultaneously due to network partitions or timing issues
3. The failover timeout hasn't expired from a previous attempt

Sentinel rejects new failover attempts until the current one completes, times out, or is aborted. This is a protective mechanism that ensures only one master election occurs at a time.
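The "one failover at a time" rule can be illustrated with a small toy model in Python. This is a sketch of the behavior described above, not Redis source code: a new attempt is rejected while a previous attempt is still inside its failover-timeout window.

```python
class FailoverGuard:
    """Toy model of Sentinel's rule that only one failover may run at a time.
    Illustrative only -- not taken from the Redis implementation."""

    def __init__(self, failover_timeout=180.0):
        self.failover_timeout = failover_timeout  # Sentinel's default is 180 s
        self.failover_start = None                # None => no failover running

    def try_failover(self, now):
        # Reject if a failover started recently and has not timed out yet.
        if (self.failover_start is not None
                and now - self.failover_start < self.failover_timeout):
            return "+try-failover rejected: failover in progress"
        self.failover_start = now
        return "+try-failover accepted"

guard = FailoverGuard(failover_timeout=180.0)
print(guard.try_failover(now=0.0))    # accepted: no failover running
print(guard.try_failover(now=30.0))   # rejected: previous attempt still within timeout
print(guard.try_failover(now=200.0))  # accepted: previous attempt timed out
```

This is why simply retrying immediately never helps: until the timeout expires or the failover finishes, every new attempt hits the same guard.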
First, check if a failover is already in progress and monitor its status:
```
# Connect to any Sentinel instance
redis-cli -p 26379

# Check current master and failover state
SENTINEL masters
SENTINEL sentinels <master-name>
SENTINEL get-master-addr-by-name <master-name>

# Look for failover-related fields (run from the shell, not the redis-cli prompt)
redis-cli -p 26379 SENTINEL master <master-name> | grep -E "failover|epoch|state"
```

If failover-state shows wait_start, select_slave, promote_slave, reconf_slaves, or update_config, a failover is actively progressing. Wait for it to complete (typically 30-60 seconds) before attempting any new actions.
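To automate this check, the flat field/value reply that redis-cli prints for SENTINEL master can be parsed in a few lines of Python. This sketch assumes the hyphenated field name `failover-state` used by recent Redis versions; the helper names are my own:

```python
# States that mean a failover state machine is actively running.
ACTIVE_FAILOVER_STATES = {"wait_start", "select_slave", "promote_slave",
                          "reconf_slaves", "update_config"}

def parse_sentinel_master(lines):
    """Turn redis-cli's flat output for `SENTINEL master <name>` into a dict.
    Field names and values alternate line by line, so pair them up."""
    it = iter(lines)
    return dict(zip(it, it))

def failover_in_progress(info):
    # `failover-state` reads "no-failover" when idle; treat a missing
    # field the same way.
    return info.get("failover-state") in ACTIVE_FAILOVER_STATES

sample = ["name", "mymaster", "failover-state", "select_slave", "config-epoch", "5"]
print(failover_in_progress(parse_sentinel_master(sample)))  # True
```

A wrapper script could poll this every few seconds and only proceed with maintenance once the state returns to idle.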
Ensure Sentinels can communicate and reach quorum for decision-making:
```
# Check if Sentinels can see each other
redis-cli -p 26379 SENTINEL sentinels <master-name>

# Verify quorum requirements are met
redis-cli -p 26379 SENTINEL master <master-name> | grep -E "quorum|num-other-sentinels"

# Test network connectivity between Sentinels
redis-cli -h <sentinel-ip> -p 26379 PING

# Check for split-brain scenarios (pass runid "*" to query the down state
# without requesting a leader vote)
redis-cli -p 26379 SENTINEL is-master-down-by-addr <master-ip> <master-port> <current-epoch> *
```

If num-other-sentinels is less than quorum minus one, Sentinels cannot reach consensus. Fix network issues or adjust the quorum in the Sentinel configuration. Ensure all Sentinels can reach the master and each other on port 26379.
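The quorum arithmetic is worth making explicit, because two different thresholds are in play: `quorum` Sentinels must agree the master is objectively down (ODOWN), but a majority of all known Sentinels must authorize the actual failover. A minimal sketch (function name is my own):

```python
def quorum_status(total_sentinels, reachable_sentinels, quorum):
    """Check both Sentinel thresholds: `quorum` agreeing Sentinels to mark
    the master ODOWN, and a majority of all Sentinels to authorize failover."""
    majority = total_sentinels // 2 + 1
    return {
        "can_mark_odown": reachable_sentinels >= quorum,
        "can_authorize_failover": reachable_sentinels >= majority,
    }

# 5 Sentinels with quorum 3, but a partition leaves only 2 reachable:
print(quorum_status(5, 2, 3))
# -> {'can_mark_odown': False, 'can_authorize_failover': False}
```

Note that even a quorum set lower than the majority cannot bypass the majority requirement for failover authorization, which is why a minority partition can never promote a replica.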
Check if the failover timeout is too short, causing overlapping attempts:
```
# View current failover-timeout setting
redis-cli -p 26379 SENTINEL master <master-name> | grep failover-timeout

# Check the Sentinel configuration file
grep -i failover-timeout /etc/redis/sentinel.conf

# Typical configuration (adjust as needed)
sentinel failover-timeout <master-name> 60000  # 60 seconds
```

The default failover-timeout is 180 seconds (3 minutes). If set too low (e.g., 30 seconds), Sentinels may retry before previous attempts complete. Keep it between 60 and 180 seconds for stable clusters. After editing the file, restart Sentinel; to change it at runtime instead, use SENTINEL SET <master-name> failover-timeout <milliseconds>, and Sentinel will rewrite its configuration file automatically.
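A small sanity check can catch timeout values that invite overlapping attempts before they are deployed. The thresholds below are heuristics based on the guidance above, not official Redis rules, and the function name is my own:

```python
def check_failover_timeout(failover_timeout_ms, down_after_ms):
    """Heuristic lint for Sentinel timing settings (not an official Redis
    rule): the retry window should comfortably exceed failure-detection
    time, or Sentinels may retry while a previous attempt is converging."""
    warnings = []
    if failover_timeout_ms < 60_000:
        warnings.append("failover-timeout under 60s risks overlapping attempts")
    if failover_timeout_ms < 2 * down_after_ms:
        warnings.append("failover-timeout should be well above down-after-milliseconds")
    return warnings

print(check_failover_timeout(30_000, 30_000))   # two warnings
print(check_failover_timeout(180_000, 30_000))  # []
```

Such a check fits naturally into a configuration-management pipeline that renders sentinel.conf.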
If a failover appears stuck, you may need to intervene carefully:
```
# First, check whether the failover is truly stuck (epochs not changing)
redis-cli -p 26379 SENTINEL master <master-name> | grep -E "epoch"

# If stuck, try resetting the master state (CAUTION: may cause downtime)
redis-cli -p 26379 SENTINEL reset <master-name>

# As a last resort, restart Sentinel instances one by one
sudo systemctl restart redis-sentinel
```

Use SENTINEL reset cautiously: it clears all state for that master and forces re-discovery of replicas and other Sentinels. Only restart Sentinels one at a time to maintain quorum. Monitor logs closely after any reset: tail -f /var/log/redis/sentinel.log.
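The "one at a time" rule follows directly from the majority requirement: taking a Sentinel down must still leave a majority of the original deployment running. A one-line check (function name is my own):

```python
def safe_to_restart_one(total_sentinels):
    """True if taking one Sentinel offline still leaves a majority of the
    original deployment running, so failovers can still be authorized."""
    majority = total_sentinels // 2 + 1
    return total_sentinels - 1 >= majority

for n in (2, 3, 5):
    print(n, safe_to_restart_one(n))
# 2 -> False (losing one breaks majority), 3 -> True, 5 -> True
```

This is also the quantitative reason two-Sentinel deployments are unsafe for maintenance: any single restart leaves the survivor unable to form a majority.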
Set up monitoring to detect and prevent overlapping failovers:
```
# Monitor Sentinel logs for failover messages
grep -E "\+try-failover|\+vote-for-leader|\+elected-leader" /var/log/redis/sentinel.log

# Alert on consecutive failover attempts within short windows
# Example Prometheus alert rule:
# - alert: RedisSentinelFailoverOverlap
#   expr: increase(redis_sentinel_failovers_total[5m]) > 2
#   for: 1m
```

Deploy an odd number of Sentinel instances (3 or 5 for production) to ensure clean quorum decisions, and place them across different availability zones or hosts. Use monitoring tools to alert on rapid successive failover attempts, which may indicate network flapping or configuration issues.
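If no Prometheus exporter is available, the same "more than two attempts in five minutes" condition can be checked directly against parsed log events. This sketch takes (unix timestamp, log line) pairs rather than raw log lines, so it stays independent of the exact Sentinel log timestamp format; the function name is my own:

```python
def overlapping_attempts(events, window=300, threshold=2):
    """Given (unix_ts, log_line) pairs, return the timestamps at which more
    than `threshold` '+try-failover' events fall inside a `window`-second
    sliding window -- the same condition as the alert rule above."""
    times = sorted(t for t, line in events if "+try-failover" in line)
    alerts = []
    start = 0
    for end, t in enumerate(times):
        while t - times[start] > window:   # slide window start forward
            start += 1
        if end - start + 1 > threshold:
            alerts.append(t)
    return alerts

events = [(0, "+try-failover master mymaster 10.0.0.1 6379"),
          (60, "+try-failover master mymaster 10.0.0.1 6379"),
          (90, "+sdown master mymaster 10.0.0.1 6379"),
          (120, "+try-failover master mymaster 10.0.0.1 6379")]
print(overlapping_attempts(events))  # [120]
```

A cron job feeding this from the Sentinel log would flag flapping before it causes repeated promotions.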
Redis Sentinel failover logic uses an epoch-based system to prevent concurrent promotions. Each Sentinel maintains a currentEpoch and configEpoch; failovers increment these epochs. The "+try-failover: New failover in progress" message occurs when a Sentinel's currentEpoch hasn't yet caught up with the latest known failover epoch.
In complex network partition scenarios (split-brain), multiple Sentinels may believe they're the leader and attempt failovers. The Raft-inspired consensus algorithm ensures only one succeeds, but logs may show multiple "+try-failover" attempts before consensus is reached.
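The core of that consensus rule is that each Sentinel grants its vote to at most one candidate per epoch. A toy model of the voting rule (illustrative only, not the Redis implementation):

```python
class SentinelVoter:
    """Toy model of the Raft-style voting rule: one vote per epoch, so at
    most one leader can win each election."""

    def __init__(self):
        self.current_epoch = 0
        self.voted_for = None

    def request_vote(self, candidate, epoch):
        if epoch > self.current_epoch:
            # Newer epoch: reset and grant the vote to the first requester.
            self.current_epoch = epoch
            self.voted_for = candidate
            return True
        # Same or stale epoch: only the already-chosen candidate is confirmed.
        return epoch == self.current_epoch and self.voted_for == candidate

v = SentinelVoter()
print(v.request_vote("sentinel-A", epoch=1))  # True: first request in epoch 1
print(v.request_vote("sentinel-B", epoch=1))  # False: vote already cast this epoch
print(v.request_vote("sentinel-B", epoch=2))  # True: new epoch, new vote
```

With each voter bound this way, two candidates in the same epoch cannot both collect a majority, which is why concurrent "+try-failover" attempts resolve to a single winner.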
For large-scale deployments, consider Redis Cluster instead of Sentinel for automatic sharding and more robust failover. Sentinel is best for single-master setups with 1-5 replicas. Also note that Sentinel failovers are not instantaneous—they involve slave promotion, configuration propagation, and client reconnection, which can take 10-30 seconds even under ideal conditions.