This error occurs when a Docker Swarm cluster loses quorum because too few manager nodes are available. The Raft consensus algorithm requires a majority of managers to be online to elect a leader. Recovery typically involves reinitializing the swarm with --force-new-cluster on a surviving manager.
The "swarm does not have a leader" error indicates that your Docker Swarm cluster has lost quorum and cannot perform any management operations. Docker Swarm uses the Raft consensus algorithm to maintain a consistent cluster state across all manager nodes. This algorithm requires a strict majority (more than half) of manager nodes to be available to elect a leader and make decisions. When the cluster loses its leader (due to network partitions, node failures, or maintenance), the remaining managers attempt to hold an election. If there aren't enough managers available to form a majority, no leader can be elected, and the swarm becomes unable to process any management commands. For example, in a 3-manager cluster, you need at least 2 managers online. If 2 managers go down, the remaining 1 manager (which is only 33% of the cluster) cannot achieve quorum. Similarly, a 5-manager cluster can tolerate 2 failures but not 3. While the swarm is leaderless, existing services and containers continue running on worker nodes. However, you cannot deploy new services, update existing ones, add or remove nodes, or perform any other management tasks until quorum is restored.
First, determine which manager nodes are still accessible. On any node that can still reach the cluster, run:
```bash
docker node ls
```
This may fail with the "no leader" error. If so, check the individual node status:
```bash
docker info | grep -A 20 "Swarm:"
```
Look for:
- Is Manager: true
- Managers: X
- Nodes: Y
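If you prefer a single command, docker info's Go template can pull the same fields directly (field names per the Docker engine's Swarm info structure; the manager and node counts are only populated on manager nodes):
```bash
# Print whether this node is a manager, plus the manager/node counts
docker info --format 'IsManager: {{.Swarm.ControlAvailable}}, Managers: {{.Swarm.Managers}}, Nodes: {{.Swarm.Nodes}}'
```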
Check if the Docker daemon is running on all manager nodes:
```bash
# On each manager node
systemctl status docker
docker info
```
List the Raft state directory to see if data exists:
```bash
ls -la /var/lib/docker/swarm/raft/
```
Manager nodes must be able to communicate on specific ports. Test connectivity from each manager:
```bash
# Test cluster management port (TCP 2377)
nc -zv <other_manager_ip> 2377

# Test node communication port (TCP/UDP 7946)
nc -zv <other_manager_ip> 7946

# Test overlay network traffic (UDP 4789)
nc -zvu <other_manager_ip> 4789
```
Check firewall rules:
```bash
# For iptables
sudo iptables -L -n | grep -E "2377|7946|4789"

# For firewalld
sudo firewall-cmd --list-all
```
Ensure these ports are open between all manager nodes (an example of opening them with firewalld follows the list):
- 2377/tcp: Cluster management
- 7946/tcp+udp: Node communication
- 4789/udp: Overlay network traffic
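If any of these are blocked, open them. A firewalld example (adjust to whatever firewall tooling your hosts use):
```bash
# Open the swarm ports with firewalld, then reload the rules
sudo firewall-cmd --permanent --add-port=2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload
```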
If manager nodes are simply down (not lost), bring them back online:
```bash
# On each offline manager
sudo systemctl start docker

# Wait for the node to rejoin
sleep 30

# Check if leader is elected
docker node ls
```
If managers are stuck, try restarting Docker:
```bash
sudo systemctl restart docker
```
Check Docker logs for errors:
```bash
sudo journalctl -u docker -f
```
Look for messages about Raft elections, connectivity issues, or leader election timeouts.
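A filtered view is often easier to scan than the live stream. The keywords below are illustrative, since the exact log wording varies across Docker versions:
```bash
# Search the last hour of daemon logs for election-related messages
sudo journalctl -u docker --since "1 hour ago" --no-pager | grep -iE 'raft|leader|election|quorum'
```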
Important: If you have 3 managers and only 1 is healthy, you cannot recover quorum this way. You need at least 2 of 3 managers to achieve majority.
If you cannot bring enough managers back online, reinitialize the swarm from a surviving manager with intact Raft data:
```bash
# On the manager with the most recent data
docker swarm init --force-new-cluster --advertise-addr <manager_ip>
```
This command:
- Creates a new single-node swarm using existing Raft data
- Preserves service definitions, secrets, and configs
- Maintains worker node registrations (they'll reconnect)
- Makes this node the new leader
After forcing a new cluster:
```bash
# Verify the new swarm is working
docker node ls

# Check services are intact
docker service ls

# Get the new join tokens
docker swarm join-token manager
docker swarm join-token worker
```
Warning: Only use --force-new-cluster when you cannot restore quorum normally. Running it on multiple nodes simultaneously can cause split-brain scenarios.
After recovering, clean up failed nodes and restore redundancy:
```bash
# List all nodes
docker node ls

# Remove unreachable manager nodes
docker node demote <failed_node_id>
docker node rm --force <failed_node_id>
```
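If several managers failed at once, a quick filter can collect the IDs to feed into the demote/remove commands above (a sketch using the standard docker node ls format placeholders):
```bash
# Print the IDs of all nodes currently reported as Down
docker node ls --format '{{.ID}} {{.Hostname}} {{.Status}}' | awk '$3 == "Down" {print $1}'
```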
Add new manager nodes to restore fault tolerance:
```bash
# On the leader, get the manager join token
docker swarm join-token manager

# On the new manager node
docker swarm join --token <manager_token> <leader_ip>:2377
```
Promote existing workers to managers if needed:
```bash
docker node promote <worker_node_id>
```
Verify the cluster is healthy:
```bash
docker node ls
# Look for one "Leader" and multiple "Reachable" manager statuses
```
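A narrower view can make this check easier, using docker node ls's built-in format placeholders:
```bash
# Show only the columns relevant to quorum health
docker node ls --format 'table {{.Hostname}}\t{{.Status}}\t{{.ManagerStatus}}'
```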
Prevent future quorum loss by following Docker's recommendations:
Use an odd number of managers (3, 5, or 7):
- 3 managers: Tolerates 1 failure
- 5 managers: Tolerates 2 failures
- 7 managers: Tolerates 3 failures (maximum recommended)
Never use 2 managers: if one fails, the surviving manager holds only 50% of the votes, which is not a majority.
Spread managers across failure domains:
```bash
# When adding managers, use different availability zones/racks
docker node update --label-add zone=us-east-1a manager1
docker node update --label-add zone=us-east-1b manager2
docker node update --label-add zone=us-east-1c manager3
```
Use static IPs for manager nodes to ensure reliable communication.
Configure manager-only nodes:
```bash
# Prevent workloads from running on managers
docker node update --availability drain <manager_node_id>
```
This dedicates managers to cluster management and improves Raft performance.
Implement proactive monitoring to detect quorum issues early:
Monitor manager health:
```bash
#!/bin/bash
# Check swarm health
if ! docker node ls &>/dev/null; then
    echo "CRITICAL: Swarm has no leader!"
    # Send alert here (mail, webhook, etc.)
    exit 1
fi

# Count managers, and how many are reachable per the MANAGER STATUS column
managers=$(docker node ls --filter role=manager -q 2>/dev/null | wc -l)
reachable=$(docker node ls --filter role=manager --format '{{.ManagerStatus}}' 2>/dev/null | grep -cE 'Leader|Reachable')
echo "Managers: $reachable/$managers reachable"
```
Backup swarm configuration regularly:
```bash
# Backup swarm data (run on a manager)
sudo tar -czvf swarm-backup-$(date +%Y%m%d).tar.gz /var/lib/docker/swarm/
```
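Should you ever need that backup, the usual restore sequence is roughly as follows (a sketch; the backup filename is hypothetical, and the archive must come from a manager of this same cluster):
```bash
# Stop Docker, restore the swarm directory, then recover as a single-node cluster
sudo systemctl stop docker
sudo rm -rf /var/lib/docker/swarm
sudo tar -xzvf swarm-backup-20240101.tar.gz -C /   # hypothetical backup file
sudo systemctl start docker
docker swarm init --force-new-cluster
```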
Use Docker's autolock feature for additional security:
```bash
docker swarm update --autolock=true
# Save the unlock key securely!
```
If autolock is enabled and a manager restarts, you'll need to unlock it:
```bash
docker swarm unlock
```
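If you lose track of the unlock key, any currently unlocked manager can display or rotate it:
```bash
# View the current unlock key
docker swarm unlock-key

# Rotate the key if it may have been exposed
docker swarm unlock-key --rotate
```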
Understanding Raft Consensus: Docker Swarm uses the Raft consensus algorithm to maintain a consistent replicated log across all manager nodes. Raft requires a majority (quorum) to make progress:
- 3 nodes: quorum = 2
- 5 nodes: quorum = 3
- 7 nodes: quorum = 4
The formula is: quorum = floor(n/2) + 1, where n is the total number of managers. For example, with n = 5, quorum = floor(5/2) + 1 = 3.
Why 7 Managers Maximum: While you can have more than 7 managers, it's not recommended. Each additional manager increases the time for consensus operations because a majority of managers must acknowledge every write before it is committed. Beyond 7, the performance overhead outweighs the fault-tolerance benefits.
Raft Log Recovery: If Raft logs become corrupted, you may need to:
```bash
# Stop Docker
sudo systemctl stop docker

# Remove corrupted Raft data (LAST RESORT - loses cluster state)
sudo rm -rf /var/lib/docker/swarm/raft/

# Reinitialize or rejoin
sudo systemctl start docker
docker swarm init   # or docker swarm join
```
Split-Brain Prevention: Never run --force-new-cluster on multiple nodes. If two managers each create a new cluster, you'll have two separate swarms with conflicting state.
Time Synchronization: Raft elections use timeouts. If clocks drift significantly between managers, elections can fail repeatedly. Ensure NTP is configured:
```bash
timedatectl status
sudo timedatectl set-ntp true
```
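A quick way to compare clocks across managers is to print each node's epoch time side by side (a sketch; the hostnames are hypothetical and SSH access between nodes is assumed):
```bash
# Print UTC epoch seconds on each manager; large differences indicate skew
for host in manager1 manager2 manager3; do
    printf '%s: ' "$host"
    ssh "$host" date -u +%s
done
```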
Debugging Raft Issues:
```bash
# Check the Raft configuration (the Raft settings live under .Swarm.Cluster.Spec
# in the engine's info structure; adjust if your Docker version differs)
docker info --format '{{json .Swarm.Cluster.Spec.Raft}}'

# View detailed swarm info
docker system info | grep -A 50 "Swarm"
```
Kubernetes Migration Consideration: If you frequently encounter swarm stability issues, consider migrating to Kubernetes, which has more sophisticated cluster management. However, for simpler deployments, a properly configured Swarm is often sufficient and easier to operate.