This error occurs when a Docker Swarm node cannot communicate with the manager node, typically due to network connectivity issues, firewall rules blocking required ports, or when the manager node has lost quorum. The fix involves verifying network connectivity, ensuring required ports are open, and potentially recovering the swarm cluster.
Docker Swarm uses a distributed consensus protocol called Raft to maintain cluster state across manager nodes. When you see the "cannot reach manager node" error, it indicates that the Docker daemon on your current node is unable to establish communication with any manager node in the swarm.

Manager nodes in Docker Swarm are responsible for:

- Maintaining the cluster state and configuration
- Scheduling services across worker nodes
- Serving the Swarm management API
- Storing secrets and configs

Communication between swarm nodes requires specific ports to be open:

- **Port 2377/tcp**: Cluster management and Raft consensus
- **Port 7946/tcp+udp**: Node discovery and container network discovery
- **Port 4789/udp**: Overlay network traffic (VXLAN)

When these ports are blocked, or network connectivity is otherwise impaired, nodes cannot communicate with managers, leading to this error. The error can also occur when attempting to promote a worker to manager, join a new node as manager, or when the swarm has lost quorum (a majority of managers unavailable).
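As a quick first check, the TCP ports above can be probed from the failing node. This is a sketch, assuming `nc` is available and using a placeholder manager address; the UDP ports cannot be verified with a simple connect, so they are left to `tcpdump` on the receiving side:

```shell
# Probe the Swarm TCP ports on a manager.
# MANAGER_IP is a placeholder; substitute your manager's address.
MANAGER_IP=${1:-192.0.2.10}
for port in 2377 7946; do
  if nc -z -w 3 "$MANAGER_IP" "$port" 2>/dev/null; then
    echo "tcp/$port open"
  else
    echo "tcp/$port BLOCKED"
  fi
done
# UDP 7946 and 4789 cannot be reliably probed this way; confirm them with
# tcpdump on the receiving node instead.
```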
First, if you have access to any working manager node, check the status of all nodes:
```
docker node ls
```

Look for managers with status "Unreachable" or "Down". The output shows:
- MANAGER STATUS: "Leader", "Reachable", or "Unreachable"
- AVAILABILITY: "Active", "Pause", or "Drain"
- STATUS: "Ready" or "Down"
If a manager shows as "Unreachable", that's the problematic node.
Check the swarm state:
```
docker info --format '{{.Swarm.LocalNodeState}}'
# Expected: active

docker info --format '{{.Swarm.ControlAvailable}}'
# Expected: true (on managers)
```

Test basic network connectivity between nodes:
```
# From the failing node, ping the manager
ping <MANAGER_IP>

# Test TCP connectivity to the management port
nc -zv <MANAGER_IP> 2377

# Or use telnet
telnet <MANAGER_IP> 2377
```

If ping works but port 2377 doesn't connect, it's likely a firewall issue.
For more detailed network debugging:
```
# Capture traffic to see what's happening
sudo tcpdump -i any port 2377

# Check routing
traceroute <MANAGER_IP>
```

Docker Swarm requires these ports to be open between all nodes.
On Linux with iptables:
```
# Cluster management (Swarm join, Raft)
sudo iptables -A INPUT -p tcp --dport 2377 -j ACCEPT

# Node communication
sudo iptables -A INPUT -p tcp --dport 7946 -j ACCEPT
sudo iptables -A INPUT -p udp --dport 7946 -j ACCEPT

# Overlay network
sudo iptables -A INPUT -p udp --dport 4789 -j ACCEPT

# Save rules (varies by distro); note the redirect must run as root,
# so pipe through tee rather than redirecting directly
sudo iptables-save | sudo tee /etc/iptables/rules.v4
```

On Linux with firewalld:
```
sudo firewall-cmd --permanent --add-port=2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload
```

On Linux with ufw:
```
sudo ufw allow 2377/tcp
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp
```

On cloud platforms: also check security groups (AWS), network security groups (Azure), or VPC firewall rules (GCP).
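AWS and GCP commands appear later in this guide; for Azure, a network security group rule along these lines could open the Swarm ports within a subnet. This is a sketch: the resource group, NSG name, priority, and address prefix are all placeholder assumptions.

```shell
# Hypothetical Azure NSG rule allowing Swarm traffic between nodes in
# 10.0.0.0/24 (my-swarm-rg, swarm-nsg, and the prefix are placeholders)
az network nsg rule create \
  --resource-group my-swarm-rg \
  --nsg-name swarm-nsg \
  --name allow-swarm \
  --priority 200 \
  --direction Inbound --access Allow --protocol '*' \
  --source-address-prefixes 10.0.0.0/24 \
  --destination-port-ranges 2377 7946 4789
```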
If the manager has multiple network interfaces, it may be advertising the wrong IP:
Check current advertise address:
```
docker info --format '{{.Swarm.NodeAddr}}'
```

If it's wrong, you need to leave and rejoin with the correct address.
On the problematic node:
```
docker swarm leave --force
```

On an existing manager, get the join token:
```
docker swarm join-token manager
# or for workers:
docker swarm join-token worker
```

Rejoin with an explicit advertise address:
```
docker swarm join \
  --advertise-addr <CORRECT_IP>:2377 \
  --token <TOKEN> \
  <MANAGER_IP>:2377
```

For the initial manager, if reinitializing:
```
docker swarm init --advertise-addr <CORRECT_IP>
```

If the manager daemon became unresponsive, restarting it may help:
```
# Restart the Docker daemon
sudo systemctl restart docker

# Wait for it to rejoin
sleep 30

# Check node status
docker node ls
```

If the manager doesn't come back as "Reachable" after the restart:
```
# Check Docker logs for errors
sudo journalctl -u docker -n 100 --no-pager

# Look for Raft or swarm-related errors
sudo journalctl -u docker | grep -i "raft\|swarm\|manager"
```

If a majority of managers are down and the swarm has lost quorum, management operations fail. You'll see errors like "context deadline exceeded" even on the remaining managers.
Check quorum status:
- 3 managers: need at least 2 up (tolerates 1 failure)
- 5 managers: need at least 3 up (tolerates 2 failures)
- 7 managers: need at least 4 up (tolerates 3 failures)
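The tolerance figures above follow from Raft's majority rule: a cluster of N managers needs floor(N/2) + 1 of them up to keep quorum, so it tolerates floor((N-1)/2) failures. A quick shell sanity check:

```shell
# Raft quorum math: N managers need floor(N/2)+1 up
# and tolerate floor((N-1)/2) failures
for n in 1 3 5 7; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( (n - 1) / 2 ))
  echo "$n managers: quorum $quorum, tolerates $tolerated failure(s)"
done
```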
Force recovery on a surviving manager:
```
# CAUTION: Run on ONE manager only
# This forces a new single-manager cluster
docker swarm init --force-new-cluster
```

After recovery:
1. Other managers must leave and rejoin
2. Re-add managers to restore fault tolerance
```
# On old managers that need to rejoin
docker swarm leave --force

# Get a new join token from the recovered manager
docker swarm join-token manager

# Join the old managers back
docker swarm join --token <NEW_TOKEN> <RECOVERED_MANAGER_IP>:2377
```

If a manager is permanently gone and you need to remove it:
```
# Demote the unreachable manager first, then force-remove it
# (Docker refuses to remove a node that is still a manager)
docker node demote <NODE_ID>
docker node rm --force <NODE_ID>
```

Important: after removing a manager, add a new one to maintain an odd number of managers:
```
# Promote an existing worker
docker node promote <WORKER_NODE_ID>

# Or add a new manager
docker swarm join-token manager
# Run the output command on the new node
```

Best practices for manager count:
- Development: 1 manager (no fault tolerance)
- Production: 3 managers (tolerates 1 failure)
- Large production: 5-7 managers (max recommended)
Confirm all nodes can reach managers:
```
# Check all nodes are healthy
docker node ls

# Expected output shows all managers as "Reachable" or "Leader"
# ID       HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
# abc123 * manager1   Ready    Active         Leader
# def456   manager2   Ready    Active         Reachable
# ghi789   worker1    Ready    Active

# Test swarm operations
docker service create --name test-nginx --replicas 3 nginx
docker service ps test-nginx
docker service rm test-nginx

# Check overlay network creation
docker network create --driver overlay test-overlay
docker network rm test-overlay
```

If all commands succeed, swarm communication is restored.
### Raft Consensus and Quorum
Docker Swarm uses the Raft consensus algorithm to maintain a consistent cluster state across managers. Understanding Raft helps diagnose "cannot reach manager" issues:
1. Leader Election: One manager is elected leader; others are followers
2. Heartbeats: The leader sends periodic heartbeats to maintain authority
3. Log Replication: State changes are replicated to a majority before being committed
When network partitions occur, managers on the minority side lose quorum and cannot process requests, resulting in "cannot reach manager" errors.
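A quick way to tell which side of a partition a node is on, sketched below: any management read must reach the Raft leader, so it fails on the minority side.

```shell
# If this node can complete a management read, it can reach a Raft leader
# and is therefore on the quorum side of any partition.
if docker node ls >/dev/null 2>&1; then
  echo "quorum OK: this node can reach a Raft leader"
else
  echo "quorum LOST (or this node is not a manager)"
fi
```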
### Diagnosing Raft Issues
```
# Check Raft status
docker node inspect <MANAGER_NODE_ID> --format '{{.ManagerStatus}}'

# View Raft logs (requires debug mode; enable it by setting
# {"debug": true} in /etc/docker/daemon.json and restarting Docker,
# rather than launching a second dockerd by hand)
sudo journalctl -u docker | grep -i raft
```

### Split-Brain Prevention
To prevent split-brain scenarios where two groups of managers think they're the cluster:
- Always use an odd number of managers
- Distribute managers across failure domains (AZs, racks)
- Don't use auto-scaling for manager nodes
### Cloud-Specific Considerations
AWS:
```
# Security group rules needed
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol tcp --port 2377 --source-group sg-xxx
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol tcp --port 7946 --source-group sg-xxx
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol udp --port 7946 --source-group sg-xxx
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol udp --port 4789 --source-group sg-xxx
```

GCP:
```
gcloud compute firewall-rules create docker-swarm \
  --allow tcp:2377,tcp:7946,udp:7946,udp:4789 \
  --source-tags docker-swarm \
  --target-tags docker-swarm
```

### Using Static IPs for Managers
To prevent issues with changing IP addresses:
1. Use static IPs or reserved IPs for manager nodes
2. Use DNS names with short TTLs if IPs must change
3. Consider overlay networks for service communication (independent of host IPs)
### Manager Node Recovery Procedure
If a manager's Docker data is corrupted:
```
# Stop Docker
sudo systemctl stop docker

# Back up the existing swarm data
sudo mv /var/lib/docker/swarm /var/lib/docker/swarm.bak

# Start Docker (the node will no longer be part of the swarm)
sudo systemctl start docker

# Rejoin the swarm (use the manager join token to rejoin as a manager)
docker swarm join --token <TOKEN> <MANAGER_IP>:2377
```

### Monitoring Swarm Health
Set up monitoring to detect manager issues early:
```
# Check swarm health programmatically: alert if any node is Down or any
# manager is Unreachable. (Matching the bad states directly avoids false
# alerts on workers, whose ManagerStatus column is empty.)
docker node ls --format '{{.Hostname}} {{.Status}} {{.ManagerStatus}}' | \
  grep -E 'Down|Unreachable' && \
  echo "ALERT: Swarm node issue detected"
```