This error occurs when a Docker node cannot successfully join an existing Swarm cluster. Common causes include a firewall blocking required ports, network connectivity issues, time synchronization problems, or an invalid or rotated join token.
The "error while joining swarm: failed to join the cluster" message indicates that your Docker node attempted to join a Swarm cluster but the join operation failed. Docker Swarm uses a distributed consensus algorithm (Raft) for manager nodes, and all nodes must be able to communicate with the manager(s) over specific ports. This error can appear in several scenarios:
- **Firewall blocking ports**: Swarm requires TCP port 2377 for cluster management, TCP/UDP port 7946 for node communication, and UDP port 4789 for overlay network traffic
- **Network connectivity issues**: The joining node cannot reach the manager node's advertised address
- **Time synchronization problems**: Swarm nodes authenticate with TLS certificates that have validity windows, so clock skew between nodes can make a valid certificate look expired or not yet valid and cause join failures
- **Invalid or rotated join token**: The token used may have been rotated or the command was copied incorrectly
- **Hostname conflicts**: Nodes with identical hostnames can cause cluster issues
- **Proxy interference**: HTTP/HTTPS proxies can interfere with Swarm's internal communication
The error may also be accompanied by more specific messages such as "context deadline exceeded", "connection refused", or TLS certificate errors, which provide additional diagnostic clues.
First, ensure the joining node can reach the Swarm manager on the required port:
# Test connectivity to the manager node on port 2377
telnet <manager-ip> 2377
# Or using netcat
nc -zv <manager-ip> 2377
# Or using curl (will show connection but protocol error, which is OK)
curl -v telnet://<manager-ip>:2377
If the connection times out or is refused, you have a network or firewall issue to resolve.
Check if you're using the correct IP address:
# On the manager node, verify the advertised address
docker info | grep -A 2 "Node Address"
The "Node Address" shown should be reachable from your joining node.
Docker Swarm requires specific ports to be open on all participating nodes:
- TCP 2377: Cluster management and Raft sync
- TCP/UDP 7946: Container network discovery
- UDP 4789: Overlay network traffic (VXLAN)
For firewalld (RHEL/CentOS/Fedora):
sudo firewall-cmd --permanent --add-port=2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload
For UFW (Ubuntu/Debian):
sudo ufw allow 2377/tcp
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp
sudo ufw reload
For iptables:
sudo iptables -A INPUT -p tcp --dport 2377 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 7946 -j ACCEPT
sudo iptables -A INPUT -p udp --dport 7946 -j ACCEPT
sudo iptables -A INPUT -p udp --dport 4789 -j ACCEPT
sudo iptables-save | sudo tee /etc/iptables/rules.v4
Important: Open these ports on ALL nodes (managers and workers), not just the one you're joining.
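Once the rules are in place, it can help to confirm from the joining node that the TCP ports are actually reachable. The following is a minimal sketch, assuming a netcat (`nc`) that supports `-z` and `-w`; the function name `probe_swarm_tcp` is made up for this example. UDP ports 7946 and 4789 are left out because a connectionless probe cannot reliably tell an open port from a filtered one.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): probe the Swarm TCP ports on a
# remote node. Assumes a netcat that supports -z (scan only) and -w (timeout).
probe_swarm_tcp() {
    host="$1"
    status=0
    for port in 2377 7946; do
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "tcp/$port on $host: reachable"
        else
            echo "tcp/$port on $host: NOT reachable"
            status=1
        fi
    done
    # Non-zero exit status if any port could not be reached
    return "$status"
}

# Example: probe_swarm_tcp <manager-ip>
```

The function returns a non-zero status when any port is unreachable, so it can gate a later `docker swarm join` in a provisioning script.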
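Swarm nodes authenticate each other with TLS certificates that carry validity windows. If there's significant clock skew between nodes, a certificate can appear expired or not yet valid, and joins can fail.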
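One way to put a number on the skew is to compare epoch seconds from both nodes. This is a small sketch, not a replacement for NTP monitoring; `skew_seconds` is a hypothetical helper name, and the commented example assumes SSH access to the manager as `user@manager-node`.

```shell
#!/bin/sh
# Hypothetical helper: absolute difference between two Unix timestamps.
skew_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example (assumes SSH access to the manager as user@manager-node):
#   local_ts=$(date +%s)
#   remote_ts=$(ssh user@manager-node date +%s)
#   echo "skew: $(skew_seconds "$local_ts" "$remote_ts")s"
```

The SSH round trip adds some delay to the remote reading, so treat a difference of a second or two as measurement noise.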
Check current time on both nodes:
# On manager
date
# On joining node
date
Install and configure NTP/chrony:
For Ubuntu/Debian:
sudo apt-get install -y chrony
sudo systemctl enable chrony
sudo systemctl start chrony
For RHEL/CentOS:
sudo yum install -y chrony
sudo systemctl enable chronyd
sudo systemctl start chronyd
Force time sync:
# Using chrony
sudo chronyc makestep
# Using ntpdate (if available)
sudo ntpdate pool.ntp.org
# Manual sync from manager (quick fix)
sudo date --set="$(ssh user@manager-node date)"
Verify synchronization:
chronyc tracking
# or
timedatectl status
Join tokens do not expire on their own, but rotating them invalidates the old ones. Get a fresh token from the manager:
For worker nodes:
# Run on a manager node
docker swarm join-token worker
For manager nodes:
# Run on a manager node
docker swarm join-token manager
If you suspect the token was compromised or want to invalidate old tokens:
# Rotate the worker token
docker swarm join-token --rotate worker
# Rotate the manager token
docker swarm join-token --rotate manager
Copy the entire docker swarm join command exactly as shown, including the token and address.
Common mistakes:
- Copying only part of the token
- Using an old token that was rotated
- Mixing up worker and manager tokens
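Some copy-and-paste mistakes can be caught before the join is attempted. The check below is a rough sketch under an assumption: it expects tokens to look like `SWMTKN-1-<field>-<field>` and to contain only letters, digits, and hyphens, which matches tokens printed by `docker swarm join-token` but is not a documented contract. The function name `looks_like_join_token` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sanity check, not an official validator. Assumes tokens have
# the shape SWMTKN-1-<field>-<field> (an observation, not a guarantee).
looks_like_join_token() {
    case "$1" in
        *[!A-Za-z0-9-]*) return 1 ;;   # whitespace, quotes, line breaks, etc.
        SWMTKN-1-?*-?*)  return 0 ;;   # prefix plus two non-empty fields
        *)               return 1 ;;
    esac
}

# Example:
#   looks_like_join_token "$TOKEN" || echo "token looks truncated or malformed"
```

This catches a token that lost its final field or picked up stray whitespace. It cannot detect a rotated token or a worker token used where a manager token was intended; only the manager can reject those.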
If the node was previously part of a swarm, leftover state can cause join failures:
# Force leave any existing swarm
docker swarm leave --force
# Verify the node is no longer in swarm mode
docker info | grep Swarm
# Should show: Swarm: inactive
If issues persist, clean Docker's swarm state directory:
# Stop Docker
sudo systemctl stop docker
# Remove swarm state (this is safe on a node that isn't a manager)
sudo rm -rf /var/lib/docker/swarm
# Start Docker
sudo systemctl start docker
# Try joining again
docker swarm join --token <token> <manager-ip>:2377
Warning: Only remove /var/lib/docker/swarm on worker nodes or managers you're removing from the cluster. Never do this on an active manager that's part of the quorum.
Docker Swarm requires unique hostnames for each node. Duplicate hostnames cause cluster issues.
Check hostnames on all nodes:
hostname
cat /etc/hostname
If nodes share the same hostname (like "localhost" or a default cloud hostname):
# Set a unique hostname
sudo hostnamectl set-hostname swarm-node-01
# Update /etc/hosts
sudo bash -c 'echo "127.0.0.1 swarm-node-01" >> /etc/hosts'
# Restart Docker to pick up the change
sudo systemctl restart docker
Verify the change took effect:
docker info | grep Name
Each node in the swarm should have a unique name.
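For nodes that are already in the swarm, duplicates can be spotted from a manager instead of logging in to each machine. A minimal sketch follows; `find_duplicate_names` is a hypothetical helper, and the commented example relies on the `{{.Hostname}}` placeholder of `docker node ls --format`.

```shell
#!/bin/sh
# Hypothetical helper: print every name that appears more than once on stdin.
find_duplicate_names() {
    sort | uniq -d
}

# Example, run on a manager node:
#   docker node ls --format '{{.Hostname}}' | find_duplicate_names
```

Empty output means no duplicates among current members. A node that has not joined yet is not listed, so its hostname still needs to be compared by hand.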
HTTP/HTTPS proxies can interfere with Docker Swarm's internal communication, causing join failures.
Check for proxy settings:
echo $HTTP_PROXY $HTTPS_PROXY $http_proxy $https_proxy
env | grep -i proxy
Check Docker daemon proxy configuration:
cat /etc/systemd/system/docker.service.d/http-proxy.conf
Add Swarm manager to NO_PROXY:
Create or edit /etc/systemd/system/docker.service.d/http-proxy.conf:
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:8080"
Environment="HTTPS_PROXY=http://proxy.example.com:8080"
Environment="NO_PROXY=localhost,127.0.0.1,<manager-ip>,<other-node-ips>"
Reload and restart Docker:
sudo systemctl daemon-reload
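sudo systemctl restart docker
Alternatively, clear any proxy variables in your current shell before the join. This affects only the shell session; proxy settings configured for the Docker daemon stay in effect until you change the drop-in file: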
unset HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
docker swarm join --token <token> <manager-ip>:2377
Sometimes the Docker daemon needs a restart to clear transient issues:
# Restart Docker
sudo systemctl restart docker
# Wait a moment for it to fully start
sleep 5
# Verify Docker is running
docker info
# Retry the join
docker swarm join --token <token> <manager-ip>:2377
If join still fails, check Docker logs for more details:
# View recent Docker daemon logs
sudo journalctl -u docker --since "10 minutes ago"
# Follow logs while attempting join (in another terminal)
sudo journalctl -u docker -f
Look for specific error messages like certificate errors, connection timeouts, or authentication failures.
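As a rough triage aid, the messages mentioned earlier can be mapped to the checks in this guide. The sketch below is a rule of thumb, not an authoritative or exhaustive mapping; `suggest_cause` is a hypothetical function, and real log lines may be worded differently.

```shell
#!/bin/sh
# Hypothetical triage helper: map a log line to a likely cause. The mapping is
# a rule of thumb, not an exhaustive or authoritative list.
suggest_cause() {
    case "$1" in
        *"connection refused"*)
            echo "nothing is listening at that address and port; check the manager IP and port 2377" ;;
        *"context deadline exceeded"*)
            echo "the request timed out; check firewalls and routing between the nodes" ;;
        *x509*|*certificate*)
            echo "TLS validation failed; check clock synchronization and certificate dates" ;;
        *)
            echo "no known pattern; read the surrounding log lines" ;;
    esac
}

# Example:
#   suggest_cause "dial tcp <manager-ip>:2377: connect: connection refused"
```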
If the joining node has multiple network interfaces, Docker might advertise the wrong IP:
# Join with explicit advertise address
docker swarm join \
  --advertise-addr <this-node-ip> \
  --token <token> \
  <manager-ip>:2377
On the manager side, if the swarm was initialized with the wrong address:
# Check the address this node advertises (docker info reports it as "Node Address")
docker info | grep "Node Address"
# If incorrect, you may need to reinitialize (WARNING: this affects the whole cluster)
# Only do this if absolutely necessary and you understand the implications
docker swarm init --force-new-cluster --advertise-addr <correct-ip>
Common scenarios requiring explicit advertise-addr:
- Nodes with both public and private IPs (cloud environments)
- Nodes with VPN interfaces
- Docker running inside a VM with bridged networking
- Multiple Docker networks or custom bridges
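When it is unclear which address to pass to `--advertise-addr`, listing the node's IPv4 addresses per interface is a reasonable first step. The following sketch assumes a Linux host with iproute2 and its usual one-line `ip -o -4 addr show` output; `list_ipv4_candidates` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reduce `ip -o -4 addr show` output (read from stdin) to
# "interface address" pairs, dropping the loopback interface and prefix length.
list_ipv4_candidates() {
    awk '$3 == "inet" && $2 != "lo" { sub(/\/.*/, "", $4); print $2, $4 }'
}

# Example (Linux with iproute2):
#   ip -o -4 addr show | list_ipv4_candidates
```

Pick the address on the network the other Swarm nodes can actually reach, which in cloud setups is typically the private one.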
Understanding Swarm Manager Quorum:
Docker Swarm uses the Raft consensus algorithm for manager nodes. This requires a majority (quorum) of managers to be available for cluster operations:
| Manager Count | Quorum Needed | Fault Tolerance |
|---------------|---------------|-----------------|
| 1 | 1 | 0 |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
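The table follows from two integer formulas: quorum is a strict majority, floor(n / 2) + 1, and fault tolerance is floor((n - 1) / 2). A short sketch that reproduces the rows, with function names chosen only for this example:

```shell
#!/bin/sh
# Quorum is a strict majority of managers: floor(n / 2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Fault tolerance is how many managers can fail while a majority remains:
# floor((n - 1) / 2).
fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 1 3 5 7; do
    echo "managers=$n quorum=$(quorum_needed "$n") tolerance=$(fault_tolerance "$n")"
done
```

For an even count such as 4, the same formulas give a quorum of 3 and a tolerance of 1, which is no better than 3 managers. That is why odd manager counts are the usual choice.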
Recovering from lost quorum:
If you lose quorum (a majority of managers is no longer available), the swarm can no longer process management operations, so you cannot join new nodes; tasks that are already running keep running. To recover:
# On the last surviving manager, force create a new single-manager cluster
docker swarm init --force-new-cluster
# Then rejoin other managers
docker swarm join-token manager
# Use the new token on other manager nodes
TLS Certificate Issues:
Swarm uses mutual TLS for all node communication. Certificate errors during join can occur if:
- The manager's certificate expired
- Clock skew makes certificates appear invalid
- Manual certificate manipulation corrupted the chain
To diagnose:
# Check certificate info on manager
openssl s_client -connect <manager-ip>:2377 < /dev/null 2>/dev/null | openssl x509 -noout -dates
Node ID Considerations:
Each node has a unique ID generated when it first joins a swarm. Important points:
- Never copy /var/lib/docker/swarm between nodes
- A node can only use its ID once to join a swarm
- If a node was removed, it needs docker swarm leave --force before rejoining
Cloud Provider Considerations:
In cloud environments (AWS, GCP, Azure):
- Use private IPs for advertise-addr when nodes are in the same VPC
- Security groups must allow Swarm ports between all nodes
- Auto-scaling groups may cause issues if instances are terminated and replaced
- Consider using DNS names that resolve to current manager IPs for the join command
Debugging with increased verbosity:
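# Run the daemon in the foreground with debug logging (stop the service first so two daemons don't conflict)
sudo systemctl stop docker
sudo dockerd --debug
# Or add to /etc/docker/daemon.json, then restart Docker
{
  "debug": true
}
After enabling debug mode through daemon.json, check journalctl -u docker for detailed logs during the join attempt. When running dockerd in the foreground, the debug output appears in that terminal instead.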