How to fix Error response from daemon: error while joining swarm: failed to join the cluster in Docker

Docker | INTERMEDIATE | MEDIUM

This error occurs when a Docker node cannot successfully join an existing Swarm cluster. Common causes include a firewall blocking required ports, network connectivity issues, clock skew between nodes, or an invalid or rotated join token.

What this error means

The "error while joining swarm: failed to join the cluster" message indicates that your Docker node attempted to join a Swarm cluster but the join operation failed. Docker Swarm uses a distributed consensus algorithm (Raft) for manager nodes, and all nodes must be able to communicate with the manager(s) over specific ports.

This error can appear in several scenarios:

- **Firewall blocking ports**: Swarm requires TCP port 2377 for cluster management, TCP/UDP port 7946 for node communication, and UDP port 4789 for overlay network traffic
- **Network connectivity issues**: The joining node cannot reach the manager node's advertised address
- **Time synchronization problems**: Swarm secures node communication with mutual TLS, and clock skew between nodes can make certificates appear expired or not yet valid
- **Invalid or rotated join token**: The token used may have been rotated or the command was copied incorrectly
- **Hostname conflicts**: Nodes with identical hostnames can cause cluster issues
- **Proxy interference**: HTTP/HTTPS proxies can interfere with Swarm's internal communication

The error may also be accompanied by more specific messages such as "context deadline exceeded", "connection refused", or TLS certificate errors, which provide additional diagnostic clues.

How to fix "Error response from daemon: error while joining swarm: failed to join the cluster"

1. Verify network connectivity to the manager node

First, ensure the joining node can reach the Swarm manager on the required port:

```bash
# Test connectivity to the manager node on port 2377
telnet <manager-ip> 2377

# Or using netcat
nc -zv <manager-ip> 2377

# Or using curl (it connects and then waits; press Ctrl+C once "Connected" appears)
curl -v telnet://<manager-ip>:2377
```

If the connection times out or is refused, you have a network or firewall issue to resolve.
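Where neither telnet nor nc is installed, bash's /dev/tcp redirection can perform the same TCP check. The following is a minimal sketch, assuming bash and the coreutils timeout command are available; check_tcp_port is a hypothetical helper name, not a Docker command.

```bash
# check_tcp_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: returns 0 if a TCP connection can be opened,
# non-zero if it is refused or does not complete within the timeout.
check_tcp_port() {
  local host="$1" port="$2" wait_secs="${3:-3}"
  # A child bash opens the socket via /dev/tcp and exits immediately,
  # which closes the connection again.
  timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

A call such as check_tcp_port <manager-ip> 2377 && echo reachable covers only TCP. The UDP ports Swarm uses cannot be verified this way.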

Check if you're using the correct IP address:

```bash
# On the manager node, verify the advertised address
docker info | grep "Node Address"
```

The "Node Address" shown should be reachable from your joining node.

2. Open required firewall ports on all nodes

Docker Swarm requires specific ports to be open on all participating nodes:

- TCP 2377: Cluster management and Raft sync
- TCP/UDP 7946: Communication among nodes (including overlay network discovery)
- UDP 4789: Overlay network traffic (VXLAN)

For firewalld (RHEL/CentOS/Fedora):

```bash
sudo firewall-cmd --permanent --add-port=2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload
```

For UFW (Ubuntu/Debian):

```bash
sudo ufw allow 2377/tcp
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp
sudo ufw reload
```

For iptables:

```bash
sudo iptables -A INPUT -p tcp --dport 2377 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 7946 -j ACCEPT
sudo iptables -A INPUT -p udp --dport 7946 -j ACCEPT
sudo iptables -A INPUT -p udp --dport 4789 -j ACCEPT
# Persist the rules (the path assumes the iptables-persistent layout)
sudo iptables-save | sudo tee /etc/iptables/rules.v4
```

Important: Open these ports on ALL nodes (managers and workers), not just the one you're joining.
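The rules above accept traffic from any source. On hosts with a public interface, it may be preferable to limit the Swarm ports to the subnet the nodes share. The sketch below only prints candidate UFW commands for review rather than running them; print_ufw_rules and the SWARM_PORTS array are hypothetical names, and the subnet is whatever your nodes actually use.

```bash
# Swarm ports from the list above, as port/protocol pairs.
SWARM_PORTS=("2377/tcp" "7946/tcp" "7946/udp" "4789/udp")

# print_ufw_rules SUBNET
# Hypothetical helper: prints (does not execute) one ufw command per
# Swarm port, restricted to traffic originating from SUBNET.
print_ufw_rules() {
  local subnet="$1" entry
  for entry in "${SWARM_PORTS[@]}"; do
    # ${entry%/*} is the port number, ${entry#*/} is the protocol
    echo "sudo ufw allow from ${subnet} to any port ${entry%/*} proto ${entry#*/}"
  done
}
```

For example, print_ufw_rules 10.0.0.0/24 emits four commands that can be inspected and then pasted into a shell.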

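3. Synchronize time across all nodes

Swarm nodes authenticate each other with mutual TLS, and certificate validity periods are checked against each node's local clock. Significant clock skew between nodes can make a certificate appear expired or not yet valid, which causes the join to fail.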

Check current time on both nodes:

```bash
# On manager
date

# On joining node
date
```
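Comparing two date outputs by eye is error-prone, so the comparison can be done on Unix timestamps instead. This is a small sketch; clock_skew and skew_ok are hypothetical helper names, and the default threshold of 5 seconds is an arbitrary illustration, not a documented Docker limit.

```bash
# clock_skew LOCAL_EPOCH REMOTE_EPOCH
# Prints the absolute difference in seconds between two Unix timestamps.
clock_skew() {
  local diff=$(( $1 - $2 ))
  echo "${diff#-}"   # strip a leading minus sign, if any
}

# skew_ok LOCAL_EPOCH REMOTE_EPOCH [MAX_SECONDS]
# Returns 0 when the skew is at most MAX_SECONDS (default 5, chosen
# arbitrarily for this example).
skew_ok() {
  [ "$(clock_skew "$1" "$2")" -le "${3:-5}" ]
}
```

On the joining node this could be used as skew_ok "$(date +%s)" "$(ssh user@manager-node date +%s)" || echo "clock skew too large", keeping in mind that the SSH round trip itself adds a small delay.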

Install and configure NTP/chrony:

For Ubuntu/Debian:

```bash
sudo apt-get install -y chrony
sudo systemctl enable chrony
sudo systemctl start chrony
```

For RHEL/CentOS:

```bash
sudo yum install -y chrony
sudo systemctl enable chronyd
sudo systemctl start chronyd
```

Force time sync:

```bash
# Using chrony
sudo chronyc makestep

# Using ntpdate (if available)
sudo ntpdate pool.ntp.org

# Manual sync from manager (quick fix)
sudo date --set="$(ssh user@manager-node date)"
```

Verify synchronization:

```bash
chronyc tracking
# or
timedatectl status
```

4. Generate a fresh join token

Join tokens do not expire on their own, but they stop working once they are rotated. Get the current token from a manager:

For worker nodes:

```bash
# Run on a manager node
docker swarm join-token worker
```

For manager nodes:

```bash
# Run on a manager node
docker swarm join-token manager
```

If you suspect the token was compromised or want to invalidate old tokens:

```bash
# Rotate the worker token
docker swarm join-token --rotate worker

# Rotate the manager token
docker swarm join-token --rotate manager
```

Copy the entire docker swarm join command exactly as shown, including the token and address.

Common mistakes:
- Copying only part of the token
- Using an old token that was rotated
- Mixing up worker and manager tokens
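A truncated or padded token can be caught before running the join with a quick shape check. The sketch below assumes tokens look like SWMTKN-1-<characters>-<characters>, which matches tokens commonly seen in practice but is an assumption here rather than a documented format guarantee; looks_like_join_token is a hypothetical helper name.

```bash
# looks_like_join_token TOKEN
# Hypothetical helper: returns 0 if TOKEN has the general shape
# SWMTKN-1-<lowercase alphanumerics>-<lowercase alphanumerics>.
# The pattern is an assumption based on observed tokens, not a spec.
looks_like_join_token() {
  [[ "$1" =~ ^SWMTKN-1-[0-9a-z]+-[0-9a-z]+$ ]]
}
```

This catches partial copies and stray whitespace only. It cannot tell whether a token has been rotated or whether it is a worker or manager token.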

5. Clean up previous swarm state and retry

If the node was previously part of a swarm, leftover state can cause join failures:

```bash
# Force leave any existing swarm
docker swarm leave --force

# Verify the node is no longer in swarm mode
docker info | grep Swarm
# Should show: Swarm: inactive
```
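The local swarm state can also be read in machine-friendly form with docker info --format '{{.Swarm.LocalNodeState}}'. The sketch below maps each state to a suggested next step; describe_swarm_state is a hypothetical helper, and the advice strings are this example's own wording rather than Docker output.

```bash
# describe_swarm_state STATE
# Hypothetical helper: maps a LocalNodeState value to a suggested action.
describe_swarm_state() {
  case "$1" in
    inactive) echo "not in a swarm: safe to run docker swarm join" ;;
    active)   echo "already in a swarm: leave first if joining a different one" ;;
    pending)  echo "join in progress or stuck: check daemon logs" ;;
    locked)   echo "manager is locked: run docker swarm unlock" ;;
    error)    echo "swarm state error: check daemon logs" ;;
    *)        echo "unknown state: $1" ;;
  esac
}
```

A typical call would be describe_swarm_state "$(docker info --format '{{.Swarm.LocalNodeState}}')".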

If issues persist, clean Docker's swarm state directory:

```bash
# Stop Docker
sudo systemctl stop docker

# Remove swarm state (this is safe on a node that isn't a manager)
sudo rm -rf /var/lib/docker/swarm

# Start Docker
sudo systemctl start docker

# Try joining again
docker swarm join --token <token> <manager-ip>:2377
```

Warning: Only remove /var/lib/docker/swarm on worker nodes or managers you're removing from the cluster. Never do this on an active manager that's part of the quorum.

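6. Check and fix hostname conflicts

Swarm identifies nodes by ID and does not strictly enforce unique hostnames, but duplicate hostnames make nodes hard to tell apart and can cause cluster issues. Give each node a unique hostname.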

Check hostnames on all nodes:

```bash
hostname
cat /etc/hostname
```

If nodes share the same hostname (like "localhost" or a default cloud hostname):

```bash
# Set a unique hostname
sudo hostnamectl set-hostname swarm-node-01

# Update /etc/hosts
sudo bash -c 'echo "127.0.0.1 swarm-node-01" >> /etc/hosts'

# Restart Docker to pick up the change
sudo systemctl restart docker
```

Verify the change took effect:

```bash
# Print only the node name (a plain grep for "Name" can match other lines too)
docker info --format '{{.Name}}'
```

Each node in the swarm should have a unique name.
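With more than a handful of nodes, duplicates are easier to find mechanically. The following sketch reads one hostname per line on standard input and prints any name that occurs more than once; find_duplicate_hostnames is a hypothetical helper name.

```bash
# find_duplicate_hostnames
# Hypothetical helper: reads hostnames (one per line) from stdin and
# prints each name that appears more than once.
find_duplicate_hostnames() {
  sort | uniq -d
}
```

On a manager, the input can come from docker node ls --format '{{.Hostname}}' | find_duplicate_hostnames. Empty output means no duplicates were found.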

7. Disable proxy settings for Swarm communication

HTTP/HTTPS proxies can interfere with Docker Swarm's internal communication, causing join failures.

Check for proxy settings:

```bash
echo $HTTP_PROXY $HTTPS_PROXY $http_proxy $https_proxy
env | grep -i proxy
```

Check Docker daemon proxy configuration:

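```bash
# Show the drop-in file, if one exists
cat /etc/systemd/system/docker.service.d/http-proxy.conf

# Or show the environment systemd passes to the Docker service
systemctl show --property=Environment docker
```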

Add Swarm manager to NO_PROXY:

Create or edit /etc/systemd/system/docker.service.d/http-proxy.conf:

```ini
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:8080"
Environment="HTTPS_PROXY=http://proxy.example.com:8080"
Environment="NO_PROXY=localhost,127.0.0.1,<manager-ip>,<other-node-ips>"
```

Reload and restart Docker:

```bash
sudo systemctl daemon-reload
sudo systemctl restart docker
```
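When the cluster has several nodes, the NO_PROXY value is easy to mistype by hand. As a small sketch, the helper below assembles the comma-separated list from node addresses passed as arguments; build_no_proxy is a hypothetical name and the addresses in the example are placeholders.

```bash
# build_no_proxy ADDRESS...
# Hypothetical helper: prints a NO_PROXY value that always starts with
# localhost and 127.0.0.1, followed by every address given.
build_no_proxy() {
  local list="localhost,127.0.0.1" addr
  for addr in "$@"; do
    list+=",${addr}"
  done
  echo "$list"
}
```

For instance, build_no_proxy 10.0.0.11 10.0.0.12 produces a string that can be pasted into the Environment="NO_PROXY=..." line of the drop-in file.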

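Alternatively, temporarily clear the proxy variables in the current shell before joining. This affects only the docker CLI's environment; proxy settings configured for the daemon through systemd stay in effect until the drop-in file is changed and Docker is restarted:

```bash
unset HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
docker swarm join --token <token> <manager-ip>:2377
```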

8. Restart Docker service and retry join

Sometimes the Docker daemon needs a restart to clear transient issues:

```bash
# Restart Docker
sudo systemctl restart docker

# Wait a moment for it to fully start
sleep 5

# Verify Docker is running
docker info

# Retry the join
docker swarm join --token <token> <manager-ip>:2377
```

If join still fails, check Docker logs for more details:

```bash
# View recent Docker daemon logs
sudo journalctl -u docker --since "10 minutes ago"

# Follow logs while attempting join (in another terminal)
sudo journalctl -u docker -f
```

Look for specific error messages like certificate errors, connection timeouts, or authentication failures.
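The messages mentioned in this guide can be mapped to the step most likely to help. The sketch below does this with simple substring matching on a single log line; classify_join_error is a hypothetical helper, the matched substrings are assumptions about typical wording, and the suggestions are this example's own text.

```bash
# classify_join_error MESSAGE
# Hypothetical helper: matches common substrings in a join error and
# prints which area to investigate first.
classify_join_error() {
  case "$1" in
    *"context deadline exceeded"*) echo "timeout: check firewall rules and routing to the manager" ;;
    *"connection refused"*)        echo "refused: check the manager address and that port 2377 is listening" ;;
    *x509*|*certificate*)          echo "tls: check clock synchronization and certificate validity" ;;
    *"join token"*)                echo "token: fetch the current token from a manager" ;;
    *)                             echo "unclassified: read the full daemon log" ;;
  esac
}
```

It is meant as a first triage step only; an unclassified result still calls for reading the surrounding log lines.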

9. Specify the correct advertise address

If the joining node has multiple network interfaces, Docker might advertise the wrong IP:

```bash
# Join with explicit advertise address
docker swarm join \
  --advertise-addr <this-node-ip> \
  --token <token> \
  <manager-ip>:2377
```

On the manager side, if the swarm was initialized with the wrong address:

```bash
# Check the address this node advertises and the known manager addresses
docker info | grep -A 2 "Node Address"

# If incorrect, you may need to reinitialize (WARNING: this affects the whole cluster)
# Only do this if absolutely necessary and you understand the implications
docker swarm init --force-new-cluster --advertise-addr <correct-ip>
```

Common scenarios requiring explicit advertise-addr:
- Nodes with both public and private IPs (cloud environments)
- Nodes with VPN interfaces
- Docker running inside a VM with bridged networking
- Multiple Docker networks or custom bridges
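To choose a value for --advertise-addr, it helps to see every IPv4 address the node holds. The sketch below parses the one-line-per-address output of ip -o -4 addr show from standard input; list_ipv4_candidates is a hypothetical helper name, and the field positions assume the usual iproute2 output layout.

```bash
# list_ipv4_candidates
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and
# prints "interface address" pairs, skipping the loopback interface.
list_ipv4_candidates() {
  # Field 2 is the interface name, field 4 is address/prefix-length.
  awk '$2 != "lo" { split($4, parts, "/"); print $2, parts[1] }'
}
```

Run it as ip -o -4 addr show | list_ipv4_candidates. Bridge interfaces such as docker0 also appear in the output and are normally not the address to advertise.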

Advanced notes

Understanding Swarm Manager Quorum:

Docker Swarm uses the Raft consensus algorithm for manager nodes. This requires a majority (quorum) of managers to be available for cluster operations:

| Manager Count | Quorum Needed | Fault Tolerance |
|---------------|---------------|-----------------|
| 1 | 1 | 0 |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
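The table follows from simple integer arithmetic: for N managers, quorum is floor(N/2) + 1 and fault tolerance is floor((N-1)/2). A short sketch, with quorum_size and fault_tolerance as hypothetical helper names:

```bash
# quorum_size MANAGERS
# Prints the number of managers that must be reachable: floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance MANAGERS
# Prints how many managers can fail while quorum holds: floor((N-1)/2).
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}
```

The same arithmetic shows why even manager counts add no resilience: four managers need a quorum of three and still tolerate only one failure, the same as three managers.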

Recovering from lost quorum:

If you lose quorum (half or more of the managers are down), the cluster becomes effectively read-only: running tasks continue, but management operations, including joining new nodes, fail. To recover:

```bash
# On the last surviving manager, force create a new single-manager cluster
docker swarm init --force-new-cluster

# Then rejoin other managers
docker swarm join-token manager
# Use the new token on other manager nodes
```

TLS Certificate Issues:

Swarm uses mutual TLS for all node communication. Certificate errors during join can occur if:

- The manager's certificate expired
- Clock skew makes certificates appear invalid
- Manual certificate manipulation corrupted the chain

To diagnose:

```bash
# Check certificate info on manager
openssl s_client -connect <manager-ip>:2377 < /dev/null 2>/dev/null | openssl x509 -noout -dates
```

Node ID Considerations:

Each node has a unique ID generated when it first joins a swarm. Important points:

- Never copy /var/lib/docker/swarm between nodes
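- A node ID is used for one membership only: a node that leaves and rejoins receives a new ID, and its old entry stays in docker node ls until removed with docker node rm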
- If a node was removed, it needs docker swarm leave --force before rejoining

Cloud Provider Considerations:

In cloud environments (AWS, GCP, Azure):

- Use private IPs for advertise-addr when nodes are in the same VPC
- Security groups must allow Swarm ports between all nodes
- Auto-scaling groups may cause issues if instances are terminated and replaced
- Consider using DNS names that resolve to current manager IPs for the join command

Debugging with increased verbosity:

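```bash
# Run the daemon in the foreground with debug output
# (stop the service first so two daemons do not conflict)
sudo systemctl stop docker
sudo dockerd --debug
```

Or enable it persistently in /etc/docker/daemon.json:

```json
{
  "debug": true
}
```

After enabling debug mode in daemon.json, restart Docker with sudo systemctl restart docker, then check journalctl -u docker for detailed logs during the join attempt.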
