This error occurs when Docker Swarm cannot schedule tasks on a node because the node's availability is set to 'drain' or 'pause', or because the node has gone offline. The fix involves checking node status, updating availability settings, or rejoining the node to the swarm.
The "node is not available" error in Docker Swarm indicates that the swarm manager cannot assign tasks to a particular node. This happens when a node's availability state prevents it from receiving new workloads. In Docker Swarm, every node has an availability setting that determines whether it can accept tasks: - **Active**: The node can receive and run tasks (default state when joining) - **Pause**: The node cannot receive new tasks, but existing tasks continue running - **Drain**: The scheduler stops all existing tasks and moves them to other active nodes; no new tasks are assigned This error commonly appears when you try to deploy a service with placement constraints targeting a node that is drained or paused, when a node has gone offline due to network issues, or when a manager node has been drained to dedicate it to management tasks only.
First, inspect the current state of all nodes in your swarm:
```bash
docker node ls
```

This shows the availability of each node. Look for nodes with "Drain" or "Pause" in the AVAILABILITY column:
```
ID                HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
abc123def456...   manager1   Ready    Active         Leader
xyz789ghi012...   worker1    Ready    Drain
```

To get detailed information about a specific node:
```bash
docker node inspect <node_name> --pretty
```

Check the "Availability" field in the output.
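If you only need the availability value (for example in a script), the node's spec can be read directly; a minimal sketch using the inspect format template:

```bash
# Prints just the availability setting: active, pause, or drain
docker node inspect <node_name> --format '{{ .Spec.Availability }}'
```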
If the node shows "Drain" or "Pause" status and you want it to accept tasks, update its availability:
```bash
docker node update --availability active <node_name>
```

For example:
```bash
docker node update --availability active worker1
```

Verify the change:
```bash
docker node ls
```

The node should now show "Active" in the AVAILABILITY column. Existing services will automatically reschedule tasks to this node if needed.
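To confirm that tasks are actually being scheduled onto the reactivated node, you can list a service's tasks and the nodes they run on:

```bash
# Shows each task, its current node, and its state
docker service ps <service_name>
```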
If the node shows "Down" status instead of "Ready", it has lost connection to the swarm:
```bash
docker node ls
```

Look for:
```
ID                HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
xyz789ghi012...   worker1    Down     Active
```

On the disconnected node, check if Docker is running:
```bash
sudo systemctl status docker
```

If Docker is stopped, start it:
```bash
sudo systemctl start docker
```

Check swarm membership on the node:
```bash
docker info | grep -A 5 "Swarm"
```

If the node thinks it's not in a swarm, you may need to rejoin it.
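For a script-friendly check, the swarm state can also be read with a format template; a small sketch:

```bash
# Prints "active" when the node is part of a swarm, "inactive" otherwise
docker info --format '{{ .Swarm.LocalNodeState }}'
```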
If a node has been disconnected or removed, rejoin it to the swarm.
On a manager node, get the join token:
```bash
# For worker nodes
docker swarm join-token worker

# For manager nodes
docker swarm join-token manager
```

This outputs a command like:
```bash
docker swarm join --token SWMTKN-1-abc123... 192.168.1.100:2377
```

On the node to rejoin, first leave any existing swarm:
```bash
docker swarm leave --force
```

Then run the join command provided by the manager:
```bash
docker swarm join --token SWMTKN-1-abc123... 192.168.1.100:2377
```

On the manager, remove the old node entry if it persists:
```bash
docker node rm <old_node_id>
```

Docker Swarm requires specific ports to be open between nodes:
- TCP 2377: Cluster management communications
- TCP/UDP 7946: Communication among nodes (gossip protocol)
- UDP 4789: Overlay network traffic (VXLAN)
Test connectivity from a worker to the manager:
```bash
nc -zv <manager_ip> 2377
nc -zv <manager_ip> 7946
nc -zuv <manager_ip> 4789
```

If using UFW (Ubuntu firewall):
```bash
sudo ufw allow 2377/tcp
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp
sudo ufw reload
```

If using firewalld (RHEL/CentOS):
```bash
sudo firewall-cmd --permanent --add-port=2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload
```

Check for HTTP proxy interference: If you have an HTTP proxy configured, it may intercept swarm traffic. Ensure Docker traffic bypasses the proxy.
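On systemd-based hosts, one common way to make the Docker daemon bypass the proxy for swarm traffic is a drop-in unit file. The sketch below uses placeholder values (proxy.example.com and the manager IP from the earlier example); adjust them for your environment:

```bash
# Create a drop-in that excludes swarm node addresses from the proxy
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,192.168.1.100"
EOF

# Reload systemd and restart Docker so the change takes effect
sudo systemctl daemon-reload
sudo systemctl restart docker
```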
If you're using placement constraints in your service definition, ensure they target available nodes.
Check existing service constraints:
```bash
docker service inspect <service_name> --pretty | grep -A 10 "Placement"
```

Common constraint issues:
1. Targeting a drained node by name:

```yaml
# This fails if worker1 is drained
deploy:
  placement:
    constraints:
      - node.hostname == worker1
```

2. Using labels that don't exist on active nodes:
```yaml
deploy:
  placement:
    constraints:
      - node.labels.region == us-west
```

Add labels to nodes:
```bash
docker node update --label-add region=us-west worker1
```

Verify node labels:
```bash
docker node inspect worker1 --format '{{ .Spec.Labels }}'
```

Update the service to remove or modify constraints:
```bash
docker service update --constraint-rm 'node.hostname == worker1' <service_name>
```

If a node was removed but tasks still reference it, clean up the stale entries.
List all nodes including down ones:
```bash
docker node ls
```

Remove nodes that are no longer part of the cluster:
```bash
docker node rm <node_id>
```

If the node is still showing tasks:
```bash
docker node rm --force <node_id>
```

Check for orphaned tasks:
```bash
docker service ps <service_name> --filter "desired-state=running"
```

If tasks are stuck referencing a removed node, force a service update:
```bash
docker service update --force <service_name>
```

This reschedules all tasks, placing them on available nodes.
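If several services are affected (for example after removing a dead node), they can all be force-updated in one pass. This is a sketch assuming a Linux manager with GNU xargs; note that forcing an update briefly restarts each service's tasks:

```bash
# Force-reschedule every service in the swarm
docker service ls -q | xargs -r -n1 docker service update --force
```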
Understanding node availability states:
- Active: Default state. The node participates fully in the swarm and can receive task assignments.
- Pause: The node remains in the swarm and existing tasks keep running, but no new tasks are assigned. Useful for troubleshooting without disrupting current workloads.
- Drain: The scheduler evacuates all tasks from the node and prevents new assignments. Use this for maintenance or to dedicate manager nodes to management only.
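For example, to take a node out of scheduling while troubleshooting it and then return it to normal duty (shown here with the example node worker1 used earlier):

```bash
# Stop new task assignments but keep existing tasks running
docker node update --availability pause worker1

# Resume normal scheduling when finished
docker node update --availability active worker1
```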
Best practice for manager nodes: In production swarms, drain your manager nodes so they focus solely on cluster management:
```bash
docker node update --availability drain <manager_node>
```

Handling node failures gracefully: Configure appropriate restart policies and replica counts so the swarm can recover when nodes fail:
```yaml
deploy:
  replicas: 3
  restart_policy:
    condition: on-failure
    delay: 5s
    max_attempts: 3
```

Monitoring node health: Set up monitoring to alert when nodes go down:
```bash
# Simple check script: prints any node that is not in the Ready state
docker node ls --format "{{.Hostname}}: {{.Status}}" | grep -v "Ready"
```

Soft constraints (workaround): Docker Swarm doesn't support "preferred" placement (like Kubernetes nodeAffinity). If a node is unavailable, services with hard constraints will fail. Consider using labels on multiple nodes to provide fallback options:
```bash
# Label multiple nodes with the same role
# (docker node update accepts one node at a time)
docker node update --label-add role=database worker1
docker node update --label-add role=database worker2
docker node update --label-add role=database worker3
```

Recovering from split-brain scenarios: If network partitions cause managers to disagree on cluster state, you may need to reinitialize the swarm. As a last resort:
```bash
docker swarm init --force-new-cluster
```

This recovers from a loss of quorum using the current manager's state. Use with caution.
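Before forcing a new cluster, it is worth backing up the swarm state directory so the previous Raft data can be restored if needed. A sketch assuming the default data root of /var/lib/docker; stop Docker first so the files are consistent:

```bash
# Back up the swarm state on the surviving manager
sudo systemctl stop docker
sudo tar -czf /tmp/swarm-backup-$(date +%F).tar.gz /var/lib/docker/swarm
sudo systemctl start docker
```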
Task history and stale references: Docker keeps a history of tasks (default: 5). If you see "node not found" errors in logs, these may relate to old task entries. Reduce history to minimize noise:
```bash
docker swarm update --task-history-limit 3
```
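To confirm the new limit took effect, the value appears in the Swarm section of `docker info` (the field name may vary slightly between Docker versions):

```bash
# Look for "Task History Retention Limit" in the output
docker info | grep -i "task history"
```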