The node.kubernetes.io/unreachable taint is applied when the Kubernetes control plane cannot communicate with a worker node and the node's status becomes "Unknown". The taint triggers pod eviction and prevents new pods from being scheduled on that node.
When a node becomes unreachable (network-isolated, crashed, or unresponsive), the Kubernetes control plane automatically applies the node.kubernetes.io/unreachable:NoExecute taint. This is a self-healing mechanism: it prevents new pods from being scheduled on a dead node and evicts existing pods so they can be rescheduled elsewhere. The node status shows "Unknown" when the control plane has not heard from the kubelet for longer than the node controller's grace period (typically 40-50 seconds). By default, pods are given a matching toleration with tolerationSeconds: 300, so they are evicted about five minutes after the taint is applied unless they declare their own toleration.
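You can see both sides of this mechanism directly; <node-name> and <pod-name> below are placeholders for your own resources:
kubectl get node <node-name> -o jsonpath='{.spec.taints}' # shows the unreachable taint while the node is down
kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}' # shows the automatically added 300-second tolerations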
Run:
kubectl get nodes
kubectl describe node <node-name>
Look for the taint "node.kubernetes.io/unreachable:NoExecute". Check the "Ready" condition; it should show "False" or "Unknown". Examine recent events for clues about what caused the unreachability.
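To list just the taints without scanning the full describe output, a jsonpath query like this works (the node name is a placeholder):
kubectl get node <node-name> -o jsonpath='{range .spec.taints[*]}{.key}={.effect}{"\n"}{end}'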
Try to SSH to the node and check if kubelet is running:
ssh <node-ip>
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 50 # Last 50 kubelet log lines
If kubelet is down, restart it:
sudo systemctl restart kubelet
Monitor status:
kubectl get nodes -w # Watch for status changes
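Kubelet failures are frequently caused by the container runtime being down, so it is worth checking that too. A quick sketch, assuming containerd is the runtime (substitute the unit name if you run Docker or CRI-O):
sudo systemctl status containerd
sudo systemctl enable --now kubelet # ensure kubelet is enabled so it starts on boot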
From the control plane, verify that you can reach the node:
ping <node-ip>
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics
Check the kubelet port (10250):
netstat -tlnp | grep 10250 # On the node
curl -k https://<node-ip>:10250/metrics # From control plane
If a firewall is blocking traffic, open the required ports: 10250 (kubelet API) and, if enabled, 10255 (the read-only kubelet port).
IPv4 forwarding is required for pod networking. Check:
cat /proc/sys/net/ipv4/ip_forward
If it returns 0, enable it:
sudo sysctl -w net.ipv4.ip_forward=1
Make the change permanent by adding this line to /etc/sysctl.conf:
net.ipv4.ip_forward = 1
Then apply it: sudo sysctl -p
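A drop-in file under /etc/sysctl.d/ achieves the same thing without editing /etc/sysctl.conf; the file name here is only an example:
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-kubernetes.conf
sudo sysctl --system # reload settings from all sysctl configuration files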
Check installed CNI plugin:
kubectl get ds -n kube-system # DaemonSets including CNI
kubectl logs -n kube-system <cni-pod> # Check logs for errors
Common CNI plugins: Flannel, Calico, Weave. Verify:
- Pods are running: kubectl get pods -n kube-system | grep <cni>
- Node CIDR assignment: kubectl get node <node-name> -o jsonpath="{.spec.podCIDR}"
If the CNI plugin is misconfigured, redeploy it from its official manifests.
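The exact manifest URL depends on the plugin and version, so check the project's documentation; as one example, Flannel publishes kube-flannel.yml with its GitHub releases:
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml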
Review firewall rules:
sudo iptables -L -n # On the node
sudo firewall-cmd --list-all # If using firewalld
Ensure these ports are open:
- 10250/TCP (kubelet)
- 10255/TCP (read-only kubelet port, if enabled)
- 8285/UDP and 8472/UDP (Flannel UDP and VXLAN backends, if using Flannel)
- 179/TCP (Calico BGP, if using Calico)
For testing, temporarily disable the firewall:
sudo systemctl stop firewalld
If this fixes the problem, start firewalld again and add the proper rules instead of leaving it disabled.
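A minimal sketch of the permanent rules, assuming firewalld and the Flannel VXLAN backend; adjust the ports to match your CNI plugin:
sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --permanent --add-port=8472/udp
sudo firewall-cmd --reload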
If temporary unreachability is expected, increase the toleration period in the pod spec:
spec:
  tolerations:
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 6000 # 100 minutes instead of the default 300s
  # ... rest of pod spec
For critical pods that must stay running, increase tolerationSeconds even further, or omit it entirely so the pod tolerates the taint indefinitely (not recommended for most workloads).
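Rather than editing manifests by hand, the same toleration can be patched into a Deployment's pod template; note that this strategic-merge patch replaces any tolerations already declared in the template, and <deployment-name> is a placeholder:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"tolerations":[{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":6000}]}}}}'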
If the node cannot be recovered, gracefully remove it:
kubectl drain <node-name> --ignore-daemonsets
kubectl delete node <node-name>
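If drain hangs because of pods that use emptyDir volumes or are not managed by a controller, recent kubectl versions accept these flags to force the eviction (data in emptyDir volumes is lost):
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force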
For cloud providers (AWS, GCP, Azure), the node may auto-rejoin if the instance is still running. Stop the instance if you want permanent removal:
# AWS EC2
aws ec2 stop-instances --instance-ids <instance-id>
Watch the node status after recovery attempts:
kubectl get nodes -w
kubectl describe node <node-name>
Once the Ready condition returns to True, the taint is removed automatically and pods can be scheduled on the node again. Check pod eviction status:
kubectl get pods --all-namespaces --field-selector=status.phase=Failed # evicted pods show STATUS "Evicted"
Evicted pods are not restarted in place; controllers such as Deployments recreate them automatically, while bare pods must be recreated manually.
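Once replacement pods are running, the leftover Failed/Evicted pod records can be cleaned up in bulk; review the list from the previous command before deleting:
kubectl delete pods --all-namespaces --field-selector=status.phase=Failed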
This taint is Kubernetes' self-healing mechanism—it's not an error to panic about, but rather the system protecting workloads from dead nodes. For stateful workloads, increase tolerationSeconds to allow time for node recovery. For cloud-managed Kubernetes (AWS EKS, GKE, AKS), check cloud-specific monitoring—the issue might be at the infrastructure layer (VM unavailable, network misconfiguration) rather than Kubernetes. In multi-zone clusters, node unreachability may indicate zone failure; monitor zone health. WSL2-based Kubernetes may experience temporary unreachability during VM suspension/resume—ensure proper VM configuration. For on-premises clusters, ensure stable network between control plane and all worker nodes; consider redundant network paths for critical clusters.