NodeNotReady indicates a Kubernetes node cannot run workloads due to kubelet issues, container runtime failures, or resource exhaustion. The control plane taints the node and evicts workloads to healthy nodes.
The NodeNotReady condition indicates that a Kubernetes node is unable to run workloads and the control plane has detected a fundamental problem preventing the node from being fully operational. This status means the kubelet on that node has either lost communication with the API server, failed health checks, or encountered critical issues that prevent it from managing pods safely. When a node's Ready condition remains Unknown or False for longer than the kube-controller-manager's NodeMonitorGracePeriod (default 50 seconds), the control plane automatically adds taints to prevent new pod scheduling and begins evicting existing workloads to healthy nodes.
The node's readiness is determined by heartbeat mechanisms: the kubelet sends Lease object updates every 10 seconds and reports node status changes to the API server. When heartbeats stop or conditions indicate problems (resource exhaustion, network unavailability, container runtime failure), Kubernetes marks the node as NotReady.
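Both mechanisms are visible from kubectl. A quick way to see the heartbeat and any taints the control plane has applied (replace <node-name> with the affected node):
kubectl get lease <node-name> -n kube-node-lease -o yaml # spec.renewTime shows the kubelet's last heartbeat
kubectl describe node <node-name> | grep -A2 Taints # look for node.kubernetes.io/not-ready or node.kubernetes.io/unreachable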
Verify the node is actually NotReady and gather detailed condition information:
kubectl get nodes
kubectl describe node <node-name>
Look at the 'Conditions' section and check for the following (a one-line scripted check is shown after the list):
- Ready: False or Unknown (the core issue)
- MemoryPressure, DiskPressure, PIDPressure: True values indicate resource problems
- NetworkUnavailable: True indicates network connectivity issues
- Any condition showing 'Unknown' suggests the kubelet is not reporting status
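As a quick alternative to reading the full describe output, a jsonpath query prints each condition type and status on its own line (<node-name> is a placeholder):
kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'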
SSH into the node and check if the kubelet service is active:
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 100 --no-pager
Common errors in the logs (a grep for these patterns is shown after the list):
- 'container runtime is down' → container runtime has crashed
- 'connection timeout to the control plane' → networking issue
- 'Cannot connect to the Docker daemon' → runtime endpoint misconfiguration
- 'pthread_create failed' → PID/thread exhaustion
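To scan a longer window of kubelet logs for these patterns in one pass (the exact message strings vary by kubelet and runtime version, so treat this pattern list as a starting point):
sudo journalctl -u kubelet --since "1 hour ago" --no-pager | grep -iE 'runtime is down|connection timeout|docker daemon|pthread_create'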
If kubelet shows 'inactive' or 'failed', restart it:
sudo systemctl restart kubelet
Check the status of the container runtime:
sudo systemctl status containerd
sudo systemctl status docker # if using Docker
If the runtime is not running:
sudo systemctl restart containerd
Verify the kubelet can communicate with the runtime:
sudo crictl ps # for containerd
sudo docker ps # for Docker
If you see socket errors, verify the socket exists:
ls -la /run/containerd/containerd.sock
ls -la /var/run/docker.sock
Use system tools to examine CPU, memory, disk, and PID usage:
df -h # disk usage
free -m # memory usage
top -b -n 1 # CPU and memory per process
cat /proc/sys/kernel/pid_max # max PID limit
ps aux | wc -l # current process count
If you see 'out of memory' or 'no space left on device', the node needs cleanup:
# Free disk space by removing unused images
sudo crictl rmi --prune
# Or restart containerd to allow garbage collection
sudo systemctl restart containerd
Test connectivity from the node to the API server:
nc -zv <API_SERVER_IP> 6443
Check if the CNI plugin is running in the kube-system namespace:
kubectl get pods -n kube-system -l k8s-app=calico-node
kubectl get pods -n kube-system -l k8s-app=flannel
kubectl get pods -n kube-system -l k8s-app=cilium
If CNI pods are not running or in CrashLoopBackOff, check their logs:
kubectl logs -n kube-system <cni-pod-name> --tail=50
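If the CNI pods look healthy but the node still reports NetworkUnavailable, it is worth confirming on the node itself that the CNI configuration and plugin binaries were actually installed (standard paths shown; some distributions relocate them):
ls /etc/cni/net.d/ # network config written by the CNI plugin
ls /opt/cni/bin/ # plugin binaries the kubelet/runtime executes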
If the node is stuck in NotReady and previous steps don't work, drain and restart it:
# Drain the node to evict pods safely
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Restart the node
sudo reboot
# Monitor the node rejoining
kubectl get nodes -w
# After the node is Ready, uncordon it
kubectl uncordon <node-name>
For persistent issues, consider removing and replacing the node entirely:
kubectl delete node <node-name>
# Then provision a new node via your infrastructure
The kubelet sends Lease object updates every 10 seconds. The kube-controller-manager waits for the NodeMonitorGracePeriod (default 50 seconds) before marking the node NotReady. After that, workload pods are evicted once their node.kubernetes.io/not-ready and node.kubernetes.io/unreachable tolerations expire, which default to tolerationSeconds: 300 (5 minutes).
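You can see that 5-minute eviction window on any pod: the DefaultTolerationSeconds admission controller adds the not-ready and unreachable tolerations unless the pod declares its own (<pod-name> is a placeholder):
kubectl get pod <pod-name> -o yaml | grep -A9 'tolerations:' # look for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds: 300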
PLEG (Pod Lifecycle Event Generator) monitors the container runtime for pod state changes. If PLEG reports unhealthy ("pleg was last seen active Xm ago; threshold is 3m"), the kubelet has not been able to complete a container relist from the runtime within the threshold. This is often caused by runtime lockups rather than actual crashes; restarting the kubelet can unstick it.
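To confirm PLEG is the culprit before restarting anything, search the kubelet journal for its health messages (the exact wording varies slightly between kubelet versions):
sudo journalctl -u kubelet --no-pager | grep -i pleg | tail -n 20
sudo systemctl restart kubelet # often enough to clear a stuck relist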
Modern Kubernetes requires the cgroup driver (usually systemd) to match between the kubelet and the container runtime. Mismatches cause resource enforcement failures. Check the kubelet config: grep cgroup /etc/kubernetes/kubelet/kubelet-config.yaml (on kubeadm clusters the file is typically /var/lib/kubelet/config.yaml).
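A minimal sketch of the mismatch check, assuming containerd and a kubeadm-style kubelet config path; adjust the paths for your distribution:
grep -i cgroupDriver /var/lib/kubelet/config.yaml # kubelet side
grep -i SystemdCgroup /etc/containerd/config.toml # containerd side; should be true when the kubelet uses systemd
sudo crictl info | grep -i cgroup # what the runtime actually reports (field names vary by runtime)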
For production clusters, deploy the node-problem-detector DaemonSet to catch NotReady conditions early and emit Kubernetes events before automatic eviction.
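If node-problem-detector is already deployed, a quick sketch to confirm it is running and to watch the node events it raises (the DaemonSet name and namespace may differ in your install):
kubectl get ds -n kube-system node-problem-detector
kubectl get events -A --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp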