NodeNotReady indicates a Kubernetes node cannot run workloads due to kubelet issues, container runtime failures, or resource exhaustion. The control plane taints the node and evicts workloads to healthy nodes.
The NodeNotReady condition indicates that a Kubernetes node is unable to run workloads and the control plane has detected a fundamental problem preventing the node from being fully operational. This status means the kubelet on that node has either lost communication with the API server, failed health checks, or encountered critical issues that prevent it from managing pods safely. When a node's Ready condition remains Unknown or False for longer than the kube-controller-manager's NodeMonitorGracePeriod (default 50 seconds), the control plane automatically adds taints to prevent new pod scheduling and begins evicting existing workloads to healthy nodes.
The node's readiness is determined by heartbeat mechanisms: the kubelet sends Lease object updates every 10 seconds and reports node status changes to the API server. When heartbeats stop or conditions indicate problems (resource exhaustion, network unavailability, container runtime failure), Kubernetes marks the node as NotReady.
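Both mechanisms are visible from kubectl. A quick way to see the heartbeat and any taints the control plane has applied (replace <node-name> with the affected node):
kubectl get lease <node-name> -n kube-node-lease -o yaml # spec.renewTime shows the kubelet's last heartbeat
kubectl describe node <node-name> | grep -A2 Taints # look for node.kubernetes.io/not-ready or node.kubernetes.io/unreachable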
Verify the node is actually NotReady and gather detailed condition information:
kubectl get nodes
kubectl describe node <node-name>
Look at the 'Conditions' section and check for the following (a one-line scripted check is shown after the list):
- Ready: False or Unknown (the core issue)
- MemoryPressure, DiskPressure, PIDPressure: True values indicate resource problems
- NetworkUnavailable: True indicates network connectivity issues
- Any condition showing 'Unknown' suggests the kubelet is not reporting status
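As a quick alternative to reading the full describe output, a jsonpath query prints each condition type and status on its own line (<node-name> is a placeholder):
kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'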
SSH into the node and check if the kubelet service is active:
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 100 --no-pager
Common errors in the logs (a grep for these patterns is shown after the list):
- 'container runtime is down' → container runtime has crashed
- 'connection timeout to the control plane' → networking issue
- 'Cannot connect to the Docker daemon' → runtime endpoint misconfiguration
- 'pthread_create failed' → PID/thread exhaustion
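To scan a longer window of kubelet logs for these patterns in one pass (the exact message strings vary by kubelet and runtime version, so treat this pattern list as a starting point):
sudo journalctl -u kubelet --since "1 hour ago" --no-pager | grep -iE 'runtime is down|connection timeout|docker daemon|pthread_create'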
If kubelet shows 'inactive' or 'failed', restart it:
sudo systemctl restart kubelet
Check the status of the container runtime:
sudo systemctl status containerd
sudo systemctl status docker # if using Docker
If the runtime is not running:
sudo systemctl restart containerd
Verify the kubelet can communicate with the runtime:
sudo crictl ps # for containerd
sudo docker ps # for Docker
If you see socket errors, verify the socket exists:
ls -la /run/containerd/containerd.sock
ls -la /var/run/docker.sock
Use system tools to examine CPU, memory, disk, and PID usage:
df -h # disk usage
free -m # memory usage
top -b -n 1 # CPU and memory per process
cat /proc/sys/kernel/pid_max # max PID limit
ps aux | wc -l # current process count
If you see 'out of memory' or 'no space left on device', the node needs cleanup:
# Free disk space by removing unused images
sudo crictl rmi --prune
# Or restart containerd to allow garbage collection
sudo systemctl restart containerd
Test connectivity from the node to the API server:
nc -zv <API_SERVER_IP> 6443
Check if the CNI plugin is running in the kube-system namespace:
kubectl get pods -n kube-system -l k8s-app=calico-node
kubectl get pods -n kube-system -l k8s-app=flannel
kubectl get pods -n kube-system -l k8s-app=cilium
If CNI pods are not running or in CrashLoopBackOff, check their logs:
kubectl logs -n kube-system <cni-pod-name> --tail=50
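If the CNI pods look healthy but the node still reports NetworkUnavailable, it is worth confirming on the node itself that the CNI configuration and plugin binaries were actually installed (standard paths shown; some distributions relocate them):
ls /etc/cni/net.d/ # network config written by the CNI plugin
ls /opt/cni/bin/ # plugin binaries the kubelet/runtime executes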
If the node is stuck in NotReady and previous steps don't work, drain and restart it:
# Drain the node to evict pods safely
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Restart the node
sudo reboot
# Monitor the node rejoining
kubectl get nodes -w
# After the node is Ready, uncordon it
kubectl uncordon <node-name>
For persistent issues, consider removing and replacing the node entirely:
kubectl delete node <node-name>
# Then provision a new node via your infrastructure
The kubelet sends Lease object updates every 10 seconds. The kube-controller-manager waits for the NodeMonitorGracePeriod (default 50 seconds) before marking the node NotReady. After that, workload pods are evicted once their node.kubernetes.io/not-ready and node.kubernetes.io/unreachable tolerations expire, which default to tolerationSeconds: 300 (5 minutes).
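You can see that 5-minute eviction window on any pod: the DefaultTolerationSeconds admission controller adds the not-ready and unreachable tolerations unless the pod declares its own (<pod-name> is a placeholder):
kubectl get pod <pod-name> -o yaml | grep -A9 'tolerations:' # look for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds: 300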
PLEG (Pod Lifecycle Event Generator) monitors the container runtime for pod state changes. If PLEG reports unhealthy ("pleg was last seen active Xm ago; threshold is 3m"), the kubelet has not been able to complete a container relist from the runtime within the threshold. This is often caused by runtime lockups rather than actual crashes; restarting the kubelet can unstick it.
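To confirm PLEG is the culprit before restarting anything, search the kubelet journal for its health messages (the exact wording varies slightly between kubelet versions):
sudo journalctl -u kubelet --no-pager | grep -i pleg | tail -n 20
sudo systemctl restart kubelet # often enough to clear a stuck relist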
Modern Kubernetes requires the cgroup driver (usually systemd) to match between the kubelet and the container runtime. Mismatches cause resource enforcement failures. Check the kubelet config: grep cgroup /etc/kubernetes/kubelet/kubelet-config.yaml (on kubeadm clusters the file is typically /var/lib/kubelet/config.yaml).
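A minimal sketch of the mismatch check, assuming containerd and a kubeadm-style kubelet config path; adjust the paths for your distribution:
grep -i cgroupDriver /var/lib/kubelet/config.yaml # kubelet side
grep -i SystemdCgroup /etc/containerd/config.toml # containerd side; should be true when the kubelet uses systemd
sudo crictl info | grep -i cgroup # what the runtime actually reports (field names vary by runtime)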
For production clusters, deploy the node-problem-detector DaemonSet to catch NotReady conditions early and emit Kubernetes events before automatic eviction.
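If node-problem-detector is already deployed, a quick sketch to confirm it is running and to watch the node events it raises (the DaemonSet name and namespace may differ in your install):
kubectl get ds -n kube-system node-problem-detector
kubectl get events -A --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp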