Unknown status means Kubernetes lost contact with the node running the pod. Check node health, network connectivity, and kubelet status. The pod may still be running but unreachable.
A pod in Unknown status indicates that the Kubernetes API server cannot communicate with the kubelet on the node where the pod is running. This is a communication problem, not necessarily a pod problem—the container may still be running, but Kubernetes has no visibility into its state. The node controller marks pods as Unknown when it hasn't received a status update from the node within the node-monitor-grace-period (default 40 seconds). This typically indicates node failure, network partition, or kubelet issues.
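If your cluster uses node leases (Kubernetes 1.14 and later), you can see when the kubelet last reported in by inspecting the node's Lease object; its renewTime field records the most recent heartbeat:
# Check the node's heartbeat lease in the kube-node-lease namespace
kubectl get lease <node-name> -n kube-node-lease -o yaml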
First, verify the node health:
kubectl get nodes
Look for nodes in NotReady status. Get details:
kubectl describe node <node-name>
Check the Conditions section for:
- Ready: False or Unknown
- MemoryPressure, DiskPressure, PIDPressure
- NetworkUnavailable
If the node is NotReady, that's the root cause of Unknown pods.
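As a quick shortcut, you can pull just the Ready condition with a jsonpath query (using the same <node-name> placeholder as above):
# Prints True, False, or Unknown for the node's Ready condition
kubectl get node <node-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'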
List all pods in Unknown state:
kubectl get pods --all-namespaces -o wide | grep Unknown
If all Unknown pods are on the same node, the issue is node-level, not pod-level:
# Find which nodes the Unknown pods are on (current namespace; column 7 is NODE)
kubectl get pods -o wide | grep Unknown | awk '{print $7}' | sort | uniq -c
If you have node access, check kubelet:
# SSH to node
ssh <node-ip>
# Check kubelet status
systemctl status kubelet
# View kubelet logs
journalctl -u kubelet -f
# Check node resources
free -h
df -h
Common issues:
- Kubelet crashed due to OOM
- Disk full preventing kubelet from operating
- Certificate expired for kubelet-API communication
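If an expired kubelet client certificate is suspected, you can check its expiry from the node. The path below is the usual location on kubeadm-provisioned nodes and may differ on other distributions:
# Show the expiry date of the kubelet's client certificate (kubeadm default path)
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate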
Kubernetes will automatically recover when communication resumes:
1. If node comes back online, pod status updates automatically
2. If node stays offline past pod-eviction-timeout (default 5 minutes), pods are rescheduled
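On clusters with taint-based eviction (the default in current Kubernetes releases), the control plane taints an unreachable node and evicts pods once their node.kubernetes.io/unreachable toleration expires (300 seconds by default, which matches the five-minute figure above). You can confirm the taint is present:
# An unreachable node carries the node.kubernetes.io/unreachable:NoExecute taint
kubectl describe node <node-name> | grep -i taints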
Monitor recovery:
kubectl get pods -w # Watch for status changes
kubectl get nodes -w
For managed Kubernetes (EKS, GKE, AKS), the cloud provider may automatically replace unhealthy nodes.
If the node won't recover and pods remain Unknown:
kubectl delete pod <pod-name> --grace-period=0 --force
Caution: Force deletion tells Kubernetes to forget about the pod, but:
- The container may still be running on the failed node
- Attached volumes may not be properly released
- PersistentVolumes might show as still attached
Only use force delete when you're certain the node won't recover or has been terminated.
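Before or after a force delete, it can be worth checking whether the pod's volumes are still attached to the failed node. VolumeAttachment objects are cluster-scoped and show which node each volume is attached to:
# List volume attachments and the nodes they are bound to
kubectl get volumeattachments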
If the node is permanently lost:
# Drain remaining pods (evictions may fail on an unreachable node, but this cordons it and records intent)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
# Delete the node
kubectl delete node <node-name>
After node deletion:
- Pods are marked for rescheduling
- PersistentVolumes should detach (may need cloud provider cleanup)
- New nodes can join to replace capacity
For cloud providers, terminated instances are usually cleaned up automatically.
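Once the node object is gone, a quick sanity check is to confirm the affected pods are Running on other nodes and that their PersistentVolumes look healthy:
# Confirm pods were rescheduled onto healthy nodes
kubectl get pods -o wide
# Check PersistentVolume status (stuck volumes may need cloud-provider cleanup)
kubectl get pv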
Unknown status in managed Kubernetes services (EKS, GKE, AKS) often resolves automatically as the cloud provider replaces unhealthy nodes. Wait several minutes before manual intervention.
For on-premises clusters, Unknown status may indicate:
- Network switch failures
- DNS resolution problems
- Firewall blocking kubelet-to-API communication
- Load balancer issues in front of API servers
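A basic connectivity test run from the affected node can help narrow these down. The sketch below assumes the API server is reachable at a placeholder <api-server-host> on the conventional port 6443; adjust for your environment:
# From the node: verify DNS resolution and reachability of the API server
nslookup <api-server-host>
curl -k https://<api-server-host>:6443/healthz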
To prevent data loss with stateful workloads, ensure:
- PodDisruptionBudgets limit simultaneous failures (an example follows this list)
- StatefulSets use volumeClaimTemplates for persistent storage
- Backup strategies exist for critical data
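As an illustration, a PodDisruptionBudget can be created imperatively; the name, label selector, and threshold below are placeholders, not values from this article:
# Keep at least 2 pods matching the selector available during voluntary disruptions
kubectl create poddisruptionbudget <pdb-name> --selector=app=<label-value> --min-available=2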
The node-monitor-grace-period and pod-eviction-timeout controller-manager flags control how quickly Unknown pods are rescheduled. Shorter timeouts mean faster recovery but more false positives during transient network issues.
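These settings are kube-controller-manager flags; on kubeadm clusters the controller manager's static pod manifest usually lives at /etc/kubernetes/manifests/kube-controller-manager.yaml. On versions that use taint-based eviction (the default in current releases), pod-eviction-timeout may no longer take effect and the pods' tolerationSeconds govern eviction instead. A sketch of the relevant flags with their default values:
# Defaults shown; tune with care to balance recovery speed against false positives
--node-monitor-grace-period=40s
--pod-eviction-timeout=5m0s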