Container runtime errors indicate the kubelet cannot communicate with Docker, containerd, or another CRI runtime. Pods fail to start and nodes become degraded when the runtime is unavailable or misconfigured.
The kubelet relies on a container runtime (Docker, containerd, CRI-O) to:
1. Create and run containers
2. Manage the container lifecycle (start, stop, restart)
3. Handle image pulls
4. Configure container networking
When the runtime fails, the kubelet cannot run any pods. Common causes:
- Runtime daemon crashed or is not responding
- Runtime socket file missing or has wrong permissions
- Image pull failures in the runtime
- Incompatible kubelet and runtime versions
- Node disk full, preventing runtime operations
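Before digging into the node itself, you can confirm the kubelet is reporting runtime trouble from the API side; <node-name> below is the affected node:
kubectl describe node <node-name> | grep -A10 Conditions   # Ready should be True; look for runtime-related messages
kubectl get events -A --field-selector involvedObject.name=<node-name>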
Check which runtime your cluster uses:
kubectl get nodes -o wide # Check the CONTAINER-RUNTIME column
SSH into the affected node and verify runtime status:
For Docker:
sudo systemctl status docker
sudo systemctl start docker # If stopped
sudo docker ps # Test connectivity
For containerd:
sudo systemctl status containerd
sudo systemctl start containerd
sudo ctr -a /run/containerd/containerd.sock version
For CRI-O:
sudo systemctl status crio
sudo systemctl start crio
If the daemon is running, proceed to the next step.
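If the daemon reports as running but pods still fail, you can also probe the CRI endpoint directly with crictl. The socket path below assumes containerd; substitute unix:///var/run/crio/crio.sock for CRI-O or unix:///run/cri-dockerd.sock for Docker via cri-dockerd:
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock info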
Review runtime logs for errors:
Docker:
sudo journalctl -u docker -f
sudo tail -f /var/log/docker.log
containerd:
sudo journalctl -u containerd -f
sudo tail -f /var/log/containerd/containerd.log
CRI-O:
sudo journalctl -u crio -f
Look for errors like:
- "failed to create container"
- "out of memory"
- "no space left on device"
- "permission denied"
- "connection refused"
Check that the runtime socket exists and is accessible:
Docker:
ls -la /var/run/docker.sock
# Should output: srw-rw---- root docker /var/run/docker.sock
containerd:
ls -la /run/containerd/containerd.sock
# Should output: srw-rw---- root root /run/containerd/containerd.sock
CRI-O:
ls -la /var/run/crio/crio.sock
If the socket file is missing:
1. Restart the runtime daemon
2. Check if the runtime directory exists
3. Verify mount points if using unusual storage
If permissions are wrong:
sudo chmod 660 /var/run/docker.sock
sudo chown root:docker /var/run/docker.sock
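After fixing ownership, you can confirm the Docker socket is actually serving the API rather than just existing on disk (curl 7.40+ supports --unix-socket):
sudo curl --unix-socket /var/run/docker.sock http://localhost/version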
The container runtime typically needs space in:
- /var/lib/docker (Docker)
- /var/lib/containerd (containerd)
- /var/lib/crio (CRI-O)
Check disk usage:
df -h /var/lib/docker
df -h /var/lib/containerd
sudo du -sh /var/lib/docker/* # Breakdown by component
If the disk is more than 90% full:
# Clean up unused images
sudo docker rmi $(sudo docker images -q -f "dangling=true")
sudo ctr -n k8s.io i rm $(sudo ctr -n k8s.io i ls -q | tail -20) # Removes the 20 most recently listed images; confirm they are unused first
# Remove unused containers
sudo docker container prune
# Clear build cache
sudo docker builder prune
For persistent space issues, increase the volume or move the runtime's data directory to larger storage:
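One way to move storage for Docker is to set data-root in /etc/docker/daemon.json; /mnt/docker-data below is only an example mount point, and containerd has an equivalent root setting in /etc/containerd/config.toml:
# Example /etc/docker/daemon.json
{ "data-root": "/mnt/docker-data" }
sudo systemctl restart docker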
sudo docker info | grep "Storage Driver" # Confirm the storage driver in use
Verify the kubelet is configured to use the correct runtime socket:
ps aux | grep kubelet | grep -E "(container-runtime|container-runtime-endpoint)"
For containerd (Kubernetes 1.24+):
kubectl describe node <node> | grep -i "container"
Edit the kubelet config:
For kubeadm clusters:
sudo nano /etc/sysconfig/kubelet               # RHEL-based distros (Debian/Ubuntu: /etc/default/kubelet)
# or
sudo nano /var/lib/kubelet/kubeadm-flags.env   # Flags kubeadm writes at init/join time
# or, on some installers:
sudo nano /etc/kubernetes/kubelet.env
Ensure it includes:
--container-runtime=remote            # Flag removed in Kubernetes 1.27; omit it on newer versions
--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock
For Docker on Kubernetes 1.24 and later, dockershim is gone, so --container-runtime=docker no longer works: install cri-dockerd and point --container-runtime-endpoint at its socket (typically unix:///run/cri-dockerd.sock) instead.
Restart kubelet:
sudo systemctl restart kubelet
sudo journalctl -u kubelet -f
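If the kubelet log shows errors such as the CRI runtime API not being implemented, a frequent cause on containerd nodes is the CRI plugin being disabled in the shipped config; this check assumes the default /etc/containerd/config.toml location:
sudo grep disabled_plugins /etc/containerd/config.toml    # "cri" must not appear in this list
sudo containerd config dump | grep 'io.containerd.grpc.v1.cri'   # Plugin should show up in the active config
sudo systemctl restart containerd                         # After removing "cri" from disabled_plugins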
Test runtime API connectivity:
containerd:
sudo ctr -a /run/containerd/containerd.sock version
sudo ctr -a /run/containerd/containerd.sock images list
Docker:
sudo docker version
sudo docker ps
CRI-O:
sudo crictl version
sudo crictl images
If API errors appear:
- Check socket permissions (see Step 3)
- Restart runtime daemon (Step 1)
- Check for stale connections:
sudo lsof | grep containerd.sock
sudo kill <stale-pid>
As a last resort, restart the entire runtime:
# Cordon the node (prevent new pods)
kubectl cordon <node-name>
# Drain existing pods
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# SSH into node and restart runtime
sudo systemctl stop kubelet
sudo systemctl restart docker # or containerd, crio
sudo systemctl start kubelet
# Monitor recovery
sudo journalctl -u kubelet -f
kubectl uncordon <node-name>
kubectl get nodes -w
Pods will be rescheduled to other nodes. Monitor their status:
kubectl get pods -A -o wide -w | grep <node-name>
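If the grep output is noisy, you can list only the pods scheduled on that node; kubectl supports field selectors on spec.nodeName for pods:
kubectl get pods -A -o wide --field-selector spec.nodeName=<node-name>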
Check for version compatibility:
kubectl version # Kubernetes version
sudo docker --version # Docker version
sudo containerd --version # containerd version
Ref: https://kubernetes.io/docs/setup/production-environment/container-runtimes/
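You can also read the kubelet and runtime versions the API server reports for every node at once; the column names here are just labels:
kubectl get nodes -o custom-columns=NODE:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion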
If versions are incompatible, upgrade the runtime:
Docker (Ubuntu/Debian):
sudo apt-get update
sudo apt-get install -y docker-ce=<version>
sudo systemctl restart docker
containerd (Ubuntu/Debian):
sudo apt-get install -y containerd.io=<version>
sudo systemctl restart containerd
CRI-O:
sudo apt-get install -y cri-o=<version>
sudo systemctl restart crio
Always test in a staging environment first.
Container runtime errors on many nodes at once suggest a cluster-wide issue: a container registry outage, a storage backend failure, or a network partition. On managed services (EKS, GKE, AKS), node runtime issues are usually the platform's responsibility, so report them via support.
Kubernetes 1.24 removed the dockershim, dropping built-in Docker support in favor of CRI runtimes such as containerd, which is lighter and faster. Docker Desktop includes a runtime suitable for development, but production nodes should run a dedicated CRI runtime (containerd is the usual choice). Runtime errors can also cascade: if a pod repeatedly fails to start, its restart churn consumes kubelet resources and can destabilize the node, so rely on the kubelet's exponential backoff (CrashLoopBackOff) rather than aggressive external restart loops.
Monitoring runtime health is critical: the kubelet exposes Prometheus metrics for container creation and runtime operation errors. For stateless workloads, automatic node replacement (managed scaling) is often safer than manual recovery. Custom OCI runtimes (Kata Containers, gVisor) have their own troubleshooting paths; consult the runtime-specific docs.
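As a starting point for the runtime-health monitoring mentioned above, the kubelet's metrics endpoint exposes runtime operation error counters. A minimal check, assuming you can reach the kubelet through the API server proxy and that your RBAC allows kubectl get --raw, is:
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics | grep kubelet_runtime_operations_errors_total
A sustained increase in these counters usually points at the runtime rather than at individual workloads.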