API server timeout errors occur when kubectl or cluster components cannot reach the Kubernetes API server within the expected time. This happens due to control plane overload, network latency, or misconfigured timeouts and can block deployments and cluster operations.
The Kubernetes API server is the central component that manages cluster state and processes all API requests from kubectl, kubelet, and other components. When an API server timeout occurs, it means a client (kubectl, kubelet, webhook, etc.) could not complete a request before the deadline expired. This can happen at the network level (connection timeout), at the application level (request processing timeout), or due to the server being overloaded and unable to respond promptly. The API server acts as a gatekeeper for all cluster operations, so timeouts here block deployments, scaling, configuration changes, and cluster monitoring.
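Before digging into the control plane, it helps to confirm where the timeout occurs. The sketch below (the control-plane address is a placeholder) uses kubectl's --request-timeout flag and the API server's readiness endpoint to separate "server is slow" from "server is unreachable":
# Set an explicit client-side deadline so a slow server fails fast:
kubectl get nodes --request-timeout=15s
# Query the readiness endpoint; a verbose response lists any failing checks:
kubectl get --raw '/readyz?verbose'
# If kubectl cannot connect at all, test the TCP/TLS path directly
# (even a 401/403 response proves the network path and handshake work):
curl -k -o /dev/null -s -w 'HTTP %{http_code} in %{time_total}s\n' https://<control-plane-ip>:6443/readyz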
SSH into the control plane node and examine the API server logs:
# For kubeadm clusters (the API server runs as a static pod):
kubectl logs -n kube-system kube-apiserver-<node-name> --tail=200
# Or directly on the node via the container runtime:
sudo crictl logs $(sudo crictl ps --name kube-apiserver -q)
# Check the kubelet, which manages the static pod:
sudo journalctl -u kubelet -f
# If the API server runs as a systemd service (non-kubeadm installs):
sudo journalctl -u kube-apiserver -f
Look for:
- "context deadline exceeded" - client timeout
- "etcd is unavailable" - database issue
- "request duration exceeded deadline" - slow processing
- "webhook timeout" - validating/mutating webhook is slow
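A quick way to scan recent API server log output for these signatures (the pod name suffix is your control plane node's name):
# Search the last hour of API server logs for common timeout messages:
kubectl logs -n kube-system kube-apiserver-<node-name> --since=1h | grep -Ei 'deadline exceeded|etcd|webhook|timeout' | tail -50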
Monitor CPU, memory, and connection count:
# SSH to control plane
top -p $(pgrep kube-apiserver) # Real-time usage
# Check connection count:
ss -tunap | grep 6443
# Or use Prometheus if available:
kubectl port-forward -n kube-system svc/prometheus 9090:9090
# Then view: http://localhost:9090
# Query: apiserver_request_duration_seconds
If the API server is at 80%+ CPU or memory, it is overloaded.
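If Prometheus is scraping the API server, a standard histogram_quantile query over that metric shows which request verbs are slow (this is generic PromQL, not specific to any particular Prometheus setup):
# 99th percentile API request latency over the last 5 minutes, grouped by verb:
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le, verb))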
High request volume can exhaust the API server. Identify which clients generate the load:
# Check request volume by resource and verb from the API server's own metrics:
kubectl get --raw /metrics | grep '^apiserver_request_total' | head -20
# Review recent cluster activity for churn from operators/controllers:
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
# Identify noisy pods/controllers:
kubectl logs -n kube-system deployment/your-controller --tail=100
# Reduce custom controller reconciliation frequency:
# Edit your controller deployment and increase resyncPeriod
# Example: change resyncPeriod from 10s to 60s
Disable or scale down unnecessary controllers, as in the sketch below.
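To confirm that a suspected controller is the source of the load, scale it down temporarily and watch whether API latency recovers (the deployment name and namespace are placeholders):
# Temporarily stop the suspected controller:
kubectl scale deployment/your-controller -n kube-system --replicas=0
# Observe API responsiveness for a few minutes, then restore it:
kubectl scale deployment/your-controller -n kube-system --replicas=1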
The API server depends on etcd. Check etcd status:
# For kubeadm clusters, run etcdctl inside the etcd static pod
# (the TLS flags point at the certificates kubeadm generates for etcd):
FLAGS="--endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key"
kubectl exec -n kube-system etcd-<node-name> -- etcdctl $FLAGS endpoint health
# Check database size:
kubectl exec -n kube-system etcd-<node-name> -- etcdctl $FLAGS endpoint status --write-out=table
# For large databases, run a defragmentation:
kubectl exec -n kube-system etcd-<node-name> -- etcdctl $FLAGS defrag
If etcd shows "unhealthy" or the database is > 1GB:
- Compact etcd to its current revision (use the same TLS flags as above): rev=$(etcdctl endpoint status --write-out=json | grep -o '"revision":[0-9]*' | grep -o '[0-9].*'); etcdctl compact $rev
- Delete unnecessary objects, for example old events: kubectl delete events --all --all-namespaces
- Limit Helm release history kept in the cluster: run helm upgrade with --history-max (e.g. --history-max 2) so old release revisions are pruned
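If the database has already exceeded its space quota, etcd raises a NOSPACE alarm and rejects writes until it is cleared. After compacting and defragmenting, check and disarm it (reusing the $FLAGS variable defined above):
# List active alarms; NOSPACE means the quota was exceeded:
kubectl exec -n kube-system etcd-<node-name> -- etcdctl $FLAGS alarm list
# Clear the alarm once space has been reclaimed:
kubectl exec -n kube-system etcd-<node-name> -- etcdctl $FLAGS alarm disarm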
Network latency causes timeouts. Test from a node:
# SSH to a worker node
ping -c 5 <control-plane-ip>
# Test TCP connection to API server:
telnet <control-plane-ip> 6443
# Check network policy:
kubectl get networkpolicies --all-namespaces
# Measure latency:
time curl -k --cert /var/lib/kubelet/pki/kubelet-client-current.pem --key /var/lib/kubelet/pki/kubelet-client-current.pem https://<control-plane-ip>:6443/api/v1
If latency is above 100ms, investigate the network topology.
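Intermittent latency is easy to miss with a single request; a short loop like this sketch (placeholder address) samples response times so spikes stand out:
# Sample API server response time ten times and watch for outliers:
for i in $(seq 1 10); do
  curl -k -o /dev/null -s -w '%{time_total}s\n' https://<control-plane-ip>:6443/healthz
  sleep 1
done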
Default timeout is 60 seconds. Only increase if API server is healthy but slow:
# For kubeadm, edit API server pod manifest:
sudo nano /etc/kubernetes/manifests/kube-apiserver.yaml
# Add or modify:
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --request-timeout=90s      # Increase from the 60s default
    - --min-request-timeout=300  # Minimum seconds a watch request stays open (default 1800); takes an integer, not a duration
Save the file; the kubelet restarts the API server static pod automatically. Verify:
kubectl get pod -n kube-system kube-apiserver-<node> -o yaml | grep request-timeout
WARNING: Only increase the timeout if you understand the cause. Masking the timeout hides the underlying issue.
Validating and mutating webhooks can time out and block API requests:
# List webhooks:
kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations
# Check webhook timeout:
kubectl get validatingwebhookconfigurations -o yaml | grep timeoutSeconds
# Temporarily disable problematic webhook:
kubectl delete validatingwebhookconfigurations <webhook-name>
# If you need to keep it, increase timeout:
kubectl patch validatingwebhookconfigurations <webhook-name> --type='json' -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value":30}]'
If the webhook is slow, fix the webhook application itself (add caching, optimize rules); note that 30 seconds is the maximum timeoutSeconds the API allows.
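For non-critical webhooks, another mitigation (noted in the summary below) is to stop a slow webhook from blocking requests entirely by setting failurePolicy: Ignore; a patch of the same form works:
# Let API requests proceed if this webhook is slow or unreachable:
kubectl patch validatingwebhookconfigurations <webhook-name> --type='json' -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value":"Ignore"}]'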
Undersized control plane causes timeouts:
# Check control plane node resources:
kubectl describe node <control-plane-node> | grep -A 10 "Allocated resources"
# Minimum recommendations:
# - CPU: 2+ cores (4+ for production)
# - RAM: 4GB minimum (8GB+ recommended)
# - Disk: 50GB+ for etcd
# Check actual usage:
kubectl top node <control-plane-node>
# For cloud clusters (EKS, AKS, GKE):
# - Verify control plane tier (standard, ha)
# - Scale up the control plane if needed
If the control plane is under-resourced, upgrade the instance type.
Nodes failing to contact API server cause "NotReady" status:
# From a machine with kubectl access, check the node conditions:
kubectl describe node <node-name> | grep -A 5 "Conditions"
# Then SSH to the node and test kubelet connectivity to the API server:
curl -k --key /var/lib/kubelet/pki/kubelet-client-current.pem --cert /var/lib/kubelet/pki/kubelet-client-current.pem https://<api-server>:6443/api/v1
# Check kubelet logs:
sudo journalctl -u kubelet --tail=100
# Restart kubelet if stuck:
sudo systemctl restart kubelet
If the kubelet cannot reach the API server, check firewall rules and DNS resolution.
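A quick way to check the firewall and DNS angles from the node (the addresses are placeholders for your API server endpoint):
# Verify the API server port is reachable (firewall check):
nc -zv <control-plane-ip> 6443
# Verify the API server hostname resolves (DNS check), if the kubeconfig uses a hostname:
nslookup <api-server-hostname>
# Confirm which endpoint the kubelet is actually configured to use (kubeadm default path):
grep server /etc/kubernetes/kubelet.conf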
API server timeout errors have three common layers: network (DNS resolution, packet loss), TLS/certificate validation (expensive under load), and application (API server processing). In managed Kubernetes (EKS, AKS, GKE) the control plane is managed by the cloud provider; move to an HA or higher control plane tier if timeouts persist. For large clusters (500+ nodes), run multiple control plane nodes and distribute load across them. Webhooks are common culprits: implement webhook caching and use failurePolicy: Ignore for non-critical webhooks. In CI/CD pipelines, increase the kubectl timeout: kubectl --request-timeout=120s get nodes. For custom operators, implement exponential backoff and retry failed requests instead of giving up immediately. Monitor the apiserver_request_duration_seconds and apiserver_request_total Prometheus metrics to identify bottlenecks.
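As a sketch of the retry advice above (a generic bash wrapper, not part of any standard tooling), a CI step can retry kubectl with exponential backoff instead of failing on the first timeout:
#!/usr/bin/env bash
# Retry a kubectl command with exponential backoff; give up after 5 attempts.
retry_kubectl() {
  local delay=2
  for attempt in 1 2 3 4 5; do
    if kubectl --request-timeout=120s "$@"; then
      return 0
    fi
    echo "kubectl attempt ${attempt} failed; retrying in ${delay}s..." >&2
    sleep "${delay}"
    delay=$((delay * 2))
  done
  return 1
}
retry_kubectl get nodes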