The Kubernetes API server is overwhelmed by excessive requests, returning 429 (Too Many Requests) errors. Causes include clients making unoptimized LIST queries, etcd performance degradation, or insufficient inflight request capacity. Fix by identifying offending clients, tuning etcd, and enabling API Priority and Fairness (APF).
API server rate limiting rejects requests when concurrent inflight requests exceed configured limits (default: 400). This protects the cluster from cascading failures but blocks legitimate operations. Root cause is usually etcd degradation (slow backing store) or client behavior (excessive LIST operations).
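Before changing anything, it helps to confirm the symptom from the API server's own metrics endpoint; a quick sketch, assuming cluster-admin access (the metric names are the standard ones, but verify them on your Kubernetes version):
# Current inflight requests (read-only vs. mutating), to compare against the configured limits
kubectl get --raw /metrics | grep apiserver_current_inflight_requests
# Cumulative count of rejected (429) requests
kubectl get --raw /metrics | grep apiserver_request_total | grep 'code="429"'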
API Priority and Fairness (APF) classifies and prioritizes requests to prevent cascading failures:
# Check if enabled (default: true in 1.20+)
kubectl api-resources | grep flowschema
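# List the flow schemas and priority levels already in place (Kubernetes creates a set of defaults)
kubectl get flowschemas
kubectl get prioritylevelconfigurations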
# Create prioritized flows for critical workloads
kubectl apply -f - <<EOF
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: system-critical
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 10000
EOF
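A PriorityLevelConfiguration on its own does not route any traffic; a FlowSchema has to select requests into it. A minimal sketch follows, where the schema name, service account critical-controller, and namespace prod are all illustrative placeholders for your own critical client:
kubectl apply -f - <<EOF
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: system-critical-workloads
spec:
  priorityLevelConfiguration:
    name: system-critical
  matchingPrecedence: 500
  distinguisherMethod:
    type: ByUser
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: critical-controller   # hypothetical client; substitute your own
        namespace: prod
    resourceRules:
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
      clusterScope: true
EOF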
A large etcd database causes API server slowdown:
kubectl exec -n kube-system etcd-<node> -- etcdctl endpoint status
# Check DB size (alert if >6GB)
kubectl exec -n kube-system etcd-<node> -- etcdctl alarm list
If the DB is too large, compact and defragment:
kubectl exec -n kube-system etcd-<node> -- etcdctl compact <revision>
kubectl exec -n kube-system etcd-<node> -- etcdctl defrag
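The <revision> placeholder is the current revision reported by etcd. One possible end-to-end sketch, assuming a kubeadm cluster with the default certificate paths under /etc/kubernetes/pki/etcd (adjust for your setup):
# TLS flags for etcdctl; paths below are the kubeadm defaults and are an assumption
CERTS="--cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key"
# The current revision appears in the endpoint status JSON ("revision": <number>)
kubectl exec -n kube-system etcd-<node> -- etcdctl $CERTS endpoint status --write-out=json
# Compact up to that revision, defragment, then clear any NOSPACE alarm
kubectl exec -n kube-system etcd-<node> -- etcdctl $CERTS compact <revision>
kubectl exec -n kube-system etcd-<node> -- etcdctl $CERTS defrag
kubectl exec -n kube-system etcd-<node> -- etcdctl $CERTS alarm disarm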
Find clients making excessive requests:
# Query Prometheus for request rates by verb and resource
kubectl port-forward -n prometheus svc/prometheus 9090:9090
# In browser: http://localhost:9090
# Query: sum(rate(apiserver_request_total[5m])) by (verb, resource, code)
apiserver_request_total is not labelled per client, so attribute heavy traffic to a specific caller via the API server audit log or the client's own logs. Optimize clients to batch requests, use watches instead of polling, and add field selectors to LIST queries.
Increase limits if appropriate. kube-apiserver normally runs as a static pod rather than a Deployment, so edit its manifest on each control-plane node (for kubeadm: /etc/kubernetes/manifests/kube-apiserver.yaml); the kubelet restarts it automatically:
# Add or modify the kube-apiserver flags:
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --max-requests-inflight=800            # Increase from 400
    - --max-mutating-requests-inflight=400   # Increase from 200
Note: Monitor CPU/memory impact carefully; higher limits also push more load onto etcd.
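Once the static pods restart, confirm the new flags are live and keep an eye on actual inflight usage (the label selector below is the one kubeadm applies):
# Confirm the flags on the running control-plane pods
kubectl -n kube-system get pod -l component=kube-apiserver -o yaml | grep requests-inflight
# Watch current inflight usage against the new ceiling
kubectl get --raw /metrics | grep apiserver_current_inflight_requests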
Track latency and throttling:
# API request latency (P99)
histogram_quantile(0.99, rate(apiserver_request_duration_seconds_bucket[5m]))
# Throttled requests
sum(rate(apiserver_request_total{code="429"}[5m]))
# etcd size
etcd_mvcc_db_total_size_in_bytes / 1024 / 1024 / 1024
Alert if P99 latency > 2s or the 429 rate > 0.
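Expressed as a Prometheus rule group, those two thresholds might look like the sketch below (group and alert names are illustrative; tune thresholds to your SLOs):
groups:
- name: kube-apiserver-overload
  rules:
  - alert: APIServerHighLatencyP99
    expr: histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le)) > 2
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: API server P99 request latency is above 2s
  - alert: APIServerThrottling429
    expr: sum(rate(apiserver_request_total{code="429"}[5m])) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: API server is returning 429 Too Many Requests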
Add more API server replicas:
# For managed clusters (GKE, AKS, EKS), control-plane scaling is
# typically automatic or handled via cluster settings.
# For self-managed (kubeadm) clusters, kube-apiserver is a static pod,
# so scale it by joining additional control-plane nodes:
kubeadm join <control-plane-endpoint> --control-plane --token <token> \
  --discovery-token-ca-cert-hash <hash> --certificate-key <key>
Configure controller rate limits:
kubectl set env deployment/<name> \
  KUBE_API_QPS=50 \
  KUBE_API_BURST=100
The client-go defaults are qps=5, burst=10; increase them for high-throughput controllers that honor these variables.
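For the built-in controllers, the equivalent knobs are the --kube-api-qps and --kube-api-burst flags on kube-controller-manager (also a static pod on kubeadm clusters); a sketch of the relevant part of its manifest, with values matching the example above:
# /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --kube-api-qps=50     # allow more requests per second to the API server
    - --kube-api-burst=100  # allow short bursts above the sustained rate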
API overload is almost always a symptom of etcd degradation or client behavior problems, not API server limits. Fix root cause (etcd, clients) rather than just raising limits. Monitor with Prometheus; alert on 429 rate or latency trends. Use API Priority and Fairness (APF) for graceful degradation under load.