The "Events rate limit exceeded" error occurs when the API server generates more events per second than the rate limiter allows. This typically happens during pod creation storms, node failures, or configuration updates, causing recent events to be dropped and hindering troubleshooting.
Kubernetes tracks cluster events (pod creation, failures, restarts) in the Events API. The API server can limit the rate at which events are created, typically via the EventRateLimit admission plugin, to prevent event explosion and API server overload. When your cluster generates events faster than the configured rate limit, the API server rejects the excess, preventing you from seeing the full history of what happened. This is a cluster-level protection mechanism. When triggered, you lose visibility into recent events, making troubleshooting harder because `kubectl describe pod` and event logs become incomplete.
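A quick way to gauge the symptom is to compare the live event stream against what `kubectl describe` reports; the commands below are a minimal sketch:
# List the most recent events across all namespaces, newest last:
kubectl get events -A --sort-by=.lastTimestamp | tail -20
# Compare against the Events section of an affected pod; gaps between the two
# suggest events were dropped:
kubectl describe pod <pod-name> -n <namespace>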
Inspect the API server configuration:
# For kubeadm clusters, check whether the EventRateLimit admission plugin is enabled:
kubectl get pod -n kube-system kube-apiserver-<node> -o yaml | grep -E "enable-admission-plugins|admission-control-config"
# Inspect the static pod manifest directly on the control-plane node:
ls -la /etc/kubernetes/manifests/kube-apiserver.yaml
grep -E "EventRateLimit|admission-control-config" /etc/kubernetes/manifests/kube-apiserver.yaml
If EventRateLimit does not appear in --enable-admission-plugins, the plugin is not enabled on this API server and the limit is being enforced elsewhere (on managed clusters, typically by the provider). When the plugin is enabled, the qps and burst values come from the admission configuration file; a common per-namespace baseline is around 5 events per second.
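When the plugin is enabled, the relevant part of the kube-apiserver manifest looks roughly like this (flag values and paths are illustrative):
# Excerpt from /etc/kubernetes/manifests/kube-apiserver.yaml:
spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-admission-plugins=NodeRestriction,EventRateLimit
    - --admission-control-config-file=/etc/kubernetes/admission-control-config.yaml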
Check what events were rate-limited:
# For kubeadm clusters:
kubectl logs -n kube-system kube-apiserver-<node> | grep "rate limit" | tail -20
# For managed clusters (AKS, EKS, GKE), check cloud provider logs:
# AWS EKS: CloudWatch Logs under /aws/eks/<cluster>/cluster
# Azure AKS: Azure Monitor or Log Analytics Workspace
# GCP GKE: Cloud Logging with filter for API server
# You can also check kubelet logs directly on a node (kubelet normally runs
# as a systemd service, not a pod):
journalctl -u kubelet | grep -i event | tail -20
Determine what's generating so many events:
# Count events by reason:
kubectl get events -A | awk '{print $4}' | sort | uniq -c | sort -rn
# Watch events in real-time:
kubectl get events -A -w
# Find pods with many restarts:
kubectl get pods -A --sort-by=.status.containerStatuses[0].restartCount
# Check for failing pods:
kubectl get pods -A --field-selector=status.phase=Failed
# Monitor pod status changes:
kubectl get pods -A -w
Look for patterns: are certain pods restarting? Is a deployment rolling out? Are nodes failing?
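To see which objects and reasons dominate, a small jq aggregation helps (a sketch; assumes jq is installed):
# Count events grouped by namespace, involved object, and reason:
kubectl get events -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.involvedObject.kind)/\(.involvedObject.name) \(.reason)"' | sort | uniq -c | sort -rn | head -20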
Stabilize the application to reduce event generation:
# Add proper liveness/readiness probes to prevent restart storms:
apiVersion: v1
kind: Pod
metadata:
  name: stable-app
spec:
  containers:
  - name: app
    image: myapp:latest
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30  # Wait before first probe
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
Set appropriate startupProbe delays to prevent false failures.
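For slow-starting containers, a startupProbe holds the liveness probe off until the app is up; the fragment below slots into the container spec above, with illustrative thresholds:
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30  # Allow up to 30 * 10s = 5 minutes to start
      periodSeconds: 10     # Liveness checks begin only after this probe succeeds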
Increase the event rate limit:
# For kubeadm clusters, edit the API server's static pod manifest directly on
# the control-plane node (mirror pods cannot be edited through kubectl):
sudo nano /etc/kubernetes/manifests/kube-apiserver.yaml
There is no standalone kube-apiserver flag for the event rate; the qps and burst values live in the EventRateLimit admission plugin's configuration file. Raising a low per-namespace limit (for example, from 5 QPS to 20 with a burst of 30) is a sensible first step. For managed clusters, the file may not be directly editable; contact cloud provider support or use their API. Create (or edit) the policy, which supports Server, Namespace, User, and SourceAndObject limits:
# Save as /etc/kubernetes/event-rate-limit-config.yaml on the control-plane node
# (not under /etc/kubernetes/manifests/, where kubelet would try to parse it as
# a static pod):
apiVersion: eventratelimit.admission.k8s.io/v1alpha1
kind: Configuration
limits:
- type: Server
  qps: 100
  burst: 100
- type: Namespace
  qps: 10
  burst: 15
- type: User
  qps: 5
  burst: 10
A Namespace limit applies to each namespace independently; this configuration has no per-namespace selector.
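The --admission-control-config-file flag expects an AdmissionConfiguration that references (or embeds) the EventRateLimit settings; a minimal wrapper, assuming the path above, looks like this:
# Save as /etc/kubernetes/admission-control-config.yaml:
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: EventRateLimit
  path: /etc/kubernetes/event-rate-limit-config.yaml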
Enable it in kube-apiserver:
# Append EventRateLimit to the existing admission plugin list rather than
# replacing it:
--enable-admission-plugins=NodeRestriction,EventRateLimit
--admission-control-config-file=/etc/kubernetes/admission-control-config.yaml
If the API server pod does not already mount the directory containing these files, add a hostPath volume for it; the kubelet restarts the API server automatically when the static pod manifest changes. If a specific workload is causing the spike:
# Pause a deployment (stops rolling updates):
kubectl rollout pause deployment/<name> -n <namespace>
# Temporarily disable HPA:
kubectl patch hpa <hpa-name> -n <namespace> -p '{"spec":{"minReplicas":1,"maxReplicas":1}}'
# Or delete the HPA:
kubectl delete hpa <hpa-name> -n <namespace>
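# If a CronJob is flooding events, suspending it is another option
# (illustrative; substitute your CronJob's name and namespace):
kubectl patch cronjob <cronjob-name> -n <namespace> -p '{"spec":{"suspend":true}}'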
# After stabilizing, resume:
kubectl rollout resume deployment/<name> -n <namespace>
Set up alerting for event rate limits:
# If using Prometheus, scrape API server metrics. EventRateLimit rejections
# surface in the admission metrics, for example:
# apiserver_admission_controller_admission_duration_seconds_count{name="EventRateLimit",rejected="true"}
# Create a ServiceMonitor (if using Prometheus Operator). The kubernetes
# Service in the default namespace fronts the API server:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: apiserver-events
spec:
  selector:
    matchLabels:
      component: apiserver
  namespaceSelector:
    matchNames:
    - default
  endpoints:
  - port: https
    scheme: https
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true
Set up alerts when the rate limit is triggered frequently.
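A matching alert rule might look like the sketch below; the threshold is illustrative and the exact metric labels can vary by Kubernetes version:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: event-rate-limit-alerts
spec:
  groups:
  - name: event-rate-limit
    rules:
    - alert: EventRateLimitRejections
      expr: rate(apiserver_admission_controller_admission_duration_seconds_count{name="EventRateLimit",rejected="true"}[5m]) > 1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: API server is rejecting events via EventRateLimit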
Event rate limiting is crucial for API server stability. While increasing the limit helps, it's better to reduce the root cause (app crashes, rapid deployments, node failures). In very large clusters (1000+ nodes), a per-namespace limit of 5 QPS is often insufficient; consider 20-50 QPS. The EventRateLimit admission plugin is optional and not enabled by default. For managed clusters (AKS, EKS, GKE), you may not be able to adjust limits directly; contact cloud provider support. Events also consume etcd storage: even without rate limiting, thousands of events per second will fill etcd. Use the event TTL (default 1 hour) to prevent unbounded growth. To build dashboards, track rate-limit rejections via the admission metrics above or by watching API server logs for rate-limit messages. Some clusters use log aggregation to retain event history separately from the Events API.
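Event retention is controlled by a kube-apiserver flag; shortening it (shown below with an illustrative value) trades event history for etcd headroom:
# In /etc/kubernetes/manifests/kube-apiserver.yaml:
- --event-ttl=30m  # Default is 1h0m0s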