Pod CPU limits cause the kernel to throttle container processes when they exceed configured limits. Unlike memory, throttled pods continue running but with reduced performance. Fix by adjusting CPU requests/limits, enabling HPA, or removing inappropriate limits.
When a Kubernetes pod specifies a CPU limit, the Linux CFS (Completely Fair Scheduler) enforces that limit by throttling: the kernel pauses the container's processes so they cannot exceed the allocated CPU share. Unlike memory limits, which kill pods that exceed them, CPU throttling only reduces throughput and increases latency. The container keeps running, but at degraded performance, causing:
- Increased API response times
- Timeout failures
- Queue backlogs
- Cascading failures in dependent services
CPU throttling becomes apparent when you compare the container's actual CPU usage (from metrics) against its observed performance degradation.
Identify if throttling is actually happening:
# Check pod resource usage:
kubectl top pod -n <namespace> <pod-name>
# View current limits:
kubectl describe pod -n <namespace> <pod-name> | grep -A 4 "Limits\|Requests"
# Check throttling metrics in Prometheus:
container_cpu_cfs_throttled_seconds_total
container_cpu_cfs_throttled_periods_total
Compare the pod's actual CPU usage against its limit.
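To quantify how much throttling is happening, a rough sketch using the counters above plus the standard container_cpu_cfs_periods_total counter computes the fraction of scheduler periods in which the container was throttled:
# Fraction of CFS periods throttled per pod (values approaching 1 mean near-constant throttling):
sum by (pod) (rate(container_cpu_cfs_throttled_periods_total[5m]))
  / sum by (pod) (rate(container_cpu_cfs_periods_total[5m]))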
Determine what CPU your application actually needs:
# Monitor during typical load:
kubectl top pod -n <namespace> <pod-name> --containers
# Track over time using Prometheus:
rate(container_cpu_usage_seconds_total[5m])
# Check 95th percentile usage:
quantile_over_time(0.95, rate(container_cpu_usage_seconds_total[5m])[1d:5m])
Use this data to set appropriate requests and limits.
Edit the deployment to increase CPU limits:
kubectl edit deployment -n <namespace> <deployment-name>
Update the limits:
resources:
  limits:
    cpu: "2"        # Increase from current limit
  requests:
    cpu: "1"        # Set to your baseline
New pods will respect the updated limits:
kubectl rollout restart deployment -n <namespace> <deployment-name>
Best practice is to set requests at or slightly above typical sustained usage, and limits higher to absorb burst traffic:
resources:
  requests:
    cpu: "500m"     # What the pod consistently needs
  limits:
    cpu: "1000m"    # Maximum burst capacity
The difference between request and limit allows for temporary spikes without throttling.
Scale pods based on CPU usage instead of trying to give each pod unlimited CPU:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
When average CPU usage exceeds 70% of the requested CPU, Kubernetes adds more pod replicas.
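A short usage sketch (assumes metrics-server or another metrics API is installed; the filename is illustrative):
kubectl apply -f app-hpa.yaml
# Watch current utilization and replica count as load changes:
kubectl get hpa app-hpa -n <namespace> -w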
Review the application for inefficiencies:
- Profile CPU usage: perf, py-spy, pprof
- Check for infinite loops or busy-wait patterns
- Reduce unnecessary computations in request paths
- Use caching (Redis, memcached) to avoid repeated work
- Batch operations to reduce per-request overhead
- Use async/await patterns for I/O-bound work
Example with pprof (Go):
import _ "net/http/pprof" // registers /debug/pprof handlers on http.DefaultServeMux
// Expose them with: go func() { http.ListenAndServe("localhost:6060", nil) }()
// Then visit http://localhost:6060/debug/pprof/
After making changes, monitor performance:
# Watch real-time pod CPU:
kubectl top pod -n <namespace> -w
# Check metrics over longer periods:
# - API response times (p50, p95, p99)
# - Error rates
# - Throughput (requests/second)
# Verify throttling has stopped:
# (throttled_seconds should stop increasing)
container_cpu_cfs_throttled_seconds_total
Performance should normalize within minutes.
Set up alerts to prevent future throttling:
# Prometheus alert:
- alert: PodCPUThrottling
  expr: |
    rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.1
  for: 5m
  annotations:
    summary: "Pod {{ $labels.pod }} is being throttled"
Monitor in Grafana:
- Graph: rate(container_cpu_cfs_throttled_seconds_total[5m])
- Set threshold alert at 10% throttling rate
### Understanding CFS Throttling
The Linux CFS scheduler divides CPU time into periods (default 100ms). When a container reaches its limit within a period, the kernel throttles it until the next period starts. This is why latency becomes unpredictable—requests that cross period boundaries experience severe delays.
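To see the numbers behind this, a sketch assuming cgroup v2 (a 500m limit maps to a 50ms quota per 100ms period):
# cgroup v2: prints "<quota_us> <period_us>", e.g. "50000 100000" for a 500m limit
kubectl exec -n <namespace> <pod-name> -- cat /sys/fs/cgroup/cpu.max
# cgroup v1 equivalent:
kubectl exec -n <namespace> <pod-name> -- cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us /sys/fs/cgroup/cpu/cpu.cfs_period_us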
### CPU Requests vs Limits
- Request: Guaranteed minimum CPU (used by scheduler for pod placement)
- Limit: Hard ceiling on CPU usage (enforced by kernel throttling)
Setting request=limit removes burst capacity. Better practice:
- request = 80th percentile of your usage
- limit = 95th percentile or 2x request
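For example, with illustrative measurements of roughly 400m at the 80th percentile and 700m at the 95th, that guidance would translate to:
resources:
  requests:
    cpu: "400m"     # ~80th percentile of observed usage
  limits:
    cpu: "800m"     # 2x request, comfortably above the observed 95th percentile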
### QoS Classes
Kubernetes assigns QoS based on requests/limits:
- Guaranteed: request == limit (protected from eviction)
- Burstable: request < limit (can be evicted under pressure)
- BestEffort: no request or limit (evicted first)
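To confirm which class a pod received, check its status field:
kubectl get pod -n <namespace> <pod-name> -o jsonpath='{.status.qosClass}'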
### Removing CPU Limits
Some organizations remove CPU limits entirely to avoid throttling:
resources:
  requests:
    cpu: "500m"
  # No limits - the pod can burst up to the node's available CPU
This is only safe if you have:
- Good node resource monitoring
- Pod Disruption Budgets configured
- Sufficient cluster headroom
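One way to gauge that headroom (a sketch; kubectl top requires metrics-server):
# Current CPU/memory consumption per node:
kubectl top nodes
# Requested vs allocatable resources per node:
kubectl describe nodes | grep -A 8 "Allocated resources"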
### Multi-core Applications
Applications using multiple threads/processes should request multiple CPUs:
resources:
  requests:
    cpu: "4"        # 4 cores
  limits:
    cpu: "8"        # Burst to 8 cores
### Metric Interpretation
- High throttled_seconds = significant CPU contention
- A throttled_periods count close to the total periods count (container_cpu_cfs_periods_total) indicates severe throttling
- Zero throttling with low usage = limits set too high
### Alternative: Remove Limits Entirely
For environments where noisy neighbors are not a concern:
resources:
  requests:
    cpu: "1"
  # limits: omitted - no throttling
How to fix "eks subnet not found" in Kubernetes
unable to compute replica count
How to fix "unable to compute replica count" in Kubernetes HPA
error: context not found
How to fix "error: context not found" in Kubernetes
default backend - 404
How to fix "default backend - 404" in Kubernetes Ingress
serviceaccount cannot list resource
How to fix "serviceaccount cannot list resource" in Kubernetes