The "Memory resource exceeded" error occurs when a pod uses more memory than its limit, causing the kernel to OOMKill (out-of-memory kill) the container. This terminates the pod immediately, disrupting services and data processing. Requires investigating memory leaks and adjusting limits appropriately.
Memory is an incompressible resource in Kubernetes. When a pod exceeds its memory limit (set via `limits.memory`), the Linux kernel's OOM (Out-Of-Memory) killer forcibly terminates the container to prevent the entire node from running out of memory. Unlike CPU throttling, which degrades performance gracefully, an OOMKill is sudden and can cause:
- Immediate pod termination without a graceful shutdown
- Data loss if the application was writing to disk
- Connection resets for clients
- Pod restarts (if the restart policy allows), which turn into a crash loop if the memory issue isn't fixed
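To confirm that a restart really was an OOMKill (rather than an application crash or an eviction), check the container's last terminated state. A minimal sketch using the official Kubernetes Python client, with placeholder pod and namespace names:
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

# "my-app-123" and "default" are placeholders for illustration.
pod = v1.read_namespaced_pod(name="my-app-123", namespace="default")
for status in pod.status.container_statuses or []:
    last = status.last_state.terminated
    if last and (last.reason == "OOMKilled" or last.exit_code == 137):
        print(f"{status.name}: OOMKilled at {last.finished_at} (exit code {last.exit_code})")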
Check what memory is allocated:
# View memory limit and usage:
kubectl describe pod <pod-name> -n <namespace> | grep -E "Limits|Requests" -A 3
# More detailed:
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.limits.memory}'
# Current usage:
kubectl top pods <pod-name> -n <namespace>
# Watch memory growth:
watch 'kubectl top pods <pod-name> -n <namespace>'
An example limit is 512Mi (mebibytes) or 1Gi (gibibytes).
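Since kubectl reports limits in Kubernetes quantity notation while many tools report raw bytes, a small conversion helper can make comparisons easier. A sketch in Python (the helper name and unit table are illustrative, not part of kubectl):
# Convert Kubernetes memory quantities (e.g. "512Mi", "1Gi", "500M") to bytes.
# Binary suffixes are listed first so "Gi" is matched before "G".
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
         "k": 1000, "M": 1000**2, "G": 1000**3}

def quantity_to_bytes(quantity: str) -> int:
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * factor)
    return int(quantity)  # plain byte count, no suffix

print(quantity_to_bytes("512Mi"))  # 536870912
print(quantity_to_bytes("1Gi"))    # 1073741824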
Determine if memory is leaking or legitimately needed:
# For Java applications, capture heap dump:
kubectl exec -it <pod-name> -n <namespace> -- jmap -dump:live,format=b,file=/tmp/heap.bin <pid>
kubectl cp <pod-name>:/tmp/heap.bin /tmp/heap.bin -n <namespace>
# Analyze with JProfiler or similar tool
# For Node.js, enable the inspector on the running process (Node listens for SIGUSR1):
kubectl exec <pod-name> -n <namespace> -- kill -USR1 1   # assumes the node process is PID 1
kubectl port-forward <pod-name> -n <namespace> 9229:9229
# Then attach Chrome DevTools (chrome://inspect) and take a heap snapshot
# For Python:
kubectl exec <pod-name> -n <namespace> -- pip install memory-profiler
kubectl exec <pod-name> -n <namespace> -- python -m memory_profiler myapp.py
# Generic approach - check memory over time:
watch 'kubectl top pod <pod-name> -n <namespace>'
If memory increases indefinitely without processing more work, it's a leak.
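For Python services, the standard-library tracemalloc module can point at where the growth comes from. A minimal sketch, assuming you can temporarily add instrumentation to the application:
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

# ... exercise the suspected leaky code path for a while ...

current = tracemalloc.take_snapshot()
# Show the ten source lines whose allocations grew the most since the baseline.
for stat in current.compare_to(baseline, "lineno")[:10]:
    print(stat)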
Check if recent code introduced the memory issue:
# View git history for recent changes:
git log --oneline -n 20
git diff HEAD~5..HEAD # Changes in last 5 commits
# Look for:
# - New caching without eviction policies
# - Loading entire files/datasets into memory
# - Accumulating collections (lists, maps) in global scope
# - Missing close/cleanup of resources
Revert problematic changes:
git revert <commit-hash>
Increase the memory limit to prevent crashes:
# For a deployment:
kubectl set resources deployment <name> -n <namespace> --limits=memory=2Gi
# Or patch:
kubectl patch deployment <name> -n <namespace> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","resources":{"limits":{"memory":"2Gi"}}}]}}}}'
# Or edit YAML:
kubectl edit deployment <name> -n <namespace>
# Change: limits.memory from 512Mi to 2Gi
kubectl apply -f deployment.yaml
Note: this is a temporary measure while debugging. Address the root cause (such as a memory leak) for a permanent fix.
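If you prefer to apply the same change from a script, the official Kubernetes Python client can patch the deployment; a sketch with placeholder deployment and container names:
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Strategic-merge patch that raises only the memory limit; "my-app" and
# "my-container" are placeholders.
patch = {"spec": {"template": {"spec": {"containers": [
    {"name": "my-container", "resources": {"limits": {"memory": "2Gi"}}}
]}}}}
apps.patch_namespaced_deployment(name="my-app", namespace="default", body=patch)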
Add runtime memory monitoring:
Java:
MemoryMXBean memBean = ManagementFactory.getMemoryMXBean();
MemoryUsage heapUsage = memBean.getHeapMemoryUsage();
long maxMemory = heapUsage.getMax() / (1024 * 1024); // MB
long usedMemory = heapUsage.getUsed() / (1024 * 1024);
System.out.println("Heap: " + usedMemory + "/" + maxMemory);
Node.js:
const used = process.memoryUsage();
console.log(`Memory: ${Math.round(used.heapUsed / 1024 / 1024)} MB`);
Python:
import os
import psutil
process = psutil.Process(os.getpid())
mem = process.memory_info().rss / 1024 / 1024  # resident set size, in MB
print(f"Memory: {mem:.1f} MB")
Log memory periodically to detect growth patterns.
Reduce memory consumption:
Streaming instead of loading:
# Bad: loads entire file into memory
data = open('large_file.csv').read()
# Good: read line by line
with open('large_file.csv') as f:
    for line in f:
        process(line)
Pagination instead of loading all:
# Bad: SELECT * FROM users
users = db.query(User).all() # Million users in memory!
# Good: paginate
page = db.query(User).offset(0).limit(100).all()
Cache eviction:
from functools import lru_cache
@lru_cache(maxsize=100) # Limit cache size
def expensive_operation(x):
    return compute(x)
Avoid global caches that grow unbounded.
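If a module-level cache is unavoidable, give it an explicit size bound so old entries are evicted instead of accumulating forever. A standard-library-only sketch of a simple LRU wrapper (the class name is illustrative):
from collections import OrderedDict

class BoundedCache:
    """Dict-like cache that evicts the least recently used entry beyond max_size."""

    def __init__(self, max_size: int = 1000) -> None:
        self._data: OrderedDict = OrderedDict()
        self._max_size = max_size

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        return default

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # drop the oldest entry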
Configure memory request to match actual need:
kubectl set resources deployment <name> -n <namespace> --requests=memory=512Mi
# Or edit:
kubectl edit deployment <name> -n <namespace>
Guideline:
- Request: Amount needed for normal operation (used by scheduler)
- Limit: Request + headroom for spikes (hard cap)
Example:
resources:
  requests:
    memory: 512Mi   # Scheduler reserves this
  limits:
    memory: 1Gi     # Cannot exceed this
If the request is too high, the pod won't fit on nodes. If it's too low, the scheduler doesn't reserve enough.
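One way to arrive at concrete numbers is to base the request on observed steady-state usage and add headroom for the limit; an illustrative calculation (the 95th-percentile rule and 50% headroom are assumptions, not Kubernetes defaults):
import math
import statistics

# Hypothetical working-set samples in MiB, e.g. collected from "kubectl top" over a day.
samples_mib = [410, 432, 428, 455, 470, 490, 445, 460, 480, 500]

p95 = statistics.quantiles(samples_mib, n=100)[94]  # 95th percentile of observed usage
request_mib = math.ceil(p95)                        # reserve what normal operation needs
limit_mib = math.ceil(request_mib * 1.5)            # ~50% headroom for spikes

print(f"requests.memory: {request_mib}Mi, limits.memory: {limit_mib}Mi")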
Set up continuous monitoring:
# Install Prometheus in the cluster, e.g. via the kube-prometheus-stack Helm chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack
# Query memory usage:
# PromQL: container_memory_usage_bytes{pod="<pod>"}
# Or use kubectl:
watch 'kubectl top pods -n <namespace> --sort-by=memory'
# Create alerts for memory threshold:
# Alert when pod uses > 80% of limit
alert = "container_memory_usage_bytes / (10^6) > limit * 0.8"Track memory growth over weeks to predict future issues.
OOMKill is the kernel's last resort to protect the node; exit code 137 or the reason "OOMKilled" in the pod status is definitive. Unlike CPU, which can be throttled, memory must be forcibly freed. For JVM apps, set -Xmx well below the container limit (e.g., a 900Mi limit with -Xmx512m), because language runtimes consume overhead beyond the heap: JVM metadata, thread stacks, and similar can add 200-400MB. In production, avoid setting the limit equal to the request; always leave headroom for spikes, and treat resources.limits as a safety net, not the normal operating level. For batch processing, use pagination or streaming; never load entire datasets into memory. Implement health checks that monitor memory and shut down gracefully when approaching the limit. Use LimitRanges to enforce minimum and maximum memory across namespaces, and consider the Vertical Pod Autoscaler (VPA), which can recommend optimal limits based on historical usage.
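As one way to implement the memory-aware health check mentioned above, the container can read its own cgroup limit and usage and report unhealthy near the threshold. A sketch assuming cgroup v2 (under cgroup v1 the files are memory/memory.limit_in_bytes and memory/memory.usage_in_bytes instead):
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup")  # cgroup v2 mount point inside the container

def memory_ratio() -> float:
    """Return used/limit for this container's cgroup, or 0.0 if no limit is set."""
    limit_raw = (CGROUP / "memory.max").read_text().strip()
    if limit_raw == "max":  # no memory limit configured
        return 0.0
    used = int((CGROUP / "memory.current").read_text())
    return used / int(limit_raw)

def healthy(threshold: float = 0.9) -> bool:
    # Wire this into a readiness/liveness endpoint so the pod can shed load
    # or restart gracefully before the kernel OOMKills it.
    return memory_ratio() < threshold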