PIDPressure indicates the node is running out of available process IDs (PIDs). Once the limit is reached, new processes and containers cannot start. This typically affects nodes running many concurrent processes or containers without per-pod PID limits.
The kubelet sets PIDPressure=True when the node's available PIDs fall below its eviction threshold. The Linux kernel has a configurable maximum PID count, exposed at /proc/sys/kernel/pid_max. When this limit is reached:
1. New processes cannot be spawned
2. Container startups fail with "cannot allocate memory" or "fork() failed" errors
3. New pods cannot start
4. The kubelet itself may become unresponsive
This is distinct from memory pressure: the system has free memory but cannot create new process structures.
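The kubelet derives this condition from its pid.available eviction signal, which can be tuned in the kubelet configuration. A minimal sketch, assuming a file-based KubeletConfiguration; the 10% threshold is illustrative, and setting evictionHard may replace the kubelet's default thresholds, so include any others you rely on:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Report PIDPressure and start evicting when fewer than 10% of PIDs remain available
evictionHard:
  pid.available: "10%"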
Check node status:
kubectl get nodes
kubectl describe node <node-name> | grep PIDPressure
Look for "PIDPressure: True". Also check:
kubectl get nodes -o custom-columns='NAME:.metadata.name,PID:.status.conditions[?(@.type=="PIDPressure")].status'
SSH into the affected node:
cat /proc/sys/kernel/pid_max # Maximum PIDs allowed
ps aux | wc -l # Approximate current process count
ps -eo pid= | wc -l # Exact process count (no header line)
Alternatively, count threads, since each thread consumes a PID:
grep -s ^Threads /proc/[0-9]*/status | awk '{sum += $2} END {print sum}' # Total threads
If current usage exceeds 85-90% of pid_max, act before the pressure condition triggers.
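To turn those raw numbers into a utilization figure, the two can be combined into one small sketch (it counts threads, since each thread occupies a PID slot):
used=$(grep -s ^Threads /proc/[0-9]*/status | awk '{sum += $2} END {print sum}')
max=$(cat /proc/sys/kernel/pid_max)
echo "PID usage: ${used}/${max} ($(( used * 100 / max ))%)"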
Find which processes/containers use the most PIDs:
for p in /proc/[0-9]*/; do
echo "$(basename $p): $(cat $p/status 2>/dev/null | grep ^Threads | awk '{print $2}')"
done | sort -t: -k2 -rn | head -10
Or use pstree:
pstree -p | head -50 # Process tree showing all PIDs
ps -eo comm= | sort | uniq -c | sort -rn | head -10 # Process count per command
Look for container processes spawning many children (e.g., shell, Python with multiprocessing).
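To tie a heavy PID back to a pod, inspect its cgroup path or go through the container runtime. The commands below assume containerd with crictl installed on the node; <pid> and <container-id> are placeholders:
cat /proc/<pid>/cgroup # The cgroup path usually embeds the pod UID and container ID
crictl ps # List running containers
crictl inspect <container-id> | grep -i '"pid"' # The container's init PID on the host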
Zombie processes consume PIDs until reaped:
ps aux | grep "<defunct>" # Show zombies
ps aux | grep -c "<defunct>" # Count zombies
To see which parent created zombies:
ps -eo ppid=,stat= | awk '$2 ~ /^Z/ {print $1}' | sort | uniq -c | sort -rn # Zombie count per parent PID
If many zombies exist, the parent process isn't reaping its children. This is a bug in the application or container runtime:
- Restart the parent process
- Restart the container runtime (Docker/containerd)
- Kill the zombie's parent: kill -9 <ppid> (the zombies are re-parented to init/PID 1 and reaped)
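If the application itself cannot be fixed quickly, one Kubernetes-level mitigation is to share the pod's PID namespace so the pause container acts as PID 1 and reaps orphaned zombies. A minimal sketch (shareProcessNamespace is a standard Pod spec field; the name and image are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: zombie-prone-app
spec:
  # The pause container becomes PID 1 for all containers and reaps zombies
  shareProcessNamespace: true
  containers:
  - name: app
    image: myapp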
Check and increase pid_max:
cat /proc/sys/kernel/pid_max # Current limit
sudo sysctl kernel.pid_max # View via sysctl
Increase it:
# Temporary (until reboot)
sudo sysctl -w kernel.pid_max=4194303 # Maximum value on 64-bit
# Permanent (survives reboot)
echo "kernel.pid_max = 4194303" | sudo tee -a /etc/sysctl.d/99-kubernetes.conf
sudo sysctl -p /etc/sysctl.d/99-kubernetes.conf
Verify:
cat /proc/sys/kernel/pid_max
For production, use this value in your node provisioning/IaC (Terraform, Ansible).
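If you provision nodes with Ansible, for example, the sysctl module captures the same change. A sketch, assuming the ansible.posix collection is installed:
- name: Raise kernel.pid_max on Kubernetes nodes
  ansible.posix.sysctl:
    name: kernel.pid_max
    value: "4194303"
    sysctl_file: /etc/sysctl.d/99-kubernetes.conf
    state: present
    reload: true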
To keep a runaway container contained, set resource limits on the pod. Note that the Pod spec has no PID limit field:
apiVersion: v1
kind: Pod
metadata:
  name: limited-app
spec:
  securityContext:
    runAsUser: 1000
  containers:
  - name: app
    image: myapp
    resources:
      limits:
        cpu: "1"
        memory: "512Mi"
    # Note: bounds CPU and memory only; PIDs need a kubelet-level limit (see below)
For an actual per-pod PID cap, configure the kubelet's podPidsLimit, which it enforces through the pids cgroup controller; see the sketch below.
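A minimal KubeletConfiguration sketch (podPidsLimit is the standard field; the 1024 value is illustrative and should be sized to your workloads):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Maximum number of PIDs any single pod may use, enforced via the pids cgroup controller
podPidsLimit: 1024
Restart the kubelet after changing its configuration file (or use the --pod-max-pids flag if you configure the kubelet via flags).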
With containerd, /etc/containerd/config.toml does not expose a per-container PID limit; the pod-level cap set by the kubelet is what gets enforced. With Docker, you can additionally set a default nproc ulimit in /etc/docker/daemon.json:
{
  "default-ulimits": {
    "nproc": {
      "Name": "nproc",
      "Hard": 1024,
      "Soft": 1024
    }
  }
}
Restart the container runtime after changes. Keep in mind that nproc is a per-user process limit, not a strict per-container PID cap.
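Typical restart commands, assuming systemd-managed runtimes; drain the node first if the workloads are sensitive to disruption:
kubectl drain <node-name> --ignore-daemonsets
sudo systemctl restart containerd # or: sudo systemctl restart docker
kubectl uncordon <node-name>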
If a specific pod is consuming excessive PIDs, terminate it:
# First, identify the pod
kubectl get pods --all-namespaces
# Delete the pod
kubectl delete pod <pod-name> -n <namespace>
For Deployments, the pod will be recreated. If you want to prevent recreation:
kubectl delete deployment <deployment-name>
For temporary relief, scale down workloads:
kubectl scale deployment <name> --replicas=0
Then investigate why the pod uses so many PIDs before scaling back up.
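One way to check where the PIDs go, assuming the container image includes a shell (<pod-name> and <namespace> are placeholders):
# Count the processes running inside the suspect container via /proc (works without ps)
kubectl exec <pod-name> -n <namespace> -- sh -c 'ls -d /proc/[0-9]* | wc -l'
# If the image ships ps, list processes with their parents
kubectl exec <pod-name> -n <namespace> -- ps -ef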
Set up monitoring and alerting:
# Check PID usage regularly
watch -n 5 "ps aux | wc -l"
Enable kubelet event alerts:
kubectl get events -A --sort-by=.metadata.creationTimestamp | grep -i pid
Add Prometheus monitoring; kube-state-metrics exports the node condition:
kube_node_status_condition{condition="PIDPressure",status="true"}
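If you run the Prometheus Operator, the same expression can drive an alert. A sketch; the rule name and 5-minute window are illustrative:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-pid-pressure
spec:
  groups:
  - name: node-pressure
    rules:
    - alert: NodePIDPressure
      expr: kube_node_status_condition{condition="PIDPressure",status="true"} == 1
      for: 5m
      annotations:
        summary: "Node {{ $labels.node }} reports PIDPressure"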
Implement resource limits on deployments (CPU and memory limits do not cap PIDs, but they keep runaway workloads contained):
resources:
  limits:
    cpu: "2"
    memory: "1Gi"
For applications that spawn many processes, use process pooling or async patterns to reduce PID usage. Review application logs for unexpected process creation.
PIDPressure is less common than memory or disk pressure, but it appears more often in clusters running batch jobs or message queues that spawn many workers. Bash and Python applications using multiprocessing are major PID consumers. Docker Desktop on macOS/Windows may have an artificially low pid_max inside its VM; increase it in Docker Desktop settings. With systemd and cgroup v2, PID limits are enforced per cgroup hierarchy via the pids controller. Zombie processes are a sign of poor process reaping; ensure parent processes handle SIGCHLD properly. In high-throughput applications, consider event-driven architectures (goroutines in Go, async/await in Python) over process-per-task models. WSL2 distributions may also ship with a low kernel.pid_max, so check /proc/sys/kernel/pid_max there as well. Long-running containers can accumulate PIDs from terminated but un-reaped processes; monitor container exit codes and restart policies. Kubernetes has no native PID requests/limits in the Pod spec the way it does for memory and CPU, so PID management is primarily a node-level (kubelet and runtime) concern.
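On a cgroup v2 node using the systemd cgroup driver, the pids controller can be inspected directly. The kubepods.slice path below is typical but varies by cgroup driver and distro, and the files only exist where the controller is enabled:
cat /sys/fs/cgroup/kubepods.slice/pids.current # PIDs currently used by the Kubernetes pods subtree
cat /sys/fs/cgroup/kubepods.slice/pids.max # Limit for that subtree ("max" means unlimited)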