The "Failed PreStop Hook" error occurs when the PreStop lifecycle hook times out or fails during pod termination. PreStop hooks run before SIGTERM to allow graceful shutdown (connection draining, cleanup), but if they fail, the container is force-killed, causing connection loss and potential data corruption.
When a pod is deleted or evicted, Kubernetes follows a termination sequence:
1. The PreStop hook runs (if configured) - custom shutdown logic
2. The SIGTERM signal is sent to the container's main process (PID 1)
3. The terminationGracePeriodSeconds timer expires (default 30s)
4. SIGKILL forces immediate termination

Note that the grace period covers the whole sequence: time spent in the PreStop hook is subtracted from the time the application has left to react to SIGTERM.

If the PreStop hook fails or times out, the graceful shutdown sequence is interrupted. This causes:
- SIGKILL following almost immediately after SIGTERM, leaving no time for shutdown handlers
- In-flight requests dropped abruptly
- Open connections closed without cleanup
- Database transactions rolled back
- Data loss for stateful applications
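Before digging into node logs, note that the kubelet records the failure as an event on the pod itself. A quick sketch (pod and namespace names are placeholders; the event reason is typically FailedPreStopHook):

# Look for lifecycle hook failures in the pod's events:
kubectl describe pod <pod-name> -n <namespace> | grep -i -A 3 prestop
# Or list the events directly:
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>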
View detailed failure information:
# SSH to the node:
ssh <node-ip>
# Check kubelet logs:
sudo journalctl -u kubelet -n 100 | grep -i prestop
kubectl logs -n kube-system kubelet-<node> # If exposed as pod
# Or collect logs via Kubernetes:
kubectl debug node/<node-name> -it --image=ubuntu
# Inside the debug pod, the node's root filesystem is mounted at /host:
chroot /host journalctl -u kubelet | grep -i prestop

Look for:
- "hook execution timeout"
- "command not found"
- "exit status 1"
Review the pod spec:
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 10 "preStop:"
kubectl get pod <pod-name> -n <namespace> -o yaml | grep "terminationGracePeriodSeconds"

Example configuration:
spec:
  terminationGracePeriodSeconds: 30  # Max time for graceful shutdown
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "curl http://localhost:8080/shutdown"]

Check:
- Is the command correct?
- Does the grace period allow enough time?
- Does the command exist in the container?
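The last two checks can be answered from outside the pod; a quick sketch (the pod name, namespace, and the curl binary are stand-ins for whatever your hook actually uses):

# Does the binary the hook calls exist in the container?
kubectl exec <pod-name> -n <namespace> -- sh -c 'command -v curl'
# What grace period is the pod actually running with?
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.terminationGracePeriodSeconds}'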
Debug by running the hook directly:
# Run the pod in debug mode:
kubectl run test-hook --image=<app-image> -- sleep infinity
kubectl exec -it test-hook -- /bin/sh
# Test the PreStop command:
/bin/sh -c "curl http://localhost:8080/shutdown"
echo $? # Check exit code
# If using a script:
sh -x /app/graceful-shutdown.sh # Run with debug output
# Check if dependencies are reachable:
kubectl exec test-hook -- curl http://drain-service/drain
kubectl exec test-hook -- nc -zv database-host 5432

The command must succeed (return 0).
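It also helps to measure how long the hook takes, so you can size the grace period in the next step (a sketch, run inside the test pod; the endpoint is the same hypothetical one as above):

# Time the PreStop command to see how much of the grace period it needs:
time /bin/sh -c "curl -s http://localhost:8080/shutdown"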
Allow more time for PreStop hook:
# terminationGracePeriodSeconds cannot be changed on a running pod, so patch the controller:
kubectl patch deployment <name> -n <namespace> -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":60}}}}'
# Or edit the deployment:
kubectl edit deployment <name> -n <namespace>

Example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-app
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # Increase from the default 30
      containers:
      - name: app
        image: myapp:latest
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                echo "Waiting for load balancer to drain..."
                sleep 15
                /app/graceful-shutdown.sh

Balance the value: a very high grace period slows down pod eviction and rolling updates, while a very low one forces termination before cleanup finishes.
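Changing the pod template rolls the Deployment automatically; a sketch to confirm the new value is in effect on the replacement pods (names are placeholders):

kubectl rollout status deployment/stateful-app -n <namespace>
kubectl get pod <new-pod-name> -n <namespace> -o jsonpath='{.spec.terminationGracePeriodSeconds}'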
Make the hook robust:
lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - |
        #!/bin/sh
        set -e  # Exit on any error
        set -x  # Log commands
        echo "PreStop hook starting at $(date)"
        # Step 1: Signal graceful shutdown
        curl -s http://localhost:8080/actuator/shutdown || true
        # Step 2: Wait for in-flight requests
        sleep 10
        # Step 3: Close connections
        kill -TERM 1 2>/dev/null || true
        echo "PreStop hook completed"
        exit 0

Use || true to make steps non-fatal if they are not critical.
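To keep the hook from outliving the grace period, you can also bound slow steps with the timeout utility, if the image ships it. A sketch assuming a 30-second grace period and the same hypothetical shutdown endpoint as above:

command:
- /bin/sh
- -c
- |
  # Bound the drain call so the whole hook stays well under the 30s grace period
  timeout 20 curl -s http://localhost:8080/actuator/shutdown || true
  # Short settle time for remaining in-flight requests
  sleep 5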
Verify required services are reachable:
# If PreStop calls a drain endpoint, ensure it's reachable:
kubectl exec <pod> -- curl -v http://drain-service/drain
# Check network policy doesn't block outbound:
kubectl get networkpolicy -A
# Verify the service exists:
kubectl get svc <drain-service> -n <namespace>
# Test from pod:
kubectl exec <pod> -- nslookup drain-service
kubectl exec <pod> -- nc -zv drain-service 8080

If services are unreachable, the hook will time out. Use fallback logic:
command:
- /bin/sh
- -c
- |
  # Try to drain gracefully
  curl -s http://drain-service/drain || echo "Drain service unavailable"
  # Continue with local cleanup regardless
  /app/cleanup.sh

For simple cleanup (an HTTP endpoint), use httpGet instead of exec:
lifecycle:
  preStop:
    httpGet:
      path: /graceful-shutdown
      port: 8080
      scheme: HTTP

This is simpler and more reliable than exec:
- No shell, curl, or other binaries are needed in the container image
- The kubelet makes the HTTP call itself, so failures surface clearly as pod events
- Less scripting means fewer ways for the hook to break
Downside: only works for HTTP endpoints. For complex logic, use exec.
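The kubelet sends the GET to the pod's own IP on the configured port, so you can exercise the endpoint the same way against a throwaway test replica. A sketch (the curlimages/curl image and <pod-ip> are assumptions; use whatever utility image you prefer):

# Call the shutdown endpoint the way the kubelet would (replace <pod-ip>):
kubectl run hook-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -v http://<pod-ip>:8080/graceful-shutdown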
Coordinate shutdown with load balancers:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: graceful-app
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        image: myapp:latest
        readinessProbe:  # Used by the load balancer / Service endpoints
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                # The pod is already marked Terminating and is being removed from
                # Service endpoints; give load balancers time to stop sending traffic
                sleep 15
                # Now drain existing connections
                /app/drain.sh
                sleep 10

The readiness probe keeps the pod behind the load balancer only while it is healthy; once deletion starts, the pod is removed from Service endpoints, and the initial sleep gives that removal time to propagate so new traffic stops arriving while existing connections drain before SIGTERM is sent.
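To see this hand-off happening, you can watch the Service's EndpointSlices while deleting a pod; a rough sketch (the service and pod names are placeholders):

# Terminal 1: watch endpoints for the Service backing the load balancer
kubectl get endpointslice -n <namespace> -l kubernetes.io/service-name=<service-name> -w
# Terminal 2: delete one pod and watch its address disappear before the hook finishes
kubectl delete pod <pod-name> -n <namespace>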
PreStop hooks are critical for graceful shutdown. A PreStop hook that fails or hangs breaks it: a hang consumes the grace period and leads to an almost immediate SIGKILL after SIGTERM, defeating the entire purpose of the grace period. Always set a reasonable terminationGracePeriodSeconds (30-60s for most apps). For connection draining, the sequence is: remove the pod from endpoints (NotReady) → wait for in-flight requests → SIGTERM → cleanup → SIGKILL. Never block the PreStop hook indefinitely; always bound it with a timeout shorter than the grace period. For databases, use connection pools with their own timeouts; don't rely solely on PreStop cleanup. In Kubernetes 1.29+ (enabled by default since 1.30), you can use preStop.sleep for simple delays. For load balancer integration, coordinate with the ingress controller or service mesh (e.g., Istio graceful termination). Always log PreStop hook execution so you can debug failures. Use Pod Disruption Budgets (PDBs) to limit how many pods are evicted at once, so fewer pods go through shutdown simultaneously.
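For reference, a minimal sketch of the preStop.sleep action mentioned above (names are placeholders, and it requires a cluster where the PodLifecycleSleepAction feature is available):

lifecycle:
  preStop:
    sleep:
      seconds: 15  # fixed delay before SIGTERM, no shell or binaries needed

And a PodDisruptionBudget that limits how many of these pods can be disrupted at once:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: graceful-app-pdb
spec:
  minAvailable: 2  # keep at least 2 pods running during voluntary disruptions
  selector:
    matchLabels:
      app: graceful-app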