The "Failed PreStop Hook" error occurs when the PreStop lifecycle hook times out or fails during pod termination. PreStop hooks run before SIGTERM to allow graceful shutdown (connection draining, cleanup), but if they fail, the container is force-killed, causing connection loss and potential data corruption.
When a pod is deleted or evicted, Kubernetes follows a termination sequence:
1. The PreStop hook runs (if configured) - custom shutdown logic
2. The SIGTERM signal is sent to the container's main process (PID 1)
3. The terminationGracePeriodSeconds timer expires (default 30s)
4. SIGKILL forces immediate termination

Note that the grace period covers the whole sequence: time spent in the PreStop hook is subtracted from the time the application has left to react to SIGTERM.

If the PreStop hook fails or times out, the graceful shutdown sequence is interrupted. This causes:
- SIGKILL following almost immediately after SIGTERM, leaving no time for shutdown handlers
- In-flight requests dropped abruptly
- Open connections closed without cleanup
- Database transactions rolled back
- Data loss for stateful applications
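Before digging into node logs, note that the kubelet records the failure as an event on the pod itself. A quick sketch (pod and namespace names are placeholders; the event reason is typically FailedPreStopHook):

# Look for lifecycle hook failures in the pod's events:
kubectl describe pod <pod-name> -n <namespace> | grep -i -A 3 prestop
# Or list the events directly:
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>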
View detailed failure information:
# SSH to the node:
ssh <node-ip>
# Check kubelet logs:
sudo journalctl -u kubelet -n 100 | grep -i prestop
kubectl logs -n kube-system kubelet-<node> # If exposed as pod
# Or collect logs via Kubernetes:
kubectl debug node/<node-name> -it --image=ubuntu
# Inside the debug pod, the node's root filesystem is mounted at /host:
chroot /host journalctl -u kubelet | grep -i prestop

Look for:
- "hook execution timeout"
- "command not found"
- "exit status 1"
Review the pod spec:
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 10 "preStop:"
kubectl get pod <pod-name> -n <namespace> -o yaml | grep "terminationGracePeriodSeconds"

Example configuration:
spec:
  terminationGracePeriodSeconds: 30  # Max time for graceful shutdown
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "curl http://localhost:8080/shutdown"]

Check:
- Is the command correct?
- Does the grace period allow enough time?
- Does the command exist in the container?
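The last two checks can be answered from outside the pod; a quick sketch (the pod name, namespace, and the curl binary are stand-ins for whatever your hook actually uses):

# Does the binary the hook calls exist in the container?
kubectl exec <pod-name> -n <namespace> -- sh -c 'command -v curl'
# What grace period is the pod actually running with?
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.terminationGracePeriodSeconds}'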
Debug by running the hook directly:
# Run the pod in debug mode:
kubectl run test-hook --image=<app-image> -- sleep infinity
kubectl exec -it test-hook -- /bin/sh
# Test the PreStop command:
/bin/sh -c "curl http://localhost:8080/shutdown"
echo $? # Check exit code
# If using a script:
sh -x /app/graceful-shutdown.sh # Run with debug output
# Check if dependencies are reachable:
kubectl exec test-hook -- curl http://drain-service/drain
kubectl exec test-hook -- nc -zv database-host 5432

The command must succeed (return 0).
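It also helps to measure how long the hook takes, so you can size the grace period in the next step (a sketch, run inside the test pod; the endpoint is the same hypothetical one as above):

# Time the PreStop command to see how much of the grace period it needs:
time /bin/sh -c "curl -s http://localhost:8080/shutdown"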
Allow more time for PreStop hook:
# terminationGracePeriodSeconds cannot be changed on a running pod, so patch the controller:
kubectl patch deployment <name> -n <namespace> -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":60}}}}'
# Or edit the deployment:
kubectl edit deployment <name> -n <namespace>

Example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-app
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # Increase from the default 30
      containers:
      - name: app
        image: myapp:latest
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                echo "Waiting for load balancer to drain..."
                sleep 15
                /app/graceful-shutdown.sh

Balance the value: a very high grace period slows down pod eviction and rolling updates, while a very low one forces termination before cleanup finishes.
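Changing the pod template rolls the Deployment automatically; a sketch to confirm the new value is in effect on the replacement pods (names are placeholders):

kubectl rollout status deployment/stateful-app -n <namespace>
kubectl get pod <new-pod-name> -n <namespace> -o jsonpath='{.spec.terminationGracePeriodSeconds}'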
Make the hook robust:
lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - |
        #!/bin/sh
        set -e  # Exit on any error
        set -x  # Log commands
        echo "PreStop hook starting at $(date)"
        # Step 1: Signal graceful shutdown
        curl -s http://localhost:8080/actuator/shutdown || true
        # Step 2: Wait for in-flight requests
        sleep 10
        # Step 3: Close connections
        kill -TERM 1 2>/dev/null || true
        echo "PreStop hook completed"
        exit 0

Use || true to make steps non-fatal if they are not critical.
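To keep the hook from outliving the grace period, you can also bound slow steps with the timeout utility, if the image ships it. A sketch assuming a 30-second grace period and the same hypothetical shutdown endpoint as above:

command:
- /bin/sh
- -c
- |
  # Bound the drain call so the whole hook stays well under the 30s grace period
  timeout 20 curl -s http://localhost:8080/actuator/shutdown || true
  # Short settle time for remaining in-flight requests
  sleep 5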
Verify required services are reachable:
# If PreStop calls a drain endpoint, ensure it's reachable:
kubectl exec <pod> -- curl -v http://drain-service/drain
# Check network policy doesn't block outbound:
kubectl get networkpolicy -A
# Verify the service exists:
kubectl get svc <drain-service> -n <namespace>
# Test from pod:
kubectl exec <pod> -- nslookup drain-service
kubectl exec <pod> -- nc -zv drain-service 8080

If services are unreachable, the hook will time out. Use fallback logic:
command:
- /bin/sh
- -c
- |
  # Try to drain gracefully
  curl -s http://drain-service/drain || echo "Drain service unavailable"
  # Continue with local cleanup regardless
  /app/cleanup.sh

For simple cleanup (an HTTP endpoint), use httpGet instead of exec:
lifecycle:
  preStop:
    httpGet:
      path: /graceful-shutdown
      port: 8080
      scheme: HTTP

This is simpler and more reliable than exec:
- No shell, curl, or other binaries are needed in the container image
- The kubelet makes the HTTP call itself, so failures surface clearly as pod events
- Less scripting means fewer ways for the hook to break
Downside: only works for HTTP endpoints. For complex logic, use exec.
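The kubelet sends the GET to the pod's own IP on the configured port, so you can exercise the endpoint the same way against a throwaway test replica. A sketch (the curlimages/curl image and <pod-ip> are assumptions; use whatever utility image you prefer):

# Call the shutdown endpoint the way the kubelet would (replace <pod-ip>):
kubectl run hook-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -v http://<pod-ip>:8080/graceful-shutdown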
Coordinate shutdown with load balancers:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: graceful-app
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        image: myapp:latest
        readinessProbe:  # Used by the load balancer / Service endpoints
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                # The pod is already marked Terminating and is being removed from
                # Service endpoints; give load balancers time to stop sending traffic
                sleep 15
                # Now drain existing connections
                /app/drain.sh
                sleep 10

The readiness probe keeps the pod behind the load balancer only while it is healthy; once deletion starts, the pod is removed from Service endpoints, and the initial sleep gives that removal time to propagate so new traffic stops arriving while existing connections drain before SIGTERM is sent.
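To see this hand-off happening, you can watch the Service's EndpointSlices while deleting a pod; a rough sketch (the service and pod names are placeholders):

# Terminal 1: watch endpoints for the Service backing the load balancer
kubectl get endpointslice -n <namespace> -l kubernetes.io/service-name=<service-name> -w
# Terminal 2: delete one pod and watch its address disappear before the hook finishes
kubectl delete pod <pod-name> -n <namespace>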
PreStop hooks are critical for graceful shutdown. A PreStop hook that fails or hangs breaks it: a hang consumes the grace period and leads to an almost immediate SIGKILL after SIGTERM, defeating the entire purpose of the grace period. Always set a reasonable terminationGracePeriodSeconds (30-60s for most apps). For connection draining, the sequence is: remove the pod from endpoints (NotReady) → wait for in-flight requests → SIGTERM → cleanup → SIGKILL. Never block the PreStop hook indefinitely; always bound it with a timeout shorter than the grace period. For databases, use connection pools with their own timeouts; don't rely solely on PreStop cleanup. In Kubernetes 1.29+ (enabled by default since 1.30), you can use preStop.sleep for simple delays. For load balancer integration, coordinate with the ingress controller or service mesh (e.g., Istio graceful termination). Always log PreStop hook execution so you can debug failures. Use Pod Disruption Budgets (PDBs) to limit how many pods are evicted at once, so fewer pods go through shutdown simultaneously.
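For reference, a minimal sketch of the preStop.sleep action mentioned above (names are placeholders, and it requires a cluster where the PodLifecycleSleepAction feature is available):

lifecycle:
  preStop:
    sleep:
      seconds: 15  # fixed delay before SIGTERM, no shell or binaries needed

And a PodDisruptionBudget that limits how many of these pods can be disrupted at once:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: graceful-app-pdb
spec:
  minAvailable: 2  # keep at least 2 pods running during voluntary disruptions
  selector:
    matchLabels:
      app: graceful-app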