A failing startup probe prevents a container from ever becoming ready. The startup probe (introduced in Kubernetes 1.16 and enabled by default since 1.18) checks whether an application has finished starting up. If it keeps failing, the container is killed and restarted after failureThreshold consecutive failures.
The startup probe is designed specifically for slow-starting applications. It runs before the liveness and readiness probes:

1. **Startup probe** runs first: checks whether the app has finished initializing
2. **Liveness probe** runs continuously: checks whether the app is still alive
3. **Readiness probe** runs continuously: checks whether the app is ready for traffic

If the startup probe fails:

- Liveness and readiness probes don't run yet
- The container gets more time to start (failureThreshold × periodSeconds)
- If it is still failing after the threshold, the container is killed and restarted

For slow-starting apps this is better than setting a high initialDelaySeconds on the liveness/readiness probes (see the combined example below).
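For reference, here is a minimal sketch showing how the three probes can be declared together on one container. The startup probe values match the example later in this article; the liveness and readiness settings are illustrative assumptions, not values taken from it.

```yaml
containers:
- name: app
  image: myapp:latest
  ports:
  - containerPort: 8080
  startupProbe:                 # runs first; the other probes wait until it succeeds
    httpGet:
      path: /health
      port: 8080
    failureThreshold: 30        # up to 30 × 10 s = 300 s to finish starting
    periodSeconds: 10
  livenessProbe:                # takes over once the startup probe has succeeded
    httpGet:
      path: /health
      port: 8080
    periodSeconds: 10
  readinessProbe:               # gates Service traffic
    httpGet:
      path: /health
      port: 8080
    periodSeconds: 5
```

Once the startup probe succeeds for the first time it never runs again for that container; the liveness and readiness probes continue for the life of the pod.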
View the probe definition:
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 10 "startupProbe:"
kubectl describe pod <pod-name> -n <namespace>

Example configuration:
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # Retry 30 times
  periodSeconds: 10      # Every 10 seconds
  # Total: 30 * 10 = 300 seconds (5 minutes) before giving up

Calculate the total time as failureThreshold × periodSeconds.
Measure how long the app takes to become healthy:
# Watch logs during startup:
kubectl logs <pod-name> -n <namespace> -f
# Look for startup completion messages:
grep -i "started\|listening\|ready\|initialization complete" /tmp/app.log
# Time the startup:
time curl http://localhost:8080/health
# Or monitor resource usage during startup:
watch 'kubectl top pod <pod-name> -n <namespace>'

Note the time when the health endpoint first responds successfully.
Verify that required services are reachable before app startup:
# From inside the pod:
kubectl exec -it <pod-name> -n <namespace> -- sh
# Check database:
nc -zv database-host 5432
psql -h database-host -U user -d mydb -c "SELECT 1"
# Check message queue:
nc -zv rabbitmq-host 5672
# Check cache:
redis-cli -h redis-host ping
# Check DNS resolution:
nslookup database-host
# Verify network connectivity:
ping -c 2 database-host

If dependencies are unavailable, the startup probe will fail.
Allow more retries for slow-starting applications:
# Patch the deployment:
kubectl patch deployment <name> -n <namespace> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","startupProbe":{"failureThreshold":60}}]}}}}'
# Or edit YAML:
kubectl edit deployment <name> -n <namespace>

Example:
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 60   # Retry 60 times
  periodSeconds: 5       # Every 5 seconds
  timeoutSeconds: 10     # Each attempt times out after 10 seconds
  # Total: 60 * 5 = 300 seconds (5 minutes) before giving up

Increasing failureThreshold allows more retries before the container is restarted; the total startup window is still failureThreshold × periodSeconds, so adjust both values to give the application enough time to initialize.
Verify the health check responds appropriately during startup:
# Option 1: Run pod with sleep to test during startup:
kubectl run test-pod --image=<app-image> -- sleep 1000
kubectl exec -it test-pod -- sh
# Option 2: Check logs to see when health endpoint responds:
kubectl logs <pod-name> -n <namespace> | grep -i health
# Option 3: Add startup logging to your app:
echo "App starting at $(date)" >> /tmp/startup.log
sleep 30 # Simulate initialization
echo "App ready at $(date)" >> /tmp/startup.logThe health endpoint must return failure (or timeout) until initialization completes.
Create an endpoint that returns success only when fully initialized:
# Python/Flask example:
import threading
import time

from flask import Flask, jsonify

app = Flask(__name__)
initialization_complete = False

def initialize_app():
    global initialization_complete
    print("Starting initialization...")
    # Simulate slow initialization
    time.sleep(10)
    # App-specific steps (placeholders, not defined here):
    # db.connect()     # connect to the database
    # load_config()    # load configuration
    # warm_cache()     # warm up the cache
    print("Initialization complete")
    initialization_complete = True

# Start initialization in the background so the server can answer probes immediately
thread = threading.Thread(target=initialize_app, daemon=False)
thread.start()

@app.route('/health', methods=['GET'])
def health():
    if initialization_complete:
        return jsonify({'status': 'healthy'}), 200
    else:
        return jsonify({'status': 'initializing'}), 503  # 503 = not ready yet

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

Return 200 only when fully initialized. Use 503 while initializing.
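To pair with this endpoint, a startup probe can be sized to the app's measured initialization time. The /health path and port 8080 match the Flask sketch above; the timing values below are assumptions for an app that initializes in roughly 10 seconds.

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 2       # check every 2 seconds so readiness is detected quickly
  failureThreshold: 15   # 15 × 2 s = 30 s budget for a ~10 s initialization
```

A short period lets the pod become ready soon after initialization finishes, while the threshold still leaves headroom if startup is slower than expected.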
Run setup tasks before the main container starts:
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
  - name: wait-for-db
    image: busybox
    command: ['sh', '-c', 'until nc -z database-host 5432; do echo waiting for db; sleep 2; done']
  - name: migrate-db
    image: myapp:latest
    command: ['sh', '-c', './migrate.sh']
  containers:
  - name: app
    image: myapp:latest
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30
      periodSeconds: 2
      # Now the app only needs 60 seconds max (initialization already done)

Init containers complete before the main container starts, reducing the time the startup probe needs to allow.
Check that the pod has enough CPU and memory to initialize:
# View current resource constraints:
kubectl describe pod <pod-name> -n <namespace> | grep -E "Limits|Requests"
# Increase resources:
kubectl set resources deployment <name> -n <namespace> \
--requests=cpu=500m,memory=512Mi \
--limits=cpu=1,memory=1Gi
# Or edit the YAML:
resources:
  requests:
    cpu: 500m       # Needed for scheduling
    memory: 512Mi
  limits:
    cpu: 1000m      # Max allowed
    memory: 1Gi

Insufficient resources cause slow startup, making the probe fail.
Startup probes are ideal for slow-starting applications (for example JVM or .NET services) because they remove the need for a high initialDelaySeconds on the liveness and readiness probes. The startup probe runs only until its first success; after that it is ignored and the liveness and readiness probes take over. Use a startup probe for apps that take more than about 30 seconds to initialize; otherwise liveness/readiness probes with a higher initialDelaySeconds are sufficient. A startup probe's periodSeconds is usually low (2-5 seconds) so that startup completion is detected quickly.

For containerized Java/Spring Boot applications, startup can take 30-60 seconds on a cold start and 5-10 seconds on a warm start, mostly due to class loading and JIT compilation. JVM tuning flags such as -XX:TieredStopAtLevel=1 can speed up startup (see the sketch below). On older clusters without startup probe support, use a high initialDelaySeconds on the liveness probe as a workaround.
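As an illustration of the Java advice above, a container spec might combine the tuning flag with a startup probe sized for a 30-60 second cold start. The image name, the use of JAVA_TOOL_OPTIONS to pass the flag, and the Spring Boot Actuator health path are assumptions, not details from this article.

```yaml
containers:
- name: app
  image: my-spring-boot-app:latest
  env:
  - name: JAVA_TOOL_OPTIONS            # picked up automatically by the JVM
    value: "-XX:TieredStopAtLevel=1"   # limit JIT tiering to shorten startup
  ports:
  - containerPort: 8080
  startupProbe:
    httpGet:
      path: /actuator/health           # Spring Boot Actuator endpoint (assumed)
      port: 8080
    periodSeconds: 5
    failureThreshold: 24               # 24 × 5 s = 120 s budget for a cold start
```

The 120-second budget is deliberately larger than the expected 30-60 second cold start so that an occasional slow start does not trigger a restart.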