A failing startup probe prevents a container from ever becoming ready. The startup probe (introduced in Kubernetes 1.16 and enabled by default since 1.18) checks whether an application has finished starting up. If it keeps failing, the container is killed and restarted after failureThreshold consecutive failures.
The startup probe is designed specifically for slow-starting applications. It runs before the liveness and readiness probes:

1. **Startup probe** runs first: checks whether the app has finished initializing
2. **Liveness probe** runs continuously: checks whether the app is still alive
3. **Readiness probe** runs continuously: checks whether the app is ready for traffic

If the startup probe fails:

- Liveness and readiness probes don't run yet
- The container gets more time to start (failureThreshold × periodSeconds)
- If it is still failing after the threshold, the container is killed and restarted

For slow-starting apps this is better than setting a high initialDelaySeconds on the liveness/readiness probes (see the combined example below).
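For reference, here is a minimal sketch showing how the three probes can be declared together on one container. The startup probe values match the example later in this article; the liveness and readiness settings are illustrative assumptions, not values taken from it.

```yaml
containers:
- name: app
  image: myapp:latest
  ports:
  - containerPort: 8080
  startupProbe:                 # runs first; the other probes wait until it succeeds
    httpGet:
      path: /health
      port: 8080
    failureThreshold: 30        # up to 30 × 10 s = 300 s to finish starting
    periodSeconds: 10
  livenessProbe:                # takes over once the startup probe has succeeded
    httpGet:
      path: /health
      port: 8080
    periodSeconds: 10
  readinessProbe:               # gates Service traffic
    httpGet:
      path: /health
      port: 8080
    periodSeconds: 5
```

Once the startup probe succeeds for the first time it never runs again for that container; the liveness and readiness probes continue for the life of the pod.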
View the probe definition:
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 10 "startupProbe:"
kubectl describe pod <pod-name> -n <namespace>

Example configuration:
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # Retry 30 times
  periodSeconds: 10      # Every 10 seconds
  # Total: 30 * 10 = 300 seconds (5 minutes) before giving up

Calculate the total time as failureThreshold × periodSeconds.
Measure how long the app takes to become healthy:
# Watch logs during startup:
kubectl logs <pod-name> -n <namespace> -f
# Look for startup completion messages:
grep -i "started\|listening\|ready\|initialization complete" /tmp/app.log
# Time the startup:
time curl http://localhost:8080/health
# Or monitor resource usage during startup:
watch 'kubectl top pod <pod-name> -n <namespace>'

Note the time when the health endpoint first responds successfully.
Verify that required services are reachable before app startup:
# From inside the pod:
kubectl exec -it <pod-name> -n <namespace> -- sh
# Check database:
nc -zv database-host 5432
psql -h database-host -U user -d mydb -c "SELECT 1"
# Check message queue:
nc -zv rabbitmq-host 5672
# Check cache:
redis-cli -h redis-host ping
# Check DNS resolution:
nslookup database-host
# Verify network connectivity:
ping -c 2 database-host

If dependencies are unavailable, the startup probe will fail.
Allow more retries for slow-starting applications:
# Patch the deployment:
kubectl patch deployment <name> -n <namespace> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","startupProbe":{"failureThreshold":60}}]}}}}'
# Or edit YAML:
kubectl edit deployment <name> -n <namespace>

Example:
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 60   # Retry 60 times
  periodSeconds: 5       # Every 5 seconds
  timeoutSeconds: 10     # Each attempt times out after 10 seconds
  # Total: 60 * 5 = 300 seconds (5 minutes) before giving up

Increasing failureThreshold allows more retries before the container is restarted; the total startup window is still failureThreshold × periodSeconds, so adjust both values to give the application enough time to initialize.
Verify the health check responds appropriately during startup:
# Option 1: Run pod with sleep to test during startup:
kubectl run test-pod --image=<app-image> -- sleep 1000
kubectl exec -it test-pod -- sh
# Option 2: Check logs to see when health endpoint responds:
kubectl logs <pod-name> -n <namespace> | grep -i health
# Option 3: Add startup logging to your app:
echo "App starting at $(date)" >> /tmp/startup.log
sleep 30 # Simulate initialization
echo "App ready at $(date)" >> /tmp/startup.logThe health endpoint must return failure (or timeout) until initialization completes.
Create an endpoint that returns success only when fully initialized:
# Python/Flask example:
import threading
import time

from flask import Flask, jsonify

app = Flask(__name__)
initialization_complete = False

def initialize_app():
    global initialization_complete
    print("Starting initialization...")
    # Simulate slow initialization
    time.sleep(10)
    # App-specific steps (placeholders, not defined here):
    # db.connect()     # connect to the database
    # load_config()    # load configuration
    # warm_cache()     # warm up the cache
    print("Initialization complete")
    initialization_complete = True

# Start initialization in the background so the server can answer probes immediately
thread = threading.Thread(target=initialize_app, daemon=False)
thread.start()

@app.route('/health', methods=['GET'])
def health():
    if initialization_complete:
        return jsonify({'status': 'healthy'}), 200
    else:
        return jsonify({'status': 'initializing'}), 503  # 503 = not ready yet

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

Return 200 only when fully initialized. Use 503 while initializing.
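To pair with this endpoint, a startup probe can be sized to the app's measured initialization time. The /health path and port 8080 match the Flask sketch above; the timing values below are assumptions for an app that initializes in roughly 10 seconds.

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 2       # check every 2 seconds so readiness is detected quickly
  failureThreshold: 15   # 15 × 2 s = 30 s budget for a ~10 s initialization
```

A short period lets the pod become ready soon after initialization finishes, while the threshold still leaves headroom if startup is slower than expected.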
Run setup tasks before the main container starts:
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
  - name: wait-for-db
    image: busybox
    command: ['sh', '-c', 'until nc -z database-host 5432; do echo waiting for db; sleep 2; done']
  - name: migrate-db
    image: myapp:latest
    command: ['sh', '-c', './migrate.sh']
  containers:
  - name: app
    image: myapp:latest
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30
      periodSeconds: 2
      # Now the app only needs 60 seconds max (initialization already done)

Init containers complete before the main container starts, reducing the time the startup probe needs to allow.
Check that the pod has enough CPU and memory to initialize:
# View current resource constraints:
kubectl describe pod <pod-name> -n <namespace> | grep -E "Limits|Requests"
# Increase resources:
kubectl set resources deployment <name> -n <namespace> \
--requests=cpu=500m,memory=512Mi \
--limits=cpu=1,memory=1Gi
# Or edit the YAML:
resources:
  requests:
    cpu: 500m       # Needed for scheduling
    memory: 512Mi
  limits:
    cpu: 1000m      # Max allowed
    memory: 1Gi

Insufficient resources cause slow startup, making the probe fail.
Startup probes are ideal for slow-starting applications (for example JVM or .NET services) because they remove the need for a high initialDelaySeconds on the liveness and readiness probes. The startup probe runs only until its first success; after that it is ignored and the liveness and readiness probes take over. Use a startup probe for apps that take more than about 30 seconds to initialize; otherwise liveness/readiness probes with a higher initialDelaySeconds are sufficient. A startup probe's periodSeconds is usually low (2-5 seconds) so that startup completion is detected quickly.

For containerized Java/Spring Boot applications, startup can take 30-60 seconds on a cold start and 5-10 seconds on a warm start, mostly due to class loading and JIT compilation. JVM tuning flags such as -XX:TieredStopAtLevel=1 can speed up startup (see the sketch below). On older clusters without startup probe support, use a high initialDelaySeconds on the liveness probe as a workaround.
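As an illustration of the Java advice above, a container spec might combine the tuning flag with a startup probe sized for a 30-60 second cold start. The image name, the use of JAVA_TOOL_OPTIONS to pass the flag, and the Spring Boot Actuator health path are assumptions, not details from this article.

```yaml
containers:
- name: app
  image: my-spring-boot-app:latest
  env:
  - name: JAVA_TOOL_OPTIONS            # picked up automatically by the JVM
    value: "-XX:TieredStopAtLevel=1"   # limit JIT tiering to shorten startup
  ports:
  - containerPort: 8080
  startupProbe:
    httpGet:
      path: /actuator/health           # Spring Boot Actuator endpoint (assumed)
      port: 8080
    periodSeconds: 5
    failureThreshold: 24               # 24 × 5 s = 120 s budget for a cold start
```

The 120-second budget is deliberately larger than the expected 30-60 second cold start so that an occasional slow start does not trigger a restart.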