DaemonSet pods fail readiness checks due to liveness/readiness probe misconfiguration, application startup delays, or resource constraints. Fix by adjusting probe parameters, increasing initialDelaySeconds, or fixing the underlying application issue.
A DaemonSet pod that's Running but not Ready means:
1. The pod's container is actually executing
2. The readiness probe is failing (a failing liveness probe would instead trigger a container restart)
3. The kubelet marks the pod as NotReady
For DaemonSets, this is particularly problematic because:
- Traffic doesn't get routed to the pod (if it backs a Service)
- Dependent components treat the pod as broken
- Orchestration tools see the DaemonSet as unhealthy
Unlike Deployments, which can over-provision replicas, DaemonSets run exactly one pod per eligible node, so a NotReady pod leaves a gap in coverage on that node.
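To confirm this is what's happening, read the pod's Ready condition and recent probe events (the jsonpath filter is a sketch; kubectl describe shows the same information):
# Ready condition, including the reason it is False:
kubectl get pod -n <namespace> <pod-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
# Probe failures show up as Unhealthy events:
kubectl describe pod -n <namespace> <pod-name> | grep -i "unhealthy\|probe"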
View the current probe configuration:
# Get DaemonSet YAML:
kubectl get daemonset -n <namespace> <name> -o yaml
# Look for readinessProbe and livenessProbe:
kubectl get daemonset -n <namespace> <name> -o yaml | grep -A 10 "readinessProbe\|livenessProbe"
# Check exact probe settings:
# - initialDelaySeconds
# - timeoutSeconds
# - periodSeconds
# - failureThreshold
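To pull just the probe object in one shot (a jsonpath sketch that assumes the probe lives on the first container):
kubectl get daemonset -n <namespace> <name> \
  -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'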
Examine what's happening when the probe fails:
# Check pod logs:
kubectl logs -n <namespace> <pod-name>
# Check probe endpoint directly:
kubectl exec -n <namespace> <pod-name> -- curl -v http://localhost:8080/health
# Check application startup time:
kubectl logs -n <namespace> <pod-name> | head -20
# Look for how long the app takes to become ready
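One way to measure that startup time (a sketch; the "ready"-style log pattern depends entirely on your application):
# Pod start time:
kubectl get pod -n <namespace> <pod-name> -o jsonpath='{.status.startTime}'
# First log line that looks like a readiness announcement:
kubectl logs -n <namespace> <pod-name> --timestamps | grep -i -m 1 "listening\|started\|ready"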
Give the application more time to start:
kubectl edit daemonset -n <namespace> <name>
Update the readiness probe:
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60   # Increased from 10
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 3
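If you'd rather not open an editor, the same change can be made with a JSON patch (a sketch; the path assumes the readiness probe sits on the first container in the template):
kubectl patch daemonset -n <namespace> <name> --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds", "value": 60}]'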
New pods will apply these settings:
# Force rollout:
kubectl rollout restart daemonset -n <namespace> <name>
Allow more time for the probe endpoint to respond:
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  timeoutSeconds: 10   # Increased from 1
  periodSeconds: 10
  failureThreshold: 5   # Increased from 3
This gives the app 10 seconds to respond to each probe, and requires 5 consecutive failures before the pod is marked NotReady.
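To pick a sensible timeoutSeconds, measure how long the endpoint actually takes to answer (a sketch; assumes curl is available in the container):
kubectl exec -n <namespace> <pod-name> -- \
  curl -s -o /dev/null -w '%{time_total}s\n' http://localhost:8080/health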
Verify the probe endpoint works:
# Get pod IP:
kubectl get pod -n <namespace> <pod-name> -o jsonpath='{.status.podIP}'
# Test endpoint from another pod:
kubectl run debug --rm -it --image=alpine -- sh
# Inside pod:
apk add curl
curl -v http://<pod-ip>:8080/health
# Or exec directly:
kubectl exec -n <namespace> <pod-name> -- curl -v http://localhost:8080/health
# Check the response code - it should be 200
If the endpoint returns non-200, the application isn't actually ready.
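As a one-shot alternative to the interactive alpine pod, a throwaway curl pod can probe the endpoint directly (a sketch using the curlimages/curl image; substitute the pod IP from the step above):
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -v http://<pod-ip>:8080/health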
Verify the pod has enough resources:
# Check resource limits:
kubectl describe pod -n <namespace> <pod-name> | grep -A 3 "Limits\|Requests"
# Check if pod is hitting limits:
kubectl top pod -n <namespace> <pod-name>
# Check node for resource pressure:
kubectl describe node <node-name> | grep -A 5 "Allocated resources\|Conditions"
# If resources are tight, increase limits:
kubectl edit daemonset -n <namespace> <name>
Insufficient CPU/memory can slow startup significantly.
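For example, raise the container's requests and limits along these lines (illustrative values only; base them on what kubectl top actually reports for your workload):
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi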
If the application itself is broken:
# Check application logs for errors:
kubectl logs -n <namespace> <pod-name> --previous # If pod crashed and restarted
# Common issues:
# - Missing environment variables
# - Database connection failures
# - Configuration loading errors
# - Port already in use
# Set/verify environment variables:
kubectl get daemonset -n <namespace> <name> -o yaml | grep -A 10 env:
# Check if the port is already in use (an ss variant for minimal images follows below):
kubectl exec -n <namespace> <pod-name> -- netstat -tlnp | grep 8080
Fix the application issues, then redeploy.
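Many minimal images ship without netstat; ss is the usual drop-in replacement (assuming it exists in the container):
kubectl exec -n <namespace> <pod-name> -- ss -tlnp | grep 8080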
After making changes, confirm the fix:
# Watch pod transitions:
kubectl get pods -n <namespace> -w | grep <pod-name>
# Check if pod becomes Ready:
kubectl get pods -n <namespace> <pod-name> -o wide
# Monitor probe results in events:
kubectl describe pod -n <namespace> <pod-name> | tail -20
# Once ready, verify all DaemonSet pods are ready:
kubectl get daemonset -n <namespace> <name>
# READY column should show N/N, where N is the number of nodes
### Probe Types and Defaults
Kubernetes supports three probe types:
- httpGet: HTTP request (most common for web apps)
- tcpSocket: TCP connection check
- exec: Run command in container
# httpGet example:
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
    scheme: HTTP
# tcpSocket example:
readinessProbe:
  tcpSocket:
    port: 3306
# exec example:
readinessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - redis-cli ping
### Liveness vs Readiness
- Liveness: Is the container still alive? The kubelet restarts the container if it fails
- Readiness: Can the container handle traffic? The pod is removed from Service endpoints if it fails
For DaemonSet pods, which usually don't sit behind a Service, readiness matters less for traffic routing than it does for Deployments, but rollout status and monitoring still depend on it.
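A container often carries both, with liveness catching hangs and readiness gating traffic (an illustrative sketch; the endpoints and port are placeholders):
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5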
### Debug Configuration
For debugging, temporarily disable readiness probe:
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 999999   # Effectively never marks the pod NotReady
This keeps a pod that has become Ready from flapping back to NotReady while you debug the actual issue. Note the pod still needs one successful probe to become Ready in the first place; deleting the readinessProbe block entirely skips even that.
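Removing the probe can also be done without an editor (a sketch; the JSON-patch path assumes the probe is on the first container):
kubectl patch daemonset -n <namespace> <name> --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}]'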
### Startup Probe (Kubernetes 1.18+)
For very slow-starting applications:
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # 30 * 10s = 5 minutes max startup
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
The startup probe gates the others: liveness and readiness checks are held off until it succeeds, which gives the app time to start.
### Typical Probe Values
For different application types:
- Fast startup (1-5s): initialDelaySeconds=10, timeoutSeconds=1
- Moderate startup (10-30s): initialDelaySeconds=30, timeoutSeconds=5
- Slow startup (30-60s): initialDelaySeconds=60, timeoutSeconds=10
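As a concrete instance, the moderate-startup row maps onto a probe like this (values taken from the list above; path and port are placeholders):
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  timeoutSeconds: 5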
### Connection Refused Errors
If probe gets "connection refused":
1. Pod might not be listening yet
2. Port in probe config might be wrong
3. Application might be listening only on a different interface (the check below shows how to tell)
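httpGet probes are sent by the kubelet to the pod's IP, so an app bound only to 127.0.0.1 refuses them even when curl inside the container succeeds. One way to check the bind address (assuming ss is present in the image; netstat -tln is the older equivalent):
kubectl exec -n <namespace> <pod-name> -- ss -tln
# Local Address 0.0.0.0:8080 or [::]:8080 means all interfaces (probes reach it);
# 127.0.0.1:8080 means loopback only (the kubelet's probe is refused)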