DaemonSet rolling updates get stuck when new pods can't be scheduled, are crashing, or old pods won't terminate. Fix by debugging the failing pods, freeing node resources, correcting the new template, or manually advancing the rollout.
DaemonSet rolling updates replace the pod on each node one node at a time (up to maxUnavailable nodes in parallel). When an update gets stuck, it means:
1. The DaemonSet controller is trying to create a pod with the new template.
2. The pod either can't be created, stays unready, or crashes.
3. The controller won't proceed to the next node until the current pod is healthy.
Unlike Deployments, where a failed pod can be left behind while other replicas carry traffic, a DaemonSet must maintain a pod on every eligible node, so a single broken pod blocks the entire rollout.
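One quick way to see how far a rollout has progressed is to compare the status counters the DaemonSet controller maintains; this is a small check using standard DaemonSet status fields:
# Compare desired vs. updated vs. ready pod counts:
kubectl get daemonset -n <namespace> <name> -o jsonpath='{.status.desiredNumberScheduled} desired, {.status.updatedNumberScheduled} updated, {.status.numberReady} ready{"\n"}'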
Get detailed rollout information:
# Check rollout status:
kubectl rollout status daemonset -n <namespace> <name>
# Get full DaemonSet details:
kubectl describe daemonset -n <namespace> <name> | tail -30
# Check which nodes are updated:
kubectl get pods -n <namespace> -l <label> -o wide | grep <daemonset>
# Look for status conditions
kubectl get daemonset -n <namespace> <name> -o yaml | grep -A 10 status:
Verify the new template is correct:
# Get current DaemonSet definition:
kubectl get daemonset -n <namespace> <name> -o yaml > current.yaml
# Check image:
grep image: current.yaml
# Check if image exists and is accessible:
kubectl run -it --rm debug --image=<new-image> -- echo "Image works"
# Check pod spec for obvious errors:
kubectl get daemonset -n <namespace> <name> -o json | jq .spec.template.spec
Examine why new pods are failing:
# Find a new pod that's stuck:
kubectl get pods -n <namespace> -l <label> -o wide
# Check pod status:
kubectl describe pod -n <namespace> <new-pod-name>
# Check logs:
kubectl logs -n <namespace> <new-pod-name> --all-containers=true
# If pod is crashing:
kubectl logs -n <namespace> <new-pod-name> --previous
# Check events for specific errors:
kubectl describe pod -n <namespace> <new-pod-name> | grep -A 20 Events:
The error message will show the exact problem.
Verify nodes have capacity for new pods:
# Check all node resource status:
kubectl top nodes
# Detailed view of a specific node:
kubectl describe node <node-name>
# Check for resource pressure conditions:
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, conditions: .status.conditions}'
# If node is under pressure, free resources (see the drain example below):
# - Drain and evict non-critical pods
# - Increase node size
# - Use cluster autoscaler to add nodes
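A minimal sketch of freeing a pressured node by evicting non-critical pods; <node-name> is a placeholder, --ignore-daemonsets leaves the DaemonSet pods themselves in place, and this assumes the evicted workloads can reschedule elsewhere:
# Stop new pods landing on the node, then evict what can be evicted:
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Once resources are freed, let the node accept pods again:
kubectl uncordon <node-name>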
If the new template has errors, fix and redeploy:
# Option 1: Rollback to previous version
kubectl rollout undo daemonset -n <namespace> <name>
# Option 2: Fix template and update
kubectl set image daemonset/<name> <container>=<new-image> -n <namespace>
# Option 3: Edit YAML directly
kubectl edit daemonset -n <namespace> <name>
# Fix the image or config, save and exit
# Verify the fix works:
kubectl rollout status daemonset -n <namespace> <name>
After the fix, the rollout should proceed.
If maxUnavailable=0, allow temporary unavailability:
kubectl edit daemonset -n <namespace> <name>
Update updateStrategy:
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # Allow 1 pod to be unavailable during the update
This lets the old pod terminate before the new one must be ready.
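If you prefer not to open an editor, the same change can be applied as a JSON merge patch; a sketch with the usual placeholders:
kubectl patch daemonset <name> -n <namespace> --type merge -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":1}}}}'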
If one node is stuck, manually update it:
# Find the stuck node:
kubectl get pods -n <namespace> -l <label> -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,IMAGE:.spec.containers[0].image | grep <old-image>
# Delete the old pod (forces DaemonSet to recreate it):
kubectl delete pod -n <namespace> <old-pod-name>
# Watch the new pod creation:
kubectl get pods -n <namespace> -w
# Wait for it to be Ready, then continue:
kubectl rollout status daemonset -n <namespace> <name>
The DaemonSet controller will schedule a new pod with the new template.
Confirm all nodes are updated:
# Check rollout is complete:
kubectl rollout status daemonset -n <namespace> <name>
# Verify all pods have new image:
kubectl get pods -n <namespace> -l <label> --sort-by=.spec.containers[0].image -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,IMAGE:.spec.containers[0].image
# Check DaemonSet metrics:
kubectl get daemonset -n <namespace> <name>
# Desired, Current, Ready should all match
# All pods should show the new image in the IMAGE column
If complete, the rollout succeeded.
### UpdateStrategy Options
Two strategies exist for DaemonSet updates:
- RollingUpdate: Gradual replacement (default)
- OnDelete: Only update when a pod is manually deleted (see the example below)
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
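For comparison, an OnDelete strategy hands rollout pacing entirely to you; the controller only applies the new template after you delete an old pod yourself:
updateStrategy:
  type: OnDelete
# Then roll each node on your own schedule:
kubectl delete pod -n <namespace> <old-pod-name>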
### minReadySeconds Impact
minReadySeconds: 300   # Pod must be Ready for 5 minutes before proceeding
High values slow updates; see the patch example after this list. Typical values:
- 0: No wait (default)
- 10: Ensure pod is stable
- 30-60: For stateful DaemonSets
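Note that minReadySeconds sits at the DaemonSet spec level, as a sibling of updateStrategy rather than inside rollingUpdate; a minimal sketch of setting it with a patch:
kubectl patch daemonset <name> -n <namespace> --type merge -p '{"spec":{"minReadySeconds":30}}'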
### Common Template Mistakes
1. Wrong image registry: Image pull fails
2. Missing environment variables: App won't start
3. Port changes: Readiness probe fails
4. Security context: Permission denied errors
5. ResourceLimits too low: OOMKilled or CPU throttled
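To spot these mistakes quickly, compare the stuck template against the last working revision; rollout history works for DaemonSets, and <revision> is a placeholder:
# List recorded revisions:
kubectl rollout history daemonset/<name> -n <namespace>
# Dump the full pod template of one revision to compare image, env, ports, and resources:
kubectl rollout history daemonset/<name> -n <namespace> --revision=<revision>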
### Rollback vs Forward Fix
If rollout is stuck:
- Rollback: kubectl rollout undo daemonset <name>
- Forward fix: Update template and reapply
Rollback is faster if the old version works; use a forward fix if the old version is also broken.
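If several revisions exist, you can also roll back to a specific one rather than just the previous; --to-revision is a standard kubectl rollout undo flag and the number is a placeholder:
kubectl rollout undo daemonset/<name> -n <namespace> --to-revision=<revision>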
### Monitoring Rollout Progress
# Watch updates as they happen:
kubectl get daemonset -n <namespace> -w
# Track pod updates per node:
watch "kubectl get pods -n <namespace> -o wide | grep <daemonset>"
# Use events to see what's happening:
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
### Performance Tuning
For faster rollouts on large clusters:
spec:
  minReadySeconds: 10   # Reduced from 30
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 5   # Update 5 nodes in parallel
Balance speed against stability: too aggressive a setting can cause cascading failures.
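On large clusters an absolute count is awkward to keep in sync with node count; maxUnavailable also accepts a percentage, so a proportional setting is an option (10% here is only an illustrative choice):
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: "10%"   # Roughly one in ten nodes updated in parallel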