Felix is Calico's policy engine. "Felix not ready" errors usually mean Felix's liveness or readiness checks are failing because of timeouts or dataplane issues. Fix them by reviewing logs, adjusting probe timings, and making sure the dataplane is healthy.
Felix is Calico's main policy enforcement engine and runs on every node. When Felix reports that it is not ready or not live, its liveness or readiness watchdog timers have not received check-ins from critical internal loops. This keeps the calico-node pod from becoming ready, which can block pod networking and scheduling on that node.
Review detailed Felix error messages:
kubectl logs -n calico-system ds/calico-node -c calico-node --tail=100
Look for "report timed out" messages, which show which component is slow.
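kubectl logs ds/calico-node reads from a single pod that kubectl picks, which may not be the pod on the failing node. A minimal sketch that targets the calico-node pod on a given node and keeps only health-related lines. It assumes the namespace and container used above plus the standard k8s-app=calico-node pod label; the function names are hypothetical, and the grep pattern is a guess at useful keywords rather than a complete list of Felix messages.

```shell
#!/usr/bin/env bash
# Sketch: show health-related Felix log lines from the calico-node pod on one node.
# Assumes namespace calico-system and pod label k8s-app=calico-node.
# Function names are illustrative, not part of any Calico tooling.

# Keep only lines mentioning health-report timeouts or liveness/readiness failures.
felix_health_lines() {
  grep -iE 'timed out|not live|not ready'
}

# Print the name of the calico-node pod scheduled on node "$1".
calico_node_pod_on() {
  kubectl get pods -n calico-system -l k8s-app=calico-node \
    --field-selector "spec.nodeName=$1" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Print recent health-related Felix log lines for node "$1".
felix_health_logs() {
  local pod
  pod="$(calico_node_pod_on "$1")" || return 1
  kubectl logs -n calico-system "$pod" -c calico-node --tail=500 | felix_health_lines
}
```

For example, felix_health_logs worker-1, where worker-1 stands in for the name of the affected node.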
Review probe settings in the daemonset:
kubectl get daemonset -n calico-system calico-node -o yaml | grep -E -A 10 "livenessProbe|readinessProbe"
Default timeouts may be too aggressive for slow systems.
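A fixed number of grep context lines can cut a probe definition short. This sketch prints the calico-node container's probes with a kubectl JSONPath filter instead, and adds a small helper that estimates how long Felix can keep failing a check before the kubelet acts (restarting the container for liveness, marking the pod not ready for readiness): roughly failureThreshold × periodSeconds, ignoring initialDelaySeconds and per-attempt timeouts. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: inspect calico-node probe settings without relying on grep context lines.
# Assumes the DaemonSet calico-node in namespace calico-system.
# Function names are illustrative.

# Print the liveness and readiness probes of the calico-node container.
show_calico_node_probes() {
  local sel='{range .spec.template.spec.containers[?(@.name=="calico-node")]}'
  local body='liveness: {.livenessProbe}{"\n"}readiness: {.readinessProbe}{"\n"}{end}'
  kubectl get daemonset calico-node -n calico-system -o "jsonpath=${sel}${body}"
}

# Rough seconds of consecutive failures tolerated before the kubelet acts:
# failureThreshold ($1) x periodSeconds ($2). Ignores initialDelaySeconds.
probe_failure_window() {
  echo $(( $1 * $2 ))
}
```

With Kubernetes' default probe values (failureThreshold 3, periodSeconds 10), probe_failure_window 3 10 gives about 30 seconds.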
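If the system is slow, increase the probe timeout. On operator-managed installs (the calico-system namespace), the Tigera operator may overwrite manual changes to the DaemonSet, so treat this patch as a temporary measure: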
kubectl patch daemonset calico-node -n calico-system --type json \
  -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/timeoutSeconds", "value": 10}]'
Verify CPU and memory are not exhausted:
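kubectl top nodes
kubectl top pod -n calico-system -l k8s-app=calico-node
kubectl top requires metrics-server and does not accept a DaemonSet reference, so select the calico-node pods by label. If usage is high, add CPU or memory to the node.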
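To spot busy nodes quickly in a larger cluster, a sketch that filters kubectl top nodes output by CPU or memory percentage. It assumes the default column layout (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); the 80% default threshold is chosen only for illustration, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list nodes whose CPU% or MEMORY% exceeds a threshold.
# Assumes the default `kubectl top nodes` columns:
#   NAME  CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
# busy_nodes is an illustrative name; 80 is an arbitrary example threshold.

# Reads `kubectl top nodes --no-headers` output on stdin; $1 is the limit in percent.
busy_nodes() {
  local limit="${1:-80}"
  # Adding 0 makes awk read "85%" as 85; a value like "<unknown>" becomes 0.
  awk -v limit="$limit" '($3 + 0 > limit + 0) || ($5 + 0 > limit + 0) { print $1 }'
}
```

Usage: kubectl top nodes --no-headers | busy_nodes 80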
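For more detailed logs, raise Felix's log level. Felix reads its settings from the FelixConfiguration resource (usually named default), not from a ConfigMap; with calicoctl, set logSeverityScreen to Debug:
calicoctl patch felixconfiguration default --patch '{"spec":{"logSeverityScreen":"Debug"}}'
Set it back to Info when you are done, because debug logging is very verbose.
Query Felix's health endpoint from inside a calico-node pod: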
kubectl exec -it -n calico-system ds/calico-node -c calico-node -- \
  wget -q -O- http://localhost:9099/liveness
The endpoint returns HTTP 200 if Felix is live and 503 if it is not; use /readiness to check readiness.
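The wget check above runs inside whichever pod kubectl selects. Because calico-node uses the host network and Felix's health server listens on localhost:9099 by default, the same endpoints can also be queried from a shell on the affected node. A sketch assuming curl is installed on the node; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: query Felix's health endpoints from a shell on the node itself.
# Assumes Felix's default health address (localhost:9099) and that curl is installed.
# Function names are illustrative.

# Map an HTTP status code to a short verdict.
felix_verdict() {
  case "$1" in
    200) echo "ok" ;;
    503) echo "failing" ;;
    *)   echo "unreachable" ;;
  esac
}

# Check one endpoint ($1 = liveness or readiness) and print the result.
felix_check() {
  local code
  # curl reports 000 as the status code when it cannot connect.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://localhost:9099/$1")"
  printf '%s: %s (HTTP %s)\n' "$1" "$(felix_verdict "$code")" "$code"
}
```

Usage on the node: felix_check liveness; felix_check readiness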
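Ensure the node's kernel meets Felix's requirements (run these on the node):
uname -r
grep -E 'CONFIG_BPF=|CONFIG_BPF_SYSCALL=' /boot/config-$(uname -r)
/proc/version only shows the kernel build string, so the kernel config file is a better place to confirm BPF support if you run the eBPF dataplane. The minimum kernel version depends on the Calico version; check the compatibility matrix.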
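Comparing kernel version strings by eye is error-prone. A sketch that checks the running kernel against a minimum you take from the Calico compatibility matrix; the minimum is an input, not a claim about any specific Calico release. It relies on GNU sort -V, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the running kernel against a minimum version.
# Requires `sort -V` (GNU coreutils). The minimum comes from the Calico
# compatibility matrix for your release; it is a parameter here, not a known value.
# Function names are illustrative.

# Succeed if version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the running kernel is at least version $1.
kernel_at_least() {
  version_ge "$(uname -r)" "$1"
}
```

Usage: kernel_at_least "<minimum-from-matrix>" && echo "kernel OK" || echo "kernel too old"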
If logs show intermittent issues, try restarting:
kubectl rollout restart daemonset/calico-node -n calico-system
kubectl rollout status daemonset/calico-node -n calico-system
For production clusters, monitor Felix health proactively using Prometheus metrics. Very large clusters (100+ nodes) with thousands of policies may have slow Felix loops; consider splitting policies or using policy templates. If Felix keeps timing out despite these adjustments, upgrade Calico to the latest version, since performance improvements may address the issue. Track kernel compatibility, as eBPF features and requirements evolve across Calico versions.
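A starting point for the Prometheus monitoring mentioned above, assuming calicoctl is installed and configured for the cluster. Felix metrics are disabled by default and are enabled through the prometheusMetricsEnabled field of the default FelixConfiguration; Felix then serves them on port 9091 by default. The filter only narrows a scraped page to Felix samples; which felix_ metrics to alert on is left open, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: enable Felix's Prometheus metrics and inspect them on a node.
# Assumes calicoctl is configured; 9091 is Felix's default metrics port.
# Function names are illustrative.

# Turn on Felix metrics cluster-wide via the default FelixConfiguration.
enable_felix_metrics() {
  calicoctl patch felixconfiguration default \
    --patch '{"spec":{"prometheusMetricsEnabled": true}}'
}

# Keep only Felix sample lines (drops "# HELP"/"# TYPE" comments and Go runtime metrics).
felix_samples() {
  grep -E '^felix_'
}
```

Usage on a node after enabling: curl -s http://localhost:9091/metrics | felix_samples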