Cilium endpoint not ready errors occur when pods fail to establish network connectivity. Fix by checking node networking, verifying pod IP allocation, and investigating eBPF program issues.
In Cilium, an endpoint represents a pod's network interface state. When endpoints are not ready, pods cannot communicate with other pods, services, or external networks. This typically indicates issues with IP allocation, routing configuration, or eBPF program enforcement on the node.
List and examine endpoint states:
kubectl exec -it -n kube-system ds/cilium -- \
cilium-dbg endpoint list
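On a busy node the endpoint list is long, so a small filter helps surface only the endpoints stuck outside the ready state. The sample output below is illustrative (the real column layout can differ by Cilium version); the same filter can be applied to live `cilium-dbg endpoint list` output.

```shell
# Illustrative sample of `cilium-dbg endpoint list` output (assumed layout);
# on a real node, pipe the live command through the same awk filter.
sample='ENDPOINT   POLICY(ingress)   POLICY(egress)   IDENTITY   LABELS             STATE
212        Disabled          Disabled         4051       k8s:app=frontend   ready
847        Disabled          Disabled         1729       k8s:app=backend    not-ready'

# Print endpoint ID and state for anything not "ready" (skip the header row)
echo "$sample" | awk 'NR > 1 && $NF != "ready" { print $1, $NF }'
```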
kubectl exec -it -n kube-system ds/cilium -- \
cilium-dbg endpoint get <endpoint-id>

Check if the pod has a valid IP assigned:
kubectl get pod <pod-name> -o wide
kubectl exec -it <pod-name> -- ip addr
kubectl exec -it <pod-name> -- ip route

If no IP is present, the IPAM system failed allocation.
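A common cause of failed allocation is an exhausted per-node pod CIDR. A sketch of that check, with placeholder numbers; on a real node, substitute the allocated and capacity counts reported in the IPAM section of `cilium-dbg status --verbose`.

```shell
# Placeholder values (assumptions); read the real counts from the agent's IPAM status.
allocated=254        # IPs currently handed out on this node
capacity=254         # usable IPs in the node's pod CIDR, e.g. a /24

if [ "$allocated" -ge "$capacity" ]; then
  echo "pod CIDR exhausted: no addresses left for new endpoints"
else
  echo "addresses remaining: $((capacity - allocated))"
fi
```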
Observe packet flow and drops:
kubectl exec -it -n kube-system ds/cilium -- \
cilium monitor -t drop
kubectl exec -it -n kube-system ds/cilium -- \
cilium monitor --type=trace

Look for packet drops and their reason codes.
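Drop reasons tend to repeat, so tallying them quickly shows the dominant failure. The lines below stand in for reason strings extracted from `cilium monitor -t drop` output (format assumed); the same pipeline works on the live stream.

```shell
# Sample drop reasons, as might be extracted from monitor output (assumed)
reasons='Policy denied
Policy denied
Invalid source mac
Policy denied'

# Count occurrences, most frequent first
echo "$reasons" | sort | uniq -c | sort -rn
```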
Check host kernel forwarding configuration:
sysctl net.ipv4.ip_forward
sysctl net.ipv6.conf.all.forwarding
# Enable if needed (add these keys under /etc/sysctl.d/ to persist across reboots):
sudo sysctl -w net.ipv4.ip_forward=1
sudo sysctl -w net.ipv6.conf.all.forwarding=1

Verify direct connectivity between nodes:
# From one node
ping <other-node-ip>
# Test from pod
kubectl exec -it <pod-name> -- ping <pod-ip-on-other-node>

Check if eBPF programs are properly loaded:
kubectl exec -it -n kube-system ds/cilium -- \
cilium-dbg bpf list

If programs are missing, the agent may not have started properly.
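A cross-check can also be made from the host with bpftool, matching Cilium's program name prefix. Program names such as `cil_from_container` vary by version, so treat them as assumptions; the sample below stands in for `sudo bpftool prog show` output.

```shell
# Sample listing (assumed format and names); on the node: sudo bpftool prog show
progs='12: sched_cls  name cil_from_container  tag 1a2b3c4d5e6f7a8b
15: sched_cls  name cil_to_netdev  tag 9a8b7c6d5e4f3a2b
18: kprobe     name unrelated_prog  tag 0000111122223333'

# Count programs carrying Cilium's name prefix
count=$(echo "$progs" | grep -c 'name cil_')
echo "cilium bpf programs loaded: $count"
```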
Identify if network policies are causing drops:
kubectl get networkpolicy --all-namespaces
kubectl describe networkpolicy <policy-name>

Verify that the policies allow traffic between the pods.
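To rule out policy enforcement entirely, a temporary namespace-wide allow-all policy can be applied in a test namespace. This is a sketch: the name and namespace below are examples, and the manifest is only written locally here; applying it is left as a commented step.

```shell
# Write a debugging NetworkPolicy manifest (example name/namespace)
cat > /tmp/debug-allow-all.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: debug-allow-all
  namespace: default
spec:
  podSelector: {}          # every pod in the namespace
  policyTypes: [Ingress, Egress]
  ingress:
  - {}                     # allow all ingress
  egress:
  - {}                     # allow all egress
EOF

# Apply, retest pod connectivity, then remove it:
#   kubectl apply -f /tmp/debug-allow-all.yaml
#   kubectl delete -f /tmp/debug-allow-all.yaml
```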
Force reinit of Cilium on node:
kubectl delete pod -n kube-system -l k8s-app=cilium \
--field-selector spec.nodeName=<node-name>
kubectl wait --for=condition=Ready pod \
-n kube-system -l k8s-app=cilium --timeout=300s

For production, use Hubble for continuous network monitoring. Test connectivity between every pair of nodes during cluster setup. Enable debug logging on the Cilium agent to track the endpoint state machine. For large clusters, use node selectors to limit where Cilium is deployed. Monitor endpoint creation and deletion metrics; high churn indicates instability.