A Flannel backend error occurs when the pod networking plugin fails to set up or maintain the overlay network. Flannel provides cross-node pod communication via VXLAN, UDP, or host-gw backends. Backend failures prevent pods from communicating across nodes: pods on the same node may still reach each other, but cross-node traffic and the Services that depend on it fail.
Flannel is a networking plugin that creates an overlay network for pod-to-pod communication. Common backends:

- **VXLAN**: Virtual Extensible LAN (most common; UDP 8472 by default on Linux, or 4789 if set explicitly in the config)
- **UDP**: User-space UDP tunneling (legacy, slower)
- **host-gw**: Host gateway (direct routing, requires Layer 2 connectivity between nodes)
- **aws-vpc**: AWS-specific, programs VPC route tables via EC2 APIs

When the backend fails:

- The Flannel daemon crashes or stops
- The overlay network is not established
- Pods on different nodes cannot communicate
- Services fail because traffic can't reach backend pods
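For comparison with the VXLAN example shown later, a minimal net-conf.json sketch for the host-gw backend could look like this (assuming the common default 10.244.0.0/16 pod network; adjust to your cluster):
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}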
Verify Flannel is running on all nodes:
kubectl get pods -n kube-flannel # Or -n kube-system depending on installation
kubectl get pods -n kube-flannel -o wide # Shows which nodes have Flannel
# Check if DaemonSet is deployed:
kubectl get daemonset -n kube-flannel
# Describe the Flannel pod:
kubectl describe pod -n kube-flannel -l app=flannel
# Check logs:
kubectl logs -n kube-flannel -l app=flannel --tail=100
kubectl logs -n kube-flannel -l app=flannel --previous # Previous crashed instance
Flannel must be running on every node.
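As a quick sanity check, a short sketch like the one below compares the node count against the number of running Flannel pods (it assumes the pods carry the app=flannel label used by the official manifest):
NODES=$(kubectl get nodes --no-headers | wc -l)
RUNNING=$(kubectl get pods -n kube-flannel -l app=flannel --field-selector=status.phase=Running --no-headers | wc -l)
echo "Nodes: $NODES, running Flannel pods: $RUNNING" # These should match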
View the configured backend:
# Check ConfigMap with network config:
kubectl get cm -n kube-flannel
kubectl describe cm -n kube-flannel kube-flannel-cfg
# View the actual configuration:
kubectl get cm kube-flannel-cfg -n kube-flannel -o jsonpath='{.data.net-conf\.json}' | jq .
# Expected output:
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan",
"VNI": 1,
"Port": 4789
}
}
Note the backend type and any custom settings.
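To extract just the backend type, a jq one-liner works (assuming jq is installed and the ConfigMap uses the standard kube-flannel-cfg name):
kubectl get cm kube-flannel-cfg -n kube-flannel -o jsonpath='{.data.net-conf\.json}' | jq -r '.Backend.Type'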
Test basic node-to-node communication:
# Get node IPs:
kubectl get nodes -o wide
# From control plane, test connectivity:
for node in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
ping -c 1 $node && echo "Node $node reachable" || echo "Node $node unreachable"
done
# Test specific backend ports:
kubectl exec -it <pod> -- nc -zvu <node-ip> 4789 # VXLAN port (UDP, so use -u)
If nodes can't reach each other, Flannel can't function.
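Because nc gives limited signal for UDP, a more reliable check is to watch for encapsulated traffic on the node's physical interface while generating cross-node pod traffic (a sketch; eth0 is an assumed interface name, and the port depends on your config, 8472 by default or 4789 if set explicitly):
# On the destination node, capture VXLAN packets:
sudo tcpdump -ni eth0 'udp port 8472 or udp port 4789'
# While the capture runs, ping a pod on that node from a pod on another node; packets should appear.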
Ensure firewall allows backend communication:
# On each node, verify ports are open:
sudo iptables -L INPUT -n | grep -E "4789|8472|8285" # Common Flannel ports
# Flannel requires:
# - Port 8472 UDP (VXLAN, Linux kernel default)
# - Port 4789 UDP (VXLAN, when Port is set explicitly in net-conf.json)
# - Port 8285 UDP (Flannel UDP backend)
# Add iptables rules if missing:
sudo iptables -I INPUT -p udp -m multiport --dports 4789,8472,8285 -j ACCEPT
# For cloud providers, check security groups:
# AWS: Check security group inbound rules
# Azure: Check NSG rules
# GCP: Check firewall rules
Open all required ports for the Flannel backend.
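If a node uses firewalld or ufw rather than raw iptables, the equivalent rules look roughly like this (a sketch; open only the ports your backend actually needs):
# firewalld:
sudo firewall-cmd --permanent --add-port=8472/udp --add-port=4789/udp --add-port=8285/udp
sudo firewall-cmd --reload
# ufw:
sudo ufw allow 8472/udp
sudo ufw allow 4789/udp
sudo ufw allow 8285/udp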
Check if kernel module is loaded (for VXLAN):
ssh <node-ip>
sudo modprobe vxlan # Load the module if missing
lsmod | grep vxlan # Confirm it is loaded
# Verify VXLAN interface:
ip link show # Look for "flannel.1" or similar
# If interface is missing:
sudo ip link add flannel.1 type vxlan id 1 dstport 4789 nolearning
VXLAN requires kernel support. Some minimal Linux distributions may not have it. (Flannel normally creates flannel.1 itself; creating it by hand is only a diagnostic step.)
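To make the module load persist across reboots, a minimal sketch using systemd's modules-load.d (supported on most modern distributions):
echo vxlan | sudo tee /etc/modules-load.d/vxlan.conf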
Verify the MTU (Maximum Transmission Unit) is set correctly:
# Check the MTU Flannel computed (on a node):
cat /run/flannel/subnet.env # FLANNEL_MTU should equal the interface MTU minus the backend overhead
# VXLAN adds 50 bytes of overhead:
# If interface MTU is 1500, VXLAN MTU should be 1450
# If interface MTU is 9000 (jumbo frames), VXLAN can be 8950
# View actual MTU:
kubectl exec -it <pod> -- cat /sys/class/net/eth0/mtu # Pod interface MTU (ifconfig is often absent in minimal images)
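One way to sanity-check the numbers is to read the physical interface MTU on the node and subtract the VXLAN overhead (a sketch; eth0 is an assumed interface name):
PHYS_MTU=$(cat /sys/class/net/eth0/mtu)
echo "Expected flannel.1 MTU: $((PHYS_MTU - 50))"
ip link show flannel.1 # Compare against the actual overlay interface MTU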
# Update the ConfigMap if needed (the exact backend options depend on your Flannel version):
kubectl edit configmap kube-flannel-cfg -n kube-flannel
# Restart Flannel to apply changes:
kubectl rollout restart daemonset/kube-flannel -n kube-flannel # The DaemonSet may be named kube-flannel-ds depending on the manifest
Ensure Pod CIDR doesn't conflict with node networks:
# Get Pod CIDR:
kubectl get nodes -o jsonpath='{.items[0].spec.podCIDR}'
# Should be different from node CIDR:
kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'
# Check Flannel network in ConfigMap:
kubectl get cm kube-flannel-cfg -n kube-flannel -o jsonpath='{.data.net-conf\.json}' | jq .Network
# If there's overlap, the cluster network must be changed:
# node spec.podCIDR is immutable once assigned, so this usually means
# recreating the cluster (or re-joining nodes) with a non-overlapping pod network CIDR
Overlapping CIDRs prevent proper routing.
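To see node name, internal IP, and assigned Pod CIDR side by side when hunting for overlaps, a jsonpath range helps:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\t"}{.spec.podCIDR}{"\n"}{end}'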
Force Flannel to restart and reinitialize:
# Restart the DaemonSet:
kubectl rollout restart daemonset/kube-flannel -n kube-flannel
# Watch pods restart:
kubectl get pods -n kube-flannel -w
# Monitor logs during restart:
kubectl logs -n kube-flannel -l app=flannel -f
# Verify network is restored:
kubectl run -it --rm test --image=alpine -- sh
# From another terminal, find the IP of a pod running on a different node:
kubectl get pods -A -o wide
# Inside the test pod:
ping -c 3 <pod-ip-on-another-node>
After the restart, test pod-to-pod communication across nodes.
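For a more controlled check, you can pin two test pods to specific nodes and ping between them (a sketch; node-1 and node-2 are placeholder node names, and busybox is used for its ping binary):
kubectl run net-a --image=busybox --restart=Never --overrides='{"apiVersion":"v1","spec":{"nodeName":"node-1"}}' -- sleep 3600
kubectl run net-b --image=busybox --restart=Never --overrides='{"apiVersion":"v1","spec":{"nodeName":"node-2"}}' -- sleep 3600
kubectl get pod net-b -o jsonpath='{.status.podIP}' # Note this IP
kubectl exec net-a -- ping -c 3 <net-b-pod-IP>
kubectl delete pod net-a net-b # Clean up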
Flannel is a simple, production-grade networking plugin. VXLAN is the default and most widely used backend. host-gw is faster but requires Layer 2 connectivity (nodes on the same network segment). The aws-vpc backend uses EC2 APIs to program VPC route tables directly. The UDP backend is legacy and slower. MTU must be set correctly; Flannel's VXLAN overhead is ~50 bytes. Flannel stores subnet assignments in etcd, or in the Kubernetes API (each node's podCIDR) when run with --kube-subnet-mgr, which is how most Kubernetes installs deploy it; if that datastore is unavailable, Flannel fails. For multi-cloud or complex topologies, consider Calico or Cilium for more advanced routing. Monitor Flannel health: check pod IP allocation, VXLAN packet loss, and cross-node latency. On systems with multiple networks, ensure Flannel uses the correct interface (--iface flag). For troubleshooting, capture packets on the VXLAN interface: tcpdump -i flannel.1.
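If a node has multiple network interfaces, you can confirm which one flanneld was told to use by inspecting the DaemonSet arguments (a sketch that simply greps the rendered YAML; it assumes the standard kube-flannel DaemonSet):
kubectl get daemonset -n kube-flannel -o yaml | grep -B2 -A6 'args:'
# Look for --iface=<interface> (or --iface-regex); without it, flanneld uses the interface of the default route.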