Azure CNI overlay networking typically fails because of Network Security Group (NSG) misconfigurations, SNAT routing issues, or pod IP address exhaustion. The symptom: pods cannot communicate across nodes or reach external services. Fix it by reviewing NSG rules, ensuring pod CIDR capacity, and verifying that the CNI plugin initialized.
Azure CNI Overlay assigns pod IPs from a private CIDR that is logically separate from the virtual network hosting the nodes; traffic is routed natively, with no encapsulation. When overlay networking fails, the CNI plugin cannot initialize on nodes, so pods never receive network interfaces. Common causes: an NSG blocking pod traffic, SNAT misconfiguration, or node IP pool exhaustion.
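Before digging deeper, it can help to confirm the cluster is actually running in overlay mode (a quick sketch; <rg> and <cluster> are your resource group and cluster names):
az aks show --resource-group <rg> --name <cluster> --query networkProfile.networkPluginMode -o tsv
# Expect "overlay"; any other value means these overlay-specific steps do not apply.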
Check if nodes are ready and CNI initialized:
kubectl get nodes -o wide
kubectl describe node <node-name> | grep -A5 "Conditions"
kubectl get pods -n kube-system | grep cni
Look for a "cni plugin not initialized" condition. If it is present, the CNI pods may not be running.
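To surface this condition across all nodes at once, kubelet NetworkNotReady events can help (a sketch; event retention is limited, so this only catches recent failures):
kubectl get events -A --field-selector reason=NetworkNotReady
# Messages mentioning "cni plugin not initialized" confirm the plugin never came up on that node.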
Azure NSG rules may block overlay traffic:
az network nsg rule list --resource-group <rg> --nsg-name <nsg-name> -o table
Ensure rules allow:
- Node CIDR ↔ Pod CIDR (all ports/protocols)
- Pod CIDR ↔ Pod CIDR (all ports/protocols)
- Pod CIDR → External (outbound)
Add missing rules if needed.
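For example, a rule permitting pod-to-pod traffic might look like this (a sketch only; the rule name, priority, and CIDRs are placeholders to adapt to your network):
az network nsg rule create --resource-group <rg> --nsg-name <nsg-name> \
  --name AllowPodToPod --priority 200 --direction Inbound --access Allow \
  --protocol '*' --source-address-prefixes 10.244.0.0/16 \
  --destination-address-prefixes 10.244.0.0/16 --destination-port-ranges '*'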
Check if pod IP pool is exhausted:
kubectl get nodes -o wide
kubectl describe node <node-name> | grep -A2 "Pod CIDR"
Ensure:
- Pod CIDR range is large enough (the AKS overlay default is 10.244.0.0/16)
- Nodes have available IPs (check allocatable IPs)
- No external hosts using pod CIDR range
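A quick way to compare each node's pod CIDR and pod capacity (a sketch; in overlay clusters the per-node CIDR may be managed by azure-cns rather than shown on the node object):
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR,MAXPODS:.status.allocatable.pods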
SNAT routing misconfiguration breaks outbound traffic:
kubectl get configmap -n kube-system azure-ip-masq-agent-config -o yaml
If problematic nonMasqueradeCIDRs entries exist, fix them:
kubectl edit configmap -n kube-system azure-ip-masq-agent-config
# Remove problematic CIDR entries
kubectl rollout restart daemonset -n kube-system azure-ip-masq-agent
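After the restart, the config should exempt only the cluster's own ranges from SNAT. A healthy config generally looks like this (a hypothetical sketch in the upstream ip-masq-agent format; the exact key name and CIDRs will differ in your cluster):
nonMasqueradeCIDRs:
  - 10.244.0.0/16   # pod CIDR: pod-to-pod traffic keeps pod IPs
masqLinkLocal: false
# An entry like 0.0.0.0/0 here disables SNAT entirely and breaks outbound traffic.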
Debug networking with a test pod:
kubectl run -it --rm debug --image=busybox --restart=Never -- sh
# Inside pod:
ping <external-host>
nslookup kubernetes.default
nc -zv <service-ip> <port>
If connectivity fails, network rules are blocking traffic.
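To separate pod-to-pod failures from outbound failures, also test direct cross-node pod connectivity (a sketch; debug2 is a hypothetical pod name, and the target must be a pod IP on a different node):
kubectl run debug2 --image=busybox --restart=Never -- sleep 3600
kubectl get pods -o wide                       # note each pod's IP and node
kubectl exec debug2 -- ping -c 3 <pod-ip-on-another-node>
kubectl delete pod debug2
# Cross-node success with external failure points at SNAT/outbound rules;
# cross-node failure points at NSG rules between the node and pod CIDRs.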
Check AKS cluster overlay settings:
az aks show --resource-group <rg> --name <cluster> --query networkProfile
The output should show:
- networkPlugin: azure
- networkPluginMode: overlay
- podCidr configured
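To pull just those fields in one call (a sketch using a JMESPath multiselect; the column aliases are arbitrary):
az aks show --resource-group <rg> --name <cluster> \
  --query '{plugin:networkProfile.networkPlugin, mode:networkProfile.networkPluginMode, podCidr:networkProfile.podCidr}' \
  -o table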
Review CNI plugin logs:
kubectl logs -n kube-system -l k8s-app=azure-cni -f
kubectl logs -n kube-system -l k8s-app=azure-ip-masq-agent
Look for download failures, IP allocation errors, or SNAT issues.
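To scan recent logs for failures without following the stream (a sketch; the grep pattern is just a starting point):
kubectl logs -n kube-system -l k8s-app=azure-ip-masq-agent --tail=200 | grep -iE 'error|fail|denied'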
Azure CNI Overlay differs from kubenet: pod traffic is routed natively by the Azure networking stack rather than through per-node custom routes. NSG rules are the most common cause of overlay failures, so review every rule carefully. Migrating an existing cluster to overlay requires Kubernetes 1.22+ and a clean migration path (remove network policies first). Dynamic IP allocation mode is incompatible with overlay. For complex cases, Microsoft's connectivity analysis tooling can help with diagnostics.