AKS virtual node errors occur when pods fail to run on Azure Container Instances (ACI) due to networking, permissions, or quota issues. These errors typically involve authentication failures, container group quota limits, or misconfigured subnets.
Azure AKS virtual nodes enable you to run Kubernetes pods on Azure Container Instances (ACI) without managing underlying virtual machines. However, virtual nodes require careful configuration of networking, managed identities, and quotas. When virtual node errors occur, pods fail to start, hang in "Creating" status, or encounter authentication issues communicating with the AKS API server and Azure services. Virtual node errors differ from regular Kubernetes pod errors because they involve coordination between the ACI connector (a Virtual Kubelet pod running in your cluster) and Azure Container Instances APIs. Misconfiguration at the Azure networking or IAM level often cascades to pod deployment failures.
Virtual nodes only work with Azure CNI networking. Check your AKS cluster:
az aks show --resource-group <rg> --name <cluster-name> --query networkProfile.networkPlugin
The output should be "azure". If it is "kubenet":
- Virtual nodes cannot be enabled on kubenet clusters
- You must create a new AKS cluster with Azure CNI or migrate workloads
When creating an AKS cluster with virtual nodes:
az aks create --resource-group <rg> --name <cluster> --enable-addons virtual-node --vnet-subnet-id <subnet-id> --network-plugin azure
Check if the virtual node is managed by AKS:
kubectl get nodes
kubectl describe node virtual-node-aci-linux  # Or your virtual node name
If the virtual node appears, check if it's from the managed addon:
az aks addon list --resource-group <rg> --name <cluster>
az aks addon show --resource-group <rg> --name <cluster> --addon virtual-node
If the virtual-node addon is disabled, enable it:
az aks enable-addons --resource-group <rg> --name <cluster> --addons virtual-node --subnet-name <subnet-name>
The virtual kubelet runs as the aci-connector-linux pod:
kubectl get pods -n kube-system | grep aci-connector
kubectl logs -n kube-system -l app=aci-connector-linux -f
kubectl describe pod -n kube-system -l app=aci-connector-linux
Common error patterns:
- "subnet lookup failed" → subnet misconfigured or aci-connector identity lacks Network Contributor
- "authentication failed" → Managed Identity not properly assigned
- "CrashLoopBackOff" → Missing role assignments or network policy blocking connector
For identity issues, verify the aci-connector has the correct Managed Identity:
az aks show --resource-group <rg> --name <cluster> --query addonProfiles.aciConnectorLinux.identity
The subnet used by ACI must be clean and properly configured:
az network vnet subnet show --resource-group <rg> --vnet-name <vnet> --name <subnet-name> --query "{routeTable: routeTable, delegations: delegations}"
Requirements:
- No attached route table (routeTable should be null or managed by Azure)
- No other resources should exist in subnet except ACI container groups
- For manual setup, verify the subnet is delegated to the ACI connector (Microsoft.ContainerInstance/containerGroups); see the example after this list
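If you created the subnet manually, the delegation can be added with the Azure CLI; a sketch, reusing the placeholder names from the commands above:
az network vnet subnet update --resource-group <rg> --vnet-name <vnet> --name <subnet-name> --delegations Microsoft.ContainerInstance/containerGroups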
If a route table is attached:
az network vnet subnet update --resource-group <rg> --vnet-name <vnet> --name <subnet-name> --remove routeTable
If you see a "ContainerGroupQuotaReached" error, the default limit of 100 container groups per region has been reached:
az container list --resource-group <rg> --query "[].{name:name, state:containers[0].instanceView.currentState.state}" -o table
Delete succeeded container instances:
az container delete --resource-group <rg> --name <container-name> --yes
Or clean up all completed instances:
for container in $(az container list --resource-group <rg> --query "[?containers[0].instanceView.currentState.state=='Succeeded'].name" -o tsv); do
az container delete --resource-group <rg> --name "$container" --yes
done
For batch job pods, add appropriate cleanup policies in your Kubernetes manifests; a sketch using ttlSecondsAfterFinished follows.
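For example, a Kubernetes Job can remove itself (and therefore its ACI container group) after completion via ttlSecondsAfterFinished. A minimal sketch; the job name, image, and TTL value are illustrative placeholders:
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-cleanup-example
spec:
  ttlSecondsAfterFinished: 300   # delete the Job and its pods 5 minutes after it finishes
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        kubernetes.io/hostname: virtual-node-aci-linux   # or your virtual node name
      tolerations:
        - key: virtual-kubelet.io/provider
          operator: Exists
      containers:
        - name: worker
          image: myregistry.azurecr.io/myjob:tag
          command: ["/bin/sh"]
          args: ["-c", "echo done"]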
Standard Kubernetes image pull secrets don't work with ACI. For example, the imagePullSecrets in this pod spec will not be honored by the virtual node:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  imagePullSecrets:
    - name: regcred   # This won't work for ACI
  containers:
    - name: mycontainer
      image: myregistry.azurecr.io/myimage:tag
Instead, configure credentials in the pod spec directly if using a private registry:
spec:
  containers:
    - name: mycontainer
      image: myregistry.azurecr.io/myimage:tag
      imagePullPolicy: IfNotPresent
For Azure Container Registry (ACR), use the AKS integration:
az aks update --resource-group <rg> --name <cluster> --attach-acr <acr-name>
This allows ACI virtual nodes to pull images without explicit secrets.
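To confirm that the cluster can actually pull from the registry, recent Azure CLI versions provide az aks check-acr; a sketch, assuming your registry is <acr-name>.azurecr.io:
az aks check-acr --resource-group <rg> --name <cluster> --acr <acr-name>.azurecr.io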
ACI has stricter requirements than regular Kubernetes. Verify:
1. If you provide args, you must also provide command:
spec:
  containers:
    - name: mycontainer
      image: myimage:tag
      command: ["/bin/sh"]
      args: ["-c", "echo hello"]
2. Check for unsupported fields (ACI may not support some Kubernetes features):
- SecurityContext (runAsUser, fsGroup, seLinuxOptions)
- Workload Identity / Managed Identity (not supported on virtual nodes)
- Daemon sets (cannot run on virtual nodes)
- StatefulSets with PVC (limited support)
3. Use nodeSelector and tolerations to target the virtual node explicitly:
spec:
  nodeSelector:
    kubernetes.io/hostname: virtual-node-aci-linux   # Or your virtual node name
  tolerations:
    - key: virtual-kubelet.io/provider
      operator: Exists
The aci-connector identity must have the correct Azure roles:
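Putting this together, a complete manifest that schedules a pod onto the virtual node might look like the sketch below; the pod name is a placeholder, the image is Microsoft's public aci-helloworld sample, and the extra azure.com/aci toleration follows the pattern commonly shown in virtual node samples:
apiVersion: v1
kind: Pod
metadata:
  name: aci-demo
spec:
  nodeSelector:
    kubernetes.io/hostname: virtual-node-aci-linux   # or your virtual node name
  tolerations:
    - key: virtual-kubelet.io/provider
      operator: Exists
    - key: azure.com/aci
      effect: NoSchedule
  containers:
    - name: demo
      image: mcr.microsoft.com/azuredocs/aci-helloworld
      ports:
        - containerPort: 80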
# Find the aci-connector managed identity
az identity list --resource-group <node-rg> --query "[?contains(name, 'aci-connector')].{name:name, id:id}"
# Verify Network Contributor role on the ACI subnet
az role assignment list --assignee <identity-principal-id> --scope <subnet-id> --query "[].roleDefinitionName"
If missing, assign the role:
az role assignment create --assignee <identity-principal-id> --role "Network Contributor" --scope <subnet-id>
For ACR access (if using private images):
az role assignment create --assignee <identity-principal-id> --role "AcrPull" --scope <acr-id>
Role assignments can take a few minutes to propagate.
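If you want to wait for propagation instead of guessing, a small polling loop works; a sketch, reusing the <identity-principal-id> and <subnet-id> placeholders from above:
while [ -z "$(az role assignment list --assignee <identity-principal-id> --scope <subnet-id> --query "[?roleDefinitionName=='Network Contributor'].id" -o tsv)" ]; do
  echo "Waiting for role assignment to propagate..."
  sleep 15
done
echo "Role assignment is visible."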
Azure AKS virtual nodes are built on Virtual Kubelet, a project that abstracts container orchestration platforms as Kubernetes nodes. Virtual nodes in AKS specifically use Azure Container Instances (ACI) as the underlying compute platform. Key advanced considerations:
1. Virtual nodes and regular node pools operate independently: pods scheduled to virtual nodes do not count toward your VM resource utilization, making them ideal for burst workloads, but costs scale with actual container runtime.
2. ACI does not support certain Kubernetes features: no daemon sets, limited persistent volume support, no init containers with args/command combinations, and no privileged security contexts.
3. Metrics scraping (Prometheus, monitoring agents) does not work on virtual nodes due to API limitations.
4. For production workloads with strict compliance requirements (SELinux, AppArmor, advanced networking policies), avoid virtual nodes or use them only for stateless, temporary jobs.
5. Container instance cleanup is critical for cost management: use finalizers or Kubernetes Job backoffLimit + ttlSecondsAfterFinished to auto-clean succeeded or failed container instances.
6. Multi-tenant scenarios: ACI container groups in the same subnet/region can theoretically leak information; isolate sensitive workloads using separate vNets or disable virtual nodes if not needed.
7. Virtual nodes excel at CI/CD and batch processing; use them for build jobs, testing, and ephemeral workloads rather than long-running services.