The AKS policy addon error occurs when Azure Policy fails to install, initialize, or enforce policies on your Kubernetes cluster. Common causes include missing prerequisites, Azure Policy addon not enabled, gatekeeper pod failures, or policy constraint violations.
The Azure Policy Add-on extends Gatekeeper v3 (an admission controller webhook for Open Policy Agent) to enforce organizational policies across AKS clusters. When this addon encounters an error, it means the policy enforcement layer failed to initialize or is unable to process policy definitions. This can manifest as installation failures during addon enablement, gatekeeper pods refusing to start, policy constraint violations blocking resource creation, or policy assignments not syncing to the cluster. The addon serves as a control plane for compliance, so failures directly impact your ability to enforce security policies, governance rules, and organizational standards across containerized workloads.
Check if the addon is installed on your cluster:
az aks show -g <resource-group> -n <cluster-name> --query "addonProfiles.azurepolicy"
The expected output shows "enabled": true. If the addon is not enabled, install it:
az aks enable-addons -g <resource-group> -n <cluster-name> --addons azure-policy
Wait 5-10 minutes for the addon to initialize:
kubectl get pods -n kube-system | grep azure-policy
kubectl get pods -n gatekeeper-system
Check if gatekeeper components are healthy:
kubectl get pods -n gatekeeper-system
kubectl describe pod -n gatekeeper-system # Shows pod events
kubectl logs -n gatekeeper-system <pod-name> --all-containers=true # View logs for one pod
Expected pods: gatekeeper-audit-*, gatekeeper-controller-manager-*. If pods show CrashLoopBackOff or Error status:
kubectl logs -n gatekeeper-system -l control-plane=controller-manager --tail=50
Look for RBAC, webhook binding, or resource contention errors.
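The pod health check above can be narrowed to only the pods that need attention. The following is a minimal sketch, not part of the addon tooling: `unhealthy_pods` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods --no-headers` (name, ready, status, restarts, age).

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the name and status of every pod whose STATUS column is
# neither Running nor Completed.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (requires cluster access):
#   kubectl get pods -n gatekeeper-system --no-headers | unhealthy_pods
```

Note that this only inspects the STATUS column; a pod that is Running but not ready (for example 0/1) is not flagged.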
Azure Policy uses a webhook to intercept API requests. Verify it's configured:
kubectl get validatingwebhookconfiguration # Cluster-scoped, so no namespace flag is needed
kubectl describe validatingwebhookconfiguration gatekeeper-validating-webhook-configuration
Verify the webhook service is accessible:
kubectl get service -n gatekeeper-system
kubectl port-forward -n gatekeeper-system svc/gatekeeper-webhook-service 8443:443 # Service port is 443 in a default Gatekeeper install; confirm with the previous command
If the webhook certificate has expired (common after long cluster idle periods), redeploy the addon:
az aks disable-addons -g <resource-group> -n <cluster-name> --addons azure-policy
sleep 60
az aks enable-addons -g <resource-group> -n <cluster-name> --addons azure-policy
The addon service principal needs cluster role bindings. Check:
kubectl get clusterrolebinding | grep azure-policy
kubectl get clusterrole | grep azure-policy
If these are missing, the addon cannot function. Verify that you have the Azure RBAC permissions needed to enable addons:
az role assignment list --assignee <user-or-principal-id> --scope /subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/<cluster-name> --include-inherited -o table
You need the "Microsoft.ContainerService/managedClusters/write" permission. If permission is denied:
- Ask the subscription owner to grant the "Azure Kubernetes Service Contributor Role" or "Contributor" role
- Or run az aks update after the role assignment:
az aks update -g <resource-group> -n <cluster-name>
Network policies or Pod Security Standards (PSS) may block gatekeeper. Verify ingress to the webhook:
kubectl get networkpolicy -n gatekeeper-system
kubectl describe pod -n gatekeeper-system -l control-plane=controller-manager | grep -i "network\|security"
If PSS is enforcing restricted mode, it may block the webhook:
kubectl describe namespace gatekeeper-system | grep pod-security
If restricted, temporarily relax it for the gatekeeper namespace:
kubectl label namespace gatekeeper-system pod-security.kubernetes.io/enforce=baseline --overwrite
Then redeploy gatekeeper.
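To avoid relabeling a namespace that is not actually in restricted mode, the check and the relabel can be combined. This is a sketch under assumptions: `relax_pss_if_restricted` is a hypothetical function name, and it relies on the standard `pod-security.kubernetes.io/enforce` namespace label used in the commands above.

```shell
# Hypothetical helper: lowers the PSS enforce level of a namespace to
# "baseline" only when it is currently "restricted".
relax_pss_if_restricted() {
  namespace=$1
  # Dots inside the label key must be escaped in the jsonpath expression.
  level=$(kubectl get namespace "$namespace" \
    -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}')
  if [ "$level" = "restricted" ]; then
    kubectl label namespace "$namespace" \
      pod-security.kubernetes.io/enforce=baseline --overwrite
  else
    echo "enforce level is '${level:-unset}'; leaving $namespace unchanged"
  fi
}

# Usage (requires cluster access):
#   relax_pss_if_restricted gatekeeper-system
```

Because the label change is meant to be temporary, record the original level so it can be restored once gatekeeper is healthy.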
Check if constraint templates are deployed:
kubectl get constrainttemplates
kubectl describe constrainttemplate k8srequiredlabels # Example name; use one listed by the previous command
If templates are missing, policy enforcement cannot start. For policy assignments in Azure:
az policy assignment list --scope /subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/<cluster-name>
Assignments take up to 20 minutes to sync. If sync fails:
kubectl logs -n kube-system -l app=azure-policy --tail=100
Look for "failed to apply" or "constraint not found" errors.
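Scanning the addon logs for the two messages named above can be scripted. This is a minimal sketch: `sync_errors` is a hypothetical helper, and the search strings are taken from this article, so the exact wording in your logs may differ.

```shell
# Hypothetical helper: reads log lines on stdin and prints those that
# mention either sync error string (case-insensitive). Exits non-zero
# when nothing matches, as grep does.
sync_errors() {
  grep -iE 'failed to apply|constraint not found'
}

# Usage (requires cluster access):
#   kubectl logs -n kube-system -l app=azure-policy --tail=100 | sync_errors
```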
Azure Policy addon requires Kubernetes 1.14+. Check:
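kubectl version # The --short flag has been removed from recent kubectl releases
az aks show -g <resource-group> -n <cluster-name> --query kubernetesVersion
If the cluster is older than 1.14, upgrade it:
az aks upgrade -g <resource-group> -n <cluster-name> --kubernetes-version 1.27
Note: 1.27 is only an example target; list the versions available to your cluster with az aks get-upgrades. AKS introduced long-term support (LTS) starting with Kubernetes 1.27, but LTS is an opt-in feature rather than a property of every later version. After the upgrade, re-enable the addon if needed.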
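The minimum-version check can be automated by comparing major and minor numbers. The sketch below is illustrative: `version_at_least` is a hypothetical helper, and it assumes version strings of the form `1.27.3` or `v1.27.3`.

```shell
# Hypothetical helper: succeeds when the actual version ($1) is at least
# the minimum version ($2), comparing major and minor components only.
version_at_least() {
  actual=${1#v}      # strip an optional leading "v"
  minimum=${2#v}
  a_major=${actual%%.*};  a_rest=${actual#*.};  a_minor=${a_rest%%.*}
  m_major=${minimum%%.*}; m_rest=${minimum#*.}; m_minor=${m_rest%%.*}
  [ "$a_major" -gt "$m_major" ] && return 0
  [ "$a_major" -eq "$m_major" ] && [ "$a_minor" -ge "$m_minor" ]
}

# Usage (requires Azure CLI access):
#   v=$(az aks show -g <resource-group> -n <cluster-name> --query kubernetesVersion -o tsv)
#   version_at_least "$v" 1.14 || echo "cluster is too old for the addon"
```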
If addon still fails after checks, perform a clean redeploy:
# 1. Disable addon
az aks disable-addons -g <resource-group> -n <cluster-name> --addons azure-policy
# 2. Wait for cleanup
sleep 120
# 3. Verify removed
kubectl get pods -n gatekeeper-system # Should be empty
kubectl get pods -n kube-system | grep azure-policy # Should be gone
# 4. Re-enable
az aks enable-addons -g <resource-group> -n <cluster-name> --addons azure-policy
# 5. Monitor initialization
watch kubectl get pods -n gatekeeper-system
Initialization can take 5-15 minutes. Watch for all pods to reach the "Running" state.
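For unattended runs, the interactive watch in step 5 can be replaced with a polling loop that gives up after a timeout. This is a sketch, not part of the addon tooling: `wait_for_pods_running` is a hypothetical function, and the 900-second default simply mirrors the upper end of the 5-15 minute estimate above.

```shell
# Hypothetical helper: polls until every pod in the namespace reports
# STATUS Running, or returns non-zero once the timeout has elapsed.
#   $1 = namespace, $2 = timeout in seconds (default 900),
#   $3 = polling interval in seconds (default 15)
wait_for_pods_running() {
  namespace=$1
  timeout_s=${2:-900}
  interval_s=${3:-15}
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    statuses=$(kubectl get pods -n "$namespace" --no-headers 2>/dev/null | awk '{print $3}')
    # Succeed only if at least one pod exists and none has a non-Running status.
    if [ -n "$statuses" ] && ! printf '%s\n' "$statuses" | grep -qv '^Running$'; then
      return 0
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 1
}

# Usage (requires cluster access):
#   wait_for_pods_running gatekeeper-system || echo "gatekeeper did not become healthy in time"
```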
For persistent issues, enable Azure diagnostic logs:
az monitor diagnostic-settings create \
--name aks-policy-diags \
--resource /subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/<cluster-name> \
--logs '[{"category": "kube-apiserver", "enabled": true}, {"category": "kube-audit-admin", "enabled": true}]' \
--workspace <log-analytics-workspace-id>
Then query logs in Log Analytics:
ContainerLog
| where TimeGenerated > ago(1h)
| where LogEntry contains "azure-policy" or LogEntry contains "gatekeeper"
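| project TimeGenerated, ContainerID, LogEntry
The ContainerLog table is populated by Container insights, so this query returns rows only when that feature is enabled on the cluster. This helps identify root causes in complex scenarios.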
In production, policy addon failures can block security-critical deployments. Always test policy changes in non-prod clusters first. Policy assignments can take up to 20 minutes to sync; do not retry immediately if an assignment appears not to be applied. For variants such as AKS on Azure Stack HCI, ensure proper network routing to the webhook endpoint. When combining gatekeeper with Pod Security Standards (PSS), note that both act at admission time: PSS is enforced by the built-in Pod Security Admission controller, while gatekeeper runs as a validating webhook, so a pod must pass both and a rejection can come from either layer. For air-gapped clusters, ensure container images for azure-policy and gatekeeper are cached in your private registry. Azure Policy for Kubernetes uses the OPA (Open Policy Agent) Rego language for policy logic; invalid Rego may not surface as an obvious error, so check the status of the constraint template. For cost-sensitive deployments, note that audit-mode policies still consume gatekeeper resources even when they are not enforcing.