AKS disk CSI errors occur when Azure Disk volumes fail to attach, mount, or provision in Kubernetes clusters. Common causes include incorrect RBAC permissions, disk resource group mismatches, case-sensitive URI format issues, or CSI driver installation problems. Fix by verifying service principal permissions, checking disk resource groups, validating volume configurations, and ensuring the Azure Disk CSI driver is properly deployed.
The Azure Disk Container Storage Interface (CSI) driver is a critical component in AKS that manages the lifecycle of persistent volumes backed by Azure Managed Disks. When CSI errors occur, pods cannot attach, mount, or provision disks, causing application failures. These errors stem from misaligned permissions between the AKS service principal and Azure resources, misconfigured volume references, or deployment issues with the CSI driver components. Unlike in-tree volume plugins, CSI drivers run as separate pods on cluster nodes and require explicit registration with kubelet for volume operations to succeed.
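Before changing anything, it can help to confirm the driver is actually registered with the cluster. The CSIDriver and CSINode objects checked below are standard Kubernetes APIs, not AKS-specific; the driver name disk.csi.azure.com assumes the default Azure Disk CSI driver:
# Confirm the Azure Disk CSI driver object exists in the cluster
kubectl get csidriver disk.csi.azure.com
# Each node should list disk.csi.azure.com once the node plugin has registered with kubelet
kubectl get csinodes -o custom-columns='NODE:.metadata.name,DRIVERS:.spec.drivers[*].name'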
The AKS cluster identity (managed identity or service principal) must have the Contributor role on the disk's resource group:
# Get the cluster identity's principal ID (for service-principal-based clusters,
# query servicePrincipalProfile.clientId instead)
SP_ID=$(az aks show --resource-group <rg> --name <cluster-name> \
--query identity.principalId -o tsv)
# Get the disk's resource group (typically the MC_* node resource group)
DISK_RG=$(az disk show --resource-group <rg> --name <disk-name> \
--query resourceGroup -o tsv)
# Check the role assignment
az role assignment list --assignee $SP_ID \
--resource-group $DISK_RG \
--query "[].roleDefinitionName" -o tableIf Contributor is missing, assign it:
az role assignment create \
--assignee $SP_ID \
--role Contributor \
--resource-group $DISK_RG
PersistentVolume disk resource IDs are case-sensitive:
# Get the correct disk resource ID
az disk show --resource-group <rg> --name <disk-name> --query id -o tsv
# Example output:
# /subscriptions/12345678-1234-1234-1234-123456789012/resourceGroups/MC_myRG_myCluster_eastus/providers/Microsoft.Compute/disks/my-disk
In your PersistentVolume YAML, ensure the diskURI exactly matches (case-sensitive):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 32Gi
  accessModes:
    - ReadWriteOnce
  azureDisk:
    kind: Managed
    diskName: my-disk
    diskURI: /subscriptions/12345678-1234-1234-1234-123456789012/resourceGroups/MC_myRG_myCluster_eastus/providers/Microsoft.Compute/disks/my-disk
    fsType: ext4
AKS clusters create a managed resource group (MC_*) for node resources. Pre-created disks normally belong in this group so the cluster identity can manage them:
# Find your cluster's node resource group
NODE_RG=$(az aks show --resource-group <your-rg> --name <cluster-name> \
--query nodeResourceGroup -o tsv)
echo "Node resource group: $NODE_RG"
# List all disks in the node resource group
az disk list --resource-group $NODE_RG --query "[].{name:name, id:id}" -o table
If your disk is in a different resource group, either:
1. Move the disk to the node resource group (see the CLI sketch below), OR
2. Ensure the service principal has Contributor role on the disk's actual resource group
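For the move, a minimal CLI sketch, assuming the disk is detached from any VM and that your subscription allows moving managed disks; $NODE_RG comes from the command above, and <rg> and <disk-name> are placeholders:
# Move a detached managed disk into the node resource group
DISK_ID=$(az disk show --resource-group <rg> --name <disk-name> --query id -o tsv)
az resource move --destination-group "$NODE_RG" --ids "$DISK_ID"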
Moving the disk can also be done through the Azure Portal or Azure Resource Mover.
The CSI driver must be deployed and running on all worker nodes:
# Check if the CSI driver is running
kubectl get pods -n kube-system | grep azuredisk
# Look for azuredisk-csi-node and azuredisk-csi-controller pods
kubectl get daemonset -n kube-system | grep azuredisk
kubectl get deployment -n kube-system | grep azuredisk
Expected output should show:
- azuredisk-csi-node-*: Running on every worker node (DaemonSet)
- azuredisk-csi-controller-*: Running controller pod(s)
If pods are missing or failing:
kubectl describe pod -n kube-system <azuredisk-csi-node-pod>
kubectl logs -n kube-system <azuredisk-csi-node-pod>
Install the official Azure Disk CSI driver using Helm (mainly for self-managed clusters; AKS running Kubernetes 1.21+ ships the driver as a managed component):
# Add the Azure Disk CSI driver Helm repository
helm repo add azuredisk-csi-driver https://raw.githubusercontent.com/kubernetes-sigs/azuredisk-csi-driver/master/charts
helm repo update
# Install the driver
helm install azuredisk-csi-driver azuredisk-csi-driver/azuredisk-csi-driver \
-n kube-system \
--set controller.replicas=2 \
--set node.tolerations[0].key=node-role.kubernetes.io/master \
--set node.tolerations[0].operator=Exists \
--set node.tolerations[1].key=node-role.kubernetes.io/control-plane \
--set node.tolerations[1].operator=Exists
Or upgrade if already installed:
helm upgrade azuredisk-csi-driver azuredisk-csi-driver/azuredisk-csi-driver \
-n kube-system
Wait for deployment to complete:
kubectl rollout status daemonset/azuredisk-csi-node -n kube-system
kubectl rollout status deployment/azuredisk-csi-controller -n kube-system
Each Azure VM SKU has a maximum number of data disks that can attach:
# Check your node VM sizes
kubectl describe nodes | grep "node.kubernetes.io/instance-type"
# Common maximum data disk counts:
# Standard_B1s: 2 disks
# Standard_B2s: 4 disks
# Standard_D2s_v3: 4 disks
# Standard_D4s_v3: 8 disks
# Standard_D8s_v3: 16 disks
# Standard_D16s_v3: 32 disks
# Standard_E4s_v3: 8 disks
# Standard_E8s_v3: 16 disks
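Rather than relying on a static table, the limit for any SKU can be read from the compute SKU capabilities; a sketch, assuming the Azure CLI and substituting your own region and VM size:
# Query MaxDataDiskCount for a VM size in a region (eastus and Standard_D4s_v3 are placeholders)
az vm list-skus --location eastus --size Standard_D4s_v3 \
  --query "[0].capabilities[?name=='MaxDataDiskCount'].value" -o tsv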
Count current disk attachments on a node (the node status lists them under .status.volumesAttached):
kubectl get node <node-name> -o jsonpath='{.status.volumesAttached[*].name}' | wc -w
If approaching the limit, either:
1. Use larger VM SKUs with higher attachment limits, OR
2. Use Azure Container Storage or other storage solutions (NFS, Azure Files) instead (see the Azure Files sketch below)
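For option 2, a minimal sketch of a claim backed by Azure Files instead of a managed disk, assuming the azurefile-csi StorageClass that AKS creates by default (the claim name shared-data is illustrative):
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data   # illustrative name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 10Gi
EOF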
Scale to a larger SKU by adding a new node pool (an existing pool's VM size cannot be changed in place), then drain and delete the old pool:
az aks nodepool add --resource-group <rg> --cluster-name <cluster> \
  --name <new-nodepool> --node-vm-size Standard_D8s_v3
Ensure your StorageClass uses the disk.csi.azure.com provisioner (for example, the built-in managed-csi class):
kubectl get storageclass
kubectl describe storageclass managed-csi
Expected output should show:
Provisioner: disk.csi.azure.com
Parameters:
  kind: Managed
  storageaccounttype: Premium_LRS # or Standard_LRS
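If no CSI-backed class exists (for example on a self-managed cluster), a minimal one can be created. This is a sketch using documented azuredisk-csi-driver parameters; the class name and skuName are illustrative:
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi   # illustrative; pick a non-conflicting name if the class already exists
provisioner: disk.csi.azure.com
parameters:
  skuName: StandardSSD_LRS
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF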
Create a test PersistentVolumeClaim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-azure-disk-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-csi
  resources:
    requests:
      storage: 10Gi
Apply and check status:
kubectl apply -f test-pvc.yaml
kubectl describe pvc test-azure-disk-pvc
kubectl get pvc test-azure-disk-pvc -w # Watch for Bound status
If still failing, check events:
kubectl describe pvc test-azure-disk-pvc
kubectl get events -n default --sort-by='.lastTimestamp' | grep -i disk
Check the CSI driver logs to see the underlying Azure API calls:
# Check azuredisk-csi-controller logs
kubectl logs -n kube-system -l app=azuredisk-csi-controller -c azuredisk
# Check azuredisk-csi-node logs
kubectl logs -n kube-system -l app=azuredisk-csi-node -c azuredisk
# Look for specific errors like:
# - "RequestFailed"
# - "Forbidden"
# - "NotFound"
# - "InvalidParameter"For authentication issues:
# Verify cluster can reach Azure Resource Manager
kubectl run -it --image=mcr.microsoft.com/azure-cli:latest --rm debug -- bash
# Inside the pod (substitute your service principal credentials):
az login --service-principal -u $AZURE_CLIENT_ID -p $AZURE_CLIENT_SECRET --tenant $AZURE_TENANT_ID
az disk show --ids "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP/providers/Microsoft.Compute/disks/DISK_NAME"
AKS disk CSI driver behavior varies by Kubernetes version and Azure region. Ensure your AKS cluster is on a supported Kubernetes version (1.18+); older versions may require custom CSI driver configurations. For stop/start scenarios, there is a known issue where disks do not reattach to VMSS instances after a cluster restart; manually detach the disks in the Azure Portal before restarting. Azure Container Storage is an emerging alternative that offers higher throughput and lower latency than individually managed disks. For multi-zone clusters, use allowedTopologies in the StorageClass to ensure disks are provisioned in the same zone as their pods. Premium_LRS disks cost more but guarantee IOPS; Standard_LRS is suitable for dev/test. Use fsGroupChangePolicy: OnRootMismatch in Kubernetes 1.20+ to avoid performance penalties during large permission changes. For rootless deployments and SELinux-enabled nodes, ensure the CSI driver pods have an appropriate securityContext. Consider using Azure Policy to enforce disk encryption and compliance requirements. Monitor CSI metrics via Prometheus if available; watch provisioning latency and attachment failure rates.
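Where zone pinning is needed, a sketch of a zone-restricted StorageClass, assuming a cluster in eastus and the azuredisk CSI topology key topology.disk.csi.azure.com/zone (the class name and zone value are illustrative):
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-zone1   # illustrative name
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.disk.csi.azure.com/zone
        values:
          - eastus-1
EOF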