State refresh failures occur when Terraform can't synchronize with remote infrastructure due to state locks, connectivity issues, or resource changes. This guide covers diagnosis and recovery strategies for different root causes.
The "Failed to refresh state" error occurs when Terraform attempts to synchronize its state file with the actual state of your infrastructure but encounters an error in the process. Terraform refreshes state during plan and apply operations to ensure it has accurate information about what resources currently exist. This error can stem from several distinct issues: remote state backend problems (S3, Azure, Terraform Cloud), active state locks from incomplete operations, credential or authentication failures, network connectivity issues, or resources that were deleted or modified outside of Terraform. Unlike some transient failures, a refresh failure indicates a deeper problem that must be resolved before Terraform can proceed safely.
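Before working through the steps, a quick triage of the raw error text can point you to the right branch of this guide. A minimal sketch, assuming you have captured Terraform's stderr; the substrings and the classify_refresh_error helper are illustrative, not a Terraform interface:

```shell
# Hypothetical triage helper: map common error substrings to the
# root causes covered in this guide. Patterns are illustrative.
classify_refresh_error() {
  case "$1" in
    *"Error acquiring the state lock"*) echo "state-lock" ;;
    *"NoCredentialProviders"*|*"InvalidAccessKeyId"*|*"401"*) echo "credentials" ;;
    *"no such host"*|*"connection refused"*|*"i/o timeout"*) echo "network" ;;
    *) echo "unknown" ;;
  esac
}

# Real usage would capture stderr from a plan, e.g.:
#   err=$(terraform plan -input=false 2>&1 >/dev/null) || classify_refresh_error "$err"
classify_refresh_error "Error acquiring the state lock: ConditionalCheckFailedException"
```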
Step 1: Check for a stale state lock

First, determine whether an active state lock is blocking operations:

terraform state list

If this command also fails with a lock error, you have a stale lock. List the state locks in your backend.
For an S3 backend with DynamoDB locking:

aws dynamodb scan --table-name terraform-lock

For Azure Blob Storage:
Log into the Azure Portal, navigate to the Storage Account and Blob Container containing your state, and check the blob's properties for an active lease.
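Alternatively, when the lock error comes from the Terraform CLI itself, the lock ID appears in a "Lock Info:" block in the error output and can be scraped directly. A sketch using a captured sample; the lock ID, path, and operation values below are illustrative:

```shell
# Terraform prints a "Lock Info:" block when it fails to acquire the
# state lock. Extract the ID field from captured output.
sample_error='Error: Error acquiring the state lock

Lock Info:
  ID:        9db590f1-b6fe-c5f2-2678-8804f089deba
  Path:      mybucket/path/terraform.tfstate
  Operation: OperationTypePlan'

# Print the second field of the first "ID:" line, then stop.
lock_id=$(printf '%s\n' "$sample_error" | awk '/^ *ID:/ {print $2; exit}')
echo "$lock_id"
```

The extracted value is what terraform force-unlock expects in Step 3.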
If you confirm a stale lock, proceed to Step 3. If no lock exists, move to Step 2.
Step 2: Verify backend credentials and connectivity

Ensure your backend credentials and connectivity are correct before attempting recovery.
For S3 backend:
aws s3 ls s3://your-bucket/path/to/terraform.tfstate

For Azure Blob Storage:

az storage blob exists --account-name youraccount --container-name yourcontainer --name terraform.tfstate

For Terraform Cloud:

curl -H "Authorization: Bearer $TFE_TOKEN" \
  https://app.terraform.io/api/v2/state-versions

If connectivity fails, verify:
- AWS credentials are set (check aws sts get-caller-identity)
- Azure credentials are set (az account show)
- Terraform Cloud API token is set and valid (check that echo $TFE_TOKEN prints a value)
- Network allows outbound HTTPS connections
- Backend configuration in Terraform code is correct
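It is also worth confirming that the backend block committed in your code points at the object you are actually debugging. A self-contained sketch; the file backend_example.tf and its values are placeholders for your real configuration:

```shell
# Write a sample backend block to a file (stand-in for your real code).
cat > backend_example.tf <<'EOF'
terraform {
  backend "s3" {
    bucket         = "your-bucket"
    key            = "path/to/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-lock"
  }
}
EOF

# Pull out the configured bucket and key for a quick eyeball check
# against what you queried in the connectivity tests above.
bucket=$(awk -F'"' '/bucket/ {print $2; exit}' backend_example.tf)
key=$(awk -F'"' '/key/ {print $2; exit}' backend_example.tf)
echo "state object: s3://$bucket/$key"
```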
Step 3: Force-unlock the state

If Step 1 confirmed a stale lock and Step 2 confirmed backend connectivity, unlock the state:

terraform force-unlock <LOCK_ID>

Replace <LOCK_ID> with the lock ID from the state lock table (from Step 1).
For S3/DynamoDB:
# Get the lock ID from DynamoDB
LOCK_ID=$(aws dynamodb scan --table-name terraform-lock \
--query 'Items[0].ID.S' --output text)
terraform force-unlock $LOCK_ID

For Azure:
In the Azure Portal, navigate to the blob properties and use the "Break lease" button to release the lock.
After unlocking, attempt the operation again:

terraform refresh

Step 4: Inspect drift with refresh-only mode

Once basic connectivity is restored, use refresh-only mode to identify what state differences exist without applying changes:

terraform plan -refresh-only

This command will:
- Attempt to fetch current state
- Compare with your Terraform code
- Show what has changed in your infrastructure
- NOT apply any changes
Review the output carefully. If resources appear as destroyed that shouldn't be, this indicates resources were deleted outside Terraform (see Step 5).
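For automation, terraform plan supports -detailed-exitcode, which exits 0 when state is in sync, 1 on error, and 2 when changes (drift) are present. A small interpreter sketch, shown here with a simulated exit code rather than a live Terraform run:

```shell
# Map the documented -detailed-exitcode values to human-readable results.
interpret_plan_exit() {
  case "$1" in
    0) echo "state in sync" ;;
    2) echo "drift detected - review before apply" ;;
    *) echo "plan failed" ;;
  esac
}

# Real usage (commented out so the sketch stays self-contained):
#   terraform plan -refresh-only -detailed-exitcode; interpret_plan_exit $?
interpret_plan_exit 2
```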
If the plan succeeds, apply the refresh to update your local state:

terraform apply -refresh-only

Step 5: Reconcile resources changed outside Terraform

If refresh-only mode shows resources that no longer exist in the cloud (marked for removal), you have three options:
Option A: Remove them from state (if deletion was intentional):

terraform state rm 'aws_instance.example'

This removes the resource from Terraform's tracking without destroying it in the cloud.
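If many resources were deleted intentionally, Option A can be scripted. A dry-run sketch; the address list is illustrative, and DRY_RUN is a convention of this snippet, not a Terraform flag:

```shell
# Preview (or execute) terraform state rm for a batch of addresses.
DRY_RUN=${DRY_RUN:-1}
addresses='aws_instance.example
aws_security_group.example'

plan_rm() {
  printf '%s\n' "$addresses" | while IFS= read -r addr; do
    echo "terraform state rm '$addr'"
  done
}

if [ "$DRY_RUN" = "1" ]; then
  plan_rm        # preview only: print the commands
else
  plan_rm | sh   # actually execute the removals
fi
```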
Option B: Import the resource back (if it still exists but state is confused):

terraform import aws_instance.example i-1234567890abcdef0

Option C: Restore the resource in the cloud (if it was deleted but should exist):
Manually recreate the resource using the cloud provider's console or CLI, then:

terraform import <resource_type>.<name> <resource_id>

After state corrections, try the refresh again:

terraform refresh

Step 6: Allow for transient backend delays

If the issue is temporary (backend slowness, eventual consistency in S3), use the -lock-timeout flag to give Terraform more time:
terraform plan -lock-timeout=10m

For operations that consistently time out, increase the timeout:

terraform apply -lock-timeout=20m

Wait 1-2 minutes if using an S3 backend (eventual consistency), then retry:
sleep 120
terraform plan

If problems persist after these steps, escalate to backend service support (HashiCorp Terraform Cloud, AWS, Azure, etc.) with the full error logs:

TF_LOG=DEBUG terraform plan 2>&1 | tee terraform-debug.log

State Refresh and Eventual Consistency:
When using S3 as a backend, the "state data in S3 does not have the expected content" error means the state object's checksum does not match the digest Terraform recorded in the locking table. Historically this was often caused by S3's eventual consistency (S3 has offered strong read-after-write consistency since December 2020). If it occurs, wait 1-2 minutes and retry.
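For transient backend errors, a retry-with-backoff wrapper captures the pause-and-retry advice above. A sketch using a stub in place of the real command; substitute flaky_refresh with your actual terraform refresh or terraform plan invocation:

```shell
# Stub that fails twice before succeeding, standing in for a flaky
# backend call. Replace with the real terraform command in practice.
attempt=0
flaky_refresh() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]   # succeed on the third try
}

retries=0
until flaky_refresh; do
  retries=$((retries + 1))
  if [ "$retries" -ge 5 ]; then
    echo "giving up after $retries retries"
    exit 1
  fi
  sleep "$retries"   # linear backoff; use longer waits for real backends
done
echo "succeeded after $retries retries"
```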
S3 State Lock Timeout Bug:
In some Terraform versions, if a lock timeout expires before acquiring the lock, Terraform may not retry properly if another process releases the lock during the wait. Force unlock and retry manually if this occurs.
Azure Lease Expiration:
Azure Blob Storage uses 60-second leases for state locking. If a Terraform process crashes, the lease automatically expires within 60 seconds. Manual intervention is only needed for process hangs beyond 60 seconds.
Preventing Future Lock Issues:
Use CI/CD pipeline mechanisms to prevent concurrent Terraform runs:
- GitLab: Use resource locks in pipelines
- GitHub Actions: Use concurrency groups
- Jenkins: Use lock mechanisms
- Terraform Cloud: Enforces mutual exclusion automatically
Disabling Locks (Not Recommended):
Use terraform plan -lock=false only for inspection operations (never with apply) if locks are permanently stuck. This is dangerous and should only be used as a last resort after confirming with your team that no other operations are running.
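One way to make that rule hard to violate is a small wrapper that rejects the apply/-lock=false combination outright. A hypothetical sketch; safe_terraform is a team convention invented here, not a real tool:

```shell
# Guard wrapper: allow read-only inspection with locking disabled,
# but refuse to run apply with -lock=false.
safe_terraform() {
  case "$*" in
    *apply*-lock=false*|*-lock=false*apply*)
      echo "refusing: never disable locking for apply" >&2
      return 1
      ;;
  esac
  echo "ok to run: terraform $*"
  # terraform "$@"   # real invocation would go here
}

safe_terraform plan -lock=false
safe_terraform apply -lock=false || true
```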
Related errors:
- "Error installing helm release: cannot re-use a name that is still in use" (Terraform with Helm)
- "Error creating GKE Cluster: BadRequest"
- "External program failed to produce valid JSON"
- "Unsupported argument in child module call"
- "network is unreachable"