A "retries exhausted" error occurs when Terraform or a provider SDK uses up all of its configured retry attempts while waiting for an API operation to succeed. It typically shows up with rate limiting, transient network errors, or API instability.
When a Terraform provider makes an API call, built-in retry logic handles transient failures (network blips, temporary API errors, rate limiting). Each provider has a maximum number of retries (25 by default for the AWS provider) and a maximum duration to keep retrying. When the operation fails repeatedly and reaches either the retry count limit or the retry timeout duration, Terraform reports "retries exhausted": the provider gave up because the error persisted despite multiple attempts.
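To make those two limits concrete, this is roughly how they map onto the knobs you can actually set, both of which are covered in detail in the fixes below (the values here are illustrative):

```hcl
provider "aws" {
  region      = "us-east-1"
  max_retries = 25 # the retry *count* limit for each API call
}

resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  timeouts {
    create = "15m" # the *duration* limit while waiting on the resource
  }
}
```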
Before making configuration changes, verify the cloud provider isn't experiencing an outage:
```
# AWS status page
https://health.aws.amazon.com/

# Azure status page
https://status.azure.com/

# Google Cloud status page
https://status.cloud.google.com/
```

If there's a reported incident, wait for the provider to resolve it. Re-run Terraform after the provider confirms resolution.
For AWS, increase the maximum retry attempts in your provider block:
provider "aws" {
region = "us-east-1"
max_retries = 25 # Default is 25, increase if needed
}For other providers, check their documentation for similar settings. Each provider may have different retry configuration options.
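Recent versions of the AWS provider also expose a retry_mode argument (check that your provider version supports it); the adaptive mode adds client-side rate limiting on top of the usual exponential backoff, which can help under sustained throttling:

```hcl
provider "aws" {
  region      = "us-east-1"
  max_retries = 40
  retry_mode  = "adaptive" # valid values are "standard" and "adaptive"
}
```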
If you're deploying many resources at once, reduce the number of concurrent operations:
```bash
# Default parallelism is 10, reduce it to 2-3
terraform apply -parallelism=3
```

This gives the cloud provider more time to handle each request before the next one arrives, reducing the chance of rate limiting.
Parallelism can't be set in the terraform configuration block itself, but you can set it persistently through an environment variable, which is useful in CI:

```bash
# Applies -parallelism=3 to every terraform apply in this shell
export TF_CLI_ARGS_apply="-parallelism=3"
```

Increase timeouts for resources that take longer to stabilize:
resource "aws_instance" "example" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
timeouts {
create = "15m"
update = "15m"
delete = "10m"
}
}
resource "aws_rds_cluster" "example" {
cluster_identifier = "my-cluster"
timeouts {
create = "45m"
update = "60m"
delete = "30m"
}
}Giving resources more time reduces the likelihood of hitting retry exhaustion.
Run Terraform with debug logging to see what specific API error is triggering retries:
```bash
export TF_LOG=debug
terraform apply 2>&1 | tee terraform.log

# Or for maximum verbosity
export TF_LOG=trace
terraform apply 2>&1 | tee terraform.log
```

Search the log for HTTP status codes and actual error messages:
```bash
grep -i '429\|500\|503\|timeout\|throttl' terraform.log
```

Look for patterns: if you see 429 (rate limit), 503 (service unavailable), or timeout errors, that identifies the problem.
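On AWS, it can also help to search for the SDK's throttling error codes by name; the exact codes vary by service, but these are common ones:

```bash
# Common AWS throttling/limit error codes that appear in debug logs
grep -iE 'ThrottlingException|RequestLimitExceeded|TooManyRequestsException|Rate exceeded' terraform.log
```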
If the issue is transient (the provider had a brief hiccup), simply retry:

```bash
# Wait a minute or two, then retry
sleep 120
terraform apply
```

If the apply partially succeeded, you may need to refresh state first:
```bash
terraform refresh
terraform apply
```

On current Terraform versions, terraform apply -refresh-only is the preferred replacement for the deprecated terraform refresh command.

For CI/CD, implement a retry loop in your pipeline script:
```bash
for i in {1..3}; do
  terraform apply -auto-approve && exit 0
  echo "Attempt $i failed, waiting..."
  sleep 60
done
exit 1
```

The retry exhaustion behavior is provider-specific. AWS uses exponential backoff with a maximum of 25 retries by default, which roughly equates to 30-60 seconds of retries for network errors. However, for throttling (429) responses, the backoff is longer, and cumulative wait times can exceed 5 minutes. Some providers like Databricks or Azure may have different retry strategies.
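If you want the pipeline retry loop above to mirror this kind of exponential backoff instead of a fixed sleep, here is a minimal sketch (the delays are illustrative):

```bash
# Retry terraform apply with doubling delays: 60s, 120s, 240s
delay=60
for attempt in 1 2 3; do
  terraform apply -auto-approve && exit 0
  echo "Attempt $attempt failed; waiting ${delay}s before retrying..."
  sleep "$delay"
  delay=$((delay * 2))
done
exit 1
```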
When combined with StateChangeConf (the SDK helper that waits for a resource to reach its desired state), retry exhaustion becomes more complex: the provider might exhaust retries waiting for the API to respond, then exhaust state-change retries waiting for the resource to be provisioned. Each has its own timeout.
For AWS specifically, the hashicorp/aws-sdk-go-base wrapper reduces network error retries to 10 (roughly 30 seconds) to prevent extremely long hangs, while keeping higher retries for throttling. As a practitioner, you generally can't control this, but you can work around it by increasing resource-level timeouts and reducing parallelism.