The ReplicationGroupNotFoundFault error occurs when Terraform references an ElastiCache replication group that doesn't exist or hasn't been created yet. This typically happens due to dependency ordering issues, manual deletions, or incorrect resource references. Verify the replication group exists, fix dependencies, and upgrade your AWS provider to v5.10.0 or later.
When Terraform attempts to create or modify ElastiCache resources, it may fail with ReplicationGroupNotFoundFault if it references a replication group that cannot be found in your AWS account. This happens when:

1. The replication group ID or reference is incorrect
2. The replication group hasn't been created yet due to dependency ordering
3. The replication group was manually deleted outside of Terraform
4. Terraform is querying for the resource before it's fully available in AWS

Older versions of the Terraform AWS Provider (pre-5.10.0) don't properly handle manually deleted resources during refresh, causing persistent failures. Fixing this requires either upgrading the provider or removing the resource from Terraform state.
Check if the replication group actually exists in your AWS account and region:
aws elasticache describe-replication-groups \
  --replication-group-id your-replication-group-id \
  --region us-east-1

If the command returns "ReplicationGroupNotFoundFault", the replication group does not exist in that account and region. If it exists, the issue is with Terraform's reference or state.
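The check above can be scripted so that a missing group is told apart from other failures such as bad credentials. This is a minimal sketch, not an official tool: the function names `classify_lookup` and `check_replication_group` are made up for illustration, and it assumes the AWS CLI includes the fault name in its error message.

```shell
#!/usr/bin/env bash

# Classify the outcome of `aws elasticache describe-replication-groups`.
# $1 = exit code of the aws command, $2 = its stderr text.
# (Hypothetical helper; the name is not part of any AWS tooling.)
classify_lookup() {
  local exit_code="$1" stderr_text="$2"
  if [ "$exit_code" -eq 0 ]; then
    echo "exists"
  elif printf '%s' "$stderr_text" | grep -q 'ReplicationGroupNotFoundFault'; then
    echo "missing"
  else
    # Any other failure (credentials, throttling, wrong region endpoint, ...)
    echo "error"
  fi
}

# Hypothetical wrapper: run the real lookup and classify the result.
# $1 = replication group ID, $2 = region.
check_replication_group() {
  local group_id="$1" region="$2" stderr_text exit_code
  # Keep stderr, discard the JSON on stdout.
  stderr_text=$(aws elasticache describe-replication-groups \
    --replication-group-id "$group_id" \
    --region "$region" 2>&1 >/dev/null)
  exit_code=$?
  classify_lookup "$exit_code" "$stderr_text"
}
```

Calling `check_replication_group your-replication-group-id us-east-1` prints `exists`, `missing`, or `error`. Only `missing` points to a group that really is gone. `error` means the lookup itself failed and says nothing about the group.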
Verify that your Terraform configuration correctly references the replication group:

resource "aws_elasticache_replication_group" "primary" {
  replication_group_id       = "my-replication-group-id" # required argument
  description                = "My Redis cluster"        # replaces replication_group_description, removed in provider v5
  engine                     = "redis"
  engine_version             = "7.0"
  node_type                  = "cache.t3.micro"
  num_cache_clusters         = 2
  automatic_failover_enabled = true
}

# Correct: reference the replication group's ID attribute directly
resource "aws_elasticache_cluster" "replica" {
  cluster_id           = "my-replica-cluster"
  replication_group_id = aws_elasticache_replication_group.primary.id
}

Ensure the replication_group_id either matches the existing group's ID exactly or uses the correct Terraform reference.
Add explicit dependency declaration to ensure the replication group is created before dependent resources:

resource "aws_elasticache_replication_group" "primary" {
  description = "My Redis cluster" # replaces replication_group_description, removed in provider v5
  engine      = "redis"
  # ... other config ...
}

resource "aws_elasticache_cluster" "replica" {
  cluster_id           = "my-replica-cluster"
  replication_group_id = aws_elasticache_replication_group.primary.id

  # Explicit depends_on (usually not needed when a reference is present)
  depends_on = [aws_elasticache_replication_group.primary]
}

Terraform already infers this dependency from the reference, so depends_on is redundant in this example. It matters when the dependent resource receives the ID as a literal string or through a value Terraform cannot trace back to the replication group.
If the replication group was manually deleted outside Terraform, remove it from Terraform state and recreate it:

# Remove from state
terraform state rm aws_elasticache_replication_group.primary
# Re-apply to recreate
terraform apply

Alternatively, if the resource exists in AWS but state is out of sync, import it:

terraform import aws_elasticache_replication_group.primary my-replication-group-id

Upgrade your AWS Provider to v5.10.0 or later, which includes better handling of manually deleted resources:
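
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.10"
    }
  }
}

After updating, run the following. The -upgrade flag lets Terraform replace an older provider version pinned in the dependency lock file:

terraform init -upgrade
terraform apply

The newer provider will properly refresh state when resources are deleted outside Terraform.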
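Before relying on the upgrade, it can help to confirm which provider version Terraform actually selected. The sketch below compares two version strings. `version_at_least` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash

# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Relies on `sort -V` for version-aware ordering (assumption: GNU sort).
version_at_least() {
  local actual="$1" minimum="$2" lowest
  lowest=$(printf '%s\n%s\n' "$actual" "$minimum" | sort -V | head -n1)
  [ "$lowest" = "$minimum" ]
}

# Example: report whether a given provider version meets the 5.10.0 minimum.
report_provider_version() {
  if version_at_least "$1" "5.10.0"; then
    echo "ok: AWS provider $1 meets the 5.10.0 minimum"
  else
    echo "upgrade needed: AWS provider $1 is older than 5.10.0"
  fi
}
```

One way to obtain the selected version, assuming `jq` is installed and the working directory has been initialized, is `terraform version -json | jq -r '.provider_selections["registry.terraform.io/hashicorp/aws"]'`. Pass the result to `report_provider_version`.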
For Redis replication, use aws_elasticache_replication_group instead of aws_elasticache_cluster:

# Correct for Redis with replication
resource "aws_elasticache_replication_group" "example" {
  replication_group_id       = "my-replication-group-id" # required argument
  description                = "Redis cluster"           # replaces replication_group_description, removed in provider v5
  engine                     = "redis"
  engine_version             = "7.0"
  node_type                  = "cache.t3.micro"
  num_cache_clusters         = 2
  automatic_failover_enabled = true
}
# aws_elasticache_cluster is for standalone clusters or Memcached

Using the correct resource type prevents referential errors and ensures proper configuration.
When working with ElastiCache in Terraform, be aware that replication groups have multiple states: creating, available, deleting, and snapshotting. Terraform may query for the resource during state transitions, causing temporary "not found" errors.
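If a pipeline step outside Terraform needs the group to be usable, polling its status until it reads `available` avoids acting on it mid-transition. The following is a rough sketch under stated assumptions: `get_status` and `wait_until_available` are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```shell
#!/usr/bin/env bash

# Print the current status of a replication group (e.g. creating, available).
# Returns non-zero if the lookup fails, including when the group is not found.
get_status() {
  aws elasticache describe-replication-groups \
    --replication-group-id "$1" \
    --query 'ReplicationGroups[0].Status' \
    --output text 2>/dev/null
}

# Poll until the group reports "available".
# $1 = group ID, $2 = max attempts (default 40), $3 = seconds between attempts (default 15).
wait_until_available() {
  local group_id="$1" max_attempts="${2:-40}" delay="${3:-15}"
  local attempt=1 status
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(get_status "$group_id") || status="not-found"
    if [ "$status" = "available" ]; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1 # never became available within the allotted attempts
}
```

For example, `wait_until_available my-replication-group-id && terraform apply` runs the apply only once the group is ready. The AWS CLI also ships a built-in waiter, `aws elasticache wait replication-group-available`, which covers the same case without custom code.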
If you're using parameter groups with your replication group, ensure the parameter group family matches your Redis engine version (redis7, redis6.x, redis5.0, etc.). Mismatched families can cause creation failures.
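The mapping from engine version to parameter group family can be written out explicitly. This sketch covers only the three families named above, and `family_for_version` is a hypothetical helper, not an AWS command.

```shell
#!/usr/bin/env bash

# Print the parameter group family for a Redis engine version string.
# Handles only the families mentioned in this article; returns non-zero otherwise.
family_for_version() {
  case "$1" in
    7.*)  echo "redis7" ;;
    6.*)  echo "redis6.x" ;;
    5.0*) echo "redis5.0" ;;
    *)    return 1 ;;
  esac
}
```

For instance, `family_for_version 7.0` prints `redis7`, which should match the `family` of the parameter group attached to a group running engine_version "7.0". For versions outside this list, the authoritative value is the `CacheParameterGroupFamily` field returned by `aws elasticache describe-cache-engine-versions`.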
For production deployments, always set automatic_failover_enabled = true and use at least 2 cache clusters (num_cache_clusters >= 2) for multi-AZ deployment and automatic failover support.
When dealing with managed backups or snapshots, be aware that the replication group cannot be deleted while a snapshot is in progress. Check CloudTrail logs if you see unexpected timing issues.