DynamoDB returns ItemCollectionSizeLimitExceededException when an item collection exceeds the 10 GB size limit. This occurs when too much data accumulates under a single partition key value, requiring data redistribution, archival, or partition key redesign to resolve.
The ItemCollectionSizeLimitExceededException error in DynamoDB indicates that a specific item collection has exceeded the 10 GB size limit. An "item collection" refers to all items that share the same partition key value in a table or local secondary index; the limit only applies to tables that have one or more local secondary indexes. This limit exists because:

1. **Partition management**: DynamoDB partitions data based on partition key values to distribute load
2. **Local secondary indexes**: LSIs maintain sorted collections of items within each partition
3. **Performance guarantees**: The 10 GB limit ensures predictable performance for queries and scans
4. **Storage architecture**: Each partition has finite storage capacity for consistent performance

When you exceed this limit, DynamoDB cannot add new items under that partition key value until you reduce the collection size. This is a hard limit that cannot be increased, unlike provisioned throughput limits.
First, determine which partition key values are hitting the 10 GB limit:
```bash
# Use CloudWatch metrics to identify hot partitions
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ConsumedWriteCapacityUnits \
  --dimensions Name=TableName,Value=YourTableName \
  --start-time $(date -u -d "1 hour ago" +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum

# Check for write errors (UserErrors is reported per account and Region,
# so it does not support per-table or per-operation dimensions)
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name UserErrors \
  --start-time $(date -u -d "24 hours ago" +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Sum

# Review application logs for failed write operations
# Look for patterns in partition key values that appear in error messages
```

Analysis approach:
- Review application logs for failed write operations
- Check which partition key values appear in error messages (see the logging sketch after this list)
- Analyze data access patterns in your application code
- Use DynamoDB Streams to monitor write patterns
- Consider enabling detailed CloudWatch metrics for deeper insights
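To surface the offending partition key values in your application logs, you can catch the exception at the write path. Below is a minimal sketch using the AWS SDK for JavaScript (v2); the table name and the `pk` attribute name are illustrative assumptions.

```javascript
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB(); // assumes region/credentials come from the environment

// Hypothetical wrapper that logs which partition key hit the 10 GB limit
async function putItemLoggingCollectionErrors(tableName, item) {
  try {
    await dynamodb.putItem({ TableName: tableName, Item: item }).promise();
  } catch (err) {
    if (err.code === 'ItemCollectionSizeLimitExceededException') {
      // 'pk' is assumed to be the partition key attribute
      console.error(`Item collection full for partition key ${JSON.stringify(item.pk)}: ${err.message}`);
    }
    throw err; // let the caller decide whether to re-shard or retry
  }
}
```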
Modify your partition key design to distribute data across multiple partitions:
```javascript
// BEFORE: Single partition key for all metrics
const partitionKey = 'system-metrics';

// AFTER: Sharded partition keys
function getShardedPartitionKey(metricName) {
  const shardId = Math.floor(Math.random() * 100); // 0-99
  return `metric-${metricName}-shard-${shardId}`;
}

// For time-series data, include timestamp granularity
function getTimeBasedPartitionKey(entity, timestamp) {
  const date = new Date(timestamp);
  const month = date.toISOString().substring(0, 7); // YYYY-MM
  return `${entity}-${month}`;
}

// For user data, use composite keys
function getUserPartitionKey(userId, dataType) {
  return `user-${userId}-${dataType}`;
}
```

Sharding strategies:
- Add random suffixes (0-99, 0-999) based on expected volume
- Include timestamp components (year, month, day) for time-series
- Use composite keys combining entity type with sub-category
- Implement consistent hashing for deterministic shard assignment (see the sketch after this list)
- Consider using hash functions for even distribution
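Random suffixes spread writes but force readers to query every shard. Hashing a stable attribute gives the same spread while letting readers recompute the shard deterministically, as noted in the list above. A minimal sketch assuming Node.js and a hypothetical shard count of 10:

```javascript
const crypto = require('crypto');

const SHARD_COUNT = 10; // assumption: size this to your expected collection growth

// Deterministically map an entity ID to a shard so readers and writers agree
function getHashedPartitionKey(baseKey, entityId) {
  const hash = crypto.createHash('md5').update(String(entityId)).digest('hex');
  const shard = parseInt(hash.substring(0, 8), 16) % SHARD_COUNT;
  return `${baseKey}-shard-${shard}`;
}

// Example: getHashedPartitionKey('metrics-2024-01', 'device-42')
// always yields the same 'metrics-2024-01-shard-N' for that device
```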
Implement data lifecycle management to remove old items:
```javascript
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Example: Delete items older than the retention period
async function deleteOldItems(tableName, partitionKey, retentionDays) {
  const cutoffDate = new Date();
  cutoffDate.setDate(cutoffDate.getDate() - retentionDays);

  // Query for old items (assuming the sort key is a numeric timestamp)
  const params = {
    TableName: tableName,
    KeyConditionExpression: 'pk = :pk AND sk < :cutoff',
    ExpressionAttributeValues: {
      ':pk': { S: partitionKey },
      ':cutoff': { N: cutoffDate.getTime().toString() }
    }
  };

  const items = [];
  let lastEvaluatedKey;
  do {
    if (lastEvaluatedKey) {
      params.ExclusiveStartKey = lastEvaluatedKey;
    }
    const result = await dynamodb.query(params).promise();
    items.push(...result.Items);
    lastEvaluatedKey = result.LastEvaluatedKey;
  } while (lastEvaluatedKey);

  // Batch delete in chunks of 25 (the BatchWriteItem maximum)
  for (let i = 0; i < items.length; i += 25) {
    const chunk = items.slice(i, i + 25);
    const deleteRequests = chunk.map(item => ({
      DeleteRequest: {
        Key: { pk: item.pk, sk: item.sk }
      }
    }));
    await dynamodb.batchWriteItem({
      RequestItems: {
        [tableName]: deleteRequests
      }
    }).promise();
  }
}
```
You can also enable TTL (Time to Live) for automatic expiration:

```bash
# Enable TTL on a table attribute
aws dynamodb update-time-to-live \
  --table-name YourTableName \
  --time-to-live-specification "Enabled=true, AttributeName=ttl"
```
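TTL only removes items that carry the configured attribute as a Number holding an epoch-seconds timestamp. A minimal sketch of writing such an item, assuming the `ttl` attribute name from the command above and placeholder key names:

```javascript
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Write an item that DynamoDB will expire roughly 30 days from now.
// The 'ttl' attribute must be a Number containing an epoch-seconds timestamp.
async function putItemWithTtl(tableName, pk, sk) {
  const expiresAt = Math.floor(Date.now() / 1000) + 30 * 24 * 60 * 60;
  await dynamodb.putItem({
    TableName: tableName,
    Item: {
      pk: { S: pk },
      sk: { N: String(sk) },
      ttl: { N: String(expiresAt) }
    }
  }).promise();
}
```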
Retention strategies:
- Implement TTL (Time to Live) for automatic expiration
- Schedule Lambda functions for periodic cleanup
- Move historical data to DynamoDB Standard-IA or S3
- Consider DynamoDB Streams + Lambda for real-time archival
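For the Streams + Lambda approach, a function subscribed to the table's stream can copy deleted items to S3 before they are gone. A minimal sketch, assuming the stream view type includes OLD_IMAGE, a placeholder bucket name, and a string `pk` key attribute:

```javascript
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const ARCHIVE_BUCKET = 'my-dynamodb-archive'; // placeholder bucket name

// Lambda handler for a DynamoDB Streams event source mapping
exports.handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName !== 'REMOVE') continue; // only archive deletions

    const oldImage = record.dynamodb.OldImage; // present when the stream captures OLD_IMAGE
    const keys = record.dynamodb.Keys;

    await s3.putObject({
      Bucket: ARCHIVE_BUCKET,
      Key: `archive/${keys.pk.S}/${record.eventID}.json`,
      Body: JSON.stringify(oldImage)
    }).promise();
  }
};
```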
If LSIs are causing the issue, consider alternative indexing strategies:
```javascript
// BEFORE: LSI on a frequently updated attribute
// This creates large collections if many items share the partition key

// AFTER: Use a global secondary index (GSI) instead
// GSIs have their own partition key and don't contribute to item collection size

// Or: Denormalize data into separate tables
async function writeToDenormalizedTable(mainTable, summaryTable, item) {
  // Write detailed data to the main table with a fine-grained partition key
  await dynamodb.putItem({
    TableName: mainTable,
    Item: {
      pk: { S: `detail-${item.id}-${Date.now()}` },
      // ... other attributes
    }
  }).promise();

  // Write a summary to a separate table
  await dynamodb.updateItem({
    TableName: summaryTable,
    Key: { pk: { S: `summary-${item.category}` } },
    UpdateExpression: 'ADD #count :inc',
    ExpressionAttributeNames: { '#count': 'itemCount' },
    ExpressionAttributeValues: { ':inc': { N: '1' } }
  }).promise();
}

// Use sparse indexes for conditional data
// Only include items in a GSI when they have specific attributes
```

Indexing alternatives:
- Replace LSIs with GSIs when possible
- Implement composite GSIs for different query patterns
- Use DynamoDB Accelerator (DAX) for read-heavy patterns
- Consider Amazon Athena for analytical queries on exported data
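The sparse-index idea mentioned in the code comments above works by setting the GSI key attribute only on items that should be indexed; everything else stays out of the index entirely. A minimal sketch, assuming a hypothetical index keyed on a `gsi1pk` attribute:

```javascript
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Only items that set 'gsi1pk' appear in the (hypothetical) GSI,
// so the index stays small and avoids hot collections.
async function putOrder(tableName, order) {
  const item = {
    pk: { S: `order-${order.id}` },
    sk: { N: String(Date.now()) },
    status: { S: order.status }
  };

  // Assumption: only open orders need to be queryable through the GSI
  if (order.status === 'OPEN') {
    item.gsi1pk = { S: `open-orders-${order.region}` };
  }

  await dynamodb.putItem({ TableName: tableName, Item: item }).promise();
}
```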
Add proactive monitoring and throttling to prevent future issues:
```javascript
// Monitor partition sizes
class PartitionMonitor {
  constructor(tableName) {
    this.tableName = tableName;
    this.partitionCounts = new Map();
    this.warningThreshold = 8 * 1024 * 1024 * 1024; // warn at 8 GB
  }

  async trackWrite(partitionKey) {
    const count = (this.partitionCounts.get(partitionKey) || 0) + 1;
    this.partitionCounts.set(partitionKey, count);

    // Estimate size (adjust based on your average item size)
    const estimatedSize = count * 1024; // assumes ~1 KB per item
    if (estimatedSize > this.warningThreshold) {
      console.warn(`Partition ${partitionKey} approaching 10 GB limit`);
      // Trigger mitigation: switch to a sharded key, throttle writes, etc.
      return this.getShardedKey(partitionKey);
    }
    return partitionKey;
  }

  getShardedKey(originalKey) {
    const suffix = Math.floor(Math.random() * 10);
    return `${originalKey}-shard-${suffix}`;
  }
}
```

Monitoring strategies:
- Track estimated partition sizes in application memory, or read DynamoDB's own size estimates on writes (see the sketch after this list)
- Set up CloudWatch alarms for UserErrors metric
- Implement circuit breakers for problematic partitions
- Log all ItemCollectionSizeLimitExceededException occurrences
- Use AWS X-Ray for tracing write patterns
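For tables that have a local secondary index, DynamoDB can also return its own size estimate for the affected item collection on each write via the ReturnItemCollectionMetrics parameter, which is more accurate than the in-memory counting above. A minimal sketch; the 8 GB warning threshold and key names are assumptions:

```javascript
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Ask DynamoDB for item-collection size estimates on each write.
// Metrics are only returned for tables that have a local secondary index.
async function putItemWithSizeCheck(tableName, item) {
  const result = await dynamodb.putItem({
    TableName: tableName,
    Item: item,
    ReturnItemCollectionMetrics: 'SIZE'
  }).promise();

  const metrics = result.ItemCollectionMetrics;
  if (metrics && metrics.SizeEstimateRangeGB) {
    const upperGB = metrics.SizeEstimateRangeGB[1]; // upper bound of the estimate
    if (upperGB > 8) {
      console.warn(`Item collection near 10 GB limit (~${upperGB} GB):`,
        JSON.stringify(metrics.ItemCollectionKey));
    }
  }
}
```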
## Understanding DynamoDB Collection Size Limits
### Item Collection Size Limit Details:
- 10 GB hard limit per partition key value
- Applies to both base table and all local secondary indexes combined
- Counts all versions of items (including deleted items until garbage collected)
- Based on uncompressed data size
- Cannot be increased via support ticket
### Local Secondary Index (LSI) Considerations:
- LSIs share the same partition key as base table
- Each LSI creates its own item collection within the partition
- All LSIs + base table must stay under 10 GB total
- LSIs cannot be added after table creation
- Consider GSIs instead for flexible indexing
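Unlike LSIs, a GSI can be added to an existing table at any time. A minimal sketch using UpdateTable with the AWS SDK for JavaScript (v2), assuming an on-demand table and hypothetical `gsi1`/`gsi1pk` names:

```javascript
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Add a GSI to an existing table; LSIs cannot be added after table creation.
async function addGsi(tableName) {
  await dynamodb.updateTable({
    TableName: tableName,
    AttributeDefinitions: [
      { AttributeName: 'gsi1pk', AttributeType: 'S' } // hypothetical key attribute
    ],
    GlobalSecondaryIndexUpdates: [
      {
        Create: {
          IndexName: 'gsi1',
          KeySchema: [{ AttributeName: 'gsi1pk', KeyType: 'HASH' }],
          Projection: { ProjectionType: 'ALL' }
          // For provisioned-capacity tables, also supply ProvisionedThroughput here
        }
      }
    ]
  }).promise();
}
```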
### Partition Key Design Best Practices:
High-cardinality attributes:
- User IDs, session IDs, UUIDs
- Device IDs with high uniqueness
- Transaction IDs, order numbers
Avoid low-cardinality attributes:
- Status fields (active/inactive)
- Boolean flags (true/false)
- Small enum values (category A/B/C/D)
Time-series data patterns:
- ❌ Bad: "metrics" (single partition)
- ✅ Better: "metrics-2024-01", "metrics-2024-02"
- ✅ Best: "metrics-2024-01-shard-0" through "metrics-2024-01-shard-9"
### Data Lifecycle Management:
TTL (Time to Live):
- Automatically deletes items after expiry
- Reduces item collection size over time
- Background process, not immediate
- Useful for session data, temporary states
DynamoDB Streams + Lambda:
- Archive old data to S3
- Aggregate data into summary tables
- Implement custom retention policies
- Maintain data for compliance while reducing active size
### Migration Strategies:
Dual-write pattern:
1. Write to old and new table designs simultaneously (see the sketch after this list)
2. Use DynamoDB Streams to backfill historical data
3. Gradually shift read traffic to new design
4. Archive old table after validation
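Step 1 of the dual-write pattern can be a small fan-out that writes every item to both designs, so the new sharded layout fills while the old table keeps serving traffic. A minimal sketch; the table names and the getHashedPartitionKey helper from the sharding sketch earlier are assumptions:

```javascript
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Dual-write: the old table stays authoritative while the new design is populated
async function dualWrite(oldTable, newTable, item, entityId) {
  // 1. Write to the existing (unsharded) design first
  await dynamodb.putItem({ TableName: oldTable, Item: item }).promise();

  // 2. Mirror the write into the new design with a sharded partition key
  //    (getHashedPartitionKey is the deterministic-sharding helper sketched earlier)
  const shardedItem = {
    ...item,
    pk: { S: getHashedPartitionKey(item.pk.S, entityId) }
  };
  await dynamodb.putItem({ TableName: newTable, Item: shardedItem }).promise();
}
```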
Blue-green deployment:
1. Create new table with improved design
2. Use data pipeline to copy/migrate data
3. Update application to use new table
4. Monitor performance before deleting old table
### When to Contact AWS Support:
- Need guidance on large-scale data migration
- Suspected data corruption issues
- Production-critical applications requiring immediate resolution
- Complex multi-table redesign requiring architecture review