This error occurs when Elasticsearch aggregations attempt to create more buckets than the configured limit (default: 10,000). Aggregations like terms, histogram, or composite can generate excessive buckets when processing high-cardinality fields or large datasets, causing memory issues and query failures.
The "TooManyBucketsException: Trying to create too many buckets" error occurs when an Elasticsearch aggregation query attempts to create more unique buckets than the system limit allows. Buckets are the fundamental building blocks of Elasticsearch aggregations - they group documents based on field values, ranges, or other criteria. This error typically appears when: 1. Aggregating on high-cardinality fields (like user IDs, email addresses, or UUIDs) 2. Using large size parameters in terms aggregations 3. Running composite aggregations without proper pagination 4. Creating histogram/date_histogram aggregations with small intervals over large time ranges 5. Nesting multiple aggregations that multiply bucket counts Elasticsearch limits bucket creation to prevent memory exhaustion and cluster instability. Each bucket consumes memory, and excessive buckets can overwhelm node heaps, leading to circuit breaker trips or node failures. The default limit is 10,000 buckets, but this can be adjusted based on your cluster's capacity and use case.
First, examine your aggregation query and check the current bucket limit:
# Check current search.max_buckets setting
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&filter_path=**.search.max_buckets" -u "username:password"
# Example response showing default limit
{
"defaults": {
"search": {
"max_buckets": 10000
}
}
}
# Identify which aggregation is causing the issue
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"problematic_agg": {
"terms": {
"field": "user_id.keyword",
"size": 10000
}
}
}
}
'
# For composite aggregations, check if you're missing pagination
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"composite_agg": {
"composite": {
"sources": [
{
"user": {
"terms": {
"field": "user_id.keyword"
}
}
},
{
"date": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
}
}
}
],
"size": 1000
}
}
}
}
'
Look for aggregations on high-cardinality fields or combinations that could generate many unique buckets.
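Before running a terms aggregation on a suspect field, a quick cardinality check gives a rough idea of how many buckets it would produce (index and field names are illustrative):
# Estimate the number of unique values in the field
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "user_id_cardinality": {
      "cardinality": {
        "field": "user_id.keyword"
      }
    }
  }
}
'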
Optimize your aggregations to reduce bucket count:
# 1. Add filters to reduce dataset size and request only the top 1000 users
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "now-7d/d"
}
}
},
"aggs": {
"users": {
"terms": {
"field": "user_id.keyword",
"size": 1000 # Limit to top 1000 users
}
}
}
}
'
# 2. Use a smaller size parameter when you only need the top categories
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"top_categories": {
"terms": {
"field": "category.keyword",
"size": 100 # Only need top 100 categories
}
}
}
}
'
# 3. Increase the histogram interval (for example, "hour" instead of "minute")
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"hourly_events": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "hour", # Instead of "minute"
"min_doc_count": 1
}
}
}
}
'
# 4. Use a composite aggregation with pagination and a smaller page size
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"composite_agg": {
"composite": {
"sources": [
{
"user": {
"terms": {
"field": "user_id.keyword"
}
}
}
],
"size": 100 # Smaller page size
}
}
}
}
'
# 5. Use sampler aggregation for high-cardinality fields
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"sample": {
"sampler": {
"shard_size": 1000
},
"aggs": {
"user_distribution": {
"terms": {
"field": "user_id.keyword",
"size": 100
}
}
}
}
}
}
'
If you genuinely need more buckets, increase the limit temporarily:
# Increase limit for current session (transient)
curl -X PUT "localhost:9200/_cluster/settings" -u "username:password" -H 'Content-Type: application/json' -d'
{
"transient": {
"search.max_buckets": 50000
}
}
'
# Increase limit permanently (persistent)
curl -X PUT "localhost:9200/_cluster/settings" -u "username:password" -H 'Content-Type: application/json' -d'
{
"persistent": {
"search.max_buckets": 50000
}
}
'
# Check node memory before increasing
curl -X GET "localhost:9200/_nodes/stats/jvm" -u "username:password"
# Monitor memory usage after increasing limit
curl -X GET "localhost:9200/_nodes/stats/indices,search" -u "username:password"
# Note: search.max_buckets is a cluster-wide setting and cannot be set per index.
# The index-level settings below cap search hit windows (hits and inner hits), not aggregation buckets:
curl -X PUT "localhost:9200/my-index/_settings" -u "username:password" -H 'Content-Type: application/json' -d'
{
"index.max_result_window": 100000,
"index.max_inner_result_window": 1000
}
'
Warning: Increasing bucket limits increases memory usage. Monitor your cluster's heap usage and consider increasing heap size if needed. Each bucket consumes roughly 48 bytes plus overhead for nested aggregations.
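One convenient way to watch heap pressure while experimenting with a higher limit is the cat nodes API:
# Show per-node heap usage at a glance
curl -X GET "localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max" -u "username:password"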
For composite aggregations, implement proper pagination:
# First page
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"composite_agg": {
"composite": {
"sources": [
{
"user": {
"terms": {
"field": "user_id.keyword"
}
}
},
{
"date": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
}
}
}
],
"size": 1000
}
}
}
}
'
# Extract after_key from response and use for next page
# Response includes: "after_key": {"user": "user123", "date": 1672531200000}
# Second page using after_key
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"composite_agg": {
"composite": {
"sources": [
{
"user": {
"terms": {
"field": "user_id.keyword"
}
}
},
{
"date": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
}
}
}
],
"size": 1000,
"after": {
"user": "user123",
"date": 1672531200000
}
}
}
}
}
'
# Client-side implementation example (JavaScript)
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });
async function getAllCompositeResults(index, sources, pageSize = 1000) {
let allResults = [];
let afterKey = null;
do {
const body = {
size: 0,
aggs: {
composite_agg: {
composite: {
sources: sources,
size: pageSize
}
}
}
};
if (afterKey) {
body.aggs.composite_agg.composite.after = afterKey;
}
const response = await client.search({
index: index,
body: body
});
const aggregation = response.aggregations.composite_agg;
allResults = allResults.concat(aggregation.buckets);
// after_key is omitted once no buckets remain, which ends the loop
afterKey = aggregation.after_key;
} while (afterKey);
return allResults;
}
// Usage
const sources = [
{
user: {
terms: { field: 'user_id.keyword' }
}
},
{
date: {
date_histogram: {
field: 'timestamp',
calendar_interval: 'day'
}
}
}
];
const results = await getAllCompositeResults('my-index', sources, 1000);
console.log('Total buckets:', results.length);
Consider alternative approaches for high-cardinality aggregations:
# 1. Use cardinality aggregation for unique counts instead of terms
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"unique_users": {
"cardinality": {
"field": "user_id.keyword",
"precision_threshold": 10000
}
}
}
}
'
# 2. Use significant terms for identifying unusual patterns
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"query": {
"match": { "category": "error" }
},
"aggs": {
"significant_users": {
"significant_terms": {
"field": "user_id.keyword",
"size": 10
}
}
}
}
'
# 3. Use percentile aggregations for numeric distributions
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"response_times": {
"percentiles": {
"field": "response_time_ms",
"percents": [50, 75, 90, 95, 99]
}
}
}
}
'
# 4. Use top hits for sample documents instead of all buckets
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"top_users": {
"terms": {
"field": "user_id.keyword",
"size": 10
},
"aggs": {
"sample_docs": {
"top_hits": {
"size": 3,
"_source": ["timestamp", "action"]
}
}
}
}
}
}
'
# 5. Use filtered aggregations for specific subsets
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"active_users": {
"filter": {
"range": {
"last_active": {
"gte": "now-30d/d"
}
}
},
"aggs": {
"user_types": {
"terms": {
"field": "user_type.keyword",
"size": 10
}
}
}
}
}
}
'
## Advanced Bucket Management
### Memory Considerations
Each bucket in Elasticsearch consumes approximately:
- 48 bytes for basic bucket overhead
- Additional memory for nested aggregations
- Field data cache for terms aggregations
For 100,000 buckets: ~4.8MB + nested aggregation overhead + field data.
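Extending the same estimate, a nested aggregation producing 1,000 users across 365 daily buckets creates 365,000 buckets, roughly 17.5MB of base bucket overhead alone before sub-aggregations and field data are counted.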
### Monitoring Bucket Usage
# Check aggregation memory usage
curl -X GET "localhost:9200/_nodes/stats/indices,search" -u "username:password"
# Monitor circuit breakers
curl -X GET "localhost:9200/_nodes/stats/breaker" -u "username:password"
# Check field data cache
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata" -u "username:password"
# Profile aggregation queries to see bucket counts
curl -X POST "localhost:9200/my-index/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
"profile": true,
"size": 0,
"aggs": {
"test_agg": {
"terms": {
"field": "user_id.keyword",
"size": 10000
}
}
}
}
'
### Index Design for Aggregations
1. Use keyword fields for aggregation (not text fields)
2. Enable doc_values for aggregatable fields
3. Consider using multi-fields for different aggregation needs (see the example mapping below)
4. Use routing to distribute aggregation load
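A minimal mapping sketch covering the first three points (index and field names are illustrative; doc_values is enabled by default for keyword and date fields):
# Keyword field for IDs, multi-field on text for aggregation, doc_values left at the default (enabled)
curl -X PUT "localhost:9200/my-index" -u "username:password" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "category": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "timestamp": {
        "type": "date"
      }
    }
  }
}
'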
### Cluster Configuration
# Increase heap size if you raise the bucket limit
# In config/jvm.options (not elasticsearch.yml):
-Xms8g
-Xmx8g
# Configure circuit breakers (in elasticsearch.yml)
indices.breaker.total.limit: 70%
indices.breaker.fielddata.limit: 60%
indices.breaker.request.limit: 60%
# Field data cache settings (in elasticsearch.yml)
indices.fielddata.cache.size: 30%
### Performance Trade-offs
- Higher bucket limits → more memory usage → potential GC pauses
- Smaller intervals → more accurate analysis → more buckets
- Larger size parameters → complete results → memory pressure
### Alternative Architectures
For extremely high-cardinality aggregations:
1. Use rollup indices for pre-aggregated data (see the sketch below)
2. Implement application-side aggregation for specific use cases
3. Use time-based indices with coarser aggregations
4. Consider dedicated analytics databases for complex aggregations
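As a sketch of option 1 (job name, index pattern, and fields are illustrative; note that rollups are deprecated in recent Elasticsearch releases in favor of downsampling), a rollup job that pre-aggregates daily summaries might look like this:
# Hypothetical rollup job producing daily, per-user-type summaries
curl -X PUT "localhost:9200/_rollup/job/daily_user_rollup" -u "username:password" -H 'Content-Type: application/json' -d'
{
  "index_pattern": "my-index-*",
  "rollup_index": "rollup-my-index",
  "cron": "0 0 1 * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "calendar_interval": "1d"
    },
    "terms": {
      "fields": ["user_type.keyword"]
    }
  },
  "metrics": [
    {
      "field": "response_time_ms",
      "metrics": ["avg", "max", "sum"]
    }
  ]
}
'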