The Elasticsearch fielddata circuit breaker triggers when aggregations or sorting operations attempt to load too much field data into memory. This error occurs when the estimated memory needed for field data exceeds the configured circuit breaker limit, preventing memory exhaustion and node crashes.
A CircuitBreakingException with "[fielddata] Data too large" is Elasticsearch's memory protection mechanism stopping a field data load that would consume too much heap. Field data is the in-heap structure that text fields need for aggregations, sorting, and scripted access; building it loads every value of the field into memory. When Elasticsearch estimates that loading field data would push usage past the configured circuit breaker limit (40% of heap by default in Elasticsearch 7.0 and later, 60% in earlier versions), it throws this exception rather than risk an OutOfMemoryError.
This error typically occurs during complex aggregations, sorting on large text fields, or script fields that read text field values. The circuit breaker keeps a single query from overwhelming the JVM heap, which could destabilize or crash the node. The error does not necessarily mean your cluster is out of memory overall, only that this specific operation exceeds the safety threshold for field data.
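The check described above can be approximated in a few lines. This is a simplified model, not Elasticsearch's actual implementation: the function name and parameters are hypothetical, and it assumes the breaker multiplies the estimated usage by the configured overhead factor before comparing it to the limit.

```python
GIB = 1024 ** 3


def fielddata_breaker_trips(heap_bytes, limit_percent, used_bytes, new_bytes, overhead=1.03):
    """Return True if reserving new_bytes would exceed the limit (simplified model)."""
    # The limit is a percentage of the JVM heap.
    limit_bytes = int(heap_bytes * limit_percent / 100)
    # Assumption: current plus requested usage is scaled by the overhead factor.
    estimated = (used_bytes + new_bytes) * overhead
    return estimated > limit_bytes


# A 16 GiB heap with a 40% limit leaves about 6.4 GiB for fielddata.
small_load = fielddata_breaker_trips(16 * GIB, 40, used_bytes=5 * GIB, new_bytes=1 * GIB)
large_load = fielddata_breaker_trips(16 * GIB, 40, used_bytes=5 * GIB, new_bytes=2 * GIB)
print(small_load, large_load)
```

In this model the second request trips the breaker even though the node still has free heap, which matches the point that the error reflects a per-breaker threshold rather than total memory exhaustion.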
First, examine your current circuit breaker configuration to understand the limits:
# Check circuit breaker settings
curl -X GET "localhost:9200/_nodes/stats/breaker?pretty"
# Look for the fielddata circuit breaker section
# Expected output includes:
# "fielddata" : {
# "limit_size_in_bytes" : 6442450944,
# "limit_size" : "6gb",
# "estimated_size_in_bytes" : 0,
# "estimated_size" : "0b",
# "overhead" : 1.03,
# "tripped" : 0
# }
The limit_size value is the maximum fielddata memory allowed on that node (40% of heap by default in Elasticsearch 7.0 and later, 60% in earlier versions). The tripped count shows how many times the breaker has triggered since the node started.
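When a cluster has many nodes, reading the raw output by hand gets tedious. The following sketch summarizes the fielddata breaker per node from an already-parsed `_nodes/stats/breaker` response. The helper name is hypothetical, and it assumes each node entry carries a `name` and a `breakers.fielddata` object with the fields shown in the sample output.

```python
def summarize_fielddata_breaker(stats):
    """Return one row per node: name, percent of limit in use, trip count."""
    rows = []
    for node in stats.get("nodes", {}).values():
        breaker = node.get("breakers", {}).get("fielddata")
        if not breaker:
            continue
        limit = breaker["limit_size_in_bytes"]
        used = breaker["estimated_size_in_bytes"]
        used_pct = round(100 * used / limit, 1) if limit > 0 else 0.0
        rows.append({"node": node.get("name"), "used_pct": used_pct, "tripped": breaker["tripped"]})
    # Nodes closest to the limit come first.
    return sorted(rows, key=lambda row: row["used_pct"], reverse=True)


# Hand-written sample in the shape assumed above (not real cluster output).
sample_stats = {
    "nodes": {
        "abc": {"name": "node-1", "breakers": {"fielddata": {
            "limit_size_in_bytes": 6442450944, "estimated_size_in_bytes": 0, "tripped": 0}}},
        "def": {"name": "node-2", "breakers": {"fielddata": {
            "limit_size_in_bytes": 6442450944, "estimated_size_in_bytes": 3221225472, "tripped": 2}}},
    }
}
for row in summarize_fielddata_breaker(sample_stats):
    print(row)
```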
Review and optimize your field mappings to minimize fielddata memory consumption:
# Keep fielddata disabled on the text field and aggregate on the keyword sub-field instead
PUT /my_index/_mapping
{
  "properties": {
    "large_text_field": {
      "type": "text",
      "fielddata": false,
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
Key optimizations:
1. Set "fielddata": false on text fields not used for aggregations
2. Use keyword type fields for aggregation and sorting
3. Leave "eager_global_ordinals" at its default of false for rarely aggregated fields
4. Use "ignore_above" on keyword fields so that values longer than the limit are not indexed
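To find text fields that still have fielddata enabled, one option is to walk the mapping returned by `GET /my_index/_mapping`. The sketch below is a hypothetical helper, not part of any Elasticsearch client. It assumes the usual mapping layout, in which object fields nest under `properties` and multi-fields nest under `fields`.

```python
def text_fields_with_fielddata(properties, prefix=""):
    """Return dotted paths of text fields whose mapping sets fielddata to true."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "text" and spec.get("fielddata") is True:
            found.append(path)
        # Recurse into object sub-fields and multi-fields.
        for key in ("properties", "fields"):
            if key in spec:
                found.extend(text_fields_with_fielddata(spec[key], path + "."))
    return found


# Hand-written example mapping; the field names are illustrative only.
example_properties = {
    "title": {"type": "text", "fielddata": True,
              "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
    "user": {"properties": {"bio": {"type": "text", "fielddata": True}}},
    "body": {"type": "text"},
}
print(text_fields_with_fielddata(example_properties))
```

Each path it reports is a candidate for switching aggregations and sorts to a keyword sub-field.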
If you need to run a specific query, temporarily increase the circuit breaker limit:
PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.fielddata.limit": "70%"
  }
}
Or set an absolute value:
PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.fielddata.limit": "8gb"
  }
}
Warning: Only increase this if you have sufficient heap memory. Monitor memory usage closely and return to safer limits after your operation completes.
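Returning to the safer limit is easy to forget, so it can help to generate the raise and reset request bodies together. This sketch only builds the JSON bodies for `PUT /_cluster/settings` and sends nothing. The helper name is hypothetical. It relies on the cluster settings API treating a null value as a reset of that setting.

```python
import json

FIELDDATA_LIMIT = "indices.breaker.fielddata.limit"


def breaker_limit_body(limit):
    """Build a PUT /_cluster/settings body; limit=None clears the transient override."""
    return json.dumps({"transient": {FIELDDATA_LIMIT: limit}})


raise_body = breaker_limit_body("70%")
reset_body = breaker_limit_body(None)  # serializes to null, which resets the setting
print(raise_body)
print(reset_body)
```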
Convert queries to use doc values instead of fielddata for better memory efficiency:
# Instead of sorting on a text field with fielddata, sort on its keyword sub-field, which uses doc values:
GET /my_index/_search
{
  "sort": [
    {
      "text_field.keyword": {
        "order": "asc"
      }
    }
  ]
}
# For aggregations, use keyword fields, which are backed by doc values:
GET /my_index/_search
{
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword",
        "size": 10
      }
    }
  }
}
Doc values are built at index time, stored on disk, and read through the filesystem cache rather than held on the JVM heap. They are enabled by default for keyword, date, and numeric fields.
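Client code that builds sorts and aggregations dynamically can pick the doc-values-backed field automatically. The sketch below is a hypothetical helper that handles top-level fields only. It assumes every non-text field is aggregatable, which does not hold for fields mapped with `"doc_values": false`.

```python
def aggregatable_field(properties, field):
    """Return a field name usable for sorting/aggregating without fielddata, or None."""
    spec = properties.get(field)
    if spec is None:
        return None
    if spec.get("type") != "text":
        # Assumption: non-text fields keep doc values enabled.
        return field
    # For text fields, look for a keyword multi-field such as "<field>.keyword".
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None


# Illustrative mapping properties, not taken from a real index.
props = {
    "category": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "price": {"type": "float"},
    "description": {"type": "text"},
}
print(aggregatable_field(props, "category"))
```

A result of `None` signals that the mapping needs a keyword sub-field before the field can be aggregated safely.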
Use Elasticsearch APIs to monitor fielddata usage and identify problematic fields:
# Check fielddata memory usage by field
curl -X GET "localhost:9200/_cat/fielddata?v&fields=*&bytes=b"
# Get detailed fielddata statistics
curl -X GET "localhost:9200/_stats/fielddata?fields=*&pretty"
# Check which indices use the most fielddata
curl -X GET "localhost:9200/_cat/indices?v&h=index,fielddata.memory_size&bytes=b&s=fielddata.memory_size:desc"
Look for fields with high memory usage and consider:
1. Reducing cardinality (unique values)
2. Using keyword fields instead of text for aggregations
3. Implementing data retention policies for old indices
4. Sharding data across more indices to distribute load
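The `_cat/fielddata` output has one row per node and field, so the heaviest fields are easier to spot once the rows are summed. This sketch assumes the rows come from `_cat/fielddata?format=json&bytes=b`, parsed into a list of dicts whose `field` and `size` keys hold the field name and a byte count as a string. The helper name is hypothetical.

```python
from collections import defaultdict


def fielddata_by_field(rows):
    """Sum fielddata bytes per field across nodes, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["field"]] += int(row["size"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hand-written rows in the assumed shape, not real cluster output.
sample_rows = [
    {"node": "node-1", "field": "message", "size": "1048576"},
    {"node": "node-2", "field": "message", "size": "2097152"},
    {"node": "node-1", "field": "title", "size": "524288"},
]
for field, size in fielddata_by_field(sample_rows):
    print(field, size)
```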
Optimize your queries to reduce fielddata memory requirements:
# Use smaller page sizes for aggregations:
# "size" limits the buckets returned, "shard_size" limits the buckets each shard collects
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "large_agg": {
      "terms": {
        "field": "category.keyword",
        "size": 100,
        "shard_size": 1000
      }
    }
  }
}
# Use composite aggregations for large result sets
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "large_composite": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "category": {
              "terms": {
                "field": "category.keyword"
              }
            }
          }
        ]
      }
    }
  }
}
Additional optimizations:
1. Add query filters to reduce dataset size
2. Use date math indices to query only relevant time ranges
3. Implement aggregation pagination with composite aggregations
4. Consider using runtime fields instead of loading fielddata
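Pagination with composite aggregations works by passing the `after_key` from each response as the `after` parameter of the next request. The sketch below shows that loop. The `search` argument is a hypothetical callable that sends a request body to `/my_index/_search` and returns the parsed JSON response, so any HTTP or Elasticsearch client can be plugged in.

```python
def iter_composite_buckets(search, agg_name, sources, page_size=1000):
    """Yield every bucket of a composite aggregation, one page at a time."""
    after = None
    while True:
        composite = {"size": page_size, "sources": sources}
        if after is not None:
            composite["after"] = after  # resume after the last bucket of the previous page
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        buckets = result.get("buckets", [])
        yield from buckets
        after = result.get("after_key")
        if not buckets or after is None:
            break
```

Because only one page of buckets is held in memory at a time, this keeps each request small regardless of how many distinct values the field has.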
## Advanced Configuration and Monitoring
### Permanent Circuit Breaker Configuration
For production environments, set circuit breaker limits in elasticsearch.yml:
indices.breaker.fielddata.limit: 60%
indices.breaker.fielddata.overhead: 1.03
indices.breaker.request.limit: 40%
indices.breaker.total.limit: 70%
### Fielddata vs Doc Values Trade-offs
- Fielddata: Built at query time, stored in heap, supports text field aggregations
- Doc values: Built at index time, stored on disk, more memory-efficient for keyword/numeric fields
- Hybrid approach: Use text fields for full-text search with keyword sub-fields for aggregations
### Monitoring with Elastic Stack
Set up alerts for circuit breaker trips:
PUT _watcher/watch/circuit_breaker_alert
{
  "trigger": { "schedule": { "interval": "1m" } },
  "input": {
    "search": {
      "request": {
        "indices": [".monitoring-es-*"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "range": { "timestamp": { "gte": "now-1m" } } },
                { "term": { "breaker.fielddata.tripped": 1 } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": { "compare": { "ctx.payload.hits.total": { "gt": 0 } } },
  "actions": { "email_alert": { "email": { "to": "[email protected]" } } }
}
### JVM Heap Sizing Considerations
- Fielddata circuit breaker default: 40% of JVM heap in Elasticsearch 7.0 and later (60% in earlier versions)
- Ensure total heap is sized appropriately for your data volume
- Monitor GC pressure when increasing circuit breaker limits
- Consider using smaller heaps with more nodes for better isolation
### Alternative: Use Time Series Data Streams
For time-series data, use data streams with optimized mappings:
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" },
        "service": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        }
      }
    }
  }
}