How to fix CircuitBreakingException: [fielddata] Data too large in Elasticsearch

Elasticsearch · Intermediate · Medium

The Elasticsearch fielddata circuit breaker trips when an aggregation or sort tries to load too much field data into memory. Elasticsearch rejects the request once the estimated memory for field data exceeds the configured breaker limit, which protects the node from memory exhaustion and crashes.

What this error means

A CircuitBreakingException with "[fielddata] Data too large" comes from Elasticsearch's memory protection: the fielddata circuit breaker stops field data operations from consuming too much heap. Fielddata is the in-heap structure Elasticsearch builds when you aggregate, sort, or run scripts on text fields, and building it loads every value of the field into memory.

Before loading, Elasticsearch estimates how much memory the field data will need. If the estimate would push fielddata usage past the configured limit (indices.breaker.fielddata.limit, 40% of the JVM heap by default in Elasticsearch 7.0 and later, 60% in earlier releases), it throws this exception instead of risking an OutOfMemoryError. Typical triggers are complex aggregations, sorting on large text fields, and script fields that read text values.

The error does not necessarily mean the cluster is out of memory overall. It means this specific operation would exceed the safety threshold for fielddata, which exists so that a single query cannot destabilize or crash the node.
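The check described above can be pictured with a simplified model. This is a sketch of the arithmetic only, not Elasticsearch's implementation; the function name would_trip and the byte figures are illustrative assumptions. The estimated total is scaled by the overhead factor and compared with the limit.

```python
def would_trip(current_bytes: int, new_bytes: int, limit_bytes: int,
               overhead: float = 1.03) -> bool:
    """Simplified model of a fielddata breaker check (illustrative only).

    current_bytes: fielddata already accounted for on the node
    new_bytes:     estimate for the field data about to be loaded
    overhead:      multiplier applied to the estimate (1.03 in the stats output)
    """
    projected = (current_bytes + new_bytes) * overhead
    return projected > limit_bytes


GIB = 1024 ** 3
limit = 6 * GIB  # hypothetical limit, matching the "6gb" example used later

# A small additional load fits under the limit, a large one does not.
print(would_trip(5 * GIB, 512 * 1024 ** 2, limit))
print(would_trip(5 * GIB, 2 * GIB, limit))
```

In this model the second call trips even though the node may have plenty of free heap elsewhere, which is why the exception does not by itself indicate an out-of-memory condition.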

How to fix "CircuitBreakingException: [fielddata] Data too large"

1. Check current circuit breaker settings

First, examine your current circuit breaker configuration to understand the limits:

bash

# Check circuit breaker settings
curl -X GET "localhost:9200/_nodes/stats/breaker?pretty"

# Look for the fielddata circuit breaker section
# Expected output includes:
# "fielddata" : {
#   "limit_size_in_bytes" : 6442450944,
#   "limit_size" : "6gb",
#   "estimated_size_in_bytes" : 0,
#   "estimated_size" : "0b",
#   "overhead" : 1.03,
#   "tripped" : 0
# }

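limit_size is the maximum fielddata memory allowed on the node (by default 40% of the heap in Elasticsearch 7.0 and later, 60% in earlier releases). estimated_size is the amount currently in use, and tripped counts how many times the breaker has triggered since the node started.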
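When a cluster has many nodes, reading the raw output gets tedious. The following sketch summarises the fielddata breaker per node from a parsed _nodes/stats/breaker response. The function name and the sample values are assumptions for illustration; the sample mirrors the shape of the output shown above.

```python
def fielddata_breaker_summary(stats: dict) -> list:
    """Return one row per node, sorted by fielddata usage (highest first)."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        fd = node["breakers"]["fielddata"]
        limit = fd["limit_size_in_bytes"]
        used = fd["estimated_size_in_bytes"]
        rows.append({
            "node": node.get("name", node_id),
            "used_pct": round(100 * used / limit, 1) if limit else 0.0,
            "tripped": fd["tripped"],
        })
    return sorted(rows, key=lambda row: row["used_pct"], reverse=True)


# Hypothetical response, trimmed to the fields the function reads.
sample = {
    "nodes": {
        "a1": {"name": "node-1", "breakers": {"fielddata": {
            "limit_size_in_bytes": 6442450944,
            "estimated_size_in_bytes": 0,
            "tripped": 0}}},
        "b2": {"name": "node-2", "breakers": {"fielddata": {
            "limit_size_in_bytes": 6442450944,
            "estimated_size_in_bytes": 3221225472,
            "tripped": 3}}},
    }
}

for row in fielddata_breaker_summary(sample):
    print(row)
```

Nodes with a high used_pct or a non-zero tripped count are the ones to investigate first.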

2. Optimize field mappings to reduce memory usage

Review and optimize your field mappings to minimize fielddata memory consumption:

json

# fielddata stays disabled on the text field (false is the default);
# the keyword sub-field serves aggregations and sorting instead
PUT /my_index/_mapping
{
  "properties": {
    "large_text_field": {
      "type": "text",
      "fielddata": false,
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

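Key optimizations:
1. Leave "fielddata" at false (the default) on text fields, and turn it off wherever it was enabled but is not needed for aggregations
2. Use keyword fields for aggregation and sorting; a sub-field added to an existing mapping is only populated for documents indexed or reindexed afterwards
3. Leave "eager_global_ordinals" at false (the default) for rarely aggregated fields
4. Use "ignore_above" on keyword fields so that strings longer than the limit are not indexed as keywords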
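To find which text fields currently have fielddata switched on, the mapping can be scanned programmatically. The sketch below walks the "properties" object of a mapping (as returned by GET /my_index/_mapping) and lists the offending field paths. The function name and the sample mapping are hypothetical.

```python
def text_fields_with_fielddata(properties: dict, prefix: str = "") -> list:
    """Return dotted paths of text fields that have "fielddata": true."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "text" and spec.get("fielddata") is True:
            found.append(path)
        # Object fields nest under "properties", multi-fields under "fields".
        for key in ("properties", "fields"):
            if key in spec:
                found.extend(text_fields_with_fielddata(spec[key], path + "."))
    return found


# Hypothetical mapping properties for illustration.
sample_properties = {
    "title": {"type": "text", "fielddata": True,
              "fields": {"keyword": {"type": "keyword"}}},
    "body": {"type": "text"},
    "author": {"properties": {"bio": {"type": "text", "fielddata": True}}},
}

print(text_fields_with_fielddata(sample_properties))
```

Each field it reports is a candidate for switching to a keyword sub-field as described above.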

3. Increase fielddata circuit breaker limit temporarily

If you need to run a specific query, temporarily increase the circuit breaker limit:

json

PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.fielddata.limit": "70%"
  }
}

Or set an absolute value:

json

PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.fielddata.limit": "8gb"
  }
}

Warning: Only increase this limit if the node has enough heap headroom. Monitor memory usage closely and return to the safer limit once the operation completes; sending the same request with the value set to null restores the default.
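Before raising the limit it helps to see what a given value means in bytes and how much of the heap it claims. The sketch below converts a percentage or an absolute size into bytes. The helper name limit_in_bytes and the 10 GiB heap are assumptions, and the unit table is a simplified subset of Elasticsearch's byte-size units (powers of 1024).

```python
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def limit_in_bytes(limit: str, heap_bytes: int) -> int:
    """Resolve a breaker limit such as "70%" or "8gb" to a byte count."""
    text = limit.strip().lower()
    if text.endswith("%"):
        return int(heap_bytes * float(text[:-1]) / 100)
    # Try two-letter units before the bare "b" suffix.
    for unit in sorted(UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return int(float(text[: -len(unit)]) * UNITS[unit])
    raise ValueError(f"unrecognised limit: {limit!r}")


heap = 10 * 1024 ** 3  # hypothetical 10 GiB heap
for candidate in ("70%", "8gb"):
    size = limit_in_bytes(candidate, heap)
    print(candidate, size, f"{100 * size / heap:.0f}% of heap")
```

On this hypothetical heap, "8gb" amounts to 80% of the heap, so an absolute value that looks modest can be more aggressive than a percentage.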

4. Use doc values instead of fielddata where possible

Convert queries to use doc values instead of fielddata for better memory efficiency:

json

# Instead of sorting on a text field with fielddata,
# sort on its keyword sub-field, which is backed by doc values:
GET /my_index/_search
{
  "sort": [
    {
      "text_field.keyword": {
        "order": "asc"
      }
    }
  ]
}

# For aggregations, use the keyword sub-field as well (it also uses doc values):
GET /my_index/_search
{
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword",
        "size": 10
      }
    }
  }
}

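Doc values are built at index time, stored on disk, and read through the operating system's file cache rather than the JVM heap, so they are far cheaper than fielddata. They are enabled by default for keyword, date, and numeric fields.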

5. Monitor and analyze fielddata usage

Use Elasticsearch APIs to monitor fielddata usage and identify problematic fields:

bash

# Check fielddata memory usage by field
curl -X GET "localhost:9200/_cat/fielddata?v&fields=*&bytes=b"

# Get detailed fielddata statistics
curl -X GET "localhost:9200/_stats/fielddata?fields=*&pretty"

# Check which indices use the most fielddata
curl -X GET "localhost:9200/_cat/indices?v&h=index,fielddata.memory_size&bytes=b&s=fielddata.memory_size:desc"

Look for fields with high memory usage and consider:
1. Reducing cardinality (unique values)
2. Using keyword fields instead of text for aggregations
3. Implementing data retention policies for old indices
4. Sharding data across more indices to distribute load
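To spot the heaviest fields across all nodes, the per-node rows can be totalled per field. The sketch below assumes the _cat/fielddata request above is sent with format=json added, so that each row is an object with "node", "field", and "size" keys and the size is a plain byte count. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def fielddata_by_field(rows: list) -> list:
    """Sum fielddata bytes per field across nodes, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["field"]] += int(row["size"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows, trimmed to the columns the function reads.
rows = [
    {"node": "node-1", "field": "title", "size": "1000"},
    {"node": "node-2", "field": "title", "size": "2000"},
    {"node": "node-1", "field": "tags", "size": "500"},
]

for field, size in fielddata_by_field(rows):
    print(f"{field}: {size} bytes")
```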

6. Implement query optimization and pagination

Optimize your queries to reduce fielddata memory requirements:

json

# Keep terms aggregations small: "size" caps the buckets returned,
# "shard_size" caps the buckets each shard collects
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "large_agg": {
      "terms": {
        "field": "category.keyword",
        "size": 100,
        "shard_size": 1000
      }
    }
  }
}

# Use composite aggregations for large result sets
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "large_composite": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "category": {
              "terms": {
                "field": "category.keyword"
              }
            }
          }
        ]
      }
    }
  }
}

Additional optimizations:
1. Add query filters to reduce dataset size
2. Use date math in index names to query only the indices for the relevant time range
3. Implement aggregation pagination with composite aggregations
4. Consider using runtime fields instead of loading fielddata
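Paginating a composite aggregation means feeding the "after_key" of each response back as "after" in the next request. The sketch below shows that loop. The names iter_composite_buckets and fake_search are hypothetical, and fake_search is a stand-in for a real search call so that the example runs without a cluster.

```python
import copy


def iter_composite_buckets(search, agg_name, body):
    """Yield every bucket of a composite aggregation, one page at a time.

    `search` is any callable that takes a request body (dict) and returns
    the parsed search response (dict).
    """
    body = copy.deepcopy(body)  # leave the caller's request untouched
    composite = body["aggs"][agg_name]["composite"]
    while True:
        agg = search(body)["aggregations"][agg_name]
        if not agg["buckets"]:
            return
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        if after_key is None:
            return
        composite["after"] = after_key  # resume after the last bucket seen


def fake_search(body):
    """Stand-in for a real search call (assumption, for illustration only)."""
    composite = body["aggs"]["large_composite"]["composite"]
    values = ["books", "games", "music", "tools", "toys"]
    after = composite.get("after", {}).get("category")
    remaining = [v for v in values if after is None or v > after]
    page = remaining[: composite["size"]]
    agg = {"buckets": [{"key": {"category": v}, "doc_count": 1} for v in page]}
    if page:
        agg["after_key"] = {"category": page[-1]}
    return {"aggregations": {"large_composite": agg}}


request = {
    "size": 0,
    "aggs": {"large_composite": {"composite": {
        "size": 2,
        "sources": [{"category": {"terms": {"field": "category.keyword"}}}],
    }}},
}

for bucket in iter_composite_buckets(fake_search, "large_composite", request):
    print(bucket["key"]["category"], bucket["doc_count"])
```

With a real client, search would wrap the client's search call; the loop itself stays the same.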

Advanced notes

## Advanced Configuration and Monitoring

### Permanent Circuit Breaker Configuration
For production environments, set circuit breaker limits in elasticsearch.yml (they are dynamic settings, so a persistent cluster settings update works as well):

yaml

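# The values below are the defaults in Elasticsearch 7.x and later; tune them for your workload
indices.breaker.fielddata.limit: 40%
indices.breaker.fielddata.overhead: 1.03
indices.breaker.request.limit: 60%
# 70% applies when indices.breaker.total.use_real_memory is false; with the real-memory breaker the default is 95%
indices.breaker.total.limit: 70%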

### Fielddata vs Doc Values Trade-offs
- Fielddata: Built at query time, stored in heap, supports text field aggregations
- Doc values: Built at index time, stored on disk, more memory-efficient for keyword/numeric fields
- Hybrid approach: Use text fields for full-text search with keyword sub-fields for aggregations

### Monitoring with Elastic Stack
Set up alerts for circuit breaker trips:

json

PUT _watcher/watch/circuit_breaker_alert
{
  "trigger": { "schedule": { "interval": "1m" } },
  "input": {
    "search": {
      "request": {
        "indices": [".monitoring-es-*"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "range": { "timestamp": { "gte": "now-1m" } } },
                { "term": { "breaker.fielddata.tripped": 1 } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": { "compare": { "ctx.payload.hits.total": { "gt": 0 } } },
  "actions": { "email_alert": { "email": { "to": "[email protected]" } } }
}
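Because tripped is a cumulative counter per node, an alert generally needs to compare two successive readings rather than test for a fixed value. The sketch below does that comparison on two parsed _nodes/stats/breaker responses. The function name and the snapshots are hypothetical, and the polling and notification steps are left out.

```python
def new_trips(previous: dict, current: dict, breaker: str = "fielddata") -> dict:
    """Return {node_id: increase} for nodes whose trip counter went up."""
    increases = {}
    for node_id, node in current.get("nodes", {}).items():
        now = node["breakers"][breaker]["tripped"]
        before = (previous.get("nodes", {})
                  .get(node_id, {})
                  .get("breakers", {})
                  .get(breaker, {})
                  .get("tripped", 0))
        # A lower value usually means the node restarted; it is ignored here.
        if now > before:
            increases[node_id] = now - before
    return increases


# Hypothetical snapshots taken one polling interval apart.
earlier = {"nodes": {"a1": {"breakers": {"fielddata": {"tripped": 2}}},
                     "b2": {"breakers": {"fielddata": {"tripped": 0}}}}}
later = {"nodes": {"a1": {"breakers": {"fielddata": {"tripped": 5}}},
                   "b2": {"breakers": {"fielddata": {"tripped": 0}}}}}

print(new_trips(earlier, later))
```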

### JVM Heap Sizing Considerations
- Fielddata circuit breaker default: 40% of JVM heap in Elasticsearch 7.0 and later (60% in earlier releases)
- Ensure total heap is sized appropriately for your data volume
- Monitor GC pressure when increasing circuit breaker limits
- Consider using smaller heaps with more nodes for better isolation

### Alternative: Use Time Series Data Streams
For time-series data, use data streams with optimized mappings:

json

PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" },
        "service": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        }
      }
    }
  }
}
