This error occurs when Elasticsearch's circuit breaker mechanism prevents a request from executing because it would exceed memory limits. Circuit breakers protect the cluster from out-of-memory errors by monitoring memory usage and rejecting requests that would push the system beyond configured thresholds.
The "CircuitBreakingException: [request] Data too large" error is a protective mechanism in Elasticsearch that prevents the cluster from running out of memory. Elasticsearch uses circuit breakers to monitor memory usage across various components (request, fielddata, in-flight requests, etc.) and will reject operations that would exceed configured memory limits. The "[request]" circuit breaker specifically tracks the memory used by incoming requests before they are processed. When a request (like a search, aggregation, or bulk operation) is estimated to require more memory than the available limit, Elasticsearch rejects it with this error to prevent the entire node from running out of memory and becoming unstable. This is a safety feature, not a bug - it's Elasticsearch protecting itself from memory exhaustion that could cause node failures, long garbage collection pauses, or cluster instability. The error typically appears when: 1. Processing very large search results with extensive aggregations 2. Loading massive fielddata caches for text analysis 3. Executing complex script queries 4. Handling bulk requests with very large documents 5. Running memory-intensive operations on resource-constrained nodes
First, examine your current circuit breaker settings and memory usage to understand the limits being exceeded:
# Check circuit breaker settings
curl -X GET "localhost:9200/_nodes/stats/breaker" -u "username:password"
# Check JVM heap usage
curl -X GET "localhost:9200/_nodes/stats/jvm" -u "username:password"
# Check fielddata memory usage
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata" -u "username:password"
# Example response showing circuit breaker information:
{
  "nodes": {
    "node-id": {
      "breakers": {
        "request": {
          "limit_size": "6.4gb",
          "limit_size_in_bytes": 6871947673,
          "estimated_size": "5.2gb",
          "estimated_size_in_bytes": 5583457484,
          "overhead": 1.0,
          "tripped": 3
        }
      }
    }
  }
}
Look for:
- limit_size: The configured limit for each circuit breaker
- estimated_size: Current memory usage
- tripped: Number of times the breaker has tripped
- overhead: The multiplier applied to estimated memory
Reduce memory usage by optimizing your queries:
// BEFORE: Memory-intensive query
{
  "size": 10000,
  "query": { "match_all": {} },
  "aggs": {
    "all_terms": {
      "terms": {
        "field": "user_id.keyword",
        "size": 10000  // Too many buckets
      }
    }
  }
}
// AFTER: Optimized query with limits
{
  "size": 100,  // Reduce result size
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1h"  // Add time range
      }
    }
  },
  "aggs": {
    "sampled_terms": {
      "terms": {
        "field": "user_id.keyword",
        "size": 100,  // Reasonable bucket limit
        "execution_hint": "map"  // Use map execution
      }
    }
  }
}
// Use search_after for deep pagination instead of large 'size'
{
  "size": 100,
  "sort": [{"timestamp": "desc"}, {"_id": "asc"}],
  "search_after": [1640995200000, "doc-id-123"]
}
Key optimizations:
1. Add filters: Use date ranges, term filters to reduce dataset size
2. Limit aggregation buckets: Set reasonable size parameters
3. Use composite aggregations for high-cardinality fields (see the sketch after this list)
4. Enable `doc_values` for numeric/date fields instead of fielddata
5. Use `search_after` for pagination instead of large from/size
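To illustrate point 3, here is a minimal sketch of a composite aggregation that pages through a high-cardinality field in fixed-size batches instead of building every bucket in memory at once; the index and field names (my-index, user_id.keyword) are placeholders:
// Composite aggregation sketch - paginates buckets instead of building them all at once
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "users": {
      "composite": {
        "size": 1000,
        "sources": [
          { "user": { "terms": { "field": "user_id.keyword" } } }
        ]
      }
    }
  }
}
// Pass the "after_key" from each response into the next request's "after" parameter
// to fetch the following page of buckets.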
Increase circuit breaker limits if your workload genuinely needs more memory:
# In elasticsearch.yml
indices.breaker.request.limit: 60%     # Default is 60% of heap
indices.breaker.fielddata.limit: 40%   # Default is 40% of heap
indices.breaker.total.limit: 70%       # Default is 70% of heap (95% when real-memory checking is enabled, the default in recent versions)
# For specific use cases, you can increase:
indices.breaker.request.limit: 70%
indices.breaker.fielddata.limit: 50%
# Or set absolute values
indices.breaker.request.limit: 8gb
# Update settings dynamically (doesn't require restart)
curl -X PUT "localhost:9200/_cluster/settings" -u "username:password" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "indices.breaker.request.limit": "70%",
    "indices.breaker.fielddata.limit": "50%"
  }
}
'
# Verify the update
curl -X GET "localhost:9200/_cluster/settings" -u "username:password"Important considerations:
- Total circuit breaker usage shouldn't exceed 95% of heap
- Increasing limits too much can cause out-of-memory errors
- Monitor memory usage after increasing limits
- Consider increasing heap size if limits are consistently hit
Fielddata is a common source of memory issues. Optimize field usage:
// Mapping optimization for text fields
PUT /my-index
{
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "doc_values": true,   // Use doc_values instead of fielddata
            "ignore_above": 256
          }
        }
      },
      "category": {
        "type": "keyword",        // keyword fields use doc_values by default
        "doc_values": true
      },
      "description": {
        "type": "text",
        "fielddata": false        // Explicitly disable fielddata (the default for text fields)
      }
    }
  }
}
// Clear fielddata cache if it's consuming too much memory
POST /my-index/_cache/clear?fielddata=true
// Check fielddata memory usage by field
GET /_cat/fielddata?v&fields=*
// Limit fielddata circuit breaker
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.fielddata.limit": "30%"
  }
}
Best practices:
1. Use keyword fields for aggregations and sorting instead of text fields
2. Enable doc_values for numeric, date, and keyword fields
3. Set `fielddata: false` on text fields not used for aggregations
4. Keep `eager_global_ordinals` disabled (the default) on high-cardinality fields; see the sketch after this list
5. Monitor fielddata usage with _cat/fielddata
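As a sketch of point 4, the mapping update below leaves eager global ordinals disabled on a hypothetical high-cardinality session_id field (my-index and session_id are placeholder names); eager_global_ordinals can be updated on existing keyword fields:
PUT /my-index/_mapping
{
  "properties": {
    "session_id": {
      "type": "keyword",
      "eager_global_ordinals": false
    }
  }
}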
Set up monitoring to catch memory issues before they cause errors:
# Monitor circuit breaker trips
curl -X GET "localhost:9200/_nodes/stats/breaker?filter_path=**.tripped" -u "username:password"
# Monitor heap usage
curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=**.heap_*" -u "username:password"
# Set up alerting with Elasticsearch alerts or external monitoring
# Example: Alert when circuit breaker trips exceed threshold
# Check for memory-intensive shards
GET /_cat/shards?v&h=index,shard,prirep,state,docs,store,ip&s=store:desc
# Identify indices consuming the most memory
GET /_cat/indices?v&h=index,docs.count,store.size,pri.store.size&s=store.size:desc
# Example monitoring configuration
# Use Elastic Stack monitoring or external tools like:
# - Prometheus with Elasticsearch exporter
# - Datadog Elasticsearch integration
# - New Relic Elasticsearch plugin
monitoring:
  enabled: true
  collection:
    interval: 10s
xpack:
  monitoring:
    enabled: true
    elasticsearch:
      collection:
        enabled: true
        interval: 10s
Key metrics to monitor:
1. Circuit breaker trip count (increasing trends indicate problems)
2. Heap usage percentage (alert above 75%)
3. Fielddata memory usage
4. Request cache hit ratio
5. GC frequency and duration
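A minimal shell sketch of the first two checks, assuming jq is installed and the cluster is reachable with the credentials shown (adjust host and authentication as needed):
#!/usr/bin/env bash
# Sum circuit breaker trip counts across all nodes and breakers
curl -s "localhost:9200/_nodes/stats/breaker?filter_path=**.tripped" -u "username:password" \
  | jq '[.. | .tripped? | numbers] | add'

# Print heap usage percentage per node; alert if any value exceeds 75
curl -s "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent" -u "username:password" \
  | jq '.nodes[].jvm.mem.heap_used_percent'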
If memory issues persist, consider scaling your cluster:
# Increase heap size (config/jvm.options or jvm.options.d/, not elasticsearch.yml)
# Recommended: 50% of available RAM, not exceeding 32GB
-Xms16g
-Xmx16g
# Adjust thread pool sizes (elasticsearch.yml)
thread_pool:
  search:
    size: 10          # Adjust based on CPU cores
    queue_size: 1000
# Consider adding more nodes to distribute load
# Horizontal scaling is often better than vertical scaling
# Use dedicated coordinating nodes for heavy search loads (an empty roles list makes a node coordinating-only)
node.roles: [ ]
# Use dedicated data nodes
node.roles: [data]
# Use dedicated master nodes
node.roles: [master]
# Check current node roles
GET /_cat/nodes?v&h=name,node.role,heap.percent,ram.percent,cpu
# Consider index lifecycle management for time-series data
# Move old indices to colder, cheaper storage
PUT _ilm/policy/hot_warm_cold
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 1
          }
        }
      },
      "cold": {
        "min_age": "90d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      }
    }
  }
}
Scaling strategies:
1. Vertical scaling: Increase heap size (up to 32GB max)
2. Horizontal scaling: Add more nodes to the cluster
3. Role separation: Use dedicated coordinating, data, and master nodes
4. Index lifecycle: Archive old data to reduce active dataset size
## Advanced Circuit Breaker Configuration
### Understanding Circuit Breaker Types
Elasticsearch has several circuit breakers:
1. Request breaker: Tracks memory used by incoming requests before processing
2. Fielddata breaker: Limits memory used by fielddata caches
3. In-flight requests breaker: Tracks memory of currently executing requests
4. Accounting breaker: Tracks memory used by Lucene segment writers
5. Parent breaker: Ensures total memory usage stays within limits
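Each of these breakers has a corresponding limit setting. The values below are the usual defaults in recent 7.x/8.x releases, shown here as a reference sketch rather than as recommended changes:
indices.breaker.request.limit: 60%              # request breaker
indices.breaker.fielddata.limit: 40%            # fielddata breaker
network.breaker.inflight_requests.limit: 100%   # in-flight requests breaker
indices.breaker.total.limit: 95%                # parent breaker (95% with real-memory checking, 70% without)
indices.breaker.total.use_real_memory: true     # base the parent breaker on actual heap usage
# indices.breaker.accounting.limit: 100%        # accounting breaker (older 7.x versions only)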
### Memory Estimation Overhead
Circuit breakers apply an "overhead" multiplier (default 1.0-2.0) to estimated memory to account for:
- Serialization/deserialization costs
- Object overhead in JVM
- Conservative safety margins
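The overhead multipliers themselves are configurable if the defaults prove too conservative or too lax for a particular workload; the values shown are the documented defaults in recent versions, and changing them is rarely necessary:
indices.breaker.request.overhead: 1              # request estimates counted as-is
indices.breaker.fielddata.overhead: 1.03         # fielddata estimates padded by 3%
network.breaker.inflight_requests.overhead: 2    # in-flight request sizes doubled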
### Script Circuit Breakers
For script-heavy workloads, limit script compilation and caching (these settings cap compilation rate and cache size rather than memory directly, but they curb a common source of memory pressure):
script.max_compilations_rate: 75/5m
script.cache.max_size: 100
script.cache.expire: 10m
### Monitoring and Tuning
Use the Circuit Breaker Stats API for detailed monitoring:
GET /_nodes/stats/breaker?human&pretty
### Production Recommendations
1. Heap size: 50% of RAM, max 32GB (due to JVM pointer compression)
2. Request breaker: 60-70% of heap for search-heavy clusters
3. Fielddata breaker: 30-40% of heap, lower if using mostly keyword fields
4. Monitor trip counts: Alert on increasing trends
5. Test with production-like data: Circuit breaker behavior depends on data distribution
### Alternative: Use Search-as-you-type
For autocomplete/search use cases that cause memory issues, consider:
- Search-as-you-type fields
- N-gram tokenizers
- Completion suggesters
These approaches avoid fielddata entirely while providing fast prefix matching.
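As a minimal sketch, a search_as_you_type mapping and the matching bool_prefix query look like this (my-index and title are placeholder names; the _2gram and _3gram subfields are generated automatically by the field type):
PUT /my-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "search_as_you_type"
      }
    }
  }
}
// Query with a multi_match of type bool_prefix across the generated subfields
GET /my-index/_search
{
  "query": {
    "multi_match": {
      "query": "circuit brea",
      "type": "bool_prefix",
      "fields": ["title", "title._2gram", "title._3gram"]
    }
  }
}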