This error occurs when Elasticsearch's circuit breaker mechanism prevents a request from executing because it would exceed memory limits. Circuit breakers protect the cluster from out-of-memory errors by monitoring memory usage and rejecting requests that would push the system beyond configured thresholds.
The "CircuitBreakingException: [request] Data too large" error is a protective mechanism in Elasticsearch that prevents the cluster from running out of memory. Elasticsearch uses circuit breakers to monitor memory usage across various components (request, fielddata, in-flight requests, etc.) and will reject operations that would exceed configured memory limits. The "[request]" circuit breaker specifically tracks the memory used by incoming requests before they are processed. When a request (like a search, aggregation, or bulk operation) is estimated to require more memory than the available limit, Elasticsearch rejects it with this error to prevent the entire node from running out of memory and becoming unstable. This is a safety feature, not a bug - it's Elasticsearch protecting itself from memory exhaustion that could cause node failures, long garbage collection pauses, or cluster instability. The error typically appears when: 1. Processing very large search results with extensive aggregations 2. Loading massive fielddata caches for text analysis 3. Executing complex script queries 4. Handling bulk requests with very large documents 5. Running memory-intensive operations on resource-constrained nodes
First, examine your current circuit breaker settings and memory usage to understand the limits being exceeded:
# Check circuit breaker settings
curl -X GET "localhost:9200/_nodes/stats/breaker" -u "username:password"
# Check JVM heap usage
curl -X GET "localhost:9200/_nodes/stats/jvm" -u "username:password"
# Check fielddata memory usage
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata" -u "username:password"
# Example response showing circuit breaker information:
{
  "nodes": {
    "node-id": {
      "breakers": {
        "request": {
          "limit_size": "6.4gb",
          "limit_size_in_bytes": 6871947673,
          "estimated_size": "5.2gb",
          "estimated_size_in_bytes": 5583457484,
          "overhead": 1.0,
          "tripped": 3
        }
      }
    }
  }
}
Look for:
- limit_size: The configured limit for each circuit breaker
- estimated_size: Current memory usage
- tripped: Number of times the breaker has tripped
- overhead: The multiplier applied to estimated memory
Reduce memory usage by optimizing your queries:
// BEFORE: Memory-intensive query
{
  "size": 10000,
  "query": { "match_all": {} },
  "aggs": {
    "all_terms": {
      "terms": {
        "field": "user_id.keyword",
        "size": 10000  // Too many buckets
      }
    }
  }
}
// AFTER: Optimized query with limits
{
  "size": 100,  // Reduce result size
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1h"  // Add time range
      }
    }
  },
  "aggs": {
    "sampled_terms": {
      "terms": {
        "field": "user_id.keyword",
        "size": 100,  // Reasonable bucket limit
        "execution_hint": "map"  // Use map execution
      }
    }
  }
}
// Use search_after for deep pagination instead of large 'size'
{
  "size": 100,
  "sort": [{"timestamp": "desc"}, {"_id": "asc"}],
  "search_after": [1640995200000, "doc-id-123"]
}
Key optimizations:
1. Add filters: Use date ranges, term filters to reduce dataset size
2. Limit aggregation buckets: Set reasonable size parameters
3. Use composite aggregations for high-cardinality fields (see the sketch after this list)
4. Enable `doc_values` for numeric/date fields instead of fielddata
5. Use `search_after` for pagination instead of large from/size
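To illustrate point 3, here is a minimal sketch of a composite aggregation that pages through a high-cardinality field in fixed-size batches instead of building every bucket in memory at once; the index and field names (my-index, user_id.keyword) are placeholders:
// Composite aggregation sketch - paginates buckets instead of building them all at once
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "users": {
      "composite": {
        "size": 1000,
        "sources": [
          { "user": { "terms": { "field": "user_id.keyword" } } }
        ]
      }
    }
  }
}
// Pass the "after_key" from each response into the next request's "after" parameter
// to fetch the following page of buckets.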
Increase circuit breaker limits if your workload genuinely needs more memory:
# In elasticsearch.yml
indices.breaker.request.limit: 60%     # Default is 60% of heap
indices.breaker.fielddata.limit: 40%   # Default is 40% of heap
indices.breaker.total.limit: 70%       # Default is 70% of heap (95% when real-memory checking is enabled, the default in recent versions)
# For specific use cases, you can increase:
indices.breaker.request.limit: 70%
indices.breaker.fielddata.limit: 50%
# Or set absolute values
indices.breaker.request.limit: 8gb
# Update settings dynamically (doesn't require restart)
curl -X PUT "localhost:9200/_cluster/settings" -u "username:password" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "indices.breaker.request.limit": "70%",
    "indices.breaker.fielddata.limit": "50%"
  }
}
'
# Verify the update
curl -X GET "localhost:9200/_cluster/settings" -u "username:password"Important considerations:
- Total circuit breaker usage shouldn't exceed 95% of heap
- Increasing limits too much can cause out-of-memory errors
- Monitor memory usage after increasing limits
- Consider increasing heap size if limits are consistently hit
Fielddata is a common source of memory issues. Optimize field usage:
// Mapping optimization for text fields
PUT /my-index
{
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "doc_values": true,   // Use doc_values instead of fielddata
            "ignore_above": 256
          }
        }
      },
      "category": {
        "type": "keyword",        // keyword fields use doc_values by default
        "doc_values": true
      },
      "description": {
        "type": "text",
        "fielddata": false        // Explicitly disable fielddata (the default for text fields)
      }
    }
  }
}
// Clear fielddata cache if it's consuming too much memory
POST /my-index/_cache/clear?fielddata=true
// Check fielddata memory usage by field
GET /_cat/fielddata?v&fields=*
// Limit fielddata circuit breaker
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.fielddata.limit": "30%"
  }
}
Best practices:
1. Use keyword fields for aggregations and sorting instead of text fields
2. Enable doc_values for numeric, date, and keyword fields
3. Set `fielddata: false` on text fields not used for aggregations
4. Keep `eager_global_ordinals` disabled (the default) on high-cardinality fields; see the sketch after this list
5. Monitor fielddata usage with _cat/fielddata
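As a sketch of point 4, the mapping update below leaves eager global ordinals disabled on a hypothetical high-cardinality session_id field (my-index and session_id are placeholder names); eager_global_ordinals can be updated on existing keyword fields:
PUT /my-index/_mapping
{
  "properties": {
    "session_id": {
      "type": "keyword",
      "eager_global_ordinals": false
    }
  }
}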
Set up monitoring to catch memory issues before they cause errors:
# Monitor circuit breaker trips
curl -X GET "localhost:9200/_nodes/stats/breaker?filter_path=**.tripped" -u "username:password"
# Monitor heap usage
curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=**.heap_*" -u "username:password"
# Set up alerting with Elasticsearch alerts or external monitoring
# Example: Alert when circuit breaker trips exceed threshold
# Check for memory-intensive shards
GET /_cat/shards?v&h=index,shard,prirep,state,docs,store,ip&s=store:desc
# Identify indices consuming the most memory
GET /_cat/indices?v&h=index,docs.count,store.size,pri.store.size&s=store.size:desc
# Example monitoring configuration
# Use Elastic Stack monitoring or external tools like:
# - Prometheus with Elasticsearch exporter
# - Datadog Elasticsearch integration
# - New Relic Elasticsearch plugin
monitoring:
  enabled: true
  collection:
    interval: 10s
xpack:
  monitoring:
    enabled: true
    elasticsearch:
      collection:
        enabled: true
        interval: 10s
Key metrics to monitor:
1. Circuit breaker trip count (increasing trends indicate problems)
2. Heap usage percentage (alert above 75%)
3. Fielddata memory usage
4. Request cache hit ratio
5. GC frequency and duration
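A minimal shell sketch of the first two checks, assuming jq is installed and the cluster is reachable with the credentials shown (adjust host and authentication as needed):
#!/usr/bin/env bash
# Sum circuit breaker trip counts across all nodes and breakers
curl -s "localhost:9200/_nodes/stats/breaker?filter_path=**.tripped" -u "username:password" \
  | jq '[.. | .tripped? | numbers] | add'

# Print heap usage percentage per node; alert if any value exceeds 75
curl -s "localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent" -u "username:password" \
  | jq '.nodes[].jvm.mem.heap_used_percent'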
If memory issues persist, consider scaling your cluster:
# Increase heap size (config/jvm.options or jvm.options.d/, not elasticsearch.yml)
# Recommended: 50% of available RAM, not exceeding 32GB
-Xms16g
-Xmx16g
# Adjust thread pool sizes (elasticsearch.yml)
thread_pool:
  search:
    size: 10          # Adjust based on CPU cores
    queue_size: 1000
# Consider adding more nodes to distribute load
# Horizontal scaling is often better than vertical scaling
# Use dedicated coordinating nodes for heavy search loads (an empty roles list makes a node coordinating-only)
node.roles: [ ]
# Use dedicated data nodes
node.roles: [data]
# Use dedicated master nodes
node.roles: [master]
# Check current node roles
GET /_cat/nodes?v&h=name,node.role,heap.percent,ram.percent,cpu
# Consider index lifecycle management for time-series data
# Move old indices to colder, cheaper storage
PUT _ilm/policy/hot_warm_cold
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 1
          }
        }
      },
      "cold": {
        "min_age": "90d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      }
    }
  }
}
Scaling strategies:
1. Vertical scaling: Increase heap size (up to 32GB max)
2. Horizontal scaling: Add more nodes to the cluster
3. Role separation: Use dedicated coordinating, data, and master nodes
4. Index lifecycle: Archive old data to reduce active dataset size
## Advanced Circuit Breaker Configuration
### Understanding Circuit Breaker Types
Elasticsearch has several circuit breakers:
1. Request breaker: Tracks memory used by incoming requests before processing
2. Fielddata breaker: Limits memory used by fielddata caches
3. In-flight requests breaker: Tracks memory of currently executing requests
4. Accounting breaker: Tracks memory used by Lucene segment writers
5. Parent breaker: Ensures total memory usage stays within limits
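Each of these breakers has a corresponding limit setting. The values below are the usual defaults in recent 7.x/8.x releases, shown here as a reference sketch rather than as recommended changes:
indices.breaker.request.limit: 60%              # request breaker
indices.breaker.fielddata.limit: 40%            # fielddata breaker
network.breaker.inflight_requests.limit: 100%   # in-flight requests breaker
indices.breaker.total.limit: 95%                # parent breaker (95% with real-memory checking, 70% without)
indices.breaker.total.use_real_memory: true     # base the parent breaker on actual heap usage
# indices.breaker.accounting.limit: 100%        # accounting breaker (older 7.x versions only)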
### Memory Estimation Overhead
Circuit breakers apply an "overhead" multiplier (default 1.0-2.0) to estimated memory to account for:
- Serialization/deserialization costs
- Object overhead in JVM
- Conservative safety margins
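The overhead multipliers themselves are configurable if the defaults prove too conservative or too lax for a particular workload; the values shown are the documented defaults in recent versions, and changing them is rarely necessary:
indices.breaker.request.overhead: 1              # request estimates counted as-is
indices.breaker.fielddata.overhead: 1.03         # fielddata estimates padded by 3%
network.breaker.inflight_requests.overhead: 2    # in-flight request sizes doubled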
### Script Circuit Breakers
For script-heavy workloads, limit script compilation and caching (these settings cap compilation rate and cache size rather than memory directly, but they curb a common source of memory pressure):
script.max_compilations_rate: 75/5m
script.cache.max_size: 100
script.cache.expire: 10m
### Monitoring and Tuning
Use the Circuit Breaker Stats API for detailed monitoring:
GET /_nodes/stats/breaker?human&pretty
### Production Recommendations
1. Heap size: 50% of RAM, max 32GB (due to JVM pointer compression)
2. Request breaker: 60-70% of heap for search-heavy clusters
3. Fielddata breaker: 30-40% of heap, lower if using mostly keyword fields
4. Monitor trip counts: Alert on increasing trends
5. Test with production-like data: Circuit breaker behavior depends on data distribution
### Alternative: Use Search-as-you-type
For autocomplete/search use cases that cause memory issues, consider:
- Search-as-you-type fields
- N-gram tokenizers
- Completion suggesters
These approaches avoid fielddata entirely while providing fast prefix matching.
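As a minimal sketch, a search_as_you_type mapping and the matching bool_prefix query look like this (my-index and title are placeholder names; the _2gram and _3gram subfields are generated automatically by the field type):
PUT /my-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "search_as_you_type"
      }
    }
  }
}
// Query with a multi_match of type bool_prefix across the generated subfields
GET /my-index/_search
{
  "query": {
    "multi_match": {
      "query": "circuit brea",
      "type": "bool_prefix",
      "fields": ["title", "title._2gram", "title._3gram"]
    }
  }
}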