This error occurs when Elasticsearch's search thread pool becomes saturated and cannot accept new search requests. The thread pool queue reaches its maximum capacity (typically 1000 tasks), forcing Elasticsearch to reject incoming search operations. This indicates the cluster is under heavy load and cannot keep up with request volume.
The "EsRejectedExecutionException: rejected execution of search" error means that your Elasticsearch cluster is overwhelmed and cannot process a new search request. The cluster maintains a thread pool specifically for handling search operations, with a limited queue size. When this queue fills up completely, Elasticsearch must reject new search requests to prevent memory exhaustion and cluster instability. This error is not a bug—it's a protective mechanism. Elasticsearch is telling you that: 1. One or more nodes have exhausted their search thread pool queue capacity 2. The cluster cannot keep up with the volume of incoming search requests 3. The system is protecting itself from out-of-memory conditions and node failures The error typically appears as "EsThreadPoolExecutor[search, queue capacity = 1000, active threads = 7, queued tasks = 1000, completed tasks = ...]", showing that all 1000 queue slots are occupied and more requests are arriving. This is different from circuit breaker exceptions (which limit memory usage) or timeout exceptions—it's purely about request queue capacity.
First, examine which nodes are experiencing rejections and how severe the problem is:
# Get detailed search thread pool statistics
curl -X GET "localhost:9200/_cat/thread_pool/search?v&h=id,name,active,queue,rejected,completed" -u "username:password"
# Example output:
# id name active queue rejected completed
# node1 search 7 1000 245 7265472
# node2 search 2 50 3 7123456
# Get node stats with detailed thread pool info
curl -X GET "localhost:9200/_nodes/stats/thread_pool/search" -u "username:password" | jq '.nodes'
# Check for HIGH rejection rates
# If the 'rejected' count is increasing rapidly, you have an active problem
Key metrics to look for:
- active: Number of currently executing search operations (at most the pool size: int((# of CPU cores * 3) / 2) + 1)
- queue: Number of pending search operations waiting to execute (limit is typically 1000)
- rejected: Total count of rejected operations (increasing = worsening problem)
- completed: Total successfully completed operations
If rejections are coming from multiple nodes, the cluster lacks capacity. If it's only one node, that node may have issues.
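Because the rejected counter is cumulative since node startup, a large value alone doesn't prove an active problem; polling it and watching the delta does. A simple shell loop works (the 5-second interval is arbitrary):
# Sample rejections repeatedly; a growing count means rejections are happening right now
while true; do
  date
  curl -s "localhost:9200/_cat/thread_pool/search?h=node_name,rejected" -u "username:password"
  sleep 5
done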
Check if the cluster has adequate resources or if specific nodes are problematic:
# Check CPU utilization per node
curl -X GET "localhost:9200/_cat/nodes?v&h=name,ip,cpu,heap.percent,ram.percent" -u "username:password"
# Check load on each node
curl -X GET "localhost:9200/_nodes/stats/os?filter_path=**.load_average" -u "username:password"
# Check JVM garbage collection frequency (GC pauses cause rejections)
curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=**.gc" -u "username:password"
# Detailed node info
curl -X GET "localhost:9200/_nodes?pretty" -u "username:password"Look for:
- High CPU usage (>80%): Node cannot execute searches fast enough
- High heap usage (>75%): JVM is under memory pressure, causing GC pauses
- High load average: System is CPU-bound
- Frequent GC events: Stop-the-world pauses are delaying searches
Route requests away from overloaded nodes using Elasticsearch's adaptive replica selection:
# Enable adaptive replica selection (ES 6.1+)
curl -X PUT "localhost:9200/_cluster/settings" -u "username:password" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": true
  }
}
'
# Verify it's enabled
curl -X GET "localhost:9200/_cluster/settings" -u "username:password"
This tells coordinating nodes to route each search to the shard copy that has been responding fastest (ranked by response time, service time, and queue size) instead of simple round-robin, minimizing the impact of distressed nodes. It is enabled by default from Elasticsearch 7.0 onward.
Optimize your application queries to reduce execution time and memory usage:
// BAD: Large result sets with aggregations
{
  "size": 10000,
  "query": { "match_all": {} },
  "aggs": {
    "all_users": {
      "terms": {
        "field": "user_id.keyword",
        "size": 50000
      }
    }
  }
}
// GOOD: Small result sets with date filtering
{
  "size": 100,
  "query": {
    "bool": {
      "must": [
        { "range": { "timestamp": { "gte": "now-1h" } } }
      ]
    }
  },
  "aggs": {
    "top_users": {
      "terms": {
        "field": "user_id.keyword",
        "size": 100,
        "execution_hint": "map"
      }
    }
  }
}
// For pagination, use search_after instead of large 'from'
{
  "size": 100,
  "sort": [{ "timestamp": "desc" }, { "_id": "asc" }],
  "search_after": [1640995200000, "doc-id-123"],
  "query": { "range": { "timestamp": { "gte": "now-30d" } } }
}
Optimization strategies:
1. Reduce size: Request 100-1000 documents max, not 10,000+
2. Add filters: Time ranges, status filters to shrink dataset
3. Limit aggregations: Cap bucket counts to reasonable numbers (100-1000)
4. Use execution_hint: Add "execution_hint": "map" to terms aggregations when only a small, filtered set of documents matches
5. Pagination: Use search_after for deep pagination, never from: 100000
Adjust thread pool configuration to handle your workload:
# In elasticsearch.yml
# The search thread pool size formula is: int((# of available_processors * 3) / 2) + 1
# For a 4-core node, default size = 7 threads
# For an 8-core node, default size = 13 threads
# Configure search thread pool
thread_pool:
  search:
    type: fixed       # Don't change the type; it's always fixed for search
    size: 13          # For 8 cores: (8 * 3 / 2) + 1 = 13
    queue_size: 1000  # Default; rarely needs adjustment
# If you want to accept more queued requests (not recommended):
# thread_pool:
#   search:
#     queue_size: 2000
# The built-in search_throttled pool (used for searches on frozen/throttled indices) can be tuned the same way
  search_throttled:
    type: fixed
    size: 3
    queue_size: 100
Important: Do NOT increase queue_size without addressing the root cause. A larger queue just delays the inevitable and masks the real problem.
# Note: thread pool settings are static, node-level settings. Since
# Elasticsearch 5.0 they cannot be changed through the _cluster/settings API;
# edit elasticsearch.yml and restart each node for the change to take effect.
The thread pool size is derived from CPU cores. To add search capacity, scale out with more nodes (or nodes with more cores) rather than tuning the pool configuration.
Handle 429 responses gracefully in your application:
// JavaScript/Node.js example with exponential backoff
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function searchWithRetry(query, maxRetries = 5) {
  let retries = 0;
  const backoff = 100; // Start with 100ms

  while (retries < maxRetries) {
    try {
      const response = await client.search({
        index: 'my-index',
        body: query
      });
      return response;
    } catch (error) {
      // A rejected search surfaces as HTTP 429 (Too Many Requests)
      if (error.statusCode === 429) {
        retries++;
        const delay = backoff * Math.pow(2, retries - 1); // Exponential backoff
        console.log(`Rejected, retrying in ${delay}ms (attempt ${retries}/${maxRetries})`);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
  throw new Error('Max retries exceeded');
}

// Usage
try {
  const results = await searchWithRetry({
    query: { match_all: {} },
    size: 100
  });
  console.log(results);
} catch (error) {
  console.error('Search failed:', error.message);
}
# Python example with retry logic
import time
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import TransportError

client = Elasticsearch(['http://localhost:9200'])

def search_with_retry(query, max_retries=5):
    retries = 0
    backoff = 0.1  # Start with 100ms
    while retries < max_retries:
        try:
            return client.search(index='my-index', body=query)
        except TransportError as e:
            # A rejected search surfaces as HTTP 429 (Too Many Requests)
            if e.status_code == 429:
                retries += 1
                delay = backoff * (2 ** (retries - 1))
                print(f'Rejected, retrying in {delay:.1f}s (attempt {retries}/{max_retries})')
                time.sleep(delay)
            else:
                raise
    raise RuntimeError('Max retries exceeded')

# Usage
try:
    results = search_with_retry({'query': {'match_all': {}}, 'size': 100})
except Exception as e:
    print(f'Search failed: {e}')
Key principles:
1. Exponential backoff: Wait 100ms, 200ms, 400ms, 800ms, etc.
2. Jitter: Add randomness to prevent thundering herd
3. Max retries: Stop after 5 retries to avoid infinite loops
4. Circuit breaker pattern: Temporarily stop retries if cluster is overloaded
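For principle 2, here is a minimal sketch of "full jitter" layered on the exponential backoff used in the Python example above (the helper name and cap are illustrative):
import random

def backoff_with_jitter(attempt, base=0.1, cap=30.0):
    # Full jitter: sleep a random duration between 0 and the capped
    # exponential delay, so retrying clients don't all wake at once
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# e.g. replace time.sleep(delay) in search_with_retry with:
#     time.sleep(backoff_with_jitter(retries - 1))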
If optimization and retries don't resolve the issue, scale your cluster:
# Vertical scaling: Increase heap size (set in config/jvm.options, not elasticsearch.yml)
# Recommended: 50% of available RAM, not exceeding ~31GB (the compressed-oops threshold)
-Xms16g
-Xmx16g
# You must restart the node after changing heap size
# Check current heap
curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=**.mem" -u "username:password"
# Horizontal scaling: Add more nodes to the cluster
# Each new node adds:
# - More CPU cores for search threads
# - More heap for caching
# - More processing capacity overall
# After adding nodes, check cluster status
curl -X GET "localhost:9200/_cluster/health?pretty" -u "username:password"
# Check new node has joined
curl -X GET "localhost:9200/_cat/nodes?v" -u "username:password"
# Shards rebalance onto new nodes automatically; retry any allocations that previously failed
curl -X POST "localhost:9200/_cluster/reroute?retry_failed" -u "username:password"
Scaling options:
1. Add data nodes: Increases search processing capacity
2. Add dedicated coordinating nodes: Handles search request routing
3. Increase heap: Allows more caching (up to 32GB max)
4. Increase CPU cores: More threads for search processing
5. Use dedicated search clusters: Separate search from indexing traffic
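After new nodes join, confirm that shards (and therefore search load) actually spread onto them; a node holding zero shards contributes nothing to search capacity:
# Shard count and disk usage per node
curl -X GET "localhost:9200/_cat/allocation?v" -u "username:password"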
Set up continuous monitoring to prevent rejections:
# Query the monitoring indices (if monitoring collection is enabled) for rejection history
curl -X POST "localhost:9200/.monitoring-es-*/_search" -u "username:password" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "term": { "type": "node_stats" } },
        { "exists": { "field": "node_stats.thread_pool.search.rejected" } }
      ]
    }
  },
  "size": 100,
  "sort": [{ "timestamp": { "order": "desc" } }]
}
'
# Check rejection trend
curl -X GET "localhost:9200/_nodes/stats/thread_pool/search?filter_path=**.rejected" -u "username:password"Set up alerts for:
- Rejected count increasing rapidly (more than 10 in 5 minutes)
- Active thread pool at maximum (all threads busy)
- Queue size near capacity (>80% of 1000)
- Specific nodes with disproportionate rejections (load balancing issue)
Example alert configuration:
# With Elasticsearch alerting or an external monitoring tool (Prometheus-style
# rule shown; the exact metric name depends on your exporter)
# Alert when search rejections increase by more than 10 in 5 minutes
- alert: ElasticsearchSearchRejections
  expr: increase(elasticsearch_thread_pool_search_rejected[5m]) > 10
  for: 5m
  annotations:
    summary: "Elasticsearch is rejecting search requests"
    description: "{{ $labels.instance }} is rejecting search requests"
## Advanced Troubleshooting for Search Rejections
### Understanding Thread Pool Capacity Formula
The search thread pool size is automatically calculated based on CPU cores:
- Formula: int((number_of_processors * 3) / 2) + 1
- 2 cores: (2 * 3 / 2) + 1 = 4 threads
- 4 cores: (4 * 3 / 2) + 1 = 7 threads
- 8 cores: (8 * 3 / 2) + 1 = 13 threads
- 16 cores: (16 * 3 / 2) + 1 = 25 threads
This formula is conservative to prevent resource exhaustion. The queue has a fixed size of 1000 by default.
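Note that the "number of processors" in this formula is what Elasticsearch detects, which can differ from the machine's physical core count (for example under container CPU limits, or when node.processors is set explicitly); you can verify what each node actually sees:
# allocated_processors is the value the thread pool sizing is derived from
curl -X GET "localhost:9200/_nodes/os?filter_path=**.os.available_processors,**.os.allocated_processors&pretty" -u "username:password"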
### Why Rejections Happen
1. Queue fills up when search threads execute queries slower than new requests arrive
2. Thread execution time depends on:
- Query complexity (must evaluate all shards and aggregations)
- Data size (more docs = more processing)
- Memory pressure (garbage collection pauses slow execution)
- CPU contention (other processes competing for CPU)
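When the queue is backing up, the hot threads API shows what the search threads are actually spending CPU on, which often points straight at the offending query pattern:
# Sample the busiest threads on every node; look for search threads pinned on the same operation
curl -X GET "localhost:9200/_nodes/hot_threads?threads=5" -u "username:password"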
### Diagnosing the Root Cause
If rejection happens on ONE node:
- Check if that node has higher load or CPU usage
- Verify network connectivity to the node
- Check if the node is performing heavy background tasks
- Solution: Add dedicated coordinating node or rebalance load
If rejection happens on ALL nodes DURING TRAFFIC SPIKE:
- Application is sending more requests than cluster can handle
- Cluster has reached capacity
- Solution: Reduce query rate or add more nodes
If rejection happens on ALL nodes CONSTANTLY:
- Cluster capacity is permanently exceeded
- Queries are too slow or heavy
- Solution: Optimize queries or scale cluster
### Slow Query Diagnosis
Enable slow query logging to identify problematic searches:
# Slow log thresholds are dynamic, index-level settings; since Elasticsearch 5.0
# they are applied per index via the settings API, not in elasticsearch.yml:
curl -X PUT "localhost:9200/my-index/_settings" -u "username:password" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.query.debug": "2s",
  "index.search.slowlog.threshold.query.trace": "0s",
  "index.search.slowlog.level": "info"
}
'
Then check the slow logs for queries that take more than 2 seconds; these are the ones tying up the thread pool.
### JVM Garbage Collection Issues
High garbage collection frequency causes thread pool stalls:
# Check GC statistics
curl -X GET "localhost:9200/_nodes/stats/jvm?filter_path=**.gc" -u "username:password"
# Look for:
# - High 'collection_count' = GC running frequently
# - High 'collection_time_in_millis' = Long pause times
GC pause times should be under 100ms. Longer pauses mean the heap is too small or there is a memory leak.
### Load Balancing Best Practices
For client-side load balancing:
- Use connection pooling with health checks
- Route to healthy nodes first
- Batch requests where it helps, but keep batch sizes reasonable (roughly 10-100 searches per batch, e.g. via _msearch); oversized batches monopolize the queue
For server-side:
- Enable adaptive replica selection
- Use dedicated coordinating nodes for search traffic
- Separate search clusters from indexing clusters if needed
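As a sketch of client-side pooling with health checks, here is roughly what that looks like with the Python client (parameter names are from elasticsearch-py 7.x; node URLs are placeholders, so adjust for your client version and topology):
from elasticsearch import Elasticsearch

# Spread requests across several nodes and drop unhealthy ones from rotation
client = Elasticsearch(
    ['http://node1:9200', 'http://node2:9200', 'http://node3:9200'],
    sniff_on_start=True,            # discover live nodes at startup
    sniff_on_connection_fail=True,  # re-sniff when a node stops responding
    sniffer_timeout=60,             # refresh the node list every 60 seconds
    retry_on_timeout=True,          # retry timed-out requests on another node
    max_retries=3,
)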
### Search Template Optimization
Use search templates for complex queries:
# Define a search template
PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "bool": {
          "must": [
            { "match": { "title": "{{search_term}}" } },
            { "range": { "date": { "gte": "{{start_date}}", "lte": "{{end_date}}" } } }
          ]
        }
      },
      "size": "{{size}}"
    }
  }
}
# Use the template (the stored script is compiled once and cached, reducing per-request parsing overhead)
POST /my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "search_term": "elasticsearch",
    "start_date": "2024-01-01",
    "end_date": "2024-12-31",
    "size": 100
  }
}
### Production Recommendations
1. Separate search and indexing nodes if you have both heavy workloads
2. Use data nodes with different roles: search-optimized vs ingest-optimized
3. Implement request rate limiting in your application
4. Monitor and alert on queue depth before rejections occur
5. Test load and capacity before production deployment
6. Use caching (query cache, request cache) to reduce duplicate work
7. Archive old data to reduce search scope
8. Use filters instead of queries where possible (filters are cached)
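On point 8, moving clauses that don't affect scoring from query context into filter context lets Elasticsearch cache and reuse them across requests; a minimal illustration (field names are placeholders):
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "timestamp": { "gte": "now-1d" } } }
      ]
    }
  }
}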