How to fix ReceiveTimeoutTransportException: [node] request_id timed out in Elasticsearch

Elasticsearch | ADVANCED | HIGH

This timeout exception occurs when Elasticsearch nodes fail to receive responses within the expected timeout period during inter-node communication, typically caused by network latency, high cluster load, or resource constraints.

What this error means

The ReceiveTimeoutTransportException indicates that an Elasticsearch node attempted to communicate with another node in the cluster but did not receive a response within the configured timeout window. This is an inter-node communication error, not a query timeout.

Elasticsearch uses a transport layer for all internal cluster communication - things like forwarding search requests, replicating documents, and coordinating cluster state. When a node sends a request to another node and doesn't get a response before the timeout expires, this exception is thrown.

This error often points to underlying infrastructure or performance issues rather than configuration problems. The timeout is a symptom - the root cause could be network instability, resource exhaustion on the receiving node, or the cluster being overwhelmed with requests.

How to fix "ReceiveTimeoutTransportException: [node] request_id timed out"

1. Check cluster health and node connectivity

First, verify which nodes are experiencing timeouts and check overall cluster health:

bash

# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Check node stats to identify struggling nodes
curl -X GET "localhost:9200/_nodes/stats?pretty"

# Check pending tasks that might be blocking
curl -X GET "localhost:9200/_cluster/pending_tasks?pretty"

Look for nodes with high CPU, memory usage, or pending tasks. Check network connectivity between nodes using ping and ensure all nodes can reach each other.
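To narrow down which nodes look overloaded, the compact `_cat/nodes` output can be filtered with a small helper. This is a minimal sketch: `flag_busy_nodes` is a hypothetical function name, and the 80% CPU and 85% heap thresholds are illustrative assumptions rather than Elasticsearch defaults. It assumes node names contain no spaces.

```bash
#!/usr/bin/env bash
# Flag nodes whose CPU or heap usage crosses a threshold.
# Expects lines of "name cpu heap.percent" on stdin, for example from:
#   curl -s "localhost:9200/_cat/nodes?h=name,cpu,heap.percent"
# Thresholds (arguments 1 and 2) are assumptions; tune them for your cluster.
flag_busy_nodes() {
  local cpu_max="${1:-80}" heap_max="${2:-85}"
  awk -v cpu_max="$cpu_max" -v heap_max="$heap_max" '
    NF >= 3 && ($2 + 0 > cpu_max || $3 + 0 > heap_max) {
      printf "%s cpu=%s%% heap=%s%%\n", $1, $2, $3
    }'
}
```

A typical call would be `curl -s "localhost:9200/_cat/nodes?h=name,cpu,heap.percent" | flag_busy_nodes`; nodes under both thresholds produce no output.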

2. Review Elasticsearch logs for patterns

Examine the Elasticsearch logs to understand the frequency and pattern of timeouts:

bash

# On each node, check logs for timeout errors
tail -n 500 /var/log/elasticsearch/elasticsearch.log | grep -i "ReceiveTimeoutTransportException"

# Look for the specific node and request_id that's timing out
# (the logged message has the form: request_id [<id>] timed out after [<ms>ms])
grep -E "request_id \[[0-9]+\] timed out" /var/log/elasticsearch/elasticsearch.log

Note which nodes are consistently timing out, what operations were being performed, and if there's a correlation with high load periods.
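To see which nodes time out most often, the log lines can be tallied per target node. The sketch below assumes the message has the form `[node][address][action] request_id [id] timed out ...` and that node names contain no spaces; `count_timeouts_by_node` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Count timeout messages per target node, most frequent first.
# Reads log lines on stdin and prints "node count" pairs.
# Assumes messages shaped like:
#   [node][address][action] request_id [123] timed out after [30000ms]
count_timeouts_by_node() {
  sed -nE 's/.*\[([^]]+)\]\[[^]]+\]\[.*\] request_id \[[0-9]+\] timed out.*/\1/p' \
    | sort | uniq -c | sort -rn | awk '{print $2, $1}'
}
```

For example, `count_timeouts_by_node < /var/log/elasticsearch/elasticsearch.log` gives a quick ranking that can be compared across nodes and time windows.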

3. Monitor resource usage on affected nodes

Use monitoring tools to check resource consumption:

bash

# Check CPU and memory
top -b -n 1 | head -n 20

# Check disk I/O
iostat -x 1 5

# Check JVM garbage collection stats
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"

If you see high CPU (>80%), low available memory, or frequent/long GC pauses (>1s), you have resource constraints that need addressing before adjusting timeouts.
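The JVM stats report cumulative GC counters, so the average pause over an interval comes from the difference between two samples. The sketch below shows that arithmetic; `avg_gc_pause_ms` is a hypothetical helper, and its inputs are the `collection_count` and `collection_time_in_millis` values read from `jvm.gc.collectors` at two points in time.

```bash
#!/usr/bin/env bash
# Average GC pause (ms) between two samples of the cumulative counters.
# Arguments: count1 millis1 count2 millis2
#   count  = collection_count
#   millis = collection_time_in_millis
# Prints 0.0 when no collections happened between the samples.
avg_gc_pause_ms() {
  awk -v c1="$1" -v t1="$2" -v c2="$3" -v t2="$4" 'BEGIN {
    dc = c2 - c1
    if (dc <= 0) { print "0.0"; exit }
    printf "%.1f\n", (t2 - t1) / dc
  }'
}
```

For instance, going from 100 collections and 5000 ms to 110 collections and 20000 ms gives an average of 1500 ms per collection, which is above the 1s guideline mentioned above.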

4. Increase transport timeout settings

If network latency is unavoidable (e.g., cross-region clusters), increase the timeout values in elasticsearch.yml:

yaml

# Increase connection timeout (default: 30s)
# Older releases used transport.tcp.connect_timeout, which was removed in 8.0
transport.connect_timeout: 60s

# Configure TCP keep-alive to detect dead connections
transport.tcp.keep_alive: true

# Enable transport ping to keep connections alive
transport.ping_schedule: 5s

Apply to all nodes and perform a rolling restart:

bash

# Disable shard allocation before restarting
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}'

# Restart node, wait for it to rejoin
systemctl restart elasticsearch

# Re-enable allocation once the node has rejoined, wait for green, then repeat for the next node
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}'
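Between node restarts it is generally worth waiting for the cluster to report green before touching the next node. The following is a minimal polling sketch, not an official procedure: `fetch_health`, `health_status`, and `wait_for_green` are hypothetical helper names, and the host, retry count, and delay are assumptions to adjust.

```bash
#!/usr/bin/env bash
# Fetch cluster health JSON. Redefine this function to target another
# host or to add authentication (localhost:9200 is an assumption).
fetch_health() {
  curl -s "localhost:9200/_cluster/health"
}

# Extract the "status" value (green, yellow, red) from health JSON on stdin.
health_status() {
  sed -nE 's/.*"status"[[:space:]]*:[[:space:]]*"([a-z]+)".*/\1/p' | head -n 1
}

# Poll until the cluster is green.
# Arguments: max_tries (default 60), delay in seconds (default 5).
# Returns 0 on green, 1 if the status never reached green.
wait_for_green() {
  local max_tries="${1:-60}" delay="${2:-5}" i=0 status
  while [ "$i" -lt "$max_tries" ]; do
    status="$(fetch_health | health_status)"
    if [ "$status" = "green" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Calling `wait_for_green 120 5` after re-enabling allocation would block for up to ten minutes. The cluster health API also accepts a `wait_for_status` parameter, which may be simpler when a single blocking request is acceptable.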

5. Optimize cluster performance

Address the underlying performance issues causing slow responses:

For high CPU usage:
- Scale horizontally by adding more nodes
- Optimize queries using the Slow Log feature
- Increase the refresh interval (index.refresh_interval) for indices with heavy indexing, so refreshes happen less often

For memory pressure:
- Increase heap size (at most 50% of system RAM, and below the roughly 32GB threshold where the JVM stops using compressed object pointers)
- Reduce field data cache usage by using doc_values
- Implement index lifecycle management to archive old data

For disk I/O:
- Use SSDs instead of HDDs for data nodes
- Reduce merge pressure by adjusting index.merge.scheduler.max_thread_count
- Separate hot and cold data to different node tiers

yaml

# Example: Increase heap size in jvm.options
-Xms16g
-Xmx16g
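The heap sizing rule above can be expressed as a small calculation. In this sketch, `recommended_heap_gb` is a hypothetical helper, and the default cap of 31 GB is an assumption chosen to stay under the 32GB limit mentioned above; the exact safe value depends on the JVM and platform.

```bash
#!/usr/bin/env bash
# Suggest a heap size in whole GB: half of system RAM, capped.
# Arguments: ram_gb, optional cap_gb (default 31, an assumed safe cap).
recommended_heap_gb() {
  local ram_gb="$1" cap_gb="${2:-31}"
  local half=$(( ram_gb / 2 ))
  if [ "$half" -gt "$cap_gb" ]; then
    echo "$cap_gb"
  else
    echo "$half"
  fi
}
```

A 32 GB machine yields 16, matching the `-Xms16g` / `-Xmx16g` example, while a 128 GB machine is capped rather than given a 64 GB heap.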

6. Verify network configuration

Ensure network infrastructure supports reliable inter-node communication:

bash

# Test network latency between nodes
ping -c 10 <other-node-ip>

# Check for packet loss
mtr --report <other-node-ip>

# Verify firewall rules allow transport port (default 9300)
sudo iptables -L -n | grep 9300

Common network fixes:
- Ensure MTU settings match across all nodes and network equipment
- Disable any transparent proxies between Elasticsearch nodes
- Configure network keep-alive intervals shorter than any intermediate timeout
- For cloud deployments, ensure nodes are in the same VPC/region when possible
