This timeout exception occurs when Elasticsearch nodes fail to receive responses within the expected timeout period during inter-node communication, typically caused by network latency, high cluster load, or resource constraints.
The ReceiveTimeoutTransportException indicates that an Elasticsearch node sent a request to another node in the cluster and did not receive a response within the configured timeout window. This is an inter-node communication error, not a query timeout. Elasticsearch uses a transport layer for all internal cluster communication, such as forwarding search requests, replicating documents, and coordinating cluster state.
This error usually points to underlying infrastructure or performance issues rather than configuration problems. The timeout is a symptom: the root cause could be network instability, resource exhaustion on the receiving node, or a cluster overwhelmed with requests.
First, verify which nodes are experiencing timeouts and check overall cluster health:
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
# Check node stats to identify struggling nodes
curl -X GET "localhost:9200/_nodes/stats?pretty"
# Check pending tasks that might be blocking
curl -X GET "localhost:9200/_cluster/pending_tasks?pretty"
Look for nodes with high CPU, memory usage, or pending tasks. Check network connectivity between nodes using ping and ensure all nodes can reach each other.
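The health check can be made repeatable by reducing the one-line summary from the `_cat/health` API to a pass/fail verdict. The following is only a sketch: `check_health` is a hypothetical helper, not part of Elasticsearch, and it assumes the three columns `status`, `unassign`, and `pending_tasks` arrive on stdin in that order.

```shell
# check_health: hypothetical helper. Reads one line "status unassigned pending"
# on stdin, prints a verdict, and returns non-zero when the cluster needs attention.
check_health() {
  read -r status unassigned pending || return 2
  if [ "$status" = "green" ] && [ "$pending" -eq 0 ]; then
    echo "OK: status=$status pending_tasks=$pending"
    return 0
  fi
  echo "ATTENTION: status=$status unassigned_shards=$unassigned pending_tasks=$pending"
  return 1
}

# Usage against a live cluster (assumes it listens on localhost:9200):
# curl -s "localhost:9200/_cat/health?h=status,unassign,pending_tasks" | check_health
```

A non-zero exit code makes the helper easy to call from a cron job or a monitoring script.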
Examine the Elasticsearch logs to understand the frequency and pattern of timeouts:
# On each node, check logs for timeout errors
tail -n 500 /var/log/elasticsearch/elasticsearch.log | grep -i "ReceiveTimeoutTransportException"
# Look for the specific node and request_id that's timing out
# (the request id appears in brackets, so match it with a pattern)
grep -E "request_id \[[0-9]+\] timed out" /var/log/elasticsearch/elasticsearch.log
Note which nodes are consistently timing out, what operations were being performed, and whether the timeouts correlate with high-load periods.
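To see which target nodes time out most often, the matching log lines can be tallied per node. This is a minimal sketch under one assumption: that the exception message has the shape `ReceiveTimeoutTransportException: [node][address][action] request_id [N] timed out after [Mms]`. The helper name `count_timeouts_by_node` is hypothetical.

```shell
# count_timeouts_by_node: hypothetical helper. Reads log lines on stdin and
# prints "<count> <node>" pairs, most frequent node first.
# Assumes the node name is the first bracketed field after the exception name.
count_timeouts_by_node() {
  sed -n 's/.*ReceiveTimeoutTransportException: \[\([^]]*\)\].*/\1/p' \
    | awk '{ count[$0]++ } END { for (node in count) print count[node], node }' \
    | sort -rn
}

# Usage:
# count_timeouts_by_node < /var/log/elasticsearch/elasticsearch.log
```

A node that dominates the list is usually the one to inspect first for load or network problems.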
Use monitoring tools to check resource consumption:
# Check CPU and memory
top -b -n 1 | head -n 20
# Check disk I/O
iostat -x 1 5
# Check JVM garbage collection stats
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"
If you see high CPU (>80%), low available memory, or frequent/long GC pauses (>1s), you have resource constraints that need addressing before adjusting timeouts.
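For a quick per-node view, the `_cat/nodes` API can report heap and CPU percentages in a form that is easy to filter. The sketch below uses a hypothetical helper, `flag_busy_nodes`. The 80% CPU limit comes from the guidance above, while the 85% heap limit is an assumed default that should be tuned to your environment.

```shell
# flag_busy_nodes [HEAP_LIMIT] [CPU_LIMIT]: hypothetical helper. Reads rows of
# "name heap.percent cpu" on stdin and prints nodes at or above either limit.
# Defaults (heap 85, cpu 80) are assumptions, not Elasticsearch recommendations.
flag_busy_nodes() {
  heap_limit=${1:-85}
  cpu_limit=${2:-80}
  awk -v heap_limit="$heap_limit" -v cpu_limit="$cpu_limit" '
    $2 + 0 >= heap_limit + 0 || $3 + 0 >= cpu_limit + 0 {
      printf "%s heap=%s%% cpu=%s%%\n", $1, $2, $3
    }'
}

# Usage against a live cluster (assumes it listens on localhost:9200):
# curl -s "localhost:9200/_cat/nodes?h=name,heap.percent,cpu" | flag_busy_nodes 85 80
```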
If network latency is unavoidable (e.g., cross-region clusters), increase the timeout values in elasticsearch.yml:
# Increase connection timeout (default: 30s)
# (older releases call this setting transport.tcp.connect_timeout)
transport.connect_timeout: 60s
# Configure TCP keep-alive to detect dead connections
transport.tcp.keep_alive: true
# Enable transport ping to keep connections alive
transport.ping_schedule: 5s
Apply to all nodes and perform a rolling restart:
# Restrict shard allocation to primaries before restarting
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster.routing.allocation.enable": "primaries"
}
}'
# Restart the node and wait for it to rejoin the cluster
systemctl restart elasticsearch
# Re-enable allocation once the node has rejoined, then wait for green status before restarting the next node
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}'
Address the underlying performance issues causing slow responses:
For high CPU usage:
- Scale horizontally by adding more nodes
- Optimize queries using the Slow Log feature
- Increase the refresh interval for indices with heavy indexing so that refreshes happen less often
For memory pressure:
- Increase heap size (at most 50% of system RAM, and below 32GB so the JVM can keep using compressed object pointers)
- Reduce field data cache usage by using doc_values
- Implement index lifecycle management to archive old data
For disk I/O:
- Use SSDs instead of HDDs for data nodes
- Reduce merge pressure by adjusting index.merge.scheduler.max_thread_count
- Separate hot and cold data to different node tiers
# Example: Increase heap size in jvm.options
-Xms16g
-Xmx16g
Ensure network infrastructure supports reliable inter-node communication:
# Test network latency between nodes
ping -c 10 <other-node-ip>
# Check for packet loss
mtr --report <other-node-ip>
# Verify firewall rules allow transport port (default 9300)
sudo iptables -L -n | grep 9300
Common network fixes:
- Ensure MTU settings match across all nodes and network equipment
- Disable any transparent proxies between Elasticsearch nodes
- Configure network keep-alive intervals shorter than any intermediate timeout
- For cloud deployments, ensure nodes are in the same VPC/region when possible
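When checking several node pairs, it helps to pull the packet-loss figure out of the ping summary so the runs can be compared or alerted on. The sketch below assumes the iputils-style summary line (`..., 0% packet loss, ...`). `packet_loss_pct` is a hypothetical helper name.

```shell
# packet_loss_pct: hypothetical helper. Reads ping output on stdin and prints
# the packet-loss percentage from the summary line.
# Assumes the summary contains "<number>% packet loss" (iputils ping format).
packet_loss_pct() {
  sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Usage:
# ping -c 10 <other-node-ip> | packet_loss_pct
```

Any value above zero between nodes in the same data center is worth investigating before touching Elasticsearch timeouts.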
Transport Layer Architecture: Elasticsearch maintains a pool of long-lived TCP connections between nodes. Each connection can handle multiple concurrent requests. The transport.connect_timeout only affects initial connection establishment, not ongoing request timeouts. Once connected, requests use internal timeout values that vary by operation type (e.g., cluster state publish has a different timeout than search forwarding).
Thread Pool Exhaustion: If the transport thread pool is saturated, even healthy nodes may appear to time out. Check thread pool stats with GET /_nodes/stats/thread_pool and look for queue rejections. The transport thread pool size is typically min(num_processors * 2, 8) and usually shouldn't be changed.
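Queue rejections are easier to spot in the tabular `_cat/thread_pool` output than in the full node stats JSON. The following sketch filters that output for pools with a non-zero rejection count. `rejected_pools` is a hypothetical helper, and it assumes the columns arrive in the order `node_name,name,active,queue,rejected`.

```shell
# rejected_pools: hypothetical helper. Reads rows of
# "node_name name active queue rejected" on stdin and prints only the
# thread pools that have rejected at least one request.
rejected_pools() {
  awk '$5 + 0 > 0 { printf "%s %s rejected=%s queue=%s\n", $1, $2, $5, $4 }'
}

# Usage against a live cluster (assumes it listens on localhost:9200):
# curl -s "localhost:9200/_cat/thread_pool?h=node_name,name,active,queue,rejected" | rejected_pools
```

The rejected counter is cumulative since node start, so compare two runs a few minutes apart to see whether rejections are still occurring.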
Cross-Cluster Search: When using cross-cluster search, ReceiveTimeoutTransportException can occur between clusters. Configure appropriate timeouts in the remote cluster settings and consider network latency between data centers.
Split Brain Protection: On versions before 7.0, ensure discovery.zen.minimum_master_nodes is set to a quorum of the master-eligible nodes. Versions 7.0 and later manage the voting quorum automatically. Network partitions can cause cascading timeout errors.
Compression: For high-latency networks, enabling transport compression with transport.compress: true reduces the amount of data sent between nodes, and therefore network transfer time, at the cost of extra CPU.