This Elasticsearch error occurs when one or more primary shards in an index are not allocated or active in the cluster, and affected requests fail with an HTTP 503 Service Unavailable response. The cluster is in a red health state: writes to the affected shards fail, and searches may return incomplete results. Common causes include cluster recovery delays, insufficient disk space, and node failures.
UnavailableShardsException with "primary shard is not active" indicates that the cluster cannot find or activate a required primary shard for an index. Elasticsearch returns HTTP 503 (Service Unavailable) and rejects operations that depend on that shard. Unlike replica shards (which can be rebuilt from the primary), a missing primary shard means the data it holds is unavailable. This error typically occurs after cluster restarts, node failures, or when shards fail to reallocate due to resource constraints. Cluster health becomes RED when any primary shard is unassigned, meaning Elasticsearch cannot guarantee full index availability. This is a critical issue that requires immediate investigation.
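Note that write requests do not fail immediately: they wait for the primary shard to become active, up to the request timeout (one minute by default), before returning this exception. The wait can be adjusted per request; a minimal sketch, where my-index and the document body are placeholders:
PUT my-index/_doc/1?timeout=30s
{
  "message": "example document"
}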
First, verify the cluster health and identify which indices have unassigned shards:
GET _cluster/health
This returns the overall cluster state. If status is RED, you have unassigned primary shards. To see detailed shard allocation info:
GET _cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason
Look for rows with state=UNASSIGNED and prirep=p (primary). The unassigned.reason column explains why the shard is unassigned.
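To narrow the problem down to specific indices first, the cluster health API can also report status per index; a quick sketch:
GET _cluster/health?level=indices
Indices reported as red are the ones with unassigned primary shards.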
Verify that all nodes have adequate disk space available:
GET _cat/nodes?v&h=name,disk.used,disk.avail,disk.used_percent
If any node shows disk usage at or above 85%, Elasticsearch will not allocate new shards to that node (the default low watermark threshold). Free up disk space by:
- Deleting old indices
- Increasing disk capacity
- Archiving data to external storage
Once disk usage drops below 85%, shards should begin reallocating automatically.
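The watermark thresholds themselves are dynamic cluster settings. As a stopgap while you free up space, they can be raised; the values below are illustrative, not recommendations:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}
Set these keys back to null once disk usage is under control; running close to full disks risks hitting the flood-stage watermark, which makes indices read-only.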
After a node failure or restart, Elasticsearch delays reallocating the shards that were on the departed node (index.unassigned.node_left.delayed_timeout, one minute by default) to avoid unnecessary rebalancing if the node comes back quickly. Monitor progress:
GET _cat/recovery?v&h=index,shard,stage,files_recovered,files_total
If shards show a stage other than done (for example index or translog), recovery is in progress; let it complete. This can take minutes to hours for large indices. Do not interrupt the process. Check recovery progress periodically until all shards reach the done stage.
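If nodes routinely take longer than the default delay to return (for example during planned restarts), the delay can be lengthened so Elasticsearch does not start rebuilding shards prematurely. A minimal sketch that applies the setting to all indices:
PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}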
If shards remain unassigned after 5-10 minutes despite adequate disk space, manually trigger reroute to retry allocation:
POST _cluster/reroute?retry_failed=true
This retries allocation of shards that have exceeded the maximum number of allocation failures (index.allocation.max_retries, 5 by default). To see which deciders blocked allocation, use the allocation explain API described under Advanced Diagnostics below; common blocking deciders include:
- disk_threshold: the node is above a disk watermark
- awareness: allocation awareness rules (zones/racks) prevent placement on the node
- same_shard: a copy of the same shard is already on the node or host
After reroute, monitor cluster health again:
GET _cluster/health
If a primary shard is permanently lost and all replicas are unavailable, you must restore from a snapshot. First, list the registered snapshot repositories:
GET _snapshot
To list the available snapshots in a repository:
GET _snapshot/my-repo/_all
To restore a specific index:
POST _snapshot/my-repo/my-snapshot/_restore
{
"indices": "my-index"
}
This creates a new index from the snapshot. A restore fails if an open index with the same name already exists, so if the original broken index is blocking the restore, delete it first:
DELETE my-index
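Alternatively, if you want to keep the broken index around for inspection, the restore request can write the data under a new name using rename_pattern and rename_replacement; a sketch reusing the placeholder repository and snapshot names from above:
POST _snapshot/my-repo/my-snapshot/_restore
{
  "indices": "my-index",
  "rename_pattern": "my-index",
  "rename_replacement": "my-index-restored"
}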
Verify that all nodes in the cluster are running the same Elasticsearch version:
GET _cat/nodes?v&h=name,version
If versions differ, upgrade all nodes to the same version. Mixed-version clusters are supported only temporarily during a rolling upgrade, and shard copies cannot be allocated to nodes running an older version than the node holding the primary.
Perform a rolling upgrade (see the allocation-settings sketch after these steps):
1. Stop one node
2. Upgrade Elasticsearch and plugins
3. Restart the node
4. Wait for cluster to stabilize (shards rebalance)
5. Repeat for each remaining node
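During a rolling upgrade it is common practice to pause replica allocation while each node is down, so the cluster does not start copying shards around unnecessarily. A minimal sketch using the cluster settings API:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
After the node rejoins, set cluster.routing.allocation.enable back to null (the default, all) and wait for the cluster to stabilize before moving on to the next node.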
Advanced Diagnostics:
Use the Cluster Allocation Explain API to get detailed reasons why a specific shard cannot be allocated:
GET _cluster/allocation/explain
{
"index": "my-index",
"shard": 0,
"primary": true
}
This shows all allocation deciders and which ones rejected the shard.
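If you are not sure which shard to ask about, the same API can be called without a body; Elasticsearch then picks an unassigned shard and explains it (it returns an error if no shards are unassigned):
GET _cluster/allocation/explain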
For persistent allocation issues, check the elected master node's logs: allocation decider decisions are logged there in detail.
If using managed Elasticsearch (AWS, Elastic Cloud), check service status pages and contact support if node health is affected.
For very large indices, recovery can take hours. Monitor progress using the _recovery endpoint, and do not trigger unnecessary reroutes during active recovery, as they can extend recovery time.
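For a per-index view limited to ongoing recoveries, the index recovery API accepts an active_only flag; a quick sketch, with my-index as a placeholder:
GET my-index/_recovery?active_only=true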
Consider setting index.number_of_replicas=0 as a temporary workaround on critical single-node clusters (where replicas can never be assigned anyway), but this disables fault tolerance and should only be used in testing or when the index can be deleted and recreated.
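A minimal sketch of that settings change (again, my-index is a placeholder; apply it only when you accept the loss of redundancy):
PUT my-index/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}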