This error occurs when an etcd node attempts to join a cluster but has a different cluster ID stored locally, typically due to data directory conflicts or incorrect initialization. It prevents cluster members from communicating.
This error indicates that an etcd member is trying to join a cluster whose cluster ID does not match the one stored in its local data directory. Each etcd cluster generates a unique cluster ID during its initial bootstrap, and when a node encounters a different cluster ID, etcd rejects the connection to protect data integrity.
The root cause is that etcd records the cluster ID in the WAL (write-ahead log) and snapshot files inside the data directory. Once that ID is written, etcd considers itself part of that specific cluster. If you start the node with --initial-cluster-state=new while it already contains data, or try to join an existing cluster without first cleaning the data directory, the stored ID conflicts with the target cluster's ID.
In Kubernetes deployments that use persistent volumes for etcd data, this commonly occurs when a control plane node is recreated but the PVC retains the old etcd state.
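For context, the state that carries the cluster ID lives inside the etcd data directory. A quick way to see whether a node already considers itself bootstrapped is to look for WAL and snapshot files there (a minimal sketch, assuming the kubeadm default data directory of /var/lib/etcd):
# If these directories contain files, the node already holds a cluster ID
# and will refuse to adopt a different one on startup.
ls /var/lib/etcd/member/wal   # write-ahead log segments
ls /var/lib/etcd/member/snap  # snapshot files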
Access one of the working control plane nodes and check the etcd cluster status:
# Get into the etcd container (if running in Kubernetes)
kubectl exec -it -n kube-system etcd-controlplane-node -- sh
# List all cluster members
etcdctl member list \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Check cluster health
etcdctl endpoint health \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
Prevent any restart attempts while you clean up the data:
# SSH into the affected control plane node
ssh user@affected-node
# If etcd runs as a static pod (kubeadm):
sudo systemctl stop kubelet
# If etcd runs as a system service:
sudo systemctl stop etcd
# Wait for processes to fully terminate
sleep 5
Create a backup before deletion, then remove the data directory:
# Create a backup with timestamp
sudo cp -r /var/lib/etcd /var/lib/etcd-backup-$(date +%Y%m%d-%H%M%S)
# Remove the data directory completely
sudo rm -rf /var/lib/etcd
# Confirm it's deleted
ls -la /var/lib/etcd 2>&1 # Should show 'No such file or directory'
From a healthy control plane node, remove the problematic member:
# Get the member ID of the problematic node
etcdctl member list \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Note the member ID (e.g., 'a8266ecf5fd92988')
# Remove the member
etcdctl member remove <member-id> \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
Add the cleaned node back to the cluster as a fresh member:
etcdctl member add <node-name> \
--peer-urls=https://<NODE_IP>:2380 \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
Save the output (ETCD_INITIAL_CLUSTER and ETCD_INITIAL_CLUSTER_STATE values) for the next step.
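The member add command prints the settings the new member should start with. The output typically looks something like the following (the name, IDs, and IPs here are placeholders, not values from your cluster):
Member 2be1eb8f84b7f63e added to cluster ef37ad9dc622a7c4

ETCD_NAME="node3"
ETCD_INITIAL_CLUSTER="node1=https://10.0.0.1:2380,node2=https://10.0.0.2:2380,node3=https://10.0.0.3:2380"
ETCD_INITIAL_CLUSTER_ADVERTISE_PEER_URLS="https://10.0.0.3:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"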
Verify the static pod manifest has correct parameters and restart:
# On the affected node, verify the static pod manifest
sudo cat /etc/kubernetes/manifests/etcd.yaml | grep -A 5 'initial-cluster-state'
# It should show: --initial-cluster-state=existing
# (NOT 'new', since this node is joining an existing cluster)
# Start kubelet to restart the static pods
sudo systemctl start kubelet
# Wait for the pod to start
sleep 20
# Check etcd pod status
kubectl get pods -n kube-system | grep etcd
# View logs to confirm successful startup
kubectl logs -n kube-system etcd-<node-name>
Each etcd cluster generates a unique cluster ID during its first startup, derived from the initial cluster configuration and stored in the write-ahead log directory. This ID ensures that members only coordinate with peers from the same cluster and prevents accidental cross-cluster interaction.
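If you want to compare the ID each side reports, one option (a sketch, assuming the same certificate paths used above and an etcdctl version that supports JSON output) is to read the cluster_id field from the status response on a healthy member and on the failing node:
# The 'cluster_id' field in the response header is the cluster this endpoint
# believes it belongs to; the value should match across all members.
etcdctl endpoint status -w json \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key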
When you delete /var/lib/etcd and restart etcd, it performs a fresh bootstrap using the --initial-cluster-state parameter. Using 'new' means the node expects to coordinate cluster formation with other new members. Using 'existing' means the node expects the cluster already exists.
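In flag form the difference is a single setting. The sketch below uses placeholder names and IPs purely for illustration:
# Joining an existing cluster (after 'etcdctl member add'):
etcd --name node3 \
--data-dir /var/lib/etcd \
--initial-cluster "node1=https://10.0.0.1:2380,node2=https://10.0.0.2:2380,node3=https://10.0.0.3:2380" \
--initial-cluster-state existing
# Bootstrapping a brand-new cluster would use --initial-cluster-state new instead.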
When kubeadm initializes a cluster, it creates static pod manifests at /etc/kubernetes/manifests/etcd.yaml. If kubeadm init fails and you rerun it without removing the data directory, kubeadm may attempt to reinitialize with a new cluster ID, causing conflicts.
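If this happened after a failed kubeadm init, a safer retry path (a sketch; confirm how your kubeadm version handles cleanup) is to reset and verify the data directory is actually empty before re-initializing:
# Undo the partial initialization; on a stacked-etcd control plane this
# normally clears the local etcd state as well, but verify it by hand.
sudo kubeadm reset -f
ls /var/lib/etcd 2>&1       # expect 'No such file or directory' or an empty directory
sudo rm -rf /var/lib/etcd   # remove any leftovers before re-running kubeadm init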
In a 3+ node cluster, you can recover one node at a time while maintaining quorum. For single-node clusters, you must clean all data and reset the entire cluster.
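The quorum math behind that guidance is a simple majority:
# quorum = floor(n/2) + 1
# 1 member  -> quorum 1 -> no failure tolerance; lost data means rebuilding the cluster
# 3 members -> quorum 2 -> tolerates 1 member down (the one being recovered)
# 5 members -> quorum 3 -> tolerates 2 members down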
Kubernetes StatefulSets using PersistentVolumeClaims do not automatically delete PVCs when pods are deleted. Always verify or manually delete old PVCs when intentionally creating a new cluster.
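When you do intend to start fresh, check for and remove the stale claim explicitly (the namespace and claim name below are placeholders for whatever your etcd StatefulSet uses):
# List claims left behind by the old StatefulSet pods
kubectl get pvc -n <namespace>
# Delete the stale claim so the recreated pod gets a fresh volume
kubectl delete pvc <etcd-data-pvc-name> -n <namespace>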