This error occurs when an operation references a non-existent etcd cluster member, typically after a node removal, incomplete cluster scaling, or misconfigured member IDs. It requires careful member list reconciliation to resolve.
The 'etcdserver: member not found' error indicates that an operation (such as promoting, removing, or updating a member) is being performed on a member ID that doesn't exist in the etcd cluster's membership registry. This typically happens when a member was removed from the cluster but references to it still exist in configuration. The error reflects a mismatch between the intended cluster topology and the actual cluster state: the etcd cluster maintains a list of active members, and any operation referencing a non-existent member ID will fail. This is a critical issue because unresolved member mismatches can lead to partial quorum loss, cluster instability, and an inability to schedule new workloads on Kubernetes.
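As a quick illustration, any membership operation that references an ID missing from the registry fails with this exact error; the member ID below is made up:

kubectl exec etcd-<master-node-name> -n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--endpoints https://127.0.0.1:2379 \
member remove deadbeefdeadbeef
# Error: etcdserver: member not found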
Connect to a healthy etcd member and list all current members:
kubectl exec etcd-<master-node-name> -n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--endpoints https://127.0.0.1:2379 \
member list
Record the member IDs and their peer URLs. Look for members with status 'unstarted' or invalid names.
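For reference, healthy output looks roughly like this (IDs, names, and addresses are illustrative):

8e9e05c52164694d, started, node-1, https://10.0.0.1:2380, https://10.0.0.1:2379, false
91bc3c398fb3c146, started, node-2, https://10.0.0.2:2380, https://10.0.0.2:2379, false
fd422379fda50e48, unstarted, , https://10.0.0.9:2380, , false

An 'unstarted' entry with an empty name usually corresponds to a member that was added but never joined, or to a node that has since been deleted.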
The kubeadm ConfigMap stores etcd cluster endpoints. Export and inspect it:
kubectl get configmap kubeadm-config -n kube-system -o yaml > kubeadm-config.yaml
Edit the file and locate the etcd section. Remove any references to deleted or invalid nodes in the initialCluster entries.
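The exact layout varies by kubeadm version, so treat the following as an illustrative sketch of a stale entry (names and IPs invented):

etcd:
  local:
    extraArgs:
      initial-cluster: "node-1=https://10.0.0.1:2380,node-2=https://10.0.0.2:2380,old-node=https://10.0.0.9:2380"

Delete the old-node=... entry so the list matches the actual members found in step 1.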
Apply the updated configuration:
kubectl apply -f kubeadm-config.yaml
Use the member IDs from step 1 to remove members that are 'unstarted' or reference deleted nodes:
kubectl exec etcd-<master-node-name> -n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--endpoints https://127.0.0.1:2379 \
member remove <member-id>
Only remove members if you have a quorum (majority) of healthy members remaining.
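On success, etcdctl prints a confirmation like the following (IDs are illustrative); re-run 'member list' afterwards to verify the stale entry is gone:

Member fd422379fda50e48 removed from cluster ef37ad9dc622a7c4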
If the node needs to rejoin the cluster, use member add with correct peer URLs:
kubectl exec etcd-<healthy-master-node> -n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--endpoints https://127.0.0.1:2379 \
member add <new-member-name> \
--peer-urls=https://<NODE_IP>:2380
Save the output values (ETCD_NAME, ETCD_INITIAL_CLUSTER, ETCD_INITIAL_CLUSTER_STATE=existing) for the next step.
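The command prints the environment values the new member needs; the output resembles the following (IDs, names, and addresses are illustrative):

Member ced16b91a86b4c3c added to cluster ef37ad9dc622a7c4

ETCD_NAME="new-member-name"
ETCD_INITIAL_CLUSTER="node-1=https://10.0.0.1:2380,node-2=https://10.0.0.2:2380,new-member-name=https://10.0.0.3:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"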
Before restarting etcd on the node being re-added, clear its existing state:
# On the node being re-added, stop etcd first
sudo systemctl stop etcd
# Or if running as a static pod:
kubectl delete pod etcd-<node-name> -n kube-system
# Remove the stale data directory (BE CAREFUL - this is destructive)
sudo rm -rf /var/lib/etcd/member
# Verify it's gone
ls -la /var/lib/etcd/
Use the values from step 4 to restart etcd on the rejoining node:
# For static pod (most common in Kubernetes):
# Edit /etc/kubernetes/manifests/etcd.yaml and ensure:
# - ETCD_INITIAL_CLUSTER contains the full member list
# - ETCD_INITIAL_CLUSTER_STATE=existing (not new)
# - The node name matches what was provided in 'member add'
# Start kubelet to restart the static pods
sudo systemctl start kubelet
# Monitor startup
sudo journalctl -u etcd -f
The node should sync its state from the cluster leader within seconds.
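Once the etcd pod is running, a quick health check (a sketch reusing the same certificate paths as above) confirms the member has rejoined:

kubectl exec etcd-<master-node-name> -n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--endpoints https://127.0.0.1:2379 \
endpoint health
# Expected: https://127.0.0.1:2379 is healthy: successfully committed proposal: took = ...
# Then confirm the member shows as 'started' in 'member list'.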
etcd uses a membership registry to track cluster members. Each member has a unique 64-bit ID (displayed as hexadecimal) and a name. When a node is removed, its member entry must be explicitly deleted via 'etcdctl member remove' to prevent 'member not found' errors.
Known Bug in etcd 3.5.0: This version introduced a bug that can resurrect 'zombie members' from stale cluster state. Upgrade to etcd 3.5.2 or later to resolve this.
Learner Members: When adding a member, it can be added as a learner (a non-voting member) until it catches up with the cluster's data. Use 'etcdctl member promote <member-id>' to convert it to a voting member, as sketched below.
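A sketch of the learner flow, reusing the certificate paths from the steps above (name, IP, and member ID are placeholders):

# Add the member as a learner (non-voting) first:
kubectl exec etcd-<healthy-master-node> -n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--endpoints https://127.0.0.1:2379 \
member add <new-member-name> --peer-urls=https://<NODE_IP>:2380 --learner

# Once it has caught up, promote it to a voting member:
kubectl exec etcd-<healthy-master-node> -n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--endpoints https://127.0.0.1:2379 \
member promote <member-id>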
Quorum Loss Prevention: Only remove members when you have a confirmed quorum (majority) of healthy members. Removing a member when quorum is already lost will cause permanent cluster failure.
Disaster Recovery: If more than half the cluster has failed (quorum loss), individual member recovery won't work. You must restore from an etcd snapshot backup using 'etcdctl snapshot restore' and rebuild with '--force-new-cluster'; a sketch follows below.
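A sketch of the restore flow on a single node, assuming a snapshot file at /backup/etcd-snapshot.db (the path, name, and URLs are placeholders):

# Restore the snapshot into a fresh data directory (it must not already exist):
sudo etcdctl snapshot restore /backup/etcd-snapshot.db \
--name <node-name> \
--data-dir /var/lib/etcd-restored \
--initial-cluster <node-name>=https://<NODE_IP>:2380 \
--initial-advertise-peer-urls https://<NODE_IP>:2380

# Then start etcd against the restored data directory; for a static pod, update
# /etc/kubernetes/manifests/etcd.yaml accordingly. Alternatively, etcd can be
# started with --force-new-cluster against an existing data directory to drop
# the old membership and rebuild as a one-member cluster.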