Alertmanager notification failures occur when alerts cannot be delivered to configured receivers like email, Slack, or webhooks. Common causes include SMTP misconfiguration, network connectivity issues, invalid receiver endpoints, TLS certificate errors, and timeout problems.
When Alertmanager logs "notification failed", it means an alert was triggered and routed to a receiver, but the delivery to that receiver (email, Slack, webhook, PagerDuty, etc.) encountered an error. This doesn't mean the alert itself failed—Alertmanager received the alert from Prometheus and determined it should be sent somewhere, but the actual delivery step failed. Notification failures are tracked per receiver integration. If multiple Alertmanager instances are running, the failure of one instance doesn't impact delivery if another instance can send it. However, if all instances fail, the alert won't reach anyone, which is why monitoring Alertmanager's own notification metrics is critical.
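Because failures are counted per integration, a per-integration failure ratio makes it obvious at a glance which receiver type is affected. A PromQL sketch, assuming Prometheus scrapes Alertmanager's own /metrics endpoint (kube-prometheus-stack sets this up by default):
sum by (integration) (rate(alertmanager_notifications_failed_total[1h])) / sum by (integration) (rate(alertmanager_notifications_total[1h]))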
View the Alertmanager pod logs to see why notification delivery failed:
kubectl logs -n monitoring alertmanager-0 --tail=100
Look for messages like:
- "x509: certificate signed by unknown authority" → TLS issue
- "connection refused" → Receiver unreachable
- "context deadline exceeded" → Timeout
- "authentication failed" → SMTP auth issue
For more verbose logging, check if debug mode is enabled:
kubectl get deployment,statefulset -n monitoring -o yaml | grep -i "log.level"
If debug is not enabled, restart Alertmanager with --log.level=debug added to the container args.
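With kube-prometheus-stack, Alertmanager is usually managed by the Prometheus Operator, so the log level is set on the Alertmanager custom resource rather than the pod spec directly. A sketch, assuming the CR is named main (check kubectl get alertmanager -n monitoring for the actual name):
kubectl patch alertmanager main -n monitoring --type merge -p '{"spec":{"logLevel":"debug"}}'
The operator then rolls the pods with the new flag for you.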
Test connectivity from Alertmanager pod to receiver:
kubectl exec -it alertmanager-0 -n monitoring -- sh
# For email (SMTP)
telnet smtp.gmail.com 587
# For webhook
curl -v https://your-webhook-endpoint.com/alerts
# For Slack
curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
-H "Content-Type: application/json" \
-d '{"text":"Test message"}'If connection refused or timeout, the receiver is down or unreachable. Check receiver service status, firewalls, security groups, and DNS resolution.
If using email receiver, check your Alertmanager config:
kubectl get secret alertmanager-config -n monitoring -o jsonpath="{.data.alertmanager\.yml}" | base64 -d | grep -A 10 "global:"
Verify:
- smtp_smarthost is correct (e.g., "smtp.gmail.com:587")
- smtp_auth_username and smtp_auth_password are set
- smtp_require_tls is true (for most providers)
- smtp_from address is valid
Example correct config:
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_auth_username: [email protected]
  smtp_auth_password: your-app-password
  smtp_require_tls: true
  smtp_from: [email protected]
Note: Gmail requires app-specific passwords, not your regular password. Also note the global field is smtp_from, not from.
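After editing, it helps to validate the file before putting it back into the cluster. A sketch, assuming the secret and key names used above and that amtool (shipped with Alertmanager releases) is available locally:
kubectl get secret alertmanager-config -n monitoring -o jsonpath="{.data.alertmanager\.yml}" | base64 -d > alertmanager.yml
amtool check-config alertmanager.yml
kubectl create secret generic alertmanager-config -n monitoring --from-file=alertmanager.yml --dry-run=client -o yaml | kubectl apply -f -
amtool check-config catches syntax errors and unknown fields before Alertmanager ever sees the config.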
For webhook receivers, validate the configuration:
kubectl get alertmanagerconfig -n monitoring -o yaml
# or
kubectl get secret alertmanager-config -n monitoring -o jsonpath="{.data.alertmanager\.yml}" | base64 -d
Look for webhook receivers:
receivers:
- name: webhook-receiver
  webhook_configs:
  - url: https://your-webhook-endpoint.com/alerts
    send_resolved: true
Test the webhook with a sample alert payload:
curl -X POST https://your-webhook-endpoint.com/alerts \
  -H "Content-Type: application/json" \
  -d '{"version":"4","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"TestAlert"}}]}'
(Alertmanager sends a JSON object with a top-level "alerts" array, not a bare array, so this payload is closer to what the receiver will actually get.)
If the endpoint returns 4xx or 5xx, fix the endpoint configuration or the receiving service.
If logs show "x509: certificate signed by unknown authority", the receiver uses a self-signed or untrusted certificate:
Option 1 (Recommended): Fix the certificate
- Obtain proper certificate from trusted CA
- Update receiver service with valid certificate
Option 2: Skip verification (use with caution)
Add to Alertmanager config:
receivers:
- name: webhook-receiver
  webhook_configs:
  - url: https://self-signed-endpoint.com
    http_config:
      tls_config:
        insecure_skip_verify: true
(For webhook receivers, tls_config lives under http_config, not directly under webhook_configs.)
Update the secret:
kubectl edit secret alertmanager-config -n monitoring
(Secret data is base64-encoded, so decoding, editing, and re-applying the file as shown earlier is usually easier than editing it in place.)
Option 3: Add certificate to trust store
Mount the CA certificate into the Alertmanager pod:
kubectl create configmap ca-cert --from-file=ca.crt -n monitoring
Then reference it in the pod spec:
volumeMounts:
- name: ca-cert
  mountPath: /etc/ssl/certs/custom-ca.crt
  subPath: ca.crt
volumes:           # pod-level, alongside containers
- name: ca-cert
  configMap:
    name: ca-cert
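Instead of relying on the system trust store, the mounted CA can also be referenced explicitly in the receiver's HTTP client settings. A sketch using the mount path from above; receiver and endpoint names follow the earlier examples:
receivers:
- name: webhook-receiver
  webhook_configs:
  - url: https://self-signed-endpoint.com
    http_config:
      tls_config:
        ca_file: /etc/ssl/certs/custom-ca.crt
This keeps verification enabled while trusting only your internal CA.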
If alerts occasionally fail to send, the receiver might be slow:
kubectl logs -n monitoring alertmanager-0 | grep "context deadline exceeded"
Increase the timeout in the Alertmanager config:
receivers:
- name: webhook-receiver
  webhook_configs:
  - url: https://your-webhook-endpoint.com/alerts
    send_resolved: true
    # Note: webhook_configs has no headers field; Alertmanager always posts
    # application/json, so a custom Content-Type is neither needed nor supported.
    # Recent Alertmanager releases (0.27+) accept a per-webhook timeout; older
    # versions have no per-receiver timeout setting.
    timeout: 30s
Apply the change by patching the AlertmanagerConfig or editing the secret directly. Alternatively, optimize the receiving service to respond faster, and check the receiver's own logs for slow processing.
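To confirm that slowness rather than hard failures is the problem, look at Alertmanager's notification latency histogram. A PromQL sketch; the metric comes from Alertmanager's /metrics endpoint:
histogram_quantile(0.99, sum by (le, integration) (rate(alertmanager_notification_latency_seconds_bucket[5m])))
If the p99 latency approaches your timeout, the receiver is the bottleneck.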
Create a test alert rule in Prometheus to verify delivery:
groups:
- name: test-alerts
  rules:
  - alert: TestAlert
    expr: up == 1   # fires as long as at least one target is up
    for: 1m
    annotations:
      summary: "Test notification from Prometheus"
      description: "This is a test alert to verify Alertmanager notification delivery"
Apply the rule:
kubectl apply -f test-alert-rule.yaml
Wait ~1 minute, then check:
- Prometheus Alerts page shows the alert
- Alertmanager UI shows the alert
- Notification is delivered to receiver
If test alert arrives, your config is working. If not, keep debugging using earlier steps.
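Note that the YAML above is in plain Prometheus rule-file format. If Prometheus is managed by the Prometheus Operator (as in kube-prometheus-stack), kubectl apply expects it wrapped in a PrometheusRule resource instead. A sketch; the release label is an assumption and must match your Prometheus ruleSelector:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumption: verify against your Prometheus ruleSelector
spec:
  groups:
  - name: test-alerts
    rules:
    - alert: TestAlert
      expr: up == 1
      for: 1m
      annotations:
        summary: "Test notification from Prometheus"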
Monitor notification failures using Prometheus metrics:
kubectl exec -it prometheus-0 -n monitoring -- promtool query instant http://localhost:9090 'alertmanager_notifications_failed_total'
Or query in the Prometheus UI:
rate(alertmanager_notifications_failed_total[5m])
Set up an alert for notification failures:
- alert: AlertmanagerNotificationFailures
  expr: rate(alertmanager_notifications_failed_total[5m]) > 0
  for: 5m
  annotations:
    summary: "Alertmanager notifications are failing"
    description: "Check Alertmanager logs and receiver configuration"
This creates a meta-alert that fires when Alertmanager itself can't send notifications.
In production Kubernetes clusters running kube-prometheus-stack, notification failures often stem from NetworkPolicy restrictions blocking outbound traffic from the monitoring namespace to external services. Verify egress rules allow SMTP ports (25, 465, 587) and HTTPS (443) to receiver endpoints. For multi-instance Alertmanager deployments, ensure all replicas can reach the receivers; if one instance fails consistently while others succeed, check pod-to-receiver networking specifically for that pod. For Slack/Teams/PagerDuty integrations, validate that API tokens and endpoint URLs haven't expired or changed. Some cloud providers (EKS, GKE, AKS) may require additional configuration for egress via NAT gateways or outbound proxy rules. In air-gapped environments, webhook endpoints must be internal or reachable through a VPN or proxy. If the receiver rate-limits requests, make sure it can handle the alert volume: grouping more aggressively (broader group_by, longer group_interval) sends fewer, larger notifications. When troubleshooting intermittent failures, correlate them with receiver resource metrics (CPU, memory) and network latency. For Gmail SMTP, use app-specific passwords, not regular passwords; Microsoft 365 may require additional setup, such as enabling SMTP AUTH for the sending mailbox.
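If NetworkPolicies are in play, an egress rule along these lines is typically needed for the Alertmanager pods. A sketch; the pod selector label is an assumption, so verify it with kubectl get pods -n monitoring --show-labels:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: alertmanager-egress
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: alertmanager   # assumption: match your actual pod labels
  policyTypes:
  - Egress
  egress:
  - ports:
    - port: 25     # SMTP
      protocol: TCP
    - port: 465
      protocol: TCP
    - port: 587
      protocol: TCP
    - port: 443    # HTTPS webhooks / Slack / PagerDuty
      protocol: TCP
  - ports:         # allow DNS so receiver hostnames resolve
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP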