This error occurs when a Docker container's health check command fails consecutively for the configured number of retries (default: 3). The container is then marked as 'unhealthy'. Common fixes include increasing the retry count, extending timeout and start_period values, and debugging the underlying health check command.
The "Health check failed after maximum retries" error in Docker indicates that the container's HEALTHCHECK instruction has failed repeatedly, exhausting all configured retry attempts. By default, Docker allows 3 consecutive failures before marking a container as unhealthy.
Docker's health check system works as follows:
1. Docker runs the health check command at each `interval` (default: 30s); during the `start_period` (default: 0s), checks still run but failures do not count toward the retry limit
2. Each check must complete within the `timeout` (default: 30s) or it counts as a failure
3. After `retries` (default: 3) consecutive failures, the container is marked "unhealthy"
4. A single successful check resets the failure counter
When this error occurs, it means the application inside the container has failed to respond correctly to the health probe command multiple times in a row. The container keeps running but is flagged as unhealthy, which affects orchestration decisions and service dependencies.
Common scenarios that trigger this error:
- The application starts slowly and isn't ready before health check failures begin counting
- The health check command itself is misconfigured or relies on tools not present in the image
- Network or resource issues prevent the application from responding
- The application has crashed or become unresponsive
- Timeout values are too aggressive for the health check to complete
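The counting rules can be sketched as a small simulation. This is a hypothetical helper for illustration (`simulate_health` is not a Docker API): it replays a sequence of check results through the failing-streak logic described above.

```python
# Hypothetical simulator of Docker's failing-streak rules (illustrative only).
# results: list of booleans, True = check passed, False = check failed.
def simulate_health(results, retries=3, start_period_checks=0):
    status = "starting"
    streak = 0
    for i, passed in enumerate(results):
        in_start_period = status == "starting" and i < start_period_checks
        if passed:
            streak = 0           # a single success resets the failure counter
            status = "healthy"   # a success during start_period also marks healthy
        elif not in_start_period:
            streak += 1          # failures during start_period are ignored
            if streak >= retries:
                status = "unhealthy"
    return status

print(simulate_health([False, False, False]))                         # unhealthy
print(simulate_health([False, True, False, False]))                   # healthy (streak was reset)
print(simulate_health([False, False, False], start_period_checks=3))  # starting
```

Note how three consecutive failures trip the default `retries=3` limit, while the same failures inside the start period are ignored entirely.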
First, examine exactly why the health checks are failing by inspecting the container's health status:
# View current container status
docker ps -a
# Get detailed health check information
docker inspect --format='{{json .State.Health}}' <container_name> | jq .
# View the last several health check results with timestamps
docker inspect <container_name> | jq '.[0].State.Health.Log'
Example output showing failed retries:
{
  "Status": "unhealthy",
  "FailingStreak": 3,
  "Log": [
    {
      "Start": "2024-01-15T10:00:00.000Z",
      "End": "2024-01-15T10:00:30.000Z",
      "ExitCode": 1,
      "Output": "curl: (7) Failed to connect to localhost port 8080"
    },
    {
      "Start": "2024-01-15T10:00:30.000Z",
      "End": "2024-01-15T10:01:00.000Z",
      "ExitCode": 1,
      "Output": "curl: (7) Failed to connect to localhost port 8080"
    },
    {
      "Start": "2024-01-15T10:01:00.000Z",
      "End": "2024-01-15T10:01:30.000Z",
      "ExitCode": 1,
      "Output": "curl: (7) Failed to connect to localhost port 8080"
    }
  ]
}
This shows the health check failed 3 times consecutively with exit code 1, indicating connection issues.
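When triaging many containers, the same inspection can be scripted. A minimal sketch, using Python's standard json module on the output of `docker inspect --format='{{json .State.Health}}'` (the sample string below mirrors the example output above):

```python
import json

# Sample output of: docker inspect --format='{{json .State.Health}}' <container>
raw = '''{
  "Status": "unhealthy",
  "FailingStreak": 3,
  "Log": [
    {"Start": "2024-01-15T10:00:00.000Z", "End": "2024-01-15T10:00:30.000Z",
     "ExitCode": 1, "Output": "curl: (7) Failed to connect to localhost port 8080"}
  ]
}'''

health = json.loads(raw)
print(f"status={health['Status']} failing_streak={health['FailingStreak']}")
for entry in health["Log"]:
    # The Output field usually contains the probe's stderr/stdout, e.g. the curl error
    print(f"  exit={entry['ExitCode']}: {entry['Output']}")
```

In a real script you would feed `raw` from `subprocess.run(["docker", "inspect", ...])` instead of a literal string.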
If your application occasionally fails health checks due to temporary conditions (heavy load, garbage collection, etc.), increase the retry count:
In docker-compose.yml:
services:
  api:
    image: my-api:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5  # Increased from default 3
      start_period: 60s
In Dockerfile:
HEALTHCHECK --interval=30s --timeout=10s --retries=5 --start-period=60s \
  CMD curl -f http://localhost:8080/health || exit 1
Guidelines for retry values:
- 3 retries (default): Good for stable applications
- 5 retries: Better for applications with occasional hiccups
- 10+ retries: Use sparingly, only for known-flaky services
Note: More retries means longer time before a truly unhealthy container is flagged.
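That tradeoff is simple arithmetic: the time from the first failure to the "unhealthy" flag is roughly interval times retries (plus up to one timeout per attempt). A quick illustration, using a hypothetical helper:

```python
# Rough rule of thumb: time to flag a dead container is about interval * retries.
# detection_window_s is an illustrative helper, not a Docker API.
def detection_window_s(interval_s, retries):
    return interval_s * retries

print(detection_window_s(30, 3))   # defaults: ~90s before "unhealthy"
print(detection_window_s(30, 5))   # retries=5: ~150s
print(detection_window_s(30, 10))  # retries=10: ~300s (5 minutes)
```

Shortening the interval is usually a better lever than reducing retries if you need faster detection without more false positives.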
The start_period option gives your application time to initialize before health check failures count toward the retry limit. This is crucial for applications that need significant startup time.
Example for a Java/Spring Boot application:
services:
  backend:
    image: spring-app:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 120s  # 2 minutes for JVM warmup
Example for a database service:
services:
  postgres:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
How start_period works:
- During start_period, failed health checks do NOT count toward retries
- If a health check succeeds during start_period, the container is marked healthy
- After start_period expires, normal retry counting begins
Recommended start_period values:
- Node.js/Go applications: 10-30s
- Python/Ruby applications: 20-60s
- Java applications: 60-180s
- Database services: 30-60s
If your health check command takes longer than the timeout value, each check is marked as failed. Increase the timeout if your health endpoint performs complex checks:
services:
  api:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health/deep"]
      interval: 30s
      timeout: 30s  # Increased for deep health checks
      retries: 3
      start_period: 60s
Signs that timeout is too short:
- Health check logs show "context deadline exceeded" or similar timeout messages
- Exit codes indicate timeout rather than connection failure
- The health endpoint works when tested manually but fails in automated checks
Testing your health check duration:
# Time how long the health check takes
docker exec <container> sh -c 'time curl -f http://localhost:8080/health'
(Running time through sh -c uses the shell's built-in timer, so it works even when no standalone time binary is installed.)
If the health check takes 15 seconds, set timeout to at least 20-30 seconds.
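That sizing rule can be expressed as a one-liner. The doubling factor and 5-second floor below are assumptions for illustration, not Docker defaults (`recommend_timeout_s` is a hypothetical helper):

```python
import time

# Illustrative sizing rule: timeout ~= 2x the observed check duration, with a floor.
def recommend_timeout_s(observed_s, safety_factor=2.0, floor_s=5):
    return max(floor_s, int(round(observed_s * safety_factor)))

# Measuring a probe's duration; the sleep stands in for the real curl/HTTP call.
start = time.monotonic()
time.sleep(0.05)  # stand-in for: curl -f http://localhost:8080/health
observed = time.monotonic() - start
print(f"observed: {observed:.2f}s -> timeout: {recommend_timeout_s(observed)}s")

print(recommend_timeout_s(15))  # a 15-second check -> 30s timeout
print(recommend_timeout_s(1))   # a fast check still gets the 5s floor
```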
Run the health check command manually inside the container to identify issues:
# Get a shell in the container
docker exec -it <container_name> sh
# Run the health check command manually
curl -f http://localhost:8080/health
echo "Exit code: $?"
# Check if the service is listening
netstat -tlnp | grep 8080
# or
ss -tlnp | grep 8080
Common issues and solutions:
Command not found:
sh: curl: not found
Solution: Install curl in your Dockerfile or use an alternative tool.
Connection refused:
curl: (7) Failed to connect to localhost port 8080: Connection refused
Solution: Verify your application is running and listening on the correct port.
HTTP error:
curl: (22) The requested URL returned error: 503
Solution: Check your application logs - the health endpoint may be returning errors.
DNS resolution:
curl: (6) Could not resolve host: localhost
Solution: Use 127.0.0.1 instead of localhost in minimal containers.
Many minimal Docker images don't include curl or wget. Add them to your Dockerfile:
For Alpine-based images:
FROM node:20-alpine
RUN apk add --no-cache curl
# ... rest of Dockerfile
For Debian/Ubuntu-based images:
FROM python:3.12-slim
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# ... rest of Dockerfile
Alternative: Use wget (often pre-installed on Alpine):
healthcheck:
  test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
Alternative: Use native language tools:
# Node.js - no extra dependencies needed
healthcheck:
  test: ["CMD", "node", "-e", "require('http').get('http://localhost:8080/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"]
Alternative: Use netcat for TCP-only checks:
healthcheck:
  test: ["CMD", "nc", "-z", "localhost", "8080"]
Incorrect syntax in docker-compose.yml is a common cause of health check failures:
CMD format - each argument must be separate:
# WRONG - arguments combined into one string
healthcheck:
  test: ["CMD", "curl -f http://localhost:8080/health"]
# CORRECT - arguments separated
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
CMD-SHELL format - use for shell features like || or pipes:
# CORRECT - shell form handles || exit 1
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
String format - simplest syntax:
# CORRECT - string is automatically run in shell
healthcheck:
  test: curl -f http://localhost:8080/health || exit 1
Important: Always add || exit 1 when using curl or wget. Tools like curl return various exit codes (7 for connection refused, 22 for HTTP errors), but Docker's HEALTHCHECK only defines 0 (healthy) and 1 (unhealthy); appending || exit 1 normalizes every failure to 1.
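The same 0-or-1 convention matters for script-based checks. As an illustration, here is a Python-native probe (handy for images that ship Python but not curl) that maps every outcome to exactly 0 or 1; the `probe` helper and the throwaway demo server are assumptions for this sketch, not part of any Docker tooling:

```python
import threading
import urllib.request
import urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer

def probe(url, timeout=5.0):
    """Return 0 for a 2xx response, 1 for anything else -- the only two
    exit codes Docker's HEALTHCHECK defines."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (urllib.error.URLError, OSError):
        return 1  # connection refused, DNS failure, HTTP 4xx/5xx, timeout, ...

# Demo against a throwaway local server (port 0 = pick any free port).
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200 if self.path == "/health" else 404)
        self.end_headers()
    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

print(probe(f"http://127.0.0.1:{port}/health"))   # 0 (healthy)
print(probe(f"http://127.0.0.1:{port}/missing"))  # 1 (unhealthy)
```

In a healthcheck you would inline the `probe` logic as a `python -c "..."` one-liner, much like the Node.js example earlier, and pass its return value to sys.exit().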
Ensure your application has a properly functioning health endpoint:
1. Check what the health endpoint returns:
docker exec <container> curl -v http://localhost:8080/health
The endpoint should return HTTP 200 for healthy status. With -f, curl exits non-zero on HTTP 4xx/5xx responses, which marks the check as failed.
2. Create a proper health endpoint in your application:
Node.js/Express:
app.get('/health', (req, res) => {
  // Quick check that the server is responding
  res.status(200).json({ status: 'healthy' });
});

// More thorough check including dependencies
app.get('/health/ready', async (req, res) => {
  try {
    await db.query('SELECT 1');
    res.status(200).json({ status: 'ready' });
  } catch (err) {
    res.status(503).json({ status: 'not ready', error: err.message });
  }
});
Python/FastAPI:
@app.get("/health")
async def health():
    return {"status": "healthy"}

@app.get("/health/ready")
async def ready():
    try:
        await database.execute("SELECT 1")
        return {"status": "ready"}
    except Exception as e:
        raise HTTPException(status_code=503, detail=str(e))
3. For database containers, use native tools:
# PostgreSQL
test: ["CMD-SHELL", "pg_isready -U postgres -d mydb"]
# MySQL - use CMD-SHELL (with $$ in Compose) so the password variable expands inside the container
test: ["CMD-SHELL", "mysqladmin ping -h localhost -u root -p$$MYSQL_ROOT_PASSWORD"]
# Redis
test: ["CMD", "redis-cli", "ping"]
# MongoDB
test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
Understanding the retry mechanism in detail:
The health check retry mechanism works as follows:
1. start_period begins when the container starts
2. During start_period, health checks run but failures don't count
3. After start_period (or after the first successful check), counting begins
4. Each failed check increments FailingStreak
5. When FailingStreak equals retries, container becomes "unhealthy"
6. A single successful check resets FailingStreak to 0
Combining interval and retries strategically:
For a 5-minute detection window with frequent checks:
healthcheck:
  interval: 30s  # Check every 30 seconds
  retries: 10    # 10 failures = 5 minutes to declare unhealthy
For quick detection of critical failures:
healthcheck:
  interval: 10s  # Check every 10 seconds
  retries: 3     # 3 failures = 30 seconds to declare unhealthy
Health checks in orchestration systems:
Docker Swarm: Automatically replaces unhealthy containers in services. Configure with:
docker service create --name api \
  --health-cmd "curl -f http://localhost:8080/health" \
  --health-interval 30s \
  --health-timeout 10s \
  --health-retries 3 \
  myimage
Kubernetes: Uses separate liveness and readiness probes with similar retry logic:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 3  # Equivalent to retries
  periodSeconds: 30    # Equivalent to interval
Monitoring health check failures:
Use Docker events to monitor health status changes:
docker events --filter event=health_status
Or programmatically with the Docker API/SDK to trigger alerts when containers become unhealthy.
Auto-healing without orchestration:
Use the willfarrell/autoheal container to automatically restart unhealthy containers:
services:
  autoheal:
    image: willfarrell/autoheal
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
Debugging in CI/CD pipelines:
Add verbose logging to health checks for CI debugging:
healthcheck:
  test: ["CMD-SHELL", "curl -v http://localhost:8080/health 2>&1 | tee /dev/stderr | grep -q 'HTTP.*200' || exit 1"]
Or use a wait-for-healthy script in your pipeline:
timeout 120 bash -c 'until docker inspect --format="{{.State.Health.Status}}" mycontainer | grep -q healthy; do sleep 5; done'