Docker Swarm returns 'rpc error: code = ResourceExhausted desc = grpc: received message larger than max' when internal gRPC messages exceed the 4MB default limit. This typically occurs in large clusters with many services, secrets, or configs.
Docker Swarm uses gRPC, a remote procedure call framework originally developed by Google, for communication between manager nodes and for cluster state synchronization. gRPC's default maximum message size is 4,194,304 bytes (4MB). When the Swarm cluster state grows large enough (because of many services, secrets, configs, or nodes), the messages exchanged between managers can exceed this limit.
The error message usually looks like "grpc: received message larger than max (5351376 vs. 4194304)", where the first number is the actual message size and the second is the limit. It is most common in large production clusters with hundreds of services, or where Docker secrets and configs are used extensively. It can also appear when a manager node joins and the cluster snapshot being transferred to it is large.
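To see how far over the limit a message is, you can pull both numbers out of the error text (4,194,304 is 4 × 1024 × 1024 bytes). This is a minimal sketch assuming a POSIX shell and sed; the function name grpc_over_by is hypothetical and not part of any Docker tooling.

```sh
#!/bin/sh
# Hypothetical helper: parse "(ACTUAL vs. LIMIT)" from a gRPC size error
# and report both sizes plus the overage in bytes.
grpc_over_by() {
  sizes=$(printf '%s\n' "$1" |
    sed -n 's/.*(\([0-9][0-9]*\) vs\. \([0-9][0-9]*\)).*/\1 \2/p')
  # No match: the text is not a size error in the expected format.
  [ -n "$sizes" ] || return 1
  set -- $sizes   # $1 = actual size, $2 = limit (split on the space)
  echo "actual=$1 limit=$2 over=$(( $1 - $2 ))"
}

grpc_over_by 'grpc: received message larger than max (5351376 vs. 4194304)'
# prints: actual=5351376 limit=4194304 over=1157072
```

For the sample message, the payload is roughly 1.1 MiB larger than the default limit allows.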
First, verify you're running a Docker version with increased gRPC limits. Docker 18.09.1+ includes fixes for this issue:
docker version
Check the Server version on every node, managers in particular. If it is older, upgrading Docker is the recommended solution, since Docker 18.09+ increased the gRPC message size limits for most Swarm operations.
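If you want to script this check across nodes, a small version comparison helps. This is a minimal sketch assuming GNU coreutils (for sort -V, version sort); version_at_least is a hypothetical helper name. The docker version --format '{{.Server.Version}}' call in the usage comment prints only the daemon version.

```sh
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Usage on a node (requires access to the Docker daemon):
#   version_at_least "$(docker version --format '{{.Server.Version}}')" 18.09.1 \
#     || echo "upgrade needed"
```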
To upgrade, install the latest stable Docker release:
Ubuntu/Debian:
sudo apt-get update
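sudo apt-get install docker-ce docker-ce-cli containerd.io
CentOS/RHEL:
sudo yum update docker-ce docker-ce-cli containerd.io
Or use the convenience script (Docker does not recommend it for production hosts):
curl -fsSL https://get.docker.com | sh
After upgrading, restart Docker (for example, sudo systemctl restart docker) and rejoin the Swarm if necessary. In a multi-manager swarm, upgrade the managers one at a time so the cluster keeps quorum.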
If upgrading isn't immediately possible, reduce the cluster state by cleaning up unused resources:
# Remove unused services
docker service ls
docker service rm <unused_service>
# Remove unused secrets
docker secret ls
docker secret rm <unused_secret>
# Remove unused configs
docker config ls
docker config rm <unused_config>
# Remove unused networks
docker network prune
# Remove stopped containers and unused images
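docker system prune -a
Note that docker system prune only frees local disk space on the node where you run it; containers and images are not part of the Swarm cluster state. Pay special attention to services with restart policies that may have accumulated many task histories.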
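Before removing secrets, it helps to know which ones no service references. This is a minimal sketch, assuming it runs on a manager node with GNU xargs (for -r); the function names unused_names and unused_secrets are hypothetical. It only looks at secret references in current service specs, so treat the output as a list of candidates to review, not something to delete blindly.

```sh
#!/bin/sh
# Print each name from list $1 that does not appear in list $2.
# Both arguments are newline-separated strings.
unused_names() {
  printf '%s\n' "$1" | while IFS= read -r name; do
    [ -n "$name" ] || continue
    printf '%s\n' "$2" | grep -Fqx -- "$name" || printf '%s\n' "$name"
  done
}

# List secrets that no current service references (run on a manager).
unused_secrets() {
  all=$(docker secret ls --format '{{.Name}}')
  used=$(docker service ls -q | xargs -r docker service inspect \
    --format '{{range .Spec.TaskTemplate.ContainerSpec.Secrets}}{{println .SecretName}}{{end}}')
  unused_names "$all" "$used"
}
```

The same pattern should work for configs by swapping in docker config ls and the .Spec.TaskTemplate.ContainerSpec.Configs / .ConfigName fields.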
Docker keeps history of service tasks which contributes to cluster state size. Reduce the task history limit:
# Check current task history limit
docker info | grep "Task History"
# Update global task history retention (default is 5)
docker swarm update --task-history-limit 2
For existing services with large histories, you can recreate them:
# Export service definition
docker service inspect <service_name> > service-backup.json
# Remove and recreate (will lose history)
docker service rm <service_name>
# Recreate with the options recorded in service-backup.json
docker service create --name <service_name> <other_options>
Large service definitions contribute to cluster state. Optimize them:
# Instead of defining configs/secrets inline in compose files,
# reference existing objects as external
configs:
  my_config:
    external: true
secrets:
  my_secret:
    external: true
Avoid putting large files directly in Docker configs. Instead, mount them as volumes or bake them into images.
For services with many environment variables, consider baking them into the image (for example with env files or Dockerfile ENV instructions) rather than setting them in the Swarm service definition.
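One way to keep large files out of Swarm configs is to create configs ahead of time, outside the stack file, and refuse files above a size you choose. This is a minimal sketch: MAX_CONFIG_BYTES and its 64 KiB default are arbitrary assumptions rather than a Docker limit, and config_too_large and create_external_config are hypothetical helpers. The docker config create NAME FILE form is the standard CLI usage.

```sh
#!/bin/sh
# Assumed threshold (64 KiB); tune it for your cluster. Not a Docker limit.
MAX_CONFIG_BYTES=${MAX_CONFIG_BYTES:-65536}

# Succeed if file $1 is larger than MAX_CONFIG_BYTES.
config_too_large() {
  size=$(wc -c < "$1" | tr -d '[:space:]')
  [ "$size" -gt "$MAX_CONFIG_BYTES" ]
}

# Create config $1 from file $2 unless the file is too large.
create_external_config() {
  if config_too_large "$2"; then
    echo "skipping $2: over $MAX_CONFIG_BYTES bytes; use a volume or bake it into the image" >&2
    return 1
  fi
  docker config create "$1" "$2"
}

# Usage (on a manager), then mark the config as external in the stack file:
#   create_external_config my_config ./my_config.conf
```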
If the cluster state is too corrupted or large to recover, you may need to rebuild:
# On each worker node
docker swarm leave
# On manager nodes (except one)
docker swarm leave --force
# On the last manager
docker swarm leave --force
# Reinitialize on the primary manager
docker swarm init --advertise-addr <MANAGER_IP>
# Rejoin other managers
docker swarm join-token manager
# Use the token to rejoin managers
# Rejoin workers
docker swarm join-token worker
# Use the token to rejoin workers
Warning: This will remove all services, secrets, and configs. Export them first and redeploy after reinitializing. Secret values cannot be read back from Swarm, so make sure you still have their original sources.
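The export step before a rebuild can be scripted. This is a minimal sketch, assuming it runs on a manager node; backup_swarm_state and the swarm-backup directory name are hypothetical. It saves docker service inspect and docker config inspect output as JSON for reference when redeploying; it does not restore anything automatically.

```sh
#!/bin/sh
# Hypothetical helper: dump service and config definitions into a
# directory before tearing the swarm down. Run it on a manager node.
backup_swarm_state() {
  dir=${1:-swarm-backup}
  mkdir -p "$dir/services" "$dir/configs"
  for name in $(docker service ls --format '{{.Name}}'); do
    docker service inspect "$name" > "$dir/services/$name.json"
  done
  for name in $(docker config ls --format '{{.Name}}'); do
    # The JSON includes the config payload (Spec.Data, base64-encoded).
    docker config inspect "$name" > "$dir/configs/$name.json"
  done
  # Secret payloads are never returned by the API; record names only.
  docker secret ls --format '{{.Name}}' > "$dir/secret-names.txt"
}
```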
Understanding gRPC limits in Docker Swarm: Docker Swarm uses gRPC for Raft consensus and cluster state replication. The default 4MB limit was set to prevent memory exhaustion attacks, but legitimate large clusters can exceed this. Docker 18.09+ increased limits for specific operations like GetConfigs, GetSecrets, and ListNodes.
Monitoring cluster state size: You can estimate the cluster state size from the Raft data directory:
# On a manager node (reading /var/lib/docker requires root)
sudo ls -la /var/lib/docker/swarm/raft/
sudo du -sh /var/lib/docker/swarm/
If the raft directory grows beyond several hundred MB, your cluster state may be approaching the limits.
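To turn that rule of thumb into a periodic check (for example from cron), you can compare the directory size with a threshold. This is a minimal sketch: RAFT_WARN_MB and its 300 MB default are assumptions standing in for "several hundred MB", and check_raft_size is a hypothetical helper. It relies on du -m, which GNU and BSD du support.

```sh
#!/bin/sh
# Assumed warning threshold in MB; adjust to your cluster.
RAFT_WARN_MB=${RAFT_WARN_MB:-300}

# Report the size of a Raft data directory (default: the Swarm location)
# and return 1 if it meets or exceeds RAFT_WARN_MB.
check_raft_size() {
  dir=${1:-/var/lib/docker/swarm/raft}
  size=$(du -sm "$dir" | cut -f1)   # -m: size in megabytes
  if [ "$size" -ge "$RAFT_WARN_MB" ]; then
    echo "WARN: $dir is ${size} MB"
    return 1
  fi
  echo "OK: $dir is ${size} MB"
}

# Usage on a manager, as root:
#   check_raft_size
```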
High availability considerations: In HA setups with 3+ managers, each manager must be able to replicate the full cluster state. When one manager falls behind and needs to catch up, it receives a snapshot that may exceed gRPC limits. Consider:
- Using 3 managers for most clusters (5 only for very large deployments)
- Ensuring managers have low-latency network connections
- Monitoring manager synchronization status
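The manager-count advice follows from Raft's majority rule: a swarm with N managers needs floor(N/2) + 1 of them available, so it tolerates losing floor((N - 1)/2). Fault tolerance only improves at odd counts, while every additional manager is one more node that must receive the full cluster state. A small sketch of that arithmetic (the function names are hypothetical):

```sh
#!/bin/sh
# Raft majority: quorum = floor(N/2) + 1 managers must be reachable.
raft_quorum() { echo $(( $1 / 2 + 1 )); }
# Managers that can fail while quorum is kept: floor((N-1)/2).
raft_fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5 7; do
  printf '%d managers: quorum %d, tolerates %d failure(s)\n' \
    "$n" "$(raft_quorum "$n")" "$(raft_fault_tolerance "$n")"
done
```

For synchronization status, docker node ls shows each manager's MANAGER STATUS (Leader, Reachable, or Unreachable).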
Docker Buildx and related tools: A similar gRPC size error can occur with Docker Buildx when building large images with provenance attestations. This is a separate issue from Swarm, even though the error message looks the same. For Buildx, consider disabling provenance (docker buildx build --provenance=false ...) or using a different output type.
Kernel and system limits: On systems with many containers, you may also need to raise network limits. These settings affect connection handling, not the gRPC message size limit:
# /etc/sysctl.conf
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
Load the new values without rebooting by running sudo sysctl -p.
Alternative orchestration: If your workload consistently pushes against Swarm's limits, consider whether Kubernetes might be more appropriate for your scale. Kubernetes has different architectural choices that may handle very large cluster states better.
image operating system "linux" cannot be used on this platform
How to fix 'image operating system linux cannot be used on this platform' in Docker
manifest unknown: manifest unknown
How to fix 'manifest unknown' in Docker
cannot open '/etc/passwd': Permission denied
How to fix 'cannot open: Permission denied' in Docker
Error response from daemon: failed to create the ipvlan port
How to fix 'failed to create the ipvlan port' in Docker
toomanyrequests: Rate exceeded for anonymous users
How to fix 'Rate exceeded for anonymous users' in Docker Hub