I have been using Docker Swarm for quite some time to manage a cluster of applications running on EC2 instances on AWS. Everything was fine with my Docker Swarm cluster until I enabled AWS Auto Scaling on the Docker workers to handle increased load.
The issue was that whenever a new instance was created, Docker Swarm could not pull Docker images onto it, and as a result Docker was unable to run services on the newly launched instances.
Of course, the first obvious solution that came to mind was to set up a cron job on the Docker workers that logs in to ECR on a schedule, since the ECR login token is valid for only 12 hours. Unfortunately, this did not work, and I kept getting the same error.
After reading more about how Docker Swarm authentication works, I found out that Docker Swarm doesn't refresh the auth tokens unless you update or deploy a service. When a service is created using --with-registry-auth, the manager takes the token stored locally on the manager and sends it to all agents so that the workers can pull the image from the private registry (ECR in our case). Docker Swarm then stores this token in the Raft store, which is shared across the swarm.
Also, these tokens remain stored and are only refreshed by the manager. This means that even if you run docker login on a worker, the worker won't use its local token; instead, it uses the token stored in the Docker Swarm Raft store.
According to the Docker Swarm documentation, there is no command that refreshes these tokens other than the update or deploy command.
Note: The update command will not impact your running services as long as they use a fixed Docker image version.
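In short, refreshing the auth tokens comes down to logging in to ECR again on a manager and then running docker service update with --with-registry-auth for each service.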
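A minimal sketch of such a refresh, not necessarily the exact command I ran. It assumes the AWS CLI is installed on the manager and can authenticate to ECR; the region and the registry host (with the account ID 123456789012) are placeholders, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: refresh the registry credentials that Swarm holds for every service.
# Assumptions: AWS CLI available; region and registry host are placeholders.
AWS_REGION="${AWS_REGION:-us-east-1}"
ECR_REGISTRY="${ECR_REGISTRY:-123456789012.dkr.ecr.${AWS_REGION}.amazonaws.com}"

refresh_swarm_registry_auth() {
    # Log the manager in to ECR so its local token is fresh.
    aws ecr get-login-password --region "$AWS_REGION" \
        | docker login --username AWS --password-stdin "$ECR_REGISTRY" || return 1

    # Push the fresh credentials into the swarm for each service.
    for service in $(docker service ls --quiet); do
        docker service update --quiet --detach --with-registry-auth "$service"
    done
}
```

Calling refresh_swarm_registry_auth at the end of the script (or saving it as a script that does so) performs the login once and then updates every service that docker service ls reports.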
Note: Make sure you run this command from a manager node, not a worker node.
Obviously, you want the command above to run at least twice a day, since the tokens are valid for only 12 hours.
There are different ways to schedule this:
- Cron job
- Systemd service with a timer
- A Docker service that is part of your cluster
I think the first and second solutions are the simplest, and I don't see a need to create another Docker service just to refresh tokens.
In my case, I set up a systemd service with a timer that runs the command above on the manager node every hour.
Log in to one of the manager nodes and create the service and timer unit files:
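The unit files could look roughly like the sketch below, written here as a small shell function that generates them. The unit name swarm-ecr-refresh and the script path /usr/local/bin/swarm-ecr-refresh.sh are hypothetical; the script is assumed to log in to ECR and run docker service update --with-registry-auth for each service.

```shell
#!/bin/sh
# Sketch: write a oneshot service and an hourly timer into the given directory.
# Assumption: the unit name and the ExecStart script path are placeholders.
write_refresh_units() {
    unit_dir="$1"

    cat > "$unit_dir/swarm-ecr-refresh.service" <<'EOF'
[Unit]
Description=Refresh ECR credentials for Docker Swarm services
After=docker.service
Requires=docker.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/swarm-ecr-refresh.sh
EOF

    cat > "$unit_dir/swarm-ecr-refresh.timer" <<'EOF'
[Unit]
Description=Run swarm-ecr-refresh.service every hour

[Timer]
OnBootSec=5min
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

Running write_refresh_units /etc/systemd/system as root places both files where systemd looks for locally defined units. The timer activates the service of the same name, so no explicit Unit= setting is needed.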
Start the services:
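As a sketch, assuming the timer unit is called swarm-ecr-refresh.timer (a hypothetical name; substitute your own), the following reloads systemd, enables and starts the timer, and lists it so you can check the next scheduled run. It needs root privileges on the manager node.

```shell
#!/bin/sh
# Sketch: load the new unit files and start the hourly timer.
# Assumption: swarm-ecr-refresh.timer is a placeholder unit name.
start_refresh_timer() {
    systemctl daemon-reload &&
    systemctl enable --now swarm-ecr-refresh.timer &&
    systemctl list-timers swarm-ecr-refresh.timer
}
```

Only the timer needs to be enabled; the oneshot service is triggered by the timer each time it elapses.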