Docker Swarm

Learn Docker Swarm concepts, architecture and basic usage, and go in depth with tutorials and videos from the community.

December 16, 2020

On this page, you’ll learn everything you need to know about Docker Swarm.

What is Docker Swarm?

Docker swarm mode allows you to manage a cluster of Docker Engines, natively within the Docker platform. You can use the Docker CLI to create a swarm, deploy application services to a swarm, and manage swarm behavior.

Docker also supports Kubernetes alongside Docker Swarm, so Docker users can choose either Kubernetes or Swarm to orchestrate their container workloads.

Swarm can help developers and IT administrators:

  • Coordinate between containers and allocate tasks to groups of containers
  • Perform health checks and manage lifecycle of individual containers
  • Provide redundancy and failover in case nodes experience failure
  • Scale the number of containers up and down depending on load
  • Perform rolling updates of software across multiple containers

Docker Swarm Concepts

  • SwarmKit – a separate project which implements Docker’s orchestration layer and is used directly within Docker to implement Docker swarm mode.
  • Swarm – a swarm consists of multiple Docker hosts which run in swarm mode and act as managers and workers.
  • Task – the swarm manager distributes a specific number of tasks among the nodes, based on the service scale you specify. A task carries a Docker container and the commands to run inside the container. Once a task is assigned to a node, it cannot move to another node. It can only run on the assigned node or fail.
  • Service – a service is the definition of the tasks to execute on the manager or worker nodes. When you create a service, you specify which container image to use and which commands to execute inside running containers. A key difference between services and standalone containers is that you can modify a service’s configuration, including the networks and volumes it is connected to, without manually restarting the service.
  • Nodes – a swarm node is an individual Docker Engine participating in the swarm. You can run one or more nodes on a single physical computer or cloud server, but production swarm deployments typically include Docker nodes distributed across multiple machines.
  • Manager nodes – dispatch units of work called tasks to worker nodes. Manager nodes also perform orchestration and cluster management functions.
  • Leader node – manager nodes elect a single leader to conduct orchestration tasks, using the Raft consensus algorithm.
  • Worker nodes – receive and execute tasks dispatched from manager nodes. By default, manager nodes also run services as worker nodes. An agent runs on each worker node and reports the state of its assigned tasks to the manager node.
  • Load balancing – the swarm manager uses ingress load balancing to expose the services running on the swarm for external access. It assigns a configurable PublishedPort to the service, and external components, such as cloud load balancers, can reach the service on that port on any node in the cluster, whether or not the node is currently running a task for the service; all nodes in the swarm route ingress connections to a running task instance. The swarm manager also uses internal load balancing to distribute requests among services within the cluster, based on each service’s DNS name (see the example after this list).
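
The sketch below is a minimal, hedged example of publishing a port on a service; the service name web, the replica count, and the port numbers are illustrative choices rather than values from the text above:

$ docker service create --name web --replicas 2 --publish 8080:80 nginx
$ docker service ps web             # shows which nodes are running the service's tasks
$ curl http://<any-node-ip>:8080    # the routing mesh forwards the request to a running task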

How Docker Swarm Works: Nodes and Services

How Nodes Work

There are two types of nodes: managers and workers. 

Manager nodes handle cluster management tasks: maintaining cluster state, scheduling services, and serving swarm mode HTTP API endpoints. The managers maintain a consistent state of the swarm and services running on it, using an implementation of the Raft algorithm.

Running multiple manager nodes allows you to take advantage of swarm mode’s fault-tolerance features. However, adding more managers does not mean increased scalability or higher performance. In general, the opposite is true. Docker recommends implementing an odd number of manager nodes. 

A three-manager swarm tolerates the loss of at most one manager without downtime, and a five-manager swarm tolerates the simultaneous loss of two managers. In general, a swarm with N managers tolerates the loss of at most (N-1)/2 managers. If more managers than that fail, the swarm loses quorum: existing services continue to run, but you cannot perform management operations until you bring the missing managers back online or recover the cluster from a surviving manager.
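
As a hedged sketch, a three-manager swarm could be assembled as follows; manager1 is a hypothetical host, and the join command printed by docker swarm join-token manager contains your swarm’s own token and address:

$ docker swarm init --advertise-addr <manager1-ip>    # on the first manager: create the swarm
$ docker swarm join-token manager                     # print the join command for additional managers
$ docker swarm join --token <manager-token> <manager1-ip>:2377    # run on the second and third managers
$ docker node ls                                      # on any manager: all three nodes should show a manager status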

Worker nodes are also instances of Docker Engine whose sole purpose is to run containers. Worker nodes require at least one manager node to function. 

By default, all managers are also workers. In a single-manager cluster, you can run commands like docker service create and the scheduler places all tasks on the local Engine. To prevent a manager node from executing tasks, set its availability to Drain.
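
For example, a hedged sketch of draining a manager (the node name manager1 is hypothetical):

$ docker node update --availability drain manager1    # stop scheduling new tasks on this node
$ docker node update --availability active manager1   # later, allow it to receive tasks again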

You can promote a worker node to be a manager by running docker node promote. For example, you may want to promote a worker node when you take a manager node offline for maintenance. You can also demote a manager node to a worker node using docker node demote. For more details on node commands in a swarm cluster, see the Docker node CLI reference.
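
For example, with hypothetical node names:

$ docker node promote worker1     # worker1 becomes a manager
$ docker node demote manager2     # manager2 goes back to being a worker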

How Services Work

Services allow you to deploy an application image to a Docker swarm. Examples of services include an HTTP server, a database, or other software that needs to run in a distributed environment. The basic definition of the service includes a container image to run, and commands to execute inside the running containers. 

  • Service options – when you create a service, you can specify the port to publish for external access, an overlay network to connect the service to other services in the swarm, CPU and memory restrictions, a rolling update policy, and the number of replicas of the image to run in the swarm.
  • Services, scheduling and desired state – when you deploy the service to the swarm, the service definition is the desired state for the service. For example, the desired state might be running three instances of an HTTP listener, with load balancing between them. The swarm manager schedules a replica task on three Docker Engines in the swarm, each of which runs a container with an HTTP listener. If one of these instances fails, the manager recognizes the desired state is not fulfilled, schedules another replica task, and spawns a new container to bring the number of listeners back to three. 
  • When tasks fail – if a task in a Docker swarm fails, it is not recovered or restarted. The orchestrator simply removes the container related to the failed task and creates a new task to replace it, according to the desired state specified by the service.
  • Pending services – a service is pending if there are currently no nodes available in the cluster to run its tasks. For example, this might happen if all the nodes in the cluster are paused or drained (their availability is set to Drain, so the scheduler does not assign them new tasks). You can also specify resource requirements for a service, such as a minimum of 100 GB of memory on a node; if no node has this much memory, the service remains pending until a node that satisfies the requirement joins the swarm.
  • Replicated vs. global services – a replicated service specifies the number of identical tasks you want to run. For example, you might deploy an HTTP service with three replicas, each serving the same content. A global service runs one task on every node in the swarm, with no pre-specified number of tasks; each time you add a node to the swarm, the same task runs on it. Typical global services are monitoring agents or anti-virus scanners (see the example commands after this list).
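
The commands below are a hedged sketch of both modes, using a few of the service options described above; the names my_http and monitor-agent and the alpine image are illustrative placeholders:

$ docker service create --name my_http --replicas 3 --publish 8080:80 \
    --reserve-memory 128M --update-parallelism 1 --update-delay 10s nginx
$ docker service create --name monitor-agent --mode global alpine ping docker.com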

Running Docker Swarm

The Docker engine runs with swarm mode disabled by default. To run Docker in swarm mode, you can either create a new swarm or have the Docker engine join an existing swarm.

To create a swarm, run the docker swarm init command, which creates a single-node swarm on the current Docker engine. The current node becomes the manager node for the newly created swarm.

The output of the docker swarm init command tells you which command you need to run on other Docker hosts to allow them to join your swarm as worker nodes.

Other nodes can access the SwarmKit API using the manager node’s advertised IP address. SwarmKit is a toolkit for orchestrating distributed systems, including node discovery, task scheduling, and more. 

Each node requires a secret token to join a swarm. The token for worker nodes is different from the token for manager nodes, and the token is only used at the time a node joins the swarm.

Manager tokens should be strongly protected, because any access to the manager token grants control over an entire swarm. 

For more details, see the Swarm documentation: Run Docker in Swarm mode ›

Common Docker Swarm Operations

In this section you will learn how to create and join a swarm, manage nodes in a swarm, deploy services to a swarm, and perform common admin operations.

Creating and Joining a Swarm  

The Docker engine runs with swarm mode disabled by default. To run Docker in swarm mode, you can either create a new swarm or have the Docker engine join an existing swarm.

To create a swarm – run the docker swarm init command, which creates a single-node swarm on the current Docker engine. The current node becomes the manager node for the newly created swarm.
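
For example, a hedged sketch (the IP address matches the join example below and is only an illustration):

$ docker swarm init --advertise-addr 192.168.99.100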

To join a swarm – the output of the docker swarm init command tells you which command to run on other Docker hosts so they can join your swarm as worker nodes, and includes a “join token”. For example, to add a worker to this swarm, run the following command:

docker swarm join \
--token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c \
192.168.99.100:2377

There is a different join token for worker nodes and manager nodes. The token is only used at the time a node joins the swarm. Manager tokens should be strongly protected, because any access to the manager token grants control over the entire swarm.

You can run docker swarm join-token --rotate at any time to invalidate the old token and generate a new one, for security purposes.
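
For example, to rotate and then print the new worker token (the manager token can be rotated the same way):

$ docker swarm join-token --rotate worker
$ docker swarm join-token worker    # prints the current join command for workers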

Accessing management functionality – swarm nodes access the SwarmKit API (which provides operations such as node discovery and task scheduling) and overlay networking using an “advertise address” you specify for the manager node. If you don’t specify an address and the system has a single IP, Docker uses that address and listens on the default port 2377.

For more details, see the Swarm documentation: Create a Swarm ›

Manage Nodes in a Swarm 

To get visibility into the nodes on your swarm, list them using the docker node ls command on a manager node. 

The listed nodes display an availability status, which identifies whether the scheduler can assign tasks to the node, and a manager status:

  • A manager status value identifies whether the node participates in swarm management.
  • A blank value indicates that the node is a worker node.
  • A Leader value identifies the primary manager node that makes all swarm management and orchestration decisions for the swarm.
  • A Reachable value identifies manager nodes that are candidates to become the leader if the current leader becomes unavailable.
  • An Unreachable value signifies a manager node that cannot communicate with other managers. Such nodes should be replaced by promoting a worker node or adding a new manager node (see the sample listing after this list).
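
In the sample below, the IDs and hostnames are made up for illustration, and the asterisk marks the node you are connected to:

$ docker node ls
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
dxn1zf6l61qsb1josjja83ngz *   manager1   Ready    Active         Leader
9j68exjopxe7wfl6yuxml7a7j     manager2   Ready    Active         Reachable
38ciaotwjuritcdtn9npbnkuz     worker1    Ready    Active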

For more details, see the Swarm Documentation: Manage Nodes in a Swarm ›

Deploying Services to a Swarm 

After you declare the desired state for a service you want to run in your cluster, you rely on Docker Swarm to maintain that state. 

To create a service, use the docker service create command and name it with a --name flag. After the --name flag comes the container image name that you want to use. You can also specify a command for the service’s containers to run. 

The command below starts a service called my_web, which uses the nginx image and runs the command ping docker.com:

$ docker service create --name my_web nginx ping docker.com

To remove a service, use the docker service remove command. You can remove a service by its ID or name, as shown in the output of the docker service ls command. The following command removes the my_web service:

$ docker service remove my_web

To update service configuration, use the docker service update command. This lets you configure settings for a service after it is created, including the ports it publishes to clients outside the swarm, resource constraints, and its restart and update policies (see the example after the list below).

The following is a list of some of the configuration options you can specify for a service: 

  • Configure runtime environment
  • Update the command an existing service runs
  • Specify the image version a service should use
  • Publish ports
  • Connect the service to an overlay network
  • Control service scale and placement
  • Reserve memory or CPUs for a service
  • Configure a service’s update behavior
  • Automatically roll back if an update fails

For more details, see the Swarm documentation: Deploying Services to a Swarm ›

Admin Commands 

Docker uses the Raft Consensus Algorithm to manage swarms. Raft requires a majority of manager nodes (quorum) to agree on proposed updates to the swarm, such as node additions or removals. 

For fault tolerance in the Raft algorithm, you should always maintain an odd number of managers in the swarm to better tolerate manager node failures. With an odd number of managers, there is a higher chance that a quorum remains available to process requests if the network is partitioned into two sets.

You can monitor node health using the docker node ls command from a manager node, or by querying an individual node with docker node inspect <node-id>.
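
For example, to print a human-readable summary of a node (node9 is a hypothetical node name):

$ docker node inspect --pretty node9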

You can forcibly remove a node from a swarm without shutting it down first, by using the docker node rm command and a --force flag. This might be needed if a node becomes compromised. Here is what this looks like:

$ docker node rm --force node9
Node node9 removed from swarm

You can back up a swarm using any manager node, as follows (see the command sketch after these steps): 

  1. You’ll need an unlock key, if auto-lock is enabled on the swarm.
  2. Stop Docker on the manager node before backing up the data, so that no data changes during the backup operation.
  3. Back up the entire /var/lib/docker/swarm/ directory, which stores the swarm state and the manager logs.
  4. Restart the manager node.
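
On a systemd-based host, these steps might look roughly like the sketch below; paths and service names can differ between installations:

$ docker swarm unlock-key                  # note the unlock key, if auto-lock is enabled
$ sudo systemctl stop docker               # stop the Docker engine on this manager
$ sudo tar -czvf swarm-backup.tar.gz /var/lib/docker/swarm/
$ sudo systemctl start docker              # restart the manager node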

For more details and additional admin functions, see the Swarm Documentation: Swarm Administration Guide ›

Docker Swarm and Kubernetes

Some believe that with Docker’s support for Kubernetes, Docker Swarm, which is less robust and has a smaller feature set, will become obsolete. Others say that Swarm will remain relevant as a simpler orchestration tool, suitable for organizations with smaller container workloads.

Summary

Docker Swarm is an orchestration tool provided as part of the Docker platform. It is an alternative to other popular container orchestration tools, such as Kubernetes and Apache Mesos. Swarm is considered easy to use, especially for those already familiar with the Docker model, and should be an easy first step into the world of managing multiple containers and composing them into services and applications.

Top Swarm Tutorials from the Community

Create a Swarm Cluster on DigitalOcean/Ubuntu

Tutorial by: Digital Ocean

Length: Medium

Can help you learn: How to set up a simple three-node cluster, with each node running Ubuntu 16.04

Tutorial steps:

  • Provision the cluster nodes
  • Configure firewall rules
  • Initialize cluster manager
  • Add two other nodes to the cluster
  • Run a test NGINX container

How to Configure Docker Swarm with multiple Docker Nodes on Ubuntu 18.04

Tutorial by: linuxconfig

Length: Medium

Can help you learn: How to configure Docker Swarm with multiple Docker nodes

Tutorial steps:

  • Configure the Docker hosts
  • Install and Run Docker Service
  • Configure the Manager Node for Swarm Cluster Initialization
  • Configure Worker Nodes to join the Swarm Cluster
  • Verify the Swarm Cluster
  • Deploy new Service on Swarm Cluster

A Developer’s Guide To Docker Swarm

Tutorial by: Okta

Length: Short

Can help you learn: How to create a swarm, create some virtual machines that will be part of the swarm, deploy containers to the swarm, and scale those containers horizontally.

Tutorial steps:

  • Install Docker Dependencies
  • Create Some Virtual Machines with Docker Machine
  • Initialize Docker Swarm Mode
  • Manage Docker Swarm Services

Get Started With Swarm Mode

Tutorial by: Docker

Length: Short

Can help you learn: All about the features of swarm mode, including initializing a cluster of Docker Engines in swarm mode and adding nodes to the swarm.

Tutorial steps:

  • Create a cluster of three Linux hosts
  • Get the IP address for the manager machine
  • Open protocols and ports between the hosts

Swarm Stack Introduction

Tutorial by: Luc – Play with Docker

Length: Short

Can help you learn: How to deploy a stack (a multi-service application) to a swarm using Docker

Tutorial steps:

  • Initialize a swarm
  • Show members of the swarm
  • Clone sample application
  • Deploy stack using docker stack deploy
  • Check the stack has been deployed

Top Swarm Videos from the Community

Scale Your App with Swarm on AWS
Developer and Docker Captain Alex Ellis provides a guide on using Docker Swarm to orchestrate EC2 instances and deploy an application at scale.
Docker Swarm For High Availability
Online learning platform Edureka covers topics on Docker Swarm including the basics of Docker containers, what Docker Swarm is, common Docker Swarm commands, and achieving high availability with Docker Swarm. 
Under the Hood with Docker Swarm Mode
At DockerCon 17, Docker software engineers Nishant Totla and Drew Erny showcase the features that power Docker’s swarm mode without compromising its operational simplicity. The pair discuss new Docker Swarm features that streamline deployments, increase security, and reduce downtime.
Docker Swarm or Kubernetes or Mesos
Arun Gupta, an open-source technologist at Amazon Web Services, compares and contrasts Docker Swarm, Mesos, and Kubernetes under a number of headings, including deployment options, rolling updates, service discovery, and more. The aim is to help people understand what each orchestration framework offers and how to use these platforms effectively.
Load Balancing in Swarm with NGINX
At NGINX Conf 16, Rick Nelson of NGINX discusses the basic built-in load balancing options available in Docker Swarm Mode and how to integrate NGINX to provide load balancing for a swarm cluster.