Docker Swarm 101

Learn Docker Swarm concepts, architecture and basic usage, and go in depth with tutorials and videos from the community.

What is Docker Swarm

Docker swarm mode allows you to manage a cluster of Docker Engines, natively within the Docker platform. You can use the Docker CLI to create a swarm, deploy application services to a swarm, and manage swarm behavior.

Docker will shortly support Kubernetes as well as Docker Swarm, and Docker users will be able to use either Kubernetes or Swarm to orchestrate their container workloads.

Swarm can help developers and IT administrators:

  • Coordinate between containers and allocate tasks to groups of containers
  • Perform health checks and manage the lifecycle of individual containers
  • Provide redundancy and failover when nodes fail
  • Scale the number of containers up and down depending on load
  • Perform rolling updates of software across multiple containers

Docker Swarm Concepts

  • SwarmKit – a separate project which implements Docker’s orchestration layer and is used directly within Docker to implement Docker swarm mode.
  • Swarm – a swarm consists of multiple Docker hosts which run in swarm mode and act as managers and workers.
  • Task – the swarm manager distributes a specific number of tasks among the nodes, based on the service scale you specify. A task carries a Docker container and the commands to run inside the container. Once a task is assigned to a node, it cannot move to another node. It can only run on the assigned node or fail.
  • Service – a service is the definition of the tasks to execute on the manager or worker nodes. When you create a service, you specify which container image to use and which commands to execute inside running containers. A key difference between services and standalone containers is that you can modify a service’s configuration, including the networks and volumes it is connected to, without manually restarting the service.
  • Nodes – a swarm node is an individual Docker Engine participating in the swarm. You can run one or more nodes on a single physical computer or cloud server, but production swarm deployments typically include Docker nodes distributed across multiple machines.
  • Manager nodes – dispatch units of work called tasks to worker nodes. Manager nodes also perform orchestration and cluster management functions.
  • Leader node – manager nodes elect a single leader to conduct orchestration tasks, using the Raft consensus algorithm.
  • Worker nodes – receive and execute tasks dispatched from manager nodes. By default, manager nodes also run services as worker nodes. An agent runs on each worker node and reports the state of its assigned tasks to the manager node.
  • Load balancing – the swarm manager uses ingress load balancing to expose the services running on the swarm to external clients. The swarm manager assigns a configurable PublishedPort to the service. External components, such as cloud load balancers, can access the service on the PublishedPort of any node in the cluster, whether or not that node is currently running a task for the service: all nodes in the swarm route ingress connections to a running task instance. Inside the cluster, the swarm manager uses internal load balancing to distribute requests to a service's tasks, based on the service's DNS name.
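The ingress routing in the last item can be illustrated with a toy model. This sketch is only an approximation written for this page: `Task`, `RoutingMesh`, and `handle` are hypothetical names, Docker's real data path (built on Linux IPVS) is not reproduced, and round-robin selection is a simplifying assumption. It shows the key property: a connection to the published port on any node reaches a running task, even when that node runs no task for the service.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Task:
    task_id: str
    node: str  # the node the task was assigned to; tasks never move between nodes


class RoutingMesh:
    """Toy model of swarm ingress load balancing for one published service."""

    def __init__(self, nodes, published_port, tasks):
        if not tasks:
            raise ValueError("the service has no running tasks")
        self.nodes = set(nodes)
        self.published_port = published_port
        self._next_task = cycle(tasks)  # round-robin over the running tasks

    def handle(self, node, port):
        """Return the task that serves a connection arriving at node:port."""
        if node not in self.nodes:
            raise ValueError(f"{node} is not a member of the swarm")
        if port != self.published_port:
            raise ConnectionRefusedError(f"no service is published on port {port}")
        # The receiving node does not need to run a task itself.
        return next(self._next_task)


mesh = RoutingMesh(
    nodes=["node1", "node2", "node3"],
    published_port=8080,
    tasks=[Task("web.1", "node1"), Task("web.2", "node2")],
)
for _ in range(3):
    task = mesh.handle("node3", 8080)  # node3 runs no task for this service
    print(f"node3:8080 -> {task.task_id} on {task.node}")
```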

How Docker Swarm Works: Nodes and Services

How Nodes Work

There are two types of nodes: managers and workers. 

Manager nodes handle cluster management tasks: maintaining cluster state, scheduling services, and serving swarm mode HTTP API endpoints. The managers maintain a consistent state of the swarm and services running on it, using an implementation of the Raft algorithm.

Running multiple manager nodes allows you to take advantage of swarm mode’s fault-tolerance features. However, adding more managers does not mean increased scalability or higher performance. In general, the opposite is true. Docker recommends implementing an odd number of manager nodes. 

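A three-manager swarm tolerates a maximum loss of one manager without downtime. A five-manager swarm tolerates a maximum simultaneous loss of two manager nodes. In general, an N-manager cluster tolerates the loss of at most (N-1)/2 managers, rounded down. When more managers fail than this, services that are already running keep running, but the swarm can no longer perform management tasks such as adding nodes or updating services. To recover, you need to create a new cluster, for example by running docker swarm init --force-new-cluster on a surviving manager.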
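The numbers above follow from Raft's majority rule: N managers need floor(N/2) + 1 of them to agree, so the cluster survives the loss of floor((N-1)/2). A minimal sketch of that arithmetic (the function names are illustrative):

```python
def quorum(managers: int) -> int:
    """Smallest majority of the managers; Raft needs this many to agree."""
    return managers // 2 + 1


def fault_tolerance(managers: int) -> int:
    """Managers that can fail while a majority survives: floor((N - 1) / 2)."""
    return (managers - 1) // 2


for n in range(1, 8):
    print(f"{n} managers: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

The printed table shows why Docker recommends an odd number: adding a fourth manager raises the quorum from 2 to 3, but the swarm still tolerates only one failure.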

Worker nodes are also instances of Docker Engine whose sole purpose is to run containers. Worker nodes require at least one manager node to function. 

By default, all managers are also workers. In a single-manager cluster, you can run commands like docker service create, and the scheduler places all tasks on the local Engine. To prevent a manager node from executing tasks, set its availability to Drain, for example with docker node update --availability drain <node>.

You can promote a worker node to be a manager by running docker node promote. For example, you may want to promote a worker node when you take a manager node offline for maintenance. You can also demote a manager node to a worker node using docker node demote. For more details on node commands in a swarm cluster, see the Docker node CLI reference.
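The maintenance scenario above can be scripted. The helper below is a hypothetical sketch that only builds the CLI calls, in one reasonable order (promote a replacement, drain the manager, then demote it); you would run each command on an existing manager, for example with `subprocess.run(cmd, check=True)`, and adapt the order to your own procedure.

```python
def maintenance_plan(manager: str, replacement: str) -> list:
    """Return the docker CLI calls (as argv lists) to swap a manager out for maintenance.

    Hypothetical helper: it only builds the commands and runs nothing.
    """
    if manager == replacement:
        raise ValueError("the replacement must be a different node")
    return [
        # Promote the replacement first, so the swarm never has fewer managers than now.
        ["docker", "node", "promote", replacement],
        # Drain the manager: it gets no new tasks, and its running tasks are stopped
        # (replicated ones are rescheduled on active nodes).
        ["docker", "node", "update", "--availability", "drain", manager],
        # Demote it, so the Raft group no longer counts it while it is offline.
        ["docker", "node", "demote", manager],
    ]


for cmd in maintenance_plan("manager1", "worker2"):
    print(" ".join(cmd))
```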

How Services Work

Services allow you to deploy an application image to a Docker swarm. Examples of services include an HTTP server, a database, or other software that needs to run in a distributed environment. The basic definition of the service includes a container image to run, and commands to execute inside the running containers. 

  • Service options – when you create a service, you can specify the port to publish for external access, an overlay network for the service to connect to other services in the swarm, CPU and memory restrictions, a rolling update policy, and number of replicas of the image to run in the swarm.
  • Services, scheduling and desired state – when you deploy the service to the swarm, the service definition is the desired state for the service. For example, the desired state might be running three instances of an HTTP listener, with load balancing between them. The swarm manager schedules a replica task on three Docker Engines in the swarm, each of which runs a container with an HTTP listener. If one of these instances fails, the manager recognizes the desired state is not fulfilled, schedules another replica task, and spawns a new container to bring the number of listeners back to three. 
  • When tasks fail – if a task in a Docker swarm fails, it is not recovered or restarted. The orchestrator removes the container related to the failed task and creates a new task to replace it, according to the desired state specified by the service.
  • Pending services – a service is pending if no node in the cluster is currently available to run its tasks. For example, this can happen if all the nodes in the cluster are paused or drained (their availability is set so that the scheduler assigns them no new tasks). A service also stays pending if it has requirements that no node meets, such as a memory reservation or a placement constraint. For example, if you reserve 100 GB of memory for a service and no node has that much memory, the service remains pending until a node that satisfies the requirement joins the swarm.
  • Replicated vs. global services – a replicated service specifies a number of identical tasks you want to run. For example, you might deploy an HTTP service with three replicas, each serving the same content. A global service runs one task on every node in the swarm, with no pre-specified number of tasks. Each time you add a node to the swarm, the orchestrator creates a task for the service and the scheduler assigns it to the new node. Typical global services include monitoring agents and anti-virus scanners.
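The desired-state behavior described in this list can be sketched as a single reconciliation pass. This is a toy model written for illustration, not SwarmKit's scheduler: `Node`, `Task`, and `reconcile` are hypothetical, memory is the only resource modeled, and the least-loaded placement rule is an assumed simplification of how Swarm spreads replicas. It shows failed tasks being replaced rather than restarted, drained nodes losing their tasks, paused nodes keeping their tasks but receiving no new ones, global services placing one task per node, and a service staying pending when no node can satisfy its memory reservation.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Node:
    name: str
    availability: str = "active"  # "active", "pause" or "drain"
    memory_gb: int = 8


@dataclass
class Task:
    task_id: int
    node: str
    state: str = "running"  # "running" or "failed"


_task_ids = count(1)  # replacements always get a fresh ID; failed tasks are not reused


def reconcile(mode, replicas, reserve_gb, nodes, tasks):
    """One pass of a toy orchestrator; returns (tasks, pending)."""
    # Failed tasks are dropped (never restarted); drained nodes lose their tasks,
    # while paused nodes keep running what they already have.
    keep_on = {n.name for n in nodes if n.availability != "drain"}
    running = [t for t in tasks if t.state == "running" and t.node in keep_on]
    # Only active nodes with enough memory for the reservation may receive new tasks.
    usable = [n for n in nodes if n.availability == "active" and n.memory_gb >= reserve_gb]

    if mode == "global":
        # Exactly one task on every usable node, however many nodes there are.
        covered = {t.node for t in running}
        running += [Task(next(_task_ids), n.name) for n in usable if n.name not in covered]
        return running, not running

    # Replicated: trim extra tasks, then add new ones until the count matches.
    running = running[:replicas]
    if len(running) < replicas and not usable:
        return running, True  # pending until a suitable node becomes available
    while len(running) < replicas:
        # Simple spread heuristic: place the task on the least-loaded usable node.
        target = min(usable, key=lambda n: sum(t.node == n.name for t in running))
        running.append(Task(next(_task_ids), target.name))
    return running, False


nodes = [Node("n1"), Node("n2"), Node("n3", availability="drain")]
tasks, pending = reconcile("replicated", 3, 1, nodes, [Task(100, "n1"), Task(101, "n2", state="failed")])
print(sorted(t.node for t in tasks), pending)
```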

Running Docker Swarm

The Docker Engine runs with swarm mode disabled by default. To run Docker in swarm mode, you can either create a new swarm or have the Docker Engine join an existing swarm.

To create a swarm, run the docker swarm init command, which creates a single-node swarm on the current Docker Engine. The current node becomes the manager node for the newly created swarm.

The output of the docker swarm init command tells you which command to run on other Docker hosts so they can join your swarm as worker nodes.

Other nodes can access the SwarmKit API using the manager node's advertised IP address. SwarmKit is a toolkit for orchestrating distributed systems, including node discovery, task scheduling, and more.

Each node requires a secret token to join a swarm. The token for worker nodes is different from the token for manager nodes, and the token is only used at the time a node joins the swarm.

Manager tokens should be strongly protected, because any access to the manager token grants control over an entire swarm. 
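Because the role a node joins with is decided by the token it presents, a script that adds nodes only needs to pick the right token. The sketch below assumes the two tokens were fetched on a manager with `docker swarm join-token -q worker` and `docker swarm join-token -q manager`; `join_command` and `for_logging` are hypothetical helpers, and masking the token in logs is one way to follow the advice about protecting manager tokens. Port 2377 is Docker's default swarm management port.

```python
def join_command(tokens: dict, role: str, manager_addr: str, port: int = 2377) -> list:
    """Build the `docker swarm join` argv for a new node.

    `tokens` is assumed to map "worker" and "manager" to the values printed by
    `docker swarm join-token -q worker` / `-q manager` on an existing manager.
    The token passed with --token decides which role the node joins with.
    """
    if role not in ("worker", "manager"):
        raise ValueError(f"unknown role: {role!r}")
    return ["docker", "swarm", "join", "--token", tokens[role], f"{manager_addr}:{port}"]


def for_logging(cmd: list) -> str:
    """Render a command for logs or chat with the join token masked."""
    shown = ["<token>" if i > 0 and cmd[i - 1] == "--token" else arg
             for i, arg in enumerate(cmd)]
    return " ".join(shown)


# Placeholder values; real tokens start with SWMTKN-1- and must be kept secret.
tokens = {"worker": "WORKER-TOKEN", "manager": "MANAGER-TOKEN"}
cmd = join_command(tokens, "worker", "192.168.99.100")
print(for_logging(cmd))  # docker swarm join --token <token> 192.168.99.100:2377
```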


For more details, see the Swarm documentation: Run Docker in Swarm mode

Common Docker Swarm Operations

Creating and Joining a Swarm  

To create a swarm – run the docker swarm init command, which creates a single-node swarm on the current Docker engine. The current node becomes the manager node for the newly created swarm.

To join a swarm – the output of the docker swarm init command tells you which command to run on other Docker hosts so they can join your swarm as worker nodes, including a “join token”. For example, to add a worker to this swarm, run the following command:

docker swarm join \
--token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c \
192.168.99.100:2377

There is a different join token for worker nodes and manager nodes. The token is only used at the time a node joins the swarm. Manager tokens should be strongly protected, because any access to the manager token grants control over an entire swarm.

You can run docker swarm join-token --rotate worker (or manager) at any time to invalidate the old token for that role and generate a new one, for security purposes.
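Rotation can be illustrated with a small model. `JoinTokens` is a hypothetical class, and random hex strings stand in for real `SWMTKN-1-...` tokens; the point is only that after a rotation, the old token no longer admits new nodes.

```python
import secrets


class JoinTokens:
    """Toy model of per-role join tokens and rotation.

    Hypothetical: random hex strings stand in for real SWMTKN-1-... tokens.
    """

    def __init__(self):
        self._current = {role: secrets.token_hex(16) for role in ("worker", "manager")}

    def token(self, role: str) -> str:
        return self._current[role]

    def rotate(self, role: str) -> str:
        """Mimic `docker swarm join-token --rotate <role>`: the old token stops working."""
        self._current[role] = secrets.token_hex(16)
        return self._current[role]

    def admit(self, presented: str):
        """Return the role granted by a token at join time, or None if it is not current."""
        for role, current in self._current.items():
            if secrets.compare_digest(current, presented):  # constant-time comparison
                return role
        return None


tokens = JoinTokens()
leaked = tokens.token("worker")
tokens.rotate("worker")
print(tokens.admit(leaked))  # None: the leaked token can no longer join nodes
```

Because the token is checked only when a node joins, nodes that already joined with the old token stay in the swarm; rotation affects only future joins.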

Accessing management functionality – swarm nodes can access the SwarmKit API (which provides operations such as node discovery and task scheduling) and overlay networking through an “advertise address” that you specify for the manager node. If you don’t specify an address and the system has a single IP address, Docker uses that address and listens on the default port, 2377.
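The address rule can be sketched as follows. This is a simplified, IPv4-only model with a hypothetical `advertise_address` function: the real `--advertise-addr` flag also accepts an interface name (for example `eth0`) and IPv6 addresses, which the sketch does not handle. When the host has several addresses and none is given, the sketch refuses to guess, mirroring the need to pass `--advertise-addr` explicitly in that case.

```python
from typing import List, Optional

DEFAULT_PORT = 2377  # default swarm management port


def advertise_address(host_ips: List[str], explicit: Optional[str] = None,
                      port: int = DEFAULT_PORT) -> str:
    """Pick the address:port a manager advertises to other nodes (IPv4-only sketch)."""
    if explicit:
        # An explicit value may already carry a port, e.g. "10.0.0.5:4000".
        return explicit if ":" in explicit else f"{explicit}:{port}"
    if len(host_ips) == 1:
        # A single address is unambiguous, so it can be used without being specified.
        return f"{host_ips[0]}:{port}"
    raise ValueError("cannot choose an address automatically; pass --advertise-addr")


addr = advertise_address(["192.168.99.100"])
print(["docker", "swarm", "init", "--advertise-addr", addr])
```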

For more details, see the Swarm documentation: Create a Swarm

Manage Nodes in a Swarm 

To get visibility into the nodes on your swarm, list them using the docker node ls command on a manager node. 

The output shows each node's availability, which identifies whether the scheduler can assign tasks to the node (Active, Pause, or Drain), and its manager status, which identifies whether the node participates in swarm management:

  • A blank value indicates that the node is a worker node.
  • Leader identifies the primary manager node that makes all swarm management and orchestration decisions for the swarm.
  • Reachable identifies manager nodes that are candidates to become the leader if the current leader becomes unavailable.
  • Unavailable signifies a manager node that cannot communicate with other managers. Replace such nodes by promoting a worker node or adding a new manager node.

For more details, see the Swarm Documentation: Manage Nodes in a Swarm
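To check manager status from a script, you can ask docker node ls for just the columns you need with its `--format` option and parse the result. The sketch below uses hypothetical sample output and helper names; it flags managers reported as Unavailable (some Docker versions may print Unreachable instead, so the sketch accepts both) and checks whether the reachable managers still form a majority.

```python
# Hypothetical input, as produced on a manager by:
#   docker node ls --format '{{.Hostname}};{{.Availability}};{{.ManagerStatus}}'
SAMPLE = """\
manager1;Active;Leader
manager2;Active;Reachable
manager3;Active;Unavailable
worker1;Active;
"""

DOWN = {"Unavailable", "Unreachable"}  # treat both spellings as a lost manager


def parse_nodes(text: str) -> list:
    """Split each output line into hostname, availability and manager status."""
    nodes = []
    for line in text.strip().splitlines():
        # Pad so that a worker's empty trailing field still yields three values.
        hostname, availability, manager_status = (line.split(";") + ["", ""])[:3]
        nodes.append({"hostname": hostname, "availability": availability,
                      "manager_status": manager_status})
    return nodes


def manager_health(nodes: list) -> dict:
    managers = [n for n in nodes if n["manager_status"]]  # blank status means worker
    up = [n for n in managers if n["manager_status"] in ("Leader", "Reachable")]
    return {
        "managers": len(managers),
        "up": len(up),
        "has_quorum": len(up) > len(managers) // 2,  # Raft needs a strict majority
        "replace": [n["hostname"] for n in managers if n["manager_status"] in DOWN],
    }


print(manager_health(parse_nodes(SAMPLE)))
```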

Deploying Services to a Swarm 

After you declare the desired state for a service you want to run in your cluster, you rely on Docker Swarm to maintain that state. 

To create a service, use the docker service create command. You can name the service with the --name flag. The container image to use follows the options, and you can optionally append a command for the service's containers to run.

The command below starts a service called my_web, which uses the alpine image and runs the command ping docker.com:

$ docker service create --name my_web alpine ping docker.com

To remove a service, use the docker service remove command. You can remove a service by its ID or name, as shown in the output of the docker service ls command. The following command removes the my_web service:

$ docker service remove my_web
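If you create services from scripts, it helps to build the command as an argument list rather than a shell string. `service_create_cmd` is a hypothetical helper covering only a few real docker service create options (`--replicas`, `--mode global`, and the long `--publish published=...,target=...` syntax); options must come before the image, and anything after the image becomes the containers' command.

```python
def service_create_cmd(name, image, command=(), replicas=None, publish=None,
                       global_mode=False):
    """Build a `docker service create` argv (hypothetical helper; it runs nothing)."""
    cmd = ["docker", "service", "create", "--name", name]
    if global_mode:
        cmd += ["--mode", "global"]  # one task per node, so no replica count
    elif replicas is not None:
        cmd += ["--replicas", str(replicas)]
    if publish is not None:
        published, target = publish  # (port on every swarm node, port in the container)
        cmd += ["--publish", f"published={published},target={target}"]
    # Options come first, then the image, then the optional command for the containers.
    return cmd + [image, *command]


print(" ".join(service_create_cmd("web", "nginx", replicas=3, publish=(8080, 80))))
```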

To update service configuration, use the docker service update command. This lets you configure settings for a service after it is created, including publishing ports to clients outside the swarm, resource constraints, and whether the service should start automatically when Docker starts. 

The following is a list of some of the configuration options you can specify for a service: 

  • Configure runtime environment
  • Update the command an existing service runs
  • Specify the image version a service should use
  • Publish ports
  • Connect the service to an overlay network
  • Control service scale and placement
  • Reserve memory or CPUs for a service
  • Configure a service’s update behavior
  • Automatically roll back if an update fails
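Several of these options work together during a rolling update. On the command line they map to flags such as docker service update --image <IMAGE> --update-parallelism 2 --update-delay 10s --update-failure-action rollback <SERVICE>. The simulation below is a hypothetical model of that behavior, not Docker's updater: replicas are updated in batches of `parallelism`, and when a batch fails a health check the update pauses, continues, or rolls back. The delay between batches is not simulated.

```python
def rolling_update(images, new_image, parallelism=1,
                   healthy=lambda index, image: True, failure_action="pause"):
    """Simulate a batched rolling update; returns (images_after, status).

    Hypothetical model: `healthy(index, image)` stands in for the checks Docker
    runs on each updated task, and `failure_action` mirrors the values accepted by
    --update-failure-action ("pause", "continue", or "rollback").
    """
    if failure_action not in ("pause", "continue", "rollback"):
        raise ValueError(f"unknown failure action: {failure_action!r}")
    previous = list(images)  # the state a rollback returns to
    current = list(images)
    # A parallelism of 0 means "all at once", as with --update-parallelism 0.
    step = parallelism if parallelism > 0 else max(len(current), 1)
    for start in range(0, len(current), step):
        batch = range(start, min(start + step, len(current)))
        for i in batch:
            current[i] = new_image  # replace this replica's task with an updated one
        if all(healthy(i, current[i]) for i in batch):
            continue
        if failure_action == "rollback":
            return previous, "rolled back"
        if failure_action == "pause":
            return current, "paused"
        # "continue": ignore the failure and move on to the next batch
    return current, "completed"


print(rolling_update(["app:1"] * 4, "app:2", parallelism=2))
print(rolling_update(["app:1"] * 4, "app:2", parallelism=2,
                     healthy=lambda i, image: i < 2, failure_action="rollback"))
```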

For more details, see the Swarm documentation: Deploying Services to a Swarm

Admin Commands 

Docker uses the Raft Consensus Algorithm to manage swarms. Raft requires a majority of manager nodes (quorum) to agree on proposed updates to the swarm, such as node additions or removals. 

For fault tolerance in the Raft algorithm, you should always maintain an odd number of managers in the swarm to better support manager node failures. Having an odd number of managers results in a higher chance that a quorum remains available to process requests, if the network is partitioned into two sets.
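The partition argument can be checked by enumerating every way to split N managers into two groups. In the sketch below (the function name is illustrative), only the larger group can reach a majority. With an odd number of managers, one side always keeps quorum; with an even number, an even split leaves neither side able to process updates.

```python
def partition_outcomes(managers: int) -> dict:
    """For every split into two non-empty sides, report whether one side keeps quorum."""
    quorum = managers // 2 + 1  # Raft majority
    outcomes = {}
    for smaller in range(1, managers // 2 + 1):
        larger = managers - smaller
        # The smaller side holds at most half the managers, so only the larger can win.
        outcomes[(smaller, larger)] = larger >= quorum
    return outcomes


for n in (3, 4, 5, 6):
    print(n, partition_outcomes(n))
```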

You can monitor node health by running the docker node ls command on a manager node, or by inspecting a specific node with docker node inspect <node-id>.

You can forcibly remove a node from a swarm without shutting it down first, by using the docker node rm command and a --force flag. This might be needed if a node becomes compromised. Here is what this looks like:

$ docker node rm --force node9
Node node9 removed from swarm

You can backup a swarm using any manager node, as follows: 

  1. If auto-lock is enabled on the swarm, make sure you have the unlock key; you need it to restore the swarm from the backup.
  2. Stop Docker on the manager node before backing up the data. This ensures that no data changes on the manager during the backup.
  3. Back up the entire /var/lib/docker/swarm/ directory, which stores the swarm state and the manager logs.
  4. Restart Docker on the manager node.
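The steps above can be sketched as a dry-run script. This assumes a Linux manager where Docker is managed by systemd and uses the default data root /var/lib/docker; `backup_plan` and the archive name are hypothetical. It only returns the commands, which you would run as root, in order, for example with `subprocess.run(step, check=True)`. On hosts that use socket activation, `docker.socket` may start the daemon again if something connects during the backup, so check your init setup first.

```python
from datetime import date

DOCKER_ROOT = "/var/lib/docker"  # default data root; the swarm state lives in ./swarm


def backup_plan(dest_dir: str, day: date, autolock: bool = False) -> list:
    """Return the backup steps as argv lists (hypothetical helper; it runs nothing)."""
    archive = f"{dest_dir}/swarm-backup-{day.isoformat()}.tar.gz"
    steps = []
    if autolock:
        # Record the unlock key while Docker is still running; a restore needs it.
        steps.append(["docker", "swarm", "unlock-key", "-q"])
    steps += [
        ["systemctl", "stop", "docker"],  # assumes a systemd-managed Docker service
        ["tar", "-czf", archive, "-C", DOCKER_ROOT, "swarm"],  # the whole swarm/ directory
        ["systemctl", "start", "docker"],
    ]
    return steps


for step in backup_plan("/backups", date(2017, 12, 1)):
    print(" ".join(step))
```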

For more details and additional admin functions, see the Swarm Documentation: Swarm Administration Guide

Docker Swarm and Kubernetes

As of the time of this writing in late 2017, Docker has announced it will support both Swarm and Kubernetes as orchestration engines. Analysts have raised questions about the future of Swarm in light of this decision.

Some believe that with support for Kubernetes, Docker Swarm, which is less robust and has a smaller feature set, will become obsolete. Others say that Swarm will continue to be relevant, as a simpler orchestration tool which is suitable for organizations with smaller container workloads. 

Summary

Docker Swarm is an orchestration tool provided as part of the Docker platform. It is an alternative to other popular container orchestration tools, such as Kubernetes and Apache Mesos. Swarm is considered easy to use, especially for those already familiar with the Docker model, and should be an easy first step into the world of managing multiple containers and composing them into services and applications.

Further Reading


Top Swarm Tutorials from the Community

Tutorial by: Okta

Length: Short

Can help you learn: How to create a swarm, create some virtual machines that will be part of the swarm, deploy containers to the swarm, and scale those containers horizontally.

Tutorial steps:

  • Install Docker Dependencies
  • Create Some Virtual Machines with Docker Machine
  • Initialize Docker Swarm Mode
  • Manage Docker Swarm Services

Tutorial by: Docker

Length: Short

Can help you learn: All about the features of swarm mode, including initializing a cluster of Docker Engines in swarm mode and adding nodes to the swarm.

Tutorial steps:

  • Create a cluster of three Linux hosts
  • Get the IP address for the manager machine
  • Open protocols and ports between the hosts

Tutorial by: Ben Hall, Katacoda

Length: Short

Can help you learn: Pretty much everything you can do in Docker Swarm, with a live browser-based environment to experiment with the deployment.

Includes tutorials about:

  • Getting Started with Swarm Mode
  • Overlay networks
  • Load balancing and service discovery
  • Apply rolling updates across cluster
  • Add health check for containers
  • (and many more)

Tutorial by: Luc - Play with Docker

Length: Short

Can help you learn: How to deploy a stack (a multi-service application) to a swarm with docker stack deploy

Tutorial steps:

  • Initialize a swarm
  • Show members of the swarm
  • Clone sample application
  • Deploy stack using docker stack deploy
  • Check the stack has been deployed

Tutorial by: Stelligent

Length: Medium

Can help you learn: A basic overview of setting up a swarm cluster on AWS and deploying a stack to Docker Swarm.

Tutorial steps:

  • Create an AWS CloudFormation template for highly available swarm cluster
  • Deploy a stack to Docker Swarm
  • Scale services
  • Update your stack

Tutorial by: Digital Ocean

Length: Medium

Can help you learn: How to set up a simple three-node cluster, with each node running Ubuntu 16.04

Tutorial steps:

  • Provision the cluster nodes
  • Configure firewall rules
  • Initialize cluster manager
  • Add two other nodes to the cluster
  • Run a test NGINX container

Tutorial by: Semaphore

Length: Long

Can help you learn: How Docker Swarm makes services available for consumption, both internally and externally, including networking and service discovery.

Tutorial steps:

  • Requires a four-node swarm cluster
  • Create a service and overlay network
  • Scale to four replicas and identify virtual IPs
  • Learn about embedded DNS servers and Docker
  • See how load balancing works when servicing a request
