Can't use etcd in the cluster - CoreOS

I've read this guide about cluster architectures, and I've built a development cluster with this config:
All machines are on the same subnet.
One machine acts as a master and runs etcd only (its IP address is 192.168.0.95).
#cloud-config
coreos:
  etcd:
    discovery: #url
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
I then have three workers that run fleet with this config:
#cloud-config
coreos:
  fleet:
    etcd_servers: "http://192.168.0.95:4001"
  units:
    - name: fleet.service
      command: start
I can control machines and units via fleetctl, but I haven't been able to use etcdctl properly.
Each worker uses its local etcd when working with keys, so keys only exist on the worker they were created on!
Shouldn't the workers be talking to the central etcd, given the above config?
Thanks

You'll need to point etcdctl at the etcd cluster, just like you're doing with fleet:
etcdctl --peers "http://192.168.0.95:4001" ls /
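If you don't want to pass --peers on every call, etcdctl (at least the etcd 0.4/2.x tooling CoreOS shipped at the time) should also honour the ETCDCTL_PEERS environment variable, so you can export it once on each worker. A minimal sketch, assuming the master address above:
export ETCDCTL_PEERS="http://192.168.0.95:4001"
etcdctl set /test/hello world
etcdctl ls /
Keys set this way should then be visible from any worker, since they all talk to the same etcd instance.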

Related

User cannot log into EKS Cluster using kubectl

I am trying to host an application in AWS Elastic Kubernetes Service (EKS). I have configured the EKS cluster using the AWS Console with an IAM user (user1), configured the Node Group and added a Node to the EKS cluster, and everything is working fine.
In order to connect to the cluster, I spun up an EC2 instance (CentOS 7) and configured the following:
1. Installed docker, kubeadm, kubelet and kubectl.
2. Installed and configured AWS Cli V2.
I used the access key ID and secret access key of user1 to configure the AWS CLI from within the EC2 instance in order to connect to the cluster using kubectl.
I ran the below commands in order to connect to the cluster as user1:
1. aws sts get-caller-identity
2. aws eks update-kubeconfig --name trojanwall --region ap-south-1
I am able to do each and every operation in the EKS cluster as user1.
However, I have now created a new user named 'user2' and replaced the current access key ID and secret access key with those of user2. I did the same steps, and when I try to run 'kubectl get pods', I get the following error:
error: You must be logged in to the server (Unauthorized)
Result after running kubectl describe configmap -n kube-system aws-auth as user1:
Name:         aws-auth
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
mapRoles:
----
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::XXXXXXXXXXXX:role/AWS-EC2-Role
  username: system:node:{{EC2PrivateDNSName}}

BinaryData
====

Events:  <none>
Does anyone know how to resolve this?
When you create an EKS cluster, only the IAM user that created the cluster has access to it. In order to allow someone else to access the cluster, you need to add that user to the aws-auth ConfigMap. To do this, add the following to its data section:
mapUsers: |
  - userarn: arn:aws:iam::<your-account-id>:user/<your-username>
    username: <your-username>
    groups:
      - system:masters
You can use different groups, based on the rights you want to give to that user.
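For illustration, assuming you still have access as user1, the quickest route is usually to edit the ConfigMap in place (kubectl edit configmap aws-auth -n kube-system) and add the mapUsers block next to the existing mapRoles; the account ID and user name below are placeholders:
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/AWS-EC2-Role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam::XXXXXXXXXXXX:user/user2
      username: user2
      groups:
        - system:masters
After saving, user2 should be able to run kubectl get pods without the Unauthorized error.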
If you don't already have a copy of the config map on your machine:
Download the config map: curl -o aws-auth-cm.yaml https://amazon-eks.s3.us-west-2.amazonaws.com/cloudformation/2020-10-29/aws-auth-cm.yaml
Replace the default values with your values (role ARN, username, account ID, ...)
Add the mapUsers section as described above
From a terminal, execute kubectl apply -f aws-auth-cm.yaml
You can also follow the steps from the documentation (it's more detailed).

kubectl cluster-info why is running on control plane and not master node

Why does kubectl cluster-info report the control plane rather than the master node?
And why, on the control plane, is it running on a specific IP address (https://192.168.49.2:8443)
and not on localhost or 127.0.0.1?
Running the following command in terminal:
minikube start --driver=docker
😄 minikube v1.20.0 on Ubuntu 16.04
✨ Using the docker driver based on user configuration
🎉 minikube 1.21.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.21.0
💡 To disable this notice, run: 'minikube config set WantUpdateNotification false'
👍 Starting control plane node minikube in cluster minikube
🚜 Pulling base image ...
> gcr.io/k8s-minikube/kicbase...: 358.10 MiB / 358.10 MiB 100.00% 797.51 K
❗ minikube was unable to download gcr.io/k8s-minikube/kicbase:v0.0.22, but successfully downloaded kicbase/stable:v0.0.22 as a fallback image
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🐳 Preparing Kubernetes v1.20.2 on Docker 20.10.6 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
kubectl cluster-info
Kubernetes control plane is running at https://192.168.49.2:8443
KubeDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The Kubernetes project is making an effort to move away from wording that can be considered offensive, and one concrete recommendation is renaming master to control-plane. In other words, control-plane and master mean essentially the same thing, and the goal is to switch the terminology to use control-plane exclusively going forward. (More info in this answer.)
The kubectl command is a command line interface that executes on a client (i.e. your computer) and interacts with the cluster through the control-plane.
The IP address you are seeing through cluster-info is the IP address through which you reach the control-plane.
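If you want to double-check which endpoint your kubectl is actually configured to talk to (the minikube container's IP in this case), you can inspect the active kubeconfig, for example:
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
which should print https://192.168.49.2:8443 for the minikube cluster shown above.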

Kops rolling-update fails with "Cluster did not pass validation" for master node

For some reason my master node can no longer connect to my cluster after upgrading from kubernetes 1.11.9 to 1.12.9 via kops (version 1.13.0). In the manifest I'm upgrading kubernetesVersion from 1.11.9 -> 1.12.9. This is the only change I'm making. However when I run kops rolling-update cluster --yes I get the following error:
Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-01234567" has not yet joined cluster.
Cluster did not validate within 5m0s
After that if I run a kubectl get nodes I no longer see that master node in my cluster.
Doing a little bit of debugging by sshing into the disconnected master node instance I found the following error in my api-server log by running sudo cat /var/log/kube-apiserver.log:
controller.go:135] Unable to perform initial IP allocation check: unable to refresh the service IP block: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused
I suspect the issue might be related to etcd, because when I run sudo netstat -nap | grep LISTEN | grep etcd there is no output.
Anyone have any idea how I can get my master node back in the cluster or have advice on things to try?
I have done some research and have a few ideas for you:
If there is no output from the etcd grep, it means that your etcd server is down. Check the logs of the exited etcd container: docker ps -a | grep Exited | grep etcd, and then docker logs <etcd-container-id>.
Try these instructions I found (the relevant etcdctl commands are sketched after the list):
1 - Remove the old master from the etcd cluster using etcdctl. You will need to connect to the etcd-server container to do this.
2 - On the new master node, stop the kubelet and protokube services.
3 - Empty the etcd data dirs (data and data-events).
4 - Edit /etc/kubernetes/manifests/etcd.manifest and etcd-events.manifest, changing ETCD_INITIAL_CLUSTER_STATE from new to existing.
5 - Get the name and PeerURLs from the new master and use etcdctl to add the new master to the cluster (etcdctl member add "name" "PeerURL"). You will need to connect to the etcd-server container to do this.
6 - Start the kubelet and protokube services on the new master.
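For steps 1 and 5, the etcdctl (v2 API) commands look roughly like this when run from inside the etcd-server container; the member ID, name and peer URL below are placeholders, and the peer port depends on your etcd manifest:
etcdctl member list
etcdctl member remove <old-member-id>
etcdctl member add <new-master-name> http://<new-master-ip>:2380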
If that is not the case, then you might have a problem with the certs. They are provisioned during the creation of the cluster, and some of them include the allowed master endpoints. If that is the case, you'd need to create new certs and roll them out for the API server/etcd clusters.
Please let me know if that helped.

Can I get a Pod's pause container ID via the Kubernetes API?

When I list the pods in a cluster (on a specific node and in all namespaces) then each pod listed also contains the container statuses, and therein I get the container runtime engine IDs of each of the containers listed.
To illustrate, I'm using this Python3 script to access the cluster API via the official Kubernetes Python client; this is a slightly modified version from How to find all Kubernetes Pods on the same node from a Pod using the official Python client?
from kubernetes import client, config
import os

def main():
    # it works only if this script is run by K8s as a POD
    config.load_incluster_config()
    # use this outside pods
    # config.load_kube_config()
    # grab the node name from the pod environment vars
    node_name = os.environ.get('KUHBERNETES_NODE_NAME', None)
    v1 = client.CoreV1Api()
    print("Listing pods with their IPs on node: ", node_name)
    # field selectors are a string, you need to parse the fields from the pods here
    field_selector = 'spec.nodeName='+node_name
    ret = v1.list_pod_for_all_namespaces(watch=False, field_selector=field_selector)
    for i in ret.items:
        print("%s\t%s\t%s" %
              (i.status.pod_ip, i.metadata.namespace, i.metadata.name))
        for c in i.status.container_statuses:
            print("\t%s\t%s" %
                  (c.name, c.container_id))

if __name__ == '__main__':
    main()
N.B. The Pod uses a suitable ServiceAccount which enables it to list pods in all namespaces.
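For completeness, the node name the script reads from KUHBERNETES_NODE_NAME is not set automatically; it has to be injected via the downward API in the Pod spec. A minimal sketch, where the ServiceAccount name and image are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: pyk8s
  namespace: gw
spec:
  serviceAccountName: pod-lister   # hypothetical ServiceAccount allowed to list pods cluster-wide
  containers:
    - name: py8ks
      image: python:3.9            # illustrative image that runs the script above
      env:
        - name: KUHBERNETES_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName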
A typical result output when run on a minikube setup might look like this:
Listing pods with their IPs on node: minikube
172.17.0.5 cattle-system cattle-cluster-agent-c949f5b48-llm65
cluster-register docker://f12fcb1acbc2e7c01c24dbd831ed53ab2a6df2353abe80988ae132c39f7c68c6
10.0.2.15 cattle-system cattle-node-agent-hmq86
agent docker://e335a3d30ea37887ac2a1a1cc339eabb0a0098471f86db1926cfe02eef2c6b8f
172.17.0.6 gw pyk8s
py8ks docker://1272747b52983e8f745bd118b2d935c1d314e9c6cc310e88013021ba974bc030
172.17.0.4 kube-system coredns-c4cffd6dc-7lsdn
coredns docker://8b0c3c67532ee2d7d16958a33cb942d5bd09ed37ded1d570830b5f7e5f7a09ab
10.0.2.15 kube-system etcd-minikube
etcd docker://5e0e0ee48248e9779a2a5f9347a39c58743562b10719a31d7d6fc0af5e79e093
10.0.2.15 kube-system kube-addon-manager-minikube
kube-addon-manager docker://96908bc5d5fd9b87779c8a8544591e5aeda2d58956fb365ab595681605b01001
10.0.2.15 kube-system kube-apiserver-minikube
kube-apiserver docker://0711ec9a2321b1b5a801ab2b19409a1edc731058aa994978f989185efc4c8294
10.0.2.15 kube-system kube-controller-manager-minikube
kube-controller-manager docker://16d2e11a8dea2a46cd44bc97a5f894e7ff9da2da70f3c24376b4189dd912336e
172.17.0.2 kube-system kube-dns-86f4d74b45-wbdf6
dnsmasq docker://653c7ef27760a820449ee518b59e39ab4a7f65cade996ed85313c98038827f67
kubedns docker://6cf6aaeac1192cf1d580293e03164db57bc70bce41cf91e5cac081010fe48cf7
sidecar docker://9816e10d8455988aa400f98df32cfa69ce89fbfc3e3e1554145d9d6418c02157
10.0.2.15 kube-system kube-proxy-ll7lq
kube-proxy docker://6b8c7ce1ae3c8fbc487bf05ccca9105dffaf675f916cdb62a595d8be7902e69b
10.0.2.15 kube-system kube-scheduler-minikube
kube-scheduler docker://ab79e46ba900753d86b7000061720551a199c0ea6eee923fcd86bda2d86cc54a
172.17.0.3 kube-system kubernetes-dashboard-6f4cfc5d87-bmnl8
kubernetes-dashboard docker://a73ef6b30fb87826a4a71ba428a01511278a759d69fade82ddd654911ec3f14f
10.0.2.15 kube-system storage-provisioner
storage-provisioner docker://51eaf90bc3ae11baa354a436e366730c19206c73743c6517a0ad9eb8f0b89896
Please note that this lists the container IDs of the pod containers, except the pause container IDs. Is there an API method to also get/list the container IDs of the pause containers in pods?
I tried searching for things like "kubernetes api pod pause container id" ... but I did not get any useful answers, except the usual API results for containerStatuses, etc.
After some research into how Kubernetes' Docker shim works, it's clear that the pause containers are not visible at the Kubernetes cluster API. That's because pause containers are an artefact required with some container engines, such as Docker, but not in others (CRI-O if I'm not mistaken).
However, when the low-level Docker container view is necessary and needs to be related to the Kubernetes node-scheduled pod view, then the predictable Docker container naming scheme used in the Kubernetes Docker shim can be used. The shim creates container names of the form k8s_container_pod_namespace_uid_attempt, with an optional _random suffix in case of hitting the Docker <=1.11 name conflict bug (see the example after the list).
k8s is the fixed prefix which triggers the shim to regard this container as a Kubernetes container.
container is the name as specified in the pod spec. Please note that Kubernetes only allows lowercase a-z, 0-9, and dashes. Pause containers thus get the "reserved" name "POD" in all-uppercase.
pod is the pod name.
namespace is the namespace name as assigned, or "default".
uid is the pod UID, in varying formats.
attempt is a counter starting from 0 that the shim needs in order to correctly manage pod updates, that is, container cleanup, etc.
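So if you do need the pause container IDs, one workaround (assuming the Docker runtime) is to query the Docker daemon on the node and filter on that naming scheme, for example:
docker ps --filter "name=k8s_POD" --format "{{.ID}}\t{{.Names}}"
The pod name and namespace can then be parsed back out of each container name.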
See also:
container names implementation
name of pause pod
Docker name conflict bug

how to create docker overlay network between multi hosts?

I have been trying to create an overlay network between two hosts with no success. I keep getting the error message:
mavungu#mavungu-Aspire-5250:~$ sudo docker -H tcp://192.168.0.18:2380 network create -d overlay myapp
Error response from daemon: 500 Internal Server Error: failed to parse pool request for address space "GlobalDefault" pool "" subpool "": cannot find address space GlobalDefault (most likely the backing datastore is not configured)
mavungu#mavungu-Aspire-5250:~$ sudo docker network create -d overlay myapp
[sudo] password for mavungu:
Error response from daemon: failed to parse pool request for address space "GlobalDefault" pool "" subpool "": cannot find address space GlobalDefault (most likely the backing datastore is not configured)
My environment details:
mavungu#mavungu-Aspire-5250:~$ sudo docker info
Containers: 1
Images: 364
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 368
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-26-generic
Operating System: Ubuntu 15.04
CPUs: 2
Total Memory: 3.593 GiB
Name: mavungu-Aspire-5250
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
I have a swarm cluster working well with consul as the discovery mechanism:
mavungu#mavungu-Aspire-5250:~$ sudo docker -H tcp://192.168.0.18:2380 info
Containers: 4
Images: 51
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 2
mavungu-Aspire-5250: 192.168.0.36:2375
└ Containers: 1
└ Reserved CPUs: 0 / 2
└ Reserved Memory: 0 B / 3.773 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.19.0-26-generic, operatingsystem=Ubuntu 15.04, storagedriver=aufs
mavungu-HP-Pavilion-15-Notebook-PC: 192.168.0.18:2375
└ Containers: 3
└ Reserved CPUs: 0 / 4
└ Reserved Memory: 0 B / 3.942 GiB
└ Labels: executiondriver=native-0.2, kernelversion=4.2.0-19-generic, operatingsystem=Ubuntu 15.10, storagedriver=aufs
CPUs: 6
Total Memory: 7.715 GiB
Name: bb47f4e57436
My consul is available at 192.168.0.18:8500 and it works well with the swarm cluster.
I would like to be able to create an overlay network across the two hosts. I have configured the docker engines on both hosts with these additional settings:
DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.18:0"
DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.36:0"
I had to stop and restart the engines and reset the swarm cluster...
After failing to create the overlay network, I changed the --cluster-advertise setting to this :
DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.18:2375"
DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.36:2375"
But it still did not work. I am not sure what ip:port should be set for --cluster-advertise=. Docs, discussions and tutorials are not clear on this advertise setting.
There is something that I am getting wrong here. Please help.
When you execute the docker run command, be sure to add --net myapp.
Here is a full step-by-step tutorial (online version):
How to deploy swarm on a cluster with multi-hosts network
TL;DR: step-by-step tutorial to deploy a multi-host network using Swarm. I wanted to put this tutorial online ASAP so I didn't even take time for the presentation. The markdown file is available on the github of my website. Feel free to adapt and share it; it is licensed under a Creative Commons Attribution 4.0 International License.
Prerequisites
Environment
Tutorial done with docker engine 1.9.0.
Swarm agents are discovered through a shared file (other methods are available).
Consul 0.5.2 is used for discovery for the multi-hosts network swarm containers.
Swarm manager and consul master will be run on the machine named bugs20. Other nodes, bugs19, bugs18, bugs17 and bugs16, will be swarm agents and consul members.
Before we start
Consul is used for the multi-host networking; any other key-value store could be used -- note that the engine supports Consul, etcd, and ZooKeeper.
Tokens (or a static file) are used for swarm agent discovery. Tokens use a hosted REST API; a static file is preferred.
The network
The network is range 192.168.196.0/25. The host named bugsN has the IP address 192.168.196.N.
The docker daemon
All nodes are running docker daemon as follow:
/usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise eth0:2375 --cluster-store consul://127.0.0.1:8500
Options details:
-H tcp://0.0.0.0:2375
Binds the daemon to an interface so it can be part of the swarm cluster. An IP address can obviously be specified; it is a better solution if you have several NICs.
--cluster-advertise eth0:2375
Defines the interface and the port the docker daemon should use to advertise itself.
--cluster-store consul://127.0.0.1:8500
Defines the URL of the distributed storage backend. In our case we use consul, though there are other discovery tools that can be used; if you want to make up your mind, you may be interested in reading this service discovery comparison.
As consul is distributed, the URL can be local (remember, swarm agents are also consul members), and this is more flexible as you don't have to specify the IP address of the consul master; it can be selected after the docker daemon has been started.
The aliases used
In the following commands these two aliases are used:
alias ldocker='docker -H tcp://0.0.0.0:2375'
alias swarm-docker='docker -H tcp://0.0.0.0:5732' #used only on the swarm manager
Be sure to have the path of the consul binary in your $PATH. Once you are in the right directory, export PATH=$PATH:$(pwd) will do the trick.
It is also assumed that the variable $IP has been properly set and exported. It can be done, thanks to .bashrc or .zshrc or else, with something like this:
export IP=$(ifconfig |grep "192.168.196."|cut -d ":" -f 2|cut -d " " -f 1)
Consul
Let's start to deploy all consul members and master as needed.
Consul master (bugs20)
consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul -node=master20 -bind=$IP -client=$IP
Options details:
agent -server
Start the consul agent as a server.
-bootstrap-expect 1
We expect only one master.
-node=master20
This consul server/master will be named "master20".
-bind=192.168.196.20
Specifies the IP address on which it should be bound. Optional if you have only one NIC.
-client=192.168.196.20
Specifies the RPC IP address to which the server should be bound. By default it is localhost. Note that I am unsure about the necessity of this option; it forces you to add -rpc-addr=192.168.196.20:8400 to local requests such as consul members -rpc-addr=192.168.196.20:8400 or consul join -rpc-addr=192.168.196.20:8400 192.168.196.9 (to join the consul member that has the IP address 192.168.196.9).
Consul members (bugs{16..19})
consul agent -data-dir /tmp/consul -node=$HOSTNAME -bind=$IP
It is suggested to use tmux, or similar, with the option :setw synchronize-panes on, so that this single command starts all the consul members at once.
Join consul members
consul join -rpc-addr=192.168.196.20:8400 192.168.196.16
consul join -rpc-addr=192.168.196.20:8400 192.168.196.17
consul join -rpc-addr=192.168.196.20:8400 192.168.196.18
consul join -rpc-addr=192.168.196.20:8400 192.168.196.19
A one-line command can be used too. If you are using zsh, then consul join -rpc-addr=192.168.196.20:8400 192.168.196.{16..19} is enough; otherwise a for loop works: for i in $(seq 16 19); do consul join -rpc-addr=192.168.196.20:8400 192.168.196.$i; done. You can verify that your members are part of your consul deployment with the command:
consul members -rpc-addr=192.168.196.20:8400
Node      Address              Status  Type    Build  Protocol  DC
master20  192.168.196.20:8301  alive   server  0.5.2  2         dc1
bugs19    192.168.196.19:8301  alive   client  0.5.2  2         dc1
bugs18    192.168.196.18:8301  alive   client  0.5.2  2         dc1
bugs17    192.168.196.17:8301  alive   client  0.5.2  2         dc1
bugs16    192.168.196.16:8301  alive   client  0.5.2  2         dc1
Consul members and master are deployed and working. The focus will now be on docker and swarm.
Swarm
In the following, the creation of the swarm manager and the discovery of the swarm members are detailed using two different methods: token and static file. Tokens use a hosted discovery service on Docker Hub, while the static file is just local and does not use the network (nor any server). The static file solution should be preferred (and is actually easier).
[static file] Start the swarm manager while joining swarm members
Create a file named /tmp/cluster.disco with the content swarm_agent_ip:2375.
cat /tmp/cluster.disco
192.168.196.16:2375
192.168.196.17:2375
192.168.196.18:2375
192.168.196.19:2375
Then just start the swarm manager as follows:
ldocker run -v /tmp/cluster.disco:/tmp/cluster.disco -d -p 5732:2375 swarm manage file:///tmp/cluster.disco
And you're done!
[token] Create and start the swarm manager
On the swarm master (bugs20), create a swarm:
ldocker run --rm swarm create > swarm_id
This creates a swarm and saves the token ID in the file swarm_id in the current directory. Once created, the swarm manager needs to be run as a daemon:
ldocker run -d -p 5732:2375 swarm manage token://`cat swarm_id`
To verify that it has started, you can run:
ldocker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d28238445532 swarm "/swarm manage token:" 5 seconds ago Up 4 seconds 0.0.0.0:5732->2375/tcp cranky_liskov
[token] Join swarm members into the swarm cluster
Then the swarm manager needs some swarm agents to join.
ldocker run swarm join --addr=192.168.196.16:2375 token://`cat swarm_id`
ldocker run swarm join --addr=192.168.196.17:2375 token://`cat swarm_id`
ldocker run swarm join --addr=192.168.196.18:2375 token://`cat swarm_id`
ldocker run swarm join --addr=192.168.196.19:2375 token://`cat swarm_id`
std[in|out] will be busy, so these commands need to be run in different terminals. Adding -d before the join should solve this and enables a for-loop to be used for the joins, as sketched below.
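A sketch of that loop, assuming the agent IPs used in this tutorial:
for i in $(seq 16 19); do
  ldocker run -d swarm join --addr=192.168.196.$i:2375 token://$(cat swarm_id)
done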
After the join of the swarm members:
auzias#bugs20:~$ ldocker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d1de6e4ee3fc swarm "/swarm join --addr=1" 5 seconds ago Up 4 seconds 2375/tcp fervent_lichterman
338572b87ce9 swarm "/swarm join --addr=1" 6 seconds ago Up 4 seconds 2375/tcp mad_ramanujan
7083e4d6c7ea swarm "/swarm join --addr=1" 7 seconds ago Up 5 seconds 2375/tcp naughty_sammet
0c5abc6075da swarm "/swarm join --addr=1" 8 seconds ago Up 6 seconds 2375/tcp gloomy_cray
ab746399f106 swarm "/swarm manage token:" 25 seconds ago Up 23 seconds 0.0.0.0:5732->2375/tcp ecstatic_shockley
After the discovery of the swarm members
To verify that the members are properly discovered, you can execute swarm-docker info:
auzias#bugs20:~$ swarm-docker info
Containers: 4
Images: 4
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 4
bugs16: 192.168.196.16:2375
└ Containers: 0
└ Reserved CPUs: 0 / 12
└ Reserved Memory: 0 B / 49.62 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
bugs17: 192.168.196.17:2375
└ Containers: 0
└ Reserved CPUs: 0 / 12
└ Reserved Memory: 0 B / 49.62 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
bugs18: 192.168.196.18:2375
└ Containers: 0
└ Reserved CPUs: 0 / 12
└ Reserved Memory: 0 B / 49.62 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
bugs19: 192.168.196.19:2375
└ Containers: 4
└ Reserved CPUs: 0 / 12
└ Reserved Memory: 0 B / 49.62 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
CPUs: 48
Total Memory: 198.5 GiB
Name: ab746399f106
At this point swarm is deployed and all the containers you run will be scheduled across the different nodes. For example, by executing several:
auzias#bugs20:~$ swarm-docker run --rm -it ubuntu bash
and then a:
auzias#bugs20:~$ swarm-docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
45b19d76d38e ubuntu "bash" 6 seconds ago Up 5 seconds bugs18/boring_mccarthy
53e87693606e ubuntu "bash" 6 seconds ago Up 5 seconds bugs16/amazing_colden
b18081f26a35 ubuntu "bash" 6 seconds ago Up 4 seconds bugs17/small_newton
f582d4af4444 ubuntu "bash" 7 seconds ago Up 4 seconds bugs18/naughty_banach
b3d689d749f9 ubuntu "bash" 7 seconds ago Up 4 seconds bugs17/pensive_keller
f9e86f609ffa ubuntu "bash" 7 seconds ago Up 5 seconds bugs16/pensive_cray
b53a46c01783 ubuntu "bash" 7 seconds ago Up 4 seconds bugs18/reverent_ritchie
78896a73191b ubuntu "bash" 7 seconds ago Up 5 seconds bugs17/gloomy_bell
a991d887a894 ubuntu "bash" 7 seconds ago Up 5 seconds bugs16/angry_swanson
a43122662e92 ubuntu "bash" 7 seconds ago Up 5 seconds bugs17/pensive_kowalevski
68d874bc19f9 ubuntu "bash" 7 seconds ago Up 5 seconds bugs16/modest_payne
e79b3307f6e6 ubuntu "bash" 7 seconds ago Up 5 seconds bugs18/stoic_wescoff
caac9466d86f ubuntu "bash" 7 seconds ago Up 5 seconds bugs17/goofy_snyder
7748d01d34ee ubuntu "bash" 7 seconds ago Up 5 seconds bugs16/fervent_einstein
99da2a91a925 ubuntu "bash" 7 seconds ago Up 5 seconds bugs18/modest_goodall
cd308099faac ubuntu "bash" 7 seconds ago Up 6 seconds bugs19/furious_ritchie
As shown, the containers are spread over bugs{16..19}.
Multi-hosts network
An overlay network is needed so that all the containers can be "plugged into" it. To create this overlay network, execute:
auzias#bugs20:~$ swarm-docker network create -d overlay net
auzias#bugs20:~$ swarm-docker network ls|grep "net"
c96760503d06 net overlay
And voilà!
Once this overlay is created, add --net net to the command swarm-docker run --rm -it ubuntu bash and all your containers will be able to communicate natively as if they were on the same LAN. The default network is 10.0.0.0/24.
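As a quick, purely illustrative connectivity check (container names are examples), you can start a named container on the overlay and ping it from another one:
swarm-docker run -d --name web --net net nginx
swarm-docker run --rm --net net busybox ping -c 3 web
If the overlay is healthy, the ping should succeed even when the two containers land on different hosts, since name resolution on user-defined networks is handled by the engine.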
Enabling Multicast
Multicast is not supported by the default overlay driver. Another driver is required to use multicast; the docker plugin weave net does support multicast.
To use this driver, once installed, you will need to run $weave launch on all Swarm agents and the Swarm manager. Then you'll need to connect the weave peers together; this is done by running $weave connect $SWARM_MANAGER_IP. It does not have to be the IP address of the Swarm manager, but it is cleaner to do so (or to use another node than the Swarm agents).
At this point the weave cluster is deployed, but no weave network has been created. Running $swarm-docker network create --driver weave weave-net will create the weave network named weave-net. Starting containers with the --net weave-net will enable them to share the same LAN and use multicast. Example of a full command to start such containers is: $swarm-docker run --rm -it --privileged --net=weave-net ubuntu bash.
I think the options that you specify should use --cluster-store=consul://... instead of --cluster-store-consul://.... Try to reset and restart the engine and swarm and check if it works. It should work after that. The getting started doc clearly explains how to configure docker overlay networks using consul as the backing data-store.
DOCKER_OPTS="-D --cluster-store=consul://192.168.0.18:8500 --cluster-advertise=192.168.0.18:2375"
DOCKER_OPTS="-D --cluster-store=consul://192.168.0.18:8500 --cluster-advertise=192.168.0.36:2375"
For anyone coming to this since Docker 1.12 was released, this is now trivially easy - Swarm Mode is built into the engine and you don't need Consul or any other extra components.
Assuming you have two hosts with Docker installed, initialize the Swarm on the first machine:
> docker swarm init
Swarm initialized: current node (6ujd4o5fx1dmav5uvv4khrp33) is now a manager
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-54xs4bn7qs6su3xjjn7ul5am9z9073by2aqpey56tnccbi93zy-blugim00fuozg6qs289etc \
172.17.0.54:2377
That host becomes the first manager node in the swarm, and it writes out the command you use to join other nodes to the swarm - the secret token, and the IP address where the manager is listening.
On the second host:
> docker swarm join 172.17.0.54:2377 --token SWMTKN-1-54xs4bn7qs6su3xjjn7ul5am9z9073by2aqpey56tnccbi93zy-blugim00fuozg6qs289etc
This node joined a swarm as a worker.
Now you have a secure 2-node swarm which has service discovery, rolling updates and service scaling.
Create your overlay network on the manager node with:
> docker network create -d overlay my-net
d99lmsfzhcb16pdp2k7o9sehv
And you now have a multi-host overlay network with built-in DNS, so services can resolve each other based on service name.
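As a small illustration of that built-in DNS (service and image names here are just examples), you could create two services on the overlay and let one reach the other by name:
> docker service create --name web --network my-net nginx
> docker service create --name pinger --network my-net busybox ping web
The pinger task resolves web via the swarm's internal DNS, regardless of which node each task runs on.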
