kubectl cluster-info why is running on control plane and not master node - linux

Why kubectl cluster-info is running on control plane and not master node
And on the control plane it is running on a specific IP Address https://192.168.49.2:8443
and not not localhost or 127.0.0.1
Running the following command in terminal:
minikube start --driver=docker
😄 minikube v1.20.0 on Ubuntu 16.04
✨ Using the docker driver based on user configuration
🎉 minikube 1.21.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.21.0
💡 To disable this notice, run: 'minikube config set WantUpdateNotification false'
👍 Starting control plane node minikube in cluster minikube
🚜 Pulling base image ...
> gcr.io/k8s-minikube/kicbase...: 358.10 MiB / 358.10 MiB 100.00% 797.51 K
❗ minikube was unable to download gcr.io/k8s-minikube/kicbase:v0.0.22, but successfully downloaded kicbase/stable:v0.0.22 as a fallback image
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🐳 Preparing Kubernetes v1.20.2 on Docker 20.10.6 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by
default
kubectl cluster-info
Kubernetes control plane is running at https://192.168.49.2:8443
KubeDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

The Kubernetes project is making an effort to move away from wording that can be considered offensive, with one concrete recommendation being renaming master to control-plane. In other words control-plane and master mean essentially the same thing, and the goal is to switch the terminology to use control-plane exclusively going forward. (More info in this answer)
The kubectl command is a command line interface that executes on a client (i.e your computer) and interacts with the cluster through the control-plane.
The IP address you are seing through cluster-info is the IP address through which you reach the control-plane

Related

Docker: Linux Worker does not join Overlay network defined by Windows Master. Windows Worker works fine

I have the following 3 nodes..
C:\Users\yadamu>docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
4tpray751pk50bl6o6gtbjfq2 YADAMU-DB5 Ready Active 20.10.8
xzx0gxu1m0qo59z6gtr4j2i1p * yadamu-db3 Ready Active Leader 20.10.8
x1yh3l6m6k73gytkxx3ipimq4 yadamu-db4 Ready Active 20.10.10
On yadamu-db3 (Manager), a Windows 11 box, I created an overlay network with
docker network create --driver overlay --attachable Y_OVERLAY
I then started a set of containers using docker-compose on YADAMU-DB3. They came up as expected and could talk to each other. I then started a second set of containers on YADAMU-DB5, which is also a Windows 11 box using a different docker compose file and they could also talk to each other and to the to containers running on YADAMU-DB5.
I then started a third set of containers using docker compose on YADAMU-DB4, which is running Oracle Enterpise Linux 8. These container can talk to each other but are isolated from the containers running on YADAMU-DB3 and YADAMU-DB5.
All three docker compose files contain the following 'networks' sections
networks:
YADAMU-NET:
name: Y_OVERLAY0
attachable : true
However when I run docker-compose on the linux box I see
C:\Development\YADAMU\docker\dockerfiles\swarm\YADAMU-DB4>docker-compose up -d
WARNING: The Docker Engine you're using is running in swarm mode.
Compose does not use swarm mode to deploy services to multiple nodes in a swarm. All containers will be scheduled on the current node.
To deploy your application across the swarm, use `docker stack deploy`.
Creating network "Y_OVERLAY" with the default driver
Creating volume "YADAMU_01-SHARED" with default driver
and when I list the networks I see
C:\Development\YADAMU\docker\dockerfiles\swarm\YADAMU-DB4>docker network ls
NETWORK ID NAME DRIVER SCOPE
b87206cfac4c Y_OVERLAY bridge local
393beef01a7f bridge bridge local
e19e5f965e8d docker_gwbridge bridge local
1b4cbfa566f0 host host local
y58mwnbratkj ingress overlay swarm
32a41d9b3d7c none null local
Where as when I list the networks on YADAMU-DB5 I see
C:\Users\Mark D Drake>docker network ls
NETWORK ID NAME DRIVER SCOPE
luh9dw47k5a1 Y_OVERLAY overlay swarm
y58mwnbratkj ingress overlay swarm
8fd8ef298f47 nat nat local
dce21ec8e1ae none null local
So it appears the on the LINUX box it has not resolved Y_OVERLAY as the overlay network defined by the swarm manager.
Any ideas what I'm missing here..
Note the intent here is not to build a resiliant swarm it's to build a qa environment that can us used for testing interaction between windows and linux hosts on limited hardware.
I found a workaround.. If I down the containers on the linux node, and then promote it to a 'Manager', and then run docker-compose up on the linux node, the contains join the Overlay network...
It's doesn't explain why this did not work with the linux node as a worker, but it means I can move ahead.

userns-remap option in Docker Swarm (existing) installation

I decided to increase security by enabling userns-remap option in Docker running in swarm mode.
Installation is not new, there's plenty of running services.
Followed configuration with official manual: https://docs.docker.com/engine/security/userns-remap/
Docker service is starting but docker service ls throws error:
Handler for GET /v1.40/services returned error: This node is not a swarm manager. Use docker swarm init or docker swarm join to connect this node to swarm and try again
Error getting services: This node is not a swarm manager. Use docker swarm init or docker swarm join to connect this node to swarm and try again
cat /etc/docker/daemon.json is simple as
{
"userns-remap": "default"
}
cat /etc/subuid /etc/subgid
dockremap:100000:65536
dockremap:100000:65536
id dockremap
uid=1000(dockremap) gid=1000(dockremap) groups=1000(dockremap)
ls -ld /var/lib/docker/100000.100000/
drwx------ 11 231072 231072 26 Mar 21 20:19 /var/lib/docker/100000.100000/
Removing userns-remap from config brings services back to normal.
Running CentOS 7.7 and docker 19.03.8
How can I make it work?
From https://docs.docker.com/engine/security/userns-remap/:
Enabling userns-remap effectively masks existing image and container
layers, as well as other Docker objects within /var/lib/docker/. This
is because Docker needs to adjust the ownership of these resources and
actually stores them in a subdirectory within /var/lib/docker/. It is
best to enable this feature on a new Docker installation rather than
an existing one.
Ergo - all existing images and containers will not be available after enabling user namespace.

Kops rolling-update fails with "Cluster did not pass validation" for master node

For some reason my master node can no longer connect to my cluster after upgrading from kubernetes 1.11.9 to 1.12.9 via kops (version 1.13.0). In the manifest I'm upgrading kubernetesVersion from 1.11.9 -> 1.12.9. This is the only change I'm making. However when I run kops rolling-update cluster --yes I get the following error:
Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-01234567" has not yet joined cluster.
Cluster did not validate within 5m0s
After that if I run a kubectl get nodes I no longer see that master node in my cluster.
Doing a little bit of debugging by sshing into the disconnected master node instance I found the following error in my api-server log by running sudo cat /var/log/kube-apiserver.log:
controller.go:135] Unable to perform initial IP allocation check: unable to refresh the service IP block: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused
I suspect the issue might be related to etcd, because when I run sudo netstat -nap | grep LISTEN | grep etcd there is no output.
Anyone have any idea how I can get my master node back in the cluster or have advice on things to try?
I have made some research I got few ideas for you:
If there is no output for the etcd grep it means that your etcd server is down. Check the logs for the 'Exited' etcd container | grep Exited | grep etcd and than logs <etcd-container-id>
Try this instruction I found:
1 - I removed the old master from de etcd cluster using etcdctl. You
will need to connect on the etcd-server container to do this.
2 - On the new master node I stopped kubelet and protokube services.
3 - Empty Etcd data dir. (data and data-events)
4 - Edit /etc/kubernetes/manifests/etcd.manifests and
etcd-events.manifest changing ETCD_INITIAL_CLUSTER_STATE from new to
existing.
5 - Get the name and PeerURLS from new master and use etcdctl to add
the new master on the cluster. (etcdctl member add "name"
"PeerULR")You will need to connect on the etcd-server container to do
this.
6 - Start kubelet and protokube services on the new master.
If that is not the case than you might have a problem with the certs. They are provisioned during the creation of the cluster and some of them have the allowed master's endpoints. If that is the case you'd need to create new certs and roll them for the api server/etcd clusters.
Please let me know if that helped.

Cannot get kube-dns to start on Kubernetes

Hoping someone can help.
I have a 3x node CoreOS cluster running Kubernetes. The nodes are as follows:
192.168.1.201 - Controller
192.168.1.202 - Worker Node
192.168.1.203 - Worker Node
The cluster is up and running, and I can run the following commands:
> kubectl get nodes
NAME STATUS AGE
192.168.1.201 Ready,SchedulingDisabled 1d
192.168.1.202 Ready 21h
192.168.1.203 Ready 21h
> kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-192.168.1.201 1/1 Running 2 1d
kube-controller-manager-192.168.1.201 1/1 Running 4 1d
kube-dns-v20-h4w7m 2/3 CrashLoopBackOff 15 23m
kube-proxy-192.168.1.201 1/1 Running 2 1d
kube-proxy-192.168.1.202 1/1 Running 1 21h
kube-proxy-192.168.1.203 1/1 Running 1 21h
kube-scheduler-192.168.1.201 1/1 Running 4 1d
As you can see, the kube-dns service is not running correctly. It keeps restarting and I am struggling to understand why. Any help in debugging this would be greatly appreciated (or pointers at where to read about debugging this. Running kubectl logs does not bring anything back...not sure if the addons function differently to standard pods.
Running a kubectl describe pods, I can see the containers are killed due to being unhealthy:
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Created Created container with docker id 189afaa1eb0d; Security:[seccomp=unconfined]
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Started Started container with docker id 189afaa1eb0d
14m 14m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Killing Killing container with docker id 189afaa1eb0d: pod "kube-dns-v20-h4w7m_kube-system(3a545c95-ea19-11e6-aa7c-52540021bfab)" container "kubedns" is unhealthy, it will be killed and re-created
Please find a full output of this command as a github gist here: https://gist.github.com/mehstg/0b8016f5398a8781c3ade8cf49c02680
Thanks in advance!
If you installed your cluster with kubeadm you should add a pod network after installing.
If you choose flannel as your pod network, you should have this argument in your init command kubeadm init --pod-network-cidr 10.244.0.0/16.
The flannel YAML file can be found in the coreOS flannel repo.
All you need to do if your cluster was initialized properly (read above), is to run kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Once this is up and running (it will create pods on every node), your kube-dns pod should come up.
If you need to reset your installation (for example to add the argument to kubeadm init), you can use kubeadm reset on all nodes.
Normally, you would run the init command on the master, then add a pod network, and then add your other nodes.
This is all described in more detail in the Getting started guide, step 3/4 regarding the pod network.
as your gist says your pod network seems to be broken. You are using some custom podnetwork with 10.10.10.X. You should communicate this IPs to all components.
Please check, there is no collision with other existing nets.
I recommend you to setup with Calico, as this was the solution for me to bring up CoreOS k8s up working
After followed the steps in the official kubeadm doc with flannel networking, I run into a similar issue
http://janetkuo.github.io/docs/getting-started-guides/kubeadm/
It appears as networking pods get stuck in error states:
kube-dns-xxxxxxxx-xxxvn (rpc error)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
In my case it is related to rbac permission errors and is resolved by running
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
Afterwards, all kube-system pods went into running states. The upstream issue is discussed on github https://github.com/kubernetes/kubernetes/issues/44029

how to create docker overlay network between multi hosts?

I have been trying to create an overlay network between two hosts with no success. I keep getting the error message:
mavungu#mavungu-Aspire-5250:~$ sudo docker -H tcp://192.168.0.18:2380 network create -d overlay myapp
Error response from daemon: 500 Internal Server Error: failed to parse pool request for address space "GlobalDefault" pool "" subpool "": cannot find address space GlobalDefault (most likely the backing datastore is not configured)
mavungu#mavungu-Aspire-5250:~$ sudo docker network create -d overlay myapp
[sudo] password for mavungu:
Error response from daemon: failed to parse pool request for address space "GlobalDefault" pool "" subpool "": cannot find address space GlobalDefault (most likely the backing datastore is not configured)
My environment details:
mavungu#mavungu-Aspire-5250:~$ sudo docker info Containers: 1
Images: 364 Server Version: 1.9.1 Storage Driver: aufs Root Dir:
/var/lib/docker/aufs Backing Filesystem: extfs Dirs: 368 Dirperm1
Supported: true Execution Driver: native-0.2 Logging Driver:
json-file Kernel Version: 3.19.0-26-generic Operating System: Ubuntu
15.04 CPUs: 2 Total Memory: 3.593 GiB Name: mavungu-Aspire-5250 Registry: https://index.docker.io/v1/ WARNING: No swap limit support
I have a swarm cluster working well with consul as the discovery mechanism:
mavungu#mavungu-Aspire-5250:~$ sudo docker -H tcp://192.168.0.18:2380 info
Containers: 4
Images: 51
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 2
mavungu-Aspire-5250: 192.168.0.36:2375
└ Containers: 1
└ Reserved CPUs: 0 / 2
└ Reserved Memory: 0 B / 3.773 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.19.0-26-generic, operatingsystem=Ubuntu 15.04, storagedriver=aufs
mavungu-HP-Pavilion-15-Notebook-PC: 192.168.0.18:2375
└ Containers: 3
└ Reserved CPUs: 0 / 4
└ Reserved Memory: 0 B / 3.942 GiB
└ Labels: executiondriver=native-0.2, kernelversion=4.2.0-19-generic, operatingsystem=Ubuntu 15.10, storagedriver=aufs
CPUs: 6
Total Memory: 7.715 GiB
Name: bb47f4e57436
My consul is available at 192.168.0.18:8500 and it works well with the swarm cluster.
I would like to be able to create an overlay network across the two hosts. I have configured the docker engines on both hosts with this additional settings:
DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.18:0"
DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.36:0"
I had to stop and restart the engines and reset the swarm cluster...
After failing to create the overlay network, I changed the --cluster-advertise setting to this :
DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.18:2375"
DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.36:2375"
But still it did not work. I am not sure of what ip:port should be set for the --cluster-advertise= . Docs, discussions and tutorials are not clear on this advertise thing.
There is something that I am getting wrong here. Please help.
When you execute the docker runcommand, be sure to add --net myapp.
Here is a full step-by-step tutorial (online version):
How to deploy swarm on a cluster with multi-hosts network
TL;DR: step-by-step tutorial to deploy a multi-hosts network using Swarm. I wanted to put online this tutorial ASAP so I didn't even take time for the presentation. The markdown file is available on the github of my website. Feel free to adapt and share it, it is licensed under a Creative Commons Attribution 4.0 International License.
Prerequisites
Environment
Tutorial done with docker engine 1.9.0.
Swarm agents are discovered through a shared file (other methods are available).
Consul 0.5.2 is used for discovery for the multi-hosts network swarm containers.
Swarm manager and consul master will be run on the machine named bugs20. Other nodes, bugs19, bugs18, bugs17 and bugs16, will be swarm agents and consul members.
Before we start
Consul is used for the multihost networking, any other key-value store can be used -- note that the engine supports Consul Etcd, and ZooKeeper.
Token (or static file) are used for the swarm agents discovery. Tokens use REST API, a static file is preferred.
The network
The network is range 192.168.196.0/25. The host named bugsN has the IP address 192.168.196.N.
The docker daemon
All nodes are running docker daemon as follow:
/usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise eth0:2375 --cluster-store consul://127.0.0.1:8500
Options details:
-H tcp://0.0.0.0:2375
Binds the daemon to an interface to allow be part of the swarm cluster. An IP address can obviously be specificied, it is a better solution if you have several NIC.
--cluster-advertise eth0:2375
Defines the interface and the port of the docker daemon should use to advertise itself.
--cluster-store consul://127.0.0.1:8500
Defines the URL of the distributed storage backend. In our case we use consul, though there are other discovery tools that can be used, if you want to make up your mind you should be interested in reading this service discovery comparison.
As consul is distributed, the URL can be local (remember, swarm agents are also consul members) and this is more flexible as you don't have to specify the IP address of the consul master and be selected after the docker daemon has been started.
The aliases used
In the following commands these two aliases are used:
alias ldocker='docker -H tcp://0.0.0.0:2375'
alias swarm-docker='docker -H tcp://0.0.0.0:5732' #used only on the swarm manager
Be sure to have the path of the consul binary in your $PATH. Once you are in the directory just type export PATH=$PATH:$(pwd) will do the trick.
It is also assumed that the variable $IP has been properly set and exported. It can be done, thanks to .bashrc or .zshrc or else, with something like this:
export IP=$(ifconfig |grep "192.168.196."|cut -d ":" -f 2|cut -d " " -f 1)
Consul
Let's start to deploy all consul members and master as needed.
Consul master (bugs20)
consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul -node=master -bind=$IP -client $IP
Options details:
agent -server
Start the consul agent as a server.
-bootstrap-expect 1
We expect only one master.
-node=master20
This consul server/master will be named "master20".
-bind=192.168.196.20
Specifies the IP address on which it should be bound. Optional if you have only one NIC.
-client=192.168.196.20
Specifies the RPC IP address on which the server should be bound. By default it is localhost. Note that I am unsure about the necessity of this option, and this force to add -rpc-addr=192.168.196.20:8400 for local request such as consul members -rpc-addr=192.168.196.20:8400 or consul join -rpc-addr=192.168.196.20:8400 192.168.196.9 to join the consul member that has the IP address 192.168.196.9.
Consul members (bugs{16..19})
consul agent -data-dir /tmp/consul -node=$HOSTNAME -bind=192.168.196.N
It is suggested to use tmux, or similar, with the option :setw synchronize-panes on so this one command: consul -d agent -data-dir /tmp/consul -node=$HOST -bind=$IP starts all consul members.
Join consul members
consul join -rpc-addr=192.168.196.20:8400 192.168.196.16
consul join -rpc-addr=192.168.196.20:8400 192.168.196.17
consul join -rpc-addr=192.168.196.20:8400 192.168.196.18
consul join -rpc-addr=192.168.196.20:8400 192.168.196.19
A one line command can be used too. If you are using zsh, then consul join -rpc-addr=192.168.196.20:8400 192.168.196.{16..19} is enough, or a foor loop: for i in $(seq 16 1 19); do consul join -rpc-addr=192.168.196.20:8400 192.168.196.$i;done. You can verify if your members are part of your consul deployment with the command:
consul members -rpc-addr=192.168.196.20:8400
Node Address Status Type Build Protocol DC
master20 192.168.196.20:8301 alive server 0.5.2 2 dc1
bugs19 192.168.196.19:8301 alive client 0.5.2 2 dc1
bugs18 192.168.196.18:8301 alive client 0.5.2 2 dc1
bugs17 192.168.196.17:8301 alive client 0.5.2 2 dc1
bugs16 192.168.196.16:8301 alive client 0.5.2 2 dc1
Consul members and master are deployed and working. The focus will now be on docker and swarm.
Swarm
In the following the creation of swarm manager and swarm members discovery are detailed using two different methods: token and static file. Tokens use a hosted discovery service with Docker Hub while static file is just local and does not use the network (nor any server). Static file solution should be preferred (and is actually easier).
[static file] Start the swarm manager while joining swarm members
Create a file named /tmp/cluster.disco with the content swarm_agent_ip:2375.
cat /tmp/cluster.disco
192.168.196.16:2375
192.168.196.17:2375
192.168.196.18:2375
192.168.196.19:2375
Then just start the swarm manager as follow:
ldocker run -v /tmp/cluster.disco:/tmp/cluster.disco -d -p 5732:2375 swarm manage file:///tmp/cluster.disco
And you're done !
[token] Create and start the swarm manager
On the swarm master (bugs20), create a swarm:
ldocker run --rm swarm create > swarm_id
This create a swarm and save the token ID in the file swarm_id of the current directory. Once created, the swarm manager need to be run as a daemon:
ldocker run -d -p 5732:2375 swarm manage token://`cat swarm_id`
To verify if it is started you can run:
ldocker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d28238445532 swarm "/swarm manage token:" 5 seconds ago Up 4 seconds 0.0.0.0:5732->2375/tcp cranky_liskov
[token] Join swarm members into the swarm cluster
Then the swarm manager will need some swarm agent to join.
ldocker run swarm join --addr=192.168.196.16:2375 token://`cat swarm_id`
ldocker run swarm join --addr=192.168.196.17:2375 token://`cat swarm_id`
ldocker run swarm join --addr=192.168.196.18:2375 token://`cat swarm_id`
ldocker run swarm join --addr=192.168.196.19:2375 token://`cat swarm_id`
std[in|out] will be busy, these commands need to be ran on different terminals. Adding -d abefore the join should solve this and enables a for-loop to be used for the joins.
After the join of the swarm members:
auzias#bugs20:~$ ldocker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d1de6e4ee3fc swarm "/swarm join --addr=1" 5 seconds ago Up 4 seconds 2375/tcp fervent_lichterman
338572b87ce9 swarm "/swarm join --addr=1" 6 seconds ago Up 4 seconds 2375/tcp mad_ramanujan
7083e4d6c7ea swarm "/swarm join --addr=1" 7 seconds ago Up 5 seconds 2375/tcp naughty_sammet
0c5abc6075da swarm "/swarm join --addr=1" 8 seconds ago Up 6 seconds 2375/tcp gloomy_cray
ab746399f106 swarm "/swarm manage token:" 25 seconds ago Up 23 seconds 0.0.0.0:5732->2375/tcp ecstatic_shockley
After the discovery of the swarm members
To verify if the members are well discovered, you can execute swarm-docker info:
auzias#bugs20:~$ swarm-docker info
Containers: 4
Images: 4
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 4
bugs16: 192.168.196.16:2375
└ Containers: 0
└ Reserved CPUs: 0 / 12
└ Reserved Memory: 0 B / 49.62 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
bugs17: 192.168.196.17:2375
└ Containers: 0
└ Reserved CPUs: 0 / 12
└ Reserved Memory: 0 B / 49.62 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
bugs18: 192.168.196.18:2375
└ Containers: 0
└ Reserved CPUs: 0 / 12
└ Reserved Memory: 0 B / 49.62 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
bugs19: 192.168.196.19:2375
└ Containers: 4
└ Reserved CPUs: 0 / 12
└ Reserved Memory: 0 B / 49.62 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
CPUs: 48
Total Memory: 198.5 GiB
Name: ab746399f106
At this point swarm is deployed and all containers run will be run over different nodes. By executing several:
auzias#bugs20:~$ swarm-docker run --rm -it ubuntu bash
and then a:
auzias#bugs20:~$ swarm-docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
45b19d76d38e ubuntu "bash" 6 seconds ago Up 5 seconds bugs18/boring_mccarthy
53e87693606e ubuntu "bash" 6 seconds ago Up 5 seconds bugs16/amazing_colden
b18081f26a35 ubuntu "bash" 6 seconds ago Up 4 seconds bugs17/small_newton
f582d4af4444 ubuntu "bash" 7 seconds ago Up 4 seconds bugs18/naughty_banach
b3d689d749f9 ubuntu "bash" 7 seconds ago Up 4 seconds bugs17/pensive_keller
f9e86f609ffa ubuntu "bash" 7 seconds ago Up 5 seconds bugs16/pensive_cray
b53a46c01783 ubuntu "bash" 7 seconds ago Up 4 seconds bugs18/reverent_ritchie
78896a73191b ubuntu "bash" 7 seconds ago Up 5 seconds bugs17/gloomy_bell
a991d887a894 ubuntu "bash" 7 seconds ago Up 5 seconds bugs16/angry_swanson
a43122662e92 ubuntu "bash" 7 seconds ago Up 5 seconds bugs17/pensive_kowalevski
68d874bc19f9 ubuntu "bash" 7 seconds ago Up 5 seconds bugs16/modest_payne
e79b3307f6e6 ubuntu "bash" 7 seconds ago Up 5 seconds bugs18/stoic_wescoff
caac9466d86f ubuntu "bash" 7 seconds ago Up 5 seconds bugs17/goofy_snyder
7748d01d34ee ubuntu "bash" 7 seconds ago Up 5 seconds bugs16/fervent_einstein
99da2a91a925 ubuntu "bash" 7 seconds ago Up 5 seconds bugs18/modest_goodall
cd308099faac ubuntu "bash" 7 seconds ago Up 6 seconds bugs19/furious_ritchie
As shown, the containers are disseminated over bugs{16...19}.
Multi-hosts network
A network overlay is needed so all the containers can be "plugged in" this overlay. To create this network overlay, execute:
auzias#bugs20:~$ swarm-docker network create -d overlay net
auzias#bugs20:~$ swarm-docker network ls|grep "net"
c96760503d06 net overlay
And voilà !
Once this overlay is created, add --net net to the command swarm-docker run --rm -it ubuntu bash and all your containers will be able to communicate natively as if they were on the same LAN. The default network is 10.0.0.0/24.
Enabling Multicast
Multicast is not support by the default overlay. Another driver is required to be able to use multicast. The docker plugin weave net does support multicast.
To use this driver, once installed, you will need to run $weave launch on all Swarm agents and Swarm manager. Then you'll need to connect the weave together, this is done by running $weave connect $SWARM_MANAGER_IP. It is not obviously the IP address of the Swarm manager but it is cleaner to do so (or use another node than the Swarm agents).
At this point the weave cluster is deployed, but no weave network has been created. Running $swarm-docker network create --driver weave weave-net will create the weave network named weave-net. Starting containers with the --net weave-net will enable them to share the same LAN and use multicast. Example of a full command to start such containers is: $swarm-docker run --rm -it --privileged --net=weave-net ubuntu bash.
I think the options that you specify should use cluster-store=consul instead of cluster-store-consul. Try to reset and restart the engine and swarm and check if it works. It should work after that. The getting started doc clearly explains how to configure docker overlay networks using consul as backing data-store.
DOCKER_OPTS="-D --cluster-store=consul://192.168.0.18:8500 --cluster-advertise=192.168.0.18:2375"
DOCKER_OPTS="-D --cluster-store=consul://192.168.0.18:8500 --cluster-advertise=192.168.0.36:2375"
For anyone coming to this since Docker 1.12 was released, this is now trivially easy - Swarm Mode is built into the engine and you don't need Consul or any other extra components.
Assuming you have two hosts with Docker installed, intitialize the Swarm on the first machine:
> docker swarm init
Swarm initialized: current node (6ujd4o5fx1dmav5uvv4khrp33) is now a manager
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-54xs4bn7qs6su3xjjn7ul5am9z9073by2aqpey56tnccbi93zy-blugim00fuozg6qs289etc \
172.17.0.54:2377
That host becomes the first manager node in the swarm, and it writes out the command you use to join other nodes to the swarm - the secret token, and the IP address where the manager is listening.
On the second host:
> docker swarm join 172.17.0.54:2377 --token SWMTKN-1-54xs4bn7qs6su3xjjn7ul5am9z9073by2aqpey56tnccbi93zy-blugim00fuozg6qs289etc
This node joined a swarm as a worker.
Now you have a secure 2-node swarm which has service discovery, rolling updates and service scaling.
Create your overlay network on the manager node with:
> docker network create -d overlay my-net
d99lmsfzhcb16pdp2k7o9sehv
And you now have a multi-host overlay network with built-in DNS, so services can resolve each other based on service name.

Resources