Kubernetes Unable to connect to the server: dial tcp x.x.x.x:6443: i/o timeout - linux

I am using a test Kubernetes cluster (kubeadm, 1 master and 2 nodes). My public IP changes from time to time, and when it changes I am unable to connect to the cluster and get the error below:
Kubernetes Unable to connect to the server: dial tcp x.x.x.x:6443: i/o timeout
I also have a private IP, 10.10.10.10, which stays the same all the time.
I created the Kubernetes cluster using the command below:
kubeadm init --control-plane-endpoint 10.10.10.10
But it still fails, because the certificates are signed for the public IP, and I get the error below:
The connection to the server x.x.x.x:6443 was refused - did you specify the right host or port?
Can someone help me set up kubeadm so that it allows all IPs (something like 0.0.0.0)? I am fine with that from a security point of view, since it is a test setup. Or is there any permanent fix?

Since @Vidya has already solved this issue by using a static IP address, I decided to provide a Community Wiki answer just for better visibility to other community members.
First of all, it is not recommended to have a frequently changing master/server IP address.
As we can find in the discussion on GitHub kubernetes/88648 - kubeadm does not provide an easy way to deal with this.
However, there are a few workarounds that can help us, when the IP address on the Kubernetes master node changes.
Based on the discussion Changing master IP address, I prepared a script that regenerates the certificates and re-initializes the master node.
This script might be helpful, but I recommend running one command at a time (it will be safer).
In addition, you may need to customize some steps to your needs:
NOTE: In the example below, I'm using Docker as the container runtime.
root@kmaster:~# cat reinit_master.sh
#!/bin/bash
set -e
echo "Stopping kubelet and docker"
systemctl stop kubelet docker
echo "Making backup kubernetes data"
mv /etc/kubernetes /etc/kubernetes-backup
mv /var/lib/kubelet /var/lib/kubelet-backup
echo "Restoring certificates"
mkdir /etc/kubernetes
cp -r /etc/kubernetes-backup/pki /etc/kubernetes/
rm /etc/kubernetes/pki/{apiserver.*,etcd/peer.*}
echo "Starting docker"
systemctl start docker
echo "Reinitializing master node"
kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd
echo "Updating kubeconfig file"
cp /etc/kubernetes/admin.conf ~/.kube/config
Then you need to rejoin the worker nodes to the cluster.
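A rough sketch of rejoining a worker (kubeadm token create --print-join-command and kubeadm reset are standard kubeadm commands; the token and hash below are placeholders):
# On the master: print a fresh join command (token + CA cert hash)
kubeadm token create --print-join-command
# On each worker: reset the old state, then run the printed join command
kubeadm reset
kubeadm join 10.10.10.10:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>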

Related

Kops rolling-update fails with "Cluster did not pass validation" for master node

For some reason my master node can no longer connect to my cluster after upgrading from Kubernetes 1.11.9 to 1.12.9 via kops (version 1.13.0). In the manifest I'm upgrading kubernetesVersion from 1.11.9 -> 1.12.9. This is the only change I'm making. However, when I run kops rolling-update cluster --yes I get the following error:
Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-01234567" has not yet joined cluster.
Cluster did not validate within 5m0s
After that if I run a kubectl get nodes I no longer see that master node in my cluster.
Doing a little bit of debugging by sshing into the disconnected master node instance I found the following error in my api-server log by running sudo cat /var/log/kube-apiserver.log:
controller.go:135] Unable to perform initial IP allocation check: unable to refresh the service IP block: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused
I suspect the issue might be related to etcd, because when I run sudo netstat -nap | grep LISTEN | grep etcd there is no output.
Anyone have any idea how I can get my master node back in the cluster or have advice on things to try?
I have done some research and have a few ideas for you:
If there is no output for the etcd grep, it means that your etcd server is down. Check the logs of the exited etcd container: docker ps -a | grep Exited | grep etcd and then docker logs <etcd-container-id>
Try these instructions I found:
1 - I removed the old master from the etcd cluster using etcdctl. You will need to connect to the etcd-server container to do this.
2 - On the new master node I stopped the kubelet and protokube services.
3 - Empty the etcd data dir (data and data-events).
4 - Edit /etc/kubernetes/manifests/etcd.manifest and etcd-events.manifest, changing ETCD_INITIAL_CLUSTER_STATE from new to existing.
5 - Get the name and PeerURLs from the new master and use etcdctl to add the new master to the cluster (etcdctl member add "name" "PeerURL"; see the sketch after this list). You will need to connect to the etcd-server container to do this.
6 - Start the kubelet and protokube services on the new master.
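A rough sketch of the etcdctl commands for steps 1 and 5, run from inside the etcd-server container (the member ID, name and peer URL are placeholders, assuming the etcd v2 etcdctl used by kops at the time):
# Step 1: list members and remove the old master
etcdctl member list
etcdctl member remove <old-member-id>
# Step 5: add the new master by name and peer URL
etcdctl member add <new-master-name> http://<new-master-ip>:2380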
If that is not the case, then you might have a problem with the certs. They are provisioned during the creation of the cluster and some of them contain the allowed master endpoints. If that is the case, you'd need to create new certs and roll them out for the API server/etcd clusters.
Please let me know if that helped.

How to wait until services are ready

I have been setting up a Jenkins pipeline using Docker images. Now I need to run various services like MySQL, Redis, Memcached, Beanstalkd and Elasticsearch. To make the job wait until MySQL is ready, I am using the following commands:
sh "while ! mysqladmin ping -u root -h mysqlhost ; do sleep 1; done"
sh 'echo MySQL server is up and running'
Where mysqlhost is the hostname I have provided for the container. Similarly, I need to check and wait for Redis, Memcached, Beanstalkd and Elasticsearch. But pinging these services does not work the way it does for MySQL. How can I implement this?
The Docker docs mention this script to manage container readiness checks: https://github.com/vishnubob/wait-for-it
I also use this one which is compatible with Alpine:
https://github.com/eficode/wait-for
You can curl these services to check whether they are alive or not.
For Redis you can also use the PING command: https://redis.io/commands/ping
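If the corresponding client tools (redis-cli, curl, nc) are available in the build container, a rough sketch of similar wait loops for the other services (hostnames and default ports below are assumptions; adjust them to your setup):
sh 'while ! redis-cli -h redishost ping | grep -q PONG; do sleep 1; done'
sh 'while ! curl -sf http://elasticsearchhost:9200/_cluster/health; do sleep 1; done'
sh 'while ! nc -z memcachedhost 11211; do sleep 1; done'
sh 'while ! nc -z beanstalkdhost 11300; do sleep 1; done'
sh 'echo all services are up and running'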

External DNS resolution stopped working in Container Engine

I have a simple container on Google Container Engine that has been running for months with no issues. Suddenly, I cannot resolve ANY external domain. In troubleshooting I have re-created the container many times, and upgraded the cluster version to 1.4.7 in an attempt to resolve with no change.
To rule out the app code as much as possible: even basic Node.js code cannot resolve an external domain:
const dns = require('dns');
dns.lookup('nodejs.org', function (err, addresses, family) {
  console.log('addresses:', addresses);
});
/* logs 'undefined' */
The same code, run on a local machine or in a local Docker container, works as expected.
This kubectl call fails as well:
# kubectl exec -ti busybox -- nslookup kubernetes.default
nslookup: can't resolve 'kubernetes.default'
Two pods show up when getting the kube-dns pods (admittedly, I am not sure if that is expected):
# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                 READY     STATUS    RESTARTS   AGE
kube-dns-v20-v8pd6   3/3       Running   0          1h
kube-dns-v20-vtz4o   3/3       Running   0          1h
Both say this when trying to check for errors in the DNS pod:
# kubectl logs --namespace=kube-system pod/kube-dns-v20-v8pd6 -c kube-dns
Error from server: container kube-dns is not valid for pod kube-dns-v20-v8pd6
I suspect the internally created kube-dns is not properly resolving external DNS, or some other linkage has disappeared.
I'll accept almost any workaround if one exists, as this is a production app - perhaps it is possible to manually set nameservers in the Kubernetes controller YAML file or elsewhere. Setting the contents of /etc/resolv.conf in Dockerfile does not seem to work.
Just checked, and in our own clusters we usually have 3 kube-dns pods, so something seems off there.
What does this say: kubectl describe rc kube-dns-v20 --namespace=kube-system
What happens when you kill the kube-dns pods? (the rc should automatically restart them)
What happens when you do an nslookup with a specific nameserver? nslookup nodejs.org 8.8.8.8
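For reference, a rough sketch of recreating the kube-dns pods by deleting them and letting the rc restart them (the label selector matches the one used above):
kubectl delete pods --namespace=kube-system -l k8s-app=kube-dns
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -w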

Cannot setup multi-host Docker overlay network with etcd

I am trying to connect two Docker hosts with an overlay network and am using etcd as the KV store. etcd is running directly on the first host (not in a container). I finally managed to connect the Docker daemon of the first host to etcd, but I cannot manage to establish a connection to the Docker daemon on the second host.
I downloaded etcd from the Github releases page and followed the instructions under the "Linux" section.
After starting etcd, it is listening to the following ports:
etcdmain: listening for peers on http://localhost:2380
etcdmain: listening for peers on http://localhost:7001
etcdmain: listening for client requests on http://localhost:2379
etcdmain: listening for client requests on http://localhost:4001
And I started the Docker daemon on the first host (on which etcd is running as well) like this:
docker daemon --cluster-advertise eth1:2379 --cluster-store etcd://127.0.0.1:2379
After that, I could also create an overlay network with:
docker network create -d overlay <network name>
But I can't figure out how to start the daemon on the second host. No matter which values I tried for --cluster-advertise and --cluster-store, I keep getting the following error message:
discovery error: client: etcd cluster is unavailable or misconfigured
Both my hosts are using the eth1 interface. The IP of host1 is 10.10.10.10 and the IP of host2 is 10.10.10.20. I already ran iperf to make sure they can connect to each other.
Any ideas?
So I finally figured out how to connect the two hosts and to be honest, I don't understand why it took me so long to solve the problem. But in case other people run into the same problem I will post my solution here. As mentioned earlier, I downloaded etcd from the Github release page and extracted the tar file.
I followed the instructions from the etcd documentation and applied them to my situation. Instead of running etcd with all the options directly from the command line, I created a simple bash script. This makes it a lot easier to adjust the options and rerun the command. Once you have figured out the right options, it would be handy to place them separately in a config file and run etcd as a service, as explained in this tutorial. So here is my bash script:
#!/bin/bash
./etcd --name infra0 \
--initial-advertise-peer-urls http://10.10.10.10:2380 \
--listen-peer-urls http://10.10.10.10:2380 \
--listen-client-urls http://10.10.10.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.10.10.10:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.10.10.10:2380,infra1=http://10.10.10.20:2380 \
--initial-cluster-state new
I placed this file in the etcd-vX.X.X-linux-amd64 directory (that I just downloaded and extracted), which also contains the etcd binary. On the second host I did the same thing but changed the --name from infra0 to infra1 and adjusted the IPs to those of the second host (10.10.10.20). The --initial-cluster option is not modified.
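For reference, based on the description above, the script on host2 would look roughly like this (only the name and the listen/advertise IPs differ):
#!/bin/bash
./etcd --name infra1 \
--initial-advertise-peer-urls http://10.10.10.20:2380 \
--listen-peer-urls http://10.10.10.20:2380 \
--listen-client-urls http://10.10.10.20:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.10.10.20:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.10.10.10:2380,infra1=http://10.10.10.20:2380 \
--initial-cluster-state new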
Then I executed the script on host1 first and then on host2. I'm not sure if the order matters, but in my case I got an error message when I did it the other way round.
To make sure your cluster is set up correctly you can run:
./etcdctl cluster-health
If the output looks similar to this (listing the two members) it should work.
member 357e60d488ae5ab3 is healthy: got healthy result from http://10.10.10.10:2379
member 590f234979b9a5ee is healthy: got healthy result from http://10.10.10.20:2379
If you want to be really sure, add a value to your store on host1 and retrieve it on host2:
host1$ ./etcdctl set myKey myValue
host2$ ./etcdctl get myKey
Setting up docker overlay network
In order to set up a Docker overlay network I had to restart the Docker daemon with the --cluster-store and --cluster-advertise options. My solution is probably not the cleanest one, but it works. So on both hosts I first stopped the docker service and then restarted the daemon with the options:
sudo service docker stop
sudo /usr/bin/docker daemon --cluster-store=etcd://10.10.10.10:2379 --cluster-advertise=10.10.10.10:2379
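On host2 the IP addresses need to be adjusted; the equivalent commands would look roughly like this (assuming host2's etcd also listens on 10.10.10.20:2379):
sudo service docker stop
sudo /usr/bin/docker daemon --cluster-store=etcd://10.10.10.20:2379 --cluster-advertise=10.10.10.20:2379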
Then I created the overlay network like this on one of the hosts:
sudo docker network create -d overlay <network name>
If everything worked correctly, the overlay network can now be seen on the other host. Check with this command:
sudo docker network ls

Restarting named container assigns different IP

I am trying to deploy my application using Docker and came across an issue: restarting a named container assigns a different IP to the container. Maybe describing what I am doing will better explain the issue:
Postgres runs inside a separate container named "postgres"
$ PG_ID=$(docker run --name postgres postgres/image)
My webapp container links to postgres container
$ APP_ID=$(docker run --link postgres:postgres webapp/image)
Linking the postgres container to the webapp container inserts into the webapp container a hosts file entry with the IP of the postgres container. This allows me to point to the postgres db within my webapp using postgres:5432 (I am using Django, btw). This all works well, except if for some reason postgres crashes.
Before I manually stop the postgres process to simulate a crash, I verify the IP of the postgres container:
$ docker inspect --format "{{.NetworkSettings.IPAddress}}" $PG_ID
172.17.0.73
Now, to simulate a crash, I stop the postgres container:
$ docker stop $PG_ID
If I now restart postgres using
$ docker start $PG_ID
the IP of the container changes:
$ docker inspect --format "{{.NetworkSettings.IPAddress}}" $PG_ID
172.17.0.74
Therefore, the IP that points to the postgres container in the webapp container is no longer correct. I thought that by naming a container, Docker assigns it a name with a specific configuration so that you can reliably link between containers (both network and volumes). If the IP changes, this seems to defeat the purpose.
If I have to restart my webapp process each time postgres restarts, this does not seem any better than just using a single container to run both processes. Then I could use supervisor or something similar to keep both of them running and use localhost to link between the processes.
I am still new to Docker so am I doing something wrong or is this a bug in docker?
2nd UPDATE: maybe you already discovered this, but as a workaround I plan to map the database service to the host interface (e.g. with -p 5432:5432) and connect the webapps to the host IP (the IP of the docker0 interface: on my Ubuntu and CentOS machines, that IP is 172.17.42.1). If you restart the postgres container, the container's IP will change, but it will still be accessible via 172.17.42.1:5432. The downside is that you are exposing that port to all the containers and lose the fine-grained mapping that --link gives you.
--- OLD UPDATES:
CORRECTION: Docker will map 'postgres' to the container's IP in the /etc/hosts file on the webapp container. So, in the webapp container, you can ping 'postgres' and it will be mapped to the IP.
1st UPDATE: I've seen that Docker generates and mounts /etc/hosts, /etc/resolv.conf, etc. to always have the correct information, but this does not apply when the linked container is restarted. So, I had assumed (wrongly) that Docker would update the hosts files.
-- ORIGINAL (wrong) response:
Add --hostname=postgres-db (you can use anything; I'm using something different from 'postgres' to avoid confusion with the container name):
$ docker run --name postgres --hostname postgres-db postgres/image
Docker will map 'postgres-db' to the container's IP (check the contents of /etc/hosts on the webapp container).
This will allow you to run 'ping postgres-db' from the webapp container. If the IP changes, Docker will update /etc/hosts for you.
In the Django app, use 'postgres-db' instead of the IP (or whatever you use for --hostname of the container with PostgreSql).
Bye!
Horacio
According to https://docs.docker.com/engine/reference/commandline/run/, it should be possible to assign a static IP for your container -- at the time of container creation -- using the --ip option:
Example:
docker run -itd --ip 172.30.100.104 --name postgres postgres/image
....where 172.30.100.104 is a free IP address on a custom bridge/overlay network.
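For completeness, a rough sketch of first creating such a user-defined network (the subnet and network name here are assumptions; --ip is only honored on user-defined networks):
docker network create --subnet=172.30.100.0/24 mynet
docker run -itd --net mynet --ip 172.30.100.104 --name postgres postgres/image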
This should then retain the same IP address even if postgres container crashes/restarts.
Looks like this was released in Docker Engine v 1.10 or greater, therefore if you have a lower version, you have to upgrade first.
As of Docker 1.0 they implemented a stronger sense of linked containers. Now you can use the container instance name as if it were the host name.
Here is a link
I found a link that better describes your problem. And while that question was answered, I wonder whether this ambassador pattern might solve the problem... this assumes that the ambassador is more reliable than the services that it links.
