I have a simple container on Google Container Engine that has been running for months with no issues. Suddenly, I cannot resolve ANY external domain. In troubleshooting I have re-created the container many times, and upgraded the cluster version to 1.4.7 in an attempt to resolve with no change.
To rule out the app code as much as possible: even a basic Node.js snippet cannot resolve an external domain:
const dns = require('dns');
dns.lookup('nodejs.org', function(err, addresses, family) {
  console.log('addresses:', addresses);
});
/* logs 'undefined' */
The same code run on a local machine or in a local Docker container works as expected.
This kubectl call fails as well:
# kubectl exec -ti busybox -- nslookup kubernetes.default
nslookup: can't resolve 'kubernetes.default'
Two pods show up when listing the kube-dns pods (admittedly, I'm not sure whether that is expected):
# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
kube-dns-v20-v8pd6 3/3 Running 0 1h
kube-dns-v20-vtz4o 3/3 Running 0 1h
Both say this when trying to check for errors in the DNS pod:
# kubectl logs --namespace=kube-system pod/kube-dns-v20-v8pd6 -c kube-dns
Error from server: container kube-dns is not valid for pod kube-dns-v20-v8pd6
I expect the internally created kube-dns is not properly pulling external DNS results or some other linkage disappeared.
I'll accept almost any workaround if one exists, as this is a production app - perhaps it is possible to manually set nameservers in the Kubernetes controller YAML file or elsewhere. Setting the contents of /etc/resolv.conf in Dockerfile does not seem to work.
Just checked and in our own clusters we usually have 3 kube-dns pods so something seems off there.
What does this say: kubectl describe rc kube-dns-v20 --namespace=kube-system
What happens when you kill the kube-dns pods? (the rc should automatically restart them)
What happens when you do an nslookup with a specific nameserver? nslookup nodejs.org 8.8.8.8
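Before testing a specific nameserver, it can help to see which nameservers the container is actually configured to use. The following is a minimal sketch, assuming the usual resolv.conf layout; parse_resolv_conf is a hypothetical helper and the sample content is made up for illustration, not taken from the cluster in question.

```python
def parse_resolv_conf(text):
    """Return the nameserver and search entries found in resolv.conf text."""
    nameservers, search = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (resolv.conf allows '#' and ';').
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "search":
            search = values  # a later search line replaces an earlier one
    return {"nameservers": nameservers, "search": search}


# Sample content is an assumption for illustration only.
SAMPLE = """\
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.3.240.10
options ndots:5
"""

if __name__ == "__main__":
    # Inside a pod you would read the real file instead, for example:
    #   with open("/etc/resolv.conf") as f:
    #       text = f.read()
    print(parse_resolv_conf(SAMPLE))
```

If the nameserver listed there is the cluster DNS service IP and lookups against it fail while lookups against 8.8.8.8 succeed, the problem is more likely in kube-dns than in the application.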
Related
While trying to deploy an application got an error as below:
Error: UPGRADE FAILED: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
The output of kubectl api-resources lists some resources, followed by the same error at the end.
Environment: Azure Cloud, AKS Service
Solution:
The steps I followed are:
1. Run kubectl get apiservices. If the metrics-server service is down with the error CrashLoopBackOff, continue with step 2; otherwise just try to restart the metrics-server service using kubectl delete apiservice/"service_name". For me it was v1beta1.metrics.k8s.io.
2. Run kubectl get pods -n kube-system. I found that pods such as metrics-server and kubernetes-dashboard were down because the main CoreDNS pod was down.
For me it was:
NAME READY STATUS RESTARTS AGE
pod/coredns-85577b65b-zj2x2 0/1 CrashLoopBackOff 7 13m
3. Use kubectl describe pod/"pod_name" to check the error in the CoreDNS pod. If it is down because of /etc/coredns/Corefile:10 - Error during parsing: Unknown directive proxy, then you need to use forward instead of proxy in the YAML file that holds the CoreDNS config, because the CoreDNS 1.5.x version used by the image no longer supports the proxy keyword.
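The proxy-to-forward change can be sketched as a small text rewrite. This is an illustration only: replace_proxy_with_forward is a hypothetical helper, the sample Corefile is made up, and the sketch assumes a simple one-line proxy directive. Options inside a proxy { ... } block may not carry over to forward unchanged and should be checked by hand.

```python
import re


def replace_proxy_with_forward(corefile):
    """Rewrite 'proxy' directive lines to 'forward', leaving other lines alone."""
    # Match 'proxy' only when it is the first word on a line.
    return re.sub(r"(?m)^([ \t]*)proxy([ \t]+)", r"\1forward\2", corefile)


# Sample Corefile, assumed for illustration only.
OLD_COREFILE = """\
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
    }
    proxy . /etc/resolv.conf
    cache 30
}
"""

if __name__ == "__main__":
    print(replace_proxy_with_forward(OLD_COREFILE))
```

In many clusters the Corefile is stored in a ConfigMap in the kube-system namespace, so the edited text would go back there before the CoreDNS pod is restarted; where exactly it lives in your setup is something to verify.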
This error commonly happens when the metrics-server pod is not reachable by the master node. Possible reasons are:
The metrics-server pod is not running. This is the first thing you should check. Then look at the logs of the metrics-server pod to check whether it has permission issues when trying to get metrics.
Try to confirm communication between the master and worker nodes.
Try running kubectl top nodes and kubectl top pods -A to see whether metrics-server runs OK.
From these points you can proceed further.
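To find which aggregated API is failing, the JSON output of kubectl get apiservices -o json can be scanned for entries whose Available condition is not True. The sketch below is a hedged example: unavailable_apiservices and load_from_kubectl are hypothetical helper names, and the sample data is hand-written to resemble that output rather than captured from a real cluster.

```python
import json
import subprocess


def unavailable_apiservices(api_service_list):
    """Return (name, message) for every APIService whose Available condition is not True."""
    problems = []
    for item in api_service_list.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        available = next((c for c in conditions if c.get("type") == "Available"), None)
        if available is None:
            problems.append((name, "no Available condition reported"))
        elif available.get("status") != "True":
            problems.append((name, available.get("message", "")))
    return problems


def load_from_kubectl():
    """Fetch the APIService list through kubectl (requires cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "apiservices", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Hand-written sample, assumed for illustration only.
SAMPLE = {
    "items": [
        {"metadata": {"name": "v1."},
         "status": {"conditions": [{"type": "Available", "status": "True"}]}},
        {"metadata": {"name": "v1beta1.metrics.k8s.io"},
         "status": {"conditions": [{"type": "Available", "status": "False",
                                    "message": "sample failure message"}]}},
    ]
}

if __name__ == "__main__":
    for name, message in unavailable_apiservices(SAMPLE):
        print(name, "->", message)
```

Against a live cluster you would pass load_from_kubectl() instead of SAMPLE.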
When I list the pods in a cluster (on a specific node and in all namespaces) then each pod listed also contains the container statuses, and therein I get the container runtime engine IDs of each of the containers listed.
To illustrate, I'm using this Python3 script to access the cluster API via the official Kubernetes Python client; this is a slightly modified version from How to find all Kubernetes Pods on the same node from a Pod using the official Python client?
from kubernetes import client, config
import os


def main():
    # it works only if this script is run by K8s as a POD
    config.load_incluster_config()
    # use this outside pods
    # config.load_kube_config()

    # grab the node name from the pod environment vars
    node_name = os.environ.get('KUBERNETES_NODE_NAME', None)

    v1 = client.CoreV1Api()
    print("Listing pods with their IPs on node: ", node_name)
    # field selectors are a string, you need to parse the fields from the pods here
    field_selector = 'spec.nodeName='+node_name
    ret = v1.list_pod_for_all_namespaces(watch=False, field_selector=field_selector)
    for i in ret.items:
        print("%s\t%s\t%s" %
              (i.status.pod_ip, i.metadata.namespace, i.metadata.name))
        # container_statuses can be None for pods that have not started yet
        for c in i.status.container_statuses or []:
            print("\t%s\t%s" %
                  (c.name, c.container_id))


if __name__ == '__main__':
    main()
N.B. The Pod uses a suitable ServiceAccount which enables it to list pods in all namespaces.
A typical result output when run on a minikube setup might look like this:
Listing pods with their IPs on node: minikube
172.17.0.5 cattle-system cattle-cluster-agent-c949f5b48-llm65
cluster-register docker://f12fcb1acbc2e7c01c24dbd831ed53ab2a6df2353abe80988ae132c39f7c68c6
10.0.2.15 cattle-system cattle-node-agent-hmq86
agent docker://e335a3d30ea37887ac2a1a1cc339eabb0a0098471f86db1926cfe02eef2c6b8f
172.17.0.6 gw pyk8s
py8ks docker://1272747b52983e8f745bd118b2d935c1d314e9c6cc310e88013021ba974bc030
172.17.0.4 kube-system coredns-c4cffd6dc-7lsdn
coredns docker://8b0c3c67532ee2d7d16958a33cb942d5bd09ed37ded1d570830b5f7e5f7a09ab
10.0.2.15 kube-system etcd-minikube
etcd docker://5e0e0ee48248e9779a2a5f9347a39c58743562b10719a31d7d6fc0af5e79e093
10.0.2.15 kube-system kube-addon-manager-minikube
kube-addon-manager docker://96908bc5d5fd9b87779c8a8544591e5aeda2d58956fb365ab595681605b01001
10.0.2.15 kube-system kube-apiserver-minikube
kube-apiserver docker://0711ec9a2321b1b5a801ab2b19409a1edc731058aa994978f989185efc4c8294
10.0.2.15 kube-system kube-controller-manager-minikube
kube-controller-manager docker://16d2e11a8dea2a46cd44bc97a5f894e7ff9da2da70f3c24376b4189dd912336e
172.17.0.2 kube-system kube-dns-86f4d74b45-wbdf6
dnsmasq docker://653c7ef27760a820449ee518b59e39ab4a7f65cade996ed85313c98038827f67
kubedns docker://6cf6aaeac1192cf1d580293e03164db57bc70bce41cf91e5cac081010fe48cf7
sidecar docker://9816e10d8455988aa400f98df32cfa69ce89fbfc3e3e1554145d9d6418c02157
10.0.2.15 kube-system kube-proxy-ll7lq
kube-proxy docker://6b8c7ce1ae3c8fbc487bf05ccca9105dffaf675f916cdb62a595d8be7902e69b
10.0.2.15 kube-system kube-scheduler-minikube
kube-scheduler docker://ab79e46ba900753d86b7000061720551a199c0ea6eee923fcd86bda2d86cc54a
172.17.0.3 kube-system kubernetes-dashboard-6f4cfc5d87-bmnl8
kubernetes-dashboard docker://a73ef6b30fb87826a4a71ba428a01511278a759d69fade82ddd654911ec3f14f
10.0.2.15 kube-system storage-provisioner
storage-provisioner docker://51eaf90bc3ae11baa354a436e366730c19206c73743c6517a0ad9eb8f0b89896
Please note that this lists the container IDs of the pod containers, except the pause container IDs. Is there an API method to also get/list the container IDs of the pause containers in pods?
I tried searching for things like "kubernetes api pod pause container id" ... but I did not get any useful answers, except the usual API results for containerStatuses, etc.
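The container IDs in the listing above have the form runtime://id. When they need to be matched against what the container engine itself reports, a small helper can split the two parts. This is a minimal sketch; split_container_id is a hypothetical name, and the example ID is the coredns one from the output above.

```python
def split_container_id(container_id):
    """Split a 'runtime://id' string into (runtime, id)."""
    runtime, sep, raw_id = container_id.partition("://")
    if not sep or not runtime or not raw_id:
        raise ValueError("unexpected container ID format: %r" % (container_id,))
    return runtime, raw_id


if __name__ == "__main__":
    runtime, raw_id = split_container_id(
        "docker://8b0c3c67532ee2d7d16958a33cb942d5bd09ed37ded1d570830b5f7e5f7a09ab"
    )
    # The first 12 characters are the short form that docker ps prints by default.
    print(runtime, raw_id[:12])
```

Note that the container ID field can be empty for containers that have not been created yet, so callers may want to check for that before splitting.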
After some research into how Kubernetes' Docker shim works, it's clear that the pause containers are not visible at the Kubernetes cluster API. That's because pause containers are an artefact required with some container engines, such as Docker, but not in others (CRI-O if I'm not mistaken).
However, when the low-level Docker container view is necessary and needs to be related to the Kubernetes node-scheduled pod view, then the predictable Docker container naming scheme used in the Kubernetes Docker shim can be used. The shim creates the container names in the form of k8s_container_pod_namespace_uid_attempt with an optional _random suffix in case of hitting the Docker <=1.11 name conflict bug.
k8s is the fixed prefix which triggers the shim to regard this container as a Kubernetes container.
container is the name as specified in the pod spec. Please note that Kubernetes only allows lowercase a-z, 0-9, and dashes. Pause containers thus get the "reserved" name "POD" in all-uppercase.
pod is the pod name.
namespace is the namespace name as assigned, or "default".
uid is the pod UID, in varying formats.
attempt is a counter starting from 0 that the shim needs in order to correctly manage pod updates, that is, container cleanup, etc.
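The naming scheme described above can be parsed with a few lines of Python. This is a sketch under stated assumptions, not the shim's own code: parse_shim_name and is_pause_container are hypothetical helpers, the sketch assumes none of the fields contains an underscore, it strips the leading slash that the Docker API puts in front of container names, and the UID in the example is a placeholder.

```python
from typing import NamedTuple, Optional


class ShimName(NamedTuple):
    container: str
    pod: str
    namespace: str
    uid: str
    attempt: int
    suffix: Optional[str]


def parse_shim_name(name):
    """Parse k8s_container_pod_namespace_uid_attempt[_random]; return None if it does not match."""
    parts = name.lstrip("/").split("_")
    if len(parts) not in (6, 7) or parts[0] != "k8s":
        return None
    try:
        attempt = int(parts[5])
    except ValueError:
        return None
    suffix = parts[6] if len(parts) == 7 else None
    return ShimName(parts[1], parts[2], parts[3], parts[4], attempt, suffix)


def is_pause_container(name):
    """Pause containers carry the reserved container name 'POD'."""
    parsed = parse_shim_name(name)
    return parsed is not None and parsed.container == "POD"


if __name__ == "__main__":
    # Placeholder UID, for illustration only.
    example = "k8s_POD_coredns-c4cffd6dc-7lsdn_kube-system_00000000-0000-0000-0000-000000000000_0"
    print(parse_shim_name(example), is_pause_container(example))
```

Feeding the names reported by the Docker engine on a node through is_pause_container would then pick out the pause containers, and the pod and namespace fields relate them back to the pods listed by the cluster API.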
See also:
container names implementation
name of pause pod
Docker name conflict bug
I am running a local Kubernetes cluster using the ./hack/local-up-cluster.sh script. When my firewall is off, all the containers in kube-dns are running:
```
# cluster/kubectl.sh get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-dns-73328275-87g4d 3/3 Running 0 45s
```
But when the firewall is on, only 2 of the 3 containers are running:
```
# cluster/kubectl.sh get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-dns-806549836-49v7d 2/3 Running 0 45s
```
After investigating in detail, it turns out the pod is failing because the dnsmasq container is not running:
```
7m 7m 1 kubelet, 127.0.0.1 spec.containers{dnsmasq} Normal Killing Killing container with id docker://41ef024a0610463e04607665276bb64e07f589e79924e3521708ca73de33142c:pod "kube-dns-806549836-49v7d_kube-system(d5729c5c-24da-11e7-b166-52540083b23a)" container "dnsmasq" is unhealthy, it will be killed and re-created.
```
Can you help me work out how to run the dnsmasq container with the firewall on, and what exactly I would need to change? TIA.
Turns out my kube-dns service has no endpoints, any idea why that is?
You can flush the iptables rules (iptables -F) before starting your cluster; that may solve your problem.
Hoping someone can help.
I have a 3x node CoreOS cluster running Kubernetes. The nodes are as follows:
192.168.1.201 - Controller
192.168.1.202 - Worker Node
192.168.1.203 - Worker Node
The cluster is up and running, and I can run the following commands:
> kubectl get nodes
NAME STATUS AGE
192.168.1.201 Ready,SchedulingDisabled 1d
192.168.1.202 Ready 21h
192.168.1.203 Ready 21h
> kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-192.168.1.201 1/1 Running 2 1d
kube-controller-manager-192.168.1.201 1/1 Running 4 1d
kube-dns-v20-h4w7m 2/3 CrashLoopBackOff 15 23m
kube-proxy-192.168.1.201 1/1 Running 2 1d
kube-proxy-192.168.1.202 1/1 Running 1 21h
kube-proxy-192.168.1.203 1/1 Running 1 21h
kube-scheduler-192.168.1.201 1/1 Running 4 1d
As you can see, the kube-dns service is not running correctly. It keeps restarting and I am struggling to understand why. Any help in debugging this would be greatly appreciated (or pointers to where to read about debugging this). Running kubectl logs does not bring anything back... I'm not sure whether the addons function differently from standard pods.
Running a kubectl describe pods, I can see the containers are killed due to being unhealthy:
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Created Created container with docker id 189afaa1eb0d; Security:[seccomp=unconfined]
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Started Started container with docker id 189afaa1eb0d
14m 14m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Killing Killing container with docker id 189afaa1eb0d: pod "kube-dns-v20-h4w7m_kube-system(3a545c95-ea19-11e6-aa7c-52540021bfab)" container "kubedns" is unhealthy, it will be killed and re-created
Please find a full output of this command as a github gist here: https://gist.github.com/mehstg/0b8016f5398a8781c3ade8cf49c02680
Thanks in advance!
If you installed your cluster with kubeadm you should add a pod network after installing.
If you choose flannel as your pod network, you should have this argument in your init command kubeadm init --pod-network-cidr 10.244.0.0/16.
The flannel YAML file can be found in the CoreOS flannel repo.
All you need to do if your cluster was initialized properly (read above), is to run kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Once this is up and running (it will create pods on every node), your kube-dns pod should come up.
If you need to reset your installation (for example to add the argument to kubeadm init), you can use kubeadm reset on all nodes.
Normally, you would run the init command on the master, then add a pod network, and then add your other nodes.
This is all described in more detail in the Getting started guide, step 3/4 regarding the pod network.
As your gist shows, your pod network seems to be broken. You are using a custom pod network with 10.10.10.X. You should communicate these IPs to all components.
Please check that there is no collision with other existing networks.
I recommend setting it up with Calico, as this was what got CoreOS k8s up and working for me.
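A collision check like the one suggested above can be sketched with Python's ipaddress module. The networks here are assumptions for illustration: 10.10.10.0/24 stands in for the 10.10.10.X pod network, 192.168.1.0/24 for the node network from the question, and 10.10.0.0/16 is a made-up range that happens to collide. find_overlaps is a hypothetical helper.

```python
import ipaddress


def find_overlaps(pod_cidr, other_cidrs):
    """Return the entries of other_cidrs that overlap with pod_cidr."""
    pod_net = ipaddress.ip_network(pod_cidr)
    return [cidr for cidr in other_cidrs
            if pod_net.overlaps(ipaddress.ip_network(cidr))]


if __name__ == "__main__":
    # Assumed example networks, not taken from a real configuration.
    existing = ["192.168.1.0/24", "10.10.0.0/16"]
    print(find_overlaps("10.10.10.0/24", existing))
```

An empty result means none of the listed networks collides with the pod network. The list of existing networks (node network, service range, anything routed on the hosts) has to be collected by hand.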
After following the steps in the official kubeadm doc with flannel networking, I ran into a similar issue:
http://janetkuo.github.io/docs/getting-started-guides/kubeadm/
It appears that the networking pods get stuck in error states:
kube-dns-xxxxxxxx-xxxvn (rpc error)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
In my case it was related to RBAC permission errors and was resolved by running
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
Afterwards, all kube-system pods went into running states. The upstream issue is discussed on github https://github.com/kubernetes/kubernetes/issues/44029
I've installed Kubernetes 1.2.4 on 3 machines (1 master/minion, 2 minions) and installed the SkyDNS addon. After fixing SSL cert problems, I now have SkyDNS working. But kubelet still says that I didn't set cluster-dns and cluster-domain.
(see edits at the bottom)
But you can see --cluster-dns=192.168.0.10 --cluster-domain=cluster.local:
ps ax | grep kubelet
18717 ? Ssl 0:04 /opt/kubernetes/bin/kubelet --logtostderr=true --v=0 --address=0.0.0.0 --port=10250 --hostname-override=k8s-minion-1 --api-servers=http://k8s-master:8080 --allow-privileged=false --cluster-dns=192.168.0.10 --cluster-domain=cluster.local
Launching this pod:
apiVersion: v1
kind: Pod
metadata:
name: busybox
namespace: default
spec:
containers:
- image: busybox
command:
- sleep
- "3600"
imagePullPolicy: IfNotPresent
name: busybox
restartPolicy: Always
I see:
kubectl describe pod busybox
7m 7m 2 {kubelet k8s-master.XXX} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
I restarted the kubelet services before launching this pod, and I have no other pod running.
If I launch docker container using "--dns" option:
docker run --rm -it --dns 192.168.0.10 busybox nslookup cluster.local
Server: 192.168.0.10
Address 1: 192.168.0.10
Name: cluster.local
Address 1: 192.168.0.10
Address 2: 172.16.50.2
Address 3: 192.168.0.1
Address 4: 172.16.96.3
docker run --rm -it --dns 192.168.0.10 busybox cat /etc/resolv.conf
search XXX YYYY
nameserver 192.168.0.10
That's absolutely normal (I've hidden my client DNS).
But the pod says something else:
kubectl exec busybox -- nslookup cluster.local
Server: XXX.YYY.XXX.YYY
Address 1: XXX.YYYY.XXXX.YYY XXX.domain.fr
nslookup: can't resolve 'cluster.local'
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
I tried to set "--dns" option to the docker daemon, but the error is the same.
See these logs:
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-dns-v11-osikn 4/4 Running 0 13m
And:
kubectl logs kube-dns-v11-osikn kube2sky --namespace=kube-system
I0621 15:44:48.168080 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0621 15:44:49.170404 1 kube2sky.go:529] Using https://192.168.0.1:443 for kubernetes master
I0621 15:44:49.170422 1 kube2sky.go:530] Using kubernetes API <nil>
I0621 15:44:49.170823 1 kube2sky.go:598] Waiting for service: default/kubernetes
I0621 15:44:49.209691 1 kube2sky.go:660] Successfully added DNS record for Kubernetes service.
"Using kubernetes API <nil>" is a problem, isn't it ?
edit: I forced kube-master-url in the pod to let kube2sky contact the master.
kubectl logs kube-dns-v11-osikn skydns --namespace=kube-system
2016/06/21 15:44:50 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns/config) [10]
2016/06/21 15:44:50 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/06/21 15:44:50 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
Note this too:
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox 1/1 Running 0 17m
kube-system kube-dns-v11-osikn 4/4 Running 0 18m
So I've got no problem with skydns.
I'm sure that the problem comes from kubelet. I've tried removing /var/lib/kubelet and restarting the entire cluster. I've also tried restarting the kubelet services before and after installing DNS. I changed the Docker configuration and removed the "--dns" option afterward, and I get the same behaviour: Docker + DNS is OK, but kubelet gives a MissingClusterDNS error saying that kubelet has no cluster DNS configured.
So please... Help (one more time :) )
EDITS:
- kube2sky no longer complains about the <nil> API version now that I have forced the kube2sky option
- I can force nslookup to use my sky DNS:
kubectl exec busybox -- nslookup kubernetes.default.svc.cluster.local 192.168.0.10
Server: 192.168.0.10
Address 1: 192.168.0.10
Name: kubernetes.default.svc.cluster.local
Address 1: 192.168.0.1
But the "MissingClusterDNS" error remains on pod creation, as if kubelet ignores the startup options "--cluster-dns" and "--cluster-domain".
#Brendan Burns:
kubectl get services --namespace=kube-system
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns 192.168.0.10 <none> 53/UDP,53/TCP 12m
I finally solved my problem... Shame on me (or not).
I went through the kubelet sources to understand what happens, and I found the cause.
In the "kubelet" file, I set:
KUBE_ARGS="--cluster-dns=10.10.0.10 --cluster-domain=cluster.local"
And the log statement I added to the source shows that the "cluster-dns" option has this value:
10.10.0.10 --cluster-domain=cluster.local
That's mainly because the config file is interpreted by systemd as "bash environment vars", so KUBE_ARGS is passed as "one argument" and is badly parsed by the kubelet service.
The solution is to split the variable in two and change the kubelet.service file to use both variables. After a call to systemctl daemon-reload; systemctl restart kubelet everything is OK.
I opened an issue here: https://github.com/kubernetes/kubernetes/issues/27722 where I explain that the comment in example config file is ambiguous and/or the arguments are not parsed as expected.
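The parsing problem described above can be illustrated with a toy flag parser. This is a simplified model and not kubelet's actual flag handling: parse_flags is a hypothetical helper that only understands --name=value entries, and the sketch assumes the whole KUBE_ARGS value reaches the process as a single argv entry, as the logged value suggests.

```python
import shlex


def parse_flags(argv):
    """Toy parser: every argv entry is expected to look like '--name=value'."""
    flags = {}
    for arg in argv:
        name, _, value = arg.lstrip("-").partition("=")
        flags[name] = value
    return flags


KUBE_ARGS = "--cluster-dns=10.10.0.10 --cluster-domain=cluster.local"

# The whole variable arrives as ONE argument: everything after the first '='
# becomes the value of cluster-dns, and cluster-domain is never set.
as_one_argument = parse_flags([KUBE_ARGS])

# The same text split into separate words, which is what was intended.
as_two_arguments = parse_flags(shlex.split(KUBE_ARGS))

if __name__ == "__main__":
    print(as_one_argument)
    print(as_two_arguments)
```

In systemd unit files, ${VAR} expands to a single argument while $VAR is split on whitespace into separate words, so which behaviour you get depends on how kubelet.service references the variable. Splitting the settings into two variables, as described above, avoids the ambiguity either way.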
Have you created the DNS service with the right IP address?
What does kubectl get services --namespace=kube-system show?