Cannot get kube-dns to start on Kubernetes

Hoping someone can help.
I have a 3-node CoreOS cluster running Kubernetes. The nodes are as follows:
192.168.1.201 - Controller
192.168.1.202 - Worker Node
192.168.1.203 - Worker Node
The cluster is up and running, and I can run the following commands:
> kubectl get nodes
NAME            STATUS                     AGE
192.168.1.201   Ready,SchedulingDisabled   1d
192.168.1.202   Ready                      21h
192.168.1.203   Ready                      21h
> kubectl get pods --namespace=kube-system
NAME                                    READY   STATUS             RESTARTS   AGE
kube-apiserver-192.168.1.201            1/1     Running            2          1d
kube-controller-manager-192.168.1.201   1/1     Running            4          1d
kube-dns-v20-h4w7m                      2/3     CrashLoopBackOff   15         23m
kube-proxy-192.168.1.201                1/1     Running            2          1d
kube-proxy-192.168.1.202                1/1     Running            1          21h
kube-proxy-192.168.1.203                1/1     Running            1          21h
kube-scheduler-192.168.1.201            1/1     Running            4          1d
As you can see, the kube-dns service is not running correctly. It keeps restarting and I am struggling to understand why. Any help in debugging this would be greatly appreciated (or pointers to where to read about debugging this). Running kubectl logs does not bring anything back... not sure if the addons function differently from standard pods.
Running a kubectl describe pods, I can see the containers are killed due to being unhealthy:
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Created Created container with docker id 189afaa1eb0d; Security:[seccomp=unconfined]
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Started Started container with docker id 189afaa1eb0d
14m 14m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Killing Killing container with docker id 189afaa1eb0d: pod "kube-dns-v20-h4w7m_kube-system(3a545c95-ea19-11e6-aa7c-52540021bfab)" container "kubedns" is unhealthy, it will be killed and re-created
Please find a full output of this command as a github gist here: https://gist.github.com/mehstg/0b8016f5398a8781c3ade8cf49c02680
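(On the logs question: kube-dns is a multi-container pod, so kubectl logs usually needs an explicit container name via -c. A rough sketch; kubedns and dnsmasq come from the pod spec, and the third container's name varies between kube-dns versions and can be read from kubectl describe.)
```
# pull logs per container of the kube-dns pod
kubectl logs kube-dns-v20-h4w7m --namespace=kube-system -c kubedns
kubectl logs kube-dns-v20-h4w7m --namespace=kube-system -c dnsmasq
```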
Thanks in advance!

If you installed your cluster with kubeadm, you should add a pod network after installing.
If you choose flannel as your pod network, you should include this argument in your init command: kubeadm init --pod-network-cidr 10.244.0.0/16.
The flannel YAML file can be found in the CoreOS flannel repo.
All you need to do, if your cluster was initialized properly (read above), is to run kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Once this is up and running (it will create pods on every node), your kube-dns pod should come up.
If you need to reset your installation (for example to add the argument to kubeadm init), you can use kubeadm reset on all nodes.
Normally, you would run the init command on the master, then add a pod network, and then add your other nodes.
This is all described in more detail in the Getting started guide, step 3/4 regarding the pod network.
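If it helps, the rough end-to-end sequence looks something like this (a sketch; the exact kubeadm join line is printed by kubeadm init):
```
# reset any half-configured installation first (run on every node)
kubeadm reset

# on the master: initialize with the pod-network CIDR that flannel expects
kubeadm init --pod-network-cidr 10.244.0.0/16

# still on the master: install the flannel pod network
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# finally, join the workers using the exact join command printed by kubeadm init
# kubeadm join --token <token> <master-ip>:<port>
```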

As your gist shows, your pod network seems to be broken. You are using a custom pod network with 10.10.10.X, and you should communicate these IPs to all components.
Please check that there is no collision with other existing networks.
I recommend setting up with Calico, as that was the solution that got my CoreOS Kubernetes cluster working.
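If you want to try Calico, the usual pattern is a single manifest apply; this is only a sketch and the URL is an assumption, so take the manifest matching your Kubernetes version from the Calico docs:
```
# apply a Calico manifest (example URL; pick the one for your cluster version)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

# then watch the calico and kube-dns pods come up
kubectl get pods -n kube-system -w
```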

After following the steps in the official kubeadm doc with flannel networking, I ran into a similar issue
http://janetkuo.github.io/docs/getting-started-guides/kubeadm/
It appears that the networking pods get stuck in error states:
kube-dns-xxxxxxxx-xxxvn (rpc error)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
In my case it was related to RBAC permission errors and was resolved by running
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
Afterwards, all kube-system pods went into the Running state. The upstream issue is discussed on GitHub: https://github.com/kubernetes/kubernetes/issues/44029
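For reference, a quick way to confirm the fix took effect (pod names will differ per cluster):
```
# watch until the kube-flannel-ds-* and kube-dns-* pods report Running
kubectl get pods -n kube-system -w

# if a flannel pod keeps crashing, its logs usually name the missing RBAC permission
kubectl logs -n kube-system <kube-flannel-ds-pod-name>
```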


Why is kubectl cluster-info running on the control plane and not the master node
And on the control plane it is running on a specific IP address, https://192.168.49.2:8443, and not on localhost or 127.0.0.1.
Running the following command in terminal:
minikube start --driver=docker
😄 minikube v1.20.0 on Ubuntu 16.04
✨ Using the docker driver based on user configuration
🎉 minikube 1.21.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.21.0
💡 To disable this notice, run: 'minikube config set WantUpdateNotification false'
👍 Starting control plane node minikube in cluster minikube
🚜 Pulling base image ...
> gcr.io/k8s-minikube/kicbase...: 358.10 MiB / 358.10 MiB 100.00% 797.51 K
❗ minikube was unable to download gcr.io/k8s-minikube/kicbase:v0.0.22, but successfully downloaded kicbase/stable:v0.0.22 as a fallback image
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🐳 Preparing Kubernetes v1.20.2 on Docker 20.10.6 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
kubectl cluster-info
Kubernetes control plane is running at https://192.168.49.2:8443
KubeDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The Kubernetes project is making an effort to move away from wording that can be considered offensive, with one concrete recommendation being renaming master to control-plane. In other words, control-plane and master mean essentially the same thing, and the goal is to switch the terminology to use control-plane exclusively going forward. (More info in this answer)
The kubectl command is a command-line interface that executes on a client (i.e. your computer) and interacts with the cluster through the control-plane.
The IP address you are seeing through cluster-info is the IP address through which you reach the control-plane.
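You can confirm which endpoint kubectl is talking to straight from your kubeconfig; a small sketch (the jsonpath expression is just one way to pull it out):
```
# the API server endpoint kubectl talks to, read from the active kubeconfig context
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'

# for minikube, the node IP behind that endpoint can also be printed directly
minikube ip
```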

Could not get apiVersions from Kubernetes: Unable to retrieve the complete list of server APIs

While trying to deploy an application got an error as below:
Error: UPGRADE FAILED: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
The output of kubectl api-resources lists some resources, along with the same error at the end.
Environment: Azure Cloud, AKS Service
Solution:
The steps I followed are:
kubectl get apiservices: if the metrics-server service is down with the error CrashLoopBackOff, try step 2; otherwise just restart the metrics-server service using kubectl delete apiservice/"service_name". For me it was v1beta1.metrics.k8s.io.
kubectl get pods -n kube-system: here I found that pods like metrics-server and kubernetes-dashboard were down because the main CoreDNS pod was down.
For me it was:
NAME                          READY   STATUS             RESTARTS   AGE
pod/coredns-85577b65b-zj2x2   0/1     CrashLoopBackOff   7          13m
Use kubectl describe pod/"pod_name" to check the error in the CoreDNS pod. If it is down because of /etc/coredns/Corefile:10 - Error during parsing: Unknown directive proxy, then use forward instead of proxy in the YAML where the CoreDNS config lives, because the CoreDNS 1.5.x image no longer supports the proxy keyword, as sketched below.
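A minimal sketch of that change (the surrounding Corefile content stays as it is; the ConfigMap is normally called coredns in kube-system):
```
# open the CoreDNS config for editing
kubectl -n kube-system edit configmap coredns

# inside the Corefile block, replace the line
#     proxy . /etc/resolv.conf
# with
#     forward . /etc/resolv.conf

# then delete the coredns pods so they pick up the new config
kubectl -n kube-system delete pod -l k8s-app=kube-dns
```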
This error commonly happens when your metrics-server pod is not reachable by the master node. Possible reasons are:
The metrics-server pod is not running. This is the first thing you should check. Then look at the logs of the metrics-server pod to check whether it has permission issues while trying to fetch metrics.
Communication between the master and worker nodes is broken; try to confirm it.
Try running kubectl top nodes and kubectl top pods -A to see if metrics-server responds.
From these points you can proceed further.
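A few commands for those checks (the metrics-server deployment name and label are the usual defaults and may differ on your cluster):
```
# is the APIService registered and healthy?
kubectl get apiservice v1beta1.metrics.k8s.io

# is the pod running, and what do its logs say?
kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl -n kube-system logs deploy/metrics-server

# end-to-end check: these only return data when metrics-server answers
kubectl top nodes
kubectl top pods -A
```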

exec user process caused "exec format error" during setup

I'm trying to install haproxy-ingress under Kubernetes v1.18 (hosted on a Raspberry Pi).
The master node has been correctly labeled with role=ingress-controller.
The kubectl create also works fine:
# kubectl create -f https://haproxy-ingress.github.io/resources/haproxy-ingress.yaml
namespace/ingress-controller created
serviceaccount/ingress-controller created
clusterrole.rbac.authorization.k8s.io/ingress-controller created
role.rbac.authorization.k8s.io/ingress-controller created
clusterrolebinding.rbac.authorization.k8s.io/ingress-controller created
rolebinding.rbac.authorization.k8s.io/ingress-controller created
configmap/haproxy-ingress created
daemonset.apps/haproxy-ingress created
But then the pod goes into a crash loop:
# kubectl get pods -n ingress-controller -o wide
NAME                    READY   STATUS             RESTARTS   AGE   IP              NODE              NOMINATED NODE   READINESS GATES
haproxy-ingress-dpcvc   0/1     CrashLoopBackOff   1          30s   192.168.1.101   purple.cloudlet   <none>           <none>
And the logs shows that error:
# kubectl logs haproxy-ingress-dpcvc -n ingress-controller
standard_init_linux.go:211: exec user process caused "exec format error"
Does anyone experience something similar? Can this be related to the ARM (32-bit) architecture of the Raspbian image that I'm using?
Raspberry Pis run ARM architectures, which unfortunately are not supported by haproxy-ingress.
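A quick way to confirm the architecture mismatch; the image path below is taken as an example from the haproxy-ingress manifest and may differ for your version:
```
# architecture reported by each node (arm / armv7l means 32-bit ARM)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'

# or, on the Raspberry Pi itself
uname -m

# list the platforms the image is actually published for
# (may require experimental CLI features on older Docker versions)
docker manifest inspect quay.io/jcmoraisjr/haproxy-ingress | grep -A2 platform
```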

Error: container "dnsmasq" is unhealthy, it will be killed and re-created while running local cluster in kubernetes

I am running Kubernetes local cluster with using ./hack/local-up-cluster.sh script. Now, when my firewall is off, all the containers in kube-dns are running:
```
# cluster/kubectl.sh get pods --all-namespaces
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
kube-system   kube-dns-73328275-87g4d   3/3     Running   0          45s
```
But when firewall is on, I can see only 2 containers are running:
```
# cluster/kubectl.sh get pods --all-namespaces
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE
kube-system   kube-dns-806549836-49v7d   2/3     Running   0          45s
```
After investigating in detail, it turns out the pod is failing because the dnsmasq container is not running:
```
7m 7m 1 kubelet, 127.0.0.1 spec.containers{dnsmasq} Normal Killing Killing container with id docker://41ef024a0610463e04607665276bb64e07f589e79924e3521708ca73de33142c:pod "kube-dns-806549836-49v7d_kube-system(d5729c5c-24da-11e7-b166-52540083b23a)" container "dnsmasq" is unhealthy, it will be killed and re-created.
```
Can you help me figure out how to run the dnsmasq container with the firewall on, and what exactly I would need to change? TIA.
It turns out my kube-dns service has no endpoints; any idea why that is?
You can flush the iptables rules (iptables -F) before starting your cluster; that can solve the problem.
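Before flushing everything, it may be worth seeing what the firewall is actually blocking; a few checks (a sketch, nothing cluster-specific assumed):
```
# does the kube-dns service have endpoints at all?
kubectl get endpoints kube-dns -n kube-system

# inspect the current filter rules; look for DROP/REJECT entries that could
# block pod-to-pod traffic or the kubelet health checks
iptables -L -n -v

# the blunt fix mentioned above: flush all rules
iptables -F
```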

Docker Registry Stays Pending After Deployment

I have installed OpenShift Enterprise as per the online guide (quick installation) but I'm stuck at deploying the registry.
https://docs.openshift.com/enterprise/3.0/admin_guide/install/docker_registry.html#deploy-registry
I create the registry
oadm registry --config=/etc/openshift/master/admin.kubeconfig \
--credentials=/etc/openshift/master/openshift-registry.kubeconfig \
--images='registry.access.redhat.com/openshift3/ose-${component}:${version}'
I check that it was configured
[justin#172 ~]$ oc get se docker-registry
NAME              LABELS                    SELECTOR                  IP(S)            PORT(S)
docker-registry   docker-registry=default   docker-registry=default   172.30.144.220   5000/TCP
But it never runs; it stays Pending:
[justin#172 ~]$ oc get pods
NAME                       READY   STATUS    RESTARTS   AGE
docker-registry-1-deploy   0/1     Pending   0          2h
I try to get some more info
[justin#172 ~]$ oc logs docker-registry-1-deploy
[justin#172 ~]$
but the logs command returns nothing
I had attempted an install with one node sharing the machine with the master.
My nodes looked like this:
[root#master ~]# oc get nodes
NAME                  LABELS                                        STATUS
master.mydomain.com   kubernetes.io/hostname=master.mydomain.com    Ready,SchedulingDisabled
Note: SchedulingDisabled
I ran this command:
oc describe pod docker-registry-1-deploy
It gave the reason the registry was not being deployed: there were no nodes to schedule the deployment on. Just to get things going quickly, I performed the install again and added a node on another VM.
Then
[root#master ~]# oc get nodes
NAME                  LABELS                                       STATUS
master.mydomain.com   kubernetes.io/hostname=master.mydomain.com   Ready,SchedulingDisabled
node1.mydomain.com    kubernetes.io/hostname=node1.mydomain.com    Ready
and I managed to successfully deploy the registry.
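If you would rather keep the single-machine setup, marking the master schedulable may also work; a sketch, assuming the oadm manage-node command of OpenShift 3.x:
```
# allow pods to be scheduled on the master node
oadm manage-node master.mydomain.com --schedulable=true

# confirm the SchedulingDisabled flag is gone
oc get nodes
```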
