kube-dns stays in ContainerCreating status

I have 5 machines running Ubuntu 16.04.1 LTS. I want to set them up as a Kubernetes cluster. I'm trying to follow this getting started guide, which uses kubeadm.
It all worked fine until step 3/4, Installing a pod network. I looked at their add-on page for a pod network and chose the flannel overlay network. I copied the YAML file to the machine and executed:
root@up01:/home/up# kubectl apply -f flannel.yml
Which resulted in:
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created
So I thought it went OK, but when I display all the pods:
root@up01:/etc/kubernetes/manifests# kubectl get pods --all-namespaces
NAMESPACE     NAME                              READY   STATUS              RESTARTS   AGE
kube-system   dummy-2088944543-d5f50            1/1     Running             0          50m
kube-system   etcd-up01                         1/1     Running             0          48m
kube-system   kube-apiserver-up01               1/1     Running             0          50m
kube-system   kube-controller-manager-up01      1/1     Running             0          49m
kube-system   kube-discovery-1769846148-jvx53   1/1     Running             0          50m
kube-system   kube-dns-2924299975-prlgf         0/4     ContainerCreating   0          49m
kube-system   kube-flannel-ds-jb1df             2/2     Running             0          32m
kube-system   kube-proxy-rtcht                  1/1     Running             0          49m
kube-system   kube-scheduler-up01               1/1     Running             0          49m
The problem is that kube-dns stays in the ContainerCreating state. I don't know what to do.

It is very likely that you missed this critical piece of information from the guide:
If you want to use flannel as the pod network, specify
--pod-network-cidr 10.244.0.0/16 if you’re using the daemonset manifest below.
If you omit this, kube-dns will never leave the ContainerCreating status.
Your kubeadm init command should be:
# kubeadm init --pod-network-cidr 10.244.0.0/16
and not
# kubeadm init
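If the cluster was already initialized without that flag, a minimal sketch of redoing it (note that kubeadm reset wipes the existing control-plane state on this node, and worker nodes will need to kubeadm join again):
# kubeadm reset
# kubeadm init --pod-network-cidr 10.244.0.0/16
# kubectl apply -f flannel.yml    # re-apply the flannel manifest afterwards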

Did you try restarting NetworkManager? It worked for me. It also worked when I disabled IPv6.
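For reference, a rough sketch of those two workarounds on Ubuntu 16.04 (assuming systemd-managed NetworkManager; the sysctl change only lasts until reboot):
# restart NetworkManager
sudo systemctl restart NetworkManager
# temporarily disable IPv6
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1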

Related

Rook and Ceph on Kubernetes

I am new to Kubernetes. I need to integrate Rook and Ceph and add NFS as block storage. Does anyone have any working examples? I followed this document, https://www.digitalocean.com/community/tutorials/how-to-set-up-a-ceph-cluster-within-kubernetes-using-rook, and I am getting errors (pods stuck at ContainerCreating, stuck at pod initializing) while creating the Ceph cluster with Rook on Kubernetes. Any help would be appreciated.
kubectl get pod -n rook-ceph
NAME                                            READY   STATUS              RESTARTS   AGE
csi-cephfsplugin-provisioner-5bcd46f965-42f9r   0/5     ContainerCreating   0          12m
csi-cephfsplugin-provisioner-5bcd46f965-zszwz   5/5     Running             0          12m
csi-cephfsplugin-xcswb                          3/3     Running             0          12m
csi-cephfsplugin-zwl9x                          3/3     Running             0          12m
csi-rbdplugin-4mh9x                             3/3     Running             0          12m
csi-rbdplugin-nlcjr                             3/3     Running             0          12m
csi-rbdplugin-provisioner-6658cf554c-4xx9f      6/6     Running             0          12m
csi-rbdplugin-provisioner-6658cf554c-62xc2      0/6     ContainerCreating   0          12m
rook-ceph-detect-version-bwcmp                  0/1     Init:0/1            0          9m18s
rook-ceph-operator-5dc456cdb6-n4tgm             1/1     Running             0          13m
rook-discover-l2r27                             1/1     Running             0          13m
rook-discover-rxkv4                             0/1     ContainerCreating   0          13m
Use kubectl describe pod <name> -n rook-ceph to see the list of events at the bottom of the output. This will show where the pods get stuck.
It may also be that one of your nodes is in a bad state, since several pod replicas are failing to start. You can confirm this by running
kubectl get pod -n rook-ceph -o wide | grep -v Running
Possibly all the failing pods are scheduled on the same node. If that is the case, you can inspect the problematic node with
kubectl describe node [node]
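For reference, a couple of related checks that are often useful here (plain kubectl, nothing Rook-specific):
# quick look at node health; a NotReady node points to a node-level problem
kubectl get nodes
# recent events in the rook-ceph namespace, oldest first
kubectl get events -n rook-ceph --sort-by='.lastTimestamp'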

Can't do 'helm install' on cluster. Tiller was installed by GitLab

I created a cluster in GKE using GitLab and installed Helm & Tiller and some other things like ingress and the GitLab Runner through GitLab's interface. But when I try to install something using helm from gcloud, I get "Error: transport is closing".
I did gcloud container clusters get-credentials ....
$ kubectl get pods --all-namespaces
NAMESPACE               NAME                                                     READY   STATUS    RESTARTS   AGE
default                 jaeger-deployment-59ffb979c8-lmjk5                       1/1     Running   0          17h
gitlab-managed-apps     certmanager-cert-manager-6c8cd9f9bf-67wnh                1/1     Running   0          17h
gitlab-managed-apps     ingress-nginx-ingress-controller-75c4d99549-x66n4        1/1     Running   0          21h
gitlab-managed-apps     ingress-nginx-ingress-default-backend-6f58fb5f56-pvv2f   1/1     Running   0          21h
gitlab-managed-apps     prometheus-kube-state-metrics-6584885ccf-hr8fw           1/1     Running   0          22h
gitlab-managed-apps     prometheus-prometheus-server-69b9f444df-htxsq            2/2     Running   0          22h
gitlab-managed-apps     runner-gitlab-runner-56798d9d9d-nljqn                    1/1     Running   0          22h
gitlab-managed-apps     tiller-deploy-74f5d65d77-xk6cc                           1/1     Running   0          22h
kube-system             heapster-v1.6.0-beta.1-7bdb4fd8f9-t8bq9                  2/2     Running   0          22h
kube-system             kube-dns-7549f99fcc-bhg9t                                4/4     Running   0          22h
kube-system             kube-dns-autoscaler-67c97c87fb-4vz9t                     1/1     Running   0          22h
kube-system             kube-proxy-gke-cluster2-pool-1-05abcbc6-0s6j             1/1     Running   0          20h
kube-system             kube-proxy-gke-cluster2-pool-2-67e57524-ht5p             1/1     Running   0          22h
kube-system             metrics-server-v0.2.1-fd596d746-289nd                    2/2     Running   0          22h
visual-react-10450736   production-847c7d879c-z4h5t                              1/1     Running   0          22h
visual-react-10450736   production-postgres-64cfcf9464-jr74c                     1/1     Running   0          22h
$ ./helm install stable/wordpress --tiller-namespace gitlab-managed-apps --name wordpress
E0127 10:27:29.790366 418 portforward.go:331] an error occurred forwarding 39113 -> 44134: error forwarding port 44134 to pod 86b33bdc7bc30c08d98fe44c0772517c344dd1bdfefa290b46e82bf84959cb6f, uid : exit status 1: 2019/01/27 04:57:29 socat[11124] E write(5, 0x14ed120, 186): Broken pipe
Error: transport is closing
Another one
$ ./helm install incubator/jaeger --tiller-namespace gitlab-managed-apps --name jaeger --set elasticsearch.rbac.create=true --set provisionDataStore.cassandra=false --set provisionDataStore.elasticsearch=true --set storage.type=elasticsearch
E0127 10:30:24.591751 429 portforward.go:331] an error occurred forwarding 45597 -> 44134: error forwarding port 44134 to pod 86b33bdc7bc30c08d98fe44c0772517c344dd1bdfefa290b46e82bf84959cb6f, uid : exit status 1: 2019/01/27 05:00:24 socat[13937] E write(5, 0x233d120, 8192): Connection reset by peer
Error: transport is closing
I tried forwarding the port myself, but it never returns to the prompt; it just hangs.
kubectl port-forward --namespace gitlab-managed-apps tiller-deploy 39113:44134
Apparently installing anything from GitLab's UI uses Helm, and those installs do not fail. Yet doing the same from the shell fails. Please help me out.
Thanks in advance.
I know it's late, but I'll share this just in case someone else struggles with the same issue. I found an answer in the GitLab forums.
The trick is to export and decode the certificates from the Tiller service account and pass them as arguments to helm, like this:
helm list --tiller-connection-timeout 30 --tls --tls-ca-cert tiller-ca.crt --tls-cert tiller.crt --tls-key tiller.key --all --tiller-namespace gitlab-managed-apps
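For completeness, here is one way to obtain those three files, assuming GitLab stored the Tiller TLS material in a secret called tiller-secret in the gitlab-managed-apps namespace (the secret name may differ in your setup):
# extract and decode the CA cert, client cert and client key from the Tiller secret
kubectl get secret tiller-secret -n gitlab-managed-apps -o jsonpath='{.data.ca\.crt}'  | base64 --decode > tiller-ca.crt
kubectl get secret tiller-secret -n gitlab-managed-apps -o jsonpath='{.data.tls\.crt}' | base64 --decode > tiller.crt
kubectl get secret tiller-secret -n gitlab-managed-apps -o jsonpath='{.data.tls\.key}' | base64 --decode > tiller.key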

How to configure a pod disruption budget to drain a Kubernetes node?

I'd like to configure the cluster autoscaler on AKS. When scaling down, it fails due to the PDBs:
I1207 14:24:09.523313 1 cluster.go:95] Fast evaluation: node aks-nodepool1-32797235-0 cannot be removed: no enough pod disruption budget to move kube-system/metrics-server-5cbc77f79f-44f9w
I1207 14:24:09.523413 1 cluster.go:95] Fast evaluation: node aks-nodepool1-32797235-3 cannot be removed: non-daemonset, non-mirrored, non-pdb-assignedkube-system pod present: cluster-autoscaler-84984799fd-22j42
I1207 14:24:09.523438 1 scale_down.go:490] 2 nodes found to be unremovable in simulation, will re-check them at 2018-12-07 14:29:09.231201368 +0000 UTC m=+8976.856144807
All system pods have a manually assigned PDB with minAvailable: 1. I can imagine that this does not work for pods with only a single replica, like metrics-server:
❯ k get nodes -o wide
NAME                       STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-nodepool1-32797235-0   Ready    agent   4h    v1.11.4   10.240.0.4    <none>        Ubuntu 16.04.5 LTS   4.15.0-1030-azure   docker://3.0.1
aks-nodepool1-32797235-3   Ready    agent   4h    v1.11.4   10.240.0.6    <none>        Ubuntu 16.04.5 LTS   4.15.0-1030-azure   docker://3.0.1
❯ ks get pods -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP            NODE                       NOMINATED NODE
cluster-autoscaler-84984799fd-22j42     1/1     Running   0          2h    10.244.1.5    aks-nodepool1-32797235-3   <none>
heapster-5d6f9b846c-g7qb8               2/2     Running   0          1h    10.244.0.16   aks-nodepool1-32797235-0   <none>
kube-dns-v20-598f8b78ff-8pshc           4/4     Running   0          3h    10.244.1.4    aks-nodepool1-32797235-3   <none>
kube-dns-v20-598f8b78ff-plfv8           4/4     Running   0          1h    10.244.0.15   aks-nodepool1-32797235-0   <none>
kube-proxy-fjvjv                        1/1     Running   0          1h    10.240.0.6    aks-nodepool1-32797235-3   <none>
kube-proxy-szr8z                        1/1     Running   0          1h    10.240.0.4    aks-nodepool1-32797235-0   <none>
kube-svc-redirect-2rhvg                 2/2     Running   0          4h    10.240.0.4    aks-nodepool1-32797235-0   <none>
kube-svc-redirect-r2m4r                 2/2     Running   0          4h    10.240.0.6    aks-nodepool1-32797235-3   <none>
kubernetes-dashboard-68f468887f-c8p78   1/1     Running   0          4h    10.244.0.7    aks-nodepool1-32797235-0   <none>
metrics-server-5cbc77f79f-44f9w         1/1     Running   0          4h    10.244.0.3    aks-nodepool1-32797235-0   <none>
tiller-deploy-57f988f854-z9qln          1/1     Running   0          4h    10.244.0.8    aks-nodepool1-32797235-0   <none>
tunnelfront-7cf9d447f9-56g7k            1/1     Running   0          4h    10.244.0.2    aks-nodepool1-32797235-0   <none>
What needs to be changed (the number of replicas? the PDB configuration?) for the scale-down to work?
Basically, this is an administration issue that comes up when draining nodes whose pods are covered by a PDB (Pod Disruption Budget).
This is because evictions are forced to respect the PDB you specify.
You have two options:
Either force it:
kubectl drain foo --force --grace-period=0
You can check the other options in the docs -> https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain
Or use the Eviction API:
{
  "apiVersion": "policy/v1beta1",
  "kind": "Eviction",
  "metadata": {
    "name": "quux",
    "namespace": "default"
  }
}
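As a sketch of how this object is used (assuming it is saved as eviction.json and that kubectl proxy is running locally; quux is just the placeholder pod name from the docs):
# expose the API server on localhost and POST the eviction to the pod's eviction subresource
kubectl proxy --port=8001 &
curl -v -H 'Content-Type: application/json' \
  -d @eviction.json \
  http://localhost:8001/api/v1/namespaces/default/pods/quux/eviction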
Either way, drain and the Eviction API both attempt to delete the pods so they can be scheduled elsewhere before the node is completely drained.
As mentioned in the docs:
the API can respond in one of three ways:
If the eviction is granted, then the pod is deleted just as if you had sent a DELETE request to the pod’s URL and you get back 200 OK.
If the current state of affairs wouldn’t allow an eviction by the rules set forth in the budget, you get back 429 Too Many Requests. This is typically used for generic rate limiting of any requests
If there is some kind of misconfiguration, like multiple budgets pointing at the same pod, you will get 500 Internal Server Error.
For a given eviction request, there are two cases:
There is no budget that matches this pod. In this case, the server always returns 200 OK.
There is at least one budget. In this case, any of the three above responses may apply.
If it gets stuck, then you might need to do it manually.
You can read more here or here.
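For the metrics-server case from the question specifically: a PDB with minAvailable: 1 over a Deployment with a single replica can never be satisfied during an eviction, so the autoscaler stays blocked. A sketch of two ways to unblock it (assuming the Deployment and the manually created PDB are both named metrics-server):
# Option 1: run a second replica so one pod can always survive an eviction
kubectl -n kube-system scale deployment metrics-server --replicas=2
# Option 2: remove (or recreate with a laxer setting) the budget that blocks the single replica
kubectl -n kube-system delete pdb metrics-server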

Not able to access StatefulSet headless service from Kubernetes

I have created a headless service for a StatefulSet in Kubernetes, and the Cassandra DB is running fine.
PS C:\> .\kubectl.exe get svc
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra    None         <none>        9042/TCP   50m
kubernetes   10.0.0.1     <none>        443/TCP    6d
PS C:\> .\kubectl.exe get pods
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          49m
cassandra-1   1/1     Running   0          48m
cassandra-2   1/1     Running   0          48m
I am running all of this on Minikube. From my laptop I am trying to connect to 192.168.99.100:9402 using a Java program, but it is not able to connect.
It looks like your service is not defined with type NodePort. Can you change the service type to NodePort and test it?
When we define a service as NodePort, we should see two port numbers for the service (the cluster-internal port and the node port).
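Since the existing headless service has clusterIP: None, one option is a separate NodePort Service in front of the Cassandra pods. A minimal sketch, assuming the StatefulSet pods carry the label app: cassandra (adjust the selector to match your manifest):
apiVersion: v1
kind: Service
metadata:
  name: cassandra-nodeport
spec:
  type: NodePort
  selector:
    app: cassandra        # assumed pod label; must match the StatefulSet pod template
  ports:
    - port: 9042          # port inside the cluster
      targetPort: 9042    # container port Cassandra listens on
      nodePort: 30042     # port exposed on the Minikube node (30000-32767)
The Java client would then connect to the Minikube IP on the node port, e.g. 192.168.99.100:30042, or you can print the reachable address with minikube service cassandra-nodeport --url.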

Kubernetes on Raspberry Pi: kube-flannel CrashLoopBackOff and kube-dns rpc error code = 2

I used this tutorial to set up a Kubernetes cluster on my Raspberry Pi 3.
I followed the instructions up to the setup of flannel with:
curl -sSL https://rawgit.com/coreos/flannel/v0.7.0/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -
I get the following output from kubectl get po --all-namespaces:
kube-system   etcd-node01                        1/1   Running            0    34m
kube-system   kube-apiserver-node01              1/1   Running            0    34m
kube-system   kube-controller-manager-node01     1/1   Running            0    34m
kube-system   kube-dns-279829092-x4dc4           0/3   rpc error: code = 2 desc = failed to start container "de9b2094dbada10a0b44df97d25bb629d6fbc96b8ddc0c060bed1d691a308b37": Error response from daemon: {"message":"cannot join network of a non running container: af8e15c6ad67a231b3637c66fab5d835a150da7385fc403efc0a32b8fb7aa165"}   15   39m
kube-system   kube-flannel-ds-zk17g              1/2   CrashLoopBackOff   11   35m
kube-system   kube-proxy-6zwtb                   1/1   Running            0    37m
kube-system   kube-proxy-wbmz2                   1/1   Running            0    39m
kube-system   kube-scheduler-node01              1/1   Running
Interestingly, I have the same issue when installing Kubernetes with flannel on my laptop, following another tutorial.
Version details are here:
Client Version: version.Info{Major:"1", Minor:"6",
GitVersion:"v1.6.3",
GitCommit:"0480917b552be33e2dba47386e51decb1a211df6",
GitTreeState:"clean", BuildDate:"2017-05-10T15:48:59Z",
GoVersion:"go1.8rc2", Compiler:"gc", Platform:"linux/arm"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3",
GitCommit:"0480917b552be33e2dba47386e51decb1a211df6",
GitTreeState:"clean", BuildDate:"2017-05-10T15:38:08Z",
GoVersion:"go1.8rc2", Compiler:"gc", Platform:"linux/arm"}
Any suggestions that might help?
I solved this issue by creating the cluster roles before setting up the pod network driver:
curl -sSL https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml | sed "s/amd64/arm/g" | kubectl create -f -
Then I set up the pod network driver with:
curl -sSL https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -
Worked for me so far...
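To verify the fix, something along these lines can help (the pod name is the one from the output above; the container name kube-flannel matches the upstream manifest but may differ):
# watch the kube-system pods until flannel and kube-dns are Running
kubectl get pods -n kube-system -w
# if a flannel pod keeps crashing, check its logs
kubectl logs -n kube-system kube-flannel-ds-zk17g -c kube-flannel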
