Could not get apiVersions from Kubernetes: Unable to retrieve the complete list of server APIs - azure

While trying to deploy an application got an error as below:
Error: UPGRADE FAILED: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Output of kubectl api-resources consists some resources along with the same error in the end.
Environment: Azure Cloud, AKS Service

Solution:
The steps I followed are:
kubectl get apiservices : If metric-server service is down with the error CrashLoopBackOff try to follow the step 2 otherwise just try to restart the metric-server service using kubectl delete apiservice/"service_name". For me it was v1beta1.metrics.k8s.io .
kubectl get pods -n kube-system and found out that pods like metrics-server, kubernetes-dashboard are down because of the main coreDNS pod was down.
For me it was:
NAME READY STATUS RESTARTS AGE
pod/coredns-85577b65b-zj2x2 0/1 CrashLoopBackOff 7 13m
Use kubectl describe pod/"pod_name" to check the error in coreDNS pod and if it is down because of /etc/coredns/Corefile:10 - Error during parsing: Unknown directive proxy, then we need to use forward instead of proxy in the yaml file where coreDNS config is there. Because CoreDNS version 1.5x used by the image does not support the proxy keyword anymore.

This error happens commonly when your metrics server pod is not reachable by the master node. Possible reasons are
metric-server pod is not running. This is the first thing you should check. Then look at the logs of the metric-server pod to check if it has some permission issues trying to get metrics
Try to confirm communication between master and slave nodes.
Try running kubectl top nodes and kubectl top pods -A to see if metric-server runs ok.
From these points you can proceed further.

Related

Kubernetes many restarts but pod keeps running

I'm seeing a lot of restarts on all the pods of every service that I have deployed on Kubernetes.
But when I see the logs in real time:
kubectl -n my-namespace logs -c my-pod -f my-pod-some-hash --tail=50
I see nothing, there's no restarts, there's no signal of failure. Readiness keep workings. So what it means all those restarts? Where or how can I get more info about those restarts?
Edit:
By viewing the pod details of the pod that has 158 on the picture above, I can see this, but I don't know what it means or if it's related to the restarts:
Replication via one sample example pod with CLI commands
If any pod restarts, in order to check the logs of the previous run user "--previous"
Step1:
Connect to cluster using below command
az aks get-credentials --resource-group <resourcegroupname> --name <Clustername>
Step2:
verify the pod logs
kubectl get pods
Step3:
Verify the restart pods logs using command
kubectl logs <PodName> --previous

Kops rolling-update fails with "Cluster did not pass validation" for master node

For some reason my master node can no longer connect to my cluster after upgrading from kubernetes 1.11.9 to 1.12.9 via kops (version 1.13.0). In the manifest I'm upgrading kubernetesVersion from 1.11.9 -> 1.12.9. This is the only change I'm making. However when I run kops rolling-update cluster --yes I get the following error:
Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-01234567" has not yet joined cluster.
Cluster did not validate within 5m0s
After that if I run a kubectl get nodes I no longer see that master node in my cluster.
Doing a little bit of debugging by sshing into the disconnected master node instance I found the following error in my api-server log by running sudo cat /var/log/kube-apiserver.log:
controller.go:135] Unable to perform initial IP allocation check: unable to refresh the service IP block: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused
I suspect the issue might be related to etcd, because when I run sudo netstat -nap | grep LISTEN | grep etcd there is no output.
Anyone have any idea how I can get my master node back in the cluster or have advice on things to try?
I have made some research I got few ideas for you:
If there is no output for the etcd grep it means that your etcd server is down. Check the logs for the 'Exited' etcd container | grep Exited | grep etcd and than logs <etcd-container-id>
Try this instruction I found:
1 - I removed the old master from de etcd cluster using etcdctl. You
will need to connect on the etcd-server container to do this.
2 - On the new master node I stopped kubelet and protokube services.
3 - Empty Etcd data dir. (data and data-events)
4 - Edit /etc/kubernetes/manifests/etcd.manifests and
etcd-events.manifest changing ETCD_INITIAL_CLUSTER_STATE from new to
existing.
5 - Get the name and PeerURLS from new master and use etcdctl to add
the new master on the cluster. (etcdctl member add "name"
"PeerULR")You will need to connect on the etcd-server container to do
this.
6 - Start kubelet and protokube services on the new master.
If that is not the case than you might have a problem with the certs. They are provisioned during the creation of the cluster and some of them have the allowed master's endpoints. If that is the case you'd need to create new certs and roll them for the api server/etcd clusters.
Please let me know if that helped.

Azure AKS HPA failed to get cpu utilization

I have a single node K8s cluster in Azure using AKS. I created a deployment and a service using a simple command:
kubectl run php-apache --image=pilchard/hpa-example:latest --requests=cpu=200m,memory=300M --expose --port=80
And enabled HPA via command:
kubectl autoscale deployment php-apache --cpu-percent=10 --min=1 --max=15
Upon running kubectl describe hpa php-apache, I see an error saying:
horizontal-pod-autoscaler unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
And CPU metric is unknown upon running kubectl get hpa. Any help would be really appreciated.
My AKS kube version is v1.9.11.
You either need to install the heapster(Deprecated) or the metrics-server minimally to be able to use an HPA.
This provides the minimum set of CPU and Memory metrics to be able to autoscale. A good way to see if you have either installed is that if you get this kind output from kubectl top pod:
$ kubectl top pod
NAME CPU(cores) MEMORY(bytes)
http-svc-xxxxxxxxxx-xxxxx 1m 7Mi
myapp-pod 0m 53Mi
sleep-xxxxxxxxxx-xxxxx 4m 27Mi
if you have metrics-server added (which comes as default with latest asks) do the following to fix it.
kubectl edit deployment applicationName
in the above cmd replace appliationName with name of your application and if you call youre deployment.yml deploy then replace that too and in the edit mode add the following after "terminationMessagePolicy" line.
resource:
requests:
cpu: 50m
limits:
cpu: 500m
Once you have finished editing press "esc" key and type :wq your changes will be saved and after few seconds if you do
kubectl get hpa
or
kubectl describe hpa applicationName
you should not see any error,
please note to change the cpu limit and request based on your application usage they are only example values here

Cannot get kube-dns to start on Kubernetes

Hoping someone can help.
I have a 3x node CoreOS cluster running Kubernetes. The nodes are as follows:
192.168.1.201 - Controller
192.168.1.202 - Worker Node
192.168.1.203 - Worker Node
The cluster is up and running, and I can run the following commands:
> kubectl get nodes
NAME STATUS AGE
192.168.1.201 Ready,SchedulingDisabled 1d
192.168.1.202 Ready 21h
192.168.1.203 Ready 21h
> kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-192.168.1.201 1/1 Running 2 1d
kube-controller-manager-192.168.1.201 1/1 Running 4 1d
kube-dns-v20-h4w7m 2/3 CrashLoopBackOff 15 23m
kube-proxy-192.168.1.201 1/1 Running 2 1d
kube-proxy-192.168.1.202 1/1 Running 1 21h
kube-proxy-192.168.1.203 1/1 Running 1 21h
kube-scheduler-192.168.1.201 1/1 Running 4 1d
As you can see, the kube-dns service is not running correctly. It keeps restarting and I am struggling to understand why. Any help in debugging this would be greatly appreciated (or pointers at where to read about debugging this. Running kubectl logs does not bring anything back...not sure if the addons function differently to standard pods.
Running a kubectl describe pods, I can see the containers are killed due to being unhealthy:
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Created Created container with docker id 189afaa1eb0d; Security:[seccomp=unconfined]
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Started Started container with docker id 189afaa1eb0d
14m 14m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Killing Killing container with docker id 189afaa1eb0d: pod "kube-dns-v20-h4w7m_kube-system(3a545c95-ea19-11e6-aa7c-52540021bfab)" container "kubedns" is unhealthy, it will be killed and re-created
Please find a full output of this command as a github gist here: https://gist.github.com/mehstg/0b8016f5398a8781c3ade8cf49c02680
Thanks in advance!
If you installed your cluster with kubeadm you should add a pod network after installing.
If you choose flannel as your pod network, you should have this argument in your init command kubeadm init --pod-network-cidr 10.244.0.0/16.
The flannel YAML file can be found in the coreOS flannel repo.
All you need to do if your cluster was initialized properly (read above), is to run kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Once this is up and running (it will create pods on every node), your kube-dns pod should come up.
If you need to reset your installation (for example to add the argument to kubeadm init), you can use kubeadm reset on all nodes.
Normally, you would run the init command on the master, then add a pod network, and then add your other nodes.
This is all described in more detail in the Getting started guide, step 3/4 regarding the pod network.
as your gist says your pod network seems to be broken. You are using some custom podnetwork with 10.10.10.X. You should communicate this IPs to all components.
Please check, there is no collision with other existing nets.
I recommend you to setup with Calico, as this was the solution for me to bring up CoreOS k8s up working
After followed the steps in the official kubeadm doc with flannel networking, I run into a similar issue
http://janetkuo.github.io/docs/getting-started-guides/kubeadm/
It appears as networking pods get stuck in error states:
kube-dns-xxxxxxxx-xxxvn (rpc error)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
In my case it is related to rbac permission errors and is resolved by running
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
Afterwards, all kube-system pods went into running states. The upstream issue is discussed on github https://github.com/kubernetes/kubernetes/issues/44029

External DNS resolution stopped working in Container Engine

I have a simple container on Google Container Engine that has been running for months with no issues. Suddenly, I cannot resolve ANY external domain. In troubleshooting I have re-created the container many times, and upgraded the cluster version to 1.4.7 in an attempt to resolve with no change.
To rule the app code out as much as possible, even a basic node.js code cannot resolve an external domain:
const dns = require('dns');
dns.lookup('nodejs.org', function(err, addresses, family) {
console.log('addresses:', addresses);
});
/* logs 'undefined' */
The same ran on a local machine or local docker container works as expected.
This kubectl call fails as well:
# kubectl exec -ti busybox -- nslookup kubernetes.default
nslookup: can't resolve 'kubernetes.default'
Two show up when getting kube-dns pods (admittedly not sure if that is expected)
# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
kube-dns-v20-v8pd6 3/3 Running 0 1h
kube-dns-v20-vtz4o 3/3 Running 0 1h
Both say this when trying to check for errors in the DNS pod:
# kubectl logs --namespace=kube-system pod/kube-dns-v20-v8pd6 -c kube-dns
Error from server: container kube-dns is not valid for pod kube-dns-v20-v8pd6
I expect the internally created kube-dns is not properly pulling external DNS results or some other linkage disappeared.
I'll accept almost any workaround if one exists, as this is a production app - perhaps it is possible to manually set nameservers in the Kubernetes controller YAML file or elsewhere. Setting the contents of /etc/resolv.conf in Dockerfile does not seem to work.
Just checked and in our own clusters we usually have 3 kube-dns pods so something seems off there.
What does this say: kybectl describe rc kube-dns-v20 --namespace=kube-system
What happens when you kill the kube-dns pods? (the rc should automatically restart them)
What happens when you do an nslookup with a specific nameserver? nslookup nodejs.org 8.8.8.8

Resources