Azure AKS HPA failed to get cpu utilization - azure

I have a single node K8s cluster in Azure using AKS. I created a deployment and a service using a simple command:
kubectl run php-apache --image=pilchard/hpa-example:latest --requests=cpu=200m,memory=300M --expose --port=80
And enabled HPA via command:
kubectl autoscale deployment php-apache --cpu-percent=10 --min=1 --max=15
Upon running kubectl describe hpa php-apache, I see an error saying:
horizontal-pod-autoscaler unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
And the CPU metric shows as unknown when I run kubectl get hpa. Any help would be really appreciated.
My AKS kube version is v1.9.11.

You need to install either Heapster (deprecated) or, at a minimum, the metrics-server to be able to use an HPA.
This provides the minimum set of CPU and memory metrics needed to autoscale. A good way to see whether you have either installed is to check that you get output like this from kubectl top pod (an install sketch follows the sample output):
$ kubectl top pod
NAME CPU(cores) MEMORY(bytes)
http-svc-xxxxxxxxxx-xxxxx 1m 7Mi
myapp-pod 0m 53Mi
sleep-xxxxxxxxxx-xxxxx 4m 27Mi
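If neither is installed, a common way to add metrics-server is shown below. This is a sketch assuming the upstream kubernetes-sigs/metrics-server manifests; on managed AKS it normally ships as an add-on, so this is mostly for self-managed clusters:
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
$ kubectl -n kube-system get deployment metrics-server   # wait for it to report READY/AVAILABLE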

If you have metrics-server added (which comes as a default add-on with recent AKS versions), do the following to fix it:
kubectl edit deployment applicationName
In the command above, replace applicationName with the name of your application (and if your deployment.yml names it something else, such as deploy, use that name instead). In the editor, add the following to the container spec, after the "terminationMessagePolicy" line:
resources:
  requests:
    cpu: 50m
  limits:
    cpu: 500m
Once you have finished editing, press the "Esc" key and type :wq. Your changes will be saved, and after a few seconds, if you run
kubectl get hpa
or
kubectl describe hpa applicationName
you should no longer see the error.
Please note that you should change the CPU request and limit based on your application's usage; they are only example values here.
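Alternatively, if you prefer not to open an editor, the same change can be applied with kubectl set resources. This is a sketch, assuming a Deployment named applicationName and the same example values as above:
$ kubectl set resources deployment applicationName --requests=cpu=50m --limits=cpu=500m
$ kubectl rollout status deployment applicationName   # wait for the updated pods to roll out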

Related

Kubernetes many restarts but pod keeps running

I'm seeing a lot of restarts on all the pods of every service that I have deployed on Kubernetes.
But when I see the logs in real time:
kubectl -n my-namespace logs -c my-pod -f my-pod-some-hash --tail=50
I see nothing: there are no restarts, no sign of failure, and the readiness probes keep passing. So what do all those restarts mean? Where or how can I get more info about them?
Edit:
By viewing the details of the pod that shows 158 restarts in the picture above, I can see the following, but I don't know what it means or whether it's related to the restarts:
Reproduction via a sample pod with CLI commands:
If any pod restarts, use "--previous" to check the logs of the previous run (see also the sketch after the steps).
Step 1:
Connect to the cluster using the command below:
az aks get-credentials --resource-group <resourcegroupname> --name <Clustername>
Step 2:
List the pods and check their restart counts:
kubectl get pods
Step 3:
Check the restarted pod's previous logs using:
kubectl logs <PodName> --previous
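As a further sketch (assuming <PodName> is the pod you saw restarting), the recorded reason for the last restart can also be read from the pod status:
$ kubectl describe pod <PodName> | grep -A 5 "Last State"
$ kubectl get pod <PodName> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# typical values are OOMKilled, Error, or Completed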

Could not get apiVersions from Kubernetes: Unable to retrieve the complete list of server APIs

While trying to deploy an application got an error as below:
Error: UPGRADE FAILED: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
The output of kubectl api-resources lists some resources, along with the same error at the end.
Environment: Azure Cloud, AKS Service
Solution:
The steps I followed are:
kubectl get apiservices: if the metrics-server API service is down with the error CrashLoopBackOff, follow step 2; otherwise just try to restart it using kubectl delete apiservice/"service_name". For me it was v1beta1.metrics.k8s.io.
kubectl get pods -n kube-system: I found out that pods like metrics-server and kubernetes-dashboard were down because the main CoreDNS pod was down.
For me it was:
NAME READY STATUS RESTARTS AGE
pod/coredns-85577b65b-zj2x2 0/1 CrashLoopBackOff 7 13m
Use kubectl describe pod/"pod_name" to check the error in the CoreDNS pod. If it is down because of /etc/coredns/Corefile:10 - Error during parsing: Unknown directive proxy, then use forward instead of proxy in the ConfigMap that holds the CoreDNS config, because the CoreDNS 1.5.x image no longer supports the proxy plugin.
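As a sketch, the change in the CoreDNS ConfigMap (kubectl -n kube-system edit configmap coredns) looks roughly like this; the exact upstream (/etc/resolv.conf here) may differ in your cluster:
# old directive, rejected by CoreDNS 1.5+:
#   proxy . /etc/resolv.conf
# replacement:
forward . /etc/resolv.conf
After saving, restart the CoreDNS pods (e.g. kubectl -n kube-system delete pod -l k8s-app=kube-dns, assuming the standard label) so the new Corefile is picked up.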
This error commonly happens when your metrics-server pod is not reachable from the master node. Possible reasons are:
The metrics-server pod is not running. This is the first thing you should check. Then look at the logs of the metrics-server pod to check whether it has permission issues when trying to get metrics.
Confirm communication between the master and worker nodes.
Run kubectl top nodes and kubectl top pods -A to see whether metrics-server works (see the sketch below).
From these points you can proceed further.
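As a sketch of those checks (assuming the default metrics-server deployment in kube-system, labelled k8s-app=metrics-server):
$ kubectl get apiservice v1beta1.metrics.k8s.io        # AVAILABLE should be True
$ kubectl -n kube-system get pods -l k8s-app=metrics-server
$ kubectl -n kube-system logs deployment/metrics-server
$ kubectl top nodes && kubectl top pods -A              # should return numbers, not errors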

cAdvisor does not show all container's uptime (Prometheus+cAdvisor+Grafana)

Environment:
Linux (Redhat7)
Deployed docker (cAdvisor, Prometheus, Grafana)
cAdvisor collects the metrics > passes them to Prometheus > displayed with Grafana
An Apache reverse proxy is in the environment (therefore no direct connection to specific ports)
Issue:
cAdvisor does not show all container's uptime
Grafana does not show Prometheus and Grafana container's uptime
Only displays cAdvisor container's uptime
What I have (issue) vs. what I want to have: (dashboard screenshots omitted)
Setting in Prometheus: (target configuration screenshot omitted)
Command to run cAdvisor:
sudo docker run --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw --volume=/var/lib/docker/:/var/lib/docker:ro --volume=/dev/disk/:/dev/disk:ro --publish=8080:8080 --name=cadvisor --detach=true --privileged=true --volume=/cgroup:/cgroup:ro --network=docker8443 --ip=172.28.5.201 google/cadvisor:latest
Questions:
Is something missing in Prometheus target to show all container's uptime?
Is something missing in cAdvisor setup?
Is the query for the graph incorrect? (I have tried a query, but it shows "No data point".)
I think the issue is with cAdvisor.
You can check the cAdvisor metrics endpoint http://172.28.5.201:8080/metrics
to make sure cAdvisor returns the metric samples, for example:
# HELP container_start_time_seconds Start time of the container since unix epoch in seconds.
# TYPE container_start_time_seconds gauge
container_start_time_seconds{id="/",image="",name=""} 1.525939343e+09
container_start_time_seconds{id="/docker",image="",name=""} 1.526006565e+09
container_start_time_seconds{id="/docker/d4b87911bd0842ee1d6969e6a05aa3d36a48a801184faf14e1b23169e056da92",image="busybox",name="trusting_bassi"}
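If those samples are present, an uptime panel in Grafana can be built from a PromQL expression along these lines (a sketch; the name!="" filter assumes you only care about named Docker containers):
time() - container_start_time_seconds{name!=""}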
On top of Yunlong's answer: add --volume=/sys:/sys:ro
You will also need to run the following in a shell:
$ mount -o remount,rw '/sys/fs/cgroup'
$ ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu
Otherwise cAdvisor crashes immediately after the container comes up.
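Putting it together, a sketch of the amended run command (the original flags from the question plus the extra /sys mount) would be:
$ sudo docker run --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro \
    --volume=/dev/disk/:/dev/disk:ro --volume=/cgroup:/cgroup:ro \
    --publish=8080:8080 --name=cadvisor --detach=true --privileged=true \
    --network=docker8443 --ip=172.28.5.201 google/cadvisor:latest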

Cannot get kube-dns to start on Kubernetes

Hoping someone can help.
I have a 3x node CoreOS cluster running Kubernetes. The nodes are as follows:
192.168.1.201 - Controller
192.168.1.202 - Worker Node
192.168.1.203 - Worker Node
The cluster is up and running, and I can run the following commands:
> kubectl get nodes
NAME STATUS AGE
192.168.1.201 Ready,SchedulingDisabled 1d
192.168.1.202 Ready 21h
192.168.1.203 Ready 21h
> kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-192.168.1.201 1/1 Running 2 1d
kube-controller-manager-192.168.1.201 1/1 Running 4 1d
kube-dns-v20-h4w7m 2/3 CrashLoopBackOff 15 23m
kube-proxy-192.168.1.201 1/1 Running 2 1d
kube-proxy-192.168.1.202 1/1 Running 1 21h
kube-proxy-192.168.1.203 1/1 Running 1 21h
kube-scheduler-192.168.1.201 1/1 Running 4 1d
As you can see, the kube-dns service is not running correctly. It keeps restarting and I am struggling to understand why. Any help in debugging this would be greatly appreciated (or pointers on where to read about debugging it). Running kubectl logs does not bring anything back... I'm not sure whether the add-ons function differently from standard pods.
Running kubectl describe pod, I can see the containers are killed because they are unhealthy:
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Created Created container with docker id 189afaa1eb0d; Security:[seccomp=unconfined]
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Started Started container with docker id 189afaa1eb0d
14m 14m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Killing Killing container with docker id 189afaa1eb0d: pod "kube-dns-v20-h4w7m_kube-system(3a545c95-ea19-11e6-aa7c-52540021bfab)" container "kubedns" is unhealthy, it will be killed and re-created
Please find a full output of this command as a github gist here: https://gist.github.com/mehstg/0b8016f5398a8781c3ade8cf49c02680
Thanks in advance!
If you installed your cluster with kubeadm you should add a pod network after installing.
If you chose flannel as your pod network, your init command should have included this argument: kubeadm init --pod-network-cidr 10.244.0.0/16.
The flannel YAML file can be found in the CoreOS flannel repo.
All you need to do if your cluster was initialized properly (read above), is to run kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Once this is up and running (it will create pods on every node), your kube-dns pod should come up.
If you need to reset your installation (for example to add the argument to kubeadm init), you can use kubeadm reset on all nodes.
Normally, you would run the init command on the master, then add a pod network, and then add your other nodes.
This is all described in more detail in the Getting started guide, step 3/4 regarding the pod network.
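To verify the network add-on is healthy before checking kube-dns again (a sketch; the pod names will carry generated suffixes):
$ kubectl get pods -n kube-system | grep flannel   # one kube-flannel-ds pod per node, all Running
$ kubectl get pods -n kube-system | grep dns       # kube-dns should leave CrashLoopBackOff shortly after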
As your gist shows, your pod network seems to be broken. You are using a custom pod network with 10.10.10.X, and you need to communicate these IPs to all components.
Please check that there is no collision with other existing networks.
I recommend setting up Calico, as that was the solution that got CoreOS Kubernetes working for me.
After following the steps in the official kubeadm doc with flannel networking, I ran into a similar issue:
http://janetkuo.github.io/docs/getting-started-guides/kubeadm/
It appears that the networking pods get stuck in error states:
kube-dns-xxxxxxxx-xxxvn (rpc error)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
In my case it was related to RBAC permission errors and was resolved by running
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
Afterwards, all kube-system pods went into running states. The upstream issue is discussed on GitHub: https://github.com/kubernetes/kubernetes/issues/44029

Docker Registry Stays Pending After Deployment

I have installed OpenShift Enterprise as per the online guide (quick installation) but I'm stuck at deploying the registry.
https://docs.openshift.com/enterprise/3.0/admin_guide/install/docker_registry.html#deploy-registry
I create the registry
oadm registry --config=/etc/openshift/master/admin.kubeconfig \
--credentials=/etc/openshift/master/openshift-registry.kubeconfig \
--images='registry.access.redhat.com/openshift3/ose-${component}:${version}'
I check that it was configured
[justin#172 ~]$ oc get se docker-registry
NAME LABELS SELECTOR IP(S) PORT(S)
docker-registry docker-registry=default docker-registry=default 172.30.144.220 5000/TCP
But it never runs; it stays Pending:
[justin#172 ~]$ oc get pods
NAME READY STATUS RESTARTS AGE
docker-registry-1-deploy 0/1 Pending 0 2h
I try to get some more info
[justin#172 ~]$ oc logs docker-registry-1-deploy
[justin#172 ~]$
but the logs command returns nothing
I had attempted an install with one node sharing the machine with the master.
My nodes looked like this:
[root#master ~]# oc get nodes
NAME LABELS STATUS
master.mydomain.com kubernetes.io/hostname=master.mydomain.com Ready,SchedulingDisabled
Note: SchedulingDisabled
I ran this command:
oc describe pod docker-registry-1-deploy
It gave the reason for not being deployed, which was that there were no nodes to schedule the deployment on. Just to get things going quickly, I performed the install again and added a node on another VM.
Then
[root#master ~]# oc get nodes
NAME LABELS STATUS
master.mydomain.com kubernetes.io/hostname=master.mydomain.com Ready,SchedulingDisabled
node1.mydomain.com kubernetes.io/hostname=node1.mydomain.com Ready
and I managed to successfully deploy the registry.
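For reference, if you would rather stay on a single machine, OpenShift 3.x can also mark the master schedulable instead of adding a node (a sketch; weigh this against keeping workloads off the master):
$ oadm manage-node master.mydomain.com --schedulable=true
$ oc get nodes    # SchedulingDisabled should no longer be shown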
