Kubernetes: many restarts but pod keeps running - Azure

I'm seeing a lot of restarts on all the pods of every service that I have deployed on Kubernetes.
But when I see the logs in real time:
kubectl -n my-namespace logs -c my-pod -f my-pod-some-hash --tail=50
I see nothing: there are no restarts and no sign of failure. The readiness probe keeps working. So what do all those restarts mean? Where or how can I get more info about them?
Edit:
Viewing the details of the pod that shows 158 restarts in the picture above, I can see the following, but I don't know what it means or whether it's related to the restarts:

Reproduction with a sample pod and CLI commands
If a pod has restarted, use "--previous" to check the logs of the previous run.
Step 1:
Connect to the cluster using the command below
az aks get-credentials --resource-group <resourcegroupname> --name <Clustername>
Step 2:
List the pods and check their restart counts
kubectl get pods
Step 3:
Check the logs of the restarted pod's previous run using the command
kubectl logs <PodName> --previous
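Beyond --previous, kubectl describe and the namespace events usually show why a container was restarted (OOMKilled, a failed liveness probe, a non-zero exit code). A minimal sketch, where the namespace and pod name are placeholders:
kubectl -n <namespace> describe pod <PodName>
Look at Last State, Reason and Exit Code in the output. Then list recent events in the namespace:
kubectl -n <namespace> get events --sort-by=.metadata.creationTimestamp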

Related

Where are kubelet logs in AKS stored?

I would like to view kubelet logs going back in time in Azure AKS. All I could find in the Azure docs was how to SSH into the nodes and list the logs (https://learn.microsoft.com/en-us/azure/aks/kubelet-logs), but I feel like this has to be aggregated in Log Analytics somewhere, right?
However, I wasn't able to find anything in Log Analytics for Kubernetes. Am I missing something?
We have the omsagent DaemonSet installed and Microsoft.OperationalInsights/workspaces is enabled.
Thanks :)
I tried to reproduce this issue in my environment and got the results below.
I set the subscription and created a resource group:
az account set --subscription "subscription_name"
az group create --location westus --name resourcegroup_name
Then I created the AKS cluster with the parameter that enables Container Insights. The following example creates the cluster:
az aks create -g resourcegroup_name -n cluster_name --enable-managed-identity --node-count 1 --enable-addons monitoring --enable-msi-auth-for-monitoring --generate-ssh-keys
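To double-check that the monitoring add-on actually landed on the cluster, a quick check using the generic --query switch; the placeholders match the ones above:
az aks show -g resourcegroup_name -n cluster_name --query addonProfiles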
Then I configured kubectl to connect to the Kubernetes cluster with the get-credentials command.
I created an interactive shell connection to the node using kubectl debug:
kubectl debug node/<node_name> -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0
At the # prompt inside the debug pod, I ran the command below:
journalctl -u kubelet -o cat
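Depending on the debug image, you may first need to switch into the host's filesystem so journalctl can see the node's kubelet unit; a sketch following the kubelet-logs walkthrough referenced below:
chroot /host
journalctl -u kubelet -o cat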
To get container logs, check the nodes and pods. We can use the command below to check the logs of a pod:
kubectl logs <pod_name>
Reference:
View kubelet logs in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Docs

kubectl get nodes not returning any result

Issue: kubectl get nodes, returning an empty result
Cloud provider: Azure
K8s cluster built from scratch with VMSS instances/VMs
azureuser@khway-vms000000:~$ kubectl get no
No resources found in default namespace.
I am a bit stuck and do not know what else I could check to get to the bottom of this issue.
Thanks in advance!
It seems like you logged on to one of the nodes of the managed VMSS.
Instead do (e.g. from your dev machine):
az aks get-credentials --name MyManagedCluster --resource-group MyResourceGroup
https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az_aks_get_credentials
Then you can run kubectl.
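If kubectl still shows nothing, it's worth confirming that the merged kubeconfig is the one in use; a quick sketch, where the context name is assumed to match the cluster name passed to get-credentials:
kubectl config get-contexts
kubectl config use-context MyManagedCluster
kubectl get nodes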

Could not get apiVersions from Kubernetes: Unable to retrieve the complete list of server APIs

While trying to deploy an application, I got the error below:
Error: UPGRADE FAILED: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
The output of kubectl api-resources lists some resources, along with the same error at the end.
Environment: Azure Cloud, AKS Service
Solution:
The steps I followed are:
kubectl get apiservices: if the metrics-server service is down with the error CrashLoopBackOff, follow step 2; otherwise just try restarting the metrics-server API service using kubectl delete apiservice/<service_name>. For me it was v1beta1.metrics.k8s.io.
kubectl get pods -n kube-system: I found that pods like metrics-server and kubernetes-dashboard were down because the main CoreDNS pod was down.
For me it was:
NAME READY STATUS RESTARTS AGE
pod/coredns-85577b65b-zj2x2 0/1 CrashLoopBackOff 7 13m
Use kubectl describe pod/<pod_name> to check the error in the CoreDNS pod. If it is down because of /etc/coredns/Corefile:10 - Error during parsing: Unknown directive proxy, then we need to use forward instead of proxy in the ConfigMap that holds the CoreDNS config, because the CoreDNS 1.5.x image used no longer supports the proxy plugin.
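For illustration, a minimal sketch of that Corefile change; the upstream resolver shown is only an example and your actual config will differ:
# CoreDNS 1.4 and earlier
proxy . /etc/resolv.conf
# CoreDNS 1.5 and later
forward . /etc/resolv.conf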
This error commonly happens when your metrics-server pod is not reachable by the control plane. Possible reasons are:
The metrics-server pod is not running. This is the first thing you should check. Then look at the logs of the metrics-server pod to check whether it has permission issues trying to get metrics (see the sketch after this list).
Confirm communication between the control plane and the worker nodes.
Run kubectl top nodes and kubectl top pods -A to see if metrics-server is working.
From these points you can proceed further.
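A quick way to work through the first two checks; the k8s-app=metrics-server label is an assumption about how the deployment is labelled in your cluster:
kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl -n kube-system logs -l k8s-app=metrics-server
kubectl top nodes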

Kubernetes: Unable to access the Kubernetes dashboard

I added bitnami.bitnami/rabbitmq to my ACR.
In my VSO release pipeline, I added two tasks, kubectl run and kubectl expose, which look like below.
kubectl run rabbitmq --image xxxxxx.azurecr.io/bitnami.bitnami/rabbitmq:3.7.7 --port=15672
kubectl expose deployment rabbitmq --type=LoadBalancer --port=15672 --target-port=15672
After saving and releasing it, everything succeeds, but now I can't proxy into my dashboard using
az aks browse -g {groupname} -n {k8sname}
When I remove the above two tasks from my release, I am able to connect to my dashboard.
Can someone explain to me what's going wrong and how to troubleshoot it?
You can check whether the pods work well in your Azure Kubernetes cluster. If everything is OK, then make sure that your current OS has a browser: the command az aks browse -g {groupname} -n {k8sname} needs a browser to open the dashboard on the machine where it executes.
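For that first check, a minimal sketch of inspecting the dashboard pod itself; the kube-system namespace and the k8s-app=kubernetes-dashboard label are assumptions about the AKS dashboard add-on:
kubectl get pods -n kube-system -l k8s-app=kubernetes-dashboard
kubectl logs -n kube-system -l k8s-app=kubernetes-dashboard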
You can open the k8s dashboard on another OS with the command you posted, after you get the credentials with the command az aks get-credentials -g {groupname} -n {k8sname}. Of course, you need to execute az login first.
If all the things above are OK, you could try this link.

AKS using Kubernetes: not able to connect to cluster nodes once logged in to the cluster through azure-cli on Ubuntu

I am having issues getting information about the nodes created using AKS (Azure Kubernetes Service) after creating the cluster and getting the credentials.
I am using the azure-cli on an Ubuntu Linux machine.
I followed this URL for creating the cluster: https://learn.microsoft.com/en-us/azure/aks/kubernetes-walkthrough
I get the following error when using the command kubectl get nodes
after connecting to the cluster using
az aks get-credentials --resource-group <resource_group_name> --name <cluster_name>
Error:
kubectl get nodes
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)
I get the same error when I use:
kubectl get pods -n kube-system -o=wide
When I connect back as another user with the following commands, i.e.,
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
I am able to retrieve the nodes, i.e.:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
<host-name> Ready master 20m v1.10.0
~$ kubectl get pods -n kube-system -o=wide
NAME READY STATUS RESTARTS AGE
etcd-actaz-prod-nb1 1/1 Running 0
kube-apiserver-actaz-prod-nb1 1/1 Running 0
kube-controller-manager-actaz-prod-nb1 1/1 Running 0
kube-dns-86f4d74b45-4qshc 3/3 Running 0
kube-flannel-ds-bld76 1/1 Running 0
kube-proxy-5s65r 1/1 Running 0
kube-scheduler-actaz-prod-nb1 1/1 Running 0
But this actually overwrites the newly fetched cluster information in the file $HOME/.kube/config.
Am I missing something when connecting to the AKS cluster with the get-credentials command that's leading me to the error
*Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)*?
After you run
az aks get-credentials -n cluster-name -g resource-group
it should have merged the cluster into your local configuration:
/home/user-name/.kube/config
Can you check your config with
kubectl config view
and verify that it is pointing to the right cluster?
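To see at a glance which cluster kubectl is actually talking to, a quick sketch:
kubectl config current-context
kubectl cluster-info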
Assuming you have chosen the default configuration while deploying AKS, you need to create an SSH key pair to log in to the AKS node.
Push the public key created above to the AKS node using "az vm user update" (check the command's help for the switches you need to pass; it's quite simple; a sketch follows).
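A hedged sketch of that push for a VM-based node pool; the node resource group, VM name, user name and key path are placeholders, and VMSS-based node pools need a different approach:
az vm user update --resource-group <node_resource_group> --name <node_vm_name> --username azureuser --ssh-key-value ~/.ssh/id_rsa.pub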
To create an SSH connection to an AKS node, you run a helper pod in your AKS cluster. This helper pod provides you with SSH access into the cluster and then additional SSH node access.
To create and use this helper pod, complete the following steps:
- Run a Debian (or any other image, such as CentOS 7) container and attach a terminal session to it. This container can be used to create an SSH session to any node in the AKS cluster:
kubectl run -it --rm aks-ssh --image=debian
The base Debian image doesn't include SSH components, so install them:
apt-get update && apt-get install openssh-client -y
Copy the private key (the one you created at the beginning) to the pod using kubectl cp; kubectl must be present on the machine where you created the SSH key pair, for example:
kubectl cp ~/.ssh/id_rsa aks-ssh:/id_rsa
Now you will see the private key file in the container; change its permissions to 600 and you will be able to SSH to your AKS node.
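A short sketch of those last two steps from inside the helper pod; the node IP is a placeholder you can read from kubectl get nodes -o wide, and azureuser matches the user pushed above:
chmod 0600 /id_rsa
ssh -i /id_rsa azureuser@<node_private_ip>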
Hope this helps.
