I have an AKS cluster. I install the ingress controller with the following command:
helm upgrade --install --create-namespace ingress-nginx ingress-nginx/ingress-nginx --set controller.nodeSelector."beta\.kubernetes\.io/os"=linux --set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"=linux --set controller.replicaCount=2 --set controller.service.loadBalancerIP=$IngressIP --namespace nginx-ingress --atomic
On a schedule I create a cluster, run tests, and delete it. I deploy the application using Helm charts. Since yesterday this has stopped working, although for the previous half a year it ran without interruption. I now get errors in the nginx logs:
Service "test-apis/test-load-api" does not have any active Endpoint.
All labels are present, and I can't understand what changed a day ago in ingress or AKS, or what stopped working. Could you please help me? Thank you.
The error might be due to several reasons. You can try the solutions below.
A common cause is a selector mismatch: in my case the application (Deployment) definition used name as the label, whereas the Service selector used app. After updating the Service selector to match the pod labels, the error went away.
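For illustration, here is a minimal sketch where the Service selector matches the pod labels (the service and namespace names are taken from the error message; the container image is a placeholder):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-load-api
  namespace: test-apis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-load-api        # must match the pod template labels below
  template:
    metadata:
      labels:
        app: test-load-api      # pods carry this label
    spec:
      containers:
        - name: test-load-api
          image: example.azurecr.io/test-load-api:1.0   # placeholder image
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: test-load-api
  namespace: test-apis
spec:
  selector:
    app: test-load-api          # must match the pod labels, otherwise no Endpoints are created
  ports:
    - port: 80
      targetPort: 80

If the Service selector and the pod labels diverge, the Service gets no Endpoints and the ingress controller logs exactly the "does not have any active Endpoint" message.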
Another situation where this may happen is when the ingress class of the ingress controller does not match the ingress class in the Ingress resource manifest used for your services.
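As a sketch (nginx is the chart's default class name; adjust it to whatever class your controller was installed with, and note that on older API versions the class was set with the kubernetes.io/ingress.class annotation instead of spec.ingressClassName):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-load-api
  namespace: test-apis
spec:
  ingressClassName: nginx        # must match the controller's ingress class
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: test-load-api
                port:
                  number: 80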
In our case, this was caused by having the Ingress resource definition in a different namespace than the services; the Ingress and the Services it routes to must live in the same namespace.
You can refer to this Stack Overflow thread to troubleshoot your issue.
This might also be a bug in a newer version of the nginx-ingress-controller. You can go through the troubleshooting steps given in the GitHub discussion; if that still doesn't help, please report a bug on GitHub.
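A quick way to narrow down which of the above applies is to check whether the Service actually has endpoints and whether the pod labels match its selector (the service and namespace names are taken from the error message):

kubectl get endpoints test-load-api -n test-apis
kubectl describe service test-load-api -n test-apis
kubectl get pods -n test-apis --show-labels

If the ENDPOINTS column is empty, compare the selector reported by describe with the labels shown on the pods.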
To bring some context: I was using AKS and had deployed an APIM solution on a cluster, which was working fine for a month, but some days ago I went back to my cluster and found the CoreDNS and CoreDNS autoscaler pods in a CrashLoopBackOff.
Here are the descriptions of the Pod:
I've tried:
scaling the deployment
restarting the deployment
deleting the pods and updating the deployment image
None of these actions has worked so far, so any suggestions are welcome.
Here are the deployment files, in case they help:
I partially resolved my problem by restarting my cluster on AKS.
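For reference, a full restart can be done from the Azure CLI, and a lighter-weight step that is sometimes enough is restarting the CoreDNS deployments themselves (cluster and resource group names are placeholders; coredns and coredns-autoscaler are the default AKS deployment names in kube-system):

az aks stop --name <cluster-name> --resource-group <resource-group>
az aks start --name <cluster-name> --resource-group <resource-group>

kubectl -n kube-system rollout restart deployment coredns coredns-autoscaler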
I have followed this tutorial microsoft_website to pull images from an Azure container registry. My YAML successfully creates a Job pod, which can pull the image, BUT only when it runs on the agentpool node in my cluster.
For example, adding nodeName: aks-agentpool-33515997-vmss000000 to the YAML works fine, but specifying a different node name, e.g. nodeName: aks-cpu1-33515997-vmss000000, makes the pod fail. The error message I get with describe pods is Failed to pull image, followed by kubelet Error: ErrImagePull.
What am I missing?
Create secret:
kubectl create secret docker-registry <secret-name> \
--docker-server=<container-registry-name>.azurecr.io \
--docker-username=<service-principal-ID> \
--docker-password=<service-principal-password>
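Once the secret exists, reference it from the pod spec so that every node, not just the one that already has the image cached, can authenticate against the registry. A minimal sketch with placeholder names, matching the Job-style pod from the question:

apiVersion: batch/v1
kind: Job
metadata:
  name: <job-name>
spec:
  template:
    spec:
      containers:
        - name: main
          image: <container-registry-name>.azurecr.io/<image>:<tag>
      imagePullSecrets:
        - name: <secret-name>          # the secret created above
      restartPolicy: Never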
As #user1571823 said, the solution to the problem is deleting the old image from the ACR and creating/pushing a new one.
The problem was related to some sort of corruption in the image saved in the Azure Container Registry (ACR). The reason one agent pool could pull the image was that the image already existed on the VM.
As #andov said, it is also a good option to open an incident case with Azure support for AKS from the subscription where AKS is deployed. The support team has full access to the AKS service backend and can tell you exactly what was causing your problem.
Four things to check:
Is it a subscription issue? Are the nodes in different subscriptions?
Is it a rights issue? Does the service principal of the node have rights to pull the image?
Is it a network issue? Are the nodes on different subnets?
Is there something about the image size or configuration that means it cannot run on the other node pool?
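A quick way to test the rights and network questions above from the cluster itself is the built-in ACR connectivity check (cluster, resource group, and registry names are placeholders):

az aks check-acr --name <cluster-name> --resource-group <resource-group> --acr <registry-name>.azurecr.io

It runs a canary pod in the cluster and reports whether the nodes can authenticate to and pull from the registry.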
Edit
New-AzAksNodePool has a -DefaultProfile parameter.
It can be an AzContext, AzureRmContext, or AzureCredential.
If this differs between your node pools, it would explain the error.
I'm using Azure DevOps pipelines to update our deployment in a K8s cluster in Azure. It used to work fine until yesterday; for some reason the Pods in the cluster now remain in their previous state. I can see that the image was successfully updated in ACR (container registry) and is tagged 'latest'. However, the release pipeline doesn't seem to be doing anything useful. I use the 'set' command in the task to update the Pod (it is well described in the Kubernetes docs and cheat sheet here).
This is the command sample extracted from the log:
kubectl set image deployments/identityserver identityserver='myacr'/identityserver:latest -n identityserver-dev
As it indicates, I'm pulling the latest image from ACR and trying to roll out an update. It executes fine (both from the command line and in Azure DevOps), with no errors, yet the Pod remains unaffected. Have I missed something in the docs? Should I raise a ticket with Microsoft?
Why do you have ' in the image name? Also, :latest won't trigger a rollout if the deployment already references :latest, because the image reference in the spec does not change; you need to be specific: https://github.com/kubernetes/kubernetes/issues/33664.
This is not an Azure issue.
Please check the answers to a similar question on SO here, which explain why it is not a good idea to use the :latest tag in your Deployment spec, along with the workarounds provided.
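Two common workarounds, sketched with the names from the question (the registry host and the $(Build.BuildId) variable are placeholders, not taken from the original pipeline):

# tag each build uniquely and point the deployment at that exact tag
kubectl set image deployments/identityserver identityserver=myacr.azurecr.io/identityserver:$(Build.BuildId) -n identityserver-dev

# or keep :latest (with imagePullPolicy: Always) and force a fresh rollout instead
kubectl rollout restart deployment/identityserver -n identityserver-dev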
We have a K8s cluster on Azure (AKS). On this cluster, we added a load balancer during setup, which installed an nginx-ingress controller.
Looking at the deployments:
addon-http-application-routing-default-http-backend 1
addon-http-application-routing-external-dns 1
addon-http-application-routing-nginx-ingress-controller 1
I see there is 1 of each running. I can find very little information on whether these should be scaled (there is 1 pod each) and, if they should, how.
I've tried running
kubectl scale deployment addon-http-application-routing-nginx-ingress-controller --replicas=3
which temporarily scales it to 3 pods, but after a few moments it is scaled back down again.
So again, are these supposed to be scaled? Why? How?
EDIT
For those that missed it like I did: the AKS HTTP application routing add-on (addon-http-application-routing) is not meant for production; it is there to get you set up quickly and start experimenting, which is why I wasn't able to scale it properly.
Read more
That's generally how you do it:
$ kubectl scale deployment addon-http-application-routing-nginx-ingress-controller --replicas=3
However, I suspect you have an HPA (HorizontalPodAutoscaler) configured that scales up/down depending on load or some metric and has minReplicas set to 1. You can check with:
$ kubectl get hpa
$ kubectl describe hpa <hpa-name>
If that's the case you can scale up by just patching the HPA:
$ kubectl patch hpa <hpa-name> -p '{"spec": {"minReplicas": 3}}'
or edit it manually:
$ kubectl edit hpa <hpa-name>
More information on HPAs here.
And yes, the ingress controllers are supposed to be scaled up and down depending on the load.
In AKS, being a managed service, these "system" workloads, like kube-dns and the ingress controller, are managed by the service itself and cannot be modified by the user (they're labeled with addonmanager.kubernetes.io/mode: Reconcile, which forces the current configuration to reflect what's on disk at /etc/kubernetes/addons on the masters).
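If you need an ingress controller that you can scale yourself, a common option is to install your own release of ingress-nginx instead of relying on the add-on. A minimal sketch, assuming the community ingress-nginx Helm chart (the autoscaling value names may differ between chart versions):

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.autoscaling.enabled=true \
  --set controller.autoscaling.minReplicas=3 \
  --set controller.autoscaling.maxReplicas=5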
My question is 'probably' specific to Azure.
How can I review the Kube-Proxy logs?
After SSH'ing into an Azure AKS Node (done) I can use the following to view the Kubelet logs:
journalctl -u kubelet -o cat
Azure docs on the Azure Kubelet logs can be found here:
https://learn.microsoft.com/en-us/azure/aks/kubelet-logs
I have reviewed the following Kubernetes resource regarding logs but Kube-Proxy logs on Azure do not appear in any of the suggested locations on the AKS node:
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/#looking-at-logs
This is part of a troubleshooting effort related to a Kubernetes nginx Ingress temporarily returning a '504 Gateway Time-out' when a service has not been accessed / has gone idle for some period of time (perhaps 5 to 10 minutes), but then becoming accessible on the next attempt(s).
On AKS, kube-proxy runs as a DaemonSet in the kube-system namespace.
You can list the kube-proxy pods + node information with:
kubectl get pods -l component=kube-proxy -n kube-system -o wide
And then you can review the logs by running:
kubectl logs kube-proxy-<suffix> -n kube-system
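Since there are several kube-proxy pods (one per node), you can also tail them all at once with the same label selector used above (the --prefix flag, available in newer kubectl versions, marks each line with the pod it came from):

kubectl logs -n kube-system -l component=kube-proxy --tail=50 --prefix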
On the same note as Acanthamoeba's answer, the logs for the Kube-Proxy pod can also be accessed via the browse UI (the Kubernetes dashboard), which can be launched with:
az aks browse --resource-group <ClusterResourceGroup> --name <ClusterName>
The above should pop open a new browser window pointed at the following URL: http://127.0.0.1:8001/#!/overview?namespace=default
Switch to Kube-System Namespace
Once the browser window is open, change to the Kube-System namespace, by selecting that option from the drop down on the left side:
Kube-System namespace is all the way at the bottom of the drop down... and probably requires scrolling.
Navigate to Pods
From there click "pods" (also on the left hand side menu, below the namespaces drop down) and then click the Kube-Proxy pod:
View Kube-Proxy Logs
Click to view the logs of your Azure AKS based Kube-Proxy pod; the logs button is in the top right-hand menu, to the left of 'Delete' and 'Edit', just below 'Create':
Other Azure AKS Trouble Shooting Resources
Since you are trying to view the Kube-Proxy logs, you are probably troubleshooting some networking issue or something along those lines. Here are some other resources that I used during my troubleshooting tour of my Azure AKS cluster:
View Kubelet Logs on Azure AKS: https://learn.microsoft.com/en-us/azure/aks/kubelet-logs
nGinx Ingress Troubleshooting: https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md
SSH into an Azure AKS Cluster VM: https://learn.microsoft.com/en-us/azure/aks/aks-ssh