Azure AKS 'Kube-Proxy' Kubernetes Node Log file location?

My question is 'probably' specific to Azure.
How can I review the Kube-Proxy logs?
After SSH'ing into an Azure AKS Node (done) I can use the following to view the Kubelet logs:
journalctl -u kubelet -o cat
Azure docs on the Azure Kubelet logs can be found here:
https://learn.microsoft.com/en-us/azure/aks/kubelet-logs
I have reviewed the following Kubernetes resource regarding logs but Kube-Proxy logs on Azure do not appear in any of the suggested locations on the AKS node:
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/#looking-at-logs
This is part of a troubleshooting effort related to a Kubernetes NGINX Ingress temporarily returning a '504 Gateway Time-out' when a service has not been accessed (i.e. has gone idle) for some period of time, perhaps 5 to 10 minutes, but then becoming accessible on the next attempt(s).

On AKS, kube-proxy runs as a DaemonSet in the kube-system namespace.
You can list the kube-proxy pods + node information with:
kubectl get pods -l component=kube-proxy -n kube-system -o wide
And then you can review the logs by running:
kubectl logs kube-proxy-<suffix> -n kube-system
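If you would rather pull recent logs from every kube-proxy pod in one go instead of naming a single pod, a small sketch using the same label selector as above (on large clusters you may also need to raise --max-log-requests):
# --prefix tags each line with the pod it came from
kubectl logs -n kube-system -l component=kube-proxy --prefix --tail=100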

On the same note as Acanthamoeba's answer, the logs for the Kube-Proxy pod can also be accessed via the Kubernetes dashboard UI, which can be launched via:
az aks browse --resource-group <ClusterResourceGroup> --name <ClusterName>
The above should pop open a new browser window pointed at the following URL: http://127.0.0.1:8001/#!/overview?namespace=default
Switch to Kube-System Namespace
Once the browser window is open, change to the Kube-System namespace by selecting that option from the drop-down on the left side:
The Kube-System namespace is all the way at the bottom of the drop-down... and probably requires scrolling.
Navigate to Pods
From there click "pods" (also on the left hand side menu, below the namespaces drop down) and then click the Kube-Proxy pod:
View Kube-Proxy Logs
Click to view the logs of your Azure AKS based Kube-Proxy pod; the logs button is in the top right-hand menu, to the left of 'Delete' and 'Edit', just below 'Create':
Other Azure AKS Troubleshooting Resources
Since you are trying to view the kube-proxy logs, you are probably troubleshooting a networking issue or something along those lines. Here are some other resources that I used during my troubleshooting tour of my Azure AKS cluster:
View Kubelet Logs on Azure AKS: https://learn.microsoft.com/en-us/azure/aks/kubelet-logs
NGINX Ingress Troubleshooting: https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md
SSH into an Azure AKS Cluster VM: https://learn.microsoft.com/en-us/azure/aks/aks-ssh
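If you still need a shell on the node itself (for example to read journald the same way as for the kubelet), a hedged alternative to setting up SSH is kubectl debug; the node name and debug image below are illustrative placeholders, not values from the question:
# Start a privileged debug pod on the node (list node names with `kubectl get nodes`)
kubectl debug node/aks-nodepool1-12345678-vmss000000 -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
# Inside the debug pod, pivot into the host filesystem and use the host's journalctl
chroot /host
journalctl -u kubelet -o cat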

Related

How can I determine if AKS uses the new Azure Monitor Agent?

I have onboarded my AKS cluster to Azure Monitor, i.e. by assigning a Log Analytics workspace.
This onboarding process created pods named omsagent-xxxx in my cluster. The pods use image: mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08052021
From the Log Analytics workspace I can query for logs and metrics produced from within my cluster. I assume the logs and metrics are sent there by the newly created omsagent pods.
According to https://azure.microsoft.com/en-us/updates/were-retiring-the-log-analytics-agent-in-azure-monitor-on-31-august-2024/, the "Log Analytics Agent" is going to be replaced by the new "Azure Monitor Agent".
According to https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-manage-agent#how-to-upgrade-the-container-insights-agent, the agents are to be upgraded automatically:
"When a new version of the agent is released, the agent is automatically upgraded on your managed Kubernetes clusters hosted on Azure Kubernetes Service (AKS) ..."
How can I determine whether my cluster is using the Log Analytics Agent or the new Azure Monitor Agent?
At this point we have not found a single command that reports this directly, though you may be able to trace it with Log Analytics queries at a later point.
As a workaround, you can compare each agent's documented capabilities and limitations against what you can actually do with your setup, and from that infer which monitoring agent you are using.
Example: limitations of the Azure Monitor Agent at the time of writing:
No support yet for networking scenarios involving Private Link.
No support yet for collecting custom logs (files) or IIS log files.
No support yet for Event Hubs and storage accounts as destinations.
The documentation also lists the operating systems supported by each agent; you can use that to narrow it down as well.
Reference: https://learn.microsoft.com/en-us/azure/azure-monitor/agents/agents-overview
Check the latest available version of omsagent (ciprod10132021) in the AKS release notes.
Use the following commands to get the currently running omsagent version:
kubectl get deployments -n kube-system -o wide
kubectl get ds omsagent --namespace=kube-system -o wide
kubectl get pods -n kube-system --selector=component=oms-agent -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c
kubectl get pods -n kube-system --selector=component=oms-agent-win -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c
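As one more rough check, and this is an assumption on my part based on Microsoft renaming the Container Insights agent from omsagent to ama-logs as it moves to the Azure Monitor Agent, you can look at which DaemonSet name is present in kube-system:
# omsagent suggests the legacy Log Analytics-based agent; ama-logs suggests the Azure Monitor Agent
kubectl get ds -n kube-system | grep -E 'omsagent|ama-logs'
# -o wide also shows the container image tags for whichever DaemonSet exists
kubectl get ds -n kube-system -o wide | grep -E 'omsagent|ama-logs'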

`kubectl delete service` gets stuck in 'Terminating' state

I'm trying to delete a service I wrote & deployed to Azure Kubernetes Service (along with required Dask components that accompany it), and when I run kubectl delete -f my_manifest.yml, my service gets stuck in the Terminating state. The console tells me that it was deleted, but the command hangs:
> kubectl delete -f my-manifest.yaml
service "dask-scheduler" deleted
deployment.apps "dask-scheduler" deleted
deployment.apps "dask-worker" deleted
service "my-service" deleted
deployment.apps "my-deployment" deleted
I have to Ctrl+C this command. When I check my services, Dask has been successfully deleted, but my custom service hasn't. If I try to manually delete it, it similarly hangs/fails:
> kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP x.x.x.x <none> 443/TCP 18h
my-service LoadBalancer x.x.x.x x.x.x.x 80:30786/TCP,443:31934/TCP 18h
> kubectl delete service my-service
service "my-service" deleted
This question says to delete the pods first, but all my pods are deleted (kubectl get pods returns nothing). There's also this closed K8s issue that says --wait=false might fix foreground cascade deletion, but this doesn't work and doesn't seem to be the issue here anyway (as the pods themselves have already been deleted).
I assume that I can completely wipe out my AKS cluster and re-create it, but that's an option of last resort here. I don't know whether it's relevant, but the service uses the azure-load-balancer-internal: "true" annotation, and I have a webapp deployed to my VNet that uses this service.
Is there any other way to force shutdown this service?
Thanks to @4c74356b41's suggestion of looking at kubectl describe service my-service (which I hadn't considered for some reason), I saw this warning:
Code="LinkedAuthorizationFailed" Message="The client 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' has permission to perform action 'Microsoft.Network/loadBalancers/write' on scope '/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Network/loadBalancers/kubernetes-internal'; however, it does not have permission to perform action 'Microsoft.Network/virtualNetworks/subnets/join/action' on the linked scope(s) '/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>' or the linked scope(s) are invalid.
(The client and object id GUIDs are the same value.)
This indicated that it's not exactly a Kubernetes issue, but rather a permissions issue within the Azure ecosystem. I looked through the portal and didn't find that GUID in any of my users, groups, or apps, so I'm not sure what it refers to. However, I granted the Owner role to this client ID, and after a few minutes the service deleted.
az role assignment create `
--role Owner `
--assignee xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
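If you would rather not hand out Owner, a narrower assignment that matches the exact permission named in the error should also work (a sketch; the scope reuses the placeholders from the error message above):
az role assignment create `
    --role "Network Contributor" `
    --assignee xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx `
    --scope /subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>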
I had a similar issue with a svc not connecting to the pod because the pod was already deleted:
HTTPConnectionPool(host='scv-name-not-shown-because-prod.namespace-prod', port=7999): Max retries exceeded with url:
my-url-not-shown-because-prod (Caused by
NewConnectionError('<urllib3.connection.HTTPConnection object at
0x7faee4b112b0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
I was able to solve this with the patch command:
kubectl patch service scv-name-not-shown-because-prod -n namespace-prod -p '{"metadata":{"finalizers":null}}'
I think the service went into some illegal state and was not able to recover.
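Before clearing finalizers, a hedged sanity check is to confirm the stuck Service is actually being held by one (for LoadBalancer Services this is typically service.kubernetes.io/load-balancer-cleanup, though that is an assumption about this particular case):
kubectl get service scv-name-not-shown-because-prod -n namespace-prod -o jsonpath='{.metadata.finalizers}'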

Scale Azure nginx ingress controller

We have a K8s cluster on Azure (AKS). On this cluster we added a load balancer during setup, which installed an nginx ingress controller.
Looking at the deployments:
addon-http-application-routing-default-http-backend 1
addon-http-application-routing-external-dns 1
addon-http-application-routing-nginx-ingress-controller 1
I see there is 1 of each running. Now I can find very little information on whether these should be scaled (there is 1 pod each) and, if they should, how.
I've tried running
kubectl scale deployment addon-http-application-routing-nginx-ingress-controller --replicas=3
Which temporarily scales it to 3 pods, but after a few moments, it is downscaled again.
So again, are these supposed to be scaled? Why? How?
EDIT
For those that missed it like I did: the AKS addon-http-application-routing add-on is not ready for production; it is there to quickly set you up and start experimenting. That is why I wasn't able to scale it properly.
That's generally how you do it:
$ kubectl scale deployment addon-http-application-routing-nginx-ingress-controller --replicas=3
However, I suspect you have an HPA configured which will scale up/down depending on the load or some metrics and has the minReplicas spec set to 1. You can check with:
$ kubectl get hpa
$ kubectl describe hpa <hpa-name>
If that's the case you can scale up by just patching the HPA:
$ kubectl patch hpa <hpa-name> -p '{"spec": {"minReplicas": 3}}'
or edit it manually:
$ kubectl edit hpa <hpa-name>
More information on HPAs can be found in the Kubernetes Horizontal Pod Autoscaler documentation.
And yes, the ingress controllers are supposed to be scaled up and down depending on the load.
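For an ingress controller you manage yourself (not the AKS add-on discussed below), a hedged sketch of wiring up that load-based scaling with an HPA; the deployment name and namespace are examples:
# Keep between 2 and 5 replicas, targeting 80% CPU utilisation
kubectl autoscale deployment nginx-ingress-controller -n ingress-nginx --min=2 --max=5 --cpu-percent=80
kubectl get hpa -n ingress-nginx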
In AKS, being a managed service, these "system" workloads like kube-dns and the ingress controller are managed by the service itself and cannot be modified by the user (because they're labeled with addonmanager.kubernetes.io/mode: Reconcile, which forces the current configuration to reflect what's on disk at /etc/kubernetes/addons on the masters).
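A hedged way to confirm this on your own cluster (the deployment name is taken from the question; the add-on's objects normally live in kube-system):
# If the labels include addonmanager.kubernetes.io/mode=Reconcile, manual scaling will be reverted
kubectl get deployment addon-http-application-routing-nginx-ingress-controller -n kube-system --show-labels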

Change Kubernetes pods logging

I have an AKS (Azure Container Service) cluster configured, up and running, with Kubernetes installed.
I am deploying containers using kubectl proxy and the Kubernetes GUI it provides.
I am trying to increase the log level of the pods in order to get more information for better debugging.
I have read a lot about kubectl config set and the log level flag --v=[0-10], but I have not been able to change the log level, and the documentation doesn't seem to cover this.
Can someone point me in the right direction?
The --v flag is an argument to kubectl and specifies the verbosity of the kubectl output. It has nothing to do with the log levels of the application running inside your Pods.
To get the logs from your Pods, you can run kubectl logs <pod>, or read /var/log/pods/<namespace>_<pod_name>_<pod_id>/<container_name>/ on the Kubernetes node.
To increase the log level of your application, your application has to support it. And as @Jose Armesto said above, this is usually configured using an environment variable.
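For example, if your application reads a verbosity setting from its environment (LOG_LEVEL and my-deployment below are hypothetical names, not anything Kubernetes defines), a minimal sketch would be:
# Set the variable on the deployment; Kubernetes rolls the pods so they pick it up
kubectl set env deployment/my-deployment LOG_LEVEL=debug
# Then read the (hopefully more verbose) logs
kubectl logs deployment/my-deployment --tail=50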

Unable to connect AKS cluster: connection time out

I've created an AKS cluster in the UK region in Azure.
Currently, I can no longer access my AKS cluster. Connecting to the public IPs fails; all connections time out.
Furthermore, I can't run the kubectl command either:
fcarlier#ubuntu:~$ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
Is there a known issue with AKS in that region or is it something on my side?
Is there a known issue with AKS in that region or is it something on my side?
Sorry you've had a bad experience.
For now, Azure AKS is still in preview; please try to recreate the cluster, ukwest works fine now.
Here is a similar case to yours; please refer to it.
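Before recreating, a hedged first check (resource group and cluster name are placeholders for your own) is to ask Azure what state the existing cluster is in:
az aks show --resource-group <ClusterResourceGroup> --name <ClusterName> --query provisioningState -o tsv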
I just successfully created a single node AKS cluster on UK West with no issues. Can you please retest? For now, I would avoid provisioning on West US 2 until the threshold issues are fixed. I'm aware the AKS team is actively engaged to restore service on West US. Sorry for the inconvenience. Below is the sample cmd to create in UK if you need the reference. Hope this helps.
Create Resource Group (UK West): az group create --name myResourceGroupUK --location ukwest
Create AKS cluster (UK West): az aks create --resource-group myResourceGroupUK --name myK8sClusterUK --agent-count 1 --generate-ssh-keys
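Once the cluster is created, the usual next step (a sketch reusing the same names as above) is to pull credentials and confirm kubectl can reach the API server:
az aks get-credentials --resource-group myResourceGroupUK --name myK8sClusterUK
kubectl get nodes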
I just finished a big post over here on this topic (which is not as straightforward as a single solution/workaround): 'Unable to connect Net/http: TLS handshake timeout' — Why can't Kubectl connect to Azure Kubernetes server? (AKS)
That being said, the solution to this one for me was to scale the nodes up, and then back down, for my impacted cluster from the Azure Kubernetes Service blade in the web console (a CLI equivalent is sketched after the steps below).
Workaround / Potential Solution
Log into the Azure Console — Kubernetes Service blade.
Scale your cluster up by 1 node.
Wait for scale to complete and attempt to connect (you should be able to).
Scale your cluster back down to the normal size to avoid cost increases.
Total time it took me ~2 mins.
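A hedged CLI equivalent of the portal steps above (resource names and node counts are examples; clusters with multiple node pools also need --nodepool-name):
# Scale up by one node, wait for it to complete, and confirm kubectl connects again...
az aks scale --resource-group <ClusterResourceGroup> --name <ClusterName> --node-count 2
# ...then scale back down to the original size to avoid extra cost
az aks scale --resource-group <ClusterResourceGroup> --name <ClusterName> --node-count 1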
More Background Info on the Issue
Also added this solution to the full ticket description write up that I posted over here (if you want more info have a read):
'Unable to connect Net/http: TLS handshake timeout' — Why can't Kubectl connect to Azure Kubernetes server? (AKS)
