Azure k8s HPA on custom metric

I am trying to set up HPA on an Azure cluster, but it is not working as expected: it does not scale up the pods even though the metric value is clearly double the target value, as you can see in the screenshot below.
Here is the HPA configuration for the same.

The Metrics Server may not be installed in your AKS cluster. The Metrics Server provides resource utilization data to Kubernetes and is automatically deployed in AKS clusters running version 1.10 and higher.
To see the version of your AKS cluster, use the az aks show command, as shown in the following example:
az aks show --resource-group myResourceGroup --name myAKSCluster --query kubernetesVersion --output table
If your AKS cluster runs a version lower than 1.10, the Metrics Server is not installed automatically. You can install it from the release manifest:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
To use the autoscaler, all containers in your pods must have CPU requests and limits defined.
For more information on how to implement this, you can refer to the Microsoft documentation.
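As a rough cross-check of what the HPA should do with the numbers from the question, the Kubernetes documentation gives the scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below only evaluates that formula. The function name desired_replicas is made up for illustration, and the sketch ignores the configured minimum and maximum replica counts and the controller's tolerance window.

```shell
# Hypothetical helper, not part of kubectl or the Azure CLI.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
# Prints ceil(currentReplicas * currentMetric / targetMetric).
desired_replicas() {
  awk -v r="$1" -v c="$2" -v t="$3" 'BEGIN {
    x = r * c / t        # raw replica count suggested by the metric ratio
    n = int(x)           # truncate toward zero
    if (n < x) n++       # round up when there is a fractional part
    print n
  }'
}
```

With one replica and a metric at twice its target, desired_replicas 1 200 100 prints 2. If kubectl describe hpa shows a current value of "unknown" instead of a number, the HPA cannot read the metric at all and this rule never gets applied.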

Related

How to Simulate Eviction of nodes in Azure Kubernetes

I have spot instance nodes in an Azure Kubernetes cluster. I want to simulate the eviction of a node so that I can debug my code, but I have not been able to. All I could find in the Azure docs is how to simulate eviction for a single spot instance, using the following:
az vm simulate-eviction --resource-group test-eastus --name test-vm-26
However, I need to simulate the eviction of a spot node pool or a spot node in an AKS cluster.
There is no AKS REST API or Azure CLI command for simulating evictions, because evictions of the underlying infrastructure are not handled by the AKS RP.
The AKS RP can only set the eviction policy on the underlying infrastructure during creation of the AKS cluster, by instructing the Azure Compute RP to do so.
Instead, to simulate the eviction of node infrastructure, you can use the az vmss simulate-eviction command or the corresponding REST API.
az vmss simulate-eviction --instance-id
                          --name
                          --resource-group
                          [--subscription]
Reference Documents:
https://learn.microsoft.com/en-us/cli/azure/vmss?view=azure-cli-latest#az_vmss_simulate_eviction
https://learn.microsoft.com/en-us/rest/api/compute/virtual-machine-scale-set-vms/simulate-eviction
Use the following commands to get the name of the VMSS that backs a node pool:
1. List the node pools and pick the desired node pool name from the output:
az aks nodepool list -g $ClusterRG --cluster-name $ClusterName -o table
2. Get the node resource group of the cluster, then list the VMSS tagged with that node pool name:
CLUSTER_RESOURCE_GROUP=$(az aks show --resource-group YOUR_Resource_Group --name YOUR_AKS_Cluster --query nodeResourceGroup -o tsv)
az vmss list -g $CLUSTER_RESOURCE_GROUP --query "[?tags.poolName == '<NODE_POOL_NAME>'].{VMSS_Name:name}" -o tsv
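The lookup steps and the eviction command can be chained into one helper. This is a minimal sketch, not an official tool: the function name simulate_pool_eviction is hypothetical, and it assumes the scale set carries the poolName tag used in the query above, which may differ between AKS versions.

```shell
# Hypothetical helper, not part of the Azure CLI.
# Usage: simulate_pool_eviction RESOURCE_GROUP CLUSTER_NAME NODE_POOL_NAME INSTANCE_ID
simulate_pool_eviction() {
  local rg="$1" cluster="$2" pool="$3" instance_id="$4"
  local node_rg vmss

  # Resource group that holds the cluster's node infrastructure.
  node_rg=$(az aks show --resource-group "$rg" --name "$cluster" \
    --query nodeResourceGroup -o tsv) || return 1

  # Scale set behind the node pool, matched on the poolName tag (assumption).
  vmss=$(az vmss list -g "$node_rg" \
    --query "[?tags.poolName == '$pool'].name" -o tsv) || return 1
  if [ -z "$vmss" ]; then
    echo "no VMSS found for node pool $pool" >&2
    return 1
  fi

  # Ask the Compute RP to simulate eviction of one spot instance.
  az vmss simulate-eviction --resource-group "$node_rg" --name "$vmss" \
    --instance-id "$instance_id"
}
```

The instance IDs of a scale set can be listed with az vmss list-instances, which is covered in the references below.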
References:
https://louisshih.gitbooks.io/kubernetes/content/chapter1.html
https://ystatit.medium.com/azure-ssh-into-aks-nodes-471c07ad91ef
https://learn.microsoft.com/en-us/cli/azure/vmss?view=azure-cli-latest#az_vmss_list_instances
(You may create a VMSS if you don't have one configured. Refer to: create a VMSS)

Azure Ui shows wrong amount of nodes after deleting nodes with kubectl

I removed two nodes of my Kubernetes cluster manually, first calling "kubectl drain " and then "kubectl delete " for each. While the cluster seems to work without a problem, the Azure UI shows exactly two nodes more than I see when I use "kubectl get nodes". So when I configure Kubernetes to have 9 nodes in the Azure UI, only 7 nodes are there when I look with kubectl. Scaling up or down does not solve the problem, as Azure is always off by two nodes.
How can I solve this problem? Is there a way I can notify Azure that a node has been deleted?
To solve the issue, you need a deeper understanding of how the k8s cluster works.
When you use kubectl delete to remove a node, you only remove it from the Kubernetes cluster, so the agent pool no longer has control over that node. It does not mean the machine itself is deleted. That is why the number of machines in the Azure portal does not change, and it explains the mismatch you see.
How can I solve this problem? Is there a way I can notify Azure that a node has been deleted?
These are two questions. The first can be rephrased as:
How do I restore a previously deleted node to the agent pool?
That is simple to solve: you only need to restart the kubelet service on that node. For example, if you use a VMSS as the agent pool of the AKS cluster and the node's instance ID is 4, you can do it like this:
az vmss run-command invoke --resource-group group_name --name vmss_name --instance-id 4 --command-id RunShellScript --scripts "service kubelet restart"
For the second one, you can only use Azure commands to let Azure know about the change. Here that means scaling the agent pool, for example with the Azure CLI:
az aks nodepool scale --resource-group group_name --name agentpool_name --cluster-name cluster_name --node-count 2
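To see how far the two views have drifted apart before and after such a fix, the node count from kubectl can be compared with the count Azure records for the pool. This is a minimal sketch: the function name node_count_drift is hypothetical, and it assumes the nodes carry an agentpool label with the pool name, which you should verify with kubectl get nodes --show-labels.

```shell
# Hypothetical helper, not part of kubectl or the Azure CLI.
# Usage: node_count_drift RESOURCE_GROUP CLUSTER_NAME NODE_POOL_NAME
node_count_drift() {
  local rg="$1" cluster="$2" pool="$3"
  local k8s azure

  # Nodes Kubernetes knows about in this pool (agentpool label is an assumption).
  k8s=$(kubectl get nodes -l "agentpool=$pool" --no-headers | wc -l | tr -d ' ')

  # Node count Azure has recorded for the same pool.
  azure=$(az aks nodepool show --resource-group "$rg" --cluster-name "$cluster" \
    --name "$pool" --query count -o tsv) || return 1

  echo "kubectl=$k8s azure=$azure drift=$((azure - k8s))"
}
```

For the situation in the question, this would report a drift of 2 until the deleted nodes rejoin the cluster.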

Changing --network-plugin in Azure Kubernetes Service for existing cluster

I'm trying to implement Azure Key Vault such that API keys, credentials and other Kubernetes secrets are read into production and staging environments. Ultimately, I'd like to try to expand that to local development environments so devs don't have to mess with it at all. It is just read in when they start their cluster.
Anyway, I'm following this to enable Pod Identities:
https://learn.microsoft.com/en-us/azure/aks/use-azure-ad-pod-identity
When I get to this step, I'm modifying the:
az aks create -g myResourceGroup -n myAKSCluster --enable-managed-identity --enable-pod-identity --network-plugin azure
To the following because I'm trying to change an existing cluster:
az aks update -g myResourceGroup -n myAKSCluster --enable-managed-identity --enable-pod-identity --network-plugin azure
This doesn't work, and I figured out I need to run each flag one at a time, so I had to run --enable-managed-identity first since --enable-pod-identity depends on it.
At any rate, when I get to the --enable-pod-identity I get the following error:
Operation failed with status: 'Bad Request'. Details: Network plugin kubenet is not supported to use with PodIdentity addon.
So I try the --network-plugin azure and get:
az: error: unrecognized arguments: --network-plugin azure
Apparently this flag is not available with update.
Poking around in the Azure portal for the AKS resource, I do see kubenet listed, but I'm not able to change it.
So, the question: Is it possible to change the Network Plugin on existing cluster or do I need to start a new?
EDIT: Looks like others are having similar issues on existing clusters:
https://github.com/Azure/AKS/issues/2094
Is it possible to change the Network Plugin on the existing cluster or do I need to start a new?
It is not possible to change the network plugin on an existing cluster, so you need to create a new cluster and set the network plugin to azure at creation time. The CLI command az aks update has no --network-plugin parameter, even if you install the aks-preview extension, which means changing the network plugin of an existing cluster is not supported.
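Before trying --enable-pod-identity on a cluster, you can check which plugin it was created with. This is a minimal sketch with hypothetical function names. It treats only the value azure as acceptable, which is an assumption based on the create command and the kubenet error quoted in the question.

```shell
# Hypothetical helpers, not part of the Azure CLI.
# Usage: network_plugin RESOURCE_GROUP CLUSTER_NAME
# Prints the network plugin of the cluster, for example kubenet or azure.
network_plugin() {
  az aks show --resource-group "$1" --name "$2" \
    --query networkProfile.networkPlugin -o tsv
}

# Usage: can_enable_pod_identity RESOURCE_GROUP CLUSTER_NAME
# Succeeds only when the cluster uses the azure plugin (assumption, see above).
can_enable_pod_identity() {
  [ "$(network_plugin "$1" "$2")" = "azure" ]
}
```

For example: can_enable_pod_identity myResourceGroup myAKSCluster || echo "create a new cluster with --network-plugin azure".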

Error while applying Node Autoscaler for existing AKS cluster

I am trying to experiment with a preview feature available in Azure AKS. As per the documentation, it has the following requirements:
Kubernetes version 1.12.4 or later
Azure CLI version 2.0.55 or later
Add the aks-preview extension: az extension add --name aks-preview
Register the scale set feature: az feature register --name VMSSPreview --namespace Microsoft.ContainerService
Ensure that it is registered
I created the AKS cluster with Terraform.
When I try to apply the following command:
az aks update --resource-group rg-euwest-d04-dvag-001 --name k8s-euwest-d04-dvag-dfs-dfsapp-001 --enable-cluster-autoscaler --min-count 3 --max-count 5
I get this error:
Operation failed with status: 'Bad Request'. Details: AgentPool '' has set auto scaling as enabled but is not on Virtual Machine Scale Sets, this is not allowed
As per my understanding, this is not supported at this time through Terraform or the Azure portal, and is only possible from the Azure CLI.
Your cluster needs to be created via the Azure CLI to enable autoscaling. So if you have created one via the Azure portal, you need to delete it and create a new one through the Azure CLI. Ref: https://github.com/MicrosoftDocs/azure-docs/issues/29199
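The error says the agent pool is not on Virtual Machine Scale Sets, so it can help to confirm the pool type before recreating anything. This is a minimal sketch: the function name non_vmss_pools is hypothetical, and it assumes scale set pools report the type VirtualMachineScaleSets in the output of az aks show.

```shell
# Hypothetical helper, not part of the Azure CLI.
# Usage: non_vmss_pools RESOURCE_GROUP CLUSTER_NAME
# Prints the names of agent pools that are NOT backed by a scale set.
non_vmss_pools() {
  # One tab-separated "name<TAB>type" row per agent pool.
  az aks show --resource-group "$1" --name "$2" \
    --query "agentPoolProfiles[].[name, type]" -o tsv |
    awk -F '\t' '$2 != "VirtualMachineScaleSets" { print $1 }'
}
```

Any pool name printed here would be rejected by --enable-cluster-autoscaler with the error shown above.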

Azure AKS - I disabled addon-http-application-routing but pods, deployments, services and stuff are still in the cluster

I have a brand new kubernetes cluster on AKS.
I disabled the addons with the azure-cli as described in documentation:
az aks disable-addons --addons http_application_routing --name myAKSCluster --resource-group myResourceGroup --no-wait
The portal shows no domain associated to the cluster.
But with kubectl I still see all the pods and deployments related to the addon.
I tried to delete the deployments and other resources with kubectl, but the deployments recreate themselves.
Has anybody experienced the same?
Thanks!
There is a known issue with 1.12.6:
Unable to disable addons on deployed clusters
AKS Engineering is diagnosing an issue around existing/deployed clusters being unable to disable Kubernetes addons within the addon-manager. When we have identified and repaired the issue we will roll out the required hot fix to all regions.
This impacts all addons including monitoring, http application routing, etc.
https://github.com/Azure/AKS/releases
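To check whether you are hitting this issue, you can compare what Azure reports for the addon with what is still running in the cluster. This is a minimal sketch with hypothetical function names. It assumes the addon is reported under the httpApplicationRouting key of addonProfiles and that its resources live in the kube-system namespace with names starting with addon-http-application-routing.

```shell
# Hypothetical helpers, not part of kubectl or the Azure CLI.
# Usage: routing_addon_enabled RESOURCE_GROUP CLUSTER_NAME
# Prints the addon state as Azure reports it (key name is an assumption).
routing_addon_enabled() {
  az aks show --resource-group "$1" --name "$2" \
    --query addonProfiles.httpApplicationRouting.enabled -o tsv
}

# Usage: leftover_routing_resources
# Prints addon resources still present in kube-system (namespace is an assumption).
leftover_routing_resources() {
  kubectl get deployments,services,pods --namespace kube-system -o name |
    grep 'addon-http-application-routing'
}
```

If the first function prints false while the second still lists resources, the addon was disabled on the Azure side but not cleaned up in the cluster, which matches the known issue described above.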
