How to Simulate Eviction of nodes in Azure Kubernetes - azure

I have spot instance nodes in Azure Kubernetes Cluster. I want to simulate the eviction of a node so as to debug my code but not able to. All I could find in azure docs is we can simulate eviction for a single spot instance, using the following:
az vm simulate-eviction --resource-group test-eastus --name test-vm-26
However, I need to simulate the eviction of a spot node pool or a spot node in an AKS cluster.

For simulating evictions, there is no AKS REST API or Azure CLI command because evictions of the underlying infrastructure is not handled by AKS RP.
Only during creation of the AKS cluster the AKS RP can set eviction Policy on the underlying infrastructure by instructing the Azure Compute RP to do so.
Instead to simulate the eviction of node infrastructure, the customer can use az vmsss simulate-eviction command or the corresponding REST API.
az vmss simulate-eviction
az vmss simulate-eviction --instance-id
--name
--resource-group
[--subscription]
Reference Documents:
https://learn.microsoft.com/en-us/cli/azure/vmss?view=azure-cli-latest#az_vmss_simulate_eviction
https://learn.microsoft.com/en-us/rest/api/compute/virtual-machine-scale-set-vms/simulate-eviction
Use the following commands to get the name of the vmss with nodepool:
1.
az aks nodepool list -g $ClusterRG --cluster-name $ClusterName -o
table
Get the desired node pool name from the output
2.
CLUSTER_RESOURCE_GROUP=$(az aks show –resource-group YOUR_Resource_Group --name YOUR_AKS_Cluster --query
nodeResourceGroup -o tsv)
az vmss list -g $CLUSTER_RESOURCE_GROUP --query "[?tags.poolName == '<NODE_POOL_NAME>'].{VMSS_Name:name}" -o tsv
References:
https://louisshih.gitbooks.io/kubernetes/content/chapter1.html
https://ystatit.medium.com/azure-ssh-into-aks-nodes-471c07ad91ef
https://learn.microsoft.com/en-us/cli/azure/vmss?view=azure-cli-latest#az_vmss_list_instances
(you may create vmss if you dont have it configured. Refer :create a VMSS)

Related

Azure k8s HPA on custom metric

I am trying to achieve HPA on azure cluster. But it is not working as expected, as it is not scaling up the pods when it is clearly showing the metric value is double of the target value. As you can see in the below screenshot
Here is the HPA configuration for the same.
Might be your Metrics server is not automatically installed with AKS,The Metrics Server is used to provide resource utilization to Kubernetes, and is automatically deployed in AKS clusters versions 1.10 and higher.
To see the version of your AKS cluster, use the az aks show command, as shown in the following example:
az aks show --resource-group myResourceGroup --name myAKSCluster --query kubernetesVersion --output table
If your AKS cluster is less than 1.10, the Metrics Server is not automatically installed. You can install via url.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
To use the autoscaler, all containers in your pods and your pods must have CPU requests and limits defined.
For more information how to implement you can refer this Microsoft Document

Get aks nodepool / vmss subnet ID

I created an aks using az cli with minimal parameters and specified a node-count and auto scaling. This created a nodepool and VMSS etc. and an accompanying vnet and subnet automatically.
How do I find out the created vnet and subnet using az cli?
az aks nodepool list --cluster-name aks -g rg-aks
report vnetSubnetId and podSubnetId as null.
Using
az vmss list
does show the subnet but I haven't found any properties of the vmss linking it to the nodepool or aks cluster to enable finding it.
The autogenerated name is something like:
aks-nodepool1-15343534-vmss
Which I guess I could filter for along the lines of aks-nodepool1-*-vmss but that seems dodgy and flaky.
I have tested in my environment
The VNET is created along with the VMSS in a different resource group which starts with MC_
To get the subnet ID, you can use the below script:
$CLUSTER_RESOURCE_GROUP = az aks show --resource-group RGName --name AKSClusterName --query nodeResourceGroup -o tsv
$VMSS_NAME = az vmss list -g $CLUSTER_RESOURCE_GROUP --query "[0].name"
az vmss show -g $CLUSTER_RESOURCE_GROUP -n $VMSS_NAME --query virtualMachineProfile.networkProfile.networkInterfaceConfigurations[0].ipConfigurations[0].subnet.id

Error while applying Node Autoscaler for existing AKS cluster

I am trying to experiment with Preview feature available in Azure AKS as per documentation available we need to have the following requirements
Kubernetes version 1.12.4 or later
Azure CLI version 2.0.55 or later.
add aks preview :- az extension add --name aks-preview
register scale set provider:- az feature register --name VMSSPreview --namespace Microsoft.ContainerService
ensure that it is registerd
created AKS cluster with terraform
when i try to apply following command
az aks update --resource-group rg-euwest-d04-dvag-001 --name k8s-euwest-d04-dvag-dfs-dfsapp-001 --enable-cluster-autoscaler --min-count 3 --max-count 5
error
Operation failed with status: 'Bad Request'. Details: AgentPool
'' has set auto scaling as enabled but is not on Virtual
Machine Scale Sets, this is not allowed
As per my understanding, it is not supported at this time through terraform or from Azure Portal but only possible from Azure CLI
Your cluster needs to be created via Azure CLI to enable autoscaling. So if you have created on evia Azure portal, you need to delete it and create new one through Azure CLI. Ref: https://github.com/MicrosoftDocs/azure-docs/issues/29199

Stop all compute in AKS (Azure Managed Kubernetes)

I have created a managed Kubernetes cluster in Azure, but it's only for learning purposes and so I only want to pay for the compute whilst I'm actually using it.
Is there a easy way to gracefully shut down and start up the VMs, availablity sets and load balancers?
You could use the Azure CLI to stop the the entire cluster:
az aks stop --name myAksCluster --resource-group myResourceGroup
And start it again with
az aks start --name myAksCluster --resource-group myResourceGroup
Before this feature, it was possible to stop the virtual machines via Powershell:
az vm deallocate --ids $(az vm list -g MC_my_resourcegroup_westeurope --query "[].id" -o tsv)
Replace MC_my_resourcegroup_westeurope with the name of your resource group that contains the VM(s).
When you want to start the VM(s) again, run:
az vm start --ids $(az vm list -g MC_my_resourcegroup_westeurope --query "[].id" -o tsv)
Only VMs cost money out of all AKS resources (well, VHDs as well, but you cannot really stop those). So you only need to take care of those. Edit: Public Ips also cost money, but you cannot stop those either.
For my AKS cluster I just use portal and issue stop\deallocate command. And start those back when I need them (everything seems to be working fine).
You can use REST API\powershell\cli\various SKDs to achieve the same result in an automated fashion.
Above method (az vm <deallocate|start> --ids $(...)) no longer seems to work.
Solved by first listing the VM scale sets and use these to deallocate/start:
$ResourceGroup = "MyResourceGroup"
$ClusterName = "MyAKSCluster"
$Location = "westeurope"
$vmssResourceGroup="MC_${ResourceGroup}_${ClusterName}_${Location}"
# List all VM scale sets
$vmssNames=(az vmss list --resource-group $vmssResourceGroup --query "[].id" -o tsv | Split-Path -Leaf)
# Deallocate first instance for each VM scale set
$vmssNames | ForEach-Object { az vmss deallocate --resource-group $vmssResourceGroup --name $_ --instance-ids 0}
# Start first instance for each VM scale set
$vmssNames | ForEach-Object { az vmss start --resource-group $vmssResourceGroup --name $_ --instance-ids 0}
There is a new feature just added to AKS:
The AKS Stop/Start cluster feature now in public preview allows AKS
customers to completely pause an AKS cluster and pick up where they
left off later with a switch of a button, saving time and cost.
Previously, a customer had to take multiple steps to stop or start a
cluster, adding to operations time and wasting compute resources. The
stop/start feature keeps cluster configurations in place and customers
can pick up where they left off without reconfiguring the clusters.
https://learn.microsoft.com/en-gb/azure/aks/start-stop-cluster
In your AKS cluster, goto properties and find your Resource group name. search for the Resource group and when you select it, it will list your virtual machines. For each Virtual Machine, select the Operations > Auto-Shutdown option and turn it on. This will turn the VM off saving you money when you aren't developing! To turn them back on again, you will need to follow the advice on previous answers or the answer here

How to find --name of kubernetes cluster in Azure portal?

anyone knows how to find the name of a kubernetes cluster in azure portal? I did not create the cluster but I'm trying to connect to it and I don't know and can't find what to put in the --name tag
Well, the name of the kubernetes cluster is the name of the resource you see in the portal. As simple as that.
Just find the cluster in question and look how its called.
The az aks list command can list all Kubernetes clusters in given resource group. You can use the command to query names of your clusters:
> az aks list --resource-group MyAKSRG --query "[].name"
[
"MyCluster1",
"MyCluster2",
]

Resources