I have an AKS cluster in Azure. When I scale down the cluster, for example with az aks scale, I want to control which node is removed.
I cannot find any documentation that describes how Azure decides.
Will Azure prefer removing nodes that are already cordoned or drained?
Deleting the node from the Azure Portal is not an option, because I want an application to communicate with Azure via the CLI or API.
First of all, you cannot control which node is removed when you scale down the AKS cluster with az aks scale. What I can show you is how the set of nodes changes when you scale.
If the agent pool does not use a VMSS, the AKS cluster uses individual VMs as nodes. When you scale up, new nodes get the indexes that follow the existing ones. For example, if the cluster has one node with index 0, scaling up by one node adds a node with index 1. When you scale down, the nodes with the highest indexes are removed first.
If the agent pool uses a VMSS, scaling follows the scale rules of the VMSS. You can see those rules in the VMSS documentation on how instances change during scale-up and scale-down.
You can also take a look at the Azure CLI command az aks scale, which scales the AKS cluster, and at the corresponding REST API.
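The index behaviour described above can be sketched as a small model. This is only an illustration of the rule as stated in this answer, not Azure code; the function name `scale_node_indexes` is hypothetical.

```python
def scale_node_indexes(current, target_count):
    """Model index-based scaling for a pool of individual VMs (no VMSS).

    `current` is the list of node indexes in the pool; the result is the
    list of indexes after scaling to `target_count` nodes.
    """
    if target_count < 0:
        raise ValueError("target_count must not be negative")
    nodes = sorted(current)
    # Scale down: the highest indexes are removed first.
    while len(nodes) > target_count:
        nodes.pop()
    # Scale up: each new node gets the index after the current highest.
    while len(nodes) < target_count:
        nodes.append(nodes[-1] + 1 if nodes else 0)
    return nodes


print(scale_node_indexes([0], 2))        # one node with index 0, scaled up by one
print(scale_node_indexes([0, 1, 2], 1))  # three nodes, scaled down to one
```

In this model, cordoning or draining a node has no influence on the result: only the index decides which node goes.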
Related
In Azure Kubernetes Service, node pool autoscaling only lets us define the minimum and maximum number of nodes.
When I checked the scale settings of the node pool's scale set, I found them set to manual.
So I assume that node pool autoscaling doesn't rely on the underlying scale set. But I wonder: can we rely on the scale set's own autoscale, with its various metric rules, instead of the very limited node pool scale settings?
AKS autoscaling works slightly differently from VMSS autoscaling.
From the official docs:
The cluster autoscaler watches for pods that can't be scheduled on
nodes because of resource constraints. The cluster then automatically
increases the number of nodes.
The AKS autoscaler is tightly coupled with the control plane and the kube-scheduler, so it takes resource requests and limits into account. For Kubernetes workloads that makes it a far better scaling method than the VMSS autoscaler, which is not supported for AKS anyway:
The cluster autoscaler is a Kubernetes component. Although the AKS
cluster uses a virtual machine scale set for the nodes, don't manually
enable or edit settings for scale set autoscale in the Azure portal or
using the Azure CLI.
I am unable to scale my AKS cluster vertically.
Currently, I have 3 nodes in my cluster with 2 cores and 8 GB of RAM each. I am trying to upgrade them to 16 cores and 64 GB of RAM. How do I do it?
I tried scaling the VM scale set. The Azure portal shows it as scaled, but when I run "kubectl get nodes -o wide" it still shows the old version.
Any leads will be helpful.
Thanks,
Abhishek
Vertical scaling or changing the node pool VM size is not supported. You need to create a new node pool and schedule your pods on the new nodes.
https://github.com/Azure/AKS/issues/1556#issuecomment-615390245
this UX issues is due to how the VMSS is managed by AKS. Since AKS is
a managed service, we don't support operations done outside of the AKS
API to the infrastructure resources. In this example you are using the
VMSS portal to resize, which uses VMSS APIs to resize the resource and
as a result has unexpected changes.
AKS nodepools don't support resize in place, so the supported way to
do this is to create a new nodepool with a new target and delete the
previous one. This needs to be done through the AKS portal UX. This
maintains the goal state of the AKS node pool, as at the moment the
portal is showing the VMSize AKS knows you have because that is what
was originally requested.
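As a rough sketch of the "new node pool, then delete the old one" approach, the helper below only assembles the two Azure CLI invocations as argument lists; it does not run them. The function name, the resource names and the VM size `Standard_D16s_v3` (16 vCPUs, 64 GiB) are assumptions chosen for illustration.

```python
import shlex


def resize_by_replacing_pool(resource_group, cluster, old_pool, new_pool,
                             vm_size, node_count):
    """Return the az commands that replace one node pool with another."""
    common = ["--resource-group", resource_group, "--cluster-name", cluster]
    add_pool = ["az", "aks", "nodepool", "add", *common,
                "--name", new_pool,
                "--node-vm-size", vm_size,
                "--node-count", str(node_count)]
    delete_pool = ["az", "aks", "nodepool", "delete", *common,
                   "--name", old_pool]
    # Order matters: the new pool must exist before the old one is removed.
    return [add_pool, delete_pool]


# All names below are hypothetical placeholders.
for command in resize_by_replacing_pool("my-rg", "my-aks", "smallpool",
                                        "bigpool", "Standard_D16s_v3", 3):
    print(shlex.join(command))
```

Between the two commands the workloads still have to move to the new nodes, for example by cordoning and draining the old ones, which this sketch leaves out.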
I am trying to figure out what triggers an AKS cluster to scale out horizontally by adding nodes. I have a cluster that has been running at 103% CPU for 5+ minutes, but no action is taken. Any ideas what the triggers are and how I could customize them? If I start more jobs, the cluster lowers the CPU allocation for all pods.
The Microsoft article doesn't say anything specific about that: https://learn.microsoft.com/en-us/azure/aks/cluster-autoscaler
Note that:
The cluster autoscaler is a Kubernetes component. Although the AKS
cluster uses a virtual machine scale set for the nodes, don't manually
enable or edit settings for scale set autoscale in the Azure portal or
using the Azure CLI. Let the Kubernetes cluster autoscaler manage the
required scale settings.
Which brings us to the actual Kubernetes Cluster Autoscaler:
Cluster Autoscaler is a tool that automatically adjusts the size of
the Kubernetes cluster when one of the following conditions is true:
there are pods that failed to run in the cluster due to insufficient resources.
there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing
nodes.
The first condition above is the trigger you are looking for.
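To make that trigger concrete, here is a deliberately simplified model: pods are placed first-fit by their CPU requests, and whatever does not fit anywhere is "pending". The real scheduler and Cluster Autoscaler consider far more than this (memory, affinity, taints and so on); the function name and the numbers are assumptions for illustration only.

```python
def pending_pods(node_allocatable_mcpu, pod_requests_mcpu):
    """Return the CPU requests (in millicores) that fit on no node.

    Placement is first-fit by requested CPU. Actual CPU usage is not an
    input at all, which is the point of the model.
    """
    free = list(node_allocatable_mcpu)
    pending = []
    for request in pod_requests_mcpu:
        for i, capacity in enumerate(free):
            if request <= capacity:
                free[i] -= request
                break
        else:
            # No node has enough unrequested CPU left for this pod.
            pending.append(request)
    return pending


nodes = [1900, 1900]  # assumed allocatable millicores per node

# Small requests: everything is scheduled, so there is nothing to trigger
# a scale-out, no matter how busy the CPUs actually are.
print(pending_pods(nodes, [500, 500, 500]))

# Larger requests: the third pod cannot be placed and stays pending,
# which is the condition the autoscaler reacts to.
print(pending_pods(nodes, [1000, 1000, 1000]))
```

Under this model, a cluster at 103% CPU whose pods all fit by their requests never produces a pending pod, which matches the behaviour described in the question.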
For more details on installation and configuration, see the Cluster Autoscaler on Azure documentation. For example, you can customize the resources that the Cluster Autoscaler assumes for a node:
When scaling from an empty VM Scale Set (0 instances), Cluster
Autoscaler will evaluate the provided resources (cpu, memory,
ephemeral-storage) based on that VM Scale Set's backing instance type.
This can be overridden (for instance, to account for system reserved
resources) by specifying capacities with VMSS tags, formatted as:
k8s.io_cluster-autoscaler_node-template_resources_<resource name>: <resource value>. For instance:
k8s.io_cluster-autoscaler_node-template_resources_cpu: 3800m
k8s.io_cluster-autoscaler_node-template_resources_memory: 11Gi
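The tag format quoted above can be generated with a small helper. This sketch only builds the tag names and values; applying them to a scale set is out of scope, and the helper name `node_template_tags` is hypothetical.

```python
# Prefix taken from the tag format quoted above.
TAG_PREFIX = "k8s.io_cluster-autoscaler_node-template_resources_"


def node_template_tags(resources):
    """Map resource names to the VMSS tag names the quoted format expects."""
    return {TAG_PREFIX + name: value for name, value in resources.items()}


tags = node_template_tags({"cpu": "3800m", "memory": "11Gi"})
for key, value in tags.items():
    print(f"{key}: {value}")
```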
If an AKS cluster is created without zone awareness (https://learn.microsoft.com/en-us/azure/aks/availability-zones#create-an-aks-cluster-across-availability-zones), what does this mean behind the scenes?
Are all the underlying VMs running in one of the three available availability zones?
If that zone has an outage, will Azure move the cluster to another AZ that is still running?
This means that control plane components might (or will?) be in a single availability zone:
If you don't define any zones for the default agent pool when you
create an AKS cluster, the AKS control plane components for your
cluster will not use availability zones. You can add additional node
pools using the az aks nodepool add command and specify --zones for
those new nodes, however the control plane components remain without
availability zone awareness.
Emphasis is mine. So even if your worker nodes are up, your cluster won't work properly when the master nodes are down.
And no, it won't be moved to another availability zone in case of an outage.
I've installed DC/OS via the Azure Container Service (ACS), but I can't find any information on how to scale it, either manually (just increasing the number of agents) or, ideally, automatically in response to load.
There are a number of ways you can scale an ACS cluster:
CLI: https://blogs.msdn.microsoft.com/azurelinux/2016/07/20/azure-cli-0-10-2-release-update-5th-july-2016/
ACS resource provider: simply resubmit your ARM template for ACS with a new number of agents.
VMSS: use the portal to configure the scale set (including autoscale): https://azure.microsoft.com/en-us/documentation/articles/virtual-machine-scale-sets-autoscale-overview/