Changing VM size of running HDInsight cluster - azure-hdinsight

Is there a backend way (PowerShell, etc.) to scale up the VM type/size of an existing HDInsight cluster? The Azure portal UI certainly doesn't allow it; it only allows scaling out (changing the number of worker nodes). ADX/Kusto allows this for running clusters, even from the Azure portal UI, so I'm trying to figure out whether there is a backend way to do something similar for HDInsight.

Related

Vertical scaling of azure kubernetes cluster

I am unable to vertically scale my AKS cluster.
Currently, I have 3 nodes in my cluster with 2 cores and 8 GB RAM each, and I am trying to upgrade them to 16 cores and 64 GB RAM. How do I do it?
I tried scaling the VM scale set. The Azure portal shows it as scaled, but when I run "kubectl get nodes -o wide" it still shows the old configuration.
Any leads will be helpful.
Thanks,
Abhishek
Vertical scaling or changing the node pool VM size is not supported. You need to create a new node pool and schedule your pods on the new nodes.
https://github.com/Azure/AKS/issues/1556#issuecomment-615390245
This UX issue is due to how the VMSS is managed by AKS. Since AKS is
a managed service, we don't support operations done outside of the AKS
API to the infrastructure resources. In this example you are using the
VMSS portal to resize, which uses VMSS APIs to resize the resource and
as a result has unexpected changes.
AKS nodepools don't support resize in place, so the supported way to
do this is to create a new nodepool with a new target and delete the
previous one. This needs to be done through the AKS portal UX. This
maintains the goal state of the AKS node pool, as at the moment the
portal is showing the VMSize AKS knows you have because that is what
was originally requested.
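The supported path (new pool, move the pods, delete the old pool) can be sketched as a sequence of az and kubectl commands. This Python helper only builds and prints them; the resource group, cluster, pool and node names are hypothetical placeholders, and Standard_D16s_v3 (16 vCPUs, 64 GiB) is just one example of a size that matches the question.

```python
# Sketch: build the command sequence for moving AKS workloads to a larger VM size
# by adding a new node pool and removing the old one. All names passed in
# (resource group, cluster, pool names, node names, VM size) are hypothetical.
from typing import List


def resize_via_new_pool(resource_group: str, cluster: str, old_pool: str,
                        new_pool: str, vm_size: str, node_count: int,
                        old_nodes: List[str]) -> List[str]:
    """Return az/kubectl commands: add new pool, cordon and drain old nodes, delete old pool."""
    pool_args = f"--resource-group {resource_group} --cluster-name {cluster}"
    # 1. Create the replacement pool with the larger VM size.
    cmds = [f"az aks nodepool add {pool_args} --name {new_pool} "
            f"--node-vm-size {vm_size} --node-count {node_count}"]
    # 2. Mark every old node unschedulable first, so evicted pods cannot
    #    land on another old node that is about to be drained.
    cmds += [f"kubectl cordon {node}" for node in old_nodes]
    # 3. Evict the pods; the scheduler then places them on the new pool.
    cmds += [f"kubectl drain {node} --ignore-daemonsets --delete-emptydir-data"
             for node in old_nodes]
    # 4. Delete the old pool once its workloads have moved.
    cmds.append(f"az aks nodepool delete {pool_args} --name {old_pool}")
    return cmds


if __name__ == "__main__":
    for cmd in resize_via_new_pool("my-rg", "my-aks", "nodepool1", "bigpool",
                                   "Standard_D16s_v3", 3,
                                   ["aks-nodepool1-00000000-vmss000000"]):
        print(cmd)
```

If the old pool is the cluster's only System-mode pool, AKS will not let you delete it, so in that case the new pool would need to be added with --mode System.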

Azure Kubernetes Services scale up trigger

I am trying to figure out what triggers an AKS cluster to scale out horizontally with more nodes. I have a cluster that has been running at 103% CPU for 5+ minutes, but no action is taken. Any ideas what the triggers are and how I could customize them? If I start more jobs, the cluster just lowers the CPU allocation for all pods.
Microsoft's article doesn't say anything specific about this: https://learn.microsoft.com/en-us/azure/aks/cluster-autoscaler
Note that:
The cluster autoscaler is a Kubernetes component. Although the AKS
cluster uses a virtual machine scale set for the nodes, don't manually
enable or edit settings for scale set autoscale in the Azure portal or
using the Azure CLI. Let the Kubernetes cluster autoscaler manage the
required scale settings.
Which brings us to the actual Kubernetes Cluster Autoscaler:
Cluster Autoscaler is a tool that automatically adjusts the size of
the Kubernetes cluster when one of the following conditions is true:
there are pods that failed to run in the cluster due to insufficient resources.
there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.
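The first condition above is the trigger you are looking for. Note that "insufficient resources" is judged against the pods' resource requests, not against measured CPU usage, so a node running at 103% CPU does not by itself trigger a scale-up.
For more details on installation and configuration, see the Cluster Autoscaler on Azure documentation. For example, you can customize the cluster autoscaler (CA) based on resources: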
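A simplified Python model of that scale-up trigger, assuming first-fit placement by CPU request only. The real autoscaler simulates the scheduler far more thoroughly (memory, affinities, taints and more), and the node and pod values here are hypothetical. It illustrates why pods that request little or no CPU never become pending, no matter how busy the node is.

```python
# Simplified model of the cluster autoscaler scale-up trigger. This is NOT the
# real algorithm: it only shows that the decision depends on pod CPU *requests*
# that cannot be placed, not on measured CPU utilization.
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    allocatable_mcpu: int    # allocatable CPU in millicores
    requested_mcpu: int = 0  # sum of requests of pods already placed here


@dataclass
class Pod:
    name: str
    request_mcpu: int  # CPU request in millicores (0 = no request set)


def pending_pods(nodes: List[Node], pods: List[Pod]) -> List[Pod]:
    """Place pods first-fit by request; return the pods that fit on no node."""
    pending = []
    for pod in pods:
        for node in nodes:
            if node.allocatable_mcpu - node.requested_mcpu >= pod.request_mcpu:
                node.requested_mcpu += pod.request_mcpu
                break
        else:  # no node had room for this pod's request
            pending.append(pod)
    return pending


def should_scale_up(nodes: List[Node], pods: List[Pod]) -> bool:
    # A scale-up is considered only when at least one pod cannot be scheduled.
    return len(pending_pods(nodes, pods)) > 0
```

For example, on a node with 1900m allocatable, three pods requesting 500m each fit, but a fourth stays pending and would trigger a scale-up; fifty pods with no CPU request all "fit" and never do.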
When scaling from an empty VM Scale Set (0 instances), Cluster
Autoscaler will evaluate the provided resources (cpu, memory,
ephemeral-storage) based on that VM Scale Set's backing instance type.
This can be overridden (for instance, to account for system reserved
resources) by specifying capacities with VMSS tags, formatted as:
k8s.io_cluster-autoscaler_node-template_resources_<resource name>: <resource value>. For instance:
k8s.io_cluster-autoscaler_node-template_resources_cpu: 3800m
k8s.io_cluster-autoscaler_node-template_resources_memory: 11Gi
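If you manage several scale sets, a small helper can generate those tags consistently. This is a hypothetical Python sketch: the key format is the one quoted above, while the helper name and its simplified quantity check are assumptions, not part of the autoscaler. Applying the resulting tags to the VMSS (portal, CLI or SDK) is left out.

```python
# Sketch: build VMSS tags that override the resources Cluster Autoscaler assumes
# when scaling a scale set up from 0 instances. The tag key format comes from the
# Cluster Autoscaler on Azure docs; the helper itself is hypothetical.
import re
from typing import Dict

TAG_PREFIX = "k8s.io_cluster-autoscaler_node-template_resources_"

# Simplified check for Kubernetes quantities such as "3800m", "4" or "11Gi";
# it does not cover every form the real Kubernetes quantity parser accepts.
_QUANTITY = re.compile(r"^[0-9]+(\.[0-9]+)?(Ki|Mi|Gi|Ti|m|k|M|G|T)?$")


def node_template_tags(resources: Dict[str, str]) -> Dict[str, str]:
    """Map {resource name: quantity} to {VMSS tag key: quantity}."""
    tags = {}
    for name, value in resources.items():
        if not _QUANTITY.match(value):
            raise ValueError(f"not a valid quantity for {name!r}: {value!r}")
        tags[TAG_PREFIX + name] = value
    return tags


if __name__ == "__main__":
    for key, value in node_template_tags({"cpu": "3800m", "memory": "11Gi"}).items():
        print(f"{key}: {value}")
```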

Is there any way to run AKS in Azure Dev/Test Labs?

I am looking for a way to run AKS or a k8s cluster in Dev/Test Labs, but I couldn't find an official way. I guess Azure allows using production services in Dev/Test Labs, but it hasn't yet published documentation on how to do this. I need memory-rich VMs, such as 128/256 GB, but AKS doesn't support those VMs in a cluster. The AutoShutdown option would also save costs for these VMs, so I have to build this in Dev/Test Labs. Any suggestion would be helpful. Thanks!
AKS is a managed service, and you can't run it on your own VMs or on the ones from Dev/Test Labs. Why do you say that you can't use 128/256 GB RAM VMs? When selecting your VM size in the portal, make sure to select the Memory Optimized family.
If I understand correctly, your goal is to save money running these high-cost VMs. One possible way to achieve this is to create your cluster with a single instance of a smaller VM and create a second node pool with the larger VMs. You can then create and destroy that second pool on demand.

Azure AKS: Control which node should be removed while downscaling

I have an AKS cluster in Azure. When I scale down the cluster, for example with az aks scale, I want to control which node is removed.
I cannot find any documentation that describes how Azure decides.
Will Azure prefer removing nodes that are already cordoned or drained?
Deleting it from the Azure Portal is not an option, because I want an application to communicate with Azure via CLI or API.
First of all, you can't control which node is removed when you scale down the AKS cluster. Below I'll show how the nodes change when you scale the cluster.
When you don't use a VMSS for the agent pool, the AKS cluster uses individual VMs as nodes. Scaling up adds nodes with indices following the existing ones: for example, if the cluster has one node with index 0, scaling up by one node adds a node with index 1. Scaling down removes the nodes with the highest indices first.
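The index behavior just described for non-VMSS agent pools can be modeled in a few lines of Python. This is purely illustrative and mirrors only the answer's description; it is not AKS code, and the function name is hypothetical.

```python
# Illustrative model of the node-index behavior for agent pools built from
# individual VMs (no VMSS). It mirrors the description above, not AKS internals.
from typing import List


def scale_indices(indices: List[int], target: int) -> List[int]:
    """Return the node indices present after scaling to `target` nodes."""
    if target < 0:
        raise ValueError("target must be non-negative")
    current = sorted(indices)
    if target <= len(current):
        # Scale down (or no change): the highest indices are removed first.
        return current[:target]
    # Scale up: new nodes get the indices following the highest existing one.
    start = current[-1] + 1 if current else 0
    return current + list(range(start, start + target - len(current)))


if __name__ == "__main__":
    print(scale_indices([0], 2))        # [0, 1]
    print(scale_indices([0, 1, 2], 1))  # [0]
```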
When you use a VMSS for the agent pool, scaling follows the VMSS scale rules; see the VMSS documentation on how scale sets scale up and down.
You can also look at the Azure CLI command az aks scale, which scales the AKS cluster, and the corresponding REST API.

How to make a HDInsight/Spark cluster shrink when idle?

We use Spark 2.2 on Azure HDInsight for ad hoc exploration and batch jobs.
The jobs should run fine on a cluster of 5 medium VMs. They are:
1. notebooks (Zeppelin with Livy.spark2 magics)
2. compiled jars being run with Livy.
I have to remember to scale this cluster down to 1 worker when not using it, to save money. (0 workers would be nice, if that were possible).
I'd like Spark to manage this for me. When a job starts, scale the cluster up to a minimum size first, then pause ~10 minutes while that completes. After an idle period with no jobs, scale down again.
You can use PowerShell or the Azure classic CLI to scale the cluster up or down, but you would need to write a script that tracks cluster resource usage and scales down automatically.
Here is the PowerShell syntax:
Set-AzureRmHDInsightClusterSize -ClusterName <Cluster Name> -TargetInstanceCount <NewSize>
(In the newer Az PowerShell module, the equivalent cmdlet is Set-AzHDInsightClusterSize.)
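The "track usage and scale down" script mentioned in this answer boils down to a small decision policy. Below is a hypothetical Python sketch of one: it assumes you can already obtain the number of running jobs and the time the last job finished (for example from Livy or YARN, not shown), and the 30-minute idle limit is an arbitrary assumption. It only returns the resize command; it does not run it.

```python
# Sketch of an idle-based shrink policy for an HDInsight cluster.
# Assumptions (hypothetical): job counts and the last job's end time are
# available from elsewhere, and the thresholds below suit your workload.
from datetime import datetime, timedelta
from typing import Optional

IDLE_LIMIT = timedelta(minutes=30)  # hypothetical idle period before shrinking
IDLE_WORKERS = 1   # the question notes 0 workers is not possible
BUSY_WORKERS = 5   # the "5 medium VMs" size from the question


def target_workers(now: datetime, last_job_end: Optional[datetime],
                   jobs_running: int) -> int:
    """Worker count the cluster should have right now."""
    if jobs_running > 0:
        return BUSY_WORKERS
    if last_job_end is None or now - last_job_end >= IDLE_LIMIT:
        return IDLE_WORKERS
    return BUSY_WORKERS  # recently active: keep capacity for the next job


def plan(now: datetime, last_job_end: Optional[datetime], jobs_running: int,
         current_workers: int, cluster: str) -> Optional[str]:
    """Return the resize command to run, or None if no change is needed."""
    target = target_workers(now, last_job_end, jobs_running)
    if target == current_workers:
        return None
    return (f"Set-AzureRmHDInsightClusterSize -ClusterName {cluster} "
            f"-TargetInstanceCount {target}")
```

A scheduler, such as an Azure Automation runbook, would call plan periodically and run the returned command whenever it is not None.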
Here is a PowerShell workflow runbook that helps automate scaling your HDInsight clusters in or out, depending on your needs:
https://gallery.technet.microsoft.com/scriptcenter/Scale-your-HDInsight-f57bb4d8
or
You can use the options below to scale the cluster manually. (Your question is about scaling up/down automatically, but this may be useful to someone who wants to scale manually.)
Below is a link to an article explaining different ways to scale the cluster using PowerShell or the classic CLI (remember: at the time of writing, the latest CLI didn't support scaling; newer Azure CLI versions provide az hdinsight resize).
https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-scaling-best-practices
If you want Spark to handle this dynamically, Azure Databricks is the best choice (but it provides only Spark clusters, with no Hadoop components except Hive). HDInsight Spark does not resize the cluster in response to jobs on its own, so it will not solve your use case.
When you create a new cluster in Azure Databricks, there is an "Enable autoscaling" option that lets the cluster scale dynamically when a job is executed.
I'm told that Azure Databricks may be a better solution for this use case.
