Azure VM scale set auto-scale scale down notifications - azure

I have a VM scale set that I want to set up auto-scaling for and I want to know how abrupt scaling down is. Before VMs get destroyed, I want to make sure any active long-running requests complete. Is this possible?
I am curious about the following:
How does auto-scaling decide which VMs to destroy when scaling down?
Is there any notification inside the VM that it is scheduled to be destroyed?
Can a VM that is scheduled to be destroyed control when it gets destroyed (and hold off destruction until all requests are complete)?
The VMs in my scale set will be behind a load balancer and I need to be able to drain connections (remove VMs from the backend pool) before destruction.

Autoscale offers several scale-in policies that determine which VMs are removed. For example, "NewestVM" removes the most recently created instances first. You can read more here: https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-scale-in-policy
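To make the policy names concrete, here is a minimal sketch of how the three documented policies (Default, NewestVM, OldestVM) might order instances for removal. The `Instance` class and `pick_for_scale_in` function are hypothetical names for illustration, and the Default branch is a simplification: it models only the "highest instance ID" tie-break and ignores the zone and fault-domain balancing the real policy performs first.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Instance:
    # Hypothetical, simplified view of a scale set VM.
    instance_id: int
    created: datetime


def pick_for_scale_in(instances, policy, count=1):
    """Return the instances a simplified scale-in policy would remove."""
    if policy == "NewestVM":
        ordered = sorted(instances, key=lambda i: i.created, reverse=True)
    elif policy == "OldestVM":
        ordered = sorted(instances, key=lambda i: i.created)
    elif policy == "Default":
        # Simplification: zone / fault-domain balancing is not modelled,
        # only the final "highest instance ID first" rule.
        ordered = sorted(instances, key=lambda i: i.instance_id, reverse=True)
    else:
        raise ValueError(f"unknown scale-in policy: {policy}")
    return ordered[:count]
```

With instance IDs 0, 2 and 3, the Default branch picks 3, which matches the behaviour described in the other answer.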
Regarding notification inside the VM about termination, there's a new feature called "termination notification" that sends an event which you can read from the Scheduled Events endpoint of the instance metadata service, for example
curl -s -H "Metadata:true" "http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01"
Read more here: https://azure.microsoft.com/en-us/blog/azure-virtual-machine-scale-sets-now-provide-simpler-management-during-scalein/
The VM can either wait for the termination timeout to expire, or send a POST request to the metadata endpoint to approve the termination before the timeout.
To drain connections, one method is to block the health probe IP address 168.63.129.16 on the VM. The load balancer or application gateway (depending on which you use) will then mark the VM as "unhealthy" and send it no new traffic, while existing connections stay active.
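The flow described above (detect the Terminate event, drain, then approve early) could be sketched roughly as follows. This assumes the Scheduled Events endpoint and its documented JSON shape (`Events`, `EventId`, `EventType`, and a `StartRequests` approval body); the function names and the `drain` callback are hypothetical, and the API version may need adjusting for your environment.

```python
import json
import urllib.request

# Scheduled Events endpoint of the instance metadata service.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01"
)


def terminate_event_ids(document):
    """Extract the IDs of pending Terminate events from a response document."""
    return [
        event["EventId"]
        for event in document.get("Events", [])
        if event.get("EventType") == "Terminate"
    ]


def approval_request(event_ids):
    """Build the POST request that approves the given events early."""
    body = json.dumps({"StartRequests": [{"EventId": i} for i in event_ids]})
    return urllib.request.Request(
        SCHEDULED_EVENTS_URL,
        data=body.encode("utf-8"),
        headers={"Metadata": "true"},
        method="POST",
    )


def poll_once(drain):
    """Poll once; if termination is pending, call drain() and then approve."""
    request = urllib.request.Request(
        SCHEDULED_EVENTS_URL, headers={"Metadata": "true"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        document = json.load(response)
    event_ids = terminate_event_ids(document)
    if event_ids:
        drain()  # hypothetical callback: stop taking traffic, finish requests
        urllib.request.urlopen(approval_request(event_ids), timeout=5).close()
    return event_ids
```

You would call `poll_once` in a loop from a small agent on each VM. Note that the VM can only shorten the wait, not extend it beyond the configured termination timeout.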

How does auto-scaling decide which VMs to destroy when scaling down?
By default, auto-scaling deletes the instance with the highest instance ID (for example, if the instance IDs are 0, 2 and 3, the VMSS will delete 3). We can use PowerShell to get the instance IDs of the VMSS VMs.
PS C> Get-AzureRmVmssvm -ResourceGroupName "vmss" -VMScaleSetName "vmss"
ResourceGroupName Name   Location Sku            Capacity InstanceID ProvisioningState
----------------- ----   -------- ---            -------- ---------- -----------------
VMSS              vmss_0 westus   Standard_D1_v2          0          Succeeded
VMSS              vmss_2 westus   Standard_D1_v2          2          Succeeded
Is there any notification inside the VM that it is scheduled to be
destroyed?
As far as I know, autoscale notifies the administrators and contributors of the resource by email, VM will not receive the notification.
Can a VM that is scheduled to be destroyed control when it gets
destroyed (and hold off destruction until all requests are complete)?
We can't hold off destruction until all requests are complete for now.
In most cases, a VMSS is deployed behind a load balancer that distributes requests in a "round-robin" fashion, so an instance simply stops receiving requests once it has been deleted.
I want to make sure any active long-running requests complete. Is this
possible?
As far as I know, we can choose different OS metrics for autoscale, but we can't make sure VMSS will delete vm instances after the long-running requests complete.

Related

Azure alert to notify when a vm is stopped

I'd like to get a notification whenever a VM is stopped. Currently I've done it (for one VM only) using a heartbeat log alert that checks every 10 minutes, but now I want it implemented for the whole subscription (100+ VMs), and I cannot create a single alert for each VM due to cost.
After googling it, I found that there is an alert signal called Power Off Virtual Machine (Microsoft.Compute/virtualMachines) that should fulfill the requirement, but after setting up the alert and stopping a VM, nothing is received. Is there a missing step, maybe?
PS. I'm using VM Insights + new Azure Monitor Agent.
I tried to reproduce this in my environment and did not get an email notification when the VM was stopped while the alert was set to the Power Off Virtual Machine operation.
To receive an alert when a VM is stopped, try the following:
Select the whole subscription as the scope.
Select the Deallocate Virtual Machine operation as the signal, not Power Off Virtual Machine.
While creating the Action Group, select the Email option.
After that, when I stopped the VM, I received the email notification and the alert rule showed as fired in Monitor.
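To illustrate why the choice of operation matters, here is a small sketch that filters Activity Log style records by operation name. As I understand it, stopping a VM from the portal deallocates it, so the deallocate operation is the one that gets logged. The record layout (`operationName`, `status`, `resourceId` as flat strings) is a simplified assumption, and `vms_to_alert_on` is a hypothetical helper, not an Azure API.

```python
# Operation names as they appear in the Activity Log, compared in lowercase.
DEALLOCATE = "microsoft.compute/virtualmachines/deallocate/action"
POWER_OFF = "microsoft.compute/virtualmachines/poweroff/action"


def vms_to_alert_on(records, operations=(DEALLOCATE,)):
    """Return resource IDs of VMs with a successful matching operation."""
    wanted = {op.lower() for op in operations}
    return [
        record["resourceId"]
        for record in records
        if record.get("operationName", "").lower() in wanted
        and record.get("status") == "Succeeded"
    ]
```

Passing `operations=(DEALLOCATE, POWER_OFF)` would cover both ways of stopping a VM, which corresponds to creating one subscription-wide alert rule per operation.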

In Azure do fired alerts for a specific resource auto-resolve once they are deleted?

I have an AKS cluster for which I have received container working set memory alerts as well as "pods not in ready state" alerts. If these pods or clusters no longer exist, would the alerts stay in the "fired" state?
NOTE: Auto resolve is turned on for the above alert rules.

Windows Virtual Desktop: Automatically start and deallocate dedicated VM

As far as I understand, Windows Virtual Desktop's host pools can be configured in a pooled (assign a user to a VM with free resources) or personal (dedicated VM per user) mode.
I have some users with special needs (available applications, configuration and VM resources) and unpredictable usage times. Would it be possible to assign specific machines to them and tie the machine lifecycle to the user login? What I'd like to achieve is to shut down and deallocate the VM when the user logs out or shuts down the VM, and to start it automatically (accepting some initial delay) when the user logs in, so that we only pay for the VMs when they are actually needed.
Start/Stop VMs during off-hours
It starts or stops machines on user-defined schedules, provides insights through Azure Monitor logs, and sends optional emails by using action groups. The feature can be enabled on both Azure Resource Manager and classic VMs for most scenarios.
This feature uses Start-AzVm cmdlet to start VMs. It uses Stop-AzVM for stopping VMs.
Prerequisites
The runbooks for the Start/Stop VMs during off hours feature work with an Azure Run As account. The Run As account is the preferred authentication method because it uses certificate authentication instead of a password that might expire or change frequently.
An Azure Monitor Log Analytics workspace that stores the runbook job logs and job stream results in a workspace to query and analyze. The Automation account and Log Analytics workspace need to be in the same subscription and supported region. The workspace needs to already exist, you cannot create a new workspace during deployment of this feature.
Recommended: Use a separate Automation account for working
with VMs enabled for the Start/Stop VMs during off-hours feature.
Azure module versions are frequently upgraded, and their parameters
might change. The feature isn't upgraded on the same cadence and it
might not work with newer versions of the cmdlets that it uses. Before
importing the updated modules into your production Automation
account(s), we recommend you import them into a test Automation
account to verify there aren't any compatibility issues.
Permissions
You must have certain permissions to enable VMs for the Start/Stop VMs during off-hours feature. The permissions are different depending on whether the feature uses a pre-created Automation account and Log Analytics workspace or creates a new account and workspace.
You don't need to configure permissions if you're a Contributor on the subscription and a Global Administrator in your Azure Active Directory (AD) tenant. If you don't have these rights or need to configure a custom role, make sure that you have the permissions described below.
Runbooks
The following link lists the runbooks that the feature deploys to your Automation account. Do NOT make changes to the runbook code. Instead, write your own runbook for new functionality.
Don't directly run any runbook with child appended to its name.
All parent runbooks include the WhatIf parameter. When set to True, the parameter supports detailing the exact behavior the runbook takes when run without the parameter and validates that the correct VMs are targeted. A runbook only performs its defined actions when the WhatIf parameter is set to False.
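The WhatIf behaviour is essentially a dry-run switch. A minimal sketch of that pattern, with a hypothetical `run_parent` function and `execute` callback standing in for the real runbook and the Start-AzVM / Stop-AzVM calls:

```python
def run_parent(vm_names, action, what_if=True, execute=None):
    """Return the planned (action, vm) pairs; only act when what_if is False."""
    planned = [(action, name) for name in vm_names]
    if what_if or execute is None:
        # Preview only: report what would happen, change nothing.
        return planned
    for planned_action, name in planned:
        execute(planned_action, name)
    return planned
```

Defaulting `what_if` to True in the sketch means a forgotten parameter results in a preview rather than an unintended stop.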
Main default runbooks:
ScheduledStartStop_Parent
SequencedStartStop_Parent
Variables (used by Runbooks)
The following table lists the variables created in your Automation account. Only modify variables prefixed with External. Modifying variables prefixed with Internal causes undesirable effects.
Main variables to use with your Runbooks:
External_Start_ResourceGroupNames: Comma-separated list of one or more resource groups that are targeted for start actions.
External_Stop_ResourceGroupNames: Comma-separated list of one or more resource groups that are targeted for stop actions.
External_ExcludeVMNames: Comma-separated list of VM names to exclude, limited to 140 VMs. If you add more than 140 VMs to the list, VMs specified for exclusion might be inadvertently started or stopped.
Schedules
Don't enable all schedules, because doing so might create overlapping schedule actions. It's best to determine which optimizations you want to do and modify them accordingly.
Scheduled-StopVM: Runs the ScheduledStartStop_Parent runbook with a parameter value of Stop every day at the specified time. Automatically stops all VMs that meet the rules defined by variable assets. Enable the related schedule Scheduled-StartVM.
Scheduled-StartVM: Runs the ScheduledStartStop_Parent runbook with a parameter value of Start every day at the specified time. Automatically starts all VMs that meet the rules defined by variable assets. Enable the related schedule Scheduled-StopVM.
Sequenced-StopVM: Runs the SequencedStartStop_Parent runbook with a parameter value of Stop every Friday at the specified time. Sequentially (ascending) stops all VMs with a tag of SequenceStop defined by the appropriate variables. For more information on tag values and asset variables, see Runbooks. Enable the related schedule, Sequenced-StartVM.
Sequenced-StartVM: Runs the SequencedStartStop_Parent runbook with a parameter value of Start every Monday at the specified time. Sequentially (descending) starts all VMs with a tag of SequenceStart defined by the appropriate variables. For more information on tag values and variable assets, see Runbooks. Enable the related schedule, Sequenced-StopVM.
How to enable and configure Start/Stop VMs during Off-hours.
Search for and select Automation Accounts.
On the Automation Accounts page, select your Automation account from the list.
From the Automation account, select Start/Stop VM under Related Resources. From here, you can click Learn more about and enable the solution. If you already have the feature deployed, you can click Manage the solution and find it in the list.
On the Start/Stop VMs during off-hours page for the selected deployment, review the summary information and then click Create.
With the resource created, the Add Solution page appears. You're prompted to configure the feature before you can import it into your Automation account.
On the Add Solution page, select Workspace. Select an existing Log Analytics workspace from the list. If there isn't an Automation account in the same supported region as the workspace, you can create a new Automation account in the next step.
On the Add Solution page, if there isn't an Automation account available in the same supported region as the workspace, select Automation account. You can create a new Automation account to associate with it by selecting Create an Automation account, and on the Add Automation account page, provide the name of the Automation account in the Name field.
All other options are automatically populated, based on the Log Analytics workspace selected. You can't modify these options. An Azure Run As account is the default authentication method for the runbooks included with the feature.
After you click OK, the configuration options are validated and the Automation account is created. You can track its progress under Notifications from the menu.
On the Add Solution page, select Configure parameters. The Parameters page appears.
Specify a value for the Target ResourceGroup Names field. The field defines group names that contain VMs for the feature to manage. You can enter more than one name and separate the names using commas (values are not case-sensitive). Using a wildcard is supported if you want to target VMs in all resource groups in the subscription. The values are stored in the External_Start_ResourceGroupNames and External_Stop_ResourceGroupNames variables.
The default value for Target ResourceGroup Names is a *. This setting
targets all VMs in a subscription. If you don't want the feature to
target all the VMs in your subscription, you must provide a list of
resource group names before selecting a schedule.
Specify a value for the VM Exclude List (string) field. This value is the name of one or more virtual machines from the target resource group. You can enter more than one name and separate the names using commas (values are not case-sensitive). Using a wildcard is supported. This value is stored in the External_ExcludeVMNames variable.
Use the Schedule field to select a schedule for VM management by the feature. Select a start date and time for your schedule to create a recurring daily schedule starting at the chosen time. Selecting a different region is not available. To configure the schedule to your specific time zone after configuring the feature, see Modify the startup and shutdown schedules.
To receive email notifications from an action group, accept the default value of Yes in the Email notifications field, and provide a valid email address. If you select No but decide at a later date that you want to receive email notifications, you can update the action group that is created with valid email addresses separated by commas. The following alert rules are created in the subscription:
AutoStop_VM_Child
Scheduled_StartStop_Parent
Sequenced_StartStop_Parent
After you have configured the initial settings required for the feature, click OK to close the Parameters page.
Click Create. After all settings are validated, the feature deploys to your subscription. This process can take several seconds to finish, and you can track its progress under Notifications from the menu.
Scenario 1: Start/Stop VMs on a schedule
This scenario is the default configuration when you first deploy Start/Stop VMs during off-hours. For example, you can configure the feature to stop all VMs across a subscription when you leave work in the evening, and start them in the morning when you are back in the office. When you configure the schedules Scheduled-StartVM and Scheduled-StopVM during deployment, they start and stop targeted VMs.
Configuring the feature to just stop VMs is supported. See Modify the startup and shutdown schedules to learn how to configure a custom schedule.
The time zone used by the feature is your current time zone when you
configure the schedule time parameter. However, Azure Automation
stores the schedule in UTC format. You don't have to do any
time zone conversion, as this is handled during machine deployment.
To control the VMs that are in scope, configure the variables: External_Start_ResourceGroupNames, External_Stop_ResourceGroupNames, and External_ExcludeVMNames.
You can enable either targeting the action against a subscription and resource group, or targeting a specific list of VMs, but not both.
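How the three variables combine can be sketched as below. This is only an illustration of the described rules (comma-separated lists, case-insensitive values, wildcard support), not the runbook's actual code; `split_list` and `vms_in_scope` are hypothetical helpers and VMs are modelled as simple (name, resource group) pairs.

```python
from fnmatch import fnmatchcase


def split_list(value):
    """Split a comma-separated variable into lowercase, trimmed entries."""
    return [item.strip().lower() for item in value.split(",") if item.strip()]


def vms_in_scope(vms, resource_group_names, exclude_vm_names=""):
    """Return VM names in the targeted groups that are not excluded.

    vms is an iterable of (vm_name, resource_group) pairs.
    """
    groups = split_list(resource_group_names)
    excluded = split_list(exclude_vm_names)
    selected = []
    for name, group in vms:
        in_group = any(fnmatchcase(group.lower(), g) for g in groups)
        is_excluded = any(fnmatchcase(name.lower(), e) for e in excluded)
        if in_group and not is_excluded:
            selected.append(name)
    return selected
```

For example, a resource group value of `*` with an exclude list of `web*` targets every VM in the subscription except those whose names start with "web".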
Target the start and stop action by VM list
Run the ScheduledStartStop_Parent runbook with ACTION set to start.
Add a comma-separated list of VMs (without spaces) in the VMList parameter field. An example list is vm1,vm2,vm3.
Set the WHATIF parameter field to True to preview your changes.
Configure the External_ExcludeVMNames variable with a comma-separated list of VMs (VM1,VM2,VM3), without spaces between comma-separated values.
This scenario does not honor the External_Start_ResourceGroupNames and External_Stop_ResourceGroupNames variables. For this scenario, you need to create your own Automation schedule. For details, see Schedule a runbook in Azure Automation.
Scenario 2: Start/Stop VMs in sequence by using tags
Target the start and stop actions against a subscription and resource group
Add a sequencestart and a sequencestop tag with positive integer values to VMs that are targeted in External_Start_ResourceGroupNames and External_Stop_ResourceGroupNames variables. The start and stop actions are performed in ascending order. To learn how to tag a VM, see Tag a Windows virtual machine in Azure and Tag a Linux virtual machine in Azure.
Modify the schedules Sequenced-StartVM and Sequenced-StopVM to the date and time that meet your requirements and enable the schedule.
Run the SequencedStartStop_Parent runbook with ACTION set to start and WHATIF set to True to preview your changes.
Preview the action and make any necessary changes before implementing against production VMs. When ready, manually execute the runbook with the parameter set to False, or let the Automation schedules Sequenced-StartVM and Sequenced-StopVM run automatically following your prescribed schedule.
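The tag-based ordering in this scenario can be pictured with a short sketch. It assumes, as stated above, that tag values are positive integers and that actions run in ascending order of the tag value; `sequence_order` is a hypothetical helper and VMs are modelled as a dict of name to tags.

```python
def sequence_order(vms, tag_name):
    """Return VM names ordered by the integer value of the given tag.

    vms maps VM name -> dict of tags. VMs without a valid positive
    integer tag value are skipped.
    """
    tagged = []
    for name, tags in vms.items():
        # Tag names are matched case-insensitively in this sketch.
        lowered = {key.lower(): value for key, value in tags.items()}
        value = lowered.get(tag_name.lower())
        if value is not None and str(value).isdigit() and int(value) > 0:
            tagged.append((int(value), name))
    return [name for _, name in sorted(tagged)]
```

Giving a database VM sequencestart=1 and sequencestop=3, and a web VM the reverse, starts the database first and stops it last.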
Scenario 3: Start or stop automatically based on CPU utilization
Start/Stop VMs during off-hours can help manage the cost of running Azure Resource Manager and classic VMs in your subscription by evaluating machines that aren't used during non-peak periods, such as after hours, and automatically shutting them down if processor utilization is less than a specified percentage.
By default, the feature is pre-configured to evaluate the percentage CPU metric to see if average utilization is 5 percent or less. This scenario is controlled by the following variables and can be modified if the default values don't meet your requirements:
External_AutoStop_MetricName
External_AutoStop_Threshold
External_AutoStop_TimeAggregationOperator
External_AutoStop_TimeWindow
External_AutoStop_Frequency
External_AutoStop_Severity
You can enable and target the action against a subscription and resource group, or target a specific list of VMs.
When you run the AutoStop_CreateAlert_Parent runbook, it verifies that the targeted subscription, resource group(s), and VMs exist. If the VMs exist, the runbook calls the AutoStop_CreateAlert_Child runbook for each VM verified by the parent runbook. This child runbook:
Creates a metric alert rule for each verified VM.
Triggers the AutoStop_VM_Child runbook for a particular VM if the CPU drops below the configured threshold for the specified time interval.
Attempts to stop the VM.
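The decision the metric alert makes can be approximated like this. It is a sketch of the rule as described (average CPU over the time window at or below the threshold, 5 percent by default); `should_autostop` is a hypothetical function, and the real evaluation is done by Azure Monitor, not by code you write.

```python
from statistics import mean


def should_autostop(cpu_samples, threshold=5.0):
    """Return True when average CPU over the window is at or below threshold.

    cpu_samples holds percentage CPU readings for one time window.
    An empty window is treated as "no data" and never triggers a stop.
    """
    if not cpu_samples:
        return False
    return mean(cpu_samples) <= threshold
```

In terms of the variables above, `threshold` plays the role of External_AutoStop_Threshold and the length of `cpu_samples` corresponds to External_AutoStop_TimeWindow.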
Target the autostop action against all VMs in a subscription
Ensure that the External_Stop_ResourceGroupNames variable is empty or set to * (wildcard).
[Optional] If you want to exclude some VMs from the autostop action, you can add a comma-separated list of VM names to the External_ExcludeVMNames variable.
Enable the Schedule_AutoStop_CreateAlert_Parent schedule to run to create the required Stop VM metric alert rules for all of the VMs in your subscription. Running this type of schedule lets you create new metric alert rules as new VMs are added to the subscription.
Target the autostop action against all VMs in a resource group or multiple resource groups
Add a comma-separated list of resource group names to the External_Stop_ResourceGroupNames variable.
If you want to exclude some of the VMs from the autostop, you can add a comma-separated list of VM names to the External_ExcludeVMNames variable.
Enable the Schedule_AutoStop_CreateAlert_Parent schedule to run to create the required Stop VM metric alert rules for all of the VMs in your resource groups. Running this operation on a schedule allows you to create new metric alert rules as new VMs are added to the resource group(s).
Target the autostop action to a list of VMs
Create a new schedule and link it to the AutoStop_CreateAlert_Parent runbook, adding a comma-separated list of VM names to the VMList parameter.
Optionally, if you want to exclude some VMs from the autostop action, you can add a comma-separated list of VM names (without spaces) to the External_ExcludeVMNames variable.
Configure email notifications
In the Azure portal, click on Alerts under Monitoring, then Manage actions. On the Manage actions page, make sure you're on the Action groups tab. Select the action group called StartStop_VM_Notification.
On the StartStop_VM_Notification page, the Basics section will be filled in for you and can't be edited, except for the Display name field. Edit the name, or accept the suggested name. In the Notifications section, click the pencil icon to edit the action details. This opens the Email/SMS message/Push/Voice pane. Update the email address and click OK to save your changes.
Add a VM
There are two ways to ensure that a VM is included when the feature runs:
Each of the parent runbooks of the feature has a VMList parameter. You can pass a comma-separated list of VM names (without spaces) to this parameter when scheduling the appropriate parent runbook for your situation, and these VMs will be included when the feature runs.
To select multiple VMs, set External_Start_ResourceGroupNames and External_Stop_ResourceGroupNames with the resource group names that contain the VMs you want to start or stop. You can also set the variables to a value of * to have the feature run against all resource groups in the subscription.
Exclude a VM
To exclude a VM from Start/Stop VMs during off-hours, you can add its name to the External_ExcludeVMNames variable. This variable is a comma-separated list of specific VMs (without spaces) to exclude from the feature. This list is limited to 140 VMs. If you add more than 140 VMs to this list, VMs that are set to be excluded might be inadvertently started or stopped.
Modify the startup and shutdown schedules
Managing the startup and shutdown schedules in this feature follows the same steps as outlined in Schedule a runbook in Azure Automation. Separate schedules are required to start and stop VMs.
Configuring the feature to just stop VMs at a certain time is supported. In this scenario you just create a stop schedule and no corresponding start schedule.
Ensure that you've added the resource groups for the VMs to shut down in the External_Stop_ResourceGroupNames variable.
Create your own schedule for the time when you want to shut down the VMs.
Navigate to the ScheduledStartStop_Parent runbook and click Schedule. This allows you to select the schedule you created in the preceding step.
Select Parameters and run settings and set the ACTION field to Stop.
Select OK to save your changes.
I believe an Automation runbook that starts and stops VMs on a fixed schedule doesn't fit the requirement of "unpredictable usage times". Using CPU usage would work for log-off (although, even if implemented properly, there would be a small, acceptable delay). For log-on I think you have a Catch-22. From what I understand, a user is generally tied to a WVD pool, and if that pool isn't available the user can't log in. So turning on the machine at log-on wouldn't work, because the machine has to be running before that user can log in at all.
Assuming your users are not tech savvy, giving them access to a Start/Stop runbook might not be practical. The only other option I can think of is an Automation runbook triggered by a custom callback using webhooks. See Azure Automation Webhooks. With a very simple web interface offering just two options (Start WVD, Stop WVD), users can trigger the actions themselves.
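As a rough idea of what the web interface would do behind its two buttons, here is a sketch that builds the POST request for an Automation webhook. The payload field names (`action`, `vmName`) are hypothetical; the runbook would have to read them from the request body it receives. The webhook URL is a secret generated by Azure when the webhook is created and is passed in rather than invented here.

```python
import json
import urllib.request


def build_webhook_request(webhook_url, action, vm_name):
    """Build (but do not send) the POST request that triggers the runbook."""
    if action not in ("Start", "Stop"):
        raise ValueError("action must be 'Start' or 'Stop'")
    # Hypothetical payload shape; the runbook decides how to interpret it.
    body = json.dumps({"action": action, "vmName": vm_name})
    return urllib.request.Request(
        webhook_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_webhook_request(url, "Start", "wvd-host-1"))`, where the VM name is just an example. Treat the URL like a password, since anyone who has it can trigger the runbook.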
If you want to explore other options like Azure Logic Apps, you get even more event listeners. For example, users could send an email with a specific keyword to an inbox that the Logic App monitors, which would then trigger the desired action.
I don't know if you're using Log Analytics for collecting Azure Activity Logs but if yes, a mix with the options above could do the trick.
Use Logic App to listen for an event or Runbooks with webhooks, to give users the flexibility to turn on the machine when they need.
Use Logic App and Log Analytics, to listen for a log out event from the machine or to monitor for CPU utilisation over a period of time and act on the result. See Azure Automation Solution for VM Management
I have to implement something similar myself and these are the options I have in mind. I wish I could provide more concrete examples with actual use cases. Will update this in the future when I get there.
EDIT: Did some digging and stumbled upon this repo, which provides more info, with screenshots and scripts, than the official Azure documentation, which seems very vague.
Also, since I mentioned Log Analytics earlier, here's a Log Analytics query you can use to find start/end connections in the past 24hrs.
WVDConnections
| where TimeGenerated > ago(24h)
| where State == "Connected"
| project CorrelationId , UserName , StartTime=TimeGenerated, SessionHostName
| join (WVDConnections
| where State == "Completed"
| project EndTime=TimeGenerated, CorrelationId)
on CorrelationId
| parse SessionHostName with Host "domain" //If applicable use parser to remove part of a value to clean it up.
| project StartTime, EndTime, UserName, Host
| sort by StartTime desc

How to depend on Terraform aws aws_autoscaling_group's ec2 instance state and status

I have a Terraform module that needs to depend on the resource aws_autoscaling_group ec2 "Instance State" and "Status Checks" (from the ec2 console) to go green before starting. How can this be done? Thanks.
In my understanding that's the default behavior of autoscaling_group. From the docs:
Terraform provides two mechanisms to help consistently manage ASG scale uptime across dependent resources.
The first is the default behavior. Terraform waits after ASG creation for min_size (or desired_capacity, if specified) healthy instances to show up in the ASG before continuing.
[...]
Terraform considers an instance "healthy" when the ASG reports HealthStatus: "Healthy" and LifecycleState: "InService"
Also, from the AWS docs:
When each instance is fully configured and passes the Amazon EC2 health checks, it is attached to the Auto Scaling group and it enters the InService state. The instance is counted against the desired capacity of the Auto Scaling group.
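The wait condition quoted above boils down to a simple check, sketched here for clarity. The `HealthStatus` and `LifecycleState` fields mirror what the ASG reports per instance; `healthy_count` and `capacity_reached` are hypothetical helper names, and this is not Terraform's actual implementation.

```python
def healthy_count(instances):
    """Count instances the ASG reports as Healthy and InService."""
    return sum(
        1
        for instance in instances
        if instance.get("HealthStatus") == "Healthy"
        and instance.get("LifecycleState") == "InService"
    )


def capacity_reached(instances, min_size, desired_capacity=None):
    """True once enough healthy instances exist for creation to continue."""
    # desired_capacity takes precedence over min_size when it is specified.
    target = desired_capacity if desired_capacity is not None else min_size
    return healthy_count(instances) >= target
```

So a module that depends on the ASG resource already starts only after this condition holds. Note that it reflects the ASG's own health view, which is based on EC2 status checks unless ELB health checks are configured.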

Using Packer to Spin a VM and extract the image in an availability set

We have a corporate requirement (due to pricing and whitelisting) to use Availability Sets in our Azure subscription, and resources such as Compute must be created inside that particular availability set. Since Packer spins up a temporary VM inside a temporary resource group while creating the image, I am confused (I did not find any documentation on this) about whether we can configure Packer to create the temporary VM inside the whitelisted availability set.
One possible way I can think of is to create the VM in the resource group that we created for the availability set (since everything in Azure needs to be inside a resource group). That way I am guessing it will be tracked as part of billing, but I am still not sure whether the temporary VM will be part of the availability set.
Please help and suggest whether there is an alternative way to achieve the same.
