I have an Azure Function App on an Elastic Premium App Service plan. The Function App is inside a VNet, and it is a Durable Functions app with some functions triggered by non-HTTP triggers.
I noticed that the plan is not scaling out properly. The minimum number of instances is 4 and the maximum is 100,
but when I check the number of instances over the last 24 hours in:
"Function App -> Diagnose and solve problems -> Availability and performance -> HTTP Functions scaling"
the number of instances never goes above 5 or 6,
which is not good, because when I check:
"Function App -> Diagnose and solve problems -> Availability and performance -> Linux Load Average"
I have the following message:
"High Load Average Detected. Shows Load Average information if system has Load Average > 4 times # of Cpu's in the timeframe. For more information see Additional Investigation notes below."
and the CPU and memory metrics also show some high spikes,
so it seems that my App Service plan is not able to scale out properly.
One of my colleagues suggested verifying that:
"Function App -> Configuration -> Function runtime settings -> Runtime Scale Monitoring"
is set to "on":
because if it is set to "off", the VNet may block Azure from inspecting our app, and as a result Azure does not spawn more instances because it cannot see the CPU load in real time.
But I don't understand how this setting can help me scale out.
Do you know what "Runtime Scale Monitoring" is used for and why it could help me scale out?
Also, do you think this is more a problem of scaling up than of scaling out?
I assume that you are not using HTTP triggers, but instead something like ServiceBus. Also I don't think there is any "scaling up" in Consumption or Premium plans.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-networking-options?tabs=azure-cli#premium-plan-with-virtual-network-triggers
When you enable virtual network trigger support, only the trigger types shown in the previous table scale dynamically with your application. You can still use triggers that aren't in the table, but they're not scaled beyond their pre-warmed instance count. For the complete list of triggers, see Triggers and bindings.
My understanding is that the setting will allow the scale controller to gain insight into what is triggering your function. So it is not scaling horizontally by looking at CPU usage, but rather by looking at the executed triggers to check if the messages are being processed fast enough.
https://learn.microsoft.com/en-us/azure/azure-functions/event-driven-scaling#runtime-scaling
For example, when you're using an Azure Queue storage trigger, it scales based on the queue length and the age of the oldest queue message.
If you disable the setting, the scale controller will not have access to the queue length, so you will not observe any scaling.
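To make the idea of trigger-based scaling concrete, here is a minimal sketch in Python. It is not the actual scale controller algorithm, which is internal to Azure. The `QueueSample` type, the `scale_vote` function, and the decision rules are all assumptions made up for illustration. The only point is that the decision uses queue length and message age, not CPU.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """Hypothetical snapshot of a queue trigger source."""
    length: int                # messages currently waiting
    oldest_age_seconds: float  # age of the oldest waiting message


def scale_vote(prev: QueueSample, curr: QueueSample) -> int:
    """Return +1 (add an instance), -1 (remove one) or 0 (hold).

    Assumed rule, for illustration only: scale out when the backlog is
    growing and messages are getting older, scale in when the queue is empty.
    """
    if curr.length == 0:
        return -1
    backlog_growing = curr.length > prev.length
    messages_aging = curr.oldest_age_seconds > prev.oldest_age_seconds
    if backlog_growing and messages_aging:
        return 1
    return 0
```

If the controller cannot read the samples at all (for example because the trigger source is only reachable inside the VNet), it has nothing to base such a decision on.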
FUNCTIONS_WORKER_PROCESS_COUNT: this scales up the number of function worker processes, but not the number of hosts running them.
If you set the value to X, each host instance runs up to X individual function worker processes concurrently.
In other words, this setting lets each host instance run more function worker processes concurrently, up to the specified value.
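As a rough illustration of the difference between the two dimensions, the arithmetic can be sketched like this. The function name is hypothetical, and the upper bound of 10 is what I understand the documented maximum for FUNCTIONS_WORKER_PROCESS_COUNT to be, so treat it as an assumption and check the app settings reference.

```python
def total_worker_processes(host_instances: int, worker_process_count: int) -> int:
    """Upper bound on worker processes across the whole plan.

    host_instances is controlled by scale-out (the scale controller);
    worker_process_count is the FUNCTIONS_WORKER_PROCESS_COUNT app setting.
    """
    # Assumption: the setting accepts values from 1 to 10.
    if not 1 <= worker_process_count <= 10:
        raise ValueError("FUNCTIONS_WORKER_PROCESS_COUNT is expected to be 1..10")
    if host_instances < 0:
        raise ValueError("host_instances must not be negative")
    return host_instances * worker_process_count
```

With 5 host instances and a value of 4, there would be at most 20 worker processes, but still only 5 hosts sharing the same CPU and memory per host.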
Enabling Runtime Scale Monitoring requires Contributor access on both the App Service plan and the Function App.
I believe these settings also need the number of pre-warmed instances to be set to at least 1.
For more information on how the runtime scale controller works, and on cost optimization when enabling the settings above, I found useful information in SO 1 and 2, provided by the users #HariKrishna and #HuryShen.
Updated Answer:
In Durable Functions, an orchestration can be started by an HTTP trigger, a timer, or any other event, and an orchestrator function can in turn call sub-orchestrations and activity functions.
To handle multiple orchestrator functions in parallel, set extensions > durableTask > maxConcurrentOrchestratorFunctions to a value X in the host.json file. Refer to the JonasW blog article for how scaling happens in Azure Durable Functions.
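For reference, a small Python sketch that generates such a host.json fragment. The values 5 and 10 are placeholders, not recommendations, and the helper name is hypothetical. maxConcurrentActivityFunctions is the sibling setting for activity functions and is included only to show where it would go.

```python
import json


def durable_host_json(max_orchestrators: int, max_activities: int) -> str:
    """Build a host.json document with Durable Functions concurrency limits."""
    config = {
        "version": "2.0",
        "extensions": {
            "durableTask": {
                # Per-instance cap on concurrently running orchestrator functions.
                "maxConcurrentOrchestratorFunctions": max_orchestrators,
                # Per-instance cap on concurrently running activity functions.
                "maxConcurrentActivityFunctions": max_activities,
            }
        },
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder values; tune them for your own workload.
    print(durable_host_json(5, 10))
```

Both limits apply per host instance, so they bound concurrency on each instance rather than across the whole plan.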
Related
We're looking for an automated way to horizontally and vertically scale the pool of self-hosted integration runtime virtual machines used in ADF.
Reading the Microsoft docs does not provide an answer.
Well, I don't have the experience, so I can only give you a theoretical answer, but maybe it's helpful for you.
AFAIK, neither way is configurable out of the box. For scale-out you'll have to deploy an additional IR machine yourself. So you'll probably want to create an image that you can provision from Docker or Kubernetes and that has the IR and its prerequisites installed. The IR installation provides a PowerShell script that can be used to automate the connection.
For scale-up/down, you'll have to run some script that scales your VM. In an IaaS solution (for example, an Azure VM), that should be doable with an API call to change your VM.
In both cases you'll need some kind of monitor in place that monitors the IR load and makes changes as needed. I think the metrics provided in Data Factory should do. Maybe you can use Log Analytics to monitor the load.
I'm curious about your use case for this.
My solution is just for scaling out/in, since the VM must be restarted if you scale up/down, which causes downtime, job failures, etc.
At a high level this solution requires just 3 simple things:
Azure Metric Alert that fires when Scale-Out should occur (VM Start)
Azure Metric Alert that fires when Scale-In should occur (VM Deallocation)
Logic App that is triggered by the Azure Alert and actually executes the Start/Stop of the VM, along with any other automation associated with this (e.g. posting to a Teams channel when Scale in/out occurs)
Here are more details on how we set up the conditions for the alerts. The main metrics to keep in mind are IR CPU %, IR queue length, number of nodes, and possibly IR memory.
Scale-Out
Scale-In
Actions for Alerts
We have the alerts triggering a single Logic App. Using the payload that is passed to the Logic App, you can determine whether the Logic App should start or stop the VM (as well as perform any additional actions).
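The branching inside the Logic App can be sketched in Python as follows. This assumes the action group sends the Azure Monitor common alert schema (with `data.essentials.alertRule` and `data.essentials.monitorCondition`). The rule names containing "scale-out" and "scale-in" are hypothetical naming conventions, not something from the original setup.

```python
def decide_action(payload: dict) -> str:
    """Map an alert payload to 'start', 'deallocate' or 'none'.

    Assumes the common alert schema and that the alert rules are named so
    that they contain 'scale-out' or 'scale-in' (hypothetical convention).
    """
    essentials = payload.get("data", {}).get("essentials", {})
    # Ignore "Resolved" notifications; act only when the alert fires.
    if essentials.get("monitorCondition") != "Fired":
        return "none"
    rule_name = essentials.get("alertRule", "").lower()
    if "scale-out" in rule_name:
        return "start"
    if "scale-in" in rule_name:
        return "deallocate"
    return "none"
```

In the Logic App itself this would be a condition or switch on the same fields, followed by the VM start or deallocate action.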
Logic App
There is a small chance that, due to timing (and depending on how many ADFs the IR is shared with), pipeline activities could be sent to Node 2 at the same time a deallocation command is sent to the VM for Node 2. I have not seen this yet, but adjusting the alert conditions based on your needs could help avoid it. Feel free to play around with the conditions of the alerts, granularity, thresholds, etc. This is not a one-size-fits-all solution.
Is it possible to set a cap on the number of servers that Azure Functions scales out to? I have a Consumption plan, and basically I would like to cap the amount of resources that Azure Functions can use.
The only solutions I found are:
Set a cap on the daily GB/sec threshold, after which the functions are stopped until the following day. This is definitely something I do not want, because I need some functions for online tasks.
In host.json, change the http.maxConcurrentRequests and http.maxOutstandingRequests parameters, which affect the number of functions running concurrently. Is this what I should look into? Isn't this setting per server? My fear is that this won't end up capping resources, but will instead let Azure create more and more servers in order to keep up with the request load.
You can use the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting: The maximum number of instances that the function app can scale out to. Default is no limit.
Note: This setting is a preview feature - and only reliable if set to a value <= 5
Ref: https://learn.microsoft.com/en-us/azure/azure-functions/functions-app-settings#websitemaxdynamicapplicationscaleout
One thing to note is that timer-triggered functions are automatically singletons. In my case that was sufficient, as I can wake up such a function every minute and process a specific amount of data. Even if the function takes longer than expected, there's no risk that a second one will run concurrently.
More info: https://stackoverflow.com/a/53919048/4619705
I have an azure cloud service which scales instances out and in. This works fine using some app insights metrics to manage the auto-scaling rules.
The issue comes when it scales in and Azure eliminates hosts: is there a way for it to scale in an instance only once that instance is done processing its task?
There is no way to do this automatically. Azure will always scale in the highest-numbered instance.
The ideal solution is to make the work idempotent and chunked so that if an instance that was doing some set of work is interrupted (scaling in, VM reboot, power loss, etc), then another instance can pick up the work where it left off. This lets you recover from a lot of possible scenarios such as power loss, instead of just trying to design something specific for scale in.
Having said that, you can manually create a scaling solution that only removes instances that are not doing work, but doing so will require a fair bit of code on your part. Essentially you will use a signaling mechanism running in each instance that will let some external service (a Logic app or WebJob or something like that) know when an instance is free or busy, and that external service can delete the free instances using the Delete Role Instances API (https://learn.microsoft.com/en-us/rest/api/compute/cloudservices/rest-delete-role-instances).
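The selection step of such an external service could look roughly like this in Python. The busy/free map is assumed to come from whatever signaling mechanism you build (a table, a queue, a heartbeat), and the instance names are hypothetical. The actual call to the Delete Role Instances API is left out.

```python
def instances_to_delete(busy_by_instance: dict[str, bool], excess: int) -> list[str]:
    """Pick up to `excess` instances that currently report themselves as free.

    busy_by_instance maps an instance name to True when that instance has
    signaled that it is processing work.
    """
    free = sorted(name for name, busy in busy_by_instance.items() if not busy)
    return free[:max(excess, 0)]
```

Note that an instance can become busy between the check and the delete call, so the work should still be idempotent as described above.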
For more discussion on this topic see:
How to Stop single Instance/VM of WebRole/WorkerRole
Azure autoscale scale in kills in use instances
Another solution, though it breaks the assumption that you are using an Azure cloud service: if you use App Service instead of a cloud service, you can set up autoscaling on the App Service plan, which effectively takes care of the instance drop you are experiencing.
This is an infrastructure change, so it's not a two-click thing, but I believe App Service is better suited in many situations, including this one.
You can look at the pros and cons, but if your product is traffic managed, this switch will not be painful.
Kwill, thanks for the links/information, the top item in the second link was the best compromise.
The process work length was usually under 5 minutes and the service already had re-handling of failed processes, so after some research it was decided to track state of when the service was processing a queue item and use a while loop in the RoleEnvironment.Stopping event to delay restart and scale-in events until the process had a chance to finish.
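The pattern itself, stripped of the Cloud Services API, can be sketched in Python. The real implementation lived in the RoleEnvironment.Stopping handler in .NET. The class and method names here are hypothetical, and the timeout is an assumption added so that shutdown is never delayed indefinitely.

```python
import threading
import time


class WorkTracker:
    """Tracks whether a queue item is currently being processed."""

    def __init__(self) -> None:
        self._busy = threading.Event()

    def begin_item(self) -> None:
        self._busy.set()

    def end_item(self) -> None:
        self._busy.clear()

    def on_stopping(self, max_wait_seconds: float, poll_seconds: float = 0.05) -> bool:
        """Delay shutdown while an item is in flight.

        Returns True if the item finished before the deadline, False if the
        wait timed out and the item will be interrupted.
        """
        deadline = time.monotonic() + max_wait_seconds
        while self._busy.is_set() and time.monotonic() < deadline:
            time.sleep(poll_seconds)
        return not self._busy.is_set()
```

The boolean result is the kind of thing that could be logged as a custom event to see how often processing completes during the delay.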
App Insights was used to track custom events during the on stopping event to track how often it completes vs restarts during the delay cycles.
I have an Azure Function that calls a web service hosted on a VM. Because of this dependency, it is important that no more than 8 instances of the Azure Function are ever running concurrently (the VM only has 8 cores, and each call to the web service uses a single core). If more than 8 instances are spawned, the web service calls will start to back up, items from the queue that triggers the Azure Function will start timing out, and messages will be dropped. However, I also want to use the available resources as much as possible to maximize processing of the queue, so I'd like there to always be 8 instances of the Azure Function running whenever there are 8 or more items in the queue.
In order to accomplish the required throttling I have set the Azure Function plan to run on an App Service plan instead of a Consumption plan and I have set the following values in the host.json for the Azure Function service:
{
  "queues": {
    "batchSize": 1,
    "newBatchThreshold": 7
  }
}
Theoretically this makes it so that as long as 7 or fewer instances of the function are running then 1 more message will be dequeued until there are 8 messages being processed. I have this running right now and it seems to be working, but I can't find anywhere in the host.json documentation or the WebJobs SDK documentation that says whether or not it is okay for batchSize to be less than newBatchThreshold. I only know that the recommendation is for newBatchThreshold to be half of batchSize (which is clearly not what I am doing).
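The reasoning above can be written down as a small Python sketch. It assumes the documented semantics that the runtime fetches another batch whenever the number of messages being processed drops to newBatchThreshold or below, which gives a per-instance ceiling of batchSize plus newBatchThreshold. The function names are hypothetical.

```python
def should_fetch(in_flight: int, new_batch_threshold: int) -> bool:
    """True when the runtime would retrieve another batch from the queue."""
    return in_flight <= new_batch_threshold


def max_in_flight(batch_size: int, new_batch_threshold: int) -> int:
    """Per-instance ceiling on messages being processed at the same time.

    Worst case: exactly new_batch_threshold messages are in flight when a
    full batch of batch_size messages is fetched.
    """
    return batch_size + new_batch_threshold
```

With batchSize 1 and newBatchThreshold 7 the ceiling is 8 per instance, which matches the 8 cores of the VM as long as only one instance is running.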
My question is: Is this configuration okay? Or is there a better way to accomplish my throttling goals?
Yes, that is perfectly OK. Each flag has precise (albeit kind of confusing) semantics, and any combination is valid and will honor the documented semantics.
Be aware that when using Azure Functions on the Consumption plan, you may end up getting scaled out to multiple instances, each of which has those limits. If you want to avoid that, try setting WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT to 1, per https://github.com/Azure/azure-webjobs-sdk-script/wiki/Configuration-Settings
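To see why the instance cap matters, here is the arithmetic as a Python sketch. The function name is hypothetical, and the instance counts are example inputs, not measurements.

```python
def total_queue_concurrency(instances: int, batch_size: int, new_batch_threshold: int) -> int:
    """Upper bound on messages processed at once across all instances.

    The host.json queue limits apply per instance, so the bound grows
    linearly with the number of instances.
    """
    return instances * (batch_size + new_batch_threshold)
```

With the question's settings, one instance gives a bound of 8, while three instances would allow up to 24 concurrent calls against the 8-core VM.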
We're trying to test scalability of Azure functions (it's a bear). We came across this https://azure.microsoft.com/en-in/documentation/articles/functions-reference/#parallel-execution
If a function app is using the Dynamic Service Plan, the function app
could scale out automatically up to 10 concurrent instances. Each
instance of the function app, whether the app runs on the Dynamic
Service Plan or a regular App Service Plan
Does this mean that the maximum scalability of a single function is just 10? We've never been able to get over 10 units running... (A previous question was about the algorithm that determines when another consumption unit is added; this one is about the upper end of scalability.)
Thanks
UPDATE: There is no official maximum number of instances. We see customers who are able to scale out to hundreds. The number you achieve depends mostly on your workload, but partly on the region you're running in (some regions have more capacity than others). The 10 instance limit mentioned in previous versions of the docs has been removed.
You can find more information about our consumption plan and scaling here: https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#how-the-consumption-plan-works
Also note that each instance in Azure Functions can run multiple function executions in parallel. For example, if you have a function app which has a single function that runs quickly, you could expect to see dozens or even hundreds of concurrent executions on a single instance. This is unlike other services such as AWS Lambda which only execute a single function at a time per instance. New instances are added only when the system decides that the current number of instances is insufficient to handle the current load (more details on that in my answer to your other question).