Azure Functions concurrency: maxConcurrentRequests vs FUNCTIONS_WORKER_PROCESS_COUNT - multithreading

How can I see the difference between maxConcurrentRequests and FUNCTIONS_WORKER_PROCESS_COUNT in terms of concurrency and limits for Azure Functions?
Some definitions can be found at https://learn.microsoft.com/en-us/azure/azure-functions/functions-best-practices; I have pasted them below:
Use multiple worker processes
By default, any host instance for Functions uses a single worker process. To improve performance, especially with single-threaded runtimes like Python, use the FUNCTIONS_WORKER_PROCESS_COUNT to increase the number of worker processes per host (up to 10). Azure Functions then tries to evenly distribute simultaneous function invocations across these workers.
The FUNCTIONS_WORKER_PROCESS_COUNT applies to each host that Functions creates when scaling out your application to meet demand.
Configure host behaviors to better handle concurrency
The host.json file in the function app allows for configuration of host runtime and trigger behaviors. In addition to batching behaviors, you can manage concurrency for a number of triggers. Often adjusting the values in these options can help each instance scale appropriately for the demands of the invoked functions.
Settings in the host.json file apply across all functions within the app, within a single instance of the function. For example, if you had a function app with two HTTP functions and maxConcurrentRequests set to 25, a request to either HTTP trigger would count towards the shared 25 concurrent requests. When that function app is scaled to 10 instances, the ten functions effectively allow 250 concurrent requests (10 instances * 25 concurrent requests per instance).
Other host configuration options are found in the host.json configuration article https://learn.microsoft.com/en-us/azure/azure-functions/functions-host-json.
{
  "extensions": {
    "http": {
      "routePrefix": "api",
      "maxOutstandingRequests": 200,
      "maxConcurrentRequests": 100,
      "dynamicThrottlesEnabled": true,
      "hsts": {
        "isEnabled": true,
        "maxAge": "10"
      },
      "customHeaders": {
        "X-Content-Type-Options": "nosniff"
      }
    }
  }
}
EDIT:
It's not related to the question, but it's good to know too:
From https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=csharp#limits
Limits
The HTTP request length is limited to 100 MB (104,857,600 bytes), and the URL length is limited to 4 KB (4,096 bytes). These limits are specified by the httpRuntime element of the runtime's Web.config file.
If a function that uses the HTTP trigger doesn't complete within 230 seconds, the Azure Load Balancer will time out and return an HTTP 502 error. The function will continue running but will be unable to return an HTTP response. For long-running functions, we recommend that you follow async patterns and return a location where you can ping the status of the request. For information about how long a function can run, see Scale and hosting - Consumption plan.
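For long-running work, a rough sketch of the recommended pattern looks like the following (my own illustration, assuming the Python v1 programming model; the status endpoint and the background hand-off are hypothetical and would be backed by a queue or Durable Functions):

import uuid
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    job_id = str(uuid.uuid4())
    # Hand the actual work off elsewhere (hypothetical; e.g. enqueue a message
    # for a queue-triggered function or start a Durable Functions orchestration).
    # start_background_job(job_id)
    return func.HttpResponse(
        body="Accepted",
        status_code=202,
        headers={"Location": f"/api/status/{job_id}"},  # client polls this URL
    )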

How can I see the difference between maxConcurrentRequests and FUNCTIONS_WORKER_PROCESS_COUNT in terms of concurrency and limits for Azure Functions?
Both parameters work independently.
Both work at host level. Not per function, not per function app, but per host.
That is, maxConcurrentRequests is enforced across all function executions within a host, irrespective of FUNCTIONS_WORKER_PROCESS_COUNT.
A few things, in case it's not clear:
If Azure decides that your App needs to scale and creates a new host, and say there are two hosts, then the values of these params are applied per host, not across hosts.
If your App has multiple Functions, then maxConcurrentRequests applies to all Functions within that host, not per Function.
Default and allowed values vary by plan and language runtime (see the configuration sketch after this list):
maxConcurrentRequests: The default for a Consumption plan is 100. The default for a Dedicated plan is unbounded (-1).
FUNCTIONS_WORKER_PROCESS_COUNT: ...default value of 1. The maximum value allowed is 10. ... This setting applies to all non-.NET languages.
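To make the two knobs concrete, here is a minimal sketch of where each one lives (my own example values, assuming a Python app). maxConcurrentRequests goes in host.json:

{
  "version": "2.0",
  "extensions": {
    "http": {
      "maxConcurrentRequests": 100
    }
  }
}

FUNCTIONS_WORKER_PROCESS_COUNT is an application setting, shown here in local.settings.json for local development (in Azure, set it as an application setting on the Function App):

{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "FUNCTIONS_WORKER_PROCESS_COUNT": "4"
  }
}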
Can there be more than one function app on a single host? -- Despicable me
I'm not sure of the relation between App and Host; I think it's one-to-one.
maxConcurrentRequests = 100: does this really mean that all 100 requests will be processed in parallel by a single host (Consumption plan, 1 core, 1.5 GB host)? -- Despicable me
Read this description of how scaling works. If the average CPU and memory usage of your Function instance per request is low, it might execute them "in parallel" (of course hosts don't have 100 cores, so some scheduling happens at that level). But if the average resource consumption per request is high, it will automatically scale out: new hosts are spawned with your app, and after the usual cold-start delay your app can process more requests in parallel.

According to the description, it seems the two are similar but not exactly the same. Here is another document which describes the FUNCTIONS_WORKER_PROCESS_COUNT application setting, for your reference.
If you want to know more details about the two properties, you can create an Azure support request in the Azure portal and ask the Azure support team about this question. They can give you the most authoritative explanation.

Related

Ktor, Netty and increasing the number of threads per endpoint

We're using Ktor and Kotlin 1.5 to implement a REST service backed by Netty. A couple of things about this service:
"Work" takes a non-trivial amount of time to complete.
A unique client endpoint sends multiple requests in parallel to this service.
There are only a handful of unique client endpoints.
The service is not scaling as expected. We ran a load test with parallel requests coming from a single client and noticed that we only have two threads on the server actually processing the requests. It's not a resource starvation problem - there is plenty of network, memory, CPU, etc. - and it doesn't matter how many requests we fire in parallel: it's always two threads keeping busy while the others sit idle.
Is there a parameter we can configure to increase the number of threads available to process requests for specific endpoints?
Netty uses what is called a non-blocking I/O model (http://tutorials.jenkov.com/java-concurrency/single-threaded-concurrency.html).
In this model a single thread can handle many requests concurrently, as long as you follow best practices (don't block the event-loop thread).
You might want to check the following engine configuration options for Netty: https://ktor.io/docs/engines.html#configure-engine
connectionGroupSize = x
workerGroupSize = y
callGroupSize = z
The default values are usually set rather low, and tweaking them could help with time-consuming 'work'. The exact values will vary depending on the available resources.

Azure Functions concurrency: maxConcurrentRequests - is it truly parallel, with simultaneous execution of all requests happening at the same time?

Here in this thread https://stackoverflow.com/a/66163971/6514559 it is explained that:
If Azure decides that your App needs to scale and creates a new host, and say there are two hosts, then the values of these params (maxConcurrentRequests, FUNCTIONS_WORKER_PROCESS_COUNT) are applied per host, not across hosts.
If your App has multiple Functions, then maxConcurrentRequests applies to all Functions within this host, not per Function.
The questions are:
Is it possible to have more than one function app on a single host? (Is this what is controlled by FUNCTIONS_WORKER_PROCESS_COUNT?)
maxConcurrentRequests = 100: does this really mean that all 100 requests will be processed in parallel (simultaneously) by a single host (Consumption plan, 1 CPU, 1.5 GB host)? This thread here suspects everything is executed in series?!
Since each instance of the Functions host in the Consumption plan is limited to 1.5 GB of memory and one CPU (Reference), how can it run parallel loads with one CPU? On a different note, this does say ACU per instance is 100 for the Consumption plan.
See this, this and this. And the OP already read it, but this too, for completeness.
Is it possible to have more than one function app on a single host
The documentation is very confusing. AFAIK:
On the Consumption plan, no.
On a Premium/App Service plan, there is a hint that might mean the relation is one host to many apps, but IMO it's debatable.
(Is this what is controlled by FUNCTIONS_WORKER_PROCESS_COUNT?)
NO.
Need to understand the terms:
Function App: Top-level Azure resource; a logical collection of Functions.
Function: One Function with in/out trigger/binding(s). One Function App contains one or more Functions.
Function Host: The virtual/physical host where a Function App runs as a Linux/Windows process.
Worker Process: One process (one pid) running on a Function Host.
One Worker Process hosts all Functions of one Function App.
One Host will have FUNCTIONS_WORKER_PROCESS_COUNT (default 1) Worker Processes running on it, sharing all of the host's resources (RAM, CPU, ...).
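For example (my own numbers): one Consumption-plan host (1 CPU, 1.5 GB) running one Function App with FUNCTIONS_WORKER_PROCESS_COUNT = 4 has four worker processes sharing that CPU and RAM, while an http maxConcurrentRequests of 100 caps admitted HTTP requests at 100 for the host as a whole, spread across those four processes.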
maxConcurrentRequests = 100: does this really mean that all 100 requests will be processed in parallel (simultaneously) by a single host (Consumption plan, 1 CPU, 1.5 GB host)?
Discounting cold start problems, execution would be in parallel, within the limits of the selected plan.
This thread here suspects everything is executed in series?!
I'm sure there is an explanation. There is unambiguous documentation that says requests do get executed in parallel, within limits.

Azure Python function timeout

I have an Azure Python HTTP-triggered function that needs to execute dynamic code. If 100 users execute dynamic code simultaneously and even one user has bad code (an infinite loop), other valid requests fail. Is there a way in Azure to run each HTTP invocation in its own instance so other API requests are not impacted, or to programmatically terminate an invalid request?
I tried functionTimeout in host.json, but it terminates both the invalid request and other valid requests that are processing simultaneously.
Thanks
This behavior could be due to Python's single-threaded architecture. It is expected.
How to handle such scenarios is documented in the Python Functions developer reference: https://learn.microsoft.com/en-us/azure/azure-functions/functions-reference-python#scaling-and-concurrency
Here are the two methods to handle this:
Use async calls.
Add more language worker processes per host. This can be done with the application setting FUNCTIONS_WORKER_PROCESS_COUNT, up to a maximum value of 10. (For the CPU-bound workload you are simulating with loops, we recommend setting FUNCTIONS_WORKER_PROCESS_COUNT to a higher number to parallelize the work given to a single instance; see the sketch below.)
(Please note that each new language worker is spawned every 10 seconds until they are all warmed up.)
Here is a GitHub issue that discusses this in detail: https://github.com/Azure/azure-functions-python-worker/issues/236
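To make the async option concrete, here is a minimal sketch of an async HTTP-triggered function (my own example, assuming the Python v1 programming model; the awaited sleep stands in for real I/O-bound work):

import asyncio
import azure.functions as func

async def main(req: func.HttpRequest) -> func.HttpResponse:
    # Awaiting I/O-bound work yields control back to the worker's event loop,
    # so other invocations on the same worker process can make progress.
    await asyncio.sleep(1)  # stand-in for an awaitable call (HTTP, DB, ...)
    return func.HttpResponse("done")

Note that async only helps I/O-bound work; a CPU-bound infinite loop still blocks its worker process, which is why raising FUNCTIONS_WORKER_PROCESS_COUNT is the relevant knob for that case.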

Bursts of Redis errors

We've recently created a new Standard 1 GB Azure Redis cache specifically for distributed locking, separated from our main Redis cache. This was done to improve stability on our main Redis cache, a very long-term issue that this action seems to have significantly helped with.
On our new cache, we observe bursts of ~100 errors within the same few seconds every 1 - 3 days. The errors are either:
No connection is available to service this operation (StackExchange.Redis error)
Or:
Could not acquire distributed lock: Conflicted (RedLock.net error)
As they are errors from different packages, I suspect the Redis cache itself is the problem here. None of the stats during this time look out of the ordinary and the workload should fit comfortably in the Standard 1GB size.
I'm guessing this could be caused by the advertised low network performance; is this likely the cause?
Your theory sounds plausible.
Checking for insufficient network bandwidth
Here is a handy table showing the maximum observed bandwidth for various pricing tiers. Take a look at the observed maximum bandwidth for your SKU, then head over to your Redis blade in the Azure Portal and choose Metrics. Set the aggregation to Max, and look at the sum of cache read and cache write. This is your total bandwidth consumed. Overlay the sum of these two against the time period when you're experiencing the errors, and see if the problem is network throughput. If that's the case, scale up.
Checking server load
Also on the Metrics tab, take a look at server load. This is the percentage of time that Redis is busy and unable to process requests. If you hit 100%, Redis cannot respond to new requests and you will experience timeout issues. If that's the case, scale up.
Reusing ConnectionMultiplexer
You can also run out of connections to a Redis server if you're spinning up a new instance of StackExchange.Redis.ConnectionMultiplexer per request instead of reusing a single shared one. The service limits for the number of connections available for your SKU are on the pricing page. You can check whether you're exceeding the maximum allowed connections for your SKU on the Metrics tab: select Max aggregation and choose Connected Clients as your metric.
Thread Exhaustion
This doesn't sound like your error, but I'll include it for completeness in this rogues' gallery of Redis issues; it comes into play with Azure Web Apps. By default, the thread pool starts with four threads that can be immediately allocated to work. When you need more than four threads, they're doled out at a rate of one thread per 500 ms. So if you dump a ton of requests on a Web App in a short period of time, you can end up queuing work and eventually having requests dropped before they even get to Redis. To test whether this is a problem, go to Metrics for your Web App, choose Threads, and set the aggregation to Max. If you see a huge spike in a short period of time that corresponds with your trouble, you've found a culprit. Resolutions include making proper use of async/await, and if that gets you no further, using ThreadPool.SetMinThreads to set a higher minimum, preferably one close to or above the maximum thread usage you see in your bursts.
Rob has some great suggestions, but I did want to add information on troubleshooting traffic bursts and poor ThreadPool settings. Please see: Troubleshoot Azure Cache for Redis client-side issues
Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.
Monitor how your ThreadPool statistics change over time using an example ThreadPoolLogger. You can use TimeoutException messages from StackExchange.Redis like below to further investigate:
System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)
Notice that in the IOCP section and the WORKER section you have a Busy value that is greater than the Min value. This difference means your ThreadPool settings need adjusting.
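As a rough worked example from the log above (my own arithmetic): WORKER shows Busy=7 against Min=2, so five threads had to be created on demand; at the growth rate of roughly one new thread per 500 ms described in the previous answer, that is up to about 2.5 seconds during which incoming work queues up before enough threads exist.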
You can also see in: 64221. This value indicates that 64,221 bytes have been received at the client's kernel socket layer but haven't been read by the application. This difference typically means that your application (for example, StackExchange.Redis) isn't reading data from the network as quickly as the server is sending it.
You can configure your ThreadPool Settings to make sure that your thread pool scales up quickly under burst scenarios.
I hope you find this additional information helpful.

Azure Functions EventHub trigger scale job function instances

I have an Azure Function with an EventHub trigger, on a Consumption plan. In my test I send 3000 events to the event hub in a few batches. Since the time for those 3000 events was almost 10 times longer than the time for 300 events, I suspected that this Azure Function didn't scale out to multiple VMs/instances.
To verify that hypothesis, I used a Guid static variable, which I initialized once and logged in every run of the function. All 3000 runs logged the same Guid.
That happens even if I specify the following configuration in host.json:
"eventHub": {
"maxBatchSize": 1,
"prefetchCount": 10
}
The logic was that this would limit parallel processing within a single instance, and multiple instances would be started because of that, but again only one Guid is logged.
As a note, this is not the only function in the App Service. Could that be the issue? What is the condition that needs to be satisfied so that the Function is started on multiple VMs?
Edit:
I have 32 partitions and 20 throughput units. The first issue was that I was using SendBatchAsync, which doesn't partition events. Even SendAsync didn't bring any scale, as if it wasn't partitioning. So I created partitioned event hub senders and did round-robin partitioning when sending events in the client application.
That increased the number of events processed by the Azure Function, but still didn't create more than 1 VM.
Furthermore, the number of events processed per second was much higher in the beginning (~200 at any moment), and after 2000 events, near the end, it dropped to ~5. This has nothing to do with system load, as the same behavior was observed with 9000 events, where the slowdown happened after ~5k events.
This Azure function lasts 50-250 ms, depending on the load.
It also sends an event to another Azure Function through an Azure Storage Queue trigger. Interestingly, that queue-triggered function doesn't scale to more than 1 VM either, and it has ~1k messages in the queue at the beginning, before the event-hub-triggered function slows down. Queue settings in host.json are:
"queues": {
  "maxPollingInterval": 2000,
  "visibilityTimeout": "00:00:10",
  "batchSize": 32,
  "maxDequeueCount": 5,
  "newBatchThreshold": 1
}
Thanks.
It depends on a few factors:
the number of partitions your event hub has and whether the events you are writing are being distributed across your partitions. Azure Functions uses Event Processor Host to process your workload, and the maximum scale you can get in this mode is one VM per partition (so with 32 partitions, at most 32 VMs).
the per-event workload you're executing. For example if your function does nothing but log, those 3000 events could be processed in less than 5 seconds on a single VM. This would not warrant scaling your application onto multiple instances.
However, if you're writing a batch of events across several partitions that takes several minutes in total to process, and you don't see your throughput accelerating as your function scales out, that could indicate that something is not working right and would warrant further investigation.
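As an aside, the round-robin partition assignment described in the question's edit might look like the following (a hedged sketch using the Python azure-eventhub v5 SDK; the original question likely used the .NET SDK, and the connection string, hub name, and payloads are placeholders):

from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    "<connection-string>", eventhub_name="<hub-name>")

payloads = [b"event-%d" % i for i in range(3000)]  # stand-in events

with producer:
    partitions = producer.get_partition_ids()
    for i, payload in enumerate(payloads):
        # Pin each batch to an explicit partition, cycling through all of them,
        # so events spread evenly and the consumer side can scale per partition.
        batch = producer.create_batch(partition_id=partitions[i % len(partitions)])
        batch.add(EventData(payload))
        producer.send_batch(batch)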
