I have some kind of that code in my azure function:
foreach (var ins in instances)
{
TaskList.Add(ins.DeallocateAsync());
}
Task.WaitAll(TaskList.ToArray());
And I'm wondering how many parallel tasks can be spawned inside the one azure function? Is the host tolerant for such model or it will kill subprocess?
#Gaploid I recommend to use durable functions in such scenarios which maintains yours state for the such long running operations.
Also, as MS recommends, azure functions should be less time consuming operations because if you're running your azure function in consumption plan then you may found it a bit unresponsiveness for your calls as Azure functions cold starts after an specific idle time. So when it restarts your lengthy process would be more complicated to boot up.
More importantly, if you use orchestration aka durable functions, you'll found yourself in saving some money as MS doesn't charges you for you background tasks.
specifically, you can run as much as threads/tasks in case you're using durable one as preferred by MSFT.
Related
I'm running an Azure Function under the Consumption Plan. The Host Health Monitor is shutting it down every week or two due to hitting the Threads thresholds. I've read most of what I can find about this (e.g., starting with https://aka.ms/functions-thresholds) and understand what the health monitor is doing. Most of the articles I've read discuss the Connections threshold, not Threads. What I'm unable to figure out is how to troubleshoot this deeper and determine which part of my code is causing the threading issue. I'm not using any explicit TPL code. I'm doing a lot of very basic/standard async await operations against a REST API using a static HttpClient and standard crud operations against CosmosDB using a static DocumentClient. The only two Task-related methods I'm calling are async and await. I'm not using any retry patterns in this function. Application Insights doesn't give me much in the way of telling me what calls are creating threads, what threads still open or abandoned, etc.
Any suggestions on how I can troubleshoot things deeper? I've opened a support ticket with Microsoft and am waiting on their help as well.
Thanks in advance, Tom
I believe there are a couple of things to consider like
The number of functions in your function app
The number of async operations (2 in your case)
The number of parallel executions at a moment
Functions in the Consumption Plan scale based on the number of events (uses heuristics) that are triggered (doc) up to 200 instances without a limit on concurrent executions in a single instance.
So, even if you have async/await code, multiple long-running concurrent executions could cause problems I believe. Also, what triggers you use and at what rate they are triggered would influence the scale out.
To workaround the problem you could
Split the function app into multiple function apps if possible
Use Durable Functions which should allow you to scale your long running operations, if any
I have a service bus trigger function that when receiving a message from the queue will do a simple db call, and then send out emails/sms. Can I run > 1000 calls in my service bus queue to trigger a function to run simultaneously without the run time being affected?
My concern is that I queue up 1000+ messages to trigger my function all at the same time, say 5:00PM to send out emails/sms. If they end up running later because there is so many running threads the users receiving the emails/sms don't get them until 1 hour after the designated time!
Is this a concern and if so is there a remedy?
FYI - I know I can make the function run asynchronously, would that make any difference in this scenario?
1000 messages is not a big number. If your e-mail/sms service can handle them fast, the whole batch will be gone relatively quickly. Few things to know though:
Functions won't scale to 1000 parallel executions in this case. They will start with 1 instance doing ~16 parallel calls at the same time, and then observe how fast the processing goes, then maybe add a second instance, wait again etc.
The exact scaling behavior is not publicly described and can change over time. Thus, YMMV, and you need to test against your specific scenario.
Yes, make the functions async whenever you can. I don't expect a huge boost in processing speed just because of that, but it certainly won't hurt.
Bottom line: your scenario doesn't sound like a problem for Functions, but if you need very short latency, you'll have to run a test before relying on it.
I'm assuming you are talking about an Azure Service Bus Binding to an Azure Function. There should be no issue with >1000 Azure Functions firing at the same time. They are a Serverless runtime and should be able to scale greatly if you are running under a consumption model. If you are running the functions in a service plan, you may be limited by the service plan.
In your scenario you are probably more likely to overwhelm the downstream dependencies: the database and SMS sending system, before you overwhelm the Azure Functions infrastructure.
The best thing to do is to do some load testing, and monitor the exceptions coming out of the connections to the database and SMS systems.
We're using Google App Engine Standard Environment for our application. The runtime we are using is Python 2.7. We have a single service which uses multiple versions to deploy the app.
Most of our long-running tasks are done via Task Queues. Most of those tasks do a lot of Cloud Datastore CRUD operations. Whenever we have to send the results back to the front end, we use Firebase Cloud Messaging for that.
I wanted to try out Cloud Functions for those tasks, mostly to take advantage of the serverless architecture.
So my question is What sort of benefits can I expect if I migrate the tasks from Task Queues to Cloud Functions? Is there any guideline which tells when to use which option? Or should we stay with Task Queues?
PS: I know that migrating a code which is written in Python to Node.js will be a trouble, but I am ignoring this for the time being.
Apart from the advantage of being serverless, Cloud Functions respond to specific events "glueing" elements of your architecture in a logical way. They are elastic and scale automatically - spinning up and down depending on the current demand (therefore they incur costs only when they are actually used). On the other hand Task Queues are a better choice if managing execution concurrency is important for you:
Push queues dispatch requests at a reliable, steady rate. They
guarantee reliable task execution. Because you can control the rate at
which tasks are sent from the queue, you can control the workers'
scaling behavior and hence your costs.
This is not possible with Cloud Functions which handle only one request at a time and run in parallel. Another thing for which Task Queues would be a better choice is handling retry logic for the operations that didn't succeed.
Something you can also do with Cloud Functions together with App Engine Cron jobs is to run the function based on a time interval, not an event trigger.
Just as a side note, Google is working on implementing Python to Cloud Functions also. It is not known when that will be ready, however it will be surely announced in Google Cloud Platform Blog.
Is it possible to limit the maximum number of Functions that run in parallel?
I read the documentation and came across this:
When multiple triggering events occur faster than a single-threaded function runtime can process them, the runtime may invoke the function multiple times in parallel.
If a function app is using the Consumption hosting plan, the function app could scale out automatically. Each instance of the function app, whether the app runs on the Consumption hosting plan or a regular App Service hosting plan, might process concurrent function invocations in parallel using multiple threads.
The maximum number of concurrent function invocations in each function app instance varies based on the type of trigger being used as well as the resources used by other functions within the function app.
https://learn.microsoft.com/en-gb/azure/azure-functions/functions-reference#parallel-execution
I am using a Function on an App Service plan with an Event Hub input binding and only have a single Function within my Function App. If I can't limit it, does anyone know what the maximum number of concurrent function invocations will be for this kind of setup?
There isn't a way to specify a maximum concurrency for Event Hubs triggered functions, but you can control batch size and fetching options as described here.
The maximum number of concurrent invocations may also vary depending on your workload and resource utilization.
If concurrency limits are needed, this is (currently) something you'd need to handle, and the following posts discuss some patterns you may find useful:
Throttling Azure Storage Queue processing in Azure Function App
Limiting the number of concurrent jobs on Azure Functions queue
Just for reference, I came across here in my search for throttling. You can use the [Singleton] attribute on your function ensuring only one-at-a-time execution. Maybe not really what you were looking for and a very rigorous way of throttling, but still, it is an option.
https://learn.microsoft.com/en-us/azure/app-service/webjobs-sdk-how-to#singleton-attribute
Microsoft has added a new setting which can be used to limit concurrency of function execution. The setting is WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT and can be used to limit how many function instances will execute in parallel. However, according to Microsoft, it isn't fully implemented yet.
https://github.com/Azure/azure-functions-host/wiki/Configuration-Settings
For those who are still interested:
https://learn.microsoft.com/en-us/azure/azure-functions/event-driven-scaling#limit-scale-out
There's a way to limit the number of parallel execution by setting functionAppScaleLimit parameter.
I have a C# console application which extracts 15GB FireBird database file on a server location to multiple files and loads the data from files to SQLServer database. The console application uses System.Threading.Tasks.Parallel class to perform parallel execution of the dataload from files to sqlserver database.
It is a weekly process and it takes 6 hours to complete.
What is best option to move this (console application) process to azure cloud - WebJob or WorkerRole or Any other cloud service ?
How to reduce the execution time (6 hrs) after moving to cloud ?
How to implement the suggested option ? Please provide pointers or code samples etc.
Your help in detail comments is very much appreciated.
Thanks
Bhanu.
let me give some thought on this question of yours
"What is best option to move this (console application) process to
azure cloud - WebJob or WorkerRole or Any other cloud service ?"
First you can achieve the task with both WebJob and WorkerRole, but i would suggest you to go with WebJob.
PROS about WebJob is:
Deployment time is quicker, you can turn your console app without any change into a continues running webjob within mintues (https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/)
Build in timer support, where WorkerRole you will need to handle on your own
Fault tolerant, when your WebJob fail, there is built-in resume logic
You might want to check out Azure Functions. You pay only for the processing time you use and there doesn't appear to be a maximum run time (unlike AWS Lambda).
They can be set up on a schedule or kicked off from other events.
If you are already doing work in parallel you could break out some of the parallel tasks into separate azure functions. Aside from that, how to speed things up would require specific knowledge of what you are trying to accomplish.
In the past when I've tried to speed up work like this, I would start by spitting out log messages during the processing that contain the current time or that calculate the duration (using the StopWatch class). Then find out which areas can be improved. The slowness may also be due to slowdown on the SQL Server side. More investigation would be needed on your part. But the first step is always capturing metrics.
Since Azure Functions can scale out horizontally, you might want to first break out the data from the files into smaller chunks and let the functions handle each chunk. Then spin up multiple parallel processing of those chunks. Be sure not to spin up more than your SQL Server can handle.