Trigger multiple concurrent service bus trigger azure functions without time degradation - azure

I have a service bus trigger function that when receiving a message from the queue will do a simple db call, and then send out emails/sms. Can I run > 1000 calls in my service bus queue to trigger a function to run simultaneously without the run time being affected?
My concern is that I queue up 1000+ messages to trigger my function all at the same time, say 5:00PM to send out emails/sms. If they end up running later because there is so many running threads the users receiving the emails/sms don't get them until 1 hour after the designated time!
Is this a concern and if so is there a remedy?
FYI - I know I can make the function run asynchronously, would that make any difference in this scenario?

1000 messages is not a big number. If your e-mail/sms service can handle them fast, the whole batch will be gone relatively quickly. Few things to know though:
Functions won't scale to 1000 parallel executions in this case. They will start with 1 instance doing ~16 parallel calls at the same time, and then observe how fast the processing goes, then maybe add a second instance, wait again etc.
The exact scaling behavior is not publicly described and can change over time. Thus, YMMV, and you need to test against your specific scenario.
Yes, make the functions async whenever you can. I don't expect a huge boost in processing speed just because of that, but it certainly won't hurt.
Bottom line: your scenario doesn't sound like a problem for Functions, but if you need very short latency, you'll have to run a test before relying on it.

I'm assuming you are talking about an Azure Service Bus Binding to an Azure Function. There should be no issue with >1000 Azure Functions firing at the same time. They are a Serverless runtime and should be able to scale greatly if you are running under a consumption model. If you are running the functions in a service plan, you may be limited by the service plan.
In your scenario you are probably more likely to overwhelm the downstream dependencies: the database and SMS sending system, before you overwhelm the Azure Functions infrastructure.
The best thing to do is to do some load testing, and monitor the exceptions coming out of the connections to the database and SMS systems.

Related

Limit Azure Function restart rate

I already faced similar problem few times:
Azure Function with ServiceBusTrigger by some reason (misconfiguration, infrastructure issues, doesn't really matter) fails to connect to ServiceBus (so it happens on trigger level) and it leads to two issues:
It tries to restart all the time, increasing CPU consumption
It generates literally a millions of exceptions in AppInsights, which leads to quota exceedance
Practically every error in configuration means significantly increased bills and requires thorough monitoring after every deployment, which is annoying and error prone solution.
So, my question: If there is a way to set some delay between restart attempts to (for example) one second? And, as addition - is there way to limit amount of restart attempts and then shut down the Function?
Establishing a connection to the broker to fetch messages is Functions responsibility, Scale Controller. That aspect is entirely abstracted from customers and not configurable. I suggest raising an issue with Azure Functions team, likely under the Runtime repo.

what will happen on Azure Functions during maintenance?

I know that Web Apps will be rebooted during maintenance without notice.
But how about the case of Functions?
During maintenance, does the current execution get stopped?
I think it is difficult to retry Timer, Http, Event Hub Triggered Functions.
But I wish Functions runtime will make my code retry after the maintenance finishes.
Your question has several parts, so:
Probably yes, Azure will stop routing requests to an instance which is about to get maintenance done. Because Function executions are short-lived (on Consumption Plan), that's relatively easy to do.
"Probably" - because this is not something they guarantee to you. Overall, Functions on Consumption Plan have no SLA, and host behavior details might change over time.
If stopping in the middle of function execution is a problem for your business case, you still need to handle it. Any instance can experience hardware failure at any time, including the least convenient time possible.
The observed behavior in case of such failure will differ per trigger type. E.g. HTTP call will just fail with 5xx code and the client is supposed to retry it. Queue-based triggers have a mechanism with locks, timeouts and retry counts. Event Hub will restart at the last checkpoint.
I might be wrong but the whole thing about serverless computing is that you don't have to worry about these things anymore. So I would trust Microsoft that they won't stop your function during a maintenance. Thats probably one of the reasons why a function can only run for a limit time period.

Task scheduling behind multiple instances

Currently I am solving an engineering problem, and want to open the conversation to the SO community.
I want to implement a task scheduler. I have two separate instances of a nodeJS application sitting behind an elastic load balancer (ELB). The problem is when both instances come up, they try to execute the same tasks logic, causing the tasks run more than once.
My current solution is to use node-schedule to schedule tasks to run, then have them referencing the database to check if the task hasn't already been run since it's specified run time interval.
The logic here is a little messy, and I am wondering if there is a more elegant way I could go about doing this.
Perhaps it is possible to set a particular env variable on a specific instance - so that only that instance will run the tasks.
What do you all think?
What you are describing appears to be a perfect example of a use case for AWS Simple Queue Service.
https://aws.amazon.com/sqs/details/
Key points to look out for in your solution:
Make sure that you pick a visibility timeout that is reflective of your workload (so messages don't reenter the queue whilst still in process by another worker)
Don't store your workload in the message, reference it! A message can only be up to 256kb in size and message sizes have an impact on performance and cost.
Make sure you understand billing! As billing is charged in 64KB chunks, meaning 1 220KB message is charged as 4x 64KB chucks / requests.
If you make your messages small, you can save more money by doing batch requests as your bang for buck will be far greater!
Use longpolling to retrieve messages to get the most value out of your message requests.
Grant your application permissions to SQS by the use of an EC2 IAM Role, as this is the best security practice and the recommended approach by AWS.
It's an excellent service, and should resolve your current need nicely.
Thanks!
Xavier.

How do Azure WebJobs prioritize messages when monitoring multiple queues?

I'm using Azure WebJobs as part of a project at work. These are configured as continuously running jobs that monitor a number of different queues. As queue messages are received they cause various API commands to be run. The issue I have is that some of the API commands run quickly (ie. a few seconds) and some run slowly (several minutes), and I'm not sure how best to split the queue handlers between the WebJobs.
For example, I could put all of the slow API command handlers in one WebJob and all of the quick handlers in a different WebJob. My concern is that the "slow" WebJob process would always be busy whereas the "quick" WebJob process would be idling most of the time.
Another approach would be to mix quick and slow handlers in the same WebJob project. My concern with that would be the quicker handlers starving the slower ones of attention, or vice versa.
A third approach would be to have a separate WebJob for each individual message handler, but given the number of message types we have to deal with I'd rather not go down that route. It also seems like overkill to be honest.
I was wondering if anyone had encountered a similar scenario and could offer any insight into how Azure WebJobs choose which message to handle when they are monitoring multiple queues? Numerous internet searches have failed to turn-up any guidance or help in this area. To be clear, I'm not really after opinions as to which approach people think would be best; I'm looking for answers from people who have actually dealt with this kind of problem and can say with some degree of certainty which of the different approaches would be best given the way the Azure WebJobs API currently prioritizes queue message handling.
If you have multiple functions listening on different queues, the SDK will call them in parallel when messages are received simultaneously. You can not set which queue should be processed first.
Depending on you configuration, you will handle them parallel. If you think that some executions will stall others, you can split the handling in multiple webjobs and scale them seperately.

How much latency is there transferring data to the Windows Azure Worker Role External Endpoint?

I have an app that I'm thinking about moving to Azure as a Worker Role with an external facing endpoint. It's a small little process that runs in about 200-400ms, but our users would like to start running the little job 50K-100K times a day, per user. Before I go building the Azure prototype, I need to figure out what kind of latency I can expect communicating with an Azure external endpoint. Obviously, the latency depends on the size of information that I'm sending and receiving, and it depends on the speed of my internet connection, but I can't find any metrics anywhere. Are there any kind of base line numbers out there?
For the sake of argument, lets say I'm on a T1 and I'm sending 10K up and 10K down with each job run.
I don't think latency is exactly the term you looking for, that's the delay it takes sending each packet over the network which is affected more by your distance from the server, and the nature of your network.
Having said that, everyones results wrt to latency will be different, the only way to be sure will be to set up a prototype and run some performance tests on it. Also remember with Azure you can specify your data center, so select one near you.

Resources