Executing Azure Functions back to back

If I schedule a timer-triggered Azure Function to run every second and my function takes 2 seconds to execute, will I just get back-to-back executions, or will some execution queue eventually overflow?
Background:
We have a timer-triggered Azure Function that currently executes every 30 seconds, checking for new rows in a database table. If there are new rows, the data is processed and the rows are marked as handled.
If there are no new rows, the execution is very fast. If there are 500 new rows (the maximum we fetch at the moment), the execution takes about 20-25 seconds.
We would like to decrease the interval to one second to reduce the latency of row processing.
Update: I want back-to-back executions and I want to avoid overlapping executions.

Multiple Azure Function executions can run concurrently. This means the function can be triggered again while the previously triggered execution is still running, and both will run concurrently. They will only queue up if you set options to run just one function at a time on one instance, but it doesn't look like you want that.
With concurrency, two executions can read the same database table at the same time, so you should read your table with the UPDLOCK hint. This will prevent a subsequently triggered execution from reading the same rows that were picked up by the previous one; a minimal sketch follows.
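A minimal sketch of that read from Python with pyodbc, assuming a hypothetical dbo.Rows table with Id, Payload, and Handled columns (the connection string is a placeholder):

import pyodbc

# autocommit is off by default, so the SELECT opens a transaction and the
# UPDLOCK locks are held until commit, blocking a concurrent execution from
# picking up the same unhandled rows
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...")
cursor = conn.cursor()
cursor.execute("""
    SELECT TOP (500) Id, Payload
    FROM dbo.Rows WITH (UPDLOCK)
    WHERE Handled = 0
""")
rows = cursor.fetchall()
for row in rows:
    # ... process row.Payload ...
    cursor.execute("UPDATE dbo.Rows SET Handled = 1 WHERE Id = ?", row.Id)
conn.commit()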
In short, the answer to your question is neither: if your executions overlap, by default you will get multiple executions of the function running at the same time.
To achieve back-to-back execution for timer triggers, set WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT and FUNCTIONS_WORKER_PROCESS_COUNT to 1 in the application settings. This will ensure only one function execution runs at a time.
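For reference, a one-second schedule in the Python v2 programming model could look like the sketch below (the v2 decorator model is an assumption on my part; in the classic model the schedule lives in function.json instead):

import azure.functions as func

app = func.FunctionApp()

# NCRONTAB with six fields: second, minute, hour, day, month, day-of-week;
# "*/1 * * * * *" fires every second
@app.timer_trigger(schedule="*/1 * * * * *", arg_name="timer")
def poll_rows(timer: func.TimerRequest) -> None:
    # check for new rows and process them here
    ...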

Related

Distributing tasks across multiple Cloud Functions

Let's say, I have 1000 documents in a Firestore collection.
How do I execute the same 1 Cloud Function but 10 times in parallel to process 100 documents each, say every 5 minutes?
I am aware I can use a Scheduler for the "every 5 minutes" part. The objective here is to distribute the load using multiple executions of the same function in parallel to handle the tasks. When the collection grows, I would like to add more instances. For example, let's say 1 execution per 100 documents.
I don't mind having another (or more) function to handle the distribution itself, and I don't mind the number of executions. I just don't want to loop through a large collection and process the tasks in a single function execution.
The numbers given above are examples. I am also open to using other services within GCP.
If you want to execute the Cloud Function every time changes occur in the Firestore documents, you can use a Cloud Firestore trigger in Cloud Functions. The Cloud Function basically waits for changes, triggers when an event occurs, and performs its tasks. You can go through these documents on Firestore triggers: Google Cloud Firestore Trigger, Cloud Firestore Triggers.
If you are concerned that Cloud Functions will not be able to process the requests in parallel, you should check out this document. Cloud Functions handles incoming requests by assigning them to an instance; if the volume of requests increases, Cloud Functions starts new instances to handle them.
Let's assume you have a function that, when called, processes a single document and does whatever you need with it. Let's call that function doSomething and assume it takes the document's path as a parameter.
Then, you can create a function that is scheduled every 5 minutes. In this function, you retrieve all the documents, hold them in an array (let's call it documents), and do something like:
// Firebase v9 modular SDK; `functions` here is the result of getFunctions(app)
import { httpsCallable } from 'firebase/functions';

const doSomething = httpsCallable(functions, 'doSomething');
// fire one call per document and collect the pending promises
const calls = documents.map((document) => doSomething({ path: document.path }));
// wait for all the parallel executions to finish
await Promise.all(calls);
This builds an array of in-flight calls and then awaits them all, obtaining parallel executions of the same function. If you want one execution per batch of 100 documents rather than one per document, chunk the documents array first and pass each chunk's paths in a single call.

How to check whether a DAG completes within a given time?

I have a DAG A that runs at, say, 10 AM and typically completes within 15-20 minutes, but sometimes it takes longer, and due to some tables in the database it can end up in an endless running state. How can I know whether my DAG has completed within a given time frame and, if not, send an email alert saying it has not completed in time and needs to be checked?
My thought process:
Build a parallel DAG, or a process within the same DAG, containing a Python function that compares the start time with the current time and keeps checking until some fixed threshold, say 10 minutes, is reached, then sends an email saying the DAG has not completed.
Please correct me if I am wrong, or suggest other ways to check this.
It sounds like you just need to define an SLA. You can find an example here.
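A minimal sketch of such an SLA, assuming Airflow 2.x, a daily 10 AM schedule, and a 20-minute threshold (the dag_id, callable, and email address are placeholders); when the task has not finished within the SLA, Airflow records an SLA miss and emails the task's configured addresses:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def process_tables():
    ...  # the work that normally takes 15-20 minutes

with DAG(
    dag_id="dag_a",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 10 * * *",  # every day at 10 AM
    catchup=False,
    default_args={"email": ["oncall@example.com"]},
) as dag:
    PythonOperator(
        task_id="process",
        python_callable=process_tables,
        # the SLA is measured from the scheduled start of the DAG run;
        # a miss is recorded and the email notification is sent
        sla=timedelta(minutes=20),
    )

For the endless-running case specifically, dagrun_timeout on the DAG (or execution_timeout on the task) can additionally fail runs that exceed a hard limit.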

Bull Queue Performance and Scalability: Queue.add(), Queue.getJob(jobId), Job.remove()

My use case is to create dynamic delayed jobs. (I am using Bull Queue, which can be used to create delayed jobs.)
Based on some event, add some more delay to the delay interval (i.e. further delay the job).
Since I could not find any function to update the delay interval for an existing job, I came up with the following steps:
async function onEvent(jobId) {
  // queue is of type Bull.Queue, job is of type Bull.Job
  const job = await queue.getJob(jobId);
  const data = job.data;
  const delay = job.toJSON().delay;
  await job.remove();
  // re-add the job under the same jobId with the extended delay
  // (the option is `delay`, not `delayed`)
  await queue.add("jobName", data, { jobId: jobId, delay: delay + someValue });
}
This pretty much solves my problem.
But I am worried about the SCALE at which these operations will happen.
I am expecting nearly 50K events per minute, or even more in the near future.
My queue size is expected to grow with the number of unique jobIds.
I am expecting more than:
1 million entries daily
around 4-5 million entries weekly
10-12 million entries monthly
Also, after 60-70 days the delay interval for jobs will be reached, and older jobs will be removed one by one.
I can run multiple processors to handle these delayed jobs, which is not an issue.
My queue size will stabilise after 60-70 days; at that point the queue will hold around 10 million jobs.
I can vertically scale my Redis as required.
But I want to understand the time complexity of the queries below:
queue.getJob(jobId) // Get Job By Id
job.remove() // remove job from queue
queue.add(name, data, opts) // add a delayed job to this queue
If any of these operations is O(N), or if the queue can only hold some maximum number of jobs that is less than 10 million, then I might have to discard this design and come up with something entirely different.
I need advice from experienced folks who can guide me on how to solve this problem. Any kind of help is appreciated.
Taking reference from the source code:
queue.getJob(jobId)
This should be O(1), since it mostly relies on hash-based lookups via HMGET. You're only requesting one job, and according to the official Redis docs the time complexity of HMGET is O(N), where N is the number of requested fields; that is effectively O(1) here, since bull stores only a small number of fields in the job's hash key.
job.remove()
Considering that a considerable number of your jobs will be delayed and only a fraction of them are moved to the waiting or active queue at any time, this should be O(log N) on an amortized level, as it mostly uses ZREM for these operations.
queue.add(name, data, opts)
For adding a job to the delayed queue, bull uses ZADD, so this is again O(log N).
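To make those claims concrete, here is an illustrative redis-py sketch of the underlying operations (the key names follow bull's bull:<queue>:* convention, but the exact layout is an internal detail of the library):

import redis

r = redis.Redis()

# queue.getJob(jobId) -> HGETALL/HMGET on the job's hash key: O(1) per job,
# independent of the total number of jobs in the queue
job = r.hgetall("bull:myQueue:42")

# queue.add(name, data, {delay: ...}) -> ZADD into the delayed sorted set,
# scored by the timestamp at which the job becomes ready: O(log N)
r.zadd("bull:myQueue:delayed", {"42": 1700000000000})

# job.remove() -> ZREM from the delayed set plus DEL on the hash:
# O(log N) for the sorted-set removal, O(1) for deleting the hash
r.zrem("bull:myQueue:delayed", "42")
r.delete("bull:myQueue:42")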

Spark Structured Streaming StreamingQueryListener.onQueryProgress not called per microbatch?

I'm using Spark 3.0.2 and I have a streaming job that consumes data from Kafka with a trigger duration of 1 minute.
I see in the Spark UI that there is a new job every minute as defined, but the onQueryProgress method is only being called every 5-6 minutes. I thought this method would be called directly after each micro-batch.
Is there a way to control this duration and make it equal to the trigger duration?
The onQueryProgress method of the StreamingQueryListener is called asynchronously after the data has been completely processed within each micro-batch.
You are seeing this listener being triggered only every 5-6 minutes because that is how long it takes the streaming job to process all the data fetched in the micro-batch. Setting the trigger duration to 1 minute makes Spark plan tasks accordingly, but it does not mean the job can also process all available data within that 1-minute window.
To reduce the amount of data your query fetches from Kafka per micro-batch, you can play around with the source option maxOffsetsPerTrigger.
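For example, in PySpark (the broker address, topic, and cap are placeholders):

# spark is an existing SparkSession
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      # cap how many Kafka records each micro-batch may read, so a single
      # trigger cannot pull more data than it can process within a minute
      .option("maxOffsetsPerTrigger", 100000)
      .load())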
By the way, if no data is being processed, this method is still called every 10 seconds by default. If you want to avoid reacting to those empty updates, guard your logic with if (event.progress.numInputRows > 0); see the sketch below.
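A sketch of that guard in a listener, assuming PySpark 3.4+ where StreamingQueryListener is exposed to Python (on 3.0.2, the version in the question, the listener would have to be written in Scala or Java):

from pyspark.sql.streaming import StreamingQueryListener

class ProgressListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        pass

    def onQueryProgress(self, event):
        # skip the idle progress events emitted every 10 seconds by default
        # when no data was processed in the trigger
        if event.progress.numInputRows > 0:
            print(event.progress.numInputRows, "rows in this micro-batch")

    def onQueryTerminated(self, event):
        pass

spark.streams.addListener(ProgressListener())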
I found the reason in my case: the onQueryProgress method itself was taking 5 minutes to complete.
As Mike mentioned, onQueryProgress is called asynchronously, but it appears to use a single thread for these callbacks, so the next invocation has to wait for the previous call to finish.
So the solution in my case was to figure out why the method was taking that long and make it faster than the trigger duration.

Azure Functions scaling and concurrency using Queue triggers and functionAppScaleLimit on the Consumption Plan

I have an Azure Function app on the Linux Consumption Plan that has two queue triggers. Both queue triggers have the batchSize parameter set to 1 because each can use about 500 MB of memory and I don't want to exceed the 1.5 GB memory limit, so each should only be allowed to pick up one message at a time.
If I want to allow both of these queue triggers to run concurrently, but don't want them to scale beyond that, is setting the functionAppScaleLimit to 2 enough to achieve that?
Edit: added new examples. Thank you @Hury Shen for providing the framework for these examples.
Please see @Hury Shen's answer below for more details. I've tested three queue trigger scenarios, all using the following legend:
QueueTrigger with no functionAppScaleLimit
QueueTrigger with functionAppScaleLimit set to 2
QueueTrigger with functionAppScaleLimit set to 1
For now, I think I'm going to stick with the last example, but in the future I think I can safely set my functionAppScaleLimit to 2 or 3 if I upgrade to the Premium plan. I'm also going to test two queue triggers that listen to different storage queues with a functionAppScaleLimit of 2, but I suspect the safest thing to do in that scenario is to create a separate Azure Function app for each queue trigger.
Edit 2: add examples for two queue triggers within one function app
Here are the results when using two queue triggers within one Azure Function app, listening on two different storage queues. This is the legend for both queue triggers:
Both queue triggers running concurrently with functionAppScaleLimit set to 2
Both queue triggers running concurrently with functionAppScaleLimit set to 1
In the example where two queue triggers run concurrently with functionAppScaleLimit set to 2, it looks like the scale limit is not respected. Can someone from Microsoft please explain? There is no warning in the official documentation (https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#limit-scale-out) that this setting is in preview, yet we can clearly see the Azure Function scaling out to 4 instances when the limit is set to 2. In the following example the limit is respected, but the behavior is not what I want, and we still see the waiting described in @Hury Shen's answer.
Conclusion
To limit concurrency and control scaling in Azure Functions with queue triggers, you must limit yourself to one queue trigger per function app and use the batchSize and functionAppScaleLimit settings. If you use more than one queue trigger, you will encounter race conditions and waiting that may lead to timeouts.
Yes, you just need to set functionAppScaleLimit to 2. But there are some mechanisms of the Consumption plan you need to know about. I tested it on my side with batchSize set to 1 and functionAppScaleLimit set to 2 (I set WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT to 2 in the "Application settings" of the function app instead of setting functionAppScaleLimit; they are the same). And I tested with the code below:
import logging
import time

import azure.functions as func

def main(msg: func.QueueMessage) -> None:
    logging.info('=========sleep start')
    time.sleep(30)
    logging.info('=========sleep end')
    logging.info('Python queue trigger function processed a queue item: %s',
                 msg.get_body().decode('utf-8'))
Then I added messages to the queue: I sent 10 messages (111, 222, 333, 444, 555, 666, 777, 888, 999, 000), one by one. The function was triggered successfully, and after a few minutes the logs appear in "Monitor". Clicking one of the entries in "Monitor", the logs show as:
I marked four log lines with red boxes on the right of the screenshot above, naming them "s1" through "s4" (steps 1-4), and summarized the logs in an Excel table for your reference:
The cells from "s2" to "s4" are highlighted in yellow because that period corresponds to the execution time of the function task.
According to the Excel screenshot, we can infer the following points:
1. The maximum number of instances only extends to 2, because no row of the table contains more than two yellow cells. So the function cannot scale beyond 2 instances, as you mentioned in your question.
2. Allowing both of these queue triggers to run concurrently can be achieved, but the instances scale out by the Consumption-plan mechanism. In simple terms: when one function instance has been triggered by a message and hasn't completed its task yet, and another message comes in, there is no guarantee that another instance will be used. The second message might wait on the first instance; we cannot control whether another instance is spun up.
===============================Update==============================
As I'm not completely clear on your description, I'm not sure how many storage queues you want to listen to, or how many function apps and QueueTrigger functions you have created on your side. I summarize my test results below for your reference:
1. Regarding your question about whether the Maximum Burst you described on the Premium plan would behave differently: I think that if we choose the Premium plan, instances will also scale out with the same mechanism as on the Consumption plan.
2. If you have two storage queues to listen to, then of course we should create two QueueTrigger functions, one listening to each storage queue.
3. If you have just one storage queue to listen to, I tested three cases (with max scale-out set to 2 in all three):
A) One QueueTrigger function in one function app, listening to one storage queue. This is what I tested in my original answer; the Excel table shows that instances scale out by the Consumption-plan mechanism and we cannot control it.
B) Two QueueTrigger functions in one function app, listening to the same storage queue. The result is almost the same as case A; we cannot control how many instances are used to handle the messages.
C) Two function apps, each with one QueueTrigger function listening to the same storage queue. The result is also similar to cases A and B; the difference is that the maximum number of instances can reach 4, because there are two function apps and each can scale to 2 instances.
So, in a word, all three cases behave similarly. Even if we choose case C and create two function apps with one QueueTrigger function in each, we still cannot make sure the second message is handled immediately; it may still be routed to the first instance and have to wait for that instance to finish handling the first message.
So the answer to the question in this post, "is setting the functionAppScaleLimit to 2 enough to achieve that?", is: if you want a guarantee that both instances run concurrently, we can't ensure that. If you just want at most two instances handling the messages, the answer is yes.
