Are Message Queue Triggers better than HTTP Triggers for scalability? - azure

Two of the trigger types for Azure Functions are message queue triggers and HTTP triggers. I'm guessing that one difference is that with HTTP triggers a request could be denied if there are not enough instances to service it, whereas with a message queue trigger the runtime will check whether an instance is available and, if not, spin one up before trying to process the message. Is that a correct understanding?

Not quite! I think you're getting this information from here:
Whenever possible, refactor large functions into smaller function sets that work together and return responses fast. For example, a webhook or HTTP trigger function might require an acknowledgment response within a certain time limit; it is common for webhooks to require an immediate response.
The acknowledgement within a certain time limit is all about how long the client is willing to wait for a response. So if you were to execute a task that takes a long time (on the order of minutes), you may not receive an informative response because the client would simply consider the connection dead. However, your function would still execute just fine as long as it stayed within the functionTimeout limits (on a Consumption plan, the default is 5 minutes and the maximum is 10 minutes).
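For reference, that limit is configured via the functionTimeout value in the function app's host.json. A minimal sketch, assuming you want to raise it to the Consumption-plan maximum:

```json
{
  "functionTimeout": "00:10:00"
}
```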

Related

Google Cloud Function: behaviour when --max-instances reached

I'm learning to work with Cloud Functions (and Cloud Run) and would like to know their behaviour when an (HTTP-triggered) function is already running at max-instances capacity and more HTTP requests come in.
1)
Here's my function's basic code (a simple function with approx. 1000ms execution time per invocation):
import time

import flask

ctr = 0

def hello_world(request):
    global ctr
    print("hello_world(): " + str(ctr))
    ctr = ctr + 1
    time.sleep(10)
    response = flask.Response("success::", 200)
    return response
2)
deployed this function with flag --max-instances=1 (just to ensure no new VM instances come up to handle concurrent requests)
3)
and then sent 5 concurrent requests
From what I observe, only one of the requests gets processed. The other 4 requests are just dropped (the client received HTTP status code 500, and there is no trace of these dropped requests in Stackdriver Logging either).
In the link here https://cloud.google.com/functions/docs/max-instances it says:
In that case, incoming requests queue for up to 60 seconds. During this 60 second window, if an instance finishes processing a request, it becomes available to process queued requests. If no instances become available during the 60 second window, the request fails.
Based on this, I expected that while one request is being handled, the others would be queued for up to 60 seconds. Therefore, all 5 requests should have been processed (or at least more than 1 request, if not all 5!). However, the actual behaviour I'm seeing is different from this.
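The queuing behaviour the documentation describes can be sketched with a small, self-contained simulation (the names and the scaled-down 6-second window are my own inventions; this models the documented queue-for-60-seconds semantics, not the actual Cloud Functions scheduler):

```python
import threading
import time

MAX_INSTANCES = 1
QUEUE_WINDOW = 6  # scaled-down stand-in for the documented 60-second window
busy = threading.Semaphore(MAX_INSTANCES)

def handle(request_id, results):
    # Try to grab an instance, waiting up to QUEUE_WINDOW seconds,
    # as the docs describe for queued requests.
    if busy.acquire(timeout=QUEUE_WINDOW):
        try:
            time.sleep(1)  # roughly the per-invocation work time
            results[request_id] = 200
        finally:
            busy.release()
    else:
        results[request_id] = 500  # no instance freed up within the window

results = {}
threads = [threading.Thread(target=handle, args=(i, results)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # with the queue window, all five requests eventually succeed
```

With QUEUE_WINDOW set to 0 the same simulation reproduces the observed behaviour instead: one request succeeds and the rest fail immediately with 500.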
Can someone please explain?
UPDATE-1: It seems the fix has been released and the documentation updated:
1) It still continued to return status code 500 during the initial cold start (when no instances were running) for some of the concurrent requests. EXPECTED, I suppose.
2) It also temporarily exceeded max-instances=1 during the very first burst of 10 requests, launching up to 4 instances AND successfully serving all 4 requests.
3) Thereafter, the number of instances did come down to respect the --max-instances=1 setting, and all but one request returned with status code 429.
The Cloud Functions engineering team is now aware of this, and they will proceed to change the documentation to reflect the changes made to this feature, which is still in active development. Stay tuned for the documentation; the update will come shortly =). Thank you for noticing this with your tests!
Aside from this, as a suggestion from the team: if you are interested in queuing jobs, it is recommended to use Cloud Tasks.

Azure Function Queue Trigger Executing Multiple Times

I have an Azure Function triggered off of a storage queue. The function fans out around 10,000 additional messages to another storage queue for another function (within the same function app) to execute. I'm seeing some strange behavior whenever it executes: the first function appears to be executed multiple times. I observed this by watching the queue it publishes to receive significantly more messages than expected.
I understand that the function should be coded defensively (i.e., expect to be executed multiple times), but this is happening consistently each time the first function executes. I don't think the repeated executions are due to it timing out or failing (according to App Insights).
Could it be that when the 10,000 messages get queued up the function is scaling out and that is somehow causing the original message to be executed multiple times?
The lock on the original message that triggers the first Azure Function is likely expiring. This causes the queue to assume that processing the message failed, and it will then use that message to trigger the function again. You'll want to look into renewing the message lock while you're sending out the 10,000 messages to the next queue.
Also, since you're sending out 10,000 messages, you may need to redesign that part to scale out whatever massively parallel processing you're attempting more efficiently. 10,000 is a very high number of messages to send from a single triggered event.
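One common redesign is a two-level fan-out, so that no single execution enqueues all 10,000 messages (and thus no execution outlives its message lock). A rough, self-contained sketch, where queue.Queue stands in for the storage queues and the batch size of 100 is an arbitrary choice:

```python
import queue

BATCH = 100  # hypothetical fan-out factor: 100 splitters x 100 items = 10,000

split_queue = queue.Queue()  # stands in for an intermediate "splitter" queue
work_queue = queue.Queue()   # stands in for the second storage queue

def first_function(total):
    # Instead of enqueuing 10,000 messages in one long-running execution,
    # enqueue 100 small "split" messages that each cover a range of 100 items.
    for start in range(0, total, BATCH):
        split_queue.put((start, min(start + BATCH, total)))

def splitter_function(msg):
    # Each splitter execution does a small, quick batch of the fan-out.
    start, end = msg
    for i in range(start, end):
        work_queue.put(i)

first_function(10_000)
while not split_queue.empty():
    splitter_function(split_queue.get())

print(split_queue.qsize(), work_queue.qsize())  # 0 10000
```

Each function execution now finishes quickly, well within the lock's lifetime, and the platform can scale the splitter executions out in parallel.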

Cancelling an Azure Function with a queue trigger without the poison queue

We're using the following setup:
We're using an Azure Function with a queue trigger to process a queue of JSON messages.
These messages are each just forwarded to an API endpoint via HTTP POST, for further processing.
The API can return 3 possible HTTP status codes; 200 (OK), 400 (Bad Request), 500 (Internal Server Error).
If the API returns 200, the message was processed properly and everything is fine. The queue trigger function appears to automatically delete the queue message and that's fine by us.
If the API returns 400, the API has logic which takes the message and adds it to a table with a status indicating that it was malformed or otherwise couldn't be processed. We are therefore fine with the message automatically being deleted from the queue, and the Azure Function can end normally.
If the API returns 500, we make sure the function retries posting the message to the API, until the status code is 200 or 400 (because there's likely a problem with the API and we don't want lost messages). We're using Polly to achieve this. We have it set up so it's essentially going to keep retrying forever on an exponential backoff.
However, we recently encountered this problem:
There are certain situations where the API will return 500 for certain messages. This error is completely transient and will come and go unpredictably. Retrying forever using Polly would be fine except not all messages cause this error and essentially the "bad" messages are blocking "good" messages from being processed.
Let's say for example I have 50 messages in the queue. The first 32 messages at the front of the queue are "bad" and will sometimes return 500 from the API. These messages are picked up by the Azure Function and worked on concurrently. The other 18 messages are "good" and will return 200. These "good" messages will not be processed until the "bad" ones have been successfully processed. Essentially the bad ones cause a traffic jam for the good ones.
My solution was to try to cancel execution of the Azure Function if the current message has been retried a certain number of times. I thought maybe the message would then be made visible after some time, but in that time it gives the good messages time to be processed. However, I have no idea how to cancel execution of the function without either causing the queue message to be completely deleted or pushed onto a poison queue.
Am I able to achieve this using a queue trigger function? Is this something I can maybe do using a timer trigger instead?
Thanks very much!
As you've mentioned, you can't effectively cancel execution, so I'd suggest finishing the function and moving the message to a queue where it will be processed later.
A few suggestions:
Throw an error to use the poison queue, and handle the logic there.
Push these messages to another 'long running' queue of your choice, with an output binding.
Use an output binding of type CloudQueue that connects to your input queue. When you encounter a problematic message, add it to the output queue using the initialVisibilityDelay parameter to push it to the back of the queue: https://learn.microsoft.com/en-us/dotnet/api/microsoft.windowsazure.storage.queue.cloudqueue.addmessage?view=azure-dotnet
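The effect of that third option can be sketched with a toy in-memory model (all names here are illustrative; a real implementation would call the Storage SDK with the initialVisibilityDelay parameter from the linked docs instead):

```python
import heapq
import time

class DelayQueue:
    """Toy model of a storage queue whose messages can be enqueued
    with a visibility delay, as initialVisibilityDelay allows."""
    def __init__(self):
        self._heap = []
        self._seq = 0
    def send(self, msg, visibility_delay=0.0):
        # Messages become visible only after their delay elapses.
        heapq.heappush(self._heap, (time.monotonic() + visibility_delay, self._seq, msg))
        self._seq += 1
    def receive(self):
        if self._heap and self._heap[0][0] <= time.monotonic():
            return heapq.heappop(self._heap)[2]
        return None  # nothing visible yet

q = DelayQueue()
q.send("bad-message", visibility_delay=0.2)  # re-queue behind the good ones
q.send("good-1")
q.send("good-2")

processed = []
while len(processed) < 3:
    msg = q.receive()
    if msg is None:
        time.sleep(0.01)  # nothing visible yet; poll again
    else:
        processed.append(msg)
print(processed)  # ['good-1', 'good-2', 'bad-message']
```

The "bad" message stops blocking the "good" ones: it simply reappears later, giving the API's transient error time to clear.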
Edit: Here's a cheatsheet of parameter binding types

How to get the status of all requests to one API in nodejs

I want to get the API server status in Node.js. I'm using Node.js to expose an endpoint: "api/request?connId=50&timeout=90". This API keeps the request running for the provided time on the server side; after the provided time has elapsed, it should return status/OK. When we have multiple connection IDs and timeouts, we want the API to return all the running requests on the server with their time left until completion, something like below, where 4 and 8 are the connIds and 25 and 15 are the times remaining for the requests to complete (in seconds):
{"4":"25","8":"15"}
Please help.
A Node.js server uses an async model in a single thread, which means that at any time only one request (connId) is being executed by Node (unless you have multiple Node.js instances, but let's keep the scenario simple and ignore that case).
When one request is processed (running its handler code), it may start an async task such as reading a file, and continue execution. The request's handler code is executed without waiting for the async task, and when the handler code finishes running, from Node.js's point of view the request handling itself is done; handling the async task's result is another thing at another time, and Node does not track its progress.
Thus, in order to return the remaining time of all requests (I assume this means the remaining time of each request's async task, since the remaining time of another request's handler-code execution would not make sense), there must be some place to store the information of all requests, including:
the request's connId and startTime (the time when the request was received);
the request's timeout value, which is passed as a parameter in the URL;
the request's estimated remaining time; this information is task-specific and must be retrieved from the services behind the async task (you can poll it periodically using setInterval, or have the other services push the latest remaining time). Node.js itself doesn't know the remaining time of any async task.
In this way, you can track all running requests and their remaining time. Before one request returns, you can check the above "some place" to calculate all requests' remaining time. This "some place" could be a global variable, an in-memory database such as Redis, or even a plain database such as MySQL.
Please note: the calculated remaining time will not be exact, as the read and calculation themselves cost time and introduce error.
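The bookkeeping described above can be sketched in a few lines (shown in Python for brevity; the same shape translates directly to a JavaScript object keyed by connId, and all names here are illustrative):

```python
import math
import time

# In-memory registry of running requests: a stand-in for the "some place"
# mentioned above (in production it could be Redis or MySQL instead).
active_requests = {}

def start_request(conn_id, timeout_s):
    # Record connId, start time, and the timeout passed in the URL.
    active_requests[conn_id] = {"start": time.monotonic(), "timeout": timeout_s}

def finish_request(conn_id):
    active_requests.pop(conn_id, None)

def status_snapshot():
    # Remaining seconds per connId, in the {"4": "25", "8": "15"} shape
    # from the question above.
    now = time.monotonic()
    return {
        str(cid): str(math.ceil(max(0, info["timeout"] - (now - info["start"]))))
        for cid, info in active_requests.items()
    }

start_request(4, 25)
start_request(8, 15)
print(status_snapshot())  # {"4": "25", "8": "15"} right after the requests start
```

The status endpoint would simply return status_snapshot() as JSON; each request's handler calls start_request on arrival and finish_request on completion.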

Azure Request-Response Session Timeout handling

We are using the azure service bus to facilitate the parallel processing of messages through workers listening to a queue.
First an aggregated message is received, and then this message is split into thousands of individual messages, which are posted through a request-response pattern, since we need to know when all messages have been completed in order to run a separate process.
Our issue is that the request-response method has a timeout which is causing the following issue:
Let's say we post 1000 messages to be processed and there is only one worker listening. Messages left in the queue after the timeout expires are discarded, which is something we do not want. If we set the expiry time to a large value that guarantees all messages will be processed, then we run the risk of a message failing and having to wait out the timeout to learn that something has gone wrong.
Is there a way to dynamically change the expiration of a single message in a request-response scenario or any other pattern that we should consider?
Thanks!
You've got this wrong. The time to live of an Azure Service Bus message (https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.brokeredmessage.timetolive.aspx) is how long the message remains on the queue, whether or not it is consumed.
It is not a timeout: if you post a message with a larger time to live, the message will stay on the queue for a long time, but if you fail to consume it, you should warn the other end that you failed to consume this message.
You can do this by using another queue and putting a message on that other queue containing the ID of the message that failed and the error.
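The "report failures on a second queue" idea can be sketched with in-memory queues (queue.Queue stands in for the two Service Bus queues; all names are illustrative):

```python
import queue

work_queue = queue.Queue()   # stand-in for the main processing queue
error_queue = queue.Queue()  # stand-in for the failure-reporting queue

def consume(msg):
    try:
        if msg["body"] == "bad":
            raise ValueError("cannot process")
        # ... normal processing would happen here ...
    except Exception as exc:
        # Report the failure immediately instead of letting the requester
        # wait for the message's time to live to expire.
        error_queue.put({"message_id": msg["id"], "error": str(exc)})

work_queue.put({"id": 1, "body": "ok"})
work_queue.put({"id": 2, "body": "bad"})
while not work_queue.empty():
    consume(work_queue.get())

print(error_queue.qsize())  # 1 failure reported
```

The process that posted the original messages then listens on the error queue (alongside the completion signals) and learns about failures right away.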
This is an asynchronous process, so you should not be holding requests open based on it; instead, work with the asynchronous nature of the problem.