Can Azure WebJobs poll queues on demand?

I have a WebJob which gets triggered when a user uploads a file to the blob storage - it is triggered by a queue storage message which is created once the upload is complete.
Depending on the purpose of the file, it will post messages to other queues to trigger processing jobs.
Some of these jobs are time critical, and run relatively quickly. In one case the processing takes about three seconds, and the user is waiting for the result.
However, because the minimum queue polling interval is 2 seconds, waiting for the two WebJobs to be invoked in sequence roughly doubles the user's wait time.
I tried combining the two WebJobs into one, hoping that when the first handler posts a queue message the corresponding processing handler would be immediately triggered, but in fact it consistently waits two seconds before picking up the message.
My question is, is there a way for me to tell my WebJob to check the queue triggers immediately from within the same WebJob if I know there is a message waiting? Or even better configure it to immediately check the queue triggers if I post to a queue from inside the WebJob?
Or would switching to a service bus queue improve the responsiveness to new messages?
Update
In the docs about using blob triggers, it says:
There is an exception for blobs that you create by using the Blob attribute. When the WebJobs SDK creates a new blob, it passes the new blob immediately to any matching BlobTrigger functions. Therefore if you have a chain of blob inputs and outputs, the SDK can process them efficiently. But if you want low latency running your blob processing functions for blobs that are created or updated by other means, we recommend using QueueTrigger rather than BlobTrigger.
http://azure.microsoft.com/en-gb/documentation/articles/websites-dotnet-webjobs-sdk-storage-blobs-how-to/
However, there is no mention of anything similar for queues, meaning that if you need really low latency in this scenario then blobs are better than queues, which seems wrong.
Update 2
I ended up working around this by pulling the orchestrating code out of the first WebJob and into the service layer of the application, and removing that WebJob. It was fast-running anyway, so perhaps separating it into its own WebJob was overkill. This means only the processing WebJob has to be triggered after the file upload.

Currently 2 seconds is the minimum time the SDK will wait before polling for a new message. The SDK uses exponential back-off polling, but MaxPollingInterval caps that back-off, so you can configure it all the way down to 2 seconds to keep polling at the minimum. The docs show the setting like this:
config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(15);
For more details please see http://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-storage-queues-how-to/#config
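For completeness, here is roughly where that setting lives in a WebJobs SDK 1.x/2.x host. This is only a sketch: the 2-second value just pins polling at the minimum mentioned above, and the BatchSize line is purely illustrative.

// Program.cs of a continuous WebJob (WebJobs SDK 1.x/2.x style).
using System;
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();

        // The SDK backs off exponentially while the queue is idle;
        // MaxPollingInterval caps that back-off. 2 seconds is the floor
        // mentioned in the answer above.
        config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(2);

        // Illustrative only: how many queue messages are fetched and
        // processed in parallel per instance.
        config.Queues.BatchSize = 16;

        new JobHost(config).RunAndBlock();
    }
}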

Related

Azure Function Queue Trigger: queue message intermittently dropped even before it gets picked up by the Azure Function

I have a queue-triggered Azure Function which is triggered whenever a queue message appears in Azure Queue Storage.
My workflow is:
A user may schedule a task which needs to run after a few days at a particular time (the execute-at time).
So I put a message in the Azure queue with the visibility timeout set to the time difference between the current time and the task's execute-at time.
When the message becomes visible in the queue, it gets picked up by the Azure Function and the task gets executed.
I'm facing an intermittent issue where a queue message that is supposed to become visible after a few days (< 7 days) somehow gets dropped/removed from the queue. So it is never picked up by the function, and the task still shows as pending.
I've gone through all the articles I could find on the internet and didn't find a solution to my problem.
The worst part is that it works fine for a few weeks, but every now and then the invisible queue messages suddenly disappear. (I use Azure Storage Explorer to check the number of invisible messages.)

How do you scale an Azure Function app (background job) based on the number of items pending in a database?

So suppose that you have an application that lets users request a job. For example (hypothetical): a user uploads a video. An entry is made in an RDBMS with the URL of the video blob and the status set to "Pending".
There is a recurring, timer-triggered function app that executes every 10 seconds or so, gets 10 pending jobs from the RDBMS, and performs some compression etc.
The problem is that as long as the number of requests stays around 10-30 videos per 10 seconds we should be fine. But if the number of requests suddenly increases, say to 200 requests per 10 seconds, there will be a lot of jobs pending and the user would have to wait 10 times longer than usual to see the status change. How do you scale out the function app automatically in such a scenario? Does it have to be manual?
There's an easier way to get fan out and parallel processing through multiple concurrently running Azure Functions.
Add an Azure Service Bus Queue to your solution.
For each video that needs to be processed, enqueue a service bus message with the appropriate data you'll need to retrieve and process the video (like the BlobId).
Have your Azure Function triggered by a ServiceBusTrigger.
Azure will spin up additional instances of your Azure Function as the queue depth increases. It'll also scale in idle instances after there's no more data to process.
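As a hedged sketch of that fan-out pattern (the queue name, the connection setting name, and the message payload are all assumptions, and it presumes the Microsoft.Azure.WebJobs.Extensions.ServiceBus package), a Service-Bus-triggered C# Function could look something like this:

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class VideoFunctions
{
    // Azure scales out instances of this function based on the depth of the
    // "video-jobs" queue; each message carries just enough data to find the video.
    [FunctionName("CompressVideo")]
    public static void Run(
        [ServiceBusTrigger("video-jobs", Connection = "ServiceBusConnection")] string blobId,
        ILogger log)
    {
        log.LogInformation("Compressing video for blob {BlobId}", blobId);
        // ... download the blob, compress it, update the job row to "Done" ...
    }
}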

Can Azure Function App processing time be extended by extending the QueueMessage's invisibility until it is processed?

I am working with an Azure Function App in Python that has two functions, an HTTPTrigger and a QueueTrigger. In the QueueTrigger I call my custom code, which takes more than 10 minutes to process. I already raised the timeout from 5 to 10 minutes in host.json ({"functionTimeout": "00:10:00"}). My question is: is there a way to extend the processing time by updating the QueueMessage's content, visibilityTimeout, or timeout? In other words, would the Function App's processing time be extended if you extend the message's invisibility until it is processed? (See the Python API QueueService.update_message().)
Are there any other serverless options for running long processes?
Updates the visibility timeout of a message. You can also use this operation to update the contents of a message.
This operation can be used to continually extend the invisibility of a queue message. This functionality can be useful if you want a worker role to "lease" a queue message. For example, if a worker role calls get_messages and recognizes that it needs more time to process a message, it can continually extend the message's invisibility until it is processed. If the worker role were to fail during processing, eventually the message would become visible again and another worker role could process it.
update_message(queue_name, message_id, pop_receipt, visibility_timeout, content=None, timeout=None)
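Since the rest of this page uses the C#/WebJobs stack, here is a hedged sketch of that same "lease" pattern with the current Azure.Storage.Queues client; the queue name and the work stubs are made up, and note this only controls how long the message stays hidden in the queue, not the Functions host timeout discussed in the answer below.

using System;
using Azure.Storage.Queues;

var queue = new QueueClient("<connection-string>", "long-running-jobs");

// Take the message and hide it for 5 minutes.
var message = queue.ReceiveMessage(visibilityTimeout: TimeSpan.FromMinutes(5)).Value;
if (message != null)
{
    int stepsDone = 0;
    bool WorkIsDone() => stepsDone >= 3;   // stand-in for a real completion check
    void DoSomeWork() => stepsDone++;      // stand-in for one slice of the long job

    string popReceipt = message.PopReceipt;
    while (!WorkIsDone())
    {
        DoSomeWork();
        // Re-hide the message for another 5 minutes; every update returns a new pop receipt.
        popReceipt = queue.UpdateMessage(message.MessageId, popReceipt,
            visibilityTimeout: TimeSpan.FromMinutes(5)).Value.PopReceipt;
    }
    queue.DeleteMessage(message.MessageId, popReceipt);
}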
If you need Functions that can run longer than 10 minutes, you need to switch to an App Service Plan. There you can run Functions indefinitely: https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#timeout
Be aware, though, that this isn't fully "serverless" any more in terms of scaling. An App Service plan won't scale more or less indefinitely the way the Consumption plan does. Plus, you pay a fixed price for the App Service plan.

Requeue or delete messages in Azure Storage Queues via WebJobs

I was hoping if someone can clarify a few things regarding Azure Storage Queues and their interaction with WebJobs:
To perform recurring background tasks (i.e. add to queue once, then repeat at set intervals), is there a way to update the same message delivered in the QueueTrigger function so that its lease (visibility) can be extended as a way to requeue and avoid expiry?
With the above-mentioned pattern for recurring background jobs, I'm also trying to figure out a way to delete/expire a job 'on demand'. Since this doesn't seem possible outside the context of WebJobs, I was thinking of maybe storing the messageId and popReceipt for the message(s) to be deleted in Table storage as persistent cache, and then upon delivery of message in the QueueTrigger function do a Table lookup to perform a DeleteMessage, so that the message is not repeated any more.
Any suggestions or tips are appreciated. Cheers :)
Azure Storage Queues are used to store messages that may be consumed by your Azure Webjob, WorkerRole, etc. The Azure Webjobs SDK provides an easy way to interact with Azure Storage (that includes Queues, Table Storage, Blobs, and Service Bus). That being said, you can also have an Azure Webjob that does not use the Webjobs SDK and does not interact with Azure Storage. In fact, I do run a Webjob that interacts with a SQL Azure database.
I'll briefly explain how the Webjobs SDK interacts with Azure Queues. Once a message arrives in a queue (or is made 'visible', more on this later) the function in the Webjob is triggered (assuming you're running in continuous mode). If that function returns with no error, the message is deleted. If something goes wrong, the message goes back to the queue to be processed again. You can handle the failed message accordingly. Here is an example of how to do this.
The SDK will call a function up to 5 times to process a queue message. If the fifth try fails, the message is moved to a poison queue. The maximum number of retries is configurable.
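As a hedged illustration (the queue names are made up): the retry limit is config.Queues.MaxDequeueCount on the host, and messages that exhaust it land on a queue named <original>-poison, which you can bind a second function to.

using Microsoft.Azure.WebJobs;

public class Functions
{
    // Normal processing. If this throws, the SDK retries, and after
    // MaxDequeueCount failures the message is moved to "orders-poison".
    public static void ProcessOrder([QueueTrigger("orders")] string message)
    {
        // ... process the order; throw to signal failure ...
    }

    // Pick up the messages the SDK has given up on.
    public static void HandlePoisonOrder([QueueTrigger("orders-poison")] string message)
    {
        // ... log, alert, or park the message for manual inspection ...
    }
}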
Regarding visibility, when you add a message to the queue, there is a visibility timeout property. By default it is zero. Therefore, if you want to process a message in the future you can do so (up to 7 days in the future) by setting this property to the desired value.
Optional. If specified, the request must be made using an x-ms-version of 2011-08-18 or newer. If not specified, the default value is 0. Specifies the new visibility timeout value, in seconds, relative to server time. The new value must be larger than or equal to 0, and cannot be larger than 7 days. The visibility timeout of a message cannot be set to a value later than the expiry time. visibilitytimeout should be set to a value smaller than the time-to-live value.
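So, for instance, a message that should only trigger processing two days from now could be enqueued roughly like this. This is a sketch using the classic Microsoft.WindowsAzure.Storage client of that era; the queue name and payload are made up.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

var account = CloudStorageAccount.Parse("<storage-connection-string>");
var queue = account.CreateCloudQueueClient().GetQueueReference("scheduled-tasks");
queue.CreateIfNotExists();

// The message stays invisible for 2 days, then triggers the QueueTrigger function.
queue.AddMessage(
    new CloudQueueMessage("{\"taskId\": 42}"),
    timeToLive: null,                               // keep the default 7-day TTL
    initialVisibilityDelay: TimeSpan.FromDays(2));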
Now the suggestions for your app.
I would just add a message to the queue for every task that you want to accomplish. The message will obviously have the pertinent information for processing. If you need to schedule several tasks, you can run a Scheduled Webjob (on a schedule of your choice) that adds messages to the queue. Then your continuous Webjob will pick up that message and process it.
Add a GUID to each message that goes to the queue. Store that GUID in some other domain of your application (a database). So when you dequeue the message for processing, the first thing you do is check against your database whether the message still needs to be processed. If you need to cancel the execution of a message, instead of deleting it from the queue, just update the GUID in your database.
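A hedged sketch of that second suggestion; the ScheduledTask shape and the JobStore lookup are hypothetical stand-ins for whatever persistence your app already has.

using System;
using Microsoft.Azure.WebJobs;

public class ScheduledTask
{
    public Guid Id { get; set; }
    public string Payload { get; set; }
}

public static class JobStore
{
    // Stand-in for a real database lookup: returns false if the task was cancelled.
    public static bool IsStillScheduled(Guid id) => true;
}

public class Functions
{
    public static void RunScheduledTask([QueueTrigger("scheduled-tasks")] ScheduledTask task)
    {
        // A cancelled task simply gets dropped when its message finally arrives.
        if (!JobStore.IsStillScheduled(task.Id))
            return;

        // ... do the real work for task.Payload ...
    }
}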
There's more info here.
Hope this helps,
As for the first part of the question, you can use the Update Message operation to extend the visibility timeout of a message.
The Update Message operation can be used to continually extend the invisibility of a queue message. This functionality can be useful if you want a worker role to "lease" a queue message. For example, if a worker role calls Get Messages and recognizes that it needs more time to process a message, it can continually extend the message's invisibility until it is processed. If the worker role were to fail during processing, eventually the message would become visible again and another worker role could process it.
You can check the REST API documentation here: https://msdn.microsoft.com/en-us/library/azure/hh452234.aspx
For the second part of your question, there are really multiple ways, and your method of storing the id/popReceipt as a lookup is a possible option. You can also have a WebJob dedicated to receiving messages on a different queue (e.g. plz-delete-msg); you send a message containing the messageId, and this WebJob can then delete the target message. (You can make the job generic by passing the queue name!)
https://msdn.microsoft.com/en-us/library/azure/dd179474.aspx
https://msdn.microsoft.com/en-us/library/azure/dd179347.aspx
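A hedged sketch of that dedicated-queue idea, combined with the id/popReceipt you were already planning to store; all names and the payload shape are assumptions, and the stored pop receipt must still be valid for the delete to succeed.

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

public class DeleteRequest
{
    public string QueueName { get; set; }
    public string MessageId { get; set; }
    public string PopReceipt { get; set; }
}

public class Functions
{
    // Post a DeleteRequest to "plz-delete-msg" and this function removes the
    // target message from whichever queue it lives on.
    public static void DeleteOnDemand([QueueTrigger("plz-delete-msg")] DeleteRequest request)
    {
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"));
        var queue = account.CreateCloudQueueClient().GetQueueReference(request.QueueName);
        queue.DeleteMessage(request.MessageId, request.PopReceipt);
    }
}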

How to handle long-running jobs that are posted to a Service Bus queue with only a 5 min peek lock

What do people tend to do when they have a design that puts jobs on a Service Bus queue or topic and those jobs take longer than the 5-minute maximum of the peek lock?
I have been using the OnMessage(...) async message pump of Service Bus and am wondering if that's not such a good idea after all, since if I start moving the jobs to a table while processing them, the message pump will just empty the queue and I simply have the problem elsewhere of making sure my jobs are scheduled evenly between servers.
If you have a long-running message processing workflow, then you can check the LockedUntilUtc property of the message and call RenewLock at the appropriate time.
http://msdn.microsoft.com/en-us/library/windowsazure/microsoft.servicebus.messaging.brokeredmessage.renewlock.aspx
In the next release of the SDK the OnMessage processing loop will do that for you automatically, so that convenience API is always a good idea to use.
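A hedged sketch of that manual renewal using the Microsoft.ServiceBus.Messaging client of that era; the queue name and the work stub are made up, and later SDK versions expose OnMessageOptions.AutoRenewTimeout to handle this for you.

using System;
using Microsoft.ServiceBus.Messaging;

public class LongJobPump
{
    public static void Start(string connectionString)
    {
        var client = QueueClient.CreateFromConnectionString(connectionString, "long-jobs");
        var options = new OnMessageOptions { AutoComplete = false, MaxConcurrentCalls = 1 };

        client.OnMessage(message =>
        {
            while (!ProcessNextChunk(message))
            {
                // Extend the peek lock before it expires so the message isn't redelivered.
                if (message.LockedUntilUtc - DateTime.UtcNow < TimeSpan.FromMinutes(1))
                    message.RenewLock();
            }
            message.Complete();
        }, options);
    }

    // Stand-in for one slice of the real job; returns true when the job is finished.
    static bool ProcessNextChunk(BrokeredMessage message) => true;
}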
