Will WebJob scale-in kill "busy" instances? (Azure)

Let's say I have a webjob that consumes messages from a storage queue. I am planning a scale rule to scale-out when my queue has too many messages waiting to be picked up.
Some of the messages will take a long time to process.
My question is: if scale-out happens and then the scale-in rule kicks in (when the number of messages in the queue decreases), will Azure wait until the messages have finished processing before killing the instances, or will it just kill them right away?

Update 01/20:
For auto-scaling, you don't need to worry: it will wait for the instance to finish its job.
But for manual scale-in, it does not wait; it kills the instance right away.
Original:
For manual scale-in, no, it does not kill the busy instance. Azure Web App / WebJobs uses a load-balancing strategy across multiple instances, balancing the workload and routing the queue messages to the three instances respectively. That means the three instances work in parallel.
I didn't find any official documentation about this, but it's easy to test out.
I set up a queue-triggered WebJob and uploaded it as a continuous WebJob in Azure. After scaling out, you can see that all three instances are working.
Note: instance ID 940246 is the original one; the other two instance IDs, 4c7ed0 and f3753c, appear after scale-out.
In the Kudu site, all three instances are listed (screenshots omitted), and the WebJob logs show messages being handled by each of them.
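For reference, a minimal queue-triggered function like the sketch below (the queue name is illustrative) is enough to reproduce this test; logging the WEBSITE_INSTANCE_ID environment variable shows which instance picked up each message:

    using System;
    using System.IO;
    using Microsoft.Azure.WebJobs;

    public class Functions
    {
        // Fires whenever a message appears on the (illustrative) "test-queue" queue.
        public static void ProcessQueueMessage([QueueTrigger("test-queue")] string message, TextWriter log)
        {
            // WEBSITE_INSTANCE_ID identifies the App Service instance running this invocation.
            string instanceId = Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID");
            log.WriteLine($"Instance {instanceId} processed message: {message}");
        }
    }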

Related

Is it possible to stop MassTransit service after a saga/consumer completes

So I know this runs a bit counter to MassTransit style, but I want to take advantage of some key features of MT such as message broker connection management, sagas, scheduled messages.
However, I know the service will be rarely used. This is a fairly large data take from an API which has a throttle of 12,000 requests per hour. Once every 24 hours a saga will start to take data and move it into Data Lake. The service will run for some minutes until the throttle is hit, then start again where it left off (state) when enough time has passed, maybe something like 30 minutes later. The amount of data means this will repeat for several hours (2 to 4).
The fit for a saga and a scheduled message seems pretty good. But it would be better if the service did not have to incur operating costs for being awake 24x7. There will only ever be one request at a time for one set of API credentials. There may come a time when we might have multiple sets of credentials.
Is there a way to nicely close down the service when the saga completes?
As this is likely to be implemented with a container instance, I propose to start an instance from a queue-triggered function or similar.
Assuming that this is the approach you want to take (versus just an Azure Web Job, triggered by Azure Scheduler), there are a number of options:
Publish an event when the saga completes, consume that event, and use Task.Run() or whatever to stop the bus (a sketch of this option follows below).
Use a receive observer to keep track of in-flight messages; when the count reaches zero and stays there for n seconds, stop the bus and exit the function.
Though I wonder why not just use a scheduled job via Azure; that seems easier unless MassTransit is being used for more than just scheduling.
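A minimal sketch of the first option, assuming a hypothetical SagaCompleted event published from the saga's final state and that IBusControl is resolvable from the container:

    using System;
    using System.Threading.Tasks;
    using MassTransit;

    // Hypothetical event published by the saga when it reaches its final state.
    public class SagaCompleted
    {
        public Guid CorrelationId { get; set; }
    }

    public class SagaCompletedConsumer : IConsumer<SagaCompleted>
    {
        private readonly IBusControl _bus;

        public SagaCompletedConsumer(IBusControl bus) => _bus = bus;

        public Task Consume(ConsumeContext<SagaCompleted> context)
        {
            // Stop the bus on a separate task so we don't block the receive pipeline
            // that is currently delivering this message; the host can then exit.
            _ = Task.Run(() => _bus.StopAsync());
            return Task.CompletedTask;
        }
    }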

Can Azure Function App processing time be extended by extending the QueueMessage's invisibility until it is processed?

I am working with Azure Function Apps in Python. The app has two functions, an HTTPTrigger and a QueueTrigger; in the QueueTrigger I call my custom code, which takes more than 10 minutes to process. I already changed the limit from 5 to 10 minutes in host.json ({"functionTimeout": "00:10:00"}). My question is: is there a way to extend the processing time by updating the QueueMessage content, its visibilityTimeout, or its timeout? In other words, would the Function App's processing time be extended if you extend the message's invisibility until it is processed? See the Python API QueueService.update_message().
Are there any other serverless options for running long processes?
Updates the visibility timeout of a message. You can also use this operation to update the contents of a message.
This operation can be used to continually extend the invisibility of a queue message. This functionality can be useful if you want a worker role to "lease" a queue message. For example, if a worker role calls get_messages and recognizes that it needs more time to process a message, it can continually extend the message's invisibility until it is processed. If the worker role were to fail during processing, eventually the message would become visible again and another worker role could process it.
update_message(queue_name, message_id, pop_receipt, visibility_timeout, content=None, timeout=None)
If you need Functions that can run longer than 10 minutes, you need to switch to an App Service plan. There you can run Functions indefinitely: https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#timeout
Be aware, though, that this isn't fully "serverless" any more in terms of scaling. An App Service plan won't scale more or less indefinitely the way the Consumption plan does, and you pay a fixed price for the plan.
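As a sketch, once on an App Service plan the limit can simply be raised in host.json (the exact upper bound depends on the plan and runtime version):

    {
      "functionTimeout": "02:00:00"
    }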

Failure handling for Queue Centric work pattern

I am planning to use a queue-centric design as described here for one of my applications. That essentially consists of using an Azure queue where work requests are queued from the UI. A worker reads from the queue, processes the message, and deletes it from the queue.
The 'work' done by the worker is within a transaction, so if the worker fails before completing, upon restart it picks up the same message again (as it has not been deleted from the queue) and tries to perform the operation again (up to a maximum number of retries).
To scale I could use two methods:
Multiple workers, each with a separate queue. So if I have five workers W1 to W5, I have 5 queues Q1 to Q5, and each worker knows which queue to read from; failure handling is similar to the case of one queue and one worker.
One queue and multiple workers. Here failure/retry handling would be more involved, and might end up using the 'invisibility' time in the message queue to make sure no two workers pick up the same job. The invisibility time would have to be calculated to make sure that it's long enough for the job to complete, yet not so large that retries happen only after a long delay.
I would like to know if the 1st approach is the correct way to go. What are robust ways of handling failures in the second approach above?
You would be better off taking approach 2 - a single queue, but with multiple workers.
This is better because:
The process that delivers messages to the queue only needs to know about a single queue endpoint. This reduces complexity at this end;
Scaling the number of workers that are pulling from the queue is now decoupled from any code / configuration changes - you can scale up and down much more easily (and at runtime)
If you are worried about the visibility, you can initially choose a default timespan, and then if the worker looks like it's taking too long, it can periodically call UpdateMessage() to update the visibility of the message.
Finally, if your worker times out and fails to complete processing of the message, it'll be picked up again by some other worker to try again. You can also use the DequeueCount property of the message to manage the number of retries.
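A minimal sketch of a worker under approach 2, assuming the classic WindowsAzure.Storage queue SDK; the queue name, lease lengths, and DoWork are illustrative only:

    using System;
    using Microsoft.WindowsAzure.Storage.Queue;

    class Worker
    {
        const int MaxRetries = 5;

        static void ProcessNext(CloudQueue queue)
        {
            // Hide the message from other workers for an initial 5-minute lease.
            CloudQueueMessage message = queue.GetMessage(TimeSpan.FromMinutes(5));
            if (message == null) return;

            // Poison-message handling: give up after too many dequeues.
            if (message.DequeueCount > MaxRetries)
            {
                queue.DeleteMessage(message);
                return;
            }

            // If the work looks like it will outlast the lease, extend it periodically:
            // queue.UpdateMessage(message, TimeSpan.FromMinutes(5), MessageUpdateFields.Visibility);

            DoWork(message.AsString);   // placeholder for the real processing

            // Delete only after processing completes; if the worker dies first,
            // the message becomes visible again and another worker retries it.
            queue.DeleteMessage(message);
        }

        static void DoWork(string payload) { /* ... */ }
    }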
Multiple workers each with a separate queue. So if I have five workers W1 to W5, I have 5 queues Q1 to Q5 and each worker knows which queue to read from, and failure handling is similar to the case with one queue and one worker.
With this approach I see following issues:
This approach makes your architecture tightly coupled (thus defeating the whole purpose of using queues). Because each worker role listens to a dedicated queue, the web application responsible for pushing messages into the queues always needs to know how many workers are running. Any time you scale your worker role up or down, you somehow need to tell the web application so that it can start pushing messages into the appropriate queue.
If a worker role instance is taken down for whatever reason, there's a possibility that some messages may never be processed, as the other worker role instances are working on their own dedicated queues.
There may be under-utilization or over-utilization of worker role instances depending on how the web application pushes messages into the queues. For optimal utilization, the web application would need to know about worker role utilization so that it can decide which queue to send a message to. That is certainly not something a web application should have to do.
I believe #2 is the correct way to go. @Brendan Green has covered your concerns about #2 excellently in his answer.

Can Azure WebJobs poll queues on demand?

I have a WebJob which gets triggered when a user uploads a file to the blob storage - it is triggered by a queue storage message which is created once the upload is complete.
Depending on the purpose of the file, it will post messages to other queues to trigger processing jobs.
Some of these jobs are time critical, and run relatively quickly. In one case the processing takes about three seconds, and the user is waiting for the result.
However, because the minimum queue polling interval is 2 seconds, the latency of invoking the two WebJobs in sequence roughly doubles the user's wait time.
I tried combining the two WebJobs into one, hoping that when the first handler posts a queue message the corresponding processing handler would be immediately triggered, but in fact it consistently waits two seconds before picking up the message.
My question is, is there a way to tell my WebJob to check the queue triggers immediately from within the same WebJob if I know there is a message waiting? Or, even better, to configure it to immediately check the queue triggers when I post to a queue from inside the WebJob?
Or would switching to a service bus queue improve the responsiveness to new messages?
Update
In the docs about using blob triggers, it says:
There is an exception for blobs that you create by using the Blob attribute. When the WebJobs SDK creates a new blob, it passes the new blob immediately to any matching BlobTrigger functions. Therefore if you have a chain of blob inputs and outputs, the SDK can process them efficiently. But if you want low latency running your blob processing functions for blobs that are created or updated by other means, we recommend using QueueTrigger rather than BlobTrigger.
http://azure.microsoft.com/en-gb/documentation/articles/websites-dotnet-webjobs-sdk-storage-blobs-how-to/
However, there is no mention of anything similar for queues. That would mean that if you need really low latency in this scenario, blobs are better than queues, which seems wrong.
Update 2
I ended up working around this by pulling the orchestrating code out of the first WebJob and into the service layer of the application and removing the WebJob; it was fast-running anyway, so perhaps separating it into its own WebJob was overkill. This means only the processing WebJob has to be triggered after the file upload.
Currently 2 seconds is the minimum time it takes for the SDK to poll for a new message. The SDK uses exponential back-off polling, so you can configure MaxPollingInterval to cap how far the interval backs off (down to the 2-second minimum), for example:
config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(15);
For more details please see http://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-storage-queues-how-to/#config
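A minimal sketch of the host setup with the classic WebJobs SDK; the 2-second floor still applies even if you set the value lower:

    using System;
    using Microsoft.Azure.WebJobs;

    class Program
    {
        static void Main()
        {
            var config = new JobHostConfiguration();
            // Cap the exponential back-off so the SDK keeps polling every 2 seconds.
            config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(2);

            var host = new JobHost(config);
            host.RunAndBlock();
        }
    }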

Controlling Azure worker role concurrency across multiple instances

I have a simple worker role in Azure that does some data processing on a SQL Azure database.
The worker basically adds data from a 3rd-party data source to my database every 2 minutes. When I have two instances of the role, this obviously doubles the work unnecessarily. I would like to have 2 instances for redundancy and the 99.95% uptime SLA, but do not want them both processing at the same time, as they will just duplicate the same job. Is there a standard pattern for this that I am missing?
I know I could set flags in the database, but am hoping there is another easier or better way to manage this.
Thanks
As Mark suggested, you can use an Azure queue to post a message. You can have the worker role instance post a follow-up message to the queue as the last thing it does when processing the current message. That should deal with the issue Mark brought up regarding the need for a semaphore. In your queue message, you can embed a timestamp marking when the message can be processed. When creating a new message, just add two minutes to the current time.
And... in case it's not obvious: in the event the worker role instance crashes before completing processing and fails to repost a new queue message, that's fine. In this case, the current queue message will simply reappear on the queue and another instance is then free to process it.
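A minimal sketch of this pattern, assuming the classic WindowsAzure.Storage queue SDK, an illustrative queue and ProcessData method, and using the queue's initialVisibilityDelay in place of an embedded timestamp:

    using System;
    using Microsoft.WindowsAzure.Storage.Queue;

    class SchedulerLoop
    {
        // Called repeatedly from the worker role's Run() loop.
        static void HandleToken(CloudQueue queue)
        {
            CloudQueueMessage token = queue.GetMessage(TimeSpan.FromMinutes(5));
            if (token == null) return;   // another instance holds the token

            ProcessData();   // pull from the 3rd-party source into SQL Azure

            // Remove the current token, then re-post the next one, invisible for
            // 2 minutes, as the very last step. If the instance crashes before this
            // point, the current token simply reappears for another instance.
            // (A later answer below discusses a catch with this ordering.)
            queue.DeleteMessage(token);
            queue.AddMessage(new CloudQueueMessage("tick"),
                             timeToLive: null,
                             initialVisibilityDelay: TimeSpan.FromMinutes(2));
        }

        static void ProcessData() { /* ... */ }
    }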
There is not a super easy way to do this, I don't think.
You can use a semaphore as Mark has mentioned, to basically record the start and the stop of processing. Then you can have any number of instances running, each inspecting the semaphore record and only acting if the semaphore allows it.
However, the caveat here is: what happens if one of the instances crashes in the middle of processing and never releases the semaphore? You can implement a "timeout" value after which other instances will attempt to kick-start processing if there hasn't been an unlock for X amount of time.
Alternatively, you can use a third-party monitoring service like AzureWatch to watch for unresponsive instances in Azure and start a new instance if the number of "Ready" instances is under 1. This can save you some money by not having to keep 2 instances up and running all the time, but there is a slight lag between when an instance fails and when a new one is started.
A semaphore as suggested would be the way to go, although I'd probably go with a simple timestamp heartbeat in blob storage.
The other thought is, how necessary is it? If your loads can sustain being down for a few minutes, maybe just let the role recycle?
Small catch with David's solution: re-posting the message to the queue happens as the last thing in the current execution, so that if the machine crashes along the way the current message expires and re-surfaces on the queue. That assumes the message was originally peeked and requires a de-queue operation to remove it from the queue. The de-queue must happen before inserting the new message into the queue, and if the role crashes between these two operations, there will be no tokens left in the system and everything comes to a halt.
The ESB dup check sounds like a feasible approach, but it does not sound like it would be deterministic either, since the bus can only check for identical messages currently existing in a queue. If one of the messages comes in right after the previous one was de-queued, there is a chance of ending up with two processes running in parallel.
An alternative solution, if you can afford it, would be to never de-queue and just lease the message via Peek operations. You would have to ensure that the invisibility timeout never goes beyond the processing time in your worker role. As far as creating the token in the first place, the same worker role startup strategy described before combined with ASB dup check should work (since messages would never move from the queue).
