NServiceBus and Azure long running handler pattern - azure

We are using Azure service bus via NServiceBus and I am facing a problem with deciding the correct architecture for dealing with long running tasks as a result of messages.
As is good practice, we don't want to block the message handler from returning by making it wait for long running processes (downloading a large file from a remote server), and actually doing so will cause the lock on the message to be lost with Azure SB. The plan is to respond by spawning a separate task and allow the message handler to return immediately.
However this means that the handler is now immediately available for the next message which will cause another task to be spawned and so on until the message queue is empty. What I'd like is some way to stop taking messages while we are processing (a limited number of) earlier messages. Is there an accepted pattern for this with NServiceBus and Azure Service Bus?
The following is what I'd kind of do if I was programming directly against the Azure SB
{
while(true)
{
var message = bus.Next();
message.Complete();
// Do long running stuff here
}
}
The verbs Next and Complete are probably wrong but what happens under Azure is that Next gets a temporary lock on the message so that other consumers can no longer see the message. Then you can decide if you really want to process the message and if so call Complete. That then removes the message from the queue entirely, failing to do so will cause the message to appear back on the queue after a period of time as Azure assumes you crashed. As dirty as this code looks it would achieve my goals (so why not do it?) as my consumer is only going to consume the next time I'm available (after the long running task). Other consumers (other instances) can jump in if necessary.
The problem is that NServiceBus adds a level of abstraction so that now handling a message is via a method on a handler class.
void Handle(NewFileMessage message)
{
// Do work here
}
The problem is that Azure does not get the call to message.Complete() until after your work and after the Handle method exits. This is why you need to keep the work short. However if you exit you also signal that you are ready to handle another message. This is my Catch 22

Downloading on a background thread is a good idea. You don't want to to increase lock duration, because that's a symptom, not the problem. Your download can easily get longer than maximum lock duration (5mins) and then you're back to square one.
What you can do is have an orchestrating saga for download. Saga can monitor the download process and when download is completed, b/g process would signal to the saga about completion. If download is never finished, you can have a timeout (or multiple timeouts) to indicate that and have a compensating action or retry, whatever works for your business case.
Documentation on Sagas should get you going: http://docs.particular.net/nservicebus/sagas/

In Azure Service Bus you can increase the lock duration of a message (default set to 30 seconds) in case the handling will take a long time.
But, besides you are able to increase your lock duration, it's generally an indication that your handler takes care of to much work which can be divided over different handlers.

If it is critical that the file is downloaded, I would keep the download operation in the handler. That way if the download fails the message can be handled again and the download can be retried. If however you want to free up the handler instantly to handle more messages, I would suggest that you scale out the workers that perform the download task so that the system can cope with the demand.

Related

What happens to the messages being processed on functions running when we disable the function?

We are working with Azure functions, which are triggered on every message in the service bus queue. We are trying to solve a problem whereby we need to disable a function on the function app processing messages, dynamically, so that it does not process messages any further and we do not lose any message in the process as well.
We can disable the functions via multiple ways, referring to link but the problem remains the same. Unable to figure out what happens to the functions already spawned when trying to disable the same.
Since the function is service bus triggered there is always a possibility that the function is processing a message and we disable the same, does it get processed, any sorts of cancellation is raised, it just dies out with an exception?
It would be great someone could direct me to some documentation or something. Thanks.
Azure Service Bus triggered function will already have a lock on the message that's being processed. If Function is terminated and the message was not completed or disposition, the lock will expire and the message will reappear on the queue. That's because of the Functions runtime receives a message in PeekLock mode.
One factor to consider is the queue's MaxDeliveryCount. If a function is terminated upon the last processing attempt, the message will be dead-lettered as all processing attempts have been exhausted. That's a standard Azure Service Bus behaviour.

How to handle long running jobs that are posted to a service bus with only 5min peek lock

What do people tend to do when they have a design that put jobs on a service queue or topic that takes longer then the 5min max of peeklock?
I have been using the OnMessage(...) async messagepump of service bus and is wondering if thats not such a good idea after also since if I start moving the jobs to a table while processing them, then the messagepump will just empty the queue and I just have the problem elsewhere of making sure my jobs are scheduled even between servers.
If you have a long running message processing workflow the you can check the lockedUntilUtc property of the message and call RenewLock at the appropriate time.
http://msdn.microsoft.com/en-us/library/windowsazure/microsoft.servicebus.messaging.brokeredmessage.renewlock.aspx
in the next release of SDK the OnMessage processing loop will automatically do that for you so that convenience API is always a good idea to use.

Azure ServiceBus Retry Delay

I am using the Microsoft Azure ServiceBus for Queue messages using WCF for the subscriptions. I am trying to implement retry logic. I use Peak/Lock to view the message and then have to do some local processing on the message. If that processing fails, I unlock the message so I can try processing it again. The problem is I need to build a have a delay in-between processing tries. Currently it is popped back into the queue and then is processed almost immediately. There needs to be about 2 minutes between attempts.
If you always have to wait 2 minutes before re-processing the message of that particular queue, you could try to configure the lock-timeout on the queue to be 2 minutes (plus the time you expect it will take you to process the message) and then just let the lock expire, instead of unlocking it. This has the downside that you would need to keep an eye on your processing time, and extend the lock's timeout if needed.
Another option could be to receive and complete the message, set a scheduled delivery of 2 minutes into the future, and re send the message. This has the downside that you need to consume it and ack it, which involves certain risks (e.g. your process dies before you get a chance to re-send the message).
"If the message is Peeked in Peek Lock mode from a Queue then you don't have the receive context in the message. You can receive the message in Peek Lock mode, which will lock the message for the interval specified for the 'lock duration' property of the queue. Locked messages cannot be received until its lock expires. Thus, by setting the lock duration to 2 minutes and Receiving messages in Peek Lock mode will solve this issue.
You can either write custom code to update the Lock Duration property. Tools like Service Bus Explorer, Serverless360 etc provides options to update property using graphical user interface."

Azure queue - can I verify a message will be read only once?

I am using an Azure queue and have several different processes reading from the queue.
My system is built in a way that assumes each message is read only once.
This Microsoft article claims Azure queues have an at least once delivery guarantee which potentially means two processes can read the same message from the queue.
This StackOverflow thread claims that if I use GetMessage then the message becomes invisible to all other processes for the invisibility timeout.
Assuming I use GetMessage() and never exceed the message invisibility time before I DeleteMessage, can I assume I will get each message only once?
I think there is a property in queue message named DequeueCount, which is the number of times this message has been dequeued. And it's maintained by queue service. I think you can use this property to identify whether your message had been read before.
https://learn.microsoft.com/en-us/dotnet/api/azure.storage.queues.models.queuemessage.dequeuecount?view=azure-dotnet
No. The following can happen:
GetMessage()
Add some records in a database...
Generate some files...
DeleteMessage() -> Unexpected failure (process that crashes, instance that reboots, network connectivity issues, ...)
In this case your logic was executed without calling DeleteMessage. This means, once the invisibility timeout expires, the message will appear in the queue and be processed once again. You will need to make sure that your process is idempotent:
Idempotence is the property of certain operations in mathematics and
computer science, that they can be applied multiple times without
changing the result beyond the initial application.
An alternative solution would be to use Service Bus Queues with the ReceiveAndDelete mode (see this page under How to Receive Messages from a Queue). If you receive the message it will be marked as consumed and never appear again. This way you can be sure it is delivered At-Most-Once (see the comparison with Storage Queues here). But then again, if something happens while your are processing the message (ie: server crashes, ...), you could loose valuable information.
Update:
This will simulate an At-Most-Once in storage queues. The message can arrive multiple times via GetMessage, but will only be processed once by your business logic (with the risk that some of your business logic will never execute).
GetMessage()
DeleteMessage()
AddRecordsToDatabase()
GenerateFiles()

Controlling azure worker roles concurrency in multiple instance

I have a simple work role in azure that does some data processing on an SQL azure database.
The worker basically adds data from a 3rd party datasource to my database every 2 minutes. When I have two instances of the role, this obviously doubles up unnecessarily. I would like to have 2 instances for redundancy and the 99.95 uptime, but do not want them both processing at the same time as they will just duplicate the same job. Is there a standard pattern for this that I am missing?
I know I could set flags in the database, but am hoping there is another easier or better way to manage this.
Thanks
As Mark suggested, you can use an Azure queue to post a message. You can have the worker role instance post a followup message to the queue as the last thing it does when processing the current message. That should deal with the issue Mark brought up regarding the need for a semaphore. In your queue message, you can embed a timestamp marking when the message can be processed. When creating a new message, just add two minutes to current time.
And... in case it's not obvious: in the event the worker role instance crashes before completing processing and fails to repost a new queue message, that's fine. In this case, the current queue message will simply reappear on the queue and another instance is then free to process it.
There is not a super easy way to do this, I dont think.
You can use a semaphore as Mark has mentioned, to basically record the start and the stop of processing. Then you can have any amount of instances running, each inspecting the semaphore record and only acting out if semaphore allows it.
However, the caveat here is that what happens if one of the instances crashes in the middle of processing and never releases the semaphore? You can implement a "timeout" value after which other instances will attempt to kick-start processing if there hasnt been an unlock for X amount of time.
Alternatively, you can use a third party monitoring service like AzureWatch to watch for unresponsive instances in Azure and start a new instance if the amount of "Ready" instances is under 1. This will save you can save some money by not having to have 2 instances up and running all the time, but there is a slight lag between when an instance fails and when a new one is started.
A Semaphor as suggested would be the way to go, although I'd probably go with a simple timestamp heartbeat in blob store.
The other thought is, how necessary is it? If your loads can sustain being down for a few minutes, maybe just let the role recycle?
Small catch on David's solution. Re-posting the message to the queue would happen as the last thing on the current execution so that if the machine crashes along the way the current message would expire and re-surface on the queue. That assumes that the message was originally peeked and requires a de-queue operation to remove from the queue. The de-queue must happen before inserting the new message to the queue. If the role crashes in between these 2 operations, then there will be no tokens left in the system and will come to a halt.
The ESB dup check sounds like a feasible approach, but it does not sound like it would be deterministic either since the bus can only check for identical messages currently existing in a queue. But if one of the messages comes in right after the previous one was de-queued, there is a chance to end up with 2 processes running in parallel.
An alternative solution, if you can afford it, would be to never de-queue and just lease the message via Peek operations. You would have to ensure that the invisibility timeout never goes beyond the processing time in your worker role. As far as creating the token in the first place, the same worker role startup strategy described before combined with ASB dup check should work (since messages would never move from the queue).

Resources