Azure Service Bus ReceiveMessages with sub-processes

I thought my question was related to the post "Azure Service Bus: How to Renew Lock?", but I have already tried RenewLockAsync.
Here is the concern: I am receiving messages from Service Bus with sessions enabled, so I accept the session and then receive messages. All good; here's the rub.
There are TWO ADDITIONAL processes to complete per message: a manual transform / harvest of the message into some other object, which is then sent out to a Kafka topic (stream). Note it's all async on top of this craziness. My team lead is insistent that the two sub-processes can just be added INTO the receive process (ReceiveAsync), finally calling session.CompleteAsync() AFTER the OTHER two processes complete.
Well, needless to say, I'm consistently erroring with "The session lock has expired on the MessageSession. Accept a new MessageSession." with that architecture. I haven't even fleshed out the send-to-Kafka part, it's just mocked, so it's going to take longer once fleshed out.
Is it even remotely plausible to call session.CompleteAsync() AFTER the sub-processes, or should that be done when the message is successfully received, before moving on to other processing? I thought separate tasks would be more appropriate, but again he didn't dig that idea.
I appreciate all insight and opinions, thank you!

"The session lock has expired on the MessageSession. Accept a new MessageSession." indicates one of 2 things:
The lock has been held for too long, in which case calling RenewLockAsync before it expires would help.
The message lock has been explicitly released, through a call to CompleteAsync, AbandonAsync, DeadLetterAsync, etc. That would indicate a bug, since the lock cannot be used after it has been released.
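If the transform and the Kafka send really must finish before completion, the session lock has to be kept alive while they run. The following is only a rough sketch, assuming the Microsoft.Azure.ServiceBus client; TransformAsync and SendToKafkaAsync are hypothetical stand-ins for the two sub-processes, and the renewal interval must be tuned to the queue's actual lock duration.

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

class SessionWorker
{
    public static async Task ProcessOneAsync(string connectionString, string queueName)
    {
        var sessionClient = new SessionClient(connectionString, queueName);
        IMessageSession session = await sessionClient.AcceptMessageSessionAsync();

        Message message = await session.ReceiveAsync(TimeSpan.FromSeconds(30));
        if (message == null) { await session.CloseAsync(); return; }

        using (var cts = new CancellationTokenSource())
        {
            // Keep the session lock alive while the two sub-processes run.
            Task renewal = RenewLoopAsync(session, cts.Token);

            var transformed = await TransformAsync(message);   // hypothetical sub-process 1
            await SendToKafkaAsync(transformed);                // hypothetical sub-process 2

            cts.Cancel();                                       // stop renewing
            await renewal;
            await session.CompleteAsync(message.SystemProperties.LockToken);
        }

        await session.CloseAsync();
    }

    static async Task RenewLoopAsync(IMessageSession session, CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                // Renew well inside the lock duration (e.g. every 20 s for a 60 s lock).
                await Task.Delay(TimeSpan.FromSeconds(20), token);
                await session.RenewSessionLockAsync();
            }
        }
        catch (TaskCanceledException) { /* shutting down */ }
    }

    // Placeholders for the transform/harvest and the Kafka producer.
    static Task<byte[]> TransformAsync(Message message) => Task.FromResult(message.Body);
    static Task SendToKafkaAsync(byte[] payload) => Task.CompletedTask;
}

The alternative the question asks about, completing right after the receive and running the sub-processes afterwards, avoids the renewal loop entirely, but it means a crash before the Kafka send succeeds would lose that message.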

Related

Azure Service Bus competing receivers are picking up locked messages

We have a Topic with Subscriptions with a default LockDuration of 1min, and multiple SubscriptionClients listening to each subscription. For our test purposes, there are 3 clients listening to a single subscription.
SubscriptionClients are created as:
Client = new SubscriptionClient(endPoint, topicName, subscriptionName);
We put one message on the Topic, which is filtered into the Subscription.
We would expect one of the SubscriptionClients to pick up the message, and the other two to be unable to because it is locked.
What is actually happening is that all three clients are simultaneously picking up the same message, with different DeliveryCounts, all within the 1-minute lock duration.
Is there something wrong with the way we're creating the SubscriptionClient such that the lock is shared between them rather than being exclusive?
There are possibly two things that could be wrong, and neither of them would be the broker; it's more likely the client-side code.
MaxLockDuration is too short, and while one client is still working on the message, the other client(s) receive that same message. You should be able to confirm this by looking at the duration of the message processing. If it exceeds the MaxLockDuration set on the queue, that's it.
You're using a message handler with automatic lock renewal, and the renewal is failing to extend the lock. In that case, you would have the message handler's error callback raised with the details.
Either way, you could log the errors and share the logs if possible to help pinpoint the issue.
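As an illustration of the second point, here is a rough sketch of a handler registration that logs whatever the error callback receives. It assumes the Microsoft.Azure.ServiceBus SubscriptionClient shown in the question; ProcessAsync is a placeholder for the real work.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

class SubscriptionListener
{
    public static void Start(string endPoint, string topicName, string subscriptionName)
    {
        var client = new SubscriptionClient(endPoint, topicName, subscriptionName);

        var options = new MessageHandlerOptions(LogErrorAsync)
        {
            MaxConcurrentCalls = 1,
            AutoComplete = false,
            // Give the client room to keep renewing the lock while processing runs.
            MaxAutoRenewDuration = TimeSpan.FromMinutes(5)
        };

        client.RegisterMessageHandler(async (message, cancellationToken) =>
        {
            Console.WriteLine($"Got {message.MessageId}, delivery count {message.SystemProperties.DeliveryCount}");
            await ProcessAsync(message);                                    // placeholder for the real work
            await client.CompleteAsync(message.SystemProperties.LockToken);
        }, options);
    }

    // Fires when automatic lock renewal (or anything else inside the pump) fails.
    static Task LogErrorAsync(ExceptionReceivedEventArgs args)
    {
        Console.WriteLine($"Error during {args.ExceptionReceivedContext.Action}: {args.Exception}");
        return Task.CompletedTask;
    }

    static Task ProcessAsync(Message message) => Task.CompletedTask;        // placeholder
}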

NServiceBus and Azure long running handler pattern

We are using Azure Service Bus via NServiceBus, and I am facing a problem deciding the correct architecture for dealing with long-running tasks triggered by messages.
As is good practice, we don't want to block the message handler from returning by making it wait for a long-running process (downloading a large file from a remote server); in fact, doing so will cause the lock on the message to be lost with Azure SB. The plan is to respond by spawning a separate task and allowing the message handler to return immediately.
However, this means the handler is now immediately available for the next message, which will cause another task to be spawned, and so on until the message queue is empty. What I'd like is some way to stop taking messages while we are processing (a limited number of) earlier messages. Is there an accepted pattern for this with NServiceBus and Azure Service Bus?
The following is roughly what I'd do if I were programming directly against the Azure SB:
while (true)
{
    var message = bus.Next();   // take a temporary lock on the next message
    message.Complete();         // remove it from the queue for good
    // Do long running stuff here
}
The verbs Next and Complete are probably wrong, but what happens under Azure is that Next gets a temporary lock on the message so that other consumers can no longer see it. Then you can decide whether you really want to process the message and, if so, call Complete. That removes the message from the queue entirely; failing to do so will cause the message to reappear on the queue after a period of time, as Azure assumes you crashed. As dirty as this code looks, it would achieve my goals (so why not do it?), since my consumer will only consume again the next time it's available (after the long-running task). Other consumers (other instances) can jump in if necessary.
The problem is that NServiceBus adds a level of abstraction, so that handling a message is now done via a method on a handler class:
void Handle(NewFileMessage message)
{
    // Do work here
}
The problem is that Azure does not get the call to message.Complete() until after your work is done and the Handle method exits. This is why you need to keep the work short. However, if you exit, you also signal that you are ready to handle another message. This is my Catch-22.
Downloading on a background thread is a good idea. You don't want to increase the lock duration, because that's a symptom, not the problem. Your download can easily take longer than the maximum lock duration (5 mins) and then you're back to square one.
What you can do is have an orchestrating saga for the download. The saga can monitor the download process, and when the download is completed, the background process signals completion to the saga. If the download never finishes, you can have a timeout (or multiple timeouts) to indicate that and trigger a compensating action or a retry, whatever works for your business case.
Documentation on Sagas should get you going: http://docs.particular.net/nservicebus/sagas/
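For illustration, a rough sketch of such an orchestrating saga, assuming NServiceBus 6+ APIs; the StartDownload, BeginBackgroundDownload, DownloadCompleted and DownloadTimedOut messages and the 30-minute timeout are hypothetical placeholders, not part of the original answer.

using System;
using System.Threading.Tasks;
using NServiceBus;

// Hypothetical messages standing in for the real ones.
public class StartDownload : ICommand { public string FileId { get; set; } }
public class BeginBackgroundDownload : ICommand { public string FileId { get; set; } }
public class DownloadCompleted : IMessage { public string FileId { get; set; } }
public class DownloadTimedOut { }

public class DownloadSagaData : ContainSagaData
{
    public string FileId { get; set; }
}

public class DownloadSaga :
    Saga<DownloadSagaData>,
    IAmStartedByMessages<StartDownload>,
    IHandleMessages<DownloadCompleted>,
    IHandleTimeouts<DownloadTimedOut>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<DownloadSagaData> mapper)
    {
        // Correlate saga instances by the file being downloaded.
        mapper.ConfigureMapping<StartDownload>(m => m.FileId).ToSaga(s => s.FileId);
        mapper.ConfigureMapping<DownloadCompleted>(m => m.FileId).ToSaga(s => s.FileId);
    }

    public async Task Handle(StartDownload message, IMessageHandlerContext context)
    {
        Data.FileId = message.FileId;
        // Hand the long-running download to a background worker; the handler returns quickly,
        // so the Service Bus lock is released well within its duration.
        await context.Send(new BeginBackgroundDownload { FileId = message.FileId });
        // Watchdog: if no completion signal arrives, the timeout below fires.
        await RequestTimeout<DownloadTimedOut>(context, TimeSpan.FromMinutes(30));
    }

    public Task Handle(DownloadCompleted message, IMessageHandlerContext context)
    {
        MarkAsComplete();
        return Task.CompletedTask;
    }

    public Task Timeout(DownloadTimedOut state, IMessageHandlerContext context)
    {
        // Compensate or retry here, whatever fits the business case; this sketch just retries.
        return context.Send(new BeginBackgroundDownload { FileId = Data.FileId });
    }
}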
In Azure Service Bus you can increase the lock duration of a message (default set to 30 seconds) in case the handling will take a long time.
But even though you are able to increase the lock duration, needing to do so is generally an indication that your handler takes care of too much work, which could be divided over different handlers.
If it is critical that the file is downloaded, I would keep the download operation in the handler. That way, if the download fails, the message can be handled again and the download can be retried. If, however, you want to free up the handler instantly to handle more messages, I would suggest that you scale out the workers that perform the download task so that the system can cope with the demand.

How does Azure Service Bus handle messages on a subscription?

I don't get it.
Say I have one queue, one topic, one subscription, and three clients that subscribe to it.
I send a message.
The first client receives the message and calls the Complete() action.
Will the second client receive the message?
What if there is a fourth client that subscribes to it?
The question is: when is the message completely removed from the queue/topic/subscription?
P.S. The "one-to-one" case is clear.
I would recommend you check out the Competing Consumers pattern. (https://msdn.microsoft.com/en-us/library/dn568101.aspx?f=255&MSPPError=-2147217396).
You can have many roles (e.g., Azure Worker Roles) checking for messages in a queue (competing consumers), locking them exclusively while processing. Each role is fighting for a message, and the first one that grabs it (by chance) has it "exclusive" for the moment. If the consumer that gets the message processes it successfully, it calls the Complete() method; otherwise it calls Abandon(). Complete() finishes it for good, and Abandon() throws it back into the frenzy of competing consumers. You can even grab it again if you're healthy!
You can set the dead-lettering parameter (maximum delivery count) in the Azure Management Portal, which determines how many times a message can be reintroduced for other consumers. At some point things just aren't working, so the message is killed (dead-lettered) so other messages can resume unimpeded.
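A minimal sketch of that competing-consumer loop, assuming the Microsoft.Azure.ServiceBus MessageReceiver in peek-lock mode; each instance running this loop is one competing consumer, and ProcessAsync is a placeholder for the real work.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;

class CompetingConsumer
{
    public static async Task RunAsync(string connectionString, string queueName)
    {
        var receiver = new MessageReceiver(connectionString, queueName, ReceiveMode.PeekLock);

        while (true)
        {
            // Whichever consumer gets here first receives the message and holds its lock.
            Message message = await receiver.ReceiveAsync(TimeSpan.FromSeconds(30));
            if (message == null) continue;

            try
            {
                await ProcessAsync(message);                                      // placeholder for the real work
                await receiver.CompleteAsync(message.SystemProperties.LockToken); // finished for good
            }
            catch (Exception)
            {
                // Back into the frenzy; after MaxDeliveryCount attempts it is dead-lettered.
                await receiver.AbandonAsync(message.SystemProperties.LockToken);
            }
        }
    }

    static Task ProcessAsync(Message message) => Task.CompletedTask;              // placeholder
}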
Let me know if you have more specific needs. Would be happy to help. This pattern works extremely well.
Kindest regards...

Does setting visibilitytimeout to 7 days mean automatic deletion for an Azure queue?

It seems that the new Azure SDK extends the visibilitytimeout to <= 7 days. I know that by default, when I add a message to an Azure queue, the time-to-live is 7 days. When I get a message out, I can set the visibilitytimeout to 7 days. Does that mean I don't need to delete the message if I don't care about reliability, since it will disappear after 7 days anyway?
I want to take this route because DeleteMessage is very slow. If I don't delete messages, does that have any impact on the performance of GetMessage?
Based on the documentation for Get Messages, I believe it is certainly possible to set the VisibilityTimeout period to 7 days so that messages are fetched only once. However, I see some issues with this approach compared to just deleting the message once processing is done:
What happens when you get the message and start processing it and somehow the process fails? If you set the visibility timeout to be 7 days, then the message would never appear in the queue again and thus the process it was supposed to do never gets done.
Even though the message is hidden, it is still there in the queue, so you keep incurring storage charges for it. The cost is trivial, but why keep the message when you don't really need it?
A lot of systems rely on Approximate Messages Count property of a queue to check on the health of processes which are performed by messages in a queue. Please note that even though you make the message hidden, it is still there in the queue and thus will be included in total messages count in the queue. So if you're building a system which relies on this for health check, you will always find your system to be unhealthy because you're never deleting the messages.
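For comparison, the usual pattern is to hide the message only for as long as processing realistically takes and then delete it explicitly. A minimal sketch, assuming the classic Microsoft.WindowsAzure.Storage queue client:

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

class QueueConsumer
{
    public static void ProcessOne(string connectionString, string queueName)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference(queueName);

        // Hide the message only as long as processing realistically takes, not 7 days.
        CloudQueueMessage message = queue.GetMessage(visibilityTimeout: TimeSpan.FromMinutes(5));
        if (message == null) return;

        Process(message);   // placeholder for the real work

        // Delete once processing succeeds; if we crash before this line,
        // the message reappears after the visibility timeout and is retried.
        queue.DeleteMessage(message);
    }

    static void Process(CloudQueueMessage message) { /* placeholder */ }
}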
I'm curious to know why you find deleting messages to be very slow. In my experience this is quite fast. How are you monitoring message deletion?
Rather than hacking around the problem, I think you should drill into understanding why the deletes are slow. Have you enabled logs and looked at the E2ELatency and ServerLatency numbers across all your queue operations? Ideally you shouldn't be seeing a large difference between the two for any of your queue operations. If you do see a large difference, it implies something is happening on the client that you should investigate further.
For more information on logging take a look at the following articles:
http://blogs.msdn.com/b/windowsazurestorage/archive/tags/analytics+2d00+logging+_2600_amp_3b00_+metrics/
http://msdn.microsoft.com/en-us/library/azure/hh343262.aspx
Information on client-side logging can also be found in this blog post: http://blogs.msdn.com/b/windowsazurestorage/archive/2013/09/07/announcing-storage-client-library-2-1-rtm.aspx
Please let me know what you find.
Jason

Controlling Azure worker role concurrency across multiple instances

I have a simple worker role in Azure that does some data processing on an SQL Azure database.
The worker basically adds data from a third-party data source to my database every 2 minutes. When I have two instances of the role, this obviously doubles up unnecessarily. I would like to have 2 instances for redundancy and the 99.95% uptime SLA, but I do not want them both processing at the same time, as they will just duplicate the same job. Is there a standard pattern for this that I am missing?
I know I could set flags in the database, but am hoping there is another easier or better way to manage this.
Thanks
As Mark suggested, you can use an Azure queue to post a message. You can have the worker role instance post a followup message to the queue as the last thing it does when processing the current message. That should deal with the issue Mark brought up regarding the need for a semaphore. In your queue message, you can embed a timestamp marking when the message can be processed. When creating a new message, just add two minutes to current time.
And... in case it's not obvious: in the event the worker role instance crashes before completing processing and fails to repost a new queue message, that's fine. In this case, the current queue message will simply reappear on the queue and another instance is then free to process it.
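A minimal sketch of that self-rescheduling pattern, assuming the classic Microsoft.WindowsAzure.Storage queue client; instead of embedding a timestamp in the message body, this variant hides the follow-up message for two minutes via initialVisibilityDelay (the ordering of re-post versus delete is the caveat discussed further down).

using System;
using Microsoft.WindowsAzure.Storage.Queue;

class ScheduledWorker
{
    public static void ProcessNext(CloudQueue queue)
    {
        // Only one role instance gets the token message; the other stays idle.
        CloudQueueMessage token = queue.GetMessage(visibilityTimeout: TimeSpan.FromMinutes(5));
        if (token == null) return;

        DoDataProcessing();   // placeholder for the 2-minute job

        // Re-post the "run again" token, hidden for two minutes, then remove the current one.
        queue.AddMessage(new CloudQueueMessage("run"), initialVisibilityDelay: TimeSpan.FromMinutes(2));
        queue.DeleteMessage(token);
    }

    static void DoDataProcessing() { /* placeholder */ }
}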
There is not a super easy way to do this, I don't think.
You can use a semaphore as Mark has mentioned, to basically record the start and the stop of processing. Then you can have any number of instances running, each inspecting the semaphore record and only acting if the semaphore allows it.
However, the caveat here is: what happens if one of the instances crashes in the middle of processing and never releases the semaphore? You can implement a "timeout" value after which other instances will attempt to kick-start processing if there hasn't been an unlock for X amount of time.
Alternatively, you can use a third-party monitoring service like AzureWatch to watch for unresponsive instances in Azure and start a new instance if the number of "Ready" instances is under 1. This can save you some money by not having to keep 2 instances up and running all the time, but there is a slight lag between when an instance fails and when a new one is started.
A semaphore, as suggested, would be the way to go, although I'd probably go with a simple timestamp heartbeat in blob storage.
The other thought is, how necessary is it? If your loads can sustain being down for a few minutes, maybe just let the role recycle?
A small catch on David's solution: re-posting the message to the queue would happen as the last thing in the current execution, so that if the machine crashes along the way the current message would expire and re-surface on the queue. That assumes the message was originally peeked and requires a de-queue operation to remove it from the queue. The de-queue must happen before inserting the new message into the queue. If the role crashes in between these two operations, then there will be no tokens left in the system and it will come to a halt.
The ASB dup check sounds like a feasible approach, but it does not sound like it would be deterministic either, since the bus can only check for identical messages currently existing in a queue. If one of the messages comes in right after the previous one was de-queued, there is a chance of ending up with 2 processes running in parallel.
An alternative solution, if you can afford it, would be to never de-queue and just lease the message via Peek operations. You would have to ensure that the invisibility timeout never goes beyond the processing time in your worker role. As far as creating the token in the first place, the same worker role startup strategy described before, combined with the ASB dup check, should work (since messages would never move from the queue).

Resources