Azure Request-Response Session Timeout handling

We are using Azure Service Bus to facilitate parallel processing of messages by workers listening to a queue.
First an aggregated message is received, and this message is then split into thousands of individual messages that are posted using a request-response pattern, since we need to know when all messages have been completed in order to run a separate process.
Our issue is that the request-response method has a timeout, which causes the following problem:
Let's say we post 1000 messages to be processed and there is only one worker listening. Messages left in the queue after the timeout expires are discarded, which is something we do not want. If we instead set the expiry time to a value large enough to guarantee that all messages will be processed, we run the risk of a message failing and having to wait for the full timeout before we realize that something has gone wrong.
Is there a way to dynamically change the expiration of a single message in a request-response scenario, or is there another pattern we should consider?
Thanks!

You have this slightly wrong. The TimeToLive of an Azure Service Bus message (https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.brokeredmessage.timetolive.aspx) is the time the message may remain on the queue, whether or not it is consumed.
It is not a request timeout: if you post a message with a large TimeToLive, the message will simply stay on the queue for a long time. If you fail to consume it, you should notify the other end that consumption of this message failed.
You can do this with a second queue, posting a message to that other queue containing the ID of the message that failed and the error.
This is an asynchronous process, so you should not hold requests open waiting on it; work with the asynchronous nature of the problem instead.
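A rough sketch of this pattern, assuming the older Microsoft.ServiceBus.Messaging (BrokeredMessage) SDK and hypothetical queue names ("work" and "work-errors"):

    using System;
    using Microsoft.ServiceBus.Messaging;

    class TtlAndErrorQueueSketch
    {
        static void Main()
        {
            var connectionString = "<service-bus-connection-string>";
            var workQueue = QueueClient.CreateFromConnectionString(connectionString, "work");
            var errorQueue = QueueClient.CreateFromConnectionString(connectionString, "work-errors");

            // A generous TimeToLive only bounds how long the message may sit
            // on the queue before it is dropped; it is not a processing timeout.
            var message = new BrokeredMessage("payload")
            {
                MessageId = Guid.NewGuid().ToString(),
                TimeToLive = TimeSpan.FromHours(6)
            };
            workQueue.Send(message);

            // On the consumer side, when processing fails, report it explicitly
            // instead of relying on the TTL to signal the failure.
            var failureNotice = new BrokeredMessage("could not process")
            {
                CorrelationId = message.MessageId   // id of the message that failed
            };
            errorQueue.Send(failureNotice);
        }
    }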

Related

Azure Web Jobs, Azure Service Bus Queue Trigger prevent message from getting deleted

I am looking into setting up a WebJob trigger to read messages from a Service Bus queue. What would be the best practice for implementing retry logic in case of errors while calling the downstream systems?
Would we be able to throw an exception so that the message is not deleted from the queue and is retried after a certain time period?
Appreciate your feedback.
You don't need to define retry logic explicitly. When a message is dequeued from Service Bus, it becomes invisible on the queue for a certain period (the lock duration, 30 seconds by default; you can configure it). You try to process the message, and if it succeeds you simply call BrokeredMessage.CompleteAsync, which says "I am done" and marks the message as completed. If you have a problem downstream, you can abandon the message by calling BrokeredMessage.AbandonAsync. This unlocks the message and it appears back on the queue, where a worker will pick it up and process it again, until processing succeeds or the max delivery count is reached, after which the message is sent to the dead-letter queue.
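A minimal sketch of how this looks with a WebJobs Service Bus trigger (queue name and downstream call are hypothetical). With the default configuration the SDK completes the message when the function returns normally; if the function throws, the message is not completed and Service Bus redelivers it until MaxDeliveryCount is reached, then dead-letters it.

    using System;
    using Microsoft.Azure.WebJobs;
    using Microsoft.ServiceBus.Messaging;

    public class Functions
    {
        // Hypothetical queue name "orders". Returning normally lets the SDK
        // complete (delete) the message; an escaping exception leaves it to be
        // retried and eventually dead-lettered by Service Bus.
        public static void ProcessOrder([ServiceBusTrigger("orders")] BrokeredMessage message)
        {
            var body = message.GetBody<string>();

            if (!TryCallDownstream(body))
            {
                // Throwing causes the message to be retried later.
                throw new InvalidOperationException("Downstream call failed; message will be retried.");
            }
        }

        private static bool TryCallDownstream(string body)
        {
            // Placeholder for the real downstream call.
            return true;
        }
    }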

Azure Service Bus - Add a message to the queue in a deferred state

I'm wondering if it is possible to send a brokered message to a queue/topic where the message is already in a deferred state?
I'm asking this because I currently have a process that does the following ...
The process starts and a brokered message is sent to a queue (this triggers a function that records the message body as an entity in table storage with a 'Processing' status).
Additional work is done in the process
If we get to the end of the process without any issues, another brokered message is sent to the queue with a completion message (this triggers the same function that updates the entity in table storage with a 'Complete' status).
While this method is mostly working, it feels clunky and fragile. I would really like to be able to send a message to the queue and then have the final step make the message visible on the queue so it can be consumed by the function (Durable Function).
I thought about setting the ScheduledEnqueueTimeUtc, but I can't guarantee when the process will finish (I'm thinking worst case scenario here) so I'm not sure how long to set it.
I also looked at the Defer option on BrokeredMessage, but it seems deferral can only be done by the receiver, so a message cannot start out in a deferred state.
Is what I'm trying to do possible with Service Bus brokered messages? Could I set the scheduled enqueue time to some ridiculously long time (e.g. 2 hours) so that, if it reaches that time, the message is automatically expired and moved to the dead-letter queue? Should I send the initial message to the dead-letter queue and then, once the process is complete, retrieve it and resubmit it?
Has anyone had any experience with implementing a process like this ... send a start message and only process the message once a completion notification has been received? I need this to be as robust as possible as I'm dealing with financial transactions in this process.
Hopefully my explanation makes sense.
Sending a message to a queue or topic in an already-deferred state is not possible. You can only delay a brand-new message, not defer it: deferring requires the message to be received first so that it has a SequenceNumber.
Using ScheduledEnqueueTimeUtc has its challenges: you schedule the message for the future, but you cannot cancel it once your processing is over. Instead, you could use QueueClient.ScheduleMessageAsync(), which returns the SequenceNumber immediately. This way you can schedule the message far into the future, but also cancel it if processing finishes earlier.
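A rough sketch of that cancel-on-completion approach with the Microsoft.ServiceBus.Messaging QueueClient (queue name and the two-hour window are illustrative):

    using System;
    using System.Threading.Tasks;
    using Microsoft.ServiceBus.Messaging;

    class ScheduledCompletionSketch
    {
        static async Task RunAsync()
        {
            var client = QueueClient.CreateFromConnectionString(
                "<service-bus-connection-string>", "process-status");

            // Schedule a "timed out" message far in the future and remember its
            // sequence number so it can be cancelled later.
            var timeoutMessage = new BrokeredMessage("process-timed-out");
            long sequenceNumber = await client.ScheduleMessageAsync(
                timeoutMessage, DateTimeOffset.UtcNow.AddHours(2));

            // ... run the actual process here ...

            // If the process completes in time, cancel the scheduled message so
            // it never becomes visible on the queue.
            await client.CancelScheduledMessageAsync(sequenceNumber);
        }
    }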
I ended up solving this by keeping the process of sending two messages, but refactoring my durable function to record the messages in Table Storage, check whether both messages have been received, and, if they have, add a new message to Azure Queue Storage. A second function listens to that queue and starts its process.
After much testing, this appears to be quite a robust solution. It doesn't matter what order the two messages arrive in, or how long they take ... as long as both of them have arrived, the second function will kick off.
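A sketch of that correlation step, assuming a Table Storage table keyed by a process ID and a Storage queue that triggers the second function; all names here are hypothetical, and the code glosses over concurrent updates (a real implementation would guard with ETags or retries):

    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;
    using Microsoft.WindowsAzure.Storage.Table;

    public static class CorrelationSketch
    {
        // Called once for each of the two messages ("start" and "complete").
        public static async Task RecordMessageAsync(string processId, bool isCompletion)
        {
            var account = CloudStorageAccount.Parse("<storage-connection-string>");
            var table = account.CreateCloudTableClient().GetTableReference("ProcessStatus");
            await table.CreateIfNotExistsAsync();

            // Merge only the flag for the message we just saw, so the two
            // messages can arrive in either order without overwriting each other.
            var update = new DynamicTableEntity("process", processId);
            update.Properties[isCompletion ? "CompleteReceived" : "StartReceived"] =
                new EntityProperty(true);
            await table.ExecuteAsync(TableOperation.InsertOrMerge(update));

            // Read the row back; only when both flags are present do we enqueue
            // the message that kicks off the second function.
            var result = await table.ExecuteAsync(
                TableOperation.Retrieve<DynamicTableEntity>("process", processId));
            var row = (DynamicTableEntity)result.Result;
            if (row != null &&
                row.Properties.ContainsKey("StartReceived") &&
                row.Properties.ContainsKey("CompleteReceived"))
            {
                var queue = account.CreateCloudQueueClient().GetQueueReference("process-complete");
                await queue.CreateIfNotExistsAsync();
                await queue.AddMessageAsync(new CloudQueueMessage(processId));
            }
        }
    }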

Hidden messages in Azure storage queue

Sometimes there are messages in our Azure queues that are not picked up by Azure Functions and are also not visible in Storage Explorer.
These messages are created without any visibility delay.
Is there any way to know what those messages contain, and why they are not processed by our Azure Functions?
In the image you can see that we have a message in the queue, but it is not visible in the list and it has been there for hours.
The Azure Queue API currently has no way to inspect invisible messages.
There are several situations in which a message becomes invisible:
The message was added with a VisibilityTimeout in the Put Message request. The message will be invisible until this initial timeout expires.
The message has been retrieved (dequeued). Whenever a message is retrieved, it is invisible for the duration of the VisibilityTimeout specified in the Get Messages request, or 30 seconds by default.
The message has expired. Messages expire after 7 days by default, or after the MessageTTL specified in the Put Message request. Note: after a while these messages are automatically deleted, but until then they remain on the queue as invisible messages.
Use cases
Initial VisibilityTimeout
Messages are created with an initial VisibilityTimeout so that the message can be created now, but processed later (after the timeout expires), for whatever reason the creator has for wanting to delay this processing.
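With the classic Microsoft.WindowsAzure.Storage SDK, for example, this is the initialVisibilityDelay argument of CloudQueue.AddMessage (queue name and delay are illustrative):

    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    class InitialDelaySketch
    {
        static void Main()
        {
            var queue = CloudStorageAccount.Parse("<storage-connection-string>")
                .CreateCloudQueueClient().GetQueueReference("work");
            queue.CreateIfNotExists();

            // Created now, but invisible to consumers for the first 10 minutes.
            queue.AddMessage(new CloudQueueMessage("delayed work"),
                timeToLive: null,
                initialVisibilityDelay: TimeSpan.FromMinutes(10));
        }
    }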
VisibilityTimeout on retrieving
The intended process for processing queue messages is:
1. The application dequeues one or more messages, optionally specifying the next VisibilityTimeout. This timeout should be longer than the time it takes to process the message(s).
2. The application processes the message(s).
3. The application deletes the messages. When processing fails, the message(s) are not deleted.
Messages for which processing failed will become visible again as soon as their VisibilityTimeout expires, so that they can be retried. To prevent endless retries, step 2 should start by checking the DequeueCount of the message: if it is bigger than the desired retry count, the message should be deleted instead of processed. It is good practice to copy such messages to a dead-letter / poison queue (for example a queue with the original queue name plus a -poison suffix); a sketch of this flow follows below.
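A sketch of that dequeue / process / delete loop with the DequeueCount check, using the classic Microsoft.WindowsAzure.Storage SDK (queue names and the retry limit are illustrative):

    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    class ProcessingLoopSketch
    {
        const int MaxDequeueCount = 5;

        static void Main()
        {
            var client = CloudStorageAccount.Parse("<storage-connection-string>")
                .CreateCloudQueueClient();
            var queue = client.GetQueueReference("work");
            var poisonQueue = client.GetQueueReference("work-poison");
            queue.CreateIfNotExists();
            poisonQueue.CreateIfNotExists();

            // Dequeue with a visibility timeout longer than the expected processing time.
            var message = queue.GetMessage(visibilityTimeout: TimeSpan.FromMinutes(5));
            if (message == null) return;

            // Too many attempts: move to the poison queue instead of retrying forever.
            if (message.DequeueCount > MaxDequeueCount)
            {
                poisonQueue.AddMessage(new CloudQueueMessage(message.AsString));
                queue.DeleteMessage(message);
                return;
            }

            try
            {
                Process(message.AsString);          // the real work
                queue.DeleteMessage(message);       // success: remove it from the queue
            }
            catch
            {
                // Do nothing: the message becomes visible again when the
                // visibility timeout expires, and will be retried.
            }
        }

        static void Process(string body) { /* placeholder for real processing */ }
    }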
MessageTTL
By default messages have a time-to-live of 7 days. If the application processing the queue cannot keep up with the rate at which messages are added, a backlog can build up. The TTL you choose determines what happens to such a backlog.
Alternatively, the application could crash, so that the backlog builds up until the application is started again.
It seems that the message has expired. The following steps should reproduce the issue so you can test it yourself:
Add a message with a short TTL.
After the message has expired, it no longer shows up in Storage Explorer and cannot be retrieved, but it remains on the queue as an invisible message until it is automatically deleted.
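A minimal way to reproduce this with the classic storage SDK (queue name and TTL are illustrative):

    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    class ExpiryReproSketch
    {
        static void Main()
        {
            var queue = CloudStorageAccount.Parse("<storage-connection-string>")
                .CreateCloudQueueClient().GetQueueReference("repro");
            queue.CreateIfNotExists();

            // Message that expires after one minute.
            queue.AddMessage(new CloudQueueMessage("short-lived"),
                timeToLive: TimeSpan.FromMinutes(1));

            // Wait past the TTL, then look at the queue: the message can no
            // longer be retrieved, though it may still show up in the
            // approximate count until it is cleaned up.
            System.Threading.Thread.Sleep(TimeSpan.FromMinutes(2));
            queue.FetchAttributes();
            Console.WriteLine($"GetMessage: {queue.GetMessage()?.AsString ?? "<none>"}");
            Console.WriteLine($"ApproximateMessageCount: {queue.ApproximateMessageCount}");
        }
    }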

Understanding Timeout In Partitioned Batch Jobs

I am trying to understand the ways timeouts can be specified for partitioned steps:
1. jmsoutbound-gateway receive-timeout
2. jmsoutbound-gateway reply-timeout
3. jmsoutbound-gateway replyListener receive-timeout
4. partition handler messagingOperations receive-timeout
I want to be able to time out when a step takes too long and clean up. Looking at the stack trace, the reply listener does not go away after the partition ends (and may receive a late reply message after the job has completed).
1. jmsoutbound-gateway receive-timeout: the time the executor thread will wait in the gateway for a reply to arrive (i.e. for the partition to complete) before giving up.
2. jmsoutbound-gateway reply-timeout: a timeout when writing to the reply-channel; in general it only applies if the send can block, such as when the reply channel is a bounded queue channel that is full.
3. jmsoutbound-gateway replyListener receive-timeout: when using a reply listener, the container polls the JMS client for messages; this timeout is simply how long the thread blocks in the client waiting for a reply before looping around and waiting again. It has no bearing on messages timing out; it only really affects how quickly the container responds to a stop().
4. partition handler messagingOperations receive-timeout: the time the partition handler will wait for all partitions to complete (unless pollRepositoryForResults is true, in which case the handler's timeout property represents that and the receive timeout is not used).
So it sounds like #4 is what you want.

Message status never finished by one of my WebJobs while processing the message

I have a WebJob which processes a message only once by using the condition (DeliveryCount = 1), because I don't want another instance to process it if the lock time expired on the first WebJob. When another WebJob tries to process the message after the lock time has expired, the condition (DeliveryCount = 1) is not met and it drops out of the method, which deletes the message from the queue automatically.
The problem here is that if the message ends up in any state other than success, I won't have a message left in the queue to process. How do I handle this situation?
I think part of the problem is that you're trying to use the MaxDeliveryCount property to prevent concurrent message processing:
MaxDeliveryCount
The max delivery count setting is not used to prevent multiple consumers from processing a message at the same time; it's used to prevent "poison messages", where a message's contents prevent any consumer from processing it successfully, so it would otherwise be retried forever.
I recommend you determine exactly what it is you're trying to accomplish. If you want a simple competing consumers scenario where multiple webjobs consume messages from a single queue, then there are standard ways to accomplish that:
good description of competing consumers
competing consumers with Service Bus queues
You can use MaxDeliveryCount in conjunction with competing consumers... if you want to prevent poison messages you can set MaxDeliveryCount to something larger than 1 and still give other consumers a chance to process messages whose locks expire.
Azure Service Bus supports dead-lettering of poison messages that exceed max delivery count, so you're able to examine such messages offline... they aren't simply deleted forever.
You might also need to add code in your webjobs to renew locks prior to their expiration... otherwise Service Bus can't differentiate between "valid messages that are taking a long time to process" and "poison messages that can't be processed". Without lock renewal, your long-running valid messages will be dead-lettered just like poison messages, which is almost certainly not what you want.
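If you are on the older Microsoft.ServiceBus.Messaging SDK, one way to get automatic lock renewal for long-running handlers is the AutoRenewTimeout option on the OnMessage pump; a rough sketch with illustrative names and timings:

    using System;
    using Microsoft.ServiceBus.Messaging;

    class LockRenewalSketch
    {
        static void Main()
        {
            var client = QueueClient.CreateFromConnectionString(
                "<service-bus-connection-string>", "work");

            var options = new OnMessageOptions
            {
                AutoComplete = false,                       // we complete/abandon explicitly
                MaxConcurrentCalls = 4,                     // competing consumers within this worker
                AutoRenewTimeout = TimeSpan.FromMinutes(30) // keep renewing the lock while processing
            };

            client.OnMessage(message =>
            {
                try
                {
                    // Long-running but valid work; the pump keeps the lock
                    // alive for up to AutoRenewTimeout.
                    DoWork(message.GetBody<string>());
                    message.Complete();
                }
                catch
                {
                    // Abandon so another consumer can retry; after MaxDeliveryCount
                    // attempts Service Bus dead-letters the message.
                    message.Abandon();
                }
            }, options);

            Console.ReadLine(); // keep the pump running
        }

        static void DoWork(string body) { /* placeholder for real processing */ }
    }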
Good luck!
