How does Azure Service Bus identify a duplicate message?

I understand that Azure Service Bus has a duplicate message detection feature which will remove messages it believes are duplicates of other messages. I'd like to use this feature to help protect against some duplicate delivery.
What I'm curious about is how the service determines two messages are actually duplicates:
What properties of the message are considered?
Is the content of the message considered?
If I send two messages with the same content, but different message properties, are they considered duplicates?

Duplicate detection looks at the MessageId property of the brokered message. So, if you set the MessageId to something that is unique per incoming message, duplicate detection can catch repeats. As far as I know, only the MessageId is used for detection. The contents of the message are NOT looked at, so two messages that have the same content but different MessageIds will not be detected as duplicates.
References:
MSDN Documentation: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-queues-topics-subscriptions
If the scenario cannot tolerate duplicate processing, then additional
logic is required in the application to detect duplicates which can be
achieved based upon the MessageId property of the message which will
remain constant across delivery attempts. This is known as Exactly
Once processing.
There is also a Brokered Message Duplication Detection code sample on WindowsAzure.com that should be exactly what you are looking for as far as proving it out.
I also quickly tested this out and sent in 5 messages to a queue with RequiresDuplicateDetection set to true, all with the exact same content but different MessageIds. I then retrieved all five messages. I then did the reverse where I had matching MessageIds but different payloads, and only one message was retrieved.
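The test above can be mimicked with a toy in-memory model. This is only a sketch of the behaviour described here, not the Service Bus SDK: the `DedupQueue` class and its `send` method are hypothetical names, and the model ignores the detection time window entirely.

```python
class DedupQueue:
    """Toy in-memory queue that drops messages whose message_id was already seen."""

    def __init__(self):
        self._seen_ids = set()
        self.messages = []

    def send(self, message_id, body):
        # The send always "succeeds" from the caller's point of view,
        # but a repeated message_id is silently dropped.
        if message_id in self._seen_ids:
            return
        self._seen_ids.add(message_id)
        self.messages.append((message_id, body))


# Same body, different IDs: all five messages are kept.
same_body = DedupQueue()
for i in range(5):
    same_body.send(f"id-{i}", "payload")

# Same ID, different bodies: only the first message is kept.
same_id = DedupQueue()
for i in range(5):
    same_id.send("id-0", f"payload-{i}")

print(len(same_body.messages), len(same_id.messages))
```

The body never takes part in the comparison, which is why the second queue ends up with a single message.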

In my case I had to apply ScheduledEnqueueTimeUtc on top of MessageId,
because most of the time the first message had already been picked up by a worker before the subsequent duplicate messages arrived in the queue.
By setting ScheduledEnqueueTimeUtc, we tell Service Bus to hold on to the message for some time before letting a worker pick it up.
var message = new BrokeredMessage(json)
{
    MessageId = GetMessageId(input, extra)
};
// Delay the message by 30 seconds before it becomes available for processing,
// so that the duplicate detection engine has enough time to reject duplicated messages
message.ScheduledEnqueueTimeUtc = DateTime.UtcNow.AddSeconds(30);

Another important property to consider alongside the 'RequiresDuplicateDetection' property of an Azure Service Bus entity is 'DuplicateDetectionHistoryTimeWindow', the time frame within which a message with a duplicate MessageId will be rejected.
The default value of the duplicate detection history window is now 30 seconds; the value can range between 20 seconds and 7 days.
Enabling duplicate detection helps keep track of the application-controlled MessageId of all messages sent into a queue or topic during a specified time window. If any new message is sent carrying a MessageId that has already been logged during the time window, the message is reported as accepted (the send operation succeeds), but the newly sent message is instantly ignored and dropped. No other parts of the message other than the MessageId are considered.

Related

Google PubSub: drop nacked message after n retries

Is there a way to configure a pull subscription so that messages which caused an error and were nacked are re-queued (and so redelivered) no more than n times?
Ideally, if the last processing attempt also fails, I would like to handle this case (for example, log that this message has been given up on and will be dropped).
Or is it possible to find out how many times a received message has been tried before?
I use node.js. I can see a lot of different options in the source code but am not sure how I should achieve the desired behaviour.
Cloud Pub/Sub supports Dead Letter Queues that can be used to drop nacked messages after a configurable number of retries.
Currently, there is no way in Google Cloud Pub/Sub to automatically drop messages that were redelivered some designated number of times. The message will stop being delivered once the retention deadline has passed for that message (by default, seven days). Likewise, Pub/Sub does not keep track of or report the number of times a message was delivered.
If you want to handle these kinds of messages, you'd need to maintain persistent storage keyed by message ID that you could use to keep track of the delivery count. If the delivery count exceeds your desired threshold, you could write the message to a separate topic that you use as a dead letter queue and then acknowledge the original message.
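A minimal sketch of that bookkeeping, written as a plain in-memory model rather than against the Pub/Sub client library. `DeliveryTracker`, `handle`, and the `"ack"`/`"nack"` return values are hypothetical names; a dict stands in for the persistent storage and a list stands in for the dead-letter topic.

```python
class DeliveryTracker:
    def __init__(self, max_deliveries):
        self.max_deliveries = max_deliveries
        self.counts = {}        # stand-in for persistent storage keyed by message ID
        self.dead_letters = []  # stand-in for a separate dead-letter topic

    def handle(self, message_id, data, process):
        """Return 'ack' or 'nack' for one delivery of a message."""
        self.counts[message_id] = self.counts.get(message_id, 0) + 1
        try:
            process(data)
            return "ack"
        except Exception:
            if self.counts[message_id] >= self.max_deliveries:
                # Give up: park the message and acknowledge the original
                # so that it is not redelivered again.
                self.dead_letters.append((message_id, data))
                return "ack"
            return "nack"


def always_fails(data):
    raise ValueError("cannot process " + data)


tracker = DeliveryTracker(max_deliveries=3)
outcomes = [tracker.handle("m-1", "hello", always_fails) for _ in range(3)]
print(outcomes, tracker.dead_letters)
```

The branch that appends to `dead_letters` is also the natural place to log that the message has been given up on.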

Azure ServiceBus message reprocessing using MaxDeliveryCount

I'm trying to use the ServiceBus subscription's MaxDeliveryCount to implement retries of message processing. I'm not sure it's the best idea, but I don't want to lose messages.
Scenario:
Having one topic, with two subscribers: one sending (A) and the other (B) receiving messages, using Peek-Lock
Both subscriptions have configured MaxDeliveryCount=10
Both clients use SubscriptionClient.ReceiveBatch(50,TimeSpan.FromMilliseconds(50)) to get the messages from the Queue
A sends 5 messages, having payload "1", "2",..."5"
The first message ("1") fails to process on B and is marked as abandoned (BrokeredMessage.Abandon())
Reason: for internal reasons, the app can't process this message now.
(It's not yet dead-lettered, since DeliveryCount < MaxDeliveryCount)
Next, since message "1" previously failed, only one message is requested, and it's expected to be message "1": SubscriptionClient.ReceiveBatch(1,TimeSpan.FromMilliseconds(50))
After 2-3 repetitions of the previous step, instead of receiving message "1", message "2" is received
Message "2" is also marked as abandoned, since message "1" is expected
Then message "3" is received
Message "3" is also marked as abandoned, since message "1" is expected
and so on.
It seems that, in this scenario, SB is delivering the messages in a round-robin manner.
Is this the intended behavior of ServiceBus?
I am aware about the existence of some debates whether SB guarantees ordered delivery or not. For the applications it's really important that messages are processed in the same order they are sent.
Any ideas how reprocessing of message "1" could be performed until DeliveryCount reaches MaxDeliveryCount before processing the message "2"?
Firstly, as Thomas shared in his comment, you could try setting the SupportOrdering property to true for the topic.
TopicDescription td = new TopicDescription("TopicTest");
td.SupportOrdering = true;
And if the subscription client receives a message and calls the Abandon method to release the lock on a peek-locked message, we can call the Receive method again to get the message again.
On the other hand, if possible, you could try to combine the complete sequence of work steps, in their required order, into a single message instead of splitting the steps across multiple messages. You could then control the processing order in your own code logic rather than rely on the Service Bus infrastructure to provide this guarantee.
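As a rough illustration of the single-message idea, here is a sketch that packs the ordered steps into one JSON body and processes them in order, stopping at the first step that cannot be handled. The message layout (`{"steps": [...]}`) and the function names are assumptions made for the example and are not part of any Service Bus API.

```python
import json


def build_message(steps):
    """Serialize an ordered list of steps into a single message body."""
    return json.dumps({"steps": steps})


def process_message(body, handler):
    """Run each step in order; stop at the first failure and report its index."""
    steps = json.loads(body)["steps"]
    for index, step in enumerate(steps):
        if not handler(step):
            return index  # the caller can abandon and retry the whole message
    return None


done = []


def handler(step):
    if step == "3":
        return False  # simulate a step that cannot be processed right now
    done.append(step)
    return True


body = build_message(["1", "2", "3", "4", "5"])
failed_at = process_message(body, handler)
print(done, failed_at)
```

Because the order lives inside the message, a retry never lets step "4" overtake step "3". The steps that already ran would be repeated on a retry, so they need to be idempotent.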

Azure queue - return timed out messages to the head of the queue

When a message is retrieved from an Azure queue but not deleted from it, the message's visibility timeout expires and the message is (re)added to the end of the queue.
Is there a way to return such messages to the head of the queue instead?
When Azure Queue messages re-appear, they don't necessarily get sent to the end of the queue. They just reappear, and at that point, no real guarantee of order. It doesn't even get moved from its current position; it's just visible again. Azure storage queues aren't set up for guaranteed order. So no, there's no way to force a message to appear at the head of the queue when it reappears after its invisibility timeout expires.
Also, check out this forum answer from Jai Haridas regarding queue message ordering. Specifically:
The messages in a queue today are sorted by its visibility time. So the ordering of messages purely depends on when they are made visible. However, it is important for an app to not assume FIFO order or any specific order as it may change in future. You can only rely that 1) a message will be eligible based on its visibility timeout and 2) Message processing should be made idempotent and use the new UpdateMessage to save state
UpdateMessage() allows you to modify the queue message (e.g. adding breadcrumbs), so the next time you start processing it, you can pick up at a point beyond "start." Note that you can also adjust the timeout value, while the message is still in your possession and invisible, to allow you to keep working on it.
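The breadcrumb idea can be sketched as follows. This is an in-memory model only: the `completed` field, the stage functions, and the `save` callback (a stand-in for an UpdateMessage-style call) are hypothetical names chosen for the example.

```python
def process_with_checkpoints(message, stages, save):
    """Resume from the stage recorded in the message and checkpoint after each one."""
    start = message.get("completed", 0)
    for index in range(start, len(stages)):
        stages[index](message)
        message["completed"] = index + 1
        save(message)  # stand-in for persisting the breadcrumb on the queue message


log = []
attempts = {"resize": 0}


def download(msg):
    log.append("download")


def resize(msg):
    attempts["resize"] += 1
    if attempts["resize"] == 1:
        raise RuntimeError("worker crashed")  # simulated failure on the first try
    log.append("resize")


def upload(msg):
    log.append("upload")


stages = [download, resize, upload]
message = {"blob": "photo.jpg"}
saved = []

try:
    process_with_checkpoints(message, stages, lambda m: saved.append(dict(m)))
except RuntimeError:
    pass  # the message becomes visible again and is picked up later

process_with_checkpoints(message, stages, lambda m: saved.append(dict(m)))
print(log)
```

On the second attempt the download stage is skipped, because the breadcrumb saved before the crash records that it already completed.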

How does Windows Azure Service Bus Queues Duplicate Detection work?

I know that you can set duplicate detection to work over a time period with an azure service bus queue. However, does anyone know whether this works based on the objects in the queue?
So if I have an object with an id of "SO_1" which gets put on the queue and is subsequently consumed, is the duplicate detection still valid?
What I think I'm asking is - is it the timeframe and the object, or just the timeframe that make the queue decide what is a duplicate?
http://blog.iquestgroup.com/en/windows-azure-service-bus-duplicate-detection/#.UaiXrd7frIU
When we activate duplication, the Windows Azure Service Bus will start to store a history of our messages. This period of time can be configured to range from only a few minutes to days. If a duplicate message is sent to the Service Bus, the service will automatically ignore the message.
Posting this to clarify a couple of misconceptions in the responses found above:
Enabling duplicate detection helps keep track of the application-controlled MessageId of all messages sent into a queue or topic during a specified time window. If any new message is sent carrying a MessageId that has already been logged during the time window, the message is reported as accepted (the send operation succeeds), but the newly sent message is instantly ignored and dropped. No other parts of the message other than the MessageId are considered. (the blog referenced in one of the responses says the message content cannot be duplicate which is not correct).
The default value of the duplicate detection history window is now 30 seconds; the value can range between 20 seconds and 7 days.
Refer to this documentation for more details.
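To make the "timeframe and MessageId" point concrete, here is a toy model of a detection window. It is a sketch under assumptions, not the service's implementation: `WindowedDedupQueue` is a hypothetical name, the clock is passed in explicitly, and the model assumes that a dropped duplicate does not extend the window.

```python
from datetime import datetime, timedelta


class WindowedDedupQueue:
    """Toy model: remembers message IDs for a fixed window after they are sent."""

    def __init__(self, window):
        self.window = window
        self._sent_at = {}  # message_id -> time it was last accepted
        self.messages = []

    def send(self, message_id, body, now):
        first_seen = self._sent_at.get(message_id)
        if first_seen is not None and now - first_seen < self.window:
            return False  # dropped as a duplicate (a real sender would still see success)
        self._sent_at[message_id] = now
        self.messages.append((message_id, body))
        return True

    def receive(self):
        # Consuming a message does not touch the ID history.
        return self.messages.pop(0) if self.messages else None


t0 = datetime(2024, 1, 1, 12, 0, 0)
queue = WindowedDedupQueue(window=timedelta(minutes=10))

first = queue.send("SO_1", "v1", now=t0)
queue.receive()  # the message is consumed
inside = queue.send("SO_1", "v2", now=t0 + timedelta(minutes=5))
outside = queue.send("SO_1", "v3", now=t0 + timedelta(minutes=11))
print(first, inside, outside)
```

In this model the second send is dropped even though the first message has already been consumed, and the same ID is accepted again only once the window has passed.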
This actually just bit me, the default seems to be to have it enabled and the default time is 10 minutes. The "key" is the MessageId. In our case, in most scenarios duplicate detection is fine, but in some it was bad news (especially with the 10 minute range). To get around this, we introduced a "breaker":
// For this message, we need to prevent dups from being detected
msg.MessageId = messageId + "_" + DateTime.Now.ToString("u");
If you just want to prevent "spamming" you might consider setting the duplicate detection window to the minimum (20 seconds). (Personally, I would love to see a threshold as low as 5 seconds).
The current ranges allowed are 20 seconds to 7 days.
You will have to create the message ID based on the object (e.g. a hash of the object) and enable duplicate message detection on the topic/queue.
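One way to derive such an ID, as a sketch: serialize the object in a canonical form and hash it. The helper name `message_id_for` is hypothetical, and the example assumes the object is JSON-serializable.

```python
import hashlib
import json


def message_id_for(obj):
    """Derive a stable message ID from the content of a JSON-serializable object."""
    # Sorting keys and fixing separators makes the serialization canonical,
    # so logically equal objects always hash to the same ID.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = message_id_for({"order": 42, "action": "ship"})
b = message_id_for({"action": "ship", "order": 42})  # same content, different key order
c = message_id_for({"order": 43, "action": "ship"})
print(a == b, a == c)
```

With IDs built this way, two sends of the same object inside the detection window collapse into one message, while any change to the content produces a new ID.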
Azure Service Bus duplicate detection, points to keep in mind:
• A duplicate is identified based on SessionId (if present), PartitionKey (if present), and MessageId within a time window
• Duplicate detection time window:
o 20 seconds to 7 days (default: 10 minutes)
o A larger window can impact throughput due to matching, so it is better to keep the window as small as possible
• Duplicate detection can be enabled only while creating the topic/queue; the window can be updated at any time
• Duplicate messages will be ignored/dropped
ref: https://learn.microsoft.com/en-us/azure/service-bus-messaging/duplicate-detection

Azure Queue unique message

I would like to make sure that I don't insert a message to the queue multiple times. Is there any ID/Name I can use to enforce uniqueness?
vtortola pretty much covered it, but I wanted to add a bit more detail into why it's at least once delivery.
When you read a queue item, it's not removed from the queue; instead, it becomes invisible but stays in the queue. That invisibility period defaults to 30 seconds (max: 2 hours on older service versions; 7 days since REST API version 2011-08-18). During that time, the code that got the item off the queue has that much time to process whatever command was in the queue message and delete the queue item.
Assuming the queue item is deleted before the timeout period is reached, all is well. However: Once the timeout period is reached, the queue item becomes visible again, and the code holding the queue item may no longer delete it. In this case, someone else can read the same queue message and re-process that message.
Because of the fact a queue message can timeout, and can re-appear:
Your queue processing must be idempotent - operations on a queue message must result in the same outcome (such as rendering a thumbnail for a photo).
You need to think about timeout adjustments. You might find that commands are valid but processing is taking too long (maybe your 45-second thumbnail rendering code worked just fine until someone uploaded a 25MP image)
You need to think about poison messages - those that will never process correctly. Maybe they cause an exception to be thrown or have some invalid condition that causes the message processor to abort processing, which leads to the message eventually re-appearing in the queue. There's a property called DequeueCount - consider checking that property upon reading a queue item and, if it equals, say, 3, push the message into a table or blob and send yourself a notification to spend some time debugging that message offline.
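The poison-message check from the last point could look roughly like this. It is a sketch over plain dictionaries rather than the storage SDK: `handle`, `park`, and `delete` are hypothetical callbacks, and `dequeue_count` stands in for the DequeueCount property.

```python
MAX_DEQUEUE_COUNT = 3  # threshold suggested above; tune it for your workload


def handle(message, process, park, delete):
    """Process one dequeued message, parking it once it has been read too often."""
    if message["dequeue_count"] >= MAX_DEQUEUE_COUNT:
        park(message)    # e.g. write to a table or blob for offline debugging
        delete(message)  # remove it from the queue so it stops reappearing
        return "parked"
    process(message)     # may raise; the message then reappears after its timeout
    delete(message)
    return "processed"


parked, deleted = [], []
ok = handle({"id": "a", "dequeue_count": 1}, lambda m: None, parked.append, deleted.append)
poison = handle({"id": "b", "dequeue_count": 3}, lambda m: 1 / 0, parked.append, deleted.append)
print(ok, poison)
```

The count is checked before processing, so a message that keeps crashing the worker is set aside instead of being attempted yet again.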
More details on the get-queue low-level REST API are here. This will give you more insight into Windows Azure queue message handling.
Azure queues don't ensure message order, nor do they ensure message uniqueness. Messages will be processed "at least once", but nothing ensures a message won't be processed twice, so they don't ensure "at most once".
You should get ready to receive the same message twice. You can put an ID in the body of the message as part of your data.
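A sketch of that approach: the sender embeds its own ID in the body, and the consumer skips IDs it has already handled. `make_message` and `IdempotentConsumer` are hypothetical names, and the in-memory set stands in for whatever durable store you would actually use.

```python
import json
import uuid


def make_message(payload):
    """Wrap a payload with a unique ID chosen by the sender."""
    return json.dumps({"id": str(uuid.uuid4()), "payload": payload})


class IdempotentConsumer:
    def __init__(self):
        self.processed_ids = set()  # stand-in for durable storage
        self.results = []

    def handle(self, raw):
        message = json.loads(raw)
        if message["id"] in self.processed_ids:
            return False  # already handled; safe to delete without reprocessing
        self.results.append(message["payload"])
        self.processed_ids.add(message["id"])
        return True


consumer = IdempotentConsumer()
raw = make_message("render thumbnail")
first = consumer.handle(raw)
second = consumer.handle(raw)  # the same message delivered again
print(first, second, consumer.results)
```

This does not stop the queue from delivering a message twice. It only makes the second delivery harmless.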
