I know that you can set duplicate detection to work over a time period with an azure service bus queue. However, does anyone know whether this works based on the objects in the queue?
So if I have an object with an id of "SO_1" which gets put on the queue and is subsequently consumed, is the duplicate detection still valid?
What I think I'm asking is - is it the timeframe and the object, or just the timeframe that make the queue decide what is a duplicate?
http://blog.iquestgroup.com/en/windows-azure-service-bus-duplicate-detection/#.UaiXrd7frIU
When we activate duplication, the Windows Azure Service Bus will start to store a history of our messages. This period of time can be configured to range from only a few minutes to days. If a duplicate message is sent to the Service Bus, the service will automatically ignore the message.
Posting this to clarify on a couple of misconceptions in the responses found above,
Enabling duplicate detection helps keep track of the application-controlled MessageId of all messages sent into a queue or topic during a specified time window. If any new message is sent carrying a MessageId that has already been logged during the time window, the message is reported as accepted (the send operation succeeds), but the newly sent message is instantly ignored and dropped. No other parts of the message other than the MessageId are considered. (the blog referenced in one of the responses says the message content cannot be duplicate which is not correct).
Default value of duplicate detection time history now is 30 seconds, the value can range between 20 seconds and 7 days.
Refer this documentation for more details.
This actually just bit me, the default seems to be to have it enabled and the default time is 10 minutes. The "key" is the MessageId. In our case, in most scenarios duplicate detection is fine, but in some it was bad news (especially with the 10 minute range). To get around this, we introduced a "breaker":
// For this message, we need to prevent dups from being detected
msg.MessageId = messageId + "_" + DateTime.Now.ToString("u");
If you just want to prevent "spamming" you might consider setting the duplicate detection window to the minimum (20 seconds). (Personally, I would love to see a threshold as low as 5 seconds).
The current ranges allowed are 20 seconds to 7 days.
You will have to create message id based on object e.g. hash of object and enable duplicate message detection in topic/queue.
Azure Service Bus duplicate detection points to keep in mind:
• Duplicate is identified based on SessionId(if present), PartitionKey(if present), and MessageId in a time window
• Duplicate detection time window:
o 20 secs to 7 days (default : 10 mins)
o Larger window can impact throughput due to matching, better to keep as small window as possible
• Duplicate detection can be enabled only while creating topic/queue, window can be update at any point of time
• Duplicate messages will be ignored/dropped
ref: https://learn.microsoft.com/en-us/azure/service-bus-messaging/duplicate-detection
Related
I'm trying to determine if there's a way for Azure Service Bus to provide message collapsing. Specifically I'm after something like:
First event into a queue gets picked up straight away
All other events that are queued within the next N seconds, and match some criteria (e.g. matching message ids), have the schedule enqueue set to a value so they fire at the end of the N seconds. If a "waiting" message already exists it should be deleted.
After the N seconds has expired the newest scheduled message appears and is picked up.
Basically I need a way to get a good time-to-first-event, but provide protection from over processing events from chatty sources.
Does anyone have a pattern they've used to get something close to these semantics?
Update 1
The messages involved aren't true duplicates, rather they're the current state of an entity that is used for some processing (e.g. a message that's generated each time a file is updated). The result of the processing of an early message is fully replaced by that of later messages (e.g. the result is the size of the file). So we still need to guarantee we process the most recent message, but it's a waste to process all M within N seconds.
It sounds like you're talking about Duplicate Detection, especially in regards to matching MessageIds. If you want to evaluate some other attribute in the message for duplicate detection, maybe it's worth taking a step back and asking Why are my publishers sending so many duplicate messages? If it's unavoidable, maybe you can segregate your chatty consumers into a separate consumer group and manually handle the the duplicate check, then re-enqueue (just thinking out loud).
Is there way to configure pull subscription in the way that messages which caused error and were nacked, were re-queued (and so that redelivered) no more than n times?
Ideally on the last processing if it also failed I would like to handle this case (for example, log that this message is given up to process and will be dropped).
Or probably it's possible to find out, how much times received message was tried to be processed before?
I use node.js. I can see a lot of different options in the source code by am not sure how should I achieve desired behaviour.
Cloud Pub/Sub supports Dead Letter Queues that can be used to drop nacked messages after a configurable number of retries.
Currently, there is no way in Google Cloud Pub/Sub to automatically drop messages that were redelivered some designated number of times. The message will stop being delivered once the retention deadline has passed for that message (by default, seven days). Likewise, Pub/Sub does not keep track of or report the number of times a message was delivered.
If you want to handle these kinds of messages, you'd need to maintain a persistent storage keyed by message ID that you could use to keep track of the delivery count. If the delivery count exceeds your desired threshold, you could write the message to a separate topic that you use as a dead letter queue and then acknowledge original message.
I'm working with sensor systems where each sensor sends a new reading every 15 seconds.
Each sensor type also has defined rules that when triggered will generate an alarms output - e.g. sensor of type "temperature" sends a value that is higher than MAX temperature allowed.
Lets assume sensor with ID "XXX_01" sends 2 readings in 30 seconds, each reading has higher value than MAX value allowed.
Event in: 01/10/2018 12:00:00
{ id:"XXX_01", value: 90, "temperature" }
Event in: 01/10/2018 12:15:00
{ id:"XXX_01", value: 95, "temperature" }
Now, I want to notify the end user that there is an alarm - I have to send out some sort of a notification to end user(s). The problem and confusion is that I do not want to send out the alarms twice.
Assuming I use something like Twilio to send SMS or just send out Email notifications, I don't want to spam my end users with a new notification every 15 seconds assuming incoming sensor readings stay above MAX value allowed.
What kind of an Azure Service, architecture or design paradigm could I use to avoid such issue?
I have to say that A (don't want to spam users notification) and B (alarm high temperature as soon as it touches MAX line) have some contradictions. It's hard to implement it.
In my opinion, you can send notifications to users at a fixed frequency.
1.In that frequency period, such as 1 minute, use Azure stream analytics service to receive sensor data every 15 seconds.
2.Then output the data to Azure Storage Queue.
3.Then use Azure Queue Time Trigger to get latest temperature value in the Azure Storage Queue current messages every 1 minute. If it touches MAX line,then send notification to end users. If you want to notify user that it touched MAX line no matter it already dropped, then just sort the messages by value and judge it.
4.Finally, empty the queue.
using Azure Stream Analytics, you can trigger the alert when the threshold is passed AND if it's the first time in the last 30s for example.
I give you the sample SQL for this example:
SELECT *
FROM input
WHERE ISFIRST(second, 30) OVER (WHEN value> 90)=1
Let us know if you have any further question.
I also agree with Jay's response about contradiction.
But one more way we can handle it, I also faced similar issue in one of my assignment, what I tried is keeping track of once sent alarm via cache (i.e. redi cache, memcache etc) and every time check if alarm already sent then don't sent . Obviously trade-off is that everytime we needs to check, but that's the concern that you needs to decide
We can also extend same to notify user if max temperature is reset to normal.
Hope this helps.
I understand that Azure Service Bus has a duplicate message detection feature which will remove messages it believes are duplicates of other messages. I'd like to use this feature to help protect against some duplicate delivery.
What I'm curious about is how the service determines two messages are actually duplicates:
What properties of the message are considered?
Is the content of the message considered?
If I send two messages with the same content, but different message properties, are they considered duplicates?
The duplicate detection is looking at the MessageId property of the brokered message. So, if you set the message Id to something that should be unique per message coming in the duplicate detection can catch it. As far as I know only the message Id is used for detection. The contents of the message are NOT looked at, so if you have two messages sent that have the same actual content, but have different message IDs they will not be detected as duplicate.
References:
MSDN Documentation: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-queues-topics-subscriptions
If the scenario cannot tolerate duplicate processing, then additional
logic is required in the application to detect duplicates which can be
achieved based upon the MessageId property of the message which will
remain constant across delivery attempts. This is known as Exactly
Once processing.
There is also a Brokered Message Duplication Detection code sample on WindowsAzure.com that should be exactly what you are looking for as far as proving it out.
I also quickly tested this out and sent in 5 messages to a queue with RequiresDuplicateDetection set to true, all with the exact same content but different MessageIds. I then retrieved all five messages. I then did the reverse where I had matching MessageIds but different payloads, and only one message was retrieved.
In my case I have to apply ScheduledEnqueueTimeUtc on top of MessageId.
Because most of the time the first message already got pickup by worker, before the sub-sequence duplicate message were arrive in the Queue.
By adding ScheduledEnqueueTimeUtc. We tell the Service bus to hold on the the message for some time before letting worker them up.
var message = new BrokeredMessage(json)
{
MessageId = GetMessageId(input, extra)
};
// Delay 30 seconds for Message to process
// So that Duplication Detection Engine has enought time to reject duplicated message
message.ScheduledEnqueueTimeUtc = DateTime.UtcNow.AddSeconds(30);
Another important property to be considered while dealing with 'RequiresDuplicateDetection' property of a Azure Service Bus entity is 'DuplicateDetectionHistoryTimeWindow', the time frame within which message with duplicate message id will be rejected.
Default value of duplicate detection time history now is 30 seconds, the value can range between 20 seconds and 7 days.
Enabling duplicate detection helps keep track of the application-controlled MessageId of all messages sent into a queue or topic during a specified time window. If any new message is sent carrying a MessageId that has already been logged during the time window, the message is reported as accepted (the send operation succeeds), but the newly sent message is instantly ignored and dropped. No other parts of the message other than the MessageId are considered.
We had a terrible problem/experience yesterday when trying to swap our staging <--> production role.
Here is our setup:
We have a workerrole picking up messages from the queue. These messages are processed on the role. (Table Storage inserts, db selects etc ). This can take maybe 1-3 seconds per queue message depending on how many table storage posts he needs to make. He will delete the message when everything is finished.
Problem when swapping:
When our staging project went online our production workerrole started erroring.
When the role wanted to process queue messsage it gave a constant stream of 'EntityAlreadyExists' errors. Because of these errors queue messages weren't getting deleted. This caused the queue messages to be put back in the queue and back to processing and so on....
When looking inside these queue messages and analysing what would happend with them we saw they were actually processed but not deleted.
The problem wasn't over when deleting these faulty messages. Newly queue messages weren't processed as well while these weren't processed yet and no table storage records were added, which sounds very strange.
When deleting both staging and producting and publishing to production again everything started to work just fine.
Possible problem(s)?
We have litle 2 no idea what happened actually.
Maybe both the roles picked up the same messages and one did the post and one errored?
...???
Possible solution(s)?
We have some idea's on how to solve this 'problem'.
Make a poison message fail over system? When the dequeue count gets over X we should just delete that queue message or place it into a separate 'poisonqueue'.
Catch the EntityAlreadyExists error and just delete that queue message or put it in a separate queue.
...????
Multiple roles
I suppose we will have the same problem when putting up multiple roles?
Many thanks.
EDIT 24/02/2012 - Extra information
We actually use the GetMessage()
Every item in the queue is unique and will generate unique messages in table Storage. Little more information about the process: A user posts something and will have to be distributed to certain other users. The message generate from that user will have a unique Id (guid). This message will be posted into the queue and picked up by the worker role. The message is distributed over several other tables (partitionkey -> UserId, rowkey -> Some timestamp in ticks & the unique message id. So there is almost no chance the same messages will be posted in a normal situation.
The invisibility time out COULD be a logical explanation because some messages could be distributed to like 10-20 tables. This means 10-20 insert without the batch option. Can you set or expand this invisibility time out?
Not deleting the queue message because of an exception COULD be a explanation as well because we didn't implement any poison message fail over YET ;).
Regardless of the Staging vs. Production issue, having a mechanism that handles poison messages is critical. We've implemented an abstraction layer over Azure queues that automatically moves messages over to a poison queue once they've been attempted to be processed some configurable amount of times.
You clearly have a fault on handling double messages. The fact that your ID is unique doesn't mean that the message will not be processed twice in some occasions like:
The role dying and with partially finished work, so the message will re-appear for processing in the queue
The role crashing unexpected, so the message ends up back in the queue
The FC migrating moving your role and you don't have code to handle this situation, so the message ends up back in the queue
In all cases, you need code that handles the fact that the message will re-appear. One way is to use the DequeueCount property and check how many times the message was removed from a Queue and received for processing. Make sure you have code that handles partial processing of a message.
Now what probably happened during swapping was, when the production environment became the staging and staging became production, both of them were trying to receive the same messages so they were basically competing each other fro those messages, which is probably not bad because this is a known pattern to work anyway but when you killed your old production (staging) every message that was received for processing and wasn't finished, ended up back in the Queue and your new production environment picked the message for processing again. Having no code logic to handle this scenario and a message was that partially processed, some records in the tables existed and it started causing the behavior you noticed.
There are a few possible causes:
How are you reading the queue messages? If you are doing a Peek Message then the message will still be visible to be picked up by another role instance (or your staging environment) before the message is deleted. You want to make sure you are using Get Message so the message is invisible until it can be deleted.
Is it possible that your first role crashed after doing the work for the message but prior to deleting the message? This would cause the message to become visible again and get picked up by another role instance. At that point the message will be a poison message which will cause your instances to constantly crash.
This problem almost certainly has nothing to do with Staging vs Production, but is most likely caused by having multiple instances reading from the same queue. You can probably reproduce the same problem by specifying 2 instances, or by deploying the same code to 2 different production services, or by running the code locally on your dev machine (still pointing to Azure storage) using 2 instances.
In general you do need to handle poison messages so you need to implement that logic anyways, but I would suggest getting to the root cause of this problem first, otherwise you are just going to run into a lot more problems later on.
With queues you need to code with idempotency in mind and expect and handle the ‘EntityAlreadyExists’ as a viable response.
As others have suggested, causes could be
Multiple message in the queue with the same identifier.
Are peeking for the message and not reading it form the queue and so not making them invisible.
Not deleting the message because an exception was thrown before you can delete them.
Taking too long to process the message so it cannot be deleted (because invisibility was timed out) and appears again
Without looking at the code I am guessing that it is either the 3 or 4 option that is occurring.
If you cannot detect the issue with a code review, you may consider adding time based logging and try/catch wrappers to get a better understanding.
Using queues effectively, in a multi-role environment, requires a slightly different mindset and running into such issues early is actually a blessing in disguise.
Appended 2/24
Just to clarify, modifying the invisibility time out is not a generic solution to this type of problem. Also, note that this feature although available on the REST API, may not be available on the queue client.
Other options involve writing to table storage in an asynchronous manner to speed up your processing time, but again this is a stop gap measures which does not really address the underlying paradigm of working with queues.
So, the bottom line is to be idempotent. You can try using the table storage upsert (update or insert) feature to avoid getting the ‘EntitiyAlreadyExists’ error, if that works for your code. If all you are doing is inserting new entities to azure table storage then the upsert should solve your problem with minimal code change.
If you are doing updates then it is a different ball game all together. One pattern is to pair updates with dummy inserts in the same table with the same partition key so as to error out if the update occurred previously and so skip the update. Later after the message is deleted, you can delete the dummy inserts. However, all this adds to the complexity, so it is much better to revisit the architecture of the product; for example, do you really need to insert/update into so many tables?
Without knowing what your worker role is actually doing I'm taking a guess here, but it sounds like when you have two instances of your worker role running you are getting conflicts while trying to write to an Azure table. It is likely to be because you have code that looks something like this:
var queueMessage = GetNextMessageFromQueue();
Foo myFoo = GetFooFromTableStorage(queueMessage.FooId);
if (myFoo == null)
{
myFoo = new Foo {
PartitionKey = queueMessage.FooId
};
AddFooToTableStorage(myFoo);
}
DeleteMessageFromQueue(queueMessage);
If you have two adjacent messages in the queue with the same FooId it is quite likely that you'll end up with both of the instances checking to see if the Foo exists, not finding it then trying to create it. Whichever instance is the last to try and save the item will get the "Entity already exists" error. Because it errored it never gets to the delete message part of the code and therefore it becomes visible back on the queue after a period of time.
As others have said, dealing with poison messages is a really good idea.
Update 27/02
If it's not subsequent messages (which based on your partition/row key scheme I would say it's unlikely), then my next bet would be it's the same message appearing back in the queue after the visibility timeout. By default if you're using .GetMessage() the timeout is 30 seconds. It has an overload which allows you to specify how long that time frame is. There is also the .UpdateMessage() function that allows you to update that timeout as you're processing the message. For example you could set the initial visibility to 1 minute, then if you're still processing the message 50 seconds later, extent it for another minute.