Google PubSub: drop nacked message after n retries - node.js

Is there a way to configure a pull subscription so that messages which caused an error and were nacked are re-queued (and therefore redelivered) no more than n times?
Ideally, if the final attempt also fails, I would like to handle that case (for example, log that the message has been given up on and will be dropped).
Or perhaps it's possible to find out how many times a received message has already been attempted?
I use node.js. I can see a lot of different options in the source code but am not sure how I should achieve the desired behaviour.

Cloud Pub/Sub supports Dead Letter Queues that can be used to drop nacked messages after a configurable number of retries.
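For example, with the node.js client (@google-cloud/pubsub), a subscription can be created with a dead-letter policy, and each delivery then carries an attempt counter. A minimal sketch; the project, topic, subscription names, and the handle function are illustrative, and maxDeliveryAttempts must be in the range 5-100:

import {PubSub, Message} from '@google-cloud/pubsub';

const pubsub = new PubSub();

declare function handle(data: Buffer): void; // hypothetical processing step

// Create a subscription whose messages are forwarded to a dead-letter
// topic after 5 failed delivery attempts instead of being redelivered.
async function createSubscriptionWithDlq() {
  await pubsub.topic('work-topic').createSubscription('work-sub', {
    deadLetterPolicy: {
      deadLetterTopic: 'projects/my-project/topics/work-dead-letter',
      maxDeliveryAttempts: 5, // allowed range: 5 to 100
    },
  });
}

// With a dead-letter policy in place, each received message exposes
// deliveryAttempt, so the handler can log when it is about to give up.
function listen() {
  pubsub.subscription('work-sub').on('message', (message: Message) => {
    try {
      handle(message.data);
      message.ack();
    } catch (err) {
      if (message.deliveryAttempt >= 5) {
        console.log(`giving up on ${message.id}; it will be dead-lettered`);
      }
      message.nack();
    }
  });
}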

Previously, there was no way in Google Cloud Pub/Sub to automatically drop messages that were redelivered some designated number of times. A message would only stop being delivered once its retention deadline had passed (by default, seven days). Likewise, Pub/Sub did not keep track of or report the number of times a message was delivered.
If you want to handle these kinds of messages, you'd need to maintain persistent storage keyed by message ID that you could use to keep track of the delivery count. If the delivery count exceeds your desired threshold, you could write the message to a separate topic that you use as a dead letter queue and then acknowledge the original message.
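A sketch of that manual approach, assuming a Redis-style counter; the store, the handle function, and the topic name are all illustrative:

import {PubSub, Message} from '@google-cloud/pubsub';

const pubsub = new PubSub();
const MAX_ATTEMPTS = 5;

// Hypothetical persistent counter, e.g. backed by Redis INCR.
declare const store: {incr(key: string): Promise<number>};
declare function handle(data: Buffer): Promise<void>; // your processing

async function onMessage(message: Message) {
  // The counter survives redeliveries because it is keyed by the
  // Pub/Sub message ID, which stays constant across attempts.
  const attempts = await store.incr(`deliveries:${message.id}`);
  if (attempts > MAX_ATTEMPTS) {
    // Give up: divert to a self-managed dead-letter topic, then ack so
    // the original message is never redelivered.
    await pubsub.topic('my-dead-letter').publishMessage({data: message.data});
    message.ack();
    return;
  }
  try {
    await handle(message.data);
    message.ack();
  } catch {
    message.nack(); // redelivered; the next attempt increments the counter
  }
}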

Related

Azure servicebus ReceiveBatch only returns 2 messages

I am trying to periodically receive all messages in a Service Bus queue, but when I call ReceiveBatch(1000) I get at most 2 messages back.
This question is related to an earlier question, except that the asker there would get a lot more by calling ReceiveBatch multiple times, and I do not.
How do I get all messages on a Service Bus queue?
The name ReceiveBatch(maximumNumber) is somewhat misleading: you don't get a batch, you get a collection of up to maximumNumber messages, which means you can receive fewer than maximumNumber as well. If you wish to receive a specific number of messages, you need to loop through the receiving operation until you get that many (and potentially slightly more).
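The same up-to-N semantics exist in the modern JavaScript SDK (@azure/service-bus), where receiveMessages() also returns at most the requested count. A sketch of the receive loop; the connection string and queue name are illustrative:

import {ServiceBusClient, ServiceBusReceivedMessage} from '@azure/service-bus';

// Loop until we reach the target count or the queue is drained, since a
// single receive call may return fewer messages than requested.
async function receiveAll(connectionString: string, queueName: string, target: number) {
  const client = new ServiceBusClient(connectionString);
  const receiver = client.createReceiver(queueName);
  const collected: ServiceBusReceivedMessage[] = [];
  try {
    while (collected.length < target) {
      const batch = await receiver.receiveMessages(target - collected.length, {
        maxWaitTimeInMs: 5000, // give up once the queue stops yielding
      });
      if (batch.length === 0) break; // queue drained
      collected.push(...batch);
    }
  } finally {
    await receiver.close();
    await client.close();
  }
  return collected;
}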

Message Collapsing

I'm trying to determine if there's a way for Azure Service Bus to provide message collapsing. Specifically I'm after something like:
First event into a queue gets picked up straight away
All other events that are queued within the next N seconds, and match some criteria (e.g. matching message IDs), have their scheduled enqueue time set so they fire at the end of the N seconds. If a "waiting" message already exists, it should be deleted.
After the N seconds have expired, the newest scheduled message appears and is picked up.
Basically I need a way to get a good time-to-first-event, but provide protection from over processing events from chatty sources.
Does anyone have a pattern they've used to get something close to these semantics?
Update 1
The messages involved aren't true duplicates, rather they're the current state of an entity that is used for some processing (e.g. a message that's generated each time a file is updated). The result of the processing of an early message is fully replaced by that of later messages (e.g. the result is the size of the file). So we still need to guarantee we process the most recent message, but it's a waste to process all M within N seconds.
It sounds like you're talking about Duplicate Detection, especially with regard to matching MessageIds. If you want to evaluate some other attribute in the message for duplicate detection, it may be worth taking a step back and asking why your publishers are sending so many duplicate messages. If it's unavoidable, maybe you can segregate your chatty consumers into a separate consumer group and manually handle the duplicate check, then re-enqueue (just thinking out loud).
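One rough way to approximate the collapsing semantics with duplicate detection: bucket events per entity into N-second windows, give every message in a window the same MessageId, and schedule delivery at the window boundary. Note that duplicate detection keeps the first message of a window, so the body should be a pointer to the entity (letting the consumer fetch the current state when the message fires) rather than a state snapshot, and this sacrifices the immediate first event. A sketch, assuming a queue created with duplicate detection enabled and a detection window of at least N seconds; all names are illustrative:

import {ServiceBusSender} from '@azure/service-bus';

// usage: const sender = new ServiceBusClient(conn).createSender('events');
async function sendCollapsed(
  sender: ServiceBusSender,
  entityId: string,
  windowSeconds = 30,
) {
  const bucket = Math.floor(Date.now() / (windowSeconds * 1000));
  await sender.sendMessages({
    body: {entityId}, // a pointer, not a snapshot of the entity's state
    messageId: `${entityId}-${bucket}`, // same window => duplicate => dropped
    // Deliver at the end of the window, so the consumer sees exactly one
    // message per entity per window.
    scheduledEnqueueTimeUtc: new Date((bucket + 1) * windowSeconds * 1000),
  });
}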

Azure Storage Queue - long time to process

I need to generate quite a number of reports, and a report can take about 5 minutes to generate: a large amount of data from many different sources.
The client will post messages to an Azure Storage queue. There is a worker role that processes the messages and generates the reports.
If I want to scale this up let's say I end up with 10 worker roles that will process the messages from the queue and generate the reports. Then I will add messages into the queue like this:
message 1: process reports from 1 - 5
message 2: process reports from 6 - 11
........
message 10: process reports from 50 - 55 (the range might not be accurate)
If worker role 1 takes the first message and puts a lock on it, but the processing takes 5 minutes, the lock will expire and the message will become visible again in the queue, so worker role 2 will take it and start processing it, and so forth.
How can I ensure that a queue message is consumed only once, keeping in mind that the task is a long one?
First of all: Using Azure Storage queues, you should be prepared for all of your operations to be idempotent: In case your queue item is processed multiple times, the same result should happen each time. The reason I bring this up: There's simply no way to guarantee you'll process the message one time (unless you check the DequeueCount property of the message and halt processing accordingly), due to unexpected events such as your role instance crashing/rebooting or your queue item processing code doing something unexpected like throwing an exception.
Next: the queue message invisibility timeout can be programmatically extended, either via the queue API or via one of the language SDKs. In C# (something like this; I didn't test it), extending by an additional minute:
queue.UpdateMessage(message,
    TimeSpan.FromSeconds(60),
    MessageUpdateFields.Visibility);
You can also modify the message along the way (perhaps as a hint to your code, to let it know which of the 5 reports have been completed; this should help with your specific issue: in the event the message gets reprocessed, you don't have to redo all five reports if the message has been modified to say something like "process reports from 3-5"). Note: you can combine the MessageUpdateFields flags via |:
queue.UpdateMessage(message,
    TimeSpan.FromSeconds(0),
    MessageUpdateFields.Content | MessageUpdateFields.Visibility);
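A sketch combining both ideas in the modern JavaScript SDK (@azure/storage-queue): extend the invisibility timeout as work progresses, and rewrite the message body as a progress hint so a redelivery can resume partway. The queue and the generateReport/parseRange helpers are illustrative:

import {QueueClient} from '@azure/storage-queue';

declare function generateReport(n: number): Promise<void>; // your 5-minute work
declare function parseRange(text: string): [number, number]; // hypothetical parser

async function processReports(queue: QueueClient) {
  const {receivedMessageItems} = await queue.receiveMessages({
    visibilityTimeout: 120, // seconds of initial invisibility
  });
  const msg = receivedMessageItems[0];
  if (!msg) return;
  const [from, to] = parseRange(msg.messageText); // e.g. "process reports from 1 - 5"
  let popReceipt = msg.popReceipt;

  for (let n = from; n <= to; n++) {
    await generateReport(n);
    // Record progress and extend the lease in one call; each update
    // returns a fresh pop receipt required by subsequent calls.
    const res = await queue.updateMessage(
      msg.messageId,
      popReceipt,
      `process reports from ${n + 1} - ${to}`, // hint for any redelivery
      120, // stay invisible for another 2 minutes
    );
    popReceipt = res.popReceipt!;
  }
  await queue.deleteMessage(msg.messageId, popReceipt);
}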
Lastly: If you're concerned with the length of time taken to process a batch of reports, perhaps rethink why you're processing five reports in each message, vs. one report per message. You can always read queue messages in batches. This is getting a bit subjective, as there's really no right or wrong way to do it, but it's just something for you to think about.

How does Azure Service Bus identify a duplicate message?

I understand that Azure Service Bus has a duplicate message detection feature which will remove messages it believes are duplicates of other messages. I'd like to use this feature to help protect against some duplicate delivery.
What I'm curious about is how the service determines two messages are actually duplicates:
What properties of the message are considered?
Is the content of the message considered?
If I send two messages with the same content, but different message properties, are they considered duplicates?
Duplicate detection looks at the MessageId property of the brokered message. So, if you set the MessageId to something that should be unique per incoming message, duplicate detection can catch it. As far as I know, only the MessageId is used for detection. The contents of the message are NOT looked at, so if two messages are sent with the same actual content but different MessageIds, they will not be detected as duplicates.
References:
MSDN Documentation: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-queues-topics-subscriptions
If the scenario cannot tolerate duplicate processing, then additional logic is required in the application to detect duplicates, which can be achieved based upon the MessageId property of the message, which will remain constant across delivery attempts. This is known as Exactly Once processing.
There is also a Brokered Message Duplication Detection code sample on WindowsAzure.com that should be exactly what you are looking for as far as proving it out.
I also quickly tested this out and sent in 5 messages to a queue with RequiresDuplicateDetection set to true, all with the exact same content but different MessageIds. I then retrieved all five messages. I then did the reverse where I had matching MessageIds but different payloads, and only one message was retrieved.
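That experiment is easy to reproduce with the JavaScript SDK as well. A sketch, assuming 'dedup-queue' was created with duplicate detection enabled; the queue name and connection string are illustrative:

import {ServiceBusClient} from '@azure/service-bus';

async function demonstrateDuplicateDetection(connectionString: string) {
  const client = new ServiceBusClient(connectionString);
  const sender = client.createSender('dedup-queue');
  try {
    // Same MessageId, different bodies: only the first is kept.
    await sender.sendMessages({messageId: 'order-42', body: 'first payload'});
    await sender.sendMessages({messageId: 'order-42', body: 'second payload'});

    // Different MessageIds, identical bodies: both are delivered, because
    // only the MessageId is considered, never the content.
    await sender.sendMessages({messageId: 'a', body: 'same payload'});
    await sender.sendMessages({messageId: 'b', body: 'same payload'});
  } finally {
    await sender.close();
    await client.close();
  }
}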
In my case, I had to apply ScheduledEnqueueTimeUtc on top of MessageId,
because most of the time the first message was already picked up by a worker before the subsequent duplicate messages arrived in the queue.
By adding ScheduledEnqueueTimeUtc, we tell Service Bus to hold the message for some time before letting workers pick it up.
var message = new BrokeredMessage(json)
{
    MessageId = GetMessageId(input, extra)
};

// Delay the message by 30 seconds so that the duplicate detection engine
// has enough time to reject duplicated messages before a worker picks them up.
message.ScheduledEnqueueTimeUtc = DateTime.UtcNow.AddSeconds(30);
Another important property to consider when dealing with the 'RequiresDuplicateDetection' property of an Azure Service Bus entity is 'DuplicateDetectionHistoryTimeWindow': the time frame within which a message with a duplicate message ID will be rejected.
The default duplicate detection history window is currently 30 seconds; the value can range between 20 seconds and 7 days.
Enabling duplicate detection helps keep track of the application-controlled MessageId of all messages sent into a queue or topic during a specified time window. If any new message is sent carrying a MessageId that has already been logged during the time window, the message is reported as accepted (the send operation succeeds), but the newly sent message is instantly ignored and dropped. No other parts of the message other than the MessageId are considered.

Does polling the same queue from multiple Azure worker roles cause deadlocks or poison messages?

Scenario:
I spin off multiple worker roles, or one worker role with multiple threads, which poll for new messages in an Azure queue.
Could someone please confirm whether this is the correct design approach? The reason I would like to have many worker roles is to speed up the PROCESSJOB. Our application should be near real time, i.e. as soon as there are messages we should get them, apply complex business rules, and commit them to the Azure DB. We are expecting 11,000 messages per 3 minutes.
Thank you.
You may have as many queue-readers as you like. It's very common to scale out worker role instances, as they can all read from the same queue, giving you much greater work throughput.
When you read a queue message, it's marked "invisible" for a period of time, to prevent others from reading and doing the same work. The owner of the message must delete it before the time period expires, otherwise the message becomes visible again, and an exception will be thrown when the original reader attempts to delete it. This means your operations must be idempotent.
There's no direct poison-message handling, but it's easy to implement, as each message has a dequeue count. Just check it and remove poison messages after they've been read 3-4 times. You can also dynamically adjust the timeout period based on the dequeue count, since processing may be failing because the time window is too short.
Here's the MSDN documentation for DequeueCount.
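A sketch of that check with the JavaScript SDK (@azure/storage-queue), parking poison messages on a side queue after 3 failed deliveries; the queue names and the handle function are illustrative:

import {QueueClient} from '@azure/storage-queue';

const MAX_DEQUEUES = 3;

declare function handle(text: string): Promise<void>; // your processing

async function pollOnce(queue: QueueClient, poison: QueueClient) {
  const {receivedMessageItems} = await queue.receiveMessages({numberOfMessages: 32});
  for (const msg of receivedMessageItems) {
    if (msg.dequeueCount > MAX_DEQUEUES) {
      // Poison message: park it for later inspection and remove it from
      // the main queue so it stops burning worker cycles.
      await poison.sendMessage(msg.messageText);
      await queue.deleteMessage(msg.messageId, msg.popReceipt);
      continue;
    }
    try {
      await handle(msg.messageText);
      await queue.deleteMessage(msg.messageId, msg.popReceipt);
    } catch {
      // Do nothing: the message reappears after its visibility timeout,
      // with an incremented dequeue count.
    }
  }
}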
EDIT: As far as processing 11,000 messages in 3 minutes: the scalability target for queues is 2,000 TPS, or up to 360,000 transactions in 3 minutes (far beyond your 11,000-message requirement). You can speed things up further by combining work items into a single queue message, as well as reading multiple messages at a time, which will also reduce your transaction count. You can also look at the ApproximateMessageCount property of a queue to see if your queue is backing up (and then scale out to additional instances to help consume queue items).
