I have a process (Process A) that keeps sending events to an ASB topic. There are multiple consumers of the topic and therefore multiple subscriptions. Let's say that one of the consumers' processes is down, and because of this the topic fills up as its messages are not consumed. Does this mean that Process A also fails, since it can no longer send messages to the full ASB topic?
Two more things to check:
Make sure that your dead-letter queue is not full, because it counts towards the size of the entity.
Make sure that you have at least one subscription that works for each message. For example, if you send a message with ID=1, but you only have a subscription with ID=2, the messages will get backed up.
I think you are correct: once the size limit is reached, the queue or topic stops accepting new messages.
However, with partitioning (using all 16 partitions * 5 GB), you can store up to 80 GB:
https://azure.microsoft.com/en-us/blog/partitioned-service-bus-queues-and-topics/
Another solution is to use auto-forwarding, so that the topic forwards all messages to another queue or topic:
https://azure.microsoft.com/en-us/documentation/articles/service-bus-auto-forwarding/
This way each subscriber can have its own queue of 5 GB (or 80 GB if you use partitioning).
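To make the failure mode concrete, here is a minimal sketch of a toy topic in plain Python. It is not the Service Bus SDK: `ToyTopic`, `QuotaExceeded`, and `run_producer` are hypothetical names, and the model simply assumes that every subscription keeps its own copy of each message and that all retained copies count toward one shared size limit.

```python
class QuotaExceeded(Exception):
    """Raised by the toy topic when a send would exceed its size limit."""


class ToyTopic:
    """Toy model (assumption): each subscription keeps its own copy of every
    message, and all retained copies count toward one shared size limit."""

    def __init__(self, max_size_bytes, subscriptions):
        self.max_size_bytes = max_size_bytes
        self.backlogs = {name: [] for name in subscriptions}

    def size(self):
        # Total bytes still retained across all subscriptions.
        return sum(len(m) for backlog in self.backlogs.values() for m in backlog)

    def send(self, body):
        needed = len(body) * len(self.backlogs)
        if self.size() + needed > self.max_size_bytes:
            raise QuotaExceeded("topic is full")
        for backlog in self.backlogs.values():
            backlog.append(body)

    def receive(self, subscription):
        backlog = self.backlogs[subscription]
        return backlog.pop(0) if backlog else None


def run_producer(topic, healthy_subscriptions, attempts):
    """Send fixed-size events; only healthy subscriptions drain their backlog.
    Returns how many sends succeeded before the topic refused one."""
    sent = 0
    for _ in range(attempts):
        try:
            topic.send(b"x" * 10)
        except QuotaExceeded:
            break
        sent += 1
        for name in healthy_subscriptions:
            topic.receive(name)
    return sent


# One subscriber is down, so its backlog grows until sends are refused.
stalled_topic = ToyTopic(max_size_bytes=100, subscriptions=["sub_ok", "sub_down"])
sent_with_stalled_subscriber = run_producer(stalled_topic, ["sub_ok"], attempts=50)

# Both subscribers are healthy, so the producer never hits the limit.
healthy_topic = ToyTopic(max_size_bytes=100, subscriptions=["sub_ok", "sub_down"])
sent_with_healthy_subscribers = run_producer(healthy_topic, ["sub_ok", "sub_down"], attempts=50)
```

In this model the producer is stopped by the backlog of the one stalled subscription, even though the other subscription is keeping up. That is why giving each subscriber its own forwarded queue isolates the problem.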
Some more info:
https://azure.microsoft.com/nl-nl/documentation/articles/service-bus-azure-and-service-bus-queues-compared-contrasted/
https://azure.microsoft.com/en-us/documentation/articles/service-bus-quotas/
Azure Service Bus entities (queues/topics) support a Time to Live (TTL). When the TTL passes the message expires. On expiry, the system deletes the message OR moves it to the Dead-Letter Queue (DLQ). Does Service Bus have another setting to delete messages from the DLQ after a specified period? For instance, to avoid passing size quotas, we might like to delete messages from the DLQ after six months.
See also:
Do messages in dead letter queues in Azure Service Bus expire?
https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-expiration?WT.mc_id=Portal-Microsoft_Azure_ServiceBus
Azure Service Bus doesn't have an expiration option on the dead-letter queues. This is likely intentional, as the system shouldn't just lose those messages but rather do something about them.
Sometimes, monitoring all dead-letter queues for total size and whatnot is inconvenient. One option is to create a centralized DLQ. That will allow the following:
Monitoring a single "dead-letter" queue.
Receiving messages from a single entity for processing.
Keeping the size under control by specifying a TTL on the queue.
For example, let's say you've got two queues, test-dlq and test-dlq2. You'd configure those to auto-forward dead-lettered messages to a third queue, test-dlq-all. With that, when messages received from test-dlq or test-dlq2 are dead-lettered, they end up in the centralized "DLQ" queue (test-dlq-all).
The nice part is whenever you have messages auto-forwarded, you'll always know where they originally dead-lettered.
For example, let's say you've got two messages, each from a different queue, ending up in test-dlq-all, the centralized "DLQ".
Inspecting each message will reveal a system property, DeadLetterSource, stamped with the name of the queue in which it was originally dead-lettered.
This solution lets you set a TTL on the test-dlq-all queue and have messages auto-purged.
It's also worth mentioning that messages in the centralized "DLQ" can themselves be dead-lettered, either because dead-lettering is set up on it or because processing fails and exceeds MaxDeliveryCount. For that reason, it is worth monitoring test-dlq-all's own DLQ as well.
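The idea can be sketched as a toy model in plain Python. This is not Service Bus code: `CentralDeadLetterQueue`, `forward_from`, and `purge_expired` are hypothetical names, and time is passed in explicitly as a number of seconds so the TTL behaviour is easy to follow. Only the DeadLetterSource property name and the queue names come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    enqueued_at: float  # seconds, on whatever clock the caller uses
    properties: dict = field(default_factory=dict)


class CentralDeadLetterQueue:
    """Toy stand-in for a queue such as test-dlq-all with a TTL set on it."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.messages = []

    def forward_from(self, source_queue, body, now):
        # Forwarded messages keep a record of where they were dead-lettered.
        props = {"DeadLetterSource": source_queue}
        self.messages.append(Message(body, now, props))

    def purge_expired(self, now):
        # Drop every message whose age has reached the TTL.
        self.messages = [
            m for m in self.messages if now - m.enqueued_at < self.ttl_seconds
        ]

    def count_by_source(self):
        counts = {}
        for m in self.messages:
            source = m.properties["DeadLetterSource"]
            counts[source] = counts.get(source, 0) + 1
        return counts


central = CentralDeadLetterQueue(ttl_seconds=100)
central.forward_from("test-dlq", "order-1", now=0)
central.forward_from("test-dlq2", "order-2", now=50)
before_purge = central.count_by_source()

central.purge_expired(now=120)  # order-1 is 120 s old, order-2 is 70 s old
after_purge = central.count_by_source()
```

A single monitor can then watch one queue and still break the numbers down per originating queue.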
Azure Event Hub uses the partitioned consumer pattern described in the docs.
I have some problems understanding the consumer side of this model when it comes to a real world scenario.
So let's say I have 1000 messages sent to the event hub with 4 partitions, without specifying a partition ID. This means the messages will be distributed across all partitions using the round-robin method.
Now I want to have two applications distributing the messages to two different databases. My questions there are:
Let's say for the first application, I want to store all messages in Database 1. This means, for maximum speed, in my consumer application I need to have 4 threads (consumers), each listening to one partition of the event hub, right? Each of them also has to store its own offset for the partition it is reading (checkpoint).
Let's say my second application wants to filter the messages and only store a subset of them in Database 2. There I also need 4 consumers since I don't know which message goes to which partition, right?
Also, for the two applications I need to have two consumer groups, but why? Is the filtering of the messages defined in the consumer group? I don't really get why I need this, since the applications' consumers store the partition checkpoints themselves and I can do the filtering within the applications themselves.
I know there is the EventProcessorHost class but I want to understand the concept of the EventHub on a lower level.
Let's say for the first application, I want to store all messages in Database 1. This means, for maximum speed, in my consumer application I need to have 4 threads (consumers), each listening to one partition of the event hub, right? Each of them also has to store its own offset for the partition it is reading (checkpoint).
Correct, you should have one process per provisioned partition. So, if you have 4 partitions you should have 4 processes, each processing the messages of a specific partition. If you process the messages using an EventProcessorHost, it will take care of spinning up the processes for you.
Let's say my second application wants to filter the messages and only store a subset of them in Database 2. There I also need 4 consumers since I don't know which message goes to which partition, right?
What do you mean by a consumer? You need another 4 processes to process the messages, but they should be configured to read using a different consumer group. Otherwise they will compete with the processes of the first application.
Also, for the two applications I need to have two consumer groups, but why? Is the filtering of the messages defined in the consumer group? I don't really get why I need this, since the applications' consumers store the partition checkpoints themselves and I can do the filtering within the applications themselves.
Let us define a consumer group:
Consumer groups enable multiple consuming applications to each have a separate view of the incoming message stream, and to read the stream independently at its own pace with its own offset
So yes, you need 2 different consumer groups.
Each consumer group will get all messages sent to the event hub partitions. Each consumer group tracks its own progress in the stream of messages. That is why you need two for your scenario.
Say you define an additional consumer group called "App2-Consumer-Group". Its reader processes will receive all messages but should take no action for messages they are not interested in.
If you did not create an additional consumer group, the reader processes for the default consumer group would process the messages for the first application and mark them as processed using the checkpointing mechanism. The reader processes for the second application would not get any messages, since they are already marked as processed. (In real life, with a single consumer group, some messages might be picked up by the reader processes for the first application and some by the reader processes for the second, as the processes will try to get a lock on a specific partition.)
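A minimal sketch of the idea, as a toy model in plain Python rather than the Event Hubs client: `ToyEventHub` and `ConsumerGroup` are hypothetical names, the strict round-robin distribution is a simplification, and the "filter" used for the second application (keep every tenth event) is made up for illustration.

```python
class ToyEventHub:
    """Toy partitioned log: events without a partition key go round-robin."""

    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]
        self._next = 0

    def send(self, event):
        self.partitions[self._next].append(event)
        self._next = (self._next + 1) % len(self.partitions)


class ConsumerGroup:
    """Each consumer group keeps its own offset (checkpoint) per partition."""

    def __init__(self, hub):
        self.hub = hub
        self.offsets = [0] * len(hub.partitions)

    def read_partition(self, partition_id, max_events=None):
        start = self.offsets[partition_id]
        end = None if max_events is None else start + max_events
        events = self.hub.partitions[partition_id][start:end]
        self.offsets[partition_id] = start + len(events)  # checkpoint
        return events

    def read_all(self):
        # In a real deployment this would be one reader per partition.
        events = []
        for partition_id in range(len(self.hub.partitions)):
            events.extend(self.read_partition(partition_id))
        return events


hub = ToyEventHub(partition_count=4)
for i in range(1000):
    hub.send(i)

app1 = ConsumerGroup(hub)  # stores everything in "Database 1"
app2 = ConsumerGroup(hub)  # stores a filtered subset in "Database 2"

database1 = app1.read_all()
database2 = [e for e in app2.read_all() if e % 10 == 0]  # made-up filter
second_read = app1.read_all()  # nothing new since app1's checkpoints
```

Both groups see all 1000 events because neither group's offsets affect the other. The filtering happens in the application, not in the consumer group.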
I think this image shows clearly how consumer groups track their own progress in the stream of messages, and hence why you need two of them if you have different processing logic for the 2 applications:
In an Azure Service Bus topic, how are messages moved from the dead-letter queue back to the topic?
Are they moved to the topic automatically, do we need to configure properties of the topic in the portal, or is there another way to do it? (I would prefer not to write any code here and to make only configuration changes.)
We had a batch of around 60k messages that needed to be reprocessed from the dead-letter queue. Peeking and sending the messages back via Service Bus Explorer took around 6 minutes per 1k messages from my machine. I solved the issue by setting a forward rule for DLQ messages to another queue, and from there auto-forwarding them to the original queue. This solution took around 30 seconds for all 60k messages. This works well for queues and topics.
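To put those figures side by side, here is the arithmetic as a short calculation. It only uses the numbers quoted above and assumes the manual rate of 6 minutes per 1k messages would have held for the whole batch.

```python
total_messages = 60_000
manual_minutes_per_1k = 6
forward_seconds = 30

# Manual resubmission via Service Bus Explorer, extrapolated to the full batch.
manual_minutes = total_messages / 1_000 * manual_minutes_per_1k
manual_hours = manual_minutes / 60

# How many times faster the forwarding approach was.
speedup = manual_minutes * 60 / forward_seconds
```

That is roughly 6 hours of manual resubmission against 30 seconds with forwarding.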
Quite new to RabbitMQ and I'm trying to see if I can achieve what I need with it.
I am looking for the Worker Queues pattern but with one caveat. I want to have only a single worker running concurrently per routing key.
An example for clarification:
If I send messages with the following routing keys, in order: a, a, b, c, I want to have only 3 workers running concurrently. When the first a message is received, a worker picks it up and handles it.
When the next a message is received and the previous a message is still being handled (not acknowledged), the new a message should wait in the queue. When the b and c messages are received, they each get a worker handling them. When the first a message is acknowledged, any worker can pick up the next a message.
Would that pattern be possible using RabbitMQ in a natural way (without writing any application code on my side to handle the locking and so on)?
Edit:
Another clarification. All workers can and should handle all messages, and I don't want to have a queue per Worker as I want to share the load between them, and the Publisher doesn't know which Worker should process the message. But I do want to make sure that no 2 Workers are working on messages sharing the same key at the same time.
For example, if I have a Publisher publishing messages with a userId field, I want to make sure no 2 Workers are handling messages with the same userId at the same time.
Edit 2
Expanding on the userId example. Let's say I have a single Publisher and 3 Workers. The publisher publishes messages like these: { userId: 1, text: 'Hello' }, with varying userIds. My 3 Workers all do the same thing to these messages, so I can have any of them handle the messages coming in. But what I'm trying to achieve is to have only a single worker processing a message from a certain user at the same time. If a Worker has received a message with userId 1 and is still processing it, and another message with userId 1 is received, I want to make sure no other Worker picks up that message. But other messages coming in with different userIds should be processed by other available Workers.
userIds are not known beforehand, and the publisher doesn't know how many workers there are or anything specific about them; it just wants to schedule the messages for processing.
What you're asking is not possible with routing keys, but is built into queues with a few settings.
If you define "queue_a" for a messages, "queue_b" for b messages, etc., you can then have as many consumers connect to each queue as you want.
RabbitMQ will only deliver a given message to a single consumer of a given queue.
The way it works with multiple consumers on a single queue is basic round-robin style dispatch of the messages. That is, the first message will be delivered to one of the consumers, and the next message (assuming the first consumer is still busy) will be delivered to the next consumer.
So, that should satisfy the need to deliver the message to any given consumer of the queue.
To ensure your messages have an equal chance of getting to any of the consumers (and are not all delivered to the same consumer all the time), there are a few other settings you should put in place.
First, make sure to set the consumer's "no ack" setting (sometimes called "auto ack") to false. This will force you to ack the message from your code.
Lastly, set the "consumer prefetch" limit of the consumer to 1.
With this combination of settings, a single consumer will retrieve a single message and begin working on it. While that consumer is working, any message waiting in the queue will be delivered to other consumers if any are available. If there are none available, the message will wait in the queue until a consumer is available.
With this, you should be able to achieve the behavior you want on a given queue.
...
Keep in mind this only applies to queues, though. Routing keys cannot be managed this way. All matched routing keys from an exchange will cause a copy of the message to be sent to the destination queue.
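The effect of manual acks plus a prefetch limit of 1 can be sketched with a small toy dispatcher in plain Python. This is not a RabbitMQ client: `ToyQueue` and `ToyConsumer` are hypothetical names, and for simplicity the dispatcher hands a message to the first consumer with spare capacity rather than rotating between consumers. In the pika client, the corresponding settings would be `basic_qos(prefetch_count=1)` and `auto_ack=False` on `basic_consume`.

```python
from collections import deque


class ToyConsumer:
    def __init__(self, name, prefetch=1):
        self.name = name
        self.prefetch = prefetch
        self.unacked = []  # delivered but not yet acknowledged

    def has_capacity(self):
        return len(self.unacked) < self.prefetch


class ToyQueue:
    """Toy single queue shared by several consumers, with manual acks."""

    def __init__(self, consumers):
        self.ready = deque()
        self.consumers = consumers
        self.deliveries = []  # (consumer name, message) in delivery order

    def publish(self, message):
        self.ready.append(message)
        self._dispatch()

    def ack(self, consumer, message):
        consumer.unacked.remove(message)
        self._dispatch()  # an ack frees capacity, so try to deliver again

    def _dispatch(self):
        while self.ready:
            free = next((c for c in self.consumers if c.has_capacity()), None)
            if free is None:
                return  # everyone is busy; messages wait in the queue
            message = self.ready.popleft()
            free.unacked.append(message)
            self.deliveries.append((free.name, message))


w1, w2 = ToyConsumer("w1"), ToyConsumer("w2")
queue = ToyQueue([w1, w2])
for m in ["m1", "m2", "m3"]:
    queue.publish(m)

waiting_while_busy = list(queue.ready)  # m3 has to wait for a free consumer
queue.ack(w1, "m1")                     # w1 finishes, so m3 is delivered to it
```

Each message goes to exactly one consumer, and a busy consumer never holds more than one unacknowledged message.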
I have a queue in Azure Storage named, for example, 'messages'. Every hour some service pushes a number of messages to this queue that should update data. But in some cases I also push a message to this queue from another place, and I want this message to be processed immediately, and I cannot set a priority for it.
What is the best solution for this problem?
Can I use two different queues ('messages' and 'messages-priority'), or is that a bad approach?
The correct approach is to use multiple queues: a 'normal priority' and a 'high priority' queue. What we have implemented is multiple queue reader threads in a single worker role. Each thread first checks the high priority queue and, if it's empty, looks in the normal queue. This way the high priority messages will be processed by the first available thread (pretty much immediately), and the same code runs regardless of where messages come from. It also avoids having a dedicated reader continuously polling a single queue and having to back off because messages seldom arrive.
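The polling order described above can be sketched as follows. This is a minimal in-memory illustration using `collections.deque` in place of the two Azure Storage queues. The function name `next_message` is hypothetical, and a real reader would also need visibility timeouts, deletion after processing, and a back-off when both queues are empty.

```python
from collections import deque


def next_message(high_priority, normal):
    """Return the next message to process, always preferring high priority."""
    if high_priority:
        return high_priority.popleft()
    if normal:
        return normal.popleft()
    return None  # both queues are empty; the caller should back off


normal_queue = deque(["update-1", "update-2"])
priority_queue = deque()

first = next_message(priority_queue, normal_queue)   # only normal work exists
priority_queue.append("urgent")                      # urgent message arrives
second = next_message(priority_queue, normal_queue)  # urgent jumps the line
third = next_message(priority_queue, normal_queue)
fourth = next_message(priority_queue, normal_queue)
```

Every reader thread would run the same loop, so an urgent message waits only until the first thread finishes its current item.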