Acknowledging remaining messages for consumer in RabbitMQ - python-3.x

I have a queue and 3 consumers bound to the queue. Each consumer has a prefetch_count of 250 (or, say, X), and manual acknowledgement is done every prefetch_count/2 (i.e. 125) messages, meaning the consumer acknowledges 125 messages in a single go (which reduces round trips and hence increases performance). Everything works as expected, except when there are no new messages in the queue and a consumer is left holding fewer than 125 unacknowledged messages.
Because the acknowledgement is only sent when the count reaches 125, these unacknowledged messages keep being requeued. How can I solve this?
How can I know that my consumer has no new messages to process, so it can acknowledge all the remaining messages waiting to be acknowledged?

If I understand your scenario correctly, it sounds as though you have a series of messages that get published all at once, and then you process them in batches until you have none left. The problem is that if the total number of messages isn't divisible by 125, your final batch never gets acknowledged. Clearly this is a logical problem, but it sounds like you are wondering if there is an easy way to deal with it.
Your question "How can I know that my consumer has no new messages to process?" is based upon a premise which RabbitMQ does not support -- namely, the "end" of a sequence of messages. RabbitMQ consumers expect to continue to receive messages indefinitely, so from their perspective, there is no such thing as "done."
Thus, any such concept must be implemented elsewhere, higher up in your application logic. Here are some options for you to consider:
If you know in advance how many messages will be processed, then send that count first and store it. Send the final ack once you have processed that number (assuming no duplicates were processed).
Monitor the in-memory collection at the consumer (all pre-fetched messages reside here until they are actually processed). When it drops below 125, you know that you have a batch size less than that.
Similar to #1, send a special "last message" that the consumer can recognize and acknowledge everything upon receipt (see the sketch below).
Caveat: I would argue that you have a deeper design problem that is leading you down a path where this would ever be desirable in the first place. Each message should be 100% independent of any other message. If that assumption is violated, you will have a very fragile system.
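For what it's worth, here is a minimal pika sketch of option #3, assuming the producer sends an agreed-upon sentinel as its final message; the queue name, sentinel value, and process() stub are illustrative, not from the question:

    import pika

    BATCH = 125                      # ack every prefetch_count/2 messages, as in the question
    SENTINEL = b"__END_OF_STREAM__"  # hypothetical marker the producer agrees to send last

    def process(body):
        pass  # placeholder for the real work

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.basic_qos(prefetch_count=250)

    unacked = 0

    def on_message(ch, method, properties, body):
        global unacked
        if body == SENTINEL:
            # End of stream: one basic.ack with multiple=True covers this
            # delivery and every earlier unacknowledged one.
            ch.basic_ack(delivery_tag=method.delivery_tag, multiple=True)
            unacked = 0
            return
        process(body)
        unacked += 1
        if unacked >= BATCH:
            ch.basic_ack(delivery_tag=method.delivery_tag, multiple=True)
            unacked = 0

    channel.basic_consume(queue="work", on_message_callback=on_message)
    channel.start_consuming()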

Related

Best Practice for Batch Processing with RabbitMQ

I'm looking for the best way to perform ETL using Python.
I have a channel in RabbitMQ which sends events (possibly as often as every second).
I want to process them in batches of 1000.
The main problem is that the RabbitMQ interface (I'm using pika) raises a callback for every message.
I looked at the Celery framework, but its batch feature was deprecated in version 3.
What is the best way to do this? I'm thinking about saving my events in a list, and when it reaches 1000, copying it to another list and performing my processing. However, how do I make that thread-safe? I don't want to lose events while synchronising the list.
It sounds like a very simple use case, yet I haven't found any good best practice for it.
How do I make it thread-safe?
How about setting the consumer prefetch_count to 1000? If a consumer's unacked messages reach its prefetch limit, RabbitMQ will not deliver any more messages to it.
Don't ACK received messages until you have 1000 of them, then copy them to another list and perform your processing. When your job is done, ACK the last message with the multiple flag set, and all messages before it will be ACKed by the RabbitMQ server in one go (see the sketch below).
But I am not sure whether such a large prefetch is best practice.
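A minimal sketch of that idea with pika (the queue name and handle_batch() stub are placeholders). Note that with pika's BlockingConnection all callbacks run on a single thread, which sidesteps the thread-safety concern:

    import pika

    BATCH_SIZE = 1000
    batch = []  # buffered (delivery_tag, body) pairs

    def handle_batch(bodies):
        pass  # placeholder: run the ETL step over the 1000 events

    def on_message(ch, method, properties, body):
        batch.append((method.delivery_tag, body))
        if len(batch) >= BATCH_SIZE:
            handle_batch([b for _, b in batch])
            # A single ack with multiple=True acknowledges every buffered message.
            ch.basic_ack(delivery_tag=batch[-1][0], multiple=True)
            batch.clear()

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.basic_qos(prefetch_count=BATCH_SIZE)  # RabbitMQ stops delivering at 1000 unacked
    channel.basic_consume(queue="events", on_message_callback=on_message)
    channel.start_consuming()  # one thread runs all callbacks, so no locking is needed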
First of all, you should not "batch" messages from RabbitMQ unless you really have to. The most efficient way to work with messaging is to process each message independently.
If you need to combine messages in a batch, I would use a separate data store to temporarily store the messages, and then process them when they reach a certain condition. Each time you add an item to the batch, you check that condition (for example, you reached 1000 messages) and trigger the processing of the batch.
This is better than keeping a list in memory, because if your service dies, the messages will still be persisted in the database.
Note: If you have a single processor per queue, this can work without any synchronization mechanism. If you have multiple processors, you will need to implement some sort of locking mechanism.
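A sketch of that approach, using SQLite as a stand-in for the separate data store (the table name and batch size are illustrative):

    import sqlite3

    BATCH_SIZE = 1000

    db = sqlite3.connect("batch_buffer.db")
    db.execute("CREATE TABLE IF NOT EXISTS pending (id INTEGER PRIMARY KEY, body BLOB)")

    def process_batch(bodies):
        pass  # placeholder for the real batch processing

    def add_to_batch(body):
        """Persist one event, then process and clear the batch once it is full."""
        with db:  # commits the insert even if the process dies right after
            db.execute("INSERT INTO pending (body) VALUES (?)", (body,))
        (count,) = db.execute("SELECT COUNT(*) FROM pending").fetchone()
        if count >= BATCH_SIZE:
            rows = db.execute("SELECT id, body FROM pending ORDER BY id").fetchall()
            process_batch([b for _, b in rows])
            with db:
                db.execute("DELETE FROM pending WHERE id <= ?", (rows[-1][0],))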

Message Collapsing

I'm trying to determine if there's a way for Azure Service Bus to provide message collapsing. Specifically I'm after something like:
First event into a queue gets picked up straight away
All other events that are queued within the next N seconds, and match some criteria (e.g. matching message IDs), have their scheduled enqueue time set so they fire at the end of the N seconds. If a "waiting" message already exists, it should be deleted.
After the N seconds has expired the newest scheduled message appears and is picked up.
Basically I need a way to get a good time-to-first-event, but provide protection from over-processing events from chatty sources.
Does anyone have a pattern they've used to get something close to these semantics?
Update 1
The messages involved aren't true duplicates; rather, they're the current state of an entity that is used for some processing (e.g. a message that's generated each time a file is updated). The result of processing an early message is fully replaced by that of later messages (e.g. the result is the size of the file). So we still need to guarantee we process the most recent message, but it's a waste to process all M messages that arrive within the N seconds.
It sounds like you're talking about Duplicate Detection, especially in regards to matching MessageIds. If you want to evaluate some other attribute in the message for duplicate detection, maybe it's worth taking a step back and asking: why are my publishers sending so many duplicate messages? If it's unavoidable, maybe you can segregate your chatty sources into a separate consumer group and manually handle the duplicate check, then re-enqueue (just thinking out loud).
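If MessageId matching is enough, duplicate detection is enabled when the queue is created. A sketch with the azure-servicebus v7 Python SDK (the connection string and queue name are placeholders); note this keeps the first message seen in the window and drops later duplicates, which is the opposite of "keep the newest," so it only partially fits the scenario in Update 1:

    import datetime
    from azure.servicebus.management import ServiceBusAdministrationClient

    # Hypothetical connection string and queue name.
    mgmt = ServiceBusAdministrationClient.from_connection_string("<connection-string>")
    mgmt.create_queue(
        "file-updates",
        requires_duplicate_detection=True,                                       # matches on MessageId
        duplicate_detection_history_time_window=datetime.timedelta(seconds=30),  # the "N seconds" window
    )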

How to process hundreds of JMS messages from 2 queues, with response times of 1 second and 1 minute respectively

I have a business requirement where I have to process messages at a certain priority, say priority1 and priority2.
We have decided to use 2 JMS queues where priority1 messages will be sent to priority1Queue and priority2 messages will be sent to priority2Queue.
The response-time requirement for priority1Queue is that the moment a message is in the queue, I need to read it, process it, and send the response to, say, another queue within 1 second. This means I should process these messages the moment they land in priority1Queue. I will have hundreds of such messages coming in per second, so I will definitely need multiple concurrent consumers on this queue so that each message is consumed and processed within 1 second.
The response-time requirement for priority2Queue is that I need to read, process, and send the response back within 1 minute. So the response time for priority2 is less strict than for priority1 messages, but I still need to respond within a minute.
Can you suggest the best possible approach so that I can concurrently read messages from both queues and give higher priority to priority1 messages, so that each priority1 message can be read and processed within 1 second?
Mainly how it can be read and fed to a processor so that the next message can be read and so on.
I need to write a java based component that does the reading and processing.
I also need to ensure this component is highly available and doesn't run out of memory. I will be running this component across multiple JVMs and multiple application servers, so I can have multiple clusters running this Java component.
First off, meeting the 1-second requirement is not going to depend on your messaging approach, but on the actual processing of the message and the raw CPUs available. Picking up hundreds of messages per second from a queue is child's play; the JMS provider is most likely not the issue. Depending on your deployment platform (Tomcat, Mule, JEE, whatever), there should be a way to run n listeners and scale up appropriately. Because messages stay on the queue until you pick them up, it's doubtful you'll run out of memory. I've built these kinds of apps and processed many more messages without problems.
Second, there are a number of strategies for prioritizing messages that don't necessarily require different queues. I'm leaning towards message priorities and message selectors, where one group of listeners takes care of the highest-priority messages and another filters for lower-priority ones but makes sure it processes enough of them to get them out within a minute.
You could also rewrite a lower-priority message back to the same queue with a higher priority, based on how close to the 1-minute deadline you are. I know that sounds wrong, but reading/writing from JMS has very little overhead (at least compared to the equivalent database transactions), and the listener for lower-priority messages could keep increasing the priority until the message has to be processed.
Or, more simply, just run more listeners on the high-priority queue/messages than on the lower-priority ones; an imbalance in the number of listeners might be all it needs (see the sketch below).
Lots of possibilities, time for a PoC.
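To illustrate the last suggestion in a language-agnostic way, here is a Python sketch where threads and queue.Queue stand in for JMS listeners and queues (the pool sizes are arbitrary):

    import queue
    import threading

    priority1 = queue.Queue()  # in-process stand-ins for the two JMS queues
    priority2 = queue.Queue()

    def handle(msg):
        pass  # placeholder: read, process, send the response

    def worker(q):
        while True:
            handle(q.get())
            q.task_done()

    # Imbalanced pools: many listeners for priority1, a few for priority2.
    for _ in range(8):
        threading.Thread(target=worker, args=(priority1,), daemon=True).start()
    for _ in range(2):
        threading.Thread(target=worker, args=(priority2,), daemon=True).start()
    # In a real app the main thread would now feed the queues and join() them.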

How to check if message is dropped due to HWM at send in ZeroMQ PUB-SUB pattern

I have implemented a message bus in Linux for IPC using ZeroMQ (more specifically CZMQ). Here is what I have implemented.
My question is, how do I know that send dropped the message when the publisher buffer is full?
In my simple test setup, I am using a publisher and subscriber with a proxy. I have a fast sender and a very slow receiver, causing messages to hit the HWM and be dropped on send. My expectation was that send would fail with a 'message dropped' error, but that is not the case: zmq_msg_send() gives no error even though messages get dropped (I can verify this by seeing gaps in the messages at the subscriber end).
How can I know when the messages get dropped? If this is the intended behaviour and ZeroMQ does not let us know that, what is a workaround to find if my send dropped the message?
What you appear to be asking for is fault tolerance, for which PUB/SUB isn't ideal. Not only may the HWM be reached, but consider what happens if a subscribing client dies and gets restarted: it will miss the messages the publisher sent in the meantime. FWIW, in ZMQ v2 the default HWM was infinite for PUB/SUB, but it was changed to 1000 in v3 because systems were choking for memory when messages were queued faster than they could be sent. 1000 seemed like a reasonable value for bursts of messages when the average message rate was within the network bandwidth. YMMV.
If you just want to know when messages get dropped, it's as simple as adding an incrementing sequence number to each message and having the subscribers monitor it. You could choose to place this number in its own frame or not; overall simplicity will be the decider. I don't believe it's possible to determine that messages were dropped specifically because the HWM was reached.
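A small pyzmq sketch of that technique, assuming the publisher prepends an incrementing sequence number in its own frame (the endpoint is illustrative):

    import zmq

    # Assumes the publisher sends two frames per message: [sequence_number, payload],
    # e.g. pub.send_multipart([str(seq).encode(), payload]).
    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:5556")  # hypothetical publisher endpoint
    sub.setsockopt(zmq.SUBSCRIBE, b"")

    expected = None
    while True:
        seq_frame, payload = sub.recv_multipart()
        seq = int(seq_frame)
        if expected is not None and seq != expected:
            print(f"gap: {seq - expected} message(s) dropped before #{seq}")
        expected = seq + 1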
Recent versions of ZeroMQ default to a high-water mark (ZMQ_SNDHWM/ZMQ_RCVHWM) of 1000 messages for PUB/SUB sockets.
This means that if you burst more than 1000 messages in a tight loop, it will probably drop some. It is simple to write a test: give each message a payload with a sequence number.
One option is to set both HWMs to 0, which means "no limit" (see the snippet below).
You can play about with this using some examples I wrote recently:
https://gist.github.com/easytiger/992b3a29eb5c8545d289
https://gist.github.com/easytiger/e382502badab49856357
They will pub and sub on a transport in a burst of messages. If you play with the HWM you can see that in big bursts, if it isn't 0, a great many messages are dropped.
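Setting both HWMs to 0 looks like this in pyzmq (keep in mind that an unlimited HWM trades dropped messages for unbounded memory growth):

    import zmq

    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.setsockopt(zmq.SNDHWM, 0)  # 0 = no limit; must be set before bind/connect
    pub.bind("tcp://*:5556")

    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.RCVHWM, 0)  # likewise unlimited on the receiving side
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.connect("tcp://localhost:5556")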

Is there any issue with resending a message back to the Azure queue

I've got a scheduler and some workers in Azure. The scheduler puts messages into a queue and the workers pull those messages and work on them. I've now come across a scenario where I need to move some data from table storage to our database once a certain threshold has been reached. These items need to be processed in order, oldest first. Once that threshold is met, all the other items are processed in order. The message that triggered the transfer needs to be sent to the back of the line and reprocessed.
So, to the meat of my question...
Is it fine to simply resend the message to the queue as is or is there a potential for that to cause problems?
queueProvider.SendMessage(message);
A co-worker mentioned that he "thought he might have read something about needing to do something special." I haven't seen anything to confirm his suspicion, so I thought I would pose the question here just to be safe.
The short answer is that it is fine. If you have a CloudQueueMessage, you can just send it to any queue (it is just a REST request at the end of the day). Every time you call AddMessage(), it creates a new ID (the pop receipt might be the same, but that doesn't matter). That being said, there are some things you might want to take care of and/or investigate:
If you pop a message from one queue and push it to another queue (or the same one), you should also delete the original message. Merely popping it only sets the invisibility timeout, so it will reappear soon, and you would then have two messages with identical content (see the sketch after this list).
You can now update messages. This might be appropriate for you if you need ordering. You can indicate on the message itself in metadata or content what stage of processing it is in and you get some ordering here with a thoughtful implementation.
It is recommended that all logic inside the consumer of the queue be idempotent since a message can actually be picked up more than once. We have to keep in mind that the queue service guarantees that a message will be delivered, AT LEAST ONCE - so you could end up duplicating messages with this approach.
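A sketch of the pop/re-push pattern with the current azure-storage-queue Python SDK, rather than the older CloudQueue API the answer mentions (the connection string, queue name, and helper functions are placeholders):

    from azure.storage.queue import QueueClient

    def ready_to_process(content):
        return True  # placeholder for the threshold check

    def process(content):
        pass         # placeholder for the real (idempotent) work

    # Hypothetical connection string and queue name.
    q = QueueClient.from_connection_string("<connection-string>", "work-items")

    for msg in q.receive_messages():
        if ready_to_process(msg.content):
            process(msg.content)
        else:
            q.send_message(msg.content)  # enqueue a fresh copy at the back (gets a new ID)
        q.delete_message(msg)            # always remove the original so it cannot reappear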
