Spring integration queue, any way to get current queue size? - spring-integration

We have scenario that lots of message from external system need to be processed async, current design is to have a job wake up every 5 mins to pull msg from external system, and then persist raw msg, and then send msg id to ExecutorChannel, so consumer(potentially many) can consume from channel.
The problem we are facing is how to deal with system crash while msgs in queue, somehow every time job wake up, we will need to look into our DB to find out if there is any raw msgs not in queue already.
The easiest way is to query current queue size and find out if there are more raw msg than msg in queue. So question I have is: is any API for ExecutorChannel to find out size of queue? or any other suggestion?
Thx
Jason

Spring Integration itself doesn't maintain a queue within an ExecutorChannel; the messages are executed by the underlying Executor.
If you are using a Spring ThreadPoolTaskExecutor which is dedicated to the channel, you could drill down to the channel's underlying ThreadPoolTaskExecutor's ThreadPoolExecutor, and get a handle to its BlockingQueue (getQueue()) and get it's count.
However, you'd have to add the active task count as well.
The total count would be approximate, though because the ThreadPoolExecutor has no atomic method to get a count of queued and active tasks.

Related

Control Azure Service Bus Queue Message Reception

We have a distributed architecture and there is a native system which needs to be called. The challenge is the capacity of the system which is not scalable and cannot take on more load of requests at same time. We have implemented Service Bus queues, where there is a Message handler listening to this queue and makes a call to the native system. The current challenge is whenever a message posted in the queue, the message handler is immediately processing the request. However, We wanted to have a scenario to only process two requests at a time. Pick the two, process it and then move on to the next two. Does Service Bus Queue provide inbuilt option to control this or should we only be able to do with custom logic?
var options = new MessageHandlerOptions()
{
MaxConcurrentCalls = 1,
AutoComplete = false
};
client.RegisterMessageHandler(
async (message, cancellationToken) =>
{
try
{
//Handler to process
await client.CompleteAsync(message.SystemProperties.LockToken);
}
catch
{
await client.AbandonAsync(message.SystemProperties.LockToken);
}
}, options);
Message Handler API is designed for concurrency. If you'd like to process two messages at any given point in time then the Handler API with maximum concurrency of two will be your answer. In case you need to process a batch of two messages at any given point in time, this API is not what you need. Rather, fall back to building your own message pump using a lower level API outlined in the answer provided by Mikolaj.
Careful with re-locking messages though. It's not a guaranteed operation as it's a client-side operation and if there's a communication network, currently, the broker will reset the lock and the message will be processed again by another competing consumer if you scale out. That is why scaling-out in your scenario is probably going to be a challenge.
Additional point is about lower level API of the MessageReceiver when it comes to receiving more than a single message - ReceiveAsync(n) does not guarantee n messages will be retrieved. If you absolutely have to have n messages, you'll need to loop to ensure there are n and no less.
And the last point about the management client and getting a queue message count - strongly suggest not to do that. The management client is not intended for frequent use at run-time. Rather, it's uses for occasional calls as these calls are very slow. Given you might end up with a single processing endpoint constrained to only two messages at a time (not even per second), these calls will add to the overall time to process.
From the top of my head I don't think anything like that is supported out of the box, so your best bet is to do it yourself.
I would suggest you look at the ReceiveAsync() method, which allows you to receive specific amount of messages (NOTE: I don't think it guarantees that if you specify that you want to retrieve 2 message it will always get you two. For instance, if there's just one message in the queue then it will probably return that one, even though you asked for two)
You could potentially use the ReceiveAsync() method in combination with PeekAsync() method where you can also provide a number of messages you want to peek. If the peeked number of messages is 2 than you can call ReceiveAsync() with better chances of getting desired two messages.
Another way would be to have a look at the ManagementClient and the GetQueueRuntimeInfoAsync() method of the queue, which will give you the information about the number of messages in the queue. With that info you could then call the ReceiveAsync() mentioned earlier.
However, be aware that if you have multiple receivers listening to the same queue then there's no guarantees that anything from above will work, as there's no way to determine if these messages were received by another process or not.
It might be that you will need to go with a more sophisticated way of handling this and receive one message, then keep it alive (renew lock etc.) until you get another message and then process them together.
I don't think I helped too much but maybe at least it will give you some ideas.

Best Practice for Batch Processing with RabbitMQ

I'm looking for the best way to preform ETL using Python.
I'm having a channel in RabbitMQ which send events (can be even every second).
I want to process every 1000 of them.
The main problem is that RabbitMQ interface (I'm using pika) raise callback upon every message.
I looked at Celery framework, however the batch feature was depreciated in version 3.
What is the best way to do it? I thinking about saving my events in a list, and when it reaches 1000 to copy it to other list and preform my processing. However, how do I make it thread-safe? I don't want to lose events, and I'm afraid of losing events while synchronising the list.
It sounds like a very simple use-case, however I didn't find any good best practice for it.
How do I make it thread-safe?
How about set consumer prefetch-count=1000. If a consumer's unack messages reach its prefetch limit, rabbitmq will not deliver any message to it.
Don't ACK received message, until you have 1000 messages, then copy it to other list and preform your processing. When your job done, ACK the last message, and all message before this message will be ACK by rabbitmq server.
But I am not sure whether large prefetch is the best practice.
First of all, you should not "batch" messages from RabbitMQ unless you really have to. The most efficient way to work with messaging is to process each message independently.
If you need to combine messages in a batch, I would use a separate data store to temporarily store the messages, and then process them when they reach a certain condition. Each time you add an item to the batch, you check that condition (for example, you reached 1000 messages) and trigger the processing of the batch.
This is better than keeping a list in memory, because if your service dies, the messages will still be persisted in the database.
Note : If you have a single processor per queue, this can work without any synchronization mechanism. If you have multiple processors, you will need to implement some sort of locking mechanism.

how to process hundreds of JMS message from 2 queues, response time of 1 second and 1 minute respectively

I have business requirement where I have to process messages in a certain priority say priority1 and priority2
We have decided to use 2 JMS queues where priority1 messages will be sent to priority1Queue and priority2 messages will be sent to priority2Queue.
Response time for priority1Queue messages is that the moment message is in Queue, I need to read, process and send the response back to say another queue in 1 second. This means I should immediately process these messages the moment they are in priority1Queue, and I will have hundreds of such messages coming in per second on priority1Queue so I will definitely need to have multiple concurrent consumers consuming messages on this queue so that they can be processed immediately when they are in the queue(consumed and processed within 1 second).
Response time for priority2Queue messages is that I need to read, process and send the response back to say another queue in 1 minute. So the response time of priority2 is lower to priority1 messages however I still need to respond back in a minute.
Can you suggest best possible approach for this so that I can concurrently read messages from both the queue and give higher priority to priority1 messages so that each priority1 message can be read and processed in 1 second.
Mainly how it can be read and fed to a processor so that the next message can be read and so on.
I need to write a java based component that does the reading and processing.
I also need to ensure this component is highly available and doesn't result in OutOfMemory, I will be having this component running across multiple JVMS and multiple application servers thus I can have multiple clusters running this Java component
First off, the requirement to process within 1 second is not going to be dependent on your messaging approach, but more about the actual processing of the message and the raw CPUs available. Picking up 100s of messages per second from a queue is child's play, the JMS provider is most likely not the issue. Depending on your deployment platform (Tomcat, Mule, JEE, whatever), there should be a way to have n listeners to scale up appropriately. Because the messages exist on the queue until you pick it up, doubtful you'll run out of memory. I've done these apps, processed many more messages without problems.
Second, number of strategies for prioritizing messages, not necessarily requiring different queues, using priorities. I'm leaning towards using message priorities and message filters, where one group of listeners take care of the highest priority messages and another listener filters off lower priority but makes sure it does enough to get them out within a minute.
You could also do something where a lower priority message gets rewritten back to the same queue with a higher priority, based on how close to 1 minute you are. I know that sounds wrong, but reading/writing from JMS has very little overhead (at least compared to do the equivalent, column-driven database transactions), but the listener for lower priority messages could just continually increase the priority until it has to be processed.
Or simpler, just have more listeners on the high priority queue/messages than the lower priority ones, and imbalance in number of processes for messages might be all it needs.
Lots of possibilities, time for a PoC.

How does one determine if all messages in an Azure Queue have been processed?

I've just begun tinkering with Windows Azure and would appreciate help with a question.
How does one determine if a Windows Azure Queue is empty and that all work-items in it have been processed? If I have multiple worker processes querying a work-item queue, GetMessage(s) returns no messages if the queue is empty. But there is no guarantee that a currently invisible message will not be pushed back into the queue.
I need this functionality since follow-up behavior of my workflow depends on completion of all work-items in that particular queue. A possible way of tackling this problem would be to count the number of puts and deletes. But this will again require synchronization at a shared storage level and I would like to avoid it if possible.
Any ideas?
Take a look at the ApproximateMessageCount method. This should return the number of messages on the queue, including invisible messages (e.g. the ones being processed).
Mike Wood blogged about this subtlety, along with a tidbit about the queue's Clear method, here.
That said: you might want to choose a different mechanism for workflow management. Maybe a table row, where you have your rowkey equal to some multi-queue-item transation id, and individual properties being status flags. This allows you to track failed parts of the transaction (say, 9 out of 10 queue items process ok, the 10th fails; you can still delete the 10th queue item, but set its status flag to failed, then letting you deal with this scenario accordingly). Also: let's say you use the same queue to process another 'transaction' (meaning the queue is again non-zero in length). By using a separate object like a Table Row, you can still determine that your 'transaction' is complete even though there are additional queue messages.
The best way is to have another queue, call it termination indicator queue, and put a message in that queue for every message your process from your main queue. That is how it is done in research projects too. Check this out http://www.cs.gsu.edu/dimos/content/gis-vector-data-overlay-processing-azure-platform.html

does multiple Azure worker role polling same Queue causes Dead Lock or Poison message

Scenario:
if I've spin off multiple Worker roles or ONE Worker role with multiple threads, which polls the new messages in Azure Queue.
Could someone please confirm if the this the correct design approach? The reason I would like to have many worker roles is to speed up the PROCESSJOB. Our application should be near real time, i.e. as soon as there are messages we should get, apply complex business rules and commit to AZURE DB. We are expecting 11,000 message per 3min.
Thank you.
You may have as many queue-readers as you like. It's very common to scale out worker role instances, as they can all read from the same queue, giving you much greater work throughput.
When you read a queue message, it's marked "invisible" for a period of time, to prevent others from reading and doing the same work. The owner of the message must delete it before the time period expires, otherwise the message becomes visible again, and an exception will be thrown when the original reader attempts to delete it. This means your operations must be idempotent.
There's no direct poison-message handling, but it's easy to implement, as each message has a dequeue count. Just check it and remove poison messages after being read 3-4 times. You can also dynamically adjust the timeout period based on dequeue count, as maybe the processing fails due to too-short a time window.
Here's the MSDN documentation for DequeueCount.
EDIT: As far as processing 11,000 messages in 3 minutes: the scalability target for queues is 500 2,000 TPS, or up to 360,000 transactions in 3 minutes (far beyond the 11,000 message requirement you have). You can speed things up further by combining messages into a single queue message, as well as reading multiple messages at a time, which will also reduce your transaction count. You can also look at the ApproximateMessageCount property of a queue to see if your queue is backing up (and then scaling out to additional intstances to help consume queue items).

Resources