Camel: How to process IMAP emails in parallel? - multithreading

I am trying to create a Camel route that will process incoming IMAP messages in parallel. The mail component should distribute incoming mails to different threads (but every message should pass the two process steps in order).
Something like this:
from("imap://...")
.threads(4)
.process(new FirstProcessor())
.process(new SecondProcessor());
This seems to send new messages to different threads, but not in parallel (thread n+1 starts only after thread n finishes). How can I achieve parallel processing here?

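This is not supported by the camel-mail consumer. It processes the mails in sequence, using the same thread on the consumer side.
You need to hand each message off asynchronously instead, for example with wireTap, or by sending it to a seda queue without waiting for the task to complete (waitForTaskToComplete=Never) and processing it from a second route.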
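The hand-off idea can be sketched without Camel at all. The following is a minimal plain-Java illustration, not Camel code: a single "consumer" loop submits each mail to a worker pool and moves on, while each mail still passes both steps in order on whichever worker picks it up. The class name MailHandOff and the two step methods are hypothetical stand-ins for the route and its processors. In Camel itself, the rough equivalent would be a seda endpoint with concurrentConsumers set on the consuming route.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the mail route: one polling loop, several workers.
class MailHandOff {

    // Runs both steps for every mail on a worker pool and returns the results.
    static List<String> processAll(List<String> mails, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<String> results = new CopyOnWriteArrayList<>();
        // The "consumer" loop only submits; it never waits for a mail to finish.
        for (String mail : mails) {
            // Both steps run back to back on the same worker, so their order is kept.
            pool.execute(() -> results.add(secondStep(firstStep(mail))));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    static String firstStep(String mail) { return mail + ":first"; }

    static String secondStep(String mail) { return mail + ":second"; }

    public static void main(String[] args) {
        // The order of the results may differ between runs.
        System.out.println(processAll(List.of("a", "b", "c"), 4));
    }
}
```

Note that the results arrive in completion order, not in the order the mails were read.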

Related

Azure Service Bus - MaxConcurrentCallsPerSession > 1 doesn't guarantee FIFO processing for a queue with sessions

I'm trying to process multiple messages at the same time from the same session and want FIFO order guaranteed. It works only with MaxConcurrentCallsPerSession = 1 on ServiceBusSessionProcessorOptions.
When I try MaxConcurrentCallsPerSession > 1, my message handler receives messages from the session in no particular order.
So, if I want to guarantee FIFO processing of a session, does it work only with serial processing?
You cannot process messages in a specific order and process many messages at the same time.
Even if you read the messages from the queue in order, there is no control over how long each message takes to process. The end processing time for each message would then appear to be random if you read messages concurrently.
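To make the point concrete, here is a small plain-Java sketch (it does not use the Service Bus SDK). Two messages are dequeued in order, and message 1 is deliberately the slow one: it stays busy until message 2 has finished, or for 200 ms if message 2 never gets a chance to run. The pool size is a stand-in for the concurrency setting; the class name and the latch trick are assumptions made purely to force the interleaving.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CompletionOrder {

    // Returns the order in which messages 1 and 2 finish for a given pool size.
    static List<Integer> completionOrder(int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        List<Integer> completed = new CopyOnWriteArrayList<>();
        CountDownLatch secondDone = new CountDownLatch(1);
        try {
            // Message 1 is dequeued first but is slow.
            pool.execute(() -> {
                try {
                    secondDone.await(200, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completed.add(1);
            });
            // Message 2 is dequeued second and finishes immediately.
            pool.execute(() -> {
                completed.add(2);
                secondDone.countDown();
            });
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println("serial:     " + completionOrder(1));
        System.out.println("concurrent: " + completionOrder(2));
    }
}
```

With one worker the messages finish in dequeue order; with two, the fast message overtakes the slow one even though both were read in order.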

Best Practice for Batch Processing with RabbitMQ

I'm looking for the best way to perform ETL using Python.
I have a channel in RabbitMQ which sends events (possibly as often as every second).
I want to process them in batches of 1000.
The main problem is that the RabbitMQ interface (I'm using pika) invokes a callback for every message.
I looked at the Celery framework, but its batch feature was deprecated in version 3.
What is the best way to do it? I'm thinking about saving my events in a list and, when it reaches 1000, copying it to another list and performing my processing. However, how do I make it thread-safe? I don't want to lose events, and I'm afraid of losing events while synchronising the list.
It sounds like a very simple use-case, however I didn't find any good best practice for it.
How do I make it thread-safe?
How about setting the consumer's prefetch count to 1000? Once a consumer's unacknowledged messages reach its prefetch limit, RabbitMQ will not deliver any more messages to it.
Don't ACK received messages until you have 1000 of them; then copy them to another list and perform your processing. When the job is done, ACK the last message with the multiple flag set, and RabbitMQ will acknowledge all earlier messages on that channel as well.
But I am not sure whether a large prefetch count is best practice.
First of all, you should not "batch" messages from RabbitMQ unless you really have to. The most efficient way to work with messaging is to process each message independently.
If you need to combine messages in a batch, I would use a separate data store to temporarily store the messages, and then process them when they reach a certain condition. Each time you add an item to the batch, you check that condition (for example, you reached 1000 messages) and trigger the processing of the batch.
This is better than keeping a list in memory, because if your service dies, the messages will still be persisted in the database.
Note: If you have a single processor per queue, this can work without any synchronization mechanism. If you have multiple processors, you will need to implement some sort of locking mechanism.
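For the locking part, a minimal sketch of a thread-safe batch buffer might look like the following. It is written in Java rather than Python, keeps the batch in memory only (so it does not give you the persistence described above), and the names BatchBuffer, add and pending are made up for illustration. The full batch is swapped out under the lock and handed to the callback outside it, so new events are never blocked by a long-running batch job and none are dropped during the swap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory batch buffer; safe to call from several threads.
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> onBatch;
    private List<T> current = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> onBatch) {
        this.batchSize = batchSize;
        this.onBatch = onBatch;
    }

    // Called once per delivered message.
    void add(T event) {
        List<T> full = null;
        synchronized (this) {
            current.add(event);
            if (current.size() >= batchSize) {
                // Swap in a fresh list so later events go into the next batch.
                full = current;
                current = new ArrayList<>();
            }
        }
        // Process outside the lock so other threads can keep adding.
        if (full != null) {
            onBatch.accept(full);
        }
    }

    // Number of events waiting for the next batch.
    synchronized int pending() {
        return current.size();
    }
}
```

If the messages are left unacknowledged until the batch is done, the callback passed as onBatch would be the natural place to send the acknowledgement after processing succeeds.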

Spring integration queue, any way to get current queue size?

We have a scenario where lots of messages from an external system need to be processed asynchronously. The current design is to have a job wake up every 5 minutes to pull messages from the external system, persist the raw messages, and then send the message IDs to an ExecutorChannel, so that consumers (potentially many) can consume from the channel.
The problem we are facing is how to deal with a system crash while messages are in the queue. Every time the job wakes up, we need to look into our DB to find out whether there are any raw messages that are not already in the queue.
The easiest way is to query the current queue size and find out whether there are more raw messages than messages in the queue. So my question is: is there any API on ExecutorChannel to find out the size of the queue? Or any other suggestion?
Thx
Jason
Spring Integration itself doesn't maintain a queue within an ExecutorChannel; the messages are executed by the underlying Executor.
If you are using a Spring ThreadPoolTaskExecutor which is dedicated to the channel, you could drill down to its underlying ThreadPoolExecutor, get a handle to its BlockingQueue (getQueue()), and read its size.
However, you'd have to add the active task count as well.
The total count would be approximate, though, because the ThreadPoolExecutor has no atomic method to get a count of queued and active tasks.
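As a rough sketch of that calculation, using only java.util.concurrent (the class and method names ExecutorBacklog, approximateBacklog and demo are made up for illustration): with a Spring ThreadPoolTaskExecutor you would first obtain the ThreadPoolExecutor via getThreadPoolExecutor() and pass it in.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecutorBacklog {

    // Queued plus running tasks. The two reads are not atomic, so treat it as an estimate.
    static int approximateBacklog(ThreadPoolExecutor executor) {
        return executor.getQueue().size() + executor.getActiveCount();
    }

    // Demo: one worker thread and three tasks that block until released.
    static int demo() {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < 3; i++) {
                executor.execute(() -> {
                    started.countDown();
                    awaitQuietly(release);
                });
            }
            // Wait until the first task is actually running before measuring.
            awaitQuietly(started);
            return approximateBacklog(executor);
        } finally {
            release.countDown();
            executor.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the demo, one task is running and two are queued when the backlog is read. In a live system the number can change between the two reads, so it is better suited to monitoring than to deciding which messages still need to be re-queued after a crash.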

How to improve performance using multithreading?

I've got a program which receives string messages from other applications and parses them using VCL.
Messages are sent as follows:
AtomId := GlobalAddAtom(PChar(s));
SendMessage(MyProgramHandle, WM_MSG, 0, AtomID);
GlobalDeleteAtom(AtomID);
My program receives this message, parses it for some time, and then returns control to an application.
It takes time to parse one message, so the performance of the other applications worsens.
One possible solution is to create form with the same caption and the same class in other thread, and rename class of main form.
But as far as I know it isn't recommended to create forms in threads.
So, what are possible ways to improve performance?
The typical approach would be to create a worker thread (or a pool of worker threads). The main thread will continue to receive the messages, but instead of parsing them it will just add them to a queue (a linked list, for example).
The worker thread takes the first element in the queue and processes it. When done it goes back to the queue to get the next element.
Since the queue is a shared resource between multiple threads you have to control access to it. A mutex will ensure that only one thread gets access to the queue at any given time.
Good luck.
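A minimal sketch of that queue, written in Java rather than Delphi and with made-up names (ParseQueue, enqueue, take): the lock plays the role of the mutex, and a condition lets the worker sleep while the queue is empty instead of spinning. The receiving thread only calls enqueue, which returns almost immediately, so the sender is not kept waiting while a message is parsed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class ParseQueue {
    private final Queue<String> queue = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private boolean closed = false;

    // Called by the receiving thread; only appends and wakes the worker.
    void enqueue(String message) {
        lock.lock();
        try {
            queue.add(message);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    // Signals that no more messages will arrive.
    void close() {
        lock.lock();
        try {
            closed = true;
            notEmpty.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Called by the worker; blocks while empty, returns null once closed and drained.
    String take() throws InterruptedException {
        lock.lock();
        try {
            while (queue.isEmpty() && !closed) {
                notEmpty.await();
            }
            return queue.poll();
        } finally {
            lock.unlock();
        }
    }

    // Demo: the calling thread receives, a worker thread does the "parsing".
    static List<String> demo(List<String> messages) {
        ParseQueue q = new ParseQueue();
        List<String> parsed = new ArrayList<>();
        Thread worker = new Thread(() -> {
            try {
                String m;
                while ((m = q.take()) != null) {
                    parsed.add("parsed:" + m);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        for (String m : messages) {
            q.enqueue(m);
        }
        q.close();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return parsed;
    }
}
```

With a single worker the messages are parsed in the order they were received; a pool of workers would trade that ordering for throughput.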
So the problem is that both the receiving of the messages and the VCL operations are done in the same thread (the main VCL thread)? And so the receiving and processing are serialized and as result the senders are blocked while your app is busy filling the grid? Then I can understand that you ask for a way to move the receiving to a different window message loop.
So I would create a window (not a VCL form) whose only purpose is to receive messages, and use its message loop to add each message to a queue. The senders then only need to find this (non-VCL) window and SendMessage to its handle. In the VCL thread, a Timer could fetch the next "n" messages and add them to the grid.

Signalling a producer task from a consumer task when working with a BlockingCollection

I have a pretty basic application that uses a Producer task and a Consumer task to work with files. It is based off the example here http://msdn.microsoft.com/en-us/library/dd267312.aspx
The basics of the program are that the Producer task enumerates the files on my hard drive, calculates their hash values, and does a few other things. Once the Producer has finished working with a file, it enqueues the file and the Consumer then grabs it.
The Consumer task has to connect to a remote server and attempt to upload the file. However, if the Consumer encounters an error, such as, not being able to connect to the remote server I need it to signal the Producer task that it should stop what it is doing and terminate. If the server is down, or goes down, there is no need for the Producer to continue cycling through thousands of files.
I have seen plenty of samples of signalling the Consumer task from the Producer task by using .CompleteAdding() on the BlockingCollection object but I am lost as to how to send a signal to the Producer from the Consumer that it should stop producing.
You could use a return queue. If one of the items generates an error/exception, you could load it up with error data and queue it back to the producer. The producer should TryTake() from the return queue just before generating a new item and handle any returned item appropriately. This beats using some atomic boolean by enabling the item to signal back with extended error information that could be used to decide what action to take - the producer may not always want/need to stop. Also, you could then queue up errored items to a GUI list and/or logger.
It's tempting to say that the consumer should return items anyway, whether they are errored or not, so that they can be re-used instead of creating new ones all the time. This, however, introduces latency in detecting/acting on errors unless you use two return queues to prioritize error returns.
Oh - another thing - using the above design, if it has to stop, the producer could retain errored items in a local queue and re-issue one occasionally. If the server comes back up (as indicated by the return of a successful item), the producer could re-issue the errored jobs from the local queue again before generating any more new ones. With care, this could make your upload system resilient to server reboots.
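Here is a minimal sketch of the return-queue idea, in Java rather than C#, so BlockingQueue.poll() stands in for TryTake(). All names (ReturnQueueDemo, Failure, Outcome, the "file-N" items and the simulated failure) are made up for illustration. The producer checks the return queue before each new item and also while it waits for the consumer to accept one, so it cannot get stuck handing work to a consumer that has already given up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

class ReturnQueueDemo {

    // An item sent back to the producer with error details attached.
    static final class Failure {
        final String item;
        final String reason;
        Failure(String item, String reason) { this.item = item; this.reason = reason; }
    }

    // What the producer ended up doing.
    static final class Outcome {
        final int handedOver;
        final Failure failure;
        Outcome(int handedOver, Failure failure) { this.handedOver = handedOver; this.failure = failure; }
    }

    static Outcome run(int totalFiles, int failAt) {
        BlockingQueue<String> work = new SynchronousQueue<>();
        BlockingQueue<Failure> returns = new LinkedBlockingQueue<>();

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String file = work.take();
                    // Simulated upload error on one specific file.
                    if (file.equals("file-" + failAt)) {
                        returns.put(new Failure(file, "server unreachable"));
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        int handedOver = 0;
        Failure failure = null;
        try {
            for (int i = 1; i <= totalFiles && failure == null; i++) {
                String file = "file-" + i;
                boolean accepted = false;
                // Check the return queue first, then try to hand the item over.
                while (!accepted && (failure = returns.poll()) == null) {
                    accepted = work.offer(file, 10, TimeUnit.MILLISECONDS);
                }
                if (accepted) {
                    handedOver++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        consumer.interrupt();
        return new Outcome(handedOver, failure);
    }
}
```

Because the Failure object carries the item and a reason, the producer could just as well log it, retry later, or keep going, rather than always stopping.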

Resources