Achieving concurrency using Pika and Python 2.7 (multithreading)

I have an application that lets users upload up to 200 documents/resumes.
The objective of the application is to return a parsed and scored result for each of these documents.
The front end splits the 200 documents into batches of 10, i.e. 20 messages are put into a queue (RabbitMQ).
I have 6 worker processes listening to the queue (scripts that are triggered by an entry_point).
A worker takes a message and, if it is a batch message, splits it into individual resumes; if it is not a batch message, the worker starts processing it (the average processing time is around 8 seconds).
The queue gets piled up with the 200 resumes and the 6 workers get 5 messages, processing each message sequentially.
This means that if another user uploads even 1 resume, one of the workers has to reach the end of the queue to pick up that message, and the user is left waiting until the 200 resumes ahead of it have been processed.
I'm doing this with RabbitMQ and Python 2.7, using a pika BlockingConnection to connect to the queue and process the messages.
The only way to get to the last user's message is to finish processing all the messages ahead of it as fast as I can, which could mean more processes or more containers. When I fire up more processes using multiprocessing (a pool of 6 workers), CPU utilization is at its highest and the machine cannot handle any more messages.
How can I prevent my users from waiting for a response? Is adding more workers to listen and consume from the queue the only way?
The consumer is just a plain consumer with no API; the tasks are picked directly from the queue and processed.
The more workers I add, the faster the queue is consumed, but the user who uploaded last still has to wait a long time.
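
One detail worth checking in a setup like this is the channel prefetch. Below is a minimal sketch of a fair-dispatch worker using pika 1.x signatures, assuming a queue named "resumes" and a stand-in process_resume function (both names are mine, not from the question); with basic_qos(prefetch_count=1), RabbitMQ stops pre-assigning a pile of messages to each busy worker, so whichever worker frees up first picks up a late upload:

import time

import pika

def process_resume(body):
    # stand-in for the real parse-and-score step (~8 seconds)
    time.sleep(8)

def on_message(channel, method, properties, body):
    process_resume(body)
    # ack only after the work is done, so an unacked message is
    # redelivered if this worker dies mid-task
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="resumes", durable=True)
# fair dispatch: hand this worker at most one unacked message at a time,
# so idle workers, not busy ones, receive newly published messages
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue="resumes", on_message_callback=on_message)
channel.start_consuming()

Prefetch alone will not stop one user's 200 resumes from occupying all 6 workers, though; a common complement is a second queue for small or interactive jobs with a couple of consumers dedicated to it, so a single-resume upload never queues behind a bulk one.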

Related

socket.io | losing messages due to frequency and volume

I have around 5700 messages (each message is a 100x100 image as a Base64 string) which I emit from the server to the client from within a for-loop, pretty fast:
[a pretty big array].forEach((imgAsBase64) => {
  io.emit('newImgFromServer', imgAsBase64);
});
The client only receives between 1700 and 3000 of them in total before I get:
disconnected due to = transport error
socket connected
Once the socket re-connects (and the for-loop has not ended), the emission of new messages from within the loop resumes, but I have lost the previous ones forever.
How can I make sure that the client receives all of the messages every time ?
This question is an interesting example of "starving the event loop". If you're in a tight for loop for some period of time with no await in the loop, then you don't let the event loop process any other events during the duration of the for loop. If some events need to be processed during that time for things to work properly, you get problems. Read on for how that applies to this case.
Both client and server need some occasional cycles to process housekeeping pings and pongs in the socket.io protocol. If you firehose messages from one end to the other in a non-stop for loop, you can starve the ability to process those housekeeping messages, and each end will think the connection has timed out (not receiving the housekeeping messages on schedule is usually a sign of a lost or inoperative connection). In reality, the housekeeping messages are sitting in the event loop waiting to be processed, but if you never give the event loop a chance to run them, the code will conclude they never arrived.
So, you have to make sure you give both ends enough occasional cycles to process those housekeeping messages. The typical way to do that is to make sure you aren't firehosing messages: send N messages, then pause for a short period of time (enough for the event loop to service any incoming network events), then send N more, pause, and so on.
In addition, you could make the whole process a lot more efficient by combining a number of the Base64 strings into a single message. You could put them into arrays of 100 and send one array at a time, repeating until all are sent; then, obviously, change the client to expect an array of Base64 strings instead of a single one. This results in far fewer messages (which is more efficient), but you will still need to pause every so often to let each end process things in its event loop.
Exactly how many messages to send before pausing can be figured out by trial and error, but if you put 100 images into a single message, send 10 of these larger messages (1,000 images), and then pause for even just 50ms, that should be enough time for the event loop to service any inbound ack messages from socket.io and avoid the timeout. Any pause via setTimeout() puts the continuation in line behind most other events waiting in the event loop, so even a short setTimeout() pause tends to accomplish the goal of letting the event loop process what was waiting.
If end-to-end time is super important, you could experiment with sending more messages at once and/or shortening the pause, but you don't want to end up with a setting close to the timeout (you want some safety factor).
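
To make the batching-and-pausing pattern concrete, here is a sketch using the python-socketio package (the question is Node.js, but the pacing idea is identical); the chunk size of 100, the pause after every 10 emits, and the send_all_images name are assumptions taken from the numbers above, not tested values:

import asyncio

import socketio

sio = socketio.AsyncServer(async_mode="asgi")
app = socketio.ASGIApp(sio)

async def send_all_images(images):
    # pack 100 Base64 strings per message instead of one each
    chunks = [images[i:i + 100] for i in range(0, len(images), 100)]
    for count, chunk in enumerate(chunks, start=1):
        await sio.emit("newImgFromServer", chunk)
        if count % 10 == 0:
            # yield ~50ms so the event loop can service socket.io
            # pings/pongs and acks instead of appearing to time out
            await asyncio.sleep(0.05)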

Azure Service Bus - MaxConcurrentCallsPerSession > 1 doesn't guarantee FIFO processing on a queue with sessions

I'm trying to process multiple messages at the same time from the same session and want FIFO guaranteed. It only works with MaxConcurrentCallsPerSession = 1 on ServiceBusSessionProcessorOptions.
When I try MaxConcurrentCallsPerSession > 1, my message handler receives messages from the session in no particular order.
So, if I want to guarantee FIFO ordering while processing a session, does it only work with serial processing?
You cannot both process messages in a specific order and process many messages at the same time.
Even if you read the messages from the queue in order, there is no control over how long each message takes to process. When messages are processed concurrently, the completion time for each message is effectively random.
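
A tiny Python illustration of why this is so (the handler and its random delay are made up for the demo): even though the messages are submitted strictly in order, the order in which they finish depends on how long each one happens to take:

import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def handle(msg):
    # simulate a handler whose duration varies per message
    time.sleep(random.uniform(0.01, 0.1))
    return msg

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(handle, i) for i in range(10)]  # FIFO submission
    print([f.result() for f in as_completed(futures)])     # almost never 0..9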

Segmentation fault, and queue sometimes not clearing, on BeagleBone Black Debian OS

I use three different threads to read CAN messages from one raw SocketCAN socket and to write CAN messages to another raw SocketCAN socket. A message is read every 2 seconds and put onto a queue, and it is retrieved from the queue on another thread; mutex functions protect the queue. The idea is to write each CAN message every 2 seconds, just as it was read. But CAN messages are being written every 0.3 milliseconds, because the writer thread constantly retrieves from the queue even though a message is only read and enqueued every 2 seconds; the retrieved message is not removed and remains stagnant in the queue. Every time I try to increase the msgsize, the program hits a segmentation fault or is killed by the OS. How do I go about debugging this issue? Please help, thank you.
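
The symptom (writes every 0.3 ms instead of every 2 s) suggests the retrieval loop neither blocks while the queue is empty nor removes the item it retrieved. As a minimal illustration of the intended pacing (in Python rather than C, purely to show the shape of the fix): a blocking dequeue both waits for an item and removes it, so the writer runs exactly once per item read:

import threading
import time
import queue

q = queue.Queue()

def reader():
    n = 0
    while True:
        time.sleep(2)              # a CAN frame arrives every 2 seconds
        q.put("frame %d" % n)      # enqueue exactly one item
        n += 1

def writer():
    while True:
        frame = q.get()            # blocks until an item exists, then removes it
        print("writing", frame)    # runs once per 2 seconds, paced by the reader

threading.Thread(target=reader, daemon=True).start()
writer()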

How to process hundreds of JMS messages from 2 queues, with response times of 1 second and 1 minute respectively

I have a business requirement to process messages at certain priorities, say priority1 and priority2.
We have decided to use 2 JMS queues: priority1 messages will be sent to priority1Queue and priority2 messages to priority2Queue.
The response time for priority1Queue messages is such that the moment a message is in the queue, I need to read it, process it, and send the response back (to, say, another queue) within 1 second. This means I should process these messages the moment they arrive on priority1Queue, and I will have hundreds of such messages coming in per second, so I will definitely need multiple concurrent consumers on this queue so that each message is consumed and processed within 1 second.
The response time for priority2Queue messages is that I need to read, process, and send the response back within 1 minute. So priority2 messages are less urgent than priority1 messages, but I still need to respond within a minute.
Can you suggest the best possible approach so that I can concurrently read messages from both queues and give higher priority to priority1 messages, so that each priority1 message can be read and processed within 1 second?
Mainly, how can a message be read and fed to a processor so that the next message can be read, and so on?
I need to write a Java-based component that does the reading and processing.
I also need to ensure this component is highly available and doesn't run into OutOfMemory errors. I will have this component running across multiple JVMs and multiple application servers, so I can have multiple clusters running this Java component.
First off, the requirement to process within 1 second is not going to depend on your messaging approach so much as on the actual processing of the message and the raw CPU available. Picking up hundreds of messages per second from a queue is child's play; the JMS provider is most likely not the issue. Depending on your deployment platform (Tomcat, Mule, JEE, whatever), there should be a way to run n listeners and scale up appropriately. Because messages stay on the queue until you pick them up, it's doubtful you'll run out of memory. I've built these apps and processed many more messages without problems.
Second, there are a number of strategies for prioritizing messages that don't necessarily require different queues, using message priorities. I'm leaning towards message priorities plus message selectors, where one group of listeners takes care of the highest-priority messages and another listener filters off the lower-priority ones but makes sure it does enough to get them out within a minute.
You could also have a lower-priority message rewritten back to the same queue with a higher priority, based on how close to the 1-minute deadline it is. I know that sounds wrong, but reading/writing JMS has very little overhead (at least compared to the equivalent column-driven database transactions), and the listener for lower-priority messages could keep increasing the priority until the message has to be processed.
Or, more simply, just have more listeners on the high-priority queue/messages than on the lower-priority ones; an imbalance in the number of processors per message type might be all it needs. A toy sketch of that option follows below.
Lots of possibilities; time for a PoC.
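
As a toy illustration of that last, simplest option (the listener counts here are arbitrary, and in a real deployment these would be JMS consumers on two destinations rather than in-process Python queues):

import queue
import threading
import time

priority1_queue = queue.Queue()
priority2_queue = queue.Queue()

def listener(q, name):
    while True:
        msg = q.get()
        print(name, "processed", msg)  # process and reply within this queue's SLA

# deliberately imbalanced: 8 listeners drain priority1, 2 are enough for priority2
for i in range(8):
    threading.Thread(target=listener, args=(priority1_queue, "p1-%d" % i), daemon=True).start()
for i in range(2):
    threading.Thread(target=listener, args=(priority2_queue, "p2-%d" % i), daemon=True).start()

for n in range(20):
    priority1_queue.put("message %d" % n)
    priority2_queue.put("message %d" % n)
time.sleep(1)  # give the daemon listeners time to drain before the demo exits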

Azure Storage Queue - long time to process

I need to generate quite a number of reports, and a report can take about 5 minutes to generate (large amounts of data from many different sources).
The client will post messages to an Azure Storage Queue. There is a worker role that processes the messages and generates the reports.
If I want to scale this up, let's say I end up with 10 worker roles that will process the messages from the queue and generate the reports. Then I will add messages to the queue like this:
message 1: process reports from 1 - 5
message 2: process reports from 6 - 11
........
message 10: process reports from 50 - 55 (the ranges might not be accurate)
If worker role 1 takes the first message and puts a lock on it, but the processing takes 5 minutes, the lock will expire and the message will become visible again in the queue, so worker role 2 will take it and start processing it... and so forth.
How can I ensure that a queue message is consumed only once, keeping in mind that the task is a long one?
First of all: with Azure Storage queues, you should be prepared for all of your operations to be idempotent, i.e. if a queue item is processed multiple times, the result should be the same each time. The reason I bring this up: there's simply no way to guarantee you'll process the message exactly once (unless you check the DequeueCount property of the message and halt processing accordingly), due to unexpected events such as your role instance crashing/rebooting or your queue item processing code doing something unexpected like throwing an exception.
Next: the queue message invisibility timeout can be programmatically extended, via the queue API or one of the language SDKs. In C# (something like this; I didn't test it), extending by an additional minute:
queue.UpdateMessage(message,        // queue is the CloudQueue holding the message
    TimeSpan.FromSeconds(60),       // keep the message invisible for another minute
    MessageUpdateFields.Visibility);
You can also modify the message content along the way (perhaps as a hint to your code, to record which of the 5 reports are complete; this helps with your specific issue: if the message gets reprocessed, you don't have to redo all five reports if it has been modified to say something like "process reports from 3-5"). Note: you can combine the MessageUpdateFields flags via |:
queue.UpdateMessage(message,
    TimeSpan.FromSeconds(0),        // make the message visible again right away
    MessageUpdateFields.Content);   // persist the updated body (combine with
                                    // MessageUpdateFields.Visibility via | to do both)
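
For comparison, here is a renewal-loop sketch using the current Python SDK (azure-storage-queue v12); the queue name, the 5-minute lease, the retry limit, and the parse_report_range/generate_report helpers are assumptions standing in for the report logic in the question:

from azure.storage.queue import QueueClient

def parse_report_range(content):
    # hypothetical: turn "process reports from 1 - 5" into report ids
    return range(1, 6)

def generate_report(report_id):
    pass  # stands in for ~5 minutes of real work

conn_str = "<storage account connection string>"
queue = QueueClient.from_connection_string(conn_str, "reports")

for msg in queue.receive_messages(visibility_timeout=300):
    if msg.dequeue_count > 3:
        queue.delete_message(msg)  # likely a poison message; stop retrying it
        continue
    for report_id in parse_report_range(msg.content):
        generate_report(report_id)
        # extend the lease before it can expire, so no other worker
        # sees this message while we are still working on it
        msg = queue.update_message(msg, visibility_timeout=300)
    queue.delete_message(msg)      # all reports done; remove the message for good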
Lastly: If you're concerned with the length of time taken to process a batch of reports, perhaps rethink why you're processing five reports in each message, vs. one report per message. You can always read queue messages in batches. This is getting a bit subjective, as there's really no right or wrong way to do it, but it's just something for you to think about.
