Segmentation fault, and queue not clearing sometimes in BeagleBone Black Debian OS - multithreading

I use three different threads to read CAN messages from a raw SocketCAN socket and to write CAN messages to another raw SocketCAN socket. A message is read every 2 seconds and put onto a queue, then retrieved from the queue on another thread; mutex thread functions are used. The idea is to write each CAN message every 2 seconds, as it is read. But CAN messages are written every 0.3 milliseconds because the writer thread constantly retrieves from the queue, even though a message is only read and put onto the queue every 2 seconds, and the retrieved message is not getting removed and remains stagnant in the queue. Every time I try to increase msgsize, the program shows a segmentation fault or is automatically killed by the OS. How do I go about debugging this issue? Please help, thank you.

Related

socket.io | losing messages due to frequency and volume

I have around 5700 messages (each message is a 100x100 image as a Base64 string) which I emit from the server to the client from within a for-loop, pretty fast:
[a pretty big array].forEach((imgAsBase64) => {
    io.emit('newImgFromServer', imgAsBase64)
})
The client only receives between 1700 and 3000 of them in total, before I get a:
disconnected due to = transport error
socket connected
Once the socket re-connects (and the for-loop has not ended), the emission of new messages from within the loop resumes, but I have lost those previous ones forever.
How can I make sure that the client receives all of the messages every time ?
This question is an interesting example of "starving the event loop". If you're in a tight for loop for some period of time with no await in the loop, then you don't let the event loop process any other events during the duration of the for loop. If some events need to be processed during that time for things to work properly, you get problems. Read on for how that applies to this case.
Both client and server need some occasional cycles to process housekeeping pings and pongs in the socket.io protocol. If you firehose messages from one end to the other in a non-stop for loop, you can starve the ability to process those housekeeping messages and it will think that it has timed out (not received the housekeeping messages when it should have which is usually a sign of a lost or inoperative connection). In reality, the housekeeping messages are sitting in the event loop waiting to be processed, but if you never give the event loop a chance to process them, some other code running in the for loop will think that they never arrived.
So, you have to make sure you give both ends enough occasional cycles to process those housekeeping messages. The typical way to do that is to just make sure that you aren't fire hosing messages. Send N messages, then pause for a short period of time (enough time for the event loop to be able to service any incoming network events). Then send N more, pause, etc...
In addition, you could make this whole process a lot more efficient by combining a number of the Base64 strings into a single message. You can probably just put them into an array of 100 of them and send that array of 100 and repeat until they are all sent. Then, obviously change the client to expect an array of Base64 strings instead of just a single one. This will obviously result in a lot fewer messages to send (which is more efficient), but you will still need to pause every so often to let the server process things in the event loop.
Exactly how many messages to send before pausing is something that could be figured out via trial and error, but if you put 100 images into a single message and send 10 of these larger messages (which sends 1,000 images) and then pause for even just 50ms, that should be enough time for the event loop to service any inbound ack messages from socket.io to avoid the timeout. Any sort of pause using setTimeout() makes the setTimeout() get in line behind most other messages that are waiting in the event loop so even a short pause with setTimeout() tends to accomplish the goal of letting the event loop process the things that were waiting to be run.
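As a rough sketch of that approach (the chunks of 100, the 10-emits-then-pause rhythm, the 50ms pause and the 'newImgBatchFromServer' event name are just the values suggested above, not anything socket.io requires; the client would need a matching handler that expects an array of Base64 strings):

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function sendAllImages(io, imagesAsBase64) {
    // pack 100 Base64 strings into each socket.io message
    const chunks = [];
    for (let i = 0; i < imagesAsBase64.length; i += 100) {
        chunks.push(imagesAsBase64.slice(i, i + 100));
    }
    for (let i = 0; i < chunks.length; i++) {
        io.emit('newImgBatchFromServer', chunks[i]);
        // after every 10 emits (1,000 images), yield so the event loop
        // can service socket.io's housekeeping pings/pongs and acks
        if ((i + 1) % 10 === 0) {
            await sleep(50);
        }
    }
}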
If end-to-end time was super important, you could experiment with sending more messages at once and/or changing the pause time, but you don't want to end with a setting that is close to where you get a timeout (you want some safety factor).

Achieving Concurrency using Pika and python2.7

I have an application that lets users upload up to 200 documents/resumes.
The objective of the application is to return a parsed and scored result for each of these documents.
The front end splits these 200 documents into batches of 10, i.e. 20 messages are put into a queue (RabbitMQ).
I have 6 worker processes listening to the queue (scripts that are triggered by an entry_point).
The workers take a message and split the resumes if it is a batch message. If it is not a batch message, the worker starts processing the message (the average processing time is around 8 seconds).
The queue gets piled up with the 200 resumes, the 6 workers get 5 messages, and each message is processed sequentially.
This means that if another user uploads even 1 resume, one of the workers needs to reach the end of the queue to pick up that message, and the user is left waiting until the 200 resumes have been processed.
I'm doing this using RabbitMQ and Python 2.7.
I'm using a BlockingConnection to connect to the queue and process the messages.
The only way to get to the last user's message is to complete the processing of all the messages as fast as I can, which could mean more processes or more containers. When I fire up more processes using multiprocessing (a pool of 6 workers), CPU utilization is at its highest and cannot handle any more messages.
How can I prevent my users from waiting for the response? Is adding more workers to listen and consume from the queue the only way?
The consumer is just a plain consumer with no API. The tasks are picked directly from the queue and processed.
The more workers I add, the faster the queue is consumed, but the user who probably uploaded last still has to wait a long time.

Linux Message Queues - Multiple receivers

I've recently been investigating and playing around with linux message queues and have come across something that I don't quite understand why it happens!
If two programs are running that both use msgrcv() in an infinite for loop to check for messages, and I then send two messages, the first program receives the 1st message and the second program receives the 2nd. If you keep sending messages, delivery alternates between the two receivers.
Obviously, I understand that as soon as one program has read a message it is removed from the queue, but who decides, and how, which program will receive the message if they are both checking continuously?
Any help would be appreciated!
The short answer is that the kernel decides.
The long answer is that this is handled by the do_msgrcv() call within the Linux kernel. If there is no message available, the caller gets put on a queue until a message is available. It's not guaranteed to go back and forth like you describe, since it all depends on the timing of each msgrcv() call, but in your case, it will probably behave that way virtually all of the time.

ZMQ socket queue

I'm pretty new with ZMQ and I'm working with the NodeJS binding. I have an application that uses PUSH/PULL sockets. On one side I PUSH data to some nodes, which receive and process it through their PULL sockets. Sometimes I have to kill one or more nodes of my application, and it can happen that these nodes still have some data in the PULL socket to be processed. I don't want to lose this data, so I was wondering if there is a way to access ZMQ's PULL socket queue to check whether there are still messages to be processed.
I actually couldn't find anything in the specs of ZMQ and the NodeJS binding, so maybe I'm getting the whole concept wrong.
If you kill a process then any data in that process's buffers will be lost.
Instead of killing the process forcefully, you should always find a way to allow processes to shut down gracefully. Here, you can send a "KILL" message to the PULL socket; the process can then read that and exit when it receives it. If you can flush the socket buffer (it depends on whether there are other processes still sending to it), you can do that and then exit when there are no more messages to read.
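A minimal sketch of that "KILL sentinel" idea, using the classic callback-style Node binding (the endpoint and the processData() helper are placeholders; adapt them to whichever binding version you use, and note that the PUSH side must send one "KILL" per connected worker because PUSH round-robins):

var zmq = require('zmq');
var pull = zmq.socket('pull');

pull.connect('tcp://127.0.0.1:3000');

pull.on('message', function (msg) {
    if (msg.toString() === 'KILL') {
        // stop pulling new work and let the process exit cleanly
        pull.close();
        process.exit(0);
    }
    processData(msg); // placeholder for the normal work
});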
I'm posting the solution I found. It's not really a solution as I'm not using the ZMQ socket to check that there are no more messages in the queue, it's just a workaround/hack that came to my mind to make the thing work. I don't have time to write the queue handling by myself, so here's how I solved the problem:
Whenever a process receives a message to process, it stores a timestamp via new Date().getTime(). Whenever a process needs to be killed, a kill message is sent to it. When the process receives that message, it starts a timer with setInterval. Every x seconds (I used 10; it can be more or less) the timer fires a function that checks whether the last received message is old enough (it takes a new timestamp, subtracts the last saved one, and if the difference is greater than y, which in my case is 100 seconds, it is old enough). If it is, it means no more messages have been received (no more messages in the queue), so it kills the process; otherwise it does nothing.
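In code, that workaround might look roughly like this (the 10-second interval and 100-second threshold are the values mentioned above; the pull socket and processData() are the same placeholders as in the previous sketch, and this message handler replaces that one rather than adding to it):

var lastMessageAt = new Date().getTime();
var draining = false;

pull.on('message', function (msg) {
    lastMessageAt = new Date().getTime();
    if (msg.toString() === 'KILL') {
        draining = true;  // asked to die, but keep reading buffered work
        return;
    }
    processData(msg);
});

setInterval(function () {
    // if we were asked to die and nothing has arrived for 100 seconds,
    // assume the PULL buffer is drained and exit
    if (draining && new Date().getTime() - lastMessageAt > 100 * 1000) {
        process.exit(0);
    }
}, 10 * 1000);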

Is it possible to get a message from an Azure storage queue twice?

I know that if a worker fails to process a message off of the queue it will become visible again and you have to code against this (idempotent). But is it possible that a worker can dequeue a message twice? Based on my logging, I seem to be seeing this behavior and I'm not sure why. I'm even deleting the message in between going to get the next message, and it seems like I got it again.
Yes, you can dequeue the same message twice. This can happen for two reasons:
1. Worker A dequeues Message B and the invisibility timeout expires. Message B becomes visible again and Worker C dequeues Message B, invalidating Worker A's pop receipt. Worker A finishes its work, goes to delete Message B, and an error is thrown. This is the most common case.
2. In certain conditions (very frequent queue polling) you can get the same message twice on a GetMessage. This is a type of race condition that, while rare, does occur. Worker A and B are polling very quickly, hit the queue simultaneously, and both get the same message. This used to be much more common (SDK 1.0 time frame) under high polling scenarios, but it has become much more rare in later storage updates (can't recall seeing this recently).
That being said - if you only have 1 worker popping messages, then you are queueing the message twice. Cases 1 and 2 only happen when you have more than 1 worker.
You shouldn't be able to dequeue it twice. And if I recall things properly, even deleting it twice shouldn't be possible because the pop receipt should change after the second dequeue and lock.
As SilverNinja suggests, I'd look to see if perhaps the message was inadvertently queued twice.
Do you have more than one worker role?
It is possible (especially with processes that take a while) that the timeout on the queue item visibility could end before your role has finished processing whatever it is doing. In this case another identical role could pick up the same message (which is effectively what you need to allow for - you do not want it to be a problem if the same message is processed multiple times).
At this point the first role will finish and dequeue the message and then the other role that picked it up after the timeout will end and attempt to dequeue the message. Off the top of my head I don't recall what exactly happens when a role attempts to dequeue an already dequeued message.
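To make the invisibility-timeout race described above concrete, here is a rough sketch of a worker loop using the @azure/storage-queue Node SDK (a newer SDK than the one the answers refer to; the connection string, the 'work-items' queue name, the 300-second timeout and processMessage() are placeholder assumptions):

const { QueueClient } = require('@azure/storage-queue');

const queueClient = new QueueClient(process.env.STORAGE_CONNECTION_STRING, 'work-items');

async function workOnce() {
    const response = await queueClient.receiveMessages({
        numberOfMessages: 1,
        visibilityTimeout: 300   // seconds this worker "owns" the message
    });
    for (const msg of response.receivedMessageItems) {
        await processMessage(msg.messageText);   // placeholder for the real work
        try {
            // this delete fails if the visibility timeout expired and another
            // worker re-dequeued the message (our popReceipt no longer matches)
            // or the message was already deleted
            await queueClient.deleteMessage(msg.messageId, msg.popReceipt);
        } catch (err) {
            // the message will become visible again and be processed once more,
            // which is why the processing must be idempotent
            console.warn('delete failed; message may be reprocessed:', err.message);
        }
    }
}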
