Are message queues obsolete in Linux?

I've been playing with message queues (System V, but POSIX should be ok too) in Linux recently and they seem perfect for my application, but after reading The Art of Unix Programming I'm not sure if they are really a good choice.
http://www.faqs.org/docs/artu/ch07s02.html#id2922148
The upper, message-passing layer of System V IPC has largely fallen out of use. The lower layer, which consists of shared memory and semaphores, still has significant applications under circumstances in which one needs to do mutual-exclusion locking and some global data sharing among processes running on the same machine. These System V shared memory facilities evolved into the POSIX shared-memory API, supported under Linux, the BSDs, MacOS X and Windows, but not classic MacOS.
http://www.faqs.org/docs/artu/ch07s03.html#id2923376
The System V IPC facilities are present in Linux and other modern Unixes. However, as they are a legacy feature, they are not exercised very often. The Linux version is still known to have bugs as of mid-2003. Nobody seems to care enough to fix them.
Are the System V message queues still buggy in more recent Linux versions? I'm not sure if the author means that POSIX message queues should be ok?
It seems that sockets are the preferred IPC for almost anything(?), but I cannot see how it would be very simple to implement message queues with sockets or something else. Or am I thinking too complexly?
I don't know if it's relevant that I'm working with embedded Linux?

Personally I am quite fond of message queues and think they are arguably the most under-utilized IPC in the unix world. They are fast and easy to use.
A couple of thoughts:
Some of this is just fashion. Old things become new again. Add a shiny doodad to message queues and they may be next year's newest and hottest thing. Look at Google's Chrome using separate processes instead of threads for its tabs. Suddenly people are thrilled that when one tab locks up it doesn't bring down the entire browser.
Shared memory has something of a He-man halo about it. You're not a "real" programmer if you aren't squeezing that last cycle out of the machine and MQs are marginally less efficient. For many, if not most apps, it is utter nonsense but sometimes it is hard to break a mindset once it takes hold.
MQs really aren't appropriate for applications with unbounded data. Stream oriented mechanisms like pipes or sockets are just easier to use for that.
The System V variants really have fallen out of favor. As a general rule go with POSIX versions of IPC when you can.

Yes, I think that message queues are appropriate for some applications. POSIX message queues provide a nicer interface, in particular, you get to give your queues names rather than IDs, which is very useful for fault diagnosis (makes it easier to see which is which).
Linux allows you to mount the POSIX message queues as a filesystem, see them with "ls" and delete them with "rm", which is quite handy too (System V depends on the clunky "ipcs" and "ipcrm" commands).

I haven't actually used POSIX message queues because I always want to leave open the option to distribute my messages across a network. With that in mind, you might look at a more robust message-passing interface like zeromq or something that implements AMQP.
One of the nice things about 0mq is that when used from the same process space in a multithreaded app, it uses a lockless zero-copy mechanism that is quite fast. Still, you can use the same interface to pass messages over a network as well.

Biggest disadvantages of POSIX message queues:
The POSIX message queue API does not require compatibility with select(). (It works with select() on Linux but not on QNX.)
It has surprises.
A Unix datagram socket does the same job as a POSIX message queue, and it works in the socket layer, so it can be used with select()/poll() or other I/O-wait methods. Using select()/poll() is an advantage when designing an event-based system, because it makes it possible to avoid a busy loop.
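To make the comparison concrete, here is a minimal sketch of a Unix datagram socket used as a message channel and waited on with select(). The function name drain_messages and the three payloads are made up for illustration, and socketpair() is used only to keep the example self-contained; a real service would more likely bind() the receiving socket to a path.

```python
import select
import socket


def drain_messages(timeout=1.0):
    """Send three datagrams over a Unix socket pair and read them back via select()."""
    # socketpair() returns two connected AF_UNIX datagram sockets, so no
    # filesystem path is needed for this demonstration.
    rx, tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        for payload in (b"start", b"status?", b"stop"):
            tx.send(payload)  # each send() is one discrete message

        received = []
        while len(received) < 3:
            # Sleep in the kernel until rx is readable instead of busy-looping.
            readable, _, _ = select.select([rx], [], [], timeout)
            if not readable:
                break  # timed out: nothing more to read
            # One recv() returns exactly one whole datagram.
            received.append(rx.recv(4096))
        return received
    finally:
        rx.close()
        tx.close()
```

Because the socket is an ordinary file descriptor, the same select() call could also watch timers, pipes or network sockets in one event loop.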
There are surprises in message queues. Consider mq_notify(), which is used to get a receive event. The name sounds as if it notifies something about the message queue, but it actually registers for notification rather than notifying anything.
A further surprise is that mq_notify() has to be called after every mq_receive(), which may cause a race condition (when some other process or thread calls mq_send() between the calls to mq_receive() and mq_notify()).
It also has a whole set of functions, mq_open(), mq_send(), mq_receive() and mq_close(), with their own definitions, which are redundant and in some cases inconsistent with the socket open(), send(), recv() and close() specifications.
I do not think message queues should be used for synchronization; eventfd and signalfd are suitable for that.
But it (the POSIX message queue) has some real-time support: it has priority features.
Messages are placed on the queue in decreasing order of priority, with newer messages of the same priority being placed after older messages with the same priority.
But this priority is also available for sockets as out-of-band data!
Finally, to me, the POSIX message queue is a legacy API. I always prefer a Unix datagram socket to a POSIX message queue as long as the real-time features are not needed.
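The priority ordering rule quoted above can be modeled in a few lines. This is only a toy model of the ordering, not the mq_* API; the class name PriorityMessageModel and its methods are hypothetical.

```python
import heapq
import itertools


class PriorityMessageModel:
    """Toy model of the ordering rule only; this is not the mq_* API."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # arrival counter for FIFO tie-breaking

    def send(self, data, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), data))

    def receive(self):
        neg_priority, _, data = heapq.heappop(self._heap)
        return data, -neg_priority
```

Sending "a" and "c" at priority 1 and "b" and "d" at priority 5, in the order a, b, c, d, yields b, d, a, c on receive: higher priority first, and arrival order within each priority.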

Message queues are very useful for building local, decoupled applications. They are very fast and block-organized (no need for the buffering, splitting, etc. that streaming sockets require). Delivery amounts to a few memcpy() operations: user code copies a block to the kernel, and the kernel copies the block to the process reading from the queue. Some well-known middleware products, such as Oracle Tuxedo and Mavimax Enduro/X, use these queues to help build load-balanced, high-performance, fault-tolerant, decomposed, distributed applications. The queues allow load balancing when several executables read from the same queue: the kernel scheduler simply hands each message to whichever process is idle. A nice thing on Linux is that poll() can be used on POSIX queues, which helps to solve certain scenarios. On IBM AIX it is possible to poll() System V queues.
For example, two processes can communicate easily locally over the queues with quite impressive throughput (~70k req+rply/sec).
If networking is needed, then Enduro/X, for example, provides the tpbridge process, which reads messages from the local queue and sends the blocks to another machine, where the other end injects the messages back into its local queue.
Also, compared to sockets, queues do not suffer from issues such as busy or lingering sockets after a binary has crashed; that is, a program can start reading the queues and processing immediately at startup.

Related

Alternative to POSIX message queues

I am using POSIX message queues in a non-root system. I am running into significant issues with unlinking and cleaning. I can't see opened message queues and then write a routine to clean them.
I was wondering if one of the two are possible:
Create POSIX mqueue locally, in $PWD or something
Get an alternative message queue library instead of the standard one from Linux.
One thing you can try is to see whether you can get by using Unix domain datagram sockets instead of POSIX message queues, in particular the SOCK_SEQPACKET variety of those:
http://man7.org/linux/man-pages/man7/unix.7.html
If this is not enough, there are plenty of message queue abstraction libraries out there, such as the popular ZeroMQ: http://zeromq.org/
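As a rough sketch of the SOCK_SEQPACKET suggestion (the function name seqpacket_roundtrip is hypothetical, and socketpair() stands in for a bound and connected pair of processes):

```python
import socket


def seqpacket_roundtrip(messages):
    """Send messages over an AF_UNIX SOCK_SEQPACKET pair and observe peer close."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        for message in messages:
            a.send(message)  # message boundaries are preserved
        received = [b.recv(4096) for _ in messages]
        a.close()  # the sending side goes away...
        end_marker = b.recv(4096)  # ...and the reader sees b"" (end of stream)
        return received, end_marker
    finally:
        a.close()  # closing twice is harmless in Python
        b.close()
```

Unlike a named POSIX queue, nothing is left behind in the kernel once both ends are closed, which sidesteps the unlinking and cleanup problem described in the question. One caveat of this sketch: a zero-length message would look the same as the end-of-stream marker.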

Linux, communication between applications

In my embedded system running Linux (Ubuntu armhf) I have to communicate between processes.
I'm doing it with TCP sockets. It works great, but due to the high frequency of my requests I have very high processor usage (94% average, measured with nmon).
Is there a way to lower it, using that kind of communication in a more efficient manner?
Shared memory and message queues can both be used to exchange information between processes. The difference is in how they are used, and each has advantages and disadvantages.
Shared memory
It's an area of storage that can be read and written by more than one process. It provides no inherent synchronization; in other words, it's up to the programmer to ensure that one process doesn't clobber another's data. But it's efficient in terms of throughput: reading and writing are relatively fast operations.
A message queue is a one-way pipe:
One process writes to the queue, and another reads the data in the order it was written until an end-of-data condition occurs. When the queue is created, the message size (bytes per message, usually fairly small) and queue length (maximum number of pending messages) are set. Access is slower than shared memory because each read/write operation is typically a single message. But the queue guarantees that each operation will either process an entire message successfully or fail without altering the queue. So the writer can never fail after writing only a partial message, and the reader will either retrieve a complete message or nothing at all.
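To make the shared-memory half of this comparison concrete, here is a small sketch using an anonymous shared mapping and fork(). The function name shared_counter_demo is made up, and the only synchronization is that the parent reads after the child has exited; with two processes running concurrently you would need a semaphore or similar, which is exactly the burden described above.

```python
import mmap
import os
import struct


def shared_counter_demo(increments=1000):
    """Let a forked child bump a counter that lives in memory shared with the parent."""
    # With fileno -1, mmap creates an anonymous mapping; on Unix it is
    # MAP_SHARED by default, so it stays shared across fork().
    buf = mmap.mmap(-1, 8)
    buf[:8] = struct.pack("q", 0)

    pid = os.fork()
    if pid == 0:
        # Child: read-modify-write directly in the shared region.
        for _ in range(increments):
            (value,) = struct.unpack("q", buf[:8])
            buf[:8] = struct.pack("q", value + 1)
        os._exit(0)

    # Parent: waiting for the child is the only synchronization used here.
    os.waitpid(pid, 0)
    (value,) = struct.unpack("q", buf[:8])
    buf.close()
    return value
```

No data is copied through the kernel on each access, which is where the throughput advantage comes from.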
If you wish to stick with your basic architecture, you can switch from TCP sockets to Unix domain sockets (AF_UNIX/AF_LOCAL). Since it's a strictly local protocol, it doesn't have the overhead of TCP.
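A minimal sketch of that switch, assuming a request/reply exchange like the one in the question. The function name unix_echo_once, the socket file name and the uppercase "reply" are all invented for illustration; the point is that only the address family and the address change compared with TCP code.

```python
import os
import socket
import tempfile
import threading


def unix_echo_once(request):
    """Serve one request over an AF_UNIX stream socket and return the reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ipc.sock")  # a filesystem path replaces (host, port)
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)

        def serve():
            conn, _ = server.accept()
            with conn:
                # Stand-in for real request handling: reply with the uppercased request.
                conn.sendall(conn.recv(4096).upper())

        thread = threading.Thread(target=serve)
        thread.start()
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            client.sendall(request)
            reply = client.recv(4096)
        thread.join()
        server.close()
        return reply
```

For example, unix_echo_once(b"ping") returns b"PING". The rest of an existing TCP client and server (send, recv, select/poll handling) can usually stay as it is.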

Remote process control in Linux

I'm currently working on a project requiring a number of processes running under control of a "master" process, which receives remote commands via TCP and tells the child processes what to do (e.g.: what files they should act on, what processing operations they should perform).
I've come up with the following ideas to pass commands/configuration down to the child processes:
Signals (not powerful enough)
A binary protocol over sockets or pipes connecting each process to the master (reinvent the wheel).
RPC (maybe overkill)
CORBA (perhaps overkill)
DDS (totally overkill)
Any ideas/suggestions?
D-Bus
How about a text-protocol via pipes?
Text protocols are always better than binary protocols because they are easier to test, and easier testing generally means fewer bugs.
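As a sketch of what such a text protocol over pipes might look like: the command names PROCESS and QUIT and the OK/ERR/BYE replies are invented for this example, and the child is an inline Python script only so that the example is self-contained.

```python
import subprocess
import sys

# Hypothetical child: reads one command per line on stdin, answers one line on stdout.
CHILD_SOURCE = r"""
import sys
for line in sys.stdin:
    parts = line.split()
    if not parts:
        continue
    if parts[0] == "PROCESS" and len(parts) == 2:
        print("OK processed " + parts[1], flush=True)
    elif parts[0] == "QUIT":
        print("BYE", flush=True)
        break
    else:
        print("ERR unknown command", flush=True)
"""


def run_commands(commands):
    """Master side: send each command down the pipe and collect one reply line each."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    try:
        for command in commands:
            child.stdin.write(command + "\n")
            child.stdin.flush()
            replies.append(child.stdout.readline().strip())
    finally:
        child.stdin.close()
        child.stdout.close()
        child.wait()
    return replies
```

The testing advantage is visible here: the child can be exercised by hand from a shell by typing commands at it, with no special tooling.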
You could also use message queues, or shared memory with semaphores.
You could also look into an Apache project called ActiveMQ, which allows messages to be dispatched to subscription queues, etc. It's very powerful and flexible, and there are C interfaces. It's ideal if you have many machines/networks to which you need to dispatch messages.
http://activemq.apache.org/
A lightweight message queue like beanstalkd or resque seems like the right level of complexity. Files with inotify could also work; inotify is designed as an event queue. You can try it with incrontab before baking it in. {xml,json}-rpc are (slightly) more complex, but also more standard, as they use http. However, the message queue metaphor is more appropriate than rpc for non-blocking interactions.
The supervisord tool may be useful. This is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.

MSMQ for Managing Threads?

I am building an application where I have inputs from printers over the network (on specific ports) and other files which are created in a folder locally or over the network. The user can create different threads to monitor different folders at the same time, as well as threads to handle the input from three printers over the network. The application is supposed to process the input data according to its type and output it. On the other end of the application, there would be 4 threads waiting for input data from the input threads (could be 10 or 20 threads) to process and apply 4 different tasks.
As we will have many threads running at the same time, I thought I would use MSMQ to manage these threads. Does using MSMQ fit in this scenario or should I use another technique? Managing these threads in terms of scheduling, prioritizing, etc.
(P.S.: I was thinking of building my own ThreadEngine class to take care of all of these things until I heard about MSMQ, which I am still not sure is the right thing to use.)
MSMQ would be useful for managing your input/output data, not your threads. .NET already has the ThreadPool, the CCR and the TPL to assist you with concurrency and multithreading, so I would suggest reading up on those technologies and choosing the most appropriate one.
MSMQ is a system message queue, not a thread pool manager.
This could be interesting in a case where you don't really mind poor performance and are really going for a system where tasks are persistent and transactional to guarantee execution.
If you are looking for performance then I agree with the other folks and strongly discourage you from doing this, even with non-durable (RAM) queues.

How to most efficiently handle large numbers of file descriptors?

There appear to be several options available to programs that handle large numbers of socket connections (such as web services, p2p systems, etc).
1. Spawn a separate thread to handle I/O for each socket.
2. Use the select system call to multiplex the I/O into a single thread.
3. Use the poll system call to multiplex the I/O (replacing the select).
4. Use the epoll system calls to avoid having to repeatedly send socket fds across the user/system boundary.
5. Spawn a number of I/O threads that each multiplex a relatively small set of the total number of connections using the poll API.
6. As per #5 except using the epoll API to create a separate epoll object for each independent I/O thread.
On a multicore CPU I would expect that #5 or #6 would have the best performance, but I don't have any hard data backing this up. Searching the web turned up this page describing the experiences of the author testing approaches #2, #3 and #4 above. Unfortunately this web page appears to be around 7 years old with no obvious recent updates to be found.
So my question is which of these approaches have people found to be most efficient and/or is there another approach that works better than any of those listed above? References to real life graphs, whitepapers and/or web available writeups will be appreciated.
Speaking from my experience of running large IRC servers, we used to use select() and poll() (because epoll()/kqueue() weren't available). At around 700 simultaneous clients, the server would be using 100% of a CPU (the IRC server wasn't multithreaded). However, interestingly, the server would still perform well. At around 4,000 clients, the server would start to lag.
The reason for this was that at around 700ish clients, when we'd get back to select() there would be one client available for processing. The for() loops scanning to find out which client it was would be eating up most of the CPU. As we got more clients, we'd start getting more and more clients needing processing in each call to select(), so we'd become more efficient.
After moving to epoll()/kqueue(), similarly specced machines would trivially deal with 10,000 clients, and some (admittedly more powerful machines, but still machines that would be considered tiny by today's standards) have held 30,000 clients without breaking a sweat.
Experiments I've seen with SIGIO seem to suggest it works well for applications where latency is extremely important, where there are only a few active clients doing very little individual work.
I'd recommend using epoll()/kqueue() over select()/poll() in almost any situation. I've not experimented with splitting clients between threads. To be honest, I've never found a service that needed enough optimisation work on the front-end client processing to justify experimenting with threads.
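The difference described above can be seen in a small sketch: with epoll, the kernel returns only the descriptors that are ready, so there is no loop over every client to find the active one. The function name epoll_ready_demo and the choice of 100 connections with two active ones are arbitrary, and select.epoll is Linux-only.

```python
import select
import socket


def epoll_ready_demo(n_connections=100, active=(3, 42)):
    """Register many sockets with epoll and report which ones became readable."""
    pairs = [socket.socketpair() for _ in range(n_connections)]
    ep = select.epoll()
    index_by_fd = {}
    try:
        # Each descriptor is registered once, not re-sent on every wait.
        for index, (ours, _theirs) in enumerate(pairs):
            ep.register(ours.fileno(), select.EPOLLIN)
            index_by_fd[ours.fileno()] = index

        # Only a couple of "clients" actually send something.
        for index in active:
            pairs[index][1].send(b"x")

        # poll() returns just the ready descriptors; idle ones cost nothing here.
        events = ep.poll(1.0)
        return sorted(index_by_fd[fd] for fd, mask in events if mask & select.EPOLLIN)
    finally:
        ep.close()
        for ours, theirs in pairs:
            ours.close()
            theirs.close()
```

With select() or poll(), the equivalent code would have to pass all 100 descriptors to the kernel on every call and then scan the result for the two that are ready.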
I have spent the 2 last years working on that specific issue (for the G-WAN web server, which comes with MANY benchmarks and charts exposing all this).
The model that works best under Linux is epoll with one event queue (and, for heavy processing, several worker threads).
If you have little processing (low processing latency) then using one thread will be faster than using several threads.
The reason for this is that epoll does not scale on multi-Core CPUs (using several concurrent epoll queues for connection I/O in the same user-mode application will just slow-down your server).
I did not look seriously at epoll's code in the kernel (I only focussed on user-mode so far) but my guess is that the epoll implementation in the kernel is crippled by locks.
This is why using several threads quickly hits the wall.
It goes without saying that such a poor state of things should not last if Linux wants to keep its position as one of the best performing kernels.
From my experience, you'll have the best perf with #6.
I also recommend you look into libevent to deal with abstracting some of these details away. At the very least, you'll be able to see some of their benchmarks.
Also, about how many sockets are you talking about? Your approach probably doesn't matter too much until you start getting at least a few hundred sockets.
I use epoll() extensively, and it performs well. I routinely have thousands of sockets active, and test with up to 131,072 sockets. And epoll() can always handle it.
I use multiple threads, each of which polls on a subset of sockets. This complicates the code, but takes full advantage of multi-core CPUs.
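A rough sketch of that layout (option #6 in the question): each thread owns its own epoll object and a fixed subset of the sockets. The function name sharded_epoll_demo and the round-robin sharding are assumptions made for illustration, not this answer's actual code. It shows the structure only; in CPython the GIL limits real parallelism, so it says nothing about performance.

```python
import select
import socket
import threading


def sharded_epoll_demo(n_connections=8, n_threads=2):
    """Split sockets across threads, each with a private epoll; map message -> thread."""
    pairs = [socket.socketpair() for _ in range(n_connections)]
    handled_by = {}
    lock = threading.Lock()

    def worker(shard_id, shard):
        ep = select.epoll()  # one epoll object per thread, never shared
        sock_by_fd = {s.fileno(): s for s in shard}
        for s in shard:
            ep.register(s.fileno(), select.EPOLLIN)
        pending = len(shard)
        try:
            while pending:
                events = ep.poll(1.0)
                if not events:
                    break  # timed out; give up rather than spin
                for fd, _mask in events:
                    data = sock_by_fd[fd].recv(4096)
                    ep.unregister(fd)  # this demo expects one message per socket
                    pending -= 1
                    with lock:
                        handled_by[int(data)] = shard_id
        finally:
            ep.close()

    # Every "client" sends its own index as the message.
    for index, (_ours, theirs) in enumerate(pairs):
        theirs.send(str(index).encode())

    threads = []
    for shard_id in range(n_threads):
        # Round-robin assignment of sockets to threads.
        shard = [pairs[i][0] for i in range(shard_id, n_connections, n_threads)]
        thread = threading.Thread(target=worker, args=(shard_id, shard))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    for ours, theirs in pairs:
        ours.close()
        theirs.close()
    return handled_by
```

Because no descriptor is registered with more than one epoll object, the threads never contend for the same socket; the extra complexity mentioned above comes from deciding how to assign and rebalance connections.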

Resources