Linux IPC with a single writer, multiple readers

In my application, I have one process (the "writer") reading data from external hardware. This process should supply consistent packets to several "readers".
Question:
How can one process send consistent data packets, without blocking, to several clients, with each reader seeing something like the end of its own personal FIFO? I'm on Debian Linux.
1) In my first approach I tried datagram Unix domain sockets, which worked well.
But with the "writer" as server, all the clients have to poll the server permanently. :-(
They get the same packet several times, or miss a packet if they don't poll fast enough.
2) My second approach was FIFOs (named pipes), which works too, but with several readers "strange things happen", which is confirmed here: http://beej.us/guide/bgipc/output/html/multipage/fifos.html
I tried this, searched the net and stackoverflow the whole day, but I could not find a reasonable answer.
Edit: Sorry, I did not mention: I can't use socketpair() and fork(). My programs are developed independently. I'd like to have the writer finished while new readers are still being developed.

If the writer process is a server, it might fork the client processes and use plain pipe(2) for communication. If there is no parent/child relationship, consider named pipes made with mkfifo(3), or AF_UNIX sockets (see unix(7) and socket(2)...), which are bidirectional (AF_UNIX sockets are much faster than TCP/IP or UDP/IP on the same machine).
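For the no-fork case, here is a minimal sketch (not from the original answer) of the writer creating an AF_UNIX listening socket; the socket path /tmp/writer.sock is just an assumed example:

    /* Minimal sketch: the writer creates an AF_UNIX listening socket.
     * The socket path is an example, not anything mandated. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int make_listening_socket(const char *path)
    {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); exit(EXIT_FAILURE); }

        struct sockaddr_un addr;
        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);

        unlink(path);                     /* remove a stale socket file, if any */
        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("bind"); exit(EXIT_FAILURE);
        }
        if (listen(fd, 8) < 0) { perror("listen"); exit(EXIT_FAILURE); }
        return fd;   /* each client later gets its own fd from accept(2) */
    }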
Notice that your writer process is reading data from your hardware device and is writing or sending data to several reader clients. So your writer process deals with many file descriptors simultaneously (the hardware device to be read, and the sockets or pipes to be written to the clients, at least one file descriptor per client).
However, the important thing is to have some event loop (notably on the server side, and probably also inside clients). This means that you call some multiplexing syscall like poll(2) in the loop, and you "decide" if you are reading or writing or connecting (and which file descriptor should be read, or should be written, or should connect) in each iteration. See also read(2), write(2), connect(2), send(2), recv(2) etc... Notice that you should buffer data with an event loop (since read and write can be on "partial" or "incomplete" messages).
Notice that poll does not eat CPU resources while waiting for I/O. You could, but should not anymore, use some older multiplexing syscall (like the obsolete select(2)...). Use poll(2).
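To illustrate such an event loop, here is a hedged sketch of the writer multiplexing the hardware device, the listening socket, and the per-client connections with poll(2); dev_fd and listen_fd are assumed to be already open, and buffering of partial writes is deliberately omitted:

    /* Sketch of the writer's event loop: poll the hardware device and the
     * listening socket, accept new readers, and broadcast each packet.
     * Real code must also buffer partial writes and handle slow readers. */
    #include <poll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_CLIENTS 64

    void writer_loop(int dev_fd, int listen_fd)
    {
        int clients[MAX_CLIENTS];
        int nclients = 0;
        char packet[512];

        for (;;) {
            struct pollfd fds[2 + MAX_CLIENTS];
            fds[0] = (struct pollfd){ .fd = dev_fd,    .events = POLLIN };
            fds[1] = (struct pollfd){ .fd = listen_fd, .events = POLLIN };
            int n_polled = nclients;
            for (int i = 0; i < n_polled; i++)
                fds[2 + i] = (struct pollfd){ .fd = clients[i], .events = 0 };

            if (poll(fds, 2 + n_polled, -1) < 0)
                continue;                              /* e.g. EINTR */

            for (int i = n_polled - 1; i >= 0; i--)    /* drop disconnected readers */
                if (fds[2 + i].revents & (POLLHUP | POLLERR)) {
                    close(clients[i]);
                    clients[i] = clients[--nclients];
                }

            if (fds[1].revents & POLLIN) {             /* a new reader connects */
                int c = accept(listen_fd, NULL, NULL);
                if (c >= 0) {
                    if (nclients < MAX_CLIENTS) clients[nclients++] = c;
                    else close(c);
                }
            }

            if (fds[0].revents & POLLIN) {             /* packet from the hardware */
                ssize_t n = read(dev_fd, packet, sizeof packet);
                if (n > 0)
                    for (int i = 0; i < nclients; i++)
                        write(clients[i], packet, (size_t)n);  /* naive broadcast */
            }
        }
    }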
You may want to use a library to provide the event loop, e.g. libevent or libev... See also this answer. That event loop should also (on the server side) poll then read the hardware device.
If some of the programs use a GUI toolkit (e.g. on the client side) like Qt or GTK, they should take advantage of the existing event loop provided by that toolkit...
You should read Advanced Linux Programming and know about the C10K problem.
If signals or timers are important (read carefully signal(7) and time(7)), the Linux-specific signalfd(2) and timerfd_create(2) can be very helpful, since they play nicely with event loops. These Linux-specific syscalls (signalfd & timerfd_create...) are too recent to be mentioned in Advanced Linux Programming.
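A small sketch of how signalfd(2) folds signal handling into the same poll(2) loop (here only SIGTERM and SIGINT, which simply end the program; adapt as needed):

    /* Sketch: deliver SIGTERM/SIGINT through a file descriptor so the same
     * poll(2) loop can watch it alongside the other descriptors. */
    #include <poll.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/signalfd.h>
    #include <unistd.h>

    int main(void)
    {
        sigset_t mask;
        sigemptyset(&mask);
        sigaddset(&mask, SIGTERM);
        sigaddset(&mask, SIGINT);

        /* Block normal delivery so the signals are only reported via the fd. */
        if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0) { perror("sigprocmask"); return 1; }

        int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
        if (sfd < 0) { perror("signalfd"); return 1; }

        struct pollfd pfd = { .fd = sfd, .events = POLLIN };
        for (;;) {
            if (poll(&pfd, 1, -1) < 0)
                continue;
            struct signalfd_siginfo si;
            if (read(sfd, &si, sizeof si) == sizeof si) {
                printf("got signal %u, shutting down cleanly\n", si.ssi_signo);
                break;
            }
        }
        close(sfd);
        return 0;
    }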
BTW, you could study the source code of existing free software similar to yours, and/or use strace(1) to understand the exact syscalls that they are doing.
If you have no loop around a multiplexing syscall (à la poll(2)), then you have no event loop, and your design is buggy and cannot work reliably (since you need to react to several file descriptors at once).
You could also use a multi-threaded approach, but it is much more complex and not worth the effort in your particular case.

ZeroMQ has a pattern for this problem: Pub-Sub. It is fast, free and open source, and supports a lot of programming languages. See: https://zeromq.org/
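To make the suggestion concrete, here is a minimal sketch of the publisher side using the libzmq C API; the ipc:// endpoint is just an example, and readers would connect a ZMQ_SUB socket to the same endpoint and subscribe with an empty prefix:

    /* Minimal ZeroMQ PUB sketch (link with -lzmq). Readers create a ZMQ_SUB
     * socket, zmq_connect() to the same endpoint, and call
     * zmq_setsockopt(sub, ZMQ_SUBSCRIBE, "", 0) to receive every packet. */
    #include <zmq.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();
        void *pub = zmq_socket(ctx, ZMQ_PUB);
        zmq_bind(pub, "ipc:///tmp/packets");       /* example endpoint, or "tcp://*:5556" */

        for (int i = 0; i < 10; i++) {
            const char *packet = "sensor-data";
            zmq_send(pub, packet, strlen(packet), 0);  /* a PUB socket never blocks on slow readers */
            sleep(1);
        }

        zmq_close(pub);
        zmq_ctx_destroy(ctx);
        return 0;
    }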

Related

How to stop a program in production safely

We have multiple machines running as "servers". On them, they have a program that listens to requests from different clients.
As part of a continuous deployment process, updating the server program can cause unfinished business to be killed. This is undesirable. I'm looking for an approach to drain the work from one node so we can update it while the other takes the load.
As for a more specific question representing my current mindset:
How do you send a "signal" so that:
    while True and no_signal:
        do_server_work()
stops if we need to upgrade it.
For our infrastructure, we have many clients sending requests to multiple RabbitMQ nodes, where servers consume their queues.
Edit: On Linux, using Python 3.
(I guess you are on Linux, or at least on POSIX machines)
For well-written server programs, you should send them a SIGTERM signal (see signal(7) for details) to terminate them gently, and they should explicitly (and cleverly) handle that signal. A common way to do so is to use the kill(1) program (or the underlying kill(2) system call).
Badly written server programs might not handle SIGTERM as they should. Then (a few seconds later) you might need to kill them with SIGKILL, but that could leave them (or their files) in some inconsistent state, since SIGKILL cannot be caught.
Some server programs are documented to behave differently. For example, they might use some other inter-process communication facility to be asked to terminate gently.
Handling SIGTERM properly is a widely used convention (but read also signal-safety(7) if you are coding a server handling it). Some servers might have a different one.
BTW, there are several tricks to write safe signal handlers (read carefully signal-safety(7)) at the C level. One is to have a global volatile sig_atomic_t variable that is set by your signal handler and tested regularly in your code (perhaps in your event loops). Another is to set up (at initialization, using pipe(2)) a pipe(7) to self, to have your signal handler write(2) one or a few bytes into it (this is legal, since write(2) is an async-signal-safe function), and to poll(2) and read(2) that pipe in your event loop. The latter trick is common enough to be documented in Qt.
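Here is a hedged sketch of the first trick (the volatile sig_atomic_t flag); do_server_work() is the hypothetical work function from the question, and the self-pipe variant would additionally write(2) a byte from the handler and poll(2) the read end in the event loop:

    /* Sketch of the sig_atomic_t flag trick for handling SIGTERM gently. */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_sigterm = 0;

    static void on_sigterm(int signo)
    {
        (void)signo;
        got_sigterm = 1;          /* only async-signal-safe work in the handler */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigterm;
        sigaction(SIGTERM, &sa, NULL);

        while (!got_sigterm) {
            /* do_server_work() would go here; keep iterations short so the
               flag is noticed promptly, or use the self-pipe trick with poll(2). */
            sleep(1);
        }
        /* finish in-flight work, flush state, then exit cleanly */
        puts("SIGTERM received, draining and exiting");
        return 0;
    }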
Probably, Python handles signals using the first trick or something similar (perhaps related to its infamous GIL). Since it is free software, you could study its source code (right now, I am too lazy for that).

How is Wait() functionality Implemented

How are wait functions (e.g. WaitForSingleObject) implemented internally in Windows or any OS?
How is it any different from a spin lock?
Does the CPU/hardware provide special functionality to do this?
Hazy view of What's Going On follows... Focusing on IO mainly.
Unix / Linux / Posix
The Unix equivalents, select(), epoll(), and similar, have been implemented in various ways. In the early days the implementations were rubbish and amounted to little more than busy polling loops that used up all your CPU time. Nowadays they are much better and take no CPU time whilst blocked.
They can do this I think because the device driver model for devices like Ethernet, serial and so forth has been designed to support the select() family of functions. Specifically the model must allow the kernel to tell the devices to raise an interrupt when something has happened. The kernel can then decide whether or not that will result in a select() unblocking, etc etc. The result is efficient process blocking.
Windows
In Windows, the WaitFor..., when applied to asynchronous IO, is completely different. You actually have to start a thread reading from an IO device, and when that read completes (note, not starts) you have that thread return something that wakes up the WaitFor. That gets dressed up in object.beginread(), etc., but it all boils down to that underneath.
This means you can't replicate the select() functionality in Windows for serial, pipes, etc. But there is a select function call for sockets. Weird.
To me this suggests that the whole IO architecture and device driver model of the Windows kernel can drive devices only by asking them to perform an operation and blocking until the device has completed it. There would seem to be no truly asynchronous way for the device to notify the kernel of events, and the best that can be achieved is to have a separate thread doing the synchronous operation for you. I've no idea how they've done select for sockets, but I have my suspicions.
CYGWIN, Unix on Windows
When the cygwin guys came to implement their select() routine on Windows, they were horrified to discover that it was impossible to implement for anything other than sockets. What they did was, for each file descriptor passed to select(), spawn a thread. This thread would poll the device, pipe, or whatever, waiting for the available data count to become non-zero, etc. That thread would then notify the thread that is actually calling select() that something had happened. This is very reminiscent of select() implementations from the dark days of Unix, and is massively inefficient. But it does work.
I would bet a whole 5 new pence that that's how MS did select for sockets too.
My Experience Thus Far
Windows' WaitFors... are fine for operations that are guaranteed to complete or proceed in nice fixed stages, but are very unpleasant for operations that aren't (like IO). Cancelling an asynchronous IO operation is deeply unpleasant. The only way I've found for doing it is to close the device, socket, pipe, etc., which is not always what you want to do.
Attempt to Answer the Question
The hardware's interrupt system supports the implementation of select() because it's a way for the devices to notify the CPU that something has happened without the CPU having to poll / spin on a register in the device.
Unix / Linux uses that interrupt system to provide select() / epoll() functionality, and also incorporates purely internal 'devices' (pipes, files, etc) into that functionality.
Windows' equivalent facility, WaitForMultipleObjects() fundamentally does not incorporate IO devices of any sort, which is why you have to have a separate thread doing the IO for you whilst you're waiting for that thread to complete. The interrupt system on the hardware is (I'm guessing) used solely to tell the device drivers when a read or write operation is complete. The exception is the select() function call in Windows which operates only on sockets, not anything else.
A big clue to the architectural differences between Unix/Linux and Windows is that a PC can run either, but you get a proper IO-centric select() only on Unix/Linux.
Guesswork
I'm guessing that the reason Windows has never done a select() properly is that the early device drivers for Windows could in no way support it, a bit like the early days of Linux.
However, Windows became very popular quite early on, and loads of device drivers got written against that (flawed?) device driver standard.
If at any point MS had thought "perhaps we'd better improve on that", they would have faced the problem of getting everyone to rewrite their device drivers, a massive undertaking. So they decided not to, and implemented the separate IO thread / WaitFor... model instead. This got promoted by MS as being somehow superior to the Unix way of doing things. And now that Windows has been that way for so long, I'm guessing that there's no one in MS who perceives that Things Could Be Improved.
==EDIT==
I have since stumbled across Named Pipes - Asynchronous Peeking. This is fascinating because it would seem (I'm very glad to say) to debunk pretty much everything I'd thought about Windows and IO. The article applies to pipes, though presumably it would also apply to any IO stream.
It seems to hinge on starting an asynchronous read operation to read zero bytes. The read will not return until there are some bytes available, but none of them are read from the stream. You can therefore use something like WaitForMultipleObjects() to wait for more than one such asynchronous operation to complete.
As the comment below the accepted answer recognises, this is very non-obvious in all of the Microsoft documentation that I've ever read. I wonder whether it is an unintended but useful behaviour in the OS. I've been ploughing through Windows Internals by Mark Russinovich, but I've not found anything yet.
I've not yet had a chance to experiment with this in any way, but if it does work, then that means one can implement something equivalent to Unix's select() on Windows, so it must be supported all the way down to the device driver level and interrupts. Hence the extensive strikeouts above...

How do programs communicate with each other?

How do processes communicate with each other? Using everything I've learnt about programming so far, I'm unable to explain how sockets, file systems, and other means of sending messages between programs work.
Btw, I use a Linux-based OS, if you're going to add anything OS-specific. Thanks in advance. The question's been bugging me for ages. I'm also guessing the kernel has something to do with it.
In case of most IPC (InterProcess Communication) mechanisms, the general answer to your question is this: process A calls the kernel passing a pointer to a buffer with data to be transferred to process B, process B calls the kernel (or is already blocked on a call to the kernel) passing a pointer to a buffer to be filled with data from process A.
This general description is true for sockets, pipes, System V message queues, ordinary files etc. As you can see the cost of communication is high since it involves at least one context switch.
Signals constitute an asynchronous IPC mechanism in which one process can send a simple notification to another process triggering a handler registered by the second process (alternatively doing nothing, stopping or killing that process if no handler is registered, depending on the signal).
For transferring large amounts of data, one can use System V shared memory, in which case two processes can access the same portion of main memory. Note that even in this case one needs to employ a synchronization mechanism, like System V semaphores, which results in context switches as well.
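As a concrete illustration of that shared-memory path, here is a minimal sketch of the System V calls (shmget, shmat, shmdt, shmctl); synchronization is deliberately left out:

    /* Sketch: create, attach, use and remove a System V shared memory segment.
     * Synchronization (e.g. System V semaphores) is deliberately omitted. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* Create a 4 KiB private segment; unrelated processes would instead
           agree on a key, e.g. via ftok(3). */
        int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (shmid < 0) { perror("shmget"); return 1; }

        char *mem = shmat(shmid, NULL, 0);        /* map it into this process */
        if (mem == (void *)-1) { perror("shmat"); return 1; }

        strcpy(mem, "hello from shared memory");  /* any attached process sees this */
        printf("%s\n", mem);

        shmdt(mem);                               /* detach */
        shmctl(shmid, IPC_RMID, NULL);            /* mark the segment for removal */
        return 0;
    }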
This is why when processes need to communicate often, it is better to make them threads in a single process.

What mechanism do PIPES use to "wake up" the recipient?

I have two questions in one here.
On Windows, I am familiar with pipes and how they work. However, I am curious as to what mechanism the OS uses to notify the recipient thread of a message arrival.
Does the thread "poll & sleep" continuously for data? Does the OS check to see if the thread is sleeping and wake it up? Or is there some other mechanism used?
Specifically, I want to build an IPC system where many threads need to pass messages. I don't need to use pipes, but I do need to know the most efficient notification method possible.
The developer can decide how they want to work with the pipe: whether they will sleep/poll, or call blocking functions and wait until the data is available.
About the mechanism that the pipe has for waking up the process (assuming that the process is in a blocking read call): it is not the pipe but the OS that takes charge, as in any other OS call: it registers the operation and blocks the process/thread until the data is available. When the data is available, it completes the system call.
This is an answer for Unix. I'd lay good money on Windows being pretty similar as the solution has been around a long time and is well known to be robust. The details will vary a bit (different API calls, specifics of semantics, etc.)
It depends on whether the other end is using the pipe's file descriptor in blocking or non-blocking mode.
In blocking mode, the process is waiting in the OS kernel for the data to become available. The way in which notification happens there depends on the OS. Chances are it involves a queue of processes that are considered to be runnable, and everything's made simpler by the fact that the kernel can (largely) control what interrupts it. In a simple (single processor) implementation you could go for something as trivial as noting on write to the pipe that the other process is waiting to read from it (via some kind of “interest set”), and so marking the reader as runnable at that point (at which time it becomes up to the scheduler to decide).
In non-blocking mode, either the process is polling from time to time (yuck!) or they're using a system call like select() or poll() (there are some higher-performance variants too). That's very much like the Windows call WaitForMultipleObjects() and works just great with pipes. That in turn ends up back at that runnable process queue, the interest set, and the scheduler.
It also doesn't really matter too much whether it's blocking because the pipe is full or the pipe is empty, as the control flow is pretty much symmetric between readers and writers. (Unlike the data flow, of course.)
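A small demonstration of the blocking case (a sketch, not tied to any particular answer above): the child sleeps inside read(2) on an empty pipe and is made runnable again by the kernel as soon as the parent writes.

    /* Demo: the child blocks in read(2) on an empty pipe; the kernel wakes it
     * up (makes it runnable again) when the parent writes. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) < 0) { perror("pipe"); return 1; }

        pid_t pid = fork();
        if (pid == 0) {                       /* child: reader */
            close(fds[1]);
            char buf[64];
            ssize_t n = read(fds[0], buf, sizeof buf);   /* sleeps here, no polling */
            if (n > 0)
                printf("child woke up with: %.*s\n", (int)n, buf);
            return 0;
        }

        close(fds[0]);                        /* parent: writer */
        sleep(2);                             /* the child is blocked during this time */
        const char *msg = "wake up";
        write(fds[1], msg, strlen(msg));
        close(fds[1]);
        wait(NULL);
        return 0;
    }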

Does the Thundering Herd Problem exist on Linux anymore?

Many Linux/Unix programming books and tutorials speak about the "Thundering Herd Problem", which happens when multiple threads or forks are blocked on a select() call waiting for readability of a listening socket. When a connection comes in, all threads and forks are woken up, but only one "wins" with a successful call to accept(). In the meantime, a lot of CPU time is wasted waking up all the threads/forks for no reason.
I noticed a project which provides a "fix" for this problem in the linux kernel, but this is a very old patch.
I think there are two variants: one where each fork does select() and then accept(), and one that just does accept().
Do modern unix/linux kernels still have the Thundering Herd Problem in both these cases or only the "select() then accept()" version?
For years, most Unix/Linux kernels have serialized the response to accept(2); in other words, only one thread is woken up if more than one is blocking on accept(2) against a single open file descriptor.
OTOH, many (if not all) kernels still have the thundering herd problem in the select-accept pattern as you describe.
I have written a simple script ( https://gist.github.com/kazuho/10436253 ) to verify the existence of the problem, and found that the problem exists on Linux 2.6.32 and Darwin 12.5.0 (OS X 10.8.5).
This is a very old problem, and for the most part does not exist any more. The Linux kernel (for the past few years) has had a number of changes with the way it handles and routes packets up the network stack, and includes many optimizations to ensure both low latency, and fairness (i.e., minimize starvation).
That said, the select system has a number of scalability issues simply by way of its API. When you have a large number of file descriptors, the cost of a select call is very high. This is primarily due to having to build, check, and maintain the FD sets that are passed to and from the system call.
Nowadays, the preferred way to do asynchronous IO is with epoll. The API is far simpler and scales very nicely across various types of load (many connections, lots of throughput, etc.).
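For illustration, a minimal hedged sketch of an epoll loop over a listening socket; listen_fd is assumed to be an already bound and listening descriptor:

    /* Minimal epoll sketch: wait on a listening socket and accept connections. */
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void epoll_accept_loop(int listen_fd)
    {
        int epfd = epoll_create1(0);
        if (epfd < 0) { perror("epoll_create1"); return; }

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            struct epoll_event events[16];
            int n = epoll_wait(epfd, events, 16, -1);
            for (int i = 0; i < n; i++) {
                if (events[i].data.fd == listen_fd) {
                    int conn = accept(listen_fd, NULL, NULL);
                    if (conn >= 0) {
                        /* handle or register conn; closed immediately in this sketch */
                        close(conn);
                    }
                }
            }
        }
    }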
I recently tested a scenario where multiple threads polled on a listening unix-domain socket and then accepted the connection. All threads woke up, using the poll() system call.
This was a custom build of the Linux kernel rather than a distro build, so perhaps there is a kernel configuration option that changes this, but I don't know what that would be.
We did not try epoll.
Refer to the link below, which talks about a separate flag for epoll to avoid this problem.
http://lwn.net/Articles/632590/
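The flag that eventually landed for this is EPOLLEXCLUSIVE (Linux 4.5 and later). A short sketch of registering a listening socket with it (epfd and listen_fd are assumed to exist already):

    /* With EPOLLEXCLUSIVE (Linux >= 4.5), when several threads/processes add the
     * same listening fd to their own epoll instances, the kernel wakes only one
     * (or a few) of them per event instead of all of them. */
    #include <sys/epoll.h>

    int add_exclusive(int epfd, int listen_fd)
    {
        struct epoll_event ev;
        ev.events = EPOLLIN | EPOLLEXCLUSIVE;   /* not valid with EPOLL_CTL_MOD */
        ev.data.fd = listen_fd;
        return epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
    }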
