Sharing stdout among multiple threads/processes - Linux

I have a Linux program (the language doesn't matter) which prints its log to stdout.
The log is needed for monitoring the process.
Now I'm going to parallelize it by forking or using threads.
The problem: the resulting stdout will contain an unreadable mix of unrelated lines...
And finally the question: how would you reconstruct the output logic for parallel processes?

Sorry for answering myself...
The definitive solution was to use the GNU parallel utility.
It was designed as a replacement for the well-known xargs utility, but it runs the commands in parallel, separating the output into groups.
So I just left my simple one-process, one-thread utility as-is and piped its invocation through parallel like this:
generate-argument-list | parallel [options] my-utility
Depending on parallel's options, this can produce nicely grouped output for the multiple calls of my-utility.

If you're in C++, I'd consider using Pantheios or Boost.Log, or have a look at Logging In C++: Part 2.
If you're in another language, then file locking around the I/O operations is probably the way to go (see File Locks). You can achieve the same result using semaphores or any other process-synchronization mechanism, but for me file locks are the easiest.
You could also consider using syslog if this monitoring is considered system-wide.
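
To make the file-locking option concrete, here is a minimal C sketch (all names and the lock-file path are illustrative, and the locks are advisory, so they only coordinate processes that use them). Each process opens the lock file itself, because flock() locks belong to the open file description rather than to the process:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>   /* flock() */
    #include <unistd.h>

    static int lock_fd = -1;

    /* Call once in each process: every process needs its own
     * open file description for the lock to work across forks. */
    static void log_init(void)
    {
        lock_fd = open("/tmp/my-utility.lock", O_CREAT | O_RDWR, 0644);
    }

    /* Emit one whole log line while holding the advisory lock,
     * so lines from different processes never interleave. */
    static void log_line(const char *msg)
    {
        flock(lock_fd, LOCK_EX);                 /* blocks until owned */
        dprintf(STDOUT_FILENO, "[%ld] %s\n", (long)getpid(), msg);
        flock(lock_fd, LOCK_UN);
    }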

Another approach, which we use, is to delegate logging to a dedicated logger thread. All other threads wishing to log send their messages to the logger thread. This method gives you flexibility, as formatting of the logs can be done in a single place, which can also be made configurable. If you don't want to worry about locks, you can use sockets for the message passing.
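
A minimal pthread sketch of that pattern (all names and the fixed queue size are illustrative): producer threads enqueue preformatted lines into a ring buffer guarded by a mutex and condition variable, and the logger thread alone writes to stdout, so no interleaving can occur:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define QSIZE 128

    static char *queue[QSIZE];
    static int head, tail, count;
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t qcond = PTHREAD_COND_INITIALIZER;

    /* Called by any worker thread. Drops messages on overflow
     * in this sketch; a real logger would block or grow the queue. */
    void log_enqueue(const char *msg)
    {
        pthread_mutex_lock(&qlock);
        if (count < QSIZE) {
            queue[tail] = strdup(msg);
            tail = (tail + 1) % QSIZE;
            count++;
            pthread_cond_signal(&qcond);
        }
        pthread_mutex_unlock(&qlock);
    }

    /* The single consumer: the only thread that touches stdout. */
    void *logger_thread(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&qlock);
            while (count == 0)
                pthread_cond_wait(&qcond, &qlock);
            char *msg = queue[head];
            head = (head + 1) % QSIZE;
            count--;
            pthread_mutex_unlock(&qlock);

            printf("%s\n", msg);   /* single writer: no interleaving */
            free(msg);
        }
        return NULL;
    }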

If it's multithreaded, then you'll need to mutex-protect printing/writing to the stdout log. The most common way to do this on Linux in C/C++ is with a pthread_mutex. Additionally, if it's C++, Boost has synchronization primitives that could be used.
To implement it, you should probably encapsulate all the logging in one function or object, and lock and unlock the mutex internally.
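For instance, a minimal C sketch of such a function (the name log_msg is illustrative):

    #include <pthread.h>
    #include <stdarg.h>
    #include <stdio.h>

    static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* All threads log through this one function; the whole line is
     * emitted while the mutex is held, so lines never interleave. */
    void log_msg(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        pthread_mutex_lock(&log_mutex);
        vprintf(fmt, ap);
        fputc('\n', stdout);
        pthread_mutex_unlock(&log_mutex);
        va_end(ap);
    }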
If blocking on the log becomes prohibitively expensive, you could consider buffering the log messages (in the aforementioned object or function) and only writing to stdout when the buffer is full. You'll still need mutex protection around the buffer, but buffering will be faster than writing to stdout.
If each thread has its own logging messages, they will still all need to share the same mutex for writing to stdout. In this case it would probably be best for each thread to buffer its own log messages and only write to stdout when the buffer is full, thus only acquiring the mutex for the write to stdout.
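
A sketch of that buffering idea, assuming GCC-style __thread thread-local storage (all names and the buffer size are illustrative):

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    #define BUFCAP 4096

    static pthread_mutex_t out_mutex = PTHREAD_MUTEX_INITIALIZER;
    static __thread char tl_buf[BUFCAP];   /* one buffer per thread */
    static __thread size_t tl_used;

    /* Flush this thread's buffer; the shared mutex is held only here. */
    static void log_flush(void)
    {
        pthread_mutex_lock(&out_mutex);
        fwrite(tl_buf, 1, tl_used, stdout);
        pthread_mutex_unlock(&out_mutex);
        tl_used = 0;
    }

    /* Append one line to the thread-local buffer; take the shared
     * mutex only when the buffer fills up. */
    static void log_buffered(const char *line)
    {
        size_t len = strlen(line);
        if (len + 1 > BUFCAP)            /* oversized line: clip it */
            len = BUFCAP - 1;
        if (tl_used + len + 1 > BUFCAP)
            log_flush();
        memcpy(tl_buf + tl_used, line, len);
        tl_used += len;
        tl_buf[tl_used++] = '\n';
    }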

Related

Is it thread-safe to write to the same pipe from multiple threads sharing the same file descriptor in Linux?

I have a Linux process with two threads, both sharing the same file descriptor to write data of 400 bytes to the same pipe every 100ms. I'm wondering if POSIX guarantees that this is thread-safe or if I need to add additional synchronization mechanisms to serialize the writing to the pipe from multiple threads (not processes).
I'm also aware that POSIX guarantees that multiple writes to the same pipe from different processes that are less than PIPE_BUF bytes are atomically written. But I'm not sure if the same guarantee applies to writes from multiple threads within the same process.
Can anyone provide some insight on this? Are there any additional synchronization mechanisms that I should use to ensure thread safety when writing to the same pipe from multiple threads using the same file descriptor in Linux?
Thank you in advance for any help or advice!
In the POSIX standard, under the general information section, we read:
2.9.1 Thread-Safety
All functions defined by this volume of POSIX.1-2008 shall be thread-safe, except that the following functions need not be thread-safe.
And neither read() nor write() is listed afterwards. So indeed, it is safe to call them from multiple threads. However, this only means that the syscall won't crash; it doesn't say anything about the exact behaviour of calling them in parallel. In particular, it says nothing about atomicity.
However, in the documentation of the write() syscall we read:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of POSIX.1-2008 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
And in the same doc we also read:
Write requests to a pipe or FIFO shall be handled in the same way as a regular file with the following exceptions:
and the guarantee about atomicity (for sizes at or below PIPE_BUF) is repeated.
man 2 write (Linux man-pages 6.02) says:
According to POSIX.1-2008/SUSv4 Section XSI 2.9.7 ("Thread Interactions with Regular File Operations"):
All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links: ...
Among the APIs subsequently listed are write() and writev(2). And among the effects that should be atomic across threads (and processes) are updates of the file offset. However, before Linux 3.14, this was not the case: if two processes that share an open file description (see open(2)) perform a write() (or writev(2)) at the same time, then the I/O operations were not atomic with respect to updating the file offset, with the result that the blocks of data output by the two processes might (incorrectly) overlap. This problem was fixed in Linux 3.14.
So, it should be safe as long as you're running at least Linux 3.14 (which is almost 9 years old).
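In practice that means emitting each record with a single write() call of at most PIPE_BUF bytes; a minimal C sketch (the function name is illustrative):

    #include <limits.h>    /* PIPE_BUF */
    #include <string.h>
    #include <unistd.h>

    /* Each record goes out in a single write() call of at most
     * PIPE_BUF bytes, which POSIX guarantees is not interleaved
     * with other writers on the same pipe. */
    static void write_record(int pipe_fd, const char *record)
    {
        size_t len = strlen(record);
        if (len <= PIPE_BUF)
            (void)write(pipe_fd, record, len);
        /* larger records would need external serialization,
         * e.g. a mutex around the write */
    }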

Is it possible to grab process memory using ftrace?

I have two applications, one writing requests to and reading responses from the stdin/stdout of the other. I must not modify the applications, but I have root permission. I need to intercept the requests and responses and measure, as precisely as possible, when certain messages are passed.
Currently I'm using ptrace: I trace read and write syscalls on fd=0 and fd=1 and grab the memory from /proc/<pid>/mem, but the overhead is too big: we cannot use such imprecise timestamps. I'm trying to use ftrace, but I cannot read from /proc/<pid>/mem, because ftrace doesn't stop the tracee application.
It seems ftrace only gives me the arguments of functions and the registers, but I cannot find out how to grab the buffer at the pointer given as an argument. Is it even possible?
Could you suggest another approach for my problem?

Passing messages between processes

I need to write a simple function which does the following in linux:
Create two processes.
Have thread1 in Process1 do some small operation, and send a message to Process2 via thread2 once the operation is completed.
Process2 shall acknowledge the received message.
I have no idea where to begin
I have written two simple functions which simply count from 0 to 1000 in a loop (the loop is run in a function called by a thread) and I have compiled them to get the binaries.
I am executing these one after the other (both running in the background) from a shell script
Once process1 reaches 1000 in its loop, I want the first process to send a "Complete" message to the other.
I am not sure if my approach is correct on the process front and I have absolutely no idea how to communicate between these two.
Any help will be appreciated.
LostinSpace
You'd probably want to use pipes for this. Depending on how the processes are started, you either want named or anonymous pipes:
Use named pipes (aka fifo, man mkfifo) if the processes are started independently of each other.
Use anonymous pipes (man 2 pipe) if the processes are started by a parent process through forking. The parent process would create the pipes, the child processes would inherit them. This is probably the "most beautiful" solution.
In both cases, the end points of the pipes are used just like any other file descriptor (but more like sockets than files).
If you aren't familiar with pipes yet, I recommend getting a copy of Marc Rochkind's book "Advanced UNIX Programming", where these techniques are explained in great detail with easy-to-understand example code. That book also presents other inter-process communication methods (the only other really useful one on POSIX systems is shared memory, but just for fun/completeness he presents some hacks).
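A minimal C sketch of the anonymous-pipe variant, assuming the parent creates the child with fork() (the message text and buffer sizes are arbitrary):

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) == -1)
            return 1;

        if (fork() == 0) {                 /* child */
            close(fds[0]);                 /* child only writes */
            /* ... count to 1000 here ... */
            write(fds[1], "Complete", 8);
            close(fds[1]);
            return 0;
        }

        close(fds[1]);                     /* parent only reads */
        char buf[16] = {0};
        read(fds[0], buf, sizeof buf - 1); /* blocks until child writes */
        printf("parent got: %s\n", buf);   /* acknowledge receipt */
        close(fds[0]);
        wait(NULL);
        return 0;
    }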
Since you create the processes (I assume you are using fork()), you may want to look at eventfd().
eventfd() provides a lightweight mechanism to send events from one process or thread to another.
More information on eventfd() and a small example can be found here: http://man7.org/linux/man-pages/man2/eventfd.2.html.
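A minimal sketch of the eventfd() approach (error handling mostly omitted): the counter is created before fork(), so both processes share it.

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/eventfd.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int efd = eventfd(0, 0);           /* shared after fork() */
        if (efd == -1)
            return 1;

        if (fork() == 0) {                 /* child */
            /* ... do the work ... */
            uint64_t one = 1;
            write(efd, &one, sizeof one);  /* signal "done" */
            return 0;
        }

        uint64_t val;
        read(efd, &val, sizeof val);       /* blocks until child signals */
        printf("child finished (count=%llu)\n", (unsigned long long)val);
        wait(NULL);
        return 0;
    }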
Signals or named pipes (since you're starting the two processes separately) are probably the way to go here if you're just looking for a simple solution. For signals, your client process (the one sending "Done") will need to know the process id of the server, and for named pipes they will both need to know the location of a pipe file to communicate through.
However, I want to point out a neat IPC/networking tool that can make your job a lot easier if you're designing a larger, more robust system: 0MQ can make this kind of client/server interaction dead simple, and allows you to start up the programs in whatever order you like (if you structure your code correctly). I highly recommend it.

What mechanism do pipes use to "wake up" the recipient?

I have two questions in one here.
On Windows, I am familiar with pipes and how they work. However, I am curious as to what mechanism the OS uses to notify the recipient thread of a message arrival.
Does the thread "poll & sleep" continuously for data? Does the OS check to see if the thread is sleeping and wake it up? Or is there some other mechanism used?
Specifically, I want to build an IPC system where many threads need to pass messages. I don't need to use pipes, but I do need to know the most efficient notification method possible.
The developer can decide how they want to work with the pipe: they can sleep/poll, or they can call blocking functions and wait until the data is available.
As for the mechanism the pipe has for waking up the process (assuming the process is in a blocking read call): it is not the pipe but the OS that takes charge, as in any other OS call. It registers the operation and blocks the process/thread until the data is available; when the data is available, it completes the system call.
This is an answer for Unix. I'd lay good money on Windows being pretty similar as the solution has been around a long time and is well known to be robust. The details will vary a bit (different API calls, specifics of semantics, etc.)
It depends on whether the other end is using the pipe's file descriptor in blocking or non-blocking mode.
In blocking mode, the process is waiting in the OS kernel for the data to become available. The way in which notification happens there depends on the OS. Chances are it involves a queue of processes that are considered to be runnable, and everything's made simpler by the fact that the kernel can (largely) control what interrupts it. In a simple (single processor) implementation you could go for something as trivial as noting on write to the pipe that the other process is waiting to read from it (via some kind of “interest set”), and so marking the reader as runnable at that point (at which time it becomes up to the scheduler to decide).
In non-blocking mode, either the process is polling from time to time (yuck!) or they're using a system call like select() or poll() (there are some higher-performance variants too). That's very much like the Windows call WaitForMultipleObjects() and works just great with pipes. That in turn ends up back at that runnable process queue, the interest set, and the scheduler.
It also doesn't really matter too much whether it's blocking because the pipe is full or the pipe is empty, as the control flow is pretty much symmetric between readers and writers. (Unlike the data flow, of course.)
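For the Unix side, a minimal C sketch of the blocking poll() style mentioned above (the descriptor name pipe_rd is illustrative):

    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Wait in the kernel (no busy-polling) until the pipe's read
     * end becomes readable, then drain what is available. */
    void wait_for_pipe(int pipe_rd)
    {
        struct pollfd pfd = { .fd = pipe_rd, .events = POLLIN };

        /* -1 timeout: sleep until data (or EOF) arrives */
        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
            char buf[512];
            ssize_t n = read(pipe_rd, buf, sizeof buf);
            if (n > 0)
                printf("got %zd bytes\n", n);
        }
    }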

Multithreading: Read from / write to a pipe

I write some data to a pipe - possibly lots of data, at random intervals. How should I read the data from the pipe?
Is this ok:
in the main thread (the current process), create two more threads (2 and 3)
the second thread sometimes writes to the pipe (and flushes the pipe?)
the 3rd thread has an infinite loop which reads the pipe (and then sleeps for some time)
Is this so far correct?
Now, there are a few thing I don't understand:
do I have to lock (mutex?) the pipe on write?
IIRC, when writing to a pipe whose buffer is full, the write end will block until I read the already-written data, right? How do I check for data to read in the pipe, not too often and not too rarely, so that the second thread won't block? Is there something like select for pipes?
Is it possible to set the pipe to unbuffered mode, or do I have to flush it regularly - which one is better?
Should I create one more thread, just for flushing the pipe after a write? Because flush blocks as well when the buffer is full, right? I just don't want the 1st and 2nd threads to block...
[Edit]
Sorry, I thought the question is platform agnostic but just in case: I'm looking at this from Win32 perspective, possibly MinGW C...
I'm not answering all of your questions here because there's a lot of them, but in answer to:
do I have to lock (mutex?) the pipe on write?
The answer to this question is platform specific, but in most cases I would guess yes.
It comes down to whether the write/read operations on the pipe are atomic. If either the read or the write operation is non-atomic (most likely the write), then you will need to lock the pipe on writing and reading to prevent race conditions.
For example, let's say a write to the pipe compiles down to 2 instructions in machine code:
INSTRUCTION 1
INSTRUCTION 2
Let's say you get a thread context switch between these 2 instructions, and your reading thread attempts to read the pipe while it is in an intermediate state. This could result in a crash, or (worse) data corruption, which can often manifest itself as a crash somewhere else in the code. This will often occur as a result of a race condition, which is often non-deterministic and difficult to diagnose or reproduce.
In general, unless you can guarantee that all threads will be accessing the shared resource using an atomic instruction set, you must use mutexes or critical sections.
