Is it thread-safe to write to the same pipe from multiple threads sharing the same file descriptor in Linux? - multithreading

I have a Linux process with two threads, both sharing the same file descriptor to write data of 400 bytes to the same pipe every 100ms. I'm wondering if POSIX guarantees that this is thread-safe or if I need to add additional synchronization mechanisms to serialize the writing to the pipe from multiple threads (not processes).
I'm also aware that POSIX guarantees that writes of PIPE_BUF bytes or fewer to the same pipe from different processes are atomic. But I'm not sure whether the same guarantee applies to writes from multiple threads within the same process.
Can anyone provide some insight on this? Are there any additional synchronization mechanisms that I should use to ensure thread safety when writing to the same pipe from multiple threads using the same file descriptor in Linux?
Thank you in advance for any help or advice!

In the POSIX standard, in the general information section, we read:
2.9.1 Thread-Safety
All functions defined by this volume of POSIX.1-2008 shall be thread-safe, except that the following functions need not be thread-safe.
Neither read nor write is listed among those exceptions, so it is indeed safe to call them from multiple threads. However, this only means that the call itself is safe to issue concurrently; it says nothing about the exact behaviour of calling them in parallel, and in particular nothing about atomicity.
However, in the documentation of the write() syscall we read:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of POSIX.1-2008 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
And in the same doc we also read:
Write requests to a pipe or FIFO shall be handled in the same way as a regular file with the following exceptions:
and the atomicity guarantee (for writes of PIPE_BUF bytes or fewer) is repeated.

man 2 write (Linux man-pages 6.02) says:
According to POSIX.1-2008/SUSv4 Section XSI 2.9.7 ("Thread Interactions with Regular File Operations"):
All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links: ...
Among the APIs subsequently listed are write() and writev(2). And among the effects that should be atomic across threads (and processes) are updates of the file offset. However, before Linux 3.14, this was not the case: if two processes that share an open file description (see open(2)) perform a write() (or writev(2)) at the same time, then the I/O operations were not atomic with respect to updating the file offset, with the result that the blocks of data output by the two processes might (incorrectly) overlap. This problem was fixed in Linux 3.14.
So, it should be safe as long as you're running at least Linux 3.14 (which is almost 9 years old).
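For illustration, a minimal sketch (not the questioner's actual code; the 400-byte records and 100 ms period follow the question): two threads share the write end of one pipe and emit each record with a single write() call. Because 400 is less than PIPE_BUF, every record arrives as one unit without any extra locking. Build with -pthread.

/* Two threads write 400-byte records to the same pipe fd.  Each write()
 * is of PIPE_BUF bytes or fewer, so POSIX guarantees it is atomic and
 * records from the two threads cannot interleave. */
#include <limits.h>     /* PIPE_BUF */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define RECORD_SIZE 400

static int pipe_wr_fd;               /* write end, shared by both threads */

static void *writer(void *arg)
{
    char record[RECORD_SIZE];
    memset(record, *(const char *)arg, sizeof record);   /* tag byte per thread */

    for (int i = 0; i < 10; i++) {
        ssize_t n = write(pipe_wr_fd, record, sizeof record);
        if (n != (ssize_t)sizeof record)
            perror("write");
        usleep(100 * 1000);          /* 100 ms between records, as in the question */
    }
    return NULL;
}

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }
    pipe_wr_fd = fds[1];

    pthread_t t1, t2;
    pthread_create(&t1, NULL, writer, "A");
    pthread_create(&t2, NULL, writer, "B");

    /* Reader: every 400-byte record is either all 'A' or all 'B'.  A full
     * record per read() is assumed here, which holds because each record
     * is deposited into the pipe atomically. */
    char buf[RECORD_SIZE];
    for (int i = 0; i < 20; i++) {
        if (read(fds[0], buf, sizeof buf) != (ssize_t)sizeof buf)
            break;
        printf("record of '%c'\n", buf[0]);
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}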

Related

How do I check if a given operation (or system call) is atomic on Linux?

I want to find a reliable way (other than reading the kernel source code) to check whether a given operation (or system call) is atomic on Linux, in the sense that other processes can only see the state before or after that operation, but nothing in between. The goal is to avoid using unnecessary locks for operations the kernel already serializes for me.
So far I can only find resources like this about the topic, which are by no means authoritative or exhaustive. The Linux man pages also contain little information about this; for example, for most functions mentioned in the above link, I don't find anything about their atomicity in the man pages.
Could anyone tell me if there is a standard or official documentation which provides this information? Any help would be much appreciated.
I think the POSIX thread-safe functions are a good starting point. Thread-safe functions are functions that behave correctly when called concurrently from different threads. This is not at all the same as being atomic, but at least it gives a hint about which functions certainly are not atomic.
POSIX.1-2001 and POSIX.1-2008 require that all functions specified in the standard shall be thread-safe, except for a specific set of functions (most of which are implemented in the standard library and not in the kernel).
As an example of a function that is thread-safe but not atomic, consider fwrite(). fwrite() will write to a per-process buffer under pthread locks, so it is thread-safe. However, the buffer may be flushed in separate write() chunks, so other processes don't see it as an atomic write.
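To make the practical consequence concrete, a hedged sketch (emit_record() and its key/value format are made up for illustration): if a record must appear atomically to readers of a pipe or FIFO, assemble it in one buffer and emit it with a single write() of at most PIPE_BUF bytes, rather than with several buffered fprintf()/fwrite() calls whose flushes may be split into separate write() chunks.

/* Sketch only: building a record in one buffer and emitting it with a
 * single write() keeps it atomic for readers of a pipe or FIFO, as long
 * as the record is no larger than PIPE_BUF. */
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

void emit_record(int fd, const char *key, int value)
{
    char buf[256];                                   /* 256 <= PIPE_BUF */
    int len = snprintf(buf, sizeof buf, "%s=%d\n", key, value);
    if (len < 0 || (size_t)len >= sizeof buf)
        return;                                      /* error or truncation */

    /* Several fprintf() calls on a shared FILE* are thread-safe, but stdio
     * may flush them as separate write()s, so readers could see the record
     * interleaved with other writers' data.  One write() avoids that. */
    (void)write(fd, buf, (size_t)len);
}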

File Access (read/write) synchronization between 'n' processes in Linux

I am studying Operating Systems this semester and was wondering how Linux handles file access (read/write) synchronization. What is the default implementation: does it use semaphores, mutexes, or monitors? Can you please tell me where I would find this in the source code of my own copy of Ubuntu, and how to disable it?
I need to disable it so I can check whether my own implementation works. Also, how do I add my own implementation to the system?
Here's my current plan; please tell me if it's okay:
Disable the default implementation and add my own (recompiling the kernel if need be).
My own version would keep track of every incoming process and maintain a list of which files they are using, and whenever a file is repeated I would check whether it is a reader process or a writer process.
I will be going with a reader-preferred solution to the readers-writers problem.
The kernel doesn't impose process synchronization (that should be performed by the processes themselves; the kernel only provides tools for it), but it can guarantee atomicity for some operations: an atomic operation cannot be interrupted, and its result cannot be altered by another operation running in parallel.
Speaking of writing to a file, it has some atomicity guarantees. From man -s3 write:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
Some discussion on SO: Atomicity of write(2) to a local filesystem.
To maintain atomicity, various kernel routines hold the i_mutex mutex of an inode, e.g. in generic_file_write_iter():
mutex_lock(&inode->i_mutex);
ret = __generic_file_write_iter(iocb, from);
mutex_unlock(&inode->i_mutex);
So other write() calls won't mess with your call. Readers, however, don't take i_mutex, so they may see inconsistent data. Actual locking for readers is performed in the page cache, so a page (4096 bytes on x86) is the minimum amount of data for which the kernel guarantees atomicity.
Speaking of recompiling the kernel to test your own implementation, there are two ways of doing that: download the vanilla kernel from http://kernel.org/ (or from Git), patch it and build it -- that is the easy way. Recompiling Ubuntu kernels is harder -- it requires working with the Debian build tools: https://help.ubuntu.com/community/Kernel/Compile
I'm not clear about what you are trying to achieve with your own implementation. If you want to apply stricter synchronization rules, maybe it is time to look at TxOS?

Reducing seek times when reading many small files

I need to write some code (in any language) to process 10,000 files that reside on a local Linux filesystem. Each file is ~500KB in size, and consists of fixed-size records of 4KB each.
The processing time per record is negligible, and the records can be processed in any order, both within and across different files.
A naïve implementation would read the files one by one, in some arbitrary order. However, since my disks are very fast to read but slow to seek, this will almost certainly produce code that's bound by disk seeks.
Is there any way to code the reading up so that it's bound by disk throughput rather than seek time?
One line of inquiry is to try and get an approximate idea of where the files reside on disk, and use that to sequence the reads. However, I am not sure what API could be used to do that.
I am of course open to any other ideas.
The filesystem is ext4, but that's negotiable.
Perhaps you could do the reads by scheduling all of them in quick succession with aio_read. That would put all reads in the filesystem read queue at once, and then the filesystem implementation is free to complete the reads in a way that minimizes seeks.
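A hedged sketch of that idea with POSIX AIO (file names come from argv; FILE_SIZE and MAX_BATCH are assumed values; link with -lrt, and note that glibc implements POSIX AIO with user-space threads, so this is mainly a portable way of keeping many reads in flight at once):

/* Submit one aio_read() per file up front, then reap completions in
 * whatever order the I/O layer finishes them. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE (500 * 1024)       /* ~500 KB per file, per the question */
#define MAX_BATCH 64                 /* how many reads to keep in flight */

int main(int argc, char **argv)
{
    static struct aiocb cbs[MAX_BATCH];
    const struct aiocb *list[MAX_BATCH];
    int nreq = 0;

    /* Queue as many reads as possible before waiting for any of them. */
    for (int i = 1; i < argc && nreq < MAX_BATCH; i++) {
        int fd = open(argv[i], O_RDONLY);
        if (fd == -1) { perror(argv[i]); continue; }

        struct aiocb *cb = &cbs[nreq];
        memset(cb, 0, sizeof *cb);
        cb->aio_fildes = fd;
        cb->aio_buf    = malloc(FILE_SIZE);
        cb->aio_nbytes = FILE_SIZE;
        cb->aio_offset = 0;

        if (cb->aio_buf == NULL || aio_read(cb) == -1) {
            perror("aio_read");
            free((void *)cb->aio_buf);
            close(fd);
            continue;
        }
        list[nreq++] = cb;
    }

    /* Reap completions; the (omitted) record processing would go here. */
    int done = 0;
    while (done < nreq) {
        aio_suspend(list, nreq, NULL);           /* block until something finishes */
        for (int i = 0; i < nreq; i++) {
            if (list[i] == NULL || aio_error(list[i]) == EINPROGRESS)
                continue;
            ssize_t n = aio_return((struct aiocb *)list[i]);
            if (n < 0)
                perror("aio_return");
            else
                printf("%zd bytes from fd %d\n", n, list[i]->aio_fildes);
            close(list[i]->aio_fildes);
            free((void *)list[i]->aio_buf);
            list[i] = NULL;
            done++;
        }
    }
    return 0;
}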
A very simple approach, although with no guaranteed results: open as many of the files at once as you can and read all of them at once, either using threads or asynchronous I/O. This way the disk scheduler knows what you read and can reduce the seeks by itself. Edit: as wildplasser observes, parallel open() is probably only doable using threads, not async I/O.
The alternative is to try to do the heavy lifting yourself. Unfortunately this involves a difficult step: getting the mapping of the files to physical blocks. There is no standard interface to do that; you could probably extract the logic from something like ext2fsprogs or the kernel FS driver. And it involves reading the physical device underlying a mounted filesystem, which can be writing to it at the same time you're trying to get a consistent snapshot.
Once you get the physical blocks, just order them, reverse the mapping back to the file offsets and execute the reads in the physical block order.
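On Linux specifically there is a non-portable shortcut for the mapping step: the FIEMAP ioctl reports a file's physical extents without touching the raw device. A hedged sketch that uses the first extent's physical byte offset as a sort key (files may of course be fragmented, and not all filesystems support FIEMAP, so this is only an approximation):

/* Ask the kernel, via the Linux-specific FIEMAP ioctl, for the first
 * physical extent of a file and return its byte offset (0 on error). */
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

unsigned long long first_extent_offset(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return 0;

    /* struct fiemap is followed by fm_extent_count extent records. */
    size_t sz = sizeof(struct fiemap) + sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, sz);
    if (fm == NULL) { close(fd); return 0; }

    fm->fm_start        = 0;
    fm->fm_length       = FIEMAP_MAX_OFFSET;   /* map the whole file */
    fm->fm_extent_count = 1;                   /* only the first extent is needed */

    unsigned long long off = 0;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0 && fm->fm_mapped_extents >= 1)
        off = fm->fm_extents[0].fe_physical;

    free(fm);
    close(fd);
    return off;
}

Sorting the file names by this key and reading them in that order approximates a single sweep across the disk.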
May I recommend using an SSD for the file storage? That should reduce seek times greatly, as there is no head to move.
Since the operations are similar and the data are independent, you can try using a thread pool to submit jobs that each work on a number of files (possibly a single file). Any idle thread then picks up the next job. This might help overlap I/O operations with computation.
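A hedged sketch of that approach (file names come from argv, process_record() is an empty stub, and the pool size is an arbitrary choice): a fixed set of worker threads pull the next file index from a shared counter, so several reads are outstanding at the same time. Build with -pthread.

/* N worker threads pull file names from a shared index and read each
 * file in 4 KB records. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_WORKERS 8
#define RECORD_SIZE 4096

static char **g_files;               /* argv-style list of file names */
static int    g_nfiles;
static int    g_next;                /* next file index to hand out */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

static void process_record(const char *buf, ssize_t len) { (void)buf; (void)len; }

static void *worker(void *arg)
{
    (void)arg;
    char buf[RECORD_SIZE];
    for (;;) {
        pthread_mutex_lock(&g_lock);
        int idx = g_next < g_nfiles ? g_next++ : -1;
        pthread_mutex_unlock(&g_lock);
        if (idx == -1)
            return NULL;

        int fd = open(g_files[idx], O_RDONLY);
        if (fd == -1) { perror(g_files[idx]); continue; }
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            process_record(buf, n);
        close(fd);
    }
}

int main(int argc, char **argv)
{
    g_files  = argv + 1;
    g_nfiles = argc - 1;

    pthread_t tids[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}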
A simple way would be to keep the original program, but fork an extra process which has no other task than to prefetch the files and prime the disk buffer cache (a Unix/Linux system uses all "free" memory as disk buffer cache).
The main task will stay a few files behind (say ten). The hard part would be to keep things synchronised. A pipe seems the obvious way to accomplish this.
UPDATE:
Pseudo code for the main process:
1: fetch filename from worklist
   if empty goto 2
   (maybe) fork a worker process or thread
   add to prefetch queue
   add to internal queue
   if fewer than XXX items on internal queue goto 1
2: fetch filename from internal queue
   process it
   goto 1
For the slave processes:
fetch from queue
if empty: quit
prefetch file
loop or quit
For the queue, a message queue seems most appropriate, since it maintains message boundaries. Another way would be to have one pipe per child (in the fork() case) or to use mutexes (when using threads).
You'll need approximately seektime_per_file / processing_time_per_file worker threads/processes.
As a simplification: if seeking within the files is not required (only sequential access), the slave processes could consist of the equivalent of
dd if=name bs=500K
which could be wrapped into a popen() or a pipe+fork().
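A hedged C sketch of such a prefetch slave (it reads newline-separated file names from stdin, e.g. fed through a pipe by the main process, and uses posix_fadvise(POSIX_FADV_WILLNEED) to start readahead instead of copying the data itself):

/* Prefetch slave: read file names from stdin (one per line), open each
 * file and ask the kernel to pull it into the page cache.  The main
 * process would fork this and feed it names through a pipe. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char line[4096];

    while (fgets(line, sizeof line, stdin) != NULL) {
        line[strcspn(line, "\n")] = '\0';            /* strip trailing newline */
        if (line[0] == '\0')
            continue;

        int fd = open(line, O_RDONLY);
        if (fd == -1) { perror(line); continue; }

        /* Hint that the whole file is needed soon; the kernel starts
         * readahead and the pages stay in the disk buffer cache. */
        int err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
        if (err != 0)
            fprintf(stderr, "posix_fadvise %s: %s\n", line, strerror(err));
        close(fd);
    }
    return 0;
}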

simultaneous read on file descriptor from two threads

My question: in Linux (and in FreeBSD, and generally in UNIX), is it possible/legal to read from a single file descriptor simultaneously from two threads?
I did some searching but found nothing, although a lot of people ask similar questions about reading/writing from/to a socket fd at the same time (meaning reading while another thread is writing, not two threads reading at once). I also read some man pages but got no clear answer to my question.
Why do I ask? I tried to implement a simple program that counts lines on stdin, like wc -l. I actually was testing my home-made C++ I/O engine for overhead, and discovered that wc is 1.7 times faster. I trimmed down some C++ and came closer to wc's speed but didn't reach it. Then I experimented with the input buffer size and optimized it, but wc was still clearly a bit faster. Finally I created two threads which read the same STDIN_FILENO in parallel, and this at last was faster than wc! But the line count became incorrect... so I suppose the reads return some unexpected junk. Doesn't the kernel care which thread reads?
Edit: I did some research and discovered only that calling read directly via syscall() does not change anything. The kernel code seems to do some synchronization handling, but I didn't understand much of it (read_write.c).
That's unspecified behaviour; POSIX says:
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
About accessing a single file descriptor concurrently (i.e. from multiple threads or even processes), I'm going to cite POSIX.1-2008 (IEEE Std 1003.1-2008), Subsection 2.9.7 Thread Interactions with Regular File Operations:
2.9.7 Thread Interactions with Regular File Operations
All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links:
[…] read() […]
If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. […]
At first glance, this looks quite good. However, I hope you did not miss the restriction when they operate on regular files or symbolic links.
@jarero cites:
The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
So, implicitly, we're agreeing, I assume: it depends on the type of the file you are reading. You said you read from STDIN. Well, if your STDIN is a plain file, you can use concurrent access. Otherwise you shouldn't.
When used with a descriptor (fd), read() and write() rely on the internal state of the fd to know the "current offset" at which the read and write will occur. As a result, they aren't thread-safe.
To allow a single descriptor to be used by multiple threads simultaneously, pread() and pwrite() are provided. With those interfaces, the descriptor and the desired offset are specified, so the "current offset" in the descriptor isn't used.
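A hedged sketch of the pread() approach, loosely modelled on the wc -l experiment from the question (the two-way split and chunk size are arbitrary): each thread passes its own offset explicitly, so the shared file offset in the descriptor is never touched. This only works when the input is a regular file, not a pipe or terminal. Build with -pthread.

/* Two threads read disjoint halves of the same regular file through the
 * same descriptor, using pread() so the shared file offset is not used.
 * Line counting stands in for real processing. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK 65536

struct span { int fd; off_t start; off_t len; long lines; };

static void *count_lines(void *arg)
{
    struct span *s = arg;
    char buf[CHUNK];
    off_t pos = s->start;
    off_t end = s->start + s->len;

    while (pos < end) {
        size_t want = (size_t)(end - pos) < sizeof buf ? (size_t)(end - pos) : sizeof buf;
        ssize_t n = pread(s->fd, buf, want, pos);   /* offset passed explicitly */
        if (n <= 0)
            break;
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                s->lines++;
        pos += n;
    }
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }

    /* Split the file in two; a line crossing the split is still counted
     * exactly once, since only its terminating '\n' is counted. */
    struct span a = { fd, 0,              st.st_size / 2,               0 };
    struct span b = { fd, st.st_size / 2, st.st_size - st.st_size / 2,  0 };

    pthread_t t1, t2;
    pthread_create(&t1, NULL, count_lines, &a);
    pthread_create(&t2, NULL, count_lines, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("%ld\n", a.lines + b.lines);
    return 0;
}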

What about race condition in multithreaded reading?

According to an article on IBM.com, "a race condition is a situation in which two or more threads or processes are reading or writing some shared data, and the final result depends on the timing of how the threads are scheduled. Race conditions can lead to unpredictable results and subtle program bugs." . Although the article concerns Java, I have in general been taught the same definition.
As far as I know, simple operation of reading from RAM is composed of setting the states of specific input lines (address, read etc.) and reading the states of output lines. This is an operation that obviously cannot be executed simultaneously by two devices and has to be serialized.
Now let's suppose we have a situation when a couple of threads access an object in memory. In theory, this access should be serialized in order to prevent race conditions. But e.g. the readers/writers algorithm assumes that an arbitrary number of readers can use the shared memory at the same time.
So, the question is: does one have to implement an exclusive lock for read when using multithreading (in WinAPI e.g.)? If not, why? Where is this control implemented - OS, hardware?
Best regards,
Kuba
Reading memory at the hardware level is done sequentially - you don't need to worry about concurrency at this level. Two threads issue read instructions, and all the necessary steps - setting addresses on the address bus and the actual reads - are implemented by the memory access hardware in such a way that the reads always work correctly.
In fact the same is true for read/write scenarios, except that when read and write requests are interleaved you will get different results depending on timing, which is why you need synchronization.
As long as there's nothing changing the data, it's perfectly safe to be reading it from several threads. Even if two CPUs (or cores) race to access the memory for reading at the exact same clock cycle, their accesses will be serialized by the memory controller and they won't interfere with each other. This feature is essential for HW working correctly.
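A hedged illustration of that point (the table contents and thread count are arbitrary): several threads read the same immutable array concurrently with no lock, which is safe precisely because nothing writes to it while the readers run. Build with -pthread.

/* Four threads sum the same read-only table concurrently.  No
 * synchronization is needed because the data never changes while the
 * readers are running. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NUM_READERS 4

static int table[N];                 /* filled once, before threads start */

static void *reader(void *arg)
{
    long long *sum = arg;
    *sum = 0;
    for (int i = 0; i < N; i++)
        *sum += table[i];            /* read-only access: safe without locks */
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)      /* single-threaded initialization */
        table[i] = i % 7;

    pthread_t tids[NUM_READERS];
    long long sums[NUM_READERS];
    for (int i = 0; i < NUM_READERS; i++)
        pthread_create(&tids[i], NULL, reader, &sums[i]);
    for (int i = 0; i < NUM_READERS; i++) {
        pthread_join(tids[i], NULL);
        printf("reader %d: %lld\n", i, sums[i]);
    }
    return 0;
}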
There is no simple answer to this question. Different APIs (and different environments) have different levels of multithread awareness and multithreaded safety.
