Do I need to flush named pipes? - linux

I cannot find whether named pipes are buffered, hence the question.
The mkfifo(3) manpage (https://linux.die.net/man/3/mkfifo) says:
A FIFO special file is similar to a pipe ... any process can open it for reading or writing, in the same way as an ordinary file.
Pipes are not buffered, no need to flush. But in an ordinary file, I would fflush (or fsync) the file descriptor.
How about a named pipe?

Pipes are not buffered, no need to flush.
I'd actually put that the other way around: for most intents and purposes, pipes are nothing but buffer. It is not meaningful to flush them because there is no underlying device to receive the data.
Moreover, although POSIX does not explicitly forbid additional buffering of pipe I/O, it does place sufficient behavioral requirements that I don't think there's any way to determine from observation whether such buffering occurs, except possibly by whether fsync() succeeds. In other words, even if there were extra buffering, it should not be necessary to fsync() a pipe end.
But in an ordinary file, I would fflush (or fsync) the file descriptor.
Well no, you would not fflush() a file descriptor. fflush() operates on streams, represented by FILE objects, not on file descriptors. This is a crucial distinction, because most streams are buffered at the C library level, independent of the nature of the file underneath. It is this library-level buffer that fflush() interacts with. You can control the library-level buffering mode of a stream via the setvbuf() function.
On those systems that provide it, fsync() operates at a different, lower level. It instructs the OS to ensure that all data previously written to the specified file descriptor has been delivered to the underlying storage device. In other words, it flushes OS-level buffers.
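As a minimal sketch of that distinction on a regular file (the file name here is just an example):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        FILE *fp = fopen("example.txt", "w");   /* hypothetical file name */
        if (fp == NULL)
            return 1;

        fputs("hello\n", fp);   /* data sits in the stdio (library-level) buffer */
        fflush(fp);             /* push the stdio buffer down to the kernel */
        fsync(fileno(fp));      /* ask the kernel to push its buffers to the storage device */

        fclose(fp);
        return 0;
    }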
Note well that you can wrap a stream around a pipe-end file descriptor via the fdopen() function. That doesn't make the pipe require flushing any more than it did before, but the stream will be buffered by default, so flushing will be relevant to it.
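A hedged sketch of that situation: the fflush() below flushes the stream's buffer into the pipe; the pipe itself still needs no flushing.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) == -1)
            return 1;

        FILE *wr = fdopen(fds[1], "w");   /* stream wrapped around the write end */
        if (wr == NULL)
            return 1;

        fprintf(wr, "message\n");   /* may sit in the stream's buffer... */
        fflush(wr);                 /* ...until flushed into the pipe itself */

        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof buf);   /* the reader then sees it immediately */
        (void)n;

        fclose(wr);
        close(fds[0]);
        return 0;
    }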
Note, too, that some storage devices perform their own buffering, so that even after the data have been handed off to a storage device, it is not certain that they are immediately persistent.
How about a named pipe?
The discussion above about stream I/O vs. POSIX descriptor-based I/O applies here, too. If you access a named pipe via a stream, then its interaction with fflush() will depend on the buffering of that stream.
But I suppose your question is more about OS-level buffering and flushing. POSIX does not appear to say anything very concrete, but since you tagged [linux] and refer to a Linux manual page in your question, I offer this in response:
The only difference between pipes and FIFOs is the manner in which
they are created and opened. Once these tasks have been accomplished,
I/O on pipes and FIFOs has exactly the same semantics.
(Linux pipe(7) manual page.)

I don't quite understand what you are trying to ask, but as you have already been told, pipes are nothing more than a buffer.
Historically, FIFOs (and pipes) consumed the direct blocks of the inode used to maintain them, and they were tied to a file (named or not) in some filesystem.
Today I don't know the exact implementation details for a FIFO, but basically the kernel buffers all the data that writers have already written but readers haven't read yet. The FIFO has a system-defined upper limit on the amount of data it can buffer; on Linux the default pipe capacity is 64 KiB.
The kernel buffers, but there's no delay between writers and readers, because as soon as a writer writes to a pipe, the kernel wakes up all the readers waiting for data. The reverse is also true: if the pipe fills up, then as soon as a reader consumes some data, all the waiting writers are woken up so they can fill it again.
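On Linux, the capacity mentioned above can be queried (and changed) per pipe with fcntl(2); a small sketch, assuming a kernel recent enough to support F_GETPIPE_SZ:

    #define _GNU_SOURCE             /* F_GETPIPE_SZ is Linux-specific */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) == -1)
            return 1;

        int cap = fcntl(fds[0], F_GETPIPE_SZ);   /* 65536 bytes by default on modern kernels */
        printf("pipe capacity: %d bytes\n", cap);

        close(fds[0]);
        close(fds[1]);
        return 0;
    }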
Anyway, your question about flushing has less to do with pipes themselves (well, not quite; let me explain) than with the <stdio.h> package. <stdio.h> does buffer, and it handles buffering for each FILE * individually, so there are calls for flushing a buffer when you want its contents to be write(2)n out.
<stdio.h> has a dynamic behaviour that optimizes buffering so that programmers are not forced to flush all the time; which mode it picks depends on the type of file descriptor associated with a FILE * pointer.
When the FILE * pointer is associated with a tty (the library checks this with isatty(3), which internally makes an ioctl(2) call that lets <stdio.h> see whether it is talking to a terminal character device), <stdio.h> uses line buffering, which means that whenever a '\n' character is output, the buffer is automatically flushed.
This poses an optimization problem, because when you are, for example, using cat(1) to copy a file, the larger the buffer, the more efficient the copy normally is. <stdio.h> solves this too: when the output is not a tty device, it uses full buffering and only flushes the internal buffer of the FILE * pointer when it fills up.
So the question is: how does <stdio.h> behave with a FIFO (or pipe) node? The answer is simple: it is not a character device (or a tty), so <stdio.h> does full buffering on it. If you are passing data between two processes and you want the reader to receive it as soon as you have printf(3)ed it, you had better fflush(3), because if you don't, you can end up waiting for a response that never comes: what you have written has not actually been written yet (not by the kernel, but by the <stdio.h> library).
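A minimal sketch of the writer side (the FIFO path is just an example); without the fflush(), or a setvbuf() call selecting line buffering, the reader may not see the data for a long time:

    #include <stdio.h>

    int main(void)
    {
        /* assumes the FIFO was created beforehand, e.g. with mkfifo(1) */
        FILE *out = fopen("/tmp/myfifo", "w");   /* hypothetical path; blocks until a reader opens it */
        if (out == NULL)
            return 1;

        /* a FIFO is not a tty, so stdio full-buffers it by default */
        fprintf(out, "request\n");   /* still only in the stdio buffer */
        fflush(out);                 /* now the data actually enters the FIFO */

        fclose(out);
        return 0;
    }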
As I said, I don't know whether this is exactly the answer to your question, but it should at least give you a hint as to where the problem could be.

Related

Is ftruncate considered a "write operation" for the purposes of O_SYNC?

When writing to a file opened with O_SYNC, the data (and metadata) is guaranteed to be written to persistent storage when the write call returns, and no explicit fsync call is needed.
Is the same true for ftruncate? Or do I still need to call fsync after ftruncate even with O_SYNC?
Not all filesystems are capable of dealing with holes; some actually have to physically write zeroes when ftruncate() extends a file.
So logically, ftruncate() should be treated like a write() and be subject to O_SYNC.
The POSIX definition of O_SYNC says:
Write I/O operations on the file descriptor shall complete as defined by synchronized I/O file integrity completion.
And the POSIX definition for "synchronized I/O file integrity completion":
Identical to a synchronized I/O data integrity completion with the addition that all file attributes relative to the I/O operation [...] are successfully transferred prior to returning to the calling process.
And the definition for "Synchronized I/O Data Integrity Completion":
[...] The write is complete only when the data specified in the write request is successfully transferred and all file system information required to retrieve the data is successfully transferred.
That includes the file size.
But notably it only applies to "writes" (and "reads").
However, neither POSIX nor the Linux man pages define what a "write" or "write I/O" is, and in particular, whether ftruncate() counts as one.
So if you want to get lawyery about it, it is not strictly guaranteed anywhere, although I think it's a bug in the specification.
In practice, though, I doubt any file system that implements O_SYNC and ftruncate() would require you to call fsync() after ftruncate() of a file opened with O_SYNC.
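If you want to be defensive about that gap in the specification, a hedged sketch is to follow ftruncate() with an explicit fsync() even on an O_SYNC descriptor (truncate_synced is a hypothetical helper):

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    int truncate_synced(const char *path, off_t length)
    {
        int fd = open(path, O_WRONLY | O_SYNC);
        if (fd == -1)
            return -1;

        if (ftruncate(fd, length) == -1 ||
            fsync(fd) == -1) {       /* belt and braces: not clearly covered by O_SYNC */
            close(fd);
            return -1;
        }
        return close(fd);
    }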

What's the difference between `flush` and `sync_all`?

I'm curious whether I should use Write::flush or File::sync_all when I finish writing a file.
TL;DR: If you want to "ensure" that the data has been written to the device, then use File::sync_all if you are working with a File. Note that this isn't necessary, though.
The Write::flush implementation for File uses the operating system dependent flush operation, for example std::sys::unix::File::flush, or std::sys::windows::File::flush. Those flush operations do... nothing. Both implementations just return Ok(()).
Why? Because write() already uses the underlying system call in both cases: the handle-based write on Windows and the file-descriptor-based write on Unix-like systems. At that point it's out of reach of the Rust environment, save for a system call that's specific to files.
So what is Write::flush useful for? It's useful if you have any kind of buffer before the actual file, for example a BufWriter. If you have a File wrapped by a BufWriter, then you need to use flush to ensure that the bytes get written to the file. While it's useful to keep in mind that BufWriter's Drop implementation also tries(!) to write those bytes, it may or may not work, so you're supposed to call Write::flush there (see BufWriter's documentation).
That being said, sync_all isn't necessary and will instead just block your program. The operating system will handle the file system synchronisation. While you can certainly wait for that synchronisation to happen via sync_data or sync_all, you're usually better off not doing either.
Write::flush for an on-disk File is actually a no-op [source]. It's useless for File and only implemented for consistency. This interface is meant for streams that use an application-level in-memory buffer before writing to the destination, as stated in the docs:
Flush this output stream, ensuring that all intermediately buffered contents reach their destination.
File::sync_data is kind of like the useful version of flush for File. Under the hood, an intermediate buffer is used at the kernel level, and sync_data delegates to the fdatasync POSIX call, which does at the kernel level what flush does at the application level.
File::sync_all does what File::sync_data does, and on top of that it also ensures that metadata about the file is written to disk. It delegates to fsync on POSIX systems.
Sidenote: depending on the system (e.g. macOS, Android, etc.), the implementations of File::sync_data and File::sync_all could be exactly the same.
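For reference, the POSIX calls that sync_data and sync_all delegate to look roughly like this when used directly from C (the file name is just an example):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* hypothetical file */
        if (fd == -1)
            return 1;

        write(fd, "payload", 7);

        fdatasync(fd);   /* roughly File::sync_data: flush the data (and metadata needed to read it back) */
        fsync(fd);       /* roughly File::sync_all: additionally flush all file metadata */

        close(fd);
        return 0;
    }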

How to read a [nonblocking] filedescriptor of a file that is appended to (aka, like tail -f)?

Actually, I am using libev, but under the hood this is using epoll (I'm only on Linux). When I add a watcher to read a file and all data has been read, I do get a callback that there is data to read, but read(2) returns 0 (EOF). At that point I have to stop the watcher or else it will continue to tell me that there is something to read. However, if I stop the watcher and then some other process appends data to that file, I'll never see it.
What is the correct way to get notified that there is additional/appended data in a file that can be read when before I already read till the end?
I'd prefer the answer in terms of libev, but lower level will do too (I can then probably translate that to how to do that with libev).
It is very common, for some reason, for people to think that making an fd nonblocking, or calling poll/select/..., has different behaviour for files compared to other types of file descriptions, but nonblocking behaviour and I/O readiness behaviour are essentially the same for all types of file descriptions: the kernel will return immediately from read/write etc. if the outcome is known, and will signal I/O readiness when that is the case. When a socket has an EOF condition, select will signal that the socket is ready to read, and you will get 0 (for EOF). The same happens for files: if you are at the end of a file, the kernel will return immediately from read and return 0 to signal EOF.
The important difference is that files can change contents at random places, and can be extended. Pipes and sockets are not random access and cannot be appended to once closed. Thus, while the behaviour is consistent, it is often not what is wanted, namely waiting for a file to change in some way.
The conflict in many people's minds is simply that they want to be told "when there is new data", but if you think about it a bit, you will realise that simply waking you up would not be an adequate interface for this, as you have no way of knowing why you woke up, and what changed.
POSIX doesn't have an interface to do that, other than regularly polling the fd or file (and in the case of random changes, regularly reading the whole file!). Some operating systems have interfaces that do something similar (kqueue on BSDs, inotify on GNU/Linux), but they are usually not a perfect match either (for example, inotify cannot watch an fd for changes; it watches a path for changes).
The closest you can get with libev is to use an ev_stat watcher. It behaves as if you would stat() a path regularly and invoke the watcher callback whenever the stat data changes. Portably, it does just that: it regularly calls stat, but on some operating systems (currently only inotify on GNU/Linux, as kqueue doesn't have correct semantics for this) it can use other mechanisms to speed this up in some cases, although it will fall back to regular stat polling everywhere, for example when the file is on a network file system, where inotify can't see remote changes.
To answer your question: If you have a path, you can use an ev_stat watcher to watch for stat data changes, such as size/mtime etc. changes. Doing this right can be a bit tricky (see the libev documentation, especially the part about stat time resolution: http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_stat_code_did_the_file_attri), and you have to keep in mind that this watches a path, not a file descriptor, so you might want to compare the device/inode of your file descriptor and the watched path regularly to see if you still have the correct file open.
This still doesn't tell you what part of the file has changed.
Alternatively, since you apparently only want to read appended data, you could opt to just read() the file regularly (in an ev_timer callback) and do away with all the complexity and hassle of an ev_stat watcher setup (while not forgetting to also compare the path's stat data with your fd's stat data to see whether you still have the right file open, depending on whether the file you are reading might get renamed or replaced; sometimes programs also truncate files, something you can detect by seeing the size decrease between stat calls).
This is essentially what older tail -f implementations do, while newer ones might, for example, take hints (only) from inotify, just like ev_stat watchers do.
None of that is easy, and details depend on your knowledge of how exactly the file changes, but it's the best you can do.
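A minimal sketch of that "just read() regularly" approach (the log path is hypothetical, the polling interval is arbitrary, and in libev the sleep would be an ev_timer callback instead):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/var/log/example.log", O_RDONLY);   /* hypothetical path */
        if (fd == -1)
            return 1;

        lseek(fd, 0, SEEK_END);          /* start at the current end, like tail -f */

        char buf[4096];
        for (;;) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0) {
                fwrite(buf, 1, (size_t)n, stdout);
                fflush(stdout);
            } else if (n == 0) {
                sleep(1);                /* EOF for now: wait and try again */
            } else {
                break;                   /* real code should also stat() the path for rotation/truncation */
            }
        }
        close(fd);
        return 0;
    }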

How exactly stdin, stdout, stderr are implemented in linux?

How exactly stdin, stderr, stdout are implemented in LINUX?
They are certainly not physical files. They must be some sort of temporary storage arrangement made by the OS in RAM for every process.
Are these array data structures attached to every process separately?
stdin, stderr, and stdout are file descriptors (or FILE* wrappers around them, if you mean the C stdio objects bearing those names). File descriptors are numbers that index a per-process data structure in the kernel. That data structure records which I/O channels a process has open, I/O channel being my ad-hoc term for a file, device, socket, or pipe.
By convention, the first entry in the table has index 0 and is called the standard input, 1 is the standard output and 2 is the standard error channel. This is just a convention in Unix programs; as far as the kernel is concerned, nothing's special about these numbers.
Each I/O system call (read, write, etc.) takes a file descriptor that indicates which channel the call should operate on.
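A minimal sketch using those descriptor numbers directly (STDOUT_FILENO and STDERR_FILENO are simply the constants 1 and 2):

    #include <unistd.h>

    int main(void)
    {
        /* fd 0 is standard input, 1 is standard output, 2 is standard error */
        write(STDOUT_FILENO, "normal output\n", 14);
        write(STDERR_FILENO, "error output\n", 13);
        return 0;
    }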

Does each Unix file description have its own read/write buffers?

In reference to this question about read() and write(), I'm wondering if each open file description has its own read and write buffers or if perhaps there's a single read and write buffer for a file when it has been opened multiple times at once. I'm curious because this would have an effect on what exactly happens with overlapping writes to the same file. Perhaps this is something that varies among Unixes?
(To my understanding, "file description" refers to the info/options about an open file, such as the current marker position. "File descriptor", in contrast, refers to just the number used in a process to refer to a description.)
This depends a bit on whether you are talking about sockets or actual files.
Strictly speaking, a descriptor never has its own buffers; it's just a handle to a deeper abstraction.
File system objects have their "own" buffers, at least when they are required. That is, if a program writes less than the file system block size, the kernel has no choice but to read a FS block and merge the write with the existing data.
This buffer is attached to the vnode and at a lower level, possibly an inode. It's owned by the file and not the descriptor. It may be kept around for a long time if memory is available.
In the case of a socket, the stream, but not specifically a single descriptor, does actually have buffers that it owns.
If the file were opened in blocking mode then yes, there should only be one buffer. I would bet the default is non-blocking for performance reasons.
