What's the difference between `flush` and `sync_all`? - rust

I'm curious whether I should use Write::flush or File::sync_all when I finish writing a file.

TL;DR: If you want to "ensure" that the data has been written to the device, then use File::sync_all if you use a File. Note that this usually isn't necessary, though.
The Write::flush implementation for File uses the operating system dependent flush operation, for example std::sys::unix::File::flush, or std::sys::windows::File::flush. Those flush operations do... nothing. Both implementations just return Ok(()).
Why? Because the write() already uses the underlying system call for write() in both cases; the handle-based write on Windows, and the file descriptor-based write on Unix-like systems. At that point, it's out of reach of the Rust environment, save for a system call that's specific to files.
So what is Write::flush useful for? It's useful if you have any kind of buffer in front of the actual file, for example a BufWriter. If you have a File wrapped in a BufWriter, then you need to call flush to ensure that the buffered bytes get written to the file. Keep in mind that BufWriter's Drop implementation also tries(!) to write those bytes, but it may or may not succeed (and any error is ignored), so you're supposed to call Write::flush yourself (see BufWriter's documentation).
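As a minimal sketch (the file name is hypothetical), this is the case where flush actually matters:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let file = File::create("output.txt")?; // hypothetical path
    let mut writer = BufWriter::new(file);

    // These bytes may only land in BufWriter's in-memory buffer.
    writer.write_all(b"hello, world\n")?;

    // flush() pushes the buffered bytes down into the File, i.e.
    // into the OS via the underlying write() system call.
    writer.flush()?;
    Ok(())
}
```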
That being said, sync_all usually isn't necessary and will instead block your program. The operating system will handle the file system synchronisation. While you can certainly wait for that synchronisation to happen via sync_data or sync_all, you're usually better off not doing either.

Write::flush for an on-disk file is actually a no-op [source]. It's useless for File; it's only implemented for consistency. This interface is meant for streams that use an application-level in-memory buffer before writing to the destination, as stated in the docs:
Flush this output stream, ensuring that all intermediately buffered contents reach their destination.
File::sync_data is, roughly, the useful version of flush for File. Under the hood, an intermediate buffer is used at the kernel level, and sync_data delegates to the fdatasync POSIX call, which does at the kernel level what flush does at the application level.
File::sync_all does what File::sync_data does and, on top of that, it also ensures that the metadata about the file is written to disk. It delegates to fsync on POSIX systems.
Sidenote: depending on the system (e.g. macOS, Android, etc.), the implementations of File::sync_data and File::sync_all could be exactly the same.
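Putting the three calls side by side, a small sketch (the path is hypothetical; nothing beyond std is needed):

```rust
use std::fs::File;
use std::io::Write;

fn main() -> std::io::Result<()> {
    let mut file = File::create("state.bin")?; // hypothetical path
    file.write_all(b"important bytes")?;

    file.flush()?;     // no-op for File, as described above

    file.sync_data()?; // fdatasync(2): block until the contents
                       // have reached the storage device

    file.sync_all()?;  // fsync(2): like sync_data, but also syncs
                       // metadata; one of the two is usually enough
    Ok(())
}
```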

Related

Is ftruncate considered a "write operation" for the purposes of O_SYNC?

When writing to a file opened with O_SYNC, the data (and metadata) is guaranteed to be written to persistent storage when the write call returns, and no explicit fsync call is needed.
Is the same true for ftruncate? Or do I still need to call fsync after ftruncate even with O_SYNC?
Not all filesystems are capable of dealing with holes. There will be some filesystems that actually have to physically write 0's when you call ftruncate().
So logically, ftruncate() should be treated like a write() and be subject to O_SYNC.
The POSIX definition of O_SYNC says:
Write I/O operations on the file descriptor shall complete as defined by synchronized I/O file integrity completion.
And the POSIX definition for "synchronized I/O file integrity completion":
Identical to a synchronized I/O data integrity completion with the addition that all file attributes relative to the I/O operation [...] are successfully transferred prior to returning to the calling process.
And the definition for "Synchronized I/O Data Integrity Completion":
[...] The write is complete only when the data specified in the write request is successfully transferred and all file system information required to retrieve the data is successfully transferred.
That includes the file size.
But notably it only applies to "writes" (and "reads").
However, neither POSIX nor the Linux man pages define what a "write" or "write I/O" is, and in particular, whether ftruncate() counts as one.
So if you want to get lawyerly about it, it is not strictly guaranteed anywhere, although I think that's a bug in the specification.
In practice, though, I doubt any file system that implements O_SYNC and ftruncate() would require you to call fsync() after ftruncate() of a file opened with O_SYNC.
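If you do want to be defensive, here is a sketch in Rust (using the libc crate for the flag; the path is hypothetical) that truncates under O_SYNC and then syncs explicitly anyway:

```rust
use std::fs::OpenOptions;
use std::os::unix::fs::OpenOptionsExt;

fn main() -> std::io::Result<()> {
    // Open with O_SYNC so each write(2) is synchronized.
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .custom_flags(libc::O_SYNC)
        .open("data.bin")?; // hypothetical path

    // set_len() delegates to ftruncate(2). Whether O_SYNC covers it
    // is exactly the unspecified case discussed above...
    file.set_len(4096)?;

    // ...so call fsync(2) explicitly to be safe.
    file.sync_all()?;
    Ok(())
}
```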

Do I need to flush named pipes?

I cannot find whether named pipes are buffered, hence the question.
The manpage says https://linux.die.net/man/3/mkfifo:
A FIFO special file is similar to a pipe ... any process can open it for reading or writing, in the same way as an ordinary file.
Pipes are not buffered, no need to flush. But in an ordinary file, I would fflush (or fsync) the file descriptor.
How about named pipes?
Pipes are not buffered, no need to flush.
I'd actually put that the other way around: for most intents and purposes, pipes are nothing but buffer. It is not meaningful to flush them because there is no underlying device to receive the data.
Moreover, although POSIX does not explicitly forbid additional buffering of pipe I/O, it does place sufficient behavioral requirements that I don't think there's any way to determine from observation whether such buffering occurs, except possibly by whether fsync() succeeds. In other words, even if there were extra buffering, it should not be necessary to fsync() a pipe end.
But in an ordinary file, I would fflush (or fsync) the file descriptor.
Well no, you would not fflush() a file descriptor. fflush() operates on streams, represented by FILE objects, not on file descriptors. This is a crucial distinction, because most streams are buffered at the C library level, independent of the nature of the file underneath. It is this library-level buffer that fflush() interacts with. You can control the library-level buffering mode of a stream via the setvbuf() function.
On those systems that provide it, fsync() operates at a different, lower level. It instructs the OS to ensure that all data previously written to the specified file descriptor has been delivered to the underlying storage device. In other words, it flushes OS-level buffers.
Note well that you can wrap a stream around a pipe-end file descriptor via the fdopen() function. That doesn't make the pipe require flushing any more than it did before, but the stream will be buffered by default, so flushing will be relevant to it.
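The same distinction exists outside C: any userspace buffer you wrap around a pipe end needs flushing, while the pipe itself does not. A sketch in Rust (the FIFO path is hypothetical and must already exist; the open blocks until a reader opens the other end):

```rust
use std::fs::OpenOptions;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    // Open an existing FIFO for writing.
    let fifo = OpenOptions::new().write(true).open("/tmp/myfifo")?;

    // BufWriter adds an application-level buffer, like a stdio stream.
    let mut writer = BufWriter::new(fifo);
    writer.write_all(b"message\n")?;

    // Without this flush the bytes may sit in the BufWriter;
    // once they reach the pipe, no further flushing is needed.
    writer.flush()?;
    Ok(())
}
```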
Note, too, that some storage devices perform their own buffering, so that even after the data have been handed off to a storage device, it is not certain that they are immediately persistent.
How about named pipes?
The discussion above about stream I/O vs. POSIX descriptor-based I/O applies here, too. If you access a named pipe via a stream, then its interaction with fflush() will depend on the buffering of that stream.
But I suppose your question is more about OS-level buffering and flushing. POSIX does not appear to say much concrete, but since you tag [linux] and refer to a Linux manual page in your question, I offer this in response:
The only difference between pipes and FIFOs is the manner in which they are created and opened. Once these tasks have been accomplished, I/O on pipes and FIFOs has exactly the same semantics.
(Linux pipe(7) manual page.)
I don't quite understand what you are trying to ask, but as you have already been told, pipes are nothing more than a buffer.
Historically, fifos (or pipes) consumed the direct blocks of the inode used to maintain them, and they were tied to a file (whether it had a name or not) in some filesystem.
Today, I don't know the exact implementation details for a fifo, but basically, the kernel buffers all data the writers have already written but the readers haven't yet read. The fifo has a system-defined upper limit on the amount of data it can buffer (on modern Linux, 64 KB by default).
The kernel buffers, but there's no delay between writers and readers, because as soon as a writer writes to a pipe, the kernel awakens all the readers waiting for it to have data. The reverse is also true: if the pipe fills up with data, then as soon as a reader consumes some, all the waiting writers are awakened so they can fill it again.
Anyway, your question about flushing has little to do with pipes themselves (well, not quite, let me explain) and a lot to do with the <stdio.h> package. <stdio.h> does buffer, and it handles buffering for each FILE * individually, so it provides calls for flushing buffers when you want them to be write(2)n to their destination.
<stdio.h> has dynamic behaviour that optimizes buffering so programmers don't have to flush all the time. That behaviour depends on the type of file descriptor associated with a FILE * pointer.
When the FILE * pointer is associated with a serial tty, <stdio.h> detects this with the isatty(3) call (which internally makes an ioctl(2) call) to see whether it is talking to a character device. If so, <stdio.h> uses line buffering, which means that whenever a '\n' character is output, the buffer is automatically flushed.
This poses an optimization trade-off, because when, for example, you are using cat(1) to copy a file, a larger buffer normally means a more efficient copy. <stdio.h> solves this by switching to full buffering when the output is not a tty device: it only flushes the internal buffer of the FILE * pointer when the buffer is full of data.
So the question is: how does <stdio.h> behave with a fifo (or pipe) node? The answer is simple: it is not a char device (or a tty), so <stdio.h> does full buffering on it. If you are communicating data between two processes and you want the reader to receive the data as soon as you have printf(3)-ed it, then you had better fflush(3), because if you don't, you can end up waiting for a response that never comes: what you have written has not actually been written yet (not by the kernel, but by the <stdio.h> library).
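The same deadlock can be reproduced in Rust, where BufWriter plays the role of the stdio buffer. A sketch (assuming a Unix system with cat available as the peer process):

```rust
use std::io::{BufRead, BufReader, BufWriter, Write};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Spawn a child that echoes lines back over a pipe.
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    let mut to_child = BufWriter::new(child.stdin.take().unwrap());
    let mut from_child = BufReader::new(child.stdout.take().unwrap());

    to_child.write_all(b"ping\n")?;
    // Without this flush, "ping" could sit in our buffer forever and
    // the read below would wait for a response that never comes.
    to_child.flush()?;

    let mut reply = String::new();
    from_child.read_line(&mut reply)?;
    println!("got: {reply:?}");

    drop(to_child); // close the pipe so cat sees EOF and exits
    child.wait()?;
    Ok(())
}
```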
As I said, I don't know if this is exactly the answer to your question, but it should at least give you a hint about where the problem could lie.

Transactionally writing files in Node.js

I have a Node.js application that stores some configuration data in a file. If you change some settings, the configuration file is written to disk.
At the moment, I am using a simple fs.writeFile.
Now my question is: What happens when Node.js crashes while the file is being written? Is there the chance to have a corrupt file on disk? Or does Node.js guarantee that the file is written in an atomic way, so that either the old or the new version is valid?
If not, how could I implement such a guarantee? Are there any modules for this?
What happens when Node.js crashes while the file is being written? Is there the chance to have a corrupt file on disk? Or does Node.js guarantee that the file is written in an atomic way, so that either the old or the new version is valid?
Node implements only a (thin) async wrapper over the system calls, and thus it does not provide any guarantees about the atomicity of writes. In fact, fs.writeAll repeatedly calls fs.write until all the data is written. You are right that when Node.js crashes, you may end up with a corrupted file.
If not, how could I implement such a guarantee? Are there any modules for this?
The simplest solution I can come up with is the one used e.g. for FTP uploads:
Save the content to a temporary file with a different name.
When the content is written on disk, rename temporary file to destination file.
The man page says that rename guarantees to leave an instance of newpath in place (on Unix systems like Linux or OSX).
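The pattern itself is language-agnostic; here is a minimal sketch of it in Rust (the function and file names are illustrative, not from any particular library), omitting the fsync step that fully durable code would add before the rename:

```rust
use std::fs;

// Atomically replace `dest` with `contents`: write a temporary file
// in the same directory, then rename(2) it over the destination.
fn write_atomic(dest: &str, contents: &[u8]) -> std::io::Result<()> {
    let tmp = format!("{dest}.tmp");
    fs::write(&tmp, contents)?;
    // rename() atomically replaces dest; readers see either the old
    // file or the new one, never a half-written mix.
    fs::rename(&tmp, dest)?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    write_atomic("config.json", b"{\"setting\": true}")
}
```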
fs.writeFile, just like all the other methods in the fs module, is implemented as a simple wrapper around standard POSIX functions (as stated in the docs).
Digging a bit in nodejs' code, one can see that the fs.js, where all the wrappers are defined, uses fs.c for all its file system calls. More specifically, the write method is used to write the contents of the buffer. It turns out that the POSIX specification for write explicitly says that:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
So it seems it is pretty safe to write as long as the size of the buffer is smaller than PIPE_BUF. Note, however, that POSIX only guarantees this atomicity for pipes and FIFOs; for regular files it is not guaranteed. Moreover, PIPE_BUF is a system-dependent constant, so you might need to check it at runtime.
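One way to check it at runtime is fpathconf(3); a sketch in Rust using the libc crate (which also exports a compile-time PIPE_BUF constant):

```rust
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    // Query the PIPE_BUF limit that applies to FIFOs created in /tmp.
    let dir = std::fs::File::open("/tmp")?;
    let pipe_buf =
        unsafe { libc::fpathconf(dir.as_raw_fd(), libc::_PC_PIPE_BUF) };

    println!("PIPE_BUF here: {pipe_buf} bytes");
    println!("compile-time constant: {} bytes", libc::PIPE_BUF);
    Ok(())
}
```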
write-file-atomic will do what you need. It writes to a temporary file, then renames it over the destination. That's safe.

using files as IPC on linux

I have one writer which creates and sometimes updates a file with some status information. The readers are implemented in lua (so I got only io.open) and possibly bash (cat, grep, whatever). I am worried about what would happen if the status information is updated (which means a complete file rewrite) while a reader has an open handle to the file: what can happen? I have also read that if the write/read operation is below 4KB, it is atomic: that would be perfectly fine for me, as the status info fits comfortably within that size. Can I make this assumption?
A read or write is atomic under 4Kbytes only for pipes, not for disk files (for which the atomic granularity may be the file system block size, usually 512 bytes).
In practice you can avoid bothering about such issues (assuming your status file is, e.g., less than 512 bytes). I believe that if the writer opens and writes that file quickly (in particular, if you avoid open(2)-ing the file, keeping the opened file handle around for a long time (many seconds), and only write(2)-ing a small string into it later), you don't need to bother.
If you are paranoid, and you assume that readers (like grep) open the file and read it quickly, you could write to a temporary file and rename(2) it over the real one once it has been written (and close(2)-d) in its entirety.
As Duck suggested, locking the file in both readers and writers is also a solution.
I may be mistaken, in which case someone will correct me, but I don't think the external readers are going to pay any attention to whether the file is being simultaneously updated. They are going to print (or possibly hit EOF or error out on) whatever is there.
In any case, why not avoid the whole mess and just use file locks? Have the writer flock (or similar) the file and the readers check the lock. If they get the lock, they know they are OK to read.
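A sketch of the flock(2) approach on the writer side (via the libc crate; the path is hypothetical), where readers would take LOCK_SH instead of LOCK_EX:

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    // Open without truncating, so readers never observe an empty
    // file before we hold the lock.
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .open("status")?; // hypothetical path

    // Block until we hold the exclusive lock (readers use LOCK_SH).
    if unsafe { libc::flock(file.as_raw_fd(), libc::LOCK_EX) } != 0 {
        return Err(std::io::Error::last_os_error());
    }

    file.set_len(0)?; // now it is safe to rewrite the contents
    file.write_all(b"status: ok\n")?;

    // The lock is released on close; LOCK_UN would release it earlier.
    Ok(())
}
```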

Writing to a remote file: When does write() really return?

I have a client node writing a file to a hard disk that is on another node (I am writing to a parallel fs actually).
What I want to understand is:
When I write() (or pwrite()), when exactly does the write call return?
I see three possibilities:
write returns immediately after queueing the I/O operation on the client side:
In this case, write can return before data has actually left the client node. (If you are writing to a local hard drive, the write call employs delayed writes, where data is simply queued up for writing. But does this also happen when you are writing to a remote hard disk?) I wrote a test case in which I write a large matrix (1 GByte) to a file; a sketch of such a timing test appears after this list. Without fsync, it showed very high bandwidth values, whereas with fsync, the results looked more realistic. So it looks like delayed writes could be in use.
write returns after the data has been transferred to the server buffer:
Now data is on the server, but resides in a buffer in its main memory, but not yet permanently stored away on the hard drive. In this case, I/O time should be dominated by the time to transfer the data over the network.
write returns after data has been actually stored on the hard drive:
Which I am sure does not happen by default (unless you write really large files that fill up your RAM and ultimately force pages to be flushed out, and so on...).
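For reference, a sketch of the timing test mentioned in option 1 (in Rust; the path is hypothetical). A large gap between the two timings suggests write() returns before the data has left the client's cache:

```rust
use std::fs::File;
use std::io::Write;
use std::time::Instant;

fn main() -> std::io::Result<()> {
    let data = vec![0u8; 1 << 30]; // 1 GiB of zeroes

    let start = Instant::now();
    let mut file = File::create("testfile")?; // hypothetical path
    file.write_all(&data)?;
    let after_write = start.elapsed();

    file.sync_all()?; // force the data out of OS/client caches
    let after_sync = start.elapsed();

    println!("write() returned after {after_write:?}");
    println!("data durable after     {after_sync:?}");
    Ok(())
}
```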
Additionally, what I would like to be sure about is:
Can a situation occur where the program terminates without any data actually having left the client node, such that network parameters like latency, bandwidth, and the hard drive bandwidth do not feature in the program's execution time at all? Consider we do not do an fsync or something similar.
EDIT: I am using the pvfs2 parallel file system
Option 3 is of course simple and safe. However, a production-quality, POSIX-compatible parallel file system with performance good enough that anyone actually cares to use it will typically use option 1, combined with some more or less involved mechanism to avoid conflicts when, e.g., several clients cache the same file.
As the saying goes, "There are only two hard things in Computer Science: cache invalidation and naming things and off-by-one errors".
If the filesystem is supposed to be POSIX compatible, you need to go and learn POSIX fs semantics, and look up how the fs supports these while still getting good performance (alternatively, which parts of POSIX semantics it skips, a la NFS). What makes this, err, interesting is that POSIX fs semantics hark back to the 1970s, with little to no thought given to how to support network filesystems.
I don't know about pvfs2 specifically, but typically in order to conform to POSIX and provide decent performance, option 1 can be used together with some kind of cache coherency protocol (which e.g. Lustre does). For fsync(), the data must then actually be transferred to the server and committed to stable storage on the server (disks or battery-backed write cache) before fsync() returns. And of course, the client has some limit on the amount of dirty pages, after which it will block further write()'s to the file until some have been transferred to the server.
You can get any of your three options. It depends on the flags you provide to the open call. It depends on how the filesystem was mounted locally. It also depends on how the remote server is configured.
The following are all taken from Linux. Solaris and others may differ.
Some important open flags are O_SYNC, O_DIRECT, O_DSYNC, O_RSYNC.
Some important mount flags for NFS are ac, noac, cto, nocto, lookupcache, sync, async.
Some important flags for exporting NFS are sync, async, no_wdelay. And of course the mount flags of the filesystem that NFS is exporting are important as well. For example, if you were exporting XFS or EXT4 from Linux and for some reason you used the nobarrier flag, a power loss on the server side would almost certainly result in lost data.
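For completeness, here is how the open-flag side of this looks from Rust (custom_flags with the libc crate; the mount point is hypothetical):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::fs::OpenOptionsExt;

fn main() -> std::io::Result<()> {
    // With O_DSYNC, each write(2) behaves as if followed by
    // fdatasync(2): it only returns once the data has reached stable
    // storage (for NFS, once the server has committed it).
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .custom_flags(libc::O_DSYNC)
        .open("/mnt/remote/data.bin")?; // hypothetical NFS mount

    file.write_all(b"synchronously written")?;
    Ok(())
}
```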
