Transactionally writing files in Node.js - node.js

I have a Node.js application that stores some configuration data in a file. If you change some settings, the configuration file is written to disk.
At the moment, I am using a simple fs.writeFile.
Now my question is: What happens when Node.js crashes while the file is being written? Is there the chance to have a corrupt file on disk? Or does Node.js guarantee that the file is written in an atomic way, so that either the old or the new version is valid?
If not, how could I implement such a guarantee? Are there any modules for this?

What happens when Node.js crashes while the file is being written? Is
there the chance to have a corrupt file on disk? Or does Node.js
guarantee that the file is written in an atomic way, so that either
the old or the new version is valid?
Node implements only a (thin) async wrapper over system calls, thus it does not provide any guarantees about atomicity of writes. In fact, fs.writeAll repeatedly calls fs.write until all data is written. You are right that when Node.js crashes, you may end up with a corrupted file.
If not, how could I implement such a guarantee? Are there any modules for this?
The simplest solution I can come up with is the one used e.g. for FTP uploads:
Save the content to a temporary file with a different name.
When the content is written on disk, rename temporary file to destination file.
The man page says that rename guarantees to leave an instance of newpath in place (on Unix systems like Linux or OSX).

fs.writeFile, just like all the other methods in the fs module are implemented as simple wrappers around standard POSIX functions (as stated in the docs).
Digging a bit in nodejs' code, one can see that the fs.js, where all the wrappers are defined, uses fs.c for all its file system calls. More specifically, the write method is used to write the contents of the buffer. It turns out that the POSIX specification for write explicitly says that:
Atomic/non-atomic: A write is atomic if the whole amount written in
one operation is not interleaved with data from any other process.
This is useful when there are multiple writers sending data to a
single reader. Applications need to know how large a write request can
be expected to be performed atomically. This maximum is called
{PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether
write requests for more than {PIPE_BUF} bytes are atomic, but requires
that writes of {PIPE_BUF} or fewer bytes shall be atomic.
So it seems it is pretty safe to write, as long as the size of the buffer is smaller than PIPE_BUF. This is a constant that is system-dependent though, so you might need to check it somewhere else.

write-file-atomic will do what you need. It writes to temporary file, then rename. That's safe.

Related

What's the difference between `flush` and `sync_all`?

I'm curious whether I should use Write::flush or File::sync_all when I finish writing a file.
TL;DR: If you want to "ensure" that the data has been written to the device, then use File::sync_all if you use a File. Note that this isn't necessary though.
The Write::flush implementation for File uses the operating system dependent flush operation, for example std::sys::unix::File::flush, or std::sys::windows::File::flush. Those flush operations do... nothing. Both implementations just return Ok(()).
Why? Because the write() already uses the underlying system call for write() in both cases; the handle-based write on Windows, and the file descriptor-based write on Unix-like systems. At that point, it's out of reach of the Rust environment, save for a system call that's specific to files.
So what is Write::flush useful for? It's useful if you have any kind of buffer before the actual file, for example a BufWriter. If you have a File wrapped by a BufWriter, then you need to use flush to ensure that the bytes get written to the file. While it's useful to keep in mind that BufWriter's Drop implementation also tries(!) to write those bytes, it may or may not work, so you're supposed to call Write::flush there (see BufWriter's documentation).
That being said, sync_all isn't necessary and instead will block your program. The operating system will handle the file system synchronisation. While you can certainly wait for that synchronisation to happen via sync_data or sync_all, you're usually better of with not doing either.
Write::flush for on-disk file is actually a no-op [source]. It's useless for File, just impl for consistency. This interface is meant for stream that utilizes app-level in-memory buffer before writing into destination, as stated in the doc:
Flush this output stream, ensuring that all intermediately buffered contents reach their destination.
File::sync_data is the kinda like the useful version of flush for File. Under the hood, intermediate buffer is used on kernel-level, and sync_data delegates to fdatasync POSIX call, which does what flush does on app-level, .
File::sync_all does what File::sync_data does, and on top of that, it also ensure metadata about a file is written to disk. It delegates to fsync on POSIX system.
Sidenote: depending on system (e.g. macOS, android, etc.), implementation for File::sync_data and File::sync_all could be exactly the same.

How to update part of a file atomically?

I have a big file (several gigabytes), and I want to update a small section in it (overwrite some bytes with a new value). This must be done atomically (either the operation succeeds, or the file is left unchanged). How can I do that?
The purpose is to store progress information in a file that takes a lot of time to generate/upload (it can be on a remote file system). There will probably be times where I need to write at different locations in the file (and commit all changes at once), but if needed I can rewrite the whole index, which is a contiguous block and relatively small compared to the rest of the file. There is only one process and thread writing to the file at any given time.
Normal disks are not transactional, and don't provide atomicity guarantees.
If the underlying file system doesn't provide atomic writes (and most of them don't), then you'll need to create atomicity in your own application/data structure. This could be done via journaling (as many file systems and databases do), copy-on-write techniques, etc.
In Windows, the Transactional File System (TxF) feature does exactly what you need - but your application will need to explicitly use the Win32 transactional file I/O APIs to do that.
I think simple lockfile should be enough...
For example proper-lockfile:
const lockfile = require('proper-lockfile');
lockfile.lock('some/file')
.then(() => doStuff())
.finally(() => lockfile.unlock('some/file'));
Note that any logic working with some/file has to respect the lockfile.

using files as IPC on linux

I have one writer which creates and sometimes updates a file with some status information. The readers are implemented in lua (so I got only io.open) and possibly bash (cat, grep, whatever). I am worried about what would happen if the status information is updated (which means a complete file rewrite) while a reader has an open handle to the file: what can happen? I have also read that if the write/read operation is below 4KB, it is atomic: that would be perfectly fine for me, as the status info can fit well in such dimension. Can I make this assumption?
A read or write is atomic under 4Kbytes only for pipes, not for disk files (for which the atomic granularity may be the file system block size, usually 512 bytes).
In practice you could avoid bothering about such issues (assuming your status file is e.g. less than 512 bytes), and I believe that if the writer is opening and writing quickly that file (in particular, if you avoid open(2)-ing a file and keeping the opened file handle for a long time -many seconds-, then write(2)-ing later -once, a small string- inside it), you don't need to bother.
If you are paranoid, but do assume that readers are (like grep) opening a file and reading it quickly, you could write to a temporary file and rename(2)-ing it when written (and close(2)-ed) in totality.
As Duck suggested, locking the file in both readers and writers is also a solution.
I may be mistaken, in which case someone will correct me, but I don't think the external readers are going to pay any attention to whether the file is being simultaneously updated. They are are going to print (or possibly eof or error out) whatever is there.
In any case, why not avoid the whole mess and just use file locks. Have the writer flock (or similar) and the readers check the lock. If they get the lock they know they are ok to read.

store some data in the struct inode

Hello I am a newbie to kernel programming. I am writing a small kernel module
that is based on wrapfs template to implement a backup mechanism. This is
purely for learning basis.
I am extending wrapfs so that when a write call is made wrapfs transparently
makes a copy of that file in a separate directory and then write is performed
on the file. But I don't want that I create a copy for every write call.
A naive approach could be I check for existence of file in that directory. But
I think for each call checking this could be a severe penalty.
I could also check for first write call and then store a value for that
specific file using private_data attribute. But that would not be stored on
disk. So I would need to check that again.
I was also thinking of making use of modification time. I could save a
modification time. If the older modification time is before that time then only
a copy is created otherwise I won't do anything. I tried to use inode.i_mtime
for this but it was the modified time even before write was called, also
applications can modify that time.
So I was thinking of storing some value in inode on disk that indicates its
backup has been created or not. Is that possible? Any other suggestions or
approaches are welcome.
You are essentially saying you want to do a Copy-On-Write virtual filesystem layer.
IMO, some of these have been done, and it would be easier to implement these in userland (using libfuse and the fuse module, e.g.). That way, you can be king of your castle and add your metadata in any which way you feel is appriate:
just add (hidden) metadata files to each directory
use extended POSIX attributes (setfattr and friends)
heck, you could even use a sqlite database
If you really insist on doing these things in-kernel, you'll have a lot more work since accessing the metadata from kernel mode is goind to take a lot more effort (you'd most likely want to emulate your own database using memory mapped files so as to minimize the amount of 'userland (style)' work required and to make it relatively easy to get atomicity and reliability right1.
1
On How Everybody Gets File IO Wrong: see also here
You can use atime instead of mtime. In that case setting S_NOATIME flag on the inode prevents it from updating (see touch_atime() function at the inode.c). The only thing you'll need is to mount your filesystem with noatime option.

Writing to a remote file: When does write() really return?

I have a client node writing a file to a hard disk that is on another node (I am writing to a parallel fs actually).
What I want to understand is:
When I write() (or pwrite()), when exactly does the write call return?
I see three possibilities:
write returns immediately after queueing the I/O operation on the client side:
In this case, write can return before data has actually left the client node (If you are writing to a local hard drive, then the write call employs delayed writes, where data is simply queued up for writing. But does this also happen when you are writing to a remote hard disk?). I wrote a testcase in which I write a large matrix (1GByte) to file. Without fsync, it showed very high bandwidth values, whereas with fsync, results looked more realistic. So looks like it could be using delayed writes.
write returns after the data has been transferred to the server buffer:
Now data is on the server, but resides in a buffer in its main memory, but not yet permanently stored away on the hard drive. In this case, I/O time should be dominated by the time to transfer the data over the network.
write returns after data has been actually stored on the hard drive:
Which I am sure does not happen by default (unless you write really large files which causes your RAM to get filled and ultimately get flushed out and so on...).
Additionally, what I would like to be sure about is:
Can a situation occur where the program terminates without any data actually having left the client node, such that network parameters like latency, bandwidth, and the hard drive bandwidth do not feature in the program's execution time at all? Consider we do not do an fsync or something similar.
EDIT: I am using the pvfs2 parallel file system
Option 3. is of course simple, and safe. However, a production quality POSIX compatible parallel file system with performance good enough that anyone actually cares to use it, will typically use option 1 combined with some more or less involved mechanism to avoid conflicts when e.g. several clients cache the same file.
As the saying goes, "There are only two hard things in Computer Science: cache invalidation and naming things and off-by-one errors".
If the filesystem is supposed to be POSIX compatible, you need to go and learn POSIX fs semantics, and look up how the fs supports these while getting good performance (alternatively, which parts of POSIX semantics it skips, a la NFS). What makes this, err, interesting is that the POSIX fs semantics harks back to the 1970's with little to no though of how to support network filesystems.
I don't know about pvfs2 specifically, but typically in order to conform to POSIX and provide decent performance, option 1 can be used together with some kind of cache coherency protocol (which e.g. Lustre does). For fsync(), the data must then actually be transferred to the server and committed to stable storage on the server (disks or battery-backed write cache) before fsync() returns. And of course, the client has some limit on the amount of dirty pages, after which it will block further write()'s to the file until some have been transferred to the server.
You can get any of your three options. It depends on the flags you provide to the open call. It depends on how the filesystem was mounted locally. It also depends on how the remote server is configured.
The following are all taken from Linux. Solaris and others may differ.
Some important open flags are O_SYNC, O_DIRECT, O_DSYNC, O_RSYNC.
Some important mount flags for NFS are ac, noac, cto, nocto, lookupcache, sync, async.
Some important flags for exporting NFS are sync, async, no_wdelay. And of course the mount flags of the filesystem that NFS is exporting are important as well. For example, if you were exporting XFS or EXT4 from Linux and for some reason you used the nobarrier flag, a power loss on the server side would almost certainly result in lost data.

Resources