using files as IPC on linux - linux

I have one writer which creates and sometimes updates a file with some status information. The readers are implemented in lua (so I got only io.open) and possibly bash (cat, grep, whatever). I am worried about what would happen if the status information is updated (which means a complete file rewrite) while a reader has an open handle to the file: what can happen? I have also read that if the write/read operation is below 4KB, it is atomic: that would be perfectly fine for me, as the status info can fit well in such dimension. Can I make this assumption?

A read or write is atomic under 4Kbytes only for pipes, not for disk files (for which the atomic granularity may be the file system block size, usually 512 bytes).
In practice you could avoid bothering about such issues (assuming your status file is e.g. less than 512 bytes), and I believe that if the writer is opening and writing quickly that file (in particular, if you avoid open(2)-ing a file and keeping the opened file handle for a long time -many seconds-, then write(2)-ing later -once, a small string- inside it), you don't need to bother.
If you are paranoid, but do assume that readers are (like grep) opening a file and reading it quickly, you could write to a temporary file and rename(2)-ing it when written (and close(2)-ed) in totality.
As Duck suggested, locking the file in both readers and writers is also a solution.

I may be mistaken, in which case someone will correct me, but I don't think the external readers are going to pay any attention to whether the file is being simultaneously updated. They are are going to print (or possibly eof or error out) whatever is there.
In any case, why not avoid the whole mess and just use file locks. Have the writer flock (or similar) and the readers check the lock. If they get the lock they know they are ok to read.

Related

How to safely use mmap() for reading?

I have a need to do a lot of random-access reads in a large file so I use mmap(). This solution seems to be perfect as long as the mapped file is untouched. But this is not always the case. If the mapped file is tampered with, several problems arise:
If a change to a file reduces its length or the file becomes inaccessible then a process in order to live must handle SIGBUS signal (at least, in Linux implementation). This adds additional complications since I'm writing a library.
To make things even worse, mmap() manpage says it is unspecified if changes to the original
file are propagated to the memory. So they can very well be propagated.
This essentially means the contents of the file I work with can become white noise at any moment.
Does all of this mean that any program that maps a freely accessible file and does not handle these problems can be brought down by a DoS attack? Even while I do not expect evil hackers to go after my program, I can easily see a user modifying my mapped file, replacing it with another one or making the file inaccessible by, for example, removing a USB drive. And while I can write a signal handler (and this is a bit messy, so I am looking for a better solution) to solve the first problem,
I have no idea how to solve the second one.
The file can not be copied and can be freely moved around if it's not used by a program (just like any other media file). Linux file locks do not always work.
So, how to safely use mmap() for reading?

Linux: How to prevent a file backed memory mapping from causing access errors (SIGBUS etc.)?

I want to write a wrapper for memory mapped file io, that either fails to map a file or returns a mapping that is valid until it is unmapped. With plain mmap, problems arise, if the underlying file is truncated or deleted while being mapped, for example. According to the linux man page of mmap SIGBUS is received if memory beyond the new end of the file is accessed after a truncate. It is no option to catch this signal and handle the error this way.
My idea was to create a copy of the file and map the copy. On a cow capable file system, this would impose little overhead.
But the problem is: how do I protect the copy from being manipulated by another process? A tempfile is no real option, because in theory a malicious process could still mutate it. I know that there are file locks on Linux, but as far as I understood they're either optional or don't prevent others from deleting the file.
I'm asking for two kinds of answers: Either a way to mmap a file in a rock solid way or a mechanism to protect a tempfile fully from other processes. But maybe my whole way of approaching the problem is wrong, so feel free to suggest radical solutions ;)
You can't prevent a skilled and determined user from intentionally shooting themselves in the foot. Just take reasonable precautions so it doesn't happen accidentally.
Most programs assume the input file won't change and that's usually fine
Programs that want to process files shared with cooperative programs use file locking
Programs that want a private file will create a temp file, snapshot or otherwise -- and if they unlink it for auto-cleanup, it's also inaccessible via the fs
Programs that want to protect their data from all regular user actions will run as a dedicated system account, in which case chmod is protection enough.
Anyone with access to the same account (or root) can interfere with the program with a simple kill -BUS, chmod/truncate, or any of the fancier foot-guns like copying and patching the binary, cloning its FDs, or attaching a debugger. If that's what they want to do, it's not your place to stop them.

How to read a [nonblocking] filedescriptor of a file that is appended to (aka, like tail -f)?

Actually, I am using libev; but under the hood this is using epoll (I'm only on linux). When I add a watcher to read a file and all data has been read then I do get a call back that there is data to read, but read(2) returns 0 (EOF). At that point I have to stop the watcher or else it will continue to tell me that there is something to read. However, if I stop the watcher and then some other process appends data to that file then I'll never see it.
What is the correct way to get notified that there is additional/appended data in a file that can be read when before I already read till the end?
I'd prefer the answer in terms of libev, but lower level will do too (I can then probably translate that to how to do that with libev).
It is very common, for some reason, for people to think that making an fd nonblocking, or calling poll/select/.. has different behaviour for files compared to other types of file descriptions, but nonblocking behaviour and I/O readyness behaviour is essentially the same for all of types of file descriptions: the kernel will immediately return from read/write etc. if the outcome is known, and will signal I/O readyness when this is the case. When a socket has an EOF condition, select will signal that the socket is ready to read, and you will get 0 (for EOF). The same happens for files - if you are at the end of a file, the kernel will return immediately from read and return 0 to signal EOF.
The important difference is that files can change contents at random places, and can be extended. Pipes and sockets are not random access and cannot be appended to once closed. Thus, while the behaviour is consistent, it is often not what is wanted, namely waiting for a file to change in some way.
The conflict in many people's minds is simply that they want to be told "when there is new data", but if you think about it a bit, you will realise that simply waking you up would not be an adequate interface for this, as you have no way of knowing why you woke up, and what changed.
POSIX doesn't have an interface to do that, other than regularly polling the fd or file (and in case of random changes, regularly reading the whole file!). Some operating systems have an interface to do something similar to that (kqueue on BSDs, inotify on GNU/Linux) , but they are usually not a perfect match, either (for example, inotify cannot watch an fd for changes, it will watch a path for changes).
The closest you can get with libev is to use an ev_stat watcher. It behaves as if you would stat() a path regularly, and invoke the watcher callback whenever the stat data changes. Portably, it does just that: it regularly calls stat, but on some operating systems (currently only inotify on GNU/Linux, as kqueue doesn't have correct semantics for this) it can use other mechanisms to speed this up in some cases, although it will fall back to regular stat polling everywhere, for example for when the file is on a network file system, where inotify can't see remote changes.
To answer your question: If you have a path, you can use an ev_stat watcher to watch for stat data changes, such as size/mtime etc. changes. Doing this right can be a bit tricky (see the libev documentation, especially the part about stat time resolution: http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_stat_code_did_the_file_attri), and you have to keep in mind that this watches a path, not a file descriptor, so you might want to compare the device/inode of your file descriptor and the watched path regularly to see if you still have the correct file open.
This still doesn't tell you what part of the file has changed.
Alternatively, since you apparently only want to read appended data, you could opt to just read() the file regularly (in an ev_timer callback) and do away with all the complexity and hassles of an ev_stat watcher setup (while not forgetting to also compare the path stat data with your fd stat data to see if you still hasve the right file open, depending on whether the file your are reading might get renamed or replaced. Sometimes programs also truncate files, something you can also detect by seeing the size decrease between stat calls).
This is essentially what older tail -f implementations do, while newer ones might, for example, take hints (only) from inotify, just like ev_stat watchers do.
None of that is easy, and details depend on your knowledge of how exactly the file changes, but it's the best you can do.

Transactionally writing files in Node.js

I have a Node.js application that stores some configuration data in a file. If you change some settings, the configuration file is written to disk.
At the moment, I am using a simple fs.writeFile.
Now my question is: What happens when Node.js crashes while the file is being written? Is there the chance to have a corrupt file on disk? Or does Node.js guarantee that the file is written in an atomic way, so that either the old or the new version is valid?
If not, how could I implement such a guarantee? Are there any modules for this?
What happens when Node.js crashes while the file is being written? Is
there the chance to have a corrupt file on disk? Or does Node.js
guarantee that the file is written in an atomic way, so that either
the old or the new version is valid?
Node implements only a (thin) async wrapper over system calls, thus it does not provide any guarantees about atomicity of writes. In fact, fs.writeAll repeatedly calls fs.write until all data is written. You are right that when Node.js crashes, you may end up with a corrupted file.
If not, how could I implement such a guarantee? Are there any modules for this?
The simplest solution I can come up with is the one used e.g. for FTP uploads:
Save the content to a temporary file with a different name.
When the content is written on disk, rename temporary file to destination file.
The man page says that rename guarantees to leave an instance of newpath in place (on Unix systems like Linux or OSX).
fs.writeFile, just like all the other methods in the fs module are implemented as simple wrappers around standard POSIX functions (as stated in the docs).
Digging a bit in nodejs' code, one can see that the fs.js, where all the wrappers are defined, uses fs.c for all its file system calls. More specifically, the write method is used to write the contents of the buffer. It turns out that the POSIX specification for write explicitly says that:
Atomic/non-atomic: A write is atomic if the whole amount written in
one operation is not interleaved with data from any other process.
This is useful when there are multiple writers sending data to a
single reader. Applications need to know how large a write request can
be expected to be performed atomically. This maximum is called
{PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether
write requests for more than {PIPE_BUF} bytes are atomic, but requires
that writes of {PIPE_BUF} or fewer bytes shall be atomic.
So it seems it is pretty safe to write, as long as the size of the buffer is smaller than PIPE_BUF. This is a constant that is system-dependent though, so you might need to check it somewhere else.
write-file-atomic will do what you need. It writes to temporary file, then rename. That's safe.

Can file size be used to detect a partial append?

I'm thinking about ways for my application to detect a partially-written record after a program or OS crash. Since records are only ever appended to a file (never overwritten), is a crash while writing guaranteed to yield a file size that is shorter than it should be? Is this guaranteed even if the file was opened in read-write mode instead of append mode, so long as writes are always at the end of the file? This would greatly simplify crash recovery, since comparing the last record's expected size and position with the actual file size would be enough to detect a partial write.
I understand that random-access writes can be reordered by the filesystem, but I'm having trouble finding information on whether this can happen when appending. I imagine an out-of-order append would require the filesystem to create a "hole" at the tail of the (sparse) file, write blocks beyond the hole, and then fill in the blocks in between, but I'm hoping that such an approach would be so inefficient that nobody would ever implement their filesystem that way.
I suppose another problem might be a filesystem updating the directory entry's file size field before appending the new blocks to to the file, and the OS crashing in between. Does this ever happen in practice? (ext4, perhaps?) Is there a quick way to detect it? (And what happens when trying to read the unwritten blocks that should exist according to the file's size?)
Is there anything else, such as write reordering performed by a disk/flash drive, that would get in the way of using file size as a way to detect a partial append? I don't expect to be able to compensate for this sort of drive trickery in my application, but it would be good to know about.
If you want to be SURE that you're never going to lose records, you need a consistent journaling or transactional system for your files.
There is absolutely no guarantee that a write will have been fulfilled unless you either set O_DIRECT [which you probably do not want to do], or you use markers to indicate aht "this has been fully committed", that are only written when the file is closed. You can either do that in the mainfile, or, for example, have a file that records, externally, "last written record". If you open & close that file, it should be safe as long as the APP is what is crashing - if the OS crashes [or is otherwise abruptly stopped - e.g. power cut, disk unplugged, etc], all bets are off.
Write reordering and write caching is/can be done at all levels - the C library, the OS, the filesystem module and the hard disk/controller itself are all ABLE to reorder writes.

Resources