POSIX replacement for inotify? - linux

I need to tail an input file for commands, ignoring EOF. I've been using inotify(2) to block until changes have been made to the file after reaching EOF, which works fine. However, inotify(2) is a Linux-specific syscall. Are there any alternatives defined in POSIX?

Are there any alternatives defined in POSIX?
No.
Well, it's easy to prove that something exists - it's there. It's harder to prove something is not there.
It's not there. There is no POSIX interface with similar functionality as inotify or kqueue.
If you want to be portable, handle each system separately. Don't reinvent the wheel - libuv and libevent exist.

Related

Is there a simple way to fork a file descriptor?

I've just read a handful of man pages: dup, dup2, fcntl, pread/pwrite, mmap, etc.
Currently I am using mmap, but it's not the nicest thing in the world because I have to manage file offset and buffer length myself and basically reimplement read/write in userspace.
From what I gathered:
dup, dup2, fcntl just create aliases for the fds, so their offsets and flags are shared - reading from one advances the offset of the others.
pread/pwrite can be buggy and give inconsistent results.
mmap is buggy on linux when given some uncommon flags, but I don't need them.
Am I missing something or is mmap really the way to go?
(Note that re-open()ing a file is dangerous on POSIX - unlike Windows, POSIX provides no guarantees on the path not being moved/deleted while the file is open. On POSIX, you can open a path, move the file, and still read from it. You can even delete the file sometimes. I also couldn't find anything that can open an inode.)
I'd like answers for at least the most common POSIX variants, if there's no one answer for them all.
On Linux, opening /proc/self/fd/$NUM will work regardless of whether the file still has the same name it had the first time you opened it, and will generate a new open file description (i.e. a new fd with independent offset and flags).
I don't know of any POSIXly portable way of doing this.
(I also don't know what you mean about pread/pwrite being buggy...)

How can I create a userspace filesystem with FUSE without using libfuse?

I've found that the FUSE userspace library and kernel interface has been ported, since its inception on Linux, to many other systems, and presents a relatively stable API with a supposedly small surface area. If I wanted to author a filesystem in userspace, and I were not on Plan 9 or Hurd, I would think that FUSE is my best choice.
However, I am not going to use libfuse. This is partially because of pragmatism; using C is hard in my language of choice (Monte). It's also because I am totally uninterested in writing C support code, and libfuse's recommended usage is incompatible with Monte philosophy. This shouldn't be a problem, since C is not magical and /dev/fuse can be opened with standard system calls.
Going to look for documentation, however, I've found none. There is no documentation that I can find for the /dev/fuse ABI/API, and no stories of others taking this same non-C-bound route. Frustrating.
Does any kind of documentation exist on how to interact in a language-agnostic way with /dev/fuse and the FUSE subsystem of the kernel? If so, could you point me to it? Thanks!
Update: There exists go-fuse, which is in Go, a slightly more readable language than C. However, it does not contain any ABI/API documentation either.
Update: I notice that people have voted to close this. Don't worry, there is no need for that. I have satisfied myself that the documentation that I desire does not yet exist. I will write the documentation myself, publish it, and then link to it in an accepted answer. Hopefully the next person to search for this documentation will not be disappointed.
(I'm not accepting this until it's complete. In the interim, edits are welcome!)
The basic outline of a FUSE session:
open() is called on /dev/fuse. I'll call the resulting FD the control FD.
mount() is called with the target mount point, filesystem type "fuse" for normal mode or "fuseblk" for block-device mode, and options including "fd=X" where X is the control FD.
FUSE-specific structs are transferred on the control FD repeatedly. The general pattern of communication follows a request-response pattern, where the program read()s filesystem commands from the control FD and then write()s responses back.
umount() is called with the target mount point.
close() is called on the control FD.
With that all said, there's a handful of complications that one should be aware of. First, mount() is almost always a privileged syscall, so you'll have to be root to mount a FUSE filesystem. However, as one may have noticed, FUSE programs can generally be started as non-root! How?
There's a helper, /bin/fusermount, installed setuid. Usage is totally undocumented, but that's what I'm here for. Instead of open()ing /dev/fuse yourself, run fusermount as a subprocess, passing the target mount point as an argument, any extra mount options you like with -o, and (crucially) with the environment variable _FUSE_COMMFD exported and set to the ASCII string of an open FD, which I'll call the comm FD. You must create the comm FD yourself using e.g. pipe(). fusermount will call open() and mount() for you, and share the control FD back to you along the comm FD, using the sendmsg() trick for sharing FDs. Use recvmsg() to read it back.
Editorial: I really don't understand why this is structured to be so difficult. FDs are inherited by subprocesses; it would have been so much easier to open() the control FD in the top process and pass it down into fusermount. True, there's some confused deputy dangers, but fusermount is already installed and setuid and dangerous.
Anyway! fusermount will crudely daemonize and take care of calling umount() and close() to clean up once your main process exits.
Things not yet covered:
How is non-blocking access to FUSE handled? Can the control FD just be kicked into non-blocking mode? Does it actually not block, or does it behave like an ordinary file and secretly block on access?
The struct layouts. These can be more or less rediscovered from the C or Go source, but that's no excuse. I'll document them more seriously when I've worked up sufficient masochism.

Creating temporary named fifo in *nix system

I have some tasks requiring massive temporary named pipes to deal with.
Originally, I just simply think that generate random numbers, then append it as <number>.fifo be the name of named pipe.
However, I found this post: Create a temporary FIFO (named pipe) in Python?
It seems there is something I don't know that may cause some security issue there.
So my question here is that, what's the best way to generate a named pipe?
Notice that even though I am referencing a Python related post, I don't really mean to ask only in Python.
UPDATE:
Since I want to use a named pipe to connect unrelated processes, my plan is having process A call process B first via shell, and capture stdout to acquire the name of pipe, then both know what to open.
Here I am just worrying about whether leaking the name of pipe will become an issue. Before I never thought of it, until I read that Python post.
If you have to use named FIFOs and need to ensure that overlap/overwriting cannot occur, your best bet is probably to use some combination of mktemp and mkfifo.
Although mktemp itself cannot create FIFOs, it can be used to create unique temporary directories, which you can then put your FIFOs into.
The GNU mktemp documentation has an example of this.
Alternatively, you could create some name containing well random letters. You could read from /dev/random (or /dev/urandom, read random(4)) some random bytes to e.g. seed a PRNG (e.g. random(3) seeded by srandom), and/or mix the PID and time, etc.
And since named fifo(7) are files, you should use the permission system (and/or ACL) on them. In particular, you might create a command Linux user to run all your processes and restrict the FIFOs to be only owner-readable, etc.
Of course, and in all cases, you need to "store" or "transmit" securely these FIFO names.
If you start your programs in some bash script, you might consider making your fifo names using mktemp(1) as:
fifoname=$(mktemp -u -t yourprog_XXXXXX).fifo-$RANDOM-$$
mkfifo -m 0600 $fifoname
(perhaps in some loop). I guess it would be secure enough if the script is running in a dedicated user (and then pass the $fifoname in some pipe or file, not as a program argument)
The recent renameat2(2) syscall might be helpful (atomicity of RENAME_EXCHANGE).
BTW, you might want some SElinux. Remember that opened file descriptors -and that includes your fifos- are available as symlinks in proc(5) !
PS. it all depends upon how paranoid are you. A well sysadmined Linux system can be quite secure...

Can regular file reading benefited from nonblocking-IO?

It seems not to me and I found a link that supports my opinion. What do you think?
The content of the link you posted is correct. A regular file socket, opened in non-blocking mode, will always be "ready" for reading; when you actually try to read it, blocking (or more accurately as your source points out, sleeping) will occur until the operation can succeed.
In any case, I think your source needs some sedatives. One angry person, that is.
I've been digging into this quite heavily for the past few hours and can attest that the author of the link you cited is correct. However, the appears to be "better" (using that term very loosely) support for non-blocking IO against regular files in native Linux Kernel for v2.6+. The "libaio" package contains a library that exposes the functionality offered by the kernel, but it has some caveats about the different types of file systems which are supported and it's not portable to anything outside of Linux 2.6+.
And here's another good article on the subject.
You're correct that nonblocking mode has no benefit for regular files, and is not allowed to. It would be nice if there were a secondary flag that could be set, along with O_NONBLOCK, to change this, but due to the way cache and virtual memory work, it's actually not an easy task to define what correct "non-blocking" behavior for ordinary files would mean. Certainly there would be race conditions unless you allowed programs to lock memory associated with the file. (In fact, one way to implement a sort of non-sleeping IO for ordinary files would be to mmap the file and mlock the map. After that, on any reasonable implementation, read and write would never sleep as long as the file offset and buffer size remained within the bounds of the mapped region.)

POSIX: Pipe syscall in FreeBSD vs Linux

In Linux (2.6.35-22-generic), man pipe states that
pipe() creates a pipe, a unidirectional data channel that can be used for interprocess communication."
In FreeBSD (6.3-RELEASE-p5), man pipe states that
The pipe() system call creates a pipe, which is an object allowing bidirectional data flow, and allocates a pair of file descriptors."
One is unidirectional, the other is bidirectional. I hope this isn't a silly question, but which method is the standard way of doing this? Are they both POSIX compliant?
To make my intentions clear, I lost some points on an exam for believing pipe() was one way and am looking for some ammo to get any points back ;p
I started this as a comment on Greg's answer at first, but it occurs to me that it more closely answers your specific question:
pipe()s documentation in the POSIX standard explicitly states that the behavior in question is "unspecified" -- that is, pipe() is not required to be bidirectional, though it's not forbidden. Linux's is unidirectional, FreeBSD's is bidirectional. Both are compliant, one just implements additional behavior that is not required (but doesn't break apps built to work on compliant systems).
Data can be written to the file
descriptor fildes[1] and read from the
file descriptor fildes[0]. A read on
the file descriptor fildes[0] shall
access data written to the file
descriptor fildes[1] on a
first-in-first-out basis. It is
unspecified whether fildes[0] is also
open for writing and whether fildes[1]
is also open for reading.
I wouldn't count on getting the points back (though you should). Professors have a tendency to ignore the real world in favor of whatever they've decided is correct.
The FreeBSD man page for pipe is pretty clear on this point:
The bidirectional nature of this implementation of pipes is not portable to older systems, so it is recommended to use the convention for using the endpoints in the traditional manner when using a pipe in one direction.

Resources