FUSE: Multiple opens on the same file

Does the OS/VFS/FUSE-layer manage the semantics of multiple handles to the same file, or is that something that the driver has to arbitrate?

Short: If you want to disallow that, you have to handle it in the driver.
Long: I did not find any indication in the POSIX error codes of open() that would prevent having multiple handles for the same file in the same process. Wikipedia states that this is fine:
The same file may be opened simultaneously by several processes, and even by the same process (resulting in several file descriptors for the same file) depending on the file organization and filesystem.
FUSE does not condemn it in its documentation either; it mostly just proxies the POSIX semantics.
To try it, I opened the same file twice in Python and got two different file descriptors:
In [1]: fd1 = open("./resting.org")
In [2]: fd2 = open("./resting.org")
In [3]: fd1.fileno()
Out[3]: 5
In [4]: fd2.fileno()
Out[4]: 6
So you have to prevent it yourself. Doing so might still be POSIX-compliant, since the behavior is unspecified, but it may violate an assumption some unsuspecting programmer made.
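For illustration, here is a minimal sketch of how a driver could arbitrate this, assuming the fusepy bindings (the class name and the EBUSY policy are just examples, not anything FUSE prescribes):

import errno
import os

from fuse import FuseOSError, Operations  # fusepy

class SingleOpenOps(Operations):
    # Passthrough sketch that refuses a second concurrent open of the same path.
    def __init__(self, root):
        self.root = root
        self.open_paths = set()

    def _full(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def open(self, path, flags):
        if path in self.open_paths:
            raise FuseOSError(errno.EBUSY)  # deny the second handle
        self.open_paths.add(path)
        return os.open(self._full(path), flags)

    def release(self, path, fh):
        self.open_paths.discard(path)
        return os.close(fh)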

Related

Not closing os.devnull in Python

I was wondering if there are problems with not closing the file os.devnull in Python. I am aware that normally we need to close files that we open. Nevertheless, I am wondering if it is possible to treat os.devnull the way we treat sys.stdout or sys.stderr, which we don't close.
With normal files, you run the risk of losing data when you don't close them due to buffering. This is obviously not a concern for /dev/null.
However, while /dev/null is technically not a regular file but a device file, it uses file descriptors the same way. You can even inspect the file descriptor in Python using the fileno() method:
import os
with open(os.devnull) as devnull:
    print(devnull.fileno())
Operating systems limit the number of open file descriptors any single process can have. This is very unlikely to be a problem, but it's still good practice to treat /dev/null like any other file and close it for this reason alone.
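On Unix you can inspect that per-process limit from Python itself; a small sketch:

import resource

# Soft and hard per-process limits on the number of open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")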

What is the OS-level handle of tempfile.mkstemp good for?

I use tempfile.mkstemp when I need to create files in a directory which might stay, but I don't care about the filename. It only needs to be a name that doesn't exist yet and that carries a given prefix and suffix.
One part of the documentation that I have ignored so far is:
mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) and the absolute pathname of that file, in that order.
What is the OS-level handle and how should one use it?
Background
I always used it like this:
import os
from tempfile import mkstemp

_, path = mkstemp(prefix=prefix, suffix=suffix, dir=dir)
with open(path, "w") as f:
    f.write(data)
# do something
os.remove(path)
It worked fine so far. However, today I wrote a small script which generates huge files and deletes them. The script aborted the execution with the message
OSError: [Errno 28] No space left on device
When I checked, there were 80 GB free.
My suspicion is that os.remove only "marked" the files for deletion, but the files were not properly removed. And the next suspicion was that I might need to close the OS-level handle before the OS can actually free that disk space.
Your suspicion is correct: os.remove only removes the directory entry that contains the name of the file. The file data itself remains intact and continues to consume space on the disk until the last open descriptor on the file is closed. During that time, normal operations through existing descriptors continue to work, which means you could still use the _ descriptor to seek in, read from, or write to the file after os.remove has returned.
In fact it's common practice to immediately os.remove the file before moving on to using the descriptor to operate on the file contents. This prevents the file from being opened by any other process, and also means that the file won't be left hanging around if this program dies unexpectedly before reaching a later os.remove.
Of course that only works if you're willing and able to use the low-level descriptor for all of your operations on the file, or if you use the os.fdopen method to construct a file object on top of the descriptor and use that new object for all operations. Obviously you only want to do one of those things; mixing descriptor access and file-object access to the same underlying file can produce unexpected results.
os.fdopen(_) should also be marginally faster than open(path), since it skips the path lookup. The file object it returns supports the context-manager protocol just like the one returned by open, so you can use it directly in a with statement; see the sketch below.
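Putting the pieces together, a minimal sketch of the unlink-early pattern (the prefix and suffix are arbitrary):

import os
from tempfile import mkstemp

fd, path = mkstemp(prefix="scratch-", suffix=".tmp")
os.remove(path)  # unlink right away; the data lives on until the descriptor is closed

with os.fdopen(fd, "w+") as f:  # wrap the descriptor in a regular file object
    f.write("huge intermediate data")
    f.seek(0)
    print(f.read())
# closing the file object closes the descriptor, and only then is the disk space freed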

Is there a simple way to fork a file descriptor?

I've just read a handful of man pages: dup, dup2, fcntl, pread/pwrite, mmap, etc.
Currently I am using mmap, but it's not the nicest thing in the world because I have to manage file offset and buffer length myself and basically reimplement read/write in userspace.
From what I gathered:
dup, dup2, fcntl just create aliases for the fds, so their offsets and flags are shared: reading from one advances the offset of the others.
pread/pwrite can be buggy and give inconsistent results.
mmap is buggy on linux when given some uncommon flags, but I don't need them.
Am I missing something or is mmap really the way to go?
(Note that re-open()ing a file is dangerous on POSIX: unlike Windows, POSIX provides no guarantee that the path hasn't been moved or deleted while the file is open. On POSIX, you can open a path, move the file, and still read from it; sometimes you can even delete the file. I also couldn't find anything that can open an inode directly.)
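A quick demonstration of that last point (a sketch; the paths are hypothetical):

import os

with open("/tmp/demo.txt", "w") as f:
    f.write("hello")

f = open("/tmp/demo.txt")
os.rename("/tmp/demo.txt", "/tmp/demo-moved.txt")  # move the file while it is open
print(f.read())  # still prints "hello": the handle follows the inode, not the path
f.close()
os.remove("/tmp/demo-moved.txt")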
I'd like answers for at least the most common POSIX variants, if there's no one answer for them all.
On Linux, opening /proc/self/fd/$NUM will work regardless of whether the file still has the same name it had the first time you opened it, and will generate a new open file description (i.e. a new fd with independent offset and flags).
I don't know of any POSIXly portable way of doing this.
(I also don't know what you mean about pread/pwrite being buggy...)
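For example, a small sketch of the /proc/self/fd trick in Python (Linux only; the file path is hypothetical):

import os

fd1 = os.open("/tmp/example.txt", os.O_RDONLY)
fd2 = os.open(f"/proc/self/fd/{fd1}", os.O_RDONLY)  # a new open file description

os.read(fd1, 4)  # advance fd1's offset
print(os.lseek(fd1, 0, os.SEEK_CUR))  # 4
print(os.lseek(fd2, 0, os.SEEK_CUR))  # 0: fd2 keeps its own, independent offset

os.close(fd1)
os.close(fd2)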

Usage of smbclient lock

From here, I got the information below:
lock [filenum] [r|w] [hex-start] [hex-len]
This command depends on the server supporting the CIFS UNIX extensions and will fail if the server does not. Tries to set a POSIX fcntl lock of the given type on the given range. Used for internal Samba testing purposes.
However, I can't find any example of this command.
From my understanding, [filenum] is the file name, and [r|w] is a read and/or write lock.
But I have no idea what [hex-start] and [hex-len] are.
Could someone help?
lock is a simple implementation of advisory file locking using fcntl(). (In fact, years ago, I wrote a practically identical command-line utility, which executed a single command or script while holding a lock on the specified file.)
fcntl() locks work on remote filesystems if the server has the support enabled. In particular, Samba and NFS servers on Linux definitely have the capability. On NFS the support is usually either misconfigured or outright disabled, so hardlink- or lock-directory-based locking schemes are more common. Sadly.
Technically, fcntl() locks are not file locks, but record locks: any byte range in the file can be separately locked, even by different processes. The most common use is to lock the entire file (by specifying zero start and length, so the lock will apply even if the file is appended to). The lock command does exactly that if you omit both the hex-start and hex-length parameters.
If you do specify the hex-start to lock, it refers to the offset where the lock region starts. If you omit or use zero hex-length, then the lock applies to the rest of the file, even if the file is appended to, or truncated. If you also specify hex-length, then the lock applies to offsets [hex-start, hex-start+hex-length). The hex- prefix obviously refers to the values being specified in hexadecimal.
The locks are advisory, because they do not prevent any kind of access to the file. Every application needs to call fcntl(), to obtain an advisory lock on the file; if the desired lock would conflict with other locks on the same file, the call will block (F_SETLKW) or fail (F_SETLK).
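As an illustration, here is the same kind of record lock taken from Python; fcntl.lockf wraps the same fcntl() locks, and the file name and byte range are just examples:

import fcntl

with open("/tmp/records.dat", "r+b") as f:
    # Lock 0x100 bytes starting at offset 0x40; with length 0 and start 0,
    # the lock would cover the whole file, appends included.
    fcntl.lockf(f, fcntl.LOCK_EX, 0x100, 0x40)
    # ... operate on the locked range ...
    fcntl.lockf(f, fcntl.LOCK_UN, 0x100, 0x40)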
Questions?

Using files as IPC on Linux

I have one writer which creates and sometimes updates a file with some status information. The readers are implemented in Lua (so I only have io.open) and possibly bash (cat, grep, whatever). I am worried about what can happen if the status information is updated (which means a complete rewrite of the file) while a reader has an open handle to it. I have also read that if the write/read operation is below 4 KB, it is atomic; that would be perfectly fine for me, as the status info fits well within that size. Can I make this assumption?
A read or write is atomic below 4 KiB only for pipes, not for disk files (for which the atomicity granularity may be the filesystem block size, usually 512 bytes).
In practice you could avoid bothering about such issues (assuming your status file is, e.g., less than 512 bytes), and I believe that if the writer opens and writes that file quickly, you don't need to worry; in particular, avoid open(2)-ing the file, keeping the handle around for a long time (many seconds), and only write(2)-ing a small string into it later.
If you are paranoid, but can assume that readers (like grep) open the file and read it quickly, you could write to a temporary file and rename(2) it onto the real name once it has been written (and close(2)-d) in totality, as sketched below.
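A minimal sketch of that write-and-rename approach (the helper name is made up; rename(2) is atomic on POSIX as long as both paths are on the same filesystem):

import os
import tempfile

def write_status_atomically(path, data):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem as the target
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk before the rename
        os.rename(tmp, path)  # readers see either the old or the new file, never a mix
    except BaseException:
        os.remove(tmp)
        raise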
As Duck suggested, locking the file in both readers and writers is also a solution.
I may be mistaken, in which case someone will correct me, but I don't think the external readers are going to pay any attention to whether the file is being simultaneously updated. They are going to print (or possibly hit EOF or error out on) whatever is there.
In any case, why not avoid the whole mess and just use file locks? Have the writer flock (or similar) the file and the readers check the lock; if they get the lock, they know they are OK to read. A sketch follows.
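A minimal sketch of that flock-based coordination in Python (the status file name is arbitrary):

import fcntl
import os

# Writer side: open without truncating, take the lock, then rewrite.
fd = os.open("status", os.O_WRONLY | os.O_CREAT)
fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while any reader holds a shared lock
os.ftruncate(fd, 0)             # now it is safe to rewrite the file
os.write(fd, b"state=ok\n")
os.close(fd)                    # closing the descriptor releases the lock

# Reader side: a shared lock keeps the writer out while we read.
with open("status", "rb") as f:
    fcntl.flock(f, fcntl.LOCK_SH)
    data = f.read()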
