Open File Description Locks confusion

Open File Description Locks confusion - multithreading

As described in https://www.gnu.org/software/libc/manual/html_node/Open-File-Description-Locks.html#Open-File-Description-Locks,
fcntl(F_OFD_SETLK) takes a lock associated with an open file table entry (usually obtained by open()). That part is easy to understand.
But look at the following example:
https://www.gnu.org/software/libc/manual/html_node/Open-File-Description-Locks-Example.html#Open-File-Description-Locks-Example
In that example process, each thread calls open(), so each file descriptor should point to a different open file table entry.
Then calling fcntl (fd, F_OFD_SETLKW, &lck) in each thread just takes a lock on a different open file table entry, which would mean this locking is completely wrong.
But I tested on Ubuntu, and it works for some reason. What am I missing?
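One way to see why the glibc example works: an OFD lock is *owned* by the open file description, but conflicts are checked against the file itself, so locks taken through two descriptions obtained by separate open() calls conflict with each other, even inside a single process. The sketch below assumes Linux 3.15 or later (where F_OFD_SETLK exists); `try_ofd_write_lock` and `demo_ofd_conflict` are hypothetical helper names. It uses the non-blocking F_OFD_SETLK so the conflict shows up as EAGAIN instead of the wait that F_OFD_SETLKW would cause.

```c
#define _GNU_SOURCE            /* F_OFD_SETLK is a Linux extension */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to write-lock the whole file through fd without blocking.
   Returns 0 on success, or the errno value on failure. */
static int try_ofd_write_lock(int fd)
{
    struct flock lck;
    memset(&lck, 0, sizeof lck);   /* l_pid must be 0 for OFD locks */
    lck.l_type = F_WRLCK;
    lck.l_whence = SEEK_SET;
    lck.l_start = 0;
    lck.l_len = 0;                 /* 0 means "up to end of file" */
    return fcntl(fd, F_OFD_SETLK, &lck) == -1 ? errno : 0;
}

/* Open the same path twice, giving two open file descriptions (like each
   thread in the glibc example), lock through the first, then try the
   second. Returns the errno of the second attempt, or -1 on setup error. */
int demo_ofd_conflict(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);
    int result = -1;
    if (fd1 >= 0 && fd2 >= 0 && try_ofd_write_lock(fd1) == 0)
        result = try_ofd_write_lock(fd2);   /* conflicts with fd1's lock */
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return result;
}
```

By contrast, a descriptor produced by dup() shares the same open file description, so locking through it does not conflict.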

Related

What happens when calling "touch ." in Linux?

This is a very specific question.
I'm mainly interested in the open() system calls that happen when running "touch .".
So I ran "strace touch ." and saw that openat() is called three times.
But I don't really understand what's going on: "touch ." does not print anything to the console, and it does not create a new file named ".", since "." refers to the current directory (it shows up in ls -a). That name is already in use, so nothing is created.
This is my assumption:
open() is called to check whether the specified file name already exists; if a file descriptor is returned, the name is already in use and the operation is canceled.
Please correct me if I'm wrong.

GNU touch prefers to use a file descriptor when touching files, since it's possible to write touch - > foo and expect the file foo to be touched. As a result, it always tries to open the specified path as a writable file, and if that's possible, it then uses that file descriptor to update the file timestamp.
In this case, it's not possible to open . for writing, so openat fails with EISDIR. touch notices that it's a directory, so its internal fdutimensat function is given an invalid file descriptor and falls back to the path-based utimensat instead of the descriptor-based futimens.
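The descriptor-first strategy with a path fallback can be sketched as follows. This is not GNU touch's actual source; `touch_path` is a hypothetical name, and the open flags are an assumption chosen to mirror what touch needs (create if missing, never become a controlling terminal, don't block on FIFOs).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical touch_path(): set atime and mtime of path to "now",
   preferring a file descriptor and falling back to a path-based call. */
int touch_path(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_NOCTTY | O_NONBLOCK, 0666);
    if (fd >= 0) {
        /* Got a descriptor: update timestamps through it (NULL = now). */
        int rc = futimens(fd, NULL);
        int saved = errno;
        close(fd);
        errno = saved;
        return rc;
    }
    if (errno != EISDIR)
        return -1;
    /* A directory cannot be opened for writing: use the path instead. */
    return utimensat(AT_FDCWD, path, NULL, 0);
}
```

Calling `touch_path(".")` follows the same route described above: open fails with EISDIR and utimensat does the work.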
It isn't the case that the openat call is used to check whether the file exists. Rather, using a file descriptor for many operations means that you don't have to deal with path resolution multiple times or handle symlinks, since all of that is resolved when the file descriptor is opened. This is also why many long-lived programs open a file descriptor to their current working directory, change directories, and then use that descriptor with fchdir to change back: later renames, or permission changes on the directories along the original path, are not a problem.
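The save-and-return pattern can be sketched like this; `cwd_while_in` is a hypothetical helper that changes into a directory, records where it ended up, and returns through the saved descriptor rather than by path.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Change into dir, store the working directory there in buf, then
   return to the starting directory via fchdir(). 0 on success. */
int cwd_while_in(const char *dir, char *buf, size_t len)
{
    int saved = open(".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (saved < 0)
        return -1;
    int rc = -1;
    if (chdir(dir) == 0) {
        rc = getcwd(buf, len) != NULL ? 0 : -1;
        /* No path lookup here: renames of the old path, or permission
           changes on its parent directories, don't affect the return. */
        if (fchdir(saved) != 0)
            rc = -1;
    }
    close(saved);
    return rc;
}
```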

Which system error should my libfuse filesystem return when attempting to read from a file that's not open?

I'm implementing a libfuse filesystem. When a file is opened, I read the file attributes and store them in a hash table keyed with the file handle I generate. This serves two purposes: to maintain a collection of open file handles and to cache the information I retrieve during opening.
Of course, nothing is stopping user code from trying to pass an invalid file handle, i.e. read from a file that's not open.
There are a number of error codes that I can return from the read function, but it's not clear to me which is the one that is expected in such situation.

As you can see in the POSIX standard, the correct value to return would be EBADF:
[EBADF] The fildes argument is not a valid file descriptor open for reading.
That said, if user code passes an invalid file handle, the Linux kernel will return EBADF to the user before you, or fuse, get any say on the matter.
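Even though the kernel normally catches this first, a read handler can still check its own table defensively. libfuse's high-level read callback reports errors by returning a negated errno value, so the check reduces to returning -EBADF for an unknown handle. The sketch below leaves out the libfuse headers and models only the handle table; `handle_open`, `handle_read`, `handle_release` and the fixed-size array (standing in for the hash table described in the question) are all hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define MAX_HANDLES 64   /* hypothetical fixed table instead of a hash table */

struct open_entry {
    int in_use;
    const char *data;    /* contents cached when the file was opened */
    size_t size;
};

static struct open_entry table[MAX_HANDLES];

/* Register an open file; returns its handle or -EMFILE if the table is full. */
int64_t handle_open(const char *data, size_t size)
{
    for (int64_t i = 0; i < MAX_HANDLES; i++) {
        if (!table[i].in_use) {
            table[i] = (struct open_entry){1, data, size};
            return i;
        }
    }
    return -EMFILE;
}

void handle_release(uint64_t fh)
{
    if (fh < MAX_HANDLES)
        table[fh].in_use = 0;
}

/* Body of a read handler: bytes read, 0 at EOF, or a negated errno. */
int handle_read(uint64_t fh, char *buf, size_t size, off_t offset)
{
    if (fh >= MAX_HANDLES || !table[fh].in_use)
        return -EBADF;                 /* not an open handle */
    if (offset < 0)
        return -EINVAL;
    const struct open_entry *e = &table[fh];
    if ((size_t)offset >= e->size)
        return 0;
    size_t n = e->size - (size_t)offset;
    if (n > size)
        n = size;
    memcpy(buf, e->data + offset, n);
    return (int)n;
}
```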

How does logrotate work when two processes use the same file?

For example:
1. Program A is writing its log to the file "test.log".
2. logrotate runs: it renames "test.log" to "test.log.1" first, and then creates a new file "test.log".
3. After step 2, program A does not report any error, but A's log output does not appear in the new "test.log".
The questions are:
Where does the data that A writes after step 2 go?
How can logrotate rename the file and create a new one while another process is writing to it? (Is there something I'm missing about logrotate?)
Thanks!

This is very tightly related to how POSIX filesystems work. When you rename a file, only the name of the file changes; the physical file on the disk does not. Also, once a file is opened, the process using it only has a link (through many layers) to the physical file on the disk; the name is only used when opening the file.
That means the program A will still write to the same file, which now has the new name (i.e. test.log.1 in your example).
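The scenario from the question can be reproduced in a single process. The sketch below (with a hypothetical `rotate_demo` function and arbitrary file names) plays both roles: it writes through a descriptor, performs logrotate's rename-then-create, and keeps writing through the original descriptor. All output lands in the renamed file and the new one stays empty.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the size of the rotated file and stores the size of the newly
   created log in *fresh_size; returns -1 on any error. */
long rotate_demo(const char *log, const char *rotated, long *fresh_size)
{
    int fd = open(log, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    int ok = write(fd, "before\n", 7) == 7        /* program A logs a line   */
          && rename(log, rotated) == 0;            /* logrotate renames...    */
    int fresh = ok ? open(log, O_WRONLY | O_CREAT | O_TRUNC, 0644) : -1;
    ok = ok && fresh >= 0;                         /* ...and creates test.log */
    if (fresh >= 0)
        close(fresh);
    ok = ok && write(fd, "after\n", 6) == 6;      /* A still writes via fd   */
    close(fd);
    if (!ok)
        return -1;

    struct stat st;
    if (stat(log, &st) != 0)
        return -1;
    *fresh_size = (long)st.st_size;
    if (stat(rotated, &st) != 0)
        return -1;
    return (long)st.st_size;
}
```

Both lines (13 bytes) end up in the renamed file, while the new log is 0 bytes long.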
A common solution to this problem is to have the log rotation program send a signal (e.g. SIGHUP or SIGUSR1 or similar) to the process. The process detects this signal and reopens its log file by name, which now refers to the newly created file.
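A minimal sketch of the receiving side, assuming SIGHUP is the rotation signal (with logrotate this is typically sent from a postrotate script). The handler only sets a flag because stdio functions are not async-signal-safe; the next log write does the reopen. `install_reopen_handler` and `log_line` are hypothetical names.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set from the signal handler; checked before each log write. */
static volatile sig_atomic_t reopen_requested = 0;

static void on_sighup(int sig)
{
    (void)sig;
    reopen_requested = 1;
}

int install_reopen_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Write one line; if a rotation signal arrived, reopen path first so the
   output goes to the newly created file instead of the renamed one. */
int log_line(FILE **logp, const char *path, const char *msg)
{
    if (reopen_requested) {
        reopen_requested = 0;
        /* freopen() closes the old descriptor (the renamed file) even if
           opening path fails, so the stream is unusable on NULL. */
        *logp = freopen(path, "a", *logp);
        if (*logp == NULL)
            return -1;
    }
    if (fprintf(*logp, "%s\n", msg) < 0)
        return -1;
    return fflush(*logp) == 0 ? 0 : -1;
}
```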

Clearing OS cache from mem-mapped files without file handle

I need to force the OS to purge the pages used for a mapped file. I don't have the file descriptor, so posix_fadvise cannot be used.
Our application caches a lot of files by mapping them into memory. After the file has been mapped (i.e. we've got the pointer from mmap()), we close the file. When at some later point we have to clean the cache, we want to purge the pages in OS cache as well. That is, we want to unmap the file, and do something like posix_fadvise(POSIX_FADV_DONTNEED), but the file descriptor is not available at this point.
The flow looks like this:
// caching stage
fd = open("file", O_RDONLY);
data = mmap(NULL, len, PROT_READ, <mmap flags>, fd, 0);  // len: mapped length
close(fd);
// clean-up stage
munmap(data, len);
// posix_fadvise(???, 0, 0, POSIX_FADV_DONTNEED);
Is there a way to clear the cached pages without file descriptor?
I have thought about following two workarounds:
Keeping the files open, so that I have valid descriptors at cleanup time. However, there may be tens of thousands of files, and keeping them all open may affect OS performance.
Keeping the file path and reopening the file just to get a descriptor and call posix_fadvise(). But the question is: will the old mapped area be associated with the same file? And will posix_fadvise() purge the cached pages in this scenario?

The second option worked. When the file is reopened later, the mapped area is associated with it, and calling posix_fadvise with new file descriptor unloads the mapped pages:
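// caching stage
fd = open("file", O_RDONLY);
data = mmap(NULL, len, PROT_READ, <mmap flags>, fd, 0);  // len: mapped length
close(fd);
// clean-up stage
fd = open("file", O_RDONLY);   // new descriptor for the same file
munmap(data, len);             // unmap first: mapped pages are not dropped
posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);  // offset 0, len 0 = whole file
close(fd);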
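A self-contained version of the working approach, under a few assumptions: read-only MAP_SHARED mappings, and a hypothetical `struct cached_map` that also remembers the file's device and inode. The page cache belongs to the inode, so any descriptor for the same inode can be used for posix_fadvise; the identity check guards against the path having been replaced by a different file in the meantime.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cache entry: the descriptor is closed right after mmap(). */
struct cached_map {
    const char *path;
    void *data;
    size_t len;
    dev_t dev;      /* identity of the mapped file, checked on cleanup */
    ino_t ino;
};

int cache_map(struct cached_map *c, const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    *c = (struct cached_map){path, p, (size_t)st.st_size, st.st_dev, st.st_ino};
    return 0;
}

/* Reopen by path only to get a descriptor, unmap, then drop cached pages. */
int cache_drop(struct cached_map *c)
{
    int fd = open(c->path, O_RDONLY | O_CLOEXEC);
    int rc = munmap(c->data, c->len);   /* still-mapped pages are skipped */
    c->data = NULL;
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_dev == c->dev && st.st_ino == c->ino) {
        /* posix_fadvise returns an error number rather than setting errno */
        if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
            rc = -1;
    } else {
        rc = -1;                        /* path now names a different file */
    }
    close(fd);
    return rc;
}
```

Dirty pages of a writable mapping are not discarded by POSIX_FADV_DONTNEED until they have been written back, so this is most effective for read-only caches like the one described.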

Create a hard link from a file handle on Unix?

If I've got a handle to an open file, is it possible to create a hard link to that file after all references to it have been removed from the filesystem?
For example, something like this:
FILE *f = fopen("/tmp/foo", "w");
unlink("/tmp/foo");
fputs("Hello, world!\n", f);
create_link_from_fd(fileno(f), "/tmp/hello");
fclose(f);
Specifically, I'd like to do this so that I can safely write to large data files, then move them into place atomically without having to worry about cleaning up after myself if my program is killed in the middle of writing the file.

Linux 3.11 introduced a solution to this problem with the new O_TMPFILE open(2) flag. With this flag you can create an "invisible" file (i.e. an inode with no hard links) in some file system (specified by a directory in that file system). Then, once the file is fully set up, you can create a hard link to it using linkat. It works like this:
fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
// write something to the file here
// fchown()/fchmod() it
linkat(fd, "", AT_FDCWD, "/tmp/test", AT_EMPTY_PATH);
Note that aside from the >=3.11 kernel requirement, this also requires support from the underlying file system (I tried the above snippet on ext3 and it worked, but it did not seem to work on btrfs).
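One caveat with the snippet above: linkat with AT_EMPTY_PATH requires the CAP_DAC_READ_SEARCH capability, so an unprivileged process typically gets ENOENT. Going through /proc/self/fd with AT_SYMLINK_FOLLOW works without it. The sketch below uses that route; `write_then_publish` is a hypothetical name, and it assumes /proc is mounted and the target file system supports O_TMPFILE.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create an unnamed file in dir, fill it, and give it a name only once it
   is complete. If the process dies before linkat(), nothing is left behind. */
int write_then_publish(const char *dir, const char *final_path,
                       const char *contents)
{
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
    if (fd < 0)
        return -1;   /* e.g. EOPNOTSUPP if the file system lacks O_TMPFILE */
    size_t len = strlen(contents);
    if (write(fd, contents, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    /* Link via the /proc symlink; AT_SYMLINK_FOLLOW resolves it to the file. */
    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
    int rc = linkat(AT_FDCWD, proc_path, AT_FDCWD, final_path,
                    AT_SYMLINK_FOLLOW);
    close(fd);
    return rc;       /* fails with EEXIST if final_path already exists */
}
```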

Not generally, no. [Edit: since Linux 3.11 there is O_TMPFILE combined with linkat; see safsaf32's answer. This does not work on POSIX systems in general, since POSIX linkat only takes directory descriptors as the base for relative paths and has no way to link a file through its own descriptor.] There are security considerations here: someone can pass to you an open file descriptor that you could not normally open on your own, e.g.:
mkdir lock; chmod 700 lock
echo secret contents > lock/in
sudoish cmd < lock/in
Here cmd runs as a user who has no permission to open the input file (lock/in) by name, but can still read from it. If cmd could create a new name on the same file system, it could pass the file contents on to a later process. (Obviously it can copy those contents, so this issue is more of a "pass the contents on by mistake" thing than "pass the contents on, on purpose".)
That said, people have come up with ways of "relinking" files by inode/vnode internally (it's pretty easy to do inside most file systems), so you could make your own private system call for it. The descriptor must refer to a real file on the appropriate mount point, of course—there's no way of "relinking" a pipe or socket or device into becoming a regular file.
Otherwise you're stuck with "catch signals and clean up and hope for the best", or a similar trick, "fork off a subprocess, run it, and if it succeeds/fails, take appropriate move/clean-up action".
Edit to add historical note: the above lock example is not particularly good, but back in the days of V6 Unix, MDQS used a fancier version of this trick. Bits and pieces of MDQS survive in various forms today.

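On Linux, you might try the unportable trick of going through /proc/self/fd. Since Linux's link() does not follow a symlink given as the old path, linkat with AT_SYMLINK_FOLLOW is needed:
char pbuf[64];
snprintf(pbuf, sizeof(pbuf), "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, pbuf, AT_FDCWD, "/tmp/hello", AT_SYMLINK_FOLLOW);
That trick does not work after an unlink("/tmp/foo"), though: Linux refuses (with ENOENT) to create a new link to a file whose link count has dropped to zero, unless the file was created with O_TMPFILE.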
A more portable (but less robust) way would be to generate a "unique temporary path", perhaps like
int p = (int) getpid();
int t = (int) time(0);
int r = (int) random();
snprintf(pbuf, sizeof(pbuf), "/tmp/out-p%d-r%d-t%d.tmp", p, r, t);
int fd = open(pbuf, O_CREAT | O_EXCL | O_WRONLY, 0600);
Once the file has been written and closed, you rename(2) it to some more sensible path. You could use atexit in your program to do the renaming (or the removing).
And have some cron job to clean the [old] /tmp/out*.tmp every hour...
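A variant of the same write-then-rename idea, swapping in POSIX mkstemp for the hand-built unique name (mkstemp creates the file exclusively with mode 0600). The temporary file is placed next to the final path because rename is only atomic within one file system. `atomic_write` is a hypothetical name; a process killed mid-write still leaves a `<final_path>.XXXXXX` file behind for the periodic cleanup to catch.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write contents to a unique temp file beside final_path, then rename()
   it into place so readers see either the old file or the complete new one. */
int atomic_write(const char *final_path, const char *contents)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", final_path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    int fd = mkstemp(tmp);          /* replaces XXXXXX, opens O_EXCL */
    if (fd < 0)
        return -1;
    size_t len = strlen(contents);
    int ok = write(fd, contents, len) == (ssize_t)len && fsync(fd) == 0;
    if (close(fd) != 0)
        ok = 0;
    if (!ok || rename(tmp, final_path) != 0) {
        unlink(tmp);                /* best-effort cleanup on failure */
        return -1;
    }
    return 0;
}
```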
