Is there a file mutex in Linux? How to implement it? - linux

In Windows, if I open a file with MS Word and then try to delete it,
the system stops me. It prevents the file from being deleted.
Is there a similar mechanism in Linux?
How can I implement it when writing my own program?

There is no similar mechanism in Linux. In fact, I find that feature of Windows to be an incredible misfeature and a big problem.
In any case, a program does not typically hold open the file it is working on, unless it is a database that updates the file as it works. Programs usually just open the file, write the contents, and close it when you save your document.
vim's .swp file is updated as vim works, and vim holds it open the whole time, so even if you delete it, the file doesn't really go away. vim will just lose its recovery ability if you delete the .swp file while it's running.
In Linux, if you delete a file while a process has it open, the system keeps it in existence until all references to it are gone. The name in the filesystem that refers to the file will be gone. But the file itself is still there on disk.
If the system crashes while the file is still open, it will be cleaned up and removed from the disk when the system comes back up.
The reason this is such a problem in Windows is that mandatory locking frequently prevents operations that should succeed from succeeding. For example, a backup process should be able to read a file that is being written to. It shouldn't have to stop the process that is doing the writing before the backup proceeds. In many other cases, operations that should be able to move forward are blocked for silly reasons.
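The delete-while-open behaviour described above can be tried out with a few lines of Python. This is only a minimal sketch, assuming a Linux system and a scratch file created with tempfile.mkstemp; the variable names are made up for illustration.

```python
import os
import tempfile

# Create a scratch file, keep it open, then delete its name.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                       # the directory entry is gone...

name_exists = os.path.exists(path)    # no name refers to the file any more

# ...but the data is still reachable through the open descriptor.
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)
os.close(fd)                          # only now can the storage be freed

print(name_exists, data)
```

Nothing here takes a lock; the file simply outlives its name for as long as the descriptor stays open.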

The semantics of most Unix filesystems (such as Linux's ext2 fs family) are that a file can be unlink(2)'d at any time, even if it is open. However, after such a call, any process that already has the file open can continue to read and write it through its open file descriptor. The filesystem does not actually free the storage until all open file descriptors have been closed. These are very long-standing semantics.
You may wish to read more about file locking in Unix and Linux (e.g., the Wikipedia article on File Locking.) Basically, mandatory and advisory locks on Linux exist but they're not guaranteed to prevent what you want to prevent.
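For the "how can I implement it in my own program" part, an advisory lock is probably the closest thing. The following is a rough sketch using Python's fcntl.flock, assuming Linux and a local filesystem; the helper try_lock and the scratch file are hypothetical names used only for this illustration. It only protects against other programs that also ask for the lock.

```python
import fcntl
import os
import tempfile

def try_lock(f):
    """Try to take an exclusive advisory lock without blocking."""
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False                  # someone else already holds the lock

fd, path = tempfile.mkstemp()
os.close(fd)

holder = open(path, "a")              # first, independent open of the file
contender = open(path, "a")           # second, independent open of the file

first = try_lock(holder)              # lock is free, so this succeeds
second = try_lock(contender)          # refused while holder has the lock

fcntl.flock(holder, fcntl.LOCK_UN)    # holder releases the lock
third = try_lock(contender)           # now the contender can take it

holder.close()
contender.close()
os.unlink(path)
print(first, second, third)
```

Dropping LOCK_NB would make the second caller wait instead of failing, which is closer to a mutex.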

Related

When writing to a newly created file, can I create the directory entry only after writing is completed?

I'm writing a file that takes minutes to write. External software monitors for this file to appear, but unfortunately doesn't monitor for inotify IN_CLOSE_WRITE events, but rather checks periodically "the file is there" and then starts to process it, which will fail if the file is incomplete. I cannot fix the external software. A workaround I've been using so far is to write a temporary file and then rename it when it's finished, but this workaround complicates my workflow for reasons beyond the scope of this question¹.
Files are not directory entries. Using hardlinks, there can be multiple pointers to the same file. When I open a file for writing, both the inode and the directory entry are created immediately. Can I prevent this? Can I postpone the creation of the directory entry until the file is closed, rather than when the file is opened for writing?
Example Python code, but the question is not specific to Python:
fp = open(dest, 'w') # currently both inode and directory entry are created here
fp.write(...)
fp.write(...)
fp.write(...)
fp.close() # I would like to create the directory entry only here
Reading everything into memory and then writing it all in one go is not a good solution, because writing will still take time and the file might not fit into memory.
I found the related question Is it possible to create an unlinked file on a selected filesystem?, but I would want to first create an anonymous/unnamed file, then naming it when I'm done writing (I agree with the answer there that creating an inode is unavoidable, but that's fine; I just want to postpone naming it).
Tagging this as linux, because I suspect the answer might be different between Linux and Windows and I only need a solution on Linux.
¹Many files are produced in parallel within dask graphs, and injecting a "move as soon as finished" task in our system would be complicated, so we're really renaming 50 files when 50 files have been written, which causes delays.
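For reference, the rename workaround mentioned in the question could look roughly like the sketch below. The function name write_then_rename and the ".partial-" prefix are made up for illustration, and it assumes the temporary file lives in the same directory as the destination so that the rename stays on one filesystem.

```python
import os
import tempfile

def write_then_rename(dest, chunks):
    """Write chunks under a temporary name, then rename to dest when done."""
    directory = os.path.dirname(os.path.abspath(dest))
    # Hypothetical naming scheme: a hidden ".partial-*" file next to dest.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as fp:
            for chunk in chunks:
                fp.write(chunk)
        # The final name appears only here, pointing at a complete file.
        os.replace(tmp_path, dest)
    except BaseException:
        os.unlink(tmp_path)           # do not leave a half-written file behind
        raise

workdir = tempfile.mkdtemp()
dest = os.path.join(workdir, "result.bin")
write_then_rename(dest, [b"part one, ", b"part two"])
with open(dest, "rb") as f:
    result = f.read()
print(result)
```

Note that mkstemp creates the file with owner-only permissions, so a real version may need an os.chmod before the rename if other users have to read the result.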

Will an I/O redirection operation lock the file?

I have a growing nginx log file, already about 20 GB, and I wish to rotate it.
1. I mv the old log file to a new log file.
2. I run > old_log_file.log to truncate the old log file, which takes about 2~3 seconds.
Is there a lock (a write lock?) on the old log file while I am truncating it (about 2~3 seconds)?
During those 2~3 seconds, does nginx return 502 because it is waiting to append to the old log file until the lock is released?
Thank you for explaining.
On Linux, there are (almost) no mandatory file locks (more precisely, there used to be a mandatory locking feature in the kernel, but it is deprecated and you really should avoid using it). File locking happens with flock(2) or lockf(3); it is advisory and has to be explicit (e.g. with the flock(1) command, or some program calling flock or lockf).
So every locking related to files is practically a convention between all the software using that file (and mv(1) or the redirection by your shell don't use file locking).
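To see what "a convention between all the software" means in practice, here is a small sketch (assuming Linux; the scratch file and variable names are invented for the example). One descriptor holds an exclusive flock, and a second open that never asks for the lock truncates and rewrites the file anyway, much like a shell redirection would.

```python
import fcntl
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"important data\n")
fcntl.flock(fd, fcntl.LOCK_EX)        # a cooperating program holds the lock

# A program that never calls flock() is not stopped by that lock:
with open(path, "w") as intruder:     # mode "w" truncates, like "> file"
    intruder.write("overwritten\n")

with open(path) as f:
    content = f.read()

os.close(fd)                          # closing the descriptor drops the lock
os.unlink(path)
print(repr(content))
```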
Remember that a file on Linux is mostly an i-node (see inode(7)), which can have zero, one or several file paths (see path_resolution(7) and be aware of link(2), rename(2), unlink(2)) and is used through some file descriptor. Read ALP (and perhaps Operating Systems: Three Easy Pieces) for more.
No file locking happens in the scenario of your question (and the i-nodes and file descriptors involved are independent).
Consider using logrotate(8).
Some software provides a way to reload its configuration and re-open log files. You should read the documentation of your nginx.
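The two steps from the question can be imitated in Python to show why no lock is involved and why the writer keeps appending to the renamed file. This is a sketch only: the log object stands in for the server's already-open log handle, and the file names are hypothetical.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
old = os.path.join(workdir, "access.log")
rotated = os.path.join(workdir, "access.log.1")

log = open(old, "a")                  # stands in for the server's log handle
log.write("before rotation\n")
log.flush()

os.rename(old, rotated)               # step 1: mv
open(old, "w").close()                # step 2: "> access.log" makes a NEW empty file

log.write("after rotation\n")         # the open handle still points at the old i-node
log.flush()
log.close()

with open(rotated) as f:
    rotated_text = f.read()
new_size = os.path.getsize(old)
same_inode = os.stat(old).st_ino == os.stat(rotated).st_ino
print(repr(rotated_text), new_size, same_inode)
```

The new file stays empty until the writer re-opens its log by name, which is why a reload or re-open step is needed after rotating.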
Whether the file is locked depends on the application. The application that generates this log file must have an option to clear the log file. As an example of a program that does not lock, a file open in an editor like vim can be modified externally while it is still open in the editor.

Syncing a file system that has no file on it

Say I want to synchronize the data buffers of a file system to disk (in my case, that of a USB stick partition) on a Linux box.
While searching for a function to do that I found the following
DESCRIPTION
sync() causes all buffered modifications to file metadata and data to be written to the underlying file systems.
syncfs(int fd) is like sync(), but synchronizes just the file system containing file referred to by the open file descriptor fd.
But what if the file system has no file on it that I can open and pass to syncfs? Can I "abuse" the dot file? Does it appear on all file systems?
Is there another function that does what I want? Perhaps by providing a device file with major / minor numbers or some such?
Yes, I think you can do that. Even an empty file system has at least one inode, the one for its root directory, so you can open the "." entry there and pass it to syncfs. You can also play around with ls -i to see the inode numbers.
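As a rough sketch of that idea: Python's os module has os.sync() but, as far as I know, no syncfs wrapper, so the example below calls the C library's syncfs through ctypes. It assumes Linux with a libc that exports syncfs; sync_filesystem is a made-up helper name, and "." stands in for the mount point of the USB stick.

```python
import ctypes
import os

# Load the C library of the running process (assumed to export syncfs).
libc = ctypes.CDLL(None, use_errno=True)

def sync_filesystem(mount_point):
    """Flush the file system that contains mount_point."""
    # Opening the directory itself works even when it holds no files.
    fd = os.open(mount_point, os.O_RDONLY | os.O_DIRECTORY)
    try:
        if libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)

sync_filesystem(".")                  # replace "." with the mount point
```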
Could you avoid the problem by mounting your file system with the sync option? Or would the performance cost be a problem? Did you have a look at remounting? In particular cases this can sync your file system as well.
I do not know what your application is, but I have had problems synchronizing files to a USB stick with a FAT32 file system, which resulted in weird read and write errors. I cannot imagine any other valid reason why you would need to sync an empty file system.
From man 8 sync description:
"sync writes any data buffered in memory out to disk. This can include (but is not limited to) modified superblocks, modified inodes, and delayed reads and writes. This must be implemented by the kernel; The sync program does nothing but exercise the sync(2) system call."
So, note that it's all about modifications (modified inodes, superblocks, etc.). If nothing has been modified, there is nothing to sync.

I/O Performance in Linux

File A is in a directory that holds 10000 files, and file B is in a directory that holds 10 files. Would reading/writing file A be slower than file B?
Would the answer be affected by which journaling file system is used?
No.
Browsing the directory and opening a file will be slower (whether or not that's noticeable in practice depends on the filesystem). Input/output on the file is exactly the same.
EDIT:
To clarify, the "file" in the directory is not really the file, but a link ("hard link", as opposed to symbolic link), which is merely a kind of name with some metadata, but otherwise unrelated to what you'd consider "the file". That's also the historical reason why deleting a file is done via the unlink syscall, not via a hypothetical deletefile call. unlink removes the link, and if that was the last link (but only then!), the file.
It is perfectly legal for one file to have a hundred links in different directories, and it is perfectly legal to open a file and then move it to a different place or even unlink it (while it remains open!). It does not affect your ability to read/write on the file descriptor in any way, even when a file (to your knowledge) does not even exist any more.
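The link-versus-file distinction can be observed directly. The following is a small sketch, assuming a filesystem that supports hard links; the directory and file names are invented for the example.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "a.txt")
second = os.path.join(workdir, "b.txt")

with open(first, "w") as f:
    f.write("one file, two names\n")
os.link(first, second)                # add a second hard link to the same file

same_file = os.path.samefile(first, second)
links_before = os.stat(first).st_nlink

os.unlink(first)                      # removes one name, not the file
links_after = os.stat(second).st_nlink
with open(second) as f:
    content = f.read()

print(same_file, links_before, links_after, repr(content))
```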
In general, once a file has been opened and you have a handle to it, the performance of accessing that file will be the same no matter how many other files are in the same directory. You may be able to detect a small difference in the time it takes to open the file, as the OS will have to search for the file name in the directory.
Journaling aims to reduce the recovery time after file system crashes. IMHO, it will not affect the read/write speed of files. Journaling ext2

Mandatory file lock on linux

On Linux I can dd a file on my hard drive and delete it in Nautilus while the dd is still going on.
Can Linux enforce a mandatory file lock to protect R/W?
[EDIT] The original question wasn't about Linux file locking capabilities but about a supposed bug in Linux. It is reproduced here because it is answered below and others may have the same question.
People keep telling me Linux/Unix is better OS. I am coding Java on Linux now and come across a problem, that I can easily reproduce: I can dd a file on my hard drive and delete it in Nautilus while the dd is still going on. How come linux cannot enforce a mandatory file lock to protect R/W??
To do mandatory locking on Linux, the filesystem must be mounted with the -o mand option, and you must set g-x,g+s permissions on the file. That is, you must disable group execute, and enable setgid. Once this is performed, all access will either block or error with EAGAIN based on the value of O_NONBLOCK on the file descriptor. But beware: "The implementation of mandatory locking in all known versions of Linux is subject to race conditions which render it unreliable... It is therefore inadvisable to rely on mandatory locking." See fcntl(2).
You don't need locking. This is not a bug but a design choice; your assumptions are wrong.
The file system uses reference counting and it will mark a file as free only when all hard links to the file are removed and all file descriptors are closed.
This approach allows safe file system operations that Windows, for example, does not: operations like delete, move and rename on files that are in use, without needing locking or breaking anything.
Your dd operation is going to succeed despite the file removal, which will actually be deferred till the dd finishes.
http://en.wikipedia.org/wiki/Reference_counting#Disk_operating_systems
[EDIT] My response doesn't make much sense now, as the question was edited by someone else. The original question, quoted in the question's own edit note, was about a supposed bug in Linux and not about whether Linux can lock a file.
Linux and Unix OSes can enforce file locks, but they do not do so by default because of their multiuser design. Try reading the manual pages for flock and fcntl. That might get you started.
