How to Safely and Transactionally Replace a File on Linux?

The most naive, worst way I can think of to replace the contents of a file is:
f = open('file.txt', 'w')
f.write('stuff')
f.close()
Obviously, if that operation fails at some point before closing, you lose the contents of the original file without necessarily having finished writing the new content.
So, what is the completely proper way to do this (if there is one)? I imagine it's something like:
f = open('file.txt.tmp', 'w')
f.write('stuff')
f.close()
move('file.txt.tmp', 'file.txt') # dangerous line?
But is that completely atomic and safe? What are the proper commands to actually perform the move? If I have another process with an open connection to file.txt, I assume that it will hold on to its pointer to the original file until it closes. What if another process tries to open file.txt in the middle of the move?
I don't really care what version of the file my processes get as long as they get a complete, uncorrupted version.

Your implementation of move should use the rename function, which is atomic. Processes opening the file will see either the old or the new contents; there is no middle state. If a process already has the file open, it will keep access to the old version after the move.
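In Python, a minimal sketch of that pattern might look like this (the helper name and the fsync placement are my own; os.replace is the atomic rename on POSIX):

import os
import tempfile

def replace_file(path, data):
    # Write to a temporary file in the same directory, so the final
    # rename stays on one filesystem and therefore stays atomic.
    d = os.path.dirname(path) or '.'
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # make sure the bytes hit the disk first
        os.replace(tmp, path)      # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)             # clean up the temp file on failure
        raise

For full crash durability you would additionally fsync the containing directory, so that the rename itself is recorded on disk.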

Related

Is it possible to read a short-lived file reliably in /tmp due to periodic cleanup?

I'm considering making my application create a file in /tmp. All I'm doing is making a temporary file that I'll copy to a new location. On a good day this looks like:
Write the temp file
Move the temp file to a new location
However on my system (RHEL 7) there is a file at /usr/lib/tmpfiles.d/tmp.conf which indicates that /tmp gets cleaned up every 10 days. From what I can tell this is the default installation. So, what I'm concerned about is that I have the following situation:
Write the temp file
/tmp gets cleaned up
Move the temp file to a new location (explodes)
Is my concern founded? If so, how is this problem solved in sophisticated programs? If there are no sophisticated tricks, then it's a bit puzzling to me as I don't have a concrete picture of what the utility of /tmp is if it can be blown away completely at any moment.
This should not be a problem if you keep a file descriptor open during your operation. As long as a file descriptor is open, the filesystem keeps the file on disk; it just doesn't appear when using ls. So if you create another name for the file, it will "resurrect" in some way. Keeping an open fd on a file that has been deleted is a common way to create temporary files on Linux.
See man 3 unlink:

The unlink() function shall remove a link to a file. [..] unlink() shall remove the link named by the pathname pointed to by path and shall decrement the link count of the file referenced by the link.

When the file's link count becomes 0 and no process has the file open, the space occupied by the file shall be freed and the file shall no longer be accessible. If one or more processes have the file open when the last link is removed, the link shall be removed before unlink() returns, but the removal of the file contents shall be postponed until all references to the file are closed.
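A sketch of what that looks like in practice: tempfile.TemporaryFile keeps only an open descriptor (on Linux the name is gone immediately), so a /tmp sweep has nothing to remove, yet the data stays alive as long as the handle is open. The destination path below is illustrative:

import shutil
import tempfile

with tempfile.TemporaryFile(mode='w+', dir='/tmp') as tmp:
    tmp.write('precious data')
    tmp.flush()
    tmp.seek(0)
    # Copy via the open handle, not via a name that may have vanished.
    with open('/path/to/destination', 'w') as dst:   # destination is illustrative
        shutil.copyfileobj(tmp, dst)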

File updates between open() and first read() are not read

External updates made to the file's content between open() and the first read() are not returned by read().
How can I get the latest file content from the read()?
I've tried flush() and seek(0), but they didn't help.
https://repl.it/repls/RealGreedyTransfer#main.py
import time

def myfoo(handle):
    print("myfoo started", flush=True)
    time.sleep(50)
    # External updates that happen during this time don't show up in read()
    # handle.flush()
    # handle.seek(0)
    # can't close and re-open the file handle
    print(handle.read())  # <-- Not reading updates done after file open

# Upstream code base passes the file handle under an exclusive fcntl.lockf() lock
handle = open('temp.txt', 'r+')
myfoo(handle)
The issue is with the way files are written. Many text editors don't just write to the file; they use a different method: they write to a temporary file and then rename it to the original filename. Since renames are atomic in POSIX, in the event of a system crash during saving, the old version of the file will be available, and the new version might or might not be available in the temporary file.
For most purposes, this works as desired. The only exception is in this case, where you’re holding onto a file handle. Renames/moves/deletions do not affect the file handles, they are still open with the file they were opened with, even if that file is no longer accessible from the filesystem. You can experiment with this by opening a file, then removing it with rm, and then reading from the file — it will still show you the file contents from before you deleted it. You can also access the file in Linux inside /proc/XX/fd.
Your file handle won’t see changes, unless they are actually written (and flushed) to the same file (without the rename dance). If you’re working with something that writes by renaming, you would need to reopen the file to see the new contents.
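One hedged way to act on that in Python is to compare inodes and reopen only when the file behind the path has changed; the helper below is illustrative, not part of any library:

import os

def freshest_handle(path, handle):
    # Hypothetical helper: if the inode behind `path` no longer matches
    # our handle, the file was swapped out by a rename; reopen it.
    ours = os.fstat(handle.fileno())
    theirs = os.stat(path)
    if (ours.st_dev, ours.st_ino) != (theirs.st_dev, theirs.st_ino):
        handle.close()
        handle = open(path, 'r')
    else:
        handle.seek(0)   # same file: rewind to pick up in-place updates
    return handle

One caveat for the setup in the question: closing and reopening releases any fcntl.lockf() lock held through the old descriptor, so the lock would have to be re-acquired on the new handle.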

Why do I get no error when running the same Python script on multiple terminals at the same time?

I know from experience that if I try to open the same file in Vim in multiple terminals at the same time, I get an error. (Maybe because of temporary files?)
And I know from experience that if I open a text file in Python and read through it, I have to reset the pointer when I'm done.
But I've found that if I run the same Python script in multiple terminals at the same time, I don't get any error; it just successfully runs the script in both. How does this work? Doesn't Python need to read my script from the beginning in order to run it? Is the script copied to a temporary file, or something?
I know from experience that if I try to open the same file in Vim in multiple terminals at the same time, I get an error.
That's not actually true. Vim will let you open the same file in multiple terminals at the same time; it just gives you a warning first to let you know that this is happening, so you can abort before you make changes. (It's not safe to modify the file concurrently in two different instances of Vim, because the two instances won't coordinate at all.)
Furthermore, Vim will only give you this warning if you try to open the same file for editing in multiple terminals at the same time. It won't complain if you're just opening the file for reading (using the -R flag).
And I know from experience that if I open a text file in Python and read through it, I have to reset the pointer when I'm done.
That's not exactly true, either. If you make multiple separate calls to open, you'll have multiple separate file objects, and each separately maintains its position in the file. So something like
with open('filename.txt', 'r') as first:
    with open('filename.txt', 'r') as second:
        print(first.read())
        print(second.read())
will print the complete contents of filename.txt twice.
The only reason you'd need to reset the position when you're done reading a file is if you want to use the same file object to read the file again, or if you've opened the file in read/write mode (r+ rather than r) and you now want to switch from reading to writing.
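For completeness, a tiny sketch of that single-file-object case:

with open('filename.txt', 'r') as f:
    first_pass = f.read()
    f.seek(0)               # rewind the one shared position
    second_pass = f.read()  # reads the whole file again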
But I've found that if I run the same Python script in multiple terminals at the same time, I don't get any error; it just successfully runs the script in both. How does this work? Doesn't Python need to read my script from the beginning in order to run it? Is the script copied to a temporary file, or something?
As I think should now be clear — there's no problem here. There's no reason that two instances of Python can't both read the same script file at the same time. Linux allows that. (And in fact, if you delete the file, Linux will keep the file on disk until all programs that had it open have either closed it or exited.)
In fact, there's also no reason that two processes can't write to the same file at the same time, though here you have to be very careful to avoid the processes causing problems for each other or corrupting the file.
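If two processes do have to write the same file, POSIX advisory locks are the usual coordination tool; a minimal sketch with fcntl.flock (the file name is illustrative, and every writer must opt in for the lock to mean anything):

import fcntl

with open('shared.log', 'a') as f:
    fcntl.flock(f, fcntl.LOCK_EX)   # block until we hold the exclusive lock
    try:
        f.write('one whole record\n')
        f.flush()
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)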
The terminal is just running the command you told it to execute; there is no shared file pointer or anything like that involved.

Why file is accessible after deleting in unix?

I have been thinking about a concurrency issue (in Solaris): what happens if, while someone is reading a file, someone else tries to delete it? I have a question regarding file existence in Solaris/Linux. Suppose I have a file test.txt open in the vi editor; then I open a duplicate session and remove that file, but even after deleting it I am able to read it. So here are my questions:
Do I need to think about a locking mechanism while reading, so that no one is able to delete the file while it is being read?
Why is the behavior different from Windows (where, if a file is open in some editor, we cannot delete it)?
After removing the file, how am I still able to read it if I haven't closed it in the vi editor?
I am asking about files in general, but yes, platform-specific, i.e. Unix. What will happen if I am using a Java program (BufferedReader) to read a file and the file is deleted while reading? Is the reader still able to read the next chunk of the file or not?
You have basically 2 or 3 unrelated questions there. Text editors like to read the whole file into memory at the start of the editing session. Imagine every character you type being saved to disk immediately, with all characters after it in the file being rewritten one place further along to make room. That would be awful. Much better that the thing you're actually editing is a memory representation of the file (array of pointers to lines, probably with some metadata attached) which only gets converted back into a linear stream when you explicitly save.
Any relatively recent version of vim will notify you if the file you are editing is deleted from its original location with the message
E211: File "filename" no longer available
This warning is not just for unix. gvim on Windows will give it to you if you delete the file being edited. It serves as a reminder that you need to save the version you're working on before you exit, if you don't want the file to be gone.
(Note: the warning doesn't appear instantly - vim only checks for the original file's existence when you bring it back into the foreground after having switched away from it.)
So that's question 1, the behavior of text editors - there's no reason for them to keep the file open for the whole session because they aren't actually using it except at startup and during a save operation.
Question 2, why do some Windows editors keep the file open and locked - I don't know, Windows people are nuts.
Question 3, the one that's actually about unix, why do open files stay accessible after they're deleted - this is the most interesting one. The answer, guaranteed to shock you when presented directly:
There is no command, function, syscall, or any other method which actually requests deletion of a file.
Underlying rm and any other command that may appear to delete a file there is the system call unlink. And it's called unlink, not remove or deletefile or anything similar, because it doesn't remove a file. It removes a link (a.k.a. directory entry) which is an association between a file and a name in a directory. (Note: ANSI C added remove as a more generic function to appease non-unix people who had no intention of implementing unix filesystem semantics, but on unix, remove is just a rmdir if the target is a directory, and unlink for everything else.)
A file can have multiple links (see the ln command for how they are created), which means that the same file is known by multiple names. If you rm one of them, the others stick around and the file is not deleted. What happens when you remove the last link? Well, now you have a file with no name. But names are only one kind of reference to a file. There are at least 2 others: file descriptors and mmap regions. When the last reference to a file goes away, that's when the file is deleted.
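You can watch hard links behave this way from Python (file names are illustrative):

import os

with open('a.txt', 'w') as f:
    f.write('shared contents')
os.link('a.txt', 'b.txt')    # second name, same file
os.unlink('a.txt')           # rm one name...
with open('b.txt') as f:
    print(f.read())          # ...the file lives on under the other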
Since references come in several forms, there are many kinds of events that can cause a file to be deleted. Here are some examples:
unlink (rm, etc.)
close file descriptor
dup2 (can implicitly close a file descriptor before replacing it with a copy of a different file descriptor)
exec (can cause file descriptors to be closed via close-on-exec flag)
munmap (unmap memory region)
mmap (if you create a new memory map at an address that's already mapped, the old mapping is unmapped)
process death (which closes all file descriptors and unmaps all memory mappings of the process)
normal exit
fatal signal generated by the kernel (^C, segfault)
fatal signal sent from another process (kill)
I won't call that a complete list. And I don't encourage anyone to try to build a complete list. Just know that rm is "remove name", not "remove file", and files go away as soon as they're not in use.
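A short demonstration of "remove name, not remove file" (assumes a writable current directory):

import os

f = open('demo.txt', 'w+')
f.write('still here')
f.flush()
os.unlink('demo.txt')                 # no names left...
print(os.fstat(f.fileno()).st_nlink)  # ...link count is now 0
f.seek(0)
print(f.read())                       # ...but the open fd still reads the data
f.close()                             # last reference gone: now the space is freed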
If you want to destroy the contents of a file immediately, truncate it. All processes already using it will find that its size has suddenly become 0. (This is destruction as far as the normal file access methods are concerned. To destroy it more thoroughly so that even someone with raw disk access can't read what used to be there, you need to overwrite it. There's a tool called shred for that.)
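Truncation, by contrast, destroys the contents through every route at once; in Python it is a single call (the path is illustrative):

import os

os.truncate('secret.txt', 0)   # every process with it open now sees size 0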
I think your question has nothing to do with the difference between Windows and Linux; it's about how vi works.
When you use vi to edit a file, vi creates a .swp file, and the .swp file is what you are actually editing. Meanwhile, if another user deletes the original file, it will not affect your editing.
When you type :w in vi, vi uses the .swp file to overwrite the original file.

Deleting an opened file in Infinite looped process

I have a doubt about the following scenario.
Scenario:
A process or program starts by opening a file in write mode and entering an infinite loop, for example while(1), whose body has logic to write to the opened file.
Problem: what if I delete the opened or created file soon after the process enters the infinite loop?
In Unix, users cannot really delete files; they can only drop references to files. The kernel deletes the file when there are no references (hard links and open file descriptors) left.
From what you're saying, it sounds like in reality you don't want an infinite loop, but rather a while loop with some flag, something to the effect of
while (file exists)
perform operation
Add a line that checks to see if the file exists during the while loop. If it doesn't exist, kill the loop.
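A runnable sketch of that flag-style loop, with illustrative names:

import os
import time

path = 'out.txt'
with open(path, 'w') as f:
    while os.path.exists(path):   # someone unlinking the file ends the loop
        f.write('working...\n')
        f.flush()
        time.sleep(1)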
It appears that what happens is that your file disappears (basically).
Try this: create a file test.py and put the following in it:
import os

f = open('out.txt', 'w')   # Open the file for writing
f.write("Hi Mom!")         # Write something
os.remove('out.txt')       # Delete the file (drops its only name)
try:
    while True:            # Write forever to the now-nameless file
        f.write("Silly English Kanighit!")
except KeyboardInterrupt:  # Ctrl-C lands here
    f.close()
Then run $ python test.py and hit enter; Ctrl-C should stop the execution. This will open and then delete the file, then continue writing to the file that no longer exists, for the reasons mentioned previously.
However, if you really have a different question such as "How can I prevent my file from being accidentally deleted while I'm writing to it?" or something else, it's probably better to ask that question.
