I've been studying some of the Linux implementation and a question came to mind.
As far as I know, there is a bit which marks a file as a temporary file. When the process which generated that file dies, how does the kernel delete it? I've been thinking that it might be related to the file descriptor table, but I'm not sure.
If someone could give a step-by-step explanation, that would come in handy!
There's no bit that marks a file as a temporary file.
Every inode has a link count field, which is the number of directory entries that refer to the file. Every time you make a hard link to a file this count is increased, and when you remove a name it's decreased; when the count goes to zero, the file is deleted (the inode is marked as available, and all the data blocks are put on the free list).
When a file is opened in a process, a copy of the inode is kept in the kernel's file table, and the number of file handles that refer to it is added to the link count in this copy. When a process closes its file descriptor, this in-memory count is decremented. The file isn't actually removed until the in-memory count drops to zero. This is what keeps a file on disk while it's open, even if all the names are removed.
So when a program creates a temporary file, it performs the following steps:
1. Creates the file. The on-disk inode link count = 1.
2. Opens the file. The kernel inode link count = 2.
3. Removes the filename. The kernel inode link count = 1.
At this point, the process can keep using the temporary file, but it can't be opened by another process because it has no name.
When the process closes the file handle, the link count goes to 0, and the file is deleted.
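The steps above can be watched from user space. Here is a minimal sketch in Python, assuming a POSIX system; the function name unlinked_tempfile_demo is made up for illustration. Note that st_nlink only reports the on-disk link count; the kernel's in-memory count is not visible through stat.

```python
import os
import tempfile


def unlinked_tempfile_demo():
    """Create a file, unlink its name while it is open, and keep using it."""
    # Steps 1 and 2: mkstemp creates the file and returns an open descriptor.
    fd, path = tempfile.mkstemp()
    try:
        links_before = os.fstat(fd).st_nlink   # one directory entry

        # Step 3: remove the only name; the open descriptor keeps the inode alive.
        os.unlink(path)
        links_after = os.fstat(fd).st_nlink    # no directory entries left
        name_exists = os.path.exists(path)

        # The file is still fully usable through the descriptor.
        os.write(fd, b"still here")
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)
    finally:
        # Closing the last descriptor lets the kernel free the inode and blocks.
        os.close(fd)
    return links_before, links_after, name_exists, data
```

On a typical Linux filesystem the on-disk count should go from 1 to 0 while reads and writes keep working.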
Recent versions of Linux have an O_TMPFILE flag to open(2) that automates this. Instead of specifying a filename, you just specify the directory, which is just used to find a filesystem to hold the file data. When this is used, it effectively does all 3 steps above in one call, but it never actually creates the filename anywhere (so race conditions and name conflicts are avoided).
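A rough sketch of how O_TMPFILE might be used from Python, where it is exposed as os.O_TMPFILE on Linux. The helper name open_anonymous is hypothetical, and the fallback to the classic create/unlink sequence is my own assumption about what a caller would want when the kernel or filesystem lacks support.

```python
import os
import tempfile


def open_anonymous(directory):
    """Return a read/write descriptor for a file that has no name in `directory`."""
    flag = getattr(os, "O_TMPFILE", None)  # only defined on Linux builds of Python
    if flag is not None:
        try:
            # The path names a directory; it only selects the filesystem.
            return os.open(directory, flag | os.O_RDWR, 0o600)
        except OSError:
            pass  # kernel or filesystem without O_TMPFILE support
    # Fallback: the classic create, open, unlink sequence.
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)
    return fd
```

Either way the caller ends up with a descriptor whose file has a link count of 0 and never shows up in a directory listing once the call has returned.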
I've been doing some research on the topic and I found out some extra info to complement the answer that Barmar provided.
I read about the tmpfile() function. It is a C library function rather than a system call; it creates a temporary file and returns a stream (a FILE *).
The thing is that tmpfile calls unlink internally, thus decrementing the link count. Although this link was the last one, if the file has been opened by any process it remains in existence until it is closed. I'm not 100% sure how this works internally, but I think it is due to the order in which the iput algorithm checks the reference count and the link count. In the implementations of iput I've seen, it first checks whether the reference count equals zero and, only if so, goes on to the link count, deallocating all blocks assigned to the file if that equals zero too.
So in this situation the reference count would be 1, because a process still has the file open, but the link count would be zero. So iput would not release the i-node until the process closes the file.
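To make the check order concrete, here is a toy model in Python. It is not kernel code: the class ToyInode, its fields and the function tmpfile_lifecycle are hypothetical names, and a real iput does far more (locking, writeback, truncation).

```python
class ToyInode:
    """Toy in-memory inode with the two counters discussed above."""

    def __init__(self):
        self.nlink = 1      # link count: directory entries naming the file
        self.refcount = 0   # reference count: users holding the in-memory inode
        self.freed = False  # set once inode and data blocks are released

    def open(self):
        self.refcount += 1

    def unlink(self):
        self.nlink -= 1

    def iput(self):
        """Drop one reference; release storage only if nothing is left."""
        self.refcount -= 1
        if self.refcount == 0:      # checked first: is anyone still using it?
            if self.nlink == 0:     # then: is any name left?
                self.freed = True   # deallocate blocks, free the inode


def tmpfile_lifecycle():
    inode = ToyInode()
    inode.open()            # the process opens the new file
    inode.unlink()          # tmpfile-style unlink: nlink drops to 0
    freed_while_open = inode.freed
    inode.iput()            # the process closes the file
    return freed_while_open, inode.freed
```

In this model the unlink alone frees nothing; the release only happens in the iput triggered by the close.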
I use tempfile.mkstemp when I need to create files in a directory which might stay, but I don't care about the filename. It should only be something that doesn't exist yet and has a given prefix and suffix.
One part of the documentation that I have ignored so far is:
mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) and the absolute pathname of that file, in that order.
What is the OS-level handle and how should one use it?
Background
I always used it like this:
import os
from tempfile import mkstemp

_, path = mkstemp(prefix=prefix, suffix=suffix, dir=dir)  # the OS-level handle in _ is never closed
with open(path, "w") as f:
    f.write(data)
# do something
os.remove(path)
It worked fine so far. However, today I wrote a small script which generates huge files and deletes them. The script aborted the execution with the message
OSError: [Errno 28] No space left on device
When I checked, there were 80 GB free.
My suspicion is that os.remove only "marked" the files for deletion, but the files were not properly removed. And the next suspicion was that I might need to close the OS-level handle before the OS can actually free that disk space.
Your suspicion is correct. The os.remove only removes the directory entry that contains the name of the file. However, the file data remains intact and continues to consume space on the disk until the last open descriptor on the file is closed. During that time normal operations on the file through existing descriptors continue to work, which means you could still use the _ descriptor to seek in, read from, or write to the file after os.remove has returned.
In fact it's common practice to immediately os.remove the file before moving on to using the descriptor to operate on the file contents. This prevents the file from being opened by any other process, and also means that the file won't be left hanging around if this program dies unexpectedly before reaching a later os.remove.
Of course that only works if you're willing and able to use the low-level descriptor for all of your operations on the file, or if you use the os.fdopen method to construct a file object on top of the descriptor and use that new object for all operations. Obviously you only want to do one of those things; mixing descriptor access and file-object access to the same underlying file can produce unexpected results.
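os.fdopen(_) should execute faster than open(path), since it reuses the descriptor that is already open instead of looking the path up again. In Python 3 it returns an ordinary file object (os.fdopen is an alias of the built-in open that takes a descriptor), so it can be used directly in a with construct, and leaving the with block closes the descriptor.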
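For reference, one way to restructure the snippet from the question so that the descriptor is not leaked. This is a sketch assuming Python 3, where the object returned by os.fdopen works as a context manager; the wrapper function write_and_discard is just an illustrative name.

```python
import os
from tempfile import mkstemp


def write_and_discard(data, prefix="tmp", suffix="", dir=None):
    """Write `data` to a temporary file, then delete it without leaking the descriptor."""
    fd, path = mkstemp(prefix=prefix, suffix=suffix, dir=dir)
    try:
        # fdopen wraps the existing descriptor; leaving the with block closes it.
        with os.fdopen(fd, "w") as f:
            f.write(data)
        # ... do something with the file at `path` ...
        return os.path.getsize(path)
    finally:
        os.remove(path)  # no descriptor is open, so the space is freed right away
```

Because the only descriptor is closed before os.remove runs, the disk space is released as soon as the name is gone.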
Now I know how file deletion works in Linux.
In ext2 the inode is just marked "unused", while ext3 not only marks it "unused" but also sets the size and block pointers to zero.
But I wonder: when I create a hard link to a file and then delete the original file, will the inode be marked "unused"?
Or will that only happen once all hard links have been deleted?
Thanks.
i-nodes contain a link count (visible in ls -l output). Each hard link increments that count. Unlinking (removing a link, be it the original filename->inode link, or some hard link added later, which is the only thing users can request) decrements the count. The file won't be deleted until the count reaches 0 and there are no open file descriptors left pointing at that file (which is similarly tracked by an in-kernel reference count).
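The link count can be followed with os.stat. A small sketch, assuming a filesystem that supports hard links; the file names and the function name link_count_demo are arbitrary.

```python
import os
import tempfile


def link_count_demo():
    """Track st_nlink while names are added to and removed from one inode."""
    counts = []
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        extra = os.path.join(d, "hardlink")
        with open(original, "w") as f:
            f.write("payload")
        counts.append(os.stat(original).st_nlink)  # one name

        os.link(original, extra)                   # add a second name
        counts.append(os.stat(original).st_nlink)

        os.remove(original)                        # drop the first name
        counts.append(os.stat(extra).st_nlink)     # inode survives via the other name
        with open(extra) as f:
            content = f.read()
    return counts, content
```

Removing the "original" name only takes the count from 2 back to 1, and the data is still readable through the remaining name.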
I came across this and this question about deleting open files in Linux.
However, I'm still confused about what happens in RAM when a process (call it A) deletes a file that is opened by another process B.
What baffles me is this (my analysis could be wrong, please correct me if so):
When a process opens a file, a new entry for that file in the UFDT is created.
When a process deletes a file, all the links to the file are gone
especially, we have no reference to its inode, thus, it gets removed from the GFDT
However, when modifying the file(say writing to it) it must be updated in the disk(since its pages gets modified/dirty), but it got no reference in the GFDT because of the earlier delete, so we don't know the inode to it.
The question is: why is the "deleted" file still accessible by the process which opened it? And how does the operating system do that?
EDIT: By UFDT I mean the file descriptor table of the process, which holds the file descriptors of the files opened by that process (each process has its own UFDT). The GFDT is the global file descriptor table; there is only one GFDT in the system (in RAM in our case).
I had never really heard of those UFDT and GFDT acronyms, but your view of the system sounds mostly right. I think your description lacks some detail on how open files are managed by the kernel, and perhaps this is where your confusion comes from. I'll try to give a more detailed description.
First, there are three data structures used to keep track of and manage open files:
Each process has a table of file descriptors. Each entry in this table stores a file descriptor and the file descriptor flags (as of now, the only flag is FD_CLOEXEC). The file descriptor is just a pointer to an entry in the file table, which I cover next. The integer returned by open(2) and family is usually an index into this file descriptor table. Each process has its own table, which is why open(2) and family may return the same value for different processes opening different files.
There is one opened files table in the entire system. Each file descriptor table entry of each process references one of these entries in the opened files table. There is one entry in this table for each opened file: if two processes open the same file, two entries in this global table are created, even though it's the same file. Each entry in the files table stores the file status flags (opened for reading, writing, appending, etc), and the current file offset. This is why different processes can read from and write to different offsets in the same file concurrently as long as each of them opens the file.
Each file table entry also references an entry in the vnode table. The vnode table is a global table that has one entry for each unique file. If processes A, B, and C open file D, there will be only one vnode table entry, referenced by all 3 of the file table entries (in Linux, there is really no vnode, rather there is an inode, but let's keep this description generic and conceptual). The vnode entry contains pretty much the same information as the traditional inode (file size, other attributes, etc.), but it also contains other information useful for opened files, such as file locks that are active, who owns them, which portions of the file they lock, etc. This vnode entry also stores pointers to the file's data blocks on disk.
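The split between descriptor table and file table can be seen from user space through file offsets. A sketch in Python, assuming a POSIX system; os.dup is not discussed above and is only used here as a contrast, since it creates a second descriptor for the same file table entry. The function name is made up.

```python
import os
import tempfile


def offset_sharing_demo():
    """Compare offsets of two independent opens with a dup'ed descriptor."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()

        a = os.open(tmp.name, os.O_RDONLY)  # new file table entry
        b = os.open(tmp.name, os.O_RDONLY)  # another file table entry, same file
        c = os.dup(a)                       # second descriptor for a's entry
        try:
            os.read(a, 4)                   # advances the offset of a's entry only
            pos_a = os.lseek(a, 0, os.SEEK_CUR)
            pos_b = os.lseek(b, 0, os.SEEK_CUR)
            pos_c = os.lseek(c, 0, os.SEEK_CUR)
        finally:
            for fd in (a, b, c):
                os.close(fd)
    return pos_a, pos_b, pos_c
```

The second open keeps its own offset, while the dup'ed descriptor moves together with the one it was copied from.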
Deleting a file consists of calling unlink(2). This function unlinks a file from a directory. Each file inode on disk has a count of the number of links pointing to it; the file is only really removed if the link count reaches 0 and it is not open (directories are a special case: an empty directory already has a link count of 2, since it references itself and is also referenced by its parent). In fact, the manpage for unlink(2) is very specific about this behavior:
unlink - delete a name and possibly the file it refers to
So, instead of looking at unlinking as deleting a file, look at it as deleting a file name, and maybe the file it refers to.
When unlink(2) detects that there is an active vnode table entry referring to this file, it doesn't delete the file from the filesystem. Only the directory entry goes away. Yes, you can't find the file on your filesystem anymore. find(1) won't find it. You can't open it in new processes.
But the file is still there. It just doesn't appear in any directory entry.
For example, if it's a huge file and you run df, you will see that space usage is the same (du, which walks the directory tree, no longer counts the file because it has no name). The file is still there, on disk; you just can't reach it.
So, any reads or writes take place as usual - the file data blocks are accessible through the vnode table entry. You can still know the file size. And the owner. And the permissions. All of it. Everything's there.
When the process terminates or explicitly closes the file, the operating system checks the inode. If the number of links pointing to the inode is 0 and this was the last process that had the file open (which is tracked by a reference count stored in the vnode table entry), then the file is purged.
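The scenario from the question, with a real second process doing the delete, can be sketched like this. It assumes a POSIX system and that sys.executable points at a working Python interpreter; the function name is made up.

```python
import os
import stat
import subprocess
import sys
import tempfile


def deleted_by_other_process():
    """Process B (this one) keeps a file open while process A (a child) unlinks it."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abc")
        # Process A: a separate interpreter that only unlinks the name.
        subprocess.run(
            [sys.executable, "-c", "import os, sys; os.unlink(sys.argv[1])", path],
            check=True,
        )
        gone = not os.path.exists(path)

        # Process B still reaches the inode through its descriptor.
        os.write(fd, b"def")
        info = os.fstat(fd)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 16)
        return gone, info.st_size, stat.S_IMODE(info.st_mode), data
    finally:
        os.close(fd)  # last reference dropped: now the kernel frees the file
```

After the child has removed the name, the size, permissions and contents are all still available through fstat and read on the open descriptor.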
When a process opens a file, a new entry for that file in the UFDT is
created.
What is this weird acronym? I take it you mean the process in question has a file descriptor.
When a process deletes a file, all the links to the file are gone
especially, we have no reference to its inode, thus, it gets removed
from the GFDT
What on earth is GFDT?
However, when modifying the file(say writing to it) it must be updated
in the disk(since its pages gets modified/dirty), but it got no
reference in the GFDT because of the earlier delete, so we don't know
the inode to it.
I am guessing whatever this GFDT is has something to do with being "global" and "file descriptors".
So, all this shows serious misconceptions.
As was outlined by your own question, the file is a different thingy from the name. Next, when you open something from a filesystem, an in-memory representation of the inode is obtained and a struct file object is allocated, which points to the in-memory inode. Finally, the file descriptor table of the relevant process is updated to store the pointer to the struct file object at a given index. That index is known as a file descriptor.
So there. The number of names associated with an inode has zero relation to the kernel's ability to issue reads/writes affecting the inode (or the blocks of the file it represents), as long as the file was opened before the last name got removed.
The file can be trashed only when there are no names left and the kernel no longer uses it.
POSIX famously lets processes rename and unlink file entries with no regard as to the effects on others using them, whilst Windows by default raises an error if you even try to touch the timestamps of a directory which has a file handle open somewhere deep inside.
However Windows doesn't need to be so conservative. If you open all your file handles with FILE_FLAG_BACKUP_SEMANTICS and FILE_SHARE_DELETE and take care to rename files to random names just before flagging deletion, you get POSIX semantics including lack of restriction on manipulating file paths containing open file handles.
One very nifty thing Windows can do is to perform renames and deletes and hard links only using an open file descriptor, and therefore you can delete a file without having to worry about whether another process has renamed it or any of the directories in the path preceding the file's location. This facility lets you perform completely race free file deletions - once you have an open file handle to the right file, you can stop caring about what other processes are doing to the filing system, at least for deletion (which is the most important as it implicitly involves destroying data).
This raises the question of what about POSIX? On POSIX unlink() takes a path, and between retrieving the current path of a file descriptor using /proc/self/fd/x or F_GETPATH and calling unlink() someone may have changed that path, thus potentially leading to the wrong file being unlinked and data lost.
A considerably safer solution is this:
Get one of the current paths of the open file descriptor using /proc/self/fd/x or F_GETPATH etc.
Open its containing directory.
Do an fstatat() on the containing directory for the leafname of the open file descriptor, checking whether the device IDs and inodes match.
If they match, do an unlinkat() to remove the leafname.
This is race safe from the parent directory upwards, though the hard link you delete may not be the one expected. However, it is not race safe if within the containing directory a third party process were to rename your file to something else and rename another file to your leafname between you checking for inode equivalence and calling the unlinkat(). Here the wrong file could be deleted, and data lost.
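The procedure above might look roughly like this in Python, where os.stat and os.unlink with a dir_fd argument correspond to fstatat() and unlinkat(). The function name unlink_checked is hypothetical, the caller is assumed to have obtained `path` already (on Linux, for example, via os.readlink on /proc/self/fd/N), and the race inside the containing directory described above is still present.

```python
import os


def unlink_checked(fd, path):
    """Unlink `path` only if it currently names the file open on `fd`.

    Returns True if the entry was removed. The check and the unlink are two
    separate system calls, so the race inside the directory remains.
    """
    parent, leaf = os.path.split(path)
    dir_fd = os.open(parent or ".", os.O_RDONLY | os.O_DIRECTORY)
    try:
        want = os.fstat(fd)
        try:
            # fstatat(dir_fd, leaf, AT_SYMLINK_NOFOLLOW)
            have = os.stat(leaf, dir_fd=dir_fd, follow_symlinks=False)
        except FileNotFoundError:
            return False
        if (have.st_dev, have.st_ino) != (want.st_dev, want.st_ino):
            return False                # the name now refers to some other file
        os.unlink(leaf, dir_fd=dir_fd)  # unlinkat(dir_fd, leaf, 0)
        return True
    finally:
        os.close(dir_fd)
```

Holding the directory open by descriptor is what protects against renames higher up the path; nothing here closes the window between the stat and the unlink.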
I therefore ask the question: can POSIX, or any specific POSIX implementation such as Linux, allow programs to unlink file entries completely race free? One solution could be to unlink a file entry by open file descriptor, another could be to unlink a file entry by inode, however google has not turned up solutions for either of those. Interestingly, NTFS does let you delete by a choice of inode or GUID (yes NTFS does provide inodes, you can fetch them from the NT kernel) in addition to deletion via open file handle, but that isn't much help here.
In case this seems like too esoteric a question, this problem affects proposed Boost.AFIO where I need to determine what filing system races I can mitigate and what I cannot as part of its documented hard behaviour guarantees.
Edit: Clarified that there is no canonical current path of an open file descriptor, and that in this use case we don't care - we just want to unlink some one of the links for the file.
No replies to this question, and I have spent several days trawling through Linux source code. I believe the answer is "currently you can't unlink a file race free", so I have opened a feature request at https://bugzilla.kernel.org/show_bug.cgi?id=93441 to have Linux extend unlinkat() with the AT_EMPTY_PATH Linux extension flag. If they accept that idea, I'll mark this answer as the correct one.
When you copy files in Linux (using the context menu's copy command), does Linux create hard links to the files?
Also, what happens if you delete the original file and then the hard link? Does the file still persist in memory with only its pointer removed?
I have trouble understanding a few things about memory.
To free disk space, you need to delete both files, right?
Does a hard link point to the memory location of the original file? I have seen the term inode, but I'm not quite sure what an inode really is.
The inode holds all the information about a file except its name and its content.
A directory contains a set of names and numbers: "This directory contains file foo, which is file number 3 on this drive, bar, which is file number 4, quux, 17, viz, 123 and lastly ohmygod, 77321341". Inode number 3 contains "This file was created on January 1, 1970, last modified on January 1, 1990 and last read on January 2, 1990. It is 722 bytes large, and those bytes are in 4k block number 768123 on the drive" and a few more things.
The stat() system call shows how many blocks are needed, and almost everything else related to the inode.
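A short sketch of what stat() hands back, using Python's os.stat. The helper name inode_summary is made up, and the 512-byte unit for st_blocks is what Linux uses.

```python
import os


def inode_summary(path):
    """Return a few of the inode fields that stat() exposes for `path`."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,        # the file's number on its filesystem
        "links": st.st_nlink,      # how many names point at it
        "size": st.st_size,        # length in bytes
        "blocks": st.st_blocks,    # allocated 512-byte units
        "mtime": st.st_mtime,      # last modification time
        "atime": st.st_atime,      # last access time
    }
```

Note that the file's name is not among the fields: it lives in the directory, not in the inode.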
Copying does not create hard links, that would be broken behavior. A hard link is just an additional first-class name to the same file; modify the file via one name (and not by saving under a temp name and then moving it, as some editors do), and you will see the change in the file when accessed under the other name, too. Not what I’d expect from a copy.
Note that there is nothing special about the first name a file had. All hard links are simply pointing at the same file.
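The difference between a copy and a hard link can be checked by comparing inode numbers. A sketch in Python, assuming a filesystem with hard link support; shutil.copy stands in for whatever the file manager does, and the file names are arbitrary.

```python
import os
import shutil
import tempfile


def copy_versus_link():
    """Show that a copy gets its own inode while a hard link shares one."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        copy = os.path.join(d, "copy")
        link = os.path.join(d, "link")
        with open(original, "w") as f:
            f.write("v1")

        shutil.copy(original, copy)   # new inode, duplicated content
        os.link(original, link)       # new name, same inode

        same_inode_copy = os.stat(copy).st_ino == os.stat(original).st_ino
        same_inode_link = os.stat(link).st_ino == os.stat(original).st_ino

        with open(original, "w") as f:  # rewrite in place through one name
            f.write("v2")
        with open(copy) as f:
            copy_text = f.read()
        with open(link) as f:
            link_text = f.read()
    return same_inode_copy, same_inode_link, copy_text, link_text
```

The copy keeps the old content because it is a separate file, while the hard link sees the change because it is the same file under another name.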
Once the last directory entry pointing to a file is removed, there may still be file handles open pointing to it (from programs that opened the file). As long as one of those exists, the file is still there and can be used. It just can no longer be opened by processes that had not already opened it, since it has no name any more.
When there is no more directory entry pointing to a file and no program has an open handle to the file any more, it can never be reached again. Therefore, the operating system frees the space on the disk.