The links to a file - linux

I was wondering why every file has 1 link to itself.
I'll try to be more clear.
If you run "ls -l" in bash, you get a list of files, each preceded by several columns of data. The number of links to a file is in the second column. Can someone explain to me why that value is set to 1 instead of 0?
I get why directories have two: if you explore one you'll find the "." and ".." entries. The former points at the directory itself and the latter at the parent directory. But a file can't contain a "." entry since it's a file, so shouldn't its count be 0?

Because there is nothing special about the first hard link to a file: it's just a directory entry that refers to the inode behind it. (Soft links are different; they're really just special files containing the textual target.)
This is a consequence of the decision to separate data (inode) from references to the data (directory entry).
So don't think of the file being one directory entry, with other files linked to it. Think instead of the data in the file being the thing, with as many directory entries as you need referring to it. When you hard-link directory entry a to an existing b, you're really linking it to the inode behind b, and they are equals in every sense of the word. The data is only ever destroyed when the final hard link to it is disconnected.

The filename you see is a hard link to the underlying object that represents the file, and ones with 0 links get automatically removed by the filesystem.

You have confused the number of links pointing to the file's inode with the idea of a file having a link that points to itself (see below).
Just consider what a link is: a link is an entry in a directory that associates a name with an inode. The inode holds all the administrative data of a file (or a directory): the permission bits, the times of last read, last write and last inode change, pointers to the file's data blocks, etc.
There's one integer field in each inode that records the number of links that point to it (and this is what ls shows). Strictly speaking this is redundant information, kept for efficiency: by walking the whole filesystem we could count the directory entries that point to the same inode, but that is impractical, so the link count is updated each time an inode is linked or unlinked. So the next question is: why is the number of directory entries pointing to the file's inode necessary? It's a bookkeeping reason. The kernel needs to detect when the count reaches zero, that is, when the file is removed (unlinked) for the last time, so that it can free the blocks belonging to the inode, and the inode itself, and reclaim the space.
Normally, a file has at least one such link, but it can have more. If you execute a command like:
ln foo bar
you are creating an alias for the file foo, which is now also called bar. If you list the directory now, you'll see both entries showing 2 in the link count field. Moreover, if you execute
ls -li foo bar
you'll see a first column showing the inode number (a unique id for a file within a filesystem), and it is the same for both names. If you change the permissions through one of the links, you'll see that the permissions of the other name have changed too. This is because both filenames are aliases for the same file.
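The ln foo bar experiment can also be reproduced from a script. The following is only a minimal sketch in Python, assuming a filesystem that supports hard links; the function name hard_link_demo and the file names are made up for illustration.

```python
import os
import tempfile


def hard_link_demo():
    """Create foo, hard-link it as bar, and report what the inode looks like."""
    with tempfile.TemporaryDirectory() as workdir:
        foo = os.path.join(workdir, "foo")
        bar = os.path.join(workdir, "bar")

        with open(foo, "w") as handle:
            handle.write("hello\n")
        links_before = os.stat(foo).st_nlink  # a fresh file has one name

        os.link(foo, bar)  # same effect as `ln foo bar`
        foo_stat, bar_stat = os.stat(foo), os.stat(bar)

        os.chmod(foo, 0o600)  # change permissions through one name...
        bar_mode = os.stat(bar).st_mode & 0o777  # ...and read them via the other

        os.unlink(foo)  # drop one name; the data is still reachable via bar
        with open(bar) as handle:
            surviving_data = handle.read()

        return {
            "links_before": links_before,
            "links_after": foo_stat.st_nlink,
            "same_inode": foo_stat.st_ino == bar_stat.st_ino,
            "bar_mode": bar_mode,
            "links_after_unlink": os.stat(bar).st_nlink,
            "surviving_data": surviving_data,
        }


if __name__ == "__main__":
    for key, value in hard_link_demo().items():
        print(key, value)
```

The st_nlink field read here is the same number that ls -l prints in its link count column.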
Coming back to my first paragraph: a link count is not the same thing as a file pointing to itself. Every directory does contain an entry that points to its own inode. This is the entry . that every directory has (it is what lets you run ls without arguments, for example: with no arguments, ls lists .). That is why a directory always has a link count of at least 2: the 1 you observed for a file, plus the link from its own . entry.
In general a directory has 2 + number_of_subdirectories links: one for its entry in the parent directory, one for its own . entry, and one for the .. entry in each of its subdirectories. Check the numbers on your system; on traditional Unix filesystems this invariant holds.
For files, the count is simply the number of links (or aliases) the file has. You cannot see files with 0 links in a directory listing: such files, if they exist, are being handled by the kernel, which is in the process of erasing them (freeing the blocks they used).
Unlike files, directories cannot be given additional aliases or names, in order to preserve the hierarchical directory structure. The only exceptions are . and .., and these are maintained (and enforced) by the kernel.
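To check the 2 + number_of_subdirectories rule without counting by hand, something like the following Python sketch could be used. The function name directory_link_counts is hypothetical. Note the assumption: some filesystems (Btrfs, for example) do not track this and always report 1 for directories, so the observed value is returned next to the expected one rather than asserted.

```python
import os
import tempfile


def directory_link_counts(subdir_count=3):
    """Return (links when empty, links observed, links expected by the rule)."""
    with tempfile.TemporaryDirectory() as workdir:
        parent = os.path.join(workdir, "parent")
        os.mkdir(parent)
        empty = os.stat(parent).st_nlink  # entry in workdir + its own "."

        for index in range(subdir_count):
            # each subdirectory's ".." entry adds one link to parent
            os.mkdir(os.path.join(parent, f"sub{index}"))

        # a regular file inside the directory adds no link to it
        with open(os.path.join(parent, "plain.txt"), "w"):
            pass

        observed = os.stat(parent).st_nlink
        return empty, observed, 2 + subdir_count


if __name__ == "__main__":
    print(directory_link_counts())
```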

Related

Is it possible to read a short-lived file reliably in /tmp due to periodic cleanup?

I'm considering making my application create a file in /tmp. All I'm doing is making a temporary file that I'll copy to a new location. On a good day this looks like:
Write the temp file
Move the temp file to a new location
However on my system (RHEL 7) there is a file at /usr/lib/tmpfiles.d/tmp.conf which indicates that /tmp gets cleaned up every 10 days. From what I can tell this is the default installation. So, what I'm concerned about is that I have the following situation:
Write the temp file
/tmp gets cleaned up
Move the temp file to a new location (explodes)
Is my concern founded? If so, how is this problem solved in sophisticated programs? If there are no sophisticated tricks, then it's a bit puzzling to me as I don't have a concrete picture of what the utility of /tmp is if it can be blown away completely at any moment.
This should not be a problem if you keep a file descriptor open during your operation. As long as a file descriptor is open, the filesystem keeps the file's data on disk; the file just doesn't appear in ls anymore. So even if the cleanup removes the name, you can still read the contents through the descriptor and write them to the new location. Keeping an open fd on a file that has been deleted is a common way to create temporary files on Linux.
See man 3 unlink:
The unlink() function shall remove a link to a file. [..] unlink() shall remove the link named by the pathname pointed to by path and shall decrement the link count of the file referenced by the link.
When the file's link count becomes 0 and no process has the file open, the space occupied by the file shall be freed and the file shall no longer be accessible. If one or more processes have the file open when the last link is removed, the link shall be removed before unlink() returns, but the removal of the file contents shall be postponed until all references to the file are closed.
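The behaviour described in the man page can be observed with a few lines of Python. This is a minimal sketch, assuming a POSIX system; the function name read_after_unlink is made up, and the os.unlink call stands in for the periodic /tmp cleanup.

```python
import os
import tempfile


def read_after_unlink():
    """Write a temp file, remove its only name, then read it via the open fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"temporary data\n")
        os.unlink(path)  # stands in for the /tmp cleanup removing the name
        name_gone = not os.path.exists(path)
        link_count = os.fstat(fd).st_nlink  # links left on the open inode
        os.lseek(fd, 0, os.SEEK_SET)  # rewind and read through the descriptor
        data = os.read(fd, 1024)
    finally:
        os.close(fd)  # only now can the kernel free the blocks
    return name_gone, link_count, data


if __name__ == "__main__":
    print(read_after_unlink())
```

The data stays readable until the descriptor is closed, so it can still be copied to its final location.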

Copying a file, but appending index if file exists

I have several directories with filenames being the same, but their data inside is different.
My program identifies these files (among many others) and I would like to copy all the matches to the same directory.
I am using shutil.copy(src, dst) but I don't want to overwrite files that already exist in that directory (previous matches) if they have the same name. I'd like to be able to append an integer if the name already exists, similar to the behavior in Windows 10 where you can "keep both versions" when you copy.
So for example, if I have file.txt in several places, the first time it would copy into dst directory it would be file.txt, the next time it would be file-1.txt (or something similar), and the next time it would be file-2.txt.
Are there any flags for shutil.copy or some other copy mechanism in Python that I could use to accomplish this?
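As far as I know shutil.copy has no flag for this, so the naming has to be done by hand. Below is one possible sketch, not a definitive implementation: the helper name copy_keep_both and the "-N" suffix format are assumptions taken from the example in the question. It reserves each candidate name with open mode "x", which fails if the file already exists, so two matches cannot claim the same name.

```python
import os
import shutil
import tempfile


def copy_keep_both(src, dst_dir):
    """Copy src into dst_dir as name.ext, name-1.ext, name-2.ext, ..."""
    base, ext = os.path.splitext(os.path.basename(src))
    candidate = os.path.join(dst_dir, base + ext)
    index = 0
    while True:
        try:
            # "x" mode raises FileExistsError instead of overwriting,
            # so a successful open reserves the name for this copy
            with open(candidate, "xb"):
                pass
        except FileExistsError:
            index += 1
            candidate = os.path.join(dst_dir, f"{base}-{index}{ext}")
            continue
        shutil.copy(src, candidate)  # fills the empty placeholder
        return candidate


def demo():
    """Copy three different file.txt files into one destination directory."""
    with tempfile.TemporaryDirectory() as root:
        dst_dir = os.path.join(root, "dst")
        os.mkdir(dst_dir)
        copied = []
        for folder, text in (("a", "first"), ("b", "second"), ("c", "third")):
            os.mkdir(os.path.join(root, folder))
            src = os.path.join(root, folder, "file.txt")
            with open(src, "w") as handle:
                handle.write(text)
            copied.append(os.path.basename(copy_keep_both(src, dst_dir)))
        with open(os.path.join(dst_dir, "file-1.txt")) as handle:
            second_content = handle.read()
        return copied, second_content


if __name__ == "__main__":
    print(demo())
```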

Ability to delete a file, judging from the ls -l output

Down below there are four outputs of the ls -l command for the file and its parent directory.
In which of the four scenarios can student1 delete file1? (the answer is in red)
I don't understand why the answer is the red one. Moreover, what does it mean when the permissions part of the ls -l output contains only - and nothing else? Is it just saying that no one has permission for this file, and if so, why is it still the answer?
Deleting a file is not an operation on the file, but on the directory. This is because the "file" entry in the directory is not a file; it is just a reference to the file (semantics are odd because of the overloaded meaning of the word "file" and the imprecision in common usage). In order to delete a file (i.e., remove a reference to it), you just need write and execute permission on the directory the file is in. Hence scenario 1 in your case.
Note that removing a reference (a "link") to a file in one directory only results in the deletion of that file if that is the last reference in the file system. That reference count is given in column 2 of the output of ls -l, so in your case the file linked to by the name "file1" in the directory "directory1" will get garbage collected by the filesystem (i.e., the data will be deleted).
Of course, the data can also be deleted if the file is overwritten or truncated, so my entire answer is based on the assumption that you use "deleted" to mean "unlinked" or "removed". Imprecise language is rampant!
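The point that unlinking depends on the directory rather than on the file can be tried out with a small Python sketch. The names may_unlink and demo_unlink_without_file_permissions are hypothetical, and the check is a simplification: it ignores the sticky bit (as set on /tmp), which adds an ownership requirement.

```python
import os
import tempfile


def may_unlink(path):
    """Rough check: write and execute permission on the parent directory."""
    parent = os.path.dirname(os.path.abspath(path))
    return os.access(parent, os.W_OK | os.X_OK)


def demo_unlink_without_file_permissions():
    """Delete a file whose own permission bits are all cleared."""
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "file1")
        with open(target, "w"):
            pass
        os.chmod(target, 0o000)  # shows up as ---------- in ls -l
        allowed = may_unlink(target)
        os.unlink(target)  # succeeds: only the directory is modified
        return allowed, os.path.exists(target)


if __name__ == "__main__":
    print(demo_unlink_without_file_permissions())
```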

What is double dot(..) and single dot(.) in Linux?

The ls -ai command shows that . and .. have their inodes the same as the current directory and parent directory, respectively.
What exactly are . and ..?
Are they real files, or even hard links? As far as I know, creating a hard link to a directory is not allowed.
. represents the directory you are in and .. represents the parent directory.
From the dot definition:
This is a short string (i.e., sequence of characters) that is added to
the end of the base name (i.e., the main part of the name) of a file
or directory in order to indicate the type of file or directory.
On Unix-like operating systems every directory contains, as a minimum,
an object represented by a single dot and another represented by two
successive dots. The former refers to the directory itself and the
latter refers to its parent directory (i.e., the directory that
contains it). These items are automatically created in every
directory, as can be seen by using the ls command with its -a option
(which instructs it to show all of its contents, including hidden
items).
They are special name-inode maps which do count as hard-links (they do increase the link-count) though they aren't really hard-links, since, as you said, directories can't have hard-links. Read more here: Hard links and Unix file system nodes (inodes)
. represents the current directory that you are using and
.. represents the parent directory.
Example:
Suppose you are in the directory /etc/mysql and you want to move to the parent directory, i.e. /etc/. Then use cd ..:
/etc/mysql> cd ..
And if you want to refer to a file in the current directory from bash, put . in front of the file name like this: ./filename
They are not hard links. You can think of them more as shorthand for this directory (.) and the parent of this directory (..).
Try to remove or rename . or .. and you will understand why they are not hard links.
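What ls -ai shows, and the suggestion to try removing ., can be checked with a short Python sketch. The function name dot_entries_demo is made up; the sketch assumes a Unix-like system, where removing a path that ends in . is refused.

```python
import os
import tempfile


def dot_entries_demo():
    """Compare the inodes behind . and .. and try to remove the . entry."""
    with tempfile.TemporaryDirectory() as parent:
        child = os.path.join(parent, "child")
        os.mkdir(child)

        dot = os.stat(os.path.join(child, "."))
        dotdot = os.stat(os.path.join(child, ".."))
        dot_is_child = dot.st_ino == os.stat(child).st_ino
        dotdot_is_parent = dotdot.st_ino == os.stat(parent).st_ino

        try:
            os.rmdir(os.path.join(child, "."))  # the kernel refuses this
            removal_refused = False
        except OSError:
            removal_refused = True

        return dot_is_child, dotdot_is_parent, removal_refused


if __name__ == "__main__":
    print(dot_entries_demo())
```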

How to realize the "cp -u" function in a clearcase vob?

We have two VOBs, "voba" and "vobb". Both contain a directory "abc" holding the same .h / .cpp files.
Usually, the files in the "abc" directory of "voba" are updated quite frequently. From time to time, I would like to update all files in "abc" of vobb from "abc" of voba, which means:
1. Check out the updated files in vobb.abc, overwrite them and then check in.
2. Copy the newly created files to vobb.abc and create the elements.
3. Delete from vobb.abc the files that were deleted in voba.abc.
If it were an ordinary Linux directory, I think cp -u could achieve that. But when it comes to ClearCase, I can only do steps 1-3 above by hand.
Is there any easy way to finish that update automatically?
In ClearCase this is done with clearfsimport (potentially used with the -mirror option).
Since the elements in the abc directories of the two VOBs are completely different (different oid, with different history), what you can do is import the content of abc from one VOB into the other: clearfsimport will automatically check out, update and check in only the files that have evolved in the source and need to be updated in the destination.
Note, this recent thread (March 2013) also points to the Perl script ClearCase::SyncTree.
It is superior to clearfsimport in many respects, especially in its evil twin avoidance (it will try, with the proper options, to link suitable entries from non-visible versions).
Description:
This module provides an infrastructure for programs which want to synchronize a set of files, typically a subtree, with a similar destination subtree in VOB space. The enclosed synctree script is an example of such a program.
The source area may be in a VOB or may be a regular filesystem; the destination area must be in a VOB.
Methods are supplied for adding, subtracting, and modifying destination files so as to make that area look identical to the source.
Symbolic links are supported, even on Windows (of course in this case the source filesystem must support them, which is only likely in the event of an MVFS->MVFS transfer). Note that the text of the link is transported verbatim from source area to dest area; thus relative symlinks may no longer resolve in the destination.

Resources