Can inode and crtime be used as a unique file identifier? - linux

I have a file indexing database on Linux. Currently I use file path as an identifier.
But if a file is moved/renamed, its path is changed and I cannot match my DB record to the new file and have to delete/recreate the record. Even worse, if a directory is moved/renamed, then I have to delete/recreate records for all files and nested directories.
I would like to use the inode number as a unique file identifier, but an inode number can be reused if a file is deleted and another file is created.
So, I wonder whether I can use a pair of {inode,crtime} as a unique file identifier.
I hope to use i_crtime on ext4 and creation_time on NTFS.
In my limited testing (with ext4) inode and crtime do, indeed, remain unchanged when renaming or moving files or directories within the same file system.
So, the question is whether there are cases when inode or crtime of a file may change.
For example, can fsck or defragmentation or partition resizing change the inode or crtime of a file?
Interestingly,
http://msdn.microsoft.com/en-us/library/aa363788%28VS.85%29.aspx says:
"In the NTFS file system, a file keeps the same file ID until it is deleted."
but also:
"In some cases, the file ID for a file can change over time."
So, what are those cases they mentioned?
Note that I studied similar questions:
How to determine the uniqueness of a file in linux?
Executing 'mv A B': Will the 'inode' be changed?
Best approach to detecting a move or rename to a file in Linux?
but they do not answer my question.

{device_nr,inode_nr} are a unique identifier for an inode within a system
moving a file to a different directory does not change its inode_nr
the linux inotify interface enables you to monitor changes to inodes (either files or directories)
Extra notes:
moving files across filesystems is handled differently (it is in fact copy + delete)
networked filesystems (or a mounted NTFS) cannot always guarantee the stability of inode numbers
Microsoft is not a Unix vendor; its documentation does not cover Unix or its filesystems and should be ignored (except for NTFS's internals)
Extra text: the old Unix adage "everything is a file" should in fact be "everything is an inode". The inode carries all the meta-information about a file (or directory, or special file) except its name. The filename is in fact only a directory entry that happens to link to that particular inode. Moving a file means creating a new link to the same inode and deleting the old directory entry that linked to it.
The inode metadata can be obtained with the stat(), fstat(), and lstat() system calls.
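For illustration, here is a minimal sketch of reading that identifier from userland: stat() gives the {device, inode} pair, and statx() exposes the creation time (btime) where the kernel and filesystem support it (statx() needs Linux 4.11+ and glibc 2.28+; this is just an example program, not part of any existing tool):

/* print the {device, inode} pair and, if available, the creation time */
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }

    struct stat st;
    if (stat(argv[1], &st) == -1) {
        perror("stat");
        return 1;
    }
    /* {st_dev, st_ino} identifies the inode within the system */
    printf("dev=%ju ino=%ju\n", (uintmax_t)st.st_dev, (uintmax_t)st.st_ino);

    struct statx stx;
    if (statx(AT_FDCWD, argv[1], 0, STATX_BTIME, &stx) == 0 &&
        (stx.stx_mask & STATX_BTIME)) {
        /* birth/creation time: only filled in if the filesystem records it */
        printf("crtime=%lld.%09u\n",
               (long long)stx.stx_btime.tv_sec, stx.stx_btime.tv_nsec);
    } else {
        printf("crtime not available here\n");
    }
    return 0;
}

Note that the classic stat structure has no crtime field at all, which is why statx() (or a filesystem-specific tool such as debugfs on ext4) is needed to read it.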

The allocation and management of i-nodes in Unix is dependent upon the filesystem. So, for each filesystem, the answer may vary.
For the Ext3 filesystem (the most popular), i-nodes are reused and thus cannot serve as a unique file identifier on their own, nor does reuse occur according to any predictable pattern.
In Ext3, i-nodes are tracked in a bit vector, each bit representing a single i-node number. When an i-node is freed, its bit is set to zero. When a new i-node is needed, the bit vector is searched for the first zero bit and that i-node number (which may have been previously allocated to another file) is reused.
This may lead to the naive conclusion that the lowest-numbered available i-node will be the one reused. However, the Ext3 file system is complex and highly optimised, so no assumptions should be made about when and how i-node numbers are reused, even though they clearly will be.
From the source code for ialloc.c, where i-nodes are allocated:
There are two policies for allocating an inode. If the new inode is a
directory, then a forward search is made for a block group with both
free space and a low directory-to-inode ratio; if that fails, then of
the groups with above-average free space, that group with the fewest
directories already is chosen. For other inodes, search forward from
the parent directory's block group to find a free inode.
The source code that manages this for Ext3 lives in ialloc.c, and the definitive version is here: https://github.com/torvalds/linux/blob/master/fs/ext3/ialloc.c
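For what it's worth, reuse is easy to observe from userland with a small test program like the sketch below (the file names are arbitrary examples, and whether the same number actually comes back on any given run depends on the filesystem and its allocator):

/* create a file, note its i-node, delete it, create another file,
 * and see whether the freed i-node number was handed out again */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static ino_t create_and_stat(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd == -1) { perror("open"); exit(1); }
    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); exit(1); }
    close(fd);
    return st.st_ino;
}

int main(void)
{
    ino_t first = create_and_stat("reuse_test_a");
    unlink("reuse_test_a");                    /* frees the i-node */
    ino_t second = create_and_stat("reuse_test_b");
    unlink("reuse_test_b");

    printf("first:  %ju\nsecond: %ju\n", (uintmax_t)first, (uintmax_t)second);
    printf("%s\n", first == second ? "i-node number was reused"
                                   : "different i-node this time");
    return 0;
}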

I guess the DB application would need to consider the case where the file is restored from backup, which would preserve the file's crtime but not its inode number.

Related

How are ext4 directory entries stored in the i-nodes?

I am doing some experimentation with the internals of the ext4 file system and stumbled upon this issue while trying to implement reading a file by path.
The root directory i-node, number 2 as per the Kernel documentation's special i-node table, is easily found in the i-node table per the pointers in the block group descriptors and superblock.
As far as I understand it, the process of looking up a file by path is:
1. Find the root directory i-node.
2. Traverse its directory entries until we find the name of the sub-directory we're looking for.
3. Take the i-node number that the directory entry we have found points to.
4. Go to (2.) and repeat until we have found the file.
5. Read the file by parsing the extent tree.
Is this correct?
If so, how are the struct ext4_dir_entry records stored/referenced from the i-node? I assume the inode's i_block[] array has something to do with that, but I am not entirely clear on how to read the directory entries from there. Are they stored in the i-node itself? Or does the array contain pointers?
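For reference, the on-disk entry looks roughly like the sketch below, modelled on struct ext4_dir_entry_2 in the kernel's fs/ext4/ext4.h (the kernel uses little-endian __le16/__le32 types; plain fixed-width types are used here so the sketch stands alone). As far as I can tell, these entries are not stored in the i-node itself: they live in the directory's data blocks, and i_block[] (or the extent tree built from it) simply points at those blocks, exactly as it does for a regular file's data.

#include <stdint.h>

/* sketch of one directory entry; entries are packed back to back in
 * the directory's data blocks, and rec_len gives the stride to the
 * next entry */
struct ext4_dir_entry_2_sketch {
    uint32_t inode;      /* i-node number this entry points to (0 = unused) */
    uint16_t rec_len;    /* length of this entry, including padding         */
    uint8_t  name_len;   /* length of the name that follows                 */
    uint8_t  file_type;  /* regular file, directory, symlink, ...           */
    char     name[255];  /* the name itself, NOT null-terminated            */
};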

How to prove that a directory is a file in Linux

"Everything is a file in Linux". How can i prove that directories are represented as files in linux. Also the physical hardware devices everything creates and is represented as files in Linux. But how can i prove this concept with supporting examples to someone.
Viewing directories and other physical hardware as files in Linux (proof of concept).
The "Everything is a file in Linux" statement is a bit of an oversimplification. There are many things in Linux that appear as files, but don't quite 'act' as you think they would in a conventional sense.
Block files (e.g. /dev/loop0) are a great example of this as they are used as a way of communicating with device drivers.
That said, directories are their own 'special' kind of file that contain inode ids pointing to a file's inode. I suppose a simple 'proof' of sorts would be to ls -l any directory and you will notice that most (if not all) of them will have a listed file size of 4096 bytes rather than the collective size of their contents.
4096 bytes is the default block size for most filesystems and is usually more than enough to fit all the information (inode ids and names) of a directory. So rather than direct information/access to its files, a directory rather holds meta-data about them.
Alternatively, using stat on any directory will display its own inode number (as well as the number of links it has).
EDIT: Directory files contain the inode id (a pointer to a file's inode) not the inode itself. I have edited the answer.
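As an additional quick demonstration, the sketch below (a rough C equivalent of ls -ia) prints exactly what a directory 'file' contains: a sequence of (inode id, name) pairs:

#include <dirent.h>
#include <stdint.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    const char *path = argc > 1 ? argv[1] : ".";
    DIR *d = opendir(path);
    if (!d) { perror("opendir"); return 1; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        /* d_ino is the inode id the directory entry points to */
        printf("%10ju  %s\n", (uintmax_t)e->d_ino, e->d_name);

    closedir(d);
    return 0;
}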

In a kernel module, how to know whether given inode belongs to a specific directory?

One possible way is to compare the given inode with the list of inodes in that directory. The list of inodes could be predetermined or it could be computed at run time; both ways have their own problems:
Predetermined list: the list can change during the operation, i.e. files could be added to or removed from that directory.
Run-time list: if that directory has too many files, it adds too much overhead to every access of any file in the system.
Is there any efficient solution/way for this? I have tried comparing the file by its path, which was really a bad idea.
Doing it in kernel mode rather than user mode brings no advantage. To see whether an inode is indeed in some directory you have to read that directory, since a directory normally stores its files as a linear list. This can leave your process blocked waiting for directory blocks to be read in if they are not cached, and in that time the directory contents can be modified. Only keeping the directory inode locked while doing that operation would help, but this can add severe performance restrictions to your operating system. Another issue is that each filesystem is free to implement directory contents in its own format. In userland you get a uniform directory format, but in kernel mode you have to deal with the different approaches of different filesystem types. Why do you need to know that? I can't imagine a scenario where this would be needed. Perhaps you can redesign your algorithm so that the directory contents are unnecessary.
By the way, dealing with complete paths or searching directories has obscure race conditions that can leave your system blocked in some way. What happens if, in the middle of your search, somebody tries to unlink the inode you are searching for, or the directory contents must be modified, or some other process is using namei() to traverse through your directory upwards or downwards? Have you thought about all these possibilities?
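For comparison, the userland version of "read the directory as a linear list" looks roughly like the sketch below; it suffers from exactly the race conditions described above, because the directory can change while it is being scanned (the program and its arguments are made up for the example):

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <directory> <inode-number>\n", argv[0]);
        return 2;
    }
    ino_t target = (ino_t)strtoull(argv[2], NULL, 10);

    DIR *d = opendir(argv[1]);
    if (!d) { perror("opendir"); return 2; }

    /* scan the directory entries one by one; nothing stops another
     * process from adding or removing entries while we do this */
    struct dirent *e;
    int found = 0;
    while ((e = readdir(d)) != NULL) {
        if (e->d_ino == target) { found = 1; break; }
    }
    closedir(d);

    printf("inode %s in %s\n", found ? "found" : "not found", argv[1]);
    return found ? 0 : 1;
}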

Disadvantages to creating/removing many hard links?

I need to create hundreds to thousands of temporary hard or symbolic links that will be deleted shortly after creation. For my purposes both types of links will work (i.e. the target is not a directory and it always exists on the same file system).
As I understand it, a symbolic link creates a small file that contains the path to the original file, whereas a hard link creates a reference to the data in the same inode. So if I am going to be creating/deleting thousands of these links, is it better to create and delete thousands of tiny files (symlinks) or thousands of these references (hard links)? It seems like one taxes the hard drive (maybe fragmentation) while the other might tax the file system itself. Where are inode references stored? Do I risk corrupting the file system by making so many hard links? What about speed?
Thanks for your expertise!
This is a workaround to be able to use ffmpeg to encode a movie out of an arbitrary subset of images from a directory. Since ffmpeg requires that the files be named properly (e.g. frame%04d.jpg), I realized I can just create hard/symbolic links to the subset of files and name the links appropriately. This avoids renaming the original files and having to actually copy the data. It works great, but it requires creating and deleting many thousands of links, repeatedly.
I believe this sort of addresses the problem too:
convert image sequence using ffmpeg
If this activity breaks your file system, then your file system is at fault, not you. File systems are generally pretty reliable, so don't worry about that.
Both options require adding an entry in the directory. The symbolic link requires creating a file as well. When you access the file the hard link jumps directly to the content, while accessing a symlink requires finding the symlink file, reading it, finding the directory with the content, finding where the content is, and then accessing that. Therefore symlinks are more work for the filesystem all around.
But the difference is minute when compared to the work of actually reading the data in the files. Therefore I would not worry about it, and just go with whichever one best gives you the semantics you want.
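To make the difference concrete, here is a minimal sketch of the two calls involved (the file names are just examples, and source.jpg is assumed to already exist): link() merely adds a second directory entry for the existing inode, while symlink() allocates a brand-new inode whose content is the target path:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* hard link: frame0001.jpg becomes another name for source.jpg */
    if (link("source.jpg", "frame0001.jpg") == -1)
        perror("link");

    /* symbolic link: frame0002.jpg is a new tiny file that stores
     * the path "source.jpg" */
    if (symlink("source.jpg", "frame0002.jpg") == -1)
        perror("symlink");

    /* removing the links afterwards leaves source.jpg untouched */
    unlink("frame0001.jpg");
    unlink("frame0002.jpg");
    return 0;
}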
Since you are not trying to create hundreds of thousands of links to the same file, hard links are marginally better performing.
However, symbolic links in /tmp, if /tmp is tmpfs, perform even better.
Oh, and symlinks are too small to cause fragmentation issues.
Both options require the addition of a file entry in the directory inode, and the directory structure may grow by allocating new blocks.
But a symbolic link requires the allocation of an inode, and the filesystem has a limit on inodes. Your hundreds of thousands of symlinks may hit that limit, and you may get a "Not enough space for file" error message even with gigabytes free.
By default, the file system creation tool chooses the maximum number of inodes according to the physical partition size. For instance, for Linux ext2/3/4, mkfs.ext3 uses a bytes-per-inode ratio that you can find in your /etc/mke2fs.conf.
For an existing filesystem, here is a command to get information about inodes:
# dumpe2fs /dev/sda1 | grep -i inode | less
Inode count: 979200
Free inodes: 742304
Inodes per group: 16320
Inode blocks per group: 510
First inode: 11
Inode size: 128
Journal inode: 8
First orphan inode: 441066
Journal backup: inode blocks
As a conclusion, you should prefer hard links, mainly because of resource consumption on disk and in memory (VFS structures in caches).
Another piece of advice: do not create too many files in the same directory; 2,000 files is a reasonable limit to avoid performance issues.

Shred: Doesn't work on Journaled FS?

Shred documentation says shred is "not guaranteed to be effective" (See bottom). So if I shred a document on my Ext3 filesystem or on a Raid, what happens? Do I shred part of the file? Does it sometimes shred the whole thing and sometimes not? Can it shred other stuff? Does it only shred the file header?
CAUTION: Note that shred relies on a very important assumption:
that the file system overwrites data in place. This is the
traditional way to do things, but many modern file system designs
do not satisfy this assumption. The following are examples of file
systems on which shred is not effective, or is not guaranteed to be
effective in all file system modes:
log-structured or journaled file systems, such as those supplied with AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)
file systems that write redundant data and carry on even if some writes fail, such as RAID-based file systems
file systems that make snapshots, such as Network Appliance’s NFS server
file systems that cache in temporary locations, such as NFS version 3 clients
compressed file systems
In the case of ext3 file systems, the above disclaimer applies
(and shred is thus of limited effectiveness) only in data=journal
mode, which journals file data in addition to just metadata. In
both the data=ordered (default) and data=writeback modes, shred
works as usual. Ext3 journaling modes can be changed by adding
the data=something option to the mount options for a
particular file system in the /etc/fstab file, as documented in the
mount man page (man mount).
All shred does is overwrite, flush, check success, and repeat. It does absolutely nothing to find out whether overwriting a file actually results in the blocks which contained the original data being overwritten. This is because without knowing non-standard things about the underlying filesystem, it can't.
So, journaling filesystems won't overwrite the original blocks in place, because that would stop them recovering cleanly from errors where the change is half-written. If data is journaled, then each pass of shred might be written to a new location on disk, in which case nothing is shredded.
RAID filesystems (depending on the RAID mode) might not overwrite all of the copies of the original blocks. If there's redundancy, you might shred one disk but not the other(s), or you might find that different passes have affected different disks such that each disk is partly shredded.
On any filesystem, the disk hardware itself might just so happen to detect an error (or, in the case of flash, apply wear-leveling even without an error) and remap the logical block to a different physical block, such that the original is marked faulty (or unused) but never overwritten.
Compressed filesystems might not overwrite the original blocks, because the data with which shred overwrites is either random or extremely compressible on each pass, and either one might cause the file to radically change its compressed size and hence be relocated. NTFS stores small files in the MFT, and when shred rounds up the filesize to a multiple of one block, its first "overwrite" will typically cause the file to be relocated out to a new location, which will then be pointlessly shredded leaving the little MFT slot untouched.
Shred can't detect any of these conditions (unless you have a special implementation which directly addresses your fs and block driver - I don't know whether any such things actually exist). That's why it's more reliable when used on a whole disk than on a filesystem.
Shred never shreds "other stuff" in the sense of other files. In some of the cases above it shreds previously-unallocated blocks instead of the blocks which contain your data. It also doesn't shred any metadata in the filesystem (which I guess is what you mean by "file header"). The -u option does attempt to overwrite the file name, by renaming to a new name of the same length and then shortening that one character at a time down to 1 char, prior to deleting the file. You can see this in action if you specify -v too.
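To make the "overwrite, flush, check success, repeat" loop concrete, here is a stripped-down sketch of the idea (this is not shred's actual implementation, and it inherits exactly the limitations discussed above: the filesystem is free not to put the new data over the old blocks):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_WRONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }

    char buf[4096];
    for (int pass = 0; pass < 3; pass++) {
        if (lseek(fd, 0, SEEK_SET) == -1) { perror("lseek"); return 1; }
        off_t left = st.st_size;
        while (left > 0) {
            size_t n = left < (off_t)sizeof(buf) ? (size_t)left : sizeof(buf);
            for (size_t i = 0; i < n; i++)
                buf[i] = rand() & 0xff;          /* pass data to overwrite with */
            if (write(fd, buf, n) != (ssize_t)n) { perror("write"); return 1; }
            left -= n;
        }
        if (fsync(fd) == -1) { perror("fsync"); return 1; }   /* flush */
    }
    close(fd);
    return 0;
}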
The other answers have already done a good job of explaining why shred may not be able to do its job properly.
This can be summarised as:
shred only works on partitions, not individual files
As explained in the other answers, if you shred a single file:
there is no guarantee the actual data is really overwritten, because the filesystem may send writes to the same file to different locations on disk
there is no guarantee the fs did not create copies of the data elsewhere
the fs might even decide to "optimize away" your writes, because you are writing the same file repeatedly (syncing is supposed to prevent this, but again: no guarantee)
But even if you know that your filesystem does not do any of the nasty things above, you also have to consider that many applications will automatically create copies of file data:
crash recovery files which word processors, editors (such as vim) etc. will write periodically
thumbnail/preview files in file managers (sometimes even for non-image files)
temporary files that many applications use
So, short of checking every single binary you use to work with your data, it might have been copied right, left & center without you knowing. The only realistic way is to always shred complete partitions (or disks).
The concern is that data might exist on more than one place on the disk. When the data exists in exactly one location, then shred can deterministically "erase" that information. However, file systems that journal or other advanced file systems may write your file's data in multiple locations, temporarily, on the disk. Shred -- after the fact -- has no way of knowing about this and has no way of knowing where the data may have been temporarily written to disk. Thus, it has no way of erasing or overwriting those disk sectors.
Imagine this: You write a file to disk on a journaled file system that journals not just metadata but also the file data. The file data is temporarily written to the journal, and then written to its final location. Now you use shred on the file. The final location where the data was written can be safely overwritten with shred. However, shred would have to have some way of guaranteeing that the sectors in the journal that temporarily contained your file's contents are also overwritten to be able to promise that your file is truly not recoverable. Imagine a file system where the journal is not even in a fixed location or of a fixed length.
If you are using shred, then you're trying to ensure that there is no possible way your data could be reconstructed. The authors of shred are being honest that there are some conditions beyond their control where they cannot make this guarantee.
