Is there a way to create a file in Linux that link to a specific iNode?
Take this scenario: There is a file that is in course of writing (a log maybe) and the specific file is deleted but a link in the dir /proc is still pointing at it. In this case we need not a bare copy of it but an hard link to it so we can have the future modifications and the most last modification before the process close and the system delete it.
If we have the iNode number is there a way to achieve this goal?
Since there is no Syscall that involves iNode, because is a concept of extX fs and is not a good practice make a stove pipe but it is to make a chain of responsability (as M.E.L. suggests), there is only a NO answer for this question because at VFS level we handle files path and names and not other internal representations.
BUT to achieve the goal to track the most last modification we can use a continous monitoring and duplication with tail:
tail -c+1 -f --pid=PID /proc/PID/fd/FD > /path/to/the/copy
where PID is the pid of the process that have the deleted file still opened and FD is its file descriptor number. With -f tail open and hold the file to display further modification, with -c+1 start to "tail" from the first byte and with --pid=PID tail is informed to exit when the pid exit.
You can use lsof to recover deleted files (sometimes)...
> lsof | grep testing.txt
less 4607 juliet 4r REG 254,4 21
8880214 /home/juliet/testing.txt (deleted)
Be sure to read the original article for full details before attempting this, unless you're a Maveric like me.
> ls -l /proc/4607/fd/4
lr-x------ 1 juliet juliet 64 Apr 7 03:19
/proc/4607/fd/4 -> /home/juliet/testing.txt (deleted)
> cp /proc/4607/fd/4 testing.txt.bk
http://www.linuxplanet.com/linuxplanet/tips/6767/1
Enjoy
It's always difficult to answer a question like "can I do" confidently in the negative. But as far as I see, neither /sys/ nor /proc provide a mapping of open files descriptors that are not symlinks. I assume by "BUT a link in the dir /proc is still pointing at it" you mean that the /proc//fd/ entries look like symlinks? I'm almost sure you cannot recover the original file.
I take that back: As user user2676075 pointed out, copying does work. Just hardlinking doesn't ...
UPDATE: If you think about it, it's quite logical.
/proc and /sys are file systems different from your hard disk. So they can't provide file like directory entries which one could hardlink to a destination on the hard disk.
The /proc/*/fd/ entries pretend to be symlinks, but actually they are different, else the copying would not work. I think they pretend to be symlinks to provide meaningful information with 'ln -l'.
Regarding the (missing) capability to hardlink to some inode (let's say with some system call): This cannot be part of the kernel or the VFS-Interface, for the following reasons:
It would violate the integrity of the file system. The filesystem is not supposed to keep the disk blocks of files that are completely deleted around in the same manner as files that persist.
The inodes might be a completely virtual concept to identify a "slot where a datastream is stored'. I assume there can be implementations that would have a problem converting a slot that has no reference back to a slot which is refered to by a name in the file system.
I admit the case against the possibility of such a system call is not water tight. But given the current state of the VFS interface (which AFAIR doesn't provide for such a call), it would be a heavy burden for any file system implementation (including e.g. distributed file systems) to provide a call to link a file into a directory by inode.
ATM I wonder if calling fstat before and after deleting the last reference is actually requires to return the same inode information ...
t
Related
Say I want to synchronize data buffers of a file system to disk (in my case the one of an USB stick partition) on a linux box.
While searching for a function to do that I found the following
DESCRIPTION
sync() causes all buffered modifications to file metadata and
data to be written to the underlying file sys‐
tems.
syncfs(int fd) is like sync(), but synchronizes just the file system
containing file referred to by the open file
descriptor fd.
But what if the file system has no file on it that I can open and pass to syncfs? Can I "abuse" the dot file? Does it appear on all file systems?
Is there another function that does what I want? Perhaps by providing a device file with major / minor numbers or some such?
Yes I think you can do that. The root directory of your file system will have at least one inode for your root directory. You can use the .-file to do that. Play also around with ls -i to see the inode numbers.
Is there a possibility to avoid your problem by mounting your file system with sync? Does performance issues hamper? Did you have a look at remounting? This can sync your file system as well in particular cases.
I do not know what your application is, but I suffered problems with synchronization of files to a USB stick with the FAT32-file system. It resulted in weird read and write errors. I can not imagine any other valid reason why you should sync an empty file system.
From man 8 sync description:
"sync writes any data buffered in memory out to disk. This can include (but is not
limited to) modified superblocks, modified inodes, and delayed reads and writes. This
must be implemented by the kernel; The sync program does nothing but exercise the sync(2)
system call."
So, note that it's all about modification (modified inode, superblocks etc). If you don't have any modification, it don't have anything to sync up.
I have a file indexing database on Linux. Currently I use file path as an identifier.
But if a file is moved/renamed, its path is changed and I cannot match my DB record to the new file and have to delete/recreate the record. Even worse, if a directory is moved/renamed, then I have to delete/recreate records for all files and nested directories.
I would like to use inode number as a unique file identifier, but inode number can be reused if file is deleted and another file created.
So, I wonder whether I can use a pair of {inode,crtime} as a unique file identifier.
I hope to use i_crtime on ext4 and creation_time on NTFS.
In my limited testing (with ext4) inode and crtime do, indeed, remain unchanged when renaming or moving files or directories within the same file system.
So, the question is whether there are cases when inode or crtime of a file may change.
For example, can fsck or defragmentation or partition resizing change inode or crtime or a file?
Interesting that
http://msdn.microsoft.com/en-us/library/aa363788%28VS.85%29.aspx says:
"In the NTFS file system, a file keeps the same file ID until it is deleted."
but also:
"In some cases, the file ID for a file can change over time."
So, what are those cases they mentioned?
Note that I studied similar questions:
How to determine the uniqueness of a file in linux?
Executing 'mv A B': Will the 'inode' be changed?
Best approach to detecting a move or rename to a file in Linux?
but they do not answer my question.
{device_nr,inode_nr} are a unique identifier for an inode within a system
moving a file to a different directory does not change its inode_nr
the linux inotify interface enables you to monitor changes to inodes (either files or directories)
Extra notes:
moving files across filesystems is handled differently. (it is infact copy+delete)
networked filesystems (or a mounted NTFS) can not always guarantee the stability of inodenumbers
Microsoft is not a unix vendor, its documentation does not cover Unix or its filesystems, and should be ignored (except for NTFS's internals)
Extra text: the old Unix adagium "everything is a file" should in fact be: "everything is an inode". The inode carries all the metainformation about a file (or directory, or a special file) except the name. The filename is in fact only a directory entry that happens to link to the particular inode. Moving a file implies: creating a new link to the same inode, end deleting the old directory entry that linked to it.
The inode metatata can be obtained by the stat() and fstat() ,and lstat() system calls.
The allocation and management of i-nodes in Unix is dependent upon the filesystem. So, for each filesystem, the answer may vary.
For the Ext3 filesystem (the most popular), i-nodes are reused, and thus cannot be used as a unique file identifier, nor is does reuse occur according to any predictable pattern.
In Ext3, i-nodes are tracked in a bit vector, each bit representing a single i-node number. When an i-node is freed, it's bit is set to zero. When a new i-node is needed, the bit vector is searched for the first zero-bit and the i-node number (which may have been previously allocated to another file) is reused.
This may lead to the naive conclusion that the lowest numbered available i-node will be the one reused. However, the Ext3 file system is complex and highly optimised, so no assumptions should be made about when and how i-node numbers can be reused, even though they clearly will.
From the source code for ialloc.c, where i-nodes are allocated:
There are two policies for allocating an inode. If the new inode is a
directory, then a forward search is made for a block group with both
free space and a low directory-to-inode ratio; if that fails, then of
he groups with above-average free space, that group with the fewest
directories already is chosen. For other inodes, search forward from
the parent directory's block group to find a free inode.
The source code that manages this for Ext3 is called ialloc and the definitive version is here: https://github.com/torvalds/linux/blob/master/fs/ext3/ialloc.c
I guess the dB application would need to consider the case where the file is subject to restoration from backup, which would preserve the file crtime, but not the inode number.
If I have a POSIX system like Linux or Mac OS X, what's the best and most portable way to determine if a path is on a read-only filesystem? I can think of 4 ways off the top of my head:
open(2) a file with O_WRONLY - You would need to come up with a unique filename and also pass in O_CREAT and O_EXCL. If it fails and you have an errno of EROFS then you know it's a read-only filesystem. This would have the annoying side effect of actually creating a file you didn't care about, but you could unlink(2) it immediately after creating it.
statvfs(3) - One of the fields of the returned struct statvfs is f_flag, and one of the flags is ST_RDONLY for a read-only filesystem. However, the spec for statvfs(3) makes it clear that applications cannot depend on any of the fields containing valid information. It would seem there's a decent possibility ST_RDONLY might not be set for a read-only filesystem.
access(2) - If you know the mount point, you can use access(2) with the W_OK flag as long as you are running as a user who would have write access to the mountpoint. Ie, either you are root or it was mounted with your UID as a mount parameter. You would get a return value of -1 and an errno of EROFS.
Parsing /etc/mtab or /proc/mounts - Doesn't seem portable. Mac OS X seems to have neither of these, for example. Even if the system did have /etc/mtab I'm not sure the fields are consistent between OSes or if the mount options for read-only (ro on Linux) are portable.
Are there other ways I'm missing? If you needed to know if a filesystem was mounted read-only, how would you do it?
You could also popen the command mount and examine the output looking for your file system and seeing if it held the text " (ro,".
But again, that's not necessarily portable.
My option would be to not worry about whether the file system was mounted read only at all. Just try and create your file and, if it fails, tell the user what the error was. And, of course, give them the option of saving it somewhere else.
You really have to do that sort of thing anyway since, in any scenario where there's even a small gap between testing and doing, you may find the situation changes (probably not to the extent of making an entire file system read only but, who knows, maybe there is (or will be in the future) a file system that allows this).
utime(path, NULL);
If you have write perms, then that will give you ROFS or -- if permitted -- simply update the mtime on the directory, which is basically harmless.
I have fileA, which has a size of 200 MB. Now I want to create a hardlink to fileA, named fileB, but I only want this file point to the first 100 mb of fileA. So basically I need fileB to point to the same data blocks, but with a different length. It doesn't necessarily have to be a real hardlink it could be a virtual file proxying contents.
I was thinking about duplicating the Inode somehow and changing the length, but I presume this could cause filesystem coherency issues (when datablocks move around etc.). Is there any linux tool or user-level system call that could let me do this?
You cannot do this directly in the manner you are talking about. There are filesystems that sort of do this. For example LessFS can do this. I also believe that the underlying structure of btrfs supports this, so if someone put the hooks in, it would be accessible at user level.
But, there is no way to do this with any of the ext filesystems, and I believe that it happens implicitly in LessFS.
There is a really ugly way of doing it in btrfs. If you make a snapshot, then truncate the snapshot file to 100M, you have effectively achieved your goal.
I believe this would also work with btrfs and a sufficiently recent version of cp you could just copy the file with cp then truncate one copy. The version of cp would have to have the --reflink option, and if you want to be extra sure about this, give the --reflink=always option.
Adding to #Omnifarious's answer:
What you're describing is not a hard link. A hard links is essentially a reference to an inode, by a path name. (A soft link is a reference to a path name, by a path name.) There is no mechanism to say, "I want this inode, but slightly different, only the first k blocks". A copy-on-write filesystem could do this for you under the covers. If you were using such a filesystem then you would simply say
cp fileA fileB && truncate -s 200M fileB
Of course, this also works on a non-copy-on-write filesystem, but it takes up an extra 200 MB instead of just the filesystem overhead.
Now, that said, you could still implement something like this easily on Linux with FUSE. You could implement a filesystem that mirrors some target directory but simply artificially sets a maximum length to files (at say 200 MB).
FUSE
FUSE Hello World
Maybe you can check ChunkFS. I think this is what you need (I didn't try it).
I can know inode of device/socket with stat, so seems like I can somehow "copy" this file for backup. Of course the solution is "dd", but I have no idea what can I do if the device is infinity (like the random one). And can I just copy the inode somehow?
These are referred to as "special files" or "special nodes". Copying their contents doesn't make sense, as the contents are generated in one way or another programatically by the kernel as needed.
Programs like "tar" know how to copy the contents of the inode, which will refer to the portion of the kernel that support each of these different nodes. See the documentation of the "mknod" command for some more details.
And if you need one-liner to copy device nodes with tar, here it is:
cd /dev && tar -cpf- sda* | tar -xf- -C /some/destination/path/
Found out the major and minor number of the device file you need to copy then use mknod to create the device file with the same major and minor number. Major number is used for a program to access to kernel device switch table and calling the proper kernel function (usually device drive). Minor number is used as a parameter for calling those functions (like different density, disk, .... etc).
24 July 2022
There is one legitimate use case for copying (archiving) a socket.
I have a program that gathers and summarizes attribute data in a file system tree. In order to regression test, I created a directory that contains one example of every type of file the program might encounter. I run my program on this directory to test it whenever I alter the code.
It is necessary to backup this directory along with other more valuable data, and it is necessary to restore it, should the storage device fail.
tar is the program of choice, and of course tar can not archive a socket. Doing so in most situations is senseless - any program that uses the socket will have to delete it and recreate it before use.
In the case of the test directory, there is one named socket, for it is possible that my program will encounter such things and it needs to correctly gather attributes for a complete summary.
As noted by others, that socket is not useful for anything directly. It does, however, occupy a little storage space, much as an empty file occupies storage space. That is why you can see it in the directory listing.
You can copy it successfully with the command:
cp -ar --parents <path> <backup_device_directory>
and restore it with:
cp -ar --parents <backup_device_directory>/<path> <directory>
The socket is not useful for anything except probing its attributes with a program during a regression test.
Archiving it saves the trouble of having to remember to recreate it after a restoration. The extra nuisance of archiving the sockets is easily codified in a script and forgotten. That is what we all want - easy to use solutions whose implementation you can ignore after you have solved the problem.
You can copy from a working system as below to some shared location between the machines and copy from the shared location to the other system.
Machine A
cp -rf /dev/SRC shared_directory
Machine B
cp -rf shared_directory /dev/