I need a way to determine whether a file has been rotated away (mainly by logrotate).
On Btrfs, the inode number changes when a new file is created with the same name, but on ext4 this does not seem to be the case.
The scenario is the following: a process creates and feeds a Linux logfile at a fixed path on an ext4 filesystem. At some point in time it is rotated by logrotate, but later a new file is re-created at the same path.
It seems the (inode, dev) combination is not sufficient to determine beyond doubt whether the file has been rotated.
Thanks for any hint.
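For reference, the usual heuristic (the one tail -F and similar tools rely on) is to remember the (st_dev, st_ino) pair of the open file and periodically stat() the path; a mismatch, or ENOENT, suggests a rotation. A minimal sketch, assuming a hypothetical log path:

    #include <errno.h>
    #include <stdio.h>
    #include <sys/stat.h>

    /* Return 1 if the file at `path` is no longer the inode in `old`,
     * i.e. it was renamed away and possibly re-created. */
    static int was_rotated(const char *path, const struct stat *old)
    {
        struct stat now;

        if (stat(path, &now) != 0)
            return errno == ENOENT;  /* renamed/removed, new file not created yet */

        return now.st_dev != old->st_dev || now.st_ino != old->st_ino;
    }

    int main(void)
    {
        struct stat orig;

        if (stat("/var/log/myapp.log", &orig) != 0) {  /* hypothetical path */
            perror("stat");
            return 1;
        }
        /* ... later, after logrotate may have run ... */
        printf("rotated: %d\n", was_rotated("/var/log/myapp.log", &orig));
        return 0;
    }

As the question notes, this can still be fooled if ext4 happens to reuse the old inode number for the new file, so it is common to also treat a shrinking st_size or an older st_mtime as further evidence of rotation.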
I'm reading Operating System Concepts by Avi Silberschatz (9th edition). In section 11.4, File-System Mounting, the author explains the steps of filesystem mounting as follows:
The operating system is given the name of the device and the mount point—the location within the file structure where the file system is to be attached. Next, the operating system verifies that the device contains a valid file system. Finally, the operating system notes in its directory structure that a file system is mounted at the specified mount point.
I'm confused by the final step. To the best of my knowledge, the directory structure is stored somewhere on the disk, recording information about files such as name, location, size, and type. What, then, does the author mean by the directory structure in the operating system? Is it the same directory structure that is on disk?
Additionally, which component performs the conversion from file name to physical address on disk? Is it the disk driver, the disk controller, or the processor working from memory?
What you are reading is largely nonsense. To begin with, it is eunuchs-specific. Eunuchs variants tend to have a single directory structure containing all disks and even things that are not really files.
Let us assume that you are on Windoze. If you mount a disk, the drive gets a name, typically a single letter, though longer names are possible in some cases. Let's say you mount a disk drive and the system assigns it "Q:".
Now Q: is available and you can access files by specifying something like
"Q:\dir1\dir2\file.type"
You are just accessing the directory structure that exists on Q:.
Each drive has a separate, independent directory structure.
Many operating systems operate this way, and your sequence above is irrelevant to them.
Eunuchs variants do not work this way. The system maintains a single directory tree starting at "/", the root directory of the system. This directory is maintained by the operating system and does not exist at all on a disk drive.
On a Mac, for instance, there is a "/Volumes" directory that contains all the drives mounted. These too are directories maintained by the operating system and do not exist at all on a disk drive.
"/Volumes/Macintosh HD"
"/Volumes/Backup Drive"
These system directories then link to the directories that are stored on those disks. Thus, in Eunuchs, there are directories maintained by the operating system and directories maintained on the disk that are merged together.
So if you want to find "/Volumes/Backup Drive/dir/something.txt", the system goes to the root "/", finds "Volumes" and determines it is a system directory, finds "Backup Drive" and determines it is a disk drive that has been mounted, goes to the root directory of that drive, finds that "dir" is a directory on the drive, and finally finds the file something.txt.
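Purely as an illustration (nothing like real kernel code), that lookup amounts to splitting the path on separators and descending one component at a time; a toy sketch over an in-memory tree:

    #include <stdio.h>
    #include <string.h>

    /* Toy directory tree, for illustration only. */
    struct node {
        const char *name;
        struct node *children;   /* first child */
        struct node *next;       /* next sibling */
    };

    /* Resolve a "/a/b/c"-style path by walking one component at a time. */
    static struct node *resolve(struct node *root, const char *path)
    {
        char buf[256];
        struct node *cur = root;

        strncpy(buf, path, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';

        for (char *comp = strtok(buf, "/"); comp && cur; comp = strtok(NULL, "/")) {
            struct node *child = cur->children;
            while (child && strcmp(child->name, comp) != 0)
                child = child->next;
            cur = child;  /* NULL if the component was not found */
        }
        return cur;
    }

    int main(void)
    {
        struct node file    = { "something.txt", NULL, NULL };
        struct node dir     = { "dir", &file, NULL };
        struct node drive   = { "Backup Drive", &dir, NULL };
        struct node volumes = { "Volumes", &drive, NULL };
        struct node root    = { "/", &volumes, NULL };

        printf("%s\n", resolve(&root, "/Volumes/Backup Drive/dir/something.txt")
                       ? "found" : "not found");
        return 0;
    }

A real system does the same walk, except that along the way it may cross from OS-maintained directories into the on-disk directory structure of a mounted volume.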
To add to the confusion, there are disk formats that have no directory structure at all. But this illustrates that your book is taking you on a confusing path.
Each disk drive has a format of some kind. E.g., NTFS, ODS-11, FAT, ....
What I am telling you from here on is a generalization of what typically happens, but there are large variations in how it works among systems.
Typically, each drive will have a header that includes a description of the block clusters in use (often a bitmap) and of the files on the disk. A file description will usually hold the file name, date created, owner, etc. It will also have information about where the file's data is stored on the disk.
The drive often will have a directory structure in which some file is defined as the root directory. The directory structure exists by creating directory files within other directory files. A directory is normally just a file holding a list of file names and the addresses of their descriptions in the disk header. Other file attributes, such as the file size and date of creation, are not stored in the directory; you get those from the file description in the disk header.
The file structure in the disk header is separate from the directory structure. In fact, it is often possible to create a file that is not even in a directory at all. Or you can put a single file in multiple directories.
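On Linux you can see both of these directly: O_TMPFILE creates a file with no directory entry at all, and link(2) can give one inode several names. A sketch, assuming a filesystem that supports O_TMPFILE (ext4 does) and made-up paths under /tmp:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char procpath[64];

        /* Create a file that is not in any directory. */
        int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* Give the anonymous inode a directory entry via /proc/self/fd. */
        snprintf(procpath, sizeof procpath, "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, procpath, AT_FDCWD, "/tmp/now-visible",
                   AT_SYMLINK_FOLLOW) != 0)
            perror("linkat");

        /* A second hard link: one inode, two directory entries. */
        if (link("/tmp/now-visible", "/tmp/second-name") != 0)
            perror("link");

        close(fd);
        return 0;
    }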
If your disk gets trashed and has to be recovered, this is usually done by looking at the disk header. You get back your files but lose your directory structure.
Additionally, which component performs the conversion from file name to physical address on disk? Is it the disk driver, the disk controller, or the processor working from memory?
The logical location on the disk is specified in the file description in the disk header. The format of that information is specific to the underlying disk format. Generally you have two paths to reach the file description:
You can go through the list of file headers maintained by the disk; or
You can navigate a directory structure until you find the file name you want with a link to the file description.
Say I want to synchronize the data buffers of a file system to disk (in my case those of a USB stick partition) on a Linux box.
While searching for a function to do that, I found the following:
DESCRIPTION
    sync() causes all buffered modifications to file metadata and data to be written to the underlying file systems.

    syncfs(int fd) is like sync(), but synchronizes just the file system containing the file referred to by the open file descriptor fd.
But what if the file system has no file on it that I can open and pass to syncfs? Can I "abuse" the dot file? Does it appear on all file systems?
Is there another function that does what I want? Perhaps by providing a device file with major / minor numbers or some such?
Yes, I think you can do that. The root directory of your file system always has an inode, so you can open the "." entry of the mount point and pass that descriptor to syncfs(). Also play around with ls -i to see the inode numbers.
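A minimal sketch (the mount point path is just an example); no regular file is needed, because a directory can be opened read-only for this purpose:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Open the mount point itself. */
        int fd = open("/mnt/usbstick", O_RDONLY | O_DIRECTORY);  /* example path */
        if (fd < 0) { perror("open"); return 1; }

        if (syncfs(fd) != 0)   /* flush everything on that one file system */
            perror("syncfs");

        close(fd);
        return 0;
    }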
Could you avoid your problem by mounting the file system with the sync option, or do performance issues rule that out? Did you have a look at remounting? That can sync your file system as well in particular cases.
I do not know what your application is, but I have suffered problems synchronizing files to a USB stick with the FAT32 file system; it resulted in weird read and write errors. I cannot imagine any other valid reason why you would want to sync an empty file system.
From the man 8 sync description:

    sync writes any data buffered in memory out to disk. This can include (but is not limited to) modified superblocks, modified inodes, and delayed reads and writes. This must be implemented by the kernel; the sync program does nothing but exercise the sync(2) system call.
So note that it is all about modifications (modified inodes, superblocks, etc.). If you have no modifications, there is nothing to sync up.
I have a file indexing database on Linux. Currently I use file path as an identifier.
But if a file is moved/renamed, its path is changed and I cannot match my DB record to the new file and have to delete/recreate the record. Even worse, if a directory is moved/renamed, then I have to delete/recreate records for all files and nested directories.
I would like to use the inode number as a unique file identifier, but an inode number can be reused if a file is deleted and another file is created.
So, I wonder whether I can use a pair of {inode,crtime} as a unique file identifier.
I hope to use i_crtime on ext4 and creation_time on NTFS.
In my limited testing (with ext4) inode and crtime do, indeed, remain unchanged when renaming or moving files or directories within the same file system.
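(For reference, a sketch of how crtime can be read on Linux via statx(2), available since kernel 4.11 / glibc 2.28; the birth time is optional, so the stx_mask bit must be checked:)

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        struct statx stx;

        if (argc < 2 ||
            statx(AT_FDCWD, argv[1], 0, STATX_INO | STATX_BTIME, &stx) != 0) {
            perror("statx");
            return 1;
        }

        printf("inode:  %llu\n", (unsigned long long)stx.stx_ino);
        if (stx.stx_mask & STATX_BTIME)  /* birth (creation) time is optional */
            printf("crtime: %lld.%09u\n",
                   (long long)stx.stx_btime.tv_sec, (unsigned)stx.stx_btime.tv_nsec);
        else
            printf("crtime: not reported by this filesystem\n");
        return 0;
    }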
So, the question is whether there are cases when inode or crtime of a file may change.
For example, can fsck or defragmentation or partition resizing change the inode or crtime of a file?
Interestingly, http://msdn.microsoft.com/en-us/library/aa363788%28VS.85%29.aspx says:
"In the NTFS file system, a file keeps the same file ID until it is deleted."
but also:
"In some cases, the file ID for a file can change over time."
So, what are those cases they mentioned?
Note that I studied similar questions:
How to determine the uniqueness of a file in linux?
Executing 'mv A B': Will the 'inode' be changed?
Best approach to detecting a move or rename to a file in Linux?
but they do not answer my question.
- {device_nr, inode_nr} is a unique identifier for an inode within a system;
- moving a file to a different directory does not change its inode_nr;
- the Linux inotify interface enables you to monitor changes to inodes (either files or directories), as sketched below.
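A minimal inotify sketch (the watched directory is an example path), following the buffer-handling pattern from inotify(7):

    #include <stdio.h>
    #include <sys/inotify.h>
    #include <unistd.h>

    int main(void)
    {
        /* Buffer aligned for struct inotify_event, as in the inotify(7) example. */
        char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));

        int fd = inotify_init1(0);
        if (fd < 0) { perror("inotify_init1"); return 1; }

        /* Watch a directory for renames, creations, and deletions. */
        if (inotify_add_watch(fd, "/var/log",
                              IN_MOVED_FROM | IN_MOVED_TO | IN_CREATE | IN_DELETE) < 0) {
            perror("inotify_add_watch");
            return 1;
        }

        for (;;) {
            ssize_t len = read(fd, buf, sizeof buf);  /* blocks until events arrive */
            if (len <= 0) break;

            for (char *p = buf; p < buf + len;) {
                const struct inotify_event *ev = (const struct inotify_event *)p;
                printf("mask=0x%x name=%s\n", ev->mask,
                       ev->len ? ev->name : "(watched dir)");
                p += sizeof(*ev) + ev->len;
            }
        }
        close(fd);
        return 0;
    }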
Extra notes:
- moving files across filesystems is handled differently (it is in fact copy + delete);
- networked filesystems (or a mounted NTFS) cannot always guarantee the stability of inode numbers;
- Microsoft is not a Unix vendor, so its documentation does not cover Unix or its filesystems and should be ignored (except for NTFS's internals).
Extra text: the old Unix adage "everything is a file" should in fact be "everything is an inode". The inode carries all the meta-information about a file (or directory, or special file) except the name. The filename is in fact only a directory entry that happens to link to the particular inode. Moving a file means creating a new link to the same inode and deleting the old directory entry that linked to it.
The inode metadata can be obtained with the stat(), fstat(), and lstat() system calls.
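A minimal sketch showing that the (st_dev, st_ino) pair survives a rename (the file names are made up for the example):

    #include <stdio.h>
    #include <sys/stat.h>

    static void show(const char *path)
    {
        struct stat st;
        if (stat(path, &st) == 0)
            printf("%s: dev=%llu ino=%llu\n", path,
                   (unsigned long long)st.st_dev, (unsigned long long)st.st_ino);
        else
            perror(path);
    }

    int main(void)
    {
        show("before.txt");                /* example file in the current directory */
        rename("before.txt", "after.txt"); /* a move within the same filesystem */
        show("after.txt");                 /* prints the same dev and ino as before */
        return 0;
    }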
The allocation and management of i-nodes in Unix is dependent upon the filesystem. So, for each filesystem, the answer may vary.
For the Ext3 filesystem (the most popular), i-nodes are reused and thus cannot serve as a unique file identifier, nor does reuse occur according to any predictable pattern.
In Ext3, i-nodes are tracked in a bit vector, each bit representing a single i-node number. When an i-node is freed, its bit is set to zero. When a new i-node is needed, the bit vector is searched for the first zero bit, and that i-node number (which may have been previously allocated to another file) is reused.
This may lead to the naive conclusion that the lowest-numbered available i-node will be the one reused. However, the Ext3 file system is complex and highly optimised, so no assumptions should be made about when and how i-node numbers are reused, even though they clearly will be.
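As a toy illustration of the bit-vector idea only (nothing like the real ext3 code, which works per block group and adds many optimisations):

    #include <stdint.h>
    #include <stdio.h>

    #define NINODES 64

    static uint64_t bitmap;   /* bit i set => i-node i is in use */

    /* Find the first free i-node, mark it used, return its number (-1 if full). */
    static int alloc_inode(void)
    {
        for (int i = 0; i < NINODES; i++) {
            if (!(bitmap & ((uint64_t)1 << i))) {
                bitmap |= (uint64_t)1 << i;
                return i;
            }
        }
        return -1;
    }

    static void free_inode(int i)
    {
        bitmap &= ~((uint64_t)1 << i);   /* clearing the bit makes the number reusable */
    }

    int main(void)
    {
        int a = alloc_inode();   /* 0 */
        int b = alloc_inode();   /* 1 */
        free_inode(a);
        int c = alloc_inode();   /* 0 again: the number was reused */
        printf("%d %d %d\n", a, b, c);
        return 0;
    }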
From the source code for ialloc.c, where i-nodes are allocated:
There are two policies for allocating an inode. If the new inode is a directory, then a forward search is made for a block group with both free space and a low directory-to-inode ratio; if that fails, then of the groups with above-average free space, that group with the fewest directories already is chosen. For other inodes, search forward from the parent directory's block group to find a free inode.
The source code that manages this for Ext3 is called ialloc and the definitive version is here: https://github.com/torvalds/linux/blob/master/fs/ext3/ialloc.c
I guess the DB application would need to consider the case where the file is restored from backup, which would preserve the file's crtime but not its inode number.
How can I mount and unmount a file as a loop device and get exactly the same MD5 checksum afterwards? (Linux)
Here's the workflow:
- I take a fresh copy of a fixed template file which contains a prepared ext2 root file system.
- The file is mounted with mount -t ext2 <file> <mountpoint> -o loop,sync,noatime,nodiratime
  (here, some files will be added in the future; but ignore this for a moment and focus on mount/umount).
- umount
- Take the MD5 sum of the file.
I expect the same, reproducible checksum every time I perform exactly the same steps.
However, when I repeat the process (remember: taking a fresh copy of the template file), I always get a different checksum.
I assume, on the one hand, that some timestamps are still set internally (I tried to avoid this with the noatime option) or, on the other hand, that Linux manages the file system in its own way where I have no influence. That means the files and timestamps inside might be the same, but the way the file system is arranged inside the file might differ and is therefore somewhat random.
In comparison, when I create a zip file of a file tree, and I touched all files with a defined timestamp, the checksum of the zip file is reproducible.
Is there any way to keep the mount and file access as controlled as I need?
It depends on the file system's on-disk format. I believe ext2 keeps at least a mount count: how many times the file system has been mounted. I don't remember any mount option telling it not to write that counter (and perhaps other data items), but you can:
a. Mount the file system read-only (adding ro to the -o options). Then the checksum will not change, of course.
b. Change the ext2 file system kernel driver to add an option not to update the counter and possibly other data bits.
The more interesting question is why you are interested in such an option. There is probably a better way to achieve what you are trying to do, whatever it is.
I'm developing a Linux-based appliance using an ALIX 2D13.
I've developed a script that creates an image file, creates the partitions, installs the boot loader (syslinux), the kernel, and the initrd, and puts the root filesystem files into the right partition.
Configuration files live on a tmpfs filesystem and are created at system startup by software that reads an XML file residing on its own partition.
I'm looking for a way to update the filesystem, and I've considered two solutions:
the firmware update is a compressed file containing the kernel, the initrd, and/or a rootfs partition image; on reboot, the initrd takes care of dd'ing the rootfs image to the right partition;
the firmware update is a compressed file containing two tar archives, one for the boot and one for the root filesystem.
Every solution has its own advantages:
- a filesystem image lets me delete any unused files, but it takes a lot of time and will wear out the CompactFlash quickly;
- an archive is smaller and needs less time to apply, but I'll end up with chaos on the root filesystem in short order.
An alternative would be to put a file list and a pre/post-update script into the tar archive, so that any file not in the list gets deleted.
What do you think?
I used the following approach. It was somewhat based on the paper "Building Murphy-compatible embedded Linux systems," available here. I used the versions.conf stuff described in that paper, not the cfgsh stuff.
Use a boot kernel whose job is to loop-back mount the "main" root file system. If you need a newer kernel, then kexec into that newer kernel right after you loop-back mount it. I chose to put the boot kernel's complete init in initramfs, along with busybox and kexec (both statically linked), and my init was a simple shell script that I wrote.
One or more "main OS" root file systems exist on an "OS image" file system as disk image files. The boot kernel chooses one of these based on a versions.conf file. I only maintain two main OS image files, the current and fall-back file. If the current one fails (more on failure detection later), then the boot kernel boots the fall-back. If both fail or there is no fall-back, the boot kernel provides a shell.
System config is on a separate partition. This normally isn't upgraded, but there's no reason it couldn't be.
There are four total partitions: boot, OS image, config, and data. The data partition is for user application stuff that is intended for frequent writing. boot is never mounted read/write. OS image is only (re-)mounted read/write during upgrades. config is only mounted read/write when config stuff needs to change (hopefully never). data is always mounted read/write.
The disk image files each contain a full Linux system, including a kernel, init scripts, user programs (e.g. busybox, product applications), and a default config that is copied to the config partition on the first boot. The files are whatever size is necessary to fit everything in them. As long as I allowed enough room for growth so that the OS image partition is always big enough to fit three main OS image files (during an upgrade, I don't delete the old fall-back until the new one is extracted), I can let the main OS image grow as needed. These image files are always (loop-back) mounted read-only. Using these files also takes out the pain of dealing with failures when upgrading individual files within a rootfs.
Upgrades are done by transferring a self-extracting tarball to a tmpfs. The beginning of this script remounts the OS image read/write, then extracts the new main OS image to the OS image file system, and then updates the versions.conf file (using the rename method described in the "murphy" paper). After this is done, I touch a stamp file indicating an upgrade has happened, then reboot.
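This is not my actual upgrade script, but the rename method boils down to an atomic replace; a sketch with made-up file names and contents:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Atomically replace versions.conf: write a temp file, flush it, then
     * rename() it over the old one. rename() is atomic, so a crash leaves
     * either the old file or the new one, never a half-written mix. */
    int main(void)
    {
        FILE *f = fopen("versions.conf.new", "w");
        if (!f) { perror("fopen"); return 1; }

        fprintf(f, "current=os-image-2.img\nfallback=os-image-1.img\n"); /* example */
        fflush(f);
        fsync(fileno(f));   /* make sure the data is on disk before the rename */
        fclose(f);

        if (rename("versions.conf.new", "versions.conf") != 0) {
            perror("rename");
            return 1;
        }

        /* fsync the containing directory so the rename itself is durable. */
        int dfd = open(".", O_RDONLY | O_DIRECTORY);
        if (dfd >= 0) { fsync(dfd); close(dfd); }
        return 0;
    }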
The boot kernel looks for this stamp file. If it finds it, it moves it to another stamp file, then boots the new main OS image file. The main OS image file is expected to remove the stamp file when it starts successfully. If it doesn't, the watchdog will trigger a reboot, and then the boot kernel will see this and detect a failure.
You will note there are a few possible points of failure during an upgrade: syncing the versions.conf during the upgrade, and touching/removing the stamp files (three instances). I couldn't find a way to reduce these further and achieve everything I wanted. If anyone has a better suggestion, I'd love to hear it. File system errors or power failures while writing the OS image could also occur, but I'm hoping the ext3 file system will provide some chance of surviving in that case.
You can have a separate partition for updates (say Side1/Side2).
The existing kernel and rootfs live on Side1; put the update on Side2 and switch.
This way you reduce wear and increase the flash's life, but the device gets costlier.
You can quick-format the partitions before extracting the tar files. Or go with the image solution but use the smallest possible image, and after dd do a filesystem resize (although that is not necessary for read-only storage).