how to get linux file modification history including who:group

how to get linux file modification history including who:group - linux

I want to trace the modification history of a specific file. Does linux have commands allow me to get some details like who, when, did what(access, write, execute).
eg.
bob:adm read $timestamp
alice:user write $timestamp
I know stat can do some of it, but can it also show me who/group modified the file?

Most File Systems simply don't track those data.
Journaling file systems track them (at least the write operations) but usually just the last ones.
Without a very specific file system, you won't have those data.

Related

How to prevent changes to the underlying file after mmap()-ing a file from being visible to my program?

According to mmap() manpage:
MAP_PRIVATE
Create a private copy-on-write mapping. Updates to the mapping are not visible to other processes mapping the same file, and are not carried through to the underlying file. It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.
Question: How to prevent changes to the underlying file after mmap()-ing a file from being visible to my program?
Background: I am designing a data structure for a text editor designed to allow editing huge text files efficiently. The data structure is akin to an on-disk rope but with the actual strings being pointer to mmap()-ed ranges from the original file.
Since the file could be very large, there are a few restrictions around the design:
Must not load the entire file into RAM as the file may be larger than available physical RAM
Must not copy files on opening as this will make opening new files really slow
Must work on filesystems like ext4 that does not support copy-on-write (cp --reflink/ioctl_ficlone)
Must not rely on mandatory file locking, as this is deprecated, and requires specific mount option -o mand in the filesystem
As long as the changes aren't visible in my mmap(), it's ok for the underlying file to change on the filesystem
Only need to support recent Linux and using Linux-specific system APIs are ok
The data structure I'm designing would keep track of a list of unedited and edited ranges in the file by storing start and end index of the ranges into the mmap()-ed buffer. While the user is browsing through the file, ranges of text that have never been modified by the user would be read directly from a mmap() of the original file, while a swap file will store the ranges of texts that have been edited by the user but had not been saved.
When the user saves a file, the data structure would use copy_file_range to splice the swap file and the original file to assemble the new file. For this splicing to work, the original file as seen by my program must remain unchanged throughout the entire editing session.
Problem: The user may concurrently have other programs modifying the same file, possibly other text editors or some other programs that modified the text file in-place, after making unsaved changes in my text editor.
In such situation, the editor can detect such external change using inotify, and then I want to give the user two options on how to continue from this:
discard all unsaved changes and re-read the file from disk, implementing this option is fairly straightforward
allow the user to continue editing the file and later on the user should be able to save the unsaved changes in a new location or to overwrite the changes that had been made by the other program, but implementing this seems tricky
Since my editor did not make a copy of the file when it opened the file, when the other program overwrite the file, the text ranges that my data structure are tracking may become invalid because the data on-disk have changed and these changes are now visible through my mmap(). This means if my editor tried to write unsaved changes after the file has been modified from another process, it could be splicing text ranges in the old file using data from the data from the new file, which could mean that my editor could be producing a corrupt file when saving the unsaved changes.
I don't think advisory locks would have saved the situation here in all cases, as other programs may not honor advisory lock.
My ideal solution would be to make it so that when other programs overwrites the file, the system should transparently copy the file to allow my program to continue seeing the old version while the other program finishes their write to disk and make their version visible in the filesystem. I think ioctl_ficlone could have made this possible, but to my understanding, this only works with a copy-on-write filesystem like btrfs.
Is such a thing possible?
Any other suggestions to solve this problem would also be welcome.

What you want to do isn't possible with mmap, and I'm not sure if it's possible at all with your constraints.
When you map a region, the kernel may or may not actually load all of it into memory. The region of memory that lacks data will actually contain an invalid page, so when you access it, the kernel takes a page fault and maps that region into memory. That region will likely contain whatever is in that portion of the file at the time the page fault occurs. There is an option, MAP_LOCKED, which tries to prefault all of the pages in, but doesn't guarantee it, so you can't rely on it working.
In general, you cannot prevent other processes from changing a file out from under you. Some tools (including editors) will write a new file to the side, calling rename to overwrite the file, and some will rewrite the file in place. The former is what you want, but many editors choose to do the latter, since it preserves characteristics such as ACLs and permissions you can't restore.
Furthermore, you really don't want to use mmap on any file you can't totally control, because if another process truncates the file and you try to access that portion of the buffer, your process will die with SIGBUS. Catching this signal is undefined behavior, and the only sane thing to do is die. (Also, it can be sent in other situations, such as unaligned access, and you'll have a hard time distinguishing between them.)
Ultimately, if you're not interested in copying the file, you can't guarantee someone won't change underneath you, and you'll need to be prepared for that to occur.

How to check if a file is opened in Linux?

The thing is, I want to track if a user tries to open a file on a shared account. I'm looking for any record/technique that helps me know if the concerned file is opened, at run time.
I want to create a script which monitors if the file is open, and if it is, I want it to send an alert to a particular email address. The file I'm thinking of is a regular file.
I tried using lsof | grep filename for checking if a file is open in gedit, but the command doesn't return anything.
Actually, I'm trying this for a pet project, and thus the question.

The command lsof -t filename shows the IDs of all processes that have the particular file opened. lsof -t filename | wc -w gives you the number of processes currently accessing the file.

The fact that a file has been read into an editor like gedit does not mean that the file is still open. The editor most likely opens the file, reads its contents and then closes the file. After you have edited the file you have the choice to overwrite the existing file or save as another file.

You could (in addition of other answers) use the Linux-specific inotify(7) facilities.
I am understanding that you want to track one (or a few) particular given file, with a fixed file path (actually a given i-node). E.g. you would want to track when /var/run/foobar is accessed or modified, and do something when that happens
In particular, you might want to install and use incrond(8) and configure it thru incrontab(5)
If you want to run a script when some given file (on a native local, e.g. Ext4, BTRS, ... but not NFS file system) is accessed or modified, use inotify incrond is exactly done for that purpose.
PS. AFAIK, inotify don't work well for remote network files, e.g. NFS filesystems (in particular when another NFS client machine is modifying a file).
If the files you are fond of are somehow source files, you might be interested by revision control systems (like git) or builder systems (like GNU make); in a certain way these tools are related to file modification.
You could also have the particular file system sits in some FUSE filesystem, and write your own FUSE daemon.
If you can restrict and modify the programs accessing the file, you might want to use advisory locking, e.g. flock(2), lockf(3).
Perhaps the data sitting in the file should be in some database (e.g. sqlite or a real DBMS like PostGreSQL ou MongoDB). ACID properties are important ....
Notice that the filesystem and the mount options may matter a lot.
You might want to use the stat(1) command.
It is difficult to help more without understanding the real use case and the motivation. You should avoid some XY problem
Probably, the workflow is wrong (having a shared file between several users able to write it), and you should approach the overall issue in some other way. For a pet project I would at least recommend using some advisory lock, and access & modify the information only thru your own programs (perhaps setuid) using flock (this excludes ordinary editors like gedit or commands like cat ...). However, your implicit use case seems to be well suited for a DBMS approach (a database does not have to contain a lot of data, it might be tiny), or some index locked file like GDBM library is handling.
Remember that on POSIX systems and Linux, several processes can access (and even modify) the same file simultaneously (unless you use some locking or synchronization).
Reading the Advanced Linux Programming book (freely available) would give you a broader picture (but it does not mention inotify which appeared aften the book was written).

You can use ls -lrt, it displays the last RW operations in the shell. Then you can conclude whether the file is opened or not. Make sure that you are in the exact directory.

I/O Performance in Linux

File A in a directory which have 10000 files, and file B in a directory which have 10 files, Would read/write file A slower than file B?
Would it be affected by different journaling file system?

No.
Browsing the directory and opening a file will be slower (whether or not that's noticeable in practice depends on the filesystem). Input/output on the file is exactly the same.
EDIT:
To clarify, the "file" in the directory is not really the file, but a link ("hard link", as opposed to symbolic link), which is merely a kind of name with some metadata, but otherwise unrelated to what you'd consider "the file". That's also the historical reason why deleting a file is done via the unlink syscall, not via a hypothetical deletefile call. unlink removes the link, and if that was the last link (but only then!), the file.
It is perfectly legal for one file to have a hundred links in different directories, and it is perfectly legal to open a file and then move it to a different place or even unlink it (while it remains open!). It does not affect your ability to read/write on the file descriptor in any way, even when a file (to your knowledge) does not even exist any more.

In general, once a file has been opened and you have a handle to it, the performance of accessing that file will be the same no matter how many other files are in the same directory. You may be able to detect a small difference in the time it takes to open the file, as the OS will have to search for the file name in the directory.

Journaling aims to reduce the recover time from file system crashes, IMHO, it will not affect the read/write speed of files. Journaling ext2

Files informations on unix-based file systems

When I create a new file (eg. touch file.txt) its size equals to 0B.
I'm wondering where are its informations (size, last modify date, owner name, file name) stored.
These informations are stored on hd and are managed by kernel, of course, but I'd love to know something more about them:
Where and how I may get them, using a programming language as C, for example, and how I may change them.
Are these informations changeable, simply using a programming language, or maybe kernel avoids this operations?
I'm working on Unix based file systems, and I'm asking informations especially about this fs.

On unix system, they're traditionally stored in the metadata part of a file representation called an inode
You can fetch this information with the stat() call, see these fields, you can change the owner and permissions with chown() and chmod()

This information is retrievable using the stat() function (and others in its family). Where it's stored is up to the specific file system and for what should be obvious reasons, you cannot change them unless you have raw access to the drive -- and that should be avoided unless you're ok with losing everything on that drive.

The metadata such as owner, size and dates are usually stored in a structure called index-node (inode), which resides in the filesystem's superblock.

Append only file

I'm trying to realize a file. Each event just appends one line to the file. So far this is a no brainer. The hard part is that several users are supposed to be able to add entries to that file but no one is supposed to be able to modify or delete existing ones. Can I somehow enforce this using file access rights? I'm using Linux.
Thx

On linux you have the option of using the system append-only flag. This is not available on all filesystems.
This attribute is set using the chattr utility and you should view the man page. Only root can set this attribute.
On Ubuntu you'll probably end up doing:
sudo chattr +a filename

The classic permissions, read, write, and execute won't get you there. If you have write permission you can delete the file, and all the lines in it.
You'll need some kind of program to arbitrate the file access. One way would be to open up a fifo and have the producers write to the fifo. If the writes are not too big (4k writes are atomic on my linux box) the different writes won't get intermixed. By making the consumer process have priviledges that the producers don't have, the producers won't be able to see the final results.
You could use something like syslog to do this.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string