How to detect that a file has been deleted - Linux

I am writing a program to monitor the file system, but I'm not able to detect when a file is deleted. I tried monitoring with the FAN_MARK_ONLYDIR flag, hoping fanotify would raise some event when a file is deleted in a monitored directory, but got no results.
Is it even possible to do this using fanotify? Is there something that could help me do this?

According to a linuxquestions.org thread, fanotify doesn't detect file replacement or deletion, nor subdirectory creation, renaming, or deletion. Also see the baach.de discussion, which compares inotify, dnotify, fam, fanotify, tripwire, Python-fuse, and llfuse (Python), among other file and directory change monitors.
inotify supports IN_DELETE and IN_DELETE_SELF events, and if you are working with a limited number of directories rather than an entire filesystem, it is practical to use.
Edit: Among the inotify limitations and caveats mentioned in its man page are the following:
inotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional watches must be created. This can take a significant amount of time for large directory trees. ... If monitoring an entire directory subtree, and a new subdirectory is created in that tree, be aware that by the time you create a watch for the new subdirectory, new files may already have been created in the subdirectory. Therefore, you may want to scan the contents of the subdirectory immediately after adding the watch.
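For example, a minimal (untested) sketch of consuming those events from Python via ctypes could look like the following, where /tmp/watched is just a placeholder directory:

# Minimal sketch, not production code: watch one directory for deletions
# using inotify via ctypes. /tmp/watched is a placeholder path.
import ctypes
import os
import struct

libc = ctypes.CDLL("libc.so.6", use_errno=True)

IN_DELETE      = 0x00000200   # something was deleted inside the watched dir
IN_DELETE_SELF = 0x00000400   # the watched directory itself was deleted

fd = libc.inotify_init()
if fd < 0:
    raise OSError(ctypes.get_errno(), "inotify_init failed")

wd = libc.inotify_add_watch(fd, b"/tmp/watched", IN_DELETE | IN_DELETE_SELF)
if wd < 0:
    raise OSError(ctypes.get_errno(), "inotify_add_watch failed")

HEADER = struct.Struct("iIII")        # wd, mask, cookie, len

while True:
    buf = os.read(fd, 4096)           # blocks until at least one event arrives
    offset = 0
    while offset < len(buf):
        _wd, mask, _cookie, name_len = HEADER.unpack_from(buf, offset)
        name = buf[offset + HEADER.size:offset + HEADER.size + name_len]
        if mask & IN_DELETE:
            print("deleted:", name.rstrip(b"\0").decode())
        if mask & IN_DELETE_SELF:
            print("the watched directory itself was deleted")
        offset += HEADER.size + name_len

Remember the recursion caveat above: this watches a single directory, not a whole tree.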

Related

When writing to a newly created file, can I create the directory entry only after writing is completed?

I'm writing a file that takes minutes to write. External software watches for this file to appear, but unfortunately it doesn't listen for inotify IN_CLOSE_WRITE events; instead it periodically checks whether the file is there and then starts to process it, which will fail if the file is incomplete. I cannot fix the external software. A workaround I've been using so far is to write a temporary file and then rename it when it's finished, but this workaround complicates my workflow for reasons beyond the scope of this question¹.
Files are not directory entries. Using hard links, there can be multiple pointers to the same file. When I open a file for writing, both the inode and the directory entry are created immediately. Can I prevent this? Can I postpone the creation of the directory entry until the file is closed, rather than creating it when the file is opened for writing?
Example Python code, but the question is not specific to Python:
fp = open(dest, 'w') # currently both inode and directory entry are created here
fp.write(...)
fp.write(...)
fp.write(...)
fp.close() # I would like to create the directory entry only here
Reading everything into memory and then writing it all in one go is not a good solution, because writing will still take time and the file might not fit into memory.
I found the related question Is it possible to create an unlinked file on a selected filesystem?, but I would want to first create an anonymous/unnamed file, then name it when I'm done writing (I agree with the answer there that creating an inode is unavoidable, but that's fine; I just want to postpone naming it).
Tagging this as linux, because I suspect the answer might be different between Linux and Windows and I only need a solution on Linux.
¹Many files are produced in parallel within dask graphs, and injecting a "move as soon as finished" task in our system would be complicated, so we're really renaming 50 files when 50 files have been written, which causes delays.
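To illustrate what I mean, here is an untested sketch of the "anonymous file first, name it later" idea, assuming a Linux kernel and filesystem that support O_TMPFILE (dest is a made-up path):

# Untested sketch: write to an anonymous O_TMPFILE inode, then give it a
# name at the end with linkat(AT_SYMLINK_FOLLOW) on /proc/self/fd/<fd>.
# Assumes Linux with O_TMPFILE support; dest is a hypothetical path.
import ctypes
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)
AT_FDCWD = -100            # from <fcntl.h>
AT_SYMLINK_FOLLOW = 0x400  # from <fcntl.h>

dest = "/data/output.txt"                 # hypothetical final name
fd = os.open(os.path.dirname(dest), os.O_TMPFILE | os.O_WRONLY, 0o644)
try:
    os.write(fd, b"first chunk\n")        # inode exists, no directory entry yet
    os.write(fd, b"second chunk\n")
    # ... minutes of further writes ...
    proc_path = ("/proc/self/fd/%d" % fd).encode()
    if libc.linkat(AT_FDCWD, proc_path, AT_FDCWD, dest.encode(),
                   AT_SYMLINK_FOLLOW) != 0:   # directory entry appears only now
        raise OSError(ctypes.get_errno(), "linkat failed")
finally:
    os.close(fd)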

Counter file placement and naming convention

OK, this one might be stupid, but I'm losing too much time overthinking a solution.
I have a web app with two different kinds of payment modules.
These modules each need a counter file, incremented each time someone wants to pay, and locked while incrementing to make sure each payment gets a unique payment reference.
The files were placed inside the main directory (public_html) and have been overwritten by a bad versioning move.
So I want to move them outside of public_html, where I already placed the main config file.
But having these critical files placed at the root of my FTP space sounds stupid and dangerous, so I'll create a directory to place them in.
This is a lot of text just to ask this:
What would you call this directory?
IMO, your question is not specifically related to PHP; it's a common issue. You can use one of the standard directories to share data between applications.
/var
From the Filesystem Hierarchy Standard (FHS):
/var contains variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files.
(read more)
Some options:
You can store your files directly in /var.
/var/tmp can also hold temporary files for a longer time and is not cleaned after a reboot (depending on your system).
Or you can create a custom subdirectory in /var/opt with a name relevant to your application.
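For example, a counter file under such a directory, incremented under an exclusive lock, could look roughly like this (Python just for illustration, even if your app is PHP; the path is only an example):

# Illustration only: increment a counter file under an exclusive flock.
# /var/opt/mywebapp is a hypothetical directory for the app's data files.
import fcntl
import os

COUNTER = "/var/opt/mywebapp/payment_counter"

def next_payment_reference():
    fd = os.open(COUNTER, os.O_RDWR | os.O_CREAT, 0o660)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)    # block concurrent incrementers
        data = os.read(fd, 64)
        value = int(data) if data.strip() else 0
        value += 1
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, str(value).encode())
        return value                      # unique payment reference
    finally:
        os.close(fd)                      # closing the fd releases the lock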

Daemon for file watching / reporting in the whole UNIX OS

I have to write a Unix/Linux daemon that watches for a particular set of files (e.g. *.log) in any directory, across various locations, and reports them to me. Then I have to read all the newly modified files, process them, and push the grepped data into Elasticsearch.
Any suggestions on how this can be achieved?
I tried various Perl modules (e.g. File::ChangeNotify, File::Monitor), but for these I need to specify the directories, which I don't want: I need the list of files to be generated dynamically, and I also need their content.
Is there any way to hook the OS system calls for file creation and then read the newly generated/modified files?
Not as easy as it sounds, unfortunately. You have hooks into inotify (on some platforms) that let you trigger an event on a particular inode changing.
But for wider-scope change tracking, you're really talking about audit and accounting tracking - this isn't a small topic though - not a lot of people do auditing, and there's a reason for that: it's complicated and very platform specific (even different versions of Linux do it differently). Your favourite search engine should be able to help you find answers relevant to your platform.
It may be simpler to run a scheduled task in cron - but not too frequently, because repeatedly scanning a filesystem like that is expensive - along with File::Find or similar to just run a search occasionally.
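A rough, untested sketch of that occasional-search idea (Python instead of File::Find; the roots, pattern and state file are just placeholders):

# Rough sketch for a cron job: find *.log files modified since the last run.
# ROOTS, the pattern and STATE_FILE are placeholders.
import fnmatch
import os
import time

STATE_FILE = "/var/tmp/logwatch.last_run"
ROOTS = ["/var/log", "/srv"]

def last_run_time():
    try:
        with open(STATE_FILE) as f:
            return float(f.read())
    except (OSError, ValueError):
        return 0.0

def changed_logs(since):
    for root in ROOTS:
        for dirpath, _dirs, filenames in os.walk(root):
            for name in fnmatch.filter(filenames, "*.log"):
                path = os.path.join(dirpath, name)
                try:
                    if os.path.getmtime(path) > since:
                        yield path
                except OSError:
                    pass                  # file vanished between listing and stat

now = time.time()
for path in changed_logs(last_run_time()):
    print("changed:", path)               # here you would grep and push to Elasticsearch
with open(STATE_FILE, "w") as f:
    f.write(str(now))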

In a kernel module, how to know whether given inode belongs to a specific directory?

One possible way is to compare the given inode with the list of inodes in that directory. The list of inodes could be predetermined, or it could be computed at run time; both ways have their own problems:
Predetermined list: the list can change during the operation, i.e. files could be added to or removed from that directory.
Run-time list: if the directory has too many files, it's too much overhead for every access to any file in the system.
Is there any efficient solution/way to do this? I have tried comparing files by path, which was really a bad idea.
Whether you do it in kernel mode or in user mode, there is no advantage either way. To see whether an inode is indeed in some directory, you have to read that directory, because files are normally stored in directories as a linear list. This can make your process block while waiting for directory blocks that are not cached, and in that time the directory contents can be modified. Keeping the directory inode locked during the whole operation would help, but it can add severe performance restrictions to your operating system. Another issue is that each filesystem is free to implement directory contents in its own format. In userland you get a uniform directory format, but in kernel mode you have to deal with the different approaches of different filesystem types. Why do you need to know that? I can't imagine a scenario where this is needed. Perhaps you can redesign your algorithm so that the directory contents are unnecessary.
By the way, dealing with complete paths or searching directories has obscure race conditions that can leave your system blocked in some way. What happens if, in the middle of your search, somebody tries to unlink the inode you are searching for, or the directory contents have to be modified, or some other process is using namei() to traverse your directory upwards or downwards? Have you thought about all these possibilities?
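Just to illustrate the "read the directory and compare" idea from userland (this is not kernel code, and the example assumes /etc/hostname exists):

# Userland illustration only: a directory is essentially a list of
# (name, inode) entries, so a membership test means walking that list.
import os

def inode_in_directory(inode, directory):
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.inode() == inode:     # d_ino as recorded in the directory
                return True
    return False

ino = os.stat("/etc/hostname").st_ino
print(inode_in_directory(ino, "/etc"))     # True, unless /etc changed meanwhile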

I/O Performance in Linux

File A is in a directory containing 10000 files, and file B is in a directory containing 10 files. Would reading/writing file A be slower than reading/writing file B?
Would it be affected by different journaling file systems?
No.
Browsing the directory and opening a file will be slower (whether or not that's noticeable in practice depends on the filesystem). Input/output on the file is exactly the same.
EDIT:
To clarify, the "file" in the directory is not really the file, but a link ("hard link", as opposed to symbolic link), which is merely a kind of name with some metadata, but otherwise unrelated to what you'd consider "the file". That's also the historical reason why deleting a file is done via the unlink syscall, not via a hypothetical deletefile call. unlink removes the link, and if that was the last link (but only then!), the file.
It is perfectly legal for one file to have a hundred links in different directories, and it is perfectly legal to open a file and then move it to a different place or even unlink it (while it remains open!). It does not affect your ability to read/write on the file descriptor in any way, even when a file (to your knowledge) does not even exist any more.
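You can see that last point for yourself with a tiny sketch (the path is just an example):

# Demonstration: an open descriptor keeps working after the last name is gone.
import os

path = "/tmp/unlink-demo.txt"              # placeholder path
f = open(path, "w+")
f.write("still readable\n")
f.flush()
os.unlink(path)                            # remove the only directory entry
f.seek(0)
print(f.read())                            # -> "still readable"
f.close()                                  # only now is the file really freed
print(os.path.exists(path))                # -> False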
In general, once a file has been opened and you have a handle to it, the performance of accessing that file will be the same no matter how many other files are in the same directory. You may be able to detect a small difference in the time it takes to open the file, as the OS will have to search for the file name in the directory.
Journaling aims to reduce the recovery time after file system crashes; IMHO, it will not affect the read/write speed of files. See also: Journaling ext2.
