How can one monitor what part of big file changed

How can one monitor what part of big file changed - linux

Is there solution for Linux kernel-3.0 (or later) that allows one to get notifications similar to inotify describing particular segment of file that was changed?
There was fschange patch for up to kernel-2.6.21. Is there any up to date solution available? Is recent fanotify able to provide the functionality?

Not that I know of, but there is a way to sort of hack the functionality by using the file change notification as an indicator to read the on disk format of the file system an examine the internal file system block allocation tables to learn whats changed.
It's tricky to do, suffers from race conditions and probably a bad idea, but if you must and coding an fschange on top of 3.0 is not an option for you, it might be the way to go.

IMO... forget using inotify unless "the pretty" is important. Other than that, you can setup a cronjob with a script doing a diff or using FIND with the MTIME option.

Related

Detecting if the code read the specified input file

I am writing some automated tests for testing code and providing feedback to the programmer.
One of the requirements is to detect if the code has successfully read the specified input file. If not - we need to provide feedback to the user accordingly. One way to detect this was atime timestamp, but since our server drive is mounted with relatime option - we are not getting atime updates for every file read. Changing this option to record every atime is not feasible as it slows down our I/O operations significantly.
Is there any other alternative that we can use to detect if the given code indeed reads the specified input file?

Here's a wild idea: intercept read call at some point. One of possible approaches goes more or less like this:
The program makes all its reading through an abstraction. For example, MyFileUtils.read(filename) (custom) instead of File.read(filename) (stdlib).
During normal operation, MyFileUtils simply delegates the work to File (or whatever system built-in libraries/calls you use).
But under test, MyFileUtils is replaced with a special test version which, along with the delegation, also reports usage to the framework.
Note that in some environments/languages it might be possible to inject code into File directly and the abstraction will not be needed.

I agree with Sergio: touching a file doesn't mean that it was read successfully. If you want to be really "sure"; those programs have to "send" some sort of indication back. And of course, there are many options to get that.
A pragmatic way could be: assuming that those programs under test create log files; your "test monitor" could check that the log files contain fixed entries such as "reading xyz PASSED" or something alike.
If your "code under test" doesn't create log files; maybe: consider changing that.

Daemon for file watching / reporting in the whole UNIX OS

I have to write a Unix/Linux daemon, which should watch for particular set of files (e.g. *.log) in any of the file directories, across various locations and report it to me. Then I have to read all the newly modified files and then I have to process them and push grepped data into Elasticsearch.
Any suggestion on how this can be achieved?
I tried various Perl modules (e.g. File::ChangeNotify, File::Monitor) but for these I need to specify the directories, which I don't want: I need the list of files to be dynamically generated and I also need the content.
Is there any method that I can call OS system calls for file creation and then read the newly generated/modified file?

Not as easy as it sounds unfortunately. You have hooks to inotify (on some platforms) that let you trigger an event on a particular inode changing.
But for wider scope changing, you're really talking about audit and accounting tracking - this isn't a small topic though - not a lot of people do auditing, and there's a reason for that. It's complicated and very platform specific (even different versions of Linux do it differently). Your favourite search engine should be able to help you find answers relevant to your platform.
It may be simpler to run a scheduled task in cron - but not too frequently, because spinning a filesystem like that is dirty - along with File::Find or similar to just run a search occasionally.

Persistant storage values handling in Linux

I have a QSPI flash on my embedded board.
I have a driver + process "Q" to handle reading and writing into.
I want to store variables like SW revisions, IP, operation time, etc.
I would like to ask for suggestions how to handle the different access rights to write and read values from user space and other processes.
I was thinking to have file for each variable. Than I can assign access rights for those files and process Q can change the value in file if value has been changed. So process Q will only write into and other processes or users can only read.
But I don't know about writing. I was thinking about using message queue or zeroMQ and build the software around it but i am not sure if it is not overkill. But I am not sure how to manage access rights anyway.
What would be the best approach? I would really appreciate if you could propose even totally different approach.
Thanks!

This question will probably be downvoted / flagged due to the "Please suggest an X" nature.
That said, if a file per variable is what you're after, you might want to look at implementing a FUSE file system that wraps your SPI driver/utility "Q" (or build it into "Q" if you get to compile/control source to "Q"). I'm doing this to store settings in an EEPROM on a current work project and its turned out nicely. So I have, for example, a file, that when read, retrieves 6 bytes from EEPROM (or a cached copy) provides a MAC address in std hex/colon-separated notation.
The biggest advantage here, is that it becomes trivial to access all your configuration / settings data from shell scripts (e.g. your init process) or other scripting languages.
Another neat feature of doing it this way is that you can use inotify (which comes "free", no extra code in the fusefs) to create applications that efficiently detect when settings are changed.
A disadvantage of this approach is that it's non-trivial to do atomic transactions on multiple settings and still maintain normal file semantics.

intercepting file system system calls

I am writing an application for which I need to intercept some filesystem system calls eg. unlink. I would like to save some file say abc. If user deletes the file then I need to copy it to some other place. So I need unlink to call my code before deleting abc so that I could save it. I have gone through threads related to intercepting system calls but methods like LD_PRELOAD it wont work in my case because I want this to be secure and implemented in kernel so this method wont be useful. inotify notifies after the event so I could not be able to save it. Could you suggest any such method. I would like to implement this in a kernel module instead of modifying kernel code itself.
Another method as suggested by Graham Lee, I had thought of this method but it has some problems ,I need hardlink mirror of all the files it consumes no space but still could be problematic as I have to repeatedly mirror drive to keep my mirror up to date, also it won't work cross partition and on partition not supporting link so I want a solution through which I could attach hooks to the files/directories and then watch for changes instead of repeated scanning.
I would also like to add support for write of modified file for which I cannot use hard links.
I would like to intercept system calls by replacing system calls but I have not been able to find any method of doing that in linux > 3.0. Please suggest some method of doing that.

As far as hooking into the kernel and intercepting system calls go, this is something I do in a security module I wrote:
https://github.com/cormander/tpe-lkm
Look at hijacks.c and symbols.c for the code; how they're used is in the hijack_syscalls function inside security.c. I haven't tried this on linux > 3.0 yet, but the same basic concept should still work.
It's a bit tricky, and you may have to write a good deal of kernel code to do the file copy before the unlink, but it's possible here.

One suggestion could be Filesystems in Userspace (FUSE.) That is, write a FUSE module (which is, granted, in userspace) which intercepts filesystem-related syscalls, performs whatever tasks you want, and possibly calls the "default" syscall afterwards.
You could then mount certain directories with your FUSE filesystem and, for most of your cases, it seems like the default syscall behavior would not need to be overridden.

You can watch unlink events with inotify, though this might happen too late for your purposes (I don't know because I don't know your purposes, and you should experiment to find out). The in-kernel alternatives based on LSM (by which I mean SMACK, TOMOYO and friends) are really for Mandatory Access Control so may not be suitable for your purposes.

If you want to handle deletions only, you could keep a "shadow" directory of hardlinks (created via link) to the files being watched (via inotify, as suggested by Graham Lee).
If the original is now unlinked, you still have the shadow file to handle as you want to, without using a kernel module.

How to tune inotify to use less memory?

I'm working on an embedded linux system.
I tried to use inotify to monitor changes. But when I tried to monitor a huge numbers of folders (let's say more than 6000 folders), inotify uses a lot of memory (about 25-30MB). As you all know, 25-30MB in embedded system is considered to be large...
My questions are;
is this normal?
is anyone know how to tune this?
any alternative to monitor a huge numbers of folders without adding watch in each folder?

As far as I know a recursive watch is not possible with an unpatched Linux kernel. See also https://superuser.com/questions/118642/recursive-filesystem-notifications-inotify-for-ubuntu-karmic-koala . Maybe fanotify would work for you, but it needs a kernel patch.

Look into using Auditd.
There is also a user space file system called loggedfs, but I couldn't get that to work.

It's inevitable to monitor directories recursively when using inotify.
reference:
http://en.wikipedia.org/wiki/Inotify#Limitations
In order to improve inotify(7) performance(reduce memory usage, maybe), my suggestion is:
Whenever you start watching a directory, just focus on types of inotify_event that interest you(as less as possible), you can adjust the mask argument of inotify_add_watch(2) function to achieve this. Setting the mask argument value to IN_ALL_EVENTS to monitor all kinds to inotify_event is sometimes unnecessary.
Hope this helps.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How can one monitor what part of big file changed - linux

IMO... forget using inotify unless "the pretty" is important. Other than that, you can setup a cronjob with a script doing a diff or using FIND with the MTIME option.

Related

Detecting if the code read the specified input file

Daemon for file watching / reporting in the whole UNIX OS

Persistant storage values handling in Linux

intercepting file system system calls

How to tune inotify to use less memory?

Categories

Resources