Linux add listener to the log file - linux

Is there a better way to monitor for file log changes that using inotify? (http://linux.die.net/man/7/inotify). I have several software that writes to different log files and I want to go POST query every time the new line is added to the log.
Currently, my proposal is to set inotify to listen for file changes, get data that was changed since last visit and do post.
Things that are important:
Reaction to event (at least 1 second).
Low CPU and HDD consumption.
Keeping log file (i.e. I want it to be on the machine full, unmodified).
New lines are added once in 1 min.
Thanks for ideas.

Inotify is fine for getting notified about file events such as writing etc but how will you know how much data has been appended? If you know the log files in advance you might as well read until end of file and just sleep for a short while and try again to read (something like "tail -f" does). That way you still have the pointer to where you start to read the newly written data. You could even combine that with Inotify to know when to pick up reading. If you want to just use Inotify, you probably will have to store the pointers to the last read position somewhere.

Related

Correct way to read and append file from different process in node.js

I have 2 node.js processes both point to same file on disk one of them just appends to the file, the other process just reads from file...
Is this correct design if i am ok to read half commited data? And or if there are any other things/issues to look out for?
The reason to do this is because i am trying to do Write Ahead Log which needs persistent and will not grow beyond 5MB or any better way to do it on single host?

How to read a [nonblocking] filedescriptor of a file that is appended to (aka, like tail -f)?

Actually, I am using libev; but under the hood this is using epoll (I'm only on linux). When I add a watcher to read a file and all data has been read then I do get a call back that there is data to read, but read(2) returns 0 (EOF). At that point I have to stop the watcher or else it will continue to tell me that there is something to read. However, if I stop the watcher and then some other process appends data to that file then I'll never see it.
What is the correct way to get notified that there is additional/appended data in a file that can be read when before I already read till the end?
I'd prefer the answer in terms of libev, but lower level will do too (I can then probably translate that to how to do that with libev).
It is very common, for some reason, for people to think that making an fd nonblocking, or calling poll/select/.. has different behaviour for files compared to other types of file descriptions, but nonblocking behaviour and I/O readyness behaviour is essentially the same for all of types of file descriptions: the kernel will immediately return from read/write etc. if the outcome is known, and will signal I/O readyness when this is the case. When a socket has an EOF condition, select will signal that the socket is ready to read, and you will get 0 (for EOF). The same happens for files - if you are at the end of a file, the kernel will return immediately from read and return 0 to signal EOF.
The important difference is that files can change contents at random places, and can be extended. Pipes and sockets are not random access and cannot be appended to once closed. Thus, while the behaviour is consistent, it is often not what is wanted, namely waiting for a file to change in some way.
The conflict in many people's minds is simply that they want to be told "when there is new data", but if you think about it a bit, you will realise that simply waking you up would not be an adequate interface for this, as you have no way of knowing why you woke up, and what changed.
POSIX doesn't have an interface to do that, other than regularly polling the fd or file (and in case of random changes, regularly reading the whole file!). Some operating systems have an interface to do something similar to that (kqueue on BSDs, inotify on GNU/Linux) , but they are usually not a perfect match, either (for example, inotify cannot watch an fd for changes, it will watch a path for changes).
The closest you can get with libev is to use an ev_stat watcher. It behaves as if you would stat() a path regularly, and invoke the watcher callback whenever the stat data changes. Portably, it does just that: it regularly calls stat, but on some operating systems (currently only inotify on GNU/Linux, as kqueue doesn't have correct semantics for this) it can use other mechanisms to speed this up in some cases, although it will fall back to regular stat polling everywhere, for example for when the file is on a network file system, where inotify can't see remote changes.
To answer your question: If you have a path, you can use an ev_stat watcher to watch for stat data changes, such as size/mtime etc. changes. Doing this right can be a bit tricky (see the libev documentation, especially the part about stat time resolution: http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_stat_code_did_the_file_attri), and you have to keep in mind that this watches a path, not a file descriptor, so you might want to compare the device/inode of your file descriptor and the watched path regularly to see if you still have the correct file open.
This still doesn't tell you what part of the file has changed.
Alternatively, since you apparently only want to read appended data, you could opt to just read() the file regularly (in an ev_timer callback) and do away with all the complexity and hassles of an ev_stat watcher setup (while not forgetting to also compare the path stat data with your fd stat data to see if you still hasve the right file open, depending on whether the file your are reading might get renamed or replaced. Sometimes programs also truncate files, something you can also detect by seeing the size decrease between stat calls).
This is essentially what older tail -f implementations do, while newer ones might, for example, take hints (only) from inotify, just like ev_stat watchers do.
None of that is easy, and details depend on your knowledge of how exactly the file changes, but it's the best you can do.

Determine the offset and the size of another process write

I'm working on a backup service. It tracks changes of the files in a the directory to backup. It does that by setting a watch (using inotify with Linux) and comparing the modification time and size after a file has been changed. When it is, the whole file is copied to backup. I'm thinking, could this be done much more efficient? If the backup service can determine the offset and the number of bytes written, it can just copy that, in stead of copy the whole file. I've been looking to fanotify, which offers some interesting features, like an fd to the file modified (by the other process). Now here it stops I think. There is no way as far as I can see it how the process using fanotify can determine from the fd how the file is changed.
Do I overlook something, or is it not possible to get this information?

How to open and read 1000s of files very quickly

My problem is that application takes too long to load thousands of files. Yes, I know it's going to take a long time, but I would like to make it faster by any amount of time. What I mean by "load" is open the file to get its descriptor and then read the first 100 bytes or so of it.
So, my main strategy has been to create a second thread that will open and close (without reading any contents) all the files. This seems to help because the thread runs ahead of the main thread and I'm guessing the OS is caching these file descriptors ahead of time so that when my main thread opens them it's a quick open. This has actually helped because the thread can start caching these file descriptors while my main thread is parsing the data read in from these files.
So my real question is...what else can I do to make this faster? What approaches are there? Has anyone had success doing this?
I've heard of OS prefetching calls but it was for virtual memory pages. Is there a way to tell the OS, hey I'm going to be needed all these files pretty soon - I suggest that you start gathering them for me ahead of time. My lookahead thread is pretty crude.
Are there low level disk techniques I could use? Is there possibly a pattern of file access that would help? Right now, the files that are loaded all come from the same folder. I suppose there is no way to determine where exactly on disk they lie and which ordering of file opens would be fastest for the disk. I'm also guessing that the disk has some hard ware to make this as efficient as possible too.
My application is mainly for windows, but unix suggestions would help as well.
I am programming in C++ if that makes a difference.
Thanks,
-julian
My first thought is that this is going to be hard to work around from a programmatic level.
You'll find Linux and OSX can access thousands of files like this in a fraction of the time it takes Windows. I don't know how much control you have over the machine. If you can keep the thousands of files on a FAT partition, you should see better results than with NTFS.
How often are you scanning these files and how often are they changing. If the ratio is heavily on the reading side, it would make sense to copy the start of each file into a cache. The cache could store the filename, modification time, and 100 bytes of each of the thousand files.

Reading file in Kernel Mode

I am building a driver and i want to read some files.
Is there any way to use "ZwReadFile()" or a similar function to read the
contents of the files line by line so that i can process them in a loop.
The documentation in MSDN states that :-
ZwReadFile begins reading from the given ByteOffset or the current file position into the given Buffer. It terminates the read operation under one of the following conditions:
The buffer is full because the number of bytes specified by the Length parameter has been read. Therefore, no more data can be placed into the buffer without an overflow.
The end of file is reached during the read operation, so there is no more data in the file to be transferred into the buffer.
Thanks.
No, there is not. You'll have to create a wrapper to achieve what you want.
However, given that kernel mode code has the potential to crash the system rather than the process it runs in, you have to make sure that problems such as those known from usermode with very long lines etc will not cause issues.
If the amount of data is (and will stay) below the threshold of what registry values can hold, you should use that instead. In particular REG_MULTI_SZ which has the properties you are looking for ("line-wise" storage of data).
In this situation unless performance is a critical (like 'realtime') then I would pass the filtering to a user mode service or application. Send the file name to the application to process. A user mode application is easier to test and easier to debug. It wont blue screen or hang your box either.

Resources