file system operation really "flushed" - linux

We are working on an iMX6Sx Freescale board, building the Linux kernel distro with Yocto.
I would like to know whether there is a way to check that file system operations (in particular, writes) have really completed, so that we avoid closing or killing a process while operations are still running.
To be clearer: we have to perform some actions (copying files, writes, ...) when our application has to switch off, and we need to know when they are really completed (since they are asynchronous, I think).
Thanks in advance
Andrea

If you want to ensure all the writes are committed to storage and the filesystem is updated:
call fsync() on the file descriptor,
open the parent directory and call fsync() on that file descriptor
When both of these are done, the kernel has flushed everything from memory and ensured the filesystem is updated regarding the file you operate on.
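The two steps above can be sketched as follows. This is a minimal illustration in Python, whose os module wraps the same open(), write() and fsync() system calls; the function name durable_write and the file name in the usage note are hypothetical.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then fsync the file and its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # write() may accept only part of the data
            view = view[written:]
        os.fsync(fd)  # step 1: flush the file's data and metadata
    finally:
        os.close(fd)

    # Step 2: fsync the parent directory so the directory entry is flushed too.
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling durable_write("state.bin", b"...") from the shutdown path returns only after both fsync() calls have returned, which is the point at which the application can safely exit.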
Another approach is to call sync(), which ensures all kernel data are written to storage for all files and filesystem metadata.
Note:
If your application is working with FILE* instead of file descriptors, you first need to ensure that written data is flushed from your application to the kernel, either by calling fflush() or by calling fclose() on the FILE*.
If you kill an application, any write operation it has already performed will not be cancelled or interrupted, and you can make sure it is committed to storage by calling sync(), or by opening the same file and calling fsync() on it.
If you kill an application arbitrarily, you can't expect everything to be consistent. Perhaps the application was doing two important writes to a database, config file, etc., and you terminated it after the first write; the file might then be damaged with respect to its format.
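The FILE* note has a direct analogue in any language with buffered file objects. As a rough sketch in Python (the helper name flush_to_disk is made up for illustration), the user-space buffer is flushed first, and only then is the kernel asked to flush:

```python
import os


def flush_to_disk(f) -> None:
    """Push buffered data to the kernel, then ask the kernel to write it out."""
    f.flush()             # user-space buffer -> kernel (the role fflush() plays in C)
    os.fsync(f.fileno())  # kernel cache -> storage device
```

Calling os.fsync() without the preceding flush() would leave any data still held in the application's own buffer unwritten.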

Related

node fs.fsync (when to use?)

I want to safely write a file and I want to understand the proper use/place for fsync.
https://linux.die.net/man/2/fsync
After reading ^ that, I am puzzled as to where to effectively use it.
Question, do I:
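const fs = require('fs');
// `data` and `cb` are supplied by the surrounding code.
fs.writeFile('temp/file.txt', data, 'utf-8', function (error) {
  if (error) {
    fs.unlink('temp/file.txt', function () { cb(error, undefined); });
  } else {
    fs.rename('temp/file.txt', 'real/file.txt', function (error) {
      if (error) { return cb(error, undefined); }
      // fs.fsync takes a file descriptor, not a path, so open the file first.
      fs.open('real/file.txt', 'r+', function (error, fd) {
        if (error) { return cb(error, undefined); }
        fs.fsync(fd, function (error) {
          fs.close(fd, function () { cb(error, error ? undefined : true); });
        });
      });
    });
  }
});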
I'm writing something that performs many file changes. I have looked at modules that write atomically, but I would like to understand the process.
fsync is one of those functions where it's extremely rare that you'll need to use it.
All operating systems mask the fact that storage devices are slow by caching reads and writes. When you write to a file, it doesn't immediately write to the actual storage medium; it'll capture it in a cache, tell your program that the write has completed, and go and write the contents to the storage device in the background instead. The operating system will keep everything consistent though; if another application reads from that file, it'll see the new contents, as the OS will serve the contents from cache.
Note for a moment that this isn't universal; I believe Windows disables caching for removable storage devices to prevent data loss when people pull the drive out. There's also some set of flags you can pass to open() to disable the cache.
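On Linux, one such flag is O_SYNC. Strictly speaking it does not disable the cache; it makes each write() return only after the data has been handed to the storage device. (O_DIRECT is the flag that actually bypasses the page cache, but it imposes alignment requirements and is not shown here.) A small sketch in Python, with a hypothetical helper name:

```python
import os


def write_synchronously(path: str, data: bytes) -> int:
    """Write data through a descriptor opened with O_SYNC; returns bytes written."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        # With O_SYNC, this call blocks until the data has reached storage.
        return os.write(fd, data)
    finally:
        os.close(fd)
```

The trade-off is that every write pays the full device latency, which is why this is normally reserved for small, critical files.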
For almost all use cases, you don't need to care that this happens. The only upshot for you is that your program can continue faster. There are some cases where this is problematic though:
If power is lost, the contents of the cache are lost, so the disk won't have all the new contents of the file.
If the drive is removed, writes will equally be lost. This is pretty typical for removable storage devices, and I'm pretty sure 90% of people ignore the "safely remove" prompt ;).
I think reading directly from a device (i.e. /dev/sdX in Linux) will bypass this cache, but I'm not 100% sure.
Examples of where it is needed are, say, databases. When you run an update query, the database will normally update its in-memory state, and write the mutation to a transaction log. Reliability is a good thing for a database though, so it will write to the transaction log and do an fsync on that file before responding to the user (or will have opened the transaction log as unbuffered) so there's some level of guarantee that the transaction has been persisted.
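The transaction-log idea can be sketched like this. It is a deliberately simplified illustration in Python, not how any particular database implements it; the class name, record format and newline framing are assumptions.

```python
import os


class TransactionLog:
    """Append-only log that fsyncs each record before reporting success."""

    def __init__(self, path: str) -> None:
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def commit(self, record: bytes) -> None:
        # A production implementation would also handle short writes.
        os.write(self._fd, record + b"\n")
        os.fsync(self._fd)  # only after this returns is the caller told "done"

    def close(self) -> None:
        os.close(self._fd)
```

The caller would acknowledge the update to the user only after commit() returns, so an acknowledged transaction survives a power loss.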
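In your example, the fsync is meant to ensure that the renamed file has actually been flushed to disk. Note that fs.fsync takes a file descriptor rather than a path, and that on Linux an fsync on the file does not necessarily make the directory entry created by the rename durable; for that, the containing directory has to be fsynced as well.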
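For reference, a commonly used ordering for a safe replace is: write a temporary file, fsync it, rename it over the target, then fsync the containing directory. The sketch below shows that ordering in Python rather than Node; the function name and the ".tmp" suffix are hypothetical.

```python
import os


def atomic_write(path: str, data: bytes) -> None:
    """Replace path with data so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temp name, same directory as target

    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # contents are on disk before the rename

    os.replace(tmp_path, path)  # atomic rename within one filesystem

    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the rename itself durable
    finally:
        os.close(dir_fd)
```

Syncing before the rename matters: if the rename reached the disk before the contents did, a crash could leave the real name pointing at an incomplete file.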

How to read a [nonblocking] filedescriptor of a file that is appended to (aka, like tail -f)?

Actually, I am using libev; but under the hood this is using epoll (I'm only on linux). When I add a watcher to read a file and all data has been read then I do get a call back that there is data to read, but read(2) returns 0 (EOF). At that point I have to stop the watcher or else it will continue to tell me that there is something to read. However, if I stop the watcher and then some other process appends data to that file then I'll never see it.
What is the correct way to get notified that there is additional/appended data in a file that can be read when before I already read till the end?
I'd prefer the answer in terms of libev, but lower level will do too (I can then probably translate that to how to do that with libev).
It is very common, for some reason, for people to think that making an fd nonblocking, or calling poll/select/.., has different behaviour for files compared to other types of file descriptions, but nonblocking behaviour and I/O readiness behaviour is essentially the same for all types of file descriptions: the kernel will immediately return from read/write etc. if the outcome is known, and will signal I/O readiness when this is the case. When a socket has an EOF condition, select will signal that the socket is ready to read, and you will get 0 (for EOF). The same happens for files: if you are at the end of a file, the kernel will return immediately from read and return 0 to signal EOF.
The important difference is that files can change contents at random places, and can be extended. Pipes and sockets are not random access and cannot be appended to once closed. Thus, while the behaviour is consistent, it is often not what is wanted, namely waiting for a file to change in some way.
The conflict in many people's minds is simply that they want to be told "when there is new data", but if you think about it a bit, you will realise that simply waking you up would not be an adequate interface for this, as you have no way of knowing why you woke up, and what changed.
POSIX doesn't have an interface to do that, other than regularly polling the fd or file (and in case of random changes, regularly reading the whole file!). Some operating systems have an interface to do something similar to that (kqueue on BSDs, inotify on GNU/Linux), but they are usually not a perfect match either (for example, inotify cannot watch an fd for changes, it will watch a path for changes).
The closest you can get with libev is to use an ev_stat watcher. It behaves as if you would stat() a path regularly, and invoke the watcher callback whenever the stat data changes. Portably, it does just that: it regularly calls stat. On some operating systems (currently only inotify on GNU/Linux, as kqueue doesn't have correct semantics for this) it can use other mechanisms to speed this up in some cases, although it will fall back to regular stat polling everywhere, for example when the file is on a network file system, where inotify can't see remote changes.
To answer your question: If you have a path, you can use an ev_stat watcher to watch for stat data changes, such as size/mtime etc. changes. Doing this right can be a bit tricky (see the libev documentation, especially the part about stat time resolution: http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_stat_code_did_the_file_attri), and you have to keep in mind that this watches a path, not a file descriptor, so you might want to compare the device/inode of your file descriptor and the watched path regularly to see if you still have the correct file open.
This still doesn't tell you what part of the file has changed.
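The portable behaviour described for ev_stat (stat the path regularly, report when the stat data changes) can be approximated by hand. The following Python sketch is not libev code; the function names and the choice of compared fields are assumptions.

```python
import os


def stat_signature(path: str):
    """Return the stat fields worth comparing, or None if the path is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def has_changed(path: str, previous):
    """Compare the current signature with a previous one.

    Returns (changed, current) so the caller can store `current` for next time.
    """
    current = stat_signature(path)
    return current != previous, current
```

A timer callback would call has_changed() periodically and keep the returned signature for the next round. As noted above, this only says that something changed, not which part of the file.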
Alternatively, since you apparently only want to read appended data, you could opt to just read() the file regularly (in an ev_timer callback) and do away with all the complexity and hassles of an ev_stat watcher setup. Depending on whether the file you are reading might get renamed or replaced, don't forget to also compare the path stat data with your fd stat data to see if you still have the right file open. Sometimes programs also truncate files, something you can detect by seeing the size decrease between stat calls.
This is essentially what older tail -f implementations do, while newer ones might, for example, take hints (only) from inotify, just like ev_stat watchers do.
None of that is easy, and details depend on your knowledge of how exactly the file changes, but it's the best you can do.
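The polling approach can be sketched as follows, again in Python rather than libev. The class name Follower is hypothetical; poll() would be called from a timer. It reads from the last offset, reopens the path if the device/inode no longer matches the open descriptor, and restarts from the beginning if the file shrank.

```python
import os


class Follower:
    """Poll-based reader for a file that other processes append to."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.fd = os.open(path, os.O_RDONLY)
        self.offset = 0

    def poll(self) -> bytes:
        """Return data appended since the last call (possibly empty)."""
        opened = os.fstat(self.fd)
        try:
            on_disk = os.stat(self.path)
        except FileNotFoundError:
            on_disk = None  # path is gone; keep reading the file we have open

        if on_disk is not None and (
            (on_disk.st_dev, on_disk.st_ino) != (opened.st_dev, opened.st_ino)
        ):
            # The path now names a different file: switch to it.
            # (Unread data left in the old file is dropped in this sketch.)
            os.close(self.fd)
            self.fd = os.open(self.path, os.O_RDONLY)
            self.offset = 0
            opened = os.fstat(self.fd)

        if opened.st_size < self.offset:
            self.offset = 0  # file was truncated; start over

        chunks = []
        while True:
            chunk = os.pread(self.fd, 65536, self.offset)
            if not chunk:  # read returned 0 bytes: EOF for now
                break
            chunks.append(chunk)
            self.offset += len(chunk)
        return b"".join(chunks)

    def close(self) -> None:
        os.close(self.fd)
```

The polling interval is a trade-off between latency and wasted wakeups. A truncation followed by a rewrite to a larger size between two polls would go unnoticed by this size check.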

Syncing a file system that has no file on it

Say I want to synchronize the data buffers of a file system to disk (in my case the one on a USB stick partition) on a Linux box.
While searching for a function to do that I found the following
DESCRIPTION
sync() causes all buffered modifications to file metadata and data to be written to the underlying file systems.
syncfs(int fd) is like sync(), but synchronizes just the file system containing file referred to by the open file descriptor fd.
But what if the file system has no file on it that I can open and pass to syncfs? Can I "abuse" the dot file? Does it appear on all file systems?
Is there another function that does what I want? Perhaps by providing a device file with major / minor numbers or some such?
Yes, I think you can do that. Every file system has at least one inode, the one for its root directory, so you can open the "." entry of the mount point and pass that descriptor to syncfs. You can also play around with ls -i to see the inode numbers.
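As a sketch of that idea: Python's os module has no syncfs wrapper that I am aware of, so the example below calls the C library function through ctypes. It assumes Linux with a libc that exports syncfs(); the function name sync_filesystem is hypothetical.

```python
import ctypes
import os

# Assumption: running on Linux, where the already-loaded libc exports syncfs().
_libc = ctypes.CDLL(None, use_errno=True)


def sync_filesystem(mount_point: str) -> None:
    """Flush the file system containing mount_point by opening the directory."""
    fd = os.open(mount_point, os.O_RDONLY | os.O_DIRECTORY)
    try:
        if _libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
```

Passing the mount point of the USB partition would work even when the partition contains no regular files, since the directory itself can be opened.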
Is there a possibility to avoid your problem by mounting your file system with the sync option? Would the performance penalty be a problem? Did you have a look at remounting? This can sync your file system as well in particular cases.
I do not know what your application is, but I have suffered from problems when synchronizing files to a USB stick with the FAT32 file system. It resulted in weird read and write errors. I cannot imagine any other valid reason why you would sync an empty file system.
From man 8 sync description:
"sync writes any data buffered in memory out to disk. This can include (but is not
limited to) modified superblocks, modified inodes, and delayed reads and writes. This
must be implemented by the kernel; The sync program does nothing but exercise the sync(2)
system call."
So, note that it's all about modifications (modified inodes, superblocks, etc.). If there are no modifications, there is nothing to sync.

Identifying that a file is being copied outside the computer in LKM

Assume that I have a loadable kernel module (LKM) inserted in the Linux kernel and have hooked the read, write, open and close functions. I can now stop access to any file, but I want to stop files from being copied off the device, e.g. to a USB device, card, disk, etc. What I want to know is: sitting in the LKM with these function calls hooked, how can I identify that a file is being written to an external device?
I also want to know which system calls are used during a copy operation. My understanding is that a program opens the file, reads from it (read system call) and then writes to a second file (write system call). However, I observed strange behavior when I was trying to stop write access to a file: a process that opens a file never calls write on that file to save it (checked with a PDF viewer).
If anybody has an idea about this strange behavior, or knows how to stop writing to a file, then please share it as well.
They could mmap it to do read/write. Or they could read the entire original file into memory, close it, then open the destination.
Or they could encrypt the file, then write it out to a new file on the USB.
Or they could do minor edits to the contents, then save it out.
Or they could use gvfs to access the network/USB device.
Or the user could reboot and copy the file in a different OS.
All that really highlights is that the problem is really difficult - a determined user will always find a way to extract data from a system they have access to.
Your best bet is just to prevent accidental leakage, so scan files after close on the removable media, and check they don't have contents you don't want leaked. Overwrite and delete if they do.
Or else block the devices from being mounted in the first place, and disable gvfs as well.
As to why your hook isn't intercepting the write(), either:
Your hook isn't actually intercepting the operation.
The application isn't using write() to put the content in a file.
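To illustrate the second possibility, here is a small user-space sketch (Python, hypothetical function name) that changes a file's contents through mmap. No write() system call is issued for the modification, so a hook on write() alone would not see it.

```python
import mmap


def modify_via_mmap(path: str, offset: int, payload: bytes) -> None:
    """Overwrite part of an existing, non-empty file through a memory mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
            mm[offset:offset + len(payload)] = payload  # plain memory store
            mm.flush()  # msync(): ask the kernel to write dirty pages back
```

From the kernel's point of view the change arrives as dirty pages in the mapping, not as a write() call on the descriptor.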

Is the file mutex in Linux? How to implement it?

In Windows, if I open a file with MS Word and then try to delete it,
the system will stop me. It prevents the file from being deleted.
Is there a similar mechanism in Linux?
How can I implement it when writing my own program?
There is not a similar mechanism in Linux. I, in fact, find that feature of Windows to be an incredible misfeature and a big problem.
It is not typical for a program to hold a file open that it is working on anyway unless the program is a database and updating the file as it works. Programs usually just open the file, write contents and close it when you save your document.
vim's .swp file is updated as vim works, and vim holds it open the whole time, so even if you delete it, the file doesn't really go away. vim will just lose its recovery ability if you delete the .swp file while it's running.
In Linux, if you delete a file while a process has it open, the system keeps it in existence until all references to it are gone. The name in the filesystem that refers to the file will be gone. But the file itself is still there on disk.
If the system crashes while the file is still open it will be cleaned up and removed from the disk when the system comes back up.
The reason this is such a problem in Windows is that mandatory locking frequently prevents operations that should succeed from succeeding. For example, a backup process should be able to read a file that is being written to. It shouldn't have to stop the process that is doing the writing before the backup proceeds. In many other cases, operations that should be able to move forward are blocked for silly reasons.
The semantics of most Unix filesystems (such as Linux's ext2 fs family) is that a file can be unlink(2)'d at any time, even if it is open. However, after such a call, if the file has been opened by some other process, that process can continue to read and write to the file through its open file descriptor. The filesystem does not actually free the storage until all open file descriptors have been closed. These are very long-standing semantics.
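These semantics are easy to observe from a single process. The sketch below (Python, with a hypothetical function and file name) creates a file, unlinks it while it is still open, and keeps using the descriptor.

```python
import os


def demo_unlink_while_open(directory: str):
    """Unlink an open file and show the descriptor still works afterwards."""
    path = os.path.join(directory, "scratch.txt")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, b"before unlink\n")
        os.unlink(path)                      # the name disappears immediately
        name_gone = not os.path.exists(path)
        os.write(fd, b"after unlink\n")      # the open file is still writable
        links = os.fstat(fd).st_nlink        # remaining directory entries
        os.lseek(fd, 0, os.SEEK_SET)
        content = os.read(fd, 4096)
    finally:
        os.close(fd)                         # storage is released here
    return name_gone, links, content
```

The returned content contains both lines even though the name no longer exists, and the storage is released only once the last descriptor is closed.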
You may wish to read more about file locking in Unix and Linux (e.g., the Wikipedia article on File Locking.) Basically, mandatory and advisory locks on Linux exist but they're not guaranteed to prevent what you want to prevent.
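For your own program, an advisory lock is the usual building block. The following Python sketch uses flock(2) through the fcntl module; the helper name try_lock is hypothetical. It only coordinates processes that also ask for the lock, and it does not stop anyone from deleting the file.

```python
import fcntl


def try_lock(fd: int) -> bool:
    """Try to take an exclusive advisory lock without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False  # someone else already holds the lock
```

A cooperating editor would call try_lock() after opening a document and refuse to proceed if it returns False. A plain rm on the same file still succeeds while the lock is held, which is the sense in which these locks do not give you the Windows behaviour.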

Resources