Does a 'copy' on Memory Mapped file trigger flush to disk? - linux

I have a memory mapped file opened in my Java program and I keep on writing new data into it.
Generally, the OS takes care of flushing the contents to the disk.
If, for some reason, some other process copies the file to another location (possibly with rsync), does that copy contain the latest contents of the file (i.e. the contents that are in the memory map at that instant)?
Intuitively, I feel that when a copy of a file that is memory mapped by a process is triggered, one of these two things should happen:
The contents of the file are flushed to disk, so that the other process sees the latest content of the file.
When a copy is triggered, the data is generally read into memory first (when we are doing something like rsync) and then written from memory to the destination file. If the pages of the file are already in memory because the mapping process is using them, those pages are read directly, so there is no need to flush to disk.
What happens exactly? Is it something other than the above mentioned?
Is it the same behavior on both Windows and Linux?
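To make the scenario concrete, here is a rough C-level sketch of the writer side (Java's MappedByteBuffer is built on the same mmap machinery; the file name and size are made up). Writes through a MAP_SHARED mapping land in the kernel's page cache, and msync() with MS_SYNC explicitly pushes the dirty pages to the backing file:

    /* Sketch: write through a MAP_SHARED mapping and flush it explicitly.
     * "data.bin" and the 4096-byte length are illustrative only. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 4096;
        int fd = open("data.bin", O_RDWR);   /* assumed to exist and be >= 4096 bytes */
        if (fd < 0)
            return 1;

        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                           /* the mapping stays valid */
        if (map == MAP_FAILED)
            return 1;

        /* This store goes into the kernel page cache; on Linux, other processes
         * that read() or mmap() the same file see it from the same cache. */
        memcpy(map, "new record", 10);

        /* Force the dirty pages of this mapping out to the file on disk
         * (roughly what MappedByteBuffer.force() does in Java). */
        msync(map, len, MS_SYNC);

        munmap(map, len);
        return 0;
    }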

Related

Executable Object Files and Virtual Memory

I'm a beginner in Linux and Virtual Memory, still struggling in understanding the relationship between Virtual Memory and Executable Object Files.
Let's say we have an executable object file a.out stored on the hard disk, and let's say that originally a.out has a .data section containing a global variable whose value is 2018.
When the loader runs, it allocates a contiguous chunk of virtual pages, marks them as invalid (i.e., not cached), and points their page table entries to the appropriate locations in a.out. The loader never actually copies any data from disk into memory. The data is paged in automatically and on demand by the virtual memory system the first time each page is referenced.
My question is: suppose the program changes the value of the global variable from 2018 to 2019 at run time. It seems that the virtual page containing the global variable will eventually be paged out to disk, which would mean the .data section now holds 2019. Does that mean we have changed the executable object file, which is not supposed to change? Otherwise, wouldn't we get a different value each time we finish and run the program again?
In general (not specifically for Linux)...
When an executable file is started, the OS (kernel) creates a virtual address space and an (initially empty) process, and examines the executable file's header. The executable file's header describes "sections" (e.g. .text, .rodata, .data, .bss, etc.) where each section has different attributes - whether the contents of the section should be put in the virtual address space or not (e.g. a symbol table or something else that isn't used at run-time), whether the contents are part of the file or not (e.g. .bss), and whether the area should be executable, read-only or read/write.
Typically, (used parts of) the executable file are cached by the virtual file system; and pieces of the file that are already in the VFS cache can be mapped (as "copy on write") into the new process' virtual address space. For parts that aren't already in the VFS cache, those pieces of the file can be mapped as "need fetching" into the new process' virtual address space.
Then the process is started (given CPU time).
If the process reads data from a page that hasn't been loaded yet; the OS (kernel) pauses the process, fetches the page from the file on disk into the VFS cache, then also maps the page as "copy on write" into the process; then allows the process to continue (allows the process to retry the read from the page that wasn't loaded, which will work now that the page is loaded).
If the process writes to a page that is still "copy on write"; the OS (kernel) pauses the process, allocates a new page and copies the original page's data into it, then replaces the original page with the process' own copy; then allows the process to continue (allows the process to retry the write, which will work now that the process has its own copy).
If the process writes to a page that hasn't been loaded yet; the OS (kernel) combines both of the previous things (fetches the original page from disk into the VFS cache, creates a copy, maps the process' copy into the process' virtual address space).
If the OS starts to run out of free RAM; then:
pages of file data that are in the VFS cache but aren't shared as "copy on write" with any process can be freed in the VFS without doing anything else. Next time the file is used those pages will be fetched from the file on disk into the VFS cache.
pages of file data that are in the VFS cache and are also shared as "copy on write" with any process can be freed in the VFS and the copies in any/all processes marked as "not fetched yet". Next time the file is used (including when a process accesses the "not fetched yet" page/s) those pages will be fetched from the file on disk into the VFS cache and then mapped as "copy on write" in the process/es.
pages of data that have been modified (either because they were originally "copy on write" but got copied, or because they weren't part of the executable file at all - e.g. .bss section, the executable's heap space, etc) can be saved to swap space and then freed. When the process accesses the page/s again they will be fetched from swap space.
Note: If the executable file is stored on unreliable media (e.g. a potentially scratched CD), a "smarter than average" OS may load the entire executable file into the VFS cache and/or swap space initially; there is no sane way to handle a "read error from memory mapped file" while the process is using the file other than making the process crash (e.g. SIGSEGV), which makes it look like the executable was buggy when it was not, and loading everything up front improves reliability (because you're depending on more reliable swap rather than on a less reliable scratched CD).
Also, if the OS guards against file corruption or malware (e.g. has a CRC or digital signature built into executable files), then the OS may (should) load everything into memory (VFS cache) to check the CRC or digital signature before allowing the executable to be executed; and (for secure systems, in case the file on disk is modified while the executable is running) when freeing RAM it may store unmodified pages in "more trusted" swap space (the same as it would if the page was modified) to avoid fetching the data from the original "less trusted" file (partly because you don't want to repeat the whole digital signature check every time a page is loaded from the file).
My question is: suppose the program changes the value of the global variable from 2018 to 2019 at run time. It seems that the virtual page containing the global variable will eventually be paged out to disk, which would mean the .data section now holds 2019. Does that mean we have changed the executable object file, which is not supposed to change?
The page containing 2018 will begin as "not fetched", then (when it's accessed) be loaded into the VFS cache and mapped into the process as "copy on write". At either of these points the OS may free the memory and fetch the data (which hasn't been changed) from the executable file on disk if it's needed again.
When the process modifies the global variable (changes it to contain 2019) the OS creates a copy of it for the process. After this point, if the OS wants to free the memory the OS needs to save the page's data in swap space, and load the page's data back from swap space if it's accessed again. The executable file is not modified and (for that page, for that process) the executable file isn't used again.
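As a tiny illustration of the copy-on-write behaviour described in this answer (the program and variable name are invented):

    /* data.c - the page backing .data is mapped copy-on-write, so writing to
     * the global never changes the a.out file on disk. */
    #include <stdio.h>

    int year = 2018;          /* lives in the .data section of a.out */

    int main(void)
    {
        year = 2019;          /* first write: the kernel gives this process its own
                                 private copy of the page; if RAM gets tight, that
                                 copy goes to swap space, not back into a.out */
        printf("%d\n", year); /* prints 2019 for this run ... */
        return 0;             /* ... but the next run starts from 2018 again */
    }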

How safe is it reading / copying a file which is being appended to?

If a log file has events constantly being appended to it, how safe is it to read that file (or copy it) with another process?
Unix allows concurrent reading and writing. It is totally safe to read a file while others are appending to it.
Of course it can happen that an append is still unfinished when a reader reaches the end of the file; that reader will then get an incomplete version (e.g. only part of a new log entry at the end of the file). But technically this is correct, because the file really was in that state while it was being read (e.g. copied).
EDIT
There's more to it.
If a writer process has an open file handle, the file will stay on disk as long as this process keeps the open file handle.
If you remove the file (rm(1), unlink(2)), it will be removed from its directory only. It will stay on disk, and that writer (and everybody else who happens to have an open file handle) will still be able to read the contents of the already removed file. Only after the last process closes its file handle will the file's contents be freed on the disk.
This is sometimes an issue if a process writes a large log file which is filling up the disk. If it keeps an open file handle to the log file, the system administrator cannot free this disk capacity using rm.
A typical approach then is to kill the process as well. Hence it is a good idea, as a process, to close the file handle for the log file again after writing to the log (or at least to close and reopen it from time to time).
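A minimal sketch of that unlink behaviour (the log file name is hypothetical and assumed to exist):

    /* Sketch: a removed file stays readable while a descriptor is open. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("app.log", O_RDONLY);
        if (fd < 0)
            return 1;

        unlink("app.log");          /* same effect as rm: only the name goes away */

        char buf[256];
        ssize_t n = read(fd, buf, sizeof buf);  /* still works: the inode and its
                                                   data live until the last close */
        printf("read %zd bytes from the unlinked file\n", n);

        close(fd);                  /* only now can the disk space be freed */
        return 0;
    }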
There's more:
If a process has an open file handle on a log file, this file handle contains a position. If the log file is now emptied (truncate(1), truncate(2), open(2) for writing without the append flag, : > filepath), the file's contents are indeed removed from the disk. If the process holding the open file handle now writes to this file, it will write at its old position, e.g. at an offset of several megabytes. Writing there into a now-empty file fills the gap with zeros.
This is no real problem if a sparse file can be created (which is typically possible on Unix file systems); otherwise it will quickly fill the disk again. But in any case it can be very confusing.
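The truncation scenario can be sketched roughly like this (file name and offset are invented, and the truncate() call stands in for whatever other process empties the log):

    /* Sketch: a writer keeps its old offset across a truncation, leaving a hole. */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* The writer opened the log without O_APPEND and has already written
         * about 8 MB, so its file offset sits at 8 MB. */
        int fd = open("big.log", O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return 1;
        lseek(fd, 8 * 1024 * 1024, SEEK_SET);   /* stand-in for the old position */

        /* Someone empties the file, e.g. truncate(1) or ": > big.log". */
        truncate("big.log", 0);

        /* The next write still happens at offset 8 MB; bytes 0 .. 8 MB-1 become a
         * hole that reads back as zeros and, on a sparse-capable file system,
         * occupies no disk blocks. */
        write(fd, "next log entry\n", 15);

        close(fd);
        return 0;
    }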

file system operation really "flushed"

We are working on an iMX6Sx Freescale board, building the Linux kernel distro with Yocto.
I would like to know whether there is a way to check that file system operations (in particular, writes) have really completed, so we can avoid closing/killing a process while operations are still running.
To be more clear: we have to do some actions (copying files, writes, ...) when our application has to switch off, and we have to know (since I think they are asynchronous) when they are really completed.
Thanks in advance
Andrea
If you want to ensure all the writes are committed to storage and the filesystem is updated:
call fsync() on the file descriptor,
open the parent directory and call fsync() on that file descriptor
When both of these are done, the kernel has flushed everything from memory and ensured the filesystem is updated regarding the file you operate on.
Another approach is to call sync(), which ensures all kernel data are written to storage for all files and filesystem metadata.
Note:
if your application is working with FILE* streams instead of file descriptors, you first need to ensure the written data is flushed from your application to the kernel, either by calling fflush() on the FILE* or by fclose()-ing it.
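A minimal sketch of the whole sequence, assuming a plain C application writing a hypothetical file state.dat in the current directory:

    /* Sketch: make sure a finished write is really on stable storage. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        FILE *fp = fopen("state.dat", "w");
        if (!fp)
            return 1;

        fputs("important data\n", fp);
        fflush(fp);                      /* stdio buffer -> kernel */
        fsync(fileno(fp));               /* kernel page cache -> storage */
        fclose(fp);

        int dirfd = open(".", O_RDONLY); /* the directory containing state.dat */
        if (dirfd >= 0) {
            fsync(dirfd);                /* make the directory entry durable too */
            close(dirfd);
        }
        return 0;
    }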
If you kill an application, any write operation it has already performed will not be cancelled or interrupted, and you can make sure it is committed to storage by calling sync(), or by opening the same file and calling fsync() on it.
If you kill an application arbitrarily you can't expect everything to be consistent: perhaps the application was doing two important writes to a database, config file, etc. and you terminated it after the first write; the file might then be damaged with respect to its format.

Forcing a program to flush file contents to disk

I have to debug a program that writes a log file. It takes time for the actual log file to be generated because it takes a while to flush the contents to disk. On top of that, the log file is on a mounted Unix drive on my Windows machine. I was wondering if there is a command to make the operating system flush the written contents to disk. Does it also take a while for the file to be updated on the mounted drive in Windows?
PS. I actually don't want to go in and edit the program.
Ted
Look at the following APIs:
http://www.cplusplus.com/reference/clibrary/cstdio/setvbuf/
fsync
Also see the ever-great eat my data: how everybody gets file IO wrong
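If editing the program turns out to be an option after all, the two APIs above are typically combined roughly like this (the log path is made up):

    /* Sketch: unbuffered stdio plus an explicit flush to disk after each entry. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        FILE *log = fopen("debug.log", "a");   /* example path */
        if (!log)
            return 1;

        setvbuf(log, NULL, _IONBF, 0);         /* no stdio buffering: every fprintf
                                                  goes straight to the kernel */

        fprintf(log, "something happened\n");
        fsync(fileno(log));                    /* push the kernel's copy to disk */

        fclose(log);
        return 0;
    }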

MEM_SHARED, mmap, and hard links

Just wondering if the key to shared memory is the file name or the inode.
I have a file called .last, which is just a hard link to a file named YYYYMMDDHHMMSS.
A directory looks like this:
20110101143000
.last
.last is just a hard link to 20110101143000.
Some time later, a new file is created
20110101143000
20110622083000
.last
We then delete .last, and recreate it to refer to the new file.
Our software, which is continuously running during these updates, mmaps the .last file with MAP_SHARED. When done with a file, the software might cache it for several minutes rather than unmap it. On a physical server, there are 12-24 instances of the software running at the same time. Different instances often mmap the same file at about the same time. My question is:
Does linux use the file name to key to the shared memory, or does it use the inode?
Given this scenario:
proc A mmaps .last, and does not unmap
a new file is written, .last is deleted, a new .last is created to link to the new file
proc B mmaps the new .last, and does not unmap
If Linux used the inode, then proc A and B would see different blocks of memory mapped to different files, which is what we want. If Linux uses the filename, then both A and B see the same block of memory mapped to the new file. B is fine, but A crashes when the memory in the shared block changes.
Anyone know how it actually works? I'm going to test, but if it turns out to be name-based, I am screwed unless someone knows a trick.
Thanks!
It's the inode, at least effectively. That is to say that once you have mapped some pages from a file they will continue to refer to that file and won't change just because the mapping of names to files changes in the filesystem.
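A small sketch of that behaviour for the .last scenario (the 4096-byte mapping length is illustrative and assumes the file is non-empty):

    /* Sketch: a mapping follows the inode it was created from, not the name. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Proc A's view: map whatever ".last" points at right now. */
        int fd = open(".last", O_RDONLY);
        if (fd < 0)
            return 1;
        char *old = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);                   /* the mapping keeps the inode alive */
        if (old == MAP_FAILED)
            return 1;

        /* Meanwhile another process deletes .last and re-creates it as a hard
         * link to a newer file, e.g. unlink(".last"); link("20110622083000", ".last"); */

        /* 'old' still refers to the pages of the original inode; only a fresh
         * open() + mmap() of ".last" would see the new file. */
        printf("first byte of the originally mapped file: %c\n", old[0]);

        munmap(old, 4096);
        return 0;
    }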
