Executable Object Files and Virtual Memory - linux

I'm a beginner in Linux and virtual memory, and I'm still struggling to understand the relationship between virtual memory and executable object files.
Let's say we have an executable object file a.out stored on the hard disk, and let's say that a.out originally has a .data section containing a global variable with the value 2018.
When the loader runs, it allocates a contiguous chunk of virtual pages, marks them as invalid (i.e., not cached), and points their page table entries to the appropriate locations in a.out. The loader never actually copies any data from disk into memory. The data is paged in automatically and on demand by the virtual memory system the first time each page is referenced.
My question is: suppose the program changes the value of the global variable from 2018 to 2019 at run time. It seems that the virtual page containing the global variable will eventually be paged out to disk, which would mean the .data section now holds 2019. So do we change the executable object file, which is not supposed to be changed? Otherwise, wouldn't we get a different value each time the program finishes and we run it again?

In general (not specifically for Linux)...
When an executable file is started, the OS (kernel) creates a virtual address space and an (initially empty) process, and examines the executable file's header. The executable file's header describes "sections" (e.g. .text, .rodata, .data, .bss, etc.), where each section has different attributes: whether the contents of the section should be put in the virtual address space or not (e.g. a symbol table, or something else that isn't used at run time), whether the contents are part of the file or not (e.g. .bss), and whether the area should be executable, read-only or read/write.
Typically, the (used parts of the) executable file are cached by the virtual file system (VFS), and pieces of the file that are already in the VFS cache can be mapped as "copy on write" into the new process's virtual address space. Pieces of the file that aren't already in the VFS cache can be mapped as "need fetching" into the new process's virtual address space.
Then the process is started (given CPU time).
If the process reads data from a page that hasn't been loaded yet, the OS (kernel) pauses the process, fetches the page from the file on disk into the VFS cache, and maps the page as "copy on write" into the process; then it lets the process continue (the process retries the read from the page that wasn't loaded, which now works because the page is loaded).
If the process writes to a page that is still "copy on write", the OS (kernel) pauses the process, allocates a new page, copies the original page's data into it, and replaces the original page with the process's own copy; then it lets the process continue (the process retries the write, which now works because the process has its own copy).
If the process writes to a page that hasn't been loaded yet, the OS (kernel) combines both of the previous steps (it fetches the original page from disk into the VFS cache, creates a copy, and maps the process's copy into the process's virtual address space).
If the OS starts to run out of free RAM, then:
Pages of file data that are in the VFS cache but aren't shared as "copy on write" with any process can simply be freed from the VFS cache. The next time the file is used, those pages will be fetched from the file on disk into the VFS cache again.
Pages of file data that are in the VFS cache and are also shared as "copy on write" with one or more processes can be freed from the VFS cache, with the corresponding pages in every such process marked as "not fetched yet". The next time the file is used (including when a process accesses a "not fetched yet" page), those pages will be fetched from the file on disk into the VFS cache and then mapped as "copy on write" into the process(es).
Pages of data that have been modified (either because they were originally "copy on write" but got copied, or because they were never part of the executable file at all, e.g. the .bss section, the executable's heap space, etc.) can be saved to swap space and then freed. When the process accesses those pages again, they will be fetched from swap space.
Note: If the executable file is stored on unreliable media (e.g. a potentially scratched CD), a "smarter than average" OS may load the entire executable file into the VFS cache and/or swap space up front. There's no sane way to handle a "read error from memory-mapped file" while the process is using the file other than crashing the process (e.g. with SIGSEGV), which makes it look as if the executable was buggy when it wasn't; loading everything up front also improves reliability, because you then depend on more reliable swap space rather than on a less reliable scratched CD. Also, if the OS guards against file corruption or malware (e.g. has a CRC or digital signature built into executable files), then the OS may (should) load everything into memory (the VFS cache) to check the CRC or digital signature before allowing the executable to run. On secure systems, in case the file on disk is modified while the executable is running, the OS may also store unmodified pages in "more trusted" swap space when freeing RAM (just as it would for modified pages), instead of fetching the data again from the original, "less trusted" file (partly because you don't want to redo the whole digital signature check every time a page is loaded from the file).
My question is: suppose the program changes the value of the global variable from 2018 to 2019 at run time. It seems that the virtual page containing the global variable will eventually be paged out to disk, which would mean the .data section now holds 2019. So do we change the executable object file, which is not supposed to be changed?
The page containing 2018 begins as "not fetched"; then, when it's accessed, it is loaded into the VFS cache and mapped into the process as "copy on write". At either of these points the OS may free the memory and, if the page is needed again, re-fetch the (unchanged) data from the executable file on disk.
When the process modifies the global variable (changes it to contain 2019) the OS creates a copy of it for the process. After this point, if the OS wants to free the memory the OS needs to save the page's data in swap space, and load the page's data back from swap space if it's accessed again. The executable file is not modified and (for that page, for that process) the executable file isn't used again.

Related

Does a 'copy' on Memory Mapped file trigger flush to disk?

I have a memory mapped file opened in my Java program and I keep on writing new data into it.
Generally, the OS takes care of flushing the contents to the disk.
If for some reason some other process wants to copy the file to another location (possibly with rsync), do I get the latest contents of the file (i.e. the contents that are in the memory map at that instant)?
Typically, I feel that if a process copies a file that another process has memory-mapped, one of these two things should happen:
The contents of the file should be flushed to disk, so that other process sees that latest content of the file.
When a copy is triggered, the data is generally read into memory (when we do something like rsync) and then written from memory to the destination file. So if the data is copied from memory, the file's pages are already in memory, since the other process is using them (memory-mapped); those pages will be accessed directly and there is no need to flush to disk.
What happens exactly? Is it something other than the above mentioned?
Is the behavior the same on both Windows and Linux?

Is data actually transferred between disk and memory when the CPU first touches an anonymous file (CSAPP)?

In CSAPP, 2nd edition, Chapter 9, Section 8 (page 807):
Anonymous file: An area can also be mapped to an anonymous file, created by the kernel, that contains all binary zeros. The first time the CPU touches a virtual page in such an area, the kernel finds an appropriate victim page in physical memory, swaps out the victim page if it is dirty, overwrites the victim page with binary zeros, and updates the page table to mark the page as resident. Notice that no data is actually transferred between disk and memory. For this reason, pages in areas that are mapped to anonymous files are sometimes called demand-zero pages.
When the victim page is dirty, I think it should be written back to disk. So why does the book say "Notice that no data is actually transferred between disk and memory."?
Unfortunately, this is bad terminology on the part of Unix. Part of the problem is the historical lack of a hard file system (corrected in some Unix variants). In an idealized model of paging, user-created files can serve as page files. The static data (including code) can be paged directly from the executable file. The read/write data is paged from the page file. In that sense, the mapping is anonymous, because there really is no file, just a portion of a page file.
In most Unix variants, there is no page FILE but rather a swap partition. This is due to the poor design of the original Unix file system, which has lived on for decades: the traditional Unix file system does not have the concept of a contiguous file. This makes it impossible to do logical I/O to a page file. Therefore, traditional Unix uses a swap partition instead.
Even if you map a named file, on many Unix variants that mapping applies only to the first READ. In the case of an anonymous mapping, the first read creates a demand-zero page. In both cases, when the page is written back to disk, it goes to the swap partition. From the Unix perspective, calling this an "anonymous" mapping kind of makes sense, but from a conceptual point of view (where one expects a memory-to-file mapping to be two-way) it makes no sense at all.

MAP_SHARED, mmap, and hard links

Just wondering if the key to shared memory is the file name or the inode.
I have a file called .last, which is just a hard link to a file named YYYYMMDDHHMMSS.
A directory looks like this:
20110101143000
.last
.last is just a hard link to 20110101143000.
Some time later, a new file is created
20110101143000
20110622083000
.last
We then delete .last, and recreate it to refer to the new file.
Our software, which is continuously running during these updates, mmaps the .last file with MAP_SHARED. When done with a file, the software might cache it for several minutes rather than unmap it. On a physical server, there are 12-24 instances of the software running at the same time. Different instances often mmap the same file at about the same time. My question is:
Does Linux use the file name as the key for the shared memory, or does it use the inode?
Given this scenario:
proc A mmaps .last, and does not unmap
a new file is written, .last is deleted, and a new .last is created linking to the new file
proc B mmaps the new .last, and does not unmap
If Linux used the inode, then procs A and B would see different blocks of memory mapped to different files, which is what we want. If Linux used the filename, then both A and B would see the same block of memory mapped to the new file. B would be fine, but A would crash when the memory in the shared block changed.
Does anyone know how it actually works? I'm going to test it, but if it turns out to be name-based, I'm screwed unless someone knows a trick.
Thanks!
It's the inode, at least effectively. That is to say that once you have mapped some pages from a file they will continue to refer to that file and won't change just because the mapping of names to files changes in the filesystem.

How to manipulate page cache in Linux?

I want to know which files are cached in the page cache, and I want to free the cache space of a specific file programmatically. I can write a kernel module, or even modify the kernel code, if needed. Can anyone give me some clues?
Firstly, the kernel does not maintain a master list of all files in the page cache, because it has no need for such information. Instead, given an inode you can look up the associated page cache pages, and vice-versa.
For each page cache struct page, page_mapping() will return the struct address_space that it belongs to. The host member of struct address_space identifies the owning struct inode, and from there you can get the inode number and device.
mincore() returns a vector that indicates whether pages of the calling process's virtual memory are resident in core (RAM), and so will not cause a disk access (page fault) if referenced. The kernel returns residency information about the pages starting at the address addr, and continuing for length bytes.
To test whether a file currently mapped into your process is in cache, call mincore with its mapped address.
To test whether an arbitrary file is in cache, open and map it, then follow the above.
There is a proposed fincore() system call which would not require mapping the file first, but (at this point in time) it's not yet generally available.
(And then madvise(MADV_DONTNEED) can drop parts of a mapping, and posix_fadvise(POSIX_FADV_DONTNEED) can drop parts of a file from the cache.)
You can free the contents of a file from the page cache under Linux by using
posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED)
(an offset of 0 and a length of 0 cover the whole file). As of Linux 2.6, this immediately gets rid of the parts of the page cache that are caching the given file or part of the file; the call blocks until the operation is complete, but that behaviour is not guaranteed by POSIX.
Note that it won't drop pages that have been modified (dirty pages); in that case you want to call fdatasync() or similar first.
EDIT: Sorry, I didn't fully read your question. I don't know how to tell which files are currently in the page cache. Sorry.

Code segment sharing between two processes

Suppose we run two processes back to back, say:
$ grep abc abc.txt ==> pid 100
$ grep def def.txt ==> pid 101
I read in the book "Beginning Linux Programming", chapter 11, that the code section of the processes would be shared, as it is read-only. Is that so? I think the code section would be shared only if grep were compiled as a shared library.
One more question: in the case of shared libraries, how does the OS know whether the library has already been loaded? If two processes call a shared library function at the same time, how are the virtual addresses of the two processes translated to physical addresses pointing to the same location in RAM?
The OS doesn't load files into memory anymore. Instead, files are memory mapped. This means an inode and an offset of a file on disk will be connected to a page in memory. This makes it pretty simple to find out if some part of a file has already been loaded. Also, you can keep only part of a file in RAM (after setup, you don't need the setup code anymore, so you can "forget" about it and reuse those pages for something more useful).
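The libraries and executables are not loaded, but mapped into memory with mmap(2). Basically, when you mmap() something with the MAP_SHARED flag, others who map the same file will get the same memory pages. (Code is actually mapped with MAP_PRIVATE, but pages that are never written still refer to the same page-cache pages, so read-only code is shared just the same.)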
