Page Cache for shared memory - linux

My question concerns the fourth image from the top of the following article:
http://duartes.org/gustavo/blog/post/page-cache-the-affair-between-memory-and-files
The scenario depicted is that of two processes, "render" and "3drender", sharing a file. The author describes how the sharing mechanism interacts with the page cache.
Originally "render" had its virtual pages mapped onto the page cache.
In step 4, "render" is allocated a new anonymous page of its own, which will contain the changes it wants to make to "scene.dat #2".
Once "render" makes its changes, how are they reflected to "3drender", which continues to point to the page cache page frame containing "scene.dat #2"?
Also, shouldn't the change made by "render" make its way back to the page cache, thereby replacing the old page cache copy of "scene.dat #2"?
The part that remains unclear to me is what happens after one of the processes writes to a shared page, and how this update makes its way to the page cache and disk so that other processes sharing the same file see the change.
It would be great if someone could shed some light on this.
Thanks,
VIjay

In the scenario described in the linked article, render and render3d have private memory-mapped copies of a single file. As far as the processes can tell, the OS allocated a bunch of pages in each process's address space and just copied the file contents in there. If they modify those pages, nothing happens outside the modifying process: no changes go back to the file, and no changes pass between render and render3d. That's what it means to have a private mapping.
Of course, giving each process a full copy of the file would be slow and wasteful, so the OS uses a virtual memory trick. Until a process writes to a page, it can use a shared copy (shared with other processes and the page cache, also called a buffer cache). The private copy is only made when the process first tries to change the page (copy-on-write).
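To make the difference between a private and a shared mapping concrete, here is a minimal sketch using Python's standard mmap module. The function name private_vs_shared and the temporary file are purely illustrative and are not taken from the article; the sketch assumes a Unix-like system where os.pread is available.

```python
import mmap
import os
import tempfile

def private_vs_shared(initial=b"scene.dat #2"):
    """Return (file bytes after a private write, file bytes after a shared write)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, initial)
        size = len(initial)

        # ACCESS_COPY requests a private, copy-on-write mapping (MAP_PRIVATE).
        with mmap.mmap(fd, size, access=mmap.ACCESS_COPY) as private:
            private[0:1] = b"X"      # the first write triggers the private copy
            after_private = os.pread(fd, size, 0)

        # ACCESS_WRITE requests a shared, write-through mapping (MAP_SHARED).
        with mmap.mmap(fd, size, access=mmap.ACCESS_WRITE) as shared:
            shared[0:1] = b"Y"       # the write lands in the shared file page
            shared.flush()
            after_shared = os.pread(fd, size, 0)

        return after_private, after_shared
    finally:
        os.close(fd)
        os.unlink(path)
```

The write through the private mapping never reaches the file, while the write through the shared mapping does. If render wanted 3drender to see its changes, both would need shared mappings.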

Related

Is data actually transferred between disk and memory when the CPU first touches an anonymous file (CSAPP)

In CSAPP, 2nd edition, Chapter 9, Section 8 (page 807):
Anonymous file: An area can also be mapped to an anonymous file,
created by the kernel, that contains all binary zeros. The first time
the CPU touches a virtual page in such an area, the kernel finds an
appropriate victim page in physical memory, swaps out the victim page
if it is dirty, overwrites the victim page with binary zeros, and
updates the page table to mark the page as resident. Notice that no
data is actually transferred between disk and memory. For this reason,
pages in areas that are mapped to anonymous files are sometimes called
demand-zero pages.
When the victim page is dirty, I think it should be written back to disk. So why does the text say "Notice that no data is actually transferred between disk and memory"?
Unfortunately, this is bad terminology on the part of Unix. Part of the problem is the historical lack of a hard file system (corrected in some Unix variants). In an idealized model of paging, user-created files can serve as page files. The static data (including code) can be paged directly from the executable file. The read/write data is paged from the page file. In that sense, the mapping is anonymous, as there really is not a file but rather a portion of a page file.
In most Unix variants, there is no page FILE but rather a swap partition. This is due to the poor design of the original Unix file system, which has lived on for decades. The traditional Unix file system does not have the concept of a contiguous file. This makes it impossible to do logical I/O to a page file. Therefore, traditional Unix uses a swap partition instead.
Even if you map a named file, on many Unix variants that mapping is only used for the first READ. In the case of an anonymous mapping, the first read creates a demand-zero page. When the page is written back to disk, it goes to the swap partition in both cases. From the Unix perspective, calling this an "anonymous" mapping kind of makes sense, but from the conceptual point of view (where one expects a memory-to-file mapping to be two-way) it makes no sense at all.
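The demand-zero behaviour quoted from CSAPP can be observed from user space. The following is a small sketch, assuming a Unix-like system where Python's mmap module exposes MAP_PRIVATE and MAP_ANONYMOUS; the function name anonymous_page_is_zero is made up for illustration.

```python
import mmap

def anonymous_page_is_zero(length=mmap.PAGESIZE):
    """Return (whether the untouched mapping read as all zeros, bytes written)."""
    # fileno -1 together with MAP_ANONYMOUS means no file backs these pages.
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    with mmap.mmap(-1, length, flags=flags) as anon:
        first_read = anon[:length]   # first touch: the kernel supplies zeroed pages
        anon[0:5] = b"hello"         # the page now holds private, modified data
        return first_read == bytes(length), anon[0:5]
```

Nothing is read from disk to fill the new page. Any disk write in the quoted passage concerns the old contents of the victim page, not the anonymous page being created.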

How to artificially cause a page fault in Linux kernel?

I am pretty new to the Linux kernel. I would like to make the kernel fault every time a specified page 'P' is being fetched. One simple conceptual idea is to clear the bit indicating the presence of page 'P' in Page Table Entry (PTE).
Can anyone provide more details on how to go about achieving this in x86? Also please point me to where in the source code one needs to make this modification, if possible.
Background
I have to invoke my custom page handler, which applies only to a set of pages in a user's application. This custom page handler must be enabled after some prologue is executed in a given application. For testing purposes, I need to induce faults after my prologue is executed.
Currently the kernel loads everything well before my prologue is executed, so I need to artificially cause faults to test my handler.
I have not played with the swapping code since I moved from Minix to Linux, but a swapping algorithm does two things. When there is a shortage of memory, it moves the page from memory to disk, and when a page is needed, it copies it back (probably after moving another page to disk).
I would use the full swap out function that you are writing to clear the page present flag. I would probably also use a character device to send the command to the test code to force the swap.

Where is the link between NUMA code for writing a page and the rest of Linux swap

So, for normal pages in Linux try_to_unmap creates a swap entry for a particular page and then pageout handles writing it to the swap space by calling mapping->a_ops->writepage on it. Now, shrink_page_list connects the pieces together.
For NUMA pages on the other hand try_to_unmap creates a NUMA migration entry for a particular page, but I do not see where in the code it is actually written out and where in the code things are glued together.
Does anyone know where the link is?
Thanks.

page swap in Linux kernel

I know that the Linux kernel has a page cache to save recently used pages and blocks.
I understand that it saves time, because Linux doesn't need to fetch those blocks from lower-level storage. When a block is missing from the cache, Linux requests it from the lower level (using functions such as submit_bio) and gets the page corresponding to the block.
I want to find the place in the Linux kernel (3.10) where it checks whether the block exists in the page cache and, if it can't find the page, brings the block in from the block I/O layer.
I am searching for something like this in the code:
if (block's page exists in the cache)
    return this page
else
    bring in the page of the searched block and return it
Can anyone post a link to the place in the kernel where this decision is made?
The best place to start looking is going to be in mm.h: http://lxr.linux.no/linux+v3.10.10/include/linux/mm.h
Then take a look at the mm directory, which has files like page_io.c: http://lxr.linux.no/linux+v3.10.10/mm/page_io.c
Keep in mind that any architecture specific stuff will likely be defined in the arch directory for the system you are looking at. For example, here is the x86 page table management code: http://lxr.linux.no/linux+v3.10.10/arch/x86/mm/pgtable.c
Good luck! Remember, you are likely not going to find a section of code as clean as the example code you gave.

How to manipulate page cache in Linux?

I want to know which files are cached in the page cache, and I want to free the cache space of a specific file programmatically. I can write a kernel module or even modify the kernel code if needed. Can anyone give me some clues?
Firstly, the kernel does not maintain a master list of all files in the page cache, because it has no need for such information. Instead, given an inode you can look up the associated page cache pages, and vice-versa.
For each page cache struct page, page_mapping() will return the struct address_space that it belongs to. The host member of struct address_space identifies the owning struct inode, and from there you can get the inode number and device.
mincore() returns a vector that indicates whether pages of the calling process's virtual memory are resident in core (RAM), and so will not cause a disk access (page fault) if referenced. The kernel returns residency information about the pages starting at the address addr, and continuing for length bytes.
To test whether a file currently mapped into your process is in cache, call mincore with its mapped address.
To test whether an arbitrary file is in cache, open and map it, then follow the above.
There is a proposed fincore() system call which would not require mapping the file first, but (at this point in time) it's not yet generally available.
(And then madvise(MADV_DONTNEED)/posix_fadvise(POSIX_FADV_DONTNEED) can drop parts of a mapping/file from cache.)
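The map-then-mincore approach described above can be sketched from Python through ctypes. This is only an illustration under stated assumptions: it assumes a Linux-like libc that exports mincore, the helper name resident_pages is hypothetical, and recent kernels may limit what mincore reports for files the caller cannot write to.

```python
import ctypes
import mmap
import os

def resident_pages(path):
    """Return one bool per page of `path`: True if mincore() reports it resident."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                             ctypes.POINTER(ctypes.c_ubyte)]
    libc.mincore.restype = ctypes.c_int

    size = os.path.getsize(path)
    if size == 0:
        return []                    # an empty file cannot be mapped
    n_pages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE

    with open(path, "rb") as f:
        # A private writable mapping lets ctypes take the mapping's address;
        # nothing is ever written through it.
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY) as mm:
            buf = ctypes.c_char.from_buffer(mm)
            vec = (ctypes.c_ubyte * n_pages)()
            rc = libc.mincore(ctypes.addressof(buf), size, vec)
            err = ctypes.get_errno()
            del buf                  # release the buffer so the mapping can close
            if rc != 0:
                raise OSError(err, os.strerror(err))
            # Only the least significant bit of each byte is defined.
            return [bool(b & 1) for b in vec]
```

Calling resident_pages on a file returns a list with one entry per page, so sum(resident_pages(p)) gives a rough count of how many of its pages are currently cached.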
You can free the contents of a file from the page cache under Linux by using
posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
As of Linux 2.6, this immediately gets rid of the parts of the page cache that are caching the given file or part of the file. The call blocks until the operation is complete, but that behaviour is not guaranteed by POSIX.
Note that it won't have any effect on pages that have been modified; in that case, call fdatasync or similar first.
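The sync-then-advise sequence described above might look like the following sketch, which uses the os module wrappers around the same system calls. The function name drop_file_from_cache is illustrative only, and the call remains advisory: the kernel may keep pages that are still in use elsewhere.

```python
import os

def drop_file_from_cache(path):
    """Ask the kernel to drop cached pages of `path`; advisory, not guaranteed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fdatasync(fd)   # write back dirty pages first; they cannot be dropped
        # offset 0 with length 0 covers the file from the start to its end
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

The file contents are untouched; only the cached copies are released, so the next read has to go back to the underlying storage.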
EDIT: Sorry, I didn't fully read your question. I don't know how to tell which files are currently in the page cache. Sorry.

Resources