Page swap in Linux kernel

I know that the Linux kernel has a page cache to keep recently used pages and blocks.
I understand that it saves time, because Linux doesn't need to fetch those blocks from a lower level of the memory hierarchy (the disk). When a block is missing from the cache, Linux requests it from the lower level (using functions such as submit_bio) and gets the page corresponding to the block.
I want to find the place in the Linux kernel (3.10) where it checks whether the block exists in the page cache and, if the page is not found, brings the block in from the block I/O layer.
I am looking for something like this in the code:
if( block's page exists in the cache )
return this page
else
bring the page of the searched block and return it
Can anyone post a link to the place in the kernel where this decision is made?

The best place to start looking is going to be in mm.h: http://lxr.linux.no/linux+v3.10.10/include/linux/mm.h
Then take a look at the mm directory, which has files like page_io.c: http://lxr.linux.no/linux+v3.10.10/mm/page_io.c
Keep in mind that any architecture specific stuff will likely be defined in the arch directory for the system you are looking at. For example, here is the x86 page table management code: http://lxr.linux.no/linux+v3.10.10/arch/x86/mm/pgtable.c
Good luck! Remember, you are likely not going to find a section of code as clean as the example code you gave.
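For orientation only, here is a toy userspace model of the check-then-fill pattern from the question's pseudocode. In 3.10 the real lookup is roughly the job of find_get_page(), called from do_generic_file_read() in mm/filemap.c, with misses filled through the filesystem's readpage operation; none of that is reproduced here. The slot array, get_block() and read_block_from_disk() are hypothetical names invented for this sketch, and the "disk" just formats a string.

```c
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8
#define BLOCK_SIZE  16

/* Toy, hypothetical cache: block b may only live in slot b % CACHE_SLOTS. */
struct cache_slot {
    int valid;                 /* 1 if data holds a cached block */
    unsigned long block;       /* which block the slot currently holds */
    char data[BLOCK_SIZE];
};

static struct cache_slot cache[CACHE_SLOTS];
static int disk_reads;         /* simulated trips to the block I/O layer */

/* Hypothetical stand-in for submitting a read to the block layer. */
static void read_block_from_disk(unsigned long block, char *buf)
{
    disk_reads++;
    snprintf(buf, BLOCK_SIZE, "block %lu", block);
}

/* The question's pseudocode: return the cached copy if present, otherwise
 * read the block, cache it (evicting the slot's old block) and return it. */
static const char *get_block(unsigned long block)
{
    struct cache_slot *slot = &cache[block % CACHE_SLOTS];

    if (slot->valid && slot->block == block)
        return slot->data;                    /* hit: no I/O */

    read_block_from_disk(block, slot->data);  /* miss: fetch and fill */
    slot->block = block;
    slot->valid = 1;
    return slot->data;
}
```

Calling get_block(3) twice triggers only one simulated disk read; get_block(11) maps to the same slot, evicts block 3, and the next get_block(3) has to go to the "disk" again.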

Related

How to artificially cause a page fault in Linux kernel?

I am pretty new to the Linux kernel. I would like to make the kernel fault every time a specified page 'P' is being fetched. One simple conceptual idea is to clear the bit indicating the presence of page 'P' in Page Table Entry (PTE).
Can anyone provide more details on how to go about achieving this in x86? Also please point me to where in the source code one needs to make this modification, if possible.
Background
I have to invoke my custom page handler, which applies only to a set of pages in a user's application. This custom page handler must be enabled after some prologue has executed in a given application. For testing purposes, I need to induce faults after my prologue has executed.
Currently the kernel loads everything well before my prologue is executed, so I need to artificially cause faults to test my handler.
I have not played with the swapping code since I moved from Minix to Linux, but a swapping algorithm does two things. When there is a shortage of memory, it moves the page from memory to disk, and when a page is needed, it copies it back (probably after moving another page to disk).
I would use the full swap out function that you are writing to clear the page present flag. I would probably also use a character device to send the command to the test code to force the swap.
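As a userspace approximation (not the kernel-side PTE manipulation discussed above), a page can be made to fault again by discarding it with madvise(MADV_DONTNEED) and then touching it; the minor-fault counter from getrusage() shows the extra faults. This is a sketch assuming a private anonymous mapping, and count_refaults() is a hypothetical helper. It only demonstrates that faults happen; routing them to a custom handler still requires kernel work.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults (no disk I/O needed) taken so far by this process. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Hypothetical helper: touches npages anonymous pages, discards them with
 * MADV_DONTNEED, touches them again and returns the number of minor faults
 * taken during the second pass, or -1 on error. */
static long count_refaults(size_t npages)
{
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * psz;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    volatile char *p = mem;

    for (size_t i = 0; i < len; i += psz)
        p[i] = 1;                  /* first touch maps each page */

    /* Drop the pages; their PTEs are cleared, so the next access faults. */
    if (madvise(mem, len, MADV_DONTNEED) != 0) {
        munmap(mem, len);
        return -1;
    }

    long before = minor_faults();
    for (size_t i = 0; i < len; i += psz)
        p[i] = 2;                  /* each write faults in a fresh zero page */
    long after = minor_faults();

    munmap(mem, len);
    return after - before;
}
```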

Writing a Linux kernel module that keeps a process's pages from being swapped out

I need to keep a process's pages (the user-space part!) from being swapped out.
I need to do it in the kernel only (I know C).
(Maybe insert a hook in shrink_page_list?)
I have the IDs of the processes whose pages must be kept, and a threshold amount of physical memory in the system (we fill memory until the threshold is reached). The IDs and the threshold are written via /proc, /dev or /sys.
How to approach this?
What files to look at?
What tutorials to read?
Maybe there are examples that are somehow related to this task.
Info: I am compiling the Debian Lenny kernel and using QEMU to boot it on my Ubuntu machine.
See get_user_pages. http://www.makelinux.net/ldd3/chp-15-sect-3.
Use get_user_pages, you can get whatever page you want and keep it locked in memory.
Even better, look at the comments on the source at
http://lxr.free-electrons.com/source/mm/gup.c#L637
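The question asks for a kernel-only solution, which is what get_user_pages provides. For comparison, the userspace way to get the same effect is mlock(2), which faults a range in and keeps it out of swap until it is unlocked. This sketch does not satisfy the kernel-only requirement; lock_one_page() is a hypothetical helper that just shows the call sequence and its failure modes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps one anonymous page, pins it in RAM with mlock(),
 * then unlocks and unmaps it. Returns 0 on success or the errno value of
 * the call that failed. */
static int lock_one_page(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return errno;

    /* mlock() faults the page in and keeps it out of swap until unlocked.
     * It can fail with ENOMEM (RLIMIT_MEMLOCK exceeded) or EPERM. */
    if (mlock(p, len) != 0) {
        int err = errno;
        munmap(p, len);
        return err;
    }

    /* ... work with memory that must stay resident ... */

    munlock(p, len);
    munmap(p, len);
    return 0;
}
```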

How to tell Linux that a mmap()'d page does not need to be written to swap if the backing physical page is needed?

Hopefully the title is clear. I have a chunk of memory obtained via mmap(). After some time, I have concluded that I no longer need the data within this range. I still wish to keep this range, however. That is, I do not want to call munmap(). I'm trying to be a good citizen and not make the system swap more than it needs to.
Is there a way to tell the Linux kernel that if the given page is backed by a physical page and if the kernel decides it needs that physical page, do not bother writing that page to swap?
I imagine under the hood this magical function call would destroy any mapping between the given virtual page and physical page, if present, without writing to swap first.
Your question (as stated) makes no sense.
Let's assume that there was a way for you to tell the kernel to do what you want.
Let's further assume that it did need the extra RAM, so it took away your page, and didn't swap it out.
Now your program tries to read that page (since you didn't want to munmap the data, presumably you might try to access it). What is the kernel to do? The choices I see:
it can give you a new page filled with 0s.
it can give you SIGSEGV
If you wanted choice 2, you could achieve the same result with munmap.
If you wanted choice 1, you could mmap a new anonymous mapping over the existing range with MAP_FIXED | MAP_ANONYMOUS (or munmap followed by a new mmap).
In either case, you can't depend on the old data being there when you need it.
The only way your question would make sense is if there were some additional mechanism for the kernel to let you know that it is taking away your page (e.g. sending you a special signal). But the situation you described is likely too rare to warrant that additional complexity.
EDIT:
You might be looking for madvise(..., MADV_DONTNEED)
You could munmap the region, then mmap it again with MAP_NORESERVE
If you know at initial mapping time that swapping is not needed, use MAP_NORESERVE
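On Linux, madvise(..., MADV_DONTNEED) behaves like choice 1 above for a private anonymous mapping: the pages are discarded immediately without being written to swap, the range stays mapped, and the next access sees zero-filled memory. A minimal sketch; byte_after_dontneed() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: fills a private anonymous page, discards it with
 * MADV_DONTNEED and returns the first byte read back (or -1 on error). */
static int byte_after_dontneed(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0x5A, len);          /* data we have decided we no longer need */

    /* The mapping stays valid, but the contents are thrown away without
     * being written to swap. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int first = p[0];              /* private anonymous: reads back as 0 */
    munmap(p, len);
    return first;
}
```

Unlike munmap, the range remains usable afterwards, so the program can write new data into it later without another mmap call.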

Page Cache for shared memory

In the 4th image from the top of the following article:
http://duartes.org/gustavo/blog/post/page-cache-the-affair-between-memory-and-files
The scenario depicted is that of two processes, "render" and "3drender", sharing a file. The author describes how the sharing mechanism interacts with the page cache.
Originally, render had its virtual pages mapped onto the page cache.
In step 4, "render" is allocated a new anonymous page of its own, which will contain the changes it wants to make to "scene.dat #2".
Once "render" makes its changes, how are they reflected in "3drender", which still points to the page cache page frame containing "scene.dat #2"?
Also, shouldn't the change made by "render" make its way back to the page cache, thereby replacing the old page cache copy of "scene.dat #2"?
The part that remains unclear to me is what happens after one of the processes writes to a shared page, and how this update makes its way to the page cache and disk so that other processes sharing the same file see the change.
It would be great if someone could shed some light on this.
Thanks,
VIjay
In the scenario described in the linked article, render and 3drender have private memory-mapped copies of a single file. As far as the processes can tell, the OS allocated a bunch of pages in each process's address space and just copied the file contents in there. If they modify those pages, nothing happens: no changes go back to the file, and no changes go between render and 3drender. That's what it means to have a private mapping.
Of course, giving each process a full copy of the file is really slow, so the OS uses a virtual memory trick. Until a process writes to the file, it can use a shared copy (shared with other processes and the page cache, also called a buffer cache). The private copy only happens when the process first tries to change the page.
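The difference between a private and a shared mapping can be checked from userspace. With MAP_PRIVATE, the write below triggers copy-on-write and the file, as seen through read(), keeps its old contents; with MAP_SHARED, the write lands in the page-cache page itself and on Linux is visible to read() immediately. A sketch assuming /tmp is writable; file_byte_after_write() and the temp-file name are hypothetical choices.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: writes "AAAA" to a fresh temporary file, stores 'B'
 * at offset 0 through a mapping created with map_flags (MAP_PRIVATE or
 * MAP_SHARED), and returns the byte at offset 0 as read() sees it, or -1. */
static int file_byte_after_write(int map_flags)
{
    char path[] = "/tmp/cowdemoXXXXXX";   /* assumed writable temp dir */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                  /* the file vanishes once fd is closed */

    if (write(fd, "AAAA", 4) != 4) {
        close(fd);
        return -1;
    }

    char *m = mmap(NULL, 4, PROT_READ | PROT_WRITE, map_flags, fd, 0);
    if (m == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* MAP_PRIVATE: this write first copies the page (copy-on-write).
     * MAP_SHARED: this write modifies the shared page-cache page. */
    m[0] = 'B';

    char c;
    int result = (pread(fd, &c, 1, 0) == 1) ? (unsigned char)c : -1;
    munmap(m, 4);
    close(fd);
    return result;
}
```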

Linux - identify process owning a specific address in physical memory

Under Linux, how can I tell what specific process owns / is using a given address in physical memory?
I understand that this may require writing a kernel module to access some kernel data structure and return the results to a user - I need to know how it can be done, regardless of how complicated it is.
The pages in use by a process and their location in physical memory are not static pieces of information. However, the information you seek should be in the page tables. A change went into the kernel that might be almost exactly what you're looking for:
author Arjan van de Ven <arjan#linux.intel.com> 2008-04-17 15:40:45 (GMT)
committer Ingo Molnar <mingo#elte.hu> 2008-04-17 15:40:45 (GMT)
commit 926e5392ba8a388ae32ca0d2714cc2c73945c609 (patch)
tree 2718b50b8b66a3614f47d3246b080ee8511b299e
parent 2596e0fae094be9354b29ddb17e6326a18012e8c (diff)
x86: add code to dump the (kernel) page tables for visual inspection by kernel developers
This patch adds code to the kernel to have an (optional)
/proc/kernel_page_tables debug file that basically dumps the kernel
pagetables; this allows us kernel developers to verify that nothing
fishy is going on and that the various mappings are set up correctly.
This was quite useful in finding various change_page_attr() bugs, and
is very likely to be useful in the future as well.
Signed-off-by:Arjan van de Ven <arjan#linux.intel.com>
Cc: mingo#elte.hu
Cc: tglx#tglx.de
Cc: hpa#zytor.com
Signed-off-by: Ingo Molnar <mingo#elte.hu>
Signed-off-by: Thomas Gleixner <tglx#linutronix.de>
The added functionality is enabled by a new config option (X86_PTDUMP).
Might want to start here for a discussion of how process virtual memory is mapped to physical memory. That would give you a good starting point for figuring out where you would need to hook into the kernel to access the page tables and the other structures where that information is stored.
Well, due to the way things are done under Linux, a process may own a given piece of physical memory at one moment and no longer own it the next, because of paging.
http://en.wikipedia.org/wiki/Paging
Essentially this means that the computer switches out data it doesn't need at one moment so that the memory can be used for something else.
I'm not sure if this helped or not, but I'd advise you to look at page tables and directories, since you can use these to translate to physical addresses.
You might be able to use pmap -d [pid] for this... unfortunately you'd have to run it on all processes to see which one returned a result for the given memory address. Certainly not as efficient as a kernel module (and you might not even get a result, if the memory is paged out while you're looking for it).
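Neither answer mentions it, but the page-table translation suggested above is exposed to userspace through /proc/[pid]/pagemap: one 64-bit entry per virtual page, where bit 63 means "present in RAM" and bits 0-54 hold the page frame number (PFN). A minimal sketch for the calling process; on newer kernels the PFN field reads as zero unless the reader has CAP_SYS_ADMIN. Answering "which process owns this physical address" this way would still mean scanning every process's mappings, much like the pmap approach. pagemap_entry() and physical_address() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT   (1ULL << 63)
#define PM_PFN_MASK  ((1ULL << 55) - 1)   /* bits 0-54 */

/* Hypothetical helper: reads the pagemap entry for virtual address addr in
 * the calling process. Returns 0 and stores it in *entry, or -1 on error. */
static int pagemap_entry(const void *addr, uint64_t *entry)
{
    uintptr_t psz = (uintptr_t)sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    /* One 8-byte entry per virtual page number. */
    off_t offset = (off_t)(((uintptr_t)addr / psz) * sizeof(uint64_t));
    ssize_t n = pread(fd, entry, sizeof *entry, offset);
    close(fd);
    return n == (ssize_t)sizeof *entry ? 0 : -1;
}

/* Physical address = PFN * page size + offset within the page
 * (the PFN is 0 when the caller lacks CAP_SYS_ADMIN). */
static uint64_t physical_address(const void *addr, uint64_t entry)
{
    uint64_t psz = (uint64_t)sysconf(_SC_PAGESIZE);
    return (entry & PM_PFN_MASK) * psz + (uintptr_t)addr % psz;
}
```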
