Page Table Entry, Present Bit? - Linux

Quoting from: http://www.cburch.com/books/vm/index.html
The final bit (labeled P) indicates whether the page is present in
RAM. If this bit is 0, then any access to the page will trigger a page
fault.
My professor disagrees: he said the bit can be 0 while the page is in RAM, and added that this can happen, for instance, when the page is shared between multiple processes and one of them does something to it.
Can someone kindly explain this? I still don't get it. I'm looking for detailed examples of a page being in RAM while its present bit in the PTE is 0, not 1.

Yes, it's possible to have a page in RAM with the P bit cleared.
This technique is useful when building a kernel or other software for a multi-threaded, multi-processor environment, where a process needs exclusive access to a page, or where one piece of code must not race with another. You can temporarily revoke another core's or processor's access by clearing the P bit in the page table; the kernel (or whatever software manages the tables) must then handle the resulting page fault accordingly.
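To make the mechanism concrete, here is a toy user-space sketch (not kernel code; it just models an x86-style 64-bit PTE) showing that clearing the present bit leaves the frame bits, and hence the page's location in RAM, untouched:

#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT 0x1ULL   /* bit 0 of an x86 PTE */

int main(void)
{
    /* a made-up entry: frame bits plus P = 1 */
    uint64_t pte = 0xdead0000ULL | PTE_PRESENT;

    /* "soft-unmap": only P goes to 0, the frame bits are untouched,
     * so the next hardware access raises a page fault and the OS
     * gets to decide what that fault means */
    pte &= ~PTE_PRESENT;

    printf("frame = %#llx, present = %llu\n",
           (unsigned long long)(pte & ~0xfffULL),
           (unsigned long long)(pte & PTE_PRESENT));
    return 0;
}

This is essentially the trick Linux itself plays for demand paging and swap: a non-present entry still carries information, but P = 0 forces a trap into the fault handler.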

Related

How to handle the Linux page cache (tag lookup) returning fewer pages than were asked for?

This is from a file system perspective.
The file system page size is 8K (i.e. double the block size, 4k).
So, when I dirty pages and go for a flush, I make sure that the range passed to pvec_lookup_tag() is 8K-aligned at all costs. The page cache should then give me pages starting at an 8K-aligned address (i.e. an even page index).
So, down to the problem.
I have already dirtied the pages, and then I ask the page cache for 14 dirty pages in a specified range and mapping.
But, surprisingly, it gives me just one page, which is odd-indexed.
In short, I'm getting just the second 4K page of my intended 8K file-system page.
Also, I checked the mapping by taking a crash dump. All 14 pages I had asked for were right there, and they were marked dirty.
Just retrying the same lookup gives me the correct pages.
But I feel there must be a better solution here.
Is there some weird window between marking the pages dirty and trying a tag lookup that is causing this?
(I'm on Linux Kernel v3.10.x)
Okay, let me rephrase the question in simpler terms.
Is it possible that a tag lookup in Linux gives me fewer pages than I asked for?
If yes, how to handle such cases?
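For what it's worth: yes, a short return is normal, and in-tree writeback treats it that way. Here is a hedged sketch modeled on write_cache_pages() in mm/page-writeback.c (3.10 era), which loops on pagevec_lookup_tag() until the index passes the end of the range and re-checks every page under its lock, since the radix-tree tag can briefly lag the actual page state:

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/pagevec.h>
#include <linux/sched.h>

static void flush_dirty_range(struct address_space *mapping,
                              pgoff_t index, pgoff_t end)
{
        struct pagevec pvec;
        unsigned nr, i;

        pagevec_init(&pvec, 0);
        while (index <= end) {
                nr = pagevec_lookup_tag(&pvec, mapping, &index,
                                PAGECACHE_TAG_DIRTY,
                                min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1);
                if (nr == 0)
                        break;          /* no more dirty pages in range */
                for (i = 0; i < nr; i++) {
                        struct page *page = pvec.pages[i];

                        lock_page(page);
                        /* the tag lookup is only a hint: re-verify the
                         * page's identity and state under its lock */
                        if (page->mapping == mapping && PageDirty(page)) {
                                /* ... write the page out ... */
                        }
                        unlock_page(page);
                }
                pagevec_release(&pvec);
                cond_resched();
        }
}

So rather than insisting on getting all 14 pages in one call, the idiomatic pattern is to keep calling until the cursor moves past the range; pagevec_lookup_tag() advances the index for you.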

How to artificially cause a page fault in the Linux kernel?

I am pretty new to the Linux kernel. I would like to make the kernel fault every time a specified page 'P' is accessed. One simple conceptual idea is to clear the bit indicating the presence of page 'P' in its Page Table Entry (PTE).
Can anyone provide more details on how to go about achieving this in x86? Also please point me to where in the source code one needs to make this modification, if possible.
Background
I have to invoke a custom page-fault handler that applies only to a particular set of pages in a user application. This handler must be enabled only after some prologue has executed in the given application. For testing purposes, I need to induce faults after my prologue is executed.
Currently the kernel loads everything well before my prologue is executed, so I need to artificially cause faults to test my handler.
I have not played with the swapping code since I moved from Minix to Linux, but a swapping algorithm does two things. When there is a shortage of memory, it moves pages from memory to disk, and when a page is needed again, it copies it back (probably after moving another page out to disk).
I would use the full swap-out function that you are writing to clear the page's present flag. I would probably also use a character device to send commands to the test code to force the swap.
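As a concrete starting point, here is a hedged sketch of the PTE manipulation itself (x86, ~3.10-era page-table walk; huge pages and most error handling are glossed over, so treat it as illustrative rather than production code):

#include <linux/errno.h>
#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

/* Clear the present bit for one user address: the physical frame
 * stays in RAM, but the next access traps into the fault handler. */
static int soft_unmap(struct mm_struct *mm, unsigned long addr)
{
        pgd_t *pgd = pgd_offset(mm, addr);
        pud_t *pud;
        pmd_t *pmd;
        pte_t *pte;
        spinlock_t *ptl;

        if (pgd_none(*pgd) || pgd_bad(*pgd))
                return -EFAULT;
        pud = pud_offset(pgd, addr);
        if (pud_none(*pud) || pud_bad(*pud))
                return -EFAULT;
        pmd = pmd_offset(pud, addr);
        if (pmd_none(*pmd) || pmd_bad(*pmd))
                return -EFAULT;

        pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
        if (pte_present(*pte))
                set_pte_at(mm, addr, pte,
                           pte_clear_flags(*pte, _PAGE_PRESENT));
        pte_unmap_unlock(pte, ptl);

        /* stale TLB entries would mask the change */
        flush_tlb_mm(mm);
        return 0;
}

You would then teach the x86 fault path (arch/x86/mm/fault.c) to recognize these pages and dispatch to your custom handler; a character device or debugfs knob, as suggested above, is a convenient way to trigger soft_unmap() from user space after your prologue has run.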

Changing kernel page permission for allowing user access

In x86 or x64 Linux, I am trying to write a kernel module that changes a specific kernel page's permissions to allow a user application to access that memory. For example, if there is a readable kernel page at 0xC0001000 (assuming a 3:1 split), I want to change the user/supervisor bit of this page and allow user applications to do something like this:
int *m = (int *)0xC0001000;  /* kernel virtual address */
printf("reading kernel memory from user : %08x\n", *m);
In my kernel module, I changed the access bits of the corresponding kernel page-table entry from 0x67 to 0x63 (lower bits 111 -> 011), clearing the user/supervisor bit.
After that, I flushed the TLB entry for virtual address 0xC0001000 using the invlpg instruction.
I have confirmed that the page entry I manipulated was indeed the corresponding one.
However, accessing 0xC0001000 from the user application still gives me a segmentation fault.
Am I missing something important here? Perhaps the CS segment and the GDT? Or is that irrelevant?
Some advice would be nice; thank you in advance :)
From your kernel module, you can just change the process's effective user ID to 0 to let it read /dev/kmem.
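Two things are worth checking with the original approach. First, on x86 the U/S bit (bit 2) permits user access when it is 1, so going from 0x67 to 0x63 clears it and makes the page supervisor-only, the opposite of what is wanted. Second, the CPU requires U/S = 1 in every paging-structure entry on the walk (the PGD/PUD/PMD levels as well), not just the leaf PTE, which is a common reason such a change still faults. A hedged sketch of the leaf-level change, using the exported x86 helper lookup_address():

#include <linux/errno.h>
#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

static int make_user_accessible(unsigned long addr)
{
        unsigned int level;
        pte_t *pte = lookup_address(addr, &level);

        if (!pte || !pte_present(*pte))
                return -EFAULT;

        /* set U/S on the leaf entry; frame and other flags untouched
         * (the upper-level entries must allow user access as well) */
        set_pte(pte, __pte(pte_val(*pte) | _PAGE_USER));

        /* drop the stale translation; this issues invlpg */
        __flush_tlb_one(addr);
        return 0;
}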

Page swap in the Linux kernel

I know that the Linux kernel has a page cache to hold recently used pages and blocks.
I understand that this saves time, because Linux doesn't need to fetch those blocks from slower storage. When a block is missing from the cache, Linux requests it from the lower level (using functions like submit_bio) and gets the page corresponding to that block.
I want to find the place in the Linux kernel (3.10) where it checks for the existence of a block's page in the page cache and, if the page can't be found, brings the block in from the block I/O layer.
I search for something like this in the code:
if (block's page exists in the cache)
    return this page
else
    bring in the page for the searched block and return it
Can anyone post a link to the place in the kernel where this decision is made?
The best place to start looking is going to be in mm.h: http://lxr.linux.no/linux+v3.10.10/include/linux/mm.h
Then take a look at the mm directory, which has files like page_io.c: http://lxr.linux.no/linux+v3.10.10/mm/page_io.c
Keep in mind that any architecture specific stuff will likely be defined in the arch directory for the system you are looking at. For example, here is the x86 page table management code: http://lxr.linux.no/linux+v3.10.10/arch/x86/mm/pgtable.c
Good luck! Remember, you are likely not going to find a section of code as clean as the example code you gave.
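To tie those files to the exact decision point: the buffered read path in mm/filemap.c, do_generic_file_read(), does essentially what the pseudocode above describes. A condensed, hedged sketch (3.10-era names; readahead, locking, and error paths are stripped out):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>

static struct page *lookup_or_fetch(struct address_space *mapping,
                                    struct file *filp, pgoff_t index)
{
        struct page *page;

        page = find_get_page(mapping, index);   /* radix-tree lookup */
        if (page)
                return page;                    /* hit: no block I/O */

        /* miss: allocate a fresh page, insert it, and read it in */
        page = page_cache_alloc_cold(mapping);
        if (!page)
                return NULL;
        if (add_to_page_cache_lru(page, mapping, index, GFP_KERNEL)) {
                page_cache_release(page);
                return NULL;
        }
        /* ->readpage eventually reaches submit_bio() */
        mapping->a_ops->readpage(filp, page);
        return page;
}

find_get_page() is the "exists in the cache" check, and the no_cached_page path in the real function is the "bring the page in" branch.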

How to tell Linux that a mmap()'d page does not need to be written to swap if the backing physical page is needed?

Hopefully the title is clear. I have a chunk of memory obtained via mmap(). After some time, I have concluded that I no longer need the data within this range. I still wish to keep the range itself, however; that is, I do not want to call munmap(). I'm trying to be a good citizen and not make the system swap more than it needs to.
Is there a way to tell the Linux kernel that if the given page is backed by a physical page and if the kernel decides it needs that physical page, do not bother writing that page to swap?
I imagine under the hood this magical function call would destroy any mapping between the given virtual page and physical page, if present, without writing to swap first.
Your question (as stated) makes no sense.
Let's assume that there was a way for you to tell the kernel to do what you want.
Let's further assume that it did need the extra RAM, so it took away your page, and didn't swap it out.
Now your program tries to read that page (since you didn't want to munmap the data, presumably you might try to access it). What is the kernel to do? The choices I see:
1. it can give you a new page filled with 0s, or
2. it can give you SIGSEGV.
If you wanted choice 2, you could achieve the same result with munmap.
If you wanted choice 1, you could mremap over the existing mapping with MAP_ANON (or munmap followed by new mmap).
In either case, you can't depend on the old data being there when you need it.
The only way your question would make sense is if there were some additional mechanism for the kernel to let you know that it is taking away your page (e.g. by sending you a special signal). But the situation you described is likely too rare to warrant the additional complexity.
EDIT:
You might be looking for madvise(..., MADV_DONTNEED)
You could munmap the region, then mmap it again with MAP_NORESERVE
If you know at initial mapping time that swapping is not needed, use MAP_NORESERVE
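A minimal user-space sketch of the MADV_DONTNEED route: the mapping stays valid, the physical pages can be dropped without ever touching swap, and (for a private anonymous mapping) later reads return zero-filled pages:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1 << 20;                     /* 1 MiB */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    memset(p, 'x', len);                      /* dirty the pages */

    /* the contents are now disposable; the range remains mapped */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        perror("madvise");
        return 1;
    }

    printf("after MADV_DONTNEED: first byte = %d\n", p[0]);  /* 0 */
    munmap(p, len);
    return 0;
}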
