In Linux, pages in memory have a PG_referenced bit which is set when there is a reference to the page. My question is, if there is a memory read/write to an address in a page and there was a cache hit for that address, will it count as a reference to that page?
Is PG_referenced what Linux calls the bit in the hardware page tables that the CPU updates when a page is accessed? Or is it tracking whether a page is referenced by another kernel data structure?
The CPU hardware sets a bit in the page-table entry on access to a virtual page, if it wasn't already set. (I assume most ISAs have functionality like this to help detect less recently used pages; e.g., the OS can clear the Accessed bit on some pages and later check which pages still haven't been accessed, without having to actually invalidate them and force a soft page fault on access.)
On x86, for example, the check (for whether to "trap" to an internal microcode assist that atomically sets the Accessed and/or Dirty bit in the PTE and in the higher levels of the page directory) is based on the TLB entry caching the PTE.
This is unrelated to D-cache or I-cache hit or miss for the physical address.
Updating the Accessed and/or Dirty bits works even for pages set to be uncacheable, I think.
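As a rough illustration of how an OS can use this bit (a minimal user-level sketch, not real kernel code; the pte_t type, the flat PTE array, and the helper names are all hypothetical, with only the Accessed-bit position taken from x86):

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define PTE_ACCESSED (1u << 5)   /* x86 PTE Accessed bit (bit 5) */

typedef uint64_t pte_t;          /* hypothetical flat array of PTEs */

/* Pass 1: clear the Accessed bit on every tracked PTE. A real kernel must
 * also flush the corresponding TLB entries, or the CPU may not set the bit
 * again on the next access. */
static void clear_accessed_bits(pte_t *ptes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ptes[i] &= ~(pte_t)PTE_ACCESSED;
    /* flush_tlb_range(...) would go here in a real implementation */
}

/* Pass 2, some time later: a page whose Accessed bit is still clear has not
 * been touched since pass 1 and is a better eviction candidate. */
static bool page_was_accessed(const pte_t *ptes, size_t i)
{
    return (ptes[i] & PTE_ACCESSED) != 0;
}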
When virtual memory systems decide to evict a memory page out to disk, it's sometimes mentioned that the last time of access of that memory page is used to help decide which page is evicted. But what mechanism is used to track the last time of access of a memory page?
I believe the TLB only stores information on its current set of pages, not all of them. Or is it simply that anything not present in the TLB is fair game for eviction under that criterion?
The page frame reclaiming algorithm (PFRA) in the Linux kernel is actually quite complex, and it uses several heuristics to decide which page to evict. The decision is based on many factors, but the following two stand out.
1. The Least Recently Used (LRU) Lists
Linux divides memory into different memory zones. Each zone has a zone descriptor, and this descriptor, among other things, stores references to two very important lists: active and inactive.
The active list includes the pages that have been accessed recently, while the inactive list includes the pages that have not been accessed for some time. Page frames move from one list to the other, and off the lists entirely, depending on their access patterns.
2. Page descriptor flags
The following two flags in the page descriptor allow the kernel to identify whether a given page is included in one of the above lists.
PG_lru - The page is in the active or inactive page list
PG_active - The page is in the active page list
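As an illustration only (a sketch assuming kernel context; PageLRU() and PageActive() are the kernel's real page-flag test macros, but no locking or list traversal is shown):

#include <linux/mm_types.h>    /* struct page */
#include <linux/page-flags.h>  /* PageLRU(), PageActive() */

/* Sketch: what the two flags mean for a given page descriptor. In the real
 * kernel such checks are made under the appropriate LRU lock. */
static const char *which_lru_list(struct page *page)
{
    if (!PageLRU(page))
        return "not on any LRU list";
    if (PageActive(page))
        return "on the active list";
    return "on the inactive list";
}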
Understanding the Linux Kernel states the following when it describes how the PFRA actually uses the above information:
The main idea behind the LRU algorithm is to associate a counter storing the age of the page with each page in RAM—that is, the interval of time elapsed since the last access to the page. This counter allows the PFRA to reclaim only the oldest page of any process. Some computer platforms provide sophisticated support for LRU algorithms; unfortunately, 80 × 86 processors do not offer such a hardware feature, thus the Linux kernel cannot rely on a page counter that keeps track of the age of every page. To cope with this restriction, Linux takes advantage of the Accessed bit included in each Page Table entry, which is automatically set by the hardware when the page is accessed; moreover, the age of a page is represented by the position of the page descriptor in one of two different "The Least Recently Used (LRU) Lists".
I am exploring memory management in the Linux operating system.
As far as I know, the MMU is hardware integrated into modern CPUs to handle address translation. If a virtual address is not in the TLB, the MMU will first get the address of the process's page table from the page table base register (PTBR), and then retrieve the physical address from the page table, which is located in physical memory.
My question is: how does the MMU notify the operating system that a physical page has been accessed or modified, given that the operating system is responsible for page replacement? I see a function in Linux/mm/swap.c, but I am not sure whether it is called every time the page table is updated.
void mark_page_accessed(struct page *page)
{
    if (!PageActive(page) && !PageUnevictable(page) && PageReferenced(page)) {
        /*
         * If the page is on the LRU, queue it for activation via
         * activate_page_pvecs. Otherwise, assume the page is on a
         * pagevec, mark it active and it'll be moved to the active
         * LRU on the next drain.
         */
        if (PageLRU(page))
            activate_page(page);
        else
            __lru_cache_activate_page(page);
        ClearPageReferenced(page);
        if (page_is_file_cache(page))
            workingset_activation(page);
    } else if (!PageReferenced(page)) {
        SetPageReferenced(page);
    }
}
I think the MMU will possibly modify the PTE flags in the page table. But the operating system will not know unless it walks the page tables, right? And since page replacement is performed on physical pages, is there also some flag on the physical pages? I must be missing something really important...
Thanks
I read some related books on Linux memory management. I think the basic steps include:
1. The process wants to access a virtual address;
2. The processor checks the TLB for a cached mapping from the virtual address to a physical address; if a cached mapping is found, jump to step 4;
3. Get the physical address by walking the process's page table;
4. If the page is not present in physical memory, trigger a page fault and swap the page in. Access the physical page;
5. Update page flags, such as the dirty bit or the referenced bit in page->flags.
The OS keeps an active_list containing physical pages that have been recently referenced, and an inactive_list containing reclaim candidates. The page replacement policy is implemented in the swap.c file.
For a page in the active_list, when it reaches the bottom of the list, its referenced flag is checked. If it is set, the page is moved back to the top of the list and the next page is checked; otherwise, the page is moved to the inactive_list.
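A rough sketch of that second-chance style scan, handling a single page per step (the real code in mm/vmscan.c batches pages, holds the LRU lock, and handles many more cases; scan_active_list_once is a made-up helper name):

#include <linux/list.h>
#include <linux/mm_types.h>
#include <linux/page-flags.h>

/* Simplified sketch of the scan described above. */
static void scan_active_list_once(struct list_head *active_list,
                                  struct list_head *inactive_list)
{
    /* Take the page at the tail ("bottom") of the active list. */
    struct page *page = list_last_entry(active_list, struct page, lru);

    if (PageReferenced(page)) {
        /* Referenced since the last scan: clear the flag and rotate the
         * page back to the head of the active list (second chance). */
        ClearPageReferenced(page);
        list_move(&page->lru, active_list);
    } else {
        /* Not referenced: demote it to the inactive list. */
        ClearPageActive(page);
        list_move_tail(&page->lru, inactive_list);
    }
}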
In conclusion, I guess, the answer to my question
How does the memory management unit (MMU) notify the operating system that the page table is updated?
is that
The MMU doesn't notify the operating system directly that the page table has been updated. It sets the Accessed/Dirty bits in the page table entry, and the kernel later transfers that information into the flags of the corresponding struct page. When the OS is performing page replacement, if a physical page reaches the tail of the active_list or inactive_list, the page's flags are checked to determine whether the page should be reclaimed or moved back to the head of one of the lists.
I'm asking because I remember that all physical pages belonging to the kernel are pinned in memory and thus unswappable, as stated here: http://www.cse.psu.edu/~axs53/spring01/linux/memory.ppt
However, I'm reading a research paper and am confused because it says,
"(physical) pages frequently move between the kernel data segment and user space."
It also mentions that, in contrast, physical pages do not move between the kernel code segment and user space.
I think that if a physical page sometimes belongs to the kernel data segment and sometimes belongs to user space, it must mean that physical pages belonging to the kernel data segment are swappable, which contradicts my current understanding.
So, are physical pages belonging to the kernel data segment swappable or unswappable?
P.S. The research paper is available here:
https://www.cs.cmu.edu/~arvinds/pubs/secvisor.pdf
Please search "move between" and you will find it.
P.S. again, a virtual memory area ranging from [3G + 896M] to 4G belongs to the kernel and is used for mapping physical pages in ZONE_HIGHMEM (x86 32-bit Linux, 3G + 1G setting). In such a case, the kernel may first map some virtual pages in the area to the physical pages that host the current process's page table, modify some page table entries, and unmap the virtual pages. This way, the physical pages may sometimes belong to the kernel and sometimes belong to user space, because they do not belong to the kernel after the unmapping and thus become swappable. Is this the reason?
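For what it's worth, here is a minimal sketch of that temporary-mapping pattern, assuming kernel context and using kmap_atomic()/kunmap_atomic(); the helper name touch_highmem_page is made up:

#include <linux/highmem.h>   /* kmap_atomic(), kunmap_atomic() */
#include <linux/mm_types.h>  /* struct page */

/* Sketch: temporarily map a ZONE_HIGHMEM page into the kernel's virtual
 * address range, touch it, and unmap it again. The page is reachable from
 * the kernel only for the duration of the mapping. */
static void touch_highmem_page(struct page *page)
{
    void *vaddr = kmap_atomic(page);   /* map into kernel virtual space */

    ((char *)vaddr)[0] = 0;            /* e.g. modify a page table entry */

    kunmap_atomic(vaddr);              /* mapping is gone again */
}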
tl;dr - memory pools and swapping are different concepts. You cannot make deductions about one from the other.
kmalloc() and other kernel data allocations come from the slab/slub allocators, etc., the same place the kernel gets pages for user space. Ergo, pages frequently move between the kernel data segment and user space. This is correct, but it says nothing about swapping; that is a separate issue and you cannot deduce anything about it.
The kernel code is typically populated at boot and marked read-only and never changes after that. Ergo physical pages do not move between the kernel code segment and user space.
Why do you think that because something comes from the same pool, it is the same? Network sockets also come from the same memory pool. It is a separation of concerns. The linux-mm (memory management system) handles swap. A page can be pinned (unswappable). The check for static kernel memory (this may include .bss and .data) is a simple range check. That memory is normally pinned and marked unswappable at the linux-mm layer. User data (whose allocations come from the same pool) can be marked as swappable by the linux-mm. For instance, even without swap, user-space text is still swappable because it is backed by an inode; caching is much simpler for read-only data. If data is swapped out, it is marked as such in the MMU tables, and a fault handler (part of the linux-mm) must distinguish between a swapped-out page and a SIGBUS.
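The "simple range check" mentioned above might look roughly like the following sketch, assuming kernel context and the linker-provided section symbols the kernel defines (_sdata/_edata for .data, __bss_start/__bss_stop for .bss); the helper name is hypothetical:

#include <asm/sections.h>   /* _sdata, _edata, __bss_start, __bss_stop */
#include <linux/types.h>    /* bool */

/* Sketch: an address inside the kernel's static .data or .bss sections is
 * kernel data and is never a swap candidate. */
static bool is_static_kernel_data(unsigned long addr)
{
    if (addr >= (unsigned long)_sdata && addr < (unsigned long)_edata)
        return true;
    if (addr >= (unsigned long)__bss_start && addr < (unsigned long)__bss_stop)
        return true;
    return false;
}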
There are also versions of Linux with no MMU (no-mm), and these will never swap anything. In theory someone might be able to swap kernel data, but then why is it in the kernel? The Linux way would be to use a module and load it only when needed. Certainly, the linux-mm data itself is kernel data, and hopefully you can see the problem with swapping that.
The problem with conceptual questions like this is that:
It can differ with Linux versions.
It can differ with Linux configurations.
The advice can change as Linux evolves.
For certain, the linux-mm code cannot be swappable, nor can any interrupt handler. It is possible that at some point in time kernel code and/or data could be swapped. I don't think this is ever the case today, outside of module loading/unloading (and it is rather pedantic/esoteric as to whether you call that swapping or not).
I think that if a physical page sometimes belongs to the kernel data segment and sometimes belongs to user space, it must mean that physical pages belonging to the kernel data segment are swappable, which contradicts my current understanding.
There is no connection between swappable memory and page movement between user space and kernel space. Whether a page can be swapped depends entirely on whether it is pinned. Pinned pages are not swapped, so their mapping is considered permanent.
So, are physical pages belonging to the kernel data segment swappable or unswappable?
Usually, pages used by the kernel are pinned and so are not meant to be swappable.
However, I'm reading a research paper and feel confused as it says, "(physical) pages frequently move between the kernel data segment and user space."
Could you please give a link to this research paper?
As far as I know (just from UNIX lectures and labs at school), the pages for kernel space are allocated to the kernel with a simple, fixed mapping, and they are all pinned. After the kernel turns on paging mode (bit operations on CR0 and CR3 for x86), the first user-mode process appears, and the pages that have been allocated to the kernel are not in the set of pages available to user space.
I have been learning about virtual memory recently, and some questions were raised - especially regarding the initialization of all the structs.
Assume the x86 architecture and Linux 2.4 (i.e., 2-level paging).
1. At the beginning, what do the PGD's entries contain if they don't point to any allocated page table?
2. The same question for page tables: how are their entries initialized?
3. When a process creates a new memory area, say for virtual addresses 100-200, does it also create (if needed) and initialize the page tables that correspond to those addresses, or does it wait until there is an access to a specific address?
4. When a page table entry needs to be initialized with a physical address (say, on a write access), how does the OS select that address?
Thanks in advance.
Entries have a valid bit. So if there are no page tables allocated in a page directory, all entries theoretically have the valid bit off and it wouldn't matter what else is in the entry.
Same as above, except I think if a page table is created that means a page from this range has been accessed so at least one entry will be set as valid upon page table initialization. Otherwise, there'd be no reason to create an empty page table at all and take up memory.
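To make the "valid bit off" idea concrete for the 2-level x86 case in the question, here is a tiny user-level sketch (the pde_t type and helper are hypothetical; only the present-bit position is taken from x86):

#include <stdint.h>
#include <stdbool.h>

#define X86_PDE_PRESENT 0x1u   /* bit 0: entry points to a valid page table */

typedef uint32_t pde_t;        /* one 32-bit page directory entry (x86, 2-level) */

/* Hypothetical helper: if the present bit is clear, the remaining bits are
 * ignored by the MMU, so an "empty" page directory can simply be zero-filled. */
static bool pde_points_to_page_table(pde_t pde)
{
    return (pde & X86_PDE_PRESENT) != 0;
}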
I'm interpreting your "creating new memory area" as using a malloc() call. Malloc is a way to ask the OS for memory and to map that memory to your virtual address space. That memory comes from your heap virtual memory range and I don't think you can guarantee the specific addresses the OS uses, just the size. If you use mmap I think you do have the ability to ask for specific addresses to be used, but in general you only want to do this for specific cases like shared memory.
As for the page tables, I imagine that when the OS gets your memory during a malloc call, it will update your page table for you with the new mappings. If it doesn't during the malloc, then it will when you try to access the memory and it causes a page fault.
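A small user-space example of the mmap case (a sketch only; the hint address 0x70000000 is arbitrary, and without MAP_FIXED the kernel may ignore it):

#include <stdio.h>
#include <stddef.h>
#include <sys/mman.h>

int main(void)
{
    /* Ask for 1 MiB of anonymous memory, hinting (not forcing) an address. */
    void *hint = (void *)0x70000000;
    size_t len = 1 << 20;

    char *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* The page tables for this range are typically filled in lazily:
     * the first write below triggers a page fault, and only then is a
     * physical page allocated and mapped. */
    p[0] = 42;

    printf("requested %p, got %p\n", hint, (void *)p);
    munmap(p, len);
    return 0;
}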
In Linux, the OS generally keeps track of a free list of pages so that it can easily grab memory without worrying about anyone else using it. My guess is the free list is initialized upon booting by communicating with the main memory controller/bitmap to know which spots of physical memory are in use, but maybe a hardware person can back this up.
According to the mlock() man page:
All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
Does this also guarantee that the physical address of these pages is constant throughout their lifetime, or until unlocked?
If not (that is, if they can be moved by the memory manager but still remain in RAM), is there anything that can be said about the new location, or about the event when such a change occurs?
UPDATE:
Can anything be said about the coherency of the locked pages in RAM? If the CPU has a cache, does mlock-ing guarantee that RAM is coherent with the cache (assuming a write-back cache)?
No. Pages that have been mlocked are managed using the kernel's unevictable LRU list. As the name suggests (and mlock() guarantees) these pages cannot be evicted from RAM. However, the pages can be migrated from one physical page frame to another. Here is an excerpt from Unevictable LRU Infrastructure (formatting added for clarity):
MIGRATING MLOCKED PAGES
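As a small user-space illustration of what mlock() does and does not promise (a sketch only; the buffer size and error handling are arbitrary): the pages below are guaranteed resident after mlock() returns, yet, per the excerpt above, the kernel remains free to migrate them to different physical frames.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t len = 4 * (size_t)page_size;

    /* Page-aligned allocation so the lock covers whole pages. */
    void *buf;
    if (posix_memalign(&buf, (size_t)page_size, len) != 0) {
        perror("posix_memalign");
        return 1;
    }

    /* After this call the pages are guaranteed resident in RAM;
     * they may still be migrated between physical page frames. */
    if (mlock(buf, len) != 0) {
        perror("mlock");   /* may fail if RLIMIT_MEMLOCK is too low */
        return 1;
    }

    memset(buf, 0, len);   /* use the locked memory */

    munlock(buf, len);
    free(buf);
    return 0;
}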
A page that is being migrated has been isolated from the LRU lists and is held locked across unmapping of the page, updating the page's address space entry and copying the contents and state, until the page table entry has been replaced with an entry that refers to the new page. Linux supports migration of mlocked pages and other unevictable pages. This involves simply moving the PG_mlocked and PG_unevictable states from the old page to the new page.