How does virtual to physical memory mapping work - Linux

I'm currently trying to understand systems programming for Linux and am having a hard time understanding how virtual-to-physical memory mappings work.
What I understand so far is that two processes P1 and P2 can reference the same virtual address, for example 0xf11001. This memory address is split into two parts: 0xf11 is the page number and 0x001 is the offset within that page (assuming a 4096-byte page size). To find the physical address, the MMU has hardware registers that map the page number to a physical frame, let's say 0xfff. The last step is to combine 0xfff with the offset 0x001 to get the physical address 0xfff001.
However, this understanding makes no sense: the same virtual addresses would still point to the same physical location. What step am I missing for my understanding to be correct?

You're missing one crucial step here. In general, the MMU doesn't have hardware registers holding the mappings; instead it has a single register (the page table base pointer) which points to the physical memory address of the page table (with the mappings) for the currently running process, and that page table is unique to each process. On a context switch, the kernel changes this register's value, so a different mapping is performed for each running process.
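To see why that per-process base pointer is the whole trick, here's a toy user-space model of the lookup (purely illustrative: real page tables are multi-level structures living in kernel memory, and every name below is made up):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                 /* 4096-byte pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* One flat "page table" per process: virtual page number -> frame. */
struct toy_process {
    uint64_t page_table[16];
};

static uint64_t translate(const struct toy_process *current, uint64_t va)
{
    uint64_t vpn    = va >> PAGE_SHIFT;       /* page number */
    uint64_t offset = va & (PAGE_SIZE - 1);   /* offset within page */
    return (current->page_table[vpn] << PAGE_SHIFT) | offset;
}

int main(void)
{
    struct toy_process p1 = { .page_table = { [1] = 0xfff } };
    struct toy_process p2 = { .page_table = { [1] = 0xabc } };

    /* Same virtual address, different table base => different result.
     * A context switch is just swapping which table gets consulted. */
    printf("P1: %#llx\n", (unsigned long long)translate(&p1, 0x1234));
    printf("P2: %#llx\n", (unsigned long long)translate(&p2, 0x1234));
    return 0;
}

This prints 0xfff234 for P1 and 0xabc234 for P2: the identical virtual address 0x1234 resolves to two different physical addresses.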
Here's a nice presentation on this topic: http://www.eecs.harvard.edu/~mdw/course/cs161/notes/vm.pdf

Related

How to test if address is virtual or logical in linux kernel?

I am aware that Linux kernel memory is usually 1:1 mapped (up to a certain limit of the zone). From what I understood, to make this 1:1 mapping more efficient, the array of struct page is virtually mapped.
I wanted to check if that is the case. Is there a way to test whether a given address (let's say that of a struct page) is 1:1 mapped or virtually mapped?
The notion of address space for a 64-bit machine encompasses 2^64 addresses. This is far larger than any modern amount of physical memory in one machine. Therefore, it is possible to have the entire physical memory mapped into the address space with plenty of room to spare. As discussed in this post and shown here, Linux leaves 64 TB of the address space for the physical mapping. Therefore, if the kernel needed to iterate through all bytes in physical memory, it could just iterate through addresses 0 + offset to total_bytes_of_RAM + offset, where offset is the address where the direct mapping starts (0xffff888000000000 in the 64-bit memory layout linked above). Also, this direct mapping region is within the kernel address range that is "shared between all processes", so addresses in this range should always be logical.
Your post has two questions. The first is how to test whether an address is logical or virtual. As mentioned above, if the address falls within the direct mapping range, it is logical; otherwise it is virtual. If it is a virtual address, then obtaining the physical address through the page tables should allow you to access the address logically, following the physical_addr + offset math mentioned above.
Additionally, kmalloc allocates/reserves memory directly using this logical mapping, so you immediately know that if the address you're using came from kmalloc, it is a logical address. However, vmalloc and any user-space memory allocations use virtual addresses that must be translated to get the logical equivalent.
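If you want to probe this from a kernel module, there are existing helpers that encode exactly this distinction; here is a minimal sketch (the helpers are real, but their exact semantics vary by architecture and kernel version, so treat it as illustrative):

#include <linux/mm.h>       /* virt_addr_valid(), __pa() */
#include <linux/vmalloc.h>  /* is_vmalloc_addr() */
#include <linux/printk.h>

/* Illustrative classification of a kernel address. */
static void classify(const void *addr)
{
    if (virt_addr_valid(addr)) {
        /* Direct-mapped ("logical"): phys is just an offset away. */
        pr_info("%px: logical, phys %#lx\n", addr, __pa(addr));
    } else if (is_vmalloc_addr(addr)) {
        /* Purely virtual: must be resolved through the page tables. */
        pr_info("%px: vmalloc range\n", addr);
    } else {
        pr_info("%px: neither direct map nor vmalloc\n", addr);
    }
}

For example, classify(kmalloc(64, GFP_KERNEL)) should report a logical address, while classify(vmalloc(64)) should land in the vmalloc range.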
Your second question is whether "logically mapped pages" can be swapped out. The question should be rephrased, because technically all pages that are in RAM are logically mapped in that direct mapping region. And yes, certain pages in main memory can be swapped out, or evicted so that the page frame can be reused by another page. Now, if you're asking whether pages that are only mapped logically and not virtually (like with kmalloc, which gets memory from the slab allocator) can be swapped out, I think the answer is that they can be reclaimed if not being used, but aren't generally swapped out. Kernel pages are generally not swapped out, except for hibernation.

How to get process's virtual address in Linux kernel

Currently, I'm trying to figure out how to get the virtual address (VA) of a specific process in the Linux kernel, since there are several functions that take a VA as an argument, related to the different page directories, including pgd_offset(), pgd_index(), p4d_offset(), p4d_index()...
Could anyone explain what these functions do, including xxx_offset() and xxx_index() (xxx: pgd, p4d, pmd...), and how to use them?
What does the VA mean when it is passed as an argument to the functions mentioned above? Is it a virtual address of the process? And how can I get a VA of a specific process? I already know that we can use a process's task_struct->mm->mmap to get the ranges of its virtual address space, but I have no idea how to get a specific virtual address.
Does task_struct->mm->pgd indicate the base address of the PGD (page global directory)?
Your question doesn't really make sense. You don't "get a virtual address of a process". A process has a virtual address space that serves as a virtual memory map for data, code, stack, heap, etc.
Those functions take a single virtual address within the process's virtual address space and help with walking the page tables to find its page table entry, and then its physical address (or to check page table entry flags). In Linux there have traditionally been four page table levels to go through to reach the page table entry: pgd (page global directory), pud (page upper directory), pmd (page middle directory), and pte (page table entry); more recently p4d was added as an extra level between pgd and pud. Typically, the physical address of the top-level page directory is stored in the CR3 register (on x86). So you use that address to access the directory, then use pgd_index and pgd_offset to find the entry covering your address, read from it the address of the next level (p4d), and repeat until you hit the pte. A useful file to see this in action is mm/pagewalk.c.
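To make that concrete, here is a condensed sketch of such a walk (locking, huge-page handling, and architecture quirks are omitted, so treat it as illustrative rather than production code):

#include <linux/mm.h>
#include <linux/pgtable.h>

/* Walk the page table levels for one virtual address; returns a
 * pointer to the PTE, or NULL if any intermediate level is missing. */
static pte_t *walk_to_pte(struct mm_struct *mm, unsigned long va)
{
    pgd_t *pgd;
    p4d_t *p4d;
    pud_t *pud;
    pmd_t *pmd;

    pgd = pgd_offset(mm, va);            /* mm->pgd + pgd_index(va) */
    if (pgd_none(*pgd) || pgd_bad(*pgd))
        return NULL;
    p4d = p4d_offset(pgd, va);
    if (p4d_none(*p4d) || p4d_bad(*p4d))
        return NULL;
    pud = pud_offset(p4d, va);
    if (pud_none(*pud) || pud_bad(*pud))
        return NULL;
    pmd = pmd_offset(pud, va);
    if (pmd_none(*pmd) || pmd_bad(*pmd))
        return NULL;
    return pte_offset_kernel(pmd, va);   /* the final page table entry */
}

Given the PTE you can then get the physical page frame with pte_pfn() and check flags like pte_present() or pte_write().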
A process accesses memory during its runtime, and typically this memory is referred to by virtual addresses. When it accesses an address that isn't in the TLB, the page tables must be walked as described above to find its location and permission flags. There is no "getting the VA of a process", but when your program uses mmap or malloc and you take the addresses of variables, those addresses are typically virtual addresses. You can look in /proc/proc_number/maps to see the virtual address layout of a process with PID proc_number. Note that with address space layout randomization turned on, this map will be different every time you run the same program.
I'm not sure, but you can probably test that by comparing the variable with the pgd address used in the mm/pagewalk.c code I linked above.

How exactly do kernel virtual addresses get translated to physical RAM?

On the surface, this appears to be a silly question. Some patience please... :-)
I'm structuring this question into 2 parts:
Part 1:
I fully understand that platform RAM is mapped into the kernel segment; especially on 64-bit systems this works well. So each kernel virtual address is indeed just an offset from physical memory (DRAM).
Also, it's my understanding that as Linux is a modern virtual memory OS, (pretty much) all addresses are treated as virtual addresses and must "go" via hardware - the TLB/MMU - at runtime and get translated by the MMU via the kernel paging tables. Again, easy to understand for user-mode processes.
HOWEVER, what about kernel virtual addresses? For efficiency, would it not be simpler to direct-map these (and an identity mapping is indeed set up from PAGE_OFFSET onwards)? But still, at runtime, the kernel virtual address must go via the TLB/MMU and get translated, right? Is this actually the case? Or is kernel virtual address translation just an offset calculation? (But how can that be, as we must go via the hardware TLB/MMU?) As a simple example, let's consider:
char *kptr = kmalloc(1024, GFP_KERNEL);
Now kptr is a kernel virtual address.
I understand that virt_to_phys() can perform the offset calculation and return the physical DRAM address.
But here's the actual question: it can't be done in this manner via software - that would be pathetically slow! So, back to my earlier point: it would have to be translated via hardware (the TLB/MMU).
Is this actually the case?
Part 2:
Okay, let's say this is the case, and we do use paging in the kernel to do this; we must of course set up kernel paging tables. I understand they're rooted at swapper_pg_dir.
(I also understand that vmalloc(), unlike kmalloc(), is a special case: it allocates a purely virtual region that gets backed by physical frames only on page fault.)
If (in Part 1) we do conclude that kernel virtual address translation is done via the kernel paging tables, then how exactly does the kernel paging table (swapper_pg_dir) get "attached" or "mapped" to a user-mode process? This should happen in the context-switch code? How? Where?
E.g.: on x86_64, two processes A and B are alive, one CPU.
While A is running, its higher-canonical addresses 0xFFFF800000000000 through 0xFFFFFFFFFFFFFFFF "map" to the kernel segment, and its lower-canonical addresses 0x0 through 0x00007FFFFFFFFFFF map to its private userspace.
Now, if we context-switch A->B, process B's lower-canonical region is unique, but it must "map" to the same kernel, of course!
How exactly does this happen? How do we "auto"-refer to the kernel paging table when in kernel mode? Or is that a wrong statement?
Thanks for your patience, would really appreciate a well thought out answer!
First, a bit of background.
This is an area where there is a lot of potential variation between architectures; however, the original poster has indicated he is mainly interested in x86 and ARM, which share several characteristics:
no hardware segments or similar partitioning of the virtual address space (when used by Linux)
hardware page table walk
multiple page sizes
physically tagged caches (at least on modern ARMs)
So if we restrict ourselves to those systems, it keeps things simpler.
Once the MMU is enabled, it is never normally turned off. So all CPU addresses are virtual, and will be translated to physical addresses using the MMU. The MMU will first look up the virtual address in the TLB, and only if it doesn't find it there will it refer to the page table - the TLB is a cache of the page table - so we can ignore the TLB for this discussion.
The page table describes the entire virtual 32- or 64-bit address space, and includes information like:
whether the virtual address is valid
which mode(s) the processor must be in for it to be valid
special attributes for things like memory mapped hardware registers
and the physical address to use
Linux divides the virtual address space into two: the lower portion is used for user processes, and there is a different virtual-to-physical mapping for each process. The upper portion is used for the kernel, and the mapping is the same even when switching between different user processes. This keeps things simple, as an address is unambiguously in user or kernel space, the page table doesn't need to be changed when entering or leaving the kernel, and the kernel can simply dereference pointers into user space for the current user process. Typically on 32-bit processors the split is 3G user/1G kernel, although this can vary. Pages for the kernel portion of the address space will be marked as accessible only when the processor is in kernel mode, to prevent them being accessible to user processes. The portion of the kernel address space which is identity mapped to RAM (kernel logical addresses) will be mapped using big pages when possible, which may allow the page table to be smaller but, more importantly, reduces the number of TLB misses.
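As a concrete illustration of that fixed split (a minimal sketch: 0xC0000000 is only the common 3G/1G default for 32-bit x86, and PAGE_OFFSET is configurable at build time):

#define PAGE_OFFSET 0xC0000000UL   /* 3G/1G split: common default, not universal */

/* The upper 1G is kernel space and identical in every process's page
 * table; everything below is the per-process user portion. */
static inline int is_kernel_address(unsigned long addr)
{
    return addr >= PAGE_OFFSET;
}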
When the kernel starts, it creates a single page table for itself (swapper_pg_dir) which just describes the kernel portion of the virtual address space, with no mappings for the user portion. Then every time a user process is created, a new page table is generated for that process; the portion which describes kernel memory is the same in each of these page tables. This could be done by copying all of the relevant portion of swapper_pg_dir, but because page tables are normally tree structures, the kernel is frequently able to graft the portion of the tree which describes the kernel address space from swapper_pg_dir into the page tables for each user process by just copying a few entries in the upper layer of the page table structure. As well as being more efficient in memory (and possibly cache) usage, this makes it easier to keep the mappings consistent. It is one of the reasons why the split between kernel and user virtual address spaces can only occur at certain addresses.
To see how this is done for a particular architecture, look at the implementation of pgd_alloc(). For example, ARM (arch/arm/mm/pgd.c) uses:
pgd_t *pgd_alloc(struct mm_struct *mm)
{
	...
	init_pgd = pgd_offset_k(0);
	memcpy(new_pgd + USER_PTRS_PER_PGD, init_pgd + USER_PTRS_PER_PGD,
	       (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
	...
}
or x86 (arch/x86/mm/pgtable.c), where pgd_alloc() calls pgd_ctor():
static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
{
	/* If the pgd points to a shared pagetable level (either the
	   ptes in non-PAE, or shared PMD in PAE), then just copy the
	   references from swapper_pg_dir. */
	...
	clone_pgd_range(pgd + KERNEL_PGD_BOUNDARY,
			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
			KERNEL_PGD_PTRS);
	...
}
So, back to the original questions:
Part 1: Are kernel virtual addresses really translated by the TLB/MMU?
Yes.
Part 2: How is swapper_pg_dir "attached" to a user mode process.
All page tables (whether swapper_pg_dir or those for user processes) have the same mappings for the portion used for kernel virtual addresses. So as the kernel context-switches between user processes, changing the current page table, the mappings for the kernel portion of the address space remain the same.
The kernel address space is mapped into a section of each process; for example, with a 3:1 split it sits above address 0xC0000000. If user code tries to access this address space, it generates a page fault; the region is guarded by the kernel.
The kernel address space is divided into two parts: the logical address space and the virtual address space, with the boundary defined by the constant VMALLOC_START. The CPU uses the MMU all the time, in user space and in kernel space (it can't be switched on and off).
The kernel virtual address space is mapped the same way as user-space mappings. The logical address space is contiguous, and translating it to physical addresses is simple, so it can be done on demand using the MMU fault exception: the kernel tries to access an address, the MMU generates a fault, the fault handler maps the page using the macros __pa and __va and sets the CPU's program counter back to the instruction that faulted, and now everything is OK. This process is actually platform dependent, and on some hardware architectures the logical space is mapped the same way as user space (because the kernel doesn't use a lot of memory).
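For reference, the offset math those macros perform looks roughly like this (a simplified sketch: the real definitions, e.g. in arch/x86/include/asm/page.h, have per-architecture wrinkles such as the separate kernel-text mapping on x86-64):

/* Simplified model of the direct-map ("logical") translation. */
#define PAGE_OFFSET 0xffff888000000000UL   /* base of the direct map on x86-64 */

#define __pa(va) ((unsigned long)(va) - PAGE_OFFSET)            /* virt -> phys */
#define __va(pa) ((void *)((unsigned long)(pa) + PAGE_OFFSET))  /* phys -> virt */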

Find out pages used by mem_map[] array

Recently, I have been working on the ARM Linux kernel, and I need to split the HighMem zone into two parts, so I have added a new zone into the kernel; let's call it "NewMem". Therefore I have three zones in my system: Normal, NewMem, and HighMem. The size of the NewMem zone is 512 MB (131072 pages in total). My goal is to manage all the page frames in the NewMem zone in my own way; currently I use a doubly linked list to allocate/de-allocate pages. Note that the buddy system for the NewMem zone still exists, but I do not use it. To achieve this, I modified the page allocation routine to make sure that the kernel cannot allocate any page frame from my zone.
My concern is whether I can use all the page frames in that zone, since it is suggested that each zone is concerned with a subset of the mem_map[] array. I found that only 131084 pages are free in the NewMem zone. Therefore, some page frames in my zone may be used to store mem_map[], and writing data to these pages may lead to unpredictable errors. So is there any way to find out which page frames are used to store mem_map[], so that I can avoid overwriting them?
You have to check the breakdown of physical and virtual memory. Usually mem_map is stored at the first mappable address of the virtual memory. Typically, the kernel image (usually 8 MiB) is stored at a physical address of 1 MiB, accessed through virtual address PAGE_OFFSET + 0x00100000, and 8 MiB of virtual memory is reserved for the kernel image. Then come the 16 MiB of ZONE_DMA. So the first address which can be used by the kernel for mapping is 0xC1000000, which is supposed to contain the mem_map array.
I am not familiar with the ARM memory breakdown, but from your post it is evident that there is no ZONE_DMA, at least in your case. So your best bet is that address 0xC0800000 stores mem_map. I am assuming that the kernel image is 8 MiB.
As stated above, in general the first mappable virtual address stores mem_map. You can calculate that address from the size and location of the kernel image and of ZONE_DMA (present or not).
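If your kernel uses the flat memory model (a single virtually contiguous mem_map[] in the direct map), you don't have to guess: you can compute the page-frame range backing mem_map itself and have your allocator skip it. A minimal sketch, assuming CONFIG_FLATMEM (pfn_backs_mem_map is a hypothetical helper name):

#include <linux/mm.h>   /* mem_map, max_mapnr, PAGE_SHIFT, __pa() */

/* True if this page frame holds part of the mem_map[] array itself;
 * such frames must never be handed out by the custom allocator. */
static bool pfn_backs_mem_map(unsigned long pfn)
{
    unsigned long first = __pa(mem_map) >> PAGE_SHIFT;
    unsigned long last  = (__pa(mem_map + max_mapnr) + PAGE_SIZE - 1)
                                  >> PAGE_SHIFT;
    return pfn >= first && pfn < last;
}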
Please follow up with your feedback.

Two Identical Linear Addresses of two different Processes?

Hello everyone,
I am a newbie to the Linux kernel, and I'm presently referring to the book Understanding the Linux Kernel. I read the memory management chapters, where paging and segmentation are explained well, but my question is still not answered.
If two different processes have the same linear address, can it refer to two different locations in physical memory? Since each CPU has only one global page directory in use at a time, which is mapped to physical addresses by decoding the 32-bit linear address, how can two processes each have up to 4 GB of memory? Please explain.
Yes, two different processes can both be using the same linear pointer, but it can dereference to two different locations in physical memory. That is because each process has its own page tables, and when switching from one process to another, the CPU's page table register is also switched to point to the page tables of the new process.
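You can even see this from user space with a tiny fork() demo (a minimal sketch, not from the book): parent and child print the identical virtual address for a local variable, yet each observes its own value, so the same linear address must be backed by two different physical frames.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int x = 1;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        x = 2;   /* copy-on-write gives the child its own physical frame */
        printf("child : &x=%p x=%d\n", (void *)&x, x);
    } else {
        wait(NULL);
        printf("parent: &x=%p x=%d\n", (void *)&x, x);
    }
    return 0;
}

Both lines print the same pointer value, but x differs: two page tables, one linear address, two frames.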
Have you cloned your own local copy of the Linux source code yet? If not, go and do it right now. You'll need to refer to it as you read your book.
Cloned now? Good. Go to the cloned working directory and open up arch/x86/include/asm/mmu_context.h. Go down to line 51 and you'll find static inline void switch_mm. This is the function which switches the CPU from the virtual memory space of one process to another. (I'm assuming you are most interested in x86.) Look down now to line 64: load_cr3(next->pgd). That's where the magic happens: the page tables are switched, and now the CPU will interpret all pointers using the new process's page tables.
