Protected-Mode Memory Management
I was going through the segmentation part of this link.
Are the LDT and GDT dependent on each other, or independent?
(The TI bit, which is part of the selector, decides which descriptor table should be used, the GDT or the currently active LDT, so I think they are independent.)
Also, from the figure:
The GDT (Global Descriptor Table) is used mainly for holding descriptor entries of operating system segments.
For example, the kernel stack and the kernel code/data sections?
LDT
The second type is known as the LDT (Local Descriptor Table) and contains entries of normal application segments (although not necessarily).
For example, the user stack and the user code/data sections?
It says the LDTR register contains the size and position of the currently active LDT in memory.
Does that mean that on a context switch we save each process's LDTR value in the PCB of that process?
Brief Version:
What is the status of the addresses not present in the maps file? Do they belong to unallocated virtual pages, or are they allocated from an anonymous file, or something else?
Detailed Version:
I'm learning about VM. In my book (CS:APP), I learned that all virtual pages fall into three sets: unallocated, allocated but not cached, and allocated and cached. I have some questions: what are allocated and unallocated pages, and when are pages allocated? Also, do the stack and heap belong to the allocated pages or the unallocated ones, or are they only allocated when used?
Trying to answer these questions, I read the /proc/$pid/maps file, thinking I could get everything I want from it. In my mind, the file contains all the memory mapping relations. But there is no information about whether a page is cached (I know that may not be visible from user mode...), and are the pages that don't appear unallocated?
Honestly, I don't really know about the maps file. What I do know is that information on every page is stored in page structures at all times. I'm going to take x86-64 as an example.
For x86-64 on Linux you have the Page Global Directory (PGD), the Page Upper Directory (PUD), the Page Middle Directory (PMD) and the page table itself (whose entries are PTEs). The physical base address of the PGD is stored in the CR3 register. The PGD contains the addresses of the PUDs, the PUDs contain the addresses of the PMDs, the PMDs contain the addresses of the page tables, and the page-table entries contain the addresses of the physical pages.
A virtual address is 64 bits wide, of which only the 48 least significant bits are used, and it is split into 5 parts. The 12 least significant bits are the offset within the physical page. The next chunk of 9 bits is the index into the page table, the next chunk is the index into the PMD, and so on. For example, say you have the virtual address 0x0000000000000123. The MMU in the CPU translates it by looking at entry (index) 0 of the PGD, entry 0 of the PUD, entry 0 of the PMD, entry 0 of the page table, and finally offset 0x123 within the actual physical page in RAM.
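As a rough sketch of just that bit layout (userspace C, assuming the 4 KB pages and 9-bit per-level indices described above), the five parts of a virtual address can be extracted like this:

    #include <stdio.h>
    #include <stdint.h>

    /* Split a 48-bit virtual address into the five parts described above:
     * a 12-bit page offset and four 9-bit table indices. */
    int main(void)
    {
        uint64_t vaddr = 0x0000000000000123ULL;

        unsigned page_offset = vaddr & 0xFFF;          /* bits 0-11  */
        unsigned pte_index   = (vaddr >> 12) & 0x1FF;  /* bits 12-20 */
        unsigned pmd_index   = (vaddr >> 21) & 0x1FF;  /* bits 21-29 */
        unsigned pud_index   = (vaddr >> 30) & 0x1FF;  /* bits 30-38 */
        unsigned pgd_index   = (vaddr >> 39) & 0x1FF;  /* bits 39-47 */

        printf("PGD %u, PUD %u, PMD %u, PTE %u, offset 0x%x\n",
               pgd_index, pud_index, pmd_index, pte_index, page_offset);
        return 0;
    }

For 0x123 this prints index 0 at every level and offset 0x123, matching the walk above.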
At boot, the kernel checks how much memory is available and builds its structures accordingly. It marks all pages as unallocated in its own structures (except for the kernel's own needs). The page structure is important here: the kernel keeps a struct page for every physical page in the system (https://linux-kernel-labs.github.io/refs/heads/master/labs/memory_mapping.html and https://elixir.bootlin.com/linux/v4.6/source/include/linux/mm_types.h). This structure tells the kernel whether the page is allocated or not.
Each physical page in the system has a struct page associated with
it to keep track of whatever it is we are using the page for at the
moment. Note that we have no way to track which tasks are using
a page, though if it is a pagecache page, rmap structures can tell us
who is mapping it.
At first the pages are mostly unallocated. When you start a new process by launching an executable as a user of the system, pages are allocated for your process. On Linux, executables are ELF files. ELF is a conventional format which separates code into loadable segments. Each segment is given a virtual address at which it is to be loaded into the virtual address space.
Let's say you have an ELF file with one segment which should be loaded at virtual address 0x400000. When you launch that ELF executable, the Linux kernel calls certain functions which look at the size of the code and allocate pages accordingly. The kernel looks at its structures and determines, using its allocation algorithms, where the process will be placed in RAM. It then sets up the page tables so that the virtual addresses for that process land in the right places in physical memory.
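If you want to see those segments and their load addresses for yourself, you can read the program headers of an ELF file. Here is a minimal sketch (64-bit ELF only, minimal error handling; readelf -l prints the same information):

    #include <elf.h>
    #include <stdio.h>
    #include <string.h>

    /* Print the PT_LOAD segments of a 64-bit ELF file: the virtual
     * address each one should be mapped at, and its size in memory. */
    int main(int argc, char **argv)
    {
        Elf64_Ehdr ehdr;
        Elf64_Phdr phdr;
        FILE *f;
        int i;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <elf-file>\n", argv[0]);
            return 1;
        }
        f = fopen(argv[1], "rb");
        if (!f || fread(&ehdr, sizeof ehdr, 1, f) != 1)
            return 1;
        if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
            ehdr.e_ident[EI_CLASS] != ELFCLASS64)
            return 1; /* not a 64-bit ELF file */

        for (i = 0; i < ehdr.e_phnum; i++) {
            fseek(f, ehdr.e_phoff + (long)i * ehdr.e_phentsize, SEEK_SET);
            if (fread(&phdr, sizeof phdr, 1, f) != 1)
                break;
            if (phdr.p_type == PT_LOAD)
                printf("load at vaddr 0x%llx, memsz 0x%llx\n",
                       (unsigned long long)phdr.p_vaddr,
                       (unsigned long long)phdr.p_memsz);
        }
        fclose(f);
        return 0;
    }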
What's important to understand is that each CPU core in the system runs only one process at a time. Each core's CR3 points at the page tables of the process it is currently running. When a process switch occurs on a core, the page tables are swapped completely: CR3 is reloaded to point somewhere else in RAM. The same virtual address can point anywhere in RAM depending on how the page tables are set up.
The kernel holds a task_struct for every process running in the system. The task_struct has an mm field pointing to an mm_struct, whose pgd field points to the PGD of the process. Each process has its very own PGD. Dereferencing that pointer gives you the first entry of the PGD, which in turn holds the address of a PUD. With this single pointer, the kernel can reach every page table belonging to the process and modify them at will.
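To illustrate, here is a sketch of how kernel code can walk from an mm_struct down to a page-table entry. It uses the older 4-level helpers (before the p4d level was added); the exact helper names and signatures vary between kernel versions, so treat it as a sketch rather than a drop-in function:

    #include <linux/mm.h>
    #include <asm/pgtable.h>

    /* Walk the page tables of mm to find the PTE mapping addr,
     * or NULL if the intermediate tables are not populated. */
    static pte_t *walk(struct mm_struct *mm, unsigned long addr)
    {
        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;

        pgd = pgd_offset(mm, addr);          /* entry in the process's PGD */
        if (pgd_none(*pgd) || pgd_bad(*pgd))
            return NULL;

        pud = pud_offset(pgd, addr);         /* entry in the PUD */
        if (pud_none(*pud) || pud_bad(*pud))
            return NULL;

        pmd = pmd_offset(pud, addr);         /* entry in the PMD */
        if (pmd_none(*pmd) || pmd_bad(*pmd))
            return NULL;

        return pte_offset_map(pmd, addr);    /* entry in the page table */
    }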
While a process is running, it can ask for more memory. This is called dynamic memory allocation. The kernel has no way of knowing in advance how much memory the process is going to ask for, since the requests happen while code is executing. When the process asks for more memory, the kernel determines, according to an algorithm, which page to give to the process, and then marks that page as allocated to it. The mm_struct mentioned above (https://manybutfinite.com/post/how-the-kernel-manages-your-memory/) is the memory descriptor for the process, so that the kernel knows what memory the process is using. The process itself doesn't need that information: it should simply ask the operating system for memory properly and not jump somewhere in RAM where it doesn't belong.
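From userspace, such a dynamic request can be made directly with the mmap system call (malloc and C++'s new are built on top of brk/mmap). A minimal sketch:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Ask the kernel for one anonymous page. The kernel picks a
         * free virtual range, records it in the process's memory
         * descriptor, and fills in page-table entries as needed. */
        size_t len = 4096;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        printf("kernel gave us virtual address %p\n", p);
        memset(p, 0, len);   /* the first touch may trigger a page fault */
        munmap(p, len);
        return 0;
    }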
You ask about the heap and the stack. The stack for a process is allocated when the process starts, and it has a fixed maximum size (on Linux the kernel grows it on demand up to RLIMIT_STACK). If you overflow the stack, the CPU raises an exception, which prompts the kernel to kill your process. Each CPU core has a special register called RSP, the stack pointer. It points to the top of the stack (the stack grows downward, toward low memory). When the kernel allocates a stack for a process you launch, it sets this register to point at the top of it. The stack pointer contains a virtual address, so it is translated through the page tables just like any other address.
The heap has no special register like the stack, and the memory backing it is handed out by the OS only when the process asks for more memory during execution. The kernel knows in advance how much static memory a process requires: it is all written in the ELF executable, since static memory is laid out at compile/link time. The only moment the kernel needs to allocate new memory to a process is when the process actually asks for it. In C++ you use the keyword new to ask for heap memory dynamically. If you don't allocate dynamically, the kernel knows in advance where your variables will be in memory: they live either in the statically laid-out sections or on the stack.
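To make the three kinds of memory concrete, here is a small C sketch (the text mentions C++'s new; malloc is the C equivalent):

    #include <stdio.h>
    #include <stdlib.h>

    int global_counter;      /* static memory: size and placement are known
                                from the ELF file before the process runs */

    int main(void)
    {
        int local = 42;      /* stack memory: lives in the stack the kernel
                                set up when the process started */
        int *heap = malloc(sizeof *heap);  /* heap memory: requested from
                                              the allocator (and ultimately
                                              the kernel) only at run time */
        if (!heap)
            return 1;
        *heap = local + global_counter;

        printf("static %p, stack %p, heap %p\n",
               (void *)&global_counter, (void *)&local, (void *)heap);
        free(heap);
        return 0;
    }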
Recently I read a book called Understanding the Linux Kernel. There is a sentence that confused me a lot. Can anybody explain it to me?
As stated earlier, the Current Privilege Level of the CPU indicates
whether the processor is in User or Kernel Mode and is specified by
the RPL field of the Segment Selector stored in the cs register.
Whenever the CPL is changed, some segmentation registers must be
correspondingly updated. For instance, when the CPL is equal to 3
(User Mode), the ds register must contain the Segment Selector of the
user data segment, but when the CPL is equal to 0, the ds register must contain the Segment Selector of the kernel data segment.
A similar situation occurs for the ss register. It must refer to a
User Mode stack inside the user data segment when the CPL is 3, and it
must refer to a Kernel Mode stack inside the kernel data segment when
the CPL is 0. When switching from User Mode to Kernel Mode, Linux
always makes sure that the ss register contains the Segment Selector
of the kernel data segment.
When saving a pointer to an instruction or to a data structure, the
kernel does not need to store the Segment selector component of the
logical address, because the ss register contains the current Segment
Selector.
As an example, when the kernel invokes a function, it executes a call
assembly language instruction specifying just the Offset component of
its logical address; the Segment Selector is implicitly selected as
the one referred to by the cs register. Because there is just one
segment of type “executable in Kernel Mode,” namely the code segment
identified by __KERNEL_CS, it is sufficient to load __KERNEL_CS into
cs whenever the CPU switches to Kernel Mode. The same argument goes
for pointers to kernel data structures (implicitly using the ds
register), as well as for pointers to user data structures (the kernel
explicitly uses the es register).
My understanding is that the ss register contains the Segment Selector pointing to the base of the stack. Does the ss register have anything to do with a pointer to an instruction or to a data structure? If it doesn't, why mention it here?
Finally I have worked out the meaning of that paragraph. This description actually demonstrates how segmentation works in Linux, and it has an implicit object of comparison: systems that exploit segmentation but not paging.

How do those systems work? Each process has different segment selectors in its logical addresses, pointing to different entries in the global descriptor table, and the segments do not necessarily share the same base. In that case, when you save a pointer to an instruction or to a data structure, you really have to take care of its segment base. Notice that a logical address has a 16-bit segment selector and a 32-bit offset. If you save only the offset, it is not possible to find that pointer again, because there are lots of different segments in the GDT.

Things are different in Linux: all segment selectors have the same base, 0. That means a pointer's offset alone is enough to pick it out of memory. You might ask: does that still work when there are lots of processes running? It does! Remember that each process has its own page table, which has the magical power of mapping the same addresses to different physical addresses. Thanks to all of you who cared about this question!
My understanding is that the ss register contains the Segment Selector
pointing to the base of the stack.
Right
Does the ss register have anything to do with a pointer to an instruction or to a data structure?
No, the ss register does not have anything to do with instructions that affect the data segment.
If it doesn't, why mention it here?
Because the ss register influences the result of instructions that affect the stack (e.g. pop, push, etc.).
They are just explaining that Linux also maintains two stack segments (one for user mode and one for kernel mode), as well as two data segments (one for user mode and one for kernel mode).
As with the data segment: if the ss register were not updated when switching from user mode to kernel mode, it would still point to the user stack, and the kernel would work with the user stack (which would be very bad, right?). So the kernel takes care of updating the ss register as well as the ds register.
NB:
Let's recall that an instruction may access/modify memory in the data segment (e.g. a mov to a memory address) as well as in the stack segment (pop, push, etc.).
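If you are curious what these selectors look like on your own machine, you can read them from userspace with a bit of inline assembly (GCC syntax, x86 only; the values depend on your kernel and on 32- vs 64-bit mode):

    #include <stdio.h>

    int main(void)
    {
        unsigned short cs, ds, ss;

        /* Reading a segment register is allowed even at CPL 3. */
        __asm__("mov %%cs, %0" : "=r"(cs));
        __asm__("mov %%ds, %0" : "=r"(ds));
        __asm__("mov %%ss, %0" : "=r"(ss));

        printf("cs=0x%x ds=0x%x ss=0x%x\n", cs, ds, ss);
        return 0;
    }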
Understanding the Linux Kernel (https://www.amazon.in/Understanding-Linux-Kernel-Process-Management-ebook/dp/B0043D2E54) mentions the following:
As stated earlier, the Current Privilege Level of the CPU indicates whether the processor is in User or Kernel Mode and is specified by the RPL field of the Segment Selector stored in the cs register. Whenever the CPL is changed, some segmentation registers must be correspondingly updated. For instance, when the CPL is equal to 3 (User Mode), the ds register must contain the Segment Selector of the user data segment, but when the CPL is equal to 0, the ds register must contain the Segment Selector of the kernel data segment.
A similar situation occurs for the ss register. It must refer to a User Mode stack inside the user data segment when the CPL is 3, and it must refer to a Kernel Mode stack inside the kernel data segment when the CPL is 0. When switching from User Mode to Kernel Mode, Linux always makes sure that the ss register contains the Segment Selector of the kernel data segment.
Based on the above, I have a few questions:
1) What are the RPL fields in the Segment Selectors stored in the other segmentation registers used for?
2) When a system call is executing on behalf of a user process, the RPL in cs will be set to 3 (Difference between DPL and RPL in x86). In this case, will the data segment register (ds) contain __USER_DS instead of __KERNEL_DS, and if so, how can the implementation of the system call access kernel data structures etc.?
Ref. Linux kernel ARM Translation table base (TTB0 and TTB1)
I have a further doubt/query on the topic discussed in the previous link:
0 to 0xbfffffff is the lower part of memory (for user processes), managed by the page table in TTB0; it contains the page table of the current process.
Ref. arch/arm/include/asm/pgtable-2level.h: PTRS_PER_PGD = 2048, PTRS_PER_PMD = 1, PTRS_PER_PTE = 512
0xc0000000 to 0xffffffff is the upper part of the address space (OS and memory-mapped I/O), managed/translated by the page table in TTBR1.
The TTB1 table is fixed in size and alignment (16 KB). Each level-1 entry is 32 bits in size and represents a 1 MB page/segment. This is the swapper_pg_dir page table (see System.map), which is placed 16 KB below the actual text address.
Is it the case that the first 768 entries in swapper_pg_dir are 0 (0x0 to 0xbfffffff, for user processes) and the valid entries run from 768 to 1024 (0xc0000000 to 0xffffffff, for the OS and memory-mapped I/O)?
Would anyone like to share some sample code in kernel space (a kernel module) to browse this swapper_pg_dir PGD?
Because of how the ARM MMU was designed, both translation tables (TTB0 and TTB1) can only be used with a 1:1-mapped kernel mapping.
Most Linux Kernels have a 3:1 mapping (3GB User space : 1GB Kernel space for ARM).
This means that 0-0xBFFFFFFF is user space while 0xC0000000 - 0xFFFFFFFF is kernel space.
Now for the hardware memory translations, only TTBR0 is used. TTBR1 only holds the address of the initial swapper page table (which contains all the kernel mappings) and isn't really used for virtual address translation. TTBR0 holds the address of the page directory currently in use (the page table the hardware is using for translations). Each user process has its own page tables, and on each process switch TTBR0 is changed to point to the current process's page table (they are all located in kernel space).
For example, for each new user process the kernel creates a new page directory, copies all the kernel mappings from the swapper page table (page frames from 3-4 GB) into the new page table, and clears the user entries (page frames from 0-3 GB). It then sets TTB0 to the base address of this page directory and flushes the caches to install the new address space. The swapper page table is also always kept up to date with changes to the mappings.
For your question:
Simplified, hardware-wise the first-level page table has 4096 entries. Each entry represents 1 MB of virtual address space, totalling 4 GB. Entries 0-3071 represent user space and entries 3072-4095 represent kernel space.
The swapper page table is usually located at addresses 0xc0004000 - 0xc0008000 (4096 entries * 4 bytes per entry = 16384 bytes = 16 KB = 0x4000). By examining the memory at 0xc0004000-0xc0007000 you can find the entries for user space (empty), and from 0xc0007000-0xc0008000 you can find the kernel entries. I use gdb with the command x /100x 0xc0007000 in order to examine the first 100 kernel entries. You can then consult the technical reference manual for your platform in order to decipher the page-table attributes.
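Since the question asked for sample code: a minimal kernel-module sketch along these lines might look as follows. It assumes a classic 32-bit ARM kernel where swapper_pg_dir sits at the 0xc0004000 address quoted above (check your own System.map; this is illustrative, not portable):

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>

    static int __init pgd_dump_init(void)
    {
        /* Assumed address of swapper_pg_dir, taken from System.map as
         * discussed above. Kernel space starts at hardware entry 3072
         * (0xc0000000 >> 20). */
        unsigned long *pgd = (unsigned long *)0xc0004000;
        int i;

        for (i = 3072; i < 3072 + 16; i++)
            pr_info("swapper_pg_dir[%d] = 0x%08lx\n", i, pgd[i]);

        return 0;
    }

    static void __exit pgd_dump_exit(void)
    {
    }

    module_init(pgd_dump_init);
    module_exit(pgd_dump_exit);
    MODULE_LICENSE("GPL");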
If you want to learn more about the Linux kernel, I recommend using Qemu to simulate the BeagleBoard, together with gdb, to examine and debug the source code. I did this to learn how the kernel builds its page tables during initialization.
I am going through Linux notes from one of the training institutes here.
As per those notes, whenever a process is created, a region is allocated to it.
A region contains all the segments for the process.
A region is also described by a region table, which contains entries of the form:
virtual address -> physical address pointer + disk block descriptor
The disk block descriptor points to the swap area or the executable file on disk.
I have two doubts:
1> What role do the Global and Local Descriptor Tables play here?
http://www.google.co.in/imgres?um=1&hl=en&sa=N&tbo=d&biw=1366&bih=677&tbm=isch&tbnid=GSUGxm8x4QWQ1M:&imgrefurl=http://iakovlev.org/index.html%3Fp%3D945&docid=8Y36SIxwT17J6M&imgurl=http://iakovlev.org/images/intel/31.jpg&w=1534&h=1074&ei=oBX8UKuwBoHsrAer8YHQAw&zoom=1&iact=hc&vpx=79&vpy=377&dur=609&hovh=188&hovw=268&tx=150&ty=107&sig=103468883298920883665&page=1&tbnh=155&tbnw=221&start=0&ndsp=27&ved=1t:429,r:14,s:0,i:124
2> Does each process have its own Global Descriptor Table?
What I think is yes, because otherwise two processes' virtual addresses would point to the same physical address.
Please suggest.
1) The Global Descriptor Table gives the base address for the linear address. It is nearly always zero, and the "Limit" is set to "all ones" (that is, all addressable memory). In effect, the segment selectors are not really used: the architecture requires them to be present and loaded, but the relocation and protection they could provide is not used by Linux.
The Local Descriptor Table works exactly the same way, except that there is an LDT per process, and a process can modify its own LDT (via the modify_ldt system call) while it can't modify the GDT. When an LDT is used, its segments still have a base address of zero.
To tell whether a selector refers to the GDT or the LDT, look at the TI bit, bit 2 (the one worth 4): if it is set, the selector comes from the currently active LDT, otherwise from the GDT. For example, on my x86-64 system ss has the value 0x2b and cs has the value 0x33; both have bit 2 clear, so both come from the GDT.
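Here is a small sketch that decodes a selector value into its index, TI and RPL fields (the 0x2b and 0x33 values are just examples from one x86-64 system):

    #include <stdio.h>

    /* An x86 segment selector: bits 15-3 are the descriptor-table index,
     * bit 2 is TI (0 = GDT, 1 = LDT), bits 1-0 are the RPL. */
    static void decode(unsigned short sel)
    {
        printf("0x%04x: index=%u table=%s rpl=%u\n",
               sel, sel >> 3, (sel & 4) ? "LDT" : "GDT", sel & 3);
    }

    int main(void)
    {
        decode(0x2b);  /* typical x86-64 user ss: GDT index 5, RPL 3 */
        decode(0x33);  /* typical x86-64 user cs: GDT index 6, RPL 3 */
        return 0;
    }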
2) No. There is one GDT (per CPU core, to be precise) - that's why it's called "global": there is one for everything. Segments that must be per-process go in the LDT instead, because there is one LDT per process.