High memory mappings in kernel virtual address space - linux

The linear address beyond 896MB correspond to High memory region ZONE_HIGHMEM.
So the page allocator functions will not work on this region, since they give the linear address of directly mapped page frames in ZONE_NORMAL and ZONE_DMA.
I am confused about these lines specified in Undertanding linux Kernel:
What do they mean when they say "In 64 bit hardware platforms ZONE_HIGHMEM is always empty."
What does this highlighted statement mean: "The allocation of high-memory page frames is done only through alloc_pages() function. These functions do not return linear address since they do not exist. Instead the functions return linear address of the page descriptor of the first allocated page frame. These linear addresses always exist, because all page descriptors are allocated in low memory once and forever during kernel initialization."
What are these Page descriptors and does the 896MB already have all page descriptors of entire RAM.

The x86-32 kernel needs high memory to access more than 1G of physical memory, as it is impossible to permanently map more than 2^{32} addresses within a 32-bit address space and the kernel/user split is 1G/3G.
The x86-64 kernel has no such limitation, as the amount of physically-addressable memory (currently 256T) fits within its 64-bit address space and thus may always be permanently mapped.
High memory is a hack. Ideally you don't need it. Indeed, the point of x86-64 is to be able to directly address all the memory you could possibly want. Taken
from https://www.quora.com/Linux-Kernel/What-is-the-difference-between-high-memory-and-normal-memory
I think page descriptor means struct page. And considering the sizeof struct page. Yes all of them can be stored in ZONE_NORMAL

Related

How to test if address is virtual or logical in linux kernel?

I am familiar that the Linux kernel memory is usually 1:1 mapped (up to a certain limit of the zone). From what I understood is that to make this 1:1 mapping more efficient the array of struct page is virtually mapped.
I wanted to check if that is the case. Is there a way to test if - given an address (lets say that of a struct page) check if it is 1:1 mapped or virtually mapped?
The notion of address space for a 64-bit machine emcompasses 2^64 addresses. This is far larger than any modern amount of physical memory in one machine. Therefore, it is possible to have the entire physical memory mapped into the address space with plenty of room to spare. As discussed in this post and shown here, Linux leaves 64 TB of the address space for the physical mapping. Therefore, if the kernel needed to iterate through all bytes in physical memory, it could just iterate through addresses 0+offset to total_bytes_of_RAM + offset, where offset is the address where the direct mapping starts (ffff888000000000 in the 64 bit memory layout linked above). Also, this direct mapping region is within the kernel address range that is "shared between all processes" so addresses in this range should always be logical.
Your post has two questions: one is how to test if an address is logical or virtual. As I mentioned, the answer is if the address falls within the direct mapping range, then it is logical. Otherwise it is virtual. If it is a virtual address, then obtaining the physical address through the page tables should allow you to access the address logically by following the physical_addr + offset math as mentioned above.
Additionally, kmalloc allocates/reserves memory directly using this logical mapping, so you immediately know that if the address you're using came from kmalloc, it is a logical address. However, vmalloc and any user-space memory allocations use virtual addresses that must be translated to get the logical equivalent.
Your second question is whether "logically mapped pages" can be swapped out. The question should be rephrased because technically all pages that are in RAM are logically mapped in that direct mapping region. And yes certain pages in main memory can be swapped out or kicked out to be used by another page in the same page frame. Now, if you're asking whether pages that are only mapped logically and not virtually (like with kmalloc, which gets memory from slab) can be swapped out, I think the answer is that they can be reclaimed if not being used, but aren't generally swapped out. Kernel pages are generally not swapped out, except for hibernation.

Virtual Address Space

I have started to learn about Virtual Address Space (VAS) and I have few questions:
How much of VAS is created for each process depending on the architecture (32-bit and 64-bit)?
Is VAS for each process created on hard disk? If so, what happens if there is not enough space?
What is the difference between VAS and Virtual Memory (VM)?
Virtual address versus physical address
During the execution of your program, the variables (integers, arrays, strings, etc.) are stored somewhere in the main memory of your computer (RAM). Some programming languages (like C or C++) allow you to obtain the memory address at which a given variable is stored (with the & operator), and to manipulate that address (add to it, subtract from it, print it, etc.).
Here is a C program that prints the memory address of a variable:
#include <stdio.h>
int main(void) {
int variable = 1234;
void *address = &variable;
printf("Memory address of variable: %p\n", address);
return 0;
}
Output:
Memory address of variable: 0x7ffc9e9662a4
Now, if you compile and execute this program on a typical desktop computer, with a typical operating system (like GNU/Linux or Windows), the memory address that is printed by this program is not the hardware address at which the data 1234 is actually located in the memory chip. This may be surprising, but there is a level of indirection between the addresses used by your program and the hardware addresses.
Virtual address space on 64-bit computers
On a 64-bit computer, a memory address manipulated by your program is an integer between 0 and 18446744073709551615 inclusive. Such an address is called a virtual memory address. The range of those addresses is called the virtual address space of the process. You can ask the operating system to map a range of virtual memory addresses to the physical memory of your computer, so that when you try to read or write bytes at thoses addresses, your program doesn't crash for accessing unmapped virtual memory addresses.
Typically, on x86-64 computers, only 248 virtual memory addresses can be successfully mapped to physical memory, because 256 TiB of usable virtual address space is considered sufficient. In the future, processor manufacturers may raise or remove this limit if there is a need for it.
Virtual address space on 32-bit computers
On 32-bit computers, there are 232 virtual memory addresses. On those computers, a memory address manipulated by your program is an integer between 0 and 4294967295 inclusive.
On x86 32-bit computers, there is usually no restriction on the range of virtual memory addresses that can be mapped to physical memory addresses.
Mapping a range of virtual memory addresses
On GNU/Linux, you can request a mapping by calling the function mmap(). On Windows, you can request a mapping by calling the function VirtualAlloc(). Those functions take the size of the mapping as argument, and return the first virtual address that is now backed by actual physical memory. Those functions can fail to create a new mapping if the physical memory is already completely used by other processes. And again, if you try to access (read or write) the content of a virtual memory address that is outside an area mapped by mmap() or VirtualAlloc(), the operating system will terminate your program (by sending a segmentation fault signal).
On GNU/Linux, a process can examine the mappings created in its virtual address space just by reading the file /proc/self/maps. You can learn a lot by reading the output of the command cat /proc/self/maps.
Hard disk drive
On a typical computer, the main memory is a semiconductor memory, and the hard disk drive is only a secondary storage device.
On a typical operating system, a range of virtual memory addresses can only be mapped to the main memory (which is usually a semiconductor memory device). Such a range cannot be directly mapped to a secondary storage device (usually a hard disk drive) without using the main memory as intermediary.
On an n-bit machine, the VAS is 2n bytes large. So, on a 32-bit machine, the VAS 232 = 4 GiB large.
Virtual memory is not created on disk. In fact, the existence of a disk is not needed for implementing virtual memory. Most implementations of virtual memory are paged. So, when a 4 GiB VAS is created, only the pages that are needed are mapped into that VAS. For example, suppose a process only uses 16 pages of memory on a 32-bit system with 4k-sized pages. Despite having a 4 GiB VAS, only 16 * 4k = 216 bytes of memory are mapped into the VAS. The rest of the memory is unmapped. If the CPU tries to access this unmapped memory, a segmentation fault will occur. If a process wants to map memory at this address, then (in a POSIX-complaint OS) it can request the mapping from the OS using mmap(2). This will make a lot more sense once you learn about page tables.
Virtual memory is a concept. A virtual address space is an entity that stems from the concept of virtual memory. These terms go hand in hand, but refer to different things.
I will list a couple of caveats.
Caveat 1.1
I am not not aware of any 64-bit processor that truly supports a 64-bit VAS. The addresses themselves are 64 bits wide, but a certain number of upper bits are ignored. AMD's first implementation of x86_64 only supported 48-bit addresses. The upper 16 bits of an address were effectively ignored. In such a system, the addresses are 64 bits wide, but the real size of the VAS is limited to 248 bytes. Subsequent architectures supported 56-bit addresses.
Caveat 1.2
If a processor supports PAE, then the VAS on an n-bit machine may be larger than 2n bytes. This is how 32-bit processors can support VASs larger than 4 GiB.
Caveat 2.1
Not really a caveat, but this is related to your question. You asked what happens when there isn't enough space on disk to create a VAS. As I mentioned in the main answer, the VAS is not created on disk. However, any computer only has a finite amount of physical memory. What happens when a process requests a page be mapped, but there is no physical memory available? There are there several ways to handle this:
Swapping is done by temporarily moving a page that is mapped in virtual memory to disk. The entire contents of the page are copied to disk. Then, the process that requested the page has the physical page mapped into their memory. Eventually, the old page may be requested. If this occurs, then the OS copies the page from disk and remaps it into the corresponding VAS. This is what Linux and most modern operating systems do.
The process is simply told there is no memory available, for example, through an error number like ENOMEM.
The process is blocked until memory is available. I haven't seen this in practice.
Swapping implies the use of a disk, but virtual memory does not imply the use of swapping, hence a disk is not necessary for virtual memory.
Virtual Address Space - wikipedia
When a new application on a 32-bit OS is executed, the process has a 4 GiB VAS: each one of the memory addresses (from 0 to 232 − 1) in that space can have a single byte as a value. Initially, none of them have values.
For n-bit OS, these n-address lines allow address space upto 2n addresses, i.e., 0 to 2n - 1. This would mean 16 EiB for 64-bit OS. (Though in actual implementations, less space is used as this much space is unnecessary.)
CPU Cache - wikipedia
Most general purpose CPUs implement some form of virtual memory. To summarize, either each program running on the machine sees its own simplified address space, which contains code and data for that program only, or all programs run in a common virtual address space. A program executes by calculating, comparing, reading and writing to addresses of its virtual address space, rather than addresses of physical address space, making programs simpler and thus easier to write.
For example, in C++, program memory is divided in stack, heap, data, code. I'm not sure if analogy is correct (may be), but it somewhat presents an insight if you're aware.
Virtual memory - wikipedia
In computing, virtual memory is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine"3 which "creates the illusion to users of a very large (main) memory".[4]
The computer's operating system, using a combination of hardware and software, maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage, as seen by a process or task, appears as a contiguous address space or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory.
Address translation hardware in the CPU, often referred to as a memory management unit (MMU), automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer.
If you know about computer architecture (which I'm sure you do from the question), it'd be clarified by now.
Still, for anyone in general, I'm giving a bit of explanation.
Assume addresses as pointers in C++. If you don't know C++, closest analogy would be array/list indices in any language. Now the addresses point to the memory locations, just like pointers point to the variable. The actual data is stored in the variable. To get the variable data using pointer/index, you provide address location from where the data is to be extracted. Now in physical memory, there won't be a thing like a variable. There is memory and it's location address through which it is accessed.
The real memory is physical memory, which is the hard disks. It is accessed with physical addresses, which would be unique for each byte.
Accessing physical memory directly with physical addresses would be cumbersome. Thus the addresses are simplified by the OS to virtual addresses. These addresses may or may not be unique (these aren't physical addresses, remember). Thus, multiple virtual addresses may point to same location.
The virtual memory is not actually existent, rather it's just a concept of physical memory simplified using virtual addresses to give the user an illusionous space say where next memory location is stored at next address (virtual address to be precise).
Since multiple virtual addresses can be mapped, by using MMU, to same physical address, and thus to point to same phyical memory location, the virtual memory size can be made to exceed the physical memory size (virtually). But effectively, the memory size would still be same as physical.
Thus, to access a memory data, Virtual Addresses are specified by user/program to OS, which are converted to Physical Addresses by memory management unit (mmu) and then applied to the address lines of the computer architecture (electronics spotted!!), which yields the data at the corresponding physical location. And this concept is called Virtual Memory.
-Himanshu

Why __GFP_HIGHMEM flag can't be applied to the __get_free_page() or kmalloc()

I want to know basically the two things
How does the kmalloc works i mean which function kmalloc calls to allocate memory is it alloc_pages() or __ger_free_pages().
Why Why __GFP_HIGHMEM flag can't be applied to the __get_free_page() or kmalloc()
I got the folowing extract from the LKD Robert Love can any body better explain that what is exact probelm with the alloc_pages() while giving __GFP_HIGHMEM flag.
Page # 240 CHAPTER 12
You cannot specify __GFP_HIGHMEM to either __get_free_pages() or
kmalloc(). Because these both return a logical address, and not a page
structure, it is possible that these functions would allocate memory
not currently mapped in the kernel’s virtual address space and, thus,
does not have a logical address. Only alloc_pages() can allocate high
memory.The majority of your allocations, however, will not specify a
zone modifier because ZONE_NORMAL is sufficient.
As explained in the book Linux Device Drivers 3rd edition (freely available here), "the Linux kernel knows about a minimum of three memory zones: DMA-capable memory, normal memory, and high memory". The __GFP_HIGHMEM flag indicates that "the allocated memory may be located in high memory". This flag has a platform-dependent role, although its usage is valid on all platforms.
Now, as explained here, "high Memory is the part of physical memory in a computer which is not directly mapped by the page tables of its operating system kernel". This zone of memory is not mapped in the kernel's virtual address space, and this prevents the kernel from being capable of directly referring it. Unfortunately, the memory used for kernel-mode data structures must be direct-mapped in the kernel, and therefore cannot be in the HIGHMEM zone.

kmalloc() functionality in linux kernel

I did come across through the LDD book that using the kmalloc we can allocate from high memory. I have one basic question here.
1)But to my knowledge we can't access the high memory directly from the kernel (unless it is mapped to the kernel space through the kmap()). And i didn't see any mapping area reserved for kmalloc(), But for vmalloc() it is present.So, to which part of the kernel address does the kmalloc() will map if allocated from high memory?
This is on x86 architecture,32bit system.
My knowledge may be out of date but the stack is something like this:
kmalloc allocates physically contiguous memory by calling get_free_pages (this is what the acronym GFP stands for). The GFP_* flags passed to kmalloc end up in get_free_pages, which is the page allocator.
Since special handling is required for highmem pages, you won't get them unless you add the GFP_HIGHMEM flag to the request.
All memory in Linux is virtual (a generalization that is not exactly true and that is architecture-dependent, but let's go with it, until the next parenthesized statement in this paragraph). There is a range of memory, however, that is not subject to the virtualization in the sense of remapping of pages: it is just a linear mapping between virtual addresses and physical addresses. The memory allocated by get_free_pages is linearly mapped, except for high memory. (On some architectures, linear mappings are supported for memory ranges without the use of a MMU: it's just a simple arithmetic translation of logical addresses to physical: add a displacement. On other architectures, linear mappings are done with the MMU.)
Anyway, if you call get_free_pages (directly, or via kmalloc) to allocate two or more pages, it has to find physically contiguous ones.
Now virtual memory is also implemented on top of get_free_pages, because we can take a page allocated that way, and install it into a virtual address space.
This is how mmap works and everything else for user space. When a piece of virtual memory is committed (becomes backed by a physical page, on a page fault or whatever), a page comes from get_free_pages. Unless that page is highmem, it has a linear mapping so that it is visible in the kernel. Additionally, it is wired into the virtual address space for which the request is being made. Some kernel data structures keep track of this, and of course it is punched into the page tables so the MMU makes it happen.
vmalloc is similar in principle to mmap, but far simpler because it doesn't deal with multiple backends (devices in the filesystem with a mmap virtual function) and doesn't deal with issues like coalescing and splitting of mappings that mmap allows. The vmalloc area consists of a reserved range of virtual addresses visible only to the kernel (whose base address and is architecture dependent and can be tweaked by you at kernel compile time). The vmalloc allocator carves out this virtual space, and populates it with pages from get_free_pages. These need not be contiguous, and so can be obtained one at a time, and wired into the allocated virtual space.
Highmem pages are physical memory that is not addressable in the kernel's linear map representing physical memory. Highmem exists because the kernel's linear "window" on physical memory isn't always large enough to cover all of memory. (E.g. suppose you have a 1GB window, but 4GB of RAM.) So, for coverage of all of memory, there is in addition to the linear map, some smaller "non linear" map where pages they are selectively made visible, on a temporary basis using kmap and kunmap. Placement of a page into this view is considered the acquisition of a precious resource that must be used sparingly and released as soon as possible.
A highmem page can be installed into a virtual memory map just like any other page, and no special "highmem" handling is needed for that view of the page. Any map: that of a process, or the vmalloc range.
If you're dealing with some virtual memory that could be a mixture of highmem and non-highmem pages, which you have to view through the kernel's linear space, you have to be prepared to use the mapping functions.

why do we need zone_highmem on x86?

In linux kernel, mem_map is the array which holds all "struct page" descriptors. Those pages includes the 128MiB memory in lowmem for dynamically mapping highmem.
Since the lowmem size is 1GiB, so the mem_map array has only 1GiB/4KiB=256KiB entries. If each entry size is 32 byte, then the mem_map memory size = 8MiB. But if we could use mem_map to map all 4GiB physical memory(if we have so much physical memory available on x86-32), then the mem_map array would occupy 32MiB, that is not a lot of kernel memory(or am i wrong?).
So my question is: why do we need to use that 128MiB in low for indirect highmem mapping in the first place? Or put another way, why not to map all those max 4GiB physical memory(if available) in the kernel space directly?
Note: if my understanding of the kernel source above is wrong, please correct. Thanks!
Look Here: http://www.xml.com/ldd/chapter/book/ch13.html
Kernel low memory is the 'real' memory map, addressed with 32-bit pointers on x86.
Kernel high memory is the 'virtual' memory map, addressed with virtual structures on x86.
You don't want to map it all into the kernel address space, because you can't always address all of it, and you need most of your memory for virtual memory segments (virtual, page-mapped process space.)
At least, that's how I read it. Wow, that's a complicated question you asked.
To throw more confusion, chapter 13 talks about some PCI devices not being able to address the 32-bit space, which was the genesis of my previous comment:
On x86, some kernel memory usage is limited to the first Gigabyte of memory bacause of DMA addressing concerns. I'm not 100% familiar with the topic, but there's a comapatibility mode for DMA on the PCI bus. That may be what you are looking at.
3.6 GB is not the ceiling when using physical address extension, which is commonly needed on most modern x86 boards, especially with memory hotplug.
Or put another way, why not to map all those max 4GiB physical
memory(if available) in the kernel space directly?
One reason is userspace: every usespace process have its own virtual address space. Suppose you have 4Gb of RAM on x86. So if we suggest that kernel owns 1Gb of memory (~800 directly mapped + ~200 vmalloc) all other ~3Gb should be dynamically distributed between processes spinning in user space. So how can you map your 4Gbs directly when you have a several address spaces?
why do we need zone_highmem on x86?
The reason is the same. Kernel reserves only ~800Mb for low mem. All other memory will be allocated and connected with particular virtual address only on demand. For example if you will execute a binary a new virtual address space will be created and some pages will be allocated for storing your binary code and data (heap ,stack ...). So the key attribute of high mem is to serve dynamic memory allocation requests, you never know in advance what will be triggered by userspace...

Resources