Linux 3/1 virtual address split - linux

I am missing something when it comes to understanding the need for highmem to address more than 1GB of RAM. Could someone point out where I go wrong? Thanks!
What I know:
1 GB of a processes' virtual memory (high memory region) is reserved for kernel operations. The user space can use the remaining 3 GB. This is the 3/1 split.
The virtual memory features of the VM map the (continuous) virtual memory pages to physical pages (RAM).
What I don't know:
What operations use the kernel virtual memory? I suppose things like kmalloc(...) in kernel-space would use kernel virtual memory.
I would think that 4GB of RAM could be used under this scheme. I don't get why the kernel 1 GB virtual space is the limiting factor when addressing physical space. This is where my understanding breaks down. Please advise.
I've been reading this (http://kerneltrap.org/node/2450), which is great. But it doesn't quite address my question to my liking.

The reason that kernel virtual space is a limiting factor on useable physical memory is because the kernel needs access to all physical memory, and the way it accesses physical memory is through kernel virtual addresses. The kernel doesn't use special instructions that allow direct access to physical memory locations - it has to set up page table entries for any physical ranges that it wants to talk to.
In the "old style" scheme, the kernel set things up so that every process's page tables mapped virtual addresses from 0xC0000000 to 0xFFFFFFFF directly to physical addresses from 0x00000000 to 0x3FFFFFFF (these pages were marked so that they were only accessible in ring 0 - kernel mode). These are the "kernel virtual addresses". Under this scheme, the kernel could directly read and write any physical memory location without having to fiddle with the MMU to change the mappings.
Under the HIGHMEM scheme, the mappings from kernel virtual addresses to physical addresses aren't fixed - parts of physical memory are mapped in and out of the kernel virtual address space as the kernel needs access to that memory. This allows more physical memory to be used, but at the cost of having to constantly change the virtual-to-physical mappings, which is quite an expensive operation.

Mapping 1 GB to kernel in each process allows processes to switch to kernel mode without also performing a context switch. Responses to system calls such as read(), mmap() and others can then be appropriately processed in the calling process' address space.
If space for the kernel were not reserved in each process, switching to "kernel mode" in between executing user space code would be more expensive, and be unable to use virtual address mapping through the hardware MMU (memory management unit) for the system calls being serviced.
Systems running a 32bit kernel with more than 1GB of physical memory, are able to assign physical memory locations in ZONE_HIGHMEM (roughly above the 1GB mark), which can require the kernel to jump through hoops for certain operations to interact with them. The addition of PAE (physical address extension), extends this problem by allowing upto 64GB of physical memory, decreasing the ratio of memory within the 1GB physical address memory, to regions allocated in ZONE_HIGHMEM.

For example the system calls use the kernel space.
You can have 64GB of physical ram, but on 32-bit platforms processors can access only 4gb because of the 32-bit virtual addressing. Actually, you can have 1GB of RAM and 3GB of swap and virtual addressing will make it look like you have 4GB. On 64-bit platforms virtual addressing is practically unlimited.

Related

Reserve physical memory from kernel but keep logical mapping

I'm working on a 32-bit embedded system with 1GB RAM that has an unusual configuration. There exists a 32MB region of RAM(DDR) that is used by an external piece of hardware. This region has size and alignment requirements defined by the hardware and the physical address and size are fixed during boot time before Linux kernels execute. The system also has two independent CPUs (not 2 cores on the same CPU) each running Linux that share the same 1GB of RAM and also need to simultaneously access the 32MB region of RAM.
I need to give each CPU an arbitrary amount of RAM to run Linux and I also need to allow each CPU to access the 32MB region. I have got all of this working by specifying the memory map on the kernel command line (e.g. for CPU1 memmap=512M#0 and for CPU2 memmap=512M#512M) and also using memmap=32M$XXXX to reserve the 32MB region. When the kernels boot, I use ioremap map the 32MB region into each kernels virtual address space so it can access the region.
However, what I've found is that to maintain software compatibility with existing kernel drivers, I need the 32MB region to be mapped into both kernels' "logical" address space so that phys_to_virt and virt_to_phys will work. Of course, when using ioremap, the 32MB region maps into the vmalloc area, so phys_to_virt and virt_to_phys don't work as desired.
I'm not finding any way to accomplish what I need to do with existing command-line or build-configuration options.
From a high-level, I think the kernels' page table entries will have no translation where the gap in the physical/logical memory map was specified (i.e. memmap=32M$XXXX) and what I need to do is to make sure those page table entries have the same information as if the gap was never specified. Or find a way to "reserve" a specific region of physical RAM from the kernel after it has mapped the logical address space into the page table. Either way, any manipulation needs to be done before the kernel starts using the memory.

Why is kmalloc() more efficient than vmalloc()?

I think kmalloc() allocates continuous physical pages in the kernel because the virtual memory space is directly mapping to the physical memory space, by simply adding an offset.
However, I still don't understand why it is more efficient than vmalloc().
It still needs to go through the page table (the kernel page table), right? Because the MMU is not disabled when the process is switching to the kernel. So why Linux directly maps the kernel virtual space to the physical memory? What is the benefit?
In include/asm-x86/page_32.h, there is:
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
Why does the kernel need to calculate the physical address? It has to use the virtual address to access the memory anyway, right? I cannot figure out why the physical address is needed.
Your Queries :-
why is Kmalloc more efficient than vmalloc()?
kmalloc allocates a region of physically contiguous (also virtually contiguous) memory. The physical to virtual map is one-to-one.
For vmalloc(), an MMU/PTE value is allocated for each page; the physical to virtual mapping is not continuous.
vmalloc is often slower than kmalloc, because it may have to remap the buffer space into a virtually contiguous range. kmalloc never remaps.
why Linux directly maps the kernel virtual space to the physical memory?
There is one concept in linux kernel known as DMA(Direct Memory Access) which require contiguous physical memory. so when kernel trigger DMA operation we need to specify physically contiguous memory. that's why we need direct memory mapping.
Why the kernel needs to calculate the physical address? It has to use the virtual address to access the memory anyway, right?
for this question answer you need to read difference between virtual memory and physical memory. but in short, every load and store operation is performed on physical memory(You RAM on PC)
physical memory point to RAM.
virtual memory point to swap area of your HARD DISK.

What is the rationality of Linux kernel's mapping as much RAM as possible in direct-mapping(linear mapping) area?

The discussion below applies to 32-bit ARM Linux.
Suppose there are 512MB physical RAM in my system. For common configurations, all these 512MB physical RAM will be mapped via direct mapping by kernel(0xC000 0000 to 0xE000 0000).
Question is: kernel itself only uses part of these RAM; most of these RAM would be allocated to user space. Why bother mapping all these 512MB physical RAM in kernel's virtual space(0xC000 0000 to 0xE000 0000)? Why doesn't kernel just map part of these RAM for its only usage(say 64MB RAM)?
If physical RAM is greater than 1GB, things get a little complicated. Let's say directly-mapped area is 768MB in size. The result would be 768MB out of 1GB being directly mapped to kernel's virtual space. I guess the rest of the RAM(256MB) goes to two places: either high memory area or allocated by kernel to user space. But I still don't see any advantage of mapping so many physical RAM into kernel's virtual space.
Actually this question can be reduced to:
what are the drawbacks if kernel only directly maps a small part of physical RAM(say 64MB out of 512MB)?
Before further discussion, it is beneficial to know that
After MMU is turned on, every address issued by CPU is virtual
address.
If kernel wants to access ANY address in RAM, a mapping must be set up before the actual access happens.
If kernel only directly maps a small part of physical RAM, the cost is that every time kernel needs to access other parts of RAM, it needs to set up a temporary mapping before accessing that address and torn down that mapping after the access, which is very tedious and low efficiency.
If that mapping is set up in advance and is always there, it saves quite a lot of trouble for kernel.

How Linux kernel decide to which memory zone to use?

When I check pagetypeinfo
cat /proc/pagetypeinfo
I see three types of memory zones;
DMA
DMA32
Normal
How Linux choose a memory zone to allocate a new page?
These memory zones are defined only for the 32 bit systems and not in the 64 bit.
Rememember these are the kernel accessible main memory we are talking about. In a 32 bit (4GB) system, the split between the kernel and the user space is 1:3. Meaning kernel can access 1GB and the user space 3GB. The kernel's 1GB is split as follows:
Zone_DMA (0-16MB): Permanently mapped into the kernel address space.
For compatibility reasons for older ISA devices that can address only the lower 16MB of main memory.
Zone_Normal (16MB-896MB): Permanently mapped into the kernel address space.
Many kernel operations can only take place using ZONE_NORMAL so it is the most performance critical zone and is the memory mostly allocated by the kernel.
ZONE_HIGH_MEM (896MB-above): not permanently mapped into the kernel's address space.
Kernel can access entire 4GB main memory. kernel's 1GB through Zone_DMA & Zone_Normal and user's 3GB through ZONE_HIGH_MEM. With Intel's Physical Address Extension (PAE), one gets 4 extra bits to address the main memory resulting in 36 bits, a total of 64GB of memory that can be accessed. The delta address space (36 bit address - 32 bit address) is where ZONE_HIGH_MEM is used to map to the user accessed main memory (ie between 2GB - 4GB).
Read more:
http://www.quora.com/Linux-Kernel/Why-is-there-ZONE_HIGHMEM-in-the-x86-32-Linux-kernel-but-not-in-the-x86-64-kernel
http://www.quora.com/Linux-Kernel/What-is-the-difference-between-high-memory-and-normal-memory
Linux 3/1 virtual address split
For every memory allocation request (for eg via kmalloc), based on the flags passed to the function,kernel selects the memory zone. these requests internally triggers the kernel function alloc_pages().
zonelist is an argument that gets passed to alloc_pages(), that
Points to a zonelist data structure describing, in order of preference, the mem-
ory zones suitable for the memory allocation.
refer the memory management chapter in book Understanding the Linux kernel

why do we need zone_highmem on x86?

In linux kernel, mem_map is the array which holds all "struct page" descriptors. Those pages includes the 128MiB memory in lowmem for dynamically mapping highmem.
Since the lowmem size is 1GiB, so the mem_map array has only 1GiB/4KiB=256KiB entries. If each entry size is 32 byte, then the mem_map memory size = 8MiB. But if we could use mem_map to map all 4GiB physical memory(if we have so much physical memory available on x86-32), then the mem_map array would occupy 32MiB, that is not a lot of kernel memory(or am i wrong?).
So my question is: why do we need to use that 128MiB in low for indirect highmem mapping in the first place? Or put another way, why not to map all those max 4GiB physical memory(if available) in the kernel space directly?
Note: if my understanding of the kernel source above is wrong, please correct. Thanks!
Look Here: http://www.xml.com/ldd/chapter/book/ch13.html
Kernel low memory is the 'real' memory map, addressed with 32-bit pointers on x86.
Kernel high memory is the 'virtual' memory map, addressed with virtual structures on x86.
You don't want to map it all into the kernel address space, because you can't always address all of it, and you need most of your memory for virtual memory segments (virtual, page-mapped process space.)
At least, that's how I read it. Wow, that's a complicated question you asked.
To throw more confusion, chapter 13 talks about some PCI devices not being able to address the 32-bit space, which was the genesis of my previous comment:
On x86, some kernel memory usage is limited to the first Gigabyte of memory bacause of DMA addressing concerns. I'm not 100% familiar with the topic, but there's a comapatibility mode for DMA on the PCI bus. That may be what you are looking at.
3.6 GB is not the ceiling when using physical address extension, which is commonly needed on most modern x86 boards, especially with memory hotplug.
Or put another way, why not to map all those max 4GiB physical
memory(if available) in the kernel space directly?
One reason is userspace: every usespace process have its own virtual address space. Suppose you have 4Gb of RAM on x86. So if we suggest that kernel owns 1Gb of memory (~800 directly mapped + ~200 vmalloc) all other ~3Gb should be dynamically distributed between processes spinning in user space. So how can you map your 4Gbs directly when you have a several address spaces?
why do we need zone_highmem on x86?
The reason is the same. Kernel reserves only ~800Mb for low mem. All other memory will be allocated and connected with particular virtual address only on demand. For example if you will execute a binary a new virtual address space will be created and some pages will be allocated for storing your binary code and data (heap ,stack ...). So the key attribute of high mem is to serve dynamic memory allocation requests, you never know in advance what will be triggered by userspace...

Resources