The x86-64 address space is huge, even though only 48-bit addresses are commonly used.
On 32-bit x86 machines it was pretty clear how much RAM the kernel took up. Generally around 1 GB of ZONE_NORMAL sat at the bottom of memory, while everything above roughly 1 GB in PHYSICAL (not virtual) addresses was ZONE_HIGHMEM (for user space). This corresponds to the 3:1 split. Of course there are configurations where we can have 1:3, 2:2, etc. (by changing the VMSPLIT config option).
How much RAM is used for kernel space with a 64-bit kernel?
I know that PAGE_OFFSET is set to a value far above physically addressable memory on x86-64 (for both 48-bit and 57-bit virtual addresses). PAGE_OFFSET on x86-64 just describes the split in the virtual address space, not the physical one (with 4-level paging, PAGE_OFFSET is ffff888000000000).
So does 1 GB of memory house kernel space? 2 GB? 3? Are there variables or macros that describe the size? Is it calculated?
Each user-space process can use its own 2^47 bytes (128 TiB) of virtual address space. Or more on a system with PML5 support.
The available physical RAM to back those pages is the total size of physical RAM, minus maybe 30 MiB or so that the kernel needs for its own code/data. (Not including the pagecache: Linux will use any spare pages as buffers and disk cache). This is mostly unrelated to virtual address-space limits.
1 GB is how much virtual address space a (32-bit) kernel used up, not how much physical RAM.
The address-space question mattered for how much memory a single process could use at the same time, but the kernel can still use all your RAM for caching file data, etc. Unless you're finding the 2^(48-1) or 2^(57-1) bytes of the low half virtual address-space range cramped, there's no equivalent problem.
See the kernel's Documentation/x86/x86_64/mm.rst for the x86-64 virtual memory map. Also see Why 4-level paging can only cover 64 TiB of physical address regarding x86-64 Linux not needing the inconvenient HIGHMEM scheme: the entire high half of the virtual address space is reserved for the kernel, and it maps all the RAM because it's a kernel.
Virtual address space usage does indirectly set a 64 TiB limit on how much physical RAM the kernel can use, but if you have less than that there's no effect. Just like how a 32-bit kernel wasn't a problem if your machine had less than 1 or 2 GiB of RAM.
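To see the difference between virtual address space and physical RAM from user space, here is a minimal sketch (my own example, assuming a 64-bit Linux system; the 1 TiB size is arbitrary) that reserves far more virtual address space than most machines have RAM:

    /* Reserve a huge chunk of virtual address space without consuming a
     * comparable amount of physical RAM.  MAP_NORESERVE asks the kernel
     * not to reserve swap for the mapping. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1UL << 40;  /* 1 TiB of virtual address space */

        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Nothing is backed by physical RAM yet; pages are only allocated
         * when touched.  VmSize in /proc/self/status jumps by ~1 TiB while
         * VmRSS barely moves. */
        printf("reserved 1 TiB of virtual address space at %p\n", p);
        munmap(p, len);
        return 0;
    }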
The amount of physical RAM actually reserved by the kernel depends on build options and modules, but might be something like 16 to 32 MiB.
Check dmesg output and look for something like this line, from the boot log of an x86-64 5.16.3-arch1 kernel:
Memory: 32538176K/33352340K available (14344K kernel code, 2040K rwdata, 8996K rodata, 1652K init, 4336K bss, 813904K reserved, 0K cma-reserved)
Don't count the init (freed after boot) or reserved parts; I'm pretty sure Linux doesn't actually reserve ~800 MiB in a way that makes it unusable for anything else.
Also look for the later Freeing unused decrypted memory: 2036K / Freeing unused kernel image (initmem) memory: 1652K etc. (That's the same size as the init part listed earlier, which is why you don't have to count it.)
It might also dynamically allocate some memory during startup; that initial "Memory:" line is just the sum of the kernel's .text, .data, and .bss sections, i.e. the static code+data sizes.
On 64-bit systems, the only limitation is how much physical memory the kernel can use. The kernel will map all of the available RAM, and user-space applications should be able to get access to as much of it as the kernel can provide while keeping enough for the kernel itself to operate.
Related
While trying to understand the high-memory problem for 32-bit CPUs and Linux, I wondered: why is there no high-memory problem for 64-bit CPUs?
In particular, how does the division of virtual memory into kernel space and user space change, so that the requirement for high memory doesn't exist on a 64-bit CPU?
Thanks.
A 32-bit system can only address 4 GB of memory. In Linux this is divided into 3 GB of user space and 1 GB of kernel space. This 1 GB is sometimes not enough, so the kernel might need to map and unmap areas of memory, which incurs a fairly significant performance penalty. The kernel space is the "high" 1 GB, hence the name "high memory problem".
A 64-bit system can address a huge amount of memory (16 EiB of virtual address space), so this issue does not occur there.
With 32-bit addresses, you can only address 2^32 bytes of memory (4 GB). So if you have more than that, you need to address it in some special way. With 64-bit addresses, you can address 2^64 bytes of memory without special effort, and that number is way bigger than all the memory that exists on the planet.
That number of bits refers to the word size of the processor. Among other things, the word size is the size of a memory address on your machine. The size of a memory address determines how many bytes can be referenced uniquely. Doing some simple math, we find that on a 32-bit system at most 2^32 = 4294967296 memory addresses exist, meaning you have a mathematical limit of about 4 GB of RAM.
However, on a 64-bit system you have 2^64 ≈ 1.8446744e+19 memory addresses available. This means that your computer can theoretically reference almost 20 exabytes of RAM, which is more RAM than anyone has ever needed in the history of computing.
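Just to spell out that arithmetic (a trivial sketch of my own, nothing OS-specific):

    #include <stdio.h>

    int main(void)
    {
        unsigned long long addrs32 = 1ULL << 32;              /* 2^32 */
        long double        addrs64 = 18446744073709551616.0L; /* 2^64 */

        printf("32-bit: %llu addresses  (4 GiB)\n", addrs32);
        printf("64-bit: %.7Le addresses (16 EiB)\n", addrs64);
        return 0;
    }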
From this post, I know that swap space is correlated with physical memory. So assume the physical memory and the swap space are both 4 GB. Theoretically, the address space of a 64-bit application is close to 2^64 (the kernel will certainly occupy some of it), but per my understanding, the actual memory the application can use is only 8 GB.
So my question is: for an application running on Unix/Linux, is the maximum memory it can use equal to (physical memory + swap space)?
This is a complicated question.
First of all, the theoretical virtual address space of a 64-bit system is 2^64 bytes. But in fact, neither the OS nor the CPU supports such a large virtual address space or that much physical RAM.
Current x86-64 CPUs (AMD64 and Intel's 64-bit chips) actually implement 48-bit virtual addresses with 4-level paging, theoretically allowing 256 TiB of virtual address space; the physical address width varies by CPU (commonly 36 to 46 bits).
And Linux allows 128 TiB of virtual address space per process on x86-64, and with 4-level paging can support up to 64 TiB of physical RAM.
To your question: in the ideal case, the maximum virtual address space a Linux process can use is just the Linux per-process limit above. Even if your system has run out of swap space and has only 100 MB of free RAM left, your process can still reserve that entire virtual address space.
But your system may place limits on virtual memory requests (malloc, which calls the brk/sbrk or mmap syscalls). For example, Linux has the vm.overcommit_memory and vm.overcommit_ratio options, which determine whether a malloc in a process will be refused. See http://www.win.tue.nl/~aeb/linux/lk/lk-9.html.
However, virtual address space is not the same as real RAM + swap. In terms of real RAM + swap, your view is right: a process will never use more real RAM + swap than your system has. But in most cases there will be many processes on your system, so the RAM + swap your process can actually use is smaller. If all of the physical RAM + swap is about to be exhausted, the OOM killer will choose some process to kill.
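As a rough illustration of that overcommit behaviour (a sketch only; whether the big request succeeds depends on your vm.overcommit_memory setting and on how much RAM + swap you have):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Report the current overcommit mode (0 = heuristic, 1 = always
         * overcommit, 2 = strict accounting against the commit limit). */
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
        if (f) {
            int mode;
            if (fscanf(f, "%d", &mode) == 1)
                printf("vm.overcommit_memory = %d\n", mode);
            fclose(f);
        }

        /* Ask for 4 TiB, far more than RAM + swap on a typical machine.
         * With mode 1 this succeeds (nothing is committed until the pages
         * are touched); with mode 0 or 2 the kernel may refuse it. */
        size_t len = 1UL << 42;
        void *p = malloc(len);
        printf("malloc(4 TiB) %s\n",
               p ? "succeeded - not yet backed by RAM or swap" : "failed");
        free(p);
        return 0;
    }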
When I check pagetypeinfo
cat /proc/pagetypeinfo
I see three types of memory zones:
DMA
DMA32
Normal
How does Linux choose a memory zone to allocate a new page from?
The zone split described below applies to 32-bit systems; 64-bit kernels divide memory into a different set of zones (and have no ZONE_HIGHMEM).
Remember, it is kernel-accessible main memory we are talking about. In a 32-bit (4 GB) system, the split between kernel and user space is 1:3, meaning the kernel gets 1 GB of address space and user space gets 3 GB. The kernel's 1 GB is split as follows:
ZONE_DMA (0-16 MB): permanently mapped into the kernel address space.
Exists for compatibility with older ISA devices that can only address the lower 16 MB of main memory.
ZONE_NORMAL (16 MB-896 MB): permanently mapped into the kernel address space.
Many kernel operations can only take place using ZONE_NORMAL, so it is the most performance-critical zone and is where the kernel allocates most of its memory.
ZONE_HIGHMEM (896 MB and above): not permanently mapped into the kernel's address space.
The kernel can access the entire 4 GB of main memory: its own 1 GB through ZONE_DMA and ZONE_NORMAL, and the user's 3 GB through ZONE_HIGHMEM. With Intel's Physical Address Extension (PAE), one gets 4 extra bits for addressing main memory, resulting in 36 bits and a total of 64 GB of addressable memory. The physical address space beyond what the kernel can map directly is likewise reached through ZONE_HIGHMEM mappings.
Read more:
http://www.quora.com/Linux-Kernel/Why-is-there-ZONE_HIGHMEM-in-the-x86-32-Linux-kernel-but-not-in-the-x86-64-kernel
http://www.quora.com/Linux-Kernel/What-is-the-difference-between-high-memory-and-normal-memory
Linux 3/1 virtual address split
For every memory allocation request (e.g. via kmalloc()), the kernel selects a memory zone based on the flags passed to the function. These requests internally trigger the kernel function alloc_pages().
zonelist is an argument that gets passed to alloc_pages(); it points to a zonelist data structure describing, in order of preference, the memory zones suitable for the memory allocation.
Refer to the memory management chapter in the book Understanding the Linux Kernel.
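To make that concrete, here is a bare-bones kernel-module sketch (my own example, not from the book; error handling is omitted and it assumes a build where ZONE_DMA/GFP_DMA is configured) showing how the GFP flags are what steer alloc_pages() toward a zone:

    #include <linux/module.h>
    #include <linux/gfp.h>
    #include <linux/slab.h>

    static void *normal_buf;
    static struct page *dma_page;

    static int __init zone_demo_init(void)
    {
        /* GFP_KERNEL: satisfied from ZONE_NORMAL (falling back to lower
         * zones such as ZONE_DMA32/ZONE_DMA if needed). */
        normal_buf = kmalloc(4096, GFP_KERNEL);

        /* GFP_DMA: restricts the underlying alloc_pages() call to ZONE_DMA,
         * e.g. for legacy ISA devices that can only address the low 16 MB. */
        dma_page = alloc_pages(GFP_KERNEL | GFP_DMA, 0);

        pr_info("zone_demo: normal_buf=%p dma_page=%p\n", normal_buf, dma_page);
        return 0;
    }

    static void __exit zone_demo_exit(void)
    {
        if (dma_page)
            __free_pages(dma_page, 0);
        kfree(normal_buf);
    }

    module_init(zone_demo_init);
    module_exit(zone_demo_exit);
    MODULE_LICENSE("GPL");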
I am missing something when it comes to understanding the need for highmem to address more than 1GB of RAM. Could someone point out where I go wrong? Thanks!
What I know:
1 GB of a process's virtual memory (the high region of the address space) is reserved for kernel operations. The user space can use the remaining 3 GB. This is the 3/1 split.
The virtual memory features of the VM map the (contiguous) virtual memory pages to physical pages (RAM).
What I don't know:
What operations use the kernel virtual memory? I suppose things like kmalloc(...) in kernel-space would use kernel virtual memory.
I would think that 4 GB of RAM could be used under this scheme. I don't get why the kernel's 1 GB of virtual space is the limiting factor when addressing physical memory. This is where my understanding breaks down. Please advise.
I've been reading this (http://kerneltrap.org/node/2450), which is great. But it doesn't quite address my question to my liking.
The reason that kernel virtual space is a limiting factor on usable physical memory is that the kernel needs access to all physical memory, and the way it accesses physical memory is through kernel virtual addresses. The kernel doesn't use special instructions that allow direct access to physical memory locations - it has to set up page table entries for any physical ranges that it wants to talk to.
In the "old style" scheme, the kernel set things up so that every process's page tables mapped virtual addresses from 0xC0000000 to 0xFFFFFFFF directly to physical addresses from 0x00000000 to 0x3FFFFFFF (these pages were marked so that they were only accessible in ring 0 - kernel mode). These are the "kernel virtual addresses". Under this scheme, the kernel could directly read and write any physical memory location without having to fiddle with the MMU to change the mappings.
Under the HIGHMEM scheme, the mappings from kernel virtual addresses to physical addresses aren't fixed - parts of physical memory are mapped in and out of the kernel virtual address space as the kernel needs access to that memory. This allows more physical memory to be used, but at the cost of having to constantly change the virtual-to-physical mappings, which is quite an expensive operation.
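For what that map/unmap dance looks like in kernel code, here is a sketch (my own illustration; on current kernels kmap_local_page() is preferred over kmap(), and on 64-bit kmap() is effectively free because everything is already mapped):

    #include <linux/gfp.h>
    #include <linux/highmem.h>
    #include <linux/string.h>

    static void highmem_touch_example(void)
    {
        /* GFP_HIGHUSER pages may come from ZONE_HIGHMEM, so they have no
         * permanent kernel virtual address on a 32-bit kernel. */
        struct page *page = alloc_page(GFP_HIGHUSER);
        void *vaddr;

        if (!page)
            return;

        vaddr = kmap(page);          /* set up a temporary kernel mapping */
        memset(vaddr, 0, PAGE_SIZE); /* now the kernel can touch the page */
        kunmap(page);                /* tear the mapping down again */

        __free_page(page);
    }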
Mapping 1 GB to the kernel in each process allows processes to switch to kernel mode without also performing a context switch. Responses to system calls such as read(), mmap() and others can then be appropriately processed in the calling process's address space.
If space for the kernel were not reserved in each process, switching to "kernel mode" in between executing user-space code would be more expensive, and the kernel would be unable to use virtual address mapping through the hardware MMU (memory management unit) for the system calls being serviced.
Systems running a 32-bit kernel with more than 1 GB of physical memory can assign physical memory locations to ZONE_HIGHMEM (roughly above the 1 GB mark), which can require the kernel to jump through hoops for certain operations in order to interact with them. The addition of PAE (Physical Address Extension) extends this problem by allowing up to 64 GB of physical memory, shrinking the proportion of memory below the 1 GB mark relative to the regions allocated in ZONE_HIGHMEM.
For example, system calls use the kernel space.
You can have 64 GB of physical RAM, but on 32-bit platforms only 4 GB can be addressed at a time because of 32-bit virtual addressing. Actually, you can have 1 GB of RAM and 3 GB of swap, and virtual addressing will make it look like you have 4 GB. On 64-bit platforms, virtual addressing is practically unlimited.
In the Linux kernel, mem_map is the array which holds all the "struct page" descriptors. Those pages include the 128 MiB of memory in lowmem used for dynamically mapping highmem.
Since the lowmem size is 1 GiB, the mem_map array has only 1 GiB / 4 KiB = 256 Ki entries. If each entry is 32 bytes, the mem_map size is 8 MiB. But if we used mem_map to cover all 4 GiB of physical memory (if we had that much physical memory available on x86-32), the mem_map array would occupy 32 MiB, which is not a lot of kernel memory (or am I wrong?).
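Spelling that arithmetic out (a trivial sketch; the 4 KiB page size and the 32-byte struct page size are just the assumptions above):

    #include <stdio.h>

    int main(void)
    {
        unsigned long long page_size  = 4096;        /* 4 KiB pages */
        unsigned long long entry_size = 32;          /* assumed sizeof(struct page) */
        unsigned long long lowmem     = 1ULL << 30;  /* 1 GiB */
        unsigned long long all_ram    = 4ULL << 30;  /* 4 GiB */

        printf("1 GiB lowmem: %llu entries -> mem_map = %llu MiB\n",
               lowmem / page_size, lowmem / page_size * entry_size >> 20);
        printf("4 GiB RAM:    %llu entries -> mem_map = %llu MiB\n",
               all_ram / page_size, all_ram / page_size * entry_size >> 20);
        return 0;
    }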
So my question is: why do we need to use that 128 MiB of lowmem for indirect highmem mapping in the first place? Or put another way, why not map the whole 4 GiB of physical memory (if available) into the kernel space directly?
Note: if my understanding of the kernel source above is wrong, please correct me. Thanks!
Look Here: http://www.xml.com/ldd/chapter/book/ch13.html
Kernel low memory is the 'real' memory map, addressed with 32-bit pointers on x86.
Kernel high memory is the 'virtual' memory map, addressed with virtual structures on x86.
You don't want to map it all into the kernel address space, because you can't always address all of it, and you need most of your memory for virtual memory segments (virtual, page-mapped process space.)
At least, that's how I read it. Wow, that's a complicated question you asked.
To throw more confusion into the mix, chapter 13 talks about some PCI devices not being able to address the full 32-bit space, which was the genesis of my previous comment:
On x86, some kernel memory usage is limited to the first gigabyte of memory because of DMA addressing concerns. I'm not 100% familiar with the topic, but there's a compatibility mode for DMA on the PCI bus. That may be what you are looking at.
3.6 GB is not the ceiling when using physical address extension, which is commonly needed on most modern x86 boards, especially with memory hotplug.
Or put another way, why not map the whole 4 GiB of physical memory (if available) into the kernel space directly?
One reason is userspace: every userspace process has its own virtual address space. Suppose you have 4 GB of RAM on x86. If we say the kernel owns 1 GB of address space (~800 MB directly mapped + ~200 MB for vmalloc), all the other ~3 GB has to be dynamically distributed between the processes running in user space. So how can you map your 4 GB directly when you have several address spaces?
Why do we need ZONE_HIGHMEM on x86?
The reason is the same. The kernel maps only ~800 MB as lowmem. All other memory is allocated and connected to a particular virtual address only on demand. For example, when you execute a binary, a new virtual address space is created and some pages are allocated for storing your binary's code and data (heap, stack, ...). So the key role of highmem is to serve dynamic memory allocation requests; you never know in advance what userspace will trigger...
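A small user-space sketch of that on-demand behaviour (my own example; it just watches VmSize vs VmRSS in /proc/self/status): virtual size grows as soon as the mapping is created, resident size only grows once the pages are actually touched.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    static void show_vm(const char *when)
    {
        char line[256];
        FILE *f = fopen("/proc/self/status", "r");
        if (!f)
            return;
        while (fgets(line, sizeof line, f))
            if (!strncmp(line, "VmSize:", 7) || !strncmp(line, "VmRSS:", 6))
                printf("%-13s %s", when, line);
        fclose(f);
    }

    int main(void)
    {
        size_t len = 256UL << 20;                  /* 256 MiB */
        char *p;

        show_vm("before mmap");
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;
        show_vm("after mmap");                     /* VmSize up, VmRSS flat */
        memset(p, 1, len);
        show_vm("after memset");                   /* VmRSS up: demand paging */
        munmap(p, len);
        return 0;
    }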