Is accessing a contiguous physical address range faster than accessing virtual addresses? - linux

What's the benefit of allocating a chunk of contiguous physical memory?
Is accessing a contiguous physical address range faster than accessing virtual addresses? And why?

All memory accesses from the CPU go through the MMU; the speed does not depend on the actual location of the pages in physical memory.
Physically contiguous memory is needed for other devices that access memory but are not able to remap pages.
In that case, the contiguous allocation is needed to make the device work to begin with, and is not a question of speed.

Related

Why is contiguous memory allocation required in Linux?

GPUs and VPUs need contiguous memory.
CMA and static memory allocation are examples of contiguous memory allocation.
Why is contiguous memory required here?
Contiguous memory allocation is needed for I/O devices that can only work with contiguous ranges of physical memory; such devices are built that way in order to simplify their design.
On systems with an I/O memory management unit (IOMMU), this would not be an issue because a buffer that is contiguous in the device address space can be mapped by the IOMMU to non-contiguous regions of physical memory. Also some devices can do scatter/gather DMA (i.e., can read/write from/to multiple non-contiguous buffers). Ideally, all I/O devices should be designed to either work behind an IOMMU or should be capable of scatter/gather DMA. Unfortunately, this is not the case and there are devices that require physically contiguous buffers. There are two ways for a device driver to allocate a contiguous buffer:
The device driver can allocate a chunk of physical memory at boot-time. This is reliable because most of the physical memory would be available at boot-time. However, if the I/O device is not used, then the allocated physical memory is just wasted.
A chunk of physical memory can be allocated on demand, but it may be difficult to find a contiguous free range of the required size. The advantage, though, is that memory is only allocated when needed.
CMA solves this exact problem by providing the advantages of both of these approaches with none of their downsides. The basic idea is to make it possible to migrate allocated physical pages to create enough space for a contiguous buffer. More information on how CMA works can be found here.

Why is kmalloc() more efficient than vmalloc()?

I think kmalloc() allocates physically contiguous pages, because the kernel's virtual address space is directly mapped to physical memory, simply by adding an offset.
However, I still don't understand why it is more efficient than vmalloc().
It still needs to go through the page table (the kernel page table), right? The MMU is not disabled when switching into the kernel. So why does Linux directly map the kernel virtual space to physical memory? What is the benefit?
In include/asm-x86/page_32.h, there is:
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
Why does the kernel need to calculate the physical address? It has to use the virtual address to access the memory anyway, right? I cannot figure out why the physical address is needed.
Your queries:
Why is kmalloc() more efficient than vmalloc()?
kmalloc() allocates a region of physically contiguous (and also virtually contiguous) memory. The physical-to-virtual mapping is one-to-one.
For vmalloc(), a page-table entry is set up for each page; the physical-to-virtual mapping need not be contiguous.
vmalloc() is often slower than kmalloc(), because it must map each allocated page into a virtually contiguous range. kmalloc() never remaps.
Why does Linux directly map the kernel virtual space to physical memory?
One reason is DMA (Direct Memory Access), which requires physically contiguous memory: a DMA engine is programmed with physical addresses, so when the kernel sets up a DMA operation it must hand the device the physical address of a contiguous buffer. A direct (linear) mapping makes converting between kernel virtual and physical addresses a simple offset calculation, as the __pa()/__va() macros above show.
Why does the kernel need to calculate the physical address? It has to use the virtual address to access the memory anyway, right?
To answer this, you need to understand the difference between virtual and physical memory. In short: the CPU issues virtual addresses, but every load and store is ultimately performed on physical memory (the RAM in your PC). The kernel itself does access memory through virtual addresses; devices and DMA engines, however, do not go through the MMU, so the kernel must be able to compute the physical address of a buffer before handing it to hardware. (Note that virtual memory is not simply the swap area of your hard disk; it is an abstraction whose pages may be backed by RAM or, when evicted, by swap.)

Allocating Physically Contiguous Pages from User Space in Linux

I need to allocate physically contiguous pages from user space and get back the physical address. How can I do so? It must be physically contiguous because I'm working in conjunction with hardware that requires it.
As I understand it, get_free_pages() is a kernel function, and it returns a virtual address.

Virtual memory clarification - allocation of large contiguous memory

I have an application where I have to allocate on Windows (using operator new) quite a large memory space (hundreds of MB). The application is 32 bit (we don't use 64 bit for now, even on 64 bit systems) and I enabled /LARGEADDRESSAWARE linker option to be able to use 4 GB of user space memory.
Question: if I need to allocate, say, 450 MB of contiguous memory, does only the virtual address space of the process need a large enough contiguous hole, or does the physical memory also have to be unfragmented on the system? I ask because I can arrange for my application to reserve a large enough contiguous range, but I don't know whether other applications on the system can affect me in this way. Do the OS page tables need to translate contiguous virtual addresses seen by the application into contiguous physical addresses?
If the memory is simply used by your software, then your 450 MB allocation only needs a 450 MB hole in the virtual address space. It can be satisfied with pages from every corner of the memory system [as long as there is at least 450 MB available somewhere in the system, including swap space].
Your system will get slightly better performance if the OS is able to back the allocation with contiguous 2 MB blocks ["large pages", 2 MB at a time], but it will fall back to individual 4 KB pages if it needs to.
One of several benefits of a paged memory architecture is that any physical page can be placed at any virtual address. In some systems, for example the Xen virtualization manager in debug mode, pages are INTENTIONALLY allocated out of sequence, to make it easier to detect when the system makes assumptions about memory pages being contiguous.
You don't need to be concerned about contiguity of the physical memory. That's one thing that virtual to physical address translation helps you with. As long as you can reserve a chunk of the address space and back it with physical memory, wherever it happens to be, things are going to work.

Linux 3/1 virtual address split

I am missing something when it comes to understanding the need for highmem to address more than 1GB of RAM. Could someone point out where I go wrong? Thanks!
What I know:
1 GB of each process's virtual address space (the high memory region) is reserved for kernel operations. User space can use the remaining 3 GB. This is the 3/1 split.
The virtual memory system maps the (contiguous) virtual memory pages to physical pages (RAM).
What I don't know:
What operations use the kernel virtual memory? I suppose things like kmalloc(...) in kernel-space would use kernel virtual memory.
I would think that 4 GB of RAM could be used under this scheme. I don't see why the kernel's 1 GB of virtual space is the limiting factor when addressing physical memory. This is where my understanding breaks down. Please advise.
I've been reading this (http://kerneltrap.org/node/2450), which is great. But it doesn't quite address my question to my liking.
The reason that kernel virtual space is a limiting factor on useable physical memory is because the kernel needs access to all physical memory, and the way it accesses physical memory is through kernel virtual addresses. The kernel doesn't use special instructions that allow direct access to physical memory locations - it has to set up page table entries for any physical ranges that it wants to talk to.
In the "old style" scheme, the kernel set things up so that every process's page tables mapped virtual addresses from 0xC0000000 to 0xFFFFFFFF directly to physical addresses from 0x00000000 to 0x3FFFFFFF (these pages were marked so that they were only accessible in ring 0 - kernel mode). These are the "kernel virtual addresses". Under this scheme, the kernel could directly read and write any physical memory location without having to fiddle with the MMU to change the mappings.
Under the HIGHMEM scheme, the mappings from kernel virtual addresses to physical addresses aren't fixed - parts of physical memory are mapped in and out of the kernel virtual address space as the kernel needs access to that memory. This allows more physical memory to be used, but at the cost of having to constantly change the virtual-to-physical mappings, which is quite an expensive operation.
Mapping 1 GB to kernel in each process allows processes to switch to kernel mode without also performing a context switch. Responses to system calls such as read(), mmap() and others can then be appropriately processed in the calling process' address space.
If space for the kernel were not reserved in each process, switching to "kernel mode" in between executing user space code would be more expensive, and be unable to use virtual address mapping through the hardware MMU (memory management unit) for the system calls being serviced.
Systems running a 32-bit kernel with more than 1 GB of physical memory can place physical pages in ZONE_HIGHMEM (roughly everything above the 1 GB mark), which requires the kernel to jump through hoops for certain operations in order to interact with them. The addition of PAE (Physical Address Extension) extends this problem by allowing up to 64 GB of physical memory, shrinking the fraction of memory that lies below the 1 GB boundary relative to the regions allocated in ZONE_HIGHMEM.
For example, system calls use the kernel space.
You can have 64 GB of physical RAM, but on 32-bit platforms the processor can only address 4 GB at a time because of 32-bit virtual addressing. Conversely, you can have 1 GB of RAM and 3 GB of swap, and virtual addressing will make it look like you have 4 GB. On 64-bit platforms, virtual address space is practically unlimited.
