mmap and kernel memory - linux

I understand from mmap() internals that a mmap read works by
- causing a page fault
- copying file data from disk to internal kernel buffer
- mapping the kernel buffer to user space
My questions are:
1. What happens to the kernel mapping of the buffer? If it still exists, don't we have a problem here of a user application gaining access to kernel memory?
2. Can't we run out of physical memory this way? I'd assume the kernel needs a minimum amount of physical memory to provide a decent level of performance, and if we keep handing its buffers over to mmapped user space, we'd eventually run out of buffers.
3. During a write, does the relevant memory get mapped temporarily to a kernel buffer? And if this is a shared mapping, couldn't another user process gain access to what is now kernel memory?
Thanks, and sorry if these questions are pretty basic, but I did not find a clear answer.

I'm not a kernel hacker by any means, but this is what I've gathered:
1. I'm not entirely sure whether the kernel "relinquishes" its mapping of that physical memory, since the kernel can access any physical memory it pleases. However, it would obviously be impermissible for the kernel to keep using that physical memory for its own purposes (e.g. as an internal pipe buffer) while user processes can access it as well, both for the sake of the user process and for the sake of the kernel. The kernel simply designates those pages as part of the filesystem cache (if the mapping is backed by a file) and does not reuse them for anything else.
2. Yes, to the same extent that any process or group of processes can reduce the physical memory available to the kernel by requesting lots of resources such as pipes. However, the kernel keeps track of how much physical memory is available and starts paging userland memory out to disk when free physical memory runs low. Kernel memory itself typically is not paged out, for reasons including performance. The nice thing about mmap()ed memory backed by a file is that it is cheap to reclaim: clean pages can simply be dropped and re-read from the file later, dirty pages are written back to the file itself, and no swap space needs to be allocated.
3. If you mean a write to memory mapped into the userland virtual address space (i.e. memcpy(), not write()), no. The whole point of mmap() is to map userland virtual address space onto physical memory so that reads and writes happen without system calls. Syncs to disk are performed directly by the kernel from those same pages, without additional copying into kernel buffers.
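A minimal sketch of the memcpy()-instead-of-write() case, assuming Linux and a writable /tmp (the function name write_via_mapping and the temp-file path are hypothetical). It stores into a MAP_SHARED file mapping with plain memcpy() and then reads the file back with pread(); on Linux the read is served from the same page cache pages, so it sees the data without any extra copy through a kernel buffer. msync(MS_SYNC) is only there to force the dirty pages out to disk.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes msg into a temporary file purely through a shared mapping
 * (memcpy, no write(2)), then reads the file back with pread(2) into out.
 * Returns 0 on success, -1 on any failure. */
int write_via_mapping(const char *msg, char *out, size_t outlen) {
    assert(msg != NULL && out != NULL && outlen > 0);
    char path[] = "/tmp/mmap-demo-XXXXXX";   /* hypothetical temp file name */
    size_t len = strlen(msg);
    if (len == 0) return -1;                  /* mmap rejects zero length */

    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                             /* file disappears once fd is closed */

    /* The file must be at least as large as the mapped range. */
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    memcpy(p, msg, len);                      /* plain stores, no system call */
    if (msync(p, len, MS_SYNC) != 0) {        /* only needed for on-disk durability */
        munmap(p, len); close(fd); return -1;
    }

    /* read-side system call goes through the same page cache pages. */
    ssize_t n = pread(fd, out, outlen - 1, 0);
    munmap(p, len);
    close(fd);
    if (n < 0) return -1;
    out[n] = '\0';
    return 0;
}
```

Calling write_via_mapping("hello mmap", buf, sizeof buf) should leave "hello mmap" in buf.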

Related

Is msync always really needed when writing to /dev/mem?

I am using mmap to open /dev/mem for read/write into UART registers.
It works well, but my question is:
After a write, is the msync system call with the MS_SYNC flag really needed?
From my understanding, /dev/mem is a virtual device that provides access to physical memory zones (UART registers in my case) by translating virtual memory addresses, thus giving user space access to some physical memory.
This is not a regular file, and I guess that modifications of registers are not buffered/cached. I would actually like to avoid this system call for performance reasons.
Thanks
My understanding is that msync() is needed to update the data in a normal file that is modified through a mapping created with mmap().
But when you use mmap on /dev/mem you are not mapping a normal file on disk, you are just mapping the desired hardware memory range directly into your process virtual address space, so msync() is off topic, it will do nothing.
The only thing that lies between your writes into the mmapped virtual space and the hardware device is the CPU cache. If necessary you could force a cache flush (__clear_cache() maybe? Note that GCC's __builtin___clear_cache() is meant for instruction-cache coherence after writing code, so it is not a general data-cache flush), but that is usually unnecessary because the kernel identifies the memory-mapped device registers and disables caching for that range. On x86 CPUs that is usually done with MTRRs, but on ARM I don't know the details...
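A sketch of mapping one device register through /dev/mem and writing it without msync(), assuming Linux, root privileges and a kernel that allows the range (CONFIG_STRICT_DEVMEM can forbid it). split_phys_addr and map_register are hypothetical helpers, and any physical address you pass is board-specific. mmap() offsets must be page aligned, so the register address is split into a page base and an offset; stores go through a volatile pointer so the compiler emits every access. Opening with O_SYNC is a common convention for requesting an uncached mapping, but whether it changes anything depends on the architecture.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* mmap offsets must be page aligned, so split a physical address into a
 * page-aligned base (passed to mmap) and an offset within that page. */
void split_phys_addr(uint64_t phys, uint64_t page_size,
                     uint64_t *base, uint64_t *offset) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    *base = phys & ~(page_size - 1);
    *offset = phys - *base;
}

/* Maps the page of /dev/mem that contains phys and returns a pointer to the
 * 32-bit register at phys, or NULL on failure (needs root; the kernel may
 * still refuse the range). The mapping is intentionally never unmapped here. */
volatile uint32_t *map_register(uint64_t phys) {
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE), base, off;
    split_phys_addr(phys, page, &base, &off);

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return NULL;
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)base);
    close(fd);                     /* the mapping stays valid after close */
    if (p == MAP_FAILED) return NULL;
    return (volatile uint32_t *)((char *)p + off);
}
```

Usage would look like `volatile uint32_t *reg = map_register(0x12345678); if (reg) *reg = 'A';`, where 0x12345678 is a made-up address; each store goes straight to the device mapping with no msync() call.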

Understanding Memory Mapped Files

I have started reading about memory mapped IO and I'm having some difficulties grasping the concepts.
This is what I have understood so far:
Each process has a virtual address space. Memory mapped files are allocated a specific address range in the virtual address space, that maps to the same address on the physical memory. This way, all the writes that are done by the disk controller on the memory (through DMA) will be reflected to the process without any additional copying. (In a non memory mapped file case, the CPU will have to copy the contents over to the buffer of the process.)
My Doubts:
Is my understanding correct?
What will happen if there are multiple processes trying to mmap a file and there is no contiguous block of memory available for direct mapping?
The memory subsystem itself doesn't have any understanding of "files", which are an OS concept, and there have been some operating systems that didn't use files at all. You're close but a little off in your understanding of how mmap works.
Each process does have its own virtual address space, which may have very little to do with the physical memory (lots of virtual address space doesn't have any memory associated at all, ever, and virtual memory that's swapped out doesn't have any physical memory). The system uses lookup tables (called page tables on x86) that specify which virtual address ranges map to which physical address ranges. Virtual memory that isn't "resident" (swapped out, mmapped but not loaded) has a "not present" entry.
Whenever a program tries to access this memory, the CPU causes a page fault, which tells the OS to go find the appropriate contents somewhere and load them into physical memory. In the case of swap, the contents are loaded out of a swap file or partition; in the case of mmap, they're loaded out of somewhere in the filesystem.
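To watch those page faults happen, here is a sketch assuming Linux: getrusage() reports the minor page faults of the calling process, so touching freshly mmapped anonymous pages makes the counter jump. MADV_NOHUGEPAGE is requested so that one fault does not populate a whole transparent huge page; the function name faults_when_touching is hypothetical. These are minor faults because anonymous pages are zero-filled in RAM rather than read from anywhere; a first access to an uncached file mapping would be counted as a major fault instead.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor faults: page faults serviced without any I/O. */
static long minor_faults(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Maps npages of anonymous memory, touches each page once and returns how
 * many minor page faults that caused (-1 on error). */
long faults_when_touching(size_t npages) {
    assert(npages > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    /* Base-size pages only; ignore failure on kernels without THP. */
    madvise((void *)p, len, MADV_NOHUGEPAGE);

    long before = minor_faults();
    for (size_t i = 0; i < npages; i++)
        p[i * page] = 1;           /* first store to each page: "not present" -> fault */
    long after = minor_faults();

    munmap((void *)p, len);
    return after - before;
}
```

On a typical Linux system faults_when_touching(64) reports roughly one fault per page touched.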
The mechanism for getting them into physical memory and updating the page tables can vary. What you're describing is DMA, which lets the drive controller copy contents directly into a block of physical memory, and zero-copy I/O, which is a technique where the OS just creates a new page-table mapping telling the processor to "teleport" the region of physical memory into the program's address space. Neither is technically required for mmap (the OS could load the file "by hand" and copy it into a new buffer for the program, and this may happen in a read-copy-update situation), but modern systems do it like you described.
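The physical memory doesn't necessarily have to be contiguous. When the POSIX mmap is called, the OS reserves length bytes of virtual address space for the mapping, but thanks to virtual memory, the physical pages behind those bytes can be scattered among many separate blocks of RAM (and are typically assigned one page at a time, on first access) while the processor presents them to the program as one contiguous range.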
If multiple processes are trying to mmap the same file, the OS behavior depends on whether the access is read-only or read/write; read-only copies can be shared among many processes (such as the actual executable code; this is why even though Chrome may have dozens of processes running, the Chrome binary is only in memory once).
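A small illustration of sharing versus private copies, assuming Linux. fork() plus anonymous mappings stand in here for several processes mapping the same file, and the function name shared_vs_private is hypothetical. A MAP_SHARED page is the same physical page in parent and child, so the child's store is visible to the parent; a MAP_PRIVATE page is copy-on-write, so the child's store lands in its own private copy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork, the child writes 42 into a MAP_SHARED and a MAP_PRIVATE page.
 * Stores what the parent then observes in each; returns 0 on success. */
int shared_vs_private(int *seen_shared, int *seen_private) {
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    int *priv = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (priv == MAP_FAILED) { munmap(shared, sizeof(int)); return -1; }
    *shared = 0;
    *priv = 0;

    pid_t pid = fork();
    if (pid < 0) { munmap(shared, sizeof(int)); munmap(priv, sizeof(int)); return -1; }
    if (pid == 0) {            /* child */
        *shared = 42;          /* same physical page as the parent's */
        *priv = 42;            /* copy-on-write: child gets its own copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    *seen_shared = *shared;
    *seen_private = *priv;
    munmap(shared, sizeof(int));
    munmap(priv, sizeof(int));
    return 0;
}
```

On Linux the parent should observe 42 in the shared page and still 0 in the private one.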

Where does virtual memory exist in linux?

A program is stored on flash/disk. To execute it, the program is loaded into virtual memory and mapped to RAM by the virtual memory manager. During its execution, the process is in RAM. So where does virtual memory exist (where are its .text, .data, .stack and .heap)?
Virtual memory is a view of the RAM, plus possibly some swap space, provided by a virtual memory manager. Modern OSs have virtual memory managers and provide virtual memory to processes so that the executing program can behave as if it had a contiguous address space whose size is not limited by the actual RAM. The pages or blocks making up virtual memory can be mapped anywhere in RAM, so contiguous virtual pages need not be stored in contiguous RAM areas. Or they can be swapped out to page space or swap space, waiting there until needed, whereupon the OS reads them back and maps them to some RAM page.
When you say
During its execution, the process is in RAM.
this is not entirely correct. Some or all of the memory pages that belong to the process may be swapped out, as explained.
One more word concerning the answers and comments that say that "virtual" means it doesn't exist. This makes no sense. On the contrary, according to Webster:
being such in essence or effect ...
Hence virtual memory is something (therefore, it exists!) that behaves as if it were memory.
Virtual memory is a kind of illusion of RAM. It uses paging to give processes more usable memory than the RAM physically installed, by moving rarely used pages out to disk.
Virtual memory means memory you can access with "normal" memory access methods, although it isn't clear where the data is actually stored.
It may be
- actually in RAM
- in a swap area
- in another file (memory-mapped file)
and access to it will be handled appropriately.
It is a layer of, well, virtualization so that you as a programmer don't have to worry about where the data is actually put.
The original purpose was mainly to be able to provide more memory to processes than we actually have and to extend it with means of swap space, but there are even more:
- The OS is free to use the RAM for whatever it deems necessary, e.g. caching. Under some circumstances, it may be more effective to use RAM for cache than for holding parts of a program which haven't been used for a long time.
- Provide additional memory to a program when it requests it: if you call malloc(), the program's library may request the OS to provide a region of memory which is attached seamlessly into the address space.
- Grow the stack on demand: if the stack grows larger and larger, the corresponding memory section is extended transparently (up to a limit, RLIMIT_STACK on Linux), so the program doesn't have to worry about it.
- A system can even "overcommit" memory: if a process requests a large amount of memory, the OS may say "yes, ok", i.e. provide the memory to the program. In the first place that means "allow the program to access a certain address space area", but this address space is not immediately backed by memory. The mapping is done only when the program actually accesses this memory, and if that cannot be fulfilled, a process (not necessarily the one that triggered it) is killed by the Out-Of-Memory (OOM) killer (at least under Linux).
All this works by page-wise assignment of physical memory to a program (1 page is typically 4 KiB on x86), viewed through the program's address space, in whatever amount and at whatever time it is needed.
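A sketch of that lazy, page-wise assignment, assuming Linux: it reserves a large anonymous mapping with MAP_NORESERVE, writes to only every stride-th page, and asks mincore(2) which pages are actually resident in RAM. MADV_NOHUGEPAGE keeps transparent huge pages from backing untouched neighbours; the function name resident_after_sparse_touch is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserves npages of anonymous memory, writes to every stride-th page and
 * returns how many pages mincore(2) reports as resident (-1 on error). */
long resident_after_sparse_touch(size_t npages, size_t stride) {
    assert(npages > 0 && stride > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) return -1;
    madvise(p, len, MADV_NOHUGEPAGE);   /* keep allocation at base-page size */

    /* Only these pages get physical memory assigned. */
    for (size_t i = 0; i < npages; i += stride)
        ((volatile char *)p)[i * page] = 1;

    unsigned char *vec = malloc(npages);  /* one status byte per page */
    long resident = -1;
    if (vec && mincore(p, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;        /* bit 0 = page resident */
    }
    free(vec);
    munmap(p, len);
    return resident;
}
```

With npages = 1024 and stride = 16, a typical Linux system should report 64 resident pages even though 1024 pages of address space were granted.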

For arm Linux, could threads in user space access virtual address of Kernel space?

Virtual memory is split into two parts. Traditionally, 0-3GB is for user space and 3GB-4GB for kernel space.
My question:
Can a thread in user space access kernel-space memory?
According to the ARM datasheet, access permissions are governed by the Domain Access Control Register. But in the kernel source code, the domain value in the page table entries for user-space virtual memory is the same as in the kernel-space page table entries.
In fact, your application might access page 0xFFFF0000, as it contains the swi-handler and a couple of other userspace-helpers. So no, the 3/1 split is nothing magical, it's just very easy for the kernel to manage.
Usually the kernel will set up all memory above 3GB to be accessible only by the kernel domain itself. If a driver needs to share memory between user and kernel space, it will usually provide an mmap interface, which then creates an aliased mapping, so you have two virtual addresses for the same physical address. This only works reliably on VIPT-cache systems or with a LOT of careful explicit cache flushing. If you don't want this, you CAN hack the kernel to make a chunk of memory ABOVE the 3G split accessible to userspace. But then all userspace applications will share this memory. I've done this once for a special application on an ARMv5 system.
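The aliasing idea can be sketched from user space by mapping one file page twice with MAP_SHARED, assuming Linux and a writable /tmp (the function name alias_roundtrip is hypothetical). The two pointers are different virtual addresses backed by one physical page, so a store through one is visible through the other. This only illustrates the double mapping; it does not reproduce the cache-aliasing hazard, which depends on the CPU's cache design.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps the same file page twice with MAP_SHARED, giving two virtual
 * addresses backed by one physical page. Writes value through a and returns
 * what is read back through b (-1 on error). */
int alias_roundtrip(int value) {
    char path[] = "/tmp/alias-demo-XXXXXX";   /* hypothetical temp file name */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)page) != 0) { close(fd); return -1; }

    int *a = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int *b = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                                /* mappings outlive the descriptor */

    int result = -1;
    if (a != MAP_FAILED && b != MAP_FAILED) {
        assert((void *)a != (void *)b);       /* two distinct virtual addresses */
        *a = value;
        result = *(volatile int *)b;          /* sees the store made through a */
    }
    if (a != MAP_FAILED) munmap(a, page);
    if (b != MAP_FAILED) munmap(b, page);
    return result;
}
```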
Userspace code getting Kernel memory? The only kernel that ever allowed that was DOS and its archaic friends.
But back to the question, look at this example C code:
char *p = (char *)42;
*p = 42;
We take the numeric value 42, treat it as an address and write a byte there. This tries to access the 42nd byte of virtual memory, which is almost certainly not mapped in your process (for the sake of this example, pretend it is Kernel memory). Guess what happens when you run this:
Segmentation fault
Linux has memory protection like any modern operating system. If you try to access the memory of another process, your process will be terminated before it can do anything (debuggers are a different story). Even if that memory belonged to another userland process rather than the kernel, you would still get terminated. Root programs cannot simply dereference other programs' memory or Kernel memory either; they have to go through kernel interfaces such as ptrace(2), /proc/<pid>/mem or /dev/mem, subject to the kernel's permission checks. The only way to access Kernel memory is to be part of the Kernel, or indirectly through the kernel's cooperation.
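A sketch of that protection at work, assuming Linux: is_readable (a hypothetical helper) installs SIGSEGV/SIGBUS handlers and uses sigsetjmp()/siglongjmp() to recover, so instead of being killed the process learns that an address is not accessible. Without the handler, the default action of SIGSEGV terminates the process, as in the example above. Jumping out of a handler for a hardware fault works on Linux in practice, but it is a debugging-style trick, not normal control flow.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

static void probe_handler(int sig) {
    (void)sig;
    siglongjmp(probe_env, 1);   /* skip the faulting load instead of retrying it */
}

/* Returns 1 if one byte at addr can be read, 0 if the read faults. */
int is_readable(const void *addr) {
    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    volatile int ok = 0;
    /* savemask = 1 so the blocked SIGSEGV is unblocked again after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        volatile char c = *(const volatile char *)addr;
        (void)c;
        ok = 1;
    }

    sigaction(SIGSEGV, &old_segv, NULL);   /* restore previous handlers */
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}
```

On Linux, is_readable((const void *)42) should return 0, while a pointer to a local variable returns 1.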

Is heap allocated on memory pages?

In a Linux x86-64 environment, is the entire process allocated on virtual memory pages? By entire process I mean the text, data, bss, heap and stack.
Also, when libc calls brk, does the kernel return memory that is managed in pages by the virtual memory manager?
Lastly, can a process get memory on the heap which is not managed by the virtual memory manager; in other words, can a process get access to physical memory?
In a Linux x86-64 environment, is the entire process allocated on virtual memory pages?
Yes, every process has a virtual address space, i.e. its own page tables and its own virtual-to-physical memory mapping.
Also, when libc calls brk, does the kernel return memory that is managed in pages by the virtual memory manager?
Yes; in fact, unless you are hacking the OS kernel, virtual memory is transparent to you.
Can a process get memory on the heap which is not managed by the virtual memory manager; in other words, can a process get access to physical memory?
No, as far as I know you can't manage physical memory unless you run your program without support from an OS. Because each process has its own virtual address space, everything you do related to memory management happens in virtual memory.
A process has one or more tasks (scheduled by the kernel), which for a multi-threaded process are the process's threads (and for a single-threaded process, the one task running the process), and it has an address space (and some other resources, e.g. open file descriptors).
Of course, the address space is in virtual memory. The kernel is allowed to swap pages out (e.g. to the swap area on your disk), but it tries hard to avoid doing so: swapping pages to disk is very slow, because a disk access takes milliseconds on a rotating disk, while a RAM access takes around a tenth of a microsecond.
The text, data, bss etc. are virtual memory segments, which are memory mappings. You can think of a process's address space as a memory map. The mmap(2) system call is the way to modify it. When an executable is started with the execve(2) system call, the kernel establishes a few mappings (e.g. for text, data, bss, stack, ...). The brk(2) system call (used by sbrk()) also changes it. Most malloc implementations use mmap (at least for big enough zones) and sometimes sbrk.
You can prevent a memory range from being swapped out by locking it into RAM with the mlock(2) syscall; locking more than the small RLIMIT_MEMLOCK allowance requires the CAP_IPC_LOCK capability (typically root). It is rarely useful in practice (unless you code real-time applications). There is also the msync(2) syscall (to flush memory to disk); you can of course map a portion of a file into virtual memory (using mmap), change the protection with mprotect(2), remove a mapping with munmap(2), extend a mapping with mremap(2) (a Linux-specific syscall), and you could even catch the SIGSEGV signal and handle it (often in a machine-specific way). The madvise(2) syscall lets you tune paging with hints.
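A sketch exercising two of those calls, assuming Linux with _GNU_SOURCE (mremap is Linux-specific; the function name grow_mapping_keeps_data is hypothetical): mprotect() makes a page read-only, then mremap() with MREMAP_MAYMOVE grows the mapping to four pages, possibly moving it to a new virtual address, while the existing contents are preserved.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Creates a one-page mapping, copies msg into it, makes it read-only with
 * mprotect(2), then grows it to four pages with mremap(2), letting the
 * kernel move it if needed. Returns 1 if msg survived, 0 otherwise. */
int grow_mapping_keeps_data(const char *msg) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(strlen(msg) < page);
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 0;
    strcpy(p, msg);                  /* fresh anonymous memory is zero-filled */

    /* From now on a store into p would raise SIGSEGV. */
    if (mprotect(p, page, PROT_READ) != 0) { munmap(p, page); return 0; }

    char *q = mremap(p, page, 4 * page, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) { munmap(p, page); return 0; }

    /* The grown range keeps PROT_READ and the original page contents. */
    int same = strcmp(q, msg) == 0;
    munmap(q, 4 * page);
    return same;
}
```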
You can inspect the memory map of the process with pid 1234 by reading the /proc/1234/maps file (or /proc/1234/smaps). (From inside an application, you can use /proc/self/ instead of /proc/1234/.) I suggest running in a terminal:
cat /proc/self/maps
which will show you the memory map of the process running that cat command. You can also use the pmap utility.
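The same information is available programmatically. A sketch assuming Linux: address_is_mapped (a hypothetical helper) parses the "start-end" hexadecimal range at the beginning of each /proc/self/maps line and checks whether an address falls inside one of them, which makes it easy to see an mmap() call add a new entry.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if addr lies inside a range listed in /proc/self/maps, 0 if
 * not, -1 if the file cannot be opened. Each line starts with "start-end"
 * in hex, followed by permissions, offset, device, inode and path. */
int address_is_mapped(const void *addr) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) return -1;
    uintptr_t a = (uintptr_t)addr;
    char line[512];
    int found = 0;
    while (!found && fgets(line, sizeof line, f)) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 && a >= start && a < end)
            found = 1;
    }
    fclose(f);
    return found;
}
```

For example, after `void *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);` the call address_is_mapped(p) should return 1, while a low address such as 42 returns 0.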
Recent Linux kernels provide Address Space Layout Randomization (so two similar processes running the same program on the same input have different mmap-ed and malloc-ed addresses). You can disable it through /proc/sys/kernel/randomize_va_space.
Except in very rare circumstances (uClinux), processes only see virtual memory, which is mapped to physical memory by the kernel.
The kernel can be asked to make specific mappings that give a predictable physical address for a given virtual address; you need the appropriate capability to do that however, as this breaks down the process separation.
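A sketch of looking up a virtual page's translation from user space, assuming Linux 4.2 or later: /proc/self/pagemap holds one 64-bit entry per virtual page, with bit 63 meaning "present in RAM", bit 62 "swapped" and bits 0-54 the page frame number (PFN). Unprivileged processes read the PFN bits as zero (CAP_SYS_ADMIN is needed to see them), which is the kernel keeping physical addresses away from processes without the capability. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Reads the 64-bit /proc/self/pagemap entry for the page containing addr.
 * Returns 0 on success, -1 on failure. */
int pagemap_entry(const void *addr, uint64_t *entry) {
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) return -1;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)((uintptr_t)addr / page) * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof *entry, off);
    close(fd);
    return n == (ssize_t)sizeof *entry ? 0 : -1;
}

/* Bit 63: page present in RAM. */
int page_is_present(uint64_t entry) { return (int)((entry >> 63) & 1); }

/* Bits 0-54: physical page frame number (reads as 0 without CAP_SYS_ADMIN). */
uint64_t page_frame_number(uint64_t entry) {
    return entry & ((UINT64_C(1) << 55) - 1);
}
```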
On execve, the current mappings are replaced by the loadable segments from the ELF file specified; these are mapped so that referenced pages are loaded from the ELF file (some initial readahead is also performed). The brk system call mainly extends the non-executable mapping with the highest addresses (excluding the stack mapping) by a few pages, allowing the process to access more virtual addresses without being sent a SIGSEGV.
The heap is generally managed by the process internally, but the virtual address space assigned to heap objects must be known to the virtual memory manager beforehand in order to create a mapping. malloc will generally look into its internal tables for a region that is already mapped and usable, and if none can be found, use either brk() or mmap() to create more mappings.
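A sketch of the program break moving, assuming Linux with glibc, where sbrk() is a library wrapper around brk(2) (break_growth is a hypothetical name). Once the break is raised by incr bytes, addresses up to the new break can be written without a SIGSEGV; the function then lowers the break again. Real programs should leave the break to malloc, since moving it behind malloc's back can corrupt the heap.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Raises the program break by incr bytes, writes into the newly valid
 * range, then lowers the break again. Returns how far the break moved
 * (0 on failure). */
intptr_t break_growth(intptr_t incr) {
    assert(incr > 0);
    char *old_brk = sbrk(0);          /* current end of the heap mapping */
    if (sbrk(incr) == (void *)-1) return 0;
    char *new_brk = sbrk(0);

    old_brk[0] = 'x';                 /* now inside the extended mapping */
    old_brk[incr - 1] = 'y';

    sbrk(-incr);                      /* hand the range back to the kernel */
    return new_brk - old_brk;
}
```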
