Is msync() always really needed when writing to /dev/mem?

I am using mmap() on /dev/mem for read/write access to UART registers.
It works well, but my question is:
After a write, is the msync() system call with the MS_SYNC flag really needed?
From my understanding, /dev/mem is a character device that provides access to physical memory regions (UART registers in my case) by mapping physical addresses into user-space virtual addresses.
This is not a regular file, and I guess that register modifications are not buffered/cached. I would actually like to avoid this system call for performance reasons.
Thanks

My understanding is that msync() is needed to update the data in a normal file that is modified through a mapping created with mmap().
But when you use mmap on /dev/mem you are not mapping a normal file on disk; you are mapping the desired hardware memory range directly into your process's virtual address space. msync() is therefore beside the point: it will do nothing.
The only thing that lies between your writes into the mmap()ed virtual space and the hardware device is the CPU cache. To be safe you could force a cache flush (__clear_cache(), maybe?), but that is usually unnecessary, because the kernel identifies memory-mapped device registers and disables caching for that range. On x86 CPUs that is usually done with MTRRs, but for ARM I don't know the details...
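By way of illustration, here is a minimal sketch of the pattern being discussed. The UART base address and register layout are made up for this example, and the behavior of O_SYNC on /dev/mem (which on most architectures requests an uncached mapping) is architecture-dependent; volatile keeps the compiler from reordering or eliding the register accesses, so no msync() call appears anywhere:

    /* sketch: write a UART register through /dev/mem, no msync() needed */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define UART_PHYS 0x1000a000UL   /* hypothetical UART base address */

    int main(void)
    {
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open /dev/mem"); return 1; }

        void *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, UART_PHYS);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        volatile uint32_t *uart = map;
        uart[0] = 'A';               /* store goes straight to the device */

        munmap(map, 4096);
        close(fd);
        return 0;
    }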

Related

How to get writes via an mmap mapped memory pointer to flush immediately?

I'm having what appears to be a caching problem when using /dev/mem with mmap on a dual ARM processor system (Xilinx Zynq, to be exact). My configuration is asymmetric, with one processor running Linux and the other processor running a bare-metal application. They communicate through a block of RAM that isn't in the Linux virtual memory space (it was excluded by the devicetree file). When my userspace Linux application writes to memory using the pointer returned from mmap(), it can take anywhere from 100 ms to well over a second for the second processor to detect the changed memory content.
On the open() call to /dev/mem, I tried to specify O_RDWR, O_SYNC, and O_DIRECT, but O_DIRECT caused the open to fail, so I removed it. I thought O_SYNC should have guaranteed that data was written to memory before the write() call returned, but I'm writing through a memory pointer instead of using write(). I don't see any parameters on the mmap() call that would seem to address caching issues.
I've tried calling fsync(fd) and fdatasync() after writing to memory, but that didn't change the behavior.
What DID seem to work was spawning this command immediately after the memory write:
sync; echo 3 > /proc/sys/vm/drop_caches
What is the simplest way to get writes via a mapped memory pointer to flush immediately?
fsync, etc. all synchronize the memory mapped region to the backing block device (e.g., file).
They do not affect the CPU data cache. You will either need to use explicit cache clean calls to flush the CPU cache to DRAM or you will have to use the ACP port.
The ACP port is supposed to be cache coherent, but I've never gotten it to work.
Here's an answer for how to flush the cache. I believe that code needs to go in your device driver. We have that code packaged in a generic "portalmem" driver. It enables your application to allocate memory that you can share with your hardware, and it provides an ioctl for flushing the cache after your application writes to it.
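As a rough illustration of the shape such a flush ioctl can take (everything here, from the names to the ioctl number, is hypothetical and not the actual portalmem code; it assumes the driver set up the shared buffer earlier with dma_map_single()):

    /* hypothetical flush ioctl in a character driver */
    #include <linux/fs.h>
    #include <linux/ioctl.h>
    #include <linux/dma-mapping.h>

    #define PORTALMEM_FLUSH _IO('p', 1)    /* made-up ioctl number */

    struct portalmem_state {
        struct device *dev;
        dma_addr_t     dma_handle;         /* from dma_map_single() */
        size_t         size;
    };

    static long portalmem_ioctl(struct file *f, unsigned int cmd,
                                unsigned long arg)
    {
        struct portalmem_state *s = f->private_data;

        switch (cmd) {
        case PORTALMEM_FLUSH:
            /* clean the CPU data cache so the other core sees the data */
            dma_sync_single_for_device(s->dev, s->dma_handle,
                                       s->size, DMA_TO_DEVICE);
            return 0;
        default:
            return -ENOTTY;
        }
    }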

Understanding Memory Mapped Files

I have started reading about memory-mapped I/O and I'm having some difficulty grasping the concepts.
This is what I have understood so far:
Each process has a virtual address space. Memory-mapped files are allocated a specific address range in the virtual address space that maps to the same address in physical memory. This way, all the writes done by the disk controller to that memory (through DMA) will be reflected in the process without any additional copying. (In the non-memory-mapped case, the CPU has to copy the contents over to the process's buffer.)
My Doubts:
Is my understanding correct?
What will happen if multiple processes try to mmap a file and there is no contiguous block of memory available for direct mapping?
The memory subsystem itself doesn't have any understanding of "files", which are an OS concept, and there have been some operating systems that didn't use files at all. You're close but a little off in your understanding of how mmap works.
Each process does have its own virtual address space, which may have very little to do with physical memory (much of the virtual address space has no memory associated with it at all, ever, and virtual memory that's swapped out has no physical memory behind it). The system uses lookup tables (page tables, walked by the MMU) that specify which virtual address ranges map to which physical address ranges. Virtual memory that isn't "resident" (swapped out, or mmapped but not yet loaded) has a "not present" entry.
Whenever a program tries to access this memory, the CPU causes a page fault, which tells the OS to go find the appropriate contents somewhere and load them into physical memory. In the case of swap, the contents are loaded out of a swap file or partition; in the case of mmap, they're loaded out of somewhere in the filesystem.
The mechanism for getting the contents into physical memory and updating the page tables can vary. What you're describing is DMA, which lets the drive controller copy contents directly into a block of physical memory, plus zero-copy I/O, a technique where the OS just creates a new mapping to "teleport" the region of physical memory into the program's address space. Neither is technically required for mmap (the OS could load the file "by hand" and copy it into a new buffer for the program, as happens in a copy-on-write situation), but modern systems do it the way you described.
The physical memory doesn't necessarily have to be contiguous. When the POSIX version of mmap is called, the OS allocates length bytes for the mapping, but thanks to virtual memory, those bytes could be split up among multiple blocks and mapped together by the processor.
If multiple processes are trying to mmap the same file, the OS behavior depends on whether the access is read-only or read/write; read-only copies can be shared among many processes (such as the actual executable code; this is why even though Chrome may have dozens of processes running, the Chrome binary is only in memory once).
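As a concrete illustration of the fault-in behavior described above, here is a minimal POSIX sketch (error handling mostly omitted) that maps a file read-only; the first touch of each page triggers a page fault that the kernel services from the page cache:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* each first access below faults the page in from the file */
        fwrite(p, 1, st.st_size, stdout);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }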

mmap and kernel memory

I understand from mmap() internals that a mmap read works by
- causing a page fault
- copying file data from disk to internal kernel buffer
- mapping the kernel buffer to user space
My questions are:
What happens to the kernel mapping of the buffer? If it still exists, don't we have a problem here of the user application gaining access to kernel memory?
Can't we run out of physical memory this way? I'd assume the kernel needs a minimum amount of physical memory to provide a decent level of performance, and if we keep handing its buffers over to mmapped user-space mappings, we'd eventually run out of buffers.
During a write, does the relevant memory get mapped temporarily to a kernel buffer? And if this is a shared mapping, another user process may access it and again gain access to what is now kernel memory.
Thanks, and sorry if these questions are pretty basic, but I did not find a clear answer.
I'm not a kernel hacker by any means, but this is what I've gathered:
I'm not entirely sure when it comes to the question of whether the kernel "relinquishes" its mapping to physical memory, since the kernel can access any physical memory it pleases. However, it would obviously be impermissible for the kernel to keep using that physical memory for its own purposes (e.g. as an internal pipe buffer) if user processes can access it as well, for the sake of both the user process and the kernel. The kernel will simply designate those pages as part of the filesystem cache (if backed by a file) and not mess with them.
Yes, to the same extent that any process or number of processes can limit the amount of physical memory available to the kernel by requesting lots of resources like pipes. However, the kernel keeps track of how much physical memory is available and will start to page out userland memory to disk when the remaining amount runs low. Kernel memory itself typically should not be paged out to disk, for reasons including performance. The nice thing about mmap()ed memory backed by a file is that it's trivial to page out to disk; no swap space needs to be allocated.
If you mean a write to available memory mapped to userland virtual address space (i.e. memcpy(), not write()), no. The whole point of mmap() is to map userland virtual address space to physical memory to allow reads and writes without resorting to system calls. Syncs to the disk will be performed directly by the kernel without additional copying to kernel buffers.
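The no-extra-syscall point is easy to demonstrate. In this sketch (the file name shared.bin is made up), the child writes through a shared file-backed mapping with a plain string copy, and the parent reads the data back through its own pointer, with no read()/write() involved:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("shared.bin", O_RDWR | O_CREAT, 0600);
        ftruncate(fd, 4096);

        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        if (fork() == 0) {          /* child: store through the mapping */
            strcpy(p, "hello from the child");
            _exit(0);
        }
        wait(NULL);
        printf("%s\n", p);          /* parent sees the store directly */
        return 0;
    }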

For arm Linux, could threads in user space access virtual address of Kernel space?

Virtual memory is split into two parts. Traditionally, 0-3GB is for user space and 3-4GB for kernel space.
My question:
Could a thread in user space access memory in kernel space?
According to the ARM documentation, access permission is controlled by the Domain Access Control Register. But in the kernel source code, the domain value in page-table entries for user-space virtual memory is the same as in kernel-space page-table entries.
In fact, your application can access the page at 0xFFFF0000, as it contains the SWI handler and a couple of other user-space helpers. So no, the 3/1 split is nothing magical; it's just very easy for the kernel to manage.
Usually the kernel will set up all memory above 3GB to be accessible only by the kernel domain itself. If a driver needs to share memory between user and kernel space, it will usually provide an mmap interface, which creates an aliased mapping, so you have two virtual addresses for the same physical address. This only works reliably on VIPT-cache systems or with a LOT of careful explicit cache flushing. If you don't want that, you CAN hack the kernel to make a chunk of memory ABOVE the 3GB split accessible to user space, but then all user-space applications will share this memory. I've done this once for a special application on an ARMv5 system.
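The mmap interface mentioned above typically boils down to a remap_pfn_range() call in the driver's mmap handler. A hedged sketch, with shared_phys standing in for the physical address of whatever buffer the hypothetical driver owns (a real driver would also bounds-check the requested size):

    #include <linux/fs.h>
    #include <linux/mm.h>

    static phys_addr_t shared_phys;    /* set when the buffer is allocated */

    /* hypothetical mmap handler: alias the driver's buffer into the
     * calling process, giving the same physical memory a second
     * (user-space) virtual address */
    static int mydrv_mmap(struct file *filp, struct vm_area_struct *vma)
    {
        unsigned long size = vma->vm_end - vma->vm_start;

        return remap_pfn_range(vma, vma->vm_start,
                               shared_phys >> PAGE_SHIFT,
                               size, vma->vm_page_prot);
    }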
Userspace code getting Kernel memory? The only kernel that ever allowed that was DOS and its archaic friends.
But back to the question, look at this example C code:
char c = 42;
*(char *)c = 42;   /* treat the value 42 as an address and write there */
We take one byte (a char) and assign it the numeric value 42. We then treat that value as an address and dereference it, which tries to write to the 42nd byte of virtual memory, which is almost certainly not mapped in your process and, for the sake of this example, is kernel memory. Guess what happens when you run this:
Segmentation fault
Linux has memory protection like any modern operating system. If you try to access memory that isn't mapped into your process, the process is terminated with a segmentation fault before the access can do anything (a debugger can catch the signal first, though). Even if that memory belonged to another userland process, you would still be terminated. Even programs running as root can't simply dereference another program's memory or kernel memory; they have to go through kernel interfaces such as ptrace(). The only way to access kernel memory directly is to be part of the kernel, or indirectly through the kernel's cooperation.

Write to a cacheable physical address in linux kernel without using ioremap or mmap

I am changing the Linux kernel scheduler to print the PID of the next process at a known physical memory location. mmap is used by user-space programs, while I read that ioremap marks the page as non-cacheable, which would slow down the execution of the program. I would like a fast way to write to a known physical address. phys_to_virt is the option that I think is feasible. Any ideas for a different technique?
PS: I am running this Linux kernel on top of QEMU. The physical address will be used by QEMU to read information sent by the guest kernel. Writing to a known I/O port is not feasible, since the device code backing this I/O device would be called every time there is an access to the device.
EDIT: I want the physical address location of the PID to be safe. How can I make sure that a physical address the kernel is using is not assigned to any process? As far as my knowledge goes, ioremap would mark the page as non-cacheable and would hence not be of great use.
The simplest way to do this would be to use kmalloc() to get some memory in the kernel. Then you can get the physical address of the pointer it returns by passing it to virt_to_phys(). This is a total hack, but for your case of debugging/tracing under QEMU, it should work fine.
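For the kmalloc() route, a minimal sketch of the idea (the variable name pid_slot and the log format are made up; module boilerplate omitted):

    #include <linux/module.h>
    #include <linux/slab.h>
    #include <linux/io.h>              /* virt_to_phys() */

    static u32 *pid_slot;              /* hypothetical next-pid slot */

    static int __init pid_slot_init(void)
    {
        pid_slot = kmalloc(sizeof(*pid_slot), GFP_KERNEL);
        if (!pid_slot)
            return -ENOMEM;

        /* print the physical address so it can be handed to QEMU */
        pr_info("pid slot: virt=%p phys=%llx\n", pid_slot,
                (unsigned long long)virt_to_phys(pid_slot));
        return 0;
    }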
EDIT: I misunderstood the question. If you want to use a specific physical address, there are a couple of things you could do. Maybe the cleanest would be to modify the e820 map that QEMU passes in to mark the RAM page as reserved; then the kernel won't use it (i.e. the same way that ACPI tables are passed in).
If you don't want to modify qemu, you could also modify the early kernel startup (around arch/x86/kernel/setup.c probably) to do reserve_bootmem() on the specific physical page you want to protect from being used.
To actually use the specified physical address, you can just use ioremap_cache() the same way the ACPI drivers access their tables.
It seems I misunderstood the part about cache coherency between the VM and the host; here is an updated answer.
What you want is "virtual address in the VM" <-> "virtual or physical address in the QEMU address space".
Then you can either kmalloc it (but the address may vary from instance to instance), or simply declare a global variable in the kernel.
Then virt_to_phys() will give you the physical address in VM space, and I suppose you can translate this into a QEMU address-space address. What do you mean by "a physical address that the kernel is using is not assigned to any process"? Are you afraid the page containing your variable might be swapped out? kmalloc'ed memory is not swappable.
Original (and wrong) answer
If the address where you want to write is in its own page, I can't see how an ioremap of that page would slow down code executing in a different page.
You need a cache flush anyway, and without SSE, I can't see how you can bypass the cache when the MMU and cache are on. I can see only these two options:
- ioremap and declare that particular page non-cacheable
- use a "normal" address, and manually do a cache flush each time you write
