Mmap DMA Coherent Memory to User Space - linux

I am trying to map DMA coherent memory, which I allocated in my kernel driver, to user space. In user space I call mmap(); in the kernel driver I allocate with dma_alloc_coherent() and then call remap_pfn_range() to remap the pages.
The purpose of mapping the DMA memory to user space is to minimize ioctl() calls into the kernel. The host performs a large number of accesses to the DMA coherent memory, and I want to do them directly from user space instead of spending time on countless ioctl() operations.
However, mmap() fails with EPERM (1): Operation not permitted.
I found this post: mmap: Operation not permitted
Answer:
It sounds like the kernel has been compiled with CONFIG_STRICT_DEVMEM
enabled. This is a security feature to prevent user space access to
(possibly sensitive) physical memory above 1MB (IIRC). You might be
able to disable this with sysctl dev.mem.restricted.
That is the only useful info I've found. However, I see 2 issues:
1) For test purposes I've allocated only 4k. According to the statement above, only physical memory above 1MB should be a problem, yet I still can't mmap. (The final driver will need a lot more DMA memory anyway.)
2) Recompiling the kernel is not an option, as the driver should work without tweaking the kernel in a specific way.
Any ideas on this one? I appreciate the help.
I am using Ubuntu 16.04.1, Kernel: 4.10.0-40-generic
EDIT: SOLVED
I made a copy-paste mistake that left ret = -1. As a result, the .mmap function in the kernel driver, which calls remap_pfn_range(), returned -1 instead of 0. Since -1 is -EPERM, mmap() in user space failed with EPERM.

Related

Is msync always really needed when writing to /dev/mem?

I am using mmap to open /dev/mem for read/write into UART registers.
It works well, but my question is:
After a write, is the msync system call with the MS_SYNC flag really needed?
From my understanding, /dev/mem is a virtual device that provides access to physical memory regions (UART registers in my case) by translating virtual memory addresses, and so gives user space access to some physical memory.
This is not an ordinary file, and I guess that modifications of registers are not buffered or cached. I would like to avoid this system call for performance reasons.
Thanks
My understanding is that msync() is needed to update the data in a normal file that is modified through a mapping created with mmap().
But when you use mmap on /dev/mem you are not mapping a normal file on disk. You are mapping the desired hardware memory range directly into your process's virtual address space, so msync() is irrelevant; it will do nothing.
The only thing that lies between your writes into the mmapped virtual space and the hardware device is the CPU cache. You could force a cache flush (__clear_cache() maybe?), but that is usually unnecessary because the kernel identifies the memory-mapped device registers and disables caching for that range. On x86 CPUs that is usually done with MTRRs, but on ARM I don't know the details...

is there a linux kernel module to perform contiguous physical memory allocation?

I have read this.
But I am working with PCI UIO, and I therefore need contiguous physical memory. I am talking about (among other things) virtio, so the presence of an IOMMU won't help here. PCI VFIO has a much nicer and more secure approach, I agree, but the IOMMU virtualisation techniques are not mature, as far as I understand.
Writing a PCI UIO virtio driver in user space running on a guest requires physically contiguous memory. Besides, not all hardware has an IOMMU, so even on such simpler host systems VFIO cannot be used, and there is a need for contiguous physical memory...
So, in short, as long as IOMMUs are not everywhere and are not properly emulated by virtualizers, it seems that allocation of contiguous physical memory from user space is needed.
I am aware that a user can READ the page mapping from /proc/<pid>/... and that this allows a trial-and-error approach, which is what DPDK uses...
But it feels like contiguous physical memory allocation should typically be the job of a kernel module... and that I cannot be the first one facing this situation...
Does such a kernel module exist?

mmap and kernel memory

I understand from mmap() internals that a mmap read works by
- causing a page fault
- copying file data from disk to internal kernel buffer
- mapping the kernel buffer to user space
My questions are:
What happens to the kernel's mapping of the buffer? If it still exists, don't we have a problem here, with a user application gaining access to kernel memory?
Can't we run out of physical memory this way? I'd assume the kernel needs a minimum amount of physical memory to provide a decent level of performance, and if we keep handing its buffers over to mmapped user space buffers, we'd eventually run out of buffers.
During a write, does the relevant memory get mapped temporarily to a kernel buffer? If so, and this is a shared mapping, another user process may access it and again gain access to what is now kernel memory.
Thanks, and sorry if these questions are pretty basic, but I did not find a clear answer.
I'm not a kernel hacker by any means, but this is what I've gathered:
I'm not entirely sure when it comes to the question of whether the kernel "relinquishes" its mapping to physical memory, since the kernel can access any physical memory it pleases. However, it would obviously be impermissible for the kernel to keep using that physical memory for its own purposes (e.g. as an internal pipe buffer) if user processes can access that memory as well, for the sake of both the user process and for the sake of the kernel. The kernel will simply designate those pages as part of the filesystem cache (if backed by a file) and not mess with them.
Yes, to the same extent that any process or number of processes can limit the amount of physical memory present for the kernel by requesting lots of resources like pipes. However, the kernel keeps track of how much physical memory is available and will start to page out userland memory to the disk when the remaining amount of physical memory runs low. Kernel memory itself typically should not be paged out to the disk for reasons including performance. Though the nice thing about mmap()ed memory backed by a file is that it's trivial to page out to the disk; no swap space needs to be allocated.
If you mean a write to available memory mapped to userland virtual address space (i.e. memcpy(), not write()), no. The whole point of mmap() is to map userland virtual address space to physical memory to allow reads and writes without resorting to system calls. Syncs to the disk will be performed directly by the kernel without additional copying to kernel buffers.

For arm Linux, could threads in user space access virtual address of Kernel space?

Virtual memory is split into two parts. Traditionally, 0~3GB is for user space and 3GB~4GB is for kernel space.
My question:
Could a thread in user space access kernel space memory?
According to the ARM datasheet, access permissions are controlled by the domain access control register. But in the kernel source code, the domain value in the page table entries for user space virtual memory is the same as in the kernel space page table entries.
In fact, your application might access page 0xFFFF0000, as it contains the swi handler and a couple of other user space helpers. So no, the 3/1 split is nothing magical; it's just very easy for the kernel to manage.
Usually the kernel will set up all memory above 3GB to be accessible only by the kernel domain itself. If a driver needs to share memory between user and kernel space, it will usually provide an mmap interface, which then creates an aliased mapping, so you have two virtual addresses for the same physical address. This only works reliably on VIPT cache systems or with a LOT of careful explicit cache flushing. If you don't want this you CAN hack the kernel to make a chunk of memory ABOVE the 3G split accessible to user space. But then all user space applications will share this memory. I've done this once for a special application on an ARMv5 system.
Userspace code getting Kernel memory? The only kernel that ever allowed that was DOS and its archaic friends.
But back to the question, look at this example C code:
char *c = (char *)42;
*c = 42;
We take a pointer to char and point it at address 42. We then write through it, which tries to access the 42nd byte of virtual memory. That is almost definitely not your memory; for the sake of this example, pretend it is kernel memory. Guess what happens when you run this:
Segmentation fault
Linux has memory protection like any modern operating system. If you touch an address that is not mapped into your process, or one that belongs to the kernel, your process receives SIGSEGV and is terminated before it can do anything. Another process's memory is not even addressable from yours, since each process has its own virtual address space. Even root programs cannot simply dereference kernel memory or another program's memory; they have to go through interfaces the kernel provides, such as ptrace() or /proc/<pid>/mem. The only way to access kernel memory is to be part of the kernel, or indirectly through the kernel's cooperation.

Force Linux to use only memory over 4G?

I have a Linux device driver that interfaces to a device that, in theory, can perform DMA using 64-bit addresses. I'd like to test to see that this actually works.
Is there a simple way that I can force a Linux machine not to use any memory below physical address 4G? It's OK if the kernel image is in low memory; I just want to be able to force a situation where I know all my dynamically allocated buffers, and any kernel or user buffers allocated for me are not addressable in 32 bits. This is a little brute force, but would be more comprehensive than anything else I can think of.
This should help me catch (1) hardware that wasn't configured correctly or loaded with the full address (or is just plain broken) as well as (2) accidental and unnecessary use of bounce buffers (because there's nowhere to bounce to).
clarification: I'm running x86_64, so I don't care about most of the old 32-bit addressing issues. I just want to test that a driver can correctly interface with multitudes of buffers using 64-bit physical addresses.
/usr/src/linux/Documentation/kernel-parameters.txt

memmap=exactmap [KNL,X86] Enable setting of an exact
        E820 memory map, as specified by the user.
        Such memmap=exactmap lines can be constructed based on
        BIOS output or other requirements. See the memmap=nn@ss
        option description.

memmap=nn[KMG]@ss[KMG]
        [KNL] Force usage of a specific region of memory
        Region of memory to be used, from ss to ss+nn.

memmap=nn[KMG]#ss[KMG]
        [KNL,ACPI] Mark specific memory as ACPI data.
        Region of memory to be used, from ss to ss+nn.

memmap=nn[KMG]$ss[KMG]
        [KNL,ACPI] Mark specific memory as reserved.
        Region of memory to be used, from ss to ss+nn.
        Example: Exclude memory from 0x18690000-0x1869ffff
                memmap=64K$0x18690000
                or
                memmap=0x10000$0x18690000

If you add memmap=4G$0 to the kernel's boot parameters, the lower 4GB of physical memory will no longer be accessible. Also, your system will no longer boot... but some variation hereof (memmap=3584M$512M?) may allow for enough memory below 4GB for the system to boot but not enough that your driver's DMA buffers will be allocated there.
IIRC there's an option within the kernel configuration to use PAE extensions, which will enable you to use more than 4GB (I am a bit rusty on the kernel config, the last kernel I recompiled was 2.6.4, so please excuse my lack of recall). You do know how to trigger a kernel config:
make clean && make menuconfig
Hope this helps,
Best regards,
Tom.
