I have a PCI device that needs to read and write from userspace. I'm trying to use zero copy; is there a way to allocate, pin, and get the physical address of a userspace address completely within userspace or do I need to have a kernel module that, say, calls virt_to_phys or get_user_pages? The device's memory is mapped into userspace memory via MMIO so I can pass it any data that's needed. Thanks.
It was a total hack, but I limited Linux to a range of memory and used MMIO to allocate memory for my device that the kernel was unaware of.
Basically you need the memory to be DMA-able, and as far as I know only a kernel module can do that. See http://lxr.free-electrons.com/source/Documentation/PCI/PCI-DMA-mapping.txt
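For what it's worth, the usual pure-userspace attempt is to lock the buffer with mlock() and then look its page up in /proc/self/pagemap. Below is a minimal sketch of the lookup half only. It is not taken from the answers above: the helper names (pagemap_entry, virt_to_phys_user) are made up for illustration, and it assumes Linux with 64-bit pagemap entries (bit 63 = present, bits 0-54 = PFN). Since Linux 4.2 the PFN field reads as zero without CAP_SYS_ADMIN, and mlock() keeps a page resident without stopping the kernel from migrating it, so the address is not guaranteed to stay valid for DMA. That is why the kernel-module route is still the dependable one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)        /* page is resident in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

static inline int pagemap_present(uint64_t entry)
{
    return (entry & PM_PRESENT) != 0;
}

static inline uint64_t pagemap_pfn(uint64_t entry)
{
    return entry & PM_PFN_MASK;
}

/* Read the 64-bit pagemap entry that describes virtual address va.
 * Returns 0 on success, -1 if pagemap cannot be opened or read. */
static int pagemap_entry(const void *va, uint64_t *entry)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)((uintptr_t)va / (uintptr_t)page) * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof(*entry), off);
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/* Physical address backing va at the moment of the call, or 0 when the page
 * is not resident or the kernel hides the PFN (no CAP_SYS_ADMIN). */
static inline uint64_t virt_to_phys_user(const void *va)
{
    uint64_t e;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || pagemap_entry(va, &e) != 0 ||
        !pagemap_present(e) || pagemap_pfn(e) == 0)
        return 0;
    return pagemap_pfn(e) * (uint64_t)page + (uintptr_t)va % (uintptr_t)page;
}
```

A result of 0 from virt_to_phys_user() means "unknown", not "physical address zero".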
Related
I'm currently programming a Linux Kernel driver, which needs to tell a FPGA a base address in RAM to write to.
The memory is allocated in the kernel driver with dma_alloc_coherent.
This generates a 32-bit physical address and a kernel virtual address; the physical address is passed to the FPGA.
The FPGA is a Cyclone V with embedded ARM Cortex-A9, on which an embedded Linux with the driver is running.
The problem now is that the FPGA fabric only generates a 27-bit-wide bus to address the SDRAM, while the physical address generated by the DMA call has 32 bits. For example, the physical address has been 0x2f220000, which exceeds the 27-bit span.
I want to know whether it is okay to mask off the 5 most significant bits and tell the FPGA the address 0x7220000, and still get the correct behaviour. (The documentation states that the physical address shall be cast to the bus width, which would mean masking, because I can't use 27 bits in the processor.)
Also is it okay to access the DMA memory with a simple memcpy command, that copies from the kernel virtual address to a buffer?
Thanks in advance.
The answer truly depends on the physical memory layout of your device. If the address bus of the FPGA supplies the missing bits so that the actual address resolves to the correct memory, then masking is probably okay. If not, then it is possible that the memory the Linux kernel returned to you is simply inaccessible to the FPGA. If that's the case, you will have to find a way to ask Linux to only give you buffers from memory that is accessible.
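To make the arithmetic concrete, here is a small userspace sketch that checks whether an address fits in a given bus width and shows what masking does to it. The helper names (bus_mask, bus_can_reach, bus_truncate) are hypothetical; only the 27-bit width and the address 0x2f220000 come from the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All-ones mask covering the low `bits` address bits (1..32). */
static inline uint32_t bus_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    return bits == 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;
}

/* True if every set bit of phys lies inside the bus width, i.e. the
 * device can express this address without losing information. */
static inline bool bus_can_reach(uint32_t phys, unsigned bits)
{
    return (phys & ~bus_mask(bits)) == 0;
}

/* What the device actually sees if the upper bits are simply dropped. */
static inline uint32_t bus_truncate(uint32_t phys, unsigned bits)
{
    return phys & bus_mask(bits);
}
```

With these, bus_truncate(0x2f220000, 27) gives 0x07220000, and bus_can_reach(0x2f220000, 27) is false. The masked value only points at the same memory if the hardware supplies the dropped upper bits itself. If it does not, one option that may apply (an assumption, depending on the platform) is to call dma_set_mask_and_coherent() with DMA_BIT_MASK(27) in the driver before allocating, so the kernel only hands out buffers the device can address.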
I am using mmap to open /dev/mem for read/write into UART registers.
It works well but my question is :
After a write, is the msync system call with MS_SYNC flag really needed ?
From my understanding, /dev/mem is a virtual device that provides access to physical memory zones (UART registers in my case) by translating virtual memory addresses, and so gives access to some physical memory from user space.
This is not a regular file, and I guess that modifications of registers are not buffered/cached. I would actually like to avoid this system call for performance reasons.
Thanks
My understanding is that msync() is needed to update the data in a normal file that is modified through a mapping created with mmap().
But when you use mmap on /dev/mem you are not mapping a normal file on disk; you are just mapping the desired hardware memory range directly into your process's virtual address space, so msync() is beside the point and will do nothing.
The only thing that sits between your writes to the mmapped region and the hardware device is the CPU cache. You could force a cache flush (__clear_cache() maybe?), but that is usually unnecessary because the kernel recognizes memory-mapped device registers and disables caching for that range. On x86 CPUs that is usually done with MTRRs, but on ARM I don't know the details...
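As an illustration of the access pattern, here is a minimal sketch that maps a register window and writes to it through a volatile pointer, with no msync() call. The helper names (map_regs, reg_write32, reg_read32) are made up for this example. For real registers, fd would be /dev/mem, commonly opened with O_RDWR | O_SYNC to ask for an uncached mapping, and phys would be the UART base address from your SoC's datasheet (not given in the question).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `len` bytes starting at byte offset `phys` of the file behind `fd`.
 * mmap() needs a page-aligned offset, so map from the enclosing page and
 * return a pointer adjusted to `phys`. The page-aligned base and the mapped
 * length are reported through map_base/map_len for a later munmap().
 * Returns NULL on failure. */
static inline volatile uint32_t *map_regs(int fd, off_t phys, size_t len,
                                          void **map_base, size_t *map_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;
    assert((phys & 3) == 0);            /* 32-bit registers are 4-byte aligned */

    off_t base = phys & ~((off_t)page - 1);
    size_t skew = (size_t)(phys - base);

    void *p = mmap(NULL, len + skew, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, base);
    if (p == MAP_FAILED)
        return NULL;

    *map_base = p;
    *map_len = len + skew;
    return (volatile uint32_t *)((char *)p + skew);
}

/* volatile keeps the compiler from merging or dropping register accesses. */
static inline void reg_write32(volatile uint32_t *regs, size_t byte_off,
                               uint32_t value)
{
    regs[byte_off / sizeof(uint32_t)] = value;
}

static inline uint32_t reg_read32(volatile uint32_t *regs, size_t byte_off)
{
    return regs[byte_off / sizeof(uint32_t)];
}
```

The volatile qualifier only constrains the compiler. Whether the CPU caches or reorders the access is decided by how the kernel set up the mapping, as described above.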
I've got a user-mode process and kernel module. Now I want to read certain regions of usermode process from kernel, but there's one catch: no copying of usermode memory and simple access by VA.
So what we have: task_struct for target process, other related structs (like mm_struct, vma_struct) and virtual address like 0x0070abcd that I want to read or rather map somehow to my kernel module.
I can get page list using get_user_pages for desired memory regions, but what next? Should I map pages somehow into kernel and then try to read them as continuous memory region or there are better solutions?
The problem is that "looking" at user-space requires locking a ton of stuff. So it's better that you do a short copy than leave everything locked for arbitrary amounts of time. Your user-space process may not be VM-mapped into the current CPU. In fact, it may be entirely swapped out to disk, running on another CPU, in the middle of its own kernel call, etc.
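If you do end up pinning pages with get_user_pages, the first thing you need is the number of pages the range touches, since the call takes a start address and a page count. A minimal sketch of that calculation follows; the helper name pages_spanned is hypothetical, and the page size is passed in as a parameter here, where kernel code would use PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr + len).
 * page_size must be a power of two and len must be non-zero. */
static inline size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    assert(len > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uintptr_t mask  = ~(uintptr_t)(page_size - 1);
    uintptr_t first = addr & mask;               /* page holding first byte */
    uintptr_t last  = (addr + len - 1) & mask;   /* page holding last byte  */
    return (size_t)((last - first) / page_size) + 1;
}
```

For the address in the question, a 4096-byte read starting at 0x0070abcd is not page-aligned and therefore spans two 4 KiB pages, so two pages would have to be pinned and mapped.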
I've been struggling with this one, would really appreciate some help. I want to use the internal SRAM (stepping stone - not used after boot) of my At91sam9g45 to speed up some intensive computations and am having trouble meeting all the following conditions:
Memory is accessible from user space. This was easy using the user space mmap() and then kernel remap_pfn_range(). Using the pointer returned, my user space programs can read/write to the SRAM.
Using the kernel DMA API call dma_async_memcpy_buf_to_buf() to do a memcpy using DMA. Within my basic driver, I want to call this operation to copy data from DDR( allocated with kmalloc()) into the SRAM buffer.
So my problem is that I have the user space and physical addresses, but no kernel-space DMA API friendly mapping.
I've tried using ioremap and using the fixed virtual address provided to iotable_init(). Neither of these seems to result in a kernel virtual address that can be used with something like virt_to_bus (which works for the kmalloc addresses and, I think, is used within the DMA API).
There's a way around this, and that's just triggering the DMA manually using the physical addresses, but I'd like to try and figure this out. I've been reading through LDD3 and googling, but I can't see any examples of using non-kmalloc memory for the DMA API (except for PCI buses).
LDD3 (p:453) demos dma_map_single using a buffer passed in as a parameter.
bus_addr = dma_map_single(&dev->pci_dev->dev, buffer, count, dev->dma_dir);
Q1: What/where does this buffer come from?
kmalloc?
Q2: Why does DMA-API-HOWTO.txt state I can use raw kmalloc to DMA into?
From http://www.mjmwired.net/kernel/Documentation/DMA-API-HOWTO.txt
L:51 If you acquired your memory via the page allocator kmalloc() then you may DMA to/from that memory using the addresses returned from those routines.
L:74 you cannot take the return of a kmap() call and DMA to/from that.
So I can pass the address returned from kmalloc to my hardware device?
Or should I run virt_to_bus on it first?
Or should I pass this into dma_map_single?
Q3: When the DMA transfer is complete, can I read the data in the kernel driver via the kmalloc address?
addr = kmalloc(...);
...
printk("test result : 0x%08x\n", addr[0]);
Q4: What's the best way to get this to user-space?
copy_to_user?
mmap the kmalloc memory?
others?
kmalloc is indeed one source to get the buffer. Another can be alloc_page with the GFP_DMA flag.
This means that the memory kmalloc returns is guaranteed to be contiguous in physical memory, not just virtual memory, so you can give the bus address of that pointer to your hardware. You do need to call dma_map_single() on the returned address; depending on the exact platform, that might be no more than a wrapper around virt_to_bus, or it might do more than that (set up IOMMU or GART tables).
Correct, just make sure to follow cache coherency guidelines as the DMA guide explains.
copy_to_user will work fine and is the easiest answer. Depending on your specific case it might be enough, or you might need something with better performance. You cannot normally map kmalloc'ed addresses to user space, but you can DMA into a user-provided address (some caveats apply) or allocate user pages (alloc_page with GFP_USER).
Good luck!