I have a buffer coming in from user space which needs to be filled with device registers as a debugging mechanism. Is it safe to use copy_to_user()/copy_from_user() for device memory? If not, what is the best alternative, given that the device driver lives in kernel space?
All the comments are wrong.
For any data moves between user and kernel space, you have to use copy_from_user()/copy_to_user().
memcpy_fromio()/memcpy_toio() are reserved for addresses in kernel space and MMIO. It is unsafe to use those functions with user-space addresses.
Answer:
You can use copy_from_user()/copy_to_user() directly with the mapped MMIO address as the void *to or void *from argument, so you don't need a useless intermediate buffer.
This should be used only with prefetchable memory, since these functions may read/write the same memory several times and/or in an unordered way.
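For illustration, a minimal sketch of a read() handler along these lines; REGS_LEN and the private_data setup are hypothetical, and the __force cast merely silences sparse, on the assumption stated above that the region is prefetchable:

#include <linux/fs.h>
#include <linux/io.h>
#include <linux/uaccess.h>

#define REGS_LEN 0x1000    /* hypothetical size of the register window */

static ssize_t regs_read(struct file *filp, char __user *buf,
                         size_t count, loff_t *ppos)
{
    void __iomem *regs = filp->private_data;  /* assumed set at open() */

    if (*ppos >= REGS_LEN || count > REGS_LEN - *ppos)
        return -EINVAL;

    /* Copy straight from the mapped registers to the user buffer;
     * per the answer above, only valid for prefetchable memory. */
    if (copy_to_user(buf, (__force const void *)(regs + *ppos), count))
        return -EFAULT;

    *ppos += count;
    return count;
}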
I have a custom device driver that implements an mmap operation to map a shared RAM buffer (outside of the OS) to userspace. The buffer is reserved by passing mem=32M as a boot argument for the OS, leaving the rest of the 512MB available as a buffer. I would like to perform zero-copy operations from the mapped memory, which is not possible if the vm_flags include VM_PFNMAP and VM_IO.
My driver currently performs the mapping by calling vm_iomap_memory(vma, start, size), which in turn calls io_remap_pfn_range and remap_pfn_range, which set up the vma with VM_PFNMAP and VM_IO set. This works to map the memory to userspace, but zero-copy socket operations fail at get_user_pages, due either to the VM_PFNMAP flag being set or to the struct page being missing. The comments for remap_pfn_range show this is intended behavior, as pfn-mapped memory should not be treated as 'normal'. However, in my case it is just a block of reserved RAM, so I don't see why it should not be treated as normal. I have set up cache invalidation/flushing to manually manage the memory.
I have tried unsetting the VM_PFNMAP and VM_IO flags on the vm_area_struct both during and after the mapping, but get_user_pages still fails. I have also looked at the DMA libraries, but it looks like they rely on a call to remap_pfn_range behind the scenes.
My question is how do I map physical memory as a normal, non-pfn, struct page-backed userspace address? Or is there some other way I should be looking at it? Thanks!
I've found the solution to mapping a memory buffer outside the kernel; it requires correcting several wrong starting points that I mentioned above. It's not possible to post full source code here, but the steps to get it working are:
Device tree: define a reserved-memory region for the buffer with no associated driver. Do not use the mem or memmap bootargs. The kernel will confine itself to memory outside of this reserved space, but will still be able to create struct pages for the reserved memory.
In a device driver (an LKM in my case), mapping the physical address to a kernel virtual address requires using memremap instead of ioremap, as it is real memory we are mapping.
In the device driver's mmap routine, do not use any variant of remap_pfn_range to set up the vma for userspace; instead assign a custom fault ('nopage') routine to vma->vm_ops.fault to look up the page when the userspace virtual address is used. This approach is described in LDD3 ch. 15; a sketch follows below.
The nopage function in the driver should use the vm_fault structure argument passed to it to calculate the offset into the vma for the address that needs a page. It should then use that offset to calculate a kernel virtual address (against the memremap'd base), get the page with page = virt_to_page(pageptr);, follow that with get_page(page);, and assign it to the vm_fault structure with vmf->page = page;. The latter part is illustrated in LDD3 chapter 15 as well.
Memory mapped in this fashion, using mmap against the custom device driver, can be used just like normal malloc'd memory as far as I can tell. There are probably ways to achieve a similar result with the DMA libraries, or by associating the device tree node with the driver, but I had constraints preventing those routes.
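Here is a rough sketch of the driver side of those steps; RES_BASE and RES_SIZE are placeholders for the addresses published by the reserved-memory node, module boilerplate is omitted, and the modern vm_fault_t fault signature is assumed:

#include <linux/fs.h>
#include <linux/io.h>
#include <linux/mm.h>

#define RES_BASE 0x20000000UL  /* hypothetical reserved region start */
#define RES_SIZE 0x1e000000UL  /* hypothetical reserved region length */

static void *res_virt;         /* kernel virtual base from memremap() */

static vm_fault_t res_fault(struct vm_fault *vmf)
{
    unsigned long offset = vmf->pgoff << PAGE_SHIFT;
    struct page *page;

    if (offset >= RES_SIZE)
        return VM_FAULT_SIGBUS;

    /* Turn the faulting offset into the struct page backing it. */
    page = virt_to_page(res_virt + offset);
    get_page(page);
    vmf->page = page;
    return 0;
}

static const struct vm_operations_struct res_vm_ops = {
    .fault = res_fault,
};

static int res_mmap(struct file *filp, struct vm_area_struct *vma)
{
    /* No remap_pfn_range(); pages are looked up on demand, so the vma
     * never gets VM_PFNMAP/VM_IO and get_user_pages() keeps working. */
    vma->vm_ops = &res_vm_ops;
    return 0;
}

static int res_init(void)
{
    /* Real RAM, so memremap() rather than ioremap(). */
    res_virt = memremap(RES_BASE, RES_SIZE, MEMREMAP_WB);
    return res_virt ? 0 : -ENOMEM;
}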
I am using mmap on /dev/mem to read/write UART registers.
It works well, but my question is:
After a write, is the msync system call with the MS_SYNC flag really needed?
From my understanding, /dev/mem is a virtual device that provides access to physical memory zones (UART registers in my case) by translating virtual memory addresses, and so gives access to some physical memory from user space.
This is not a regular file, and I guess that modifications of registers are not buffered/cached. I would actually like to avoid this system call for performance reasons.
Thanks
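For context, here is a minimal userspace sketch of the mapping in question; UART_PHYS is a placeholder for the UART's physical base, and O_SYNC is commonly used to request an uncached mapping of /dev/mem:

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define UART_PHYS 0x3f201000UL   /* placeholder: your UART's physical base */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    volatile uint32_t *uart;

    if (fd < 0)
        return 1;
    uart = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
                fd, UART_PHYS);
    if (uart == MAP_FAILED)
        return 1;

    uart[0] = 'A';               /* write a data register; no msync() */

    munmap((void *)uart, 4096);
    close(fd);
    return 0;
}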
My understanding is that msync() is needed to update the data in a normal file that is modified through a mapping created with mmap().
But when you use mmap on /dev/mem you are not mapping a normal file on disk; you are just mapping the desired hardware memory range directly into your process's virtual address space, so msync() is beside the point: it will do nothing.
The only thing that lies between your writes into the mmapped virtual space and the hardware device is the CPU cache. You could force a cache flush (__clear_cache(), maybe?), but that is usually unnecessary, because the kernel identifies the memory-mapped device registers and disables the cache for that range. On x86 CPUs that is usually done with MTRRs, but on ARM I don't know the details...
Following the text at https://www.kernel.org/doc/Documentation/DMA-API.txt, a few inlined questions:
Part Ia - Using large dma-coherent buffers
------------------------------------------
void *
dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flag)
Consistent memory is memory for which a write by either the device or
the processor can immediately be read by the processor or device
without having to worry about caching effects. (You may however need
to make sure to flush the processor's write buffers before telling
devices to read that memory.)
Q1. Is it safe to assume that the allocated area is cacheable, since the last line states that flushing is required?
Q1a. Does this API allocate memory from the lower 16 MB, which is considered DMA-safe?
dma_addr_t
dma_map_single(struct device *dev, void *cpu_addr, size_t size,
enum dma_data_direction direction)
Maps a piece of processor virtual memory so it can be accessed by the
device and returns the physical handle of the memory.
The direction for both api's may be converted freely by casting.
However the dma_ API uses a strongly typed enumerator for its
direction:
DMA_NONE no direction (used for debugging)
DMA_TO_DEVICE data is going from the memory to the device
DMA_FROM_DEVICE data is coming from the device to the memory
DMA_BIDIRECTIONAL direction isn't known
Q2. Do the DMA_XXX options direct a change of page attributes for the VA=>PA mapping? Say, would DMA_TO_DEVICE mark the area as non-cacheable?
It says "without having to worry about caching effects". That means dma_alloc_coherent() returns uncacheable memory unless the architecture has cache coherent DMA hardware so the caching makes no difference. However being uncached does not mean that writes do not go through the CPU write buffers (i.e. not every memory access is immediately executed or executed in the same order as they appear in the code). To be sure that everything you write into memory is really there when you tell the device to read it, you will have to execute a wmb() at least. See Documentation/memory-barriers.txt for more information.
dma_alloc_coherent() does not return memory from the lower 16 MB, it returns memory that is accessible by the device inside the addressable area specified by dma_set_coherent_mask(). You have to call that as part of the device initialization.
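For illustration, a hypothetical probe-time sketch; the 32-bit mask is an assumption, use whatever your hardware can actually address:

#include <linux/dma-mapping.h>

static void *coherent_buf;
static dma_addr_t coherent_handle;

static int my_dma_init(struct device *dev)
{
    /* Declare what the device can address before allocating. */
    if (dma_set_coherent_mask(dev, DMA_BIT_MASK(32)))
        return -EIO;

    /* The buffer comes from within the mask's addressable window,
     * not necessarily from the lower 16 MB. */
    coherent_buf = dma_alloc_coherent(dev, PAGE_SIZE, &coherent_handle,
                                      GFP_KERNEL);
    return coherent_buf ? 0 : -ENOMEM;
}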
Cacheability is irrelevant to dma_map_*() functions. They make sure that the given memory region is accessible to the device at the DMA address they return. After the DMA is finished dma_unmap_*() is called. For DMA_TO_DEVICE the sequence is "write data to memory, map(), start DMA, unmap() when finished", for DMA_FROM_DEVICE "map(), start DMA, unmap() when finished, read data from memory".
Cache makes no difference because usually you are not writing or reading the buffer while it is mapped. If you really have to do that you have to explicitly dma_sync_*() the memory before reading or after writing the buffer.
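As a concrete sketch of the DMA_TO_DEVICE sequence just described; dev, buf, and len are assumed to come from the surrounding driver, and programming the device is hardware-specific and not shown:

#include <linux/dma-mapping.h>

static int send_to_device(struct device *dev, void *buf, size_t len)
{
    dma_addr_t handle;

    /* Data was written into buf by the CPU before mapping. */
    handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
    if (dma_mapping_error(dev, handle))
        return -ENOMEM;

    /* Program 'handle' into the device and start the DMA here
     * (hardware-specific, not shown), then wait for completion.
     * Do not touch buf in between unless you dma_sync_*() it. */

    /* Ownership of the buffer returns to the CPU. */
    dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
    return 0;
}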
LDD3 (p. 453) demonstrates dma_map_single using a buffer passed in as a parameter:
bus_addr = dma_map_single(&dev->pci_dev->dev, buffer, count, dev->dma_dir);
Q1: What/where does this buffer come from?
kmalloc?
Q2: Why does DMA-API-HOWTO.txt state that I can DMA into raw kmalloc'd memory?
From http://www.mjmwired.net/kernel/Documentation/DMA-API-HOWTO.txt
L:51 If you acquired your memory via the page allocator or kmalloc() then you may DMA to/from that memory using the addresses returned from those routines.
L:74 you cannot take the return of a kmap() call and DMA to/from that.
So I can pass the address returned from kmalloc to my hardware device?
Or should I run virt_to_bus on it first?
Or should I pass this into dma_map_single?
Q3: When the DMA transfer is complete, can I read the data in the kernel driver via the kmalloc address?
addr = kmalloc(...);
...
printk("test result : 0x%08x\n", addr[0]);
Q4: What's the best way to get this to user-space?
copy_to_user?
mmap the kmalloc memory?
others?
kmalloc is indeed one source to get the buffer. Another can be alloc_page with the GFP_DMA flag.
The meaning is that the memory kmalloc returns is guaranteed to be contiguous in physical memory, not just virtual memory, so you can give the bus address of that pointer to your hardware. You do need to use dma_map_single() on the returned address, which, depending on the exact platform, might be no more than a wrapper around virt_to_bus or might do more than that (set up IOMMU or GART tables).
Correct, just make sure to follow cache coherency guidelines as the DMA guide explains.
copy_to_user will work fine and is the easiest answer. Depending on your specific case it might be enough, or you might need something with better performance. You cannot normally map kmalloc'ed addresses to user space, but you can DMA into a user-provided address (some caveats apply) or allocate user pages (alloc_page with GFP_USER).
Good luck!
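Tying those answers together, a hypothetical end-to-end sketch; dev is assumed to come from the driver, and the device-programming step is elided:

#include <linux/dma-mapping.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

static ssize_t dma_read_once(struct device *dev, char __user *ubuf,
                             size_t len)
{
    void *buf = kmalloc(len, GFP_KERNEL);   /* Q1: kmalloc'd buffer */
    dma_addr_t handle;
    ssize_t ret = len;

    if (!buf)
        return -ENOMEM;

    /* Q2: map the kmalloc'd address and hand the returned handle
     * (not a raw virt_to_bus() result) to the hardware. */
    handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
    if (dma_mapping_error(dev, handle)) {
        kfree(buf);
        return -ENOMEM;
    }

    /* ... program 'handle' into the device, wait for completion ... */

    dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);

    /* Q3: after unmapping, the CPU may read the buffer normally.
     * Q4: copy_to_user() is the simplest way to hand it over. */
    if (copy_to_user(ubuf, buf, len))
        ret = -EFAULT;

    kfree(buf);
    return ret;
}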
My question is about passing data from the kernel to a user space program. I want to implement a system call "get_data(size, char *buff, char **meta_buf)". In this call, buff is allocated by the user space program and its length is passed in the size argument. However, meta_buf is a variable-length buffer that is allocated (in the user space program's vm pages) and filled by the kernel. The user space program will free this region.
(I can't allocate the data in user space, as the user space program does not know the size of meta_buf. Also, the user space program cannot allocate a fixed amount of memory and call the system call again and again to read the entire metadata; the metadata has to be returned in one single system call.)
How do I allocate memory for a user space program from kernel thread?
(I would even appreciate if you can point me to any other system call that does a similar operation - allocating in kernel and freeing in user space)
Is this interface right or is there a better way to do it?
Don't attempt to allocate memory for userspace from the kernel - this is a huge violation of the kernel's abstraction layering. Instead, consider a few other options:
Have userspace ask how much space it needs. Userspace allocates, then grabs the memory from the kernel (see the sketch below).
Have userspace mmap pages owned by your driver directly into its address space.
Set an upper bound on the amount of data needed. Just allocate that much.
It's hard to say more without knowing why this has to be atomic. Actually allocating memory's going to need to be interruptible anyway (or you're unlikely to succeed), so it's unlikely that going out of and back into the kernel is going to hurt much. In fact, any write to userspace memory must be interruptible, as there's the potential for page faults requiring IO.
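For the first option, a hypothetical userspace sketch; get_meta_size() and get_data() are made-up wrappers for the two proposed syscalls:

#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical wrappers around the two-call protocol. */
extern ssize_t get_meta_size(void);
extern ssize_t get_data(size_t size, char *buf);

char *fetch_meta(size_t *out_len)
{
    ssize_t need = get_meta_size();     /* first call: how much? */
    char *meta;

    if (need < 0)
        return NULL;

    meta = malloc(need);                /* userspace allocates */
    if (!meta)
        return NULL;

    if (get_data(need, meta) < 0) {     /* second call: fill it */
        free(meta);
        return NULL;
    }
    *out_len = need;
    return meta;                        /* userspace frees later */
}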
Call vm_mmap() to get a address in user space.
Call find_vma() to find the vma corresponding to this address.
Call virt_to_phys() to get the physical address of kernel memory.
Call remap_pfn_range() to map the physical address to the vma.
One point to take care of: the mapped address should be aligned to the page size. If it is not, you should adjust it and add the offset back after you get the user space address.
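A hypothetical sketch of those four steps; error paths are simplified, and the mmap-lock handling around find_vma()/remap_pfn_range() is omitted for brevity:

#include <linux/err.h>
#include <linux/io.h>
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/sched.h>

/* Hand a page-aligned, physically contiguous kernel buffer 'kbuf' of
 * 'size' bytes to the current process. */
static unsigned long give_to_user(void *kbuf, size_t size)
{
    unsigned long pfn = virt_to_phys(kbuf) >> PAGE_SHIFT;
    struct vm_area_struct *vma;
    unsigned long uaddr;

    /* Step 1: create a fresh user-space mapping. */
    uaddr = vm_mmap(NULL, 0, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, 0);
    if (IS_ERR_VALUE(uaddr))
        return uaddr;

    /* Step 2: look up the vma backing that address. */
    vma = find_vma(current->mm, uaddr);
    if (!vma)
        return -EFAULT;

    /* Steps 3-4: back the vma with the buffer's physical pages. */
    if (remap_pfn_range(vma, uaddr, pfn, size, vma->vm_page_prot))
        return -EAGAIN;

    return uaddr;
}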