Finding physical memory offset to video buffer - linux

I am using Fedora 17 running on board with integrated graphics.
Given that I am able to manipulate content of physical memory, how can I find out the physical memory offsets that I could write to in order to display something on the screen?
I tried to lookup 0xB8000 and 0xB0000 offsets but they contain all 0xff.
Is there a specific pattern that starts video buffer in memory?
Is there any good source of information about this topic?
The root cause of my problem was that Linux is not using legacy video mode, so the memory at 0xB8000 is restricted (in my case read only). However issuing an interrupt can switch to other modes:
INT 10 - VIDEO - SET VIDEO MODE
AH = 00h
AL = desired video mode (see #00010)
Found on: http://www.delorie.com/djgpp/doc/rbinter/id/74/0.html

Living like it's 1989
#include <linux/fb.h>
#define DEV_MEM "/dev/fb0"
/* Screen parameters (probably via ioctl() and /sys. */
#define YRES 240
#define XRES 320
#define BYTES_PER_PIXEL (sizeof(unsigned short)) /* 16 bit pixels. */
#define MAP_SIZE XRES*YRES*BYTES_PER_PIXEL
unsigned short *map_lbase;
if((fd = open(DEV_MEM, O_RDWR | O_SYNC)) == -1) {
fprintf(stderr, "cannot open %s - are you root?\n", DEV_MEM);
exit(1);
}
// Map that page.
map_lbase = (unsigned short *)mmap(NULL, MAP_SIZE,
PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if((long)map_lbase == -1) {
perror("cannot mmap");
exit(1);
}
Humons - Framebuffer API doc, Framebuffer Doc dir.
Smart Humons - Internals, Deferred I/O doc or how to emulate memory mapped video.
You can not use 0xB8000 and 0xB0000 directly as those are physical addresses. I assume you are in user space and not writing a kernel driver. Under Linux, we normally have the MMU enable; In other words, we have virtual memory. Not all processes/users have access to video memory. However, if you are allowed, you can mmap a framebuffer device to your address space. It is best to let the kernel give you an address and not request a specific one.
Or see how professionals do it.
Man: mmap
Edit: If you aren't root, you can still use Unix permissions on /dev/fb0 (or which ever device) to give group permission to read/write or use some sort of login process that gives a user on the current tty permission.

Maybe you could start around here:
http://www.tldp.org/HOWTO/Framebuffer-HOWTO/
But modern video graphics are in no way as simple as "finding where in the meory is the VRAM" and writing there.

Related

Writing out DMA buffers into memory mapped file

I need to write in embedded Linux(2.6.37) as fast as possible incoming DMA buffers to HD partition as raw device /dev/sda1. Buffers are aligned as required and are of equal 512KB length. The process may continue for a very long time and fill as much as, for example, 256GB of data.
I need to use the memory-mapped file technique (O_DIRECT not applicable), but can't understand the exact way how to do this.
So, in pseudo code "normal" writing:
fd=open(/dev/sda1",O_WRONLY);
while(1) {
p = GetVirtualPointerToNewBuffer();
if (InputStopped())
break;
write(fd, p, BLOCK512KB);
}
Now, I will be very thankful for the similar pseudo/real code example of how to utilize memory-mapped technique for this writing.
UPDATE2:
Thanks to kestasx the latest working test code looks like following:
#define TSIZE (64*KB)
void* TBuf;
int main(int argc, char **argv) {
int fdi=open("input.dat", O_RDONLY);
//int fdo=open("/dev/sdb2", O_RDWR);
int fdo=open("output.dat", O_RDWR);
int i, offs=0;
void* addr;
i = posix_memalign(&TBuf, TSIZE, TSIZE);
if ((fdo < 1) || (fdi < 1)) {
printf("Error in files\n");
return -1; }
while(1) {
addr = mmap((void*)TBuf, TSIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fdo, offs);
if ((unsigned int)addr == 0xFFFFFFFFUL) {
printf("Error MMAP=%d, %s\n", errno, strerror(errno));
return -1; }
i = read(fdi, TBuf, TSIZE);
if (i != TSIZE) {
printf("End of data\n");
return 0; }
i = munmap(addr, TSIZE);
offs += TSIZE;
sleep(1);
};
}
UPDATE3:
1. To precisely imitate the DMA work, I need to move read() call before mmp(), because when the DMA finishes it provides me with the address where it has put data. So, in pseudo code:
while(1) {
read(fdi, TBuf, TSIZE);
addr = mmap((void*)TBuf, TSIZE, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_SHARED, fdo, offs);
munmap(addr, TSIZE);
offs += TSIZE; }
This variant fails after(!) the first loop - read() says BAD ADDRESS on TBuf.
Without understanding exactly what I do, I substituted munmap() with msync(). This worked perfectly.
So, the question here - why unmapping the addr influenced on TBuf?
2.With the previous example working I went to the real system with the DMA. The same loop, just instead of read() call is the call which waits for a DMA buffer to be ready and its virtual address provided.
There are no error, the code runs, BUT nothing is recorded (!).
My thought was that Linux does not see that the area was updated and therefore does not sync() a thing.
To test this, I eliminated in the working example the read() call - and yes, nothing was recorded too.
So, the question here - how can I tell Linux that the mapped region contains new data, please, flush it!
Thanks a lot!!!
If I correctly understand, it makes sense if You mmap() file (not sure if it You can mmap() raw partition/block-device) and data via DMA is written directly to this memory region.
For this to work You need to be able to control p (where new buffer is placed) or address where file is maped. If You don't - You'll have to copy memory contents (and will lose some benefits of mmap).
So psudo code would be:
truncate("data.bin", 256GB);
fd = open( "data.bin", O_RDWR );
p = GetVirtualPointerToNewBuffer();
adr = mmap( p, 1GB, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset_in_file );
startDMA();
waitDMAfinish();
munmap( adr, 1GB );
This is first step only and I'm not completely sure if it will work with DMA (have no such experience).
I assume it is 32bit system, but even then 1GB mapped file size may be too big (if Your RAM is smaller You'll be swaping).
If this setup will work, next step would be to make loop to map regions of file at different offsets and unmap already filled ones.
Most likely You'll need to align addr to 4KB boundary.
When You'll unmap region, it's data will be synced to disk. So You'll need some testing to select appropriate mapped region size (while next region is filled by DMA, there must be enough time to unmap/write previous one).
UPDATE:
What exactly happens when You fill mmap'ed region via DMA I simply don't know (not sure how exactly dirty pages are detected: what is done by hardware, and what must be done by software).
UPDATE2: To my best knowledge:
DMA works the following way:
CPU arranges DMA transfer (address where to write transfered data in RAM);
DMA controller does the actual work, while CPU can do it's own work in parallel;
once DMA transfer is complete - DMA controller signals CPU via IRQ line (interrupt), so CPU can handle the result.
This seems simple while virtual memory is not involved: DMA should work independently from runing process (actual VM table in use by CPU). Yet it should be some mehanism to invalidate CPU cache for modified by DMA physical RAM pages (don't know if CPU needs to do something, or it is done authomatically by hardware).
mmap() forks the following way:
after successfull call of mmap(), file on disk is attached to process memory range (most likely some data structure is filled in OS kernel to hold this info);
I/O (reading or writing) from mmaped range triggers pagefault, which is handled by kernel loading appropriate blocks from atached file;
writes to mmaped range are handled by hardware (don't know how exactly: maybe writes to previously unmodified pages triger some fault, which is handled by kernel marking these pages dirty; or maybe this marking is done entirely in hardware and this info is available to kernel when it needs to flush modified pages to disk).
modified (dirty) pages are written to disk by OS (as it sees appropriate) or can be forced via msync() or munmap()
In theory it should be possible to do DMA transfers to mmaped range, but You need to find out, how exactly pages ar marked dirty (if You need to do something to inform kernel which pages need to be written to disk).
UPDATE3:
Even if modified by DMA pages are not marked dirty, You should be able to triger marking by rewriting (reading ant then writing the same) at least one value in each page (most likely each 4KB) transfered. Just make sure this rewriting is not removed (optimised out) by compiler.
UPDATE4:
It seems file opened O_WRONLY can't be mmap'ed (see question comments, my experimets confirm this too). It is logical conclusion of mmap() workings described above. The same is confirmed here (with reference to POSIX standart requirement to ensure file is readable regardless of maping protection flags).
Unless there is some way around, it actually means that by using mmap() You can't avoid reading of results file (unnecessary step in Your case).
Regarding DMA transfers to mapped range, I think it will be a requirement to ensure maped pages are preloalocated before DMA starts (so there is real memory asigned to both DMA and maped region). On Linux there is MAP_POPULATE mmap flag, but from manual it seams it works with MAP_PRIVATE mapings only (changes are not writen to disk), so most likely it is usuitable. Likely You'll have to triger pagefaults manually by accessing each maped page. This should triger reading of results file.
If You still wish to use mmap and DMA together, but avoid reading of results file, You'll have to modify kernel internals to allow mmap to use O_WRONLY files (for example by zero-filling trigered pages, instead of reading them from disk).

mmap CMA area on /dev/mem

I need reserve 256-512 Mb of continuous physical memory and have access to this memory from the user space.
I decided to use CMA for memory reserving.
Here are the steps on my idea that must be performed:
Reservation required amount of memory by CMA during system booting.
Parsing of CMA patch output which looks like for example: "CMA: reserved 256 MiB at 27400000" and saving two parameters: size of CMA area = 256*1024*1024 bytes and phys address of CMA area = 0x27400000.
Mapping of CMA area at /dev/mem file with offset = 0x27400000 using mmap(). (Of course, CONFIG_STRICT_DEVMEM is disabled)
It would let me to read data directly from phys memory from user space.
But the next code make segmentation fault(there size = 1Mb):
int file;
void* start;
file=open("/dev/mem", O_RDWR | O_SYNC);
if ( (start = mmap(0, 1024*1024, PROT_READ | PROT_WRITE, MAP_SHARED, file, 0x27400000)) == MAP_FAILED ){
perror("mmap");
}
for (int offs = 0; offs<50; offs++){
cout<<((char *)start)[offs];
}
Output of this code: "mmap: Invalid argument".
When I changed offset = 0x27400000 on 0, this code worked fine and program displayed trash. It also work for alot of offsets which I looked at /proc/iomem.
According to information from /proc/iomem, phys addr of CMA area (0x27400000 on my system) always situated in System RAM.
Does anyone have any ideas, how to mmap CMA area on /dev/mem? What am I doing wrong?
Thanks alot for any help!
Solution to this problem was suggested to me by Jeff Haran in kernelnewbies mailing list.
It was necessary to disable CONFIG_x86_PAT in .config and mmap() has started to work!
If CONFIG_X86_PAT is configured you will have problems mapping memory to user space. It basically implements the same restrictions as CONFIG_STRICT_DEVMEM.
Jeff Haran
Now I can mmap /dev/mem on any physical address I want.
But necessary to be careful:
Word of caution. CONFIG_X86_PAT was likely introduced for a reason. There may be some performance penalties for turning this off, though in my testing so far turning if off doesn’t seem to break anything.
Jeff Haran

How to access physical addresses from user space in Linux?

On a ARM based system running Linux, I have a device that's memory mapped to a physical address. From a user space program where all addresses are virtual, how can I read content from this address?
busybox devmem
busybox devmem is a tiny CLI utility that mmaps /dev/mem.
You can get it in Ubuntu with: sudo apt-get install busybox
Usage: read 4 bytes from the physical address 0x12345678:
sudo busybox devmem 0x12345678
Write 0x9abcdef0 to that address:
sudo busybox devmem 0x12345678 w 0x9abcdef0
Source: https://github.com/mirror/busybox/blob/1_27_2/miscutils/devmem.c#L85
mmap MAP_SHARED
When mmapping /dev/mem, you likely want to use:
open("/dev/mem", O_RDWR | O_SYNC);
mmap(..., PROT_READ | PROT_WRITE, MAP_SHARED, ...)
MAP_SHARED makes writes go to physical memory immediately, which makes it easier to observe, and makes more sense for hardware register writes.
CONFIG_STRICT_DEVMEM and nopat
To use /dev/mem to view and modify regular RAM on kernel v4.9, you must fist:
disable CONFIG_STRICT_DEVMEM (set by default on Ubuntu 17.04)
pass the nopat kernel command line option for x86
IO ports still work without those.
See also: mmap of /dev/mem fails with invalid argument for virt_to_phys address, but address is page aligned
Cache flushing
If you try to write to RAM instead of a register, the memory may be cached by the CPU: How to flush the CPU cache for a region of address space in Linux? and I don't see a very portable / easy way to flush it or mark the region as uncacheable:
How to write kernel space memory (physical address) to a file using O_DIRECT?
How to flush the CPU cache for a region of address space in Linux?
Is it possible to allocate, in user space, a non cacheable block of memory on Linux?
So maybe /dev/mem can't be used reliably to pass memory buffers to devices?
This can't be observed in QEMU unfortunately, since QEMU does not simulate caches.
How to test it out
Now for the fun part. Here are a few cool setups:
Userland memory
allocate volatile variable on an userland process
get the physical address with /proc/<pid>/maps + /proc/<pid>/pagemap
modify the value at the physical address with devmem, and watch the userland process react
Kernelland memory
allocate kernel memory with kmalloc
get the physical address with virt_to_phys and pass it back to userland
modify the physical address with devmem
query the value from the kernel module
IO mem and QEMU virtual platform device
create a platform device with known physical register addresses
use devmem to write to the register
watch printfs come out of the virtual device in response
Bonus: determine the physical address for a virtual address
Is there any API for determining the physical address from virtual address in Linux?
You can map a device file to a user process memory using mmap(2) system call. Usually, device files are mappings of physical memory to the file system.
Otherwise, you have to write a kernel module which creates such a file or provides a way to map the needed memory to a user process.
Another way is remapping parts of /dev/mem to a user memory.
Edit:
Example of mmaping /dev/mem (this program must have access to /dev/mem, e.g. have root rights):
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
if (argc < 3) {
printf("Usage: %s <phys_addr> <offset>\n", argv[0]);
return 0;
}
off_t offset = strtoul(argv[1], NULL, 0);
size_t len = strtoul(argv[2], NULL, 0);
// Truncate offset to a multiple of the page size, or mmap will fail.
size_t pagesize = sysconf(_SC_PAGE_SIZE);
off_t page_base = (offset / pagesize) * pagesize;
off_t page_offset = offset - page_base;
int fd = open("/dev/mem", O_SYNC);
unsigned char *mem = mmap(NULL, page_offset + len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, page_base);
if (mem == MAP_FAILED) {
perror("Can't map memory");
return -1;
}
size_t i;
for (i = 0; i < len; ++i)
printf("%02x ", (int)mem[page_offset + i]);
return 0;
}

Linux kernel device driver to DMA from a device into user-space memory

I want to get data from a DMA enabled, PCIe hardware device into user-space as quickly as possible.
Q: How do I combine "direct I/O to user-space with/and/via a DMA transfer"
Reading through LDD3, it seems that I need to perform a few different types of IO operations!?
dma_alloc_coherent gives me the physical address that I can pass to the hardware device.
But would need to have setup get_user_pages and perform a copy_to_user type call when the transfer completes. This seems a waste, asking the Device to DMA into kernel memory (acting as buffer) then transferring it again to user-space.
LDD3 p453: /* Only now is it safe to access the buffer, copy to user, etc. */
What I ideally want is some memory that:
I can use in user-space (Maybe request driver via a ioctl call to create DMA'able memory/buffer?)
I can get a physical address from to pass to the device so that all user-space has to do is perform a read on the driver
the read method would activate the DMA transfer, block waiting for the DMA complete interrupt and release the user-space read afterwards (user-space is now safe to use/read memory).
Do I need single-page streaming mappings, setup mapping and user-space buffers mapped with get_user_pages dma_map_page?
My code so far sets up get_user_pages at the given address from user-space (I call this the Direct I/O part). Then, dma_map_page with a page from get_user_pages. I give the device the return value from dma_map_page as the DMA physical transfer address.
I am using some kernel modules as reference: drivers_scsi_st.c and drivers-net-sh_eth.c. I would look at infiniband code, but cant find which one is the most basic!
Many thanks in advance.
I'm actually working on exactly the same thing right now and I'm going the ioctl() route. The general idea is for user space to allocate the buffer which will be used for the DMA transfer and an ioctl() will be used to pass the size and address of this buffer to the device driver. The driver will then use scatter-gather lists along with the streaming DMA API to transfer data directly to and from the device and user-space buffer.
The implementation strategy I'm using is that the ioctl() in the driver enters a loop that DMA's the userspace buffer in chunks of 256k (which is the hardware imposed limit for how many scatter/gather entries it can handle). This is isolated inside a function that blocks until each transfer is complete (see below). When all bytes are transfered or the incremental transfer function returns an error the ioctl() exits and returns to userspace
Pseudo code for the ioctl()
/*serialize all DMA transfers to/from the device*/
if (mutex_lock_interruptible( &device_ptr->mtx ) )
return -EINTR;
chunk_data = (unsigned long) user_space_addr;
while( *transferred < total_bytes && !ret ) {
chunk_bytes = total_bytes - *transferred;
if (chunk_bytes > HW_DMA_MAX)
chunk_bytes = HW_DMA_MAX; /* 256kb limit imposed by my device */
ret = transfer_chunk(device_ptr, chunk_data, chunk_bytes, transferred);
chunk_data += chunk_bytes;
chunk_offset += chunk_bytes;
}
mutex_unlock(&device_ptr->mtx);
Pseudo code for incremental transfer function:
/*Assuming the userspace pointer is passed as an unsigned long, */
/*calculate the first,last, and number of pages being transferred via*/
first_page = (udata & PAGE_MASK) >> PAGE_SHIFT;
last_page = ((udata+nbytes-1) & PAGE_MASK) >> PAGE_SHIFT;
first_page_offset = udata & PAGE_MASK;
npages = last_page - first_page + 1;
/* Ensure that all userspace pages are locked in memory for the */
/* duration of the DMA transfer */
down_read(&current->mm->mmap_sem);
ret = get_user_pages(current,
current->mm,
udata,
npages,
is_writing_to_userspace,
0,
&pages_array,
NULL);
up_read(&current->mm->mmap_sem);
/* Map a scatter-gather list to point at the userspace pages */
/*first*/
sg_set_page(&sglist[0], pages_array[0], PAGE_SIZE - fp_offset, fp_offset);
/*middle*/
for(i=1; i < npages-1; i++)
sg_set_page(&sglist[i], pages_array[i], PAGE_SIZE, 0);
/*last*/
if (npages > 1) {
sg_set_page(&sglist[npages-1], pages_array[npages-1],
nbytes - (PAGE_SIZE - fp_offset) - ((npages-2)*PAGE_SIZE), 0);
}
/* Do the hardware specific thing to give it the scatter-gather list
and tell it to start the DMA transfer */
/* Wait for the DMA transfer to complete */
ret = wait_event_interruptible_timeout( &device_ptr->dma_wait,
&device_ptr->flag_dma_done, HZ*2 );
if (ret == 0)
/* DMA operation timed out */
else if (ret == -ERESTARTSYS )
/* DMA operation interrupted by signal */
else {
/* DMA success */
*transferred += nbytes;
return 0;
}
The interrupt handler is exceptionally brief:
/* Do hardware specific thing to make the device happy */
/* Wake the thread waiting for this DMA operation to complete */
device_ptr->flag_dma_done = 1;
wake_up_interruptible(device_ptr->dma_wait);
Please note that this is just a general approach, I've been working on this driver for the last few weeks and have yet to actually test it... So please, don't treat this pseudo code as gospel and be sure to double check all logic and parameters ;-).
You basically have the right idea: in 2.1, you can just have userspace allocate any old memory. You do want it page-aligned, so posix_memalign() is a handy API to use.
Then have userspace pass in the userspace virtual address and size of this buffer somehow; ioctl() is a good quick and dirty way to do this. In the kernel, allocate an appropriately sized buffer array of struct page* -- user_buf_size/PAGE_SIZE entries -- and use get_user_pages() to get a list of struct page* for the userspace buffer.
Once you have that, you can allocate an array of struct scatterlist that is the same size as your page array and loop through the list of pages doing sg_set_page(). After the sg list is set up, you do dma_map_sg() on the array of scatterlist and then you can get the sg_dma_address and sg_dma_len for each entry in the scatterlist (note you have to use the return value of dma_map_sg() because you may end up with fewer mapped entries because things might get merged by the DMA mapping code).
That gives you all the bus addresses to pass to your device, and then you can trigger the DMA and wait for it however you want. The read()-based scheme you have is probably fine.
You can refer to drivers/infiniband/core/umem.c, specifically ib_umem_get(), for some code that builds up this mapping, although the generality that that code needs to deal with may make it a bit confusing.
Alternatively, if your device doesn't handle scatter/gather lists too well and you want contiguous memory, you could use get_free_pages() to allocate a physically contiguous buffer and use dma_map_page() on that. To give userspace access to that memory, your driver just needs to implement an mmap method instead of the ioctl as described above.
At some point I wanted to allow user-space application to allocate DMA buffers and get it mapped to user-space and get the physical address to be able to control my device and do DMA transactions (bus mastering) entirely from user-space, totally bypassing the Linux kernel. I have used a little bit different approach though. First I started with a minimal kernel module that was initializing/probing PCIe device and creating a character device. That driver then allowed a user-space application to do two things:
Map PCIe device's I/O bar into user-space using remap_pfn_range() function.
Allocate and free DMA buffers, map them to user space and pass a physical bus address to user-space application.
Basically, it boils down to a custom implementation of mmap() call (though file_operations). One for I/O bar is easy:
struct vm_operations_struct a2gx_bar_vma_ops = {
};
static int a2gx_cdev_mmap_bar2(struct file *filp, struct vm_area_struct *vma)
{
struct a2gx_dev *dev;
size_t size;
size = vma->vm_end - vma->vm_start;
if (size != 134217728)
return -EIO;
dev = filp->private_data;
vma->vm_ops = &a2gx_bar_vma_ops;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_private_data = dev;
if (remap_pfn_range(vma, vma->vm_start,
vmalloc_to_pfn(dev->bar2),
size, vma->vm_page_prot))
{
return -EAGAIN;
}
return 0;
}
And another one that allocates DMA buffers using pci_alloc_consistent() is a little bit more complicated:
static void a2gx_dma_vma_close(struct vm_area_struct *vma)
{
struct a2gx_dma_buf *buf;
struct a2gx_dev *dev;
buf = vma->vm_private_data;
dev = buf->priv_data;
pci_free_consistent(dev->pci_dev, buf->size, buf->cpu_addr, buf->dma_addr);
buf->cpu_addr = NULL; /* Mark this buffer data structure as unused/free */
}
struct vm_operations_struct a2gx_dma_vma_ops = {
.close = a2gx_dma_vma_close
};
static int a2gx_cdev_mmap_dma(struct file *filp, struct vm_area_struct *vma)
{
struct a2gx_dev *dev;
struct a2gx_dma_buf *buf;
size_t size;
unsigned int i;
/* Obtain a pointer to our device structure and calculate the size
of the requested DMA buffer */
dev = filp->private_data;
size = vma->vm_end - vma->vm_start;
if (size < sizeof(unsigned long))
return -EINVAL; /* Something fishy is happening */
/* Find a structure where we can store extra information about this
buffer to be able to release it later. */
for (i = 0; i < A2GX_DMA_BUF_MAX; ++i) {
buf = &dev->dma_buf[i];
if (buf->cpu_addr == NULL)
break;
}
if (buf->cpu_addr != NULL)
return -ENOBUFS; /* Oops, hit the limit of allowed number of
allocated buffers. Change A2GX_DMA_BUF_MAX and
recompile? */
/* Allocate consistent memory that can be used for DMA transactions */
buf->cpu_addr = pci_alloc_consistent(dev->pci_dev, size, &buf->dma_addr);
if (buf->cpu_addr == NULL)
return -ENOMEM; /* Out of juice */
/* There is no way to pass extra information to the user. And I am too lazy
to implement this mmap() call using ioctl(). So we simply tell the user
the bus address of this buffer by copying it to the allocated buffer
itself. Hacks, hacks everywhere. */
memcpy(buf->cpu_addr, &buf->dma_addr, sizeof(buf->dma_addr));
buf->size = size;
buf->priv_data = dev;
vma->vm_ops = &a2gx_dma_vma_ops;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_private_data = buf;
/*
* Map this DMA buffer into user space.
*/
if (remap_pfn_range(vma, vma->vm_start,
vmalloc_to_pfn(buf->cpu_addr),
size, vma->vm_page_prot))
{
/* Out of luck, rollback... */
pci_free_consistent(dev->pci_dev, buf->size, buf->cpu_addr,
buf->dma_addr);
buf->cpu_addr = NULL;
return -EAGAIN;
}
return 0; /* All good! */
}
Once those are in place, user space application can pretty much do everything — control the device by reading/writing from/to I/O registers, allocate and free DMA buffers of arbitrary size, and have the device perform DMA transactions. The only missing part is interrupt-handling. I was doing polling in user space, burning my CPU, and had interrupts disabled.
Hope it helps. Good Luck!
I'm getting confused with the direction to implement. I want to...
Consider the application when designing a driver.
What is the nature of data movement, frequency, size and what else might be going on in the system?
Is the traditional read/write API sufficient?
Is direct mapping the device into user space OK?
Is a reflective (semi-coherent) shared memory desirable?
Manually manipulating data (read/write) is a pretty good option if the data lends itself to being well understood. Using general purpose VM and read/write may be sufficient with an inline copy. Direct mapping non cachable accesses to the peripheral is convenient, but can be clumsy. If the access is the relatively infrequent movement of large blocks, it may make sense to use regular memory, have the drive pin, translate addresses, DMA and release the pages. As an optimization, the pages (maybe huge) can be pre pinned and translated; the drive then can recognize the prepared memory and avoid the complexities of dynamic translation. If there are lots of little I/O operations, having the drive run asynchronously makes sense. If elegance is important, the VM dirty page flag can be used to automatically identify what needs to be moved and a (meta_sync()) call can be used to flush pages. Perhaps a mixture of the above works...
Too often people don't look at the larger problem, before digging into the details. Often the simplest solutions are sufficient. A little effort constructing a behavioral model can help guide what API is preferable.
first_page_offset = udata & PAGE_MASK;
It seems wrong. It should be either:
first_page_offset = udata & ~PAGE_MASK;
or
first_page_offset = udata & (PAGE_SIZE - 1)
It is worth mention that driver with Scatter-Gather DMA support and user space memory allocation is most efficient and has highest performance. However in case we don't need high performance or we want to develop a driver in some simplified conditions we can use some tricks.
Give up zero copy design. It is worth to consider when data throughput is not too big. In such a design data can by copied to user by
copy_to_user(user_buffer, kernel_dma_buffer, count);
user_buffer might be for example buffer argument in character device read() system call implementation. We still need to take care of kernel_dma_buffer allocation. It might by memory obtained from dma_alloc_coherent() call for example.
The another trick is to limit system memory at the boot time and then use it as huge contiguous DMA buffer. It is especially useful during driver and FPGA DMA controller development and rather not recommended in production environments. Lets say PC has 32GB of RAM. If we add mem=20GB to kernel boot parameters list we can use 12GB as huge contiguous dma buffer. To map this memory to user space simply implement mmap() as
remap_pfn_range(vma,
vma->vm_start,
(0x500000000 >> PAGE_SHIFT) + vma->vm_pgoff,
vma->vm_end - vma->vm_start,
vma->vm_page_prot)
Of course this 12GB is completely omitted by OS and can be used only by process which has mapped it into its address space. We can try to avoid it by using Contiguous Memory Allocator (CMA).
Again above tricks will not replace full Scatter-Gather, zero copy DMA driver, but are useful during development time or in some less performance platforms.

PTE structure in the linux kernel

I have been trying to look around in the linux source code for a structure/union that'd correspond to the PTE on x86 system with PAE disabled. So far I've found only the following in arch/x86/include/asm/page_32.h
typedef union {
pteval_t pte;
pteval_t pte_low;
} pte_t;
I'm a bit confused right now since I have the Intel Reference Manual Vol 3A open in front of me and nothing in that union corresponds to the dozen odd fields present in the PTE as the manual explains.
This might be a trivial question but for me it has become more like a stumbling block in the process of understanding memory management in the linux kernel.
EDIT: I have the 2.6.29 source with me
The pteval_t just treats the page table entry as an opaque blob - on the architecture you're looking at, it's just a 32 bit unsigned value.
The fields within the PTE are accessed using bitwise operators and masks - in the source I have handy (Linux 2.6.24), these are defined in include/asm-x86/pgtable_32.h. The fields you see in the Intel Reference Manual (most of which are single-bit flags) are defined here - for example:
#define _PAGE_PRESENT 0x001
#define _PAGE_RW 0x002
#define _PAGE_USER 0x004
#define _PAGE_PWT 0x008
#define _PAGE_PCD 0x010
#define _PAGE_ACCESSED 0x020
#define _PAGE_DIRTY 0x040
#define _PAGE_PSE 0x080 /* 4 MB (or 2MB) page, Pentium+, if present.. */
#define _PAGE_GLOBAL 0x100 /* Global TLB entry PPro+ */
#define _PAGE_UNUSED1 0x200 /* available for programmer */
#define _PAGE_UNUSED2 0x400
#define _PAGE_UNUSED3 0x800
I would recommend buying Understanding the Linux Kernel from O'REILLY, as well as Linux Device Drivers. And subscribing to LWN.net; though you can get a pretty good start from their kernel index page even without a subscription.
For the memory management, look on the index page for the "Memory management" section... and the "Large-memory systems" section. The latter has a few articles that talk about the move to four-level page tables that should be helpful in understanding this area of the code.

Resources