Low level IO in the Linux kernel

I am writing a driver for a device that generates a bunch of data and stores it in a memory buffer. The driver should read this buffer and store the data on NVMe storage. The device and the memory buffer are implemented in FPGA logic. The buffer size is about 1 GB. The CPU sees it like regular RAM, but Linux knows nothing about it, and that is the problem. When I use the bio layer to save the data I need a struct page * pointer, but I don't have one.
The question is:
Is there any way to save the data from the buffer using just a physical address and size?
Or do I have to use pages, which means I need to add this buffer to the Linux memory pool somehow?
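For context, the constraint comes from the bio submission API itself, which is built around struct page. Below is a minimal sketch of the path in question, assuming a recent kernel (the bio_alloc() signature has changed across versions) and using hypothetical names buf_phys/chunk_len for the buffer's physical address and the amount to write; pfn_to_page() is only valid if the physical range is covered by the kernel's memmap, which is exactly what is missing for a raw FPGA window like this one.

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/mm.h>
#include <linux/pfn.h>

/* buf_phys, chunk_len and bdev are placeholders for this sketch;
 * chunk_len is assumed to fit within a single page here. */
static int write_chunk(struct block_device *bdev, phys_addr_t buf_phys,
                       unsigned int chunk_len, sector_t sector)
{
        /* Only legal if the physical range has struct page backing
         * (i.e. it is part of the kernel memmap); a raw FPGA buffer
         * normally does not, which is the core of the problem. */
        struct page *page = pfn_to_page(PFN_DOWN(buf_phys));
        struct bio *bio = bio_alloc(bdev, 1, REQ_OP_WRITE, GFP_KERNEL);

        bio->bi_iter.bi_sector = sector;
        bio_add_page(bio, page, chunk_len, offset_in_page(buf_phys));
        return submit_bio_wait(bio);    /* synchronous, for simplicity */
}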

Related

How to get the buffer pointer inside the driver which gets allocated in videobuf2 framework of linux subsystem?

I am using the videobuf2 framework in one of my drivers, with VB2_MMAP as the memory type, so the memory for the frame buffer is allocated in kernel space. As per the documentation, in the buf_queue callback of vb2_ops I get the buffer pointer to map for any DMA operation.
Below are my questions.
How exactly will I get the memory address of the buffer to map for DMA?
If I get the memory address of the buffer, how can I get the addresses of the pages based on the transfer length? I read somewhere that, since this is kernel space, we don't need to pin any pages. Is that true? If yes, then how can we get the addresses of the already pinned pages?
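For what it's worth, here is a minimal sketch of a buf_queue callback under the assumption that the queue uses the videobuf2 dma-contig allocator; struct my_dev and my_hw_program_dma() are hypothetical driver-specific names. With VB2_MMAP the buffer is kernel-allocated and already mapped for DMA by videobuf2, so no extra page pinning is involved.

#include <media/videobuf2-v4l2.h>
#include <media/videobuf2-dma-contig.h>

struct my_dev;                                   /* hypothetical driver state    */
void my_hw_program_dma(struct my_dev *dev, dma_addr_t addr,
                       unsigned long len);       /* hypothetical hardware helper */

static void my_buf_queue(struct vb2_buffer *vb)
{
        struct my_dev *dev = vb2_get_drv_priv(vb->vb2_queue);

        /* Bus address of plane 0, already set up by the dma-contig
         * allocator -- this is what gets programmed into the DMA engine. */
        dma_addr_t dma = vb2_dma_contig_plane_dma_addr(vb, 0);
        unsigned long size = vb2_plane_size(vb, 0);

        my_hw_program_dma(dev, dma, size);
}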

Store variable anywhere (swap space, disk) but not in physical memory

I know that it's possible to force a variable to be kept in physical memory using the mlock() function.
#include <stdlib.h>
#include <sys/mman.h>

void *buffer = malloc(buf_size);
mlock(buffer, buf_size);
/* If these calls succeed, the pages backing buffer are locked into
   physical memory and will not be swapped out. */
However, what if we want to make sure that the variable will never reside in physical memory? Is it possible to do that? If so, how does Linux allow this from userspace?
When something is written to disk, the disk controller reads the contents of the file via DMA. DMA stands for direct memory access, and the word "memory" is key here: it will access memory. This is even OS-independent, because it is implemented in hardware.
system("wget http://example.com/?x=2+2");
This will store the variable x with a value of 4 on my webserver, not in the RAM of your PC. Except for extreme examples like this, I cannot think of any solution.

Readback of writes via an mmap mapped region returns wrong values

I am working on a Zynq board with a custom FPGA design that contains some Block Memories, and I want to read/write them from software running on the ARM cores. I have used mmap to map the address space of the block memory into user space from /dev/mem. I write a chunk of data (less than one kernel page) to the mmap'ed region one address at a time. When I read back a written address immediately after the write, I get the correct data. But when I read the data back after writing the entire chunk, what I get is a repetition of the last 16 written addresses. I open /dev/mem with O_SYNC and use MAP_SHARED for mmap.
What is going wrong here? Is this some sort of caching issue?
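For reference, the setup being described looks roughly like the following sketch; bram_phys and map_len are placeholders for the Block Memory's physical base address and mapping size, and the caller should still check the result against MAP_FAILED.

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map a BRAM window from /dev/mem; bram_phys and map_len are placeholders. */
static volatile uint32_t *map_bram(off_t bram_phys, size_t map_len)
{
        int fd = open("/dev/mem", O_RDWR | O_SYNC);   /* O_SYNC, as in the question */

        if (fd < 0)
                return NULL;
        /* MAP_SHARED mapping of the physical window into user space */
        return mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                    fd, bram_phys);
}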

DMA and I/O memory region under Linux

I'm writing this because I have some doubts about the behaviour of DMA.
I'm reading about the PCI layout and how device drivers interact with the card, and I read about DMA.
As I understand it, PCI cards don't have a DMA controller; instead, they request to become bus masters, and then they are able to take a DMA address and do transfers between memory and the device (through the bus).
This DMA address is a portion of RAM; actually it's a physical address, and before doing anything with it you need to convert it into something that your driver can use, like a kernel virtual address.
I've checked that with this code:
/* Virtual kernel address */
kernel_buff = pci_alloc_consistent(dev, PAGE_SIZE, &dma_addr);
pr_info("Kernel buffer - %12p , Dma_addr - %12p\n", kernel_buff, (void *)dma_addr );
pr_info( "Kernelbuffer - dma_addr - %12p\n", kernel_buff - dma_addr);
strcpy(kernel_buff, "Test dma\n");
/* Test memory */
ptest = (void *)dma_addr;
ptest = phys_to_virt((unsigned long)ptest);
pr_info("Ptest virtual memory(%p) containts - %s\n", ptest, (char *)ptest);
And the output was:
[425971.835669] Kernel buffer - ffff8800ca70a000 , Dma_addr - ca70a000
[425971.835671] Kernelbuffer - dma_addr - ffff880000000000
[425971.835673] Ptest virtual memory(ffff8800ca70a000) containts - Test dma
This is how I understood that DMA is a portion of RAM.
My doubt is about how this transfer is made.
I mean, every time I write to this buffer, will the data that the buffer contains be transferred to the device? Or is only the address of the memory location passed, and then the device reads from this location?
That is my doubt about DMA.
Now about I/O memory maps:
When we request an I/O memory region of the device with, for example:
pci_resource_start
Are we requesting the region of memory where the device's registers are located?
So in this way we have this memory location in RAM, and we can read/write it like normal memory locations?
And the final point: we use DMA because I/O memory mapping only allows a few bytes per cycle, since that process involves the CPU, right?
So with DMA we can transfer large amounts of data between memory locations (RAM and the device's bus) without the CPU.
The steps involved in transferring the data to the device can be summarized as follows (a rough code sketch follows the list):
Assume that you have the data in a buffer.
The driver creates a DMA mapping for this buffer (say using pci_alloc_consistent() or the newer dma_alloc_coherent()), which returns the corresponding DMA bus address.
This DMA bus address has to be communicated to the device. This is done by writing into the appropriate DMA registers of the device through writel() (assuming that the device registers are memory mapped).
The device also needs to be informed about the amount of data being transferred and so on (again by writing to the appropriate registers of the device using writel()).
Now issue the command to the device to start the DMA transaction by writing to one of its control registers (again possibly using writel()).
Once the data transaction is completed, the device issues an interrupt.
In the interrupt handler, the driver may free the buffer that was used for the transaction and may also perform the DMA unmapping.
And there you have it.. the data has been transferred to the device!
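A minimal sketch of those steps, assuming a hypothetical memory-mapped device: the register offsets MY_DMA_ADDR_REG, MY_DMA_LEN_REG and MY_DMA_CTRL_REG, the start bit, and the regs mapping are all made up for illustration.

#include <linux/dma-mapping.h>
#include <linux/io.h>

/* Hypothetical register layout -- replace with the real device's. */
#define MY_DMA_ADDR_REG  0x00
#define MY_DMA_LEN_REG   0x04
#define MY_DMA_CTRL_REG  0x08
#define MY_DMA_START     0x01

static void my_start_dma(struct device *dev, void __iomem *regs, size_t len)
{
        dma_addr_t dma_handle;
        void *buf;

        /* Step 2: coherent buffer plus its DMA bus address */
        buf = dma_alloc_coherent(dev, len, &dma_handle, GFP_KERNEL);
        if (!buf)
                return;

        /* Steps 3-5: tell the device where, how much, and start */
        writel((u32)dma_handle, regs + MY_DMA_ADDR_REG);  /* low 32 bits only */
        writel((u32)len, regs + MY_DMA_LEN_REG);
        writel(MY_DMA_START, regs + MY_DMA_CTRL_REG);

        /* Step 6: the device raises an interrupt when the transfer is done;
         * the interrupt handler would then release the mapping with
         * dma_free_coherent(dev, len, buf, dma_handle). */
}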
Now coming to the question regarding the I/O memory maps:
First of all, when we call pci_resource_start(), we do not "request" the I/O region; it is just the way we gather information about the region. The actual request is done using pci_request_regions(). To be specific to your questions:
We are requesting the region of the memory where device's registers is located?
Using this, we are requesting that the kernel give us access to this region of memory (memory-mapped ports) where the device's registers are located.
So in this way we have this memory location into the RAM ?
No, we do not have this memory location in RAM; it is only memory mapped, which means that the device shares the same address, data and control lines with the RAM, and hence the same instructions that are used to access RAM can also be used to access the device registers.
You've answered your last question yourself. DMA allows huge amounts of data to be transferred efficiently. But there are cases where you need to use memory mapping to transfer data. The best example is already given in the description of the DMA transaction above, where you need to transfer the address and control information to the device. That can be done only through memory-mapped I/O.
Hope this helps.
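To make the pci_resource_start()/pci_request_regions() distinction concrete, here is a minimal sketch of how a probe routine might map BAR 0; "my_driver" and the choice of BAR 0 are placeholders.

#include <linux/pci.h>
#include <linux/io.h>

static void __iomem *my_map_bar0(struct pci_dev *pdev)
{
        void __iomem *regs;

        /* Claim the regions first: pci_resource_start() alone only
         * reports where the BAR lives, it does not reserve it. */
        if (pci_request_regions(pdev, "my_driver"))
                return NULL;

        /* Map BAR 0 so its registers can be accessed with readl()/writel() */
        regs = ioremap(pci_resource_start(pdev, 0), pci_resource_len(pdev, 0));
        return regs;
}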

What is the difference between DMA and memory-mapped IO?

What is the difference between DMA and memory-mapped IO? They both look similar to me.
Memory-mapped I/O allows the CPU to control hardware by reading and writing specific memory addresses. Usually, this would be used for low-bandwidth operations such as changing control bits.
DMA allows hardware to directly read and write memory without involving the CPU. Usually, this would be used for high-bandwidth operations such as disk I/O or camera video input.
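As a rough illustration of that split, here is a sketch contrasting a CPU-driven MMIO copy loop with handing a buffer to the device for DMA; MY_FIFO_REG and the transfer direction are invented for the example, and a real driver would also check dma_mapping_error().

#include <linux/io.h>
#include <linux/dma-mapping.h>

#define MY_FIFO_REG 0x10   /* made-up data register offset */

/* MMIO, CPU-driven: the CPU moves every word itself (low bandwidth). */
static void mmio_copy(void __iomem *regs, u32 *dst, size_t nwords)
{
        size_t i;

        for (i = 0; i < nwords; i++)
                dst[i] = readl(regs + MY_FIFO_REG);
}

/* DMA, device-driven: map the buffer, hand the bus address to the device,
 * and let the hardware move the data itself (high bandwidth). */
static dma_addr_t dma_prepare(struct device *dev, void *buf, size_t len)
{
        return dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
}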
Here is a paper that has a thorough comparison between MMIO and DMA:
Design Guidelines for High Performance RDMA Systems
Since others have already answered the question, I'll just add a little bit of history.
Back in the old days, on x86 (PC) hardware, there was only I/O space and memory space. These were two different address spaces, accessed with different bus protocol and different CPU instructions, but able to talk over the same plug-in card slot.
Most devices used I/O space for both the control interface and the bulk data-transfer interface. The simple way to access data was to execute lots of CPU instructions to transfer data one word at a time from an I/O address to a memory address (sometimes known as "bit-banging.")
The ISA bus protocol had no support for devices to initiate transfers and move data to host memory autonomously, so a compromise solution was invented: the DMA controller. This was a piece of hardware that sat near the CPU and initiated transfers to move data from a device's I/O address to memory, or vice versa. Because the I/O address is the same, the DMA controller is doing exactly the same operations as a CPU would, just a little more efficiently and with some freedom to let the CPU keep running in the background (though possibly not for long, since the CPU can't talk to memory while the DMA controller holds the bus).
Fast-forward to the days of PCI, and the bus protocols got a lot smarter: any device can initiate a transfer. So it's possible for, say, a RAID controller card to move any data it likes to or from the host at any time it likes. This is called "bus master" mode, but for no particular reason people continue to refer to this mode as "DMA" even though the old DMA controller is long gone. Unlike old DMA transfers, there is frequently no corresponding I/O address at all, and the bus master mode is frequently the only interface present on the device, with no CPU "bit-banging" mode at all.
Memory-mapped IO means that the device registers are mapped into the machine's memory space - when those memory regions are read or written by the CPU, it's reading from or writing to the device, rather than real memory. To transfer data from the device to an actual memory buffer, the CPU has to read the data from the memory-mapped device registers and write it to the buffer (and the converse for transferring data to the device).
With a DMA transfer, the device is able to directly transfer data to or from a real memory buffer itself. The CPU tells the device the location of the buffer, and then can perform other work while the device is directly accessing memory.
Direct Memory Access (DMA) is a technique to transfer data from I/O to memory and from memory to I/O without the intervention of the CPU. For this purpose, a special chip called a DMA controller is used to control all the activities and synchronization of the data. As a result, compared with other data transfer techniques, DMA is much faster.
On the other hand, virtual memory acts as a cache between main memory and secondary memory. Data is fetched in advance from secondary memory (the hard disk) into main memory so that it is already available in main memory when needed. It allows us to run more applications on the system than the physical memory alone could support.
