How does an IO device know that a value in memory pertaining to it has changed in memory mapped IO?
For example, let's say memory address 0 has been dedicated to hold the background color for a VGA device. How does the VGA device know when we change the value in memory[0]? Is the VGA device constantly polling the memory location? Or does the CPU somehow notify the device when it changes the value (and if so how?)?
An example architecture is MIPS. Given that the MIPS instruction set does not have in or out instructions, I don't understand how it could possibly communicate (on change) with the VGA device in the example. Another example is the ARM architecture.
In memory-mapped I/O, performing a memory read/write to the device's memory region will cause the CPU to perform a transaction with the device to fetch/store that value -- either directly through the CPU's memory bus, or through a secondary bus (such as AHB/APB on ARM systems). This memory transaction directly notifies the device that a value is being changed; no separate notification is necessary.
You're assuming that memory-mapped I/O is backed by normal RAM. This is not the case. Indeed, these devices may behave in ways which are entirely unlike real memory! For instance, a typical UART or SPI device implementation may have a single data register which can be written to in order to transmit data, or read from to retrieve received data. Similarly, it's not uncommon for interrupt registers to have "clear on read" or "write 1 to clear" semantics.
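As a concrete illustration of the question's example, here is a minimal bare-metal C sketch (the register address is hypothetical; on a real board it would come from the memory map): a store through a volatile pointer is an ordinary sw/str instruction, and the resulting bus write transaction is exactly what the VGA device decodes.
#include <stdint.h>
/* Hypothetical address of the VGA background-colour register. */
#define VGA_BG_COLOR  (*(volatile uint32_t *)0xBF000000u)
void set_background(uint32_t rgb)
{
    /* The store below becomes a write transaction on the bus; the VGA
     * device decodes the address and latches the new colour immediately,
     * so no separate notification or polling by the device is needed. */
    VGA_BG_COLOR = rgb;
}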
For what it's worth: in practice, many framebuffer graphics implementations do actually behave as normal memory. What's different is that the memory is stored in a dual-ported RAM (or a time-multiplexed bus), and the video RAMDAC continuously reads through that memory to transmit its contents to an attached display.
A region of the physical address space that is designated as memory-mapped I/O (MMIO) is not mapped to main memory (system memory); it's mapped to I/O registers which are physically part of the I/O device.
To determine how to handle a memory access (read or write), the processor first checks the type of the region to which the target address belongs. In any MIPS processor, there are at least two types: Uncached and Cached. MMIO regions are always Uncached. An Uncached access to ordinary memory is sent directly to the main memory controller without examining or affecting any of the caches; an Uncached access to an I/O region, however, is sent to an I/O controller, and the request eventually reaches the destination I/O device.
Now, exactly how the CPU and the I/O device communicate with each other is completely specified by the I/O device itself. So an I/O device would have a specification that discusses how many I/O registers there are and how each of them should be used. An I/O register could hold status flags, control flags, data to be read or written by the CPU, or some combination thereof. Note that since the I/O registers are physically part of the I/O device, the device can be designed to detect when any of its registers is read or written and take an action accordingly if required.
An I/O device can send an interrupt to the CPU to inform it that some data is available, or that it needs attention for some other reason. The CPU can also poll the I/O device periodically by checking some status flag(s) and then take action accordingly.
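For example, here is a minimal polling sketch (the UART register offsets, the status bit and the KSEG1 base address are invented for illustration; a real device's datasheet defines the actual layout):
#include <stdint.h>
/* Hypothetical UART registers placed in a MIPS KSEG1 (uncached) window. */
#define UART_BASE    0xA0001000u                 /* KSEG1 = 0xA0000000 | physical address */
#define UART_STATUS  (*(volatile uint32_t *)(UART_BASE + 0x0))
#define UART_DATA    (*(volatile uint32_t *)(UART_BASE + 0x4))
#define TX_READY     (1u << 0)
void uart_putc(char c)
{
    /* Polling: re-read the status register until the device reports that
     * its transmit buffer has room. Each read is a bus transaction that
     * goes to the device, not a cached memory read. */
    while ((UART_STATUS & TX_READY) == 0)
        ;
    UART_DATA = (uint32_t)c;   /* this store itself notifies the device */
}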
Based on what I have learned from the comments and answers (thanks everyone!), I edited the question to be more targeted:
DMA:
Before the first DMA transfer, the CPU has to set up things like the RAM address range reserved for the device to use for DMA. Once the setup work is done, can the device initiate transfers at will, basically owning that part of RAM, or does it still have to get some sort of permission from the CPU before every single DMA transfer?
MMIO:
CPU access to device memory via MMIO is more expensive than CPU access to RAM, but I can see on my desktop that PCI devices reserve hundreds of megabytes for MMIO. What is an example where this can be used efficiently (as opposed to copying the data back to RAM using DMA and then accessing it there)?
Look at it from the device's perspective. The device can:
directly access memory itself (using DMA)
wait for the CPU to transfer data to it (by providing memory-mapped IO for the CPU to use)
So the question is, if the CPU can access the PCIe memory by memory-maps, why does it have to do DMAs?
The CPU doesn't use DMA at all. The entire point of DMA is to allow the CPU to do other things (or nothing) while the device does the DMA. The end result is a significant performance increase for the system as a whole - e.g. CPU(s) doing lots of other work, while lots of devices (hard drive controller, video card, audio card, network card, ...) are also using DMA to transfer data around.
CPU can access this memory as if it is DRAM, by memory mapped IO.
You're misusing terminology.
Instead of "DRAM" you should be using the term "main memory", aka system memory or RAM.
On modern computers main/system memory is implemented by some type of SDRAM (synchronous dynamic RAM).
Conflating the functional term (e.g. main memory) with the hardware implementation (e.g. DDR3 SDRAM) seems harmless, but can lead to the false syllogism that "RAM is volatile" or other misunderstandings.
Memory mapping can put the memory/memories of a PCIe device in the same address space as main memory.
CPU can transfer a chunk of data from this PCIe device's memory into real physical memory, via DMA. And then CPU can access the physical memory freely.
"Real physical memory" is redundant. What other types of "physical memory" are there?
There's no "fake physical memory".
You seem to be referring to the use of a buffer in main memory as "DMA".
That is misguided.
DMA is not required in order to employ or copy data to a buffer in main memory.
So the question is, if the CPU can access the PCIe memory by memory-maps, why does it have to do DMAs?
You seem to be misusing terminology.
You might want to study this article on PCIe.
Is it because PCIe bus is slow for random access?
Accessing data from a PCIe device is very slow compared to main/system memory.
This has nothing to do with "random access".
Information (e.g. data retrieval) over the PCIe bus is accomplished with (high-speed) packets (even when the PCIe memory is mapped into processor address space).
And if so, DMA is basically a single dump to speedup frequent random access, and memory-mapped IO is for occasional access?
You're misusing terminology.
If the software is written inefficiently, or only needs to use the data once, then it might access the PCIe memory directly.
But if the software is going to access the data more than once or deems a "local" copy to be more efficient, then the software could allocate a buffer in main/system memory and copy the data from PCIe memory to main/system memory using either PIO (programmed I/O by the CPU) or DMA (direct memory access by a PCIe bus master or system DMA controller).
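As a sketch of the PIO alternative (Linux-kernel flavoured; the function itself is hypothetical, but ioremap(), kmalloc() and memcpy_fromio() are real kernel APIs), note that the CPU performs every read and write here, so no DMA is involved even though the destination is a buffer in main memory:
#include <linux/io.h>
#include <linux/slab.h>
/* Copy len bytes from a device's PCIe memory (at physical address
 * bar_phys) into a freshly allocated buffer in main/system memory. */
static void *copy_from_device(phys_addr_t bar_phys, size_t len)
{
    void __iomem *dev_mem = ioremap(bar_phys, len);   /* map the BAR */
    void *buf = kmalloc(len, GFP_KERNEL);             /* buffer in RAM */
    if (!dev_mem || !buf) {
        kfree(buf);
        if (dev_mem)
            iounmap(dev_mem);
        return NULL;
    }
    memcpy_fromio(buf, dev_mem, len);   /* CPU reads device, writes RAM */
    iounmap(dev_mem);
    return buf;   /* caller kfree()s the buffer when done */
}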
The use of buffers is widespread in computers.
A large part of "computing time" is spent on buffering and copying and moving data around.
I/O is almost always performed between a device and a buffer in main memory, even if direct device-to-device transfer is possible.
Do not mislabel the use of a buffer as "DMA".
For some info on DMA, see "Why driver need to map DMA buffers when dma-engine is in device?" and "dma vs interrupt-driven i/o".
DMA is usually done by the CPU programming registers on the device that are mapped into MMIO regions. It wouldn't make sense to map an entire hard drive into the physical address space - that would quickly use up the available physical address space on the chipset, which is often limited to as little as 39 bits on modern chipsets - so instead only the host controller (xHCI, AHCI, etc.) registers are mapped into the MMIO space. Mapping the whole drive would also mean the CPU had to use mov instructions to copy the data to the hard drive for the entire transfer, which occupies CPU bandwidth. Instead, DMA is asynchronous: the CPU issues a command to the device, and the device, the PCIe bus and the DRAM controller get on with it while the CPU is free.
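As a minimal sketch of what "programming registers on the device" looks like (the register layout below is entirely invented; a real controller such as AHCI or xHCI uses command lists and rings rather than three bare registers):
#include <stdint.h>
/* Hypothetical DMA-capable device with its control registers in MMIO. */
#define DEV_BASE       0xFEDC0000u
#define DEV_DMA_ADDR   (*(volatile uint64_t *)(DEV_BASE + 0x00))   /* bus address of the buffer */
#define DEV_DMA_LEN    (*(volatile uint32_t *)(DEV_BASE + 0x08))
#define DEV_DMA_CTRL   (*(volatile uint32_t *)(DEV_BASE + 0x0C))
#define DMA_START      (1u << 0)
void start_transfer(uint64_t buf_bus_addr, uint32_t len)
{
    /* A handful of small MMIO writes program the transfer... */
    DEV_DMA_ADDR = buf_bus_addr;
    DEV_DMA_LEN  = len;
    DEV_DMA_CTRL = DMA_START;
    /* ...then the device moves the bulk data itself and raises an
     * interrupt when it finishes, while the CPU is free to do other work. */
}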
With an IGPU without dedicated VRAM, you have VRAM in DRAM (GFX stolen memory), which is reserved for the IGPU and is of course accessible by both the IGPU and the CPU. You also have a GTT page table in DRAM that the IGPU uses to translate internal virtual addresses to physical pages, which it then accesses via DMA over the ring bus. The CPU renders there and programs the IGPU to perform DMA to read the data into the IGPU.
On a discrete GPU with VRAM, the CPU writes to DRAM, then inserts the address of the allocation into the GTT table in VRAM via the VRAM aperture, and then programs the GPU to copy from the equivalent GART aperture address that corresponds mathematically to that GTT entry - the aperture is a contiguous GPU device-local address space separate from VRAM. The GPU then reads from the aperture space, which causes it to index into the GTT, acquire the real system address of the data, and initiate a DMA transfer from that real system memory address to an arbitrary address in the 256 MiB VRAM aperture. There is also the option of using PCIe BARs or resizable BARs to expose a VRAM aperture to which the CPU can write directly, without the need for a copy. Another advantage of this is that a CPU core could interleave several transfers, or several cores could work on different transfers, whereas with DMA the GPU can likely only perform one DMA transfer at a time, sequentially/synchronously, with no concurrency or parallelism.
I'm writing this because I have some doubts about the behaviour of DMA.
I'm reading about the PCI layout and how device drivers interact with the card, and I read about DMA.
As I understand it, PCI cards don't have a DMA controller; instead they request to become bus master, and then they are able to take a DMA address and do transfers between memory and the device (through the bus).
This DMA address is a portion of RAM; actually it's a physical address, and before doing anything you need to convert it into something your driver can use, like a kernel virtual address.
I've checked that with this code:
/* Virtual kernel address */
kernel_buff = pci_alloc_consistent(dev, PAGE_SIZE, &dma_addr);
pr_info("Kernel buffer - %12p , Dma_addr - %12p\n", kernel_buff, (void *)dma_addr );
pr_info( "Kernelbuffer - dma_addr - %12p\n", kernel_buff - dma_addr);
strcpy(kernel_buff, "Test dma\n");
/* Test memory */
ptest = (void *)dma_addr;
ptest = phys_to_virt((unsigned long)ptest);
pr_info("Ptest virtual memory(%p) containts - %s\n", ptest, (char *)ptest);
And the output was:
[425971.835669] Kernel buffer - ffff8800ca70a000 , Dma_addr - ca70a000
[425971.835671] Kernelbuffer - dma_addr - ffff880000000000
[425971.835673] Ptest virtual memory(ffff8800ca70a000) containts - Test dma
This is how I understood that DMA is a portion of RAM.
My doubt is about how this transfer is made.
I mean, every time I write to this buffer, will the data the buffer contains be transferred to the device? Or only the address of the memory location, with the device then reading from this location?
This is about DMA.
And about I/O memory maps:
When we request an I/O memory region of the device with, for example:
pci_resource_start
Are we requesting the region of memory where the device's registers are located?
So in this way, do we have this memory location in RAM? And can we read/write it like normal memory locations?
And the final point is that we use DMA because I/O memory mapping only allows a few bytes per cycle, since that process involves the CPU, right?
So we can transfer large amounts of data between memory locations (RAM and the device's bus) without the CPU.
The steps involved in transferring the data to the device can be summarized as follows (a code sketch of these steps follows the list):
Assume that you have the data in a buffer.
The driver creates a DMA mapping for this buffer (say using pci_alloc_consistent() or the newer dma_alloc_coherent()), and returns the corresponding DMA bus address.
This DMA bus address has to be communicated to the device. This is done by writing into the correct DMA registers of the device through writel() (assuming that the device registers are memory mapped).
The device also needs to be informed about the amount of data that is being transferred and such (by writing to the appropriate registers of the device using writel())
Now issue the command to the device to start the DMA transactions by writing to one of its control registers (again possibly using writel()).
Once the data transaction is completed, the device issues an interrupt.
In the interrupt handler, the driver may free the buffer which was used for the transaction and also perform the DMA unmapping.
And there you have it: the data has been transferred to the device!
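Putting the steps above into code, here is a minimal sketch for a hypothetical PCI device (the register offsets REG_DMA_ADDR/REG_DMA_LEN/REG_DMA_START are invented, and error handling is kept to a minimum):
#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/io.h>
#define REG_DMA_ADDR   0x00   /* hypothetical device register offsets */
#define REG_DMA_LEN    0x08
#define REG_DMA_START  0x0C
static void *cpu_buf;          /* CPU-visible address of the buffer */
static dma_addr_t dma_handle;  /* DMA bus address of the same buffer */
static int start_dma(struct device *dev, void __iomem *regs, size_t len)
{
    /* Steps 1-2: allocate a coherent buffer and get its DMA bus address. */
    cpu_buf = dma_alloc_coherent(dev, len, &dma_handle, GFP_KERNEL);
    if (!cpu_buf)
        return -ENOMEM;
    /* Steps 3-4: tell the device where the buffer is and how big it is. */
    writel((u32)dma_handle, regs + REG_DMA_ADDR);
    writel((u32)len, regs + REG_DMA_LEN);
    /* Step 5: kick off the transfer; the device raises an interrupt when done. */
    writel(1, regs + REG_DMA_START);
    return 0;
}
/* Steps 6-7: in the interrupt handler the driver would acknowledge the device
 * and eventually release the mapping with
 * dma_free_coherent(dev, len, cpu_buf, dma_handle). */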
Now coming to the question regarding the IO memory maps :
First of all, when we call pci_resource_start(), we do not "request" the I/O region; this is just how we gather information about it. The actual request is done using pci_request_regions(). To be specific to your questions:
Are we requesting the region of memory where the device's registers are located?
Using this, we are requesting the kernel to give access to this region of memory (memory mapped ports) where the device's registers are located.
So in this way, do we have this memory location in RAM?
No, we do not have this memory location in RAM; it is only memory-mapped, which means that the device shares the same address, data and control lines with the RAM, and hence the same instructions that are used to access RAM can also be used to access the device registers.
You've answered your last question yourself. DMA allows huge amounts of data to be transferred efficiently. But there are cases where you need to use the memory mapping to transfer the data. The best example is already stated in the explanation of the DMA transaction process, where you need to transfer the address and control information to the device. This can be done only through memory-mapped IO.
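For completeness, a minimal sketch of discovering, requesting and mapping the device's register region (BAR 0 is assumed, the STATUS_REG offset and the "mydrv" name are invented, and error handling is abbreviated):
#include <linux/pci.h>
#define STATUS_REG 0x10   /* hypothetical register offset within BAR 0 */
static int map_device_registers(struct pci_dev *pdev)
{
    resource_size_t start = pci_resource_start(pdev, 0);  /* information only */
    resource_size_t len   = pci_resource_len(pdev, 0);
    void __iomem *regs;
    u32 status;
    if (pci_request_regions(pdev, "mydrv"))   /* actually claim the region */
        return -EBUSY;
    regs = pci_iomap(pdev, 0, len);           /* map BAR 0 into kernel space */
    if (!regs)
        return -ENOMEM;
    status = readl(regs + STATUS_REG);        /* an ordinary load that goes to the device */
    pr_info("device registers at %pa, status 0x%x\n", &start, status);
    return 0;
}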
Hope this helps.
This is more of a general question. Consider an external device. From time to time this device writes data via its device driver to a specific memory address. I want to write a small C program which reads out this data. Is there a better way than just polling this address to check if the value has been changed? I want to keep the CPU load low.
I have done some further research
Is "memory mapped IO" an option? My naive idea is to let the external device writes a flag to a "memory mapped IO"-address which triggers a kernel device driver. The driver then "informs" the program which proceed the value. Can this work? How can a driver informs the program?
The answer may depend on what processor you intend to use, what the device is and possibly whether you are using an operating system or RTOS.
Memory-mapped I/O per se is not a solution; it simply refers to I/O device registers that can be directly addressed via normal memory access instructions. Most devices will generate an interrupt when certain registers are updated or contain new valid data.
In general if using an RTOS you can arrange for the device driver to signal via a suitable IPC mechanism any client thread(s) that need to handle the data. If you are not using an RTOS, you could simply register a callback with the device driver which it would call whenever the data is updated. What the client does in the call back is its business - including reading the new data.
If the device in question generates interrupts, then the handling can be done on interrupt; if the device is capable of DMA, it can handle blocks of data autonomously before the DMA controller generates a DMA interrupt to a handler.
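A minimal sketch of the callback approach mentioned above (the registration function and ISR hook names are invented; a real driver or RTOS would provide its own API for this):
#include <stdint.h>
#include <stddef.h>
typedef void (*data_callback_t)(const uint8_t *data, size_t len);
static data_callback_t client_cb;   /* driver-side storage for the callback */
/* Called by the client to ask the driver to notify it of new data. */
void register_data_callback(data_callback_t cb)
{
    client_cb = cb;
}
/* Called from the driver's interrupt (or DMA-complete) handler when the
 * device has written fresh data; no polling by the client is needed. */
void data_ready_isr(const uint8_t *data, size_t len)
{
    if (client_cb)
        client_cb(data, len);
}
/* Client side: register a handler once, then simply react to callbacks. */
static void on_new_data(const uint8_t *data, size_t len)
{
    (void)data;
    (void)len;   /* consume the freshly written data here */
}
void client_init(void)
{
    register_data_callback(on_new_data);
}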
Given the starting memory address & word count, the DMA controller transfers data while the CPU works on some other process.
The Input/Output processor, too, handles I/O operations given a starting address & word count.
(correct me if I'm in error)
So what's the difference in functionality between IOP & DMA controller?
In the case of memory-specific I/O operations (a simple example is an instruction like lw $t1, 16($t2) on a MIPS processor), the CPU needs to get the data from memory in order to carry out the I/O operation. So the CPU has to pause any other work and monitor the memory READ/WRITE operation until it is completed. In other words, without DMA the CPU is fully occupied as long as a read/write operation is in progress; if the processor were free during this time, it could have executed other instructions.
Direct Memory Access (DMA):
DMA provides the capability to carry out memory-specific operations with minimal CPU intervention. When an I/O device needs a memory access, it sends a DMA request to the CPU. The CPU initiates the transfer by providing the appropriate grant signals for the bus and passes control to the DMA controller, which manages the rest of the data transfer and moves the data directly to or from the I/O device. During this time, the CPU continues with other instructions. Once the read/write operation is completed (or an exception occurs), the DMA controller raises an interrupt and notifies the processor of the status of the read/write operation.
In this way the read/write operation is carried out while the CPU also executes other instructions. However, initiating a DMA transfer still requires CPU intervention; even so, the overall performance is greatly improved.
I/O processor
You can think of an I/O processor along the lines of the DMA approach.
The I/O processor, generally used in large computer systems, is a coprocessor which is capable of executing instructions in addition to transferring data. Incidentally, the coprocessor's instruction set is different from that of the central processing unit.
The CPU can set up an I/O-specific program by initializing the basic operations, such as enabling the data path and setting up the I/O devices participating in the operation. It then hands the task over to the I/O processor, which carries out the rest of the work and notifies the processor upon completion. The processor meanwhile executes other important instructions.
The I/O processor is essentially a small, dedicated DMA-style processor that can execute limited input and output instructions and can be shared by multiple peripherals.
The I/O processor solves two problems:
The CPU is still burdened with input/output work. Although DMA does not require the CPU for the data exchange between peripherals and memory, it only reduces that burden, because the initialization of every transfer is still done by the CPU.
The sharing of a limited number of DMA interfaces by high-speed devices in large computer systems. A large system has so many peripherals that they have to share the limited DMA interfaces (in small computer systems such as PCs, each high-speed device can be assigned its own DMA channel).
DMA is a hardware module able to transfer data between a peripheral (UART, SPI, DAC, ADC) and memory, or between two different memory addresses, without consuming CPU processing time. Generally, configuring a DMA module involves setting up a memory destination address and a source address; users are also able to configure options such as buffer data size, automatic address increment and circular buffering. Moreover, this kind of module emits an IRQ signal at the end of the data transfer.
There is a DMA configuration example below for the microcontroller STM32F373. The example shows a DMA configuration between sigma-delta ADC and a memory buffer.
DMA_InitTypeDef DMA_InitStructure;
RCC_AHBPeriphClockCmd(RCC_AHBPeriph_DMA2, ENABLE);
DMA_DeInit(DMA2_Channel3);
/* DISABLE the DMA SDADC1 channel */
DMA_Cmd(DMA2_Channel3, DISABLE);
/* DMA channel SDADC1 Configuration */
DMA_InitStructure.DMA_BufferSize = bufferSize;
DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t)&SDADC1->JDATAR;
DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_HalfWord;
DMA_InitStructure.DMA_MemoryBaseAddr = (uint32_t)memoryAddress;
DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable; /* advance the memory pointer after each transfer */
DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_HalfWord;
DMA_InitStructure.DMA_DIR = DMA_DIR_PeripheralSRC;
DMA_InitStructure.DMA_Priority = DMA_Priority_High;
DMA_InitStructure.DMA_Mode = DMA_Mode_Circular;
DMA_InitStructure.DMA_M2M = DMA_M2M_Disable;
DMA_Init(DMA2_Channel3, &DMA_InitStructure);
/* Avoid interrupt on DMA ENABLE */
DMA_ClearITPendingBit(DMA2_IT_TC3);
// Enable DMA2 Channel Transfer Complete interrupt
DMA_ITConfig(DMA2_Channel3, DMA_IT_TC, ENABLE);
/* Enable the DMA channel */
DMA_Cmd(DMA2_Channel3, ENABLE);
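As a usage note, a matching transfer-complete handler might look like the sketch below (it assumes the DMA2_Channel3 interrupt has also been enabled in the NVIC, and process_sdadc_samples() is a hypothetical application function):
void DMA2_Channel3_IRQHandler(void)
{
    if (DMA_GetITStatus(DMA2_IT_TC3) != RESET)
    {
        /* Acknowledge the transfer-complete interrupt */
        DMA_ClearITPendingBit(DMA2_IT_TC3);
        /* The buffer at memoryAddress now holds bufferSize fresh samples;
           hand them off to the application here. */
        process_sdadc_samples();
    }
}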
Regarding the I/O processor, I didn't understand at all what you meant. But I can say that GPIO hardware modules are able to map general digital inputs/outputs to a memory address, i.e. the I/O has a memory address, but read and write operations are in fact performed on a peripheral register.
What is the difference between DMA and memory-mapped IO? They both look similar to me.
Memory-mapped I/O allows the CPU to control hardware by reading and writing specific memory addresses. Usually, this would be used for low-bandwidth operations such as changing control bits.
DMA allows hardware to directly read and write memory without involving the CPU. Usually, this would be used for high-bandwidth operations such as disk I/O or camera video input.
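To make the contrast concrete, here is a tiny sketch (the register addresses and layout are hypothetical):
#include <stdint.h>
#define CTRL_REG  (*(volatile uint32_t *)0x40000000u)   /* MMIO: control bits */
#define DMA_ADDR  (*(volatile uint32_t *)0x40000004u)   /* MMIO: buffer address register */
#define DMA_GO    (*(volatile uint32_t *)0x40000008u)
static uint8_t frame[4096];   /* ordinary buffer in main memory */
void example(void)
{
    /* Memory-mapped I/O: the CPU itself flips a low-bandwidth control bit. */
    CTRL_REG |= 1u;
    /* DMA: the CPU only tells the device where the buffer is; the device
     * then reads or writes `frame` in main memory on its own. */
    DMA_ADDR = (uint32_t)(uintptr_t)frame;
    DMA_GO   = 1u;
}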
Here is a paper with a thorough comparison between MMIO and DMA:
Design Guidelines for High Performance RDMA Systems
Since others have already answered the question, I'll just add a little bit of history.
Back in the old days, on x86 (PC) hardware, there was only I/O space and memory space. These were two different address spaces, accessed with different bus protocol and different CPU instructions, but able to talk over the same plug-in card slot.
Most devices used I/O space for both the control interface and the bulk data-transfer interface. The simple way to access data was to execute lots of CPU instructions to transfer data one word at a time from an I/O address to a memory address (sometimes known as "bit-banging.")
There was no support in the ISA bus protocol for devices to initiate transfers and move data to host memory autonomously. A compromise solution was invented: the DMA controller. This was a piece of hardware that sat up by the CPU and initiated transfers to move data from a device's I/O address to memory, or vice versa. Because the I/O address is the same, the DMA controller is doing the exact same operations as a CPU would, but a little more efficiently, and it allows the CPU some freedom to keep running in the background (though possibly not for long, since the CPU can't talk to memory while the DMA controller owns the bus).
Fast-forward to the days of PCI, and the bus protocols got a lot smarter: any device can initiate a transfer. So it's possible for, say, a RAID controller card to move any data it likes to or from the host at any time it likes. This is called "bus master" mode, but for no particular reason people continue to refer to this mode as "DMA" even though the old DMA controller is long gone. Unlike old DMA transfers, there is frequently no corresponding I/O address at all, and the bus master mode is frequently the only interface present on the device, with no CPU "bit-banging" mode at all.
Memory-mapped IO means that the device registers are mapped into the machine's memory space - when those memory regions are read or written by the CPU, it's reading from or writing to the device, rather than real memory. To transfer data from the device to an actual memory buffer, the CPU has to read the data from the memory-mapped device registers and write it to the buffer (and the converse for transferring data to the device).
With a DMA transfer, the device is able to directly transfer data to or from a real memory buffer itself. The CPU tells the device the location of the buffer, and then can perform other work while the device is directly accessing memory.
Direct Memory Access (DMA) is a technique for transferring data from I/O to memory and from memory to I/O without the intervention of the CPU. For this purpose, a special chip, called the DMA controller, is used to control all the activity and synchronization of the transfer. As a result, compared to other data transfer techniques, DMA is much faster.
Virtual memory, on the other hand, acts as a cache between main memory and secondary memory. Data is fetched in advance from secondary memory (the hard disk) into main memory so that it is already available in main memory when needed. It allows us to run more applications on a system than its physical memory alone could support.