Peer-to-peer communications between PCIe devices? - p2p

In order to enable p2p communication between NVMe SSDs and other PCIe devices, I wonder if I need to make some modifications to the Linux kernel and the NVMe driver, and something else? And what modifications should I make? I have searched for some days, but there is little information about this, I need some references.
Thanks a lot!

There's an NVMe spec. feature called "Controller Memory Buffer Write Data". In turn CMB Write Data supports P2P transactions between an NVMe device and something else on the PCIe bus. While most of the CMB spec has been implemented in the latest upstream kernel, CMB Write Data has not. Problem with CMB is it offers very little memory to play with for these P2P transactions. This is one of the reasons no one has implemented the feature upstream yet. You can however play with it via the user-space NVMe driver/framework, SPDK.
Also note that there's a new NVMe spec. feature called Persistent Memory Region. This has far more memory to play with and I suspect someone will contribute it upstream in the not too distant future.
https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2018/20180808_SOFT-201-1_Bates.pdf is useful background material.

Related

Implement SDIO to interface SPI device

People,
I have always seen references about how to use a SPI interface to operate a SD memory card.
This is not what I want. I need to do exactly the opposite.
I want to be able to use the SDIO controller (through SD slot) in my "host" (any PC having a SD-card interface) to talk to my devices (basically microcontrollers) that can only "speak" SPI.
If my understanding is not too wrong, I cannot simply tell my SD controller to talk in a raw SPI mode but I can teach my microcontrollers to behave as a SDIO device that can be controlled by my host.
This way I still have two challenges left:
Correctly implement a generic SDIO device in my microcontroller.
Implement/configure the correct drivers in the host to be able to interact with my devices.
Implement the SDIO device seems to be a matter of following the spec.
The host-side driver, though, is something I hope I can accomplish with a user-space driver in Linux using some already existing kernel-space driver to SDIO.
That's the point that I come to ask for help.
Can anyone please point me any samples, documents or any kind of resources that can help me in my task?
On the PC side, this is all you need: http://sourceforge.net/projects/sdio-linux/
This may be useful for reference: http://www.varsanofiev.com/inside/WritingLinuxSDIODrivers.htm (although, I don't think you would be writing a driver)
On the microcontroller side, use "bit-banging" to implement the SDIO spec.
However, first consider why do this. SDIO and SPI are just serial protocols, so is USB; wouldn't you rather make an SPI-to-USB bridge? USB is much more user-friendly on the host side, as well as being more standard/more common. And if you do want a SPI-to-USB bridge, turns out it already exists, the SPI Shortcut (probably other options, this is just the first one that comes to mind)
EDIT Or, you could bit-bang I2C on the micro, if the host supports I2C (many do). Actually, go through every serial protocol the host supports, and see if you can support it easily from the micro side (by bit banging, since the micro is likely to not have a slave mode for that protocol built-in). RS232 (with level shifter), I2C, and SPI are likely to be the preferred choices. SDIO is pretty much the last choice, I think.
SDIO is very tightly specified. Unless your microcontroller has an SDIO block that is designed to act as a device rather than host, I don't think this will be possible. I know of a few special purpose communications controllers that implement an SDIO device, but I haven't come across any general purpose microcontrollers.
You would need a fairly fast microcontroller to be able to bit-bang SDIO initialization at up to 400 kHz. If running an STM32F4 at 180 MHz, this gives you only microcontroller cycles between SDIO clock cycles. If the host turns up the clock speed to the maximum of 25 MHz after initialization, then you're down to 7 cycles between SDIO clocks.
For perspective on the SDIO spec, the one you linked is a simplified spec that doesn't cover the signalling and timing of the bus. The full spec is many times larger.
As Alex I mentioned, there may be better alternatives for what you need. If your SDIO host supports SPI mode, most microcontrollers do have SPI peripherals that can act as slaves rather than hosts, so this may be an avenue without a peripheral. If your data rates are low enough, a simple UART may suffice (you can reasonably hit 1 Mbit over short distances).

Linux user space PCI driver

I'm trying to write a PCI device driver that runs in user space. Not my idea, what the client wants. Target is an embedded Linux board that will never have more than a single user. I'm an experienced C programmer and know Linux, just not familiar with Linux driver development.
Is this really a device driver or just a library? Do I need to use the typical calls pci_register_driver, etc. or can I just access the device using fopen, and using mmap and ioperm to get to it?
Interrupts will be done using the MSI model. Also need to handle DMA transfers. The device will be streaming lots of data to the user.
There's not much info out there on this subject, LDD3 only devotes a couple of pages to it, and there's nothing else that I could find here on SO.
Thanks in advance!
If there is no driver handling the PCI card it would be possible to access it using ioperm (or iopl - depending on the address) if only port accesses are required.
Using DMA and interrupts is definitely impossible without a kernel-mode driver.
By googleing I found some text about something like a "generic kernel-mode driver" that allows writing user-mode drivers (including DMA and interrupts).
You should ask your customer which kind of kernel-mode drivers for accessing PCI cards is installed on the Linux board.
There is now a proper way to do high performance userspace PCI drivers, called vfio. There is not much documentation, but see the kernel docs http://lxr.free-electrons.com/source/Documentation/vfio.txt and the header file /usr/include/linux.vfio.h. It is available since Linux 3.6.

Is it possible to write to multiple devices that use different PCIe lanes on the same PCIe slot?

I am writing a Linux device driver which supports multiple devices. I have a x8 PCIe card with 4 of these devices on it. Each runs through a PCIe switch and gets 2 PCIe lanes. Is there a way to have the driver write to multiple lanes at the same time? If so, how would I do this? I would think it should be possible since it is all on one PCIe slot, but I have no idea how this would be done from the driver.
PCIe doesn't work quite the way you think it does. The switch is not partitioning up the upstream x8 link into multiple x2 links -- it simply forwards traffic from one link to another. So what you will see is a x8 link to the switch, and then 4 x2 links from the switch to the downstream devices. However with a different switch and different downstream devices, it would be equally possible to (for example) have x8 links everywhere, ie a x8 link from the root port to the switch and x8 links from the switch to the downstream devices.
However, in your case you have a matched amount of bandwidth on both sides of the switch, so there should be no issue with devices competing for a limited amount of bandwidth. Your driver can talk to all the devices simultaneously as efficiently as if there were independent links.
It sounds like you're looking for PCIe multicast. This has no connection to the number of lanes, but would simply be a function of delivering a single write to multiple destinations as efficiently as possible. There is a standard for this, mostly intended for backplane uses, see: http://www.pcisig.com/developers/main/training_materials/get_document?doc_id=12f5c260ccf5e054366d4c96ee655fa6827db5b3
It looks like this is supported with a new PCI BAR type, where multiple devices would have the same mapped physical address range, and the switch would also be configured to know about this multicast range. But this all needs OS support, and I haven't found anything on the web to suggest that Linux has the pieces necessary to configure the devices to do all this.
Since your parent link has enough bandwidth to saturate all four child links, you don't have a throughput problem. The only thing you'd save with multicast is bandwidth from the memory subsystem. If you have a modern architecture, the amount you'd save would be in the noise.
In other words, don't worry about it. Treat your devices as independent (this will make for a cleaner driver, anyway) and get on with your project.

Which h/w knoledge needed to have to become a Device driver programmer?

I am very interested in device driver programming .I have started reading the LDD3 ,there author said
"to become a device driver programmer you have to understand your specific device well"
can any one tell me what is the meaning of the "understand your specific device" .what are the thing I should know before writing a device driver.
Thanks
That's a basic list with software and hardware combined.
The Operating System driver API
The processor architecture as it relates to hardware interfacing
The 'bus' structure that interfaces the device hardware to the processor
Interrupt Handling
Dma Control
Processor Caching
Processor MMU control
OS Semaphores and scheduling
Data/Byte Alignment
Assembly language when needed
Control of Instruction Execution Order and Optimization
Consideration of performance issues
What's IO and memory mapped hardware ?
http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/io.html
This link talks about generic hardware access in Linux device drivers.
http://www.linuxforu.com/2011/06/generic-hardware-access-in-linux/
This is specifically about USB hardware
http://www.beyondlogic.org/usbnutshell/usb2.shtml
Check lwn.net it never disappoints a device driver developer.
https://lwn.net/Archives/
Last, but not least, They have everythig, CPU, Memory, Camera, PCI..
http://www.hardwaresecrets.com/page/memory
-
Hi, I'm very pleased to share what I have learnt with you guys.
Yes, it is the basic need to know your device if you wanna be a device driver programmer. I also want to be a linux device driver programmer, or even more, though I have some device driver experience under other software platforms.
The reason why you wanna contact with it is that you wanna make it do something for you.
Normally, what it can do is the first thing you must know. It's very obvious that you will never send Ethernet frames through UART or SPI, right?
There exist various kinds of devices in the world, such as, storage device, FLASH, SD card, harddisk; communicating devices, network card, wifi; interconnected bus, PCI-express; how many there are.
After that, the next thing you will concern is how you can do to reach your goal.
To access the device, reading, or writing, there is usually a controller embedded in the processor. Here, when I say "processor", it means it is a core integrated with various kinds of controllers, no matter pc desktop or embeded system areas.
The Controller is the interface you will face, to work on the device behind the controller.
Through the controller, you can ask the device to do what you want. In the controller, there are registers, which are the deepest points the software can touch. Beyond that lies the hardware, as you are a device driver programmer, it is very common for you to communicate with hardware engineers to make things done.
If going into details about the register, there are control registers used to tell device what you want it to do, status registers, used to reflect the status of operations underway in devices, if interrupt is supported by that device, there are also some registers for you to deal with interrupts.
Well, I almost forget there are also data registers, whicha are used to store the data to be sent or written, or to be read by user. According to the specific implementation, registers used to store data from upper users to be sent or written and registes used to hold data from outside that will be read by users may be the same, or may be not.
In a normal way, if you wanna let someone do something for you, you should provide something to him first. Whoever wants to do anything, there must be some inputs to him, right?
in a summary,
action(read,write,or others) + data(you give, or you ask for) + status(what progress it is)
what it can do
how it does that, how to assemble the command cell, time sequence?
what you must provide it to reach your goal
genarally, two kinds of things you should provide:
if you ask for, where to store what you ask for;
if you give, where are what you give
how it reflects the progress of operations, polling or interrupts
Well, that is all I want to share with you.
Thanks.

Best way to transfer video data to a device over PCI in linux

I need to transfer video data to and from an FPGA device over PCI in a linux environment. I'm using a third party PCI master core on the FPGA. So far, I've implemented a simple DMA controller on the FPGA to transfer data from the FPGA to the CPU, using consecutive PCI write bursts.
Next, I need to transfer video data from the CPU to the FPGA. What is the best way to go about this?
Should I implement a module on the FPGA which performs a whole bunch of burst reads over PCI. Or is there a way to get the CPU to efficiently write data into the FPGA's memory using PCI write bursts?
My bandwidth requirements are around 30 MB/s in both directions.
Thanks.
You could do posted writes from CPU like what video card drivers do but you'll need to have some driver magic such as setting MTRR (which means you might have some architectural dependency). If you want to be safe DMA read from FPGA is a better way to go. 30MB/s isn't much.
Sounds to me the FPGA should master both reads and writes. Otherwise you would hog the host CPU. That's a classic task for a DMA (and you cannot guarantee a DMA exists on every host).

Resources