Is vfio-pci a userspace driver for all pci devices? - linux

I know that vfio can expose interrupts, DMA and PCI I/O to userspace. I read that if someone wants to take advantage of vfio for PCI devices, they have to unbind the original driver and bind the device to the vfio-pci driver. So my question is: is vfio-pci a userspace driver for all PCI devices? In my understanding, vfio just offers some basic interfaces. Or, if I need a driver for a new PCI device, should I just use the vfio-pci driver, or use the interfaces it offers to write a new driver?
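For reference, the unbind-and-rebind step mentioned in the question is normally done through sysfs. The following is only a minimal sketch of one common way to do it (the driver_override route), assuming the vfio-pci module is already loaded and the caller has root privileges. The function names and the sysfs_root parameter are made up for illustration; the parameter exists only so the code can be pointed at a scratch directory.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short string to a sysfs attribute. Returns 0 on success, -1 on failure. */
static int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)   /* sysfs write errors often surface at flush time */
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Rebind one PCI function to vfio-pci.
 * sysfs_root is normally "/sys"; bdf is an address such as "0000:01:00.0".
 * Both names are illustrative, not part of any kernel or library API.
 */
int rebind_to_vfio_pci(const char *sysfs_root, const char *bdf)
{
    char path[512];

    assert(sysfs_root && bdf);

    /* 1. Tell the PCI core that only vfio-pci may claim this device. */
    snprintf(path, sizeof path, "%s/bus/pci/devices/%s/driver_override",
             sysfs_root, bdf);
    if (write_attr(path, "vfio-pci") != 0)
        return -1;

    /* 2. Detach the current driver; a failure here may just mean none was bound. */
    snprintf(path, sizeof path, "%s/bus/pci/devices/%s/driver/unbind",
             sysfs_root, bdf);
    (void)write_attr(path, bdf);

    /* 3. Ask the PCI core to probe the device again. */
    snprintf(path, sizeof path, "%s/bus/pci/drivers_probe", sysfs_root);
    return write_attr(path, bdf);
}
```

A call would look like rebind_to_vfio_pci("/sys", "0000:01:00.0"). After a successful rebind, the device is reached from userspace through the VFIO group and device file descriptors rather than through its original driver.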

Related

What kernel flags and PCIe setting needed for bus mastering

I am having trouble getting bus-mastering DMA working on a device driver in Ubuntu with kernel 5.3.0-68-generic.
I have enabled bus mastering with pci_set_master (when using lspci -v, the PCIe device will have the bus_master flag) and I am allocating a DMA buffer with dma_alloc_coherent.
I take the dma_addr_t returned by the dma alloc and pass that to the device and then I use the kernel virtual address with a chrdev mmap driver to map the address into userspace (using remap_pfn_range) where a userspace driver can populate the DMA memory region.
It doesn't appear that the PCIe device can see the memory updates in the DMA region. Is there perhaps some DMA, IOMMU, or PCI setting I need to enable to allow the PCIe device to read back from system memory as the bus master?

Can PCI device address CPU PA directly if IOMMU (intel VT-D) is disabled

My understanding is that if a PCI device wants to do DMA reads and writes and the IOMMU is enabled, the driver should map the CPU physical address (PA) to a DMA address via pci_map_page (for non-coherent mappings); the PCI device can then use this DMA address, and the IOMMU will translate it back into a CPU PA.
My questions are:
Is it possible for a driver to disable the IOMMU for a given device?
If someone disables the IOMMU in the BIOS, does it mean any CPU PA can be used directly for DMA reads and writes?
The VT-d hardware allows setting pass-through separately for each device, but Linux does not currently provide a driver API to do it.
Yes, DMA from PCI/PCIe devices uses system physical addresses when the IOMMU is disabled, either in the BIOS or by using intel_iommu=off in the Linux command line.
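To see which of the two cases applies on a running system, one rough heuristic is to look at /sys/kernel/iommu_groups: when no IOMMU is active, that directory is typically empty or absent. The sketch below just counts its entries. The function name is made up for illustration, and a non-zero count does not by itself prove that DMA is being translated (for example, in pass-through mode groups exist but addresses are mapped one-to-one).

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Count the entries in an iommu_groups directory
 * (normally /sys/kernel/iommu_groups).
 * Returns -1 if the directory cannot be opened, otherwise the entry count.
 */
int count_iommu_groups(const char *dir_path)
{
    DIR *d;
    struct dirent *e;
    int n = 0;

    assert(dir_path);
    d = opendir(dir_path);
    if (!d)
        return -1;

    while ((e = readdir(d)) != NULL) {
        /* Skip the "." and ".." pseudo-entries. */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(d);
    return n;
}
```

A result of 0 or -1 from count_iommu_groups("/sys/kernel/iommu_groups") suggests that devices are doing DMA with system physical addresses, as described above.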

How existing kernel driver should be initialized as PCI memory-mapped?

Existing kernel drivers such as the Xilinx ones have a specific way of being registered (as a tty device) if they are mapped directly into the CPU memory map, as done here with the device tree:
https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842249/Uartlite+Driver
But in other cases, there is a PCIe device (such as an FPGA containing the Xilinx UART IPs) connected to the CPU.
How should we get the UART registered when it sits behind a PCIe device?
The driver I am trying to register over PCIe is the uartlite driver:
https://github.com/Xilinx/linux-xlnx/blob/master/drivers/tty/serial/uartlite.c
I think that what I probably need to do is:
Write a custom pci driver.
Need to prepare platform_device struct and then call the uart probe routine from pci driver:
ulite_probe(struct platform_device *pdev)
I've seen related questions from others using an FPGA with multiple devices connected, but it seems that there is no document or tutorial which describes how to do this.
Any comment, example or document is appreciated.
So something like an ARM CPU connected to an Artix FPGA over PCIe, right?
Yes, you would need a custom PCIe driver. The PCIe configuration and data spaces would have to be mapped. Have a look at the pci_resource_{start, len} and pci_ioremap_bar functions. You can then use pci_get_device to get a pointer to the struct device and retrieve the virtual address of the PCIe configuration space. The UART driver can then use the struct device pointer, and its register map should be at some offset from that virtual address, as per your design. You can invoke the probe call of the UARTlite IP driver from your own driver.
"Existing kernel drivers such as xilinx have specific way to be registered (as tty device), if they are mapped directly to cpu memory map as done here with device tree". Note that this is true only if we are talking about tty devices. A GPIO peripheral IP won't be exposed as a tty but under /sys/class/gpio.

How does PCI/PCIe devices init/register themselves in the Linux kernel?

When the kernel starts up, the PCI subsystem creates a pci_bus for each physical PCI bus, then the pci_bus is added to pci_root_buses (with its PCI configuration). But a PCI device driver registers itself with pci_register_driver, which adds the PCI driver to pci_bus_type.
My questions:
How does pci_bus_type know the PCI configuration?
What is the relationship between pci_bus_type and pci_root_buses?
The question is partially incomplete, but comments are too small to hold an answer, so I'll try to cover both here.
The kernel tries to abstract the physical implementation of the PCI(e) bus from the driver developer. A PCI bus on an NVidia Tegra is different from a PCI bus on a Freescale ARM or an x86_64 PCI bus, but it should be possible to register devices against them regardless of the real bus implementation.
The structure pci_root_buses is a list of abstract PCI buses, where the implementation could be different.
You can see this in the bus type structure, where function pointers are defined to allow each real bus to have a different implementation of how to treat a device. I think it's best if you read the PCI chapter in LDD3 and have a special look at Boot Time.
Also look at Configuration Time to see how the kernel matches drivers to hardware. The rough idea of PCI is that the kernel can discover the bus and map memory to each physical PCI device, allowing access to the PCI configuration space of the device. The driver developer registers a driver by calling pci_register_driver, thereby telling the kernel which driver functions to use for certain vendor IDs.
Looking at LDD3 again it seems the missing mapping that you might be looking for is the probe function:
int (*probe) (struct pci_dev *dev, const struct pci_device_id *id);
Pointer to the probe function in the PCI driver. This function is called by the PCI core when it has a struct pci_dev that it thinks this driver wants to control. A pointer to the struct pci_device_id that the PCI core used to make this decision is also passed to this function. If the PCI driver claims the struct pci_dev that is passed to it, it should initialize the device properly and return 0. If the driver does not want to claim the device, or an error occurs, it should return a negative error value. More details about this function follow later in this chapter.
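To make the match-then-probe step concrete, here is a small userspace model of it. This is not kernel code: the model_* types and functions are simplified, hypothetical stand-ins for struct pci_device_id, struct pci_dev and struct pci_driver, and the subsystem vendor/device fields are left out for brevity. Vendor 0x10ee is Xilinx; the device ID 0x7011 is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ANY_ID 0xffffffffu /* stands in for the kernel's PCI_ANY_ID wildcard */

/* Simplified stand-ins for the kernel structures (illustrative only). */
struct model_pci_id {
    uint32_t vendor, device;
    uint32_t class_code, class_mask;
};

struct model_pci_dev {
    uint32_t vendor, device, class_code;
    const char *bound_driver; /* NULL until a probe succeeds */
};

struct model_pci_driver {
    const char *name;
    const struct model_pci_id *id_table; /* terminated by an all-zero entry */
    int (*probe)(struct model_pci_dev *dev, const struct model_pci_id *id);
};

/* Return the first table entry that matches the device, or NULL. */
const struct model_pci_id *model_match(const struct model_pci_driver *drv,
                                       const struct model_pci_dev *dev)
{
    const struct model_pci_id *id;

    assert(drv && drv->id_table && dev);
    for (id = drv->id_table; id->vendor || id->device || id->class_mask; id++) {
        if ((id->vendor == MODEL_ANY_ID || id->vendor == dev->vendor) &&
            (id->device == MODEL_ANY_ID || id->device == dev->device) &&
            !((id->class_code ^ dev->class_code) & id->class_mask))
            return id;
    }
    return NULL;
}

/* Per driver/device pair: match against the ID table, then call probe.
 * Returns 1 if the driver claimed the device, 0 otherwise. */
int model_try_bind(const struct model_pci_driver *drv, struct model_pci_dev *dev)
{
    const struct model_pci_id *id = model_match(drv, dev);

    if (!id || dev->bound_driver)
        return 0;
    if (drv->probe(dev, id) != 0) /* non-zero: driver declined or failed */
        return 0;
    dev->bound_driver = drv->name;
    return 1;
}

/* Hypothetical driver that claims every device it is offered. */
static int example_probe(struct model_pci_dev *dev, const struct model_pci_id *id)
{
    (void)dev;
    (void)id;
    return 0; /* 0 means "I claim this device" */
}

static const struct model_pci_id example_ids[] = {
    { 0x10ee, 0x7011, 0, 0 },
    { 0, 0, 0, 0 }, /* terminator */
};

const struct model_pci_driver example_driver = {
    "example", example_ids, example_probe
};
```

In the real kernel the same two pieces appear as the id_table and probe members of struct pci_driver. The ID table is how the bus type learns which devices a driver wants, without the driver touching pci_root_buses itself.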
Kernel data structures
Bus type
Further reading
Linux Device Drivers 3rd edition - Chapter PCI
In Kernel Documentation about PCIe

How to access i2c device driver node

Situation 1:
I have an i2c chip driver as part of linux kernel. I can verify the i2c chip driver is in the kernel from kernel boot messages (my chip driver is mma8450)
dmesg:
mma8450 0-001c: uevent
I can also see this driver in (0x1c is i2c address of chip)
cat /sys/bus/i2c/devices/0-001c/name
mma8450
I cannot see a node for this driver under /dev. My question is: how can I create a node for this device in /dev so that I can access it from a user program?
Situation 2:
I build the same chip driver as a module instead of making it part of the kernel. I can load this module using insmod mma8450, but how can I create a node for this device when I don't have its major/minor numbers? (I cannot see major and minor numbers assigned to this driver in the mma8450 source code.)
Any help is appreciated
Regards
Load the kernel module:
modprobe i2c-dev
ls /dev/i2*
/dev/i2c-0
/dev/i2c-10
/dev/i2c-12
/dev/i2c-14
/dev/i2c-3
/dev/i2c-5
/dev/i2c-7
/dev/i2c-9
/dev/i2c-1
/dev/i2c-11
/dev/i2c-13
/dev/i2c-2
/dev/i2c-4
/dev/i2c-6
/dev/i2c-8
Find the major/minor numbers for your device:
cat /proc/devices
You should see a device for the i2c bus and one for the i2c device itself.
Create the device node for the i2c device driver:
mknod /dev/[device name] [type] [major] [minor]
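The lookup in /proc/devices can also be done from a program. The sketch below parses text in the /proc/devices layout and returns the major number listed for a given name in the "Character devices:" section. Both function names are made up for illustration. Note that a driver only shows up there if it registers a character device of its own, so not finding a name is a normal outcome.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find a character-device major number by name in text laid out like
 * /proc/devices. Returns the major number, or -1 if the name is not
 * listed in the "Character devices:" section.
 */
int find_char_major(const char *text, const char *name)
{
    int in_char_section = 0;

    assert(text && name);
    while (*text) {
        size_t full = strcspn(text, "\n"); /* length of the current line */
        char line[128];
        size_t len = full < sizeof line ? full : sizeof line - 1;
        int major;
        char dev[64];

        memcpy(line, text, len);
        line[len] = '\0';

        if (strncmp(line, "Character devices:", 18) == 0)
            in_char_section = 1;
        else if (strncmp(line, "Block devices:", 14) == 0)
            in_char_section = 0;
        else if (in_char_section &&
                 sscanf(line, "%d %63s", &major, dev) == 2 &&
                 strcmp(dev, name) == 0)
            return major;

        /* Advance past this line and its newline, if any. */
        text += full;
        if (*text == '\n')
            text++;
    }
    return -1;
}

/* Convenience wrapper: read a file such as /proc/devices and search it. */
int find_char_major_in_file(const char *path, const char *name)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return find_char_major(buf, name);
}
```

For example, find_char_major_in_file("/proc/devices", "i2c") returns the major number used by the /dev/i2c-N nodes once i2c-dev is loaded, and that number can then be passed to mknod.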
This is a 3-axis accelerometer. Linux registers it as a driver of the input_polled_dev type.
You can access it through the /dev/i2c-x bus (controller) device node, but there is not much sense in using it that way directly from userspace.
I2C clients are not meant to be used through /dev device nodes.
They should be registered with the kernel I2C framework and used through the higher-layer APIs.
There is a sample program for reading the x, y, z registers of the similar MMA7455L from userspace using the /dev/i2c-X bus device node.
Reading the Accelerometer With I²C
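For completeness, a single register read through the bus node can be sketched as below. This is a minimal illustration using the i2c-dev interface, not the linked sample program. The function name is made up, the register numbers depend on the chip's datasheet, and the plain write-then-read sequence assumes the chip accepts a stop condition between the two transfers. If a kernel driver is already bound to the address (as with mma8450 at 0x1c above), the I2C_SLAVE ioctl is expected to fail with EBUSY, which is another reason this route is rarely the right one.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/i2c-dev.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Read one 8-bit register from an I2C device through a bus node such as
 * /dev/i2c-0. Returns 0 on success and -1 on any failure.
 */
int i2c_read_reg(const char *bus_path, unsigned addr, uint8_t reg, uint8_t *value)
{
    int fd, rc = -1;

    assert(bus_path && value);
    fd = open(bus_path, O_RDWR);
    if (fd < 0)
        return -1;

    /* Select the target address, send the register number, read one byte back. */
    if (ioctl(fd, I2C_SLAVE, (unsigned long)addr) == 0 &&
        write(fd, &reg, 1) == 1 &&
        read(fd, value, 1) == 1)
        rc = 0;

    close(fd);
    return rc;
}
```

A call such as i2c_read_reg("/dev/i2c-0", 0x1c, reg, &v) would target the address from the question, with reg taken from the datasheet.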
