What is an "irqchip"?

In reference to the QEMU x86_64 machine option kernel_irqchip=on|off, the description reads:
Controls in-kernel irqchip support for the chosen accelerator when available
What is an "irqchip"?

An "irqchip" is KVM's name for what is more usually called an "interrupt controller". This is a piece of hardware which takes lots of interrupt signals (from devices like a USB controller, disk controller, PCI cards, serial port, etc) and presents them to the CPU in a way that lets the CPU control which interrupts are enabled, be notified when a new interrupt arrives, dismiss a handled interrupt, and so on.
An emulated system (VM) needs an emulated interrupt controller, in the same way that real hardware has a real hardware interrupt controller. In a KVM VM, it is possible to have this emulated device be in userspace (ie QEMU) like all the other emulated devices. But because the interrupt controller is so closely involved with the handling of simulated interrupts, having to go back and forth between the kernel and userspace frequently as the guest manipulates the interrupt controller is bad for performance. So KVM provides an emulation of an interrupt controller in the kernel (the "in-kernel irqchip") which QEMU can use instead of providing its own version in userspace. (On at least some architectures the in-kernel irqchip is also able to use hardware assists for virtualization of interrupt handling which the userspace version cannot, which further improves VM performance.)
The default QEMU setting is to use the in-kernel irqchip, and this gives the best performance. So you don't need to do anything with this command line option unless you know you have a specific reason why the in-kernel irqchip will not work for you.
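Under the hood, "use the in-kernel irqchip" is just a request made to KVM through its ioctl interface. As a rough sketch of what the userspace side looks like (this is illustrative, not QEMU's actual code; it assumes an x86 host with /dev/kvm available and omits the rest of VM setup):

/* Minimal sketch: how a userspace VMM (such as QEMU) asks KVM to create
 * the in-kernel irqchip.  Error handling and VM setup are illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);           /* handle to the KVM module */
    if (kvm < 0) { perror("open /dev/kvm"); return 1; }

    int vm = ioctl(kvm, KVM_CREATE_VM, 0);        /* create a new (empty) VM */
    if (vm < 0) { perror("KVM_CREATE_VM"); return 1; }

    /* Check that this host/architecture supports the in-kernel irqchip ... */
    if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_IRQCHIP) <= 0) {
        fprintf(stderr, "no in-kernel irqchip; emulate it in userspace instead\n");
        return 1;
    }

    /* ... and create it.  On x86 this instantiates the PIC and IOAPIC in the
     * kernel; the per-vCPU local APICs follow when vCPUs are created. */
    if (ioctl(vm, KVM_CREATE_IRQCHIP, 0) < 0) {
        perror("KVM_CREATE_IRQCHIP");
        return 1;
    }

    printf("in-kernel irqchip created\n");
    return 0;
}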

Related

How to emulate a PCIe device which has a CPU?

Some PCIe devices have a CPU on board, e.g. a DPU.
I want to use QEMU to emulate such a device.
Can QEMU support this requirement?
QEMU's emulation framework doesn't support having devices which have fully programmable CPUs which can execute arbitrary guest-provided code in the same way as the main system emulated CPUs. (The main blocker is that all the CPUs in the system have to be the same architecture, eg all x86 or all Arm.)
For devices that have a CPU on them as part of their implementation but where that CPU is generally running fixed firmware that exposes a more limited interface to guest code, a QEMU device model can provide direct emulation of that limited interface, which is typically more efficient anyway.
In theory you could write a device that did a purely interpreted emulation of an onboard CPU, using QEMU facilities like timers and bottom-half callbacks to interpret a small chunk of instructions every so often. I don't know of any examples of anybody having written a device like that, though. It would be quite a lot of work and the speed of the resulting emulation would not be very fast.
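To make that concrete, here is a very rough sketch of the shape such a device model could take, driven by QEMU's timer API (timer_new_ns, timer_mod and qemu_clock_get_ns are real QEMU facilities; the OnboardCpuState structure, the interpret_some_insns() helper and the instruction/interval budgets are hypothetical stand-ins for the interpreter you would actually have to write):

/* Rough sketch only: the general shape of a device model that steps an
 * interpreted on-board CPU from a QEMU timer callback.  Only the timer
 * calls are real QEMU APIs; everything else here is a placeholder. */
#include "qemu/osdep.h"
#include "qemu/timer.h"

#define STEP_INTERVAL_NS (1 * 1000 * 1000)   /* run the firmware CPU every 1 ms */
#define INSNS_PER_STEP   1000                /* interpret this many instructions */

typedef struct OnboardCpuState {
    QEMUTimer *step_timer;
    /* ...registers, memory, device back-end state... */
} OnboardCpuState;

/* Hypothetical interpreter for the on-board CPU's instruction set. */
static void interpret_some_insns(OnboardCpuState *s, int budget)
{
    /* Placeholder: decode and execute up to 'budget' instructions here,
     * updating the on-board CPU state in 's'. */
    (void)s;
    (void)budget;
}

static void onboard_cpu_step(void *opaque)
{
    OnboardCpuState *s = opaque;

    interpret_some_insns(s, INSNS_PER_STEP);

    /* Re-arm the timer so the on-board CPU keeps making forward progress. */
    timer_mod(s->step_timer,
              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + STEP_INTERVAL_NS);
}

static void onboard_cpu_start(OnboardCpuState *s)
{
    s->step_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, onboard_cpu_step, s);
    timer_mod(s->step_timer,
              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + STEP_INTERVAL_NS);
}
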
This can be done by running two instances of QEMU, one built for the host system's architecture and the other for the secondary architecture, and connecting them through some interface.
This has been done by Xilinx by starting two separate QEMU processes with some inter-process communication between them, and by Neuroblade by building the nios2 target of QEMU as a shared library and then loading it from a QEMU process that models the host architecture (in this case the interface can simply be modelled by direct function calls).
Related:
How can I use QEMU to simulate mixed platforms?
https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg01969.html

What role do QEMU and KVM play in virtual machine I/O?

I find the boundary between QEMU and KVM very blurred. Some people say a virtual machine is a QEMU process, while others say it is a KVM process. Which is it exactly?
And what roles do QEMU and KVM play in virtual machine I/O? For example, when a VM does PIO/MMIO, is it QEMU or KVM that traps it and turns it into a hardware operation? Or are both responsible?
KVM: the code in the Linux kernel which provides a friendly interface to userspace for using the CPU virtualization. This includes functions that userspace can call for "create CPU", "run CPU", etc. For a full virtual machine, you need to have some userspace code which can use this. This is usually either QEMU, or the simpler "kvmtool"; some large cloud providers have their own custom userspace code instead.
QEMU: emulates a virtual piece of hardware, including disks, memory, CPUs, serial port, graphics, and other devices. Also provides mechanisms (a UI, and some programmable interfaces) for doing things like starting, stopping, and migrating. QEMU supports several different 'accelerator' modes for how it handles the CPU emulation:
TCG: pure emulation -- works anywhere, but very slow
KVM: uses the Linux kernel's KVM APIs to allow running guest code using host CPU hardware virtualization support
hax: similar to KVM, but using the Intel HAXM code, which will work on Windows hosts
From an implementation point of view the boundary between KVM and QEMU is very clear -- KVM is a part of the host kernel, whereas QEMU is a separate userspace program. For a user, you typically don't have to care where the boundary is, because that's an implementation detail.
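One way to see that boundary is to look at what a minimal userspace VMM actually does: everything it asks of KVM goes through ioctls on /dev/kvm and on the file descriptors derived from it. A stripped-down sketch (error handling, guest memory and register setup omitted; this is not QEMU's real code):

/* Sketch of the userspace side of the QEMU/KVM boundary: everything goes
 * through ioctls on /dev/kvm and the fds derived from it.  Guest memory,
 * register setup and error handling are omitted for brevity. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm  = open("/dev/kvm", O_RDWR);               /* the KVM module itself */
    int vm   = ioctl(kvm, KVM_CREATE_VM, 0);           /* "create a VM"        */
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);          /* "create CPU 0"       */

    /* Each vCPU has a kvm_run structure shared with the kernel; userspace
     * mmaps it to see why KVM_RUN returned (MMIO, PIO, shutdown, ...). */
    int run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
    struct kvm_run *run = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu, 0);

    /* At this point a real VMM would load guest code into memory regions
     * (KVM_SET_USER_MEMORY_REGION), set registers, and then call KVM_RUN
     * in a loop -- see the MMIO example below. */
    (void)run;
    return 0;
}
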
To answer your question about what happens for MMIO (see the sketch after this list for the userspace side):
the guest makes an MMIO access
this is trapped to the host kernel by the hardware
the host kernel (KVM) might then emulate this MMIO access itself, because a few devices are implemented in the kernel for performance reasons. This usually applies only to the interrupt controller and perhaps the IOMMU.
otherwise, KVM reports the MMIO access back to userspace (ie to QEMU, kvmtool, etc)
userspace then can handle the access, using its device emulation code
userspace then returns the result (eg the data to return for a read) to the kernel
the kernel updates the vcpu's register state as required to complete emulation of the instruction
the kernel then resumes execution of the VM at the following instruction
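From userspace, that flow boils down to a loop around the KVM_RUN ioctl: KVM_RUN returns when the kernel needs help, and for an MMIO exit userspace consumes (for a write) or fills in (for a read) the data before calling KVM_RUN again, at which point the kernel completes the instruction and resumes the guest. A sketch, reusing the vcpu fd and mmapped kvm_run area from the setup sketch above (the handle_mmio_read/write functions are hypothetical stand-ins for the VMM's device emulation code):

/* Sketch of the userspace side of the flow above: loop on KVM_RUN and, when
 * the kernel reports an MMIO exit, hand the access to device emulation. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical device-emulation entry points (QEMU's are far more involved). */
void handle_mmio_write(uint64_t addr, const void *data, unsigned len);
void handle_mmio_read(uint64_t addr, void *data, unsigned len);

void vcpu_loop(int vcpu, struct kvm_run *run)
{
    for (;;) {
        ioctl(vcpu, KVM_RUN, 0);               /* run guest code until it exits */

        switch (run->exit_reason) {
        case KVM_EXIT_MMIO:
            if (run->mmio.is_write) {
                /* Guest wrote to an emulated device register. */
                handle_mmio_write(run->mmio.phys_addr,
                                  run->mmio.data, run->mmio.len);
            } else {
                /* Guest read: fill run->mmio.data; on the next KVM_RUN the
                 * kernel copies it into the vcpu's registers and resumes at
                 * the following instruction. */
                handle_mmio_read(run->mmio.phys_addr,
                                 run->mmio.data, run->mmio.len);
            }
            break;
        default:
            return;                            /* shutdown, errors, etc. */
        }
    }
}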

uClinux and the necessity for device drivers

Normally MMU-less systems don't have an MPU (memory protection unit) either, and there's also no distinction between user and kernel modes. In such a case, assuming we have an MMU-less system with some piece of hardware that is mapped into the CPU address space, does it really make sense to have device drivers in the kernel, if all the hardware resources can be accessed from userspace?
Does kernel code have more control over memory than user code?
Yes, on platforms without MMUs that run uClinux it makes sense to do everything as if you had a normal embedded Linux environment. It is a cleaner design to have user applications and services go through their normal interfaces (syscalls, etc.) and have the OS route those kernel requests through to device drivers, file systems, network stack, etc.
Although the kernel does not have more control over the hardware in these circumstances, the actual hardware should only be touched by system software running in the kernel. Not limiting access to the hardware would make debugging things like system resets and memory corruption virtually impossible. This practice also makes your design more portable.
Exceptions may be for user mode debugging binaries that are only used in-house for platform bring-up and diagnostics.

kernel driver or user space driver?

I would like to ask your advice on the following: I need to write drivers for an OMAP3, for accessing an external DSP through an FPGA (over the GPMC interface). The driver is required to load a file to the DSP, and to read/write buffers from/to the DSP. There is already an FPGA driver in the kernel. The kernel is 2.6.32. So I thought of the following options:
writing a DSP driver in the kernel, which uses the existing FPGA driver.
writing a user-space driver which interfaces with the FPGA kernel driver.
writing a user-space driver using UIO, which will not use the kernel FPGA driver, but will access the FPGA itself as part of a single, complete user-space DSP driver.
What do you think is the preferred option?
What is the advantage of a kernel driver over a user-space driver, and vice versa?
Thanks, Ran
* User-space driver:
Easier to debug.
Loads of libraries to support you.
Allows you to hide the details of your IP if you want to (people will really hate you if you do!)
A crash won't affect the whole system.
Higher latency in handling interrupts since the kernel will have to relay the interrupt to the user space somehow.
You can't control access to your device from user space.
* Kernel-space driver:
Harder to debug.
Only Linux kernel frameworks are supported.
You can always provide a binary blob to hide the details of your IP but this is annoying since it has to be generated against a specific kernel.
A crash will bring down the whole system.
Less latency in handling interrupts.
You can control access to your device from kernel space because it's a global context that all processes see.
As a kernel engineer I'm more comfortable/happy hacking code in a kernel context, that's probably why I would write the whole driver in the kernel.
However, I would say that the best thing to do is to divide the functionality of your driver into units and only put the unit in the kernel when there's a reason to do so.
For example:
If your device has a shared resource (like an MMU or a hardware FIFO) and you want multiple processes to be able to use it safely, then you probably need some buffer manager to be in the kernel, and all the processes would communicate with it through ioctl.
If your driver needs to respond to an interrupt as fast as possible (very low latency), then you will need to put the part of the code that handles interrupts in the kernel interrupt handler rather than putting it in user space and notifying user space when an interrupt occurs.
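To make the UIO option from the question (option 3) concrete, here is a minimal sketch of what such a user-space driver looks like: the kernel's only job is to export the FPGA's register window and its interrupt through a UIO node, and the rest of the DSP driver lives in the process. The device node name, mapping size and register offset below are example values that depend on how the UIO platform device is set up:

/* Minimal sketch of the UIO approach: the kernel exposes the FPGA's register
 * window and its interrupt through /dev/uio0; everything else lives in this
 * user-space process.  Names and sizes are examples only. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define FPGA_MAP_SIZE 0x10000     /* size of UIO mapping 0 (example value) */

int main(void)
{
    int fd = open("/dev/uio0", O_RDWR);
    if (fd < 0) { perror("open /dev/uio0"); return 1; }

    /* Map the FPGA registers (UIO mapping 0) into this process. */
    volatile uint32_t *regs = mmap(NULL, FPGA_MAP_SIZE, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) { perror("mmap"); return 1; }

    uint32_t irq_on = 1;
    for (;;) {
        /* Re-enable the interrupt (needed for uio_pdrv_genirq-style drivers),
         * then block until the FPGA raises it. */
        write(fd, &irq_on, sizeof(irq_on));

        uint32_t count;
        if (read(fd, &count, sizeof(count)) == sizeof(count)) {
            /* Interrupt arrived: talk to the DSP through the FPGA registers,
             * e.g. read a status word at offset 0 (example offset). */
            uint32_t status = regs[0];
            printf("irq #%u, status 0x%08x\n", count, status);
        }
    }
}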

How can the Linux kernel be forced to enumerate the PCIe bus?

Linux kernel 2.6
I've got an FPGA that is loaded over GPIO, connected to a development board running Linux.
The FPGA will transmit and receive data over the PCI Express bus. However, the bus is enumerated at boot and as such, no link is discovered (because the FPGA is not loaded at boot).
How can I force re-enumeration of the PCIe bus in Linux?
Is there a simple command or will I have to make kernel changes?
I need the capability to hotplug PCIe devices.
As root, try the following command:
echo "1" > /sys/bus/pci/rescan
See this link for more information: http://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci
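If you need to do this from a program rather than a shell (for example right after reloading the FPGA bitstream), the same sysfs files can be written directly. This must run as root, and the 0000:01:00.0 address below is only an example:

/* Sketch: remove a stale device and rescan the bus from a program, using the
 * same sysfs files as the shell command above.  Substitute your FPGA's
 * bus/device/function for 0000:01:00.0. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0) { perror(path); return -1; }
    ssize_t n = write(fd, val, 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

int main(void)
{
    /* Optional: detach the stale device node left over from boot (ignore
     * the error if it never appeared). */
    write_str("/sys/bus/pci/devices/0000:01:00.0/remove", "1");

    /* Ask the PCI core to walk the bus again and enumerate new devices. */
    return write_str("/sys/bus/pci/rescan", "1");
}
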
I wonder what platform you are on: a workaround (a.k.a. hack) for this that works on x86 systems is to have the BIOS statically configure a PCI device at whatever bus, device, and function the FPGA normally lands on; the OS will then enumerate the device and reserve the PCI space for it (even though the device isn't really there). Then in your device driver you will have to do some extra things, like setting up the BARs and interrupt lines manually after the FPGA has been programmed. Of course this requires modifying the BIOS: if you are working with a BIOS vendor you can contract them to make this change for you; if you are not, it will be much harder... Also keep in mind that I was working on VxWorks on x86, and we had AMI make a custom BIOS for our boards.
If you don't have a BIOS, then consider programming the FPGA in the bootloader: there you already have the ability to read from disk, and adding GPIO capabilities probably isn't too difficult (assuming you are using JTAG and GPIOs?). In fact, depending on what bootloader you use, it might already be able to do GPIO.
The issue with modifying the kernel to do this is that you have to find the sweet spot where you can read the bitfile before the PCI enumeration... If, for example, the disk device drivers are initialized after PCI, then obviously you must make some radical changes to the kernel just to read the bitfile prior to PCI enumeration, which might cause other annoying problems...
One other option, which you may have already discovered and which is really only OK during development: power up the system, program the FPGA board, then do a reset without a power cycle (for example: sudo reboot now); the FPGA should keep its configuration, and Linux should enumerate it.
After turning on your computer, the BIOS enumerates the PCI bus and attempts to fulfill all I/O space and memory-mapped I/O (MMIO) requests. It sets up these BARs initially, and when the operating system loads, these BARs can be changed by the OS as it sees fit while the PCI bus driver enumerates the bus yet again. It is even possible for the superuser of the system to run the command setpci to change these BARs after the BIOS has already attempted to configure them and the OS has loaded (this may cause drivers to fail and several other bad things if done improperly).
I have had to do this in cases where the card in question was not assigned any resources by the BIOS since the region requested required a 64-bit address and the BIOS only operated with 32-bit address assignments. I was able to go in after-the-fact and change these addresses (originally assigned by the BIOS) to whatever addresses I saw fit, insert the kernel module, and my driver would map and use these newly-assigned addresses for the card without knowing the difference.
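To check what the BIOS/OS (or a manual setpci run) actually assigned, the BARs can be read straight out of the device's config space through sysfs. This is a small illustrative sketch; the device path is an example:

/* Sketch: dump the six standard BARs of a device from its PCI config space
 * via sysfs, e.g. to verify what the BIOS/OS assigned.  Substitute the real
 * bus/device/function for 0000:01:00.0. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *cfg = "/sys/bus/pci/devices/0000:01:00.0/config";
    int fd = open(cfg, O_RDONLY);
    if (fd < 0) { perror(cfg); return 1; }

    for (int i = 0; i < 6; i++) {
        uint32_t bar;
        /* BAR0..BAR5 live at config space offsets 0x10, 0x14, ... 0x24. */
        if (pread(fd, &bar, sizeof(bar), 0x10 + 4 * i) != sizeof(bar)) {
            perror("pread");
            break;
        }
        printf("BAR%d = 0x%08x (%s)\n", i, bar,
               (bar & 1) ? "I/O space" : "memory space");
    }
    close(fd);
    return 0;
}
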
The problem with hotplugging PCI Express cards is that the power to the slot itself cannot be turned on/off without specific hotplug controllers, which need to exist on the motherboard/backplane. Not having these hotplug controllers to turn the slot's power off may lead to shorts between the tiny pins when the card is physically inserted and/or removed while power is still present. Hotplug events, however, can be initiated by either end (the host or the endpoint device). This does not seem to be the case here; however, if your FPGA already has a link established with the root complex, a possible solution to your problem would be to generate hotplug interrupts to cause a bus rescan in the OS.
There is a major problem, though -- if your card does not actually obtain a link to the root complex, it won't be able to generate any hotplug events, which seems to be the case here. After booting, the FPGA should toggle the PRESENT line on the PCIe bus to tell the OS there is a card ready to be enumerated. Once detected, the OS should attempt to establish a link to the card and assign memory regions to the device. After the OS enumerates the card you'll be able to load drivers against it and see it in lspci. You stated you're using kernel 2.6, which does have support for hotplugging and dynamic resource allocation, so this method should work as long as your FPGA supports the ability to toggle the PRESENT PCIe line, too.

Resources