What role do QEMU and KVM play in virtual machine I/O?

I find the boundary between QEMU and KVM very blurred. Some people say a virtual machine is a QEMU process, while others say it is a KVM process. Which is it exactly?
And what roles do QEMU and KVM play in virtual machine I/O? For example, when a VM does PIO/MMIO, is it QEMU or KVM that traps it and turns it into a hardware operation, or are both responsible?

KVM: the code in the Linux kernel which provides a friendly interface to userspace for using the CPU's hardware virtualization support. This includes functions that userspace can call for "create CPU", "run CPU", etc. For a full virtual machine, you need to have some userspace code which can use this. This is usually either QEMU or the simpler "kvmtool"; some large cloud providers have their own custom userspace code instead.
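To make that interface concrete, here is a minimal sketch of the calls a userspace program makes against /dev/kvm. Error handling and guest memory/register setup are omitted; this only illustrates the ioctl surface ("create CPU", "run CPU"), it is not how QEMU itself is structured.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
    /* Open the KVM subsystem, then create a VM and one vCPU. */
    int kvm  = open("/dev/kvm", O_RDWR);
    int vm   = ioctl(kvm, KVM_CREATE_VM, 0);     /* "create VM"  */
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);    /* "create CPU" */

    /* The kernel shares a kvm_run structure that describes why each run exited. */
    int run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
    struct kvm_run *run = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu, 0);

    /* Guest memory and register setup (KVM_SET_USER_MEMORY_REGION,
     * KVM_SET_SREGS, KVM_SET_REGS, ...) would go here. */

    ioctl(vcpu, KVM_RUN, 0);                     /* "run CPU" */
    printf("exit reason: %d\n", run->exit_reason);
    return 0;
}
```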
QEMU: emulates a virtual piece of hardware, including disks, memory, CPUs, serial port, graphics, and other devices. Also provides mechanisms (a UI, and some programmable interfaces) for doing things like starting, stopping, and migrating. QEMU supports several different 'accelerator' modes for how it handles the CPU emulation:
TCG: pure emulation -- works anywhere, but very slow
KVM: uses the Linux kernel's KVM APIs to allow running guest code using host CPU hardware virtualization support
hax: similar to KVM, but using the Intel HAXM code, which will work on Windows hosts
From an implementation point of view the boundary between KVM and QEMU is very clear -- KVM is a part of the host kernel, whereas QEMU is a separate userspace program. For a user, you typically don't have to care where the boundary is, because that's an implementation detail.
To answer your question about what happens for MMIO (a code sketch of the userspace side of this loop follows the list):
the guest makes an MMIO access
this is trapped to the host kernel by the hardware
the host kernel (KVM) might then emulate this MMIO access itself, because a few devices are implemented in the kernel for performance reasons. This usually applies only to the interrupt controller and perhaps the iommu.
otherwise, KVM reports the MMIO access back to userspace (ie to QEMU, kvmtool, etc)
userspace can then handle the access using its device emulation code
userspace then returns the result (eg the data to return for a read) to the kernel
the kernel updates the vcpu's register state as required to complete emulation of the instruction
the kernel then resumes execution of the VM at the following instruction
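In code terms, the userspace half of that loop looks roughly like the sketch below. The exit_reason and mmio fields are the real struct kvm_run fields from <linux/kvm.h>; handle_read and handle_write are placeholders standing in for whatever device emulation code the userspace program (QEMU, kvmtool, etc.) provides.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Placeholder device-model hooks; in QEMU these would dispatch to the
 * emulated device registered at the faulting guest physical address. */
void handle_write(uint64_t addr, const uint8_t *data, uint32_t len);
void handle_read(uint64_t addr, uint8_t *data, uint32_t len);

void run_vcpu(int vcpu_fd, struct kvm_run *run)
{
    for (;;) {
        ioctl(vcpu_fd, KVM_RUN, 0);        /* guest runs until it traps */

        switch (run->exit_reason) {
        case KVM_EXIT_MMIO:
            if (run->mmio.is_write)
                handle_write(run->mmio.phys_addr, run->mmio.data,
                             run->mmio.len);
            else
                /* Fill in the data the guest load should see; KVM
                 * completes the instruction on the next KVM_RUN. */
                handle_read(run->mmio.phys_addr, run->mmio.data,
                            run->mmio.len);
            break;
        default:
            return;                        /* other exit reasons not shown */
        }
    }
}
```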

Related

How to emulate a PCIe device which has a CPU?

Now, some PCIe devices have a CPU on them, e.g. a DPU.
I want to use QEMU to emulate such a device.
Can QEMU support this requirement?
QEMU's emulation framework doesn't support having devices which have fully programmable CPUs which can execute arbitrary guest-provided code in the same way as the main system emulated CPUs. (The main blocker is that all the CPUs in the system have to be the same architecture, eg all x86 or all Arm.)
For devices that have a CPU on them as part of their implementation but where that CPU is generally running fixed firmware that exposes a more limited interface to guest code, a QEMU device model can provide direct emulation of that limited interface, which is typically more efficient anyway.
In theory you could write a device that did a purely interpreted emulation of an onboard CPU, using QEMU facilities like timers and bottom-half callbacks to interpret a small chunk of instructions every so often. I don't know of any examples of anybody having written a device like that, though. It would be quite a lot of work, and the resulting emulation would not be very fast.
This can be done by running two instances of QEMU, one built for the host system architecture and the other for the secondary architecture, and connecting them through some interface.
This has been done by Xilinx by starting two separate QEMU processes with some inter-process communication between them, and by Neuroblade by building QEMU for the nios2 architecture as a shared library and then loading it from a QEMU process that models the host architecture (in which case the interface can simply be modelled by direct function calls).
Related:
How can I use QEMU to simulate mixed platforms?
https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg01969.html

What is an “irqchip”?

In reference to the QEMU x86_64 machine option kernel_irqchip=on|off, the description reads:
Controls in-kernel irqchip support for the chosen accelerator when available
What is an "irqchip"?
An "irqchip" is KVM's name for what is more usually called an "interrupt controller". This is a piece of hardware which takes lots of interrupt signals (from devices like a USB controller, disk controller, PCI cards, serial port, etc) and presents them to the CPU in a way that lets the CPU control which interrupts are enabled, be notified when a new interrupt arrives, dismiss a handled interrupt, and so on.
An emulated system (VM) needs an emulated interrupt controller, in the same way that real hardware has a real hardware interrupt controller. In a KVM VM, it is possible to have this emulated device be in userspace (ie QEMU) like all the other emulated devices. But because the interrupt controller is so closely involved with the handling of simulated interrupts, having to go back and forth between the kernel and userspace frequently as the guest manipulates the interrupt controller is bad for performance. So KVM provides an emulation of an interrupt controller in the kernel (the "in-kernel irqchip") which QEMU can use instead of providing its own version in userspace. (On at least some architectures the in-kernel irqchip is also able to use hardware assists for virtualization of interrupt handling which the userspace version cannot, which further improves VM performance.)
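At the API level, "using the in-kernel irqchip" amounts to a couple of ioctls. The sketch below assumes vm_fd is a VM file descriptor obtained as in the earlier /dev/kvm example, and only illustrates the two calls involved; a real userspace like QEMU wires this into its interrupt routing code.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Ask the kernel to emulate the interrupt controller (the "in-kernel
 * irqchip") instead of modelling it in userspace. */
int create_in_kernel_irqchip(int vm_fd)
{
    return ioctl(vm_fd, KVM_CREATE_IRQCHIP, 0);
}

/* With the in-kernel irqchip in place, a userspace device model raises
 * or lowers an interrupt line with KVM_IRQ_LINE rather than poking an
 * emulated controller's registers. */
int set_irq_line(int vm_fd, unsigned int irq, int level)
{
    struct kvm_irq_level irq_level;

    memset(&irq_level, 0, sizeof(irq_level));
    irq_level.irq = irq;
    irq_level.level = level;
    return ioctl(vm_fd, KVM_IRQ_LINE, &irq_level);
}
```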
The default QEMU setting is to use the in-kernel irqchip, and this gives the best performance. So you don't need to do anything with this command line option unless you know you have a specific reason why the in-kernel irqchip will not work for you.

uClinux and the necessity for device drivers

Normally MMU-less systems don't have an MPU (memory protection unit) either, and there's no distinction between user and kernel modes. In such a case, assuming we have an MMU-less system with some piece of hardware which is mapped into the CPU address space, does it really make sense to have device drivers in the kernel if all the hardware resources can be accessed from userspace?
Does kernel code have more control over memory than user code?
Yes, on platforms without MMUs that run uClinux it makes sense to do everything as if you had a normal embedded Linux environment. It is a cleaner design to have user applications and services go through their normal interfaces (syscalls, etc.) and have the OS route those kernel requests through to device drivers, file systems, network stack, etc.
Although the kernel does not have more control over the hardware in these circumstances, the actual hardware should only be touched by system software running in the kernel. Not limiting access to the hardware would make debugging things like system resets and memory corruption virtually impossible. This practice also makes your design more portable.
Exceptions may be made for user-mode debugging binaries that are only used in-house for platform bring-up and diagnostics.
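For illustration only: on an MMU-less system with no memory protection, the kind of direct userspace access the question asks about is just a pointer dereference. The GPIO address and register layout below are invented for the example; the point of the answer above is that such code belongs in a kernel driver rather than scattered through applications.

```c
#include <stdint.h>

/* Hypothetical memory-mapped GPIO block; the base address and register
 * offsets are made up. On an MMU-less system every process sees physical
 * addresses directly, so nothing prevents this kind of access. */
#define GPIO_BASE      0x40020000u
#define GPIO_DATA_REG  (*(volatile uint32_t *)(GPIO_BASE + 0x0))
#define GPIO_DIR_REG   (*(volatile uint32_t *)(GPIO_BASE + 0x4))

void toggle_pin_from_userspace(void)
{
    GPIO_DIR_REG  |= 1u << 3;    /* configure pin 3 as an output */
    GPIO_DATA_REG ^= 1u << 3;    /* toggle it, bypassing the kernel */
}
```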

How does the framebuffer get updated in QEMU running KVM/ARM

I am running qemu-system-arm with KVM following this tutorial:
http://www.virtualopensystems.org/media/chromebook/chromebook.pdf
After reading some documentation (mostly blogs, LKML, and mailing list archives), skimming through the code (QEMU, KVM, kernel), and using the kernel trace function to trace KVM events, I can ascertain that the vcpu thread runs guest code until a hyp-trap occurs, mostly because the guest accesses an MMIO address for:
1. gpio access (I can't figure out what)
2. virtio-net, accessing ethernet
3. serial access
Execution then goes back to KVM code, which requests userspace device emulation from QEMU, and then back again to the guest.
My question is: for the framebuffer, when does QEMU get a chance to update the display (SDL)? If I am not mistaken, the framebuffer memory range is within RAM itself, so it is not MMIO. Therefore it does not interrupt the CPU, nor does it generate a hyp-trap.

How do emulators/virtual computers work?

How does an emulator, like the Android one, work? And a virtual PC like VirtualBox?
In one form, the software reads the binary the same way the hardware on the real system would: it fetches instructions, decodes them, and executes them, using variables in a program instead of registers in hardware. Memory and other I/O are similarly emulated/simulated. To be interesting beyond just an instruction set simulator it also needs to simulate hardware, so it may have software that pretends to behave like, for example, a VGA video card. Ideally the software run on the emulator cannot tell that the memory/I/O is coming from simulated hardware; you do enough to fool the software being simulated. At the same time you try to honor what those register writes and reads mean by making calls to the operating system you are running on and/or to the hardware directly (your program, of course, thinks it is talking to hardware and not an emulator).
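A toy fetch-decode-execute loop makes that first form concrete. The two-instruction machine below is invented purely for illustration, with ordinary C variables standing in for registers and an array standing in for memory; a real emulator does the same thing for a real instruction set and adds device and memory models around it.

```c
#include <stdint.h>
#include <stdio.h>

/* A made-up machine: OP_LOADI = load immediate into a register,
 * OP_ADD = add one register into another, OP_HALT = stop. */
enum { OP_HALT = 0x00, OP_LOADI = 0x01, OP_ADD = 0x02 };

int main(void)
{
    uint8_t memory[] = {            /* emulated program in emulated RAM */
        OP_LOADI, 0, 5,             /* r0 = 5   */
        OP_LOADI, 1, 7,             /* r1 = 7   */
        OP_ADD,   0, 1,             /* r0 += r1 */
        OP_HALT,
    };
    uint32_t regs[4] = {0};         /* "registers" are just variables */
    unsigned pc = 0;

    for (;;) {
        uint8_t op = memory[pc];                    /* fetch  */
        switch (op) {                               /* decode */
        case OP_LOADI:                              /* execute */
            regs[memory[pc + 1]] = memory[pc + 2];
            pc += 3;
            break;
        case OP_ADD:
            regs[memory[pc + 1]] += regs[memory[pc + 2]];
            pc += 3;
            break;
        default:                                    /* OP_HALT or unknown */
            printf("r0 = %u\n", regs[0]);           /* prints 12 */
            return 0;
        }
    }
}
```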
The next level up would be a virtual machine. For the case I am describing it is a matching instruction set, so you want to, say, virtualize an x86 program on an x86 host machine. The long and short of it is that the host processor/machine has hardware features that allow you to run the actual instructions of the program being virtualized, as long as those instructions are simple register-based, stack, or other local memory operations. Once the program ventures out of its memory space, the virtualization hardware interrupts the operating system; the virtual machine software, like VMware or VirtualBox, then examines the memory or I/O request from the software being virtualized, determines whether it was a video card request or a USB device or a NIC or whatever, and then emulates the device in question much in the same way a pure non-virtualized setup would. A virtual machine can often outrun a purely emulated machine because it allows a percentage of the software to run at the full speed of the processor. The downside is that you have to have a virtual machine that matches the software being run. An emulator can be far more accurate and portable than a virtual machine, at the cost of performance.
The next level up would be something like Wine or Cygwin, where not only are you doing something like a virtual machine, running native instructions and trapping memory requests, but you go beyond that and trap operating system calls, so that you can run a program compiled for one operating system on another operating system, much faster than in a virtual machine. Instead of trapping the hardware-level register or memory access to a video card, you trap the operating system call for a bitblt, fill, line draw, or string draw with a specific font, etc. Then you can translate that operating system request into calls to the native operating system.
At their simplest, emulators or virtual computers provide an abstraction layer built on top of the host system (the actual physical system running the emulator) that presents the emulated system's functionality to the code that is to be run.
Emulators and virtual machines simulate hardware like a PC or an Android phone. A virtual machine (or virtual PC) looks at an operating system's machine code instructions and runs them on top of your current (host) operating system in a virtual computer.
http://en.wikipedia.org/wiki/Virtual_machine
and
http://en.wikipedia.org/wiki/Emulator
Depending on the type of virtualization, a virtual machine is not always an emulator.

Resources