How to add VFIO-IOMMU in a KVM virtual machine (AArch64)? - linux

I am using aarch64 Linux to test the VFIO-IOMMU feature in a KVM VM.
The host is a Cortex-A78 running Linux 5.10.104 (with VFIO_IOMMU enabled). The guest OS is Ubuntu 22.04 (Linux 5.15, also with VFIO_IOMMU enabled).
The VM was created with virt-manager and uses virtio devices (NIC, SCSI, etc.).
But I could not find any way to add a VFIO-IOMMU device to the VM on the internet.
I tried adding the following line to the vm.xml:
<iommu model='smmuv3'/>
But after the guest OS boots, I see the following logs about the iommu and nothing about SMMUv3:
t@t:~$ dmesg | grep -i mmu
[ 0.320696] iommu: Default domain type: Translated
[ 0.321218] iommu: DMA domain TLB invalidation policy: strict mode
So how can VFIO-IOMMU be supported/added to the VM in this case?
My qemu-system-aarch64 is version 4.2.1; I am not sure whether it supports SMMUv3 for ARMv8.

I confirmed that QEMU 6.2.0 supports SMMUv3. The guest OS log shows the following:
[ 0.578157] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
[ 0.578841] arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008305)
[ 0.580289] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
[ 0.581060] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 128 entries for evtq
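For reference, here is a sketch of how the emulated SMMUv3 can be enabled, assuming QEMU >= 6.2 and a libvirt recent enough to know the smmuv3 model:
# On a raw QEMU command line, the vSMMU is a property of the virt machine type:
qemu-system-aarch64 -M virt,iommu=smmuv3 ...
# In the libvirt domain XML, the equivalent element goes inside <devices>:
<iommu model='smmuv3'/>
With the vSMMU present, the guest kernel should print arm-smmu-v3 probe lines like the ones above.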

Related

How to find the root device name of a debootstrap image

I'm creating an Ubuntu 20.04 QEMU image with debootstrap (debootstrap --arch amd64 focal .). However, when I tried to boot it with a kernel I compiled, it failed to boot:
[ 0.678611] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 0.681639] Call Trace:
...
[ 0.685135] ret_from_fork+0x35/0x40
[ 0.685712] Kernel Offset: disabled
[ 0.686182] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
I'm using the following command:
sudo qemu-system-x86_64 \
-enable-kvm -cpu host -smp 2 -m 4096 -no-reboot -nographic \
-drive id=root,media=disk,file=ubuntu2004.img \
-net nic,macaddr=00:da:bc:de:00:13 -net tap,ifname=tap0,script=no \
-kernel kernel/arch/x86/boot/bzImage \
-append "root=/dev/sda1 console=ttyS0"
So I'm guessing the error comes from the wrong root device name (/dev/sda1 in my case). Is there any way to find the correct root device name?
Update from @Peter Maydell's comment:
[ 0.302200] VFS: Cannot open root device "sda1" or unknown-block(0,0): error -6
[ 0.302413] Please append a correct "root=" boot option; here are the available partitions:
[ 0.302824] fe00 4194304 vda
[ 0.302856] driver: virtio_blk
[ 0.303152] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Where vda should be the root device name.
This kind of "unable to mount root fs on unknown-block" error has several possible causes:
You asked the kernel to use the wrong device as the rootfs (or you didn't specify, and the built-in default is the wrong one)
You asked the kernel to use the right device as the rootfs, but the kernel doesn't have a device driver for it compiled in. (This might include more complicated cases like "the device is a PCI device and the kernel doesn't have the PCI controller driver compiled in.")
You asked the kernel to use the right device as the rootfs, and the kernel does have a driver for it, but it couldn't find the hardware, perhaps because the QEMU command line is incorrect
An important clue in figuring out which is the problem is to look at the part of the kernel log just before the "Kernel panic" part of the log. The kernel should print a list of "available partitions", which are the devices that it has a driver for and which are present. If that list contains a plausible-looking device name, as in your case (where "vda" is listed as provided by the "virtio_blk" driver), then you now know what the root device name should be, and all you need to do is fix the kernel command line, e.g. "root=vda". Note that this list is a list of available partitions, so if your disk image has multiple partitions they should show up in the list as "vda1", "vda2", etc. (In this case it looks like your image is a single filesystem, not a disk image with multiple partitions, so only "vda" is in the list.)
If the kernel's list of available partitions doesn't include anything that looks like the disk you were expecting, then either the kernel is missing the driver, or the QEMU command line doesn't have the option to provide the device. This is a little harder to debug, but there may be useful information earlier in the kernel bootup log where the kernel probes for hardware -- for instance there should be logging when the PCI controller is probed. You can also of course double-check the config file for your kernel to see if the right CONFIG options are set.
If you're using a standard distro kernel then these usually have all the usual devices built-in, and your first check should be your QEMU command line. If you built your own kernel from source, check your config, especially if you were trying to achieve a "minimal" kernel with only the desired drivers present.
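Putting that together for the command in the question, a corrected invocation might look like the sketch below. It assumes the image is a single bare filesystem (as the lone "vda" entry suggests) and attaches it explicitly as a virtio disk:
sudo qemu-system-x86_64 \
-enable-kvm -cpu host -smp 2 -m 4096 -no-reboot -nographic \
-drive id=root,media=disk,if=virtio,file=ubuntu2004.img \
-net nic,macaddr=00:da:bc:de:00:13 -net tap,ifname=tap0,script=no \
-kernel kernel/arch/x86/boot/bzImage \
-append "root=/dev/vda console=ttyS0"
If the image did have a partition table, the available-partitions list would show vda1, vda2, etc., and root= would need to name the right partition.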

Unable to load bnxt_en driver intermittently on a Linux OS backed by a hypervisor

I have a VM backed by vCenter.
The vCenter ESXi host has a physical adapter, "Broadcom BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller", with SR-IOV enabled on it.
The VM is connected to one management network (vmxnet3) and two SR-IOV adapters (SR-IOV passthrough).
Upon booting the VM, only two networks show up (the management network and one SR-IOV adapter).
journalctl -k showed the following error:
[ 4832.408471] bnxt_en 0000:13:00.0 (unnamed net_device) (uninitialized): Error (timeout: 500015) msg {0x0 0x0} len:0
[ 4832.408930] bnxt_en: probe of 0000:13:00.0 failed with error -1
Rebooting the machine did not help at all.
For the adapter that probed successfully:
bnxt_en 0000:03:00.0 eth1: NIC Link is Up, 25000 Mbps full duplex, Flow control: ON - receive & transmit
bnxt_en 0000:03:00.0 eth1: FEC autoneg off encodings: None
I rescanned the PCI devices and rebooted multiple times without any success.
Any pointers would be really helpful.
We had a similar issue and were able to fix it.
In our case we saw the same error message on Debian 10, 11 and Oracle Linux 8, although we installed directly on hardware without a hypervisor.
It could still be the same issue, because you're using passthrough.
There are two ways to fix it:
Usage of UEFI Boot
Disable PXE Boot and keep BIOS / Legacy Boot
Both options fixed it.
Disabling PXE didn't work for us, but we could get the ports back online by running
echo 0000:af:00.0 > /sys/bus/pci/drivers/bnxt_en/bind
where 0000:af:00.0 is the PCI address of the port, which can be found with dmesg | grep bnxt_en by looking for the port or ports that failed to probe.
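As a concrete sketch of that recovery sequence (0000:af:00.0 is the example address from above; substitute the one from your own failed probe message, e.g. 0000:13:00.0 in the question):
dmesg | grep bnxt_en   # note the address in the "probe of ... failed" line
echo 0000:af:00.0 | sudo tee /sys/bus/pci/drivers/bnxt_en/bind
Using sudo tee instead of a plain shell redirect keeps the sysfs write working from a non-root shell.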

How to enable PMU in a KVM guest

I am running KVM/QEMU on my Lenovo X1 laptop.
The guest OS is Ubuntu 15.04 x86_64.
Now I want to run the perf command in the guest OS, but I found the following in the guest's dmesg:
...
[ 0.055442] smpboot: CPU0: Intel Xeon E3-12xx v2 (Ivy Bridge) (fam: 06, model: 3a, stepping: 09)
[ 0.056000] Performance Events: unsupported p6 CPU model 58 no PMU driver, software events only.
[ 0.057602] x86: Booting SMP configuration:
[ 0.058686] .... node #0, CPUs: #1
[ 0.008000] kvm-clock: cpu 1, msr 0:1ffd6041, secondary cpu clock
...
So the perf command could NOT use hardware PMU events in the guest OS.
How can I expose the hardware PMU from my host to the Ubuntu guest?
Thanks,
-Tao
The page https://github.com/mozilla/rr/wiki/Building-And-Installing gives some hints on how to enable the guest PMU:
QEMU: On the QEMU command line, use
-cpu host
Libvirt/KVM: Specify CPU passthrough in domain XML definition:
<cpu mode='host-passthrough'/>
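A hedged aside: the usual way to apply that XML change is via virsh, so that libvirt re-reads the domain definition (my_vm_name is a placeholder):
virsh edit my_vm_name        # edit the domain XML and replace the <cpu>...</cpu> block
virsh shutdown my_vm_name    # then power-cycle the guest so the new CPU mode takes effect
virsh start my_vm_name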
The same advice is in https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-monitoring_tools-vpmu
I edited the <cpu mode='host-passthrough'/> line into the /etc/libvirt/qemu/my_vm_name.xml file in place of the <cpu>...</cpu> block.
(In virt-manager, use "host-passthrough" as the CPU "Model:" field - http://blog.wikichoon.com/2016/01/using-cpu-host-passthrough-with-virt.html)
Now the PMU works: I tested with perf stat echo inside the VM, "arch_perfmon" is present in /proc/cpuinfo, and the PMU shows as enabled in dmesg | grep PMU.
The -cpu host option of QEMU was indeed used, according to /var/log/libvirt/qemu/vm_name.log:
/usr/bin/kvm-spice ... -machine ...,accel=kvm,... -cpu host ...
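A quick way to verify the result inside the guest, as a sketch (these checks just script the ones mentioned above):
perf stat echo                       # hardware counters should be reported, not <not supported>
grep -c arch_perfmon /proc/cpuinfo   # non-zero output means the PMU feature flag is visible
dmesg | grep -i pmu                  # the PMU driver should report itself during boot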

Error running qemu-system-riscv using root.bin and vmlinux

I am following the riscv.org guides for toolchain building. When emulating with QEMU, running a locally built root filesystem (with busybox) and Linux kernel, I encounter the error below.
Running QEMU using the locally built root.bin and kernel image:
danny@danny:~/test/riscv/work$ qemu-system-riscv -hda root-local.bin -kernel vmlinux-local -nographic
unassigned address was called?
with addr: 102000735F80006E
not implemented for riscv
Running QEMU using the riscv.org stock root.bin and kernel image:
danny@danny:~/test/riscv/work$ qemu-system-riscv -hda root.bin -kernel vmlinux -nographic
[ 0.150000] io scheduler cfq registered (default)
[ 0.160000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.160000] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 0.160000] TCP: cubic registered
[ 0.160000] htifbd: detected disk with ID 1
[ 0.160000] htifbd: adding htifbd0
[ 0.160000] VFS: Mounted root (ext2 filesystem) readonly on device 254:0.
[ 0.160000] devtmpfs: mounted
[ 0.160000] Freeing unused kernel memory: 64K (ffffffff80002000 - ffffffff80012000)
[ 0.200000] EXT2-fs (htifbd0): warning: mounting unchecked fs, running e2fsck is recommended
# uname -a
Linux ucbvax 3.14.15-g4073e84-dirty #4 Sun Jan 11 07:17:06 PST 2015 riscv GNU/Linux
When testing QEMU with the root.bin and vmlinux downloaded from riscv.org, things seem OK, but I can't see the busybox start-up message and the terminal can't halt.
I have tested QEMU using various combinations, with the results below:
root.bin      vmlinux       RESULT
local-built   local-built   Unassigned address was called ...
Downloaded    Downloaded    Seems OK but without the busybox starting bar
local-built   Downloaded    Kernel panic - not syncing: No working init found
Downloaded    local-built   Unassigned address was called ...
We are starting a project to build and fabricate a RISC-V silicon chip for makers around the world, and we are testing the toolchain now in order to port Ubuntu Core and Android to RISC-V. Any idea what might have gone wrong?
Thanks.
QEMU hasn't been fully updated to support the new RISC-V privileged spec (see the GitHub issue). The update is currently underway.
For an ISA simulator, spike is a good alternative. It may not have all of the platform features of QEMU, but it can serve as a starting point while the QEMU update completes.
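As a quick smoke test that the rest of the toolchain works, a minimal sketch using spike with the proxy kernel (this assumes riscv-tools is installed, including pk, and a trivial hello.c of your own):
riscv64-unknown-elf-gcc -o hello hello.c   # cross-compile a small test program
spike pk hello                             # run it on the spike ISA simulator via the proxy kernel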

Is there an OS command I can run to determine if running inside a Xen based virtual machine

Is there an OS command I can run from within a Xen-based virtual machine to tell me that it is a virtual machine rather than a physical box? I heard that the kernel has some self-awareness smarts about it, e.g. an extra column in "ps" output or something. [I know vmstat provides the "st" column, but I have seen this on physical host boxes running Linux kernel 2.6.11 and greater as well.]
Many Thanks,
Paul
Check the file /sys/hypervisor/uuid:
It does not exist -> not related to Xen.
It does exist and is full of 0s -> it is a Xen Dom0.
It does exist and has non-zero values -> it is a DomU.
This requires, of course, that /sys is mounted and populated...
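That check can be scripted; here is a minimal sketch of it as a shell snippet:
# Classify the machine from /sys/hypervisor/uuid, as described above.
if [ -r /sys/hypervisor/uuid ]; then
    if grep -q '[^0-]' /sys/hypervisor/uuid; then
        echo "Xen DomU"    # uuid contains non-zero digits
    else
        echo "Xen Dom0"    # uuid is all zeros
    fi
else
    echo "Not Xen (or /sys not mounted)"
fi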
dmesg may give some hints from the kernel message buffer; here is the output on a virtualized Ubuntu instance from Slicehost:
bvm@qdbp:~$ sudo dmesg | grep Xen
[ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
[ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
[ 0.000000] Xen: 0000000000100000 - 0000000010000000 (usable)
[ 0.000000] Booting paravirtualized kernel on Xen
[ 0.000000] Xen version: 3.1.2-rc1
[ 0.000000] Xen: using vcpu_info placement
[ 0.000000] Xen: using vcpuop timer interface
[ 0.000000] installing Xen timer for CPU 0
[ 0.021223] installing Xen timer for CPU 1
[ 0.046157] installing Xen timer for CPU 2
[ 0.046157] installing Xen timer for CPU 3
[ 0.265880] Initialising Xen virtual ethernet driver.
