Cannot get custom kernel to boot - mkinitpcio does not add any modules - linux

1. What I am trying to achieve:
Build a custom kernel so I can install and run Anbox-git from AUR on my Arch laptop. Custom kernel is needed for the package to work.
2. What I did to achieve it:
Download Arch Linux kernel v5.8.5-arch1 from here
I followed the guidelines on tradional compilation Arch wiki to create the custom kernel
Via make nconfig I applied the changes mentioned in the Anbox Arch wiki page.
Via make nconfig I changed EFIVAR_FS option from "M" to "*" to resolve an error from earlier attemps.
Via make nconfig under Location: -> Device Drivers-> Multiple devices driver support (RAID and LVM) (MD [=y])-> Device mapper support (BLK_DEV_DM [=y]) I added a few more options (*) because on earlier builds mkinitpcio gave errors for missing modules for DM_CRYPT, and some more DM_ modules which I cannot easily reproduce (will do if necessary for the answer, but I hope it'll be irrelevant).
After creating the config this way I did:
sudo make modules_install
sudo cp -v arch/x86_64/boot/bzImage /boot/vmlinuz-linux58ac
sudo cp /etc/mkinitcpio.d/linux.preset /etc/mkinitcpio.d/linux58ac.preset
Adapted the preset file per Arch wiki instructions
sudo mkinitcpio -p linux58ac
Important: The mkinitpcio runs fine, but keeps giving me a warning:
WARNING: No modules were added to the image. This is probably not what
you want.
sudo grub-mkconfig -o /boot/grub/grub.cfg
3. Expected result:
I am able to reboot, select the new kernel from grub menu, get the usual LVM password prompt, and launch into it without problems.
4. Result I get:
I can reboot and select new kernel from grub but when I select it I get a
Warning: /lib/modules/5.8.5-arch1/modules.devname not found, ignoring.
Starting version 246.4-1-arch
ERROR device 'dev/mapper/vg0-root' not found. Skipping fsck.
mount /new_root: special device /dev/mapper/vg0-root does not exist.
You are being dropped into an emergy shell.
I checked and the /lib/modules/5.8.5-arch1/modules.devnamedoes indeed exist. But I think the actual problem is that mkinitcpio doesn't load the correct modules into the custom kernel, causing it to become unbootable.
Any help appreciated!

Related

Is there any short A to Z description of how to debug the Linux kernel that has been tested and contains ALL necessary steps ? Esp. for Yocto?

Debugging the Linux Kernel with kgdb over rs-232 needs several preparation steps. I found awesome documentation, but no single-source that is fully self-contained and summarizes all steps needed, does not explain for ages, and has been tested. And also covers Yocto.
Is there any source that covers all that is needed in one single and short description ?
I.e.:
What files are needed in the directory GDB is started from (e.g. kernel awareness, source, vmlinux) and how to get theese, where to put it ?
When and where to get a cross-gdb from ?
ALL kernel config options needed, also the not-obvious ones (like CONFIG_RANDOMIZE_BASE)
How to configure the serial ports
Explaining a working back and forth of breaking into debugee and debugger to get started.
Explaining one rock-solid option of stopping the kernel that runs everywhere.
Explaining how to get this done not only for PC-PC debugging, but also for Yocto targets.
Debugging the Linux Kernel via a Nullmodem-Cable:
It took me a while to get a kgdb connection with Linux kernel awareness fully running. I share my way of doing this with Ubuntu Eoan (optional: Yocto Warrior) in 2020 here:
Tested with:
Debugging a linux based Intel PC from an Intel MacBook running MacOS Catalina. Using the gdb from the Homebrew package "i386-elf-gdb“. (wituout „-tui“ option in GDB)
Debugging a linux based ARM target (i.mx6, Yocto) from a linux based Intel PC.
Prerequisites:
You need two computers and a serial nullmodem cable. Check the cable by firiing up a serial termianl (e.g. screen or putty) on both hosts, connecting to your serial port (e.g. /dev/ttyS0 or /dev/ttyUSB0) and print characters from each station to the other. Remember the /dev/tty ports you confirmed.
Preparation:
You need on the first debuggee computer, we call it „target":
Special kernel installed that contains symbols, kgdb support etc.
Learn how to compile and install a kernel and use in make menuconfig belows configuration. You can search for Sybmbols with F8 or the / key in menuconfig.
(E.g. wiki.ubuntu.com. There take care in the first paragraph to execute deb-src before apt-get :)
# CONFIG_SERIAL_KGDB_NMI is not set
CONFIG_CONSOLE_POLL=y
# CONFIG_DEBUG_INFO is not set
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
# CONFIG_KGDB_TESTS is not set
# CONFIG_KGDB_KDB is not set
CONFIG_FRAME_POINTER=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_GDB_SCRIPTS=y
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_RANDOMIZE_BASE is not set
(Note for advanced Yocto use, skip if you're debugging a PC:
In yocto I created in my layer a file: recipes-kernel/linux/linux-mainline_%.bbappend with the content:
FILESEXTRAPATHS_prepend := "${THISDIR}/files:"
SRC_URI += "file://kgdb.cfg“
And in files/kgdb.cfg I added the config fragment shown above (without the on ARM unavailable options CONFIG_RANDOMIZE_BASE and CONFIG_FRAME_POINTER)
)
You need on the second debugger computer, we call it „debugger pc":
Full kernel source code, same code you used to compile the kernel above. (If you compiled the .o and .ko objects in place and not in a build-folder you better not copy the directory from the other pc, where you called make etc. in, but then better grab fresh sources again.)
vmlinux file containing the symbols (lies in the kernel source root, or build folder on the highest level after kernel make).
vmlinux-gdb.py file that was made during the kernel build (also lies at the same position on the highest level.).
All scripts in the folder scripts/gdb (Folder scripts in the same toplevel-position. If you use a dedicated build folder use the script folder from there, not from the source folder.)
(Advanced: If both computers don’t match in CPU, like Intel and Arm, a cross-gdb build. Ignore if you're on Intel/AMD.)
Note for advanced Yocto use, I did something like (ignore if you debug a PC):
bitbake -c patch virtual/kernel #(apply the changed kernel config from above)
bitbake -f -c compile virtual/kernel #(unpack is not sufficient because of vmlinux-gdb.py)
mkdir ~/gdbenv
cp -a tmp/work-shared/phyboard-mira-imx6-14/kernel-source/. ~/gdbenv
cp tmp/work/phyboard_mira_imx6_14-phytec-linux-gnueabi/linux-mainline/4.19.100-phy1-r0.0/build/vmlinux ~/gdbenv
cp tmp/work/phyboard_mira_imx6_14-phytec-linux-gnueabi/linux-mainline/4.19.100-phy1-r0.0/build/vmlinux-gdb.py ~/gdbenv
mkdir ~/gdbenv/scripts
cp -r tmp/work/phyboard_mira_imx6_14-phytec-linux-gnueabi/linux-mainline/4.19.100-phy1-r0.0/build/scripts/gdb ~/gdbenv/scripts
Then (ignore if you're on a PC)
yocto bitbake -c populate_sdk [my-image]
Then (still ignore on PC) install the sdk .sh-installation file from your deploy directory on the debugger pc and start the environment as guided by the output of the install script (remember that command), then use "$GDB" for starting the cross-gdb instead of „gdb".
Debug execution
Launch on the debugger two console screens:
Console 1, ssh: +++++++++++++++++++++++++++++++++++++++
ssh user#192.168.x.y
sudo -s
echo ttyS0,9600n8 > /sys/module/kgdboc/parameters/kgdboc
echo 1 > /proc/sys/kernel/sysrq
echo g > /proc/sysrq-trigger
Console 2, local: ++++++++++++++++++++++++++++++++++++++++
cd ~/gdbenv
gdb -tui ./vmlinux
add-auto-load-safe-path ~/gdbenv
source ~/gdbenv/vmlinux-gdb.py
set serial baud 9600
target remote /dev/ttyS0 (use the tty port you confirmed in the beginning)
b [name of the c funtion you want to debug]
cont
Back to console 1, ssh: +++++++++++++++++++++++++++++++++++++++
[Now trigger the function, e.g. sudo modprobe yourFancyKernelModule]
Back to console 2, local: ++++++++++++++++++++++++++++++++++++++++
Now use gdb functions, like bt, step, next, finish ...
You can also use linux-aware commands. Call "apropos lx“ in gdb for a list of commands.

How can I solve stdarg.h No such file or directory while compiling out-of-tree Linux kernel module?

I have an out-of-tree Linux kernel module that I need to compile. When I execute "make" in the kernel module directory I am getting:
"fatal error: stdarg.h: No such file or directory"
Before starting the build I installed the header file based on my Linux distribution.
$sudo apt-get install kernel-headers-$(uname -r)
How can I solve this compilation error? (my distribution is Ubuntu 16.04 with linux-headers-4.15.0-42-generic)
I ran a search of stdarg.h with the "locate" command to see if I can sport the file on the system.
I got:
/usr/include/c++/5/tr1/stdarg.h
/usr/lib/gcc/i686-linux-gnu/5/include/cross-stdarg.h
/usr/lib/gcc/i686-linux-gnu/5/include/stdarg.h
...
It tells me there is at least one stdarg.h provided by the compiler.
I tried to include the path "/usr/lib/gcc/i686-linux-gnu/5/include" in the kernel module Makefile so stdarg.h can be picked up. It did not work (while building, another reference to stdarg.h in the official kernel header was not being resolved).
I finally created a symlink directly under:
/usr/src/linux-headers-4.15.0-42-generic/include
$sudo ln -s /usr/lib/gcc/i686-linux-gnu/5/include/stdarg.h stdarg.h
This was just enough to solve the compilation issue.
I am wondering if the kernel headers should come with an implementation of stdarg.h by default (that is the first time I encounter this issue). I have also read that the compiler provide one implementation and most of the time it is better to use the compiler version.
Updated note: if the above solution still does not solve the problem:
Before running make again, do a make clean. Do a ls -la in the folder and look for a ".cache.mk" file. If this is still there, remove it and run "make" again. It should solve the problem.
I had the same issue with CentOS 9, and the other answers didn't work for me. Apparently the problem is that in more recent kernels, it shouldn't be <stdarg.h> but <linux/stdarg.h>. With virtualbox guest additions 6.1.34, it correctly checks for kernel with a version of 5.15.0 or more. But my kernel is the 5.14.xx, meaning the include for stdarg.h is wrong.
Solving the issue
Dependencies
Install all the dependencies for the guest edition
gcc make perl kernel-devel kernel-headers bzip2 dkms
Installation
Run the Guest Addition installation like you would normally. It will fail by saying it is unable to compile the kernel modules. That's expected. It will copy all the file we need to the VM disk.
Editing
We now need to edit the erroneous files.
/opt/VBoxGuestAdditions-6.1.34/src/vboxguest-6.1.34/vboxguest/include/iprt/stdarg.h
/opt/VBoxGuestAdditions-6.1.34/src/vboxguest-6.1.34/vboxsf/include/iprt/stdarg.h
On line 48 (may change for different versions), it check for a version of Linux and select the correct header depending on the version. We need to replace if RTLNX_VER_MIN(5,15,0) with if RTLNX_VER_MIN(5,14,0) in both files.
Compile the kernel modules
We can now compile the kernel modules, and the error should be gone.
sudo rcvboxadd quicksetup all
I personally got an error the first time, but then I recompiled without changing anything and it worked.
Remember that it's just a workaround, it may not work with different versions.
If you using Arch Linux with zen-kernel:
sudo CPATH=/usr/src/linux-zen/include/linux vmware-modconfig --console --install-all
I had the same problem with VirtualBox 6.1.0 running archlinux with kernel 6.1.9.
I downloaded VirtualBoxGuestAdditions_7.2.0.iso file from https://download.virtualbox.org/virtualbox/7.0.2/ link(you may select more appropriate to your VirtualBox version) and assigned as an optical drive to virtualbox machine. After start of the system running blkid command on terminal showed the name of CD rom device which was /dev/sr0. then I created iso folder on
/mnt folder
mkdir /mnt/iso
and mounted cd drive to that folder
mount -o loop /dev/sr0 /mnt/iso
after I cd'ed to /mnt/iso
cd /mnt/iso
and manually run VirtualBoxGuestAdditions.run script
sh ./VirtualBoxGuestAdditions.run
which successfully compiled and istalled required virtualbox guest modules.
Now everytime I update kernel version I redo the same procedure. And it work fine.
It also remove old 6.1.0 guest additons folder.

Yocto menuconfig not working

For some reason the menuconfig menu does not come up when I try launching it from my Yocto installation. I am using the Toradex Yocto 1.6 system as is described here http://developer.toradex.com/software-resources/arm-family/linux/board-support-package/openembedded-%28core%29, with my board set to "apalis-t30". When I run either bitbake virtual/kernal -c menuconfig or bitbake linux-toradex -c menuconfig, it executes fine but finishes (without erros) before actually showing anything. Running devshell also gives the same results.
If I just use the kernel sources on their own as is described here http://developer.toradex.com/software-resources/arm-family/linux/board-support-package/build-u-boot-and-linux-kernel-from-source-code, I can get menuconfig open using make nconfig. From the Yocto scripts it appears as if though the exact same kernel sources are being used. If I try adding adding make nconfig to the do_configure_prepend script in the linux-toradex_git.bb file then the commands get stuck stating that the process (I assume menuconfig) is running and then provides a PID for it, but no window or menu is displayed anywhere and the task does not seem to finish.
PS. I am on Fedora 21 64-bit.
EDIT:
I have now checked the default Yocto image and menuconfig comes up fine there. I am assuming that the Toradex BSP is not entirely compatible enough with Yocto for this to work out of the box. I have spoken to Toradex and they have told me that I should instead fork their kernel, modify it the normal way in my own repo and then tell the script to pull from my modified repo. I guess this could work but its a bit of a hassle and I would like to fix their Yocto system. I am assuming that this cannot be to hard as running make nconfig is usually enough, I just can't figure out how to get that command working with bitbake.
This should work fine with the meta-toradex layer. In the local.conf file, comment out the INHERIT += "rm_work" line:
#INHERIT += "rm_work"
Then do a full build of the kernel:
MACHINE=apalis-t30 bitbake virtual/kernel
Then try menuconfig now that all the sources are in place:
MACHINE=apalis-t30 bitbake -c menuconfig virtual/kernel
If you are using Ubuntu, try to reconfigure system shell to bash instead of dash(that is default for Ubuntu):
$ sudo dpkg-reconfigure dash
press "No" when prompted.
Actually I got the same problem few times. In one case shell reconfigure helped me.

How does "biosdevname" really work?

I know the purpose of "biosdevname" feature in Linux, but I'm not sure how
exactly it works.
I tested it with Ubuntu 14.04 and Ubuntu 14.10 (both 64-bit server editions)
and it looks like they enable it by default - right after system startup my
network interface has a name such as p4p1 instead of eth0, no customization
is needed. As I understood it, in order for biosdevname to be enabled, BOTH
of these two conditions must be met:
a boot option biosdevname=1 must be passed to a kernel
biosdevname package must be installed
As I already mentioned, both Ubuntu 14.04 and 14.10 seem to offer biosdevname
as a default feature: they come with biosdevname package already installed, I
didn't need to modify grub.cfg either - GRUB_CMDLINE_LINUX_DEFAULT has no
parameters and my network interface still has a BIOS name (p*p*) instead of a
kernel name (eth*.)
Later I wanted to restore the old style device naming and that's where the
interesting part begins. I decided to experiment a bit while trying to disable
the biosdevname feature. Since it requires biosdevname package to work (or
so I read here and there), I assumed removing it would be enough to disable the
feature, so I typed:
sudo apt-get purge biosdevname
To my surprise, after reboot my network interface was still p4p1, so
biosdevname clearly still worked even though biosdevname package had been
wiped out.
As a next step, I applied appropriate changes to /etc/network/interfaces in
order to restore the old name of my network interface (removed entry for p4p1
and added entry for eth0). As a result, after another reboot, ifconfig
reported neither eth0 nor p4p1 which was another proof that OS still
understood BIOS names instead of kernel names.
It turned out that I also had to explicitly change GRUB entry to
GRUB_CMDLINE_LINUX_DEFAULT=biosdevname=0 and update GRUB to get the expected
result (biosdevname disabled and old name of network interface restored).
My question is: how could biosdevname work without biosdevname package? Is
it not required after all? If so, what exactly provides the biosdevname
functionality and how does it work?
The reason biosdevname keeps annoying you even after you uninstall the package, is that it installed itself in the initrd 'initial ramdisk' file as well.
When uninstalling, the /usr/share/initramfs-tools/hooks/biosdevname is removed, but there is no postrm script in the package so update-initramfs is not executed and biosdevname is still present in the /boot/initrd... file used in the first stage of system startup.
You can fully get rid of it like this:
$ sudo update-initramfs -u

Error inserting scsi_wait_scan - Invalid module format

The system is CentOS 6.3.
I've compile a new kernel and the resulting rpm installed on a target machine.
When booting from the kernel I've receive the error in a title of the question.
I've extracted corresponding initramfs and compared output of:
modprobe --dump-modversions /path/to/scsi_wait_scan.ko
with entries in corresponding /boot/symvers-*. All symbols checksums fit, including of module_layout.
Is there a way to extract symvers from kernel itself?
I've found the problem.
Short answer
The problem was that I installed kernel rpm (B) over already installed kernel rpm (A),
without removing it first.
Detaild answer
scsi_mod.ko was owned only by (A). While installing (B), scsi_mod.ko was in /lib/modules/.
When intramfs was created in (B)'s postinstall script. depmod decided that scsi_wait_scan.ko depends on scsi_mod.ko, while both build against different configurations.
Later when booting the machine, kernel started run initramfs. This in turn modprob'ed scsi_wait_scan.ko. modprobe tried loaded as a consiquence scsi_mod.ko, which is not appropriate to the current kernel, thus resulting to error I saw.

Resources