crashkernel not starting after crash

crashkernel not starting after crash - linux

I am trying to start a crash kernel using Linux's Kernel Crash Dump. Both of the host and crash kernel are compiled linux-4.13.16 kernel. Unfortunately, the crash kernel fails to start after the crash occurs.
iomem reports reserved space for crash kernel and kdump reports its ready for kdump:
28000000-37ffffff : Crash kernel
$ sudo kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x28000000
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.13.16ksa
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.13.16ksa
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.13.16ksa root=UUID=3254c608-d885-4dfc-b20b-fa4e69564dca ro quiet splash vt.handoff=7 irqpoll noirqdistrib nr_cpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.13.16ksa root=UUID=3254c608-d885-4dfc-b20b-fa4e69564dca ro quiet splash crashkernel=384M-2G:128M,2G-:256M vt.handoff=7
After triggering crash using sysrq-trigger, it doesn't load crash kernel.
I have tested with a generic kernel, linux-4.8.0-36-generic, which successfully works.
Syslog file are here.
Both linux-4.8.0-36-generic and linux-4.13.16ksa has same .config file. Only difference I can see is that during boot, for linux-4.13.0-38, it loads the efi.signed vmlinuz(vmlinuz-4.13.0-38-generic.efi.signed) where as the compiled linux-4.13.16ksa is not efi signed.
Can it be an issue? How can I resolve this?

Related

Kernel / Linux location is memory, how to verify

I'm running linux on a Atlas-SoC Kit/DE0-Nano-SoC Kit.
Through u-boot I've placed the kernel at a different location.
mcboot=setenv bootargs console=ttyS0,115200 root=${mmcroot} rw rootwait;bootz ${loadaddr} - ${fdtaddr}
mmcload=mmc rescan;${mmcloadcmd} mmc 0:${mmcloadpart} ${loadaddr} ${bootimage};${mmcloadcmd} mmc 0:${mmcloadpart} ${fdtaddr} ${fdtimage}
mmcloadcmd=fatload
mmcloadpart=1
mmcroot=/dev/mmcblk0p2 mem=744M memmap=744M$256M
the last line, request 744M starting at 256M offset.
No my question is, how can i verify that this actually happend? This since i'm reading mixed solutions online between using device-tree and memmap configurations. And i want to make sure, before i continue on writing the device driver section.
My /proc/iomem:
root#cyclone5:~# cat /proc/iomem
00000000-2e7fffff : System RAM
00008000-0077656f : Kernel code
007d6000-00859433 : Kernel data
ff702000-ff703fff : /soc/ethernet#ff702000
ff704000-ff704fff : /soc/dwmmc0#ff704000
ffb40000-ffb4fffe : /soc/usb#ffb40000
ffc00000-ffc00fff : c_can_platform
ffc02000-ffc0201f : serial
ffc04000-ffc04fff : /soc/i2c#ffc04000
ffc05000-ffc05fff : /soc/i2c#ffc05000
ffd02000-ffd02fff : /soc/wd#ffd02000
ffe01000-ffe01fff : /soc/amba/pdma#ffe01000
fff01000-fff01fff : fff01000.spi
ffff0000-ffffffff : /soc/sram#ffff0000
any detailed explanations would be highly appreciated,
regards Auke

Device tree and kernel parameter memmap both helps in reserving memory that will not be used by linux kernel. Take a look at linux device tree memory documentation and kernel parameters.
You can use Emulator or high end debugger for example Trace32 to see memory contents.

What does udev need to startup properly in Linux?

I have an issue with udev startup on my i.MX6 board. udev-182 was cross-built by the Yocto 1.8 BSP for the board. I see the following output on startup:
INIT: version 2.88 booting
Starting udev
udevd[188]: bind failed: No such file or directory
error binding udev control socket
udevd[188]: error binding udev control socket
I believe the error is a result of the lack of /run/udev/control existing. But I am unsure what creates that.
I noticed this while I was looking into an issue with my touchscreen not working. If I manually restart udev from the command line, everything seems to work fine and my touchscreen begins functioning.
root#nitrogen6x:~# /etc/init.d/udev restart
Stopping udevd
Starting udev
udevd[451]: starting version 182
mxc_v4l_open: Mxc Camera no sensor ipu1/csi0
mxc_cam_select_input: input(0) CSI IC MEM
mxc_v4l_open: Mxc Camera no sensor ipu0/csi0
mxc_v4l_open: Mxc Camera no sensor ipu0/csi1
When I do a restart, /run/udev/control is created.
Any ideas on what could be causing this failure?
Thanks

I had the same issue and I managed to resolve that by appending rootwait rw to my bootargs in u-boot.
For instance, if your bootargs were:
console=ttymxc3,115200 root=/dev/mtdblock4 rootfstype=jffs2 mtdparts=spi0.0:512k(uboot),256k(ubootenv),6144k(kernel),256k(fdt),20m(rootfs),-(data)
Change it to:
console=ttymxc3,115200 root=/dev/mtdblock4 rootfstype=jffs2 rootwait rw mtdparts=spi0.0:512k(uboot),256k(ubootenv),6144k(kernel),256k(fdt),20m(rootfs),-(data)
That's because the kernel mounts the rootfs as r/o by default and thus new files cannot be created by any process at startup.

Compare strace output of "udev start by init" and "udev start from console" might give you some idea.

How can I debug this User Space application crash?

I'm running a Qt5.4.0 application on my embedded Linux system (TI AM335x based) and it's stopping to run and I'm having a hard time debugging this. This is a QtWebKit QML example (youtubeview) but other QtWebKit examples are preforming the same for me so it's something WebKit based on my system.
When I run the application, it runs for a second or so, then ends with no messages. There is nothing reported to the syslog or dmesg either. When I kick it off with strace I can see this futex message:
futex(0x2d990, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2d9ac, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
+++ exited with 1 +++
Then it stops. Not very helpful... My next though was to debug this with GDB, however GDB crashes when I try to run this:
-sh-4.2# gdb youtubeview
GNU gdb (GDB) 7.5
Copyright (C) 2012 Free Software Foundation, Inc.
...
(gdb) run
Starting program: /usr/share/qt5/examples/webkitqml/youtubeview/youtubeview
/home/mike/ulf_qt_450/ulf/build-ulf/out/work/armv7ahf-vfp-neon-linux-gnueabihf/gdb/7.5-r0.0/gdb-7.5/gdb/utils.c:1081: internal-error: virtual memory exhausted: can't allocate 64652911 bytes.
A problem internal to GDB has been detected,
This issue occurs even if I set a break point at main first, just as soon as it starts running it will get stuck and run out of memory.
Are there other tools or techniques that can be used here to help isolate the issue?
Perhaps arguments to GDB to limit memory use or give some more information about why this Qt program made it crash?
Perhaps some FDs or system variables I could use to figure out why the FUTEX is being held and failing?
I'm not sure where to take this problem right now.
The Qt code itself is pretty simple, and I don't anticipate any issues in here:
#include <QGuiApplication>
#include <QQuickView>
int main(int argc, char* argv[])
{
QGuiApplication app(argc,argv);
QQuickView view;
view.setSource(QUrl("qrc:///" QWEBKIT_EXAMPLE_NAME ".qml"));
view.setResizeMode(QQuickView::SizeRootObjectToView);
view.show();
return app.exec();
return 0;
}

Running gdb on the device, especially with a huge library such as WebKit, is bound to get you out of memory errors.
Instead, run gdbserver on the device, and connect to it via gdb running on the host machine, using the toolchain's cross-gdb for that. In that scenario, the gdb on the host loads all the debug information, while the gdbserver on the device needs almost no memory.
It is even possible to have the separate debug information available on the host and stripped libraries on the device.
Please note that parts of WebKit are always built in release mode, even if the rest of Qt was built in debug mode, if you are going to debug into WebKit you might want to change that in the build system.
Here is a minimal example:
Device:
# gdbserver 192.168.1.2:12345 myapp
Process myapp created; pid = 989
Listening on port 12345
Host:
# arm-none-linux-gnueabi-gdb myapp
GNU gdb (Sourcery G++ Lite 2009q1-203) 6.8.50.20081022-cvs
(gdb) set solib-absolute-prefix /opt/targetroot
(gdb) target remote 192.168.1.42:12345
Remote debugging using 192.168.1.42:12345
(gdb) start
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) break main
Breakpoint 1 at 0x1ab9c: file myapp/main.cpp, line 12.
(gdb) cont
Continuing.
Breakpoint 1, main (argc=1, argv=0xbecfedb4) at myapp/main.cpp:12
12 QApplication app(argc, argv, QApplication::GuiServer);
And you are right that it looks like a problem in QtWebKit itself, not in your application. Good luck!

why does the i2cdetect always gives UU on my RTC in embedded Linux

I'd like to communicate read from my RTC in C code rather than the "hwclock" shell command.
However, when I use i2cdetect, it shows 0x68(which is my RTC slave address) is having the status "UU", which means "Probing was skipped, because this address is currently in use by a driver". And after I tried the i2cget, its givng "could bot set address to 0x68: Device or resource busy".
So I'm thinking if there are some problem in my Linux kernel that will force to read from my RTC all the time, or some other reason.
Thanks

I am assuming that you are using DS-1307 RTC, or one of its variants (because of 0x68 slave address). Check if its driver is loaded by:
$ lsmod | grep rtc
If you seen an entry of rtc_ds1307, (like this -> rtc_ds1307 17394 0 ) in the output of above command then this driver might be in hold of that address.
If the driver is loaded in system then unload it using
$ rmmod rtc-ds1307
EDIT:
(In light of OP's feedback,) Please do the following
1) cat /sys/bus/i2c/devices/3-0068/modalias. This will give you the name of the kernel driver that is keeping this device busy. Copy the driver-name after the colon(:)
OP's output of the command tells us that its ds1337
2) Check if ds1337 is an alias for a driver, using
grep ds1337 /lib/modules/`uname -r`/modules.alias
Hopefully you will get the following output
alias i2c:ds1337 rtc_ds1307
This confirms our presumption that rtc_ds1307 is infact the driver in hold of the I2C address 0x68.
3) use rmmod rtc_ds1307 to unload the driver.
Note: This will only work if the driver is a Loadable Kernel Module, otherwise you will see the following error:
ERROR: Module rtc_ds1307 does not exist in /proc/modules
In that case you will have to recompile the kernel again with that driver disabled/modularized.

0x68 is being used by some driver,
Disable that driver in kernel source code and recompile source code.

Initramfs built into custom Linux kernel is not running

I am building a custom initramfs image that I am building as a CPIO archive into the Linux kernel (3.2).
The issue I am having is that no matter what I try, the kernel does not appear to even attempt to run from the initramfs.
The files I have in my CPIO archive:
cpio -it < initramfs.cpio
.
init
usr
usr/sbin
lib
lib/libcrypt.so.1
lib/libm.so
lib/libc.so.6
lib/libgcc_s.so
lib/libcrypt-2.12.2.so
lib/libgcc_s.so.1
lib/libm-2.12.2.so
lib/libc.so
lib/libc-2.12.2.so
lib/ld-linux.so.3
lib/ld-2.12.2.so
lib/libm.so.6
proc
sbin
mnt
mnt/root
root
etc
bin
bin/sh
bin/mknod
bin/mount
bin/busybox
sys
dev
4468 blocks
Init is very simple, and should just init devices and spawn a shell (for now):
#!/bin/sh
mount -t devtmpfs none /dev
mount -t proc none /proc
mount -t sysfs none /sys
/bin/busybox --install -s
exec /bin/sh
In the kernel .config I have:
CONFIG_INITRAMFS_SOURCE="../initramfs.cpio"
CONFIG_INITRAMFS_ROOT_UID=0
CONFIG_INITRAMFS_ROOT_GID=0
CONFIG_BLK_DEV_INITRD=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=1
CONFIG_BLK_DEV_RAM_SIZE=32768
Kernel builds and the uImage size is larger depending on the initramfs size, so I know the image is being packed. However I get this output when I boot:
console [netcon0] enabled
netconsole: network logging started
omap_rtc omap_rtc: setting system clock to 2000-01-02 00:48:38 UTC (946774118)
Warning: unable to open an initial console.
Freeing init memory: 1252K
mmc0: host does not support reading read-only switch. assuming write-enable.
mmc0: new high speed SDHC card at address e624
mmcblk0: mmc0:e624 SU08G 7.40 GiB
mmcblk0: p1
Kernel panic - not syncing: Attempted to kill init!
[<c000d518>] (unwind_backtrace+0x0/0xe0) from [<c0315cf8>] (panic+0x58/0x188)
[<c0315cf8>] (panic+0x58/0x188) from [<c0021520>] (do_exit+0x98/0x6c0)
[<c0021520>] (do_exit+0x98/0x6c0) from [<c0021e88>] (do_group_exit+0xb0/0xdc)
[<c0021e88>] (do_group_exit+0xb0/0xdc) from [<c0021ec4>] (sys_exit_group+0x10/0x18)
[<c0021ec4>] (sys_exit_group+0x10/0x18) from [<c00093a0>] (ret_fast_syscall+0x0/0x2c)
From that output, it does not look like it is even trying to extract the CPIO archive as initramfs. I expect to see this printk output, which is present in linux code init/initramfs.c:
printk(KERN_INFO "Trying to unpack rootfs image as initramfs...\n");
I tried the filesystem once booting is complete (using chroot) and it works fine... so I believe the filesystem/libraries are sane.
Could anyone give me some pointers as to what I may have incorrect? Thanks in advance for any assistance!

I figured it out. I will post the answer in case anyone else has this issue.
I was missing a console device, this line was the clue:
Warning: unable to open an initial console.
After adding printk's so that I better understood the startup sequence, I realized that console device is opened prior to running the init script. Therefore, the console device must be in the initramfs filesystem directly, and we cannot rely on the devtmpfs mount to create that.
I think when the init script ran the shell was trying to open the console and failed, that's why the kernel was outputting:
Kernel panic - not syncing: Attempted to kill init!
Executing the folowing commands from within the /dev directory of initramfs on the kernel build machine will generate the required device nodes:
mknod -m 622 console c 5 1
mknod -m 622 tty0 c 4 0
After re-CPIO archiving the filesystem and rebuilding the kernel, I finally have a working filesystem in initramfs that the kernel will boot.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string