Linux kernel remote debugging: cannot break back into debugger

I am debugging an Ubuntu Linux kernel using kgdb with a remote gdb. I have two computers: an Ubuntu target and a Windows host. They are connected through their serial ports with a null-modem cable.
KGDB support is enabled in the target Ubuntu system. The kernel command-line options for KGDB are:
kgdbwait kgdboc=ttyS0,115200
On my Windows system I have a MinGW build of gdb (x86_64):
GNU gdb (GDB) 7.4 Copyright (C) 2012 Free Software Foundation, Inc.
This GDB was configured as "x86_64-w64-mingw32".
I start my target system and it waits for the remote debugger to connect. I enter the following commands in the GDB window:
(gdb) set remotebaud 115200
(gdb) target remote COM4
gdb is able to connect to the target and prints the following:
Remote debugging using COM4
???() at kernel/debug/debug_core.c:1043 wmb(); /* Sync point after breakpoint */
I then set a breakpoint so that I can return to gdb once the OS has booted:
(gdb) b sys_sync
Breakpoint 1 at 0xffffffff8124a710
I also tried a hardware-assisted breakpoint in another run of the same setup:
(gdb) hbreak sys_sync
This breakpoint should make the kernel break back into the debugger when I enter the sync command on the target Ubuntu console.
After I hit continue in GDB, the OS boots OK, but I am never able to bring control back to gdb. I tried setting breakpoints on sys_sync, and I tried
echo g > /proc/sysrq-trigger
in all cases with no success.
Very interesting: if I don't set the breakpoint on sys_sync initially, entering the sync command later does nothing. If I do set the sys_sync breakpoint, entering the sync command later halts the target computer completely, so I suppose the breakpoint really is set in this case.
How do I break into the debugger? GDB becomes unresponsive to Ctrl-C, so there is no way to continue debugging after I initially hit continue.
It may be an architecture incompatibility (Windows gdb, Linux target), but it seems like the breakpoints really are set.
Please help

On your test machine, log in as a superuser:
sudo su
and after that generate a sysrq trigger:
echo g > /proc/sysrq-trigger
After this your target machine should freeze, and the debugger on the dev machine should break into the target.

Lower the baud rate on both ends. In my experience, while 115200 baud is theoretically supported, it rarely works properly. I recommend dropping the rate to 9600 (on both ends) and getting that to work first. Once it does, you can step the speed back up one step at a time. I am rarely able to get kgdb to work reliably beyond 34K baud, and often only lower. I have also made sure to use proper RTS/CTS cross-linked connectors, some store-bought and some hand-made; it never made a difference. I think the problem is in kgdboc on the SUT side, which does not always work at the full 115200 baud. The solution is simply to lower the baud rate to 9600. The data is small, so the lower speed does not hurt anything, and having it work reliably is worth any speed penalty.
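To put the "lower speed does not hurt" claim in numbers, here is a rough estimate of serial throughput at each rate. It assumes the common 8N1 framing (1 start bit, 8 data bits, no parity, 1 stop bit), which the answer does not state, and the 4 KiB transfer size is only an illustrative figure.

```python
def serial_throughput(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Approximate payload bytes per second on an asynchronous serial link."""
    # Every byte on the wire is framed by one start bit plus parity/stop bits.
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_frame


def transfer_seconds(num_bytes, baud):
    """Time to move num_bytes at the given baud rate (8N1 assumed)."""
    return num_bytes / serial_throughput(baud)


if __name__ == "__main__":
    for baud in (9600, 19200, 38400, 57600, 115200):
        print(f"{baud:>6} baud: {serial_throughput(baud):7.0f} B/s, "
              f"4 KiB in {transfer_seconds(4096, baud):.2f} s")
```

At 9600 baud this gives about 960 bytes per second, which is slow for bulk memory dumps but usually tolerable for breakpoints and register reads.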

Related

How can I debug kernel module running on a virtual machine?

I think I have once read about this but can't find it now.
I'm running linux-5.4.188 on a qemu arm64 virtual machine. Because I built the kernel from source, I can debug (analyze) the kernel by attaching gdb to the Linux kernel running on the remote machine (the qemu virtual machine). To test an application that uses our device (the device model is in qemu too), I compiled a device driver against kernel 5.4.188, along with the Linux application, and I can insmod the driver and run the application.
Now something is wrong and I get a panic while running the application. I can debug the Linux kernel itself, but I don't know where the kernel module was loaded, so the debugger cannot debug the driver module. How can I debug the device driver (or even the application, in case I need to someday)? I remember that it is possible to debug a kernel module by first getting the load address of the module and then doing add-symbol-file for the driver image relative to that address. I think this is what driver developers do all the time. Please tell me how I can do it. If this is possible, it will save me many days.
As Ian Abbott said, I use printk for kernel debugging. For this I usually put this line in /etc/sysctl.conf (on Ubuntu 20.04; I'm not sure whether it applies to vanilla Linux too):
kernel.printk = 5 4 1 7
The first value is the console log level and the second value is the default log level. With the default at 4 (WARNING) and the console log level at 5 (NOTICE), every default printk appears on the console: a lower value means more important, and messages more important than the console log level are printed on the console, so level 4 passes a console level of 5. (See "How can I show printk() message in console?")
This way you don't have to check dmesg every time. Kernel messages printed before the serial port is initialized (in the setup_arch function, called from start_kernel) are written to a memory buffer called __log_buf, so you can inspect that buffer to see them.
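The log-level rule above can be checked with a small sketch. The field names follow the order of the four values in kernel.printk; the helper names parse_printk and reaches_console are made up for this illustration.

```python
from typing import NamedTuple, Optional


class PrintkLevels(NamedTuple):
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(text: str) -> PrintkLevels:
    """Parse the four numbers of kernel.printk (or /proc/sys/kernel/printk)."""
    values = [int(v) for v in text.split()]
    if len(values) != 4:
        raise ValueError("expected four log-level values")
    return PrintkLevels(*values)


def reaches_console(levels: PrintkLevels,
                    message_level: Optional[int] = None) -> bool:
    """A message is shown when its level is numerically below the console level."""
    if message_level is None:
        # printk() calls without an explicit level use the default level.
        message_level = levels.default_message_loglevel
    return message_level < levels.console_loglevel


if __name__ == "__main__":
    levels = parse_printk("5 4 1 7")
    print(reaches_console(levels))      # default-level message (4) vs console 5
    print(reaches_console(levels, 6))   # KERN_INFO (6) vs console 5
```

With "5 4 1 7" a default-level message passes, while a KERN_INFO (6) message does not.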
I referred to https://www.kernel.org/doc/html/latest/dev-tools/gdb-kernel-debugging.html, which stark pointed me to, and I also found https://wiki.st.com/stm32mpu/wiki/Debugging_the_Linux_kernel_using_the_GDB very useful.
Important things during the kernel config :
set CONFIG_GDB_SCRIPTS=y, CONFIG_DEBUG_INFO_REDUCED=n, CONFIG_FRAME_POINTER=y.
I use nokaslr in the kernel command line, but added CONFIG_RANDOMIZE_BASE=n to make sure.
Now the part I originally wanted to know from my question:
In the shell (the Linux shell inside the virtual machine), do
$ ls -la /sys/module/<module_name>/sections
Then you'll see files like .text, .data, .bss, etc. cd to that directory and do
$ cat .text .data .bss
to see the address of each section. In my case,
/sys/module/axpu_ldd_kc/sections # cat .bss .data .text
0xffff800008ca5480
0xffff800008ca5000
0xffff800008ca0000
and in gdb (after stopping the program with Ctrl-C) I did
add-symbol-file ~/testlin540/axpu_ldd_kc.ko 0xffff800008ca0000 -s .data 0xffff800008ca5000 -s .bss 0xffff800008ca5480
and I set a breakpoint at my ioctl function:
(gdb) b axpu_ioctl
When I pressed c (continue) and started my application, the program stopped at axpu_ioctl, and I could single-step through the code and inspect the values.
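The manual steps above (read the section addresses, then type the add-symbol-file command) can be scripted. This is only a sketch: the helper names are invented here, the section files under /sys/module are usually readable only by root, and the script just builds the command string for you to paste into gdb.

```python
import os


def read_section_addresses(sections_dir, names=(".text", ".data", ".bss")):
    """Read section load addresses from /sys/module/<module_name>/sections."""
    addrs = {}
    for name in names:
        path = os.path.join(sections_dir, name)
        if os.path.exists(path):
            with open(path) as f:
                addrs[name] = f.read().strip()
    return addrs


def build_add_symbol_file(ko_path, addrs):
    """Build the gdb command; .text is positional, other sections use -s."""
    if ".text" not in addrs:
        raise ValueError("the .text address is required")
    parts = ["add-symbol-file", ko_path, addrs[".text"]]
    for name, addr in addrs.items():
        if name != ".text":
            parts += ["-s", name, addr]
    return " ".join(parts)


if __name__ == "__main__":
    # Addresses taken from the example output shown above.
    example = {
        ".text": "0xffff800008ca0000",
        ".data": "0xffff800008ca5000",
        ".bss": "0xffff800008ca5480",
    }
    print(build_add_symbol_file("~/testlin540/axpu_ldd_kc.ko", example))
```

On the target you would call read_section_addresses("/sys/module/axpu_ldd_kc/sections") and pass the result to build_add_symbol_file. The addresses change every time the module is reloaded, so the command has to be regenerated after each insmod.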
When I recently debugged kernel booting from u-boot, I frequently wrapped a function with
#pragma GCC push_options
#pragma GCC optimize ("O0")
and
#pragma GCC pop_options
to prevent some parts of the code from being optimized away. (Sometimes you need to do this around the related #include <xxx.h> statement too, to prevent compile errors.)

Need an overview of debugging process from the hardware layer

I want a comprehensive overview of how debugging works on a typical x86 machine running the Linux operating system; let's say the program used for debugging is gdb. Question #1: is debugging facilitated by the hardware, or is it implemented completely in software? If it is hardware-assisted, which architecture features of the instruction set are involved?
The x86 ISA includes a single-byte int3 encoding that's intended for software breakpoints. GDB uses this (via ptrace) by default for breakpoints.
(Why Single Stepping Instruction on X86?)
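As a toy illustration of the int3 mechanism, the sketch below patches a byte array instead of a real process. A real debugger such as GDB writes into the debuggee's memory through ptrace; the class and method names here are invented for the example.

```python
INT3 = 0xCC  # single-byte x86 breakpoint opcode


class SoftwareBreakpoints:
    """Toy model: patch int3 into a code buffer and restore it later."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # address -> original byte

    def insert(self, addr: int) -> None:
        if addr in self.saved:
            return  # already set; keep the original byte we saved first
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = INT3

    def remove(self, addr: int) -> None:
        # Put the original opcode byte back so the instruction can run.
        self.memory[addr] = self.saved.pop(addr)


if __name__ == "__main__":
    # push rbp; mov rbp, rsp; nop; ret
    code = bytearray(b"\x55\x48\x89\xe5\x90\xc3")
    bps = SoftwareBreakpoints(code)
    bps.insert(4)
    print(code.hex())
    bps.remove(4)
    print(code.hex())
```

Because int3 is one byte long, it can overwrite the first byte of any instruction without touching the instruction that follows it.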
x86 also has a Trap Flag (TF) in EFLAGS for single-step mode. (https://en.wikipedia.org/wiki/Trap_flag). See also Difference between trap flag (TF) and monitor trap flag?
There are even "debug registers" for setting hardware breakpoints, without modifying the machine code to be run. And also hardware support for watch points, to break on write to a certain address. This makes GDB watch points efficient, not requiring it to single-step and manually decode the instruction to see where it writes.
https://wiki.osdev.org/CPU_Registers_x86#Debug_Registers
The osdev forum thread "Implementing hardware breakpoints using x86 debug register" might be relevant.
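For a feel of what programming the debug registers involves, here is a sketch that computes the DR7 control bits for one breakpoint slot. The bit layout is my reading of the Intel manuals (enable bits in the low byte, R/W and LEN fields from bit 16 up), so treat it as an assumption to verify; the function name is made up, and the 8-byte length is only valid in 64-bit mode.

```python
RW_EXECUTE = 0b00      # break on instruction execution
RW_WRITE = 0b01        # break on data writes (a watchpoint)
RW_READ_WRITE = 0b11   # break on data reads or writes

# LEN field encoding: note that 8 bytes is 0b10 and 4 bytes is 0b11.
LEN_ENCODING = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}


def dr7_enable_bits(slot: int, rw: int, length: int, local: bool = True) -> int:
    """Return the DR7 bits that arm breakpoint slot 0..3 (address in DR0..DR3)."""
    if slot not in range(4):
        raise ValueError("x86 has four breakpoint slots, 0..3")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("execution breakpoints must use a length of 1")
    enable_bit = 2 * slot + (0 if local else 1)   # L0/G0, L1/G1, ...
    value = 1 << enable_bit
    value |= rw << (16 + 4 * slot)                # R/Wn field
    value |= LEN_ENCODING[length] << (18 + 4 * slot)  # LENn field
    return value


if __name__ == "__main__":
    # A 4-byte write watchpoint in slot 0.
    print(hex(dr7_enable_bits(0, RW_WRITE, 4)))
```

The watched address itself goes into DR0..DR3; a user-space debugger on Linux sets these registers for the debuggee through ptrace rather than writing them directly.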
Some other ISAs exist without nearly as much HW support for debugging. e.g. without a single-step flag, a debugger might have to always decode the current instruction (pointed to by program counter) to find the next one to be executed, and set a software breakpoint there.
ARM Linux used to do that to implement ptrace single-step, but that disassembler code was removed from the kernel, and the request now just returns -EIO. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=425fc47adb5bb69f76285be77a09a3341a30799e is the commit that removed it.

Intel-PT does not record any packets when KVM-QEMU is on

I am trying to use Intel-PT on the host while I run a general software program in the guest machine. What I expect is that Intel-PT running on the host will record all the relevant packets (like PIP, FUP, TSC etc.) and also all the VM-related packets like VMCS.
I use the following command:
./perf kvm --host --guest --guestkallsyms=guest-kallsyms --guestmodules=guest-modules record -e intel_pt//
guest-kallsyms and guest-modules are the kallsyms and module files I copied from the guest onto my host.
I then start my virtual machine and run a program in the guest. Once the program has finished, I press Ctrl+C (SIGINT) on the host to stop the recording.
When I then try to read the generated file with perf report, using the command below,
./perf kvm report -i perf.data.kvm
it returns "NO SAMPLES FOUND". This means that Intel-PT has failed to record any samples.
NOTE: I found that bit 14 of the value in the MSR MSR_IA32_VMX_MISC is 0 on my processor. As per the Intel documentation, this bit should be 1 for Intel-PT to be usable in VMX operation. Could this be why Intel-PT does not record any samples?
Will Intel-PT work even when the VM is on? Or is my method of recording data wrong?
Edit: I am using Linux Kernel 4.11.3, having Ubuntu 17.04 and a Broadwell CPU, which supports Intel-PT.
Since I now have a clear idea of why Intel-PT does not work with QEMU-KVM on, I will post an answer.
As I mentioned in the question, the main reason this does not work is that bit 14 of the value in the MSR MSR_IA32_VMX_MISC is 0 on my processor. As per the Intel documentation, this bit should be 1 for Intel-PT to be usable in VMX root operation (between VMXON and VMXOFF).
The main problem is that when this bit is 0, a VMXON instruction clears the TraceEn bit of the IA32_RTIT_CTL MSR. TraceEn controls the tracing operation; once it is cleared, no trace data is written to the buffer. The reset happens at the hardware level.
To trace under VMX, it is necessary to have at least a Skylake processor. I was using a Broadwell system, which, as it looks now, will not work.
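The bit check described above can be sketched as follows. The MSR address 0x485 for IA32_VMX_MISC and the /dev/cpu/N/msr interface (which needs the msr kernel module and root) are my assumptions about the environment; the function names are invented for this example.

```python
import os
import struct

IA32_VMX_MISC = 0x485   # assumed MSR address of IA32_VMX_MISC
PT_IN_VMX_BIT = 14      # set when Intel PT may be used in VMX operation


def pt_allowed_in_vmx(vmx_misc_value: int) -> bool:
    """True when bit 14 of the IA32_VMX_MISC value is set."""
    return bool((vmx_misc_value >> PT_IN_VMX_BIT) & 1)


def read_msr(msr: int, cpu: int = 0) -> int:
    """Read a 64-bit MSR through the Linux msr driver (root required)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, msr)  # the file offset selects the MSR number
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


if __name__ == "__main__":
    # Example with a made-up value; on real hardware use read_msr(IA32_VMX_MISC).
    print(pt_allowed_in_vmx(1 << PT_IN_VMX_BIT))
```

On a machine where the check returns False, the behaviour described in this answer (VMXON clearing TraceEn) is what to expect.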

Raspbian hangs in qemu

I'm running Raspbian (2015-05-05-raspbian-wheezy.img) in qemu using a compiled kernel (https://github.com/dhruvvyas90/qemu-rpi-kernel) on Ubuntu 14.04. My final goal is to launch my Python script within the emulation.
I'm following the manual from http://www.unixmen.com/emulating-raspbian-using-qemu/, though many others suggest a very similar sequence of actions.
Things I'm trying and issues I'm experiencing:
The first boot is more or less OK. I comment out the line in /etc/ld.so.preload as suggested and reboot.
On the second boot (after I remove init=/bin/bash) and all subsequent boots I get
ERROR ../libkmod/libkmod.c:554 kmod_search_moddep: could not open moddep file '/lib/modules/3.10.25/modules.dep.bin'
Some googling suggested running "sudo rpi-update". It didn't help; I get the same message during boot.
On the second boot (after I remove init=/bin/bash) and all subsequent boots I get
fsck died with exit status 6
Looking into "/var/log/fsck/checkfs" as suggested tells me that some location is not there, but it doesn't say which one.
Running "startx" produces the error message from 1. It loads the UI eventually, but the desktop only has a "wastebasket" icon. There is also a thick white stripe blinking at the top of the screen, as if it keeps trying to load a tab but fails every time. The qemu window stops responding to further interaction after this.
Running "sudo apt-get upgrade" installs some packages, but after a reboot I can't even get to the UI, just a blank screen with a mouse cursor.
I'm not very experienced with how Linux is configured at a low level. I understand that I might be doing something completely stupid.
So, my questions are:
1. How do I debug? I couldn't figure out the settings for qemu to write logs. I really don't want to fall back to gdb, as I'm not debugging qemu itself; I just want to get notifications about its events.
2. The Ctrl key doesn't seem to work inside the qemu window.
3. No copy-paste is available, or I can't see how to turn it on.
4. Am I missing something? From all the manuals I have seen, it seems like this should go much, much smoother, like it should "just work".
Many things have changed since your post. The most important is that, using Andrew Baumann's GitHub repo, you can now build a QEMU that boots recent Raspbian. I described my experience with this code here. The instructions are straightforward. The implementation needs polishing, but it is the best compilation of work so far.
To answer your questions:
1. QEMU has the -s and -S options for GDB. The first sets up a gdb server hook and the second freezes the CPU at startup, so you can connect a debugger. This is not for debugging QEMU itself; it is for debugging the guest system. QEMU logs to stderr by default, so if something valuable happens you will see it in the terminal. You can raise QEMU's verbosity by uncommenting various *DEBUG_ statements in the source code. Also check the help for QEMU's -d and -D command-line flags.
2. Not sure I can help with this. The only thing I can say is that my QEMU version 2.5.50 reacts to Ctrl+Alt, which releases the cursor after the GUI has captured it, so it looks like QEMU understands the Ctrl key. I assume QEMU does not receive your special key combinations because your window manager handles them before passing them on to QEMU.
3. This also does not work for me, but I see that some work was done in this area. I'm not sure how to enable and use that feature.
4. Emulating any hardware is very complex and requires a lot of work. All emulated targets are limited to their most important features. The BCM2835/BCM2836 (Raspberry Pi/Raspberry Pi 2) SoCs are still not accepted into mainline QEMU, so "just work" does not apply to those platforms.
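The -s, -S, -d and -D flags mentioned in the first answer can be combined as in the sketch below, which only assembles an argument list. The base command line (machine type, kernel file name) and the log item names are placeholders I picked for illustration; check qemu's own help for the -d items your version supports.

```python
def qemu_debug_args(base_args, wait_for_gdb=True, log_items=(), log_file=None):
    """Append gdb-stub and logging flags to an existing QEMU command line."""
    args = list(base_args)
    args.append("-s")            # open a gdb server (shorthand for -gdb tcp::1234)
    if wait_for_gdb:
        args.append("-S")        # do not start the CPU until gdb says continue
    if log_items:
        args += ["-d", ",".join(log_items)]  # which events to log
    if log_file:
        args += ["-D", log_file]             # write the log here, not to stderr
    return args


if __name__ == "__main__":
    # Hypothetical base command; substitute the one you already use.
    base = ["qemu-system-arm", "-M", "versatilepb", "-cpu", "arm1176",
            "-m", "256", "-kernel", "kernel-qemu"]
    print(" ".join(qemu_debug_args(base, log_items=("int", "cpu_reset"),
                                   log_file="qemu.log")))
```

With -S in place, the guest stays paused until a gdb connected with "target remote :1234" issues continue.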

gdb - No hardware breakpoint support in the target

I'm trying to set a hardware breakpoint using the gdb hbreak command
hbreak *address
but I'm getting the following error: "No hardware breakpoint support in the target".
Is there any way to fix this problem?
Try the start command first. GDB often says that when the program has not been started yet even if there is hardware support (this is a very misleading message in this context).
Your hardware may not support hardware breakpoints, or perhaps you are out of available hardware breakpoint registers. You can still use software breakpoints as a workaround.
