Kernel on x86_64 does not boot after upgrading Binutils and GCC

I cannot see ANY logs on the console (not even the "Decompressing Linux..." message).
I enabled every early-boot print option I know of in the kernel config (see the kernel configuration below).
I tried stopping the kernel with KDB by adding kgdbwait at the end of the kernel command-line arguments in GRUB.
I tried to boot the kernel manually from GRUB.
I tried adding panic() or log calls in the function:
asmlinkage void __init start_kernel(void) (init/main.c)
I added, changed and removed the loglevel=verbose kernel command-line argument (in GRUB).
I removed "quiet" from the kernel command line (in GRUB).
I added crashkernel to the kernel command line (in GRUB).
My Questions:
1) Is there anything else I can do to get early boot logs?
2) Is there any additional code I need to add to the kernel to debug it?
3) Is there any other way to debug it?
Binutils:
2.26.1
binutils-2.26.1-1.fc25.src.rpm
GCC:
6.4.1
gcc-6.4.1-1.fc25.src.rpm
Kernel:
3.10.0-514.21.1
GRUB configuration
default=0
timeout=3
serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1
terminal --timeout=34 serial console
title XIV-System
root (hd0,0)
kernel /boot/vmlinuz init=/system/my_init console=tty0 mce=0 i8042.noaux idle=poll scsi_mod.inq_timeout=2 selinux=0 nohpet console=ttyS0,115200 earlyprintk=ttyS0,115200 kgdboc=ttyS0,115200 ro crashkernel=auto
Kernel configuration
- For verbose boot messages
CONFIG_X86_VERBOSE_BOOTUP=y
- For early printk
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
- For KGDB
CONFIG_KGDB_LOW_LEVEL_TRAP=y
CONFIG_KGDB_KDB=y
CONFIG_SERIAL_KGDB_NMI=y
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_TESTS=y
- For KDB
CONFIG_KDB_KEYBOARD=y
CONFIG_KDB_CONTINUE_CATASTROPHIC=0

I managed to solve the problem only after debugging the kernel with gdb during the early boot stage. I found that the kernel fails in:
(gdb) bt
#0 early_idt_handler () at arch/x86/kernel/head_64.S:407
#1 0xffff9fffffffffff in ?? ()
#2 0xffffff07ffffffff in ?? ()
#3 0xffffe0ffffffffff in ?? ()
#4 0x00000e0000000000 in ?? ()
#5 0xffffffff81bee8a0 in ?? ()
#6 0xffff880000014560 in ?? ()
#7 0xffffffff819c0120 in early_idt_handler () at arch/x86/kernel/head_64.S:374
#8 0x0000000000000400 in irq_stack_union ()
#9 0xffffffff81bee8a0 in ?? ()
#10 0x000000000000000f in irq_stack_union ()
#11 0x0000000000000000 in ?? ()
From there I looked in the kernel source tree for a patch related to head_64.S and applied it. The patch is:
commit 5f020130d5360e8266e369dc2b5f4e32ec5b05f4 (HEAD -> my_commit)
Author: Andy Lutomirski <luto@kernel.org>
Date: Fri May 22 16:15:47 2015 -0700
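The answer above does not say how gdb was attached that early. One common setup, assumed here and not confirmed by the poster, is to boot the kernel under QEMU and use its built-in gdb stub. The sketch below only assembles the two command lines; the file names bzImage and vmlinux and the helper names are placeholders.

```python
import shlex

def qemu_argv(bzimage, cmdline):
    """Boot a kernel image under QEMU with the CPU halted for gdb."""
    return [
        "qemu-system-x86_64",
        "-kernel", bzimage,      # placeholder path to the compressed image
        "-append", cmdline,      # kernel command line, as in the GRUB entry
        "-nographic",            # serial console on this terminal
        "-s",                    # gdb stub on tcp::1234
        "-S",                    # freeze the CPU until gdb continues
    ]

def gdb_argv(vmlinux, port=1234):
    """Attach gdb to the stub and stop at the early exception handler."""
    return [
        "gdb", vmlinux,          # uncompressed image with symbols
        "-ex", f"target remote :{port}",
        "-ex", "hbreak early_idt_handler",   # symbol seen in the backtrace
        "-ex", "continue",
    ]

if __name__ == "__main__":
    cmdline = "console=ttyS0,115200 earlyprintk=ttyS0,115200"
    print(shlex.join(qemu_argv("bzImage", cmdline)))
    print(shlex.join(gdb_argv("vmlinux")))
```

Run the first command in one terminal and the second in another. A hardware breakpoint (hbreak) is usually suggested this early, since a software breakpoint written into memory can be overwritten when the image is decompressed.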

Related

gdb symbols loaded but no symbols shown for seg fault [duplicate]

This question already has answers here:
Debugging core files generated on a Customer's box
So I have my core dump after setting the ulimit: (ulimit -c unlimited)
The core dump comes from another system that is experiencing some issues.
I have copied the core over to my dev system to examine it.
I go into gdb:
$ gdb -c core
...
Core was generated by `./ovcc'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fedd95678a9 in ?? ()
[Current thread is 1 (LWP 15155)]
(gdb) symbol-file ./ovcc
Reading symbols from ./ovcc...
(gdb) bt
#0 0x00007fedd95678a9 in ?? ()
#1 0x0000000000000002 in ?? ()
#2 0x000055e01cd5e7e0 in ?? ()
#3 0x00007fedd21e9e00 in ?? ()
#4 0x0000000000000201 in ?? ()
#5 0x000055e01cd5e7e0 in ?? ()
#6 0x0000000000000201 in ?? ()
#7 0x0000000000000000 in ?? ()
(gdb)
I check the compile and link commands and they both have "-g" and I can visually step through the program with the codium debugger!
So why can't I see where the executable is crashing?
What have I missed?
Is the problem the fact that the core was created on another system?
"Is the problem the fact that the core was created on another system?"
Yes, exactly.
See this answer for possible solutions.
Update:
"So does this mean I can only debug the program on the system where it is both built and crashes?"
It is certainly not true that you can only debug a core on the system where the binary was both built and crashed -- I debug core dumps from different systems every day, and in my case the build host, the host where the program crashed, and the host on which I debug are all separate.
One thing I just noticed: your style of loading the core: gdb -c core followed by symbol-file, doesn't work for PIE executables (at least when using GDB 10.0) -- this may be a bug in GDB.
The "regular" way of loading the core is:
gdb ./ovcc core
See if that gives you better results. (You still need to arrange for matching DSOs, as the linked answer shows.)
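For the "matching DSOs" part, one approach is to copy the shared libraries from the crashing machine into a local directory and point gdb at it with set sysroot before the core is loaded. A minimal sketch, assuming such a copy exists; the ./crash-root directory and the helper name are made up for illustration.

```python
import shlex

def core_gdb_argv(exe, core, sysroot=None):
    """gdb command line that prints a backtrace from a core file.

    Paths are assumed to contain no spaces.
    """
    argv = ["gdb", "-batch"]
    if sysroot is not None:
        # Must be set before the core is loaded so that shared libraries
        # are looked up under the copied tree, not on the local system.
        argv += ["-ex", f"set sysroot {sysroot}"]
    argv += [
        "-ex", f"file {exe}",
        "-ex", f"core-file {core}",
        "-ex", "bt",
    ]
    return argv

if __name__ == "__main__":
    print(shlex.join(core_gdb_argv("./ovcc", "core", sysroot="./crash-root")))
```

The commands are passed with -ex in a fixed order, so the sysroot setting takes effect before gdb starts resolving libraries for the core.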

Pin process crashes before opening a socket for gdb

When I run Intel pin with my custom pin-tool, it crashes with a segfault before even starting the application under test. It happens for one application, even though the same setup works for another application.
Here is an example of a successful run:
$ unset HOME && TEST_FILE=test000001.test pin -appdebug -t /home/necto/pin-trace.so -- ./executable1 <args to the executable>
Application stopped until continued from debugger.
Start GDB, then issue this command at the (gdb) prompt:
target remote :42312
(unset HOME is there for the application's sake.)
Here is an example of an unsuccessful run:
$ unset HOME && TEST_FILE=test000001.test pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
C: Tool (or Pin) caused signal 11 at PC 0x000000000
Segmentation fault
Note that it does not even open a socket for gdb to attach to.
When running it directly under gdb, it seems to fail differently (on SIGUSR1):
$ unset HOME && TEST_FILE=test000001.test gdb --args pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from pin...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/necto/pin/pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
process 185838 is executing new program: /home/necto/pin/intel64/bin/pinbin
Program received signal SIGUSR1, User defined signal 1.
0x00007ffff7edba1b in OS_BARESYSCALL_DoCallAsmIntel64Linux () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
(gdb) bt
#0 0x00007ffff7edba1b in OS_BARESYSCALL_DoCallAsmIntel64Linux () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#1 0x00007fffffffd3d0 in ?? ()
#2 0x00007ffff7edbb53 in OS_SyscallDo () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#3 0x00007ffff7eda4a3 in OS_SendSignalToThread () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#4 0x00007ffff7ed8f8a in OS_RaiseException () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#5 0x00007ffff7e87dad in raise () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#6 0x00005555558e747e in ?? ()
#7 0x00005555558e757e in LEVEL_INJECTOR::DoSystemChecks() ()
#8 0x00005555558db0ae in LEVEL_INJECTOR::UNIX_INJECTOR::Run() ()
#9 0x00005555558e0695 in LEVEL_INJECTOR::PIN_UNIX_ENVIRONMENT::LaunchPin() ()
#10 0x00005555558c8be5 in LEVEL_INJECTOR::PIN_ENVIRONMENT::Main() ()
#11 0x0000555555657cf9 in main ()
(gdb)
The backtrace looks like nothing familiar. How can I find out the cause of this segfault?
Edit
As suggested by @Employed Russian, I let gdb pass SIGUSR1 to pin, which helped it advance, but not far:
(gdb) handle SIGUSR1 nostop noprint pass
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        No      Yes             User defined signal 1
(gdb) r
Starting program: /home/necto/pin/pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
process 186041 is executing new program: /home/necto/pin/intel64/bin/pinbin
E: Attach to pid 186041 failed.
E: The Operating System configuration prevents Pin from using the default (parent) injection mode.
E: To resolve this, either execute the following (as root):
E: $ echo 0 > /proc/sys/kernel/yama/ptrace_scope
E: Or use the "-injection child" option.
E: For more information, regarding child injection, see Injection section in the Pin User Manual.
E:
Edit2
The problem is in my pin tool. My pin tool pin-trace.so calls a function from the user code (from the application). This function fails an assertion in executable2, which becomes an exception in pin and, being unhandled, turns into a segfault.
"When running it directly under gdb, it seems to fail differently (on SIGUSR1)"
It looks like pin is trying to use SIGUSR1 internally. If you ask GDB to ignore this signal with handle SIGUSR1 nostop noprint pass, your GDB session will likely proceed further, hopefully to the crash on the NULL pointer dereference.
In case it helps, this:
C: Tool (or Pin) caused signal 11 at PC 0x000000000
means that your pintool (or pin itself) called a NULL function pointer.

gdb: how to get a complete backtrace when fglrx_dri.so segfaults?

I'm experiencing segmentation faults in the fglrx dri library when running my own Qt based OpenGL application. The backtrace I get from gdb (with dbg symbols installed for Qt and my own application):
Thread 1 (Thread 0xb7fd9720 (LWP 1809)):
#0 0x06276705 in ?? () from /usr/lib/fglrx/dri/fglrx_dri.so
#1 0x000020dc in ?? ()
#2 0x000020d9 in ?? ()
#3 0x00000000 in ?? ()
I cannot see where in my code I call the fglrx function that causes the segmentation fault. How can I extend this backtrace to see it completely, from the main() function down to the fglrx dri library?
edit: To confirm my own application is built with debug symbols:
Reading symbols from /home/user/fglrx crash/crashtest-build-desktop-Qt_4_8_1__Qt-4_8_1__Debug/crashtest...done.
(gdb) br main
Breakpoint 1 at 0x804996d: file ../program/main.cpp, line 21.
(gdb) run
Starting program: /home/user/fglrx crash/crashtest-build-desktop-Qt_4_8_1__Qt-4_8_1__Debug/crashtest
[Thread debugging using libthread_db enabled]
Breakpoint 1, main (argc=1, argv=0xbffff2a4) at ../program/main.cpp:21
21 QApplication a(argc, argv);
(gdb) bt
#0 main (argc=1, argv=0xbffff2a4) at ../program/main.cpp:21
(gdb) n
[New Thread 0xb7d2bb70 (LWP 2475)]
[New Thread 0xb752ab70 (LWP 2476)]
22 QMainWindow w;
(gdb) bt
#0 main (argc=1, argv=0xbffff2a4) at ../program/main.cpp:22
(gdb) s
QFlags<Qt::WindowType>::QFlags (this=0xbffff164) at /usr/local/Trolltech/Qt-4.8.1/include/QtCore/qglobal.h:2284
2284 Q_DECL_CONSTEXPR inline QFlags(Zero = 0) : i(0) {}
(gdb) bt
#0 QFlags<Qt::WindowType>::QFlags (this=0xbffff164) at /usr/local/Trolltech/Qt-4.8.1/include/QtCore/qglobal.h:2284
#1 0x080499a4 in main (argc=1, argv=0xbffff2a4) at ../program/main.cpp:22
You have to generate debug symbols for your own binary as well. Compile your application with GCC's -g option. It's also advisable to turn off optimization for the time of debugging; use GCC's -O0 flag for this purpose.
The simple and awful answer is you can't. According to Graham Sellers of AMD, the driver is compiled with the -fomit-frame-pointer flag, which confuses gdb if it's deeper down the stack.

How to debug a Mozilla-based binary application?

Komodo Edit crashed on my system, and I tried to debug it. I added the '-g' option inside the komodo script,
and I got:
[New Thread 0xa80c2b70 (LWP 5102)]
[New Thread 0xa78c1b70 (LWP 5107)]
Program received signal SIGSEGV, Segmentation fault.
0xa97e1f10 in ?? () from /usr/lib/librsvg-2.so.2
(gdb) bt
#0 0xa97e1f10 in ?? () from /usr/lib/librsvg-2.so.2
#1 0x00000000 in ?? ()
(gdb) c
Continuing.
Operation not permitted
Is there any way to find out the real problem here?
I would like to know where that last string, 'Operation not permitted', comes from, but how?
Many thanks!
"added '-g' option inside komodo script,"
When you say this, do you mean that you passed -g as a command-line argument?
If so, that won't work. -g (or -ggdb) needs to be passed to gcc, during compilation of Komodo Edit, so that debugging symbols are included in the output.
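Since the crashing frame is in /usr/lib/librsvg-2.so.2, it can also help to check whether that library carries debug information at all. A rough sketch that looks for a .debug_info section in the output of readelf -S; the helper names are illustrative, and debug info that lives in a separate file or package would not be detected by this check.

```python
import subprocess

# Section names that indicate DWARF debug info (plain and old compressed form).
DEBUG_SECTIONS = {".debug_info", ".zdebug_info"}

def has_debug_info(section_listing):
    """True if a `readelf -S` listing names a DWARF debug-info section."""
    return any(
        token in DEBUG_SECTIONS
        for line in section_listing.splitlines()
        for token in line.split()
    )

def list_sections(path):
    """Return the section header table of an ELF file as text."""
    result = subprocess.run(
        ["readelf", "-S", "-W", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Example (needs binutils installed):
#   has_debug_info(list_sections("/usr/lib/librsvg-2.so.2"))
```

If the check comes back false, gdb has nothing to map the address to, which would be consistent with the ?? frames above.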

gdb backtrace and pthread_cond_wait()

This is on a Redhat EL5 machine w/ a 2.6.18-164.2.1.el5 x86_64 kernel using gcc 4.1.2 and gdb 7.0.
When I run my application with gdb and break in while it's running, several of my threads show the following call stack when I do a backtrace:
#0 0x000000000051d7da in pthread_cond_wait ()
#1 0x0000000100000000 in ?? ()
#2 0x0000000000c1c3b0 in ?? ()
#3 0x0000000000c1c448 in ?? ()
#4 0x00000000000007dd in ?? ()
#5 0x000000000051d630 in ?? ()
#6 0x00007fffffffdc90 in ?? ()
#7 0x000000003b1ae84b in ?? ()
#8 0x00007fffffffdd50 in ?? ()
#9 0x0000000000000000 in ?? ()
Is this a symptom of a common problem?
Is there a known issue with viewing the call stack while waiting on a condition?
The problem is that pthread_cond_wait is written in hand-coded assembly and apparently doesn't have a proper unwind descriptor (required on x86_64 to unwind the stack) in your build of glibc. This problem may have recently been fixed here.
You can try to build and install the latest glibc (note: if you screw up installation, your machine will likely become unbootable; approach with extreme caution!), or just live with "bogus" stack traces from pthread_cond_wait.
Generally, synchronization is required when multiple threads share a single resource.
In such a case, when you interrupt the program, you'll see only 1 thread is running (i.e., accessing the resource) and other threads are waiting within pthread_cond_wait().
So I don't think pthread_cond_wait() itself is problematic.
If your program hangs with deadlock or performance doesn't scale, it might be caused by pthread_cond_wait().
That looks like a corrupt stack trace to me
for example:
#9 0x0000000000000000 in ?? ()
There shouldn't be code at NULL
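One way to make "looks corrupt" more concrete is to count how many frames gdb could not attribute to a symbol. A small sketch that parses bt output of the shape shown in the question; it is only a heuristic, and the function name is made up.

```python
import re

# Matches lines such as "#1 0x0000000100000000 in ?? ()" or "#0 main (argc=1)".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+) in )?(\S+) \(")

def summarize_backtrace(text):
    """Return (frame_count, unresolved_count, first_unresolved_index)."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(3)))
    unresolved = [number for number, func in frames if func == "??"]
    first = unresolved[0] if unresolved else None
    return len(frames), len(unresolved), first

sample = """\
#0 0x000000000051d7da in pthread_cond_wait ()
#1 0x0000000100000000 in ?? ()
#2 0x0000000000c1c3b0 in ?? ()
"""
print(summarize_backtrace(sample))  # (3, 2, 1)
```

Frames after the first unresolved one are the ones to distrust, since each depends on the unwinder having decoded the frame before it correctly.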

Resources