Pin process crashes before opening a socket for gdb - linux

When I run Intel Pin with my custom pintool, it crashes with a segfault before even starting the application under test. It happens for one application, even though the same setup works for another application.
Here is an example of a successful run:
$ unset HOME && TEST_FILE=test000001.test pin -appdebug -t /home/necto/pin-trace.so -- ./executable1 <args to the executable>
Application stopped until continued from debugger.
Start GDB, then issue this command at the (gdb) prompt:
target remote :42312
(unset HOME is there for the application's sake.)
Here is an example of an unsuccessful run:
$ unset HOME && TEST_FILE=test000001.test pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
C: Tool (or Pin) caused signal 11 at PC 0x000000000
Segmentation fault
Note that it does not even open a socket for gdb to attach to.
When running it directly under gdb, it seems to fail differently (on SIGUSR1):
$ unset HOME && TEST_FILE=test000001.test gdb --args pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from pin...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/necto/pin/pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
process 185838 is executing new program: /home/necto/pin/intel64/bin/pinbin
Program received signal SIGUSR1, User defined signal 1.
0x00007ffff7edba1b in OS_BARESYSCALL_DoCallAsmIntel64Linux () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
(gdb) bt
#0 0x00007ffff7edba1b in OS_BARESYSCALL_DoCallAsmIntel64Linux () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#1 0x00007fffffffd3d0 in ?? ()
#2 0x00007ffff7edbb53 in OS_SyscallDo () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#3 0x00007ffff7eda4a3 in OS_SendSignalToThread () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#4 0x00007ffff7ed8f8a in OS_RaiseException () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#5 0x00007ffff7e87dad in raise () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#6 0x00005555558e747e in ?? ()
#7 0x00005555558e757e in LEVEL_INJECTOR::DoSystemChecks() ()
#8 0x00005555558db0ae in LEVEL_INJECTOR::UNIX_INJECTOR::Run() ()
#9 0x00005555558e0695 in LEVEL_INJECTOR::PIN_UNIX_ENVIRONMENT::LaunchPin() ()
#10 0x00005555558c8be5 in LEVEL_INJECTOR::PIN_ENVIRONMENT::Main() ()
#11 0x0000555555657cf9 in main ()
(gdb)
The backtrace does not look familiar to me. How can I find out the cause of this segfault?
Edit
As suggested by @Employed Russian, I let gdb pass SIGUSR1 to pin, which helped it advance, but not by much:
(gdb) handle SIGUSR1 nostop noprint pass
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        No      Yes             User defined signal 1
(gdb) r
Starting program: /home/necto/pin/pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
process 186041 is executing new program: /home/necto/pin/intel64/bin/pinbin
E: Attach to pid 186041 failed.
E: The Operating System configuration prevents Pin from using the default (parent) injection mode.
E: To resolve this, either execute the following (as root):
E: $ echo 0 > /proc/sys/kernel/yama/ptrace_scope
E: Or use the "-injection child" option.
E: For more information, regarding child injection, see Injection section in the Pin User Manual.
E:
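For reference, the ptrace_scope file named in the error message holds a small integer set by the Yama security module. The following is a minimal sketch for reading and interpreting it; the function names are hypothetical, and the one-line descriptions paraphrase the kernel's Yama documentation rather than quote it.

```python
from pathlib import Path

# Location taken from the Pin error message above.
PTRACE_SCOPE_FILE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Short paraphrases of the documented Yama modes (not verbatim kernel text).
_MEANINGS = {
    0: "classic: a process may attach to other processes running under the same uid",
    1: "restricted: a process may only attach to its own descendants",
    2: "admin-only: only processes with CAP_SYS_PTRACE may attach",
    3: "no attach: attaching with ptrace is disabled entirely",
}


def describe_ptrace_scope(value):
    """Return a human-readable description of a ptrace_scope value."""
    return _MEANINGS.get(value, f"unknown value {value}")


def read_ptrace_scope(path=PTRACE_SCOPE_FILE):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_ptrace_scope()
    if current is None:
        print("ptrace_scope is not readable (Yama may not be enabled)")
    else:
        print(f"ptrace_scope = {current}: {describe_ptrace_scope(current)}")
```

A non-zero value would be consistent with the message above, which suggests either writing 0 to that file as root or using the "-injection child" option.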
Edit2
The problem is in my pintool. pin-trace.so calls a function from the user code (from the application). In executable2 this function fails an assertion, which becomes an exception in pin and, being unhandled, turns into a segfault.

When running it directly under gdb, it seems to fail differently (on SIGUSR1)
It looks like pin is trying to use SIGUSR1 internally. If you tell GDB to pass this signal through without stopping, using handle SIGUSR1 nostop noprint pass, your GDB session will likely proceed further, hopefully to the crash on the NULL pointer dereference.
In case it helps, this:
C: Tool (or Pin) caused signal 11 at PC 0x000000000
means that your pintool (or pin itself) called a NULL function pointer.
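The handle command can also be supplied when gdb is started, using gdb's -ex option, so it does not have to be typed in every session. The sketch below only builds the command line; the helper name is hypothetical, and the pin arguments are copied from the question.

```python
import shlex


def gdb_argv_passing_signal(program_argv, signal_name="SIGUSR1", auto_run=False):
    """Build a gdb command line that passes `signal_name` to the program silently.

    program_argv is the debugged program followed by its own arguments.
    """
    if not program_argv:
        raise ValueError("program_argv must contain at least the program to debug")
    # -ex executes one gdb command after startup.
    argv = ["gdb", "-ex", f"handle {signal_name} nostop noprint pass"]
    if auto_run:
        argv += ["-ex", "run"]
    # Everything after --args belongs to the debugged program.
    argv += ["--args", *program_argv]
    return argv


if __name__ == "__main__":
    # Tool path and executable name are the ones used in the question.
    cmd = gdb_argv_passing_signal(
        ["pin", "-appdebug", "-t", "/home/necto/pin-trace.so", "--", "./executable2"]
    )
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run to launch the session directly.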

Related

gdb symbols loaded but no symbols shown for seg fault [duplicate]

This question already has answers here:
Debugging core files generated on a Customer's box
So I have my core dump after setting the ulimit: (ulimit -c unlimited)
The core dump comes from another system that is experiencing some issues.
I have copied the core over to my dev system to examine it.
I go into gdb:
$ gdb -c core
...
Core was generated by `./ovcc'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fedd95678a9 in ?? ()
[Current thread is 1 (LWP 15155)]
(gdb) symbol-file ./ovcc
Reading symbols from ./ovcc...
(gdb) bt
#0 0x00007fedd95678a9 in ?? ()
#1 0x0000000000000002 in ?? ()
#2 0x000055e01cd5e7e0 in ?? ()
#3 0x00007fedd21e9e00 in ?? ()
#4 0x0000000000000201 in ?? ()
#5 0x000055e01cd5e7e0 in ?? ()
#6 0x0000000000000201 in ?? ()
#7 0x0000000000000000 in ?? ()
(gdb)
I check the compile and link commands and they both have "-g" and I can visually step through the program with the codium debugger!
So why can't I see where the executable is crashing?
What have I missed?
Is the problem the fact that the core was created on another system?
Is the problem the fact that the core was created on another system?
Yes, exactly.
See this answer for possible solutions.
Update:
So does this mean I can only debug the program on the system where it is both built and crashes?
It is certainly not true that you can only debug a core on the system where the binary was both built and crashed -- I debug core dumps from different systems every day, and in my case the build host, the host where the program crashed, and the host on which I debug are all separate.
One thing I just noticed: your style of loading the core: gdb -c core followed by symbol-file, doesn't work for PIE executables (at least when using GDB 10.0) -- this may be a bug in GDB.
The "regular" way of loading the core is:
gdb ./ovcc core
See if that gives you better results. (You still need to arrange for matching DSOs, as linked answer shows how to do.)
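As a sketch of the "regular" invocation combined with matching DSOs: assuming the shared libraries from the crashing system have been copied into a local directory (the path in the usage comment is hypothetical), gdb can be pointed at them with set sysroot. The helper below only builds the command line, and it uses -iex so that the setting is applied before the executable and core are loaded.

```python
def gdb_core_argv(executable, core, sysroot=None):
    """Build a gdb command line that loads `executable` together with `core`.

    sysroot, if given, is a directory holding copies of the shared
    libraries from the machine where the core was produced.
    """
    argv = ["gdb"]
    if sysroot is not None:
        # -iex runs a command before gdb loads the executable and core,
        # so shared libraries are looked up under the chosen sysroot.
        argv += ["-iex", f"set sysroot {sysroot}"]
    argv += [executable, core]
    return argv


# Usage (the sysroot directory is an assumption for illustration):
#   gdb_core_argv("./ovcc", "core", sysroot="/tmp/crash-host-root")
```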

GDB hangs during remote debugging, library version mismatches

I'm using linux and am trying to remote debug a program.
I launch gdbserver on the target, from .xinitrc with
gdbserver localhost:9134 /root/game/game
On my local pc, which I'm running inside a virtualbox vm, I connect to the target from gdb with
target remote 192.168.1.20:9134
and it connects fine.
I can set a breakpoint at main with
b main
and then I can continue and it will break there. I can single-step for a while until it reaches the call to SDL_Init(), from which control never returns to gdb.
If I don't single step to SDL_Init but instead set a breakpoint further on in the program, the program will start up and run normally (so it gets past SDL_Init). But when it reaches the breakpoint, it freezes up on the target machine and gdb on my local machine never shows a prompt. The entire thing hangs and must be restarted. It's not completely frozen, however, as the mouse pointer still moves on the target and you can ping it, but the gdb connection no longer works. So it seems that the graphics system somehow interferes with this, since breakpoints before the graphics system init do work, but not after.
I've tried setting the remotetimeout setting to 500 seconds and it exhibits the same behaviour. When I ping the remote target from my local pc the reported time is around 0.3 to 0.4 ms. So that doesn't seem out of the ordinary, but I wouldn't rule out any other misconfigured network settings on my part.
It's on a legacy system (but hey, it still makes money) with gdbserver version 6.8-19.fc10 and gdb version 6.8-29.fc10. Upgrading versions, while a very large headache, could be possible but probably should not be necessary (any upgrades I make to my pc have to also be done to a state regulator's system, as they use the gdb setup for their testing purposes, but it's not impossible). Remote debugging was working in the past before I took over the project, and no one who worked on it before is still around to ask. The gdbserver version definitely worked, as I'm using the exact program used previously.
Update 1:
I updated the gdb version on the host machine to version 7.0.1 and it still exhibits the same behavior. I couldn't do version 8 as it needs a C++11 compiler and the legacy system is before that time.
Update 2:
I've tried this in another virtual machine and I even built a fresh dedicated linux install (so no vm), rebuilt the software, and I get the same behavior. So it appears the issue is probably on the target machine's configuration.
Update 3:
I dug out a serial cable and finally got remote debugging set up via serial. It still doesn't work, but it gives me more error messages. I get the error
gdbserver: error initializing thread_db library: version mismatch between libthread_db and libpthread
which I think makes sense since my breakpoints quit working after the graphics system is initialized which involves creating some threads. After googling that error, I've tried using set solib-absolute-prefix, set solib-search-path, and set sysroot to the root directory on the host machine of a copy of the filesystem on the target machine (on the host, that is /fw_dev/fgs/cf/initrd/expand, which contains the filesystem that the initrd is made from)
But then when I try to set breakpoints, I get Error accessing memory address 0xb5eb60: Input/output error. I've also tried setting those variables to the lib subdirectory, which doesn't work either. I also tried just copying the local thread libraries from the host's /lib directory to the /lib on the target, but then x windows won't even start.
Update #4:
I tried launching gdb from the root of the copy of the target filesystem on the host (/fw_dev/fgs/cf/initrd/expand), and gdb still hangs on breakpoints but I no longer get the error message about libthread_db and libpthread mismatches, so back to the drawing board.
Update #5
Maybe I'm getting to the point where I should ask this as a separate question, but I compiled gdb, then ran gdb on itself. I then used file to point it at the program on the host, set the remote target, set my breakpoints, and ran continue. When I get to the breakpoint, gdb hangs as always. But now when I press ctrl-c in gdb, I get this backtrace:
#0 0x00110416 in __kernel_vsyscall ()
#1 0x00b3f39d in ___newselect_nocancel () from /lib/libc.so.6
#2 0x08203b9a in ser_base_wait_for (scb=0x96a2930, timeout=1) at ser-base.c:206
#3 0x08203c89 in do_ser_base_readchar (scb=0x96a2930, timeout=-1) at ser-base.c:256
#4 0x08204046 in generic_readchar (scb=0x96a2930, timeout=-1, do_readchar=0x8203c60 <do_ser_base_readchar>) at ser-base.c:326
#5 0x082040b0 in ser_base_readchar (scb=0x96a2930, timeout=-1) at ser-base.c:391
#6 0x081ecac2 in serial_readchar (scb=0x96a2930, timeout=-1) at serial.c:376
#7 0x080c4357 in readchar (timeout=<value optimized out>) at remote.c:5922
#8 0x080c5e35 in getpkt_or_notif_sane_1 (buf=0x839f140, sizeof_buf=0x839f144, forever=1, expecting_notif=0) at remote.c:6440
#9 0x080d1e0a in getpkt_sane (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:6534
#10 remote_wait_as (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:4736
#11 remote_wait (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:4843
#12 0x08184d4b in target_wait (ptid=..., status=0xbffff388, options=0) at target.c:2098
#13 0x0815daf2 in wait_for_inferior (treat_exec_as_sigtrap=0) at infrun.c:2028
#14 0x0815ddd4 in proceed (addr=4294967295, siggnal=TARGET_SIGNAL_DEFAULT, step=0) at infrun.c:1652
#15 0x08153729 in continue_1 (all_threads=0) at infcmd.c:668
#16 0x08153ea2 in continue_command (args=0x0, from_tty=0) at infcmd.c:760
#17 0x0808e9e8 in execute_command (p=0x83b89a1 "", from_tty=0) at top.c:453
#18 0x0816b028 in command_handler (command=0x83b89a0 "c") at event-top.c:511
#19 0x0816bd5a in command_line_handler (rl=0x8ce83e8 "\340&\266\b\340\230\321\b") at event-top.c:736
#20 0x0822d5a5 in rl_callback_read_char () at callback.c:205
#21 0x0816b17b in rl_callback_read_char_wrapper (client_data=0x0) at event-top.c:178
#22 0x0816ac54 in handle_file_event (data=...) at event-loop.c:812
#23 0x08169e6b in process_event () at event-loop.c:394
#24 0x0816aba4 in gdb_do_one_event (data=0x0) at event-loop.c:459
#25 0x0816500b in catch_errors (func=0x816a950 <gdb_do_one_event>, func_args=0x0, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#26 0x080f072a in tui_command_loop (data=0x0) at ./tui/tui-interp.c:153
#27 0x08165684 in current_interp_command_loop () at interps.c:291
#28 0x0808653b in captured_command_loop (data=0x0) at ./main.c:226
#29 0x0816500b in catch_errors (func=0x8086530 <captured_command_loop>, func_args=0x0, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#30 0x08085ecc in captured_main (data=0xbffff7a4) at ./main.c:902
#31 0x0816500b in catch_errors (func=0x80853d0 <captured_main>, func_args=0xbffff7a4, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#32 0x080851d1 in gdb_main (args=0xbffff7a4) at ./main.c:911
#33 0x08085195 in main (argc=128, argv=0x0) at gdb.c:33
So it seems gdb is hanging inside __kernel_vsyscall(). Doing a diff of the libc.so.6 in the /lib directory on the host and the libc.so.6 on the target reveals differences. I've tried using LD_PRELOAD and LD_LIBRARY_PATH but that backtrace always shows /lib/libc.so.6 instead of pointing to the copy that the target has. Maybe I'm not setting them correctly, but I've tried setting them in gdb with set env and also setting them on the command line and exporting them, but to no effect. I also tried putting the libc from the host computer onto the target machine, and it won't even boot; it gets a segfault in libc.
So how do I get gdb to load different libraries?
Update #6:
So I made a bootable usb key using the target system's disk image as the base. I made minimal changes to it to get it to run on a standard PC, and added gdb and gdb's requisite libraries to it. So now libc is the same on both host and target machines, and it still hangs on me.
Final. While I know gdb 6.8 worked in the past, I can't figure out the configuration. After upgrading both gdb and gdbserver to 7.12 it worked.
Upgrading versions, while a very large headache, could be possible but probably should not be necessary...
This is the right option. All of the other issues you are encountering are because of this.
I've tried this in another virtual machine and I even built a fresh dedicated linux install (so no vm), rebuilt the software, and I get the same behavior. So it appears the issue is probably on the target machine's configuration.
You should build on the same version, architecture, etc. as the system you are attempting to deploy your code to.
But then when I try to set breakpoints, I get Error accessing memory address 0xb5eb60: Input/output error.
Per this answer,
Can be caused by 32/64 bit mixups. Check, for example, that you didn't attach to a 32-bit binary with a 64-bit process' ID, or vice versa.
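One way to rule out such a mixup is to compare the ELF class of the binaries involved (for example the gdb on the host, the gdbserver on the target, and the program being debugged). This is a minimal sketch with a hypothetical helper name; it reads only the fifth byte of the ELF identification header (EI_CLASS), which is 1 for 32-bit and 2 for 64-bit objects.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if the file is not a valid ELF."""
    with open(path, "rb") as f:
        ident = f.read(5)
    # An ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: 32, 2: 64}.get(ident[4])


# Usage, with the game path from the question:
#   elf_class("/root/game/game")
```

If the values differ between the debugger and the debugged program, that alone does not prove a problem (a suitably built gdb can debug either), but it is a cheap first check.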
I also tried putting the libc from the host computer onto the target machine, and it won't even boot, it gets a segfault in libc.
Don't do that. As you've found out, it won't work.
So how do I get gdb to load different libraries?
Per this question, you can use LD_LIBRARY_PATH.
Here are some interesting suggestions. Have you tried attaching strace to gdbserver to see what kind of activity is going on during the hang? As others say, it could be a good way to go one step further in figuring out the problem.
You can do that with the following command on the target machine:
strace -p `pidof gdbserver`
Also sending a CONT signal to gdbserver may help when it hangs:
kill -CONT `pidof gdbserver`
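If pidof is not available on a stripped-down target, the same lookup can be sketched in Python, assuming an interpreter exists there. The helper names below are hypothetical; the code matches process names against /proc/<pid>/comm and sends SIGCONT, which mirrors the kill -CONT command above.

```python
import os
import signal
from pathlib import Path


def pids_by_comm(name, proc_root="/proc"):
    """Return the sorted pids whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited or is not readable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


def send_cont(pids):
    """Send SIGCONT to each pid, like `kill -CONT <pid>`."""
    for pid in pids:
        os.kill(pid, signal.SIGCONT)
```

For example, send_cont(pids_by_comm("gdbserver")) would resume every process named gdbserver. Note that the kernel truncates the name stored in comm, so very long program names need to be matched by their truncated form.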

Kernel on x86_64 does not boot after upgrading Binutils and GCC

I'm not able to see ANY logs on the console (not even the "Decompressing Linux... " message).
I enabled every early boot print option I know of in the kernel config (see the kernel configuration below).
I tried stopping the kernel with KDB by adding kgdbwait at the end of the kernel command line arguments in GRUB.
I tried to boot the kernel manually from GRUB.
I tried to add panic() or logs in the function:
asmlinkage void __init start_kernel(void) (init/main.c)
I added / changed / removed the loglevel=verbose kernel command line argument (in GRUB).
I removed "quiet" from the kernel command line arguments (in GRUB).
I added crashkernel to the kernel command line arguments (in GRUB).
My Questions:
1) Is there anything else I can do to get early boot logs?
2) Is there any additional code I need to add to the kernel to debug it?
3) Is there any other methodology for debugging it?
Binutils:
2.26.1
binutils-2.26.1-1.fc25.src.rpm
GCC:
6.4.1
gcc-6.4.1-1.fc25.src.rpm
Kernel:
3.10.0-514.21.1
GRUB configuration
default=0
timeout=3
serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1
terminal --timeout=34 serial console
title XIV-System
root (hd0,0)
kernel /boot/vmlinuz init=/system/my_init console=tty0 mce=0 i8042.noaux idle=poll scsi_mod.inq_timeout=2 selinux=0 nohpet console=ttyS0,115200 earlyprintk=ttyS0,115200 kgdboc=ttyS0,115200 ro crashkernel=auto
Kernel configuration
- For verbose boot messages
CONFIG_X86_VERBOSE_BOOTUP=y
- For early printk
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
- For KGDB
CONFIG_KGDB_LOW_LEVEL_TRAP=y
CONFIG_KGDB_KDB=y
CONFIG_SERIAL_KGDB_NMI=y
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_TESTS=y
- For KDB
CONFIG_KDB_KEYBOARD=y
CONFIG_KDB_CONTINUE_CATASTROPHIC=0
I managed to solve the problem only after debugging the kernel with gdb (at the early boot stage). Basically, I found out that the kernel fails at:
(gdb) bt
#0 early_idt_handler () at arch/x86/kernel/head_64.S:407
#1 0xffff9fffffffffff in ?? ()
#2 0xffffff07ffffffff in ?? ()
#3 0xffffe0ffffffffff in ?? ()
#4 0x00000e0000000000 in ?? ()
#5 0xffffffff81bee8a0 in ?? ()
#6 0xffff880000014560 in ?? ()
#7 0xffffffff819c0120 in early_idt_handler () at arch/x86/kernel/head_64.S:374
#8 0x0000000000000400 in irq_stack_union ()
#9 0xffffffff81bee8a0 in ?? ()
#10 0x000000000000000f in irq_stack_union ()
#11 0x0000000000000000 in ?? ()
From there I looked into the kernel source tree for a patch related to head_64.S and applied it. You can find the patch and its related changes at:
commit 5f020130d5360e8266e369dc2b5f4e32ec5b05f4 (HEAD -> my_commit)
Author: Andy Lutomirski <luto@kernel.org>
Date: Fri May 22 16:15:47 2015 -0700

How can I debug this User Space application crash?

I'm running a Qt 5.4.0 application on my embedded Linux system (TI AM335x based), and it stops running; I'm having a hard time debugging this. This is a QtWebKit QML example (youtubeview), but other QtWebKit examples behave the same way for me, so it's something WebKit-related on my system.
When I run the application, it runs for a second or so, then ends with no messages. There is nothing reported to the syslog or dmesg either. When I kick it off with strace I can see this futex message:
futex(0x2d990, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2d9ac, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
+++ exited with 1 +++
Then it stops. Not very helpful... My next thought was to debug this with GDB; however, GDB crashes when I try to run this:
-sh-4.2# gdb youtubeview
GNU gdb (GDB) 7.5
Copyright (C) 2012 Free Software Foundation, Inc.
...
(gdb) run
Starting program: /usr/share/qt5/examples/webkitqml/youtubeview/youtubeview
/home/mike/ulf_qt_450/ulf/build-ulf/out/work/armv7ahf-vfp-neon-linux-gnueabihf/gdb/7.5-r0.0/gdb-7.5/gdb/utils.c:1081: internal-error: virtual memory exhausted: can't allocate 64652911 bytes.
A problem internal to GDB has been detected,
This issue occurs even if I set a breakpoint at main first; as soon as the program starts running, gdb gets stuck and runs out of memory.
Are there other tools or techniques that can be used here to help isolate the issue?
Perhaps arguments to GDB to limit memory use or give some more information about why this Qt program made it crash?
Perhaps some FDs or system variables I could use to figure out why the FUTEX is being held and failing?
I'm not sure where to take this problem right now.
The Qt code itself is pretty simple, and I don't anticipate any issues in it:
#include <QGuiApplication>
#include <QQuickView>

int main(int argc, char* argv[])
{
    QGuiApplication app(argc, argv);
    QQuickView view;
    // Load the example's QML file from the Qt resource system.
    view.setSource(QUrl("qrc:///" QWEBKIT_EXAMPLE_NAME ".qml"));
    view.setResizeMode(QQuickView::SizeRootObjectToView);
    view.show();
    // Enter the event loop; its result is the process exit code.
    return app.exec();
}
Running gdb on the device, especially with a huge library such as WebKit, is bound to produce out-of-memory errors.
Instead, run gdbserver on the device, and connect to it via gdb running on the host machine, using the toolchain's cross-gdb for that. In that scenario, the gdb on the host loads all the debug information, while the gdbserver on the device needs almost no memory.
It is even possible to have the separate debug information available on the host and stripped libraries on the device.
Please note that parts of WebKit are always built in release mode, even if the rest of Qt was built in debug mode. If you are going to debug into WebKit, you might want to change that in the build system.
Here is a minimal example:
Device:
# gdbserver 192.168.1.2:12345 myapp
Process myapp created; pid = 989
Listening on port 12345
Host:
# arm-none-linux-gnueabi-gdb myapp
GNU gdb (Sourcery G++ Lite 2009q1-203) 6.8.50.20081022-cvs
(gdb) set solib-absolute-prefix /opt/targetroot
(gdb) target remote 192.168.1.42:12345
Remote debugging using 192.168.1.42:12345
(gdb) start
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) break main
Breakpoint 1 at 0x1ab9c: file myapp/main.cpp, line 12.
(gdb) cont
Continuing.
Breakpoint 1, main (argc=1, argv=0xbecfedb4) at myapp/main.cpp:12
12 QApplication app(argc, argv, QApplication::GuiServer);
And you are right that it looks like a problem in QtWebKit itself, not in your application. Good luck!
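The host-side steps from the example session can also be kept in a gdb command file and replayed with gdb's -x option. The sketch below generates such a file; the helper names and the file name are hypothetical, while the address, port, and sysroot path are the ones shown in the session above. It uses set sysroot, which gdb treats as the same setting as set solib-absolute-prefix.

```python
def remote_debug_commands(target, port, sysroot=None, breakpoints=("main",)):
    """Return the gdb commands for a basic remote session, in order."""
    commands = []
    if sysroot is not None:
        # Tell gdb where the target's libraries live on the host
        # before connecting, so they are resolved on attach.
        commands.append(f"set sysroot {sysroot}")
    commands.append(f"target remote {target}:{port}")
    commands.extend(f"break {location}" for location in breakpoints)
    # A remote target is already started by gdbserver, so "continue" is used.
    commands.append("continue")
    return commands


def write_command_file(path, commands):
    """Write the commands to `path`, one per line, for use with `gdb -x`."""
    with open(path, "w") as f:
        f.write("\n".join(commands) + "\n")


if __name__ == "__main__":
    cmds = remote_debug_commands("192.168.1.42", 12345, sysroot="/opt/targetroot")
    print("\n".join(cmds))
```

After write_command_file("session.gdb", cmds), the session could be started on the host with the cross-gdb, for example arm-none-linux-gnueabi-gdb -x session.gdb myapp.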

How to debug a Mozilla-based binary application?

Komodo Edit crashed on my system, and I tried to debug it by adding the '-g' option inside the komodo script.
And I got:
[New Thread 0xa80c2b70 (LWP 5102)]
[New Thread 0xa78c1b70 (LWP 5107)]
Program received signal SIGSEGV, Segmentation fault.
0xa97e1f10 in ?? () from /usr/lib/librsvg-2.so.2
(gdb) bt
#0 0xa97e1f10 in ?? () from /usr/lib/librsvg-2.so.2
#1 0x00000000 in ?? ()
(gdb) c
Continuing.
Operation not permitted
Is there any way to find out the real problem here?
I wanted to know where that last string 'Operation not permitted' comes from, but how?
Many thanks!
added '-g' option inside komodo script,
When you say this, do you mean that you passed -g as a command-line argument?
If so, that won't work. -g (or -ggdb) needs to be passed to gcc, during compilation of Komodo Edit, so that debugging symbols are included in the output.
