How can I debug this User Space application crash? - linux

I'm running a Qt5.4.0 application on my embedded Linux system (TI AM335x based) and it's stopping to run and I'm having a hard time debugging this. This is a QtWebKit QML example (youtubeview) but other QtWebKit examples are preforming the same for me so it's something WebKit based on my system.
When I run the application, it runs for a second or so, then ends with no messages. There is nothing reported to the syslog or dmesg either. When I kick it off with strace I can see this futex message:
futex(0x2d990, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2d9ac, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
+++ exited with 1 +++
Then it stops. Not very helpful... My next though was to debug this with GDB, however GDB crashes when I try to run this:
-sh-4.2# gdb youtubeview
GNU gdb (GDB) 7.5
Copyright (C) 2012 Free Software Foundation, Inc.
...
(gdb) run
Starting program: /usr/share/qt5/examples/webkitqml/youtubeview/youtubeview
/home/mike/ulf_qt_450/ulf/build-ulf/out/work/armv7ahf-vfp-neon-linux-gnueabihf/gdb/7.5-r0.0/gdb-7.5/gdb/utils.c:1081: internal-error: virtual memory exhausted: can't allocate 64652911 bytes.
A problem internal to GDB has been detected,
This issue occurs even if I set a break point at main first, just as soon as it starts running it will get stuck and run out of memory.
Are there other tools or techniques that can be used here to help isolate the issue?
Perhaps arguments to GDB to limit memory use or give some more information about why this Qt program made it crash?
Perhaps some FDs or system variables I could use to figure out why the FUTEX is being held and failing?
I'm not sure where to take this problem right now.
The Qt code itself is pretty simple, and I don't anticipate any issues in here:
#include <QGuiApplication>
#include <QQuickView>
int main(int argc, char* argv[])
{
QGuiApplication app(argc,argv);
QQuickView view;
view.setSource(QUrl("qrc:///" QWEBKIT_EXAMPLE_NAME ".qml"));
view.setResizeMode(QQuickView::SizeRootObjectToView);
view.show();
return app.exec();
return 0;
}

Running gdb on the device, especially with a huge library such as WebKit, is bound to get you out of memory errors.
Instead, run gdbserver on the device, and connect to it via gdb running on the host machine, using the toolchain's cross-gdb for that. In that scenario, the gdb on the host loads all the debug information, while the gdbserver on the device needs almost no memory.
It is even possible to have the separate debug information available on the host and stripped libraries on the device.
Please note that parts of WebKit are always built in release mode, even if the rest of Qt was built in debug mode, if you are going to debug into WebKit you might want to change that in the build system.
Here is a minimal example:
Device:
# gdbserver 192.168.1.2:12345 myapp
Process myapp created; pid = 989
Listening on port 12345
Host:
# arm-none-linux-gnueabi-gdb myapp
GNU gdb (Sourcery G++ Lite 2009q1-203) 6.8.50.20081022-cvs
(gdb) set solib-absolute-prefix /opt/targetroot
(gdb) target remote 192.168.1.42:12345
Remote debugging using 192.168.1.42:12345
(gdb) start
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) break main
Breakpoint 1 at 0x1ab9c: file myapp/main.cpp, line 12.
(gdb) cont
Continuing.
Breakpoint 1, main (argc=1, argv=0xbecfedb4) at myapp/main.cpp:12
12 QApplication app(argc, argv, QApplication::GuiServer);
And you are right that it looks like a problem in QtWebKit itself, not in your application. Good luck!

Related

GDB hangs during remote debugging, library version mismatches

I'm using linux and am trying to remote debug a program.
I launch gdbserver on the target, from .xinitrc with
gdbserver localhost:9134 /root/game/game
On my local pc, which I'm running inside a virtualbox vm, I connect to the target from gdb with
target remote 192.168.1.20:9134
and it connects fine.
I can set a breakpoint at main with
b main
and then I can continue and it will break there. I can single step for a ways until it gets to the call SDL_Init(), from which it will never return back to gdb.
If I don't single step to SDL_Init but instead set a breakpoint further on in the program, the program will start up and run normally (so it gets past SDL_Init). But when it reaches the breakpoint, it freezes up on the target machine and gdb on my local machine never shows a prompt. The entire thing hangs and must be restarted. It's not completely frozen, however, as the mouse pointer still moves on the target and you can ping it, but the gdb connection no longer works. So it seems that the graphics systems somehow interferes with this since breakpoints before the graphics system init do work, but not after.
I've tried setting the remotetimeout setting to 500 seconds and it exhibits the same behaviour. When I ping the remote target from my local pc the reported time is around 0.3 to 0.4 ms. So that doesn't seem out of the ordinary, but I wouldn't rule out any other misconfigured network settings on my part.
It's on a legacy system (but hey, it still makes money) with gdbserver version 6.8-19.fc10 and gdb version 6.8-29.fc10. Upgrading versions, while a very large headache, could be possible but probably should not be necessary (any upgrades I make to my pc have to also be done to a state regulator's system, as they use the gdb setup for their testing purposes, but it's not impossible). Remote debugging was working in the past before I took over the project, and no one who worked on it before is still around to ask. The gdbserver version definitely worked, as I'm using the exact program used previously.
Update 1:
I updated the gdb version on the host machine to version 7.0.1 and it still exhibits the same behavior. I couldn't do version 8 as it needs a C++11 compiler and the legacy system is before that time.
Update 2:
I've tried this in another virtual machine and I even built a fresh dedicated linux install (so no vm), rebuilt the software, and I get the same behavior. So it appears the issue is probably on the target machine's configuration.
Update 3:
I dug out a serial cable and finally got the remote debugging setup via serial. It still doesn't work, but it gives me more error messages. I get the error
gdbserver: error initializing thread_db library: version mismatch between libthread_db and libpthread
which I think makes sense since my breakpoints quit working after the graphics system is initialized which involves creating some threads. After googling that error, I've tried using set solib-absolute-prefix, set solib-search-path, and set sysroot to the root directory on the host machine of a copy of the filesystem on the target machine (on the host, that is /fw_dev/fgs/cf/initrd/expand, which contains the filesystem that the initrd is made from)
But then when I try to set breakpoints, I get Error accessing memory address 0xb5eb60: Input/output error. I've also tried setting those variables to the lib subdirectory, which doesn't work either. I also tried just copying the local thread libraries from the host's /lib directory to the /lib on the target, but then x windows won't even start.
Update #4:
I tried launching gdb from the root of the copy of the target filesystem on the host (/fw_dev/fgs/cf/initrd/expand), and gdb still hangs on breakpoints but I no longer get the error message about libthread_db and libpthread mismatches, so back to the drawing board.
Update #5
Maybe I'm getting to where I should ask this a separate question, but I compiled gdb, then ran gbd on itself. Then used file to set it to the program on the host, set the remote target, set my breakpoints and then ran continue. When I get to the breakpoint, gdb hangs as always. But now when I press ctrl-c in gdb, I get this backtrace
#0 0x00110416 in __kernel_vsyscall ()
#1 0x00b3f39d in ___newselect_nocancel () from /lib/libc.so.6
#2 0x08203b9a in ser_base_wait_for (scb=0x96a2930, timeout=1) at ser-base.c:206
#3 0x08203c89 in do_ser_base_readchar (scb=0x96a2930, timeout=-1) at ser-base.c:256
#4 0x08204046 in generic_readchar (scb=0x96a2930, timeout=-1, do_readchar=0x8203c60 <do_ser_base_readchar>) at ser-base.c:326
#5 0x082040b0 in ser_base_readchar (scb=0x96a2930, timeout=-1) at ser-base.c:391
#6 0x081ecac2 in serial_readchar (scb=0x96a2930, timeout=-1) at serial.c:376
#7 0x080c4357 in readchar (timeout=<value optimized out>) at remote.c:5922
#8 0x080c5e35 in getpkt_or_notif_sane_1 (buf=0x839f140, sizeof_buf=0x839f144, forever=1, expecting_notif=0) at remote.c:6440
#9 0x080d1e0a in getpkt_sane (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:6534
#10 remote_wait_as (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:4736
#11 remote_wait (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:4843
#12 0x08184d4b in target_wait (ptid=..., status=0xbffff388, options=0) at target.c:2098
#13 0x0815daf2 in wait_for_inferior (treat_exec_as_sigtrap=0) at infrun.c:2028
#14 0x0815ddd4 in proceed (addr=4294967295, siggnal=TARGET_SIGNAL_DEFAULT, step=0) at infrun.c:1652
#15 0x08153729 in continue_1 (all_threads=0) at infcmd.c:668
#16 0x08153ea2 in continue_command (args=0x0, from_tty=0) at infcmd.c:760
#17 0x0808e9e8 in execute_command (p=0x83b89a1 "", from_tty=0) at top.c:453
#18 0x0816b028 in command_handler (command=0x83b89a0 "c") at event-top.c:511
#19 0x0816bd5a in command_line_handler (rl=0x8ce83e8 "\340&\266\b\340\230\321\b") at event-top.c:736
#20 0x0822d5a5 in rl_callback_read_char () at callback.c:205
#21 0x0816b17b in rl_callback_read_char_wrapper (client_data=0x0) at event-top.c:178
#22 0x0816ac54 in handle_file_event (data=...) at event-loop.c:812
#23 0x08169e6b in process_event () at event-loop.c:394
#24 0x0816aba4 in gdb_do_one_event (data=0x0) at event-loop.c:459
#25 0x0816500b in catch_errors (func=0x816a950 <gdb_do_one_event>, func_args=0x0, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#26 0x080f072a in tui_command_loop (data=0x0) at ./tui/tui-interp.c:153
#27 0x08165684 in current_interp_command_loop () at interps.c:291
#28 0x0808653b in captured_command_loop (data=0x0) at ./main.c:226
#29 0x0816500b in catch_errors (func=0x8086530 <captured_command_loop>, func_args=0x0, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#30 0x08085ecc in captured_main (data=0xbffff7a4) at ./main.c:902
#31 0x0816500b in catch_errors (func=0x80853d0 <captured_main>, func_args=0xbffff7a4, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#32 0x080851d1 in gdb_main (args=0xbffff7a4) at ./main.c:911
#33 0x08085195 in main (argc=128, argv=0x0) at gdb.c:33
So it seems gdb is hanging inside __kernel_vsyscall(). Doing a diff on libc.so.6 on in the /lib directory on the host and the libc.so.6 on the target reveal differences. I've tried using LD_PRELOAD and LD_LIBRARY_PATH but that backtrace always shows /lib/libc.so.6 instead of pointing to the copy that the target has. Maybe I'm not setting them correctly, but I've tried setting them in gdb with set env and also setting them on the command line and exporting them, but to no effect. I also tried putting the libc from the host computer onto the target machine, and it won't even boot, it gets a segfault in libc.
So how do I get gdb to load different libraries?
Update #6:
So I made a bootable usb key using the target system's disk image as the base. I made minimal changes to it to get it to run on a standard PC, and added gdb and gdb's requisite libraries to it. So now, ibc is the same on both host and target machines and it still hangs on me.
Final. While I know gdb 6.8 worked in the past, I can't figure out the configuration. After upgrading both gdb and gdbserver to 7.12 it worked.
Upgrading versions, while a very large headache, could be possible but probably should not be necessary...
This is the right option. All of the other issues you are encountering are because of this.
I've tried this in another virtual machine and I even built a fresh dedicated linux install (so no vm), rebuilt the software, and I get the same behavior. So it appears the issue is probably on the target machine's configuration.
You should build on the same version, architecture, etc. as the system you are attempting to deploy your code to.
But then when I try to set breakpoints, I get Error accessing memory address 0xb5eb60: Input/output error.
Per this answer,
Can be caused by 32/64 bit mixups. Check, for example, that you didn't attach to a 32-bit binary with a 64-bit process' ID, or vice versa.
I also tried putting the libc from the host computer onto the target machine, and it won't even boot, it gets a segfault in libc.
Don't do that. As you've found out, it won't work.
So how do I get gdb to load different libraries?
Per this question, you can use LD_LIBRARY_PATH.
Here are some interesting suggestions. Have you tried to attach gdbserver to strace to see what kind of activity is going on during the hang? As other says - it could be a good way to go one step further into figuring out the problem.
You can do that with following command on target machine:
strace -p `pidof gdbserver`
Also sending a CONT signal to gdbserver may help when it hangs:
kill -CONT `pidof gdbserver`

Pin process crashes before opening a socket for gdb

When I run Intel pin with my custom pin-tool, for some reason it crashes on a segfault, before even starting the application under test. It happens for one application, even though, the same setup works for another application.
Here is an example of a successful run:
$ unset HOME && TEST_FILE=test000001.test pin -appdebug -t /home/necto/pin-trace.so -- ./executable1 <args to the executable>
Application stopped until continued from debugger.
Start GDB, then issue this command at the (gdb) prompt:
target remote :42312
(unset HOME is there for the application's sake) Here is an example of an unsuccessful run:
$ unset HOME && TEST_FILE=test000001.test pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
C: Tool (or Pin) caused signal 11 at PC 0x000000000
Segmentation fault
Note, that it does not even open a socket for gdb to attach to.
When running it directly under gdb, it seems to fail differently (on SIGUSR1):
$ unset HOME && TEST_FILE=test000001.test gdb --args pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from pin...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/necto/pin/pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
process 185838 is executing new program: /home/necto/pin/intel64/bin/pinbin
Program received signal SIGUSR1, User defined signal 1.
0x00007ffff7edba1b in OS_BARESYSCALL_DoCallAsmIntel64Linux () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
(gdb) bt
#0 0x00007ffff7edba1b in OS_BARESYSCALL_DoCallAsmIntel64Linux () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#1 0x00007fffffffd3d0 in ?? ()
#2 0x00007ffff7edbb53 in OS_SyscallDo () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#3 0x00007ffff7eda4a3 in OS_SendSignalToThread () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#4 0x00007ffff7ed8f8a in OS_RaiseException () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#5 0x00007ffff7e87dad in raise () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#6 0x00005555558e747e in ?? ()
#7 0x00005555558e757e in LEVEL_INJECTOR::DoSystemChecks() ()
#8 0x00005555558db0ae in LEVEL_INJECTOR::UNIX_INJECTOR::Run() ()
#9 0x00005555558e0695 in LEVEL_INJECTOR::PIN_UNIX_ENVIRONMENT::LaunchPin() ()
#10 0x00005555558c8be5 in LEVEL_INJECTOR::PIN_ENVIRONMENT::Main() ()
#11 0x0000555555657cf9 in main ()
(gdb)
The backtrace looks like nothing familiar. How can I find out the cause of this segfault?
Edit
As per suggestion of #Employed Russian, I let gdb pass the SIGUSR1 to pin, which helped it advance, but not by far:
(gdb) handle SIGUSR1 nostop noprint pass
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb) r
Starting program: /home/necto/pin/pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
process 186041 is executing new program: /home/necto/pin/intel64/bin/pinbin
E: Attach to pid 186041 failed.
E: The Operating System configuration prevents Pin from using the default (parent) injection mode.
E: To resolve this, either execute the following (as root):
E: $ echo 0 > /proc/sys/kernel/yama/ptrace_scope
E: Or use the "-injection child" option.
E: For more information, regarding child injection, see Injection section in the Pin User Manual.
E:
Edit2
The problem is in my pin tool. My pin tool pin-trace.so calls a function from the user code (from the application). This function fails on an assertion in the executable2, which becomes an exception in pin and converts into a segfault, being unhandled.
When running it directly under gdb, it seems to fail differently (on SIGUSR1)
It looks like pin is trying to use SGIUSR1 internally. If you ask GDB to ignore this signal with handle SIGUSR1 nostop noprint pass, your GDB session will likely proceed further, hopefully to the crash on NULL pointer dereference.
In case it helps, this:
C: Tool (or Pin) caused signal 11 at PC 0x000000000
means that your pintool (or pin itself) called NULL function pointer.

Why process terminates abnormally without coredump?

I have stuck in a such problem that my c++ server program can't coredump when terminate abnormally. The program running in daemon mode with chdir to '/'.
I had done the following things:
ulimit -c unlimited, so coredump enabled.
echo "/tmp/coredump/core.%e.%p.%t" > /proc/sys/kernel/core_pattern, and chmod a+w coredump, so it has the permission to write coredump file.
and I had try such things:
send a SIGABRT via kill -6, it can coredump.
in dmesg, I can't find any info about the abnormally terminates process.
running the program not in daemon mode.
My OS version: CentOS release 6.4 (Final), x86_64
ps. the server program installed a signal handler (sigaction() with flag SA_RESETHAND) to catch such signals {SIGHUP, SIGINT, SIGQUIT, SIGTERM} for normally terminates(free resources). so it can exclude the signal shielding.

How to locate a thread's stack base from a core file?

I have a core dump that shows a thread dying from a SIGBUS signal while executing mov %r15d,0xa0(%rsp). That seems to tell me that it died because it ran out of thread stack.
But how can I prove it? I cannot seem to find a GDB command to display thread information besides thread backtraces. In this case there is no backtrace. It shows the current function and then 0x0000000000000000. Yet another indication of stack corruption, I think.
I don't have a copy of /proc/[pid]/maps from when the program died. Is there anything in GDB or in the core file I can look at to find the base of each thread stack?
That seems to tell me that it died because it ran out of thread stack.
Very likely
But how can I prove it?
(gdb) p/x $rsp
$1 = 0x7fffc5791000
(gdb) info target
Symbols from "a.out".
Local core dump file:
`core', file type elf64-x86-64.
0x0000000000400000 - 0x0000000000401000 is load1
...
0x00007faaf2240000 - 0x00007faaf2241000 is load14
0x00007fffc5791000 - 0x00007fffc5f91000 is load15
0x00007fffc5faf000 - 0x00007fffc5fb0000 is load16
0xffffffffff600000 - 0xffffffffff600000 is load17
Local exec file:
...
Note how $rsp is at the (low) end of the load15 segment, and there is no mapping that "covers" $rsp-8

How do I enable reverse debugging on a multi-threaded program?

I'm trying to use the reverse debugging features of gdb 7.3.1 on a multi-threaded project (using libevent), but I get the following error:
(gdb) reverse-step
Target multi-thread does not support this command.
From this question, I thought perhaps that it was an issue loading libthread_db but, when I run the program, gdb says:
Starting program: /home/robb/slug/slug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
How can I enable reverse debugging with gdb 7.3.1 on a multi-threaded project? Is it possible?
To do this, you need to activate the instruction-recording target, by executing the command
record
from the point where you want to go forward and backward (remember that the recording will significantly slow down the execution, especially if you have several threads!)
I've just checked that it's working correctly:
(gdb) info threads
Id Target Id Frame
2 Thread 0x7ffff7860700 (LWP 5503) "a.out" hello (arg=0x601030) at test2.c:16
* 1 Thread 0x7ffff7fca700 (LWP 5502) "a.out" main (argc=2, argv=0x7fffffffe2e8) at test2.c:47
...
(gdb) next
49 p[i].id=i;
(gdb) reverse-next
47 for (i=0; i<n; i++)
...
17 printf("Hello from node %d\n", p->id);
(gdb) next
Hello from node 1
18 return (NULL);
(gdb) reverse-next
17 printf("Hello from node %d\n", p->id);

Resources