How can I get GDB to tell me what address caused a segfault in a core dump file - linux

I just know when gdb attach to a process, I can use p $_siginfo._sifields._sigfault.si_addr to show what address caused a segfault.
But, how to do in a core dump file?
I try it in a core dump file:
(gdb) p $_siginfo._sifields._sigfault.si_addr
Unable to read siginfo

Related

Need help resolving segfault in libc-2.23.so

Need help debugging shared library with gdb.
I am trying to debug a shared library and in my case it is:
libc-2.23.so
The reason is that I get theese lines in dmesg:
[10081.433266] compiz[11346]: segfault at 7f30a4100010 ip 00007f309c36f44b sp 00007ffdde303aa0 error 4 in libc-2.23.so[7f309c2f1000+1bf000]
[22005.764635] compiz[16149]: segfault at 7f30e3456db0 ip 00007f30db85044b sp 00007fffaab9c0a0 error 4 in libc-2.23.so[7f30db7d2000+1bf000]
[48777.031064] compiz[25203]: segfault at 7f0b8e23b050 ip 00007f0b87edf44b sp 00007ffd51d15740 error 4 in libc-2.23.so[7f0b87e61000+1bf000]
[78850.413793] compiz[4889]: segfault at 7f60ddbf2440 ip 00007f60d598944b sp 00007ffedc5e31b0 error 4 in libc-2.23.so[7f60d590b000+1bf000]
[84583.754783] compiz[8441]: segfault at 7f5f8c3930c0 ip 00007f5f871d544b sp 00007ffc436bb5a0 error 4 in libc-2.23.so[7f5f87157000+1bf000]
[100625.457854] compiz[15619]: segfault at 7ffffa967680 ip 00007ffff722844b sp 00007fffffffdad0 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
[104234.596331] compiz[19076]: segfault at 7ffffa2dc540 ip 00007ffff722844b sp 00007fffffffd810 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
[112314.238115] compiz[22152]: segfault at 7ffffe232760 ip 00007ffff722844b sp 00007fffffffd810 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
[130828.195732] compiz[26013]: segfault at 7ffffa966180 ip 00007ffff722844b sp 00007fffffffdad0 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
[225379.026592] compiz[19275]: segfault at 7ffff821b6d0 ip 00007ffff722844b sp 00007fffffffd7c0 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
The address where libc-2.23.so is loaded does not change after time stamp 100625.457854 since I ran the command:
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
In order to be able to load it under gdb.
What I have done so far is that I have established that the segfault always occur on the same offset from the shared librarys loaded address.
I calculated the offset by taking instruction pointer minus load address in python:
ld = ["7f309c2f1000", "7f30db7d2000", "7f0b87e61000", "7f60d590b000", "7f5f87157000", "7ffff71aa000"]
ip = ["7f309c36f44b", "7f30db85044b", "7f0b87edf44b", "7f60d598944b", "7f5f871d544b", "7ffff722844b"]
ld_val = [int(x,16) for x in ld]
ip_val=[int(x,16) for x in ip]
ip_off=[i-s for (i,s) in zip(ip_val,ld_val)]
ip_off
[517195, 517195, 517195, 517195, 517195, 517195]
So using this information I got the offending line from executing:
$ addr2line -e /lib/x86_64-linux-gnu/libc-2.23.so -fCi 0x7e44b
malloc_consolidate
/build/glibc-9tT8Do/glibc-2.23/malloc/malloc.c:4167
Since I run Ubuntu 16.04 I installed the sources by issuing:
$ apt-get source glibc-source
Inspecting the offending line showed that it was just a comment.
malloc.c:4167
/* Slightly streamlined version of consolidation code in free() */
inside function:
static void malloc_consolidate(mstate av)
So I am assuming I am doing something wrong here.
Any pointer on how to capture this "segfault"?
So I am assuming I am doing something wrong here.
You aren't.
The symptoms you are looking at are 99.999% result of heap corruption, and since this is happening in compiz, there is little you can do except file a bug report.
To make a useful bug report, it would help if you could run compiz under Valgrind. Running it under GDB will not help.
I had gdb loaded with the library and breakpoint on line 4167 but no break even if I got a new entry in dmesg.
That means you are debugging the wrong process. Perhaps compiz forks helper processes, and one of them dies?

gdb symbol don't load

I try to remotely debug a program using gdb and gdbserver.
I login on the remote PC with ssh and run gdbserver --multi :4444
And on my local, I use the command./arm-linux-gnueabihf-gdb -x /path/init
where the content of the /path/init file is:
symbol-file /home/username/workspace/piCCompileProj/Debug/main
target extended-remote 192.168.0.100:4444
set remote exec-file /home/username/cppSandbox/main
If I try to set a breakpoint with (gdb) b 6, gdb prints the error:
Cannot access memory at address 0x86c8
And if I type file /home/username/workspace/piCCompileProj/Debug/main and then run, the output is:
Starting program: /home/username/workspace/piCCompileProj/Debug/main
Program received signal SIGSEGV, Segmentation fault.
0x76fe6c00 in ?? ()
from /home/username/pi/tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64/arm-linux-gnueabihf/libc/lib/ld-linux-armhf.so.3
How can I load symbols in gdb and debug?

How can I debug this User Space application crash?

I'm running a Qt5.4.0 application on my embedded Linux system (TI AM335x based) and it's stopping to run and I'm having a hard time debugging this. This is a QtWebKit QML example (youtubeview) but other QtWebKit examples are preforming the same for me so it's something WebKit based on my system.
When I run the application, it runs for a second or so, then ends with no messages. There is nothing reported to the syslog or dmesg either. When I kick it off with strace I can see this futex message:
futex(0x2d990, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2d9ac, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
+++ exited with 1 +++
Then it stops. Not very helpful... My next though was to debug this with GDB, however GDB crashes when I try to run this:
-sh-4.2# gdb youtubeview
GNU gdb (GDB) 7.5
Copyright (C) 2012 Free Software Foundation, Inc.
...
(gdb) run
Starting program: /usr/share/qt5/examples/webkitqml/youtubeview/youtubeview
/home/mike/ulf_qt_450/ulf/build-ulf/out/work/armv7ahf-vfp-neon-linux-gnueabihf/gdb/7.5-r0.0/gdb-7.5/gdb/utils.c:1081: internal-error: virtual memory exhausted: can't allocate 64652911 bytes.
A problem internal to GDB has been detected,
This issue occurs even if I set a break point at main first, just as soon as it starts running it will get stuck and run out of memory.
Are there other tools or techniques that can be used here to help isolate the issue?
Perhaps arguments to GDB to limit memory use or give some more information about why this Qt program made it crash?
Perhaps some FDs or system variables I could use to figure out why the FUTEX is being held and failing?
I'm not sure where to take this problem right now.
The Qt code itself is pretty simple, and I don't anticipate any issues in here:
#include <QGuiApplication>
#include <QQuickView>
int main(int argc, char* argv[])
{
QGuiApplication app(argc,argv);
QQuickView view;
view.setSource(QUrl("qrc:///" QWEBKIT_EXAMPLE_NAME ".qml"));
view.setResizeMode(QQuickView::SizeRootObjectToView);
view.show();
return app.exec();
return 0;
}
Running gdb on the device, especially with a huge library such as WebKit, is bound to get you out of memory errors.
Instead, run gdbserver on the device, and connect to it via gdb running on the host machine, using the toolchain's cross-gdb for that. In that scenario, the gdb on the host loads all the debug information, while the gdbserver on the device needs almost no memory.
It is even possible to have the separate debug information available on the host and stripped libraries on the device.
Please note that parts of WebKit are always built in release mode, even if the rest of Qt was built in debug mode, if you are going to debug into WebKit you might want to change that in the build system.
Here is a minimal example:
Device:
# gdbserver 192.168.1.2:12345 myapp
Process myapp created; pid = 989
Listening on port 12345
Host:
# arm-none-linux-gnueabi-gdb myapp
GNU gdb (Sourcery G++ Lite 2009q1-203) 6.8.50.20081022-cvs
(gdb) set solib-absolute-prefix /opt/targetroot
(gdb) target remote 192.168.1.42:12345
Remote debugging using 192.168.1.42:12345
(gdb) start
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) break main
Breakpoint 1 at 0x1ab9c: file myapp/main.cpp, line 12.
(gdb) cont
Continuing.
Breakpoint 1, main (argc=1, argv=0xbecfedb4) at myapp/main.cpp:12
12 QApplication app(argc, argv, QApplication::GuiServer);
And you are right that it looks like a problem in QtWebKit itself, not in your application. Good luck!

How to locate a thread's stack base from a core file?

I have a core dump that shows a thread dying from a SIGBUS signal while executing mov %r15d,0xa0(%rsp). That seems to tell me that it died because it ran out of thread stack.
But how can I prove it? I cannot seem to find a GDB command to display thread information besides thread backtraces. In this case there is no backtrace. It shows the current function and then 0x0000000000000000. Yet another indication of stack corruption, I think.
I don't have a copy of /proc/[pid]/maps from when the program died. Is there anything in GDB or in the core file I can look at to find the base of each thread stack?
That seems to tell me that it died because it ran out of thread stack.
Very likely
But how can I prove it?
(gdb) p/x $rsp
$1 = 0x7fffc5791000
(gdb) info target
Symbols from "a.out".
Local core dump file:
`core', file type elf64-x86-64.
0x0000000000400000 - 0x0000000000401000 is load1
...
0x00007faaf2240000 - 0x00007faaf2241000 is load14
0x00007fffc5791000 - 0x00007fffc5f91000 is load15
0x00007fffc5faf000 - 0x00007fffc5fb0000 is load16
0xffffffffff600000 - 0xffffffffff600000 is load17
Local exec file:
...
Note how $rsp is at the (low) end of the load15 segment, and there is no mapping that "covers" $rsp-8

How to print the last received signal in GDB?

when loading a core dump into GDB the reason why it crashed automatically is displayed. For example
Program terminated with signal 11, Segmentation fault.
Is there any way to get the information again?
The thing is, that I'm writing a script which needs this information. But if the signal is only available after loading the core dump amd I can't access the information later on.
Is there really no command for such an important feature?
To print the information about the last signal execute
p $_siginfo
If you know what the core file name is, you can issue the target core command which respecifies the target core file:
(gdb) target core core.8577
[New LWP 8577]
Core was generated by `./fault'.
Program terminated with signal 11, Segmentation fault.
#0 0x080483d5 in main () at fault.c:10
10 *ptr = '\123';
(gdb)
As for the implied question, what is the info last signal command?, I don't know. There does not seem to be one.
The core file's name can be obtained from the command info target:
(gdb) info target
Symbols from "/home/wally/.bin/fault".
Local core dump file:
`/home/wally/.bin/core.8577', file type elf32-i386.
0x00da1000 - 0x00da2000 is load1
0x08048000 - 0x08049000 is load2
...
0xbfe8d000 - 0xbfeaf000 is load14
Local exec file:
`/home/wally/.bin/fault', file type elf32-i386.
Entry point: 0x8048300
0x08048134 - 0x08048147 is .interp
0x08048148 - 0x08048168 is .note.ABI-tag
0x08048168 - 0x0804818c is .note.gnu.build-id
0x0804818c - 0x080481ac is .gnu.hash
0x080481ac - 0x080481fc is .dynsym
0x080481fc - 0x08048246 is .dynstr
...

Resources