GDB locate TLS variable from core

GDB locate TLS variable from core - linux

In debugging a core file, I can find the pthread_specific data but I have not found a way to access __thread data such as errno.
The access is via %fs:0x0 register and I can disassemble __errno_location to find its relative address from %fs:0x0 but I don't see a way to resolve it on a per thread basis as it seems like the base %fs value is not available to GDB.
Example:
Dump of assembler code for function __errno_location:
0x00007fefa83911f0 <+0>: push %rbp
0x00007fefa83911f1 <+1>: mov 0x206d90(%rip),%rax # 0x7fefa8597f88
0x00007fefa83911f8 <+8>: add %fs:0x0,%rax
0x00007fefa8391201 <+17>: mov %rsp,%rbp
0x00007fefa8391204 <+20>: pop %rbp
0x00007fefa8391205 <+21>: retq
End of assembler dump.
Is there a way to find out what the value of %fs:0x0 would be for the current thread?
The info registers command shows only this for the segment registers:
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
libc version: 2.17
libc.so.6 debug not currently installed
GDB version: GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1

Related

Find the crashing assembly instruction from core dump on Linux

If I load the crashing program and the core dump into gdb, it shows me a stack trace and crash point as below.
Core was generated by `./cut --output-d=: -b1,1234567890- /dev/fd/63'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 is_printable_field (i=1234567890) at src/cut.c:266
266 return (printable_field[n] >> (i % CHAR_BIT)) & 1;
(gdb) bt
#0 is_printable_field (i=1234567890) at src/cut.c:266
#1 set_fields (fieldstr=0x7ffccb0561c4 "") at src/cut.c:533
#2 main (argc=4, argv=0x7ffccb055cf8) at src/cut.c:865
Is there any means to know the exact assembly instruction that caused the segfault?

One possibility is to set:
(gdb)layout asm
When GDB stops the corresponding assembly line is pointed.
Example:
│0x7ffff7aa441d <strtok+45> je 0x7ffff7aa44d6 <strtok+230> │
│0x7ffff7aa4423 <strtok+51> mov %rsi,%rax │
>│0x7ffff7aa4426 <strtok+54> mov (%rax),%cl │
│0x7ffff7aa4428 <strtok+56> test %cl,%cl │
│0x7ffff7aa442a <strtok+58> je 0x7ffff7aa4454 <strtok+100>
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7aa4426 in strtok () from /lib64/libc.so.6
(gdb)

You could use the disassemble GDB command. Also perhaps use x/i on $rip (the program counter on x86-64)
However, in your case, assuming the code is in C (not C++ with some operator []), the only possible culprits are the printable_field pointer, or the n index.
Consider also using valgrind and/or compiling (in addition of -g -Wall options to a recent GCC compiler) with -fsanitize=... options, notably -fsanitize=address or -fsanitize=undefined...

Breakpoints on ARM

I recently read that software breakpoints for Linux on ARM are implemented using UND instruction in ARM mode and the BKPT instruction in Thumb mode. Why are there 2 separate instructions used to raise software interrupts?

Thumb compatible code:
0000e150 <pthread_mutexattr_setpshared>:
e150: b573 push {r0, r1, r4, r5, r6, lr}
e152: 4605 mov r5, r0
e154: 460c mov r4, r1
e156: 4616 mov r6, r2
e158: f7fd fa70 bl b63c <pthread_mutexattr-0xba>
e15c: 4629 mov r1, r5
Pure arm:
0000d564 <pthread_mutex_init>:
d564: e2503000 subs r3, r0, #0
d568: 03a00016 moveq r0, #22
d56c: 012fff1e bxeq lr
arm bkpt 0xe7f001f0
thumb bkpt 0xde01
If try to use always arm bkpt and rewrite first instruction in function:
pthread_mutex_init all will be fine but if rewrite first instruction in pthread_mutexattr_setpshared second instruction will be rewrote too.
If always try to use thumb bkpt and rewrite first instruction in pthread_mutex_init resulted instruction will be invalid.

dlopen failed: cannot locate symbol "signal"

I am developing an Android app using NDK.
I have built OpenSSL as static libraries, libcrypto.a and libssl.a, which I linked with my custom C code.
When I try to load the library at runtime I get:
dlopen failed: cannot locate symbol "signal"...
Any idea how to fix this?
Thanks!
Update:
This comes from libcrypto:
libcrypto.a:
00000000 *UND* 00000000 signal
In my .so I see:
libtest.so:
NEEDED libc.so
...
00040240 <signal#plt>:
40240: e28fc601 add ip, pc, #1048576 ; 0x100000
40244: e28cca80 add ip, ip, #128, 20 ; 0x80000
40248: e5bcfd64 ldr pc, [ip, #3428]! ; 0xd64
So why is it complaining about "signal"?

Segfaults in new libstdc++ on a legacy system

I'm trying to retrofit a current (GCC >= 4.6) toolchain onto a legacy embedded ARM/Linux system based on glibc 2.3.6. I have successfully built the toolchain, but now my test programs are segfaulting in libstdc++, for example:
int main()
{
int* foo = new int[100];
delete [] foo;
return 0;
}
... segfaults in static initialization of libstdc++:
#0 0x40082778 in (anonymous namespace)::__future_category_instance ()
at /path/to/src/gcc-4.6.4/libstdc++-v3/src/future.cc:64
#1 0x40082bb0 in __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1)
at /path/to/src/gcc-4.6.4/libstdc++-v3/src/future.cc:103
#2 _GLOBAL__sub_I_future.cc(void) () at /path/to/src/gcc-4.6.4/libstdc++-v3/src/future.cc:109
#3 0x400e92b8 in __do_global_ctors_aux () from /path/to/symbols/libstdc++.so.6
#4 0x400627a0 in _init () from /path/to/symbols/libstdc++.so.6
#5 0x4000b5e4 in ?? () from /path/to/sysroot/lib/ld-linux.so.2
#6 0x4000b5e4 in ?? () from /path/to/sysroot/lib/ld-linux.so.2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I have several more examples, but the crash sites all look similar to this:
Dump of assembler code for function (anonymous namespace)::__future_category_instance():
0x40082764 <+0>: ldr r3, [pc, #264] ; 0x40082874 <(anonymous namespace)::__future_category_instance()+272>
0x40082768 <+4>: push {r11, lr}
0x4008276c <+8>: add r11, sp, #4
0x40082770 <+12>: sub sp, sp, #64 ; 0x40
0x40082774 <+16>: mov r1, #0
=> 0x40082778 <+20>: ldr r3, [r1, r3]
I interpret this as the code trying to read from base address 0 (r1 = 0, r3 in this case was 3736), which might hint at a relocation problem?
This particular crash occurs when I build with either -static, -static-libgcc -static-libstdc++ or force loading of the libgcc_s.so.1 and libstdc++.so.6 from my toolchain via LD_LIBRARY_PATH.
I'm pretty much stuck here and would appreciate any clues as to what might be wrong with my toolchain, and whether this should work at all.

So I have now tracked this down to a change in GCC 4.6.0 that seems to have broken the code generation for the obsolete ABI I'm forced to use here (APCS).
With that change reversed, my test code now runs successfully.

My guess is that it's either a broken build, or it's trying to load a library from your old system.
You can check the second option by running with strace to see what library files it opens:
strace your-program
This will work fine for a statically linked binary, but is more tricky if you want to set LD_LIBRARY_PATH because that will most likely break the strace binary. In that case try it like this:
strace /path/to/ld-linux.so --library-path /path/to/libraries your-program
You'll need to figure out what ld-linux.so is called on your system.

linux gcc linked executable missing static definition of stat64

A linux stat64 call is supposed to end up calling xstat64 with a static version of stat64 generated that passes a version along with the call.
We are seeing a condition where a C linked (gcc) version of code that calls stat64, when linked against an older version of a (C++ linked) shared library (libdb2.so.1, that uses stat64, but isn't supposed to provide it), is not ending up with a the "proper" static version of this stat64 call. The C++ linked app has what we expect:
00000000004007c8 <__xstat64#plt>:
4007c8: jmpq *1051250(%rip) # 501240 <_GLOBAL_OFFSET_TABLE_+0x20>
4007ce: pushq $0x1
4007d3: jmpq 4007a8 <_init+0x18>
0000000000400ac0 <stat64>:
400ac0: push %rbp
400ac1: mov %rsp,%rbp
400ac4: sub $0x10,%rsp
400ac8: mov %rdi,0xfffffffffffffff8(%rbp)
400acc: mov %rsi,0xfffffffffffffff0(%rbp)
400ad0: mov 0xfffffffffffffff0(%rbp),%rdx
400ad4: mov 0xfffffffffffffff8(%rbp),%rsi
400ad8: mov $0x1,%edi
400add: callq 4007c8 <__xstat64#plt>
400ae2: leaveq
400ae3: retq
whereas the gcc linked code (that also links to our libdb2 shared lib) ends up with a global reference to stat64 instead of the "static" version that it is suppose to have:
0000000000400618 <stat64#plt>:
400618: jmpq *1050146(%rip) # 500c40 <_GLOBAL_OFFSET_TABLE_+0x20>
40061e: pushq $0x1
400623: jmpq 4005f8 <_init+0x18>
The same code, also when linked with gcc, when not linked to our libdb2 library, ends up with the expected "static" stat64 function:
0000000000400550 <__xstat64#plt>:
400550: jmpq *1050170(%rip) # 500b90 <_GLOBAL_OFFSET_TABLE_+0x20>
400556: pushq $0x1
40055b: jmpq 400530 <_init+0x18>
00000000004007b0 <stat64>:
4007b0: mov %rsi,%rdx
4007b3: mov %rdi,%rsi
4007b6: mov $0x1,%edi
4007bb: jmpq 400550 <__xstat64#plt>
EDIT: more info obtained from a linker map (-Wl,--print-map)
When the gcc linked exe doesn't link to our (libdb2) shared lib, we see that it gets it's stat64 from libc_nonshared.a:
/usr/lib64/libc_nonshared.a(stat64.oS)
/home/hotellnx94/peeterj/tmp/cc2f7ETx.o (stat64)
...
.plt 0x0000000000400530 0x70
*(.plt)
.plt 0x0000000000400530 0x70 /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib64/crt1.o
0x0000000000400540 __libc_start_main##GLIBC_2.2.5
0x0000000000400550 __xstat64##GLIBC_2.2.5
0x0000000000400560 printf##GLIBC_2.2.5
0x0000000000400570 memset##GLIBC_2.2.5
0x0000000000400580 strerror##GLIBC_2.2.5
0x0000000000400590 __errno_location##GLIBC_2.2.5
.text 0x00000000004007b0 0x10 /usr/lib64/libc_nonshared.a(stat64.oS)
0x00000000004007b0 stat64
whereas, once we link to our shared lib (libdb2), the symbols are picked up from crt1.o instead of lib_nonshared.a:
.plt 0x00000000004005f8 0x70
*(.plt)
.plt 0x00000000004005f8 0x70 /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib64/crt1.o
0x0000000000400608 __libc_start_main##GLIBC_2.2.5
0x0000000000400618 stat64
0x0000000000400628 printf##GLIBC_2.2.5
0x0000000000400638 memset##GLIBC_2.2.5
0x0000000000400648 strerror##GLIBC_2.2.5
0x0000000000400658 __errno_location##GLIBC_2.2.5
What could we be doing (or would have been doing since we don't see this in new versions of our library), that would cause lib_nonshared.a to no longer be shared once the consumer links to our library?

It turned out that this was due to an intel compiler bug that was fixed. When we started using the compiler version that had the fix we were then exposed to a binary compatibility issue since the new version of the intel compiler (producing the shared lib in question), properly didn't export this stat64 symbol.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

GDB locate TLS variable from core - linux

Related

Find the crashing assembly instruction from core dump on Linux

Breakpoints on ARM

dlopen failed: cannot locate symbol "signal"

Segfaults in new libstdc++ on a legacy system

linux gcc linked executable missing static definition of stat64

Categories

Resources