Segfaults in new libstdc++ on a legacy system

Segfaults in new libstdc++ on a legacy system - linux

I'm trying to retrofit a current (GCC >= 4.6) toolchain onto a legacy embedded ARM/Linux system based on glibc 2.3.6. I have successfully built the toolchain, but now my test programs are segfaulting in libstdc++, for example:
int main()
{
int* foo = new int[100];
delete [] foo;
return 0;
}
... segfaults in static initialization of libstdc++:
#0 0x40082778 in (anonymous namespace)::__future_category_instance ()
at /path/to/src/gcc-4.6.4/libstdc++-v3/src/future.cc:64
#1 0x40082bb0 in __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1)
at /path/to/src/gcc-4.6.4/libstdc++-v3/src/future.cc:103
#2 _GLOBAL__sub_I_future.cc(void) () at /path/to/src/gcc-4.6.4/libstdc++-v3/src/future.cc:109
#3 0x400e92b8 in __do_global_ctors_aux () from /path/to/symbols/libstdc++.so.6
#4 0x400627a0 in _init () from /path/to/symbols/libstdc++.so.6
#5 0x4000b5e4 in ?? () from /path/to/sysroot/lib/ld-linux.so.2
#6 0x4000b5e4 in ?? () from /path/to/sysroot/lib/ld-linux.so.2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I have several more examples, but the crash sites all look similar to this:
Dump of assembler code for function (anonymous namespace)::__future_category_instance():
0x40082764 <+0>: ldr r3, [pc, #264] ; 0x40082874 <(anonymous namespace)::__future_category_instance()+272>
0x40082768 <+4>: push {r11, lr}
0x4008276c <+8>: add r11, sp, #4
0x40082770 <+12>: sub sp, sp, #64 ; 0x40
0x40082774 <+16>: mov r1, #0
=> 0x40082778 <+20>: ldr r3, [r1, r3]
I interpret this as the code trying to read from base address 0 (r1 = 0, r3 in this case was 3736), which might hint at a relocation problem?
This particular crash occurs when I build with either -static, -static-libgcc -static-libstdc++ or force loading of the libgcc_s.so.1 and libstdc++.so.6 from my toolchain via LD_LIBRARY_PATH.
I'm pretty much stuck here and would appreciate any clues as to what might be wrong with my toolchain, and whether this should work at all.

So I have now tracked this down to a change in GCC 4.6.0 that seems to have broken the code generation for the obsolete ABI I'm forced to use here (APCS).
With that change reversed, my test code now runs successfully.

My guess is that it's either a broken build, or it's trying to load a library from your old system.
You can check the second option by running with strace to see what library files it opens:
strace your-program
This will work fine for a statically linked binary, but is more tricky if you want to set LD_LIBRARY_PATH because that will most likely break the strace binary. In that case try it like this:
strace /path/to/ld-linux.so --library-path /path/to/libraries your-program
You'll need to figure out what ld-linux.so is called on your system.

Related

GDB locate TLS variable from core

In debugging a core file, I can find the pthread_specific data but I have not found a way to access __thread data such as errno.
The access is via %fs:0x0 register and I can disassemble __errno_location to find its relative address from %fs:0x0 but I don't see a way to resolve it on a per thread basis as it seems like the base %fs value is not available to GDB.
Example:
Dump of assembler code for function __errno_location:
0x00007fefa83911f0 <+0>: push %rbp
0x00007fefa83911f1 <+1>: mov 0x206d90(%rip),%rax # 0x7fefa8597f88
0x00007fefa83911f8 <+8>: add %fs:0x0,%rax
0x00007fefa8391201 <+17>: mov %rsp,%rbp
0x00007fefa8391204 <+20>: pop %rbp
0x00007fefa8391205 <+21>: retq
End of assembler dump.
Is there a way to find out what the value of %fs:0x0 would be for the current thread?
The info registers command shows only this for the segment registers:
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
libc version: 2.17
libc.so.6 debug not currently installed
GDB version: GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1

Find the crashing assembly instruction from core dump on Linux

If I load the crashing program and the core dump into gdb, it shows me a stack trace and crash point as below.
Core was generated by `./cut --output-d=: -b1,1234567890- /dev/fd/63'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 is_printable_field (i=1234567890) at src/cut.c:266
266 return (printable_field[n] >> (i % CHAR_BIT)) & 1;
(gdb) bt
#0 is_printable_field (i=1234567890) at src/cut.c:266
#1 set_fields (fieldstr=0x7ffccb0561c4 "") at src/cut.c:533
#2 main (argc=4, argv=0x7ffccb055cf8) at src/cut.c:865
Is there any means to know the exact assembly instruction that caused the segfault?

One possibility is to set:
(gdb)layout asm
When GDB stops the corresponding assembly line is pointed.
Example:
│0x7ffff7aa441d <strtok+45> je 0x7ffff7aa44d6 <strtok+230> │
│0x7ffff7aa4423 <strtok+51> mov %rsi,%rax │
>│0x7ffff7aa4426 <strtok+54> mov (%rax),%cl │
│0x7ffff7aa4428 <strtok+56> test %cl,%cl │
│0x7ffff7aa442a <strtok+58> je 0x7ffff7aa4454 <strtok+100>
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7aa4426 in strtok () from /lib64/libc.so.6
(gdb)

You could use the disassemble GDB command. Also perhaps use x/i on $rip (the program counter on x86-64)
However, in your case, assuming the code is in C (not C++ with some operator []), the only possible culprits are the printable_field pointer, or the n index.
Consider also using valgrind and/or compiling (in addition of -g -Wall options to a recent GCC compiler) with -fsanitize=... options, notably -fsanitize=address or -fsanitize=undefined...

GHC RTS runtime errors when using hs_init with profiling of shared cabal library

I have a large C project which must be compiled with gcc. So I link the main executable with a file like this:
#include <HsFFI.h>
static void my_enter(void) __attribute__((constructor));
static void my_enter(void) {
static char *argv[] = { "Pointer.exe", 0 };
//static char *argv[] = { "Pointer.exe", "+RTS", "-N", "-p", "-s", "-h", "-i0.1", "-RTS", 0 };
static char **argv_ = argv;
static int argc = 1; // 8 for profiling
hs_init(&argc, &argv_);
//hs_init_with_rtsopts(&argc, &argv_);
}
static void my_exit(void) __attribute__((destructor));
static void my_exit(void) { hs_exit(); }
which works as expected - the GHC runtime system gets initialized and I'm able to use the FFI to call Haskell code from C.
I then attempt to enable profiling (mainly for stack traces with Debug.Trace) and code coverage (HPC) using the "+RTS", "-N", "-p", "-s", "-h", "-i0.1", "-RTS" flags on the line commented out above. However I get error messages about threading and profiling during initialization:
Pointer.exe: the flag -N requires the program to be built with -threaded
Pointer.exe: the flag -p requires the program to be built with -prof
Pointer.exe: Most RTS options are disabled. Use hs_init_with_rtsopts() to enable them.
Pointer.exe: newBoundTask: RTS is not initialised; call hs_init() first
I configured the cabal package with:
"--enable-library-profiling"
"--enable-executable-profiling"
"--enable-shared"
"--enable-tests"
"--enable-coverage"
which has properly given me stack traces and code coverage when running executables compiled as part of the cabal project.
If I try to use hs_init_with_rtsopts as recommended by the error message, I get a SIGSEGV during the GHC rts initialization:
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6a2d0ca in strlen () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff6a2d0ca in strlen () from /usr/lib/libc.so.6
#1 0x00007ffff798c5f6 in copyArg (
arg=0x657372615062696c <error: Cannot access memory at address 0x657372615062696c>) at rts/RtsFlags.c:1684
#2 0x00007ffff798c679 in copyArgv (argc=8, argv=0x555555554cee) at rts/RtsFlags.c:1696
#3 0x00007ffff798dbe2 in setFullProgArgv (argc=<optimized out>, argv=<optimized out>) at rts/RtsFlags.c:1780
#4 0x00007ffff798e773 in hs_init_ghc (argc=0x555555756090 <argc>, argv=0x5555557560a0 <argv>, rts_config=...)
at rts/RtsStartup.c:162
#5 0x00007ffff798e7cc in hs_init_with_rtsopts (argc=<optimized out>, argv=<optimized out>)
at rts/RtsStartup.c:121
#6 0x0000555555554c7d in __libc_csu_init ()
#7 0x00007ffff69cc59f in __libc_start_main () from /usr/lib/libc.so.6
#8 0x0000555555554b29 in _start ()
So how can I enable runtime profiling from a program compiled with gcc?

So the segfault was due to a typo which was not included in the question. Sneaky!
To enable threading and profiling and so on you must link your final program against the appropriate RTS flavor. This is one effect of GHC's -prof flag, and the only effect of its -threaded flag, and various other flags like -debug.
The RTS flavors are in different libraries with names of the form
libHSrts.a libHSrts-ghc7.8.4.so (vanilla)
libHSrts_debug.a libHSrts_debug-ghc7.8.4.so (debug)
libHSrts_thr.a libHSrts_thr-ghc7.8.4.so (threaded)
libHSrts_p.a - (profiling)
libHSrts_thr_p.a - (threaded+profiling)
libHSrts_l.a libHSrts_l-ghc7.8.4.so (eventlog)
...
On the left are static libraries; on the right are dynamic libraries, whose library names include the GHC version to make it easier for the runtime dynamic loader to find the correct version if you have multiple versions of GHC installed. You can see the full list for your GHC installation under "RTS ways" in ghc --info.
There are no dynamic profiling libraries installed by default but I think there is no fundamental problem with having them and you can configure the GHC build system to build them. (They are not especially useful now because ghci does not support profiling, though that is hopefully changing very soon.)

dlopen failed: cannot locate symbol "signal"

I am developing an Android app using NDK.
I have built OpenSSL as static libraries, libcrypto.a and libssl.a, which I linked with my custom C code.
When I try to load the library at runtime I get:
dlopen failed: cannot locate symbol "signal"...
Any idea how to fix this?
Thanks!
Update:
This comes from libcrypto:
libcrypto.a:
00000000 *UND* 00000000 signal
In my .so I see:
libtest.so:
NEEDED libc.so
...
00040240 <signal#plt>:
40240: e28fc601 add ip, pc, #1048576 ; 0x100000
40244: e28cca80 add ip, ip, #128, 20 ; 0x80000
40248: e5bcfd64 ldr pc, [ip, #3428]! ; 0xd64
So why is it complaining about "signal"?

linux gcc linked executable missing static definition of stat64

A linux stat64 call is supposed to end up calling xstat64 with a static version of stat64 generated that passes a version along with the call.
We are seeing a condition where a C linked (gcc) version of code that calls stat64, when linked against an older version of a (C++ linked) shared library (libdb2.so.1, that uses stat64, but isn't supposed to provide it), is not ending up with a the "proper" static version of this stat64 call. The C++ linked app has what we expect:
00000000004007c8 <__xstat64#plt>:
4007c8: jmpq *1051250(%rip) # 501240 <_GLOBAL_OFFSET_TABLE_+0x20>
4007ce: pushq $0x1
4007d3: jmpq 4007a8 <_init+0x18>
0000000000400ac0 <stat64>:
400ac0: push %rbp
400ac1: mov %rsp,%rbp
400ac4: sub $0x10,%rsp
400ac8: mov %rdi,0xfffffffffffffff8(%rbp)
400acc: mov %rsi,0xfffffffffffffff0(%rbp)
400ad0: mov 0xfffffffffffffff0(%rbp),%rdx
400ad4: mov 0xfffffffffffffff8(%rbp),%rsi
400ad8: mov $0x1,%edi
400add: callq 4007c8 <__xstat64#plt>
400ae2: leaveq
400ae3: retq
whereas the gcc linked code (that also links to our libdb2 shared lib) ends up with a global reference to stat64 instead of the "static" version that it is suppose to have:
0000000000400618 <stat64#plt>:
400618: jmpq *1050146(%rip) # 500c40 <_GLOBAL_OFFSET_TABLE_+0x20>
40061e: pushq $0x1
400623: jmpq 4005f8 <_init+0x18>
The same code, also when linked with gcc, when not linked to our libdb2 library, ends up with the expected "static" stat64 function:
0000000000400550 <__xstat64#plt>:
400550: jmpq *1050170(%rip) # 500b90 <_GLOBAL_OFFSET_TABLE_+0x20>
400556: pushq $0x1
40055b: jmpq 400530 <_init+0x18>
00000000004007b0 <stat64>:
4007b0: mov %rsi,%rdx
4007b3: mov %rdi,%rsi
4007b6: mov $0x1,%edi
4007bb: jmpq 400550 <__xstat64#plt>
EDIT: more info obtained from a linker map (-Wl,--print-map)
When the gcc linked exe doesn't link to our (libdb2) shared lib, we see that it gets it's stat64 from libc_nonshared.a:
/usr/lib64/libc_nonshared.a(stat64.oS)
/home/hotellnx94/peeterj/tmp/cc2f7ETx.o (stat64)
...
.plt 0x0000000000400530 0x70
*(.plt)
.plt 0x0000000000400530 0x70 /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib64/crt1.o
0x0000000000400540 __libc_start_main##GLIBC_2.2.5
0x0000000000400550 __xstat64##GLIBC_2.2.5
0x0000000000400560 printf##GLIBC_2.2.5
0x0000000000400570 memset##GLIBC_2.2.5
0x0000000000400580 strerror##GLIBC_2.2.5
0x0000000000400590 __errno_location##GLIBC_2.2.5
.text 0x00000000004007b0 0x10 /usr/lib64/libc_nonshared.a(stat64.oS)
0x00000000004007b0 stat64
whereas, once we link to our shared lib (libdb2), the symbols are picked up from crt1.o instead of lib_nonshared.a:
.plt 0x00000000004005f8 0x70
*(.plt)
.plt 0x00000000004005f8 0x70 /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib64/crt1.o
0x0000000000400608 __libc_start_main##GLIBC_2.2.5
0x0000000000400618 stat64
0x0000000000400628 printf##GLIBC_2.2.5
0x0000000000400638 memset##GLIBC_2.2.5
0x0000000000400648 strerror##GLIBC_2.2.5
0x0000000000400658 __errno_location##GLIBC_2.2.5
What could we be doing (or would have been doing since we don't see this in new versions of our library), that would cause lib_nonshared.a to no longer be shared once the consumer links to our library?

It turned out that this was due to an intel compiler bug that was fixed. When we started using the compiler version that had the fix we were then exposed to a binary compatibility issue since the new version of the intel compiler (producing the shared lib in question), properly didn't export this stat64 symbol.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Segfaults in new libstdc++ on a legacy system - linux

So I have now tracked this down to a change in GCC 4.6.0 that seems to have broken the code generation for the obsolete ABI I'm forced to use here (APCS). With that change reversed, my test code now runs successfully.

Related

GDB locate TLS variable from core

Find the crashing assembly instruction from core dump on Linux

GHC RTS runtime errors when using hs_init with profiling of shared cabal library

dlopen failed: cannot locate symbol "signal"

linux gcc linked executable missing static definition of stat64

Categories

Resources