I'm testing a third-party library and it crashes. When I tried to find the reason for the crash, gdb told me that no debugging symbols were available:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb53ffb70 (LWP 3722)]
0x00172a89 in tsip_transac_send () from /usr/local/lib/libtinySIP.so.0
I issued bt full on the gdb console and got a series of lines like the ones below:
#0 0x00172a89 in tsip_transac_send () from /usr/local/lib/libtinySIP.so.0
No symbol table info available
I recompiled the library after checking the CFLAGS in the Makefile. The values were fine all the time but I recompiled it anyway
CFLAGS = -g -O2
I ran the test again with the same luck, no debug symbols for the shared library.
What am I missing here?
I'm using CentOS 6.0. I installed the library on openSUSE before and didn't have this problem, so it probably has something to do with my CentOS installation.
In case anyone cares, I'm testing Doubango's library for webrtc2sip.
EDIT:
Debug symbols are being loaded properly
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x002fb830 0x0031339f Yes (*) /lib/ld-linux.so.2
0x00115040 0x00120028 Yes /usr/local/lib/libtinySAK.so.0
0x00133f30 0x0018b378 Yes /usr/local/lib/libtinySIP.so.0
0x001d8ac0 0x00201b98 Yes /usr/local/lib/libtinyNET.so.0
0x00215dd0 0x0023f638 Yes /usr/local/lib/libtinyDAV.so.0
0x0024eec0 0x00261728 Yes /usr/local/lib/libtinyMEDIA.so.0
0x0026bb00 0x002774d8 Yes /usr/local/lib/libtinyHTTP.so.0
0x002ae340 0x002b0358 Yes /usr/local/lib/libtinyXCAP.so.0
0x002b3990 0x002b8d18 Yes /usr/local/lib/libtinySMS.so.0
0x002be630 0x002c9388 Yes /usr/local/lib/libtinyMSRP.so.0
0x002de240 0x002e8e18 Yes /usr/local/lib/libtinySDP.so.0
0x00323060 0x00345778 Yes /usr/local/lib/libtinyRTP.so.0
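As a reading aid for the table above: in gdb's info sharedlibrary output, the (*) marker flags a shared library that is missing debugging information, while "Yes" in the Syms Read column only says the symbols were read. The following is a minimal sketch (the function name and regular expression are my own, not part of gdb) that picks out the flagged libraries from pasted output:

```python
import re

# One row of gdb's "info sharedlibrary" table: From, To, Syms Read, optional (*), path.
ROW = re.compile(r"^(0x[0-9a-f]+)\s+(0x[0-9a-f]+)\s+(Yes|No)\s+(\(\*\)\s+)?(\S+)$")


def libs_missing_debug_info(table_text):
    """Return paths gdb flagged with (*) or for which no symbols were read."""
    flagged = []
    for line in table_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header line or unrelated output
        syms_read, star, path = m.group(3), m.group(4), m.group(5)
        if syms_read == "No" or star:
            flagged.append(path)
    return flagged


sample = """\
From To Syms Read Shared Object Library
0x002fb830 0x0031339f Yes (*) /lib/ld-linux.so.2
0x00133f30 0x0018b378 Yes /usr/local/lib/libtinySIP.so.0
"""
print(libs_missing_debug_info(sample))
```

With the sample rows taken from the output above, only /lib/ld-linux.so.2 is flagged, which is consistent with the claim that libtinySIP's symbols were loaded.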
Run file /usr/local/lib/libtinySIP.so.0. If the output says stripped, check your library's build process: it may invoke strip explicitly to remove the debugging symbols.
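If you want to script that check over several libraries, here is a rough sketch. It assumes the file utility is installed and simply looks for the words "stripped" / "not stripped" in its description; the helper names are hypothetical. Note that "not stripped" only means the symbol table is present, which is not quite the same as having full debug info.

```python
import subprocess


def describe(path):
    """Run the `file` utility on path and return its one-line description."""
    result = subprocess.run(
        ["file", "--brief", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def is_stripped(description):
    """Interpret the text printed by `file` for an ELF object."""
    if "not stripped" in description:
        return False
    return "stripped" in description


# Illustrative descriptions, not captured from the system in the question.
print(is_stripped("ELF 32-bit LSB shared object, Intel 80386, dynamically linked, not stripped"))
print(is_stripped("ELF 32-bit LSB shared object, Intel 80386, dynamically linked, stripped"))
```

Usage would be something like is_stripped(describe("/usr/local/lib/libtinySIP.so.0")).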
Well, it looks like it was a bug in gdb.
CentOS 6 has gdb version 7.2-56.el6 precompiled in its repository. I updated to the latest version of gdb (by compiling it from source) and now it works.
Thanks to all for your help.
It appears you have conflicting compilation flags. I'm not an expert but it looks like your -g flag (which is to generate debug symbols) is being mixed with -O2 flag (which is an optimization parameter).
Try using just -g and post the results.
Background
I have read countless GitHub project threads and everything I can find on StackOverflow about this problem, though so far no luck. I have a Mac 10.14 box running with the stock CommandLineTools and/or Xcode. I'm trying to "port" a Python wrapper library I have written around an older C and C++ library using CTypes in Python3. It works well already on Ubuntu Linux. However, there is no end to the problems I have been coming across since moving to a Mac platform. There just doesn't seem to be an easy answer to fixing what I'm trying to do on the broken Mac OS platform right now -- at least for the uninitiated Linux person like myself.
I have one question right off the bat before I describe how I'm compiling the dylib I'm trying to load up with CTypes: Do I now need to sign my dylib somehow before I can use it on Mac 10.14? Otherwise, I guess my question boils down to how the $%^# (and that is truncated speak for what I do mean right now) can I deal with shared / dynamic libraries on Mac with a Python C extension interface?
My preference is to not even touch Xcode and just use the stock Mac tools that come with the system out of the box. My solution must work on the command line without deferring to some auto-configuration magic that Xcode gives you in GUI form. Really, this is all fairly painless for what it is under Linux.
Compilation and linking
The scenario is actually more complicated than I can describe. I will just sketch what seems to me to be the relevant parts of the solution, and then let you all experts who have gotten this working before tell me the obvious missteps in between.
I'm compiling the older C/C++ source code library as a static archive first using the following
gcc (read clang) options on Mac (some of them get ignored):
-O0 -march=native -force_cpusubtype_ALL -fPIC -I../include -fPIC -m64 \
-fvisibility=default -Wno-error -lc++ -lc++abi
Then I'm compiling and linking with a combination of
-Wl,-all_load $(LIBGOMPSTATIC).a $(LIBGMPSTATIC) -Wl,-noall_load \
-ldl -lpthread -lc++ -lc++abi
and
-dynamiclib -install_name $(MODULENAME) \
-current_version 1.0.0 -compatibility_version 1.0
to generate the dylib output.
For comparison, on Linux, the analogs to these flags that work are approximately
-Wl,-export-dynamic -Wl,--no-undefined -shared -fPIC \
-fvisibility=default -Wl,-Bsymbolic
-Wl,-Bstatic -Wl,--whole-archive $(LIBGOMPSTATIC).a $(LIBGMPSTATIC) -Wl,--no-whole-archive \
-Wl,-Bdynamic -Wl,--no-as-needed -ldl -lpthread
and
-Wl,-soname,$(MODULENAME)
The dylib output
The above procedure gives me a dylib file that I can scan with nm to see the symbols I am trying to import with CTypes. This is a good start. When I try to run the test python script to test my CTypes wrapper library, I get a segfault immediately. Since gdb is apparently useless on Mac these days (sorry), I used the stock lldb to load up a brew-installed python3 with extra debugging symbols:
lldb /usr/local/Cellar/python-dbg\@3.7/3.7.6_13/bin/python3
(lldb) run myscript.py
Process 75435 launched: '/usr/local/Cellar/python-dbg@3.7/3.7.6_13/bin/python3' (x86_64)
Process 75435 stopped
* thread #2, stop reason = exec
frame #0: 0x0000000100005000 dyld`_dyld_start
dyld`_dyld_start:
-> 0x100005000 <+0>: popq %rdi
0x100005001 <+1>: pushq $0x0
0x100005003 <+3>: movq %rsp, %rbp
0x100005006 <+6>: andq $-0x10, %rsp
Target 0: (Python) stopped.
(lldb) bt
* thread #2, stop reason = exec
* frame #0: 0x0000000100005000 dyld`_dyld_start
(lldb) c
... redacted path information ...
File "/usr/local/Cellar/python-dbg@3.7/3.7.6_13/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 442, in LoadLibrary
return self._dlltype(name)
File "/usr/local/Cellar/python-dbg@3.7/3.7.6_13/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(GTFoldPython.dylib, 6): image not found
Process 75435 exited with status = 1 (0x00000001)
I do have each of the environment variables PYTHONAPPSDIR=/usr/local/Cellar/python-dbg@3.7/3.7.6_13, PYTHONPATH, LD_LIBRARY_PATH, DYLD_LIBRARY_PATH set to correct paths.
So the question is what do I try next to get this working? Many, many hours of thanks in advance!
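One thing that may help narrow down the "image not found" error: dlopen was given the bare name GTFoldPython.dylib, so the loader had to find it through its search paths. A minimal sketch that takes the search paths out of the picture by handing ctypes an absolute path is shown below. The helper names and the idea of a list of candidate directories are assumptions for illustration, not something from the question.

```python
import ctypes
import os


def resolve_library(name, search_dirs):
    """Return the absolute path of the first existing match for name, else None."""
    for directory in search_dirs:
        candidate = os.path.abspath(os.path.join(directory, name))
        if os.path.isfile(candidate):
            return candidate
    return None


def load_library(name, search_dirs):
    """Load a shared library by absolute path so dlopen does not search for it."""
    path = resolve_library(name, search_dirs)
    if path is None:
        raise FileNotFoundError(f"{name} not found in {search_dirs}")
    return ctypes.CDLL(path)
```

For example, load_library("GTFoldPython.dylib", ["."]) would fail early with a clear Python exception if the file is not where you expect, instead of an OSError raised from inside dlopen.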
In my case, the issue turned out to be two things.
The first is that I was running my python script with a different version of python than the C extensions were linked with. For example, the following is the output of my python3-config --ldflags command:
-L/usr/local/Cellar/python-dbg@3.7/3.7.6_13/Frameworks/Python.framework/Versions/3.7/lib/python3.7/config-3.7dm-darwin -lpython3.7dm -ldl -framework CoreFoundation
So running it with /usr/local/Cellar/python-dbg@3.7/3.7.6_13/bin/python3 caused errors for me. This can be resolved by running the script with /usr/local/Cellar/python-dbg@3.7/3.7.6_13/bin/python3.7dm. Not an obvious fix given that brew installs each with an only slightly modified tap formula.
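A quick way to spot this kind of mismatch is to compare the -lpython flag the extension was linked with against the interpreter that is actually running. This is only a sketch: it assumes a POSIX build of Python where sysconfig exposes LDVERSION (version plus ABI flags), and the function names are my own.

```python
import re
import sysconfig


def linked_python(ldflags):
    """Extract the version+ABI suffix from a -lpythonX.Y<abi> flag, or None."""
    m = re.search(r"-lpython(\d+\.\d+[a-z]*)", ldflags)
    return m.group(1) if m else None


def running_python():
    """Version+ABI suffix of the interpreter executing this script (may be None)."""
    return sysconfig.get_config_var("LDVERSION")


# The ldflags string reported by python3-config --ldflags in this answer.
ldflags = (
    "-L/usr/local/Cellar/python-dbg@3.7/3.7.6_13/Frameworks/Python.framework/"
    "Versions/3.7/lib/python3.7/config-3.7dm-darwin -lpython3.7dm -ldl "
    "-framework CoreFoundation"
)
print("linked against:", linked_python(ldflags))
print("running under: ", running_python())
```

If the two values differ (for example 3.7dm versus 3.7m), the script is being run by a different build than the one the C extension was linked against.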
Second, in my C code, I am frequently writing to an extern'ed character buffer that lives on the stack. When this happens, the default clang stack protection mechanisms throw a SIGABRT at the script. To avoid this, you can recompile by passing the following flags to both the compiler and linker (might be more disabling than is actually needed):
-fno-stack-check -fno-stack-protector -no-pie -D_FORTIFY_SOURCE=0
With these two fixes in place my script runs, although it still crashes for other reasons related to multithreading with Python in C. That is to be expected, though it has yet to show up in my testing on Linux.
Thanks to @l'L'l for helping me to work through this.
Ok, so I want to link against a lower version of libc / glibc, for compatibility. I noticed this answer about how to do this, on a case-by-case basis:
How can I link to a specific glibc version?
https://stackoverflow.com/a/2858996/920545
However, when I tried to apply this myself, I ran into problems, because I can't figure out which lower version number I should link against. Using the example in the answer, if I use "nm" to inspect the symbols provided by my /lib/libc.so.6 (which, in my case, is a link to libc-2.17.so), I see that it seems to provide versions 2.0 and 2.3 of realpath:
> nm /lib/libc.so.6 | grep realpath@
4878d610 T realpath@@GLIBC_2.3
48885c20 T realpath@GLIBC_2.0
However, if I try to link against realpath@GLIBC_2.0:
__asm__(".symver realpath,realpath@GLIBC_2.0");
...I get an error:
> gcc -o test_glibc test_glibc.c
/tmp/ccMfnLmS.o: In function `main':
test_glibc.c:(.text+0x25): undefined reference to `realpath@GLIBC_2.0'
collect2: error: ld returned 1 exit status
However, using realpath@GLIBC_2.3 works... and the code from the example, realpath@GLIBC_2.2.5 works - even though, according to nm, no such symbol exists. (FYI, if I compile without any __asm__ instruction, then inspect with nm, I see that it linked against realpath@GLIBC_2.3, which makes sense; and I confirmed that linking to realpath@GLIBC_2.2.5 works.)
So, my question is, how the heck do I know which version of the various functions I can link against? Or even which are available? Are there some other options I should be feeding to nm? Am I inspecting the wrong library?
Thanks!
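For reference when reading nm output: GNU nm prints versioned symbols as name@VERSION or name@@VERSION, where the double @@ marks the default version that unversioned references bind to, and a single @ marks an older version kept for compatibility. Below is a small sketch (function name and regular expression are my own) that lists the versions of one symbol from pasted nm output:

```python
import re

# address, type letter, then name@VERSION (compat) or name@@VERSION (default)
LINE = re.compile(r"^([0-9a-f]+)\s+(\w)\s+([^@\s]+)(@@?)(\S+)$")


def symbol_versions(nm_output, name):
    """Return [(version, is_default)] for `name` as found in nm output."""
    found = []
    for line in nm_output.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(3) == name:
            found.append((m.group(5), m.group(4) == "@@"))
    return found


sample = """\
4878d610 T realpath@@GLIBC_2.3
48885c20 T realpath@GLIBC_2.0
"""
for version, is_default in symbol_versions(sample, "realpath"):
    print(version, "(default)" if is_default else "(compat)")
```

The list is only meaningful for the library file that was actually inspected; a different libc on the same machine may export a different set of versions.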
It seems to me that you have mixed up your libraries and binaries a bit...
/lib/libc.so.6 on most Linux distributions is a 32-bit shared object and should contain the *@GLIBC_2.0 symbols. If you are on an x86_64 platform, though, I would expect GCC to produce a 64-bit binary by default. 64-bit binaries are generally linked against /lib64/libc.so.6, which would not contain compatibility symbols for an old glibc version like 2.0 - the x86_64 architecture did not even exist back then...
Try compiling your *@GLIBC_2.0 program with the -m32 GCC flag to force linking against the 32-bit C library.
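To confirm which flavour a given library or object file is, you can look at the EI_CLASS byte of the ELF header (byte 4: 1 means 32-bit, 2 means 64-bit). A minimal sketch, with hypothetical function names:

```python
def elf_class(header):
    """Return 32 or 64 given at least the first 5 bytes of an ELF file."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]  # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError(f"unknown EI_CLASS value {ei_class}")


def elf_class_of(path):
    """Read the ELF identification bytes from a file on disk."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

For example, comparing elf_class_of("/lib/libc.so.6") with elf_class_of("test_glibc") would show whether the program and the library that was inspected with nm are of the same class.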
I have a stack trace from an application that was built and run on CentOS 5.4. The application was built without debug so there are no symbols or line numbers in the stack trace, but only addresses, like so:
/opt/app/bin/myApp [0x22ec09e]
/opt/app/bin/myApp [0x1fcdf31]
/opt/app/bin/myApp [0x22ebbcb]
...
I also have the same application, but built with debug (-g). So I am able to open this binary with gdb and find out the corresponding source files, function names and line numbers corresponding to these addresses.
My question is: given this binary built with debug info on CentOS 5.4, does it matter on which OS I run gdb to resolve the symbols? If I open it with gdb on CentOS 5.4 and use info line or list, could the result differ from doing the same on, say, Fedora 16? I have done a few tests on CentOS 5.4 and Fedora 16, and they indicate that there is no difference. However, can I trust that this is always so, or could there be differences under certain circumstances?
Notes: Application was written in C++ and built with g++. Please let me know if any additional information is needed to answer this question.
does it matter on which OS I am using gdb to resolve the symbols?
No: the mapping of addresses to line numbers is fixed at binary link time. Once the binary is linked, you can perform the mapping on any OS you wish.
I also have the same application, but built with debug (-g).
Note that the mapping does change depending on optimization flags you used. This would work:
# original application build
g++ -O2 foo.cc bar.cc -o app
# same with debug symbols:
g++ -O2 -g foo.cc bar.cc -o app_g
This would not work (symbols between app and app_g2 will not match):
g++ -g foo.cc bar.cc -o app_g2
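If there are many frames to resolve, the lookup can also be batched outside gdb. The sketch below pulls the bracketed addresses out of a trace like the one in the question and builds an addr2line command line (-e selects the binary, -f prints function names, -C demangles C++ names). It assumes binutils' addr2line is available and that app_g is the matching debug build, as in the example above; the function names are my own.

```python
import re

FRAME = re.compile(r"\[(0x[0-9a-fA-F]+)\]")


def frame_addresses(trace):
    """Return the bracketed hex addresses from a stack trace, in order."""
    return FRAME.findall(trace)


def addr2line_command(debug_binary, addresses):
    """Build an argv list for addr2line; run it with subprocess if desired."""
    return ["addr2line", "-e", debug_binary, "-f", "-C", *addresses]


trace = """\
/opt/app/bin/myApp [0x22ec09e]
/opt/app/bin/myApp [0x1fcdf31]
/opt/app/bin/myApp [0x22ebbcb]
"""
print(" ".join(addr2line_command("app_g", frame_addresses(trace))))
```

As with gdb, the result is only trustworthy if the debug build used the same sources and optimization flags as the binary that produced the trace.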
I'm trying to make custom binaries for an initrd for an x86 system. I took the generic precompiled Debian 7 gcc (version 4.7.2-5) and compiled the kernel with it. The next step was to put a helloworld program in place of the init script in the initrd to check my progress. The helloworld program was compiled with the same gcc. When I tried to start my custom system, the kernel started with no problem, but the helloworld program hit an error:
kernel: init[24879] general protection ip:7fd7271585e0 sp:7fff1ef55070 error:0 in init[7fd727142000+20000]
(The numbers are not mine; I took a similar line from Google.) The helloworld program:
#include <stdio.h>
#include <unistd.h> /* declares sleep() */
int main(){
    printf("Helloworld\r\n");
    sleep(9999999);
    return 0;
}
Compilation:
gcc -static -o init test.c
Earlier I was stuck on the same problem on an ARM system (I took a generic compiler, compiled the kernel and some binaries with it, and tried to run them; the kernel ran, but the binaries did not). I solved it with a complete Buildroot system and used the Buildroot compiler in later projects.
So my question is: what is the difference between a gcc built as part of Buildroot and a generic precompiled gcc?
I know that the Buildroot compiler is built in several stages, with different libraries and so on. Is that the main difference, platform independence?
I don't need a solution; I can use Buildroot anytime. I want to know the source of my problem so I can avoid it in the future. Thanks.
UPD: I replaced sleep with while(1); and got the same result. My kernel output:
init[1]: general protection ip: 8053682 sp: bf978294 error: 0 in init[8048000+81000]
printk: 14300820 message suppressed.
and repeating every second.
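A note on reading these kernel lines: ip is the faulting instruction pointer, and the trailing name[base+size] gives the start address and size of the mapping it falls in. Subtracting base from ip gives the offset inside that mapping, which can then be looked up with tools such as addr2line or objdump. A small sketch of that arithmetic (the regular expression and function name are my own):

```python
import re

# ip: <hex> ... in <name>[<base>+<size>]
FAULT = re.compile(r"ip:\s*([0-9a-f]+).*?\bin\s+(\S+?)\[([0-9a-f]+)\+([0-9a-f]+)\]")


def fault_offset(log_line):
    """Return (image name, ip, offset of ip from the mapping base), or None."""
    m = FAULT.search(log_line)
    if not m:
        return None
    ip = int(m.group(1), 16)
    base = int(m.group(3), 16)
    return m.group(2), ip, ip - base


line = "init[1]: general protection ip: 8053682 sp: bf978294 error: 0 in init[8048000+81000]"
name, ip, offset = fault_offset(line)
print(name, hex(ip), hex(offset))
```

For a statically linked, non-relocated executable like this one, the ip value itself should already correspond to an address in the binary, so it is probably the more useful number here; the offset matters more for shared objects and position-independent executables.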
UPD2: I added vdso32-int80.so (original name, as in the kernel tree) and tested: no luck.
I added ld-linux.so (2 files: ld-2.13.so plus a symbolic link) and tested: same error.
The BusyBox approach allows running binaries without any of these libraries; I tested that on an ARM platform.
Thanks for trying to help me. Any other ideas?
I want to debug pthreads on my custom linux distribution but I am missing something. My host is Ubuntu 12.04, my target is an i486 custom embedded Linux built with a crosstool-NG cross compiler toolset, the rest of the OS is made with Buildroot.
I will list the facts:
I can run multi-threaded applications on my target
Google Breakpad fails to create a crash report when I run a multi-threaded application on the target. The exact same application with the exact same build of Breakpad libraries will succeed when I run it on my host.
GDB fails to debug multithreaded applications on my target.
e.g.
$./gdb -n -ex "thread apply all backtrace" ./a.out --pid 716
dlopen failed on 'libthread_db.so.1' - /lib/libthread_db.so.1: undefined symbol: ps_lgetfpregs
GDB will not be able to debug pthreads.
GNU gdb 6.8
I don't think ps_lgetfpregs is a problem because of this.
My crosstool build created the libthread_db.so file, and I put it on the target.
My crosstool build created the gdb for my target, so it should have been linked against the same libraries that I run on the target.
If I run gdb on my host, against my test app, I get a backtrace of each running thread.
I suspect the problem with Breakpad is related to the problem with GDB, but I cannot substantiate this. The only commonality is lack of multithreaded debug.
There is some crucial difference between my host and target that stops me from being able to debug pthreads on the target.
Does anyone know what it is?
EDIT:
Denys Dmytriyenko from TI says:
Normally, GDB is not very picky and you can mix and match different
versions of gdb and gdbserver. But, unfortunately, if you need to
debug multi-threaded apps, there are some dependencies for specific
APIs...
For example, this is one of the messages you may see if you didn't
build GDB properly for the thread support:
dlopen failed on 'libthread_db.so.1' - /lib/libthread_db.so.1: undefined symbol: ps_lgetfpregs
GDB will not be able to debug pthreads.
Note that this error is the same as the one that I get, but he doesn't go into detail about how to build GDB "properly".
and the GDB FAQ says:
(Q) GDB does not see any threads besides the one in which crash occurred; or SIGTRAP kills my program when I set a breakpoint.
(A) This frequently happen on Linux, especially on embedded targets. There are two common causes:
you are using glibc, and you have stripped libpthread.so.0
mismatch between libpthread.so.0 and libthread_db.so.1
GDB itself does not know how to decode "thread control blocks" maintained by glibc and considered to be glibc private implementation detail. It uses libthread_db.so.1 (part of glibc) to help it do so. Therefore, libthread_db.so.1 and libpthread.so.0 must match in version and compilation flags. In addition, libthread_db.so.1 requires certain non-global symbols to be present in libpthread.so.0.
Solution: use strip --strip-debug libpthread.so.0 instead of strip libpthread.so.0.
I tried a non-stripped libpthread.so.0 but it didn't make a difference. I will investigate any mismatch between pthread and thread_db.
This:
dlopen failed on 'libthread_db.so.1' - /lib/libthread_db.so.1: undefined symbol: ps_lgetfpregs
GDB will not be able to debug pthreads.
means that the libthread_db.so.1 library was not able to find the symbol ps_lgetfpregs in gdb.
Why?
Because I built gdb using crosstool-NG with the "Build a static native gdb" option, and this adds the -static option to gcc.
The native gdb is built with the -rdynamic option, and this populates the .dynsym symbol table in the ELF file with all symbols, even unused ones. libthread_db uses this symbol table to find ps_lgetfpregs from gdb.
But -static strips the .dynsym table from the ELF file.
At this point there are two options:
Don't build a static native gdb if you want to debug threads.
Build a static gdb and a static libthread_db (not tested)
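To check whether a given gdb binary actually exports ps_lgetfpregs through its dynamic symbol table, you can list that table with nm -D. The sketch below wraps that; it assumes GNU nm is available for the target's binaries, and the function names and sample output are made up for illustration.

```python
import subprocess


def parse_nm(nm_output):
    """Collect symbol names from nm lines of the form: address type name."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            names.add(fields[2].split("@")[0])  # drop any @VERSION suffix
    return names


def exports(nm_output, symbol):
    """True if `symbol` appears as a defined name in the given nm output."""
    return symbol in parse_nm(nm_output)


def dynamic_symbols(path):
    """Run `nm -D --defined-only` on path and return the defined names."""
    out = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_nm(out)


# Hypothetical nm -D output, not taken from a real gdb build.
sample = """\
0812abcd T ps_lgetfpregs
0812abff T ps_lgetregs
"""
print(exports(sample, "ps_lgetfpregs"))
```

On the target's gdb, "ps_lgetfpregs" in dynamic_symbols("./gdb") being false would match the dlopen failure described above.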
Edit:
By the way, this does not explain why Breakpad is unable to debug multithreaded applications on my target.
Just a thought... To use the gdb debugger, you need to compile your code with the -g option. For instance, gcc -g -c *.c.