I'm using linux and am trying to remote debug a program.
I launch gdbserver on the target, from .xinitrc with
gdbserver localhost:9134 /root/game/game
On my local pc, which I'm running inside a virtualbox vm, I connect to the target from gdb with
target remote 192.168.1.20:9134
and it connects fine.
I can set a breakpoint at main with
b main
and then I can continue and it will break there. I can single step for a ways until it gets to the call SDL_Init(), from which it will never return back to gdb.
If I don't single step to SDL_Init but instead set a breakpoint further on in the program, the program will start up and run normally (so it gets past SDL_Init). But when it reaches the breakpoint, it freezes up on the target machine and gdb on my local machine never shows a prompt. The entire thing hangs and must be restarted. It's not completely frozen, however, as the mouse pointer still moves on the target and you can ping it, but the gdb connection no longer works. So it seems that the graphics system somehow interferes with this, since breakpoints before the graphics system init do work, but not after.
I've tried setting the remotetimeout setting to 500 seconds and it exhibits the same behaviour. When I ping the remote target from my local pc the reported time is around 0.3 to 0.4 ms. So that doesn't seem out of the ordinary, but I wouldn't rule out any other misconfigured network settings on my part.
It's on a legacy system (but hey, it still makes money) with gdbserver version 6.8-19.fc10 and gdb version 6.8-29.fc10. Upgrading versions, while a very large headache, could be possible but probably should not be necessary (any upgrades I make to my pc have to also be done to a state regulator's system, as they use the gdb setup for their testing purposes, but it's not impossible). Remote debugging was working in the past before I took over the project, and no one who worked on it before is still around to ask. The gdbserver version definitely worked, as I'm using the exact program used previously.
Update 1:
I updated the gdb version on the host machine to version 7.0.1 and it still exhibits the same behavior. I couldn't do version 8 as it needs a C++11 compiler and the legacy system is before that time.
Update 2:
I've tried this in another virtual machine and I even built a fresh dedicated linux install (so no vm), rebuilt the software, and I get the same behavior. So it appears the issue is probably on the target machine's configuration.
Update 3:
I dug out a serial cable and finally got remote debugging set up via serial. It still doesn't work, but it gives me more error messages. I get the error
gdbserver: error initializing thread_db library: version mismatch between libthread_db and libpthread
which I think makes sense since my breakpoints quit working after the graphics system is initialized which involves creating some threads. After googling that error, I've tried using set solib-absolute-prefix, set solib-search-path, and set sysroot to the root directory on the host machine of a copy of the filesystem on the target machine (on the host, that is /fw_dev/fgs/cf/initrd/expand, which contains the filesystem that the initrd is made from)
But then when I try to set breakpoints, I get Error accessing memory address 0xb5eb60: Input/output error. I've also tried setting those variables to the lib subdirectory, which doesn't work either. I also tried just copying the local thread libraries from the host's /lib directory to the /lib on the target, but then x windows won't even start.
Update #4:
I tried launching gdb from the root of the copy of the target filesystem on the host (/fw_dev/fgs/cf/initrd/expand), and gdb still hangs on breakpoints but I no longer get the error message about libthread_db and libpthread mismatches, so back to the drawing board.
Update #5
Maybe I'm getting to where I should ask this as a separate question, but I compiled gdb, then ran gdb on itself. Then I used file to set it to the program on the host, set the remote target, set my breakpoints, and ran continue. When I get to the breakpoint, gdb hangs as always. But now when I press ctrl-c in gdb, I get this backtrace
#0 0x00110416 in __kernel_vsyscall ()
#1 0x00b3f39d in ___newselect_nocancel () from /lib/libc.so.6
#2 0x08203b9a in ser_base_wait_for (scb=0x96a2930, timeout=1) at ser-base.c:206
#3 0x08203c89 in do_ser_base_readchar (scb=0x96a2930, timeout=-1) at ser-base.c:256
#4 0x08204046 in generic_readchar (scb=0x96a2930, timeout=-1, do_readchar=0x8203c60 <do_ser_base_readchar>) at ser-base.c:326
#5 0x082040b0 in ser_base_readchar (scb=0x96a2930, timeout=-1) at ser-base.c:391
#6 0x081ecac2 in serial_readchar (scb=0x96a2930, timeout=-1) at serial.c:376
#7 0x080c4357 in readchar (timeout=<value optimized out>) at remote.c:5922
#8 0x080c5e35 in getpkt_or_notif_sane_1 (buf=0x839f140, sizeof_buf=0x839f144, forever=1, expecting_notif=0) at remote.c:6440
#9 0x080d1e0a in getpkt_sane (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:6534
#10 remote_wait_as (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:4736
#11 remote_wait (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:4843
#12 0x08184d4b in target_wait (ptid=..., status=0xbffff388, options=0) at target.c:2098
#13 0x0815daf2 in wait_for_inferior (treat_exec_as_sigtrap=0) at infrun.c:2028
#14 0x0815ddd4 in proceed (addr=4294967295, siggnal=TARGET_SIGNAL_DEFAULT, step=0) at infrun.c:1652
#15 0x08153729 in continue_1 (all_threads=0) at infcmd.c:668
#16 0x08153ea2 in continue_command (args=0x0, from_tty=0) at infcmd.c:760
#17 0x0808e9e8 in execute_command (p=0x83b89a1 "", from_tty=0) at top.c:453
#18 0x0816b028 in command_handler (command=0x83b89a0 "c") at event-top.c:511
#19 0x0816bd5a in command_line_handler (rl=0x8ce83e8 "\340&\266\b\340\230\321\b") at event-top.c:736
#20 0x0822d5a5 in rl_callback_read_char () at callback.c:205
#21 0x0816b17b in rl_callback_read_char_wrapper (client_data=0x0) at event-top.c:178
#22 0x0816ac54 in handle_file_event (data=...) at event-loop.c:812
#23 0x08169e6b in process_event () at event-loop.c:394
#24 0x0816aba4 in gdb_do_one_event (data=0x0) at event-loop.c:459
#25 0x0816500b in catch_errors (func=0x816a950 <gdb_do_one_event>, func_args=0x0, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#26 0x080f072a in tui_command_loop (data=0x0) at ./tui/tui-interp.c:153
#27 0x08165684 in current_interp_command_loop () at interps.c:291
#28 0x0808653b in captured_command_loop (data=0x0) at ./main.c:226
#29 0x0816500b in catch_errors (func=0x8086530 <captured_command_loop>, func_args=0x0, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#30 0x08085ecc in captured_main (data=0xbffff7a4) at ./main.c:902
#31 0x0816500b in catch_errors (func=0x80853d0 <captured_main>, func_args=0xbffff7a4, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#32 0x080851d1 in gdb_main (args=0xbffff7a4) at ./main.c:911
#33 0x08085195 in main (argc=128, argv=0x0) at gdb.c:33
So it seems gdb is hanging inside __kernel_vsyscall(). Doing a diff on libc.so.6 in the /lib directory on the host and the libc.so.6 on the target reveals differences. I've tried using LD_PRELOAD and LD_LIBRARY_PATH but that backtrace always shows /lib/libc.so.6 instead of pointing to the copy that the target has. Maybe I'm not setting them correctly, but I've tried setting them in gdb with set env and also setting them on the command line and exporting them, but to no effect. I also tried putting the libc from the host computer onto the target machine, and it won't even boot, it gets a segfault in libc.
So how do I get gdb to load different libraries?
Update #6:
So I made a bootable usb key using the target system's disk image as the base. I made minimal changes to it to get it to run on a standard PC, and added gdb and gdb's requisite libraries to it. So now libc is the same on both host and target machines, and it still hangs on me.
Final. While I know gdb 6.8 worked in the past, I can't figure out the configuration. After upgrading both gdb and gdbserver to 7.12 it worked.
Upgrading versions, while a very large headache, could be possible but probably should not be necessary...
This is the right option. All of the other issues you are encountering are because of this.
I've tried this in another virtual machine and I even built a fresh dedicated linux install (so no vm), rebuilt the software, and I get the same behavior. So it appears the issue is probably on the target machine's configuration.
You should build on the same version, architecture, etc. as the system you are attempting to deploy your code to.
But then when I try to set breakpoints, I get Error accessing memory address 0xb5eb60: Input/output error.
Per this answer,
Can be caused by 32/64 bit mixups. Check, for example, that you didn't attach to a 32-bit binary with a 64-bit process' ID, or vice versa.
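One way to rule out such a mixup is to read the ELF header of each file involved (the program being debugged, gdbserver, and the libraries gdb loads). The following is a minimal sketch, assuming the files are ELF binaries; elf_class and elf_class_of_file are hypothetical helper names, not part of gdb.

```python
ELF_MAGIC = b"\x7fELF"


def elf_class(header: bytes) -> str:
    """Classify an ELF file from its first bytes using the EI_CLASS field."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]  # byte 4 of e_ident: 1 = ELFCLASS32, 2 = ELFCLASS64
    if ei_class == 1:
        return "32-bit"
    if ei_class == 2:
        return "64-bit"
    raise ValueError("unknown EI_CLASS value: %d" % ei_class)


def elf_class_of_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

Calling elf_class_of_file on the binary on the target and on the copy that gdb loads on the host should give the same answer. The file command reports the same information.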
I also tried putting the libc from the host computer onto the target machine, and it won't even boot, it gets a segfault in libc.
Don't do that. As you've found out, it won't work.
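Rather than swapping libraries between machines, it may help to first see which ones actually differ between the host and the copy of the target filesystem. This is a rough sketch that compares files by SHA-256; the function names are hypothetical, and the directory arguments are whatever paths apply on your system.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_libs(host_lib_dir, target_lib_dir):
    """Return sorted names of files present in both directories whose contents differ."""
    differing = []
    for name in sorted(os.listdir(target_lib_dir)):
        host_path = os.path.join(host_lib_dir, name)
        target_path = os.path.join(target_lib_dir, name)
        # os.path.isfile follows symlinks, so a link such as libc.so.6
        # is compared by the content of the file it points to.
        if os.path.isfile(host_path) and os.path.isfile(target_path):
            if file_digest(host_path) != file_digest(target_path):
                differing.append(name)
    return differing
```

For the setup in the question this would be called as differing_libs("/lib", "/fw_dev/fgs/cf/initrd/expand/lib"). One caveat: absolute symlinks inside the target copy resolve against the host's root, so such entries would need to be checked by hand.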
So how do I get gdb to load different libraries?
Per this question, you can use LD_LIBRARY_PATH.
Here are some interesting suggestions. Have you tried attaching strace to gdbserver to see what kind of activity is going on during the hang? As others say, it could be a good way to go one step further in figuring out the problem.
You can do that with the following command on the target machine:
strace -p `pidof gdbserver`
Also sending a CONT signal to gdbserver may help when it hangs:
kill -CONT `pidof gdbserver`
When I run Intel pin with my custom pin-tool, for some reason it crashes with a segfault before even starting the application under test. It happens for one application, even though the same setup works for another application.
Here is an example of a successful run:
$ unset HOME && TEST_FILE=test000001.test pin -appdebug -t /home/necto/pin-trace.so -- ./executable1 <args to the executable>
Application stopped until continued from debugger.
Start GDB, then issue this command at the (gdb) prompt:
target remote :42312
(unset HOME is there for the application's sake) Here is an example of an unsuccessful run:
$ unset HOME && TEST_FILE=test000001.test pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
C: Tool (or Pin) caused signal 11 at PC 0x000000000
Segmentation fault
Note that it does not even open a socket for gdb to attach to.
When running it directly under gdb, it seems to fail differently (on SIGUSR1):
$ unset HOME && TEST_FILE=test000001.test gdb --args pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from pin...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/necto/pin/pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
process 185838 is executing new program: /home/necto/pin/intel64/bin/pinbin
Program received signal SIGUSR1, User defined signal 1.
0x00007ffff7edba1b in OS_BARESYSCALL_DoCallAsmIntel64Linux () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
(gdb) bt
#0 0x00007ffff7edba1b in OS_BARESYSCALL_DoCallAsmIntel64Linux () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#1 0x00007fffffffd3d0 in ?? ()
#2 0x00007ffff7edbb53 in OS_SyscallDo () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#3 0x00007ffff7eda4a3 in OS_SendSignalToThread () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#4 0x00007ffff7ed8f8a in OS_RaiseException () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#5 0x00007ffff7e87dad in raise () from /home/necto/pin/intel64/runtime/pincrt/libc-dynamic.so
#6 0x00005555558e747e in ?? ()
#7 0x00005555558e757e in LEVEL_INJECTOR::DoSystemChecks() ()
#8 0x00005555558db0ae in LEVEL_INJECTOR::UNIX_INJECTOR::Run() ()
#9 0x00005555558e0695 in LEVEL_INJECTOR::PIN_UNIX_ENVIRONMENT::LaunchPin() ()
#10 0x00005555558c8be5 in LEVEL_INJECTOR::PIN_ENVIRONMENT::Main() ()
#11 0x0000555555657cf9 in main ()
(gdb)
The backtrace looks like nothing familiar. How can I find out the cause of this segfault?
Edit
As per the suggestion of @Employed Russian, I let gdb pass the SIGUSR1 to pin, which helped it advance, but not by much:
(gdb) handle SIGUSR1 nostop noprint pass
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb) r
Starting program: /home/necto/pin/pin -appdebug -t /home/necto/pin-trace.so -- ./executable2 <args to the executable>
process 186041 is executing new program: /home/necto/pin/intel64/bin/pinbin
E: Attach to pid 186041 failed.
E: The Operating System configuration prevents Pin from using the default (parent) injection mode.
E: To resolve this, either execute the following (as root):
E: $ echo 0 > /proc/sys/kernel/yama/ptrace_scope
E: Or use the "-injection child" option.
E: For more information, regarding child injection, see Injection section in the Pin User Manual.
E:
Edit2
The problem is in my pin tool. My pin tool pin-trace.so calls a function from the user code (from the application). This function fails on an assertion in executable2, which becomes an exception in pin and, being unhandled, turns into a segfault.
When running it directly under gdb, it seems to fail differently (on SIGUSR1)
It looks like pin is trying to use SIGUSR1 internally. If you ask GDB to ignore this signal with handle SIGUSR1 nostop noprint pass, your GDB session will likely proceed further, hopefully to the crash on NULL pointer dereference.
In case it helps, this:
C: Tool (or Pin) caused signal 11 at PC 0x000000000
means that your pintool (or pin itself) called a NULL function pointer.
Komodo Edit crashed on my system, and I tried to debug it: added '-g' option inside komodo script,
and I got:
[New Thread 0xa80c2b70 (LWP 5102)]
[New Thread 0xa78c1b70 (LWP 5107)]
Program received signal SIGSEGV, Segmentation fault.
0xa97e1f10 in ?? () from /usr/lib/librsvg-2.so.2
(gdb) bt
#0 0xa97e1f10 in ?? () from /usr/lib/librsvg-2.so.2
#1 0x00000000 in ?? ()
(gdb) c
Continuing.
Operation not permitted
Is there any way to find out the real problem here?
I want to know where that last string, 'Operation not permitted', comes from. How can I find that out?
Many thanks!
added '-g' option inside komodo script,
When you say this, do you mean that you passed -g as a command-line argument?
If so, that won't work. -g (or -ggdb) needs to be passed to gcc, during compilation of Komodo Edit, so that debugging symbols are included in the output.
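A quick way to see whether a binary or library was built with debugging symbols is to look for the name of the DWARF section in the file. The sketch below is only a heuristic with hypothetical function names: it searches the raw bytes for the section name rather than parsing the section headers, so it reports nothing for files whose debug info is shipped separately, and readelf -S remains the authoritative check.

```python
# Section names are stored as plain strings in the ELF section-name table.
# ".zdebug_info" is the older GNU name for a compressed ".debug_info".
DEBUG_SECTION_NAMES = (b".debug_info", b".zdebug_info")


def looks_like_it_has_debug_info(data: bytes) -> bool:
    """Heuristic: True if the raw file bytes mention a DWARF info section name."""
    return any(name in data for name in DEBUG_SECTION_NAMES)


def file_looks_like_it_has_debug_info(path: str) -> bool:
    """Apply the heuristic to the file at `path`."""
    with open(path, "rb") as f:
        return looks_like_it_has_debug_info(f.read())
```

If this returns False for a library such as /usr/lib/librsvg-2.so.2, frames inside it will tend to show up as ?? () in the backtrace until the matching debug symbols are installed or the library is rebuilt with -g.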
I'm building a shared library on Linux. The library ".so" was successfully created, but when I tried to link it to a test application (with an empty main) and run the executable, I got a segmentation error: "Segmentation error (core dumped)"
When I tried to debug it with gdb and check the backtrace, I got this output:
Program received signal SIGSEGV, Segmentation fault.
0x0073d5df in std::_Rb_tree_decrement(std::_Rb_tree_node_base*) () from /usr/lib/libstdc++.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12.1-4.i686 libgcc-4.4.5-2.fc13.i686 libstdc++-4.4.5-2.fc13.i686 zlib-1.2.3-23.fc12.i686
(gdb) backtrace
#0 0x0073d5df in std::_Rb_tree_decrement(std::_Rb_tree_node_base*) () from /usr/lib/libstdc++.so.6
#1 0x0012d70c in ?? () from /opt/cuda/lib/libcudart.so.3
#2 0x0012df0c in ?? () from /opt/cuda/lib/libcudart.so.3
#3 0x0012c88a in ?? () from /opt/cuda/lib/libcudart.so.3
#4 0x00121435 in __cudaRegisterFatBinary () from /opt/cuda/lib/libcudart.so.3
#5 0x005d7bfd in __sti____cudaRegisterAll_55_tmpxft_00000fe6_00000000_26_MonteCarloPaeo_SM10_cpp1_ii_3a8af011() () from libsharedCUFP.so
#6 0x005db40d in __do_global_ctors_aux () from libsharedCUFP.so
#7 0x005a8748 in _init () from libsharedCUFP.so
#8 0x008abd00 in _dl_init_internal () from /lib/ld-linux.so.2
#9 0x0089d88f in _dl_start_user () from /lib/ld-linux.so.2
I'm not familiar with gdb debugging, and it's the first time I'm trying to build a shared library on Linux, but it seems to me that it has something to do with the library's dynamic linking.
If anyone had any idea about this error and could help me, I would be grateful.
It doesn't have anything to do with dynamic linking or shared libraries - one of the constructors in libsharedCUFP.so (I assume this is your shared library) is most probably passing an illegal address to a function in libcudart.so which crashes.
You simply need to debug your code.
This is on a Redhat EL5 machine w/ a 2.6.18-164.2.1.el5 x86_64 kernel using gcc 4.1.2 and gdb 7.0.
When I run my application with gdb and break in while it's running, several of my threads show the following call stack when I do a backtrace:
#0 0x000000000051d7da in pthread_cond_wait ()
#1 0x0000000100000000 in ?? ()
#2 0x0000000000c1c3b0 in ?? ()
#3 0x0000000000c1c448 in ?? ()
#4 0x00000000000007dd in ?? ()
#5 0x000000000051d630 in ?? ()
#6 0x00007fffffffdc90 in ?? ()
#7 0x000000003b1ae84b in ?? ()
#8 0x00007fffffffdd50 in ?? ()
#9 0x0000000000000000 in ?? ()
Is this a symptom of a common problem?
Is there a known issue with viewing the call stack while waiting on a condition?
The problem is that pthread_cond_wait is written in hand-coded assembly, and apparently doesn't have a proper unwind descriptor (required on x86_64 to unwind the stack) in your build of glibc. This problem may have recently been fixed here.
You can try to build and install the latest glibc (note: if you screw up installation, your machine will likely become unbootable; approach with extreme caution!), or just live with "bogus" stack traces from pthread_cond_wait.
Generally, synchronization is required when multiple threads share a single resource.
In such a case, when you interrupt the program, you'll see only 1 thread is running (i.e., accessing the resource) and other threads are waiting within pthread_cond_wait().
So I don't think pthread_cond_wait() itself is problematic.
If your program hangs in a deadlock or its performance doesn't scale, pthread_cond_wait() might be involved.
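To illustrate the pattern (not pthreads itself), here is a small sketch using Python's threading.Condition, which plays the same role as a pthread condition variable. The function name run_waiters is made up for this example.

```python
import threading


def run_waiters(n_waiters=3):
    """Start threads that block on a condition, then wake them all."""
    cond = threading.Condition()
    state = {"ready": False}
    woken = []

    def waiter(i):
        with cond:
            # Re-check the predicate in a loop, as is usual with
            # condition variables, in case of a spurious wakeup.
            while not state["ready"]:
                cond.wait()  # releases the lock and blocks until notified
            woken.append(i)

    threads = [threading.Thread(target=waiter, args=(i,)) for i in range(n_waiters)]
    for t in threads:
        t.start()

    # Interrupting the program around here would typically find the
    # waiter threads parked inside wait(), which is normal, not a bug.
    with cond:
        state["ready"] = True
        cond.notify_all()

    for t in threads:
        t.join()
    return sorted(woken)
```

Seeing many threads parked in a wait like this is the expected idle state. It only points to a problem if nothing ever sets the predicate and notifies them.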
That looks like a corrupt stack trace to me
for example:
#9 0x0000000000000000 in ?? ()
There shouldn't be code at NULL
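When comparing traces like this across threads or builds, it can help to pick out the frames gdb could not resolve. Below is a minimal sketch that parses the text of a bt listing. The regular expression and function names are my own and only cover the simple frame format shown in the question.

```python
import re

# Matches lines such as "#3 0x0000000000c1c448 in ?? ()".
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+) in )?(\S+)\s*\(")


def parse_frames(backtrace: str):
    """Return a list of (frame_index, address_or_None, symbol) tuples."""
    frames = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            address = int(m.group(2), 16) if m.group(2) else None
            frames.append((int(m.group(1)), address, m.group(3)))
    return frames


def suspicious_frames(frames):
    """Frames with no symbol ('??') or a NULL program counter."""
    return [f for f in frames if f[2] == "??" or f[1] == 0]
```

A trace where everything below the first frame is flagged does not by itself prove the stack is corrupt. Missing unwind information, as described in the other answer, produces the same picture.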