How can I force a process to return from a call to select() while debugging it - linux

I'm running a server under gdb, and it's currently blocked in a call to select.
I want to make it return from select, after which I can manually modify the fd sets and see how execution continues.
I tried to put a breakpoint on the next line after the call to select(), and issued the command 'signal SIGINT', but that did nothing other than printing 'Continuing with signal SIGINT'.
edit: I'm actually debugging using vgdb, maybe that's the issue?

You could try the jump command. This takes a location similar to how break takes a location, so you can specify a line, or an address as *ADDR. Try help jump at the gdb prompt for more information.
I've generally had the most success with this command when the jump distance is small; otherwise too much program state is wrong for the program to do anything sane. Jumping out of a system call might work well, though, especially if the plan is to patch up the return state anyway.

Valgrind gdbserver+vgdb has only partially supported the GDB command 'signal sig'
since version 3.11 (released on 23 September 2015).
Versions 3.10 and earlier completely ignore the GDB instruction
to continue with a signal or to change the signal.
In release 3.11, 'signal SIG' is partially supported: if the process
reported a signal to GDB, the signal can be ignored (using signal 0)
or can be changed (using signal othersignr).
The Valgrind gdbserver does not currently support raising a signal from GDB.
Also, when a thread is blocked in a system call, the Valgrind gdbserver
will not accept the GDB instruction to 'jump' out of or 'return' from the
syscall.

Related

weird behavior setting RIP with ptrace

Basically, I am using ptrace to inject shellcode into a remote process for execution, but I found some weird behavior regarding the RIP register.
What I do is copy my shellcode to the start address of where the program is mapped. Then I use ptrace to set RIP to that start address and resume the target process so that it executes the code. Once the shellcode finishes (by running int3), I get a signal and restore the code that I overwrote.
It works fine except when the remote process is blocked inside a system call, such as sleep. If the remote process is blocked inside a system call at the moment I attach to it, then after I set RIP to where I want my shellcode to execute and resume the target process, I observe that RIP is actually 2 less than the address I passed in the ptrace call. For example, if I set RIP to 0x4000, once I resume the process RIP becomes 0x3ffe. In my case this typically crashes with a segmentation fault, obviously. But if I read the registers right after setting them, without resuming the process, RIP has the value that I just set. Currently I work around it by inserting 2 nop instructions ahead of my shellcode and always adding 2 when I set RIP. I just want to know whether I am missing something when setting RIP, or whether my whole method for injecting code is unstable.
My dev box is Ubuntu14.04, kernel is 3.13.0-45-generic.
If I recall correctly, if you interrupt the process while it's blocked in a syscall, the kernel subtracts sizeof(syscall instruction) from the program counter when the process continues. So once you do a PTRACE_DETACH, the process re-executes the syscall it was interrupted in.
I worked around the problem the same way you did (always adding a tiny nop-sled and incrementing RIP).
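The workaround described above can be sketched as two small helpers. This is an illustration under assumptions, not the poster's code: the names build_padded_payload and rip_to_set are hypothetical, and the 2-byte length assumes an x86 target, where both `syscall` (0f 05) and `int $0x80` (cd 80) are two bytes long. The actual ptrace calls that write the buffer and the registers are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the x86 syscall instruction the kernel may back up over. */
enum { SYSCALL_INSN_LEN = 2 };

/* Copies `payload` into `out` behind SYSCALL_INSN_LEN NOP bytes (0x90).
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t build_padded_payload(const uint8_t *payload, size_t len,
                            uint8_t *out, size_t out_cap)
{
    assert(payload != NULL && out != NULL);
    if (len > out_cap || out_cap - len < SYSCALL_INSN_LEN)
        return 0;
    memset(out, 0x90, SYSCALL_INSN_LEN);           /* the tiny nop-sled */
    memcpy(out + SYSCALL_INSN_LEN, payload, len);  /* the real code */
    return len + SYSCALL_INSN_LEN;
}

/* RIP value to write when the padded payload is loaded at `load_addr`.
 * If the kernel subtracts SYSCALL_INSN_LEN, execution starts on the NOPs;
 * if it does not, execution starts directly on the payload. */
uint64_t rip_to_set(uint64_t load_addr)
{
    return load_addr + SYSCALL_INSN_LEN;
}
```

With the padded buffer written at 0x4000, rip_to_set(0x4000) yields 0x4002, so both the adjusted and the unadjusted program counter land on valid instructions.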

How to detect restart before terminating program on Linux

I want to detect that the system is restarting before it terminates my program on Linux.
I tried using the /var/run/utmp file to detect the runlevel and put an inotify watch on its changes, but it seems the system closes my program before I get the signal. I catch the shutdown this way if I set the runlevel with the telinit command, but not if I just restart with the button in the top-right corner in Ubuntu.
Any idea how this can be done?
Catch the SIGTERM signal, be quick about saving or doing whatever you need, and then exit. You've got approximately 10 seconds (the exact grace period depends on the init system and its configuration) before you get SIGKILL, which you can't catch, and you are forcibly terminated.
If the system isn't sending you a SIGTERM to allow a proper shutdown, switch to a system that does; this is the standard way of doing it.
See man 7 signal and man 2 sigaction for signal handling.
(Note that I don't know of a standard way to check whether a system is rebooting; I don't think such a thing exists. But as mentioned above, a proper system will send you SIGTERM and let you do your cleanup and exit. A hard reboot is excluded, because that's almost equivalent to pulling the power cord.)

Getting a backtrace of other thread

In Linux, to get a backtrace you can use the backtrace() library call, but it only returns the backtrace of the current thread. Is there any way to get a backtrace of some other thread, assuming I know its TID (or pthread_t) and can guarantee that it is sleeping?
It seems that libunwind (http://www.nongnu.org/libunwind/) project can help. The problem is that it is not supported by CentOS, so I prefer not to use it.
Any other ideas?
Thanks.
I implemented that myself here.
Initially, I wanted to implement something similar to what is suggested here, i.e. somehow getting the top frame pointer of the thread and unwinding it manually (the linked source is derived from Apple's backtrace implementation, so it might be Apple-specific, but the idea is generic).
However, to make that safe (and the source above is not, and may even be broken anyway), you must suspend the thread while you access its stack. I searched around for different ways to suspend a thread and found this, this and this. Basically, there is no really good way. The common hack, also used by the HotSpot Java VM, is to use signals and send a custom signal to your thread via pthread_kill.
So, since I would need such a signal hack anyway, I can make it a bit simpler and just use backtrace inside the signal handler, which is executed in the target thread (as also suggested here by sandeep). This is basically what my implementation does.
If you are also interested in printing the backtrace, i.e. get some useful debugging information (function name, source code filename, source code line number, ...), read here about an extended backtrace_symbols based on libbfd. Or just see the source here.
Signal handling combined with backtrace can serve your purpose.
That is, if you have the ID of the thread, you can raise a signal for that thread and call backtrace in the handler. Since the handler executes in that particular thread, the backtrace it produces is the output you need.
gdb provides these facilities for debugging multi-thread programs:
automatic notification of new threads
‘thread thread-id’, a command to switch among threads
‘info threads’, a command to inquire about existing threads
‘thread apply [thread-id-list] [all] args’, a command to apply a command to a list of threads
thread-specific breakpoints
‘set print thread-events’, which controls printing of messages on thread start and exit.
‘set libthread-db-search-path path’, which lets the user specify which libthread_db to use if the default choice isn't compatible with the program.
So just switch to the required thread in GDB with the command 'thread thread-id'.
Then run 'bt' in that thread's context to print its backtrace.

Can I instruct gdb to run commands in response to SIGTRAP?

I'm debugging a reference leak in a GObject-based application. GObject has a simple built-in mechanism to help with such matters: you can set the g_trap_object_ref variable in gobject.c to the object that you care about, and then every ref or unref of that object will hit a breakpoint instruction (via G_BREAKPOINT()).
So sure enough, the program does get stopped, with gdb reporting:
Program received signal SIGTRAP, Trace/breakpoint trap.
g_object_ref (_object=0x65f090) at gobject.c:2606
2606 old_val = g_atomic_int_exchange_and_add ((int *)&object->ref_count, 1);
(gdb) _
which is a great start. Now, normally I'd script some commands to be run at a breakpoint I manually set using commands 3 (for breakpoint 3, say). But the equivalent for SIGTRAP, namely handle SIGTRAP, doesn't give me the option of doing anything particularly interesting. Is there a good way to do this?
(I'm aware that there are other ways to debug reference leaks, such as setting watchpoints on the object's ref_count field, refdbg, and scripting regular breakpoints on g_object_ref() and g_object_unref(). I'm about to go try some of those now. I'm looking specifically for a way to script a response to SIGTRAP. It might come in useful in other situations, too, and I'd be surprised if gdb doesn't support this.)
Do you want to show some values and continue execution of the program? In that case, just define a macro that displays the values you're interested in, continues execution and calls itself recursively:
define c
echo do stuff\n
continue
c
end
GDB doesn't support it.
In general, attaching a command script to a signal makes little sense: your program could be receiving SIGTRAP in any number of places, and the command would not know whether a particular SIGTRAP arrived in the expected context or not.

Perl: How to add an interrupt handler so one can control a code executed by mpirun via system()?

We use a cluster with Perceus (warewulf) software to do some computing. This software package has wwmpirun program (a Perl script) to prepare a hostfile and execute mpirun:
# ...
system("$mpirun -hostfile $tmp_hostfile -np $mpirun_np @ARGV");
# ...
We use this script to run a math program (CODE) on several nodes. CODE is normally supposed to be stopped with Ctrl+C, which brings up a short menu with the options status, stop, and halt. However, when running with MPI, pressing Ctrl+C kills CODE outright, with loss of data.
The developers of CODE suggest a workaround: the program can be stopped by creating a file named stop%s, where %s is the name of the task file being executed by CODE. This lets us stop the program, but we cannot get the status of the calculation. Sometimes a calculation takes a really long time, and getting this function back would be much appreciated.
Do you think the problem is in CODE or in mpirun?
Is there a way to communicate with CODE when it is executed by mpirun?
UPDATE1
In a single run, one gets the status of the calculation by pressing Ctrl+C and choosing the status option in the menu by entering s. CODE prints the status information to STDOUT and continues the calculation.
"we cannot get status of calculation" - what does that mean? Do you expect to get the status somehow but aren't getting it? Or is the software not designed to give you the status?
Your system call doesn't redirect standard error/output anywhere. Is that where the status is supposed to appear? (If so, capture it by opening a pipe, or by redirecting to a log and having the wrapper read the log.)
Also, you're not checking the return value of the system call; that may be another way the program communicates.
Your Ctrl+C problem might arise because Ctrl+C is caught by the Perl wrapper, which dies, instead of by CODE, which has a nice Ctrl+C interrupt handler. The solution might be to add an interrupt handler to the wrapper around mpirun: see Perl Cookbook Recipe 16.18 for $SIG{INT} or http://www.wellho.net/resources/ex.php4?item=p216/sigint ; you may want to have the Perl wrapper catch Ctrl+C and send the INT signal to the CODE it launched.
