I am debugging a core file that was generated in a multi-threaded environment. The process crashed after it received a SIGABRT. The crash seems to be a bit tricky, and I want to know the execution state of all the threads when the crash happened. I guess the simple backtrace command only gives the execution state of the thread that was running during the crash. What command do I need to use to get the backtrace for all threads?
Maybe you are looking for thread apply all bt?
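For a core file, the same command can also be run non-interactively. A minimal sketch, assuming gdb is installed; ./myapp, ./core and all_threads_bt.txt are hypothetical names for the executable, its core file and the output file:

```shell
# Hypothetical paths: replace with the real executable and core file.
program=./myapp
core=./core
out=all_threads_bt.txt

if command -v gdb >/dev/null 2>&1 && [ -e "$program" ] && [ -e "$core" ]; then
    # -batch makes gdb exit after running the -ex command.
    gdb -batch -ex "thread apply all bt" "$program" "$core" > "$out" 2>&1
    status=done
    echo "backtraces of all threads written to $out"
else
    status=skipped
    echo "gdb, $program or $core not found; nothing to do"
fi
```

Writing the output to a file makes it easier to search through the backtraces when the process has many threads.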
If I run, for example,
int x = *(int *)0x00000;
The program crashes. But why does the whole program crash instead of that single thread? I created multiple threads that just sleep continuously to test this out. Is there any way to make only the current thread exit, not the whole program (on windows using winapi)?
Thanks.
But why does the whole program crash instead of that single thread?
This is by design. If nobody (not the debugger if attached and not the process itself) handles the user mode exception, the system terminates the process, and this is logical.
All resources are shared per process, not per thread. After an unhandled exception happens the process is probably in an unstable/corrupted state. New exceptions will occur or the other threads might hang.
The thread could, for example, own a critical section or another resource at the time of the exception. If the thread terminates at this point, the resource stays held by the crashed thread forever. When another thread tries to enter this "critical section" (in a broad sense, for example the heap's critical section), it hangs forever.
So better just terminate the process instead of getting new exceptions and undefined behavior in the process.
By the same line of reasoning, if an unhandled exception occurs in kernel mode, the system terminates itself and shows a BSOD, because after an unhandled exception in the kernel the whole system is in an unstable state and simply terminating the faulty thread is not a solution.
Is there any way to make only the current thread exit, not the whole program (on windows using winapi)?
Formally yes, and it's easy: you can install a top-level exception filter with the SetUnhandledExceptionFilter function and, inside that filter, simply call TerminateThread for the current thread (GetCurrentThread()), because
The exception handler specified by lpTopLevelExceptionFilter is executed in the context of the thread that caused the fault.
Also note that this callback is called only if the process is not being debugged.
However, terminating the thread is not a proper solution. The proper solution is that there must be no unhandled exceptions in your process: either they do not occur, or you handle them. If you cannot do that, the process must end.
I have a node server running. Once in a while, the main thread hangs and goes to 100% CPU usage. The thread is totally hung and not processing any further events whatsoever.
Unfortunately, because of this, even attaching the node debugger is not useful since the thread is hung somewhere (I ran node's debugger and attached to the stalled process but [for example] 'pause' or 'bt' does not return).
How can I figure out where it is hanging? Is it possible to have node keep track of the current closure stack so that I can get access to it retrospectively when the bug occurs again?
One low-level method of checking is to use a utility like strace. You can use it like this: strace -p <node pid>. This will only show syscalls, however, so if your program is stuck in some kind of infinite loop that makes no syscalls (for example, no I/O), you won't see any output.
You might also try using llnode to attach to the live process to get a more node-friendly interface to the node process (compared to using gdb).
As far as seeing what handles/requests are active in the node process, there are a couple of "private" (underscore-prefixed) methods available if you are feeling adventurous: process._getActiveHandles() and process._getActiveRequests(). You might use those functions in conjunction with a module like blocked which helps detect when the event loop is executing slower than whatever threshold you want.
I have a Fortran program that runs a series of identical calculations on a number of different input data. After doing these calculations the code then always writes a GNUplot script that does some diagnostic plotting (nothing too difficult) and runs it using execute_command_line in Linux.
This usually works well, but after some time I think there must be a memory leak of some kind that accumulates, because the GNUplot plotting becomes slower and slower. At some point it virtually stalls.
My question is therefore: Is it possible to interrupt the call to execute_command_line using the keyboard without killing the main Fortran program? Needless to say, CTRL-C kills everything, which is not what I want: I want the main program to continue.
I have been playing with the optional flag wait=.true. but this does not help.
Also, I know that the memory leak has to be fixed (or whatever the cause is), but for now I would like to first see the diagnostic output.
The only solution I have been able to come up with is kind of a workaround:
Modify the shell script so that it
runs the Fortran program in the background: ./mpirun prog_name options &
gets the PID of this process: proc_PID=$!
waits for the process: wait $proc_PID
traps an interrupt signal: trap handler SIGINT
lets the handler send a SIGUSR1 signal: function handler() { kill -SIGUSR1 $proc_PID; }
modify the Fortran code so that it catches the SIGUSR1 signal and does what you want with it. For example by having a look here.
By running the MPI process in the background, you avoid killing mpirun with SIGINT, which cannot be trapped; instead you send a SIGUSR1, which is properly propagated to the MPI processes, where it can be handled directly.
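The wrapper described above can be sketched roughly as follows. This is only an illustration under some assumptions: worker is a hypothetical stand-in for ./mpirun prog_name options (it catches SIGUSR1 the way the modified Fortran code would), the log file exists only so the result can be inspected, and the line that sends SIGINT to the script simulates a single Ctrl-C.

```shell
# Sketch: turn Ctrl-C (SIGINT) in the wrapper into SIGUSR1 for a background job.
log=$(mktemp)

worker() {
    # Stand-in for the Fortran-side SIGUSR1 handler.
    trap 'echo "worker: got SIGUSR1" >> "$log"' USR1
    for i in 1 2 3 4 5; do sleep 0.2; done
    echo "worker: finished normally" >> "$log"
}

worker &                      # run in the background ...
proc_PID=$!                   # ... and remember its PID

handler() { kill -USR1 "$proc_PID"; }
trap handler INT              # install the trap before waiting

# Simulate the user pressing Ctrl-C once; remove this line for interactive use.
( sleep 0.3; kill -INT $$ ) &

# wait returns early (non-zero) when a trapped signal arrives, so keep
# waiting until the worker has really exited.
while kill -0 "$proc_PID" 2>/dev/null; do
    wait "$proc_PID" || true
done

cat "$log"
```

Note that the trap is installed before the first wait, and that wait is called in a loop: a single wait would return as soon as the first SIGINT is handled, and the wrapper would exit while the job is still running.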
As a side note, however, I realized that this will not solve my problem, as my problem was related to an external call to gnuplot using execute_command_line. Since I had a cumulative memory leak, at some point this call started taking forever because memory became scarcer. So the only thing I could have done is manually kill the gnuplot process.
Better, of course, was fixing the memory leak, which I did.
I have an application that I am debugging, and I'm trying to understand how gdb works and why I am sometimes not able to step through the application. The problem I am experiencing is that gdb hangs and the process it is attached to enters a defunct state while I am stepping through the program. After gdb hangs, I have to kill it to free the terminal (Ctrl-C does not work; I have to do this from a different terminal window by getting the process ID of that gdb session and using kill -9).
I'm guessing that gdb is hanging because it's waiting for the application to stop at the next instruction and somehow the application finished execution without gdb identifying this. But that's just speculation on my part from the behavior I've observed thus far. So my question is if anyone has seen this type of behavior before and/or could suggest what the cause might be. I think that might help me improve my debugging strategy.
In case it matters I'm using g++ 4.4.3, gdb 7.1, running on Ubuntu 10.04 x86_64.
I had a similar problem and solved it by sending a CONT signal to the process being debugged.
I'd say the debugged process wouldn't sit idle if it were the cause of the hang. Every time GDB completes a step, it has to update any expressions you asked it to display. That may involve following pointers and so on, and in some cases it may fail there (although I don't recall a real "hang" from that). It also typically tries to update your stack trace. If the stack has been corrupted and is no longer coherent, GDB could get trapped in an endless loop. Attaching strace to gdb to see what kind of activity is going on during the hang could be a good way to get one step closer to figuring out the problem.
(e.g. accessing sources through a no-longer-working NFS/SSHFS mount is one of the most frequent reasons for gdb to hang here :P)
I have a server program that doesn't have a very clean/graceful shutdown (it is not supposed to terminate in general). When tracing memory leaks, I run it under valgrind, but in the end I have to kill the process with a signal (^C). Generally I try to terminate the process when things are quiet, but even then some threads might be busy processing jobs, and the memory they hold causes false alarms. To assist such analysis, is there any way (or tool) in valgrind to print the backtrace of the threads when the program exits (by a signal)?
I know it's inconvenient, but could you get your program to dump core when it gets this signal, then diagnose the core dump with gdb?
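One way to get a core from a running process is to raise the core size limit and send it a signal whose default action is to dump core, such as SIGABRT. A rough sketch, with sleep as a hypothetical stand-in for the server; where the core file ends up depends on the system's core pattern, and a program that installs its own handler for the signal will not dump core this way:

```shell
# Allow core files for this shell and its children
# (a hard limit may refuse this, hence the fallback).
ulimit -c unlimited 2>/dev/null || true

sleep 30 &            # hypothetical stand-in for the long-running server
pid=$!

kill -ABRT "$pid"     # default action of SIGABRT: terminate and dump core

status=0
wait "$pid" || status=$?
# For a process killed by a signal, the shell reports 128 + signal number.
echo "exit status: $status"
```

The resulting core file can then be opened with gdb and inspected with thread apply all bt.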
I'm not sure I quite understand your question, but you can print the backtrace of all pthreads with gdb:
thread apply all bt