QThread dumps core - multithreading

I'm looking at a program that crashes, leading to a useless (or so it seems) core dump. I didn't write the program but I'm trying to find what may be the cause.
First strange thing is that the core dump is named after QThread instead of my executable itself.
Then inside the backtrace, there's no hint at line numbers of the program itself:
$ gdb acqui ../../appli/core.QThread.31667.1448795278
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./acqui'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fcf4a1ce107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fcf4a1ce107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fcf4a1cf4e8 in __GI_abort () at abort.c:89
#2 0x00007fcf4aab9b3d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007fcf4aab7bb6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007fcf4aab7c01 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007fcf4aab7e69 in __cxa_rethrow () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007fcf4b8707db in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQtCore.so.4
#7 0x00007fcf4b764e99 in QThread::exec() () from /usr/lib/x86_64-linux-gnu/libQtCore.so.4
#8 0x00007fcf4b76770f in ?? () from /usr/lib/x86_64-linux-gnu/libQtCore.so.4
#9 0x00007fcf4ad6c0a4 in start_thread (arg=0x7fcf0b7fe700) at pthread_create.c:309
#10 0x00007fcf4a27f04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) info threads
Id Target Id Frame
16 Thread 0x7fcf297fa700 (LWP 31676) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
15 Thread 0x7fcf28ff9700 (LWP 60474) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
14 Thread 0x7fcf08ff9700 (LWP 60516) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
13 Thread 0x7fcf0bfff700 (LWP 60513) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
12 Thread 0x7fcf3932c700 (LWP 60494) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
11 Thread 0x7fcf29ffb700 (LWP 60444) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
10 Thread 0x7fcf39b2d700 (LWP 31668) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
9 Thread 0x7fcf2affd700 (LWP 31673) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
8 Thread 0x7fcf2bfff700 (LWP 31671) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
7 Thread 0x7fcf38b2b700 (LWP 60432) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
6 Thread 0x7fcf2a7fc700 (LWP 31674) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
5 Thread 0x7fcf4d4f9780 (LWP 31667) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
4 Thread 0x7fcf097fa700 (LWP 60430) pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
3 Thread 0x7fcf09ffb700 (LWP 31682) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
2 Thread 0x7fcf0affd700 (LWP 31680) 0x00007fcf4a27650d in poll () at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7fcf0b7fe700 (LWP 31679) 0x00007fcf4a1ce107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
I'm at a loss as to where to start. Is it a problem in how QThread is used? Something else? How can I enable more (or better) debugging info? The program itself is compiled with -g -ggdb.

This part:
#4 0x00007fcf4aab7c01 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007fcf4aab7e69 in __cxa_rethrow () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
... means that the code in question is re-throwing an exception, but there is no exception handler for it. So, the runtime calls std::terminate.
This is a programming error, though exactly what to do depends on your libraries and program -- maybe not re-throw, maybe install an outermost exception handler and log a message, etc.

Related

When does thread get into uncancellable sleep state

I have this piece of code that works perfectly in the normal case. However, sometimes the thread gets into an uncancelable sleep state.
That is, from the state of the process, I see the thread end up in https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/nanosleep_nocancel.c.html#__nanosleep_nocancel
struct timespec convertticktotimespec(unsigned long numticks)
{
    struct timespec tm;
    /* separate the integer and decimal portions */
    long nanoseconds =
        ((numticks / (float)sysconf(_SC_CLK_TCK)) - floor(numticks / (float)sysconf(_SC_CLK_TCK))) *
        NANOSEC_MULTIPLIER;
    tm.tv_sec = numticks / sysconf(_SC_CLK_TCK);
    tm.tv_nsec = nanoseconds;
    return tm;
}
void *thread(void *args)
{
    struct_S *s = (struct_S *)args;
    while (1)
    {
        s->var = 1;
        struct timespec tm = convertticktotimespec(sysClkRateGet() * 13);
        if (0 != nanosleep(&tm, NULL)) {
            perror("nanosleep");  /* perror takes a string, not a function pointer */
        }
    }
}
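As an aside, the float round-trip in convertticktotimespec loses precision for large tick counts. An integer-only version (hypothetical name, assuming NANOSEC_MULTIPLIER is 1000000000) is a safer sketch:

```cpp
#include <time.h>
#include <unistd.h>

#define NANOSEC_MULTIPLIER 1000000000LL

// Integer-only tick-to-timespec conversion (hypothetical replacement for
// convertticktotimespec): no float round-trip, so large tick counts do
// not lose precision.
struct timespec tick_to_timespec(unsigned long numticks)
{
    long hz = sysconf(_SC_CLK_TCK);
    struct timespec tm;
    tm.tv_sec  = numticks / (unsigned long)hz;
    tm.tv_nsec = (long)(((long long)(numticks % (unsigned long)hz) *
                         NANOSEC_MULTIPLIER) / hz);
    return tm;
}
```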
stack trace looks like this
Thread 19 (Thread 0x7f225a043700 (LWP 16023)):
#0 0x00007f225b8913ed in __accept_nocancel () at ../sysdeps/unix/syscall-
template.S:84
#1 0x0000000000000000 in ?? ()
Thread 18 (Thread 0x7f225a076700 (LWP 15952)):
#0 0x00007f225b89126d in __close_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000000000000000 in ?? ()
Thread 14 (Thread 0x7f225a021700 (LWP 16035)):
#0 0x00007f225b8917dd in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000000000000000 in ?? ()
Thread 13 (Thread 0x7f225a032700 (LWP 16034)):
#0 0x00007f225b8917dd in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7f225bbb3700 (LWP 15950)):
#0 0x00007f225ab1e3f3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7f225a010700 (LWP 16036)):
#0 0x00007f225b8911ad in __write_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000000000000000 in ?? ()
Somehow this thread gets into the uncancelable sleep state at random. I can't find a clear definition of this state anywhere on the internet, so I assume the thread sleeps forever in a state that cannot be interrupted; hence the thread goes inactive forever.
I have no clue why this is happening, given that the thread executes so few lines of code.
From code.woboq.org, I found that this gets called from a mutex lock:
https://code.woboq.org/userspace/glibc/nptl/pthread_mutex_timedlock.c.html#416, but the thread is not using any mutex.
The only thing that I suspect here is that the structure struct_S is allocated in shared memory; this variable is also accessed and assigned by another thread from another process. Does the thread get into this state internally, depending on the priority of the threads?
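Note that if struct_S (or anything else in that shared memory segment) ever embeds a pthread mutex that both processes lock, it must be created as process-shared; a plain mutex placed in shared memory can block a cross-process locker indefinitely. A hedged sketch (hypothetical helper name):

```cpp
#include <pthread.h>

// Hypothetical helper: initialize a mutex that lives in shared memory so
// that threads in *different* processes can safely lock it.
int init_shared_mutex(pthread_mutex_t *m)   // m should point into the shared segment
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```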

New Thread spawned by cudaMalloc | Behaviour?

cudaMalloc seemed to have spawned a thread when it was called, even though it's asynchronous. This was observed during debugging using cuda-gdb.
It also took a while to return.
The same thread exited, although as a different LWP, at the end of the program.
Can someone explain this behaviour?
The thread is not specifically spawned by cudaMalloc. The user side CUDA driver API library seems to spawn threads at some stage during lazy context setup which have the lifetime of the CUDA context. The exact processes are not publicly documented.
You see this associated with cudaMalloc because, I would guess, it is the first API call to trigger whatever setup/callbacks need to be done to make the userspace driver support work. You should notice that only the first call spawns threads; subsequent calls do not. The threads stay alive for the lifetime of the CUDA context, after which they are terminated. You can trigger explicit thread destruction by calling cudaDeviceReset at any point in program execution.
Here is a trivial example which demonstrates cudaMemcpyToSymbol triggering the thread spawning from the driver API library, rather than cudaMalloc:
__device__ float someconstant;

int main()
{
    cudaSetDevice(0);
    const float x = 3.14159f;
    cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
    for (int i = 0; i < 10; i++) {
        int *x;
        cudaMalloc((void **)&x, size_t(1024));
        cudaMemset(x, 0, 1024);
        cudaFree(x);
    }
    return int(cudaDeviceReset());
}
In gdb I see this:
(gdb) tbreak main
Temporary breakpoint 1 at 0x40254f: file gdb_threads.cu, line 5.
(gdb) run
Starting program: /home/talonmies/SO/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at gdb_threads.cu:5
5 cudaSetDevice(0);
(gdb) next
6 const float x = 3.14159f;
(gdb) next
7 cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
(gdb) next
[New Thread 0x7ffff5eb5700 (LWP 14282)]
[New Thread 0x7fffed3ff700 (LWP 14283)]
8 for(int i=0; i<10; i++) {
(gdb) info threads
Id Target Id Frame
3 Thread 0x7fffed3ff700 (LWP 14283) "a.out" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
2 Thread 0x7ffff5eb5700 (LWP 14282) "a.out" 0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7ffff7fd1740 (LWP 14259) "a.out" main () at gdb_threads.cu:8
(gdb) thread apply all bt
Thread 3 (Thread 0x7fffed3ff700 (LWP 14283)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007ffff65cad97 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff659582d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bc182 in start_thread (arg=0x7fffed3ff700) at pthread_create.c:312
#5 0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7ffff5eb5700 (LWP 14282)):
#0 0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff65c9953 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff66571ae in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bc182 in start_thread (arg=0x7ffff5eb5700) at pthread_create.c:312
#5 0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7ffff7fd1740 (LWP 14259)):
#0 main () at gdb_threads.cu:8

libexpect sigalrm nanosleep crash

I'm working with libexpect, but if the read times out (expected return code EXP_TIMEOUT) I instead get a crash as follows.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f1366275bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f1366275bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f1366278fc8 in __GI_abort () at abort.c:89
#2 0x00007f13662b2e14 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f13663bf06b "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007f136634a7dc in __GI___fortify_fail (msg=<optimized out>) at fortify_fail.c:37
#4 0x00007f136634a6ed in ____longjmp_chk () at ../sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S:100
#5 0x00007f136634a649 in __longjmp_chk (env=0x1, val=1) at ../setjmp/longjmp.c:38
#6 0x00007f1366ed2a95 in ?? () from /usr/lib/libexpect.so.5.45
#7 <signal handler called>
#8 0x00007f1367334b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#9 0x000000000044cc13 in main (argc=3, argv=0x7fffca4013b8) at main_thread.c:6750
(gdb)
As you can see, I'm using nanosleep, which, unlike usleep and sleep, is not supposed to interact with signals (http://linux.die.net/man/2/nanosleep). As I understand it, libexpect uses SIGALRM to time out, but it's unclear to me how the two threads are interacting. If I had to guess, the expect call raises a SIGALRM and it interrupts the nanosleep call, but beyond that I don't know what's going on.
Thread 1:
while (stuff)
{
    // do things
    struct timespec time;
    time.tv_sec = 0;            /* note: the original 0.25 truncates to 0; tv_sec is integral */
    time.tv_nsec = 250000000;   /* 0.25 s */
    nanosleep(&time, NULL);
}
Thread 2:
switch (exp_expectl(fd, exp_glob, (char *)user_prompt, OK, exp_end))
{
case OK:
    DG_LOG_DEBUG("Received user prompt");
    break;
case EXP_TIMEOUT:
    DG_LOG_DEBUG("Expect timed out");
    goto error;
default:
    DG_LOG_DEBUG("Expect failed for unknown reasons");
    goto error;
}
I have done some reading about signals and sleep, but I've used sleep in multiple threads on many occasions and had no difficulties until now. What am I missing?
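For reference, nanosleep is still interrupted by any caught signal: it returns -1 with errno set to EINTR and writes the unslept time into its second argument. A restart loop (hypothetical helper name) looks roughly like this:

```cpp
#include <errno.h>
#include <time.h>

// Hypothetical helper: keep sleeping until the full interval has elapsed,
// restarting after every signal interruption (EINTR).
static int sleep_full(struct timespec req)
{
    struct timespec rem;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;   // real failure, e.g. EINVAL
        req = rem;       // sleep again for the remainder
    }
    return 0;
}
```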
edit: misc version info
Ubuntu 14.04, kernel 3.13.0-44-generic
/usr/lib/libexpect.so.5.45
code is in C
compiler is gcc (-lexpect -ltcl)
#include <tcl8.6/expect.h>

Setup and Debugging of applications run under mod-mono-server4

I have a c# application (servicebus) which runs on a private web server. Its basic job is to accept some web requests and create other processes to handle processing the data packages described in the requests. The processing is often ongoing and can take weeks.
The servicebus will, occasionally, start consuming great amounts of CPU. That is, it is normally idle, getting 1 or 2 seconds of CPU time per day. When it gets into this strange mode, it consumes 100+% CPU all the time. At this point, a new instance of the servicebus gets spawned by apache if a new request comes in, so I will have two copies of the servicebus running (and possibly both handling processing requests; I don't know).
This is the normal process (via ps -aef ) :
UID PID PPID C STIME TTY TIME CMD
apache 8978 1 0 11:51 ? 00:00:01 /opt/mono/bin/mono /opt/mono/lib/mono/4.0/mod-mono-server4.exe --filename /tmp/mod_mono_server_default --applications /:/opt/ov/vespa/servicebus --nonstop
As you can see, the application is a C# program (compiled with VS 2010 for .NET 4) running via mod-mono-server4 under mono. This is a redhat linux enterprise 6.5 system.
After running for a while that process 'went crazy' and started consuming lots of CPU and mod-mono-server created a new instance. As you can see, I didn't find it until Monday morning after it had used over 2 days of CPU time. Here is the new ps -aef output :
UID PID PPID C STIME TTY TIME CMD
apache 8978 1 83 Sep19 ? 2-08:26:25 /opt/mono/bin/mono /opt/mono/lib/mono/4.0/mod-mono-server4.exe --filename /tmp/mod_mono_server_default --applications /:/opt/ov/vespa/servicebus --nonstop
apache 32538 1 0 Sep21 ? 00:00:00 /opt/mono/bin/mono /opt/mono/lib/mono/4.0/mod-mono-server4.exe --filename /tmp/mod_mono_server_default --applications /:/opt/ov/vespa/servicebus --nonstop
In case you need to see how the application is configured, I have the snippet from the conf.d file for the application :
# The user and group need to be set before mod_mono.conf is loaded.
User apache
Group apache
# Service Bus setup
Include /etc/httpd/conf/mod_mono.conf
Listen 8081
<VirtualHost *:8081>
DocumentRoot /opt/ov/vespa/servicebus
MonoServerPath default /opt/mono/bin/mod-mono-server4
MonoApplications "/:/opt/ov/vespa/servicebus"
<Location "/">
SetHandler mono
Allow from all
</Location>
</VirtualHost>
The basic question is... how do I go about debugging this and finding what is wrong with my application? That, however, is a bit vague. Normally, I would want to put mono into debug mode and then, when it gets into this strange mode, use kill -ABRT to get a core dump out of it. I assume I could then find a for/while loop which is stuck and fix my bug. So, the real question is how to do that. Is that process PID=8978 actually my application being interpreted by mono, or is it mono running mod-mono-server4.exe? Or is it mono interpreting mod-mono-server4.exe, which in turn is interpreting servicebus? Where in the apache configuration files do I put the arguments to mono so I can get the --debug I desire?
Normally to debug I would need a process like :
/opt/mono/bin/mono --debug /opt/test/testapp.exe
So, I need to get a --debug into the command line and sort out which PID to actually kill. Then I can use techniques from http://www.mono-project.com/docs/debug+profile/debug/ to debug the core file.
NOTE: I have tried putting MonoMaxCPUTime and MonoAutoRestartTime directives into the apache conf files to cure this. The problem is, when everything is nominal, they work fine. Once it gets into this bad state (consuming a ton of CPU), the restart fails. Or rather, it succeeds in creating a new process but fails to delete the old one (basically the state I am already in).
Debugging so far: I see my log files for PID=8979 stop on 9/21 at 03:27. Given that it was often consuming 200% or 300% CPU or more, that could easily be the time of the 'crash'. Looking in the apache logs I found an unusual event at that time. A dump of the log is below:
...
[Sun Sep 21 03:28:01 2014] [notice] SIGHUP received. Attempting to restart
mod-mono-server received a shutdown message
httpd: Could not reliably determine the server's fully qualified domain name, using localhost.localdomain for ServerName
Stacktrace:
Native stacktrace:
/opt/mono/bin/mono() [0x48cc26]
/lib64/libpthread.so.0() [0x32fca0f710]
/lib64/libpthread.so.0(pthread_cond_wait+0xcc) [0x32fca0b5bc]
/opt/mono/bin/mono() [0x5a6a9c]
/opt/mono/bin/mono() [0x5ad4e9]
/opt/mono/bin/mono() [0x5116d8]
/opt/mono/bin/mono(mono_thread_manage+0x1ad) [0x5161cd]
/opt/mono/bin/mono(mono_main+0x1401) [0x46a671]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x32fc21ed1d]
/opt/mono/bin/mono() [0x4123a9]
Debug info from gdb:
warning: File "/opt/mono/bin/mono-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
To enable execution of this file add
add-auto-load-safe-path /opt/mono/bin/mono-gdb.py
line to your configuration file "$HOME/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "$HOME/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
[New LWP 9148]
[New LWP 9135]
[New LWP 9000]
[New LWP 8991]
[New LWP 8990]
[New LWP 8988]
[New LWP 8987]
[New LWP 8986]
[New LWP 8985]
[New LWP 8984]
[Thread debugging using libthread_db enabled]
0x00000032fca0e75d in read () from /lib64/libpthread.so.0
11 Thread 0x7f0d8bcaf700 (LWP 8984) 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x7f0d8b2ae700 (LWP 8985) 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x7f0d8a8ad700 (LWP 8986) 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7f0d89eac700 (LWP 8987) 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7f0d894ab700 (LWP 8988) 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7f0d88aaa700 (LWP 8990) 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f0d880a9700 (LWP 8991) 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f0d8713c700 (LWP 9000) 0x00000032fca0d930 in sem_wait () from /lib64/libpthread.so.0
3 Thread 0x7f0d86157700 (LWP 9135) 0x00000032fc27a983 in malloc () from /lib64/libc.so.6
2 Thread 0x7f0d8568b700 (LWP 9148) 0x00000032fc2792f0 in _int_malloc () from /lib64/libc.so.6
* 1 Thread 0x7f0d8bcb0740 (LWP 8978) 0x00000032fca0e75d in read () from /lib64/libpthread.so.0
Thread 11 (Thread 0x7f0d8bcaf700 (LWP 8984)):
#0 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000005d59f7 in GC_wait_marker ()
#2 0x00000000005dbabd in GC_help_marker ()
#3 0x00000000005d4778 in GC_mark_thread ()
#4 0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00000032fc2e8b5d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7f0d8b2ae700 (LWP 8985)):
#0 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000005d59f7 in GC_wait_marker ()
#2 0x00000000005dbabd in GC_help_marker ()
#3 0x00000000005d4778 in GC_mark_thread ()
#4 0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00000032fc2e8b5d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f0d8a8ad700 (LWP 8986)):
#0 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000005d59f7 in GC_wait_marker ()
#2 0x00000000005dbabd in GC_help_marker ()
#3 0x00000000005d4778 in GC_mark_thread ()
#4 0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00000032fc2e8b5d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f0d89eac700 (LWP 8987)):
#0 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000005d59f7 in GC_wait_marker ()
#2 0x00000000005dbabd in GC_help_marker ()
#3 0x00000000005d4778 in GC_mark_thread ()
#4 0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00000032fc2e8b5d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f0d894ab700 (LWP 8988)):
#0 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000005d59f7 in GC_wait_marker ()
#2 0x00000000005dbabd in GC_help_marker ()
#3 0x00000000005d4778 in GC_mark_thread ()
#4 0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00000032fc2e8b5d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f0d88aaa700 (LWP 8990)):
#0 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000005d59f7 in GC_wait_marker ()
#2 0x00000000005dbabd in GC_help_marker ()
#3 0x00000000005d4778 in GC_mark_thread ()
#4 0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00000032fc2e8b5d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f0d880a9700 (LWP 8991)):
#0 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000005d59f7 in GC_wait_marker ()
#2 0x00000000005dbabd in GC_help_marker ()
#3 0x00000000005d4778 in GC_mark_thread ()
#4 0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00000032fc2e8b5d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f0d8713c700 (LWP 9000)):
#0 0x00000032fca0d930 in sem_wait () from /lib64/libpthread.so.0
#1 0x00000000005bea28 in mono_sem_wait ()
#2 0x000000000053b2bb in finalizer_thread ()
#3 0x000000000051375b in start_wrapper ()
#4 0x00000000005a8214 in thread_start_routine ()
#5 0x00000000005d565a in GC_start_routine ()
#6 0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#7 0x00000032fc2e8b5d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f0d86157700 (LWP 9135)):
#0 0x00000032fc27a983 in malloc () from /lib64/libc.so.6
#1 0x00000000005cd0e6 in monoeg_malloc ()
#2 0x00000000005cbef1 in monoeg_g_hash_table_insert_replace ()
#3 0x00000000005acff5 in WaitForMultipleObjectsEx ()
#4 0x0000000000512694 in ves_icall_System_Threading_WaitHandle_WaitAny_internal ()
#5 0x00000000417b0270 in ?? ()
#6 0x00007f0d68000c21 in ?? ()
#7 0x00007f0d847c4b40 in ?? ()
#8 0x00007f0d68003e00 in ?? ()
#9 0x000000004023e890 in ?? ()
#10 0x00007f0d68003e00 in ?? ()
#11 0x00007f0d86156940 in ?? ()
#12 0x00007f0d861568a0 in ?? ()
#13 0x00007f0d8767d000 in ?? ()
#14 0xffffffffffffffff in ?? ()
#15 0x00007f0d86156cc0 in ?? ()
#16 0x00007f0d847c4b40 in ?? ()
#17 0x000000004023e268 in ?? ()
#18 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7f0d8568b700 (LWP 9148)):
#0 0x00000032fc2792f0 in _int_malloc () from /lib64/libc.so.6
#1 0x00000032fc27a636 in calloc () from /lib64/libc.so.6
#2 0x00000000005cd148 in monoeg_malloc0 ()
#3 0x00000000005cbb94 in monoeg_g_hash_table_new ()
#4 0x00000000005acf94 in WaitForMultipleObjectsEx ()
#5 0x0000000000512694 in ves_icall_System_Threading_WaitHandle_WaitAny_internal ()
#6 0x00000000417b0270 in ?? ()
#7 0x00007f0d60000c21 in ?? ()
#8 0x00007f0d8767d000 in ?? ()
#9 0xffffffffffffffff in ?? ()
#10 0x000000004023e890 in ?? ()
#11 0x00007f0d68003e00 in ?? ()
#12 0x00007f0d8568a940 in ?? ()
#13 0x00007f0d8568a8a0 in ?? ()
#14 0x00007f0d8767d000 in ?? ()
#15 0xffffffffffffffff in ?? ()
#16 0x00007f0d8568acc0 in ?? ()
#17 0x00007f0d864e2990 in ?? ()
#18 0x000000004023e268 in ?? ()
#19 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f0d8bcb0740 (LWP 8978)):
#0 0x00000032fca0e75d in read () from /lib64/libpthread.so.0
#1 0x000000000048cdb6 in mono_handle_native_sigsegv ()
#2 <signal handler called>
#3 0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4 0x00000000005a6a9c in _wapi_handle_timedwait_signal_handle ()
#5 0x00000000005ad4e9 in WaitForMultipleObjectsEx ()
#6 0x00000000005116d8 in wait_for_tids ()
#7 0x00000000005161cd in mono_thread_manage ()
#8 0x000000000046a671 in mono_main ()
#9 0x00000032fc21ed1d in __libc_start_main () from /lib64/libc.so.6
#10 0x00000000004123a9 in _start ()
=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================
Which I think means the process had a seg fault and got stuck while trying to dump core? Or did it get a SIGABRT while processing a SIGSEGV? In either case, that's a dump of mono, right? I searched the full file system and no core was generated, so I'm not sure how apache/gdb managed this.
In case it matters I have RedHat 6.5, mono 2.10.8, gcc 4.4.7, mod-mono-server4.exe 2.10.0.0
Basically this boils down to these questions:
1. How do I get --debug into the mono commands that apache issues?
2. How do I get apache to save the core files it encounters instead of automatically running gdb on them (as I need to issue more complex commands to get at the underlying C# code)?
3. What does the command line for my servicebus mean? That is, why/how come mod-mono-server4 isn't a completely separate process from my servicebus? How does mod-mono-server fit into the mono-interpreting-servicebus processing chain?
Or am I totally wrong, and will the answers to those questions not help me?
First of all: Mono 2.10 is very old; you may be running into a bug that is already fixed in the latest 3.8.
As for getting --debug into your app, you can set the environment variable MONO_OPTIONS=--debug, which has the same effect as specifying it on the command line.
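Assuming mod_mono's MonoSetEnv directive is available in your version, that variable can be set per-application from the same VirtualHost block shown in the question, for example:

```apache
<VirtualHost *:8081>
    DocumentRoot /opt/ov/vespa/servicebus
    MonoServerPath default /opt/mono/bin/mod-mono-server4
    # Hypothetical addition: pass --debug to the mono runtime that
    # mod_mono spawns for this application.
    MonoSetEnv default MONO_OPTIONS=--debug
    MonoApplications "/:/opt/ov/vespa/servicebus"
    <Location "/">
        SetHandler mono
        Allow from all
    </Location>
</VirtualHost>
```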

Identify crash in multi threaded programs - How to?

I am quite experienced in debugging with GDB, so my question is not related to debug symbols being unavailable :). I am dealing with a crash in a multi-threaded program. Just before the crash I see the following logs in GDB, and the backtrace does not give me much info. info threads again shows I am not running in application space. Any suggestions as to how I can approach this problem?
[New Thread 1755]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1755]
warning: GDB can't find the start of the function at 0x2ac17638.
GDB is unable to find the start of the function at 0x2ac17638
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
This problem is most likely caused by an invalid program counter or
stack pointer.
However, if you think GDB should simply search farther back
from 0x2ac17638 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
0x2ac17638 in ?? ()
(gdb) bt
#0 0x2ac17638 in ?? ()
(gdb) info thread
[New Thread 1737]
[New Thread 1738]
[New Thread 1739]
[New Thread 1740]
[New Thread 1741]
[New Thread 1742]
[New Thread 1744]
[New Thread 1745]
[New Thread 1746]
[New Thread 1747]
[New Thread 1748]
[New Thread 1749]
[New Thread 1750]
[New Thread 1751]
[New Thread 1752]
[New Thread 1753]
[New Thread 1754]
[New Thread 1756]
20 Thread 1756 0x2aac1068 in ?? ()
19 Thread 1754 0x2abd62b4 in ?? ()
18 Thread 1753 0x2abd62b4 in ?? ()
17 Thread 1752 0x2aabda58 in ?? ()
16 Thread 1751 0x2abd62b4 in ?? ()
15 Thread 1750 0x2aabda58 in ?? ()
14 Thread 1749 0x2aabda58 in ?? ()
13 Thread 1748 0x2aabda58 in ?? ()
12 Thread 1747 0x2aabfb44 in ?? ()
11 Thread 1746 0x2aabfb44 in ?? ()
10 Thread 1745 0x2aabfb44 in ?? ()
9 Thread 1744 0x2aabfb44 in ?? ()
8 Thread 1742 0x2aabfb44 in ?? ()
7 Thread 1741 0x2aac15dc in ?? ()
6 Thread 1740 0x2abd62b4 in ?? ()
5 Thread 1739 0x2abd62b4 in ?? ()
4 Thread 1738 0x2abd62b4 in ?? ()
3 Thread 1737 0x2aabfb44 in ?? ()
* 2 Thread 1755 0x2ac17638 in ?? ()
1 Thread 1736 0x2abd56bc in ?? ()
warning: GDB can't find the start of the function at 0x2ac17638.
P.S. It's a MIPS-based embedded Linux; the problem is rarely reproducible, and only at boot-up.
