I have a background thread doing some processing, and saving to Core Data. In my app delegate's applicationShouldTerminate, I wait on a semaphore which is released when the background thread completes its work. This is to avoid killing the thread in the middle of its work and leaving things in an inconsistent state.
Unfortunately, this causes a deadlock. Here is how the background job operates:
_context = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
[_context setParentContext:_parentContext];
[_context performBlock:
^{
// ... long-running task here ...
NSError * error;
[_context save:&error]; // deadlock here if main thread is waiting on semaphore
// ... release semaphore here ...
}];
If the user quits the app while the background thread is still working, it deadlocks. The problem seems to be that [_context save:&error] is calling dispatch_sync (or an equivalent) into the main thread - but the main thread is already waiting for this thread to release the semaphore, and so isn't able to run the block.
Since a child context save appears to block on the main thread, how can this pattern (main thread waiting for child to complete and save its context) be achieved?
Main thread:
#0 0x00007fff882e96c2 in semaphore_wait_trap ()
#1 0x00007fff876264c2 in _dispatch_semaphore_wait_slow ()
#2 0x00000001001157fb in +[IndxCore waitForBackgroundJobs] at /Users/mspong/dev/Indx/IndxCore/IndxCore/IndxCore.m:48
#3 0x00000001000040c6 in -[RHAppDelegate applicationShouldTerminate:] at /Users/mspong/dev/Indx/Indx/Indx/RHAppDelegate.m:324
#4 0x00007fff9071a48f in -[NSApplication _docController:shouldTerminate:] ()
#5 0x00007fff9071a39e in __91-[NSDocumentController(NSInternal) _closeAllDocumentsWithDelegate:shouldTerminateSelector:]_block_invoke_0 ()
#6 0x00007fff9071a23a in -[NSDocumentController(NSInternal) _closeAllDocumentsWithDelegate:shouldTerminateSelector:] ()
(snip)
#17 0x00007fff9048e656 in NSApplicationMain ()
#18 0x0000000100001e72 in main at /Users/mspong/dev/Indx/Indx/Indx/main.m:13
#19 0x00007fff8c4577e1 in start ()
Background thread:
#0 0x00007fff882e96c2 in semaphore_wait_trap ()
#1 0x00007fff87627c6e in _dispatch_thread_semaphore_wait ()
#2 0x00007fff87627ace in _dispatch_barrier_sync_f_slow ()
#3 0x00007fff8704a78c in _perform ()
#4 0x00007fff8704a5d2 in -[NSManagedObjectContext(_NestedContextSupport) executeRequest:withContext:error:] ()
#5 0x00007fff8702c278 in -[NSManagedObjectContext save:] ()
#6 0x000000010011640d in __22-[Indexer updateIndex]_block_invoke_0 at /Users/mspong/dev/Indx/IndxCore/IndxCore/Indexer/Indexer.m:70
#7 0x00007fff87079b4f in developerSubmittedBlockToNSManagedObjectContextPerform_privateasync ()
#8 0x00007fff876230fa in _dispatch_client_callout ()
#9 0x00007fff876244c3 in _dispatch_queue_drain ()
#10 0x00007fff87624335 in _dispatch_queue_invoke ()
#11 0x00007fff87624207 in _dispatch_worker_thread2 ()
#12 0x00007fff88730ceb in _pthread_wqthread ()
#13 0x00007fff8871b1b1 in start_wqthread ()
In applicationShouldTerminate: you can return NSTerminateLater to delay the termination until the background context has finished the save operation.
When the saving is done, you call
[[NSApplication sharedApplication] replyToApplicationShouldTerminate:YES];
to quit the application.
Related
cudaMalloc seemed to have spawned a thread when it was called, even though it's asynchronous. This was observed during debugging using cuda-gdb.
It also took a while to return.
The same thread exited, although as a different LWP, at the end of the program.
Can someone explain this behaviour ?
The thread is not specifically spawned by cudaMalloc. The user side CUDA driver API library seems to spawn threads at some stage during lazy context setup which have the lifetime of the CUDA context. The exact processes are not publicly documented.
You see this associated with cudaMallocbecause I would guess this is the first API to trigger whatever setup/callbacks need to be done to make the userspace driver support work. You should notice that only the first call spawns a thread. Subsequent calls do not. And the threads stay alive for the lifetime of the CUDA context, after which they are terminated. You can trigger explicit thread destruction by calling cudaDeviceReset at any point in program execution.
Here is a trivial example which demonstrates cudaMemcpyToSymbol triggering the thread spawning from the driver API library, rather than cudaMalloc:
__device__ float someconstant;
int main()
{
cudaSetDevice(0);
const float x = 3.14159f;
cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
for(int i=0; i<10; i++) {
int *x;
cudaMalloc((void **)&x, size_t(1024));
cudaMemset(x, 0, 1024);
cudaFree(x);
}
return int(cudaDeviceReset());
}
In gdb I see this:
(gdb) tbreak main
Temporary breakpoint 1 at 0x40254f: file gdb_threads.cu, line 5.
(gdb) run
Starting program: /home/talonmies/SO/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at gdb_threads.cu:5
5 cudaSetDevice(0);
(gdb) next
6 const float x = 3.14159f;
(gdb) next
7 cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
(gdb) next
[New Thread 0x7ffff5eb5700 (LWP 14282)]
[New Thread 0x7fffed3ff700 (LWP 14283)]
8 for(int i=0; i<10; i++) {
(gdb) info threads
Id Target Id Frame
3 Thread 0x7fffed3ff700 (LWP 14283) "a.out" pthread_cond_timedwait##GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
2 Thread 0x7ffff5eb5700 (LWP 14282) "a.out" 0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7ffff7fd1740 (LWP 14259) "a.out" main () at gdb_threads.cu:8
(gdb) thread apply all bt
Thread 3 (Thread 0x7fffed3ff700 (LWP 14283)):
#0 pthread_cond_timedwait##GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007ffff65cad97 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff659582d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bc182 in start_thread (arg=0x7fffed3ff700) at pthread_create.c:312
#5 0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7ffff5eb5700 (LWP 14282)):
#0 0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff65c9953 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff66571ae in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bc182 in start_thread (arg=0x7ffff5eb5700) at pthread_create.c:312
#5 0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7ffff7fd1740 (LWP 14259)):
#0 main () at gdb_threads.cu:8
I'm working with libexpect, but if the read times out (expected return code EXP_TIMEOUT) I instead get a crash as follows.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f1366275bb9 in __GI_raise (sig=sig#entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f1366275bb9 in __GI_raise (sig=sig#entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f1366278fc8 in __GI_abort () at abort.c:89
#2 0x00007f13662b2e14 in __libc_message (do_abort=do_abort#entry=2, fmt=fmt#entry=0x7f13663bf06b "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007f136634a7dc in __GI___fortify_fail (msg=<optimized out>) at fortify_fail.c:37
#4 0x00007f136634a6ed in ____longjmp_chk () at ../sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S:100
#5 0x00007f136634a649 in __longjmp_chk (env=0x1, val=1) at ../setjmp/longjmp.c:38
#6 0x00007f1366ed2a95 in ?? () from /usr/lib/libexpect.so.5.45
#7 <signal handler called>
#8 0x00007f1367334b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#9 0x000000000044cc13 in main (argc=3, argv=0x7fffca4013b8) at main_thread.c:6750
(gdb)
As you can see, I'm using nanosleep, which is supposed to not interact with signals like usleep and sleep (http://linux.die.net/man/2/nanosleep). As I understand it, libexpect uses SIGALRM to time out, but it's unclear to me how the two threads are interacting. If I had to guess, the expect call is raising a sigalrm, and it's interrupting the nanosleep call, but beyond that I don't know what's going on.
Thread 1:
while (stuff)
{
//dothings
struct timespec time;
time.tv_sec = 0.25;
time.tv_nsec = 250000000;
nanosleep(&time, NULL);
}
Thread 2:
switch(exp_expectl(fd, exp_glob, (char*)user_prompt, OK, exp_end))
{
case OK:
DG_LOG_DEBUG("Recieved user prompt");
break;
case EXP_TIMEOUT:
DG_LOG_DEBUG("Expect timed out");
goto error;
default:
DG_LOG_DEBUG("Expect failed for unknown reasons");
goto error;
}
I have done some reading about signals and sleep, but I've used sleep in multiple threads on many occasions and had no difficulties until now. What am I missing?
edit: misc version info
ubuntu 14.04 3.13.0-44-generic
/usr/lib/libexpect.so.5.45
code is in C
compiler is gcc (-lexpect -ltcl)
include <tcl8.6/expect.h>
I am using the classic multiple MOCs sharing the same NSPersistentStoreCoordinator as I understand it performs better than the parent/child contexts according to this:
http://floriankugler.com/blog/2013/4/29/concurrent-core-data-stack-performance-shootout
I am being very careful about creating the contexts and using them on the thread/queue that creates them.
The deadlock is this:
Main thread is executing a fetch request and hung on a mutex in [NSManagedObjectContext executeFetchRequest:error:]
#0 0x3ad840fc in __psynch_mutexwait ()
#1 0x3accd128 in pthread_mutex_lock ()
#2 0x328eee90 in -[_PFLock lock] ()
#3 0x328ff350 in -[NSPersistentStoreCoordinator executeRequest:withContext:error:] ()
#4 0x328fdf16 in -[NSManagedObjectContext executeFetchRequest:error:] ()
Background thread is hung on a lock I put on the persistent store coordinator prior to calling [coordinator managedObjectIDForURIRepresentation:objector].
#0 0x3ad840fc in __psynch_mutexwait ()
#1 0x3accd128 in pthread_mutex_lock ()
#2 0x328eee90 in -[_PFLock lock] ()
#3 0x00093180 in -[DataManager getObjectByUri:context:]
I understand that this is difficult to debug from a post but any ideas would be greatly appreciated. BTW...this happens rarely.
This question already has answers here:
Ensuring QProcess termination on termination of its parent QThread
(2 answers)
Closed 4 years ago.
how to terminate an ongoing QProcess that is running inside a QThread and gets deleted by another QThread? I even inserted a QMutex extCmdProcessLock, which should avoid the destruction of the DbManager before the extCmdProcess could finish or timeout.
I get a segmentation fault on "waitForStarted" if another thread calls delete on DbManager.
I cannot use signals (I think) because I use the external command inside a sequential data process.
Thank you very much for any help!
DbManager::extCmd(){
...
QMutexLocker locker(&extCmdProcessLock);
extCmdProcess = new QProcess(this);
QString argStr += " --p1=1"
+ " --p2=3";
extCmdProcess->start(cmd,argStr.split(QString(" ")));
bool startedSuccessfully = extCmdProcess->waitForStarted();
if (!startedSuccessfully) {
extCmdProcess->close();
extCmdProcess->kill();
extCmdProcess->waitForFinished();
delete extCmdProcess;
extCmdProcess = NULL;
return;
}
bool successfullyFinished = extCmdProcess->waitForFinished(-1);
if (!successfullyFinished) {
qDebug() << "finishing failed"; // Appendix C
extCmdProcess->close();
extCmdProcess->kill();
extCmdProcess->waitForFinished(-1);
delete extCmdProcess;
extCmdProcess = NULL;
return;
}
extCmdProcess->close();
delete extCmdProcess;
extCmdProcess = NULL;
}
DbManager::~DbManager(){
qDebug() << "DB DbManager destructor called.";
QMutexLocker locker(&extCmdProcessLock);
if (extCmdProcess!= NULL){
this->extCmdProcess->kill(); // added after Appendix A
this->extCmdProcess->waitForFinished();
}
}
Appendix A: I also get the error "QProcess: Destroyed while process is still running." and I read that this could mean that the "delete dbmanager" call from my other thread is executed while the waitForStarted() command has not completed. But I really wonder why the kill() command in my destructor has not fixed this.
Appendix B: According to comment, added waitForFinished(). Sadly, the QProcess termination still does not get shutdown properly, the segmentation fault happens in waitForStarted() or as below in start() itself.
#0 0x00007f25e03a492a in QEventDispatcherUNIX::registerSocketNotifier () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#1 0x00007f25e0392d0b in QSocketNotifier::QSocketNotifier () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#2 0x00007f25e0350bf8 in ?? () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#3 0x00007f25e03513ef in ?? () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#4 0x00007f25e03115da in QProcess::start () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#5 0x0000000000428628 in DbManager::extCmd()
#6 0x000000000042ca06 in DbManager::storePos ()
#7 0x000000000044f51c in DeviceConnection::incomingData ()
#8 0x00000000004600fb in DeviceConnection::qt_metacall ()
#9 0x00007f25e0388782 in QObject::event () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#10 0x00007f25e0376e3f in QCoreApplicationPrivate::notify_helper () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#11 0x00007f25e0376e86 in QCoreApplication::notify () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#12 0x00007f25e0376ba4 in QCoreApplication::notifyInternal () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#13 0x00007f25e0377901 in QCoreApplicationPrivate::sendPostedEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#14 0x00007f25e03a4500 in QEventDispatcherUNIX::processEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#15 0x00007f25e0375e15 in QEventLoop::processEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#16 0x00007f25e0376066 in QEventLoop::exec () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#17 0x00007f25e0277715 in QThread::exec () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#18 0x00007f25e027a596 in ?? () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#19 0x00007f25df9b43f7 in start_thread () from /lib/libpthread.so.0
#20 0x00007f25def89b4d in clone () from /lib/libc.so.6
#21 0x0000000000000000 in ?? ()
Appendix C: The debug output showed me, that the error message: QProcess: Destroyed while process is still running. always appears, when the finishing failed output appears. This means that my locks or/and kill attempts to protect the QProcess are failing.
Questions I wonder about:
a) If a create a QProcess object and start it, is my extCmdProcessLock unlocked? I already tried to use a normal lock() call instead of the QMutexLoader but no luck.
b) The docs say the main thread will be stopped if I use QProcess this way. Do they really mean the main thread or the thread in which QProcess is started? I assumed second.
c) is QProcess not usable in multithreading environment? If two threads create a QProcess object and run it, do they interfere? Maybe the object is somehow static?
Thanks for any help in filling the knowledge leaks. I really hope to get that puzzle solved.
Appendix D: After removing any delete and deleteLater() from any thread, my QProcess still gets smashed.
#0 0x00007fc94e9796b0 in QProcess::setProcessState () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#1 0x00007fc94e97998b in QProcess::waitForStarted () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#2 0x00007fc94e979a12 in QProcess::waitForFinished () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#3 0x0000000000425681 in DbManager::extCmd()
#4 0x0000000000426fb6 in DbManager::storePos ()
#5 0x000000000044d51c in DeviceConnection::incomingData ()
#6 0x000000000045fb7b in DeviceConnection::qt_metacall ()
#7 0x00007fc94e9f4782 in QObject::event () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#8 0x00007fc94e9e2e3f in QCoreApplicationPrivate::notify_helper () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#9 0x00007fc94e9e2e86 in QCoreApplication::notify () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#10 0x00007fc94e9e2ba4 in QCoreApplication::notifyInternal () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#11 0x00007fc94e9e3901 in QCoreApplicationPrivate::sendPostedEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#12 0x00007fc94ea10500 in QEventDispatcherUNIX::processEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#13 0x00007fc94e9e1e15 in QEventLoop::processEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#14 0x00007fc94e9e2066 in QEventLoop::exec () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#15 0x00007fc94e8e3715 in QThread::exec () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#16 0x00007fc94e8e6596 in ?? () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#17 0x00007fc94e0203f7 in start_thread () from /lib/libpthread.so.0
#18 0x00007fc94d5f5b4d in clone () from /lib/libc.so.6
#19 0x0000000000000000 in ?? ()
It is really bad style to use a QThread to manage a running process. I'm seeing it again and again and it's some fundamental misunderstanding about how to write asynchronous applications properly. Processes are separate from your own application. QProcess provides a beautiful set of signals to notify you when it has successfully started, failed to start, and finished. Simply hook those signals to slots in an instance of a QObject-derived class of yours, and you'll be all set.
It's bad design if the number of threads in your application can exceed significantly the number of cores/hyperhtreads available on the platform, or if the number of threads is linked to some unrelated runtime factor like number of running subprocesses.
See my other other answer.
You can create QProcess on the heap, as a child of your monitoring QObject. You could connect QProcess's finished() signal to its own deleteLater() slot, so that it will automatically delete itself when it's done. The monitoring QObject should forcibly terminate any remaining running processes when it gets itself destroyed, say as a result of your application shutting down.
Further to the question was how to execute uncontrollably long running functions, say database queries for which there's no asynchronous API, with minimal impact, when interspersed with things for which there is good asynchronous API, such as QProcess.
A canonical way would be: do things synchronously where you must, asynchronously otherwise. You can stop the controlling object, and any running process, by invoking its deleteLater() slot -- either via a signal/slot connection, or using QMetaObject::invokeMethod() if you want to do it directly while safely crossing the thread boundary. This is the major benefit of using as few blocking calls as possible: you have some control over the processing and can stop it some of the time. With purely blocking implementation, there's no way to stop it short of using some flag variables and sprinkling your code with tests for it.
The deleteLater() will get processed any time the event loop can spin in the thread where a QObject lives. This means that it will get a chance between the database query calls -- any time when the process is running, in fact.
Untested code:
class Query : public QObject
{
Q_OBJECT
public:
Query(QObject * parent = 0) : QObject(parent) {
connect(process, SIGNAL(error(QProcess::ProcessError)), SLOT(error()));
connect(process, SIGNAL(finished(int,QProcess::ExitStatus)), SLOT(finished(int,QProcess::ExitStatus)));
}
~Query() { process.kill(); }
void start() {
QTimer::singleShot(0, this, SLOT(slot1()));
}
protected slots:
void slot1() {
// do a database query
process.start(....);
next = &Query::slot2;
}
protected:
// slot2 and slot3 don't have to be slots
void slot2() {
if (result == Error) {...}
else {...}
// another database query
process.start(...); // yet another process gets fired
next = &Query::slot3;
}
void slot3() {
if (result == Error) {...}
deleteLater();
}
protected slots:
void error() {
result = Error;
(this->*next)();
}
void finished(int code, QProcess::ExitStatus status) {
result = Finished;
exitCode = code;
exitStatus = status;
(this->*next)();
}
private:
QProcess process;
enum { Error, Finished } result;
int exitCode;
QProcess::ExitStatus exitStatus;
void (Query::* next)();
};
Personally, I'd check if the database that you're using has an asynchronous API. If it doesn't, but if the client library has available sources, then I'd do a minimal port to use Qt's networking stack to make it asynchronous. It would lower the overheads because you'd no more have one thread per database connection, and as you'd get closer to saturating the CPU, the overheads wouldn't rise: ordinarily, to saturate the CPU you'd need many, many threads, since they mostly idle. With asynchronous interface, the number of context switches would go down, since a thread would process one packet of data from the database, and could immediately process another packet from a different connection, without having to do a context switch: the execution stays within the event loop of that thread.
QProcess::waitForStarted just signals that your process has started. The mutex in extCmd() method gets unlocked then because you are not waiting for QProcess::waitForFinished in this method. You will exit this method while the child process is still running.
If you want to use a fire&forget type of execution I just you uses QProcess::startDetached
I am developeing an application on linux where i wanted to have backtrace of all running threads at a particular frequency. so my user defined signal handler SIGUSR1 (for all threads) calls backtrace().
i am getting crash(SIGSEGV) in my signal handler which is originated from backtrace() call. i have passed correct arguments to the function as specified on most of the sites.
http://linux.die.net/man/3/backtrace.
what could make backtrace() crash in this case?
To add more details:
What makes me to conclude that crash is inside backtrace is frame 14 below. onMySignal is the signal handler SIGUSR1 and it calls backtrace.
Sample code of onMySignal is (copied from linux documentation of backtrace)
pthread_mutex_lock( &sig_mutex );
int j, nptrs;
#define SIZE 100
void *buffer[100] = {NULL};//or void *buffer[100];
char **strings;
nptrs = backtrace(buffer, SIZE);
pthread_mutex_unlock( &sig_mutex );
(gdb) where
#0 0x00000037bac0e9dd in raise () from
#1 0x00002aaabda936b2 in skgesigOSCrash () from
#2 0x00002aaabdd31705 in kpeDbgSignalHandler ()
#3 0x00002aaabda938c2 in skgesig_sigactionHandler ()
#4 <signal handler called>
#5 0x00000037ba030265 in raise () from
#6 0x00000037ba031d10 in abort () from
#7 0x00002b6cef82efd7 in os::abort(bool) () from
#8 0x00002b6cef98205d in VMError::report_and_die() ()
#9 0x00002b6cef835655 in JVM_handle_linux_signal ()
#10 0x00002b6cef831bae in signalHandler(int, siginfo*, void*) ()
#11 <signal handler called>
#12 0x00000037be407638 in ?? ()
#13 0x00000037be4088bb in _Unwind_Backtrace ()
#14 0x00000037ba0e5fa8 in backtrace ()
#15 0x00002aaaaae3875f in onMySignal (signum=10,info=0x4088ec80, context=0x4088eb50)
#16 <signal handler called>
#17 0x00002aaab4aa8acb in mxSession::setPartition(int)
#18 0x0000000000000001 in ?? ()
#19 0x0000000000000000 in ?? ()
(gdb)
hope this will make more clear of issue..
#janneb
I have Written the Signal handler Implementation in Mutex lock for better synchronozation.
#janneb
i did not find in the Document specifying API backtrace_symbols/backtrace is async_signal_safe or not. and whether they should be used in Signal handler or not.
Still i removed backtrace_symbols from my Signal handler and dont use it anywhere.. but my actual problem of crash in backtrace() persit. and no clue why it is crashing..
Edit 23/06/11: more details:
(gdb) where
#0 0x00000037bac0e9dd in raise () from
#1 0x00002aaab98a36b2 in skgesigOSCrash () from
#2 0x00002aaab9b41705 in kpeDbgSignalHandler () from
#3 0x00002aaab98a38c2 in skgesig_sigactionHandler () from
#4 <signal handler called>
#5 0x00000037ba030265 in raise () from
#6 0x00000037ba031d10 in abort () from
#7 0x00002ac003803fd7 in os::abort(bool) () from
#8 0x00002ac00395705d in VMError::report_and_die() () from
#9 0x00002ac00380a655 in JVM_handle_linux_signal () from
#10 0x00002ac003806bae in signalHandler(int, siginfo*, void*) () from
#11 <signal handler called>
#12 0x00000037be407638 in ?? () from libgcc_s.so.1
#13 0x00000037be4088bb in _Unwind_Backtrace () from libgcc_s.so.1
#14 0x00000037ba0e5fa8 in backtrace () from libc.so.6
#15 0x00002aaaaae3875f in onMyBacktrace (signum=10, info=0x415d0eb0, context=0x415d0d80)
#16 <signal handler called>
#17 0x00000037ba071fa8 in _int_free () from libc.so.6
#18 0x00000000000007e0 in ?? ()
#19 0x000000005aab01a0 in ?? ()
#20 0x000000000000006f in ?? ()
#21 0x00000037ba075292 in realloc () from libc.so.6
#22 0x00002aaab6248c4e in Memory::reallocMemory(void*, unsigned long, char const*, int) ()
crashed occured when realloc was executing and one of the address was like 0x00000000000007e0 (looks invalid)..
The documentation for signal handling
defines the list of safe functions to call from a signal handler, you must not use any other functions, including backtrace. (search for async-signal-safe in that document)
What you can do is write to a pipe you have previously setup, and have a thread waiting for that pipe, which then does the backtrace.
EDIT:
Ok, so that backtrace function returns the current thread's stack, so can't be used from another thread, so my idea of using a separate thread to do the backtrace won't work.
Therefore: you could try backtrace_symbols_fd from your signal handler.
As an alternative you could use gdb to get the backtrace, without having to have code in your program - and gdb can handle multiple threads easily.
Shell script to run gdb and get back traces:
#!/bin/bash
PID="$1"
[ -d "/proc/$PID" ] || PID=$(pgrep $1)
[ -d "/proc/$PID" ] || { echo "Can't find process: $PID" >&2 ; exit 1 ; }
[ -d "$TMPDIR" ] || TMPDIR=/tmp
BATCH=$(mktemp $TMPDIR/pstack.gdb.XXXXXXXXXXXXX)
echo "thread apply all bt" >"$BATCH"
echo "quit" >>"$BATCH"
gdb "/proc/$PID/exe" "$PID" -batch -x "$BATCH" </dev/null
rm "$BATCH"
As stated by Douglas Leeder, backtrace isn't on the list of signal safe calls, though in this case I suspect the problem is the malloc done by backtrace_symbols, try using backtrace_symbols_fd, which does not call malloc, only write. (and drop the mutex calls, signal handlers should not sleep)
EDIT
From what I can tell from the source for backtrace, it should be signal safe itself, though it is possible that you are overrunning your stack.
You may want to look at glibc's implementation for libsegfault to see how it handles this case