deadlock at pthread_cond_destroy() - linux

GDB output of my problem:
Program received signal SIGINT, Interrupt.
0x00007ffff7bcb86b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007ffff7bcb86b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007ffff7bc8bf7 in _L_lock_21 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007ffff7bc8a6e in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x0000000000400ab5 in control_destroy (mycontrol=0x6020c0) at control.c:20
#4 0x0000000000400f36 in cleanup_structs () at workcrew.c:160
#5 0x0000000000401027 in main () at workcrew.c:201
Note: the program runs successfully under Cygwin, but deadlocks on Ubuntu Linux.
All the worker threads have been joined before the deadlock occurs.
The source code is from web: http://www.ibm.com/developerworks/cn/linux/thread/posix_thread3/thread-3.tar.gz

The bug is in control.c:
int control_destroy(data_control *mycontrol) {
    int mystatus;
    if (pthread_cond_destroy(&(mycontrol->cond)))
        return 1;
    if (pthread_cond_destroy(&(mycontrol->cond)))
        return 1;
    mycontrol->active = 0;
    return 0;
}
This was, presumably, supposed to destroy the mutex and the condition variable. But instead it destroys the condition variable twice.
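A minimal sketch of what the function was presumably meant to do, destroying both objects (the name of the mutex member of data_control is an assumption here, following the layout used in the original tarball):

int control_destroy(data_control *mycontrol) {
    /* destroy the condition variable, then the mutex
       (the field name `mutex` is assumed, not taken from the post) */
    if (pthread_cond_destroy(&(mycontrol->cond)))
        return 1;
    if (pthread_mutex_destroy(&(mycontrol->mutex)))
        return 1;
    mycontrol->active = 0;
    return 0;
}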

Related

New Thread spawned by cudaMalloc | Behaviour?

cudaMalloc seemed to have spawned a thread when it was called, even though it's asynchronous. This was observed during debugging using cuda-gdb.
It also took a while to return.
The same thread exited, although as a different LWP, at the end of the program.
Can someone explain this behaviour?
The thread is not specifically spawned by cudaMalloc. The user side CUDA driver API library seems to spawn threads at some stage during lazy context setup which have the lifetime of the CUDA context. The exact processes are not publicly documented.
You see this associated with cudaMalloc because I would guess this is the first API call to trigger whatever setup/callbacks need to be done to make the userspace driver support work. You should notice that only the first call spawns a thread; subsequent calls do not. The threads stay alive for the lifetime of the CUDA context, after which they are terminated. You can trigger explicit thread destruction by calling cudaDeviceReset at any point in program execution.
Here is a trivial example which demonstrates cudaMemcpyToSymbol triggering the thread spawning from the driver API library, rather than cudaMalloc:
__device__ float someconstant;

int main()
{
    cudaSetDevice(0);
    const float x = 3.14159f;
    cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
    for(int i=0; i<10; i++) {
        int *x;
        cudaMalloc((void **)&x, size_t(1024));
        cudaMemset(x, 0, 1024);
        cudaFree(x);
    }
    return int(cudaDeviceReset());
}
In gdb I see this:
(gdb) tbreak main
Temporary breakpoint 1 at 0x40254f: file gdb_threads.cu, line 5.
(gdb) run
Starting program: /home/talonmies/SO/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at gdb_threads.cu:5
5 cudaSetDevice(0);
(gdb) next
6 const float x = 3.14159f;
(gdb) next
7 cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
(gdb) next
[New Thread 0x7ffff5eb5700 (LWP 14282)]
[New Thread 0x7fffed3ff700 (LWP 14283)]
8 for(int i=0; i<10; i++) {
(gdb) info threads
Id Target Id Frame
3 Thread 0x7fffed3ff700 (LWP 14283) "a.out" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
2 Thread 0x7ffff5eb5700 (LWP 14282) "a.out" 0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7ffff7fd1740 (LWP 14259) "a.out" main () at gdb_threads.cu:8
(gdb) thread apply all bt
Thread 3 (Thread 0x7fffed3ff700 (LWP 14283)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007ffff65cad97 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff659582d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bc182 in start_thread (arg=0x7fffed3ff700) at pthread_create.c:312
#5 0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7ffff5eb5700 (LWP 14282)):
#0 0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff65c9953 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff66571ae in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bc182 in start_thread (arg=0x7ffff5eb5700) at pthread_create.c:312
#5 0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7ffff7fd1740 (LWP 14259)):
#0 main () at gdb_threads.cu:8

How to understand "/proc/[pid]/stack"?

According to proc manual:
/proc/[pid]/stack (since Linux 2.6.29)
This file provides a symbolic trace of the function calls in
this process's kernel stack. This file is provided only if
the kernel was built with the CONFIG_STACKTRACE configuration
option.
So I wrote a program to test it:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
#include <pthread.h>

void *thread_func(void *p_arg)
{
    pid_t pid = fork();
    if (pid > 0) {
        wait(NULL);
        return 0;
    } else if (pid == 0) {
        sleep(1000);
        return 0;
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread_func, "Thread 1");
    pthread_create(&t2, NULL, thread_func, "Thread 2");
    sleep(1000);
    return 0;
}
After running it, use pstack to check the threads of the process:
linux-uibj:~ # pstack 24976
Thread 3 (Thread 0x7fd6e4ed5700 (LWP 24977)):
#0 0x00007fd6e528d3f4 in wait () from /lib64/libpthread.so.0
#1 0x0000000000400744 in thread_func ()
#2 0x00007fd6e52860a4 in start_thread () from /lib64/libpthread.so.0
#3 0x00007fd6e4fbb7fd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fd6e46d4700 (LWP 24978)):
#0 0x00007fd6e528d3f4 in wait () from /lib64/libpthread.so.0
#1 0x0000000000400744 in thread_func ()
#2 0x00007fd6e52860a4 in start_thread () from /lib64/libpthread.so.0
#3 0x00007fd6e4fbb7fd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fd6e569f700 (LWP 24976)):
#0 0x00007fd6e4f8d6cd in nanosleep () from /lib64/libc.so.6
#1 0x00007fd6e4f8d564 in sleep () from /lib64/libc.so.6
#2 0x00000000004007b1 in main ()
At the same time, check /proc/24976/stack:
linux-uibj:~ # cat /proc/24976/stack
[<ffffffff804ba1a7>] system_call_fastpath+0x16/0x1b
[<00007fd6e4f8d6cd>] 0x7fd6e4f8d6cd
[<ffffffffffffffff>] 0xffffffffffffffff
Process 24976 has 3 threads, and they all block in system calls (nanosleep and wait), so all 3 threads are now working in kernel space and have effectively become kernel threads, right? If this is true, there should be 3 stacks in the /proc/[pid]/stack file. But it seems there is only 1 stack in that file.
How should I understand /proc/[pid]/stack?
How should I understand /proc/[pid]/stack ?
Taken from the man pages for proc:
There are additional helpful pseudo-paths:
[stack]
The initial process's (also known as the main thread's) stack.
Just below this, you can find:
[stack:[tid]] (since Linux 3.4)
A thread's stack (where the [tid] is a thread ID).
It corresponds to the /proc/[pid]/task/[tid]/ path.
Which seems to be what you are looking for.
Nan Xiao is right.
A thread's kernel-mode stack is under /proc/[PID]/task/[TID]/stack.
You are checking /proc/[PID]/stack, which is the main thread's stack, so you only see one. The others are under the task directory.
That is for sleeping locks. You might also look at perf -g to see spin locks, including high system time.
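As an illustration (not from the answers above), a small C program that walks /proc/<pid>/task/ and dumps each thread's kernel-mode stack; it assumes a kernel built with CONFIG_STACKTRACE and sufficient privileges to read the files:

#include <dirent.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char path[64];
    if (argc < 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%s/task", argv[1]);
    DIR *dir = opendir(path);
    if (!dir) { perror("opendir"); return 1; }

    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (de->d_name[0] == '.')
            continue;                      /* skip "." and ".." */
        char stackpath[512];
        snprintf(stackpath, sizeof(stackpath),
                 "/proc/%s/task/%s/stack", argv[1], de->d_name);
        printf("=== TID %s ===\n", de->d_name);
        FILE *f = fopen(stackpath, "r");
        if (!f) { perror("fopen"); continue; }
        char line[256];
        while (fgets(line, sizeof(line), f))
            fputs(line, stdout);           /* dump this thread's kernel stack */
        fclose(f);
    }
    closedir(dir);
    return 0;
}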

libexpect sigalrm nanosleep crash

I'm working with libexpect, but if the read times out (expected return code EXP_TIMEOUT) I instead get a crash as follows.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f1366275bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f1366275bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f1366278fc8 in __GI_abort () at abort.c:89
#2 0x00007f13662b2e14 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f13663bf06b "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007f136634a7dc in __GI___fortify_fail (msg=<optimized out>) at fortify_fail.c:37
#4 0x00007f136634a6ed in ____longjmp_chk () at ../sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S:100
#5 0x00007f136634a649 in __longjmp_chk (env=0x1, val=1) at ../setjmp/longjmp.c:38
#6 0x00007f1366ed2a95 in ?? () from /usr/lib/libexpect.so.5.45
#7 <signal handler called>
#8 0x00007f1367334b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#9 0x000000000044cc13 in main (argc=3, argv=0x7fffca4013b8) at main_thread.c:6750
(gdb)
As you can see, I'm using nanosleep, which, unlike usleep and sleep, is not supposed to interact with signals (http://linux.die.net/man/2/nanosleep). As I understand it, libexpect uses SIGALRM to time out, but it's unclear to me how the two threads are interacting. If I had to guess, the expect call raises a SIGALRM and it interrupts the nanosleep call, but beyond that I don't know what's going on.
Thread 1:
while (stuff)
{
    //dothings
    struct timespec time;
    time.tv_sec = 0.25;          /* note: tv_sec is integral, so 0.25 truncates to 0 */
    time.tv_nsec = 250000000;    /* 250 ms */
    nanosleep(&time, NULL);
}
Thread 2:
switch(exp_expectl(fd, exp_glob, (char*)user_prompt, OK, exp_end))
{
case OK:
    DG_LOG_DEBUG("Received user prompt");
    break;
case EXP_TIMEOUT:
    DG_LOG_DEBUG("Expect timed out");
    goto error;
default:
    DG_LOG_DEBUG("Expect failed for unknown reasons");
    goto error;
}
I have done some reading about signals and sleep, but I've used sleep in multiple threads on many occasions and had no difficulties until now. What am I missing?
Edit: misc version info:
Ubuntu 14.04, kernel 3.13.0-44-generic
/usr/lib/libexpect.so.5.45
code is in C
compiler is gcc (-lexpect -ltcl)
include <tcl8.6/expect.h>
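A minimal sketch, assuming libexpect's timeout really is driven by SIGALRM: block SIGALRM in the sleeping thread with pthread_sigmask(), so the signal can only be delivered to the thread running exp_expectl (the loop body below is a placeholder, not code from the post):

#include <pthread.h>
#include <signal.h>
#include <time.h>

/* Sketch: block SIGALRM in this thread so that libexpect's timer signal is
 * delivered to the thread calling exp_expectl instead (an assumption, not
 * taken from the original post). */
static void *sleeper_thread(void *arg)
{
    (void)arg;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &set, NULL);   /* affects only this thread */

    struct timespec ts;
    ts.tv_sec = 0;                 /* tv_sec is integral */
    ts.tv_nsec = 250000000;        /* 250 ms */
    for (;;)
        nanosleep(&ts, NULL);
    return NULL;
}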

Terminate an ongoing QProcess that is running inside a QThread? [duplicate]

This question already has answers here:
Ensuring QProcess termination on termination of its parent QThread
(2 answers)
Closed 4 years ago.
How do I terminate an ongoing QProcess that is running inside a QThread and gets deleted by another QThread? I even inserted a QMutex, extCmdProcessLock, which should prevent destruction of the DbManager before the extCmdProcess can finish or time out.
I get a segmentation fault on "waitForStarted" if another thread calls delete on DbManager.
I cannot use signals (I think) because I use the external command inside a sequential data process.
Thank you very much for any help!
DbManager::extCmd(){
    ...
    QMutexLocker locker(&extCmdProcessLock);
    extCmdProcess = new QProcess(this);
    QString argStr = QString(" --p1=1")
                   + " --p2=3";
    extCmdProcess->start(cmd, argStr.split(QString(" ")));
    bool startedSuccessfully = extCmdProcess->waitForStarted();
    if (!startedSuccessfully) {
        extCmdProcess->close();
        extCmdProcess->kill();
        extCmdProcess->waitForFinished();
        delete extCmdProcess;
        extCmdProcess = NULL;
        return;
    }
    bool successfullyFinished = extCmdProcess->waitForFinished(-1);
    if (!successfullyFinished) {
        qDebug() << "finishing failed"; // Appendix C
        extCmdProcess->close();
        extCmdProcess->kill();
        extCmdProcess->waitForFinished(-1);
        delete extCmdProcess;
        extCmdProcess = NULL;
        return;
    }
    extCmdProcess->close();
    delete extCmdProcess;
    extCmdProcess = NULL;
}

DbManager::~DbManager(){
    qDebug() << "DB DbManager destructor called.";
    QMutexLocker locker(&extCmdProcessLock);
    if (extCmdProcess != NULL){
        this->extCmdProcess->kill(); // added after Appendix A
        this->extCmdProcess->waitForFinished();
    }
}
Appendix A: I also get the error "QProcess: Destroyed while process is still running." and I read that this could mean that the "delete dbmanager" call from my other thread is executed while the waitForStarted() command has not completed. But I really wonder why the kill() command in my destructor has not fixed this.
Appendix B: According to comment, added waitForFinished(). Sadly, the QProcess termination still does not get shutdown properly, the segmentation fault happens in waitForStarted() or as below in start() itself.
#0 0x00007f25e03a492a in QEventDispatcherUNIX::registerSocketNotifier () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#1 0x00007f25e0392d0b in QSocketNotifier::QSocketNotifier () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#2 0x00007f25e0350bf8 in ?? () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#3 0x00007f25e03513ef in ?? () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#4 0x00007f25e03115da in QProcess::start () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#5 0x0000000000428628 in DbManager::extCmd()
#6 0x000000000042ca06 in DbManager::storePos ()
#7 0x000000000044f51c in DeviceConnection::incomingData ()
#8 0x00000000004600fb in DeviceConnection::qt_metacall ()
#9 0x00007f25e0388782 in QObject::event () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#10 0x00007f25e0376e3f in QCoreApplicationPrivate::notify_helper () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#11 0x00007f25e0376e86 in QCoreApplication::notify () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#12 0x00007f25e0376ba4 in QCoreApplication::notifyInternal () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#13 0x00007f25e0377901 in QCoreApplicationPrivate::sendPostedEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#14 0x00007f25e03a4500 in QEventDispatcherUNIX::processEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#15 0x00007f25e0375e15 in QEventLoop::processEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#16 0x00007f25e0376066 in QEventLoop::exec () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#17 0x00007f25e0277715 in QThread::exec () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#18 0x00007f25e027a596 in ?? () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#19 0x00007f25df9b43f7 in start_thread () from /lib/libpthread.so.0
#20 0x00007f25def89b4d in clone () from /lib/libc.so.6
#21 0x0000000000000000 in ?? ()
Appendix C: The debug output showed me that the error message QProcess: Destroyed while process is still running. always appears when the finishing failed output appears. This means that my locks and/or kill attempts to protect the QProcess are failing.
Questions I wonder about:
a) If I create a QProcess object and start it, is my extCmdProcessLock unlocked? I already tried a normal lock() call instead of the QMutexLocker, but no luck.
b) The docs say the main thread will be stopped if I use QProcess this way. Do they really mean the main thread, or the thread in which the QProcess is started? I assumed the latter.
c) Is QProcess not usable in a multithreaded environment? If two threads create a QProcess object and run it, do they interfere? Maybe the object is somehow static?
Thanks for any help in filling the knowledge gaps. I really hope to get this puzzle solved.
Appendix D: After removing any delete and deleteLater() from any thread, my QProcess still gets smashed.
#0 0x00007fc94e9796b0 in QProcess::setProcessState () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#1 0x00007fc94e97998b in QProcess::waitForStarted () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#2 0x00007fc94e979a12 in QProcess::waitForFinished () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#3 0x0000000000425681 in DbManager::extCmd()
#4 0x0000000000426fb6 in DbManager::storePos ()
#5 0x000000000044d51c in DeviceConnection::incomingData ()
#6 0x000000000045fb7b in DeviceConnection::qt_metacall ()
#7 0x00007fc94e9f4782 in QObject::event () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#8 0x00007fc94e9e2e3f in QCoreApplicationPrivate::notify_helper () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#9 0x00007fc94e9e2e86 in QCoreApplication::notify () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#10 0x00007fc94e9e2ba4 in QCoreApplication::notifyInternal () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#11 0x00007fc94e9e3901 in QCoreApplicationPrivate::sendPostedEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#12 0x00007fc94ea10500 in QEventDispatcherUNIX::processEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#13 0x00007fc94e9e1e15 in QEventLoop::processEvents () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#14 0x00007fc94e9e2066 in QEventLoop::exec () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#15 0x00007fc94e8e3715 in QThread::exec () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#16 0x00007fc94e8e6596 in ?? () from /usr/local/Trolltech/Qt-4.7.4/lib/libQtCore.so.4
#17 0x00007fc94e0203f7 in start_thread () from /lib/libpthread.so.0
#18 0x00007fc94d5f5b4d in clone () from /lib/libc.so.6
#19 0x0000000000000000 in ?? ()
It is really bad style to use a QThread to manage a running process. I see this again and again, and it reflects a fundamental misunderstanding of how to write asynchronous applications properly. Processes are separate from your own application. QProcess provides a beautiful set of signals to notify you when it has successfully started, failed to start, and finished. Simply hook those signals to slots in an instance of a QObject-derived class of yours, and you'll be all set.
It's bad design if the number of threads in your application can significantly exceed the number of cores/hyperthreads available on the platform, or if the number of threads is tied to some unrelated runtime factor like the number of running subprocesses.
See my other answer.
You can create QProcess on the heap, as a child of your monitoring QObject. You could connect QProcess's finished() signal to its own deleteLater() slot, so that it will automatically delete itself when it's done. The monitoring QObject should forcibly terminate any remaining running processes when it gets itself destroyed, say as a result of your application shutting down.
A further part of the question was how to execute uncontrollably long-running functions, say database queries for which there is no asynchronous API, with minimal impact, when they are interspersed with things for which there is a good asynchronous API, such as QProcess.
A canonical way would be: do things synchronously where you must, asynchronously otherwise. You can stop the controlling object, and any running process, by invoking its deleteLater() slot, either via a signal/slot connection or using QMetaObject::invokeMethod() if you want to do it directly while safely crossing the thread boundary. This is the major benefit of using as few blocking calls as possible: you have some control over the processing and can stop it some of the time. With a purely blocking implementation, there's no way to stop it short of using some flag variables and sprinkling your code with tests for them.
The deleteLater() will get processed any time the event loop can spin in the thread where a QObject lives. This means that it will get a chance between the database query calls -- any time when the process is running, in fact.
Untested code:
class Query : public QObject
{
    Q_OBJECT
public:
    Query(QObject * parent = 0) : QObject(parent) {
        connect(&process, SIGNAL(error(QProcess::ProcessError)), SLOT(error()));
        connect(&process, SIGNAL(finished(int,QProcess::ExitStatus)), SLOT(finished(int,QProcess::ExitStatus)));
    }
    ~Query() { process.kill(); }
    void start() {
        QTimer::singleShot(0, this, SLOT(slot1()));
    }
protected slots:
    void slot1() {
        // do a database query
        process.start(....);
        next = &Query::slot2;
    }
protected:
    // slot2 and slot3 don't have to be slots
    void slot2() {
        if (result == Error) {...}
        else {...}
        // another database query
        process.start(...); // yet another process gets fired
        next = &Query::slot3;
    }
    void slot3() {
        if (result == Error) {...}
        deleteLater();
    }
protected slots:
    void error() {
        result = Error;
        (this->*next)();
    }
    void finished(int code, QProcess::ExitStatus status) {
        result = Finished;
        exitCode = code;
        exitStatus = status;
        (this->*next)();
    }
private:
    QProcess process;
    enum { Error, Finished } result;
    int exitCode;
    QProcess::ExitStatus exitStatus;
    void (Query::* next)();
};
Personally, I'd check whether the database you're using has an asynchronous API. If it doesn't, but the client library has sources available, I'd do a minimal port to use Qt's networking stack to make it asynchronous. It would lower the overheads because you'd no longer have one thread per database connection, and as you got closer to saturating the CPU, the overheads wouldn't rise: ordinarily, to saturate the CPU you'd need many, many threads, since they mostly idle. With an asynchronous interface, the number of context switches would go down, since a thread would process one packet of data from the database and could immediately process another packet from a different connection, without having to do a context switch: the execution stays within the event loop of that thread.
QProcess::waitForStarted() just signals that your process has started. The mutex in the extCmd() method then gets unlocked because you are not waiting on QProcess::waitForFinished() in this method, so you exit the method while the child process is still running.
If you want a fire-and-forget type of execution, I suggest you use QProcess::startDetached().

what makes backtrace() crash(SIGSEGV ) on Linux 64 bit

I am developing an application on Linux where I want a backtrace of all running threads at a particular frequency, so my user-defined signal handler for SIGUSR1 (installed for all threads) calls backtrace().
I am getting a crash (SIGSEGV) in my signal handler which originates from the backtrace() call. I have passed the correct arguments to the function as specified on most of the sites:
http://linux.die.net/man/3/backtrace
What could make backtrace() crash in this case?
To add more details:
What makes me conclude that the crash is inside backtrace is frame 14 below. onMySignal is the SIGUSR1 signal handler, and it calls backtrace.
Sample code of onMySignal (copied from the Linux documentation of backtrace):
pthread_mutex_lock( &sig_mutex );
int j, nptrs;
#define SIZE 100
void *buffer[100] = {NULL};//or void *buffer[100];
char **strings;
nptrs = backtrace(buffer, SIZE);
pthread_mutex_unlock( &sig_mutex );
(gdb) where
#0 0x00000037bac0e9dd in raise () from
#1 0x00002aaabda936b2 in skgesigOSCrash () from
#2 0x00002aaabdd31705 in kpeDbgSignalHandler ()
#3 0x00002aaabda938c2 in skgesig_sigactionHandler ()
#4 <signal handler called>
#5 0x00000037ba030265 in raise () from
#6 0x00000037ba031d10 in abort () from
#7 0x00002b6cef82efd7 in os::abort(bool) () from
#8 0x00002b6cef98205d in VMError::report_and_die() ()
#9 0x00002b6cef835655 in JVM_handle_linux_signal ()
#10 0x00002b6cef831bae in signalHandler(int, siginfo*, void*) ()
#11 <signal handler called>
#12 0x00000037be407638 in ?? ()
#13 0x00000037be4088bb in _Unwind_Backtrace ()
#14 0x00000037ba0e5fa8 in backtrace ()
#15 0x00002aaaaae3875f in onMySignal (signum=10,info=0x4088ec80, context=0x4088eb50)
#16 <signal handler called>
#17 0x00002aaab4aa8acb in mxSession::setPartition(int)
#18 0x0000000000000001 in ?? ()
#19 0x0000000000000000 in ?? ()
(gdb)
I hope this makes the issue clearer.
@janneb
I have wrapped the signal handler implementation in a mutex lock for better synchronization.
@janneb
I did not find anything in the documentation specifying whether the backtrace_symbols/backtrace APIs are async-signal-safe, or whether they should be used in a signal handler.
Still, I removed backtrace_symbols from my signal handler and don't use it anywhere, but my actual problem of the crash in backtrace() persists, and I have no clue why it is crashing.
Edit 23/06/11: more details:
(gdb) where
#0 0x00000037bac0e9dd in raise () from
#1 0x00002aaab98a36b2 in skgesigOSCrash () from
#2 0x00002aaab9b41705 in kpeDbgSignalHandler () from
#3 0x00002aaab98a38c2 in skgesig_sigactionHandler () from
#4 <signal handler called>
#5 0x00000037ba030265 in raise () from
#6 0x00000037ba031d10 in abort () from
#7 0x00002ac003803fd7 in os::abort(bool) () from
#8 0x00002ac00395705d in VMError::report_and_die() () from
#9 0x00002ac00380a655 in JVM_handle_linux_signal () from
#10 0x00002ac003806bae in signalHandler(int, siginfo*, void*) () from
#11 <signal handler called>
#12 0x00000037be407638 in ?? () from libgcc_s.so.1
#13 0x00000037be4088bb in _Unwind_Backtrace () from libgcc_s.so.1
#14 0x00000037ba0e5fa8 in backtrace () from libc.so.6
#15 0x00002aaaaae3875f in onMyBacktrace (signum=10, info=0x415d0eb0, context=0x415d0d80)
#16 <signal handler called>
#17 0x00000037ba071fa8 in _int_free () from libc.so.6
#18 0x00000000000007e0 in ?? ()
#19 0x000000005aab01a0 in ?? ()
#20 0x000000000000006f in ?? ()
#21 0x00000037ba075292 in realloc () from libc.so.6
#22 0x00002aaab6248c4e in Memory::reallocMemory(void*, unsigned long, char const*, int) ()
The crash occurred while realloc was executing, and one of the addresses (0x00000000000007e0) looks invalid.
The documentation for signal handling defines the list of functions that are safe to call from a signal handler; you must not use any other functions, including backtrace. (Search for async-signal-safe in that document.)
What you can do is write to a pipe you have previously set up, and have a thread waiting on that pipe which then does the backtrace.
EDIT:
OK, so the backtrace function returns the current thread's stack, so it can't be used from another thread, and my idea of using a separate thread to do the backtrace won't work.
Therefore: you could try backtrace_symbols_fd from your signal handler.
As an alternative you could use gdb to get the backtrace, without having to have code in your program - and gdb can handle multiple threads easily.
Shell script to run gdb and get back traces:
#!/bin/bash
PID="$1"
[ -d "/proc/$PID" ] || PID=$(pgrep $1)
[ -d "/proc/$PID" ] || { echo "Can't find process: $PID" >&2 ; exit 1 ; }
[ -d "$TMPDIR" ] || TMPDIR=/tmp
BATCH=$(mktemp $TMPDIR/pstack.gdb.XXXXXXXXXXXXX)
echo "thread apply all bt" >"$BATCH"
echo "quit" >>"$BATCH"
gdb "/proc/$PID/exe" "$PID" -batch -x "$BATCH" </dev/null
rm "$BATCH"
As stated by Douglas Leeder, backtrace isn't on the list of signal-safe calls, though in this case I suspect the problem is the malloc done by backtrace_symbols. Try using backtrace_symbols_fd, which does not call malloc, only write (and drop the mutex calls; signal handlers should not sleep).
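As a sketch of that suggestion (not the poster's actual handler), the handler can stay within backtrace() and backtrace_symbols_fd(), which writes straight to a file descriptor and avoids malloc:

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

#define BT_DEPTH 100

/* Sketch: no malloc, no locking; backtrace() fills a local buffer and
 * backtrace_symbols_fd() writes the symbolized frames directly to stderr. */
static void on_sigusr1(int signum, siginfo_t *info, void *context)
{
    (void)signum; (void)info; (void)context;
    void *buffer[BT_DEPTH];
    int nptrs = backtrace(buffer, BT_DEPTH);
    backtrace_symbols_fd(buffer, nptrs, STDERR_FILENO);
}

One remaining caveat: glibc's backtrace() may load libgcc on first use, so calling it once during startup, before installing the handler, is a common precaution.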
EDIT
From what I can tell from the source for backtrace, it should be signal safe itself, though it is possible that you are overrunning your stack.
You may want to look at glibc's implementation of libSegFault to see how it handles this case.

Resources