Do QThreads run in parallel? - multithreading

I have two threads running, and they simply print a message. Here is a minimal example.
Here is my Header.h:
#pragma once
#include <QtCore/QThread>
#include <QtCore/QDebug>
class WorkerOne : public QObject {
Q_OBJECT
public Q_SLOTS:
void printFirstMessage() {
while (1) {
qDebug() << "<<< Message from the FIRST worker" << QThread::currentThreadId();
}
}
};
class WorkerTwo : public QObject {
Q_OBJECT
public Q_SLOTS:
void printSecondMessage() {
while (1) {
qDebug() << ">>> Message from the SECOND worker" << QThread::currentThreadId();
}
}
};
And, of course, my main:
#include <QtCore/QCoreApplication>
#include "Header.h"
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
WorkerOne kek1;
QThread t1;
kek1.moveToThread(&t1);
t1.setObjectName("FIRST THREAD");
QThread t2;
WorkerTwo kek2;
kek2.moveToThread(&t2);
t2.setObjectName("SECOND THREAD");
QObject::connect(&t1, &QThread::started, &kek1, &WorkerOne::printFirstMessage);
QObject::connect(&t2, &QThread::started, &kek2, &WorkerTwo::printSecondMessage);
t1.start();
t2.start();
return a.exec();
}
When I start the application, I see the expected output:
As you can see, the thread IDs are different. I print them to be sure the workers are running on different threads.
I set a single breakpoint in printFirstMessage and run the application in debug mode, attached to the debugger. Once the debugger stops at my breakpoint, I wait for a while and press Continue, so the debugger stops at the same breakpoint again.
What do I expect to see? I expect to see only one <<< Message from the FIRST worker and a lot of messages from the second worker. But what do I see? I see only two messages: the first one from the first worker and the second one from the second worker.
I pressed Continue many times and the result is more or less the same. That's weird to me, because I expected the second thread to keep running while the first one is stopped by the debugger.
I decided to test it using std::thread and wrote the following code:
#include <thread>
#include <iostream>
void foo1() {
while (true) {
std::cout << "Function ONE\n";
}
}
void foo2() {
while (true) {
std::cout << "The second function\n";
}
}
int main() {
std::thread t1(&foo1);
std::thread t2(&foo2);
t1.join();
t2.join();
}
I set a breakpoint in the first function and start the app. After it stops at the breakpoint, I hit Continue and see that the console contains a lot of messages from the second function and only one from the first function (exactly what I expected with QThread as well):
Could someone explain how this works with QThread? By the way, I tested it using QtConcurrent::run instead of QThread and the result was as expected: the second function keeps running while the first one is stopped at a breakpoint.

Yes, multiple QThread instances are allowed to run in parallel. Whether they effectively run in parallel is up to your OS and depends on multiple factors:
The number of physical (and logical) CPU cores. This is typically not more than 4 or 8 on a consumer computer. This is the maximum number of threads (including the threads of other programs and your OS itself) that can effectively run in parallel. The number of cores is much lower than the number of threads typically running on a computer. If your computer has only one core, you will still be able to use multiple QThreads, but the OS scheduler will alternate between executing those threads. QThread::idealThreadCount can be used to query the number of (logical) CPU cores.
Each thread has a QThread::Priority. The OS thread scheduler may use this value to prioritize (or de-prioritize) one thread over another. A thread with a lower priority may get less CPU time than a thread with a higher priority when the CPU cores are busy.
The (workload on the) other threads that are currently running.
Debugging your program definitely alters the normal execution of a multithreaded program:
Interrupting and continuing a thread has a certain overhead. In the meantime, the other threads may still/already perform some operations.
As pointed out by G.M., most of the time all threads are interrupted when a breakpoint is hit. How fast the other threads are interrupted is not well defined.
Often a debugger has a configuration option to allow interrupting a single thread while the others continue running; see e.g. this question.
The number of loops that are executed while the other thread is interrupted/started again, depends on the number of CPU instructions that are needed to perform a single loop. Calling qDebug() and QThread::currentThreadId() is definitely slower than a single std::cout.
Conclusion: You don't have any hard guarantee about the scheduling of a thread. However, in normal operation, both threads will get almost the same amount of CPU time on average, as the OS scheduler has no reason to favor one over the other. Using a debugger completely alters this normal behavior.
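As a rough illustration in standard C++ rather than Qt, the sketch below queries the number of hardware threads and lets two busy loops run side by side for a short time. The names hardwareThreads and runTwoCounters are made up for this example, and std::thread::hardware_concurrency() is used here as an approximate counterpart of QThread::idealThreadCount.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Number of hardware threads reported by the standard library
// (may be 0 if the value cannot be determined).
unsigned hardwareThreads() {
    return std::thread::hardware_concurrency();
}

// Runs two busy loops on separate threads for `duration` and returns
// how many iterations each loop completed.
std::pair<unsigned long long, unsigned long long>
runTwoCounters(std::chrono::milliseconds duration) {
    std::atomic<bool> stop{false};
    unsigned long long first = 0;
    unsigned long long second = 0;

    // Each thread owns exactly one counter, so the counters need no locking.
    auto spin = [&stop](unsigned long long& counter) {
        do {
            ++counter;
        } while (!stop.load(std::memory_order_relaxed));
    };

    std::thread t1(spin, std::ref(first));
    std::thread t2(spin, std::ref(second));
    std::this_thread::sleep_for(duration);
    stop.store(true);
    t1.join();
    t2.join();
    return {first, second};
}
```

Without a debugger attached, the two counts usually end up in the same order of magnitude, although nothing guarantees it. Stopping one thread at a breakpoint is exactly the kind of disturbance that skews them.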

Related

Is the following code thread unsafe? If so, how can I make a possible result more likely to come out?

Is the screen output of the following program deterministic? My understanding is that it is not, as it could be either 1 or 2 depending on whether the latest thread to pick up the value of i picks it up before or after the other thread has written 1 into it.
On the other hand, I keep seeing the same output, as if each thread waits for the previous one to finish: I get 2 on screen in this case, or 100 if I create similar threads from t1 to t100 and join them all.
If the answer is no, the result is not deterministic, is there a way with a simple toy program to increase the odds that one of the other possible results comes out?
#include <iostream>
#include <thread>
int main() {
int i = 0;
std::thread t1([&i](){ ++i; });
std::thread t2([&i](){ ++i; });
t1.join();
t2.join();
std::cout << i << '\n';
}
(I'm compiling and running it like this: g++ -std=c++11 -lpthread prova.cpp -o exe && ./exe.)
You are always seeing the same result because the first thread most likely starts and finishes its work before the second one begins. This narrows the window for a race condition to occur.
But ultimately, there is still a chance that it occurs because the ++ operation is not atomic (read value, then increment, then write).
If the two threads start at the same time (e.g. thread 1 is slowed down because the CPU is busy), they can read the same value and the final result will be 1.
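One way to widen that window in a toy program is to repeat the increment many times per thread. The sketch below is an illustration under assumptions, not the asker's code: racyCount and atomicCount are hypothetical names, and the racy version spells out the read and the write as two separate atomic operations so that the example itself stays free of undefined behavior while still losing updates.

```cpp
#include <atomic>
#include <thread>

// Each of two threads performs `iterations` increments as a separate
// load and store, mimicking the read / increment / write steps of ++i.
// std::atomic is used only to keep the example well defined; the
// increment as a whole is still not atomic, so updates can be lost.
int racyCount(int iterations) {
    std::atomic<int> i{0};
    auto work = [&i, iterations] {
        for (int n = 0; n < iterations; ++n) {
            int value = i.load(std::memory_order_relaxed);  // read
            i.store(value + 1, std::memory_order_relaxed);  // write
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return i.load();
}

// Same workload, but each increment is one indivisible operation.
int atomicCount(int iterations) {
    std::atomic<int> i{0};
    auto work = [&i, iterations] {
        for (int n = 0; n < iterations; ++n) {
            i.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return i.load();
}
```

With a large iteration count, racyCount typically returns less than twice the count on a multi-core machine, while atomicCount always returns exactly twice the count.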

How to unblock a QThread running a pcsc call?

I have a Qt application that connects to a card reader using various pcsc implementations under GNU/Linux, MacOS, and Windows. All communication with the card runs in a worker thread.
In one scenario, the user starts an operation requiring communication with the card via a card reader. The card reader has a keyboard and during the authentication procedure the user must enter their PIN on the reader's keyboard.
This operation is implemented by a call to SCardControl() (see e.g. the Microsoft documentation). As long as the user is working with the reader, the call to SCardControl() does not terminate and the worker thread is blocked by it.
At this point, the user might decide to close the application while the operation is still pending. Closing the application at this point causes the application to crash (on Linux with signal SIGABRT) because:
The worker thread is blocked waiting for SCardControl() to return.
The main thread cannot stop the blocked thread: neither quit() nor terminate() cause the thread to finish.
When the application exits, the QThread object for the worker thread is destroyed and, since the thread is still in a running state, it raises a signal to indicate an error.
I have tried several solutions.
Subclass QThread and create a worker thread which calls setTerminationEnabled(true); to allow termination through QThread::terminate(). This does not work on MacOS: when QThread is destroyed, the thread is still in a running state and the signal SIGABRT is emitted.
Handle signal SIGABRT on shutdown and ignore it. This did not seem to be a good idea but I wanted to try it out before discarding it. After ignoring signal SIGABRT, a signal SIGSEGV is received and the application crashes. I had adapted the approach described here.
Try to unblock the thread by sending a command to the card reader from the main thread. I tried SCardCancel(), SCardDisconnect() and SCardReleaseContext() but none of these commands has any effect on the blocked thread.
I find it quite strange that it is not possible to cleanly shutdown an application when a thread is blocked on some function call, but all the solutions I have tried have not worked and I have run out of ideas. Did I overlook something? Does anybody have any useful hint?
EDIT
I looked into the Qt source code for QThread and found out that on Unix-like platforms QThread::terminate() uses pthread_cancel() internally. But apparently pthread_cancel() does not work / does nothing on Darwin, see e.g. here and here.
So, maybe I will really have to go with the option of showing a dialog to the user asking to remove the card from the reader.
Cleanly shutting down a thread from outside is not possible if it is blocked in a call. You can, however, prevent the user from quitting the application before the operation has completed.
void MainWindow::closeEvent(QCloseEvent *closeEvent) {
if (workerBlocked) closeEvent->ignore();
}
In addition, you can show a dialog telling the user the operation has to be completed first.
Also, if possible, you can let the window close but keep the application alive until the operation is complete by setting qApp->setQuitOnLastWindowClosed(false);
The problem boils down to the fact that a QThread object isn't destructible while the associated thread is running. Usually, it would print a statement like this to the debug output:
QThread: Destroyed while thread is still running
Don't agonize over trying to get SCardControl to return so that the worker thread can be quit safely (since it doesn't return as long as the user is interacting with the reader). Instead, you can follow this answer to destruct the QThread object in a safe manner with a minimum amount of changes to your current implementation.
Here is an example that shows what I mean:
#include <QtWidgets>
//a thread that can be destroyed at any time
//see http://stackoverflow.com/a/25230470
class SafeThread : public QThread{
using QThread::run;
public:
explicit SafeThread(QObject* parent= nullptr):QThread(parent){}
~SafeThread(){ quit(); wait(); }
};
//worker QObject class
class Worker : public QObject {
Q_OBJECT
public:
explicit Worker(QObject* parent = nullptr):QObject(parent){}
~Worker(){}
Q_SLOT void doBlockingWork() {
emit started();
//the sleep call blocks the worker thread for 10 seconds!
//consider it a mock call to the SCardControl function
QThread::sleep(10);
emit finished();
}
Q_SIGNAL void started();
Q_SIGNAL void finished();
};
int main(int argc, char* argv[]) {
QApplication a(argc, argv);
//setup worker thread and QObject
Worker worker;
SafeThread thread;
worker.moveToThread(&thread);
thread.start();
//setup GUI components
QWidget w;
QVBoxLayout layout(&w);
QPushButton button("start working");
QLabel status("idle");
layout.addWidget(&button);
layout.addWidget(&status);
//connect signals/slots
QObject::connect(&worker, &Worker::started, &status,
[&status]{ status.setText("working. . .");} );
QObject::connect(&worker, &Worker::finished, &status,
[&status]{ status.setText("idle");} );
QObject::connect(&button, &QPushButton::clicked, &worker, &Worker::doBlockingWork);
w.show();
return a.exec();
}
#include "main.moc"
Notice that the SafeThread's destructor makes sure to wait() until the associated thread has finished execution. And only afterwards, the main thread can proceed to call QThread's destructor.
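The same idea can be sketched in standard C++, where destroying a joinable std::thread calls std::terminate. The wrapper below is a hypothetical analogue of SafeThread, not part of Qt or of the answer's code; JoiningThread and runAndWait are names invented for this illustration.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical RAII wrapper: joins the thread on destruction, so the
// std::thread destructor never sees a joinable thread.
class JoiningThread {
public:
    explicit JoiningThread(std::function<void()> fn) : thread_(std::move(fn)) {}
    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;
    ~JoiningThread() {
        if (thread_.joinable()) {
            thread_.join();  // blocks until the work has finished
        }
    }

private:
    std::thread thread_;
};

// Starts a short task and returns only after the wrapper's destructor
// has joined the thread.
bool runAndWait() {
    std::atomic<bool> done{false};
    {
        JoiningThread worker([&done] {
            // Stand-in for a blocking call that eventually returns.
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            done = true;
        });
    }  // destructor joins here
    return done;
}
```

As with SafeThread, the destructor waits for as long as the blocking call takes, so this only gives a clean shutdown if that call eventually returns.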

High availability computing: How to deal with a non-returning system call, without risking false positives?

I have a process that's running on a Linux computer as part of a high-availability system. The process has a main thread that receives requests from the other computers on the network and responds to them. There is also a heartbeat thread that sends out multicast heartbeat packets periodically, to let the other processes on the network know that this process is still alive and available -- if they don't hear any heartbeat packets from it for a while, one of them will assume this process has died and will take over its duties, so that the system as a whole can continue to work.
This all works pretty well, but the other day the entire system failed, and when I investigated why I found the following:
Due to (what is apparently) a bug in the box's Linux kernel, there was a kernel "oops" induced by a system call that this process's main thread made.
Because of the kernel "oops", the system call never returned, leaving the process's main thread permanently hung.
The heartbeat thread, OTOH, continued to operate correctly, which meant that the other nodes on the network never realized that this node had failed, and none of them stepped in to take over its duties... and so the requested tasks were not performed and the system's operation effectively halted.
My question is, is there an elegant solution that can handle this sort of failure? (Obviously one thing to do is fix the Linux kernel so it doesn't "oops", but given the complexity of the Linux kernel, it would be nice if my software could handle future other kernel bugs more gracefully as well).
One solution I don't like would be to put the heartbeat generator into the main thread, rather than running it as a separate thread, or in some other way tie it to the main thread so that if the main thread gets hung up indefinitely, heartbeats won't get sent. The reason I don't like this solution is because the main thread is not a real-time thread, and so doing this would introduce the possibility of occasional false-positives where a slow-to-complete operation was mistaken for a node failure. I'd like to avoid false positives if I can.
Ideally there would be some way to ensure that a failed syscall either returns an error code, or if that's not possible, crashes my process; either of those would halt the generation of heartbeat packets and allow a failover to proceed. Is there any way to do that, or does an unreliable kernel doom my user process to unreliability as well?
My second suggestion is to use ptrace to find the current instruction pointer. You can have a parent thread that ptraces your process and interrupts it every second to check the current RIP value. This is somewhat complex, so I've written a demonstration program: (x86_64 only, but that should be fixable by changing the register names.)
#define _GNU_SOURCE
#include <unistd.h>
#include <sched.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <linux/ptrace.h>
#include <sys/user.h>
#include <time.h>
// this number is arbitrary - find a better one.
#define STACK_SIZE (1024 * 1024)
int main_thread(void *ptr) {
// "main" thread is now running under the monitor
printf("Hello from main!");
while (1) {
int c = getchar();
if (c == EOF) { break; }
nanosleep(&(struct timespec) {0, 200 * 1000 * 1000}, NULL);
putchar(c);
}
return 0;
}
int main(int argc, char *argv[]) {
void *vstack = malloc(STACK_SIZE);
pid_t v;
if (clone(main_thread, vstack + STACK_SIZE, CLONE_PARENT_SETTID | CLONE_FILES | CLONE_FS | CLONE_IO, NULL, &v) == -1) { // you'll want to check these flags
perror("failed to spawn child task");
return 3;
}
printf("Target: %d; %d\n", v, getpid());
long ptv = ptrace(PTRACE_SEIZE, v, NULL, NULL);
if (ptv == -1) {
perror("failed monitor seize");
exit(1);
}
struct user_regs_struct regs;
fprintf(stderr, "beginning monitor...\n");
while (1) {
sleep(1);
long ptv = ptrace(PTRACE_INTERRUPT, v, NULL, NULL);
if (ptv == -1) {
perror("failed to interrupt main thread");
break;
}
int status;
if (waitpid(v, &status, __WCLONE) == -1) {
perror("target wait failed");
break;
}
if (!WIFSTOPPED(status)) { // this section is messy. do it better.
fputs("target wait went wrong", stderr);
break;
}
if ((status >> 8) != (SIGTRAP | PTRACE_EVENT_STOP << 8)) {
fputs("target wait went wrong (2)", stderr);
break;
}
ptv = ptrace(PTRACE_GETREGS, v, NULL, &regs);
if (ptv == -1) {
perror("failed to peek at registers of thread");
break;
}
fprintf(stderr, "%ld -> RIP %llx RSP %llx\n", (long)time(NULL), regs.rip, regs.rsp);
ptv = ptrace(PTRACE_CONT, v, NULL, NULL);
if (ptv == -1) {
perror("failed to resume main thread");
break;
}
}
return 2;
}
Note that this is not production-quality code. You'll need to do a bunch of fixing things up.
Based on this, you should be able to figure out whether or not the program counter is advancing, and could combine this with other pieces of information (such as /proc/PID/status) to find if it's busy in a system call. You might also be able to extend the usage of ptrace to check what system calls are being used, so that you can check if it's a reasonable one to be waiting on.
This is a hacky solution, but I don't think that you'll find a non-hacky solution for this problem. Despite the hackiness, I don't think (this is untested) that it would be particularly slow; my implementation pauses the monitored thread once per second for a very short amount of time - which I would guess would be in the 100s of microseconds range. That's around 0.01% efficiency loss, theoretically.
I think you need a shared activity marker.
Have the main thread (or in a more general application, all worker threads) update the shared activity marker with the current time (or clock tick, e.g. by computing the "current" nanosecond from clock_gettime(CLOCK_MONOTONIC, ...)), and have the heartbeat thread periodically check when this activity marker was last updated, cancelling itself (and thus stopping the heartbeat broadcast) if there has not been any activity update within a reasonable time.
This scheme can easily be extended with a state flag if the workload is very sporadic. The main work thread sets the flag and updates the activity marker when it begins a unit of work, and clears the flag when the work has completed. If there is no work being done then the heartbeat is sent without checking the activity marker. If work is being done then the heartbeat is stopped if the time since the activity marker was updated exceeds the maximum processing time allowed for a unit of work. (Multiple worker threads each need their own activity marker and flag in this case, and the heartbeat thread can be designed to stop when any one worker thread gets stuck, or only when all worker threads get stuck, depending on their purposes and importance to the overall system).
(The activity marker value (and the work flag) will of course have to be protected by a mutex that must be acquired before reading or writing the value.)
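A minimal sketch of the flag-plus-marker variant, written in C++ under a few assumptions: ActivityMarker and its member names are hypothetical, the clock is std::chrono::steady_clock (monotonic, like CLOCK_MONOTONIC), and the caller passes in the current time so the logic can be exercised without waiting.

```cpp
#include <chrono>
#include <mutex>

// Hypothetical helper: the worker records its activity, and the
// heartbeat thread asks whether a heartbeat should still be sent.
class ActivityMarker {
public:
    using Clock = std::chrono::steady_clock;

    // Worker: called when a unit of work begins.
    void beginWork(Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        working_ = true;
        lastActivity_ = now;
    }

    // Worker: called when the unit of work has completed.
    void endWork(Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        working_ = false;
        lastActivity_ = now;
    }

    // Heartbeat thread: true while the worker is idle, or has been busy
    // for no longer than maxWorkTime.
    bool shouldSendHeartbeat(Clock::time_point now,
                             Clock::duration maxWorkTime) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!working_) {
            return true;
        }
        return now - lastActivity_ <= maxWorkTime;
    }

private:
    mutable std::mutex mutex_;  // protects both fields below
    bool working_ = false;
    Clock::time_point lastActivity_{};
};
```

The heartbeat thread would call shouldSendHeartbeat(Clock::now(), limit) before each packet and stop broadcasting once it returns false. Choosing the limit is the hard part: too short brings back the false positives, too long delays failover.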
Perhaps the heartbeat thread can also cause the whole process to commit suicide (e.g. kill(getpid(), SIGQUIT)) so that it can be restarted by having it be called in a loop in a wrapper script, especially if a process restart clears the condition in the kernel which would cause the problem in the first place.
One possible method would be to have another set of heartbeat messages from the main thread to the heartbeat thread. If it stops receiving messages for a certain amount of time, it stops sending them out as well. (And could try other recovery such as restarting the process.)
To solve the issue of the main thread actually just being in a long sleep, have a (properly-synchronized) flag that the heartbeat thread sets when it has decided that the main thread must have failed - and the main thread should check this flag at appropriate times (e.g. after the potential wait) to make sure that it hasn't been reported as dead. If it has, it stops running, because its job would have already been taken up by a different node.
The main thread can also send I-am-alive events to the heartbeat thread at other times than once around the loop - for example, if it's going into a long-running operation. Without this, there's no way to tell the difference between a failed main thread and a sleeping main thread.

Long-running / blocking operations in boost asio handlers

Current Situation
I implemented a TCP server using boost.asio which currently uses a single io_service object on which I call the run method from a single thread.
So far the server was able to answer the requests of the clients immediately, since it had all necessary information in the memory (no long-running operations in the receive handler were necessary).
Problem
Now requirements have changed and I need to get some information out of a database (with ODBC) - which is basically a long-running blocking operation - in order to create the response for the clients.
I see several approaches, but I don't know which one is best (and there are probably even more approaches):
First Approach
I could keep the long running operations in the handlers, and simply call io_service.run() from multiple threads. I guess I would use as many threads as I have CPU cores available?
While this approach would be easy to implement, I don't think I would get the best performance with this approach because of the limited number of threads (which are idling most of the time since database access is more an I/O-bound operation than a compute-bound operation).
Second Approach
In section 6 of this document it says:
Use threads for long running tasks
A variant of the single-threaded design, this design still uses a single io_service::run() thread for implementing protocol logic. Long running or blocking tasks are passed to a background thread and, once completed, the result is posted back to the io_service::run() thread.
This sounds promising, but I don't know how to implement that. Can anyone provide some code snippet / example for this approach?
Third Approach
Boris Schäling explains in section 7.5 of his boost introduction how to extend boost.asio with custom services.
This looks like a lot of work. Does this approach have any benefits compared to the other approaches?
The approaches are not explicitly mutually exclusive. I often see a combination of the first and second:
One or more thread are processing network I/O in one io_service.
Long running or blocking tasks are posted into a different io_service. This io_service functions as a thread pool that will not interfere with threads handling network I/O. Alternatively, one could spawn a detached thread every time a long running or blocking task is needed; however, the overhead of thread creation/destruction may have a noticeable impact.
This answer provides a thread pool implementation. Additionally, here is a basic example that tries to emphasize the interaction between two io_services.
#include <iostream>
#include <sstream>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/chrono.hpp>
#include <boost/optional.hpp>
#include <boost/thread.hpp>
/// @brief Background service will function as a thread-pool where
/// long-standing blocking operations may occur without affecting
/// the network event loop.
boost::asio::io_service background_service;
/// @brief The main io_service will handle network operations.
boost::asio::io_service io_service;
boost::optional<boost::asio::io_service::work> work;
/// @brief ODBC blocking operation.
///
/// @param data Data to use for query.
/// @param handler Handler to invoke upon completion of operation.
template <typename Handler>
void query_odbc(unsigned int data,
Handler handler)
{
std::cout << "in background service, start querying odbc\n";
std::cout.flush();
// Mimic busy work.
boost::this_thread::sleep_for(boost::chrono::seconds(5));
std::cout << "in background service, posting odbc result to main service\n";
std::cout.flush();
io_service.post(boost::bind(handler, data * 2));
}
/// @brief Functions as a continuation for handle_read, that will be
/// invoked with results from ODBC.
void handle_read_odbc(unsigned int result)
{
std::stringstream stream;
stream << "in main service, got " << result << " from odbc.\n";
std::cout << stream.str();
std::cout.flush();
// Allow io_service to stop in this example.
work = boost::none;
}
/// @brief Mocked up read handler that will post work into a background
/// service.
void handle_read(const boost::system::error_code& error,
std::size_t bytes_transferred)
{
std::cout << "in main service, need to query odbc" << std::endl;
typedef void (*handler_type)(unsigned int);
background_service.post(boost::bind(&query_odbc<handler_type>,
21, // data
&handle_read_odbc) // handler
);
// Keep io_service event loop running in this example.
work = boost::in_place(boost::ref(io_service));
}
/// @brief Loop to show concurrency.
void print_loop(unsigned int iteration)
{
if (!iteration) return;
std::cout << " in main service, doing work.\n";
std::cout.flush();
boost::this_thread::sleep_for(boost::chrono::seconds(1));
io_service.post(boost::bind(&print_loop, --iteration));
}
int main()
{
boost::optional<boost::asio::io_service::work> background_work(
boost::in_place(boost::ref(background_service)));
// Dedicate 3 threads to performing long-standing blocking operations.
boost::thread_group background_threads;
for (std::size_t i = 0; i < 3; ++i)
background_threads.create_thread(
boost::bind(&boost::asio::io_service::run, &background_service));
// Post a mocked up 'handle read' handler into the main io_service.
io_service.post(boost::bind(&handle_read,
make_error_code(boost::system::errc::success), 0));
// Post a mockup loop into the io_service to show concurrency.
io_service.post(boost::bind(&print_loop, 5));
// Run the main io_service.
io_service.run();
// Cleanup background.
background_work = boost::none;
background_threads.join_all();
}
And the output:
in main service, need to query odbc
in main service, doing work.
in background service, start querying odbc
in main service, doing work.
in main service, doing work.
in main service, doing work.
in main service, doing work.
in background service, posting odbc result to main service
in main service, got 42 from odbc.
Note that the single thread processing the main io_service posts work into the background_service, and then continues to process its event loop while the background_service blocks. Once the background_service gets a result, it posts a handler into the main io_service.
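The hand-off itself does not depend on Boost. The following sketch shows the same pattern with only the standard library, as an illustration rather than a replacement for io_service: EventLoop, blockingQuery and queryViaBackgroundThread are hypothetical names, and the loop is deliberately minimal (one consumer thread, no error handling).

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical minimal event loop standing in for an io_service.
class EventLoop {
public:
    // Queue a handler; may be called from any thread.
    void post(std::function<void()> handler) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(handler));
        }
        ready_.notify_one();
    }

    // Ask run() to return once this request is reached in the queue.
    void stop() {
        post([this] { stopped_ = true; });
    }

    // Run handlers on the calling thread until a stop request is processed.
    void run() {
        while (!stopped_) {
            std::function<void()> handler;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !queue_.empty(); });
                handler = std::move(queue_.front());
                queue_.pop_front();
            }
            handler();  // executed outside the lock
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> queue_;
    bool stopped_ = false;  // only touched by the thread inside run()
};

// Mock of the blocking query; doubles its input like query_odbc above.
int blockingQuery(int data) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return data * 2;
}

// Runs the query on a background thread and delivers the result to a
// handler that executes on the thread calling loop.run().
int queryViaBackgroundThread(int data) {
    EventLoop loop;
    int result = 0;
    std::thread background([&loop, &result, data] {
        int value = blockingQuery(data);
        loop.post([&loop, &result, value] {
            result = value;  // runs on the loop thread
            loop.stop();
        });
    });
    loop.run();
    background.join();
    return result;
}
```

Because the completion handler runs on the loop thread, state touched only by handlers (result here) needs no extra locking, which is the main attraction of posting results back instead of sharing them directly.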
We have the same kind of long-running tasks in our server (a legacy protocol with storages), so our server runs 200 threads to avoid blocking the service (yes, 200 threads are running io_service::run). It's not a great solution, but it works well for now.
The only problem we had was with asio::strand, which uses so-called "implementations" that get locked while a handler is being called. We solved this by increasing the number of strand buckets and by "detaching" tasks via io_service::post without a strand wrap.
Some tasks may run seconds or even minutes and this does work without issues at the moment.

Thread, ansi c signal and Qt

I'm writing a multithreaded, plugin-based application. I will not be the author of the plugins, so I would like to prevent the main application from crashing because of a segmentation fault in a plugin. Is that possible? Or does a crash in a plugin inevitably compromise the state of the main application as well?
I wrote a sketch program using Qt because my "real" application relies heavily on the Qt library. As you can see, I force the thread to crash by calling the trimmed function on an unallocated QString. The signal handler is called correctly, but after the thread is forced to quit the main application crashes as well. Did I do something wrong? Or, as I said before, is what I'm trying to do not achievable?
Please note that in this simplified version of the program I avoided plugins and used only a thread. Introducing plugins will add another level of difficulty, I suppose. I want to proceed step by step and, above all, understand whether my goal is feasible. Thanks a lot for any help or suggestions.
#include <QString>
#include <QThread>
#include<csignal>
#include <QtGlobal>
#include <QtCore/QCoreApplication>
class MyThread : public QThread
{
public:
static void sigHand(int sig)
{
qDebug("Thread crashed");
QThread* th = QThread::currentThread();
th->exit(1);
}
MyThread(QObject * parent = 0)
:QThread(parent)
{
signal(SIGSEGV,sigHand);
}
~MyThread()
{
signal(SIGSEGV,SIG_DFL);
qDebug("Deleted thread, restored default signal handler");
}
void run()
{
QString* s;
s->trimmed();
qDebug("Should not reach this point");
}
};
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
MyThread th(&a);
th.run();
while (th.isRunning());
qDebug("Thread died but main application still on");
return a.exec();
}
I'm currently working on the same issue and found this question via google.
There are several reasons your source is not working:
There is no new thread. The thread is only created, if you call QThread::start. Instead you call MyThread::run, which executes the run method in the main thread.
You call QThread::exit to stop the thread, which is not supposed to stop a thread directly, but sends a (Qt) signal to the thread's event loop, requesting it to stop. Since there is neither a thread nor an event loop, the function has no effect. Even if you had called QThread::start, it would not work, since writing a run method does not create a Qt event loop. To be able to use exit with any QThread, you would need to call QThread::exec first.
However, QThread::exit is the wrong method anyway. To prevent the SIGSEGV, the thread must be stopped immediately, not after receiving the (Qt) signal in its event loop. So although generally frowned upon, in this case QThread::terminate has to be called.
But it is generally said to be unsafe to call complex functions like QThread::currentThread, QThread::exit or QThread::terminate from signal handlers, so you should never call them there.
Since the thread is still running after the signal handler (and I'm not sure even QThread::terminate would kill it fast enough), the signal handler returns to where it was called from, so the instruction that caused the SIGSEGV is executed again and the next SIGSEGV occurs.
Therefore I have used a different approach: the signal handler changes the register containing the instruction address to point to another function, which is then run after the signal handler exits, instead of the crashing instruction. Like:
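void signalHandler(int type, siginfo_t * si, void* ccontext){
// Linux/glibc on 32-bit x86; on x86_64 the register index is REG_RIP.
ucontext_t* context = static_cast<ucontext_t*>(ccontext);
context->uc_mcontext.gregs[REG_EIP] = reinterpret_cast<greg_t>(&recoverFromCrash);
}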
struct sigaction sa;
memset(&sa, 0, sizeof(sa)); sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = &signalHandler;
sigaction(SIGSEGV, &sa, 0);
The recoverFromCrash function is then normally called in the thread causing the SIGSEGV. Since the signal handler is called for all SIGSEGV, from all threads, the function has to check which thread it is running in.
However, I did not consider it safe to simply kill the thread, since there might be other things that depend on a running thread. So instead of killing it, I let it run in an endless loop (calling sleep to avoid wasting CPU time). Then, when the program is closed, it sets a global variable, and the thread is terminated. (Notice that the recover function must never return, since otherwise execution would return to the function that caused the SIGSEGV.)
When called from the main thread, on the other hand, it starts a new event loop to keep the program running.
if (QThread::currentThread() != QCoreApplication::instance()->thread()) {
//sub thread
QThread* t = QThread::currentThread();
while (programIsRunning) ThreadBreaker::sleep(1);
ThreadBreaker::forceTerminate();
} else {
//main thread
while (programIsRunning) {
QApplication::processEvents(QEventLoop::AllEvents);
ThreadBreaker::msleep(1);
}
exit(0);
}
ThreadBreaker is a trivial wrapper class around QThread, since msleep, sleep and setTerminationEnabled (which has to be called before terminate) of QThread are protected and could not be called from the recover function.
But this is only the basic picture. There are a lot of other things to worry about: Catching SIGFPE, Catching stack overflows (check the address of the SIGSEGV, run the signal handler in an alternate stack), have a bunch of defines for platform independence (64 bit, arm, mac), show debug messages (try to get a stack trace, wonder why calling gdb for it crashes the X server, wonder why calling glibc backtrace for it crashes the program)...
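The first reason listed in this answer (calling run directly executes it in the caller's thread, and only start creates a new thread) has a direct counterpart in standard C++. The sketch below is only an illustration with std::thread, not Qt code; whoRuns, directCallRunsOnCaller and threadRunsElsewhere are names made up for it.

```cpp
#include <thread>

// Returns the id of whichever thread executes the function.
std::thread::id whoRuns() {
    return std::this_thread::get_id();
}

// Calling the function directly: it runs on the caller's thread,
// just like calling MyThread::run by hand.
bool directCallRunsOnCaller() {
    return whoRuns() == std::this_thread::get_id();
}

// Handing the function to std::thread: it runs on a new thread,
// comparable to what QThread::start arranges for run.
bool threadRunsElsewhere() {
    std::thread::id workerId;
    std::thread worker([&workerId] { workerId = whoRuns(); });
    worker.join();
    return workerId != std::this_thread::get_id();
}
```

Printing QThread::currentThreadId() inside run, as the first question on this page does, is a quick way to check the same thing in a Qt program.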

Resources