Thread, ansi c signal and Qt - multithreading

I'm writing a multithreaded, plugin-based application. I will not be the author of the plugins, so I would like to prevent a segmentation fault in a plugin from crashing the main application. Is that possible, or does a crash in a plugin inevitably compromise the state of the main application as well?
I wrote a sketch program using Qt because my "real" application relies heavily on the Qt library. As you can see, I forced the thread to crash by calling trimmed() on an unallocated QString. The signal handler is called correctly, but after the thread is forced to quit, the main application crashes too. Did I do something wrong, or, as I said above, is what I'm trying to do not achievable?
Please note that this simplified version of the program uses only a thread, not plugins. Introducing plugins will add another level of complexity, I suppose, and I want to proceed step by step. Above all, I want to understand whether my goal is feasible. Thanks a lot for any help or suggestions.
#include <QString>
#include <QThread>
#include <csignal>
#include <QtGlobal>
#include <QtCore/QCoreApplication>
class MyThread : public QThread
{
public:
    static void sigHand(int sig)
    {
        qDebug("Thread crashed");
        QThread* th = QThread::currentThread();
        th->exit(1);
    }
    MyThread(QObject * parent = 0)
        :QThread(parent)
    {
        signal(SIGSEGV,sigHand);
    }
    ~MyThread()
    {
        signal(SIGSEGV,SIG_DFL);
        qDebug("Deleted thread, restored default signal handler");
    }
    void run()
    {
        QString* s;
        s->trimmed();
        qDebug("Should not reach this point");
    }
};
int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    MyThread th(&a);
    th.run();
    while (th.isRunning());
    qDebug("Thread died but main application still on");
    return a.exec();
}

I'm currently working on the same issue and found this question via Google.
There are several reasons your code is not working:
There is no new thread. A thread is only created when you call QThread::start. Instead, you call MyThread::run directly, which executes the run method in the main thread.
You call QThread::exit to stop the thread, but that function is not supposed to stop a thread directly; it sends a (Qt) signal to the thread's event loop, requesting it to stop. Since there is neither a thread nor an event loop, the call has no effect. Even if you had called QThread::start, it would not work, because your run method never starts a Qt event loop. To be able to use exit with any QThread, you would need to call QThread::exec first.
However, QThread::exit is the wrong method anyway. To prevent the SIGSEGV, the thread must be stopped immediately, not after it receives the (Qt) signal in its event loop. So, although generally frowned upon, in this case QThread::terminate has to be called.
But it is generally considered unsafe to call complex functions like QThread::currentThread, QThread::exit or QThread::terminate from signal handlers, so you should never call them there.
Since the thread is still running after the signal handler (and I'm not sure even QThread::terminate would kill it fast enough), the signal handler returns to where it was invoked, so the instruction that caused the SIGSEGV is executed again and the next SIGSEGV occurs.
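To make the point about signal handlers concrete, here is a minimal sketch in plain C of a handler that restricts itself to async-signal-safe operations: it stores the signal number in a volatile sig_atomic_t and reports through write(2). The names minimal_handler, install_minimal_handler and signal_seen are made up for this illustration. It is meant for ordinary signals such as SIGUSR1; for a real SIGSEGV, returning from the handler would simply re-execute the faulting instruction, as described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Written by the handler, read by normal code (hypothetical name). */
static volatile sig_atomic_t signal_seen;

/* Only async-signal-safe work: store a flag and call write(2).
 * No qDebug, no printf, no memory allocation. */
static void minimal_handler(int signo)
{
    static const char msg[] = "signal caught\n";
    signal_seen = signo;
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
}

/* Installs the handler for signo; returns 0 on success, -1 on error. */
int install_minimal_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = minimal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

After install_minimal_handler(SIGUSR1), calling raise(SIGUSR1) leaves signal_seen equal to SIGUSR1, and the rest of the program can act on the flag outside the handler.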
Therefore I have used a different approach: the signal handler changes the register containing the instruction address so that it points to another function, which will then be run after the signal handler exits, instead of the crashing instruction. Like:
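// Needs <signal.h>, <string.h> and <ucontext.h>. The register name is platform-specific:
// REG_EIP is the 32-bit x86 Linux/glibc spelling (REG_RIP on x86-64; both need _GNU_SOURCE).
void recoverFromCrash();
void signalHandler(int type, siginfo_t * si, void* ccontext){
    ucontext_t* context = static_cast<ucontext_t*>(ccontext);
    // Resume in recoverFromCrash instead of at the faulting instruction.
    context->uc_mcontext.gregs[REG_EIP] = reinterpret_cast<greg_t>(&recoverFromCrash);
}
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = &signalHandler;
sigaction(SIGSEGV, &sa, 0);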
The recoverFromCrash function is then called normally in the thread that caused the SIGSEGV. Since the signal handler is called for every SIGSEGV, from any thread, the function has to check which thread it is running in.
However, I did not consider it safe to simply kill the thread, since other things might depend on the thread still running. So instead of killing it, I let it run in an endless loop (calling sleep to avoid wasting CPU time). Then, when the program is closed, it sets a global variable, and the thread is terminated. (Notice that the recover function must never return, since otherwise execution would return to the function that caused the SIGSEGV.)
When called from the main thread, on the other hand, it starts a new event loop to keep the program running.
if (QThread::currentThread() != QCoreApplication::instance()->thread()) {
    //sub thread
    QThread* t = QThread::currentThread();
    while (programIsRunning) ThreadBreaker::sleep(1);
    ThreadBreaker::forceTerminate();
} else {
    //main thread
    while (programIsRunning) {
        QApplication::processEvents(QEventLoop::AllEvents);
        ThreadBreaker::msleep(1);
    }
    exit(0);
}
ThreadBreaker is a trivial wrapper class around QThread, since msleep, sleep and setTerminationEnabled (which has to be called before terminate) of QThread are protected and could not be called from the recover function.
But this is only the basic picture. There are a lot of other things to worry about: Catching SIGFPE, Catching stack overflows (check the address of the SIGSEGV, run the signal handler in an alternate stack), have a bunch of defines for platform independence (64 bit, arm, mac), show debug messages (try to get a stack trace, wonder why calling gdb for it crashes the X server, wonder why calling glibc backtrace for it crashes the program)...
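The alternate-stack part can be sketched in plain C as follows. This is only an illustration under assumptions: the names on_signal, install_on_alt_stack and ran_on_alt_stack are invented, the 64 KiB stack size is an arbitrary choice, and the handler merely records whether it was running on the alternate stack. A real crash handler for SIGSEGV would still have to avoid returning to the faulting instruction.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static char *alt_stack_base;                    /* memory used as the signal stack */
static size_t alt_stack_size;
static volatile sig_atomic_t ran_on_alt_stack;  /* set by the handler */

/* Checks whether a local variable of the handler lives inside the
 * alternate stack, i.e. whether SA_ONSTACK took effect. */
static void on_signal(int signo)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t low = (uintptr_t)alt_stack_base;
    (void)signo;
    ran_on_alt_stack = (here >= low && here < low + alt_stack_size);
}

/* Registers an alternate stack for the calling thread and installs a
 * handler for signo that runs on it. Returns 0 on success, -1 on error. */
int install_on_alt_stack(int signo)
{
    stack_t ss;
    struct sigaction sa;

    alt_stack_size = 64 * 1024;                 /* assumed to be large enough */
    alt_stack_base = malloc(alt_stack_size);
    if (alt_stack_base == NULL)
        return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack_base;
    ss.ss_size = alt_stack_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(alt_stack_base);
        alt_stack_base = NULL;
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_ONSTACK;                   /* run the handler on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

The alternate stack matters for stack overflows because the normal stack has no room left for the handler's own frame. Note that sigaltstack applies per thread, so each thread that should survive an overflow needs its own call.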

Related

C++ - User-Level Threads - sigaction by SIGVTALRM

I've found a nasty bug in my user-level threads library.
My scheduler is a singleton class that initializes a signal timer as shown below.
sigAlarm_ is a member field of the scheduler, of type struct sigaction.
This is the relevant part of the scheduler initialization:
sigAlarm_.sa_handler = timerHandlerGlobal; // Assign the first field of sigAlarm (sa_handler) as needed, others zeroed
if (sigaction(SIGVTALRM, &sigAlarm_, nullptr) != 0) { uthreadSystemError("sigaction"); }
timerHandlerGlobal is a static function rather than a member function of the scheduler, because C++ doesn't permit passing member functions this way.
When I terminate the main thread of the library (which actually runs the scheduler), I invoke std::exit(1), which cleans up the resources.
When I run my tests with ASan (AddressSanitizer), in some executions control enters timerHandlerGlobal while the scheduler is already nullptr!
I've already spent two days trying to find the cause.
If I add this ugly condition, no problem appears with ASan:
void timerHandlerGlobal(int signo)
{
    if (scheduler_manager)
    {
        scheduler_manager->timerHandler(signo);
    }
}
But why is the sigaction.sa_handler (which is timerHandlerGlobal) still running after the scheduler has invoked std::exit(1)?
Please tell me if you know why this happens; I just want to get rid of this awful condition.
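One plausible explanation, offered as a guess rather than a diagnosis of this particular library: std::exit does not stop signal delivery. It runs atexit handlers and the destructors of static objects while the process is still alive, so if the virtual timer is still armed, a SIGVTALRM can arrive after the scheduler has been torn down. Under that assumption, and assuming the timer was armed with setitimer(ITIMER_VIRTUAL, ...), a teardown helper along these lines could be called before the scheduler is destroyed. The name stop_virtual_timer is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Disarms the virtual interval timer and ignores SIGVTALRM from now on.
 * Returns 0 on success, -1 on error. Meant to run before the scheduler
 * object that the signal handler dereferences is destroyed. */
int stop_virtual_timer(void)
{
    struct itimerval off;
    struct sigaction ignore;

    /* An all-zero it_value disarms the timer. */
    memset(&off, 0, sizeof off);
    if (setitimer(ITIMER_VIRTUAL, &off, NULL) != 0)
        return -1;

    /* Switching to SIG_IGN also discards a SIGVTALRM that is already pending. */
    memset(&ignore, 0, sizeof ignore);
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    return sigaction(SIGVTALRM, &ignore, NULL);
}
```

If this matches the cause, calling the helper at the start of the scheduler's shutdown path would make the null check in timerHandlerGlobal unnecessary; if the crash persists, the cause lies elsewhere.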

Why does calling kill(getpid(), SIGUSR1) inside handler for SIGUSR1 loop?

I'm trying to understand what is happening behind the scenes with this code. This was asked at a final exam of an Intro to OS course I'm taking. As I understand it, when returning from kernel mode to user mode, the system checks if there are any pending signals (by examining the signal vector) and tends to them. So as the program returns from the kill syscall the OS sees that SIGUSR1 is pending and invokes the handler. By that logic, the handler should print "stack" in an infinite loop, but when running this code it actually prints "stackoverflow" in an infinite loop. Why is this happening?
Thanks in advance.
void handler(int signo) {
    printf("stack");
    kill(getpid(), SIGUSR1);
    printf("overflow\n");
}
int main() {
    struct sigaction act;
    act.sa_handler = &handler;
    sigaction(SIGUSR1, &act, NULL);
    kill(getpid(), SIGUSR1);
    return 0;
}
You actually have undefined behavior here, as you're calling sigaction with an incompletely initialized struct sigaction object. So depending on what values happen to be in the sa_flags and sa_mask fields, a variety of different things might happen.
Some of these would not block SIGUSR1 while the signal handler is running, which would mean that a new signal handler would run immediately when the first calls kill (so before the first handler returns and pops its stack frame). So you end up with many recursive handler stack frames on the stack (and outputs of 'stack') until it overflows.
Other combos would block the signal so it would not immediately trigger a second signal handler. Instead the signal would be "pending" until the first signal handler returns.
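The two cases can be told apart with a fully initialized struct sigaction. The sketch below is an illustration with invented names (nesting_handler, max_handler_nesting): the handler re-raises the signal a bounded number of times and records how deeply handler invocations nest. It uses raise() rather than kill(getpid(), ...) so that the signal is always directed at the calling thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t depth, max_depth, calls;

/* Re-sends the signal from inside the handler, at most twice,
 * and tracks how many handler frames are active at once. */
static void nesting_handler(int signo)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    if (++calls < 3)
        raise(signo);
    depth--;
}

/* extra_flags is 0 or SA_NODEFER. Returns the deepest nesting of
 * handler invocations that was observed, or -1 on error. */
int max_handler_nesting(int extra_flags)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);      /* no indeterminate fields */
    sa.sa_handler = nesting_handler;
    sa.sa_flags = extra_flags;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    depth = max_depth = calls = 0;
    raise(SIGUSR1);
    return max_depth;
}
```

With flags 0 the signal is blocked while its handler runs, so each re-raised signal stays pending until the handler returns and the nesting depth stays at 1. That corresponds to "stackoverflow" being printed repeatedly. With SA_NODEFER the handlers nest (depth 3 here), which corresponds to the runaway "stack" output.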

Properties of pthread_exit function : which one is right?

In Section 12.3 of the CSAPP book, they say:
The thread terminates explicitly by calling the pthread_exit function. If the main thread calls pthread_exit, it waits for all other peer threads to terminate and then terminates main thread and the entire process with a return value of thread_return.
However in the man page of pthread_exit : https://man7.org/linux/man-pages/man3/pthread_exit.3.html
Performing a return from the start function of any thread other than the main thread results in an implicit call to pthread_exit(), using the function's return value as the thread's exit status.
To allow other threads to continue execution, the main thread should terminate by calling pthread_exit() rather than exit(3).
Two descriptions about pthread_exit are different. First one said main thread will wait for peer but not on second.
Therefore I wrote some code to check which property is correct.
(I borrowed some lines of code from "When the main thread exits, do other threads also exit?")
(Thanks to https://stackoverflow.com/users/959183/laifjei)
Since pthread_cancel is called before pthread_exit, the main thread cancels the t1 thread successfully, and the result looks like this:
However, when I modify the code ('line 42 -> add //' and 'line 44 -> delete //'), the main thread cannot cancel t1 since it has already terminated. The result then looks like this:
Finally, I conclude that the man page's description is correct. Am I right?
Why does CSAPP book said that "it waits for all other peer threads to terminate"?
Two descriptions about pthread_exit are different. First one said main thread will wait for peer but not on second.
Not very different, and not in a way that you can easily distinguish by most means.
In particular, regardless of whether the main thread terminates immediately or waits for other threads to terminate before doing so, the pthread_exit() function is like the exit() function in that it does not return. Observing that statements inserted into your test program between the pthread_exit() call and the end of main are not executed does not yield any information that helps you determine the relative sequence of thread terminations.
For that reason, the question is also largely moot. Although there indeed are ways in which the difference can be observed, it is rarely significant.
Nevertheless, here's a better example:
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
pthread_t main_thread;
void *wait_for_main(void *unused) {
    void *main_rval;
    // Wait for the main thread to terminate
    if ((errno = pthread_join(main_thread, &main_rval)) != 0) {
        perror("pthread_join");
    } else {
        fputs("The main thread was successfully joined\n", stderr);
    }
    fflush(stderr);
    return NULL;
}
int main(void) {
    pthread_t child_thread;
    main_thread = pthread_self();
    if ((errno = pthread_create(&child_thread, NULL, wait_for_main, NULL)) != 0) {
        perror("pthread_create");
    } else {
        fputs("The child thread was successfully started\n", stderr);
    }
    pthread_exit(NULL);
}
That program runs successfully, printing ...
The child thread was successfully started
The main thread was successfully joined
This shows that the main thread indeed terminated (because it was successfully joined), and that the other thread continued to run afterward (because it wrote its message to stderr).
You go on to ask ...
Why does CSAPP book said that "it waits for all other peer threads to terminate"?
... but no one other than Bryant, O'Hallaron, or one of their editors could definitively answer the question (and maybe not all -- or any -- of those). Here are some possibilities:
The book is just wrong. It happens.
The book is unclear or imprecise, in that it means the "it" that waits to be the overall program, the operating system, or some other variation on "something other than the main thread".
Or my actual best guess:
The book is describing behavior from an operating system perspective, whereas the Pthreads documentation is describing it from a C-language perspective. It may well be that the OS thread that is the process's main one indeed is the thing that waits for others to terminate, but its C-language semantics within the running program terminate with the pthread_exit(). That is, the book is talking about pthread implementation details, not documented, observable pthread semantics.

How to unblock a QThread running a pcsc call?

I have a Qt application that connects to a card reader using various pcsc implementations under GNU/Linux, MacOS, and Windows. All communication with the card runs in a worker thread.
In one scenario, the user starts an operation requiring communication with the card via a card reader. The card reader has a keyboard and during the authentication procedure the user must enter their PIN on the reader's keyboard.
This operation is implemented by a call to SCardControl() (see e.g. the Microsoft documentation). As long as the user is working with the reader, the call to SCardControl() does not terminate and the worker thread is blocked by it.
At this point, the user might decide to close the application while the operation is still pending. Closing the application at this point causes the application to crash (on Linux with signal SIGABRT) because:
The worker thread is blocked waiting for SCardControl() to return.
The main thread cannot stop the blocked thread: neither quit() nor terminate() cause the thread to finish.
When the application exits, the QThread object for the worker thread is destroyed and, since the thread is still in the running state, it raises a signal to indicate an error.
I have tried several solutions.
Subclass QThread and create a worker thread which calls setTerminationEnabled(true); to allow termination through QThread::terminate(). This does not work on MacOS: when QThread is destroyed, the thread is still in a running state and the signal SIGABRT is emitted.
Handle signal SIGABRT on shutdown and ignore it. This did not seem to be a good idea but I wanted to try it out before discarding it. After ignoring signal SIGABRT, a signal SIGSEGV is received and the application crashes. I had adapted the approach described here.
Try to unblock the thread by sending a command to the card reader from the main thread. I tried SCardCancel(), SCardDisconnect() and SCardReleaseContext() but none of these commands has any effect on the blocked thread.
I find it quite strange that it is not possible to cleanly shutdown an application when a thread is blocked on some function call, but all the solutions I have tried have not worked and I have run out of ideas. Did I overlook something? Does anybody have any useful hint?
EDIT
I looked into the Qt source code for QThread and found out that on Unix-like platforms QThread::terminate() uses pthread_cancel() internally. But apparently pthread_cancel() does not work / does nothing on Darwin, see e.g. here and here.
So, maybe I will really have to go with the option of showing a dialog to the user asking to remove the card from the reader.
Cleanly shutting down a thread from the outside is not possible if it is blocked in a call. You can, however, prevent the user from quitting the application before the operation has completed.
void MainWindow::closeEvent(QCloseEvent *closeEvent) {
    if (workerBlocked) closeEvent->ignore();
}
In addition, you can show a dialog telling the user the operation has to be completed first.
Also, if possible, you can let the window close but keep the application alive until the operation is complete by setting qApp->setQuitOnLastWindowClosed(false);
The problem boils down to the fact that a QThread object isn't destructible while the associated thread is running. Usually, it prints a statement like this to the debug output:
QThread: Destroyed while thread is still running
Don't agonize over trying to get SCardControl to return so that the worker thread can be quit safely (since it doesn't return as long as the user is interacting with the reader). Instead, you can follow this answer to destruct the QThread object in a safe manner with a minimum amount of changes to your current implementation.
Here is an example that shows what I mean:
#include <QtWidgets>
//a thread that can be destroyed at any time
//see http://stackoverflow.com/a/25230470
class SafeThread : public QThread{
    using QThread::run;
public:
    explicit SafeThread(QObject* parent= nullptr):QThread(parent){}
    ~SafeThread(){ quit(); wait(); }
};
//worker QObject class
class Worker : public QObject {
    Q_OBJECT
public:
    explicit Worker(QObject* parent = nullptr):QObject(parent){}
    ~Worker(){}
    Q_SLOT void doBlockingWork() {
        emit started();
        //the sleep call blocks the worker thread for 10 seconds!
        //consider it a mock call to the SCardControl function
        QThread::sleep(10);
        emit finished();
    }
    Q_SIGNAL void started();
    Q_SIGNAL void finished();
};
int main(int argc, char* argv[]) {
    QApplication a(argc, argv);
    //setup worker thread and QObject
    Worker worker;
    SafeThread thread;
    worker.moveToThread(&thread);
    thread.start();
    //setup GUI components
    QWidget w;
    QVBoxLayout layout(&w);
    QPushButton button("start working");
    QLabel status("idle");
    layout.addWidget(&button);
    layout.addWidget(&status);
    //connect signals/slots
    QObject::connect(&worker, &Worker::started, &status,
                     [&status]{ status.setText("working. . .");} );
    QObject::connect(&worker, &Worker::finished, &status,
                     [&status]{ status.setText("idle");} );
    QObject::connect(&button, &QPushButton::clicked, &worker, &Worker::doBlockingWork);
    w.show();
    return a.exec();
}
#include "main.moc"
Notice that the SafeThread's destructor makes sure to wait() until the associated thread has finished execution. And only afterwards, the main thread can proceed to call QThread's destructor.

Interrupting open() with SIGALRM

We have a legacy embedded system which uses SDL to read images and fonts from an NFS share.
If there's a network problem, TTF_OpenFont() and IMG_Load() hang essentially forever. A test application reveals that open() behaves in the same way.
It occurred to us that a quick fix would be to call alarm() before the calls which open files on the NFS share. The man pages weren't entirely clear whether open() would fail with EINTR when interrupted by SIGALRM, so we put together a test app to verify this approach. We set up a signal handler with sigaction::sa_flags set to zero to ensure that SA_RESTART was not set.
The signal handler was called, but open() was not interrupted. (We observed the same behaviour with SIGINT and SIGTERM.)
I suppose the system treats open() as a "fast" operation even on "slow" infrastructure such as NFS.
Is there any way to change this behaviour and allow open() to be interrupted by a signal?
The man pages weren't entirely clear whether open() would fail with
EINTR when interrupted by SIGALRM, so we put together a test app to
verify this approach.
open(2) is a slow syscall (slow syscalls are those that can sleep forever, and can be awakened when, and if, a signal is caught in the meantime) only for some file types. In general, opens that block the caller until some condition occurs are usually interruptible. Known examples include opening a FIFO (named pipe), or (back in the old days) opening a physical terminal device (it sleeps until the modem is dialed).
NFS-mounted filesystems probably don't cause open(2) to sleep in an interruptible state. After all, you are most likely opening a regular file, and in that case open(2) will not be interruptible.
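The FIFO case mentioned above is easy to check. The following sketch uses invented names (on_alarm, interrupted_fifo_open_errno) and assumes /tmp is writable. It creates a FIFO with no writer, installs a SIGALRM handler without SA_RESTART, and reports the errno that the blocked open(2) fails with once the alarm fires.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* The handler has nothing to do; catching the signal is what interrupts open(). */
static void on_alarm(int signo) { (void)signo; }

/* Returns the errno of the interrupted open(), 0 if open() unexpectedly
 * succeeded, or -1 if the temporary FIFO could not be set up. */
int interrupted_fifo_open_errno(void)
{
    char dir[] = "/tmp/fifo_demo_XXXXXX";
    char path[64];
    struct sigaction sa, old;
    int fd, err;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old);

    alarm(1);
    fd = open(path, O_RDONLY);       /* blocks: nobody opens the FIFO for writing */
    err = (fd < 0) ? errno : 0;
    alarm(0);

    sigaction(SIGALRM, &old, NULL);  /* restore the previous disposition */
    if (fd >= 0)
        close(fd);
    unlink(path);
    rmdir(dir);
    return err;
}
```

On a FIFO the call fails with EINTR after about a second. Pointing the same test at a file on a hung NFS mount is what reproduces the behaviour described in the question.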
Is there any way to change this behaviour and allow open() to be
interrupted by a signal?
I don't think so, not without making some (non-trivial) changes to the kernel.
I would explore the possibility of using setjmp(3) / longjmp(3) (see the manpage if you're not familiar; it's basically non-local gotos). You can initialize the environment buffer before calling open(2), and issue a longjmp(3) in the signal handler. Here's an example:
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include <unistd.h>
#include <signal.h>
static jmp_buf jmp_env;
void sighandler(int signo) {
    longjmp(jmp_env, 1);
}
int main(void) {
    struct sigaction sigact;
    sigact.sa_handler = sighandler;
    sigact.sa_flags = 0;
    sigemptyset(&sigact.sa_mask);
    if (sigaction(SIGALRM, &sigact, NULL) < 0) {
        perror("sigaction(2) error");
        exit(EXIT_FAILURE);
    }
    if (setjmp(jmp_env) == 0) {
        /* First time through
         * This is where we would open the file
         */
        alarm(5);
        /* Simulate a blocked open() */
        while (1)
            ; /* Intentionally left blank */
        /* If open(2) is successful here, don't forget to unset
         * the alarm
         */
        alarm(0);
    } else {
        /* SIGALRM caught, open(2) canceled */
        printf("open(2) timed out\n");
    }
    return 0;
}
It works by saving the context environment with the help of setjmp(3) before calling open(2). setjmp(3) returns 0 the first time through, and returns whatever value was passed to longjmp(3) otherwise.
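One detail worth considering if the timeout has to work more than once: POSIX leaves it unspecified whether setjmp/longjmp save and restore the signal mask, and on Linux with glibc they do not by default. Jumping out of the handler with longjmp can therefore leave SIGALRM blocked, so a second alarm would never be delivered. The variant below is a sketch with invented names (on_timeout, wait_with_timeout) that uses sigsetjmp/siglongjmp with a non-zero savemask argument and pause() as a stand-in for the blocking call.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf timeout_env;

static void on_timeout(int signo)
{
    (void)signo;
    /* Also restores the signal mask saved by sigsetjmp(..., 1). */
    siglongjmp(timeout_env, 1);
}

/* Blocks until the timeout fires. Returns 1 when the simulated
 * blocking call was abandoned, -1 if the handler could not be installed. */
int wait_with_timeout(unsigned seconds)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    if (sigsetjmp(timeout_env, 1) == 0) {   /* 1: save the current signal mask */
        alarm(seconds);
        for (;;)
            pause();                        /* stand-in for the blocked open(2) */
    }
    return 1;                               /* reached via siglongjmp */
}
```

Because the mask is restored on the jump, calling wait_with_timeout(1) twice in a row times out both times.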
Please be aware that this solution is not perfect. Here are some points to keep in mind:
There is a window of time between the call to alarm(2) and the call to open(2) (simulated here with while (1) { ... }) where the process may be preempted for a long time, so there is a chance the alarm expires before we actually attempt to open the file. Sure, with a large timeout such as 2 or 3 seconds this will most likely not happen, but it's still a race condition.
Similarly, there is a window of time between successfully opening the file and canceling the alarm where, again, the process may be preempted for a long time and the alarm may expire before we get the chance to cancel it. This is slightly worse because we have already opened the file so we will "leak" the file descriptor. Again, in practice, with a large timeout this will likely never happen, but it's a race condition nevertheless.
If the code catches other signals, there may be another signal handler in the midst of execution when SIGALRM is caught. Using longjmp(3) inside the signal handler will destroy the execution context of these other signal handlers, and depending on what they were doing, very nasty things may happen (inconsistent state if the signal handlers were manipulating other data structures in the program, etc.). It's as if the other handler started executing and suddenly crashed somewhere in the middle. You can fix it by: a) carefully setting up all signal handlers such that SIGALRM is blocked while they run (this ensures that the SIGALRM handler does not begin execution until other handlers are done) and b) blocking these other signals while the SIGALRM handler runs. Both can be accomplished by setting the sa_mask field of struct sigaction to the necessary mask (when a handler is invoked, the operating system atomically adds the signals in sa_mask to the current signal mask, and restores the previous mask when the handler returns). OTOH, if the rest of the code doesn't catch signals, then this is not a problem.
sleep(3) may be implemented with alarm(2), and alarm(2) and setitimer(2) share the same timer; if other portions in the code make use of any of these functions, they will interfere and the result will be a huge mess.
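The sa_mask arrangement from the third point above could look like the following sketch. The function name install_mutually_exclusive_handlers and the empty placeholder handlers are made up for illustration; other_signo stands for whichever additional signal the program catches.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Placeholder handlers; a real program would do its work here. */
static void on_alarm(int signo) { (void)signo; }
static void on_other(int signo) { (void)signo; }

/* Installs handlers for SIGALRM and other_signo so that neither handler
 * can be interrupted by the other signal. Returns 0 on success, -1 on error. */
int install_mutually_exclusive_handlers(int other_signo)
{
    struct sigaction alarm_sa, other_sa;

    memset(&alarm_sa, 0, sizeof alarm_sa);
    alarm_sa.sa_handler = on_alarm;
    sigemptyset(&alarm_sa.sa_mask);
    sigaddset(&alarm_sa.sa_mask, other_signo);  /* block it while handling SIGALRM */

    memset(&other_sa, 0, sizeof other_sa);
    other_sa.sa_handler = on_other;
    sigemptyset(&other_sa.sa_mask);
    sigaddset(&other_sa.sa_mask, SIGALRM);      /* block SIGALRM while handling it */

    if (sigaction(SIGALRM, &alarm_sa, NULL) != 0)
        return -1;
    return sigaction(other_signo, &other_sa, NULL);
}
```

With this in place a SIGALRM that arrives while the other handler runs stays pending until that handler has returned, so a longjmp(3) out of the SIGALRM handler can no longer cut it off halfway.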
Just make sure you weigh these disadvantages before blindly using this approach. The use of setjmp(3) / longjmp(3) is usually discouraged and makes programs considerably harder to read, understand and maintain. It's not elegant, but right now I don't think you have a choice, unless you're willing to do some core refactoring in the project.
If you do end up using setjmp(3), then at the very least document these limitations.
Maybe there is a strategy of using a separate thread to do the open so the main thread is not held up longer than desired.
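A rough sketch of that idea in C with pthreads follows. Everything here is an assumption made for illustration (the struct open_request layout and the names open_worker and open_with_timeout); it does not handle O_CREAT, which would need a mode argument. The caller waits on a condition variable with a deadline. If the deadline passes, it marks the request as abandoned and detaches the helper, which stays blocked in open(2) and cleans up after itself if the call ever returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* State shared between the caller and the helper thread. */
struct open_request {
    pthread_mutex_t lock;
    pthread_cond_t done_cond;
    char *path;
    int flags;
    int fd;            /* result of open(2), or -1 */
    int saved_errno;   /* errno observed right after open(2) */
    int done;          /* helper has finished */
    int abandoned;     /* caller stopped waiting */
};

static void free_request(struct open_request *req)
{
    pthread_mutex_destroy(&req->lock);
    pthread_cond_destroy(&req->done_cond);
    free(req->path);
    free(req);
}

static void *open_worker(void *arg)
{
    struct open_request *req = arg;
    int fd = open(req->path, req->flags);   /* may hang on a dead NFS server */
    int err = errno;

    pthread_mutex_lock(&req->lock);
    if (req->abandoned) {
        /* Nobody is waiting any more: avoid leaking the descriptor. */
        pthread_mutex_unlock(&req->lock);
        if (fd >= 0)
            close(fd);
        free_request(req);
        return NULL;
    }
    req->fd = fd;
    req->saved_errno = err;
    req->done = 1;
    pthread_cond_signal(&req->done_cond);
    pthread_mutex_unlock(&req->lock);
    return NULL;
}

/* Returns a file descriptor, or -1 with errno set (ETIMEDOUT on timeout). */
int open_with_timeout(const char *path, int flags, int timeout_sec)
{
    struct open_request *req = calloc(1, sizeof *req);
    struct timespec deadline;
    pthread_t tid;
    int rc, fd, err;

    if (req == NULL || (req->path = strdup(path)) == NULL) {
        free(req);
        errno = ENOMEM;
        return -1;
    }
    req->flags = flags;
    req->fd = -1;
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->done_cond, NULL);

    rc = pthread_create(&tid, NULL, open_worker, req);
    if (rc != 0) {
        free_request(req);
        errno = rc;
        return -1;
    }

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&req->lock);
    while (!req->done &&
           pthread_cond_timedwait(&req->done_cond, &req->lock, &deadline) != ETIMEDOUT)
        ;   /* loop again on spurious wakeups */
    if (!req->done) {
        req->abandoned = 1;          /* the helper now owns req and frees it */
        pthread_mutex_unlock(&req->lock);
        pthread_detach(tid);
        errno = ETIMEDOUT;
        return -1;
    }
    pthread_mutex_unlock(&req->lock);

    pthread_join(tid, NULL);
    fd = req->fd;
    err = req->saved_errno;
    free_request(req);
    if (fd < 0)
        errno = err;
    return fd;
}
```

The trade-off is that an abandoned helper thread remains stuck until the NFS server responds again, so repeated timeouts accumulate blocked threads. It does avoid the shared-timer and longjmp(3) pitfalls listed in the previous answer.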
