In the CSAPP book Section 12.3, They said..
The thread terminates explicitly by calling the pthread_exit function. If the main thread calls pthread_exit, it waits for all other peer threads to terminate and then terminates main thread and the entire process with a return value of thread_return.
However in the man page of pthread_exit : https://man7.org/linux/man-pages/man3/pthread_exit.3.html
Performing a return from the start function of any thread other than the main thread results in an implicit call to pthread_exit(), using the function's return value as the thread's exit status.
To allow other threads to continue execution, the main thread should terminate by calling pthread_exit() rather than exit(3).
Two descriptions about pthread_exit are different. First one said main thread will wait for peer but not on second.
Therefore I write a code to ensure correct property.
(I borrow some code lines from When the main thread exits, do other threads also exit?)
(Thanks to https://stackoverflow.com/users/959183/laifjei)
Since pthread_cancel is called before pthread_exit, main thread cancel t1 thread successfully and the result is like,,
However, when I modify a code as '42 line -> add //' and '44 line -> delete //', main thread cannot cancel t1 since it was already terminated. Therefore the following result is looks like,,
Finally, I conclude that man page's property is correct. Am I right?
Why does CSAPP book said that "it waits for all other peer threads to terminate"?
Two descriptions about pthread_exit are different. First one said main thread will wait for peer but not on second.
Not very different, and not in a way that you can easily distinguish by most means.
In particular, regardless of whether the main thread terminates immediately or waits for other threads to terminate before doing so, the pthread_exit() function is like the exit() function in that it does not return. Observing that statements inserted into your test program between the pthread_exit() call and the end of main are not executed does yield any information that helps you determine the relative sequence of thread terminations.
For that reason, the question is also largely moot. Although there indeed are ways in which the difference can be observed, it is rarely significant.
Nevertheless, here's a better example:
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
pthread_t main_thread;
void *wait_for_main(void *unused) {
void *main_rval;
// Wait for the main thread to terminate
if ((errno = pthread_join(main_thread, &main_rval)) != 0) {
perror("pthread_join");
} else {
fputs("The main thread was successfully joined\n", stderr);
}
fflush(stderr);
return NULL;
}
int main(void) {
pthread_t child_thread;
main_thread = pthread_self();
if ((errno = pthread_create(&child_thread, NULL, wait_for_main, NULL)) != 0) {
perror("pthread_create");
} else {
fputs("The child thread was successfully started\n", stderr);
}
pthread_exit(NULL);
}
That program runs successfully, printing ...
The child thread was successfully started
The main thread was successfully joined
This shows that the main thread indeed terminated (because it was successfully joined), and that the other thread continued to run afterward (because it wrote its message to stderr).
You go on to ask ...
Why does CSAPP book said that "it waits for all other peer threads to terminate"?
... but no one other than Bryant, O'Hallaron, or one of their editors could definitively answer the question (and maybe not all -- or any -- of those). Here are some possibilities:
The book is just wrong. It happens.
The book is unclear or imprecise, in that it means the "it" that waits to be the overall program, the operating system, or some other variation on "something other than the main thread".
Or my actual best guess:
The book is is describing behavior from an operating system perspective, whereas the Pthreads documentation is describing it from a C-language perspective. It may well be that the OS thread that is the process's main one indeed is the thing that waits for others to terminate, but its C-language semantics within the running program terminate with the pthread_exit(). That is the book is talking about pthread implementation details, not documented, observable pthread semantics.
Related
I have a question about condition_variable.
Having this code
#include <iostream>
#include <thread>
#include <chrono>
#include <mutex>
#include <condition_variable>
std::condition_variable cv;
std::mutex mut;
int value;
void sleep() {
std::unique_lock<std::mutex> lock(mut);
// sleep forever
cv.notify_one();
}
int main ()
{
std::thread th (sleep);
std::unique_lock<std::mutex> lck(mut);
if(cv.wait_for(lck,std::chrono::seconds(1))==std::cv_status::timeout) {
std::cout << "failed" << std::endl;
}
th.join();
return 0;
}
How to resolve this deadlock
Why the wait_for blocks even after the 1 sec.
Is the mut necessary for the thread th ?
Thanks.
Why the wait_for blocks even after the 1 sec?
How do you know that the main thread ever makes it to the cv.wait_for(...) call?
It's not 100% clear what you are asking, but if the program never prints "failed," and you are asking why not, then probably what happened is, the child thread locked the mutex first and then it "slept forever" while keeping the mutex locked. If that happened, then the main thread would never be able to get past the std::unique_lock<std::mutex> lck(mut); line.
Is the mut necessary for the thread th ?
That depends. You certainly don't need to lock a mutex in a thread that does nothing but "// sleep forever," but maybe the thread that you are asking about is not exactly the same as what you showed. Maybe you are asking how wait() and notify() are supposed to be used.
I can't give a C++-specific answer, but in most programming languages and libraries, wait() and notify() are low-level primitives that are meant to be used in a very specific way:
A "consumer" thread waits by doing something like this:
mutex.lock();
while ( ! SomeImportantCondition() ) {
cond_var.wait(mutex);
}
DoSomethingThatRequiresTheConditionToBeTrue();
mutex.unlock()
The purpose of the mutex is to protect the shared data that SomeImportantCondition() tests. No other thread should be allowed to change the value that SomeImportantCondition() returns while the mutex is locked.
Also, you may already know this, but some readers might not; The reason why mutex is given in cond_var.wait(mutex) is because the wait function temporarily unlocks the mutex while it is waiting, and then it re-locks the mutex before it returns. The unlock is necessary so that a producer thread will be allowed to make the condition become true. Re-locking is needed to guarantee that the condition still will be true when the consumer accesses the shared data.
The third thing to note is that the consumer does not wait() if the condition already is true. A common newbie mistake is to unconditionally call cv.wait() and expect that a cv.notify() call in some other thread will wake it up. But a notify() will not wake the consumer thread if it happens before the consumer starts waiting.
Writing the "producer" is easier. There's no technical reason why a "producer" can't just call cond_var.notify() without doing anything else at all. But that's not very useful. Most producers do something like this:
mutex.lock();
... Do something that makes SomeImportantCondition() return true;
cond_var.notify();
mutex.unlock();
The only really important thing is that the producer locks the mutex before it touches the shared data that are tested by SomeImportantCondition(). Some programming languages will let you move the notify() call after the unlock() call. Others won't. It doesn't really matter either way.
I was writing a multithreading code using pthread_cond in conjuction with mutexes, which made me wonder:
is the signal one time, so if the signal is sent before the other thread is waiting for it, the other thread will keep waiting indefinitely?
Since cond_wait() unlocks the mutex, is it a thumb rule to write this statement JUST before mutex_unlock(), (I realise this makes the latter redundant, but I do that just for clarity) or are there many scenarios where you would want to write the function outside the mutex lock?
Make this your mantra:
Only ever wait for something ...
Waiting should almost always look like this:
if (pthread_mutex_lock(...) != 0) {
/* something terrible happened, panic */
}
while (test-condition) {
pthread_cond_wait(...)
}
pthread_mutex_unlock(...)
If the exclusive check of test-condition fails, and so a context enters pthread_cond_wait the associated mutex will be atomically unlocked.
This means another context can enter code that looks like:
if (pthread_mutex_lock(...) != 0) {
/* panic */
}
test-condition = false;
pthread_cond_signal(...);
pthread_mutex_unlock(...);
Changing the predicate and atomically waking the first context that is in the call to pthread_cond_wait, which in turn checks the predicate test-condition and can jump past the loop.
If we just look at the waiting code again:
if (pthread_mutex_lock(...) != 0) {
/* something terrible happened, panic */
}
while (test-condition) {
pthread_cond_wait(...)
}
pthread_mutex_unlock(...)
Between the call to wait and unlock, there is always exclusivity; Either because the mutex was acquired exclusively (the predicated wait loop was not entered), or because before returning from a call topthread_cond_wait the mutex was re-acquired atomically.
Synchronization is hard to get right, and is costly for a multi-threaded application; One should attempt to keep critical sections simple to squeeze the margins for error to their minimum size.
Another important thing to do is check the return values of all these pthread_* calls; The return value is important information about state that you always need to know, and nearly always need to act upon.
Some useful man pages (for return values):
pthread_mutex_lock
pthread_cond_wait
pthread_cond_signal
I found a piece of strange code in a open source software
for (i=0; i<store->scan_threads; i++) {
pthread_join(thread_ids[i], NULL);
pthread_detach(thread_ids[i]);
}
Is there any meaning to call pthread_detach ?
That stanza is silly and unsafe.
Design-wise, the detach is unnecessary — the join completion already means that the thread is completely finished. There's nothing to detach. (The code in question simply spawns threads with default joinability.)
Implementation-wise, the detach is unsafe. A thread ID may be recycled as soon as the thread is finished — oops, didn't mean to detach that other thread! Worse, the ID is not guaranteed to be meaningful at all after the call to join returns — SEGV?
In this code (considering that this code is from main thread.... )
pthread_join(thread_ids[i], NULL);
this will wait the main thread to return thread with thread id "thread_ids[i]", and if main thread is doing some more work then
pthread_detach(thread_ids[i]);
will release the resource used by the thread (with thread id "thread_ids[i]).
I'm writing a multithread plugin based application. I will not be the plugins author. So I would wish to avoid that the main application crashes cause of a segmentation fault in a plugin. Is it possible? Or the crash in the plugin definitely compromise also the main application status?
I wrote a sketch program using qt cause my "real" application is strongly based on qt library. Like you can see I forced the thread to crash calling the trimmed function on a not-allocated QString. The signal handler is correctly called but after the thread is forced to quit also the main application crashes. Did I do something wrong? or like I said before what I'm trying to do is not achievable?
Please note that in this simplified version of the program I avoided to use plugins but only thread. Introducing plugins will add a new critical level, I suppose. I want to go on step by step. And, overall, I want to understand if my target is feasible. Thanks a lot for any kind of help or suggestions everyone will try to give me.
#include <QString>
#include <QThread>
#include<csignal>
#include <QtGlobal>
#include <QtCore/QCoreApplication>
class MyThread : public QThread
{
public:
static void sigHand(int sig)
{
qDebug("Thread crashed");
QThread* th = QThread::currentThread();
th->exit(1);
}
MyThread(QObject * parent = 0)
:QThread(parent)
{
signal(SIGSEGV,sigHand);
}
~MyThread()
{
signal(SIGSEGV,SIG_DFL);
qDebug("Deleted thread, restored default signal handler");
}
void run()
{
QString* s;
s->trimmed();
qDebug("Should not reach this point");
}
};
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
MyThread th(&a);
th.run();
while (th.isRunning());
qDebug("Thread died but main application still on");
return a.exec();
}
I'm currently working on the same issue and found this question via google.
There are several reasons your source is not working:
There is no new thread. The thread is only created, if you call QThread::start. Instead you call MyThread::run, which executes the run method in the main thread.
You call QThread::exit to stop the thread, which is not supposed to directly stop a thread, but sends a (qt) signal to the thread event loop, requesting it to stop. Since there is neither a thread nor an event loop, the function has no effect. Even if you had called QThread::start, it would not work, since writing a run method does not create a qt event loop. To be able to use exit with any QThread, you would need to call QThread::exec first.
However, QThread::exit is the wrong method anyways. To prevent the SIGSEGV, the thread must be called immediately, not after receiving the (qt) signal in its event loop. So although generally frowned upon, in this case QThread::terminate has to be called
But it is generally said to be unsafe to call complex functions like QThread::currentThread, QThread::exit or QThread::terminate from signal handlers, so you should never call them there
Since the thread is still running after the signal handler (and I'm not sure even QThread::terminate would kill it fast enough), the signal handler exits to where it was called from, so it reexecutes the instruction causing the SIGSEGV, and the next SIGSEGV occurs.
Therefore I have used a different approach, the signal handler changes the register containing the instruction address to another function, which will then be run, after the signal handler exits, instead the crashing instruction. Like:
void signalHandler(int type, siginfo_t * si, void* ccontext){
(static_cast<ucontext_t*>(ccontext))->Eip = &recoverFromCrash;
}
struct sigaction sa;
memset(&sa, 0, sizeof(sa)); sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = &signalHandler;
sigaction(SIGSEGV, &sa, 0);
The recoverFromCrash function is then normally called in the thread causing the SIGSEGV. Since the signal handler is called for all SIGSEGV, from all threads, the function has to check which thread it is running in.
However, I did not consider it safe to simply kill the thread, since there might be other stuff, depending on a running thread. So instead of killing it, I let it run in an endless loop (calling sleep to avoid wasting CPU time). Then, when the program is closed, it sets a global variabel, and the thread is terminated. (notice that the recover function must never return, since otherwise the execution will return to the function which caused the SIGSEGV)
Called from the mainthread on the other hand, it starts a new event loop, to let the program running.
if (QThread::currentThread() != QCoreApplication::instance()->thread()) {
//sub thread
QThread* t = QThread::currentThread();
while (programIsRunning) ThreadBreaker::sleep(1);
ThreadBreaker::forceTerminate();
} else {
//main thread
while (programIsRunning) {
QApplication::processEvents(QEventLoop::AllEvents);
ThreadBreaker::msleep(1);
}
exit(0);
}
ThreadBreaker is a trivial wrapper class around QThread, since msleep, sleep and setTerminationEnabled (which has to be called before terminate) of QThread are protected and could not be called from the recover function.
But this is only the basic picture. There are a lot of other things to worry about: Catching SIGFPE, Catching stack overflows (check the address of the SIGSEGV, run the signal handler in an alternate stack), have a bunch of defines for platform independence (64 bit, arm, mac), show debug messages (try to get a stack trace, wonder why calling gdb for it crashes the X server, wonder why calling glibc backtrace for it crashes the program)...
Wait(semaphore sem) {
DISABLE_INTS
sem.val--
if (sem.val < 0){
add thread to sem.L
block(thread)
}
ENABLE_INTS
Signal(semaphore sem){
DISABLE_INTS
sem.val++
if (sem.val <= 0) {
th = remove next
thread from sem.L
wakeup(th)
}
ENABLE_INTS
If block(thread) stops a thread from executing, how, where, and when does it return?
Which thread enables interrupts following the Wait()?
the thread that called block() shouldn’t return until another thread has called wakeup(thread)!
but how does that other thread get to run?
where exactly does the thread switch occur?
block(thread) works that way:
Enables interrupts
Uses some kind of waiting mechanism (provided by the operating system or the busy waiting in the simplest case) to wait until the wakeup(thread) on this thread is called. This means that in this point thread yields its time to the scheduler.
Disables interrupts and returns.
Yes, UP and DOWN are mostly useful when called from different threads, but it is not impossible that you call these with one thread - if you start semaphore with a value > 0, then the same thread can entry the critical section and execute both DOWN (before) and UP (after). Value which initializes the semaphore tells how many threads can enter the critical section at once, which might be 1 (mutex) or any other positive number.
How are the threads created? That is not shown on the lecture slide, because that is only a principle how semaphore works using a pseudocode. But it is a completely different story how you use those semaphores in your application.