Race condition on pthread_kill()? - linux

Linux manual for pthread_kill() has the following paragraph:
POSIX.1-2008 recommends that if an implementation detects the use
of a thread ID after the end of its lifetime, pthread_kill()
should return the error ESRCH. The glibc implementation returns
this error in the cases where an invalid thread ID can be detected.
But note also that POSIX says that an attempt to use a thread ID whose
lifetime has ended produces undefined behavior, and an attempt to
use an invalid thread ID in a call to pthread_kill() can, for example,
cause a segmentation fault.
The problem is that, between checking that the thread ID is valid and calling pthread_kill(), the thread might have terminated. Is it inherently unsafe to use pthread_kill(), as there is always a race condition that can turn into undefined behavior?
How to ensure thread ID will be valid?

Race condition on pthread_kill()?
When the thread is detached, always. But if thread ID is valid, no.
Is it inherently unsafe to use pthread_kill(), as there is always a race condition that can turn into undefined behavior?
No, not always.
How to ensure thread ID will be valid?
From POSIX thread ID:
The lifetime of a thread ID ends after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread.
Otherwise it remains valid. As long as the thread has been neither detached nor joined, its thread ID is valid and you can call pthread_kill() with it at any time.
Generally, you should stop using a thread ID after pthread_detach or pthread_join. It is like malloc() and free(): you can't use memory allocated by malloc() after calling free() on it. In the same way, you can't use a thread ID after detaching or joining the thread; the thread ID becomes invalid. With pthread_detach it becomes invalid "later", but you don't know when, so you can't use it anyway (well, unless you write your own synchronization). It might become invalid right after the call to pthread_detach. If you intend to do anything with a thread ID, do not detach the thread and do not join it.
Calling pthread_kill on an "inactive thread" (a non-detached, non-joined thread that has terminated) is valid, because the thread ID is still valid. The POSIX page for pthread_kill says:
Existing implementations vary on the result of a pthread_kill() with a thread ID indicating an inactive thread (a terminated thread that has not been detached or joined). Some indicate success on such a call, while others give an error of [ESRCH]. Since the definition of thread lifetime in this volume of POSIX.1-2017 covers inactive threads, the [ESRCH] error as described is inappropriate in this case. In particular, this means that an application cannot have one thread check for termination of another with pthread_kill().
FUTURE DIRECTIONS
A future version of this standard may require that pthread_kill() not fail with [ESRCH] in the case of sending signals to an inactive thread (a terminated thread not yet detached or joined), even though no signal will be delivered because the thread is no longer running.
The FUTURE DIRECTIONS section suggests a preference for pthread_kill() on an inactive thread simply succeeding and returning 0. I personally like the ESRCH error in that case.

How to ensure thread ID will be valid?
You must redesign so that your code knows this a priori.* Anything short of that is a TOCTOU race (CWE-367).
Fortunately, there's a lot of prior art from interprocess killing. Interprocess signaling doesn't run the terrifying risk of undefined behavior like pthread_kill does, but careful coders consider the risk of signaling a recycled PID just as unacceptable. (And thread IDs can be recycled, too.)
* OK, you could do it by checking some contrived state. For example, set a mutex-protected i_am_still_running flag to false at the very, very end of your thread routine. Then only pthread_kill that thread while holding the mutex and confirming that it is still running. Yuck.
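The flag-based workaround from the footnote could look roughly like the sketch below. The names `struct worker`, `worker_start` and `worker_signal` are made up for illustration, and the sketch assumes the thread always leaves through the end of `worker_main` (no cancellation, no early pthread_exit).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pthread_kill() in <signal.h> */
#endif
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one worker thread; all names are illustrative. */
struct worker {
    pthread_t tid;
    pthread_mutex_t lock;
    bool running;                 /* protected by lock */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    /* ... the thread's real work would go here ... */

    /* Last action before returning: forbid any further signals. */
    pthread_mutex_lock(&w->lock);
    w->running = false;
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Starts the worker; returns 0 or an error number. */
static int worker_start(struct worker *w)
{
    int rc = pthread_mutex_init(&w->lock, NULL);
    if (rc != 0)
        return rc;
    w->running = true;            /* set before the thread can clear it */
    rc = pthread_create(&w->tid, NULL, worker_main, w);
    if (rc != 0)
        w->running = false;
    return rc;
}

/* Sends sig only while the worker has not yet cleared its flag.
 * Returns pthread_kill()'s result, or ESRCH if the worker already finished. */
static int worker_signal(struct worker *w, int sig)
{
    int rc = ESRCH;

    pthread_mutex_lock(&w->lock);
    if (w->running)               /* worker cannot get past its final lock yet */
        rc = pthread_kill(w->tid, sig);
    pthread_mutex_unlock(&w->lock);
    return rc;
}
```

Because the worker needs the mutex to clear the flag, it cannot terminate while `worker_signal` is between the check and the pthread_kill call. The ESRCH returned here comes from the flag, not from the library.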

Related

linux unnamed posix semaphore sem_destroy, sem_wait method question

While some threads are blocked in sem_wait, I call sem_destroy from another thread, but the waiting threads are not woken up.
With a mutex, pthread_mutex_destroy returns EBUSY when there are waiting threads.
However, sem_destroy returns 0 and errno is also 0.
I want sem_destroy to wake up the waiting threads and to block any further access to the destroyed semaphore.
This is possible with semaphore handles on Windows.
Please advise. Thank you.
POSIX says this about sem_destroy:
The effect of destroying a semaphore upon which other threads are currently blocked is undefined.
It specifically doesn't say that other threads are woken up. In fact, if sem_t contains a pointer to memory, what it almost certainly does do is free the memory, meaning you then have a use-after-free security problem. (Whether that is the case depends on your libc.)
The general approach of allocation for mutexes and semaphores is that they should be either allocated and freed with their relevant data structure, or they should be allocated before the relevant code needs them and then freed after the entire code is done with them. In C, you cannot safely deallocate data structures (e.g., with sem_destroy) that are in use.
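If you want to wake up all users of the semaphore, you must increment it until all users have awoken. POSIX lets sem_getvalue report blocked threads either as 0 or as a negative number whose absolute value is the number of waiters, so where the implementation uses the negative form you can call sem_getvalue to determine if anyone is waiting and then call sem_post to increment the semaphore; Linux reports 0, so there you have to track the waiters yourself. Only then can you safely destroy it. Note that this can have a race condition, depending on your code.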
However, note that you must be careful that the other code does not continue to use the semaphore after it's destroyed, such as by trying to re-acquire it in a loop. If you are careful to structure your code properly, then you can have confidence that this won't happen.
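One way to structure this is sketched below, under the assumption that the application knows how many threads use the semaphore and can join them. `struct work_queue`, `queue_start`, `queue_stop` and `NWAITERS` are hypothetical names, and error handling is abbreviated.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NWAITERS 3                /* hypothetical number of consumer threads */

struct work_queue {
    sem_t items;                  /* counts available work items */
    atomic_bool shutting_down;    /* set once, before the wake-up posts */
    atomic_int exited;            /* waiters that have left their loop */
    pthread_t waiters[NWAITERS];
    int nstarted;
};

static void *waiter_main(void *arg)
{
    struct work_queue *q = arg;

    for (;;) {
        while (sem_wait(&q->items) == -1 && errno == EINTR)
            continue;             /* retry if a signal handler interrupted us */
        if (atomic_load(&q->shutting_down))
            break;                /* never touch the semaphore again */
        /* ... handle one work item ... */
    }
    atomic_fetch_add(&q->exited, 1);
    return NULL;
}

/* Returns 0 on success or an error number. */
static int queue_start(struct work_queue *q)
{
    if (sem_init(&q->items, 0, 0) == -1)
        return errno;
    atomic_init(&q->shutting_down, false);
    atomic_init(&q->exited, 0);
    for (q->nstarted = 0; q->nstarted < NWAITERS; q->nstarted++) {
        int rc = pthread_create(&q->waiters[q->nstarted], NULL, waiter_main, q);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Wakes every waiter, waits until all of them are gone, then destroys. */
static int queue_stop(struct work_queue *q)
{
    atomic_store(&q->shutting_down, true);
    for (int i = 0; i < q->nstarted; i++)
        sem_post(&q->items);      /* one post per waiter */
    for (int i = 0; i < q->nstarted; i++)
        pthread_join(q->waiters[i], NULL);
    return sem_destroy(&q->items);   /* no thread can be blocked on it now */
}
```

The flag is what keeps a woken thread from going back to sem_wait, and the joins are what make the final sem_destroy safe; sem_getvalue is not needed at all in this arrangement.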

Can it be assumed that `pthread_cond_signal` will wake the signaled thread atomically with regard to the mutex bound to the condition variable?

Quoting POSIX:
The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal().
"If predictable scheduling behavior is required". This could/would hint that locking the mutex bound to the condition variable right before calling pthread_cond_signal() should guarantee that the signaled thread will be woken up before any other thread manages to lock this mutex. Is this correct?
We will see if any PThreads guru has a more comprehensive answer, but as far as I can see, at least from the Linux manpage, you do not get fully predictable behavior. What you do get is a guarantee that if two threads wait on the same condition variable, the higher-priority thread gets to go first (at least, that should be true on Linux if one thread is SCHED_OTHER and the other is real-time SCHED_FIFO). That holds if you lock the mutex before signalling (with reservation for errors after a quick read of the manpage).
See
https://linux.die.net/man/3/pthread_cond_signal
No, there is no guarantee that the signalled thread will be woken up. Worse, if the signalling thread runs a sequence like this:
    while (run_again) {
        pthread_mutex_lock(&mutex);
        /* prepare data */
        pthread_mutex_unlock(&mutex);
        pthread_cond_broadcast(&cond);
    }
there is a reasonable chance that control is never passed to the other threads waiting on the mutex, because of the scheduler's logic. You can find some examples to play with in this answer.
No.
The best reference I have found regarding the predictability is this one:
https://austin-group-l.opengroup.narkive.com/lKcmfoRI/predictable-scheduling-behavior-in-pthread-cond-broadcast
Basically, people want to guard against the possibility that threads do not get a fair chance to run. Apparently, it is not a problem for most producer-consumer scenarios, and it does not apply to pthread_cond_broadcast either. I would say it is useful only in limited cases.
Cppreference.com actually notes that unlocking after notifying (that is, notifying while still holding the lock) may be a pessimization:
https://en.cppreference.com/w/cpp/thread/condition_variable/notify_all
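Since no wake-up order is guaranteed, the usual approach is to make correctness independent of it by re-checking a predicate under the mutex. Below is a minimal sketch of that pattern; `struct mailbox` and its functions are hypothetical names, not taken from any of the answers.

```c
#include <pthread.h>

/* Hypothetical one-counter mailbox; names are illustrative. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    int pending;                  /* items waiting, protected by lock */
    int taken;                    /* items consumed, protected by lock */
};

static void mailbox_init(struct mailbox *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->nonempty, NULL);
    m->pending = 0;
    m->taken = 0;
}

static void mailbox_put(struct mailbox *m)
{
    pthread_mutex_lock(&m->lock);
    m->pending++;
    pthread_cond_signal(&m->nonempty);   /* signalled while holding the mutex */
    pthread_mutex_unlock(&m->lock);
}

static void mailbox_take(struct mailbox *m)
{
    pthread_mutex_lock(&m->lock);
    /* Re-check the predicate after every wakeup: another thread may have
     * locked the mutex first and consumed the item, or the wakeup may be
     * spurious. */
    while (m->pending == 0)
        pthread_cond_wait(&m->nonempty, &m->lock);
    m->pending--;
    m->taken++;
    pthread_mutex_unlock(&m->lock);
}

/* Thread entry point that consumes exactly one item. */
static void *taker_main(void *arg)
{
    mailbox_take(arg);
    return NULL;
}
```

With the `while` loop in place it does not matter which thread acquires the mutex first after the signal: a thread that finds nothing pending simply waits again.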

Is pthread_join() a critical function?

According to POSIX, a Thread ID can be reused if the original bearer thread finished. Therefore, would one need to use a mutex or semaphore when calling pthread_join()? Because, it could happen that the target thread, which one wants to join, already terminated and another thread with the same thread ID was created, before calling pthread_join() in the original thread. This would make the original thread believe that the target thread has not finished, although this is not the case.
I think you'll find this works much the same way as processes in UNIX. A joinable thread is not considered truly finished until something has actually joined it.
This is similar to the UNIX processes in that, even though they've technically exited, enough status information (including the PID, which cannot be re-used yet) hangs around until another process does a wait on it. Only after that point does the PID become available for re-use. This kind of process is called a zombie, since it's dead but not dead.
This is supported by the pthread_join documentation which states:
Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread". Avoid doing this, since each zombie thread consumes some system resources, and when enough zombie threads have accumulated, it will no longer be possible to create new threads (or processes).
and pthread_create, which states:
Only when a terminated joinable thread has been joined are the last of its resources released back to the system.
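A small sketch of what this means in practice: a joinable thread that terminated long ago can still be joined, and its return value is still available. The helper name `join_after_exit` and the 100 ms pause are arbitrary choices for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#endif
#include <pthread.h>
#include <time.h>

static int exit_token = 42;       /* arbitrary value handed back by the thread */

static void *quick_exit(void *arg)
{
    (void)arg;
    return &exit_token;           /* the thread terminates immediately */
}

/* Joins a thread well after it has terminated. Returns 0 on success and
 * stores the thread's return value in *result. */
static int join_after_exit(void **result)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, quick_exit, NULL);
    if (rc != 0)
        return rc;

    /* Give the thread ample time to finish. Other threads could be created
     * in the meantime; tid stays valid because the thread has been neither
     * joined nor detached. */
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    nanosleep(&pause, NULL);

    return pthread_join(tid, result);
}
```

The join does not need a mutex or semaphore around it; what matters is that exactly one pthread_join is issued for the thread and that the ID is not used afterwards.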

Can a thread be logically interruptible while waiting for a mutex?

I was reading R&R's Unix system programming and ran into a question about mutexes, concerning the paragraph quoted below. When the author says that a thread waiting for a mutex is not logically interruptible, does it mean that a thread waiting for a mutex cannot be context switched? Can someone elaborate?
A thread that waits for a mutex is not logically interruptible except
by termination of the process, termination of a thread with
pthread_exit (from a signal handler), or asynchronous cancellation
(which is normally not used).
No, it doesn't mean that it can't context switch. On the contrary, a thread waiting for a mutex that is already acquired almost always will context switch away, perhaps after a short delay.
All it means is that the pthread_mutex_lock() call won't return EINTR or similar - it will either successfully acquire the mutex, or return persistent failure.
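This can be observed with a small experiment, sketched below: a thread blocked in pthread_mutex_lock() receives a signal, the handler runs, and the lock call still returns 0 once the mutex is released. The function names and the sleeps are illustrative assumptions; the sleeps only make it likely, not certain, that the thread is already blocked when the signal arrives.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction(), pthread_kill(), nanosleep() */
#endif
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t handler_ran;
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

static void on_sigusr1(int sig)
{
    (void)sig;
    handler_ran = 1;
}

static void *blocked_locker(void *arg)
{
    int *lock_rc = arg;

    *lock_rc = pthread_mutex_lock(&gate);   /* blocks: the caller holds gate */
    if (*lock_rc == 0)
        pthread_mutex_unlock(&gate);
    return NULL;
}

/* Returns what pthread_mutex_lock() returned in the signalled thread,
 * or -1 if the experiment could not be set up. */
static int signal_blocked_locker(void)
{
    struct sigaction sa;
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    pthread_t tid;
    int lock_rc = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    pthread_mutex_lock(&gate);
    if (pthread_create(&tid, NULL, blocked_locker, &lock_rc) != 0) {
        pthread_mutex_unlock(&gate);
        return -1;
    }
    nanosleep(&pause, NULL);      /* let the thread reach the lock call */
    pthread_kill(tid, SIGUSR1);   /* tid is valid: alive, joinable, not joined */
    nanosleep(&pause, NULL);      /* let the handler run while it is blocked */
    pthread_mutex_unlock(&gate);
    pthread_join(tid, NULL);
    return lock_rc;
}
```

The handler interrupts the wait at the kernel level, but the library resumes waiting, so the caller of pthread_mutex_lock() never sees the interruption.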

How do I determine if a detached pthread is alive?

How do I determine if a detached pthread is still alive?
I have a communication channel with the thread (a uni-directional queue pointing outwards from the thread) but what happens if the thread dies without a gasp?
Should I resign myself to using process signals or can I probe for thread liveness somehow?
For a joinable (i.e. not detached) pthread you could use pthread_kill like this:
int ret = pthread_kill(YOUR_PTHREAD_ID, 0);
If you get ESRCH, it might be the case that your thread is dead.
However, this doesn't apply to detached pthreads, because after a detached thread has ended its thread ID can be reused for another thread.
From the comments:
The answer is wrong because if the thread is detached and is not
alive, the pthread_t is invalid. You can't pass it to pthread_kill. It
could, for example, be a pointer to a structure that was freed,
causing your program to crash. POSIX says, "A conforming
implementation is free to reuse a thread ID after its lifetime has
ended. If an application attempts to use a thread ID whose lifetime
has ended, the behavior is undefined." – Thanks @DavidSchwartz
This question assumes a design with an unavoidable race condition.
Presumably, you plan to do something like this:
1. Check to see if thread is alive
2. Wait for message from thread
The problem is that this sequence is not atomic and cannot be fixed. Specifically, what if the thread you are checking dies between step (1) and step (2)?
Race conditions are evil; rare race conditions doubly so. Papering over something 90% reliable with something 99.999% reliable is one of the worst decisions you can make.
The right answer to your question is "don't do that". Instead, fix your application so that threads do not die randomly.
If that is impossible, and some thread is prone to crashing, and you need to recover from that... Then your design is fundamentally flawed and you should not be using a thread. Put that unreliable thing in a different process and use a pipe to communicate with it instead. Process death closes file descriptors, and reading a pipe whose other end has been closed has well-defined, easily detected, race-free behavior.
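A minimal sketch of the pipe approach follows, assuming the unreliable work runs in a forked child. The function name and the two-byte message are made up for illustration, and the read loop does not handle EINTR.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fork(), pipe(), waitpid() */
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child that sends a short message and then dies. Returns the number
 * of bytes the parent received before end-of-file, or -1 on error. */
static ssize_t bytes_before_worker_death(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {               /* child: the unreliable worker */
        close(fds[0]);
        if (write(fds[1], "hi", 2) != 2)
            _exit(2);
        _exit(1);                 /* simulated death; the kernel closes fds[1] */
    }

    /* Parent: close our copy of the write end, otherwise EOF never arrives. */
    close(fds[1]);

    char buf[64];
    ssize_t n, total = 0;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;               /* messages from the worker */
    /* n == 0 here means every write end is closed: the worker is gone. */

    close(fds[0]);
    waitpid(pid, NULL, 0);        /* reap the child */
    return n == 0 ? total : -1;
}
```

There is no separate liveness check to race against: waiting for a message and detecting the worker's death are the same read() call.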
It is probably undefined behaviour when you send a signal to an already dead thread. Your application might crash.
see http://sourceware.org/bugzilla/show_bug.cgi?id=4509 and http://udrepper.livejournal.com/16844.html
