How do I determine if a detached pthread is alive? - linux

How do I determine if a detached pthread is still alive?
I have a communication channel with the thread (a uni-directional queue pointing outwards from the thread) but what happens if the thread dies without a gasp?
Should I resign myself to using process signals or can I probe for thread liveliness somehow?

For a joinable (i.e. NOT detached) pthread you could use pthread_kill like this:
int ret = pthread_kill(YOUR_PTHREAD_ID, 0);
If you get an ESRCH value, it might be the case that your thread is dead.
However, this doesn't apply to a detached pthread, because after it has ended its thread ID can be reused for another thread.
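As a rough sketch (the function name is mine, not from the answer), the probe for a joinable thread could look like this; it is only meaningful before the thread has been joined or detached:
#include <pthread.h>
#include <signal.h>   /* pthread_kill */
#include <errno.h>
#include <stdio.h>

/* Probe a JOINABLE thread with signal 0: no signal is actually sent.
   The pthread_t stays valid until pthread_join()/pthread_detach(),
   so passing it here is safe; ESRCH merely suggests the thread has
   terminated on implementations that report it. */
void probe_thread(pthread_t tid)
{
    int ret = pthread_kill(tid, 0);
    if (ret == ESRCH)
        printf("thread appears to have terminated\n");
    else if (ret == 0)
        printf("thread ID is still valid (thread may be running or inactive)\n");
    else
        printf("pthread_kill failed: %d\n", ret);
}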
From the comments:
The answer is wrong because if the thread is detached and is not
alive, the pthread_t is invalid. You can't pass it to pthread_kill. It
could, for example, be a pointer to a structure that was freed,
causing your program to crash. POSIX says, "A conforming
implementation is free to reuse a thread ID after its lifetime has
ended. If an application attempts to use a thread ID whose lifetime
has ended, the behavior is undefined." – Thanks @DavidSchwartz

This question assumes a design with an unavoidable race condition.
Presumably, you plan to do something like this:
1. Check to see if thread is alive
2. Wait for message from thread
The problem is that this sequence is not atomic and cannot be fixed. Specifically, what if the thread you are checking dies between step (1) and step (2)?
Race conditions are evil; rare race conditions doubly so. Papering over something 90% reliable with something 99.999% reliable is one of the worst decisions you can make.
The right answer to your question is "don't do that". Instead, fix your application so that threads do not die randomly.
If that is impossible, and some thread is prone to crashing, and you need to recover from that... Then your design is fundamentally flawed and you should not be using a thread. Put that unreliable thing in a different process and use a pipe to communicate with it instead. Process death closes file descriptors, and reading a pipe whose other end has been closed has well-defined, easily detected, race-free behavior.
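As a minimal sketch of that pipe idea (the worker here is a forked child process; the names and details are mine, not from the answer), read() returning 0 is the race-free sign that the other end is gone:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                 /* child: the unreliable worker */
        close(fds[0]);
        /* ... do work and write(fds[1], ...) results ... */
        _exit(0);                   /* any death, even a crash, closes fds[1] */
    }

    close(fds[1]);                  /* parent keeps only the read end */
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        /* process n bytes of results */
    }
    if (n == 0)
        puts("worker is gone: EOF on the pipe, no probing, no race");
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return 0;
}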

It is probably undefined behaviour when you send a signal to an already dead thread. Your application might crash.
see http://sourceware.org/bugzilla/show_bug.cgi?id=4509 and http://udrepper.livejournal.com/16844.html

Related

Race condition on pthread_kill()?

Linux manual for pthread_kill() has the following paragraph:
POSIX.1-2008 recommends that if an implementation detects the use
of a thread ID after the end of its lifetime, pthread_kill()
should return the error ESRCH. The glibc implementation returns
this error in the cases where an invalid thread ID can be detected.
But note also that POSIX says that an attempt to use a thread ID whose
lifetime has ended produces undefined behavior, and an attempt to
use an invalid thread ID in a call to pthread_kill() can, for example,
cause a segmentation fault.
The problem is that, between checking that the thread ID is valid and issuing the pthread_kill(), the thread might have terminated. Is it inherently unsafe to use pthread_kill(), as there is always a race condition that can turn into undefined behavior?
How to ensure thread ID will be valid?
Race condition on pthread_kill()?
When the thread is detached, there is always a race. But if the thread ID is valid, there is no race.
Is it inherently unsafe to use pthread_kill(), as there is always a race condition that can turn into undefined behavior?
No, not always.
How to ensure thread ID will be valid?
From POSIX thread ID:
The lifetime of a thread ID ends after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread.
Otherwise it's valid. So as long as the thread has been neither detached nor joined, the thread ID remains valid and you can call pthread_kill() with it at any time.
Generally, you should stop using a thread ID after pthread_detach or pthread_join. It's like malloc() and free(): you can't use memory allocated by malloc() after calling free(). In the same way, you can't use a thread ID after detaching or joining it; the thread ID simply becomes invalid. With pthread_detach it becomes invalid "later", but you don't know when, so you can't use it anyway (well, unless you write your own synchronization). It might become invalid right after the call to pthread_detach. If you intend to do anything with a thread ID, do not detach it and do not join it.
A call to pthread_kill with an "inactive thread" (a non-detached, non-joined thread that has terminated) is valid - the thread ID is still valid. We can read in the POSIX description of pthread_kill:
Existing implementations vary on the result of a pthread_kill() with a thread ID indicating an inactive thread (a terminated thread that has not been detached or joined). Some indicate success on such a call, while others give an error of [ESRCH]. Since the definition of thread lifetime in this volume of POSIX.1-2017 covers inactive threads, the [ESRCH] error as described is inappropriate in this case. In particular, this means that an application cannot have one thread check for termination of another with pthread_kill().
FUTURE DIRECTIONS
A future version of this standard may require that pthread_kill() not fail with [ESRCH] in the case of sending signals to an inactive thread (a terminated thread not yet detached or joined), even though no signal will be delivered because the thread is no longer running.
The FUTURE DIRECTIONS section suggests that pthread_kill() with an inactive thread should simply succeed and return 0. I personally prefer the ESRCH error in such a case.
How to ensure thread ID will be valid?
You must redesign so that your code knows this a priori.* Anything short of that is a TOCTOU race (CWE-367).
Fortunately, there's a lot of prior art from interprocess killing. Interprocess signaling doesn't run the terrifying risk of undefined behavior like pthread_kill does, but careful coders consider the risk of signaling a recycled PID just as unacceptable. (And thread IDs can be recycled, too.)
* OK, you could do it by checking some contrived state. For example, set a mutex-protected i_am_still_running flag to false at the very, very end of your thread routine. Then only pthread_kill that thread while holding the mutex and confirming that it is still running. Yuck.
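A rough illustration of that contrived-state idea (all names here are hypothetical, and it inherits all the ugliness the footnote admits to):
#include <pthread.h>
#include <signal.h>
#include <errno.h>
#include <stdbool.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static bool i_am_still_running = true;
static pthread_t worker_id;         /* filled in by pthread_create elsewhere */

static void *worker(void *arg)
{
    (void)arg;
    /* ... the real work ... */
    pthread_mutex_lock(&state_lock);
    i_am_still_running = false;      /* the very, very end of the routine */
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/* Signal the worker only while holding the lock and confirming it still runs,
   so its pthread_t cannot have been recycled yet. */
static int kill_worker_if_running(int sig)
{
    int ret = ESRCH;
    pthread_mutex_lock(&state_lock);
    if (i_am_still_running)
        ret = pthread_kill(worker_id, sig);
    pthread_mutex_unlock(&state_lock);
    return ret;
}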

Make thread wait for condition but allow thread to remain usable while waiting or listening for a signal

Given a situation where thread A had to dispatch work to thread B, is there any synchronisation mechanism that allows thread A to not return, but remain usable for other tasks, until thread B is done, after which thread A can return?
This is not language specific, but simple c language would be a great choice in responding to this.
This could be absolutely counterintuitive; it actually sounds as such, but I have to ask before presuming...
Please note: this is a made-up hypothetical situation that I'm interested in. I am not looking for a solution to an existing problem, so alternative concurrency solutions are completely pointless. I have no code for it, and if I were in it I could think of a few alternative code engineering solutions to avoid this setup. I just wish to know if a thread can be usable, in some way, while waiting for a signal from another thread, and what synchronisation mechanism to use for that.
UPDATE
As I mentioned above, I know how to synchronise threads etc. I'm only interested in the situation that I have presented here. Mutexes, semaphores, locks and all kinds of mechanisms will synchronise access to resources, synchronise the order of events, and handle all kinds of concurrency issues, yes. But I'm not interested in how to do it properly. I just have this made-up situation that I wish to know if it can be addressed with a mechanism as described prior.
UPDATE 2
It seems I have opened up a portal for people who think they are experts in concurrency to teleport in and lecture about how they think the rest of the world does not know how threading works. I simply asked if there is a mechanism for this situation, not a workaround, not 'the proper way to synchronise', not a better way to do it. I already know what I would do and would never be in this made-up situation. It's simply hypothetical.
After much research, thought, and review, I have come to the conclusion that it's like asking:
Does a calculator have a mode where I can simply enter a series of 5 digits and automatically get their sum on the screen?
No, it does not have such a mode ready. But I can still get the sum with a few extra clicks, using the plus button and eventually the equals button.
If I really wanted a thread that can continue while listening for a condition of some sort, I could easily implement a personal class or object around the OS/kernel/SDK thread or whatever and make use of that.
• So at a low level, my answer is no, there is no such mechanism •
If a thread is waiting, then it's waiting. If it can continue executing then it is not really 'waiting', in the concurrency meaning of waiting. Otherwise there would be some other term for this state (Alert Waiting, anyone?). This is not to say it is not possible, just not with one simple low level predefined mechanism similar to a mutex or semaphore etc. One could wrap the required functionality in some class or object etc.
Having said that, there are Interrupts and Interrupt handlers, which come close to addressing this situation. However, an interrupt has to be defined, with its handler. The interrupts may actually be running on another thread (not to say a thread per interrupt). So a number of objects are involved here.
You have a misunderstanding about how mutexes are typically used.
If you want to do some work, you acquire the mutex to figure out what work you need to do. You do this because "what work you need to do" is shared between the thread that decides what work needs to be done and the thread that's going to do the work. But then you release the mutex that protects "what work you need to do" while you do the work.
Then, when you finish the work, you acquire the mutex that protects your report that the work is done. This is needed because the status of the work is shared with other threads. You set that status to "done" and then you release the mutex.
Notice that no thread holds the mutex for very long, just for the microscopic fraction of a second it needs to check on or modify shared state. So to see if work is done, you can acquire the mutex that protects the reporting of the status of that work, check the status, and then release the mutex. The thread doing the work will not hold that mutex for longer than the tiny fraction of a second it needs to change that status.
If you're holding mutexes so long that you worry at all about waiting for them to be released, you're either doing something wrong or using mutexes in a very atypical way.
So use a mutex to protect the status of the work. If you need to wait for work to be done, also use a condition variable. Only hold that mutex while changing, or checking, the status of the work.
But if a thread attempts to acquire an already acquired mutex, that thread will be forced to wait until the thread that originally acquired the mutex releases it. So, while that thread is waiting, can it actually be usable? This is where my question lies.
If you consider any case where one thread might slow another thread down to be "waiting", then you can never avoid waiting. All that has to happen is one thread accesses memory and that might slow another thread down. So what do you do, never access memory?
When we talk about one thread "waiting" for another, what we mean is waiting for the thread to do actual work. We don't worry about the microscopic overhead of inter-thread synchronization both because there's nothing we can do about it and because it's negligible.
If you literally want to find some way that one thread can never, ever slow another thread down, you'll have to re-design pretty much everything we use threads for.
Update:
For example, consider some code that has a mutex and a boolean. The boolean indicates whether or not the work is done. The "assign work" flow looks like this:
Create a work object with a mutex and a boolean. Set the boolean to false.
Dispatch a thread to work on that object.
The "do work" flow looks like this:
Do work. (The mutex is not held here.)
Acquire mutex.
Set boolean to true.
Release mutex.
The "is work done" flow looks like this:
Acquire mutex.
Copy boolean.
Release mutex.
Look at copied value.
This allows one thread to do work and another thread to check if the work is done any time it wants to while doing other things. The only case where one thread waits for the other is the one-in-a-million case where a thread that needs to check if the work is done happens to check right at the instant that the work has just finished. Even in that case, it will typically block for less than a microsecond as the thread that holds the mutex only needs to set one boolean and release the mutex. And if even that bothers you, most mutexes have a non-blocking "try to lock" function (which you would use in the "check if work is done" flow so that the checking thread never blocks).
And this is the normal way mutexes are used. Actual contention is the exception, not the rule.
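A compact sketch of those flows in C with pthreads (the struct and function names are mine):
#include <pthread.h>
#include <stdbool.h>

struct work {
    pthread_mutex_t lock;   /* protects 'done' only */
    bool done;              /* set to false before the thread is dispatched */
};

/* "do work" flow: work first, then report completion under the mutex */
static void *do_work(void *arg)
{
    struct work *w = arg;
    /* ... the actual work happens here, with the mutex NOT held ... */
    pthread_mutex_lock(&w->lock);
    w->done = true;
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* "is work done" flow, non-blocking variant: returns true only if the
   work is known to be finished; never blocks, thanks to trylock */
static bool work_is_done(struct work *w)
{
    bool done = false;
    if (pthread_mutex_trylock(&w->lock) == 0) {
        done = w->done;
        pthread_mutex_unlock(&w->lock);
    }
    return done;
}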

Ending a thread that might be joined or dereferenced

I'm having a problem deciding what to do in this situation. I want to have a detached thread, but still be able to join it in case I want to abort it early, presumably before starting a new instance of it, to make sure I don't have the thread still accessing things when it shouldn't.
This means I shouldn't detach the thread right after creating it, so I have a few options:
Self-detach the thread when it reaches the end of its execution. But wouldn't this cause problems if I try to join it from the main thread? This would be my preferred solution if the problem of trying to join the thread after it has self-detached could be solved. I could dereference the thread handle that the main thread has access to from the self-detaching thread before it self-detaches; however, if the main thread tries to join right before the handle is dereferenced and the thread self-detaches, this could cause problems, so I'd have to protect the dereferencing in the thread and somehow (I don't know how, I might need to create a variable to indicate this) check under a mutex in the main thread whether I should join, which complicates things. Somehow I have a feeling that this isn't the right way to do it.
Leave the thread hanging until I eventually join it, which could take a long time to happen; depending on how I organise things, that might not be before I get rid of what the thread produced (e.g. joining the thread right before freeing an image that was loaded/processed by the thread, once I don't need it anymore).
Have the main thread poll periodically to know when the thread has done its job, then join it (or detach it actually) and indicate not to try joining it again?
Or should I just call pthread_exit() from the thread, but then what if I try to join it?
If I sound a bit confused it's because I am. I'm writing in C99 using TinyCThread, a simple wrapper around pthreads and Win32 API threading. I'm not even sure how to dereference my thread handles: on Windows the thread handle is a HANDLE, and setting a handle to NULL seems to do it, but I'm not sure that's the right way to do it with the pthread_t type.
Epilogue: Based on John Bollinger's answer I chose to go with detaching the thread, putting most of that thread's code in a mutex, this way if any other thread wants to block until the thread is practically done it can use that mutex.
The price of using an abstraction layer such as TinyCThreads is that you can rely only on the defined characteristics of the abstraction. Both Windows and POSIX provide features and details that are not necessarily reflected by TinyCThreads. On the other hand, this may force you to rely on a firmer foundation than you might otherwise hack together with the help of implementation-specific features.
Anyway, you say,
I want to have a detached thread, but still be able to join it in case I want to abort it early,
but that's inconsistent. Once you detach a thread, you cannot join it. I suspect you meant something more like, "I want a thread that I can join as long as it is running, but that I don't have to join when it terminates." That's at least consistent, but it focuses on mechanism.
What I think you actually want would be described better as a thread that you can cancel synchronously as long as it is running, but that you otherwise don't need to join when it terminates. I note, however, that the whole idea presupposes a way to make the thread terminate early, and it does not appear that TinyCThread provides any built-in facility for that. It will also require a mechanism to determine whether a given thread is still alive, and TinyCThread does not provide that, either.
First, then, you need some additional per-thread shared state that tracks thread status (running / abort requested / terminated). Because the state is shared, you'll need a mutex to protect it, and that will probably need to be per-thread, too. Furthermore, in order to enable one thread (e.g. the main one) to wait for that state to change when it cancels a thread, it will need a per-thread condition variable.
With that in place, the new thread can self-detach, but it must periodically check whether an abort has been requested. When the thread ends its work, whether because it discovers an abort has been requested or because it reaches the normal end of its work, it performs any needed cleanup, sets the status to "terminated", broadcasts to the CV, and exits.
Any thread that wants to cancel another locks the associated mutex, and checks whether the thread is already terminated. If not, it sets the thread status to "abort requested", and waits on the condition variable until the status becomes "terminated". If desired, you could use a timed wait to allow the cancellation request to time out. After successfully canceling the thread, it may be possible to clean up the mutex, cv, and shared variable.
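Sketched with raw pthreads rather than TinyCThread (all names and details below are mine), the per-thread status, mutex, and condition variable might look roughly like this:
#include <pthread.h>
#include <stdbool.h>

enum task_status { RUNNING, ABORT_REQUESTED, TERMINATED };

struct task {
    pthread_mutex_t  lock;
    pthread_cond_t   cv;
    enum task_status status;   /* initialised to RUNNING before starting */
};

static void *task_main(void *arg)
{
    struct task *t = arg;
    pthread_detach(pthread_self());            /* self-detach */
    for (;;) {
        /* ... do one slice of work ... */
        pthread_mutex_lock(&t->lock);
        bool abort_requested = (t->status == ABORT_REQUESTED);
        pthread_mutex_unlock(&t->lock);
        if (abort_requested /* || normal end of work */)
            break;
    }
    /* perform any needed cleanup, then announce termination */
    pthread_mutex_lock(&t->lock);
    t->status = TERMINATED;
    pthread_cond_broadcast(&t->cv);
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

/* Synchronous cancel: request the abort and wait until the thread is done.
   A timed wait could be used here to let the cancellation request time out. */
static void task_cancel(struct task *t)
{
    pthread_mutex_lock(&t->lock);
    if (t->status != TERMINATED) {
        t->status = ABORT_REQUESTED;
        while (t->status != TERMINATED)
            pthread_cond_wait(&t->cv, &t->lock);
    }
    pthread_mutex_unlock(&t->lock);
}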
I note that all of that hinges on my interpretation of your request, and in particular, on the prospect that what you're after is aborting / canceling threads. None of the alternatives you floated seem to address that; for the most part they abandon the unwanted thread, which does not serve your expressed interest in preventing it from making unwanted changes to shared state.
It's not clear to me what you want, but you can use a condition variable to implement basically arbitrary joining semantics for threads. The POSIX Rationale contains an example of this, showing how to implement pthread_join with a timeout (search for timed_thread).

What are the possible threats that a waiting pthread_mutex might encounter?

If a pthread has locked a shared resource, is there any threat that a thread waiting on the pthread_mutex might encounter?
Something like a limit on the number of parallel pthreads, a time limit, an event, ...
As you can see in the specification, for example here, pthread_mutex_lock() has an int return value. Apart from the trivial/obvious error causes, such as "invalid argument" etc., there is one which can actually be considered a "threat", especially a threat to people who do not check return values.
This threat is the return value EAGAIN, which, if not caught properly, can cause your program to become faulty, accessing the resource the mutex is supposed to protect even though it did not acquire the mutex. EAGAIN can happen, for example, if the process receives a System V "signal" and this thread is the one affected by it.
In general, using Unix System V constructs (such as signals) alongside Posix threads is at least dangerous. In Unix System V, threads did not exist, and it was clear that the single main thread of a process was "interrupted" and used to handle the signal (using a stack switch to the signal stack). Any kernel-side blocking of the main thread got interrupted: the blocking function returned with EAGAIN and had to re-issue its call after handling the signal.
Hence, unfortunately the only fool-proof way of coding on Posix/Unix systems involves an abundance of while loops around anything which might block.
while( EAGAIN == pthread_mutex_lock(...) );
Not doing that would mean that your code can only be used in applications which clearly exert full control over signal behavior. Such as disabling all signals or using other means to ensure that the thread executing this code will not be affected.
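A small sketch of such defensive checking (the function and variable names are mine; as the edit below notes, current documentation ties EAGAIN to the recursive-lock limit rather than to signals):
#include <pthread.h>
#include <errno.h>
#include <stdio.h>

/* Never touch the protected resource unless the lock call returned 0. */
static int locked_increment(pthread_mutex_t *m, int *shared_counter)
{
    int ret;
    while ((ret = pthread_mutex_lock(m)) == EAGAIN)
        ;                            /* retry, as in the loop above */
    if (ret != 0) {
        fprintf(stderr, "pthread_mutex_lock failed: %d\n", ret);
        return ret;                  /* EINVAL, EDEADLK, ...: do NOT proceed */
    }
    ++*shared_counter;               /* only reached with the mutex held */
    return pthread_mutex_unlock(m);
}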
Apart from this, mutexes are system resources (kernel objects) and the amount available is not infinite, though that is not usually something to worry about. I hope other answers will elaborate on such limits.
EDIT It seems the documentation has changed in the past few years. It now states that EAGAIN relates to the limit on recursive locks and that EINTR shall not happen. In the past, at least, there were systems/documentation that conformed with my explanation above.
Also new (at least to me):
If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread shall resume waiting for the mutex as if it was not interrupted.
Well, maybe they learned something since I last was forced to work with such systems.

Is waiting for an event that will never trigger a deadlock?

A deadlock normally means that thread (or process) A is waiting for thread B, and at the same time thread B is waiting for thread A.
Currently I have encountered a similar situation in our application. Thread A is waiting for an event to be set by thread B. However, thread B is not waiting for thread A; it just won't set the event (no matter for what reason). I am wondering whether this situation can also be called a "deadlock", or is there another term for this?
I'd call it a bug or bad design. But it is not deadlock if one thread is still running.
Strictly speaking, no that's not deadlock, which is what you initially said (except that in general there could be a whole cycle of threads each waiting for the next one's lock: A->B->...->Z->A).
I think you could call it resource starvation, but that's quite a general term that also covers deadlock.
I would call it starvation (the resource being the CPU), not a deadlock.
Yes - I would call this a deadlock, too.
However, only one thread (Thread A) is affected by it, not the entire application.
Here is my point of view:
A deadlock is a situation where the global state of the program does not progress anymore.
If A is blocked but the program can still terminate because B may find a solution, it is not a deadlock.
