Linux, pthreads -- send stop condition - linux

I'm writing an application using pthreads and C under Linux. The main function starts a bunch of threads (up to 20). Then, under some criterion, I need to stop all threads at once. Each thread runs a conditional loop, like:
while (state) {....}
So I need to change state to false for each thread. (I think that at the moment a single state shared by all threads would be enough, but maybe in the future I'll have to stop each thread separately.)
So, what's the best way to do it? I could use a global variable for the state and protect it with a mutex. Each thread would then have to lock, read, and unlock on every loop iteration just to read its value, and I think that's a fairly time-consuming procedure. Do you have any other ideas how to implement it?

# man pthread_cancel
allows you to send a cancellation request to a thread.
# man pthread_cleanup_push
allows you to register a cleanup routine that runs when the thread is cancelled.
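A minimal sketch of how those two calls fit together, assuming deferred cancellation (the default) and a worker loop that reaches a cancellation point regularly. The names worker, on_cancel, cleanups_run and cancel_all_demo are made up for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_THREADS 20

static atomic_int cleanups_run;   /* hypothetical counter, bumped by each cleanup handler */

/* Runs when the thread acts on a cancellation request. */
static void on_cancel(void *arg)
{
    (void)arg;
    atomic_fetch_add(&cleanups_run, 1);   /* release per-thread resources here */
}

static void *worker(void *arg)
{
    struct timespec pause = { 0, 1000000 };   /* 1 ms */
    (void)arg;
    pthread_cleanup_push(on_cancel, NULL);
    for (;;) {
        /* ... one unit of work ... */
        nanosleep(&pause, NULL);   /* nanosleep() is a cancellation point */
    }
    pthread_cleanup_pop(0);        /* never reached; push and pop must be paired */
    return NULL;
}

/* Starts n workers, cancels all of them, and returns how many ended by cancellation. */
int cancel_all_demo(int n)
{
    pthread_t tid[MAX_THREADS];
    int started = 0, cancelled = 0;

    if (n > MAX_THREADS)
        n = MAX_THREADS;
    for (int i = 0; i < n; i++)
        if (pthread_create(&tid[started], NULL, worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_cancel(tid[i]);    /* only requests cancellation */
    for (int i = 0; i < started; i++) {
        void *res;
        if (pthread_join(tid[i], &res) == 0 && res == PTHREAD_CANCELED)
            cancelled++;
    }
    return cancelled;
}
```

Calling cancel_all_demo(4) from main() should report four cancelled threads. A loop that never reaches a cancellation point would need an explicit pthread_testcancel() call instead.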

Related

Ending a thread that might be joined or dereferenced

I'm having a problem deciding on what to do in this situation, I want to have a detached thread, but still be able to join it in case I want to abort it early, presumably before starting a new instance of it, to make sure I don't have the thread still accessing things when it shouldn't.
This means I shouldn't detach the thread right after calling it, so then I have a few options:
Self-detach the thread when it reaches the end of its execution. But wouldn't this cause problems if I try to join it from the main thread? This would be my preferred solution if the problem of trying to join it after it has self-detached could be solved. I could dereference the thread handle that the main thread has access to from the self-detaching thread before it self-detaches. However, if the main thread tries to join right before the handle is dereferenced and the thread self-detaches, this could cause problems. So I'd have to protect the dereferencing in the thread with a mutex and, in the main thread, check under that same mutex whether I should join (I don't know how; I might need to create a variable to indicate this), which complicates things. Somehow I have a feeling that this isn't the right way to do it.
Leave the thread hanging until eventually I join it, which could take a long time to happen, depending on how I organise things it could be not before I get rid of what it made (e.g. joining the thread right before freeing an image that was loaded/processed by the thread when I don't need it anymore)
Have the main thread poll periodically to know when the thread has done its job, then join it (or detach it actually) and indicate not to try joining it again?
Or should I just call pthread_exit() from the thread, but then what if I try to join it?
If I sound a bit confused it's because I am. I'm writing in C99 using TinyCThread, a simple wrapper to pthread and Win32 API threading. I'm not even sure how to dereference my thread handles, on Windows the thread handle is HANDLE, and setting a handle to NULL seems to do it, I'm not sure that's the right way to do it with the pthread_t type.
Epilogue: Based on John Bollinger's answer I chose to go with detaching the thread, putting most of that thread's code in a mutex, this way if any other thread wants to block until the thread is practically done it can use that mutex.
The price of using an abstraction layer such as TinyCThreads is that you can rely only on the defined characteristics of the abstraction. Both Windows and POSIX provide features and details that are not necessarily reflected by TinyCThreads. On the other hand, this may force you to rely on a firmer foundation than you might otherwise hack together with the help of implementation-specific features.
Anyway, you say,
I want to have a detached thread, but still be able to join it in case I want to abort it early,
but that's inconsistent. Once you detach a thread, you cannot join it. I suspect you meant something more like, "I want a thread that I can join as long as it is running, but that I don't have to join when it terminates." That's at least consistent, but it focuses on mechanism.
What I think you actually want would be described better as a thread that you can cancel synchronously as long as it is running, but that you otherwise don't need to join when it terminates. I note, however, that the whole idea presupposes a way to make the thread terminate early, and it does not appear that TinyCThread provides any built-in facility for that. It will also require a mechanism to determine whether a given thread is still alive, and TinyCThread does not provide that, either.
First, then, you need some additional per-thread shared state that tracks thread status (running / abort requested / terminated). Because the state is shared, you'll need a mutex to protect it, and that will probably need to be per-thread, too. Furthermore, in order to enable one thread (e.g. the main one) to wait for that state to change when it cancels a thread, it will need a per-thread condition variable.
With that in place, the new thread can self-detach, but it must periodically check whether an abort has been requested. When the thread ends its work, whether because it discovers an abort has been requested or because it reaches the normal end of its work, it performs any needed cleanup, sets the status to "terminated", broadcasts to the CV, and exits.
Any thread that wants to cancel another locks the associated mutex, and checks whether the thread is already terminated. If not, it sets the thread status to "abort requested", and waits on the condition variable until the status becomes "terminated". If desired, you could use a timed wait to allow the cancellation request to time out. After successfully canceling the thread, it may be possible to clean up the mutex, cv, and shared variable.
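The scheme described above can be sketched with plain pthreads as follows. This is an illustration under assumptions, not TinyCThread code: struct task, task_main, task_cancel and cancel_demo are hypothetical names, and the shared state is static so it outlives the detached thread.

```c
#include <pthread.h>
#include <sched.h>

enum status { RUNNING, ABORT_REQUESTED, TERMINATED };

/* Per-thread shared state: status plus the mutex and CV that guard it. */
struct task {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    enum status     status;
};

static void *task_main(void *arg)
{
    struct task *t = arg;
    for (;;) {
        pthread_mutex_lock(&t->lock);
        int abort_requested = (t->status == ABORT_REQUESTED);
        pthread_mutex_unlock(&t->lock);
        if (abort_requested)
            break;
        /* ... one bounded slice of work ... */
        sched_yield();
    }
    /* Clean up, then publish termination and wake any waiter. */
    pthread_mutex_lock(&t->lock);
    t->status = TERMINATED;
    pthread_cond_broadcast(&t->changed);
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

/* Requests an abort (if still needed) and blocks until TERMINATED. */
static void task_cancel(struct task *t)
{
    pthread_mutex_lock(&t->lock);
    if (t->status != TERMINATED) {
        t->status = ABORT_REQUESTED;
        while (t->status != TERMINATED)
            pthread_cond_wait(&t->changed, &t->lock);
    }
    pthread_mutex_unlock(&t->lock);
}

/* Returns 1 if the detached task was cancelled synchronously, -1 on error. */
int cancel_demo(void)
{
    static struct task t = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, RUNNING
    };
    pthread_t tid;

    pthread_mutex_lock(&t.lock);
    t.status = RUNNING;            /* reset in case the demo runs twice */
    pthread_mutex_unlock(&t.lock);

    if (pthread_create(&tid, NULL, task_main, &t) != 0)
        return -1;
    pthread_detach(tid);           /* never joined; status tracks its lifetime */
    task_cancel(&t);
    return t.status == TERMINATED;
}
```

A timed variant would replace pthread_cond_wait with pthread_cond_timedwait and give up when the deadline passes.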
I note that all of that hinges on my interpretation of your request, and in particular, on the prospect that what you're after is aborting / canceling threads. None of the alternatives you floated seem to address that; for the most part they abandon the unwanted thread, which does not serve your expressed interest in preventing it from making unwanted changes to shared state.
It's not clear to me what you want, but you can use a condition variable to implement basically arbitrary joining semantics for threads. The POSIX Rationale contains an example of this, showing how to implement pthread_join with a timeout (search for timed_thread).

Confused about threads

I'm studying threads in C and I have this theoretical question in mind that is driving me crazy. Assume the following code:
1) void main() {
2) createThread(...); // create a new thread that does "something"
3) }
After line 2 is executed, two paths of execution are created. However I believe that immediately after line 2 is executed then it doesn't even matter what the new thread does, which was created at line 2, because the original thread that executed line 2 will end the entire program at its next instruction. Am I wrong? is there any chance the original thread gets suspended somehow and the new thread get its chance to do something (assume the code as is, no sync between threads or join operations are performed)
It can work out either way. If you have more than one core, the new thread might get its own core. Even if you don't, the scheduler might give the new thread priority over the existing one. The original thread might exhaust its timeslice right after it creates a new thread.
So that code creates a race condition -- one thread is trying to do work, another thread is trying to terminate the process. Which one wins will depend on the threading implementation, the hardware, and perhaps even some random chance.
If main() returns before the spawned threads finish, all those threads are terminated, because returning from main() ends the whole process.
Calling pthread_exit() at the end of main() terminates only the main thread; the process stays alive until the threads it created complete execution.
You can learn more about this here: https://computing.llnl.gov/tutorials/pthreads/
Assuming you are using POSIX pthreads (not clear from your example) then you are right. If you don't want that then indeed pthread_exit from main will mean the program will continue to run until all the threads finish. The "main thread" is special in this regard, as its exit normally causes all threads to terminate.
More typically, you'll do something useful in the main thread after a new thread has been forked. Otherwise, what's the point? So you'll do your own processing, wait on some events, etc. If you want main (or any other thread) to wait for a thread to complete before proceeding, you can call pthread_join() with the handle of the thread of interest.
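As a small illustration of the pthread_join() approach, assuming POSIX threads (the question's createThread is not a pthreads call), the following sketch waits for the new thread before carrying on. The names do_something and run_and_wait are invented for the example.

```c
#include <pthread.h>

/* Stands in for the "something" the new thread does. */
static void *do_something(void *arg)
{
    int *result = arg;
    *result = 42;
    return NULL;
}

/* Returns the value written by the thread, or -1 on error. */
int run_and_wait(void)
{
    pthread_t tid;
    int result = 0;

    if (pthread_create(&tid, NULL, do_something, &result) != 0)
        return -1;

    /* ... the creating thread could do its own work here ... */

    /* Without this join, a return from main() could end the process
       before do_something() has had a chance to run. */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return result;
}
```

Once pthread_join() has returned, the thread has finished, so reading result is safe without further locking.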
All of this may be beside the point, however, since you are not explicitly using POSIX threads in your example, so I don't know whether that's pseudo-code for the purpose of example or literal code. In Windows, CreateThread has different semantics from POSIX pthreads. However, you didn't use that capitalization for the call in your example, so I don't know if that's what you intended either. Personally I use the pthreads-win32 library even on Windows.

Advantages of using condition variables over mutex

I was wondering what is the performance benefit of using condition variables over mutex locks in pthreads.
What I found is : "Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section), to check if the condition is met. This can be very resource consuming since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling." (https://computing.llnl.gov/tutorials/pthreads)
But it also seems that mutex calls are blocking (unlike spin-locks). Hence if a thread (T1) fails to get a lock because some other thread (T2) has the lock, T1 is put to sleep by the OS, and is woken up only when T2 releases the lock and the OS gives T1 the lock. The thread T1 does not really poll to get the lock. From this description, it seems that there is no performance benefit of using condition variables. In either case, there is no polling involved. The OS anyway provides the benefit that the condition-variable paradigm can provide.
Can you please explain what actually happens.
A condition variable allows a thread to be signaled when something of interest to that thread occurs.
By itself, a mutex doesn't do this.
If you just need mutual exclusion, then condition variables don't do anything for you. However, if you need to know when something happens, then condition variables can help.
For example, if you have a queue of items to work on, you'll have a mutex to ensure the queue's internals are consistent when accessed by the various producer and consumer threads. However, when the queue is empty, how will a consumer thread know when something is in there for it to work on? Without something like a condition variable it would need to poll the queue, taking and releasing the mutex on each poll (otherwise a producer thread could never put something on the queue).
Using a condition variable lets the consumer find that when the queue is empty it can just wait on the condition variable indicating that the queue has had something put into it. No polling - that thread does nothing until a producer puts something in the queue, then signals the condition that the queue has a new item.
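A minimal sketch of that queue, assuming a single producer, a single consumer, and a fixed capacity that the demo never exceeds (so there is no matching "not full" condition). All names here are illustrative.

```c
#include <pthread.h>

#define QCAP 16

struct queue {
    pthread_mutex_t lock;        /* keeps the queue's internals consistent */
    pthread_cond_t  not_empty;   /* signaled when an item is added */
    int items[QCAP];
    int head, count;
};

static struct queue q = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, {0}, 0, 0
};

/* Producer side. Assumes the queue is never full in this sketch. */
static void queue_put(int v)
{
    pthread_mutex_lock(&q.lock);
    q.items[(q.head + q.count) % QCAP] = v;
    q.count++;
    pthread_cond_signal(&q.not_empty);
    pthread_mutex_unlock(&q.lock);
}

/* Consumer side: sleeps on the condition variable instead of polling. */
static int queue_get(void)
{
    pthread_mutex_lock(&q.lock);
    while (q.count == 0)                          /* loop guards against spurious wakeups */
        pthread_cond_wait(&q.not_empty, &q.lock); /* releases the lock while waiting */
    int v = q.items[q.head];
    q.head = (q.head + 1) % QCAP;
    q.count--;
    pthread_mutex_unlock(&q.lock);
    return v;
}

static void *consumer(void *arg)
{
    long *sum = arg;
    for (;;) {
        int v = queue_get();
        if (v < 0)               /* a negative value is the stop sentinel */
            break;
        *sum += v;
    }
    return NULL;
}

/* Sends 1..10 through the queue and returns the consumer's total. */
long queue_demo(void)
{
    pthread_t tid;
    long sum = 0;

    if (pthread_create(&tid, NULL, consumer, &sum) != 0)
        return -1;
    for (int i = 1; i <= 10; i++)
        queue_put(i);
    queue_put(-1);
    pthread_join(tid, NULL);
    return sum;
}
```

The while loop around pthread_cond_wait matters: the wait can return without a matching signal, so the condition is rechecked each time.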
You're looking for too much overlap in two separate but related things: a mutex and a condition variable.
A common implementation approach for a mutex is to use a flag and a queue. The flag indicates whether the mutex is held by anyone (a single-count semaphore would work too), and the queue tracks which threads are in line waiting to acquire the mutex exclusively.
A condition variable is then implemented as another queue bolted onto that mutex. Threads that got in line to wait to acquire the mutex can—usually once they have acquired it—volunteer to get out of the front of the line and get into the condition queue instead. At this point, you have two separate sets of waiters:
Those waiting to acquire the mutex exclusively
Those waiting for the condition variable to be signaled
When a thread holding the mutex exclusively signals the condition variable, for which we'll assume for now that it's a singular signal (unleashing no more than one waiting thread) and not a broadcast (unleashing all the waiting threads), the first thread in the condition variable queue gets shunted back over into the front (usually) of the mutex queue. Once the thread currently holding the mutex—usually the thread that signaled the condition variable—relinquishes the mutex, the next thread in the mutex queue can acquire it. That next thread in line will have been the one that was at the head of the condition variable queue.
There are many complicated details that come into play, but this sketch should give you a feel for the structures and operations in play.
If you are looking for performance, then start reading about "non-blocking / non-locking" thread synchronization algorithms. They are based on atomic operations, which gcc is kind enough to provide; look up gcc atomic operations. Our tests showed we could increment a global value from multiple threads using atomic operations orders of magnitude faster than locking with a mutex. Here is some sample code that shows how to add items to and remove items from a linked list from multiple threads at the same time without locking.
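The sample code linked from this answer is not reproduced here. As a separate and much simpler illustration of the atomic increment it mentions, the sketch below uses C11 <stdatomic.h> rather than the gcc-specific builtins, which is an assumption about the toolchain. It makes no claim about relative speed.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITER    100000

static atomic_long counter;   /* shared global value, no mutex */

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++)
        atomic_fetch_add(&counter, 1);   /* indivisible read-modify-write */
    return NULL;
}

/* Returns the final counter value after all threads have finished. */
long atomic_demo(void)
{
    pthread_t tid[NTHREADS];
    int started = 0;

    atomic_store(&counter, 0);
    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&tid[started], NULL, bump, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&counter);
}
```

If all four threads start, the result is exactly NTHREADS * NITER, which a plain unsynchronized long would not guarantee.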
For sleeping and waking threads, signals are much faster than conditions. You use pthread_kill to send the signal, and sigwait to sleep the thread. We tested this too with the same kind of performance benefits. Here is some example code.

explicit joining of python threads?

I need to start some threads in a python program. The threads perform a background task which might take a long time, so I don't want to block the main thread waiting on the task to happen.
Python provides the ability to 'reap' threads using Thread.join() and Thread.isAlive(). But I don't actually care about finding out when the thread has finished. I'm content to start up the thread, let it do its thing and never worry about it again.
The question is, do I need to keep references around to the Thread objects that I start so that I can later join() them? Or can I just let the reference to the Thread object go out of scope and not worry about it? Is there a 'right' thing to do in this case?
You don't have to explicitly join threads -- just make sure they're not "daemonized" (leave their daemon attribute to the default, False) so they'll keep the process alive until they're all done (if you make your threads daemons, then you must make sure the main thread does not terminate until all relevant threads are done, or else the threads will be killed by the OS).
I think the right thing is the simplest one: forget about your "background threads", just make them non-daemons (which is after all their default state).

pthreads - how to parallelize a job

I need to parallelize a simple password cracker for use on an n-processor system. My idea is to create n threads and to feed them more work as they finish.
What is the best way to know when a thread has finished? A mutex? Isn't it expensive to check this mutex constantly while other threads are running?
You can have a simple queue structure - use any data structure you like - and then just use a mutex when you add/remove items from it.
Provided your threads grab the work they need to do in big enough "chunks", then there will be very little contention on the mutex, so very little overhead.
For example, if each thread was to grab approximately 1 second of work at a time and work independently for 1 second, then there would be very few operations on the mutex.
The threads could exit when they had no more work; the main thread could then wait using pthread_join.
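A rough sketch of this chunked approach, assuming the work can be expressed as a range of candidate numbers. The sizes, the names, and the stand-in for "test a candidate" are all hypothetical.

```c
#include <pthread.h>

#define NWORKERS 4
#define TOTAL    1000000L   /* size of the hypothetical candidate space */
#define CHUNK    50000L     /* work grabbed per mutex acquisition */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long next_start;     /* first candidate not yet handed out */
static long checked;        /* total candidates processed */

/* Hands out the next chunk [*lo, *hi); returns 0 when no work is left. */
static int grab_chunk(long *lo, long *hi)
{
    pthread_mutex_lock(&lock);
    *lo = next_start;
    *hi = (*lo + CHUNK < TOTAL) ? *lo + CHUNK : TOTAL;
    next_start = *hi;
    pthread_mutex_unlock(&lock);
    return *lo < *hi;
}

static void *worker(void *arg)
{
    long lo, hi, local = 0;
    (void)arg;
    while (grab_chunk(&lo, &hi))
        for (long c = lo; c < hi; c++)
            local++;        /* stands in for testing candidate c */
    pthread_mutex_lock(&lock);
    checked += local;       /* one short critical section per thread */
    pthread_mutex_unlock(&lock);
    return NULL;            /* exit when there is no more work */
}

/* Returns the number of candidates processed by all workers. */
long crack_demo(void)
{
    pthread_t tid[NWORKERS];
    int started = 0;

    next_start = 0;
    checked = 0;
    for (int i = 0; i < NWORKERS; i++)
        if (pthread_create(&tid[started], NULL, worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);   /* main thread waits here */
    return checked;
}
```

With these sizes the mutex is taken only a couple of dozen times in total, which is why contention stays low when chunks are large.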
Use message queues between the threads :-
Master -> Process (saying go with this).
Process -> Master (saying I'm done - give me more, or, I've found the result!)
Using this, the thread only closes down when the system does - otherwise it's either processing data or waiting on a message queue.
This way, the MCP (I've always wanted to say that!) simply processes messages and hands jobs out to threads that are waiting for more work.
This may be more efficient than creating and destroying threads all the time.
Normally you use a "condition variable" for this kind of thing where you want to wait for an asynchronous job to finish.
Condition variables are basically mutex-protected, simple signals. Pthreads has condition variables (see e.g. the pthread_cond_init(...) function).
