explicit joining of python threads? - multithreading

I need to start some threads in a python program. The threads perform a background task which might take a long time, so I don't want to block the main thread waiting on the task to happen.
Python provides the ability to 'reap' threads using Thread.join() and Thread.isAlive(). But I don't actually care about finding out when the thread has finished. I'm content to start up the thread, let it do its thing and never worry about it again.
The question is, do I need to keep references around to the Thread objects that I start so that I can later join() them? Or can I just let the reference to the Thread object go out of scope and not worry about it? Is there a 'right' thing to do in this case?

You don't have to join threads explicitly. Just make sure they're not "daemonized" (leave their daemon attribute at its default, False), so that they keep the process alive until they're all done. If you do make your threads daemons, you must make sure the main thread does not terminate until all relevant threads are done, or else the threads will be killed when the process exits.
I think the right thing is the simplest one: forget about your "background threads", just make them non-daemons (which is after all their default state).
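A minimal sketch of that fire-and-forget approach, assuming the thread is started from the main thread (so daemon really does default to False). The names background_task, fire_and_forget and results are made up for illustration:

```python
import threading
import time

results = []  # hypothetical sink so the effect of the background work is visible


def background_task(label):
    # Stand-in for a long-running job.
    time.sleep(0.1)
    results.append(label)


def fire_and_forget(label):
    # No reference to the Thread object is kept and join() is never called.
    # The thread is a non-daemon thread (the default when started from the
    # main thread), so the interpreter waits for it before exiting.
    threading.Thread(target=background_task, args=(label,)).start()


fire_and_forget("job-1")
# The main thread carries on immediately; nothing blocks here.
```

The threading module keeps its own reference to every running thread, so letting your variable go out of scope does not stop the thread or cause it to be collected early.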

Related

Ending a thread that might be joined or dereferenced

I'm having trouble deciding what to do in this situation. I want to have a detached thread, but still be able to join it in case I want to abort it early, presumably before starting a new instance of it, to make sure the thread isn't still accessing things when it shouldn't.
This means I shouldn't detach the thread right after calling it, so then I have a few options:
Self-detach the thread when it's reaching the end of its execution. But wouldn't this cause problems if I try to join it from the main thread? This would be my preferred solution if the problem of joining it after it has self-detached could be solved. The self-detaching thread could dereference the thread handle that the main thread has access to before detaching itself. However, if the main thread tries to join right before the handle is dereferenced and the thread self-detaches, this could cause problems. So I'd have to protect the dereferencing in the thread with a mutex, and use the same mutex in the main thread when checking whether I should join (I don't know how; I might need to create a variable to indicate this), which complicates things. Somehow I have a feeling that this isn't the right way to do it.
Leave the thread hanging until I eventually join it, which could take a long time. Depending on how I organise things, that might not happen until I get rid of what it made (e.g. joining the thread right before freeing an image that was loaded/processed by the thread, once I don't need it anymore).
Have the main thread poll periodically to know when the thread has done its job, then join it (or detach it actually) and indicate not to try joining it again?
Or should I just call pthread_exit() from the thread, but then what if I try to join it?
If I sound a bit confused, it's because I am. I'm writing in C99 using TinyCThread, a simple wrapper around pthreads and the Win32 threading API. I'm not even sure how to dereference my thread handles: on Windows the thread handle is a HANDLE, and setting it to NULL seems to do it, but I'm not sure that's the right way to do it with the pthread_t type.
Epilogue: Based on John Bollinger's answer, I chose to detach the thread and put most of that thread's code under a mutex. This way, any other thread that wants to block until the thread is practically done can use that mutex.
The price of using an abstraction layer such as TinyCThread is that you can rely only on the defined characteristics of the abstraction. Both Windows and POSIX provide features and details that are not necessarily reflected by TinyCThread. On the other hand, this may force you to rely on a firmer foundation than you might otherwise hack together with the help of implementation-specific features.
Anyway, you say,
I want to have a detached thread, but still be able to join it in case I want to abort it early,
but that's inconsistent. Once you detach a thread, you cannot join it. I suspect you meant something more like, "I want a thread that I can join as long as it is running, but that I don't have to join when it terminates." That's at least consistent, but it focuses on mechanism.
What I think you actually want would be described better as a thread that you can cancel synchronously as long as it is running, but that you otherwise don't need to join when it terminates. I note, however, that the whole idea presupposes a way to make the thread terminate early, and it does not appear that TinyCThread provides any built-in facility for that. It will also require a mechanism to determine whether a given thread is still alive, and TinyCThread does not provide that, either.
First, then, you need some additional per-thread shared state that tracks thread status (running / abort requested / terminated). Because the state is shared, you'll need a mutex to protect it, and that will probably need to be per-thread, too. Furthermore, in order to enable one thread (e.g. the main one) to wait for that state to change when it cancels a thread, it will need a per-thread condition variable.
With that in place, the new thread can self-detach, but it must periodically check whether an abort has been requested. When the thread ends its work, whether because it discovers an abort has been requested or because it reaches the normal end of its work, it performs any needed cleanup, sets the status to "terminated", broadcasts to the CV, and exits.
Any thread that wants to cancel another locks the associated mutex, and checks whether the thread is already terminated. If not, it sets the thread status to "abort requested", and waits on the condition variable until the status becomes "terminated". If desired, you could use a timed wait to allow the cancellation request to time out. After successfully canceling the thread, it may be possible to clean up the mutex, cv, and shared variable.
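The scheme described above (status variable, mutex, condition variable, periodic abort check) can be sketched roughly as follows. This is Python rather than C99/TinyCThread, so it only illustrates the protocol: Python has no notion of detaching, so "detached" here just means nobody ever joins the thread. The class name CancellableWorker and its members are invented for the example:

```python
import threading
import time

RUNNING, ABORT_REQUESTED, TERMINATED = "running", "abort requested", "terminated"


class CancellableWorker:
    """Hypothetical helper: per-thread status guarded by a mutex plus CV."""

    def __init__(self, steps):
        self._cond = threading.Condition()  # bundles the mutex and the CV
        self._status = RUNNING
        self._steps = steps
        self.completed_steps = 0
        # The thread is never joined; it plays the role of the detached thread.
        threading.Thread(target=self._run).start()

    def _run(self):
        try:
            for _ in range(self._steps):
                with self._cond:
                    # Periodic check for an abort request.
                    if self._status == ABORT_REQUESTED:
                        break
                time.sleep(0.01)  # one unit of (simulated) work
                self.completed_steps += 1
        finally:
            # Any cleanup would go here, before announcing termination.
            with self._cond:
                self._status = TERMINATED
                self._cond.notify_all()  # broadcast to all waiters

    def cancel(self, timeout=None):
        """Request an abort and wait for termination.

        Returns True once the worker has terminated, False on timeout.
        """
        with self._cond:
            if self._status != TERMINATED:
                self._status = ABORT_REQUESTED
            return self._cond.wait_for(
                lambda: self._status == TERMINATED, timeout
            )


worker = CancellableWorker(steps=1000)
time.sleep(0.05)
stopped = worker.cancel(timeout=5)
```

Calling cancel() on a worker that has already finished returns immediately, which is the "don't have to join when it terminates" half of the requirement.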
I note that all of that hinges on my interpretation of your request, and in particular, on the prospect that what you're after is aborting / canceling threads. None of the alternatives you floated seem to address that; for the most part they abandon the unwanted thread, which does not serve your expressed interest in preventing it from making unwanted changes to shared state.
It's not clear to me what you want, but you can use a condition variable to implement basically arbitrary joining semantics for threads. The POSIX Rationale contains an example of this, showing how to implement pthread_join with a timeout (search for timed_thread).

Confused about threads

I'm studying threads in C and I have this theoretical question in mind that is driving me crazy. Assume the following code:
1) void main() {
2) createThread(...); // create a new thread that does "something"
3) }
After line 2 is executed, two paths of execution exist. However, I believe that immediately after line 2 is executed it doesn't even matter what the new thread does, because the original thread that executed line 2 will end the entire program at its next instruction. Am I wrong? Is there any chance that the original thread gets suspended somehow and the new thread gets a chance to do something? (Assume the code as is: no synchronization between threads and no join operations.)
It can work out either way. If you have more than one core, the new thread might get its own core. Even if you don't, the scheduler might give the new thread priority over the existing one. The original thread might exhaust its timeslice right after it creates a new thread.
So that code creates a race condition -- one thread is trying to do work, another thread is trying to terminate the process. Which one wins will depend on the threading implementation, the hardware, and perhaps even some random chance.
If main() returns before the spawned threads finish, all of those threads are terminated, because returning from main() ends the whole process.
Calling pthread_exit() at the end of main() instead ends only the main thread; the process stays alive until the threads it created complete execution.
You can learn more about this here: https://computing.llnl.gov/tutorials/pthreads/
Assuming you are using POSIX pthreads (not clear from your example) then you are right. If you don't want that then indeed pthread_exit from main will mean the program will continue to run until all the threads finish. The "main thread" is special in this regard, as its exit normally causes all threads to terminate.
More typically, you'll do something useful in the main thread after a new thread has been forked. Otherwise, what's the point? So you'll do your own processing, wait on some events, etc. If you want main (or any other thread) to wait for a thread to complete before proceeding, you can call pthread_join() with the handle of the thread of interest.
All of this may be beside the point, however, since you are not explicitly using POSIX threads in your example, so I don't know whether that's pseudo-code for the purpose of example or literal code. On Windows, CreateThread has different semantics from POSIX pthreads. However, you didn't use that capitalization for the call in your example, so I don't know if that's what you intended either. Personally, I use the pthreads-win32 library even on Windows.

Intervening threads that waited for too long

Is there any way in F# to detect that a currently waiting thread has been waiting too long without being contacted?
I have a case where threads must actively contact other waiting threads to pass work to them once they're finished. My solution has a bug somewhere: sometimes one or more threads wait too long, and eventually the program deadlocks because other threads never contact them.
I think that if a waiting thread could detect that it has been waiting too long, it could actively go looking for available work rather than keep waiting for other threads to pass work to it.
It's probably better to try and understand why your threads are getting stuck than just terminating them. If you can reproduce this with the Visual Studio debugger attached, you can click the Pause button and use the Threads window to see what code all threads are in.
That said, if you still have the need to do this, the solution will depend on how you're managing your threads. To monitor them from the outside, you'll need some process that has a list of threads and the ability to tell whether they're dead.
The Thread class doesn't appear to have any built-in mechanism for sharing state between the thread and its controller except for Name. You could possibly abuse Name, but I would probably use a thread-safe collection (e.g. a ConcurrentDictionary<Thread, DateTime>) to store all of the threads and the timestamp of their last communication, and pass an Action into each thread when it's started that lets it "ping" by calling the action periodically. The action would simply update the DateTime stored against that thread.
The controlling process then simply scans the dictionary periodically for anything with a timestamp that is too old, declares that thread dead, and calls Abort() on it.
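To make the ping idea concrete, here is a rough sketch in Python rather than F#. It assumes a plain dict guarded by a lock in place of ConcurrentDictionary and a closure in place of the Action; make_ping, find_stale and the two worker functions are hypothetical names. Python has no equivalent of Abort(), so the sketch only reports which threads have gone quiet:

```python
import threading
import time

last_ping = {}  # thread name -> monotonic time of its last ping
last_ping_lock = threading.Lock()


def make_ping(name):
    # Returns the callback handed to a thread when it is started.
    def ping():
        with last_ping_lock:
            last_ping[name] = time.monotonic()
    return ping


def find_stale(max_silence):
    # The controller's scan: names of threads silent for too long.
    now = time.monotonic()
    with last_ping_lock:
        return sorted(
            name for name, stamp in last_ping.items()
            if now - stamp > max_silence
        )


def healthy_worker(ping, stop):
    while not stop.is_set():
        ping()
        time.sleep(0.01)


def stuck_worker(ping, stop):
    ping()       # pings once, then goes quiet
    stop.wait()  # simulates waiting forever to be contacted


stop = threading.Event()
for name, fn in (("healthy", healthy_worker), ("stuck", stuck_worker)):
    threading.Thread(target=fn, args=(make_ping(name), stop), name=name).start()

time.sleep(0.5)
stale = find_stale(max_silence=0.2)  # expected to contain only "stuck"
stop.set()  # let both demo threads finish
```

In a real system the controller would run find_stale on a timer and decide what to do with the result, for example handing the stale thread's work to someone else.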
It's hard to give a code sample without knowing exactly how you're spawning your threads and, in more detail, what a thread "being contacted" means.

Linux, pthreads -- send stop condition

I'm writing an application using pthreads and C under Linux. The main function starts a bunch of threads (up to 20). Then, when some criterion is met, I need to stop all threads at once. Each thread is running under some conditional loop, like:
while (state) {....}
So I need to change state to false for each thread. (I think at the moment it would be enough to have one state for all threads, but maybe in the future I'll have to stop each thread separately.)
So what's the best way to do it? I could use a global state variable protected by a mutex, but then each thread would have to lock, read, and unlock every time it reads the value, and I think that's a fairly time-consuming procedure. Do you have any other ideas for how to implement it?
# man pthread_cancel
allows you to send a cancellation request to a thread.
# man pthread_cleanup_push
allows you to register a cleanup handler that runs when the thread is cancelled.
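For comparison, the shared-flag approach from the question (not pthread_cancel, which Python does not offer) can be sketched like this. threading.Event stands in for the state variable, so the workers never take an explicit lock to read it; stop_all, iterations and the worker body are made up for the example:

```python
import threading
import time

stop_all = threading.Event()  # one stop condition shared by every thread
iterations = [0] * 4          # hypothetical per-thread progress counters


def worker(index):
    # Equivalent of `while (state) {....}`: loop until told to stop.
    while not stop_all.is_set():
        iterations[index] += 1
        time.sleep(0.001)  # simulated unit of work


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()

time.sleep(0.05)
stop_all.set()  # stop every thread at once
for t in threads:
    t.join(timeout=5)
```

Stopping threads individually later on would only need one Event per thread instead of a single shared one.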

pthreads - how to parallelize a job

I need to parallelize a simple password cracker to use it on an n-processor system. My idea is to create n threads and to feed them more work as they finish.
What is the best way to know when a thread has finished? A mutex? Isn't it expensive to check this mutex constantly while other threads are running?
You can have a simple queue structure - use any data structure you like - and then just use a mutex when you add/remove items from it.
Provided your threads grab the work they need to do in big enough "chunks", then there will be very little contention on the mutex, so very little overhead.
For example, if each thread was to grab approximately 1 second of work at a time and work independently for 1 second, then there would be very few operations on the mutex.
The threads could exit when they had no more work; the main thread could then wait using pthread_join.
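A toy version of that design, sketched in Python instead of C with pthreads. The 3-letter password, the SHA-256 target and the choice of "one chunk per first letter" are all assumptions made up for the example; the point is that the lock is taken once per chunk, not once per candidate:

```python
import hashlib
import itertools
import string
import threading

ALPHABET = string.ascii_lowercase
TARGET = hashlib.sha256(b"qux").hexdigest()  # hypothetical password hash

chunks = list(ALPHABET)  # one chunk = every candidate starting with that letter
chunks_lock = threading.Lock()
found = []  # filled in by whichever worker finds the match


def take_chunk():
    # The only place the mutex is needed: removing an item from the queue.
    with chunks_lock:
        return chunks.pop() if chunks else None


def worker():
    while not found:
        first = take_chunk()
        if first is None:
            return  # no more work: the thread simply exits
        for rest in itertools.product(ALPHABET, repeat=2):
            candidate = first + "".join(rest)
            digest = hashlib.sha256(candidate.encode()).hexdigest()
            if digest == TARGET:
                found.append(candidate)
                return


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # main thread waits here, as with pthread_join
```

In CPython the global interpreter lock limits how much CPU-bound work runs in parallel, so treat this as a picture of the structure rather than a benchmark.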
Use message queues between the threads:
Master -> Process (saying go with this).
Process -> Master (saying I'm done - give me more, or, I've found the result!)
Using this, the thread only closes down when the system does - otherwise it's either processing data or waiting on a message queue.
This way, the MCP (I've always wanted to say that!) simply processes messages and hands jobs out to threads that are waiting for more work.
This may be more efficient than creating and destroying threads all the time.
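The two message directions might look something like this in Python, using queue.Queue for both. For brevity the sketch assumes a single shared job queue rather than one queue per worker, a made-up task (squaring numbers), and a None sentinel to shut the workers down at the end:

```python
import queue
import threading

jobs = queue.Queue()     # Master -> Process ("go with this")
reports = queue.Queue()  # Process -> Master ("I'm done, give me more")
SHUTDOWN = None          # sentinel telling a worker to exit
NUM_WORKERS = 3


def worker(worker_id):
    while True:
        job = jobs.get()  # blocks while there is nothing to do
        if job is SHUTDOWN:
            return
        reports.put((worker_id, job, job * job))  # hypothetical task


workers = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for w in workers:
    w.start()

pending = list(range(10))  # work still to be handed out
answers = {}

# Prime each worker with one job, then hand out more as reports arrive.
for _ in range(NUM_WORKERS):
    jobs.put(pending.pop())
outstanding = NUM_WORKERS
while outstanding:
    _, job, answer = reports.get()
    answers[job] = answer
    outstanding -= 1
    if pending:
        jobs.put(pending.pop())
        outstanding += 1

for _ in workers:
    jobs.put(SHUTDOWN)
for w in workers:
    w.join()
```

The master never polls: it sleeps inside reports.get() until some worker has something to say.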
Normally you use a "condition variable" for this kind of thing where you want to wait for an asynchronous job to finish.
Condition variables are basically mutex-protected, simple signals. Pthreads has condition variables (see e.g. the pthread_cond_init(...) function).

Resources