Why was the method java.lang.Thread.join() named like that? - multithreading

Does anybody know why the join() method of java.lang.Thread was given that name? Its javadoc is:
Waits for this thread to die.
When join is called on some thread, the calling thread waits for the other to die and then continues execution. Supposedly the calling thread will die as well, but it's still not clear why the author used this name.

It's a common name in threading - it's not like Java was the first to use it. (For example, that's what pthreads uses too.)
I guess you could imagine it like two people taking a walk - you join the other one and walk with them until you've finished, before going back to what you were doing. That sort of analogy may have been the original reason, although I agree it's not exactly intuitive.

It's named this way because you're basically stating that the calling thread of execution is going to wait to join the given thread of execution. It's also named join in POSIX and many other threading packages.
After that call to join returns (unless it was interrupted), the two threads of execution are basically running together from that point, with the calling thread able to pick up the result of the now-terminated thread (pthread_join hands back its return value; Java's join() itself returns void).

This stems from concurrent software modeling, where the flow of control splits into two concurrent threads. Later, the two threads of execution will join again.
Also waitToDie() was probably a) too long and b) too morbid.

Well... this isn't really correct, but I thought of a "waiting room" (it actually isn't a queue with a particular scheduling policy such as FIFO, HRRN or the like).
When a thread cannot go on and needs to wait for some other thread to finish, it just joins the guys (aka threads) in the waiting room to become active next...

Because you are waiting for another thread of execution (i.e. the one you're calling join on) to join (i.e. die) to the current (i.e. the calling) thread.
The calling thread does not die: it simply waits for the other thread to do so.
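To make the split-and-rejoin picture concrete, here is a minimal Java sketch. The class name JoinDemo, the method runAndJoin and the shared result field are made up for illustration. It only shows that the caller blocks in join(), stays alive afterwards, and can then read what the worker produced.

```java
class JoinDemo {
    // Shared slot the worker writes to; join() itself returns void in Java.
    static int result;

    static int runAndJoin() {
        Thread worker = new Thread(() -> result = 6 * 7);
        worker.start();              // control flow splits into two threads here
        try {
            worker.join();           // the caller blocks until the worker has died
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        // The caller is still alive here, and a completed join() guarantees
        // that it sees the worker's write to result.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runAndJoin());
    }
}
```

After join() returns there is a single path of execution again, which is the "joining" the answers above describe.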

This terminology is widely used (outside Java as well). I take it as sort of associating one thread with another in some way. I think Thread.Associate() could have been a better option, but Join() isn't bad either.

Related

Ending a thread that might be joined or dereferenced

I'm having a problem deciding what to do in this situation. I want to have a detached thread, but still be able to join it in case I want to abort it early, presumably before starting a new instance of it, to make sure the thread isn't still accessing things when it shouldn't.
This means I shouldn't detach the thread right after creating it, so I have a few options:
Self-detach the thread when it reaches the end of its execution. But wouldn't this cause problems if I try to join it from the main thread? This would be my preferred solution if the problem of trying to join it after it has self-detached could be solved. I could have the thread dereference the handle that the main thread has access to before it self-detaches. However, if the main thread tries to join right before the handle is dereferenced and the thread self-detaches, this could cause problems. So I'd have to protect the dereferencing in the thread with a mutex, and in the main thread check under that same mutex whether I should join (I don't know how; I might need to create a variable to indicate this), which complicates things. Somehow I have a feeling that this isn't the right way to do it.
Leave the thread hanging until I eventually join it, which could take a long time. Depending on how I organise things, that might not happen until I get rid of what it made (e.g. joining the thread right before freeing an image that it loaded/processed, once I no longer need the image).
Have the main thread poll periodically to find out when the thread has done its job, then join it (or detach it, actually) and record that it must not be joined again?
Or should I just call pthread_exit() from the thread, but then what if I try to join it?
If I sound a bit confused it's because I am. I'm writing in C99 using TinyCThread, a simple wrapper around pthreads and Win32 API threading. I'm not even sure how to dereference my thread handles. On Windows the thread handle is a HANDLE, and setting it to NULL seems to do it, but I'm not sure that's the right way to do it with the pthread_t type.
Epilogue: Based on John Bollinger's answer I chose to go with detaching the thread, putting most of that thread's code in a mutex, this way if any other thread wants to block until the thread is practically done it can use that mutex.
The price of using an abstraction layer such as TinyCThread is that you can rely only on the defined characteristics of the abstraction. Both Windows and POSIX provide features and details that are not necessarily reflected by TinyCThread. On the other hand, this may force you to rely on a firmer foundation than you might otherwise hack together with the help of implementation-specific features.
Anyway, you say,
I want to have a detached thread, but still be able to join it in case I want to abort it early,
but that's inconsistent. Once you detach a thread, you cannot join it. I suspect you meant something more like, "I want a thread that I can join as long as it is running, but that I don't have to join when it terminates." That's at least consistent, but it focuses on mechanism.
What I think you actually want would be described better as a thread that you can cancel synchronously as long as it is running, but that you otherwise don't need to join when it terminates. I note, however, that the whole idea presupposes a way to make the thread terminate early, and it does not appear that TinyCThread provides any built-in facility for that. It will also require a mechanism to determine whether a given thread is still alive, and TinyCThread does not provide that, either.
First, then, you need some additional per-thread shared state that tracks thread status (running / abort requested / terminated). Because the state is shared, you'll need a mutex to protect it, and that will probably need to be per-thread, too. Furthermore, in order to enable one thread (e.g. the main one) to wait for that state to change when it cancels a thread, it will need a per-thread condition variable.
With that in place, the new thread can self-detach, but it must periodically check whether an abort has been requested. When the thread ends its work, whether because it discovers an abort has been requested or because it reaches the normal end of its work, it performs any needed cleanup, sets the status to "terminated", broadcasts to the CV, and exits.
Any thread that wants to cancel another locks the associated mutex, and checks whether the thread is already terminated. If not, it sets the thread status to "abort requested", and waits on the condition variable until the status becomes "terminated". If desired, you could use a timed wait to allow the cancellation request to time out. After successfully canceling the thread, it may be possible to clean up the mutex, cv, and shared variable.
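The protocol described above (per-thread status, a mutex, a condition variable, and a canceller that waits for "terminated") can be sketched as follows. This is an illustration in Java rather than C99/TinyCThread, and every name in it (CancellableWorker, Status, start, cancel) is hypothetical. Java has no detach, so the worker is simply never joined. It is marked as a daemon only so that it cannot keep the JVM alive. The one-millisecond sleep stands in for a slice of real work, and this toy worker has no normal end, so it only stops when an abort is requested.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class CancellableWorker {
    enum Status { RUNNING, ABORT_REQUESTED, TERMINATED }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private Status status = Status.RUNNING;   // guarded by lock

    /** Starts the worker; nobody ever calls join() on it. */
    void start() {
        Thread t = new Thread(this::run);
        t.setDaemon(true);                    // do not keep the JVM alive for it
        t.start();
    }

    private void run() {
        try {
            while (!abortRequested()) {
                Thread.sleep(1);              // placeholder for one slice of work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Cleanup would go here, then publish TERMINATED and wake all waiters.
            lock.lock();
            try {
                status = Status.TERMINATED;
                changed.signalAll();
            } finally {
                lock.unlock();
            }
        }
    }

    private boolean abortRequested() {
        lock.lock();
        try {
            return status == Status.ABORT_REQUESTED;
        } finally {
            lock.unlock();
        }
    }

    /** Requests an abort and waits up to timeoutMillis; true once terminated. */
    boolean cancel(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            if (status == Status.RUNNING) {
                status = Status.ABORT_REQUESTED;
            }
            while (status != Status.TERMINATED) {
                if (nanos <= 0) {
                    return false;             // the cancellation request timed out
                }
                nanos = changed.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

A caller would do something like `worker.start(); ... if (worker.cancel(5000)) { /* safe to free what the worker used */ }`. The timed wait is the optional timeout mentioned above. A blocking wait would use await() in place of awaitNanos().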
I note that all of that hinges on my interpretation of your request, and in particular, on the prospect that what you're after is aborting / canceling threads. None of the alternatives you floated seem to address that; for the most part they abandon the unwanted thread, which does not serve your expressed interest in preventing it from making unwanted changes to shared state.
It's not clear to me what you want, but you can use a condition variable to implement basically arbitrary joining semantics for threads. The POSIX Rationale contains an example of this, showing how to implement pthread_join with a timeout (search for timed_thread).

What are the main purposes for joining pthreads in Linux/UNIX?

I'm a student and I'm going over threads right now, and despite reading TLPI very carefully, I still don't have a good understanding as to why one might join two pthreads.
From what I've gleaned, it can be used either as a way for one thread to pass a return value to another OR it can be used as a waiting mechanism between threads. That said, it's entirely possible that I've misunderstood the entire point. Would someone mind explaining it a bit for me?
Threads are mainly used for parallel processing. Joining/exiting a thread means the work/purpose of the thread is fulfilled. When the purpose is fulfilled, its resources should be freed and made available to other threads/processes. The resources could be any of the following:
Stack (as Basile Starynkevitch said)
Processor time
Opened files/Shared Memory/Any other resource locked/booked by the thread.
Joining threads can be done just to hand control over, or it might be done to transfer values as return values (as Michael Burr said).

How is ThreadPool implemented in .NET 4.0?

I recently tried to work out how the ThreadPool class works in .NET 4.0. I tried to read through the reflected code, but it seems a bit too extensive for me.
Could someone explain in simple terms how this class works, i.e.
How does it store the methods that are coming in?
Is it thread-safe, supposing multiple threads try to enqueue their methods in the thread pool?
When it reaches the limit of available threads, how does it return to execute the remaining batch waiting in the queue when one of the threads becomes free? Is there some callback mechanism for it?
Of course, in the absence of the actual implementation (or in the absence of Eric Lippert :) ) what I'm saying is only common sense:
The thread pool holds an internal (circular?) queue where the tasks are kept (hence QueueUserWorkItem).
Putting tasks in the queue is thread-safe (this is for sure, as I've used it myself in this scenario several times).
I think that each thread loops indefinitely and keeps taking tasks from the queue (in a thread-safe manner of course) automatically when it's done with the current task. If the queue is empty it will just block.
In a queue of delegates
TBH, I don't know for sure but, if it's not, it's dangerous, nearly useless and probably the worst code ever emitted by M$, (even including Windows ME). Just assume it's thread safe.
The work threads are while loops, waiting on the work request queue for a delegate, invoking one when it becomes available, then looping back round again when the delegate returns to wait on the queue for another delegate. There is no need for any callback.
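The "worker is a while loop blocked on a queue" idea can be sketched in a few lines. This is a Java analogue for illustration only, not the actual .NET ThreadPool implementation. TinyPool, enqueue and demo are made-up names, and Runnable plays the role of the delegate.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class TinyPool {
    // Thread-safe queue of pending work items (the "queue of delegates").
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    TinyPool(int workers) {
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(this::workerLoop);
            t.setDaemon(true);               // workers must not keep the JVM alive
            t.start();
        }
    }

    /** Safe to call from any number of threads at once. */
    void enqueue(Runnable task) {
        queue.add(task);
    }

    private void workerLoop() {
        try {
            while (true) {
                Runnable task = queue.take(); // blocks while the queue is empty
                try {
                    task.run();
                } catch (RuntimeException e) {
                    // Swallowed in this sketch so one failing task does not kill the worker.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Queues ten tasks on two workers and returns the sum 1 + 2 + ... + 10. */
    static int demo() {
        TinyPool pool = new TinyPool(2);
        int tasks = 10;
        AtomicInteger sum = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 1; i <= tasks; i++) {
            final int n = i;
            pool.enqueue(() -> { sum.addAndGet(n); done.countDown(); });
        }
        try {
            done.await();                     // wait until every task has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum.get();
    }
}
```

With more tasks than workers, the extra tasks just sit in the queue until a worker loops back to take(), which is why no callback is needed to resume the backlog.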
I don't know exactly, but to my mind it stores them in a collection of Task objects.
MSDN says yes, it is thread-safe.
GetMaxThreads() returns the number of threads that can execute at one time; once you reach this limit, all further work items are queued. As I understand it, you need a mechanism for knowing when a thread has finished executing. There is RegisterWaitForSingleObject(WaitHandle, WaitOrTimerCallback, Object, Int32, Boolean).

Twisted: use of multiple threads and processes together

The Twisted documentation led me to believe that it was OK to combine techniques such as reactor.spawnProcess() and threads.deferToThread() in the same application, that the reactor would handle this elegantly under the covers. Upon actually trying it, I found that my application deadlocks. Using multiple threads by themselves, or child processes by themselves, everything is fine.
Looking into the reactor source, I find that the SelectReactor.spawnProcess() method simply calls os.fork() without any consideration for multiple threads that might be running. This explains the deadlocks, because starting with the call to os.fork() you will have two processes with multiple concurrent threads running and doing who knows what with the same file descriptors.
My question for SO is, what is the best strategy for solving this problem?
What I have in mind is to subclass SelectReactor, so that it is a singleton and calls os.fork() only once, immediately when instantiated. The child process will run in the background and act as a server for the parent (using object serialization over pipes to communicate back and forth). The parent continues to run the application and may use threads as desired. Calls to spawnProcess() in the parent will be delegated to the child process, which will be guaranteed to have only one thread running and can therefore call os.fork() safely.
Has anyone done this before? Is there a faster way?
What is the best strategy for solving this problem?
File a ticket (perhaps after registering) describing the issue, preferably with a reproducible test case (for maximum accuracy). Then there can be some discussion about what the best way (or ways; different platforms may demand different solutions) to implement it might be.
The idea of immediately creating a child process to help with further child process creation has been raised before, to solve the performance issue surrounding child process reaping. If that approach now resolves two issues, it starts to look a little more attractive. One potential difficulty with this approach is that spawnProcess synchronously returns an object which supplies the child's PID and allows signals to be sent to it. This is a little more work to implement if there is an intermediate process in the way, since the PID will need to be communicated back to the main process before spawnProcess returns. A similar challenge will be supporting the childFDs argument, since it will no longer be possible to merely inherit the file descriptors in the child process.
An alternate solution (which may be somewhat more hackish, but which may also have fewer implementation challenges) might be to call sys.setcheckinterval with a very large number before calling os.fork, and then restore the original check interval in the parent process only. This should suffice to avoid any thread switching in the process until the os.execvpe takes place, destroying all the extra threads. This isn't entirely correct, since it will leave certain resources (such as mutexes and conditions) in a bad state, but your use of these with deferToThread isn't very common, so maybe that doesn't affect your case.
The advice Jean-Paul gives in his answer is good, but this should work (and does in most cases).
First, Twisted uses threads for hostname resolution as well, and I've definitely used subprocesses in Twisted processes that also make client connections. So this can work in practice.
Second, fork() does not create multiple threads in the child process. According to the standard describing fork(),
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread ...
Now, that's not to say that there are no potential multithreading issues with spawnProcess; the standard also says:
... to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called ...
and I don't think there's anything to ensure that only async-signal-safe operations are used.
So, please be more specific as to your exact problem, since it isn't a subprocess with threads being cloned.
Returning to this issue after some time, I found that if I do this:
reactor.callFromThread(reactor.spawnProcess, *spawnargs)
instead of this:
reactor.spawnProcess(*spawnargs)
then the problem goes away in my small test case. There is a remark in the Twisted documentation "Using Processes" that led me to try this: "Most code in Twisted is not thread-safe. For example, writing data to a transport from a protocol is not thread-safe."
I suspect that the other people who, as Jean-Paul mentioned, were having this problem may be making a similar mistake. The responsibility is on the application to ensure that reactor and other API calls are made within the correct thread. And apparently, with very narrow exceptions, the "correct thread" is nearly always the main reactor thread.
fork() on Linux definitely leaves the child process with only one thread.
I assume you are aware that, when using threads in Twisted, the ONLY Twisted API that threads are permitted to call is callFromThread? All other Twisted APIs must only be called from the main, reactor thread.

Origin of thread join

Nearly all programming languages that support threading have a method called join. I understand what a join does, but I would like to know the origin of the name. Wouldn't a name such as finish be more appropriate?
I think it comes from the analogy of execution paths. The program's execution path split into two separate paths when the thread was spawned, and now you want the two paths to join back together into a single path again.
Thread A and Thread B did different things and now they are going to kind of reunite because their results have to get exchanged - they will join each other, go on and eventually split up again.
As I understand/interpret it (although correct me if I'm wrong), threads of execution should all contribute towards a single overall task (if there is no interaction between threads, then they might as well be separate processes; after all, one of the main points of threading is to overcome the communication barrier between processes). Therefore it seems logical that subtasks branch off from the overall task and then re-join at a later point, rather than running into a dead end. Also, seeing as a thread is allocated some of its parent's resources when it is created, even if the thread does not return a value it should still return what it was given in the first place, thus merging or "joining" with the original thread.

Resources