Why is waiting for a thread to terminate called "joining"?

Be it Java, C#, C++, POSIX threads, and so on: all threading libraries name the method that waits for a thread to terminate "join". Why is this the case?
The wording sounds counterintuitive to me, because "joining" someone in doing some task means doing the task together with, or helping, that person. It does not mean "sleep until the other person has done the task, then wake up and go on".
So where does the concept of "joining" originate from?

Related

processes only terminate, when threads are terminated?

Processes should only terminate themselves when all their threads are terminated!
It's a question in our mock exam, and we aren't sure whether the statement is true or false.
Thanks a lot

First, I need to point out that this exam question contains an incorrect presumption. A running process always has at least one thread. The initial thread, the thread that first calls main or equivalent, isn't special; it's just like every other thread created by pthread_create or equivalent. Once all of the threads within a process have exited, the process can't do anything anymore — there's no way for it to execute even a single additional CPU instruction. In practice, the operating system will terminate the process at that point.
Second, as was pointed out in the comments on the question, the use of "should" makes your exam question ambiguous. It could be read as either "Processes only terminate when all of their threads are terminated" — as a description of how the system works. Or it could be read as "You, the programmer, should write code that ensures that your processes only terminate when all of their threads are terminated" — as a prescription for writing correct code.
If you are specifically talking about POSIX threads ("pthreads"), the answer to the descriptive question is that it depends on how each thread terminates. If all threads terminate by calling pthread_exit or by being cancelled, the process will survive until the last thread terminates, no matter which order they exit in. On the other hand, if any thread calls exit or _exit, or receives a fatal signal, that will immediately terminate the entire process, no matter how many threads are still active. (I am not 100% sure about this, but I think it doesn't matter whether any threads have been detached.)
There's an additional complication, which is that returning from a function passed to pthread_create is equivalent to calling pthread_exit for that thread, but returning from main is equivalent to calling exit. That makes the initial thread a little bit special: unless you specifically end main by calling pthread_exit, the entire process will be terminated when the initial thread exits. But technically this is not a property of the thread itself, but of the code running in that thread.
I do not know the answer to the descriptive question for threads libraries other than POSIX; in particular I don't know the answer for either Windows native threads, or for the threads library added to ISO C in its 2011 revision.
The answer to the prescriptive question is yes with exceptions. You, a programmer, should write programs that, under normal conditions, take care to end their process only when all of their threads have finished their work. (With POSIX threads, this translates to making sure that main does not return until all the other threads have been joined.) However, sometimes you have a few threads that run an infinite loop, without holding any locks or anything, and there's no good way to tell them to exit when everything else is done; as long as exiting the process out from under them won't damage any persistent state, go ahead and exit the process out from under them. (This is the intended use case for detached threads.) Also, it's OK, and often the best choice, to terminate the entire process abruptly if you encounter some kind of unrecoverable error. Those are the only exceptions I can think of off the top of my head.

QProcess, QEventLoop - of any use for parallel-processing

I wonder whether I could use QEventLoop (or QProcess?) to parallelize multiple calls to the same function with Qt. What precisely is the difference from QtConcurrent or QThread? What, more precisely, are a process and an event loop? I read that QCoreApplication must call exec() as early as possible in the main() method, so I wonder how that differs from the main thread.
Could you point me to some good references on processes and threads in Qt? I went through the official docs and these things remain unclear.
Thanks and regards.

Process and thread are not Qt-specific concepts. You can search for "process vs. thread" anywhere for that distinction to be explained. For instance: What resources are shared between threads?
Though the two concepts are related, spawning a new process is a more "heavyweight" form of parallelism than spawning a new thread within your existing process. Processes are protected from each other by default, while threads of execution within a process can read and write each other's memory directly. The protection you get from spawning processes comes at a greater run-time cost...and since independent processes can't read each other's memory, you have to share data between them using methods of inter-process communication.
Odds are that you want threads, because they're simpler to use in a case where one is writing all the code in a program. Given all the complexities in multithreaded programming, I'd suggest looking at a good book or reading some websites to start with. See: What are some good resources for learning threaded programming?
But if you want to dive in and just get a feel for how threading in Qt looks, you can spend time looking at the examples:
http://qt-project.org/doc/qt-4.8/examples-threadandconcurrent.html
QtConcurrent is an abstraction library that makes it easier to implement some kinds of parallel programming patterns. It's built on top of the QThread abstractions, and there's nothing it can do that you couldn't code yourself by writing to QThread directly. But it might make your code easier to write and less prone to errors.
As for an event loop...that is merely a generic term for how any given thread of execution in your program waits for work items to process, processes them, and can decide when it is no longer needed. If a thread's job were merely to start up, do some math, and exit...then it wouldn't need an event loop. But starting and stopping a thread takes time and churns resources. So typically threads live for longer periods of time, and have an event loop that knows how to wait for events it needs to respond to.
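To make the idea concrete, here is a minimal, language-neutral sketch of an event loop, written in Java rather than with Qt's own classes. It is not how QEventLoop is implemented; the class name EventLoopDemo, the method runLoop, and the "quit" sentinel are all made up for illustration. The worker thread blocks on a queue, handles each event as it arrives, and exits when it sees the sentinel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EventLoopDemo {
    // Hypothetical sentinel that tells the loop it is no longer needed.
    private static final String QUIT = "quit";

    // Starts a loop thread, feeds it the given events, and returns what it handled.
    static List<String> runLoop(List<String> events) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> handled = new ArrayList<>();

        Thread loop = new Thread(() -> {
            try {
                while (true) {
                    String event = queue.take();      // sleep until there is work
                    if (event.equals(QUIT)) {
                        return;                       // the loop decides to stop
                    }
                    handled.add("handled " + event);  // process one work item
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        loop.start();

        queue.addAll(events);  // post events from the calling thread
        queue.add(QUIT);
        try {
            loop.join();       // after join, the loop thread's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(runLoop(List.of("click", "timer")));
    }
}
```

The point of the pattern is that the thread is started once and then reused for many events, instead of paying the start-up cost per work item.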
If you build on top of QtConcurrent, you won't have to worry about an event loop in your worker threads because they are managed automatically in a thread pool. The word count example is pretty simple to see:
http://qt-project.org/doc/qt-4.8/qtconcurrent-wordcount-main-cpp.html

scala actors vs threads and blocking IO

As I understand it, actors are basically lightweight threads implemented on top of threads, running many actors on a small pool of shared threads.
Given that's the case, using blocking operations in an actor blocks the underlying thread. This is not a correctness problem because the actor library will spawn more threads as necessary (is that right?) but then you end up with lots and lots of threads, negating the benefit of using actors in the first place.
Given that, how do actors work when you need to do such IO operations? Are there operations which "actor-block", suspending the actor while letting the thread go on to other operations (much as blocking operations suspend the thread while letting the CPU go on to other operations), or is everything written in CPS, with chained actors? Or are actors simply not a good fit for this sort of long-running operation?
Background: I have experience writing multithreaded stuff the classic way, and understand pretty well how CPS/event loops work, but have absolutely no experience working with actors, and just want to understand, on a high level, how they fit in, before I dive into the code.

"This is not a correctness problem because the actor library will spawn more threads as necessary (is that right?)"
So far as I understand, that is not right. The actor is blocked, and sending another message to it causes that message to sit in the actor's mailbox until the actor can receive it or react to it.
In Programming in Scala (1), it explicitly states that actors should not block. If an actor needs to do something long running it should pass the work to a second actor, so that the main actor can free itself up and go read more messages from its mailbox. Once the worker has completed the work, it can signal that fact back to the main actor, which can finish doing whatever it has to do.
Since workers too have mailboxes, you will end up with several workers busily working their way through the work. If you don't have enough CPU to handle that, their queues will just get bigger and bigger. Eventually you can scale out by using remote actors. Akka might be more useful in such cases.
(1) Chapter 32.5 of Programming in Scala (Odersky, Second edition, 2010)
EDIT: I found this:
The scheduler method of the Actor trait can be overridden to return a ResizableThreadPoolScheduler, which resizes its thread pool to avoid starvation caused by actors that invoke arbitrary blocking methods.
Found it at: http://www.scala-lang.org/api/current/scala/actors/Actor.html
So, that means depending on the scheduler impl you set, perhaps the pool used to run the actors will be increased. I was wrong when I said you were wrong :-) The rest of the answer still holds true.

The way things work in Erlang is that all blocking operations should be done by sending messages, because when your actor is blocked waiting for a message it yields the thread to other actors.
So if you want to do some blocking operation like reading from a file, you should write a FileReader actor that uses a non-blocking API to read from and write to the file, and have your other actors use this actor (by sending messages to it and receiving messages from it) as an API for reading and writing the file.

Multi Threading

How can I determine which thread has been waiting the longest?
My requirement is that, for a synchronized method, when one thread finishes its work, I want to let in the thread that has been waiting for the longest time. I hope my question makes sense.

It all depends on which language and/or environment you are using. So far as I know there's no intrinsic support for this in Java: if multiple threads are waiting to enter a synchronized method, the system will pick an arbitrary one to run when entry is possible.
If instead you use Java's wait() / notify() then you control which threads are notified and so can build your own priority mechanism. For example, you could have a simple queue to which each thread adds itself before its wait(); then you just take the item at the head of the queue and notify that thread.

You should not and almost certainly do not need to do this.
The threading environment will schedule threads for you.
If the software design is such that this appears to be a problem, then the design is incorrect for a pre-emptive threading environment.
What you may want to be doing is something more like managing and prioritizing units of work, where you for example service work in the order that it arrives.
In other words, the order of work processing should not in your design depend on which thread runs, but rather, on your design of how work is handed out to threads.
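A minimal sketch of that idea in Java, assuming a single worker is enough: work items go into an executor's queue and are serviced in arrival order, so no thread ever needs to be "picked". The class and method names here (FifoWorkDemo, processInArrivalOrder) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class FifoWorkDemo {
    // Submits `items` units of work and returns the order in which they ran.
    static List<Integer> processInArrivalOrder(int items) {
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        // One worker thread backed by a FIFO queue: work runs in submission order.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        for (int i = 0; i < items; i++) {
            final int id = i;
            worker.execute(() -> processed.add(id));
        }
        worker.shutdown();  // accept no new work, finish what is queued
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processInArrivalOrder(5)); // prints [0, 1, 2, 3, 4]
    }
}
```

With a pool of several workers the items would still be dequeued in arrival order, but they could finish out of order; the single worker is what makes the completion order deterministic here.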

@djna: Java doesn't let you choose which thread to notify. If 10 threads are waiting on the same object, any one of them can be notified.
This can be done instead by using the Lock/Condition interfaces in the java.util.concurrent.locks package.
There you can associate each of these threads with its own condition, keep the conditions in a queue, and then take an item out of that queue and signal the condition that is mapped to that thread/task.
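Here is one way that could look, as a sketch rather than a definitive implementation. FifoGate, enter, leave, waiting and demo are made-up names; only ReentrantLock and Condition come from java.util.concurrent.locks. Each waiting thread gets its own Condition, the conditions sit in a queue, and leave() signals only the one at the head, so the longest-waiting thread goes next.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class FifoGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Deque<Condition> waiters = new ArrayDeque<>(); // longest waiter first
    private boolean busy = false;

    void enter() {
        lock.lock();
        try {
            if (!busy && waiters.isEmpty()) {
                busy = true;                          // uncontended: go straight in
                return;
            }
            Condition mine = lock.newCondition();     // one condition per waiting thread
            waiters.addLast(mine);
            while (busy || waiters.peekFirst() != mine) {
                mine.awaitUninterruptibly();          // loop guards against spurious wakeups
            }
            waiters.removeFirst();
            busy = true;
        } finally {
            lock.unlock();
        }
    }

    void leave() {
        lock.lock();
        try {
            busy = false;
            Condition next = waiters.peekFirst();
            if (next != null) {
                next.signal();                        // wake only the longest waiter
            }
        } finally {
            lock.unlock();
        }
    }

    int waiting() {
        lock.lock();
        try {
            return waiters.size();
        } finally {
            lock.unlock();
        }
    }

    // Queues n threads in a known order behind the gate and records the entry order.
    static List<Integer> demo(int n) {
        FifoGate gate = new FifoGate();
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        gate.enter();                                 // hold the gate so the others must queue
        for (int i = 0; i < n; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                gate.enter();
                try {
                    order.add(id);
                } finally {
                    gate.leave();
                }
            });
            t.start();
            while (gate.waiting() <= i) {
                Thread.yield();                       // wait until thread i is really queued
            }
            threads.add(t);
        }
        gate.leave();
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo(3)); // prints [0, 1, 2]
    }
}
```

If strict per-thread control is not needed, constructing a ReentrantLock with the fairness flag (new ReentrantLock(true)) is a simpler option that favors the longest-waiting thread.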

Why was the method java.lang.Thread.join() named like that?

Does anybody know why the method join() member of a java.lang.Thread was named like that? Its javadoc is:
Waits for this thread to die.
When join is called on some thread, the calling thread waits for the other thread to die and then continues execution. Supposedly the calling thread will die as well, but it's still not clear why the author used this name.

It's a common name in threading - it's not like Java was the first to use it. (For example, that's what pthreads uses too.)
I guess you could imagine it like two people taking a walk - you join the other one and walk with them until you've finished, before going back to what you were doing. That sort of analogy may have been the original reason, although I agree it's not exactly intuitive.

It's named this way because you're basically stating that the calling thread of execution is going to wait to join the given state of execution. It's also named join in POSIX and many other threading packages.
After that call to join returns (unless it was interrupted), the two threads of execution are basically running together from that point (with that thread getting the return value of the now-terminated thread).

This stems from concurrent software modeling, where the flow of control splits into two concurrent threads. Later, the two threads of execution will join again.
Also waitToDie() was probably a) too long and b) too morbid.
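The split-and-rejoin picture can be sketched in a few lines of Java. ForkJoinDemo and sumInParallel are made-up names; the only library calls are Thread.start() and Thread.join(). One flow of control forks into two at start(), each sums half of an array, and the two flows merge back into one at join().

```java
class ForkJoinDemo {
    // Sums the array using the calling thread plus one forked worker thread.
    static long sumInParallel(int[] data) {
        long[] partial = new long[2];
        int mid = data.length / 2;

        Thread worker = new Thread(() -> {
            for (int i = mid; i < data.length; i++) {
                partial[1] += data[i];       // second half, on the forked thread
            }
        });
        worker.start();                      // fork: control flow splits in two

        for (int i = 0; i < mid; i++) {
            partial[0] += data[i];           // first half, on the calling thread
        }

        try {
            worker.join();                   // join: the two flows become one again
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // After join() returns, the worker's writes to partial[1] are visible here.
        return partial[0] + partial[1];
    }

    public static void main(String[] args) {
        System.out.println(sumInParallel(new int[] {1, 2, 3, 4, 5})); // prints 15
    }
}
```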

Well... this isn't really correct, but I thought of a "waiting room" (it isn't actually a queue with a particular scheduling policy such as FIFO or HRRN).
When a thread cannot go on and needs to wait for some other thread to finish, it just joins the guys (aka threads) in the waiting room, waiting to become active next...

Because you are waiting for another thread of execution (i.e. the one you're calling join on) to join (i.e. die) to the current (i.e. the calling) thread.
The calling thread does not die: it simply waits for the other thread to do so.

This terminology is widely used (outside Java as well). I take it as a sort of associating one thread with another in some way. I think Thread.Associate() could have been a better option, but Join() isn't bad either.
