Why does the main thread wait for background threads to finish? - python-3.x

Given the following code:
from threading import Thread
def dosomething():
    for _ in range(100):
        print("background thread running")
t = Thread(target=dosomething)
t.start()
The program keeps running until the background thread has gone through the whole loop. My question is: why does the main thread wait for the background thread to finish instead of exiting immediately after starting it?

Because that's the way it is. From the threading docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property or the daemon constructor argument.
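To make the inheritance rule concrete, here is a minimal sketch (the helper name child_daemon_flags is made up for illustration). It creates one thread from inside a daemon thread and one from the calling thread, without passing daemon= to either child, and records which daemon flag each child ends up with. It assumes it is called from the main thread, which is non-daemon.

```python
import threading

def child_daemon_flags():
    """Record the inherited daemon flag of two children: one created by a
    daemon thread, one created by the caller (assumed to be the main thread)."""
    flags = {}

    def spawn_child(label):
        # daemon= is not passed, so the child inherits the flag of the
        # thread that is creating it (the current thread).
        child = threading.Thread(target=lambda: None)
        flags[label] = child.daemon
        child.start()
        child.join()

    # Create a child from inside an explicitly daemonic parent thread.
    parent = threading.Thread(target=spawn_child, args=("from_daemon",), daemon=True)
    parent.start()
    parent.join()

    # Create a child directly from the calling (main, non-daemon) thread.
    spawn_child("from_main")
    return flags

print(child_daemon_flags())  # {'from_daemon': True, 'from_main': False}
```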
I assume that you ran something like python my_script.py on the command line and wondered why it doesn't return you to a prompt until the worker thread is done?
Well, if you change your code to:
from threading import Thread
def dosomething():
    for _ in range(100):
        print("background thread running")
t = Thread(target=dosomething, daemon=True)
t.start()
You will find that you are returned to the terminal. However, your daemon thread will also die (again from the docs):
Note: Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
This is because exiting the program ends the process, and since the worker thread belongs to that process, it dies with it. Try this article for a fuller explanation.
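A minimal sketch of the Event-based shutdown the docs recommend, assuming a worker that can check a flag between units of work (the names worker and run_for are illustrative, not from the original). The thread stays non-daemonic; the main thread sets the event and then joins the worker, so the worker gets a chance to clean up before the program ends.

```python
import threading
import time

def worker(stop_event, log):
    # Keep working until the main thread asks us to stop. Event.wait(timeout)
    # doubles as a sleep that returns early as soon as the event is set.
    while not stop_event.is_set():
        log.append("tick")
        stop_event.wait(0.05)
    # Release files, connections, etc. here, before the thread ends.
    log.append("cleaned up")

def run_for(seconds):
    stop_event = threading.Event()
    log = []
    t = threading.Thread(target=worker, args=(stop_event, log))  # non-daemon
    t.start()
    time.sleep(seconds)
    stop_event.set()  # signal the worker to finish
    t.join()          # wait until it has actually cleaned up and exited
    return log

log = run_for(0.2)
print(log[-1])  # cleaned up
```

Because the worker polls the event, how quickly it stops depends on how often it checks; a worker blocked in a long I/O call would need its own timeout.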

Because that's the way it ought to be!
Back in the days before multi-threading, a program would end when its main routine ended, which also, coincidentally, was when its last (and only) thread ended. When multi-threading became a thing, some languages (e.g., Python, Java) generalized the classic behavior to "program ends when last thread ends," while others (e.g., C/C++) generalized it to "program ends when its main thread ends."
Having worked in both worlds, it's my strong opinion that Python and Java got it right. The main thread is special only to the extent that it's the first: it's the one that calls the program's designated entry point. But after things get going, there is no good reason why the main thread should be treated any differently from any other thread. Doing so only complicates things and makes the behavior that much more difficult to explain.

Related

Will python wait for all threads to finish before exiting?

I have the below python script:
script.py
from threading import Thread
def func():
    while True:
        pass
a = Thread(target=func)
a.start()
print('started the thread')
When I run it from the terminal (python script.py), it prints started the thread and then hangs. Coming from a C background, I know that when the main thread completes, the remaining threads are terminated as well. Is that not the case in Python? Will Python wait even without a call to a.join()? Is there a concept of a main thread in Python, or are all threads the same?
There is a "main thread" in Python, which is generally the thread that started the Python interpreter. Threads created by the threading module have some layers of behavior on top of native OS threads. One of those behaviors is that when the interpreter exits, part of normal shutdown processing is that the main thread joins each non-daemon threading.Thread - so, yes, Python normally waits for all other threads to end.
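One way to observe that shutdown join, as a sketch: run a tiny script in a child interpreter (via subprocess, so the demonstration does not depend on the current process exiting) in which the main thread reaches its last statement immediately while a non-daemon worker is still sleeping. If the interpreter did not join the worker at shutdown, the worker's line would never appear.

```python
import subprocess
import sys
import textwrap

# Child script: main prints and falls off the end right away, while the
# non-daemon worker is still sleeping.
SCRIPT = textwrap.dedent("""
    import threading, time

    def worker():
        time.sleep(0.3)
        print("worker done", flush=True)

    threading.Thread(target=worker).start()
    print("main done", flush=True)
""")

result = subprocess.run([sys.executable, "-c", SCRIPT],
                        capture_output=True, text=True, timeout=10)
print(result.stdout.splitlines())  # ['main done', 'worker done']
```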
Whether daemon threading.Threads are forcefully shut down when Python exits is up to the OS, although I believe all major operating systems now do kill them.
If you want to exit the interpreter regardless of whether non-daemon threading.Threads are running, you need to kill the job or call os._exit(). Note the leading underscore! It's not recommended, but it is there if you need it.
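A sketch of os._exit() skipping that wait, again run in a child interpreter so the current process is unaffected. The spinning thread is the one from the question; the exit code 3 is arbitrary. Note flush=True: os._exit() does not flush stdio buffers, so any unflushed output would be lost.

```python
import subprocess
import sys
import textwrap

# Child script: a non-daemon thread that never ends, followed by os._exit().
SCRIPT = textwrap.dedent("""
    import os, threading

    def spin():
        while True:  # never finishes on its own
            pass

    threading.Thread(target=spin).start()  # non-daemon
    print("exiting without waiting", flush=True)
    os._exit(3)  # skips normal shutdown, including the join of spin()
""")

# timeout guards against a hang if the child ever did wait for spin().
result = subprocess.run([sys.executable, "-c", SCRIPT],
                        capture_output=True, text=True, timeout=10)
print(result.returncode, result.stdout.strip())  # 3 exiting without waiting
```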

Do processes only terminate when their threads are terminated?

Processes should only terminate themselves, when all their threads are terminated!
It's a question in our mock exam, and we aren't sure whether the statement is true or false.
Thanks a lot
First, I need to point out that this exam question contains an incorrect presumption. A running process always has at least one thread. The initial thread, the thread that first calls main or equivalent, isn't special; it's just like every other thread created by pthread_create or equivalent. Once all of the threads within a process have exited, the process can't do anything anymore — there's no way for it to execute even a single additional CPU instruction. In practice, the operating system will terminate the process at that point.
Second, as was pointed out in the comments on the question, the use of "should" makes your exam question ambiguous. It could be read as either "Processes only terminate when all of their threads are terminated" — as a description of how the system works. Or it could be read as "You, the programmer, should write code that ensures that your processes only terminate when all of their threads are terminated" — as a prescription for writing correct code.
If you are specifically talking about POSIX threads ("pthreads"), the answer to the descriptive question is that it depends on how each thread terminates. If all threads terminate by calling pthread_exit or by being cancelled, the process will survive until the last thread terminates, no matter which order they exit in. On the other hand, if any thread calls exit or _exit, or receives a fatal signal, that will immediately terminate the entire process, no matter how many threads are still active. (I am not 100% sure about this, but I think it doesn't matter whether any threads have been detached.)
There's an additional complication, which is that returning from a function passed to pthread_create is equivalent to calling pthread_exit for that thread, but returning from main is equivalent to calling exit. That makes the initial thread a little bit special: unless you specifically end main by calling pthread_exit, the entire process will be terminated when the initial thread exits. But technically this is not a property of the thread itself, but of the code running in that thread.
I do not know the answer to the descriptive question for threads libraries other than POSIX; in particular I don't know the answer for either Windows native threads, or for the threads library added to ISO C in its 2011 revision.
The answer to the prescriptive question is yes with exceptions. You, a programmer, should write programs that, under normal conditions, take care to end their process only when all of their threads have finished their work. (With POSIX threads, this translates to making sure that main does not return until all the other threads have been joined.) However, sometimes you have a few threads that run an infinite loop, without holding any locks or anything, and there's no good way to tell them to exit when everything else is done; as long as exiting the process out from under them won't damage any persistent state, go ahead and exit the process out from under them. (This is the intended use case for detached threads.) Also, it's OK, and often the best choice, to terminate the entire process abruptly if you encounter some kind of unrecoverable error. Those are the only exceptions I can think of off the top of my head.

Confused about threads

I'm studying threads in C and I have this theoretical question in mind that is driving me crazy. Assume the following code:
1) void main() {
2) createThread(...); // create a new thread that does "something"
3) }
After line 2 is executed, two paths of execution exist. However, I believe that immediately after line 2 is executed it doesn't even matter what the new thread created at line 2 does, because the original thread that executed line 2 will end the entire program at its next instruction. Am I wrong? Is there any chance that the original thread gets suspended somehow and the new thread gets its chance to do something (assume the code as is: no synchronization between the threads and no join operations)?
It can work out either way. If you have more than one core, the new thread might get its own core. Even if you don't, the scheduler might give the new thread priority over the existing one. The original thread might exhaust its timeslice right after it creates a new thread.
So that code creates a race condition: one thread is trying to do work while another thread is trying to terminate the process. Which one wins will depend on the threading implementation, the hardware, and perhaps even some random chance.
If main() finishes before the spawned threads, all those threads will be terminated, because returning from main() is equivalent to calling exit(), which ends the whole process.
Calling pthread_exit() at the end of main() instead ends only the main thread; the process stays alive, and the threads it created keep running until they complete execution.
You can learn more about this here: https://computing.llnl.gov/tutorials/pthreads/
Assuming you are using POSIX pthreads (not clear from your example), then you are right. If you don't want that, calling pthread_exit from main will keep the program running until all the threads finish. The "main thread" is special in this regard, as its exit normally causes all threads to terminate.
More typically, you'll do something useful in the main thread after a new thread has been forked. Otherwise, what's the point? So you'll do your own processing, wait on some events, etc. If you want main (or any other thread) to wait for a thread to complete before proceeding, you can call pthread_join() with the handle of the thread of interest.
All of this may be beside the point, however: you are not explicitly using POSIX threads in your example, so I don't know whether that's pseudo-code for the purpose of example or literal code. On Windows, CreateThread has different semantics from POSIX pthreads. However, you didn't use that capitalization for the call in your example, so I don't know if that's what you intended either. Personally, I use the pthreads_win32 library even on Windows.

Quitting the main loop while threads may still be running

I have a problem that has been bothering me a lot.
Sometimes when I exit my program, some threads are still running, and on Linux this causes a crash after I quit the main loop. Is there any way to kill all threads when I quit the main loop?
It would help a lot if you specified your programming language and threading library of choice.
The usual way to control this type of situation (that is for a parent thread to wait until children complete before terminating) is to call a function supplied by the library, usually named join or wait.
pthread supplies you with pthread_join, for example.
If you're spawning processes via fork, you should use wait or waitpid in the parent to halt until the child completes - try man waitpid or take a look at this.
This way you can inform your children that you are about to exit via the usual means, wait until they wrap up and terminate, then cleanly exit the main loop.
Does this help? This is the least brutal way of synchronizing termination; if you want to actively kill the child threads, there are alternatives, of course (like pthread_cancel for pthreads, for example).
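For the fork/waitpid case, Python exposes the same POSIX calls in its os module, so a minimal sketch looks like this. It is POSIX only (os.fork is not available on Windows), and the helper name and the exit status 7 are arbitrary choices for illustration.

```python
import os
import time

def run_child_and_wait():
    """Fork a child process, then block in the parent until the child exits."""
    pid = os.fork()  # POSIX only
    if pid == 0:
        # Child: do some work, then leave with os._exit so it does not go on
        # to run the parent's remaining code or exit handlers.
        time.sleep(0.1)
        os._exit(7)
    # Parent: waitpid blocks until the child with this pid has terminated.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

print(run_child_and_wait())  # 7
```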
If you are using Java, try jconsole (Java Monitoring & Management Console), which in my case shipped with JDK 6u23. It shows you the name of the thread that was not killed, and you can then use join to let that thread complete.
But the cause can also be an issue in the program itself: in my case a java.util.Timer thread ([Timer-0]) was left hanging, and calling timer.cancel() shut that timer down.

explicit joining of python threads?

I need to start some threads in a python program. The threads perform a background task which might take a long time, so I don't want to block the main thread waiting on the task to happen.
Python provides the ability to 'reap' threads using Thread.join() and Thread.is_alive(). But I don't actually care about finding out when the thread has finished. I'm content to start up the thread, let it do its thing and never worry about it again.
The question is, do I need to keep references around to the Thread objects that I start so that I can later join() them? Or can I just let the reference to the Thread object go out of scope and not worry about it? Is there a 'right' thing to do in this case?
You don't have to explicitly join threads -- just make sure they're not "daemonized" (leave their daemon attribute to the default, False) so they'll keep the process alive until they're all done (if you make your threads daemons, then you must make sure the main thread does not terminate until all relevant threads are done, or else the threads will be killed by the OS).
I think the right thing is the simplest one: forget about your "background threads", just make them non-daemons (which is after all their default state).
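A sketch of that fire-and-forget style, assuming the tasks only need to finish and their results are collected elsewhere (here, a shared list; all names are illustrative). No Thread objects are stored. The final loop exists only so this demonstration can check the results; a real program could simply let the main thread finish and rely on the interpreter joining non-daemon threads at shutdown.

```python
import threading
import time

results = []
results_lock = threading.Lock()

def background_task(n):
    time.sleep(0.1)  # stand-in for a long-running job
    with results_lock:
        results.append(n)

def start_fire_and_forget(count):
    for n in range(count):
        # No reference is kept: the threading module tracks running threads
        # itself, and non-daemon ones are joined at interpreter shutdown.
        threading.Thread(target=background_task, args=(n,), name=f"bg-{n}").start()

start_fire_and_forget(3)

# Demonstration only: find the still-running threads by name and wait for them.
for t in threading.enumerate():
    if t.name.startswith("bg-"):
        t.join()
print(sorted(results))  # [0, 1, 2]
```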
