What happens to threads when I don't explicitly call the join method?


I need to make some network calls for data in my program. I intend to call them in parallel but not all of them need to complete.
What I have right now is:
thread1 = makeNetworkCallThread()
thread1.start()
thread2 = makeLongerNetworkCallThread()
thread2.start()
thread1.join()
foo = thread1.getData()
thread2.join()
if conditionOn(foo):
    foo = thread2.getData()
# continue with code
The problem with this is that even if the shorter network call succeeded, I still have to wait for the longer network call to complete.
What will happen if I move the thread2.join() inside the if statement? The join method might then never get called. Will that cause problems with stale threads, etc.?

thread2 will still continue to run (subject to the caveats of the GIL, but since it is a network call that's probably not a concern) whether join is called or not. The difference is whether the main context waits for the thread to end before going on to do other things - if you're able to continue processing without that longer network call completing, then there should be no issues.
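As a rough sketch of what moving the join inside the if could look like: the fetch and needs_fallback helpers and the shared results dict below are hypothetical stand-ins for the question's thread classes, getData() and conditionOn(), and time.sleep simulates the network latency.

```python
import threading
import time

def fetch(delay, value, results, key):
    """Hypothetical stand-in for a network call: sleep, then store a value."""
    time.sleep(delay)
    results[key] = value

def needs_fallback(value):
    """Hypothetical condition: treat a missing value as a failed short call."""
    return value is None

results = {}
short_call = threading.Thread(target=fetch, args=(0.1, "fast data", results, "short"))
long_call = threading.Thread(target=fetch, args=(0.5, "slow data", results, "long"))
short_call.start()
long_call.start()

short_call.join()
foo = results["short"]
if needs_fallback(foo):
    # Only block on the longer call when its result is actually needed.
    long_call.join()
    foo = results["long"]

print(foo)
```

In this sketch the main code carries on as soon as the short call returns. The longer thread keeps running in the background and simply finishes on its own.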
Do keep in mind that the program will not actually end (the interpreter will not exit) until all non-daemon threads have completed. Depending on how long this network call takes relative to the run time of the rest of your program (in the event you don't wait), it might appear that the program reaches its end but doesn't actually exit until the network call wraps up. Consider this silly example:
# Python 2.7
import threading
import time
import logging
def wasteTime(sec):
    logging.info('Thread to waste %d seconds started' % sec)
    time.sleep(sec)
    logging.info('Thread to waste %d seconds ended' % sec)
if __name__ == '__main__':
    logging.basicConfig(format='%(asctime)s %(message)s', level=logging.INFO)
    t1 = threading.Thread(target=wasteTime, args=(2,))
    t2 = threading.Thread(target=wasteTime, args=(10,))
    t1.start()
    t2.start()
    t1.join()
    logging.info('Main context done')
This is the logging output:
$ time python test.py
2015-01-15 09:32:12,239 Thread to waste 2 seconds started
2015-01-15 09:32:12,239 Thread to waste 10 seconds started
2015-01-15 09:32:14,240 Thread to waste 2 seconds ended
2015-01-15 09:32:14,241 Main context done
2015-01-15 09:32:22,240 Thread to waste 10 seconds ended
real 0m10.026s
user 0m0.015s
sys 0m0.010s
Note that although the main context reached its end after 2 seconds (the amount of time it took for thread1 to complete), the program doesn't completely exit until thread2 is completed (ten seconds after start of execution). In situations like this (particularly if the output is being logged as such), it's my opinion that it is better to explicitly call join at some point and explicitly identify in your logs that this is what the program is doing so that it doesn't look to the user/operator like it has hung. For my silly example, that might look like adding lines like this to the end of the main context:
logging.info('Waiting for thread 2 to complete')
t2.join()
Which will generate somewhat less mysterious log output:
$ time python test.py
2015-01-15 09:39:18,979 Thread to waste 2 seconds started
2015-01-15 09:39:18,979 Thread to waste 10 seconds started
2015-01-15 09:39:20,980 Thread to waste 2 seconds ended
2015-01-15 09:39:20,980 Main context done
2015-01-15 09:39:20,980 Waiting for thread 2 to complete
2015-01-15 09:39:28,980 Thread to waste 10 seconds ended
real 0m10.027s
user 0m0.015s
sys 0m0.010s

Related

Why do the threads run even when the Python script has finished its execution?

I am curious why the threads started in a python script are running even when the last statement of the script is executed (which means, the script has completed (I believe)).
I have shared below the code I am talking about. Any insights on this would be helpful:
import time
import threading
start=time.perf_counter()
def do_something():
    print("Waiting for a sec...")
    time.sleep(60)
    print("Waiting is over!!!")
mid1=time.perf_counter()
t1=threading.Thread(target=do_something)
t2=threading.Thread(target=do_something)
mid2=time.perf_counter()
t1.start()
mid3=time.perf_counter()
t2.start()
finish=time.perf_counter()
print(start,mid1,mid2,mid3,finish)

What output do you see? This is what I see:
Waiting for a sec...
Waiting for a sec...
95783.4201273 95783.4201278 95783.4201527 95783.4217046 95783.4219945
Then it's quiet for a minute, and displays:
Waiting is over!!!
Waiting is over!!!
and then the script ends.
That's all as expected. As part of shutting down, the interpreter waits for all running threads to complete (unless they were created with daemon=True, which you should probably avoid until you know exactly what you're doing). You told your threads to sleep for 60 seconds before finishing, and that's what they did.
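To make the daemon distinction concrete, here is a minimal sketch. The background_work function and the short sleep times are made up for illustration.

```python
import threading
import time

def background_work(seconds):
    """Hypothetical task that just sleeps for a while."""
    time.sleep(seconds)

# Non-daemon: the interpreter waits for this thread at shutdown.
# daemon=False is the default for threads created from the main thread;
# it is spelled out here only for clarity.
regular = threading.Thread(target=background_work, args=(0.2,), daemon=False)

# Daemon: the interpreter does not wait for it and stops it abruptly at exit.
daemon = threading.Thread(target=background_work, args=(0.2,), daemon=True)

regular.start()
daemon.start()
print(regular.daemon, daemon.daemon)

# Joining the regular thread makes the wait explicit.
regular.join()
# No join on `daemon`: if it were still running when the main thread
# finished, the interpreter would exit without waiting for it.
```

Because a daemon thread can be cut off in the middle of its work, it is only reasonable for tasks that are safe to abandon at any point.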

subprocess.Popen hangs for ~70 seconds using Python3?

In my program I have this utility function for executing commands in shell, here's a simplified version of it:
import subprocess
import time
def run_command(cmd):
    s = time.time()
    print('starting subprocess')
    proc = subprocess.Popen(cmd.split(),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    print('subprocess started after ({0}) seconds'.format(time.time() - s))
My program uses this function from different threads to execute commands.
Occasionally the "Popen" line takes around 70 seconds to complete: out of thousands of invocations a day across different program runs, this happens about 4-5 times. As far as I know, Popen is non-blocking. What is weird to me is that when it does happen, it takes the same ~70 seconds to start. It's important to note that while this happens, I have 3-4 other threads that are waiting in a loop:
while some_counter > 0:
    time.sleep(0.5)
They do so for at most 60 seconds. After they give up and finish their flow I see another ~14 seconds until the "Popen" call finishes. Is there a problem running "Popen" from some threads in parallel to having other threads in a "wait loop"?
Update 1:
I now see that this problem started after I switched from Fedora27+Python3.6 to Fedora31+python3.7.

Can I do work in another thread while waiting for subprocess Popen

I have a Python 3.7 project
It is using a library which uses subprocess Popen to call out to a shell script.
I am wondering: if I were to put the library calls in a separate thread, would I be able to do work in the main thread while waiting for the result from Popen in the other thread?
There is an answer here https://stackoverflow.com/a/33352871/202168 which says:
The way Python threads work with the GIL is with a simple counter.
With every 100 byte codes executed the GIL is supposed to be released
by the thread currently executing in order to give other threads a
chance to execute code. This behavior is essentially broken in Python
2.7 because of the thread release/acquire mechanism. It has been fixed in Python 3.
Either way does not sound particularly hopeful for what I want to do. It sounds like if the "library calls" thread has not hit the 100 bytecode trigger point when the call to Popen.wait is made then probably it will not switch to my other thread and the whole app will wait for the subprocess?
Maybe this info is wrong however.
Here is another answer https://stackoverflow.com/a/16262657/202168 which says:
...the interpreter can always release the GIL; it will give it to some
other thread after it has interpreted enough instructions, or
automatically if it does some I/O. Note that since recent Python 3.x,
the criteria is no longer based on the number of executed
instructions, but on whether enough time has elapsed.
This sounds more hopeful, since presumably communicating with the subprocess would involve I/O and might therefore allow a context switch for my main thread to be able to proceed in the meantime. (or perhaps just elapsed time waiting on the wait would cause a context switch)
I am aware of https://docs.python.org/3/library/asyncio-subprocess.html which explicitly solves this problem, but I am calling a 3rd-party library which just uses plain subprocess.Popen.
Can anyone confirm if the "subprocess calls in a separate thread" idea is likely to be useful to me, in Python 3.7 specifically?

I had time to make an experiment, so I will answer my own question...
I set up two files:
mainthread.py
#!/usr/bin/env python
import subprocess
import threading
import time
def run_busyproc():
    print(f'{time.time()} Starting busyprocess...')
    subprocess.run(["python", "busyprocess.py"])
    print(f'{time.time()} busyprocess done.')
if __name__ == "__main__":
    thread = threading.Thread(target=run_busyproc)
    print("Starting thread...")
    thread.start()
    while thread.is_alive():
        print(f"{time.time()} Main thread doing its thing...")
        time.sleep(0.5)
    print("Thread is done (?)")
    print("Exit main.")
and busyprocess.py:
#!/usr/bin/env python
from time import sleep
if __name__ == "__main__":
    for _ in range(100):
        print("Busy...")
        sleep(0.5)
    print("Done")
Running mainthread.py from the command-line I can see that there is the context-switch that you would hope to see - main thread is able to do work while waiting on the result of the subprocess:
Starting thread...
1555970578.20475 Main thread doing its thing...
1555970578.204679 Starting busyprocess...
Busy...
1555970578.710308 Main thread doing its thing...
Busy...
1555970579.2153869 Main thread doing its thing...
Busy...
1555970579.718168 Main thread doing its thing...
Busy...
1555970580.2231748 Main thread doing its thing...
Busy...
1555970580.726122 Main thread doing its thing...
Busy...
1555970628.009814 Main thread doing its thing...
Done
1555970628.512945 Main thread doing its thing...
1555970628.518155 busyprocess done.
Thread is done (?)
Exit main.
Good news everybody, python threading works :)

Python Counter too slow

I just wanted to code a little timer on my work pc. Funny thing is, the counter is too slow, meaning it runs longer than it should. I am really confused. The delay grows the smaller the intervals of updating become. Is my pc too slow? The CPU is around 30% while running this... idk.
python3.6.3
import time
def timer(sec):
    start = sec
    print(sec)
    while sec > 0:
        sec = sec-0.1 #the smaller this value, the slower
        time.sleep(0.1)
        print(round(sec,2))
    print("Done! {} Seconds passed.".format(start))
start = time.time() #For Testing
timer(10)
print(time.time()-start)

Putting your process to sleep requires a system call (a call into the kernel, which triggers an interrupt to hand control over to the kernel) and a hardware clock interrupt to wake the process up once the time has passed. Sleeping does not cost much CPU computation, but waiting for the hardware interrupt and for the kernel to schedule the process again takes extra time, so each time.sleep(0.1) lasts slightly longer than 0.1 seconds and the error adds up over the loop.
Rather than waiting for a constant unit of time, I suggest waiting for the time required to hit the next milestone (by getting the current time, rounding it up to the next step and sleeping for the difference).
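One possible way to sketch that idea, assuming it is acceptable to measure every tick against the start time: the countdown function and its step parameter are my own names, and time.monotonic is used in place of time.time so that system clock adjustments do not affect the result.

```python
import time

def countdown(total, step=0.1):
    """Count down `total` seconds, printing every `step` seconds without accumulating drift."""
    start = time.monotonic()
    ticks = int(round(total / step))
    for i in range(1, ticks + 1):
        # Sleep until the absolute deadline of tick i, not for a fixed interval,
        # so the overhead of one tick is absorbed by the next one.
        deadline = start + i * step
        remaining = deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        print(round(total - i * step, 2))
    return time.monotonic() - start

elapsed = countdown(1)
print("Done! Elapsed: {:.3f} s".format(elapsed))
```

Each individual tick can still be a little late, but the lateness no longer piles up, so the total run time stays close to the requested duration.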

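Try it this way; you can use normal arithmetic operators on the value returned by time.time():
import time
start = time.time()
seconds = 5
while True:
    # Elapsed time is "now minus start", not the other way around
    if time.time() - start > seconds:
        print(str(seconds) + " elapsed.")
        break
    time.sleep(0.01)  # avoid spinning the CPU while waiting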

How do I do multithreading in python?

I got the source code from http://www.saltycrane.com/blog/2008/09/simplistic-python-thread-example/ however when I tried to modify the code to my needs the results are not what I wanted.
import time
from threading import Thread
def myfunc():
    time.sleep(2)
    print("thread working on something")
while 1:
    thread = Thread(target=myfunc())
    thread.start()
    print("looping")
and got the results of
thread working on something
looping
// wait 2 seconds
thread working on something
looping
// wait 2 seconds
thread working on something
looping
// wait 2 seconds and so on
thread working on something
looping
// wait 2 seconds
But then I have to wait 2 seconds before I can do anything.
I want to be able to do other things while the thread is working, like checking items in an array and comparing them.

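In the main loop, you are initialising and starting a new thread an endless number of times. Note also that Thread(target=myfunc()) calls myfunc right there in the main thread and passes its return value (None) as the target, which is why the loop blocks for 2 seconds on every pass; pass the function itself with target=myfunc. Once that is fixed, the loop would try to start threads without limit. This of course is not practical and would soon crash the program.
The reason your program does not crash as written is that each thread ends almost immediately: it has no work to do, and there is no loop in the thread function to keep it alive and working.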
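A small sketch of the difference between target=myfunc() and target=myfunc may help here. The where_am_i helper and the thread names are invented for the demonstration; it records the name of whichever thread actually executes the function.

```python
import threading

def where_am_i(log):
    """Record the name of the thread that is executing this function."""
    log.append(threading.current_thread().name)

calls = []

# With parentheses: the function is called right here, in the calling thread,
# and its return value (None) becomes the target, so the new thread does nothing.
t_wrong = threading.Thread(target=where_am_i(calls), name="worker-wrong")
t_wrong.start()
t_wrong.join()

# Without parentheses: the function object is handed over and runs in the new thread.
t_right = threading.Thread(target=where_am_i, args=(calls,), name="worker-right")
t_right.start()
t_right.join()

# The first entry is the calling thread's name (MainThread when run as a script),
# the second is the worker's.
print(calls)
```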
Suggestion.
Add a loop to your threading function (myfunc) that will continue to run indefinitely in the background.
Initialise and call the thread function outside of the loop in your main section. In this way you will create only 1 thread that will run its own loop in the background. You could of course run a number of these same threads in the background if you called it more than once.
Now create a loop in your main body, and continue with your array checking or any other task that you want to run whilst the threading function continues to run in the background.
Something like this may help
import time
from threading import Thread
def myfunc():
    counter = 0
    while True:
        print("The thread counter is at", counter)
        counter += 1
        time.sleep(2)
thread = Thread(target=myfunc)
thread.start()
# The thread has now initialised and is running in the background
mCounter = 0
while True:
    print("Main loop counter =", mCounter)
    mCounter += 1
    time.sleep(5)
In this example, the thread will print a line every 2 seconds, and the main loop will print a line every 5 seconds.
Be careful to close your thread down. In some cases, a keyboard interrupt will stop the main loop, but the thread will keep on running.
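One common way to shut the thread down cleanly is a threading.Event that the worker checks on every pass. This is only a sketch: the stop_requested and ticks names are made up, and the short timings stand in for the 2 and 5 second sleeps above.

```python
import threading
import time

stop_requested = threading.Event()
ticks = []

def worker():
    counter = 0
    # Event.wait returns True as soon as the event is set, so the loop
    # exits promptly instead of sleeping through a full interval.
    while not stop_requested.wait(timeout=0.05):
        ticks.append(counter)
        counter += 1

thread = threading.Thread(target=worker)
thread.start()
try:
    time.sleep(0.3)  # stand-in for the main loop's real work
except KeyboardInterrupt:
    pass
finally:
    # Runs on normal exit and on Ctrl+C alike, so the thread is never left behind.
    stop_requested.set()
    thread.join()

print("worker stopped after", len(ticks), "ticks")
```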
I hope this helps.
