What I'm trying to do is run a function on a thread, start a timer, and then forcefully join or exit the thread once a certain duration has been reached. I have been informed that this can be done on Linux, but I don't know the Windows equivalent, so I did this:
# idea: run the function for 7 secs, but stop it at 5 secs
import time
import threading

def func():
    for i in range(7):  # print "not yet done" 7 times
        print("not yet done")
        time.sleep(1)

t1 = threading.Thread(target=func)  # pass the function itself; target=func() would call it here
t1.start()                          # the thread must be started explicitly
start = time.perf_counter()
while t1.is_alive():                # loop while the function is not done (join() returns None, not a bool)
    if time.perf_counter() - start > 5:  # stop waiting at 5 seconds
        print(time.perf_counter() - start)
        break
    time.sleep(0.1)                 # avoid a busy-wait
print("done")
t1.join()  # note: this still blocks until the thread finishes; Python cannot forcefully kill it
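For what it's worth, Python offers no supported way to forcefully kill a thread on any platform, Windows included. A minimal sketch of one portable alternative, assuming the work can live in a separate process instead of a thread: multiprocessing.Process has a terminate() method that works on both Linux and Windows.

import time
from multiprocessing import Process

def func():
    for i in range(7):
        print("not yet done")
        time.sleep(1)

if __name__ == '__main__':
    p = Process(target=func)
    p.start()
    p.join(timeout=5)  # wait at most 5 seconds
    if p.is_alive():   # still running after the timeout
        p.terminate()  # forcefully stop it; this works on Windows too
        p.join()
    print("done")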
I am curious why threads started in a Python script keep running even after the last statement of the script has executed (which, I believe, means the script has completed).
I have shared below the code I am talking about. Any insights on this would be helpful:
======================================================================================
import time
import threading

start = time.perf_counter()

def do_something():
    print("Waiting for a sec...")
    time.sleep(60)
    print("Waiting is over!!!")

mid1 = time.perf_counter()
t1 = threading.Thread(target=do_something)
t2 = threading.Thread(target=do_something)
mid2 = time.perf_counter()
t1.start()
mid3 = time.perf_counter()
t2.start()
finish = time.perf_counter()
print(start, mid1, mid2, mid3, finish)
What output do you see? This is what I see:
Waiting for a sec...
Waiting for a sec...
95783.4201273 95783.4201278 95783.4201527 95783.4217046 95783.4219945
Then it's quiet for a minute, and displays:
Waiting is over!!!
Waiting is over!!!
and then the script ends.
That's all as expected. As part of shutting down, the interpreter waits for all running threads to complete (unless they were created with daemon=True, which you should probably avoid until you know exactly what you're doing). You told your threads to sleep for 60 seconds before finishing, and that's what they did.
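To illustrate the daemon=True contrast mentioned above, a minimal sketch (not part of the original question): daemon threads are abandoned at interpreter exit, so the final print from the worker never appears.

import time
import threading

def do_something():
    print("Waiting for a sec...")
    time.sleep(60)
    print("Waiting is over!!!")  # never printed: the interpreter exits first

t = threading.Thread(target=do_something, daemon=True)
t.start()
time.sleep(1)  # give the worker a moment to start
print("Main thread done; the daemon thread is abandoned at exit")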
In my program I have this utility function for executing shell commands; here's a simplified version of it:
import subprocess
import time

def run_command(cmd):
    s = time.time()
    print('starting subprocess')
    proc = subprocess.Popen(cmd.split(),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    print('subprocess started after ({0}) seconds'.format(time.time() - s))
My program uses this function from different threads to execute commands.
Occasionally the "Popen" line takes around 70 seconds to complete. I mean, out of thousands of invocations in a day across different program runs, this happens about 4-5 times. As far as I know, Popen is non-blocking. What is weird to me is that when it does happen, it takes the same ~70 seconds to start. It's important to note that while this happens I have 3-4 other threads that are waiting in a loop:
while some_counter > 0:
    time.sleep(0.5)
They do so for at most 60 seconds. After they give up and finish their flow, I see another ~14 seconds until the "Popen" call finishes. Is there a problem with running "Popen" from some threads in parallel with having other threads in a "wait loop"?
Update 1:
I now see that this problem started after I switched from Fedora 27 + Python 3.6 to Fedora 31 + Python 3.7.
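No definitive diagnosis here, but one cheap experiment, sketched under the assumption that serializing spawns is acceptable (the lock is my own addition, not part of the original function): guard the Popen call with a lock shared by all threads, and see whether the ~70-second stalls disappear when only one thread creates a process at a time.

import subprocess
import threading
import time

_spawn_lock = threading.Lock()  # hypothetical: serializes process creation across threads

def run_command(cmd):
    s = time.time()
    print('starting subprocess')
    with _spawn_lock:  # only one thread forks/spawns at a time
        proc = subprocess.Popen(cmd.split(),
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
    print('subprocess started after ({0}) seconds'.format(time.time() - s))
    return proc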
I have an array of data to handle, and a handler that takes a long time to execute (1-2 minutes) and uses a lot of memory for its calculations.
raw = ['a', 'b', 'c']

def handler(data):
    # do something long with the data
    ...
Since handler requires a lot of memory, I want to execute it in a separate subprocess and kill it after execution to release the memory. Something like the following snippet:
from multiprocessing import Process

for r in raw:
    process = Process(target=handler, args=(r,))  # args must be a tuple, hence the trailing comma
    process.start()
The problem is that this approach immediately starts len(raw) processes, which is not good.
Also, there is no need to exchange any data between the subprocesses; they just need to run consecutively.
Therefore it would be great to run a few processes at the same time and add a new one once an existing one finishes.
How could this be implemented (if it's even possible)?
To run your processes sequentially, just join each process within the loop:
from multiprocessing import Process

for r in raw:
    process = Process(target=handler, args=(r,))
    process.start()
    process.join()  # wait for this process to finish before starting the next
That way you're sure that only one process is running at a time (no concurrency). That's the simplest way.
To run more than one process while limiting the number running at the same time, you can use a multiprocessing.Pool object with apply_async.
I've built a simple example which computes the square of the argument and simulates heavy processing:
from multiprocessing import Pool
import time

def target(r):
    time.sleep(5)  # simulate heavy processing
    return r * r

raw = [1, 2, 3, 4, 5]

if __name__ == '__main__':
    with Pool(3) as p:  # 3 processes at a time
        reslist = [p.apply_async(target, (r,)) for r in raw]
        for result in reslist:
            print(result.get())
Running this I get:
<5 seconds wait, time to compute the results>
1
4
9
<5 seconds wait, 3 processes max can run at the same time>
16
25
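One more knob that may matter for the original goal of releasing memory after each run (maxtasksperchild is a standard multiprocessing.Pool parameter; applying it here is my suggestion, not part of the answer above): it tells the pool to replace each worker process after a fixed number of tasks, so memory used by one handler call is returned to the OS.

from multiprocessing import Pool
import time

def target(r):
    time.sleep(5)
    return r * r

raw = [1, 2, 3, 4, 5]

if __name__ == '__main__':
    # Each worker exits after one task and is replaced by a fresh process,
    # so at most 3 run at a time and memory is released between tasks.
    with Pool(3, maxtasksperchild=1) as p:
        for result in p.map(target, raw):
            print(result)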
I got the source code from http://www.saltycrane.com/blog/2008/09/simplistic-python-thread-example/; however, when I tried to modify the code to my needs, the results were not what I wanted.
import time
from threading import Thread

def myfunc():
    time.sleep(2)
    print("thread working on something")

while 1:
    thread = Thread(target=myfunc())  # note: this calls myfunc in the main thread and passes its result (None) as the target
    thread.start()
    print("looping")
and got this output:
thread working on something
looping
// wait 2 seconds
thread working on something
looping
// wait 2 seconds
thread working on something
looping
// wait 2 seconds and so on
thread working on something
looping
// wait 2 seconds
But then I have to wait 2 seconds before I can do anything.
I want to be able to do other things while the thread works, like checking items in an array and comparing them.
In the main loop, you are initialising and starting a new thread an endless number of times; in principle you would end up with a huge number of threads running, which is not practical and would soon crash the program.
The reason your program does not crash is that each thread function executes and ends in a single pass, i.e. you do not have a loop in the thread function to keep the thread alive and working. (Note also that Thread(target=myfunc()) calls myfunc immediately in the main thread and passes its return value, None, as the target; that call is what produces the 2-second wait on every iteration. To run myfunc in the thread, pass it without the parentheses: Thread(target=myfunc).)
Suggestion.
Add a loop to your threading function (myfunc) that will continue to run indefinitely in the background.
Initialise and call the thread function outside of the loop in your main section. In this way you will create only 1 thread that will run its own loop in the background. You could of course run a number of these same threads in the background if you called it more than once.
Now create a loop in your main body, and continue with your array checking or any other task that you want to run whilst the threading function continues to run in the background.
Something like this may help:
import time
from threading import Thread

def myfunc():
    counter = 0
    while True:
        print("The thread counter is at", counter)
        counter += 1
        time.sleep(2)

thread = Thread(target=myfunc)
thread.start()
# The thread has now initialised and is running in the background

mCounter = 0
while True:
    print("Main loop counter =", mCounter)
    mCounter += 1
    time.sleep(5)
In this example, the thread will print a line every 2 seconds, and the main loop will print a line every 5 seconds.
Be careful to close your thread down. In some cases, a keyboard interrupt will stop the main loop, but the thread will keep on running.
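One way to handle that shutdown caveat, as a sketch (the daemon flag is standard threading, but using it here is my addition, not part of the answer above): a daemon thread is killed automatically when the main thread exits, so a keyboard interrupt that stops the main loop takes the background worker down with it.

# daemon=True: the thread dies with the main thread instead of keeping the process alive
thread = Thread(target=myfunc, daemon=True)
thread.start()

The trade-off is the one noted in an earlier answer above: a daemon thread can be cut off mid-work, so it should not hold resources that need a clean shutdown.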
I hope this helps.
I need to make some network calls for data in my program. I intend to call them in parallel, but not all of them need to complete.
What I have right now is:
thread1 = makeNetworkCallThread()
thread1.start()
thread2 = makeLongerNetworkCallThread()
thread2.start()

thread1.join()
foo = thread1.getData()
thread2.join()
if conditionOn(foo):
    foo = thread2.getData()
# continue with code
The problem with this is that even if the shorter network call succeeds, I still need to wait for the longer network call to complete.
What will happen if I move the thread2.join() inside the if statement? The join method might never get called. Will that cause problems with stale threads, etc.?
thread2 will still continue to run (subject to the caveats of the GIL, but since it is a network call that's probably not a concern) whether join is called or not. The difference is whether the main context waits for the thread to end before going on to do other things - if you're able to continue processing without that longer network call completing, then there should be no issues.
Do keep in mind that the program will not actually end (the interpreter will not exit) until all threads have been completed. Depending on the latency of this long network call to the run time of the rest of your program (in the event you don't wait), it might appear that the program reaches its end but doesn't actually exit until the network call wraps up. Consider this silly example:
# Python 2.7
import threading
import time
import logging

def wasteTime(sec):
    logging.info('Thread to waste %d seconds started' % sec)
    time.sleep(sec)
    logging.info('Thread to waste %d seconds ended' % sec)

if __name__ == '__main__':
    logging.basicConfig(format='%(asctime)s %(message)s', level=logging.INFO)
    t1 = threading.Thread(target=wasteTime, args=(2,))
    t2 = threading.Thread(target=wasteTime, args=(10,))
    t1.start()
    t2.start()
    t1.join()
    logging.info('Main context done')
This is the logging output:
$ time python test.py
2015-01-15 09:32:12,239 Thread to waste 2 seconds started
2015-01-15 09:32:12,239 Thread to waste 10 seconds started
2015-01-15 09:32:14,240 Thread to waste 2 seconds ended
2015-01-15 09:32:14,241 Main context done
2015-01-15 09:32:22,240 Thread to waste 10 seconds ended
real 0m10.026s
user 0m0.015s
sys 0m0.010s
Note that although the main context reached its end after 2 seconds (the amount of time it took for t1 to complete), the program doesn't completely exit until t2 is completed (ten seconds after the start of execution). In situations like this (particularly if the output is being logged as such), it's my opinion that it is better to explicitly call join at some point and explicitly note in your logs that this is what the program is doing, so that it doesn't look to the user/operator like it has hung. For my silly example, that might look like adding lines like these to the end of the main context:
logging.info('Waiting for thread 2 to complete')
t2.join()
Which will generate somewhat less mysterious log output:
$ time python test.py
2015-01-15 09:39:18,979 Thread to waste 2 seconds started
2015-01-15 09:39:18,979 Thread to waste 10 seconds started
2015-01-15 09:39:20,980 Thread to waste 2 seconds ended
2015-01-15 09:39:20,980 Main context done
2015-01-15 09:39:20,980 Waiting for thread 2 to complete
2015-01-15 09:39:28,980 Thread to waste 10 seconds ended
real 0m10.027s
user 0m0.015s
sys 0m0.010s
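Putting that together for the original question, a minimal sketch reusing the hypothetical thread objects from the question: moving thread2.join() inside the if statement is safe, and when the branch is not taken the thread simply finishes on its own in the background.

thread1.join()  # the shorter call is always needed
foo = thread1.getData()
if conditionOn(foo):
    thread2.join()  # wait for the longer call only when its data is required
    foo = thread2.getData()
# continue with code; if the branch wasn't taken, thread2 keeps running,
# and the interpreter will still wait for it before exiting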