How to control thread execution in Python3 - python-3.x

The Python function below runs every 30 seconds. I would like to stop the thread after 1 hour (60 minutes) of total run time. How can I achieve this? Or would polling be a better approach?
import time, threading
import datetime

def hold():
    print(datetime.datetime.now())
    threading.Timer(30, hold).start()

if __name__ == '__main__':
    hold()

You could simply use the time module to do that, as follows:
import time, threading
import datetime

def hold():
    start = time.time()
    while True:
        print(datetime.datetime.now())
        # Sleep for 30 seconds
        time.sleep(30)
        # Check whether 1 hour has passed since the thread started
        end = time.time()
        if end - start >= 3600:
            print(datetime.datetime.now())
            break

if __name__ == '__main__':
    # Start the thread and wait for it to finish
    thread1 = threading.Thread(target=hold)
    thread1.start()
    thread1.join()
    print("Thread execution complete")
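A possible variation on the loop above, offered only as a sketch: replacing time.sleep with threading.Event.wait lets another thread stop the loop early, and a deadline based on time.monotonic keeps the last sleep from overshooting the time limit. The helper name run_periodically and its parameters are made up for this illustration, and the demo uses short values so it finishes quickly.

```python
import threading
import time
from datetime import datetime


def run_periodically(action, interval, duration, stop_event):
    """Call action() every `interval` seconds until `duration` seconds
    have elapsed or stop_event is set. Returns the number of calls made."""
    deadline = time.monotonic() + duration
    calls = 0
    while not stop_event.is_set() and time.monotonic() < deadline:
        action()
        calls += 1
        # Sleep until the next tick, but never past the deadline.
        # wait() returns early if another thread sets the event.
        remaining = deadline - time.monotonic()
        stop_event.wait(max(0.0, min(interval, remaining)))
    return calls


if __name__ == "__main__":
    stop = threading.Event()
    # Short demo values; the question's case would be interval=30, duration=3600.
    worker = threading.Thread(
        target=run_periodically,
        args=(lambda: print(datetime.now()), 0.2, 1.0, stop),
    )
    worker.start()
    worker.join()
    print("Thread execution complete")
```

Calling stop.set() from any other thread would end the loop before the deadline.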

Related

How to exit ThreadPoolExecutor with statement immediately when a future is running

Coming from a .NET background, I am trying to understand Python multithreading using concurrent.futures.ThreadPoolExecutor and submit. I was trying to add a timeout to some code for a test, but realised that I don't fully understand some parts of what I'm trying to do. Simplified code is below.
I would expect the method to return after around 5 seconds, when the call to concurrent.futures.wait(futures, return_when=FIRST_COMPLETED) completes. In fact it takes the full 10 seconds. I suspect this has to do with my understanding of the with statement, since changing the code to thread_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2) results in the behaviour I would expect. Adding a call to the shutdown method doesn't do anything, as all the futures are already running.
Is there a way to exit the with statement immediately after the call to wait? I have tried using break and return, but they have no effect. I am using Python 3.10.8.
from concurrent.futures import FIRST_COMPLETED
from datetime import datetime
import threading
import concurrent.futures
import time

def test_multiple_threads():
    set_timeout_on_method()
    print("Current Time =", datetime.now()) # Prints time N + 10

def set_timeout_on_method():
    futures = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as thread_pool:
        print("Current Time =", datetime.now()) # Prints time N
        futures.append(thread_pool.submit(time.sleep, 5))
        futures.append(thread_pool.submit(time.sleep, 10))
        concurrent.futures.wait(futures, return_when=FIRST_COMPLETED)
        print("Current Time =", datetime.now()) # Prints time N + 5
    print("Current Time =", datetime.now()) # Prints time N + 10
AFAIK, there is no native way to terminate threads from ThreadPoolExecutor and it's supposedly not even a good idea, as described in existing answers (exhibit A, exhibit B).
It is possible to do this with processes in ProcessPoolExecutor, but even then the main process would apparently wait for all the processes that already started:
If wait is False then this method will return immediately and the
resources associated with the executor will be freed when all pending
futures are done executing. Regardless of the value of wait, the
entire Python program will not exit until all pending futures are done
executing.
This means that even though "End #" would be printed after about 5 seconds, the script would terminate only after about 20 seconds.
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait
from datetime import datetime
from time import sleep

def multiple_processes():
    print("Start #", datetime.now())
    set_timeout_on_method()
    print("End #", datetime.now())

def set_timeout_on_method():
    futures = []
    with ProcessPoolExecutor() as executor:
        futures.append(executor.submit(sleep, 5))
        futures.append(executor.submit(sleep, 10))
        futures.append(executor.submit(sleep, 20))
        print("Futures created #", datetime.now())
        if wait(futures, return_when=FIRST_COMPLETED):
            print("Shortest future completed #", datetime.now())
            executor.shutdown(wait=False, cancel_futures=True)

if __name__ == "__main__":
    multiple_processes()
With max_workers set to 1, the entire script would take about 35 seconds because (to my surprise) the last future doesn't get cancelled, despite cancel_futures=True.
You could kill the workers, though. This would make the main process finish without delay:
import os
import signal
...
    with ProcessPoolExecutor(max_workers=1) as executor:
        futures.append(executor.submit(sleep, 5))
        futures.append(executor.submit(sleep, 10))
        futures.append(executor.submit(sleep, 20))
        print("Futures created #", datetime.now())
        if wait(futures, return_when=FIRST_COMPLETED):
            print("Shortest future completed #", datetime.now())
            # _processes is a private attribute, so this may break between Python versions
            subprocesses = [p.pid for p in executor._processes.values()]
            executor.shutdown(wait=False, cancel_futures=True)
            for pid in subprocesses:
                os.kill(pid, signal.SIGTERM)
...
Disclaimer: Please don't take this answer as advice for whatever you are trying to achieve. It's just brainstorming based on your code.
The problem is that you cannot cancel a Future once it has started:
Attempt to cancel the call. If the call is currently being executed or finished running and cannot be cancelled then the method will return False, otherwise the call will be cancelled and the method will return True.
To prove it I made the following changes:
from concurrent.futures import (
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    wait as futures_wait,
)
from time import sleep
from datetime import datetime

def test_multiple_threads():
    set_timeout_on_method()
    print("Current Time =", datetime.now()) # Prints time N + 10

def set_timeout_on_method():
    with ThreadPoolExecutor(max_workers=2) as thread_pool:
        print("Current Time =", datetime.now()) # Prints time N
        futures = [thread_pool.submit(sleep, t) for t in (2, 10, 2, 100, 100, 100, 100, 100)]
        futures_wait(futures, return_when=FIRST_COMPLETED)
        print("Current Time =", datetime.now()) # Prints time N + 2
        # cancel() returns False for futures that are already running
        print([i.cancel() if not i.done() else "DONE" for i in futures])
    print("Current Time =", datetime.now()) # Prints time N + 10

if __name__ == '__main__':
    test_multiple_threads()
As you can see, only three of the tasks complete. ThreadPoolExecutor is built on the threading module, and a Thread in Python can't be stopped in any conventional way. Check this answer
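Since a running thread cannot be killed, one common workaround is cooperative cancellation: the task itself periodically checks a shared threading.Event and returns early when it is set. The sketch below assumes you control the task's code, which is not the case for a plain time.sleep or a blocking library call. The names interruptible_sleep and first_done_then_stop are invented for this example.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def interruptible_sleep(seconds, stop_event):
    """Sleep up to `seconds`, returning early if stop_event is set.
    Returns True if the full sleep completed, False if it was interrupted."""
    return not stop_event.wait(seconds)


def first_done_then_stop(durations):
    """Start one task per duration, wait for the first to finish,
    then ask the others to stop. Returns (elapsed seconds, task results)."""
    stop_event = threading.Event()
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        futures = [pool.submit(interruptible_sleep, d, stop_event) for d in durations]
        wait(futures, return_when=FIRST_COMPLETED)
        # The with block still waits for every task on exit, but the
        # remaining tasks now return almost immediately.
        stop_event.set()
    elapsed = time.monotonic() - start
    return elapsed, [f.result() for f in futures]


if __name__ == "__main__":
    elapsed, results = first_done_then_stop([0.2, 2.0])
    print(f"elapsed: {elapsed:.2f}s, completed in full: {results}")
```

Here the with block exits shortly after the shortest task finishes, because the longer task notices the event instead of sleeping to the end.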

python3 - Main thread kills child thread after some timeout?

I'm not sure this is doable with threads in Python. Basically, I have a function that calls the GDAL library to open an image file. This call can get stuck, so if the file cannot be opened within 10 seconds, it should raise an exception.
import threading
import osgeo.gdal as gdal

def test(filepath):
    # After 10 seconds, if the filepath cannot be opened, this function must end and throw an exception.
    # If the filepath can be opened within 10 seconds, it returns the dataset.
    dataset = gdal.Open(filepath)
    return dataset

filepath="http://.../test.tif"
t = threading.Thread(target = test, args = [filepath])
t.start()
# is there something called t.timeout(10)
# and if this thread cannot be finished in 10 seconds, it raises a RuntimeException?
t.join()
I ended up using multiprocessing, together with its Queue, to achieve what I wanted:
import multiprocessing
import time
from multiprocessing import Queue

TIME_OUT = 5

def worker(x, queue):
    time.sleep(15)
    a = (1, 5, 6, 7)
    queue.put(a)

if __name__ == '__main__':
    queue = Queue()
    process = multiprocessing.Process(target=worker, args=(5, queue))
    process.start()
    # The main process waits up to TIME_OUT seconds for the child process
    process.join(TIME_OUT)
    if process.is_alive():
        print("Process hangs!")
        process.terminate()
    print("Process finished")
    # Note: qsize() is approximate and raises NotImplementedError on macOS
    print(queue.qsize())
    if queue.qsize() > 0:
        a, b, _, d = queue.get()
        print(a, b, d)
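The same idea can be wrapped in a small helper that raises an exception on timeout, which is closer to what the question asks for. This is only a sketch: run_with_timeout is a hypothetical name, it reports the child's exit code rather than a return value, and time.sleep stands in for the slow call.

```python
import multiprocessing
import time


def run_with_timeout(target, args, timeout):
    """Run target(*args) in a child process; terminate it and raise
    TimeoutError if it has not finished after `timeout` seconds.
    Returns the child's exit code (0 on normal completion)."""
    process = multiprocessing.Process(target=target, args=args)
    process.start()
    process.join(timeout)
    if process.is_alive():
        process.terminate()
        process.join()  # reap the terminated child
        raise TimeoutError(f"no result after {timeout} seconds")
    return process.exitcode


if __name__ == "__main__":
    print(run_with_timeout(time.sleep, (0.1,), 5))
    try:
        run_with_timeout(time.sleep, (30,), 0.5)
    except TimeoutError as exc:
        print("Timed out:", exc)
```

To get a result back from the child, a Queue can be passed in as in the answer above, provided the result can be pickled.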

Increase in python scripts execution time when using ProcessPoolExecutor

I am observing an increase in the execution time of a Python script when I trigger parallel instances of it using a process pool executor on a 56-core machine. The script abc.py imports a heavy Python library, which takes around 1 second.
time python ~/abc.py
real 0m0.846s
user 0m0.620s
sys 0m0.078s
Test Method
import shlex
from subprocess import Popen, PIPE

def test():
    command = "python /u/deeparora/abc.py"
    p = Popen(shlex.split(command), stdout=PIPE, stderr=PIPE)
    p.wait(timeout=None)
The code below also takes 1 second, which is expected.
Serial Execution
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor

pool = ProcessPoolExecutor(max_workers=1)
futures = []
for index in range(0, 1):
    futures.append(pool.submit(test))
for future in concurrent.futures.as_completed(futures):
    pass
However, the code below takes 5 seconds to execute on the 56-core machine.
Parallel Execution
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor

pool = ProcessPoolExecutor(max_workers=50)
futures = []
for index in range(0, 50):
    futures.append(pool.submit(test))
for future in concurrent.futures.as_completed(futures):
    pass
I checked the execution time in the process logs and noticed that the execution time of the script (abc.py) itself has also increased from 1 to 4 seconds. Can somebody help me understand this behavior?
Check Graph Here
I tried running this and found interesting results.
When the submitted function is too simple, its execution time is less than the pool creation time, so adding more workers increases the total time.
To validate this, see the experiment with sleep(0.001) below.
In the graph, the total time first decreases as the number of workers increases, but beyond a certain point it starts to rise again, because the cost of creating and closing workers exceeds the computation time itself.
import matplotlib.pyplot as plt
from concurrent.futures import ProcessPoolExecutor
from time import sleep, time

values = [3, 4, 5, 6] * 200

def cube(x):
    sleep(0.001)
    return x * x * x

if __name__ == '__main__':
    times = []
    worker_counts = list(range(1, 20))
    for num_workers in worker_counts:
        print(f'Processing with {num_workers} workers')
        st_time = time()
        with ProcessPoolExecutor(max_workers=num_workers) as exe:
            exe.submit(cube, 2)
            # Map the function 'cube' over an iterable
            result = exe.map(cube, values)
        end_time = time()
        times.append(end_time - st_time)
    plt.plot(worker_counts, times)
    plt.title('Number of workers vs time taken to run')
    plt.xlabel('Number of workers')
    # time() returns seconds, not milliseconds
    plt.ylabel('Time taken in s')
    plt.show()

Multi-core parallel computing over a for loop in python-3.x

I have a simple for loop which is to print a number from 1 to 9999 with 5 seconds sleep in between. The code is as below:
import time

def run():
    length = 10000
    for i in range(1, length):
        print(i)
        time.sleep(5)

run()
I want to apply multiprocessing to run the for loop concurrently with multi-cores. So I amended the code above to take 5 cores:
import multiprocessing as mp
import time

def run():
    length = 10000
    for i in range(1, length):
        print(i)
        time.sleep(5)

if __name__ == '__main__':
    p = mp.Pool(5)
    p.map(run())
    p.close()
The job runs without errors, but it does not seem to be running in parallel on 5 cores. How can I get the code to work as expected?
First, you are running the same 1..9999 loop 5 times, and second, you are executing the run function instead of passing it to the .map() method.
You must prepare your queue before passing it to the Pool instance so that all 5 workers process the same queue:
import multiprocessing as mp
import time

def run(i):
    print(i)
    time.sleep(5)

if __name__ == '__main__':
    length = 10000
    queue = range(1, length)
    p = mp.Pool(5)
    p.map(run, queue)
    p.close()
Note that it will process the numbers out of order as explained in the documentation. For example, worker #1 will process 1..500, worker #2 will process 501..1000 etc:
This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
If you want to process the numbers more similarly to the single threaded version, set chunksize to 1:
p.map(run, queue, 1)
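To see where the 1..500 and 501..1000 figures come from, the sketch below reproduces the default chunk size calculation. It mirrors what CPython's Pool.map does internally when chunksize is omitted (the item count divided by four times the number of workers, rounded up). That is an implementation detail rather than a documented guarantee, and both helper names are made up for this illustration.

```python
def default_chunksize(n_items, n_workers):
    """Chunk size CPython's Pool.map uses when chunksize is not given
    (internal behaviour, mirrored here for illustration only)."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def chunk_ranges(first, last, chunksize):
    """Inclusive (start, end) ranges of consecutive chunks covering first..last."""
    return [(start, min(start + chunksize - 1, last))
            for start in range(first, last + 1, chunksize)]


size = default_chunksize(9999, 5)
print(size)                             # 500
print(chunk_ranges(1, 9999, size)[:2])  # [(1, 500), (501, 1000)]
```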

How to give different names to ThreadPoolExecutor threads in Python

I have the below code for creating threads and running them.
from concurrent.futures import ThreadPoolExecutor
import threading

def task(n):
    result = 0
    i = 0
    for i in range(n):
        result = result + i
    print("I: {}".format(result))
    print(f'Thread : {threading.current_thread()} executed with variable {n}')

def main():
    executor = ThreadPoolExecutor(max_workers=3)
    task1 = executor.submit(task, (10))
    task2 = executor.submit(task, (100))

if __name__ == '__main__':
    main()
When I run the code on my Windows 10 machine, this is the output:
I: 45
Thread : <Thread(ThreadPoolExecutor-0_0, started daemon 11956)> executed with variable 10
I: 4950
Thread : <Thread(ThreadPoolExecutor-0_0, started daemon 11956)> executed with variable 100
Process finished with exit code 0
As we can see, both threads have the same name. How do I differentiate between them by giving them different names? Is this somehow a feature of the concurrent.futures module?
Many thanks for any answers.
From the docs:
New in version 3.6: The thread_name_prefix argument was added to allow users to control the threading.Thread names for worker threads created by the pool for easier debugging.
Using the thread_name_prefix argument:
concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='')
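A minimal sketch of the argument in use. The prefix "worker" is an arbitrary choice for this example; each worker thread's name is the prefix followed by an underscore and the thread's index in the pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def current_thread_name(_):
    """Return the name of the worker thread running this task."""
    return threading.current_thread().name


# "worker" is an arbitrary prefix chosen for this example.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as executor:
    names = list(executor.map(current_thread_name, range(4)))

print(names)
```

Which worker picks up which task depends on scheduling, so the exact mix of names can vary between runs.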
You say: "both the threads have the same name".
That's not correct! The name is the same because the same thread is used for both tasks: in fact task() exits immediately.
In order to have both threads involved, you have to add some sleep in your task() function.
Just to recap:
(1) without sleep:
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def task(n):
    result = 0
    i = 0
    for i in range(n): result = result + i
    print(f'{threading.current_thread().name} with variable {n}: {result}')

executor = ThreadPoolExecutor(max_workers=3)
executor.submit(task, (10))
executor.submit(task, (100))
In this case the output will be:
ThreadPoolExecutor-0_0 with variable 10: 45
ThreadPoolExecutor-0_0 with variable 100: 4950
(2) with a sleep inside task(), to make the function longer in time:
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def task(n):
    result = 0
    i = 0
    for i in range(n): result = result + i
    time.sleep(.5)
    print(f'{threading.current_thread().name} with variable {n}: {result}')

executor = ThreadPoolExecutor(max_workers=3)
executor.submit(task, (10))
executor.submit(task, (100))
In this case the output will be:
ThreadPoolExecutor-0_0 with variable 10: 45
ThreadPoolExecutor-0_1 with variable 100: 4950
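A sleep makes the second thread very likely but relies on timing. As a sketch of a deterministic alternative, a threading.Barrier can force both tasks to be in flight at once, so the pool has to start a second worker. The barrier is an addition for this illustration and is not part of the original code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Both tasks must reach the barrier before either continues, so they are
# forced to run on two different worker threads.
barrier = threading.Barrier(2)


def task(n):
    barrier.wait(timeout=5)
    return threading.current_thread().name, sum(range(n))


with ThreadPoolExecutor(max_workers=3) as executor:
    first = executor.submit(task, 10)
    second = executor.submit(task, 100)
    results = [first.result(), second.result()]

for name, total in results:
    print(f"{name}: {total}")
```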
