I am observing an increase in the execution time of a Python script when I trigger parallel instances of it using ProcessPoolExecutor on a 56-core machine. The script abc.py imports a heavy Python library, which takes around 1 second.
time python ~/abc.py
real 0m0.846s
user 0m0.620s
sys 0m0.078s
Test Method
import shlex
from subprocess import Popen, PIPE
def test():
    command = "python /u/deeparora/abc.py"
    p = Popen(shlex.split(command), stdout=PIPE, stderr=PIPE)
    p.wait(timeout=None)
The code below also takes 1 second, which is expected.
Serial Execution
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor

pool = ProcessPoolExecutor(max_workers=1)
futures = []
for index in range(0, 1):
    futures.append(pool.submit(test))
for future in concurrent.futures.as_completed(futures):
    pass
However, the code below takes 5 seconds to execute on the 56-core machine.
Parallel Execution
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor

pool = ProcessPoolExecutor(max_workers=50)
futures = []
for index in range(0, 50):
    futures.append(pool.submit(test))
for future in concurrent.futures.as_completed(futures):
    pass
I checked the execution time in the process logs and noticed that the script's (abc.py) execution time has also increased from 1 to 4 seconds. Can somebody help me understand this behavior?
Check Graph Here
I tried to run this and found interesting results.
When the given function is too simple, the function execution time is less than the pool creation time, so adding more workers increases the total time.
To validate this, check the experiment with sleep(0.001) below.
From the graph, the total time first decreases as I increase the number of workers, but after a point it begins to increase, because the cost of creating and closing workers is higher than the computation itself.
from concurrent.futures import ProcessPoolExecutor
from time import sleep, time

import matplotlib.pyplot as plt

values = [3, 4, 5, 6] * 200

def cube(x):
    sleep(0.001)
    return x * x * x

times = []
total_threads = [i for i in range(1, 20)]
for num_threads in range(1, 20):
    print(f'Processing thread: {num_threads}')
    st_time = time()
    with ProcessPoolExecutor(max_workers=num_threads) as exe:
        exe.submit(cube, 2)
        # Maps the method 'cube' over an iterable
        result = exe.map(cube, values)
    end_time = time()
    times.append(end_time - st_time)

plt.plot(total_threads, times)
plt.title('Number of threads vs Time taken to Run')
plt.xlabel('Number of Threads')
plt.ylabel('Time taken in ms')
Check Graph Here
Related
Coming from a .Net background, I am trying to understand Python multithreading using concurrent.futures.ThreadPoolExecutor and submit. I was trying to add a timeout to some code for a test, but have realised I don't exactly understand some elements of what I'm trying to do. I have put some simplified code below.
I would expect the method to return after around 5 seconds, when the call to concurrent.futures.wait(futures, return_when=FIRST_COMPLETED) completes. In fact it takes the full 10 seconds. I suspect it has to do with my understanding of the with statement, as changing the code to thread_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2) results in the behaviour I would expect. Adding a call to the shutdown method doesn't do anything, as all the futures are already running.
Is there a way to exit out of the with statement immediately following the call to wait? I have tried using break and return but they have no effect. I am using Python 3.10.8.
from concurrent.futures import FIRST_COMPLETED
from datetime import datetime
import threading
import concurrent
import time

def test_multiple_threads():
    set_timeout_on_method()
    print("Current Time =", datetime.now())  # Prints time N + 10

def set_timeout_on_method():
    futures = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as thread_pool:
        print("Current Time =", datetime.now())  # Prints time N
        futures.append(thread_pool.submit(time.sleep, 5))
        futures.append(thread_pool.submit(time.sleep, 10))
        concurrent.futures.wait(futures, return_when=FIRST_COMPLETED)
        print("Current Time =", datetime.now())  # Prints time N + 5
    print("Current Time =", datetime.now())  # Prints time N + 10
AFAIK, there is no native way to terminate threads from ThreadPoolExecutor and it's supposedly not even a good idea, as described in existing answers (exhibit A, exhibit B).
It is possible to do this with processes in ProcessPoolExecutor, but even then the main process would apparently wait for all the processes that already started:
If wait is False then this method will return immediately and the resources associated with the executor will be freed when all pending futures are done executing. Regardless of the value of wait, the entire Python program will not exit until all pending futures are done executing.
This means that even though "End #" would be printed after about 5 seconds, the script would terminate after about 20 seconds.
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait
from datetime import datetime
from time import sleep

def multiple_processes():
    print("Start #", datetime.now())
    set_timeout_on_method()
    print("End #", datetime.now())

def set_timeout_on_method():
    futures = []
    with ProcessPoolExecutor() as executor:
        futures.append(executor.submit(sleep, 5))
        futures.append(executor.submit(sleep, 10))
        futures.append(executor.submit(sleep, 20))
        print("Futures created #", datetime.now())
        if wait(futures, return_when=FIRST_COMPLETED):
            print("Shortest future completed #", datetime.now())
            executor.shutdown(wait=False, cancel_futures=True)

if __name__ == "__main__":
    multiple_processes()
With max_workers set to 1, the entire script would take about 35 seconds because (to my surprise) the last future doesn't get cancelled, despite cancel_futures=True.
You could kill the workers, though. This would make the main process finish without delay:
...
# os and signal need to be imported in the elided part above
with ProcessPoolExecutor(max_workers=1) as executor:
    futures.append(executor.submit(sleep, 5))
    futures.append(executor.submit(sleep, 10))
    futures.append(executor.submit(sleep, 20))
    print("Futures created #", datetime.now())
    if wait(futures, return_when=FIRST_COMPLETED):
        print("Shortest future completed #", datetime.now())
        subprocesses = [p.pid for p in executor._processes.values()]
        executor.shutdown(wait=False, cancel_futures=True)
        for pid in subprocesses:
            os.kill(pid, signal.SIGTERM)
...
Disclaimer: please don't take this answer as advice for whatever you are trying to achieve. It's just brainstorming based on your code.
The problem is that you cannot cancel a Future if it has already started:
Attempt to cancel the call. If the call is currently being executed or finished running and cannot be cancelled then the method will return False, otherwise the call will be cancelled and the method will return True.
To prove it I made the following changes:
from concurrent.futures import (
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    wait as futures_wait,
)
from time import sleep
from datetime import datetime

def test_multiple_threads():
    set_timeout_on_method()
    print("Current Time =", datetime.now())  # Prints time N + 10

def set_timeout_on_method():
    with ThreadPoolExecutor(max_workers=2) as thread_pool:
        print("Current Time =", datetime.now())  # Prints time N
        futures = [thread_pool.submit(sleep, t) for t in (2, 10, 2, 100, 100, 100, 100, 100)]
        futures_wait(futures, return_when=FIRST_COMPLETED)
        print("Current Time =", datetime.now())  # Prints time N + 2
        print([i.cancel() if not i.done() else "DONE" for i in futures])
    print("Current Time =", datetime.now())  # Prints time N + 10

if __name__ == '__main__':
    test_multiple_threads()
As you can see, only three of the tasks are done. ThreadPoolExecutor is actually based on the threading module, and a Thread in Python can't be stopped in any conventional way. Check this answer.
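Since the threads themselves can't be killed, a common alternative is cooperative cancellation: each worker periodically checks a shared flag and returns early once it is set. Below is a minimal sketch of that idea; the stop event, the interruptible_sleep helper and the timings are purely illustrative and not part of the code above.

from concurrent.futures import ThreadPoolExecutor
from threading import Event
from time import sleep

stop = Event()  # shared flag that the workers poll

def interruptible_sleep(seconds):
    # Sleep in small slices so the thread notices the stop request quickly
    for _ in range(int(seconds * 10)):
        if stop.is_set():
            return "cancelled"
        sleep(0.1)
    return "done"

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(interruptible_sleep, t) for t in (2, 10, 100)]
    sleep(3)    # let the shortest task finish
    stop.set()  # ask the remaining workers to wind down
    print([f.result() for f in futures])  # ['done', 'cancelled', 'cancelled']

The with block then exits almost immediately, because every worker returns as soon as it sees the flag.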
When using tqdm with multithreading, tqdm seems to jump down a line and overwrite what was there when one thread finishes. It seems to snap back once all threads have finished, but I have some long running threads and the progress bars look pretty bad as it is.
I created an example program to be able to replicate the issue. I basically just stripped out all of the business logic and replaced it with sleeps.
from concurrent.futures import ThreadPoolExecutor
from tqdm.auto import tqdm
from time import sleep
from random import randrange

def myf(instance: int, name: str):
    rand_size = randrange(75, 150)
    total_lines = 0
    # Simulate getting file size
    # Yes there's probably a better way to get the line count, but this
    # was quick and dirty and works well enough. The sleep is just there
    # to slow it down for the example
    for _ in tqdm(
        iterable=range(rand_size),
        position=instance,
        desc=f'GETTING LINE COUNT: {name}',
        leave=False
    ):
        sleep(0.1)
        total_lines += 1
    # Simulate the processing
    for record in tqdm(
        iterable=range(rand_size),
        total=total_lines,
        position=instance,
        desc=name
    ):
        sleep(0.2)

def main():
    myf_args = []
    for i in range(10):
        myf_args.append({
            'instance': i,
            'name': f'Thread-{i}'
        })
    with ThreadPoolExecutor(max_workers=len(myf_args)) as executor:
        executor.map(lambda f: myf(**f), myf_args)

if __name__ == "__main__":
    main()
I'm looking for a way to keep the progress bars in place and looking neat as it's running so I can get a good idea of the progress of each thread. When googling the issue, all I can find are people having an issue where it prints a new line every iteration, which isn't really applicable here.
Running the code below, I noticed that executor.submit(printer, i) is called for each value of i in range(100) before even the first process finishes. However, since I have set max_workers=3, only three processes can run at a time. Say the program starts and processes for values zero through two are running; at this moment, what happens to the executor.submit(printer, i) called for values three through ninety-nine? And if the answer is "they're stored in memory", is there a way I can calculate how much memory each pending process might take?
import time
from concurrent.futures import ProcessPoolExecutor

def printer(i):
    print(i)
    end_time = time.time() + 1
    while time.time() < end_time:
        pass

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=3) as executor:
        for i in range(100):
            print(i)
            executor.submit(printer, i)
Also, would it be the same if I were to use executor.map(printer, range(100)) instead of the loop?
I am having an issue with even the most basic task using the multiprocessing.Pool method.
It seems to be working, but it never finishes even the simplest task.
Could you please help me figure out what I am doing wrong?
I read some articles and tried to understand it, but couldn't figure it out. I added a short example (with list(map(squared, range(2_000_000))) it works, but not the code below).
Thanks in advance,
Roland
"""
from multiprocessing import Pool
import time

process_pool = Pool(processes=4)

def squared(n):
    return n ** 2

start = time.perf_counter()
process_pool.apply(squared, range(2_000_000))
end = time.perf_counter() - start
print(f"Run time: {end}")
"""
This looks like a case for multithreading. Have you tried something like:
from concurrent.futures import ThreadPoolExecutor, as_completed

num_of_threads = 50  # Number of threads executing at the same time

with ThreadPoolExecutor(max_workers=num_of_threads) as executor:
    tasks = []
    for i in i_list:
        tasks.append(
            executor.submit(
                <Function_to_execute>, i
            )
        )
    for future in as_completed(tasks):
        if future.result():
            yield future.result()  # This can just be a return; with yield you return a generator
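For illustration only, here is the same pattern with the placeholders filled in by a trivial hypothetical function (squared) and wrapped in a generator function so the yield is valid; none of these names come from the question.

from concurrent.futures import ThreadPoolExecutor, as_completed

def squared(n):
    return n ** 2

def run_all(numbers, num_of_threads=50):
    # Submit one task per item, then yield results as they complete
    with ThreadPoolExecutor(max_workers=num_of_threads) as executor:
        tasks = [executor.submit(squared, i) for i in numbers]
        for future in as_completed(tasks):
            if future.result():
                yield future.result()

print(sorted(run_all(range(1, 6))))  # [1, 4, 9, 16, 25]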
I think you want imap() (and move squared() before you define the Pool):
from multiprocessing import Pool
import time

def squared(n):
    return n ** 2

process_pool = Pool(processes=4)

start = time.perf_counter()
process_pool.imap(squared, range(2))
end = time.perf_counter() - start
print(f"Run time: {end}")
Just keep in mind this is not a very representative example, since you don't do anything with the results; something better would be:
with Pool(4) as pool:
    results = pool.imap(squared, range(2_000_000))
    for result in results:
        pass  # do something here with the result
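One caveat: imap() is lazy, so timing only the imap() call itself measures almost nothing. A rough sketch of a fairer measurement is below; the chunksize value is an illustrative guess, not a tuned number.

from multiprocessing import Pool
import time

def squared(n):
    return n ** 2

# The __main__ guard matters on platforms that spawn worker processes
if __name__ == "__main__":
    start = time.perf_counter()
    with Pool(4) as pool:
        # Consuming the iterator is what actually forces the work to run
        total = sum(pool.imap(squared, range(2_000_000), chunksize=10_000))
    end = time.perf_counter() - start
    print(f"Run time: {end:.2f}s, total: {total}")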
I have a simple for loop which prints the numbers from 1 to 9999, with a 5-second sleep in between. The code is as below:
import time

def run():
    length = 10000
    for i in range(1, length):
        print(i)
        time.sleep(5)

run()
I want to apply multiprocessing to run the for loop concurrently on multiple cores. So I amended the code above to use 5 cores:
import multiprocessing as mp
import time

def run():
    length = 10000
    for i in range(1, length):
        print(i)
        time.sleep(5)

if __name__ == '__main__':
    p = mp.Pool(5)
    p.map(run())
    p.close()
There is no issue in running the job, but it seems like it is not running in parallel across 5 cores. How can I get the code to work as expected?
First, you are running the same 1..9999 loop 5 times, and second, you are executing the run function instead of passing it to the .map() method.
You must prepare your queue before passing it to the Pool instance so that all 5 workers process the same queue:
import multiprocessing as mp
import time

def run(i):
    print(i)
    time.sleep(5)

if __name__ == '__main__':
    length = 10000
    queue = range(1, length)
    p = mp.Pool(5)
    p.map(run, queue)
    p.close()
Note that it will not process the numbers in a single interleaved sequence, as explained in the documentation. For example, worker #1 will process 1..500, worker #2 will process 501..1000, etc.:
This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
If you want to process the numbers more similarly to the single-threaded version, set chunksize to 1:
p.map(run, queue, 1)
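To make the difference visible, here is a small sketch with the range shrunk to 20 numbers and explicit chunksizes; the worker pids it prints are obviously machine-specific.

import multiprocessing as mp
import os
import time

def run(i):
    print(f"worker {os.getpid()} got {i}")
    time.sleep(0.1)

if __name__ == '__main__':
    with mp.Pool(5) as p:
        p.map(run, range(1, 21), chunksize=4)  # each worker gets a block of 4 consecutive numbers
        p.map(run, range(1, 21), chunksize=1)  # numbers are handed out one at a time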