When using tqdm with multithreading, tqdm seems to jump down a line and overwrite what was there when one thread finishes. It seems to snap back once all threads have finished, but I have some long running threads and the progress bars look pretty bad as it is.
I created an example program to be able to replicate the issue. I basically just stripped out all of the business logic and replaced it with sleeps.
from concurrent.futures import ThreadPoolExecutor
from tqdm.auto import tqdm
from time import sleep
from random import randrange
def myf(instance: int, name: str):
rand_size = randrange(75, 150)
total_lines = 0
# Simulate getting file size
# Yes there's probably a better way to get the line count, but this
# was quick and dirty and works well enough. The sleep is just there
# to slow it down for the example
for _ in tqdm(
iterable=range(rand_size),
position=instance,
desc=f'GETTING LINE COUNT: {name}',
leave=False
):
sleep(0.1)
total_lines += 1
# Simulate the processing
for record in tqdm(
iterable=range(rand_size),
total=total_lines,
position=instance,
desc=name
):
sleep(0.2)
def main():
myf_args = []
for i in range(10):
myf_args.append({
'instance': i,
'name': f'Thread-{i}'
})
with ThreadPoolExecutor(max_workers=len(myf_args)) as executor:
executor.map(lambda f: myf(**f), myf_args)
if __name__ == "__main__":
main()
I'm looking for a way to keep the progress bars in place and looking neat as it's running so I can get a good idea of the progress of each thread. When googling the issue, all I can find are people having an issue where it prints a new line every iteration, which isn't really applicable here.
Related
Running the code below, I noticed that executor.submit(printer, i) is called for each value of i in range(100) before even the first process finishes. However, since I have set max_workers=3, only three processes can run at a time. Say the program starts and processes for values zero through two are running; at this moment, what happens to the executor.submit(printer, i) called for values three through ninety-nine? And if the answer is "they're stored in memory", is there a way I can calculate how much memory each pending process might take?
import time
from concurrent.futures import ProcessPoolExecutor
def printer(i):
print(i)
end_time = time.time() + 1
while time.time() < end_time:
pass
if __name__ == "__main__":
with ProcessPoolExecutor(max_workers=3) as executor:
for i in range(100):
print(i)
executor.submit(printer, i)
Also, would it be the same if I were to use executor.map(printer, range(100)) instead of the loop?
I am observing increase in execution time of python script when I trigger parallel instances of it using process pool executor on a 56 core machine. The script abc.py imports a heavy python library which takes around 1 seconds.
time python ~/abc.py
real 0m0.846s
user 0m0.620s
sys 0m0.078s
Test Method
import shlex
from subprocess import Popen, PIPE
def test():
command = "python /u/deeparora/abc.py"
p = Popen(shlex.split(command), stdout=PIPE, stderr=PIPE)
p.wait(timeout=None)
Below code also takes 1 second which is expected
Serial Execution
import concurrent.futures
pool = ProcessPoolExecutor(max_workers=1)
futures = []
for index in range(0, 1):
futures.append(pool.submit(test))
for future in concurrent.futures.as_completed(futures):
pass
However the below code takes 5 seconds to execute on 56 core machine
Parallel Execution
import concurrent.futures
pool = ProcessPoolExecutor(max_workers=50)
futures = []
for index in range(0, 50):
futures.append(pool.submit(test))
for future in concurrent.futures.as_completed(futures):
pass
I checked the execution time in process logs and could notice that now the script (abc.py) execution time has also increased from 1 to 4 seconds. Can somebody help me understand this behavior?
Check Graph Here
I tried to run this. and found interesting results.
When the function given is too simple. Then Function execution time < Multi Pool creation Time. So adding more workers will increases the total time.
To validate this, Check the experiment with sleep(0.001) below.
From Graph, First total time reduces when I increases workers but then after a point, total time begins to increase because cost of creating and closing workers is higher than the calculation time itself.
from concurrent.futures import ProcessPoolExecutor
from time import sleep, time
values = [3,4,5,6] * 200
def cube(x):
sleep(0.001)
return x*x*x
times = []
total_threds = [i for i in range(1, 20)]
for num_tread in range(1, 20):
print(f'Processing thread: {num_tread}')
st_time = time()
with ProcessPoolExecutor(max_workers=num_tread) as exe:
exe.submit(cube,2)
# Maps the method 'cube' with a iterable
result = exe.map(cube,values)
end_time = time()[enter image description here][1]
times.append(end_time - st_time)
plt.plot(total_threds, times)
plt.title('Number of threads vs Time taken to Run')
plt.xlabel('Number of Threads')
plt.ylabel('Time taken in ms')
Check Graph Here
I am trying to use multiprocessing for the below code. The code seems to run a bit faster than the for loop inside the function.
How can I confirm I using the library and not the just the for loop?
from multiprocessing import Pool
from multiprocessing import cpu_count
import requests
import pandas as pd
data= pd.read_csv('~/Downloads/50kNAE000.txt.1' ,sep="\t", header=None)
data = data[0].str.strip("0 ")
lst = []
def request(x):
for i,v in x.items():
print(i)
file = requests.get(v)
lst.append(file.text)
#time.sleep(1)
if __name__ == "__main__":
pool = Pool(cpu_count())
results = pool.map(request(data))
pool.close() # 'TERM'
pool.join() # 'KILL'
Multiprocessing has overhead. It has to start the process and transfer function data via interprocess mechanism. Just running a single function in another process vs. running that same function normally is always going to be slower. The advantage is actually doing parallelism with significant work in the functions that makes the overhead minimal.
You can call multiprocessing.current_process().name to see the process name change.
I am following the principles laid down in this post to safely output the results which will eventually be written to a file. Unfortunately, the code only print 1 and 2, and not 3 to 6.
import os
import argparse
import pandas as pd
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep
def feed(queue, parlist):
for par in parlist:
queue.put(par)
print("Queue size", queue.qsize())
def calc(queueIn, queueOut):
while True:
try:
par=queueIn.get(block=False)
res=doCalculation(par)
queueOut.put((res))
queueIn.task_done()
except:
break
def doCalculation(par):
return par
def write(queue):
while True:
try:
par=queue.get(block=False)
print("response:",par)
except:
break
if __name__ == "__main__":
nthreads = 2
workerQueue = Queue()
writerQueue = Queue()
considerperiod=[1,2,3,4,5,6]
feedProc = Process(target=feed, args=(workerQueue, considerperiod))
calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
writProc = Process(target=write, args=(writerQueue,))
feedProc.start()
feedProc.join()
for p in calcProc:
p.start()
for p in calcProc:
p.join()
writProc.start()
writProc.join()
On running the code it prints,
$ python3 tst.py
Queue size 6
response: 1
response: 2
Also, is it possible to ensure that the write function always outputs 1,2,3,4,5,6 i.e. in the same order in which the data is fed into the feed queue?
The error is somehow with the task_done() call. If you remove that one, then it works, don't ask me why (IMO that's a bug). But the way it works then is that the queueIn.get(block=False) call throws an exception because the queue is empty. This might be just enough for your use case, a better way though would be to use sentinels (as suggested in the multiprocessing docs, see last example). Here's a little rewrite so your program uses sentinels:
import os
import argparse
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep
def feed(queue, parlist, nthreads):
for par in parlist:
queue.put(par)
for i in range(nthreads):
queue.put(None)
print("Queue size", queue.qsize())
def calc(queueIn, queueOut):
while True:
par=queueIn.get()
if par is None:
break
res=doCalculation(par)
queueOut.put((res))
def doCalculation(par):
return par
def write(queue):
while not queue.empty():
par=queue.get()
print("response:",par)
if __name__ == "__main__":
nthreads = 2
workerQueue = Queue()
writerQueue = Queue()
considerperiod=[1,2,3,4,5,6]
feedProc = Process(target=feed, args=(workerQueue, considerperiod, nthreads))
calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
writProc = Process(target=write, args=(writerQueue,))
feedProc.start()
feedProc.join()
for p in calcProc:
p.start()
for p in calcProc:
p.join()
writProc.start()
writProc.join()
A few things to note:
the sentinel is putting a None into the queue. Note that you need one sentinel for every worker process.
for the write function you don't need to do the sentinel handling as there's only one process and you don't need to handle concurrency (if you would do the empty() and then get() thingie in your calc function you would run into a problem if e.g. there's only one item left in the queue and both workers check empty() at the same time and then both want to do get() and then one of them is locked forever)
you don't need to put feed and write into processes, just put them into your main function as you don't want to run it in parallel anyway.
how can I have the same order in output as in input? [...] I guess multiprocessing.map can do this
Yes map keeps the order. Rewriting your program into something simpler (as you don't need the workerQueue and writerQueue and adding random sleeps to prove that the output is still in order:
from multiprocessing import Pool
import time
import random
def calc(val):
time.sleep(random.random())
return val
if __name__ == "__main__":
considerperiod=[1,2,3,4,5,6]
with Pool(processes=2) as pool:
print(pool.map(calc, considerperiod))
I have a function that yields lines from a huge CSV file lazily:
def get_next_line():
with open(sample_csv,'r') as f:
for line in f:
yield line
def do_long_operation(row):
print('Do some operation that takes a long time')
I need to use threads such that each record I get from the above function I can call do_long_operation.
Most places on Internet have examples like this, and I am not very sure if I am on the right path.
import threading
thread_list = []
for i in range(8):
t = threading.Thread(target=do_long_operation, args=(get_next_row from get_next_line))
thread_list.append(t)
for thread in thread_list:
thread.start()
for thread in thread_list:
thread.join()
My questions are:
How do I start only a finite number of threads, say 8?
How do I make sure that each of the threads will get a row from get_next_line?
You could use a thread pool from multiprocessing and map your tasks to a pool of workers:
from multiprocessing.pool import ThreadPool as Pool
# from multiprocessing import Pool
from random import randint
from time import sleep
def process_line(l):
print l, "started"
sleep(randint(0, 3))
print l, "done"
def get_next_line():
with open("sample.csv", 'r') as f:
for line in f:
yield line
f = get_next_line()
t = Pool(processes=8)
for i in f:
t.map(process_line, (i,))
t.close()
t.join()
This will create eight workers and submit your lines to them, one by one. As soon as a process is "free", it will be allocated a new task.
There is a commented out import statement, too. If you comment out the ThreadPool and import Pool from multiprocessing instead, you will get subprocesses instead of threads, which may be more efficient in your case.
Using a Pool/ThreadPool from multiprocessing to map tasks to a pool of workers and a Queue to control how many tasks are held in memory (so we don't read too far ahead into the huge CSV file if worker processes are slow):
from multiprocessing.pool import ThreadPool as Pool
# from multiprocessing import Pool
from random import randint
import time, os
from multiprocessing import Queue
def process_line(l):
print("{} started".format(l))
time.sleep(randint(0, 3))
print("{} done".format(l))
def get_next_line():
with open(sample_csv, 'r') as f:
for line in f:
yield line
# use for testing
# def get_next_line():
# for i in range(100):
# print('yielding {}'.format(i))
# yield i
def worker_main(queue):
print("{} working".format(os.getpid()))
while True:
# Get item from queue, block until one is available
item = queue.get(True)
if item == None:
# Shutdown this worker and requeue the item so other workers can shutdown as well
queue.put(None)
break
else:
# Process item
process_line(item)
print("{} done working".format(os.getpid()))
f = get_next_line()
# Use a multiprocessing queue with maxsize
q = Queue(maxsize=5)
# Start workers to process queue items
t = Pool(processes=8, initializer=worker_main, initargs=(q,))
# Enqueue items. This blocks if the queue is full.
for l in f:
q.put(l)
# Enqueue the shutdown message (i.e. None)
q.put(None)
# We need to first close the pool before joining
t.close()
t.join()
Hannu's answer is not the best method.
I ran the code on a 100M rows CSV file. It took me forever to perform the operation.
However, prior to reading his answer, I had written the following code:
def call_processing_rows_pickably(row):
process_row(row)
import csv
from multiprocessing import Pool
import time
import datetime
def process_row(row):
row_to_be_printed = str(row)+str("hola!")
print(row_to_be_printed)
class process_csv():
def __init__(self, file_name):
self.file_name = file_name
def get_row_count(self):
with open(self.file_name) as f:
for i, l in enumerate(f):
pass
self.row_count = i
def select_chunk_size(self):
if(self.row_count>10000000):
self.chunk_size = 100000
return
if(self.row_count>5000000):
self.chunk_size = 50000
return
self.chunk_size = 10000
return
def process_rows(self):
list_de_rows = []
count = 0
with open(self.file_name, 'rb') as file:
reader = csv.reader(file)
for row in reader:
print(count+1)
list_de_rows.append(row)
if(len(list_de_rows) == self.chunk_size):
p.map(call_processing_rows_pickably, list_de_rows)
del list_de_rows[:]
def start_process(self):
self.get_row_count()
self.select_chunk_size()
self.process_rows()
initial = datetime.datetime.now()
p = Pool(4)
ob = process_csv("100M_primes.csv")
ob.start_process()
final = datetime.datetime.now()
print(final-initial)
This took 22 minutes. Obviously, I need to have more improvements. For example, the Fred library in R takes 10 minutes maximum to do this task.
The difference is: I am creating a chunk of 100k rows first, and then I pass it to a function which is mapped by threadpool(here, 4 threads).