How to make my Python code faster by using multithreading?

This is the only function in my program that I am calling. The argument 'apps' that I pass is a directory containing 5000 JSON files, on which I do some arithmetic, and it takes me around 25 minutes to process all the files.
Now I want to do the same thing using multithreading to make it faster.
Can someone please help me with this?
import os
import time

def directory(apps):
    for files in os.listdir(apps):
        files_path = os.path.join(apps, files)
        with open(files_path, 'r') as data_file:
            pass  # Doing some stuff

time.sleep(1)
start_time = time.time()
try:
    directory(application)  # 'application' is the path to the JSON directory
    print("===== %s seconds =====" % (time.time() - start_time))
except KeyboardInterrupt:
    print('\nInterrupted')
    print("===== %s seconds =====" % (time.time() - start_time))
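If the per-file work mostly waits on disk I/O, a thread pool can help. If the "arithmetic" is CPU-bound, threads in CPython largely take turns because of the global interpreter lock (GIL), so a process pool is usually the better fit. The sketch below uses concurrent.futures; process_file and its sum-of-numbers body are hypothetical stand-ins for "Doing some stuff", and it assumes each file holds a single JSON object.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor

def process_file(files_path):
    # Hypothetical per-file work: load one JSON object and sum its numeric values.
    with open(files_path, "r") as data_file:
        data = json.load(data_file)
    total = sum(v for v in data.values() if isinstance(v, (int, float)))
    return files_path, total

def directory_parallel(apps, max_workers=8):
    # Build the full list of paths first, then hand them to the pool.
    paths = [os.path.join(apps, name) for name in os.listdir(apps)]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() keeps input order and re-raises any worker exception here.
        return dict(executor.map(process_file, paths))
```

Replacing ThreadPoolExecutor with ProcessPoolExecutor keeps the same interface; the worker function must then be defined at module level, and the call should sit under if __name__ == "__main__":.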

Related

How to exit ThreadPoolExecutor with statement immediately when a future is running

Coming from a .NET background, I am trying to understand Python multithreading using concurrent.futures.ThreadPoolExecutor and submit. I was trying to add a timeout to some code for a test but have realised I don't fully understand some elements of what I'm trying to do. I have put some simplified code below. I would expect the method to return after around 5 seconds, when the call to concurrent.futures.wait(futures, return_when=FIRST_COMPLETED) completes. In fact, it takes the full 10 seconds. I suspect it has to do with my understanding of the with statement, as changing the code to thread_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2) (without the with statement) results in the behaviour I would expect. Adding a call to the shutdown method doesn't do anything, as all the futures are already running. Is there a way to exit the with statement immediately after the call to wait? I have tried using break and return, but they have no effect. I am using Python 3.10.8.
from concurrent.futures import FIRST_COMPLETED
from datetime import datetime
import concurrent.futures
import time

def test_multiple_threads():
    set_timeout_on_method()
    print("Current Time =", datetime.now())  # Prints time N + 10

def set_timeout_on_method():
    futures = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as thread_pool:
        print("Current Time =", datetime.now())  # Prints time N
        futures.append(thread_pool.submit(time.sleep, 5))
        futures.append(thread_pool.submit(time.sleep, 10))
        concurrent.futures.wait(futures, return_when=FIRST_COMPLETED)
        print("Current Time =", datetime.now())  # Prints time N + 5
    print("Current Time =", datetime.now())  # Prints time N + 10
AFAIK, there is no native way to terminate threads from ThreadPoolExecutor and it's supposedly not even a good idea, as described in existing answers (exhibit A, exhibit B).
It is possible to do this with processes in ProcessPoolExecutor, but even then the main process would apparently wait for all the processes that already started:
If wait is False then this method will return immediately and the
resources associated with the executor will be freed when all pending
futures are done executing. Regardless of the value of wait, the
entire Python program will not exit until all pending futures are done
executing.
This means that even though "End #" would be printed after about 5 seconds, the script would terminate after about 20 seconds.
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait
from datetime import datetime
from time import sleep

def multiple_processes():
    print("Start #", datetime.now())
    set_timeout_on_method()
    print("End #", datetime.now())

def set_timeout_on_method():
    futures = []
    with ProcessPoolExecutor() as executor:
        futures.append(executor.submit(sleep, 5))
        futures.append(executor.submit(sleep, 10))
        futures.append(executor.submit(sleep, 20))
        print("Futures created #", datetime.now())
        if wait(futures, return_when=FIRST_COMPLETED):
            print("Shortest future completed #", datetime.now())
            executor.shutdown(wait=False, cancel_futures=True)

if __name__ == "__main__":
    multiple_processes()
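With max_workers set to 1, the entire script would take about 35 seconds because (to my surprise) the last future doesn't get cancelled, despite cancel_futures=True. This is probably because ProcessPoolExecutor hands pending calls to an internal queue for its workers ahead of time and marks them as running when it does so, so by the time shutdown() is called, the 20-second call can no longer be cancelled.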
You could kill the workers, though. This would make the main process finish without delay:
import os
import signal
...
    with ProcessPoolExecutor(max_workers=1) as executor:
        futures.append(executor.submit(sleep, 5))
        futures.append(executor.submit(sleep, 10))
        futures.append(executor.submit(sleep, 20))
        print("Futures created #", datetime.now())
        if wait(futures, return_when=FIRST_COMPLETED):
            print("Shortest future completed #", datetime.now())
            # _processes is a private, undocumented attribute;
            # collect the worker PIDs before calling shutdown()
            subprocesses = [p.pid for p in executor._processes.values()]
            executor.shutdown(wait=False, cancel_futures=True)
            for pid in subprocesses:
                os.kill(pid, signal.SIGTERM)
...
Disclaimer: Please don't take this answer as advice on whatever you are trying to achieve. It's just brainstorming based on your code.
The problem is that you cannot cancel a Future once it has already started:
Attempt to cancel the call. If the call is currently being executed or finished running and cannot be cancelled then the method will return False, otherwise the call will be cancelled and the method will return True.
To prove it I made the following changes:
from concurrent.futures import (
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    wait as futures_wait,
)
from time import sleep
from datetime import datetime

def test_multiple_threads():
    set_timeout_on_method()
    print("Current Time =", datetime.now())  # Prints time N + 10

def set_timeout_on_method():
    with ThreadPoolExecutor(max_workers=2) as thread_pool:
        print("Current Time =", datetime.now())  # Prints time N
        futures = [thread_pool.submit(sleep, t) for t in (2, 10, 2, 100, 100, 100, 100, 100)]
        futures_wait(futures, return_when=FIRST_COMPLETED)
        print("Current Time =", datetime.now())  # Prints time N + 2
        print([i.cancel() if not i.done() else "DONE" for i in futures])
    print("Current Time =", datetime.now())  # Prints time N + 10

if __name__ == '__main__':
    test_multiple_threads()
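As you can see, only three of the tasks actually run: the two that start immediately and, typically, the third, which a worker picks up as soon as the first 2-second task finishes. The rest are cancelled. ThreadPoolExecutor is built on the threading module, and a Thread in Python can't be stopped in any conventional way. Check this answer.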
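Since a running thread cannot be stopped from outside, one common workaround is cooperative cancellation: every task waits on (or periodically checks) a shared threading.Event and returns early once it is set. The sketch below assumes the work can be written that way; here stop_event.wait(timeout) stands in for time.sleep, and interruptible_task and run_until_first_completes are hypothetical names. Because the remaining tasks return promptly after stop_event.set(), the with block exits right after wait() instead of after the longest task.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def interruptible_task(seconds, stop_event):
    # Stand-in for real work: waits up to `seconds`, but wakes up as soon as
    # stop_event is set. Returns True if stopped early, False if it timed out.
    return stop_event.wait(timeout=seconds)

def run_until_first_completes(durations):
    stop_event = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(interruptible_task, d, stop_event) for d in durations]
        wait(futures, return_when=FIRST_COMPLETED)
        stop_event.set()  # ask tasks that are already running to return early
        for f in futures:
            f.cancel()    # only affects tasks that have not started yet
    # Leaving the with block still joins the workers, but they now return quickly.
    return [None if f.cancelled() else f.result() for f in futures]
```

The trade-off is that the work itself has to cooperate; code that blocks without checking the event cannot be interrupted this way.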

ProgressBar same timing as a function

I want the progress bar to run for the same time as a function: it should start when the function starts and stop when the function finishes.
I tried using while loops, but it didn't work out.
import sys
from shutil import rmtree
from time import sleep

def clean(paths):
    for path in paths:
        try:
            rmtree(path)
        except OSError:
            pass

while clean is True:
    for i in range(1):
        sys.stdout.write("%s" % (" " * toolbar_width))
        sys.stdout.flush()
        sys.stdout.write("\b" * (toolbar_width + 1))
    for i in range(toolbar_width):
        sleep(0.1)  # do real work here
        # update the bar
        sys.stdout.write("█████")
        sys.stdout.flush()
    sys.stdout.write("\nDone Cleaning\n")
I expect the progress bar to print it out according to the clean() function
when it is running it will run and when it is done it stops.
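Check that the ":" after while clean is True is present. Beyond that, note that clean is True compares the function object itself with True, which is always False, so the loop body never runs and the bar cannot follow clean(). With the colon in place, the code looks like: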
import sys
from shutil import rmtree
from time import sleep

def clean(paths):
    for path in paths:
        try:
            rmtree(path)
        except OSError:
            pass

while clean is True:
    for i in range(1):
        sys.stdout.write("%s" % (" " * toolbar_width))
        sys.stdout.flush()
        sys.stdout.write("\b" * (toolbar_width + 1))
    for i in range(toolbar_width):
        sleep(0.1)  # do real work here
        # update the bar
        sys.stdout.write("█████")
        sys.stdout.flush()
    sys.stdout.write("\nDone Cleaning\n")
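For the bar to start and stop with clean(), the function has to run concurrently with the drawing loop, for example in a background thread, and the loop should keep drawing only while that thread is alive. Since clean() does not report how far along it is, this sketch draws an indeterminate bar that cycles until the work is done. run_with_progress, width, interval and the "#" fill character are illustrative choices, not part of the original code; pass fill="█" if your terminal handles it.

```python
import shutil
import sys
import threading
import time

def clean(paths):
    # Remove each directory tree, ignoring ones that are missing or locked.
    for path in paths:
        try:
            shutil.rmtree(path)
        except OSError:
            pass

def run_with_progress(func, *args, width=20, interval=0.1, fill="#"):
    worker = threading.Thread(target=func, args=args)
    worker.start()
    filled = 0
    while worker.is_alive():
        # Grow the bar by one cell per tick, starting over when it is full.
        filled = filled % width + 1
        sys.stdout.write("\r[" + fill * filled + " " * (width - filled) + "]")
        sys.stdout.flush()
        time.sleep(interval)
    worker.join()
    # The work is finished: show a full bar and the completion message.
    sys.stdout.write("\r[" + fill * width + "]\nDone Cleaning\n")
    sys.stdout.flush()
```

Usage would be run_with_progress(clean, paths) in place of the while clean is True loop.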

Why doesn't time pass inside my thread in Python?

I've written code that has to display a message at a certain time. I tried it with a counter and it worked, but when using "strftime" it doesn't work. Can anyone explain what I did wrong?
Below you'll find the code I used.
import datetime, threading, time

now = datetime.datetime.now()

def foo():
    counter = 0
    next_call = time.time()
    while True:
        #counter +=1
        if now.strftime("%H:%M:%S") >= "22:00:00":
            print("go to sleep")
            time.sleep(30)
        else:
            print(datetime.datetime.now())
            next_call = next_call + 1
            time.sleep(next_call - time.time())

timerThread = threading.Thread(target=foo)
timerThread.start()
You never change the value of 'now'. Consequently, it stays fixed at the time you started running this program.
You need to update it inside the loop.
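A minimal sketch of the loop with now re-read on every pass. The comparison is pulled into a hypothetical helper, is_bedtime, so it can be checked on its own; comparing zero-padded "%H:%M:%S" strings works because they sort in the same order as the times they represent. Two small additions are assumptions of this sketch: resetting next_call after the 30-second sleep, and the max(0, ...) guard, which keeps time.sleep from receiving a negative value if an iteration overruns.

```python
import datetime
import time

def is_bedtime(now, bedtime="22:00:00"):
    # Zero-padded HH:MM:SS strings compare in chronological order.
    return now.strftime("%H:%M:%S") >= bedtime

def foo():
    next_call = time.time()
    while True:
        now = datetime.datetime.now()  # read the clock again on every iteration
        if is_bedtime(now):
            print("go to sleep")
            time.sleep(30)
            next_call = time.time()  # restart the one-second schedule afterwards
        else:
            print(now)
            next_call += 1
            time.sleep(max(0, next_call - time.time()))
```

It can be started exactly as before, with threading.Thread(target=foo).start().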

Track execution time in python

I want to know how long my code takes to run and have tried using the following code which was found on a different thread.
import time
start_time = time.time()
# my code
main()
print("--- %s seconds ---" % (time.time() - start_time))
but it does not seem to print the time it takes. I've included pictures of my code since the code is fairly long. Is there something I'm doing wrong?
You might have a bug in your code, because this works:
import time

def dosomething():
    counter = 10
    while counter > 0:
        counter -= 1
    print('getting ready to check the elapsed time')

def main():
    start_time = time.time()
    dosomething()
    print("--- %s seconds ---" % (time.time() - start_time))

if __name__ == "__main__":
    main()
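If main() raises an exception or calls sys.exit(), the print after it never runs, which would look exactly like "it doesn't print the time". One way to make sure the elapsed time is always reported is to put the print in a finally block. This sketch also uses time.perf_counter(), which is intended for measuring durations; the helper name timed is hypothetical.

```python
import time

def timed(func, *args, **kwargs):
    # Measure func's wall-clock run time and always report it.
    start = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        # Runs on normal return, on exceptions, and on sys.exit() (SystemExit).
        print("--- %s seconds ---" % (time.perf_counter() - start))
```

Calling timed(main) instead of main() would print the duration even when main() fails; it will still print nothing if main() never returns at all.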

How to emulate terminal interaction like PuTTY using pySerial

I tried to use pySerial to build a simple terminal to interact with COM1 on my PC.
I create two threads, one for reading and the other for writing:
import sys
import time
from threading import Thread

def write_comport():
    global ser, cmd, log_file
    print("enter Q to quit")
    while True:
        cmd = raw_input(">>:")
        # print(repr(var))
        if cmd == 'Q':
            ser.close()
            sys.exit()
            log_file.close()
        else:
            ser.write(cmd + '\r\n')
            write_to_file("[Input]" + cmd, log_file)
            time.sleep(1)
        pass

def read_comport():
    global ser, cmd, log_file
    while True:
        element = ser.readline().strip('\n')
        if "~ #" in str(element):
            continue
        if cmd == str(element).strip():
            continue
        if "img" in str(element):
            print("got:" + element)
            beep()
        print element
        write_to_file(cmd, log_file)
        pass

def write_to_file(str, f):
    f.write(str)
    f.flush

def main():
    try:
        global read_thr, write_thr
        beep()
        port_num = 'COM4'
        baudrate = 115200
        init_serial_port(port_num, baudrate)
        read_thr = Thread(target=read_comport)
        read_thr.start()
        write_thr = Thread(target=write_comport)
        write_thr.start()
        while True:
            pass
    except Exception as e:
        print(e)
        exit_prog()
But the behavior of my code is not as smart as PuTTY or any other terminal, because my function cannot detect when the reader is done.
Is there any better way to achieve this goal?
By the way, I tried to save the log to a txt file in real time, but when I open the file while the process is running, it seems nothing has been written to my log file. Why?
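Two details in the code above may explain this. In write_to_file, f.flush has no parentheses, so it only looks up the method and never calls it; the text stays in Python's file buffer until the buffer fills or the file is closed. And a terminal like PuTTY does not try to detect when the device is "done": it simply prints bytes as they arrive. The sketch below follows that approach under a few assumptions: port is a pySerial Serial opened with a short read timeout (so read() returns b"" when nothing has arrived), a shared threading.Event replaces sys.exit() for stopping (sys.exit() inside a thread only ends that thread), the code targets Python 3 (input instead of raw_input, bytes for port.write), and reader_loop and writer_loop are hypothetical names.

```python
import sys

def reader_loop(port, log_file, stop_event, out=None):
    # Echo and log whatever bytes have arrived, without waiting for a full line.
    if out is None:
        out = sys.stdout
    while not stop_event.is_set():
        chunk = port.read(256)  # with a read timeout, b"" means "nothing yet"
        if not chunk:
            continue
        text = chunk.decode("utf-8", errors="replace")
        out.write(text)
        out.flush()
        log_file.write(text)
        log_file.flush()  # note the (): the method is actually called

def writer_loop(port, log_file, stop_event, read_line=input):
    # Send each typed line followed by CR LF until the user enters Q.
    while not stop_event.is_set():
        cmd = read_line()
        if cmd == "Q":
            stop_event.set()  # tells reader_loop to return as well
            break
        port.write((cmd + "\r\n").encode("utf-8"))
        log_file.write("[Input]" + cmd + "\n")
        log_file.flush()
```

With pySerial, port could be serial.Serial("COM4", 115200, timeout=0.1): start reader_loop in a Thread, run writer_loop in the main thread, then join the reader and close the port and the log file.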
