python speedup a simple function - python-3.x

I try to find a simple way to "speed up" simple functions for a big script so I googled for it and found 3 ways to do that.
but it seems the time they need is always the same.
so what I am doing wrong testing them?
file1:
from concurrent.futures import ThreadPoolExecutor as PoolExecutor
from threading import Thread
import time
import os
import math
#https://dev.to/rhymes/how-to-make-python-code-concurrent-with-3-lines-of-code-2fpe
def benchmark():
start = time.time()
for i in range (0, 40000000):
x = math.sqrt(i)
print(x)
end = time.time()
print('time', end - start)
with PoolExecutor(max_workers=3) as executor:
for _ in executor.map((benchmark())):
pass
file2:
#the basic way
from threading import Thread
import time
import os
import math
def calc():
start = time.time()
for i in range (0, 40000000):
x = math.sqrt(i)
print(x)
end = time.time()
print('time', end - start)
calc()
file3:
import asyncio
import uvloop
import time
import math
#https://github.com/magicstack/uvloop
async def main():
start = time.time()
for i in range (0, 40000000):
x = math.sqrt(i)
print(x)
end = time.time()
print('time', end - start)
uvloop.install()
asyncio.run(main())
every file needs about 180-200 sec
so i 'can't see' a difference.

I googled for it and found 3 ways to [speed up a function], but it seems the time they need is always the same. so what I am doing wrong testing them?
You seemed to have found strategies to speed up some code by parallelizing it, but you failed to implement them correctly. First, the speedup is supposed to come from running multiple instances of the function in parallel, and the code snippets make no attempt to do that. Then, there are other problems.
In the first example, you pass the result benchmark() to executor.map, which means all of benchmark() is immediately executed to completion, thus effectively disabling parallelization. (Also, executor.map is supposed to receive an iterable, not None, and this code must have printed a traceback not shown in the question.) The correct way would be something like:
# run the benchmark 5 times in parallel - if that takes less
# than 5x of a single benchmark, you've got a speedup
with ThreadPoolExecutor(max_workers=5) as executor:
for _ in range(5):
executor.submit(benchmark)
For this to actually produce a speedup, you should try to use ProcessPoolExecutor, which runs its tasks in separate processes and is therefore unaffected by the GIL.
The second code snippet never actually creates or runs a thread, it just executes the function in the main thread, so it's unclear how that's supposed to speed things up.
The last snippet doesn't await anything, so the async def works just like an ordinary function. Note that asyncio is an async framework based on switching between tasks blocked on IO, and as such can never speed CPU-bound calculations.

Related

Best way to keep creating threads on variable list argument

I have an event that I am listening to every minute that returns a list ; it could be empty, 1 element, or more. And with those elements in that list, I'd like to run a function that would monitor an event on that element every minute for 10 minute.
For that I wrote that script
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
import Client
client = Client()
def handle_event(event):
for i in range(10):
client.get_info(event)
sleep(60)
async def main():
while True:
entires = client.get_new_entry()
if len(entires) > 0:
with ThreadPoolExecutor(max_workers=len(entires)) as executor:
executor.map(handle_event, entires)
await asyncio.sleep(60)
if __name__ == "__main__":
loop = asyncio.new_event_loop()
loop.run_until_complete(main())
However, instead of keep monitoring the entries, it blocks while the previous entries are still being monitors.
Any idea how I could do that please?
First let me explain why your program doesn't work the way you want it to: It's because you use the ThreadPoolExecutor as a context manager, which will not close until all the threads started by the call to map are finished. So main() waits there, and the next iteration of the loop can't happen until all the work is finished.
There are ways around this. Since you are using asyncio already, one approach is to move the creation of the Executor to a separate task. Each iteration of the main loop starts one copy of this task, which runs as long as it takes to finish. It's a async def function so many copies of this task can run concurrently.
I changed a few things in your code. Instead of Client I just used some simple print statements. I pass a list of integers, of random length, to handle_event. I increment a counter each time through the while True: loop, and add 10 times the counter to every integer in the list. This makes it easy to see how old calls continue for a time, mixing with new calls. I also shortened your time delays. All of these changes were for convenience and are not important.
The important change is to move ThreadPoolExecutor creation into a task. To make it cooperate with other tasks, it must contain an await expression, and for that reason I use executor.submit rather than executor.map. submit returns a concurrent.futures.Future, which provides a convenient way to await the completion of all the calls. executor.map, on the other hand, returns an iterator; I couldn't think of any good way to convert it to an awaitable object.
To convert a concurrent.futures.Future to an asyncio.Future, an awaitable, there is a function asyncio.wrap_future. When all the futures are complete, I exit from the ThreadPoolExecutor context manager. That will be very fast since all of the Executor's work is finished, so it does not block other tasks.
import random
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
def handle_event(event):
for i in range(10):
print("Still here", event)
sleep(2)
async def process_entires(counter, entires):
print("Counter", counter, "Entires", entires)
x = [counter * 10 + a for a in entires]
with ThreadPoolExecutor(max_workers=len(entires)) as executor:
futs = []
for z in x:
futs.append(executor.submit(handle_event, z))
await asyncio.gather(*(asyncio.wrap_future(f) for f in futs))
async def main():
counter = 0
while True:
entires = [0, 1, 2, 3, 4][:random.randrange(5)]
if len(entires) > 0:
counter += 1
asyncio.create_task(process_entires(counter, entires))
await asyncio.sleep(3)
if __name__ == "__main__":
asyncio.run(main())

Launching parallel tasks: Subprocess output triggers function asynchronously

The example I will describe here is purely conceptual so I'm not interested in solving this actual problem.
What I need to accomplish is to be able to asynchronously run a function based on a continuous output of a subprocess command, in this case, the windows ping yahoo.com -t command and based on the time value from the replies I want to trigger the startme function. Now inside this function, there will be some more processing done, including some database and/or network-related calls so basically I/O processing.
My best bet would be that I should use Threading but for some reason, I can't get this to work as intended. Here is what I have tried so far:
First of all I tried the old way of using Threads like this:
import subprocess
import re
import asyncio
import time
import threading
def startme(mytime: int):
print(f"Mytime {mytime} was started!")
time.sleep(mytime) ## including more long operation functions here such as database calls and even some time.sleep() - if possible
print(f"Mytime {mytime} finished!")
myproc = subprocess.Popen(['ping', 'yahoo.com', '-t'], shell=True, stdout=subprocess.PIPE)
def main():
while True:
output = myproc.stdout.readline()
if myproc.poll() is not None:
break
myoutput = output.strip().decode(encoding="UTF-8")
print(myoutput)
mytime = re.findall("(?<=time\=)(.*)(?=ms\s)", myoutput)
try:
mytime = int(mytime[0])
if mytime < 197:
# startme(int(mytime[0]))
p1 = threading.Thread(target=startme(mytime), daemon=True)
# p1 = threading.Thread(target=startme(mytime)) # tried with and without the daemon
p1.start()
# p1.join()
except:
pass
main()
But right after startme() fire for the first time, the pings stop showing and they are waiting for the startme.time.sleep() to finish.
I did manage to get this working using the concurrent.futures's ThreadPoolExecutor but when tried to replace the time.sleep() with the actual database query I found out that my startme() function will never complete so no Mytime xxx finished! message is ever shown nor any database entry is being made.
import sqlite3
import subprocess
import re
import asyncio
import time
# import threading
# import multiprocessing
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
conn = sqlite3.connect('test.db')
c = conn.cursor()
c.execute(
'''CREATE TABLE IF NOT EXISTS mytable (id INTEGER PRIMARY KEY, u1, u2, u3, u4)''')
def startme(mytime: int):
print(f"Mytime {mytime} was started!")
# time.sleep(mytime) ## including more long operation functions here such as database calls and even some time.sleep() - if possible
c.execute("INSERT INTO mytable VALUES (null, ?, ?, ?, ?)",(1,2,3,mytime))
conn.commit()
print(f"Mytime {mytime} finished!")
myproc = subprocess.Popen(['ping', 'yahoo.com', '-t'], shell=True, stdout=subprocess.PIPE)
def main():
while True:
output = myproc.stdout.readline()
myoutput = output.strip().decode(encoding="UTF-8")
print(myoutput)
mytime = re.findall("(?<=time\=)(.*)(?=ms\s)", myoutput)
try:
mytime = int(mytime[0])
if mytime < 197:
print(f"The time {mytime} is low enought to call startme()" )
executor = ThreadPoolExecutor()
# executor = ProcessPoolExecutor() # I did tried using process even if it's not a CPU-related issue
executor.submit(startme, mytime)
except:
pass
main()
I did try using asyncio but I soon realized this is not the case but I'm wondering if I should try aiosqlite
I also thought about using asyncio.create_subprocess_shell and run both as parallel subprocesses but can't think of a way to wait for a certain string from the ping command that would trigger the second script.
Please note that I don't really need a return from the startme() function and the ping command example is conceptually derived from the mitmproxy's mitmdump output command.
The first code wasn't working as I did a stupid mistake when creating the thread so p1 = threading.Thread(target=startme(mytime)) does not take the function with its arguments but separately like this p1 = threading.Thread(target=startme, args=(mytime,))
The reason why I could not get the SQL insert statement to work in my second code was this error:
SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 10688 and this is thread id 17964
that I didn't saw until I wrapped my SQL statement into a try/except and captured the error. So I needed to make the SQL database connection inside my startme() function
The other asyncio stuff was just nonsense and cannot be applied to the current issue here.

Multithreading in Python within a for loop

Let's say I have a program in Python which looks like this:
import time
def send_message_realtime(s):
print("Real Time: ", s)
def send_message_delay(s):
time.sleep(5)
print("Delayed Message ", s)
for i in range(10):
send_message_realtime(str(i))
time.sleep(1)
send_message_delay(str(i))
What I am trying to do here is some sort of multithreading, so that the contents of my main for loop continues to execute without having to wait for the delay caused by time.sleep(5) in the delayed function.
Ideally, the piece of code that I am working upon looks something like below. I get a message from some API endpoint which I want to send to a particular telegram channel in real-time(paid subscribers), but I also want to send it to another channel by delaying it exactly 10 minutes or 600 seconds since they're free members. The problem I am facing is, I want to keep sending the message in real-time to my paid subscribers and kind of create a new thread/process for the delayed message which runs independently of the main while loop.
def send_message_realtime(my_realtime_message):
telegram.send(my_realtime_message)
def send_message_delayed(my_realtime_message):
time.sleep(600)
telegram.send(my_realtime_message)
while True:
my_realtime_message = api.get()
send_message_realtime(my_realtime_message)
send_message_delayed(my_realtime_message)
I think something like ThreadPoolExecutor does what you are look for:
import time
from concurrent.futures.thread import ThreadPoolExecutor
def send_message_realtime(s):
print("Real Time: ", s)
def send_message_delay(s):
time.sleep(5)
print("Delayed Message ", s)
def work_to_do(i):
send_message_realtime(str(i))
time.sleep(1)
send_message_delay(str(i))
with ThreadPoolExecutor(max_workers=4) as executor:
for i in range(10):
executor.submit(work_to_do, i)
The max_workers would be the number of parallel messages that could have at a given moment in time.
Instead of a multithread solution, you can also use a multiprocessing solution, for instance
from multiprocessing import Pool
...
with Pool(4) as p:
print(p.map(work_to_do, range(10)))

How to set timeout for a block of code which is not a function python3

After spending a lot of hours looking for a solution in stackoverflow, I did not find a good solution to set a timeout for a block of code. There are approximations to set a timeout for a function. Nevertheless, I would like to know how to set a timeout without having a function. Let's take the following code as an example:
print("Doing different things")
for i in range(0,10)
# Doing some heavy stuff
print("Done. Continue with the following code")
So, How would you break the for loop if it has not finished after x seconds? Just continue with the code (maybe saving some bool variables to know that timeout was reached), despite the fact that the for loop did not finish properly.
i think implement this efficiently without using functions not possible , look this code ..
import datetime as dt
print("Doing different things")
# store
time_out_after = dt.timedelta(seconds=60)
start_time = dt.datetime.now()
for i in range(10):
if dt.datetime.now() > time_started + time_out:
break
else:
# Doing some heavy stuff
print("Done. Continue with the following code")
the problem : the timeout will checked in the beginning of every loop cycle, so it may be take more than the specified timeout period to break of the loop, or in worst case it maybe not interrupt the loop ever becouse it can't interrupt the code that never finish un iteration.
update :
as op replayed, that he want more efficient way, this is a proper way to do it, but using functions.
import asyncio
async def test_func():
print('doing thing here , it will take long time')
await asyncio.sleep(3600) # this will emulate heaven task with actual Sleep for one hour
return 'yay!' # this will not executed as the timeout will occur early
async def main():
# Wait for at most 1 second
try:
result = await asyncio.wait_for(test_func(), timeout=1.0) # call your function with specific timeout
# do something with the result
except asyncio.TimeoutError:
# when time out happen program will break from the test function and execute code here
print('timeout!')
print('lets continue to do other things')
asyncio.run(main())
Expected output:
doing thing here , it will take long time
timeout!
lets continue to do other things
note:
now timeout will happen after exactly the time you specify. in this example code, after one second.
you would replace this line:
await asyncio.sleep(3600)
with your actual task code.
try it and let me know what do you think. thank you.
read asyncio docs:
link
update 24/2/2019
as op noted that asyncio.run introduced in python 3.7 and asked for altrnative on python 3.6
asyncio.run alternative for python older than 3.7:
replace
asyncio.run(main())
with this code for older version (i think 3.4 to 3.6)
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
You may try the following way:
import time
start = time.time()
for val in range(10):
# some heavy stuff
time.sleep(.5)
if time.time() - start > 3: # 3 is timeout in seconds
print('loop stopped at', val)
break # stop the loop, or sys.exit() to stop the script
else:
print('successfully completed')
I guess it is kinda viable approach. Actual timeout is greater than 3 seconds and depends on the single step execution time.

Threading/Async in Requests-html

I have a large number of links I need to scrape from a website. I have ~70 base links and from them over 700 links that need to be scraped from those starting 70. So in order to speed up this process, takes about 2-3 hours without threading/async, I decided to try and use a thread/async.
My problem is that I need to render some javascript in order to get the links in the first place. I have been using requests-html to do this as its html.render() method is very reliable. However, when I try and run this using threading or async I run into a host of problems. I tried AsyncHTMLSession due to this Github PR but have been unable to get it to work. I was wondering if anyone had any ideas or links they could point me too that might help.
Here is some example code:
from multiprocessing.pool import ThreadPool
from requests_html import AsyncHTMLSession
links = (tuple of links)
n = 5
batch = [links[i:i+n] for i in range(0, len(links), n)]
def link_processor(batch_link):
session = AsyncHTMLSession()
results = []
for l in batch_link:
print(l)
r = session.get(l)
r.html.arender()
tmp_next = r.html.xpath('//a[contains(#href, "/matches/")]')
return tmp_next
pool = ThreadPool(processes=2)
output = pool.map(link_processor, batch)
pool.close()
pool.join()
print(output)
Output:
RuntimeError: There is no current event loop in thread 'Thread-1'.
Was able to fix this with some help from the learnpython subreddit. Turns out requests-html probably uses threads in some way and so threading the threads has an issue so simply using multiprocessing pool works.
FIXED CODE:
from multiprocessing import Pool
from requests_html import HTMLSession
.....
pool = Pool(processes=3)
output = pool.map(link_processor, batch[:2])
pool.close()
pool.join()
print(output)

Resources