Launching parallel tasks: Subprocess output triggers function asynchronously - python-3.x

The example I will describe here is purely conceptual so I'm not interested in solving this actual problem.
What I need to accomplish is to be able to asynchronously run a function based on a continuous output of a subprocess command, in this case, the windows ping yahoo.com -t command and based on the time value from the replies I want to trigger the startme function. Now inside this function, there will be some more processing done, including some database and/or network-related calls so basically I/O processing.
My best bet would be that I should use Threading but for some reason, I can't get this to work as intended. Here is what I have tried so far:
First of all I tried the old way of using Threads like this:
import subprocess
import re
import asyncio
import time
import threading
def startme(mytime: int):
print(f"Mytime {mytime} was started!")
time.sleep(mytime) ## including more long operation functions here such as database calls and even some time.sleep() - if possible
print(f"Mytime {mytime} finished!")
myproc = subprocess.Popen(['ping', 'yahoo.com', '-t'], shell=True, stdout=subprocess.PIPE)
def main():
while True:
output = myproc.stdout.readline()
if myproc.poll() is not None:
break
myoutput = output.strip().decode(encoding="UTF-8")
print(myoutput)
mytime = re.findall("(?<=time\=)(.*)(?=ms\s)", myoutput)
try:
mytime = int(mytime[0])
if mytime < 197:
# startme(int(mytime[0]))
p1 = threading.Thread(target=startme(mytime), daemon=True)
# p1 = threading.Thread(target=startme(mytime)) # tried with and without the daemon
p1.start()
# p1.join()
except:
pass
main()
But right after startme() fire for the first time, the pings stop showing and they are waiting for the startme.time.sleep() to finish.
I did manage to get this working using the concurrent.futures's ThreadPoolExecutor but when tried to replace the time.sleep() with the actual database query I found out that my startme() function will never complete so no Mytime xxx finished! message is ever shown nor any database entry is being made.
import sqlite3
import subprocess
import re
import asyncio
import time
# import threading
# import multiprocessing
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
conn = sqlite3.connect('test.db')
c = conn.cursor()
c.execute(
'''CREATE TABLE IF NOT EXISTS mytable (id INTEGER PRIMARY KEY, u1, u2, u3, u4)''')
def startme(mytime: int):
print(f"Mytime {mytime} was started!")
# time.sleep(mytime) ## including more long operation functions here such as database calls and even some time.sleep() - if possible
c.execute("INSERT INTO mytable VALUES (null, ?, ?, ?, ?)",(1,2,3,mytime))
conn.commit()
print(f"Mytime {mytime} finished!")
myproc = subprocess.Popen(['ping', 'yahoo.com', '-t'], shell=True, stdout=subprocess.PIPE)
def main():
while True:
output = myproc.stdout.readline()
myoutput = output.strip().decode(encoding="UTF-8")
print(myoutput)
mytime = re.findall("(?<=time\=)(.*)(?=ms\s)", myoutput)
try:
mytime = int(mytime[0])
if mytime < 197:
print(f"The time {mytime} is low enought to call startme()" )
executor = ThreadPoolExecutor()
# executor = ProcessPoolExecutor() # I did tried using process even if it's not a CPU-related issue
executor.submit(startme, mytime)
except:
pass
main()
I did try using asyncio but I soon realized this is not the case but I'm wondering if I should try aiosqlite
I also thought about using asyncio.create_subprocess_shell and run both as parallel subprocesses but can't think of a way to wait for a certain string from the ping command that would trigger the second script.
Please note that I don't really need a return from the startme() function and the ping command example is conceptually derived from the mitmproxy's mitmdump output command.

The first code wasn't working as I did a stupid mistake when creating the thread so p1 = threading.Thread(target=startme(mytime)) does not take the function with its arguments but separately like this p1 = threading.Thread(target=startme, args=(mytime,))
The reason why I could not get the SQL insert statement to work in my second code was this error:
SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 10688 and this is thread id 17964
that I didn't saw until I wrapped my SQL statement into a try/except and captured the error. So I needed to make the SQL database connection inside my startme() function
The other asyncio stuff was just nonsense and cannot be applied to the current issue here.

Related

Best way to keep creating threads on variable list argument

I have an event that I am listening to every minute that returns a list ; it could be empty, 1 element, or more. And with those elements in that list, I'd like to run a function that would monitor an event on that element every minute for 10 minute.
For that I wrote that script
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
import Client
client = Client()
def handle_event(event):
for i in range(10):
client.get_info(event)
sleep(60)
async def main():
while True:
entires = client.get_new_entry()
if len(entires) > 0:
with ThreadPoolExecutor(max_workers=len(entires)) as executor:
executor.map(handle_event, entires)
await asyncio.sleep(60)
if __name__ == "__main__":
loop = asyncio.new_event_loop()
loop.run_until_complete(main())
However, instead of keep monitoring the entries, it blocks while the previous entries are still being monitors.
Any idea how I could do that please?
First let me explain why your program doesn't work the way you want it to: It's because you use the ThreadPoolExecutor as a context manager, which will not close until all the threads started by the call to map are finished. So main() waits there, and the next iteration of the loop can't happen until all the work is finished.
There are ways around this. Since you are using asyncio already, one approach is to move the creation of the Executor to a separate task. Each iteration of the main loop starts one copy of this task, which runs as long as it takes to finish. It's a async def function so many copies of this task can run concurrently.
I changed a few things in your code. Instead of Client I just used some simple print statements. I pass a list of integers, of random length, to handle_event. I increment a counter each time through the while True: loop, and add 10 times the counter to every integer in the list. This makes it easy to see how old calls continue for a time, mixing with new calls. I also shortened your time delays. All of these changes were for convenience and are not important.
The important change is to move ThreadPoolExecutor creation into a task. To make it cooperate with other tasks, it must contain an await expression, and for that reason I use executor.submit rather than executor.map. submit returns a concurrent.futures.Future, which provides a convenient way to await the completion of all the calls. executor.map, on the other hand, returns an iterator; I couldn't think of any good way to convert it to an awaitable object.
To convert a concurrent.futures.Future to an asyncio.Future, an awaitable, there is a function asyncio.wrap_future. When all the futures are complete, I exit from the ThreadPoolExecutor context manager. That will be very fast since all of the Executor's work is finished, so it does not block other tasks.
import random
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
def handle_event(event):
for i in range(10):
print("Still here", event)
sleep(2)
async def process_entires(counter, entires):
print("Counter", counter, "Entires", entires)
x = [counter * 10 + a for a in entires]
with ThreadPoolExecutor(max_workers=len(entires)) as executor:
futs = []
for z in x:
futs.append(executor.submit(handle_event, z))
await asyncio.gather(*(asyncio.wrap_future(f) for f in futs))
async def main():
counter = 0
while True:
entires = [0, 1, 2, 3, 4][:random.randrange(5)]
if len(entires) > 0:
counter += 1
asyncio.create_task(process_entires(counter, entires))
await asyncio.sleep(3)
if __name__ == "__main__":
asyncio.run(main())

Multithreading in Python within a for loop

Let's say I have a program in Python which looks like this:
import time
def send_message_realtime(s):
print("Real Time: ", s)
def send_message_delay(s):
time.sleep(5)
print("Delayed Message ", s)
for i in range(10):
send_message_realtime(str(i))
time.sleep(1)
send_message_delay(str(i))
What I am trying to do here is some sort of multithreading, so that the contents of my main for loop continues to execute without having to wait for the delay caused by time.sleep(5) in the delayed function.
Ideally, the piece of code that I am working upon looks something like below. I get a message from some API endpoint which I want to send to a particular telegram channel in real-time(paid subscribers), but I also want to send it to another channel by delaying it exactly 10 minutes or 600 seconds since they're free members. The problem I am facing is, I want to keep sending the message in real-time to my paid subscribers and kind of create a new thread/process for the delayed message which runs independently of the main while loop.
def send_message_realtime(my_realtime_message):
telegram.send(my_realtime_message)
def send_message_delayed(my_realtime_message):
time.sleep(600)
telegram.send(my_realtime_message)
while True:
my_realtime_message = api.get()
send_message_realtime(my_realtime_message)
send_message_delayed(my_realtime_message)
I think something like ThreadPoolExecutor does what you are look for:
import time
from concurrent.futures.thread import ThreadPoolExecutor
def send_message_realtime(s):
print("Real Time: ", s)
def send_message_delay(s):
time.sleep(5)
print("Delayed Message ", s)
def work_to_do(i):
send_message_realtime(str(i))
time.sleep(1)
send_message_delay(str(i))
with ThreadPoolExecutor(max_workers=4) as executor:
for i in range(10):
executor.submit(work_to_do, i)
The max_workers would be the number of parallel messages that could have at a given moment in time.
Instead of a multithread solution, you can also use a multiprocessing solution, for instance
from multiprocessing import Pool
...
with Pool(4) as p:
print(p.map(work_to_do, range(10)))

python speedup a simple function

I try to find a simple way to "speed up" simple functions for a big script so I googled for it and found 3 ways to do that.
but it seems the time they need is always the same.
so what I am doing wrong testing them?
file1:
from concurrent.futures import ThreadPoolExecutor as PoolExecutor
from threading import Thread
import time
import os
import math
#https://dev.to/rhymes/how-to-make-python-code-concurrent-with-3-lines-of-code-2fpe
def benchmark():
start = time.time()
for i in range (0, 40000000):
x = math.sqrt(i)
print(x)
end = time.time()
print('time', end - start)
with PoolExecutor(max_workers=3) as executor:
for _ in executor.map((benchmark())):
pass
file2:
#the basic way
from threading import Thread
import time
import os
import math
def calc():
start = time.time()
for i in range (0, 40000000):
x = math.sqrt(i)
print(x)
end = time.time()
print('time', end - start)
calc()
file3:
import asyncio
import uvloop
import time
import math
#https://github.com/magicstack/uvloop
async def main():
start = time.time()
for i in range (0, 40000000):
x = math.sqrt(i)
print(x)
end = time.time()
print('time', end - start)
uvloop.install()
asyncio.run(main())
every file needs about 180-200 sec
so i 'can't see' a difference.
I googled for it and found 3 ways to [speed up a function], but it seems the time they need is always the same. so what I am doing wrong testing them?
You seemed to have found strategies to speed up some code by parallelizing it, but you failed to implement them correctly. First, the speedup is supposed to come from running multiple instances of the function in parallel, and the code snippets make no attempt to do that. Then, there are other problems.
In the first example, you pass the result benchmark() to executor.map, which means all of benchmark() is immediately executed to completion, thus effectively disabling parallelization. (Also, executor.map is supposed to receive an iterable, not None, and this code must have printed a traceback not shown in the question.) The correct way would be something like:
# run the benchmark 5 times in parallel - if that takes less
# than 5x of a single benchmark, you've got a speedup
with ThreadPoolExecutor(max_workers=5) as executor:
for _ in range(5):
executor.submit(benchmark)
For this to actually produce a speedup, you should try to use ProcessPoolExecutor, which runs its tasks in separate processes and is therefore unaffected by the GIL.
The second code snippet never actually creates or runs a thread, it just executes the function in the main thread, so it's unclear how that's supposed to speed things up.
The last snippet doesn't await anything, so the async def works just like an ordinary function. Note that asyncio is an async framework based on switching between tasks blocked on IO, and as such can never speed CPU-bound calculations.

Running pytest with thread

I have a question for pytest
I would like to run same pytest script with multiple threads.
But,i am not sure how to create and run thread which is passing more than one param. (And running thread with pytest..)
for example I have
test_web.py
from selenium import webdriver
import pytest
class SAMPLETEST:
self.browser = webdriver.Chrome()
self.browser.get(URL)
self.browser.maximize_window()
def test_title(self):
assert "Project WEB" in self.browser.title
def test_login(self):
print('Testing Login')
ID_BOX = self.broswer.find_element_by_id("ProjectemployeeId")
PW_BOX = self.broswer.find_element_by_id("projectpassword")
ID_BOX.send_keys(self.ID) # this place for ID. This param come from thread_run.py
PW_BOX.send_keys(self.PW) # this place for PW. It is not working. I am not sure how to get this data from threa_run.py
PW_BOX.submit()
IN thread_run.py
import threading
import time
from test_web import SAMPLETEST
ID_List = ["0","1","2","3","4","5","6","7"]
PW_LIST = ["0","1","2","3","4","5","6","7"]
threads = []
print("1: Create thread")
for I in range(8):
print("Append thread" + str(I))
t = threading.Thread(target=SAMPLETEST, args=(ID_List[I], PW_LIST[I]))
threads.append(t)
for I in range(8):
print("Start thread:" + str(I))
threads[I].start()
i was able to run thread to run many SAMPLETEST class without pytest.
However, it is not working with pytest.
My question is.
First, how to initialize self.brower in insde of SAMPLETEST? I am sure below codes will not be working
self.browser = webdriver.Chrome()
self.browser.get(URL)
self.browser.maximize_window()
Second, in thread_run.py, how can i pass the two arguments(ID and Password) when I run thread to call SAMPLTEST on test_web.py?
ID_BOX.send_keys(self.ID) # this place for ID. This param come from thread_run.py
ID_BOX.send_keys(self.ID)
PW_BOX.send_keys(self.PW)
I was trying to build constructor (init) in SAMPLETEST class but it wasn't working...
I am not really sure how to run threads (which passing arguments or parameter ) with pytest.
There are 2 scenarios which i could read from this:
prepare the test data and pass in parameters to your test method which could be achieved by pytest-generate-tests and parameterise concept. You can refer to the documentation here
In case of running pytest in multi threading - Pytest-xdist or pytest-parallel
I had a similar issue and got it resolved by passing an argument in form of a list.
Like:,
I replaced below line
thread_1 = Thread(target=fun1, args=10)
with
thread_1 = Thread(target=fun1, args=[10])

hourly action resetter with threading

this is some code I have written based off basic knowledge of threading I acquired. What I am trying to do is limit the actions hourly so it doesn't spam.
import datetime
import time
from threading import Thread
the_Minute = str(datetime.datetime.now().time())[3:5]
action_Limit = 0
def updater():
global the_Minute
while True:
the_Minute=str(datetime.datetime.now().time())[3:5]
#updates the minute value
def resetter():
global action_Limit
global the_Minute
while True:
if the_Minute=='00':
action_Limit=0
time.sleep(30)
def performer():
global action_Limit
global the_Minute
while the_Minute!='00' and action_Limit<100:
#perform actions here
action_Limit+=1
print(action_Limit)
time.sleep(1)
updater_Thread = Thread(target=updater)
resetter_Thread = Thread(target=resetter)
performer_Thread = Thread(target=performer)
updater_Thread.start
performer_Thread.start
resetter_Thread.start
When I run this, nothing happens, but I also don't receive any errors. Could anyone tell me how I could make it work, or point me to some resources to help? Thank you for your time.

Resources