Subprocess Tornado capture exit status - python-3.x

Scenario: First, I need to update status in db to 'pending' and at the same time, return the status to user. Then subprocess will be running in the background and it will take 30 seconds as I have put time.sleep(30) in dummy.py. After that, I have to update status in db to 'completed'. I am trying to make non-blocking functions using tornado.
My Question: I have captured if Subprocess is finished by using yield. If yield result is 0, I assume that Subprocess has completed. I know something is not right with my logic. How do I capture if Subprocess(Tornado) has finished in correct way?
My current code is:
class MainHandler(tornado.web.RequestHandler):
#coroutine
def get(self, id):
print ("TORNADO ALERT")
self.write("Pending")
#If ID in DB, UPDATE DB
#Update Status to Pending
self.flush()
res =yield self._work()
self.write(res)
#coroutine
def _work(self):
p = Subprocess(['python', 'dummy.py'])
f = Future()
p.set_exit_callback(f.set_result)
h = yield f
print (">>> ",h)
if h == 0:
print("DB Updated")
#Update Status to Completed
raise Return(" Completed ")
My imports are as follows:
from tornado.concurrent import Future
from tornado.process import Subprocess

Use Subprocess.wait_for_exit() (which returns a Future) instead of Subprocess.set_exit_callback(). This can then be used in a coroutine with
async def f():
p = Subprocess(cmd)
await p.wait_for_exit()

Related

Using asyncio to wait for results from subprocess

My Python script contains a loop that uses subprocess to run commands outside the script. Each subprocess is independent. I listen for the returned message in case there's an error; I can't ignore the result of the subprocess. Here's the script without asyncio (I've replaced my computationally expensive call with sleep):
from subprocess import PIPE # https://docs.python.org/3/library/subprocess.html
import subprocess
def go_do_something(index: int) -> None:
"""
This function takes a long time
Nothing is returned
Each instance is independent
"""
process = subprocess.run(["sleep","2"],stdout=PIPE,stderr=PIPE,timeout=20)
stdout = process.stdout.decode("utf-8")
stderr = process.stderr.decode("utf-8")
if "error" in stderr:
print("error for "+str(index))
return
def my_long_func(val: int) -> None:
"""
This function contains a loop
Each iteration of the loop calls a function
Nothing is returned
"""
for index in range(val):
print("index = "+str(index))
go_do_something(index)
# run the script
my_long_func(3) # launch three tasks
I think I could use asyncio to speed up this activity since the Python script is waiting on the external subprocess to complete. I think threading or multiprocessing are not necessary, though they could also result in faster execution. Using a task queue (e.g., Celery) is another option.
I tried implementing the asyncio approach, but am missing something since the following attempt doesn't change the overall execution time:
import asyncio
from subprocess import PIPE # https://docs.python.org/3/library/subprocess.html
import subprocess
async def go_do_something(index: int) -> None:
"""
This function takes a long time
Nothing is returned
Each instance is independent
"""
process = subprocess.run(["sleep","2"],stdout=PIPE,stderr=PIPE,timeout=20)
stdout = process.stdout.decode("utf-8")
stderr = process.stderr.decode("utf-8")
if "error" in stderr:
print("error for "+str(index))
return
def my_long_func(val: int) -> None:
"""
This function contains a loop
Each iteration of the loop calls a function
Nothing is returned
"""
# https://docs.python.org/3/library/asyncio-eventloop.html
loop = asyncio.get_event_loop()
tasks = []
for index in range(val):
task = go_do_something(index)
tasks.append(task)
# https://docs.python.org/3/library/asyncio-task.html
tasks = asyncio.gather(*tasks)
loop.run_until_complete(tasks)
loop.close()
return
my_long_func(3) # launch three tasks
If I want to monitor the output of each subprocess but not wait while each subprocess runs, can I benefit from asyncio? Or does this situation require something like multiprocessing or Celery?
Try executing the commands using asyncio instead of subprocess.
Define a run() function:
import asyncio
async def run(cmd: str):
proc = await asyncio.create_subprocess_shell(
cmd,
stderr=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE
)
stdout, stderr = await proc.communicate()
print(f'[{cmd!r} exited with {proc.returncode}]')
if stdout:
print(f'[stdout]\n{stdout.decode()}')
if stderr:
print(f'[stderr]\n{stderr.decode()}')
Then you may call it as you would call any async function:
asyncio.run(run('sleep 2'))
#=>
['sleep 2' exited with 0]
The example was taken from the official documentation. Also available here.
#ronginat pointed me to https://asyncio.readthedocs.io/en/latest/subprocess.html which I was able to adapt to the situation I am seeking:
import asyncio
async def run_command(*args):
# Create subprocess
process = await asyncio.create_subprocess_exec(
*args,
# stdout must a pipe to be accessible as process.stdout
stdout=asyncio.subprocess.PIPE)
# Wait for the subprocess to finish
stdout, stderr = await process.communicate()
# Return stdout
return stdout.decode().strip()
async def go_do_something(index: int) -> None:
print('index=',index)
res = await run_command('sleep','2')
return res
def my_long_func(val: int) -> None:
task_list = []
for indx in range(val):
task_list.append( go_do_something(indx) )
loop = asyncio.get_event_loop()
commands = asyncio.gather(*task_list)
reslt = loop.run_until_complete(commands)
print(reslt)
loop.close()
my_long_func(3) # launch three tasks
The total time of execution is just over 2 seconds even though there are three sleeps of duration 2 seconds. And I get the stdout from each subprocess.

Sharing asyncio.Queue with another thread or process

I've recently converted my old template matching program to asyncio and I have a situation where one of my coroutines relies on a blocking method (processing_frame).
I want to run that method in a seperate thread or process whenever the coroutine that calls that method (analyze_frame) gets an item from the shared asyncio.Queue()
I'm not sure if that's possible or worth it performance wise since I have very little experience with threading and multiprocessing
import cv2
import datetime
import argparse
import os
import asyncio
# Making CLI
if not os.path.exists("frames"):
os.makedirs("frames")
t0 = datetime.datetime.now()
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video", required=True,
help="path to our file")
args = vars(ap.parse_args())
threshold = .2
death_count = 0
was_found = False
template = cv2.imread('youdied.png')
vidcap = cv2.VideoCapture(args["video"])
loop = asyncio.get_event_loop()
frames_to_analyze = asyncio.Queue()
def main():
length = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
tasks = []
for _ in range(int(length / 50)):
tasks.append(loop.create_task(read_frame(50, frames_to_analyze)))
tasks.append(loop.create_task(analyze_frame(threshold, template, frames_to_analyze)))
final_task = asyncio.gather(*tasks)
loop.run_until_complete(final_task)
dt = datetime.datetime.now() - t0
print("App exiting, total time: {:,.2f} sec.".format(dt.total_seconds()))
print(f"Deaths registered: {death_count}")
async def read_frame(frames, frames_to_analyze):
global vidcap
for _ in range(frames-1):
vidcap.grab()
else:
current_frame = vidcap.read()[1]
print("Read 50 frames")
await frames_to_analyze.put(current_frame)
async def analyze_frame(threshold, template, frames_to_analyze):
global vidcap
global was_found
global death_count
frame = await frames_to_analyze.get()
is_found = processing_frame(frame)
if was_found and not is_found:
death_count += 1
await writing_to_file(death_count, frame)
was_found = is_found
def processing_frame(frame):
res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
max_val = cv2.minMaxLoc(res)[1]
is_found = max_val >= threshold
print(is_found)
return is_found
async def writing_to_file(death_count, frame):
cv2.imwrite(f"frames/frame{death_count}.jpg", frame)
if __name__ == '__main__':
main()
I've tried using unsync but without much success
I would get something along the lines of
with self._rlock:
PermissionError: [WinError 5] Access is denied
If processing_frame is a blocking function, you should call it with await loop.run_in_executor(None, processing_frame, frame). That will submit the function to a thread pool and allow the event loop to proceed with doing other things until the call function completes.
The same goes for calls such as cv2.imwrite. As written, writing_to_file is not truly asynchronous, despite being defined with async def. This is because it doesn't await anything, so once its execution starts, it will proceed to the end without ever suspending. In that case one could as well make it a normal function in the first place, to make it obvious what's going on.

Python multiprocessing script partial output

I am following the principles laid down in this post to safely output the results which will eventually be written to a file. Unfortunately, the code only print 1 and 2, and not 3 to 6.
import os
import argparse
import pandas as pd
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep
def feed(queue, parlist):
for par in parlist:
queue.put(par)
print("Queue size", queue.qsize())
def calc(queueIn, queueOut):
while True:
try:
par=queueIn.get(block=False)
res=doCalculation(par)
queueOut.put((res))
queueIn.task_done()
except:
break
def doCalculation(par):
return par
def write(queue):
while True:
try:
par=queue.get(block=False)
print("response:",par)
except:
break
if __name__ == "__main__":
nthreads = 2
workerQueue = Queue()
writerQueue = Queue()
considerperiod=[1,2,3,4,5,6]
feedProc = Process(target=feed, args=(workerQueue, considerperiod))
calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
writProc = Process(target=write, args=(writerQueue,))
feedProc.start()
feedProc.join()
for p in calcProc:
p.start()
for p in calcProc:
p.join()
writProc.start()
writProc.join()
On running the code it prints,
$ python3 tst.py
Queue size 6
response: 1
response: 2
Also, is it possible to ensure that the write function always outputs 1,2,3,4,5,6 i.e. in the same order in which the data is fed into the feed queue?
The error is somehow with the task_done() call. If you remove that one, then it works, don't ask me why (IMO that's a bug). But the way it works then is that the queueIn.get(block=False) call throws an exception because the queue is empty. This might be just enough for your use case, a better way though would be to use sentinels (as suggested in the multiprocessing docs, see last example). Here's a little rewrite so your program uses sentinels:
import os
import argparse
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep
def feed(queue, parlist, nthreads):
for par in parlist:
queue.put(par)
for i in range(nthreads):
queue.put(None)
print("Queue size", queue.qsize())
def calc(queueIn, queueOut):
while True:
par=queueIn.get()
if par is None:
break
res=doCalculation(par)
queueOut.put((res))
def doCalculation(par):
return par
def write(queue):
while not queue.empty():
par=queue.get()
print("response:",par)
if __name__ == "__main__":
nthreads = 2
workerQueue = Queue()
writerQueue = Queue()
considerperiod=[1,2,3,4,5,6]
feedProc = Process(target=feed, args=(workerQueue, considerperiod, nthreads))
calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
writProc = Process(target=write, args=(writerQueue,))
feedProc.start()
feedProc.join()
for p in calcProc:
p.start()
for p in calcProc:
p.join()
writProc.start()
writProc.join()
A few things to note:
the sentinel is putting a None into the queue. Note that you need one sentinel for every worker process.
for the write function you don't need to do the sentinel handling as there's only one process and you don't need to handle concurrency (if you would do the empty() and then get() thingie in your calc function you would run into a problem if e.g. there's only one item left in the queue and both workers check empty() at the same time and then both want to do get() and then one of them is locked forever)
you don't need to put feed and write into processes, just put them into your main function as you don't want to run it in parallel anyway.
how can I have the same order in output as in input? [...] I guess multiprocessing.map can do this
Yes map keeps the order. Rewriting your program into something simpler (as you don't need the workerQueue and writerQueue and adding random sleeps to prove that the output is still in order:
from multiprocessing import Pool
import time
import random
def calc(val):
time.sleep(random.random())
return val
if __name__ == "__main__":
considerperiod=[1,2,3,4,5,6]
with Pool(processes=2) as pool:
print(pool.map(calc, considerperiod))

What is mean yield None (tornado.gen.moment)

I need async subprocess lock in my web application.
I writes next code:
r = redis.Redis('localhost')
pipe = r.pipeline()
is_locked = False
while not is_locked:
try:
pipe.watch(lock_name)
current_locked = int(pipe.get(lock_name))
if current_locked == 0:
pipe.multi()
pipe.incr(lock_name)
pipe.execute()
is_locked = True
else:
yield None
except redis.WatchError:
yield None
return True
In documentation writen that tornado.gen.moment (yield None since version 4.5) is a special object which may be yielded to allow the IOLoop to run for one iteration. How it works? Is it next iteration with other Feature object (from other request) or not? Is it correct yield None usage?
The gen.moment is just resolved Future object added to the ioloop with a callback. This allows to run one iteration of ioloop.
The yield None is converted to the gen.moment using convert_yielded in the coroutine's gen.Runner.
The ioloop (basically while True) with each iteration do things like:
run callbacks scheduled with ioloop's add_callback or add_callback_from_signal
run callbacks scheduled with ioloop's add_timeout
poll for fd events (e.g. wait for file descirptor to be ready to write or read). Of course to not block the ioloop the poll has timeout.
run handler of ready fds
So getting to the point yield gen.moment will allow to do all the things above for one time (one iteration).
As an example let's schedule async task - httpclient fetch that requires running ioloop to be finished. On the other hand there will be also blocking function (time.sleep).
import time
from tornado import gen
from tornado.ioloop import IOLoop
from tornado.httpclient import AsyncHTTPClient
#gen.coroutine
def fetch_task():
client = AsyncHTTPClient()
yield client.fetch('http://google.com')
print('fetch_task finished')
#gen.coroutine
def blocking():
start_time = time.time()
counter = 1
while True:
time.sleep(5)
print('blocking for %f' % (time.time() - start_time))
yield gen.moment
print('gen.moment counter %d' % counter)
counter += 1
#gen.coroutine
def main():
fetch_task()
yield blocking()
IOLoop.instance().run_sync(main)
Observation:
without a yield gen.moment, the fetch_task won't be finished
increase/decrease value of time.sleep does not affect the number of required iteration of a ioloop for the fetch_task to be completed. This means also that a AsyncHTTPClient.fetch is N + 1 (gen.moments + the task schedule) interactions with ioloop (handling callbacks, polling fd, handling events).
gen.moment does not always mean, the other tasks will be finished, rather they get opportunity to be one step closer to completeness.

Python Tweepy streaming with multitasking

in Python 2.7 I am successful in using the following code to listen to a direct message stream on an account:
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy import API
from tweepy.streaming import StreamListener
# These values are appropriately filled in the code
consumer_key = '######'
consumer_secret = '######'
access_token = '######'
access_token_secret = '######'
class StdOutListener( StreamListener ):
def __init__( self ):
self.tweetCount = 0
def on_connect( self ):
print("Connection established!!")
def on_disconnect( self, notice ):
print("Connection lost!! : ", notice)
def on_data( self, status ):
print("Entered on_data()")
print(status, flush = True)
return True
# I can add code here to execute when a message is received, such as slicing the message and activating something else
def on_direct_message( self, status ):
print("Entered on_direct_message()")
try:
print(status, flush = True)
return True
except BaseException as e:
print("Failed on_direct_message()", str(e))
def on_error( self, status ):
print(status)
def main():
try:
auth = OAuthHandler(consumer_key, consumer_secret)
auth.secure = True
auth.set_access_token(access_token, access_token_secret)
api = API(auth)
# If the authentication was successful, you should
# see the name of the account print out
print(api.me().name)
stream = Stream(auth, StdOutListener())
stream.userstream()
except BaseException as e:
print("Error in main()", e)
if __name__ == '__main__':
main()
This is great, and I can also execute code when I receive a message, but the jobs I'm adding to a work queue need to be able to stop after a certain amount of time. I'm using a popular start = time.time() and subtracting current time to determine elapsed time, but this streaming code does not loop to check the time. I just waits for a new message, so the clock is never checked so to speak.
My question is this: How can I get streaming to occur and still track time elapsed? Do I need to use multithreading as described in this article? http://www.tutorialspoint.com/python/python_multithreading.htm
I am new to Python and having fun playing around with hardware attached to a Raspberry Pi. I have learned so much from Stackoverflow, thank you all :)
I'm not sure exactly how you want to decide when to stop, but you can pass a timeout argument to the stream to give up after a certain delay.
stream = Stream(auth, StdOutListener(), timeout=30)
That will call your listener's on_timeout() method. If you return true, it will continue streaming. Otherwise, it will stop.
Between the stream's timeout argument and your listener's on_timeout(), you should be able to decide when to stop streaming.
I found I was able to get some multithreading code the way I wanted to. Unlike this tutorial from Tutorialspoint which gives an example of launching multiple instances of the same code with varying timing parameters, I was able to get two different blocks of code to run in their own instances
One block of code constantly adds 10 to a global variable (var).
Another block checks when 5 seconds elapses then prints var's value.
This demonstrates 2 different tasks executing and sharing data using Python multithreading.
See code below
import threading
import time
exitFlag = 0
var = 10
class myThread1 (threading.Thread):
def __init__(self, threadID, name, counter):
threading.Thread.__init__(self)
self.threadID = threadID
self.name = name
self.counter = counter
def run(self):
#var counting block begins here
print "addemup starting"
global var
while (var < 100000):
if var > 90000:
var = 0
var = var + 10
class myThread2 (threading.Thread):
def __init__(self, threadID, name, counter):
threading.Thread.__init__(self)
self.threadID = threadID
self.name = name
self.counter = counter
def run(self):
#time checking block begins here and prints var every 5 secs
print "checkem starting"
global var
start = time.time()
elapsed = time.time() - start
while (elapsed < 10):
elapsed = time.time() - start
if elapsed > 5:
print "var = ", var
start = time.time()
elapsed = time.time() - start
# Create new threads
thread1 = myThread1(1, "Thread-1", 1)
thread2 = myThread2(2, "Thread-2", 2)
# Start new Threads
thread1.start()
thread2.start()
print "Exiting Main Thread"
My next task will be breaking up my twitter streaming in to its own thread, and passing direct messages received as variables to a task queueing program, while hopefully the first thread continues to listen for more direct messages.

Resources