asyncio.Queue as producer-consumer flow in a webserver like Quart - python-3.x

Is it possible to use asyncio.Queue with a webserver like Quart to communicate between the producer and consumer?
Here is what I am trying to do....
import asyncio
import logging
import os

from quart import Quart, request

logger = logging.getLogger(__name__)

app = Quart(__name__)
queue = asyncio.Queue()
producers = []
consumers = []

async def producer(mesg):
    print(f'produced {mesg}')
    await queue.put(mesg)
    await asyncio.sleep(1)  # do some work

async def consumer():
    while True:
        token = await queue.get()
        await asyncio.sleep(1)  # do some work
        queue.task_done()
        print(f'consumed {token}')

@app.route('/route', methods=['POST'])
async def index():
    mesg = await request.get_data()
    try:
        p = asyncio.create_task(producer(mesg))
        producers.append(p)
        c = asyncio.create_task(consumer())
        consumers.append(c)
        return f"published message {mesg}", 200
    except Exception as e:
        logger.exception("Failed to publish message %s!", mesg)
        return f"Failed to publish message: {mesg}", 400

if __name__ == '__main__':
    PORT = int(os.getenv('PORT')) if os.getenv('PORT') else 8050
    app.run(host='0.0.0.0', port=PORT, debug=True)
This works fine.
But I am not sure if this is good practice, because I am not sure where in my code the steps below should go.
# Making sure all the producers have completed
await asyncio.gather(*producers)
#wait for the remaining tasks to be processed
await queue.join()
# cancel the consumers, which are now idle
for c in consumers:
    c.cancel()
EDIT-1:
I have tried using @app.after_serving, with some logger.debug statements.
@app.after_serving
async def shutdown():
    logger.debug("Shutting down...")
    logger.debug("waiting for producers to finish...")
    await asyncio.gather(*producers)
    logger.debug("waiting for tasks to complete...")
    await queue.join()
    logger.debug("cancelling consumers...")
    for c in consumers:
        c.cancel()
But the debug statements are not printed when hypercorn is gracefully shutting down, so I am not sure whether the function (shutdown) decorated with @app.after_serving is actually called during a shutdown.
Here is the message from hypercorn during shutdown
appserver_1 | 2020-05-29 15:55:14,200 - base_events.py:1490 - create_server - INFO - <Server sockets=(<asyncio.TransportSocket fd=14, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 8080)>,)> is serving
appserver_1 | Running on 0.0.0.0:8080 over http (CTRL + C to quit)
Gracefully stopping... (press Ctrl+C again to force)
I am using kill -SIGTERM <PID> to signal a graceful shutdown to the process.

I would place the cleanup code in a shutdown function decorated with after_serving:
@app.after_serving
async def shutdown():
    # Making sure all the producers have completed
    await asyncio.gather(*producers)
    # wait for the remaining tasks to be processed
    await queue.join()
    # cancel the consumers, which are now idle
    for c in consumers:
        c.cancel()
As for the globals, I tend to store them on the app directly so that they can be accessed via the current_app proxy. Please note, though, that this (and your solution) only works for a single process (worker); if you want to use multiple workers (or equivalently hosts), you will need a third-party store for this information, e.g. Redis.
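For illustration, here is a minimal sketch (my adaptation, not code from the question or the answer) of keeping the queue and task lists on the app so handlers can reach them through the current_app proxy; the producer/consumer bodies are trimmed down:
import asyncio
from quart import Quart, current_app, request

app = Quart(__name__)

async def producer(queue, mesg):
    await queue.put(mesg)

async def consumer(queue):
    while True:
        token = await queue.get()
        await asyncio.sleep(1)  # do some work
        queue.task_done()

@app.before_serving
async def startup():
    # everything lives on the app instead of module-level globals
    app.queue = asyncio.Queue()
    app.producers = []
    app.consumers = [asyncio.create_task(consumer(app.queue))]

@app.route('/route', methods=['POST'])
async def index():
    mesg = await request.get_data()
    current_app.producers.append(
        asyncio.create_task(producer(current_app.queue, mesg)))
    return f"published message {mesg}", 200

@app.after_serving
async def shutdown():
    await asyncio.gather(*app.producers)  # let the producers finish
    await app.queue.join()                # wait for queued work to drain
    for c in app.consumers:               # consumers are now idle; cancel them
        c.cancel()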

But I am not sure if this is a good practice
Global variables like the ones you've created in your example are not typically good practice in enterprise solutions. Especially in Python, there is some finicky behavior when it comes to globals.
Passing variables into a function or class has been a cleaner approach in my experience.
However, I don't know how to do that in quart since I don't use that lib.
# Making sure all the producers have completed
#wait for the remaining tasks to be processed
# cancel the consumers, which are now idle
Typically, cleanup tasks are done upon exiting the event loop and before exiting the application.
I don't have knowledge of how Quart works, but you might be able to put that logic after app.run() so that cleanup tasks run after the event loop stops.
This might vary depending on how your application exits.
Check the documentation for some sort of "on shutdown" event that you can hook into.
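As a generic illustration of that idea (plain asyncio, no web framework; my sketch rather than Quart-specific code), the queue is passed in as an argument instead of living in a global, and the cleanup runs before the event loop exits:
import asyncio

async def produce(queue, items):
    for item in items:
        await queue.put(item)

async def consume(queue):
    while True:
        item = await queue.get()
        await asyncio.sleep(0.1)  # do some work
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    consumers = [asyncio.create_task(consume(queue)) for _ in range(2)]
    await produce(queue, range(10))  # producer finishes putting items
    await queue.join()               # wait until every item has been processed
    for c in consumers:              # consumers are idle now; cancel them
        c.cancel()

asyncio.run(main())  # all cleanup has completed by the time this returns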

Related

Blocking and non-blocking calls on server side, why does it matter for asynchronous client side?

Experimenting with some asynchronous code, in Python 3.8.0, I stumbled on the following situation. I have client.py which can handle connections asynchronously with a server in server.py. This server pretends to do some work, but actually sleeps for some seconds and then returns. My question is: since the server is running in a completely different process, why does it matter whether the sleep method is blocking or not? And if the calls on the server side may be non-blocking anyway, what is the benefit of doing asynchronous calls like these in the first place?
# client.py
import time
import asyncio
import aiohttp

async def request_coro(url, session):
    async with session.get(url) as response:
        return await response.read()

async def concurrent_requests(number, url='http://localhost:8080'):
    tasks = []
    async with aiohttp.ClientSession() as session:
        for n in range(number):
            # Schedule the tasks
            task = asyncio.create_task(request_coro(url, session))
            tasks.append(task)
        # returns when all tasks are completed
        return await asyncio.gather(*tasks)

t0 = time.time()
responses = asyncio.run(concurrent_requests(10))
elapsed_concurrent = time.time() - t0
sum_sleeps = sum((int(i) for i in responses))
print(f'{elapsed_concurrent=:.2f} and {sum_sleeps=:.2f}')
# server.py
import time
import random
import logging
import asyncio
from aiohttp import web

random.seed(10)

async def index(requests):
    # Introduce some latency at the server side
    sleeps = random.randint(1, 3)
    # NON-BLOCKING
    # await asyncio.sleep(sleeps)
    # BLOCKING
    time.sleep(sleeps)
    return web.Response(text=str(sleeps))

app = web.Application()
app.add_routes([web.get('/', index),
                web.get('/index', index)])
logging.basicConfig(level=logging.DEBUG)
web.run_app(app, host='localhost', port=8080)
These are the results from 10 asynchronous calls by the client using either the blocking or the non-blocking sleep methods:
asyncio.sleep (non-blocking)
elapsed_concurrent=3.02 and sum_sleeps=19.00
time.sleep (blocking)
elapsed_concurrent=19.04 and sum_sleeps=19.00
Although the server is running in a completely different process, it cannot take multiple active connections at the same time the way a multi-threaded server can. So the client and the server are both working asynchronously, each with its own event loop.
The server can only accept new connections from the client while its event loop is suspended in a non-blocking sleep. This makes it appear that the server is multi-threaded, when it actually just alternates rapidly between the available connections. A blocking sleep makes the requests sequential, because the blocked event loop sits idle and cannot handle new connections in the meantime.
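If the server really has to call blocking code, a common workaround (a sketch of mine, not part of the original answer) is to push that work onto a thread pool with loop.run_in_executor, so the event loop stays free to accept new connections:
import time
import asyncio
from aiohttp import web

def blocking_work(seconds):
    time.sleep(seconds)  # stands in for any blocking call
    return seconds

async def index(request):
    loop = asyncio.get_running_loop()
    # the event loop keeps serving other connections while a worker thread sleeps
    result = await loop.run_in_executor(None, blocking_work, 2)
    return web.Response(text=str(result))

app = web.Application()
app.add_routes([web.get('/', index)])
web.run_app(app, host='localhost', port=8080)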

Process finishes but cannot be joined?

To accelerate a certain task, I'm subclassing Process to create a worker that will process data coming in samples. Some managing class will feed it data and read the outputs (using two Queue instances). For asynchronous operation I'm using put_nowait and get_nowait. At the end I'm sending a special exit code to my process, upon which it breaks its internal loop. However... it never happens. Here's a minimal reproducible example:
import multiprocessing as mp

class Worker(mp.Process):
    def __init__(self, in_queue, out_queue):
        super(Worker, self).__init__()
        self.input_queue = in_queue
        self.output_queue = out_queue

    def run(self):
        while True:
            received = self.input_queue.get(block=True)
            if received is None:
                break
            self.output_queue.put_nowait(received)
        print("\tWORKER DEAD")

class Processor():
    def __init__(self):
        # prepare
        in_queue = mp.Queue()
        out_queue = mp.Queue()
        worker = Worker(in_queue, out_queue)
        # get to work
        worker.start()
        in_queue.put_nowait(list(range(10**5)))  # XXX
        # clean up
        print("NOTIFYING")
        in_queue.put_nowait(None)
        #out_queue.get() # XXX
        print("JOINING")
        worker.join()

Processor()
This code never completes, hanging permanently like this:
NOTIFYING
JOINING
WORKER DEAD
Why?
I've marked two lines with XXX. In the first one, if I send less data (say, 10**4), everything finishes normally (the process joins as expected). Similarly in the second: if I get() after notifying the worker to finish, everything also completes. I know I'm missing something, but nothing in the documentation seems relevant.
Documentation mentions that
When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences [...] After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising queue.Empty.
https://docs.python.org/3.7/library/multiprocessing.html#pipes-and-queues
and additionally that
whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate.
https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing-programming
This means that the behaviour you describe is probably caused by a race condition between self.output_queue.put_nowait(received) in the worker and joining the worker with worker.join() in Processor's __init__. If joining is faster than feeding the data into the queue, everything finishes fine; if it is too slow, there is still an item in the queue, and the worker will not join.
Uncommenting the out_queue.get() in the main process empties the queue, which allows joining. But since it may be important for the get() to return even when the queue is already empty, using a timeout is an option to wait out the race condition, e.g. out_queue.get(timeout=10).
It may also be important to protect the main routine with if __name__ == "__main__", especially on Windows (see python multiprocessing on windows, if __name__ == "__main__").
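Here is a sketch of that suggestion (my illustration, reusing out_queue and worker from the question's Processor): drain the output queue with a timeout before joining, so the worker's feeder thread can flush its buffered data and the process can exit:
import queue  # only needed for the Empty exception

results = []
while True:
    try:
        # multiprocessing.Queue.get raises queue.Empty once the timeout expires
        results.append(out_queue.get(timeout=1))
    except queue.Empty:
        break

print("JOINING")
worker.join()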

Python Asyncio - Server able to receive multi-commands in different times and processing it

I am building a client/server communication by using the AsyncIO library in Python. Now I'm trying to make my server able to receive more than one command, process it and reply when done.
I mean:
The server receives a command and is able to receive "n" more commands while still processing the previously received ones.
Could someone point me to some examples, or suggest the best way to search for this?
I mean: The server receives a command and is able to receive "n" more commands while still processing the previously received ones.
If I understand you correctly, you want the server to process the client's command in the background, i.e. continue talking to the client while the command is running. This allows the client to queue multiple commands without waiting for the first one; HTTP calls this technique pipelining.
Since asyncio allows creating lightweight tasks that run in the "background", it is actually quite easy to implement such a server. Here is an example server that responds with a message after sleeping for an interval, while accepting further commands at any point:
import asyncio

async def serve(r, w):
    loop = asyncio.get_event_loop()
    while True:
        cmd = await r.readline()
        if not cmd:
            break
        if not cmd.startswith(b'sleep '):
            w.write(b'bad command %s\n' % cmd.strip())
            continue
        sleep_arg = int(cmd[6:])  # how many seconds to sleep
        loop.create_task(cmd_sleep(w, sleep_arg))

async def cmd_sleep(w, interval):
    w.write(f'starting sleep {interval}\n'.encode())
    await asyncio.sleep(interval)
    w.write(f'ended sleep {interval}\n'.encode())

def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.start_server(serve, None, 4321))
    loop.run_forever()

main()
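A quick way to exercise it (my sketch, assuming the server above is listening on port 4321) is a client that sends two commands back-to-back and prints the interleaved replies:
import asyncio

async def client():
    r, w = await asyncio.open_connection('localhost', 4321)
    w.write(b'sleep 3\n')   # queue two commands without waiting
    w.write(b'sleep 1\n')
    await w.drain()
    for _ in range(4):      # two "starting" lines and two "ended" lines
        print((await r.readline()).decode().strip())
    w.close()
    await w.wait_closed()

asyncio.run(client())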

Python 3.4 Comms Stream Delegate - Non-blocking recv and send - data out from asyncio

I'm putting together a client server app on RPi. It has a main thread which creates a comms thread to talk to an iOS device.
The main thread creates an asyncio event loop and a sendQ and a recvQ and passes them as args to the commsDelegate main method in the comms thread.
The trouble I'm having is that when the iOS device connects, it needs to receive unsolicited data from this Python app as soon as the data becomes available, and it needs to be able to send data up to the Python app. So send and receive need to be non-blocking.
There are great echo server tutorials out there. But little in terms of the server doing something useful with the data.
Can anyone assist me in getting asyncio to read my send queue and forward data as soon as the main thread has queued it? I have receive working great.
Main Thread creates a loop and starts the comms thread:
commsLoop = asyncio.new_event_loop()
commsMainThread = threading.Thread(target=CommsDelegate.commsDelegate, args=(commsInQ,commsOutQ,commsLoop,commsPort,), daemon=True)
commsMainThread.start()
Then asyncio in the CommsDelegate module should run the loop with loop.run_forever(), with a server task reading from and writing to a socket stream and sending/receiving messages via the queues back up to the main thread.
Here's my code so far. I found that if I create a factory for the protocol, I can pass it the queues, so the receipt of messages is all good now. When messages arrive from the client they are queued with put_nowait and the main thread receives them just fine.
I just need asyncio to handle the queue of outbound messages from the Main thread as they arrive on sendQ, so it can send them on to the connected client.
#!/usr/bin/env python3.6
import asyncio

class ServerProtocol(asyncio.Protocol):
    def __init__(self, loop, recvQ, sendQ):
        self.loop = loop
        self.recvQ = recvQ
        self.sendQ = sendQ

    def connection_made(self, transport):
        peername = transport.get_extra_info('peername')
        print('Connection from {}'.format(peername))
        self.transport = transport

    def data_received(self, data):
        message = data.decode()
        print('Data received: {!r}'.format(message))
        self.recvQ.put_nowait(message.rstrip())

    # Needs work... I think the queue.get_nowait should be a co-ro maybe?
    def unknownAtTheMo(self):
        dataToSend = self.sendQ.get_nowait()
        print('Send: {!r}'.format(dataToSend))
        self.transport.write(dataToSend)

    # Needs work to close on request from client or server or exc...
    def handleCloseSocket(self):
        print('Close the client socket')
        self.transport.close()

# async co-routine to consume the send message Q from Main Thread
async def consume(sendQ):
    print("In consume coro")
    while True:
        outboundData = await self.sendQ.get()
        print("Consumed", outboundData)
        self.transport.write(outboundData.encode('ascii'))

def commsDelegate(recvQ, sendQ, loop, port):
    asyncio.set_event_loop(loop)
    # Connection coroutine - Create a factory to assist the protocol in receipt of the queues as args
    factory = lambda: ServerProtocol(loop, recvQ, sendQ)
    # Each client connection will create a new protocol instance
    connection = loop.run_until_complete(loop.create_server(factory, host='192.168.1.199', port=port))
    # Outgoing message queue handler
    consumer = asyncio.ensure_future(consume(sendQ))
    # Set up connection
    loop.run_until_complete(connection)
    # Wait until the connection is closed
    loop.run_forever()
    # Wait until the queue is empty
    loop.run_until_complete(queue.join())
    # Cancel the consumer
    consumer.cancel()
    # Let the consumer terminate
    loop.run_until_complete(consumer)
    # Close the connection
    connection.close()
    # Close the loop
    loop.close()
I send all data messages as json and CommsDelegate performs encode and decode then relays them asis.
Update: asyncio thread seems to be working well for incoming traffic. Server receives json and relays it via a queue - non-blocking.
Once the send is working, I'll have a reusable blackbox server on a thread.
I can see two problems with your approach. First, all your clients are using the same recv and send queues, so there is no way the consume coroutine can know who to reply to.
The second issue has to do with your use of queues as a bridge between the synchronous and the asynchronous worlds. See this part of your code:
await self.sendQ.get()
If sendQ is a regular queue (from the queue module), this line will fail because sendQ is not a coroutine. On the other hand, if sendQ is an asyncio.Queue, the main thread won't be able to use sendQ.put because it is a coroutine. It would be possible to use put_nowait, but thread-safety is not guaranteed in asyncio. Instead, you'd have to use loop.call_soon_threadsafe:
loop.call_soon_threadsafe(sendQ.put_nowait, message)
In general, remember that asyncio is designed to run as the main application. It's supposed to run in the main thread, and communicate with synchronous code through a ThreadPoolExecutor (see loop.run_in_executor).
More information about multithreading is available in the asyncio documentation. You might also want to have a look at the asyncio streams API, which provides a much nicer interface for working with TCP.
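Here is a minimal sketch of that bridge (my illustration, not the answer's code): the asyncio.Queue is created on the comms thread's own loop, and the synchronous main thread hands messages over with loop.call_soon_threadsafe:
import asyncio
import threading
import time

sendQ = None                 # created inside the comms thread, on its own loop
ready = threading.Event()

async def comms_main():
    global sendQ
    sendQ = asyncio.Queue()  # bound to the comms thread's running loop
    ready.set()              # tell the main thread the queue now exists
    while True:
        outbound = await sendQ.get()
        print("would write to transport:", outbound)  # transport.write(...) in the real app

def start_comms_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_until_complete(comms_main())

loop = asyncio.new_event_loop()
threading.Thread(target=start_comms_loop, args=(loop,), daemon=True).start()
ready.wait()

# The main thread never calls sendQ.put_nowait directly; it schedules the call
# on the comms loop, which is the thread-safe way to cross over.
for msg in ("hello", "world"):
    loop.call_soon_threadsafe(sendQ.put_nowait, msg)
    time.sleep(0.5)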

Python Notify when all files have been transferred

I am using "watchdog" api to keep checking changes in a folder in my filesystem. Whatever files changes in that folder, I pass them to a particular function which starts threads for each file I pass them.
But watchdog, or any other filesystem watcher api (in my knowledge), notifies users file by file i.e. as the files come by, they notify the user. But I would like it to notify me a whole bunch of files at a time so that I can pass that list to my function and take use of multi-threading. Currently, when I use "watchdog", it notifies me one file at a time and I am only able to pass that one file to my function. I want to pass it many files at a time to be able to have multithreading.
One solution that comes to my mind is: you see when you copy a bunch of files in a folder, OS shows you a progress bar. If it would be possible for me to be notified when that progress bar is done, then it would be a perfect solution for my question. But I don't know if that is possible.
Also I know that watchdog is a polling API, and an ideal API for watching filesystem would be interrupt driven api like pyinotify. But I didn't find any API which was interrupt driven and also cross platform. iWatch is good, but only for linux, and I want something for all OS. So, if you have suggestions on any other API, please do let me know.
Thanks.
Instead of accumulating filesystem events, you could spawn a pool of worker threads which get tasks from a common queue. The watchdog thread could then put tasks in the queue as filesystem events occur. Done this way, a worker thread can start working as soon as an event occurs.
For example,
import logging
import threading
import time
from queue import Queue

import watchdog.observers as observers
import watchdog.events as events

logger = logging.getLogger(__name__)
SENTINEL = None

class MyEventHandler(events.FileSystemEventHandler):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def on_any_event(self, event):
        super().on_any_event(event)
        self.queue.put(event)

def process(queue):
    while True:
        event = queue.get()
        logger.info(event)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    queue = Queue()
    num_workers = 4
    pool = [threading.Thread(target=process, args=(queue,))
            for i in range(num_workers)]
    for t in pool:
        t.daemon = True
        t.start()

    event_handler = MyEventHandler(queue)
    observer = observers.Observer()
    observer.schedule(
        event_handler,
        path='/tmp/testdir',
        recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
Running
% mkdir /tmp/testdir
% script.py
yields output like
[14:48:31 Thread-1] <FileCreatedEvent: src_path=/tmp/testdir/.#foo>
[14:48:32 Thread-2] <FileModifiedEvent: src_path=/tmp/testdir/foo>
[14:48:32 Thread-3] <FileModifiedEvent: src_path=/tmp/testdir/foo>
[14:48:32 Thread-4] <FileDeletedEvent: src_path=/tmp/testdir/.#foo>
[14:48:42 Thread-1] <FileDeletedEvent: src_path=/tmp/testdir/foo>
[14:48:47 Thread-2] <FileCreatedEvent: src_path=/tmp/testdir/.#bar>
[14:48:49 Thread-4] <FileCreatedEvent: src_path=/tmp/testdir/bar>
[14:48:49 Thread-4] <FileModifiedEvent: src_path=/tmp/testdir/bar>
[14:48:49 Thread-1] <FileDeletedEvent: src_path=/tmp/testdir/.#bar>
[14:48:54 Thread-2] <FileDeletedEvent: src_path=/tmp/testdir/bar>
Doug Hellmann has written an excellent set of tutorials (which has now been edited into a book) which should help you get started:
on using Queue
the threading module
how to set up and use a pool of worker processes
how to set up a pool of worker threads
I didn't actually end up using a multiprocessing Pool or ThreadPool as discussed
in the last two links, but you may find them useful anyway.
