Stream producer and consumer with asyncio gather in Python

I wrote a script for a socket server that simply listens for incoming connections and processes the incoming data. The chosen architecture is asyncio.start_server for the socket management and asyncio.Queue for passing the data between the producer and consumer coroutines. The problem is that the consume(q1) function is executed only once (at the first script startup); after that it is never executed again. Is the line run_until_complete(asyncio.gather()) wrong?
import asyncio
import functools
import logging

async def handle_readnwrite(reader, writer, q1): #Producer coroutine
    data = await reader.read(1024)
    message = data.decode()
    await writer.drain()
    await q1.put(message[3:20])
    await q1.put(None)
    writer.close() #Close the client socket

async def consume(q1): #Consumer coroutine
    while True:
        # wait for an item from the producer
        item = await q1.get()
        if item is None:
            logging.debug('None items') # the producer emits None to indicate that it is done
            break
        do_something(item)

loop = asyncio.get_event_loop()
q1 = asyncio.Queue()
producer_coro = asyncio.start_server(functools.partial(handle_readnwrite, q1=q1), '0.0.0.0', 3000)
consumer_coro = consume(q1)
loop.run_until_complete(asyncio.gather(consumer_coro, producer_coro))
try:
    loop.run_forever()
except KeyboardInterrupt:
    pass
loop.close()

handle_readnwrite enqueues the None terminator after every message, which makes consume break out of its loop and therefore finish the coroutine. If consume should keep running and process later messages, the producer must not send the None terminator after each message.
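A minimal sketch of that fix, assuming Python 3.7+ and one message per connection as in the question: the handler no longer enqueues None, and the consumer runs as a long-lived task. The demo() coroutine, its test client loop and the handled list (a stand-in for the question's do_something()) are hypothetical scaffolding; in the real server you would create the consumer task and then await server.serve_forever() instead of running the client loop.

```python
import asyncio
import functools

async def handle_readnwrite(reader, writer, q1):
    """Producer: called once per client connection."""
    data = await reader.read(1024)
    await q1.put(data.decode()[3:20])
    # No None sentinel here: a sentinel would stop the consumer for good.
    writer.close()
    await writer.wait_closed()

async def consume(q1, handled):
    """Consumer: lives as long as the server does."""
    while True:
        item = await q1.get()
        handled.append(item)  # stand-in for the question's do_something(item)
        q1.task_done()

async def demo(messages):
    q1 = asyncio.Queue()
    handled = []
    consumer = asyncio.create_task(consume(q1, handled))
    server = await asyncio.start_server(
        functools.partial(handle_readnwrite, q1=q1), "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    for msg in messages:  # each message arrives on a separate connection
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(msg.encode())
        await writer.drain()
        await reader.read()  # returns at EOF, i.e. after the handler enqueued
        writer.close()
        await writer.wait_closed()
    await q1.join()  # every enqueued item has been consumed
    consumer.cancel()
    server.close()
    await server.wait_closed()
    return handled
```

For example, asyncio.run(demo(["01:first message", "02:second message"])) shows that the consumer handles both connections, not just the first one.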

Related

Python: concurrently pending on async coroutine and synchronous function

I'd like to establish an SSH SOCKS tunnel (using asyncssh) during the execution of a synchronous function. When the function is done, I want to tear down the tunnel and exit.
Apparently some async function has to be awaited to keep the tunnel working, so the important thing is that conn.wait_closed() and the synchronous function are executed concurrently. So I am quite sure that I actually need a second thread.
I first tried some saner things using a ThreadPoolExecutor with run_in_executor, but then ended up with the abysmal multithreaded variant below.
#! /usr/bin/env python3
import traceback
from threading import Thread
from concurrent.futures import ThreadPoolExecutor
import asyncio, asyncssh, sys

_server="127.0.0.1"
_port=22
_proxy_port=8080

async def run_client():
    conn = await asyncio.wait_for(
        asyncssh.connect(
            _server,
            port=_port,
            options=asyncssh.SSHClientConnectionOptions(client_host_keysign=True),
        ),
        10,
    )
    listener = await conn.forward_socks('127.0.0.1', _proxy_port)
    return conn

async def do_stuff(func):
    try:
        conn = await run_client()
        print("SSH tunnel active")

        def start_loop(loop):
            asyncio.set_event_loop(loop)
            try:
                loop.run_forever()
            except Exception as e:
                print(f"worker loop: {e}")

        async def thread_func():
            ret=await func()
            print("Func done - tearing done worker thread and SSH connection")
            conn.close()
            # asyncio.get_event_loop().stop()
            return ret

        func_loop = asyncio.new_event_loop()
        func_thread = Thread(target=start_loop, args=(func_loop,))
        func_thread.start()
        print("thread started")
        fut = asyncio.run_coroutine_threadsafe(thread_func(), func_loop)
        print(f"fut scheduled: {fut}")
        done = await asyncio.gather(asyncio.wrap_future(fut), conn.wait_closed())
        print("wait done")
        for ret in done:
            print(f"ret={ret}")
        # Canceling pending tasks and stopping the loop
        # asyncio.gather(*asyncio.Task.all_tasks()).cancel()
        print("stopping func_loop")
        func_loop.call_soon_threadsafe(func_loop.stop())
        print("joining func_thread")
        func_thread.join()
        print("joined func_thread")
    except (OSError, asyncssh.Error) as exc:
        sys.exit('SSH connection failed: ' + str(exc))
    except (Exception) as exc:
        sys.exit('Unhandled exception: ' + str(exc))
        traceback.print_exc()

async def just_wait():
    print("starting just_wait")
    input()
    print("ending just_wait")
    return 42

asyncio.get_event_loop().run_until_complete(do_stuff(just_wait))
It actually "works" "correctly" until the end, where I get an exception while joining the worker thread. I presume this is because something I do is not thread-safe.
Exception in callback None()
handle: <Handle>
Traceback (most recent call last):
File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
TypeError: 'NoneType' object is not callable
To test the code you must have a local SSH server running with key files setup for your user. You may want to change the _port variable.
I am looking for the cause of the exception and/or a version of the program that requires less manual thread management and possibly uses just a single event loop. I don't know how to achieve the latter when I want to await the two things concurrently (as in the asyncio.gather call).
The immediate cause of your error is this line:
# incorrect
func_loop.call_soon_threadsafe(func_loop.stop())
The intention is to call func_loop.stop() in the thread that runs the func_loop event loop. But as written, it invokes func_loop.stop() in the current thread and passes its return value (None) to call_soon_threadsafe as the function to invoke. This causes call_soon_threadsafe to complain that None is not callable. To fix the immediate problem, you should drop the extra parentheses and invoke the method as:
# correct
func_loop.call_soon_threadsafe(func_loop.stop)
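To see the corrected call in context, here is a minimal, self-contained sketch (no SSH involved) that runs an event loop in a worker thread, schedules one coroutine on it from the main thread, and then stops it with call_soon_threadsafe(func_loop.stop). The function name run_and_stop_worker_loop and the asyncio.sleep(0, result=42) placeholder coroutine are illustrative assumptions, not part of the question.

```python
import asyncio
import threading

def start_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()  # returns once loop.stop() runs inside this thread

def run_and_stop_worker_loop():
    func_loop = asyncio.new_event_loop()
    func_thread = threading.Thread(target=start_loop, args=(func_loop,))
    func_thread.start()
    # Schedule a placeholder coroutine on the worker loop from this thread.
    fut = asyncio.run_coroutine_threadsafe(asyncio.sleep(0, result=42), func_loop)
    result = fut.result(timeout=5)
    # Pass the bound method itself; the worker thread will call it.
    func_loop.call_soon_threadsafe(func_loop.stop)
    func_thread.join(timeout=5)
    alive = func_thread.is_alive()
    func_loop.close()  # safe: the loop is no longer running
    return result, alive
```

Writing func_loop.stop() instead would stop the loop from the wrong thread and hand None to call_soon_threadsafe, which is exactly the traceback in the question.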
However, the code is definitely over-complicated as written:
it doesn't make sense to create a new event loop when you are already inside an event loop
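just_wait shouldn't be async def, since it doesn't await anything; it calls the blocking input(), so it stalls whatever event loop runs it.
sys.exit with a string argument prints the string to stderr and exits with status 1; that works, but the original traceback is lost. Also, it doesn't make sense to attempt to print a backtrace after the call to sys.exit, because sys.exit raises SystemExit and the following line is never reached.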
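A small sketch of reordering that except branch so the traceback is printed before exiting. The helper name report_and_exit is hypothetical; it only uses the standard traceback and sys modules.

```python
import sys
import traceback

def report_and_exit(exc):
    # Print the traceback first: sys.exit() raises SystemExit immediately,
    # so nothing placed after it would ever run.
    traceback.print_exception(type(exc), exc, exc.__traceback__)
    # A string argument is printed to stderr and the exit status becomes 1.
    sys.exit('Unhandled exception: ' + str(exc))
```

In the question's code this would be used as `except Exception as exc: report_and_exit(exc)`.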
To run a non-async function from asyncio, just use run_in_executor and pass it the non-async function as-is. You need neither an extra thread nor an extra event loop: run_in_executor takes care of the thread and connects it to your current event loop, effectively making the sync function awaitable. For example (untested):
async def do_stuff(func):
    conn = await run_client()
    print("SSH tunnel active")
    loop = asyncio.get_event_loop()
    ret = await loop.run_in_executor(None, func)
    print(f"ret={ret}")
    conn.close()
    await conn.wait_closed()
    print("wait done")

def just_wait():
    # just_wait is a regular function; it can call blocking code,
    # but it cannot await
    print("starting just_wait")
    input()
    print("ending just_wait")
    return 42

asyncio.get_event_loop().run_until_complete(do_stuff(just_wait))
If you need to await things in just_wait, you can make it async and use run_in_executor for the actual blocking code inside it:
async def do_stuff():
    conn = await run_client()
    print("SSH tunnel active")
    ret = await just_wait()
    print(f"ret={ret}")
    conn.close()
    await conn.wait_closed()
    print("wait done")

async def just_wait():
    # just_wait is an async function, it can await, but
    # must invoke blocking code through run_in_executor
    print("starting just_wait")
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, input)
    print("ending just_wait")
    return 42

asyncio.run(do_stuff())
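The key point of both versions is that the event loop stays free while the synchronous function runs in the executor's thread, so other coroutines (such as whatever keeps the SSH connection serviced) keep running on the single loop. A minimal sketch without SSH, where blocking_work and keep_loop_busy are hypothetical stand-ins for the synchronous function and the connection's background work:

```python
import asyncio
import time

def blocking_work():
    # Hypothetical stand-in for the synchronous function (e.g. one that
    # calls input()); it blocks its own thread for 0.2 s.
    time.sleep(0.2)
    return 42

async def keep_loop_busy(stop, ticks):
    # Hypothetical stand-in for work that must keep running on the loop.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def run_sync_alongside_loop():
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    ticks = []
    background = asyncio.create_task(keep_loop_busy(stop, ticks))
    # blocking_work runs in the default ThreadPoolExecutor; awaiting it
    # suspends only this coroutine, so keep_loop_busy keeps ticking.
    ret = await loop.run_in_executor(None, blocking_work)
    stop.set()
    await background
    return ret, len(ticks)
```

asyncio.run(run_sync_alongside_loop()) returns 42 together with several ticks recorded while the blocking call was in progress.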

Python async: Waiting for stdin input while doing other stuff

I'm trying to create a WebSocket command line client that waits for messages from a WebSocket server but waits for user input at the same time.
Regularly polling multiple online sources every second works fine on the server (the one running at localhost:6789 in this example), but instead of Python's normal sleep() function it uses asyncio.sleep(), which makes sense because sleeping and asynchronously sleeping aren't the same thing, at least not under the hood.
Similarly, waiting for user input and asynchronously waiting for user input aren't the same thing, but I can't figure out how to asynchronously wait for user input in the same way that I can asynchronously wait for an arbitrary amount of seconds, so that the client can deal with incoming messages from the WebSocket server while simultaneously waiting for user input.
The comment below in the else-clause of monitor_cmd() hopefully explains what I'm getting at:
import asyncio
import json
import websockets

async def monitor_ws():
    uri = 'ws://localhost:6789'
    async with websockets.connect(uri) as websocket:
        async for message in websocket:
            print(json.dumps(json.loads(message), indent=2, sort_keys=True))

async def monitor_cmd():
    while True:
        sleep_instead = False
        if sleep_instead:
            await asyncio.sleep(1)
            print('Sleeping works fine.')
        else:
            # Seems like I need the equivalent of:
            # line = await asyncio.input('Is this your line? ')
            line = input('Is this your line? ')
            print(line)

try:
    asyncio.get_event_loop().run_until_complete(asyncio.gather(
        monitor_ws(),
        monitor_cmd()
    ))
except KeyboardInterrupt:
    quit()
This code just waits for input indefinitely and does nothing else in the meantime, and I understand why. What I don't understand, is how to fix it. :)
Of course, if I'm thinking about this problem in the wrong way, I'd be very happy to learn how to remedy that as well.
You can use the aioconsole third-party package to interact with stdin in an asyncio-friendly manner:
line = await aioconsole.ainput('Is this your line? ')
Borrowing heavily from aioconsole, if you would rather avoid using an external library you could define your own async input function:
import asyncio
import sys

async def ainput(string: str) -> str:
    await asyncio.get_event_loop().run_in_executor(
            None, lambda s=string: sys.stdout.write(s+' '))
    return await asyncio.get_event_loop().run_in_executor(
            None, sys.stdin.readline)
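Applied to the question's structure, the executor-based approach lets the input loop and another coroutine run side by side. A sketch under stated assumptions: ainput here is a simplified variant that takes an optional stream (so it can be fed something other than stdin), ticker is a hypothetical stand-in for monitor_ws(), and an empty line or EOF is assumed to end the session.

```python
import asyncio
import sys

async def ainput(prompt="", stream=None):
    # Run the blocking readline() in the default thread pool, so the event
    # loop keeps serving other tasks while it waits for a line.
    stream = sys.stdin if stream is None else stream
    print(prompt, end="", flush=True)
    loop = asyncio.get_running_loop()
    line = await loop.run_in_executor(None, stream.readline)
    return line.rstrip("\n")

async def ticker(stop, ticks):
    # Hypothetical stand-in for monitor_ws(): keeps running during input.
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0.01)

async def monitor_cmd(stream, stop):
    lines = []
    while True:
        line = await ainput("Is this your line? ", stream)
        if not line:  # assumption: an empty line (or EOF) ends the session
            break
        lines.append(line)
    stop.set()
    return lines

async def run_cli(stream=None):
    stop = asyncio.Event()
    ticks = []
    lines, _ = await asyncio.gather(monitor_cmd(stream, stop), ticker(stop, ticks))
    return lines, ticks
```

Against a real terminal you would call asyncio.run(run_cli()); the ticker keeps running while each prompt waits.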
Also borrowing heavily from aioconsole, there are two ways to handle this.
Start a new daemon thread:
import sys
import asyncio
import threading
from concurrent.futures import Future

async def run_as_daemon(func, *args):
    future = Future()
    future.set_running_or_notify_cancel()

    def daemon():
        try:
            result = func(*args)
        except Exception as e:
            future.set_exception(e)
        else:
            future.set_result(result)

    threading.Thread(target=daemon, daemon=True).start()
    return await asyncio.wrap_future(future)

async def main():
    data = await run_as_daemon(sys.stdin.readline)
    print(data)

if __name__ == "__main__":
    asyncio.run(main())
Use a stream reader:
import sys
import asyncio

async def get_stream_reader(pipe) -> asyncio.StreamReader:
    loop = asyncio.get_event_loop()
    reader = asyncio.StreamReader(loop=loop)
    protocol = asyncio.StreamReaderProtocol(reader)
    await loop.connect_read_pipe(lambda: protocol, pipe)
    return reader

async def main():
    reader = await get_stream_reader(sys.stdin)
    data = await reader.readline()
    print(data)

if __name__ == "__main__":
    asyncio.run(main())

Handling ensure_future and its missing tasks

I have a streaming application that almost continuously takes the data given as input, sends an HTTP request using that value, and does something with the returned value.
Obviously, to speed things up I've used the asyncio and aiohttp libraries in Python 3.7 to get the best performance, but it becomes hard to debug given how fast the data moves.
This is what my code looks like:
'''
Gets the final requests
'''
async def apiRequest(info, url, session, reqType, post_data=''):
    if reqType:
        async with session.post(url, data = post_data) as response:
            info['response'] = await response.text()
    else:
        async with session.get(url+post_data) as response:
            info['response'] = await response.text()
    logger.debug(info)
    return info

'''
Loops through the batches and sends it for request
'''
async def main(data, listOfData):
    tasks = []
    async with ClientSession() as session:
        for reqData in listOfData:
            try:
                task = asyncio.ensure_future(apiRequest(**reqData))
                tasks.append(task)
            except Exception as e:
                print(e)
                exc_type, exc_obj, exc_tb = sys.exc_info()
                fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                print(exc_type, fname, exc_tb.tb_lineno)
        responses = await asyncio.gather(*tasks)
    return responses #list of APIResponses

'''
Streams data in and prepares batches to send for requests
'''
async def Kconsumer(data, loop, batchsize=100):
    consumer = AIOKafkaConsumer(**KafkaConfigs)
    await consumer.start()
    dataPoints = []
    async for msg in consumer:
        try:
            sys.stdout.flush()
            consumedMsg = loads(msg.value.decode('utf-8'))
            if consumedMsg['tid']:
                dataPoints.append(loads(msg.value.decode('utf-8')))
            if len(dataPoints)==batchsize or time.time() - startTime>5:
                '''
                #1: The task below goes and sends HTTP GET requests in bulk using aiohttp
                '''
                task = asyncio.ensure_future(getRequests(data, dataPoints))
                res = await asyncio.gather(*[task])
                if task.done():
                    outputs = []
                    '''
                    #2: Does some ETL on the returned values
                    '''
                    ids = await asyncio.gather(*[doSomething(**{'tid':x['tid'],
                                                  'cid':x['cid'], 'tn':x['tn'],
                                                  'id':x['id'], 'ix':x['ix'],
                                                  'ac':x['ac'], 'output':to_dict(xmltodict.parse(x['response'],encoding='utf-8')),
                                                  'loop':loop, 'option':1}) for x in res[0]])
                    simplySaveDataIntoDataBase(id) # This is where I see some missing data in the database
                dataPoints = []
        except Exception as e:
            logger.error(e)
            logger.error(traceback.format_exc())
            exc_type, exc_obj, exc_tb = sys.exc_info()
            fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
            logger.error(str(exc_type) +' '+ str(fname) +' '+ str(exc_tb.tb_lineno))

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    asyncio.ensure_future(Kconsumer(data, loop, batchsize=100))
    loop.run_forever()
Does the ensure_future need to be awaited?
How does aiohttp handle requests that come a little later than the others? Shouldn't it hold the whole batch back instead of forgetting about it altogether?
Does the ensure_future need to be awaited?
Yes, and your code is doing that already. await asyncio.gather(*tasks) awaits the provided tasks and returns their results in the same order.
Note that await asyncio.gather(*[task]) doesn't make sense, because it is equivalent to await asyncio.gather(task), which is again equivalent to await task. In other words, when you need the result of getRequests(data, dataPoints), you can write res = await getRequests(data, dataPoints) without the ceremony of first calling ensure_future() and then calling gather().
In fact, you almost never need to call ensure_future yourself:
if you need to await multiple tasks, you can pass coroutine objects directly to gather, e.g. gather(coroutine1(), coroutine2()).
if you need to spawn a background task, you can call asyncio.create_task(coroutine(...))
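A minimal sketch of both patterns, using a hypothetical fetch() coroutine in place of getRequests()/apiRequest(). The delays are chosen so the calls finish out of order, which shows that gather still returns results in argument order.

```python
import asyncio

async def fetch(n):
    # Hypothetical stand-in for an HTTP request; larger n finishes sooner.
    await asyncio.sleep(0.01 * (3 - n))
    return n * 10

async def demo_gather():
    # A single coroutine: just await it, no ensure_future()/gather() needed.
    single = await fetch(1)
    # Several coroutines: pass the coroutine objects straight to gather.
    several = await asyncio.gather(fetch(0), fetch(1), fetch(2))
    # A background task: create it, do other work, await it later.
    background = asyncio.create_task(fetch(2))
    await asyncio.sleep(0)  # placeholder for "other work"
    return single, several, await background
```

Here asyncio.run(demo_gather()) gives (10, [0, 10, 20], 20).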
How does aiohttp handle requests that come a little later than the others? Shouldn't it hold the whole batch back instead of forgetting about it altogether?
If you use gather, all requests must finish before any of them return. (That is not aiohttp policy, it's how gather works.) If you need to implement a timeout, you can use asyncio.wait_for or similar.
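One way to apply asyncio.wait_for is to give each request its own deadline, so a single slow request becomes a None placeholder instead of holding back the whole batch. This is a sketch: fake_request is a hypothetical stand-in for apiRequest(), and mapping a timeout to None is an assumed policy, not something gather or aiohttp does by itself.

```python
import asyncio

async def fake_request(delay, value):
    # Hypothetical stand-in for apiRequest(): takes `delay` seconds.
    await asyncio.sleep(delay)
    return value

async def with_timeout(coro, timeout):
    # wait_for cancels the request when the deadline passes.
    try:
        return await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        return None

async def run_batch():
    batch = [fake_request(0.01, "a"), fake_request(1.0, "slow"), fake_request(0.02, "c")]
    # gather still preserves order; late entries show up as None.
    return await asyncio.gather(*(with_timeout(c, 0.2) for c in batch))
```

With these delays, asyncio.run(run_batch()) returns ["a", None, "c"] after roughly 0.2 seconds.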

Asyncio server stops to respond after the first request

I'm trying to write an asyncio-based server. The problem is that it stops responding after the first request.
My code is built upon this template for echo-server and this method to pass parameters to coroutines.
import asyncio

class MsgHandler:
    def __init__(self, mem):
        # here (mem:dict) I store received metrics
        self.mem = mem

    async def handle(self, reader, writer):
        #this coroutine handles requests
        data = await reader.read(1024)
        print('request:', data.decode('utf-8'))
        # read_msg returns an answer based on the request received
        # My server closes connection on every second request
        # For the first one, everything works as intended,
        # so I don't think the problem is in read_msg()
        response = read_msg(data.decode('utf-8'), self.mem)
        print('response:', response)
        writer.write(response.encode('utf-8'))
        await writer.drain()
        writer.close()

def run_server(host, port):
    mem = {}
    msg_handler = MsgHandler(mem)
    loop = asyncio.get_event_loop()
    coro = asyncio.start_server(msg_handler.handle, host, port)
    server = loop.run_until_complete(coro)
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        pass
    server.close()
    loop.run_until_complete(server.wait_closed())
    loop.close()
On the client-side I either get an empty response or ConnectionResetError (104, 'Connection reset by peer').
You are closing the writer with writer.close() in the handler, which closes the socket.
The 3.9 docs on StreamWriter.close() state that it closes the stream and the underlying socket.
Also, if you don't close the stream writer, you would still have to store it somewhere (or keep using it inside the handler) in order to keep receiving messages over that same connection.
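A sketch of keeping the connection open by looping inside the handler until the client hangs up. Two assumptions are worth flagging: requests are taken to be newline-delimited (so readline() replaces read(1024)), and read_msg below is a hypothetical stand-in for the question's function that just counts requests. The demo() client is test scaffolding that sends two requests over one connection.

```python
import asyncio

def read_msg(request, mem):
    # Hypothetical stand-in for the question's read_msg().
    mem[len(mem)] = request
    return f"ok {len(mem)}\n"

class MsgHandler:
    def __init__(self, mem):
        self.mem = mem

    async def handle(self, reader, writer):
        # Serve this connection until the client closes it,
        # instead of closing after the first request.
        while True:
            data = await reader.readline()
            if not data:  # EOF: the client hung up
                break
            response = read_msg(data.decode('utf-8').rstrip('\n'), self.mem)
            writer.write(response.encode('utf-8'))
            await writer.drain()
        writer.close()
        await writer.wait_closed()

async def demo():
    server = await asyncio.start_server(MsgHandler({}).handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    replies = []
    for request in ('put a 1', 'get a'):  # two requests, one connection
        writer.write((request + '\n').encode('utf-8'))
        await writer.drain()
        replies.append((await reader.readline()).decode('utf-8'))
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return replies
```

The second request now gets an answer instead of an empty response or a connection reset.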

Wait for db future to complete?

I have written code for a Sanic application that uses RethinkDB as its backend database. I want to wait for the RethinkDB connection function to initialise before the other functions run, since they depend on the RethinkDB connection.
My rethinkdb connection initialization function is:
async def open_connections(app):
    logger.warning('opening database connection')
    r.set_loop_type('asyncio')
    connection= await r.connect(
        port=app.config.DATABASE["port"],
        host=app.config.DATABASE["ip"],
        db=app.config.DATABASE["dbname"],
        user=app.config.DATABASE["user"],
        password=app.config.DATABASE["password"])
    print (f"connection established {connection}")
    return connection
The callback function, which is executed after the future resolves, is:
def db_callback(future):
    exc = future.exception()
    if exc:
        # Handle wonderful empty TimeoutError exception
        logger.error(f"From mnemonic api isnt working with error {exc}")
        sys.exit(1)
    result = future.result()
    return result
The Sanic app:
def main():
    app = Sanic(__name__)
    load_config(app)
    zmq = ZMQEventLoop()
    asyncio.set_event_loop(zmq)
    server = app.create_server(
        host=app.config.HOST, port=app.config.PORT, debug=app.config.DEBUG, access_log=True)
    loop = asyncio.get_event_loop()
    ## does not wait for the server to start; this returns a future object
    asyncio.ensure_future(server)
    ## does not wait for the rethinkdb connection to initialize; this returns
    ## a future object
    future = asyncio.ensure_future(open_connections(app))
    result = future.add_done_callback(db_callback)
    logger.debug(result)
    future = asyncio.ensure_future(insert_mstr_account(app))
    future.add_done_callback(insert_mstr_acc_callback)
    future = asyncio.ensure_future(check_master_accounts(app))
    future.add_done_callback(callbk_check_master_accounts)
    signal(SIGINT, lambda s, f: loop.close())
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        close_connections(app)
        loop.stop()
When I start this app, the print statement in the open_connections function executes last.
future = asyncio.ensure_future(open_connections(app))
result = future.add_done_callback(db_callback)
ensure_future schedules the coroutine to run concurrently; it does not wait for it to finish.
add_done_callback does not wait for the completion of the future either; it simply registers a function to be called after the future is completed, and it returns None.
So you should explicitly await the open_connections future before performing other functions:
future = asyncio.ensure_future(open_connections(app))
future.add_done_callback(db_callback)
result = await future
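A small sketch of both points, using a hypothetical fake_open_connections() in place of the RethinkDB call: add_done_callback returns None right away (so result in the question's main() is always None), and the callback only runs once the future has completed.

```python
import asyncio

async def fake_open_connections():
    await asyncio.sleep(0.01)  # hypothetical stand-in for r.connect(...)
    return "conn"

async def demo_callback_order():
    events = []
    future = asyncio.ensure_future(fake_open_connections())
    returned = future.add_done_callback(
        lambda f: events.append(("callback", f.result())))
    events.append(("after add_done_callback", returned))
    result = await future  # this is what actually waits for the connection
    events.append(("awaited", result))
    return events
```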
EDIT: the answer above applies only inside a coroutine.
Here main() is a regular function, and we want to wait for the future to complete in its body. To do that, use loop.run_until_complete:
def main():
    ...
    future = asyncio.ensure_future(open_connections(app))
    future.add_done_callback(db_callback)
    result = loop.run_until_complete(future)
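The same ordering can be sketched with plain asyncio, assuming hypothetical stand-ins for the RethinkDB connection and the two dependent coroutines (no Sanic or rethinkdb involved): the connection is obtained with run_until_complete first, and only then are the dependent coroutines run. In the real app you would also schedule the server and call loop.run_forever() afterwards.

```python
import asyncio

async def open_connections(config):
    # Hypothetical stand-in for the r.connect(...) call.
    await asyncio.sleep(0.01)
    return {"connected_to": config["dbname"]}

async def insert_mstr_account(conn, log):
    # Hypothetical dependent step: needs an open connection.
    log.append(f"insert using {conn['connected_to']}")

async def check_master_accounts(conn, log):
    log.append(f"check using {conn['connected_to']}")

def start_app(config):
    log = []
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        # Blocks until the connection exists; nothing that depends on it
        # has been scheduled yet.
        conn = loop.run_until_complete(open_connections(config))
        log.append("connection established")
        # Now the dependent coroutines can run, here concurrently.
        loop.run_until_complete(asyncio.gather(
            insert_mstr_account(conn, log),
            check_master_accounts(conn, log),
        ))
    finally:
        asyncio.set_event_loop(None)
        loop.close()
    return log
```

start_app({"dbname": "test"}) records the connection before either dependent step.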
