Wait for db future to complete?

I have written code for a sanic application, with rethinkdb being used as the backend database. I want to wait for the rethinkdb connection function to initialise before the other functions run, as they depend on the rethinkdb connection.
My rethinkdb connection initialization function is:
async def open_connections(app):
    logger.warning('opening database connection')
    r.set_loop_type('asyncio')
    connection = await r.connect(
        port=app.config.DATABASE["port"],
        host=app.config.DATABASE["ip"],
        db=app.config.DATABASE["dbname"],
        user=app.config.DATABASE["user"],
        password=app.config.DATABASE["password"])
    print(f"connection established {connection}")
    return connection
The callback function that will be executed after the future resolves is:
def db_callback(future):
    exc = future.exception()
    if exc:
        # Handle wonderful empty TimeoutError exception
        logger.error(f"From mnemonic api isn't working with error {exc}")
        sys.exit(1)
    result = future.result()
    return result
The sanic app:
def main():
    app = Sanic(__name__)
    load_config(app)
    zmq = ZMQEventLoop()
    asyncio.set_event_loop(zmq)
    server = app.create_server(
        host=app.config.HOST, port=app.config.PORT,
        debug=app.config.DEBUG, access_log=True)
    loop = asyncio.get_event_loop()
    ## does not wait for the server to start; this will return a future object
    asyncio.ensure_future(server)
    ## does not wait for the rethinkdb connection to initialize; this will
    ## return a future object
    future = asyncio.ensure_future(open_connections(app))
    result = future.add_done_callback(db_callback)
    logger.debug(result)
    future = asyncio.ensure_future(insert_mstr_account(app))
    future.add_done_callback(insert_mstr_acc_callback)
    future = asyncio.ensure_future(check_master_accounts(app))
    future.add_done_callback(callbk_check_master_accounts)
    signal(SIGINT, lambda s, f: loop.close())
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        close_connections(app)
        loop.stop()
When I start this app, the print statement in open_connections executes last.

future = asyncio.ensure_future(open_connections(app))
result = future.add_done_callback(db_callback)
ensure_future schedules the coroutine to run concurrently and immediately returns a future object; it does not wait for the coroutine to finish.
add_done_callback does not wait for the completion of the future either; it simply schedules a function call to run after the future is completed. (Note that add_done_callback returns None, so assigning its return value to result is pointless.)
So you should explicitly await the open_connections future before performing other functions:
future = asyncio.ensure_future(open_connections(app))
future.add_done_callback(db_callback)
result = await future
EDIT: the answer above applies only inside a coroutine, since await is only valid in an async def function.
In this case we want to wait for the completion of the future inside a plain function body. To do that we should use loop.run_until_complete:
def main():
    ...
    future = asyncio.ensure_future(open_connections(app))
    future.add_done_callback(db_callback)
    result = loop.run_until_complete(future)
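Putting it together with the question's main(), the flow might look like this (a minimal sketch reusing the question's helper names; signal handling and shutdown are omitted for brevity):
def main():
    app = Sanic(__name__)
    load_config(app)
    zmq = ZMQEventLoop()
    asyncio.set_event_loop(zmq)
    loop = asyncio.get_event_loop()

    # Block here until the database connection actually exists...
    app.conn = loop.run_until_complete(open_connections(app))

    # ...and only then schedule the tasks that depend on it.
    asyncio.ensure_future(app.create_server(
        host=app.config.HOST, port=app.config.PORT,
        debug=app.config.DEBUG, access_log=True))
    future = asyncio.ensure_future(insert_mstr_account(app))
    future.add_done_callback(insert_mstr_acc_callback)
    future = asyncio.ensure_future(check_master_accounts(app))
    future.add_done_callback(callbk_check_master_accounts)
    loop.run_forever()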

Related

asyncio.wait not returning on first exception

I have an AMQP publisher class with the following methods. on_response is the callback that is called when a consumer sends back a message to the RPC queue I set up, i.e. the self.callback_queue.name you see in the reply_to of the Message. publish publishes out to a direct exchange with a routing key that has multiple consumers (very similar to a fanout), and multiple responses come back. I create a number of futures equal to the number of responses I expect, and asyncio.wait for those futures to complete. As I get responses back on the queue and consume them, I set the results on the futures.
async def on_response(self, message: IncomingMessage):
    if message.correlation_id is None:
        logger.error(f"Bad message {message!r}")
        await message.ack()
        return
    body = message.body.decode('UTF-8')
    future = self.futures[message.correlation_id].pop()
    if hasattr(body, 'error'):
        future.set_execption(body)
    else:
        future.set_result(body)
    await message.ack()
async def publish(self, routing_key, expected_response_count, msg,
                  timeout=None, return_partial=False):
    if not self.connected:
        logger.info("Publisher not connected. Waiting to connect first.")
        await self.connect()
    correlation_id = str(uuid.uuid4())
    futures = [self.loop.create_future() for _ in range(expected_response_count)]
    self.futures[correlation_id] = futures
    await self.exchange.publish(
        Message(
            str(msg).encode(),
            content_type="text/plain",
            correlation_id=correlation_id,
            reply_to=self.callback_queue.name,
        ),
        routing_key=routing_key,
    )
    done, pending = await asyncio.wait(futures, timeout=timeout,
                                       return_when=asyncio.FIRST_EXCEPTION)
    if not return_partial and pending:
        raise asyncio.TimeoutError(
            f'Failed to return all results for publish to {routing_key}')
    for f in pending:
        f.cancel()
    del self.futures[correlation_id]
    results = []
    for future in done:
        try:
            results.append(json.loads(future.result()))
        except json.decoder.JSONDecodeError as e:
            logger.error(f'Client did not return JSON!! {e!r}')
            logger.info(future.result())
    return results
My goal is to either wait until all futures are finished or a timeout occurs. This is all working nicely at the moment. What doesn't work is that when I added return_when=asyncio.FIRST_EXCEPTION, the await asyncio.wait(...) does not finish after the first call of future.set_exception(...) as I thought it would.
What do I need to do with the future so that, when I get a response back and see that an error occurred on the consumer side (before the timeout, or even before other responses), the await asyncio.wait stops blocking? I was looking at the documentation, which says:
The function will return when any future finishes by raising an exception
when return_when=asyncio.FIRST_EXCEPTION. My first thought is that I'm not raising an exception in my future correctly, but I'm having trouble finding out exactly how I should do that. From the API documentation for the Future class, it looks like I'm doing the right thing.
When I created a minimal example, I realized I was actually doing things mostly right after all, and had glossed over other errors causing this not to work.
The most important change I had to make was to actually pass an Exception object (a subclass of BaseException) to the set_exception method. Here is my minimal example:
import asyncio

async def set_after(future, t, body, raise_exception):
    await asyncio.sleep(t)
    if raise_exception:
        future.set_exception(Exception("problem"))
    else:
        future.set_result(body)
        print(body)

async def main():
    loop = asyncio.get_event_loop()
    futures = [loop.create_future() for _ in range(2)]
    asyncio.create_task(set_after(futures[0], 3, 'hello', raise_exception=True))
    asyncio.create_task(set_after(futures[1], 7, 'world', raise_exception=False))
    print(futures)
    done, pending = await asyncio.wait(futures, timeout=10,
                                       return_when=asyncio.FIRST_EXCEPTION)
    print(done)
    print(pending)

asyncio.run(main())
In this line of code, if hasattr(body, 'error'):, body was a string; I thought it was JSON at that point already. I should have been using "error" in body as my condition in any case. Whoops!
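For reference, with both fixes applied, the relevant branch of the original on_response might look like this (a sketch; it assumes the error marker appears as a substring of the decoded body):
body = message.body.decode('UTF-8')
future = self.futures[message.correlation_id].pop()
if 'error' in body:
    # set_exception needs an actual exception instance, not a plain string
    future.set_exception(Exception(body))
else:
    future.set_result(body)
await message.ack()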

Python: concurrently pending on async coroutine and synchronous function

I'd like to establish an SSH SOCKS tunnel (using asyncssh) during the execution of a synchronous function. When the function is done, I want to tear down the tunnel and exit.
Apparently some async function has to be awaited to keep the tunnel working, so the important thing is that conn.wait_closed() and the synchronous function are executed concurrently. I am therefore quite sure that I actually need a second thread.
I first tried some saner things using a ThreadPoolExecutor with run_in_executor but then ended up with the abysmal multithreaded variant below.
#! /usr/bin/env python3

import traceback
from threading import Thread
from concurrent.futures import ThreadPoolExecutor
import asyncio, asyncssh, sys

_server = "127.0.0.1"
_port = 22
_proxy_port = 8080

async def run_client():
    conn = await asyncio.wait_for(
        asyncssh.connect(
            _server,
            port=_port,
            options=asyncssh.SSHClientConnectionOptions(client_host_keysign=True),
        ),
        10,
    )
    listener = await conn.forward_socks('127.0.0.1', _proxy_port)
    return conn

async def do_stuff(func):
    try:
        conn = await run_client()
        print("SSH tunnel active")

        def start_loop(loop):
            asyncio.set_event_loop(loop)
            try:
                loop.run_forever()
            except Exception as e:
                print(f"worker loop: {e}")

        async def thread_func():
            ret = await func()
            print("Func done - tearing down worker thread and SSH connection")
            conn.close()
            # asyncio.get_event_loop().stop()
            return ret

        func_loop = asyncio.new_event_loop()
        func_thread = Thread(target=start_loop, args=(func_loop,))
        func_thread.start()
        print("thread started")
        fut = asyncio.run_coroutine_threadsafe(thread_func(), func_loop)
        print(f"fut scheduled: {fut}")
        done = await asyncio.gather(asyncio.wrap_future(fut), conn.wait_closed())
        print("wait done")
        for ret in done:
            print(f"ret={ret}")
        # Canceling pending tasks and stopping the loop
        # asyncio.gather(*asyncio.Task.all_tasks()).cancel()
        print("stopping func_loop")
        func_loop.call_soon_threadsafe(func_loop.stop())
        print("joining func_thread")
        func_thread.join()
        print("joined func_thread")
    except (OSError, asyncssh.Error) as exc:
        sys.exit('SSH connection failed: ' + str(exc))
    except Exception as exc:
        sys.exit('Unhandled exception: ' + str(exc))
        traceback.print_exc()

async def just_wait():
    print("starting just_wait")
    input()
    print("ending just_wait")
    return 42

asyncio.get_event_loop().run_until_complete(do_stuff(just_wait))
It actually "works" "correctly" until the end, where I get an exception while joining the worker thread, I presume because something I do is not thread-safe.
Exception in callback None()
handle: <Handle>
Traceback (most recent call last):
File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
TypeError: 'NoneType' object is not callable
To test the code you must have a local SSH server running with key files set up for your user. You may want to change the _port variable.
I am looking for the reason for the exception and/or a version of the program that requires less manual intervention in the threading and possibly uses just a single event loop. I don't know how to achieve the latter when I want to await two things at once (as in the asyncio.gather call).
The immediate cause of your error is this line:
# incorrect
func_loop.call_soon_threadsafe(func_loop.stop())
The intention is to call func_loop.stop() in the thread that runs the func_loop event loop. But as written, it invokes func_loop.stop() in the current thread and passes its return value (None) to call_soon_threadsafe as the function to invoke. This causes call_soon_threadsafe to complain that None is not callable. To fix the immediate problem, you should drop the extra parentheses and invoke the method as:
# correct
func_loop.call_soon_threadsafe(func_loop.stop)
However, the code is definitely over-complicated as written:
it doesn't make sense to create a new event loop when you are already inside an event loop
just_wait shouldn't be async def since it doesn't await anything, so it's clearly not async.
sys.exit is conventionally given an integer exit status; passing it a string prints the string to stderr and exits with status 1. Also, code after the call to sys.exit never runs, so the attempt to print a backtrace there is unreachable.
To run a non-async function from asyncio, just use run_in_executor and pass it the non-async function as-is. You don't need an extra thread or an extra event loop; run_in_executor will take care of the thread and connect it with your current event loop, effectively making the sync function awaitable. For example (untested):
async def do_stuff(func):
    conn = await run_client()
    print("SSH tunnel active")
    loop = asyncio.get_event_loop()
    ret = await loop.run_in_executor(None, func)
    print(f"ret={ret}")
    conn.close()
    await conn.wait_closed()
    print("wait done")

def just_wait():
    # just_wait is a regular function; it can call blocking code,
    # but it cannot await
    print("starting just_wait")
    input()
    print("ending just_wait")
    return 42

asyncio.get_event_loop().run_until_complete(do_stuff(just_wait))
If you need to await things in just_wait, you can make it async and use run_in_executor for the actual blocking code inside it:
async def do_stuff():
    conn = await run_client()
    print("SSH tunnel active")
    loop = asyncio.get_event_loop()
    ret = await just_wait()
    print(f"ret={ret}")
    conn.close()
    await conn.wait_closed()
    print("wait done")

async def just_wait():
    # just_wait is an async function: it can await, but it
    # must invoke blocking code through run_in_executor
    print("starting just_wait")
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, input)
    print("ending just_wait")
    return 42

asyncio.run(do_stuff())

Python async: Waiting for stdin input while doing other stuff

I'm trying to create a WebSocket command line client that waits for messages from a WebSocket server but waits for user input at the same time.
Regularly polling multiple online sources every second works fine on the server (the one running at localhost:6789 in this example). But instead of using Python's normal sleep() method, it uses asyncio.sleep(), which makes sense, because sleeping and asynchronously sleeping aren't the same thing, at least not under the hood.
Similarly, waiting for user input and asynchronously waiting for user input aren't the same thing, but I can't figure out how to asynchronously wait for user input the same way I can asynchronously wait for an arbitrary number of seconds, so that the client can deal with incoming messages from the WebSocket server while simultaneously waiting for user input.
The comment below in the else-clause of monitor_cmd() hopefully explains what I'm getting at:
import asyncio
import json
import websockets

async def monitor_ws():
    uri = 'ws://localhost:6789'
    async with websockets.connect(uri) as websocket:
        async for message in websocket:
            print(json.dumps(json.loads(message), indent=2, sort_keys=True))

async def monitor_cmd():
    while True:
        sleep_instead = False
        if sleep_instead:
            await asyncio.sleep(1)
            print('Sleeping works fine.')
        else:
            # Seems like I need the equivalent of:
            # line = await asyncio.input('Is this your line? ')
            line = input('Is this your line? ')
            print(line)

try:
    asyncio.get_event_loop().run_until_complete(asyncio.wait([
        monitor_ws(),
        monitor_cmd()
    ]))
except KeyboardInterrupt:
    quit()
This code just waits for input indefinitely and does nothing else in the meantime, and I understand why. What I don't understand is how to fix it. :)
Of course, if I'm thinking about this problem in the wrong way, I'd be very happy to learn how to remedy that as well.
You can use the aioconsole third-party package to interact with stdin in an asyncio-friendly manner:
line = await aioconsole.ainput('Is this your line? ')
Borrowing heavily from aioconsole, if you would rather avoid using an external library you could define your own async input function:
import asyncio
import sys

async def ainput(string: str) -> str:
    await asyncio.get_event_loop().run_in_executor(
        None, lambda s=string: sys.stdout.write(s + ' '))
    return await asyncio.get_event_loop().run_in_executor(
        None, sys.stdin.readline)
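With that helper in place, the blocking call in the question's monitor_cmd can be swapped out directly (a sketch):
# inside monitor_cmd(), replacing the blocking input() call:
line = await ainput('Is this your line? ')
print(line)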
Borrowing heavily from aioconsole, there are two ways to handle this.
start a new daemon thread:
import sys
import asyncio
import threading
from concurrent.futures import Future

async def run_as_daemon(func, *args):
    future = Future()
    future.set_running_or_notify_cancel()

    def daemon():
        try:
            result = func(*args)
        except Exception as e:
            future.set_exception(e)
        else:
            future.set_result(result)

    threading.Thread(target=daemon, daemon=True).start()
    return await asyncio.wrap_future(future)

async def main():
    data = await run_as_daemon(sys.stdin.readline)
    print(data)

if __name__ == "__main__":
    asyncio.run(main())
use stream reader:
import sys
import asyncio

async def get_stream_reader(pipe) -> asyncio.StreamReader:
    loop = asyncio.get_event_loop()
    reader = asyncio.StreamReader(loop=loop)
    protocol = asyncio.StreamReaderProtocol(reader)
    await loop.connect_read_pipe(lambda: protocol, pipe)
    return reader

async def main():
    reader = await get_stream_reader(sys.stdin)
    data = await reader.readline()
    print(data)

if __name__ == "__main__":
    asyncio.run(main())

Handling ensure_future and its missing tasks

I have a streaming application that almost continuously takes the data given as input, sends an HTTP request using that value, and does something with the returned value.
Obviously, to speed things up, I've used the asyncio and aiohttp libraries in Python 3.7 to get the best performance, but it becomes hard to debug given how fast the data moves.
This is what my code looks like
'''
Gets the final requests
'''
async def apiRequest(info, url, session, reqType, post_data=''):
    if reqType:
        async with session.post(url, data=post_data) as response:
            info['response'] = await response.text()
    else:
        async with session.get(url + post_data) as response:
            info['response'] = await response.text()
    logger.debug(info)
    return info

'''
Loops through the batches and sends it for request
'''
async def main(data, listOfData):
    tasks = []
    async with ClientSession() as session:
        for reqData in listOfData:
            try:
                task = asyncio.ensure_future(apiRequest(**reqData))
                tasks.append(task)
            except Exception as e:
                print(e)
                exc_type, exc_obj, exc_tb = sys.exc_info()
                fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                print(exc_type, fname, exc_tb.tb_lineno)
        responses = await asyncio.gather(*tasks)
    return responses  # list of APIResponses

'''
Streams data in and prepares batches to send for requests
'''
async def Kconsumer(data, loop, batchsize=100):
    consumer = AIOKafkaConsumer(**KafkaConfigs)
    await consumer.start()
    dataPoints = []
    async for msg in consumer:
        try:
            sys.stdout.flush()
            consumedMsg = loads(msg.value.decode('utf-8'))
            if consumedMsg['tid']:
                dataPoints.append(loads(msg.value.decode('utf-8')))
            if len(dataPoints) == batchsize or time.time() - startTime > 5:
                '''
                #1: The task below goes and sends HTTP GET requests in bulk using aiohttp
                '''
                task = asyncio.ensure_future(getRequests(data, dataPoints))
                res = await asyncio.gather(*[task])
                if task.done():
                    outputs = []
                    '''
                    #2: Does some ETL on the returned values
                    '''
                    ids = await asyncio.gather(*[doSomething(**{
                        'tid': x['tid'],
                        'cid': x['cid'], 'tn': x['tn'],
                        'id': x['id'], 'ix': x['ix'],
                        'ac': x['ac'],
                        'output': to_dict(xmltodict.parse(x['response'], encoding='utf-8')),
                        'loop': loop, 'option': 1}) for x in res[0]])
                    simplySaveDataIntoDataBase(id)  # This is where I see some missing data in the database
                dataPoints = []
        except Exception as e:
            logger.error(e)
            logger.error(traceback.format_exc())
            exc_type, exc_obj, exc_tb = sys.exc_info()
            fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
            logger.error(str(exc_type) + ' ' + str(fname) + ' ' + str(exc_tb.tb_lineno))

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    asyncio.ensure_future(Kconsumer(data, loop, batchsize=100))
    loop.run_forever()
Does the ensure_future need to be awaited?
How does aiohttp handle requests that come a little later than the others? Shouldn't it hold the whole batch back instead of forgetting about it altogether?
Does the ensure_future need to be awaited?
Yes, and your code is doing that already. await asyncio.gather(*tasks) awaits the provided tasks and returns their results in the same order.
Note that await asyncio.gather(*[task]) doesn't make sense, because it is equivalent to await asyncio.gather(task), which is again equivalent to await task. In other words, when you need the result of getRequests(data, dataPoints), you can write res = await getRequests(data, dataPoints) without the ceremony of first calling ensure_future() and then calling gather().
In fact, you almost never need to call ensure_future yourself:
if you need to await multiple tasks, you can pass coroutine objects directly to gather, e.g. gather(coroutine1(), coroutine2()).
if you need to spawn a background task, you can call asyncio.create_task(coroutine(...))
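For example (a sketch with a hypothetical fetch coroutine):
import asyncio

async def fetch(i):
    ...  # hypothetical request coroutine

async def caller():
    # gather wraps the coroutines in tasks itself and awaits them all
    first, second = await asyncio.gather(fetch(1), fetch(2))
    # create_task spawns a background task that runs while caller() continues
    background = asyncio.create_task(fetch(3))
    result = await background  # collect its result (or exception) later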
How does aiohttp handle requests that come a little later than the others? Shouldn't it hold the whole batch back instead of forgetting about it altogether?
If you use gather, all requests must finish before any of the results are returned. (That is not aiohttp policy, it's how gather works.) If you need to implement a timeout, you can use asyncio.wait_for or similar.
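For instance, a per-request timeout around the question's apiRequest could look like this (a sketch; the 10-second limit is an arbitrary choice):
results = await asyncio.gather(
    *(asyncio.wait_for(apiRequest(**reqData), timeout=10)
      for reqData in listOfData),
    return_exceptions=True,  # timed-out requests show up as exceptions in the list
)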

Store future state on application object

Is it OK to store the state of a future directly on the application object? Example below:
import asyncio
import aiohttp_jinja2
from functools import partial

async def background():
    await asyncio.sleep(1)
    print('Doing something useful in the background')
    await asyncio.sleep(1)

@aiohttp_jinja2.template('loading.html')
async def loading(request):
    app = request.app
    task = getattr(app, 'task_obj', None)
    if task is None:
        task = asyncio.ensure_future(background())
        callback = partial(done_refresh, app)
        task.add_done_callback(callback)
        app.task_obj = task

def done_refresh(app, future):
    if hasattr(app, 'task_obj'):
        # Nice! Task is done
        del app.task_obj
    exc = future.exception()
    if exc is not None:
        # Task has some exception
        print('Failed to update: %s' % exc)
Usually, I store some marker like in_progress in Redis and then check for that value from whatever function I want, but that way I lose the Task object itself and cannot access useful data like exception info.
What is the common approach to handle such cases?
Your approach makes perfect sense, except that the task should be stored in the aiohttp app context, instead of being set as an attribute (app['task_obj'] = ... instead of app.task_obj = ...)
see also https://docs.aiohttp.org/en/stable/web_advanced.html#data-sharing-aka-no-singletons-please
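A minimal sketch of the suggested change, reusing the question's names (the aiohttp app context behaves like a dict):
async def loading(request):
    app = request.app
    task = app.get('task_obj')
    if task is None:
        task = asyncio.ensure_future(background())
        task.add_done_callback(partial(done_refresh, app))
        app['task_obj'] = task

def done_refresh(app, future):
    app.pop('task_obj', None)
    exc = future.exception()
    if exc is not None:
        print('Failed to update: %s' % exc)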
