asyncio.wait not returning on first exception - python-3.x

I have an AMQP publisher class with the following methods. on_response is the callback that is called when a consumer sends back a message to the RPC queue I setup. I.e. the self.callback_queue.name you see in the reply_to of the Message. publish publishes out to a direct exchange with a routing key that has multiple consumers (very similar to a fanout), and multiple responses come back. I create a number of futures equal to the number of responses I expect, and asyncio.wait for those futures to complete. As I get responses back on the queue and consume them, I set the result to the futures.
async def on_response(self, message: IncomingMessage):
if message.correlation_id is None:
logger.error(f"Bad message {message!r}")
await message.ack()
return
body = message.body.decode('UTF-8')
future = self.futures[message.correlation_id].pop()
if hasattr(body, 'error'):
future.set_execption(body)
else:
future.set_result(body)
await message.ack()
async def publish(self, routing_key, expected_response_count, msg, timeout=None, return_partial=False):
if not self.connected:
logger.info("Publisher not connected. Waiting to connect first.")
await self.connect()
correlation_id = str(uuid.uuid4())
futures = [self.loop.create_future() for _ in range(expected_response_count)]
self.futures[correlation_id] = futures
await self.exchange.publish(
Message(
str(msg).encode(),
content_type="text/plain",
correlation_id=correlation_id,
reply_to=self.callback_queue.name,
),
routing_key=routing_key,
)
done, pending = await asyncio.wait(futures, timeout=timeout, return_when=asyncio.FIRST_EXCEPTION)
if not return_partial and pending:
raise asyncio.TimeoutError(f'Failed to return all results for publish to {routing_key}')
for f in pending:
f.cancel()
del self.futures[correlation_id]
results = []
for future in done:
try:
results.append(json.loads(future.result()))
except json.decoder.JSONDecodeError as e:
logger.error(f'Client did not return JSON!! {e!r}')
logger.info(future.result())
return results
My goal is to either wait until all futures are finished, or a timeout occurs. This is all working nicely at the moment. What doesn't work, is when I added return_when=asyncio.FIRST_EXCEPTION, the asyncio.wait.. does not finish after the first call of future.set_exception(...) as I thought it would.
What do I need to do with the future so that when I get a response back and see that an error occurred on the consumer side (before the timeout, or even other responses) the await asyncio.wait will no longer be blocking. I was looking at the documentation and it says:
The function will return when any future finishes by raising an exception
when return_when=asyncio.FIRST_EXCEPTION. My first thought is that I'm not raising an exception in my future correctly, only, I'm having trouble finding out exactly how I should do that then. From the API documentation for the Future class, it looks like I'm doing the right thing.

When I created a minimum viable example, I realized I was actually doing things MOSTLY right after all, and I glanced over other errors causing this not to work. Here is my minimum example:
The most important change I had to do was actually pass in an Exception object.. (subclass of BaseException) do the set_exception method.
import asyncio
async def set_after(future, t, body, raise_exception):
await asyncio.sleep(t)
if raise_exception:
future.set_exception(Exception("problem"))
else:
future.set_result(body)
print(body)
async def main():
loop = asyncio.get_event_loop()
futures = [loop.create_future() for _ in range(2)]
asyncio.create_task(set_after(futures[0], 3, 'hello', raise_exception=True))
asyncio.create_task(set_after(futures[1], 7, 'world', raise_exception=False))
print(futures)
done, pending = await asyncio.wait(futures, timeout=10, return_when=asyncio.FIRST_EXCEPTION)
print(done)
print(pending)
asyncio.run(main())
In this line of code if hasattr(body, 'error'):, body was a string. I thought it was JSON at that point already. Should have been using "error" in body as my condition in any case. whoops!

Related

asyncio wait - process results as they come

This script should take a list of initial tasks (URLs) and asynchronously make requests with aiohttp. And this part is done correctly. The problem is, since asyncio wait doesn't return actual results but only done/pending task set, I cant figure out where and how to process the results as they come, to make more requests and write data to DB. In this variant I placed the creation for a new task (make more requests...) inside the first one, which doesn't work.
PS. I am using wait because a book I am reading suggests using wait for more control over done and pending tasks and exceptions. Appreciate any help:)
async def fetch_content_2(session, url):
async with session.get(url) as result:
res = await result.text()
try:
new_link = BeautifulSoup(res, 'lxml').select_one('element on website 2')['href'])
# ***PROCESS AND WRITE SOME DATA TO DB***
except:
pass
async def fetch_content_1(session, url):
async with session.get(url) as result:
res = await result.text()
try:
link = BeautifulSoup(res, 'lxml').select_one('element on website 1')['href'])
# ***MAKE ANOTHER ASYNC REQUEST WITH NEW LINK***
asyncio.create_task(fetch_content_1(session,link))
except:
pass
async def main(tasks):
async with ClientSession() as session:
pending = [asyncio.create_task(fetch_content_1(session, url)) for url in tasks]
while pending:
done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
# print(f'Done count: {len(done)}')
# print(f'Pending count: {len(pending)}')
asyncio.run(main([url1, url2, ...]))
done and pending are sets of asyncio.Task objects. If you want to get the result of the task or its state you must get the values of the sets and call the method you need, check the (docs). Specifically you can get the result invoking the result method.
async def main(tasks):
async with ClientSession() as session:
pending = [asyncio.create_task(fetch_content_1(session, url)) for url in tasks]
while pending:
done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
res = done.pop().result()
# do some stuff with the result
Check the documentation to see the possible exceptions of call the result method and related methods. A exception may occur if the task had an internal error or the result is not ready (in this case shouldn't happen).

python asycio RuntimeWarning: coroutine was never awaited

I am trying to send many requests to a url (~50) concurrently.
from asyncio import Queue
import yaml
import asyncio
from aiohttp import ClientSession, TCPConnector
async def http_get(url, cookie):
cookie = cookie.split('; ')
cookie1 = cookie[0].split('=')
cookie2 = cookie[1].split('=')
cookies = {
cookie1[0]: cookie1[1],
cookie2[0]: cookie2[1]
}
async with ClientSession(cookies=cookies) as session:
async with session.get(url, ssl=False) as response:
return await response.json()
class FetchUtil:
def __init__(self):
self.config = yaml.safe_load(open('../config.yaml'))
def fetch(self):
asyncio.run(self.extract_objects())
async def http_get_objects(self, object_type, limit, offset):
path = '/path' + \
'?query=&filter=%s&limit=%s&offset=%s' % (
object_type,
limit,
offset)
return await self.http_get_domain(path)
async def http_get_objects_limit(self, object_type, offset):
result = await self.http_get_objects(
object_type,
self.config['object_limit'],
offset
)
return result['result']
async def http_get_domain(self, path):
return await http_get(
f'https://{self.config["domain"]}{path}',
self.config['cookie']
)
async def call_objects(self, object_type, offset):
result = await self.http_get_objects_limit(
object_type,
offset
)
return result
async def extract_objects(self):
calls = []
object_count = (await self.http_get_objects(
'PV', '1', '0'))['result']['count']
for i in range(0, object_count, self.config['object_limit']):
calls.append(self.call_objects('PV', str(i)))
queue = Queue()
for i in range(0, len(calls), self.config['call_limit']):
results = await asyncio.gather(*calls[i:self.config['call_limit']])
await queue.put(results)
After running this code using fetch as the entry point i get the following error message:
/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/events.py:88: RuntimeWarning: coroutine 'FetchUtil.call_objects' was never awaited
self._context.run(self._callback, *self._args)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
The program that stops executing after asyncio.gather returns for the first time. I am having trouble understanding this message since I thought that I diligently made sure all functions were async tasks. The only function i didn't await was call_objects since i wanted it to run concurrently.
https://xinhuang.github.io/posts/2017-07-31-common-mistakes-using-python3-asyncio.html#org630d301
in this article gives the following explanation:
This runtime warning can happen in many scenarios, but the cause are
same: A coroutine object is created by the invocation of an async
function, but is never inserted into an EventLoop.
I believed that was what i was doing when i called the async tasks with asyncio.gather.
I should note that when i put a print('url') inside http_get it outputs the first 50 urls like i want, the problem seems to occur when asyncio.gather returns for the first time.
The posted code has a logic error: [i:self.config['call_limit']] should be [i:i + self.config['call_limit']].
It causes the error because the expression evaluates to an entry slice in iterations after the first one, causing some of the calls coroutines to never get passed to gather, and therefore never awaited)
i don't actually understand why it didn't just keep executing the same requests many time instead of stopping with an error.
Because your i would get incremented, making it larger than call_limit in every loop iteration except the first. For example, assuming call_limit is 10, i will be 0 in the first iteration, and you'll await calls[0:10], so far so good. But in the next iteration, i will be 10, and you'll be awaiting calls[10:10], an empty slice. In the iteration after that you'll await calls[20:10] (also an empty slice, although logically it should be an error), then calls[30:10], again empty, and so on. Only the first iteration picks up actual members of the list.

Python async CancelledError() with no details

The following code fails and I'm not able to get the actual error, I just get numerous CancelledError messages
import aiobotocore, asyncio
async def replicate_to_region(chunks, region):
session = aiobotocore.get_session()
client = session.create_client('dynamodb', region_name=region)
start = 0
while True:
chunk = chunks[start]
item = {'my_table': chunk}
response = await client.batch_write_item(RequestItems=item)
async def main():
asyncio.gather(*(replicate_to_region(payloads, region) for region in regions))
asyncio.run(main())
I get the following errors;
client_session: <aiohttp.client.ClientSession object at 0x7f6fb65a34a8>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f6fb64c82b0>
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
concurrent.futures._base.CancelledError
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
I've tried quite a number of variations of the replicate_to_region function but they all fail with the same error above. It would be useful just to be able to see what the actual error is.
async def main():
asyncio.gather(...)
asyncio.gather() is an awaitable itself:
awaitable asyncio.gather(*aws, loop=None, return_exceptions=False)
It means you should use await when deal with it:
async def main():
await asyncio.gather(*(replicate_to_region(payloads, region) for region in regions))
off-topic:
I didn't work with aiobotocore and not sure if it's important, but it's better to do as documentation says. In particular you should probably use async with when creating a client as example shows.

Handling ensure_future and its missing tasks

I have a streaming application that almost continuously takes the data given as input and sends an HTTP request using that value and does something with the returned value.
Obviously to speed things up I've used asyncio and aiohttp libraries in Python 3.7 to get the best performance, but it becomes hard to debug given how fast the data moves.
This is what my code looks like
'''
Gets the final requests
'''
async def apiRequest(info, url, session, reqType, post_data=''):
if reqType:
async with session.post(url, data = post_data) as response:
info['response'] = await response.text()
else:
async with session.get(url+post_data) as response:
info['response'] = await response.text()
logger.debug(info)
return info
'''
Loops through the batches and sends it for request
'''
async def main(data, listOfData):
tasks = []
async with ClientSession() as session:
for reqData in listOfData:
try:
task = asyncio.ensure_future(apiRequest(**reqData))
tasks.append(task)
except Exception as e:
print(e)
exc_type, exc_obj, exc_tb = sys.exc_info()
fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
print(exc_type, fname, exc_tb.tb_lineno)
responses = await asyncio.gather(*tasks)
return responses #list of APIResponses
'''
Streams data in and prepares batches to send for requests
'''
async def Kconsumer(data, loop, batchsize=100):
consumer = AIOKafkaConsumer(**KafkaConfigs)
await consumer.start()
dataPoints = []
async for msg in consumer:
try:
sys.stdout.flush()
consumedMsg = loads(msg.value.decode('utf-8'))
if consumedMsg['tid']:
dataPoints.append(loads(msg.value.decode('utf-8')))
if len(dataPoints)==batchsize or time.time() - startTime>5:
'''
#1: The task below goes and sends HTTP GET requests in bulk using aiohttp
'''
task = asyncio.ensure_future(getRequests(data, dataPoints))
res = await asyncio.gather(*[task])
if task.done():
outputs = []
'''
#2: Does some ETL on the returned values
'''
ids = await asyncio.gather(*[doSomething(**{'tid':x['tid'],
'cid':x['cid'], 'tn':x['tn'],
'id':x['id'], 'ix':x['ix'],
'ac':x['ac'], 'output':to_dict(xmltodict.parse(x['response'],encoding='utf-8')),
'loop':loop, 'option':1}) for x in res[0]])
simplySaveDataIntoDataBase(id) # This is where I see some missing data in the database
dataPoints = []
except Exception as e:
logger.error(e)
logger.error(traceback.format_exc())
exc_type, exc_obj, exc_tb = sys.exc_info()
fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
logger.error(str(exc_type) +' '+ str(fname) +' '+ str(exc_tb.tb_lineno))
if __name__ == '__main__':
loop = asyncio.get_event_loop()
asyncio.ensure_future(Kconsumer(data, loop, batchsize=100))
loop.run_forever()
Does the ensure_future need to be awaited ?
How does aiohttp handle requests that come a little later than the others? Shouldn't it hold the whole batch back instead of forgetting about it altoghter?
Does the ensure_future need to be awaited ?
Yes, and your code is doing that already. await asyncio.gather(*tasks) awaits the provided tasks and returns their results in the same order.
Note that await asyncio.gather(*[task]) doesn't make sense, because it is equivalent to await asyncio.gather(task), which is again equivalent to await task. In other words, when you need the result of getRequests(data, dataPoints), you can write res = await getRequests(data, dataPoints) without the ceremony of first calling ensure_future() and then calling gather().
In fact, you almost never need to call ensure_future yourself:
if you need to await multiple tasks, you can pass coroutine objects directly to gather, e.g. gather(coroutine1(), coroutine2()).
if you need to spawn a background task, you can call asyncio.create_task(coroutine(...))
How does aiohttp handle requests that come a little later than the others? Shouldn't it hold the whole batch back instead of forgetting about it altoghter?
If you use gather, all requests must finish before any of them return. (That is not aiohttp policy, it's how gather works.) If you need to implement a timeout, you can use asyncio.wait_for or similar.

Failure on unit tests with pytest, tornado and aiopg, any query fail

I've a REST API running on Python 3.7 + Tornado 5, with postgresql as database, using aiopg with SQLAlchemy core (via the aiopg.sa binding). For the unit tests, I use py.test with pytest-tornado.
All the tests go ok as soon as no query to the database is involved, where I'd get this:
Runtime error: Task cb=[IOLoop.add_future..() at venv/lib/python3.7/site-packages/tornado/ioloop.py:719]> got Future attached to a different loop
The same code works fine out of the tests, I'm capable of handling 100s of requests so far.
This is part of an #auth decorator which will check the Authorization header for a JWT token, decode it and get the user's data and attach it to the request; this is the part for the query:
partner_id = payload['partner_id']
provided_scopes = payload.get("scope", [])
for scope in scopes:
if scope not in provided_scopes:
logger.error(
'Authentication failed, scopes are not compliant - '
'required: {} - '
'provided: {}'.format(scopes, provided_scopes)
)
raise ForbiddenException(
"insufficient permissions or wrong user."
)
db = self.settings['db']
partner = await Partner.get(db, username=partner_id)
# The user is authenticated at this stage, let's add
# the user info to the request so it can be used
if not partner:
raise UnauthorizedException('Unknown user from token')
p = Partner(**partner)
setattr(self.request, "partner_id", p.uuid)
setattr(self.request, "partner", p)
The .get() async method from Partner comes from the Base class for all models in the app. This is the .get method implementation:
#classmethod
async def get(cls, db, order=None, limit=None, offset=None, **kwargs):
"""
Get one instance that will match the criteria
:param db:
:param order:
:param limit:
:param offset:
:param kwargs:
:return:
"""
if len(kwargs) == 0:
return None
if not hasattr(cls, '__tablename__'):
raise InvalidModelException()
tbl = cls.__table__
instance = None
clause = cls.get_clause(**kwargs)
query = (tbl.select().where(text(clause)))
if order:
query = query.order_by(text(order))
if limit:
query = query.limit(limit)
if offset:
query = query.offset(offset)
logger.info(f'GET query executing:\n{query}')
try:
async with db.acquire() as conn:
async with conn.execute(query) as rows:
instance = await rows.first()
except DataError as de:
[...]
return instance
The .get() method above will either return a model instance (row representation) or None.
It uses the db.acquire() context manager, as described in aiopg's doc here: https://aiopg.readthedocs.io/en/stable/sa.html.
As described in this same doc, the sa.create_engine() method returns a connection pool, so the db.acquire() just uses one connection from the pool. I'm sharing this pool to every request in Tornado, they use it to perform the queries when they need it.
So this is the fixture I've set up in my conftest.py:
#pytest.fixture
async def db():
dbe = await setup_db()
return dbe
#pytest.fixture
def app(db, event_loop):
"""
Returns a valid testing Tornado Application instance.
:return:
"""
app = make_app(db)
settings.JWT_SECRET = 'its_secret_one'
return app
I can't find an explanation of why this is happening; Tornado's doc and source makes it clear that asyncIO event loop is used as default, and by debugging it I can see the event loop is indeed the same one, but for some reason it seems to get closed or stopped abruptly.
This is one test that fails:
#pytest.mark.gen_test(timeout=2)
def test_score_returns_204_empty(app, http_server, http_client, base_url):
score_url = '/'.join([base_url, URL_PREFIX, 'score'])
token = create_token('test', scopes=['score:get'])
headers = {
'Authorization': f'Bearer {token}',
'Accept': 'application/json',
}
response = yield http_client.fetch(score_url, headers=headers, raise_error=False)
assert response.code == 204
This test fails as it returns 401 instead of 204, given the query on the auth decorator fails due to the RuntimeError, which returns then an Unauthorized response.
Any idea from the async experts here will be very appreciated, I'm quite lost on this!!!
Well, after a lot of digging, testing and, of course, learning quite a lot about asyncio, I made it work myself. Thanks for the suggestions so far.
The issue was that the event_loop from asyncio was not running; as #hoefling mentioned, pytest itself does not support coroutines, but pytest-asyncio brings such a useful feature to your tests. This is very well explained here: https://medium.com/ideas-at-igenius/testing-asyncio-python-code-with-pytest-a2f3628f82bc
So, without pytest-asyncio, your async code that needs to be tested will look like this:
def test_this_is_an_async_test():
loop = asyncio.get_event_loop()
result = loop.run_until_complete(my_async_function(param1, param2, param3)
assert result == 'expected'
We use loop.run_until_complete() as, otherwise, the loop will never be running, as this is the way asyncio works by default (and pytest makes nothing to make it work differently).
With pytest-asyncio, your test works with the well-known async / await parts:
async def test_this_is_an_async_test(event_loop):
result = await my_async_function(param1, param2, param3)
assert result == 'expected'
pytest-asyncio in this case wraps the run_until_complete() call above, summarizing it heavily, so the event loop will run and be available for your async code to use it.
Please note: the event_loop parameter in the second case is not even necessary here, pytest-asyncio gives one available for your test.
On the other hand, when you are testing your Tornado app, you usually need to get a http server up and running during your tests, listening in a well-known port, etc., so the usual way goes by writing fixtures to get a http server, base_url (usually http://localhost:, with an unused port, etc etc).
pytest-tornado comes up as a very useful one, as it offers several of these fixtures for you: http_server, http_client, unused_port, base_url, etc.
Also to mention, it gets a pytest mark's gen_test() feature, which converts any standard test to use coroutines via yield, and even to assert it will run with a given timeout, like this:
#pytest.mark.gen_test(timeout=3)
def test_fetch_my_data(http_client, base_url):
result = yield http_client.fetch('/'.join([base_url, 'result']))
assert len(result) == 1000
But, this way it does not support async / await, and actually only Tornado's ioloop will be available via the io_loop fixture (although Tornado's ioloop uses by default asyncio underneath from Tornado 5.0), so you'd need to combine both pytest.mark.gen_test and pytest.mark.asyncio, but in the right order! (which I did fail).
Once I understood better what could be the problem, this was the next approach:
#pytest.mark.gen_test(timeout=2)
#pytest.mark.asyncio
async def test_score_returns_204_empty(http_client, base_url):
score_url = '/'.join([base_url, URL_PREFIX, 'score'])
token = create_token('test', scopes=['score:get'])
headers = {
'Authorization': f'Bearer {token}',
'Accept': 'application/json',
}
response = await http_client.fetch(score_url, headers=headers, raise_error=False)
assert response.code == 204
But this is utterly wrong, if you understand how Python's decorator wrappers work. With the code above, pytest-asyncio's coroutine is then wrapped in a pytest-tornado yield gen.coroutine, which won't get the event-loop running... so my tests were still failing with the same problem. Any query to the database were returning a Future waiting for an event loop to be running.
My updated code once I made myself up of the silly mistake:
#pytest.mark.asyncio
#pytest.mark.gen_test(timeout=2)
async def test_score_returns_204_empty(http_client, base_url):
score_url = '/'.join([base_url, URL_PREFIX, 'score'])
token = create_token('test', scopes=['score:get'])
headers = {
'Authorization': f'Bearer {token}',
'Accept': 'application/json',
}
response = await http_client.fetch(score_url, headers=headers, raise_error=False)
assert response.code == 204
In this case, the gen.coroutine is wrapped inside the pytest-asyncio coroutine, and the event_loop runs the coroutines as expected!
But there were still a minor issue that took me a little while to realize, too; pytest-asyncio's event_loop fixture creates for every test a new event loop, while pytest-tornado creates too a new IOloop. And the tests were still failing, but this time with a different error.
The conftest.py file now looks like this; please note I've re-declared the event_loop fixture to use the event_loop from pytest-tornado io_loop fixture itself (please recall pytest-tornado creates a new io_loop on each test function):
#pytest.fixture(scope='function')
def event_loop(io_loop):
loop = io_loop.current().asyncio_loop
yield loop
loop.stop()
#pytest.fixture(scope='function')
async def db():
dbe = await setup_db()
yield dbe
#pytest.fixture
def app(db):
"""
Returns a valid testing Tornado Application instance.
:return:
"""
app = make_app(db)
settings.JWT_SECRET = 'its_secret_one'
yield app
Now all my tests work, I'm back a happy man and very proud of my now better understanding of the asyncio way of life. Cool!

Resources