This is probably only the second time I've asked for help here, so my apologies if my question has too little or too much detail in its wording.
The following code, though very basic, is something I've written as a simplified imitation of a piece of code written by an ex-employee at my firm. I am currently working on the project she was working on, and I do not understand how the following executes without being awaited or gathered.
In the original code, the 'wait_and_print' function is an async function that makes a single RESTful web API call using aiohttp.ClientSession (via an async context manager); it returns nothing but appends/extends a list with the response it gets.
I have only been using (and trying to understand) asyncio and asynchronous programming for two weeks, so I am not very savvy with it yet, though I have used Python on and off for three years. I understand what task creation does and how asyncio.gather can run multiple API calls concurrently. But this is something I do not get:
import asyncio
import time

L = []

async def wait_and_print(wait_time):
    print(f"starting function {wait_time}")
    for x in range(1, wait_time + 1):
        print("Sleeping for {} time.".format(x))
        await asyncio.sleep(1)
    print(f"ending function {wait_time}")
    L.append(wait_time)

async def main_loop():
    tasks = [asyncio.create_task(wait_and_print(x)) for x in [3, 1, 2]]
    while len(tasks) != 0:
        tasks = [t for t in tasks if not t.done()]
        await asyncio.sleep(0)  # HOW IS THIS MAKING IT WORK WITHOUT ACTUALLY AWAITING tasks?
    print("Main loop ended!")

def final(func):
    a = time.time()
    asyncio.run(func())
    b = time.time()
    print(b - a, "seconds taken to run all!")
    print(L)

final(main_loop)
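(For comparison, and purely as an illustrative sketch reusing wait_and_print from above rather than the original code, the explicit way to wait for those same tasks would be asyncio.gather instead of polling t.done():)

async def main_loop_with_gather():
    # awaits all three coroutines concurrently and only returns once they are all done
    await asyncio.gather(*(wait_and_print(x) for x in [3, 1, 2]))
    print("Main loop ended!")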
I have always resisted using asyncio within my code, but using it might help with some performance issues that I'm having.
Here is my scenario:
An end user provides a list of news sites to scrape
Each element is passed to an Article Class
A valid article is passed to an Extraction Class
The Extraction Class passes data to a NewsExtraction Class
90% of the time this flow is flawless, but on occasion one of the 12 functions in the NewsExtraction Class fails to extract data that exists in the HTML provided. It seems that my code is "stepping on itself," which causes the data element not to be parsed. When I rerun the code, all the elements are parsed correctly.
The NewsExtraction Class has this function get_article_data_elements, which is called from the Extraction Class.
The function get_article_data_elements calls these items:
published_date = self._extract_article_published_date()
modified_date = self._extract_article_modified_date()
title = self._extract_article_title()
description = self._extract_article_description()
keywords = self._extract_article_key_words()
tags = self._extract_article_tags()
authors = self._extract_article_author()
top_image = self._extract_top_image()
language = self._extract_article_language()
categories = self._extract_article_category()
text = self._extract_textual_content()
url = self._extract_article_url()
Each of these data elements is used to populate a Python dictionary, which is eventually passed back to the end user.
I have been trying to add asyncio code to the NewsExtraction Class, but I kept getting this error message:
RuntimeWarning: coroutine 'NewsExtraction.get_article_data_elements' was never awaited
I have spent the last 3 days trying to figure this issue out. I have looked at dozens of questions on Stack Overflow on this error RuntimeWarning: coroutine never awaited. I have also looked at numerous articles on using asyncio, but I cannot figure out how to use asyncio with my NewsExtraction Class, which is called from the Extraction Class.
Can someone provide me some pointers to solve my issue?
class NewsExtraction(object):
    """
    This class is used to extract common data elements from a news article
    on xyz
    """
    def __init__(self, url, soup):
        self._url = url
        self._raw_soup = soup

    truncated...

    async def _extract_article_published_date(self):
        """
        This function is designed to extract the publish date for the article being parsed.

        :return: date article was originally published
        :rtype: string
        """
        json_date_published = JSONExtraction(self._url, self._raw_soup).extract_article_published_date()
        if json_date_published is not None:
            if len(json_date_published) != 0:
                return json_date_published
            else:
                return None
        elif json_date_published is None:
            if self._raw_soup.find(name='div', attrs={'class': regex.compile("--publishDate")}):
                date_published = self._raw_soup.find(name='div', attrs={'class': regex.compile("--publishDate")})
                if len(date_published) != 0:
                    return date_published.text
                else:
                    logger.info('The HTML tag to extract the publish date for the following article was not found.')
                    logger.info(f'Article URL -- {self._url}')
                    return None

    truncated...

    async def get_article_data_elements(self):
        """
        This function is designed to extract all the common data elements from a
        news article on xyz.

        :return: dictionary of data elements related to the article
        :rtype: dict
        """
        article_data_elements = {}

        # I have tried this:
        published_date = self._extract_article_published_date().__await__()

        # and this
        published_date = self.task(self._extract_article_published_date())
        await published_date

    truncated...
I have also tried to use:
if __name__ == "__main__":
    asyncio.run(NewsExtraction.get_article_data_elements())
    # asyncio.run(self.get_article_data_elements())
I'm really banging my head on the wall with using asyncio in my news extraction code.
If this question is off base, I will be happy to delete it and keep reading about how to use asyncio correctly.
Can someone provide me some pointers to solve my issue?
Thanks in advance for any guidance on using asyncio
You are defining _extract_article_published_date and get_article_data_elements as coroutines, and these coroutines must be awaited in your code to get the result of their execution in an asynchronous way.
You can do this by creating an instance of NewsExtraction and calling these methods with the keyword await in front. The await passes execution to other tasks in the loop until the awaited task completes. Note that there are no threads or processes involved in this task execution; execution is handed over only while a task is not using CPU time (awaiting I/O operations or sleeping).
if __name__ == '__main__':
    extractor = NewsExtraction(...)
    # this creates the event loop and runs the coroutine
    asyncio.run(extractor.get_article_data_elements())
Inside _extract_article_published_date you must also await the coroutines that perform requests over the network. If you are using a library for the scraping, make sure it uses async/await behind the scenes, otherwise you will not get any real performance benefit from asyncio.
async def get_article_data_elements(self):
    article_data_elements = {}
    # note here that the instance is self
    published_date = await self._extract_article_published_date()
    truncated...
You must dive into the asyncio documentation to get a better understanding of these features of Python 3.7+.
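If the remaining _extract_* methods are also turned into coroutines whose underlying I/O is genuinely asynchronous, they could then be awaited concurrently with asyncio.gather. This is a rough sketch only, not the original code; the method names are taken from the question and the dictionary keys are made up:

async def get_article_data_elements(self):
    # run several independent extraction coroutines concurrently;
    # gather returns their results in the same order the awaitables were passed in
    published_date, modified_date, title = await asyncio.gather(
        self._extract_article_published_date(),
        self._extract_article_modified_date(),
        self._extract_article_title(),
    )
    return {
        'published_date': published_date,
        'modified_date': modified_date,
        'title': title,
    }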
I'm noticing that when I spawn an asyncio task using create_task, the rest of the logic completes first rather than that task starting. I'm forced to add an await asyncio.sleep(0) to get the task started, which seems a bit hacky and unclean to me.
Here is some example code:
async def make_rpc_calls(...some args...):
    val_1, val_2 = await asyncio.gather(rpc_call_1(...), rpc_call_2(...))
    return process(val_1, val_2)

def some_very_cpu_intensive_function(...some args...):
    # Does a lot of computation, can take 20 seconds to run

task_1 = asyncio.get_running_loop().create_task(make_rpc_calls(...))
intensive_result = some_very_cpu_intensive_function(...)
await task_1
process(intensive_result, task_1.result())
Anytime I run the above, it runs some_very_cpu_intensive_function before kicking off the expensive RPCs. The only way I've gotten this to work is to do:
async def make_rpc_calls(...some args...):
    val_1, val_2 = await asyncio.gather(rpc_call_1(...), rpc_call_2(...))
    return process(val_1, val_2)

def some_very_cpu_intensive_function(...some args...):
    # Does a lot of computation, can take 20 seconds to run

task_1 = asyncio.get_running_loop().create_task(make_rpc_calls(...))
await asyncio.sleep(0)
intensive_result = some_very_cpu_intensive_function(...)
await task_1
process(intensive_result, task_1.result())
This feels like a hack to me: I'm forcing the event loop to context switch, and it doesn't feel like I'm using the asyncio framework correctly. Is there another way I should be approaching this?
sleep() always suspends the current task, allowing other tasks to run.
Setting the delay to 0 provides an optimized path to allow other tasks to run. This can be used by long-running functions to avoid blocking the event loop for the full duration of the function call.
Source: https://docs.python.org/3/library/asyncio-task.html
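As a small, self-contained illustration (the names here are made up for the example), a task created with create_task does not start running until the current coroutine yields control back to the event loop, which await asyncio.sleep(0) forces:

import asyncio

async def background():
    print("background task started")

async def main():
    asyncio.get_running_loop().create_task(background())
    print("task created, but it has not started yet")
    await asyncio.sleep(0)  # suspend main once so the pending task gets a chance to run
    print("after yielding, the task has run")

asyncio.run(main())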
I am new to Python and trying to learn the asyncio module. I am frustrated with getting return values from async tasks. There is a post here that talks about this topic, but it doesn't tell which value is returned by which task (assuming one web page responds faster than another).
The code below is trying to fetch three web pages concurrently instead of doing it one by one.
import asyncio
import aiohttp

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            assert resp.status == 200
            return await resp.text()

def compile_all(urls):
    tasks = []
    for url in urls:
        tasks.append(asyncio.ensure_future(fetch(url)))
    return tasks

urls = ['https://python.org', 'https://google.com', 'https://amazon.com']

tasks = compile_all(urls)
loop = asyncio.get_event_loop()
a, b, c = loop.run_until_complete(asyncio.gather(*tasks))
loop.close()

print(a)
print(b)
print(c)
First, it hit a RuntimeError, though it did print some HTML documents: RuntimeError: Event loop is closed.
Second question: does this really guarantee that a, b, c will correspond to the urls list in the order urls[0], urls[1], urls[2]? (I assume that async task execution won't guarantee that.)
Third, is there a better approach, or should I use a Queue in this case? If yes, how?
Any help will be greatly appreciated.
The order of the results will correspond to the order of the urls. Take a look at the docs for asyncio.gather:
If all awaitables are completed successfully, the result is an
aggregate list of returned values. The order of result values
corresponds to the order of awaitables in aws.
To process tasks as they complete you can use asyncio.as_completed. This post has more information on how it can be used.
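For example, here is a minimal sketch of asyncio.as_completed, reusing the fetch coroutine from the question but changed to also return its url, since as_completed yields results in completion order rather than submission order:

import asyncio
import aiohttp

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return url, await resp.text()  # return the url so each result can be matched to its request

async def main(urls):
    # as_completed yields awaitables in the order the requests finish
    for future in asyncio.as_completed([fetch(url) for url in urls]):
        url, body = await future
        print(url, len(body))

asyncio.run(main(['https://python.org', 'https://google.com', 'https://amazon.com']))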
I'm trying to wrap an async function up so that I can use it without importing asyncio in certain files. The ultimate goal is to use asynchronous functions but being able to call them normally and get back the result.
How can I access the result from the callback function printing(task) and use it as the return of my make_task(x) function?
MWE:
#!/usr/bin/env python3.7
import asyncio

loop = asyncio.get_event_loop()

def make_task(x):  # Can be used without asyncio
    task = loop.create_task(my_async(x))
    task.add_done_callback(printing)
    # return to get the

def printing(task):
    print('sleep done: %s' % task.done())
    print('results: %s' % task.result())
    return task.result()  # How can I access this return?

async def my_async(x):  # Handling the actual async running
    print('Starting my async')
    res = await my_sleep(x)
    return res  # The value I want to ultimately use in the real callback

async def my_sleep(x):
    print('starting sleep for %d' % x)
    await asyncio.sleep(x)
    return x**2

async def my_coro(*coro):
    return await asyncio.gather(*coro)

val1 = make_task(4)
val2 = make_task(5)

loop.run_until_complete(my_coro(asyncio.sleep(6)))

print(val1)
print(val2)
If I understand correctly you want to use asynchronous functions but don't want to write async/await in top-level code.
If that's the case, I'm afraid it's not possible to achieve with asyncio. asyncio wants you to write async/await everywhere asynchronous stuff happens, and this is intentional: forcing you to explicitly mark every place where a context switch can occur is asyncio's way of fighting concurrency-related problems (which are very hard to fight otherwise). Read this answer for more info.
If you still want asynchronous behaviour but to use it "as usual code", take a look at alternative solutions like gevent.
Instead of using a callback, you can make printing a coroutine and have it await the original coroutine, such as my_async. make_task can then create a task out of printing(my_async(...)), which will make the return value of printing available as the task result. In other words, to return a value out of printing, just return it.
For example, if you define make_task and printing like this and leave the rest of the program unchanged:
def make_task(x):
    task = loop.create_task(printing(my_async(x)))
    return task

async def printing(coro):
    coro_result = await coro
    print('sleep done')
    print('results: %s' % coro_result)
    return coro_result
The resulting output is:
Starting my async
starting sleep for 4
Starting my async
starting sleep for 5
sleep done
results: 16
sleep done
results: 25
<Task finished coro=<printing() done, defined at result1.py:11> result=16>
<Task finished coro=<printing() done, defined at result1.py:11> result=25>
After perusing many docs and articles on asyncio, I still could not find an answer to this: run a function asynchronously (without using a thread) and also ensure the function calling this async function continues its execution.
Pseudo-code:
async def functionAsync(p):
    # ...
    # perform intensive calculations
    # ...
    print("Async loop done")

def functionNormal():
    p = ""
    functionAsync(p)
    return "Main loop ended"

print("Start Code")
print(functionNormal())
Expected Output :
Start code
Main loop ended
Async loop done
I searched for examples where loop.run_until_complete is used, but that will not print the return value of functionNormal() first, as it is blocking in nature.
asyncio can't run arbitrary code "in the background" without using threads. As user4815162342 noted, in asyncio you run an event loop that blocks the main thread and manages the execution of coroutines.
If you want to use asyncio and take advantage of it, you should rewrite every function that uses coroutines to be a coroutine itself, all the way up to the main function, the entry point of your program. This main coroutine is usually passed to run_until_complete. This little post covers the topic in more detail.
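A minimal sketch of that "coroutines all the way up" idea (all names here are made up for illustration):

import asyncio

async def function_async(p):
    await asyncio.sleep(1)  # stands in for awaitable, I/O-bound work
    print("Async work done")

async def function_normal():
    p = ""
    await function_async(p)  # callers of coroutines become coroutines themselves
    return "Main loop ended"

async def main():
    print(await function_normal())

# the top-level coroutine is handed to the event loop
loop = asyncio.get_event_loop()
loop.run_until_complete(main())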
Since you're interested in Flask, take a look at Quart: it's a web framework that tries to implement the Flask API (as much as possible) in terms of asyncio. The reason this project exists is that pure Flask isn't compatible with asyncio; Quart is written to be compatible.
If you want to stay with pure Flask but still have asynchronous stuff, take a look at gevent. Through monkey-patching it can make your code asynchronous, although this solution has its own problems (which is why asyncio was created).
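For context, gevent's monkey-patching works roughly like this (a minimal sketch, not tied to the Flask code below):

from gevent import monkey
monkey.patch_all()  # patches blocking stdlib calls (sockets, time.sleep, ...) to cooperate with gevent

import time
import gevent

def work(name):
    time.sleep(1)  # patched: yields to the gevent hub instead of blocking the whole process
    print(f"{name} done")

# the three greenlets run concurrently even though the code looks synchronous
gevent.joinall([gevent.spawn(work, f"task-{i}") for i in range(3)])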
Maybe it's a bit late, but I ran into a similar situation and solved it in Flask by using run_in_executor:
def work(p):
    # intensive work being done in the background

def endpoint():
    p = ""
    loop = asyncio.get_event_loop()
    loop.run_in_executor(None, work, p)
I'm not sure, however, how safe this is, since the loop is not being closed.
Here is an implementation of a helper function which you can use like this:
result = synchronize_async_helper(some_async_function(parameter1, parameter2))
import asyncio

def synchronize_async_helper(to_await):
    async_response = []

    async def run_and_capture_result():
        r = await to_await
        async_response.append(r)

    loop = asyncio.get_event_loop()
    coroutine = run_and_capture_result()
    loop.run_until_complete(coroutine)
    return async_response[0]
Assuming the synchronous function is inside an asynchronous function, you can solve it using exceptions.
Pseudo code:
import asyncio
import time

class CustomError(Exception):
    pass

async def main():
    def test_syn():
        time.sleep(2)
        # Start Async
        raise CustomError

    try:
        test_syn()
    except CustomError:
        await asyncio.sleep(2)