Shall I use asyncio.create_task() before calling asyncio.gather()?

I was wondering if there is any benefit to directly calling asyncio.gather(*coros) rather than starting the tasks with asyncio.create_task() and then calling asyncio.gather(*tasks).
So I did this test (please tell me if you notice any bias):
import timeit
test1 = """
async def sleep():
await asyncio.sleep(0)
async def main():
tasks = [asyncio.create_task(sleep()) for s in range(1000)]
await asyncio.gather(*tasks)
asyncio.run(main())
"""
test2 = """
async def sleep():
await asyncio.sleep(0)
async def main():
tasks = [sleep() for s in range(1000)]
await asyncio.gather(*tasks)
asyncio.run(main())
"""
print(timeit.repeat(stmt=test1, setup="import asyncio", repeat=5, number=10000))
print(timeit.repeat(stmt=test2, setup="import asyncio", repeat=5, number=10000))
Here's the result:
TEST 1: [123.09070299999999, 118.88883120000001, 120.92030820000002, 121.22180739999999, 116.49616249999997]
TEST 2: [109.63426249999998, 108.96809150000001, 110.66497140000001, 105.34163260000003, 105.78473080000003]
It seems like there is no overhead when gather() has to create the tasks; it's even faster (although ensure_future() is called internally, if I understand correctly).
Any thoughts on this? Shall I follow the pattern used for test 2 rather than the one used for test 1? The Zen of Python does not help much here, though it does say, "There should be one-- and preferably only one --obvious way to do it".
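One sanity check that's easy to run: both patterns produce the same results, since gather() wraps bare coroutines in Tasks itself (a minimal sketch; double is just an illustrative coroutine, and this says nothing about relative speed):

import asyncio

async def double(x):
    await asyncio.sleep(0)
    return x * 2

async def main():
    # pattern 1: create the tasks explicitly
    via_tasks = await asyncio.gather(*[asyncio.create_task(double(i)) for i in range(3)])
    # pattern 2: let gather() wrap the coroutines
    via_coros = await asyncio.gather(*[double(i) for i in range(3)])
    print(via_tasks == via_coros)  # True: [0, 2, 4] both times

asyncio.run(main())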

Related

How to run two async methods simultaneously?

I'm ashamed to admit I've been using Python's asyncio for a long time without really understanding how it works, and now I'm in a pickle. In pseudocode, my current program is like this:
async def api_function1(parameters):
    result = await asyncio.gather(*[some_other_thing(p) for p in parameters])

async def api_function2(parameters):
    result = await asyncio.gather(*[some_other_thing2(p) for p in parameters])

def a(initial_parameters):
    output = []
    data = asyncio.run(api_function1(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function1(get_parameters_from_data(data)))
        output.append(data)
        if some_condition:
            break
    return output

def b(initial_parameters):
    output = []
    data = asyncio.run(api_function2(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function2(get_parameters_from_data(data)))
        output.append(data)
        if some_condition:
            break
    return output
a() and b() get data from two different REST API endpoints, each with its own rate limits and nuances. I want to run a() and b() simultaneously.
What's the best/easiest way of structuring the program so a() and b() can run simultaneously?
I tried making a() and b() both async methods and tried to await them simultaneously, i.e. something like
async def a(initial_parameters):
    ...

async def b(initial_parameters):
    ...

A = await a(initial_parameters)
B = await b(initial_parameters)
but it didn't work, so based on the docs, I'm guessing maybe I need to manually get the event loop and pass it as an argument to a() and b(), which would pass it on to api_function1() and api_function2(), and then close it manually when both tasks are done, but I'm not really sure if I'm on the right track or how to do it.
I'm also open to a better design pattern for this if you have one in mind.
There is no reason why you can't nest calls to asyncio.gather. If you want to run a() and b() simultaneously, you must make both of them coroutines. And you can't use asyncio.run() inside either one of them, since that is a blocking call - it doesn't return until its argument has completed. You need to replace all the calls to asyncio.run() in a() and b() with await expressions. You will end up with something that looks like this:
async def api_function1(parameters):
    return await asyncio.gather(*[some_other_thing(p) for p in parameters])

async def api_function2(parameters):
    return await asyncio.gather(*[some_other_thing2(p) for p in parameters])

async def a(initial_parameters):
    output = []
    data = await api_function1(initial_parameters)
    output.append(data)
    while True:
        data = await api_function1(get_parameters_from_data(data))
        output.append(data)
        if some_condition:
            break
    return output

async def b(initial_parameters):
    output = []
    data = await api_function2(initial_parameters)
    output.append(data)
    while True:
        data = await api_function2(get_parameters_from_data(data))
        output.append(data)
        if some_condition:
            break
    return output
async def main():
    a_data, b_data = await asyncio.gather(a(initial_parameters), b(initial_parameters))
or:

async def main():
    task_a = asyncio.create_task(a(initial_parameters))
    task_b = asyncio.create_task(b(initial_parameters))
    a_data = await task_a
    b_data = await task_b
asyncio.run(main())
This is still pseudocode.
I have given two possible ways of writing main(), one using asyncio.gather and the other using two calls to asyncio.create_task. Both versions create two tasks that run simultaneously, but the latter version doesn't require you to collect all the tasks in one place and start them all at the same time, as gather does. If gather works for your requirements, as it does here, it is more convenient.
Finally, a call to asyncio.run starts the program. The docs recommend having only one call to asyncio.run per program.
The two api functions should return something instead of setting a local variable.
In asyncio the crucial concept is the Task. It is Tasks that cooperate to provide simultaneous execution. asyncio.gather actually creates Tasks under the hood, even though you typically pass it a list of coroutines. That's how it runs things concurrently.
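A minimal sketch of that under-the-hood wrapping, using the public asyncio.ensure_future (which is what turns a bare coroutine into a scheduled Task):

import asyncio

async def job():
    return "done"

async def main():
    wrapped = asyncio.ensure_future(job())  # coroutine in, Task out
    print(isinstance(wrapped, asyncio.Task))  # True
    print(await wrapped)  # done

asyncio.run(main())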

Asyncio with two loops, best practice

I have two infinite loops. Their processing is lightweight, and I don't want them to block each other. Is using await asyncio.sleep(0) a good practice?
This is my code:
import asyncio

async def loop1():
    while True:
        print("loop1")
        # pull data from kafka
        await asyncio.sleep(0)

async def loop2():
    while True:
        print("loop2")
        # send data to all clients using asyncio stream api
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(loop1(), loop2())

asyncio.run(main())
Two (or many more) asyncio tasks will not block each other unless one of the tasks has some long synchronous operation inside.
Both of your tasks contain only network operations (Kafka and API requests), so neither of them will block the other.
When should you use asyncio.sleep(0)?
Imagine you have some long synchronous operation: calculations. A calculation is not an I/O operation.
This example is more of a good-to-know: if you have such operations in a real app, you should move them into loop.run_in_executor and use concurrent.futures.ProcessPoolExecutor as the executor. The example:
import asyncio

async def long_calc():
    """
    Some heavy CPU-bound task.
    Better to make it a sync function and move it to a ProcessPoolExecutor.
    """
    s = 0
    for _ in range(100):
        for i in range(1_000_000):
            s += i**2
        # comment out the next line and watch the result:
        # you'll get no "Working" messages;
        # that's why I use sleep(0.0) here
        await asyncio.sleep(0.0)
    return s

async def pinger():
    """Task which shows that the app is alive."""
    n = 0
    while True:
        await asyncio.sleep(1)
        print(f"Working {n}")
        n += 1

async def amain():
    """Main async function in this app."""
    # Start pinger with asyncio.create_task since we want it
    # to run concurrently with long_calc and we do not want
    # to wait until it is finished.
    # If it were a thread, it would be a daemon thread.
    asyncio.create_task(pinger())
    # await the result of the long task
    s = await long_calc()
    print(f"Done: {s}")

if __name__ == '__main__':
    asyncio.run(amain())
If you need me to provide a run_in_executor example, let me know.
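For completeness, a rough sketch of what that run_in_executor variant might look like (a minimal illustration assuming the same workload; long_calc_sync is a name introduced here):

import asyncio
import concurrent.futures

def long_calc_sync():
    # The same heavy CPU-bound loop, now a plain sync function,
    # so it can run in a worker process without blocking the event loop.
    s = 0
    for _ in range(100):
        for i in range(1_000_000):
            s += i**2
    return s

async def amain():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as pool:
        # run_in_executor returns an awaitable; the loop stays free
        # to run other tasks (like pinger) while the work proceeds
        s = await loop.run_in_executor(pool, long_calc_sync)
    print(f"Done: {s}")

if __name__ == '__main__':
    asyncio.run(amain())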

Is there a way to await on an asyncio.Task.result from sync code

I have a situation like the one below:
import asyncio

event_loop = asyncio.new_event_loop()

async def second_async():
    # some async job
    print("I'm here")
    return 123

def sync():
    return asyncio.run_coroutine_threadsafe(second_async(), loop=event_loop).result()

async def first_async():
    sync()

event_loop.run_until_complete(first_async())
I call the sync function from a different thread (where the event_loop is not running), and it works fine. The problem is that if I run the event_loop.run_until_complete(...) line, the .result() call on the Future returned by run_coroutine_threadsafe blocks the execution of the loop, which makes sense. To avoid this, I tried changing it as follows:
import asyncio

event_loop = asyncio.new_event_loop()

async def second_async():
    # some async job
    print("I'm here")
    return 123

def sync():
    # if event_loop is running on the current thread:
    res = event_loop.create_task(second_async()).result()
    # else:
    res = asyncio.run_coroutine_threadsafe(second_async(), loop=event_loop).result()
    # Additional processing on res.
    # I need to evaluate the result of the task right here in sync().
    return res

async def first_async():
    sync()

event_loop.run_until_complete(first_async())
This almost works, but the .result() call on the Task object returned by create_task always raises an InvalidStateError; set_result is never called on the Task object.
Basically, I want the flow to be like this:
(async code) -> sync code (a non-blocking call ->) async code
I know this is a bad way of doing things, but I'm integrating stuff, so I don't really have an option.
Here is a little single-threaded program that illustrates the problem.
If you un-comment the line asyncio.run(first_async1()), you get the same error you're seeing, and for the same reason: you're trying to access the result of a task without awaiting it first.
import asyncio

event_loop = asyncio.new_event_loop()

async def second_async():
    # some async job
    print("I'm here")
    return 123

def sync1():
    return asyncio.create_task(second_async()).result()

async def first_async1():
    print(sync1())

def sync2():
    return asyncio.create_task(second_async())

async def first_async2():
    print(await sync2())

# This prints "I'm here",
# then raises InvalidStateError:
# asyncio.run(first_async1())

# This works, prints "I'm here" and "123"
asyncio.run(first_async2())
With that line commented out again, the second version of the program (first_async2) runs just fine. The only difference is that the ordinary function, sync2, returns an awaitable instead of a result. The await is done in the async function that called it.
I don't see why this is a bad practice. To me, it seems like there are situations where it's absolutely necessary.
Another approach is to create a second daemon thread and set up an event loop there. Coroutines can be executed in this second thread with asyncio.run_coroutine_threadsafe, which returns a concurrent.futures.Future. Its result method will block until the Future's value is set by the other thread.
#! python3.8
import asyncio
import threading

def a_second_thread(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()

loop2 = asyncio.new_event_loop()
threading.Thread(target=a_second_thread, args=(loop2,), daemon=True).start()

async def second_async():
    # some async job
    print("I'm here")
    for _ in range(4):
        await asyncio.sleep(0.25)
    print("I'm done")
    return 123

def sync1():
    # Run the coroutine in the second thread -> get a concurrent.futures.Future
    fut = asyncio.run_coroutine_threadsafe(second_async(), loop2)
    return fut.result()

async def first_async1():
    print(sync1())

def sync2():
    return asyncio.create_task(second_async())

async def first_async2():
    print(await sync2())

# This works, prints "I'm here", "I'm done", and "123"
asyncio.run(first_async1())

# This works, prints "I'm here", "I'm done", and "123"
asyncio.run(first_async2())
Of course this will still block the event loop in the main thread until fut.result() returns. There is no avoiding that. But the program runs.

Python 3 with asyncio tasks not executed when returned in a generator function

This first example does not work. I try to emit an async task but don't care about the response in this situation; the output is empty:
from typing import Iterable
import asyncio

async def example_task():
    print('example_task')

def emit() -> Iterable:
    event_loop = asyncio.get_event_loop()
    yield event_loop.create_task(example_task())

async def main():
    emit()
    await asyncio.sleep(0.5)  # wait some time so the task can run

asyncio.run(main())
When I add next(emit()) to actually "read" the yielded task, the output works; it also works in the next example, where I put all the tasks into a list first:
from typing import Iterable
import asyncio

async def example_task():
    print('example_task')

def emit() -> Iterable:
    event_loop = asyncio.get_event_loop()
    return iter([event_loop.create_task(example_task())])

async def main():
    emit()
    await asyncio.sleep(0.5)  # wait some time so the task can run

asyncio.run(main())
This is just a simple example; the final version should be able to emit an "event" and run 1..n async tasks that can return a value but don't need to. The caller of emit should be able to decide whether to await the result at some point or just ignore it, as in the examples.
Is there any way I can do this with a generator / yield, or is the only possible way to store all the tasks in a list and return an iterator after that?
The issue is that the first example returns a generator that is never consumed: a generator's body only runs when you iterate over it, so create_task is never called. In the second example, the list (and therefore the task) is created eagerly as soon as emit() is called.
The modified version of your first example would be something like
async def main():
    next(emit())
    await asyncio.sleep(0.5)  # wait some time so the task can run

or

async def main():
    for task in emit():
        await task
    await asyncio.sleep(0.5)  # wait some time so the task can run
Hope this explains the difference between using a generator and an iterator while creating your tasks.
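If the goal is fire-and-forget with an optional await later, one workable pattern is to create the tasks eagerly and return the collection itself (a minimal sketch; emit_events and the task count are illustrative, not from the question):

import asyncio

async def example_task(n):
    print(f'example_task {n}')
    return n

def emit_events() -> list:
    # The tasks are created (and therefore scheduled) right here,
    # whether or not the caller ever looks at them.
    event_loop = asyncio.get_event_loop()
    return [event_loop.create_task(example_task(n)) for n in range(3)]

async def main():
    tasks = emit_events()
    # The caller may simply ignore `tasks`; they still run.
    # Or it can collect the results whenever convenient:
    results = await asyncio.gather(*tasks)
    print(results)  # [0, 1, 2]

asyncio.run(main())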

Why don't two sequential coroutines (async functions) execute in parallel?

So, I'm just trying to wrap my head around async programming (in particular the Tornado framework), and thought I'd start with the basics: calling "await" on two coroutines:
from tornado.ioloop import IOLoop
from tornado.web import Application, url, RequestHandler
from tornado.gen import sleep

class TestHandler(RequestHandler):
    async def get(self):
        f1 = await self.test("f1")
        f2 = await self.test("f2")
        self.write(f1 + " " + f2)

    async def test(self, msg):
        for i in range(5):
            print(i)
            await sleep(1)  # this is tornado's async sleep
        return msg

app = Application([url(r'/', TestHandler)], debug=True)
app.listen(8080)
ioloop = IOLoop.current()
ioloop.start()
The issue, however, is that when I hit localhost:8080 in my browser, and stare at my python console, I don't see two interwoven sequences of 0 1 2 3 4, but two sequential sequences...
I've read the Tornado FAQ over-and-over again and can't seem to understand what I'm doing wrong.
This runs f1, waits for it to finish, then runs f2:
f1 = await self.test("f1")
f2 = await self.test("f2")
To run things in parallel, you can't await the first one before starting the second. The simplest way to do this is to do them both in one await:
f1, f2 = await tornado.gen.multi([self.test("f1"), self.test("f2")])
Or in advanced cases, you can start f1 without waiting for it then come back to wait for it later:
f1_future = tornado.gen.convert_yielded(self.test("f1"))
f2_future = tornado.gen.convert_yielded(self.test("f2"))
f1 = await f1_future
f2 = await f2_future
Parallelism - official example
The multi function accepts lists and dicts whose values are Futures, and waits for all of those Futures in parallel:
from tornado.gen import multi

async def parallel_fetch(url1, url2):
    resp1, resp2 = await multi([http_client.fetch(url1),
                                http_client.fetch(url2)])

async def parallel_fetch_many(urls):
    responses = await multi([http_client.fetch(url) for url in urls])
    # responses is a list of HTTPResponses in the same order

async def parallel_fetch_dict(urls):
    responses = await multi({url: http_client.fetch(url)
                             for url in urls})
    # responses is a dict {url: HTTPResponse}
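Since Tornado 5 runs on the asyncio event loop by default, native coroutines can also be combined with asyncio.gather. Here is a standalone sketch of the interleaving using plain asyncio only (no Tornado, so the handler context is omitted and the names are illustrative):

import asyncio

async def test(msg):
    for i in range(5):
        print(msg, i)
        await asyncio.sleep(1)
    return msg

async def main():
    # Both coroutines are scheduled before either is awaited,
    # so the two count sequences interleave: f1 0, f2 0, f1 1, ...
    f1, f2 = await asyncio.gather(test("f1"), test("f2"))
    print(f1 + " " + f2)

asyncio.run(main())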
