I'm ashamed to admit I've been using Python's asyncio for a long time without really understanding how it works, and now I'm in a pickle. In pseudocode, my current program looks like this:
async def api_function1(parameters):
    result = await asyncio.gather(*[some_other_thing(p) for p in parameters])

async def api_function2(parameters):
    result = await asyncio.gather(*[some_other_thing2(p) for p in parameters])

def a(initial_parameters):
    output = []
    data = asyncio.run(api_function1(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function1(get_parameters_from_data(data)))
        output.append(data)
        if some_condition is True:
            break
    return output

def b(initial_parameters):
    output = []
    data = asyncio.run(api_function2(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function2(get_parameters_from_data(data)))
        output.append(data)
        if some_condition is True:
            break
    return output
a() and b() get data from two different REST API endpoints, each with its own rate limits and nuances. I want to run a() and b() simultaneously.

What's the best/easiest way of structuring the program so a() and b() can run simultaneously?
I tried making a() and b() both coroutines and awaiting them simultaneously, i.e. something like:
async def a(initial_parameters):
    ...

async def b(initial_parameters):
    ...

A = await a(initial_parameters)
B = await b(initial_parameters)
but it didn't work. So based on the docs, I'm guessing maybe I need to manually get the event loop and pass it as an argument to a() and b(), which would pass it on to api_function1() and api_function2(), and then close it manually when both tasks are done, but I'm not really sure if I'm on the right track or how to do it.
I'm also open to a better design pattern for this if you have one in mind.
There is no reason why you can't nest calls to asyncio.gather. If you want to run a() and b() simultaneously, you must make both of them coroutines. And you can't use asyncio.run() inside either one of them, since that is a blocking call - it doesn't return until its argument has completed. You need to replace all the calls to asyncio.run() in a() and b() with await expressions. You will end up with something that looks like this:
async def api_function1(parameters):
    return await asyncio.gather(*[some_other_thing(p) for p in parameters])

async def api_function2(parameters):
    return await asyncio.gather(*[some_other_thing2(p) for p in parameters])

async def a(initial_parameters):
    output = []
    data = await api_function1(initial_parameters)
    output.append(data)
    while True:
        data = await api_function1(get_parameters_from_data(data))
        output.append(data)
        if some_condition is True:
            break
    return output

async def b(initial_parameters):
    output = []
    data = await api_function2(initial_parameters)
    output.append(data)
    while True:
        data = await api_function2(get_parameters_from_data(data))
        output.append(data)
        if some_condition is True:
            break
    return output
async def main():
    a_data, b_data = await asyncio.gather(a(initial_parameters), b(initial_parameters))

or, equivalently, with explicit tasks:

async def main():
    task_a = asyncio.create_task(a(initial_parameters))
    task_b = asyncio.create_task(b(initial_parameters))
    a_data = await task_a
    b_data = await task_b

asyncio.run(main())
This is still pseudocode.
I have given two possible ways of writing main(), one using asyncio.gather and the other using two calls to asyncio.create_task. Both versions create two tasks that run simultaneously, but the latter version doesn't require you to collect all the tasks in one place and start them all at the same time, as gather does. If gather works for your requirements, as it does here, it is more convenient.
Finally, a call to asyncio.run starts the program. The docs recommend having only one call to asyncio.run per program.
The two api functions should return something instead of setting a local variable.
In asyncio the crucial concept is the Task. It is Tasks that cooperate to provide simultaneous execution. asyncio.gather actually creates Tasks under the hood, even though you typically pass it a list of coroutines. That's how it runs things concurrently.
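You can see this with a minimal sketch (the work coroutine here is just an illustration): asyncio.ensure_future is roughly what gather applies to each awaitable you pass it, turning a bare coroutine into a scheduled Task.

import asyncio

async def work(n):
    await asyncio.sleep(0.1)
    return n

async def main():
    # roughly what gather does to every coroutine you pass it
    task = asyncio.ensure_future(work(1))
    print(isinstance(task, asyncio.Task))  # True: the coroutine is now a scheduled Task
    print(await task)                      # 1

asyncio.run(main())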
Related
I have a system where two "processes" A and B run on the same asyncio event loop.
I notice that the order of initiation of the processes matters - i.e., if I start process B first, then process B runs all the time while A seems to be "starved" of resources, and vice versa.

In my experience, the only reason this might happen is a mutex that is not being released by B, but in the following toy example it happens without any mutexes being used:
import asyncio

async def A():
    while True:
        print('A')
        await asyncio.sleep(2)

async def B():
    while True:
        print('B')
        await asyncio.sleep(8)

async def main():
    await B()
    await A()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Do coroutines in Python not perform context switches automatically? If not - how can I make both processes participate, each one running while the other is idle (i.e., sleeping)?
TLDR: Coroutines merely enable concurrency, they do not automatically trigger concurrency. Explicitly launch separate tasks, e.g. via create_task or gather, to run the coroutines concurrently.
async def main():
    await asyncio.gather(B(), A())
Concurrency in asyncio is handled via Tasks – a close equivalent to Threads – which merely consist of coroutines/awaitables – like Threads consist of functions/callables. In general, a coroutine/awaitable itself does not equate to a separate task.
Using await X() means "start X and wait for it to complete". When using several such constructs in sequence:
async def main():
    await B()
    await A()
this means launching B first, and only launching A after B has completed: while async def and await allow for concurrency with respect to other tasks, B and A run sequentially with respect to each other, inside a single task.
The simplest means to add concurrency is to explicitly create a task:
async def main():
    # execute B in a new task
    b_task = asyncio.create_task(B())
    # execute A in the current task
    await A()
    await b_task
Note how B is offloaded to a new task, while one can still do a final await A() to re-use the current task.
Most async frameworks ship with high-level helpers for common concurrency scenarios. In this case, asyncio.gather is appropriate to launch several tasks at once:
async def main():
    # execute B and A in new tasks
    await asyncio.gather(B(), A())
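As a follow-up note, gather also collects the return values, in the order the awaitables were passed, no matter which one finishes first. A minimal sketch with hypothetical fetch_a/fetch_b coroutines (not the infinite loops above, which never return):

import asyncio

async def fetch_a():
    await asyncio.sleep(2)
    return "a"

async def fetch_b():
    await asyncio.sleep(8)
    return "b"

async def main():
    # finishes after ~8s (the slower task), not ~10s,
    # and results come back in argument order
    b_result, a_result = await asyncio.gather(fetch_b(), fetch_a())
    print(b_result, a_result)  # b a

asyncio.run(main())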
Was wondering if there is any benefit of directly calling asyncio.gather(*coros) rather than starting the tasks with asyncio.create_task() and then calling asyncio.gather(*tasks).
So I did this test (please tell me if you notice any bias):
import timeit

test1 = """
async def sleep():
    await asyncio.sleep(0)

async def main():
    tasks = [asyncio.create_task(sleep()) for s in range(1000)]
    await asyncio.gather(*tasks)

asyncio.run(main())
"""

test2 = """
async def sleep():
    await asyncio.sleep(0)

async def main():
    tasks = [sleep() for s in range(1000)]
    await asyncio.gather(*tasks)

asyncio.run(main())
"""

print(timeit.repeat(stmt=test1, setup="import asyncio", repeat=5, number=10000))
print(timeit.repeat(stmt=test2, setup="import asyncio", repeat=5, number=10000))
Here's the result:
TEST 1: [123.09070299999999, 118.88883120000001, 120.92030820000002, 121.22180739999999, 116.49616249999997]
TEST 2: [109.63426249999998, 108.96809150000001, 110.66497140000001, 105.34163260000003, 105.78473080000003]
It seems there is no overhead when gather() has to create the tasks itself - it's even faster (although ensure_future() is called internally, if I understand correctly).

Any thoughts on this? Should I follow the pattern used in test 2 rather than the one used in test 1? The Zen of Python doesn't help much here, though as it outlines, "There should be one-- and preferably only one --obvious way to do it".
I have some code that runs multiple tasks in a loop like this:
done, running = await asyncio.wait(running, timeout=timeout_seconds,
                                   return_when=asyncio.FIRST_COMPLETED)
I need to be able to determine which of these timed out. According to the documentation:
Note that this function does not raise asyncio.TimeoutError. Futures or Tasks that aren’t done when the timeout occurs are simply returned in the second set.
I could use wait_for() instead, but that function only accepts a single awaitable, whereas I need to specify multiple. Is there any way to determine which one from the set of awaitables I passed to wait() was responsible for the timeout?
Alternatively, is there a way to use wait_for() with multiple awaitables?
You can try this trick, though it is probably not a good solution:
import asyncio

async def foo():
    return 42

async def need_some_sleep():
    await asyncio.sleep(1000)
    return 42

async def coro_wrapper(coro):
    # enforce a per-coroutine timeout; raises asyncio.TimeoutError on expiry
    result = await asyncio.wait_for(coro(), timeout=10)
    return result

loop = asyncio.get_event_loop()
# wrap in Tasks: recent Python versions no longer accept bare coroutines in asyncio.wait
done, running = loop.run_until_complete(asyncio.wait(
    [loop.create_task(coro_wrapper(foo)), loop.create_task(coro_wrapper(need_some_sleep))],
    return_when=asyncio.FIRST_COMPLETED
))
for item in done:
    print(item.result())
print(done, running)
Here is how I do it:
done, pending = await asyncio.wait(
    {
        asyncio.create_task(coro, name=str(index))
        for index, coro in enumerate([
            my_coroutine(),
            my_coroutine(),
            my_coroutine(),
        ])
    },
    return_when=asyncio.FIRST_COMPLETED
)

# task names are strings, so convert back to an int index
num = int(next(iter(done)).get_name())
if num == 2:
    pass
Use enumerate to name the tasks as they are created.
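The same naming trick answers the timeout case directly. Per the docs quoted above, awaitables that aren't done when the timeout occurs are simply returned in the second set, so the pending set is exactly the set of tasks that timed out. A minimal sketch, assuming tasks is a collection of named tasks built as above and timeout_seconds is your timeout:

done, pending = await asyncio.wait(tasks, timeout=timeout_seconds)

# everything still pending after the timeout is precisely what timed out
for t in pending:
    print(f"task {t.get_name()} timed out")
    t.cancel()  # optionally cancel so the tasks don't linger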
Recently, I moved my REST server code from Express.js to FastAPI. So far I had been successful in the transition, until recently. I've noticed, based on the Firebase Python Admin SDK documentation, that unlike the Node.js SDK, the Python SDK is blocking. The documentation says here:
In Python and Go Admin SDKs, all write methods are blocking. That is, the write methods do not return until the writes are committed to the database.
I think this behavior is having a certain effect on my code. It could also be how I've structured my code. Some code from one of my files is below:
from app.services.new_service import nService
from firebase_admin import db
import json
import redis

class TryNewService:
    async def tryNew_func(self, request):
        # I've already initialized everything in another file for firebase
        ref = db.reference()
        r = redis.Redis()
        holdingData = await nService().dialogflow_session(request)
        fulfillmentText = json.dumps(holdingData[-1])
        body = await request.json()
        if "user_prelimInfo_address" in holdingData:
            holdingData.append("session")
            holdingData.append(body["session"])
            print(holdingData)
            return holdingData
        else:
            if "Default Welcome Intent" in holdingData:
                pass
            else:
                UserVal = r.hget(name='{}'.format(body["session"]), key="userId").decode("utf-8")
                ref.child("users/{}".format(UserVal)).child("c_data").set({holdingData[0]: holdingData[1]})
            print(holdingData)
            return fulfillmentText
Is there any workaround for the blocking effect of the ref.set() line in my code? Kinda like adding a callback in Node.js? I'm new to the asyncio world of Python 3.

Update as of 06/13/2020: I added the following code and am now getting a RuntimeError: Task attached to a different loop. In my second else statement I do the following:
loop = asyncio.new_event_loop()
UserVal = r.hget(name='{}'.format(body["session"]), key="userId").decode("utf-8")
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    result = await loop.run_in_executor(pool, ref.child("users/{}".format(UserVal)).child("c_data").set({holdingData[0]:holdingData[1]}))
    print("custom thread pool:{}".format(result))
With this new RuntimeError, I would appreciate some help in figuring out.
If you want to run synchronous code inside an async coroutine, the steps are:

loop = asyncio.get_event_loop()

Note: get, not new. get_event_loop() provides the current event loop, while new_event_loop() returns a new one.

await loop.run_in_executor(None, sync_method)

First parameter = None -> use the default executor instance.
Second parameter (sync_method) is the synchronous code to be called.

Remember that resources used by sync_method need to be properly synchronized:

a) either using asyncio.Lock
b) or using the asyncio.run_coroutine_threadsafe function (see an example below)

Forget about ThreadPoolExecutor for this case (it provides a route to I/O parallelism, as opposed to the concurrency provided by asyncio).
You can try the following code:
loop = asyncio.get_event_loop()
UserVal = r.hget(name='{}'.format(body["session"]), key="userId").decode("utf-8")
result = await loop.run_in_executor(None, sync_method, ref, UserVal, holdingData)
print("custom thread pool:{}".format(result))
With a new function:
def sync_method(ref, UserVal, holdingData):
    result = ref.child("users/{}".format(UserVal)).child("c_data").set({holdingData[0]: holdingData[1]})
    return result
Please let me know your feedback.

Note: the previous code is untested. I have only tested the following minimal example (using pytest & pytest-asyncio):
import asyncio
import time
import pytest

@pytest.mark.asyncio
async def test_1():
    loop = asyncio.get_event_loop()
    delay = 3.0
    result = await loop.run_in_executor(None, sync_method, delay)
    print(f"Result = {result}")

def sync_method(delay):
    time.sleep(delay)
    print(f"dddd {delay}")
    return "OK"
Answer to @jeff-ridgeway's comment:

Let's change the previous answer to clarify how to use run_coroutine_threadsafe to execute, from a sync worker thread, a coroutine that gathers these shared resources:

Add loop as an additional parameter in run_in_executor.
Move all shared resources from sync_method to a new async_method, which is executed with run_coroutine_threadsafe.
loop = asyncio.get_event_loop()
UserVal = r.hget(name='{}'.format(body["session"]), key="userId").decode("utf-8")
result = await loop.run_in_executor(None, sync_method, ref, UserVal, holdingData, loop)
print("custom thread pool:{}".format(result))
def sync_method(ref, UserVal, holdingData, loop):
    coro = async_method(ref, UserVal, holdingData)
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    return future.result()

async def async_method(ref, UserVal, holdingData):
    result = ref.child("users/{}".format(UserVal)).child("c_data").set({holdingData[0]: holdingData[1]})
    return result
Note: the previous code is untested. Here is my tested minimal example, updated:
@pytest.mark.asyncio
async def test_1():
    loop = asyncio.get_event_loop()
    delay = 3.0
    result = await loop.run_in_executor(None, sync_method, delay, loop)
    print(f"Result = {result}")

def sync_method(delay, loop):
    coro = async_method(delay)
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    return future.result()

async def async_method(delay):
    time.sleep(delay)
    print(f"dddd {delay}")
    return "OK"
I hope this can be helpful
Run the blocking database calls on a ThreadPoolExecutor via the event loop. See https://medium.com/@hiranya911/firebase-python-admin-sdk-with-asyncio-d65f39463916
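A minimal sketch of that approach, assuming db.reference() from firebase_admin as in the question; set_user_data, the executor size, and the argument names are illustrative:

import asyncio
from concurrent.futures import ThreadPoolExecutor

from firebase_admin import db

# dedicated pool for the blocking Firebase writes (size is illustrative)
executor = ThreadPoolExecutor(max_workers=4)

async def set_user_data(user_val, holding_data):
    """Write to the Realtime Database without blocking the event loop."""
    ref = db.reference()
    loop = asyncio.get_running_loop()
    # run_in_executor needs a plain callable, so wrap the blocking call
    await loop.run_in_executor(
        executor,
        lambda: ref.child("users/{}".format(user_val))
                   .child("c_data")
                   .set({holding_data[0]: holding_data[1]}),
    )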
I've added asyncio to make the code in my library twbotlib asynchronous (https://github.com/truedl/twbotlib).

I tried async commands a few versions ago and everything seemed to go well, but I never checked whether it was really asynchronous. Then I tried to create a giveaway command using await asyncio.sleep(5) and realized that it blocks all my other code...

After many attempts to rework the asyncio code, I haven't managed to get it running without blocking...

(My Bot class in main.py has an attribute called self.loop, which is actually asyncio.get_event_loop().)

I don't know if I'm doing everything correctly: right after calling the run function, I invoke all later operations with await.

I've tried replacing the plain await with await self.loop.create_task(foo), and I tried await self.loop.ensure_future(foo), but nothing worked. I've also tried splitting the code into two functions (mainloop and check_data).
First, here is the run function, where I start the loop (just creating a task and calling run_forever):
def run(self, startup_function=None) -> None:
    """ Run the bot and start the main while. """
    self.loop.create_task(self.mainloop(startup_function))
    self.loop.run_forever()
Second, here is the mainloop function (all the awaited functions are blocking...):
async def mainloop(self, startup_function) -> None:
    """ The main loop that reads and process the incoming data. """
    if startup_function:
        await startup_function()
    self.is_running = True
    while self.is_running:
        data = self.sock.recv(self.buffer).decode('utf-8').split('\n')
        await self.check_data(data)
And last is check_data, which is mainloop split out (I've replaced the long ifs with "condition" for readability). The awaits here are blocking too:
async def check_data(self, data: str) -> None:
    for line in data:
        if condition:
            message = self.get_message_object_from_str(line)
            if condition:
                if condition:
                    await self.commands[message.command](message, message.args)
                else:
                    await self.commands[message.command](message)
            elif hasattr(self.event, 'on_message'):
                await self.event.on_message(message)
        if self.logs:
            print(line)
There is no error message. The code just blocks, and I'm trying to change it so that it doesn't.
The synchronous call self.sock.recv(self.buffer) in mainloop is what is blocking your code: while it waits for data, the event loop cannot run anything else, and the loop for line in data: then awaits each handler to completion before the next read happens.
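A minimal sketch of one way to unblock it, assuming the bot keeps its existing self.sock and self.buffer attributes: hand the blocking recv to the default executor so the event loop stays free, and schedule handlers as their own tasks:

async def mainloop(self, startup_function) -> None:
    """ Read and process incoming data without stalling the event loop. """
    if startup_function:
        await startup_function()
    self.is_running = True
    loop = asyncio.get_running_loop()
    while self.is_running:
        # recv runs in a worker thread; other tasks (e.g. a giveaway
        # command's asyncio.sleep) keep running while we wait for data
        raw = await loop.run_in_executor(None, self.sock.recv, self.buffer)
        data = raw.decode('utf-8').split('\n')
        # schedule the handler as its own task so a slow command
        # does not delay the next socket read either
        asyncio.create_task(self.check_data(data))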