When using trio and nursery objects, how do you capture any value that was returned from a method?
Take this example from the trio website:
import trio
async def append_fruits():
    fruits = []
    fruits.append("Apple")
    fruits.append("Orange")
    return fruits
async def numbers():
    numbers = []
    numbers.append(1)
    numbers.append(2)
    return numbers
async def parent():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(append_fruits)
        nursery.start_soon(numbers)
trio.run(parent)
I modified it so that each function returns a list. How would you capture the return values so that I could print them?
Currently there is no built-in mechanism for this, mostly because we haven't figured out how we would even want it to work, so if you have suggestions, that would be helpful :-).
The thing is, with regular functions, there's exactly one obvious place to access the return value: the caller is waiting, so you hand them the return value, done. With concurrent functions, the caller isn't waiting, so you also need some way to specify where to return it to and when to return it; if there are multiple functions, you have to keep track of which one is returning which value; and so on. It's not as simple a concept.
What do you want to do with the return values? Do you want to, say, print them immediately when each function returns? In that case the simplest thing is to do it directly from the tasks:
async def print_fruits():
    print(await append_fruits())
async def print_numbers():
    print(await numbers())
async def parent():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(print_fruits)
        nursery.start_soon(print_numbers)
You could even factor this into a helper function:
async def call_then_print(fn):
    print(await fn())
async def parent():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(call_then_print, append_fruits)
        nursery.start_soon(call_then_print, numbers)
Or maybe you want to put them in a data structure to look at later?
results = {}
async def store_fruits_in_results_dict():
    results["fruits"] = await append_fruits()
async def store_numbers_in_results_dict():
    results["numbers"] = await numbers()
async def parent():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(store_fruits_in_results_dict)
        nursery.start_soon(store_numbers_in_results_dict)
    # This is after the nursery block, so we know that the dict is fully filled in:
    print(results["fruits"])
    print(results["numbers"])
You can imagine fancier versions of those too. For example, sometimes when you run a lot of tasks concurrently you want to capture exceptions, not just return values, so that some tasks can still succeed even if others fail. For that you can wrap each individual function in a try/except, or use the outcome library. Or, when each operation finishes, you could put its return value into a trio.Queue (deprecated in newer Trio releases in favor of memory channels created with trio.open_memory_channel), so that another task can process the results as they finish. But hopefully this gives you a good starting point :-)
In this case, simply create the lists in the parent and pass each one to the child that needs it.
More generally, pass an object to the tasks; they can set an attribute on it. You might also add an Event so that the parent can wait for the results to be available.
Related
I'm ashamed to admit I've been using Python's asyncio for a long time without really understanding how it works, and now I'm in a pickle. In pseudocode, my current program is like this:
async def api_function1(parameters):
    result = await asyncio.gather(*[some_other_thing(p) for p in parameters])
async def api_function2(parameters):
    result = await asyncio.gather(*[some_other_thing2(p) for p in parameters])
def a(initial_parameters):
    output = []
    data = asyncio.run(api_function1(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function1(get_parameters_from_data(data)))
        output.append(data)
        if some_condition is True:
            break
    return output
def b(initial_parameters):
    output = []
    data = asyncio.run(api_function2(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function2(get_parameters_from_data(data)))
        output.append(data)
        if some_condition is True:
            break
    return output
a() and b() get data from two different REST API endpoints, each with its own rate limits and nuances. I want to run a() and b() simultaneously.
What's the best/easiest way of structuring the program so a() and b() can run simultaneously?
I tried making a() and b() both async methods and tried to await them simultaneously, i.e. something like
async def a(initial_parameters):
    ...
async def b(initial_parameters):
    ...
A = await a(initial_parameters)
B = await b(initial_parameters)
but it didn't work, so based on the docs, I'm guessing maybe I need to manually get the event loop and pass it as an argument to a() and b(), which would pass it on to api_function1() and api_function2(), and then close it manually when both tasks are done, but I'm not really sure if I'm on the right track or how to do it.
I'm also open to a better design pattern for this if you have one in mind.
There is no reason why you can't nest calls to asyncio.gather. If you want to run a() and b() simultaneously, you must make both of them coroutines (async def). And you can't use asyncio.run() inside either one of them, since that is a blocking call: it doesn't return until its argument has completed (and it raises a RuntimeError if an event loop is already running in the same thread). You need to replace all the calls to asyncio.run() in a() and b() with await expressions. You will end up with something that looks like this:
async def api_function1(parameters):
    return await asyncio.gather(*[some_other_thing(p) for p in parameters])
async def api_function2(parameters):
    return await asyncio.gather(*[some_other_thing2(p) for p in parameters])
async def a(initial_parameters):
    output = []
    data = await api_function1(initial_parameters)
    output.append(data)
    while True:
        data = await api_function1(get_parameters_from_data(data))
        output.append(data)
        if some_condition is True:
            break
    return output
async def b(initial_parameters):
    output = []
    data = await api_function2(initial_parameters)
    output.append(data)
    while True:
        data = await api_function2(get_parameters_from_data(data))
        output.append(data)
        if some_condition is True:
            break
    return output
# Option 1: asyncio.gather (note the await)
async def main():
    a_data, b_data = await asyncio.gather(a(initial_parameters), b(initial_parameters))
# Option 2: two explicit tasks (use one version of main(), not both)
async def main():
    task_a = asyncio.create_task(a(initial_parameters))
    task_b = asyncio.create_task(b(initial_parameters))
    a_data = await task_a
    b_data = await task_b
asyncio.run(main())
This is still pseudocode.
I have given two possible ways of writing main(), one using asyncio.gather and the other using two calls to asyncio.create_task. Both versions create two tasks that run simultaneously, but the latter version doesn't require you to collect all the tasks in one place and start them all at the same time, as gather does. If gather works for your requirements, as it does here, it is more convenient.
Finally, a call to asyncio.run starts the program. The docs recommend having only one call to asyncio.run per program.
The two api functions should return something instead of setting a local variable.
In asyncio the crucial concept is the Task. It is Tasks that cooperate to provide simultaneous execution. asyncio.gather wraps each coroutine you pass it in a Task under the hood, even though you typically pass it plain coroutines. That's how it runs things concurrently.
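To see that gather really runs the two paginating loops concurrently, here is a runnable sketch. fake_api_call and paginate are hypothetical stand-ins for the question's REST calls and for a()/b(); each request still waits for the previous one within its own endpoint, but the two endpoints interleave:

```python
import asyncio

log = []  # records the order in which requests complete

async def fake_api_call(endpoint, page):
    # Hypothetical stand-in for one rate-limited REST request
    await asyncio.sleep(0.01)
    log.append((endpoint, page))
    return page

async def paginate(endpoint, pages):
    # Sequential within one endpoint: each request depends on the previous one
    output = []
    for page in range(pages):
        output.append(await fake_api_call(endpoint, page))
    return output

async def main():
    # gather wraps both coroutines in Tasks, so the two loops run concurrently
    return await asyncio.gather(paginate("a", 3), paginate("b", 3))

a_data, b_data = asyncio.run(main())
print(log)
```

If the loops ran one after the other, every "a" entry would appear in the log before the first "b" entry; with gather they alternate.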
I have some code that runs multiple tasks in a loop like this:
done, running = await asyncio.wait(running, timeout=timeout_seconds,
                                   return_when=asyncio.FIRST_COMPLETED)
I need to be able to determine which of these timed out. According to the documentation:
Note that this function does not raise asyncio.TimeoutError. Futures or Tasks that aren’t done when the timeout occurs are simply returned in the second set.
I could use wait_for() instead, but that function only accepts a single awaitable, whereas I need to specify multiple. Is there any way to determine which one from the set of awaitables I passed to wait() was responsible for the timeout?
Alternatively, is there a way to use wait_for() with multiple awaitables?
You can try this trick, though it's probably not a good solution:
import asyncio
async def foo():
    return 42
async def need_some_sleep():
    await asyncio.sleep(1000)
    return 42
async def coro_wrapper(coro):
    # Per-awaitable timeout: raises asyncio.TimeoutError if it expires
    result = await asyncio.wait_for(coro(), timeout=10)
    return result
async def main():
    # asyncio.wait() needs tasks; bare coroutines are rejected on Python 3.11+
    tasks = [asyncio.create_task(coro_wrapper(foo)),
             asyncio.create_task(coro_wrapper(need_some_sleep))]
    done, running = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for item in done:
        print(item.result())
    print(done, running)
asyncio.run(main())
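Taking that wrapper idea one step further answers the "wait_for() with multiple awaitables" part directly: give each awaitable its own wait_for and collect them with gather(return_exceptions=True), so a timeout comes back as an exception object at the matching position. A sketch, where job() is a hypothetical workload:

```python
import asyncio

async def job(delay):
    # Hypothetical workload standing in for one of your awaitables
    await asyncio.sleep(delay)
    return delay

async def main(delays, timeout):
    # Every awaitable gets its own timeout; return_exceptions=True turns each
    # timeout into a value instead of aborting the whole gather
    results = await asyncio.gather(
        *(asyncio.wait_for(job(d), timeout=timeout) for d in delays),
        return_exceptions=True,
    )
    timed_out = [i for i, r in enumerate(results) if isinstance(r, asyncio.TimeoutError)]
    return results, timed_out

results, timed_out = asyncio.run(main([0.01, 5, 0.02], timeout=0.2))
print(timed_out)
```

Because gather preserves input order, the index of each TimeoutError tells you exactly which awaitable ran out of time.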
Here is how I do it:
done, pending = await asyncio.wait(
    {
        asyncio.create_task(coro, name=str(index))
        for index, coro in enumerate([
            my_coroutine(),
            my_coroutine(),
            my_coroutine(),
        ])
    },
    return_when=asyncio.FIRST_COMPLETED,
)
# Task names are stored as strings, so convert back to int before comparing
num = int(next(t.get_name() for t in done))
if num == 2:
    pass
Use enumerate to name the tasks as they are created.
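The same idea works for the timeout question itself: with a timeout and the default return_when, whatever is still in the pending set is exactly what timed out, and a dict from task to label recovers which ones those were. A sketch, where job() and the labels are hypothetical:

```python
import asyncio

async def job(delay, value):
    # Hypothetical workload: finishes after `delay` seconds
    await asyncio.sleep(delay)
    return value

async def run_with_timeout(specs, timeout):
    # Map each task back to a caller-chosen label
    tasks = {asyncio.create_task(job(delay, label)): label for label, delay in specs.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=timeout)
    timed_out = sorted(tasks[t] for t in pending)
    for t in pending:
        t.cancel()  # stop the stragglers
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellation finish
    finished = {tasks[t]: t.result() for t in done}
    return finished, timed_out

finished, timed_out = asyncio.run(run_with_timeout({"fast": 0.01, "slow": 5}, timeout=0.2))
print(finished, timed_out)
```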
I'm trying to wrap an async function so that I can use it without importing asyncio in certain files. The ultimate goal is to use asynchronous functions but be able to call them normally and get back the result.
How can I access the result from the callback function printing(task) and use it as the return of my make_task(x) function?
MWE:
#!/usr/bin/env python3.7
import asyncio
loop = asyncio.get_event_loop()
def make_task(x):  # Can be used without asyncio
    task = loop.create_task(my_async(x))
    task.add_done_callback(printing)
    # return to get the
def printing(task):
    print('sleep done: %s' % task.done())
    print('results: %s' % task.result())
    return task.result()  # How can I access this return?
async def my_async(x):  # Handling the actual async running
    print('Starting my async')
    res = await my_sleep(x)
    return res  # The value I want to ultimately use in the real callback
async def my_sleep(x):
    print('starting sleep for %d' % x)
    await asyncio.sleep(x)
    return x**2
async def my_coro(*coro):
    return await asyncio.gather(*coro)
val1 = make_task(4)
val2 = make_task(5)
loop.run_until_complete(my_coro(asyncio.sleep(6)))
print(val1)
print(val2)
If I understand correctly you want to use asynchronous functions but don't want to write async/await in top-level code.
If that's the case, I'm afraid it's not possible to achieve with asyncio. asyncio wants you to write async/await everywhere asynchronous stuff happens, and this is intentional: forcing you to explicitly mark places where a context switch can happen is asyncio's way of fighting concurrency-related problems (which are very hard to fight otherwise). Read this answer for more info.
If you still want to have asynchronous stuff and use it "as usual code" take a look at alternative solutions like gevent.
Instead of using a callback, you can make printing a coroutine and await the original coroutine, such as my_async. make_task can then create a task out of printing(my_async(...)), which will make the return value of printing available as the task result. In other words, to return a value out of printing, just return it.
For example, if you define make_task and printing like this and leave the rest of the program unchanged:
def make_task(x):
    task = loop.create_task(printing(my_async(x)))
    return task
async def printing(coro):
    coro_result = await coro
    print('sleep done')
    print('results: %s' % coro_result)
    return coro_result
The resulting output is:
Starting my async
starting sleep for 4
Starting my async
starting sleep for 5
sleep done
results: 16
sleep done
results: 25
<Task finished coro=<printing() done, defined at result1.py:11> result=16>
<Task finished coro=<printing() done, defined at result1.py:11> result=25>
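On Python 3.7+ the same "just return it" approach can be written without a module-level loop: asyncio.run() is the single synchronous entry point, and gather hands back the plain return values. A sketch that reuses the names from the question, with my_sleep scaled down to hundredths of a second (an assumption purely to keep the demo fast):

```python
import asyncio

async def my_sleep(x):
    await asyncio.sleep(x / 100)  # scaled down for the demo
    return x ** 2

async def printing(coro):
    coro_result = await coro
    print('results: %s' % coro_result)
    return coro_result  # becomes the task's result

async def main():
    tasks = [asyncio.create_task(printing(my_sleep(x))) for x in (4, 5)]
    return await asyncio.gather(*tasks)

val1, val2 = asyncio.run(main())
print(val1, val2)
```

Code that calls asyncio.run(main()) never writes await itself, although main() and everything below it still uses async/await.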
Imagine the following very common situation: you have written a long and complicated function and realize that some of the code should be extracted into a separate function for reuse and/or readability. Usually, this extra function call will not change the semantics of your program.
However, now imagine that your function is a coroutine and the code you want to extract contains at least one asynchronous call. Extracting it into a separate function now suddenly changes your program's semantics by inserting a new point at which the coroutine yields, the event loop takes control, and any other coroutine could be scheduled in between.
Example before:
async def complicated_func():
    foo()
    bar()
    await baz()
Example after:
async def complicated_func():
    foo()
    await extracted_func()
async def extracted_func():
    bar()
    await baz()
In the example before, the complicated_func is guaranteed not to be suspended between calling foo() and calling bar(). After refactoring, this guarantee is lost.
My question is this: is it possible to call extracted_func() such that it is executed immediately, as if its code were inline? Or is there some other way to perform such common refactoring tasks without changing the program's semantics?
After refactoring, this guarantee is lost.
It's actually not.
Is it possible to call extracted_func() such that it is executed immediately as if its code would be inline?
That's already the case.
await some_coroutine() means that some_coroutine may give control back to the event loop, but it won't do so until something inside it actually suspends, e.g. by awaiting a future that isn't done yet (such as one for an I/O operation).
Consider this example:
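import asyncio
async def coro():
    print(1)
    await asyncio.sleep(0)  # the only point where control returns to the event loop
    print(3)
async def main():
    loop = asyncio.get_running_loop()
    loop.call_soon(print, 2)  # runs only once main() yields to the loop
    await coro()  # awaiting coro() by itself does not yield
asyncio.run(main())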
Notice how 2 gets printed between 1 and 3, as expected: awaiting coro() did not hand control to the event loop; only the asyncio.sleep(0) inside it did.
That also means it's possible to freeze the event loop by writing such code:
async def coro():
    return
async def main():
    while True:
        await coro()
In this case, the event loop never gets a chance to run another task.
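To check the claim against the question's own refactoring, here is a sketch that records the order of events. foo, bar and baz are hypothetical stand-ins that just append to a list, and the callback scheduled with call_soon plays the role of "any other coroutine" that might sneak in:

```python
import asyncio

events = []

def foo():
    events.append("foo")

def bar():
    events.append("bar")

async def baz():
    events.append("baz")
    await asyncio.sleep(0)  # the only real suspension point

async def extracted_func():
    bar()
    await baz()

async def complicated_func():
    foo()
    await extracted_func()  # does not yield by itself

async def main():
    # If awaiting extracted_func() yielded, "other" would land between foo and bar
    asyncio.get_running_loop().call_soon(events.append, "other")
    await complicated_func()

asyncio.run(main())
print(events)
```

"other" only appears after baz, i.e. at the same point it would have appeared with the code inlined.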
I am hoping someone can help me here.
I have an object that has the ability to have attributes that return coroutine objects. This works beautifully, however I have a situation where I need to get the results of the coroutine object from synchronous code in a separate thread, while the event loop is currently running. The code I came up with is:
def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
    """
    Get an attribute synchronously and safely.
    Note:
        This does nothing special if an attribute is synchronous. It only
        really has a use for asynchronous attributes. It processes
        asynchronous attributes synchronously, blocking everything until
        the attribute is processed. This helps when running SQL code that
        cannot run asynchronously in coroutines.
    Args:
        key (str): The Config object's attribute name, as a string.
        default (Any): The value to use if the Config object does not have
            the given attribute. Defaults to None.
    Returns:
        Any: The value of the Config object's attribute, or the default
            value if the Config object does not have the given attribute.
    """
    ret = self.get(key, default)
    if asyncio.iscoroutine(ret):
        if loop.is_running():
            loop2 = asyncio.new_event_loop()
            try:
                ret = loop2.run_until_complete(ret)
            finally:
                loop2.close()
        else:
            ret = loop.run_until_complete(ret)
    return ret
What I am looking for is a safe way to synchronously get the results of a coroutine object in a multithreaded environment. self.get() can return a coroutine object for attributes I have set up to provide them. The issue I have found is whether or not the event loop is running. After searching for a few hours on Stack Overflow and a few other sites, I arrived at my (broken) solution above: if the loop is running, I make a new event loop and run my coroutine in it. This works, except that the code hangs forever on the ret = loop2.run_until_complete(ret) line.
Right now, I have the following scenarios and results:
The result of self.get() is not a coroutine: returns the result. [Good]
The result of self.get() is a coroutine and the event loop is not running (basically in the same thread as the event loop): returns the result. [Good]
The result of self.get() is a coroutine and the event loop is running (basically in a different thread than the event loop): hangs forever waiting for the result. [Bad]
Does anyone know how I can go about fixing the bad result so I can get the value I need? Thanks.
I hope I made some sense here.
I do have a good and valid reason to be using threads: specifically, I am using SQLAlchemy, which is not async, and I punt the SQLAlchemy code to a ThreadPoolExecutor to handle it safely. However, I need to be able to query these asynchronous attributes from within these threads so the SQLAlchemy code can get certain configuration values safely. And no, I won't switch away from SQLAlchemy to another system just to accomplish what I need, so please do not offer alternatives to it. The project is too far along to switch something so fundamental to it.
I tried using asyncio.run_coroutine_threadsafe() and loop.call_soon_threadsafe() and both failed. So far, this approach has gotten the farthest toward working; I feel like I am just missing something obvious.
When I get a chance, I will write some code that provides an example of the problem.
Ok, I implemented an example case, and it worked the way I would expect. So it is likely my problem is elsewhere in the code. Leaving this open and will change the question to fit my real problem if I need.
Does anyone have any possible ideas as to why a concurrent.futures.Future from asyncio.run_coroutine_threadsafe() would hang forever rather than return a result?
My example code, which unfortunately does not reproduce my error, is below:
import asyncio
import typing
loop = asyncio.get_event_loop()
class ConfigSimpleAttr:
    __slots__ = ('value', '_is_async')
    def __init__(
        self,
        value: typing.Any,
        is_async: bool=False
    ):
        self.value = value
        self._is_async = is_async
    async def _get_async(self):
        return self.value
    def __get__(self, inst, cls):
        if self._is_async and loop.is_running():
            return self._get_async()
        else:
            return self.value
class BaseConfig:
    __slots__ = ()
    attr1 = ConfigSimpleAttr(10, True)
    attr2 = ConfigSimpleAttr(20, True)
    def get(self, key: str, default: typing.Any=None) -> typing.Any:
        return getattr(self, key, default)
    def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
        ret = self.get(key, default)
        if asyncio.iscoroutine(ret):
            if loop.is_running():
                fut = asyncio.run_coroutine_threadsafe(ret, loop)
                print(fut, fut.running())
                ret = fut.result()
            else:
                ret = loop.run_until_complete(ret)
        return ret
config = BaseConfig()
def example_func():
    return config.get_sync('attr1')
async def main():
    a1 = await loop.run_in_executor(None, example_func)
    a2 = await config.attr2
    val = a1 + a2
    print('{a1} + {a2} = {val}'.format(a1=a1, a2=a2, val=val))
    return val
loop.run_until_complete(main())
This is a stripped-down version of exactly what my code is doing, and the example works, even though my actual application doesn't. I am stuck on where to look for answers. Suggestions are welcome on where to track down my "stuck forever" problem, even if my code above doesn't actually reproduce it.
It is very unlikely that you need to run several event loops at the same time, so this part looks quite wrong:
if loop.is_running():
    loop2 = asyncio.new_event_loop()
    try:
        ret = loop2.run_until_complete(ret)
    finally:
        loop2.close()
else:
    ret = loop.run_until_complete(ret)
Even testing whether the loop is running or not doesn't seem to be the right approach. It's probably better to explicitly pass the (only) running loop to get_sync and schedule the coroutine using run_coroutine_threadsafe:
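def get_sync(self, key, loop, default=None):
    ret = self.get(key, default)
    if not asyncio.iscoroutine(ret):
        return ret
    # Schedule the coroutine on the running loop and block this thread until it is done.
    # Must be called from a thread other than the loop's own thread, or it deadlocks.
    future = asyncio.run_coroutine_threadsafe(ret, loop)
    return future.result()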
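A self-contained sketch of that pattern, assuming a hypothetical fetch_config_value() coroutine in place of the question's asynchronous attributes. The worker runs in the default ThreadPoolExecutor and receives the running loop explicitly. Calling future.result() on the loop's own thread would hang forever, because the loop could never run the coroutine it is waiting for; that is one plausible source of a "stuck forever" future:

```python
import asyncio

async def fetch_config_value():
    # Hypothetical stand-in for one of the asynchronous config attributes
    await asyncio.sleep(0.01)
    return 10

def blocking_worker(loop):
    # Runs in an executor thread, NOT in the event loop's thread
    future = asyncio.run_coroutine_threadsafe(fetch_config_value(), loop)
    # A timeout turns a silent hang into a concurrent.futures.TimeoutError
    return future.result(timeout=5)

async def main():
    loop = asyncio.get_running_loop()
    # Hand the running loop to the worker thread explicitly
    return await loop.run_in_executor(None, blocking_worker, loop)

value = asyncio.run(main())
print(value)
```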
EDIT: Hanging problems can be related to tasks being scheduled in the wrong loop (e.g. forgetting about the optional loop argument when calling a coroutine). This kind of problem should be easier to debug with PR 303 (now merged): a RuntimeError is raised instead when the loop and the future don't match. So you might want to run your tests with the latest version of asyncio.
Ok, I got my code working by taking a different approach. The problem was tied to something that did file IO, which I was converting into a coroutine by using loop.run_in_executor() on the file IO components. Then I was trying to use this in a sync function being called from another thread, which was itself dispatched with another loop.run_in_executor() call. This is a very important routine in my code (called probably a million times or more during the execution of my short-running code), and I decided that my logic was just getting too complicated. So... I uncomplicated it. Now, if I want to use the file IO components asynchronously, I explicitly use my "get_async()" method; otherwise, I use my attribute through normal attribute access.
By removing the complexity from my logic, I made the code cleaner and easier to understand, and, even more importantly, it actually works. While I am not 100% certain of the root cause (I believe it has something to do with a thread processing an attribute, which in turn starts another thread that tries to read the attribute before it is processed, causing something like a race condition that halted my code; unfortunately, I could never reproduce the error outside of my application to prove it), I was able to get past it and continue with my development efforts.