How to get a regular iterator from an asynchronous iterator? - python-3.x

Got an async iterable. Need a regular iterable.
async def aiter2iter(aiter):
    l = []
    async for chunk in aiter:
        l.append(chunk)
    return l
regular_iterable = await aiter2iter(my_async_iterable)
for chunk in regular_iterable:
    print('Hooray! No async required here!')
Is this the way to go or am I reinventing the wheel?
Is there any way provided by Python to convert an async iterable to a regular iterable?
Also is what I wrote even correct? Did I not miss anything?

Your way works alright. I would also try unsync when working between async/sync functions.
Given
import time
import random
import asyncio
from unsync import unsync
# Sample async iterators
class AsyncIterator:
    """Yield random numbers."""
    def __aiter__(self):
        return self
    async def __anext__(self):
        await asyncio.sleep(0.1)
        return random.randint(0, 10)
async def anumbers(n=10):
    """Yield the first `n` random numbers."""
    i = 0
    async for x in AsyncIterator():
        if i == n:
            return
        yield x
        i += 1
Code
Rather than awaiting and reiterating the result, we can just call the result():
@unsync
async def aiterate(aiter):
    """Return a list from an aiter object."""
    return [x async for x in aiter]
aiterate(anumbers(5)).result()
# [8, 2, 5, 8, 9] (values are random)
Details
Here's a description, from Python Bytes episode 73:
You just take any async function, and put an @unsync decorator. ... it will basically wrap it up and do all that asyncio initialization stuff ... then you can wait on the result, or not wait on the result, however you like. ... then if you put that on a regular function, not an async one, it'll cause it to run on a thread pool thread, on thread pool executor.
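If adding a dependency is not an option, the same bridge can be sketched with the standard library alone. This is a minimal sketch, not part of the answer above: `iter_over_async` and `countdown` are hypothetical names, and it assumes the generator is consumed from plain synchronous code, with no event loop already running in the thread.

```python
import asyncio

def iter_over_async(aiterable, loop):
    """Yield items from an async iterable by driving `loop` one item at a time."""
    ait = aiterable.__aiter__()
    sentinel = object()  # marks exhaustion without clashing with real items

    async def get_next():
        try:
            return await ait.__anext__()
        except StopAsyncIteration:
            return sentinel

    while True:
        item = loop.run_until_complete(get_next())
        if item is sentinel:
            break
        yield item

async def countdown(n):
    # Hypothetical async generator used only for the demonstration.
    for i in range(n, 0, -1):
        await asyncio.sleep(0)
        yield i

loop = asyncio.new_event_loop()
try:
    result = list(iter_over_async(countdown(3), loop))
finally:
    loop.close()
print(result)
```

Unlike collecting everything into a list first, this hands items to the caller as they arrive, but it blocks the calling thread while each item is awaited.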

Related

Async Generator Comprehension

The requirement is to concurrently perform a time consuming operation for a list of data.
My current implementation:
import asyncio
from typing import Any, Optional
async def expensive_routine(service) -> Optional[Any]:
    await asyncio.sleep(5)
    if service % 2:
        return service
    return None
async def producer():
    # let's say
    services = range(10)
    #
    for future in asyncio.as_completed(
        [expensive_routine(service) for service in services]
    ):
        result = await future
        if result:
            yield result
This is then used by:
async for x, y in producer():
    print(f"I have my {x} and {y}")
The function expensive_routine returns Optional[Any]. I want to yield only the results that are not None.
Is there a way to do this more efficiently, or with a comprehension?
You mean if you can cram your nice little producer coroutine into a single-line generator expression abomination of unreadability? Why, yes!
import asyncio
import random
async def expensive_routine(service):
    await asyncio.sleep(random.randint(0, 5))
    if random.choice([0, 1]):
        return service
    return None
async def main():
    async for x in (
        res
        for coro in asyncio.as_completed(
            [expensive_routine(service) for service in range(10)]
        )
        if (res := (await coro))  # Python 3.8+
    ):
        print(x)
asyncio.run(main())
Jokes aside, I don't see anything wrong with your code and I'm not sure what you mean by more efficient, since your speed here is dominated by the slow expensive_routine.
I wrote this small example because I think this is what you meant with a comprehension, but I would prefer your much more readable producer.
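If the filtered results are only needed as a list, an asynchronous list comprehension is a middle ground. The following is a sketch under assumptions: `collect_not_none` is a hypothetical name, the sleep durations are shortened for the demo, and it needs Python 3.8+ for the walrus operator. It tests `is not None` rather than truthiness, so falsy results such as 0 or an empty string would be kept.

```python
import asyncio

async def expensive_routine(service):
    # Stand-in for the slow call; returns None for even inputs.
    await asyncio.sleep(0.01 * service)
    return service if service % 2 else None

async def collect_not_none(services):
    pending = [expensive_routine(s) for s in services]
    # as_completed hands back awaitables in completion order;
    # the walrus binds each awaited result so it can be both tested and kept.
    return [
        res
        for coro in asyncio.as_completed(pending)
        if (res := await coro) is not None
    ]

results = asyncio.run(collect_not_none(range(10)))
print(results)
```

All coroutines still run concurrently; the comprehension only changes how the results are gathered, not how long the work takes.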

How to use asynchronous iterator using aiter() and anext() builtins

I have gone through the documentation of aiter and anext (new in version 3.10), but I do not understand how to use them.
I have the following program:
import asyncio
async def get_range():
    for i in range(10):
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        yield i
class AIter:
    def __init__(self, N):
        self.i = 0
        self.N = N
    def __aiter__(self):
        return self
    async def __anext__(self):
        i = self.i
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        if i >= self.N:
            raise StopAsyncIteration
        self.i += 1
        return i
async def main():
    async for p in AIter(10):
        print(f"finally {p}")
if __name__ == "__main__":
    asyncio.run(main())
How can I use aiter and anext builtin here?
Like with the regular synchronous iter and next builtins, you rarely need to use the new builtins directly. The async for loop in main calls the __aiter__ and __anext__ methods of your class already. If that does all you want, you're home free.
You only need to explicitly use aiter and anext if you are writing code that interacts with an asynchronous iterator in some way not directly supported by an async for loop. For instance, here's an asynchronous generator that yields pairs of values from the iterable it's given:
async def pairwise(aiterable, default=None):
    ait = aiter(aiterable)  # get a reference to the iterator
    async for x in ait:
        yield x, await anext(ait, default)  # get an extra value, yield a 2-tuple
If you loop on pairwise(AIter(10)) in your main function, you'll find that it now prints tuples of numbers, like finally (0, 1). Before each tuple, you'll see two sets of the start and end lines printed by the iterator class, one for each value that ends up in the paired result.

Appending to merged async generators in Python

I'm trying to merge a bunch of asynchronous generators in Python 3.7 while still adding new async generators on iteration. I'm currently using aiostream to merge my generators:
from asyncio import sleep, run
from aiostream.stream import merge
async def go():
    yield 0
    await sleep(1)
    yield 50
    await sleep(1)
    yield 100
async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        print(v)
if __name__ == '__main__':
    run(main())
However, I need to be able to continue to add to the running tasks once the loop has begun. Something like:
from asyncio import sleep, run
from aiostream.stream import merge
async def go():
    yield 0
    await sleep(1)
    yield 50
    await sleep(1)
    yield 100
async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        if v == 50:
            tasks.merge(go())
        print(v)
if __name__ == '__main__':
    run(main())
The closest I've got to this is using the aiostream library but maybe this can also be written fairly neatly with just the native asyncio standard library.
Here is an implementation that should work efficiently even with a large number of async iterators:
import asyncio
class merge:
    def __init__(self, *iterables):
        self._iterables = list(iterables)
        self._wakeup = asyncio.Event()
    def _add_iters(self, next_futs, on_done):
        for it in self._iterables:
            it = it.__aiter__()
            nfut = asyncio.ensure_future(it.__anext__())
            nfut.add_done_callback(on_done)
            next_futs[nfut] = it
        del self._iterables[:]
        return next_futs
    async def __aiter__(self):
        done = {}
        next_futs = {}
        def on_done(nfut):
            done[nfut] = next_futs.pop(nfut)
            self._wakeup.set()
        self._add_iters(next_futs, on_done)
        try:
            while next_futs:
                await self._wakeup.wait()
                self._wakeup.clear()
                for nfut, it in done.items():
                    try:
                        ret = nfut.result()
                    except StopAsyncIteration:
                        continue
                    self._iterables.append(it)
                    yield ret
                done.clear()
                if self._iterables:
                    self._add_iters(next_futs, on_done)
        finally:
            # if the generator exits with an exception, or if the caller stops
            # iterating, make sure our callbacks are removed
            for nfut in next_futs:
                nfut.remove_done_callback(on_done)
    def append_iter(self, new_iter):
        self._iterables.append(new_iter)
        self._wakeup.set()
The only change required for your sample code is that the method is named append_iter, not merge.
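For comparison, a simpler (and less careful) standard-library variant can be sketched with one pump task per source feeding an asyncio.Queue. This is not the implementation above: `DynamicMerge`, `add`, and `_pump` are hypothetical names, sources must be added from inside a running event loop, and exceptions raised by a source are not propagated to the consumer in this sketch.

```python
import asyncio

class DynamicMerge:
    """Hypothetical helper: merge async iterables, accepting new ones mid-iteration."""

    def __init__(self, *sources):
        self._queue = asyncio.Queue()
        self._tasks = set()   # keep references so pump tasks are not garbage collected
        self._active = 0      # number of sources that have not finished yet
        for source in sources:
            self.add(source)

    def add(self, source):
        # Needs a running event loop, so call it from inside a coroutine.
        self._active += 1
        task = asyncio.ensure_future(self._pump(source))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _pump(self, source):
        try:
            async for item in source:
                await self._queue.put((False, item))
        finally:
            # Always announce completion so the consumer can stop waiting.
            await self._queue.put((True, None))

    async def __aiter__(self):
        while self._active:
            finished, item = await self._queue.get()
            if finished:
                self._active -= 1
            else:
                yield item

async def go():
    yield 0
    await asyncio.sleep(0.01)
    yield 50
    await asyncio.sleep(0.01)
    yield 100

async def main():
    merged = DynamicMerge(go(), go())
    seen, added = [], False
    async for v in merged:
        if v == 50 and not added:
            merged.add(go())  # grow the merge while it is running
            added = True
        seen.append(v)
    return seen

values = asyncio.run(main())
print(values)
```

The trade-off is that every source runs ahead of the consumer into an unbounded queue, whereas the callback-based class only requests the next item from a source after the previous one has been yielded.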
This can be done using stream.flatten with an asyncio queue to store the new generators.
import asyncio
from aiostream import stream, pipe
async def main():
    queue = asyncio.Queue()
    await queue.put(go())
    await queue.put(go())
    await queue.put(go())
    xs = stream.call(queue.get)
    ys = stream.cycle(xs)
    zs = stream.flatten(ys, task_limit=5)
    async with zs.stream() as streamer:
        async for item in streamer:
            if item == 50:
                await queue.put(go())
            print(item)
Notice that you may tune the number of tasks that can run at the same time using the task_limit argument. Also note that zs can be elegantly defined using the pipe syntax:
zs = stream.call(queue.get) | pipe.cycle() | pipe.flatten(task_limit=5)
Disclaimer: I am the project maintainer.

How to iterate over an asynchronous iterator with a timeout?

I think it's easier to understand in terms of code:
try:
    async for item in timeout(something(), timeout=60):
        await do_something_useful(item)
except asyncio.TimeoutError:
    await refresh()
I want the async for to run at most 60 seconds.
I needed to do something like this to create a websocket (also an async iterator) that times out if it doesn't get a message within a certain duration. I settled on the following:
socket_iter = socket.__aiter__()
try:
    while True:
        message = await asyncio.wait_for(
            socket_iter.__anext__(),
            timeout=10
        )
except asyncio.TimeoutError:
    # streaming is completed
    pass
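A self-contained version of this per-item timeout pattern might look like the sketch below. The names `ticker` and `read_until_idle` are hypothetical, the delays are shortened so the demo finishes quickly, and it assumes that a timeout simply ends the stream rather than triggering a retry.

```python
import asyncio

async def ticker():
    # Hypothetical source: two quick messages, then a long silence.
    yield "fast"
    await asyncio.sleep(0.01)
    yield "still fast"
    await asyncio.sleep(1)  # longer than the idle timeout used below
    yield "too late"

async def read_until_idle(aiterable, idle_timeout):
    """Collect items until the source is exhausted or stays silent too long."""
    received = []
    it = aiterable.__aiter__()
    try:
        while True:
            # The timeout applies to each item separately, not to the whole loop.
            received.append(await asyncio.wait_for(it.__anext__(), idle_timeout))
    except asyncio.TimeoutError:
        pass  # the source went quiet
    except StopAsyncIteration:
        pass  # the source finished normally
    return received

messages = asyncio.run(read_until_idle(ticker(), 0.2))
print(messages)
```

Note that wait_for cancels the pending `__anext__()` call on timeout, so an async generator used this way cannot be resumed afterwards.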
AsyncTimedIterable could be the implementation of timeout() in your code:
class _AsyncTimedIterator:
    __slots__ = ('_iterator', '_timeout', '_sentinel')
    def __init__(self, iterable, timeout, sentinel):
        self._iterator = iterable.__aiter__()
        self._timeout = timeout
        self._sentinel = sentinel
    async def __anext__(self):
        try:
            return await asyncio.wait_for(self._iterator.__anext__(), self._timeout)
        except asyncio.TimeoutError:
            return self._sentinel
class AsyncTimedIterable:
    __slots__ = ('_factory', )
    def __init__(self, iterable, timeout=None, sentinel=None):
        self._factory = lambda: _AsyncTimedIterator(iterable, timeout, sentinel)
    def __aiter__(self):
        return self._factory()
(original answer)
Or use this class to replace your timeout() function:
class AsyncTimedIterable:
    def __init__(self, iterable, timeout=None, sentinel=None):
        class AsyncTimedIterator:
            def __init__(self):
                self._iterator = iterable.__aiter__()
            async def __anext__(self):
                try:
                    return await asyncio.wait_for(self._iterator.__anext__(),
                                                  timeout)
                except asyncio.TimeoutError:
                    return sentinel
        self._factory = AsyncTimedIterator
    def __aiter__(self):
        return self._factory()
A simple approach is to use an asyncio.Queue, and separate the code into two coroutines:
queue = asyncio.Queue()
async for item in something():
    await queue.put(item)
In another coroutine:
while True:
    try:
        item = await asyncio.wait_for(queue.get(), 60)
    except asyncio.TimeoutError:
        pass
    else:
        if item is None:
            break  # use None or whatever suits you to gracefully exit
        await do_something_useful(item)
    refresh()
Please note, it will make the queue grow if the handler do_something_useful() is slower than something() generates items. You may set a maxsize on the queue to limit the buffer size.
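Put together as a runnable sketch, the two coroutines and a bounded queue could look like this. The `producer`, `consumer`, and `main` names are hypothetical, None is assumed to be the shutdown marker, and `maxsize=1` is chosen only to show the producer being held back until the consumer catches up.

```python
import asyncio

async def producer(queue):
    for i in range(3):
        await queue.put(i)   # waits here whenever the bounded queue is full
    await queue.put(None)    # shutdown marker

async def consumer(queue, idle_timeout):
    handled = []
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), idle_timeout)
        except asyncio.TimeoutError:
            continue         # nothing arrived in time; a refresh could go here
        if item is None:
            break
        handled.append(item)
    return handled

async def main():
    queue = asyncio.Queue(maxsize=1)
    _, handled = await asyncio.gather(producer(queue), consumer(queue, 0.5))
    return handled

handled = asyncio.run(main())
print(handled)
```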
The answer depends on the nature of the refresh function. If it is a very short-running function, it can be called freely inside the coroutine. But if it blocks (on network or CPU work), it should be run in an executor to avoid freezing the asyncio event loop.
The code below shows the first case; changing it to run refresh in an executor is not hard.
The second thing to clarify is the nature of the asynchronous iterator. As far as I understand, you're using it to get either a result from something or None if a timeout occurred.
If I understand the logic correctly, your code can be written more clearly (in the near-synchronous style that asyncio is designed to allow) using the async_timeout context manager and without an asynchronous iterator at all:
import asyncio
from async_timeout import timeout
async def main():
    while True:
        try:
            async with timeout(60):
                res = await something()
                await do_something_useful(res)
        except asyncio.TimeoutError:
            pass
        finally:
            refresh()
Your question is missing a couple of details, but assuming something() is an async iterator or generator and you want item to be sentinel every time something has not yielded a value within the timeout, here is an implementation of timeout():
import asyncio
from typing import *
T = TypeVar('T')
# async generator, needs python 3.6
async def timeout(it: AsyncIterator[T], timeo: float, sentinel: T) -> AsyncGenerator[T, None]:
    try:
        nxt = asyncio.ensure_future(it.__anext__())
        while True:
            try:
                yield await asyncio.wait_for(asyncio.shield(nxt), timeo)
                nxt = asyncio.ensure_future(it.__anext__())
            except asyncio.TimeoutError:
                yield sentinel
    except StopAsyncIteration:
        pass
    finally:
        nxt.cancel()  # in case we're getting cancelled our self
test:
async def something():
    yield 1
    await asyncio.sleep(1.1)
    yield 2
    await asyncio.sleep(2.1)
    yield 3
async def test():
    expect = [1, None, 2, None, None, 3]
    async for item in timeout(something(), 1, None):
        print("Check", item)
        assert item == expect.pop(0)
asyncio.get_event_loop().run_until_complete(test())
When wait_for() times out it will cancel the task. Therefore, we need to wrap it.__anext__() in a task and then shield it, to be able to resume the iterator.
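The difference that shield makes can be seen in isolation with the small sketch below. The `slow` coroutine and the timings are hypothetical and chosen only to force a timeout; the first task is cancelled by wait_for, while the shielded one keeps running and can be awaited again later.

```python
import asyncio

async def slow():
    await asyncio.sleep(0.05)
    return "finished"

async def main():
    # Without shield: the timeout cancels the task itself.
    unshielded = asyncio.ensure_future(slow())
    try:
        await asyncio.wait_for(unshielded, 0.01)
    except asyncio.TimeoutError:
        pass

    # With shield: only the shield wrapper is cancelled, the task survives.
    shielded = asyncio.ensure_future(slow())
    try:
        await asyncio.wait_for(asyncio.shield(shielded), 0.01)
    except asyncio.TimeoutError:
        pass
    result = await shielded  # resume waiting for the same task

    return unshielded.cancelled(), result

outcome = asyncio.run(main())
print(outcome)
```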
I want the coroutine to execute refresh at least every 60 seconds.
If you need to execute refresh every 60 seconds regardless of what happens with do_something_useful, you can arrange that with a separate coroutine:
import time
async def my_loop():
    # ensure refresh() is invoked at least once in 60 seconds
    done = False
    async def repeat_refresh():
        last_run = time.time()
        while not done:
            await refresh()
            now = time.time()
            await asyncio.sleep(max(60 - (now - last_run), 0))
            last_run = now
    # start repeat_refresh "in the background"
    refresh_task = asyncio.get_event_loop().create_task(repeat_refresh())
    try:
        async for item in something():
            if item is not None:
                await do_something_useful(item)
            await refresh()
    finally:
        done = True

Wait on Python async generators

Say I have two async generators:
async def get_rules():
    while True:
        yield 'rule=1'
        await asyncio.sleep(2)
async def get_snapshots():
    while True:
        yield 'snapshot=1'
        await asyncio.sleep(5)
I want to merge them into a single async generator that returns 2-tuples, with the latest value from both. Sort of combineLatest.
What is the best way to do this?
You might want to have a look at aiostream, especially stream.merge and stream.accumulate:
import asyncio
from itertools import count
from aiostream import stream
async def get_rules():
    for x in count():
        await asyncio.sleep(2)
        yield 'rule', x
async def get_snapshots():
    for x in count():
        await asyncio.sleep(5)
        yield 'snapshot', x
async def main():
    xs = stream.merge(get_rules(), get_snapshots())
    ys = stream.map(xs, lambda x: {x[0]: x[1]})
    zs = stream.accumulate(ys, lambda x, e: {**x, **e}, {})
    async with zs.stream() as streamer:
        async for z in streamer:
            print(z)
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
Output:
{}
{'rule': 0}
{'rule': 1}
{'rule': 1, 'snapshot': 0}
{'rule': 2, 'snapshot': 0}
[...]
See the project page and the documentation for further information.
Disclaimer: I am the project maintainer.
I came up with this:
async def combine(**generators):
    """Given a bunch of async generators, merges the events from
    all of them. Each should have a name, i.e. `foo=gen, bar=gen`.
    """
    combined = Channel()
    async def listen_and_forward(name, generator):
        async for value in generator:
            await combined.put({name: value})
    for name, generator in generators.items():
        asyncio.Task(listen_and_forward(name, generator))
    async for item in combined:
        yield item
async def combine_latest(**generators):
    """Like "combine", but always includes the latest value from
    every generator.
    """
    current = {}
    async for value in combine(**generators):
        current.update(value)
        yield current
Call it like so:
async for item in combine_latest(rules=rulesgen, snap=snapgen):
    print(item)
Output looks like this:
{'rules': 'rule-1'}
{'rules': 'rule-1', 'snap': 'snapshot-1'}
{'rules': 'rule-1', 'snap': 'snapshot-1'}
....
I am using aiochannel, but a normal asyncio.Queue should be fine, too.
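As a rough idea of what the asyncio.Queue variant might look like, here is a sketch, not the code above. It assumes finite sources so that the merged generator can end, uses a private marker object to count finished sources, and yields a copy of the state so earlier results are not mutated later. The `rules` and `snapshots` generators and their timings are hypothetical.

```python
import asyncio

async def combine_latest(**generators):
    """Yield a dict with the latest value seen so far from each named source."""
    queue = asyncio.Queue()
    done_marker = object()

    async def forward(name, gen):
        try:
            async for value in gen:
                await queue.put((name, value))
        finally:
            await queue.put((name, done_marker))  # report that this source ended

    tasks = [asyncio.ensure_future(forward(n, g)) for n, g in generators.items()]
    remaining = len(tasks)
    current = {}
    try:
        while remaining:
            name, value = await queue.get()
            if value is done_marker:
                remaining -= 1
                continue
            current[name] = value
            yield dict(current)  # copy, so consumers can keep old snapshots
    finally:
        for task in tasks:
            task.cancel()  # stop forwarding if the consumer quits early

async def rules():
    for i in range(2):
        yield f"rule={i}"
        await asyncio.sleep(0.02)

async def snapshots():
    await asyncio.sleep(0.01)
    yield "snapshot=0"

async def main():
    return [s async for s in combine_latest(rules=rules(), snap=snapshots())]

states = asyncio.run(main())
print(states)
```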
