How to use an asynchronous iterator with the aiter() and anext() builtins - python-3.x

I have gone through the documentation of aiter and anext (new in version 3.10), but I don't understand how to use them.
I have the following program:
import asyncio

async def get_range():
    for i in range(10):
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        yield i

class AIter:
    def __init__(self, N):
        self.i = 0
        self.N = N

    def __aiter__(self):
        return self

    async def __anext__(self):
        i = self.i
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        if i >= self.N:
            raise StopAsyncIteration
        self.i += 1
        return i

async def main():
    async for p in AIter(10):
        print(f"finally {p}")

if __name__ == "__main__":
    asyncio.run(main())
How can I use the aiter and anext builtins here?

Like with the regular synchronous iter and next builtins, you rarely need to use the new builtins directly. The async for loop in main calls the __aiter__ and __anext__ methods of your class already. If that does all you want, you're home free.
You only need to explicitly use aiter and anext if you are writing code that interacts with an asynchronous iterator in some way not directly supported by an async for loop. For instance, here's an asynchronous generator that yields pairs of values from the iterable it's given:
async def pairwise(aiterable, default=None):
    ait = aiter(aiterable)  # get a reference to the iterator
    async for x in ait:
        yield x, await anext(ait, default)  # get an extra value, yield a 2-tuple
If you loop on pairwise(AIter(10)) in your main function, you'll find that it now prints tuples of numbers, like finally (0, 1). Before each tuple, you'll see two sets of the start and end lines printed by the iterator class, one for each value that ends up in the paired result.
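If you do want to see the builtins in action on your class, here is a minimal sketch (assuming Python 3.10+, since that's when aiter and anext landed) that drives AIter by hand, which is essentially what async for does for you:
import asyncio

async def manual():
    it = aiter(AIter(3))              # calls AIter(3).__aiter__()
    while True:
        try:
            value = await anext(it)   # calls and awaits it.__anext__()
        except StopAsyncIteration:
            break                     # async for catches this for you
        print(f"got {value}")

asyncio.run(manual())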

Related

How does Python return a generator and continue in unknown functions

When I have a function such as
def gen_5():
    for i in range(5):
        yield i
This can be utilized as
for i in gen_5():
    print(i)
When the function gen_5 hits the keyword yield, a generator is returned from the function, and when I call next it will continue once again.
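For concreteness, driving the generator by hand shows the pause/resume behavior:
g = gen_5()
print(next(g))  # 0 - runs the body up to the first yield, then pauses
print(next(g))  # 1 - resumes right after the yield and loops again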
I have created my own range generator class that does the same thing:
class RangeGenerator():
    def __init__(self, n):
        self.n = n
        self.current_n = 0

    def __iter__(self):
        return self

    def __next__(self):
        return self.next()

    def next(self):
        if self.current_n == self.n:
            raise StopIteration()
        rv = self.current_n
        self.current_n += 1
        return rv
This accomplishes the same thing, used as:
rg_5 = RangeGenerator(5)
for i in rg_5:
    print(i)
My own class holds its own state, and I can easily understand how it works. What I cannot understand is how the yield keyword works and returns a generator from any function, which can contain arbitrary code.
How does it keep track of the state of the variables inside the function?
Where is the generator class that gets returned defined, and can I view its source code?
How is it translating my function gen_5 into something like my class, with a defined __next__ and a way to track state? Although my function is small, yield can work in very large and complex functions; what is keeping track of all of this?
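The state lives on the generator object itself: calling gen_5() creates a generator holding a suspended stack frame, and each next() resumes that frame where it left off. (As for the source: the generator type is implemented in C, in CPython's Objects/genobject.c.) A small sketch that makes the state visible with the inspect module:
import inspect

def gen_5():
    for i in range(5):
        yield i

g = gen_5()
print(inspect.getgeneratorstate(g))  # GEN_CREATED - frame exists, body hasn't run
print(next(g))                       # 0
print(inspect.getgeneratorstate(g))  # GEN_SUSPENDED - frame paused at the yield
print(g.gi_frame.f_locals)           # {'i': 0} - the function's locals, preserved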

Async Generator Comprehension

The requirement is to concurrently perform a time-consuming operation for a list of data.
My current implementation:
import asyncio
from typing import Any, Optional

async def expensive_routine(service) -> Optional[Any]:
    await asyncio.sleep(5)
    if service % 2:
        return service
    return None

async def producer():
    # let's say
    services = range(10)
    for future in asyncio.as_completed(
        [expensive_routine(service) for service in services]
    ):
        result = await future
        if result:
            yield result
This is then used by:
async for x, y in producer():
    print(f"I have my {x} and {y}")
The function expensive_routine returns Optional[Any]; I want to yield only the non-None results.
Is there a way to perform this more efficiently, or using a comprehension?
You mean if you can cram your nice little producer coroutine into a single-line generator expression abomination of unreadability? Why, yes!
import asyncio
import random

async def expensive_routine(service):
    await asyncio.sleep(random.randint(0, 5))
    if random.choice([0, 1]):
        return service
    return None

async def main():
    async for x in (
        res
        for coro in asyncio.as_completed(
            [expensive_routine(service) for service in range(10)]
        )
        if (res := (await coro))  # Python 3.8+
    ):
        print(x)

asyncio.run(main())
Jokes aside, I don't see anything wrong with your code and I'm not sure what you mean by more efficient, since your speed here is dominated by the slow expensive_routine.
I wrote this small example because I think this is what you meant with a comprehension, but I would prefer your much more readable producer.

Appending to merged async generators in Python

I'm trying to merge a bunch of asynchronous generators in Python 3.7 while still adding new async generators on iteration. I'm currently using aiostream to merge my generators:
from asyncio import sleep, run
from aiostream.stream import merge

async def go():
    yield 0
    await sleep(1)
    yield 50
    await sleep(1)
    yield 100

async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        print(v)

if __name__ == '__main__':
    run(main())
However, I need to be able to continue to add to the running tasks once the loop has begun. Something like:
from asyncio import sleep, run
from aiostream.stream import merge

async def go():
    yield 0
    await sleep(1)
    yield 50
    await sleep(1)
    yield 100

async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        if v == 50:
            tasks.merge(go())
        print(v)

if __name__ == '__main__':
    run(main())
The closest I've got to this is using the aiostream library, but maybe this can also be written fairly neatly with just the native asyncio standard library.
Here is an implementation that should work efficiently even with a large number of async iterators:
import asyncio

class merge:
    def __init__(self, *iterables):
        self._iterables = list(iterables)
        self._wakeup = asyncio.Event()

    def _add_iters(self, next_futs, on_done):
        for it in self._iterables:
            it = it.__aiter__()
            nfut = asyncio.ensure_future(it.__anext__())
            nfut.add_done_callback(on_done)
            next_futs[nfut] = it
        del self._iterables[:]
        return next_futs

    async def __aiter__(self):
        done = {}
        next_futs = {}

        def on_done(nfut):
            done[nfut] = next_futs.pop(nfut)
            self._wakeup.set()

        self._add_iters(next_futs, on_done)
        try:
            while next_futs:
                await self._wakeup.wait()
                self._wakeup.clear()
                for nfut, it in done.items():
                    try:
                        ret = nfut.result()
                    except StopAsyncIteration:
                        continue
                    self._iterables.append(it)
                    yield ret
                done.clear()
                if self._iterables:
                    self._add_iters(next_futs, on_done)
        finally:
            # if the generator exits with an exception, or if the caller stops
            # iterating, make sure our callbacks are removed
            for nfut in next_futs:
                nfut.remove_done_callback(on_done)

    def append_iter(self, new_iter):
        self._iterables.append(new_iter)
        self._wakeup.set()
The only change required for your sample code is that the method is named append_iter, not merge.
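For example, a sketch of the question's main reworked against this class (go is the async generator from the question):
import asyncio

async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        if v == 50:
            tasks.append_iter(go())  # add a new generator mid-iteration
        print(v)

if __name__ == '__main__':
    asyncio.run(main())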
This can be done using stream.flatten with an asyncio queue to store the new generators.
import asyncio
from aiostream import stream, pipe

async def main():
    queue = asyncio.Queue()
    await queue.put(go())
    await queue.put(go())
    await queue.put(go())

    xs = stream.call(queue.get)
    ys = stream.cycle(xs)
    zs = stream.flatten(ys, task_limit=5)
    async with zs.stream() as streamer:
        async for item in streamer:
            if item == 50:
                await queue.put(go())
            print(item)
Notice that you may tune the number of tasks that can run at the same time using the task_limit argument. Also note that zs can be elegantly defined using the pipe syntax:
zs = stream.call(queue.get) | pipe.cycle() | pipe.flatten(task_limit=5)
Disclaimer: I am the project maintainer.

Merging async iterables in python3

Is there a good way, or a well-supported library, for merging async iterators in python3?
The desired behavior is basically the same as that of merging observables in reactivex.
That is, in the normal case, if I'm merging two async iterators, I want the resulting async iterator to yield results chronologically. An error in one of the iterators should derail the merged iterator.
(Source: http://reactivex.io/documentation/operators/merge.html)
This is my best attempt, but it seems like something for which there might be a standard solution:
import asyncio
from collections import namedtuple

async def drain(stream, q, sentinel=None):
    try:
        async for item in stream:
            await q.put(item)
        if sentinel:
            await q.put(sentinel)
    except BaseException as e:
        await q.put(e)

async def merge(*streams):
    q = asyncio.Queue()
    sentinel = namedtuple("QueueClosed", ["truthy"])(True)
    futures = {
        asyncio.ensure_future(drain(stream, q, sentinel)) for stream in streams
    }
    remaining = len(streams)
    while remaining > 0:
        result = await q.get()
        if result is sentinel:
            remaining -= 1
            continue
        if isinstance(result, BaseException):
            raise result
        yield result

if __name__ == "__main__":
    # Example: Should print:
    # 1
    # 2
    # 3
    # 4
    loop = asyncio.get_event_loop()

    async def gen():
        yield 1
        await asyncio.sleep(1.5)
        yield 3

    async def gen2():
        await asyncio.sleep(1)
        yield 2
        await asyncio.sleep(1)
        yield 4

    async def go():
        async for x in merge(gen(), gen2()):
            print(x)

    loop.run_until_complete(go())
You can use aiostream.stream.merge:
from aiostream import stream

async def go():
    async for x in stream.merge(gen(), gen2()):
        print(x)
More examples in the documentation and this answer.
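For reference, a self-contained version of that answer, reusing gen and gen2 from the question and the same direct-iteration pattern:
import asyncio
from aiostream import stream

async def gen():
    yield 1
    await asyncio.sleep(1.5)
    yield 3

async def gen2():
    await asyncio.sleep(1)
    yield 2
    await asyncio.sleep(1)
    yield 4

async def go():
    async for x in stream.merge(gen(), gen2()):
        print(x)  # 1, 2, 3, 4 in chronological order

asyncio.run(go())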

how to cache asyncio coroutines

I am using aiohttp to make a simple HTTP request in python 3.4 like this:
response = yield from aiohttp.get(url)
The application requests the same URL over and over again so naturally I wanted to cache it. My first attempt was something like this:
@functools.lru_cache(maxsize=128)
def cached_request(url):
    return aiohttp.get(url)
The first call to cached_request works fine, but in later calls I end up with None instead of the response object.
I am rather new to asyncio so I tried a lot of combinations of the asyncio.coroutine decorator, yield from and some other things, but none seemed to work.
So how does caching coroutines work?
Maybe a bit late, but I've started a new package that may help: https://github.com/argaen/aiocache. Contributions/comments are always welcome.
An example:
import asyncio
from collections import namedtuple
from aiocache import cached
from aiocache.serializers import PickleSerializer

Result = namedtuple('Result', "content, status")

@cached(ttl=10, serializer=PickleSerializer())
async def async_main():
    print("First ASYNC non cached call...")
    await asyncio.sleep(1)
    return Result("content", 200)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(async_main()))
    print(loop.run_until_complete(async_main()))
    print(loop.run_until_complete(async_main()))
    print(loop.run_until_complete(async_main()))
Note that as an extra, it can cache any python object into redis using Pickle serialization. In case you just want to work with memory, you can use the SimpleMemoryCache backend :).
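A hedged sketch of that in-memory route, assuming aiocache's SimpleMemoryCache and its async set/get API (with an optional ttl):
import asyncio
from aiocache import SimpleMemoryCache

async def main():
    cache = SimpleMemoryCache()
    await cache.set("key", "value", ttl=10)  # store with a 10-second expiry
    print(await cache.get("key"))            # "value"

asyncio.run(main())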
A popular async version of lru_cache exists here: async_lru
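A minimal usage sketch, assuming the package's alru_cache decorator (cached_fetch and the sleep are made-up stand-ins for real I/O):
import asyncio
from async_lru import alru_cache  # pip install async-lru

@alru_cache(maxsize=128)
async def cached_fetch(url):
    await asyncio.sleep(1)  # stand-in for a real HTTP request
    return f"body of {url}"

async def main():
    print(await cached_fetch("http://example.com"))  # takes ~1s
    print(await cached_fetch("http://example.com"))  # returns instantly, cached

asyncio.run(main())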
To use functools.lru_cache with coroutines, the following code works.
import asyncio
import functools

import aiohttp

class Cacheable:
    def __init__(self, co):
        self.co = co
        self.done = False
        self.result = None
        self.lock = asyncio.Lock()

    def __await__(self):
        with (yield from self.lock):
            if self.done:
                return self.result
            self.result = yield from self.co.__await__()
            self.done = True
            return self.result

def cacheable(f):
    def wrapped(*args, **kwargs):
        r = f(*args, **kwargs)
        return Cacheable(r)
    return wrapped

@functools.lru_cache()
@cacheable
async def foo():
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()
The following is thread-safe:
import asyncio
import threading

class ThreadSafeCacheable:
    def __init__(self, co):
        self.co = co
        self.done = False
        self.result = None
        self.lock = threading.Lock()

    def __await__(self):
        while True:
            if self.done:
                return self.result
            if self.lock.acquire(blocking=False):
                self.result = yield from self.co.__await__()
                self.done = True
                return self.result
            else:
                yield from asyncio.sleep(0.005)
I wrote a simple cache decorator myself:
import asyncio

def async_cache(maxsize=128):
    cache = {}

    def decorator(fn):
        def wrapper(*args):
            key = ':'.join(args)
            if key not in cache:
                if len(cache) >= maxsize:
                    del cache[next(iter(cache))]  # evict the oldest key (Python 3)
                cache[key] = yield from fn(*args)
            return cache[key]
        return wrapper
    return decorator

@async_cache()
@asyncio.coroutine
def expensive_io():
    ....
This kind-of-works. But many aspects can probably be improved. For example: If the cached function is called a second time before the first call returns, it will execute a second time.
I'm not that familiar with aiohttp so I'm not sure of exactly what is happening that would cause Nones to be returned, but the lru_cache decorator will not work with async functions.
I use a decorator which does essentially the same thing; note that it is different to tobib's decorator above in that it will always return a future or a task, rather than the value:
import asyncio
from collections import OrderedDict
from functools import _make_key, wraps

def future_lru_cache(maxsize=128):
    # support use as decorator without calling, for this case maxsize will
    # not be an int
    try:
        real_max_size = int(maxsize)
    except (TypeError, ValueError):  # maxsize is the decorated function itself
        real_max_size = 128

    cache = OrderedDict()

    async def run_and_cache(func, args, kwargs):
        """Run func with the specified arguments and store the result
        in cache."""
        result = await func(*args, **kwargs)
        cache[_make_key(args, kwargs, False)] = result
        if len(cache) > real_max_size:
            cache.popitem(False)
        return result

    def wrapper(func):
        @wraps(func)
        def decorator(*args, **kwargs):
            key = _make_key(args, kwargs, False)
            if key in cache:
                # Some protection against duplicating calls already in
                # progress: when starting the call cache the future, and if
                # the same thing is requested again return that future.
                if isinstance(cache[key], asyncio.Future):
                    return cache[key]
                else:
                    f = asyncio.Future()
                    f.set_result(cache[key])
                    return f
            else:
                task = asyncio.Task(run_and_cache(func, args, kwargs))
                cache[key] = task
                return task
        return decorator

    if callable(maxsize):
        return wrapper(maxsize)
    else:
        return wrapper
I used _make_key from functools as lru_cache does; I guess it's supposed to be private, so it's probably better to copy it over.
This is how I think it's most easily done, using the built-in lru_cache and futures:
import asyncio
import functools

# parameterless decorator
def async_lru_cache_decorator(async_function):
    @functools.lru_cache
    def cached_async_function(*args, **kwargs):
        coroutine = async_function(*args, **kwargs)
        return asyncio.ensure_future(coroutine)
    return cached_async_function

# decorator with options
def async_lru_cache(*lru_cache_args, **lru_cache_kwargs):
    def async_lru_cache_decorator(async_function):
        @functools.lru_cache(*lru_cache_args, **lru_cache_kwargs)
        def cached_async_function(*args, **kwargs):
            coroutine = async_function(*args, **kwargs)
            return asyncio.ensure_future(coroutine)
        return cached_async_function
    return async_lru_cache_decorator

@async_lru_cache(maxsize=128)
async def your_async_function(...): ...
This is basically taking your original function and wrapping it so I can store the Coroutine it returns and convert it into a Future. This way, this can be treated as a regular function and you can lru_cache-it as you would usually do it.
Why is wrapping it in a Future necessary? Python coroutines are low-level constructs, and you can't await one more than once (you would get RuntimeError: cannot reuse already awaited coroutine). Futures, on the other hand, are handy and can be awaited consecutively and will return the same result.
One caveat is that caching a Future will also cache the case where the original function raised an error. The original lru_cache does not cache interrupted executions, so watch out for this edge case when using the solution above.
Further tweaking can be done to merge both the parameter-less and the parameterized decorators, like the original lru_cache which supports both usages.
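A small sketch demonstrating both behaviors side by side (pure asyncio, nothing else assumed):
import asyncio

async def compute():
    return 42

async def main():
    coro = compute()
    print(await coro)            # 42
    try:
        await coro               # a coroutine cannot be awaited twice
    except RuntimeError as e:
        print(e)                 # cannot reuse already awaited coroutine

    fut = asyncio.ensure_future(compute())
    print(await fut)             # 42
    print(await fut)             # 42 again - a future hands back the same result

asyncio.run(main())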
Another variant of an lru decorator, which caches coroutines that have not yet finished; very useful with parallel requests to the same key:
import asyncio
from collections import OrderedDict
from functools import _make_key, wraps

def async_cache(maxsize=128, event_loop=None):
    cache = OrderedDict()
    if event_loop is None:
        event_loop = asyncio.get_event_loop()
    awaiting = dict()

    async def run_and_cache(func, args, kwargs):
        """await func with the specified arguments and store the result
        in cache."""
        result = await func(*args, **kwargs)
        key = _make_key(args, kwargs, False)
        cache[key] = result
        if len(cache) > maxsize:
            cache.popitem(False)
        cache.move_to_end(key)
        return result

    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            key = _make_key(args, kwargs, False)
            if key in cache:
                return cache[key]
            if key in awaiting:
                task = awaiting[key]
                return await asyncio.wait_for(task, timeout=None, loop=event_loop)
            task = asyncio.ensure_future(run_and_cache(func, args, kwargs), loop=event_loop)
            awaiting[key] = task
            result = await asyncio.wait_for(task, timeout=None, loop=event_loop)
            del awaiting[key]
            return result
        return wrapper
    return decorator

async def test_async_cache(event_loop):
    counter = 0
    n, m = 10, 3

    @async_cache(maxsize=n, event_loop=event_loop)
    async def cached_function(x):
        nonlocal counter
        await asyncio.sleep(0)  # making event loop switch to other coroutine
        counter += 1
        return x

    tasks = [asyncio.ensure_future(cached_function(x), loop=event_loop)
             for x in list(range(n)) * m]
    done, pending = await asyncio.wait(tasks, loop=event_loop, timeout=1)
    assert len(done) == n * m
    assert counter == n

event_loop = asyncio.get_event_loop()
task = asyncio.ensure_future(test_async_cache(event_loop))
event_loop.run_until_complete(task)
I think that the simplest way is to use aiohttp_cache (documentation)
pip install aiohttp-cache
And use it in code:
from aiohttp import web
from aiohttp_cache import cache, setup_cache

@cache()  # <-- DECORATED FUNCTION
async def example_1(request):
    return web.Response(text="Example")

app = web.Application()
app.router.add_route('GET', "/", example_1)

setup_cache(app)  # <-- INITIALIZED aiohttp-cache

web.run_app(app, host="127.0.0.1")
Try async-cache (on PyPI and GitHub) for caching async functions in Python.
It also supports functions whose parameters are of user-defined, object, or unhashable types, which neither functools.lru_cache nor async_lru supports.
Usage:
pip install async-cache
from cache import AsyncLRU

@AsyncLRU(maxsize=128)
async def func(*args, **kwargs):
    pass
I wrote a simple package named asyncio-cache - https://github.com/matan1008/asyncio-cache.
I tried to keep the code as close as possible to the original python implementation and as simple as possible.
For example:
from asyncio_cache import lru_cache
import aiohttp

@lru_cache(maxsize=128)
async def cached_get(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()
