Async Generator Comprehension

The requirement is to concurrently perform a time-consuming operation for a list of data.
My current implementation:
import asyncio
from typing import Any, Optional

async def expensive_routine(service) -> Optional[Any]:
    await asyncio.sleep(5)
    if service % 2:
        return service
    return None

async def producer():
    # let's say
    services = range(10)
    #
    for future in asyncio.as_completed(
        [expensive_routine(service) for service in services]
    ):
        result = await future
        if result:
            yield result
This is then used by:
async for x, y in producer():
    print(f"I have my {x} and {y}")
The function expensive_routine returns Optional[Any]; I want to yield only the non-None results.
Is there a way to do this more efficiently, or using a comprehension?

You mean if you can cram your nice little producer coroutine into a single-line generator expression abomination of unreadability? Why, yes!
import asyncio
import random

async def expensive_routine(service):
    await asyncio.sleep(random.randint(0, 5))
    if random.choice([0, 1]):
        return service
    return None

async def main():
    async for x in (
        res
        for coro in asyncio.as_completed(
            [expensive_routine(service) for service in range(10)]
        )
        if (res := (await coro))  # Python 3.8+
    ):
        print(x)

asyncio.run(main())
Jokes aside, I don't see anything wrong with your code, and I'm not sure what you mean by "more efficient", since the speed here is dominated by the slow expensive_routine.
I wrote this small example because I think it is what you meant by a comprehension, but I would still prefer your much more readable producer.

How to use asynchronous iterator using aiter() and anext() builtins

I have gone through the documentation of aiter and anext (new in version 3.10), but I don't understand how to use them.
I have the following program:
import asyncio

async def get_range():
    for i in range(10):
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        yield i

class AIter:
    def __init__(self, N):
        self.i = 0
        self.N = N

    def __aiter__(self):
        return self

    async def __anext__(self):
        i = self.i
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        if i >= self.N:
            raise StopAsyncIteration
        self.i += 1
        return i

async def main():
    async for p in AIter(10):
        print(f"finally {p}")

if __name__ == "__main__":
    asyncio.run(main())
How can I use the aiter and anext builtins here?
Like with the regular synchronous iter and next builtins, you rarely need to use the new builtins directly. The async for loop in main calls the __aiter__ and __anext__ methods of your class already. If that does all you want, you're home free.
You only need to explicitly use aiter and anext if you are writing code that interacts with an asynchronous iterator in some way not directly supported by an async for loop. For instance, here's an asynchronous generator that yields pairs of values from the iterable it's given:
async def pairwise(aiterable, default=None):
    ait = aiter(aiterable)  # get a reference to the iterator
    async for x in ait:
        yield x, await anext(ait, default)  # get an extra value, yield a 2-tuple
If you loop on pairwise(AIter(10)) in your main function, you'll find that it now prints tuples of numbers, like finally (0, 1). Before each tuple, you'll see two sets of the start and end lines printed by the iterator class, one for each value that ends up in the paired result.
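For instance, a minimal sketch of that, reusing the question's AIter class:

async def main():
    async for p in pairwise(AIter(10)):
        print(f"finally {p}")
# finally (0, 1)
# finally (2, 3)
# ...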

How to get a regular iterator from an asynchronous iterator?

Got an async iterable. Need a regular iterable.
async def aiter2iter(aiter):
    l = []
    async for chunk in aiter:
        l.append(chunk)
    return l

regular_iterable = await aiter2iter(my_async_iterable)
for chunk in regular_iterable:
    print('Hooray! No async required here!')
Is this the way to go or am I reinventing the wheel?
Is there any way provided by Python to convert an async iterable to a regular iterable?
Also, is what I wrote even correct? Did I miss anything?
Your way works alright. I would also try unsync when working between async/sync functions.
Given
import time
import random
import asyncio

from unsync import unsync

# Sample async iterators
class AsyncIterator:
    """Yield random numbers."""
    def __aiter__(self):
        return self

    async def __anext__(self):
        await asyncio.sleep(0.1)
        return random.randint(0, 10)

async def anumbers(n=10):
    """Yield the first `n` random numbers."""
    i = 0
    async for x in AsyncIterator():
        if i == n:
            return
        yield x
        i += 1
Code
Rather than awaiting and re-iterating the result, we can just call result():
@unsync
async def aiterate(aiter):
    """Return a list from an aiter object."""
    return [x async for x in aiter]

aiterate(anumbers(5)).result()
# [8, 2, 5, 8, 9]
Details
Here's a description, from Python Bytes episode 73:
You just take any async function, and put an @unsync decorator. ... it will basically wrap it up and do all that asyncio initialization stuff ... then you can wait on the result, or not wait on the result, however you like. ... then if you put that on a regular function, not an async one, it'll cause it to run on a thread pool thread, on a thread pool executor.
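A minimal sketch of that last point (my example, assuming the unsync package is installed):

import time
from unsync import unsync

@unsync
def blocking_io():
    time.sleep(1)  # an ordinary blocking call
    return "done"

# The decorated regular function runs on a thread pool thread;
# .result() blocks until it finishes.
print(blocking_io().result())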

Process tasks in batches in asyncio

I have a function that generates tasks (IO-bound tasks):
def get_task():
    while True:
        new_task = _get_task()
        if new_task is not None:
            yield new_task
        else:
            sleep(1)
And I am trying to write a consumer in asyncio that will process at most 10 tasks at a time; when one task finishes, it will take a new one.
I am not sure if I should use semaphores, or whether there is some kind of asyncio pool executor. I started to write pseudocode with threads:
def run(self):
    while True:
        self.semaphore.acquire()  # first acquire, then get task
        t = get_task()
        self.process_task(t)

def process_task(self, task):
    try:
        self.execute_task(task)
        self.mark_as_done(task)
    except:
        self.mark_as_failed(task)
    self.semaphore.release()
Could anyone help me? I have no clue where to put the async/await keywords.
A simple task cap using asyncio.Semaphore:
async def max10(task_generator):
    semaphore = asyncio.Semaphore(10)

    async def bounded(task):
        async with semaphore:
            return await task

    async for task in task_generator:
        asyncio.ensure_future(bounded(task))
The problem with this solution is that tasks are drawn from the generator greedily. For example, if the generator reads from a large database, the program could run out of memory.
Other than that, it's idiomatic and well-behaved.
A solution that uses the async generator protocol to pull new tasks on demand:
async def max10(task_generator):
    tasks = set()
    gen = task_generator.__aiter__()
    try:
        while True:
            while len(tasks) < 10:
                tasks.add(await gen.__anext__())
            _done, tasks = await asyncio.wait(
                tasks, return_when=asyncio.FIRST_COMPLETED)
    except StopAsyncIteration:
        await asyncio.gather(*tasks)
It may be considered sub-optimal because it doesn't start executing tasks until 10 are available.
And here's a concise and magic solution using the worker pattern:
async def max10(task_generator):
    async def worker():
        async for task in task_generator:
            await task

    await asyncio.gather(*[worker() for i in range(10)])
It relies on a somewhat counter-intuitive property of being able to have multiple async iterators over the same async generator, in which case each generated item is seen by only one iterator.
My gut tells me that none of these solutions behaves properly on cancellation.
Async isn't threads. If, for example, you have tasks that are file-IO bound, then write them async using aiofiles:
async with aiofiles.open('filename', mode='r') as f:
    contents = await f.read()
Then replace task with your tasks. If you want to run only 10 at a time, await asyncio.gather every 10 tasks (a batched sketch follows the example below).
import asyncio

async def task(x):
    await asyncio.sleep(0.5)
    print(x, "is done")

async def run(loop):
    futs = []
    for x in range(50):
        futs.append(task(x))
    await asyncio.gather(*futs)

loop = asyncio.get_event_loop()
loop.run_until_complete(run(loop))
loop.close()
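For instance, a minimal sketch of the "gather every 10 tasks" idea (the chunking helper is my addition, not part of the example above):

import asyncio

async def task(x):
    await asyncio.sleep(0.5)
    print(x, "is done")

async def run_batched(n=50, batch_size=10):
    coros = [task(x) for x in range(n)]
    for i in range(0, n, batch_size):
        # wait for each batch of 10 to finish before starting the next
        await asyncio.gather(*coros[i:i + batch_size])

asyncio.run(run_batched())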
If you can't write the tasks async and need threads, this is a basic example using concurrent.futures.ThreadPoolExecutor with asyncio. Note that with max_workers=5, only 5 tasks are run at a time.
import time
from concurrent.futures import ThreadPoolExecutor
import asyncio

def blocking(x):
    time.sleep(1)
    print(x, "is done")

async def run(loop):
    futs = []
    executor = ThreadPoolExecutor(max_workers=5)
    for x in range(15):
        future = loop.run_in_executor(executor, blocking, x)
        futs.append(future)
    await asyncio.sleep(4)
    res = await asyncio.gather(*futs)

loop = asyncio.get_event_loop()
loop.run_until_complete(run(loop))
loop.close()
As pointed out by Dima Tismek, using semaphores to limit concurrency is vulnerable to exhausting task_generator too eagerly, since there is no backpressure between obtaining the tasks and submitting them to the event loop. A better option, also explored by the other answer, is not to spawn a task as soon as the generator has produced an item, but to create a fixed number of workers that exhaust the generator concurrently.
There are two areas where the code could be improved:
- there is no need for a semaphore - it is superfluous when the number of tasks is fixed to begin with;
- handling cancellation of generated tasks and of the throttling task.
Here is an implementation that tackles both issues:
async def throttle(task_generator, max_tasks):
    it = task_generator.__aiter__()
    cancelled = False

    async def worker():
        async for task in it:
            try:
                await task
            except asyncio.CancelledError:
                # If a generated task is canceled, let its worker
                # proceed with other tasks - except if it's the
                # outer coroutine that is cancelling us.
                if cancelled:
                    raise
            # other exceptions are propagated to the caller

    worker_tasks = [asyncio.create_task(worker())
                    for i in range(max_tasks)]
    try:
        await asyncio.gather(*worker_tasks)
    except:
        # In case of exception in one worker, or in case we're
        # being cancelled, cancel all workers and propagate the
        # exception.
        cancelled = True
        for t in worker_tasks:
            t.cancel()
        raise
A simple test case:
import asyncio
import random

async def mock_task(num):
    print('running', num)
    await asyncio.sleep(random.uniform(1, 5))
    print('done', num)

async def mock_gen():
    tnum = 0
    while True:
        await asyncio.sleep(.1 * random.random())
        print('generating', tnum)
        yield asyncio.create_task(mock_task(tnum))
        tnum += 1

if __name__ == '__main__':
    asyncio.run(throttle(mock_gen(), 3))

Appending to merged async generators in Python

I'm trying to merge a bunch of asynchronous generators in Python 3.7 while still being able to add new async generators during iteration. I'm currently using aiostream to merge my generators:
from asyncio import sleep, run
from aiostream.stream import merge

async def go():
    yield 0
    await sleep(1)
    yield 50
    await sleep(1)
    yield 100

async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        print(v)

if __name__ == '__main__':
    run(main())
However, I need to be able to continue to add to the running tasks once the loop has begun. Something like:
from asyncio import sleep, run
from aiostream.stream import merge

async def go():
    yield 0
    await sleep(1)
    yield 50
    await sleep(1)
    yield 100

async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        if v == 50:
            tasks.merge(go())
        print(v)

if __name__ == '__main__':
    run(main())
The closest I've got to this is using the aiostream library but maybe this can also be written fairly neatly with just the native asyncio standard library.
Here is an implementation that should work efficiently even with a large number of async iterators:
import asyncio

class merge:
    def __init__(self, *iterables):
        self._iterables = list(iterables)
        self._wakeup = asyncio.Event()

    def _add_iters(self, next_futs, on_done):
        for it in self._iterables:
            it = it.__aiter__()
            nfut = asyncio.ensure_future(it.__anext__())
            nfut.add_done_callback(on_done)
            next_futs[nfut] = it
        del self._iterables[:]
        return next_futs

    async def __aiter__(self):
        done = {}
        next_futs = {}

        def on_done(nfut):
            done[nfut] = next_futs.pop(nfut)
            self._wakeup.set()

        self._add_iters(next_futs, on_done)
        try:
            while next_futs:
                await self._wakeup.wait()
                self._wakeup.clear()
                for nfut, it in done.items():
                    try:
                        ret = nfut.result()
                    except StopAsyncIteration:
                        continue
                    self._iterables.append(it)
                    yield ret
                done.clear()
                if self._iterables:
                    self._add_iters(next_futs, on_done)
        finally:
            # if the generator exits with an exception, or if the caller stops
            # iterating, make sure our callbacks are removed
            for nfut in next_futs:
                nfut.remove_done_callback(on_done)

    def append_iter(self, new_iter):
        self._iterables.append(new_iter)
        self._wakeup.set()
The only change required for your sample code is that the method is named append_iter, not merge.
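A minimal usage sketch, adapting the question's main() to this class (with go() as defined in the question):

async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        if v == 50:
            tasks.append_iter(go())
        print(v)

if __name__ == '__main__':
    asyncio.run(main())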
This can be done using stream.flatten with an asyncio queue to store the new generators.
import asyncio
from aiostream import stream, pipe

async def main():
    queue = asyncio.Queue()
    await queue.put(go())
    await queue.put(go())
    await queue.put(go())
    xs = stream.call(queue.get)
    ys = stream.cycle(xs)
    zs = stream.flatten(ys, task_limit=5)
    async with zs.stream() as streamer:
        async for item in streamer:
            if item == 50:
                await queue.put(go())
            print(item)
Notice that you may tune the number of tasks that can run at the same time using the task_limit argument. Also note that zs can be elegantly defined using the pipe syntax:
zs = stream.call(queue.get) | pipe.cycle() | pipe.flatten(task_limit=5)
Disclaimer: I am the project maintainer.

async def and coroutines, what's the link?

I'm trying to wrap my head around the new asyncio functionality which appeared in Python 3.
I started from a simple worker example found on Stack Overflow, modified a bit:
import asyncio, random

q = asyncio.Queue()

@asyncio.coroutine
def produce(name):
    while True:
        value = random.random()
        yield from q.put(value)
        print("Produced by {0}".format(name))
        yield from asyncio.sleep(1.0 + random.random())

@asyncio.coroutine
def consume(name):
    while True:
        value = yield from q.get()
        print("Consumed by {0} ({1})".format(name, q.qsize()))
        yield from asyncio.sleep(1.2 + random.random())

loop = asyncio.get_event_loop()
loop.create_task(produce('X'))
loop.create_task(produce('Y'))
loop.create_task(consume('A'))
loop.create_task(consume('B'))
loop.run_forever()
I mostly understand how it works (except perhaps for the yield from asyncio.sleep()... Is it a placeholder for a delegated, but blocking, function? Where does it yield to?)
But, above all, how could I transform this example to use the new fancy async def and await keywords ? And what would be the gain ?
This article is what I found to be the most helpful.
http://www.snarky.ca/how-the-heck-does-async-await-work-in-python-3-5
Just replace
@asyncio.coroutine
def f(arg)
with
async def f(arg)
and yield from with await in your code.
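For example, applying that rule to the produce coroutine from the question (a direct transcription, nothing else changed):

async def produce(name):
    while True:
        value = random.random()
        await q.put(value)
        print("Produced by {0}".format(name))
        await asyncio.sleep(1.0 + random.random())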
Also read PEP 492 about async with and async for.
