Asyncio with two loops, best practice - python-3.x

I have two infinite loops. Their processing is lightweight. I don't want them to block each other. Is using await asyncio.sleep(0) a good practice?
This is my code
import asyncio

async def loop1():
    while True:
        print("loop1")
        # pull data from kafka
        await asyncio.sleep(0)

async def loop2():
    while True:
        print("loop2")
        # send data to all clients using asyncio stream api
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(loop1(), loop2())

asyncio.run(main())

Two (or many more) asyncio tasks will not block each other unless one of the tasks performs some long synchronous operation inside.
Both of your tasks contain only network operations (the Kafka pull and the asyncio stream API), so neither of them will block the other; the explicit await asyncio.sleep(0) is not needed there.
When should you use asyncio.sleep(0)?
Imagine you have some long synchronous operation, e.g. heavy calculations. Calculations are not I/O operations, so they never give control back to the event loop on their own.
This example is more of a "good to know": if you have such operations in a real app, you should move them into loop.run_in_executor with concurrent.futures.ProcessPoolExecutor as the executor. The example:
import asyncio

async def long_calc():
    """
    Some heavy CPU-bound task.
    Better to make it a sync function and move it to a ProcessPoolExecutor.
    """
    s = 0
    for _ in range(100):
        for i in range(1_000_000):
            s += i**2
        # comment out the next line and watch the result:
        # you'll get no "Working" messages until the calculation is done.
        # That's why sleep(0.0) is used here.
        await asyncio.sleep(0.0)
    return s

async def pinger():
    """Task which shows that the app is alive"""
    n = 0
    while True:
        await asyncio.sleep(1)
        print(f"Working {n}")
        n += 1

async def amain():
    """Main async function in this app"""
    # run pinger via asyncio.create_task since we want that task
    # to run in parallel with long_calc and
    # we do not want to wait until it is finished.
    # If it were a thread, it would be called a daemon thread.
    asyncio.create_task(pinger())
    # await the result of the long task
    s = await long_calc()
    print(f"Done: {s}")

if __name__ == '__main__':
    asyncio.run(amain())
If you need a run_in_executor example, let me know.
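For completeness, here is a rough sketch of what that run_in_executor variant could look like (this is not from the original answer; the function name long_calc_sync and the default pool size are illustrative):

import asyncio
import concurrent.futures

def long_calc_sync() -> int:
    """CPU-bound work as a plain sync function, safe to run in a worker process."""
    s = 0
    for _ in range(100):
        for i in range(1_000_000):
            s += i**2
    return s

async def amain():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as pool:
        # the event loop stays responsive while the calculation runs in another process
        s = await loop.run_in_executor(pool, long_calc_sync)
    print(f"Done: {s}")

if __name__ == '__main__':
    asyncio.run(amain())

With this approach no asyncio.sleep(0) calls are needed inside the calculation, because the event loop is never blocked by it in the first place.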

Related

Best way to keep creating threads on variable list argument

I have an event that I am listening to every minute that returns a list; it could be empty, have 1 element, or more. For each element in that list, I'd like to run a function that monitors an event on that element every minute for 10 minutes.
For that I wrote this script:
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
import Client

client = Client()

def handle_event(event):
    for i in range(10):
        client.get_info(event)
        sleep(60)

async def main():
    while True:
        entires = client.get_new_entry()
        if len(entires) > 0:
            with ThreadPoolExecutor(max_workers=len(entires)) as executor:
                executor.map(handle_event, entires)
        await asyncio.sleep(60)

if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main())
However, instead of continuing to monitor for new entries, it blocks while the previous entries are still being monitored.
Any idea how I could do that please?
First let me explain why your program doesn't work the way you want it to: It's because you use the ThreadPoolExecutor as a context manager, which will not close until all the threads started by the call to map are finished. So main() waits there, and the next iteration of the loop can't happen until all the work is finished.
There are ways around this. Since you are using asyncio already, one approach is to move the creation of the Executor to a separate task. Each iteration of the main loop starts one copy of this task, which runs as long as it takes to finish. It's an async def function, so many copies of this task can run concurrently.
I changed a few things in your code. Instead of Client I just used some simple print statements. I pass a list of integers, of random length, to handle_event. I increment a counter each time through the while True: loop, and add 10 times the counter to every integer in the list. This makes it easy to see how old calls continue for a time, mixing with new calls. I also shortened your time delays. All of these changes were for convenience and are not important.
The important change is to move ThreadPoolExecutor creation into a task. To make it cooperate with other tasks, it must contain an await expression, and for that reason I use executor.submit rather than executor.map. submit returns a concurrent.futures.Future, which provides a convenient way to await the completion of all the calls. executor.map, on the other hand, returns an iterator; I couldn't think of any good way to convert it to an awaitable object.
To convert a concurrent.futures.Future to an asyncio.Future, an awaitable, there is a function asyncio.wrap_future. When all the futures are complete, I exit from the ThreadPoolExecutor context manager. That will be very fast since all of the Executor's work is finished, so it does not block other tasks.
import random
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio

def handle_event(event):
    for i in range(10):
        print("Still here", event)
        sleep(2)

async def process_entires(counter, entires):
    print("Counter", counter, "Entires", entires)
    x = [counter * 10 + a for a in entires]
    with ThreadPoolExecutor(max_workers=len(entires)) as executor:
        futs = []
        for z in x:
            futs.append(executor.submit(handle_event, z))
        await asyncio.gather(*(asyncio.wrap_future(f) for f in futs))

async def main():
    counter = 0
    while True:
        entires = [0, 1, 2, 3, 4][:random.randrange(5)]
        if len(entires) > 0:
            counter += 1
            asyncio.create_task(process_entires(counter, entires))
        await asyncio.sleep(3)

if __name__ == "__main__":
    asyncio.run(main())
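As an aside, on Python 3.9+ the same effect can be achieved without managing the executor by hand, using asyncio.to_thread, which runs a blocking call in the default thread pool and returns an awaitable. A sketch under that assumption, reusing handle_event from the code above:

async def process_entires(counter, entires):
    print("Counter", counter, "Entires", entires)
    x = [counter * 10 + a for a in entires]
    # each blocking handle_event call runs in asyncio's default thread pool;
    # gather waits for all of them without blocking other tasks
    await asyncio.gather(*(asyncio.to_thread(handle_event, z) for z in x))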

Using contextlib.redirect_stdout in an async function redirects output of other tasks

I want to redirect the output of a few lines in my code that I don't have control over, but whose output is not relevant. I've been able to use contextlib.redirect_stdout(io.StringIO()) in a synchronous function to successfully redirect the lines I want, but I can't do it with an async function.
This is what I have so far:
import asyncio
import contextlib
import sys

async def long_function(val: int, semaphore: asyncio.Semaphore, file_out, old_stdout=sys.stdout):
    # Only let two tasks start at a time
    await semaphore.acquire()
    print(f"{val}: Starting")
    # Redirect stdout of ONLY the lines within this context manager
    with contextlib.redirect_stdout(file_out):
        await asyncio.sleep(3)  # long-running task that prints output I can't control, but is not useful to me
        print(f"{val}: Finished redirect")
    contextlib.redirect_stdout(old_stdout)
    print(f"{val}: Done")
    semaphore.release()

async def main():
    # I want to limit the number of concurrent tasks to 2
    semaphore: asyncio.Semaphore = asyncio.Semaphore(2)
    # Create a list of tasks to perform
    file_out = open("file.txt", "w")
    tasks = []
    for i in range(0, 9):
        tasks.append(long_function(i, semaphore, file_out))
    # Gather/run the tasks
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(main())
When running this, however, the output of other tasks is also placed into the "file.txt" file. I only want the "Finished redirect" lines to be placed into the file.
I see the following note in the Python docs:
Note that the global side effect on sys.stdout means that this context manager is not suitable for use in library code and most threaded applications. It also has no effect on the output of subprocesses. However, it is still a useful approach for many utility scripts.
Is there any other way to go about this, or do I just have to live with the output as-is?
Thanks for any help!
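For reference, the global side effect described in that docs note can be reproduced with a minimal sketch (the coroutine names below are made up and not part of the question):

import asyncio
import contextlib
import io

async def redirected():
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # while this task is suspended, sys.stdout is still the StringIO,
        # so anything any other task prints in the meantime lands in buf too
        await asyncio.sleep(1)
    print("captured:", repr(buf.getvalue()))

async def bystander():
    await asyncio.sleep(0.5)
    print("this print happens while the redirect is active")

async def main():
    await asyncio.gather(redirected(), bystander())

asyncio.run(main())

The bystander() output ends up in the buffer because redirect_stdout swaps out the single process-wide sys.stdout, not a per-task one.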

Starvation in `asyncio` loop

I have a system where two "processes" A and B run on the same asyncio event loop.
I notice that the order of initiation of the processes matters - i.e. if I start process B first, then process B runs all the time while A seems to be "starved" of resources, and vice versa.
In my experience, the only reason this might happen is a mutex that is not being released by B, but in the following toy example it happens without any mutexes being used:
import asyncio

async def A():
    while True:
        print('A')
        await asyncio.sleep(2)

async def B():
    while True:
        print('B')
        await asyncio.sleep(8)

async def main():
    await B()
    await A()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Doesn't Python context-switch between these coroutines automatically? If not, how can I make both of them run, each one during the time the other one is idle (i.e., sleeping)?
TLDR: Coroutines merely enable concurrency; they do not automatically trigger it. Explicitly launch separate tasks, e.g. via create_task or gather, to run the coroutines concurrently.
async def main():
    await asyncio.gather(B(), A())
Concurrency in asyncio is handled via Tasks – a close equivalent to Threads – which merely consist of coroutines/awaitables – like Threads consist of functions/callables. In general, a coroutine/awaitable itself does not equate to a separate task.
Using await X() means "start X and wait for it to complete". When using several such constructs in sequence:
async def main():
    await B()
    await A()
this means launching B first, and only launching A after B has completed: while async def and await allow for concurrency with respect to other tasks, B and A are run sequentially with respect to each other in a single task.
The simplest means to add concurrency is to explicitly create a task:
async def main():
    # execute B in a new task
    b_task = asyncio.create_task(B())
    # execute A in the current task
    await A()
    await b_task
Note how B is offloaded to a new task, while one can still do a final await A() to re-use the current task.
Most async frameworks ship with high-level helpers for common concurrency scenarios. In this case, asyncio.gather is appropriate to launch several tasks at once:
async def main():
    # execute B and A in new tasks
    await asyncio.gather(B(), A())
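On Python 3.11 and later, asyncio.TaskGroup is another such high-level helper; a sketch reusing the A and B coroutines from the question:

async def main():
    # each coroutine is wrapped in its own task,
    # and the group waits for both before leaving the block
    async with asyncio.TaskGroup() as tg:
        tg.create_task(B())
        tg.create_task(A())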

Shall I use asyncio.create_task() before calling asyncio.gather()?

I was wondering if there is any benefit to directly calling asyncio.gather(*coros) rather than starting the tasks with asyncio.create_task() and then calling asyncio.gather(*tasks).
So I did this test (please tell me if you notice any bias):
import timeit

test1 = """
async def sleep():
    await asyncio.sleep(0)

async def main():
    tasks = [asyncio.create_task(sleep()) for s in range(1000)]
    await asyncio.gather(*tasks)

asyncio.run(main())
"""

test2 = """
async def sleep():
    await asyncio.sleep(0)

async def main():
    tasks = [sleep() for s in range(1000)]
    await asyncio.gather(*tasks)

asyncio.run(main())
"""

print(timeit.repeat(stmt=test1, setup="import asyncio", repeat=5, number=10000))
print(timeit.repeat(stmt=test2, setup="import asyncio", repeat=5, number=10000))
Here's the result:
TEST 1: [123.09070299999999, 118.88883120000001, 120.92030820000002, 121.22180739999999, 116.49616249999997]
TEST 2: [109.63426249999998, 108.96809150000001, 110.66497140000001, 105.34163260000003, 105.78473080000003]
It seems like there is no overhead when gather() has to create the tasks itself; it's even faster (although ensure_future() is called internally, if I understand correctly).
Any thoughts on this? Shall I follow the pattern used for test 2 rather than the one used for test 1? The Zen does not help much there, but as it outlines, "There should be one-- and preferably only one --obvious way to do it".
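One behavioural difference worth noting, independent of the timings above: a task created with asyncio.create_task() is scheduled immediately, while a bare coroutine passed to gather() is only wrapped in a task (via ensure_future()) when gather() is called. A minimal sketch, not from the question:

import asyncio

async def hello(tag):
    print("hello from", tag)

async def main():
    t = asyncio.create_task(hello("task"))  # scheduled right away
    c = hello("coroutine")                  # just a coroutine object, not scheduled yet
    await asyncio.sleep(0)                  # let the loop run: only "hello from task" prints here
    await asyncio.gather(t, c)              # gather wraps c in a task, then both complete

asyncio.run(main())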

Python 3 with asyncio tasks not executed when returned in a generator function

This first example does not work. I try to emit an async task but don't care about the response in this situation; the output is empty:
from typing import Iterable
import asyncio

async def example_task():
    print('example_task')

def emit() -> Iterable:
    event_loop = asyncio.get_event_loop()
    yield event_loop.create_task(example_task())

async def main():
    emit()
    await asyncio.sleep(0.5)  # wait some time so the task can run

asyncio.run(main())
When I add next(emit()) to actually "read" the yielded task, the output works, and it also works in the next example, where I put all the tasks into a list first:
from typing import Iterable
import asyncio

async def example_task():
    print('example_task')

def emit() -> Iterable:
    event_loop = asyncio.get_event_loop()
    return iter([event_loop.create_task(example_task())])

async def main():
    emit()
    await asyncio.sleep(0.5)  # wait some time so the task can run

asyncio.run(main())
This is just a simple example; the final version should be able to emit an "event" and run 1..n async tasks that can return a value but don't need to. The caller of emit should be able to decide whether to await the results at some point or just ignore them, as in the examples.
Is there any way I can do this with a generator / yield, or is the only option to store all the tasks in a list and return an iterator afterwards?
The issue is that in the first example emit() returns a generator, and a generator's body does not run until it is iterated, so create_task() is never called. In the second example the list, and therefore the task, is created as soon as emit() is called.
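A minimal illustration of that laziness, independent of asyncio (not part of the original answer):

def gen():
    print("generator body runs now")
    yield 1

g = gen()   # nothing is printed yet: calling gen() does not start the body
next(g)     # prints "generator body runs now"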
The modified version of your first example would be something like:

async def main():
    next(emit())
    await asyncio.sleep(0.5)  # wait some time so the task can run

or

async def main():
    for task in emit():
        await task
    await asyncio.sleep(0.5)  # wait some time so the task can run
Hope this explains the difference between using a generator and an iterator when creating your tasks.
