I have a system where two "processes" A and B run on the same asyncio event loop.
I notice that the order in which the processes are started matters: if I start process B first, then B runs all the time while A seems to be "starved" of resources, and vice versa.
In my experience, the only reason this might happen is a mutex that is not being released by B, but in the following toy example it happens without any mutexes being used:
import asyncio

async def A():
    while True:
        print('A')
        await asyncio.sleep(2)

async def B():
    while True:
        print('B')
        await asyncio.sleep(8)

async def main():
    await B()
    await A()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Do coroutines in Python not perform context switches automatically? If not, how can I make both processes participate, each one running while the other is idle (i.e., sleeping)?
TLDR: Coroutines merely enable concurrency, they do not automatically trigger concurrency. Explicitly launch separate tasks, e.g. via create_task or gather, to run the coroutines concurrently.
async def main():
    await asyncio.gather(B(), A())
Concurrency in asyncio is handled via Tasks – a close equivalent to Threads – which merely consist of coroutines/awaitables – like Threads consist of functions/callables. In general, a coroutine/awaitable itself does not equate to a separate task.
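To see the distinction, note that calling a coroutine function merely creates a coroutine object; nothing runs until it is awaited or wrapped in a Task. A tiny illustration:

import asyncio

async def work():
    return 42

async def demo():
    coro = work()                     # just a coroutine object; nothing runs yet
    task = asyncio.create_task(coro)  # now it is scheduled as a separate Task
    print(await task)                 # 42

asyncio.run(demo())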
Using await X() means "start X and wait for it to complete". When using several such constructs in sequence:
async def main():
    await B()
    await A()
this means launching B first, and only launching A after B has completed: while async def and await allow for concurrency with respect to other tasks, B and A run sequentially with respect to each other in a single task.
The simplest means to add concurrency is to explicitly create a task:
async def main():
    # execute B in a new task
    b_task = asyncio.create_task(B())
    # execute A in the current task
    await A()
    await b_task
Note how B is offloaded to a new task, while one can still do a final await A() to re-use the current task.
Most async frameworks ship with high-level helpers for common concurrency scenarios. In this case, asyncio.gather is appropriate to launch several tasks at once:
async def main():
    # execute B and A in new tasks
    await asyncio.gather(B(), A())
I'm ashamed to admit I've been using Python's asyncio for a long time without really understanding how it works, and now I'm in a pickle. In pseudocode, my current program is like this:
async def api_function1(parameters):
    result = await asyncio.gather(*[some_other_thing(p) for p in parameters])

async def api_function2(parameters):
    result = await asyncio.gather(*[some_other_thing2(p) for p in parameters])

def a(initial_parameters):
    output = []
    data = asyncio.run(api_function1(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function1(get_parameters_from_data(data)))
        output.append(data)
        if some_condition:
            break
    return output

def b(initial_parameters):
    output = []
    data = asyncio.run(api_function2(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function2(get_parameters_from_data(data)))
        output.append(data)
        if some_condition:
            break
    return output
a() and b() get data from two different REST API endpoints, each with its own rate limits and nuances. I want to run a() and b() simultaneously.
What's the best/easiest way of structuring the program so that a() and b() can run simultaneously?
I tried making a() and b() both async functions and tried to await them simultaneously, i.e. something like

async def a(initial_parameters):
    ...

async def b(initial_parameters):
    ...

A = await a(initial_parameters)
B = await b(initial_parameters)
but it didn't work, so based on the docs, I'm guessing maybe I need to manually get the event loop and pass it as an argument to a() and b(), which would pass it on to api_function1() and api_function2(), and then close it manually when both tasks are done, but I'm not really sure if I'm on the right track or how to do it.
Also open to a better design pattern for this if you have one in mind.
There is no reason why you can't nest calls to asyncio.gather. If you want to run a() and b() simultaneously, you must make both of them coroutines. And you can't use asyncio.run() inside either one of them, since that is a blocking call - it doesn't return until its argument has completed. You need to replace all the calls to asyncio.run() in a() and b() with await expressions. You will end up with something that looks like this:
async def api_function1(parameters):
    return await asyncio.gather(*[some_other_thing(p) for p in parameters])

async def api_function2(parameters):
    return await asyncio.gather(*[some_other_thing2(p) for p in parameters])

async def a(initial_parameters):
    output = []
    data = await api_function1(initial_parameters)
    output.append(data)
    while True:
        data = await api_function1(get_parameters_from_data(data))
        output.append(data)
        if some_condition:
            break
    return output

async def b(initial_parameters):
    output = []
    data = await api_function2(initial_parameters)
    output.append(data)
    while True:
        data = await api_function2(get_parameters_from_data(data))
        output.append(data)
        if some_condition:
            break
    return output
async def main():
    a_data, b_data = await asyncio.gather(a(initial_parameters), b(initial_parameters))

# or, alternatively:
async def main():
    task_a = asyncio.create_task(a(initial_parameters))
    task_b = asyncio.create_task(b(initial_parameters))
    a_data = await task_a
    b_data = await task_b
asyncio.run(main())
This is still pseudocode.
I have given two possible ways of writing main(), one using asyncio.gather and the other using two calls to asyncio.create_task. Both versions create two tasks that run simultaneously, but the latter version doesn't require you to collect all the tasks in one place and start them all at the same time, as gather does. If gather works for your requirements, as it does here, it is more convenient.
Finally, a call to asyncio.run starts the program. The docs recommend having only one call to asyncio.run per program.
The two api functions should return something instead of setting a local variable.
In asyncio the crucial concept is the Task. It is Tasks that cooperate to provide simultaneous execution. asyncio.gather actually creates Tasks under the hood, even though you typically pass it a list of coroutines. That's how it runs things concurrently.
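As a rough illustration of that point, gather behaves much like wrapping each coroutine in a Task and awaiting them all. A simplified sketch, not the actual gather implementation:

import asyncio

async def work(name, delay):
    await asyncio.sleep(delay)
    return name

async def main():
    # roughly what asyncio.gather(work('a', 1), work('b', 1)) does:
    tasks = [asyncio.create_task(c) for c in (work('a', 1), work('b', 1))]
    results = [await t for t in tasks]  # the tasks run concurrently
    print(results)  # ['a', 'b'] after ~1 second, not ~2

asyncio.run(main())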
The two coroutines in the code below, running in different threads, cannot communicate with each other through an asyncio.Queue. After the producer inserts a new item into the asyncio.Queue, the consumer cannot get this item from that asyncio.Queue; it blocks in the await self.n_queue.get() call.
I tried printing the ids of the asyncio.Queue in both the consumer and the producer, and I found that they are the same.
import asyncio
import threading
import time

class Consumer:
    def __init__(self):
        self.n_queue = None
        self._event = None

    def run(self, loop):
        loop.run_until_complete(asyncio.run(self.main()))

    async def consume(self):
        while True:
            print("id of n_queue in consumer:", id(self.n_queue))
            data = await self.n_queue.get()
            print("get data ", data)
            self.n_queue.task_done()

    async def main(self):
        loop = asyncio.get_running_loop()
        self.n_queue = asyncio.Queue(loop=loop)
        task = asyncio.create_task(self.consume())
        await asyncio.gather(task)

    async def produce(self):
        print("id of queue in producer ", id(self.n_queue))
        await self.n_queue.put("This is a notification from server")

class Producer:
    def __init__(self, consumer, loop):
        self._consumer = consumer
        self._loop = loop

    def start(self):
        while True:
            time.sleep(2)
            self._loop.run_until_complete(self._consumer.produce())

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    print(id(loop))
    consumer = Consumer()
    threading.Thread(target=consumer.run, args=(loop,)).start()
    producer = Producer(consumer, loop)
    producer.start()
id of n_queue in consumer: 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
I tried to debug step by step into asyncio.Queue, and I found that after the method self._getters.append(getter) is invoked in asyncio.Queue, the item is inserted into the queue self._getters. The following snippets are all from asyncio.Queue.
async def get(self):
    """Remove and return an item from the queue.

    If queue is empty, wait until an item is available.
    """
    while self.empty():
        getter = self._loop.create_future()
        self._getters.append(getter)
        try:
            await getter
        except:
            # ...
            raise
    return self.get_nowait()
When a new item is inserted into the asyncio.Queue by the producer, the methods below are invoked. The variable self._getters has no items, although it has the same id in put_nowait() and get().
def put_nowait(self, item):
    """Put an item into the queue without blocking.

    If no free slot is immediately available, raise QueueFull.
    """
    if self.full():
        raise QueueFull
    self._put(item)
    self._unfinished_tasks += 1
    self._finished.clear()
    self._wakeup_next(self._getters)

def _wakeup_next(self, waiters):
    # Wake up the next waiter (if any) that isn't cancelled.
    while waiters:
        waiter = waiters.popleft()
        if not waiter.done():
            waiter.set_result(None)
            break
Does anyone know what's wrong with the demo code above? If the two coroutines are running in different threads, how can they communicate with each other through an asyncio.Queue?
Short answer: no!
Because the asyncio.Queue needs to share the same event loop, but
An event loop runs in a thread (typically the main thread) and executes all callbacks and Tasks in its thread. While a Task is running in the event loop, no other Tasks can run in the same thread. When a Task executes an await expression, the running Task gets suspended, and the event loop executes the next Task.
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
Even though you can pass the event loop to threads, it might be dangerous to mix the different concurrency concepts. Note that passing the loop just means that you can add tasks to it from different threads, but they will still be executed in the main thread. However, adding tasks from threads can lead to race conditions in the event loop, because
Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there’s a need for such code to call a low-level asyncio API, the loop.call_soon_threadsafe() method should be used
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
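For example, one way to get the intended producer/consumer behaviour is to keep the queue on the loop's thread and hand items over from the producer thread with asyncio.run_coroutine_threadsafe. A minimal sketch of the idea, not a drop-in fix for the original class layout:

import asyncio
import threading
import time

async def consume(queue):
    while True:
        data = await queue.get()
        print("got:", data)
        queue.task_done()

def produce(loop, queue):
    # runs in a plain thread; schedules queue.put() on the loop's thread
    for i in range(3):
        time.sleep(1)
        future = asyncio.run_coroutine_threadsafe(queue.put(f"item {i}"), loop)
        future.result()  # wait until the loop has actually enqueued the item

async def main():
    queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    producer = threading.Thread(target=produce, args=(loop, queue))
    producer.start()
    consumer = asyncio.create_task(consume(queue))
    await asyncio.to_thread(producer.join)  # wait for the producer without blocking the loop
    await queue.join()                      # then wait until the consumer drains the queue
    consumer.cancel()

asyncio.run(main())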
Typically, you should not need to run async functions in different threads, because they should be IO bound and therefore a single thread should be sufficient to handle the work load. If you still have some CPU bound tasks, you are able to dispatch them to different threads and make the result awaitable using asyncio.to_thread, see https://docs.python.org/3/library/asyncio-task.html#running-in-threads.
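For instance, a blocking call can be awaited from a coroutine like this (blocking_io is a hypothetical stand-in for some synchronous work):

import asyncio
import time

def blocking_io(n):
    # stand-in for a blocking sync call
    time.sleep(n)
    return n * 2

async def main():
    # blocking_io runs in a worker thread; the event loop stays responsive
    result = await asyncio.to_thread(blocking_io, 1)
    print(result)  # 2

asyncio.run(main())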
There are many questions already about this topic, see e.g. Send asyncio tasks to loop running in other thread or How to combine python asyncio with threads?
If you want to learn more about the concurrency concepts, I recommend reading https://medium.com/analytics-vidhya/asyncio-threading-and-multiprocessing-in-python-4f5ff6ca75e8
I have two infinite loops. Their processing is lightweight. I don't want them to block each other. Is using await asyncio.sleep(0) a good practice?
This is my code
import asyncio

async def loop1():
    while True:
        print("loop1")
        # pull data from kafka
        await asyncio.sleep(0)

async def loop2():
    while True:
        print("loop2")
        # send data to all clients using asyncio stream api
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(loop1(), loop2())

asyncio.run(main())
Two (or more) asyncio tasks will not block each other unless one of the tasks performs some long synchronous operation.
Both of your tasks contain only network operations (Kafka and the stream API), so neither of them will block the other.
When should you use asyncio.sleep(0)?
Imagine you have some long synchronous operation: calculations. Calculations are not an I/O operation.
This example is more of a good-to-know: if you have such operations in a real app, you should move them to loop.run_in_executor with a concurrent.futures.ProcessPoolExecutor as the executor. The example:
import asyncio

async def long_calc():
    """
    Some heavy CPU-bound task.
    Better to make it a sync function and move it to a ProcessPoolExecutor.
    """
    s = 0
    for _ in range(100):
        for i in range(1_000_000):
            s += i**2
        # comment out the line below and watch the result:
        # you'll get no "Working" messages.
        # That's why sleep(0.0) is used here.
        await asyncio.sleep(0.0)
    return s

async def pinger():
    """Task which shows that the app is alive."""
    n = 0
    while True:
        await asyncio.sleep(1)
        print(f"Working {n}")
        n += 1

async def amain():
    """Main async function in this app."""
    # Run pinger via asyncio.create_task since we want it to run
    # concurrently with long_calc and we do not want to wait
    # until it is finished.
    # If it were a thread, it would be a daemon thread.
    asyncio.create_task(pinger())
    # await the result of the long task
    s = await long_calc()
    print(f"Done: {s}")

if __name__ == '__main__':
    asyncio.run(amain())
If you need a run_in_executor example, let me know.
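For reference, a minimal sketch of that run_in_executor approach (heavy_calc here is a hypothetical sync worker standing in for the CPU-bound loop above):

import asyncio
import concurrent.futures

def heavy_calc():
    # plain sync function: safe to run in a separate process
    s = 0
    for _ in range(100):
        for i in range(1_000_000):
            s += i**2
    return s

async def amain():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as pool:
        # offload the CPU-bound work; the event loop stays free
        s = await loop.run_in_executor(pool, heavy_calc)
    print(f"Done: {s}")

if __name__ == '__main__':
    asyncio.run(amain())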
I have some code that runs multiple tasks in a loop like this:
done, running = await asyncio.wait(running, timeout=timeout_seconds,
                                   return_when=asyncio.FIRST_COMPLETED)
I need to be able to determine which of these timed out. According to the documentation:
Note that this function does not raise asyncio.TimeoutError. Futures or Tasks that aren’t done when the timeout occurs are simply returned in the second set.
I could use wait_for() instead, but that function only accepts a single awaitable, whereas I need to specify multiple. Is there any way to determine which one from the set of awaitables I passed to wait() was responsible for the timeout?
Alternatively, is there a way to use wait_for() with multiple awaitables?
You can try this trick; it's probably not an ideal solution:
import asyncio

async def foo():
    return 42

async def need_some_sleep():
    await asyncio.sleep(1000)
    return 42

async def coro_wrapper(coro):
    result = await asyncio.wait_for(coro(), timeout=10)
    return result

loop = asyncio.get_event_loop()
done, running = loop.run_until_complete(asyncio.wait(
    [coro_wrapper(foo), coro_wrapper(need_some_sleep)],
    return_when=asyncio.FIRST_COMPLETED
))

for item in done:
    print(item.result())
print(done, running)
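Note that recent Python versions (3.11+) no longer accept bare coroutines in asyncio.wait(), so a modernized variant of the same trick would wrap each call in a task, e.g.:

import asyncio

async def foo():
    return 42

async def need_some_sleep():
    await asyncio.sleep(1000)
    return 42

async def coro_wrapper(coro):
    # give each awaitable its own individual timeout
    return await asyncio.wait_for(coro, timeout=10)

async def main():
    tasks = [
        asyncio.create_task(coro_wrapper(foo())),
        asyncio.create_task(coro_wrapper(need_some_sleep())),
    ]
    done, running = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for item in done:
        print(item.result())

asyncio.run(main())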
Here is how I do it:
done, pending = await asyncio.wait(
    {
        asyncio.create_task(coro, name=str(index))
        for index, coro in enumerate([
            my_coroutine(),
            my_coroutine(),
            my_coroutine(),
        ])
    },
    return_when=asyncio.FIRST_COMPLETED
)

num = int(next(t.get_name() for t in done))  # get_name() returns a string
if num == 2:
    pass
Use enumerate to name the tasks as they are created.
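Since the question is specifically about timeouts: with a timeout argument, anything left in the pending set when wait() returns is what timed out, and the names identify those tasks. A small sketch along the same lines:

import asyncio

async def my_coroutine(delay):
    await asyncio.sleep(delay)
    return delay

async def main():
    tasks = {
        asyncio.create_task(my_coroutine(d), name=str(i))
        for i, d in enumerate([0.1, 5, 5])
    }
    done, pending = await asyncio.wait(tasks, timeout=1)
    for t in pending:
        # anything still pending when wait() returns hit the timeout
        print(f"task {t.get_name()} timed out")
        t.cancel()

asyncio.run(main())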
If I have a coroutine which runs a task which should not be cancelled, I will wrap that task in asyncio.shield().
It seems the behavior of cancel and shield is not what I would expect. If I have a task wrapped in shield and I cancel it, the awaiting coroutine returns from that await statement immediately rather than waiting for the task to finish, as shield would suggest. Additionally, the task that was run with shield continues to run, but its future is now cancelled and not awaitable.
From the docs:
except that if the coroutine containing it is cancelled, the Task running in something() is not cancelled. From the point of view of something(), the cancellation did not happen. Although its caller is still cancelled, so the “await” expression still raises a CancelledError.
These docs do not make it clear that the caller may be cancelled before the callee finishes, which is the heart of my issue.
What is the proper method to shield a task from cancellation and then wait for it to complete before returning?
It would make more sense if asyncio.shield() raised the asyncio.CancelledError after the await-ed task has completed, but obviously there is some other idea going on here that I don't understand.
Here is a simple example:
import asyncio

async def count(n):
    for i in range(n):
        print(i)
        await asyncio.sleep(1)

async def t():
    try:
        await asyncio.shield(count(5))
    except asyncio.CancelledError:
        print('This gets called at 3, not 5')
    return 42

async def c(ft):
    await asyncio.sleep(3)
    ft.cancel()

async def m():
    ft = asyncio.ensure_future(t())
    ct = asyncio.ensure_future(c(ft))
    r = await ft
    print(r)

loop = asyncio.get_event_loop()
loop.run_until_complete(m())
# Running the loop forever continues to run the shielded task,
# but I'd rather not do that:
# loop.run_forever()
It seems the behavior of cancel and shield is not what I would expect. If I have a task wrapped in shield and I cancel it, the awaiting coroutine returns from that await statement immediately rather than waiting for the task to finish, as shield would suggest. Additionally, the task that was run with shield continues to run, but its future is now cancelled and not awaitable.
Conceptually shield is like a bullet-proof vest that absorbs the bullet and protects the wearer, but is itself destroyed by the impact. shield absorbs the cancellation and reports itself as cancelled, raising a CancelledError when asked for its result, but it allows the protected task to continue running. (Artemiy's answer explains the implementation.)
Cancellation of the future returned by shield could have been implemented differently, e.g. by completely ignoring the cancel request. The current approach ensures that the cancellation "succeeds", i.e. that the canceller can't tell that the cancellation was in fact circumvented. This is by design, and it makes the cancellation mechanism more consistent on the whole.
What is the proper method to shield a task from cancellation and then wait for it to complete before returning?
By keeping two objects: the original task, and the shielded task. You pass the shielded task to whatever function it is that might end up canceling it, and you await the original one. For example:
import asyncio

async def coro():
    print('starting')
    await asyncio.sleep(2)
    print('done sleep')

async def cancel_it(some_task):
    await asyncio.sleep(0.5)
    some_task.cancel()
    print('cancellation effected')

async def main():
    loop = asyncio.get_event_loop()
    real_task = loop.create_task(coro())
    shield = asyncio.shield(real_task)
    # cancel the shield in the background while we're waiting
    loop.create_task(cancel_it(shield))
    await real_task
    assert not real_task.cancelled()
    assert shield.cancelled()

asyncio.get_event_loop().run_until_complete(main())
The code waits for the task to fully complete, despite its shield getting cancelled.
It would make more sense if asyncio.shield() raised the asyncio.CancelledError after the await-ed task has completed, but obviously there is some other idea going on here that I don't understand.
asyncio.shield:
creates a dummy future that may be cancelled
runs the wrapped coroutine as a future, binding to it a done-callback that sets the dummy future's result from the completed wrapped coroutine
returns the dummy future
You can see the implementation here.
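A simplified sketch of that idea (not the actual CPython source, and glossing over some edge cases):

import asyncio

def simple_shield(coro):
    inner = asyncio.ensure_future(coro)  # the protected task keeps running
    outer = asyncio.get_event_loop().create_future()  # the "dummy" future the caller awaits

    def _on_done(f):
        if outer.cancelled():
            return  # the caller already got CancelledError; inner completed anyway
        if f.cancelled():
            outer.cancel()
        elif f.exception() is not None:
            outer.set_exception(f.exception())
        else:
            outer.set_result(f.result())

    inner.add_done_callback(_on_done)
    return outer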
What is the proper method to shield a task from cancellation and then wait for it to complete before returning?
You should shield the count(5) future:
async def t():
    c_ft = asyncio.ensure_future(count(5))
    try:
        await asyncio.shield(c_ft)
    except asyncio.CancelledError:
        print('This gets called at 3, not 5')
        await c_ft
    return 42
or the t() future:
async def t():
    await count(5)
    return 42

async def m():
    ft = asyncio.ensure_future(t())
    shielded_ft = asyncio.shield(ft)
    ct = asyncio.ensure_future(c(shielded_ft))
    try:
        r = await shielded_ft
    except asyncio.CancelledError:
        print('Shield cancelled')
        r = await ft
Shielding a coroutine
We can also shield a task, but this code is about shielding a coroutine:
import asyncio

async def task1():
    print("Starting task1")
    await asyncio.sleep(1)
    print("Ending task1")
    print("SUCCESS !!")

async def task2(some_task):
    print("Starting task2")
    await asyncio.sleep(2)
    print("Cancelling task1")
    some_task.cancel()
    print("Ending task2")

async def main():
    # coroutine object
    co_task1 = task1()
    # create a shielded future from the coroutine
    task1_shielded = asyncio.shield(co_task1)  # Create a shielded task1
    task2_obj = asyncio.create_task(coro=task2(task1_shielded))
    await task2_obj
    await task1_shielded

asyncio.run(main())
Output:
Starting task1
Starting task2
Ending task1
SUCCESS !!
Cancelling task1
Ending task2
Note that task1 finishes its one-second sleep before task2 tries to cancel it at the two-second mark, so the cancellation is a no-op here; the shield would only come into play if task1 were still running when cancelled.