Python Asyncio and Multithreading - python-3.x

I have created a greatly simplified version of an application below that intends to use Python's asyncio and threading modules. The general structure is as follows:
import asyncio
import threading


class Node:
    def __init__(self, loop):
        self.loop = loop
        self.tasks = set()

    async def computation(self, x):
        print("Node: computation called with input ", x)
        await asyncio.sleep(1)

    def schedule_computation(self, x):
        print("Node: schedule_computation called with input ", x)
        task = self.loop.create_task(self.computation(x))
        self.tasks.add(task)


class Router:
    def __init__(self, loop):
        self.loop = loop
        self.nodes = {}

    def register_node(self, id):
        self.nodes[id] = Node(self.loop)

    def schedule_computation(self, node_id, x):
        print("Router: schedule_computation called with input ", x)
        self.nodes[node_id].schedule_computation(x)


class Client:
    def __init__(self, router):
        self.router = router
        self.counter = 0

    def run(self):
        while True:
            if self.counter == 1000000:
                self.router.schedule_computation(1, 5)
            self.counter += 1


def main():
    loop = asyncio.get_event_loop()
    # construct Router instance and register a node
    router = Router(loop)
    router.register_node(1)
    # construct Client instance
    client = Client(router)
    client_thread = threading.Thread(target=client.run)
    client_thread.start()
    loop.run_forever()


main()
In practice the Node.computation method does some network I/O, so I'd like to perform that work asynchronously. The Client.run method is synchronous and blocking, and I'd like to give this function its own thread to execute in (in fact I'd like the ability to run this method in a separate process if possible).
Upon executing this application we get the following output:
Router: schedule_computation called with input 5
Node: schedule_computation called with input 5
However, I expect that "Node: computation called with input 5" should print as well because the Node.schedule_computation method creates a task to run on loop. In summary, why does it seem that Node.computation is never scheduled?

Use loop.call_soon_threadsafe
In general, asyncio isn't thread safe
Almost all asyncio objects are not thread safe, which is typically not
a problem unless there is code that works with them from outside of a
Task or a callback. If there’s a need for such code to call a
low-level asyncio API, the loop.call_soon_threadsafe() method should
be used
https://docs.python.org/3/library/asyncio-dev.html#concurrency-and-multithreading
SCHEDULE COMPUTATION
self.loop.call_soon_threadsafe(self.nodes[node_id].schedule_computation, x)
Node.computation runs on main thread
Note that even though you can use call_soon_threadsafe to start a coroutine from another thread, the coroutine always runs in the thread that runs the loop (here, the main thread). If you want to run coroutines on another thread, your background thread will need its own event loop as well.
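To make the two points above concrete, here is a minimal sketch (not the asker's application) in which a worker thread hands a coroutine to a loop that runs on the calling thread. It uses asyncio.run_coroutine_threadsafe, a standard-library helper that is not mentioned above, and the names run_demo, client and "client-thread" are made up for illustration.

```python
import asyncio
import threading


async def computation(x, results):
    # Executes on the thread that runs the event loop,
    # not on the thread that scheduled it.
    await asyncio.sleep(0.01)
    results.append((x, threading.current_thread().name))


def run_demo():
    results = []
    loop = asyncio.new_event_loop()

    def client():
        # Thread-safe hand-off of a coroutine to the loop.
        future = asyncio.run_coroutine_threadsafe(computation(5, results), loop)
        future.result(timeout=5)  # block this worker until the coroutine finishes
        # Stopping the loop from another thread must also go through the loop.
        loop.call_soon_threadsafe(loop.stop)

    worker = threading.Thread(target=client, name="client-thread")
    worker.start()
    loop.run_forever()  # returns once the worker asks the loop to stop
    worker.join()
    loop.close()
    return results


if __name__ == "__main__":
    print(run_demo())
```

The recorded thread name is the one that called run_forever, never "client-thread", which is the behaviour described under "Node.computation runs on main thread".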

Related

Best way to keep creating threads on variable list argument

I have an event that I listen to every minute, and it returns a list; the list could be empty or contain one or more elements. For each element in that list, I'd like to run a function that monitors an event on that element every minute for 10 minutes.
For that I wrote this script:
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
import Client

client = Client()


def handle_event(event):
    for i in range(10):
        client.get_info(event)
        sleep(60)


async def main():
    while True:
        entires = client.get_new_entry()
        if len(entires) > 0:
            with ThreadPoolExecutor(max_workers=len(entires)) as executor:
                executor.map(handle_event, entires)
        await asyncio.sleep(60)


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main())
However, instead of continuing to pick up new entries, it blocks while the previous entries are still being monitored.
Any idea how I could do that, please?
First let me explain why your program doesn't work the way you want it to: It's because you use the ThreadPoolExecutor as a context manager, which will not close until all the threads started by the call to map are finished. So main() waits there, and the next iteration of the loop can't happen until all the work is finished.
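The blocking behaviour is easy to reproduce in isolation. The following is a small sketch with made-up names (slow_job, time_with_block) and a short sleep standing in for the real monitoring work; it measures how long the with block takes to submit the work and how long it takes to exit.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_job(seconds):
    # Stand-in for the real, long-running handler.
    time.sleep(seconds)
    return seconds


def time_with_block(seconds=0.3):
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=2) as executor:
        executor.map(slow_job, [seconds, seconds])
        # map() only submits the work, so this point is reached right away.
        submitted = time.monotonic() - start
    # Leaving the with block calls executor.shutdown(wait=True),
    # which blocks until every submitted job has finished.
    finished = time.monotonic() - start
    return submitted, finished


if __name__ == "__main__":
    submitted, finished = time_with_block()
    print(f"submitted after {submitted:.3f}s, with block exited after {finished:.3f}s")
```

The second number is never smaller than the job duration, because the exit from the context manager waits for the worker threads.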
There are ways around this. Since you are using asyncio already, one approach is to move the creation of the Executor to a separate task. Each iteration of the main loop starts one copy of this task, which runs as long as it takes to finish. It's an async def function, so many copies of this task can run concurrently.
I changed a few things in your code. Instead of Client I just used some simple print statements. I pass a list of integers, of random length, to handle_event. I increment a counter each time through the while True: loop, and add 10 times the counter to every integer in the list. This makes it easy to see how old calls continue for a time, mixing with new calls. I also shortened your time delays. All of these changes were for convenience and are not important.
The important change is to move ThreadPoolExecutor creation into a task. To make it cooperate with other tasks, it must contain an await expression, and for that reason I use executor.submit rather than executor.map. submit returns a concurrent.futures.Future, which provides a convenient way to await the completion of all the calls. executor.map, on the other hand, returns an iterator; I couldn't think of any good way to convert it to an awaitable object.
To convert a concurrent.futures.Future to an asyncio.Future, an awaitable, there is a function asyncio.wrap_future. When all the futures are complete, I exit from the ThreadPoolExecutor context manager. That will be very fast since all of the Executor's work is finished, so it does not block other tasks.
import random
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio


def handle_event(event):
    for i in range(10):
        print("Still here", event)
        sleep(2)


async def process_entires(counter, entires):
    print("Counter", counter, "Entires", entires)
    x = [counter * 10 + a for a in entires]
    with ThreadPoolExecutor(max_workers=len(entires)) as executor:
        futs = []
        for z in x:
            futs.append(executor.submit(handle_event, z))
        await asyncio.gather(*(asyncio.wrap_future(f) for f in futs))


async def main():
    counter = 0
    while True:
        entires = [0, 1, 2, 3, 4][:random.randrange(5)]
        if len(entires) > 0:
            counter += 1
            asyncio.create_task(process_entires(counter, entires))
        await asyncio.sleep(3)


if __name__ == "__main__":
    asyncio.run(main())

Is it possible for two coroutines running in different threads to communicate with each other via asyncio.Queue?

The two coroutines in the code below run in different threads and cannot communicate with each other through an asyncio.Queue. After the producer inserts a new item into the asyncio.Queue, the consumer never receives it; it stays blocked in await self.n_queue.get().
I printed the ids of the asyncio.Queue in both the consumer and the producer, and they are the same.
import asyncio
import threading
import time


class Consumer:
    def __init__(self):
        self.n_queue = None
        self._event = None

    def run(self, loop):
        loop.run_until_complete(asyncio.run(self.main()))

    async def consume(self):
        while True:
            print("id of n_queue in consumer:", id(self.n_queue))
            data = await self.n_queue.get()
            print("get data ", data)
            self.n_queue.task_done()

    async def main(self):
        loop = asyncio.get_running_loop()
        self.n_queue = asyncio.Queue(loop=loop)
        task = asyncio.create_task(self.consume())
        await asyncio.gather(task)

    async def produce(self):
        print("id of queue in producer ", id(self.n_queue))
        await self.n_queue.put("This is a notification from server")


class Producer:
    def __init__(self, consumer, loop):
        self._consumer = consumer
        self._loop = loop

    def start(self):
        while True:
            time.sleep(2)
            self._loop.run_until_complete(self._consumer.produce())


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    print(id(loop))
    consumer = Consumer()
    threading.Thread(target=consumer.run, args=(loop,)).start()
    producer = Producer(consumer, loop)
    producer.start()
id of n_queue in consumer: 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
I stepped through asyncio.Queue in a debugger and found that after self._getters.append(getter) is invoked, the getter is stored in self._getters. The following snippets are all from asyncio.Queue.
async def get(self):
    """Remove and return an item from the queue.

    If queue is empty, wait until an item is available.
    """
    while self.empty():
        getter = self._loop.create_future()
        self._getters.append(getter)
        try:
            await getter
        except:
            # ...
            raise
    return self.get_nowait()
When a new item is inserted into the asyncio.Queue by the producer, the methods below are invoked. The variable self._getters has no items, although it has the same id in methods put() and set().
def put_nowait(self, item):
    """Put an item into the queue without blocking.

    If no free slot is immediately available, raise QueueFull.
    """
    if self.full():
        raise QueueFull
    self._put(item)
    self._unfinished_tasks += 1
    self._finished.clear()
    self._wakeup_next(self._getters)

def _wakeup_next(self, waiters):
    # Wake up the next waiter (if any) that isn't cancelled.
    while waiters:
        waiter = waiters.popleft()
        if not waiter.done():
            waiter.set_result(None)
            break
Does anyone know what's wrong with the demo code above? If the two coroutines are running in different threads, how could they communicate with each other by asyncio.Queue?
Short answer: no!
Both sides of an asyncio.Queue need to share the same event loop, but, as the documentation says:
An event loop runs in a thread (typically the main thread) and executes all callbacks and Tasks in its thread. While a Task is running in the event loop, no other Tasks can run in the same thread. When a Task executes an await expression, the running Task gets suspended, and the event loop executes the next Task.
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
Even though you can pass the event loop to threads, it might be dangerous to mix the different concurrency concepts. Note that passing the loop just means that you can add tasks to the loop from different threads; they will still be executed in the main thread. However, adding tasks from threads can lead to race conditions in the event loop, because
Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there’s a need for such code to call a low-level asyncio API, the loop.call_soon_threadsafe() method should be used
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
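If a plain thread really has to feed a coroutine, one option consistent with the quoted documentation is to keep the asyncio.Queue on a single loop and let the thread reach it only through loop.call_soon_threadsafe. The sketch below illustrates that idea under simplified assumptions (a fixed list of items, hypothetical names produce and consume); it is not a drop-in fix for the demo code in the question.

```python
import asyncio
import threading


async def consume(queue, expected):
    # Runs on the event loop; collects a fixed number of items.
    received = []
    for _ in range(expected):
        received.append(await queue.get())
        queue.task_done()
    return received


def produce(loop, queue, items):
    # Runs in a plain thread, so it never touches the queue directly.
    for item in items:
        # put_nowait is executed later by the loop's own thread.
        loop.call_soon_threadsafe(queue.put_nowait, item)


async def main():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()  # unbounded, so put_nowait cannot raise QueueFull
    items = ["a", "b", "c"]
    producer = threading.Thread(target=produce, args=(loop, queue, items))
    producer.start()
    received = await consume(queue, len(items))
    producer.join()
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Only one event loop exists here, and every queue operation happens on its thread, so the waiting getter is woken up as expected.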
Typically, you should not need to run async functions in different threads, because they should be I/O bound and therefore a single thread should be sufficient to handle the workload. If you still have some blocking tasks, you can dispatch them to different threads and make the result awaitable using asyncio.to_thread, see https://docs.python.org/3/library/asyncio-task.html#running-in-threads. Because of the GIL, this mainly helps with blocking I/O; CPU-bound work usually needs separate processes.
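As a rough illustration of asyncio.to_thread (available from Python 3.9), the sketch below runs two blocking calls in worker threads while the event loop stays free. The function blocking_io and its sleep are placeholders for real blocking work.

```python
import asyncio
import threading
import time


def blocking_io(delay):
    # Placeholder for a blocking call such as a synchronous network request.
    time.sleep(delay)
    return threading.current_thread().name


async def main():
    loop_thread = threading.current_thread().name
    # Both calls run in the default thread pool; the coroutine just awaits them.
    names = await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.1),
        asyncio.to_thread(blocking_io, 0.1),
    )
    return loop_thread, names


if __name__ == "__main__":
    loop_thread, names = asyncio.run(main())
    print("loop thread:", loop_thread, "worker threads:", names)
```

The worker thread names differ from the loop's thread name, which shows the blocking calls did not run on the event loop thread.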
There are many questions already about this topic, see e.g. Send asyncio tasks to loop running in other thread or How to combine python asyncio with threads?
If you want to learn more about the concurrency concepts, I recommend reading https://medium.com/analytics-vidhya/asyncio-threading-and-multiprocessing-in-python-4f5ff6ca75e8

Dart: Store heavy object in an isolate and access its method from main isolate without reinstantiating it

Is it possible in Dart to instantiate a class in an isolate, and then send messages to this isolate to receive return values from its methods (instead of spawning a new isolate and re-instantiating the same class every time)? I have a class with a long initialization and heavy methods. I want to initialize it once and then access its methods without compromising the performance of the main isolate.
Edit: I mistakenly answered this question thinking python rather than dart. snakes on the brain / snakes on a plane
I am not familiar with Dart programming, but it would seem the concurrency model has a lot of similarities (isolated memory, message passing, etc.). I was able to find an example of two-way message passing with a Dart isolate. There's a little difference in how it gets set up, and the streams are a bit simpler than Python's Queues, but in general the idea is the same.
Basically:
Create a port to receive data from the isolate
Create the isolate passing it the port it will send data back on
Within the isolate, create the port it will listen on, and send the other end of it back to main (so main can send messages)
Determine and implement a simple messaging protocol for remote method call on an object contained within the isolate.
This is basically duplicating what a multiprocessing.Manager class does, however it can be helpful to have a simplified example of how it can work:
from multiprocessing import Process, Lock, Queue
from time import sleep


class HeavyObject:
    def __init__(self, x):
        self._x = x
        sleep(5)  # heavy init

    def heavy_method(self, y):
        sleep(.2)  # medium weight method
        return self._x + y


def HO_server(in_q, out_q):
    ho = HeavyObject(5)
    # msg format for remote method call: ("method_name", (arg1, arg2, ...), {"kwarg1": 1, "kwarg2": 2, ...})
    # pass None to exit worker cleanly
    for msg in iter(in_q.get, None):  # get a remote call message from the queue
        # call the method with the args, and put the result back on the queue
        out_q.put(getattr(ho, msg[0])(*msg[1], **msg[2]))


class RMC_helper:  # remote method caller for convenience
    def __init__(self, in_queue, out_queue, lock):
        self.in_q = in_queue
        self.out_q = out_queue
        self.l = lock
        self.method = None

    def __call__(self, *args, **kwargs):
        if self.method is None:
            raise Exception("no method to call")
        with self.l:  # isolate access to queue so results don't pile up and get popped off in possibly wrong order
            print("put to queue: ", (self.method, args, kwargs))
            self.in_q.put((self.method, args, kwargs))
            return self.out_q.get()

    def __getattr__(self, name):
        if not name.startswith("__"):
            self.method = name
            return self
        # object defines no __getattr__, so raise the standard error for dunder lookups
        raise AttributeError(name)


def child_worker(remote):
    print("child", remote.heavy_method(5))  # prints 10
    sleep(3)  # child works on something else
    print("child", remote.heavy_method(2))  # prints 7


if __name__ == "__main__":
    in_queue = Queue()
    out_queue = Queue()
    lock = Lock()  # lock is used as to not confuse which reply goes to which request
    remote = RMC_helper(in_queue, out_queue, lock)
    Server = Process(target=HO_server, args=(in_queue, out_queue))
    Server.start()
    Worker = Process(target=child_worker, args=(remote, ))
    Worker.start()
    print("main", remote.heavy_method(3))  # this will *probably* start first due to startup time of child
    Worker.join()
    with lock:
        in_queue.put(None)
    Server.join()
    print("done")

Starvation in `asyncio` loop

I have a system where two "processes" A and B run on the same asyncio event loop.
I notice that the order in which the processes are started matters: if I start process B first, then B runs all the time while A seems to be "starved" of resources, and vice versa.
In my experience, the only reason this might happen is a mutex that is not being released by B, but in the following toy example it happens without any mutexes being used:
import asyncio


async def A():
    while True:
        print('A')
        await asyncio.sleep(2)


async def B():
    while True:
        print('B')
        await asyncio.sleep(8)


async def main():
    await B()
    await A()


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Do processes in Python not context-switch automatically? If not, how can I make both processes participate, each one running while the other is idle (i.e., sleeping)?
TLDR: Coroutines merely enable concurrency, they do not automatically trigger concurrency. Explicitly launch separate tasks, e.g. via create_task or gather, to run the coroutines concurrently.
async def main():
    await asyncio.gather(B(), A())
Concurrency in asyncio is handled via Tasks – a close equivalent to Threads – which merely consist of coroutines/awaitables – like Threads consist of functions/callables. In general, a coroutine/awaitable itself does not equate to a separate task.
Using await X() means "start X and wait for it to complete". When using several such constructs in sequence:
async def main():
    await B()
    await A()
this means launching B first, and only launching A after B has completed: while async def and await allow for concurrency with respect to other tasks, B and A are run sequentially with respect to each other in a single task.
The simplest means to add concurrency is to explicitly create a task:
async def main():
    # execute B in a new task
    b_task = asyncio.create_task(B())
    # execute A in the current task
    await A()
    await b_task
Note how B is offloaded to a new task, while one can still do a final await A() to re-use the current task.
Most async frameworks ship with high-level helpers for common concurrency scenarios. In this case, asyncio.gather is appropriate to launch several tasks at once:
async def main():
    # execute B and A in new tasks
    await asyncio.gather(B(), A())
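The difference between the two styles can be observed with a finite variant of the toy example. The sketch below is only an illustration: worker, sequential and concurrent are made-up names, the loops run a fixed number of rounds instead of forever, and the delays are shortened.

```python
import asyncio


async def worker(name, delay, rounds, log):
    # Finite stand-in for A() and B(): record the name, then sleep.
    for _ in range(rounds):
        log.append(name)
        await asyncio.sleep(delay)


async def sequential():
    log = []
    await worker("B", 0.1, 2, log)  # A cannot start until B has finished
    await worker("A", 0.02, 4, log)
    return log


async def concurrent():
    log = []
    # gather wraps both coroutines in tasks, so A runs while B sleeps.
    await asyncio.gather(worker("B", 0.1, 2, log), worker("A", 0.02, 4, log))
    return log


if __name__ == "__main__":
    print("sequential:", asyncio.run(sequential()))
    print("concurrent:", asyncio.run(concurrent()))
```

In the sequential run every B entry precedes every A entry, whereas in the concurrent run A's entries start appearing right after B's first one.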

Python's asyncio.Event() across different classes

I'm writing a Python program to interact with a device based on a CAN Bus. I'm using the python-can module successfully for this purpose. I'm also using asyncio to react to asynchronous events. I have written a "CanBusManager" class that is used by the "CanBusSequencer" class. The "CanBusManager" class takes care of generating/sending/receiving messages, and the CanBusSequencer drives the sequence of messages to be sent.
At some point in the sequence I want to wait until a specific message is received to "unlock" the remaining messages to be sent in the sequence. Overview in code:
main.py
async def main():
    event = asyncio.Event()
    sequencer = CanBusSequencer(event)
    task = asyncio.create_task(sequencer.doSequence())
    await task

asyncio.run(main(), debug=True)
canBusSequencer.py
from canBusManager import CanBusManager


class CanBusSequencer:
    def __init__(self, event):
        self.event = event
        self.canManager = CanBusManager(event)

    async def doSequence(self):
        for index, row in self.df_sequence.iterrows():
            if:...
                self.canManager.sendMsg(...)
            else:
                self.canManager.sendMsg(...)
                await self.event.wait()
                self.event.clear()
canBusManager.py
import can


class CanBusManager():
    def __init__(self, event):
        self.event = event
        self.startListening()

    **EDIT**
    def startListening(self):
        self.msgNotifier = can.Notifier(self.canBus, self.receivedMsgCallback)
    **EDIT**

    def receivedMsgCallback(self, msg):
        if(msg == ...):
            self.event.set()
For now my program hangs at await self.event.wait(), even though the relevant message is received and self.event.set() is executed. Running the program with debug=True reveals a
RuntimeError: Non-thread-safe operation invoked on an event loop other than the current one
that I don't really get. It has to do with the asyncio event loop, somehow not properly defined/managed. I'm coming from the C++ world and I'm currently writing my first large program with Python. Any guidance would be really appreciated:)
Your question doesn't explain how you arrange for receivedMsgCallback to be invoked.
If it is invoked by a classic "async" API which uses threads behind the scenes, then it will be invoked from outside the thread that runs the event loop. According to the documentation, asyncio primitives are not thread-safe, so invoking event.set() from another thread doesn't properly synchronize with the running event loop, which is why your program doesn't wake up when it should.
If you want to do anything asyncio-related, such as invoke Event.set, from outside the event loop thread, you need to use call_soon_threadsafe or equivalent. For example:
def receivedMsgCallback(self, msg):
    if msg == ...:
        self.loop.call_soon_threadsafe(self.event.set)
The event loop object should be made available to the CanBusManager object, perhaps by passing it to its constructor and assigning it to self.loop.
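A self-contained way to try this out without CAN hardware is sketched below. Everything in it is a stand-in: the Manager class is a hypothetical reduction of CanBusManager, the "unlock" message is invented, and threading.Timer plays the role of the thread that python-can would use to deliver the callback.

```python
import asyncio
import threading


class Manager:  # hypothetical, simplified stand-in for CanBusManager
    def __init__(self, loop, event):
        self.loop = loop    # the loop that owns the event
        self.event = event

    def received_msg_callback(self, msg):
        # Called from a non-loop thread, so the set() is routed through the loop.
        if msg == "unlock":
            self.loop.call_soon_threadsafe(self.event.set)


async def main():
    loop = asyncio.get_running_loop()
    event = asyncio.Event()
    manager = Manager(loop, event)
    # The timer thread simulates a notifier delivering a message later on.
    timer = threading.Timer(0.05, manager.received_msg_callback, args=("unlock",))
    timer.start()
    await asyncio.wait_for(event.wait(), timeout=5)  # wakes up once set() runs on the loop
    timer.join()
    return event.is_set()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Obtaining the loop with asyncio.get_running_loop() inside main() and passing it to the constructor is one way to make it available as self.loop.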
On a side note, if you are creating a task only to await it immediately, you don't need a task in the first place. In other words, you can replace task = asyncio.create_task(sequencer.doSequence()); await task with the simpler await sequencer.doSequence().

Resources