Why is asyncio blocking with the ProcessPoolExecutor? - python-3.x

I have the following code:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor

def blocking_func(x):
    print("In blocking waiting")
    time.sleep(x)  # Pretend this is expensive calculations
    print("after blocking waiting")
    return x * 5

@asyncio.coroutine
def main():
    executor = ProcessPoolExecutor()
    out = yield from loop.run_in_executor(executor, blocking_func, 2)  # This does not
    print("after process pool")
    print(out)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Output:
In blocking waiting
after blocking waiting
after process pool
10
But I was expecting the process pool to run the code in a different process, so I expected the output to be:
Expecting output:
In blocking waiting
after process pool
after blocking waiting
10
I thought that running the code in a process pool would not block the main loop, but in the output control comes back to the main event loop only after the blocking function is done.
What is blocking the event loop? Is it blocking_func? And if it is, what is the use of having the process pool?

yield from here means "wait for the coroutine to complete and return its result". Compared to the Python threading API, it is like calling join().
To get the desired result, use something like this:
@asyncio.coroutine
def main():
    executor = ProcessPoolExecutor()
    task = loop.run_in_executor(executor, blocking_func, 2)
    # at this point your blocking func is already running
    # in the executor process
    print("after process pool")
    out = yield from task
    print(out)

Coroutines aren't separate processes. The difference is that coroutines need to give up control to the loop by themselves, which means that a blocking coroutine will block the whole loop.
The reason you use coroutines is mainly to handle I/O. If you are waiting for a message, you can simply check a socket, and if nothing has happened you return to the main loop; other coroutines can then be handled before control finally comes back to the I/O function.
In your case it makes sense to use await asyncio.sleep(x) instead of time.sleep(x). That way control is suspended from blocking_func() for the sleep time, then returns there, and the result should be what you expected.
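A minimal sketch of that suggestion, with a hypothetical non_blocking_func standing in for blocking_func (once nothing actually blocks, the executor is no longer needed):

import asyncio

async def non_blocking_func(x):
    print("In blocking waiting")
    await asyncio.sleep(x)  # suspends this coroutine; the loop keeps running
    print("after blocking waiting")
    return x * 5

async def main():
    task = asyncio.ensure_future(non_blocking_func(2))
    print("after scheduling")  # printed before the sleep finishes
    print(await task)

asyncio.run(main())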
More info: https://docs.python.org/3/library/asyncio.html

Related

Is it possible for two coroutines running in different threads to communicate with each other via asyncio.Queue?

The two coroutines in the code below, running in different threads, cannot communicate with each other via asyncio.Queue. After the producer inserts a new item into the asyncio.Queue, the consumer cannot get it; it blocks in await self.n_queue.get().
I printed the id of the asyncio.Queue in both the consumer and the producer, and they are the same.
import asyncio
import threading
import time

class Consumer:
    def __init__(self):
        self.n_queue = None
        self._event = None

    def run(self, loop):
        loop.run_until_complete(asyncio.run(self.main()))

    async def consume(self):
        while True:
            print("id of n_queue in consumer:", id(self.n_queue))
            data = await self.n_queue.get()
            print("get data ", data)
            self.n_queue.task_done()

    async def main(self):
        loop = asyncio.get_running_loop()
        self.n_queue = asyncio.Queue(loop=loop)
        task = asyncio.create_task(self.consume())
        await asyncio.gather(task)

    async def produce(self):
        print("id of queue in producer ", id(self.n_queue))
        await self.n_queue.put("This is a notification from server")

class Producer:
    def __init__(self, consumer, loop):
        self._consumer = consumer
        self._loop = loop

    def start(self):
        while True:
            time.sleep(2)
            self._loop.run_until_complete(self._consumer.produce())

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    print(id(loop))
    consumer = Consumer()
    threading.Thread(target=consumer.run, args=(loop,)).start()
    producer = Producer(consumer, loop)
    producer.start()
id of n_queue in consumer: 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
I stepped through asyncio.Queue in a debugger and found that after self._getters.append(getter) is invoked, the getter is inserted into the queue self._getters. The following snippets are all from asyncio.Queue.
async def get(self):
    """Remove and return an item from the queue.
    If queue is empty, wait until an item is available.
    """
    while self.empty():
        getter = self._loop.create_future()
        self._getters.append(getter)
        try:
            await getter
        except:
            # ...
            raise
    return self.get_nowait()
When a new item is inserted into the asyncio.Queue by the producer, the methods below are invoked. The variable self._getters has no items, although it has the same id in put() and get().
def put_nowait(self, item):
    """Put an item into the queue without blocking.
    If no free slot is immediately available, raise QueueFull.
    """
    if self.full():
        raise QueueFull
    self._put(item)
    self._unfinished_tasks += 1
    self._finished.clear()
    self._wakeup_next(self._getters)

def _wakeup_next(self, waiters):
    # Wake up the next waiter (if any) that isn't cancelled.
    while waiters:
        waiter = waiters.popleft()
        if not waiter.done():
            waiter.set_result(None)
            break
Does anyone know what's wrong with the demo code above? If two coroutines are running in different threads, how can they communicate with each other via asyncio.Queue?
Short answer: no!
Because the asyncio.Queue needs to share the same event loop, but
An event loop runs in a thread (typically the main thread) and executes all callbacks and Tasks in its thread. While a Task is running in the event loop, no other Tasks can run in the same thread. When a Task executes an await expression, the running Task gets suspended, and the event loop executes the next Task.
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
Even though you can pass the event loop to threads, it might be dangerous to mix the different concurrency concepts. Still, note that passing the loop just means that you can add tasks to it from different threads; the tasks will still be executed in the main thread. However, adding tasks from threads can lead to race conditions in the event loop, because
Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there’s a need for such code to call a low-level asyncio API, the loop.call_soon_threadsafe() method should be used
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
Typically, you should not need to run async functions in different threads, because they should be I/O bound and therefore a single thread should be sufficient to handle the workload. If you still have some CPU-bound tasks, you can dispatch them to different threads and make the result awaitable using asyncio.to_thread, see https://docs.python.org/3/library/asyncio-task.html#running-in-threads.
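If you really do need to hand items from a plain thread to a queue owned by the loop, one safe pattern, sketched below with a hypothetical produce/main pair, is to schedule the put on the loop's own thread via loop.call_soon_threadsafe:

import asyncio
import threading
import time

def produce(loop, queue):
    # Plain thread: never touch the queue directly; schedule the put
    # on the loop's own thread instead (asyncio objects are not thread safe).
    for i in range(3):
        time.sleep(1)
        loop.call_soon_threadsafe(queue.put_nowait, "notification %d" % i)

async def main():
    queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    threading.Thread(target=produce, args=(loop, queue), daemon=True).start()
    for _ in range(3):
        print("got:", await queue.get())

asyncio.run(main())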
There are many questions already about this topic, see e.g. Send asyncio tasks to loop running in other thread or How to combine python asyncio with threads?
If you want to learn more about the concurrency concepts, I recommend reading https://medium.com/analytics-vidhya/asyncio-threading-and-multiprocessing-in-python-4f5ff6ca75e8

Python: asyncio loops with threads

Could you tell me if this is a correct approach to build several independent async loops inside their own threads?
import asyncio
import threading

def init():
    print("Initializing Async...")
    global loop_heavy
    loop_heavy = asyncio.new_event_loop()
    start_loop(loop_heavy)

def start_loop(loop):
    thread = threading.Thread(target=loop.run_forever)
    thread.start()

def submit_heavy(task):
    future = asyncio.run_coroutine_threadsafe(task, loop_heavy)
    try:
        future.result()
    except Exception as e:
        print(e)

def stop():
    loop_heavy.call_soon_threadsafe(loop_heavy.stop)

async def heavy():
    print("3. heavy start %s" % threading.current_thread().name)
    await asyncio.sleep(3)  # or await asyncio.sleep(3, loop=loop_heavy)
    print("4. heavy done")
Then I am testing it with:
if __name__ == "__main__":
    init()
    print("1. submit heavy: %s" % threading.current_thread().name)
    submit_heavy(heavy())
    print("2. submit is done")
    stop()
I am expecting to see 1->3->2->4 but in fact it is 1->3->4->2:
Initializing Async...
1. submit heavy: MainThread
3. heavy start Thread-1
4. heavy done
2. submit is done
I think I am missing something in my understanding of async and threads.
The threads are different, so why am I waiting inside MainThread until the job inside Thread-1 is finished?
Why am I waiting inside MainThread until the job inside Thread-1 is finished?
Good question, why are you?
One possible answer is, because you actually want to block the current thread until the job is finished. This is one of the reasons to put the event loop in another thread and use run_coroutine_threadsafe.
The other possible answer is that you don't have to if you don't want to. You can simply return from submit_heavy() the concurrent.futures.Future object returned by run_coroutine_threadsafe, and leave it to the caller to wait for the result (or check whether one is ready) at their own leisure.
Finally, if your goal is just to run a regular function "in the background" (without blocking the current thread), perhaps you don't need asyncio at all. Take a look at the concurrent.futures module, whose ThreadPoolExecutor allows you to easily submit a function to a thread pool and leave it to execute unassisted.
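A sketch of that non-blocking variant, reusing the names from the question (submit_heavy now just returns the future):

import asyncio
import threading

loop_heavy = asyncio.new_event_loop()
threading.Thread(target=loop_heavy.run_forever, daemon=True).start()

def submit_heavy(task):
    # Return the concurrent.futures.Future instead of blocking on it.
    return asyncio.run_coroutine_threadsafe(task, loop_heavy)

async def heavy():
    print("3. heavy start %s" % threading.current_thread().name)
    await asyncio.sleep(3)
    print("4. heavy done")

future = submit_heavy(heavy())
print("2. submit is done")  # printed immediately, before "4. heavy done"
future.result()             # block only when the result is actually needed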
I will add one of the possible solutions, which I found in the asyncio documentation.
I'm not sure that it is the correct way, but it works as expected (MainThread is not blocked by the execution of the child thread)
Running Blocking Code
Blocking (CPU-bound) code should not be called directly. For example, if a function performs a CPU-intensive calculation for 1 second, all concurrent asyncio Tasks and IO operations would be delayed by 1 second.
An executor can be used to run a task in a different thread or even in a different process to avoid blocking the OS thread with the event loop. See the loop.run_in_executor() method for more details.
Applying to my code:
import asyncio
import threading
import concurrent.futures
import multiprocessing
import time

def init():
    print("Initializing Async...")
    global loop, thread_executor_pool
    thread_executor_pool = concurrent.futures.ThreadPoolExecutor(max_workers=multiprocessing.cpu_count())
    loop = asyncio.get_event_loop()
    thread = threading.Thread(target=loop.run_forever)
    thread.start()

def submit_task(task, *args):
    loop.run_in_executor(thread_executor_pool, task, *args)

def stop():
    loop.call_soon_threadsafe(loop.stop)
    thread_executor_pool.shutdown()

def blocked_task(msg1, msg2):
    print("3. task start msg: %s, %s, thread: %s" % (msg1, msg2, threading.current_thread().name))
    time.sleep(3)
    print("4. task is done -->")

if __name__ == "__main__":
    init()
    print("1. --> submit task: %s" % threading.current_thread().name)
    submit_task(blocked_task, "a", "b")
    print("2. --> submit is done")
    stop()
Output:
Initializing Async...
1. --> submit task: MainThread
3. task start msg: a, b, thread: ThreadPoolExecutor-0_0
2. --> submit is done
4. task is done -->
Correct me if there are still any mistakes, or if it can be done in another way.

Using Asyncio subprocess in a pyramid view

I am trying to run an asyncio subprocess in a Pyramid view, but the view hangs and the async task appears to never complete. I can run this example outside of a Pyramid view and it works.
That said, I originally tested using loop = asyncio.get_event_loop(), but this gives me RuntimeError: There is no current event loop in thread 'Dummy-2'
There are certainly things I don't fully understand here. Maybe the view thread is different from the main thread, so get_event_loop doesn't work.
So does anybody know why my async task might not yield its result in this scenario? This is a naive example.
@asyncio.coroutine
def async_task(dir):
    # This task can be of varying length for each handled directory
    print("Async task start")
    create = asyncio.create_subprocess_exec(
        'ls',
        '-l',
        dir,
        stdout=asyncio.subprocess.PIPE)
    proc = yield from create
    # Wait for the subprocess exit
    data = yield from proc.stdout.read()
    exitcode = yield from proc.wait()
    return (exitcode, data)

@view_config(
    route_name='test_async',
    request_method='GET',
    renderer='json'
)
def test_async(request):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    dirs = ['/tmp/1/', '/tmp/2/', '/tmp/3/']
    tasks = []
    for dir in dirs:
        tasks.append(asyncio.ensure_future(async_task(dir), loop=loop))
    loop.run_until_complete(asyncio.gather(*tasks))
    loop.close()
    return
You are invoking loop.run_until_complete in your view, so of course it is going to block until complete!
If you want to use asyncio within a WSGI app, then you need to do so in another thread. For example, you could spin up a thread that contains the event loop and executes your async code. WSGI code is all synchronous, so any async code must be run this way, with its own issues, or you can just live with it blocking the request thread like you're doing now.
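A rough sketch of that thread-based setup, adapted from the question's code (assumptions: the json renderer can serialize the returned list, and on Unix Pythons before 3.8 you may also need to attach a child watcher for subprocesses started outside the main thread):

import asyncio
import threading

# One event loop for the whole process, living in its own thread.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def async_task(dir):
    proc = await asyncio.create_subprocess_exec(
        'ls', '-l', dir, stdout=asyncio.subprocess.PIPE)
    data = await proc.stdout.read()
    exitcode = await proc.wait()
    return (exitcode, data.decode())

def test_async(request):
    dirs = ['/tmp/1/', '/tmp/2/', '/tmp/3/']
    futures = [asyncio.run_coroutine_threadsafe(async_task(d), loop) for d in dirs]
    # The WSGI thread blocks here, but the loop keeps servicing the
    # subprocesses on its own thread, so nothing deadlocks.
    return [f.result() for f in futures]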

Monitoring the asyncio event loop

I am writing an application using Python 3 and am trying out asyncio for the first time. One issue I have encountered is that some of my coroutines block the event loop for longer than I like. I am trying to find something along the lines of top for the event loop, showing how much wall/CPU time is spent running each of my coroutines. If nothing like that already exists, does anyone know of a way to add hooks to the event loop so that I can take measurements?
I have tried using cProfile which gives some helpful output, but I am more interested in time spent blocking the event loop, rather than total execution time.
The event loop can already track whether coroutines take too much CPU time to execute. To see it you should enable debug mode with the set_debug method:
import asyncio
import time

async def main():
    time.sleep(1)  # Block event loop

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.set_debug(True)  # Enable debug
    loop.run_until_complete(main())
In output you'll see:
Executing <Task finished coro=<main() [...]> took 1.016 seconds
By default it shows warnings for coroutines that block for more than 0.1 sec. It's not documented, but based on the asyncio source code it looks like you can change the slow_callback_duration attribute to modify this value.
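For example (slow_callback_duration is undocumented, so treat it as version-dependent):

loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.slow_callback_duration = 0.05  # warn about anything that blocks the loop for more than 50 ms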
You can use call_later: periodically run a callback that logs/reports the difference between the loop's time and the expected interval.
import asyncio
import logging

class EventLoopDelayMonitor:
    def __init__(self, loop=None, start=True, interval=1, logger=None):
        self._interval = interval
        self._log = logger or logging.getLogger(__name__)
        self._loop = loop or asyncio.get_event_loop()
        if start:
            self.start()

    def run(self):
        self._loop.call_later(self._interval, self._handler, self._loop.time())

    def _handler(self, start_time):
        latency = (self._loop.time() - start_time) - self._interval
        self._log.error('EventLoop delay %.4f', latency)
        if not self.is_stopped():
            self.run()

    def is_stopped(self):
        return self._stopped

    def start(self):
        self._stopped = False
        self.run()

    def stop(self):
        self._stopped = True
example
import time

async def main():
    EventLoopDelayMonitor(interval=1)
    await asyncio.sleep(1)
    time.sleep(2)
    await asyncio.sleep(1)
    await asyncio.sleep(1)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
output
EventLoop delay 0.0013
EventLoop delay 1.0026
EventLoop delay 0.0014
EventLoop delay 0.0015
For anyone reading this in 2019, this might be a better answer: yappi. With Yappi version >= 1.2.1, you can natively profile coroutines and see exactly how much wall or CPU time is spent inside a coroutine.
See here for details on this coroutine profiling.
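A minimal sketch of such a profiling run, using yappi's set_clock_type/start/get_func_stats API:

import asyncio
import yappi

async def main():
    await asyncio.sleep(1)

yappi.set_clock_type("wall")  # "wall" includes time spent awaiting; "cpu" excludes it
yappi.start()
asyncio.run(main())
yappi.stop()
yappi.get_func_stats().print_all()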
To expand a bit on one of the answers, if you want to monitor your loop and detect hangs, here's a snippet to do just that. It launches a separate thread that checks whether the loop's tasks yielded execution recently enough.
import threading

def monitor_loop(loop, delay_handler):
    last_call = loop.time()
    INTERVAL = .5  # How often to poll the loop and check the current delay.

    def run_last_call_updater():
        loop.call_later(INTERVAL, last_call_updater)

    def last_call_updater():
        nonlocal last_call
        last_call = loop.time()
        run_last_call_updater()

    run_last_call_updater()

    def last_call_checker():
        threading.Timer(INTERVAL / 2, last_call_checker).start()
        if loop.time() - last_call > INTERVAL:
            delay_handler(loop.time() - last_call)

    threading.Thread(target=last_call_checker).start()
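A hypothetical usage sketch; the delay_handler here just prints the stall:

import asyncio
import time

async def main():
    await asyncio.sleep(1)
    time.sleep(2)  # simulate a coroutine hanging the loop
    await asyncio.sleep(1)

loop = asyncio.get_event_loop()
monitor_loop(loop, lambda delay: print("loop stalled for %.2f s" % delay))
loop.run_until_complete(main())
# Note: the checker thread re-arms itself forever, so this demo keeps running until interrupted.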

How to terminate a Python3 thread correctly while it's reading a stream

I'm using a thread to read Strings from a stream (/dev/tty1) while processing other things in the main loop. I would like the Thread to terminate together with the main program when pressing CTRL-C.
from threading import Thread

class Reader(Thread):
    def run(self):
        with open('/dev/tty1', encoding='ascii') as myStream:
            for myString in myStream:
                print(myString)

    def quit(self):
        pass  # stop reading, close stream, terminate the thread

myReader = Reader()
myReader.start()

while True:
    try:
        pass  # do lots of stuff
    except KeyboardInterrupt:
        myReader.quit()
        raise
The usual solution - a boolean variable inside the run() loop - doesn't work here. What's the recommended way to deal with this?
I can just set the daemon flag, but then I won't be able to use a quit() method, which might prove valuable later (to do some clean-up). Any ideas?
AFAIK, there is no built-in mechanism for that in Python 3 (just as in Python 2). Have you tried the proven Python 2 approach with PyThreadState_SetAsyncExc, documented here and here, or the alternative tracing approach here?
Here's a slightly modified version of the PyThreadState_SetAsyncExc approach from above:
import threading
import inspect
import ctypes

def _async_raise(tid, exctype):
    """raises the exception, performs cleanup if needed"""
    if not inspect.isclass(exctype):
        exctype = type(exctype)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid), ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res != 1:
        # """if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

def stop_thread(thread):
    _async_raise(thread.ident, SystemExit)
Make your thread a daemon thread. When all non-daemon threads have exited, the program exits. So when Ctrl-C is passed to your program and the main thread exits, there's no need to explicitly kill the reader.
myReader = Reader()
myReader.daemon = True
myReader.start()
