In Python asyncio, why does line-by-line file processing block the method?

I'm totally new to Python's asyncio. I understand the idea, but even the simplest task won't work due to a lack of understanding on my side.
Here's my code, which tries to read a file (and ultimately process each line of it) regularly:
#!/usr/bin/env python3
import asyncio
import aiofiles

async def main():
    async def work():
        while True:
            async with aiofiles.open('../v2.rst', 'r') as f:
                async for line in f:
                    # real work will happen here
                    pass
            print('loop')
            await asyncio.sleep(2)

    tasks = asyncio.gather(
        work(),
    )
    await asyncio.sleep(10)

    # Cancel tasks
    tasks.add_done_callback(lambda r: r.exception())
    tasks.cancel()

if __name__ == '__main__':
    asyncio.run(main())
The work function should read a file, do some line-by-line processing, and then wait 2 seconds.
What happens is that the function does "nothing": it blocks, and I never see loop printed.
Where is my error in understanding asyncio?

The code hides the exception because the callback installed with add_done_callback retrieves the exception, only to immediately discard it. This prevents the (effectively unhandled) exception from getting logged by asyncio, which happens if you comment out the line with add_done_callback.
Also:
- the code calls gather without ever awaiting it, either immediately after the call or later.
- it unnecessarily invokes gather with a single coroutine. If the idea is to run the coroutine in the background, the idiomatic way to do so is with asyncio.create_task(work()); see the corrected sketch below.
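For reference, here is a minimal corrected sketch of the original snippet (same file path and timing; create_task instead of gather, and the task is awaited after cancellation so any exception surfaces instead of being swallowed):

#!/usr/bin/env python3
import asyncio
import aiofiles

async def work():
    while True:
        async with aiofiles.open('../v2.rst', 'r') as f:
            async for line in f:
                # real work will happen here
                pass
        print('loop')
        await asyncio.sleep(2)

async def main():
    task = asyncio.create_task(work())  # run work() in the background
    await asyncio.sleep(10)
    task.cancel()
    try:
        await task  # surfaces any exception raised inside work()
    except asyncio.CancelledError:
        pass  # expected: we cancelled it ourselves

if __name__ == '__main__':
    asyncio.run(main())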

Related

How can I sleep() in parallel inside an asyncio task if the parent function isn't async?

CODE:
class App:
    def __init__(self):
        # some of the code
        ...
        ...
        xxx.add_handler(self.event_handler, event_xyz)
        asyncio.create_task(self.keep_alive())
        xxx.run_until_disconnected()

    def keep_alive(self):
        # stuff to keep connection alive
        ...
        ...
        time.sleep(5)  # this will block whole script
        asyncio.sleep(5)  # this won't work because of lack of async on __init__ and keep_alive

    async def event_handler(self):
        await stuff

    # other functions

if __name__ == '__main__':
    App()
The part of the code that keeps the connection alive is subject to API rate limits, so I need to have the sleep statement inside the keep_alive() function.
I understand that the design of the code could be completely changed to make it work, but it is a big script and everything else is working perfectly, so it would be preferable if this could be made to work.
I'm open to using anything else, like threads, as long as the rest of the code isn't blocked during the sleep.
This is a straightforward situation. time.sleep will block the current thread, including the asyncio event loop for that thread (if there is one). Period. Case closed.
If your API requires you to have time.sleep calls, and your program must do something while the current thread is sleeping, then asyncio is not the solution. That doesn't mean that asyncio cannot be used for other threads or other purposes within your program design, but it absolutely can't run other tasks in the current thread during a time.sleep interval.
Regarding the function keep_alive in your code snippet: this function cannot be made into a task because it's not declared with async def. Calling asyncio.sleep() from inside a regular function like this is an error: it must always be awaited, and the await keyword must appear inside an async def function. On the other hand, calling time.sleep inside an async def function is not an error, and the function will work as expected; but it's probably not something you want to do, since it blocks the event loop for the whole sleep.
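Since the question explicitly allows threads, here is a minimal sketch of that route, assuming the keep-alive work is self-contained; the body is a placeholder from the question. The event loop in the main thread keeps running while the worker thread sleeps:

import threading
import time

def keep_alive():
    while True:
        # stuff to keep connection alive (blocking calls are fine here)
        time.sleep(5)  # blocks only this worker thread, not the event loop

# daemon=True so the thread won't keep the process alive on exit
threading.Thread(target=keep_alive, daemon=True).start()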

The proper way to use try/except blocks in tasks that can be cancelled

In essence my question is when and where is the asyncio.CancelledError
exception raised in the coroutine being cancelled?
I have an application with a couple of async tasks that run in a loop. At some
point I start those tasks like this:
async def connect(self):
    ...
    t1 = asyncio.create_task(task1())
    t2 = asyncio.create_task(task2())
    ...
    self._workers = [t1, t2, ...]
When disconnecting, I cancel the tasks like this:
async def disconnect(self):
    for task in self._workers:
        task.cancel()
This has been working fine. The documentation of Task.cancel says
The coroutine then has a chance to clean up or even deny the request by suppressing the exception with a
try: … except CancelledError: … finally: block. Therefore, unlike Future.cancel(), Task.cancel() does
not guarantee that the Task will be cancelled, although suppressing cancellation completely is
not common and is actively discouraged.
so in my workers I avoid doing stuff like this:
async def worker():
    while True:
        ...
        try:
            some work
        except:
            continue
but that means that now I have to explicitly put asyncio.CancelledError in the
except statement:
async def worker():
    while True:
        ...
        try:
            some work
        except asyncio.CancelledError:
            raise
        except:
            continue
which can be tedious, and I also have to make sure that anything that I call from
my worker abides by this rule.
So now I'm not sure if this is a good practice at all. Now that I'm thinking
about it, I don't even know when exactly the exception is raised. I was
searching for a similar case here in SO and found this question which also
raised the same question "When will this exception be thrown? And where?". The
answer says
This exception is thrown after task.cancel() is called. It is thrown inside the coroutine,
where it is caught in the example, and it is then re-raised to be thrown and caught
in the awaiting routine.
And while it makes sense, this got me thinking: this is async scheduling; the
tasks are not interrupted at any arbitrary place like with threads, but they only
"give back control" to the event loop when a task does an await. Right?
So that means that checking everywhere whether
asyncio.CancelledError was raised might not be necessary. For example, let's
consider this example:
async def worker(interval=1):
    while True:
        try:
            # doing some work and no await is called in this block
            sync_call1()
            sync_call2()
            sync_call3()
        except asyncio.CancelledError:
            raise
        except:
            # deal with error
            pass
        await asyncio.sleep(interval)
So I think here the except asyncio.CancelledError is unnecessary, because this
error cannot "physically" be raised in the try block at all, since the code in
the try block will never be interrupted by the event loop. The only place
where this task gives control back to the event loop is at the sleep call,
which is not even in a try block and hence doesn't suppress the exception. Is
my train of thought correct? If so, does that mean that I only have to account
for asyncio.CancelledError when I have an await in the try block? So would
this also be OK, knowing that worker() can be cancelled?
async def worker(interval=1):
    while True:
        try:
            # doing some work and no await is called in this block
            sync_call1()
            sync_call2()
            sync_call3()
        except:
            # deal with error
            pass
        await asyncio.sleep(interval)
And after reading the answer to the other SO question, I think I should also
wait for the cancelled tasks in my disconnect() function, shouldn't I? Like this?
async def disconnect(self):
    for task in self._workers:
        task.cancel()
    await asyncio.gather(*self._workers)
Is this correct?
Your reasoning is correct: if the code doesn't contain an awaiting construct, you can't get a CancelledError (at least not from task.cancel; someone could still raise it manually, but then you probably want to treat it as any other exception). Note that awaiting constructs include await, async for and async with.
Having said that, I would add that try: ... except: continue is an anti-pattern. You should always catch a more specific exception. If you do catch all exceptions, that should be only to perform some cleanup/logging before re-raising it. If you do so, you won't have a problem with CancelledError. If you absolutely must catch all exceptions, consider at least logging the fact that an exception was raised, so that it doesn't pass silently.
Python 3.8 made it much easier to catch exceptions other than CancelledError because it switched to deriving CancelledError from BaseException. In 3.8 except Exception won't catch it, resolving your issue.
To sum it up:
If you run Python 3.8 and later, use except Exception: traceback.print_exc(); continue.
In Python 3.7 and earlier you need to use the pattern put forth in the question. If it's a lot of typing, you can abstract it into a function, but that will still require some refactoring.
For example, you could define a utility function like this:
import asyncio
import traceback

def run_safe(thunk):
    try:
        thunk()
        return True
    except asyncio.CancelledError:
        raise
    except:
        traceback.print_exc()
        return False
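A hypothetical worker using it could then look like this (the sync_call names are carried over from the question):

async def worker(interval=1):
    while True:
        # run_safe re-raises CancelledError but logs and swallows everything else
        run_safe(sync_call1)
        run_safe(sync_call2)
        run_safe(sync_call3)
        await asyncio.sleep(interval)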

Asyncio Queue waits until it is full before get returns something

I'm having a weird issue with asyncio.Queue: instead of returning an item as soon as one is available, the queue waits until it is full before returning anything. I noticed this while using a queue to store frames collected from cv2.VideoCapture: the larger the queue's maxsize, the longer it took to show anything on screen, and then it looked like a burst of all the frames collected into the queue.
Is that a feature, a bug, or am I just using this wrong?
Anyway, here is my code:
import asyncio
import cv2
import numpy as np

async def collecting_loop(queue):
    print("cl")
    cap = cv2.VideoCapture(0)
    while True:
        _, img = cap.read()
        await queue.put(img)

async def processing_loop(queue):
    print("pl")
    await asyncio.sleep(0.1)
    while True:
        img = await queue.get()
        cv2.imshow('img', img)
        cv2.waitKey(5)

async def main(e_loop):
    print("running main")
    queue = asyncio.Queue(loop=e_loop, maxsize=10)
    await asyncio.gather(collecting_loop(queue), processing_loop(queue))

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main(e_loop=loop))
except KeyboardInterrupt:
    pass
finally:
    loop.close()
Is [the queue getter not waking up until the queue fills up] a feature, a bug, or am I just using this wrong?
You're using it wrong, but in a subtle way. As Andrew explained, queue.put doesn't guarantee a task switch, and the collector coroutine only runs blocking code and queue.put. Although each blocking call is short, asyncio doesn't know that and thinks you are invoking queue.put in a really tight loop. The queue getters simply don't get a chance to run until the queue fills up.
The correct way to integrate asyncio and cv is to run the cv code in a separate thread and have the asyncio event loop wait for it to finish. The run_in_executor method makes that really simple:
async def collecting_loop(queue):
    print("cl")
    loop = asyncio.get_event_loop()
    cap = cv2.VideoCapture(0)
    while True:
        _, img = await loop.run_in_executor(None, cap.read)
        await queue.put(img)
run_in_executor will automatically suspend the collector coroutine while waiting for a new frame, allowing for the queued frame(s) to be processed in time.
The problem is that await q.put() doesn't switch to another task on every call; it actually does so only when inserting a new value is suspended by a queue-full state transition.
Inserting await asyncio.sleep(0) forces a task switch.
Just as in multithreaded code, file.read() doesn't force an OS thread switch, but time.sleep(0) does.
Misunderstandings like this are pretty common for newbies; I discussed a very similar problem yesterday, see the github issue.
P.S.
Your code actually has a much worse problem: you call blocking synchronous code from an async function, which is just not how asyncio works.
If no asynchronous OpenCV API exists (yet), you should run the OpenCV functions in a separate thread, as in the run_in_executor answer above.
The already-mentioned janus library can help with passing data between sync and async code.
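For illustration, a minimal sketch of the sleep(0) workaround applied to the collector; note that the run_in_executor version above remains the better fix, because cap.read still blocks the event loop here:

async def collecting_loop(queue):
    cap = cv2.VideoCapture(0)
    while True:
        _, img = cap.read()     # still blocks the event loop while it runs
        await queue.put(img)
        await asyncio.sleep(0)  # force a task switch so the getter can run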

How to use asyncio with Boost.Python?

Is it possible to use the Python 3 asyncio package with the Boost.Python library?
I have a CPython C++ extension that is built with Boost.Python, and some of the functions written in C++ can run for a really long time. I want to use asyncio to call these functions, but res = await cpp_function() doesn't work.
What happens when cpp_function is called inside a coroutine?
How do I avoid being blocked by a C++ function that runs for a very long time?
NOTE: the C++ code doesn't do any I/O operations, just calculations.
What happens when cpp_function is called inside a coroutine?
If you call a long-running Python/C function inside any of your coroutines, it freezes your event loop (freezes all coroutines everywhere).
You should avoid this situation.
How do I avoid being blocked by a C++ function that runs for a very long time?
You should use run_in_executor to run your function in a separate thread or process. run_in_executor returns an awaitable that you can await.
You'll probably need a ProcessPoolExecutor because of the GIL (I'm not sure if a ThreadPoolExecutor is an option in your situation, but I advise you to check it; see the note after the example).
Here's an example of awaiting long-running code:
import asyncio
from concurrent.futures import ProcessPoolExecutor
import time

def blocking_function():
    # Function with long-running C/Python code.
    time.sleep(3)
    return True

async def main():
    # Awaiting execution in another process
    # doesn't block the event loop:
    loop = asyncio.get_event_loop()
    res = await loop.run_in_executor(executor, blocking_function)

if __name__ == '__main__':
    executor = ProcessPoolExecutor(max_workers=1)  # Prepare your executor somewhere.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
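If the Boost.Python extension releases the GIL around its heavy computation (an assumption you'd have to verify for your extension), swapping in a ThreadPoolExecutor avoids the pickling overhead of a process pool; only the executor line changes:

from concurrent.futures import ThreadPoolExecutor

# Only worthwhile if the C++ code releases the GIL while computing;
# otherwise the event loop thread is still starved for the whole call.
executor = ThreadPoolExecutor(max_workers=1)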

Python thread never starts if run() contains yield from

Python 3.4: I'm trying to make a server using the websockets module (I was previously using regular sockets but wanted to add a JavaScript client) and ran into an issue, because the module expects async coroutines (at least if the examples are to be trusted), which I hadn't used before. Threading simply does not work: if I run the following code, bar will never be printed, whereas if I comment out the line with yield from, it works as expected. So yield is probably doing something I don't quite understand, but why is it never even executed? Should I install Python 3.5?
import threading

class SampleThread(threading.Thread):
    def __init__(self):
        super(SampleThread, self).__init__()
        print("foo")

    def run(self):
        print("bar")
        yield from var2

thread = SampleThread()
thread.start()
This is not the correct way to handle multithreading: run should be a plain method, neither a generator nor a coroutine. Because of the yield from, calling run() merely creates a generator object without ever executing the body, which is why "bar" is never printed. It should also be noted that the asyncio event loop is only set up for the main thread by default: any call to asyncio.get_event_loop() in a new thread (without first setting a loop with asyncio.set_event_loop()) will throw an exception.
Before looking at running the event loop in a new thread, you should first analyze whether you really need the event loop running in its own thread. The loop has a built-in executor interface, loop.run_in_executor(): it takes a pool from concurrent.futures (either a ThreadPoolExecutor or a ProcessPoolExecutor) and provides a non-blocking way of running work in threads or processes directly from the loop object. As such, these calls can be awaited (with Python 3.5 syntax).
That being said, if you want to run your event loop from another thread, you can do it like this:
import asyncio
import threading

class LoopThread(threading.Thread):
    def __init__(self):
        super().__init__()
        self.loop = asyncio.new_event_loop()

    def run(self):
        # Runs in the new thread once start() is called
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def stop(self):
        # Safe to call from any thread
        self.loop.call_soon_threadsafe(self.loop.stop)
From here, you still need to devise a thread-safe way of creating tasks, etc.; one option is sketched below. Some of the code in this thread is usable, although I did not have a lot of success with it: python asyncio, how to create and cancel tasks from another thread
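One standard primitive for that is asyncio.run_coroutine_threadsafe, which submits a coroutine to a loop running in another thread and returns a concurrent.futures.Future. A minimal sketch built on the LoopThread above (the job coroutine is a made-up example):

t = LoopThread()
t.start()

async def job():
    await asyncio.sleep(1)
    return 42

# Safe to call from any thread; returns a concurrent.futures.Future
future = asyncio.run_coroutine_threadsafe(job(), t.loop)
print(future.result())  # blocks this thread until job() completes
t.stop()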
