What should I do with "not awaited coroutine" warning? - python-3.x

First I do apologize for posting code using a proprietary library similar to a subset of asyncio. Anyway the question is not related to that library and I'm sure it will be readable for everybody familiar with asynchronous Python.
I have a standard coroutine, e.g.:
async def tcoro():
pass
that is used as a timeout handler. For that purpose it is wrapped into a Task object. Basically the Task keeps the coroutine alive by repeating send calls. The task is scheduled to be run when the operation times out:
timer = loop.call_later(Task(tcoro()).run, delay=timeout)
But when the operation succeeds, the pending timeout gets cancelled with:
timer.cancel()
which just sets a cancelled flag. Later, when the delay is over, the event loop will pop the timer object out of the queue and because it got cancelled in the meantime, it will be ignored. And that produces:
RuntimeWarning: coroutine 'tcoro' was never awaited
My question is how am I supposed to react to this warning when there is no problem with program's functionality?
Should I ignore it? Should I suppress it somehow? Should I rewrite the code because it is badly designed?
The fix is very simple:
def timeout():
Task(tcoro()).run()
timer = loop.call_later(timeout, delay=timeout)
But I'm not sure if the original code is broken.

Related

Why using sync functions in nodejs is known as "blocking the event loop"?

If I understand correctly, the EventLoop is the mechanism that node uses to resolve asynchronous operations and then passing them to the call stack, right? My question is, when I use a synchronous method (for example pbkdf2Sync) it will get stuck in the call stack until it is finished but it won't be moved to the EventLoop because is not an async operation, so why is this known as blocking the event loop if in reality it is blocking everything? not JUST the event loop (that as far as I understand, can continue working and will pass the callbacks to the call stack when they're finished)
Is my understanding of the NodeJs inner workins completly wrong? This topic specifically is kinda hard to understand because every resource I read differs in some way or another, so even if I think I get the bigger picture, these are the "details" that confuse me.
What is blocking
Why using sync functions in nodejs is known as "blocking the event loop"?
In a nutshell, it's because the event loop can only process the next event when you return from whatever your current Javascript is doing and allow the event loop to look for the next event. A sync function blocks the interpreter until it finishes. So, the entire time a sync function is working and you're waiting for it to return, the interpreter is blocked and control is not returned back to the event loop. This blocks the event loop and also blocks your Javascript from running.
Single Thread
Nodejs runs your Javascript all with a single thread. Other threads are used internally, but your Javascript itself runs only in a single thread (we're assuming there is no use of WorkerThreads in your code). So, when you make a synchronous function call, that single thread that runs your Javascript is busy and blocked until the synchronous function call returns and can then continue executing more of your Javascript.
This blocks everything. It blocks running more of your Javascript after the synchronous function call and it blocks getting back to the event loop to run any other event handlers that are pending such as incoming network events, timers, completion events from other things such as disk I/O, etc... So, while this is blocking the event loop, it's also blocking running more of your own code after the function call.
Non-Blocking, Asynchronous Operations
On the other hand, asynchronous functions such as fs.readFile(), for example, don't block. They initiate their operation and return immediately. This allows the interpreter to continue running any more of your own Javascript after the call to fs.readFile() and it also allows you to return from whatever event triggered your work in the first place which will return control back to the event loop so it can service other waiting events or other events that will trigger in the future. fs.readFile() then does most of its work in native code (behind the scenes) outside of the main thread that runs your Javascript. So, these type of asynchronous functions don't block the event loop - instead they cooperate with the event loop so that other things can get run while waiting for the completion of the asynchronous operation that was previously initiated. When they complete, they insert an event into the event loop that causes the event loop to call the completion callback at it's earliest convenience (when it's not blocked).
Differences in Blocking
It's also worth noting that functions that represent both synchronous and asynchronous operations both block the execution of your Javascript and block the event loop until they return. The difference is that an asynchronous operation returns from the function nearly immediately, long before the asynchronous operation itself is complete and communicates its completion and/or eventual result back via a promise, callback or event (which are all callbacks at the lowest level of the event loop). The synchronous operation does not return until the operation itself is complete. So, the asynchronous operation only blocks for a very short duration while the operation is being initiated whereas the synchronous operation blocks for the entire duration of the operation (until it completes).
More About the Event Loop
So, in what moment during the event loop is my javascript code executed?
When control returns back to the event loop, it goes through several different phases looking for things to do. When it finds something to do, that "something" results in calling a Javascript callback that starts running some of your Javascript. For example if the "something to do" is a setTimeout() timer that is ready to fire, then it will call the Javascript callback that was passed to setTimeout(). That callback runs to its completion and only when your Javascript returns from that callback does the event loop regain control and get to look for the next event to run and call its callback.
it won't be moved to the EventLoop because is not an async operation
This is not really the correct way to think about things. Things are not really "moved to the event loop".
A synchronous operation is just a blocking function call that returns when it returns and execution of any other Javascript is blocked until that blocking function call returns. Things are blocked because the single threaded interpreter running your Javascript is stuck waiting for this function to finish. It can't do anything else and the event loop is also blocked because it can't do anything until the interpreter returns control back to the event loop.
An asynchronous operation, on the other hand, initiates some operation (let's say it issues an http request to some other host) and then immediately returns, long before it has the result of that http request. Since this asynchronous operation returns before it has its result, it is considered non-blocking and because it returns quickly, you can then return from whatever event caused your code to run and that will then return control back to the event loop. That allows the event loop to then look for other events to handle and run their corresponding callbacks. Meanwhile, the asynchronous operation that was previously started has some native code associated with it (that may or may not be running in an native code OS thread - depending upon what type of operation it is). But regardless, that native code is configured such that when the asynchronous operation completes, it will insert an event into the appropriate event queue. So, at some future point when nodejs has control back in the event loop, it will find that event and run the Javascript callback associated with that event, thus notifying the original Javascript code that the asynchronous operation is now complete and providing some sort of result or error code.
Example
As a simple example, let's say you run this code:
// timer that wants to fire in 1 second
setTimeout(function() {
console.log("timer fired")
}, 1000);
// loop that blocks for 5 seconds
const start = Date.now();
while (Date.now() - start < 5000) { }
console.log("blocking loop finished");
This will output:
blocking loop finished
timer fired
Even though the timer was set to run 1 second from now, the while() loop blocked everything for 5 seconds so it wasn't until after the while() loop finished and returned back to the system that the event loop could look at what it had to do next and call the callback associated with the timer.
This while loop is similar to a blocking function such as pbkdf2Sync(). Both block the interpreter and don't return until they are finished and therefore nothing in the event loop gets a chance to run until they are finished.
A Simple Analogy
Here's a simple analogy. Imagine you need to contact your cable company to troubleshoot a problem. There are two ways for you to get ahold of them. The first way is that you call them and sit on the phone on hold for an hour waiting for a customer service representative to pick up your call. You can't really do much else while you're sitting on hold because you have to be right there waiting and ready to respond when someone finally comes on the line to help you. Thus, you are "blocked" from doing a lot of other things. This is a "blocking, synchronous operation". You can't really do much else while you're waiting on hold.
The second way you can call the cable company is to request a callback sometime in the future and when a representative is available, they will call you back. As soon as you're done requesting the callback, you can go about your business doing other things. You have to be able to answer an incoming call when it arrives, but other than that you're not blocked from doing many other things. This is a "non-blocking, asynchronous operation".
But, if while you're supposed to be able to answer your telephone callback from the second scenario above, you make another call and are on the phone for 30 minutes, the cable company is blocked from reaching you on the phone for the duration of your other call. In our analogy, you are "blocking the event loop" during your other call as the incoming event from the cable company can't be processed while you're on a call.

redis-py not closing threads on exit

I am using redis-py 2.10.6 and redis 4.0.11.
My application uses redis for both the db and the pubsub. When I shut down I often get either hanging or a crash. The latter usually complains about a bad file descriptor or an I/O error on a file (I don't use any) which happens while handling a pubsub callback, so I'm guessing the underlying issue is the same: somehow I don't get disconnected properly and the pool used by my redis.Redis object is alive and kicking.
An example of the output of the former kind of error (during _read_from_socket):
redis.exceptions.ConnectionError: Error while reading from socket: (9, 'Bad file descriptor')
Other times the stacktrace clearly shows redis/connection.py -> redis/client.py -> threading.py, which proves that redis isn't killing the threads it uses.
When I star the application I run:
self.redis = redis.Redis(host=XXXX, port=XXXX)
self.pubsub = self.redis.pubsub()
subscriptions = {'chan1': self.cb1, 'chan2': self.cb2} # cb1 and cb2 are functions
self.pubsub.subscribe(**subscriptions)
self.pubsub_thread = self.pubsub.run_in_thread(sleep_time=1)
When I want to exit the application the last instruction I execute in main is a call to a function in my redis using class, whose implementation is:
self.pubsub.close()
self.pubsub_thread.stop()
self.redis.connection_pool.disconnect()
My understanding is that in theory I do not even need to do any of these 'closing' calls, and yet, with or without them, I still can't guarantee a clean shutdown.
My question is, how am I supposed to guarantee a clean shutdown?
I ran into this same issue and it's largely caused by improper handling of the shutdown by the redis library. During the cleanup, the thread continues to process new messages and doesn't account for situations where the socket is no longer available. After scouring the code a bit, I couldn't find a way to prevent additional processing without just waiting.
Since this is run during a shutdown phase and it's a remedy for a 3rd party library, I'm not overly concerned about the sleep, but ideally the library should be updated to prevent further action while shutting down.
self.pubsub_thread.stop()
time.sleep(0.5)
self.pubsub.reset()
This might be worth an issue log or PR on the redis-py library.
PubSubWorkerThread class check for self._running.is_set() inside the loop.
To do a "clean shutdown" you should call self.pubsub_thread._running.clean() to set the thread event to false and it will stop.
Check how it work here:
https://redis.readthedocs.io/en/latest/_modules/redis/client.html?highlight=PubSubWorkerThread#

Is there any linter that detects blocking calls in an async function?

https://www.aeracode.org/2018/02/19/python-async-simplified/
It's not going to ruin your day if you call a non-blocking synchronous
function, like this:
def get_chat_id(name):
return "chat-%s" % name
async def main():
result = get_chat_id("django")
However, if you call a blocking function, like the Django ORM, the
code inside the async function will look identical, but now it's
dangerous code that might block the entire event loop as it's not
awaiting:
def get_chat_id(name):
return Chat.objects.get(name=name).id
async def main():
result = get_chat_id("django")
You can see how it's easy to have a non-blocking function that
"accidentally" becomes blocking if a programmer is not super-aware of
everything that calls it. This is why I recommend you never call
anything synchronous from an async function without doing it safely,
or without knowing beforehand it's a non-blocking standard library
function, like os.path.join.
So I am looking for a way to automatically catch instances of this mistake. Are there any linters for Python which will report sync function calls from within an async function as a violation?
Can I configure Pylint or Flake8 to do this?
I don't necessarily mind if it catches the first case above too (which is harmless).
Update:
On one level I realise this is a stupid question, as pointed out in Mikhail's answer. What we need is a definition of a "dangerous synchronous function" that the linter should detect.
So for purpose of this question I give the following definition:
A "dangerous synchronous function" is one that performs IO operations. These are the same operations which have to be monkey-patched by gevent, for example, or which have to be wrapped in async functions so that the event loop can context switch.
(I would welcome any refinement of this definition)
So I am looking for a way to automatically catch instances of this
mistake.
Let's make few things clear: mistake discussed in article is when you call any long running sync function inside some asyncio coroutine (it can be I/O blocking call or just pure CPU function with a lot of calculations). It's a mistake because it'll block whole event loop what will lead to significant performance downgrade (more about it here including comments below answer).
Is there any way to catch this situation automatically? Before run time - no, no one except you can predict if particular function will take 10 seconds or 0.01 second to execute. On run time it's already built-in asyncio, all you have to do is to enable debug mode.
If you afraid some sync function can vary between being long running (detectable in run time in debug mode) and short running (not detectable) just execute function in background thread using run_in_executor - it'll guarantee event loop will not be blocked.

Creating non blocking restful service using aiohttp [duplicate]

I have tried the following code in Python 3.6 for asyncio:
Example 1:
import asyncio
import time
async def hello():
print('hello')
await asyncio.sleep(1)
print('hello again')
tasks=[hello(),hello()]
loop=asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
Output is as expected:
hello
hello
hello again
hello again
Then I want to change the asyncio.sleep into another def:
async def sleep():
time.sleep(1)
async def hello():
print('hello')
await sleep()
print('hello again')
tasks=[hello(),hello()]
loop=asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
Output:
hello
hello again
hello
hello again
It seems it is not running in an asynchronous mode, but a normal sync mode.
The question is: Why is it not running in an asynchronous mode and how can I change the old sync module into an 'async' one?
Asyncio uses an event loop, which selects what task (an independent call chain of coroutines) in the queue to activate next. The event loop can make intelligent decisions as to what task is ready to do actual work. This is why the event loop also is responsible for creating connections and watching file descriptors and other I/O primitives; it gives the event loop insight into when there are I/O operations in progress or when results are available to process.
Whenever you use await, there is an opportunity to return control to the loop which can then pass control to another task. Which task then is picked for execution depends on the exact implementation; the asyncio reference implementation offers multiple choices, but there are other implementations, such as the very, very efficient uvloop implementation.
Your sample is still asynchronous. It just so happens that by replacing the await.sleep() with a synchronous time.sleep() call, inside a new coroutine function, you introduced 2 coroutines into the task callchain that don't yield, and thus influenced in what order they are executed. That they are executed in what appears to be synchronous order is a coincidence. If you switched event loops, or introduced more coroutines (especially some that use I/O), the order can easily be different again.
Moreover, your new coroutines use time.sleep(); this makes your coroutines uncooperative. The event loop is not notified that your code is waiting (time.sleep() will not yield!), so no other coroutine can be executed while time.sleep() is running. time.sleep() simply doesn't return or lets any other code run until the requested amount of time has passed. Contrast this with the asyncio.sleep() implementation, which simply yields to the event loop with a call_later() hook; the event loop now knows that that task won't need any attention until a later time.
Also see asyncio: why isn't it non-blocking by default for a more in-depth discussion of how tasks and the event loop interact. And if you must run blocking, synchronous code that can't be made to cooperate, then use an executor pool to have the blocking code executed in a separate tread or child process to free up the event loop for other, better behaved tasks.

How to know when the Promise is actually resolved in Node.js?

When we are using Promise in nodejs, given a Promise p, we can't know when the Promise p is actually resolved by logging the currentTime in the "then" callback.
To prove that, I wrote the test code below (using CoffeeScript):
# the procedure we want to profile
getData = (arg) ->
new Promise (resolve) ->
setTimeout ->
resolve(arg + 1)
, 100
# the main procedure
main = () ->
beginTime = new Date()
console.log beginTime.toISOString()
getData(1).then (r) ->
resolveTime = new Date()
console.log resolveTime.toISOString()
console.log resolveTime - beginTime
cnt = 10**9
--cnt while cnt > 0
return cnt
main()
When you run the above code, you will notice that the resolveTime (the time your code run into the callback function) is much later than 100ms from the beginTime.
So If we want to know when the Promise is actually resolved, HOW?
I want to know the exact time because I'm doing some profiling via logging. And I'm not able to modify the Promise p 's implementation when I'm doing some profiling outside of the black box.
So, Is there some function like promise.onUnderlyingConditionFulfilled(callback) , or any other way to make this possible?
This is because you have a busy loop that apparently takes longer than your timer:
cnt = 10**9
--cnt while cnt > 0
Javascript in node.js is single threaded and event driven. It can only do one thing at a time and it will finish the current thing it's doing before it can service the event posted by setTimeout(). So, if your busy loop (or any other long running piece of Javascript code) takes longer than you've set your timer for, the timer will not be able to run until this other Javascript code is done. "single threaded" means Javascript in node.js only does one thing at a time and it waits until one thing returns control back to the system before it can service the next event waiting to run.
So, here's the sequence of events in your code:
It calls the setTimeout() to schedule the timer callback for 100ms from now.
Then you go into your busy loop.
While it's in the busy loop, the setTimeout() timer fires inside of the JS implementation and it inserts an event into the Javascript event queue. That event can't run at the moment because the JS interpreter is still running the busy loop.
Then eventually it finishes the busy loop and returns control back to the system.
When that is done, the JS interpreter then checks the event queue to see if any other events need servicing. It finds the timer event and so it processes that and the setTimeout() callback is called.
That callback resolves the promise which triggers the .then() handler to get called.
Note: Because of Javascript's single threaded-ness and event-driven nature, timers in Javascript are not guaranteed to be called exactly when you schedule them. They will execute as close to that as possible, but if other code is running at the time they fire or if their are lots of items in the event queue ahead of you, that code has to finish before the timer callback actually gets to execute.
So If we want to know when the Promise is actually resolved, HOW?
The promise is resolved when your busy loop is done. It's not resolved at exactly the 100ms point (because your busy loop apparently takes longer than 100ms to run). If you wanted to know exactly when the promise was resolved, you would just log inside the setTimeout() callback right where you call resolve(). That would tell you exactly when the promise was resolved though it will be pretty much the same as where you're logging now. The cause of your delay is the busy loop.
Per your comments, it appears that you want to somehow measure exactly when resolve() is actually called in the Promise, but you say that you cannot modify the code in getData(). You can't really do that directly. As you already know, you can measure when the .then() handler gets called which will probably be no more than a couple ms after resolve() gets called.
You could replace the promise infrastructure with your own implementation and then you could instrument the resolve() callback directly, but replacing or hooking the promise implementation probably influences the timing of things even more than just measuring from the .then() handler.
So, it appears to me that you've just over-constrained the problem. You want to measure when something inside of some code happens, but you don't allow any instrumentation inside that code. That just leaves you with two choices:
Replace the promise implementation so you can instrument resolve() directly.
Measure when .then() is triggered.
The first option probably has a heisenberg uncertainty issue in that you've probably influenced the timing by more than you should if you replace or hook the promise implementation.
The second option measures an event that happens just slightly after the actual .resolve(). Pick which one sounds closest to what you actually want.

Resources