https://www.aeracode.org/2018/02/19/python-async-simplified/
It's not going to ruin your day if you call a non-blocking synchronous
function, like this:
def get_chat_id(name):
return "chat-%s" % name
async def main():
result = get_chat_id("django")
However, if you call a blocking function, like the Django ORM, the
code inside the async function will look identical, but now it's
dangerous code that might block the entire event loop as it's not
awaiting:
def get_chat_id(name):
return Chat.objects.get(name=name).id
async def main():
result = get_chat_id("django")
You can see how it's easy to have a non-blocking function that
"accidentally" becomes blocking if a programmer is not super-aware of
everything that calls it. This is why I recommend you never call
anything synchronous from an async function without doing it safely,
or without knowing beforehand it's a non-blocking standard library
function, like os.path.join.
So I am looking for a way to automatically catch instances of this mistake. Are there any linters for Python which will report sync function calls from within an async function as a violation?
Can I configure Pylint or Flake8 to do this?
I don't necessarily mind if it catches the first case above too (which is harmless).
Update:
On one level I realise this is a stupid question, as pointed out in Mikhail's answer. What we need is a definition of a "dangerous synchronous function" that the linter should detect.
So for purpose of this question I give the following definition:
A "dangerous synchronous function" is one that performs IO operations. These are the same operations which have to be monkey-patched by gevent, for example, or which have to be wrapped in async functions so that the event loop can context switch.
(I would welcome any refinement of this definition)
So I am looking for a way to automatically catch instances of this
mistake.
Let's make few things clear: mistake discussed in article is when you call any long running sync function inside some asyncio coroutine (it can be I/O blocking call or just pure CPU function with a lot of calculations). It's a mistake because it'll block whole event loop what will lead to significant performance downgrade (more about it here including comments below answer).
Is there any way to catch this situation automatically? Before run time - no, no one except you can predict if particular function will take 10 seconds or 0.01 second to execute. On run time it's already built-in asyncio, all you have to do is to enable debug mode.
If you afraid some sync function can vary between being long running (detectable in run time in debug mode) and short running (not detectable) just execute function in background thread using run_in_executor - it'll guarantee event loop will not be blocked.
Related
I am codig a realtime game using celery and django-channels.
I have a task that works like a timer and if this timer reaches to zero, once the task was activated, a group_send() is called. From what I see, celery tasks are async but we can't await functions inside tasks.. this makes me a little bit confused.. here is the code:
#app.task(ignore_result=True)
def count_down(channel_name):
set_random_game_result(channel_name)
room = process_game_result(channel_name, revoke=False)
channel_layer = get_channel_layer()
async_to_sync(channel_layer.group_send)(
channel_name,
{
"type": "game_room_info",
"room": room
}
)
from the docs:
By default the send(), group_send(), group_add() and other functions are async functions, meaning you have to await them. If you need to call them from synchronous code, you’ll need to use the handy asgiref.sync.async_to_sync wrapper
So if celery is async, why can't I use the group_send without using the async_to_sync util?
Another thing is about querying.. From the docs:
If you are writing asynchronous code, however, you will need to call database methods in a safe, synchronous context, using database_sync_to_async.
database_sync_to_async actually doesn't work inside the task function. Am I missing something?
The problem you are talking about is something that is done by design, for a good reason. It is also well-documented, and can be solved easily with appropriate Canvas.
Also... Do not get confused with terminology... Celery is asynchronous, but it is not "Python async" as it predates the Python's asyncio... Maybe Celery 5 will get its async parts replaced/refactored to use Python 3+ asyncio and related.
Lets say I have a function A that uses a function B which uses C, etc:
A -> B -> C -> D and E
Now assume that function D has to use async/await. This means I have to use async/await to the call of function C and then to the call of function B and so on. I understand that this is because they depend on each other and if one of them is waiting for a function to resolve, then transitively, they all have to. What alternatives can I do to make this cleaner?
There is a way to do this, but you'll loose the benefits of async-await.
One of the reason for async-await, is, that if your thread has to wait for another process to complete, like a read or write to the hard-disk, a database query, or fetching some internet information, your thread might do some other useful stuff instead of just waiting idly for this other process to complete.
This is done by using the keyword await. Once your thread sees the await. The thread doesn't really wait idly. Instead, it remembers the context (think of variable values, parts of the call stack etc) and goes up the call stack to see if the caller is not awaiting. If not, it starts executing these statements until it sees an await. It goes up the call stack again to see if the caller is not awaiting, etc.
Once this other process is completed the thread (or maybe another thread from the thread pool that acts as if it is the original thread) continues with the statements after the await until the procedure is finished, or until it sees another await.
To be able to do this, your procedure must know, that when it sees an await, the context needs to be saved and the thread must act like described above. That is why you declare a method async.
That is why typical async functions are functions that order other processes to do something lengthy: disk access, database access, internet communications. Those are typical functions where you'll find a ReadAsync / WriteAsync, next to the standard Read / Write functions. You'll also find them in classes that are typically designed to call these processes, like StreamReaders, TextWriters etc.
If you have a non-async class that calls an async function and waits until the async function completes before returning, the go-up-the-call-stack-to-see-if-the-caller-is-not-awaiting stops here: your program acts as if it is not using async-await.
Almost!
If you start an awaitable task, without waiting for it to finish, and do something else before you wait for the result, then this something else is executed instead of the idly wait, that the thread would have done if you would have used the non-async version.
How to call async function from non-async function
ICollection<string> ReadData(...)
{
// call the async function, don't await yet, you'll have far more interesting things to do
var taskReadLines = myReader.ReadLinesAsync(...);
DoSomethingInteresting();
// now you need the data from the read task.
// However, because this method is not async, you can't await.
// This Wait will really be an idle wait.
taskReadLines.Wait();
ICollection<string> readLines= taskRead.Result;
return readLines();
}
Your callers won't benefit from async-await, however your thread will be able to do something interesting while the lines have not been read yet.
I have tried the following code in Python 3.6 for asyncio:
Example 1:
import asyncio
import time
async def hello():
print('hello')
await asyncio.sleep(1)
print('hello again')
tasks=[hello(),hello()]
loop=asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
Output is as expected:
hello
hello
hello again
hello again
Then I want to change the asyncio.sleep into another def:
async def sleep():
time.sleep(1)
async def hello():
print('hello')
await sleep()
print('hello again')
tasks=[hello(),hello()]
loop=asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
Output:
hello
hello again
hello
hello again
It seems it is not running in an asynchronous mode, but a normal sync mode.
The question is: Why is it not running in an asynchronous mode and how can I change the old sync module into an 'async' one?
Asyncio uses an event loop, which selects what task (an independent call chain of coroutines) in the queue to activate next. The event loop can make intelligent decisions as to what task is ready to do actual work. This is why the event loop also is responsible for creating connections and watching file descriptors and other I/O primitives; it gives the event loop insight into when there are I/O operations in progress or when results are available to process.
Whenever you use await, there is an opportunity to return control to the loop which can then pass control to another task. Which task then is picked for execution depends on the exact implementation; the asyncio reference implementation offers multiple choices, but there are other implementations, such as the very, very efficient uvloop implementation.
Your sample is still asynchronous. It just so happens that by replacing the await.sleep() with a synchronous time.sleep() call, inside a new coroutine function, you introduced 2 coroutines into the task callchain that don't yield, and thus influenced in what order they are executed. That they are executed in what appears to be synchronous order is a coincidence. If you switched event loops, or introduced more coroutines (especially some that use I/O), the order can easily be different again.
Moreover, your new coroutines use time.sleep(); this makes your coroutines uncooperative. The event loop is not notified that your code is waiting (time.sleep() will not yield!), so no other coroutine can be executed while time.sleep() is running. time.sleep() simply doesn't return or lets any other code run until the requested amount of time has passed. Contrast this with the asyncio.sleep() implementation, which simply yields to the event loop with a call_later() hook; the event loop now knows that that task won't need any attention until a later time.
Also see asyncio: why isn't it non-blocking by default for a more in-depth discussion of how tasks and the event loop interact. And if you must run blocking, synchronous code that can't be made to cooperate, then use an executor pool to have the blocking code executed in a separate tread or child process to free up the event loop for other, better behaved tasks.
I know that writing async functions is recommended in nodejs. However, I feel it's not so nescessary to write some non IO events asynchronously. My code can get less convenient. For example:
//sync
function now(){
return new Date().getTime();
}
console.log(now());
//async
function now(callback){
callback(new Date().getTime());
}
now(function(time){
console.log(time);
});
Does sync method block CPU in this case? Is this remarkable enough that I should use async instead?
Async style is necessary if the method being called can block for a long time waiting for IO. As the node.js event loop is single-threaded you want to yield to the event loop during an IO. If you didn't do this there could be only one IO outstanding at each point in time. That would lead to total non-scalability.
Using callbacks for CPU work accomplishes nothing. It does not unblock the event loop. In fact, for CPU work it is not possible to unblock the event loop. The CPU must be occupied for a certain amount of time and that is unavoidable. (Disregarding things like web workers here).
Callbacks are nothing good. You use them when you have to. They are a necessary consequence of the node.js event loop IO model.
That said, if you later plan on introducing IO into now you might eagerly use a callback style even if not strictly necessary. Changing from synchronous calls to callback-based calls later can be time-consuming because the callback style is viral.
By adding a callback to a function's signature, the code communicates that something asynchronous might happen in this function and the function will call the callback with an error and/or result object.
In case a function does nothing asynchronous and does not involve conditions where a (non programming) error may occur don't use a callback function signature but simply return the computation result.
Functions with callbacks are not very convenient to handle by the caller so avoid callbacks until you really need them.
Node.js now has generators.
My understanding is that generators can be used to write code that appears to be much more linear and avoids callback hell and pyramid of doom style coding.
So to this point, my understanding is that inside a generator, code executes until it reaches a "yield" statement. Execution of the generator function suspends at this point. The yield statement specifies a return value which may be a function. Typically this would be a blocking I/O function - one that would normally need to be executed asynchronously.
The yield's return function is returned to whatever called the generator.
My question is, what happens at this point? What exactly executes the blocking I/O function that the yield returned?
Is it correct that to write generator/yield code that appears to be linear, there needs to be a specific sort of function that is calling the generator, a function that loops through the generator and executes each asynch function returned by the yield and returns the result of the asynch function back into the generator?
It's still not clear to me exactly how the asynch function returned by the yield is executed. If it is executed by the function that calls the generator, is it executed asynchronously? I'm guessing so because to do otherwise would result in blocking behaviour.
To summarise my questions:
To write "linear" asynch code with generators, is it necessary for there to be a calling function that iterates over the generator, executing yielded functions as callbacks and returning the result of the callback back into the generator?
If the answer to question 1 is yes, the exactly how are the yielded functions executed - asynchronously?
Can anyone offer a better overview/summary of how the whole process works?
When writing async code with generators you are dealing with two types of functions:
normal functions declared with function. These functions cannot yield. You cannot write async code in sync style with them because they run to completion; you can only handle asynchronous completion through callbacks (unless you invoke extra power like the node-fibers library or a code transform).
generator functions declared with function*. These functions can yield. You can write async code in sync style inside them because they can yield. But you need a companion function that creates the generator, handles the callbacks and resumes the generator with a next call every time a callback fires.
There are several libraries that implement companion functions. In most of these libraries, the companion function handles a single function* at a time and you have to put a wrapper around every function* in your code. The galaxy library (that I wrote) is a bit special because it can handle function* calling other function* without intermediate wrappers. The companion function is a bit tricky because it has to deal with a stack of generators.
The execution flow can be difficult to understand because of the little yield/next dance between your function* and the companion function. One way to understand the flow is to write an example with the library of your choice, add console.log statements both in your code and in the library, and run it.
If [the blocking io function] is executed by the function that calls
the generator, is it executed asynchronously? I'm guessing so because
to do otherwise would result in blocking behaviour.
I don't think generators do asynchronous task handling. With generators, only one thing is executing at the same time--it's just that one function can stop executing and pass control to another function. For instance,
function iofunc1() {
console.log('iofunc1');
}
function iofunc2() {
console.log('iofunc2');
}
function* do_stuff() {
yield iofunc1;
yield iofunc2;
console.log('goodbye');
}
var gen = do_stuff();
(gen.next().value)();
(gen.next().value)(); //This line won't begin execution until the function call on the previous line returns
gen.next(); //continue executing do_stuff
If you read some of the articles about nodejs generators:
http://jlongster.com/2012/10/05/javascript-yield.html
http://jlongster.com/A-Study-on-Solving-Callbacks-with-JavaScript-Generators
http://jlongster.com/A-Closer-Look-at-Generators-Without-Promises
...they all employ additional functions/libraries to add in asynchronous execution.
1: to write "linear" asynch code with generators, is it necessary for
there to be a calling function that iterates over the generator,
executing yielded functions as callbacks and returning the result of
the callback back into the generator?
Yes. Let's call it "launcher".
2: if the answer to question 1 is yes, the exactly how are the yielded
functions executed - asynchronously?
Inside the generator, you yield an array with: the function and its parameters. In the controlling caller (launcher), you use fn.apply(..,callback) to call the async, putting the call to "generator.next(data);" (resume) inside the callback.
the async function is executed asynchronously, but the generator will be "paused" at the yield point, until the callback is called (and then "generator.next(data)" is executed)
Full working lib and samples:
https://github.com/luciotato/waitfor-es6