Is Celery really async?

Is Celery really async? - python-3.x

I am codig a realtime game using celery and django-channels.
I have a task that works like a timer and if this timer reaches to zero, once the task was activated, a group_send() is called. From what I see, celery tasks are async but we can't await functions inside tasks.. this makes me a little bit confused.. here is the code:
#app.task(ignore_result=True)
def count_down(channel_name):
set_random_game_result(channel_name)
room = process_game_result(channel_name, revoke=False)
channel_layer = get_channel_layer()
async_to_sync(channel_layer.group_send)(
channel_name,
{
"type": "game_room_info",
"room": room
}
)
from the docs:
By default the send(), group_send(), group_add() and other functions are async functions, meaning you have to await them. If you need to call them from synchronous code, you’ll need to use the handy asgiref.sync.async_to_sync wrapper
So if celery is async, why can't I use the group_send without using the async_to_sync util?
Another thing is about querying.. From the docs:
If you are writing asynchronous code, however, you will need to call database methods in a safe, synchronous context, using database_sync_to_async.
database_sync_to_async actually doesn't work inside the task function. Am I missing something?

The problem you are talking about is something that is done by design, for a good reason. It is also well-documented, and can be solved easily with appropriate Canvas.
Also... Do not get confused with terminology... Celery is asynchronous, but it is not "Python async" as it predates the Python's asyncio... Maybe Celery 5 will get its async parts replaced/refactored to use Python 3+ asyncio and related.

Related

Is there any linter that detects blocking calls in an async function?

https://www.aeracode.org/2018/02/19/python-async-simplified/
It's not going to ruin your day if you call a non-blocking synchronous
function, like this:
def get_chat_id(name):
return "chat-%s" % name
async def main():
result = get_chat_id("django")
However, if you call a blocking function, like the Django ORM, the
code inside the async function will look identical, but now it's
dangerous code that might block the entire event loop as it's not
awaiting:
def get_chat_id(name):
return Chat.objects.get(name=name).id
async def main():
result = get_chat_id("django")
You can see how it's easy to have a non-blocking function that
"accidentally" becomes blocking if a programmer is not super-aware of
everything that calls it. This is why I recommend you never call
anything synchronous from an async function without doing it safely,
or without knowing beforehand it's a non-blocking standard library
function, like os.path.join.
So I am looking for a way to automatically catch instances of this mistake. Are there any linters for Python which will report sync function calls from within an async function as a violation?
Can I configure Pylint or Flake8 to do this?
I don't necessarily mind if it catches the first case above too (which is harmless).
Update:
On one level I realise this is a stupid question, as pointed out in Mikhail's answer. What we need is a definition of a "dangerous synchronous function" that the linter should detect.
So for purpose of this question I give the following definition:
A "dangerous synchronous function" is one that performs IO operations. These are the same operations which have to be monkey-patched by gevent, for example, or which have to be wrapped in async functions so that the event loop can context switch.
(I would welcome any refinement of this definition)

So I am looking for a way to automatically catch instances of this
mistake.
Let's make few things clear: mistake discussed in article is when you call any long running sync function inside some asyncio coroutine (it can be I/O blocking call or just pure CPU function with a lot of calculations). It's a mistake because it'll block whole event loop what will lead to significant performance downgrade (more about it here including comments below answer).
Is there any way to catch this situation automatically? Before run time - no, no one except you can predict if particular function will take 10 seconds or 0.01 second to execute. On run time it's already built-in asyncio, all you have to do is to enable debug mode.
If you afraid some sync function can vary between being long running (detectable in run time in debug mode) and short running (not detectable) just execute function in background thread using run_in_executor - it'll guarantee event loop will not be blocked.

How can I stop async/await from bubbling up in functions?

Lets say I have a function A that uses a function B which uses C, etc:
A -> B -> C -> D and E
Now assume that function D has to use async/await. This means I have to use async/await to the call of function C and then to the call of function B and so on. I understand that this is because they depend on each other and if one of them is waiting for a function to resolve, then transitively, they all have to. What alternatives can I do to make this cleaner?

There is a way to do this, but you'll loose the benefits of async-await.
One of the reason for async-await, is, that if your thread has to wait for another process to complete, like a read or write to the hard-disk, a database query, or fetching some internet information, your thread might do some other useful stuff instead of just waiting idly for this other process to complete.
This is done by using the keyword await. Once your thread sees the await. The thread doesn't really wait idly. Instead, it remembers the context (think of variable values, parts of the call stack etc) and goes up the call stack to see if the caller is not awaiting. If not, it starts executing these statements until it sees an await. It goes up the call stack again to see if the caller is not awaiting, etc.
Once this other process is completed the thread (or maybe another thread from the thread pool that acts as if it is the original thread) continues with the statements after the await until the procedure is finished, or until it sees another await.
To be able to do this, your procedure must know, that when it sees an await, the context needs to be saved and the thread must act like described above. That is why you declare a method async.
That is why typical async functions are functions that order other processes to do something lengthy: disk access, database access, internet communications. Those are typical functions where you'll find a ReadAsync / WriteAsync, next to the standard Read / Write functions. You'll also find them in classes that are typically designed to call these processes, like StreamReaders, TextWriters etc.
If you have a non-async class that calls an async function and waits until the async function completes before returning, the go-up-the-call-stack-to-see-if-the-caller-is-not-awaiting stops here: your program acts as if it is not using async-await.
Almost!
If you start an awaitable task, without waiting for it to finish, and do something else before you wait for the result, then this something else is executed instead of the idly wait, that the thread would have done if you would have used the non-async version.
How to call async function from non-async function
ICollection<string> ReadData(...)
{
// call the async function, don't await yet, you'll have far more interesting things to do
var taskReadLines = myReader.ReadLinesAsync(...);
DoSomethingInteresting();
// now you need the data from the read task.
// However, because this method is not async, you can't await.
// This Wait will really be an idle wait.
taskReadLines.Wait();
ICollection<string> readLines= taskRead.Result;
return readLines();
}
Your callers won't benefit from async-await, however your thread will be able to do something interesting while the lines have not been read yet.

Node.js Control Flow and Callbacks

I've been confused on this for a month and searched everything but could not find an answer.
I want to get control of what runs first in the node.js. I know the way node deals with the code is non-blocking. I have the following example:
setTimeOut(function(){console.log("one second passed");}, 1000);
console.log("HelloWorld");
Here I want to run first one, output "one second passed", and then run the second one. How can I do that? I know setTimeOut is a way to solve this problem but that's not the answer I am looking for. I've tried using callback but not working. I am not sure about if I got the correct understanding of callbacks. Callbacks just mean function parameters to me and I don't think that will help me to solve this problem.
One possible way to solve this problem is to define a function that contains the "error first callback" like the following example:
function print_helloworld_atend(function helloworld(){
console.log("HelloWorld");
}){
setTimeOut(function(){console.log("one second passed");}, 1000);
helloworld();
}
Can I define a function with a callback who will know when the previous tasks are done. In the above function, how to make the callback function: helloworld to run after the "setTimeOut" expression?
If there is a structure that can solve my problem, that's my first choice. I am tired of using setTimeOuts.
I would really appreciate if anyone can help. Thanks for reading

You should be using promises. Bluebird is a great promise library. Faster than native and comes with great features. With promises you can chain together functions, and know that one will not be called until the previous function resolves. No need to set timeouts or delays. Although you can if you'd like. Below is example of a delay. Function B wont run until 6 seconds after A finishes. If you remove .delay(ms) B will run immediately after A finishes.
var Promise = require("bluebird");
console.time('tracked');
console.time('first');
function a (){
console.log('hello');
console.timeEnd('first');
return Promise.resolve();
}
function b (){
console.log('world');
console.timeEnd('tracked');
}
a().delay(6000)
.then(b)
.catch(Promise.TimeoutError, function(e) {
console.log('Something messed up yo', e);
});
This outputs:
→ node test.js
hello
first: 1.278ms
world
tracked: 6009.422ms
Edit: Promises are, in my opinion, the most fun way of control flow in node/javascript. To my knowledge there is not a .delay() or .timeout() in native javascript promises. However, there are Promises in general. You can find their documentation on mozilla's site. I would recommend that you use Bluebird instead though.
Use bluebird instead of native because:
It's faster. Petka Antonov, the creator of bluebird, has a great understanding of the V8 engines two compile steps and has optimized the library around it's many quirks. Native has little to no optimization and it shows when you compare their performance. More information here and here.
It has more features: Nice things like .reflect(), .spread(), .delay(), .timeout(), the list goes on.
You lose nothing by switching: all features in bluebird which also exist in native function in exactly the same way in implementation. If you find yourself in a situation where only native is available to you, you wont have to relearn what you already know.

Just execute everything that you want to execute after you log "one second passed", after you log "one second passed":
setTimeOut(function(){
console.log("one second passed");
console.log("HelloWorld");
}, 1000);

You can use async module to handle the callbacks.
To understand callbacks I'll give you a high level glance:
function: i want to do some i/o work.
nodejs: ok, but you shouldn't be blocking my process as I'm single threaded.
nodejs: pass a callback function, and I will let you know from it when the i/o work is done.
function: passes the callback function
nodejs: i/o work is done, calls the callback function.
function: thanks for the notification, continue processing other work.

python asyncio Transport asynchronous methods vs coroutines

I'm new to asyncio and I started working with Transports to create a simple server-client program.
on the asyncio page I see the following:
Transport.close() can be called immediately after WriteTransport.write() even if data are not sent yet on the socket: both methods are asynchronous. yield from is not needed because these transport methods are not coroutines.
I searched the web (including stackoverflow) but couldn't find a good answer to the following question: what are the major differences between an asynchronous method and a coroutine?
The only 2 differences I can make are:
in coroutines you have a more fine grained control over the order of the methods the main loop executes using the yield from expression.
coroutines are generators, hence are more memory efficient.
anything else I am missing?
Thank you.

In the context asynchronous means both .write() and .close() are regular methods, not coroutines.
If .write() cannot write data immediately it stores the data in internal buffer.
.close() never closes connection immediately but schedules socket closing after all internal buffer will be sent.
So
transp.write(b'data')
transp.write(b'another data')
transp.close()
is safe and perfectly correct code.
Also .write() and .close() are not coroutines, obviously.
You should call coroutine via yield from expression, e.g. yield from coro().
But these methods are convention functions, so call it without yield from as shown in example above.

Difference between the TPL & async/await (Thread handling)

Trying to understanding the difference between the TPL & async/await when it comes to thread creation.
I believe the TPL (TaskFactory.StartNew) works similar to ThreadPool.QueueUserWorkItem in that it queues up work on a thread in the thread pool. That's of course unless you use TaskCreationOptions.LongRunning which creates a new thread.
I thought async/await would work similarly so essentially:
TPL:
Factory.StartNew( () => DoSomeAsyncWork() )
.ContinueWith(
(antecedent) => {
DoSomeWorkAfter();
},TaskScheduler.FromCurrentSynchronizationContext());
Async/Await:
await DoSomeAsyncWork();
DoSomeWorkAfter();
would be identical. From what I've been reading it seems like async/await only "sometimes" creates a new thread. So when does it create a new thread and when doesn't it create a new thread? If you were dealing with IO completion ports i can see it not having to create a new thread but otherwise I would think it would have to. I guess my understanding of FromCurrentSynchronizationContext always was a bit fuzzy also. I always throught it was, in essence, the UI thread.

I believe the TPL (TaskFactory.Startnew) works similar to ThreadPool.QueueUserWorkItem in that it queues up work on a thread in the thread pool.
Pretty much.
From what i've been reading it seems like async/await only "sometimes" creates a new thread.
Actually, it never does. If you want multithreading, you have to implement it yourself. There's a new Task.Run method that is just shorthand for Task.Factory.StartNew, and it's probably the most common way of starting a task on the thread pool.
If you were dealing with IO completion ports i can see it not having to create a new thread but otherwise i would think it would have to.
Bingo. So methods like Stream.ReadAsync will actually create a Task wrapper around an IOCP (if the Stream has an IOCP).
You can also create some non-I/O, non-CPU "tasks". A simple example is Task.Delay, which returns a task that completes after some time period.
The cool thing about async/await is that you can queue some work to the thread pool (e.g., Task.Run), do some I/O-bound operation (e.g., Stream.ReadAsync), and do some other operation (e.g., Task.Delay)... and they're all tasks! They can be awaited or used in combinations like Task.WhenAll.
Any method that returns Task can be awaited - it doesn't have to be an async method. So Task.Delay and I/O-bound operations just use TaskCompletionSource to create and complete a task - the only thing being done on the thread pool is the actual task completion when the event occurs (timeout, I/O completion, etc).
I guess my understanding of FromCurrentSynchronizationContext always was a bit fuzzy also. I always throught it was, in essence, the UI thread.
I wrote an article on SynchronizationContext. Most of the time, SynchronizationContext.Current:
is a UI context if the current thread is a UI thread.
is an ASP.NET request context if the current thread is servicing an ASP.NET request.
is a thread pool context otherwise.
Any thread can set its own SynchronizationContext, so there are exceptions to the rules above.
Note that the default Task awaiter will schedule the remainder of the async method on the current SynchronizationContext if it is not null; otherwise it goes on the current TaskScheduler. This isn't so important today, but in the near future it will be an important distinction.
I wrote my own async/await intro on my blog, and Stephen Toub recently posted an excellent async/await FAQ.
Regarding "concurrency" vs "multithreading", see this related SO question. I would say async enables concurrency, which may or may not be multithreaded. It's easy to use await Task.WhenAll or await Task.WhenAny to do concurrent processing, and unless you explicitly use the thread pool (e.g., Task.Run or ConfigureAwait(false)), then you can have multiple concurrent operations in progress at the same time (e.g., multiple I/O or other types like Delay) - and there is no thread needed for them. I use the term "single-threaded concurrency" for this kind of scenario, though in an ASP.NET host, you can actually end up with "zero-threaded concurrency". Which is pretty sweet.

async / await basically simplifies the ContinueWith methods ( Continuations in Continuation Passing Style )
It does not introduce concurrency - you still have to do that yourself ( or use the Async version of a framework method. )
So, the C# 5 version would be:
await Task.Run( () => DoSomeAsyncWork() );
DoSomeWorkAfter();

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string