Too many callbacks issue - node.js

I know that writing async functions is recommended in nodejs. However, I feel it's not really necessary to write non-IO operations asynchronously, and doing so makes my code less convenient. For example:
//sync
function now(){
    return new Date().getTime();
}
console.log(now());
//async
function now(callback){
    callback(new Date().getTime());
}
now(function(time){
    console.log(time);
});
Does the sync method block the CPU in this case? Is the effect significant enough that I should use async instead?

Async style is necessary if the method being called can block for a long time waiting for IO. As the node.js event loop is single-threaded you want to yield to the event loop during an IO. If you didn't do this there could be only one IO outstanding at each point in time. That would lead to total non-scalability.
Using callbacks for CPU work accomplishes nothing. It does not unblock the event loop. In fact, for CPU work it is not possible to unblock the event loop. The CPU must be occupied for a certain amount of time and that is unavoidable. (Disregarding things like web workers here).
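A minimal sketch of this point, assuming a made-up CPU-bound helper named sumTo (not from the question): the callback is invoked before the call returns, so the callback style changes the shape of the code but nothing about when the work runs.

```javascript
// Hypothetical CPU-bound helper written in callback style.
// Nothing here yields to the event loop: the callback runs synchronously.
function sumTo(n, callback) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  callback(total);
}

// Record the order in which things happen.
const order = [];
sumTo(1000, (total) => order.push(`callback:${total}`));
order.push('after call');

console.log(order); // [ 'callback:500500', 'after call' ]
```

The callback entry comes first, which shows that the caller was blocked for the whole computation anyway.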
Callbacks are not a good thing in themselves. You use them when you have to. They are a necessary consequence of the node.js event loop IO model.
That said, if you later plan on introducing IO into now you might eagerly use a callback style even if not strictly necessary. Changing from synchronous calls to callback-based calls later can be time-consuming because the callback style is viral.

By adding a callback to a function's signature, the code communicates that something asynchronous might happen in this function and the function will call the callback with an error and/or result object.
If a function does nothing asynchronous and does not involve conditions where a (non-programming) error may occur, don't use a callback function signature; simply return the computation result.
Functions with callbacks are not very convenient to handle by the caller so avoid callbacks until you really need them.
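As a rough illustration of the two signatures, here is a sketch with invented names (fullName, loadUser, fakeUsers); setImmediate only stands in for real I/O and is an assumption of this example, not something from the answer.

```javascript
// Pure computation: nothing asynchronous and no expected runtime error,
// so the function simply returns its result.
function fullName(user) {
  return `${user.first} ${user.last}`;
}

// Hypothetical data store used only for this illustration.
const fakeUsers = { 1: { first: 'Ada', last: 'Lovelace' } };

// Simulated I/O: the callback signature tells the caller that the result
// arrives later, as (error, result) in the usual Node.js order.
function loadUser(id, callback) {
  setImmediate(() => {
    const user = fakeUsers[id];
    if (!user) {
      return callback(new Error(`no user with id ${id}`));
    }
    callback(null, user);
  });
}

loadUser(1, (err, user) => {
  if (err) {
    return console.error(err.message);
  }
  console.log(fullName(user));
});
```

The caller of loadUser has to handle both the error and the result inside the callback, which is the inconvenience worth avoiding for functions like fullName.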

Related

Is there any linter that detects blocking calls in an async function?

https://www.aeracode.org/2018/02/19/python-async-simplified/
It's not going to ruin your day if you call a non-blocking synchronous
function, like this:
def get_chat_id(name):
    return "chat-%s" % name
async def main():
    result = get_chat_id("django")
However, if you call a blocking function, like the Django ORM, the
code inside the async function will look identical, but now it's
dangerous code that might block the entire event loop as it's not
awaiting:
def get_chat_id(name):
    return Chat.objects.get(name=name).id
async def main():
    result = get_chat_id("django")
You can see how it's easy to have a non-blocking function that
"accidentally" becomes blocking if a programmer is not super-aware of
everything that calls it. This is why I recommend you never call
anything synchronous from an async function without doing it safely,
or without knowing beforehand it's a non-blocking standard library
function, like os.path.join.
So I am looking for a way to automatically catch instances of this mistake. Are there any linters for Python which will report sync function calls from within an async function as a violation?
Can I configure Pylint or Flake8 to do this?
I don't necessarily mind if it catches the first case above too (which is harmless).
Update:
On one level I realise this is a stupid question, as pointed out in Mikhail's answer. What we need is a definition of a "dangerous synchronous function" that the linter should detect.
So for purpose of this question I give the following definition:
A "dangerous synchronous function" is one that performs IO operations. These are the same operations which have to be monkey-patched by gevent, for example, or which have to be wrapped in async functions so that the event loop can context switch.
(I would welcome any refinement of this definition)
So I am looking for a way to automatically catch instances of this
mistake.
Let's make a few things clear: the mistake discussed in the article is calling any long-running sync function inside an asyncio coroutine (it can be a blocking I/O call or a pure CPU function with a lot of calculations). It's a mistake because it blocks the whole event loop, which leads to a significant performance degradation (more about it here, including the comments below the answer).
Is there any way to catch this situation automatically? Before run time, no: no one except you can predict whether a particular function will take 10 seconds or 0.01 seconds to execute. At run time, it's already built into asyncio; all you have to do is enable debug mode.
If you are afraid that some sync function can vary between being long running (detectable at run time in debug mode) and short running (not detectable), just execute the function in a background thread using run_in_executor; that guarantees the event loop will not be blocked.

How can I stop async/await from bubbling up in functions?

Let's say I have a function A that uses a function B which uses C, etc:
A -> B -> C -> D and E
Now assume that function D has to use async/await. This means I have to use async/await in the call to function C, then in the call to function B, and so on. I understand that this is because they depend on each other: if one of them is waiting for a function to resolve, then transitively they all have to. What alternatives are there to make this cleaner?
There is a way to do this, but you'll lose the benefits of async-await.
One of the reasons for async-await is that if your thread has to wait for another process to complete, like a read or write to the hard disk, a database query, or fetching some internet information, your thread might do some other useful stuff instead of just waiting idly for this other process to complete.
This is done by using the keyword await. Once your thread sees the await, it doesn't really wait idly. Instead, it remembers the context (think of variable values, parts of the call stack etc) and goes up the call stack to see if the caller is not awaiting. If not, it starts executing the caller's statements until it sees an await. It goes up the call stack again to see if the caller is not awaiting, etc.
Once this other process is completed the thread (or maybe another thread from the thread pool that acts as if it is the original thread) continues with the statements after the await until the procedure is finished, or until it sees another await.
To be able to do this, your procedure must know, that when it sees an await, the context needs to be saved and the thread must act like described above. That is why you declare a method async.
That is why typical async functions are functions that order other processes to do something lengthy: disk access, database access, internet communications. Those are typical functions where you'll find a ReadAsync / WriteAsync, next to the standard Read / Write functions. You'll also find them in classes that are typically designed to call these processes, like StreamReaders, TextWriters etc.
If you have a non-async class that calls an async function and waits until the async function completes before returning, the go-up-the-call-stack-to-see-if-the-caller-is-not-awaiting stops here: your program acts as if it is not using async-await.
Almost!
If you start an awaitable task without waiting for it to finish, and do something else before you wait for the result, then this something else is executed instead of the idle wait that the thread would have done if you had used the non-async version.
How to call async function from non-async function
ICollection<string> ReadData(...)
{
    // call the async function, don't await yet, you'll have far more interesting things to do
    var taskReadLines = myReader.ReadLinesAsync(...);
    DoSomethingInteresting();
    // now you need the data from the read task.
    // However, because this method is not async, you can't await.
    // This Wait will really be an idle wait.
    taskReadLines.Wait();
    ICollection<string> readLines = taskReadLines.Result;
    return readLines;
}
Your callers won't benefit from async-await, however your thread will be able to do something interesting while the lines have not been read yet.

Is it safe to skip calling callback if no action needed in nodejs

scenario 1
function a(callback){
    console.log("not calling callback");
}
a(function(callback_res){
    console.log("callback_res", callback_res);
});
scenario 2
function a(callback){
    console.log("calling callback");
    callback(true);
}
a(function(callback_res){
    console.log("callback_res", callback_res);
});
Will function a be waiting for the callback and therefore not terminate in scenario 1? As far as I can see, the program terminates in both scenarios.
The problem is not safety but intention. If a function accepts a callback, it's expected that it will be called at some point. If it ignores the argument it accepts, the signature is misleading.
This is a bad practice because function signature gives false impression about how a function works.
It may also trigger an unused-parameter warning in linters.
will function a be waiting for callback and will not terminate in scenario 1?
The function doesn't contain asynchronous code and won't wait for anything. The fact that callbacks are commonly used in asynchronous control flow doesn't mean that they are asynchronous per se.
will function a be waiting for callback and will not terminate in scenario 1?
No. There is nothing in the code you show that waits for a callback to be called.
Passing a callback to a function is just like passing an integer to a function. The function is free to use it or not, and it doesn't mean anything more than that to the interpreter. The JS interpreter has no special logic to "wait for a passed callback to get called". That has no effect one way or the other on when the program terminates. It's just a function argument that the called function can decide whether to use or ignore.
As another example, it used to be common to pass two callbacks to a function, one was called upon success and one was called upon error:
function someFunc(successFn, errorFn) {
// do some operation and then call either successFn or errorFn
}
In this case, it was pretty clear that one of these was going to get called and the other was not. There's no need (from the JS interpreter's point of view) to call a passed callback. That's purely the prerogative of the logic of your code.
Now, it would not be good practice to design a function that shows a callback in its signature and then never, ever calls that callback. That's just plain wasteful and a misleading design. There are, however, many cases of callbacks that are sometimes called and sometimes not, depending upon circumstances. Array.prototype.forEach is one such example. If you call array.forEach(fn) on an empty array, the callback is never called. But, of course, if you call it on a non-empty array, it is called.
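The forEach case is easy to check with a small sketch (the counter names are made up for this illustration):

```javascript
// Count how often the callback runs for an empty and a non-empty array.
let emptyCalls = 0;
let nonEmptyCalls = 0;

[].forEach(() => {
  emptyCalls += 1; // never reached: there are no elements to visit
});

[10, 20, 30].forEach(() => {
  nonEmptyCalls += 1; // reached once per element
});

console.log(emptyCalls, nonEmptyCalls); // 0 3
```

In both calls the program simply moves on afterwards; nothing waits for a callback that was never invoked.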
If your function carries out asynchronous operations and the point of the callback is to communicate when the asynchronous operation is done and whether it concluded with an error or a value, then it would generally be bad form to have code paths that never call the callback, because it would be natural for a caller to assume the callback is going to get called eventually. I can imagine there might be some exceptions to this, but they had better be documented really well in the doc/comments for the function.
For asynchronous operations, your question reminds me somewhat of this: Do never resolved promises cause memory leak? which might be useful to read.

How to know when the Promise is actually resolved in Node.js?

When we are using Promise in nodejs, given a Promise p, we can't know when the Promise p is actually resolved by logging the currentTime in the "then" callback.
To prove that, I wrote the test code below (using CoffeeScript):
# the procedure we want to profile
getData = (arg) ->
  new Promise (resolve) ->
    setTimeout ->
      resolve(arg + 1)
    , 100
# the main procedure
main = () ->
  beginTime = new Date()
  console.log beginTime.toISOString()
  getData(1).then (r) ->
    resolveTime = new Date()
    console.log resolveTime.toISOString()
    console.log resolveTime - beginTime
  cnt = 10**9
  --cnt while cnt > 0
  return cnt
main()
When you run the above code, you will notice that resolveTime (the time at which your code enters the callback function) is much later than 100ms after beginTime.
So if we want to know when the Promise is actually resolved, HOW?
I want to know the exact time because I'm doing some profiling via logging, and I'm not able to modify the implementation of Promise p because I'm profiling from outside the black box.
So, is there some function like promise.onUnderlyingConditionFulfilled(callback), or any other way to make this possible?
This is because you have a busy loop that apparently takes longer than your timer:
cnt = 10**9
--cnt while cnt > 0
Javascript in node.js is single threaded and event driven. It can only do one thing at a time and it will finish the current thing it's doing before it can service the event posted by setTimeout(). So, if your busy loop (or any other long running piece of Javascript code) takes longer than you've set your timer for, the timer will not be able to run until this other Javascript code is done. "single threaded" means Javascript in node.js only does one thing at a time and it waits until one thing returns control back to the system before it can service the next event waiting to run.
So, here's the sequence of events in your code:
1. It calls the setTimeout() to schedule the timer callback for 100ms from now.
2. Then you go into your busy loop.
3. While it's in the busy loop, the setTimeout() timer fires inside of the JS implementation and it inserts an event into the Javascript event queue. That event can't run at the moment because the JS interpreter is still running the busy loop.
4. Then eventually it finishes the busy loop and returns control back to the system.
5. When that is done, the JS interpreter then checks the event queue to see if any other events need servicing. It finds the timer event and so it processes that and the setTimeout() callback is called.
6. That callback resolves the promise which triggers the .then() handler to get called.
Note: Because of Javascript's single threaded-ness and event-driven nature, timers in Javascript are not guaranteed to be called exactly when you schedule them. They will execute as close to that as possible, but if other code is running at the time they fire or if there are lots of items in the event queue ahead of you, that code has to finish before the timer callback actually gets to execute.
So if we want to know when the Promise is actually resolved, HOW?
The promise is resolved when your busy loop is done. It's not resolved at exactly the 100ms point (because your busy loop apparently takes longer than 100ms to run). If you wanted to know exactly when the promise was resolved, you would just log inside the setTimeout() callback right where you call resolve(). That would tell you exactly when the promise was resolved though it will be pretty much the same as where you're logging now. The cause of your delay is the busy loop.
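The same effect can be reproduced in plain JavaScript. This is only a sketch: busyWait and measureTimerDelay are invented helpers, and the 20 ms timer and 100 ms of blocking are arbitrary values chosen so the blocking clearly outlasts the timer.

```javascript
// Block the event loop for roughly `ms` milliseconds (illustration only).
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: no events can be serviced while this loop runs
  }
}

// Schedule a timer, then keep the thread busy past the timer's due time.
// The promise resolves with the delay actually observed inside the
// setTimeout callback, right where resolve() is called.
function measureTimerDelay(timerMs, busyMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), timerMs);
    busyWait(busyMs);
  });
}

const delayPromise = measureTimerDelay(20, 100);
delayPromise.then((elapsed) => {
  console.log(`timer was set for 20 ms but ran after ${elapsed} ms`);
});
```

The timer callback cannot run until busyWait returns, so the reported delay is at least as long as the blocking period rather than the requested timeout.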
Per your comments, it appears that you want to somehow measure exactly when resolve() is actually called in the Promise, but you say that you cannot modify the code in getData(). You can't really do that directly. As you already know, you can measure when the .then() handler gets called which will probably be no more than a couple ms after resolve() gets called.
You could replace the promise infrastructure with your own implementation and then you could instrument the resolve() callback directly, but replacing or hooking the promise implementation probably influences the timing of things even more than just measuring from the .then() handler.
So, it appears to me that you've just over-constrained the problem. You want to measure when something inside of some code happens, but you don't allow any instrumentation inside that code. That just leaves you with two choices:
1. Replace the promise implementation so you can instrument resolve() directly.
2. Measure when .then() is triggered.
The first option probably has a Heisenberg uncertainty issue, in that replacing or hooking the promise implementation likely influences the timing more than it should.
The second option measures an event that happens just slightly after the actual .resolve(). Pick which one sounds closest to what you actually want.
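A sketch of the second option, measuring from the outside without touching the code under test. The helper name timeUntilThen is made up here, and getData is a JavaScript stand-in for the function in the question.

```javascript
const { performance } = require('perf_hooks');

// Stand-in for the black-box function from the question.
function getData(arg) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(arg + 1), 100);
  });
}

// Hypothetical helper: wraps any promise and reports how long it took
// until the .then() handler ran. It does not modify the promise itself.
function timeUntilThen(promise) {
  const start = performance.now();
  return promise.then((value) => ({
    value,
    elapsedMs: performance.now() - start,
  }));
}

const timed = timeUntilThen(getData(1));
timed.then(({ value, elapsedMs }) => {
  console.log(`value=${value}, .then() ran after ${elapsedMs.toFixed(1)} ms`);
});
```

The measured time includes any period during which the event loop was busy with other code, so it is an upper bound on when resolve() was actually called.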

When to use callbacks?

I don't quite understand the use of callbacks in node.js. I understand that if you have something like
result = db.execute(query);
doSomething(result);
you should make doSomething a callback because doSomething would get executed before the result is ready. This makes sense because the db operation can be expensive.
Now let's say I have something like
result = calculate(x,y)
doSomething(result)
where calculate is not expensive (i.e. no reading from database or I/O), should I still be using a callback? How can I tell if my function would complete before or after the next line would get executed?
Thanks
In short, your function needs to accept a callback parameter if your function is calling asynchronous functions (e.g. invoking I/O operations or database calls) so that the results of those calls can be provided to the caller of your function. If your function is just making synchronous calls then your function is also synchronous and you don't need a callback parameter (as in the case of your second example).
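To see which kind of function completes before the next line runs, here is a small sketch. calculate matches the question, while fakeDbExecute is an invented stand-in for a database call that uses setImmediate to simulate the asynchronous part.

```javascript
const log = [];

// Synchronous: the result is available as soon as the call returns.
function calculate(x, y) {
  return x + y;
}

// Hypothetical asynchronous call: the result is only delivered later,
// through the callback, after the current code has finished running.
function fakeDbExecute(query, callback) {
  setImmediate(() => callback(null, `rows for ${query}`));
}

const result = calculate(2, 3);
log.push(`sync result: ${result}`);

fakeDbExecute('SELECT 1', (err, rows) => {
  log.push(`async result: ${rows}`);
  console.log(log);
});
log.push('line after db call');
```

The line after the database call is logged before the asynchronous result, whereas the synchronous result is ready immediately, so only the database-style function needs a callback.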

Resources