Nodejs convert expensive synchronous tasks to async series - node.js

In nodejs I have an expensive function such as:
function expensiveCode(){
    a.doExpensiveOperation(1);
    b.doAnotherExpensiveOperation(2);
    c.doADifferentExpensiveOperation(3);
    d.doExpensiveOperation(4);
}
Each sub-function call takes different parameters, so it can't be done in a loop. I'd like to throttle this expensive function so that each sub-call runs on nextTick, like this:
function expensiveCode(){
    process.nextTick(function(){
        a.doExpensiveOperation(1);
        process.nextTick(function(){
            b.doAnotherExpensiveOperation(2);
            process.nextTick(function(){
                c.doADifferentExpensiveOperation(3);
                process.nextTick(function(){
                    d.doExpensiveOperation(4);
                });
            });
        });
    });
}
That's obviously ugly, and with 20 lines of different operations it would be too hideous to even consider.
I've reviewed a number of libraries like "async.js" but they all appear to be expecting the called functions to be async - to have a callback function on completion. I need a simple way to do it without converting all my code to the 'callback when done' method which seems like overkill.

Sorry to burst your bubble, but async.waterfall or perhaps async.series combined with async.apply is what you want, and yes you'll need to make those operations async. I find it hard to believe you have found 20 different computationally-intense operations, none of which do any IO whatsoever. Check out the bcrypt library for an example of how to offer both synchronous and asynchronous versions of a CPU-intensive call. Converting your code to callback on completion isn't overkill, it's node. That's the rule in node. Either your function does no IO and completes quickly or you make it async with a callback. End of options.
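For illustration, here is a minimal sketch of what that conversion could look like for the code in the question, using async.series and a small wrapper that defers each synchronous call with setImmediate (the a/b/c/d operations come from the question; the deferred helper and the done callback are assumptions for the example):

var async = require("async");

// Wrap a synchronous call so it matches the (callback) signature
// async.series expects, yielding to the event loop before running it.
function deferred(fn) {
    return function (callback) {
        setImmediate(function () {
            fn();
            callback(null);
        });
    };
}

function expensiveCode(done) {
    async.series([
        deferred(function () { a.doExpensiveOperation(1); }),
        deferred(function () { b.doAnotherExpensiveOperation(2); }),
        deferred(function () { c.doADifferentExpensiveOperation(3); }),
        deferred(function () { d.doExpensiveOperation(4); })
    ], done);
}

Each operation still blocks while it runs, but other queued events get a chance to run between the steps, and the caller is notified through done when everything has finished.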

Related

Mongoose eachAsync with async function = memory leak?

I am not sure if it's a bug in mongoose or if I am doing something wrong. Once I start using async functions when iterating over a cursor with eachAsync I see a memory leak (memory quickly goes up to 4 GB and then the process crashes). After trying a few things I noticed that this doesn't happen if I don't use an async function as the callback.
No Memory leak:
const playerCursor: QueryCursor<IPlayerProfileModel> = PlayerProfile.find({}, projection).lean().cursor();
await playerCursor.eachAsync(
    (profile: IPlayerProfileModel) => {
        return;
    },
    { parallel: 50 }
);
Memory leak:
const playerCursor: QueryCursor<IPlayerProfileModel> = PlayerProfile.find({}, projection).lean().cursor();
await playerCursor.eachAsync(
    async (profile: IPlayerProfileModel) => {
        return;
    },
    { parallel: 50 }
);
Obviously the above code doesn't make any sense, but I need to perform an asynchronous operation within the function.
Question:
What is causing the memory leak / how can I avoid it?
It has to do with how async functions work.
Quoting the documentation:
When the async function returns a value, the Promise will be resolved
with the returned value.
Meaning, values returned by async functions will be automatically wrapped into a Promise.
In your first code sample, your code returns undefined whereas in the second code sample your code returns Promise.resolve(undefined).
What is causing the memory leak?
I didn't take a look at mongoose code but the documentation states:
If fn returns a promise, will wait for the promise to resolve before iterating on to the next one.
Since your first example does not return a Promise, I am thinking your callback is executed on each result all at once rather than sequentially.
How can I avoid it?
I'd recommend using async/await as you did in your second code sample.
After taking a look at the code (I was looking for an answer myself): if provided with a callback that doesn't return a promise, eachAsync will run the callback as many times and as fast as it can.
This line is where your callback is executed. The next line checks whether the return value is a Promise; if it's not, it executes the callback right away, which effectively calls your eachAsync callback on the next result. If your callback starts any sort of async operation but returns right away, you end up with thousands and thousands of async operations running all at once.
On top of that, you set the parallel option to 50, so it executes the eachAsync callback fifty times in parallel.
This isn't a bug in mongoose, because there are cases where this behavior is wanted, and it does provide sequential processing when the callback returns a Promise. The documentation should, however, mention the caveat of using a callback which doesn't return a Promise.
To go a little further, Express uses next() in middleware callbacks for the same purpose of sequencing them.
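As a sketch of the recommended direction, assuming a hypothetical updateProfile() helper that returns a Promise: because the callback below is an async function, its returned promise is awaited by eachAsync before iteration continues, so at most parallel operations are in flight at a time:

const playerCursor = PlayerProfile.find({}, projection).lean().cursor();
await playerCursor.eachAsync(
    async (profile) => {
        // The promise returned here is awaited by eachAsync,
        // so the cursor does not race ahead of the work.
        await updateProfile(profile);
    },
    { parallel: 50 }
);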

Node.js Control Flow and Callbacks

I've been confused on this for a month and searched everything but could not find an answer.
I want to control what runs first in node.js. I know the way node deals with the code is non-blocking. I have the following example:
setTimeout(function(){ console.log("one second passed"); }, 1000);
console.log("HelloWorld");
Here I want to run the first one, output "one second passed", and then run the second one. How can I do that? I know setTimeout is a way to solve this problem, but that's not the answer I am looking for. I've tried using callbacks but it's not working. I am not sure I understand callbacks correctly. Callbacks just mean function parameters to me, and I don't think that will help me solve this problem.
One possible way to solve this problem is to define a function that contains the "error first callback" like the following example:
function print_helloworld_atend(helloworld){
    setTimeout(function(){ console.log("one second passed"); }, 1000);
    helloworld();
}

print_helloworld_atend(function helloworld(){
    console.log("HelloWorld");
});
Can I define a function with a callback that knows when the previous tasks are done? In the above function, how do I make the callback function helloworld run after the setTimeout expression?
If there is a structure that can solve my problem, that's my first choice. I am tired of using setTimeouts.
I would really appreciate if anyone can help. Thanks for reading
You should be using promises. Bluebird is a great promise library: faster than native, and it comes with great features. With promises you can chain functions together and know that one will not be called until the previous function resolves. No need to set timeouts or delays, although you can if you'd like. Below is an example of a delay. Function b won't run until 6 seconds after a finishes. If you remove .delay(ms), b will run immediately after a finishes.
var Promise = require("bluebird");
console.time('tracked');
console.time('first');
function a (){
    console.log('hello');
    console.timeEnd('first');
    return Promise.resolve();
}
function b (){
    console.log('world');
    console.timeEnd('tracked');
}
a().delay(6000)
    .then(b)
    .catch(Promise.TimeoutError, function(e) {
        console.log('Something messed up yo', e);
    });
This outputs:
→ node test.js
hello
first: 1.278ms
world
tracked: 6009.422ms
Edit: Promises are, in my opinion, the most fun way to handle control flow in node/JavaScript. To my knowledge there is no .delay() or .timeout() in native JavaScript promises, but Promises themselves do exist natively; you can find their documentation on Mozilla's site, and a rough native equivalent of the delay is sketched after the list below. I would recommend that you use Bluebird instead, though.
Use bluebird instead of native because:
It's faster. Petka Antonov, the creator of Bluebird, has a great understanding of the V8 engine's two compile steps and has optimized the library around its many quirks. Native has little to no such optimization, and it shows when you compare their performance. More information here and here.
It has more features: nice things like .reflect(), .spread(), .delay(), .timeout(); the list goes on.
You lose nothing by switching: every feature in Bluebird that also exists in native promises works exactly the same way. If you find yourself in a situation where only native is available to you, you won't have to relearn what you already know.
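For completeness, here is the rough native-promise equivalent mentioned above; a minimal sketch that reuses the a and b functions from the example and drops Bluebird's typed catch (the delay helper is an assumption, not part of any library):

// Minimal delay helper built on native promises only.
function delay(ms) {
    return new Promise(function (resolve) {
        setTimeout(resolve, ms);
    });
}

a()
    .then(function () { return delay(6000); })
    .then(b)
    .catch(function (e) {
        console.log('Something messed up yo', e);
    });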
Just execute everything that you want to execute after you log "one second passed", after you log "one second passed":
setTimeout(function(){
    console.log("one second passed");
    console.log("HelloWorld");
}, 1000);
You can use the async module to handle the callbacks.
To understand callbacks I'll give you a high-level glance:
function: i want to do some i/o work.
nodejs: ok, but you shouldn't be blocking my process as I'm single threaded.
nodejs: pass a callback function, and I will let you know from it when the i/o work is done.
function: passes the callback function
nodejs: i/o work is done, calls the callback function.
function: thanks for the notification, continue processing other work.
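In code, that exchange looks roughly like this with fs.readFile, a standard Node I/O call (the file name is only an example):

var fs = require("fs");

// Ask Node to do some I/O work and hand it a callback; the call returns immediately.
fs.readFile("example.txt", "utf8", function (err, data) {
    // Node invokes this callback once the I/O work is done.
    if (err) {
        return console.error("read failed:", err);
    }
    console.log("file contents:", data);
});

console.log("readFile was called, but the file has not been read yet");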

Too many callbacks issue

I know that writing async functions is recommended in nodejs. However, I feel it's not really necessary to write some non-IO operations asynchronously, and my code becomes less convenient. For example:
//sync
function now(){
    return new Date().getTime();
}
console.log(now());
//async
function now(callback){
    callback(new Date().getTime());
}
now(function(time){
    console.log(time);
});
Does the sync method block the CPU in this case? Is the effect significant enough that I should use async instead?
Async style is necessary if the method being called can block for a long time waiting for IO. As the node.js event loop is single-threaded, you want to yield to the event loop while IO is in progress. If you didn't do this, only one IO operation could be outstanding at any point in time, which would make the application completely unscalable.
Using callbacks for CPU work accomplishes nothing. It does not unblock the event loop. In fact, for CPU work it is not possible to unblock the event loop. The CPU must be occupied for a certain amount of time and that is unavoidable. (Disregarding things like web workers here).
Callbacks are not good in themselves; you use them when you have to. They are a necessary consequence of the node.js event-loop IO model.
That said, if you later plan on introducing IO into now you might eagerly use a callback style even if not strictly necessary. Changing from synchronous calls to callback-based calls later can be time-consuming because the callback style is viral.
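As a small illustration of that viral effect, using the now() function from the question: once now() takes a callback, anything built on top of it ends up taking one too, even though nothing here is actually asynchronous (logTimestamp is a made-up caller for the example):

// now() from the question, in callback style.
function now(callback) {
    callback(new Date().getTime());
}

// Any function layered on top is forced into callback style as well.
function logTimestamp(callback) {
    now(function (time) {
        console.log(time);
        callback();
    });
}

logTimestamp(function () {
    console.log("done");
});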
By adding a callback to a function's signature, the code communicates that something asynchronous might happen in this function and the function will call the callback with an error and/or result object.
If a function does nothing asynchronous and does not involve conditions where a (non-programming) error may occur, don't use a callback signature; simply return the computation result.
Functions with callbacks are not very convenient for the caller to handle, so avoid callbacks until you really need them.

How to determine whether an asynchronous call is the last one or not

Working with an asynchronous language (node.js in my case), I often end up with a two-dimensional loop and a response at the very end:
firstArray.forEach(function(first){
    secondArray[first].forEach(function(second){
        doStuff(function callback(resultSum){
            if(isLastCallback(this))
                response.send(resultSum);
        });
    });
});
My first approach to getting the proper result for isLastCallback was to use the index counter of both forEach iterations, but that ended up in a huge mess.
What is the common way to detect if the current run of callback is the last one?
I think what you're looking for is to do several asynchronous calls and then combine the results. You should look at the async module. You probably want async#parallel. It will run your async functions in parallel and then call the final callback with an array of all the results.
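As a sketch of how that could look for the shape in the question (assuming doStuff reports completion through an error-first callback and that the individual resultSum values should be combined before responding):

var async = require("async");

// Build one task per (first, second) pair; each task is a function
// that async.parallel will call with its own callback.
var tasks = [];
firstArray.forEach(function (first) {
    secondArray[first].forEach(function (second) {
        tasks.push(function (callback) {
            doStuff(function (err, resultSum) {
                callback(err, resultSum);
            });
        });
    });
});

// The final callback runs exactly once, after every task has finished,
// so there is no need to detect the "last" callback by hand.
async.parallel(tasks, function (err, results) {
    if (err) { return response.send(err); }
    // results holds all resultSum values in task order; sum them up.
    response.send(results.reduce(function (a, b) { return a + b; }, 0));
});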

Synchronous NodeJs (or other serverside JS) call

We are using Node for development and 95% of the code is async and working fine.
For the remaining 5% (one small module), which is sync in nature [and depends on other third-party software], we are looking for:
1. Code that blocks until the callback is finished.
2. Only one instance of function1 + its callback being executed at a time.
PS 1: I completely agree that Node is for async work and we should avoid this, but this is a separate, non-realtime process.
PS 2: If not with Node, is there any other server-side JS framework? The last option is to use another language like Python, but if anything is possible in JS, we are ready to give it a shot!
Seq should solve your problem.
For an overview about sync modules please look at http://nodejsrocks.blogspot.de/2012/05/how-to-avoid-nodejs-spaghetti-code-with.html
Seq()
    .seq(function () {
        var next = this;
        mysql.query("select * from foo", [], function (err, rows, fields) {
            next(err, rows);
        });
    })
    .seq(function (mysqlResult) {
        console.log("mysql callback returns: " + mysqlResult);
    });
There are lots and lots of options; look at node-async, Kaffeine's async support, IcedCoffeeScript, etc.
I want to make a plug for IcedCoffeeScript since I'm its maintainer. You can get by with solutions like Seq, but in general you'll wind up encoding control flow with function calls. I find that approach difficult to write and maintain. IcedCoffeeScript makes simple sequential operations a breeze:
console.log "hello, just wait a sec"
await setTimeout defer(), 100
console.log "ok, what did you want"
But more important, it handles any combination of async code and standard control flow:
console.log "Let me check..."
if isRunningLate()
console.log "Can't stop now, sorry!"
else
await setTimeout defer(), 1000
console.log "happy to wait, now what did you want?"
resumeWhatIWasDoingBefore()
Also loops work well, here is serial dispatch:
for i in [0...10]
    await launchRpc defer res[i]
done()
And here is parallel dispatch:
await
    for i in [0...10]
        launchRpc defer res[i]
done()
Not only does ICS make sequential chains of async code smoother, it also encourages you to do as much as possible in parallel. If you need to change your code or your concurrency requirements, the changes are minimal, not a complete rewrite (as it would be in standard JS/CS or with some concurrency libraries).
