Invoking the same function multiple times in parallel - node.js

I'm trying to invoke the same function in parallel, but with different arguments. I used Promise.all, but that doesn't seem to run the tasks in parallel. I tried using bluebird, but the executions still seem to happen sequentially. Please find below the code snippet and logs.
let records = await getRecords(query);
if (_.size(records) > 0) {
    bluebird.map(records, function (record) {
        return prepareFileContent(record.MESSAGE_PAYLOAD);
    }, { concurrency: records.length }).then(function (data) {
        finalData = data;
        console.log("done");
    });
}
export async function prepareFileContent(payload : string) : Promise<string>{
    return new Promise<string>(function(resolve,reject){
        try{
            console.log("content generation starts");
            //logic goes here
            console.log("content generation ends");
            resolve(details);
        }
        catch(err)
        {
            log.error("Error in parsing the payload:", err);
            reject(err);
        }
    });
}
The logs look something like this, which shows that the calls are executed sequentially and not in parallel (judging by the timestamps, each one takes about 4 seconds to execute).
2018-04-16T08:47:53.095Z content generation starts
2018-04-16T08:47:57.819Z content generation ends
2018-04-16T08:47:57.820Z content generation starts
2018-04-16T08:48:02.253Z content generation ends
2018-04-16T08:48:02.254Z content generation starts
2018-04-16T08:48:06.718Z content generation ends
2018-04-16T08:48:06.718Z content generation starts
2018-04-16T08:48:11.163Z content generation ends
2018-04-16T08:48:11.163Z content generation starts
2018-04-16T08:48:15.573Z content generation ends
2018-04-16T08:48:15.574Z content generation starts
Can someone help me achieve the same in parallel, and explain what I am missing here?

First off, node.js Javascript is single threaded. So, no two pieces of Javascript are ever truly run in parallel. When people speak of things running parallel, that only really applies to asynchronous operations that have a native code component such as networking operations, file operations, etc...
It appears that you're operating under an assumption that promises and functions like Bluebird's Promise.map() enable parallel operation. That is only true if the underlying operations you're monitoring with promises (your prepareFileContent() function in your example) are actually capable of running by themselves outside of the Javascript interpreter. But the code from your function prepareFileContent() that you show us is just Javascript, so it can't run in parallel with anything else. Remember, node.js runs your Javascript on a single thread, so it never runs two pieces of that Javascript at the same time.
So, your output is exactly as expected. bluebird.map() iterates through the array, calling your callback on each item in the array and collecting a promise from each function call. Then, it waits for all the promises to be done and collects all the resolved results into an array for you.
But, each of your callbacks is synchronous. They don't have any asynchronous part to them so all your code ends up running synchronously. Nothing runs in parallel.
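A minimal sketch of why the log looks sequential, assuming the body of the function is CPU-bound: the function passed to new Promise() runs synchronously, so each call finishes before the next one starts. Here busyWait() and prepareContent() are hypothetical stand-ins for the real logic, not code from the question.

```javascript
// Hypothetical stand-in for CPU-bound work: blocks the thread for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

const events = [];

// Same shape as prepareFileContent(): synchronous work wrapped in a promise.
function prepareContent(id) {
  return new Promise((resolve) => {
    events.push(`start ${id}`);
    busyWait(20);
    events.push(`end ${id}`);
    resolve(id);
  });
}

// All three calls are made back to back, yet each executor runs to completion
// before map() moves on to the next element.
const all = Promise.all([1, 2, 3].map((id) => prepareContent(id)));
all.then((ids) => console.log(events.join(", "), "->", ids));
```

The recorded events come out as start/end pairs, one task after another, which matches the shape of the timestamps in the question.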

Related

When should I split some task into asynchronous tinier tasks?

I'm writing a personal project in Node and I'm trying to figure out when a task should be split into asynchronous pieces. Let's say I have this "4-Step-Task"; the steps are not very expensive (the most expensive one iterates over an array of objects, trying to match a URL against a RegExp, and the array probably won't have more than 20 or 30 objects).
part1().then(y => {
    return doTheSecondPart(y);
}).then(z => {
    return doTheThirdPart(z);
}).then(c => {
    return doTheFourthPart(c);
});
The other way would be to just execute one part after another, but nothing else will progress until this task is done. With the above approach, other tasks can progress at least a little bit between each part.
Are there any criteria for when this approach should be preferred over a classic synchronous one?
Sorry for my bad English; it's not my native language.
All you've described is synchronous code that isn't very long to run. First off, there's no reason to even use promises for that type of code. Secondly, there's no reason to break it up into chunks. All you would be doing with either of those choices is making the code more complicated to write, more complicated to test and more complicated to understand and it would also run slower. All of those are undesirable.
If you force even synchronous code into a promise, then a .then() handler will give some other code a chance to run between .then() handlers, but only certain types of events can be run there because processing a resolved promise is one of the highest priority things to do in the event queue system. It won't, for example, allow another incoming http request arriving on your server to start to run.
If you truly wanted to allow other requests to run and so on, you would be better off just putting the code (without promises) into a worker thread and letting it run there and then communicate back the result via messaging. If you wanted to keep it in the main thread, but let any other code run, you'd probably have to use a short setTimeout() delay to truly let all possible other types of tasks run in between.
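A small sketch of the ordering described here, under the assumption that a timer callback is a fair stand-in for "some other event" waiting in the event loop. The variable names are made up for illustration.

```javascript
const order = [];

// A timer callback stands in for any other kind of event waiting in the event loop.
// It is registered first, before the promise chain is even created.
const timerDone = new Promise((resolve) => {
  setTimeout(() => {
    order.push("timer");
    resolve();
  }, 0);
});

// A promise chain: each .then() callback is queued as a microtask.
const chainDone = Promise.resolve()
  .then(() => order.push("then 1"))
  .then(() => order.push("then 2"))
  .then(() => order.push("then 3"));

const finished = Promise.all([timerDone, chainDone]).then(() => {
  console.log(order.join(" -> ")); // then 1 -> then 2 -> then 3 -> timer
  return order;
});
```

The whole .then() chain runs before the timer callback, even though the timer was registered first. Splitting synchronous work across .then() handlers therefore does not let timers or incoming I/O events run in between.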
So, if this code doesn't take much time to run, there's just really no reason to mess with complicating it. Just let it run in the quickest and simplest way.
If you want more concrete advice, then please show some actual code and provide some timing information about how long it takes to run. Iterating through an array of 20-30 objects is nothing in the general scheme of things and is not a reason to rewrite it into timesliced pieces.
As for code that iterates over an array/list of items doing matching against some string, this is exactly what the Express web server framework does on every incoming URL to find the matching routes. That is not a slow thing to do in Javascript.
Asynchronous programming is a better fit for code that must respond to events – for example, any kind of graphical UI. An example of a situation where programmers use async but shouldn't is any code that can focus entirely on data processing and can accept a “stop-the-world” block while waiting for data to download.
I use it extensively with a REST API server, as we have no idea how long the server can take to respond to a request. So in order not to "block the app" while waiting for the server response, async requests are most useful.
part1().then(y => {
    return doTheSecondPart(y);
}).then(z => {
    return doTheThirdPart(z);
}).then(c => {
    return doTheFourthPart(c);
});
What you have described in your sample is much more of a synchronous, procedural process, so splitting it up with then will not by itself keep your interface responsive while your algorithm is busy with a step.
In the case of a server call, the situation is different: while the request is pending, none of your JavaScript is running, so the app is free to handle other events until the response arrives and the next then callback is invoked.
async/await is a more readable way to write that kind of code, where you are waiting for a user event or a server response but do not want your app to hang while waiting for the data:
async function wait() {
    await new Promise(resolve => setTimeout(resolve, 2000));
    console.log("awaiting for server once !!");
    return 10;
}
async function wait2() {
    await new Promise(resolve => setTimeout(resolve, 3000));
    console.log("awaiting for server twice !!");
    return 10;
}
async function f() {
    let promise = new Promise((resolve, reject) => {
        setTimeout(() => resolve("done!"), 1000);
    });
    let result = await promise; // wait until the promise resolves
    console.log(result); // "done!"
    let promise6 = await wait();
    let promise7 = await wait2();
}
f();
This sample should help you gain a basic understanding of how async/await works. Here are a few resources for further reading:
Promises and Async
Mozilla References

Single thread synchronous and asynchronous confusion

Assume makeBurger() will take 10 seconds
In synchronous program,
function serveBurger() {
    makeBurger();
    makeBurger();
    console.log("READY") // Assume takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for Node.js, let's say we make an async version, makeBurgerAsync(), which also takes 10 seconds.
function serveBurger() {
    makeBurgerAsync(function(count) {
    });
    makeBurgerAsync(function(count) {
    });
    console.log("READY") // Assume takes 5 seconds to log.
}
Since it is a single thread, I have trouble imagining what is really going on behind the scenes.
So for sure when the function runs, both async functions will enter the event loop and console.log("READY") will get executed straight away.
But while console.log("READY") is executing, no work is really done for either async function, right? The single thread is hogged by console.log for 5 seconds.
After console.log is done, the CPU will have time to switch between both async functions so that it can run a bit of each function each time.
So according to this, the async version doesn't necessarily result in faster execution; async is probably slower due to switching between functions in the event loop? I imagine that, at the end of the day, everything will be spread over a single thread, which will be the same thing as the synchronous version?
I am probably missing some very big concept so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are things like querying a DB. Basically nodejs will just say "Hey DB, handle this for me while I do something else". However, the case I don't understand is a self-defined callback function within nodejs itself.
EDIT2
function makeBurger() {
    var count = 0;
    count++; // 1 time
    ...
    count++; // 999999 times
    return count;
}
function makeBurgerAsync(callback) {
    var count = 0;
    count++; // 1 time
    ...
    count++; // 999999 times
    callback(count);
}
In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations, then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue and then when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
Let's examine a real-world async situation rather than your pseudo-code:
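const fs = require("fs");
const http = require("http");

// fname and someUrl are placeholders for a real file path and URL
function doSomething() {
    fs.readFile(fname, function(err, data) {
        console.log("file read");
    });
    setTimeout(function() {
        console.log("timer fired");
    }, 100);
    // http.get() passes only the response object to its callback
    http.get(someUrl, function(response) {
        console.log("http get finished");
    });
    console.log("READY");
}
doSomething();
console.log("AFTER");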
Here's what happens step-by-step:
1. fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in node.js and it will run there in a separate thread.
2. Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
3. http.get() is called. This will send the desired http request and then immediately return to further execution.
4. console.log("READY") will run.
5. The three asynchronous operations will complete in an indeterminate order (whichever one completes its operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
6. For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using its own thread, it will insert an event in the node.js event queue.
7. Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readFile() event is in the queue so it grabs that and calls the callback associated with that. That callback executes and finishes.
8. Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
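One possible shape for that "additional code", sketched with timers standing in for the three operations so it runs without a file or a network: wrap each operation in a promise and wait on them with Promise.all(). The delay() helper and the chosen delays are assumptions made purely for illustration.

```javascript
// Hypothetical helper: resolves with `label` after `ms` milliseconds and
// records the order in which operations actually finish.
function delay(ms, label, finishedOrder) {
  return new Promise((resolve) => {
    setTimeout(() => {
      finishedOrder.push(label);
      resolve(label);
    }, ms);
  });
}

const finishedOrder = [];

// Start all three operations without waiting for any of them.
const allDone = Promise.all([
  delay(60, "file read", finishedOrder),
  delay(10, "timer fired", finishedOrder),
  delay(110, "http get finished", finishedOrder),
]);

allDone.then((results) => {
  // `results` follows the order of the input array, not the completion order.
  console.log("completed in order:", finishedOrder);
  console.log("results:", results);
});
```

Promise.all() gives a single point where all three are known to be done, and its result array stays in input order even though the operations complete in a different order.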
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).
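As a rough sketch of the "pass it off to something else" option, assuming a node.js version that ships the worker_threads module: the counting loop from the question runs in two worker threads, and each result comes back as a message. The countInWorker() name and the inline worker source are made up for this example.

```javascript
const { Worker } = require("worker_threads");

// Source of the worker, kept inline so the example is a single file.
// It runs the CPU-bound loop and posts the result back to the main thread.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let count = 0;
  for (let i = 0; i < workerData.iterations; i++) count++;
  parentPort.postMessage(count);
`;

// Hypothetical wrapper: runs the loop in its own thread, resolves with the count.
function countInWorker(iterations) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { iterations },
    });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

// Both loops run outside the main thread, so they can overlap with each other
// and with whatever the main thread does next.
const bothDone = Promise.all([countInWorker(999999), countInWorker(999999)]);
bothDone.then((counts) => console.log("READY", counts));
```

Starting a worker has its own cost, so this only pays off when the work is substantially heavier than the loop shown here.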

What is the fastest way to complete a list of tasks with node.js async?

I have an array of fs.writeFile png jobs with the png headers already removed like so
canvas.toDataURL().replace(/^data:image\/\w+;base64,/,"")
jobs array like this
jobs=[['location0/file0'],['location1/file1'],['location2/file2'],['location3/file3']];
I have just started to use async and was looking at their docs and there are lots of methods
queue looks interesting and parallel..
Right now I handle my jobs (in an async.waterfall) like so
function(callback){//part of waterfall
    (function fswritefile(){
        if(jobs.length!==0){
            var job=jobs.shift();
            // Buffer.from() replaces the deprecated new Buffer() constructor
            fs.writeFile(job[0],Buffer.from(job[1],'base64'),function(e){if(e){console.log(e);}else{fswritefile();}})
        }
        else{callback();}
    })();
},//end of waterfall part
Could this be done more efficiently/faster using this module?
async.waterfall will process jobs sequentially. I think you could do everything in parallel with async.each:
async.each(jobs, function (job, done) {
    // Buffer.from() replaces the deprecated new Buffer() constructor
    var data = Buffer.from(job[1], 'base64');
    fs.writeFile(job[0], data, done);
}, function (err) {
    // …
});
All jobs will be started in parallel. However, node.js limits how many file system operations actually run at the same time: they are executed on the libuv thread pool, which has 4 threads by default (the size can be changed with the UV_THREADPOOL_SIZE environment variable).
EDIT: No matter what you do, node.js will limit the number of concurrent operations on the fs. The main reason is that you typically have only one disk, and it would be inefficient to attempt more.
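The same idea can be sketched without the async module, assuming a node.js version that provides fs.promises. The writeAllJobs() name, the temporary directory and the sample data are invented for the demo; the [path, base64Data] job shape follows the job[0]/job[1] usage above.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Writes every [filePath, base64Data] job. All writes are started before any
// of them is awaited, so they are in flight at the same time.
function writeAllJobs(jobs) {
  return Promise.all(
    jobs.map(([filePath, base64Data]) =>
      fs.promises.writeFile(filePath, Buffer.from(base64Data, "base64"))
    )
  );
}

// Demo with made-up data in a temporary directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "jobs-"));
const demoJobs = [
  [path.join(demoDir, "file0"), Buffer.from("zero").toString("base64")],
  [path.join(demoDir, "file1"), Buffer.from("one").toString("base64")],
];

// Read the files back once every write has completed.
const written = writeAllJobs(demoJobs).then(() =>
  demoJobs.map(([filePath]) => fs.readFileSync(filePath, "utf8"))
);
written.then((contents) => console.log(contents));
```

Promise.all() rejects as soon as any single write fails, which may or may not be what you want for a batch of independent files.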

nodejs - When to use Asynchronous instead of Synchronous function?

I've read some articles about this stuff. However, I'm still stuck on one point. For example, I have two functions:
function getDataSync(){
    var data = db.query("some query");
    return JSON.stringify(data);
}
function getDataAsync(){
    return db.query("some query",function(result){
        return JSON.stringify(result);
    });
}
People say that asynchronous programming is recommended for IO-bound work. However, I can't see any difference in this case. The async one just seems uglier.
What's wrong with my point?
nodejs is asynchronous by default, which means that it won't necessarily finish your statements in the order they are written, as other languages do. For example:
database.query("SELECT * FROM hugetable", function(rows) {
    var result = rows;
});
console.log("Hello World");
In other languages, execution would wait until the query statement finished.
But in nodejs, the query is started, runs separately, and execution continues on to log Hello World to the screen.
So when you write
function getDataSync(){
    var data = db.query("some query");
    return JSON.stringify(data);
}
and db.query() is asynchronous, the function will return before db.query() has produced its data.
function getDataAsync(){
    return db.query("some query",function(result){
        return JSON.stringify(result);
    });
}
In the node.js way, the function passed as a parameter is called a callback: it is called when db.query() finishes its work.
We use callbacks in nodejs because we don't know when db.query() will finish (asynchronous operations don't necessarily finish in the order they were started), but when it finishes it will call the callback.
In your first example, the thread will get blocked at this point, until the data is retrieved from the db
db.query("some query");
In the second example, the thread will not get blocked but it will be available to serve more requests.
{
    return JSON.stringify(result);
}
This function will be called as soon as the data is available from the db.
That's why node.js is called non-blocking IO which means your thread is never blocked.
Asynchronous operations are used when some operation would otherwise block execution. Blocking is not a problem in a multi-threaded application or server, but Node.js runs your code on a single thread, which should not be blocked by a single operation. If it is blocked by an operation like db.query("some query");, Node.js will wait for it to finish and do nothing else in the meantime.
It is similar to standing idle in front of a rice cooker until the rice is cooked. Generally, you would do other things while the rice is cooking, and when the whistle blows, you can do whatever you want with the cooked rice. Similarly, Node.js hands the asynchronous operation to the event loop, which notifies us when the operation is over. Meanwhile, Node.js can process other work, like serving other connections.
You said it is ugly. Asynchronous does not mean using only callbacks. You can use promises, coroutines, or the async library.
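To illustrate the promise option, here is a sketch in which a fake db object stands in for a real database client. Its error-first callback signature is an assumption, and a real driver may already return promises, in which case the util.promisify() step is unnecessary.

```javascript
const { promisify } = require("util");

// Hypothetical stand-in for a database client with an error-first callback API.
const db = {
  query(sql, callback) {
    // Simulate a query that completes a little later with some rows.
    setTimeout(() => callback(null, [{ id: 1, sql }]), 10);
  },
};

// Turn the callback-style method into one that returns a promise.
const queryAsync = promisify(db.query).bind(db);

// Reads top to bottom like the synchronous version, but does not block the thread.
async function getData() {
  const rows = await queryAsync("some query");
  return JSON.stringify(rows);
}

const dataPromise = getData();
dataPromise.then((json) => console.log(json));
```

While getData() is suspended at the await, the thread stays free to serve other work.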

Why does Node.js have both async & sync version of fs methods?

In Node.js, I can do almost any async operation one of two ways:
var file = fs.readFileSync('file.html')
or...
var file
fs.readFile('file.html', function (err, data) {
    if (err) throw err
    console.log(data)
})
Is the only benefit of the async one custom error handling? Or is there really a reason to have the file read operation non-blocking?
These exist mostly because node itself needs them to load your program's modules from disk when your program starts. More broadly, it is typical to do a bunch of synchronous setup IO when a service is initially started but prior to accepting network connections. Once the program is ready to go (has its TLS cert loaded, config file has been read, etc), then a network socket is bound and at that point everything is async from then on.
Asynchronous calls allow for the branching of execution chains and the passing of results through that execution chain. This has many advantages.
For one, the program can execute two or more calls at the same time, and do work on the results as they complete, not necessarily in the order they were first called.
For example if you have a program waiting on two events:
var file1;
var file2;
//Let's say this takes 2 seconds
fs.readFile('bigfile1.jpg', function (err, data) {
    if (err) throw err;
    file1 = data;
    console.log("FILE1 Done");
});
//let's say this takes 1 second.
fs.readFile('bigfile2.jpg', function (err, data) {
    if (err) throw err;
    file2 = data;
    console.log("FILE2 Done");
});
console.log("DO SOMETHING ELSE");
In the case above, bigfile2.jpg will return first and something will be logged after only 1 second. So your output timeline might be something like:
#0:00: DO SOMETHING ELSE
#1:00: FILE2 Done
#2:00: FILE1 Done
Notice above that "DO SOMETHING ELSE" was logged right away. File2 finished first, after only 1 second, and File1 was done at 2 seconds. Everything was done within a total of 2 seconds, though the callback order was unpredictable.
Whereas doing it synchronously it would look like:
file1 = fs.readFileSync('bigfile1.jpg');
console.log("FILE1 Done");
file2 = fs.readFileSync('bigfile2.jpg');
console.log("FILE2 Done");
console.log("DO SOMETHING ELSE");
And the output might look like:
#2:00: FILE1 Done
#3:00: FILE2 Done
#3:00: DO SOMETHING ELSE
Notice it takes a total of 3 seconds to execute, but the order is how you called it.
Doing it synchronously typically takes longer for everything to finish (especially for external processes like filesystem reads, writes or database requests) because you are waiting for one thing to complete before moving onto the next. Sometimes you want this, but usually you don't. It can be easier to program synchronously sometimes though, since you can do things reliably in a particular order (usually).
Executing filesystem methods asynchronously however, your application can continue executing other non-filesystem related tasks without waiting for the filesystem processes to complete. So in general you can continue executing other work while the system waits for asynchronous operations to complete. This is why you find database queries, filesystem and communication requests to generally be handled using asynchronous methods (usually). They basically allow other work to be done while waiting for other (I/O and off-system) operations to complete.
When you get into more advanced asynchronous method chaining you can do some really powerful things like creating scopes (using closures and the like) with a little amount of code and also create responders to certain event loops.
Sorry for the long answer. There are many reasons why you have the option to do things synchronously or not, but hopefully this will help you decide whether either method is best for you.
The benefit of the asynchronous version is that you can do other stuff while you wait for the IO to complete.
fs.readFile('file.html', function (err, data) {
    if (err) throw err;
    console.log(data);
});
// Do a bunch more stuff here.
// All this code will execute before the callback to readFile,
// but the IO will be happening concurrently. :)
You want to use the async version when you are writing event-driven code where responding to requests quickly is paramount. The canonical example for Node is writing a web server. Let's say you have a user making a request which is such that the server has to perform a bunch of IO. If this IO is performed synchronously, the server will block. It will not answer any other requests until it has finished serving this request. From the perspective of the users, performance will seem terrible. So in a case like this, you want to use the asynchronous versions of the calls so that Node can continue processing requests.
The sync version of the IO calls is there because Node is not used only for writing event-driven code. For instance, if you are writing a utility which reads a file, modifies it, and writes it back to disk as part of a batch operation, like a command line tool, using the synchronous version of the IO operations can make your code easier to follow.

Resources