What is a blocking function? - node.js

What is a blocking function or a blocking call?
This is a term I see again and again when referring to Node.js or realtime processing languages.

A function that stops script execution until it ends.
For example, if I had a function in my language that was used to write to a file, like so:
fwrite(file, "Contents");
print("Wrote to file!");
The print statement would only be executed once the file has been written to the disk. The whole program is halted on this instruction. This isn't noticeable for small enough writes, but imagine I had a huge blob to write to the file, one that took many seconds:
fwrite(file, blob);
print("Wrote to file!");
The print statement would only be executed after a few seconds of writting, and the whole program would be stopped for that time. In Node.js, this stuff is done asynchronously, using events and callbacks. Our example would become:
fwrite(file, blob, function() {
print("Wrote to file!");
});
print("Do other stuff");
Where the third parameter is a function to be called once the file has been written. The print statement located after the write function would be called immediately after, whether or not the file has been written yet. So if we were to write a huge enough blob, the output might look like this:
Do other stuff
Wrote to file!
This makes applictions very fast because you're not waiting on a client message, a file write or other. You can keep on processing the data in a parallel manner. This is considered by many one of the strengths of Node.js.

var block = function _block() {
while(true) {
readInputs();
compute();
drawToScreen();
}
}
A blocking function basically computes forever. That's what it means by blocking.
Other blocking functions would wait for IO to occur
a non-blocking IO system means a function starts an IO action, then goes idle then handles the result of the IO action when it happens.
It's basically the difference between a thread idling and sleeping.

A blocking call is one that doesn't allow processing to continue until it returns to the calling thread - this is also referred to as a synchronous call. Asynchronous on the other hand means that threads (and code) can execute at the same time (concurrently).

Related

Doubts about event loop , multi-threading, order of executing readFile() in Node.js?

fs.readFile("./large.txt", "utf8", (err, data) => {
console.log('It is a large file')
//this file has many words (11X MB).
//It takes 1-2 seconds to finish reading (usually 1)
});
fs.readFile("./small.txt","utf8", (err, data) => {
for(let i=0; i<99999 ;i++)
console.log('It is a small file');
//This file has just one word.
//It always takes 0 second
});
Result:
The console will always first print "It is a small file" for 99999 times (it takes around 3 seconds to finish printing).
Then, after they are all printed, the console does not immediately print "It is a large file". (It is always printed after 1 or 2 seconds).
My thought:
So, it seems that the first readFile() and second readFile() functions do not run in parallel. If the two readFile() functions ran in parallel, then I would expect that after "It is a small file" was printed for 99999 times,
the first readFile() is finished reading way earlier (just 1 second) and the console would immediately print out the callback of the first readFile() (i.e. "It is a large file".)
My questions are :
(1a) Does this mean that the first readFile() will start to read file only after the callback of second readFile() has done its work?
(1b) To my understanding, in nodeJs, event loop passes the readFile() to Libuv multi-thread. However, I wonder in what order they are passed. If these two readFile() functions do not run in parallel, why is the second readFile() function always executed first?
(2) By default, Libuv has four threads for Node.js. So, here, do these two readFile() run in the same thread? Among these four threads, I am not sure whether there is only one for readFile().
Thank you very much for spending your time! Appreciate!
I couldn't possibly believe that node would delay the large file read until the callback for the small file read had completed, and so I did a little more instrumentation of your example:
const fs = require('fs');
const readLarge = process.argv.includes('large');
const readSmall = process.argv.includes('small');
if (readLarge) {
console.time('large');
fs.readFile('./large.txt', 'utf8', (err, data) => {
console.timeEnd('large');
if (readSmall) {
console.timeEnd('large (since busy wait)');
}
});
}
if (readSmall) {
console.time('small');
fs.readFile('./small.txt', 'utf8', (err, data) => {
console.timeEnd('small');
var stop = new Date().getTime();
while(new Date().getTime() < stop + 3000) { ; } // busy wait
console.time('large (since busy wait)');
});
}
(Note that I replaced your loop of console.logs with a 3s busy wait).
Running this against node v8.15.0 I get the following results:
$ node t small # read small file only
small: 0.839ms
$ node t large # read large file only
large: 447.348ms
$ node t small large # read both files
small: 3.916ms
large: 3252.752ms
large (since busy wait): 247.632ms
These results seem sane; the large file took ~0.5s to read on its own, but when the busy waiting callback interfered for 2s, it completed relatively soon (~1/4s) thereafter. Tweaking the length of the busy wait keeps this relatively consistent, so I'd be willing to just say this was some kind of scheduling overhead and not necessarily a sign that the large file I/O had not been running during the busy wait.
But then I ran the same program against node 10.16.3, and here's what I got:
$ node t small
small: 1.614ms
$ node t large
large: 1019.047ms
$ node t small large
small: 3.595ms
large: 4014.904ms
large (since busy wait): 1009.469ms
Yikes! Not only did the large file read time more than double (to ~1s), it certainly appears as if no I/O at all had been completed before the busy wait concluded! i.e., it sure looks like the busy wait in the main thread prevented any I/O at all from happening on the large file.
I suspect that this change from 8.x to 10.x is a result of this "optimization" in Node 10: https://github.com/nodejs/node/pull/17054. This change, which splits the read of large files into multiple operations, seems to be appropriate for smoothing performance of the system in general purpose cases, but it is likely exacerbated by the unnatural long main thread processing / busy wait in this scenario. Presumably, without the main thread yielding, the I/O is not getting a chance to advance to the next range of bytes in the large file to be read.
It would seem that, with Node 10.x, it is important to have a responsive main thread (i.e., one that yields frequently, and doesn't busy wait like in this example) in order to sustain I/O performance of large file reads.
(1a) Does this mean that the first readFile() will start to read file only after the callback of second readFile() has done its work?
No. Each readFile() actually consists of multiple steps (open file, read chunk, read chunk ... close file). The logic flow between steps is controlled by Javascript code in the node.js fs library. But, a portion of each step is implemented by native threaded code in libuv using a thread pool.
So, the first step of the first readFile() will be initiated and then control is returned back to the JS interpreter. Then, the first step of the second readFile() will be initiated and then control returned back to the JS interpreter. It can ping pong back and forth between progress in the two readFile() operations as long as the JS interpreter isn't kept busy. But, if the JS interpreter does get busy for awhile, it will stall further progress when the current step that's proceeding in the background completes. There's a full step-by-step chronology at the end of the answer if you want to follow the details of each step.
(1b) To my understanding, in nodeJs, event loop passes the readFile() to Libuv multi-thread. However, I wonder in what order they are passed. If these two readFile() functions do not run in parallel, why is the second readFile() function always executed first?
fs.readFile() itself is not implemented in libuv. It's implemented as a series of individual steps in node.js Javascript. Each individual step (open file, read chunk, close file) is implemented in libuv, but Javascript in the fs library controls the sequencing between steps. So, think of fs.readfile() as a series of calls to libuv. When you have two fs.readFile() operations in flight at the same time, each will have some libuv operation going at any given time and one step for each fs.readFile() can be proceeding in parallel due to the thread pool implementation in libuv. But, between each step in the process, control comes back the JS interpreter. So, if the interpreter gets busy for some extended portion of time, then further progress in scheduling the next step of the other fs.readFile() operation is stalled.
(2) By default, Libuv has four threads for Node.js. So, here, do these two readFile() run in the same thread? Among these four threads, I am not sure whether there is only one for readFile().
I think this is covered in the previous two explanations. readFile() itself is not implemented in native code of libuv. Instead, it's written in Javascript with calls to open, read, close operations that are written in native code and use libuv and the thread pool.
Here's a full accounting of what's going on. To fully understand, one needs to know about these:
Main Concepts
The single threaded, non-pre-emptive nature of node.js running your Javascript (assuming no WorkerThreads are manually coded here - which they aren't).
The multi-threaded, native code of the fs module's file I/O and how that works.
How native code asynchronous operations communicate completion via the event queue and how event loop scheduling works when the JS interpreter is busy doing something.
Asynchronous, Non-Blocking
I presume you know that fs.readFile() is asynchronous and non-blocking. That means when you call it, all it does is initiate an operation to read the file and then it goes right onto the next line of code at the top level after the fs.readFile() (not the code inside the callback you pass to it).
So, a condensed version of your code is basically this:
fs.readFile(x, funcA);
fs.readFile(y, funcB);
If we added some logging to this:
function funcA() {
console.log("funcA");
}
function funcB() {
console.log("funcB");
}
function spin(howLong) {
let finishTime = Date.now() + howLong;
// spin until howLong ms passes
while (Date.now() < finishTime) {}
}
console.log("1");
fs.readFile(x, funcA);
console.log("2");
fs.readFile(y, funcB);
console.log("3");
spin(30000); // spin for 30 seconds
console.log("4");
You would see either this order:
1
2
3
4
A
B
or this order:
1
2
3
4
B
A
Which of the two it was would just depend upon the indeterminate race between the two fs.readFile() operations. Either could happen. Also, notice that 1, 2, 3 and 4 are all logged before any asynchronous completion events can occur. This is because the single-threaded, non-pre-emptive JS interpreter main thread is busy executing Javascript. It won't pull the next event out of the event queue until it's done executing this piece of Javascript.
Libuv Thread Pool
As you appear to already know, the fs module uses a libuv thread pool for running file I/O. That's independent of the main JS thread so those read operations can proceed independently from further JS execution. Using native code, file I/O will communicate with the event queue when they are done to schedule their completion callback.
Indeterminate Race Between Two Asynchronous Operations
So, you've just created an indeterminate race between the two fs.readFile() operations that are likely each running in their own thread. A small file is much more likely to complete first before the larger file because the larger file has a lot more data to read from the disk.
Whichever fs.readFile() finishes first will insert its callback into the event queue first. When the JS interpreter is free, it will pick the next event out of the event queue. Whichever one finishes first gets to run its callback first. Since the small file is likely to finish first (which is what you are reporting), it gets to run its callback. Now, when it is running its callback, this is just Javascript and even though the large file may finish and insert its callback into the event queue, that callback can't run until the callback from the small file finishes. So, it finishes and THEN the callback from the large file gets to run.
In general, you should never write code like this unless you don't care at all what order the two asynchronous operations finish in because it's an indeterminate race and you cannot count on which one will finish first. Because of the asynchronous non-blocking nature of fs.readFile(), there is no guarantee that the first file operation initiated will finish first. It's no different than firing off two separate http requests one after the other. You don't know which one will complete first.
Step By Step Chronology
Here's a step by step chronology of what happens:
You call fs.readFile("./large.txt", ...);
In Javascript code, that initiates opening the large.txt file by calling native code and then returns. The opening of the actual file is handled by libuv in native code and when that is done, an event will be inserted into the JS event queue.
Immediately after that operation is initiated, then that first fs.readFile() returns (not yet done yet, still processing internally).
Now the JS interpreter picks up at the next line of code and runs fs.readFile("./small.txt", ...);
In Javascript code, that initiates opening the small.txt file by calling native code and then returns. The opening of the actual file is handled by libuv in native code and when that is done, an event will be inserted into the JS event queue.
Immediately after that operation is initiated, then that second fs.readFile() returns (not yet done yet, still processing internally).
The JS interpreter is actually free to run any following code or process any incoming events.
Then, some time later, one of the two fs.readFile() operations finishes its first step (opening the file), an event is inserted into the JS event queue and when the JS interpreter has time, a callback is called. Since opening each file is about the same operation time, it's likely that the open operation for the large.txt file finishes first, but that isn't guaranteed.
After the file open succeeds, it initiates an asynchronous operation to read the first chunk from the file. This again is asynchronous and is handled by libuv so as soon as this is initiated, it returns control back to the JS interpreter.
The second file open likely finises next and it does the same thing as the first, initiates reading the first chunk of data from disk and returns control back to the JS interpreter.
Then, one of these two chunk reads finishes and inserts an event into the event queue and when the JS interpreter is free, a callback is called to process that. At this point, this could be either the large or small file, but for purposes of simplicity of explanation, lets assume the first chunk of the large file finishes first. It will buffer that chunk, see that there is more data to read and will initiate another asynchronous read operation and then return control back to the JS interpreter.
Then, the other first chunk read finishes. It will buffer that chunk and see that there is no more data to read. At this point, it will issue a file close operation which is again handled by libuv and control is returned back to the JS interpreter.
One of the two previous operations completes (a second block read from large.txt or a file close of small.txt) and its callback is called. Since the close operation doesn't have to actually touch the disk (it just goes into the OS), let's assume the close operation finishes first for purposes of explanation. That close triggers the end of the fs.ReadFile() for small.txt and calls the completion callback for that.
So, at this point, small.txt is done and large.txt has read one chunk from its file and is awaiting completion of the second chunk to read.
Your code now executes the for loop that takes whatever time that takes.
By the point that finishes and the JS interpreter is free again, the 2nd file read from large.txt is probably done so the JS interpreter finds it's event in the event queue and executes a callback to do some more processing on reading more chunks from that file.
The process of reading a chunk, returning control back to the interpreter, waiting for the next chunk completion event and then buffering that chunk continues until all the data has been read.
Then a close operation is initiated for large.txt.
When that close operation is done, the callback for the fs.readFile() for large.txt is called and your code that is timing large.txt will measure completion.
So, because the logic of fs.readFile() is implemented in Javascript with a number of discrete asynchronous steps with each one ultimately handled by libuv (open file, read chunk - N times, close file), there will be an interleaving of the work between the two files. The reading of the smaller file will finish first just because it has fewer and smaller read operations. When it finishes, the large file will still have multiple more chunks to read and a close operation left. Because the multiple steps of fs.readFile() are controlled through Javascript, when you do the long for loop in the small.txt completion, you are stalling the fs.readFile() operation for the large.txt file too. Whatever chunk read was in progress when that loop happened will complete in the background, but the next chunk read won't get issued until that small file callback completes.
It appears that there would be an opportunity for node.js to improve the responsiveness of fs.readFile() in competitive circumstances like this if that operation was rewritten entirely in native code so that one native code operation could read the contents of the whole file rather than all these transitions back and forth between the single threaded main JS thread and libuv. If this was the case, the big for loop wouldn't stall the progress of large.txt because it would be progressing entirely in a libuv thread rather than waiting for some cycles from the JS interpreter in order to get to its next step.
We can theorize that if both files were able to be read in one chunk, then not much would get stalled by the long for loop. Both files would get opened (which should take approximately the same time for each). Both operations would initiate a read of their first chunk. The read for the smaller file would likely complete first (less data to read), but actually this depends upon both OS and disk controller logic. Because the actual reads are handed off to threaded code, both reads will be pending at the same time. Assuming the smaller read finishes first, it would fire completion and then during the busy loop the large read would finish, inserting an event in the event queue. When the busy loop finishes, the only thing left to do on the larger file (but still something that can was read in one chunk) would be to close the file which is a faster operation.
But, when the larger file can't be read in one chunk and needs multiple chunks of reading, that's why its progress really gets stalled by the busy loop because a chunk finishes, but the next chunk doesn't get scheduled until the busy loop is done.
Testing
So, let's test out all this theory. I created two files. small.txt is 558 bytes. large.txt is 255,194,500 bytes.
Then, I wrote the following program to time these and allow us to optionally do a 3 second spin loop after the small one finishes.
const fs = require('fs');
let doSpin = false; // -s will set this to true
let fname = "./large.txt";
for (let i = 2; i < process.argv.length; i++) {
let arg = process.argv[i];
console.log(`"${arg}"`);
if (arg.startsWith("-")) {
switch(arg) {
case "-s":
doSpin = true;
break;
default:
console.log(`Unknown arg ${arg}`);
process.exit(1);
break;
}
} else {
fname = arg;
}
}
function padDecimal(num, n = 3) {
let str = num.toFixed(n);
let index = str.indexOf(".");
if (index === -1) {
str += ".";
index = str.length - 1;
}
let zeroesToAdd = n - (str.length - index);
while (zeroesToAdd-- >= 0) {
str += "0";
}
return str;
}
let startTime;
function log(msg) {
if (!startTime) {
startTime = Date.now();
}
let diff = (Date.now() - startTime) / 1000; // in seconds
console.log(padDecimal(diff), ":", msg)
}
function funcA(err, data) {
if (err) {
log("error on large");
log(err);
return;
}
log("large completed");
}
function funcB(err, data) {
if (err) {
log("error on small");
log(err);
return;
}
log("small completed");
if (doSpin) {
spin(3000);
log("spin completed");
}
}
function spin(howLong) {
let finishTime = Date.now() + howLong;
// spin until howLong ms passes
while (Date.now() < finishTime) {}
}
log("start");
fs.readFile(fname, funcA);
log("large initiated");
fs.readFile("./small.txt", funcB);
log("small initiated");
Then (using node v12.13.0), I ran it both with and without the 3 second spin. Without the spin, I get this output:
0.000 : start
0.015 : large initiated
0.016 : small initiated
0.021 : small completed
0.240 : large completed
This shows a 0.219 second delta between the time to complete small and large (while running both at the same time).
Then, inserting the 3 second delay, we get this output:
0.000 : start
0.003 : large initiated
0.004 : small initiated
0.009 : small completed
3.010 : spin completed
3.229 : large completed
We have the exact same 0.219 second delta between the time to complete the small and the large (while running both at the same time). This shows that the large fs.readFile() essentially made no progress during the 3 second spin. It's progress was completely blocked. As we've theorized in the previous explanation, this is apparently because the progression from one chunked read to the next is written in Javascript and while the spin loop is running, that progression to the next chunk is blocked so it can't make any further progress.
How Big A File Makes Big File Finish Second?
If you look in the code for fs.readFile() in the source for node v12.13.0, you can find that the chunk size it reads is 512 * 1024 which is 512k. So, in theory, it's possible that the larger file might finish first if it can be read in one chunk. Whether that actually happens or not depends upon some OS and disk implementation details, but I thought I'd try it on my laptop running a current version of Windows 10 with an SSD drive.
I found that, for a 255k "large" file, it does finish before the small file (essentially in execution order). So, because the large file read is started before the small file read, even though it has more data to read, it will still finish before the small file.
0.000 : start
0.003 : large initiated
0.003 : small initiated
0.007 : large completed
0.008 : small completed
Keep in mind, this is OS and disk dependent so this is not guaranteed.
File I/O in Node.js runs in separate thread. But this does not matter. Node.js always executes all callbacks in the main thread. I/O callbacks are never executed in a separate thread (the file read operation is done in a separate thread then when it is finished will signal the main thread to run your callback). This essentially makes node.js single-threaded because all the code you write runs in the main thread (we're of course ignoring the worker_threads module/API which allows you to manually execute code in separate threads).
But the bytes in the files are read in parallel (or as parallel as your hardware allows - depending on the number of free DMA channels, which disk each file is from etc). What is parallel is the wait. Asynchronous I/O in any language (node.js, Java, C++, Python etc.) is basically an API that allows you to wait in parallel but handle events in a single thread. There is a word for this kind of parallel: concurrent. It is essentially parallel wait (while data is handled in parallel by your hardware) but not parallel code execution.
I think that you understand the behavior of event loop and libuv, don't lose your way.
My answers :
1a) Of course the two read file are executed in two different threads , I tried to run your code replacing a large file with a small one and the output was
It is a large file
It is a small file
1b) The second call just end before in your case and then the callback is invoked before.
2 ) As you said , libuv has four threads by default , but be sure that the default are not changed setting the env variable UV_THREADPOOL_SIZE ( http://docs.libuv.org/en/v1.x/threadpool.html )
I tried to work with a large and a big file , to read the big file my PC take 23/25 ms , to read the small file it take 8/10 ms.
When I try to read both the process terminate in 26/27 ms and this demonstrate that the two read file are executed in parallel .
Try to measure the time that your code take from small file callback to large file callback :
console.log(process.env.UV_THREADPOOL_SIZE)
const fs = require('fs')
const start = Date.now()
let smallFileEnd
fs.readFile("./alphabet.txt", "utf8", (err, data) => {
console.log('It is a large file')
console.log(`From the start to now are passed ${Date.now() - start} ms`)
console.log(`From the small file end to now are passed ${Date.now() - smallFileEnd} ms`)
//this file has many words (11X MB).
//It takes 1-2 seconds to finish reading (usually 1)
// 18ms to execute
});
fs.readFile("./stackoverflow.js","utf8", (err, data) => {
for(let i=0; i<99999 ;i++)
if(i === 99998){
smallFileEnd = Date.now()
console.log('is a small file ')
console.log(`From the start to now are passed ${Date.now() - start} ms`)
// 4/7 ms to execute
}
});

Can two callbacks execute the code at the same time(Parrallely) or not?

I am doing an IO wait operation inside a for loop now the thing is when all of the operations terminates I want to send the response to the server. Now I was just wondering that suppose two IO operation terminates exactly at the same time now can they execute code at the same time(parallel) or will they execute serially?
As far as I know, as Node is Concurrent but not the Parallel language so I don't think they will execute at the same time.
node.js runs Javascript with a single thread. That means that two pieces of Javascript can never be running at the exact same moment.
node.js processes I/O completion using an event queue. That means when an I/O operation completes, it places an event in the event queue and when that event gets to the front of the event queue and the JS interpreter has finished whatever else it was doing, then it will pull that event from the event queue and call the callback associated with it.
Because of this process, even if two I/O operations finish at basically the same moment, one of them will put its completion event into the event queue before the other (access to the event queue internally is likely controlled by a mutex so one will get the mutex before the other) and that one's completion callback will get into the event queue first and then called before the other. The two completion callbacks will not run at the exact same time.
Keep in mind that more than one piece of Javascript can be "in flight" or "in process" at the same time if it contains non-blocking I/O operations or other asynchronous operations. This is because when you "wait" for an asynchronous operation to complete in Javscript, you return control back to the system and you then resume processing only when your completion callback is called. While the JS interpreter is waiting for an asynchronous I/O operation to complete and the associated callback to be called, then other Javascript can run. But, there's still only one piece of Javascript actually ever running at a time.
As far as I know, as Node is Concurrent but not the Parallel language so I don't think they will execute at the same time.
Yes, that's correct. That's not exactly how I'd describe it since "concurrent" and "parallel" don't have strict technical definitions, but based on what I think you mean by them, that is correct.
you can use Promise.all :
let promises = [];
for(...)
{
promises.push(somePromise); // somePromise represents your IO operation
}
Promise.all(promises).then((results) => { // here you send the response }
You don't have to worry about the execution order.
Node.js is designed to be single thread. So basically there is no way that 'two IO operation terminates exactly at the same time' could happen. They will just finish one by one.

Single thread synchronous and asynchronous confusion

Assume makeBurger() will take 10 seconds
In synchronous program,
function serveBurger() {
makeBurger();
makeBurger();
console.log("READY") // Assume takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for NodeJs lets say we make an async version of makeBurgerAsync() which also takes 10 seconds.
function serveBurger() {
makeBurgerAsync(function(count) {
});
makeBurgerAsync(function(count) {
});
console.log("READY") // Assume takes 5 seconds to log.
}
Since it is a single thread. I have troubling imagine what is really going on behind the scene.
So for sure when the function run, both async functions will enter event loops and console.log("READY") will get executed straight away.
But while console.log("READY") is executing, no work is really done for both async function right? Since single thread is hogging console.log for 5 seconds.
After console.log is done. CPU will have time to switch between both async so that it can run a bit of each function each time.
So according to this, the function doesn't necessarily result in faster execution, async is probably slower due to switching between event loop? I imagine that, at the end of the day, everything will be spread on a single thread which will be the same thing as synchronous version?
I am probably missing some very big concept so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are like query DB etc. Basically nodejs will just say "Hey DB handle this for me while I'll do something else". However, the case I am not understanding is the self-defined callback function within nodejs itself.
EDIT2
function makeBurger() {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
return count;
}
function makeBurgerAsync(callback) {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
callback(count);
}
In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations, then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue and then when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
Let's examine a real-world async situation rather than your pseudo-code:
function doSomething() {
fs.readFile(fname, function(err, data) {
console.log("file read");
});
setTimeout(function() {
console.log("timer fired");
}, 100);
http.get(someUrl, function(err, response, body) {
console.log("http get finished");
});
console.log("READY");
}
doSomething();
console.log("AFTER");
Here's what happens step-by-step:
fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in node.js and it will run there in a separate thread.
Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
http.get() is called. This will send the desired http request and then immediately return to further execution.
console.log("READY") will run.
The three asynchronous operations will complete in an indeterminate order (whichever one completes it's operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using it's own thread, it will insert an event in the node.js event queue.
Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readfile() event is in the queue so it grabs that and calls the callback associated with that. That callback executes and finishes.
Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).

Why does Node.js have both async & sync version of fs methods?

In Node.js, I can do almost any async operation one of two ways:
var file = fs.readFileSync('file.html')
or...
var file
fs.readFile('file.html', function (err, data) {
if (err) throw err
console.log(data)
})
Is the only benefit of the async one custom error handling? Or is there really a reason to have the file read operation non-blocking?
These exist mostly because node itself needs them to load your program's modules from disk when your program starts. More broadly, it is typical to do a bunch a synchronous setup IO when a service is initially started but prior to accepting network connections. Once the program is ready to go (has it's TLS cert loaded, config file has been read, etc), then a network socket is bound and at that point everything is async from then on.
Asynchronous calls allow for the branching of execution chains and the passing of results through that execution chain. This has many advantages.
For one, the program can execute two or more calls at the same time, and do work on the results as they complete, not necessarily in the order they were first called.
For example if you have a program waiting on two events:
var file1;
var file2;
//Let's say this takes 2 seconds
fs.readFile('bigfile1.jpg', function (err, data) {
if (err) throw err;
file1 = data;
console.log("FILE1 Done");
});
//let's say this takes 1 second.
fs.readFile('bigfile2.jpg', function (err, data) {
if (err) throw err;
file2 = data;
console.log("FILE2 Done");
});
console.log("DO SOMETHING ELSE");
In the case above, bigfile2.jpg will return first and something will be logged after only 1 second. So your output timeline might be something like:
#0:00: DO SOMETHING ELSE
#1:00: FILE2 Done
#2:00: FILE1 Done
Notice above that the log to "DO SOMETHING ELSE" was logged right away. And File2 executed first after only 1 second... and at 2 seconds File1 is done. Everything was done within a total of 2 seconds though the callBack order was unpredictable.
Whereas doing it synchronously it would look like:
file1 = fs.readFileSync('bigfile1.jpg');
console.log("FILE1 Done");
file2 = fs.readFileSync('bigfile2.jpg');
console.log("FILE2 Done");
console.log("DO SOMETHING ELSE");
And the output might look like:
#2:00: FILE1 Done
#3:00: FILE2 Done
#3:00 DO SOMETHING ELSE
Notice it takes a total of 3 seconds to execute, but the order is how you called it.
Doing it synchronously typically takes longer for everything to finish (especially for external processes like filesystem reads, writes or database requests) because you are waiting for one thing to complete before moving onto the next. Sometimes you want this, but usually you don't. It can be easier to program synchronously sometimes though, since you can do things reliably in a particular order (usually).
Executing filesystem methods asynchronously however, your application can continue executing other non-filesystem related tasks without waiting for the filesystem processes to complete. So in general you can continue executing other work while the system waits for asynchronous operations to complete. This is why you find database queries, filesystem and communication requests to generally be handled using asynchronous methods (usually). They basically allow other work to be done while waiting for other (I/O and off-system) operations to complete.
When you get into more advanced asynchronous method chaining you can do some really powerful things like creating scopes (using closures and the like) with a little amount of code and also create responders to certain event loops.
Sorry for the long answer. There are many reasons why you have the option to do things synchronously or not, but hopefully this will help you decide whether either method is best for you.
The benefit of the asynchronous version is that you can do other stuff while you wait for the IO to complete.
fs.readFile('file.html', function (err, data) {
if (err) throw err;
console.log(data);
});
// Do a bunch more stuff here.
// All this code will execute before the callback to readFile,
// but the IO will be happening concurrently. :)
You want to use the async version when you are writing event-driven code where responding to requests quickly is paramount. The canonical example for Node is writing a web server. Let's say you have a user making a request which is such that the server has to perform a bunch of IO. If this IO is performed synchronously, the server will block. It will not answer any other requests until it has finished serving this request. From the perspective of the users, performance will seem terrible. So in a case like this, you want to use the asynchronous versions of the calls so that Node can continue processing requests.
The sync version of the IO calls is there because Node is not used only for writing event-driven code. For instance, if you are writing a utility which reads a file, modifies it, and writes it back to disk as part of a batch operation, like a command line tool, using the synchronous version of the IO operations can make your code easier to follow.

How does the stack work on NodeJs?

I've recently started reading up on NodeJS (and JS) and am a little confused about how call backs are working in NodeJs. Assume we do something like this:
setTimeout(function(){
console.log("World");
},1000);
console.log("Hello");
output: "Hello World"
So far on what I've read JS is single threaded so the event loop is going though one big stack, and I've also been told not to put big calls in the callback functions.
1)
Ok, so my first question is assuming its one stack does the callback function get run by the main event loop thread? if so then if we have a site with that serves up content via callbacks (fetches from db and pushes request), and we have 1000 concurrent, are those 1000 users basically being served synchronously, where the main thread goes in each callback function does the computation and then continues the main event loop? If this is the case, how exactly is this concurrent?
2) How are the callback functions added to the stack, so lets say my code was like the following:
setTimeout(function(){
console.log("Hello");
},1000);
setTimeout(function(){
console.log("World");
},2000);
then does the callback function get added to the stack before the timeout even has occured? if so is there a flag that's set that notifies the main thread the callback function is ready (or is there another mechanism). If this is infact what is happening doesn't it just bloat the stack, especially for large web applications with callback functions, and the larger the stack the longer everything will take to run since the thread has to step through the entire thing.
The event loop is not a stack. It could be better thought of as a queue (first in/first out). The most common usage is for performing I/O operations.
Imagine you want to read a file from disk. You start by saying hey, I want to read this file, and when you're done reading, call this callback. While the actual I/O operation is being performed in a separate process, the Node application is free to keep doing something else. When the file has finished being read, it adds an item to the event loop's queue indicating the completion. The node app might still be busy doing something else, or there may be other items waiting to be dequeued first, but eventually our completion notification will be dequeued by node's event loop. It will match this up with our callback, and then the callback will be called as expected. When the callback has returned, the event loop dequeues the next item and continues. When there is nothing left in node's event loop, the application exits.
That's a rough approximation of how the event loop works, not an exact technical description.
For the specific setTimeout case, think of it like a priority queue. Node won't consider dequeuing the item/job/callback until at least that amount of time has passed.
There are lots of great write-ups on the Node/JavaScript event loop which you probably want to read up on if you're confused.
callback functions are not added to caller stack. There is no recursion here. They are called from event loop. Try to replace console.log in your example and watch result - stack is not growing.

Resources