Understanding callbacks - node.js

I created a small script to understand callbacks better.
From the script below, the behavior I expected was: "http.get runs and takes on average 200 ms. The for loop "i" increment takes on average 2500 ms. At 200 ms, the process should exit and the script should stop working." Why is it printing all of i? If I understand this better, I think I will understand callbacks.
var http = require("http");
var starttime = new Date();
//Function with Callback
for (var j = 0; j < 10; j++) {
    http.get({host : 'nba.com'}, function(res) {
        console.log("Time Taken = ", new Date() - starttime, 'ms');
        process.exit();
    }).on('error', function(er) {
        console.log('Got Error :', er.message);
    });
}
//Loop that exceeds callback trigger time
for (var i = 1; i < 10000; i++) {
    console.log(i);
}
console.log("Time Taken = ", new Date() - starttime, 'ms');

Javascript in node.js is single threaded and I/O is event driven using an event queue. Thus your async callbacks that signal the completion of the http requests cannot run until your original thread of Javascript finishes and returns control back to the system where it can then pull the next event from the event queue to service the completion of the http request.
As such, your for loop will run to completion before any http responses can be processed.
Here's the step by step process:
1. Your first for loop runs and sends 10 http requests.
2. These http requests run in the background using async networking. When one of them completes and has a response, the http module will put an event in the Javascript event queue and it will be the JS interpreter's job to pull that event from the event queue when it is finished with its other activities.
3. Your second for loop runs to completion and all the i values are output to the console.
4. Your script finishes.
5. The JS interpreter then checks the event queue to see if there are any pending events. In this case, there will be some http response events. The JS interpreter pulls the oldest event from the event queue and calls the callback associated with that.
6. When that callback finishes, the next event is pulled from the event queue and the process continues until the event queue is empty.
7. If any of your callbacks call process.exit(), then this short circuits the remaining callbacks and exits the process immediately.
While this other answer was written for the browser, the event-driven, single threaded concept is the same as it is in node.js so this other answer may explain some more things for you: How does JavaScript handle AJAX responses in the background?
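The ordering described above can be reproduced without any network access. The following is a minimal sketch, not the asker's script: a 5 ms timer stands in for the http response (an assumption made so the example is self-contained), and a busy-wait loop stands in for the long for loop. The names order and startTime are illustrative only.

```javascript
const order = [];
const startTime = Date.now();

// Stand-in for http.get(): a timer whose callback is ready after about 5 ms.
setTimeout(() => {
  order.push('callback');
  console.log('callback ran after', Date.now() - startTime, 'ms');
}, 5);

// Synchronous work that takes far longer than the 5 ms timer.
while (Date.now() - startTime < 50) {
  // busy-wait; nothing else can run on the JS thread in the meantime
}
order.push('sync done');
console.log('sync code finished after', Date.now() - startTime, 'ms');
```

Even though the timer is due after 5 ms, its callback can only run once the synchronous code has returned, so 'sync done' is always recorded first.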

Related

Node.js Event-Loop. Why callbacks from the check queue execute before those from the poll queue while Node.js DOCs state vice versa?

According to the Node.js DOCs,
when the event-loop enters its poll phase and the poll queue is not empty,
the callbacks in the poll queue should get executed before the event-loop
proceeds further to its check phase.
In reality, however, the opposite happens, that is if neither the poll, nor the
check (setImmediate) queue is empty by the time the event-loop is entering
the poll phase, the callbacks from the check (setImmediate) queue always execute
before the callbacks from the poll queue.
Why is that? What am I missing in the Node.js DOCs?
Here follow the sample piece of code and the quotation from the Node.js DOCs.
The quote from the Node.js DOCs:
The poll phase has two main functions:
1. Calculating how long it should block and poll for I/O, then
2. Processing events in the poll queue.
When the event loop enters the poll phase and there are no timers scheduled,
one of two things will happen:
(a) - If the poll queue is not empty, the event loop will iterate
through its queue of callbacks executing them synchronously
until either the queue has been exhausted,
or the system-dependent hard limit is reached.
(b) - If the poll queue is empty, one of two more things will happen:
- If scripts have been scheduled by setImmediate(),
the event loop will end the poll phase and continue to the check phase
to execute those scheduled scripts.
- If scripts have not been scheduled by setImmediate(),
the event loop will wait for callbacks to be added to the queue,
then execute them immediately.
Once the poll queue is empty the event loop will check for timers
whose time thresholds have been reached. If one or more timers are ready,
the event loop will wrap back to the timers phase
to execute those timers' callbacks.
The sample code:
const fs = require(`fs`);
console.log(`START`);
const readFileCallback = () => {
    console.log(`readFileCallback`);
};
fs.readFile(__filename, readFileCallback);
const setImmediateCallback = () => {
    console.log(`setImmediateCallback`);
};
setImmediate(setImmediateCallback);
// Now follows the loop long enough to give the fs.readFile enough time
// to finish its job and to place its callback (the readFileCallback)
// into the event-loop's poll phase queue before the "main" synchronous part
// of this code finishes.
for (let i = 1; i <= 10000000000; i++) {}
console.log(`END`);
// So when the event-loop starts its first tick there should be two callbacks
// waiting:
// (1) the readFileCallback (in the poll queue)
// (2) the setImmediateCallback (in the check queue)
// Now according to the Node.js DOCs, of these two waiting callbacks
// the readFileCallback should execute first, but the opposite
// is actually happening, that is setImmediateCallback executes first.

How does nodejs work asynchronously?

I get the response from /route1 only until /route2 logs "ROUTE2".
But I studied that nodejs puts functions like setTimeout() in external threads and the main thread continues to work. Shouldn't the for loop be executed in an external thread?
app.get("/route1", (req, res) => {
    res.send("Route1");
});
app.get("/route2", (req, res) => {
    setTimeout(() => {
        console.log("ROUTE2");
        for (let i = 0; i < 1000000000000000; i++);
        res.send("Route2");
    }, 10000);
});
Node.js uses an event loop to handle asynchronous operations. By asynchronous operations I mean I/O operations such as interaction with the system's disk or network, and timers (setTimeout, setInterval); their callbacks are run by the event loop. If you put a synchronous operation, like a long-running for loop, inside a timer or any other callback, it will block the event loop from executing the next callback or I/O operation.
In the above case the handlers for both /route1 and /route2 are callbacks, so when you hit a route, its callback is queued to run on the event loop.
The scenarios below illustrate the asynchronous nature of Node.js:
1. Hit /route2 first and, within 10 seconds (since the setTimeout is for 10000 ms), hit /route1.
In this case you will see the output Route1, because the 10-second setTimeout has not fired yet. So everything is asynchronous.
2. Hit /route2 first and, after 10 seconds (since the setTimeout is for 10000 ms), hit /route1.
In this case you will have to wait until the for loop has finished executing
for(let i=0;i<1000000000000000;i++);
because a for loop is still a synchronous operation, so /route1 will have to wait until /route2 ends.
Refer to the Node.js guides below for more details:
Blocking vs Non-Blocking
The Node.js Event Loop
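The second scenario can be imitated with two timers instead of an Express app. This is only a sketch under that assumption: the first timer plays the role of the /route2 callback with its long loop (shortened here to roughly 100 ms), and the second plays the role of a /route1 request arriving while the loop is running. The names startedAt and log are illustrative.

```javascript
const startedAt = Date.now();
const log = {};

// Plays the role of /route2: a timer callback that then does heavy synchronous work.
setTimeout(() => {
  log.heavyStart = Date.now() - startedAt;
  while (Date.now() - startedAt - log.heavyStart < 100) {
    // blocks the event loop for about 100 ms
  }
  log.heavyEnd = Date.now() - startedAt;
}, 10);

// Plays the role of /route1: due at 20 ms, but it has to wait for the loop above.
setTimeout(() => {
  log.quickRan = Date.now() - startedAt;
  console.log(log);
}, 20);
```

The logged quickRan value should come out well above the requested 20 ms, because the second callback cannot start until the first one has returned.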

how node handles computation and event loop

for (var i = 0; i < 100000; i++) {
    setTimeout(function() {
        console.log("Inside");
    }, 0);
    console.log("Outside");
}
It gives the output:
Outside * 100000
Inside * 100000
Why this output?
Are CPU-bound activities handled in-line, and is the event queue processed only when all CPU-bound activities are over?
setTimeout() is non-blocking and it calls its callback asynchronously. That means that setTimeout() runs, schedules a timer for the future and then immediately returns. Then, sometime later, when the timer fires, it inserts an event in the event queue. When the interpreter is done executing other Javascript, it fetches the next event out of the event queue and will then run the timer callback.
So, here's what happens in your code:
1. Run the body of the for loop 100,000 times and each time through the loop, schedule a timer and then output console.log("Outside").
2. Then, when the for loop is done and control returns back to the system, check the event queue to see if there is an event.
3. Because all the timers were set with a delay of 0, there are probably a lot of timer events in the queue. Pull the first event out and run its callback, which will output console.log("Inside").
4. Repeat until there are no more events in the event queue (which will take 100,000 iterations to process all the timers you previously set).
There are two keys here:
1. setTimeout() is "non-blocking". It schedules the timer and immediately returns.
2. When the timer fires, it inserts an event in the event queue to run the timer callback and that callback will ONLY get called when the JS interpreter is done doing whatever else it was doing and can then fetch the next event from the event queue.
This means that all synchronous code finishes first before any asynchronous callbacks can be fetched from the event queue and run.
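A scaled-down version of the loop makes the ordering easy to inspect. This sketch uses 3 iterations instead of 100,000 and collects the messages in an array (the name sequence is illustrative) rather than printing each one.

```javascript
const sequence = [];

for (let i = 0; i < 3; i++) {
  // Schedules the callback and returns immediately.
  setTimeout(() => sequence.push('Inside'), 0);
  sequence.push('Outside');
}

// Scheduled last with the same delay, so it runs after the three callbacks above.
setTimeout(() => console.log(sequence.join(' ')), 0);
// prints: Outside Outside Outside Inside Inside Inside
```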

Single thread synchronous and asynchronous confusion

Assume makeBurger() will take 10 seconds
In synchronous program,
function serveBurger() {
    makeBurger();
    makeBurger();
    console.log("READY"); // Assume takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for NodeJs, let's say we make an async version, makeBurgerAsync(), which also takes 10 seconds.
function serveBurger() {
    makeBurgerAsync(function(count) {
    });
    makeBurgerAsync(function(count) {
    });
    console.log("READY"); // Assume takes 5 seconds to log.
}
Since it is a single thread, I have trouble imagining what is really going on behind the scenes.
So for sure, when the function runs, both async functions will enter the event loop and console.log("READY") will get executed straight away.
But while console.log("READY") is executing, no work is really done for either async function, right? The single thread is busy with console.log for 5 seconds.
After console.log is done, the CPU will have time to switch between both async functions so that it can run a bit of each function each time.
So according to this, the async version doesn't necessarily result in faster execution; async is probably slower due to switching within the event loop? I imagine that, at the end of the day, everything will be spread over a single thread, which would be the same thing as the synchronous version?
I am probably missing some very big concept so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are things like querying a DB. Basically nodejs will just say "Hey DB, handle this for me while I do something else". However, the case I am not understanding is the self-defined callback function within nodejs itself.
EDIT2
function makeBurger() {
    var count = 0;
    count++; // 1 time
    ...
    count++; // 999999 times
    return count;
}
function makeBurgerAsync(callback) {
    var count = 0;
    count++; // 1 time
    ...
    count++; // 999999 times
    callback(count);
}
In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations, then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue and then when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
Let's examine a real-world async situation rather than your pseudo-code:
var fs = require("fs");
var http = require("http");
function doSomething() {
    fs.readFile(fname, function(err, data) {
        console.log("file read");
    });
    setTimeout(function() {
        console.log("timer fired");
    }, 100);
    // http.get() passes only the response object to its callback
    http.get(someUrl, function(response) {
        console.log("http get finished");
    });
    console.log("READY");
}
doSomething();
console.log("AFTER");
Here's what happens step-by-step:
1. fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in node.js and it will run there in a separate thread.
2. Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
3. http.get() is called. This will send the desired http request and then immediately return to further execution.
4. console.log("READY") will run.
5. The three asynchronous operations will complete in an indeterminate order (whichever one completes its operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
6. For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using its own thread, it will insert an event in the node.js event queue.
7. Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readFile() event is in the queue so it grabs that and calls the callback associated with that. That callback executes and finishes.
8. Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
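One way to add that tracking is to wrap each operation in a promise and wait on all of them with Promise.all. The sketch below is an illustration, not part of the example above: fakeAsyncOp is a hypothetical helper that uses timers of different lengths as stand-ins for the file read, the timer and the http request, so that the example runs without a file or a network.

```javascript
// Hypothetical helper: resolves with `label` after `ms` milliseconds.
function fakeAsyncOp(label, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(label), ms));
}

const finished = []; // completion order, filled in as each operation ends

// Records when an operation completes and passes its result through.
function track(promise) {
  return promise.then((label) => {
    finished.push(label);
    return label;
  });
}

const allDone = Promise.all([
  track(fakeAsyncOp('file read', 30)),
  track(fakeAsyncOp('timer fired', 10)),
  track(fakeAsyncOp('http get finished', 50)),
]).then((results) => {
  console.log('completion order:', finished);
  console.log('results in call order:', results);
  return results;
});
```

Promise.all resolves only after every operation has finished, and its results array follows the order in which the operations were passed in, regardless of which one completed first.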
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).
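As one illustration of passing the work off, the counting loop from the question could be moved into a worker thread using node.js's worker_threads module. This is a sketch under assumptions: worker_threads is not mentioned in the discussion above, makeBurgerInWorker is a hypothetical name, and the worker source is kept inline (with the eval option) only so that the example fits in one file.

```javascript
const { Worker } = require('worker_threads');

// Source for the worker, kept inline so the sketch is a single file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let count = 0;
  for (let i = 0; i < workerData.iterations; i++) count++;
  parentPort.postMessage(count);
`;

// Hypothetical wrapper: runs the counting loop off the main thread.
function makeBurgerInWorker(iterations) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { iterations },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

makeBurgerInWorker(999999).then((count) => {
  console.log('burger done, count =', count);
});
console.log('READY'); // logged first; the main thread is not blocked by the counting
```

The result still arrives on the main thread as an event, so the callback runs only when the main thread is free, exactly as with the built-in asynchronous operations.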

How to have heavy processing operations done in node.js

I have a heavy data processing operation that I need to get done for each of 10-12 simultaneous requests. I have read that Node.js is a good platform for a high level of concurrency, and that it achieves this by having a non-blocking event loop.
What I know is that for things like querying a database, I can hand the work off to a separate process (like mongod, mysqld) and then have a callback which will handle the result from that process. Fair enough.
But what if I want a heavy piece of computation to be done within a callback? Won't it block other requests until the code in that callback has executed completely? For example, I want to process a high resolution image and the code I have is in Javascript itself (no separate process to do image processing).
The way I think of implementing it is like
get_image_from_db(image_id, function (imageBitMap) {
    heavy_operation(imageBitMap); // Can take 5 seconds.
});
Will that heavy_operation stop node from taking in any requests for those 5 seconds? Or am I thinking about this task the wrong way? Please guide me, I am a JS newbie.
UPDATE
Or could I process part of the image, let the event loop go back to take in other callbacks, and then return to processing the rest of the image (something like prioritising events)?
Yes, it will block, as the callback functions are executed in the main loop. It is only the asynchronously called functions which do not block the loop. It is my understanding that if you want the image processing to execute asynchronously, you will have to use a separate process to do it.
Note that you can write your own asynchronous process to handle it. To start you could read the answers to How to write asynchronous functions for Node.js.
UPDATE
How do I create a non-blocking asynchronous function in node.js? may also be worth reading. This question is actually referenced in the other one I linked, but I thought I'd include it here too for simplicity.
Unfortunately, I don't yet have enough reputation points to comment on Nick's answer, but have you looked into Node's cluster API? It would allow you to spawn multiple worker processes.
When a heavy piece of computation is done in the callback, the event loop would be blocked until the computation is done. That means the callback will block the event loop for the 5 seconds.
My solution
It's possible to use a generator function to yield back control to the event loop. I will use a while loop that will run for 3 seconds to act as a long running callback.
Without a Generator function
let start = Date.now();
setInterval(() => console.log('resumed'), 500);
function loop() {
    // Keep going while less than 3 seconds have passed since start
    while ((Date.now() - start) < 3000) {
        console.log('blocked');
    }
}
loop();
The output would be:
// blocked
// blocked
//
// ... would not return to the event loop while the loop is running
//
// blocked
//...when the loop is over then the setInterval kicks in
// resumed
// resumed
With a Generator function
let gen;
let start = Date.now();
setInterval(() => console.log('resumed'), 500);
function* loop() {
    // Keep going while less than 3 seconds have passed since start
    while ((Date.now() - start) < 3000) {
        console.log(yield output());
    }
}
function output() {
    // Resume the generator from a timer, so the event loop gets control in between
    setTimeout(() => gen.next('blocked'), 500);
}
gen = loop();
gen.next();
The output is:
// resumed
// blocked
//...returns control back to the event loop even though the loop is still running
// resumed
// blocked
//...end of the loop
// resumed
// resumed
// resumed
Using Javascript generators, a heavy computational function can yield control back to the event loop while it is still computing.
To learn more about generators, visit
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Statements/function*
https://davidwalsh.name/es6-generators
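The "process part of the image, then let the event loop take other callbacks" idea from the question can also be sketched without generators by splitting the work into chunks and scheduling each next chunk with setImmediate(). This is an illustration under assumptions: processInChunks, pixels and the chunk size are hypothetical, and a simple sum stands in for the real image processing.

```javascript
// Hypothetical helper: handles `items` in slices of `chunkSize`, giving the
// event loop a turn between slices. Resolves once every item is handled.
function processInChunks(items, chunkSize, handleItem) {
  return new Promise((resolve) => {
    let index = 0;
    function runChunk() {
      const end = Math.min(index + chunkSize, items.length);
      for (; index < end; index++) {
        handleItem(items[index]);
      }
      if (index < items.length) {
        setImmediate(runChunk); // let other callbacks run before the next slice
      } else {
        resolve();
      }
    }
    runChunk();
  });
}

// Stand-in for image data: the numbers 0..999.
const pixels = Array.from({ length: 1000 }, (_, i) => i);
let total = 0;
let sawPartialTotal = null;

const done = processInChunks(pixels, 100, (p) => { total += p; });

// Another callback gets a turn while the work is still in progress.
setImmediate(() => { sawPartialTotal = total; });

done.then(() => console.log('partial:', sawPartialTotal, 'final:', total));
```

Each chunk still blocks the event loop while it runs, so the chunk size is a trade-off between total processing time and how long other callbacks may have to wait.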

Resources