async nodejs execution order - node.js

When does processItem start executing. Does it start as soon as some items are pushed onto the queue? Or must the for loop finish before the first item on the queue starts executing?
var processItem = function (item, callback) {
console.log(item)
callback();
}
var myQueue = async.queue(processItem, 2)
for (index = 0; index < 1000; ++index) {
myQueue.push(index)
}

The simple answer is that all of your tasks will be added to the queue, and then executed in a random (undefined) order.
background information, via https://github.com/caolan/async#queueworker-concurrency
queue(worker, concurrency)
Creates a queue object with the specified concurrency. Tasks added to the queue are processed in parallel (up to the concurrency limit). If all workers are in progress, the task is queued until one becomes available. Once a worker completes a task, that task's callback is called.
In other words, the order is undefined. If you require the tasks to be executed in a particular order, you should be using a different asynchronous primitive, such as series(tasks, [callback])
https://github.com/caolan/async#seriestasks-callback

The current call stack must resolve before any asynchronous tasks will run. This means that the current function must run to completion (plus any functions that called that function, etc.) before any asynchronous operations will run. To answer your question directly: all items will be fully queued before the first one runs.
It might be helpful for you to read more about the JavaScript event loop: asynchronous jobs sit in the event queue, where they are handled one by one. Each job in the event queue causes the creation of the call stack (where each item in the stack is function call -- the first function is on the bottom, a function called within that first function is next-to-bottom, etc.). When the stack is fully cleared, the event loop processes the next event.

Related

fs.readFile operation inside the callback of parent fs.readFile does not execute asynchronously

I have read that fs.readFile is an asynchronous operation and the reading takes place in a separate thread other than the main thread and hence the main thread execution is not blocked. So I wanted to try out something like below
// reads take almost 12-15ms
fs.readFile('./file.txt', (err, data) => {
console.log('FIRST READ', Date.now() - start)
const now = Date.now()
fs.readFile('./file.txt', (err, data) => {
// logs how much time it took from the beginning
console.log('NESTED READ CALLBACK', Date.now() - start)
})
// blocks untill 20ms more, untill the above read operation is done
// so that the above read operation is done and another callback is queued in the poll phase
while (Date.now() - now < 20) {}
console.log('AFTER BLOCKING', Date.now() - start)
})
I am making another fs.readFile call inside the callback of parent fs.readFile call. What I am expecting is that the log NESTED READ CALLBACK must arrive immediately after AFTER BLOCKING since, the reading must be completed asynchronously in a separate thread
Turns out the log NESTED READ CALLBACK comes 15ms after the AFTER BLOCKING call, indicating that while I was blocking in the while loop, the async read operation somehow never took place. By the way the while loop is there to model some task that takes 20ms to complete
What exactly is happening here? Or am I missing some information here? Btw I am using windows
During your while() loop, no events in Javascript will be processed and none of your Javascript code will run (because you are blocking the event loop from processing by just looping).
The disk operations can make some progress (as they do some work in a system thread), but their results will not be processed until your while loop is done. But, because the fs.readFile() actually consists of three or more operations, fs.open(), fs.read() and fs.close(), it probably won't get very far while the event loop is blocked because it needs events to be processed to advance through the different stages of its work.
Turns out the log NESTED READ CALLBACK comes 15ms after the AFTER BLOCKING call, indicating that while I was blocking in the while loop, the async read operation somehow never took place. By the way the while loop is there to model some task that takes 20ms to complete
What exactly is happening here?
fs.readFile() is not a single monolithic operation. Instead, it consists of fs.open(), fs.read() and fs.close() and the sequencing of those is run in user-land Javascript in the main thread. So, while you are blocking the main thread with your while() loop, the fs.readFile() can't make a lot of progress. Probably what happens is you initiate the second fs.readFile() operation and that kicks off the fs.open() operation. That gets sent off to an OS thread in the libuv thread pool. Then, you block the event loop with your while() loop. While that loop is blocking the event loop, the fs.open() completes and (internal to the libuv event loop) an event is placed into the event queue to call the completion callback for the fs.open() call. But, the event loop is blocked by your loop so that callback can't get called immediately. Thus, any further progress on completing the fs.readFile() operation is blocked until the event loop frees up and can process waiting events again.
When your while() loop finishes, then control returns to the event loop and the completion callback for the fs.open() call gets called while will then initiate the reading of the actual data from the file.
FYI, you can actually inspect the code yourself for fs.readFile() right here in the Github repository. If you follow its flow, you will see that, from its own Javascript, it first calls binding.open() which is a native code operation and then, when that completes and Javascript is able to process the completion event through the event queue, it will then run the fucntion readFileAfterOpen(...) which will call bind.fstat() (another native code operation) and then, when that completes and Javascript is able to process the completion event
through the event queue, it will then call `readFileAfterStat(...) which will allocate a buffer and initiate the read operation.
Here, the code gets momentarily harder to follow as flow skips over to a read_file_context object here where eventually it calls read() and again when that completes and Javascript is able to process the completion event via the event loop, it can advance the process some more, eventually reading all the bytes from the file into a buffer, closing the file and then calling the final callback.
The point of all this detail is to illustrate how fs.readFile() is written itself in Javascript and consists of multiple steps (some of which call code which will use some native code in a different thread), but can only advance from one step to the next when the event loop is able to process new events. So, if you are blocking the event loop with a while loop, then fs.readFile() will get stuck between steps and not be able to advance. It will only be able to advance and eventually finish when the event loop is able to process events.
An analogy
Here's a simple analogy. You ask your brother to do you a favor and go to three stores for you to pick up things for dinner. You give him the list and destination for the first store and then you ask him to call you on your cell phone after he's done at the first store and you'll give him the second destination and the list for that store. He heads off to the first store.
Meanwhile you call your girlfriend on your cell phone and start having a long conversation with her. Your brother finishes at the first store and calls you, but you're ignoring his call because you're still talking to your girlfriend. Your brother gets stuck on his mission of running errands because he needs to talk to you to find out what the next step is.
In this analogy, the cell phone is kind of like the event loop's ability to process the next event. If you are blocking his ability to call you on the phone, then he can't advance to his next step (the event loop is blocked). His visits to the three stores are like the individual steps involved in carrying out the fs.readfile() operation.

Node.js Event-Loop. Why callbacks from the check queue execute before those from the poll queue while Node.js DOCs state vice versa?

According to the Node.js DOCs,
when the event-loop enters its poll phase and the poll queue is not empty,
the callbacks in the poll queue should get executed before the event-loop
proceeds further to its check phase.
In reality, however, the opposite happens, that is if neither the poll, nor the
check (setImmediate) queue is empty by the time the event-loop is entering
the poll phase, the callbacks from the check (setImmediate) queue always execute
before the callbacks from the poll queue.
Why is that? What am I missing in the Node.js DOCs?
Here follow the sample piece of code and the quotation from the Node.js DOCs.
The quote from the Node.js DOCs:
The poll phase has two main functions:
1. Calculating how long it should block and poll for I/O, then
2. Processing events in the poll queue.
When the event loop enters the poll phase and there are no timers scheduled,
one of two things will happen:
(a) - If the poll queue is not empty, the event loop will iterate
through its queue of callbacks executing them synchronously
until either the queue has been exhausted,
or the system-dependent hard limit is reached.
(b) - If the poll queue is empty, one of two more things will happen:
- If scripts have been scheduled by setImmediate(),
the event loop will end the poll phase and continue to the check phase
to execute those scheduled scripts.
- If scripts have not been scheduled by setImmediate(),
the event loop will wait for callbacks to be added to the queue,
then execute them immediately.
Once the poll queue is empty the event loop will check for timers
whose time thresholds have been reached. If one or more timers are ready,
the event loop will wrap back to the timers phase
to execute those timers' callbacks.
The sample code:
const fs = require(`fs`);
console.log(`START`);
const readFileCallback = () => {
console.log(`readFileCallback`);
};
fs.readFile(__filename, readFileCallback);
const setImmediateCallback = () => {
console.log(`setImmediateCallback`);
};
setImmediate(setImmediateCallback);
// Now follows the loop long enough to give the fs.readFile enough time
// to finish its job and to place its callback (the readFileCallback)
// into the event-loop's poll phase queue before the "main" synchronous part
// of the this code finishes.
for (let i = 1; i <= 10000000000; i++) {}
console.log(`END`);
// So when the event-loop starts its first tick there should be two callbacks
// waiting:
// (1) the readFileCallback (in the poll queue)
// (2) the setImmediateCallback (in the check queue)
// Now according to the Node.js DOCs, of these two waiting callbacks
// the readFileCallback should execute first, but the opposite
// is actually happening, that is setImmediateCallback executes first.

When does the callback function passed to process.nextTick() get processed?

The Node.js official documentation states that:
the nextTickQueue will be processed after the current operation is
completed, regardless of the current phase of the event loop
and defines operation as :
transition from the underlying C/C++ handler, and handling the
JavaScript that needs to be executed.
Still not clear on what operation means. Can someone explain this to me please? Examples would be appreciated.
I think that the documentation from that point after is quite explicative:
Looking back at our diagram, any time you call process.nextTick() in a given phase, all callbacks passed to process.nextTick() will be resolved before the event loop continues. This can create some bad situations because it allows you to "starve" your I/O by making recursive process.nextTick() calls, which prevents the event loop from reaching the poll phase.
The code passed to nextThick will be executed as soon as the current phase ends, hence nextThick allows you to append code at the end of the current phase rather than at the end of the loop (as a timer would)
The key answer is here:
By using process.nextTick() we guarantee that apiCall() always runs its callback after the rest of the user's code and before the event loop is allowed to proceed. To achieve this, the JS call stack is allowed to unwind then immediately execute the provided callback
So next tick appends code to a list that will be executed at the end of the current operation. Being the current operation intended as the current stack returning.
So in pool phase for example the phase has not to continue to execute nextThick but rather complete the entire current call stack (within which nextThick was called) then execute the nextThick list and then go on with the next pool operation, go to the next phase or go idle.
This example in the documentation (along with the explanation there is clear):
let bar;
function someAsyncApiCall(callback) {
process.nextTick(callback);
}
someAsyncApiCall(() => {
console.log('bar', bar); // 1
});
bar = 1;
// << callback will execute here with bar being assigned

How to count total ticks of event loop (NodeJs)?

I'm learning nodejs event loop. I'm curious if we can count total ticks event loop took throughout the application runtime?
setTimeout(() => {
console.log("Hey there!")
}, 1000);
As per my understanding, event loop will tick total 2 times before the application ends. How can I make node tell me exact number of ticks?
The event loop is a system for scheduling tasks and async operations, it uses Queues and Stack under the hood to process function calls.
Whenever you get a result of a function, this means the function was put in the Queue and the pushed in to the stack.
This way we have non-blocking operation. What you are trying to do does not make any sense to me, every time you get a log from a function it means this function was processed both by the queue and the stack.

Single thread synchronous and asynchronous confusion

Assume makeBurger() will take 10 seconds
In synchronous program,
function serveBurger() {
makeBurger();
makeBurger();
console.log("READY") // Assume takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for NodeJs lets say we make an async version of makeBurgerAsync() which also takes 10 seconds.
function serveBurger() {
makeBurgerAsync(function(count) {
});
makeBurgerAsync(function(count) {
});
console.log("READY") // Assume takes 5 seconds to log.
}
Since it is a single thread. I have troubling imagine what is really going on behind the scene.
So for sure when the function run, both async functions will enter event loops and console.log("READY") will get executed straight away.
But while console.log("READY") is executing, no work is really done for both async function right? Since single thread is hogging console.log for 5 seconds.
After console.log is done. CPU will have time to switch between both async so that it can run a bit of each function each time.
So according to this, the function doesn't necessarily result in faster execution, async is probably slower due to switching between event loop? I imagine that, at the end of the day, everything will be spread on a single thread which will be the same thing as synchronous version?
I am probably missing some very big concept so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are like query DB etc. Basically nodejs will just say "Hey DB handle this for me while I'll do something else". However, the case I am not understanding is the self-defined callback function within nodejs itself.
EDIT2
function makeBurger() {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
return count;
}
function makeBurgerAsync(callback) {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
callback(count);
}
In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations, then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue and then when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
Let's examine a real-world async situation rather than your pseudo-code:
function doSomething() {
fs.readFile(fname, function(err, data) {
console.log("file read");
});
setTimeout(function() {
console.log("timer fired");
}, 100);
http.get(someUrl, function(err, response, body) {
console.log("http get finished");
});
console.log("READY");
}
doSomething();
console.log("AFTER");
Here's what happens step-by-step:
fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in node.js and it will run there in a separate thread.
Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
http.get() is called. This will send the desired http request and then immediately return to further execution.
console.log("READY") will run.
The three asynchronous operations will complete in an indeterminate order (whichever one completes it's operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using it's own thread, it will insert an event in the node.js event queue.
Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readfile() event is in the queue so it grabs that and calls the callback associated with that. That callback executes and finishes.
Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).

Resources