How to know when the Promise is actually resolved in Node.js? - node.js

When we are using Promise in nodejs, given a Promise p, we can't know when the Promise p is actually resolved by logging the currentTime in the "then" callback.
To prove that, I wrote the test code below (using CoffeeScript):
# the procedure we want to profile
getData = (arg) ->
new Promise (resolve) ->
setTimeout ->
resolve(arg + 1)
, 100
# the main procedure
main = () ->
beginTime = new Date()
console.log beginTime.toISOString()
getData(1).then (r) ->
resolveTime = new Date()
console.log resolveTime.toISOString()
console.log resolveTime - beginTime
cnt = 10**9
--cnt while cnt > 0
return cnt
main()
When you run the above code, you will notice that the resolveTime (the time your code run into the callback function) is much later than 100ms from the beginTime.
So If we want to know when the Promise is actually resolved, HOW?
I want to know the exact time because I'm doing some profiling via logging. And I'm not able to modify the Promise p 's implementation when I'm doing some profiling outside of the black box.
So, Is there some function like promise.onUnderlyingConditionFulfilled(callback) , or any other way to make this possible?

This is because you have a busy loop that apparently takes longer than your timer:
cnt = 10**9
--cnt while cnt > 0
Javascript in node.js is single threaded and event driven. It can only do one thing at a time and it will finish the current thing it's doing before it can service the event posted by setTimeout(). So, if your busy loop (or any other long running piece of Javascript code) takes longer than you've set your timer for, the timer will not be able to run until this other Javascript code is done. "single threaded" means Javascript in node.js only does one thing at a time and it waits until one thing returns control back to the system before it can service the next event waiting to run.
So, here's the sequence of events in your code:
It calls the setTimeout() to schedule the timer callback for 100ms from now.
Then you go into your busy loop.
While it's in the busy loop, the setTimeout() timer fires inside of the JS implementation and it inserts an event into the Javascript event queue. That event can't run at the moment because the JS interpreter is still running the busy loop.
Then eventually it finishes the busy loop and returns control back to the system.
When that is done, the JS interpreter then checks the event queue to see if any other events need servicing. It finds the timer event and so it processes that and the setTimeout() callback is called.
That callback resolves the promise which triggers the .then() handler to get called.
Note: Because of Javascript's single threaded-ness and event-driven nature, timers in Javascript are not guaranteed to be called exactly when you schedule them. They will execute as close to that as possible, but if other code is running at the time they fire or if their are lots of items in the event queue ahead of you, that code has to finish before the timer callback actually gets to execute.
So If we want to know when the Promise is actually resolved, HOW?
The promise is resolved when your busy loop is done. It's not resolved at exactly the 100ms point (because your busy loop apparently takes longer than 100ms to run). If you wanted to know exactly when the promise was resolved, you would just log inside the setTimeout() callback right where you call resolve(). That would tell you exactly when the promise was resolved though it will be pretty much the same as where you're logging now. The cause of your delay is the busy loop.
Per your comments, it appears that you want to somehow measure exactly when resolve() is actually called in the Promise, but you say that you cannot modify the code in getData(). You can't really do that directly. As you already know, you can measure when the .then() handler gets called which will probably be no more than a couple ms after resolve() gets called.
You could replace the promise infrastructure with your own implementation and then you could instrument the resolve() callback directly, but replacing or hooking the promise implementation probably influences the timing of things even more than just measuring from the .then() handler.
So, it appears to me that you've just over-constrained the problem. You want to measure when something inside of some code happens, but you don't allow any instrumentation inside that code. That just leaves you with two choices:
Replace the promise implementation so you can instrument resolve() directly.
Measure when .then() is triggered.
The first option probably has a heisenberg uncertainty issue in that you've probably influenced the timing by more than you should if you replace or hook the promise implementation.
The second option measures an event that happens just slightly after the actual .resolve(). Pick which one sounds closest to what you actually want.

Related

fs.readFile operation inside the callback of parent fs.readFile does not execute asynchronously

I have read that fs.readFile is an asynchronous operation and the reading takes place in a separate thread other than the main thread and hence the main thread execution is not blocked. So I wanted to try out something like below
// reads take almost 12-15ms
fs.readFile('./file.txt', (err, data) => {
console.log('FIRST READ', Date.now() - start)
const now = Date.now()
fs.readFile('./file.txt', (err, data) => {
// logs how much time it took from the beginning
console.log('NESTED READ CALLBACK', Date.now() - start)
})
// blocks untill 20ms more, untill the above read operation is done
// so that the above read operation is done and another callback is queued in the poll phase
while (Date.now() - now < 20) {}
console.log('AFTER BLOCKING', Date.now() - start)
})
I am making another fs.readFile call inside the callback of parent fs.readFile call. What I am expecting is that the log NESTED READ CALLBACK must arrive immediately after AFTER BLOCKING since, the reading must be completed asynchronously in a separate thread
Turns out the log NESTED READ CALLBACK comes 15ms after the AFTER BLOCKING call, indicating that while I was blocking in the while loop, the async read operation somehow never took place. By the way the while loop is there to model some task that takes 20ms to complete
What exactly is happening here? Or am I missing some information here? Btw I am using windows
During your while() loop, no events in Javascript will be processed and none of your Javascript code will run (because you are blocking the event loop from processing by just looping).
The disk operations can make some progress (as they do some work in a system thread), but their results will not be processed until your while loop is done. But, because the fs.readFile() actually consists of three or more operations, fs.open(), fs.read() and fs.close(), it probably won't get very far while the event loop is blocked because it needs events to be processed to advance through the different stages of its work.
Turns out the log NESTED READ CALLBACK comes 15ms after the AFTER BLOCKING call, indicating that while I was blocking in the while loop, the async read operation somehow never took place. By the way the while loop is there to model some task that takes 20ms to complete
What exactly is happening here?
fs.readFile() is not a single monolithic operation. Instead, it consists of fs.open(), fs.read() and fs.close() and the sequencing of those is run in user-land Javascript in the main thread. So, while you are blocking the main thread with your while() loop, the fs.readFile() can't make a lot of progress. Probably what happens is you initiate the second fs.readFile() operation and that kicks off the fs.open() operation. That gets sent off to an OS thread in the libuv thread pool. Then, you block the event loop with your while() loop. While that loop is blocking the event loop, the fs.open() completes and (internal to the libuv event loop) an event is placed into the event queue to call the completion callback for the fs.open() call. But, the event loop is blocked by your loop so that callback can't get called immediately. Thus, any further progress on completing the fs.readFile() operation is blocked until the event loop frees up and can process waiting events again.
When your while() loop finishes, then control returns to the event loop and the completion callback for the fs.open() call gets called while will then initiate the reading of the actual data from the file.
FYI, you can actually inspect the code yourself for fs.readFile() right here in the Github repository. If you follow its flow, you will see that, from its own Javascript, it first calls binding.open() which is a native code operation and then, when that completes and Javascript is able to process the completion event through the event queue, it will then run the fucntion readFileAfterOpen(...) which will call bind.fstat() (another native code operation) and then, when that completes and Javascript is able to process the completion event
through the event queue, it will then call `readFileAfterStat(...) which will allocate a buffer and initiate the read operation.
Here, the code gets momentarily harder to follow as flow skips over to a read_file_context object here where eventually it calls read() and again when that completes and Javascript is able to process the completion event via the event loop, it can advance the process some more, eventually reading all the bytes from the file into a buffer, closing the file and then calling the final callback.
The point of all this detail is to illustrate how fs.readFile() is written itself in Javascript and consists of multiple steps (some of which call code which will use some native code in a different thread), but can only advance from one step to the next when the event loop is able to process new events. So, if you are blocking the event loop with a while loop, then fs.readFile() will get stuck between steps and not be able to advance. It will only be able to advance and eventually finish when the event loop is able to process events.
An analogy
Here's a simple analogy. You ask your brother to do you a favor and go to three stores for you to pick up things for dinner. You give him the list and destination for the first store and then you ask him to call you on your cell phone after he's done at the first store and you'll give him the second destination and the list for that store. He heads off to the first store.
Meanwhile you call your girlfriend on your cell phone and start having a long conversation with her. Your brother finishes at the first store and calls you, but you're ignoring his call because you're still talking to your girlfriend. Your brother gets stuck on his mission of running errands because he needs to talk to you to find out what the next step is.
In this analogy, the cell phone is kind of like the event loop's ability to process the next event. If you are blocking his ability to call you on the phone, then he can't advance to his next step (the event loop is blocked). His visits to the three stores are like the individual steps involved in carrying out the fs.readfile() operation.

Why using sync functions in nodejs is known as "blocking the event loop"?

If I understand correctly, the EventLoop is the mechanism that node uses to resolve asynchronous operations and then passing them to the call stack, right? My question is, when I use a synchronous method (for example pbkdf2Sync) it will get stuck in the call stack until it is finished but it won't be moved to the EventLoop because is not an async operation, so why is this known as blocking the event loop if in reality it is blocking everything? not JUST the event loop (that as far as I understand, can continue working and will pass the callbacks to the call stack when they're finished)
Is my understanding of the NodeJs inner workins completly wrong? This topic specifically is kinda hard to understand because every resource I read differs in some way or another, so even if I think I get the bigger picture, these are the "details" that confuse me.
What is blocking
Why using sync functions in nodejs is known as "blocking the event loop"?
In a nutshell, it's because the event loop can only process the next event when you return from whatever your current Javascript is doing and allow the event loop to look for the next event. A sync function blocks the interpreter until it finishes. So, the entire time a sync function is working and you're waiting for it to return, the interpreter is blocked and control is not returned back to the event loop. This blocks the event loop and also blocks your Javascript from running.
Single Thread
Nodejs runs your Javascript all with a single thread. Other threads are used internally, but your Javascript itself runs only in a single thread (we're assuming there is no use of WorkerThreads in your code). So, when you make a synchronous function call, that single thread that runs your Javascript is busy and blocked until the synchronous function call returns and can then continue executing more of your Javascript.
This blocks everything. It blocks running more of your Javascript after the synchronous function call and it blocks getting back to the event loop to run any other event handlers that are pending such as incoming network events, timers, completion events from other things such as disk I/O, etc... So, while this is blocking the event loop, it's also blocking running more of your own code after the function call.
Non-Blocking, Asynchronous Operations
On the other hand, asynchronous functions such as fs.readFile(), for example, don't block. They initiate their operation and return immediately. This allows the interpreter to continue running any more of your own Javascript after the call to fs.readFile() and it also allows you to return from whatever event triggered your work in the first place which will return control back to the event loop so it can service other waiting events or other events that will trigger in the future. fs.readFile() then does most of its work in native code (behind the scenes) outside of the main thread that runs your Javascript. So, these type of asynchronous functions don't block the event loop - instead they cooperate with the event loop so that other things can get run while waiting for the completion of the asynchronous operation that was previously initiated. When they complete, they insert an event into the event loop that causes the event loop to call the completion callback at it's earliest convenience (when it's not blocked).
Differences in Blocking
It's also worth noting that functions that represent both synchronous and asynchronous operations both block the execution of your Javascript and block the event loop until they return. The difference is that an asynchronous operation returns from the function nearly immediately, long before the asynchronous operation itself is complete and communicates its completion and/or eventual result back via a promise, callback or event (which are all callbacks at the lowest level of the event loop). The synchronous operation does not return until the operation itself is complete. So, the asynchronous operation only blocks for a very short duration while the operation is being initiated whereas the synchronous operation blocks for the entire duration of the operation (until it completes).
More About the Event Loop
So, in what moment during the event loop is my javascript code executed?
When control returns back to the event loop, it goes through several different phases looking for things to do. When it finds something to do, that "something" results in calling a Javascript callback that starts running some of your Javascript. For example if the "something to do" is a setTimeout() timer that is ready to fire, then it will call the Javascript callback that was passed to setTimeout(). That callback runs to its completion and only when your Javascript returns from that callback does the event loop regain control and get to look for the next event to run and call its callback.
it won't be moved to the EventLoop because is not an async operation
This is not really the correct way to think about things. Things are not really "moved to the event loop".
A synchronous operation is just a blocking function call that returns when it returns and execution of any other Javascript is blocked until that blocking function call returns. Things are blocked because the single threaded interpreter running your Javascript is stuck waiting for this function to finish. It can't do anything else and the event loop is also blocked because it can't do anything until the interpreter returns control back to the event loop.
An asynchronous operation, on the other hand, initiates some operation (let's say it issues an http request to some other host) and then immediately returns, long before it has the result of that http request. Since this asynchronous operation returns before it has its result, it is considered non-blocking and because it returns quickly, you can then return from whatever event caused your code to run and that will then return control back to the event loop. That allows the event loop to then look for other events to handle and run their corresponding callbacks. Meanwhile, the asynchronous operation that was previously started has some native code associated with it (that may or may not be running in an native code OS thread - depending upon what type of operation it is). But regardless, that native code is configured such that when the asynchronous operation completes, it will insert an event into the appropriate event queue. So, at some future point when nodejs has control back in the event loop, it will find that event and run the Javascript callback associated with that event, thus notifying the original Javascript code that the asynchronous operation is now complete and providing some sort of result or error code.
Example
As a simple example, let's say you run this code:
// timer that wants to fire in 1 second
setTimeout(function() {
console.log("timer fired")
}, 1000);
// loop that blocks for 5 seconds
const start = Date.now();
while (Date.now() - start < 5000) { }
console.log("blocking loop finished");
This will output:
blocking loop finished
timer fired
Even though the timer was set to run 1 second from now, the while() loop blocked everything for 5 seconds so it wasn't until after the while() loop finished and returned back to the system that the event loop could look at what it had to do next and call the callback associated with the timer.
This while loop is similar to a blocking function such as pbkdf2Sync(). Both block the interpreter and don't return until they are finished and therefore nothing in the event loop gets a chance to run until they are finished.
A Simple Analogy
Here's a simple analogy. Imagine you need to contact your cable company to troubleshoot a problem. There are two ways for you to get ahold of them. The first way is that you call them and sit on the phone on hold for an hour waiting for a customer service representative to pick up your call. You can't really do much else while you're sitting on hold because you have to be right there waiting and ready to respond when someone finally comes on the line to help you. Thus, you are "blocked" from doing a lot of other things. This is a "blocking, synchronous operation". You can't really do much else while you're waiting on hold.
The second way you can call the cable company is to request a callback sometime in the future and when a representative is available, they will call you back. As soon as you're done requesting the callback, you can go about your business doing other things. You have to be able to answer an incoming call when it arrives, but other than that you're not blocked from doing many other things. This is a "non-blocking, asynchronous operation".
But, if while you're supposed to be able to answer your telephone callback from the second scenario above, you make another call and are on the phone for 30 minutes, the cable company is blocked from reaching you on the phone for the duration of your other call. In our analogy, you are "blocking the event loop" during your other call as the incoming event from the cable company can't be processed while you're on a call.

Why does Node seemingly wait for all Promises to resolve?

I'm new to JavaScript and NodeJS so forgive me if this question is rudimentary.
Let's say I just have a simple file hello.js and I run it with $ node hello.js and all this file contains is
setTimeout(() => {console.log('hello');}, 5000);
Why doesn't this program finish immediately? Why instead does it wait for the underlying Promise to resolve?
After all, isn't the Promise associated with setTimeout created and run asynchronously? So wouldn't the main 'thread' of execution "fall off" when it encounters no more code to run?
The Node event loop keeps running until all outstanding tasks are completed or cancelled.
setTimeout creates a pending event, so the loop will keep running until that executes.
Outstanding Promises, setInterval and other mechanisms can all prevent the event loop from halting.
It's worth noting that setTimeout does not use a promise at all. That's just a regular callback function. The setTimeout() API long predates Promises.
I think there's more to the story/explanation here so I'll add an answer that contains additional info.
Nodejs keeps a reference counter for all unfinished asynchronous operations and nodejs itself will not exit automatically until all the asynchronous operations are complete (until the reference count gets to zero). If you want nodejs to exit before that, you can call process.exit() whenever you want.
Since setTimeout() is an asynchronous operation, it contributes to the reference count of unfinished asynchronous operations and thus it keeps nodejs from automatically exiting until the timer fires.
Note that setTimeout() does not use a promise - it's just a plain callback. It is not promises that nodejs waits for. It is the underlying asynchronous operations that promises often are attached to that it actually waits for.
So, if you did just this:
const myPromise = new Promise((resolve, reject) => {
console.log("did nothing here");
});
Then, nodejs would not wait for that promise all by itself, even though that promise never resolves or rejects. This is because nodejs does not actually wait for promises. It waits for the underlying asynchronous operations that are usually behind promises. Since there is no such underlying asynchronous operation behind this promise, nodejs does not wait for it.
It's also worth mentioning that you can actually tell nodejs to NOT wait for some asynchronous operations by using the .unref() method. In the example you show, if you do this:
const timer = setTimeout(() => {console.log('hello');}, 5000);
timer.unref();
Then, nodejs will NOT wait for that timer before exiting and if this is your entire program, nodejs will actually exit immediately without waiting for the timer to fire.
As an example, I have a nodejs program that carries out some nightly maintenance using a recurring timer. I don't want that maintenance timer to keep the nodejs program running if the other things that it's doing are done. So, after I set the timer, I call the .unref() method on it.

the difference between `setTimeout` and `setImmediate`

the nodejs document say
Schedules "immediate" execution of callback after I/O events' callbacks and before timers set by setTimeout and setInterval are triggered. Returns an immediateObject for possible use with clearImmediate.
but I write a test code as follows:
server = http.createServer((req, res)->
res.end()
)
setImmediate(()->
console.log 'setImmediate'
)
setTimeout(()->
console.log 'setTimeout'
, 0)
process.nextTick(()->
console.log 'nextTick'
)
server.listen(8280, ()->
console.log 'i/o event'
)
why the setTimeout always output befeore setImmediate
SetTimeOut - This type of function will be call after set time which is 0 in your case but it follows event loop. And event loop does not grantee that it will work after 0 seconds. In fact it only guarantees that function will be called after completing set time.
But, function can be called any time after completing time when node event queue is free to take up callback function
Source to understand event loop - https://www.youtube.com/watch?v=8aGhZQkoFbQ
SetImmediate - This will get called as and when it goes to stack and does not follow cycle of callback in event loop.
The main advantage to using setImmediate() over setTimeout() is setImmediate() will always be executed before any timers if scheduled within an I/O cycle, independently of how many timers are present.
However, if executed in the main module, the execution order will be nondeterministic and dependent on the performance of the process
see https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/ for more information

How does the stack work on NodeJs?

I've recently started reading up on NodeJS (and JS) and am a little confused about how call backs are working in NodeJs. Assume we do something like this:
setTimeout(function(){
console.log("World");
},1000);
console.log("Hello");
output: "Hello World"
So far on what I've read JS is single threaded so the event loop is going though one big stack, and I've also been told not to put big calls in the callback functions.
1)
Ok, so my first question is assuming its one stack does the callback function get run by the main event loop thread? if so then if we have a site with that serves up content via callbacks (fetches from db and pushes request), and we have 1000 concurrent, are those 1000 users basically being served synchronously, where the main thread goes in each callback function does the computation and then continues the main event loop? If this is the case, how exactly is this concurrent?
2) How are the callback functions added to the stack, so lets say my code was like the following:
setTimeout(function(){
console.log("Hello");
},1000);
setTimeout(function(){
console.log("World");
},2000);
then does the callback function get added to the stack before the timeout even has occured? if so is there a flag that's set that notifies the main thread the callback function is ready (or is there another mechanism). If this is infact what is happening doesn't it just bloat the stack, especially for large web applications with callback functions, and the larger the stack the longer everything will take to run since the thread has to step through the entire thing.
The event loop is not a stack. It could be better thought of as a queue (first in/first out). The most common usage is for performing I/O operations.
Imagine you want to read a file from disk. You start by saying hey, I want to read this file, and when you're done reading, call this callback. While the actual I/O operation is being performed in a separate process, the Node application is free to keep doing something else. When the file has finished being read, it adds an item to the event loop's queue indicating the completion. The node app might still be busy doing something else, or there may be other items waiting to be dequeued first, but eventually our completion notification will be dequeued by node's event loop. It will match this up with our callback, and then the callback will be called as expected. When the callback has returned, the event loop dequeues the next item and continues. When there is nothing left in node's event loop, the application exits.
That's a rough approximation of how the event loop works, not an exact technical description.
For the specific setTimeout case, think of it like a priority queue. Node won't consider dequeuing the item/job/callback until at least that amount of time has passed.
There are lots of great write-ups on the Node/JavaScript event loop which you probably want to read up on if you're confused.
callback functions are not added to caller stack. There is no recursion here. They are called from event loop. Try to replace console.log in your example and watch result - stack is not growing.

Resources