I've been testing some code to see how the Node.js event loop actually works, and I came across this piece of code:
console.time('Time spending');
let list = [];
for (let index = 0; index < 1000000; index++) {
const data = JSON.stringify({
id: Date.now(),
index,
});
list.push(data);
}
console.log(list);
console.timeEnd('Time spending');
When this code is executed, Node.js spawns eleven threads on the OS (Ubuntu running on WSL 2). Why does it do that?
This code isn't even declared as async.
That's the worker pool. As mentioned in the great guide Don't block the event loop, Node.js has an "event loop" and a worker pool. Those threads you see are the worker pool, and the size is defined with the environment variable UV_THREADPOOL_SIZE, from libuv, which Node.js uses internally. The reason node.js spawns those threads has nothing to do with your expensive loop, it's just the default behavior at startup.
There's extensive documentation on how the event loop works on the official Node.js site, but essentially some operations, like filesystem I/O, are synchronous because the underlying operating system does not offer an asynchronous interface for them (or the one it offers is too new or experimental). Node.js works around that with a thread pool: the event loop submits a task, such as reading a file (normally a synchronous job), and moves on to the next event while a pool thread does the dirty work of actually reading the file. That may block the pool thread, but it doesn't matter, because the event loop itself is not blocked. When the read is done, the result is reported back to the event loop together with the data. So, from the point of view of the event loop (and the programmer), the synchronous read was done asynchronously.
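A minimal sketch of "the synchronous read was done asynchronously": fs.readFile() returns immediately, and its callback only runs after the main thread has finished the current synchronous code. To stay self-contained, the example creates and reads its own temporary file; the file name is an assumption made for the demo.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical input file, created just for this demo.
const file = path.join(os.tmpdir(), `event-loop-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello from disk');

const order = [];

// fs.readFile() returns immediately; libuv performs the read on a
// thread-pool thread and later queues the callback for the main thread.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err;
  order.push(`callback: ${data}`);
  fs.unlinkSync(file); // clean up the demo file
  console.log(order);
});

order.push('readFile() returned');
order.push('rest of the synchronous code finished');
```

The callback entry is always last, because the event loop cannot run it while the top-level script is still executing.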
No parallel threads are being used to run your code. Node.js runs all of the code you show in a single thread. You could just do this:
setTimeout(() => {
console.log("done with timeout");
}, 10 * 60 * 1000);
And, you would see the same number of threads. What you are seeing has nothing to do with your specific code.
The other threads you see are just threads that Node.js uses for its own internal purposes, such as the worker pool for disk I/O, asynchronous crypto and some other built-in operations, plus other internal housekeeping.
Also, JavaScript code marked as async still runs in the one main JavaScript thread, so the fact that your code isn't async wouldn't change things either. From a thread point of view, it doesn't matter whether code is async or not.
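As a minimal sketch of that point: the body of an async function starts running synchronously on the calling (main) thread, and after an await the rest of it resumes later as a microtask on that same thread, not on another one. The function name doWork is hypothetical.

```javascript
const order = [];

async function doWork() {
  // Runs right now, synchronously, on the thread that called doWork().
  order.push('async body start');
  await null; // yields; the rest is queued as a microtask on the same thread
  order.push('async body after await');
}

order.push('before call');
doWork();
order.push('after call');

process.on('exit', () => console.log(order));
```

Nothing here creates a thread: "async" only changes *when* the code after await runs, not *where*.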
Your big for loop blocks the entire event loop, so no other JavaScript code or events can run until the loop finishes. There isn't really much to learn about the event loop from this code, except that your loop blocks it until the loop completes.
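A small sketch of that blocking effect, assuming a 200 ms busy-wait as a stand-in for the expensive loop in the question: a timer asked to fire after 0 ms can only run once the synchronous loop has handed the thread back.

```javascript
const order = [];
const start = Date.now();
let timerDelay;

// Ask for this callback "as soon as possible".
setTimeout(() => {
  timerDelay = Date.now() - start;
  order.push('timer');
  console.log(`0 ms timer actually fired after ${timerDelay} ms`);
}, 0);

// Synchronous busy loop (about 200 ms): nothing else can run meanwhile.
while (Date.now() - start < 200) {
  // spin
}
order.push('loop done');
```

The reported delay is at least the length of the loop, no matter what delay was requested.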
I see that the event loop typically goes through the following phases in each iteration: Timers -> I/O Callbacks -> Idle -> Poll -> Check -> Close,
as per the official Node.js docs: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/
The docs also say that the 'I/O Callbacks' phase "executes callbacks for some system operations such as types of TCP errors", for example TCP connection errors,
and that the 'Poll' phase will "retrieve new I/O events", like incoming connections, data, etc.
I am confused. In which phase are the callback handlers for I/O events like 'new HTTP request received' or 'data received from the database for a previous query' executed?
Generally
Generally, you shouldn't care about these phases. Even the distinction between a microtask (like nextTick) and a macrotask (like setImmediate) is not very important for everyday Node.js developers. That article is an in-depth look at how Node handles things internally.
All a user typically has to care about is that when they register a request - the callback they provide will eventually be called at some later time in the future which is typically fast enough as long as they don't "block the event loop" by doing a lot of synchronous CPU bound work on the Node process.
Your specific question
in which phase are I/O events (callback handlers) like 'new HTTP Request received', 'data received from database per previous query' executed?
They are executed in the poll phase:
If the poll queue is not empty, the event loop will iterate through its queue of callbacks executing them synchronously until either the queue has been exhausted, or the system-dependent hard limit is reached.
Note that the poll phase is not the only place where I/O callbacks might be executed. The (somewhat poorly named) I/O callbacks phase is also in charge of some callbacks. This is due to how libuv works and should be transparent in your code. In addition - some libraries (like DB libraries) might do their own scheduling and run callback code inside a timer (in the timers phase) - and some asynchronous callbacks like close callbacks run in their own phase.
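A minimal sketch of the "new HTTP request received" case, assuming a throwaway local server on an ephemeral port: the request handler runs as an I/O callback from the poll phase, so an immediate scheduled there runs in the check phase of the same iteration, before a 0 ms timer, which has to wait for the next timers phase. The client request exists only to trigger one request.

```javascript
const http = require('http');

const order = [];

const server = http.createServer((req, res) => {
  // This handler is an I/O callback, invoked from the poll phase.
  order.push('request handler');
  // The check phase comes right after poll; timers only on the next iteration.
  setTimeout(() => order.push('setTimeout 0'), 0);
  setImmediate(() => order.push('setImmediate'));
  res.end('ok');
});

server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  // agent: false gives a one-off connection that closes after the response.
  http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
    res.resume(); // drain the response body
    res.on('end', () => server.close());
  });
});

// Expected: request handler, setImmediate, setTimeout 0
process.on('exit', () => console.log(order));
```

Scheduling the same two calls from top-level code would not give a guaranteed order; it is being inside a poll-phase callback that makes it deterministic.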
There is nothing to be confused about. All async callbacks are executed in the callback phase. It is one of only two places where the interpreter executes JavaScript: when the script starts, and in the callback phase.
1) As I understand it, a Node.js server continues to listen on a port for incoming requests, which means the thread is continuously busy? When does it break out of this never-ending loop and check whether there are any events to be processed from the callback queue?
2) When Node.js starts executing the code of callback functions, is the server essentially stopped? Is it no longer listening for further requests? Since a single thread does both tasks, only one can be done at a time?
Is this understanding correct, or is there more to it?
Yes, you are right. Everything in Node.js runs on the main thread except network I/O calls and fs calls, which are handed to the OS kernel and to the thread pool, respectively. All synchronous code runs on the main thread, and while that code is running, Node.js cannot process any further requests. That is why a long-running for loop is not advisable: it may block the main thread for a long time. You would need to move that work to other threads or processes.
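In current Node.js, a common way to move CPU-bound work off the main thread is the built-in worker_threads module. The sketch below is a minimal example of that under stated assumptions: a hypothetical summing job stands in for the expensive work, and the interval only exists to show that the main thread keeps running callbacks meanwhile.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical CPU-bound job: sum the integers 0..n-1 in a separate thread.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 0; i < workerData; i++) sum += i;
  parentPort.postMessage(sum);
`;

function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// The main thread stays free: this interval keeps ticking during the job.
let ticks = 0;
const ticker = setInterval(() => { ticks++; }, 5);

sumInWorker(1e8).then((sum) => {
  clearInterval(ticker);
  console.log(`sum = ${sum}, main-thread ticks while waiting: ${ticks}`);
});
```

A child process (child_process.fork() or execFile()) is the heavier alternative when you want full isolation rather than a thread.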
The Node thread keeps an event loop, and whenever a task completes, it fires the corresponding event, which signals the event listener function to be executed. The event loop simply iterates over the event queue, which is basically a list of events and callbacks of completed operations. There is generally a main loop that listens for events and then triggers a callback function when one of those events is detected.
(source: abdelraoof.com)
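The description above can be modelled with a deliberately simplified toy in plain JavaScript: a queue of completed operations and a loop that runs each callback to completion, one at a time. This is only a mental model, not Node's implementation (libuv's real loop has phases and blocks in the OS while waiting for I/O); the class and method names are made up for illustration.

```javascript
// Toy model of an event loop (hypothetical; not how libuv is implemented).
class ToyEventLoop {
  constructor() {
    this.queue = []; // completed operations waiting for their callbacks
  }

  // Called when an "operation" completes: remember its callback and result.
  complete(callback, result) {
    this.queue.push({ callback, result });
  }

  // Take one event at a time and run its callback to completion.
  run() {
    while (this.queue.length > 0) {
      const { callback, result } = this.queue.shift();
      callback(result); // nothing else runs until this returns
    }
  }
}

const loop = new ToyEventLoop();
const seen = [];

loop.complete((r) => {
  seen.push(r);
  // A callback may start new work; its completion joins the back of the queue.
  loop.complete((r2) => seen.push(r2), 'db query result');
}, 'http request');
loop.complete((r) => seen.push(r), 'file contents');

loop.run();
console.log(seen); // [ 'http request', 'file contents', 'db query result' ]
```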
Similar event loop questions are here:
Node.js Event loop
Understanding the Event Loop
Source:
http://abdelraoof.com/blog/2015/10/28/understanding-nodejs-event-loop/
http://www.tutorialspoint.com/nodejs/nodejs_event_loop.htm
http://chimera.labs.oreilly.com/books/1234000001808/ch03.html#chap3_id35941348
I can't understand one thing. On the server I have a process that runs forever in async mode, for example like this:
function loginf() {
console.log(1+1);
process.nextTick(loginf);
}
loginf();
It's recursion, and as I understand it, it must cause a stack overflow and/or eat up memory.
How do I run something forever in Node.js without a memory leak? Is it possible?
If you want to do something repeatedly and you want to do it in a friendly way that leaves cycles for other events in your server to be processed, then the usual way to do that is with setInterval().
setInterval(function() {
console.log("called");
}, 1000);
Repeatedly calling the same function like you are doing with process.nextTick() is not really recursion and does not lead to a stack overflow because the stack completely unwinds before the event queue calls the function the next time. It finishes the current path of execution and then calls the function you passed to nextTick().
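A quick sketch of the difference: a plain recursive call grows the stack until V8 throws a RangeError, while re-scheduling through the event loop starts each call on an empty stack. setImmediate() is used here instead of process.nextTick() so the loop can also service other events between rounds; the count of 100,000 is arbitrary.

```javascript
const N = 100000;

// 1) Direct recursion: every call adds a stack frame.
function recurse(n) {
  return n === 0 ? 0 : 1 + recurse(n - 1);
}
let directOverflowed = false;
try {
  recurse(N);
} catch (err) {
  directOverflowed = err instanceof RangeError; // "Maximum call stack size exceeded"
}

// 2) Re-scheduling: each call starts only after the previous one has returned.
let count = 0;
function step() {
  count++;
  if (count < N) setImmediate(step);
  else console.log(`direct recursion overflowed: ${directOverflowed}; scheduled calls: ${count}`);
}
step();
```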
Your choices for this type of operation are:
setInterval()
setTimeout()
setImmediate()
process.nextTick()
All four choices let the current thread of execution finish before calling the callback function, so there is no stack build-up.
setInterval() uses a system timer set for some time in the future and allows all other events currently in the queue or in the queue before the timer time occurs to be serviced before calling the setInterval() callback. Use setInterval() when a time pause between calls to the callback is advisable and you want the callback called repeatedly over and over again.
setTimeout() uses a system timer set for some time in the future and allows all other events currently in the queue or in the queue before the timer time occurs to be serviced before calling the setTimeout() callback. You can use setTimeout() repeatedly (setting another timeout from each callback), though this is generally what setInterval() is designed for. setTimeout() in node.js does not follow the minimum time interval that browsers do, so setTimeout(fn, 1) will be called pretty quickly, though not as quickly as setImmediate() or process.nextTick() due to implementation differences.
setImmediate() runs as soon as the other events currently in the event queue have all been serviced. This is thus "fair" to other events in the system. Note that this is more efficient than setTimeout(fn, 0), because it doesn't need a system timer; it is coded right into the event subsystem. Use it when you want the stack to unwind and other events already in the queue to be processed first, but otherwise want the callback to run as soon as possible.
process.nextTick() runs as soon as the current thread of execution finishes (and the stack unwinds), but BEFORE any other events currently in the event queue. This is not "fair" and if you run something over and over with process.nextTick(), you will starve the system of processing other types of events. It can be used once to run something as soon as possible after the stack unwinds, but should not be used repeatedly over and over again.
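A minimal demonstration of that starvation: while process.nextTick() keeps re-queuing itself, neither the timer nor the immediate scheduled earlier can run. The chain is capped at three rounds (an arbitrary number) so the program finishes. When both are scheduled from the main module like this, the relative order of the 0 ms timer and the immediate is not guaranteed.

```javascript
const order = [];

setImmediate(() => order.push('setImmediate'));
setTimeout(() => order.push('setTimeout'), 0);

// Re-scheduling with process.nextTick() keeps the nextTick queue non-empty,
// so the event loop cannot move on until the chain stops.
let ticks = 0;
function tick() {
  order.push(`nextTick ${++ticks}`);
  if (ticks < 3) process.nextTick(tick);
}
process.nextTick(tick);

process.on('exit', () => console.log(order));
```

Without the cap, the timer and the immediate would never get a turn.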
Some useful references:
setImmediate vs. nextTick
Does Node.js enforce a minimum delay for setTimeout?
NodeJS - setTimeout(fn,0) vs setImmediate(fn)
setImmediate vs process.nextTick vs setTimeout
Is the Node.js I/O event loop single- or multithreaded?
If I have several I/O operations, Node puts them in an external event loop. Are they processed in sequence (fastest first), or does the event loop process them concurrently (and with which limitations)?
Event Loop
The Node.js event loop runs in a single thread, which means the application code you write is evaluated on a single thread. Node.js itself uses many threads underneath through libuv, but you never have to deal with those when writing Node.js code.
Every call that involves I/O requires you to register a callback. The call also returns immediately, which allows you to do multiple I/O operations in parallel without using threads in your application code. As soon as an I/O operation is completed, its callback is pushed onto the event loop. It will be executed as soon as all the other callbacks that were pushed onto the event loop before it have been executed.
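A small sketch of "multiple I/O operations in parallel without threads in your application code": three reads are started back to back, all three calls return immediately, and the callbacks arrive later in whatever order the reads finish. The temporary files are created by the example itself (an assumption for the demo); the shared counter needs no locking because all callbacks run on the one main thread.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Setup (demo assumption): create three small temporary files to read.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'loop-demo-'));
const files = ['a', 'b', 'c'].map((name) => {
  const file = path.join(dir, `${name}.txt`);
  fs.writeFileSync(file, `contents of ${name}`);
  return file;
});

const results = {};
let pending = files.length;

for (const file of files) {
  // Each call returns immediately; the reads proceed concurrently in libuv.
  fs.readFile(file, 'utf8', (err, data) => {
    if (err) throw err;
    results[path.basename(file)] = data;
    if (--pending === 0) {
      console.log('all reads finished', results);
      fs.rmSync(dir, { recursive: true, force: true });
    }
  });
}
console.log('all reads started');
```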
There are a few methods to do basic manipulation of how callbacks are added to the event loop.
Usually you shouldn't need these, but every now and then they can be useful.
setImmediate
process.nextTick
At no point will there ever be two true parallel paths of execution, so all operations are inherently thread safe. There usually will be several asynchronous concurrent paths of execution that are being managed by the event loop.
Read More about the event loop
Limitations
Because of the event loop, Node doesn't have to start a new thread for every incoming TCP connection. This allows Node to service hundreds of thousands of requests concurrently, as long as you aren't calculating the first 1000 prime numbers for each request.
This also means it's important to not do CPU intensive operations, as these will keep a lock on the event loop and prevent other asynchronous paths of execution from continuing.
It's also important not to use the synchronous variants of the I/O methods, as these will keep a lock on the event loop as well.
If you want to do CPU-heavy things, you should either delegate them to a different process that can execute the CPU-bound operation more efficiently, or write them as a native Node.js addon.
Read more about use cases
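As a sketch of the "different process" option, the built-in child_process module can run the CPU-bound part in a separate Node.js process and hand the result back, here via stdout. The inline script and the prime-counting task are illustrative assumptions; process.execPath is simply the path of the running node binary.

```javascript
const { execFile } = require('child_process');

// Hypothetical CPU-bound task (count primes below `limit`), run in a child process.
function countPrimesInChild(limit) {
  const script = `
    let count = 0;
    for (let n = 2; n < ${Number(limit)}; n++) {
      let prime = true;
      for (let d = 2; d * d <= n; d++) {
        if (n % d === 0) { prime = false; break; }
      }
      if (prime) count++;
    }
    process.stdout.write(String(count));
  `;
  return new Promise((resolve, reject) => {
    // The parent's event loop stays free while the child computes.
    execFile(process.execPath, ['-e', script], (err, stdout) => {
      if (err) reject(err);
      else resolve(Number(stdout));
    });
  });
}

countPrimesInChild(1000000).then((count) => {
  console.log(`primes below 1,000,000: ${count}`);
});
```

Spawning a process has a noticeable startup cost, so this pays off only for work that is genuinely expensive.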
Control Flow
In order to manage writing many callbacks you will probably want to use a control flow library.
I believe this is currently the most popular callback based library:
https://github.com/caolan/async
I've used callbacks and they pretty much drove me crazy; I've had a much better experience using Promises. bluebird is a very popular and fast promise library:
https://github.com/petkaantonov/bluebird
I've found this to be a pretty sensitive topic in the node community (callbacks vs promises), so by all means, use what you feel will work best for you personally. A good control flow library should also give you async stack traces, this is really important for debugging.
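Promises are now built into Node.js, so a separate library is often unnecessary. The sketch below swaps in the built-in util.promisify() and async/await in place of bluebird; lookupUser is a hypothetical callback-style function standing in for any API that follows the (err, result) convention.

```javascript
const util = require('util');

// Hypothetical callback-style API following the (err, result) convention.
function lookupUser(id, callback) {
  setImmediate(() => {
    if (id <= 0) callback(new Error(`invalid id ${id}`));
    else callback(null, { id, name: `user${id}` });
  });
}

// Wrap it so it returns a Promise instead of taking a callback.
const lookupUserAsync = util.promisify(lookupUser);

async function main() {
  const user = await lookupUserAsync(42);
  console.log(user.name); // user42
  try {
    await lookupUserAsync(-1);
  } catch (err) {
    console.log(`caught: ${err.message}`); // caught: invalid id -1
  }
}

main();
```

Errors from the callback become rejections, so they can be handled with an ordinary try/catch.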
The Node.js process will finish when the last callback in the event loop finishes its path of execution and doesn't register any other callbacks.
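A sketch of that exit rule using timers: an active timer keeps the process alive, but a timer that has been unref()'d does not count, so the process exits as soon as the last referenced timer has fired, even though the 10-second timer is still pending. Both delays are arbitrary.

```javascript
const order = [];

// Referenced (default) timer: keeps the event loop alive until it fires.
setTimeout(() => order.push('referenced timer fired'), 20);

// unref()'d timer: does not keep the process alive on its own.
setTimeout(() => order.push('unref timer fired'), 10000).unref();

process.on('exit', () => console.log('nothing left to do, exiting', order));
```

The same ref/unref distinction exists for servers and sockets, which is why a listening server keeps the process running.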
This is not a complete explanation; I advise you to check out the following thread, which is pretty up to date:
How do I get started with Node.js
From Willem's answer:
The Node.js event loop runs under a single thread. Every I/O call requires you to register a callback. Every I/O call also returns immediately, this allows you to do multiple IO operations in parallel without using threads.
I'd like to start with the quote above, which reflects one of the common misunderstandings about Node.js that I see everywhere.
Node.js does not magically handle all those asynchronous calls with just one thread while still keeping that thread unblocked. It uses Google's V8 engine and a library called libuv (written in C), which enables it to delegate some potentially asynchronous work to worker threads (a pool of threads waiting for work to be delegated from the main Node thread). Later, when those threads finish their work, they notify the event loop, which then runs the corresponding callbacks on the main thread; that is how the event loop knows that a worker thread's job is complete.
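A sketch of that delegation using crypto.pbkdf2(), one of the operations that runs on libuv's thread pool: four hashes are started at once, the main thread continues immediately, and each callback is later run back on the main thread by the event loop. The password, salts and iteration count are arbitrary; with the default pool size of 4, the four jobs can run in parallel.

```javascript
const crypto = require('crypto');

const order = [];
const keys = [];

for (let i = 1; i <= 4; i++) {
  // The hashing itself runs on a libuv thread-pool thread...
  crypto.pbkdf2('secret', `salt-${i}`, 100000, 64, 'sha512', (err, key) => {
    if (err) throw err;
    // ...but this callback runs on the main thread, via the event loop.
    keys.push(key);
    order.push(`hash ${i} done`);
  });
}

// Reached immediately: none of the hashing blocked the main thread.
order.push('all hashes started');

process.on('exit', () => console.log(order));
```

The completion order of the four hashes may vary between runs.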
The main point and advantage of Node.js is that you never need to care about those internal threads; they stay away from your code! All the nasty synchronization that normally has to happen in multi-threaded environments is abstracted away by Node.js, and you can happily work on your single thread (the main Node thread) in a more programmer-friendly environment, while still benefiting from the performance of multiple threads.
Below is a good post if anyone is interested:
When is the thread pool used?
To understand the event loop, you first have to know about the Node.js implementation.
The Node.js core is built from two components:
the V8 JavaScript runtime engine
libuv, which handles non-blocking I/O operations, threads, and concurrency for you.
With JavaScript you write code as if there were a single thread, but that doesn't mean your code is executed entirely on one thread; you can also run it on multiple processes using the cluster module in Node.js.
Now, say you want to execute some code like this:
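const fs = require('fs');
// fs.stat() is asynchronous: libuv does the work, and the callback runs later.
fs.stat(__filename, (err, stat) => {
  if (err) throw err;
  // do something with the stat
  console.log('second');
});
console.log('first');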
At a high level, this code executes like this:
First, the V8 engine starts running the code and, if there is no error, runs it line by line. When it gets to fs.stat, which is a Node.js API (very similar to web APIs like setTimeout that the browser handles for us), Node.js passes the operation to the libuv component and registers your callback. libuv performs the operation, and when it's done, it signals completion and your callback is put on the event queue. V8 then executes your callback, but only after checking that the call stack is empty. Always remember that!
Well, to understand how Node.js handles I/O events, you must understand the Node.js event loop properly.
From the name "event loop", we understand that it's a loop that runs cycle after cycle, round-robin, until there are no events left in the loop or the app is closed.
The event loop is one of the most important features of Node.js; it is what makes asynchronous programming in Node.js possible.
When the program starts, we are in a Node process, in the single thread where the event loop runs. Now, the most important thing we need to know is that the event loop is where all the application code inside callback functions is executed.
So basically, all code that is not top-level code runs in the event loop. Some parts (mostly heavy work) may get offloaded to the thread pool
(When is the thread pool used?); the event loop takes care of those heavy tasks and hands the results back to the corresponding events.
It is the heart of the Node architecture, and Node.js is built around callback functions: callbacks are triggered as soon as some work is finished, sometime in the future, because Node uses an event-triggered architecture.
When an application receives an HTTP request on a Node server, or a timer expires, or a file finishes being read, all of these emit events as soon as they are done with their work. Our event loop then picks up these events and calls the callback functions associated with each event. It's usually said that the event loop does the orchestration, which simply means that it receives events, calls their callback functions, and offloads the more expensive tasks to the thread pool.
Now, how does all this actually work behind the scenes? In what order are these callbacks executed?
Well, when we start our Node application, the event loop starts running right away. The event loop has multiple phases, and each phase has a callback queue. The four most important phases are 1. expired timer callbacks, 2. I/O polling and callbacks, 3. setImmediate callbacks, and 4. close callbacks. There are other phases that are used internally by Node.
So, the first phase takes care of callbacks of expired timers, for example, from the setTimeout() function. So, if there are callback functions from timers that just expired, these are the first ones to be processed by the event loop.
The most important thing is this: if a timer expires while one of the other phases is being processed, the callback of that timer will only be called once the event loop comes back to this first phase. It works like this in all four phases.
So callbacks in each queue are processed one by one until there are none left in the queue, and only then does the event loop enter the next phase. For example, suppose 1000 setTimeout timers have expired and the event loop is in the first phase: all 1000 setTimeout callbacks will execute one by one, and then it will go to the next phase (I/O polling and callbacks).
Next up, we have I/O polling and the execution of I/O callbacks. Here I/O stands for input/output, and polling basically means looking for new I/O events that are ready to be processed and putting them into the callback queue.
In the context of a Node application, I/O mainly means things like networking and file access, so this phase is where probably 99% of general application code gets executed.
The next phase is for setImmediate callbacks. setImmediate is a special kind of timer that we can use if we want to process callbacks immediately after the I/O polling and execution phase.
And finally, the fourth phase is for close callbacks: in this phase, all close events are processed, for example when a server or a WebSocket shuts down.
These are the four phases of the event loop, but besides these four callback queues there are actually two other queues:
1. the nextTick() queue
2. the microtask queue (which is mainly for resolved promises)
If there are any callbacks in one of these two queues, they are executed as soon as the currently running callback finishes, instead of waiting for the entire loop cycle to finish.
In other words, between callbacks, and therefore also after each of these four phases, any callbacks in these two special queues are executed right away. Now imagine that a promise resolves and returns some data from an API call while the callback of an expired timer is running. In this case, the promise callback will be executed right after the timer callback finishes.
The same logic also applies to the nextTick() queue. process.nextTick() is a function we can use when we really, really need to execute a certain callback right after the current operation, before the event loop continues. It's a bit similar to setImmediate, with the difference that setImmediate only runs after the I/O callback phase.
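A small sketch of that behavior, assuming Node.js 11 or later: a nextTick and a promise callback queued inside the first timer callback both run before the second timer callback, even though both timers belong to the same timers phase. Within that gap, the nextTick queue is drained before the promise microtasks.

```javascript
const order = [];

setTimeout(() => {
  order.push('timeout 1');
  // Both of these are queued while timer 1's callback is running.
  Promise.resolve().then(() => order.push('promise after timeout 1'));
  process.nextTick(() => order.push('nextTick after timeout 1'));
}, 0);

setTimeout(() => order.push('timeout 2'), 0);

// Expected: timeout 1, nextTick after timeout 1, promise after timeout 1, timeout 2
process.on('exit', () => console.log(order));
```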
All of the above can happen in one tick (cycle) of the event loop. In the meantime, new events may have arisen in a particular phase, or old timers may have expired; the event loop will handle those in a new cycle.
Now it's time to decide whether the loop should continue to the next tick or whether the program should exit. Node simply checks whether there are any timers or I/O tasks still running in the background; if there aren't, it exits the application. If there are pending timers or I/O tasks, Node continues running the event loop and starts the next cycle.
For example, when a Node application is listening for incoming HTTP requests, we are basically running an infinite I/O task in the event loop; that is why Node.js keeps running and listening for new HTTP requests instead of exiting the application.
Also, when we are writing or reading a file in the background, that is also an I/O task, and it makes sense that the app doesn't exit while it's working with that file, right?
Now, the event loop in practice:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
});
setImmediate(()=>console.log('Immediate 1 finished'));
console.log('Hello from the top level code');
Output:
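Well, the first line is "Hello from the top level code". Yes, that is expected, because this is code that gets executed immediately. After that we have three more outputs. "Timer 1 finished" comes first, which is expected because of phase one, as we discussed before. Following the phases, you might then expect "I/O finished" before "Immediate 1 finished", because we said that setImmediate runs after the I/O callback phase. But this code is actually not in an I/O cycle: these setTimeout and setImmediate calls are not running inside the event loop, because they are not inside any callback function. Reading the file also usually takes longer than one loop iteration, so the immediate runs first. (When scheduled from the main module like this, the relative order of a 0 ms timer and setImmediate is not guaranteed either.)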
Now let's do another test:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
setImmediate(()=>console.log('Immediate 1 finished'));
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
setTimeout(()=>console.log('Timer 2 finished'), 0);
setImmediate(()=>console.log('Immediate 2 finished'));
setTimeout(()=>console.log('Timer 3 finished'), 0);
setImmediate(()=>console.log('Immediate 3 finished'));
});
console.log('Hello from the top level code');
Output:
The output is as expected right? Now let's add some delay:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
setImmediate(()=>console.log('Immediate 1 finished'));
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
setTimeout(()=>console.log('Timer 2 finished'), 3000);
setImmediate(()=>console.log('Immediate 2 finished'));
setTimeout(()=>console.log('Timer 3 finished'), 0);
setImmediate(()=>console.log('Immediate 3 finished'));
});
console.log('Hello from the top level code');
output:
Everything scheduled inside the I/O callback runs in the first cycle after it, except Timer 2: because of its 3000 ms delay, it only runs in a later cycle.
Now let's add process.nextTick() and see how Node.js behaves:
const fs = require('fs');
setTimeout(()=>console.log('Timer 1 finished'), 0);
setImmediate(()=>console.log('Immediate 1 finished'));
fs.readFile('test-file.txt', ()=>{
console.log('I/O finished');
setTimeout(()=>console.log('Timer 2 finished'), 3000);
setImmediate(()=>console.log('Immediate 2 finished'));
setTimeout(()=>console.log('Timer 3 finished'), 0);
setImmediate(()=>console.log('Immediate 3 finished'));
process.nextTick(()=>console.log('Process Next Tick'));
});
console.log('Hello from the top level code');
Output:
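Well, after "I/O finished", the first callback executed is the one passed to process.nextTick(), as expected, right? That is because the nextTick queue (a separate queue, processed before the promise microtask queue) is drained as soon as the current callback finishes, before the event loop moves on.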
If you run this simple node code
console.log('starting')
setTimeout(()=>{
console.log('0sec')
}, 0)
setTimeout(()=>{
console.log('2sec')
}, 2000)
console.log('end')
What do you expect output to be?
If it's:
starting
0sec
end
2sec
that's the wrong guess; we will get:
starting
end
0sec
2sec
because Node.js will never run callbacks from the event loop before the synchronous top-level code (called main() here) has finished.
So basically, first main() goes onto the call stack, then console.log('starting'), so you see it printed first. After that, setTimeout(() => { console.log('0sec') }, 0) goes onto the stack, and its timer is handed over to Node's APIs: libuv (written in C) keeps track of the timer from the event loop itself, so no JavaScript needs to run while it waits, even though the code above is single-threaded. Once the time is up, the callback is queued on the event loop, but Node can't run it while the call stack is not empty. Next, the 2-second setTimeout is pushed onto the stack and handed to libuv in the same way, and in the meantime the next line, console.log('end'), is executed. That is why we see end before 0sec: Node is non-blocking. When the top-level code is over, main() is popped off the stack, and it's the event loop's turn: first 0sec is printed, and after 2 seconds, 2sec.