Node.js: why are idle and prepare phases only used internally?

The Node.js documentation describes the so-called phases of its underlying event loop.
It also explicitly states that the idle and prepare phases are only used internally.
Since the event loop of Node.js is the one of libuv, it goes without saying that those phases are probably mapped onto libuv's idle and prepare handles.
They would help achieve greater granularity when organizing the tasks in a piece of software. In particular, they are the only way to schedule something between the execution of the I/O callbacks and the poll phase.
Anyway, they are not exposed from the underlying environment.
What's the reason those phases have been withheld, effectively giving users an apparently poorer event loop than the one offered by libuv?
Is there any other way to schedule tasks the way mentioned above?
Side note: it's just curiosity.
I have worked with both libuv and Node.js and I noticed this, so I want to know whether there is a technical reason for it or... well, that is simply how it has been designed and that's all, no particular reason.

I don't think there is a specific reason to "forbid" them. Moreover, they are not really forbidden, they are just not exposed. You could create a Node addon which allows you to create idle and prepare handles and there would be no problem at all. There are some things you must be aware of:
Idle handles have a terrible name: they don't run when the loop is actually idle. They run once per loop iteration, right after the timers, and if any idle handle is active the loop will block for I/O for zero seconds. So they can be dangerous, because the CPU will spin if you don't stop them.
Callbacks registered with process.nextTick are called when the C++ <-> JS boundary is crossed (see calls to MakeCallback), so I/O callbacks could be deferred and run a bit later. If you exposed prepare handles to JS, you would use MakeCallback in the C++ code, so some of the process.nextTick callbacks would also be called alongside your prepare callbacks.
As a general note: idle, check and prepare handles were somehow inherited from libev (which libuv used to use internally). Check and prepare can be used when embedding libuv with other libraries and idle handles are a bit weird, as I mentioned above. Also, libuv follows its own path these days, so not everything libuv has will end up exposed in Node land.

You could ask the reverse question: "why do you need the idle phase, for example, to be exposed?" You can just use setImmediate().
Also, why do you want to execute something between the I/O callbacks and the polling phase, when you don't explicitly control those things anyway?
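For illustration, here is a minimal sketch using only Node core of what you can observe from JavaScript: inside an I/O callback, process.nextTick runs as soon as the current callback returns, setImmediate runs in the check phase of the same loop iteration, and a zero-delay timer only fires on the next iteration.

const fs = require('fs');

fs.readFile(__filename, () => {                     // poll phase: an I/O callback
  setTimeout(() => console.log('timeout'), 0);      // timers phase, next iteration
  setImmediate(() => console.log('immediate'));     // check phase, this iteration
  process.nextTick(() => console.log('nextTick'));  // runs right after this callback returns
});
// Expected output: nextTick, immediate, timeout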

Related

What does it mean by "asynchronous I/O primitives" in nodejs?

I was going through Node.js documentation, and couldn't understand the line :
A Node.js app runs in a single process, without creating a new thread for every request. Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking and generally, libraries in Node.js are written using non-blocking paradigms, making blocking behavior the exception rather than the norm.
Source : Introduction to node js
I couldn't understand specifically:
[...] Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking [..]
Does it simply mean that it has built-in functionality that allows it to work asynchronously?
If not, then what is this set of asynchronous I/O primitives? If anyone could point me to some links for a better understanding or for getting started with Node.js, that would be really great.
P.S.: I have practical experience with Node.js, where I understand how the code works but not why it works, so I want to understand the theoretical part and what is actually going on in the background.
Does it simply mean that it has built-in functionality that allows it to work asynchronously?
Yes, that's basically what it means.
In a "traditional" one-thread-per-connection* model you accept a connection and then hand off the handling of that request to a thread (either a freshly started one or from a pool, doesn't really change much) and do all work related to that connection on that one thread, including sending the response.
This can easily be done with synchronous/blocking I/O: have a read method that simply returns the read bytes and a write method that blocks until the writing is done.
This does, however, mean that the thread handling that request cannot do anything else, and also that you need many threads to be able to handle many concurrent connections/requests. And since I/O operations take a relatively huge amount of time (when measured against the speed of memory access and computation), most of these threads will be waiting for one I/O operation or another most of the time.
Having asynchronous I/O and an event-based core architecture means that when a I/O operation is initiated the CPU can immediately go on to processing whatever action needs to be done next, which will likely be related to an entirely different request.
Therefore one can handle many more requests on a single thread.
The "primitives" just means that basic I/O operations such as "read bytes" and "write bytes" to/from network connections or files are provided as asynchronous operations and higher-level operations need to be built on top of those (again in an asynchronous way, to keep the benefits).
As a side note: many other programming environments have either had asynchronous I/O APIs for a long time or have gotten them in recent years. The one thing that sets Node.js apart is that it's the default option: if you're reading from a socket or a file, then doing it asynchronously is what is "normal" and blocking calls are the big exception. This means that the entire ecosystem surrounding Node.js (i.e. almost all third-party libraries) works with that assumption in mind and is written in that same manner.
So while Java, for example, has asynchronous I/O, you lose that advantage as soon as you use any I/O-related libraries that only support blocking I/O.
* I use connection/request interchangeably in this answer, under the assumption that each connection contains a single request. That assumption is usually wrong these days (most common protocols allow multiple request/response pairs in a single connection), but handling multiple requests on a single connection doesn't fundamentally change anything about this answer.
It means Node.js doesn't halt on input/output operations. Suppose you need to do some task that has a blocking condition, e.g. "if the space key is pressed then do this" or "while the Esc key isn't pressed keep taking input". Since Node.js is single-threaded, a blocking wait like that would stop all other operations until it finishes. Asynchronous code allows the application to keep doing other tasks while one of them is in progress. That is also why we use await to get the value out of a promise in a Node.js async function: when the data has been processed, execution comes back to the line where the await is written and picks up the value, without the thread ever having been blocked.
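A small sketch of that await behaviour, using the promise-based fs API from Node core (example.txt is just a placeholder file):

const { readFile } = require('fs/promises');

async function main() {
  console.log('before read');
  const data = await readFile('./example.txt', 'utf8'); // execution resumes here once the read completes
  console.log('after read:', data.length, 'characters');
}

main();
console.log('main() is still waiting for the file, but nothing is blocked');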

How Node.js event loop model scales well

I know this question has been discussed in the past in much detail (How is Node.js inherently faster when it still relies on Threads internally?) but I still fail to properly understand the Node.js event loop model and how, being single-threaded, it handles concurrent requests.
Up till now my understanding is: we receive an IO request --> a thread is spawned internally by node.js and the IO request is handed to it --> since this is an IO request, the CPU hands it to the DMA controller and frees this thread --> this thread goes back into the thread pool to serve a different request --> the DMA is still doing the IO; once the DMA has all the data, a sort of event is fired --> this event is captured by the node.js system and it puts the supplied callback function on the event loop --> whenever the event loop gets the opportunity it executes the callback on the data fetched by the IO --> thanks to closures, the callback function executes only on the data fetched for it
So this process goes on repeatedly. Could someone please elucidate on my understanding and provide some more information?
There is only one thread (the main thread) for dealing with network I/O (file I/O is a slightly different story because not all platforms provide usable asynchronous, non-blocking file I/O APIs, so the synchronous file I/O APIs are used on those platforms in a threadpool).
So when network requests come in, they're all handled by the main thread which uses (indirectly via libuv) epoll/kqueue/IOCP/etc. for detecting (in a non-blocking way) when data is available (or when there is an incoming TCP connection for example). If there is data available, it calls out appropriately to javascript as needed, passing the socket data. If there is no data on the socket (and there's nothing else for the event loop to do, e.g. firing timers), then execution proceeds to the next iteration of the event loop where the process starts all over again.
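To make that concrete, here is a minimal sketch using only Node core: one process and one JS thread, yet many sockets can be served because the main thread only runs a callback when the OS reports that something happened on a socket.

const net = require('net');

const server = net.createServer((socket) => {
  // Runs on the main thread whenever a new connection is detected.
  socket.on('data', (chunk) => {
    // Also runs on the main thread, but only when data is actually available.
    socket.write(chunk); // echo it back; the write is non-blocking as well
  });
});

server.listen(8124); // thousands of concurrent connections, still one JS thread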
As far as associating socket data with socket javascript objects goes, it's the combination of C++ wrapper objects (e.g. tcp_wrap, udp_wrap, etc.) and javascript objects that makes sure the data gets to the appropriate place.
Here's a slightly older diagram that explains what happens in a single cycle of node's event loop. Some of it may have changed slightly since node v0.9, but it gets you the general idea.
node.js has a single threaded model which eliminates the need for locks and semaphores (used in the traditional multithreaded model). Locks and semaphores can add some costs in terms of performance and, more importantly, can provide a lot of rope to hang yourself with (in other words, many pitfalls). IO operations happen in parallel and because work between IOs is typically very small, this single threaded model usually works quite nicely.
(side note: if you have an app that does a lot of work between IO operations, i.e. a CPU-intensive app, that is a case where node doesn't scale well)
I like to think of the argument for why node's model scales well is the same as why people think NoSQL scales better than SQL databases. Obviously Java (multi-threaded) and SQL scale; big companies like Facebook and Twitter have proven that. However, like in SQL, there are a lot of things you could do incorrectly to slow down your performance. Node.js doesn't eliminate all potential problems, it just does a good job of restricting many of the common causes.

What is the difference between single thread and non-blocking I/O operation in NodeJs?

I've been reading up and going through as much of NodeJs code as I can but I'm a bit confused about this:
What exactly does Node being single threaded mean and what does non-blocking I/O mean? I can achieve the first one by spawning a child process and the second one by using the async library. But I wanted to be clear on what they mean and how non-blocking I/O can still slow down your app.
I'll try my best to explain.
Single-threaded means that the Node.js JavaScript runtime - at any particular point in time - is only executing one piece of code from all the code it has loaded. In effect, it starts somewhere and works its way down through all instructions (the call stack) until it's done. While it's executing code, nothing can interrupt this process, and all I/O must wait. Thankfully, most call stacks are relatively short, and lots of things we do in Node.js are more of the "bookkeeping" type than CPU-heavy.
Being single-threaded though, any instruction that takes a long time is a huge problem for the responsiveness of the system. The runtime can only do one thing at a time, so everything must wait until that instruction has finished. If any "I/O" instruction (say reading from disk) blocked execution, the system would be unnecessarily unavailable during that time.
But thankfully, we've got non-blocking I/O.
Instead of waiting for a file to be read:
console.log(readFileSync(filePath))
you write your code so that you DON'T wait for a file to be read:
readFile(filePath)
The readFile call returns almost instantly (perhaps in a few nanoseconds), so the runtime can continue executing the instructions that come next. But if the readFile call returns before the data has been read, there's no way that the readFile call can return the file contents. That's where callbacks come in:
readFile(filePath, function(err, contents) { console.log(contents) })
Still, the readFile call returns almost instantly. The runtime can continue. It will first finish the work in front of it (all instructions coming after readFile). Nothing is done with the function that's passed, other than storing a reference to it.
Then, at some later point in time (perhaps 10ms, 100ms, or 1000ms later), when reading the file has completed, the callback is called with the full contents of the file as its second argument. Before that time, any number of other batches of work could have been done by the runtime.
Now I will address your comments about spawning child processes and Async library. You are wrong on both accounts.
Spawning a child process is a way to let Node.js use more than one CPU core. Being single-threaded, a single Node.js process has no use for more than one core. Still, if you are on a multi-core computer, you may want to use all those cores. Hence, start multiple Node.js processes.
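A minimal sketch of that, using the built-in cluster module (cluster.isPrimary is called cluster.isMaster on older Node versions):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork(); // one Node.js process per CPU core
  }
} else {
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(8000); // the workers share the listening socket
}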
The Async library will not give you non-blocking I/O; Node.js gives you that. What Node.js does not give you itself is an easy way to deal with data coming in from multiple callbacks. The Async library can help a great deal with that.
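For example, a small sketch assuming the async npm library (the file names are placeholders), collecting the results of two parallel reads:

const async = require('async');
const fs = require('fs');

async.parallel(
  [
    (cb) => fs.readFile('./a.txt', 'utf8', cb),
    (cb) => fs.readFile('./b.txt', 'utf8', cb),
  ],
  (err, results) => {
    if (err) throw err;
    console.log(results[0].length, results[1].length); // both reads have finished here
  }
);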
As I'm not an expert on Node.js internals, I welcome corrections!
Related questions:
asynchronous vs non-blocking
What's the difference between: Asynchronous, Non-Blocking, Event-Base architectures?

How does node.js know if a call has to be executed in the thread pool or in the event loop?

I've read that node.js uses both threads and an event loop.
I'm curious to know how it decides how to treat a callback... Is it specified by the EventEmitter (and does the engineer know whether it is going to be blocking or not)?
Or is it the core itself that chooses at runtime?
If it's the latter, how does it detect whether a call has to be run async or in a thread?
I've already read a lot of resources but I didn't find anything about this. I'm reading the source code, but it's hard since it has been a long time since I last coded in C++.
Thanks.
Your JavaScript code always runs in a single thread. That's because the V8 JavaScript engine is not threadsafe.
However, as an implementation detail of some of the C++ code, there may be threads. For example, suppose you write some JavaScript code that connects to a database. Your JavaScript code will of course be async, like any good Node code. But async coding is very uncommon in the C/C++ world, so the database vendor probably didn't write an async C/C++ API.
So when someone is writing a Node package for database access, they have to write a shim that adapts between the "blocking" C++ behavior and the "non-blocking, event-driven" Node behavior. When you call, say, a "connect" method, that goes to C++ code that spawns a new thread, and that thread issues a (blocking) "connect" call to the database, which blocks the thread until the connection is done. Then the C++ code will communicate the "connection done" back to the event queue, and the next time the main (JavaScript) thread polls the event queue, your callback will fire.
So yes, there are threads, but their use should be completely transparent to you. When you're writing Node.js code in JavaScript, you don't need to worry about threads -- you just care that things happen when they're supposed to. Package authors may use threads, but that's purely an implementation detail and you should never have to worry about it. Your JavaScript code never explicitly uses threads.
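As a small illustration using only Node core: crypto.pbkdf2 does its heavy lifting on libuv's threadpool, yet from JavaScript you only ever see the callback fire on the main thread.

const crypto = require('crypto');

console.log('start');
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err, key) => {
  if (err) throw err;
  console.log('key derived; this callback runs on the main JS thread');
});
console.log('end'); // prints before the callback: the main thread never blocked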

How does Asynchronous programming work in a single threaded programming model?

I was going through the details of node.js and came to know that it supports asynchronous programming even though it essentially provides a single-threaded model.
How is asynchronous programming handled in such a case? Is it that the runtime itself creates and manages threads, but the programmer cannot create threads explicitly? It would be great if someone could point me to some resources to learn about this.
Say it with me now: async programming does not necessarily mean multi-threaded.
Javascript is a single-threaded runtime - you simply aren't able to create new threads in JS because the language/runtime doesn't support it.
Frank says it correctly (although obtusely). In English: there's a main event loop that handles when things come into your app. So, "handle this HTTP request" will get added to the event queue, then handled by the event loop when appropriate.
When you call an async operation (a mysql db query, for example), node.js sends "hey, execute this query" to mysql. Since this query will take some time (milliseconds), node.js performs the query using the MySQL async library - getting back to the event loop and doing something else there while waiting for mysql to get back to us. Like handling that HTTP request.
Edit: By contrast, node.js could simply wait around (doing nothing) for mysql to get back to it. This is called a synchronous call. Imagine a restaurant, where your waiter submits your order to the cook, then sits down and twiddles his/her thumbs while the chef cooks. In a restaurant, like in a node.js program, such behavior is foolish - you have other customers who are hungry and need to be served. Thus you want to be as asynchronous as possible to make sure one waiter (or node.js process) is serving as many people as they can.
Edit done
Node.js communicates with mysql using C libraries, so technically those C libraries could spawn off threads, but inside Javascript you can't do anything with threads.
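A minimal sketch of that pattern, assuming the popular mysql npm package (not part of Node core; the connection details are placeholders):

const mysql = require('mysql');

const connection = mysql.createConnection({ host: 'localhost', user: 'me', database: 'test' });

connection.query('SELECT COUNT(*) AS n FROM users', (err, rows) => {
  // Runs only when MySQL has answered; the event loop was free in the meantime.
  if (err) throw err;
  console.log('users:', rows[0].n);
});

console.log('query sent, back to the event loop to handle other requests');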
Ryan said it best: sync/async is orthogonal to single/multi-threaded. In both the single- and multi-threaded cases there is a main event loop that calls registered callbacks using the Reactor Pattern. In the single-threaded case the callbacks are invoked sequentially on the main thread. In the multi-threaded case they are invoked on separate threads (typically using a thread pool). It is really a question of how much contention there will be: if all requests require synchronized access to a single data structure (say a list of subscribers), then the benefits of having multiple threads may be diminished. It's problem dependent.
As far as implementation goes, if a framework is single-threaded then it is likely using the poll/select system call, i.e. the OS is triggering the asynchronous event.
To restate the waiter/chef analogy:
Your program is a waiter ("you") and the JavaScript runtime is a kitchen full of chefs doing the things you ask.
The interface between the waiter and the kitchen is mediated by queues so requests are not lost in instances of overcapacity.
So your program is assigned one thread of execution. You can only wait one table at a time. Each time you want to offload some work (like making the food/making a network request), you run to the kitchen and pin the order to a board (queue) for the chefs (runtime) to pick up when they have spare capacity. The chefs will let you know when the order is ready (they will call you back). In the meantime, you go wait another table (you are not blocked by the kitchen).
So the accepted answer is misleading. The JavaScript runtime is definitionally multithreaded because I/O does not block your JavaScript program. As a waiter you can continue serving customers, while the kitchen cooks. That involves at least two threads of execution. The reality is that the runtime will maintain several threads of execution behind the scenes, in order to efficiently serve the single thread directly corresponding to your script.
By design, only one thread of execution is assigned to the synchronous running of your JavaScript program. This is a good thing because it makes your program easier to reason about than having to handle multiple threads of execution yourself. Don't worry: your JavaScript program can still get plenty complicated though!
