How does node.js/libuv support async io using the reactor pattern

The reactor pattern which is utilized by libuv for handling IO is synchronous by design but libuv supports async io. How is this possible? Does libuv extend the reactor's design somehow to support async io? Does using multiple threads/event loops aid in achieving this?

The I/O model of Node and libuv is very similar to what nginx does internally.
libuv uses a single-threaded event loop and non-blocking asynchronous I/O. All functions are synchronous in the sense that they run to completion, but promises and generators can make it appear that they don't (in fact, invoking a generator function is non-blocking and returns the generator object immediately, and generator methods like .next() run to completion). The newer async/await syntax makes this very convenient.
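To make the generator point concrete, here is a minimal sketch. The generator name `steps` and the `log` array are made up for illustration; the only claim is about when the body runs.

```javascript
// Hypothetical generator, used only to show when its body actually runs.
const log = [];

function* steps() {
  log.push('body started');
  yield 1;
  log.push('body resumed');
  yield 2;
}

const it = steps();              // returns a generator object; the body has not run yet
log.push('generator created');

const first = it.next();         // runs synchronously up to the first yield
log.push('first next() returned');

const second = it.next();        // resumes and runs synchronously up to the second yield

console.log(log, first, second);
```

Each call runs to completion on the calling thread; nothing here is executed in the background.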
For operations that cannot be accomplished in a non-blocking way Node uses a thread pool to run the blocking operations in separate threads but this is done transparently and it is never exposed to the application code written in JavaScript (you need to step down to C++ to work with that directly).
See: http://docs.libuv.org/en/v1.x/design.html
Unlike network I/O, there are no platform-specific file I/O primitives libuv could rely on, so the current approach is to run blocking file I/O operations in a thread pool. [...]
libuv currently uses a global thread pool on which all loops can queue work on. 3 types of operations are currently run on this pool:
File system operations
DNS functions (getaddrinfo and getnameinfo)
User specified code via uv_queue_work()
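As a rough illustration of the file system case, the sketch below issues an asynchronous read and shows that the call returns before the callback runs. The scratch file name is an assumption made up for this example; it is created in the OS temp directory and removed afterwards.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file so the example does not depend on any existing file.
const file = path.join(os.tmpdir(), `libuv-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello from the thread pool');

const events = [];

const done = new Promise((resolve, reject) => {
  // The blocking read is handed off; the callback is queued back to the event loop.
  fs.readFile(file, 'utf8', (err, text) => {
    if (err) return reject(err);
    fs.unlinkSync(file);             // clean up the scratch file
    events.push('callback ran');
    resolve(text);
  });
});

events.push('readFile() returned');  // reached before the file has been read

done.then((text) => console.log(events, text));
```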
See also those answers for more details:
what is mean by event loop in node.js ? javascript event loop or libuv event loop?
NodeJS event loop internal working
Prevent NodeJS from exiting event-loop
How node.js server serve next request, if current request have huge computation?
Which would be better for concurrent tasks on node.js? Fibers? Web-workers? or Threads?
Speed up setInterval
Async.js - Is parallel really parallel?
Node.js: Asynchronous Callback Execution. Is this Zalgo?
See the links and illustrations in those answers. There are a lot of resources to read about this topic.

Related

Why "worker_threads" when we have default worker pool?

The cluster approach is clear to me, since it deploys entirely separate processes. And I guess the professional programmers made the "worker_threads" library for some good reason... but I still need to clear up this point for my understanding:
In a normal single-threaded process, the event loop thread has the aid of the default worker pool to offload its heavy I/O tasks, so the main thread is not blocked.
At the same time, user-defined "worker threads" will be used for the same reason, with their own event loops and Node.js instances.
What's the point of spawning those event loops and Node.js instances when they are not the bottleneck, given that libuv is meant to manage spawning the workers?
Does this mean that the default worker pool may not be enough? Is it just a matter of quantity, or of concept?

There are two types of operations (calls) in Node.js: blocking and non-blocking.
Non-blocking
Node.js uses libuv for non-blocking I/O. Network, file, and DNS I/O operations are run asynchronously by libuv. Node.js uses the following scheme:
Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's thread pool is used to create asynchronous Node APIs based on synchronous system APIs. Node.js APIs that use the thread pool are:
all fs APIs, other than the file watcher APIs and those that are explicitly synchronous
asynchronous crypto APIs such as crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair()
dns.lookup()
all zlib APIs, other than those that are explicitly synchronous
So we don't have direct access to the libuv thread pool from JavaScript. We may define our own uses of the thread pool using C++ add-ons.
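A small sketch of two of the APIs listed above, crypto.pbkdf2() and zlib.gzip(), wrapped with util.promisify. The function name `demo` and the input values are made up for the example.

```javascript
const crypto = require('crypto');
const zlib = require('zlib');
const { promisify } = require('util');

const pbkdf2 = promisify(crypto.pbkdf2);
const gzip = promisify(zlib.gzip);
const gunzip = promisify(zlib.gunzip);

// Hypothetical helper: starts both operations before either one has finished.
async function demo() {
  const [key, packed] = await Promise.all([
    pbkdf2('password', 'salt', 10000, 32, 'sha256'), // key derivation off the main thread
    gzip(Buffer.from('hello libuv')),                // compression off the main thread
  ]);
  const unpacked = await gunzip(packed);
  return { keyLength: key.length, text: unpacked.toString() };
}

const result = demo();
result.then((r) => console.log(r));
```

The JavaScript side only sees callbacks or promises; how many of these run at once depends on the pool size, which defaults to 4 threads and can be changed with the UV_THREADPOOL_SIZE environment variable.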
Blocking calls
Node.js executes blocking code on the main thread. fs.readFileSync(), compression algorithms, encrypting data, image resizing, and calculating primes over a large range are some examples of blocking operations. The golden rule of Node.js is never to block the event loop (main thread). We can execute these operations asynchronously by creating a child process using the cluster module or the child_process module. But creating a child process is heavy in terms of OS resources, and that's why worker threads were introduced.
Using a worker thread, you can execute blocking JavaScript code off the main thread and communicate with the parent (main) thread via message passing. Worker threads are lightweight compared to a child process.
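A minimal sketch of that idea, assuming a prime-counting loop as the CPU-bound job. The helper name `countPrimes` is made up, and the worker source is passed as a string with `eval: true` only to keep the example in a single file.

```javascript
const { Worker } = require('worker_threads');

// Source of the worker, kept inline so the example is self-contained.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // CPU-bound loop: count the primes below workerData.limit.
  let count = 0;
  for (let n = 2; n < workerData.limit; n++) {
    let prime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { prime = false; break; }
    }
    if (prime) count++;
  }
  parentPort.postMessage(count);
`;

// Hypothetical helper: runs the loop in a worker and resolves with its message.
function countPrimes(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const primes = countPrimes(100);
primes.then((count) => console.log('primes below 100:', count));
```

The main thread stays free to handle other events while the loop runs; the result comes back through message passing.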
Read more here:
https://nodesource.com/blog/worker-threads-nodejs
https://blog.insiderattack.net/deep-dive-into-worker-threads-in-node-js-e75e10546b11

Why is async programming needed in NodeJS?

I have a problem with the concept of async in NodeJS. I have read a lot about the event loop in NodeJS. They say things like:
The event loop is what allows Node.js to perform non-blocking I/O
operations
or
Node uses the Worker Pool to handle "expensive" tasks. This includes
I/O for which an operating system does not provide a non-blocking
version, as well as particularly CPU-intensive tasks.
or
These are the Node module APIs that make use of this Worker Pool such
as File System(fs)
So, I found that Node manages I/O using a thread pool. Now my question is: if Node is managing them, why do we need to use async programming at all in NodeJS? And what's the reason behind some modules like BlueBird?

tl;dr: You need async to avoid blocking the Event Loop.
NodeJS uses a certain number of threads to handle clients. There basically are two types of threads:
Event Loop (or your main thread)
Worker Pool (or threadpool)
The Event Loop:
Basically the reason why async programming is needed:
Once all events are registered, NodeJS enters the Event Loop and handles all incoming requests as well as outgoing responses. All of them pass through the Event Loop.
Worker Pool
As you already said, NodeJS uses the Worker Pool to perform I/O and CPU-intensive tasks.
Asynchronous Code
In order to prevent blocking the main thread, you want to keep your Event Loop clean and delegate certain tasks. This is where async code is needed. That way your code becomes non-blocking. The terminology concerning async and non-blocking is a bit vague though. To clarify:
Async Code: Performs certain tasks in parallel
Non-Blocking: Basically Polling without blocking further code.
In NodeJS however, Async is often used for I/O operations. There it doesn't just mean "perform in parallel", because it mostly means "don't block and get the signal".
So in order to make the Event Loop of NodeJS efficient, we don't want to wait for an operation to finish. Therefore we register an async "listener" instead. This allows NodeJS to efficiently manage its own resources.
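To see what blocking the Event Loop looks like in practice, here is a small sketch: a timer asks to fire after 10 ms, but a synchronous busy-wait keeps the thread occupied for about 200 ms. The durations are arbitrary values picked for the example.

```javascript
const start = Date.now();

// A timer that asks to fire after 10 ms.
const fired = new Promise((resolve) => {
  setTimeout(() => resolve(Date.now() - start), 10);
});

// Synchronous busy-wait for about 200 ms: nothing else can run on this thread meanwhile.
while (Date.now() - start < 200) { /* spin */ }

fired.then((elapsed) => console.log(`timer fired after ${elapsed} ms`));
```

The timer callback cannot run until the loop finishes, so it fires no earlier than 200 ms after the start. Every other pending callback (requests, responses) would be delayed in the same way.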
BlueBird (or Promises in general):
Libraries like BlueBird which you mentioned, aren't required anymore because NodeJS supports promises out of the box (see note here).
Promises are just another way of writing asynchronous code. So are Async/Await and Generator Functions.
Side note: Functions defined with the async keyword always return a promise.
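A short sketch of the same delayed operation written with a callback, with a promise, and with async/await. The function names `delayCallback`, `delayPromise` and `delayAsync` are made up for the example.

```javascript
const { promisify } = require('util');

// Callback style: the result is delivered through cb(err, value).
function delayCallback(ms, value, cb) {
  setTimeout(() => cb(null, value), ms);
}

// Promise style: the result is delivered by resolving the promise.
function delayPromise(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// async/await style: reads like synchronous code but still returns a promise.
async function delayAsync(ms, value) {
  return await delayPromise(ms, value);
}

const returned = delayAsync(5, 'async');     // a Promise, not the string
console.log(returned instanceof Promise);

const all = Promise.all([
  promisify(delayCallback)(5, 'callback'),   // a callback API wrapped as a promise
  delayPromise(5, 'promise'),
  returned,
]);
all.then((values) => console.log(values));
```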

Asynchronous process handler in node

Which process handles async or simultaneous work to happen in node js. Is there any specific api that takes care of all these events to happen in queue?

Is there any specific api that takes care of all these events to happen in queue?
No, not accessible from Javascript. The event queue is completely under the covers. You don't access it directly.
The implementation of asynchronous operations is all handled in native code. When an async operation completes, its native code calls an internal C++ API that inserts the completion event into the node.js event queue. If no Javascript is currently running in node.js at that moment, then inserting the item in the event queue will trigger it to get pulled out of the queue and the callback associated with it will be run. If Javascript is running at the moment, it will stay in the event queue until the current piece of running Javascript finishes at which point the interpreter will check the event queue, see there is an event in there and will pull that event out and run the callback associated with that event.
Which process handles async or simultaneous work to happen in node js.
It is not entirely clear what you mean by this. Each node.js function that is asynchronous has its own implementation. Networking uses OS-level event driven networking (not threads). Async file I/O uses a native thread pool. Timers use OS level timers. Any other asynchronous operation will have its own implementation and may do it some other way, as how it accomplishes its work depends entirely on what the async operation is.
The only three ways (I know of) for you to write your own asynchronous operation are:
Compose your own operation entirely using existing asynchronous operations such as request this data from another server, then write it to this file.
Use native code to write your own node.js add-on that can expose an asynchronous interface and use native code to implement that asynchronous interface in whatever manner is most appropriate for your operation.
Run some other process and communicate back the result from that other process. This can be some other program written in any language or it can be Javascript that you run in another node.js process.
Now, there are a few ways you can influence the event queue timing of some things from Javascript. For example, setTimeout(fn, t), process.nextTick(fn) and setImmediate(fn) all have slightly different ways they insert your callback function into the event queue that determines what (that is already in the event queue) they run before or after. But, these by themselves just schedule a callback sometime in the future - they don't actually implement an asynchronous operation that accomplishes some tasks in a non-blocking way.
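A small sketch of those scheduling functions side by side. The `record` helper is made up for the example. Note that when this runs from the main module, the relative order of the setTimeout(fn, 0) and setImmediate(fn) callbacks is not guaranteed, so the sketch makes no claim about it.

```javascript
const order = [];

const finished = new Promise((resolve) => {
  // Hypothetical helper: records a label and resolves once all four have run.
  const record = (name) => {
    order.push(name);
    if (order.length === 4) resolve(order);
  };

  setTimeout(() => record('setTimeout'), 0);
  setImmediate(() => record('setImmediate'));
  process.nextTick(() => record('nextTick'));
  record('synchronous code');   // runs first: the current JavaScript always finishes
});

finished.then((o) => console.log(o));
```

The synchronous code is always recorded first and the process.nextTick() callback second; the timer and the immediate follow once the event loop starts iterating.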
You may want to read some of these references:
The Node.js Event Loop, Timers, and process.nextTick()
setImmediate() vs nextTick() vs setTimeout(fn,0) – in depth explanation
Demystifying Asynchronous Programming Part 1: Node.js Event Loop

You might be thinking of child_process.spawn().
From the NodeJS documentation
The child_process.spawn(), child_process.fork(), child_process.exec(),
and child_process.execFile() methods all follow the idiomatic
asynchronous programming pattern typical of other Node.js APIs.
Each of the methods returns a ChildProcess instance. These objects
implement the Node.js EventEmitter API, allowing the parent process to
register listener functions that are called when certain events occur
during the life cycle of the child process.
The child_process.exec() and child_process.execFile() methods
additionally allow for an optional callback function to be specified
that is invoked when the child process terminates.
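A minimal sketch of child_process.execFile() with both the optional callback and an event listener. It starts a second copy of the running node binary (process.execPath) so that no other program is assumed to be installed; the message text is made up.

```javascript
const { execFile } = require('child_process');

const output = new Promise((resolve, reject) => {
  const child = execFile(
    process.execPath,                            // the node binary running this script
    ['-e', 'console.log("hello from child")'],
    (err, stdout) => (err ? reject(err) : resolve(stdout.trim()))
  );

  // The returned ChildProcess is an EventEmitter.
  child.on('exit', (code) => console.log('child exited with code', code));
});

output.then((text) => console.log(text));
```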

What role does the V8 engine play in Node.js?

In the past days I've been researching to understand how the Node.js event-based style can handle many more concurrent requests than the classic multithreading approach. In the end it is all about a smaller memory footprint and fewer context switches, because Node.js only uses a couple of threads (the V8 single thread and a bunch of C++ worker threads plus the main thread of libuv).
But how can it handle a huge number of requests with a few threads? In the end some thread must be blocked waiting on, for example, a database read operation.
I think that the idea is: instead of having both the client thread and the database thread blocked, only the database thread is blocked, and it alerts the client thread when it finishes.
This is how I understand Node.js works.
I've been wondering what gives Node.js the capability to handle HTTP requests.
Based on what I have read so far, I understand that libuv does that work:
Handles represent long-lived objects capable of performing certain
operations while active. Some examples: a prepare handle gets its
callback called once every loop iteration when active, and a TCP
server handle get its connection callback called every time there is a
new connection.
So, the thread that waits for incoming HTTP requests is the main thread of libuv, which executes the libuv event loop.
So when we write
const http = require('http');
const hostname = '127.0.0.1';
const port = 1337;
http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
}).listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`);
});
... am I registering with libuv a callback that will be executed in the V8 engine when a request comes in?
The order of the events will then be
A TCP packet arrives
The OS creates an event and sends to the event loop
The event loop handles the event and creates a V8 event
If I execute blocking code inside the anonymous function that handles the request I will be blocking the V8 thread.
In order to avoid this I need to run non-blocking code that will be executed in another thread. I suppose this other thread is the main thread of libuv, where
network I/O is always performed in a single thread, each loop’s thread
This thread will not block because it uses OS syscalls that are async.
epoll on Linux, kqueue on OSX and other BSDs, event ports on SunOS
and IOCP on Windows
I also suppose that http.request uses libuv to achieve this.
Similarly, if I need to do some file I/O without blocking the V8 thread, I will use the FileSystem module of Node. This time the libuv main thread can't handle this in a non-blocking way because the OS doesn't offer this feature.
Unlike network I/O, there are no platform-specific file I/O primitives
libuv could rely on, so the current approach is to run blocking file
I/O operations in a thread pool.
In this case a classic thread pool is needed in order to not block the libuv event-loop.
Now, if I need to query a database, all the responsibility for not blocking either the V8 thread or the libuv thread is in the hands of the driver developer.
If the driver doesn't use libuv, it will block the V8 engine.
If instead it uses libuv but the underlying database doesn't have async capabilities, then it will block a worker thread.
Finally, if the database offers async capabilities, it will only block the database thread. (In this case I could avoid libuv altogether and call the driver directly from the V8 thread.)
If these conclusions correctly describe, although in a simplistic way, how libuv and V8 work together in Node.js, I can't see the benefit of using V8, because we could do all the work in libuv directly (unless the objective is to give the developer a language that makes it simpler to write event-based code).

From what I know the difference is mainly asynchronous I/O. In traditional process-per-request or thread-per-request servers, I/O, most notably network I/O, is traditionally synchronous I/O. Node.js uses fewer threads than Apache or whatever, and it can handle the traffic mostly because it uses asynchronous network I/O.
Node.js needs V8 to actually interpret the JS code and turn it into machine code. Libuv is needed to do real I/O. I do not know much more than that :)

There is an excellent post about the Node.js V8 engine: How JavaScript works: inside the V8 engine + 5 tips on how to write optimized code. It explains many aspects of the engine in depth and gives some great recommendations for using it.
Simply put, what the V8 engine (like other JavaScript engines) does is execute JavaScript code. V8, however, is built for high-performance execution:
V8 translates JavaScript code into more efficient machine code instead
of using an interpreter. It compiles JavaScript code into machine code
at execution by implementing a JIT (Just-In-Time) compiler ...

The I/O is non-blocking and asynchronous via libuv, which under the hood uses OS primitives such as epoll (or the equivalent on each platform) to make I/O non-blocking. The Node.js event loop gets an event queued back when something happens on an fd (a TCP socket, for example).
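As a rough sketch of what that means for a server, the example below starts an HTTP server on a free local port and sends it two concurrent requests. The helper name `get`, the paths `/a` and `/b`, and the 50 ms delay are made up for the example.

```javascript
const http = require('http');

// Each response is delayed by a timer, so the handler never blocks the thread.
const server = http.createServer((req, res) => {
  setTimeout(() => res.end(`handled ${req.url}`), 50);
});

// Hypothetical helper: GET a path from the local server and resolve with the body.
function get(port, path) {
  return new Promise((resolve, reject) => {
    const options = { host: '127.0.0.1', port, path, headers: { Connection: 'close' } };
    http.get(options, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

const bodies = new Promise((resolve, reject) => {
  // Port 0 asks the OS for any free port.
  server.listen(0, '127.0.0.1', () => {
    const { port } = server.address();
    Promise.all([get(port, '/a'), get(port, '/b')])
      .then(resolve, reject)
      .finally(() => server.close());   // stop listening so the process can exit
  });
});

bodies.then((result) => console.log(result));
```

Both requests are served by the same JavaScript thread. While one response is waiting on its timer, the event loop is free to accept the other connection.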

Is NodeJS really Single-Threaded?

Node.js solves "One Thread per Connection Problem" by putting the event-based model at its core, using an event loop instead of threads.
All the expensive I/O operations are always executed asynchronously with a callback that gets executed when the initiated operation completes.
Detecting whether any operation has completed is handled by multiplexing mechanisms like epoll().
My question is now:
Why doesn't NodeJS block while using the blocking system calls
select/epoll/kqueue?
Or isn't NodeJS single-threaded at all, so that a second thread is
necessary to observe all the I/O operations with select/epoll/kqueue?

NodeJS is evented (2nd line from the website), not single-threaded. It internally handles threading needed to do select/epoll/kqueue handling without the user explicitly having to manage that, but that doesn't mean there is no thread usage within it.

No.
When I/O operations are initiated they are delegated to libuv, which manages the request using its own (multi-threaded, asynchronous) environment. libuv announces the completion of I/O operations, allowing any callbacks waiting on this event to be re-introduced to the main V8 thread for execution.
V8 -> Delegate I/O (libuv) -> Thread pool -> Multi threaded async

JavaScript is single-threaded, and so is the event model. But the Node stack as a whole is not single-threaded.
Node uses the V8 engine to run JavaScript; concurrency comes from libuv and the operating system underneath.

No, Node.js as a whole is not single-threaded, but the Node event loop (which Node.js relies on heavily) is single-threaded.
Some parts of the Node framework and standard library are not single-threaded.
