I need some clarification on what exactly the Node.js worker threads are doing.
I found contradictory information on this. Some people say worker threads handle all I/O, others say they handle only blocking POSIX requests (those for which there is no async version).
For example, assume that I am not blocking the main loop myself with some unreasonable processing. I am just invoking functions from the available modules and providing callbacks. I can see that if this requires some blocking or computationally expensive operation, then it is handed to a worker thread. But is async I/O initiated from the main libuv loop, or is it passed to a worker thread to be initiated from there?
Also, would a Node.js worker thread ever initiate a blocking (synchronous) I/O operation when the OS supports an async mode to do the same thing? Is it documented anywhere what kinds of operations may end up blocking a worker thread for a longer time?
I'm asking this because there is a fixed-size worker pool and I want to avoid making mistakes with it. Thanks.
Network I/O on all platforms is done on the main thread. File I/O is a different story: on Windows it is done truly asynchronously and non-blocking, but on all other platforms synchronous file I/O operations are performed in a thread pool to be async and non-blocking.
The cluster method is clear to me, as it deploys whole separate processes. And I guess the professional programmers made the "worker_threads" library for some good reason... but I still need to clear up this point for my understanding:
In a normal single-threaded process, the event loop thread has the aid of the default worker pool to offload its heavy I/O tasks, so the main thread is not blocked.
At the same time, user-defined "worker threads" will be used for the same reason, each with its own event loop and Node.js instance.
What is the point of spawning those event loops and Node.js instances when they are not the bottleneck, given that libuv already manages spawning the workers?
Does this mean that the default worker pool may not be enough? Is it just a matter of quantity, or a conceptual difference?
There are two types of operations (calls) in Node.js: blocking and non-blocking.
Non-blocking calls
Node.js uses libuv for non-blocking I/O operations. Network, file, and DNS I/O operations are run asynchronously by libuv. Node.js uses the following scheme:
Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's thread pool is used to create asynchronous Node.js APIs based on synchronous system APIs. Node.js APIs that use the thread pool are:
all fs APIs, other than the file watcher APIs and those that are explicitly synchronous
asynchronous crypto APIs such as crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair()
dns.lookup()
all zlib APIs, other than those that are explicitly synchronous
So we don't have direct access to the libuv thread pool from JavaScript. We may define our own uses of the thread pool using C++ add-ons.
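Because the pool has a fixed size (4 threads by default, adjustable through the UV_THREADPOOL_SIZE environment variable), pool-backed calls queue up once all threads are busy. A minimal sketch of how this could be observed with crypto.pbkdf2() follows. The helper name runPbkdf2Batch, the password, the salt and the iteration count are arbitrary choices for illustration, and the timings it prints depend on the machine:

```javascript
const crypto = require('crypto');

// Start `count` pbkdf2 calls at once and record when each one finishes,
// in milliseconds measured from the moment the batch was started.
function runPbkdf2Batch(count, iterations = 50000) {
  const start = process.hrtime.bigint();
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(new Promise((resolve, reject) => {
      // The asynchronous form of pbkdf2 does its hashing on a pool thread.
      crypto.pbkdf2('secret', 'salt', iterations, 64, 'sha512', (err) => {
        if (err) return reject(err);
        resolve(Number(process.hrtime.bigint() - start) / 1e6);
      });
    }));
  }
  return Promise.all(jobs);
}

// More jobs than default pool threads: the later ones have to wait.
runPbkdf2Batch(8).then((finishedAt) => {
  finishedAt.forEach((ms, i) => {
    console.log(`job ${i} finished after ${ms.toFixed(1)} ms`);
  });
});
```

With the default pool size, the finish times may fall into two groups, since the second four jobs cannot start until a pool thread is free. Running the same script with UV_THREADPOOL_SIZE=8 set in the environment should bring the groups closer together on a machine with enough cores.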
Blocking calls
Node.js executes blocking code on the main thread. fs.readFileSync(), compression, encrypting data, image resizing, and calculating primes over a large range are some examples of blocking operations. The golden rule of Node.js is to never block the event loop (main thread). We can execute these operations asynchronously by creating a child process using the cluster module or the child_process module. But creating a child process is a heavy task in terms of OS resources, and that is why worker threads were born.
Using worker threads, you can execute blocking JavaScript code in a worker thread, thereby unblocking the main thread, and you can communicate with the parent (main) thread via message passing. Worker threads are lightweight compared to a child process.
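As a rough sketch of that idea, the snippet below moves a deliberately naive prime count into a worker and receives the result via message passing. The function name countPrimesInWorker and the inline worker source are made up for this example. The worker code is passed as a string with the eval option only so that everything fits in one file; a real project would more likely point the Worker at a separate file.

```javascript
const { Worker } = require('worker_threads');

// Source of the worker, kept as a string so the sketch fits in one file.
// It counts the primes up to workerData.limit using trial division.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let count = 0;
  for (let n = 2; n <= workerData.limit; n++) {
    let isPrime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break; }
    }
    if (isPrime) count++;
  }
  parentPort.postMessage(count);
`;

// Start a worker, wait for its single result message, and resolve with it.
function countPrimesInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// The main thread stays free to handle other events while this runs.
countPrimesInWorker(2000000).then((count) => {
  console.log(`primes found: ${count}`);
});
```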
Read more here:
https://nodesource.com/blog/worker-threads-nodejs
https://blog.insiderattack.net/deep-dive-into-worker-threads-in-node-js-e75e10546b11
I can see that Node.js is bringing in multi-threading support via its worker threads module. My current assumption (I have not yet explored it personally) is that I can offload a long-running, CPU-intensive operation to these worker threads.
I want to understand the behaviour if this long-running piece of code has some intermittent event callbacks or a chain of promises. Do these callbacks still execute on the worker threads, or do they get passed back to the main thread?
If these promises come back to the main thread, the advantage of using the worker thread may be lost.
Can someone clarify?
Update => Some context of the question
I have an HTTP request that initiates some background processing and returns a 202 status. After receiving such a request, I start the background processing via
setTimeout(function () {
  // performs long-running file read operations...
});
and immediately return a 202 to the caller.
However, I have observed that while this background operation is running, other HTTP requests are either not being processed or are very sluggish at best.
My hypothesis is that this continuous I/O processing of a million+ lines is filling up the event loop with so many callbacks/promises that the main thread is unable to process other pending I/O tasks, such as accepting new requests.
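One way to check a hypothesis like this is to measure how late a timer fires while the main thread is busy. The sketch below is only an illustration: measureTimerLag is a made-up helper, and the busy-wait loop is a stand-in for whatever synchronous processing happens between the I/O callbacks. It shows that deferring work with setTimeout does not move it off the main thread.

```javascript
const { performance } = require('perf_hooks');

// Schedule a timer, then keep the thread busy with synchronous work for
// `blockMs` milliseconds. Resolves with how long the timer actually took.
function measureTimerLag(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    setTimeout(() => resolve(performance.now() - scheduledAt), 0);

    // Stand-in for a long synchronous stretch of processing.
    const end = performance.now() + blockMs;
    while (performance.now() < end) { /* busy wait */ }
  });
}

measureTimerLag(200).then((lag) => {
  console.log(`timer requested for 0 ms fired after ${lag.toFixed(0)} ms`);
});
```

Any other callback waiting on the same event loop, including one that accepts a new HTTP connection, is delayed in the same way as this timer.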
I have explored the nodejs cluster option and this works well, as the long task is delegated to one of the child processes, and other instances of cluster are available to take up additional requests.
But I was thinking that worker threads might solve the same problem, without the overhead of cloning the process.
I assume each worker thread would have its own event loop.
So if you emit an event in a worker thread, only that thread would receive it and trigger the callback. The same goes for promises: if you create a promise within a worker, it will only be resolved by that worker.
This is supported by their statement in the documentation regarding Class: Worker: Most Node.js APIs are available inside of it (with some exceptions that are not related to event processing).
However they mention this earlier in the docs:
Workers are useful for performing CPU-intensive JavaScript operations; do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already treat it more efficiently than Worker threads can.
I think some small scale async code in worker threads would be fine, but having more callbacks/promises would hurt performance. Some benchmarks could shed some light on this.
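The claim that callbacks and promises created in a worker stay on that worker can be checked with a small sketch. The helper name whereDoWorkerCallbacksRun and the inline worker source are assumptions made for this example. The worker chains a promise and a timer, and its final callback reports the thread it is running on.

```javascript
const { Worker, threadId } = require('worker_threads');

// Worker source: chains a promise and a timer, then reports from the
// final callback which thread it belongs to.
const workerSource = `
  const { parentPort, threadId, isMainThread } = require('worker_threads');
  Promise.resolve()
    .then(() => new Promise((resolve) => setTimeout(resolve, 10)))
    .then(() => parentPort.postMessage({ threadId, isMainThread }));
`;

// Start the worker and resolve with the report it posts back.
function whereDoWorkerCallbacksRun() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

whereDoWorkerCallbacksRun().then((info) => {
  console.log(`main threadId: ${threadId}, worker callback ran on:`, info);
});
```

Only the explicit postMessage call crosses back to the main thread. The promise chain and the timer callback run on the worker's own event loop.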
The reactor pattern which is utilized by libuv for handling IO is synchronous by design but libuv supports async io. How is this possible? Does libuv extend the reactor's design somehow to support async io? Does using multiple threads/event loops aid in achieving this?
The I/O model of Node and libuv is very similar to what nginx does internally.
libuv uses a single-threaded event loop and non-blocking asynchronous I/O. All functions are synchronous in the sense that they run to completion, but some clever hackery with promises and generators can make it appear that they don't (when in fact both the invocation of the generator function, which is non-blocking and returns the generator object immediately, and the generator methods like .next() run to completion), plus the new async/await syntax makes it very convenient.
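The run-to-completion behaviour described above can be seen in a short sketch. The names steps, task and log are invented for this illustration. Each piece of code runs synchronously until it reaches a yield or an await, and only then does control return to the caller.

```javascript
const log = [];

// Calling a generator function runs none of its body; it only returns
// the generator object. Each next() runs synchronously up to the next yield.
function* steps() {
  log.push('gen: part 1');
  yield;
  log.push('gen: part 2');
}

const gen = steps();
log.push('main: generator created');
gen.next();
log.push('main: after first next()');
gen.next();

// An async function runs synchronously up to its first await; the rest
// is queued and runs after the current synchronous code has finished.
async function task() {
  log.push('async: before await');
  await null;
  log.push('async: after await');
}

const done = task();
log.push('main: after task()');

done.then(() => console.log(log.join('\n')));
```

The last entry printed is 'async: after await', because the code after the await only runs once the surrounding synchronous code has run to completion.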
For operations that cannot be accomplished in a non-blocking way Node uses a thread pool to run the blocking operations in separate threads but this is done transparently and it is never exposed to the application code written in JavaScript (you need to step down to C++ to work with that directly).
See: http://docs.libuv.org/en/v1.x/design.html
Unlike network I/O, there are no platform-specific file I/O primitives libuv could rely on, so the current approach is to run blocking file I/O operations in a thread pool. [...]
libuv currently uses a global thread pool on which all loops can queue work on. 3 types of operations are currently run on this pool:
File system operations
DNS functions (getaddrinfo and getnameinfo)
User specified code via uv_queue_work()
See also those answers for more details:
what is mean by event loop in node.js ? javascript event loop or libuv event loop?
NodeJS event loop internal working
Prevent NodeJS from exiting event-loop
How node.js server serve next request, if current request have huge computation?
Which would be better for concurrent tasks on node.js? Fibers? Web-workers? or Threads?
Speed up setInterval
Async.js - Is parallel really parallel?
Node.js: Asynchronous Callback Execution. Is this Zalgo?
See the links and illustrations in those answers. There are a lot of resources to read about that topic.
I have recently been studying Node.js and trying to understand its architecture in depth. But even after going through multiple articles and links (Stack Overflow, Node.js blogs), I am still confused as to how both the single thread with the event loop and the multi-threaded blocking I/O requests that are part of client requests or events can happen at the same time.
According to my study, the single thread with the event loop keeps polling the event queue to see whether a client request has come in. As soon as an event or request is found, it checks whether it involves a blocking I/O or a non-blocking operation. If it is non-blocking, the response is sent back to the client immediately. But if the request involves a blocking I/O operation, the request is assigned a thread from the thread pool, and the single thread continues with the other requests. Essentially, this means that every blocking I/O operation within the client requests is assigned a thread, and here the system is working as multi-threaded.
My confusion is how that single thread and the threaded blocking I/O operations can be performed at the same time. The execution of the single thread is concurrent, but the blocking I/O operations are also happening at the same time. How can this be achieved on a single machine, with both the single thread and the blocking I/O threads executing in parallel on a single-core processor, when the CPU can execute only one thread at a time? Also, are these threads (both the single event loop thread and the thread pool threads) user-level threads?
I know that the blocking I/O threads are handled by the libraries of the external modules, but those modules will still be using up threads and executing them in the same space as the single thread. So how are the two getting executed?
I am new to this framework.
A Node.js process consists of the main thread, which runs an event loop, and worker threads. These worker threads are not explicitly available to the coder.
Now when you do a syscall from your code (i.e. call a Node.js function which internally does a syscall), then depending on whether it is blocking (e.g. file I/O) or non-blocking (e.g. socket I/O), the job might be sent to a worker thread. However, a callback is always registered with the event loop. So when the worker finishes processing a job, it notifies the event loop, and from the coder's point of view the operation was asynchronous.
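A small sketch of what this looks like from the coder's side follows. The helper name traceAsyncRead and the temporary file are made up for this example. The call to fs.readFile returns immediately, the blocking read happens elsewhere, and the callback is later run by the event loop on the calling thread.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { isMainThread } = require('worker_threads');

// Read a small temporary file asynchronously and record the order of events.
function traceAsyncRead() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'uv-demo-'));
  const file = path.join(dir, 'sample.txt');
  fs.writeFileSync(file, 'hello');
  const events = [];

  return new Promise((resolve, reject) => {
    fs.readFile(file, 'utf8', (err, text) => {
      fs.rmSync(dir, { recursive: true, force: true });
      if (err) return reject(err);
      // The callback is run by the event loop of the thread that called readFile.
      events.push(`callback (main thread: ${isMainThread}, data: ${text})`);
      resolve(events);
    });
    // Reached before the callback can run: readFile only queued the job.
    events.push('readFile returned');
  });
}

traceAsyncRead().then((events) => console.log(events));
```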
Now it doesn't really matter whether the CPU is multi-threaded or single-threaded. That's because reading from a disk takes some time, and during that time the CPU is not busy on that thread. The OS knows that it should switch context during that time. So even if you have a single-threaded CPU, the event loop still gets the majority of the CPU time.
Also, by threads I mean real kernel-space threads, not user-space threads. Doing that over user-space threads is pointless, since you would block the whole kernel-space thread during blocking I/O.
Node.js solves "One Thread per Connection Problem" by putting the event-based model at its core, using an event loop instead of threads.
All the expensive I/O operations are always executed asynchronously with a callback that gets executed when the initiated operation completes.
Observing whether any operation has completed is handled by multiplexing mechanisms like epoll().
My question is now:
Why doesn't Node.js block while using the blocking system calls select/epoll/kqueue?
Or isn't Node.js single-threaded at all, so that a second thread is necessary to observe all the I/O operations with select/epoll/kqueue?
NodeJS is evented (2nd line from the website), not single-threaded. It internally handles threading needed to do select/epoll/kqueue handling without the user explicitly having to manage that, but that doesn't mean there is no thread usage within it.
No.
When I/O operations are initiated they are delegated to libuv, which manages the request using its own (multi-threaded, asynchronous) environment. libuv announces the completion of I/O operations, allowing any callbacks waiting on this event to be re-introduced to the main V8 thread for execution.
V8 -> Delegate I/O (libuv) -> Thread pool -> Multi threaded async
JavaScript is single-threaded, and so is the event model. But the Node stack is not single-threaded.
Node utilizes the V8 engine for concurrency.
No, Node.js as a whole is not single-threaded, but the Node event loop (which Node.js relies on heavily) is single-threaded.
Some parts of the Node framework/standard library are not single-threaded.