I understand how Node.js works with a single thread. Mostly it uses asynchronous methods/modules in order to keep the main runtime thread as free as possible.
However, some of the asynchronous modules internally use threads to do their job. An example of this is reading a file or other CPU-intensive tasks. This is done in the background and abstracted away from the Node developer.
My question is: how does Socket.IO work internally? Does it use threads like the examples above? Does it use a separate thread per connection? If so, does that mean we will have 1000 threads if we have 1000 connected clients?
Node does not use the thread pool (or separate threads) for sockets; instead it uses whatever platform-specific mechanism is available for polling sockets for data (e.g. epoll on Linux, kqueue on OS X (IIRC), I/O completion ports on Windows, etc.) on the main thread.
Socket.io works on the event loop like most node applications. No tricky thread business AFAIK. You can check out the source yourself here: https://github.com/Automattic/socket.io
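To make that concrete, here is a minimal sketch of a Socket.IO echo server (assuming socket.io v4's require style; the port and the message event are illustrative). Every connection callback runs on the same single event-loop thread, so 1000 clients still means one JS thread:

const http = require('http');
const { Server } = require('socket.io'); // socket.io v4-style import

const httpServer = http.createServer();
const io = new Server(httpServer);

io.on('connection', (socket) => {
  // This callback runs on the one main JS thread for every connected client.
  socket.on('message', (msg) => {
    socket.emit('message', 'echo: ' + msg);
  });
});

httpServer.listen(3000);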
Related
If I use async functions, or functions with callbacks like the native fs module, http, etc., will they run by default across all CPU cores?
Or will the entire thing just use 1 core?
Some asynchronous operations in node.js (such as file I/O in the fs module) will use additional threads within the node.js process via a thread pool in libuv. How many additional CPUs are engaged depends upon the size of your thread pool, the types of operations, and your host OS. It does not necessarily help overall throughput to engage many CPUs on file I/O that all goes through the same disk, since reading/writing is often bottlenecked by the position of the read/write head on the disk anyway.
Some asynchronous operations such as networking (like the http module) are non-blocking and asynchronous by nature and do not do their networking with threads or trigger any meaningful use of additional CPUs.
None of this will run your own Javascript in multiple threads since Javascript itself all executes in one thread.
To fully engage multiple CPUs, you can:
Put some of your own Javascript into the new node.js Worker Threads and communicate back to the main node.js thread via messaging (see the sketch after this list).
Fire up your own node.js child processes to do work in those child processes and communicate back results using one of the many interprocess communications options.
Use node.js clustering so that incoming requests can be split among the available processes. This requires making sure any server state is shareable among all the clustered processes (typically stored in some database that all processes can access). This will allow separate requests to use separate CPUs - it won't help a single request use more CPUs. You would need to use #1 and/or #2 for that.
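As a minimal sketch of option #1 (Worker Threads), assuming Node 12+ where worker_threads is stable; the fibonacci work and the use of __filename as the worker script are just illustrative:

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // Main thread: spawn a worker running this same file and wait for its message.
  const worker = new Worker(__filename, { workerData: 42 });
  worker.on('message', (result) => console.log('result from worker:', result));
  worker.on('error', (err) => console.error(err));
} else {
  // Worker thread: CPU-heavy JS runs here without blocking the main event loop.
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
}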
Node.js maintains an event loop, but it also has, by default, four threads for complicated requests. How is this single-threaded when there are more threads available in the thread pool?
Also, if the threads assigned by the event loop for the complicated tasks are dedicated threads, then how is this different from other multithreading concepts?
In the context to which you're referring, "single threaded" means that your Javascript runs as a single thread. No two pieces of Javascript are ever running at the same time either literally or time sliced (note: as of 2020 node.js does now have WorkerThreads, but those are something different from this original discussion). This massively simplifies Javascript development because there is no need to do thread synchronization for Javascript variables which are shared between different pieces of Javascript because only one piece of Javascript can ever be running at the same time.
All that said, node.js does use threads internal to its implementation. The default four threads you mention are used in a thread pool for disk I/O. Because disk I/O is normally a synchronous operation at the OS level that blocks the calling thread, and node.js has a design where all I/O operations should be offered as asynchronous operations, the node.js designers decided to fulfill the asynchronous interface by using a pool of threads to implement (in native code) the fs module's disk I/O interface (yes, there are non-blocking disk I/O operations in some operating systems, but the node.js designers decided not to use them). This all happens under the covers in native code and does not affect the fact that your Javascript runs only in a single thread.
Here's a summary of how a disk I/O call works in node.js (a short code sketch follows the list). Let's assume there's already an open file handle.
Javascript code calls fs.write() on an existing file handle.
fs module packages the arguments to the function and then calls native code.
Native code gets a thread from the thread pool and initiates the OS call to write data to that file.
Native code returns from the function.
The fs module returns from the fs.write() call.
Javascript continues to execute (whatever statements came after the fs.write() call).
Some time later the native code fs.write() call on a thread finishes. It obtains a mutex protecting the event loop and inserts an event in the event queue.
When the Javascript engine is done executing whatever stream of Javascript it was running, it checks the event queue to see if there are any other events to run.
When it finds an event in the event queue, it removes it from the event queue and executes the callback associated with that event, starting a new stream of running Javascript.
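A rough code sketch of that sequence (the file path and data are made up for illustration):

const fs = require('fs');

fs.open('/tmp/example.txt', 'w', (err, fd) => {
  if (err) throw err;
  const buf = Buffer.from('hello\n');

  // Steps 1-6: fs.write() hands the work to native code (the thread pool)
  // and returns immediately; the statements after it keep running.
  fs.write(fd, buf, 0, buf.length, null, (err, bytesWritten) => {
    // Steps 7-9: this callback runs later, as an event pulled off the event queue.
    if (err) throw err;
    console.log('wrote', bytesWritten, 'bytes');
    fs.close(fd, () => {});
  });

  console.log('fs.write() returned; Javascript keeps executing');
});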
Because a new event is never acted upon until the current stream of Javascript is done executing, this is where Javascript gets its event-driven, single-threaded nature even though native code threads may be used to implement some library functions. Those threads are used to make a blocking operation into a non-blocking operation, but do not affect the single-threadedness of Javascript execution itself.
The key here is that node.js is event driven. Every new operation that triggers some Javascript to run is serialized through the event queue and the next event is not serviced until the current stream of Javascript has finished executing.
In the node.js architecture the only way to get two pieces of Javascript to run independently and at the same time is to use a separate node.js process for each. Then, they will run as two completely separate operations and the OS will manage them separately. If your computer has at least two cores, then they can literally run at the same time, each on their own core. If your computer has only one core, they will essentially be in their own process thread and the OS will time slice them (sharing the one CPU between them).
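A hypothetical sketch of that two-process setup using child_process.fork (the file name heavy-work.js and the fibonacci payload are made up for illustration):

// parent.js -- each process has its own event loop, so the OS can schedule
// them on separate cores at the same time.
const { fork } = require('child_process');

const child = fork('./heavy-work.js'); // assumed child script, shown below
child.on('message', (result) => console.log('result from child:', result));
child.send(40);

// heavy-work.js (contents of the assumed child script):
// process.on('message', (n) => {
//   const fib = (x) => (x < 2 ? x : fib(x - 1) + fib(x - 2));
//   process.send(fib(n));
// });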
I will explain it in a clear and simple way and clear up the confusion:
The Node event loop is SINGLE-THREADED, but the other operations are not.
The confusion comes from the C++ that Node uses under the hood (Node.js is roughly 30% JS + 70% C++). By default, the JS part of Node.js is single-threaded, BUT it uses a C++ thread pool. So we have a single JS thread, which is the event loop of Node.js, plus 4 C++ threads used when needed for asynchronous I/O operations.
It is also important to know that the event loop is like a traffic organizer. Every request goes through the loop (which is single-threaded), and the loop hands requests off to the pool threads when I/O work is needed. So if you have a computationally heavy app that does things like image processing, video editing, audio processing or 3D graphics (which most apps don't need), Node.js will be a bottleneck for that high-load computational app and the traffic organizer will be unhappy.
Node.js shines for I/O-bound apps (most apps), like apps dealing with databases and the filesystem.
Again: by default, Node.js uses a 4-thread pool (PLUS one thread for the event loop itself), so a total of 5 by default, because of the underlying C++ system.
As a general idea, a CPU could contain one or more cores; it depends on your server (money).
Each core can run threads. Check your Activity Monitor to discover how many threads you are using.
Each process can have multiple threads.
The multi-threading of Node is due to the fact that Node depends on V8 and libuv (a C library).
So, long story short:
Node is single-threaded for the event loop itself, but many operations are done outside the event loop, like crypto and the file system (fs). If you have two calls to crypto, each of them will get its own thread (imagine 3 calls to crypto and 1 to fs: these calls will be distributed one per thread across the 4-thread pool).
Finally: it is very easy to increase the default number of threads in the libuv C-library thread pool, which is 4 by default, by changing the value of process.env.UV_THREADPOOL_SIZE. You can also use clustering (PM2 recommended) to clone the event loop, i.e. have multiple event loops, in case the single-threaded one is not enough for your high-load app.
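As a sketch of both points, assuming crypto.pbkdf2 (which runs on the libuv thread pool): with the default pool size of 4, the 5th call below has to wait for a free thread. UV_THREADPOOL_SIZE must be set before the pool is first used; setting it from JS as below works on most platforms, otherwise set it in the shell environment:

process.env.UV_THREADPOOL_SIZE = 4; // try 5 and compare the timing of call 5

const crypto = require('crypto');
const start = Date.now();

for (let i = 1; i <= 5; i++) {
  // Each pbkdf2 call is queued to the libuv thread pool, not the JS thread.
  crypto.pbkdf2('password', 'salt', 100000, 512, 'sha512', () => {
    console.log(`call ${i} done after ${Date.now() - start} ms`);
  });
}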
So, to be clear: the thread pool is a C++ thing that Node.js mostly controls, not the developer, yet people keep asking how it can be single-threaded while having a thread pool!
Hope that simplifies that advanced topic.
By default, the execution of your JavaScript code runs on a single thread.
However, node.js tries to make most long-running calls async. For some, that just involves making async OS calls, but for others node.js will execute the call itself on a secondary thread while continuing to run other JS code. Once the async call has terminated, the JS callback or Promise handler will run.
Node.js was created explicitly as an experiment in async processing. The belief was that, under typical web loads, more performance and scalability can be achieved by doing async processing on a single thread than with the typical thread-based implementation.
As far as I know, all IO requests and other asynchronous tasks are done by libuv in nodejs.
I want to know if libuv is using threading. If it is, is it using all available core or not?
First of all, what is libuv. As mentioned in the documentation, it's a multi-platform support library with a focus on asynchronous I/O.
libuv doesn't use threads for asynchronous tasks, but for those that aren't asynchronous by nature.
As an example, it doesn't use threads to deal with sockets, it uses threads to make synchronous fs calls asynchronous.
When threads are involved, libuv uses a thread pool, the size of which you can change at startup via the UV_THREADPOOL_SIZE environment variable (it defaults to 4 and is read when the pool is first used).
node.js ships with its own bundled copy of libuv, but the pool size still honors that environment variable rather than being fixed at compile time.
It goes without saying that it has nothing to do with the number of cores of your chip.
I'm tempted to affirm that you can safely ignore the topic, for libuv and thus node.js don't use threads intensively for their purposes (unless you are using them in a really perverse way or you are running a high number of libuv work requests).
Feel free to run an instance of node.js per core if you need to, as most users do.
The design overview section of libuv is also clear enough about this point:
The I/O (or event) loop is the central part of libuv. It establishes the content for all I/O operations, and it’s meant to be tied to a single thread. One can run multiple event loops as long as each runs in a different thread.
The libuv module has a responsibility that is relevant for some particular functions in the standard library. For SOME standard library function calls, the Node C++ side and libuv decide to do expensive calculations outside of the event loop entirely. They make use of something called a thread pool; that thread pool is a series of four threads that can be used for running computationally intensive tasks such as hashing functions.
By default libuv creates four threads in this thread pool, and the pool is organized by libuv. That means that, in addition to the thread used for the event loop, there are four other threads that can be used to offload expensive calculations that need to occur inside of our application. Many of the functions included in the node standard library will automatically make use of this thread pool.
Network (network I/O) is responsible for API requests, and the file system (file I/O) is the fs module. So the node.js single thread delegates that heavy work to libuv.
If you have many such function calls, it will use all of the cores. CPU cores do not actually speed up the processing of individual function calls; they just allow for some amount of concurrency in the work that you are doing.
From here:
A single instance of Node.js runs in a single thread. To take advantage of multi-core systems the user will sometimes want to launch a cluster of Node.js processes to handle the load.
The cluster module allows easy creation of child processes that all share server ports.
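A minimal sketch based on that quote, forking one worker per CPU so they all share the same port (the port number is arbitrary; on newer Node versions cluster.isPrimary replaces cluster.isMaster):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker process per CPU core; all workers share the server port.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  http.createServer((req, res) => {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(8000);
}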
Multiple processes could be better than multithreading in some cases. Some people even think threads are evil. Maybe node.js is designed in such a way as to take advantage of processes rather than threads.
I know multithreading is ideal for this situation, but would there be any instance where this situation could be applicable?
Yes you could be serving multiple clients at once from a single thread. This is typically implemented using the select() or poll() socket functions.
A single threaded select() based polling server can use less system resources than a multi-threaded server.
I'm following the boost-asio tutorial and don't know how to make a multi-threaded server using boost. I've compiled and tested the daytime client and daytime synchronous server and improved the communication (the server asks the client for a command, processes it and then returns the result to the client). But this server can handle only one client at a time.
I would like to use boost to make a multi-threaded server. There is also the daytime asynchronous server, which executes
boost::asio::io_service io_service;
tcp_server server(io_service);
io_service.run();
in the main program function. The question is: is boost creating a thread for each client somewhere inside? Is this a multi-threaded solution? If not, how do I make a multi-threaded server with boost? Thanks for any advice.
Have a look at this tutorial. In short:
io_service.run() in multiple threads gives a thread pool
multiple io_services give completely separate threads
You don't need to explicitly work with threads when you want to support multiple clients. But for that you should use asynchronous calls (as opposed to synchronous, which are used in the tutorials you listed). Have a look at the asynchronous echo tcp server example, it serves multiple clients without using threads.
is boost creating a thread for each client somewhere inside?
When working with asynchronous calls, boost asio is doing these things behind the scenes. It could use threads, but it usually doesn't because there are other, preferred mechanisms for working with multiple sockets at once. For example on linux you have epoll, select and poll (in order of preference). I'm not sure what the situation is on windows, there might be other mechanisms or the preference order might be different. But in any case, boost asio takes care of this, chooses the best mechanism there is for your platform and hides it behind those asynchronous calls.