CPU utilization of Node.js on Amazon EC2 - node.js

Seeing as how node is single threaded, If I have node server running on an amazon EC2 instance with 4 EC2 Compute units will it run any faster / handle more load than if I have 2 EC2 Compute units?
Does CPU utilization on amazon require a program to be multithreaded to fully use all resources?

To fully utilize compute resources of N cores, you need at least N threads ready to do useful work. This has nothing to do with EC2; it's just the way computers work. I assume from your question that you are choosing between the m1.medium and m1.large instance types, which have 1 and 2 dedicated cores, respectively (the m1.small is half of a shared core, and the m1.xlarge is the full dedicated 4-core box). Thus, you need at least 2 processes doing useful work in order to utilize the larger box (unless you just want access to more memory / io).
Each Node.js process is single threaded by design. This lets it provide a clean programming paradigm free of locking semantics. This is very much by design.
For a Node.js app to utilize multiple cores, it must spawn multiple processes. These processes would then use some form of messaging (pipes, sockets, etc) to communicate -- versus "shared memory" where code can directly mutate memory locations visible to multiple processes, something that would require locking semantics.
In practice, this is dirt simple easy to set up. Back in Node.JS v0.6.X the "cluster" module was integrated into the standard distribution, making it easy to set up multiple node workers that can listen on a single port. Note that this "cluster" module is NOT the same as the learnboost "cluster" module which has a different API and owns the "cluster" name in the NPMjs registry.
http://nodejs.org/docs/latest/api/cluster.html
if (cluster.isMaster) {
// Fork workers.
for (var i = 0; i &lt numCPUs; i++) {
cluster.fork();
}
} else {
http.Server(function(req, res) { ... }).listen(8000);
}

The short answer to your question is that adding more cores in order to improve your node performance will not work, if all you do is write "standard" single threaded javascript (you will be bound by a single CPU).
The reason is that node.js uses an event loop for processing, so if all you are doing is starting up a single node.js process without anything else, it will not be multi-threaded and thus not use more than one CPU (core).
However, you can use the node.js cluster API to fork the node process so you can take advantage of multiple CPUs (cores): https://nodejs.org/docs/latest/api/cluster.html. If you write your code that way, then having more compute units will help you.
There is one caveat, in that EC2 compute units are detailed per instance. For some instances you can get more "compute units" per virtual core. So if you pick an instance that has 2 compute units per virtual core versus one that has one per core, you will be able to execute node on a CPU that has more compute units. However, it looks like after 2 compute units the computing power is split per core which means you won't get any benefit from the multiple cores.

Amazon's concept of total "EC2 Compute Units" for an instance type does not map directly to a CPU or core. It is the number of cores multiplied by the speed of each core in EC2 compute units (their own relative measurement).
Amazon does list how many virtual cores each instance type has:
http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/index.html?instance-types.html
Your best option is to use all of the cores as others point out. However, if you end up with a single threaded solution, then you will want to focus on the speed of the individual cores, not the total EC2 compute units of all the cores added together.

In Node.js, your code is single-threaded, but calls that e.g. access the file system or a database server do not use the main node.js thread. The main thread keeps executing while other threads are waiting for 4GB to be read from disk to RAM or for the DB server to return a response. Once the action finishes, the supplied callback is put in a queue to execute in the main thread. More or less, anyway.
The advantage being that in a server situation, you have one very fast thread that can handle thousands of concurrent requests without putting any one entirely on hold or spawning an OS thread for each client request-response cycle.
More to the point, you should benchmark your specific use case on EC2 -- multiple processors may be useful when running a single instance of node if the app does a lot of IO.

If I have node server running on an amazon EC2 instance with 4 EC2 Compute units will it run any faster / handle more load than if I have 2 EC2 Compute units?
No, if you are using node.js in a server capacity you will only have access to a single core.
var http = require('http');
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.end('Hello World\n');
}).listen(1337, "127.0.0.1");
console.log('Server running at http://127.0.0.1:1337/');
Spawns a single listener, that doesn't mean only a single connection though. Node.js breaks conventional thought that way. The Event Loop will not block connections unless you code improperly. This post helps to explain the event loop and how important it is to understand it. Took me a while to really 'get' the implications.
Does CPU utilization on amazon require a program to be multithreaded to fully use all resources?
Yes, properly configured apache/nginx will take advantage of multi-cpu configurations. node.js servers are being developed that will also take advantage of these kind of configurations.

Just a quick addition to those above making good points as to the function of modern (old thread here) Node.JS, not only is Node implemented on top of V8, and LibUV, making use of a internal thread pool, but ACTUALLY, your JS code can be multi threaded. No, I don't just mean the thread_workers API. It is possible, even probably, that some of your dependencies, are using C++/V8/NAPI bindings for JS, and directly using the underlying thread pool.
For example:
You'll see that the standard bcrypt library on npm implements its blowfish utilities with multithreading in C++. Many people don't read the docs right, and are confused as to why running some cryptographic work from libraries in other worker threads doen't speed up their service.

Related

When is better using clustering or worker_threads?

I have been reading about multi-processing on NodeJS to get the best understanding and try to get a good performance in heavy environments with my code.
Although I understand the basic purpose and concept for the different ways to take profit of the resources to handle the load, some questions arise as I go deeper and it seems I can't find the particular answers in the documentation.
NodeJS in a single thread:
NodeJS runs a single thread that we call event loop, despite in background OS and Libuv are handling the default worker pool for I/O asynchronous tasks.
We are supossed to use a single core for the event-loop, despite the workers might be using different cores. I guess they are sorted in the end by OS scheduler.
NodeJS as multi-threaded:
When using "worker_threads" library, in the same single process, different instances of v8/Libuv are running for each thread. Thus, they share the same context and communicate among threads with "message port" and the rest of the API.
Each worker thread runs its Event loop thread. Threads are supposed to be wisely balanced among CPU cores, improving the performance. I guess they are sorted in the end by OS scheduler.
Question 1: When a worker uses I/O default worker pool, are the very same
threads as other workers' pool being shared somehow? or each worker has its
own default worker pool?
NodeJS in multi-processing:
When using "cluster" library, we are splitting the work among different processes. Each process is set on a different core to balance the load... well, the main event loop is what in the end is set in a different core, so it doesn't share core with another heavy event loop. Sounds smart to do it that way.
Here I would communicate with some IPC tactic.
Question 2: And the default worker pool for this NodeJS process? where
are they? balanced among the rest of cores as expected in the first
case? Then they might be on the same cores as the other worker pools
of the cluster I guess. Shouldn't it be better to say that we are balancing main threads (event loops) rather than "the process"?
Being all this said, the main question:
Question 3: Whether is better using clustering or worker_threads? If both are being used in the same code, how can both libraries agree the best performance? or they
just can simply get in conflict? or at the end is the OS who takes
control?
Each worker thread has its own main loop (libuv etc). So does each cloned Node.js process when you use clustering.
Clustering is a way to load-balance incoming requests to your Node.js server over several copies of that server.
Worker threads are a way for a single Node.js process to offload long-running functions to a separate thread, to avoid blocking its own main loop.
Which is better? It depends on the problem you're solving. Worker threads are for long-running functions. Clustering makes a server able to handle more requests, by handling them in parallel. You can use both if you need to: have each Node.js cluster process use a worker thread for long-running functions.
As a first approximation for your decision-making: only use worker threads when you know you have long-running functions.
The node processes (whether from clustering or worker threads) don't get tied to specific cores (or Intel processor threads) on the host machine; the host's OS scheduling assigns cores as needed. The host OS scheduler minimize context-switch overhead when assigning cores to runnable processes. If you have too many active Javascript instances (cluster instances + worker threads) the host OS will give them timeslices according to its scheduling algorithms. Other than avoiding too many Javascript instances, there's very little point in trying second-guess the OS scheduler.
Edit Each Node.js instance, with any worker threads, uses a single libuv thread pool. A main Node.js process shares a single libuv thread pool with all its worker threads. If your Node.js program uses many worker threads, you may, or may not, need to set the UV_THREADPOOL_SIZE environment variable to a value greater than the default 4.
Node.js's cluster functionality uses the underlying OS's fork/exec scheme to create a new OS process for each cluster instance. So, each cluster instance has its own libuv pool.
If you're running stuff at scale, lets say with more than ten host machines running your Node.js server, then you can spend time optimizing Javascript instances.
Don't forget nginx if you use it as a reverse proxy to handle your https work. It needs some processor time too, but it uses fine-grain multithreading so you won't have to worry about it unless you have huge traffic.

Can you get more performance using multiple websockets on one Node.js server?

Ok guys, is it useful to generate more than one websocket instance on a Node.js server? I mean possibly you also create subworkers..
I know, it depends on your hardware and the network cards maximum. But can you reach this maximium with only one task or can you get more performance with several parallel processes?
It depends on the CPU count in your server. Parallel processes (the cluster module) aren't a silver bullet for performance - on a 1 CPU system they might even decrease it just due to overhead. But if you have more than one CPU, a NodeJS process CANNOT use that extra horsepower. The cluster module solves this. Used properly you can use all the available CPU power on the system.
Note that this has nothing to do with WebSockets themselves. This is a core NodeJS architectural constraint. But WebSockets are a great use-case where clustering is advantageous...

Clustering Node.js on Bluemix

Will a Node.js app on Bluemix automatically be scaled to run on multiple processors, or do I need to implement that myself using Node's clustering API? And if I do use clustering, will there be more than one CPU available?
Short answer: You need to use node cluster module to take full advantage of all cores in each instance. Or, you can also just increase the number of instances.
Long answer: Each instance of your application that you push to bluemix runs in a warden container. Resource control is managed by linux cgroups. The number of cores per instance is not something you can control. Running a quick test on Bluemix, os.cpus() showed 4 cores. If you want to take advantage of all 4 cores, in your one Bluemix instance (warden container) of your node.js application, then you should use nodes cluster module.
Keep in mind, you can also just increase the number of instances (horizontal scaling), which could achieve near linear results depending on your bottleneck on use of external services. So if you have 3 instances, each of those instances has 4 cores, and the built-in load balancer distributes traffic among the 3 instances.
The hybrid model that Ram suggested makes sense. You might want to do some benchmark to determine how many processes you want to run in one application container. You can use "cf app " to monitor the CPU utilization of each app instances under load, and if it's not fully consuming the CPU then it may make sense to spawn more processes.
However, please note -
* CPU might not be the bottleneck, in which case spawn more processes in the app container or scaling more app container instances won't help;
* The more processes you spawn in one container, the more memory it consumes, so make sure you do not spawn too many and exceed the allocated memory number (otherwise the app container will be killed).

NodeJS in MultiCore System

"Node.js is limited to a single thread". how the nodeJS will react when we are deploying in Multi-Core systems? will it boost the performance?
The JavaScript running in the Node.js V8 engine is single-threaded, but the underlying libuv multi-platform support library is multi-threaded and those threads will be distributed across the CPU cores by the operating system according to it's scheduling algorithm, so with your JavaScript application running asynchronously (and single-threaded) at the top level, you still benefit from multi-core under the covers.
As others have mentioned, the Node.js Cluster module is an excellent way to exploit multi-core for concurrency at the application (JavaScript V8) level, and since Express is cluster aware, you can have multiple worker processes executing concurrent server logic, without needing a unique listening port for each process. Impressive.
As others have mentioned, you will need Redis or equivalent to share data among the cluster worker processes. You will also want a logging facility that is cluster aware, so the cluster master and all worker processes can log to a single shared log file. The Node log4node module is a good choice here, and it works with logrotate.
Typical web examples show using the runtime detected number of cores as the number of cluster worker processes to fork, but I prefer to make that a configuration option in a config.yaml file so I can tune the number of worker processes running the main JavaScript application as needed.
Nodejs runs in one thread, but you can start multiple nodejs processes.
If you are, for example, building web server you can route every request to one of nodejs processes.
Edit: As hereandnow78 and vkurchatkin suggested, maybe the best way to use power of multi core system would be to use nodejs cluster module
cluster module is the solution.
But u need to know that, node.js cluster is, it invokes child process. It means each process cannot share the data.
To share data, u need to use Redis or other IMDG to share the data across the cluster nodes.

Node.js child_process.fork() to run on different CPU cores

I have an application that runs long-executing processes. To make it faster, I do simple sharding of data and want to run them in parallel, simply by .fork() 2 instances of same application.
I'm having 2 Cores machine there and want to make sure that 2 Cores are utilized and first instance is running on first core, second on second core.
I know about cluster module, but it seems not relevant in this case, since I don't need HTTP services running and load-balancing between them. Just workers (mean, they dont need to communicate with each other, send messages or whatever - they just do HTTP requests and store data to database).
Is there possible at all to control which CPU core would node.js process take? How to monitor that on Mac/Linux?
Cluster module is exactly what you need : http://nodejs.org/api/cluster.html
You want this, which is a specific part of the cluster API:
cluster.setupMaster([settings])
https://nodejs.org/api/cluster.html#cluster_cluster_setupmaster_settings
I also wrote a module to do this, and there are several out there:
https://www.npmjs.com/package/poolio
my module works well, but the API is not that straightforward, so I would recommend using the core module

Resources