Clustering Node.js on Bluemix

Will a Node.js app on Bluemix automatically be scaled to run on multiple processors, or do I need to implement that myself using Node's clustering API? And if I do use clustering, will there be more than one CPU available?

Short answer: You need to use Node's cluster module to take full advantage of all the cores in each instance. Alternatively, you can simply increase the number of instances.
Long answer: Each instance of your application that you push to Bluemix runs in a Warden container. Resource control is managed by Linux cgroups. The number of cores per instance is not something you can control. In a quick test on Bluemix, os.cpus() showed 4 cores. If you want to take advantage of all 4 cores within a single Bluemix instance (Warden container) of your Node.js application, you should use Node's cluster module.
Keep in mind that you can also just increase the number of instances (horizontal scaling), which can achieve near-linear gains depending on whether your bottleneck lies in external services. So if you have 3 instances, each of those instances has 4 cores, and the built-in load balancer distributes traffic among the 3 instances.
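For reference, a minimal sketch of that setup with Node's built-in cluster module (the port fallback and the respawn-on-exit behaviour are illustrative choices, not anything Bluemix mandates):
const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per core reported by the container (4 in the test above).
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  // Replace any worker that dies so capacity stays constant.
  cluster.on('exit', function () {
    cluster.fork();
  });
} else {
  // Every worker binds the same port; the master distributes connections.
  http.createServer(function (req, res) {
    res.end('ok');
  }).listen(process.env.PORT || 8080);
}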

The hybrid model that Ram suggested makes sense. You might want to do some benchmarking to determine how many processes you want to run in one application container. You can use "cf app" to monitor the CPU utilization of each app instance under load, and if it's not fully consuming the CPU then it may make sense to spawn more processes.
However, please note:
* CPU might not be the bottleneck, in which case spawning more processes in the app container or scaling out more app container instances won't help;
* The more processes you spawn in one container, the more memory they consume, so make sure you do not spawn so many that you exceed the allocated memory limit (otherwise the app container will be killed); see the sketch after this list.
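To illustrate that second point, one rough way to pick a worker count is to cap it by a per-worker memory budget as well as by the core count. A sketch with placeholder numbers (the 512 MB container limit and 128 MB per worker are assumptions, not Bluemix defaults):
const os = require('os');

// Placeholder figures: substitute your container's memory limit and the
// per-worker footprint you measured under load.
const containerMemoryMb = 512;
const perWorkerMb = 128;

const byCpu = os.cpus().length;
const byMemory = Math.max(1, Math.floor(containerMemoryMb / perWorkerMb));
const workers = Math.min(byCpu, byMemory);
console.log('forking %d workers (%d by CPU, %d by memory)', workers, byCpu, byMemory);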

Related

One Node.js instance or multiple, in a single-core VPS

I'm confused about this.
Let's assume I have a 1-core VPS with a Node.js server running.
Now, I launch another Node.js instance and a load balancer to distribute requests (on the same VPS).
Will the performance increase because I will have 2 Node.js servers sharing the work?
Or will it decrease, because one Node process is already enough to handle all the requests, so adding another one plus the load balancer will just consume more of the VPS's resources?
If you create more instances than the number of CPUs, then while there is an active process running on one instance, the other instances will compete for CPU to satisfy any incoming request, and that leads to spending more CPU than it saves time. Although the difference is often negligible, having the same number of instances as cores will give better performance.

pm2 safe way to use max_memory_restart

I'm building a Node.js + Express web application using pm2 cluster mode as a load balancer. This turned out to be a big performance improvement, as my application now spawns an instance of itself for each one of my CPU cores.
To take the most advantage of it, I'm using a custom start script in which I added pm2's max_memory_restart option, so if one of the instances exceeds 400 MB of memory usage it restarts itself. Seeing that behavior in action, I couldn't help questioning whether it is safe to use this option. Although it's nice to have an auto-restart kick in when memory grows past a certain point, I thought of two possible downsides:
If one of my endpoints has memory-intensive usage, that instance could restart itself in the middle of processing, giving the user an error.
If my server has, let's say, 2 GB of RAM and 8 CPU cores, then the max_memory_restart option should be at most 256 MB if I'm running pm2 in cluster mode, as it applies to each instance. Isn't there a risk in giving a fairly low max_memory_restart value here? Theoretically the instances would be restarting frequently in this case.
Given these scenarios, is it safe/adequate to use pm2's max_memory_restart option?
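For reference, the setup being described maps onto a pm2 ecosystem file roughly like this (the app name, entry point and numbers are just the example values from the question, not recommendations):
// ecosystem.config.js -- a sketch based on the numbers in the question
module.exports = {
  apps: [{
    name: 'api',                  // hypothetical app name
    script: './server.js',        // hypothetical entry point
    exec_mode: 'cluster',
    instances: 8,                 // one per core in the 8-core example
    max_memory_restart: '256M'    // 2 GB RAM / 8 instances, as discussed above
  }]
};
You would start it with pm2 start ecosystem.config.js, and max_memory_restart is applied to each instance individually, as noted in the question.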

Docker containers and Node.js clusters

I have an API server running Node.js that was using its cluster module, and testing looked to be pretty good. Now our IT department wants to move to using Docker containers, which I am happy about, but I've never actually used Docker other than just playing around. But I had a thought: the Node.js app runs within a single Docker process, so the cluster module wouldn't really be the best fit, since that single Docker process can become a slow point of the setup until the request is split up within that process by the cluster module.
So really, a cluster of Docker containers that can be started and stopped on the fly is more important than using Node.js' cluster module, correct?
If I have a cluster of containers, would using Node.js' cluster module get me anything? The API endpoints take less than 0.5 s to return (usually quite a bit less).
I'm using MySQL (I believe it's a single server, nothing more currently), so there shouldn't be any reason to use a data integrity solution then.
What I've seen as the best solution when using Docker is to keep as few processes per container as possible, since containers are lightweight; you don't want processes trying to use more than one CPU. So, running a cluster in the container won't add any value and might worsen latency.
Here https://medium.com/@CodeAndBiscuits/understanding-nodejs-clustering-in-docker-land-64ce2306afef#.9x6j3b8vw Chad Robinson explains the idea in general terms.
Kubernetes, Rancher, Mesos and other container management layers handle the load-balancing. They provide "scheduling" (moving those Docker container slices around different CPUs and machines to get a good usage across the cluster) and "networking" (load balancing inbound requests to those containers) layers internally.
Update
I think it's worth adding the link Why it is recommended to run only one process in a container?, where people share their ideas and experiences; chiefly, from Jon there are some interesting points:
Provided that you give a single responsibility (single process, function or concern) to a container (good idea, Docker, naming this a 'concern' ;)), then:
Scaling containers horizontally is easier.
It can be re-used in different projects.
Identifying issues and troubleshooting is a breeze compared to doing it in an entire application environment. Also, logging and reporting can be more accurate and detailed.
Upgrades/downgrades can be gradual and fully controlled.
Security can be applied to specific resources and at different levels.
You'll have to measure to be sure, but my hunch is that running with Node's cluster module would be worthwhile. It would get you more CPU utilization with the least amount of extra overhead. No extra containers to manage (start, stop, monitor). Plus the cluster workers have an efficient communication mechanism. The most reasonable evolution (don't skip steps) would seem to me:
1 container, 1 node process
1 container, several clustered node workers
several containers, each with several node workers
I have a system with 4 logical cores, and I ran the following lines both on my machine and inside Docker installed on the same machine.
const numCPUs = require('os').cpus().length;
console.log(numCPUs)
These lines print 4 on my machine and 2 inside the Docker container, which means that if we use clustering inside the Docker container, only 2 worker instances would be running. So a Docker container doesn't see the cores the same way the actual machine does. Also, running 5 Docker containers with clustering enabled gives 10 instances, which ultimately have to be managed by the OS kernel on 4 logical cores.
So I think the best approach is to use multiple Docker container instances in swarm mode with Node.js clustering disabled. This should give the best performance.
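Either way, if you do keep clustering enabled inside a container, one way around the core-count mismatch above is to make the worker count explicit rather than trusting os.cpus(). A sketch, assuming a WORKERS environment variable and an ./app module, both of which are just illustrative names:
const cluster = require('cluster');
const os = require('os');

// Prefer an explicit count handed in by whoever runs the container;
// fall back to whatever core count the container happens to report.
const workers = parseInt(process.env.WORKERS, 10) || os.cpus().length;

if (cluster.isMaster) {
  for (let i = 0; i < workers; i++) {
    cluster.fork();
  }
} else {
  require('./app'); // hypothetical module that starts the HTTP server
}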

What is the optimal way to run a Node API in Docker on Amazon ECS?

With the advent of Docker and scheduling & orchestration services like Amazon's ECS, I'm trying to determine the optimal way to deploy my Node API. Docker and ECS aside, I've wanted to take advantage of the Node cluster library to gracefully handle crashing the Node app in the event of an asynchronous error, as suggested in the documentation, by creating a master process and multiple worker processes.
One of the benefits of the cluster approach, besides gracefully handling errors, is creating a worker process for each available CPU. But does this make sense in the Docker world? Would it make sense to have multiple Node processes running in a single Docker container that was going to be scaled into a cluster of EC2 instances on ECS?
Without the Node cluster approach, I'd lose the ability to gracefully handle errors, so I think that at a minimum I should run a master and one worker process per Docker container. I'm still confused as to how many CPUs to define in the Task Definition for ECS. The ECS documentation says something about each container instance having 1024 units per CPU, but that isn't the same thing as EC2 Compute Units, is it? And with that said, I'd need to pick EC2 instance types with the appropriate number of vCPUs to achieve this, right?
I understand that achieving the most optimal configuration may require some level of benchmarking my specific Node API application, but it would be awesome to have a better idea of where to start. Maybe there is some studying/research I need to do? Any pointers to guide me on the path or recommendations would be most appreciated!
Edit: To recap my specific questions:
Does it make sense to run a master/worker cluster as described here inside a docker container to achieve graceful crashing?
Would it make sense to use nearly identical code as described in the Cluster docs, to 'scale' to available CPUs via require('os').cpus().length?
What does Amazon mean in the documentation for ECS Task Definitions, where it says for the cpu setting that a container instance has 1024 units per CPU? And what would be a good starting point for this setting?
What would be a good starting point for the instance type to use for an ECS cluster aimed at serving a Node API based on the above? And how do the available vCPUs affect the previous questions?
All these technologies are new and best practices are still being established, so consider these to be tips from my experience only.
One-process-per-container is more of a suggestion than a hard and fast rule. It's fine to run multiple processes in a container when you have a use for it, especially in this case where a master process forks workers. Just use a single container and allow it to fork one process per core, as you've suggested in the question.
On EC2, instance types have a number of vCPUs, each of which will appear as a core to the OS. For the ECS cluster, use an EC2 instance type such as the c3.xlarge with four vCPUs. In ECS this translates to 4096 CPU units. If you want the app to make use of all 4 vCPUs, create a task definition that requires 4096 cpu units.
But if you're doing all this only to stop the app from crashing you could also just use a restart policy to restart the container if it crashes. It appears that restart policies are not yet supported by ECS though.
That seems like a really good pattern. It's similar to what is done with Erlang/OTP, and I don't think anyone would argue that it's one of the most robust systems on the planet. Now the question is how to implement.
I would leverage patterns from Heroku or other similar PaaS systems that have a little bit more maturity. I'm not saying that Amazon is the wrong place to do this, but simply that a lot of work has been done with this in other areas that you can translate. For instance, this article has a recipe in it:
https://devcenter.heroku.com/articles/node-cluster
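In the spirit of that recipe, here is a sketch of the graceful-crash pattern being asked about: each worker bails out on an uncaught error and the master forks a replacement (WEB_CONCURRENCY, the port fallback and the 5-second timeout are illustrative choices):
const cluster = require('cluster');
const http = require('http');

if (cluster.isMaster) {
  const workerCount = parseInt(process.env.WEB_CONCURRENCY, 10) || 1;
  for (let i = 0; i < workerCount; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker, code) => {
    console.log('worker %d exited with code %d, forking a replacement', worker.process.pid, code);
    cluster.fork();
  });
} else {
  const server = http.createServer((req, res) => res.end('ok')).listen(process.env.PORT || 3000);
  process.on('uncaughtException', (err) => {
    console.error(err);
    // Stop accepting new connections, then exit so the master replaces us.
    server.close(() => process.exit(1));
    setTimeout(() => process.exit(1), 5000).unref();
  });
}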
As far as the relationship between vCPUs and CPU units goes, it looks like it's just a straight ratio of 1:1024. It is a move toward micro-charging based on CPU utilization. They are taking this even further with the Lambda work, where they charge you based on the fractions of a second that you utilize.
In the Docker world you would run one Node.js process per Docker container, but you would run many such containers on each of your EC2 instances. If you use something like Fig you can use fig scale <n> to run many redundant containers on an instance. This way you don't have to define your Node.js process count ahead of time, and each of your Node.js processes is isolated from the others.

CPU utilization of Node.js on Amazon EC2

Seeing as how Node is single-threaded, if I have a Node server running on an Amazon EC2 instance with 4 EC2 Compute Units, will it run any faster / handle more load than if I have 2 EC2 Compute Units?
Does CPU utilization on Amazon require a program to be multi-threaded to fully use all resources?
To fully utilize compute resources of N cores, you need at least N threads ready to do useful work. This has nothing to do with EC2; it's just the way computers work. I assume from your question that you are choosing between the m1.medium and m1.large instance types, which have 1 and 2 dedicated cores, respectively (the m1.small is half of a shared core, and the m1.xlarge is the full dedicated 4-core box). Thus, you need at least 2 processes doing useful work in order to utilize the larger box (unless you just want access to more memory / io).
Each Node.js process is single threaded by design. This lets it provide a clean programming paradigm free of locking semantics. This is very much by design.
For a Node.js app to utilize multiple cores, it must spawn multiple processes. These processes would then use some form of messaging (pipes, sockets, etc) to communicate -- versus "shared memory" where code can directly mutate memory locations visible to multiple processes, something that would require locking semantics.
In practice, this is dead simple to set up. As of Node.js v0.6.x, the "cluster" module is integrated into the standard distribution, making it easy to set up multiple node workers that can listen on a single port. Note that this "cluster" module is NOT the same as the LearnBoost "cluster" module, which has a different API and owns the "cluster" name in the npm registry.
http://nodejs.org/docs/latest/api/cluster.html
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // Workers share the same listening port.
  http.Server(function(req, res) { ... }).listen(8000);
}
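The messaging mentioned above is built into the cluster module itself; a small sketch of the master/worker channel (the request-count message is made up for illustration):
var cluster = require('cluster');
var http = require('http');

if (cluster.isMaster) {
  var worker = cluster.fork();
  worker.on('message', function (msg) {
    console.log('requests handled by worker so far:', msg.requests);
  });
} else {
  var requests = 0;
  http.Server(function (req, res) {
    requests++;
    res.end('ok');
  }).listen(8000);
  // Report back to the master over the built-in IPC channel.
  setInterval(function () {
    process.send({ requests: requests });
  }, 10000);
}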
The short answer to your question is that adding more cores in order to improve your Node performance will not work if all you do is write "standard" single-threaded JavaScript (you will be bound by a single CPU).
The reason is that node.js uses an event loop for processing, so if all you are doing is starting up a single node.js process without anything else, it will not be multi-threaded and thus not use more than one CPU (core).
However, you can use the node.js cluster API to fork the node process so you can take advantage of multiple CPUs (cores): https://nodejs.org/docs/latest/api/cluster.html. If you write your code that way, then having more compute units will help you.
There is one caveat, in that EC2 Compute Units are detailed per instance type. For some instances you can get more "compute units" per virtual core. So if you pick an instance that has 2 compute units per virtual core versus one that has one per core, you will be able to execute Node on a CPU that has more compute units. However, it looks like beyond 2 compute units the computing power is split across cores, which means a single-threaded process won't get any benefit from the multiple cores.
Amazon's concept of total "EC2 Compute Units" for an instance type does not map directly to a CPU or core. It is the number of cores multiplied by the speed of each core in EC2 compute units (their own relative measurement).
Amazon does list how many virtual cores each instance type has:
http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/index.html?instance-types.html
Your best option is to use all of the cores as others point out. However, if you end up with a single threaded solution, then you will want to focus on the speed of the individual cores, not the total EC2 compute units of all the cores added together.
In Node.js, your code is single-threaded, but calls that e.g. access the file system or a database server do not use the main node.js thread. The main thread keeps executing while other threads are waiting for 4GB to be read from disk to RAM or for the DB server to return a response. Once the action finishes, the supplied callback is put in a queue to execute in the main thread. More or less, anyway.
The advantage being that in a server situation, you have one very fast thread that can handle thousands of concurrent requests without putting any one entirely on hold or spawning an OS thread for each client request-response cycle.
More to the point, you should benchmark your specific use case on EC2 -- multiple processors may be useful when running a single instance of node if the app does a lot of IO.
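A tiny illustration of that non-blocking behaviour (the file path is hypothetical):
var fs = require('fs');

// The read is handed to libuv's thread pool; the callback runs later.
fs.readFile('/tmp/big-file.bin', function (err, data) {
  if (err) throw err;
  console.log('read %d bytes', data.length);
});

// Runs immediately; the main thread never waited on the disk.
console.log('kicked off the read, still free to serve other work');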
If I have node server running on an amazon EC2 instance with 4 EC2 Compute units will it run any faster / handle more load than if I have 2 EC2 Compute units?
No, if you are using node.js in a server capacity you will only have access to a single core.
var http = require('http');
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.end('Hello World\n');
}).listen(1337, "127.0.0.1");
console.log('Server running at http://127.0.0.1:1337/');
This spawns a single listener, but that doesn't mean only a single connection. Node.js breaks conventional thought that way. The Event Loop will not block connections unless you code improperly. This post helps to explain the event loop and how important it is to understand it. Took me a while to really 'get' the implications.
Does CPU utilization on amazon require a program to be multithreaded to fully use all resources?
Yes, a properly configured Apache/nginx setup will take advantage of multi-CPU configurations. Node.js servers are being developed that will also take advantage of these kinds of configurations.
Just a quick addition to the good points above about how modern Node.js functions (old thread here): not only is Node implemented on top of V8 and libuv, making use of an internal thread pool, but your JS code can actually be multi-threaded. No, I don't just mean the worker_threads API. It is possible, even probable, that some of your dependencies are using C++/V8/N-API bindings for JS and directly using the underlying thread pool.
For example:
You'll see that the standard bcrypt library on npm implements its Blowfish utilities with multithreading in C++. Many people don't read the docs right and are confused as to why running some cryptographic work from such libraries in other worker threads doesn't speed up their service.
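A quick sketch of the difference (10 rounds is an arbitrary cost factor): the asynchronous call already runs on the libuv thread pool, while the synchronous variant does the same work on the main JS thread and blocks it.
var bcrypt = require('bcrypt');

// Async form: the hashing is offloaded to the libuv thread pool,
// so the event loop stays free to serve other requests.
bcrypt.hash('secret', 10, function (err, hash) {
  if (err) throw err;
  console.log('async hash done:', hash);
});

// Sync form: the same work happens on the main JS thread and blocks it.
var blockingHash = bcrypt.hashSync('secret', 10);
console.log('sync hash done:', blockingHash);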
