After reading a handful of articles on scaling Node apps, I still have not made up my mind about when I should use Node's built-in cluster module and when I should simply add more dynos.
I have already read the following threads on StackOverflow:
How to properly scale nodejs app on heroku using clusters
Running Node.js App with cluster module is meaningless in Heroku?
As far as I understood it, if I make use of node cluster functionality I will end up with the total memory available divided by the number of forked processes.
On the other hand, if I add one more dyno I will double the memory available.
So, what is the point of using node clusters?
It's not really an either-or situation. You can make use of multiple node cluster instances on multiple dynos. Memory isn't really what you want to look at, though, since that would be a shared resource. CPU / core usage is more relevant to clustering in node, since each node process can only make use of one CPU core at a time.
It's really going to depend on which dynos you are using, too.
Have you seen these suggestions on the official heroku docs yet?
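To make the memory-versus-cores trade-off concrete, here is a minimal sketch of choosing how many cluster workers to fork on one dyno. The helper names (`chooseWorkerCount`, `workerCountFromEnv`) and the 512 MB / 256 MB figures are illustrative assumptions, not Heroku values, and `WEB_CONCURRENCY` is assumed to be set by the platform or by you. The point is only that the count is capped by both the core count and how many processes fit in memory.

```javascript
const os = require('os');

// Hypothetical helper: cap the worker count by both CPU cores and memory.
// cpuCount      - cores visible to the process
// totalMemoryMb - memory available to the whole dyno/container
// perWorkerMb   - memory you expect one worker to need (measure this yourself)
function chooseWorkerCount(cpuCount, totalMemoryMb, perWorkerMb) {
  const byMemory = Math.floor(totalMemoryMb / perWorkerMb);
  // Always run at least one worker, even on a very small dyno.
  return Math.max(1, Math.min(cpuCount, byMemory));
}

// An explicit WEB_CONCURRENCY value wins when it is a positive integer.
function workerCountFromEnv(env, fallback) {
  const n = Number.parseInt(env.WEB_CONCURRENCY, 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Example figures only: a 512 MB dyno and roughly 256 MB per worker.
const fallback = chooseWorkerCount(os.cpus().length, 512, 256);
console.log('workers to fork:', workerCountFromEnv(process.env, fallback));
```

Adding a dyno raises the memory ceiling, while forking more workers inside a dyno only helps as long as there are idle cores and enough memory left per process.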
We have a production chat app built in socketio/nodejs.
We use express.
Node.js is a bit old: 10.21.0
Socket.IO is at 3.1.1
Our computer is a VM with 4vCPU and 16 GB RAM.
We use pm2 to manage starting node app with env variables.
We run into a problem when about 500 users are in the chat and writing at the same time. Upload bandwidth is around 250 Mbps (we have a 10G link, so that is not the bottleneck). At that point our logs fill up with connections/disconnections and pm2 restarts the app.
Looking closer with "pm2 monit", we can see that only one processor is used, and it is above 100% most of the time.
We read some documentation about clustering (cluster + fork). It looks interesting, but when we tested it, it's like we had few chat apps so for the same "chat room", users are in different workers so it's not OK.
Do you have an idea how we can fix that and use all processors/cores?
We are already thinking of starting by upgrading Node.js.
Thanks
Niko
Since your JavaScript in Node.js always runs on a single thread (aside from worker threads), upgrading Node won't get you very far (though newer Node versions ship newer V8 engines, which might be faster).
it's like we had few chat apps so for the same "chat room", users are in different workers so it's not OK.
This sounds like you've architected your app to use global variables or in-process state like that for these shared rooms. If you want to use cluster or PM2's multiple process mode, that state will need to live somewhere else, maybe a second Node application or, say, a Redis server.
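As a minimal sketch of moving room state out of the workers, the example below keeps it in the cluster primary and relays chat messages to every worker over Node's built-in IPC channel (`worker.send` / `process.send`). This is a swapped-in, single-machine stand-in for the Redis option, not a replacement for it. The message shape (`type`, `room`, `user`, `text`) and the name `handleWorkerMessage` are assumptions made for illustration and are not part of Socket.IO.

```javascript
// Room membership lives in ONE place (the primary), not in each worker.
const rooms = new Map(); // room name -> Set of user names

// Hypothetical handler the primary runs for every message a worker sends it.
// `workers` is any iterable of objects with a send(msg) method,
// e.g. Object.values(cluster.workers).
function handleWorkerMessage(roomMap, workers, msg) {
  if (msg.type === 'join') {
    if (!roomMap.has(msg.room)) roomMap.set(msg.room, new Set());
    roomMap.get(msg.room).add(msg.user);
  } else if (msg.type === 'leave') {
    const members = roomMap.get(msg.room);
    if (members) members.delete(msg.user);
  } else if (msg.type === 'chat') {
    // Fan the message out to every worker; each worker would then emit it
    // to the sockets it holds for that room.
    for (const worker of workers) {
      worker.send({ type: 'chat', room: msg.room, user: msg.user, text: msg.text });
    }
  }
}

// Wiring in the primary (assumes the workers were created with cluster.fork()):
//   const cluster = require('cluster');
//   cluster.on('message', (worker, msg) =>
//     handleWorkerMessage(rooms, Object.values(cluster.workers), msg));
// In a worker:
//   process.send({ type: 'chat', room: 'lobby', user: 'niko', text: 'hi' });
```

This only works while all workers share one machine. Once the app spans several hosts, the state has to live in an external store such as the Redis server mentioned above. With Socket.IO specifically, a clustered setup usually also needs sticky sessions so that a client keeps reaching the same worker.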
They both solve the same issue - scalability. When to use which?
And is there a point to integrating cluster API for node app running inside a docker container?
They're not really equivalent. Microservices solve an organizational and code management problem (scalability in a very dynamic way, reducing tight coupling, and keeping bugs isolated to one microservice). cluster solves scalability in a very limited way, by spinning up cluster workers on the same machine. If you have one large app and generally scale vertically (by increasing the amount of computing power your hosts have), cluster is great. If not, breaking things down into services (or further down into microservices) is also great.
You can also do both (your second question), for example running Node apps in containers on Kubernetes, where the Node apps use cluster. Depending on how your containers get run and how many vCPUs they're allocated, it may or may not have any effect, but it's only a couple lines of code so it doesn't hurt to add it.
I am very new to Node.js and express. I am currently learning it by building my own services.
I recently read about clusters. I understood what clusters do. What I am not able to understand is how to make use of clusters in a production application.
One way I can think of is to have the master process just sit in front and route each incoming request to the next available child process in a round-robin fashion. I am not sure if this is how it is designed to be used. I would like to know how clusters should be used in a typical web application.
Thanks.
The node.js cluster module is used any time you want to spread request processing across multiple node.js processes. This is most often done when you wish to handle more requests/second and you have multiple CPU cores in your server. By default, a single instance of node.js will not fully utilize multiple cores because the core Javascript you run in your server is single threaded (uses one core). Node.js itself does use threads for some things internally, but that's still unlikely to fully utilize a multi-core system. Running one clustered node.js process per CPU core lets you make better use of the available compute resources.
Clustering also provides you with some additional fault tolerance. If one worker process goes down, the other live workers can keep serving requests while the failed one restarts.
The cluster module for node.js has a couple different scheduling algorithms - the round robin you mention is one. You can read more about that here: Cluster Round-Robin Load Balancing.
Because each cluster worker is a separate process, there is no automatic shared data among the different worker processes. As such, clustering is simplest to implement either where there is no shared data or where the shared data is already in a place that can be accessed by multiple processes (such as a database).
Keep in mind that a single node.js process (if written to properly use async I/O and not heavily compute bound) can serve many requests at once by itself. Clustering is for when you want to expand scalability beyond what one instance can deliver.
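The points above can be sketched with the built-in cluster module as follows. The primary forks one worker per core and replaces workers that die, and every worker listens on the same port. The `RUN_CLUSTER_DEMO` guard and the function names are assumptions added for this example so that loading the file has no side effects. Port 3000 is just a fallback.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Node 16+ exposes cluster.isPrimary; older versions only have isMaster.
const isPrimary =
  cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// Every worker runs this same handler; the pid shows which one answered.
function handleRequest(req, res) {
  res.end(`handled by worker pid ${process.pid}\n`);
}

// Primary: fork the workers and replace any that die unexpectedly.
function startPrimary(workerCount) {
  for (let i = 0; i < workerCount; i++) cluster.fork();
  cluster.on('exit', (worker) => {
    // exitedAfterDisconnect is true for deliberate shutdowns; skip those.
    if (!worker.exitedAfterDisconnect) cluster.fork();
  });
}

// Worker: all workers listen on the same port; the cluster module
// distributes incoming connections between them.
function startWorker(port) {
  return http.createServer(handleRequest).listen(port);
}

// RUN_CLUSTER_DEMO is a made-up guard so the file can be loaded safely.
if (process.env.RUN_CLUSTER_DEMO === '1') {
  if (isPrimary) startPrimary(os.cpus().length);
  else startWorker(Number(process.env.PORT) || 3000);
}
```

Started with `RUN_CLUSTER_DEMO=1`, repeated requests to the port should come back with different pids, and killing one worker should cause the primary to fork a replacement. Note that the primary does not run your request handler itself; it only manages workers and hands connections to them.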
I have created a proof of concept for cluster in Node.js and written up the details in the blog posts below. Have a look through them; they may provide some clarity.
https://jksnu.blogspot.com/2022/02/cluster-in-node-js-application.html
https://jksnu.blogspot.com/2022/02/cluster-management-in-node-js.html
I recently started with Node and have been reading a lot about its limitation of being single-threaded and how it does not utilise all your cores. Then I read this:
http://bit.ly/1n2YW68 (which talks about the new cluster module of Node.js for load balancing)
Now I'm not sure I completely agree with it :) because the first thing I thought of, before starting with Node, for making it utilise cores with proper load balancing was to go through a web server, using something like nginx's upstream module,
doing something like this:
upstream domain1 {
    # upstream "server" entries take a host[:port], without the http:// scheme
    server nodeapp1;
    server nodeapp2;
    server nodeapp3;
}
So my question is: does using the cluster module for load balancing across cores have any significant advantage over web server load balancing,
or is the blog post too far from real use?
Note: I'm not concerned with load balancing handled by app servers like Passenger (Passenger has Node.js support as well, but that is not the answer I'm looking for :)), which I already know about since I'm mostly a Ruby programmer.
One other option you can use to cluster NodeJs applications is to deploy the app using PM2.
Clustering is as easy as this; you don't need to implement it by hand:
pm2 start app.js -i max
With -i max, PM2 detects the number of available CPUs and runs as many processes as possible.
Read about PM2 cluster mode here
http://pm2.keymetrics.io/docs/usage/cluster-mode/
For controlling the load of IO operations, I wrote a library called QueueP that uses the memoization concept. You can even customize the memoization logic, and sometimes gain speedups of more than 10x.
https://www.npmjs.com/package/queuep
As far as I know, the built-in Node cluster module is not a good solution yet (load is not evenly distributed across cores), at least until v0.12: http://strongloop.com/strongblog/whats-new-in-node-js-v0-12-cluster-round-robin-load-balancing/
So you should use nginx until then. After that, we will see some benchmarks comparing both options and find out whether the built-in cluster module is a good choice.
I have just learned about Heroku and was pretty excited to test it out. I quickly put together their demo with Node.js and stumbled across a problem. When running the application locally, Apache Benchmark reports roughly 3500 requests/s, but on the cloud that drops to 10 requests/s, and it does not go up or down with network latency. I cannot believe that this is the performance they are asking 5 cents/hour for, and I strongly suspect that my application is not multi-threaded.
This is my code.js: http://pastebin.com/hyM47Ue7
What configuration do I need to apply in order to get it running faster on Heroku? Or what other web servers for node.js could I use?
I am thankful for every answer on this topic.
Your little example is not multi-threaded. (Not even on your own machine.) But you don't need to pay for more dynos immediately, as you can make use of multiple cores on a dyno; see this answer: Running Node.js App with cluster module is meaningless in Heroku?
To repeat that answer: a node solution to using multiple processes that should increase your throughput is to use the (built-in) cluster module.
I would guess that you can easily get more than 10 req/s from a heroku dyno without a problem, see this benchmark, for example:
http://openhood.com/ruby/node/heroku/sinatra/mongo_mapper/unicorn/express/mongoose/cluster/2011/06/14/benchmark-ruby-versus-node-js/
What do you use to benchmark?
You're right, the web server is not multi-threaded, until you pay for more web dynos. I've found Heroku is handy for prototyping; depending on the monetary value of your time, you may or may not want to use it to set up a scalable server instead of using EC2 directly.