Which one is better?
I have enabled Node.js clustering with workers, but I have now discovered that PM2 does the same thing.
I'm using Keymetrics to view stats from my web server, and I have noticed that when I launch my Node.js app (with built-in clustering) without PM2's cluster feature, Keymetrics reports 20-30 MB of RAM used.
If I disable clustering inside Node and switch on PM2's cluster mode instead, Keymetrics reports about 300 MB of RAM usage.
So which method is better, and why does Keymetrics report only 30 MB of RAM usage with the built-in cluster?
It depends on how your Node application works. If your application is stateless, PM2 cluster mode is easy to use because it requires little or no change to your code. But if your application uses local data, sessions, or sockets, it is recommended to use the built-in Node.js cluster module and start your application normally with PM2.
My Node application uses sockets and MQTT, so I can't use PM2 cluster mode directly (pm2 start app.js -i max): the same application would run on every CPU, and that was creating multiple socket connections with the client. So I have to manage the cluster and its workers manually with the Node cluster module, and use packages such as sticky-sessions and socket.io-redis to set up a proper communication flow between all workers. I then start the app with a plain pm2 start app.js.
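As a rough illustration of the manual approach described above, here is a minimal sketch of an app.js that forks its own workers with the built-in cluster module. It assumes Node.js 16 or later (for cluster.isPrimary); the helper names resolveWorkerCount and startClustered are made up for this example, and the sticky-session and Redis adapter wiring is deliberately left out.

```javascript
// app.js: minimal sketch of managing workers with the built-in cluster module.
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// Hypothetical helper: turn a setting such as 'max' or 2 into a worker count.
function resolveWorkerCount(requested, cpuCount = os.cpus().length) {
  if (requested === 'max') return cpuCount;
  const n = Number.parseInt(requested, 10);
  return Number.isInteger(n) && n > 0 ? n : 1;
}

// Hypothetical entry point: the primary forks workers, each worker serves HTTP.
function startClustered(port, requested = 'max') {
  if (cluster.isPrimary) {
    const count = resolveWorkerCount(requested);
    for (let i = 0; i < count; i += 1) cluster.fork();
    return;
  }
  // Workers share the listening port; the primary hands connections to them.
  http
    .createServer((req, res) => res.end(`handled by worker ${process.pid}\n`))
    .listen(port);
}
```

With startClustered(3000) as the last line of app.js, the app would be launched with a plain pm2 start app.js (no -i flag), so that PM2 only supervises the primary process and the clustering stays under the application's control.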
Below are some links which can be helpful.
PM2 Cluster mode
PM2 Recommendation Note
Node Cluster
I use PM2. There are a number of reasons it is better.
Unlike using core's clustering, your code needs little to no modification to use PM2. Clustering logic doesn't belong in every app we ever build.
It scales from the command line. I can simply run pm2 scale my-app +1 to add another worker in realtime after deployment.
You should already be using PM2 anyway to keep the process alive. So clustering comes for free.
I cannot reproduce anything close to your 300MB number. In fact, I recently had a leaky app that I had to use --max-memory-restart on and even in that situation memory usage usually stayed below 100MB. Though it wouldn't surprise me in the slightest if PM2's clustering used more memory, simply because it does a lot for you out-of-the-box.
My suggestion would be to not prematurely optimize. Use PM2 until you genuinely need to squeeze every drop of memory / performance out of your systems (definitely not before you have lots of traffic). At that point you can figure out what the bare minimum is you need from clustering and can re-implement just those parts yourself.
Resources
Clustering walkthrough: https://keymetrics.io/2015/03/26/pm2-clustering-made-easy/
PM2 tutorial: https://futurestud.io/tutorials/pm2-cluster-mode-and-zero-downtime-restarts
Related
I have a simple stateless Node app that I want to instantiate across a multi-core (multi-vCPU AWS instance) server, and I understand how PM2's cluster mode works to obviate the need for using the Cluster module in the app code.
I have a dual-core AWS t2.medium EC2 instance. I believe PM2 is configured correctly, and at startup it launches two processes for the app with distinct PM2 IDs and PIDs.
PM2 is starting the app as follows:
pm2 start [app_name] -i max
PM2 lists the two processes with distinct PM2 IDs and distinct PIDs as expected.
However...
ps -U [username] -au
...suggests both processes are running on the same core.
Am I missing something? (Probably!)
Thanks in advance to anyone who can shed some light on this.
Processes aren't bound to cores, but rather assigned by the OS's scheduler. When your clustered program comes under load, the OS will use both cores to schedule your processes and, of course, all the other stuff it needs to run.
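One way to watch the scheduler at work on Linux is to sample the CPU a process last ran on, which the kernel exposes as field 39 ("processor") of /proc/<pid>/stat. The sketch below is Linux-only, and the function names are made up for this example.

```javascript
const fs = require('node:fs');

// Extract field 39 ("processor", the CPU last run on) from a /proc/<pid>/stat line.
function lastCpuFromStat(statLine) {
  // The command name (field 2) is wrapped in parentheses and may contain spaces,
  // so split only what follows the final ')'.
  const rest = statLine.slice(statLine.lastIndexOf(')') + 2).split(' ');
  // rest[0] is field 3 (state), so field 39 sits at index 36.
  return Number(rest[36]);
}

// Hypothetical helper: read the value for a live process (Linux only).
function lastCpuOfPid(pid) {
  return lastCpuFromStat(fs.readFileSync(`/proc/${pid}/stat`, 'utf8'));
}
```

Calling lastCpuOfPid repeatedly for each PM2 worker PID while the app is under load should show the values changing over time. A single snapshot, like one ps listing, only tells you where each process happened to run last.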
I use PM2 to start my application in cluster mode. As far as I know, PM2 then does not let me run my own code in the master process, but I need to collect metrics (CPU usage, memory, etc.).
Is it possible to aggregate metrics, or get metrics for the whole app (PM2 cluster mode), from the child workers and, for example, show them on a /metrics route?
Unfortunately, I cannot find any open source libraries for that :(
I found pm2 web + pmx. That solved my problem.
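For readers who want the numbers inside their own /metrics route, a different approach from the pm2 web + pmx one above is to ask the PM2 daemon directly through its programmatic API. The following is only a sketch: it assumes the pm2 package is installed next to the app, and that each entry returned by pm2.list carries name, monit.memory (bytes), and monit.cpu (percent), which is worth checking against the PM2 version in use. The function names aggregate and collectMetrics are made up.

```javascript
// Pure helper: sum memory and CPU over PM2 process descriptions.
function aggregate(processes) {
  return processes.reduce(
    (total, proc) => ({
      instances: total.instances + 1,
      memoryBytes: total.memoryBytes + ((proc.monit && proc.monit.memory) || 0),
      cpuPercent: total.cpuPercent + ((proc.monit && proc.monit.cpu) || 0),
    }),
    { instances: 0, memoryBytes: 0, cpuPercent: 0 }
  );
}

// Ask the PM2 daemon for all processes with the given name and aggregate them.
function collectMetrics(appName, callback) {
  const pm2 = require('pm2'); // assumption: the pm2 package is installed locally
  pm2.connect((connectErr) => {
    if (connectErr) return callback(connectErr);
    pm2.list((listErr, list) => {
      pm2.disconnect();
      if (listErr) return callback(listErr);
      callback(null, aggregate(list.filter((p) => p.name === appName)));
    });
  });
}
```

Any worker can call collectMetrics from its /metrics handler, since the totals come from the PM2 daemon rather than from the worker's own process.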
I understand that PM2 Cluster Mode allows us to easily scale across CPUs on a single machine. Does it create multiple instances of the node application it is scaling? Essentially, is it the same thing as running multiple node applications on different ports with a reverse proxy like Nginx?
Then, there's Node Cluster which forks a child process. Is this approach more efficient compared to PM2 Cluster Mode as it is running a single Node Application and using worker threads to process incoming requests?
They basically do the same thing: PM2 uses Node's cluster module under the hood. PM2 makes things easier because you don't have to handle forking programmatically in your code; you just run the app as is.
Note that cluster mode does not support session stickiness, so make sure your app is stateless.
I'm working on a project with Node.js that involves a server. Because of the large number of jobs, I need clustering to divide the jobs between different servers (different physical machines). Note that my jobs have nothing to do with the internet, so I cannot use stateless connections (or Redis to keep state) with a load balancer in front of the servers to distribute the connections.
I have already read about the "cluster" module but, from what I understood, it seems to scale only across multiple processors on the same machine.
My question: is there any suitable distributed module available in Node.js for my work? What about Apache Mesos? I have heard that Mesos can abstract multiple physical machines into a single server. Is it correct? If yes, is it possible to use the Node.js cluster module on top of Mesos, since we now have only one virtual server?
Thanks
My question: is there any suitable distributed module available in Node.js for my work?
Don't know.
I have heard that Mesos can abstract multiple physical machines into a single server. Is it correct?
Yes. Almost. It lets you pool resources (CPU, RAM, disk) across multiple machines, and gives you the ability to allocate resources to your applications and to run and manage those applications. So you can ask Mesos to run X instances of Node.js and specify how many resources each instance needs.
http://mesos.apache.org
https://www.cs.berkeley.edu/~alig/papers/mesos.pdf
If yes, is it possible to use the Node.js cluster module on top of Mesos, since we now have only one virtual server?
Admittedly, I don't know anything about Node.js or clustering in Node.js. Going by http://nodejs.org/api/cluster.html, it just forks off a bunch of child workers and then distributes connections between them round-robin. You have two options off the top of my head:
Run Node.js on Mesos using an existing framework such as Marathon. This will be the fastest way to get something going on Mesos. https://github.com/mesosphere/marathon
Create a Mesos framework for Node.js, which essentially does what the Node.js cluster module does, but across machines. http://mesos.apache.org/documentation/latest/app-framework-development-guide/
In both of these solutions, you have the option of letting Mesos create as many instances of Node.js as you need, or of using Mesos to run the Node.js cluster module on each machine and letting it manage all the workers on that machine.
I didn't google, but there might already be a node.js mesos framework out there!
Is it possible to build nodejs server with master/slave mode or cluster mode with only one CPU core, so that the others could be up once the current thread is down?
Yes. With the core cluster module, you can spawn more children than there are cores. This is not recommended for regular use because of context switching and the overhead incurred by each new Node process.
However, if this is to load new code, an overall different approach is required. There are some existing modules that can assist with zero downtime reloads and they mostly proxy requests to new instances to perform the switch.
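For the standby case (as opposed to the code-reload case), a minimal sketch with the core cluster module could look like the following. It assumes Node.js 16 or later, and the helper names shouldRespawn and keepAlive are made up for this example. The worker count is independent of the number of cores, so it also works on a single-core machine.

```javascript
const cluster = require('node:cluster');

// Respawn only after a crash, not when the primary asked the worker to stop.
function shouldRespawn(worker) {
  return !worker.exitedAfterDisconnect;
}

// Hypothetical helper: keep `count` workers alive, even on a single core.
function keepAlive(count) {
  if (!cluster.isPrimary) return;
  for (let i = 0; i < count; i += 1) cluster.fork();
  cluster.on('exit', (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} exited (${signal || code})`);
    if (shouldRespawn(worker)) cluster.fork();
  });
}
```

Calling keepAlive(2) in the primary would keep two workers running on one core, so that a second worker can keep serving while a crashed one is replaced, at the cost of the context-switching overhead mentioned above.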