NodeJS has its own modules for managing clustering and process restart:
The cluster module lets Node run multiple worker processes, typically one per CPU core in the machine, and can spawn new processes when old ones shut down (rough sketch at the end of this question).
The domain module lets Node stop taking requests and shut the process down after an error has occurred.
Then there's PM2, and I've seen guides like this one saying that PM2 allows for logging, some stats monitoring, process restart, and clustering for nodejs.
Other than the stats monitoring and logging, can someone explain what the difference between the two is? Are they supposed to be used together or do I pick one or the other?
In a production environment, how does each fare at shutting down and restarting the Node.js app when:
The system needs to restart (applying system patches, etc.)
All Node.js processes need to restart to apply new code changes on the server.
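Roughly, this is the kind of built-in behaviour I mean (a minimal sketch; the port and request handler are just placeholders):

    // Minimal sketch of the built-in cluster module behaviour described above.
    const cluster = require('cluster');
    const os = require('os');
    const http = require('http');

    if (cluster.isMaster) {
      // one worker per CPU core
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();

      // respawn a worker whenever one dies unexpectedly
      cluster.on('exit', (worker, code) => {
        if (!worker.exitedAfterDisconnect) {
          console.log(`worker ${worker.process.pid} exited (${code}), forking a new one`);
          cluster.fork();
        }
      });

      // on SIGTERM (system patching / deploy), stop taking work and let workers drain
      process.on('SIGTERM', () => {
        for (const id in cluster.workers) cluster.workers[id].disconnect();
      });
    } else {
      http.createServer((req, res) => res.end('ok')).listen(3000);
    }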
PM2 uses the cluster module under the hood and makes managing the whole cluster easier. For your requirements, you'll want to look at PM2.
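For example, assuming app.js is your entry point, the PM2 side of this is roughly:

    # start one PM2-managed instance per CPU core (cluster mode)
    pm2 start app.js -i max
    # restart instances one at a time after deploying new code (zero-downtime reload)
    pm2 reload app
    # re-create the same process list automatically after a server reboot
    pm2 startup
    pm2 save

That covers both restart scenarios in the question: pm2 startup plus pm2 save handle machine reboots, and pm2 reload handles code deploys.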
Related
I have a Node.js web application that runs on my Amazon AWS server using nginx and PM2. The application processes files for the user, which is done using a job system and child processes. In short, when the application starts via PM2, I create a child process for each CPU core of the server. Each child process (worker) then completes jobs from the job queue.
My question is, could I replicate this in Docker or would I need to modify it somehow? One assumption I had was that I would need to create one container for the database, one container for the application, and then multiple worker containers to do the processing, so that if one crashes I just spin up another worker.
I have been doing research online, including a Udemy course, to get my head around this stuff, but I haven't come across an example or something I can relate to my problem/question.
Any help, reading material or suggestions would be greatly appreciated.
Containers run at the same performance level as the host OS. There is no process performance hit. I created a whitepaper with Docker and HPE on this.
You wouldn't use pm2 or nodemon, which are meant to start multiple processes of your node app and restart them if they fail. That's the job of Docker now.
If in Swarm, you'd just increase the replica count of your service to be similar to the number of CPU/threads you'd want to run at the same time in the swarm.
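For example, a stack file for Swarm might look roughly like this (the service name and image are made up):

    # docker-stack.yml (Swarm); deploy with: docker stack deploy -c docker-stack.yml mystack
    version: "3.8"
    services:
      api:
        image: myorg/my-node-app:latest
        deploy:
          replicas: 4              # roughly the number of CPUs/threads you want busy
          restart_policy:
            condition: any         # Swarm replaces a replica if its process dies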
I don't mention the nodemon/pm2 thing for Swarm in my node-docker-good-defaults repo, so I'll add that as an issue to update it.
For example, in the Python world you would use uWSGI or Gunicorn to restart your Python web app if it stops running for any reason, e.g. memory leaks, unexpected runtime errors, etc. However, this is done in such a way that connections aren't dropped (so no 502s).
Looking at the options for Node it seems PM2 is a popular choice but I have two concerns:
Can it make the same guarantees regarding connection draining (no 502s, please)?
When I looked at PM2 before it seemed to cause significant performance degradation in my application where every millisecond of latency counts (100s of added ms).
So my question is, where performance is a serious consideration and we can't drop connections while restarting, what are Node's uWSGI and Gunicorn equivalents?
Here are some strategies:
Use node.js clustering with N worker processes. You can then restart any single worker process without affecting overall availability (see the sketch at the end of this answer).
Use a load balancer in front of multiple clusters. Then temporarily configure the load balancer to only send traffic to one cluster. When the deconfigured cluster has finished with all open connections, you can then restart all the processes in that cluster.
For even more flexibility, use multiple clusters on separate machines. That allows you to even take a server machine down for hardware maintenance without disrupting overall availability.
If you have resources shared among multiple clustered processes, such as databases, then you will also need redundancy for them in order to be able to restart them without interruption.
Now of course, you have to make sure that taking some part of your system out of service for reboot or maintenance still leaves you with enough service capacity so you would typically do this when overall service load is low (4am for your largest user base).
PM2 is one such tool that allows you to do portions of what is recommended here (such as clustering and seamlessly restarting part of a cluster). There are other tools.
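As a rough sketch of the first strategy (restarting workers one at a time so availability isn't affected), assuming a master process that has already forked its workers with the cluster module and workers that close their server when disconnected:

    // In the master process: replace workers one at a time.
    const cluster = require('cluster');

    function rollingRestart() {
      const workers = Object.values(cluster.workers);

      function restartNext(i) {
        if (i >= workers.length) return;       // every worker has been replaced
        const oldWorker = workers[i];

        const replacement = cluster.fork();    // bring the new worker up first
        replacement.once('listening', () => {
          oldWorker.disconnect();              // old worker stops accepting connections
          oldWorker.once('exit', () => restartNext(i + 1));
        });
      }

      restartNext(0);
    }

PM2's reload command does essentially this for you.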
Do we need the cluster module for a Node.js script that just fetches a job from a Gearman server or from a REST API like AWS SQS and performs it?
What I know is that cluster is more useful for socket sharing (e.g. listening on a port), as in a web server.
PS: I am already using monit to monitor and restart these daemon processes in case of a crash, and in the future I plan to use PM2 (in non-cluster mode, i.e. without the -i flag).
No, you do not have to use the cluster module in order to service multiple operations from some sort of work queue. You should use the cluster module when its specific features match up well with the type of work you are doing (load balancing multiple incoming connections).
In fact, if the operations to be done are mostly asynchronous things (such as sending an update to an external database), you may not even need multiple processes. But if you do need multiple processes, then you can use the child_process module to start other worker processes to carry out individual tasks, and use the main central server to monitor and coordinate them. This is part of what the cluster module does, but its use is more specialized than just something that starts other external processes, which you could code yourself.
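A rough sketch of that coordinator pattern, where worker.js is a made-up script that polls your queue (SQS, gearman, etc.) in a loop:

    // coordinator.js - fork plain worker processes (no cluster module needed)
    const { fork } = require('child_process');
    const os = require('os');

    function startWorker() {
      const worker = fork('./worker.js');      // worker.js pulls jobs from the queue
      worker.on('exit', (code) => {
        console.log(`worker exited with code ${code}, starting a replacement`);
        startWorker();
      });
    }

    // one worker per core is a common starting point for CPU-bound jobs
    for (let i = 0; i < os.cpus().length; i++) startWorker();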
There are a bunch of answers around this on Stack Overflow, but I'm not sure I've found anything really complete or up to date.
We are using Node.js with node clusters on Elastic Beanstalk. This allows us to, in theory, use all the CPUs on a box by spinning up individual Node runtimes, one for each CPU, balancing calls between the different runtimes, and restarting individual runtimes if they die for some reason.
However, clustering is currently suspect in how it handles the load balancing - it tends to continuously pick one or two CPUs, at least on Linux. Node 12 relieves this issue, but AWS does not support it yet.
I'm pretty sure I can configure nginx on Beanstalk to handle the load balancing, though I'd love to see a working example of the nginx conf file. But I'm wondering if anyone has figured out how to start up the correct number of Node runtimes based on CPUs, and also how the restart works - if a worker dies, I want to start a new one up immediately. Beanstalk handles the main cluster manager now, and the cluster manager takes care of the workers.
I'm open to something like Docker on Beanstalk, also, but we like being on Beanstalk for the ease of setup, scaling and management.
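For reference, the kind of nginx config being asked about looks roughly like this (the ports and upstream name are assumptions; one Node process would listen on each port):

    upstream node_app {
        least_conn;                  # or leave the default round-robin
        server 127.0.0.1:3001;
        server 127.0.0.1:3002;
        server 127.0.0.1:3003;
        server 127.0.0.1:3004;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://node_app;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }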
I am refactoring a couple of Node.js services. All of them used to be started with forever on virtual servers; if a process crashed, it was simply relaunched.
Now, moving to containerised and stateless application structures, I think the process should exit and the container should be restarted on a failure.
Is that correct? Are there benefits or disadvantages?
My take is: do not use an in-container process supervisor (forever, pm2) and instead use the Docker restart policy via --restart=always (or one of the other flavors of that option). This is more in line with the overall Docker philosophy, and should operate very similarly to in-container process supervision since Docker containers start running very quickly.
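For example (container and image names are made up):

    # restart the container whenever the Node process inside exits
    docker run -d --restart=always --name api my-node-app
    # or, if you don't want containers you stopped on purpose to come back:
    docker run -d --restart=unless-stopped --name api my-node-app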
The strongest advocate for running in-container process supervision I've seen is in the phusion baseimage-docker README if you want to explore the other position on this topic.
While it's a good idea to use --restart=always as a failsafe, container restarting is relatively slow (5+ seconds with the simple Hello World Node server described here), so you can minimize app downtime using something like forever.
A downside of restarting the process within the container is that crash recovery can now happen two ways, which might have implications for your monitoring, etc.
Node needs a clustering setup if you are running on a server with multiple CPUs, since a single Node process only uses one core.
With PM2 you get that without writing any extra code. http://pm2.keymetrics.io/docs/usage/cluster-mode/
Unless you are using a bunch of single-CPU server instances, I would say use PM2 in production.
PM2 will also restart a process more quickly than Docker restarts a container.
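For completeness, the PM2 cluster-mode setup being recommended here is roughly this (app name and script path are placeholders):

    // ecosystem.config.js - start with: pm2 start ecosystem.config.js
    module.exports = {
      apps: [{
        name: 'api',            // placeholder name
        script: './app.js',     // placeholder entry point
        exec_mode: 'cluster',   // uses Node's cluster module under the hood
        instances: 'max',       // one worker per CPU core
      }],
    };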