We currently use pm2 to keep our nodejs process alive; we don't use the cluster mode (or the related load-balancing feature).
Our php team uses supervisord to keep their php processes alive, as Laravel suggests. Now we are investigating the possibility of using supervisord to manage our nodejs process too. We mainly need two things from a process manager: keep a process alive, and log the event when it crashes and restarts.
In terms of keeping a process alive, I do find pm2 and supervisord share some similarities. But pm2 has more restart policies, e.g. pm2 has a cron restart option which supervisord doesn't have (correct me if I am wrong). Without the cron restart feature we would have to resort to a cronjob, so it is a nice-to-have feature but not a must.
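For reference, this is roughly what the cron restart looks like in a pm2 ecosystem file (the app name, script path and schedule here are just illustrative):

```js
// ecosystem.config.js — minimal sketch; name, path and schedule are placeholders
module.exports = {
  apps: [{
    name: 'my-app',              // hypothetical app name
    script: './server.js',
    cron_restart: '0 3 * * *',   // restart every day at 03:00
    autorestart: true            // also restart on crash, as usual
  }]
};
```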
supervisord has process groups and a priority order, which, in my experience with node, I don't find many use cases for.
So to us it seems doable, but we don't have enough experience with supervisord and we are afraid we may be missing something here, especially something big like "you should not do that in the first place!" Has anyone done this before?
BTW, my question is kind of the opposite of Running a python script in virtual environment with node.js pm2
When you use pm2 or supervisord, the crash message will be swallowed by pm2 or supervisord. So the key point is that when the worker crashes, you should have a way to know about it, for example by hooking into pm2 so that when the worker crashes, a message is sent to your monitoring system.
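Rather than hacking pm2 itself, one option is its programmatic event bus; a rough sketch (the monitoring call is a placeholder you would replace with your own):

```js
// Sketch: listen on pm2's event bus and forward exit/restart events
// to a monitoring system. The notify call below is hypothetical.
const pm2 = require('pm2');

pm2.connect((err) => {
  if (err) throw err;
  pm2.launchBus((err, bus) => {
    if (err) throw err;
    bus.on('process:event', (data) => {
      if (data.event === 'exit' || data.event === 'restart') {
        console.log(`${data.process.name}: ${data.event}`);
        // notifyMonitor(data); // hypothetical: push to your monitoring system
      }
    });
  });
});
```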
Context
I've added configuration validation to some of the modules that compose my Node.js application. When they are starting, each one checks whether it is properly configured and has access to the resources it needs (e.g. can write to a directory). If it detects that something is wrong, it sends a SIGINT to itself (process.pid) so the application is gracefully shut down (I close the http server, close possible connections to Redis and so on). I want the operator to realize there is a configuration and/or environment problem and fix it before starting the application.
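A minimal sketch of the pattern described above (the validated path is hypothetical):

```js
const fs = require('fs');

// On SIGINT, run the graceful-shutdown path (close servers, Redis, etc.).
process.on('SIGINT', () => {
  console.error('shutting down gracefully');
  process.exit(1); // non-zero so the supervisor can tell this was a failure
});

// Startup validation: check we can write to a required directory.
try {
  fs.accessSync('/var/log/myapp', fs.constants.W_OK); // hypothetical path
} catch (err) {
  console.error('misconfigured:', err.message);
  process.kill(process.pid, 'SIGINT'); // signal ourselves, as described above
}
```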
I use pm2 to start/stop/reload the application and I like the fact that pm2 will automatically restart it in case it crashes later on, but I don't want it to restart my application in the above scenario, because the root cause won't be eliminated by simply restarting the app, so pm2 will keep restarting it up to max_restarts (which defaults to 10 in pm2).
Question
How can I prevent pm2 from repeatedly restarting my application when it aborts during startup?
I know pm2 has the --wait-ready option, but given we are talking about multiple modules with asynchronous startup logic, I find it very hard to determine where/when to call process.send('ready').
Possible solution
I'm considering making all my modules emit an internal "ready" event and wiring the whole thing up by chaining the "ready" events so I can finally send the "ready" to pm2, but I would like to ask first whether that would be a bit of over-engineering.
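For what it's worth, if each module exposes a promise instead of an event, the wiring collapses to a single Promise.all, which might be less over-engineered; a sketch (the initializers are hypothetical stand-ins):

```js
// Hypothetical async module initializers; replace with your real ones.
const initHttp  = () => Promise.resolve();
const initRedis = () => Promise.resolve();

Promise.all([initHttp(), initRedis()])
  .then(() => {
    // Under pm2 --wait-ready, this tells pm2 startup succeeded.
    if (process.send) process.send('ready');
  })
  .catch((err) => {
    console.error('startup failed:', err.message);
    process.exit(1); // fail fast; the root cause needs an operator
  });
```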
Thanks,
Roger
Is it more reliable to use daemontools or supervisord, or to use crontab to run a script of mine that keeps checking whether the process still exists and, if not, starts it again?
What is the best way to guarantee that a process is always running, and running in a healthy condition (i.e., not technically alive but stalled on some error, in which case it should be killed and started again)?
Btw, this is a Java process that I start like java -jar app.jar.
Thanks!
We use monit for such tasks. It can start a process if it is in a down state.
Personally, I use supervisord for my servers. It's really easy to configure and understand, and it can be configured to automatically re-run a failing process several times.
You should probably start by reading the official docs, specifically this section, to set up start retries.
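For illustration, a minimal program section with the retry options set (the command and paths are placeholders; the linked docs have the full option list):

```ini
; minimal sketch of a supervisord program section; paths are placeholders
[program:myapp]
command=/usr/bin/node /srv/myapp/server.js
autostart=true
autorestart=true
startretries=5        ; give up after 5 failed starts
startsecs=10          ; process must stay up 10s to count as started
stdout_logfile=/var/log/myapp.out.log
stderr_logfile=/var/log/myapp.err.log
```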
NodeJS has its own modules for managing clustering and process restart:
the cluster module, which allows node to run multiple processes based on the number of cores in the machine. It can also spawn new processes when old ones shut down (a minimal sketch follows this list).
the domain module, which allows node to stop taking requests and shut down the process after an error has occurred.
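Here is a minimal sketch of the cluster module usage from the first bullet (note the respawn-on-exit has to be wired up by hand):

```js
// Minimal cluster sketch: fork one worker per CPU and re-fork on exit.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
  cluster.on('exit', (worker, code) => {
    console.log(`worker ${worker.process.pid} exited (code ${code}), re-forking`);
    cluster.fork(); // respawn is not automatic; we do it here
  });
} else {
  // Each worker runs its own server; connections are balanced across them.
  http.createServer((req, res) => res.end('ok')).listen(3000);
}
```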
Then there's PM2, and I've seen guides like this one saying that PM2 allows for logging, some stats monitoring, process restart, and clustering for nodejs.
Other than the stats monitoring and logging, can someone explain what the difference between the two is? Are they supposed to be used together or do I pick one or the other?
In a production environment, how does each fare at shutting down and restarting the nodejs app when:
the system needs to restart (applying system patches, etc.)
all nodejs processes need to be restarted to apply new code changes on the server
PM2 uses cluster under the hood, and makes the whole cluster management easier. For your requirements, you want to look at PM2.
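For example, cluster mode can be enabled purely through configuration, without touching application code; a sketch (the name and script path are illustrative):

```js
// ecosystem.config.js sketch: pm2 cluster mode, one worker per CPU.
module.exports = {
  apps: [{
    name: 'api',             // hypothetical app name
    script: './server.js',
    exec_mode: 'cluster',    // use Node's cluster under the hood
    instances: 'max'         // one worker per available CPU
  }]
};
```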
I am refactoring a couple of node.js services. All of them used to be started with forever on virtual servers; if a process crashed, it was simply relaunched.
Now, moving to a containerised and stateless application structure, I think the process should exit on failure and the container should be restarted.
Is that correct? Are there benefits or disadvantages?
My take is: do not use an in-container process supervisor (forever, pm2) and instead use the docker restart policy via --restart=always (or one of the other flavors of that option). This is more in line with the overall docker philosophy, and should operate very similarly to in-container process supervision, since docker containers start very quickly.
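Concretely, that looks something like this (the image and command are placeholders):

```
docker run -d --restart=always --name myapp node:18 node /srv/app/server.js
```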
The strongest advocate for running in-container process supervision I've seen is in the phusion baseimage-docker README if you want to explore the other position on this topic.
While it's a good idea to use --restart=always as a failsafe, container restarting is relatively slow (5+ seconds with the simple Hello World Node server described here), so you can minimize app downtime using something like forever.
A downside of restarting the process within the container is that crash recovery can now happen two ways, which might have implications for your monitoring, etc.
Node needs a clustering setup if you are running on a server with multiple CPUs.
With PM2 you get that without writing any extra code. http://pm2.keymetrics.io/docs/usage/cluster-mode/
Unless you are using a bunch of servers with single-CPU instances, I would say use PM2 in production.
pm2 will also restart a process more quickly than docker will restart a container.
We have a custom setup with several daemons (web apps + background tasks) running. I am looking for a service that helps us monitor those daemons and restart them if their resource consumption exceeds a given threshold.
I would appreciate any insight on when one is better than the other. As I understand it, monit starts a new (independent) process while supervisord runs the program as its own subprocess. What are the pros and cons of each approach?
I will also be using upstart to monitor monit or supervisord itself. The webapp deployment will be done using capistrano.
Thanks
I haven't used monit but there are some significant flaws with supervisord.
Programs should run in the foreground
This means you can't just execute /etc/init.d/apache2 start. Most of the time you can just write a one-liner, e.g. "source /etc/apache2/envvars && exec /usr/sbin/apache2 -DFOREGROUND", but sometimes you need your own wrapper script. The problem with wrapper scripts is that you end up with two processes, a parent and a child. See the next flaw...
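In supervisord config terms, that one-liner ends up looking something like this (untested sketch):

```ini
; sketch: run apache in the foreground under supervisord
[program:apache2]
command=/bin/bash -c "source /etc/apache2/envvars && exec /usr/sbin/apache2 -DFOREGROUND"
autorestart=true
```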
supervisord does not manage child processes
If your program starts child processes, supervisord won't detect this. If the parent process dies (or is restarted using supervisorctl), the child processes keep running but will be "adopted" by the init process and stay running. This might prevent future invocations of your program from running, or consume additional resources. The recent config options stopasgroup and killasgroup are supposed to fix this, but didn't work for me.
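For reference, those options are set per program section; treat this as illustrative only, given that they didn't work for me:

```ini
; sketch: ask supervisord to signal the whole process group
[program:myapp]
command=/srv/myapp/run.sh
stopasgroup=true   ; send the stop signal to the whole process group
killasgroup=true   ; SIGKILL the whole group if stopping times out
```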
supervisord has no dependency management - see #122
I recently set up squid with qlproxy. qlproxyd needs to start first, otherwise squid can fail. Even though both programs were managed by supervisord, there was no way to ensure this. I needed to write a start script for squid that made it wait for the qlproxyd process. Adding the start script resulted in the orphaned-process problem described in flaw 2.
supervisord doesn't allow you to control the delay between startretries
Sometimes when a process fails to start (or crashes), it's because it can't get access to another resource, possibly due to a network wobble. Supervisor can be set to restart the process a number of times. Between restarts the process enters a "BACKOFF" state, but there's no documentation of, or control over, the duration of the backoff.
In its defence, supervisor does meet our needs 80% of the time. The configuration is sensible and the documentation pretty good.
If you additionally want to monitor resources, you should settle on monit. In addition to just checking whether a process is running (availability), monit can also perform checks of resource usage (performance, capacity), load levels and even basic security checks (md5sum of a binary file, a config file, etc). It has a rule-based config which is quite easy to comprehend. Also, there are a lot of ready-to-use configs: http://mmonit.com/wiki/Monit/ConfigurationExamples
Monit requires processes to create PID files, which can be a flaw: if a process does not create a PID file, you have to write some wrapper around it. See http://mmonit.com/wiki/Monit/FAQ#pidfile
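A typical monit process check keyed on a PID file looks roughly like this (paths and thresholds are placeholders):

```
# sketch of a monit check built around a PID file; paths are placeholders
check process myapp with pidfile /var/run/myapp.pid
  start program = "/etc/init.d/myapp start"
  stop program  = "/etc/init.d/myapp stop"
  if memory usage > 80% for 5 cycles then restart
```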
Supervisord, on the other hand, is more tightly bound to the process: it spawns it by itself. It cannot make resource-based checks the way monit can. It does have a nice CLI, supervisorctl, and a web GUI though.