Loving node, but I have a question about making my application more fault tolerant. I like to assume the worst, so let's assume my application has a fault that leads to an application exit every now and then. PM2 or your preferred monitor will reboot it, but some requests may fail while it restarts / before load balancers detect it's down.
Are there any good containers or tools I can put in front of node to buffer these requests (for a reasonable amount of time) until the app bootstraps and comes back online? Perhaps Nginx can do this?
Related
I have a Parse Server, which is a Node.js + Express wrapper, for a mobile app (about 100 simultaneous users every day), hosted on DigitalOcean. The app server communicates with MongoDB, which is hosted on another DigitalOcean droplet. I'm using pm2 as a process manager, along with its web-based monitoring tool. In the same process we also run LiveQuery, a WebSocket server built by the Parse community.
The thing is, I've been having some performance issues with the server. Everything works smoothly until the active handles rise uncontrollably (see the image below)! It's as if, after a certain point, the server says "I'm done! Now I rest!"
Usually the active handles stay between 30 and 70. The moment I restart the process with pm2 restart, everything goes back to normal!
I've been having this issue for quite some time now and I haven’t been able to figure out what’s causing it! Any help will be greatly appreciated!
EDIT: I did a stress test where I created 200 LiveQuery sockets for 1 user, instead of the 2 a user normally has, and there was a spike of 300 active handles for about 5 seconds! The moment the sockets were all created, everything went back to normal!
I usually use a restart based on memory usage:
pm2 start filename.js --max-memory-restart 160 --exp-backoff-restart-delay=100
pm2 also has a built-in cron restart and an autostart-script setup in case the server ever reboots; see https://pm2.keymetrics.io/docs/usage/restart-strategies/
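If you'd rather keep those settings in a file, the same options can go in a pm2 ecosystem file. A minimal sketch, assuming the entry point is filename.js (the values are illustrative and mirror the command above):

```js
// ecosystem.config.js: pm2 process file (values are illustrative)
module.exports = {
  apps: [
    {
      name: 'app',
      script: './filename.js',
      max_memory_restart: '160M',      // restart once memory passes this limit
      exp_backoff_restart_delay: 100,  // back off between rapid restarts
      cron_restart: '0 4 * * *',       // optional scheduled restart (here 4am daily)
    },
  ],
};
```

Start it with pm2 start ecosystem.config.js; pm2 startup and pm2 save then bring everything back after a server reboot.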
It would be good if pm2 provided restart options based on active connections or heap memory.
Context
I've added configuration validation to some of the modules that compose my Node.js application. When they are starting, each one checks that it is properly configured and has access to the resources it needs (e.g. it can write to a directory). If it detects that something is wrong, it sends a SIGINT to itself (process.pid) so the application is gracefully shut down (I close the http server, close any open connections to Redis and so on). I want the operator to realize there is a configuration and/or environment problem and fix it before starting the application.
I use pm2 to start/stop/reload the application, and I like the fact that pm2 will automatically restart it in case it crashes later on, but I don't want it to restart my application in the above scenario, because the root cause won't be eliminated by simply restarting the app; pm2 will just keep restarting it up to max_restarts (defaults to 10 in pm2).
Question
How can I prevent pm2 from repeatedly restarting my application when it aborts during startup?
I know pm2 has the --wait-ready option, but given that we are talking about multiple modules with asynchronous startup logic, I find it very hard to determine where/when to call process.send('ready').
Possible solution
I'm considering making all my modules emit an internal "ready" event and wiring the whole thing together by chaining those "ready" events until I can finally send the "ready" to pm2, but I would like to ask first whether that would be a bit of over-engineering.
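To make that concrete, here is roughly the wiring I have in mind: a minimal sketch, assuming each module exposes an async init() that resolves once its configuration/environment checks pass (the module names are placeholders):

```js
// index.js: only tell pm2 we are "ready" once every module has validated itself
const httpServer = require('./http-server');   // placeholder module
const redisClient = require('./redis-client'); // placeholder module

async function start() {
  // Each init() rejects if its configuration or environment check fails.
  await Promise.all([httpServer.init(), redisClient.init()]);

  // All modules are up: signal pm2 (started with --wait-ready).
  if (process.send) process.send('ready');
}

start().catch((err) => {
  console.error('Startup aborted:', err.message);
  process.exit(1); // exit non-zero so the startup failure is visible to the operator
});
```

With promises instead of chained events the wiring stays fairly small, but I'm not sure whether even this counts as over-engineering.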
Thanks,
Roger
For example, in the Python world you would use uWSGI or Gunicorn to restart your Python web app if it stops running for any reason, e.g. memory leaks, unexpected runtime errors, etc. However, this is done in such a way that connections aren't dropped (so no 502s).
Looking at the options for Node it seems PM2 is a popular choice but I have two concerns:
Can it make the same guarantees regarding connection draining (no 502s, please)?
When I looked at PM2 before, it seemed to cause significant performance degradation in my application, where every millisecond of latency counts (hundreds of added milliseconds).
So my question is: where performance is a serious consideration and we can't drop connections while restarting, what are Node's equivalents of uWSGI and Gunicorn?
Here are some strategies:
Use node.js clustering with N worker processes. You can then restart any single worker process without affecting overall availability (a sketch of this follows at the end of this answer).
Use a load balancer in front of multiple clusters. Then temporarily configure the load balancer to only send traffic to one cluster. When the deconfigured cluster has finished with all open connections, you can then restart all the processes in that cluster.
For even more flexibility, use multiple clusters on separate machines. That allows you to even take a server machine down for hardware maintenance without disrupting overall availability.
If you have resources shared among multiple clustered processes, such as databases, then you will also need redundancy for them in order to be able to restart them without interruption.
Now of course, you have to make sure that taking some part of your system out of service for a reboot or maintenance still leaves you with enough service capacity, so you would typically do this when overall service load is low (e.g. 4am for your largest user base).
PM2 is one such tool that allows you to do portions of what is recommended here (such as clustering and seamlessly restarting part of a cluster). There are other tools.
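As a rough illustration of the clustering strategy above, here is a minimal sketch using Node's built-in cluster module: the master forks one worker per CPU, replaces crashed workers, and recycles workers one at a time on a signal so open connections can finish before each one is replaced (the signal and port are illustrative choices):

```js
// cluster-restart.js: one worker per CPU, with a rolling-restart sketch
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  // Replace crashed workers so overall availability is unaffected.
  cluster.on('exit', (worker, code) => {
    if (code !== 0 && !worker.exitedAfterDisconnect) cluster.fork();
  });

  // Rolling restart: recycle workers one at a time when SIGUSR2 arrives.
  process.on('SIGUSR2', () => {
    const workers = Object.values(cluster.workers);
    const restartNext = (i) => {
      if (i >= workers.length) return;
      const oldWorker = workers[i];
      const freshWorker = cluster.fork();
      freshWorker.once('listening', () => {
        oldWorker.disconnect(); // stop accepting, let open connections finish
        oldWorker.once('exit', () => restartNext(i + 1));
      });
    };
    restartNext(0);
  });
} else {
  // Worker: the actual HTTP service.
  http.createServer((req, res) => res.end('ok\n')).listen(3000);
}
```

pm2's cluster mode and pm2 reload automate essentially the same one-worker-at-a-time sequence.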
This is kind of a multi-tiered question in which my end goal is to establish the best way to set up my server, which will be hosting a website as well as a service (using Socket.io) for an iOS (and eventually an Android) app. Both the app service and the website are going to be written in node.js, as I need high concurrency and scaling for the app server, and I figured that whilst I'm at it I may as well do the website in node too, because (from my understanding) it wouldn't be that much different in terms of performance from something like Apache.
Also, the website has a lower priority than the app service; the app service should receive significantly higher traffic than the website (though in the long run this may change). Money isn't my greatest priority here, but it is a limiting factor. I feel that having a service with 99.9% uptime (as 100% uptime appears to be virtually impossible in the long run) is more important than saving money at the cost of more downtime.
Firstly, I understand that having one node process per CPU core is the best way to fully utilise a multi-core CPU. I now understand, after researching, that running more than one per core is inefficient because the CPU has to context-switch between the multiple processes. How come, then, whenever I see code posted on how to use the built-in cluster module in node.js, the master process creates a number of workers equal to the number of cores? That would mean 9 processes on an 8-core machine (1 master process and 8 worker processes). Is this because the master process is usually there just to restart worker processes if they crash or end, and therefore does so little that it doesn't matter that it shares a CPU core with another node process?
If this is the case, then I am planning to have the workers provide the app service and have the master process manage the workers, but also host a webpage providing statistical information on the server's state and all other relevant information (like the number of clients connected, worker restart count, error logs etc). Is this a bad idea? Would it be better to have this webpage running on a separate worker and just leave the master process to manage the workers?
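For what it's worth, a small sketch of what that split might look like: the master forks the app workers, counts restarts, and serves a private status page itself (the ports and status fields are purely illustrative):

```js
// status-master.js: master supervises workers and serves a private status page
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  let workerRestarts = 0;
  const startedAt = Date.now();

  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  cluster.on('exit', () => {
    workerRestarts += 1; // a worker died: note it and replace it
    cluster.fork();
  });

  // Private status page, bound to localhost so only the operator can reach it.
  http.createServer((req, res) => {
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify({
      uptimeSeconds: Math.round((Date.now() - startedAt) / 1000),
      workers: Object.keys(cluster.workers).length,
      workerRestarts,
    }));
  }).listen(8081, '127.0.0.1');
} else {
  // Worker: the actual app service (Socket.io / Express would go here).
  http.createServer((req, res) => res.end('app service\n')).listen(3000);
}
```

The master spends nearly all of its time idle between events, which is also why the usual examples are comfortable giving it the "extra" ninth process on an 8-core machine.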
So overall I wanted to have the following elements: a service to handle the requests from the app (my main source of traffic); a website (fairly simple, a couple of pages and a registration form); an SQL database to store user information; a webpage (probably locally hosted on the server machine) which only I can access and which shows information about the server (users connected, worker restarts, server logs and other useful information); and, apparently, nginx would be a good idea where I'm handling multiple node processes accepting connections from the app.
After doing research I've also found that it would probably be best to host on a VPS initially. I was thinking that at first, when the amount of traffic the app service receives will most likely be fairly low, I could run all of those elements on one VPS. Or would it be best to have them running on separate VPSes, except for the website and the server status webpage, which I could run on the same one? I guess this way if there is a hardware failure and something goes down, not everything does, and I could run 2 instances of the app service on 2 different VPSes so that if one goes down the other one is still functioning. Would this just be overkill? I doubt I would need multiple app service instances to support the traffic load for a while, but it would help reduce the apparent downtime for users.
Maybe this all depends on what I value more and have the time to do: a more complex server setup that costs more and is maybe a little unnecessary but guarantees a consistent and reliable service, or a cheaper and simpler setup that may succumb to downtime due to coding errors and server hardware issues?
Also, it's worth noting that I've never had any real experience with production-level servers, so in some ways I've jumped in at the deep end a little with this. I feel like I've come a long way in the past half a year and that I'm getting a fairly good grasp of what I need to do; I could just do with some advice from someone with experience who has an idea of what roadblocks I may come across along the way and whether I'm causing myself unnecessary problems with this kind of setup.
Any advice is greatly appreciated, thanks for taking the time to read my question.
I've heard that node.js is very suitable for applications where a persistent connection from the browser to the server is needed. The "long-polling" technique is used, which allows updates to be sent to the user in real time without needing a lot of server resources. A more traditional server model would need a thread for every single user.
My question is: what is done instead, and how are the requests served differently?
Why doesn't it take so many resources?
Node.js is event-driven. The node script is started and then loops continuously, waiting for events to be fired, until it is stopped. Once it is running, the overhead associated with loading has already been paid.
Compare this to a more traditional language such as C#.NET or PHP, where a request causes the server to load and run the script and its dependencies. The script then does its task (often serving a web page) and then shuts down. When another page is requested, the whole process starts again.
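As a rough illustration, a single Node process can park many open responses and complete them all later from the same event loop, with no thread per connection. A minimal long-polling-style sketch (the routes and messages are made up):

```js
// long-poll.js: one process, one event loop, many parked connections
const http = require('http');

const waiting = []; // responses held open until there is something to send

http.createServer((req, res) => {
  if (req.url === '/poll') {
    waiting.push(res);               // hold the response instead of answering now
  } else if (req.url === '/publish') {
    while (waiting.length) {
      waiting.pop().end('update\n'); // complete every parked response at once
    }
    res.end('delivered\n');
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(3000);

// Each client sitting in `waiting` costs a socket handle and a small object,
// not an operating-system thread.
```

Open a few curl http://localhost:3000/poll requests, then hit /publish: all of them complete from the same single process.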