Node app memory usage increasing with time - node.js

I am running a MEAN application with the client and server in separate folders, and I am using pm2 to manage it. When I do pm2 restart all, the application works fine and uses little memory, but after some time memory usage increases and slows the application down.
I have attached two screenshots, taken 10 hours apart, to illustrate the problem.
[Screenshot: memory usage right after restarting PM2]
[Screenshot: memory usage after 10 hours of running the application]

Nothing can be said without looking at your code. It looks like you must be storing data in memory that grows over time. Check whether you can clear such variables and keep their size to a minimum.

Primarily, more efficient code that avoids redundant variables would be the best solution, but that is easier said than done. Since you are using PM2, you can use max_memory_restart to give the application a memory limit; doing so makes your app restart every time it goes over the limit, which keeps memory usage in check.
You can enable this by editing your ecosystem.config.js.
You can read more about it in their docs: https://pm2.keymetrics.io/docs/usage/application-declaration/
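For example, a minimal ecosystem.config.js could look like the sketch below; the app name, script path, and the 500M limit are assumptions to adapt to your own setup:

    // ecosystem.config.js - minimal sketch; name, script and limit are placeholders
    module.exports = {
      apps: [
        {
          name: 'mean-app',            // your pm2 process name
          script: './server/index.js', // your server entry point
          max_memory_restart: '500M',  // pm2 restarts the process above this
        },
      ],
    };

Bear in mind this only contains the symptom: the process is recycled before the leak hurts, but the leak itself still needs to be found.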

Related

node.js CPU usage spikes

I have an express.js app running in cluster mode (Node.js cluster module) on a Linux production server. I'm using PM2 to manage this app. It usually uses less than 3% CPU. However, sometimes the CPU usage spikes up to 100% for a short period of time (less than a minute). I'm not able to reproduce this issue; it only happens once every day or two.
Is there any way to find out which function or route is causing the sudden CPU spikes, using PM2? Thanks.
I think you have some slow synchronous execution on some request in your application.
Add logging for every incoming request in a middleware, store it in Elasticsearch, and find which requests have a long response time; or use New Relic (the easy way, but it costs more money).
Use blocked-at to find slow synchronous execution; if you detect any, try moving it to worker threads or use the workerpool library.
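A rough sketch of both suggestions, assuming an Express app (the 200 ms thresholds are arbitrary, and the Elasticsearch/New Relic part is left out):

    const express = require('express');
    const blocked = require('blocked-at'); // npm install blocked-at

    // report any synchronous operation blocking the event loop > 200 ms
    blocked((time, stack) => {
      console.log(`event loop blocked for ${time}ms, started here:`, stack);
    }, { threshold: 200 });

    const app = express();

    // time every incoming request and flag slow responses
    app.use((req, res, next) => {
      const start = process.hrtime();
      res.on('finish', () => {
        const [s, ns] = process.hrtime(start);
        const ms = s * 1e3 + ns / 1e6;
        if (ms > 200) {
          console.log(`slow: ${req.method} ${req.originalUrl} ${ms.toFixed(0)}ms`);
        }
      });
      next();
    });

    app.listen(3000);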
My answer is based purely on my experience with the topic
Before going to production, do local testing such as:
stress testing.
longevity testing.
For both tests, try a tool like JMeter, where you can target one or more of your endpoints and run heavy load against them over a period of time while monitoring CPU and memory usage.
If everything is fine, also try stopping the test and hitting the API manually while monitoring its behavior; this will help you see whether there is a memory leak in the APIs themselves.
Is your app running .map() or .reduce() over huge arrays?
Is your app working significantly better after a reboot?
If yes, then you should suspect that the Express app is experiencing a memory leak and the garbage collector is trying to clean up the mess (a quick way to check this follows below).
If possible, try rewriting the app using fastify. Personally, this did not make the app much faster, but it could handle 1.5x more requests.
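To confirm that suspicion, one simple sketch is to log heap usage on an interval and watch whether it keeps climbing without ever dropping back:

    // log memory usage once a minute; a heapUsed that grows steadily and
    // never falls back after garbage collection is a strong hint of a leak
    setInterval(() => {
      const m = process.memoryUsage();
      console.log(
        `rss=${(m.rss / 1048576).toFixed(1)}MB ` +
        `heapUsed=${(m.heapUsed / 1048576).toFixed(1)}MB ` +
        `heapTotal=${(m.heapTotal / 1048576).toFixed(1)}MB`
      );
    }, 60 * 1000);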

PM2 Is Working But Has Large Restart Count With No Errors?

So recently I've been using PM2, and in terms of functionality and such, it's working perfectly.
However, I've noticed that if I leave my node app running overnight and check it in the morning, the "restart" count has jumped by about 20.
For example, I've only had it on the server for the past couple of days and the restart count is 100+.
There are no error logs, the output logs look fine and have nothing dodgy. So I'm not sure what is causing this.
The app itself runs fine if you access it, but I've found it's never good practice to leave something like this unfixed.
Perhaps it could be a memory leak or something? If someone could point me in the right direction that would be really helpful.
I'm not really sure what to provide, so if needed let me know and I can provide my config/error files and such for PM2.
Thanks.
Maybe there is a memory leak in your app and it is running out of memory while running; try increasing the memory limit like this:
pm2 start index.js --name my-process --max-memory-restart 5000M
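Since the question says the restarts leave no error logs, it may also be worth logging fatal errors yourself before the process dies, so each restart at least leaves a trace; a minimal sketch:

    // write fatal errors somewhere visible before exiting, so silent
    // pm2 restarts can be traced back to a cause
    process.on('uncaughtException', (err) => {
      console.error('uncaught exception, exiting:', err);
      process.exit(1);
    });

    process.on('unhandledRejection', (reason) => {
      console.error('unhandled promise rejection:', reason);
    });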

Node app unresponsive after certain amount of time

I'm trying to figure out why my nodejs app becomes unresponsive after 11h 20min. It happens every time, no matter if I run it on amazon-linux or Red Hat.
My Stack:
nodejs (v. 6.9.4)
mongodb (3.2)
pm2 process manager
AWS EC2 instance T2 medium
Every time I run the app, it becomes unresponsive, with an error returned to the browser:
net::ERR_CONNECTION_RESET
PM2 doesn't restart the app, so I suspect it has nothing to do with Node.js. I also analysed the app and it doesn't have memory leaks. DB logs also look alright.
The only constant factor is the fact that the app crashes after it runs for 11h 20min.
I'm handling all possible errors from the Node.js app, but no errors appear in the log files, so I suspect it has to be something else.
I also checked /var/log/messages and /home/centos/messages, but there is nothing related to the crash of the app there either.
/var/log/mongodb/mongo.log doesn't show anything specific either.
What would be the best way to approach the problem?
Any clues on how I can debug it, or what the reason could be?
Thanks
Copied from the comment since it apparently led to the solution:
You're leaking something other than memory is my guess, maybe file descriptors. Try using netstat or lsof to see if there are a lot more open connections or files than you expect.
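Along those lines, a quick sketch for watching file-descriptor growth from inside the app itself (Linux only, since it reads /proc):

    const fs = require('fs');

    // count this process's open file descriptors once a minute; a number
    // that grows without bound points to a descriptor or socket leak
    setInterval(() => {
      fs.readdir('/proc/self/fd', (err, fds) => {
        if (err) return console.error('cannot read /proc/self/fd:', err.message);
        console.log(`open file descriptors: ${fds.length}`);
      });
    }, 60 * 1000);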

Issue in pm2 - It stops responding

I am facing an issue with my application servers. Assume there are two nodes behind the load balancer.
Suddenly one of the nodes becomes unhealthy.
When I logged into that instance, there were no logs coming from pm2.
Then I checked its CPU; it was very high.
Please guide me on how I can fix this issue, or on any way to debug it.
Check out flame graphs to see where your Node app is CPU bound.
You can also use the new debugging system in Node 6.3 (--inspect) to debug with the full power of Chrome DevTools.
PM2 has some limited protection for runaway issues like this via the max-memory-restart option. Typically, high CPU will also correlate with high memory usage and this option can be used to restart your app when it begins consuming large amounts of memory (which in your case may or may not be the correct moment but it should help).
--max-memory-restart <memory> specify max memory amount used to autorestart (in octet or use syntax like 100M)
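If upgrading is an option, here is a sketch of capturing a CPU profile from inside the process with the built-in inspector module (Node 8+, so newer than the 6.3 flag above; the SIGUSR2 trigger and the 30-second window are arbitrary choices). Send the process SIGUSR2 while the CPU is pegged and load the resulting file into Chrome DevTools:

    const inspector = require('inspector');
    const fs = require('fs');

    const session = new inspector.Session();
    session.connect();

    // on SIGUSR2, record 30 seconds of CPU profile and dump it to /tmp
    process.on('SIGUSR2', () => {
      session.post('Profiler.enable', () => {
        session.post('Profiler.start', () => {
          setTimeout(() => {
            session.post('Profiler.stop', (err, res) => {
              if (!err) {
                fs.writeFileSync(`/tmp/cpu-${Date.now()}.cpuprofile`,
                                 JSON.stringify(res.profile));
              }
            });
          }, 30 * 1000);
        });
      });
    });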

Node.js app has periodic slowness and/or timeouts (does not accept incoming requests)

This problem is killing the stability of my production servers.
To recap, the basic idea is that my node server(s) sometimes intermittently slow down, sometimes resulting in Gateway Timeouts. As best as I can tell from my logs, something is blocking the node thread (meaning that the incoming request is not accepted), but I cannot for the life of me figure out what.
The problem ranges in severity. Sometimes what should be <100ms requests take ~10 seconds to complete; sometimes they never even get accepted by the node server at all. In short, it is as though some random task is working and blocking the node thread for a period of time, thus slowing down (or even blocking) incoming requests; the one thing I can say for sure is that the need-to-fix-symptom is a "Gateway Timeout".
The issue comes and goes without warning. I have not been able to correlate it against CPU usage, RAM usage, uptime, or any other relevant statistic. I've seen the servers handle a large load fine, and then have this error with a small load, so it does not even appear to be load-related. It is not unusual to see the error around 1am PST, which is the smallest load time of the day! Restarting the node app does seem to maybe make the problem go away for a while, but that really doesn't tell me much. I do wonder if it might be a bug in node.js... not very comforting, considering it is killing my production servers.
The first thing I did was to make sure I had upgraded node.js to the latest (0.8.12), as well as all my modules (here they are). Of course, I also have plenty of error catchers in place. I'm not doing anything funky like printing out lots to the console or writing to lots of files.
At first, I thought it was outbound HTTP requests blocking the incoming socket, because the express middleware was not even picking up the inbound request, but I gave up the theory because it looks like the node thread itself became busy.
Next, I went through all my code with JSHint and fixed literally every single warning, including a few accidental globals (forgetting to write "var"), but this didn't help.
After that, I assumed that perhaps I was running out of memory. But, my heap snapshots via nodetime are looking pretty good now (described below).
Still thinking that memory might be an issue, I took a look at garbage collection. I enabled the --nouse-idle-notification flag and did some more code optimization to NULL objects when they were not needed.
Still convinced that memory was the issue, I added the --expose-gc flag and executed the gc(); command every minute. This did not change anything, except to occasionally make requests a bit slower perhaps.
In a desperate attempt, I set up the "cluster" module to use 2 workers and automatically restart them every 30 min. Still, no luck.
I increased the ulimit to over 10,000 and kept an eye on the open files. There seem to be < 300 open files (or sockets) per node.js app, and increasing the ulimit thus had no impact.
I've been logging my server with nodetime and here's the gist of it:
CentOS 5.2 running on the Amazon Cloud (m1.large instance)
Greater than 5000 MB free memory at all times
Less than 150 MB heap size at all times
CPU usage is less than 60% at all times
I've also checked my MongoDB servers, which have <5% CPU usage and no requests are taking > 100ms to complete, so I highly doubt there's a bottleneck.
I've wrapped (almost) all my code using Q-promises (see code sample), and of course have avoided Sync() calls like the plague. I've tried to replicate the issue on my testing server (OSX), but have had little luck. Of course, this may be just because the production servers are being used by so many people in so many unpredictable ways that I simply cannot replicate via stress tests...
Many months after I first asked this question, I found the answer.
In a nutshell, the problem was that I was not piping a big asset when transferring it from one server to another. In other words, I was downloading an image from one server before uploading it to an S3 bucket. Instead of streaming the download into the upload, I downloaded the file into memory and then uploaded it.
I'm not sure why this did not show up as a memory spike, or elsewhere in my statistics.
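For illustration, the fix looks roughly like this sketch (the URL and destination are placeholders; with S3 you would hand the stream to your S3 client as the upload body rather than writing to a file):

    const https = require('https');
    const fs = require('fs');

    // pipe the download straight into the destination: only small chunks
    // sit in memory at any time and back-pressure is handled for you,
    // instead of buffering the whole file into memory before uploading
    https.get('https://example.com/big-image.jpg', (res) => {
      res.pipe(fs.createWriteStream('/tmp/big-image.jpg'));
    });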
My guess is Mongoose. If you are storing large payloads in Mongo, Mongoose can be pretty slow due to how it builds the Mongoose objects. See https://github.com/LearnBoost/mongoose/issues/950 for more details on the problem. If this is the problem you wouldn't see it in Mongo itself since the query returns quickly, but object instantiation could take 75x the query time.
Try setting up timers (process.hrtime()) around where the Mongoose objects are being created to see if that might be the problem. If it is, I would switch to using the node Mongo driver directly instead of going through Mongoose.
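A sketch of that measurement, assuming a model named Item and the callback-style API of that era; comparing a normal query against .lean(), which skips Mongoose document instantiation, makes the overhead visible:

    // time a query that builds full Mongoose documents...
    const t1 = process.hrtime();
    Item.find({}).exec((err, docs) => {
      const [s, ns] = process.hrtime(t1);
      console.log(`full Mongoose documents: ${(s * 1e3 + ns / 1e6).toFixed(1)}ms`);
    });

    // ...against the same query returning plain JavaScript objects
    const t2 = process.hrtime();
    Item.find({}).lean().exec((err, docs) => {
      const [s, ns] = process.hrtime(t2);
      console.log(`plain objects via .lean(): ${(s * 1e3 + ns / 1e6).toFixed(1)}ms`);
    });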
You are heavily leaking memory; try setting every object to null as soon as you don't need it anymore!
Pay special attention to having multiple references to the same object, and check whether you have circular references; those are a pain to debug, but finding them will help you very much.
Try invoking the garbage collector manually every minute or so (I don't know if you can do this in Node.js, because I'm more of a C++ and PHP coder). From my years of experience working with C++, I can tell you that the most likely cause of an application slowing down over time is memory leaks: find them and plug them, and you'll be OK!
Also, assuming you're not caching and/or processing images, audio, or video in memory or anything like that, a 150 MB heap is a lot! That could be hundreds of thousands or even millions of small objects.
You don't have to be running out of memory for your application to slow down... just searching for free memory with that many objects already allocated is a huge job for the memory allocator. It takes a lot of time to allocate each new object, and as you leak more and more memory, that time only increases.
Is "--nouse-idle-connection" a mistake? do you really mean "--nouse_idle_notification".
I think it's maybe some issues about gc with too many tiny objects.
node is single process, so watch the most busy cpu core is much important than the load.
when your program is slow, you can execute "gdb node pid" and "bt" to see what node is busy doing.
What I'd do is set up a parallel node instance on the same server with some kind of echo service and test that one. If it runs fine, you narrow the problem down to your program code (and not a scheduler/OS-level problem). Then, step by step, include the modules and test again. Certainly this is a lot of work and takes a long time, and I don't know if it is doable on your system.
If you need to get this working now, you can go the NASA redundancy route:
Bring up a second copy of your production servers, and put a proxy in front of them which routes each request to both stacks and returns the first response. I don't recommend this as a perfect long-term solution but it should help significantly reduce issues in production now, and help you gather log data that you could replay to recreate the issues on non-production servers.
Obviously, this is straight-forward for read requests, but more complex for commands which write to the db.
We had a similar problem with our Node.js server. It didn't scale well for weeks, and we had tried almost everything, as you have. Our problem was the implicit backlog value, which is set very low for high-concurrency environments.
http://nodejs.org/api/http.html#http_server_listen_port_hostname_backlog_callback
Setting the backlog to a significantly higher value (e.g. 10000), as well as tuning the networking in our kernel (/etc/sysctl.conf on Linux) as described in the manual section, helped a lot. Since then we haven't had any timeouts in our Node.js server.
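A sketch of what that looks like in code; listen() takes the backlog as the argument after the hostname (note that on Linux the effective value is also capped by the net.core.somaxconn sysctl):

    const http = require('http');

    const server = http.createServer((req, res) => res.end('ok'));

    // port, hostname, backlog, callback - backlog raised well above the default
    server.listen(8080, '0.0.0.0', 10000, () => {
      console.log('listening with a backlog of 10000');
    });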
