System CPU usage spikes when running chat application made in Express JS - node.js

I have developed a chat application using Node.js, Express.js, MongoDB and Socket.IO. Messages sent and received are stored in MongoDB. But when I run the chat application, the server's CPU usage climbs continuously. See the attached screenshot:
It seems that something is stacking up. When I stop the chat application, usage drops back to a minimum. What are possible ways to fix this?

There are a few potential methods you could use to debug this issue:
Use the Node.js built-in debugger. You can set breakpoints in your code and then run your chat application under the debugger to see where the CPU spikes are happening.
Use a profiler to take a look at what your chat application is doing when it's running. This can help you to identify which parts of the code are using up the most CPU time.
Use a performance monitoring tool to track the CPU usage of your chat application over time. This can help you to identify whether the issue is getting worse or if there are any patterns to the spikes in usage.
Try to reproduce the issue in a test environment and then use a debugging tool like strace or ltrace to see what system calls are being made when the CPU usage spikes. This can help you to identify what the chat application is doing that is causing the issue.
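For the monitoring approach, Node's built-in `process.cpuUsage()` can be sampled before and after a suspect section without any external tooling. A minimal sketch (the loop is a stand-in for whatever work you suspect, e.g. a message-handling path):

```javascript
// Sample CPU time around a suspect piece of work. A large `user` delta
// for a small amount of wall-clock time points at a CPU-bound section.
const before = process.cpuUsage();

// Stand-in for the suspect work (e.g. processing incoming chat messages).
let sum = 0;
for (let i = 0; i < 1e7; i++) sum += i;

const delta = process.cpuUsage(before); // { user, system } in microseconds
console.log(`user: ${delta.user} µs, system: ${delta.system} µs`);
```

Sprinkling a few of these samples around socket handlers and MongoDB writes can quickly narrow down which path is burning the cycles.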
It's hard to answer your question without more information, but some sanity checks:
Make sure that you are not running too many processes on your server. If you are running multiple node.js applications on the same server, that can lead to high CPU usage. Try running each application on its own server, or limiting the number of processes that are running on each server.
Try using a different server environment. If you are using a shared hosting environment, the CPU spikes may be due to other users on the same server. Try using a dedicated server, or a virtual private server, which can help to reduce the CPU usage.

Related

node.js CPU usage spikes

I have an express.js app running in cluster mode (nodejs cluster module) on a linux production server. I'm using PM2 to manage this app. It usually uses less than 3% CPU. However, sometimes the CPU usage spikes up to 100% for a short period of time (less than a minute). I'm not able to reproduce this issue. This only happens once a day or two.
Is there any way to find out which function or route is causing the sudden CPU spikes, using PM2? Thanks.
I think you have some slow synchronous execution on some requests in your application.
Add a log for every incoming request in a middleware and store it in Elasticsearch, then find which requests have a long response time; or use New Relic (the easy way, but it costs more money).
Use blocked-at to find slow synchronous execution; if it detects any, try moving that work to worker threads or a library like workerpool.
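The middleware idea above can be sketched without any external service: time every request and collect the slow ones (shipping to Elasticsearch would replace the in-memory array below; the threshold is an arbitrary example, not from the question):

```javascript
// Express-style middleware: record how long each request takes and
// collect those slower than `thresholdMs` for later inspection.
const slowRequests = [];

function slowRequestLogger(thresholdMs) {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    // 'finish' fires once the response has been handed to the OS.
    res.on('finish', () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      if (ms >= thresholdMs) {
        slowRequests.push({ method: req.method, url: req.url, ms });
      }
    });
    next();
  };
}
```

With Express this would be registered via `app.use(slowRequestLogger(200))`; the middleware only relies on `res` emitting a `finish` event, which Node's HTTP responses do.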
My answer is based purely on my experience with the topic
Before going to production, do local testing like:
stress testing.
longevity testing.
For both tests, try a tool like JMeter, where you can configure one or multiple endpoints and run heavy loads against them over a period of time while monitoring CPU and memory usage.
If everything is fine, also try stopping the test and calling the API manually while monitoring its behavior; this will help you see whether there is a
memory leak in the APIs themselves.
Is your app going through .map() or .reduce() over huge arrays?
Does your app work significantly better after a reboot?
If yes, then you should suspect that the Express app is leaking memory and the garbage collector is trying to clean up the mess.
If possible, try rewriting the app using Fastify; personally, this did not make the app much faster, but it was able to handle 1.5x more requests.
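On the `.map()`/`.reduce()` point above: one way to keep huge-array work from monopolizing the event loop is to process it in fixed-size chunks (yielding between chunks with `setImmediate` in a real server). A minimal synchronous sketch of the chunking itself, with illustrative sizes:

```javascript
// Split a large array into fixed-size chunks so each unit of work stays
// small; in a server, you would process one chunk per setImmediate() tick.
function* chunks(arr, size) {
  for (let i = 0; i < arr.length; i += size) {
    yield arr.slice(i, i + size);
  }
}

// Example: sum a big array chunk by chunk instead of one huge .reduce()
const big = Array.from({ length: 10000 }, (_, i) => i);
let total = 0;
for (const part of chunks(big, 1000)) {
  total += part.reduce((a, b) => a + b, 0);
}
```

The result is identical to a single `.reduce()` over the whole array, but each iteration of the outer loop is a natural point to yield control back to the event loop.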

How to prevent a Node.js API REST app to be unresponsive on high CPU/RAM usage?

I'm building a Node.js + Express API REST app. So long the CPU and RAM usage is on normal levels, but one of the latest endpoints designed is taking up too much RAM and CPU:
I'm talking about an API whose goal is to generate .pdf files in real time from template data (we're using the library pdf-puppeteer). But when this API is tasked to generate hundreds of .pdfs, the Node.js application goes unresponsive and we cannot call the other APIs, as they either give a timeout error or take too long to respond, even the simpler ones.
I'm using pm2 for load balancing, and we've tried to delegate the PDF creation process to worker processes so the event loop doesn't get blocked. That was successful to some extent, but the CPU and RAM consumption is still very high and the APIs become unresponsive nevertheless.
So how can this high CPU and RAM usage be prevented on heavy processes, so the application doesn't become unresponsive? Maybe using a throttling approach?
You can use a Docker/Kubernetes stack and scale your environment up.
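Scaling out aside, an in-process throttle is usually the first line of defence for a job like PDF generation: cap how many renders run at once and queue the rest. A minimal sketch of such a limiter (the limit value and the task bodies are placeholders, not part of the original app):

```javascript
// Run at most `limit` async tasks concurrently; extra tasks wait in a
// FIFO queue. Wrap each heavy job (e.g. one PDF render) in run(() => ...).
function createLimiter(limit) {
  let active = 0;
  const queue = [];

  function runNext() {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        runNext(); // a slot freed up, start the next queued task
      });
  }

  return function run(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      runNext();
    });
  };
}
```

A request handler would then call something like `await run(() => renderPdf(data))` (where `renderPdf` stands in for the pdf-puppeteer call), so that only a handful of renders execute at once while the rest wait, instead of hundreds piling onto the CPU simultaneously.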

Garbage collection causes lag on connected sockets (NodeJS Server)

I am hosting a game website on Heroku that runs on a NodeJS Server. Clients are connected via sockets using the package socket.io.
Once in a while when the garbage collection cycle is triggered, connected clients would experience severe lag and often, disconnections. This is experienced by the clients through delayed incoming chat and delayed inputs to the game.
When I look into the logs, I find error messages relating to garbage collection. Please see the attached logs below. When these GC events happen, they sometimes cause massive memory spikes, to the point where the app exceeds its allowed 0.5 GB of RAM and is killed by Heroku. Lately, however, the memory spikes don't occur as often, but the severe lag on the client side still happens around once or twice a day.
One aspect of the lag is through the chat. When a user types a message through "All Chat" (and any chat channel), the server currently console.log()'s it to the standard out. I happened to be watching the logs live one time during a spike event and noticed that chat being outputted to the terminal was in real time with no delay, however clients (I was also on the website myself as a client) received these messages in a very delayed fashion.
I found a NodeJS bug online (that I think has been fixed) that would cause severe lag when too much was being console.log()'d to the screen, so I ran a stress test by sending 1000 messages per second from the client, for a minute. I could not reproduce the spike.
I have read many guides on finding memory leaks, inspecting the stack etc. but I'm very unsure how to run these tests on a live Heroku server. I have suspicions that my game objects on closing, are not being immediately cleared out and are all being cleared at once, causing the memory spikes, but I am not confident. I don't know how to best debug this. It is also difficult for me to catch this happening live as it only happens when more than 30+ people are logged in (Doesn't happen often as this is still a fairly small site).
The error messages include references to the circular-json module I use, and I also suspect that this may be causing infinite callbacks on itself somehow and not clearing out correctly, but I am not sure.
For reference, here is a copy of the source code: LINK
Here is a snippet of the memory when a spike happens:
Memory spike
Crash log 1: HERE
Crash log 2: HERE
Is there a way I can simulate sockets or simulate the live server's environment (i.e. connected clients) locally?
Any advice on how to approach or debug this problem would be greatly appreciated. Thank you.
Something to consider is that console.log will increase memory usage. If you are logging verbosely with large amounts of data, this can accumulate. Looking quickly at the log, it seems you are running out of memory. That would mean the app starts writing to disk, which is slower, and garbage collection will also run, spiking the CPU.
This could mean a memory-leak due to resources not being killed/closed and simply accumulating. Debugging this can be a PITA.
Node keeps up to roughly 1.5 GB of long-lived objects around by default. Since you seem to be on a 512 MB container, it's best to configure the web app to start like:
web: node --optimize_for_size --max_old_space_size=460 server.js
While you need to get to the bottom of the leak, you can also increase availability by running more than one worker, and more than one Node instance, using socket.io-redis to keep the instances in sync. I highly recommend this route.
Some helpful content on Nodejs memory on Heroku.
You can also spin up multiple connections via node script to interact with your local dev server using socket.io-client and monitor the memory locally and add logging to ensure connections are being closed correctly etc.
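For the local monitoring part, `process.memoryUsage()` gives the same figures you would watch on Heroku; logging it on an interval while the simulated clients connect and disconnect is often enough to spot steady growth. A minimal sketch:

```javascript
// Format the process's current memory figures. Call this on a timer,
// e.g. setInterval(() => console.log(memorySnapshot()), 5000), while
// simulated socket.io clients are connected, and watch for steady growth.
function memorySnapshot() {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(1) + ' MB';
  return `rss=${mb(rss)} heapUsed=${mb(heapUsed)} heapTotal=${mb(heapTotal)} external=${mb(external)}`;
}
```

If `heapUsed` keeps climbing after all simulated clients have disconnected, that points at objects (like finished game states) not being released.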
I ended up managing to track my "memory leak" down. It turns out I was saving the games (in JSONified strings) to the database too frequently and the server/database couldn't keep up. I've reduced the frequency of game saves and I haven't had any issues.
The tips provided by Samuel were very helpful too.

Issue in pm2 - It stops responding

I am facing an issue with my application servers. Assume there are two nodes behind the load balancer.
Suddenly, one of those nodes becomes unhealthy.
When I logged into that instance, there were no logs coming from pm2.
Then I checked its CPU, and it was very high.
Please guide me on how to fix this issue, or on any way to debug it.
Check out flame graphs to see where your Node app is CPU bound.
You can also use the new debugging system in Node 6.3 (--inspect) to debug with the full power of Chrome DevTools.
PM2 has some limited protection for runaway issues like this via the max-memory-restart option. Typically, high CPU will also correlate with high memory usage and this option can be used to restart your app when it begins consuming large amounts of memory (which in your case may or may not be the correct moment but it should help).
--max-memory-restart <memory> specify max memory amount used to autorestart (in octet or use syntax like 100M)
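In a pm2 ecosystem file the same option looks like this (the app name, entry script, instance count, and the 300M threshold below are illustrative, not taken from the question):

```javascript
// ecosystem.config.js — pm2 restarts the process once it passes 300 MB
module.exports = {
  apps: [
    {
      name: 'api',            // illustrative app name
      script: 'server.js',    // illustrative entry point
      instances: 2,
      exec_mode: 'cluster',
      max_memory_restart: '300M',
    },
  ],
};
```

Started with `pm2 start ecosystem.config.js`, so the restart threshold is versioned with the app instead of living only in a shell command.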

Troubleshooting a Hanging Java Web App

I have a web application that hangs under high loads. I'm not going to go into the specifics of the code because I really just want some troubleshooting advice and tooling recommendations.
It's a web app, so each request gets a thread. Under a high-load test, the app begins to consume all of the CPU while becoming unresponsive. I suspect that the request threads are hanging in the new code that we are testing. Given the CPU consumption, I'm assuming this must be on my app's side. My understanding, which could be wrong, is that the total CPU consumption indicates my first troubleshooting efforts should go into looking at the code that's consuming those cycles.
What are some tools and/or methods for inspecting which threads are hanging and on what lines of code? Again, I can easily force the app into the problematic behavior.
I've found and been trying out VisualVM. It seems like the perfect tool. Still open to suggestions, though. I looked at Eclipse TPTP, but it seems to be reaching end of life, as well as requiring a heavier-weight deployment.
You can insert logging messages when a thread starts and when it finishes. Then you start the application and inspect the output while exercising the code.
Another approach is to look for memory leaks. If you are sure you don't have one, you can increase your JVM's memory allocation.
#chad: do you have a database in the whole picture? You may want to start by looking at what is happening on the DB side; you can very well look into DB locks, current sessions, etc.
