Node.js slowing down when the process consumes a large amount of memory - node.js

So, I have a Node.js process which, when scaled, consumes around 5-8 GB of RAM. It runs inside a Docker container and is launched with the argument --max-old-space-size=12192 to raise the Node heap limit.
The memory consumption itself is fine: I am trying to use a dedicated server (AMD EPYC CPU, 64 GB of memory) instead of horizontal scaling on AWS or another cloud provider, because it is roughly 10x cheaper if I can make it work on a dedicated server (most AWS/Google Cloud expenses go to network traffic, while the VDS has unlimited traffic; the network side is already optimised through GraphQL and by minimising the number of requests). The process itself crunches a huge amount of data in memory, in a multithreaded fashion, and there is no further significant optimisation to be had in the process code itself.
When the process memory consumption reaches 3 GB or more, it slows down significantly. Docker is not limiting the container resources. The server itself is running at 5-10% load in terms of memory and CPU, and the SSD drive shows low load (very little I/O on the server side).
I guess rewriting the app in Go, for example, might improve things significantly, but that would be a lot of work.
Is there anything that can be done on the server setup / Node.js app side to prevent the slowdown?
Thanks!

Solved by horizontal scaling within a single VDS using Docker.
As jfriend00 noted, the nature of the problem is probably garbage collection - personally I have no other guess.
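For anyone hitting the same wall: one way to confirm the garbage-collection theory is to watch GC pauses from inside the process using Node's built-in perf_hooks module. A minimal sketch, assuming any pause over 50 ms is worth logging (the threshold is arbitrary), and noting that the GC kind moved from entry.kind to entry.detail.kind in newer Node versions:

const { PerformanceObserver } = require("perf_hooks");

// Log every garbage-collection pause longer than 50 ms (threshold is arbitrary).
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.duration > 50) {
      // Newer Node versions expose the GC type as entry.detail.kind, older ones as entry.kind.
      const kind = entry.detail ? entry.detail.kind : entry.kind;
      console.log(`GC pause: ${entry.duration.toFixed(1)} ms (kind ${kind})`);
    }
  }
});
obs.observe({ entryTypes: ["gc"] });

If long pauses line up with the slowdowns, splitting the workload across several smaller Node processes, as was done here, keeps each heap small enough that individual GC cycles stay short.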

Related

How to prevent a Node.js REST API app from becoming unresponsive on high CPU/RAM usage?

I'm building a Node.js + Express REST API app. So far the CPU and RAM usage has been at normal levels, but one of the latest endpoints we designed is taking up too much RAM and CPU:
I'm talking about an API whose goal is to generate .pdf files in real time from template data (we're using the pdf-puppeteer library). But when this API is asked to generate hundreds of PDFs, the Node.js application goes unresponsive and we cannot call the other APIs: they either time out or take too long to respond, even the simpler ones.
I'm using pm2 for load balancing, and we've tried delegating the PDF creation to worker processes so the event loop doesn't get blocked. That was successful to some extent, but the CPU and RAM consumption is still very high and the APIs become unresponsive nevertheless.
So how can this high CPU and RAM usage be prevented for heavy tasks, so the application doesn't become unresponsive? Maybe with a throttling approach?
You can use a Docker/Kubernetes stack and scale out your environment.
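On the throttling idea from the question itself: a simple way to keep hundreds of PDF jobs from hammering the workers at once is to cap how many run concurrently and queue the rest. A rough sketch, assuming an Express app object and a hypothetical generatePdf(data) function that returns a promise; the concurrency of 4 is an arbitrary starting point to tune against your CPU and RAM budget:

// Minimal concurrency limiter: at most `limit` jobs run at once, the rest wait in a queue.
function createLimiter(limit) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => { active--; next(); });
  };
  return (task) => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}

// Hypothetical usage inside the PDF endpoint (app and generatePdf are assumptions):
const limitPdfJobs = createLimiter(4);
app.post("/reports", async (req, res) => {
  const pdf = await limitPdfJobs(() => generatePdf(req.body));
  res.type("application/pdf").send(pdf);
});

A library such as p-limit, or a real job queue like Bull, does the same thing more robustly, but the principle is identical: requests above the cap wait instead of all spinning up Chromium instances at the same time.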

100% Memory usage on Azure App Service Plan with two Apps - working set used 10GB+

I've got an app service plan with 14 GB of memory - it should be plenty for my application's needs. There are two application services running on it, each identical - the private memory consumption of these hovers around 1 GB but can spike to 4 GB during periods of high usage. One app has a heavier usage pattern than the other.
Lately, during periods of high usage, I've noticed that the heavily used service can become unresponsive, and memory usage stays at 100% in the App Service Plan.
The high-traffic service is using 4 GB of private memory and starting to slow down massively. When I head over to the /scm.../ProcessExplorer/ page, I can see that the low-traffic service has 1 GB of private memory used and 10 GB of 'Working Set'.
As I understand it, on a single machine at least, the working set should be freed up when that memory is needed on another process. Does this happen naturally when two App Services share a single Plan?
It looks to me like the working set on the low-traffic instance is not being freed up to supply the needs of the high-traffic App Service.
If this is indeed the case, the simple fix is to move them to separate App Service Plans, each with 7 GB of memory. However, this seems like it might just be shifting the problem around - has anyone else noticed similar issues with multiple apps on a single App Service Plan? As far as I understand it, they shouldn't interfere with one another to the extent that they all need to be separated. Or have I got the wrong diagnosis?
In some high-memory-consumption scenarios, your app might genuinely require more computing resources. In that case, consider scaling to a higher service tier so the application gets all the resources it needs. Other times, a bug in the code might cause a memory leak, or a particular coding practice might increase memory consumption. Getting insight into what's triggering high memory consumption is a two-part process: first create a process dump, then analyze it. Crash Diagnoser from the Azure Site Extension Gallery can perform both steps efficiently. For more information, refer to Capture and analyze a dump file for intermittent high memory for Web Apps.
In the end we solved this one via mitigation, rather than getting to the root cause.
We found a mitigation strategy for our previous memory issues several months ago, which was simply to restart the server each night using a PowerShell script. This seems to prevent the memory from building up over time, and only costs us a few seconds of downtime. Our system doesn't have much overnight traffic, as our users are all based in the same geographic location.
However, we recently found that the overnight restart was reporting 'success' but actually failing each night due to expired credentials, which meant that the memory issues described in the question were being exacerbated by server uptimes of several weeks. Restoring the overnight restart resolved the memory issues we were seeing, and we certainly don't see our system using 10 GB+ any more.
We'll investigate the memory issues if they rear their heads again. KetanChawda-MSFT's suggestion of using memory dumps to analyse the memory usage will be employed for this investigation when it's needed.

For a Node web server, is it better to have more vCPUs or more RAM?

I am running a node app on a Digital Ocean cloud server, and the app merely services API requests. All client-side assets are served by a CDN, and the DB is accessed remotely, rather than stored on the server instance itself.
I have the choice of a greater number of vCPUs or more RAM. I have no idea what difference either makes in practice, so any feedback is a great help.
A single node.js server runs your Javascript on only one CPU, so having more CPUs won't make your Javascript run any faster unless you cluster your app and run multiple node.js processes that share the load, or unless there are other processes on the same server that your server makes use of.
Having more RAM (memory) will only improve things if you actually need more RAM. That depends entirely upon the memory usage profile of your app and how much RAM you already have available. You would probably already know if you were running out of RAM, because you either get a drastic slow-down when the OS starts page swapping or your process crashes when it runs out of memory.
So, in order to know which would benefit you more, you really need more data on how your existing app is performing (whether it is ever bogged down with CPU-intensive operations, and how much RAM it uses compared to how much you have available). It is quite possible that neither will actually matter to you - it totally depends upon the usage profile of your server process.
If you have no more data than this and have to make a choice, choose the vCPUs, because there are some circumstances where they might help you (and they give you the option to move to clustering in the future if needed), whereas adding more RAM when you aren't even using what you already have won't help you at all.
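For reference, the clustering mentioned above looks roughly like this with Node's built-in cluster module: one worker per vCPU, all sharing the same port. A minimal sketch; the request handler is a placeholder and a real setup would also restart workers that die:

const cluster = require("cluster");
const http = require("http");
const os = require("os");

// isPrimary on newer Node versions, isMaster on older ones.
if (cluster.isPrimary || cluster.isMaster) {
  // Fork one worker per available CPU so all vCPUs are used.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // Each worker runs its own event loop but shares the same listening port.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`); // placeholder handler
  }).listen(3000);
}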

Solr I/O increases over time

I am running around eight Solr server (version 3.5) instances behind a load balancer. All servers are identical and the LB is weighted by number of connections. The servers hold around 4M documents and receive a constant flow of queries. When a Solr server starts, it works fine, but after running for some time it starts taking longer to respond to queries and the server I/O goes crazy, up to 100%. Look at the New Relic graphic:
If the server behaves well in the beginning, why does it start to fail after some time? And if I restart the server, it goes back to low I/O for some time, and this repeats over and over.
The answer to this question is related to the content in this blog post.
What happens in this case is that queries depend heavily on reading the Solr indexes. These indexes live on disk, so I/O is high. To optimise disk access, the Linux OS keeps a cache in memory for the most frequently accessed disk areas, using the memory not occupied by applications. When that free memory runs out (as the JVM's memory usage grows over time), the cache shrinks and the server has to read from disk again. That is why things improve when Solr restarts: the JVM occupies less memory, so there is more free space for the disk cache.
(The problem occurred on a server with 15 GB of RAM and a 20 GB Solr index.)
The solution is simply to increase the server's RAM, so the whole index fits into memory and no disk I/O is required.

What are the minimum system requirements to run a Node.js app with pm2?

I am new to the pm2 concept. I am facing a problem where my CPU usage increases and memory reaches up to 100%, and my server goes down, resulting in the website crashing, so can anyone please advise me on this? Do I need to change the configuration of my production (live) server, such as increasing memory? My code is also necessary and sufficient. I am an EC2 user.
The system requirements will mostly depend on your application, which you have told us nothing about. If the CPU reaches 100%, then you likely have some tight loop that burns cycles synchronously and adds delays, or something like that. 100% memory usage can mean a memory leak, and in that case no amount of RAM will be sufficient, because leaking memory will eventually use up all your RAM, no matter how large it is.
You need to profile your application with real usage patterns on a system where the app works; only then will you know how many resources it needs. This is true for every kind of application.
Additionally, if you notice that resource usage grows over time, it may be a sign of a resource leak: leaking memory, spawning processes that never exit but keep using CPU and RAM, etc. A simple periodic log of memory usage, as sketched below, is often enough to spot the trend.
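As a cheap starting point for that kind of profiling, the process can log its own memory breakdown with the built-in process.memoryUsage(); a minimal sketch, with an arbitrary one-minute interval:

// Log heap and RSS once per minute; a steadily climbing heapUsed that never
// drops back down after garbage collection is a classic sign of a leak.
setInterval(() => {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(1) + " MB";
  console.log(`rss=${mb(rss)} heapTotal=${mb(heapTotal)} heapUsed=${mb(heapUsed)} external=${mb(external)}`);
}, 60 * 1000);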
First of all, I would like to suggest that you follow these guidelines for a production environment.
1) Disable morgan if you enabled it for the dev environment.
2) Use nginx or pm2 for load balancing.
You can easily handle load balancing with this command:
pm2 start server.js -i 10
3) Handle uncaught exceptions, i.e.:
process.on("uncaughtException", function (err) {
  // do error handling
});
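If you go the pm2 route, the same load balancing and a memory guard can be declared in an ecosystem file instead of command-line flags. A small sketch, assuming the entry point is server.js; the instance count and the max_memory_restart threshold are illustrative values to tune:

// ecosystem.config.js - start with: pm2 start ecosystem.config.js
module.exports = {
  apps: [{
    name: "api",
    script: "server.js",          // assumed entry point
    exec_mode: "cluster",         // run multiple processes behind pm2's load balancer
    instances: "max",             // one per CPU core (or a fixed number such as 10)
    max_memory_restart: "1G",     // restart a worker once it exceeds ~1 GB of RAM
  }],
};

max_memory_restart makes pm2 recycle a worker once it crosses the limit, which caps the damage of a leak but does not fix the underlying cause.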
