How to track object inside heap in node.js to find memory leak?

I have a memory leak, and I think I know where it is, but I don't know why it is happening.
The leak occurs while load-testing the following endpoint (using a restify.js server):
server.get('/test', function (req, res, next) {
  fetchSomeDataFromDB().done(function (err, rows) {
    res.json({ items: rows });
    next();
  });
});
I am pretty sure that the res object is not disposed of by the garbage collector. On every request the memory used by the app grows. I have done an additional test:
var data = {};
for (var i = 0; i < 500; ++i) {
  data['key' + i] = 'abcdefghijklmnoprstuwxyz1234567890_' + i;
}

server.get('/test', function (req, res, next) {
  fetchSomeDataFromDB().done(function (err, rows) {
    res._someVar = _.extend({}, data);
    res.json({ items: rows });
    next();
  });
});
So on each request I assign a big object to the res object as an attribute. I observed that with this additional attribute the memory grows much faster: roughly 100 MB per 1000 requests made over 60 seconds. After the next identical test the memory grows by another 100 MB, and so on. Now that I know the res object is not "released", how can I track what is still keeping a reference to it? Say I take a heap snapshot: how can I find what is referencing res?
Screenshot of a heap comparison across 10 requests:
Actually, it seems that Instance.DAO is leaking? This class belongs to the ORM I am using to query the DB... What do you think?
One more screenshot of the same comparison, sorted by #delta:

It seems more likely that the GC simply hasn't collected the object yet, since you are not leaking res anywhere in this code. Try running your script with the --expose-gc node argument and then set up an interval that periodically calls gc(). This will force the GC to run instead of being lazy.
If after that you find that you are definitely leaking memory, you could use a tool like the heapdump module together with the Chrome developer tools heap inspector to see which objects are taking up space.
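For reference, a minimal sketch of that workflow, assuming the heapdump module is installed (the snapshot file names and the 30-second interval are arbitrary choices, not from the original answer):
// run with: node --expose-gc app.js
var heapdump = require('heapdump');

// Force the GC periodically so lazy collection doesn't look like a leak.
setInterval(function () {
  if (global.gc) {
    global.gc();
  }
}, 30 * 1000);

// Write a snapshot before and after the load test, then load both files into
// the Chrome DevTools Memory tab and use the "Comparison" view to see which
// objects (and which retainers) grew between the two.
heapdump.writeSnapshot('before-load.heapsnapshot');
// ... run the load test ...
heapdump.writeSnapshot('after-load.heapsnapshot');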

Related

Any way to hint/force Garbage Collection on Node Google Cloud Function

I've got a Google Cloud Function that is running out of memory even though it shouldn't need to.
The function compiles info from a number of spreadsheets; the spreadsheets are large but are handled sequentially.
Essentially the function does:
spreadsheets.forEach(spreadsheet => {
  const data = spreadsheet.loadData();
  mainSpreadsheet.saveData(data);
});
The data is discarded on each iteration, so the garbage collector could clean up the memory, but in practice that doesn't seem to happen and the process crashes close to the end.
I can see from other answers that it is possible to force garbage collection or even prevent Node from over-allocating memory.
However, both of these involve command-line arguments, which I can't control with a cloud function. Is there any workaround, or am I stuck with this as an issue when using Google Cloud Functions?
A colleague tipped me off that changing the code to
spreadsheets.forEach(spreadsheet => {
  let data = spreadsheet.loadData();
  mainSpreadsheet.saveData(data);
  data = null;
});
might be enough to tip the GC off that it can clean up that structure.
I was skeptical, but the function is now running to completion. It turns out you can hint to the GC in Node.
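If you want to verify the effect, a small sketch with hypothetical logging added around the original loop (process.memoryUsage() is core Node and available inside Cloud Functions):
// Sketch only: the heap logging is illustrative, not part of the original function.
spreadsheets.forEach(spreadsheet => {
  let data = spreadsheet.loadData();
  mainSpreadsheet.saveData(data);
  data = null;
  console.log('heapUsed:',
    Math.round(process.memoryUsage().heapUsed / 1024 / 1024), 'MB');
});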

Strange memory usage when using CacheEntryProcessor and modifying cache entry

I wonder if anybody can explain what is going wrong with my code.
I have an IgniteCache of Long->Object[] which is a kind of batching mechanism.
The cache is on-heap, partitioned, and has one backup configured.
I want to modify some of the objects within the cache entry value array.
So I wrote an implementation of CacheEntryProcessor:
@Override
public Object process(MutableEntry<Long, Object[]> entry, Object... arguments)
        throws EntryProcessorException {
    boolean updated = false;
    int key = (int) arguments[0];
    Set<Long> someIds = Ignition.ignite().cluster().nodeLocalMap().get(key);

    Object[] values = entry.getValue();

    for (int i = 0; i < values.length; i++) {
        Person p = (Person) values[i];
        if (someIds.contains(p.getId())) {
            p.modify();
            if (!updated) {
                updated = true;
            }
        }
    }

    if (updated) {
        entry.setValue(values);
    }

    return null;
}
When the cluster is loaded with data, each node consumes around 20 GB of heap.
When I run the cache processor with cache.invokeAll on a multi-node cluster, I see crazy memory behavior: while the processor is running, memory usage climbs to 48 GB or higher, eventually leading to the node being separated from the cluster because GC took too long.
However, if I remove the entry.setValue(values) line, which stores the modified array back into the cache, everything is OK, apart from the fact that the change will not be replicated since the cache is not aware of it; the update is only visible on the primary node :(
Can anybody tell me how to make it work? What is wrong with this approach?
First of all, I would not recommend allocating such large heap sizes. This will very likely cause long GC pauses even if everything is working properly: the JVM will not clean up memory until it reaches a certain threshold, and when it does, there will be too much garbage to collect. Try switching to off-heap memory or starting more Ignite nodes.
The fact that more garbage is generated when you update the entry makes perfect sense: each time you update, you replace the old value with a new one, and the old one becomes garbage.
If none of this helps, grab a heap dump and check what is occupying the memory.

NodeJS V8 not doing proper garbage collecting

I have this simple http server
var http = require('http');

var server = http.createServer(function (request, response) {
  var data = [];
  for (var i = 0; i < 1000000; i++) {
    data.push({});
  }
  response.end('Done');
});

server.listen(3000);
When I start the server the process uses around 8MB of memory.
When I make a request to the server the memory usage rises above 100MB and just stays there. Then I hold F5 for a few seconds to spam some requests and the memory usage grows above 400MB at some points. When all the requests are processed the server is still using over 100MB.
When I make another request the memory sometimes goes above or close to 200MB, or stays approximately the same. I let the server run for a while and the memory doesn't get released.
I tried setting data = null and the result was the same. Then I tried running the server with the --expose-gc flag and calling global.gc() after nulling the value; the results are better, but the memory still stays above 50MB.
If your system has an abundance of memory available, there is unlikely to be any condition triggering a need for garbage collection. If you can run the memory usage up to a maximum point and continue serving requests, garbage collection is clearly working, since memory has to be freed before more can be allocated.
You can try starting a different process that deliberately uses up more memory, then check whether the original node.js process's garbage collection behaves more aggressively.
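A minimal sketch of how to observe this yourself, assuming the server is started with node --expose-gc (the logging helper is illustrative, not part of the original code): log heap usage before and after a forced collection to see how much of the reported memory is actually collectible garbage.
// run with: node --expose-gc server.js
var http = require('http');

function logHeap(label) {
  // heapUsed is reported in bytes; convert to MB for readability
  console.log(label, Math.round(process.memoryUsage().heapUsed / 1024 / 1024) + ' MB');
}

http.createServer(function (request, response) {
  var data = [];
  for (var i = 0; i < 1000000; i++) {
    data.push({});
  }
  data = null;

  logHeap('before gc:');
  if (global.gc) {
    global.gc(); // only available with --expose-gc
  }
  logHeap('after gc:');

  response.end('Done');
}).listen(3000);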

Memory leak with nodejs

I load my calendar from Google, but every time I do, Node uses about 2 MB more memory, even if I delete the module. I need to load it every 5 or 10 minutes to check for changes. This is my code:
google-calender.js module
module.exports = {
  loadCalendars: function (acces, res) {
    gcal = require('google-calendar');
    google_calendar = new gcal.GoogleCalendar(acces);
    google_calendar.calendarList.list(function (err, calendarList) {
      toLoadCalenders = calendarList.items.length;
      loaded = 0;
      data = [];
      for (var i = 0; i < toLoadCalenders; i++) {
        google_calendar.events.list(calendarList.items[i].id, function (err, calendarList) {
          loaded++;
          data.push(calendarList);
          if (loaded == toLoadCalenders) {
            res.send(data);
          }
        });
      }
    });
  }
};
main.js
app.get('/google-calender', function (req, res) {
  google = require('./google-calender');
  google.loadCalendars(acces, res);
  setTimeout(function () {
    delete google;
  }, 500);
});
Does anyone know how I can prevent memory leak here?
Well, the memory leak topic is a tough area for any developer. First of all you need to know whether you actually have a memory leak. I recommend using node-inspector and doing the following:
1. Run your node app with node-inspector attached.
2. Take a heap snapshot on a fresh start so you know the initial memory size used by your app.
3. Make some requests (you may use a benchmarking tool) and take a second snapshot in the meantime.
4. Compare snapshot 1 with snapshot 2, find where the increase is happening, and note it.
5. Stop making requests and wait a little so the garbage collector can finish its work, then take a third snapshot.
6. Compare snapshot 3 with snapshot 2; you may see that memory has been freed in snapshot 3.
You can repeat this test many times; if the last snapshot always shows more allocated memory than its predecessors, you probably have a memory leak.
How to fix a memory leak?
Well, you need to be familiar with the cases in JavaScript where memory leaks happen. You can read this article and match similar cases against your code.
Then go through the details of the snapshots you took, find the part that increases linearly, and figure out where in your code you have such data: arrays, objects, or even module code that is repeatedly required and never disposed of.
Coincidentally, we had such a case today and had to go through these troubleshooting steps to get our hands on the problem.
Good luck my friend.

Streaming / Piping JSON.stringify output in Node.js / Express

I have a scenario where I need to return a very large object, converted to a JSON string, from my Node.js/Express RESTful API.
res.end(JSON.stringify(obj));
However, this does not appear to scale well. Specifically, it works great on my testing machine with 1-2 clients connecting, but I suspect that this operation may be killing the CPU & memory usage when many clients are requesting large JSON objects simultaneously.
I've poked around looking for an async JSON library, but the only one I found seems to have an issue (specifically, I get a [RangeError]). Not only that, it returns the string in one big chunk (e.g., the callback is called once with the entire string, meaning the memory footprint is not reduced).
What I really want is a completely asynchronous piping/streaming version of the JSON.stringify function, such that it writes the data as it is packed directly into the stream... thus saving me both memory footprint, and also from consuming the CPU in a synchronous fashion.
Ideally, you should stream your data as you have it and not buffer everything into one large object. If you can't change this, then you need to break stringify into smaller units and let the main event loop process other events using setImmediate. Example code (I'll assume the main object has lots of top-level properties and use them to split the work):
function sendObject(obj, stream) {
  var keys = Object.keys(obj);

  function sendSubObj() {
    setImmediate(function () {
      var key = keys.shift();
      stream.write('"' + key + '":' + JSON.stringify(obj[key]));
      if (keys.length > 0) {
        stream.write(',');
        sendSubObj();
      } else {
        stream.write('}');
      }
    });
  }

  stream.write('{');
  sendSubObj();
}
It sounds like you want Dominic Tarr's JSONStream. Obviously, there is some assembly required to merge this with express.
However, if you are maxing out the CPU attempting to serialize (Stringify) an object, then splitting that work into chunks may not really solve the problem. Streaming may reduce the memory footprint, but won't reduce the total amount of "work" required.
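For reference, a minimal sketch of the JSONStream approach wired into an Express route (rows, the route path, and the variable names are illustrative assumptions, not from the original answer):
var JSONStream = require('JSONStream');

app.get('/big', function (req, res) {
  res.type('json');

  // JSONStream.stringify() returns a through stream that wraps the values
  // written to it in '[', ',' and ']' as they pass through to the response.
  var stringifier = JSONStream.stringify();
  stringifier.pipe(res);

  rows.forEach(function (row) {
    stringifier.write(row);
  });
  stringifier.end();
});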
