Memory leak with Node.js

I load my calendar from Google, but every time I do it, Node uses about 2 MB more memory, even if I delete the module afterwards. I need to reload it every 5 or 10 minutes so I can see if there are changes. This is my code.
google-calender.js module
module.exports = {
  loadCalendars: function(acces, res){
    gcal = require('google-calendar');
    google_calendar = new gcal.GoogleCalendar(acces);
    google_calendar.calendarList.list(function(err, calendarList) {
      toLoadCalenders = calendarList.items.length;
      loaded = 0;
      data = [];
      for(var i = 0; i < toLoadCalenders; i++){
        google_calendar.events.list(calendarList.items[i].id, function(err, calendarList) {
          loaded++;
          data.push(calendarList);
          if (loaded == toLoadCalenders) {
            res.send(data);
          }
        });
      }
    });
  }
}
main.js
app.get('/google-calender', function (req, res) {
  google = require('./google-calender');
  google.loadCalendars(acces, res);
  setTimeout(function(){
    delete google;
  }, 500);
});
Does anyone know how I can prevent the memory leak here?

Well, memory leaks are a tough area for any developer. First of all, you need to find out whether you actually have a leak. I recommend using node-inspector and doing the following:
1- Run your node app with node-inspector on.
2- Take a heap snapshot on a fresh start so you know the initial amount of memory used by your app.
3- Make some requests (you may use a benchmarking tool); meanwhile, take a second snapshot.
4- Compare snapshot one with snapshot two, find where the growth is happening, and note it down.
5- Stop making requests and wait a little so the garbage collector can finish its work, then take a third snapshot.
6- Compare the size of snapshot 3 with snapshot 2; you may see that snapshot 3 has freed some of the memory.
You can repeat this test several times; if the last snapshot always shows more allocated memory than its predecessors, you may have a memory leak.
How do you fix a memory leak?
Well, you need to be familiar with the cases in JavaScript where memory leaks happen; you can read this article and match similar cases in your code.
Then read the details of the snapshots you took, find the part that grows linearly, and figure out where in your code you have such data: arrays, objects, or even module code that is repeatedly required and never disposed of.
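In the code from the question, for instance, a minimal sketch of what the fix might look like, assuming the leak comes from re-requiring the module on every request and from the implicit globals (gcal, google_calendar, data, ...) that outlive each response. The structure mirrors the question's code; anything beyond that is an assumption, not a verified fix.

// google-calender.js -- require the client library once, at module load time
var gcal = require('google-calendar');

module.exports = {
  loadCalendars: function (acces, res) {
    // 'var' keeps this state local to the request instead of creating implicit globals
    var google_calendar = new gcal.GoogleCalendar(acces);
    google_calendar.calendarList.list(function (err, calendarList) {
      if (err) { return res.send(err); }
      var toLoad = calendarList.items.length;
      var loaded = 0;
      var data = [];
      calendarList.items.forEach(function (item) {
        google_calendar.events.list(item.id, function (err, events) {
          loaded++;
          data.push(events);
          if (loaded === toLoad) {
            res.send(data);
          }
        });
      });
    });
  }
};

// main.js -- require the helper once; note that `delete google` has no effect on a
// variable binding anyway, since delete only removes object properties
var google = require('./google-calender');

app.get('/google-calender', function (req, res) {
  google.loadCalendars(acces, res);
});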
By coincidence, we had exactly this kind of case today and had to go through these troubleshooting steps to get our hands on the problem.
Good luck, my friend.

Related

Any way to hint/force Garbage Collection on Node Google Cloud Function

I've got a Google Cloud Function that is running out of memory even though it shouldn't need to.
The function compiles info from a number of spreadsheets; the spreadsheets are large, but they are handled sequentially.
Essentially the function does:
spreadsheets.forEach(spreadsheet => {
  const data = spreadsheet.loadData();
  mainSpreadsheet.saveData(data);
});
The data is discarded on each loop, so the garbage collector could clean up the memory, but in practice that doesn't seem to be happening and the process is crashing close to the end.
I can see from other answers that it is possible to force garbage collection or even prevent Node from over-allocating memory.
However, both of these involve command-line arguments, which I can't control with a Cloud Function. Is there any workaround, or am I stuck with this as an issue when using Google Cloud Functions?
A colleague tipped me off that changing the code to
spreadsheets.forEach(spreadsheet => {
  let data = spreadsheet.loadData();
  mainSpreadsheet.saveData(data);
  data = null;
});
might be enough to hint to the GC that it can clean up that structure.
I was skeptical, but the function is now running to completion. It turns out you can hint to the GC in Node.
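For completeness, the command-line route mentioned in the question looks roughly like the sketch below when you do control how Node is launched (which is not the case for Cloud Functions). global.gc only exists when the --expose-gc flag was passed, so it has to be guarded; the file name is just an example.

// launched with: node --expose-gc process-spreadsheets.js
spreadsheets.forEach(spreadsheet => {
  let data = spreadsheet.loadData();
  mainSpreadsheet.saveData(data);
  data = null;        // drop the reference so the structure becomes collectable
  if (global.gc) {    // only defined when Node was started with --expose-gc
    global.gc();      // force a collection between spreadsheets
  }
});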

Strange memory usage when using CacheEntryProcessor and modifying cache entry

I wonder if anybody can explain what is going wrong with my code.
I have an IgniteCache of Long -> Object[], which is a kind of batching mechanism.
The cache is on-heap, partitioned, and has one backup configured.
I want to modify some of the objects within the cache entry's value array, so I wrote an implementation of CacheEntryProcessor:
    @Override
    public Object process(MutableEntry<Long, Object[]> entry, Object... arguments)
            throws EntryProcessorException {
        boolean updated = false;
        int key = (int) arguments[0];
        Set<Long> someIds = Ignition.ignite().cluster().nodeLocalMap().get(key);
        Object[] values = entry.getValue();
        for (int i = 0; i < values.length; i++) {
            Person p = (Person) values[i];
            if (someIds.contains(p.getId())) {
                p.modify();
                if (!updated) {
                    updated = true;
                }
            }
        }
        if (updated) {
            entry.setValue(values);
        }
        return null;
    }
}
When the cluster is loaded with data, each node consumes around 20 GB of heap.
When I run the entry processor with cache.invokeAll on a multi-node cluster, I see crazy memory behavior: while the processor is running, memory usage climbs to 48 GB or higher, eventually leading to the node being separated from the cluster because GC took too long.
However, if I remove the entry.setValue(values) line, which stores the modified array back into the cache, everything is fine, apart from the fact that the update will not be replicated since the cache is not aware of the change; it is only visible on the primary node. :(
Can anybody tell me how to make this work? What is wrong with this approach?
First of all, I would not recommend allocating such large heaps. This will very likely cause long GC pauses even if everything is working properly: basically, the JVM will not clean up memory until it reaches a certain threshold, and when it does, there is too much garbage to collect. Try switching to off-heap memory or starting more Ignite nodes.
The fact that more garbage is generated when you update the entry makes perfect sense: each time you update, you replace the old value with a new one, and the old one becomes garbage.
If none of this helps, grab a heap dump and check what is occupying the memory.

What exactly is a TickObject and how prevent it becoming a memory leak?

I've noticed this a couple of times in a few of my scripts, mainly scripts focused on array iteration and manipulation.
I keep seeing massive memory leaks that eventually lead to the script running out of memory and dying. After inspection, the cause seems to be a massive number of TickObjects being created and never cleaned up.
A bit of reading led me to think that TickObject is an internal feature of Node for managing asynchronous events. Is that what they're actually used for? Why would they spiral out of control? And how can that be prevented?
In case it helps, here's a dump (warning: it's around 312 MB) https://www.dropbox.com/s/57t70t2igpo8kbi/heapdump-604700798.654094.heapsnapshot?dl=0 of an example where it spirals out of control.
EDIT:
I managed to simplify the offending code, and strangely it looks to be a combination with process.stdout.write:
var a = Array(100)
      .fill()
      .map(_ => Math.round(Math.random()) ? true : false),
    i = 0;

while (true) {
  a.map(_ => !_);
  process.stdout.write(`#${++i}\r`);
}
Run this and you'll quickly run out of memory. Is this behaviour to be expected? (I assume not.) Or is it just weird behaviour of Node?
Worked it out. The array was a red herring; the actual problem was process.stdout.write.
On each write, an afterWrite cleanup function is queued up as a TickObject. Since we never leave the tick/context we're in, these just accumulate until Node explodes. The solution? Make any long-running or forever-running code blocks asynchronous with ticks as well.
var a = Array(100)
      .fill()
      .map(_ => Math.round(Math.random()) ? true : false),
    i = 0;

(function whileLoop () {
  a.map(_ => !_);
  process.stdout.write(`#${++i}\r`);
  process.nextTick(whileLoop);
})();
Victory!
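A small variation on the fix above, in case it helps someone: recursive process.nextTick keeps the loop running ahead of the event loop's I/O phases, whereas setImmediate schedules the next iteration after pending I/O callbacks, giving the queued write callbacks a clearer chance to drain. This is a sketch of the same loop with that one change, not part of the original answer.

var a = Array(100)
      .fill()
      .map(_ => Math.round(Math.random()) ? true : false),
    i = 0;

(function whileLoop () {
  a.map(_ => !_);
  process.stdout.write(`#${++i}\r`);
  setImmediate(whileLoop); // yield to the event loop before the next iteration
})();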

How to dump large data sets in mongodb from node.js

I'm trying to dump approximately 2.2 million objects into MongoDB (using mongoose). The problem is that saving the objects one by one gets stuck. I've kept a code sample below. If I run this code for 50,000 objects it works great, but if I increase the data size to approximately 500,000 it gets stuck. I want to know what is wrong with this approach, and I want to find a better way to do it. I'm quite new to Node.js; I've tried loops and everything with no help, and finally found this kind of solution. It works fine for 50k objects but gets stuck for 2.2 million, and after some time I get this:
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory
Aborted (core dumped)
var mongoose = require('mongoose');
var async = require('async');

var connection = mongoose.createConnection("mongodb://localhost/entity");

var entitySchema = new mongoose.Schema({
  name: String
  , date: Date
  , close : Number
  , volume: Number
  , adjClose: Number
});

var Entity = connection.model('entity', entitySchema);

var mongoobjs = ["2.2 million objects here, populated in code"]; // works completely fine till here

async.map(mongoobjs, function(object, next){
  Obj = new Entity({
    name : object.name
    , date: object.date
    , close : object.close
    , volume: object.volume
    , adjClose: object.adjClose
  });
  Obj.save(next);
}, function(){ console.log("Saved"); });
Thanks cdbajorin.
This seems to be a much better and slightly faster batch approach for doing this. What I learned was that in my earlier approach, "new Entity(....)" was taking time and causing the memory overflow; I'm still not sure why.
So, rather than using this:
Obj = new Entity({
  name : object.name
  , date: object.date
  , close : object.close
  , volume: object.volume
  , adjClose: object.adjClose
});
I just created JSON objects and stored them in an array:
stockObj = {
  name : object.name
  , date: object.date
  , close : object.close
  , volume: object.volume
  , adjClose: object.adjClose
};
mongoobjs.push(stockObj); // array of objs
and used this command... and voilà, it worked!
Entity.collection.insert(mongoobjs, function(){ console.log("Saved successfully"); });
Node.js uses V8, which has the unfortunate property, from the perspective of developers coming from other interpreted languages, of severely restricting the amount of memory you can use to something like 1.7 GB regardless of available system memory.
There is really only one way, afaik, to get around this - use streams. Precisely how you do this is up to you. For example, you can simply stream data in continuously, process it as it's coming in, and let the processed objects get garbage collected. This has the downside of being difficult to balance input to output.
The approach we've been favoring lately is to have an input stream bring work and save it to a queue (e.g. an array). In parallel you can write a function that is always trying to pull work off the queue. This makes it easy to separate logic and throttle the input stream in case work is coming in (or going out) too quickly.
Say for example, to avoid memory issues, you want to stay below 50k objects in the queue. Then your stream-in function could pause the stream or skip the get() call if the output queue has > 50k entries. Similarly, you might want to batch writes to improve server efficiency. So your output processor could avoid writing unless there are at least 500 objects in the queue or if it's been over 1 second since the last write.
This works because JavaScript uses an event loop, which means it will switch between asynchronous tasks automatically. Node will stream data in for some period of time, then switch to another task. You can use setTimeout() or setInterval() to ensure that there is some delay between function calls, thereby allowing another asynchronous task to resume.
Specifically addressing your problem, it looks like you are individually saving each object. This will take a long time for 2.2 million objects. Instead, there must be a way to batch writes.
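A rough sketch of the queue-plus-throttling idea described above, using the 50k/500/1-second thresholds from the example. streamFromSource is a stand-in for whatever readable stream produces your parsed objects, so treat this as an outline rather than a drop-in solution.

var queue = [];
var source = streamFromSource(); // hypothetical readable stream of parsed objects

// input side: push onto the queue and pause the stream when the backlog gets too big
source.on('data', function (obj) {
  queue.push(obj);
  if (queue.length > 50000) {
    source.pause();
  }
});

// output side: flush in batches of 500, or whatever is left after a second of quiet
var lastWrite = Date.now();
setInterval(function () {
  if (queue.length >= 500 || (queue.length > 0 && Date.now() - lastWrite > 1000)) {
    var batch = queue.splice(0, 500);
    Entity.collection.insert(batch, function (err) {
      lastWrite = Date.now();
      if (queue.length < 50000) {
        source.resume(); // backlog has drained enough, let more data in
      }
    });
  }
}, 100);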
As an addition to the answers provided in this thread, I was successful with:
- bulk insert (batch insertion) of 20,000+ documents (objects),
- using the low memory (250 MB) available within the cheap offerings of Heroku,
- using one instance, without any parallel processing.
The Bulk operation specified by the MongoDB native driver was used, and the following is roughly the code that worked for me:
var counter = 0;
var entity = {}, entities = []; // initialize entities from a source such as a file, external database, etc.
var bulk = Entity.collection.initializeOrderedBulkOp();
var size = MAX_ENTITIES; // or `entities.length`; defined in config, mine was 20,000

// a while loop with -- is deemed faster than other loop constructs available in the JavaScript ecosystem
while (size--) {
  entity = entities[size];
  if (entity && entity.id) {
    // add the `{upsert: true}` parameter to create the object if it doesn't exist
    bulk.find({ id: entity.id }).update({ $set: { value: entity.value } });
  }
  console.log('processing --- ', entity, size);
}

bulk.execute(function (error) {
  if (error) return next(error);
  return next(null, { message: 'Synced vector data' });
});
Entity is a mongoose model.
Old versions of MongoDB may not support this, as it was made available from version 3+.
I hope this answer helps someone.
Thanks.

How to track object inside heap in node.js to find memory leak?

I have a memory leak, and I know where it is (I think), but I don't know why it is happening.
The memory leak occurs while load-testing the following endpoint (using a restify.js server):
server.get('/test', function(req, res, next){
  fetchSomeDataFromDB().done(function(err, rows){
    res.json({ items: rows });
    next();
  });
});
I am pretty sure that the res object is not being disposed of by the garbage collector. On every request the memory used by the app grows. I have done an additional test:
var data = {};
for (var i = 0; i < 500; ++i) {
  data['key' + i] = 'abcdefghijklmnoprstuwxyz1234567890_' + i;
}

server.get('/test', function(req, res, next){
  fetchSomeDataFromDB().done(function(err, rows){
    res._someVar = _.extend({}, data);
    res.json({ items: rows });
    next();
  });
});
So on each request I am assigning a big object to the res object as an attribute. I observed that with this additional attribute, memory grows much faster: about 100 MB per 1000 requests made over 60 seconds. After the next identical test the memory grows another 100 MB, and so on. Now that I know the res object is not being "released", how can I track what is still keeping a reference to it? Say I take a heap snapshot: how can I find out what is referencing res?
Screenshot of a heap comparison across 10 requests:
Actually, it seems that Instance.DAO is leaking? This class belongs to the ORM that I am using to query the DB... What do you think?
One more screenshot of the same comparison, sorted by #delta:
It seems more likely that the GC simply hasn't collected the object yet, since you are not leaking res anywhere in this code. Try running your script with the --expose-gc node argument and then set up an interval that periodically calls gc(). This will force the GC to run instead of being lazy.
If after that you find that you are leaking memory for sure, you could use tools like the heapdump module together with the Chrome developer heap inspector to see which objects are taking up space.
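As a rough illustration of that suggestion (the /snapshot route and the 30-second interval are made up for the example; heapdump is the npm module of the same name):

// started with: node --expose-gc server.js
var heapdump = require('heapdump'); // npm install heapdump

// force a collection periodically so growth isn't just lazy GC
setInterval(function () {
  if (global.gc) {
    global.gc();
  }
}, 30000);

// write a snapshot on demand, e.g. before and after a load-test run; load the
// files in the Chrome DevTools Memory tab and use the comparison view to see
// which objects keep growing and what retains them
server.get('/snapshot', function (req, res, next) {
  heapdump.writeSnapshot(Date.now() + '.heapsnapshot', function (err, filename) {
    res.json({ written: filename });
    next();
  });
});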
