jsdom and node.js leaking memory - node.js

I found a few references to people having a similar issue where the answer was always to make sure you call window.close() when done. However, that does not seem to be working for me (node 0.8.14 and jsdom 0.3.1).
A simple repro:
var util = require('util');
var jsdom = require('jsdom');

function doOne() {
  var htmlDoc = '<html><head></head><body id="' + i + '"></body></html>';
  jsdom.env(htmlDoc, null, null, function (errors, window) {
    window.close();
  });
}

for (var i = 1; i < 100000; i++) {
  doOne();
  if (i % 500 == 0) {
    console.log(i + ':' + util.inspect(process.memoryUsage()));
  }
}
console.log('done');
The output I get is:
500:{ rss: 108847104, heapTotal: 115979520, heapUsed: 102696768 }
1000:{ rss: 198250496, heapTotal: 194394624, heapUsed: 190892120 }
1500:{ rss: 267304960, heapTotal: 254246912, heapUsed: 223847712 }
...
11000:{ rss: 1565204480, heapTotal: 1593723904, heapUsed: 1466889432 }
At this point the fan goes wild and the test actually stops... or at least starts going very slowly.
Does anyone have any tips other than window.close() to get rid of the memory leak (or what sure looks like a memory leak)?
Thanks!
Peter

Using jsdom 0.6.0 to help scrape some data, I ran into the same problem.
window.close only helped slow the memory leak; it still eventually crept up until the process got killed.
Running the script with
node --expose-gc myscript.js
Until they fix the memory leak, manually calling the garbage collector in addition to calling window.close seems to work:
if (process.memoryUsage().heapUsed > 200000000) { // memory use is above 200MB
  global.gc();
}
Stick that after the call to window.close(). Memory use immediately drops back to baseline (around 50 MB for me) every time it gets triggered, with a barely perceptible halt.
update: also consider calling global.gc() multiple times in succession rather than only once (i.e. global.gc();global.gc();global.gc();global.gc();global.gc();)
Calling window.gc() multiple times was more effective (based on my imperfect tests), I suspect because it possibly caused chrome to trigger a major GC event rather than a minor one. - https://github.com/cypress-io/cypress/issues/350#issuecomment-688969443

You are not giving the program any idle time to do garbage collection. I believe you would run into the same problem with any large object graph created repeatedly in a tight loop with no breaks.
This is substantiated by CheapSteaks's answer, which manually forces the garbage collection. There can't be a memory leak in jsdom if that works, since memory leaks by definition prevent the garbage collector from collecting the leaked memory.

I had the same problem with jsdom and switched to cheerio, which is much faster than jsdom and works even after scanning hundreds of sites. Perhaps you should try it, too. The only problem is that it doesn't support all the selectors you can use in jsdom.
Hope it works for you, too.
Daniel

With gulp, watching memory usage, doing cleanup, deleting variables, and calling window.close():
var gb = setInterval(function () {
  // only call if memory use is above 200MB
  if (process.memoryUsage().heapUsed > 200000000) {
    global.gc();
  }
}, 10000); // 10sec

gulp.task('tester', ['clean:raw2'], function () {
  return gulp.src('./raw/*.html')
    .pipe(logger())
    .pipe(map(function (contents, filename) {
      var doc = jsdom.jsdom(contents);
      var window = doc.parentWindow;
      var $ = jquery(window);
      console.log($('title').text());
      var html = window.document.documentElement.outerHTML;
      $(doc).ready(function () {
        console.log('document loaded');
        window.close();
      });
      return html;
    }))
    .pipe(gulp.dest('./raw2'))
    .on('end', onEnd);
});
And I had constantly between 200 MB and 300 MB usage for 7k files; it took 30 minutes.
It might be helpful for someone, as I googled and didn't find anything helpful.

A workaround for this is to run the jsdom-related code in a forked child_process, send back the relevant results when done, and then kill the child_process.

Related

Does v8/Node actually garbage collect during function calls? - Or is this a sailsJS memory leak

I am creating a sailsJS webserver with a background task that needs to run continuously (if the server is idle): a task to synchronize a database with some external data and to pre-cache data to speed up requests.
I am using sails version 1.0. The adapter is postgresql (adapter: 'sails-postgresql'), adapter version: 1.0.0-12.
Now while running this application I noticed a major problem: it seems that after some time the application inexplicably crashes with an out of heap memory error. (I can't even catch this, the node process just quits).
While I tried to hunt for a memory leak I tried many different approaches, and ultimately I can reduce my code to the following function:
async DoRun(runCount = 0, maxCount = undefined) {
  while (maxCount === undefined || runCount < maxCount) {
    this.count += 1;
    runCount += 1;
    console.log(`total run count: ${this.count}`);
    let taskList;
    try {
      this.active = true;
      taskList = await Task.find({}).populate('relatedTasks').populate('notBefore');
      // taskList = await this.makeload();
    } catch (err) {
      console.error(err);
      this.active = false;
      return;
    }
  }
}
To make it "testable" I reduced the heap size the application is allowed to use: --max-old-space-size=100. With this heap size it always crashes at around 2000 runs. However, even with an "unlimited" heap it crashes after a few thousand or tens of thousands of runs.
Now to further test this I commented out the Task.find() command and implemented a dummy that creates the "same" result.
async makeload() {
  const promise = new Promise(resolve => {
    setTimeout(resolve, 10, this);
  });
  await promise;
  const ret = [];
  for (let i = 0; i < 10000; i++) {
    ret.push({
      relatedTasks: [],
      notBefore: [],
      id: 1,
      orderId: 1,
      queueStatus: 'new',
      jobType: 'test',
      result: 'success',
      argData: 'test',
      detail: 'blah',
      lastActive: new Date(),
      updatedAt: Date.now(),
      priority: 2
    });
  }
  return ret;
}
This runs fine (so far) even after 20000 calls, with 90 MB of heap allocated.
What am I doing wrong in the first case? This leads me to believe that sails has a memory leak. Or is node unable to free the database connections somehow?
I can't see anything that is blatantly "leaking" here. As I can see in the log, this.count is not a string, so it's not even leaking there (same for runCount).
How can I progress from this point?
EDIT
Some further clarifications/summary:
I run on node 8.9.0
Sails version 1.0
using sails-postgresql adapter (1.0.0-12) (beta version as other version doesn't work with sails 1.0)
I run with the flag: --max-old-space-size=100
Environment variable: node_env=production
It crashes after approx 2000-2500 runs when in production environment (500 when in debug mode).
I've created a github repository containing a workable example of the code
here. Once again, to see the crash sooner, set the flag --max-old-space-size=80 (or something alike).
I don't know anything about sailsJS, but I can answer the first half of the question in the title:
Does V8/Node actually garbage collect during function calls?
Yes, absolutely. The details are complicated (most garbage collection work is done in small incremental chunks, and as much as possible in the background) and keep changing as the garbage collector is improved. One of the fundamental principles is that allocations trigger chunks of GC work.
The garbage collector does not care about function calls or the event loop.
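A small experiment supports this (a sketch; exact numbers depend on the V8 version). Far more temporary data is allocated than would fit comfortably in the heap, yet heapUsed stays bounded, because the allocations themselves drive collection even with no idle time at all:

```javascript
// Allocate ~1e6 short-lived arrays in a tight synchronous loop. If V8
// never collected during the loop, this would need hundreds of MB; the
// observed peak stays far lower because allocations trigger GC work.
var peak = 0;
for (var i = 0; i < 1e6; i++) {
  var tmp = new Array(100).fill(i); // immediately becomes garbage
  if (i % 100000 === 0) {
    peak = Math.max(peak, process.memoryUsage().heapUsed);
  }
}
console.log('peak heapUsed: ' + Math.round(peak / (1024 * 1024)) + ' MB');
```

The difference from the sails case is that here the garbage is genuinely unreachable between iterations; a query layer that chains results together can keep far more alive.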

Releasing big node module from memory

I'm loading a big JS module into memory and I want to release it when it is no longer needed, to free RAM.
The code I am using to test is like this:
var lex = require('./lex.js'); //Big module (10M array)
setInterval(() => console.log(process.memoryUsage()), 1000);
setTimeout(() => {
lex = null;
delete require.cache[require.resolve('./lex.js')];
}, 5000);
// this script outputs each second
// { rss: 151756800, heapTotal: 131487520, heapUsed: 108843840 }
// { rss: 151760896, heapTotal: 131487520, heapUsed: 108850024 }
// ...
// and after 5 seconds there is no change
After 5 seconds the process is still using the same memory as after the initial module load.
What am I doing wrong? Thank you!
Deleting the require cache entry only means the next require() will load the module again from disk rather than from the cache; by itself it will not delete or free up the memory.

node process memory usage, resident set size constantly increasing

Quoted from What do the return values of node.js process.memoryUsage() stand for?: RSS is the resident set size, the portion of the process's memory held in RAM (how much memory is held in RAM by this process, in bytes).
The file size of the 'text.txt' used in the example is 370KB (378880 bytes).
var http = require('http');
var fs = require('fs');
var express = require('express');
var app = express();

console.log("On app bootstrap = ", process.memoryUsage());

app.get('/test', function (req, res) {
  fs.readFile(__dirname + '/text.txt', function (err, file) {
    console.log("When File is available = ", process.memoryUsage());
    res.end(file);
  });
  setTimeout(function () {
    console.log("After sending = ", process.memoryUsage());
  }, 5000);
});

app.listen(8081);
So on app bootstrap: { rss: 22069248, heapTotal: 15551232, heapUsed: 9169152 }
After I made 10 requests for '/test' the situation is:
When File is available = { rss: 33087488, heapTotal: 18635008, heapUsed: 6553552 }
After sending = { rss: 33447936, heapTotal: 18635008, heapUsed: 6566856 }
So from app bootstrap to the 10th request, rss increased by 11378688 bytes, which is roughly 30 times larger than the size of the text.txt file.
I know that this code buffers the entire text.txt file into memory for every request before writing the result back to clients, but I expected that after a request finished, the memory occupied for 'text.txt' would be released. That does not seem to be the case?
Second, how do I set the maximum amount of RAM the node process can consume?
In JS the garbage collector does not run immediately after your code executes, so memory is not freed immediately afterwards. You can run the GC yourself after working with large objects if you care about memory consumption. More information can be found here.
setTimeout(function () {
  global.gc();
  console.log("After sending = ", process.memoryUsage());
}, 5000);
To look at your memory allocation you can run your server with v8-profiler and get a Heap snapshot. More info here.
Try running your example again and give the process some time to run garbage collection. Keep an eye on your process's memory usage with a system monitor and it should clear after some time; if it doesn't go down, the process still cannot grow beyond the limits mentioned below.
According to the node documentation the memory limit is 512 MB for 32-bit and 1 GB for 64-bit. They can be increased if necessary.

How to find source of memory leak in Node.JS App

I have a memory leak in a Node.js/Express app. The app dies after 3-5 days with the following log message:
FATAL ERROR: JS Allocation failed - process out of memory
I set up a server without users connecting, and it still crashes, so I know the leak originates in the following code, which runs in the background to sync api changes to the db.
poll(config.refreshInterval)

function poll(refreshRate) {
  return apiSync.syncDatabase()
    .then(function () {
      return wait(refreshRate)
    })
    .then(function () {
      return poll(refreshRate)
    })
}

var wait = function wait(time) {
  return new Promise(function (resolve) {
    applog.info('waiting for %s ms..', time)
    setTimeout(function () {
      resolve(true)
    }, time)
  })
}
What techniques are available for profiling the heap to find the source object(s) of what is taking all the memory?
This takes a while to crash, so I would need something that logs, so that I can come back later and analyze.
Is there any option like Java's JVM flag -XX:HeapDumpOnOutOfMemoryError ?
Check out node-memwatch.
It provides a heap diff class:
var hd = new memwatch.HeapDiff();
// your code here ...
var diff = hd.end();
It also has event emitters for leaks:
memwatch.on('leak', function (info) {
  // look at info to find out about what might be leaking
});

How to prevent memory leaks in node.js?

We know node.js provides us with great power but with great power comes great responsibility.
As far as I know the V8 engine doesn't do any garbage collection. So what are the most common mistakes we should avoid to ensure that no memory is leaking from my node server.
EDIT:
Sorry for my ignorance, V8 does have a powerful garbage collector.
As far as I know the V8 engine doesn't do any garbage collection.
V8 has a powerful and intelligent garbage collector built in.
Your main problem is not understanding how closures maintain a reference to the scope and context of outer functions. This means there are various ways you can create circular references or otherwise create variables that just do not get cleaned up.
This happens because your code is ambiguous and the compiler cannot tell whether it is safe to garbage collect it.
A way to force the GC to pick up data is to null your variables.
function (foo, cb) {
  var bigObject = new BigObject();
  doFoo(foo).on("change", function (e) {
    if (e.type === bigObject.type) {
      cb();
      bigObject = null;
    }
  });
}
How does V8 know whether it is safe to garbage collect bigObject when it's captured in an event handler? It doesn't, so you need to tell it the object is no longer used by setting the variable to null.
Various articles to read:
http://www.ibm.com/developerworks/web/library/wa-memleak/
I wanted to convince myself of the accepted answer, specifically:
not understanding how closures maintain a reference to scope and context of outer functions.
So I wrote the following code to demonstrate how variables can fail to be cleaned up, which people may find of interest.
If you have watch -n 0.2 'ps -o rss $(pgrep node)' running in another terminal you can watch the leak occurring. Note how uncommenting either the buffer = null line or the process.nextTick wrapper allows the process to complete:
(function () {
  "use strict";
  var fs = require('fs'),
    iterations = 0,
    work = function (callback) {
      var buffer = '',
        i;
      console.log('Work ' + iterations);
      for (i = 0; i < 50; i += 1) {
        buffer += fs.readFileSync('/usr/share/dict/words');
      }
      iterations += 1;
      if (iterations < 100) {
        // buffer = null;
        // process.nextTick(function () {
        work(callback);
        // });
      } else {
        callback();
      }
    };
  work(function () {
    console.log('Done');
  });
}());
Activate garbage collection with:
node --expose-gc test.js
and use it with:
global.gc();
Happy Coding :)
