I've been learning about memory management in Node.js and I'm trying to understand why the following two behaviors occur.
PS: I'm using these utility functions to help me print memory usage to the console:
function toMb (bytes) {
  return (bytes / 1000000).toFixed(2);
}

function printMemoryData() {
  const memory = process.memoryUsage();
  return {
    rss: `${toMb(memory.rss)} MB -> Resident Set Size - total memory allocated for the process execution`,
    heapTotal: `${toMb(memory.heapTotal)} MB -> total size of the allocated heap`,
    heapUsed: `${toMb(memory.heapUsed)} MB -> actual memory used during the execution`,
    external: `${toMb(memory.external)} MB -> V8 external memory`,
  };
}
Part 1) fs.readFile with encoding vs buffers
When I do:
const fs = require('fs');

let data;
fs.readFile('path/to/500MB', {}, function (err, buffer) {
  data = buffer;
  console.log('Memory usage after files read:', printMemoryData());
});
I get the following output:
Memory usage after files read: {
rss: '565.22 MB -> Resident Set Size - total memory allocated for the process execution',
heapTotal: '11.01 MB -> total size of the allocated heap',
heapUsed: '5.66 MB -> actual memory used during the execution',
external: '524.91 MB -> V8 external memory'
}
Even though I'm storing the data in a local data variable/V8 object, the heap isn't used.
But when I add the encoding:
fs.readFile('path/to/500MB', { encoding: 'utf-8' }, function (err, buffer) {
  console.log('Memory usage after files read:', printMemoryData());
});
I get the following output:
Memory usage after files read: {
rss: '1088.71 MB -> Resident Set Size - total memory allocated for the process execution',
heapTotal: '535.30 MB -> total size of the allocated heap',
heapUsed: '529.95 MB -> actual memory used during the execution',
external: '524.91 MB -> V8 external memory'
}
Why does the heap get used here but not in the first function call without an encoding? I don't even have to store the result in a local variable for the heap to be used. I also understand that after the next event loop tick in the second example the heap will be cleaned up (a minimal way to check that is sketched below). But this leads me to my next question in Part 2.
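(For illustration, one way to check the cleanup on a later tick, reusing printMemoryData from above — whether the heap has actually dropped by then depends on when V8 decides to run a GC:)

const fs = require('fs');

fs.readFile('path/to/500MB', { encoding: 'utf-8' }, function (err, string) {
  if (err) throw err;
  console.log('Right after read:', printMemoryData());
  // "string" is not referenced after this callback returns, so it becomes collectable.
  setImmediate(function () {
    console.log('On a later tick:', printMemoryData());
  });
});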
Part 2) This part is the same as part 1 but with streams.
const readStream = fs.createReadStream('path/to/500MB');
let data = '';

readStream.on('data', (buffer) => {
  data += buffer;
});

readStream.on('close', () => {
  console.log(printMemoryData());
});
I get the output:
{
rss: '574.57 MB -> Resident Set Size - total memory allocated for the process execution',
heapTotal: '692.75 MB -> total size of the allocated heap',
heapUsed: '508.72 MB -> actual memory used during the execution',
external: '7.97 MB -> V8 external memory'
}
Why does the heap get used with streams in Part 2 but not in the first call without an encoding in Part 1?
Both show an increase in RSS, but only with the streams does the heap get used when I store the data in a local variable/V8 object.
Thanks for any feedback.
I want to determine how large an array will be in memory when my function executes. Determining the size of the array is easy, but I'm not seeing a correlation between the size of my array and the Max Memory Used that gets recorded at the end of a Lambda execution.
There is no apparent correlation after inspecting process.memoryUsage() before and after setting the array, nor with the Max Memory Used reported by the Lambda. I can't find a good resource that explains how/what Lambda actually uses to determine the memory used. Any help would be appreciated.
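(For reference, a sketch of the kind of before/after comparison I mean — the array size and its contents here are just placeholders:)

exports.handler = async (event) => {
  const before = process.memoryUsage();

  // Placeholder array; in the real function the size and contents come from the actual workload.
  const arr = new Array(1000000).fill('x');

  const after = process.memoryUsage();
  console.log('heapUsed delta (MB):', ((after.heapUsed - before.heapUsed) / 1024 / 1024).toFixed(1));
  console.log('rss delta (MB):', ((after.rss - before.rss) / 1024 / 1024).toFixed(1));
  return arr.length; // keep a reference so the array stays alive until the second sample
};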
This question made me curious myself, so I decided to run some tests to see how memory allocation works inside an AWS Lambda container.
Test 1: Create array with 100,000 elements in memory
Memory size: 128MB
exports.handler = async (event) => {
  const arr = [];
  for (let i = 0; i < 100000; i++) {
    arr.push(i);
  }
  console.log(process.memoryUsage());
  return 'done';
};
Result: 56 MB
2019-04-30T01:00:59.577Z cd473d5b-986c-436e-8b36-b114410c84cf { rss: 35299328, heapTotal: 11853824, heapUsed: 7590320, external: 8224 }
REPORT RequestId: 2a7548f9-5d2f-4060-8f9e-deb228730d8c Duration: 155.74 ms Billed Duration: 200 ms Memory Size: 128 MB Max Memory Used: 56 MB
Test 2: Create array with 1,000,000 elements in memory
Memory size: 128MB
exports.handler = async (event) => {
  const arr = [];
  for (let i = 0; i < 1000000; i++) {
    arr.push(i);
  }
  console.log(process.memoryUsage());
  return 'done';
};
Result: 99 MB
2019-04-30T01:03:44.582Z 547a9de8-35f7-48e2-a53f-ab669b188f9a { rss: 80093184, heapTotal: 55263232, heapUsed: 52951088, external: 8224 }
REPORT RequestId: 547a9de8-35f7-48e2-a53f-ab669b188f9a Duration: 801.68 ms Billed Duration: 900 ms Memory Size: 128 MB Max Memory Used: 99 MB
Test 3: Create array with 10,000,000 elements in memory
Memory size: 128MB
exports.handler = async (event) => {
  const arr = [];
  for (let i = 0; i < 10000000; i++) {
    arr.push(i);
  }
  console.log(process.memoryUsage());
  return 'done';
};
Result: 128 MB
REPORT RequestId: f1df4f39-e0fc-4b44-8f90-c3c0e3d9c12d Duration: 3001.33 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 128 MB
2019-04-30T00:54:32.970Z f1df4f39-e0fc-4b44-8f90-c3c0e3d9c12d Task timed out after 3.00 seconds
I think we can pretty confidently say that the memory used by the Lambda container does go up based on the size of an array in memory; in our third test we ended up maxing out our memory and timing out. My assumption here is that the process that controls the execution of the Lambda also monitors how much memory that execution acquires, likely by reading cat /proc/meminfo as trognanders suggests.
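(Just as back-of-the-envelope arithmetic on the numbers above — not a claim about V8 internals — the heapUsed growth between Test 1 and Test 2 works out to roughly 50 bytes per pushed element, which is another way to relate array size to reported memory:)

// Back-of-the-envelope: heapUsed difference between Test 2 and Test 1, per extra element.
const heapUsedTest1 = 7590320;   // from Test 1 (100,000 elements)
const heapUsedTest2 = 52951088;  // from Test 2 (1,000,000 elements)
const bytesPerElement = (heapUsedTest2 - heapUsedTest1) / (1000000 - 100000);
console.log(bytesPerElement.toFixed(1) + ' bytes per pushed element'); // ≈ 50.4, including array growth slack and other allocations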
Okay, so I used the following code and increased the number of array elements to get a correlation. Three tests were done on each max value of the array. Lambda was set at 1024 MB. Each array element is 10 chars/bytes long.
const util = require('util');
const exec = util.promisify(require('child_process').exec);

async function GetContainerUsage() {
  const { stdout, stderr } = await exec('cat /proc/meminfo');
  // console.log(stdout);
  let memInfoSplits = stdout.split(/[\n: ]/).filter(val => val.trim());
  // console.log(memInfoSplits[19]); // This returns the "Active" value, which seems to be what is used
  return Math.round(memInfoSplits[19] / 1024);
}

function GetMemoryUsage() {
  const used = process.memoryUsage();
  for (let key in used)
    used[key] = Math.round(used[key] / 1024 / 1024);
  return used;
}

exports.handler = async (event, context) => {
  let max = event.ArrTotal;
  let arr = [];
  for (let i = 0; i < max; i++) {
    arr.push("1234567890"); // 10 bytes
  }

  let csvLine = [];
  let jsMemUsed = GetMemoryUsage();
  let containerMemUsed = await GetContainerUsage();

  csvLine.push(event.ArrTotal);
  csvLine.push(jsMemUsed.rss);
  csvLine.push(jsMemUsed.heapTotal);
  csvLine.push(jsMemUsed.heapUsed);
  csvLine.push(jsMemUsed.external);
  csvLine.push(containerMemUsed);

  console.log(csvLine.join(','));
  return true;
};
This produced the following values, used as the CSV below (the Lambda reported usage column is taken from the REPORT line):
Array Count, JS rss, JS heapTotal, JS heapUsed, external, System Active, Lambda reported usage
1,30,7,5,0,53,54
1,31,7,5,0,53,55
1,30,8,5,0,53,55
1000,30,8,5,0,53,55
1000,30,8,5,0,53,55
1000,30,8,5,0,53,55
10000,30,8,5,0,53,55
10000,31,8,6,0,54,56
10000,33,7,5,0,54,57
100000,32,12,7,0,56,57
100000,34,11,8,0,57,59
100000,36,12,10,0,59,61
1000000,64,42,39,0,88,89
1000000,60,36,34,0,84,89
1000000,60,36,34,0,84,89
10000000,271,248,244,0,294,297
10000000,271,248,244,0,295,297
10000000,271,250,244,0,295,297
Which, if graphed, shows a clear linear trend.
So at 10 million elements the array should be about 10 million * 10 bytes = 100 MB. There must be some overhead I am missing somewhere, as about 200 MB is used elsewhere. But at least there is a clear linear correlation, which I can now work with.
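(A quick sanity check on the table above, treating the 1-element runs as the baseline, puts the measured cost at roughly 25 bytes per element rather than the assumed 10, which accounts for a good part of that overhead:)

// Per-element cost implied by the rss column above (values in MB).
const rssBaselineMb = 30;  // rss for the 1-element runs
const rssAt10MMb = 271;    // rss for the 10,000,000-element runs
const bytesPerElement = ((rssAt10MMb - rssBaselineMb) * 1024 * 1024) / 10000000;
console.log(bytesPerElement.toFixed(1) + ' bytes per element'); // ≈ 25.3, vs the assumed 10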
Capacity specification for FaaS vs PaaS
The whole idea of doing computing with Lambda functions (FaaS) is to worry as little as possible about capacity planning. Still, since it's not possible for the cloud provider to default every choice, memory settings and timeouts are some of the settings AWS uses to configure the function. Apparently, if you test it out, you may see that the memory setting does not just determine the memory but also the CPU compute capacity. This is as quoted by AWS:
Lambda allocates CPU power linearly in proportion to the amount of memory configured. At 1,792 MB, a function has the equivalent of 1 full vCPU (one vCPU-second of credits per second)
Ref https://docs.aws.amazon.com/lambda/latest/dg/resource-model.html
Hence, it's not enough to consider only the runtime memory footprint; the CPU speed with which the function executes and finishes also matters.
AWS does not call out what capacity, CPU, memory, server type, or IOPS they use in these containers, nor do they show that usage in any CloudWatch metrics the way they do for an EC2 instance.
Hence we need to choose the memory setting based on testing.
Each Lambda (Node.js) will have its own memory footprint and dedicated set of node module dependencies. Hence, each one needs to be load- and performance-tested to tune its memory and timeout settings; this cannot be planned upfront.
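(A minimal way to collect such per-function numbers, assuming the standard Node.js Lambda context object, is to log the usage next to the configured limit at the end of the handler:)

exports.handler = async (event, context) => {
  // ... the function's actual work goes here ...

  const used = process.memoryUsage();
  console.log(JSON.stringify({
    configuredMb: context.memoryLimitInMB,               // the memory setting of this function
    rssMb: Math.round(used.rss / 1024 / 1024),           // what the process is actually holding
    heapUsedMb: Math.round(used.heapUsed / 1024 / 1024),
    remainingMs: context.getRemainingTimeInMillis()      // headroom left before the configured timeout
  }));
  return 'done';
};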
General research observation
With any standard Node.js based Lambda function that has logging and just does hello world, deployed without a VPC:
128 MB may show an execution time of, say, 150+ ms and a billing of 200 ms for 128 MB
256 MB may show an execution time of, say, 80+ ms and a billing of 100 ms for 256 MB
A lower memory setting does not necessarily mean lower cost, so fine tuning based on load and performance tests is the best way to determine the memory setting that should be used.
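(To make that concrete with the hello-world numbers above: billed compute is roughly memory in GB times billed duration in seconds, so the 256 MB configuration is not more expensive here even though the setting is double. The per-GB-second price below is only an illustrative placeholder; check current regional pricing:)

// Billed compute cost ~ memory (GB) * billed duration (s) * price per GB-second (+ per-request fee).
const PRICE_PER_GB_SECOND = 0.0000166667; // illustrative placeholder, varies by region

function lambdaComputeCost(memoryMb, billedMs) {
  return (memoryMb / 1024) * (billedMs / 1000) * PRICE_PER_GB_SECOND;
}

console.log(lambdaComputeCost(128, 200)); // 0.025 GB-s worth of compute
console.log(lambdaComputeCost(256, 100)); // also 0.025 GB-s, but the function finished faster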
Attributes like timeout are purely based on how long the function takes to complete its activity, which can be much higher for batch job operations (say 10 minutes) than for a web service that expects a quick response (say 10 seconds). Timing out earlier instead of waiting on long-pending dependencies is important to avoid high billing in the case of high-throughput APIs. For an API, a slow timeout can cause additional containers (function instances) to spin up to scale for new requests, which can also impact the number of IPs being allocated within the subnet that hosts the function (if the function runs within a VPC).
Lambda limits on ENIs and IPs, as well as the maximum Lambda concurrency within an account/region, are important factors to consider while planning for capacity.
Ref https://docs.aws.amazon.com/lambda/latest/dg/limits.html
I'm trying to identify what's slowing down a DB connection. I've narrowed it down to what is probably a memory leak. Following the instructions in this guide, I've set up a heap profiling function to run at intervals throughout the program. Essentially like this:
setInterval(function () { heapingFunction(); }, 100);

// some code

const pgClient = new pg.Client(dbConfig);

app.listen(port, function (err) {
  if (err) {
    logger.debug(err);
  } else {
    logger.info("listening at " + port + " port");
  }
});

pgClient.connect()
  .then(function (connection) {
    logger.info("database connect");
    console.log("database connect");
    return pgClient.query("query");
  })
  .then(function (result) { /* queries */ });
When I run this, instead of consistent heap snapshots at 0.1 s intervals, I get a huge jump:
info: listening at 3001 port
Program is using 39904440 bytes of Heap.
Program is using 39927960 bytes of Heap.
Program is using 40055272 bytes of Heap.
Program is using 40086448 bytes of Heap.
info: database connect
database connect
Program is using 206523904 bytes of Heap.
Program is using 206546224 bytes of Heap.
Program is using 206665472 bytes of Heap.
Program is using 206874608 bytes of Heap.
Program is using 206929464 bytes of Heap.
It works as expected until the heap snapshot just before info: database connect. There, it stops until the DB connects (~5 min). As you can see, it's using 5x as much memory once it resumes (it's also much slower). It would be more useful to have snapshots during this time period, not just before and after. What's going on here? Is the memory leak so severe that setInterval can't even run?
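(For illustration, one way to check whether the interval is simply being starved by a blocked event loop, independent of the pg code above, is to log how late each tick fires:)

// Logs whenever an interval callback runs noticeably later than its 100 ms schedule.
let last = Date.now();
setInterval(function () {
  const now = Date.now();
  const lagMs = now - last - 100;
  if (lagMs > 50) {
    console.log('Event loop was blocked for roughly ' + lagMs + ' ms');
  }
  last = now;
}, 100);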
I use node-memwatch to monitor the memory usage of a Node application. The simplified code is below:
// file: test.js
var memwatch = require('memwatch');
var util = require('util');

var leak = [];
setInterval(function () {
  leak.push(new Error("leak string"));
}, 1);

memwatch.on('stats', function (stats) {
  console.log('MEM watch: ' + JSON.stringify(stats));
  console.log('Process: ' + util.inspect(process.memoryUsage()));
});
Running 'node test.js', I get the output below.
MEM watch: {"num_full_gc":1,"num_inc_gc":6,"heap_compactions":1,"usage_trend":0,"estimated_base":8979176,"current_base":8979176,"min":0,"max":0}
Process: { rss: 28004352, heapTotal: 19646208, heapUsed: 9303856 }
Does anyone know what estimated_base and current_base mean? On the page https://github.com/lloyd/node-memwatch they are not described in detail.
Regards,
Jeffrey
Memwatch splits its results into two periods: the RECENT_PERIOD, which covers 10 consecutive GCs, and the ANCIENT_PERIOD, which covers 120 consecutive GCs.
estimated_base = the heap size after 10 consecutive GCs have been executed (i.e. over the RECENT_PERIOD).
current_base = the heap size measured right after a GC.
base min = the minimum value recorded for the heap size in the given period.
base max = the maximum value recorded for the heap size in the given period.
If you follow the link above (https://github.com/lloyd/node-memwatch) you can check out the Memwatch code.
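(As a rough illustration of the idea only — this is not memwatch's actual implementation — current_base is a single post-GC sample, while estimated_base is derived from the last 10 of them:)

// Toy sketch: track the last 10 post-GC heap sizes and derive a smoothed base from them.
const recentPostGcHeapSizes = [];

function onPostGcSample(heapSizeBytes) {
  const currentBase = heapSizeBytes;               // analogous to current_base
  recentPostGcHeapSizes.push(heapSizeBytes);
  if (recentPostGcHeapSizes.length > 10) {
    recentPostGcHeapSizes.shift();                 // keep only a RECENT_PERIOD-sized window
  }
  const estimatedBase = Math.round(
    recentPostGcHeapSizes.reduce((sum, v) => sum + v, 0) / recentPostGcHeapSizes.length
  );                                               // analogous to estimated_base
  return { currentBase, estimatedBase };
}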
I found a few references to people having a similar issue where the answer was always: make sure you call window.close() when done. However, that does not seem to be working for me (Node 0.8.14 and jsdom 0.3.1).
A simple repro
var util = require('util');
var jsdom = require('jsdom');

function doOne() {
  var htmlDoc = '<html><head></head><body id="' + i + '"></body></html>';
  jsdom.env(htmlDoc, null, null, function (errors, window) {
    window.close();
  });
}

for (var i = 1; i < 100000; i++) {
  doOne();
  if (i % 500 == 0) {
    console.log(i + ":" + util.inspect(process.memoryUsage()));
  }
}
console.log("done");
Output I get is
500:{ rss: 108847104, heapTotal: 115979520, heapUsed: 102696768 }
1000:{ rss: 198250496, heapTotal: 194394624, heapUsed: 190892120 }
1500:{ rss: 267304960, heapTotal: 254246912, heapUsed: 223847712 }
...
11000:{ rss: 1565204480, heapTotal: 1593723904, heapUsed: 1466889432 }
At this point the fan goes wild and the test actually stops... or at least starts going very slowly.
Does anyone have any tips other than window.close to get rid of the memory leak (or what sure looks like a memory leak)?
Thanks!
Peter
I was using jsdom 0.6.0 to help scrape some data and ran into the same problem.
window.close only helped slow the memory leak; it still eventually crept up until the process got killed.
Run the script with
node --expose-gc myscript.js
Until they fix the memory leak, manually calling the garbage collector in addition to calling window.close seems to work:
if (process.memoryUsage().heapUsed > 200000000) { // memory use is above 200MB
  global.gc();
}
Stick that after the call to window.close. Memory use immediately drops back to baseline (around 50 MB for me) every time it gets triggered, with a barely perceptible halt.
Update: also consider calling global.gc() multiple times in succession rather than only once (i.e. global.gc(); global.gc(); global.gc(); global.gc(); global.gc();):
Calling window.gc() multiple times was more effective (based on my imperfect tests), I suspect because it possibly caused chrome to trigger a major GC event rather than a minor one. - https://github.com/cypress-io/cypress/issues/350#issuecomment-688969443
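(For convenience, those repeated calls can be wrapped in a tiny helper — plain Node, and it still requires running with --expose-gc:)

// Run several GC passes in a row; a no-op unless the script was started with --expose-gc.
function forceFullGc(passes) {
  if (typeof global.gc !== 'function') return;
  for (var i = 0; i < (passes || 5); i++) {
    global.gc();
  }
}

// e.g. right after window.close():
// forceFullGc();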
You are not giving the program any idle time to do garbage collection. I believe you would run into the same problem with any large object graph created many times in a tight loop with no breaks.
This is substantiated by CheapSteaks's answer, which manually forces garbage collection. There can't be a memory leak in jsdom if that works, since memory leaks by definition prevent the garbage collector from collecting the leaked memory.
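(A sketch of what "giving it idle time" can look like for the loop in the question: process the documents in batches and yield back to the event loop with setImmediate in between — doOne is assumed to be adjusted to take the index as a parameter instead of reading the loop variable:)

function processBatch(start, batchSize, total) {
  var end = Math.min(start + batchSize, total);
  for (var i = start; i < end; i++) {
    doOne(i); // doOne adjusted to accept the index
  }
  console.log(end + ":" + util.inspect(process.memoryUsage()));
  if (end < total) {
    // Returning to the event loop here gives V8 a chance to run garbage collection.
    setImmediate(function () {
      processBatch(end, batchSize, total);
    });
  }
}

processBatch(1, 500, 100000);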
I had the same problem with jsdom and switched to cheerio, which is much faster than jsdom and works even after scanning hundreds of sites. Perhaps you should try it too. The only problem is that it doesn't have all the selectors you can use in jsdom.
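(For reference, a minimal cheerio version of the kind of lookup the repro does with jsdom — the HTML string here is just a placeholder:)

var cheerio = require('cheerio');

var html = '<html><head><title>example</title></head><body id="1"></body></html>';
var $ = cheerio.load(html);        // parses the markup without building a full DOM/window
console.log($('title').text());    // -> "example"
console.log($('body').attr('id')); // -> "1"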
Hope it works for you, too.
Daniel
With gulp: memory usage, cleanup, variable delete, window.close()
var gb = setInterval(function () {
  // only call if memory use is above 200MB
  if (process.memoryUsage().heapUsed > 200000000) {
    global.gc();
  }
}, 10000); // 10 sec

gulp.task('tester', ['clean:raw2'], function () {
  return gulp.src('./raw/*.html')
    .pipe(logger())
    .pipe(map(function (contents, filename) {
      var doc = jsdom.jsdom(contents);
      var window = doc.parentWindow;
      var $ = jquery(window);
      console.log($('title').text());
      var html = window.document.documentElement.outerHTML;
      $(doc).ready(function () {
        console.log("document loaded");
        window.close();
      });
      return html;
    }))
    .pipe(gulp.dest('./raw2'))
    .on('end', onEnd);
});
I constantly had between 200 MB and 300 MB of usage for 7k files; it took 30 minutes.
It might be helpful for someone, as I googled and didn't find anything helpful.
A workaround for this is to run the jsdom-related code in a forked child_process, send back the relevant results when done, and then kill the child_process.
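(A sketch of that workaround with hypothetical file names: the parent forks a worker, receives the scraped result over IPC, and then kills the child so everything jsdom allocated is returned to the OS:)

// ---- parent.js (hypothetical name) ----
var fork = require('child_process').fork;

var child = fork('./jsdom-worker.js');
child.send({ html: '<html><head></head><body id="1"></body></html>' });
child.on('message', function (result) {
  console.log('scraped:', result);
  child.kill(); // all of the worker's memory goes away with the process
});

// ---- jsdom-worker.js (hypothetical name) ----
var jsdom = require('jsdom');

process.on('message', function (msg) {
  jsdom.env(msg.html, function (errors, window) {
    process.send({ bodyId: window.document.body.id });
    window.close();
  });
});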