I am processing image files in AWS Lambda with GraphicsMagick/ImageMagick and Node.js. Some of the files are > 200 MB in size, which causes the Lambda function to hit its memory limit. I have set the maximum memory of 1.5 GB.
The log file displays:
REPORT RequestId: xxx Duration: 23200.51 ms Billed Duration: 23300 ms Memory Size: 1536 MB Max Memory Used: 1536 MB
Code:
async.series([
    function getOriginalSize(p_next) {
        // size
        gm(s3_img.Body).size(function (err, size) {
            if (err) {
                return p_next(err); // propagate the error so the series doesn't hang
            }
            width_orig = size.width;
            height_orig = size.height;
            p_next(null, 'getOriginalSize');
        });
    },
    function identify(p_next) {
        gm(s3_img).flatten(); // note: this result is never used or awaited
        gm(s3_img.Body).identify(function (err, id_info) {
            // THIS IS WHERE THE FOLLOWING ERROR OCCURS:
            // { [Error: Command failed: ] code: null, signal: 'SIGKILL' }
            ...
            ...
            ...
        });
    }
]);
I have not found an answer to this and would be grateful for any tips or comments.
I've been learning about memory management in Node.js and I'm trying to understand why the following two behaviors occur:
PS: I'm using the following utility functions to help me print memory to console:
function toMb (bytes) {
    return (bytes / 1000000).toFixed(2);
}

function printMemoryData() {
    const memory = process.memoryUsage();
    return {
        rss: `${toMb(memory.rss)} MB -> Resident Set Size - total memory allocated for the process execution`,
        heapTotal: `${toMb(memory.heapTotal)} MB -> total size of the allocated heap`,
        heapUsed: `${toMb(memory.heapUsed)} MB -> actual memory used during the execution`,
        external: `${toMb(memory.external)} MB -> V8 external memory`,
    };
}
Part 1) fs.readFile with encoding vs buffers
When I do:
let data;
fs.readFile('path/to/500MB', {}, function(err, buffer) {
    data = buffer;
    console.log('Memory usage after files read:', printMemoryData());
});
I get the following output:
Memory usage after files read: {
rss: '565.22 MB -> Resident Set Size - total memory allocated for the process execution',
heapTotal: '11.01 MB -> total size of the allocated heap',
heapUsed: '5.66 MB -> actual memory used during the execution',
external: '524.91 MB -> V8 external memory'
}
Even though I'm storing the data in a local data variable (a V8 object), the heap isn't used.
But when I do add the encoding:
fs.readFile('path/to/500MB', {encoding: 'utf-8'}, function(err, string) { // with an encoding the callback receives a string, not a Buffer
    console.log('Memory usage after files read:', printMemoryData());
});
I get the following output:
Memory usage after files read: {
rss: '1088.71 MB -> Resident Set Size - total memory allocated for the process execution',
heapTotal: '535.30 MB -> total size of the allocated heap',
heapUsed: '529.95 MB -> actual memory used during the execution',
external: '524.91 MB -> V8 external memory'
}
Why does the heap get used here but not in the first call without an encoding? I don't even have to store the result in a local variable for the heap to be used. I also understand that after the next event-loop tick in the second example the heap will be cleaned up, but this leads me to my next question in Part 2.
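A small self-contained sketch of what I mean (sizes are arbitrary; the point is where the bytes land): the Buffer itself is accounted as external memory, and it is the decode to a string that allocates on the V8 heap.

```javascript
// Buffer bytes live in "external" memory; toString() copies them onto the heap.
const before = process.memoryUsage();
const buf = Buffer.alloc(20 * 1024 * 1024, 'a'); // 20 MB of ASCII bytes (external)
const afterAlloc = process.memoryUsage();
const str = buf.toString('utf-8');               // decode: a ~20 MB heap string
const afterDecode = process.memoryUsage();

console.log('external growth (MB):', ((afterAlloc.external - before.external) / 1e6).toFixed(1));
console.log('heap growth (MB):', ((afterDecode.heapUsed - afterAlloc.heapUsed) / 1e6).toFixed(1));
console.log(str.length === buf.length); // true: same number of bytes, different home
```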
Part 2) This part is the same as part 1 but with streams.
const readStream = fs.createReadStream('path/to/500MB');
let data = ''; // initialize so the first += doesn't concatenate 'undefined'
readStream.on('data', (buffer) => {
    data += buffer; // implicitly decodes each Buffer chunk into a string
});
readStream.on('close', () => {
    console.log(printMemoryData());
});
I get the output:
{
rss: '574.57 MB -> Resident Set Size - total memory allocated for the process execution',
heapTotal: '692.75 MB -> total size of the allocated heap',
heapUsed: '508.72 MB -> actual memory used during the execution',
external: '7.97 MB -> V8 external memory'
}
Why do the streams in Part 2 use the heap, while the first call without an encoding in Part 1 does not?
Both show an increase in RSS, but only with streams does the heap get used when I store the data in a local variable (a V8 object).
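A minimal sketch of what I suspect is going on in Part 2 (chunk sizes are arbitrary): `data += buffer` implicitly calls toString() on every chunk and copies it onto the heap, whereas collecting Buffers keeps the bytes in external memory.

```javascript
const chunks = [];
let asString = '';
for (let i = 0; i < 16; i++) {
    const chunk = Buffer.alloc(64 * 1024, 'x'); // simulate a 64 KB stream chunk
    chunks.push(chunk); // no decode: bytes stay in external memory
    asString += chunk;  // implicit chunk.toString(): heap allocation + copy
}
const combined = Buffer.concat(chunks); // still external, a single copy
console.log(combined.length, asString.length); // 1048576 1048576
```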
Thanks for any feedback.
I am using the Alexa Node.js SDK to implement a skill. On session start (the LaunchRequest intent), I want to store some variables in the session attributes. As per the blog here, I am using this.attributes.key to store the session attributes.
const handlers = {
    'LaunchRequest': function () {
        database.startSession()
            .then(data => {
                // console.log(data); // data does have the token
                this.attributes.token = data.token;
                // this.attributes['token'] = data.token; // tried this too
                this.emit(':ask', responses.launch, responses.launchReprompt);
            })
            .catch(err => {
                console.error(err);
                this.emit(':ask', responses.error);
            });
    },
    // ... more handlers
};
However, on the launch command I get this error:
There was a problem with the requested skill's response
I see no error in the logs.
This is my response (as visible in the Alexa test developer console):
{
    "body": {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": "<speak> Ok, exiting App. </speak>"
            },
            "shouldEndSession": true
        },
        "sessionAttributes": {},
        "userAgent": "ask-nodejs/1.0.25 Node/v8.10.0"
    }
}
As per here, sessionAttributes should contain what I set as session variables using this.attributes, but it is somehow empty.
How can I resolve this?
Edit: If I comment out the this.attributes line, I get the welcome message correctly.
This is my startSession function, in case it's helpful.
async function startSession() {
    return {
        token: await getToken(),
        // ... more attributes
    };
}
Edit 2: A very weird thing I noticed: if I just do this.attributes.token = "foobar", the session attribute gets set correctly. So I am assuming there is a problem with my async function. Note that console.log(data) still prints the data correctly, with the token attribute.
Edit 3: CloudWatch logs
START RequestId: Version: $LATEST
2018-08-15T14:00:47.639Z Warning: Application ID is not set
END RequestId: REPORT RequestId: Duration: 315.05 ms Billed Duration: 400 ms Memory Size: 128 MB Max Memory Used: 73 MB
START RequestId: Version: $LATEST
2018-08-15T14:00:47.749Z Warning: Application ID is not set
2018-08-15T14:00:48.564Z { token: 'token', filter: 'foobar' }
END RequestId: REPORT RequestId: Duration: 849.98 ms Billed Duration: 900 ms Memory Size: 128 MB Max Memory Used: 74 MB
START RequestId: Version: $LATEST
2018-08-15T14:00:49.301Z Warning: Application ID is not set
END RequestId: REPORT RequestId: Duration: 0.72 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 74 MB
We found that the maximum size of the response object is 24 KB (reference1, reference2, reference3).
My data was well over 24 KB, so the session attributes were not stored and the skill fell through to the exit intent. The solution is to store the data in a database such as DynamoDB.
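A sketch of a guard that would have caught this earlier (the helper and the constant are illustrative, not part of the Alexa SDK): measure the serialized size of the attributes before attaching them, and persist to a database instead when they are too big.

```javascript
const MAX_SESSION_BYTES = 24 * 1024; // the observed ~24 KB response limit

function fitsInSession(attributes) {
    return Buffer.byteLength(JSON.stringify(attributes), 'utf-8') <= MAX_SESSION_BYTES;
}

console.log(fitsInSession({ token: 'abc' }));            // true: tiny payload
console.log(fitsInSession({ blob: 'x'.repeat(30000) })); // false: store it in DynamoDB instead
```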
Special credits to Will.
We have created a Node.js-based Lambda function named execInspector which is triggered once a day. The function is based on the AWS Lambda blueprint "inspector-scheduled-run" in Node.js.
The problem is that the scheduled job fails randomly, one day or another. We get only the logs below from the CloudWatch log stream.
In a given week it runs roughly 4 or 5 times and fails on the remaining days. Based on the log it consumes very little memory and time for its execution, so we are not sure why it fails randomly. It also retries itself 3 times before getting killed.
From the log below we can also see that the job uses about 35 MB and takes about 60 seconds on average. We tried changing the Node.js runtime and increasing the memory and timeout well beyond these limits, but nothing worked.
Can you please suggest alternate approaches to handle these failures automatically, and does anyone have insight into why it's happening?
Additional Inputs:
We have already set the maximum timeout of 5 minutes as well, but it still fails with "timed out after 300 secs.".
To be clear: the task of triggering the Inspector run takes less than 30 seconds on average. Since it's a PaaS-based solution I can't expect it to always complete within 30 seconds, but 60 seconds should be more than enough for a job that usually finishes within 30.
Sample CloudWatch log from a successful run:
18:01:00
START RequestId: 12eb468a-4174-11e7-be7b-6d0faaa584aa Version: $LATEST
18:01:03
2017-05-25T18:01:02.935Z 12eb468a-4174-11e7-be7b-6d0faaa584aa { assessmentRunArn: 'arn:aws:inspector:us-east-1:102461617910:target/0-Ly60lmEP/template/0-POpZxSLA/run/0-MMx30fLl' }
18:01:03
END RequestId: 12eb468a-4174-11e7-be7b-6d0faaa584aa
18:01:03
REPORT RequestId: 12eb468a-4174-11e7-be7b-6d0faaa584aa Duration: 2346.37 ms Billed Duration: 2400 ms Memory Size: 128 MB Max Memory Used: 33 MB
CloudWatch log from a failed run:
A log similar to the one below is repeated 3 times, which appears to be a retry attempt:
06:32:52
START RequestId: 80190395-404a-11e7-845d-1f88a00ed4f3 Version: $LATEST
06:32:56
2017-05-24T06:32:55.942Z 80190395-404a-11e7-845d-1f88a00ed4f3 Execution Started...
06:33:52
END RequestId: 80190395-404a-11e7-845d-1f88a00ed4f3
06:33:52
REPORT RequestId: 80190395-404a-11e7-845d-1f88a00ed4f3 Duration: 60000.88 ms Billed Duration: 60000 ms Memory Size: 128 MB Max Memory Used: 32 MB
06:33:52
2017-05-24T06:33:52.437Z 80190395-404a-11e7-845d-1f88a00ed4f3 Task timed out after 60.00 seconds
Lambda code:
'use strict';

/**
 * A blueprint to schedule a recurring assessment run for an Amazon Inspector assessment template.
 *
 * This blueprint assumes that you've already done the following:
 * 1. onboarded with the Amazon Inspector service https://aws.amazon.com/inspector
 * 2. created an assessment target - what hosts you want to assess
 * 3. created an assessment template - how you want to assess your target
 *
 * Then, all you need to do to use this blueprint is to define an environment variable in the Lambda console called
 * `assessmentTemplateArn` and provide the template arn you want to run on a schedule.
 */
const AWS = require('aws-sdk');

const inspector = new AWS.Inspector();

const params = {
    assessmentTemplateArn: process.env.assessmentTemplateArn,
};

exports.handler = (event, context, callback) => {
    try {
        // Inspector.startAssessmentRun response will look something like:
        // {"assessmentRunArn":"arn:aws:inspector:us-west-2:123456789012:target/0-wJ0KWygn/template/0-jRPJqnQh/run/0-Ga1lDjhP"}
        inspector.startAssessmentRun(params, (error, data) => {
            if (error) {
                console.log(error, error.stack);
                return callback(error);
            }
            console.log(data);
            return callback(null, data);
        });
    } catch (error) {
        console.log('Caught Error: ', error);
        callback(error);
    }
};
The log says your request is timing out after 60 seconds. You can set the timeout as high as 5 minutes according to https://aws.amazon.com/blogs/aws/aws-lambda-update-python-vpc-increased-function-duration-scheduling-and-more/. If your task takes about 60 seconds and the timeout is 60 seconds, then some invocations may well be timing out; that's what the log suggests to me. Otherwise, post some code from the function.
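If you want the timeout to surface as a descriptive error rather than a hard kill and silent retry, one option is to race the call against your own shorter deadline. A generic sketch (the helper name is mine; `.promise()` is the AWS SDK v2 way to get a promise from a request):

```javascript
// Reject with a labelled error before Lambda's hard timeout kills the invocation.
function withTimeout(promise, ms, label) {
    let timer;
    const timeout = new Promise((resolve, reject) => {
        timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
    });
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// e.g.: withTimeout(inspector.startAssessmentRun(params).promise(), 30000, 'startAssessmentRun')
withTimeout(Promise.resolve('ok'), 1000, 'demo').then((v) => console.log(v)); // prints "ok"
```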
Quoted from What do the return values of node.js process.memoryUsage() stand for?: RSS is the resident set size, the portion of the process's memory held in RAM (how much memory is held in RAM by this process, in bytes).
The file 'text.txt' used in this example is 370 KB (378,880 bytes).
var http = require('http');
var fs = require('fs');
var express = require('express');
var app = express();

console.log("On app bootstrap = ", process.memoryUsage());

app.get('/test', function(req, res) {
    fs.readFile(__dirname + '/text.txt', function(err, file) {
        console.log("When File is available = ", process.memoryUsage());
        res.end(file);
    });
    setTimeout(function() {
        console.log("After sending = ", process.memoryUsage());
    }, 5000);
});

app.listen(8081);
So on app bootstrap: { rss: 22069248, heapTotal: 15551232, heapUsed: 9169152 }
After I made 10 requests to '/test', the situation is:
When File is available = { rss: 33087488, heapTotal: 18635008, heapUsed: 6553552 }
After sending = { rss: 33447936, heapTotal: 18635008, heapUsed: 6566856 }
So from app bootstrap to the 10th request, RSS increased by 11,378,688 bytes, which is roughly 30 times the size of the text.txt file.
I know that this code buffers the entire text.txt file into memory for every request before writing the result back to the client, but I expected the memory occupied by 'text.txt' to be released after the request finished. That doesn't seem to be the case?
Second, how do I set the maximum amount of RAM that the Node process can consume?
In JavaScript, the garbage collector does not run immediately after your code finishes executing, so memory is not freed right away. You can trigger GC manually after working with large objects if you care about memory consumption. You can find more information here.
setTimeout(function() {
    global.gc(); // only available when Node is started with the --expose-gc flag
    console.log("After sending = ", process.memoryUsage());
}, 5000);
To inspect your memory allocation, you can run your server with v8-profiler and take a heap snapshot. More info here.
Try running your example again and give the process some time to run garbage collection. Keep an eye on the process's memory usage with a system monitor; it should drop after a while. Even if it doesn't, the process can't exceed the limits mentioned below.
According to the Node documentation, the memory limit is 512 MB on 32-bit systems and 1 GB on 64-bit systems. It can be increased if necessary.
I use node-memwatch to monitor the memory usage of a Node application. The simplified code is below:
// file test.js
var memwatch = require('memwatch');
var util = require('util');

var leak = [];
setInterval(function() {
    leak.push(new Error("leak string"));
}, 1);

memwatch.on('stats', function(stats) {
    console.log('MEM watch: ' + JSON.stringify(stats));
    console.log('Process: ' + util.inspect(process.memoryUsage()));
});
Running 'node test.js', I get the output below:
MEM watch: {"num_full_gc":1,"num_inc_gc":6,"heap_compactions":1,"usage_trend":0,"estimated_base":8979176,"current_base":8979176,"min":0,"max":0}
Process: { rss: 28004352, heapTotal: 19646208, heapUsed: 9303856 }
Does anyone know what estimated_base and current_base mean? They are not described in detail on the page https://github.com/lloyd/node-memwatch.
Memwatch splits its results into two periods: the RECENT_PERIOD, which covers 10 consecutive GCs, and the ANCIENT_PERIOD, which covers 120 consecutive GCs.
estimated_base = the heap size after 10 consecutive GCs have been executed (i.e. over the RECENT_PERIOD).
current_base = the heap size immediately after a GC.
base min = the minimum heap size recorded for the given period.
base max = the maximum heap size recorded for the given period.
If you follow this link you will be able to check out the code: Memwatch
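A toy tracker to illustrate the shape of these numbers (this is not memwatch's actual algorithm, just the idea of a post-GC sample and a smoothed base over the last 10 samples):

```javascript
function makeBaseTracker(windowSize = 10) {
    const samples = [];
    return {
        record(postGcHeapBytes) { // call with the heap size measured right after a GC
            samples.push(postGcHeapBytes);
            if (samples.length > windowSize) samples.shift();
        },
        currentBase() { return samples[samples.length - 1]; }, // like current_base
        estimatedBase() {                                      // like estimated_base
            return samples.reduce((a, b) => a + b, 0) / samples.length;
        },
    };
}

const tracker = makeBaseTracker();
[8e6, 9e6, 10e6].forEach((s) => tracker.record(s));
console.log(tracker.currentBase(), tracker.estimatedBase()); // 10000000 9000000
```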