I want to determine how much memory an array occupies when my function executes. Determining the size of the array itself is easy, but I am not seeing a correlation between the size of my array and the Max Memory Used value that is recorded at the end of a Lambda execution.
There is no apparent correlation after inspecting process.memoryUsage() before and after populating the array, nor with the Max Memory Used reported by Lambda. I can't find a good resource that explains how Lambda actually determines the memory used. Any help would be appreciated.
This question made me curious myself, so I decided to run some tests to see how memory allocation works inside an AWS Lambda container.
Test 1: Create array with 100,000 elements in memory
Memory size: 128MB
exports.handler = async (event) => {
    const arr = [];
    for (let i = 0; i < 100000; i++) {
        arr.push(i);
    }
    console.log(process.memoryUsage());
    return 'done';
};
Result: 56 MB
2019-04-30T01:00:59.577Z cd473d5b-986c-436e-8b36-b114410c84cf { rss: 35299328,
heapTotal: 11853824,
heapUsed: 7590320,
external: 8224 }
REPORT RequestId: 2a7548f9-5d2f-4060-8f9e-deb228730d8c Duration: 155.74 ms Billed Duration: 200 ms Memory Size: 128 MB Max Memory Used: 56 MB
Test 2: Create array with 1,000,000 elements in memory
Memory size: 128MB
exports.handler = async (event) => {
    const arr = [];
    for (let i = 0; i < 1000000; i++) {
        arr.push(i);
    }
    console.log(process.memoryUsage());
    return 'done';
};
Result: 99 MB
2019-04-30T01:03:44.582Z 547a9de8-35f7-48e2-a53f-ab669b188f9a { rss: 80093184,
heapTotal: 55263232,
heapUsed: 52951088,
external: 8224 }
REPORT RequestId: 547a9de8-35f7-48e2-a53f-ab669b188f9a Duration: 801.68 ms Billed Duration: 900 ms Memory Size: 128 MB Max Memory Used: 99 MB
Test 3: Create array with 10,000,000 elements in memory
Memory size: 128MB
exports.handler = async (event) => {
    const arr = [];
    for (let i = 0; i < 10000000; i++) {
        arr.push(i);
    }
    console.log(process.memoryUsage());
    return 'done';
};
Result: 128 MB
REPORT RequestId: f1df4f39-e0fc-4b44-8f90-c3c0e3d9c12d Duration: 3001.33 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 128 MB
2019-04-30T00:54:32.970Z f1df4f39-e0fc-4b44-8f90-c3c0e3d9c12d Task timed out after 3.00 seconds
I think we can say with reasonable confidence that the memory used by the Lambda container does go up with the size of an array in memory; in the third test we maxed out the memory and timed out. My assumption is that the process that controls the execution of the Lambda also monitors how much memory that execution acquires, likely by cat /proc/meminfo as trognanders suggests.
Okay, so I used the following code and increased the number of array elements to look for a correlation. Three tests were done for each array size. Lambda was set at 1024 MB. Each array element is 10 chars/bytes long.
const util = require('util');
const exec = util.promisify(require('child_process').exec);

// Returns the "Active" value from /proc/meminfo in MB, which seems to be what Lambda uses.
async function GetContainerUsage()
{
    const { stdout } = await exec('cat /proc/meminfo');
    // Look the field up by name instead of relying on its token position.
    const match = stdout.match(/^Active:\s+(\d+) kB/m);
    return match ? Math.round(Number(match[1]) / 1024) : NaN;
}

// Returns process.memoryUsage() with every field rounded to MB.
function GetMemoryUsage()
{
    const used = process.memoryUsage();
    for (let key in used)
        used[key] = Math.round((used[key] / 1024 / 1024));
    return used;
}

exports.handler = async (event, context) =>
{
    let max = event.ArrTotal;
    let arr = [];
    for (let i = 0; i < max; i++)
    {
        arr.push("1234567890"); // 10 characters
    }
    let csvLine = [];
    let jsMemUsed = GetMemoryUsage();
    let containerMemUsed = await GetContainerUsage();
    csvLine.push(event.ArrTotal);
    csvLine.push(jsMemUsed.rss);
    csvLine.push(jsMemUsed.heapTotal);
    csvLine.push(jsMemUsed.heapUsed);
    csvLine.push(jsMemUsed.external);
    csvLine.push(containerMemUsed);
    console.log(csvLine.join(','));
    return true;
};
This produced the following values, shown here as CSV (all sizes in MB):
Array Count, JS rss, JS heapTotal, JS heapUsed, external, System Active, Lambda reported usage
1,30,7,5,0,53,54
1,31,7,5,0,53,55
1,30,8,5,0,53,55
1000,30,8,5,0,53,55
1000,30,8,5,0,53,55
1000,30,8,5,0,53,55
10000,30,8,5,0,53,55
10000,31,8,6,0,54,56
10000,33,7,5,0,54,57
100000,32,12,7,0,56,57
100000,34,11,8,0,57,59
100000,36,12,10,0,59,61
1000000,64,42,39,0,88,89
1000000,60,36,34,0,84,89
1000000,60,36,34,0,84,89
10000000,271,248,244,0,294,297
10000000,271,248,244,0,295,297
10000000,271,250,244,0,295,297
Which if graphed becomes:
So at 10 million elements the array is assumed to be 10,000,000 × 10 bytes = 100 MB. There must be some overhead I am missing somewhere, as about 200 MB is used elsewhere. But at least there is a clear linear correlation, which I can now work with.
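One way to check the per-element assumption is to measure it directly. The sketch below is illustrative and was not part of the tests above; the helper name measureArrayHeap is hypothetical. It compares process.memoryUsage().heapUsed before and after filling an array and divides the difference by the element count. The figure is only approximate, since garbage collection can run at any point during the loop.

```javascript
// Hypothetical helper: estimate heap growth per array element.
function measureArrayHeap(count, value) {
  const before = process.memoryUsage().heapUsed;
  const arr = [];
  for (let i = 0; i < count; i++) {
    arr.push(value);
  }
  const after = process.memoryUsage().heapUsed;
  const deltaBytes = after - before;
  return {
    count: arr.length,
    deltaMb: deltaBytes / 1024 / 1024,
    bytesPerElement: deltaBytes / count,
  };
}

console.log(measureArrayHeap(1000000, '1234567890'));
```

Note that pushing the same string literal stores a reference per element rather than a fresh 10-byte copy, so the per-element figure need not be close to 10 bytes.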
Capacity specification for FaaS vs PaaS
The whole idea of computing with Lambda functions (FaaS) is to minimize the effort spent on capacity planning. Since the cloud provider cannot pick sensible defaults for everything, memory size and timeout are the settings AWS exposes to configure the function. If you test it, you will see that the memory setting determines not just the memory but also the CPU capacity. As AWS puts it:
Lambda allocates CPU power linearly in proportion to the amount of memory configured. At 1,792 MB, a function has the equivalent of 1 full vCPU (one vCPU-second of credits per second)
Ref https://docs.aws.amazon.com/lambda/latest/dg/resource-model.html
Hence, it is not enough to consider the runtime memory footprint; you also need to consider the CPU speed with which the function executes and finishes.
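Taking the quoted statement at face value, the CPU share for a given memory setting can be roughly estimated by scaling linearly from the 1,792 MB figure. This is a sketch that assumes the proportionality holds across the whole range; the function name estimatedVcpuShare is made up for illustration.

```javascript
// Memory setting at which the quoted AWS documentation says a function gets one full vCPU.
const MB_PER_FULL_VCPU = 1792;

// Hypothetical helper: linear estimate of the vCPU share for a memory setting.
function estimatedVcpuShare(memoryMb) {
  return memoryMb / MB_PER_FULL_VCPU;
}

for (const memoryMb of [128, 256, 1024, 1792]) {
  console.log(`${memoryMb} MB -> ~${estimatedVcpuShare(memoryMb).toFixed(2)} vCPU`);
}
```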
AWS does not disclose the capacity or CPU/memory/server type/IOPS used in these containers, nor does it show that usage in any CloudWatch metrics as it does for an EC2 instance.
Hence we need to choose the memory setting based on testing.
Each Lambda (Node.js) function has its own memory footprint and its own set of node module dependencies. Hence, each one needs to be load and performance tested to tune the memory and timeout settings; these cannot be planned upfront.
General research observation
With any standard Node.js based Lambda function that has logging and just does hello world, deployed without a VPC:
128 MB may show an execution time of, say, 150+ ms and a billing of 200 ms for 128 MB
256 MB may show an execution time of, say, 80+ ms and a billing of 100 ms for 256 MB
A lower memory setting does not necessarily mean a lower cost, so fine-tuning based on load and performance tests is the best way to determine the memory setting to use.
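To make the comparison concrete: Lambda compute charges scale with memory multiplied by billed duration (GB-seconds). The sketch below plugs in the two illustrative measurements above; the helper name gbSeconds is an assumption for illustration, and per-request charges and actual prices are left out.

```javascript
// Hypothetical helper: compute usage in GB-seconds for one invocation.
function gbSeconds(memoryMb, billedMs) {
  return (memoryMb / 1024) * (billedMs / 1000);
}

const small = gbSeconds(128, 200); // 128 MB billed for 200 ms
const large = gbSeconds(256, 100); // 256 MB billed for 100 ms

console.log({ small, large });
```

Both settings come out at 0.025 GB-seconds, so in this example the larger, faster configuration costs the same as the smaller one.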
Attributes like timeout depend purely on how long the function takes to complete its work, which can be much higher for batch jobs (say 10 minutes) than for a web service that expects a quick response (say 10 seconds). Timing out early instead of waiting on slow dependencies is important to avoid high billing for high-throughput APIs. For an API, a long timeout can cause additional containers (function instances) to spin up to handle new requests, which can also affect the number of IPs allocated within the subnet that hosts the function (if the function runs within a VPC).
Lambda limits on ENIs and IPs, and the maximum Lambda concurrency within an account/region, are important factors to consider while planning capacity.
Ref https://docs.aws.amazon.com/lambda/latest/dg/limits.html
It is quite surprising to see the heapUsed and external counters decrease while heapTotal still shows a spike.
***Memory Log - Before soak summarization 2
"rss":217214976,"heapTotal":189153280,"heapUsed":163918648,"external":1092977
Spike in rss: 4096
Spike in heapTotal: 0
Spike in heapUsed: 22240
Spike in external: 0
***Memory Log - Before summarizing log summary for type SOAK
"rss":220295168,"heapTotal":192294912,"heapUsed":157634440,"external":318075
Spike in rss: 3080192
Spike in heapTotal: 3141632
Spike in heapUsed: -6284208
Spike in external: -774902
Any ideas why heapTotal is increasing drastically while heapUsed and external are dropping drastically? I really thought that heapTotal = heapUsed + external.
I am using the following code to track memory
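const fs = require('fs');
const pathUtils = require('path');
const _ = require('lodash'); // assumed: lodash (underscore would work the same way here)
// DIR_MODE (the directory permission mode) is assumed to be defined elsewhere.
var prevStats;
function logMemory (path, comment) {
    if (!fs.existsSync(path)) {
        fs.mkdirSync(path, DIR_MODE);
    }
    path = pathUtils.posix.join(path, "memoryLeak.txt");
    // Prefix the comment with ' - ' and avoid printing "undefined" when it is omitted.
    comment = comment ? comment.replace(/(.+)/, ' - $1') : '';
    var memStats = process.memoryUsage();
    // Log the same snapshot that is used for the deltas below.
    fs.appendFileSync(path, `\r\n\r\n***Memory Log ${comment}\r\n` + JSON.stringify(memStats));
    if (prevStats) {
        _.forEach(
            Object.keys(memStats),
            key => {
                if (memStats.hasOwnProperty(key)) {
                    fs.appendFileSync(path, `\r\nSpike in ${key}: ${memStats[key] - prevStats[key]}`);
                }
            }
        );
    }
    prevStats = memStats;
}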
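A side note on the fields, offered as a sketch and not as a diagnosis of the log above: per the Node.js documentation, heapTotal and heapUsed describe V8's heap (the memory reserved for it and the part currently in use), while external is memory used by C++ objects bound to JavaScript objects. heapTotal is therefore not the sum of the other two. The helper diffMemory below is hypothetical; it computes the same per-field deltas as the logger, without the file I/O, so the fields can be compared side by side.

```javascript
// Hypothetical helper: per-field difference between two process.memoryUsage() snapshots.
function diffMemory(prev, curr) {
  const delta = {};
  for (const key of Object.keys(curr)) {
    delta[key] = curr[key] - (prev[key] || 0);
  }
  return delta;
}

const before = process.memoryUsage();
const filler = new Array(100000).fill('x'); // allocate something on the V8 heap
const after = process.memoryUsage();

console.log(diffMemory(before, after));
// heapUsed is the in-use part of the reserved heap reported as heapTotal.
console.log('heapUsed:', after.heapUsed, 'heapTotal:', after.heapTotal, 'elements:', filler.length);
```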
We have created a Node.js based Lambda function named execInspector which is triggered once every day. The function is based on the AWS Lambda blueprint "inspector-scheduled-run" for Node.js.
The problem is that the scheduled job fails randomly on some days. We get only the logs below from the CloudWatch log stream.
In a week, it runs successfully roughly 4 or 5 times and fails on the remaining days. Based on the log, it consumes very little memory and time, so we are not sure why it fails randomly. It also retries 3 times before being killed.
From the log below we can also see that the job uses only about 35 MB on average and takes only about 60 seconds to complete on average. We tried changing the Node.js runtime and increasing the memory and timeout well beyond these values, but nothing worked.
Can you please suggest some alternate approaches to handle these failures automatically, and does anyone have insight into why this is happening?
Additional Inputs:
We have already tried the maximum timeout of 5 minutes, but it still fails with "timed out after 300 secs.".
What I mean is that the task of just triggering the inspector takes less than 30 seconds on average. Since it is a PaaS-based solution, I cannot expect it to always complete within 30 seconds. But 60 seconds should be more than enough for a job that it was able to complete within 30 seconds.
Sample CloudWatch Successful log:
18:01:00
START RequestId: 12eb468a-4174-11e7-be7b-6d0faaa584aa Version: $LATEST
18:01:03
2017-05-25T18:01:02.935Z 12eb468a-4174-11e7-be7b-6d0faaa584aa { assessmentRunArn: 'arn:aws:inspector:us-east-1:102461617910:target/0-Ly60lmEP/template/0-POpZxSLA/run/0-MMx30fLl' }
18:01:03
END RequestId: 12eb468a-4174-11e7-be7b-6d0faaa584aa
18:01:03
REPORT RequestId: 12eb468a-4174-11e7-be7b-6d0faaa584aa Duration: 2346.37 ms Billed Duration: 2400 ms Memory Size: 128 MB Max Memory Used: 33 MB
Cloudwatch log:
Similar log below is repeated 3 times which seems to be a retry attempt
06:32:52
START RequestId: 80190395-404a-11e7-845d-1f88a00ed4f3 Version: $LATEST
06:32:56
2017-05-24T06:32:55.942Z 80190395-404a-11e7-845d-1f88a00ed4f3 Execution Started...
06:33:52
END RequestId: 80190395-404a-11e7-845d-1f88a00ed4f3
06:33:52
REPORT RequestId: 80190395-404a-11e7-845d-1f88a00ed4f3 Duration: 60000.88 ms Billed Duration: 60000 ms Memory Size: 128 MB Max Memory Used: 32 MB
06:33:52
2017-05-24T06:33:52.437Z 80190395-404a-11e7-845d-1f88a00ed4f3 Task timed out after 60.00 seconds
Lambda code:
'use strict';
/**
 * A blueprint to schedule a recurring assessment run for an Amazon Inspector assessment template.
 *
 * This blueprint assumes that you've already done the following:
 * 1. onboarded with the Amazon Inspector service https://aws.amazon.com/inspector
 * 2. created an assessment target - what hosts you want to assess
 * 3. created an assessment template - how you want to assess your target
 *
 * Then, all you need to do to use this blueprint is to define an environment variable in the Lambda console called
 * `assessmentTemplateArn` and provide the template arn you want to run on a schedule.
 */
const AWS = require('aws-sdk');
const inspector = new AWS.Inspector();
const params = {
    assessmentTemplateArn: process.env.assessmentTemplateArn,
};
exports.handler = (event, context, callback) => {
    try {
        // Inspector.StartAssessmentRun response will look something like:
        // {"assessmentRunArn":"arn:aws:inspector:us-west-2:123456789012:target/0-wJ0KWygn/template/0-jRPJqnQh/run/0-Ga1lDjhP"}
        inspector.startAssessmentRun(params, (error, data) => {
            if (error) {
                console.log(error, error.stack);
                return callback(error);
            }
            console.log(data);
            return callback(null, data);
        });
    } catch (error) {
        console.log('Caught Error: ', error);
        callback(error);
    }
};
The log says your request is timing out after 60 seconds. You can set the timeout as high as 5 minutes according to this: https://aws.amazon.com/blogs/aws/aws-lambda-update-python-vpc-increased-function-duration-scheduling-and-more/ If your task takes about 60 seconds and the timeout is 60 seconds, then some invocations may be timing out. That's what the log suggests to me. Otherwise, post some code from the function.
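One way to make such failures easier to diagnose, offered as a sketch and not as a confirmed fix for this problem, is to put your own deadline on the downstream call so the handler fails with a clear error before Lambda kills it. The helper names callBudgetMs and withTimeout are hypothetical; context.getRemainingTimeInMillis() is the method the Node.js runtime provides on the handler's context object.

```javascript
// Hypothetical helper: time budget for a call, leaving a reserve for logging and cleanup.
function callBudgetMs(context, reserveMs = 1000) {
  return Math.max(0, context.getRemainingTimeInMillis() - reserveMs);
}

// Hypothetical helper: reject if the promise does not settle within ms milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const deadline = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the event loop busy.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Example with a stand-in context object; in Lambda the runtime supplies the real one.
const fakeContext = { getRemainingTimeInMillis: () => 3000 };
const slowCall = new Promise((resolve) => setTimeout(() => resolve('done'), 50));

withTimeout(slowCall, callBudgetMs(fakeContext), 'startAssessmentRun')
  .then((result) => console.log(result))
  .catch((error) => console.error(error.message));
```

In a real handler, the wrapped promise would be the SDK call, and the rejection would be logged and passed to the callback so the cause shows up in CloudWatch.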
I have this Button click handler (MonoMac on OS X 10.9.3):
partial void OnDoButtonClick(NSObject sender)
{
    DoButton.Enabled = false;
    // Start animation
    ProgressIndicator.StartAnimation(this);
    ThreadPool.QueueUserWorkItem(_ => {
        // Perform a task that lasts for about a second:
        Thread.Sleep(1 * 1000);
        // Stop animation:
        InvokeOnMainThread(() => {
            ProgressIndicator.StopAnimation(this);
            DoButton.Enabled = true;
        });
    });
}
However, when I run the code by pressing the button, the main thread stops and the following error occurs:
(lldb) quit* thread #1: tid = 0x2bf20, 0x98fd9f7a libsystem_kernel.dylib`mach_msg_trap + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
And, the following log is recorded in the system log:
2014/05/21 13:10:51.752 com.apple.debugserver-310.2[3553]: 1 +0.000001 sec [0de1/1503]: error: ::read ( 0, 0x107557a40, 1024 ) => -1 err = Connection reset by peer (0x00000036)
2014/05/21 13:10:51.752 com.apple.debugserver-310.2[3553]: 2 +0.000001 sec [0de1/0303]: error: ::ptrace (request = PT_THUPDATE, pid = 0x0ddc, tid = 0x1a03, signal = -1) err = Invalid argument (0x00000016)
2014/05/21 13:10:51.753 com.apple.debugserver-310.2[3553]: Exiting.
2014/05/21 13:11:05.000 kernel[0]: process <AppName>[3548] caught causing excessive wakeups. Observed wakeups rate (per sec): 1513; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 45061
2014/05/21 13:11:05.302 ReportCrash[3555]: Invoking spindump for pid=3548 wakeups_rate=1513 duration=30 because of excessive wakeups
2014/05/21 13:11:07.452 spindump[3556]: Saved wakeups_resource.spin report for <AppName> version 1.2.1.0 (1) to /Library/Logs/DiagnosticReports/<AppName>_2014-05-21-131107_<UserName>-MacBook-Pro.wakeups_resource.spin
Extract from above: Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 45061
The problem does NOT happen if I remove the ProgressIndicator.StartAnimation(this); and ProgressIndicator.StopAnimation(this); lines.
Why is the main thread stopped by SIGSTOP?
Is there a way to find out the CPU usage in % for a Node.js process via code, so that when the Node.js application is running on the server and detects that the CPU usage exceeds a certain %, it raises an alert or writes to the console?
On *nix systems you can get process stats by reading the /proc/[pid]/stat virtual file.
For example, this will check the CPU usage every ten seconds and print to the console if it's over 20%. It works by reading the number of CPU ticks used by the process and comparing the value to a second measurement made one second later. The difference is the number of ticks used by the process during that second. On Linux these counters are in clock ticks, of which there are sysconf(_SC_CLK_TCK) per second (typically 100), so dividing by that number and multiplying by 100 gives a percentage of one CPU.
var fs = require('fs');
// Clock ticks per second; typically 100 on Linux (check with `getconf CLK_TCK`).
var TICKS_PER_SECOND = 100;
var getUsage = function(cb){
    fs.readFile("/proc/" + process.pid + "/stat", function(err, data){
        if (err) return cb(err);
        var elems = data.toString().split(' ');
        var utime = parseInt(elems[13], 10); // ticks spent in user mode
        var stime = parseInt(elems[14], 10); // ticks spent in kernel mode
        cb(null, utime + stime);
    });
};
setInterval(function(){
    getUsage(function(err, startTime){
        if (err) return console.error(err);
        setTimeout(function(){
            getUsage(function(err, endTime){
                if (err) return console.error(err);
                var delta = endTime - startTime;
                var percentage = 100 * (delta / TICKS_PER_SECOND);
                if (percentage > 20){
                    console.log("CPU Usage Over 20%!");
                }
            });
        }, 1000);
    });
}, 10000);
Try looking at this code: https://github.com/last/healthjs
Network service for getting CPU of remote system and receiving CPU usage alerts...
Health.js serves 2 primary modes: "streaming mode" and "event mode". Streaming mode allows a client to connect and receive streaming CPU usage data. Event mode enables Health.js to notify a remote server when CPU usage hits a certain threshold. Both modes can be run simultaneously...
You can use the os module now.
var os = require('os');
var loads = os.loadavg();
This gives you the load average for the last 1, 5 and 15 minutes.
This doesn't give you the CPU usage as a % though.
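If a rough percentage is still wanted, one common approximation (an assumption here, not something the os module provides directly) is to divide the load average by the number of logical CPUs. The helper name normalisedLoad is hypothetical, and load average counts runnable tasks rather than CPU time, so treat the result as an indicator only. On Windows os.loadavg() always returns zeros.

```javascript
const os = require('os');

// Hypothetical helper: load averages (1, 5, 15 min) as a percentage of available CPUs.
function normalisedLoad() {
  const cpuCount = Math.max(1, os.cpus().length);
  return os.loadavg().map((load) => (100 * load) / cpuCount);
}

console.log(normalisedLoad());
```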
Use Node's process.cpuUsage function (introduced in Node v6.1.0).
It shows the time the CPU spent on your Node process. Example taken from the docs:
const previousUsage = process.cpuUsage();
// { user: 38579, system: 6986 }
// spin the CPU for 500 milliseconds
const startDate = Date.now();
while (Date.now() - startDate < 500);
// At this moment you can expect result 100%
// Time is *1000 because cpuUsage is in us (microseconds)
const usage = process.cpuUsage(previousUsage);
const result = 100 * (usage.user + usage.system) / ((Date.now() - startDate) * 1000);
console.log(result);
// set 2 sec "non-busy" timeout
setTimeout(function() {
    console.log(process.cpuUsage(previousUsage));
    // { user: 514883, system: 11226 } ~ 0.5 sec
    // here you can expect result about 20% (0.5s busy of 2.5s total runtime, relative to previousUsage that is first value taken about 2.5s ago)
}, 2000);
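Building on process.cpuUsage, a periodic check that alerts above a threshold could look like the following. This is a minimal sketch; the names cpuPercent and watchCpu are hypothetical, and 100% here means one core fully busy, so a multi-threaded process can exceed it.

```javascript
// Hypothetical helper: CPU percentage from a process.cpuUsage() delta (microseconds)
// over an elapsed wall-clock interval (milliseconds).
function cpuPercent(usageDelta, elapsedMs) {
  if (elapsedMs <= 0) return 0;
  return (100 * (usageDelta.user + usageDelta.system)) / (elapsedMs * 1000);
}

// Hypothetical monitor: call onHigh whenever usage over the last interval exceeds thresholdPercent.
function watchCpu(thresholdPercent, intervalMs, onHigh) {
  let lastUsage = process.cpuUsage();
  let lastTime = Date.now();
  const timer = setInterval(() => {
    const now = Date.now();
    const percent = cpuPercent(process.cpuUsage(lastUsage), now - lastTime);
    lastUsage = process.cpuUsage();
    lastTime = now;
    if (percent > thresholdPercent) onHigh(percent);
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return timer;
}

watchCpu(20, 10000, (percent) => console.log(`CPU usage over 20%: ${percent.toFixed(1)}%`));
```

Because the timer is unref'd, the monitor only runs for as long as the application itself keeps the event loop alive.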
See node-usage for tracking the CPU and memory usage of the process (not of the system).
Another option is to use the node-red-contrib-os package.