Very slow scan speed in Lambda with DAX for DynamoDB - node.js

I'm running into a problem using Lambda and DAX.
In Lambda, without the Node.js DAX client the scan takes about 900ms on average, but with DAX it takes 4500ms. That's strange, because with DAX I expected it to be faster, not slower.
This is the latest code. Here I'm only fetching one record, but the result is still the same.
const AWS = require('aws-sdk');
const AmazonDaxClient = require('amazon-dax-client');
const config = require('../config.json');

AWS.config.update({
  region: config.region,
  accessKeyId: config.accessKeyId,
  secretAccessKey: config.secretAccessKey
});

var dax = null;
var daxClient = null;
const daxConfig = {
  endpoints: [config.daxEndpoints],
  region: config.region
};

if (dax == null && daxClient == null) { // logical &&, not the bitwise &
  console.log('initialized');
  dax = new AmazonDaxClient(daxConfig);
  daxClient = new AWS.DynamoDB.DocumentClient({ service: dax });
}

exports.main = function(event, context, callback) {
  context.callbackWaitsForEmptyEventLoop = false;
  const params = {
    TableName: "game_dev"
  };
  daxClient.scan(params, function(err, data) {
    if (err) {
      console.log(JSON.stringify(err));
    } else {
      console.log("Query succeeded.");
    }
  });
};

AWS Lambda uses reusable containers. The first invocation runs in a freshly initialized container; subsequent invocations reuse that container until it is recycled (after a few hours, typically), and Lambda may also scale out to more containers depending on the workload.
The trick you need to master in your code is to not reinitialize the DynamoDB client on each call. This AWS link goes into more detail: AWS Best Practices on AWS Service Client Initialization
So for your case, set up your client variables, dax and daxClient, outside the scope of the function handler. In C# I declare them outside the handler and initialize them inside it if they are null; if they are not null, I skip initialization and reuse them. I'm not sure what the exact best practice is for Node.js, though.
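For what it's worth, a minimal Node.js sketch of the same idea, based on the question's own setup (the endpoint value is a placeholder):
const AWS = require('aws-sdk');
const AmazonDaxClient = require('amazon-dax-client');

// Module scope: runs once per container, so warm invocations reuse these clients.
const dax = new AmazonDaxClient({
  endpoints: ['my-dax-cluster.xxxxxx.dax-clusters.us-east-1.amazonaws.com:8111'], // placeholder
  region: 'us-east-1'
});
const daxClient = new AWS.DynamoDB.DocumentClient({ service: dax });

// Handler scope: runs on every invocation.
exports.main = (event, context, callback) => {
  daxClient.scan({ TableName: 'game_dev' }, (err, data) => {
    if (err) return callback(err);
    callback(null, data.Count);
  });
};
Because module-scope code runs exactly once per container, the null guard around initialization in the question is redundant; the clients can simply be created at load time.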

Related

Why does my AWS Lambda function randomly fail when using private ElastiCache network calls as well as external API calls?

I am trying to write a caching function that returns cached ElastiCache data or makes an API call to retrieve it. However, the Lambda function seems to be very unreliable and times out often.
The issue seems to be mixing Redis calls with public API calls. I can confirm that I have set up AWS correctly, with a subnet that has an internet gateway and a private subnet with a NAT gateway. The function works, but only about 10% of the time; the rest of the time execution stops right before the API call.
I have also noticed that the API calls fail after creating the Redis client. If I make the external API call before the Redis check, the function is a lot more reliable and doesn't time out.
Not sure what to do. Is it best practice to separate these two tasks, or am I doing something wrong?
let data = null;
module.exports.handler = async (event) => {
  //context.callbackWaitsForEmptyEventLoop = false;
  let client;
  try {
    client = new Redis(
      6379,
      "redis://---.---.ng.0001.use1.cache.amazonaws.com"
    );
    client.get(event.token, async (err, result) => {
      if (err) {
        console.error(err);
      } else {
        data = result;
        await client.quit();
      }
    });
    if (data && new Date().getTime() / 1000 - eval(data).timestamp < 30) {
      res.send(`({
        "address": "${token}",
        "price": "${eval(data).price}",
        "timestamp": "${eval(data).timestamp}"
      })`);
    } else {
      getPrice(event); //fetch api data
    }
  } catch (err) {
    console.error(err);
  }
};
There are a lot of misunderstandings in your code. I'll try to guide you through fixing them and explain how to do this correctly.
You are mixing asynchronous and synchronous code in your function.
You should use JSON.parse instead of eval to parse the data, because eval allows arbitrary code to be executed in your function.
You're using res.send to return the response to the client instead of the callback. Remember that res.send exists only in Express; in a Lambda you return the result to the client with the callback function.
To help you with this task, I have completely rewritten your code to fix these misunderstandings.
const Redis = require('ioredis');

module.exports.handler = async (event, context, callback) => {
  // prefer Lambda environment variables instead of hard-coding values
  const client = new Redis(
    process.env.REDIS_PORT,
    process.env.REDIS_HOST
  );
  const data = await client.get(event.token);
  await client.quit();
  const parsedData = JSON.parse(data);
  if (parsedData && new Date().getTime() / 1000 - parsedData.timestamp < 30) {
    callback(null, {
      address: event.token,
      price: parsedData.price,
      timestamp: parsedData.timestamp
    });
  } else {
    const dataFromApi = await getPrice(event);
    callback(null, dataFromApi);
  }
};
There is another pattern where the Lambda returns an object instead of passing one to the callback, but I think you get the idea and understand your mistakes.
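For example, an async handler can simply return the value (a minimal sketch; getPrice is the question's own helper):
module.exports.handler = async (event) => {
  const data = await getPrice(event); // getPrice as in the question
  return data; // the resolved value becomes the Lambda response
  // throwing an Error (or returning a rejected promise) becomes the Lambda error
};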
Follow the docs on correct Lambda usage:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/using-lambda-functions.html
To understand more about async and sync in JavaScript:
https://www.freecodecamp.org/news/synchronous-vs-asynchronous-in-javascript/
JSON.parse vs. eval: JSON.parse vs. eval()

AWS Cognito lambda triggers twice

I'm using an AWS Lambda function (Node.js) as a Cognito Pre sign-up trigger.
Whenever the app sends a sign-up request to Cognito, the trigger validates the user and checks whether the user's custom attribute already exists in our database. If it does, it returns an error; otherwise it inserts new records into the DB and returns the event to Cognito.
The Lambda timeout is set to 5 minutes.
The duplicate invocations happen on some requests, not all of them, and each invocation has a different RequestId (usually it triggers twice, sometimes three times).
The Lambda trigger code is below.
users/index.js
const handler = async (event, context) => {
  log.info('createUserLambda:start');
  // immediately return once the callback is called to avoid
  // lambda timing out because of any open db connections
  context.callbackWaitsForEmptyEventLoop = false;
  return await preUserCreate(event);
};
exports.handler = handler;
users/users.js
export const preUserCreate = async (event) => {
  log.info('preUserCreate:Start');
  let userAttributes = event.request.userAttributes;
  const currentDate = moment().utc().format('YYYY-MM-DD HH:mm:ss');
  try {
    let userParams = {
      'docStatus': 'VRF'
    };
    let docParams = [{
      'docNumber': userAttributes['custom:document_number'] ? userAttributes['custom:document_number'] : '',
      'createdDate': currentDate
    }];
    if (docParams.length && docParams[0].docNumber) {
      let documentExit = await getDocs(docParams[0].docNumber);
      if (documentExit.length) {
        log.info('preUserCreate:Error');
        throw new Error('Document number already exist.');
      }
    }
    let documentRs = await insertDocument(docParams);
    userParams = {
      'did': documentRs[0].id,
      'id': event.userName,
      'createdDate': currentDate,
      'updatedDate': currentDate,
      ...userParams
    };
    let userRs = await insertUser([userParams]);
    if (docParams.length && docParams[0].docNumber) {
      let resultData = await getUserAccountFromAPI(docParams[0].docNumber);
      if (resultData) {
        let foramattedData = await formattedAccountsData(resultData, userRs[0].id, documentRs[0].id);
        await insertUserAccounts(foramattedData);
      }
    }
    log.info('preUserCreate:Success');
    event.response = {
      'autoConfirmUser': false,
      'autoVerifyPhone': false,
      'autoVerifyEmail': false
    };
    return event;
  } catch (error) {
    log.info('preUserCreate:Error', error);
    throw (error);
  }
};
This likely happens because of the Cognito-imposed execution timeout of 5 seconds for integration Lambdas, which cannot be changed. Also note that Cognito will attempt to call the function at most 3 times.
In the Customizing User Pool Workflows with Lambda Triggers section it states that:
Important
Amazon Cognito invokes Lambda functions synchronously. When called, your Lambda function must respond within 5 seconds. If it does not, Amazon Cognito retries the call. After 3 unsuccessful attempts, the function times out. This 5-second timeout value cannot be changed.
Therefore, to reduce the execution time, it is worth introducing caching where possible, including caching database connections across invocations; a sketch follows below.
Do note, however, that you have little to no control over how often Lambdas are reused versus relaunched, so you will need to keep cold-start warm-up times in mind.
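A minimal sketch of that idea, assuming a hypothetical getConnection helper for your database:
let dbConnection = null; // module scope: survives across warm invocations

const handler = async (event, context) => {
  context.callbackWaitsForEmptyEventLoop = false;
  if (!dbConnection) {
    dbConnection = await getConnection(); // hypothetical helper; only the cold start pays this cost
  }
  return preUserCreate(event);
};
exports.handler = handler;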
Any chance you are running your Lambda in a VPC? I've seen similar behavior with a Cognito trigger that ran in a VPC when it was cold started. Once the Lambda was warm, the problem went away.
My hunch was that internally Cognito has a very short timeout period for executing the trigger, and if the trigger didn't reply in time, it would automatically retry.
We ended up having to add logic to our trigger to test for this scenario so that we weren't duplicating writes to our database.
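A sketch of such a guard inside preUserCreate (assuming, hypothetically, that getDocs also returns the owning user id):
const existing = await getDocs(docParams[0].docNumber);
if (existing.length) {
  if (existing[0].userId === event.userName) {
    // Same user and same document: a Cognito retry of an attempt that already
    // wrote to the database, so treat it as success instead of a duplicate.
    log.info('preUserCreate:retryDetected');
    return event;
  }
  throw new Error('Document number already exist.');
}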

Why AWS Lambda execution time is long using pg-promise

I started using AWS Lambda to perform a very simple task: executing an SQL query against an RDS Postgres database and creating an SQS message based on the result.
Because Amazon only provides the aws-sdk module by default (on the Node 4.3 runtime) and we need to execute this SQL query, we have to create a custom deployment package that includes pg-promise. Here is the code I'm using:
console.info('Loading the modules...');
var aws = require('aws-sdk');
var sqs = new aws.SQS();
var config = {
  db: {
    username: '[DB_USERNAME]',
    password: '[DB_PASSWORD]',
    host: '[DB_HOST]',
    port: '[DB_PORT]',
    database: '[DB_NAME]'
  }
};
var pgp = require('pg-promise')({});
var cn = `postgres://${config.db.username}:${config.db.password}@${config.db.host}:${config.db.port}/${config.db.database}`;
if (!db) {
  console.info('Connecting to the database...');
  var db = pgp(cn);
} else {
  console.info('Re-use database connection...');
}
console.log('loading the lambda function...');

exports.handler = function(event, context, callback) {
  var now = new Date();
  console.log('Current time: ' + now.toISOString());
  // Select users that need to be updated
  var query = [
    'SELECT *',
    'FROM "users"',
    'WHERE "users"."registrationDate"<=${now}',
    'AND "users"."status"=1',
  ].join(' ');
  console.info('Executing SQL query: ' + query);
  db.many(query, { status: 2, now: now.toISOString() }).then(function(data) {
    var ids = [];
    data.forEach(function(user) {
      ids.push(user.id);
    });
    if (ids.length == 0) {
      callback(null, 'No user to update');
    } else {
      var sqsMessage = {
        MessageBody: JSON.stringify({ action: 'USERS_UPDATE', data: ids }), /* required */
        QueueUrl: '[SQS_USER_QUEUE]', /* required */
      };
      console.log('Sending SQS Message...', sqsMessage);
      sqs.sendMessage(sqsMessage, function(err, sqsResponse) {
        console.info('SQS message sent!');
        if (err) {
          callback(err);
        } else {
          callback(null, ids.length + ' users were affected. SQS Message created: ' + sqsResponse.MessageId);
        }
      });
    }
  }).catch(function(error) {
    callback(error);
  });
};
When testing my Lambda function, the CloudWatch logs show the function itself took around 500ms to run, but Lambda reports that it actually took 30502.48ms (cf. screenshots).
So I'm guessing it's taking 30 seconds to unzip my 318KB package and start executing it? That seems absurd to me, or am I missing something? I tried uploading the zip directly and also uploading my package via S3 to check if it was faster, but I still see the same latency.
I noticed that the Python version can natively perform SQL requests without any custom packaging...
All our applications are written in Node, so I don't really want to move away from it, but I have a hard time understanding why Amazon doesn't provide basic npm modules for database interactions.
Any comments or help are welcome. At this point I'm not sure Lambda would be beneficial for us if it takes 30 seconds to run a script that is triggered every minute...
Anyone facing the same problem?
UPDATE: This is how you need to close the connection as soon as you don't need it anymore (thanks again to Vitaly for his help):
exports.handler = function(event, context, callback) {
  [...]
  db.many(query, { status: 2, now: now.toISOString() }).then(function(data) {
    pgp.end(); // <-- This is important to close the connection directly after the request
    [...]
The execution time should be measured based on the length of operations being executed, as opposed to how long it takes for the application to exit.
There are many libraries out there that make use of a connection pool in one form or another. Those typically terminate after a configurable period of inactivity.
In case of pg-promise, which in turn uses node-postgres, such period of inactivity is determined by parameter poolIdleTimeout, which defaults to 30 seconds. With pg-promise you can access it via pgp.pg.defaults.poolIdleTimeout.
If you want your process to exit after the last query has been executed, you need to shut down the connection pool, by calling pgp.end(). See chapter Library de-initialization for details.
It is also shown in most of the code examples, as those need to exit right after finishing.
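A sketch of both options (the 10-second value and connection string are placeholders):
var pgp = require('pg-promise')();
var db = pgp('postgres://user:pass@host:5432/dbname'); // placeholder

// Option 1: shorten the pool's idle timeout (30 seconds by default in the
// node-postgres version discussed here) so the process can exit sooner.
pgp.pg.defaults.poolIdleTimeout = 10000;

// Option 2: shut the pool down explicitly after the last query, as in the
// question's UPDATE, so the event loop can empty immediately.
db.many('SELECT * FROM users').then(function (data) {
  pgp.end();
  // ... use data ...
});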

Share code between AWS lambda functions in node.js

It seems it is not possible to pass around code (containing data and functions) from one AWS Lambda function to another AWS Lambda function that invokes it.
For example, take this customConfigLambda:
var callbackPayload = {};
callbackPayload.fooData = 'fooFromData';
callbackPayload.fooFunction = function() { return 'fooFromFunction'; };

exports.handler = (event, context, callback) => {
  callback(null, callbackPayload);
};
When I call the previous AWS lambda function in another AWS lambda function like here:
var AWS = require('aws-sdk');
AWS.config.update({ accessKeyId: '123', secretAccessKey: 'abc', region: 'us-east-1' });
var lambda = new AWS.Lambda({ region: 'us-east-1' });

exports.handler = (event, context, callback) => {
  var params = { FunctionName: 'customConfigLambda' };
  lambda.invoke(params, function(err, callbackPayload) {
    if (err) {
      // do nothing
    } else {
      console.log('callbackPayload:', JSON.stringify(callbackPayload, null, 2));
    }
  });
};
Then I can see only callbackPayload.fooData but not callbackPayload.fooFunction.
How can I have some callbackPayload.fooFunction(s) shared between multiple other AWS lambda functions?
As of AWS re:Invent 2018, Amazon has introduced Lambda Layers:
Lambda Layers, a way to centrally manage code and data that is shared across multiple functions.
The idea is that you can now put common components in a ZIP file and upload it as a Lambda Layer. Your function code doesn't need to be changed and can reference the libraries in the layer as it normally would, instead of packaging them separately.
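For Node.js, the layer ZIP places shared modules under nodejs/node_modules so the runtime can resolve them; a sketch (the module name is a placeholder):
// Layer ZIP layout:
//   nodejs/
//     node_modules/
//       shared-utils/      <- placeholder module name
//         index.js
//
// Any function with the layer attached can then require it as usual:
const shared = require('shared-utils');
exports.handler = async (event) => shared.doSomething(event); // doSomething is hypothetical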
See the docs on Using the Callback Parameter at:
http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-handler.html#nodejs-prog-model-handler-callback
It says this about the result (the callbackPayload in your code):
result – is an optional parameter that you can use to provide the result of a successful function execution. The result provided must be JSON.stringify compatible. If an error is provided, this parameter is ignored.
To be JSON.stringify compatible you cannot have any functions there. See http://json.org/ for what is valid JSON (only strings, numbers, objects, arrays, true, false and null).
If you want to share code between your AWS Lambda functions in a broad sense, you have to require the same Node module in both of them, so that you can make a common set of functions available to all of your AWS Lambda handlers. But you cannot pass arbitrary code between them, because it will not survive JSON.stringify.
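For instance, a hypothetical shared.js packaged into both deployment ZIPs:
// shared.js - included in both functions' deployment packages
module.exports.fooFunction = function() {
  return 'fooFromFunction';
};

// handler.js - in each function
const shared = require('./shared');
exports.handler = (event, context, callback) => {
  callback(null, { fooData: 'fooFromData', result: shared.fooFunction() });
};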
As a test you can try running this in the browser:
var callbackPayload = {};
callbackPayload.fooData = 'fooFromData';
callbackPayload.fooFunction = function() {return 'fooFromFunction'; };
alert(JSON.stringify(callbackPayload));
or this in Node:
var callbackPayload = {};
callbackPayload.fooData = 'fooFromData';
callbackPayload.fooFunction = function() {return 'fooFromFunction'; };
console.log(JSON.stringify(callbackPayload));
and see the result:
{"fooData":"fooFromData"}
The function is stripped out during the serialization process.
Of course you could do something like this:
callbackPayload.fooFunction
= function() {return 'fooFromFunction'; }.toString();
and get a JSON result:
{"fooData":"fooFromData","fooFunction":"function () {return 'fooFromFunction'; }"}
which you could theoretically eval on the other end but I wouldn't really recommend it.

Best Practices in Serverless Framework

I am a beginner with the Serverless Framework.
While studying the Best Practices in Serverless (here), I came across "Initialize external services outside of your Lambda code".
How do I implement that?
For example, take this code in handler.js:
const getOneUser = (event, callback) => {
  let response = null;
  // validate parameters
  if (event.accountid && process.env.SERVERLESS_SURVEYTABLE) {
    let docClient = new aws.DynamoDB.DocumentClient();
    let params = {
      TableName: process.env.SERVERLESS_USERTABLE,
      Key: {
        accountid: event.accountid,
      }
    };
    docClient.get(params, function(err, data) {
      if (err) {
        // console.error("Unable to get an item with the request: ", JSON.stringify(params), " along with error: ", JSON.stringify(err));
        return callback(getDynamoDBError(err), null);
      } else {
        if (data.Item) { // got response
          // compose response
          response = {
            accountid: data.Item.accountid,
            username: data.Item.username,
            email: data.Item.email,
            role: data.Item.role,
          };
          return callback(null, response);
        } else {
          // console.error("Unable to get an item with the request: ", JSON.stringify(params));
          return callback(new Error("404 Not Found: Unable to get an item with the request: " + JSON.stringify(params)), null);
        }
      }
    });
  }
  // incomplete parameters
  else {
    return callback(new Error("400 Bad Request: Missing parameters: " + JSON.stringify(event)), null);
  }
};
The question is: how do I initialize DynamoDB outside of my Lambda code?
Update 2:
Is the code below optimized?
Handler.js
let survey = require('./survey');

module.exports.handler = (event, context, callback) => {
  return survey.getOneSurvey({
    accountid: event.accountid,
    surveyid: event.surveyid
  }, callback);
};
survey.js
let docClient = new aws.DynamoDB.DocumentClient();

module.exports = (() => {
  const getOneSurvey = (event, callback) => {
    ....
    docClient.get(params, function(err, data)...
    ....
  };
  return {
    getOneSurvey: getOneSurvey,
  };
})();
Here's the quote in question:
Initialize external services outside of your Lambda code
When using services (like DynamoDB) make sure to initialize outside of your lambda code. Ex: module initializer (for Node), or to a static constructor (for Java). If you initiate a connection to DDB inside the Lambda function, that code will run on every invoke.
In other words, in the same file, but outside of -- before -- the actual handler code.
let docClient = new aws.DynamoDB...
...
const getOneUser = (event, callback) => {
....
docClient.get(params, ...
When the container starts, the code outside the handler runs. When subsequent invocations reuse the same container, you save resources and time by not instantiating the external services again. Containers are often reused, but each container only handles one concurrent request at a time; how often containers are reused, and for how long, is outside your control, unless you update the function, in which case existing containers are no longer reused because they'd have the old version of the function.
Your code will work as written, but isn't optimized.
The caveat with this approach in current-generation Node.js Lambda functions (Node 4.x/6.x) is that some objects, notably those that hold literal persistent connections to external services, will prevent the event loop from becoming empty. A common example is a MySQL database connection, which holds a live TCP connection to the server; by contrast, a DynamoDB "connection" is actually connectionless, since its transport protocol is HTTPS. In this case you need to either take a different approach or allow Lambda to freeze the container without waiting for an empty event loop, by setting context.callbackWaitsForEmptyEventLoop to false before calling the callback... but only do this if needed and only if you fully understand what it means. Setting it by default because some guy on the Internet said it was a good idea will potentially bring you mysterious bugs later.
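A sketch of that escape hatch, using a hypothetical MySQL connection held at module scope (connection details are placeholders):
const mysql = require('mysql');
const connection = mysql.createConnection({ host: 'db-host', user: 'u', password: 'p' }); // placeholders

exports.handler = (event, context, callback) => {
  // The live TCP connection keeps the event loop non-empty; without this flag
  // the invocation would hang until timeout even after callback() fires.
  context.callbackWaitsForEmptyEventLoop = false;
  connection.query('SELECT 1', (err, rows) => callback(err, rows));
};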