AWS Cognito lambda triggers twice - node.js

I'm using an AWS Lambda function (using nodejs).
Once any request from APP to Cognito to signUp users.
Then I have set the Pre sign-up trigger to validate the user's customer and check users custom attribute available in our database or not. If yes then return an error and else insert new records in DB and return the event to Cognito.
TimeoutInfo - 5 min.
It happens sometime in the request, not all the time.
RequestId as different. (it will trigger 3 times sometime and most of the time twice)
Lambda trigger code as below.
users/index.js
const handler = async (event, context) => {
log.info('createUserLambda:start');
// immediately return once the call back is called to avoid
// lambda time out because of any open db connections
context.callbackWaitsForEmptyEventLoop = false;
return await preUserCreate(event);
};
exports.handler = handler;
users/users.js
export const preUserCreate = async (event) => {
log.info('preUserCreate:Start');
let userAttributes = event.request.userAttributes;
const currentDate = moment().utc().format('YYYY-MM-DD HH:mm:ss');
try {
let userParams = {
'docStatus': 'VRF'
};
let docParams = [{
'docNumber': userAttributes['custom:document_number'] ? userAttributes['custom:document_number'] : '',
'createdDate': currentDate
}];
if (docParams.length && docParams[0].docNumber) {
let documentExit = await getDocs(docParams[0].docNumber);
if (documentExit.length) {
log.info('preUserCreate:Error');
throw new Error('Document number already exist.');;
}
}
let documentRs = await insertDocument(docParams);
userParams = {
'did': documentRs[0].id,
'id': event.userName,
'createdDate': currentDate,
'updatedDate': currentDate,
...userParams
};
let userRs = await insertUser([userParams]);
if (docParams.length && docParams[0].docNumber) {
let resultData = await getUserAccountFromAPI(docParams[0].docNumber);
if (resultData) {
let foramattedData = await formattedAccountsData(resultData, userRs[0].id, documentRs[0].id);
await insertUserAccounts(foramattedData);
}
}
log.info('preUserCreate:Success');
event.response = {
'autoConfirmUser': false,
'autoVerifyPhone': false,
'autoVerifyEmail': false
};
return event;
} catch (error) {
log.info('preUserCreate:Error', error);
throw (error);
}
}

This likely happens because of the Cognito-imposed execution timeout of 5 seconds for integration Lambdas - and it cannot be changed. Also note that the maximum amount of times that Cognito will (re-)attempt to call the function is 3 times.
In the Customizing User Pool Workflows with Lambda Triggers section it states that:
Important
Amazon Cognito invokes Lambda functions synchronously. When
called, your Lambda function must respond within 5 seconds. If it does
not, Amazon Cognito retries the call. After 3 unsuccessful attempts,
the function times out. This 5-second timeout value cannot be changed.
Therefore to reduce the execution time it would be worth to consider introducing caching where possible. Including database connections etc.
Do however note that you have little to no control over how often Lambdas are re-used versus re-launched and you will need to keep this in mind in terms of warm-up times.

Any chance you are running your lambda in a VPC? I've seen similar behavior with a Cognito trigger that ran in a VPC when it was cold started. Once the lambda was warm the problem went away
My hunch was that internally Cognito has a very short timeout period for executing the trigger, and if the trigger didn't reply in time, it would automatically retry.
We ended up having to add logic to our trigger to test for this scenario so that we weren't duplicating writes to our database.

Related

Why Does my AWS lambda function randomly fail when using private elasticache network calls as well as external API calls?

I am trying to write a caching function that returns cached elasticcache data or makes an api call to retrieve that data. However, the lambda function seems to be very unrealiable and timing out often.
It seems that the issue is having redis calls as well as public api calls causes the issue. I can confirm that I have setup aws correctly with a subnet with an internet gateway and a private subnet with a nat gateway. The function works, but lonly 10 % of the time.The remaining times exceution is stopped right before making the API call.
I have also noticed that the api calls fail after creating the redis client. If I make the external api call prior to making the redis check it seems the function is a lot more reliable and doesn't time out.
Not sure what to do. Is it best practice to seperate these 2 tasks or am I doing something wrong?
let data = null;
module.exports.handler = async (event) => {
//context.callbackWaitsForEmptyEventLoop = false;
let client;
try {
client = new Redis(
6379,
"redis://---.---.ng.0001.use1.cache.amazonaws.com"
);
client.get(event.token, async (err, result) => {
if (err) {
console.error(err);
} else {
data = result;
await client.quit();
}
});
if (data && new Date().getTime() / 1000 - eval(data).timestamp < 30) {
res.send(`({
"address": "${token}",
"price": "${eval(data).price}",
"timestamp": "${eval(data).timestamp}"
})`);
} else {
getPrice(event); //fetch api data
}
```
There a lot of misunderstand in your code. I'll try to guide you to fix it and understand how to do that correctly.
You are mixing asynchronous and synchronous code in your function.
You should use JSON.parse instead of eval to parse the data because eval allows arbitrary code to be executed in your function
You're using the res.send to return response to the client instead of callback. Remember the usage of res.send is only in express and you're using a lambda and to return the result to client you need to use callback function
To help you in this task, I completely rewrite your code solving these misundersand.
const Redis = require('ioredis');
module.exports.handler = async (event, context, callback) => {
// prefer to use lambda env instead of put directly in the code
const client = new Redis(
"REDIS_PORT_ENV",
"REDIS_HOST_ENV"
);
const data = await client.get(event.token);
client.quit();
const parsedData = JSON.parse(data);
if (parsedDate && new Date().getTime() / 1000 - parsedData.timestamp < 30) {
callback(null, {
address: event.token,
price: parsedData.price,
timestamp: parsedData.timestamp
});
} else {
const dataFromApi = await getPrice(event);
callback(null, dataFromApi);
}
};
There another usage with lambdas that return an object instead of pass a object inside callback, but I think you get the idea and understood your mistakes.
Follow the docs about correctly usage of lambda:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/using-lambda-functions.html
To undestand more about async and sync in javascript:
https://www.freecodecamp.org/news/synchronous-vs-asynchronous-in-javascript/
JSON.parse x eval: JSON.parse vs. eval()

Trying to run a Cloud Function with LRO

Background
I am working on creating an autonomous Google AutoML end<>end system. I created a cloud function that receives a cloud pub/sub message when training starts. The cloud function uses the operation ID to get the operation status of the training. If the training of the model is complete(operation metadata = true), the function will send the model ID to a deployment function and send a pub/sub message with the modelID for the model to be called on prediction from. I found a solution from SO from this post How to programmatically get model id from google-cloud-automl with node.js client library
Problem
The issue I am coming across is with the cloud function timeout of 10 minutes. I wrote this question on reddit on potential solutions. https://www.reddit.com/r/googlecloud/comments/jqr213/cloud_function_to_compute_engine/ The Compute Engine solution seems not practical for a system mainly written in a cloud function environment. While trying to implement the cron job solution, I thought of the retry feature for cloud functions. It keeps the same event and will retry the function for up to a week. The documentation for retry is https://cloud.google.com/functions/docs/bestpractices/retries How could I include a cancel of the function to keep it retrying until it becomes true and completes the deployment and pub/sub message? My thought is to include the ending of the system in the if else statement, I am just struggling to find documentation of this/ if it would actually work.
Code
const {AutoMlClient} = require('#google-cloud/automl').v1;
// Instantiates a client
const client = new AutoMlClient();
exports.helloPubSub = (event, context) => {
//Imports the Google Cloud AutoML library
const message = event.data
? Buffer.from(event.data, 'base64').toString()
: 'Hello, World';
const model = message;
console.log(model);
const modelpath = message.replace('"','');
const modelID = modelpath.replace('"','');
const message1 = model.replace('projects/170974376642/locations/us-central1/operations/','');
const message2 = message1.replace('"','');
const message3 = message2.replace('"','');
console.log(`Operation ID is: ${message3}`)
getOperationStatus(message3, modelID);
}
// [START automl_vision_classification_deploy_model_node_count]
async function getOperationStatus(opId, message) {
console.log('Starting operation status');
const opped = opId;
const data = message;
const projectId = '170974376642';
const location = 'us-central1';
const operationId = opId;
// Construct request
const request = {
name: `${message}`,
};
console.log('Made it to the response');
const [response] = await client.operationsClient.getOperation(request);
console.log(`Name: ${response.name}`);
console.log(`Operation details:`);
var apple = JSON.stringify(response);
console.log(apple);
console.log('Loop until the model is ready to deploy');
if (apple.includes('True')) {
const appleF = apple.replace((/projects\/[a-zA-Z0-9-]*\/locations\/[a-zA-Z0-9-]*\/models\//,''));
deployModelWithNodeCount(appleF);
pubSub(appleF);
} else {
getOperationStatus(opped, data);
}
}
async function pubSub(id) {
const topicName = 'modelID';
const data = JSON.stringify({foo: `${id}`});
async function publishMessage() {
// Publishes the message as a string, e.g. "Hello, world!" or JSON.stringify(someObject)
const dataBuffer = Buffer.from(data);
try {
const messageId = await pubSubClient.topic(topicName).publish(dataBuffer);
console.log(`Message ${messageId} published.`);
} catch (error) {
console.error(`Received error while publishing: ${error.message}`);
process.exitCode = 1;
}
}
publishMessage();
// [END pubsub_publish_with_error_handler]
// [END pubsub_quickstart_publisher]
process.on('unhandledRejection', err => {
console.error(err.message);
process.exitCode = 1;
});
}
async function deployModelWithNodeCount(message) {
const projectId = 'ireda1';
const location = 'us-central1';
const modelId = message;
// Construct request
const request = {
name: client.modelPath(projectId, location, modelId),
imageClassificationModelDeploymentMetadata: {
nodeCount: 1,
},
};
const [operation] = await client.deployModel(request);
// Wait for operation to complete.
const [response] = await operation.promise();
console.log(`Model deployment finished. ${response}`);
}
// [END automl_vision_classification_deploy_model_node_count]
There are several improvements that you can consider for your code. First of all, it is important to understand that Cloud Functions are short-lived. 9 minutes is the maximum, your function will be active. Cloud Functions are not meant for background operations, if you are looking at a solution, which can be executed in the background and requires minimal infrastructure, I would recommend having a look at Cloud Run.
Now lets have a look at some parts of the code and how it can be improved with a different architecture maintaining Cloud Functions and PubSub as the backbone.
Waiting on model deployment
The code you use is:
if (apple.includes('True')) {
const appleF = apple.replace((/projects\/[a-zA-Z0-9-]*\/locations\/[a-zA-Z0-9-]*\/models\//,''));
deployModelWithNodeCount(appleF);
pubSub(appleF);
} else {
getOperationStatus(opped, data);
}
First of all, I would strongly suggest not to use recursion here, because a) this can be handled via a simple loop, b) you are bombarding the service without any time out or back-off policy. The latter might result in either your service crashing or endpoint starting to reject your requests.
To improve your code, you can for example set at least timeout function, like this:
setTimeout(getOperationStatus(opped, data), 1000)
For readability, I would also suggest just to use a loop in the future since you are using async patterns anyways:
status = getOperationStatus(opped, data);
while(!status){
await new Promise(t => setTimeout(t, 1000));
status = getOperationStatus(opped, data);
}
In this case, you need to separate it into two functions - 1) getOperationStatus, which actually just return status, and 2) waitForDeployment, which polls for the status, compares it with the expected result, and decides to a) wait & retry or b) abandon & return
This might make your code better, but does not solve the fundamental problem of the system design. To understand this, let's have a look a splitting responsibility and structuring the system differently. As a side note, the guide here is not meant for a Cloud Function application.
A few explanations:
Activation Function initializes the entire process, it calls the Vision Auto ML to start the deployment. It only gets the ID of the operation and pushes it to the queue
Cloud Scheduler pushes a trigger to PubSub (alternatively it can also call the function as an endpoint) every X minutes/seconds saying that it is time to check on the progress
Polling Function once triggered ask for the next ID to check, queries Cloud AutoML and if finished, acknowledges the message and writes the results, otherwise exits. You need to be careful with the configuration of acknowledgments here. Useful information is here
Polling of the status
The minor thing I have noticed is how you are polling the status. Why don't your just query this URL GET https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id and get status of done (check here for details)
Conclusion: Cloud Functions are short-lived and must handle only one operation at a time, no waiting. If you want a simple loop for waiting for results, use Cloud Run

Lambda function only putting one data point into InfluxDB

I have a Lambda function that is designed to take a message from a SQS queue and then input a value called perf_value which is just an integer. The CloudWatch logs show it firing each time and logging Done as seen in the .then() block of my write point. With it firing each time I am still only seeing a single data point in InfluxDB Cloud. I can't figure out why it is only inputting a single value then nothing after that. I don't see a backlog in SQS and no error messages in CloudWatch either. I'm guessing it is a code issue or InfluxDB Cloud setup though I used defaults which you would expect to actually work for multiple data points
'use strict';
const {InfluxDB, Point, HttpError} = require('#influxdata/influxdb-client')
const InfluxURL = 'https://us-west-2-1.aws.cloud2.influxdata.com'
const token = '<my token>=='
const org = '<my org>'
const bucket= '<bucket name>'
const writeApi = new InfluxDB({url: InfluxURL, token}).getWriteApi(org, bucket, 'ms')
module.exports.perf = function (event, context, callback) {
context.callbackWaitsForEmptyEventLoop = false;
let input = JSON.parse(event.Records[0].body);
console.log(input)
const point = new Point('elapsedTime')
.tag(input.monitorID, 'monitorID')
.floatField('elapsedTime', input.perf_value)
// .timestamp(input.time)
writeApi.writePoint(point)
writeApi
.close()
.then(() => {
console.log('Done')
})
.catch(e => {
console.error(e)
if (e instanceof HttpError && e.statusCode === 401) {
console.log('Unauthorized request')
}
console.log('\nFinished ERROR')
})
return true
};
EDIT**
Still have been unable to resolve the issue. I can get one datapoint to go into the influxdb and then nothing will show up.
#Joshk132 -
I believe the problem is here:
writeApi
.close() // <-- here
.then(() => {
console.log('Done')
})
You are closing the API client object after the first write so you are only able to write once. You can use flush() instead if you want to force sending the Point immediately.

Get all messages from AWS SQS in NodeJS

I have the following function that gets a message from aws SQS, the problem is I get one at a time and I wish to get all of them, because I need to check the ID for each message:
function getSQSMessages() {
const params = {
QueueUrl: 'some url',
};
sqs.receiveMessage(params, (err, data) => {
if(err) {
console.log(err, err.stack)
return(err);
}
return data.Messages;
});
};
function sendMessagesBack() {
return new Promise((resolve, reject) => {
if(Array.isArray(getSQSMessages())) {
resolve(getSQSMessages());
} else {
reject(getSQSMessages());
};
});
};
The function sendMessagesBack() is used in another async/await function.
I am not sure how to get all of the messages, as I was looking on how to get them, people mention loops but I could not figure how to implement it in my case.
I assume I have to put sqs.receiveMessage() in a loop, but then I get confused on what do I need to check and when to stop the loop so I can get the ID of each message?
If anyone has any tips, please share.
Thank you.
I suggest you to use the Promise api, and it will give you the possibility to use async/await syntax right away.
const { Messages } = await sqs.receiveMessage(params).promise();
// Messages will contain all your needed info
await sqs.sendMessage(params).promise();
In this way, you will not need to wrap the callback API with Promises.
SQS doesn't return more than 10 messages in the response. To get all the available messages, you need to call the getSQSMessages function recursively.
If you return a promise from getSQSMessages, you can do something like this.
getSQSMessages()
.then(data => {
if(!data.Messages || data.Messages.length === 0){
// no messages are available. return
}
// continue processing for each message or push the messages into array and call
//getSQSMessages function again.
});
You can never be guaranteed to get all the messages in a queue, unless after you get some of them, you delete them from the queue - thus ensuring that the next requests returns a different selection of records.
Each request will return 'upto' 10 messages, if you don't delete them, then there is a good chance that the next request for 'upto' 10 messages will return a mix of messages you have already seen, and some new ones - so you will never really know when you have seen them all.
It maybe that a queue is not the right tool to use in your use case - but since I don't know your use case, its hard to say.
I know this is a bit of a necro but I landed here last night while trying to pull some all messages from a dead letter queue in SQS. While the accepted answer, "you cannot guarantee to get all messages" from the queue is absolutely correct I did want to drop an answer for anyone that may land here as well and needs to get around the 10 message limit per request from AWS.
Dependencies
In my case I have a few dependencies already in my project that I used to make life simpler.
lodash - This is something we use in our code for help making things functional. I don't think I used it below but I'm including it since it's in the file.
cli-progress - This gives you a nice little progress bar on your CLI.
Disclaimer
The below was thrown together during troubleshooting some production errors integrating with another system. Our DLQ messages contain some identifiers that I need in order to formulate cloud watch queries for troubleshooting. Given that these are two different GUIs in AWS switching back and forth is cumbersome given that our AWS session are via a form of federation and the session only lasts for one hour max.
The script
#!/usr/bin/env node
const _ = require('lodash');
const aswSdk = require('aws-sdk');
const cliProgress = require('cli-progress');
const queueUrl = 'https://[put-your-url-here]';
const queueRegion = 'us-west-1';
const getMessages = async (sqs) => {
const resp = await sqs.receiveMessage({
QueueUrl: queueUrl,
MaxNumberOfMessages: 10,
}).promise();
return resp.Messages;
};
const main = async () => {
const sqs = new aswSdk.SQS({ region: queueRegion });
// First thing we need to do is get the current number of messages in the DLQ.
const attributes = await sqs.getQueueAttributes({
QueueUrl: queueUrl,
AttributeNames: ['All'], // Probably could thin this down but its late
}).promise();
const numberOfMessage = Number(attributes.Attributes.ApproximateNumberOfMessages);
// Next we create a in-memory cache for the messages
const allMessages = {};
let running = true;
// Honesty here: The examples we have in existing code use the multi-bar. It was about 10PM and I had 28 DLQ messages I was looking into. I didn't feel it was worth converting the multi-bar to a single-bar. Look into the docs on the github page if this is really a sticking point for you.
const progress = new cliProgress.MultiBar({
format: ' {bar} | {name} | {value}/{total}',
hideCursor: true,
clearOnComplete: true,
stopOnComplete: true
}, cliProgress.Presets.shades_grey);
const progressBar = progress.create(numberOfMessage, 0, { name: 'Messages' });
// TODO: put in a time limit to avoid an infinite loop.
// NOTE: For 28 messages I managed to get them all with this approach in about 15 seconds. When/if I cleanup this script I plan to add the time based short-circuit at that point.
while (running) {
// Fetch all the messages we can from the queue. The number of messages is not guaranteed per the AWS documentation.
let messages = await getMessages(sqs);
for (let i = 0; i < messages.length; i++) {
// Loop though the existing messages and only copy messages we have not already cached.
let message = messages[i];
let data = allMessages[message.MessageId];
if (data === undefined) {
allMessages[message.MessageId] = message;
}
}
// Update our progress bar with the current progress
const discoveredMessageCount = Object.keys(allMessages).length;
progressBar.update(discoveredMessageCount);
// Give a quick pause just to make sure we don't get rate limited or something
await new Promise((resolve) => setTimeout(resolve, 1000));
running = discoveredMessageCount !== numberOfMessage;
}
// Now that we have all the messages I printed them to console so I could copy/paste the output into LibreCalc (excel-like tool). I split on the semicolon for rows out of habit since sometimes similar scripts deal with data that has commas in it.
const keys = Object.keys(allMessages);
console.log('Message ID;ID');
for (let i = 0; i < keys.length; i++) {
const message = allMessages[keys[i]];
const decodedBody = JSON.parse(message.Body);
console.log(`${message.MessageId};${decodedBody.id}`);
}
};
main();

Firestore trigger timeouts occasionally

I have a Cloud Firestore trigger that takes care of adjusting the balance of a user's wallet in my app.
exports.onCreateTransaction = functions.firestore
.document('accounts/{accountId}/transactions/{transactionId}')
.onCreate(async (snap, context) => {
const { accountId, transactionId } = context.params;
const transaction = snap.data();
// See the implementation of alreadyTriggered in the next code block
const alreadyTriggered = await firestoreHelpers.triggers.alreadyTriggered(context);
if (alreadyTriggered) {
return null;
}
if (transaction.status === 'confirmed') {
const accountRef = firestore
.collection('accounts')
.doc(accountId);
const account = (await accountRef.get()).data();
const balance = transaction.type === 'deposit' ?
account.balance + transaction.amount :
account.balance - transaction.amount;
await accountRef.update({ balance });
}
return snap.ref.update({ id: transactionId });
});
As a trigger may actually be called more than once, I added this alreadyTriggered helper function:
const alreadyTriggered = (event) => {
return firestore.runTransaction(async transaction => {
const { eventId } = event;
const metaEventRef = firestore.doc(`metaEvents/${eventId}`);
const metaEvent = await transaction.get(metaEventRef);
if (metaEvent.exists) {
console.error(`Already triggered function for event: ${eventId}`);
return true;
} else {
await transaction.set(metaEventRef, event);
return false;
}
})
};
Most of the time everything works as expected. However, today I got a timeout error which caused data inconsistency in the database.
Function execution took 60005 ms, finished with status: 'timeout'
What was the reason behind this timeout? And how do I make sure that it never happens again, so that my transaction amounts are successfully reflected in the account balance?
That statement about more-than-once execution was a beta limitation, as stated. Cloud Functions is out of beta now. The current guarantee is at-least-once execution by default. you only get multiple possible events if you enable retries in the Cloud console. This is something you should do if you want to make sure your events are processed reliably.
The reason for the timeout may never be certain. There could be any number of reasons. Perhaps there was a hiccup in the network, or a brief amount of downtime somewhere in the system. Retries are supposed to help you recover from these temporary situations by delivering the event potentially many times, so your function can succeed.

Resources