Get all messages from AWS SQS in NodeJS

I have the following function that gets a message from AWS SQS. The problem is that I get one at a time, and I wish to get all of them, because I need to check the ID of each message:
function getSQSMessages() {
  const params = {
    QueueUrl: 'some url',
  };
  sqs.receiveMessage(params, (err, data) => {
    if (err) {
      console.log(err, err.stack)
      return(err);
    }
    return data.Messages;
  });
};
function sendMessagesBack() {
  return new Promise((resolve, reject) => {
    if (Array.isArray(getSQSMessages())) {
      resolve(getSQSMessages());
    } else {
      reject(getSQSMessages());
    };
  });
};
The function sendMessagesBack() is used in another async/await function.
I am not sure how to get all of the messages. While researching, I saw people mention loops, but I could not figure out how to implement one in my case.
I assume I have to put sqs.receiveMessage() in a loop, but then I get confused about what I need to check and when to stop the loop so I can get the ID of each message.
If anyone has any tips, please share.
Thank you.

I suggest you use the Promise API, which gives you the possibility to use async/await syntax right away.
const { Messages } = await sqs.receiveMessage(params).promise();
// Messages will contain all your needed info
await sqs.sendMessage(params).promise();
In this way, you will not need to wrap the callback API with Promises.
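For example, the getSQSMessages from the question could be rewritten like this (a sketch; 'some url' is the question's placeholder):
async function getSQSMessages() {
  const params = {
    QueueUrl: 'some url',
  };
  // .promise() turns the SDK call into a promise we can await directly.
  const { Messages } = await sqs.receiveMessage(params).promise();
  return Messages;
}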

SQS doesn't return more than 10 messages in the response. To get all the available messages, you need to call the getSQSMessages function recursively.
If you return a promise from getSQSMessages, you can do something like this.
getSQSMessages()
  .then(data => {
    if (!data.Messages || data.Messages.length === 0) {
      // no messages are available. return
      return;
    }
    // continue processing each message, or push the messages into an array
    // and call getSQSMessages again.
  });
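Putting it together, here is a sketch of the recursive accumulation (assuming getSQSMessages resolves with the raw receiveMessage response); note the caveat in the next answer about duplicates if you never delete the messages:
// Sketch: collect messages recursively until a receive returns none.
async function getAllMessages(collected = []) {
  const data = await getSQSMessages();
  if (!data.Messages || data.Messages.length === 0) {
    return collected; // no more messages available
  }
  return getAllMessages(collected.concat(data.Messages));
}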

You can never be guaranteed to get all the messages in a queue, unless after you get some of them, you delete them from the queue - thus ensuring that the next request returns a different selection of records.
Each request will return 'up to' 10 messages; if you don't delete them, there is a good chance that the next request for 'up to' 10 messages will return a mix of messages you have already seen and some new ones - so you will never really know when you have seen them all.
It may be that a queue is not the right tool for your use case - but since I don't know your use case, it's hard to say.

I know this is a bit of a necro, but I landed here last night while trying to pull all the messages from a dead letter queue in SQS. While the accepted answer ("you cannot guarantee to get all messages" from the queue) is absolutely correct, I wanted to drop an answer for anyone who lands here as well and needs to get around the 10-message limit per request from AWS.
Dependencies
In my case I have a few dependencies already in my project that I used to make life simpler.
lodash - This is something we use in our code to help make things functional. I don't think I actually used it below, but I'm including it since it's in the file.
cli-progress - This gives you a nice little progress bar on your CLI.
Disclaimer
The below was thrown together while troubleshooting some production errors integrating with another system. Our DLQ messages contain some identifiers that I need in order to formulate CloudWatch queries for troubleshooting. These are two different GUIs in AWS, so switching back and forth is cumbersome, especially since our AWS sessions are via a form of federation and a session lasts one hour max.
The script
#!/usr/bin/env node
const _ = require('lodash');
const awsSdk = require('aws-sdk');
const cliProgress = require('cli-progress');

const queueUrl = 'https://[put-your-url-here]';
const queueRegion = 'us-west-1';

const getMessages = async (sqs) => {
  const resp = await sqs.receiveMessage({
    QueueUrl: queueUrl,
    MaxNumberOfMessages: 10,
  }).promise();
  return resp.Messages;
};

const main = async () => {
  const sqs = new awsSdk.SQS({ region: queueRegion });
  // First thing we need to do is get the current number of messages in the DLQ.
  const attributes = await sqs.getQueueAttributes({
    QueueUrl: queueUrl,
    AttributeNames: ['All'], // Could probably thin this down, but it was late
  }).promise();
  const numberOfMessage = Number(attributes.Attributes.ApproximateNumberOfMessages);

  // Next we create an in-memory cache for the messages.
  const allMessages = {};
  let running = true;

  // Honesty here: the examples we have in existing code use the multi-bar. It was
  // about 10 PM and I had 28 DLQ messages I was looking into, so I didn't feel it
  // was worth converting the multi-bar to a single-bar. Look into the docs on the
  // GitHub page if this is really a sticking point for you.
  const progress = new cliProgress.MultiBar({
    format: ' {bar} | {name} | {value}/{total}',
    hideCursor: true,
    clearOnComplete: true,
    stopOnComplete: true
  }, cliProgress.Presets.shades_grey);
  const progressBar = progress.create(numberOfMessage, 0, { name: 'Messages' });

  // TODO: put in a time limit to avoid an infinite loop.
  // NOTE: For 28 messages I managed to get them all with this approach in about
  // 15 seconds. When/if I clean up this script I plan to add the time-based
  // short-circuit at that point.
  while (running) {
    // Fetch all the messages we can from the queue. The number of messages is
    // not guaranteed, per the AWS documentation.
    let messages = await getMessages(sqs);
    for (let i = 0; i < messages.length; i++) {
      // Loop through the messages and only copy ones we have not already cached.
      let message = messages[i];
      let data = allMessages[message.MessageId];
      if (data === undefined) {
        allMessages[message.MessageId] = message;
      }
    }
    // Update our progress bar with the current progress.
    const discoveredMessageCount = Object.keys(allMessages).length;
    progressBar.update(discoveredMessageCount);
    // Give a quick pause just to make sure we don't get rate limited or something.
    await new Promise((resolve) => setTimeout(resolve, 1000));
    running = discoveredMessageCount !== numberOfMessage;
  }

  // Now that we have all the messages, I printed them to the console so I could
  // copy/paste the output into LibreCalc (an Excel-like tool). I split on the
  // semicolon for rows out of habit, since similar scripts sometimes deal with
  // data that has commas in it.
  const keys = Object.keys(allMessages);
  console.log('Message ID;ID');
  for (let i = 0; i < keys.length; i++) {
    const message = allMessages[keys[i]];
    const decodedBody = JSON.parse(message.Body);
    console.log(`${message.MessageId};${decodedBody.id}`);
  }
};

main();

Related

Ability to provide insights from Redis Bull Queue data

I have an application that makes API calls to another system, and it queues these API calls in a queue using Bull and Redis.
However, occasionally it gets bogged down with lots of API calls, or something stops working properly, and I want an easy way for users to check whether the system is just "busy". (Otherwise, if they perform some action and 10 minutes later it hasn't completed, they'll keep trying it again, and then we get a backlog of more entries, and in some cases data issues where they've issued duplicate parts, etc.)
Here's what a single "key" looks like for a successful API call in the queue:
HSET "bull:webApi:4822" "timestamp" "1639085540683"
HSET "bull:webApi:4822" "returnvalue" "{"id":"e1df8bb4-fb6c-41ad-ba62-774fe64b7882","workOrderNumber":"WO309967","status":"success"}"
HSET "bull:webApi:4822" "processedOn" "1639085623027"
HSET "bull:webApi:4822" "data" "{"id":"e1df8bb4-fb6c-41ad-ba62-774fe64b7882","token":"eyJ0eXAiOiJKV1QiL....dQVyEpXt64Fznudfg","workOrder":{"members":{"lShopFloorLoad":true,"origStartDate":"2021-12-09T00:00:00","origRequiredQty":2,"requiredQty":2,"requiredDate":"2021-12-09T00:00:00","origRequiredDate":"2021-12-09T00:00:00","statusCode":"Released","imaItemName":"Solid Pin - Black","startDate":"2021-12-09T00:00:00","reference":"HS790022053","itemId":"13840402"}},"socketId":"3b9gejTZjAXsnEITAAvB","type":"Create WO"}"
HSET "bull:webApi:4822" "delay" "0"
HSET "bull:webApi:4822" "priority" "0"
HSET "bull:webApi:4822" "name" "__default__"
HSET "bull:webApi:4822" "opts" "{"lifo":true,"attempts":1,"delay":0,"timestamp":1639085540683}"
HSET "bull:webApi:4822" "finishedOn" "1639085623934"
You can see in this case it took 83 seconds to process (1639085623 - 1639085540).
I'd like to be able to provide summary metrics like:
Most recent API call was added to queue X seconds ago
Most recent successful API call completed X seconds ago and took XX seconds to complete.
I'd also like to be able to provide a list of the 50 most recent API calls, formatted in a nice way and tagged with "success", "pending", or "failed".
I'm fairly new to Redis and Bull, and I'm trying to figure out how to query this data (using Redis in Node.js) and return this data as JSON to the application.
I can pull a list of keys like this:
// #route GET /status
async function status(req, res) {
  const client = createClient({
    url: `redis://${REDIS_SERVER}:6379`
  });
  try {
    client.on('error', (err) => console.log('Redis Client Error', err));
    await client.connect();
    const value = await client.keys('*');
    res.json(value)
  } catch (error) {
    console.log('ERROR getting status: ', error.message, new Date())
    res.status(500).json({ message: error.message })
  } finally {
    client.quit()
  }
}
Which will return ["bull:webApi:3","bull:webApi:1","bull:webApi:2"...]
But how can I pull the values associated to the respective keys?
And how can I find the key with the highest number, and then pull the details for the "last 50". In SQL, it would be like doing a ORDER BY key_number DESC LIMIT 50 - but I'm not sure how to do it in Redis.
I'm a bit late here, but if you're not set on manually digging around in Redis, I think Bull's API, in particular Queue#getJobs(), has everything you need here, and should be much easier to work with. Generally, you shouldn't have to reach into Redis to do any common tasks like this, that's what Bull is for!
If I understand your goal correctly, you should be able to do something like:
const Queue = require('bull')

async function status (req, res) {
  const { listNum = 10 } = req.params
  const api_queue = new Queue('webApi', `redis://${REDIS_SERVER}:6379`)
  const current_timestamp_sec = new Date().getTime() / 1000 // convert to seconds
  const recent_jobs = await api_queue.getJobs(null, 0, listNum)
  const results = recent_jobs.map(job => {
    const processed_on_sec = job.processedOn / 1000
    const finished_on_sec = job.finishedOn / 1000
    return {
      request_data: job.data,
      return_data: job.returnvalue,
      processedOn: processed_on_sec,
      finishedOn: finished_on_sec,
      duration: finished_on_sec - processed_on_sec,
      elapsedSinceStart: current_timestamp_sec - processed_on_sec,
      elapsedSinceFinished: current_timestamp_sec - finished_on_sec
    }
  })
  res.json(results)
}
That will get you the most recent listNum* jobs in your queue. I haven't tested this full code, and I'll leave the error handling and adding of your custom fields to the job data up to you, but the core of it is solid and I think that should meet your needs without ever having to think about how Bull stores things in Redis.
I also included a suggestion on how to deal with timestamps a bit more nicely: you don't need to do string processing to convert milliseconds to seconds. If you need them to be integers, you can wrap them in Math.floor().
* at least that many, anyway - see the second note below
A couple notes:
The first argument of getJobs() is a list of statuses, so if you want to look at just completed jobs, you can pass ['completed'], or completed and active, do ['completed', 'active'], etc. If no list is provided (null) it defaults to all statuses.
As mentioned in the reference I linked, the limit here is per state - so you'll likely get more than listNum jobs back. It doesn't seem like that should be a problem for your use case, but if it is, you can sort the list returned (probably by job id) and just return the first listNum, as in the sketch after these notes - you're guaranteed to get at least that many (assuming there are that many jobs in your queue), and won't get more than 6*listNum (since there are 6 states).
Folks new to Bull can get nervous about instantiating a Queue object to do stuff like this - but don't be! By itself a Queue instance doesn't do anything, it's just an interface to the given queue. It won't start processing jobs until you call process() to add a processor. This is, incidentally, also how you'd add jobs from a separate process than you run your queues in, but of course nothing will be added unless you call add().
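For that per-state overshoot, a sketch of the trim (assuming the auto-generated Bull job ids are numeric strings, so they sort numerically):
// Keep only the newest `listNum` jobs across all states.
const trimmed = recent_jobs
  .sort((a, b) => Number(b.id) - Number(a.id)) // highest id = most recent
  .slice(0, listNum)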
So I've figured out how to pull the data I need. I'm not saying it's a good method, and I'm open to suggestions; but it seems to work to provide a filtered JSON return with the needed data, without changing how the queue functions.
Here's what it looks like:
// #route GET /status/:listNum
async function status(req, res) {
  const { listNum = 10 } = req.params
  const client = createClient({
    url: `redis://${REDIS_SERVER}:6379`
  });
  try {
    client.on('error', (err) => console.log('Redis Client Error', err));
    await client.connect();
    // Find size of queue database
    const total_keys = await client.sendCommand(['DBSIZE']);
    const upper_value = total_keys;
    const lower_value = total_keys - listNum;
    // Generate array
    const range = (start, stop) => Array.from({ length: (start - stop) + 1 }, (_, i) => start - i);
    var queue_ids = range(upper_value, lower_value);
    queue_ids = queue_ids.filter(function(x){ return x > 0 }); // Filter out anything that isn't a positive id
    // Current timestamp in seconds
    const current_timestamp = parseInt(String(new Date().getTime()).slice(0, -3)); // drop milliseconds ("now")
    var response = []; // Initialize array
    for (const id of queue_ids) { // Loop through queries
      // Query value
      var value = await client.HGETALL('bull:webApi:' + id);
      if (Object.keys(value).length !== 0) { // if a value was returned
        // Grab most of the request (exclude the token & socketId to save space, not used)
        var request_data = JSON.parse(value.data);
        request_data.token = '';
        request_data.socketId = '';
        // Grab & calculate desired times
        const processedOn = value.processedOn.slice(0, -3); // drop milliseconds ("start")
        const finishedOn = value.finishedOn.slice(0, -3); // drop milliseconds ("done")
        const duration = finishedOn - processedOn; // (seconds)
        const elapsedSinceStart = current_timestamp - processedOn;
        const elapsedSinceFinished = current_timestamp - finishedOn;
        // Grab the return value (note: the hash field is lowercase "returnvalue")
        const return_data = value.returnvalue;
        // ignoring queue keys of: opts, priority, delay, name, timestamp
        const object_data = { request_data: request_data, processedOn: processedOn, finishedOn: finishedOn, return_data: return_data, duration: duration, elapsedSinceStart: elapsedSinceStart, elapsedSinceFinished: elapsedSinceFinished };
        response.push(object_data);
      }
    }
    res.json(response);
  } catch (error) {
    console.log('ERROR getting status: ', error.message, new Date());
    res.status(500).json({ message: error.message });
  } finally {
    client.quit();
  }
}
It's looping the Redis query, so I wouldn't want to use this for hundreds of keys, but for 10 or even 50 I'm thinking it should work.
For now I've resorted to getting the total number of keys and working backwards:
await client.sendCommand(['DBSIZE']);
In my case it will return a total number slightly higher than the highest key id (~ a handful of status keys), but at least gets close, and then I just filter out any non-responses.
I've looked at ZRANGE a bit, but I can't figure out how to get it to give me the last id. When I have a Redis database (Bull queue) with keys like bull:webApi:1, bull:webApi:2, bull:webApi:3, if there's a simple Redis command I can run that will return "3", I'd probably use that instead (since bull:webApi:3 has the highest number).
(In actual use this might be 9555 or some other high number; I just want to get the highest-numbered key that exists.)
For now I'll try using the method I've come up with above.
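If you do stay with raw key scanning, here is a sketch of one way to get the highest-numbered key from Node (it uses KEYS, so treat it as a convenience for small databases only):
// Sketch: find the highest numeric id among the bull:webApi:* keys.
const keys = await client.keys('bull:webApi:*');
const ids = keys
  .map(k => Number(k.split(':')[2]))
  .filter(n => Number.isInteger(n)); // ignore non-numeric suffixes
const highestId = Math.max(...ids);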

Trying to run a Cloud Function with LRO

Background
I am working on creating an autonomous Google AutoML end-to-end system. I created a cloud function that receives a cloud pub/sub message when training starts. The cloud function uses the operation ID to get the operation status of the training. If the training of the model is complete (operation metadata = true), the function will send the model ID to a deployment function and publish a pub/sub message with the model ID so the model can be called on for predictions. I found a solution on SO in this post: How to programmatically get model id from google-cloud-automl with node.js client library
Problem
The issue I am running into is the cloud function timeout of 10 minutes. I wrote a question on Reddit about potential solutions: https://www.reddit.com/r/googlecloud/comments/jqr213/cloud_function_to_compute_engine/ The Compute Engine solution does not seem practical for a system written mainly in a cloud function environment. While trying to implement the cron job solution, I thought of the retry feature for cloud functions: it keeps the same event and will retry the function for up to a week. The documentation for retries is https://cloud.google.com/functions/docs/bestpractices/retries How could I keep the function retrying until the operation metadata becomes true, then complete the deployment and the pub/sub message? My thought is to include the ending of the system in the if/else statement; I am just struggling to find documentation on this and whether it would actually work.
Code
const {AutoMlClient} = require('@google-cloud/automl').v1;
// Note: the original code references pubSubClient without creating it;
// it needs the Pub/Sub client as well, presumably something like:
const {PubSub} = require('@google-cloud/pubsub');
const pubSubClient = new PubSub();

// Instantiates a client
const client = new AutoMlClient();

exports.helloPubSub = (event, context) => {
  const message = event.data
    ? Buffer.from(event.data, 'base64').toString()
    : 'Hello, World';
  const model = message;
  console.log(model);
  const modelpath = message.replace('"', '');
  const modelID = modelpath.replace('"', '');
  const message1 = model.replace('projects/170974376642/locations/us-central1/operations/', '');
  const message2 = message1.replace('"', '');
  const message3 = message2.replace('"', '');
  console.log(`Operation ID is: ${message3}`);
  getOperationStatus(message3, modelID);
}

// [START automl_vision_classification_deploy_model_node_count]
async function getOperationStatus(opId, message) {
  console.log('Starting operation status');
  const opped = opId;
  const data = message;
  const projectId = '170974376642';
  const location = 'us-central1';
  const operationId = opId;
  // Construct request
  const request = {
    name: `${message}`,
  };
  console.log('Made it to the response');
  const [response] = await client.operationsClient.getOperation(request);
  console.log(`Name: ${response.name}`);
  console.log(`Operation details:`);
  var apple = JSON.stringify(response);
  console.log(apple);
  console.log('Loop until the model is ready to deploy');
  if (apple.includes('True')) {
    const appleF = apple.replace(/projects\/[a-zA-Z0-9-]*\/locations\/[a-zA-Z0-9-]*\/models\//, '');
    deployModelWithNodeCount(appleF);
    pubSub(appleF);
  } else {
    getOperationStatus(opped, data);
  }
}

async function pubSub(id) {
  const topicName = 'modelID';
  const data = JSON.stringify({foo: `${id}`});
  async function publishMessage() {
    // Publishes the message as a string, e.g. "Hello, world!" or JSON.stringify(someObject)
    const dataBuffer = Buffer.from(data);
    try {
      const messageId = await pubSubClient.topic(topicName).publish(dataBuffer);
      console.log(`Message ${messageId} published.`);
    } catch (error) {
      console.error(`Received error while publishing: ${error.message}`);
      process.exitCode = 1;
    }
  }
  publishMessage();
  // [END pubsub_publish_with_error_handler]
  // [END pubsub_quickstart_publisher]
  process.on('unhandledRejection', err => {
    console.error(err.message);
    process.exitCode = 1;
  });
}

async function deployModelWithNodeCount(message) {
  const projectId = 'ireda1';
  const location = 'us-central1';
  const modelId = message;
  // Construct request
  const request = {
    name: client.modelPath(projectId, location, modelId),
    imageClassificationModelDeploymentMetadata: {
      nodeCount: 1,
    },
  };
  const [operation] = await client.deployModel(request);
  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(`Model deployment finished. ${response}`);
}
// [END automl_vision_classification_deploy_model_node_count]
There are several improvements you can consider for your code. First of all, it is important to understand that Cloud Functions are short-lived: 9 minutes is the maximum your function will be active. Cloud Functions are not meant for background operations; if you are looking for a solution that can execute in the background and requires minimal infrastructure, I would recommend having a look at Cloud Run.
Now let's have a look at some parts of the code and how it can be improved with a different architecture, maintaining Cloud Functions and PubSub as the backbone.
Waiting on model deployment
The code you use is:
if (apple.includes('True')) {
  const appleF = apple.replace(/projects\/[a-zA-Z0-9-]*\/locations\/[a-zA-Z0-9-]*\/models\//, '');
  deployModelWithNodeCount(appleF);
  pubSub(appleF);
} else {
  getOperationStatus(opped, data);
}
First of all, I would strongly suggest not using recursion here, because a) it can be handled via a simple loop, and b) you are bombarding the service without any timeout or back-off policy. The latter might result in your service crashing or the endpoint starting to reject your requests.
To improve your code, you can at least set a timeout, like this (note the callback must be wrapped in a function, or it will execute immediately):
setTimeout(() => getOperationStatus(opped, data), 1000)
For readability, I would also suggest just using a loop, since you are using async patterns anyway:
let status = await getOperationStatus(opped, data);
while (!status) {
  await new Promise(t => setTimeout(t, 1000));
  status = await getOperationStatus(opped, data);
}
In this case, you need to separate it into two functions: 1) getOperationStatus, which actually just returns the status, and 2) waitForDeployment, which polls the status, compares it with the expected result, and decides to either a) wait & retry or b) abandon & return.
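A sketch of that split, reusing the client.operationsClient.getOperation call from the question's code (treating the long-running operation's done field as the completion flag):
// Sketch: the status check returns a boolean; the waiter owns the loop.
async function getOperationStatus(opId) {
  const [response] = await client.operationsClient.getOperation({ name: opId });
  return response.done === true;
}

async function waitForDeployment(opId) {
  while (!(await getOperationStatus(opId))) {
    await new Promise(t => setTimeout(t, 1000)); // back off between polls
  }
}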
This might make your code better, but it does not solve the fundamental problem of the system design. To understand this, let's have a look at splitting responsibility and structuring the system differently. As a side note, the guide here is not meant for a Cloud Functions application.
A few explanations:
The Activation Function initializes the entire process: it calls Vision AutoML to start the deployment, only gets the ID of the operation, and pushes it to the queue.
Cloud Scheduler pushes a trigger to PubSub (alternatively it can also call the function as an endpoint) every X minutes/seconds, saying that it is time to check on the progress.
The Polling Function, once triggered, asks for the next ID to check, queries Cloud AutoML and, if finished, acknowledges the message and writes the results; otherwise it exits. You need to be careful with the configuration of acknowledgements here. Useful information is here.
Polling of the status
The minor thing I have noticed is how you are polling the status. Why don't you just query GET https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id and read the done field of the response (check here for details)?
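A sketch of that call using google-auth-library for the token (the library choice is an assumption; any authenticated HTTP client works):
const { GoogleAuth } = require('google-auth-library');

async function isOperationDone(projectId, operationId) {
  const auth = new GoogleAuth({ scopes: 'https://www.googleapis.com/auth/cloud-platform' });
  const authedClient = await auth.getClient();
  const url = `https://automl.googleapis.com/v1/projects/${projectId}/locations/us-central1/operations/${operationId}`;
  const res = await authedClient.request({ url });
  return res.data.done === true; // done only appears once the operation finishes
}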
Conclusion: Cloud Functions are short-lived and should handle only one operation at a time, with no waiting. If you want a simple loop that waits for results, use Cloud Run.

Converting Promise.all to gradual promise resolution (every 3 promises, for example) does not work

I have a list of promises, and currently I am using Promise.all to resolve them.
Here is my code for now:
const pageFutures = myQuery.pages.map(async (pageNumber: number) => {
  const urlObject: any = await this._service.getResultURL(searchRecord.details.id, authorization, pageNumber);
  if (!urlObject.url) {
    // throw error
  }
  const data = await rp.get({
    gzip: true,
    headers: {
      "Accept-Encoding": "gzip,deflate",
    },
    json: true,
    uri: `${urlObject.url}`,
  })
  const objects = data.objects.filter((object: any) => object.type === "observed-data" && object.created);
  return new Promise((resolve, reject) => {
    this._resultsDatastore.bulkInsert(
      databaseName,
      objects
    ).then(succ => {
      resolve(succ)
    }, err => {
      reject(err)
    })
  })
})
const all: any = await Promise.all(pageFutures).catch(e => {
  console.log(e)
})
So as you see here I use Promise.all and it works:
const all: any = await Promise.all(pageFutures).catch(e => {
  console.log(e)
})
However, I noticed it affects the database performance-wise, so I decided to resolve them three at a time.
For that I was thinking of different ways, like cwait, async-pool, or writing my own iterator,
but I got confused about how to do that.
For example when I use cwait:
let promiseQueue = new TaskQueue(Promise,3);
const all=new Promise.map(pageFutures, promiseQueue.wrap(()=>{}));
I do not know what to pass inside the wrap, so I pass ()=>{} for now; plus I get:
Property 'map' does not exist on type 'PromiseConstructor'.
So whatever way gets it working (my own iterator or any library) I am OK with, as long as I come away with a good understanding of it.
I appreciate it if anyone can shed light on this and help me out of my confusion.
First some remarks:
Indeed, in your current setup, the database may have to process several bulk inserts concurrently. But that concurrency is not caused by using Promise.all. Even if you had left Promise.all out of your code, it would still behave that way, because the promises have already been created and the database requests will be executed anyway.
Not related to your issue, but don't use the promise constructor antipattern: there is no need to create a promise with new Promise when you already have a promise in your hands: bulkInsert() returns a promise, so return that one.
As your concern is about the database load, I would limit the work initiated by the pageFutures promises to the non-database aspects: they don't have to wait for each other's resolution, so that code can stay like it was.
Let those promises resolve with what you currently store in objects: the data you want to have inserted. Then concatenate all those arrays together to one big array, and feed that to one database bulkInsert() call.
Here is how that could look:
const pageFutures = myQuery.pages.map(async (pageNumber: number) => {
  const urlObject: any = await this._service.getResultURL(searchRecord.details.id,
                                                          authorization, pageNumber);
  if (!urlObject.url) {
    // throw error
  }
  const data = await rp.get({
    gzip: true,
    headers: { "Accept-Encoding": "gzip,deflate" },
    json: true,
    uri: `${urlObject.url}`,
  });
  // Return here, don't access the database yet...
  return data.objects.filter((object: any) => object.type === "observed-data"
                                              && object.created);
});
const all: any = (await Promise.all(pageFutures).catch(e => {
  console.log(e);
  return []; // in case of error, still return an array
})).flat(); // flatten it, so all data chunks are concatenated in one long array

// Don't create a new Promise with `new`, only to wrap another promise.
// It is an antipattern. Use the promise returned by `bulkInsert`,
// and feed it the one concatenated array.
return this._resultsDatastore.bulkInsert(databaseName, all);
This uses .flat(), which is rather new. In case you have no support for it, look at the alternatives provided on MDN.
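For example, a sketch of a fallback without .flat() (here chunks stands for the array of arrays that Promise.all resolves with):
// Flatten one level by concatenating each chunk onto an accumulator.
const flattened = chunks.reduce((acc, chunk) => acc.concat(chunk), []);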
First, you asked a question about a failing solution attempt. That is called an XY problem.
So in fact, as I understand your question, you want to delay some DB requests.
You don't want to delay the resolving of a Promise created by a DB request... No! Don't try that! The promise will resolve when the DB returns a result. It's a bad idea to interfere with that process.
I banged my head a while with the library you tried, but I could not solve your issue with it. So I came up with the idea of just looping the data and setting some timeouts.
I made a runnable demo here: Delaying DB request in small batch
Here is the code. Notice that I simulated some data and a DB request; you will have to adapt it. You also will have to adjust the timeout delay. A full second certainly is too long.
// That part is to simulate some data you would like to save.
// Let's make it a random amount for fun.
let howMuch = Math.ceil(Math.random() * 20)

// A fake data array...
let someData = []
for (let i = 0; i < howMuch; i++) {
  someData.push("Data #" + i)
}
console.log("Some fake data")
console.log(someData)
console.log("")

// So we have some data that looks real. (lol)
// We want to save it in small groups.
// And this is to simulate your DB request.
let saveToDB = (data, dataIterator) => {
  console.log("Requesting DB...")
  return new Promise(function (resolve, reject) {
    resolve("Request #" + dataIterator + " complete.")
  })
}

// Ok, we have everything. Let's proceed!
let batchSize = 3 // The amount of requests to do at once.
let delay = 1000 // The delay between each batch.

// Loop through all the data you have.
for (let i = 0; i < someData.length; i++) {
  if (i % batchSize == 0) {
    console.log("Splitting in batch...")
    // Process a batch on one timeout.
    let timeout = setTimeout(() => {
      // An empty line to clarify the console.
      console.log("")
      // Grouping the requests by "batchSize", or less if we're almost done.
      for (let j = 0; j < batchSize; j++) {
        // If there still is data to process.
        if (i + j < someData.length) {
          // Your real database request goes here.
          saveToDB(someData[i + j], i + j).then(result => {
            console.log(result)
            // Do something with the result.
            // ...
          })
        } // END if there is still data.
      } // END sending requests for that batch.
    }, delay * i) // Timeout delay.
  } // END splitting in batch.
} // END for each data.
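If you'd rather not juggle timeouts, an alternative sketch is to chunk the array and await each batch with Promise.all before starting the next (reusing the saveToDB stub above):
// Sketch: process `data` in sequential batches of `batchSize` concurrent requests.
async function saveInBatches(data, batchSize = 3) {
  for (let i = 0; i < data.length; i += batchSize) {
    const batch = data.slice(i, i + batchSize);
    // Run one batch concurrently, then wait for it before the next one starts.
    const results = await Promise.all(batch.map((d, j) => saveToDB(d, i + j)));
    results.forEach(r => console.log(r));
  }
}

saveInBatches(someData);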

AWS Cognito lambda triggers twice

I'm using an AWS Lambda function (using nodejs).
Whenever a request comes from the app to Cognito to sign up a user, the Pre sign-up trigger I have set validates the user and checks whether the user's custom attribute is available in our database. If it is, it returns an error; otherwise it inserts a new record in the DB and returns the event to Cognito.
TimeoutInfo - 5 min.
It happens on some requests, not all the time, and the RequestId is different each time (it will trigger 3 times sometimes, and twice most of the time).
Lambda trigger code is below.
users/index.js
const handler = async (event, context) => {
  log.info('createUserLambda:start');
  // immediately return once the callback is called to avoid
  // lambda time out because of any open db connections
  context.callbackWaitsForEmptyEventLoop = false;
  return await preUserCreate(event);
};
exports.handler = handler;
users/users.js
export const preUserCreate = async (event) => {
  log.info('preUserCreate:Start');
  let userAttributes = event.request.userAttributes;
  const currentDate = moment().utc().format('YYYY-MM-DD HH:mm:ss');
  try {
    let userParams = {
      'docStatus': 'VRF'
    };
    let docParams = [{
      'docNumber': userAttributes['custom:document_number'] ? userAttributes['custom:document_number'] : '',
      'createdDate': currentDate
    }];
    if (docParams.length && docParams[0].docNumber) {
      let documentExit = await getDocs(docParams[0].docNumber);
      if (documentExit.length) {
        log.info('preUserCreate:Error');
        throw new Error('Document number already exists.');
      }
    }
    let documentRs = await insertDocument(docParams);
    userParams = {
      'did': documentRs[0].id,
      'id': event.userName,
      'createdDate': currentDate,
      'updatedDate': currentDate,
      ...userParams
    };
    let userRs = await insertUser([userParams]);
    if (docParams.length && docParams[0].docNumber) {
      let resultData = await getUserAccountFromAPI(docParams[0].docNumber);
      if (resultData) {
        let formattedData = await formattedAccountsData(resultData, userRs[0].id, documentRs[0].id);
        await insertUserAccounts(formattedData);
      }
    }
    log.info('preUserCreate:Success');
    event.response = {
      'autoConfirmUser': false,
      'autoVerifyPhone': false,
      'autoVerifyEmail': false
    };
    return event;
  } catch (error) {
    log.info('preUserCreate:Error', error);
    throw (error);
  }
}
This likely happens because of the Cognito-imposed execution timeout of 5 seconds for integration Lambdas - and it cannot be changed. Also note that the maximum number of times Cognito will (re-)attempt to call the function is 3.
In the Customizing User Pool Workflows with Lambda Triggers section it states that:
Important
Amazon Cognito invokes Lambda functions synchronously. When called, your Lambda function must respond within 5 seconds. If it does not, Amazon Cognito retries the call. After 3 unsuccessful attempts, the function times out. This 5-second timeout value cannot be changed.
Therefore, to reduce the execution time, it would be worth considering introducing caching where possible, including caching of database connections etc.
Do however note that you have little to no control over how often Lambdas are re-used versus re-launched and you will need to keep this in mind in terms of warm-up times.
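For example, a sketch of connection reuse across warm invocations (createPool is a hypothetical factory for whatever DB client you use; the names are illustrative):
// Create the connection once per container, outside the handler,
// so warm invocations skip the connect cost.
let pool;
const handler = async (event, context) => {
  context.callbackWaitsForEmptyEventLoop = false;
  if (!pool) pool = await createPool(); // hypothetical factory
  return preUserCreate(event, pool);
};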
Any chance you are running your lambda in a VPC? I've seen similar behavior with a Cognito trigger that ran in a VPC when it was cold-started. Once the lambda was warm, the problem went away.
My hunch was that internally Cognito has a very short timeout period for executing the trigger, and if the trigger didn't reply in time, it would automatically retry.
We ended up having to add logic to our trigger to test for this scenario so that we weren't duplicating writes to our database.
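For example, a sketch of such a guard at the top of the trigger (getUserById is a hypothetical lookup against the same DB the question's code inserts into):
// If a retry arrives for a user we already processed, return the same
// successful response instead of inserting again.
const existingUser = await getUserById(event.userName); // hypothetical lookup
if (existingUser) {
  log.info('preUserCreate:retry detected, skipping inserts');
  event.response = {
    'autoConfirmUser': false,
    'autoVerifyPhone': false,
    'autoVerifyEmail': false
  };
  return event;
}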

Better way to write a simple Node redis loop (using ioredis)?

So, I'm still learning the JS/Node way after a long time in other languages.
I have a tiny micro-service that reads from a redis channel, temp stores it in a working channel, does the work, removes it, and moves on. If there is more in the channel it re-runs immediately. If not, it sets a timeout and checks again in 1 second.
It works fine... but timeout polling doesn't seem to be the "correct" way to approach this. And I haven't found much about using BRPOPLPUSH to block (vs. RPOPLPUSH) and wait in Node, or other options like that. (Pub/sub isn't an option here; this is the only listener, and it may not always be listening.)
Here's the short essence of what I'm doing:
var Redis = require('ioredis');
var redis = new Redis();

var redisLoop = function () {
  redis.rpoplpush('channel', 'channel-working').then(function (result) {
    if (result) {
      processJob(result); // do stuff
      // delete the item from the working channel, and check for another item
      redis.lrem('channel-working', 1, result).then(function (result) { });
      redisLoop();
    } else {
      // no items, wait 1 second and try again
      setTimeout(redisLoop, 1000);
    }
  });
};

redisLoop();
I feel like I'm missing something really obvious. Thanks!
BRPOPLPUSH doesn't block Node's event loop; the blocking happens on the Redis client connection. In this instance I think it's exactly what you need to get rid of the polling.
var Redis = require('ioredis');
var redis = new Redis();

var redisLoop = function () {
  redis.brpoplpush('channel', 'channel-working', 0).then(function (result) {
    // because we are using BRPOPLPUSH, the client promise will not resolve
    // until a 'result' becomes available
    processJob(result);
    // delete the item from the working channel, and check for another item
    redis.lrem('channel-working', 1, result).then(redisLoop);
  });
};

redisLoop();
Note that redis.lrem is asynchronous, so you should use lrem(...).then(redisLoop) to ensure that your next tick executes only after the item is successfully removed from channel-working.
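One caveat: if the connection hiccups, the brpoplpush promise rejects and the loop dies. A sketch of a guard (assuming a failed iteration is safe to retry):
var redisLoop = function () {
  redis.brpoplpush('channel', 'channel-working', 0).then(function (result) {
    processJob(result);
    return redis.lrem('channel-working', 1, result);
  }).then(redisLoop, function (err) {
    console.error(err);          // log the failure
    setTimeout(redisLoop, 1000); // brief back-off, then resume the loop
  });
};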
