Inform browser clients when Lambda function is done using Amazon SQS - node.js

In my scenario I'm trying to implement a serverless backend that runs long, time-consuming calculations. These calculations are managed by a Lambda function that calls an external API.
To request this I'm using Amazon API Gateway, which has a 10-second execution limit. However, the Lambda function runs for about 100 seconds.
To get around this limit I'm using a second Lambda function to start the time-consuming calculation and report that it has started.
It looks very similar to this:
var AWS = require('aws-sdk');
var colors = require('colors');
var functionName = 'really-long'
var lambda = new AWS.Lambda({apiVersion: '2015-03-31'});
var params = {
FunctionName: functionName,
InvocationType: 'Event'
};
lambda.invoke(params, function(err, data) {
if (err) console.log(err, err.stack); // an error occurred
else console.log(functionName.green + " was successfully executed and returned:\n" + JSON.stringify(data, null, 2).gray); // successful response
});
console.log("All done!".rainbow);
This code is executed through AWS API Gateway by thousands of client browsers independently.
To inform each particular client that its Lambda function execution has finished successfully, I've planned to use AWS SQS (because of long polling and some other useful functionality out of the box).
So my question is:
How can I determine on the client which message in the queue belongs to this particular client? Or should every client browser iterate over the whole queue to find its messages by some request ID parameter? I guess this method will be inefficient when 1000 clients are waiting for their results simultaneously.
I do understand that I can write results to DynamoDB, for example, and periodically poll the DB for the result via some homemade API. But is there an elegant solution, based on some Amazon PaaS offering, to notify a browser-based client when a time-consuming Lambda function has finished executing?

Honestly the DynamoDB route is probably your best bet. You can generate a uuid in the first Lambda function executed by the API Gateway. Pass that uuid to the long-running Lambda function. Before the second function completes have it write to a DynamoDB table with two columns: uuid and result.
The API Gateway responds to the client with the uuid it generated. The client then long-polls with a getItem request against your DynamoDB table (either via the aws-sdk directly or through another API Gateway request). Once it responds successfully, remove said item from the DynamoDB table.
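The client-side polling described above could look roughly like the following. This is only a sketch: `pollForResult`, `fetchResult`, and the default interval and attempt values are hypothetical names and numbers chosen for illustration, and `fetchResult` stands in for whatever actually reads the item (a getItem call or a request to another API Gateway endpoint).

```javascript
// Polls until the result for `id` is available or the attempts run out.
// `fetchResult` is a hypothetical async function: it resolves to the stored
// result, or to undefined while the long-running Lambda is still working.
async function pollForResult(fetchResult, id, options = {}) {
  const { intervalMs = 2000, maxAttempts = 60 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchResult(id);
    if (result !== undefined) {
      return result;
    }
    // Wait before asking again, except after the last attempt.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error('No result for ' + id + ' after ' + maxAttempts + ' attempts');
}
```

Capping the number of attempts keeps a browser tab from polling forever if the long-running function fails and never writes its item.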

The context object of the Lambda function holds the AWS request ID, which is returned to the client that invoked the Lambda function.
So the client will have the request ID of Lambda 1, and the context object of Lambda 1 will have the same request ID (it stays the same across Lambda retries). Pass this request ID to Lambda 2, so that the original request ID is chained through to the end.
Polling with the request ID from the client is fairly easy against any data store, such as DynamoDB.
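A minimal sketch of chaining the request ID from Lambda 1 to Lambda 2 follows. It assumes an aws-sdk v2 style Lambda client passed in as `lambdaClient` and a runtime that supports async handlers; `makeStarterHandler`, `workerName`, and the payload shape are hypothetical names used only for illustration.

```javascript
// Builds the handler for "Lambda 1". `lambdaClient` is assumed to be an
// aws-sdk v2 style client (e.g. new AWS.Lambda()); `workerName` is hypothetical.
function makeStarterHandler(lambdaClient, workerName) {
  return async function handler(event, context) {
    const requestId = context.awsRequestId; // ID assigned by Lambda to this invocation
    await lambdaClient.invoke({
      FunctionName: workerName,
      InvocationType: 'Event', // asynchronous: do not wait for the worker to finish
      Payload: JSON.stringify({ requestId: requestId, input: event })
    }).promise();
    // The client keeps this ID and uses it as the key when polling for the result.
    return { requestId: requestId };
  };
}
```

Lambda 2 would then read `requestId` from its event and store its result under that key.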

Related

How to process data from AWS SQS?

I have trouble understanding how SQS from AWS works in Node.js. I'm creating a simple endpoint which receives data sent to my server and adds it to an SQS queue:
export const receiveMessageFromSDK = async (req: Request, res: Response) => {
const payload = req.body;
try {
await sqs.sendMessage({
QueueUrl: process.env.SQS_QUEUE_URL,
MessageBody: payload
}).promise();
} catch (error) {
//something else
}
}
And OK, it works: the MessageBody is added to my SQS queue. But now I have a problem with processing the data from this queue... How do I do this?
In Google Cloud I create a simple request queue and send (in the body payload) the URL of the endpoint that should process data from the queue; gcloud then sends the body to this endpoint and my server starts the business logic on this data. But how do I do this in SQS? How do I receive data from this queue and start processing it on my side?
I was thinking about the QueueUrl param, but the docs say this value should be a URL from the AWS console like https://sqs.eu-central-1.amazonaws.com... so I have no idea how to process the data from the queue on my side.
So, can anybody help me?
Thanks in advance for any help!
Should I launch this function on a cron, or can AWS, e.g., send a request with this data to an endpoint I provide?
SQS is for pulling data, which means that you need a cron job (or an equivalent mechanism in your app) that repeatedly polls your queue for messages using receiveMessage, e.g. every 10 seconds.
If you want a push-type messaging system, then you have to use SNS instead of SQS. SNS can push messages to your HTTP/HTTPS endpoint automatically if your application exposes such an endpoint for the messages.
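The pull approach could be sketched as a single polling pass like the one below. It assumes an aws-sdk v2 style SQS client passed in as `sqs`; `pollOnce` and `handleMessage` are hypothetical names, and `handleMessage` stands in for your own business logic.

```javascript
// Receives one batch of messages, processes each, and deletes the ones that succeed.
// `sqs` is assumed to be an aws-sdk v2 style client (e.g. new AWS.SQS());
// `handleMessage` is a hypothetical async function holding the business logic.
async function pollOnce(sqs, queueUrl, handleMessage) {
  const response = await sqs.receiveMessage({
    QueueUrl: queueUrl,
    MaxNumberOfMessages: 10, // SQS returns at most 10 messages per call
    WaitTimeSeconds: 20      // long polling: wait up to 20 s for messages to arrive
  }).promise();
  const messages = response.Messages || []; // Messages is absent when the queue is empty
  let processed = 0;
  for (const message of messages) {
    try {
      await handleMessage(message.Body);
      // Delete only after successful processing; otherwise SQS redelivers the
      // message once its visibility timeout expires.
      await sqs.deleteMessage({
        QueueUrl: queueUrl,
        ReceiptHandle: message.ReceiptHandle
      }).promise();
      processed++;
    } catch (err) {
      console.error('Failed to process message', message.MessageId, err);
    }
  }
  return processed;
}
```

Because `WaitTimeSeconds` enables long polling, calling `pollOnce` in a loop from a long-lived worker may be a reasonable alternative to a fixed-interval cron job.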

Firebase cloud functions - what happens with multiple HTTP triggers at once

I have a firebase cloud function that is an endpoint for an external API, and it handles a POST request.
This external API POSTs data to my cloud function endpoint at random intervals (the function gets pinged with a POST request whenever a result is returned from the external API; there can be multiple at once, and it's unpredictable).
exports.handleResults = functions.https.onRequest((req, res) => {
if (req.method === 'POST') {
// run code here that handles the POST payload
}
})
What happens when there is more than one POST request that come in at the same time?
Is there a queue? Does it finish the first request before moving on to the next?
Or if another request comes in while the function is running, does it block/ignore the request until the function is done?
Cloud Functions will automatically scale up the server instances running your functions when it determines that more capacity is needed. Those instances will run your function concurrently. The instances will be scaled down when they are no longer needed. The exact behavior is not documented - it should be considered an implementation detail that may change over time.
To learn more about this, watch my video about Cloud Functions scaling and isolation.

Why does my Lambda function time out even though the API Gateway callback has already been called?

I have an AWS API Gateway method that proxies requests through to AWS Lambda. However, it errors after three seconds with the following in the logs:
Endpoint response body before transformations: {"errorMessage":"2017-09-05T16:30:49.987Z 922186c0-9257-11e7-9db3-51921d5597a2 Task timed out after 3.00 seconds"}
Thus, I went on to check my Node 6.10 AWS Lambda function to see why it was timing out. I added logging statements before and after every function call. Surprisingly, it did everything it's supposed to do: it called the API Gateway callback and ran a query against the database after that. All of that takes 0.6s, and as far as I'm aware there's no other code left to run. Nevertheless, it appears to keep running for the rest of the three seconds and then time out. (This is, I think, because I'm leaving a connection to the database open.)
The log statements I placed before and after the callback call indicate that the call is executed in under half a second. Yet that response doesn't seem to make it to API Gateway, whereas the error after three seconds does.
What could be potential reasons for this, and how can I debug it?
By default calling the callback() function in a NodeJS Lambda function does not end the function execution. It will continue running until the event loop is empty. A common issue with NodeJS Lambda functions continuing to run after callback is called occurs when you are holding on to open database connections. You haven't posted any code, so I can't give specific recommendations, but you would need to determine if you are leaving database connections open in your code or something similar.
Alternatively, you can change the behavior such that the execution ends as soon as the callback function is called by setting callbackWaitsForEmptyEventLoop = false on the context object.
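As a rough illustration of that setting, the sketch below sets `callbackWaitsForEmptyEventLoop` at the top of the handler. `makeHandler` and `runQuery` are hypothetical names; `runQuery` stands in for code that uses a database connection which stays open between invocations.

```javascript
// `runQuery` is a hypothetical function that uses a database connection left open
// between invocations and calls back with (err, rows).
function makeHandler(runQuery) {
  return function handler(event, context, callback) {
    // Respond as soon as callback() is called instead of waiting for the event loop to drain.
    context.callbackWaitsForEmptyEventLoop = false;
    runQuery(event, function (err, rows) {
      if (err) {
        return callback(err);
      }
      callback(null, { statusCode: 200, body: JSON.stringify(rows) });
    });
  };
}
```

With the flag set, an open connection no longer holds the invocation until the timeout, and the connection can be reused if the same container handles a later request.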
Your API gateway has a fixed timeout of 29 seconds. Most queries do complete within this timeframe.
Increase your Lambda execution timeout to anywhere between 30 seconds and 3 minutes.
Use context.succeed() instead of callback.
This worked for me.
const mysql = require('mysql');
const connection = mysql.createConnection({
host : 'your_mysql_host',
user : 'your_mysql_user',
password : 'your_mysql_password',
database : 'your_mysql_db'
});
exports.handler = (event, context) => {
  var userId = event.params.querystring.userid;
  // Use a placeholder so the mysql driver escapes userId instead of concatenating it into the SQL.
  const sql = 'SELECT * FROM users WHERE USER_ID = ?';
  var response = {
    "statusCode": 200,
    "body": "body_text_goes_here"
  };
  if (userId) {
    connection.query(sql, [userId], function (error, results, fields) {
      if (error) {
        context.succeed(error);
      } else {
        response.body = results;
        context.succeed(response);
      }
    });
  }
};
I had the same issue. I updated the timeout in my code, but no luck; finally I increased the Lambda execution timeout, and that fixed my problem.
How to increase an AWS Lambda timeout?

Is using PostgreSQL on stateless FaaS like AWS lambda a good idea?

I'd like to use Postgresql as a database on my AWS lambda functions but I'm worried about performance.
I'm worried that Lambdas are stateless and only exist in the time they're executing so I imagine every time the Lambda is triggered it'll try to initiate a brand new PG connection.
I'm not sure if this decreases performance or causes issues with stale connections somehow. Anyone know more about this?
I know DynamoDB is more in line with Lambda but I really need a relational database but at the same time Lambda's scalability.
You can make use of the container execution model of AWS lambda. When a lambda is invoked, AWS spins up a container to run the code inside the handler function. So if you define the PG connection outside the handler function it will be shared among the invocations of Lambda functions. You can find that in the above link.
Any declarations in your Lambda function code (outside the handler code, see Programming Model) remains initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. You can add logic in your code to check if a connection already exists before creating one.
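const pg = require('pg');
// Created outside the handler so a reused container keeps the same connection.
const client = new pg.Client(<connection_string>);
client.connect();
exports.handler = (event, context, cb) => {
  client.query('SELECT * FROM users WHERE ', (err, users) => {
    if (err) return cb(err); // Report query failures to the caller
    // Do stuff with users
    cb(null); // Finish the function cleanly
  });
};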
Refer to this blog post.
But there is a caveat.
When you write your Lambda function code, do not assume that AWS Lambda always reuses the container because AWS Lambda may choose not to reuse the container. Depending on various other factors, AWS Lambda may simply create a new container instead of reusing an existing container.
Additionally, you can create a scheduled job to warm up the Lambda function (running every 5 minutes).
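The advice to check whether a connection already exists before creating one could be sketched as below. This is an illustration under assumptions: `getClient` and `createClient` are hypothetical names, and `createClient` stands in for something like a function returning a new pg.Client.

```javascript
let cachedClient = null; // lives at module scope, so it survives while the container is reused

// `createClient` is a hypothetical factory, e.g. () => new pg.Client(connectionString).
async function getClient(createClient) {
  if (cachedClient === null) {
    const client = createClient();
    await client.connect();
    cachedClient = client; // cache only after connect() has succeeded
  }
  return cachedClient;
}
```

A handler would call `getClient` at the start of each invocation: a fresh container pays the connection cost once, and a reused container gets the cached client back.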

aws lambda execution after callback guaranteed?

My node4 lambda function called via API GW makes a sequence of slow API calls.
In order to not let users wait until everything completes, I'm planning to have my code look like this:
function(event, context, callback) {
...
// Return users API GW call now
callback(null, data);
// Do the heavy lifting afterwards.
longApiCall().then(otherLongApiCalls)
}
But now I read in the AWS docs:
"the callback will wait until the Node.js runtime event loop is empty before freezing the process and returning the results to the caller"
Does that mean the API GW returns the response data before or after the longApiCalls complete?
If after, is there a suggested way for how to "return early" before everything is finished?
In your current configuration API Gateway will wait until the Lambda function has finished executing before sending a response. Your options are:
1. Change the API Gateway endpoint's integration type to AWS Service and have API Gateway invoke the Lambda function asynchronously. This is documented here.
2. Have the Lambda function that API Gateway invokes do nothing but invoke another Lambda function asynchronously and then return.
3. Have API Gateway, or a Lambda function called by API Gateway, send a message to an SNS topic. Then have the SNS topic trigger a Lambda function that handles the long API calls. This would decouple your microservices a bit.
4. Have API Gateway, or a Lambda function called by API Gateway, trigger an AWS Step Function that is configured to handle the long API calls via one or multiple Lambda functions. I would suggest this approach if the long API calls run the risk of running over a single Lambda function's execution time limit of 5 minutes.
Option 5: Let your Lambda function queue a message to SQS, and poll the queue from another Lambda function, EC2, or wherever you want to do the heavy lifting.
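The SQS variant of "return early" might look like the sketch below. It assumes an aws-sdk v2 style SQS client passed in as `sqs` and a runtime that supports async handlers; `makeEnqueueHandler`, `queueUrl`, and the message shape are hypothetical and only meant to illustrate the idea.

```javascript
// `sqs` is assumed to be an aws-sdk v2 style client; `queueUrl` is hypothetical.
function makeEnqueueHandler(sqs, queueUrl) {
  return async function handler(event, context) {
    await sqs.sendMessage({
      QueueUrl: queueUrl,
      MessageBody: JSON.stringify({ requestId: context.awsRequestId, input: event })
    }).promise();
    // Return immediately; a separate worker drains the queue and makes the slow API calls.
    return { statusCode: 202, body: JSON.stringify({ requestId: context.awsRequestId }) };
  };
}
```

The handler waits only for the message to be accepted by the queue, so the API Gateway response does not depend on how long the slow API calls take.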
