Why does my Lambda function time out even though the API Gateway callback has already been called? - node.js

I have an AWS API Gateway method that proxies requests through to AWS Lambda. However, it errors after three seconds with the following in the logs:
Endpoint response body before transformations: {"errorMessage":"2017-09-05T16:30:49.987Z 922186c0-9257-11e7-9db3-51921d5597a2 Task timed out after 3.00 seconds"}
So I went on to check my Node 6.10 AWS Lambda function to see why it was timing out. I added logging statements before and after every function call. Surprisingly, it does everything it's supposed to do: it calls the API Gateway callback, and runs a query against the database after that. All of that takes 0.6 s, and as far as I'm aware there is no other code left to run. Nevertheless, it appears to keep running for the rest of the three seconds and then times out. (This is, I think, because I'm leaving a connection to the database open.)
The log statements I placed before and after the callback call indicate that that call executes in under half a second. Yet that response doesn't seem to make it to API Gateway, whereas the error after three seconds does.
What could be potential reasons for this, and how can I debug it?

By default, calling the callback() function in a Node.js Lambda function does not end the function's execution; it will continue running until the event loop is empty. A common cause of Node.js Lambda functions continuing to run after the callback is called is holding on to open database connections. You haven't posted any code, so I can't give specific recommendations, but you would need to determine whether you are leaving database connections open in your code, or something similar.
Alternatively, you can change the behavior so that execution ends as soon as the callback function is called, by setting callbackWaitsForEmptyEventLoop = false on the context object.
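For example, a minimal sketch (runQuery is a hypothetical helper that uses the open database connection):

exports.handler = (event, context, callback) => {
  // Return to the caller as soon as callback() fires, even though the open
  // database connection keeps the event loop non-empty.
  context.callbackWaitsForEmptyEventLoop = false;

  runQuery(event, (err, result) => {
    if (err) return callback(err);
    callback(null, { statusCode: 200, body: JSON.stringify(result) });
  });
};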

Your API Gateway has a fixed timeout of 29 seconds; most queries complete within this timeframe.
Increase your Lambda execution timeout from the default 3 seconds to anywhere between 30 seconds and 3 minutes.
Use context.succeed() instead of the callback.
This worked for me:
const mysql = require('mysql');

// Create the connection outside the handler so it can be reused across
// invocations while Lambda keeps the container warm.
const connection = mysql.createConnection({
  host     : 'your_mysql_host',
  user     : 'your_mysql_user',
  password : 'your_mysql_password',
  database : 'your_mysql_db'
});

exports.handler = (event, context) => {
  const userId = event.params.querystring.userid;
  // Use a placeholder instead of string concatenation to avoid SQL injection.
  const sql = 'SELECT * FROM users WHERE USER_ID = ?';
  const response = {
    "statusCode": 200,
    "body": "body_text_goes_here"
  };
  if (userId) {
    connection.query(sql, [userId], function (error, results, fields) {
      if (error) {
        context.fail(error); // signal a function error
      } else {
        response.body = results;
        context.succeed(response);
      }
    });
  } else {
    // Respond even when no userid is supplied; otherwise the function
    // hangs until it times out.
    response.statusCode = 400;
    response.body = 'userid is required';
    context.succeed(response);
  }
};

I had the same issue. I updated the timeout in my code, but no luck; finally, increasing the Lambda execution timeout fixed my problem. See:
How to increase an AWS Lambda timeout?
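If you prefer to make that change programmatically, here is a minimal sketch using the AWS SDK for JavaScript (the function name is hypothetical); the same setting can also be changed in the Lambda console:

var AWS = require('aws-sdk');
var lambda = new AWS.Lambda();

lambda.updateFunctionConfiguration({
  FunctionName: 'my-function', // hypothetical function name
  Timeout: 30                  // new timeout in seconds
}, function (err, data) {
  if (err) console.log(err, err.stack);
  else console.log('New timeout:', data.Timeout);
});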

Related

The right way to return a response from a Firebase Google Cloud Pub/Sub trigger

I am trying to implement a Firebase cron function from the functions-cron example:
var functions = require('firebase-functions');

exports.hourly_job = functions.pubsub.topic('hourly-tick').onPublish((event) => {
  console.log('This job is run every hour!');
});
I just want to know the right way to return a 200 response as requested, because I see this error in the log:
Function returned undefined, expected Promise or value
Since I don't have access to a response object as in the HTTP triggers, I just want to know whether returning the integer value 200 is sufficient.
The docs state the following:
Cron retries
If a cron job's request handler returns a status code that is not in the range 200–299 (inclusive) App Engine considers the job to have failed. By default, failed jobs are not retried. You can cause failed jobs to be retried by including a retry_parameters block in your configuration file.
Pub/Sub triggers don't have a response. They simply receive messages as they appear. Only HTTPS triggers require a response sent to the client.
If you want to prevent that warning message, simply return null at the end of your function as you show it now. The actual return value is meaningless. If you're doing async work in the function, you should instead return a promise that's resolved when the work is complete.
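For example, a minimal sketch where doHourlyWork is a hypothetical async helper returning a promise:

var functions = require('firebase-functions');

exports.hourly_job = functions.pubsub.topic('hourly-tick').onPublish((event) => {
  // Return the promise so Cloud Functions waits for the async work to finish;
  // the resolved value itself is ignored.
  return doHourlyWork().then(() => null);
});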

Is using PostgreSQL on stateless FaaS like AWS lambda a good idea?

I'd like to use PostgreSQL as a database for my AWS Lambda functions, but I'm worried about performance.
Lambdas are stateless and only exist while they're executing, so I imagine that every time a Lambda is triggered it will try to initiate a brand-new PG connection.
I'm not sure if this decreases performance or causes issues with stale connections somehow. Does anyone know more about this?
I know DynamoDB is more in line with Lambda, but I really need a relational database that nevertheless has Lambda's scalability.
You can make use of the container execution model of AWS Lambda. When a Lambda is invoked, AWS spins up a container to run the code inside the handler function. So if you define the PG connection outside the handler function, it will be shared among invocations of the Lambda function. As the linked documentation puts it:
Any declarations in your Lambda function code (outside the handler code, see Programming Model) remains initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. You can add logic in your code to check if a connection already exists before creating one.
const pg = require('pg');

// Create and connect the client outside the handler so the connection is
// reused while the container stays warm.
const client = new pg.Client(<connection_string>);
client.connect();

exports.handler = (event, context, cb) => {
  client.query('SELECT * FROM users WHERE ', (err, users) => {
    // Do stuff with users
    cb(null); // Finish the function cleanly
  });
};
Refer to this blog post.
But there is a caveat.
When you write your Lambda function code, do not assume that AWS Lambda always reuses the container because AWS Lambda may choose not to reuse the container. Depending on various other factors, AWS Lambda may simply create a new container instead of reusing an existing container.
Additionally, you can create a scheduled job to keep the Lambda function warm (running every 5 minutes, for example).
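A minimal sketch of that warm-up approach, assuming a CloudWatch Events schedule that invokes the function with a payload like { "warmup": true }:

exports.handler = (event, context, cb) => {
  // Short-circuit scheduled warm-up pings so they only keep the container
  // (and its PG connection) alive without doing any real work.
  if (event.warmup) {
    return cb(null, 'warmed');
  }
  // ... normal request handling using the shared client ...
};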

NodeJs Execute function multiple times without delay

I will share the code directly:
app.get('/ListBooks', function (req, res) {
  console.log("Function called");
  // internally calls another URL and sends its response to the browser
  request({
    url: 'someURLinRESTServer',
    method: 'POST',
    json: MyJsonData
  }, function (error, response, body) {
    if (error) {
      console.log("/Call Failed ->" + error);
      res.status(200).send('Failed');
    } else {
      console.log("/Call got Response");
      console.log(response.statusCode, body);
      res.send(body);
    }
  });
});
Now when the browser makes a request to http://localhost/ListBooks, my Node console shows the first message "Function called" and waits for the internal REST URL's response.
The real problem occurs only when the REST server is down. If I then call http://localhost/ListBooks from another browser tab, the Node server console doesn't show any changes, and only after the previous REST call's response does it display the console message for the second call to app.get('/ListBooks').
I thought Node.js makes functions asynchronous, but here I don't want functions to wait like this for multiple concurrent calls.
Or is it just a delay in printing the message, and does each function call execute separately? Please clarify.
If this is only occurring when the REST server is down (as your comment indicates), then that's just a function of how long your calls to request() take to fail. And, each separate call to request() goes through its own cycle of trying to connect and then eventually timing out. If both are timing out, then you will issue request1, then request2, then some timeout amount of time will pass and request1 will fail and then request2 will fail shortly after it. This has nothing to do with how express handles multiple requests and everything to do with how the calls to your REST server behave.
You can set the timeout option for request() if you want to shorten how long it will wait for a response, but you do need to make sure you don't shorten it so much that a busy REST server that just takes a little while to actually respond gets timed out.
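For instance, a minimal sketch reusing the placeholders from the question:

request({
  url: 'someURLinRESTServer',
  method: 'POST',
  json: MyJsonData,
  timeout: 5000 // give up after 5 s instead of waiting for the default socket timeout
}, function (error, response, body) {
  // when the REST server is down, this now fails fast with a timeout error
});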
or is it just a delay in printing message and each function call executes separately
Each call is acting completely separately. There is no serialization of these responses by node.js or by Express. The appearance of serialization is just because they both take the same amount of time to fail with a timeout so they will fail one after the other.

Inform browser clients when Lambda function is done using Amazon SQS

In my scenario I'm trying to implement a serverless backend that runs pretty long, time-consuming calculations. The calculation is managed by a Lambda function that calls some external API.
To request this I'm using Amazon API Gateway, which has a 10-second execution limitation. However, the Lambda runs for about 100 seconds.
To avoid this limitation I'm using a second Lambda function to execute the time-consuming calculation and to report that the calculation has started.
It looks very similar to this:
var AWS = require('aws-sdk');
var colors = require('colors');

var functionName = 'really-long';
var lambda = new AWS.Lambda({ apiVersion: '2015-03-31' });

var params = {
  FunctionName: functionName,
  InvocationType: 'Event'
};

lambda.invoke(params, function (err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else console.log(functionName.green + " was successfully executed and returned:\n" + JSON.stringify(data, null, 2).gray); // successful response
});

console.log("All done!".rainbow);
This code is executed over AWS API Gateway by thousands of client browsers independently.
To inform each particular client that its Lambda function execution finished successfully, I planned to use AWS SQS (because of long polling and some other useful functionality out of the box).
So my question is:
How can I determine on the client which message in the queue belongs to that particular client? Or should I iterate over the whole queue in every client browser, looking for the proper messages by some request ID parameter? I guess that method will be inefficient when 1000 clients are simultaneously waiting for their results.
I do understand that I could write results to DynamoDB, for example, and periodically poll the DB for the result via some homemade API. But is there any elegant solution for notifying a browser-based client about the completion of a long-running Lambda function, based on some Amazon PaaS offering?
Honestly the DynamoDB route is probably your best bet. You can generate a uuid in the first Lambda function executed by the API Gateway. Pass that uuid to the long-running Lambda function. Before the second function completes have it write to a DynamoDB table with two columns: uuid and result.
The API Gateway responds to the client with the uuid it generated. The client then long-polls with a getItem request against your DynamoDB table (either via the aws-sdk directly or through another API Gateway request). Once it responds successfully, remove said item from the DynamoDB table.
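A minimal sketch of the polling half, assuming an illustrative table named JobResults with uuid as its partition key and a result attribute written by the long-running function:

const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();

function pollForResult(uuid, cb) {
  dynamo.get({ TableName: 'JobResults', Key: { uuid: uuid } }, (err, data) => {
    if (err) return cb(err);
    if (data.Item) return cb(null, data.Item.result); // the worker has finished
    setTimeout(() => pollForResult(uuid, cb), 2000);  // not ready yet; retry in 2 s
  });
}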
The context object of the Lambda function will have the AWS request ID, which is returned to the client that invoked the Lambda function.
So the client will have the request ID of Lambda 1, and Lambda 1's context object will have the same request ID (irrespective of Lambda retries, the request ID remains the same). Pass this request ID on to Lambda 2, so that the original request ID is chained through to the end.
Polling with the request ID from the client is then fairly easy on any data store like DynamoDB.
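A minimal sketch of that chaining step inside Lambda 1, reusing the worker-function name from the question's example:

var AWS = require('aws-sdk');
var lambda = new AWS.Lambda();

exports.handler = (event, context, callback) => {
  // context.awsRequestId is the same ID the client received for this request.
  var params = {
    FunctionName: 'really-long',
    InvocationType: 'Event',
    Payload: JSON.stringify({ requestId: context.awsRequestId })
  };
  lambda.invoke(params, callback);
};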

MongoDB connections from AWS Lambda

I'm looking to create a RESTful API using AWS Lambda/API Gateway connected to a MongoDB database. I've read that connections to MongoDB are relatively expensive, so it's best practice to retain a connection for reuse once it has been established, rather than making new connections for every new query.
This is pretty straightforward for normal applications, as you can establish a connection during startup and reuse it for the application's lifetime. But since Lambda is designed to be stateless, retaining this connection seems less straightforward.
Therefore, I'm wondering what the best way to approach this database connection issue would be. Am I forced to make new connections every time a Lambda function is invoked, or is there a way to pool/cache these connections for more efficient queries?
Thanks.
AWS Lambda functions should be defined as stateless functions, so they can't hold state like a connection pool.
This issue was also raised in this AWS forum post. On Oct 5, 2015, AWS engineer Sean posted that you should not open and close a connection on each request, but instead create a pool at code initialization, outside of the handler block. Two days later, though, the same engineer posted that you should not do this.
The problem is that you don't have control over Lambda's runtime environment. We do know that these environments (or containers) are reused, as described in the blog post by Tim Wagner. But the lack of control can lead you to drain all your resources, like reaching a connection limit in your database. It's up to you.
Instead of connecting to MongoDB from your Lambda function, you can use RESTHeart to access the database through HTTP; the connection pool to MongoDB is then maintained by RESTHeart. Bear in mind that, performance-wise, you'll be opening a new HTTP connection to RESTHeart on each request rather than using an HTTP connection pool, as you could in a traditional application.
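A minimal sketch of that approach, assuming a RESTHeart instance reachable at http://restheart-host:8080 that exposes a database db with a collection users (RESTHeart maps collections to /<db>/<collection> URLs):

const request = require('request');

exports.handler = (event, context, callback) => {
  // RESTHeart holds the MongoDB connection pool; the Lambda only speaks HTTP.
  request.get('http://restheart-host:8080/db/users', { json: true }, (err, res, body) => {
    if (err) return callback(err);
    callback(null, body); // the collection's documents as JSON
  });
};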
You should assume Lambdas are stateless, but the reality is that most of the time the VM is simply frozen and does maintain some state. It would be inefficient for Amazon to spin up a new process for every request, so they often reuse the same process, and you can take advantage of this to avoid thrashing connections.
To avoid connecting on every request (in cases where the Lambda process is reused):
Write the handler assuming the process is reused, so that you connect to the database and have the Lambda reuse the connection pool (the db promise returned from MongoClient.connect).
So that the Lambda does not hang waiting for you to close the db connection (db.close()) after servicing a request, tell it not to wait for an empty event loop.
Example:
var MongoClient = require('mongodb').MongoClient;

// Connect once, outside the handler; `db` is a promise resolving to the
// shared connection (MongoURI is your connection string).
var db = MongoClient.connect(MongoURI);

module.exports.targetingSpec = (event, context, callback) => {
  context.callbackWaitsForEmptyEventLoop = false;
  db.then((db) => {
    // use db
  });
};
From the documentation about context.callbackWaitsForEmptyEventLoop:
callbackWaitsForEmptyEventLoop
The default value is true. This property is useful only to modify the default behavior of the callback. By default, the callback will wait until the Node.js runtime event loop is empty before freezing the process and returning the results to the caller. You can set this property to false to request AWS Lambda to freeze the process soon after the callback is called, even if there are events in the event loop. AWS Lambda will freeze the process, any state data and the events in the Node.js event loop (any remaining events in the event loop processed when the Lambda function is called next and if AWS Lambda chooses to use the frozen process). For more information about callback, see Using the Callback Parameter.
RESTHeart is a REST-based server that runs alongside MongoDB. It maps most CRUD operations in Mongo to GET, POST, etc. requests, with extensible support when you need to write a custom handler (e.g., a specialized geoNear or geoSearch query).
I ran some tests executing Java Lambda functions connecting to MongoDB Atlas.
As already stated by other posters, Amazon does reuse the instances, but these may get recycled and the exact behaviour cannot be determined; one could end up with stale connections. I collect data and push it to the Lambda function every 5 minutes.
The Lambda basically does:
Build up or reuse connection
Query one record
Write or update one record
Close the connection or leave it open
The actual amount of data is quite low; depending on the time of day it varies from 1 to 5 kB. I only used 128 MB.
The Lambdas ran in N. Virginia, as that is the region the free tier is tied to.
When opening and closing the connection each time, most calls take between 4500 and 9000 ms. When reusing the connection, most calls take between 300 and 900 ms. Checking the Atlas console, the connection count stays stable. For this case, reusing the connection is worth it. Building up a connection to, and even disconnecting from, a replica set is rather expensive with the Java driver.
For a large-scale deployment one should run more comprehensive tests.
Yes, there is a way to cache/retain a connection to MongoDB: connection pooling. You can use it with Lambda functions as well, like this:
for more information you can follow these links:
Using Mongoose With AWS Lambda
Optimizing AWS Lambda (a bit out of date)
const mongoose = require('mongoose');

let conn = null;
const uri = 'YOUR CONNECTION STRING HERE';

exports.handler = async function(event, context) {
  // Make sure to add this so you can re-use `conn` between function calls.
  context.callbackWaitsForEmptyEventLoop = false;

  const models = [{ name: 'User', schema: new mongoose.Schema({ name: String }) }];
  conn = await createConnection(conn, models);

  // e.g.
  const doc = await conn.model('User').findOne({});
  console.log('doc: ', doc);
};

const createConnection = async (conn, models) => {
  // Because `conn` is in the global scope, Lambda may retain it between
  // function calls thanks to `callbackWaitsForEmptyEventLoop`.
  // This means your Lambda function doesn't have to go through the
  // potentially expensive process of connecting to MongoDB every time.
  // readyState 0 = disconnected, 3 = disconnecting; reconnect in both cases.
  if (conn == null || [0, 3].includes(conn.readyState)) {
    conn = await mongoose.createConnection(uri, {
      // Buffering means mongoose will queue up operations if it gets
      // disconnected from MongoDB and send them when it reconnects.
      // With serverless, better to fail fast if not connected.
      bufferCommands: false, // Disable mongoose buffering
      bufferMaxEntries: 0, // and MongoDB driver buffering
      useNewUrlParser: true,
      useUnifiedTopology: true,
      useCreateIndex: true
    });
    for (const model of models) {
      const { name, schema } = model;
      conn.model(name, schema);
    }
  }
  return conn;
};
Unfortunately, you may have to create your own RESTful API to answer MongoDB requests until AWS comes up with one. So far they only have what you need for their own DynamoDB.
The short answer is yes: you need to create a new connection, AND close it before the Lambda finishes.
The long answer: during my tests you can actually pass your DB connection down through your handler, like so (a MySQL example, as that's what I have to hand). You can't rely on the connection still being there, so check for it as in my example below. It may be that once your Lambdas haven't been executed for ages the state from the handler is lost (a cold start); I need to do more tests to find out, but I have noticed that if a Lambda is getting a lot of traffic, with the example below it doesn't create a new connection.
// MySQL.database.js
import * as mysql from 'mysql'

export default mysql.createConnection({
  host: 'mysql db instance address',
  user: 'MYSQL_USER',
  password: 'PASSWORD',
  database: 'SOMEDB',
})
Then, in your handler, import it and pass it down to the Lambda that's being executed:
// handler.js
import MySQL from './MySQL.database.js'
import someHandler from './someHandler' // wherever your actual handler lives

const funcHandler = (func) => {
  return (event, context, callback) => {
    func(event, context, callback, MySQL)
  }
}

const handler = {
  someHandler: funcHandler(someHandler),
}

export default handler
Now in your Lambda you do...
export default (event, context, callback, MySQL) => {
  context.callbackWaitsForEmptyEventLoop = false
  // Check if there is a MySQL connection; if not, open one.
  // Do ya thing, query away etc etc
  callback(null, responder.success())
}
The responder example can be found here. Sorry it's ES5; that's what the question was asked in.
Hope this helps!
Official Best Practice for Connecting from AWS Lambda
You should define the client to the MongoDB server outside the AWS Lambda handler function. Don't define a new MongoClient object each time you invoke your function. Doing so causes the driver to create a new database connection with each function call. This can be expensive and can result in your application exceeding database connection limits.
As an alternative, do the following:
Create the MongoClient object once.
Store the object so your function can reuse the MongoClient across function invocations.
Step 1
Isolate the call to the MongoClient.connect() function into its own module so that the connections can be reused across functions. Let's create a file mongo-client.js for that:
mongo-client.js:
const { MongoClient } = require('mongodb');
// Export a module-scoped MongoClient promise. By doing this in a separate
// module, the client can be shared across functions.
const client = new MongoClient(process.env.MONGODB_URI);
module.exports = client.connect();
Step 2
Import the new module and use it in function handlers to connect to the database.
some-file.js:
const clientPromise = require('./mongo-client');

// Handler
module.exports.handler = async function(event, context) {
  // Get the MongoClient by calling await on the connection promise. Because
  // this is a promise, it will only resolve once.
  const client = await clientPromise;

  // Use the connection to return the name of the connected database, for example.
  return client.db().databaseName;
}
Resources
For more info, check these docs.
We tested an AWS Lambda that connected every minute to our self-managed MongoDB.
The connections were unstable, and the Lambda failed.
We resolved the issue by wrapping MongoDB with the Nginx reverse proxy stream module:
How to setup MongoDB behind Nginx Reverse Proxy
stream {
    server {
        listen <your incoming Mongo TCP port>;
        proxy_connect_timeout 1s;
        proxy_timeout 3s;
        proxy_pass stream_mongo_backend;
    }

    upstream stream_mongo_backend {
        server <localhost:your local Mongo TCP port>;
    }
}
In addition to saving the connection for reuse, increase the memory allocation for the Lambda function. AWS allocates CPU proportionally to the memory allocation, and when changing from 128 MB to 1.5 GB the connection time dropped from 4 s to 0.5 s when connecting to MongoDB Atlas.
Read more here: https://aws.amazon.com/lambda/faqs/
I was facing the same issue a while ago, but I resolved it by putting my Mongo on EC2 in the same account.
I created a MongoDB instance on EC2 in the same AWS account where my Lambda function resides.
Now I can access my Mongo from the Lambda function via its private IP.
