MongoDB connections from AWS Lambda

MongoDB connections from AWS Lambda - node.js

I'm looking to create a RESTful API using AWS Lambda/API Gateway connected to a MongoDB database. I've read that connections to MongoDB are relatively expensive, so it's best practice to retain a connection for reuse once it's been established rather than making a new connection for every query.
This is pretty straightforward for normal applications, as you can establish a connection during startup and reuse it during the application's lifetime. But since Lambda is designed to be stateless, retaining this connection seems less straightforward.
Therefore, I'm wondering: what is the best way to approach this database connection issue? Am I forced to make a new connection every time a Lambda function is invoked, or is there a way to pool/cache these connections for more efficient queries?
Thanks.

AWS Lambda functions should be defined as stateless functions, so they can't hold state like a connection pool.
This issue was also raised in this AWS forum post. On Oct 5, 2015, AWS engineer Sean posted that you should not open and close a connection on each request, and should instead create a pool at code initialization, outside of the handler block. But two days later the same engineer posted that you should not do this.
The problem is that you don't have control over Lambda's runtime environment. We do know that these environments (or containers) are reused, as described in the blog post by Tim Wagner. But the lack of control can lead you to exhaust your resources, for example by reaching your database's connection limit. It's up to you.
Instead of connecting to MongoDB from your Lambda function, you can use RESTHeart to access the database over HTTP. The connection pool to MongoDB is then maintained by RESTHeart instead. Keep in mind that, performance-wise, you'll be opening a new HTTP connection to RESTHeart on each request rather than using an HTTP connection pool as you could in a traditional application.

You should assume Lambdas are stateless, but in reality the VM is usually just frozen between invocations and does keep some state. It would be inefficient for Amazon to spin up a new process for every request, so it often reuses the same process, and you can take advantage of this to avoid thrashing connections.
To avoid connecting for every request (in cases where the lambda process is re-used):
Write the handler assuming the process is reused: connect to the database once and have the Lambda reuse the connection pool (the promise returned from MongoClient.connect).
So that the Lambda doesn't hang waiting for you to close the DB connection (db.close()) after servicing a request, tell it not to wait for an empty event loop.
Example:
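const { MongoClient } = require('mongodb');
// Connect once per container, outside the handler; MongoURI is your connection string.
const clientPromise = MongoClient.connect(MongoURI);
module.exports.targetingSpec = (event, context, callback) => {
  // Let Lambda freeze the process without waiting for open sockets to close.
  context.callbackWaitsForEmptyEventLoop = false;
  clientPromise
    .then((client) => {
      // use client.db() here, then return the result
      callback(null, { ok: true });
    })
    .catch(callback);
};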
From the documentation about context.callbackWaitsForEmptyEventLoop:
callbackWaitsForEmptyEventLoop
The default value is true. This property is useful only to modify the default behavior of the callback. By default, the callback will wait until the Node.js runtime event loop is empty before freezing the process and returning the results to the caller. You can set this property to false to request AWS Lambda to freeze the process soon after the callback is called, even if there are events in the event loop. AWS Lambda will freeze the process, any state data and the events in the Node.js event loop (any remaining events in the event loop processed when the Lambda function is called next and if AWS Lambda chooses to use the frozen process). For more information about callback, see Using the Callback Parameter.
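The reuse pattern above can be illustrated without a real database. This is a minimal sketch, not the answer's code: `fakeConnect` is a hypothetical stand-in for `MongoClient.connect()`, and two handler calls simulate two invocations landing in the same warm container. Because the connection promise is created once at module load, both invocations await the same object and the connect function runs only once.

```javascript
// Hypothetical stand-in for MongoClient.connect(): counts how often it runs.
let connectCalls = 0;
function fakeConnect(uri) {
  connectCalls += 1;
  return Promise.resolve({ uri, id: connectCalls });
}

// Module scope: runs once per container (cold start), not once per invocation.
const clientPromise = fakeConnect('mongodb://example.invalid/test');

// Async-style handler; every warm invocation awaits the same promise.
async function handler(event, context) {
  context.callbackWaitsForEmptyEventLoop = false;
  const client = await clientPromise;
  return { clientId: client.id, connectCalls };
}

module.exports = { handler };
```

On a real cold start the module is loaded again and a new connection is made; the pattern only saves work while Lambda keeps reusing the process.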

RESTHeart is a REST-based server that runs alongside MongoDB. It maps most CRUD operations in Mongo to GET, POST, etc. requests, with extensible support for when you need to write a custom handler (e.g., a specialized geoNear or geoSearch query).

I ran some tests executing Java Lambda functions connecting to MongoDB Atlas.
As other posters have already stated, Amazon does reuse instances, but these may get recycled and the exact behaviour cannot be determined, so one could end up with stale connections. I collect data and push it to the Lambda function every 5 minutes.
The Lambda basically does:
Build up or reuse the connection
Query one record
Write or update one record
Close the connection or leave it open
The actual amount of data is quite low; depending on the time of day it varies from 1 to 5 kB. I used only 128 MB of memory.
The Lambdas ran in N. Virginia, as this is the region the free tier is tied to.
When opening and closing the connection each time, most calls took between 4500 and 9000 ms. When reusing the connection, most calls took between 300 and 900 ms. Checking the Atlas console, the connection count stays stable. For this case, reusing the connection is worth it: building up a connection, and even disconnecting from a replica set, is rather expensive using the Java driver.
For a large scale deployment one should run more comprehensive tests.

Yes, there is a way to cache/retain a connection to MongoDB: connection pooling. You can use it with Lambda functions as well, like this:
For more information, see these links:
Using Mongoose With AWS Lambda
Optimizing AWS Lambda (a bit out of date)
const mongoose = require('mongoose');
let conn = null;
const uri = 'YOUR CONNECTION STRING HERE';
exports.handler = async function(event, context) {
// Make sure to add this so you can re-use `conn` between function calls.
context.callbackWaitsForEmptyEventLoop = false;
const models = [{name: 'User', schema: new mongoose.Schema({ name: String })}]
conn = await createConnection(conn, models)
//e.g.
const doc = await conn.model('User').findOne({})
console.log('doc: ', doc);
};
const createConnection = async (conn,models) => {
// Because `conn` is in the global scope, Lambda may retain it between
// function calls thanks to `callbackWaitsForEmptyEventLoop`.
// This means your Lambda function doesn't have to go through the
// potentially expensive process of connecting to MongoDB every time.
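// readyState 0 = disconnected, 3 = disconnecting: open a new connection.
if (conn == null || [0, 3].includes(conn.readyState)) {
// These options are for Mongoose 5.x; Mongoose 6+ rejects all but bufferCommands,
// and there you would await mongoose.createConnection(uri).asPromise().
conn = await mongoose.createConnection(uri, {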
// Buffering means mongoose will queue up operations if it gets
// disconnected from MongoDB and send them when it reconnects.
// With serverless, better to fail fast if not connected.
bufferCommands: false, // Disable mongoose buffering
bufferMaxEntries: 0, // and MongoDB driver buffering
useNewUrlParser: true,
useUnifiedTopology: true,
useCreateIndex: true
})
for (const model of models) {
const { name, schema } = model
conn.model(name, schema)
}
}
return conn
}
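The reconnect condition in createConnection can be pulled into a small helper that is easy to test. A minimal sketch: `needsNewConnection` is a hypothetical name, and the state codes are Mongoose's `Connection#readyState` values. Note that `Array.prototype.some` expects a callback, so the membership test needs `includes` rather than `some`.

```javascript
// Mongoose Connection#readyState values:
// 0 = disconnected, 1 = connected, 2 = connecting, 3 = disconnecting.
const UNUSABLE_STATES = [0, 3];

// True when there is no cached connection or the cached one is going away,
// i.e. when a fresh connection should be opened before handling the request.
function needsNewConnection(conn) {
  return conn == null || UNUSABLE_STATES.includes(conn.readyState);
}

module.exports = { needsNewConnection };
```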

Unfortunately you may have to create your own RESTful API to answer MongoDB requests until AWS comes up with one. So far they only have what you need for their own Dynamo DB.

The short answer is yes, you need to create a new connection AND close it before the lambda finishes.
The long answer: in my tests you can pass your DB connection down into your handler like so (a MySQL example, as that's what I've got to hand). You can't rely on it still having a connection, so check it as in my example below. It may be that once your Lambdas haven't been executed for a long time, they lose the state from the handler (a cold start); I need to do more tests to find out. But I have noticed that if a Lambda is getting a lot of traffic, the example below doesn't create a new connection.
// MySQL.database.js
import * as mysql from 'mysql'
export default mysql.createConnection({
host: 'mysql db instance address',
user: 'MYSQL_USER',
password: 'PASSWORD',
database: 'SOMEDB',
})
Then in your handler import it and pass it down to the lambda that's being executed.
// handler.js
import MySQL from './MySQL.database.js'
const funcHandler = (func) => {
return (event, context, callback) => {
func(event, context, callback, MySQL)
}
}
const handler = {
someHandler: funcHandler(someHandler),
}
export default handler
Now in your Lambda you do...
export default (event, context, callback, MySQL) => {
context.callbackWaitsForEmptyEventLoop = false
// Check if there is a MySQL connection; if not, open one.
// Do ya thing, query away etc etc
callback(null, responder.success())
}
The responder example can be found here. Sorry it's ES5, because that's where the question was asked.
Hope this helps!
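The check left as a comment in the handler above (is there a usable MySQL connection, and if not, open one) could look roughly like this. A sketch under assumptions: `openConnection` is a hypothetical factory standing in for `mysql.createConnection(...)` plus `connect()`, and the "dead" state names are illustrative assumptions, not a documented contract.

```javascript
// Hypothetical factory standing in for mysql.createConnection(...) + connect().
let opened = 0;
function openConnection() {
  opened += 1;
  return { id: opened, state: 'authenticated' };
}

// Assumed states in which a cached connection is no longer usable.
const DEAD_STATES = new Set(['disconnected', 'protocol_error']);

// Module-scoped cache: survives between invocations in a warm container.
let cached = null;

function getConnection() {
  if (cached === null || DEAD_STATES.has(cached.state)) {
    cached = openConnection();
  }
  return cached;
}

module.exports = { getConnection };
```

The handler would call `getConnection()` at the top of each invocation instead of receiving a connection that may have gone stale.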

Official Best Practice for Connecting from AWS Lambda
You should define the client to the MongoDB server outside the AWS
Lambda handler function. Don't define a new MongoClient object each
time you invoke your function. Doing so causes the driver to create a
new database connection with each function call. This can be expensive
and can result in your application exceeding database connection
limits.
As an alternative, do the following:
Create the MongoClient object once.
Store the object so your function can reuse the MongoClient across function invocations.
Step 1
Isolate the call to the MongoClient.connect() function into its own module so that the connections can be reused across functions. Let's create a file mongo-client.js for that:
mongo-client.js:
const { MongoClient } = require('mongodb');
// Export a module-scoped MongoClient promise. By doing this in a separate
// module, the client can be shared across functions.
const client = new MongoClient(process.env.MONGODB_URI);
module.exports = client.connect();
Step 2
Import the new module and use it in function handlers to connect to database.
some-file.js:
const clientPromise = require('./mongo-client');
// Handler
module.exports.handler = async function(event, context) {
// Get the MongoClient by calling await on the connection promise. Because
// this is a promise, it will only resolve once.
const client = await clientPromise;
// Use the connection to return the name of the connected database for example.
return client.db().databaseName;
}
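One caveat with the module-scoped promise: if the first `connect()` rejects (for example, a network blip during a cold start), the cached promise stays rejected for the life of that container. A minimal sketch of one way around it, assuming a hypothetical `connect` function passed in place of `client.connect()`: cache the promise, but clear the cache on failure so the next invocation retries.

```javascript
// Wraps a connect function so its promise is cached, but a failed attempt
// is forgotten and the next call tries again.
function cachedConnector(connect) {
  let promise = null;
  return function getClient() {
    if (promise === null) {
      promise = connect().catch((err) => {
        promise = null; // drop the rejected promise so a later call retries
        throw err;
      });
    }
    return promise;
  };
}

module.exports = { cachedConnector };
```

With the official pattern this might be `const getClient = cachedConnector(() => client.connect());` at module scope and `const c = await getClient();` inside the handler.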
Resources
For more info, see the MongoDB docs.

We tested an AWS Lambda that connected every minute to our self-managed MongoDB.
The connections were unstable and the Lambda failed.
We resolved the issue by wrapping the MongoDB with Nginx reverse proxy stream module:
How to setup MongoDB behind Nginx Reverse Proxy
stream {
server {
listen <your incoming Mongo TCP port>;
proxy_connect_timeout 1s;
proxy_timeout 3s;
proxy_pass stream_mongo_backend;
}
upstream stream_mongo_backend {
server <localhost:your local Mongo TCP port>;
}
}

In addition to saving the connection for reuse, increase the memory allocation for the Lambda function. AWS allocates CPU in proportion to memory, and when we changed from 128 MB to 1.5 GB, the time to connect to MongoDB Atlas dropped from 4 s to 0.5 s.
Read more here: https://aws.amazon.com/lambda/faqs/

I faced the same issue a while ago and resolved it by running MongoDB on EC2 in the same AWS account.
I created a MongoDB instance on EC2 in the same AWS account where my Lambda function resides.
Now I can access MongoDB from the Lambda function via its private IP.

Related

How to fix MongoDB connection Error in AWS Lambda?

Hey, has anyone faced a MongoDB connection error/issue on AWS Lambda?
If you know the solution, please help me!
My code works perfectly on my local system,
but when I POST to the API Gateway endpoint, MongoDB disconnects (session time-out error).

This is because your code doesn't take the Lambda execution environment into account.
As mentioned in the comment above, you are not releasing the connection to the DB between two Lambda invocations; it is like calling mongoose.connect() twice in a row.
I suggest something like this:
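const mongoose = require('mongoose');
// MONGO_DB_URL: your Atlas connection string (e.g. read from an environment variable)
let cachedClient = null;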
async function connectDB() {
if (cachedClient) {
return cachedClient;
}
// Connect to our MongoDB database hosted on MongoDB Atlas
const client = await mongoose.connect(MONGO_DB_URL);
cachedClient = client;
return client;
}
This reuses the same client across multiple Lambda invocations (within the same execution environment). When a new Lambda execution environment is created, cachedClient will be null and a new client will be created.
Hope it clarifies.

Mongoose connection pooling creates connections to Mongodb every time a new Lambda is invoked

We are using Mongoose, Node.js, Serverless, and AWS Lambda. To reuse the same connection instead of opening and closing one whenever it's required, I have created a connection pool of size 10 (which seems sufficient for our use case right now).
But when I look at the CloudWatch logs for Lambda, it's not the same connection that is being used.
Every time a new Lambda is called, a new connection is created, while the subsequent calls to that Lambda use the same connection that was opened in the first call.
This results in an increase in the number of connections open at a time. In MongoDB Atlas, I can see that the number of open connections is far too high.
Below is the code I am using for creating a connection if there is no cached connection available. In case it is available, the cached one will be used and a new connection will not be created.
let cached_db;
exports.createConnection = async () => {
if(cached_db == null){
return await mongoose.connect(
connection_uri,
{ 'useUnifiedTopology': true ,
'useNewUrlParser': true,
'useFindAndModify': false ,
'useCreateIndex': true,
'socketTimeoutMS': 60000,
'connectTimeoutMS': 60000,
'poolSize': 10
}
).then(conn => {
cached_db = conn;
return conn;
}).catch((err) => {
console.error('Something went wrong', err);
throw err;
});
} else {
console.log("Cached db in use.");
return cached_db;
}
}
Can the same connection be used across Lambdas? Is there a way to do it?

You should define the client to the MongoDB server outside the AWS Lambda handler function. Don't define a new MongoClient object each time you invoke your function. Doing so causes the driver to create a new database connection with each function call. This can be expensive and can result in your application exceeding database connection limits.
As an alternative, do the following:
Create the MongoClient object once.
Store the object so your function can reuse the MongoClient across function invocations.
Step 1
Isolate the call to the MongoClient.connect() function into its own module so that the connections can be reused across functions. Let's create a file mongo-client.js for that:
mongo-client.js:
const { MongoClient } = require('mongodb');
// Export a module-scoped MongoClient promise. By doing this in a separate
// module, the client can be shared across functions.
const client = new MongoClient(process.env.MONGODB_URI);
module.exports = client.connect();
Step 2
Import the new module and use it in function handlers to connect to database.
some-file.js:
const clientPromise = require('./mongo-client');
// Handler
module.exports.handler = async function(event, context) {
// Get the MongoClient by calling await on the connection promise. Because
// this is a promise, it will only resolve once.
const client = await clientPromise;
// Use the connection to return the name of the connected database for example.
return client.db().databaseName;
}
Pool Size
A connection pool is a cache of database connections maintained so that they can be reused when future requests to the database are required. Connection pools are used to enhance the performance of executing commands on a database.
Note: maxPoolSize and poolSize control the same thing; which one applies depends on whether you are using the useUnifiedTopology: true setting.
If you are using useUnifiedTopology: true, maxPoolSize is the spec-compliant setting to manage how large connection pools can be.
But if you are using useUnifiedTopology: false (or omit it), poolSize is the same thing, from before the unified topology existed.
Note: Each connection consumes about 1MB of RAM.
Value of the Pool Size
The connection pool is on a per-mongod/mongos basis, so when connecting to a 3-member replica set there will be three connection pools (one per mongod), each with a maxPoolSize. Additionally, there is a required monitoring connection for each node as well, so you end up with (maxPoolSize+1)*number_of_nodes TCP connections.
In my opinion, if you don't care about CPU and RAM, you should use all available connections (why not, if we already have them?).
For example: you have an Atlas free cluster with a 3-member replica set that supports a maximum of 500 connections, and only one application connects to it; give all connections to that one application. To set the value of poolSize, you can use the calculation of connections above:
poolSize = (maximum_connections/number_of_nodes) - 1
poolSize = (500/3) - 1
poolSize = 165 (rounded down from 165.67)
If you have 2 applications that connect to the same cluster, give each application half of the connections.
If you have limited RAM, check how much you can spare and calculate poolSize based on that (as I said in the note, you can assume that one connection consumes about 1MB of RAM).
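The arithmetic above can be wrapped in a couple of helper functions. A minimal sketch: `poolSizeFor` and `estimatedRamMB` are hypothetical names, the "- 1" reserves the per-node monitoring connection described above, and the RAM figure uses the answer's ~1 MB-per-connection rule of thumb, totalled across the cluster.

```javascript
// poolSize = (maximum_connections / number_of_nodes) - 1, split evenly across apps.
function poolSizeFor(maxConnections, numberOfNodes, numberOfApps = 1) {
  const perApp = maxConnections / numberOfApps;
  return Math.floor(perApp / numberOfNodes) - 1;
}

// Total TCP connections are (poolSize + 1) * number_of_nodes; at ~1 MB each.
function estimatedRamMB(poolSize, numberOfNodes) {
  return (poolSize + 1) * numberOfNodes;
}

console.log(poolSizeFor(500, 3));      // one app on a 3-node, 500-connection cluster
console.log(poolSizeFor(500, 3, 2));   // two apps sharing the same cluster
console.log(estimatedRamMB(165, 3));   // connections (and ~MB) used by the first case
```

For the free-cluster example this gives 165 for a single application and 82 each for two applications; 165 uses (165 + 1) * 3 = 498 of the 500 connections.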
Resources
For more info, see the official MongoDB docs.
For connection pool, check this and this.

I found from this blog that Lambda may reuse the same connection when it restores the same snapshot, and creates a new connection when it generates a new snapshot.
So Lambda can't guarantee that the same connection will be used even if we define it outside the handler function.
So, in my opinion, the best approach to optimize the number of connections to MongoDB is to close the connection before the Lambda completes, so that your other services can use the freed connection.
Use the call below to close the connection after the database interaction finishes (and reset the cache, e.g. cached_db = null, so the next invocation reconnects instead of reusing a closed connection):
await mongoose.connection.close()

Where should I define the DocumentClient when using ExpressJs?

I couldn't find an answer to something I wonder.
With MySQL in Express.js, when I declared the MySQL connection inside a POST handler function, it would create a new connection every time my Express.js server got a request. Then the server would throw an error once the maximum number of connections between the processing server and the database server was reached.
I was wondering if there is the same problem with DynamoDB.DocumentClient()? What is the best way of doing operations with DynamoDB?
Should I have the DocumentClient global as below, or is it okay if I leave it in the post/get functions?
...
// DocumentClient is created outside the post handler below
const docClient = new AWS.DynamoDB.DocumentClient();
router.post('/loglogbaby', function(req, res){
var params = { ... };
docClient.get(params, function(err,data){...});
res.json({response:"nonobaby"});
});
...

Well, it doesn't matter, because DynamoDB works with HTTP requests under the hood, not with connections and pooling etc. DocumentClient creates an HTTP request in the end; it's a library that makes the low-level API easier to use (see here).
So basically you create a programming-level object every time you create it, not a new connection. And objects are cheap to create.

Since AWS DynamoDB is a hosted service, it doesn't matter where you create the DocumentClient object.
It's still good practice to create a global object for it.
You can find here a comparison between MySQL and DynamoDB.

Is using PostgreSQL on stateless FaaS like AWS lambda a good idea?

I'd like to use Postgresql as a database on my AWS lambda functions but I'm worried about performance.
I'm worried that since Lambdas are stateless and only exist while they're executing, every time the Lambda is triggered it'll initiate a brand new PG connection.
I'm not sure if this decreases performance or somehow causes issues with stale connections. Does anyone know more about this?
I know DynamoDB is more in line with Lambda, but I really need a relational database along with Lambda's scalability.

You can make use of AWS Lambda's container execution model. When a Lambda is invoked, AWS spins up a container to run the code inside the handler function, so if you define the PG connection outside the handler function, it will be shared among the invocations that reuse that container. From the AWS documentation:
Any declarations in your Lambda function code (outside the handler code, see Programming Model) remains initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. You can add logic in your code to check if a connection already exists before creating one.
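const pg = require('pg');
// Created and connected once per container, outside the handler.
const client = new pg.Client(<connection_string>);
client.connect();
exports.handler = (event, context, cb) => {
// Don't wait for the open PG socket to close before finishing.
context.callbackWaitsForEmptyEventLoop = false;
client.query('SELECT * FROM users WHERE ', (err, users) => {
if (err) return cb(err);
// Do stuff with users
cb(null); // Finish the function cleanly
});
};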
Refer to this blog post.
But there is a caveat.
When you write your Lambda function code, do not assume that AWS Lambda always reuses the container because AWS Lambda may choose not to reuse the container. Depending on various other factors, AWS Lambda may simply create a new container instead of reusing an existing container.
Additionally, you can create a scheduled job (for example, one that runs every 5 minutes) to keep the Lambda function warm.
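A scheduled warm-up works best if the handler recognizes the ping and returns early instead of running a query. A minimal sketch, assuming the warm-up comes from an EventBridge (CloudWatch Events) schedule rule, whose events carry `source: 'aws.events'`; `runQuery` is a hypothetical stand-in for the real database work.

```javascript
// Hypothetical stand-in for the real work (e.g. client.query(...)).
async function runQuery(event) {
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
}

async function handler(event, context) {
  // Scheduled EventBridge/CloudWatch Events rules send events with this source.
  if (event && event.source === 'aws.events') {
    // Warm-up ping: the container (and any module-scoped connection) stays
    // warm, but no query is run.
    return { warmed: true };
  }
  return runQuery(event);
}

module.exports = { handler };
```

One scheduled event keeps roughly one execution environment warm; it does not pre-warm extra containers for concurrent traffic.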

Connection to Mongodb-Native-Driver in express.js

I am using mongodb-native-driver in express.js app. I have around 6 collections in the database, so I have created 6 js files with each having a collection as a javascript object (e.g function collection(){}) and the prototypes functions handling all the manipulation on those collections. I thought this would be a good architecture.
But the problem I am having is how to connect to the database? Should I create a connection in each of this files and use them? I think that would be an overkill as the connect in mongodb-native-driver creates a pool of connections and having several of them would not be justified.
So how do I create a single connection pool and use it in all the collections.js files? I want to have the connection like its implemented in mongoose. Let me know if any of my thought process in architecture of the app is wrong.
Using Mongoose would solve these problems, but I have read in several places that it's slower than the native driver, and I would also prefer schema-less models.
Edit: I created a module out of models. Each collection was in a file and it took the database as an argument. Now in the index.js file I called the database connection and kept a variable db after I got the database from the connection. (I used the auto-reconnect feature to make sure that the connection wasn't lost). In the same index.js file I exported each of the collections like this
exports.model1 = require('./model1')(db)
exports.model2 = require('./model2')(db)
This ensured that the database part was handled in just one module, and the app would just call the functions each model.js file exported, like save(), findById(), etc. (what you do in those functions is up to you to implement).
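The structure described in the edit, where each collection module is a factory that receives the database, could look roughly like this. A sketch: `makeUserModel` and the in-memory `fakeDb` are hypothetical; with the real driver, `db` would be the database object from the single connect call, and `collection()`, `insertOne()` and `findOne()` are the native driver's method names.

```javascript
// user-model.js (hypothetical): a factory that receives the shared db handle.
function makeUserModel(db) {
  const users = db.collection('users');
  return {
    save: (doc) => users.insertOne(doc),
    findById: (id) => users.findOne({ _id: id }),
  };
}

// Minimal in-memory stand-in for a driver Db, for illustration only.
function fakeDb() {
  const stores = new Map();
  return {
    collection(name) {
      if (!stores.has(name)) stores.set(name, []);
      const docs = stores.get(name);
      return {
        insertOne: async (doc) => { docs.push(doc); return { insertedId: doc._id }; },
        findOne: async (query) => docs.find((d) => d._id === query._id) || null,
      };
    },
  };
}

// index.js: connect once, then hand the same db to every model.
const db = fakeDb();
const models = { user: makeUserModel(db) };
```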

how to connect to the database?
In order to connect using the MongoDB native driver you need to do something like the following:
var util = require('util');
var mongodb = require('mongodb');
var client = mongodb.MongoClient;
var auth = {
user: 'username',
pass: 'password',
host: 'hostname',
port: 1337,
name: 'databaseName'
};
var uri = util.format('mongodb://%s:%s@%s:%d/%s',
auth.user, auth.pass, auth.host, auth.port, auth.name);
/** Connect to the Mongo database at the URI using the client */
client.connect(uri, { auto_reconnect: true }, function (err, database) {
if (err) throw err;
else if (!database) console.log('Unknown error connecting to database');
else {
console.log('Connected to MongoDB database server at:');
console.log('\n\t%s\n', uri);
// Create or access collections, etc here using the database object
}
});
A basic connection is setup like this. This is all I can give you going on just the basic description of what you want. Post up some code you've got so far to get more specific help.
Should I create a connection in each of this files and use them?
No.
So how do I create a single connection pool and use it in all the collections.js files?
You can create a single file with code like the above, let's call it dbmanager.js, that connects to the database. Define functions like createUser, deleteUser, etc. that operate on your database, then export them like so:
module.exports = {
createUser: function () { ; },
deleteUser: function () { ; }
};
which you could then require from another file like so:
var dbman = require('./dbmanager');
dbman.createUser(userData); // using connection established in `dbmanager.js`
EDIT: Because we're dealing with JavaScript and a single thread, the native driver indeed automatically handles connection pooling for you. You can look for this in the StackOverflow links below for more confirmation of this. The OP does state this in the question as well. This means that client.connect should be called only once by an instance of your server. After the database object is successfully retrieved from a call to client.connect, that database object should be reused throughout the entire instance of your app. This is easily accomplished by using the module pattern that Node.JS provides.
My suggestion is to create a module or set of modules which serves as a single point of contact for interacting with the database. In my apps I usually have a single module which depends on the native driver, calling require('mongodb'). All other modules in my app will not directly access the database, but instead all manipulations must be coordinated by this database module.
This encapsulates all of the code dealing with the native driver into a single module or set of modules. The OP seems to think there is a problem with the simple code example I've posted, describing a problem with a "single large closure" in my example. This is all pretty basic stuff, so I'm adding clarification as to the basic architecture at work here, but I still do not feel the need to change any code.
The OP also seems to think that multiple connections could possibly be made here. This is not possible with this setup. If you created a module like I suggest above, then the first time require('./dbmanager') is called it will execute the code in the file dbmanager.js and return the module.exports object. The exports object is cached and is also returned on each subsequent call to require('./dbmanager'); however, the code in dbmanager.js will only be executed on the first require.
If you don't want to create a module like this then the other option would be to export only the database passed to the callback for client.connect and use it directly in different places throughout your app. I recommend against this however, regardless of the OPs concerns.
Similar, possibly duplicate Stackoverflow questions, among others:
How to manage mongodb connections in nodejs webapp
Node.JS and MongoDB, reusing the DB object
Node.JS - What is the right way to deal with MongoDB connections

As the accepted answer says, you should create only one connection for all incoming requests and reuse it, but the answer is missing a solution that creates and caches the connection. I wrote Express middleware to achieve this: express-mongo-db. At first sight this task is trivial, and most people use code like this:
var db;
function createConnection(req, res, next) {
if (db) { req.db = db; return next(); }
client.connect(uri, { auto_reconnect: true }, function (err, database) {
if (err) return next(err);
req.db = db = database;
next();
});
}
app.use(createConnection);
But this code leads to a connection leak when multiple requests arrive at the same time while db is still undefined. express-mongo-db solves this by holding incoming requests and calling connect only once, when the module is required (not when the first request arrives).
Hope you find it useful.
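The leak comes from caching the *result* of connect instead of the *pending* connection. A minimal sketch of the idea described above, not express-mongo-db's actual source: `connectToDatabase` is a hypothetical promise-returning stand-in for `client.connect(uri, ...)`. Connecting starts once at module load, the promise is cached, and every request awaits it, so concurrent requests cannot trigger extra connects.

```javascript
// Hypothetical promise-returning connect; counts how often it is called.
let connectCount = 0;
function connectToDatabase() {
  connectCount += 1;
  return new Promise((resolve) => setTimeout(() => resolve({ name: 'db' }), 10));
}

// Started once, when the module is loaded, not on the first request.
const dbPromise = connectToDatabase();

// Express-style middleware: concurrent requests all wait on the same promise.
function attachDb(req, res, next) {
  dbPromise.then((db) => {
    req.db = db;
    next();
  }, next); // pass connection errors to Express's error handling
}

module.exports = { attachDb };
```

Usage would be `app.use(attachDb);`. If the initial connection fails, every request receives the same error, so a retry-on-failure cache is a possible refinement.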

I just thought I would add my own method of connecting to MongoDB for others who are interested or having problems with other methods.
This method assumes you don't need authentication (I use it on localhost).
Authentication is still easy to implement.
var MongoClient = require('mongodb').MongoClient;
var Server = require('mongodb').Server;
var client = new MongoClient(new Server('localhost',27017,{
socketOptions: {connectTimeoutMS: 500},
poolSize:5,
auto_reconnect:true
}, {
numberOfRetries:3,
retryMilliseconds: 500
}));
client.open(function(err, client) {
if(err) {
console.log("Connection Failed Via Client Object.");
} else {
var db = client.db("theDbName");
if(db) {
console.log("Connected Via Client Object . . .");
db.logout(function(err,result) {
if(!err) {
console.log("Logged out successfully");
}
client.close();
console.log("Connection closed");
});
}
}
});
Credit goes to Brad Dayley, who covers this method in his book (pages 231-232).
