I've created a Lambda that connects inconsistently to a PostgreSQL database in RDS.
When it works, it connects and completes the query quickly: under 100 ms, usually around 20 ms. I can execute this Lambda ten or twenty times and it works fine. Eventually, though, it gets stuck trying to connect to the database.
When it gets stuck, the connection attempt eventually times out. I have set the timeout to thirty seconds; extending it doesn't seem to make a difference.
When it does get stuck connecting, I can sometimes try again half an hour later and it will be fine. Sometimes it will still fail, even after a day.
It usually works for a while after I upload a new zip. When it doesn't connect to the database immediately after uploading a new zip, I can sometimes get it to work by editing the Lambda in the AWS Lambda console and clicking Deploy from there.
Has anyone seen this behaviour before? What am I doing wrong?
I'm new to Node and Lambda. I created a Lambda in C# as well and get the same behaviour.
The database instance is correctly provisioned and has only a small number of connections.
Here is a sample of the code. I don't actually use dbConfig, as I have stored the credentials as environment variables. The zip uploaded to Lambda contains this file and the node_modules folder for 'pg' and its dependencies.
'use strict';
var pg = require('pg');

exports.handler = async (event, context) => {
  // Placeholder config; in practice the credentials come from environment variables.
  var dbConfig = {
    user: '<username>', // note: pg expects `user`, not `username`
    password: '<password>',
    database: '<database>',
    host: '<database_endpoint>',
  };
  var client = new pg.Client(dbConfig);
  console.log('Waiting to connect');
  await client.connect();
  console.log('Connected');
  console.log('Querying the database');
  var res = await client.query('SELECT COUNT(*) FROM books;'); // 'books' is a table in the database
  console.log(res);
  await client.end();
  console.log('Connection closed');
};
Using pg.Pool instead of pg.Client hasn't helped.
Moving the connection code outside of the handler only seems to work with callbacks, and callbacks don't seem to play well with async Lambda handlers anymore.
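For illustration, this is roughly the shape of what I tried when moving the connection outside the handler (a sketch rather than my exact code; pg.Pool with no arguments reads the PG* environment variables):

'use strict';
var pg = require('pg');

// Module scope: created once per execution environment and reused across
// invocations while the container stays warm.
var pool = new pg.Pool();

exports.handler = async (event, context) => {
  var res = await pool.query('SELECT COUNT(*) FROM books;');
  return res.rows[0];
};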
Nikhil B's comment solved this. I removed a subnet with a bad route table and the connection stabilised.
No more timeouts waiting to connect to the db.
Related
Hey, has anyone faced a MongoDB connection error/issue on AWS Lambda? If you know the solution, please help!
My code works perfectly on my local system, but when I try to POST using the API Gateway endpoint, MongoDB disconnects (a session time-out error)!
This is because your code doesn't take the Lambda execution environment into account.
As mentioned in the comment above, between two Lambda invocations you are not releasing the connection to the DB. It is like calling mongoose.connect() twice in a row.
I suggest doing something like this:
const mongoose = require('mongoose');

// The connection string would typically come from an environment variable.
const MONGO_DB_URL = process.env.MONGO_DB_URL;

let cachedClient = null;

async function connectDB() {
  if (cachedClient) {
    return cachedClient;
  }
  // Connect to our MongoDB database hosted on MongoDB Atlas
  const client = await mongoose.connect(MONGO_DB_URL);
  cachedClient = client;
  return client;
}
This will reuse the same client across multiple Lambda invocations (within the same execution environment). When a new Lambda execution environment is created, cachedClient will be null and a new client will be created.
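A minimal handler using it might look like this sketch (the 'users' collection is a hypothetical example):

exports.handler = async (event, context) => {
  // Don't wait for the event loop to drain; the cached connection keeps it busy.
  context.callbackWaitsForEmptyEventLoop = false;
  const client = await connectDB();
  // e.g. read one document from a hypothetical 'users' collection
  return client.connection.db.collection('users').findOne({});
};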
Hope it clarifies.
I'm looking to create a RESTful API using AWS Lambda/API Gateway connected to a MongoDB database. I've read that connections to MongoDB are relatively expensive, so it's best practice to retain a connection for reuse once it's been established rather than making a new connection for every query.
This is pretty straightforward for normal applications, as you can establish a connection during startup and reuse it for the application's lifetime. But since Lambda is designed to be stateless, retaining this connection seems less straightforward.
Therefore, I'm wondering what the best way to approach this database connection issue would be. Am I forced to make new connections every time a Lambda function is invoked, or is there a way to pool/cache these connections for more efficient queries?
Thanks.
AWS Lambda functions should be defined as stateless functions, so they can't hold state like a connection pool.
This issue was also raised in this AWS forum post. On Oct 5, 2015, AWS engineer Sean posted that you should not open and close a connection on each request; instead, create a pool during code initialization, outside the handler block. But two days later the same engineer posted that you should not do this.
The problem is that you don't have control over Lambda's runtime environment. We do know that these environments (or containers) are reused, as described in the blog post by Tim Wagner. But this lack of control can also let you drain all your resources, for example by reaching a connection limit in your database. It's a trade-off you have to weigh.
Instead of connecting to MongoDB from your Lambda function, you can use RESTHeart to access the database through HTTP; the connection pool to MongoDB is then maintained by RESTHeart. Bear in mind that, performance-wise, you'll be opening a new HTTP connection to RESTHeart on each request rather than using an HTTP connection pool, as you could in a traditional application.
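A hedged sketch of the Lambda side of that, using Node 18's built-in fetch and assuming a RESTHeart instance reachable via a RESTHEART_URL environment variable with a hypothetical 'mydb' database and 'books' collection (the /<db>/<collection> endpoint shape follows RESTHeart's convention; adjust to your deployment):

exports.handler = async () => {
  // Plain HTTP request; no MongoDB driver or connection pool inside the Lambda.
  const res = await fetch(`${process.env.RESTHEART_URL}/mydb/books?pagesize=10`);
  if (!res.ok) {
    throw new Error(`RESTHeart returned ${res.status}`);
  }
  return res.json(); // the documents, as JSON
};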
You should assume Lambdas are stateless, but in reality the VM is most of the time simply frozen and does maintain some state. It would be inefficient for Amazon to spin up a new process for every request, so they often reuse the same process, and you can take advantage of this to avoid thrashing connections.
To avoid connecting for every request (in cases where the lambda process is re-used):
Write the handler assuming the process is reused, so that you connect to the database once and have the Lambda reuse the connection pool (the db promise returned from MongoClient.connect).
To keep the Lambda from hanging while it waits for you to close the db connection (db.close()) after servicing a request, tell it not to wait for an empty event loop.
Example:
const MongoClient = require('mongodb').MongoClient;

// Module scope: the connection promise is created once per process and
// reused across invocations. MongoURI comes from your config/environment.
var db = MongoClient.connect(MongoURI);

module.exports.targetingSpec = (event, context, callback) => {
  // Freeze the process right after the callback fires, even though the
  // open connection keeps the event loop non-empty.
  context.callbackWaitsForEmptyEventLoop = false;
  db.then((db) => {
    // use db
  });
};
From the documentation about context.callbackWaitsForEmptyEventLoop:
callbackWaitsForEmptyEventLoop
The default value is true. This property is useful only to modify the default behavior of the callback. By default, the callback will wait until the Node.js runtime event loop is empty before freezing the process and returning the results to the caller. You can set this property to false to request that AWS Lambda freeze the process soon after the callback is called, even if there are events in the event loop. AWS Lambda will freeze the process, any state data, and the events in the Node.js event loop (any remaining events in the event loop are processed when the Lambda function is next called, if AWS Lambda chooses to reuse the frozen process). For more information about the callback, see Using the Callback Parameter.
RESTHeart is a REST server that runs alongside MongoDB. It maps most CRUD operations in Mongo to GET, POST, etc. requests, with extensible support for when you need to write a custom handler (e.g., a specialized geoNear or geoSearch query).
I ran some tests executing Java Lambda functions connecting to MongoDB Atlas.
As already stated by other posters, Amazon does reuse the instances; however, these may get recycled, and the exact behaviour cannot be determined, so one could end up with stale connections. In my setup, data is collected and pushed to the Lambda function every 5 minutes.
The Lambda basically does:
Build up or reuse the connection
Query one record
Write or update one record
Close the connection or leave it open
The actual amount of data is quite low; depending on the time of day it varies from 1 - 5 kB. I only used 128 MB of memory.
The Lambdas ran in N. Virginia, as that is the region the free tier is tied to.
When opening and closing the connection each time, most calls take between 4500 - 9000 ms. When reusing the connection, most calls take between 300 - 900 ms. Checking the Atlas console, the connection count stays stable. For this case, reusing the connection is clearly worth it; building up a connection to (and even disconnecting from) a replica set is rather expensive with the Java driver.
For a large scale deployment one should run more comprehensive tests.
Yes, there is a way to cache/retain a connection to MongoDB, and it's called connection pooling. You can use it with Lambda functions as well, as in the example below.
For more information you can follow these links:
Using Mongoose With AWS Lambda
Optimizing AWS Lambda (a bit out of date)
const mongoose = require('mongoose');

let conn = null;

const uri = 'YOUR CONNECTION STRING HERE';

exports.handler = async function(event, context) {
  // Make sure to add this so you can re-use `conn` between function calls.
  context.callbackWaitsForEmptyEventLoop = false;
  const models = [{ name: 'User', schema: new mongoose.Schema({ name: String }) }];
  conn = await createConnection(conn, models);
  // e.g.
  const doc = await conn.model('User').findOne({});
  console.log('doc: ', doc);
};

const createConnection = async (conn, models) => {
  // Because `conn` is in the global scope, Lambda may retain it between
  // function calls thanks to `callbackWaitsForEmptyEventLoop`.
  // This means your Lambda function doesn't have to go through the
  // potentially expensive process of connecting to MongoDB every time.
  // readyState 0 = disconnected, 3 = disconnecting: reconnect in both cases.
  if (conn == null || [0, 3].includes(conn.readyState)) {
    conn = await mongoose.createConnection(uri, {
      // Buffering means mongoose will queue up operations if it gets
      // disconnected from MongoDB and send them when it reconnects.
      // With serverless, better to fail fast if not connected.
      bufferCommands: false, // Disable mongoose buffering
      bufferMaxEntries: 0,   // and MongoDB driver buffering
      useNewUrlParser: true,
      useUnifiedTopology: true,
      useCreateIndex: true
    });
    for (const model of models) {
      const { name, schema } = model;
      conn.model(name, schema);
    }
  }
  return conn;
};
Unfortunately, you may have to create your own RESTful API to answer MongoDB requests until AWS comes up with one. So far they only offer that for their own DynamoDB.
The short answer is yes: you need to create a new connection, AND close it before the Lambda finishes.
The long answer: during my tests you can actually pass your DB connection down in your handler, like so (a MySQL example, as that's what I had to hand). You can't rely on the connection still being there, though, so check my example below. It may be that once your Lambdas haven't been executed for a while, the handler loses its state (a cold start); I need to do more tests to find out. But I have noticed that if a Lambda is getting a lot of traffic, the example below doesn't create a new connection.
// MySQL.database.js
import * as mysql from 'mysql'

export default mysql.createConnection({
  host: 'mysql db instance address',
  user: 'MYSQL_USER',
  password: 'PASSWORD',
  database: 'SOMEDB',
})
Then in your handler, import it and pass it down to the Lambda function being executed.
// handler.js
import MySQL from './MySQL.database.js'
// someHandler stands for whichever Lambda implementation you import.

const funcHandler = (func) => {
  return (event, context, callback) => {
    func(event, context, callback, MySQL)
  }
}

const handler = {
  someHandler: funcHandler(someHandler),
}

export default handler
Now in your Lambda you do...
export default (event, context, callback, MySQL) => {
  context.callbackWaitsForEmptyEventLoop = false
  // Check if there is a live MySQL connection; if not, open one.
  // (`state` is the connection's status in the 'mysql' package.)
  if (MySQL.state === 'disconnected') {
    MySQL.connect()
  }
  // Do ya thing, query away etc etc
  callback(null, responder.success())
}
The responder example can be found here. Sorry it's ES5; that's where the question was asked.
Hope this helps!
Official Best Practice for Connecting from AWS Lambda
You should define the client to the MongoDB server outside the AWS Lambda handler function. Don't define a new MongoClient object each time you invoke your function. Doing so causes the driver to create a new database connection with each function call. This can be expensive and can result in your application exceeding database connection limits.
As an alternative, do the following:
Create the MongoClient object once.
Store the object so your function can reuse the MongoClient across function invocations.
Step 1
Isolate the call to the MongoClient.connect() function into its own module so that the connections can be reused across functions. Let's create a file mongo-client.js for that:
mongo-client.js:
const { MongoClient } = require('mongodb');
// Export a module-scoped MongoClient promise. By doing this in a separate
// module, the client can be shared across functions.
const client = new MongoClient(process.env.MONGODB_URI);
module.exports = client.connect();
Step 2
Import the new module and use it in your function handlers to connect to the database.
some-file.js:
const clientPromise = require('./mongo-client');

// Handler
module.exports.handler = async function(event, context) {
  // Get the MongoClient by calling await on the connection promise. Because
  // this is a promise, it will only resolve once.
  const client = await clientPromise;

  // Use the connection to return the name of the connected database, for example.
  return client.db().databaseName;
}
Resources
For more info, check these docs.
We tested an AWS Lambda that connected every minute to our self-managed MongoDB.
The connections were unstable, and the Lambda failed.
We resolved the issue by wrapping the MongoDB with Nginx reverse proxy stream module:
How to setup MongoDB behind Nginx Reverse Proxy
stream {
    server {
        listen <your incoming Mongo TCP port>;
        proxy_connect_timeout 1s;
        proxy_timeout 3s;
        proxy_pass stream_mongo_backend;
    }

    upstream stream_mongo_backend {
        server <localhost:your local Mongo TCP port>;
    }
}
In addition to saving the connection for reuse, increase the memory allocation for the Lambda function. AWS allocates CPU proportionally to the memory allocation, and when changing from 128 MB to 1.5 GB the connection time to MongoDB Atlas dropped from 4 s to 0.5 s.
Read more here: https://aws.amazon.com/lambda/faqs/
I was facing the same issue a while ago, but I resolved it by putting my Mongo instance next to the Lambda.
I created a MongoDB instance on an EC2 box in the same AWS account where my Lambda function resides.
Now I can access Mongo from the Lambda function via the private IP.
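For illustration, the connection string then just points at the instance's VPC-private address. A sketch (the IP, port, and names are hypothetical):

const { MongoClient } = require('mongodb');

// 10.0.1.23 stands in for the EC2 instance's private IP.
const clientPromise = MongoClient.connect('mongodb://10.0.1.23:27017/mydb');

exports.handler = async (event, context) => {
  context.callbackWaitsForEmptyEventLoop = false;
  const client = await clientPromise;
  return client.db().collection('books').countDocuments();
};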
I have an Express app which connects to a MongoDB server at startup and serves requests on demand. (I don't disconnect; it's a single-threaded server, so no pooling. Fairly simple stuff.)
The problem is that the MongoDB server can be unavailable for periods of time (it's not on-site), and while the Express app doesn't crash, any requests made during that time seem to run indefinitely until the connection is restored!
I'd like to limit that (e.g. throw an error back after a period of time), but I can't seem to make that happen...
I'm using the connect options {server: {auto_reconnect: true}}, which seems to ensure that once the MongoDB server reappears, requests complete (without it, requests made during downtime seem to run forever). And I don't have access to the client code, so I can't fix it there...
I'd assumed a combination of connectTimeoutMS and socketTimeoutMS would let me terminate requests when MongoDB is unavailable for longer periods, but I just can't get those to work (I've tried them as connect options, passing them in the URI, etc.).
Any attempt to open a collection and find/insert/update just hangs until MongoDB reappears. I've left it for over 30 minutes and everything just sat there (and completed OK when the network was restored!).
What's the best way around this? Should I open a connection specifically for each request (not really a performance issue; it's not a high-volume app), or is there something else I'm missing?
Updated to add the connect code
var myDB
var mongodb = require('mongodb')
var uri = // some env vars and stuff

mongodb.MongoClient.connect(uri, {server: {auto_reconnect: true}}, function (err, db) {
  myDB = db
})
myDB is then used elsewhere to open collections - and the handle from that is used to find/insert etc.
If the connection to the DB is interrupted, myDB.collection() calls (or find/insert calls on their handles) simply hang until the connection is restored; nothing I've tried will cause them to time out sooner!?
I assume that you are using Mongoose as your driver.
You can catch the error like this:
// Note: this uses Node's domain module to catch the connection error;
// the variable is a domain, not a database handle.
var mongoose = require('mongoose');
var d = require('domain').create();

d.on('error', function(err) {
  console.log('DB got a problem');
});

d.run(function() {
  mongoose.connect(config, options);
});
or you can directly access mongoose.connection.readyState to check the state of your connection.
Connection ready state
0 = disconnected
1 = connected
2 = connecting
3 = disconnecting
Each state change emits its associated event name.
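For the original problem (failing fast instead of hanging), a minimal sketch of an Express guard built on readyState might be the following (assuming an Express app and mongoose in scope; the status message is illustrative):

// Reject requests up front while mongoose is not connected (readyState 1),
// instead of letting operations queue until the connection returns.
app.use(function (req, res, next) {
  if (mongoose.connection.readyState !== 1) {
    return res.status(503).send('Database unavailable, try again later');
  }
  next();
});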
http://mongoosejs.com/docs/api.html
I am using mongodb-native-driver in an express.js app. I have around 6 collections in the database, so I have created 6 js files, each with a collection as a JavaScript object (e.g. function collection(){}) and prototype functions handling all the manipulation of that collection. I thought this would be a good architecture.
But the problem I am having is how to connect to the database. Should I create a connection in each of these files and use it? I think that would be overkill, as connect in mongodb-native-driver creates a pool of connections, and having several pools would not be justified.
So how do I create a single connection pool and use it in all the collection.js files? I want the connection handled the way it's implemented in Mongoose. Let me know if any part of my thinking about the app's architecture is wrong.
Using Mongoose would solve these problems, but I have read in several places that it's slower than the native driver, and I would also prefer schema-less models.
Edit: I created a module out of the models. Each collection was in its own file and took the database as an argument. Then, in the index.js file, I opened the database connection and kept a variable db once I got the database handle from the connection. (I used the auto-reconnect feature to make sure the connection wasn't lost.) In the same index.js file I exported each of the collections like this:
exports.model1 = require('./model1')(db)
exports.model2 = require('./model2')(db)
This ensured that the database part was handled in just one module, and the app would simply call the functions each model.js file exported, like save(), findById(), etc. (what you do inside those functions is up to you to implement).
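For illustration, a hypothetical model1.js in this pattern might look like the following sketch (the collection name and methods are placeholders, using the era's callback-style native driver API):

// model1.js: exports a factory that receives the shared db handle from index.js
module.exports = function (db) {
  var collection = db.collection('model1');
  return {
    save: function (doc, callback) {
      collection.insert(doc, callback);
    },
    findById: function (id, callback) {
      collection.findOne({ _id: id }, callback);
    }
  };
};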
how to connect to the database?
In order to connect using the MongoDB native driver you need to do something like the following:
var util = require('util');
var mongodb = require('mongodb');
var client = mongodb.MongoClient;

var auth = {
  user: 'username',
  pass: 'password',
  host: 'hostname',
  port: 1337,
  name: 'databaseName'
};

var uri = util.format('mongodb://%s:%s@%s:%d/%s',
  auth.user, auth.pass, auth.host, auth.port, auth.name);

/** Connect to the Mongo database at the URI using the client */
client.connect(uri, { auto_reconnect: true }, function (err, database) {
  if (err) throw err;
  else if (!database) console.log('Unknown error connecting to database');
  else {
    console.log('Connected to MongoDB database server at:');
    console.log('\n\t%s\n', uri);
    // Create or access collections, etc. here using the database object
  }
});
That's how a basic connection is set up. This is all I can give you based on just a basic description of what you want; post up some of the code you've got so far to get more specific help.
Should I create a connection in each of this files and use them?
No.
So how do I create a single connection pool and use it in all the collections.js files?
You can create a single file with code like the above; let's call it dbmanager.js. It connects to the database and exports functions like createUser, deleteUser, etc. which operate on your database, like so:
module.exports = {
  createUser: function () { /* ... */ },
  deleteUser: function () { /* ... */ }
};
which you could then require from another file like so:
var dbman = require('./dbmanager');
dbman.createUser(userData); // using connection established in `dbmanager.js`
EDIT: Because we're dealing with JavaScript and a single thread, the native driver does indeed handle connection pooling for you automatically. You can look through the StackOverflow links below for more confirmation of this; the OP states this in the question as well. This means that client.connect should be called only once by an instance of your server. After the database object is successfully retrieved from the call to client.connect, it should be reused throughout the entire instance of your app. This is easily accomplished with the module pattern that Node.JS provides.
My suggestion is to create a module or set of modules which serves as a single point of contact for interacting with the database. In my apps I usually have a single module which depends on the native driver, calling require('mongodb'). All other modules in my app will not directly access the database, but instead all manipulations must be coordinated by this database module.
This encapsulates all of the code dealing with the native driver into a single module or set of modules. The OP seems to think there is a problem with the simple code example I've posted, describing a problem with a "single large closure" in my example. This is all pretty basic stuff, so I'm adding clarification as to the basic architecture at work here, but I still do not feel the need to change any code.
The OP also seems to think that multiple connections could be made here. That is not possible with this setup. If you created a module like I suggest above, then the first time require('./dbmanager') is called it will execute the code in dbmanager.js and return the module.exports object. The exports object is cached and returned on each subsequent call to require('./dbmanager'); the code in dbmanager.js is only executed on the first require.
If you don't want to create a module like this then the other option would be to export only the database passed to the callback for client.connect and use it directly in different places throughout your app. I recommend against this however, regardless of the OPs concerns.
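Putting the pieces together, a hypothetical dbmanager.js might look like this sketch (the uri and the 'users' collection are placeholders; note that requests arriving before the connect callback fires would still see db === null, which is the race the next answer addresses):

// dbmanager.js: connect() runs once, when this module is first required;
// every subsequent require() receives the same cached exports object.
var mongodb = require('mongodb');

var db = null;
var uri = 'mongodb://localhost:27017/databaseName'; // placeholder

mongodb.MongoClient.connect(uri, { auto_reconnect: true }, function (err, database) {
  if (err) throw err;
  db = database; // single shared database object for the whole app
});

module.exports = {
  createUser: function (userData, callback) {
    db.collection('users').insert(userData, callback);
  },
  deleteUser: function (id, callback) {
    db.collection('users').remove({ _id: id }, callback);
  }
};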
Similar, possibly duplicate Stackoverflow questions, among others:
How to manage mongodb connections in nodejs webapp
Node.JS and MongoDB, reusing the DB object
Node.JS - What is the right way to deal with MongoDB connections
As the accepted answer says, you should create only one connection for all incoming requests and reuse it, but that answer is missing a solution that actually creates and caches the connection. I wrote an express middleware to achieve this: express-mongo-db. At first sight the task looks trivial, and most people use this kind of code:

var db;

function createConnection(req, res, next) {
  if (db) {
    req.db = db;
    return next();
  }
  client.connect(uri, { auto_reconnect: true }, function (err, database) {
    req.db = db = database;
    next();
  });
}

app.use(createConnection);
But this code leads to a connection leak: when multiple requests arrive at the same time while db is still undefined, connect() gets called once per request. express-mongo-db solves this by holding the incoming requests and calling connect() only once, when the module is required (not when the first request arrives).
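A rough sketch of the queueing idea (not the module's actual source; client and uri are the same assumed handles as above):

var db = null;
var waiting = []; // requests that arrived while connect() was in flight

function createConnection(req, res, next) {
  if (db) {
    req.db = db;
    return next();
  }
  waiting.push(function (err) {
    if (err) return next(err);
    req.db = db;
    next();
  });
  if (waiting.length > 1) return; // connect() already started by an earlier request
  client.connect(uri, function (err, database) {
    if (!err) db = database;
    var queued = waiting;
    waiting = [];
    queued.forEach(function (done) { done(err); });
  });
}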
Hope you find it useful.
I just thought I would add my own method of connecting to MongoDB, for others who are interested or who are having problems with the other methods.
This method assumes you don't need authentication (I use it on localhost); authentication is still easy to implement.
var MongoClient = require('mongodb').MongoClient;
var Server = require('mongodb').Server;

var client = new MongoClient(new Server('localhost', 27017, {
  socketOptions: { connectTimeoutMS: 500 },
  poolSize: 5,
  auto_reconnect: true
}, {
  numberOfRetries: 3,
  retryMilliseconds: 500
}));

client.open(function(err, client) {
  if (err) {
    console.log("Connection Failed Via Client Object.");
  } else {
    var db = client.db("theDbName");
    if (db) {
      console.log("Connected Via Client Object . . .");
      db.logout(function(err, result) {
        if (!err) {
          console.log("Logged out successfully");
        }
        client.close();
        console.log("Connection closed");
      });
    }
  }
});
Credit goes to Brad Dayley, who covers this method in his book (pages 231-232).
I am new to Node, PostgreSQL, and the whole web development business. I am currently writing a simple app which connects to a Postgres database and displays the content of a table in a web view. The app will be hosted on OpenShift.
My main entry is in server.js:
var pg = require('pg');

pg.connect(connection_string, function(err, client) {
  // handle error
  // save client: app.client = client;
});
Now, to handle the GET / request:
function handle_request(req, res) {
  app.client.query('...', function(err, result) {
    if (err) throw err; // Will handle error later, crash for now
    res.render( ... ); // Render the web view with the result
  });
}
My app seems to work: the table is rendered correctly in the web view, and it works for multiple connections (different web clients on different devices). However, if there is no request for a couple of minutes, subsequent requests crash the app with a timeout. Here is the stack trace:
/home/hai/myapp/server.js:98
if (err) throw err;
^
Error: This socket is closed.
at Socket._write (net.js:474:19)
at Socket.write (net.js:466:15)
at [object Object].query (/home/hai/myapp/node_modules/pg/lib/connection.js:109:15)
at [object Object].submit (/home/hai/myapp/node_modules/pg/lib/query.js:99:16)
at [object Object]._pulseQueryQueue (/home/hai/myapp/node_modules/pg/lib/client.js:166:24)
at [object Object].query (/home/hai/myapp/node_modules/pg/lib/client.js:193:8)
at /home/hai/myapp/server.js:97:17
at callbacks (/home/hai/myapp/node_modules/express/lib/router/index.js:160:37)
at param (/home/hai/myapp/node_modules/express/lib/router/index.js:134:11)
at pass (/home/hai/myapp/node_modules/express/lib/router/index.js:141:5)
Is there a way to keep the connection from timing out (better)? Or to reconnect on demand (best)? I have tried redesigning my app to connect not at startup but upon the GET / request; that works only for the first request, then crashes on the second. Any insight is appreciated.
Have you looked into the Postgres keepalive settings? They send packets to keep idle connections from timing out.
http://www.postgresql.org/docs/9.1/static/runtime-config-connection.html
I also found this similar question:
How to use tcp_keepalives settings in Postgresql?
You could also perform some really minor query against the db at a set interval. However, this method is definitely more of a hack.
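A sketch of that hack, assuming the app.client handle from the question (the one-minute interval is arbitrary; keep it under your idle-timeout window):

// Fire a trivial query periodically so the connection never sits idle
// long enough to be dropped.
setInterval(function () {
  app.client.query('SELECT 1', function (err) {
    if (err) console.error('keepalive ping failed:', err);
  });
}, 60 * 1000); // every minute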
Edit: You could also try initiating the client like this:
var client = new pg.Client(conString);
Before you make your queries, you can check if the client is still connected. I believe you can use:
if (client.connection._events != null) {
  client.connect();
}
I faced the same problem. Telling the client to close the connection upon the query's 'end' event did the trick for me:
query.on('end', function() {
  client.end();
});
You can also change the default idle timeout of 30 seconds to whatever value you need. E.g.
pg.defaults.poolIdleTimeout = 600000; // 10 mins
I'm using the keepAlive parameter set to true, and it works.
This is my configuration, and it solved the issue:
const { Client } = require('pg');

// connectionString is assumed to come from config/environment
const client_pg = new Client({
  connectionString,
  keepAlive: true,
  keepAliveInitialDelayMillis: 10000
});