I have a route that handles API calls for timepunches. One of the calls is to "clock_in".
router.route('/clock_in').post(managerCheck, startTimeCheck, isClockedIn, clockIn);
Each of these functions performs its own db connection, queries the db for some info, then responds to the user or goes to the next() function.
I'm using pool from 'pg-pool'.
My connection looks like this.
export const isClockedIn = (request, response, next) => {
  const query = `select * from....`;
  const values = [value1, value2];
  pool.connect((err, client, release) => {
    if (err) return next(err);
    client.query(query, values, (err, result) => {
      release(); // hand the client back to the pool
      //do stuff
    });
  });
};
and the connection is essentially the same for all functions.
What I'd like to do is have only 1 instance of pool.connect, then have each function in the API call use that connection for its client.query. I'm just not sure how I'd set that up.
Hopefully my question is clear. All my code works, it's just not efficient since it's making multiple db connections for 1 api call.
I learned a lot by watching my db connections as I made calls from my API.
When you make your first call with pg.Pool, a connection is made to the db. After your query finishes, the connection is placed into an idle state; if another pool command is run, it reuses that idle connection. The connection closes after 10 seconds of being idle (you can configure this).
You can also set a maximum number of connections (default 10). So if you run 10 queries at the same time, they will all open a connection and run. Their connections will be left idle after completion. If you run another 10 at the same time, they will reuse those connections.
If you want to force only 1 connection ever that never closes (not saying you want to do this), set the idle timeout to 0 and max to 1 connection. Then if you run 10 queries at once, they will line up and run one at a time.
const pool = new pg.Pool({
user: 'postgres',
host: 'localhost',
database: 'database',
password: 'password',
port: 5000,
idleTimeoutMillis: 0,
max: 1,
});
This page is super helpful, although I didn't understand much of it until I watched the database connection as my API ran.
https://node-postgres.com/api/pool
Note: The above code should be in its own js file, and all connections should reference it. If you create new pg.Pools, I believe those will open their own connections, which may not be what you want.
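If you still want every function in one API call to share a single checked-out client, as the question asks, a minimal sketch could look like the following. It assumes `pool` is the shared pg Pool from the note above; `attachClient` and the `request.dbClient` property are hypothetical names, not part of pg or Express.

```javascript
// Hypothetical Express-style middleware factory. `pool` is assumed to be the
// single shared pg Pool exported from its own module.
function attachClient(pool) {
  return async (request, response, next) => {
    let client;
    try {
      client = await pool.connect(); // check out one client for this request
    } catch (err) {
      return next(err);
    }

    // Return the client to the pool exactly once, whether the response
    // finishes normally or the connection closes early.
    let released = false;
    const release = () => {
      if (!released) {
        released = true;
        client.release();
      }
    };
    response.on('finish', release);
    response.on('close', release);

    request.dbClient = client; // hypothetical property name
    next();
  };
}
```

Under these assumptions the route would become router.route('/clock_in').post(attachClient(pool), managerCheck, startTimeCheck, isClockedIn, clockIn), and each function would call request.dbClient.query(...) instead of pool.connect.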
I am using an app on GCP with Node.js with Postgresql (Cloud SQL, lowest tier i.e. 25 connections) using the 'pg' package ("pg": "^8.7.3",). I am quite new with this configuration so there may be some very basic errors here.
I configure my pg_client like this
// CLOUD SQL POSTGRESQL DATABASE
const { Client, Pool } = require('pg')
const pg_client = new Pool({
user: process.env.PG_USER,
host: process.env.PG_HOST,
database: process.env.PG_DB,
password: process.env.PG_PWD,
port: 5432,
})
and then, in order to copy the data from a NoSQL database with some 50,000+ items, I go through them pretty much like this. I know the code doesn't make perfect sense, but this is how the SQL calls are being made:
fiftyThoussandOldItems.forEach(async (item) => {
  // look up an id, then insert a row that uses it (the insert is not awaited)
  let nameId = await pg_client.query("SELECT id from some1000items where name='John'")
  pg_client.query("INSERT into items (id, name, url) VALUES ($1, $2, $3)", [nameId.rows[0].id, 1, 2])
})
This does, however, quickly produce the errors "sorry, too many clients already :: proc.c:362" and "error: remaining connection slots are reserved for non-replication superuser connections".
I have done similar runs before without experiencing this issue (but then with about 1000 items).
As far as I understand, I do not need to call pg_client.connect() and pg_client.release() (or is it .end()) any longer, according to an SO answer I unfortunately can't find any more. Is this really correct? (When I tried that before, I ended up with a lot of other issues that caused other types of problems.)
So, my questions are:
What am I doing wrong? Do I need to use pg_client.connect() before every SQL-call and then pg_client.release() after every SQL-call? Or is it pg_client.end()?
Is there a way to have this handled automatically? Doing it by hand doesn't seem very DRY, and it seems bug-prone.
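For context on the loop above: Array.prototype.forEach does not wait for async callbacks, so all 50,000+ callbacks start immediately, and pool.query() checks a client out and returns it by itself for each single query. Whether that flood is what exhausts the connection slots here is an assumption. A minimal sketch of keeping the number of in-flight calls bounded follows; `forEachInBatches` and the batch size are hypothetical, not part of pg.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `batchSize`
// calls in flight, waiting for each batch before starting the next.
async function forEachInBatches(items, batchSize, worker) {
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    await Promise.all(batch.map((item) => worker(item)));
  }
}
```

With the code from the question, the body of the forEach callback would become the worker, for example forEachInBatches(fiftyThoussandOldItems, 10, copyItem), where copyItem is a hypothetical async function that awaits both queries.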
We are using Mongoose, Node.js, Serverless, and AWS Lambda. To reuse the same connection instead of opening and closing one each time it is required, I have created a connection pool of size 10 (which seems to be sufficient for our use case right now).
But the thing is, when I see the Cloudwatch logs for Lambda, it's not the same connection that is being used.
Every time a new Lambda is called, a new connection is created, while the subsequent calls to that Lambda use the same connection that was opened in the first call.
This results in an increase in the number of connections open at a time. In MongoDB Atlas, I can see that the number of open connections is very high.
Below is the code I am using for creating a connection if there is no cached connection available. In case it is available, the cached one will be used and a new connection will not be created.
let cached_db;
exports.createConnection = async () => {
if(cached_db == null){
return await mongoose.connect(
connection_uri,
{ 'useUnifiedTopology': true ,
'useNewUrlParser': true,
'useFindAndModify': false ,
'useCreateIndex': true,
'socketTimeoutMS': 60000,
'connectTimeoutMS': 60000,
'poolSize': 10
}
).then(conn => {
cached_db = conn;
return conn;
}).catch((err) => {
console.error('Something went wrong', err);
throw err;
});
} else {
console.log("Cached db in use.");
return cached_db;
}
}
Can the same connection be used across Lambdas? Is there a way to do it?
You should define the client to the MongoDB server outside the AWS Lambda handler function. Don't define a new MongoClient object each time you invoke your function. Doing so causes the driver to create a new database connection with each function call. This can be expensive and can result in your application exceeding database connection limits.
As an alternative, do the following:
Create the MongoClient object once.
Store the object so your function can reuse the MongoClient across function invocations.
Step 1
Isolate the call to the MongoClient.connect() function into its own module so that the connections can be reused across functions. Let's create a file mongo-client.js for that:
mongo-client.js:
const { MongoClient } = require('mongodb');
// Export a module-scoped MongoClient promise. By doing this in a separate
// module, the client can be shared across functions.
const client = new MongoClient(process.env.MONGODB_URI);
module.exports = client.connect();
Step 2
Import the new module and use it in function handlers to connect to the database.
some-file.js:
const clientPromise = require('./mongo-client');
// Handler
module.exports.handler = async function(event, context) {
// Get the MongoClient by calling await on the connection promise. Because
// this is a promise, it will only resolve once.
const client = await clientPromise;
// Use the connection to return the name of the connected database for example.
return client.db().databaseName;
}
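One caveat with a module-scoped promise is that a failed first connect stays cached for the lifetime of that execution environment. The following is a minimal sketch of a cached connector that forgets a failure so a later invocation can retry; `createCachedConnector` is a hypothetical helper, and `connect` stands for whatever function opens the connection (for example, () => client.connect()).

```javascript
// Hypothetical helper: cache the connection promise at module scope, but
// drop it again if the connect attempt fails.
function createCachedConnector(connect) {
  let cached = null;
  return function getConnection() {
    if (cached === null) {
      cached = Promise.resolve()
        .then(connect)
        .catch((err) => {
          cached = null; // forget the failure so the next call retries
          throw err;
        });
    }
    return cached;
  };
}
```

Concurrent calls made while the first connect is still pending all receive the same promise, so only one connection attempt is in flight at a time.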
Pool Size
A connection pool is a cache of database connections that is maintained so the connections can be reused for future requests to the database; the pool size is the maximum number of connections it holds. Connection pools are used to enhance the performance of executing commands on a database.
Note: maxPoolSize and poolSize do the same thing; which one applies depends on whether you are using the useUnifiedTopology: true setting.
If you are using useUnifiedTopology: true, maxPoolSize is the spec-compliant setting to manage how large connection pools can be.
But if you are using useUnifiedTopology: false (or omit it), poolSize is the equivalent setting from before the unified topology existed.
Note: Each connection consumes about 1MB of RAM.
Value of the Pool Size
The connection pool is on a per-mongod/mongos basis, so when connecting to a 3-member replica set there will be three connection pools (one per mongod), each with a maxPoolSize. Additionally, there is a required monitoring connection for each node as well, so you end up with (maxPoolSize+1)*number_of_nodes TCP connections.
In my opinion, if you don't care about CPU and RAM, you should use all available connections (why not if we already have them, right?).
For example: you have an Atlas free cluster with a 3-member replica set that supports a maximum of 500 connections, and only one application connects to it, so give all connections to that one application. To set the value of poolSize, you can use the calculation above:
poolSize = (maximum_connections/number_of_nodes) - 1
poolSize = (500/3) - 1
poolSize = 165
If you have 2 applications connecting to the same cluster, give each application half of the connections.
If you have limited RAM, check how much you can spare and calculate poolSize based on that (as noted above, you can assume that one connection consumes about 1MB of RAM).
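The arithmetic above can be sketched as two small functions. The function names are hypothetical; the formulas are the ones given in this answer, with the result rounded down so the total never exceeds the limit.

```javascript
// Total TCP connections: one pool plus one monitoring connection per node.
function totalConnections(maxPoolSize, nodes) {
  return (maxPoolSize + 1) * nodes;
}

// Largest pool size that keeps every application within its share of the
// cluster's connection limit.
function maxPoolSizeFor(connectionLimit, nodes, applications = 1) {
  const perApplication = Math.floor(connectionLimit / applications);
  return Math.floor(perApplication / nodes) - 1;
}

console.log(maxPoolSizeFor(500, 3));    // 165
console.log(maxPoolSizeFor(500, 3, 2)); // 82
console.log(totalConnections(165, 3));  // 498
```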
Resources
For more info, check the official MongoDB docs.
For connection pool, check this and this.
I found from this blog that Lambda may reuse the same connection if it restores the same snapshot, and creates a new connection when a new snapshot is generated.
So Lambda can't guarantee that the same connection is reused, even if we create it outside the handler function.
So, in my opinion, the best approach to optimise the number of connections to MongoDB is to close the connection before the Lambda completes, so that your other services can use the freed connection.
Use the method below to close the connection after the database interaction finishes.
mongoose.connection.close()
I need to reuse a socket for two connect calls made using http.request. I tried passing a custom agent that limits the number of sockets, but the first socket is removed before the 2nd connect call is made, by this code:
https://github.com/nodejs/node/blob/master/lib/_http_client.js#L438
mock code:
var options = {
method: 'CONNECT', agent: new http.Agent({ keepAlive: true, maxSockets: 1 })
};
var request = this.httpModule.request(options);
request.on('connect', (res, sock, head) => {
console.log(sock.address());
// some processing...
var request2 = this.httpModule.request(options);
request2.on('connect', (res, sock, head) => {
console.log(sock.address());
});
request2.end();
});
request.end();
Is there some way by which I can reuse the same socket for two connect calls?
The two unique sockets are required for this form of communication.
Each socket in this case represents a connection between a client and a server. There is no such socket that represents n clients and one server, so to speak. They also don't act like "threads" here, where one socket can perform work for many clients.
By setting the max sockets to 1, you've requested that only 1 client connection be active at any time. When you try to connect that second client, it kills the first one because the max is reached and we need room for a new connection!
If you want to recycle sockets -- For example, a client connects, refreshes the page after an hour, and the same client triggers another connection -- There's probably not a way to do it this high in the technology stack, and it would be far more complicated and unnecessary than destroying the old socket to make way for a new one anyway. If you don't understand why you would or wouldn't need to do this, you don't need to do it.
If you want to send a message to many clients (and you wanted to accomplish it "under one socket" in your question), consider using the broadcast and emit methods.
I'm looking to create a RESTful API using AWS Lambda/API Gateway connected to a MongoDB database. I've read that connections to MongoDB are relatively expensive, so it's best practice to retain a connection for reuse once it's been established rather than making new connections for every new query.
This is pretty straightforward for normal applications, as you can establish a connection during start-up and reuse it during the application's lifetime. But since Lambda is designed to be stateless, retaining this connection seems to be less straightforward.
Therefore, I'm wondering what would be the best way to approach this database connection issue? Am I forced to make new connections every time a Lambda function is invoked or is there a way to pool/cache these connections for more efficient queries?
Thanks.
AWS Lambda functions should be defined as stateless functions, so they can't hold state like a connection pool.
This issue was also raised in this AWS forum post. On Oct 5, 2015 AWS engineer Sean posted that you should not open and close connection on each request, by creating a pool on code initialization, outside of handler block. But two days later the same engineer posted that you should not do this.
The problem is that you don't have control over Lambda's runtime environment. We do know that these environments (or containers) are reused, as described in the blog post by Tim Wagner. But the lack of control can lead you to exhaust your resources, for example by reaching the connection limit of your database. But it's up to you.
Instead of connecting to MongoDB from your Lambda function, you can use RESTHeart to access the database through HTTP. The connection pool to MongoDB is maintained by RESTHeart instead. Remember that, in regards to performance, you'll be opening a new HTTP connection to RESTHeart on each request, and not using an HTTP connection pool, like you could do in a traditional application.
You should assume lambdas to be stateless but the reality is that most of the time the vm is simply frozen and does maintain some state. It would be inefficient for Amazon to spin up a new process for every request so they often re-use the same process and you can take advantage of this to avoid thrashing connections.
To avoid connecting for every request (in cases where the lambda process is re-used):
Write the handler assuming the process is re-used, such that you connect to the database once and have the lambda re-use the connection pool (the db promise returned from MongoClient.connect).
So that the lambda doesn't hang waiting for you to close the db connection (db.close()) after servicing a request, tell it not to wait for an empty event loop.
Example:
var db = MongoClient.connect(MongoURI);
module.exports.targetingSpec = (event, context, callback) => {
context.callbackWaitsForEmptyEventLoop = false;
db.then((db) => {
// use db
});
};
From the documentation about context.callbackWaitsForEmptyEventLoop:
callbackWaitsForEmptyEventLoop
The default value is true. This property is useful only to modify the default behavior of the callback. By default, the callback will wait until the Node.js runtime event loop is empty before freezing the process and returning the results to the caller. You can set this property to false to request AWS Lambda to freeze the process soon after the callback is called, even if there are events in the event loop. AWS Lambda will freeze the process, any state data and the events in the Node.js event loop (any remaining events in the event loop processed when the Lambda function is called next and if AWS Lambda chooses to use the frozen process). For more information about callback, see Using the Callback Parameter.
Restheart is a REST-based server that runs alongside MongoDB. It maps most CRUD operations in Mongo to GET, POST, etc., requests with extensible support when you need to write a custom handler (e.g., specialized geoNear, geoSearch query)
I ran some tests executing Java Lambda functions connecting to MongoDB Atlas.
As already stated by other posters Amazon does reuse the Instances, however these may get recycled and the exact behaviour cannot be determined. So one could end up with stale connections. I'm collecting data every 5 minutes and pushing it to the Lambda function every 5 minutes.
The Lambda basically does:
Build up or reuse connection
Query one record
Write or update one record
Close the connection or leave it open
The actual amount of data is quite low. Depending on time of the day it varies from 1 - 5 kB. I only used 128 MB.
The Lambdas ran in N. Virginia, as this is the location the free tier is tied to.
When opening and closing the connection each time most calls take between 4500 - 9000 ms. When reusing the connection most calls are between 300 - 900 ms. Checking the Atlas console the connection count stays stable. For this case reusing the connection is worth it. Building up a connection and even disconnecting from a replica-set is rather expensive using the Java driver.
For a large scale deployment one should run more comprehensive tests.
Yes, there is a way to cache/retain the connection to MongoDB: connection pooling. You can use it with Lambda functions as well.
For more information you can follow these links:
Using Mongoose With AWS Lambda
Optimizing AWS Lambda (a bit out of date)
The code looks like this:
const mongoose = require('mongoose');
let conn = null;
const uri = 'YOUR CONNECTION STRING HERE';
exports.handler = async function(event, context) {
// Make sure to add this so you can re-use `conn` between function calls.
context.callbackWaitsForEmptyEventLoop = false;
const models = [{name: 'User', schema: new mongoose.Schema({ name: String })}]
conn = await createConnection(conn, models)
//e.g.
const doc = await conn.model('User').findOne({})
console.log('doc: ', doc);
};
const createConnection = async (conn,models) => {
// Because `conn` is in the global scope, Lambda may retain it between
// function calls thanks to `callbackWaitsForEmptyEventLoop`.
// This means your Lambda function doesn't have to go through the
// potentially expensive process of connecting to MongoDB every time.
// Reconnect if there is no connection yet, or it is disconnected (0) or disconnecting (3).
if (conn == null || [0, 3].includes(conn.readyState)) {
conn = await mongoose.createConnection(uri, {
// Buffering means mongoose will queue up operations if it gets
// disconnected from MongoDB and send them when it reconnects.
// With serverless, better to fail fast if not connected.
bufferCommands: false, // Disable mongoose buffering
bufferMaxEntries: 0, // and MongoDB driver buffering
useNewUrlParser: true,
useUnifiedTopology: true,
useCreateIndex: true
})
for (const model of models) {
const { name, schema } = model
conn.model(name, schema)
}
}
return conn
}
Unfortunately, you may have to create your own RESTful API to answer MongoDB requests until AWS comes up with one. So far they only have what you need for their own DynamoDB.
The short answer is yes, you need to create a new connection AND close it before the lambda finishes.
The long answer is that, in my tests, you can pass your DB connection down through your handler like so (MySQL example, as that's what I've got to hand). You can't rely on the connection being there, so check for it as in my example below. It may be that once your Lambdas haven't been executed for ages, the state from the handler is lost (cold start); I need to do more tests to find out. But I have noticed that if a Lambda is getting a lot of traffic, the example below doesn't create a new connection.
// MySQL.database.js
import * as mysql from 'mysql'
export default mysql.createConnection({
host: 'mysql db instance address',
user: 'MYSQL_USER',
password: 'PASSWORD',
database: 'SOMEDB',
})
Then in your handler import it and pass it down to the lambda that's being executed.
// handler.js
import MySQL from './MySQL.database.js'
const funcHandler = (func) => {
return (event, context, callback) => {
func(event, context, callback, MySQL)
}
}
const handler = {
someHandler: funcHandler(someHandler),
}
export default handler
Now in your Lambda you do...
export default (event, context, callback, MySQL) => {
context.callbackWaitsForEmptyEventLoop = false
// Check if there is a MySQL connection; if not, open one.
// Do ya thing, query away etc etc
callback(null, responder.success())
}
The responder example can be found here. Sorry it's ES5, because that's where the question was asked.
Hope this helps!
Official Best Practice for Connecting from AWS Lambda
You should define the client to the MongoDB server outside the AWS Lambda handler function. Don't define a new MongoClient object each time you invoke your function. Doing so causes the driver to create a new database connection with each function call. This can be expensive and can result in your application exceeding database connection limits.
As an alternative, do the following:
Create the MongoClient object once.
Store the object so your function can reuse the MongoClient across function invocations.
Step 1
Isolate the call to the MongoClient.connect() function into its own module so that the connections can be reused across functions. Let's create a file mongo-client.js for that:
mongo-client.js:
const { MongoClient } = require('mongodb');
// Export a module-scoped MongoClient promise. By doing this in a separate
// module, the client can be shared across functions.
const client = new MongoClient(process.env.MONGODB_URI);
module.exports = client.connect();
Step 2
Import the new module and use it in function handlers to connect to the database.
some-file.js:
const clientPromise = require('./mongo-client');
// Handler
module.exports.handler = async function(event, context) {
// Get the MongoClient by calling await on the connection promise. Because
// this is a promise, it will only resolve once.
const client = await clientPromise;
// Use the connection to return the name of the connected database for example.
return client.db().databaseName;
}
Resources
For more info, check the docs.
We tested an AWS Lambda that connected every minute to our self-managed MongoDB.
The connections were unstable and the Lambda failed.
We resolved the issue by putting MongoDB behind an Nginx reverse proxy using the stream module:
How to setup MongoDB behind Nginx Reverse Proxy
stream {
server {
listen <your incoming Mongo TCP port>;
proxy_connect_timeout 1s;
proxy_timeout 3s;
proxy_pass stream_mongo_backend;
}
upstream stream_mongo_backend {
server <localhost:your local Mongo TCP port>;
}
}
In addition to saving the connection for reuse, increase the memory allocation for the Lambda function. AWS allocates CPU proportionally to the memory allocation, and when changing from 128 MB to 1.5 GB the connection time dropped from 4s to 0.5s when connecting to MongoDB Atlas.
Read more here: https://aws.amazon.com/lambda/faqs/
I was facing the same issue some time ago, but I resolved it by putting my Mongo instance on EC2 in the same account.
I created a MongoDB instance in the same AWS EC2 account where my Lambda function resides.
Now I can access Mongo from the Lambda function via the private IP.
I have an Express App which connects to a MongoDB server at startup and serves requests on-demand (I don't disconnect - it's a single threaded server so no pooling - fairly simple stuff)
The problem is that the MongoDB server may be unavailable for periods of time (it's not on-site), and whilst the Express App doesn't crash, it seems that any requests made to the server will run indefinitely until the connection is restored!
I'd like to limit that (e.g. throw an error back after a period of time) but I can't seem to make that happen...
I'm using connect options "{server: {auto_reconnect: true}}" which seems to ensure that once the MongoDB server reappears, requests complete (without it, requests made during downtime seem to run forever...) - and I don't have access to the client code so I can't fix it there...
I'd assumed a combination of 'connectTimeoutMS' or 'socketTimeoutMS' would allow me to terminate requests when MongoDB is unavailable for longer periods, but I just can't get those to work (I've tried them as connect options, passing them in the URI etc. etc.)
Any attempt to open a Collection and Find/Insert/Update just 'hangs' until the MongoDB reappears - I've left it over 30 mins and everything was just sitting there (and completed AOK when the network was restored!)
What's the best way around this? Should I open a connection specifically for each request (not really a performance issue - it's not a high volume app) or is there something else I'm missing?
Updated to add the connect code
var myDB
var mongodb = require('mongodb')
var uri = // some env vars and stuff
mongodb.MongoClient.connect(uri, {server: {auto_reconnect: true}}, function (err, db) {
myDB = db
})
myDB is then used elsewhere to open collections - and the handle from that is used to find/insert etc.
If the connection to the DB is interrupted, myDB.collection() calls (or calls to find/insert on their handles) will simply hang until the connection is restored - nothing I've tried will cause them to 'time out' sooner!?
I assume that you are using mongoose as a driver.
You can catch the error like this:
var db = require('domain').create();
db.on('error', function(err) {
console.log('DB got a problem');
});
db.run(function() {
mongoose.connect(config, options);
});
or you can directly access
mongoose.connection.readyState
to check the state of your DB connection.
Connection ready state
0 = disconnected
1 = connected
2 = connecting
3 = disconnecting
Each state change emits its associated event name.
http://mongoosejs.com/docs/api.html
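Building on the readyState values above, one way to stop requests from hanging while the database is away is to reject them up front. This is a minimal sketch, assuming mongoose as this answer does; `requireDbConnection` and the 503 response body are hypothetical choices, not part of mongoose or Express.

```javascript
const CONNECTED = 1; // readyState value for "connected"

// Hypothetical Express-style middleware factory. `getReadyState` is any
// function returning the current state, e.g.
// () => mongoose.connection.readyState.
function requireDbConnection(getReadyState) {
  return (request, response, next) => {
    if (getReadyState() === CONNECTED) {
      return next();
    }
    // Fail fast instead of letting the request wait for a reconnect.
    response.status(503).json({ error: 'Database unavailable' });
  };
}
```

Under these assumptions it would be registered before the routes that touch the database, for example app.use(requireDbConnection(() => mongoose.connection.readyState)).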