I have a Lambda function that runs inside a Step Function Map state. In this Lambda I call db.connect(), which opens a new connection (since the Lambda is executed inside a loop, it will create multiple connections). The question is: should I close the MongoDB connection after the Lambda has executed its code? If I don't close the connection, will it stay open? If so, will the MongoDB server try to close the connection after a while? What's the best practice for this situation? Thanks!
export const handler = async (_, context) => {
  context.callbackWaitsForEmptyEventLoop = false
  try {
    const dbConn = await db.connect()
    ...
  } finally {
    await db.disconnect() // is this necessary?
  }
}
As per the documentation outlined here [1]:
If you create your client outside the handler and reuse it, that would be better practice. The connection would potentially close itself after the defined timeout, so I don't think you need to explicitly close the connection.
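A minimal sketch of that pattern, assuming the official mongodb driver and an ATLAS_URI environment variable (both of which are assumptions here, not taken from the question):

// Sketch only: the client is created once per Lambda container and reused
// across invocations. Assumes the official `mongodb` driver and an
// ATLAS_URI environment variable; adjust the names to your setup.
import { MongoClient } from 'mongodb'

const client = new MongoClient(process.env.ATLAS_URI)
const connecting = client.connect() // started once, awaited on every invocation

export const handler = async (_, context) => {
  // Don't keep the container alive waiting for the open socket.
  context.callbackWaitsForEmptyEventLoop = false
  await connecting
  const db = client.db('mydb') // hypothetical database name
  return db.collection('data').find().toArray()
  // No db.disconnect()/client.close() here: warm invocations reuse the
  // connection, and it is torn down when the container is recycled.
}

With the client hoisted out of the handler, the try/finally disconnect from the question becomes unnecessary.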
I have been trying out AWS Lambda with Node + Knex and have the same issue as this question:
AWS Lambda with Knex JS and RDS Postgres
Here is my code for handler.js:
const knex = require('knex')({
  client: 'pg',
  connection: {...},
});

let reports = [];

async function run() {
  // running some code
  foo.query().insert();
  bar.query().insert();
  knex.client.destroy();
  return reports;
}

module.exports.testing123 = async event => {
  const results = await run();
  return results;
};
The first invocation of the function always works fine, but if I try to invoke it again it returns this error:
2020-09-20T20:14:09.251Z 4ca734a3-a780-44e3-a881-eebdc27effb0 ERROR Invoke Error {"errorType":"Error","errorMessage":"Unable to acquire a connection","stack":["Error: Unable to acquire a connection"," at Client_PG.acquireConnection (/var/task/node_modules/knex/lib/client.js:340:30)"," at Runner.ensureConnection (/var/task/node_modules/knex/lib/runner.js:248:8)"," at Runner.run (/var/task/node_modules/knex/lib/runner.js:27:12)"," at Builder.Target.then (/var/task/node_modules/knex/lib/interface.js:15:43)"," at processTicksAndRejections (internal/process/task_queues.js:97:5)"]}
Removing the knex.client.destroy() line fixes this, but I don't think that is the right solution, as we should always destroy the connection after using it.
Deploying the code again also runs fine the first time.
Anything defined outside of the exported handler is persisted on the Lambda instance that is provisioned. So what's happening is that the instance starts with the knex connection, but that connection is destroyed on the first request. You aren't creating a new knex connection on each new request to your Lambda in this case.
If you want to connect and destroy per request, then you need to instantiate knex inside of the function.
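For example, a rough sketch of the per-request variant (the connection settings and the run() signature are placeholders, not the original code):

// Sketch: create knex per invocation and destroy it when the request is done.
// Connection settings are placeholders; pass the instance into your queries.
const makeKnex = () =>
  require('knex')({
    client: 'pg',
    connection: {
      host: process.env.DB_HOST,
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD,
      database: process.env.DB_NAME,
    },
  });

module.exports.testing123 = async event => {
  const knex = makeKnex();
  try {
    return await run(knex); // run() would receive the instance instead of using a module-level one
  } finally {
    await knex.destroy(); // safe now: the next invocation builds a fresh instance
  }
};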
export const client = new MongoClient(
  process.env.ATLAS_URI,
  // TODO: Figure out what this is and why it's needed to turn off deprecation warning
  {
    useUnifiedTopology: true,
  }
);
I'm following this guide and it all makes sense... but she only does one call and then close().
I need to keep making repeated calls:
export const getAllProducts = async () => {
  try {
    await client.connect();
    const cursor = await client.db("products").collection("data").find();
    return await cursor.toArray();
  } catch (err) {
    throw new Error(err);
  } finally {
    await client.close();
  }
};
The first call is fine. After that: Error: MongoError: Topology is closed, please connect
I honestly don't quite understand what "topology" means here, but evidently it's the close() that's contributing to the issue.
It also doesn't make sense to me that I set up a new MongoClient and the ATLAS_URI already has the database name in it... so why do I have to specify that again when connecting?
Anyway, the main part of my question stands: do I just keep a single connection open and never close it? Or do I start over with a whole new MongoClient each time? 😕
I'll just put a brief answer here in case anyone runs into this.
The MongoDB documentation for the Node.js driver gives you simple examples that include the client.connect() and client.close() methods just so you have a runnable example of making a single call to the database, but in a real server application you open the connection to the client once during startup and typically only close it when the server application is shutting down.
So in short: you don't need to open and close a connection every time you want to perform some action on your database.
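Applied to the getAllProducts example above, that could look roughly like this (a sketch, reusing the names from the question):

// Sketch: connect once when the module loads, reuse the client for every call,
// and only close it when the whole application shuts down.
import { MongoClient } from "mongodb";

export const client = new MongoClient(process.env.ATLAS_URI);
const ready = client.connect(); // one connection pool for the app's lifetime

export const getAllProducts = async () => {
  await ready; // resolves immediately once the initial connect has finished
  return client.db("products").collection("data").find().toArray();
};

// Close only on shutdown, e.g.:
// process.on("SIGTERM", () => client.close());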
I'm trying to save some JSON files into my database using a custom function I've written. To do that I must connect to the database, which I'm trying to do with this piece of code at the start of the function:
let url = "mongodb://localhost:27017/database";

(async () => {
  const directory = await fs.promises.readdir(__dirname + '/files')
  let database = await mongoose.createConnection(url, { useNewUrlParser: true, useUnifiedTopology: true });
  database.on('error', error => {
    throw console.log("Couldn't Connect To The Database");
  });
  database.once('open', function() {
    // Saving the data using Schema and save();
Weirdly enough, when it gets to database.once('open', ...) the callback function isn't called at all, and the program just skips the whole saving part and goes right to the end of the function.
I've searched the web for a solution, and one suggestion was to use mongoose.createConnection instead of mongoose.connect.
As you can see, it didn't really fix the issue and the callback function is still not being called.
How can I fix it, and why does it happen?
Thanks!
mongoose.createConnection creates a connection instance and allows you to manage multiple db connections, as the documentation states. In your case using connect() should be sufficient (connect will create one default connection, which is accessible under mongoose.connection).
By awaiting the connect promise you don't actually need to listen for the open event; you can simply do:
...
await mongoose.connect(url, {useNewUrlParser:true, useUnifiedTopology:true});
mongoose.model('YourModel', YourModelSchema);
...
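Applied to the original file-saving loop, the whole flow could look roughly like this (FileModel and its schema are made-up names for illustration, not part of the question):

// Sketch of the full flow using the single default connection.
// `FileModel` and its schema are hypothetical; adapt them to your data.
const fs = require('fs');
const path = require('path');
const mongoose = require('mongoose');

const FileModel = mongoose.model('FileModel',
  new mongoose.Schema({ name: String, content: Object }));

(async () => {
  await mongoose.connect('mongodb://localhost:27017/database',
    { useNewUrlParser: true, useUnifiedTopology: true });

  const directory = await fs.promises.readdir(path.join(__dirname, 'files'));
  for (const fileName of directory) {
    const raw = await fs.promises.readFile(path.join(__dirname, 'files', fileName), 'utf8');
    await new FileModel({ name: fileName, content: JSON.parse(raw) }).save();
  }

  await mongoose.disconnect();
})();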
I am using gremlin#3.3.5 for my Node.js 8.10 application on AWS Lambda. The process works fine for a single invocation. Here is my sample code:
const gremlin = require('gremlin');
const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
const Graph = gremlin.structure.Graph;

exports.handler = (event, context, callback) => {
  dc = new DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin');
  const graph = new Graph();
  const g = graph.traversal().withRemote(dc);
  try {
    const result = await g.V().limit(1).count().next();
    dc.close();
    callback(null, { result: result });
  } catch (exception) {
    callback('Error');
    throw error;
  }
};
When I run this process for a single invocation it appears to work fine, but as soon as I try to run a batch of operations (something like 100,000 requests/hr), I can see from the CloudWatch log metrics that my connections are not being closed successfully. I have tried a number of variations of this, like setting callbackWaitsForEmptyEventLoop, but that caused the Lambda to hang. When I remove the callback (or, similarly, the return), this process works fine with batch operations too. But I do want to return data from this Lambda, with information that is passed to my Step Function to trigger another Lambda based on that information.
After doing some research, I found that the problem was with how the gremlin package handles closing a connection, which doesn't favor a serverless architecture. When the driver is instantiated it creates an instance of a client, which in turn creates an instance of a connection, which creates a websocket using the ws library. Calling driver.close() closes everything gracefully via the ws close event, but it doesn't wait for that event to fire before my callback is called, so the socket remains open and leaks. Explicitly calling dc._client._connection.ws.terminate() on the connection instance and then dc.close() closes the connection immediately.
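In code, that workaround looks roughly like this (note that _client, _connection and ws are private internals of the gremlin package and may change between versions):

// Workaround sketch based on the explanation above: force the underlying
// websocket shut before closing the driver, so nothing is left leaking.
const closeConnection = dc => {
  if (dc._client && dc._client._connection && dc._client._connection.ws) {
    dc._client._connection.ws.terminate(); // drop the socket immediately
  }
  dc.close(); // then let the driver clean up its own state
};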
g.V().limit(1).count().next() is asynchronous.
Try this:
exports.handler = async (event) => {
  try {
    dc = new DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin');
    const graph = new Graph();
    const g = graph.traversal().withRemote(dc);
    const result = await g.V().limit(1).count().next();
    dc.close();
    return result;
  } catch (error) {
    throw error;
  }
};
Since your Lambda runtime is Node.js 8.10 you don't need to use callback.
I have an Amazon Elastic Beanstalk Node app that uses Amazon RDS Postgres. To interface Node with Postgres I use node-postgres. The code looks like this:
var pg = require('pg'),
    done, client;

function DataObject(config, success, error) {
  var PG_CONNECT = "postgres://" + config.username + ":" + config.password + "@" +
    config.server + ":" + config.port + "/" + config.database;
  self = this;
  pg.connect(PG_CONNECT, function(_error, client, done) {
    if (_error) { error(); }
    else {
      self.client = client;
      self.done = done;
      success();
    }
  });
}

DataObject.prototype.add_data = function(data, success, error) {
  self = this;
  this.client.query('INSERT INTO sample (data) VALUES ($1)',
    [data], function(_error, result) {
      self.done();
      success();
    });
};
To use it I create my data object and then call add_data every time new data comes along. Within add_data I call this/self.done() to release the connection back to the pool. Now when I repeatedly make those requests, the client.query callback never comes back. Under what circumstances could this lead to a blocking/non-responding database interface?
The way you are using the pool is incorrect.
You are asking for a connection from the pool in the DataObject function. This function acts as a constructor and is executed once per data object, so only one connection is ever requested from the pool.
When we call add_data the first time, the query is executed and the connection is returned to the pool. Subsequent calls therefore fail, since the connection has already been returned.
You can verify this by logging _error:
DataObject.prototype.add_data = function(data, success, error) {
  self = this;
  this.client.query('INSERT INTO sample (data) VALUES ($1)',
    [data], function(_error, result) {
      if (_error) console.log(_error); // log the error to console
      self.done();
      success();
    });
};
There are a couple of ways you could do this differently:
Ask for a connection for every query. To do that, you'll need to move the code that asks the pool for a connection into add_data (a sketch of this approach follows below).
Release the client only after performing all queries. This is trickier: since the calls are made asynchronously, you need to be careful that the client is not shared, i.e. that no new request is made until the client.query callback is done.
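A rough sketch of the first option, keeping the callback style of the original code (it assumes the constructor now stores the connection string on the instance as this.PG_CONNECT instead of grabbing a client up front):

// Sketch of option 1: acquire a client from the pool for each query and
// release it in the query callback. Assumes the constructor saved the
// connection string as this.PG_CONNECT instead of holding a client.
DataObject.prototype.add_data = function(data, success, error) {
  var self = this;
  pg.connect(self.PG_CONNECT, function(_error, client, done) {
    if (_error) { return error(_error); }
    client.query('INSERT INTO sample (data) VALUES ($1)', [data],
      function(queryError, result) {
        done(); // always return the client to the pool
        if (queryError) { return error(queryError); }
        success(result);
      });
  });
};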