"connection terminated unexpectedly" error with Node, Postgres on AWS Lambda - node.js

I have a number of Node functions running on AWS Lambda. These functions have been using the Node 8 runtime, but AWS sent out an end-of-life notice saying that functions should be upgraded to the latest LTS. With that, I upgraded one of my functions to use Node 12. After being in production for a bit, I'm starting to see a ton of "connection terminated unexpectedly" errors when querying the database.
Here are the errors that I'm seeing:
The connection terminated unexpectedly error
And Error [ERR_STREAM_DESTROYED]: Cannot call write after a stream was destroyed - this seems to happen on the 1st or second invocation after seeing the connection terminated unexpectedly error.
I'm using Knex.js for querying the database. I was running older versions of knex and node-postgres and recently upgraded to see if it would resolve the issue, but no luck. Here are the versions of knex and node-postgres that I'm currently running:
"knex": "^0.20.8"
"pg": "^7.17.1"
The only change I've made to this particular function is the upgrade to Node 12. I've also tried Node 10, but the same issue persists. Unfortunately, AWS won't let me downgrade to Node 8 to verify that it is indeed an issue. None of my other functions running on Node 8 are experiencing this issue.
I've researched knex, node-postgres and tarn.js (the Knex connection pooling library) to see if any related issues or solutions popped up, but so far, I haven't had any luck.
UPDATE:
Example of a handler. Note that this is happening on many different Lambdas, all running Node 12.
require('../../helpers/knex')
const { Rollbar } = require('@scoutforpets/utils')
const { Email } = require('@scoutforpets/notifications')
const { transaction: tx } = require('objection')
const Invoice = require('../../models/invoice')

// configure rollbar for error logging
const rollbar = Rollbar.configureRollbar(process.env.ROLLBAR_TOKEN)

/**
 *
 * @param {*} event
 */
async function handler (event) {
  const { invoice } = event
  const { id: invoiceId } = invoice
  try {
    return tx(Invoice, async Invoice => {
      // send the receipt
      await Email.Customer.paymentReceipt(invoiceId, true)
      // convert JSON to model
      const i = Invoice.fromJson(invoice)
      // mark the invoice as having been sent
      await i.markAsSent()
    })
  } catch (err) {
    return err
  }
}

module.exports.handler = rollbar.lambdaHandler(handler)

Starting with Node.js 10, AWS Lambda makes the handler async, so you have to adapt your code.
Docs : https://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-handler.html
The runtime passes three arguments to the handler method. The first
argument is the event object, which contains information from the
invoker. The invoker passes this information as a JSON-formatted
string when it calls Invoke, and the runtime converts it to an object.
When an AWS service invokes your function, the event structure varies
by service.
The second argument is the context object, which contains information
about the invocation, function, and execution environment. In the
preceding example, the function gets the name of the log stream from
the context object and returns it to the invoker.
The third argument, callback, is a function that you can call in
non-async functions to send a response. The callback function takes
two arguments: an Error and a response. When you call it, Lambda waits
for the event loop to be empty and then returns the response or error
to the invoker. The response object must be compatible with
JSON.stringify.
For async functions, you return a response, error, or promise to the
runtime instead of using callback.
exports.handler = async function(event, context, callback) {
console.log("EVENT: \n" + JSON.stringify(event, null, 2))
return context.logStreamName
}
Thx!

I think you need to set the right connection pooling config.
See the docs here: https://github.com/marcogrcr/sequelize/blob/patch-1/docs/manual/other-topics/aws-lambda.md
const { Sequelize } = require("sequelize");
let sequelize = null;
async function loadSequelize() {
const sequelize = new Sequelize(/* (...) */, {
// (...)
pool: {
/*
* Lambda functions process one request at a time but your code may issue multiple queries
* concurrently. Be wary that `sequelize` has methods that issue 2 queries concurrently
* (e.g. `Model.findAndCountAll()`). Using a value higher than 1 allows concurrent queries to
* be executed in parallel rather than serialized. Careful with executing too many queries in
* parallel per Lambda function execution since that can bring down your database with an
* excessive number of connections.
*
* Ideally you want to choose a `max` number where this holds true:
* max * EXPECTED_MAX_CONCURRENT_LAMBDA_INVOCATIONS < MAX_ALLOWED_DATABASE_CONNECTIONS * 0.8
*/
max: 2,
/*
* Set this value to 0 so connection pool eviction logic eventually cleans up all connections
* in the event of a Lambda function timeout.
*/
min: 0,
/*
* Set this value to 0 so connections are eligible for cleanup immediately after they're
* returned to the pool.
*/
idle: 0,
// Choose a small enough value that fails fast if a connection takes too long to be established.
acquire: 3000,
/*
* Ensures the connection pool attempts to be cleaned up automatically on the next Lambda
* function invocation, if the previous invocation timed out.
*/
evict: CURRENT_LAMBDA_FUNCTION_TIMEOUT
}
});
// or `sequelize.sync()`
await sequelize.authenticate();
return sequelize;
}
module.exports.handler = async function (event, callback) {
// re-use the sequelize instance across invocations to improve performance
if (!sequelize) {
sequelize = await loadSequelize();
} else {
// restart connection pool to ensure connections are not re-used across invocations
sequelize.connectionManager.initPools();
// restore `getConnection()` if it has been overwritten by `close()`
if (sequelize.connectionManager.hasOwnProperty("getConnection")) {
delete sequelize.connectionManager.getConnection;
}
}
try {
return await doSomethingWithSequelize(sequelize);
} finally {
// close any opened connections during the invocation
// this will wait for any in-progress queries to finish before closing the connections
await sequelize.connectionManager.close();
}
};
It's actually written for Sequelize, not Knex, but the same connection-pool considerations should apply to Knex as well.
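For Knex, a rough equivalent might look like the sketch below. It assumes that Knex hands its `pool` options to tarn.js (`min`, `max`, `idleTimeoutMillis`, `acquireTimeoutMillis`). The `maxPoolSize` helper, the `DATABASE_URL` variable, and all of the numbers are hypothetical and only illustrate the sizing rule from the comment above.

```javascript
// Largest `max` that satisfies:
//   max * expectedConcurrentInvocations < maxDbConnections * 0.8
// Integer arithmetic (8/10 instead of 0.8) avoids floating-point surprises.
function maxPoolSize (maxDbConnections, expectedConcurrentInvocations) {
  const bound = (maxDbConnections * 8) / (expectedConcurrentInvocations * 10)
  return Math.max(0, Math.ceil(bound) - 1)
}

// Hypothetical numbers: a database allowing 100 connections, 20 concurrent Lambdas.
const max = maxPoolSize(100, 20)

// Plain config object in the shape Knex expects; pass it to require('knex')(knexConfig).
const knexConfig = {
  client: 'pg',
  connection: process.env.DATABASE_URL, // assumed environment variable
  pool: {
    min: 0, // let every connection be evicted
    max,
    idleTimeoutMillis: 1000, // assumed value: free idle connections quickly
    acquireTimeoutMillis: 3000 // fail fast if no connection can be acquired
  }
}
```

Setting `min: 0` explicitly matters here because Knex's documented default pool keeps a minimum of 2 connections open.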

I had this problem too. In my case it was because I was trying to connect to the database in production.
So I added ssl to the Pool, like this:
const pool = new Pool({
  connectionString: connectionString,
  ssl: { rejectUnauthorized: false },
});
Hope it helps you too...

Related

Trying to run a Cloud Function with LRO

Background
I am working on creating an autonomous, end-to-end Google AutoML system. I created a cloud function that receives a Cloud Pub/Sub message when training starts. The cloud function uses the operation ID to get the operation status of the training. If the training of the model is complete (operation metadata = true), the function sends the model ID to a deployment function and publishes a Pub/Sub message with the model ID so the model can be called for prediction. I found a solution on SO in this post: How to programmatically get model id from google-cloud-automl with node.js client library
Problem
The issue I am coming across is with the cloud function timeout of 10 minutes. I wrote this question on reddit on potential solutions. https://www.reddit.com/r/googlecloud/comments/jqr213/cloud_function_to_compute_engine/ The Compute Engine solution seems not practical for a system mainly written in a cloud function environment. While trying to implement the cron job solution, I thought of the retry feature for cloud functions. It keeps the same event and will retry the function for up to a week. The documentation for retry is https://cloud.google.com/functions/docs/bestpractices/retries How could I include a cancel of the function to keep it retrying until it becomes true and completes the deployment and pub/sub message? My thought is to include the ending of the system in the if else statement, I am just struggling to find documentation of this/ if it would actually work.
Code
const {AutoMlClient} = require('@google-cloud/automl').v1;
// Instantiates a client
const client = new AutoMlClient();
exports.helloPubSub = (event, context) => {
//Imports the Google Cloud AutoML library
const message = event.data
? Buffer.from(event.data, 'base64').toString()
: 'Hello, World';
const model = message;
console.log(model);
const modelpath = message.replace('"','');
const modelID = modelpath.replace('"','');
const message1 = model.replace('projects/170974376642/locations/us-central1/operations/','');
const message2 = message1.replace('"','');
const message3 = message2.replace('"','');
console.log(`Operation ID is: ${message3}`)
getOperationStatus(message3, modelID);
}
// [START automl_vision_classification_deploy_model_node_count]
async function getOperationStatus(opId, message) {
console.log('Starting operation status');
const opped = opId;
const data = message;
const projectId = '170974376642';
const location = 'us-central1';
const operationId = opId;
// Construct request
const request = {
name: `${message}`,
};
console.log('Made it to the response');
const [response] = await client.operationsClient.getOperation(request);
console.log(`Name: ${response.name}`);
console.log(`Operation details:`);
var apple = JSON.stringify(response);
console.log(apple);
console.log('Loop until the model is ready to deploy');
if (apple.includes('True')) {
const appleF = apple.replace((/projects\/[a-zA-Z0-9-]*\/locations\/[a-zA-Z0-9-]*\/models\//,''));
deployModelWithNodeCount(appleF);
pubSub(appleF);
} else {
getOperationStatus(opped, data);
}
}
async function pubSub(id) {
const topicName = 'modelID';
const data = JSON.stringify({foo: `${id}`});
async function publishMessage() {
// Publishes the message as a string, e.g. "Hello, world!" or JSON.stringify(someObject)
const dataBuffer = Buffer.from(data);
try {
const messageId = await pubSubClient.topic(topicName).publish(dataBuffer);
console.log(`Message ${messageId} published.`);
} catch (error) {
console.error(`Received error while publishing: ${error.message}`);
process.exitCode = 1;
}
}
publishMessage();
// [END pubsub_publish_with_error_handler]
// [END pubsub_quickstart_publisher]
process.on('unhandledRejection', err => {
console.error(err.message);
process.exitCode = 1;
});
}
async function deployModelWithNodeCount(message) {
const projectId = 'ireda1';
const location = 'us-central1';
const modelId = message;
// Construct request
const request = {
name: client.modelPath(projectId, location, modelId),
imageClassificationModelDeploymentMetadata: {
nodeCount: 1,
},
};
const [operation] = await client.deployModel(request);
// Wait for operation to complete.
const [response] = await operation.promise();
console.log(`Model deployment finished. ${response}`);
}
// [END automl_vision_classification_deploy_model_node_count]
There are several improvements that you can consider for your code. First of all, it is important to understand that Cloud Functions are short-lived: 9 minutes is the maximum time your function can be active. Cloud Functions are not meant for background operations. If you are looking for a solution that can run in the background and requires minimal infrastructure, I would recommend having a look at Cloud Run.
Now let's have a look at some parts of the code and how it can be improved with a different architecture, keeping Cloud Functions and PubSub as the backbone.
Waiting on model deployment
The code you use is:
if (apple.includes('True')) {
const appleF = apple.replace((/projects\/[a-zA-Z0-9-]*\/locations\/[a-zA-Z0-9-]*\/models\//,''));
deployModelWithNodeCount(appleF);
pubSub(appleF);
} else {
getOperationStatus(opped, data);
}
First of all, I would strongly suggest not using recursion here, because a) this can be handled with a simple loop, and b) you are bombarding the service without any timeout or back-off policy. The latter might result in either your service crashing or the endpoint starting to reject your requests.
To improve your code, you can, for example, at least add a delay between calls, like this:
setTimeout(() => getOperationStatus(opped, data), 1000)
For readability, I would also suggest using a loop, since you are using async patterns anyway:
let status = await getOperationStatus(opped, data);
while (!status) {
  await new Promise(resolve => setTimeout(resolve, 1000));
  status = await getOperationStatus(opped, data);
}
In this case, you need to separate it into two functions: 1) getOperationStatus, which just returns the status, and 2) waitForDeployment, which polls for the status, compares it with the expected result, and decides to a) wait and retry or b) abandon and return.
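A minimal sketch of that split, assuming a status check that resolves to `true` once the operation is done. The `checkStatus`, `intervalMs`, and `maxAttempts` names are illustrative, and the stub at the bottom stands in for a real call to the AutoML operations API:

```javascript
// Resolves after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Polls checkStatus() until it resolves to true or the attempts run out.
// Returns true if the operation finished, false if we gave up.
async function waitForDeployment (checkStatus, { intervalMs = 1000, maxAttempts = 10 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await checkStatus()) {
      return true
    }
    if (attempt < maxAttempts) {
      // Linear back-off: wait a little longer after each failed check.
      await sleep(intervalMs * attempt)
    }
  }
  return false
}

// Stub status check (hypothetical): reports "done" on the third call.
let calls = 0
const fakeGetOperationStatus = async () => ++calls >= 3

const finished = waitForDeployment(fakeGetOperationStatus, { intervalMs: 10 })
finished.then(done => console.log('finished:', done, 'after', calls, 'checks'))
```

Capping the attempts gives the caller a clear "abandon and return" path instead of polling until the function is killed.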
This might make your code better, but it does not solve the fundamental problem of the system design. To understand this, let's have a look at splitting responsibility and structuring the system differently. As a side note, the guide here is not meant for a Cloud Function application.
A few explanations:
Activation Function initializes the entire process, it calls the Vision Auto ML to start the deployment. It only gets the ID of the operation and pushes it to the queue
Cloud Scheduler pushes a trigger to PubSub (alternatively it can also call the function as an endpoint) every X minutes/seconds saying that it is time to check on the progress
Polling Function, once triggered, asks for the next ID to check, queries Cloud AutoML and, if the operation is finished, acknowledges the message and writes the results; otherwise it exits. You need to be careful with the configuration of acknowledgments here. Useful information is here
Polling of the status
One minor thing I have noticed is how you are polling the status. Why don't you just query this URL, GET https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id, and read the done field of the response (check here for details)?
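As a rough sketch of that check: the helper names are hypothetical, and the sample objects are made up to mirror the shape of a long-running operation resource, which carries a boolean `done` field plus either `error` or `response` once it finishes.

```javascript
// Builds the operations URL quoted above from its three variable parts.
function operationUrl (projectId, location, operationId) {
  return `https://automl.googleapis.com/v1/projects/${projectId}/locations/${location}/operations/${operationId}`
}

// Classifies an operation resource returned by that endpoint.
function operationState (operation) {
  if (!operation.done) return 'running'
  return operation.error ? 'failed' : 'succeeded'
}

// Made-up sample payloads for illustration only.
const running = { name: 'projects/p/locations/us-central1/operations/op-1', done: false }
const succeeded = { name: running.name, done: true, response: {} }

console.log(operationUrl('project-id', 'us-central1', 'operation-id'))
console.log(operationState(running), operationState(succeeded))
```

Checking the boolean directly is also more robust than searching the stringified response for 'True', since JSON.stringify renders booleans as lowercase true.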
Conclusion: Cloud Functions are short-lived and must handle only one operation at a time, no waiting. If you want a simple loop for waiting for results, use Cloud Run

Experiencing Neptune Gremlin connection problems when calling the AWS Lambda handler's callback

I am using gremlin@3.3.5 for my Node.js 8.10 application with AWS Lambdas. The process works fine for a single invocation. Here is my sample code.
const gremlin = require('gremlin');
const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
const Graph = gremlin.structure.Graph;
exports.handler = (event, context, callback) => {
dc = new DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin');
const graph = new Graph();
const g = graph.traversal().withRemote(dc);
try {
const result = await g.V().limit(1).count().next();
dc.close();
callback(null, { result: result });
} catch (exception) {
callback('Error');
throw error;
}
}
When I run this process for a single invocation, it appears to work fine, but as soon as I try to run a batch of operations (something like 100,000 requests/hr), I see in the CloudWatch log metrics that my connections are not closed successfully. I have tried a number of implementations of this, such as setting callbackWaitsForEmptyEventLoop, but that causes the Lambda to seize up. When I remove the callback (or return similarly), this process works fine with batch operations too. But I do want to return data from this Lambda, because that information is passed to my step function to trigger another Lambda.
After doing some research, I found that the problem was that the way the gremlin package handles closing a connection does not suit a serverless architecture. When the driver is instantiated, it creates an instance of a client, which in turn creates an instance of a connection, which creates a WebSocket using the ws library. Calling driver.close() ends up calling ws.close(), which closes the socket gracefully and asynchronously; my callback was called before the close event fired, so the socket stayed open and leaked. Explicitly calling dc._client._connection.ws.terminate() on the connection instance and then dc.close() closes the connection immediately.
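A sketch of that workaround follows. Note that `_client`, `_connection`, and `ws` are private internals named in the description above, so they may differ between gremlin versions. The `forceClose` helper and the fake connection object are hypothetical and exist only so the snippet runs on its own.

```javascript
// Terminates the underlying WebSocket immediately, then asks the driver to close.
// Guards against the private fields being absent in other driver versions.
async function forceClose (dc) {
  const ws = dc && dc._client && dc._client._connection && dc._client._connection.ws
  if (ws && typeof ws.terminate === 'function') {
    ws.terminate() // hard close: does not wait for the closing handshake
  }
  await dc.close()
}

// Fake connection mimicking the shape described above (illustration only).
const events = []
const fakeDc = {
  _client: { _connection: { ws: { terminate: () => events.push('terminate') } } },
  close: async () => { events.push('close') }
}

const closed = forceClose(fakeDc)
```

In a handler, this would be awaited before calling the callback or returning the result.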
g.V().limit(1).count().next() is asynchronous.
Try this:
exports.handler = async (event) => {
  const dc = new DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin');
  const graph = new Graph();
  const g = graph.traversal().withRemote(dc);
  try {
    const result = await g.V().limit(1).count().next();
    return result;
  } finally {
    // close the connection whether or not the query succeeded
    await dc.close();
  }
}
Since your Lambda runtime is Node.js 8.10 you don't need to use callback.

When using pg-promise, how do you set a query to timeout after a time/cancel the query? [duplicate]

I want to add a timeout to pg-promise queries so they will fail after some amount of time if the database has not yet responded.
Is there a recommended way to do that, or should I write a custom wrapper that handles the timer and rejects the promise if it's too late?
From the author of pg-promise...
pg-promise doesn't support query cancellation, because it is a hack to work-around incorrect database design or bad query execution.
PostgreSQL supports events that should be used when executing time-consuming queries, so instead of waiting, one can set an event listener to be triggered when specific data/view becomes available. See LISTEN/NOTIFY example.
You can extend pg-promise with your own custom query method that will time out with a reject (see example below), but that's again another work-around on top of a design problem.
Example using Bluebird:
const Promise = require('bluebird');
Promise.config({
  cancellation: true
});
const initOptions = {
  promiseLib: Promise,
  extend(obj) {
    obj.queryTimeout = (query, values, delay) => {
      return obj.any(query, values).timeout(delay);
    }
  }
};
const pgp = require('pg-promise')(initOptions);
const db = pgp(/* connection details */);
Then you can use db.queryTimeout(query, values, delay) on every level.
Alternatively, if you are using Bluebird, you can chain .timeout(delay) to any of the existing methods:
db.any(query, values)
.timeout(500)
.then(data => {})
.catch(error => {})
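If you are not using Bluebird, the custom wrapper asked about in the question could be sketched with native promises. This is only an illustration: `withTimeout` and `slowQuery` are hypothetical names, and the stub stands in for a call such as db.any(query, values). The wrapper only rejects on the client side; the query itself keeps running on the server.

```javascript
// Rejects with a timeout error if `promise` has not settled within `ms` milliseconds.
function withTimeout (promise, ms) {
  let timer
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`query timed out after ${ms} ms`)), ms)
  })
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Stub standing in for a database call that takes `delay` ms to answer.
const slowQuery = delay => new Promise(resolve => setTimeout(() => resolve('rows'), delay))

const fast = withTimeout(slowQuery(10), 200) // the query wins the race
const slow = withTimeout(slowQuery(200), 10) // the timer wins the race
slow.catch(err => console.log(err.message))
```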
See also:
extend event
Bluebird.timeout
UPDATE
From version 8.5.3, pg-promise started supporting query timeouts, via property query_timeout within the connection object.
You can either override the defaults:
pgp.pg.defaults.query_timeout = 3000; // timeout every query after 3 seconds
Or specify it within the connection object:
const db = pgp({
/* all connection details */
query_timeout: 3000
});

Node Postgres Module not responding

I have an amazon beanstalk node app that uses the postgres amazon RDS. To interface node with postgres I use node postgres. Code looks like this:
var pg = require('pg'),
done,client;
function DataObject(config,success,error) {
var PG_CONNECT = "postgres://"+config.username+":"+config.password+"@"+
config.server+":"+config.port+"/"+config.database;
self=this;
pg.connect(PG_CONNECT, function(_error, client, done) {
if(_error){ error();}
else
{
self.client = client;
self.done = done;
success();
}
});
}
DataObject.prototype.add_data = function(data,success,error) {
self=this;
this.client.query('INSERT INTO sample (data) VALUES ($1,$2)',
[data], function(_error, result) {
self.done();
success();
});
};
To use it I create my data object and then call add_data every time new data comes along. Within add_data I call 'this/self.done()' to release the connection back to the pool. Now when I repeatedly make those requests the client.query never gets back. Under what circumstance could this lead to a blocking/not responding database interface?
The way you are using the pool is incorrect.
You are asking for a connection from the pool in the function DataObject. This function acts as a constructor and is executed once per data object. Thus only one connection is requested from the pool.
When add_data is called the first time, the query is executed and the connection is returned to the pool. Thus the subsequent calls are not successful, since the connection has already been returned.
You can verify this by logging _error:
DataObject.prototype.add_data = function(data,success,error) {
self=this;
this.client.query('INSERT INTO sample (data) VALUES ($1,$2)',
[data], function(_error, result) {
if(_error) console.log(_error); //log the error to console
self.done();
success();
});
};
There are a couple of ways you can do it differently:
Ask for a connection for every query made. To do this, move the code that asks the pool for a connection into the add_data function.
Release the client after performing all queries. This is trickier: since calls are made asynchronously, you need to make sure the client is not shared, i.e. no new request is made until the client.query callback has finished.
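The first option could be sketched as follows, using the promise-based pool.connect() and client.release() from newer versions of node-postgres rather than the pg.connect call in the question. The `addData` name and the fake pool are hypothetical; in a real app `pool` would be created with new Pool(...) from the pg package. The sketch also assumes the `sample` table has a single `data` column.

```javascript
// Checks a client out of the pool for this one query and always releases it.
async function addData (pool, data) {
  const client = await pool.connect()
  try {
    return await client.query('INSERT INTO sample (data) VALUES ($1)', [data])
  } finally {
    // Return the client to the pool even if the query threw.
    client.release()
  }
}

// Fake pool so the sketch runs without a database (illustration only).
const log = []
const fakePool = {
  connect: async () => ({
    query: async (text, values) => { log.push(['query', values[0]]); return { rowCount: 1 } },
    release: () => log.push(['release'])
  })
}

const inserted = addData(fakePool, 'hello')
```

For a single statement like this one, pool.query(text, values) performs the checkout and release for you.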
