Lambda + pg-promise + transaction = random timeout - node.js

I've been struggling with this issue for the past 48 hours, and after reading lots of answers here, blog posts, articles, documentation, etc., I still cannot find a solution!
Basically, I have a Lambda function with a 2-minute timeout. Based on the logs and insights, it processes most requests fine, but it randomly fails with a "timed out" error when trying to execute the transaction below.
Lambda code (chopped for legibility):
import pgp from 'pg-promise'
import logger from '../lib/logger'

const Database = pgp()
const db = Database({
  connectionString: process.env.DATABASE_URL,
  max: 3,
  idleTimeoutMillis: 10000,
})

db.connect()
  .then(() => logger.info('Successfully connected to the PG database'))
  .catch(err => logger.error({ err }))

export const handler = async (event, context) => {
  logger.info('transaction start...')
  await db.tx(async tx => {
    await tx.none(
      `
      INSERT INTO...`,
      [someValue1, someValue2]
    )
    const updatedRow = await tx.one(
      `
      UPDATE Something...`,
      [somethingId]
    )
    return someFunction(updatedRow)
  })
  logger.info('transaction end...')
}

const someFunction = async (data) => {
  return db.task('someTask', async ctx => {
    const value = await ctx.oneOrNone(
      `SELECT * FROM Something...`,
      [data.id]
    )
    if (!value) {
      return
    }
    const doStuff = async (points) =>
      ctx.none(
        `UPDATE Something WHERE id =.....`,
        [points]
      )
    // increment points x miles
    if (data.condition1) {
      await doStuff(10)
    }
    if (data.condition2) {
      await doStuff(20)
    }
    if (data.condition3) {
      await doStuff(30)
    }
  })
}
I can see that the transaction starts but never ends, so the function is inevitably killed by the timeout.
I have read the entire pg-promise wiki and understood everything about tweaks, performance, good practices, etc. But still, something is very wrong.
You can see that I also changed the pool size and the max timeout while experimenting, but that didn't fix the issue.
Any ideas?
Thanks!

Most likely you are running out of connections. You are not using them correctly, while at the same time setting a very low connection limit of 3.
The first issue: you are testing a connection by calling connect without following it with done, which permanently occupies, and thus wastes, your initial/primary connection.
See the example here, where we release the connection after testing it.
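For reference, a minimal sketch of that pattern, based on the pg-promise documentation (obj.done() releases the connection acquired for the test):

db.connect()
  .then(obj => {
    logger.info('Successfully connected to the PG database')
    obj.done() // release the test connection back to the pool
  })
  .catch(err => logger.error({ err }))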
The second problem: you are requesting a new connection (by calling .task at the root db level) while inside a transaction, which is bad in any environment, and particularly critical when you have very few connections available.
The task should reuse the connection of the current transaction, which means your someFunction should either require the connection context or at least take it as an optional parameter:
const someFunction = async (data, ctx) => {
  return (ctx || db).task('someTask', async tx => {
    const value = await tx.oneOrNone(
Task <-> Transaction interfaces in pg-promise can be fully inter-nested, you see, propagating the current connection through all levels.
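On the calling side, a sketch of how the handler from the question could hand the transaction context down, so the task reuses the transaction's connection instead of requesting a new one from the pool:

await db.tx(async tx => {
  await tx.none(`INSERT INTO...`, [someValue1, someValue2])
  const updatedRow = await tx.one(`UPDATE Something...`, [somethingId])
  // pass the current transaction context into the helper
  return someFunction(updatedRow, tx)
})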
Also, I suggest using pg-monitor, for good query + context visualization.
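A minimal sketch of wiring it up, assuming the same initialization options object is shared between pg-promise and pg-monitor:

import pgPromise from 'pg-promise'
import monitor from 'pg-monitor'

const initOptions = {} // pg-promise initialization options
const pgp = pgPromise(initOptions)
monitor.attach(initOptions) // logs queries, tasks and transactions with their context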

Related

Mongoose too many connection and commands

I'm here to request help with mongo/mongoose. I use an AWS Lambda that accesses a Mongo database, and I'm having problems: sometimes my connections reach the limit of 500. I'm trying to fix this, and I did some things like this https://dzone.com/articles/how-to-use-mongodb-connection-pooling-on-aws-lambd and https://www.mongodb.com/blog/post/optimizing-aws-lambda-performance-with-mongodb-atlas-and-nodejs, which basically amount to using a singleton-like class and setting context.callbackWaitsForEmptyEventLoop = false. That indeed helped, but rarely it still opens 100 connections in less than a minute; it looks like some connections are not being reused, even though our logs show that they are. I also noticed a weird behavior: whenever MongoDB Atlas shows me an increased number of commands, my Mongo connections increase heavily. The first chart is operations and the second is the connections.
Looking at the operations, there are too many commands and just a few queries. I have no idea what those commands are. My theory is that those commands are causing the problem, but I did not find anything explaining the exact difference between a query and a command, so I can't tell whether that theory is valid. Another thing: how do I correctly choose the pool size? We have really simple queries.
Here is our singleton class because maybe this is what we are doing wrong:
class Database {
  options: [string, mongoose.ConnectionOptions];
  instance?: typeof mongoose | null;

  constructor(options = config) {
    console.log('[DatabaseService] Created database instance...');
    this.options = options;
    this.instance = null;
  }

  async checkConnection() {
    try {
      if (this.instance) {
        const pingResponse = await this.instance.connection.db.admin().ping();
        console.log(`[DatabaseService] Connection status: ${pingResponse.ok}`);
        return pingResponse.ok === 1;
      }
      return false;
    } catch (error) {
      console.log(error);
      return false;
    }
  }

  async init() {
    const connectionActive = await this.checkConnection();
    if (connectionActive) {
      console.log(`[DatabaseService] Already connected, returning instance`);
      return this.instance;
    }
    console.log('[DatabaseService] Previous connection was not active, creating new connection...');
    this.instance = await mongoose.connect(...this.options);
    const timeId = Date.now();
    console.log(`Connection opened ${timeId}`);
    console.time(`Connection started at ${timeId}`);
    this.instance?.connection.on('close', () => {
      console.timeEnd(`Connection started at ${timeId}`);
      console.log(`Closing connection ${timeId}`);
    });
    return this.instance;
  }

  async getData(id: string) {
    await this.init();
    const response = await Model.findOne({ 'uuid': id });
    return response;
  }
}
I hope that is enough information. My main question is whether my theory that commands are causing too many connections is plausible, and what exactly commands are, because every explanation I found makes them look the same as queries.
Based on the comment written by Matt, I changed my init function and now my connections are under control.
async init() {
  if (this.instance) {
    console.log(`[DatabaseService] Already connected, returning instance`);
    return this.instance;
  }
  console.log('[DatabaseService] Previous connection was not active, creating new connection...');
  this.instance = await mongoose.connect(...this.options);
  const timeId = Date.now();
  console.log(`Connection opened ${timeId}`);
  console.time(`Connection started at ${timeId}`);
  this.instance?.connection.on('close', () => {
    console.timeEnd(`Connection started at ${timeId}`);
    console.log(`Closing connection ${timeId}`);
  });
  return this.instance;
}
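For completeness, a minimal sketch of how the singleton might be consumed from the Lambda handler (event.id is a placeholder for whatever your payload carries), so one connection is created per container and reused across invocations:

const db = new Database();

export const handler = async (event, context) => {
  // Don't keep the Lambda alive waiting for the open mongoose connection
  context.callbackWaitsForEmptyEventLoop = false;
  await db.init();
  return db.getData(event.id); // placeholder field, adapt to your payload
};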

Lambda function only putting one data point into InfluxDB

I have a Lambda function that is designed to take a message from an SQS queue and then write a value called perf_value, which is just an integer. The CloudWatch logs show it firing each time and logging Done, as seen in the .then() block of my write point. Even though it fires each time, I am still only seeing a single data point in InfluxDB Cloud. I can't figure out why it inputs a single value and then nothing after that. I don't see a backlog in SQS, and there are no error messages in CloudWatch either. I'm guessing it's a code issue or the InfluxDB Cloud setup, though I used the defaults, which you would expect to work for multiple data points.
'use strict';
const { InfluxDB, Point, HttpError } = require('@influxdata/influxdb-client')

const InfluxURL = 'https://us-west-2-1.aws.cloud2.influxdata.com'
const token = '<my token>=='
const org = '<my org>'
const bucket = '<bucket name>'
const writeApi = new InfluxDB({ url: InfluxURL, token }).getWriteApi(org, bucket, 'ms')

module.exports.perf = function (event, context, callback) {
  context.callbackWaitsForEmptyEventLoop = false;
  let input = JSON.parse(event.Records[0].body);
  console.log(input)

  const point = new Point('elapsedTime')
    .tag(input.monitorID, 'monitorID')
    .floatField('elapsedTime', input.perf_value)
    // .timestamp(input.time)

  writeApi.writePoint(point)

  writeApi
    .close()
    .then(() => {
      console.log('Done')
    })
    .catch(e => {
      console.error(e)
      if (e instanceof HttpError && e.statusCode === 401) {
        console.log('Unauthorized request')
      }
      console.log('\nFinished ERROR')
    })
  return true
};
EDIT:
I still have been unable to resolve the issue. I can get one data point to go into InfluxDB and then nothing shows up after that.
@Joshk132 -
I believe the problem is here:
writeApi
  .close() // <-- here
  .then(() => {
    console.log('Done')
  })
You are closing the API client object after the first write, so you are only able to write once. You can use flush() instead if you want to force sending the point immediately.
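A sketch of the write section of the handler with that change, keeping the shared writeApi at module scope:

writeApi.writePoint(point)

// Flush pending points without closing the client,
// so later invocations of this warm container can still write.
writeApi
  .flush()
  .then(() => {
    console.log('Done')
  })
  .catch(e => {
    console.error(e)
  })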

Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? Best practices for using Knex.Transaction

When working with a big application that has several tables and several DB operations, it's very difficult to keep track of which transactions are occurring. To work around this, we started by passing around a trx object.
This has proven to be very messy.
For example:
async getOrderById(id: string, trx?: Knex.Transaction) { ... }
Depending on the function calling getOrderById, it will either pass a trx object or not. The above function will use trx if it is not null.
This seems simple at first, but it leads to mistakes: if you're in the middle of a transaction in one function and call another function that does NOT use that transaction, Knex will hang with the famous Knex: Timeout acquiring a connection. The pool is probably full.
async getAllPurchasesForUser(userId: string) {
  ..
  const trx = await knex.transaction();
  try {
    ..
    getPurchaseForUserId(userId); // Forgot to make this consume trx, hence Knex times out acquiring a connection.
    ..
  }
Based on that, I'm assuming this is not a best practice, but I would love it if someone from the Knex developer team could comment.
To improve this, we're considering instead using knex.transactionProvider(), accessed throughout the app wherever we perform DB operations.
The example on the website seems incomplete:
// Does not start a transaction yet
const trxProvider = knex.transactionProvider();

const books = [
  { title: 'Canterbury Tales' },
  { title: 'Moby Dick' },
  { title: 'Hamlet' }
];

// Starts a transaction
const trx = await trxProvider();
const ids = await trx('catalogues')
  .insert({ name: 'Old Books' }, 'id')
books.forEach((book) => book.catalogue_id = ids[0]);
await trx('books').insert(books);

// Reuses same transaction
const sameTrx = await trxProvider();
const ids2 = await sameTrx('catalogues')
  .insert({ name: 'New Books' }, 'id')
books.forEach((book) => book.catalogue_id = ids2[0]);
await sameTrx('books').insert(books);
In practice here's how I'm thinking about using this:
SingletonDBClass.ts:
const trxProvider = knex.transactionProvider();
export default trxProvider;
Orders.ts
import trx from '../SingletonDBClass';
..
async getOrderById(id: string) {
  const trxInst = await trx;
  try {
    const order = await trxInst<Order>('orders').where({ id });
    trxInst.commit();
    return order;
  } catch (e) {
    trxInst.rollback();
    throw new Error(`Failed to fetch order, error: ${e}`);
  }
}
..
Am I understanding this correctly?
Another example function where a transaction is actually needed:
async cancelOrder(id: string) {
  const trxInst = await trx;
  try {
    trxInst('orders').update({ status: 'CANCELED' }).where({ id });
    trxInst('active_orders').delete().where({ orderId: id });
    trxInst.commit();
  } catch (e) {
    trxInst.rollback();
    throw new Error(`Failed to cancel order, error: ${e}`);
  }
}
Can someone confirm whether I'm understanding this correctly? And more importantly, is this a good way to do it, or is there a best practice I'm missing?
I appreciate your help, Knex team!
No. You cannot have a global singleton class returning the transaction for all of your internal functions. Otherwise you are always trying to use the same transaction for all the concurrent users doing different things in the application.
Also, once you commit / rollback the transaction returned by the provider, it will not work anymore for other queries. A transaction provider can give you only a single transaction.
A transaction provider is useful in a case where you have, for example, middleware which provides a transaction for request handlers, but the transaction should not be started yet, since it might not be needed and you don't want to allocate a connection for it from the pool.
A good way to do this is to pass the transaction, or some request context or user session, around, so that each concurrent user can have their own separate transaction.
For example:
async cancelOrder(trxInst, id: string) {
  try {
    await trxInst('orders').update({ status: 'CANCELED' }).where({ id });
    await trxInst('active_orders').delete().where({ orderId: id });
    await trxInst.commit();
  } catch (e) {
    await trxInst.rollback();
    throw new Error(`Failed to cancel order, error: ${e}`);
  }
}
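And a sketch of a caller for that version, where each request creates its own transaction and hands it in explicitly (orderId stands for whatever your request provides):

// Each request / user session gets its own transaction...
const trxInst = await knex.transaction();
// ...which is passed explicitly to every function participating in it.
await cancelOrder(trxInst, orderId); // commits or rolls back inside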
Depending on the function calling getOrderById, it will either pass a trx object or not. The above function will use trx if it is not null.
This seems simple at first, but it leads to mistakes: if you're in the middle of a transaction in one function and call another function that does NOT use that transaction, Knex will hang with the famous Knex: Timeout acquiring a connection. The pool is probably full.
We usually do it in a way that if trx is null, the query throws an error, so you need to explicitly pass either knex or a trx to be able to execute the method; in some methods trx is actually required to be passed.
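A sketch of that convention, with a hypothetical getOrderById that refuses to run without an explicit connection:

async getOrderById(id: string, trx?: Knex.Transaction) {
  if (!trx) {
    throw new Error('getOrderById requires an explicit trx (or knex) instance');
  }
  return trx<Order>('orders').where({ id }).first();
}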
Anyhow, if you really want to force everything in a session to go through a single transaction by default, you could create API modules in such a way that for each user session you create an API instance which is initialized with a transaction provider:
const dbForSession = new DbService(trxProvider);
const users = await dbForSession.allUsers();
and .allUsers() does something like return this.trx('users');
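A minimal sketch of what such a session-scoped module could look like (DbService here is hypothetical, only illustrating the idea):

class DbService {
  constructor(trxProvider) {
    this.trxProvider = trxProvider;
  }

  // Every method resolves the provider, so all queries in this
  // session share the same lazily started transaction.
  async allUsers() {
    const trx = await this.trxProvider();
    return trx('users');
  }
}

// One provider, and therefore one transaction, per user session:
const dbForSession = new DbService(knex.transactionProvider());
const users = await dbForSession.allUsers();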

Converting Promise.all to gradual promise resolution (every 3 promises, for example) does not work

I have a list of promises, and currently I am using Promise.all to resolve them.
Here is my code for now:
const pageFutures = myQuery.pages.map(async (pageNumber: number) => {
  const urlObject: any = await this._service.getResultURL(searchRecord.details.id, authorization, pageNumber);
  if (!urlObject.url) {
    // throw error
  }
  const data = await rp.get({
    gzip: true,
    headers: {
      "Accept-Encoding": "gzip,deflate",
    },
    json: true,
    uri: `${urlObject.url}`,
  })
  const objects = data.objects.filter((object: any) => object.type === "observed-data" && object.created);
  return new Promise((resolve, reject) => {
    this._resultsDatastore.bulkInsert(
      databaseName,
      objects
    ).then(succ => {
      resolve(succ)
    }, err => {
      reject(err)
    })
  })
})
const all: any = await Promise.all(pageFutures).catch(e => {
  console.log(e)
})
So as you can see, I use Promise.all and it works:
const all: any = await Promise.all(pageFutures).catch(e => {
  console.log(e)
})
However, I noticed it affects the database performance-wise, so I decided to resolve only 3 of them at a time.
For that I was thinking of different approaches like cwait, an async pool, or writing my own iterator,
but I am confused about how to do that.
For example, when I use cwait:
let promiseQueue = new TaskQueue(Promise, 3);
const all = new Promise.map(pageFutures, promiseQueue.wrap(() => {}));
I do not know what to pass inside the wrap, so I pass () => {} for now; plus I get:
Property 'map' does not exist on type 'PromiseConstructor'.
So whatever way gets it working (my own iterator or any library) I am OK with, as long as I have a good understanding of it.
I would appreciate it if anyone could shed light on this and help me get out of this confusion.
First some remarks:
Indeed, in your current setup, the database may have to process several bulk inserts concurrently. But that concurrency is not caused by using Promise.all. Even if you had left Promise.all out of your code, it would still behave that way, because the promises have already been created, and so the database requests will be executed anyway.
Not related to your issue, but don't use the promise constructor antipattern: there is no need to create a promise with new Promise when you already have a promise in your hands: bulkInsert() returns a promise, so return that one.
As your concern is about the database load, I would limit the work initiated by the pageFutures promises to the non-database aspects: they don't have to wait for each other's resolution, so that code can stay as it was.
Let those promises resolve with what you currently store in objects: the data you want to have inserted. Then concatenate all those arrays into one big array, and feed that to a single database bulkInsert() call.
Here is how that could look:
const pageFutures = myQuery.pages.map(async (pageNumber: number) => {
  const urlObject: any = await this._service.getResultURL(searchRecord.details.id,
    authorization, pageNumber);
  if (!urlObject.url) {
    // throw error
  }
  const data = await rp.get({
    gzip: true,
    headers: { "Accept-Encoding": "gzip,deflate" },
    json: true,
    uri: `${urlObject.url}`,
  });
  // Return here, don't access the database yet...
  return data.objects.filter((object: any) => object.type === "observed-data"
    && object.created);
});
// Flatten the result, so all data chunks are concatenated in one long array
const all: any = (await Promise.all(pageFutures).catch(e => {
  console.log(e);
  return []; // in case of error, still return an array
})).flat();
// Don't create a new Promise with `new`, only to wrap an other promise.
// It is an antipattern. Use the promise returned by `bulkInsert`
return this._resultsDatastore.bulkInsert(databaseName, all);
This uses .flat(), which is rather new. In case you have no support for it, look at the alternatives provided on MDN.
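If .flat() is not available in your runtime, the same concatenation can be done with concat or reduce (chunks below is a hypothetical name for the array of per-page arrays resolved by Promise.all):

const all = [].concat(...chunks);
// or equivalently:
const all2 = chunks.reduce((acc, chunk) => acc.concat(chunk), []);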
First, you asked a question about a failing solution attempt. That is called an X/Y problem.
So in fact, as I understand your question, you want to delay some DB requests.
You don't want to delay the resolving of a promise created by a DB request... No! Don't try that! The promise will resolve when the DB returns a result. It's a bad idea to interfere with that process.
I banged my head for a while against the library you tried... but I could not do anything to solve your issue with it. So I came up with the idea of just looping over the data and setting some timeouts.
I made a runnable demo here: Delaying DB request in small batch
Here is the code. Notice that I simulated some data and a DB request. You will have to adapt it. You will also have to adjust the timeout delay; a full second is certainly too long.
// That part is to simulate some data you would like to save.
// Let's make it a random amount for fun.
let howMuch = Math.ceil(Math.random() * 20)

// A fake data array...
let someData = []
for (let i = 0; i < howMuch; i++) {
  someData.push("Data #" + i)
}

console.log("Some fake data")
console.log(someData)
console.log("")

// So we have some data that looks real. (lol)
// We want to save it in small groups.

// And that is to simulate your DB request.
let saveToDB = (data, dataIterator) => {
  console.log("Requesting DB...")
  return new Promise(function (resolve, reject) {
    resolve("Request #" + dataIterator + " complete.");
  })
}

// Ok, we have everything. Let's proceed!
let batchSize = 3 // The amount of requests to do at once.
let delay = 1000  // The delay between each batch.

// Loop through all the data you have.
for (let i = 0; i < someData.length; i++) {
  if (i % batchSize == 0) {
    console.log("Splitting in batch...")
    // Process a batch on one timeout.
    let timeout = setTimeout(() => {
      // An empty line to clarify the console.
      console.log("")
      // Grouping the requests by the "batchSize" or less if we're almost done.
      for (let j = 0; j < batchSize; j++) {
        // If there still is data to process.
        if (i + j < someData.length) {
          // Your real database request goes here.
          saveToDB(someData[i + j], i + j).then(result => {
            console.log(result)
            // Do something with the result.
            // ...
          })
        } // END if there is still data.
      } // END sending requests for that batch.
    }, delay * i) // Timeout delay.
  } // END splitting in batch.
} // END for each data.

Firestore trigger timeouts occasionally

I have a Cloud Firestore trigger that takes care of adjusting the balance of a user's wallet in my app.
exports.onCreateTransaction = functions.firestore
  .document('accounts/{accountId}/transactions/{transactionId}')
  .onCreate(async (snap, context) => {
    const { accountId, transactionId } = context.params;
    const transaction = snap.data();

    // See the implementation of alreadyTriggered in the next code block
    const alreadyTriggered = await firestoreHelpers.triggers.alreadyTriggered(context);
    if (alreadyTriggered) {
      return null;
    }

    if (transaction.status === 'confirmed') {
      const accountRef = firestore
        .collection('accounts')
        .doc(accountId);
      const account = (await accountRef.get()).data();
      const balance = transaction.type === 'deposit' ?
        account.balance + transaction.amount :
        account.balance - transaction.amount;
      await accountRef.update({ balance });
    }

    return snap.ref.update({ id: transactionId });
  });
As a trigger may actually be called more than once, I added this alreadyTriggered helper function:
const alreadyTriggered = (event) => {
  return firestore.runTransaction(async transaction => {
    const { eventId } = event;
    const metaEventRef = firestore.doc(`metaEvents/${eventId}`);
    const metaEvent = await transaction.get(metaEventRef);
    if (metaEvent.exists) {
      console.error(`Already triggered function for event: ${eventId}`);
      return true;
    } else {
      await transaction.set(metaEventRef, event);
      return false;
    }
  })
};
Most of the time everything works as expected. However, today I got a timeout error which caused data inconsistency in the database.
Function execution took 60005 ms, finished with status: 'timeout'
What was the reason behind this timeout? And how do I make sure that it never happens again, so that my transaction amounts are successfully reflected in the account balance?
That statement about more-than-once execution was a beta limitation, as stated. Cloud Functions is out of beta now. The current guarantee is at-least-once execution by default; you only get multiple possible events if you enable retries in the Cloud console. This is something you should do if you want to make sure your events are processed reliably.
The reason for the timeout may never be certain. There could be any number of reasons. Perhaps there was a hiccup in the network, or a brief amount of downtime somewhere in the system. Retries are supposed to help you recover from these temporary situations by delivering the event potentially many times, so your function can succeed.
