Synchronize multiple requests to database in NestJS - node.js

In our NestJS application we are using TypeORM as the ORM to work with DB tables, together with the typeorm-transactional-cls-hooked library.
We now have a problem synchronizing requests that read and modify the database at the same time.
Sample:
@Transactional()
async doMagicAndIncreaseCount(id) {
  const { currentCount } = await this.fooRepository.findOne(id)
  // do some stuff where I receive a new count which I need to add to the current one, for instance 10
  const newCount = currentCount + 10
  await this.fooRepository.update(id, { currentCount: newCount })
}
When we execute this operation from the frontend multiple times at the same time, the final count is wrong. The first transaction reads currentCount and starts its computation; during that computation the second transaction starts and reads currentCount as well; the first transaction finishes and saves its new currentCount, and then the second transaction finishes and overwrites the result of the first one.
Our goal is to execute this operation on the foo table only once at a time; other requests should wait until it finishes.
I tried setting the SERIALIZABLE isolation level like this:
@Transactional({ isolationLevel: IsolationLevel.SERIALIZABLE })
which ensures that only one request is executed at a time, but the other requests fail with an error. Can you please give me some advice on how to solve this?

I have never used TypeORM and, moreover, you don't say which DB engine you are using.
Anyway, to achieve this you need write locks.
The doMagicAndIncreaseCount pseudocode should be something like
BEGIN TRANSACTION
ACQUIRE WRITE LOCK ON id
READ id RECORD
do computation
SAVE RECORD
CLOSE TRANSACTION
Alternatively you have to use some operation which is natively atomic on the DB engine; ex. the INCR operation on Redis.
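If the new value really is just an increment of the current one, a single atomic UPDATE avoids the read-modify-write cycle entirely; a short sketch using TypeORM's increment helper (assuming currentCount is a plain numeric column on the foo entity):
// Runs "UPDATE ... SET currentCount = currentCount + 10 WHERE id = ..." as one statement,
// so concurrent requests cannot overwrite each other's result.
await this.fooRepository.increment({ id }, "currentCount", 10)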
Edit:
Reading the TypeORM find options documentation, I can suggest something like:
this.fooRepository.findOne({
  where: { id },
  lock: { mode: "pessimistic_write" },
})
P.S. Looking at the tags of the question I would guess the used DB engine is PostgreSQL.
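Putting the lock together with the question's code, the method could look roughly like this (a sketch only; it assumes the typeorm-transactional-cls-hooked decorator from the question, so the pessimistic lock is taken inside a transaction, and TypeORM 0.3-style findOne options):
@Transactional()
async doMagicAndIncreaseCount(id) {
  // The row stays locked until the transaction ends; concurrent calls wait here.
  const { currentCount } = await this.fooRepository.findOne({
    where: { id },
    lock: { mode: "pessimistic_write" },
  })
  const newCount = currentCount + 10 // do the computation while holding the lock
  await this.fooRepository.update(id, { currentCount: newCount })
}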

Related

Nodejs race condition for database insert and update

I am using Node.js with a SQL Server database. I am using the node-mssql package for writing queries from Node.js.
I have a route which has an if condition and query structure as below:
let checkPartExists = await pool.query(`Select * from Parts WHERE partID = ${partID}`);
if (checkPartExists.recordset.length == 0){
await pool.query(`INSERT INTO Parts(PartID, Quantity) VALUES(${partID}, ${quantity})`)
}
else{
await pool.query(`UPDATE Parts SET Quantity = ${quantity} WHERE PartID = ${partID}`)
}
Now, if single-threaded Node.js didn't have an event loop, I could safely assume that this would always work. But I know that is not the case. I just had an instance where the same partID was inserted twice.
My understanding is that:
User 1 makes a post request to that route
It executes the select query, finds that this partID does not exist in the Parts table and reaches the insert portion
However, before it finishes the insert, User 2 (or maybe the same user) makes a post request, and the select query is executed, which also concludes that a part with that partID does not exist
This will insert the same partID twice. Is this called a race condition? How do I prevent such a situation?
I know I can make PartID a unique key in the database and throw an error when this happens, but I feel like there has to be a way of handling this through code as well.
Please let me know how you guys/girls are handling such situations.
This is a job for a Transaction. Something like this.
const transaction = new sql.Transaction()
await transaction.begin()
const request = new sql.Request(transaction) // bind every query to the transaction
let checkPartExists = await request.query(`Select * from Parts WHERE partID = ${partID}`);
if (checkPartExists.recordset.length == 0) {
  await request.query(`INSERT INTO Parts(PartID, Quantity) VALUES(${partID}, ${quantity})`)
} else {
  await request.query(`UPDATE Parts SET Quantity = ${quantity} WHERE PartID = ${partID}`)
}
await transaction.commit()
This serializes access to the table. Transactions are the standard way of avoiding race conditions.
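Whether two of these transactions can still interleave depends on the isolation level; node-mssql lets you pass one to begin(). A small sketch (choosing SERIALIZABLE here is an assumption on my part, not something required above, and it can make concurrent requests fail with deadlock errors that need a retry):
// inside an async handler, with `pool` already connected and `sql = require('mssql')`
const transaction = new sql.Transaction(pool)
await transaction.begin(sql.ISOLATION_LEVEL.SERIALIZABLE)
try {
  const request = new sql.Request(transaction)
  // ... same SELECT / INSERT-or-UPDATE queries as above ...
  await transaction.commit()
} catch (err) {
  await transaction.rollback() // release locks if anything fails
  throw err
}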
If only one instance (cluster worker) of the Node.js application is running, then async-mutex can be the solution. If you are running multiple instances, then distributed locking could be a solution.
A mutex is a design pattern that ensures a resource cannot be used by more than one caller at a time.
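A minimal sketch of the single-instance case with async-mutex (upsertPart is a hypothetical wrapper around the SELECT-then-INSERT/UPDATE logic from the question):
const { Mutex } = require('async-mutex')

const partsMutex = new Mutex()

async function upsertPart(partID, quantity) {
  // runExclusive queues callers: only one executes the critical section at a time
  return partsMutex.runExclusive(async () => {
    // SELECT, then INSERT or UPDATE, exactly as in the question's route
  })
}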

Non blocking code when dealing with Nodejs using database

A lot of people, when talking about Node.js, mention the fact that we can write non-blocking code. I have made a website which makes a lot of database queries, but there is very rarely an instance where I do not want to await a query before moving forward. For example:
let studentID = await pool.query(`SELECT * FROM Students WHERE StudentNumber = 'asdas'`)
await pool.query(`INSERT INTO StudentHistry(StudentID, Time) VALUES(${studentID}, NOW())`)
await pool.query(`UPDATE Table3 SET col2 = 'something' WHERE StudentID = ${studentID}`)
So in most situations I have to wait for the result of the previous query to proceed. Is there a way to take advantage of the non-blocking nature of Node.js?
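(For comparison, a case where I would not need to await sequentially is when queries are independent of each other and can be started together with Promise.all; the Courses table here is just made up for illustration:)
// Independent queries can run concurrently and be awaited as a group
const [students, courses] = await Promise.all([
  pool.query(`SELECT * FROM Students`),
  pool.query(`SELECT * FROM Courses`), // Courses is a hypothetical table
])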
Also, what about situations where I have to wait for a single query, but it takes 2-3 seconds to complete? For example, I have an endpoint like this:
router.post("/getInfo", security.isLoggedIn, async (req, res, next) => {
  try {
    let pool = await connection;
    let info = await pool.query(`Select * FROM LargeQuery`) // Takes time to complete
    info = info.recordset
    res.send(info)
  } catch (err) {
    console.trace(err.lineNumber)
    console.log(err)
    next(err)
  }
})
If I do not await the query, execution is going to jump to res.send(info) and send undefined. But at the same time I do not want the entire Node.js application to hang on this line.

Querying DB2 every 15 seconds causing memory leak in NodeJS

I have an application which checks for new entries in DB2 every 15 seconds on the iSeries using IBM's idb-connector. I have async functions which return the result of the query to socket.io which emits an event with the data included to the front end. I've narrowed down the memory leak to the async functions. I've read multiple articles on common memory leak causes and how to diagnose them.
MDN: memory management
Rising Stack: garbage collection explained
Marmelab: Finding And Fixing Node.js Memory Leaks: A Practical Guide
But I'm still not seeing where the problem is. Also, I'm unable to get permission to install node-gyp on the system, which means most memory-management tools are off limits, as memwatch, heapdump and the like need node-gyp to install. Here's an example of the functions' basic structure.
const { dbconn, dbstmt } = require('idb-connector'); // require idb-connector

async function queryDB() {
  const sSql = `SELECT * FROM LIBNAME.TABLE LIMIT 500`;
  // create new promise
  let promise = new Promise(function (resolve, reject) {
    // create new connection
    const connection = new dbconn();
    connection.conn("*LOCAL");
    const statement = new dbstmt(connection);
    statement.exec(sSql, (rows, err) => {
      if (err) {
        throw err;
      }
      let ticks = rows;
      statement.close();
      connection.disconn();
      connection.close();
      resolve(ticks.length); // resolve promise with varying data
    })
  });
  let result = await promise; // await promise
  return result;
};
async function getNewData() {
  const data = await queryDB(); // get new data
  io.emit('newData', data) // push to front end
  setTimeout(getNewData, 2000); // check again in 2 seconds
};
Any ideas on where the leak is? Am I using async/await incorrectly? Or am I creating/destroying DB connections improperly? Any help figuring out why this code is leaky would be much appreciated!!
Edit: Forgot to mention that I have limited control over the backend processes as they are handled by another team. I'm only retrieving the data they populate the DB with and adding it to a web page.
Edit 2: I think I've narrowed it down to the DB connections not being cleaned up properly. But, as far as I can tell, I've followed the instructions suggested in their GitHub repo.
I don't know the answer to your specific question, but instead of issuing a query every 15 seconds, I might go about this in a different way, the reason being that I don't generally like fishing expeditions when the environment can tell me an event occurred.
So in that vein, you might want to try a database trigger that loads the key of the row into a data queue on add (or even on change or delete if necessary). Then you can just make an async call that waits for a record to arrive on the data queue. This is more real-time, and the event handler is only called when a record shows up. The handler can fetch the specific record from the database since it knows its key. Data queues are much faster than database I/O and place little overhead on the trigger.
I see a few potential advantages with this method:
You aren't issuing dozens of queries that may or may not return data.
The event would fire the instant a record is added to the table, rather than up to 15 seconds later.
You don't have to code for the possibility of one or more new records; it will always be exactly one, the one referenced in the data queue.
Yes, you have to close the connection.
Don't make const data; you don't need the promise, since statement.exec is async by default and handles it via return result;.
Keep the setTimeout(getNewData, 2000); // check again in 2 seconds
line outside getNewData, otherwise it becomes a recursive infinite loop.
Sample code
const { dbconn, dbstmt } = require('idb-connector');

const sql = 'SELECT * FROM QIWS.QCUSTCDT';
const connection = new dbconn(); // Create a connection object.
connection.conn('*LOCAL'); // Connect to a database.
const statement = new dbstmt(connection); // Create a statement object of the connection.

statement.exec(sql, (result, error) => {
  if (error) {
    throw error;
  }
  console.log(`Result Set: ${JSON.stringify(result)}`);
  statement.close(); // Clean up the statement object.
  connection.disconn(); // Disconnect from the database.
  connection.close(); // Clean up the connection object.
  return result;
});
Change this:
async function getNewData() {
  const data = await queryDB(); // get new data
  io.emit('newData', data) // push to front end
  setTimeout(getNewData, 2000); // check again in 2 seconds
};
to this:
async function getNewData() {
  const data = await queryDB(); // get new data
  io.emit('newData', data) // push to front end
};
setTimeout(getNewData, 2000); // check again in 2 seconds
The first thing to notice is a possibly open database connection in case of an error:
if (err) {
  throw err;
}
Also, on success, connection.disconn(); and connection.close(); return boolean values that tell you whether the operation succeeded (according to the documentation).
Another always-possible scenario is piling up connection objects in the 3rd-party library.
I would check those.
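As a rough sketch of what a safer error path could look like in the question's queryDB (reusing the same idb-connector calls, but rejecting the promise instead of throwing, so the statement and connection are always released):
statement.exec(sSql, (rows, err) => {
  // Release DB2 resources on both the success and the error path
  statement.close();
  connection.disconn();
  connection.close();
  if (err) {
    reject(err); // let the awaiting caller handle the failure
    return;
  }
  resolve(rows.length);
})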
This was confirmed to be a memory leak in the idb-connector library that I was using. Link to the GitHub issue Here. Basically, there was a C++ array that never had its memory deallocated. A new version was released and the commit can be viewed Here.

Best way to query all documents from a mongodb collection in a reactive way w/out flooding RAM

I want to query all the documents in a collection in a reactive way. The collection.find() method of the MongoDB Node.js driver returns a cursor that fires events for each document found in the collection. So I made this:
const giant_query = (db) => {
  var req = db.collection('mycollection').find({});
  return Rx.Observable.merge(Rx.Observable.fromEvent(req, 'data'),
    Rx.Observable.fromEvent(req, 'end'),
    Rx.Observable.fromEvent(req, 'close'),
    Rx.Observable.fromEvent(req, 'readable'));
}
It will do what I want: fire for each document, so I can treat them in a reactive way, like this:
Rx.Observable.of('').flatMap(giant_query).do(some_function).subscribe()
I could query the documents in packets of tens, but then I'd have to keep track of an index number for each time the observable stream is fired, and I'd have to make an observable loop, which I don't know is possible or the right way to do it.
The problem with this cursor is that I don't think it does things in packets. It'll probably fire all the events in a short period of time, therefore flooding my RAM. Even if I buffer some events in packets using Observable's buffer, the events and the event data (the documents) are going to be waiting in RAM to be manipulated.
What's the best way to deal with this in a reactive way?
I'm not an expert on mongodb, but based on the examples I've seen, this is a pattern I would try.
I've omitted the events other than data, since throttling that one seems to be the main concern.
var cursor = db.collection('mycollection').find({});

const cursorNext = new Rx.BehaviorSubject('next'); // signal first batch then wait

const nextBatch = () => {
  if (cursor.hasNext()) {
    cursorNext.next('next');
  }
};

cursorNext
  .switchMap(() => // wait for cursorNext to signal
    Rx.Observable.fromPromise(cursor.next()) // get a single doc
      .repeat() // get another
      .takeWhile(() => cursor.hasNext()) // stop taking if out of data
      .take(batchSize) // until full batch
      .toArray() // combine into a single emit
  )
  .map(docsBatch => {
    // do something with the batch
    // return docsBatch or modified docsBatch
  })
  ... // other operators?
  .subscribe(x => {
    ...
    nextBatch();
  });
I'm trying to put together a test of this Rx flow without mongodb, in the meantime this might give you some ideas.
You might also want to check my solution, which doesn't use RxJS:
Mongoose Cursor: http bulk request from collection
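For reference (this is a sketch of the general idea, not the linked answer itself): newer versions of the MongoDB Node.js driver expose the cursor as an async iterable, so documents can be pulled lazily and grouped into batches without loading the whole collection; batchSize and processBatch below are assumptions:
// inside an async function
const cursor = db.collection('mycollection').find({});
const batchSize = 50; // assumed batch size

let batch = [];
for await (const doc of cursor) { // the driver fetches documents lazily
  batch.push(doc);
  if (batch.length === batchSize) {
    await processBatch(batch); // processBatch is a hypothetical handler
    batch = [];
  }
}
if (batch.length > 0) {
  await processBatch(batch); // flush the remainder
}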

Preventing database-related race conditions in Node.js

Overview
I am attempting to understand how to ensure asynchronous safety when using an instance of a model in Node.js. Here, I use the Mongoose ODM in the code samples, but the question applies to any case where a database is used with the asynchronous, event-driven I/O approach that Node.js employs.
Consider the following code (which uses Mongoose for MongoDB queries):
Snippet A
MyModel.findOne( { _id : <id #1> }, function( err, doc ) {
  MyOtherModel.findOne( { _id : someOtherId }, function( err, otherDoc ) {
    if (doc.field1 === otherDoc.otherField) {
      doc.field2 = 0; // assign some new value to a field on the model
    }
    doc.save( function() { console.log( 'success' ); } );
  });
});
In a separate part of the application, the document described by MyModel could be updated. Consider the following code:
Snippet B
MyModel.update( { _id : <id #1> }, { $set : { field1 : someValue } }, callback );
In Snippet A, a MongoDB query is issued with a registered callback to be fired once the document is ready. An instance of the document described by MyModel is retained in memory (in the "doc" object). The following sequence could occur:
Snippet A executes
A query is initiated for MyModel, registering a callback (callback A) for later use
<< The Node event loop runs >>
MyModel is retrieved from the database, executing the registered callback (callback A)
A query is initiated for MyOtherModel, registering a callback for later use (callback B)
<< The Node event loop runs >>
Snippet B executes
The document (id #1) is updated
<< The Node event loop runs >>
MyOtherModel is retrieved from the database, executing the registered callback (callback B)
The stale version of the document (id #1) is incorrectly used in a comparison.
Questions
Are there any guarantees that this type of race condition won't happen in Node.js/MongoDB?
What can I do to deterministically prevent this scenario from happening?
While Node runs code in a single-threaded manner, it seems to me that any time the event loop is allowed to run, the door is open to potentially stale data. Please correct me if this observation is wrong.
No, there are no guarantees that this type of race condition won't occur in node.js/MongoDB. It doesn't have anything to do with node.js though, and this is possible with any database that supports concurrent access, not just MongoDB.
The problem is, however, trickier to solve with MongoDB because it doesn't support transactions like your typical SQL database would. So you have to solve it in your application layer using a strategy like the one outlined in the MongoDB cookbook here.
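One common application-layer strategy is a compare-and-set style update ("update if current"): include the value you previously read in the update's criteria, so the write only succeeds if the document hasn't changed in between. A minimal Mongoose sketch reusing the field names from Snippet A (someId and expectedField1 are placeholders for values read earlier):
// Only touch field2 if field1 still has the value the decision was based on.
MyModel.update(
  { _id : someId, field1 : expectedField1 },
  { $set : { field2 : 0 } },
  function( err, raw ) {
    // raw.n (matched count) is 0 if the document changed underneath us;
    // in that case re-read and retry, or report a conflict.
    // (The exact shape of raw varies between Mongoose versions.)
  }
);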
