Preventing database-related race conditions in Node.js - node.js

Overview
I am trying to understand how to ensure asynchronous safety when working with an instance of a model in Node.js. I use the Mongoose ODM in the code samples, but the question applies to any case where a database is used with the asynchronous, event-driven I/O approach that Node.js employs.
Consider the following code (which uses Mongoose for MongoDB queries):
Snippet A
MyModel.findOne( { _id : <id #1> }, function( err, doc ) {
  MyOtherModel.findOne( { _id : someOtherId }, function( err, otherDoc ) {
    if (doc.field1 === otherDoc.otherField) {
      doc.field2 = 0; // assign some new value to a field on the model
    }
    doc.save( function() { console.log( 'success' ); } );
  });
});
In a separate part of the application, the document described by MyModel could be updated. Consider the following code:
Snippet B
MyModel.update( { _id : <id #1> }, { $set : { field1 : someValue } }, callback );
In Snippet A, a MongoDB query is issued with a registered callback to be fired once the document is ready. An instance of the document described by MyModel is retained in memory (in the "doc" object). The following sequence could occur:
Snippet A executes
A query is initiated for MyModel, registering a callback (callback A) for later use
<< The Node event loop runs >>
MyModel is retrieved from the database, executing the registered callback (callback A)
A query is initiated for MyOtherModel, registering a callback for later use (callback B)
<< The Node event loop runs >>
Snippet B executes
The document (id #1) is updated
<< The Node event loop runs >>
MyOtherModel is retrieved from the database, executing the registered callback (callback B)
The stale version of the document (id #1) is incorrectly used in a comparison.
Questions
Are there any guarantees that this type of race condition won't happen in Node.js/MongoDB?
What can I do to deterministically prevent this scenario from happening?
While Node runs code in a single-threaded manner, it seems to me that every time the event loop is allowed to run, the door is opened to potentially stale data. Please correct me if this observation is wrong.

No, there are no guarantees that this type of race condition won't occur in node.js/MongoDB. It doesn't have anything to do with node.js though, and this is possible with any database that supports concurrent access, not just MongoDB.
The problem is, however, trickier to solve with MongoDB because it doesn't support transactions like your typical SQL database would. So you have to solve it in your application layer using a strategy like the one outlined in the MongoDB cookbook here.
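One common application-layer strategy of this kind is an "update if current" check: the write only succeeds if the document has not changed since it was read. The following is a minimal sketch of that idea, not the cookbook's code. It assumes a hypothetical numeric `version` field, and the `store`, `read` and `updateIfCurrent` names are illustrative stand-ins backed by an in-memory Map rather than Mongoose calls.

```javascript
// In-memory stand-in for a collection. With a real database the version
// check would be part of the filter of a single atomic update query.
const store = new Map([["doc1", { field1: "a", field2: 1, version: 0 }]]);

// Return a snapshot (copy) of the document, as a query result would be.
function read(id) {
  return { ...store.get(id) };
}

// Apply `changes` only if the stored version still matches the version
// that was read. Returns true on success, false if the document changed.
function updateIfCurrent(id, expectedVersion, changes) {
  const current = store.get(id);
  if (!current || current.version !== expectedVersion) return false;
  store.set(id, { ...current, ...changes, version: current.version + 1 });
  return true;
}

// Snippet A reads the document...
const snapshot = read("doc1");
// ...Snippet B updates it while Snippet A waits on its second query...
updateIfCurrent("doc1", 0, { field1: "b" });
// ...so Snippet A's write, which is based on stale data, is rejected.
const staleWriteApplied = updateIfCurrent("doc1", snapshot.version, { field2: 0 });
console.log(staleWriteApplied); // false

// Snippet A can now re-read, redo its comparison, and retry.
const fresh = read("doc1");
const retryApplied = updateIfCurrent("doc1", fresh.version, { field2: 0 });
console.log(retryApplied); // true
```

The stale write is detected rather than prevented, so the caller has to be prepared to re-read and retry.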

Related

Synchronize multiple requests to database in NestJS

In our NestJS application we are using TypeORM as the ORM to work with DB tables, together with the typeorm-transactional-cls-hooked library.
We now have a problem synchronizing requests that read and modify the database at the same time.
Sample:
@Transactional()
async doMagicAndIncreaseCount (id) {
  const { currentCount } = await this.fooRepository.findOne(id)
  // do some stuff where I receive new count which I need add to current, for instance 10
  const newCount = currentCount + 10
  await this.fooRepository.update(id, { currentCount: newCount })
}
When we execute this operation from the frontend multiple times at the same time, the final count is wrong. The first transaction reads currentCount and starts its computation. During that computation the second transaction starts and reads the same currentCount. The first transaction then finishes and saves the new currentCount, and finally the second transaction finishes and overwrites the result of the first.
Our goal is to execute this operation on the foo table only once at a time; other requests should wait until the running one has finished.
I tried setting the SERIALIZABLE isolation level like this:
@Transactional({ isolationLevel: IsolationLevel.SERIALIZABLE })
which ensures that only one request is executed at a time, but the other requests fail with an error. Can you please give me some advice on how to solve that?
I have never used TypeORM, and you have not said which DB engine you are using.
In any case, to achieve this you need write locks.
The doMagicAndIncreaseCount pseudocode should be something like
BEGIN TRANSACTION
ACQUIRE WRITE LOCK ON id
READ id RECORD
do computation
SAVE RECORD
CLOSE TRANSACTION
Alternatively, you have to use an operation that is natively atomic on the DB engine, e.g. the INCR operation on Redis.
Edit:
Reading the TypeORM find documentation, I can suggest something like:
// Must run inside a transaction; "version" is only used with optimistic locks
this.fooRepository.findOne({
  where: { id },
  lock: { mode: "pessimistic_write" },
})
P.S. Looking at the tags of the question I would guess the used DB engine is PostgreSQL.
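The lock, read, compute, save sequence from the pseudocode above can be imitated in-process to show why it fixes the lost update. This is only a sketch under stated assumptions: `table`, `withLock` and `increaseCount` are hypothetical names, the Map stands in for the foo table, and the promise-chain lock is a different technique from a database write lock. It only serializes calls inside a single Node.js process, so it is not a replacement for the database lock when several application instances are running.

```javascript
// Hypothetical in-memory "table"; stands in for the foo table.
const table = new Map([[1, { currentCount: 0 }]]);
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// One promise chain per id: a task starts only after the previous task
// for the same id has settled.
const tails = new Map();
function withLock(id, task) {
  const previous = tails.get(id) || Promise.resolve();
  const run = previous.then(() => task());
  tails.set(id, run.catch(() => {})); // keep the chain usable after a failure
  return run;
}

// Read-modify-write with an asynchronous gap between read and write.
async function increaseCount(id, amount) {
  const { currentCount } = table.get(id); // read
  await sleep(5); // "do some stuff"
  table.set(id, { currentCount: currentCount + amount }); // write
}

async function demo() {
  // Unserialized: both calls read 0, so one increment is lost.
  await Promise.all([increaseCount(1, 10), increaseCount(1, 10)]);
  const unsafeResult = table.get(1).currentCount;

  table.set(1, { currentCount: 0 });
  // Serialized per id: the second call waits for the first one.
  await Promise.all([
    withLock(1, () => increaseCount(1, 10)),
    withLock(1, () => increaseCount(1, 10)),
  ]);
  const safeResult = table.get(1).currentCount;
  return { unsafeResult, safeResult };
}

const demoResult = demo();
demoResult.then((r) => console.log(r)); // { unsafeResult: 10, safeResult: 20 }
```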

what does the function incrementTransactionNumber() do in mongodb node driver?

I know the function's name seems to be self explanatory, however, after researching for quite a while I can't find a transaction number anywhere within a clientSession.
Is it an internal number ? is it possible to get it ?
Transaction numbers are used by MongoDB to keep track of operations (reads/writes) per transaction per session. Sessions can be started either explicitly by calling startSession() or implicitly whenever you create a MongoDB connection to the DB server.
How incrementTransactionNumber() works with sessions (explicit)
When you start a session by calling the client.startSession() method, it creates a new ClientSession. The ClientSession takes the already created server session pool as one of its constructor parameters. These server sessions have a property called txnNumber, which is initialized to 0. Whenever you start a transaction by calling startTransaction(), the client session object calls incrementTransactionNumber() internally to increment the txnNumber in the server session. All subsequent operations use the same txnNumber until you call commitTransaction() or abortTransaction(). The reason you can't find it anywhere within clientSession is that it is a property of serverSession, not clientSession.
ServerSession
class ServerSession {
  constructor() {
    this.id = { id: new Binary(uuidV4(), Binary.SUBTYPE_UUID) };
    this.lastUse = now();
    this.txnNumber = 0; // incremented by ClientSession.incrementTransactionNumber()
    this.isDirty = false;
  }
}
So whenever you try to send a command to database (read/write), this txnNumber would be sent along with it. (Assign transaction number to command)
This is to keep track of database operations that belong to a given transaction per session. (A transaction operation history that uniquely identify each transaction per session.)
How incrementTransactionNumber() works with sessions (implicit)
In this case it is called every time a new command is issued to the database, provided that the command does not belong to a transaction and is a write operation with retryWrites enabled. Each new write operation therefore gets a new transaction number as long as it does not belong to a transaction explicitly started with startTransaction(). In this case, too, a txnNumber is sent along with each command.
execute_operation.
const willRetryWrite =
topology.s.options.retryWrites === true &&
session &&
!inTransaction &&
supportsRetryableWrites(server) &&
operation.canRetryWrite;
if (
operation.hasAspect(Aspect.RETRYABLE) &&
((operation.hasAspect(Aspect.READ_OPERATION) && willRetryRead) ||
(operation.hasAspect(Aspect.WRITE_OPERATION) && willRetryWrite))
) {
if (operation.hasAspect(Aspect.WRITE_OPERATION) && willRetryWrite) {
operation.options.willRetryWrite = true;
session.incrementTransactionNumber();
}
operation.execute(server, callbackWithRetry);
return;
}
operation.execute(server, callback);
Also read this article. And yes, if you need it, you can get the transaction number for any session through the txnNumber property: clientSession.serverSession.txnNumber.
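The two behaviours described above can be summarised with a toy model. This is a sketch for illustration only: `ToyServerSession`, `ToyClientSession` and `sendWrite` are hypothetical names, not driver classes or methods, and the model assumes retryable writes are enabled for writes outside a transaction.

```javascript
// Toy model only: it mirrors the description above, not the driver's code.
class ToyServerSession {
  constructor() {
    this.txnNumber = 0;
  }
}

class ToyClientSession {
  constructor() {
    this.serverSession = new ToyServerSession();
    this.inTransaction = false;
  }
  incrementTransactionNumber() {
    this.serverSession.txnNumber++;
  }
  startTransaction() {
    // One increment per explicit transaction.
    this.incrementTransactionNumber();
    this.inTransaction = true;
  }
  commitTransaction() {
    this.inTransaction = false;
  }
  // Returns the txnNumber that would accompany a write command.
  sendWrite() {
    // Outside a transaction, every retryable write gets a new number.
    if (!this.inTransaction) this.incrementTransactionNumber();
    return this.serverSession.txnNumber;
  }
}

const session = new ToyClientSession();
session.startTransaction();
const insideTxn = [session.sendWrite(), session.sendWrite()];
session.commitTransaction();
const outsideTxn = [session.sendWrite(), session.sendWrite()];
console.log(insideTxn, outsideTxn); // [ 1, 1 ] [ 2, 3 ]
```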

Perform non-blocking eval reads in MongoDB

I figured out how to run javascript code on the MongoDB server, from a node.js client:
db.eval("function(x){ return x*10; }", 1, function (err, retval) {
console.log('err: '+err);
console.log('retval: '+retval);
});
And that works fine. But the docs say that db.eval() issues a write lock, so that nothing else can read or write to the database. I do not want that.
It also says that eval has no such limitation, but I do not know where to find it. From the way they're talking about it, it seems as if regular eval is only available in the mongo shell, and so not from the client side.
So: how can I run these stored procedures on the mongodb server without blocking everything?
You can pass an object with the field nolock set to true as an optional third parameter to eval:
db.eval('function (x) {return x*10; }', [1], {nolock:true}, function(err, retval) {
console.log('err: '+err);
console.log('retval: '+retval);
});
Note that this prevents eval from setting an obligatory write-lock, but it doesn't prevent any operations inside your function from creating write-locks on their own.
Source: the documentation.
Note that the term "stored procedure" is wrong in this case. A stored procedure refers to code which is stored on the database itself and not delivered by the application layer. MongoDB can also do this utilizing the special collection db.system.js, but doing this is discouraged: http://docs.mongodb.org/manual/applications/server-side-javascript/#storing-functions-server-side
By the way: MongoDB wasn't designed for stored procedures. It is usually recommended to implement any advanced logic on the application layer. The practice to implement even trivial operations as stored procedures, like it is sometimes done on SQL databases, is discouraged.
This is the way to store your functions on the server side, and you can call them as shown below:
db.system.js.save( { _id : "myAddFunction" , value : function (x,y)
{ return x +y;} } );
db.system.js.find()
{ "_id" : "myAddFunction", "value" : function (x,y){ return x + y; } }
db.eval( "myAddFunction( 1 ,2)" )
3

How to populate mongoose with a large data set

I'm attempting to load a store catalog into MongoDb (2.2.2) using Node.js (0.8.18) and Mongoose (3.5.4) -- all on Windows 7 64bit. The data set contains roughly 12,500 records. Each data record is a JSON string.
My latest attempt looks like this:
var fs = require('fs');
var odir = process.cwd() + '/file_data/output_data/';
var mongoose = require('mongoose');
var Catalog = require('./models').Catalog;
var conn = mongoose.connect('mongodb://127.0.0.1:27017/sc_store');
exports.main = function(callback){
var catalogArray = fs.readFileSync(odir + 'pc-out.json','utf8').split('\n');
var i = 0;
Catalog.remove({}, function(err){
while(i < catalogArray.length){
new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
if(err){
console.log(err);
} else {
i++;
}
});
if(i === catalogArray.length -1) return callback('database populated');
}
});
};
I have had a lot of problems trying to populate the database. Under previous scenarios (and this one), node pegs the processor and eventually runs out of memory. Note that in this scenario, I'm trying to allow Mongoose to save a record, and then iterate to the next record once the record saves.
But the iterator inside of the Mongoose save function never gets incremented. In addition, it never throws any errors. If I put the iterator (i) outside of the asynchronous call to Mongoose, it works, provided the number of records that I try to load is not too large (I have successfully loaded 2,000 this way).
So my questions are: Why isn't the iterator inside of the Mongoose save call ever incremented? And, more importantly, what is the best way to load a large data set into MongoDb using Mongoose?
Rob
i is your index to where you're pulling input data from in catalogArray, but you're also trying to use it to keep track of how many have been saved which isn't possible. Try tracking them separately like this:
var i = 0;
var saved = 0;
Catalog.remove({}, function(err){
while(i < catalogArray.length){
new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
saved++;
if(err){
console.log(err);
} else {
if(saved === catalogArray.length) {
return callback('database populated');
}
}
});
i++;
}
});
UPDATE
If you want to add tighter flow control to the process, you can use the async module's forEachLimit function to limit the number of outstanding save operations to whatever you specify. For example, to limit it to one outstanding save at a time:
var async = require('async');

Catalog.remove({}, function(err){
  // Process the records with at most one outstanding save at a time
  async.forEachLimit(catalogArray, 1, function (catalog, cb) {
    new Catalog(JSON.parse(catalog)).save(function (err, doc) {
      if (err) {
        console.log(err);
      }
      cb(err);
    });
  }, function (err) {
    callback('database populated');
  });
});
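The same kind of flow control can also be sketched with promises instead of the async module. The helper below is a hypothetical illustration, not part of Mongoose or async: `forEachLimit` here is a hand-written function, and `fakeSave` is a timer-based stand-in for saving a Catalog document so that the example runs without a database.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function forEachLimit(items, limit, worker) {
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so lanes never share an index
      await worker(items[index], index);
    }
  }
  const lanes = [];
  for (let n = 0; n < Math.min(limit, items.length); n++) lanes.push(lane());
  await Promise.all(lanes);
}

// Stand-in for saving a document: records the peak number of pending saves.
let inFlight = 0;
let maxInFlight = 0;
const saved = [];
function fakeSave(record) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise((resolve) =>
    setTimeout(() => {
      inFlight--;
      saved.push(record);
      resolve();
    }, 1)
  );
}

const lines = ['{"sku":1}', '{"sku":2}', '{"sku":3}', '{"sku":4}', '{"sku":5}'];
const done = forEachLimit(lines, 2, (line) => fakeSave(JSON.parse(line)));
done.then(() => console.log(saved.length, maxInFlight)); // 5 2
```

Because only `limit` saves are pending at any moment, memory use stays bounded no matter how many records the input file contains.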
Rob,
The short answer:
You created an infinite loop. You're thinking synchronously and with blocking, Javascript functions asynchronously and without blocking. What you are trying to do is like trying to directly turn the feeling of hunger into a sandwich. You can't. The closest thing is you use the feeling of hunger to motivate you to go to the kitchen and make it. Don't try to make Javascript block. It won't work. Now, learn async.forEachLimit. It will work for what you want to do here.
You should probably review asynchronous design patterns and understand what it means on a deeper level. Callbacks are not simply an alternative to return values. They are fundamentally different in how and when they are executed. Here is a good primer: http://cs.brown.edu/courses/csci1680/f12/handouts/async.pdf
The long answer:
There is an underlying problem here, and that is your lack of understanding of what non-blocking I/O and asynchronous mean. I'm not sure if you are breaking into Node development or this is just a one-off project, but if you do plan to continue using Node (or any asynchronous language), it is worth the time to understand the difference between synchronous and asynchronous design patterns and the motivations for them. That is why you have a logic error: putting the loop counter increment inside an asynchronous callback creates an infinite loop.
In non-computer-science terms, that means your increment to i will never occur. The reason is that JavaScript executes a single block of code to completion before any asynchronous callbacks are called. So in your code, your loop will run over and over, without i ever incrementing. And, in the background, you are storing the same document in Mongo over and over. Each iteration of the loop starts sending the document with index 0 to Mongo; the callback can't fire until your loop ends and all other code outside the loop runs to completion. So the callback queues up. But your loop runs again since i++ is never executed (remember, the callback is queued until your code finishes), inserting record 0 again and queueing another callback to execute AFTER your loop is complete. This goes on and on until your memory is filled with callbacks waiting to inform your infinite loop that document 0 has been inserted millions of times.
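The run-to-completion behaviour is easy to observe in isolation. In this small sketch a timer callback stands in for the Mongoose save callback (an assumption made so the example needs no database), and a bounded busy loop stands in for the while loop. The variable names are illustrative.

```javascript
let callbacksRun = 0;
let iterations = 0;

// Queue a callback, as an asynchronous save would when it completes.
setTimeout(() => {
  callbacksRun++;
}, 0);

// Busy loop for about 20 ms. The timer above becomes due during the loop,
// but its callback cannot run while this synchronous code is executing.
const start = Date.now();
while (Date.now() - start < 20) {
  iterations++;
}
const seenDuringLoop = callbacksRun; // no callback ran mid-loop

// Only after the synchronous code has finished does the callback run.
const afterLoop = new Promise((resolve) =>
  setTimeout(() => resolve(callbacksRun), 0)
);
afterLoop.then((count) => console.log(seenDuringLoop, count)); // 0 1
```

If the loop condition depended on `callbacksRun` changing, as the original loop depends on i, the loop would never exit.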
In general, there is no way to make JavaScript block without doing something really, really bad. For example, something tantamount to setting your kitchen on fire to fry some eggs for that sandwich I talked about in the "short answer".
My advice is to take advantage of libs like async. https://github.com/caolan/async JohnnyHK mentioned it here, and he was correct for doing so.

Redis: How to check if exists in while loop

I'm using Redis in my application and one thing is not clear to me. I save an object with a randomly generated string as its key. However, I would like to check whether that key exists. I am planning to use a while loop, but I am not sure how to structure it with Redis. If I wanted to check only once, I would do:
redisClient.get("xPQ", function(err,result){
if(result==null)
exists = false
});
But I would like use the while loop as;
while(exists == false)
However I cannot build the code structure in my head. Would the while be inside the function or outside the function?
In general, you shouldn't check for existence of a key on the client side. It leads to race conditions. For example, another thread could insert the key after the first thread checked for its presence.
You should use the commands ending with NX, for example SETNX and HSETNX. These insert the key only if it doesn't already exist, and the operation is guaranteed to be atomic.
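Applied to the question, the existence check and the write collapse into one atomic step that is repeated with a new key until it succeeds. The sketch below makes several assumptions: `fakeRedis` is an in-memory stand-in that only mimics the reply of SETNX (1 if the key was set, 0 if it already existed), and `saveWithUniqueKey` and `makeKey` are hypothetical names. A real client would send the command to the server instead.

```javascript
// In-memory stand-in that mimics the SETNX reply: 1 if set, 0 if present.
const fakeRedis = {
  data: new Map(),
  async setnx(key, value) {
    if (this.data.has(key)) return 0;
    this.data.set(key, value);
    return 1;
  },
};

// Keep generating keys until one is claimed. There is no separate
// "does it exist?" step, so no other client can slip in between.
async function saveWithUniqueKey(client, value, makeKey) {
  for (;;) {
    const key = makeKey();
    if ((await client.setnx(key, value)) === 1) return key;
  }
}

// Force a collision on the first attempt to show the retry.
fakeRedis.data.set("xPQ", "existing");
const candidates = ["xPQ", "aB3"];
const chosen = saveWithUniqueKey(fakeRedis, "new object", () => candidates.shift());
chosen.then((key) => console.log(key)); // aB3
```

In real use `makeKey` would be whatever random string generator the application already has.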
I do not understand why you need to implement active polling to check whether a key exists (there are much better ways to handle this kind of situations), but I will try to answer the question.
You should not use a while loop at all (inside or outside the function). Because of the asynchronous nature of node.js, these loops are better implemented using tail recursion. Here is an example:
var redis = require('redis')
var rc = redis.createClient(6379, 'localhost');
function wait_for_key( key, callback ) {
rc.get( key, function(err,result) {
if ( result == null ) {
console.log( "waiting ..." )
setTimeout( function() {
wait_for_key(key,callback);
}, 100 );
} else {
callback(key,result);
}
});
}
wait_for_key( "xPQ", function(key,value) {
console.log( key+" exists and its value is: "+value )
});
There are multiple ways to simplify these expressions using dedicated libraries (using continuation passing style, or fibers). For instance you may want to check the whilst and until functions of the async.js package.
https://github.com/caolan/async
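With promises, the same polling can be written as an ordinary loop, because await yields to the event loop between attempts. This is a sketch under assumptions: `waitForKey` and its options are hypothetical, and `fakeGet` stands in for a promise-returning GET so that the example runs without a Redis server. Unlike the callback version above, it gives up after a maximum number of attempts.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `getValue(key)` until it yields a non-null value or attempts run out.
async function waitForKey(getValue, key, { intervalMs = 100, maxAttempts = 50 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = await getValue(key);
    if (value != null) return value;
    await sleep(intervalMs); // other callbacks can run while we wait
  }
  throw new Error("Timed out waiting for key " + key);
}

// Stand-in for a promise-returning GET: the key "appears" on the third call.
let calls = 0;
async function fakeGet(key) {
  calls++;
  return calls >= 3 ? "value-of-" + key : null;
}

const found = waitForKey(fakeGet, "xPQ", { intervalMs: 5 });
found.then((value) => console.log(value, calls)); // value-of-xPQ 3
```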

Resources