Node.js + SQLite async transactions - node.js

I am using node-sqlite3, but I am sure this problem appears in other database libraries too. I have discovered a bug in my code caused by mixing transactions and async code.
function insertData(arrayWithData, callback) {
    // start a transaction
    db.run("BEGIN", function() {
        // do multiple inserts
        slide.asyncMap(
            arrayWithData,
            function(cb) {
                db.run("INSERT ...", cb);
            },
            function() {
                // all done
                db.run("COMMIT");
            }
        );
    });
}
// some other insert
setInterval(
    function() { db.run("INSERT ...", cb); },
    100
);
You can also run the full example.
The problem is that some other code with an insert or update query can be launched during the async pause after BEGIN or an INSERT. This extra query then runs inside the transaction. That is not a problem when the transaction is committed. But if the transaction is rolled back, the change made by the extra query is rolled back as well. Oops, we've just unpredictably lost data without any error message.
I thought about this issue and I think one solution is to create a wrapper class that will make sure that:
Only one transaction is running at the same time.
While a transaction is running, only queries that belong to the transaction are executed.
All the extra queries are queued and executed after the current transaction is finished.
All attempts to start a transaction when one is already running will also get queued.
But that sounds like too complicated a solution. Is there a better approach? How do you deal with this problem?
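For what it's worth, the queueing wrapper described in the points above can be sketched in a few lines. Everything here is hypothetical: createTransactionQueue is not node-sqlite3 API, and the db object is assumed only to expose run(sql, cb).

```javascript
// Minimal sketch of the queueing wrapper described above.
// The db object is a stand-in that only exposes run(sql, cb).
function createTransactionQueue(db) {
    var inTransaction = false;
    var pending = [];

    function flush() {
        var jobs = pending;
        pending = [];
        jobs.forEach(function (job) { job(); });
    }

    return {
        // Run a single query; defer it while a transaction is active.
        run: function (sql, cb) {
            if (inTransaction) {
                var self = this;
                pending.push(function () { self.run(sql, cb); });
                return;
            }
            db.run(sql, cb);
        },
        // Run fn(db, done) exclusively; queued queries (and queued
        // transactions) execute only after done() is called.
        transaction: function (fn, cb) {
            if (inTransaction) {
                var self = this;
                pending.push(function () { self.transaction(fn, cb); });
                return;
            }
            inTransaction = true;
            fn(db, function (err) {
                inTransaction = false;
                flush();
                cb(err);
            });
        }
    };
}
```

A query arriving while the transaction is open is held back and replayed after the transaction finishes, which is exactly the isolation the question asks for.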

First, I would like to state that I have no experience with SQLite. My answer is based on a quick study of node-sqlite3.
The biggest problem with your code, IMHO, is that you try to write to the DB from different locations. As I understand SQLite, you have no control over different parallel "connections" as you have in PostgreSQL, so you probably need to wrap all your communication with the DB. I modified your example to always use the insertData wrapper. Here is the modified function:
function insertData(callback, cmds) {
    // start a transaction
    db.serialize(function() {
        db.run("BEGIN;");
        //console.log('insertData -> begin');
        // do multiple inserts
        cmds.forEach(function(item) {
            db.run("INSERT INTO data (t) VALUES (?)", item, function(e) {
                if (e) {
                    console.log('error');
                    // rollback here
                } else {
                    //console.log(item);
                }
            });
        });
        // all done
        // here should be commit
        //console.log('insertData -> commit');
        db.run("ROLLBACK;", function(e) {
            return callback();
        });
    });
}
The function is called with this code:
init(function() {
    // insert with transaction
    function doTransactionInsert(e) {
        if (e) return console.log(e);
        setTimeout(insertData, 10, doTransactionInsert,
            ['all', 'your', 'base', 'are', 'belong', 'to', 'us']);
    }
    doTransactionInsert();

    // Insert increasing integers 0, 1, 2, ...
    var i = 0;
    function doIntegerInsert() {
        //console.log('integer insert');
        insertData(function(e) {
            if (e) return console.log(e);
            setTimeout(doIntegerInsert, 9);
        }, [i++]);
    }
    ...
I made the following changes:
added a cmds parameter; for simplicity I added it as the last parameter, although the callback should come last (cmds is an array of inserted values; in a final implementation it should be an array of SQL commands)
changed db.exec to db.run (should be quicker)
added db.serialize to serialize the requests inside the transaction
omitted the callback for the BEGIN command
left out slide and some underscore usage
Your test implementation now works fine for me.

I ended up writing a full wrapper around sqlite3 that implements locking the database during a transaction. While the DB is locked, all queries are queued and executed after the current transaction is over.
https://github.com/Strix-CZ/sqlite3-transactions

IMHO there are some problems with ivoszz's answer:
Since every db.run is async, you cannot check the result of the whole transaction, and if one run fails you should roll back all commands. To do this you would have to call db.run("ROLLBACK") in the callback inside the forEach loop. But db.serialize does not serialize statements issued from async callbacks, so a "cannot start a transaction within a transaction" error occurs.
The COMMIT/ROLLBACK after the forEach loop has to check the result of all statements, and you cannot run it before all the previous runs have finished.
IMHO there is only one way to make transaction management safe (with respect to the background thread pool): create a wrapper function and use the async library to serialize all statements manually. This way you can avoid db.serialize and, more importantly, you can check each individual db.run result in order to roll back the whole transaction (and return a promise if needed).
The main problem of the node-sqlite3 library with regard to transactions is that the serialize function has no callback in which you could check whether an error occurred.
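A rough sketch of that manual-serialization approach: each statement runs only after the previous one finished, and the first error triggers a ROLLBACK. To keep the sketch self-contained, a tiny hand-rolled runSeries stands in for the async library; db.run is assumed to follow node-sqlite3's (sql, params, callback) signature, and the table/column names are just examples.

```javascript
// runSeries: run callback-style tasks one at a time, stop on first error.
// (A stand-in for async.series so this sketch has no dependencies.)
function runSeries(tasks, done) {
    var i = 0;
    (function next(err) {
        if (err || i === tasks.length) return done(err);
        tasks[i++](next);
    })();
}

// Insert all rows inside one transaction; roll back if any insert fails.
function insertAll(db, rows, callback) {
    var tasks = rows.map(function (row) {
        return function (cb) {
            db.run("INSERT INTO data (t) VALUES (?)", [row], cb);
        };
    });
    db.run("BEGIN", [], function () {
        runSeries(tasks, function (err) {
            if (err) {
                // one statement failed: undo the whole transaction
                return db.run("ROLLBACK", [], function () { callback(err); });
            }
            db.run("COMMIT", [], callback);
        });
    });
}
```

Because each db.run result is checked before the next one starts, the COMMIT only happens once every statement has succeeded.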

Related

How to make two db query synchronous, so that if any of them fails then both fails in node.js

async.parallel([
    function(callback) {
        con.Attandance.insert({'xxx':'a'}, function(err, data) {
            console.log(data);
            callback();
        });
    },
    function(callback) {
        console.log(data);
        con.Profile.insert({'xxx':'a'}, function(err) { callback(); });
    }
], function(err) {
    console.log('Both a and b are saved now');
});
Attendance.insert() runs regardless of whether Profile.insert() succeeds or fails. I want that if either of them fails, no data is saved in either collection, neither Attendance nor Profile.
What you mean are transactions, which have nothing to do with synchronous/asynchronous execution.
Unfortunately, MongoDB simply does not support transactions. The only way to achieve something even remotely close is to perform either a two-phase commit or implement custom rollback logic to undo all changes to Attandance if the changes to Profile fail.
The only way to at least achieve atomic updates (yet not transactions!) is to change your model. If the Profile is a container for all Attandance instances, you can update the entire object at once. It is impossible to update more than one object atomically with MongoDB, and neither is it possible to enforce a strict ordering of transactions.
If you need that, go for an SQL database instead. Pretty much all of them support transactions.
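The custom rollback logic mentioned above can be sketched as a compensating delete: insert into the first collection, and if the second insert fails, remove the first document again. The collection objects and method names below are illustrative stand-ins with node-style callbacks, not a specific driver's API, and note this is still not a real transaction (a crash between the two steps leaves partial data).

```javascript
// Compensating-rollback sketch for the two-collection case.
// attendance/profile are assumed to expose insert(doc, cb) and remove(doc, cb).
function insertBoth(attendance, profile, doc, callback) {
    attendance.insert(doc, function (err, saved) {
        if (err) return callback(err);
        profile.insert(doc, function (err2) {
            if (!err2) return callback(null);
            // compensate: undo the first insert, then report the failure
            attendance.remove(saved, function () {
                callback(err2);
            });
        });
    });
}
```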
I wrote a library that implements the two phase commit system (mentioned in a prior answer) described in the docs. It might help in this scenario. Fawn - Transactions for MongoDB.
var Fawn = require("fawn");

// initialize Fawn
Fawn.init("mongodb://127.0.0.1:27017/testDB");

/**
 * optionally, you could initialize Fawn with mongoose:
 *
 * var mongoose = require("mongoose");
 * mongoose.connect("mongodb://127.0.0.1:27017/testDB");
 * Fawn.init(mongoose);
 */

// after initialization, create a task
var task = Fawn.Task();

task.save("Attendance", {xxx: "a"})
    .save("Profile", {xxx: "a"})
    .run()
    .then(function(results) {
        // task is complete
        // result from first operation
        var firstUpdateResult = results[0];
        // result from second operation
        var secondUpdateResult = results[1];
    })
    .catch(function(err) {
        // Everything has been rolled back.
        // log the error which caused the failure
        console.log(err);
    });

How do I make a large but unknown number of REST http calls in nodejs?

I have an orientdb database. I want to use nodejs with RESTfull calls to create a large number of records. I need to get the #rid of each for some later processing.
My pseudo code is:
for each record
    write.to.db(record)
    when the async of write.to.db() finishes
        process based on #rid
carryon()
I have landed in serious callback hell from this. The version that was closest used a tail recursion in the .then function to write the next record to the db. However, I couldn't carry on with the rest of the processing.
A final constraint is that I am behind a corporate proxy and cannot use any other packages without going through the network administrator, so using the native nodejs packages is essential.
Any suggestions?
With a completion callback, the general design pattern for this type of problem makes use of a local function for doing each write:
var records = ....; // array of records to write
var index = 0;

function writeNext(r) {
    write.to.db(r, function(err) {
        if (err) {
            // error handling
        } else {
            ++index;
            if (index < records.length) {
                writeNext(records[index]);
            }
        }
    });
}

writeNext(records[0]);
The key here is that you can't use synchronous iterators like .forEach() because they won't iterate one at a time and wait for completion. Instead, you do your own iteration.
If your write function returns a promise, you can use the .reduce() pattern that is common for iterating an array.
var records = ...; // some array of records to write

records.reduce(function(p, r) {
    return p.then(function() {
        return write.to.db(r);
    });
}, Promise.resolve()).then(function() {
    // all done here
}, function(err) {
    // error here
});
This solution chains promises together, waiting for each one to resolve before executing the next save.
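Under the same assumption that the write function returns a promise, the sequencing reads even more directly with async/await; writeToDb below is a placeholder for any promise-returning write.

```javascript
// Sequential writes with async/await: each write is awaited before the
// next one starts, and results are collected in input order.
async function writeAll(records, writeToDb) {
    const results = [];
    for (const r of records) {
        results.push(await writeToDb(r)); // waits for each write in turn
    }
    return results;
}
```

This is behaviorally equivalent to the reduce chain above, just without building the chain by hand.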
It's kinda hard to tell which function would be best for your scenario w/o more detail, but I almost always use asyncjs for this kind of thing.
From what you say, one way to do it would be with async.map:
var recordsToCreate = [...];

function functionThatCallsTheApi(record, cb) {
    // do the api call, then call cb(null, rid)
}

async.map(recordsToCreate, functionThatCallsTheApi, function(err, results) {
    // here, err will be set if anything failed in any function
    // results will be an array of the rids
});
You can also check out the other helpers to enable throttling, which is probably a good idea.
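The throttling helper in question is async.mapLimit, which caps how many workers run at once. Purely to illustrate the mechanics, here is a rough self-contained approximation of that idea (the real library is far more robust; this sketch is not its actual implementation):

```javascript
// Sketch of the mapLimit idea: at most `limit` workers run concurrently,
// results keep the input order, and the first error short-circuits.
function mapLimit(items, limit, worker, done) {
    var results = new Array(items.length);
    var nextIndex = 0;
    var running = 0;
    var completed = 0;
    var failed = false;

    function launch() {
        while (running < limit && nextIndex < items.length) {
            (function (i) {
                running++;
                worker(items[i], function (err, res) {
                    if (failed) return; // a previous worker already errored
                    if (err) { failed = true; return done(err); }
                    results[i] = res;
                    running--;
                    completed++;
                    if (completed === items.length) return done(null, results);
                    launch(); // refill the worker pool
                });
            })(nextIndex++);
        }
    }
    launch();
}
```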

node.js sqlite transaction isolation

How do I ensure that SQLite doesn't interleave queries from multiple concurrent node.js/Express HTTP requests into a single transaction?
I would want DB queries from different requests to be executed in separate transactions, isolated from each other, allowing each to be committed or rolled back independently of each other. Given node.js's single-thread (and single-DB-connection?) characteristics this seems particularly problematic.
I've been trawling through dozens of web pages and docs and haven't found any clear (to me) explanation of how (or whether) this scenario is handled. For instance the description of db.serialize() only states that one query will be executed at a time - it says nothing about the separation of queries belonging to different transactions.
Any pointers appreciated!
I faced the exact same issue. Here is how I solved it:
Step 1: Ensure all database access is done via a set of library functions, where the database is passed in as the first input parameter.
Step 2: When you create the database object (using new sqlite3.Database('path')), wrap the object with some extra meta-information:
Here is how I wrapped mine:
var wrap = {
    db_obj : new sqlite3.Database('path'),
    locked : "UNLOCKED",
    work_queue : []
};
Step 3: Create a function that will be used to run db functions. If the database is locked, this function will defer their execution until later. I called mine "runAsync":
function runAsync(db, fn, callBack) {
    if (db.locked === "LOCKED") {
        db.work_queue.push(function() { runAsync(db, fn, callBack); });
        return;
    }
    fn(db, callBack);
}
We can see that this function checks the state of the wrapped db object. If it is locked, we defer execution by storing a closure in the work queue; when the queue is flushed later, the closure re-invokes runAsync, which then calls fn(db, callBack).
Step 4: Create a function that will be used to run db functions exclusively. If the database is locked, this function will defer execution until later, else it will lock the database, run its function and then unlock the database. I called mine: runAsyncExclusive():
function runAsyncExclusive(db, fn, callBack) {
    if (db.locked === "LOCKED") {
        db.work_queue.push(function() { runAsyncExclusive(db, fn, callBack); });
        return;
    }

    var exclusiveDb = {
        db_obj : db.db_obj,
        locked : "UNLOCKED",
        work_queue : []
    };

    db.locked = "LOCKED";
    fn(exclusiveDb, function(err, res) {
        db.locked = "UNLOCKED";
        var workItems = db.work_queue;
        _.each(workItems, function(fn) { fn(); });
        db.work_queue = [];
        callBack(err, res);
    });
}
The exclusiveDb object that is passed into the function will allow exclusive access to the db. This object can itself be locked, allowing for arbitrarily deep nested locking.
Step 5: Adjust your library functions so that they call runAsync() and runAsyncExclusive() where appropriate:
function get(db, sql, params, callBack) {
    db.get(sql, params, callBack);
}
Becomes...
function get(db, sql, params, callBack) {
    // note: the wrapper holds the raw sqlite3 handle in db_obj
    runAsync(db, function(db, cb) { db.db_obj.get(sql, params, cb); }, callBack);
}
Voila!
(Apologies if this wasn't clear enough)

node.js for loop execution in a synchronous manner

I have to implement a program in node.js which looks like the following code snippet. It has an array through which I have to traverse and match the values with database table entries. I need to wait till the loop ends and send the result back to the calling function:
var arr = [one, two, three, four, five];
for (var j = 0; j < arr.length; j++) {
    var str = "/^" + arr[j] + "/";
    // consider collection to be a variable that points to a database table
    collection.find({value: str}).toArray(function getResult(err, result) {
        // do something in case a match is found in the database...
    });
}
However, since the find() calls execute asynchronously (str is actually a regex passed to MongoDB's find function in order to find partial matches), the loop finishes before their callbacks fire, so I am unable to traverse the array and get the required output.
Also, I am having a hard time sending the result back to the calling function, as I have no idea when the loop will finish executing.
Try using async.each. This will let you iterate over an array and execute asynchronous functions. Async is a great library that has solutions and helpers for many common asynchronous patterns and problems.
https://github.com/caolan/async#each
Something like this:
var arr = [one, two, three, four, five];

async.each(arr, function(item, callback) {
    var str = "/^" + item + "/";
    // consider collection to be a variable that points to a database table
    collection.find({value: str}).toArray(function getResult(err, result) {
        if (err) { return callback(err); }
        // do something in case a match is found in the database...
        // whatever logic you want to do on result should go here, then execute callback
        // to indicate that this iteration is complete
        callback(null);
    });
}, function(error) {
    // At this point, the each loop is done and you can continue processing here
    // Be sure to check for errors!
});

Firebase transactions in NodeJS always running 3 times?

Whenever I define a Firebase transaction in NodeJS I notice it always runs three times - the first two times with null data, then finally a third time with the actual data. Is this normal/intended?
For example this code:
firebaseOOO.child('ref').transaction(function(data) {
    console.log(data);
    return data;
});
outputs the following:
null
null
i1: { a1: true }
I would have expected that it only print the last item.
To answer a question in the comments, here is the same with a callback:
firebaseOOO.child('ref').transaction(function(data) {
    console.log(data);
    return data;
}, function(error, committed, snapshot) {
    if (error)
        console.log('failed');
    else if (!committed)
        console.log('aborted');
    else
        console.log('committed');
    console.log('fin');
});
Which yields the following output:
null
null
i1: { a1: true }
committed
fin
I had read the details of how transactions work before posting the question, so I had tried setting applyLocally to false like this:
firebaseOOO.child('ref').transaction(function(data) {
    console.log('hit');
    return data;
}, function() {}, false);
But it still hits 3 times (just double-checked), so I thought it was something different. Getting the 'value' before transacting does "work" as expected, in that it only hits once, regardless of what applyLocally is set to, so I'm not sure what applyLocally does. This is what I mean by getting the value before transacting:
firebaseOOO.child('ref').once('value', function(data) {
    console.log('1');
    firebaseOOO.child('ref').transaction(function(data) {
        console.log('2');
        return data;
    });
});
Outputs:
1
2
@Michael: How can one make use of this behavior? Transactions are primarily for having data modify itself based on its current value - the prototypical increment++ scenario. So if I need to add 1 to the existing value of 10, and continue working with the result of 11, the first two times the function hits I will have an erroneous result of 1 that I need to handle, and finally the correct result of 11 on the third hit. How can I make use of those two initial 1's? Another scenario (and maybe I shouldn't be using transactions for this, but if it worked as I expected it would make for cleaner code) is to insert a value only if it does not yet exist. If transactions only hit once, a null value would mean the value does not exist, so you could, for example, init the counter to 1 in that case, and otherwise add 1 to whatever the value is. With the noisy nulls, this is not possible.
It seems the takeaway from all this is to simply use the 'once' pattern more often than not?
ONCE TRANSACTION PATTERN:
firebaseOOO.child('ref').once('value', function(data) {
    console.log('1');
    firebaseOOO.child('ref').transaction(function(data) {
        console.log('2');
        return data;
    });
});
The behavior you're seeing here is related to how Firebase fires local events and then eventually synchronizes with the Firebase servers. In this specific example, the "running three times" will only happen the very first time you run the code - after that, the state has been completely synchronized and it'll just trigger once from then on out. This behavior is detailed here: https://www.firebase.com/docs/transactions.html (See the "When a Transaction is run, the following occurs" section.)
If, for example, you have an outstanding on() at the same location and then, at some later time, run this same transaction code, you'll see that it'll just run once. This is because everything is in sync prior to the transaction running (in the ideal case; barring any normal conflicts, etc).
transaction() may be called multiple times and must be able to handle null data. Even if there is existing data in your database, it may not be locally cached when the transaction function is first run.
firebaseOOO.child('ref').transaction(function(data) {
    if (data != null) {
        console.log(data);
        return data;
    } else {
        // data may be null on the first run(s); return it unchanged
        return data;
    }
}, function(error, committed, snapshot) {
    if (error)
        console.log('failed');
    else if (!committed)
        console.log('aborted');
    else
        console.log('committed');
    console.log('fin');
});
