node.js sqlite transaction isolation

How do I ensure that SQLite doesn't interleave queries from multiple concurrent node.js/Express HTTP requests into a single transaction?
I want DB queries from different requests to be executed in separate transactions, isolated from each other, so that each can be committed or rolled back independently. Given node.js's single-thread (and single-DB-connection?) characteristics this seems particularly problematic.
I've been trawling through dozens of web pages and docs and haven't found any clear (to me) explanation of how (or whether) this scenario is handled. For instance the description of db.serialize() only states that one query will be executed at a time - it says nothing about the separation of queries belonging to different transactions.
Any pointers appreciated!

I faced the exact same issue. Here is how I solved it:
Step 1: Ensure all database access is done via a set of library functions, where the database is passed in as the first input parameter.
Step 2: When you create the database object (using new sqlite3.Database('path')), wrap the object with some extra meta-information:
Here is how I wrapped mine:
var sqlite3 = require('sqlite3');
var wrap = { db_obj: new sqlite3.Database('path'), locked: "UNLOCKED", work_queue: [] };
Step 3: Create a function that will be used to run db functions. If the database is locked, this function will defer their execution until later. I called mine "runAsync":
function runAsync(db, fn, callBack) {
    if (db.locked === "LOCKED") {
        db.work_queue.push(function() { runAsync(db, fn, callBack); });
        return;
    }
    fn(db, callBack);
}
We can see that this function checks the state of the wrapped db object. If it is locked, we defer execution by pushing the function onto the "work queue", to be run when the lock is released; otherwise fn() runs immediately.
Step 4: Create a function that will be used to run db functions exclusively. If the database is locked, this function will defer execution until later, else it will lock the database, run its function and then unlock the database. I called mine: runAsyncExclusive():
function runAsyncExclusive(db, fn, callBack) {
    if (db.locked === "LOCKED") {
        db.work_queue.push(function() { runAsyncExclusive(db, fn, callBack); });
        return;
    }
    // Hand out a fresh, unlocked wrapper around the same connection,
    // so fn can take nested locks on it.
    var exclusiveDb = {
        db_obj: db.db_obj,
        locked: "UNLOCKED",
        work_queue: []
    };
    db.locked = "LOCKED";
    fn(exclusiveDb, function(err, res) {
        db.locked = "UNLOCKED";
        // Reset the queue before draining it, in case a queued item
        // re-locks the db and defers more work.
        var workItems = db.work_queue;
        db.work_queue = [];
        workItems.forEach(function(fn) { fn(); });
        callBack(err, res);
    });
}
The exclusiveDb object that is passed into the function will allow exclusive access to the db. This object can itself be locked, allowing for arbitrarily deep nested locking.
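For illustration, a nested exclusive section might look like this (a sketch, not from the original answer; the log table and SQL are placeholders):
runAsyncExclusive(wrap, function(exclusiveDb, cb) {
    // We hold the outer lock; take a nested lock on the exclusive handle.
    runAsyncExclusive(exclusiveDb, function(innerDb, innerCb) {
        innerDb.db_obj.run("INSERT INTO log (msg) VALUES (?)", ["nested"], innerCb);
    }, cb);
}, function(err) {
    // Outer lock is released here; any queued work on wrap resumes.
});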
Step 5: Adjust your library functions such that they call runAsync() and runAsyncExclusive() where appropriate:
function get(db, sql, params, callBack) {
    db.get(sql, params, callBack);
}
Becomes...
function get(db, sql, params, callBack) {
    // Note: inside the wrapper, the raw connection lives on db.db_obj.
    runAsync(db, function(db, cb) { db.db_obj.get(sql, params, cb); }, callBack);
}
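A request handler could then get an isolated transaction roughly like this (a sketch; run() is assumed to be another library function built like get(), and the orders table is a placeholder):
function run(db, sql, params, callBack) {
    runAsync(db, function(db, cb) { db.db_obj.run(sql, params, cb); }, callBack);
}

// Inside an Express handler: while this transaction is open, queries
// from other requests are queued on the wrapper rather than interleaved.
runAsyncExclusive(wrap, function(exclusiveDb, cb) {
    exclusiveDb.db_obj.serialize(function() {
        exclusiveDb.db_obj.run("BEGIN");
        exclusiveDb.db_obj.run("INSERT INTO orders (qty) VALUES (?)", [3]);
        exclusiveDb.db_obj.run("COMMIT", cb);
    });
}, function(err) {
    // send the HTTP response here
});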
Voila!
(Apologies if this wasn't clear enough)

Related

Avoid triggering Firebase functions by real-time database on special cases

Sometimes we use Firebase functions triggered by the Realtime Database (onCreate/onDelete/onUpdate ...) to do some logic (like counting, etc).
My question: would it be possible to avoid this trigger in some cases, mainly when I would like to allow a user to import a huge JSON into Firebase?
Example:
A function E is triggered on the creation of a new child in /examples. Normally, users add examples one by one to /examples and function E runs to do some logic. However, I would like to allow a user (from the front-end) to import 2000 children into /examples, where the logic normally done by function E can be performed at import time, without E. I do not want E to be triggered in such a case, where a high number of function executions could occur. (Note: I am aware of the 1000 limit.)
Update:
Based on the accepted answer, I submitted my own answer below.
As far as I know, there is no way to disable a Cloud Function programmatically short of deleting it, and deleting it introduces an edge case where data added to the database while the import is taking place goes unprocessed.
A compromise would be to signal that the data you are uploading should be post-processed. Say you were uploading to /examples/{pushId}: instead of attaching the database trigger to /examples/{pushId}, attach it to /examples/{pushId}/needsProcessing (or something similar). Unfortunately this has the trade-off of not being able to make use of the change objects for onUpdate() and onWrite().
const result = await firebase.database().ref('/examples').push({
    title: "Example 1A",
    desc: "This is an example",
    attachments: { /* ... */ },
    class: "-MTjzAKMcJzhhtxwUbFw",
    author: "johndoe1970",
    needsProcessing: true
});
async function handleExampleProcessing(snapshot, context) {
    // do post-processing only if needsProcessing is truthy
    if (!snapshot.exists() || !snapshot.val()) {
        console.log('No processing needed, exiting.');
        return;
    }
    const exampleRef = snapshot.ref.parent; // /examples/{pushId}, as admin
    const data = (await exampleRef.once('value')).val();
    // do something with data, like mutate it
    // commit the changes
    return exampleRef.update({
        ...data,
        needsProcessing: null /* delete the needsProcessing value */
    });
}
const functionsExampleProcessingRef = functions.database.ref("examples/{pushId}/needsProcessing");
export const handleExampleNeedingProcessingOnCreate = functionsExampleProcessingRef.onCreate(handleExampleProcessing);
// this is only needed if you ever intend on writing `needsProcessing = /* some falsy value */`, I recommend just creating and deleting it, then you can use just the above trigger.
export const handleExampleNeedingProcessingOnUpdate = functionsExampleProcessingRef.onUpdate((change, context) => handleExampleProcessing(change.after, context));
An alternative to Sam's approach is to use feature flags to determine if a Cloud Function performs its main function. I often have this in my code:
exports.onUpload = functions.database
.ref("/uploads/{uploadId}")
.onWrite((event) => {
return ifEnabled("transcribe").then(() => {
console.log("transcription is enabled: calling Cloud Speech");
...
})
});
The ifEnabled is a simple helper function that checks (also in Realtime Database) if the feature is enabled:
function ifEnabled(feature) {
    console.log("Checking if feature '" + feature + "' is enabled");
    return new Promise((resolve, reject) => {
        admin.database().ref("/config/features")
            .child(feature)
            .once('value')
            .then(snapshot => {
                if (snapshot.val()) {
                    resolve(snapshot.val());
                } else {
                    reject("No value or 'falsy' value found");
                }
            });
    });
}
Most of my usage of this is during talks at conferences, to enable the Cloud Functions at the right time (as a deploy takes a bit longer than we'd like for a demo). But the same approach should work to temporarily disable features during, for example, a data import.
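For the import case, an admin script could flip the flag around the bulk write, something like this (a sketch; runBigImport is a placeholder for your own upload code):
async function importWithFeatureDisabled() {
    const flagRef = admin.database().ref("/config/features/transcribe");
    await flagRef.set(false);  // onUpload still fires, but ifEnabled() rejects
    await runBigImport();      // placeholder for the bulk upload
    await flagRef.set(true);   // restore normal processing
}
Note that the Cloud Function is still invoked for every write; it just exits early.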
Okay, another solution would be
A: Add a new table in Firebase, like /triggers-queue, where all CRUD operations that should fire a background function are recorded. In this table, we add a key for each table that should have triggers; in our example, the /examples table. Any key that represents a table should also have /created, /updated, and /deleted keys, as follows.
/examples
.../example-id-1
/triggers-queue
.../examples
....../created
........./example-id
....../updated
........./example-id
............old-value
....../deleted
........./example-id
............old-value
Note that the old-value should be added from the app (front-end, etc).
We always set onCreate triggers on:
/triggers-queue/examples/created/{exampleID} (simulate onCreate)
/triggers-queue/examples/updated/{exampleID} (simulate onUpdate)
/triggers-queue/examples/deleted/{exampleID} (simulate onDelete)
The fired function can know all the necessary info to handle the logic, as follows (a sketch of one such trigger follows the list):
Operation type: from the path (either: created, updated, or deleted)
key of the object: from the path
current data: by reading the corresponding table (i.e., /examples/id)
old data: from the triggers table
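Here is a sketch of the simulated onUpdate trigger under this layout (names are illustrative; admin is an initialized firebase-admin instance):
exports.onExampleUpdated = functions.database
    .ref("/triggers-queue/examples/updated/{exampleID}")
    .onCreate(async (snapshot, context) => {
        const id = context.params.exampleID;       // key of the object, from the path
        const oldValue = snapshot.val();           // old data, written by the app
        const current = (await admin.database()
            .ref("/examples/" + id)
            .once("value")).val();                 // current data
        // ...run the counting/validation logic here using id, oldValue, current...
        return snapshot.ref.remove();              // clean up the queue entry
    });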
Good points:
You can import huge data into the /examples table without firing any function, as we do not add to /triggers-queue.
You can fan out functions to get past the 1000/sec limit, by setting triggers on (as an example, to fan out on-create)
/triggers-queue/examples/created0/{exampleID} and
/triggers-queue/examples/created1/{exampleID}
Bad points:
More difficult to implement.
Need to write more data to Firebase (like the old data) from the app.
B: Another way (although not an answer to this question) is to move the logic in the background function into an HTTP function and call it on every CRUD operation.

How do I make a large but unknown number of REST http calls in nodejs?

I have an orientdb database. I want to use nodejs with RESTful calls to create a large number of records. I need to get the #rid of each for some later processing.
My pseudocode is:
for each record
write.to.db(record)
when the async of write.to.db() finishes
process based on #rid
carryon()
I have landed in serious callback hell from this. The version that was closest used a tail recursion in the .then function to write the next record to the db. However, I couldn't carry on with the rest of the processing.
A final constraint is that I am behind a corporate proxy and cannot use any other packages without going through the network administrator, so using the native nodejs packages is essential.
Any suggestions?
With a completion callback, the general design pattern for this type of problem makes use of a local function for doing each write:
var records = ....; // array of records to write
var index = 0;

function writeNext(r) {
    write.to.db(r, function(err) {
        if (err) {
            // error handling
        } else {
            ++index;
            if (index < records.length) {
                writeNext(records[index]);
            }
        }
    });
}

writeNext(records[0]);
The key here is that you can't use array iterators like .forEach(), because they won't wait for one async write to complete before starting the next. Instead, you do your own iteration.
If your write function returns a promise, you can use the .reduce() pattern that is common for iterating an array.
var records = ...; // some array of records to write

records.reduce(function(p, r) {
    return p.then(function() {
        return write.to.db(r);
    });
}, Promise.resolve()).then(function() {
    // all done here
}, function(err) {
    // error here
});
This solution chains promises together, waiting for each one to resolve before executing the next save.
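On current Node versions the same sequential pattern reads more naturally with async/await (a sketch, again assuming write.to.db returns a promise):
async function writeAll(records) {
    const rids = [];
    for (const r of records) {
        // each save completes before the next one starts
        rids.push(await write.to.db(r));
    }
    return rids; // the collected #rids for later processing
}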
It's kinda hard to tell which function would be best for your scenario w/o more detail, but I almost always use asyncjs for this kind of thing.
From what you say, one way to do it would be with async.map:
var recordsToCreate = [...];

function functionThatCallsTheApi(record, cb) {
    // do the api call, then call cb(null, rid)
}

async.map(recordsToCreate, functionThatCallsTheApi, function(err, results) {
    // here, err will be set if anything failed in any function
    // results will be an array of the rids
});
You can also check out the other functions to enable throttling, which is probably a good idea.
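For throttling specifically, async.mapLimit is a drop-in variant of async.map that caps concurrency (the limit of 5 here is arbitrary):
async.mapLimit(recordsToCreate, 5, functionThatCallsTheApi, function(err, results) {
    // same semantics as async.map, but at most 5 API calls are in flight at once
});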

Node.js + SQLite async transactions

I am using node-sqlite3, but I am sure this problem appears in other database libraries too. I have discovered a bug in my code that mixes transactions and async code.
function insertData(arrayWithData, callback) {
    // start a transaction
    db.run("BEGIN", function() {
        // do multiple inserts
        slide.asyncMap(
            arrayWithData,
            function(item, cb) {
                db.run("INSERT ...", cb);
            },
            function() {
                // all done
                db.run("COMMIT");
            }
        );
    });
}
// some other insert
setInterval(
    function() { db.run("INSERT ...", cb); },
    100
);
You can also run the full example.
The problem is that some other code with an insert or update query can be launched during the async pause after BEGIN or an INSERT. Then this extra query runs inside the transaction. This is not a problem when the transaction is committed. But if the transaction is rolled back, the change made by this extra query is also rolled back. Oops, we've just unpredictably lost data without any error message.
I thought about this issue and I think that one solution is to create a wrapper class that will make sure that:
Only one transaction runs at a time.
While a transaction is running, only queries belonging to that transaction are executed.
All other queries are queued and executed after the current transaction finishes.
All attempts to start a transaction while one is already running are queued as well.
But that sounds like an overly complicated solution. Is there a better approach? How do you deal with this problem?
At first, I would like to state that I have no experience with SQLite. My answer is based on a quick study of node-sqlite3.
The biggest problem with your code IMHO is that you try to write to the DB from different locations. As I understand SQLite, you have no control over separate parallel "connections" as you have in PostgreSQL, so you probably need to wrap all your communication with the DB. I modified your example to always use the insertData wrapper. Here is the modified function:
function insertData(callback, cmds) {
    // start a transaction
    db.serialize(function() {
        db.run("BEGIN;");
        //console.log('insertData -> begin');
        // do multiple inserts
        cmds.forEach(function(item) {
            db.run("INSERT INTO data (t) VALUES (?)", item, function(e) {
                if (e) {
                    console.log('error');
                    // rollback here
                } else {
                    //console.log(item);
                }
            });
        });
        // all done
        // here should be the commit
        //console.log('insertData -> commit');
        db.run("ROLLBACK;", function(e) {
            return callback();
        });
    });
}
Function is called with this code:
init(function() {
    // insert with transaction
    function doTransactionInsert(e) {
        if (e) return console.log(e);
        setTimeout(insertData, 10, doTransactionInsert, ['all', 'your', 'base', 'are', 'belong', 'to', 'us']);
    }
    doTransactionInsert();

    // insert increasing integers 0, 1, 2, ...
    var i = 0;
    function doIntegerInsert() {
        //console.log('integer insert');
        insertData(function(e) {
            if (e) return console.log(e);
            setTimeout(doIntegerInsert, 9);
        }, [i++]);
    }
    ...
I made the following changes:
added a cmds parameter; for simplicity I added it as the last parameter, though a callback should normally come last (cmds is an array of inserted values; in a final implementation it should be an array of SQL commands)
changed db.exec to db.run (should be quicker)
added db.serialize to serialize the requests inside the transaction
omitted the callback for the BEGIN command
left out slide and some underscore usage
Your test implementation now works fine for me.
I ended up writing a full wrapper around sqlite3 that implements locking the database during a transaction. While the DB is locked, all queries are queued and executed after the current transaction is over.
https://github.com/Strix-CZ/sqlite3-transactions
IMHO there are some problems with ivoszz's answer:
Since all the db.run calls are async, you cannot check the result of the whole transaction, and if one run errors you should roll back all the commands. To do that you would have to call db.run("ROLLBACK") in the callback inside the forEach loop. The db.serialize function does not serialize the async run callbacks, so a "cannot start a transaction within a transaction" error occurs.
The "COMMIT/ROLLBACK" after the forEach loop has to check the result of all statements, and you cannot run it before all the previous runs have finished.
IMHO there is only one way to get safe transaction management (with respect to the background thread pool): create a wrapper function and use the async library to serialize all the statements manually. This way you can avoid db.serialize and, more importantly, check every single db.run result so you can roll back the whole transaction (and return a promise if needed).
The main problem of the node-sqlite3 library with respect to transactions is that the serialize function has no callback in which you could check whether an error occurred.
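A minimal sketch of that approach, using async.eachSeries so every db.run result is checked and the transaction is rolled back on the first error (names are illustrative; db is a node-sqlite3 Database):
var async = require('async');

function insertAll(db, rows, callback) {
    db.run("BEGIN", function(err) {
        if (err) return callback(err);
        async.eachSeries(rows, function(row, cb) {
            db.run("INSERT INTO data (t) VALUES (?)", row, cb); // each result is checked
        }, function(err) {
            if (err) return db.run("ROLLBACK", function() { callback(err); });
            db.run("COMMIT", callback);
        });
    });
}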

Synchronize node.js object

I am using a variable that is used by many functions at a time. I need to synchronize it. How do I do it?
var x = 0;

var a = function() {
    x = x + 1;
}

var b = function() {
    x = x + 2;
}

var c = function() {
    var t = x;
    return t;
}
This is the simplified logic of my code. To give more insight, x is effectively my MongoDB object, which needs to be used by only one function at a time. Also, the three functions are like REST API calls, so there is a probability they will be called at the same time.
I need to write getX function which should manage locking and unlocking.
Any suggestions?
Node is single threaded, so there is no chance of the three functions executing at the same time; synchronization and race conditions in the multithreaded sense don't apply. There is a case to watch, though: if the first function yields for async I/O, the others can interleave with it.
You are asking about keeping a single object synchronized as several asynchronous operations modify it. This is a bit vague (do you need to execute them in order? do they change the same properties?), so it's hard to give a catch-all solution. I suggest you determine what order, if any, the operations must take place in, and use the async library to handle the control flow.
The async.waterfall method (example below) is useful if you want to pass results down a chain of functions that execute in order. There are many other useful functions in the library, like async.eachSeries (execute a function once per array item, in order) and async.parallel (execute an array of functions simultaneously). All docs are available at https://github.com/caolan/async
var async = require('async');

function calculateX(callback) {
    async.waterfall([
        function(done) {
            var x = 0;
            asyncCall1(x, function(x1) { // add x1 = x + 1
                done(null, x1);
            });
        },
        function(x1, done) {
            asyncCall2(x1, function(x2) { // add x2 = x1 + 2
                done(null, x2);
            });
        }
    ], function(err, x2) {
        var t = x2;
        callback(t);
    });
}
calculateX(function(x2) {
    mongo.save(x2, function(err) { // or something, idk mongo
        if (err) { console.log(err); }
    });
});

What's the right way to find out when a series of callbacks (fired from a loop) have all executed?

I'm new to Node.js and am curious what the prescribed methodology is for running a loop on a process (repeatedly) where at the end of the execution some next step is to take place, but ONLY after all the iterations' callbacks have fired.
Specifically I'm making SQL calls and I need to close the sql connection after making a bunch of inserts and updates, but since they're all asynchronous, I have no way of knowing when all of them have in fact completed, so that I can call end() on the session.
Obviously this is a problem that extends far beyond this particular example, so, I'm not looking for the specific solution regarding sql, but more the general practice, which so far, I'm kind of stumped by.
What I'm doing now is actually setting a global counter to the length of the loop object and decrementing it in each callback to see when it reaches zero, but that feels REALLY kludgy, and I'm hoping there's a more elegant (and JavaScript-centric) way to achieve this monitoring.
TIA
There are a bunch of flow-control libraries available that apply patterns to help with this kind of thing. My favorite is async. If you wanted to run a bunch of SQL queries one after another in order, for instance, you might use series:
async.series([
    function(cb) { sql.exec("SOME SQL", cb) },
    function(cb) { sql.exec("SOME MORE SQL", cb) },
    function(cb) { sql.exec("SOME OTHER SQL", cb) }
], function(err, results) {
    // Here, one of two things is true:
    // (1) one of the async functions passed an error to its callback,
    //     so async immediately calls this callback with a non-null "err" value
    // (2) all of the async code is done, and "results" is
    //     an array of each of the results passed to the callbacks
});
I wrote my own queue library to do this (I'll publish it one of these days): basically, you push queries onto a queue (an array, basically), execute each one as it's removed, and have a callback fire when the array is empty. It doesn't take much to do.
Edit: I've added this example code. It isn't what I've used before and I haven't tried it in practice, but it should give you a starting point. There's a lot more you can do with the pattern.
One thing to note: queueing effectively makes your actions sequential; they happen one after another. I wrote my MySQL queue script so I could execute queries on multiple tables asynchronously, but on any one table in sequence, so that inserts and selects happened in the order they were requested.
var queue = function() {
    this.queue = [];

    /**
     * Runs everything in the queue in order, then executes the callback.
     * This example uses a pattern where errors are returned from the
     * functions added to the queue and then passed to the callback
     * for handling.
     */
    this.run = function(callback) {
        var errors = [];
        // shift() removes each item as it runs, so the loop ends when
        // the queue is empty
        while (this.queue.length > 0) {
            var fn = this.queue.shift();
            errors.push(fn());
        }
        callback(errors);
    }

    this.addToQueue = function(callback) {
        this.queue.push(callback);
    }
}
Usage:
var q = new queue();
q.addToQueue(function() {
    setTimeout(function() { console.log('1'); }, 100);
});
q.addToQueue(function() {
    setTimeout(function() { console.log('2'); }, 50);
});
// run() executes both queued functions immediately;
// the timeouts themselves fire afterwards
q.run();
