best practice for data synchronization in nodejs - node.js

I'm trying to do some integration between two systems, and I've got some problems when it comes to synchronizing data.
I am using nodejs and the database is mongodb or firebase.
The scenario is described as below:
systemA with dbA
systemB with dbB
Do the following:
systemA sends a request (POST or PUT whatever) to systemB, the body is like this:
{
.....
fieldA1: valueA1,
fieldA2: valueA2
.....
}
then systemB needs to update several fields(fieldB1, fieldB2) in dbB according to the systemA data, like this:
fieldB1: valueA1
fieldB2: valueA2
after BOTH fieldB1 and fieldB2 are updated successfully, then execute more logic
I'm using async to control asynchronous process. My code to implement these 3 steps:
async.waterfall([
  function (callback) {
    //get valueA1 of fieldA1 and valueA2 of fieldA2
  },
  // every waterfall entry must be a function, so wrap the parallel step
  function (valueA1, valueA2, callback) {
    async.parallel([
      function (callback) {
        //set valueA1 to fieldB1
      },
      function (callback) {
        //set valueA2 to fieldB2
      }
    ], callback)
  }
], function (err, result) {
  //more logic here
})
Since fieldB1 and fieldB2 should be updated at the same time, a failure to update either fieldB1 or fieldB2 leads to data inconsistency, which is not the correct result.
However, async.parallel cannot guarantee that a failed update will roll back or prevent the other updates, right? Is there any way to ideally keep the data consistent when updating BOTH fieldB1 and fieldB2?

I find it difficult to map your code to the Firebase API. But what you're describing sounds like it's achievable by using either transactions or multi-location updates.
I covered these types of updates in depth in the past in: How to write denormalized data in Firebase
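As a rough sketch of the multi-location update idea: the Realtime Database update() call accepts an object whose keys are paths, and a single call with several paths either writes all of them or none. The path layout ("records/<id>/..."), the recordId parameter and both helper names below are made up for illustration; rootRef is assumed to be a database reference obtained from the Firebase SDK.

```javascript
// Build one update object that covers both fields.
// The path layout "records/<id>/..." is a hypothetical example.
function buildFanOut(recordId, body) {
  var updates = {};
  updates['records/' + recordId + '/fieldB1'] = body.fieldA1;
  updates['records/' + recordId + '/fieldB2'] = body.fieldA2;
  return updates;
}

// rootRef is assumed to be a Realtime Database reference.
// One update() call with several paths succeeds or fails as a whole.
function syncFields(rootRef, recordId, body) {
  return rootRef.update(buildFanOut(recordId, body));
}
```

The "more logic" step would then go in the then() of the promise returned by syncFields, so it only runs after both fields were written.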

Related

How to make two db query synchronous, so that if any of them fails then both fails in node.js

async.parallel([
  function(callback){
    con.Attandance.insert({'xxx':'a'}, function(err,data) {
      console.log(data);
      callback(err); // pass the error on so the final callback sees it
    });
  }, function(callback) {
    con.Profile.insert({'xxx':'a'},function(err){callback(err)});
  }
], function(err) {
  if (err) return console.log('At least one insert failed', err);
  console.log('Both a and b are saved now');
});
Attendance.insert() succeeds whether Profile.insert() executes or fails. I want it so that if either of them fails, no data is saved in either collection, neither in Attendance nor in Profile.
What you are describing are transactions, which have nothing to do with synchronous vs. asynchronous execution.
Unfortunately, MongoDB versions before 4.0 do not support multi-document transactions (4.0 added them for replica sets). Without them, the only way to achieve something even remotely close is to perform a two-phase commit, or to implement custom rollback logic that undoes all changes to Attandance if the changes to Profile fail.
The other way to get atomic (yet not transactional!) updates is to change your model. If the Profile is a container for all Attandance instances, you can update the entire object at once, because a write to a single document is atomic. Without multi-document transactions it is impossible to update more than one document atomically, and it is not possible to enforce a strict ordering across writes either.
If you need that, go for an SQL database instead. Pretty much all of them, including SQLite, support transactions.
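To make the "custom rollback logic" option concrete, here is a minimal sketch. FakeCollection is a hypothetical in-memory stand-in so the example runs on its own; it is not the MongoDB driver API, and insertBoth is an illustrative name.

```javascript
// Minimal in-memory stand-in for a collection (hypothetical, illustration only).
function FakeCollection(shouldFail) {
  this.docs = [];
  this.shouldFail = shouldFail;
}
FakeCollection.prototype.insert = function (doc, cb) {
  if (this.shouldFail) return cb(new Error('insert failed'));
  this.docs.push(doc);
  cb(null, doc);
};
FakeCollection.prototype.remove = function (doc, cb) {
  this.docs = this.docs.filter(function (d) { return d !== doc; });
  cb(null);
};

// Insert into both collections; undo the first insert if the second fails.
function insertBoth(attendance, profile, doc, done) {
  attendance.insert(doc, function (err, saved) {
    if (err) return done(err);        // nothing has been written yet
    profile.insert(doc, function (err2) {
      if (!err2) return done(null);   // both writes succeeded
      // Compensating action: remove what the first insert wrote.
      attendance.remove(saved, function (undoErr) {
        done(undoErr || err2);
      });
    });
  });
}
```

Keep in mind that this is not a transaction: other readers can see the first document before the second write finishes, and the undo step can itself fail or be cut short by a crash.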
I wrote a library that implements the two phase commit system (mentioned in a prior answer) described in the docs. It might help in this scenario. Fawn - Transactions for MongoDB.
var Fawn = require("fawn");
// initialize Fawn
Fawn.init("mongodb://127.0.0.1:27017/testDB");
/**
optionally, you could initialize Fawn with mongoose
var mongoose = require("mongoose");
mongoose.connect("mongodb://127.0.0.1:27017/testDB");
Fawn.init(mongoose);
**/
// after initialization, create a task
var task = Fawn.Task();
task.save("Attendance", {xxx: "a"})
.save("Profile", {xxx: "a"})
.run()
.then(function(results){
// task is complete
// result from first operation
var firstUpdateResult = results[0];
// result from second operation
var secondUpdateResult = results[1];
})
.catch(function(err){
// Everything has been rolled back.
// log the error which caused the failure
console.log(err);
});

Node Js MongoDB Query Against returned array

I have a MongoDB Relationships collection that stores the user_id and the followee_id (the person the user is following). If I query against the user_id, I can find all the individuals the user is following. Next I need to query the Users collection against all of the returned followee ids to get their personal information. This is where I am confused. How would I accomplish this?
NOTE: I know I can embed the followees in the individual user's document and use an $in operator, but I do not want to go this route. I want to maintain the most flexibility I can.
You can use an $in query without denormalizing the followees on the user. You just need to do a little bit of data manipulation:
Relationship.find({user_id: user_id}, function(error, relationships) {
var followee_ids = relationships.map(function(relationship) {
return relationship.followee_id;
});
User.find({_id: { $in: followee_ids}}, function(error, users) {
// voila
});
});
If I understood your problem correctly, you need to query each of the "individuals the user is following".
That means issuing one query per followee and collecting the results.
Because queries in Node.js are asynchronous (I assume you are using mongoose), you need some asynchronous control flow for this task.
If you are not familiar with the async module for Node.js, it's about time to get to know it.
See async on npm for the docs.
Here is some sample code showing how your query could look.
/* followee_id_arr: array of followee_id values from the previous query */
var async = require('async');
function query(followee_id_arr, callback) {
  var allResults = [];
  async.eachSeries(followee_id_arr, function (f_id, next) {
    db.userCollection.findOne({_id : f_id}, {_id : 1, personalData : 1}, function (err, data) {
      if (err) return next(err); // stop iterating on the first error
      allResults.push(data);
      next();
    });
  }, function (err) {
    // runs once, after the last item or on the first error
    callback(err, allResults);
  });
}
You can even run all the queries in parallel (for better performance) by using async.map.
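To show what the parallel variant does under the hood, here is a hand-rolled stand-in for async.map. It is a simplified sketch, not the library's implementation, and findUser with its in-memory users object is a hypothetical replacement for db.userCollection.findOne.

```javascript
// Hand-rolled stand-in for async.map: start every lookup at once,
// collect results in input order, report only the first error.
function mapParallel(items, worker, done) {
  var results = new Array(items.length);
  var pending = items.length;
  var failed = false;
  if (pending === 0) return done(null, results);
  items.forEach(function (item, i) {
    worker(item, function (err, value) {
      if (failed) return;              // an error was already reported
      if (err) { failed = true; return done(err); }
      results[i] = value;              // keep the input order
      if (--pending === 0) done(null, results);
    });
  });
}

// Hypothetical in-memory lookup in place of db.userCollection.findOne.
var users = { u1: { name: 'Ann' }, u2: { name: 'Bob' } };
function findUser(id, cb) {
  if (!users[id]) return cb(new Error('no such user: ' + id));
  cb(null, users[id]);
}
```

With the real library, the call would have the same shape: the id array, a worker taking (id, callback), and a final callback taking (err, results).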

How can I cancel MongoDB query from .each callback

I implemented a little Node.js web server that stores log entries and provides a backend for a web-based log browser. The web interface also provides an "Export to CSV" function that lets the user download the logs in CSV format. My code looks similar to this:
this.log_entries(function(err, collection) {
collection.find(query)
.sort({_id: 1})
.each(function (err, doc) {
if(doc){
WriteLineToCSVFile(doc);
}
else {
ZipCSVFileAndSendIt();
}
});
});
The export operation may take a significant amount of time and disk space if the user didn't specify the right filters for the query. I need to implement a fail-safe mechanism to prevent this. One important requirement is that the user should be able to abort an ongoing export at any point in time. Currently my solution is to stop writing data to the CSV file; however, the callback passed to .each() still gets called. I could not find any information on how to stop the each loop. So the question is: how can I do this?
UPDATE, THE ANSWER:
Use cursor.nextObject()
For the correct answer see the comments by @dbra below: db.currentOp() and db.killOp() don't work for this case.
The final solution looks like this:
this.log_entries(function(err, collection) {
var cursor = collection.find(query);
cursor.sort("_id", 1, function(err, sorted) {
function exportFinished(aborted) {
...
}
function processItem(err, doc) {
if(doc === null ) {
exportFinished(false);
}
else if( abortCurrentExport ) {
exportFinished(true);
}
else {
var line = formatCSV(doc);
WriteFile(line);
process.nextTick(function(){
sorted.nextObject(processItem);
});
}
}
sorted.nextObject(processItem);
});
});
Note the usage of process.nextTick - without it there will be a stack overflow!
You could search for the running query with db.currentOp and then kill it with db.killOp, but it would be a nasty solution.
A better way could be to work in limited, consecutive batches. The easiest way would be simple pagination with "limit" and "skip", but whether that is safe depends on how your collection changes while you read it.
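The batch idea could look roughly like the sketch below. fetchPage(skip, limit, cb) is a hypothetical wrapper, assumed to sit on top of something like collection.find(query).sort({_id: 1}).skip(skip).limit(limit).toArray(cb); shouldAbort and writeDoc are likewise illustrative names.

```javascript
// Export documents page by page and check the abort flag between pages.
// fetchPage(skip, limit, cb) is a hypothetical wrapper around the real query.
function exportInBatches(fetchPage, batchSize, shouldAbort, writeDoc, done) {
  var skip = 0;
  function nextPage() {
    if (shouldAbort()) return done(null, { aborted: true, written: skip });
    fetchPage(skip, batchSize, function (err, docs) {
      if (err) return done(err);
      docs.forEach(function (doc) { writeDoc(doc); });
      skip += docs.length;
      // A short (or empty) page means the result set is exhausted.
      if (docs.length < batchSize) {
        return done(null, { aborted: false, written: skip });
      }
      setImmediate(nextPage); // keeps the stack flat between pages
    });
  }
  nextPage();
}
```

An abort request is honoured at the next page boundary, so the batch size bounds how much extra work is done after the user cancels. If documents are inserted or deleted during the export, skip-based paging can miss or repeat entries.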

Refactoring mongoose queries

I have been using mongoose a considerable amount, and I can't seem to get around "callback hell" and polluting my queries with error handling.
For example here is a route I have:
var homePage = function(req, res) {
var companyUrl = buildingId = req.params.company
db.pmModel
.findOne({ companyUrl: companyUrl })
.exec(function (err, doc) {
if (err)
return HandleError(req, res, err)
if( !doc )
return NoResult(req, res, {msg: 'Aint there'})
console.log(doc)
db.rentalModel
.find({ propertyManager: doc.id })
.populate('building')
.exec(function (err, rentals) {
if (err)
return HandleError(req, res, err)
if( !doc )
return NoResult(req, res, {msg: 'Aint there'})
console.log(doc)
var data = doc.toJSON()
data.rentals = rentals
res.render('homePage', data)
})
})
}
my question: is there a more succinct way of writing this?
So perhaps what you have above is just a small example, but it doesn't appear to me that there's too much "callback hell" going on in your code (in my opinion). However, you can certainly refactor your code. Just know in doing so you can make it more difficult to understand or follow from a maintenance perspective.
One thing you can do is simply refactor your database layer. If you always find yourself querying one collection and then turning right around and querying another, you could consider merging those collections, or at least the documents that you're looking for. In a relational database you might separate out these tables and do merges, however in a document-based database, it sometimes makes more sense to combine the data within each document. This allows for easier queries and simpler logic in your code.
Another solution is to refactor your calls into separate functions, and control the flow in a different way. A popular library to help with this is async which provides many helper functions to assist in the asynchronous world of JavaScript. There are many to choose from, but one suggestion would be to use the waterfall function for your situation (since each call must be made before the next). It would then look something like this:
async.waterfall([
function(callback){
findCompany(companyUrl, callback);
},
function(id, callback){
findPropertyManager(id, callback);
}
], function (err, rentals) {
if (err) return HandleError(req, res, err)
res.render('homePage', {rentals: rentals})
});
You would still need to handle the errors in each function, but you could even refactor that out into a helper function. Furthermore, you could choose to code up something yourself to help with the control flow rather than using async.
But again, the code you showed above is understandable and readable, and only contains a couple inline callbacks. In this way, there's a lot less going on and may make debugging it later (if things go wrong) easier.
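If you would rather code up the control flow yourself, a minimal waterfall helper could look like the sketch below. It is deliberately simplified (each step passes on a single value) and is not how the async library is implemented.

```javascript
// Minimal hand-rolled waterfall: run steps in order, pass each result to the
// next step, and jump straight to `done` on the first error.
function waterfall(steps, done) {
  var index = 0;
  function next(err, value) {
    if (err || index === steps.length) return done(err || null, value);
    var step = steps[index++];
    if (index === 1) return step(next); // the first step takes only a callback
    step(value, next);
  }
  next(null);
}

// Usage with two toy steps in place of the database calls.
waterfall([
  function (callback) { callback(null, 2); },
  function (n, callback) { callback(null, n * 3); }
], function (err, result) {
  if (err) return console.error(err);
  console.log(result);
});
```

Because every error ends up in the final callback, the error handling only has to be written once, which is the main thing the nested version lacks.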

node.js and express passing sqlite data to one of my views

In my app.js I have the following to try to retrieve data from a sqlite database and pass it to one of my views:
app.get("/dynamic", function(req, res) {
var db = new sqlite3.Database(mainDatabase)
var posts = []
db.serialize(function() {
db.each("SELECT * FROM blog_posts", function(err, row) {
posts.push({title: row.post_title, date: row.post_date, text: row.post_text})
})
})
res.render("dynamic", {title: "Dynamic", posts: posts})
})
Can someone tell me what I am doing wrong here? The posts array seems to stay empty no matter what.
EDIT
I was following a tutorial that explained that though the plugin has async, this method is not asynchronous
Here is a quote from the tutorial
Despite the callbacks and asynchronous nature of Node.js, these
transactions will run in series, allowing us to create, insert, and
query knowing that the statement prior will run before the current one
does. However, sqlite3 provides a "parallel" wrapper with the same
interface, but runs all the transactions in parallel. It just all
depends on your current circumstances.
The db calls are likely asynchronous, which means you are rendering before they return with their data.
You need to figure out how to get one callback from your query, and render your template in that callback.
It looks like you want a second complete callback passed to db.each() (Thanks, Jonathan Lonowski, for the tip!)
var posts = [];
db.serialize(function() {
db.each("SELECT * FROM blog_posts", function(err, row) {
posts.push({title: row.post_title, date: row.post_date, text: row.post_text})
}, function() {
// All done fetching records, render response
res.render("dynamic", {title: "Dynamic", posts: posts})
})
})
The idea is to render in the last callback of any asynchronous code; that way you have everything you need.
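The timing problem can be reproduced without a database. In this sketch, fakeEach is a hypothetical stand-in for db.each that delivers its rows on a later tick and then calls a completion callback; loadPosts and the row data are made up for illustration.

```javascript
// Hypothetical stand-in for db.each: rows arrive on a later tick, then
// `complete` is called with the number of rows delivered.
function fakeEach(rows, onRow, complete) {
  setImmediate(function () {
    rows.forEach(function (row) { onRow(null, row); });
    complete(null, rows.length);
  });
}

function loadPosts(rows, render) {
  var posts = [];
  fakeEach(rows, function (err, row) {
    if (!err) posts.push({ title: row.post_title });
  }, function (err, count) {
    render(err, posts); // safe: every row callback has already run
  });
  // This is the point where the original code rendered: nothing has arrived yet.
  return posts.length;
}
```

Calling loadPosts returns 0 immediately, while the render callback later receives the full array. That is the same difference as between rendering right after db.each and rendering inside its completion callback.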

Resources