Mongo Bulk Updates - which succeeded (matched and modified) and which did not? - node.js

In order to improve the performance of many single MongoDB document updates in Node.js, I am considering using Mongo's Bulk operation - to update as many as 1000 documents in each iteration.
In this use case, each individual update operation may or may not occur - an update will only happen if the document version has not changed since it was last read by the updater. If a document was not updated, the application needs to retry and/or do other work to handle the situation.
Currently the Node code looks like this:
col.update(
    {_id: someid, version: someversion},
    {$set: {stuf: toupdate, version: (someversion + 1)}},
    function(err, res) {
        if (err) {
            console.log('wow, something is seriously wrong!');
            // do something about it...
            return;
        }
        if (!res || !res.result || !res.result.nModified) { // no update
            console.log('oops, seems like someone updated doc before me');
            // do something about it...
            return;
        }
        // Great! - Document was updated, continue as usual...
    });
Using Mongo's Bulk unordered operations, is there a way to know which of the group of (1000) updates had succeeded and which had not been performed (in this case due to wrong version)?
The code I started playing with looks like:
var bulk = col.initializeUnorderedBulkOp();
bulk.find({_id: someid1, version: someversion1}).updateOne(
    {$set: {stuf: toupdate1, version: (someversion1 + 1)}});
bulk.find({_id: someid2, version: someversion2}).updateOne(
    {$set: {stuf: toupdate2, version: (someversion2 + 1)}});
...
bulk.find({_id: someid1000, version: someversion1000}).updateOne(
    {$set: {stuf: toupdate1000, version: (someversion1000 + 1)}});
bulk.execute(function(err, result) {
    if (err) {
        console.log('wow, something is seriously wrong!');
        // do something about it...
        return;
    }
    if (result.nMatched < 1000) { // not all got updated
        console.log('oops, seems like someone updated at least one doc before me');
        // But which of the 1000 got updated OK and which had not!!!!
        return;
    }
    // Great! - All 1000 documents got updated, continue as usual...
});

I was unable to find a Mongo solution for that.
The solution I used was to revert to per-document operations when the bulk operation failed (sketched below). This gives reasonable performance in most cases.
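For reference, a minimal sketch of that per-document fallback, assuming a helper that takes the collection and an array of {_id, version, stuf} updates (the names updateEachSeparately, updates and done are illustrative, not from the original code):

function updateEachSeparately(col, updates, done) {
    var stale = [];                       // ids whose version had already changed
    var pending = updates.length;
    var failed = false;
    updates.forEach(function(u) {
        col.update(
            {_id: u._id, version: u.version},
            {$set: {stuf: u.stuf, version: (u.version + 1)}},
            function(err, res) {
                if (failed) return;
                if (err) { failed = true; return done(err); }
                if (!res || !res.result || !res.result.nModified) {
                    stale.push(u._id);    // someone updated this doc before me
                }
                if (--pending === 0) done(null, stale); // report which ids to retry
            });
    });
}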

Related

How to make two db query synchronous, so that if any of them fails then both fails in node.js

async.parallel([
    function(callback) {
        con.Attandance.insert({'xxx': 'a'}, function(err, data) {
            console.log(data);
            callback();
        });
    },
    function(callback) {
        console.log(data);
        con.Profile.insert({'xxx': 'a'}, function(err) { callback(); });
    }
], function(err) {
    console.log('Both a and b are saved now');
});
Attendance.insert() runs regardless of whether Profile.insert() succeeds or fails. I want that if either of them fails, the data is not saved in either collection, neither in Attendance nor in Profile.
What you mean are transactions, which have nothing to do with synchronous / asynchronous.
Unfortunately, MongoDB simply does not support transactions. The only way to achieve something even remotely close is to perform either a two-phase commit, or to implement custom rollback logic that undoes all changes to Attandance if the changes to Profile failed.
The only way to at least get atomic (though still not transactional!) updates is to change your model. If the Profile is a container for all Attandance instances, you can update the entire object at once. It is impossible to update more than one document atomically with MongoDB, and neither is it possible to achieve a strict ordering of transactions.
If you need that, go for an SQL database instead; pretty much all of them support transactions.
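As a rough illustration of the custom-rollback idea (this is not a real transaction, just best-effort compensation; the collection handles follow the question's code, and generating the _id up front is an assumption made for the example):

var ObjectID = require('mongodb').ObjectID;

var attId = new ObjectID();   // pick the _id up front so the insert can be undone
con.Attandance.insert({_id: attId, 'xxx': 'a'}, function(err) {
    if (err) return console.log('Attandance insert failed, nothing to undo', err);
    con.Profile.insert({'xxx': 'a'}, function(err) {
        if (!err) return console.log('Both a and b are saved now');
        // "Rollback": best-effort removal of the document inserted above.
        con.Attandance.remove({_id: attId}, function(rollbackErr) {
            if (rollbackErr) console.log('Rollback failed, manual cleanup needed', rollbackErr);
        });
    });
});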
I wrote a library that implements the two phase commit system (mentioned in a prior answer) described in the docs. It might help in this scenario. Fawn - Transactions for MongoDB.
var Fawn = require("Fawn");

// initialize Fawn
Fawn.init("mongodb://127.0.0.1:27017/testDB");

/**
  optionally, you could initialize Fawn with mongoose

  var mongoose = require("mongoose");
  mongoose.connect("mongodb://127.0.0.1:27017/testDB");
  Fawn.init(mongoose);
**/

// after initialization, create a task
var task = Fawn.Task();

task.save("Attendance", {xxx: "a"})
    .save("Profile", {xxx: "a"})
    .run()
    .then(function(results) {
        // task is complete

        // result from first operation
        var firstUpdateResult = results[0];

        // result from second operation
        var secondUpdateResult = results[1];
    })
    .catch(function(err) {
        // Everything has been rolled back.
        // log the error which caused the failure
        console.log(err);
    });

NodeJS writes to MongoDB only once

I have a NodeJS app that is supposed to generate a lot of data sets in a synchronous manner (multiple nested for-loops). Those data sets are supposed to be saved to my MongoDB database so they can be looked up more efficiently later on.
I use the mongodb driver for NodeJS and have a daemon running. The connection to the DB is working fine, and according to the daemon window the first group of datasets is stored successfully. Every ~400-600ms there is another group to store, but after the first dataset there is no output in the MongoDB console anymore (not even an error), and since the file sizes don't increase I assume those write operations don't work (I can't wait for it to finish, as it would take multiple days to fully run).
If I restart the NodeJS script it won't even save the first key anymore, possibly because of duplicates? If I delete the db folder's contents, the first one is saved again.
This is the essential part of my script, and I wasn't able to find anything I did wrong. I assume the problem lies more in the inner logic (weird duplicate checks / things not running concurrently, etc.).
var MongoClient = require('mongodb').MongoClient, dbBuffer = [];

MongoClient.connect('mongodb://127.0.0.1/loremipsum', function(err, db) {
    if (err) return console.log("Cant connect to MongoDB");
    var collection = db.collection('ipsum');
    console.log("Connected to DB");

    for (var q = startI; q < endI; q++) {
        for (var w = 0; w < words.length; w++) {
            dbBuffer.push({a: a, b: b});
        }
        if (dbBuffer.length) {
            console.log("saving " + dbBuffer.length + " items");
            collection.insert(dbBuffer, {w: 1}, function(err, result) {
                if (err) {
                    console.log("Error on db write", err);
                    db.close();
                    process.exit();
                }
            });
        }
        dbBuffer = [];
    }
    db.close();
});
Update
db.close() is never called and the connection doesn't drop.
Changing to a bulk insert doesn't change anything.
The callback for the insert is never called - this could be the problem! The MongoDB console does tell me that the insert process was successful, but it looks like the communication between the driver and MongoDB isn't working properly for insertion.
I "solved" it myself. One misconception I had was that every insert is confirmed in the MongoDB console, while it actually only confirms the first one, or inserts with some time between the commands. To check whether the insert process really works, one needs to run the script for some time and wait for MongoDB to dump the data to the local files (approx. 30-60s).
In addition, the insert operations followed each other too quickly, and MongoDB appears not to handle this correctly under Win10 x64. I changed from the array buffer to the driver's internal buffer (see comments) and only continued with the process after the previous data had been inserted.
This is the simplified resulting code
db.collection('seedlist', function(err, collection) {
    syncLoop(0, 0, collection);
    //...
});

function syncLoop(q, w, collection) {
    var batch = collection.initializeUnorderedBulkOp({useLegacyOps: true});
    for (var e = 0; e < words.length; e++) {
        batch.insert({a: a, b: b});
    }
    batch.execute(function(err, result) {
        if (err) throw err;
        //...
        return setTimeout(function() {
            syncLoop(qNew, wNew, collection);
        }, 0); // Timer to prevent memory leak
    });
}

Node.js oracle driver - multiple updates

So I've set up a Node.js backend that is to be used to move physical items in our warehouse. The database hosting our software is Oracle, and the older version of this web application is written in PHP, which works fine but has some weird glitches and is slow as all hell.
The node.js backend works fine for moving single items, but once I try moving a box (which will then move anything from 20-100 items), the entire backend stops at the .commit() part.
Anyone have any clue as to why this happens, and what I can do to remedy it? Suggestions for troubleshooting would be most welcome as well!
Code:
function move(barcode, location) {
    var p = new Promise(function(resolve, reject) {
        console.log("Started");
        exports.findOwner(barcode).then(function(data) {
            console.log("Got data");
            // console.log(barcode);
            var type = data[0];
            var info = data[1];
            var sql;
            sql = "update pitems set location = '"+location+"' where barcode = '"+barcode+"' and status = 0"; // status = 0 is goods in store.
            ora.norway.getConnection(function(e, conn) {
                if (e) {
                    reject({"status": 0, "msg": "Failed to get connection", "error": e});
                }
                else {
                    console.log("Got connection");
                    conn.execute(sql, [], {}, function(err, results) {
                        console.log("Executed");
                        if (err) {
                            conn.release();
                            reject({"status": 0, "msg": "Failed to execute sql" + sql, "error": err});
                        }
                        else {
                            console.log("Execute was successful"); // This is the last message logged to the console.
                            conn.commit(function(e) {
                                conn.release(function(err) {
                                    console.log("Failed to release");
                                });
                                if (e) {
                                    console.log("Failed to commit!");
                                    reject({"status": 0, "msg": "Failed to commit sql" + sql, "error": e});
                                }
                                else {
                                    console.log("derp6");
                                    resolve({"status": 1, "msg": "Relocated " + results.rowsAffected + " items."});
                                }
                            });
                        }
                    });
                }
            });
        });
    });
    return p;
}
Please be aware that your code is open to SQL injection vulnerabilities. Even more so since you posted it online. ;)
I recommend updating your statement to something like this:
update pitems
set location = :location
where barcode = :barcode
and status = 0
Then update your conn.execute as follows:
conn.execute(
    sql,
    {
        location: location,
        barcode: barcode
    },
    {},
    function(err, results) {...}
);
Oracle automatically escapes bind variables, and there is another advantage: you will avoid hard parses when the values of the bind variables change.
Also, I'm happy to explore the commit issue you're encountering further, but it would really help if you could provide a reproducible test case that I could run on my end.
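For completeness, here is a rough sketch of how the bind-variable version could slot into the question's getConnection flow, using node-oracledb's autoCommit execute option so no separate conn.commit() call is needed (resolve, reject, location and barcode come from the question's move() promise; this is an illustrative sketch, not the questioner's actual code):

ora.norway.getConnection(function(e, conn) {
    if (e) {
        return reject({"status": 0, "msg": "Failed to get connection", "error": e});
    }
    conn.execute(
        "update pitems set location = :location where barcode = :barcode and status = 0",
        { location: location, barcode: barcode },
        { autoCommit: true },             // commit happens as part of execute
        function(err, results) {
            conn.release(function(relErr) {
                if (relErr) console.log("Failed to release", relErr);
            });
            if (err) {
                return reject({"status": 0, "msg": "Failed to execute update", "error": err});
            }
            resolve({"status": 1, "msg": "Relocated " + results.rowsAffected + " items."});
        });
});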
I think this is an issue at the database level; an update of multiple items without providing an ID is maybe not allowed.
You should do two things:
1) For debugging purposes, add console.log(JSON.stringify(error)) wherever you expect an error. Then you'll see the error that your database reports back.
2) at the line that says
conn.release(function(err) {
    console.log("Failed to release");
})
Check if err is defined:
conn.release(function(err) {
    if (err) {
        console.log("Failed to release");
    }
    else {
        console.log("conn released");
    }
})
This sounds similar to an issue I'm having. Node.js hangs while updating an Oracle DB using the oracledb library. When there are 167 updates to make, it works fine; the program hangs when I have 168 updates. The structure of the program goes like this:
With 168 records read from a local SQLite DB, for each record returned in the SQLite callback: 1) get an Oracle connection; 2) do two updates to two tables (one update to each table, with autoCommit on the latter execute). All of the first updates complete, but none can start the second update; they just hang there. With 167 records, they run to completion.
The strange thing is that none of the 168 could get started on the second update (even though they all finished the first), which would have let some of them go forward to the commit. It looks like they are all queued up in some way.

How can I cancel MongoDB query from .each callback

I implemented a little NodeJS web server that stores log entries and provides a backend for a web-based log browser. The web interface also provides an "Export to CSV" function and lets the user download the logs in CSV format. My code looks similar to this:
this.log_entries(function(err, collection) {
    collection.find(query)
        .sort({_id: 1})
        .each(function(err, doc) {
            if (doc) {
                WriteLineToCSVFile(doc);
            }
            else {
                ZipCSVFileAndSendIt();
            }
        });
});
The export operation may take a significant amount of time and disk space in case a user didn't specify the right filters for the query. I need to implement a fail-safe mechanism to prevent this. One important requirement is that the user should be able to abort the ongoing export operation at any point in time. Currently my solution is that I stop writing the data to the CSV file; however, the callback passed to .each() still gets called. I could not find any information on how to stop the each loop. So the question is: how can I do this?
UPDATE, THE ANSWER:
Use cursor.nextObject().
For the correct answer see the comments by @dbra below: db.currentOp() and db.killOp() don't work for this case.
The final solution looks like this:
this.log_entries(function(err, collection) {
    var cursor = collection.find(query);
    cursor.sort("_id", 1, function(err, sorted) {
        function exportFinished(aborted) {
            ...
        }
        function processItem(err, doc) {
            if (doc === null) {
                exportFinished(false);
            }
            else if (abortCurrentExport) {
                exportFinished(true);
            }
            else {
                var line = formatCSV(doc);
                WriteFile(line);
                process.nextTick(function() {
                    sorted.nextObject(processItem);
                });
            }
        }
        sorted.nextObject(processItem);
    });
});
Note the usage of process.nextTick - without it there will be a stack overflow!
You could look up the running query with db.currentOp and then kill it with db.killOp, but it would be a nasty solution.
A better way would be to work in limited, subsequent batches; the easiest approach is simple pagination with "limit" and "skip", but it depends on how your collection changes while you read it.
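A minimal sketch of that batched export, assuming the question's collection, query, WriteLineToCSVFile and abortCurrentExport, plus the exportFinished helper from the accepted solution (the page size and the exportPage name are illustrative):

var PAGE_SIZE = 1000;   // assumed batch size

function exportPage(skip) {
    if (abortCurrentExport) return exportFinished(true);   // user cancelled
    collection.find(query)
        .sort({_id: 1})
        .skip(skip)
        .limit(PAGE_SIZE)
        .toArray(function(err, docs) {
            if (err) return exportFinished(true);           // treat errors as an aborted export
            docs.forEach(function(doc) {
                WriteLineToCSVFile(doc);
            });
            if (docs.length < PAGE_SIZE) return exportFinished(false); // no more data
            exportPage(skip + PAGE_SIZE);                               // next batch
        });
}

exportPage(0);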

MongoDB NodeJS driver, how to know when .update() 's are complete

As the code is too large to post in here, I am appending my GitHub repo: https://github.com/DiegoGallegos4/Mongo
I am trying to use the NodeJS driver to update some records that fulfil one criterion, but first I have to find some records fulfilling another criterion. In the update part, the records found and filtered by the find operation are used. That is,
file: weather1.js
MongoClient.connect(some url, function(err, db) {
    db.collection(collection_name).find({}, {}, sort criteria).toArray() {
        .... find the data and append to an array
        .... this data inside a for loop
        db.collection(collection_name).update(data[i], {$set...}, callback)
    }
})
That's the structure used to solve the problem. As for when to close the connection: it is when the length of the data array equals the number of callbacks from the update operation. For more details you can refer to the repo.
file: weather.js
In the other approach, instead of toArray, .each is used to iterate over the cursor.
I've been looking for a solution to this for a week now on several forums.
I've read about pooling connections, but I want to know what the conceptual error in my code is. I would appreciate deeper insight into this topic.
The way you pose your question is very misleading. All you want to know is "When is the processing complete so I can close?".
The answer to that is that you need to respect the callbacks and generally only move through the cursor of results once each update is complete.
The simple way, without other dependencies, is to use the stream interface supported by the driver:
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/data', function(err, db) {
    if (err) throw err;
    var coll = db.collection('weather');
    console.log('connection established');

    var stream = coll.find().sort([['State', 1], ['Temperature', -1]]);

    stream.on('error', function(err) {
        throw err;
    });

    stream.on('end', function() {
        db.close();
    });

    var month_highs = [];
    var state = '';
    var length = 0;

    stream.on('data', function(doc) {
        stream.pause();                 // pause processing documents
        if (doc) {
            length = month_highs.length;
            if (state != doc['State']) {
                month_highs.push(doc['State']);
                //console.log(doc);
            }
            state = doc['State'];
            if (month_highs.length > length) {
                coll.update(doc, {$set: {'month_high': true}}, function(err, updated) {
                    if (err) throw err;
                    console.log(updated);
                    stream.resume();    // resume processing documents
                });
            } else {
                stream.resume();
            }
        } else {
            stream.resume();
        }
    });
});
That's just a copy of the code from your repo, refactored to use a stream. So all the important parts are where the word "stream" appears, and most importantly where its methods are being called.
In a nutshell, the "data" event is emitted for each document from the cursor results. First you call .pause() so new documents do not overrun the processing. Then you do your .update(), and within its callback, on return, you call .resume(), and the flow continues with the next document.
Eventually "end" is emitted when the cursor is depleted, and that is where you call db.close().
That is basic flow control. For other approaches, look at the node async library as a good helper (sketched below). But do not loop over arrays with no async control, and do not use .each(), which is DEPRECATED.
You need to signal when the .update() callback is complete in order to move on to a new "loop iteration" in any case. This is the basic, no-additional-dependency approach.
P.S. I am a bit suspicious of the general logic of your code, especially the test of whether the length of something is greater than when you read it, without anything that could change that length. But this is all about how to implement "flow control", not about fixing the logic in your code.
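For illustration, here is a rough sketch of what the async-library alternative mentioned above could look like: async.eachSeries waits for each update's callback before moving to the next document, which is the same flow control the stream version implements by hand (the per-document month-high logic is omitted here for brevity):

var async = require('async');

coll.find().sort([['State', 1], ['Temperature', -1]]).toArray(function(err, docs) {
    if (err) throw err;
    async.eachSeries(docs, function(doc, next) {
        // decide here whether this doc needs the flag, then update and
        // signal completion via next() so the series moves on
        coll.update(doc, {$set: {'month_high': true}}, function(updateErr) {
            next(updateErr);
        });
    }, function(finalErr) {
        if (finalErr) throw finalErr;
        db.close();   // everything finished, safe to close
    });
});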
