mongodb mapReduce for sailsjs - node.js

I am trying to sort two collections based on creation time. I searched through Stack Overflow and found a solution described at http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/: you mapReduce the collections into another collection named joined and later sort by a given parameter (creation time in my case).
The problem now is how to mapReduce in Sails.js. I have already referred to http://sailsjs.org/documentation/reference/waterline-orm/models/native, which has this code:
Pet.native(function(err, collection) {
  if (err) return res.serverError(err);

  collection.find({}, {
    name: true
  }).toArray(function (err, results) {
    if (err) return res.serverError(err);
    return res.ok(results);
  });
});
The collection parameter in the callback carries methods such as find, create and so on. Does the collection parameter have a mapReduce function? It was undefined in my case.
Is there any way to mapReduce in Sails.js apart from opening a raw connection and using the connection defined in config/connection.js (the default connection)?
Or better, is there any way to sort two collections in MongoDB based on a particular attribute?
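For reference, this is roughly the call I would expect to work through .native() if the driver's mapReduce were exposed; the map/reduce bodies and the joined output collection name are only placeholders following the linked article:
Pet.native(function(err, collection) {
  if (err) return res.serverError(err);

  var map = function() {
    emit(this._id, { createdAt: this.createdAt });
  };
  var reduce = function(key, values) {
    return values[0];
  };

  // The official MongoDB Node.js driver exposes collection.mapReduce(map, reduce, options, callback)
  collection.mapReduce(map, reduce, { out: { reduce: 'joined' } }, function(err, joined) {
    if (err) return res.serverError(err);
    // 'joined' is the output collection, which can then be sorted by creation time
    joined.find().sort({ 'value.createdAt': -1 }).toArray(function(err, results) {
      if (err) return res.serverError(err);
      return res.ok(results);
    });
  });
});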
Sails v0.11.5, node v0.4.4.2
Thanks.

Related

How is insertMany different to collection.insert in Mongoose?

I scouted around to find the right solution for inserting a large number of documents into MongoDB using Mongoose.
My current solution looks like this:
MongoClient.saveData = function(RecordModel, data, priority, SCHID, callback) {
  var dataParsed = parseDataToFitSchema(data, priority, SCHID);
  console.log("Model created. Inserting in batches.");

  RecordModel.insertMany(dataParsed)
    .then(function(mongooseDocuments) {
      console.log("Insertion was successful.");
    })
    .catch(function(err) {
      callback("Error while inserting data to DB: " + err);
      return;
    })
    .done(function() {
      callback(null);
      return;
    });
}
But it appears to me there are other solutions offered out there, like this one:
http://www.unknownerror.org/opensource/Automattic/mongoose/q/stackoverflow/16726330/mongoose-mongodb-batch-insert
It uses collection.insert. How is that different from Model.insertMany?
The same goes for update. My previous question, What is the right approach to update many records in MongoDB using Mongoose, asks how to update a big chunk of data with Mongoose, identified by _id. The answer suggests using collection.bulkWrite, while I am under the impression Model.insertMany can do it too.
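As far as I understand (and this is part of what I am asking): Model.insertMany goes through Mongoose, so the documents are cast and validated against the schema before one batched insert is sent, while Model.collection.insert calls the driver directly and skips Mongoose casting, validation and defaults. Roughly:
// Through Mongoose: schema casting/validation, resolves with Mongoose documents.
RecordModel.insertMany(dataParsed)
  .then(function(docs) { console.log("Inserted", docs.length, "documents"); })
  .catch(function(err) { console.error(err); });

// Straight to the driver: no Mongoose casting, validation or defaults applied.
RecordModel.collection.insert(dataParsed, function(err, result) {
  if (err) return console.error(err);
  console.log("Raw driver insert finished");
});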

Loopback query which compares field values

Say I have the following schema:
Product: {
  Quantity: Number,
  SelledQuantity: Number
}
Would it be possible to write a query that returns only the results where Quantity = SelledQuantity?
If so, is there a way to use it when doing a populate? (Perhaps inside the match field in the opts object?)
I am using the MySQL connector.
Yes, as I understand your problem, you can do this with the following REST call:
http://localhost:3000/api/products?filter[where][SelledQuantity]=n
This will give you the desired results.
This question is more related to the MySQL query layer, but you can achieve it in JavaScript as follows:
Product.find({}, function(err, products) {
  if (err) throw err;

  // products is expected to be an array of product instances
  var filteredProducts = products.filter(function(p1) {
    return p1.Quantity === p1.SelledQuantity;
  });

  // Your desired output
  console.log(filteredProducts);
});
This will be slow, but it will work for a small database. For a more optimized answer, ask the question in a MySQL context with your database and table structure.

Mongoose Find and Remove

I'm trying to delete multiple documents that satisfy a query. However, I need the data from those documents in order to store it in a separate collection for undo functionality. The only way I got this to work is with multiple nested queries:
Data.find(query).exec(function(err, data) {
  Data.remove(query).exec(function(err2) {
    ActionCtrl.saveRemove(data);
  });
});
Is there a better way? In the post "How do I remove documents using Node.js Mongoose?" it was suggested to use find().remove().exec():
Data.find(query).remove().exec(function(err, data) {
  ActionCtrl.saveRemove(data);
});
However, data is usually 1; don't ask me why. Can I do this without endlessly nesting my queries? Thanks!
As you have noted, using the following will not return the document:
Data.find(query).remove().exec(function(err, data) {
  // data will equal the number of docs removed, not the document itself
});
As such, you can't save the document in ActionCtrl using this approach.
You can achieve the same result using your original approach, or use some form of iteration. A control flow library like async might come in handy to handle the async calls. It won't reduce your code, but will reduce the queries. See example:
var async = require('async');

Data.find(query, function(err, data) {
  async.each(data, function(dataItem, callback) {
    dataItem.remove(function(err, result) {
      ActionCtrl.saveRemove(result, callback);
    });
  });
});
This answer assumes that the ActionCtrl.saveRemove() implementation can take an individual doc as a parameter and can execute the callback from the async.each loop. async.each requires its callback to be run (without arguments, unless there is an error) at the end of each iteration, so you would ideally call it at the end of .saveRemove().
Note that the remove method on an individual document will actually return the document that has been removed.
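For illustration only, a saveRemove along these lines would fit the loop above; UndoData here is an assumed model for the undo collection and is not part of the original question:
// Hypothetical sketch: persist the removed doc, then signal async.each.
ActionCtrl.saveRemove = function (removedDoc, done) {
  UndoData.create(removedDoc.toObject(), function (err) {
    // Call the async.each callback once per item, passing any error.
    return done(err);
  });
};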

Why is Model.save() not working in Sails.js?

save() is giving me an error like "Object has no method 'save'".
Country.update({id: req.param('country_id')}, model).exec(function(err, cntry) {
  if (err) return res.json(err);
  if (!cntry.image) {
    cntry.image = 'images/countries/' + filename;
    cntry.save(function(err) { console.log(err); });
  }
});
Any idea how to save the model within an update query?
Assuming you're using Waterline and sails-mongo, the issue here is that update returns an array (because you can update multiple records at once), and you're treating it like a single record. Try:
Country.update({id: req.param('country_id')}, model).exec(function(err, cntry) {
  if (err) return res.json(err);
  if (cntry.length === 0) { return res.notFound(); }
  if (!cntry[0].image) {
    cntry[0].image = 'images/countries/' + filename;
    cntry[0].save(function(err) { console.log(err); });
  }
});
This seems to me an odd bit of code, though; why not just check for the presence of image in model before doing Country.update and alter model (or a copy thereof) accordingly? That would save you an extra database call.
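For example, a sketch of that idea, using the same names as above and assuming model is the plain object you are about to send to update:
// Sketch: set the fallback image on the data before the update,
// so the extra save() call is not needed.
if (!model.image) {
  model.image = 'images/countries/' + filename;
}
Country.update({id: req.param('country_id')}, model).exec(function(err, cntry) {
  if (err) return res.json(err);
  if (cntry.length === 0) { return res.notFound(); }
  return res.json(cntry[0]);
});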
When using Mongoose (3.8) to update the database directly, the callback function receives three parameters, none of which is a Mongoose object of the defined model. The parameters are:
err is the error if any occurred
numberAffected is the count of updated documents Mongo reported
rawResponse is the full response from Mongo
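So a direct update call looks roughly like this (a sketch; newValue is the placeholder used below):
// Mongoose 3.8: none of these callback arguments is a Country document,
// so there is nothing to call .save() on here.
Country.update({id: req.param('country_id')}, {image: newValue}, function (err, numberAffected, rawResponse) {
  console.log(err, numberAffected, rawResponse);
});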
The right way is to first fetch and then change the data:
Country.findOne({id: req.param('country_id')}, function (err, country) {
  // do changes
})
Or using the update method, the way you intended:
Country.update({id: req.param('country_id'), image: {$exists: false}}, {image: newValue}, callback)

How to do a massive random update with MongoDB / NodeJS

I have a MongoDB collection with more than 1,000,000 documents, and I would like to update each document one by one with dedicated information (each doc has information coming from another collection).
Currently I'm using a cursor that fetches all the data from the collection, and I update each record through the async module of Node.js.
Fetch all docs:
inst.db.collection(association.collection, function(err, collection) {
  collection.find({}, {}, function(err, cursor) {
    cursor.toArray(function(err, items) {
      ......
    });
  });
});
Update each doc:
items.forEach(function(item) {
  // *** do some stuff with item, add field etc.
  tasks.push(function(nextTask) {
    inst.db.collection(association.collection, function(err, collection) {
      if (err) callback(err, null);
      collection.save(item, nextTask);
    });
  });
});
call the "save" task in parallel
async.parallel(tasks, function(err, results) {
  callback(err, results);
});
How would you do this type of operation in a more efficient way? I mean, how do I avoid the initial "find" that loads everything through a cursor? Is there a way to operate doc by doc, knowing that all docs should be updated?
Thanks for your support.
Your question inspired me to create a Gist to do some performance testing of different approaches to your problem.
Here are the results running on a small EC2 instance with MongoDB at localhost. The test scenario is to uniquely operate on every document of a 100,000-element collection.
108.661 seconds -- Uses find().toArray to pull in all the items at once then replaces the documents with individual "save" calls.
99.645 seconds -- Uses find().toArray to pull in all the items at once then updates the documents with individual "update" calls.
74.553 seconds -- Iterates on the cursor (find().each) with batchSize = 10, then uses individual update calls.
58.673 seconds -- Iterates on the cursor (find().each) with batchSize = 10000, then uses individual update calls.
4.727 seconds -- Iterates on the cursor with batchSize = 10000, and does inserts into a new collection 10000 items at a time.
Though not included, I also did a test with MapReduce used as a server-side filter, which ran at about 19 seconds. I would have liked to have similarly used "aggregate" as a server-side filter, but it doesn't yet have an option to output to a collection.
The bottom line answer is that if you can get away with it, the fastest option is to pull items from an initial collection via a cursor, update them locally and insert them into a new collection in big chunks. Then you can swap in the new collection for the old.
If you need to keep the database active, then the best option is to use a cursor with a big batchSize and update the documents in place. The "save" call is slower than "update" because it needs to replace the whole document, and probably needs to reindex it as well.
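For reference, here is a rough sketch of that fastest variant on top of the driver calls already used above; the target collection name and CHUNK size are illustrative, not part of the original code:
// Sketch: stream the source collection with a large batchSize, transform each
// doc, and insert into a new collection in chunks.
var CHUNK = 10000;
var buffer = [];

inst.db.collection(association.collection, function(err, source) {
  if (err) return callback(err);
  // Assumed name for the new collection that will later replace the old one.
  inst.db.collection(association.collection + '_new', function(err, target) {
    if (err) return callback(err);

    var cursor = source.find({});
    cursor.batchSize(10000);
    cursor.each(function(err, item) {
      if (err) return callback(err);
      if (item === null) {
        // End of cursor: flush whatever is left, then swap collections.
        if (buffer.length) return target.insert(buffer, callback);
        return callback(null);
      }
      // *** do some stuff with item, add field etc.
      buffer.push(item);
      if (buffer.length >= CHUNK) {
        target.insert(buffer.splice(0, buffer.length), function(err) {
          if (err) return callback(err);
        });
      }
    });
  });
});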
