My db contains projects and phases. Projects can have multiple phases. The models are similar to these:
Phase:
var phaseSchema = new mongoose.Schema({
project: { type: mongoose.Schema.Types.ObjectId, ref: 'Project' }
});
Project:
var projectSchema = new mongoose.Schema({
name : { type: String }
});
Currently I'm using the following approach to retrieve the phases for each project:
var calls = [];
_.each(projects, function (p) {
calls.push(function (callback) {
req.app.db.models.Phase.find({ project: p._id }, function (err, doc) {
if (err) {
callback(err);
} else {
p.phases = doc;
callback();
}
});
})
});
async.parallel(calls, function (err) {
workflow.outcome.projects = projects;
return workflow.emit('response');
});
As you can see, I'm not passing anything to callback(); I'm just (ab)using async's parallel to hold back the response until the lookups finish.
Alternatively, I could pass the phase objects to the callback, but then in the final parallel callback I would have to iterate over the phases and over the projects to find the matching project for each phase.
Am I falling into a common pitfall with this design, and would it for some reason be better to iterate over the projects and the phases again, or should I take a completely different approach?
I actually think in this case you would be better off running one query to match all the potential results. For the phase query you would pass all the _id values in an $in clause, then match the results against your source array to assign the matched document(s):
Matching all at once
// Make a hash from the source for ease of matching
var pHash = {};
_.each(projects, function (p) {
    pHash[p._id.toString()] = p;
});

// Run the find with $in
req.app.db.models.Phase.find({ "project": { "$in": _.keys(pHash) } }, function (err, response) {
    if (err) {
        // handle or pass on the error here
        return;
    }
    _.each(response, function (r) {
        var key = r.project.toString();
        // Assign phases array if not already there
        if (!pHash[key].hasOwnProperty("phases")) {
            pHash[key].phases = [];
        }
        // Append to array of phases
        pHash[key].phases.push(r);
    });
    // Now return the altered hash as the original array
    // (_.values, because _.mapObject would return an object, not an array)
    projects = _.values(pHash);
});
Also, as you say, "projects can have multiple phases", so the logic appends to an "array" rather than assigning a single value.
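The matching step can be sketched without a database. This is a minimal sketch, assuming `projects` and `phases` are plain objects standing in for the query results; `attachPhases` and `phaseQueryFor` are hypothetical helper names, and a `Map` stands in for the pHash object. Unlike the code above, it gives every project an empty `phases` array up front, so projects with no phases still end up with `[]`.

```javascript
// Group phases under their owning project with a single lookup table.
// Ids are compared as strings, like the pHash keys above.
function attachPhases(projects, phases) {
  const byId = new Map();
  for (const p of projects) {
    p.phases = [];                        // projects without phases keep an empty array
    byId.set(String(p._id), p);
  }
  for (const phase of phases) {
    const owner = byId.get(String(phase.project));
    if (owner) owner.phases.push(phase);  // ignore phases whose project was not requested
  }
  return projects;                        // same array, same order as the input
}

// The $in filter you would send to the database is built from the same ids.
function phaseQueryFor(projects) {
  return { project: { $in: projects.map((p) => String(p._id)) } };
}

const sampleProjects = [{ _id: "p1", name: "Alpha" }, { _id: "p2", name: "Beta" }];
attachPhases(sampleProjects, [{ _id: "ph1", project: "p1" }, { _id: "ph2", project: "p1" }]);
console.log(sampleProjects.map((p) => p.phases.length)); // [ 2, 0 ]
```

With Mongoose, the string ids in the `$in` list are cast back to ObjectIds because the `project` path is declared as an ObjectId, which is what the `_.keys(pHash)` call above relies on.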
More efficient $lookup
On the other hand, if you have MongoDB 3.2 available, then the $lookup aggregation pipeline operator seems to be for you. In this case you would just be working with the Project model, but doing the $lookup on the "phases" collection (the collection name Mongoose uses by default for the Phase model). "Collection" is the operative term here: $lookup is a server-side operation, so it only knows about collections and not about the application "models":
// BTW all models are permanently registered with mongoose
mongoose.model("Project").aggregate(
[
// Whatever your match conditions were for getting the project list
{ "$match": { .. } },
// This actually does the "join" (but really a "lookup")
{ "$lookup": {
"from": "phases",
"localField": "_id",
"foreignField": "project",
"as": "phases"
}}
],function(err,projects) {
// Now all projects have an array containing any matched phase
// or an empty array. Just like a "left join"
}
);
That would be the most efficient way to handle this since all the work is done on the server.
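To make the "left join" behaviour concrete, here is a hedged client-side emulation of a simple $lookup stage. `emulateLookup` is a hypothetical helper; it only handles plain equality on present values and ignores $lookup's rules for null/missing fields and array-valued fields, so treat it as an illustration of the output shape rather than a replacement for the server stage.

```javascript
// Client-side emulation of a basic equality $lookup.
// Every local document gets an `as` array, empty when nothing matches.
function emulateLookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  // Index the foreign documents by the join key once.
  const groups = new Map();
  for (const f of foreignDocs) {
    const key = String(f[foreignField]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(f);
  }
  // Copy each local document and attach its matches (or []).
  return localDocs.map((doc) => ({
    ...doc,
    [as]: groups.get(String(doc[localField])) || [],
  }));
}

const joined = emulateLookup(
  [{ _id: "a", name: "Alpha" }, { _id: "b", name: "Beta" }],
  [{ _id: 1, project: "a" }],
  { localField: "_id", foreignField: "project", as: "phases" }
);
console.log(joined.map((p) => p.phases.length)); // [ 1, 0 ]
```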
So what you seem to be asking here is basically the "reverse case" of .populate(): instead of the "project" holding references to its "phases", each "phase" holds a reference to its project.
In that case, either form of "lookup" should be what you are looking for: either emulate the join with the $in query and a "mapping" stage, or use the aggregation framework's $lookup operator directly.
Either way, this reduces the server contact to a single operation, whereas your current approach issues one query per project, which creates a lot of requests and eats up a fair amount of resources. There is also no need to "wait for all responses". I'd wager that both are much faster as well.
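The difference in server contact can be counted with a hypothetical in-memory stand-in for the Phase model. `makeFakePhaseModel` is not a Mongoose API; its `find` only understands `{ project: id }` and `{ project: { $in: [...] } }`, and it calls back synchronously for simplicity, where a real driver would call back asynchronously.

```javascript
// Hypothetical stand-in for Phase.find(query, callback) that counts queries.
function makeFakePhaseModel(phases) {
  const model = {
    queries: 0,
    find(query, callback) {
      model.queries += 1;                    // one "round trip" per call
      const cond = query.project;
      const ids = cond && cond.$in ? cond.$in.map(String) : [String(cond)];
      const result = phases.filter((ph) => ids.includes(String(ph.project)));
      callback(null, result);                // synchronous here, async in a real driver
    },
  };
  return model;
}

const demoProjects = [{ _id: "a" }, { _id: "b" }, { _id: "c" }];

// The per-project approach from the question: one query each.
const perProject = makeFakePhaseModel([]);
demoProjects.forEach((p) => perProject.find({ project: p._id }, () => {}));

// The $in approach: one query for all of them.
const batched = makeFakePhaseModel([]);
batched.find({ project: { $in: demoProjects.map((p) => p._id) } }, () => {});

console.log(perProject.queries, batched.queries); // 3 1
```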
Related
I have a collection of about 1M documents. Each document has internalNumber property and I need to get all internalNumbers in my node.js code.
Previously I was using
db.docs.distinct("internalNumber")
or
collection.distinct('internalNumber', {}, {},(err, result) => { /* ... */ })
in Node.
But with the growth of the collection I started to get the error: distinct is too big, 16m cap.
Now I want to use aggregation. It consumes a lot of memory and it is slow, but that is OK since I only need to do it once at script startup. I've tried the following in the Robo 3T GUI tool:
db.docs.aggregate([{$group: {_id: '$internalNumber'} }]);
It works, and I wanted to use it in node.js code the following way:
collection.aggregate([{$group: {_id: '$internalNumber'} }],
(err, docs) => { /* ... */ });
But in Node I get an error: "MongoError: aggregation result exceeds maximum document size (16MB) at Function.MongoError.create".
Please help to overcome that limit.
The problem is that the native driver behaves differently from the shell by default: the shell's .aggregate() actually returns a "cursor" object, whereas the native driver (in the version you are using) needs that option set "explicitly".
Without a "cursor", .aggregate() returns all of its results as an array inside a single BSON document, which is subject to the 16MB document limit, so we turn it into a cursor to avoid the limitation:
let cursor = collection.aggregate(
[{ "$group": { "_id": "$internalNumber" } }],
{ "cursor": { "batchSize": 500 } }
);
cursor.toArray((err,docs) => {
// work with results
});
Then you can use regular methods like .toArray() to turn the results into a JavaScript array, which on the 'client' is not subject to the same 16MB limit, or use the other methods for iterating a "cursor".
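Conceptually, a cursor hands results over in batches rather than as one document. The sketch below only models that idea under stated assumptions: `fakeCursor` and `drainToArray` are hypothetical names, a generator plays the role of the server, and each yielded slice stands for one batch of `batchSize` results. It does not reproduce the driver's real network protocol; on the client, the assembled array is limited by process memory rather than by the 16MB BSON cap.

```javascript
// A "cursor" modelled as a generator that yields results in batches.
function* fakeCursor(results, batchSize) {
  for (let i = 0; i < results.length; i += batchSize) {
    yield results.slice(i, i + batchSize); // one batch per server response
  }
}

// Roughly what toArray() does on the client: keep pulling batches and append.
function drainToArray(cursor) {
  const all = [];
  for (const batch of cursor) {
    for (const doc of batch) all.push(doc);
  }
  return all;
}

const ids = Array.from({ length: 1234 }, (_, i) => i);
console.log([...fakeCursor(ids, 500)].map((b) => b.length)); // [ 500, 500, 234 ]
console.log(drainToArray(fakeCursor(ids, 500)).length);      // 1234
```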
For Casbah users:
val pipeline = ...
collection.aggregate(pipeline, AggregationOptions(batchSize = 500, outputMode = AggregationOptions.CURSOR))
I'm trying to implement a rating system and I'm struggling to only allow one rating per user in a reasonable way.
Simply put, I have an array of ratings in my schema, containing the "rater" and the rating, as such:
var schema = new Schema({
//...
ratings: [{
by: {
type: Schema.Types.ObjectId
},
rating: {
type: Number,
min: 1,
max: 5,
validate: ratingValidator
}
}],
//...
});
var Model = mongoose.model('Model', schema);
When I get a request, I want to add the user's rating to the array if the user has not already rated this document; otherwise I want to update the existing rating (you should not be able to give more than one rating).
One way to do this is to find the document, loop through the array of ratings, and search for the user. If the user already has a rating in the array, the rating is changed; otherwise a new rating is pushed. As such:
Model.findById(id)
.select('ratings')
.exec(function(err, doc) {
if(err) return next(err);
if(doc) {
var rated = false;
var ratings = doc.ratings;
for(var i = 0; i < ratings.length; i++) {
if(String(ratings[i].by) === String(user.id)) { // ObjectId vs string: compare as strings
ratings[i].rating = rating;
rated = true;
break;
}
}
if(!rated) {
ratings.push({
by: user.id,
rating: rating
});
}
doc.markModified('ratings');
doc.save();
} else {
//Not found
}
});
Is there an easier way? A way to let MongoDB do this automatically?
The MongoDB $addToSet operator could be an alternative; however, I have not managed to use it for this, since it could allow two ratings with different scores from the same user.
As you note, the $addToSet operator will not work in this case, as a userId with a different vote value would indeed be a different value and its own unique member of the set.
So the best way to do this is to actually issue two update statements with complementary logic. Only one will actually be applied depending on the state of the document:
async.series(
[
// Try to update a matching element
function(callback) {
Model.update(
{ "_id": id, "ratings.by": user.id },
{ "$set": { "ratings.$.rating": rating } },
callback
);
},
// Add the element where it does not exist
function(callback) {
Model.update(
{ "_id": id, "ratings.by": { "$ne": user.id } },
{ "$push": { "ratings": { "by": user.id, "rating": rating } }},
callback
);
}
],
function(err,result) {
// all done
}
);
The principle is simple: try to match a document whose ratings array already contains the userId, and update that entry. If that condition is not met, no document is updated. In the same way, try to match the document where no such userId is present in the ratings array; if there is a match, add the element, otherwise no update happens.
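The complementary-update logic can be checked on a plain object. In this sketch, which assumes user ids can be compared as strings, `setExistingRating` mimics the first update (match `ratings.by`, then `$set` through the positional `$`) and `pushNewRating` mimics the second (`$ne` plus `$push`). The names are hypothetical, and single-threaded JavaScript says nothing about the server's per-document atomicity, which is what makes the real version safe.

```javascript
// Mimics { "ratings.by": userId } + { $set: { "ratings.$.rating": rating } }.
// Returns the number of documents modified (0 or 1).
function setExistingRating(doc, userId, rating) {
  const entry = doc.ratings.find((r) => String(r.by) === String(userId)); // first match, like $
  if (!entry) return 0;
  entry.rating = rating;
  return 1;
}

// Mimics { "ratings.by": { $ne: userId } } + { $push: { ratings: {...} } }.
function pushNewRating(doc, userId, rating) {
  if (doc.ratings.some((r) => String(r.by) === String(userId))) return 0;
  doc.ratings.push({ by: userId, rating });
  return 1;
}

// Run both in series, as async.series does; exactly one of them applies.
function rate(doc, userId, rating) {
  return setExistingRating(doc, userId, rating) + pushNewRating(doc, userId, rating);
}

const ratedDoc = { ratings: [] };
rate(ratedDoc, "user1", 4); // no entry yet: the push applies
rate(ratedDoc, "user1", 2); // entry exists: the $set applies, the push is skipped
console.log(ratedDoc.ratings); // [ { by: 'user1', rating: 2 } ]
```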
This does bypass Mongoose's built-in schema validation, so you would have to apply your constraints manually (or inspect the schema validation rules and apply them yourself), but it is better than your current approach in one very important respect.
When you .find() the document and pull it back into your application to modify it in code, as you are doing, there is no guarantee that the document has not been changed on the server by another process or request. So when you issue .save(), the document on the server may no longer be in the state it was in when it was read, and your modifications can overwrite the changes made there.
Hence, while there are two operations to the server and not one (and your current code is two operations anyway), validating manually is the lesser of two evils compared with possibly causing a data inconsistency. The two-update approach respects any other updates being issued to the document at the same time.
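The lost-update risk can be reproduced with a hypothetical in-memory simulation. `simulateTwoRequests` is an illustrative name; it assumes each save writes the whole `ratings` array back (as a read-modify-write with `markModified('ratings')` would), and compares that with applying each change to the current stored state, the way `$set`/`$push` do on the server.

```javascript
function simulateTwoRequests(atomic) {
  const stored = { ratings: [] };                        // stand-in for the server's copy
  const read = () => JSON.parse(JSON.stringify(stored)); // each request gets its own snapshot

  if (atomic) {
    // Each change is applied to whatever is stored right now, like $push on the server.
    stored.ratings.push({ by: "userA", rating: 5 });
    stored.ratings.push({ by: "userB", rating: 3 });
  } else {
    // Both requests read before either one saves (an unlucky but possible interleaving).
    const copyA = read();
    const copyB = read();
    copyA.ratings.push({ by: "userA", rating: 5 });
    copyB.ratings.push({ by: "userB", rating: 3 });
    stored.ratings = copyA.ratings; // request A saves its whole array
    stored.ratings = copyB.ratings; // request B saves, silently discarding A's rating
  }
  return stored.ratings.map((r) => r.by);
}

console.log(simulateTwoRequests(false)); // [ 'userB' ]
console.log(simulateTwoRequests(true));  // [ 'userA', 'userB' ]
```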
I am working through a MEAN stack tutorial. It contains the following code as a route in index.js. The name of my Mongo collection is brandcollection.
/* GET Brand Complaints page. */
router.get('/brands', function(req, res) {
var db = req.db;
var collection = db.get('brandcollection');
collection.find({},{},function(e,docs){
res.render('brands', {
"brands" : docs
});
});
});
I would like to modify this code but I don't fully understand how the .find method is being invoked. Specifically, I have the following questions:
What objects are being passed to function(e, docs) as its arguments?
Is function(e, docs) part of the MongoDB syntax? I have looked at the docs on Mongo CRUD operations and couldn't find a reference to it. And it seems like the standard syntax for a Mongo .find operation is collection.find({},{}).someCursorLimit(). I have not seen a reference to a third parameter in the .find operation, so why is one allowed here?
If function(e, docs) is not a MongoDB operation, is it part of the Monk API?
It is clear from the tutorial that this block of code returns all of the documents in the collection and places them in an object as an attribute called "brands." However, what role specifically does function(e, docs) play in that process?
Any clarification would be much appreciated!
The first parameter is the query.
The second parameter (which is optional) is the projection, i.e. it restricts which fields of the matched documents are returned:
collection.find( { qty: { $gt: 25 } }, { item: 1, qty: 1 },function(e,docs){})
would return only the item and qty fields (plus _id, which is included by default) of the matched documents.
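A rough sketch of what the filter and inclusion projection do, assuming plain objects for documents. `applyInclusionProjection` is a hypothetical helper that handles only top-level field names (no dotted paths or projection operators); it follows MongoDB's rule that `_id` is returned unless the projection excludes it explicitly.

```javascript
// Keep _id (unless excluded) plus every field flagged in the projection.
function applyInclusionProjection(doc, projection) {
  const out = {};
  const keepId = !(projection._id === 0 || projection._id === false);
  if (keepId && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag && field in doc) out[field] = doc[field];
  }
  return out;
}

// Same shape as find({ qty: { $gt: 25 } }, { item: 1, qty: 1 }).
const docs = [
  { _id: 1, item: "pen", qty: 30, price: 2 },
  { _id: 2, item: "cup", qty: 10, price: 5 },
];
const result = docs
  .filter((d) => d.qty > 25)
  .map((d) => applyInclusionProjection(d, { item: 1, qty: 1 }));
console.log(result); // [ { _id: 1, item: 'pen', qty: 30 } ]
```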
The third parameter is the callback function, which is called after the query is complete. function(e, docs) is not part of MongoDB's query syntax; it follows the standard Node.js error-first callback convention used by the MongoDB driver for Node.js (and by Monk, which wraps it). The first parameter, e, is the error: if an error occurs, it is passed in e. If the query succeeds, the matched documents are passed as an array in the second parameter, docs (the name can be anything you want).
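The error-first convention is plain JavaScript, so it can be shown without MongoDB. `findAll` below is a hypothetical function, not Monk's or the driver's implementation; it filters an in-memory array by exact field equality and hands the outcome to a callback shaped like `function(e, docs)`.

```javascript
// Error-first callback: callback(error) on failure, callback(null, result) on success.
function findAll(collection, query, callback) {
  if (!Array.isArray(collection)) {
    return callback(new Error("collection must be an array"));
  }
  // An empty query {} matches every document, like find({}).
  const docs = collection.filter((doc) =>
    Object.entries(query).every(([key, value]) => doc[key] === value)
  );
  callback(null, docs);
}

// Same shape as the route handler's collection.find({}, {}, function(e, docs) {...}).
findAll([{ name: "Acme" }, { name: "Globex" }], {}, function (e, docs) {
  if (e) return console.error(e);
  console.log(docs.length); // 2
});
```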
The cursor has various methods which can be used to manipulate the matched documents before mongoDB returns them.
collection.find( { qty: { $gt: 25 } }, { item: 1, qty: 1 })
is a cursor, and you can chain various operations on it, for example:
collection.find( { qty: { $gt: 25 } }, { item: 1, qty: 1 }).skip(10).limit(5).toArray(function(e,docs){
...
})
meaning you will skip the first 10 matched documents and then return a maximum of 5 documents.
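The skip/limit semantics map onto an array slice. `skipLimit` is a hypothetical helper operating on an already-matched, already-ordered array; it treats a limit of 0 as "no limit", as MongoDB does, and ignores negative limits. Without a .sort(), the server's order is not guaranteed, so in practice skip/limit are paired with a sort to get stable pages.

```javascript
// Drop the first `skip` matches, then return at most `limit` of the rest.
function skipLimit(matched, skip, limit) {
  return limit > 0 ? matched.slice(skip, skip + limit) : matched.slice(skip);
}

const matched = Array.from({ length: 12 }, (_, i) => `doc${i}`);
// Only doc10 and doc11 remain after skipping 10, so fewer than 5 come back.
console.log(skipLimit(matched, 10, 5));
```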
All of this is covered in the docs. I think it's better to use Mongoose instead of the native driver because of its features and popularity.
What is the best way to update a value within an array saved in a mongodb record? Currently, I'm trying it this way:
Record.find({ 'owner': owner}, {}, {sort: { date: -1 }}, function(err, record){
if(!err){
for (var i = 0; i < record[0].array.length; i++){
record[0].array[i].score = 0;
record[0].array[i].changed = true;
record[0].save();
}
}
});
And the schema looks like this:
var recordSchema = mongoose.Schema({
owner: {type: String},
date: {type: Date, default: Date.now},
array: mongoose.Schema.Types.Mixed
});
Right now, I can see that the array updates, I get no error in saving, but when I query the database again, the array hasn't been updated.
It would help if you explained your intent here, as naming a property "array" conveys nothing about its purpose. I guess from your code that you want to set the score of each item there to zero. Note that your save is currently being ignored because array is declared as Mixed, and Mongoose cannot detect changes made inside a Mixed value unless you tell it (see markModified below).
Certain find-and-modify operations on arrays can be done with a single database command using the Array Update Operators like $push, $addToSet, etc. However, I don't see any operator that can make your desired change directly in a single operation. Thus I think you need to find your record, alter the array data, and save it. (Note that findOne is a convenience function you can use if you only care about the first match, which seems to be the case for you.)
Record.findOne({ 'owner': owner}, {}, {sort: { date: -1 }}, function(err, record){
if (err) {
//don't just ignore this, log or bubble forward via callbacks
return;
}
if (!record) {
//Record not found, log or send 404 or whatever
return;
}
record.array.forEach(function (item) {
item.score = 0;
item.changed = true;
});
//Now, mongoose can't automatically detect that you've changed the contents of
//record.array, so tell it
//see http://mongoosejs.com/docs/api.html#document_Document-markModified
record.markModified('array');
record.save();
});
If you have a Mongoose object of a document, you can of course update the array as in the question, with the following caveat.
This is in fact a Mongoose gotcha: Mongoose cannot track changes inside a Mixed-type value, so you have to call markModified:
doc.mixed.type = 'changed';
doc.markModified('mixed.type');
doc.save() // changes to mixed.type are now persisted
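Why a Mixed value needs markModified can be illustrated with a toy change tracker. This is not how Mongoose is implemented; `trackChanges` is a hypothetical helper that uses a Proxy to record only top-level assignments, which is enough to show that nested edits go unseen until the path is marked by hand.

```javascript
// Records a path only when a top-level property is assigned through the proxy.
function trackChanges(obj) {
  const modified = new Set();
  const proxy = new Proxy(obj, {
    set(target, key, value) {
      modified.add(String(key)); // fires for proxy.key = value, not for deeper edits
      target[key] = value;
      return true;
    },
  });
  return {
    doc: proxy,
    markModified: (path) => modified.add(path),
    modifiedPaths: () => [...modified],
  };
}

const tracked = trackChanges({ array: [{ score: 5 }], mixed: { type: "orig" } });
tracked.doc.array[0].score = 0;     // nested edit: the tracker sees nothing
tracked.doc.mixed.type = "changed"; // same here
console.log(tracked.modifiedPaths()); // []
tracked.markModified("mixed.type");
console.log(tracked.modifiedPaths()); // [ 'mixed.type' ]
```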
I have two models in my app: Item and Comment. An Item can have many Comments, and a Comment instance contains a reference to an Item instance under the key 'item', to keep track of the relationship.
Now I have to send a JSON list of all Items with their Comment count when a user requests a particular URL.
function(req, res){
return Item.find()
.exec(function(err, items) {
return res.send(items);
});
};
I am not sure how I can "populate" the comment count on the items. This seems to be a common problem, and I tend to think there should be a nicer way of doing this than brute force.
So please share your thoughts. How would you "populate" the Comment count to the Items?
Check the MongoDB documentation for the findAndModify() method. With it you can atomically update a document, e.g. add a comment and increment the document's comment counter at the same time.
findAndModify
The findAndModify command atomically modifies and returns a single document. By default, the returned document does not include the modifications made on the update. To return the document with the modifications made on the update, use the new option.
Example
Use the update option, with update operators $inc for the counter, and $addToSet for adding the actual comment to an embedded array of comments.
db.runCommand(
{
findAndModify: "item",
query: { name: "MyItem", state: "active", rating: { $gt: 10 } },
sort: { rating: 1 },
update: { $inc: { commentCount: 1 },
$addToSet: {comments: new_comment} }
}
)
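To see what that update document does to a single item, here is an in-memory sketch. `applyIncAndAddToSet` is a hypothetical helper that supports only these two operators, treats a missing counter as 0 (as `$inc` does), and approximates `$addToSet` equality with `JSON.stringify`, which is adequate for plain JSON-like values.

```javascript
// Apply { $inc: {...}, $addToSet: {...} } to a plain object.
function applyIncAndAddToSet(doc, update) {
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount;        // missing field counts as 0
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const arr = doc[field] || (doc[field] = []);
    const key = JSON.stringify(value);
    if (!arr.some((el) => JSON.stringify(el) === key)) arr.push(value); // skip duplicates
  }
  return doc;
}

const item = { name: "MyItem", state: "active", rating: 12 };
const update = { $inc: { commentCount: 1 }, $addToSet: { comments: { text: "Nice" } } };
applyIncAndAddToSet(item, update);
applyIncAndAddToSet(item, update); // the same comment again
console.log(item.commentCount, item.comments.length); // 2 1
```

One thing the sketch makes visible: `$inc` runs whether or not `$addToSet` actually added anything, so re-adding an identical comment still bumps the counter.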
See:
MongoDB: findAndModify
MongoDB: Update Operators
I did some research on this issue and came up with the following results. First, the MongoDB docs suggest:
In general, use embedded data models when:
you have “contains” relationships between entities.
you have one-to-many relationships where the “many” objects always appear with or are viewed in the context of their parent documents.
So in my situation, it makes much more sense if Comments are embedded into Items, instead of having independent existence.
Nevertheless, I was curious to know the solution without changing my data model. As mentioned in MongoDB docs:
Referencing provides more flexibility than embedding; however, to
resolve the references, client-side applications must issue follow-up
queries. In other words, using references requires more roundtrips to
the server.
As multiple round trips are kosher here, I came up with the following solution:
var showList = function(req, res){
// first DB roundtrip: fetch all items
return Item.find()
.exec(function(err, items) {
// second DB roundtrip: fetch comment counts grouped by item ids
Comment.aggregate([{
$group: {
_id: '$item',
count: {
$sum: 1
}
}
}], function(err, agg){
// iterate over comment count groups (yes, that little dash is underscore.js)
_.each(agg, function( itr ){
// for each aggregated group, search for corresponding item and put commentCount in it
var item = _.find(items, function( item ){
return item._id.toString() == itr._id.toString();
});
if ( item ) {
item.set('commentCount', itr.count);
}
});
// send items to the client in JSON format
return res.send(items);
})
});
};
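The merge step can also be done with a lookup table instead of calling `_.find` once per group. A minimal sketch, assuming `items` and `agg` are plain objects (with Mongoose documents you would query with `.lean()` or call `toObject()` first); `attachCommentCounts` is a hypothetical name, and defaulting items without comments to `commentCount: 0` is an added assumption.

```javascript
// Merge { _id, count } groups into items with one Map lookup per item.
function attachCommentCounts(items, agg) {
  const counts = new Map(agg.map((g) => [String(g._id), g.count]));
  return items.map((item) => ({
    ...item,
    commentCount: counts.get(String(item._id)) || 0, // no group means no comments
  }));
}

const withCounts = attachCommentCounts(
  [{ _id: "i1", name: "First" }, { _id: "i2", name: "Second" }],
  [{ _id: "i1", count: 3 }]
);
console.log(withCounts.map((i) => i.commentCount)); // [ 3, 0 ]
```

The Map makes the merge roughly proportional to items plus groups, rather than items times groups.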
Agree? Disagree? Please enlighten me with your comments!
If you have a better answer, please post it here; I'll accept it if I find it worthy.