I have a tree-like schema that specifies a collection of parents, and a collection of children.
The collection of children will likely have millions of documents - each of which contains a small amount of data, and a reference to the parent that it belongs to which is stored as a string (perhaps my first mistake).
The collection of parents is much smaller, but may still be in the tens of thousands, and will slowly grow over time. Generally speaking though, a single parent may have as few as 10 children, or as many as 50,000 (possibly more, although somewhat unlikely).
A single child document might look something like this:
{
_id: ObjectId("507f191e810c19729de860ea"),
info: "Here's some info",
timestamp: 1234567890.0,
colour: "Orange",
sequence: 1000,
parent: "12a4567b909c7654d212e45f"
}
Its corresponding parent record (which lives in a separate collection) might look something like this:
{
_id: ObjectId("12a4567b909c7654d212e45f"),
info: "Blah",
timestamp: 1234567890.0
}
My query in mongoose (which contains the parent ID in the request) looks like this:
/* GET all children with the specified parent ID */
module.exports.childrenFromParent = function(req, res) {
  var parentID = req.params.parentID;
  childModel.find({
    // the child documents store the reference in "parent", not "parentid"
    "parent": parentID
  }).sort({"sequence": "asc"}).exec(
    function(err, children) {
      // check for a query error before inspecting the result
      if (err) {
        sendJSONResponse(res, 500, err);
        return;
      }
      // find() yields an empty array (not null) when nothing matches
      if (!children || children.length === 0) {
        sendJSONResponse(res, 404, {
          "message": "no children found"
        });
        return;
      }
      sendJSONResponse(res, 200, children);
    }
  );
};
So basically what's happening is that the query has to search the entire collection of children for any documents that have a parent which matches the provided ID.
I'm currently saving this parent ID as a string in the children collection schema (childModel in the code above), which is probably a bad idea, however, my API is providing the parent ID as a string in the request.
If anyone has any ideas as to how I can either fix my schema or change the query to improve the performance, it would be much appreciated!
Why are you not using .lean() before your exec? Do you really need all of the results to be Mongoose documents, or would plain JavaScript objects do? With lean() you skip the extra getters and setters that come with a Mongoose document, which can noticeably cut the response time for large result sets.
Write up from the comments:
You could help speed up and optimize queries by adding an index on the parent field. You can add an (ascending) index by doing the following:
db.collection.createIndex( { parent: 1 } )
You can analyse the benefit of an index by adding .explain("executionStats") to a query. See the docs for more info.
Adding an index on a large collection may take time. You can check the status by running the following query:
db.currentOp(
{
$or: [
{ op: "query", "query.createIndexes": { $exists: true } },
{ op: "insert", ns: /\.system\.indexes\b/ }
]
}
)
Edit: If you are sorting by sequence, you might want to add a compound index for the parent and the sequence.
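A minimal sketch of that compound index together with the matching query. It assumes `children` is a MongoDB Node.js driver collection (with Mongoose, something like `childModel.collection`); the helper names `ensureChildIndex` and `childrenOfParent` are made up for illustration.

```javascript
// Index keys: equality field first, sort field second, so one index
// can serve both the filter on `parent` and the sort on `sequence`.
var CHILD_INDEX = { parent: 1, sequence: 1 };

// Hypothetical helper: create the compound index on the children collection.
function ensureChildIndex(children) {
  return children.createIndex(CHILD_INDEX);
}

// Hypothetical helper: all children of one parent, ordered by sequence.
// `parentID` stays a string here because the documents store it as a string.
function childrenOfParent(children, parentID) {
  return children.find({ parent: parentID }).sort({ sequence: 1 });
}
```

Running the query with .explain("executionStats") before and after creating the index should show whether the collection scan has been replaced by an index scan.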
I have a User schema with basic fields, which include interests and location coordinates.
I need to perform a POST request with a specific UserId to get the results:
app.post('api/users/search/:id',function(err,docs)
{ //find all the documents whose search is enabled.
//from the documents returned above, find the ones that have at least 3 interests in common (req.body.interests) with the user with ':id'
// -----OR-----
//find the documents who stay within 'req.body.distance' compared to location of user with':id'
//Something like this
return User
.find({isBuddyEnabled:true}),
.find({"interests":{"$all":req.body.interests}}),
.find({"_id":req.params.id},geoLib.distance([[req.body.gcordinates],[]))
});
Basically, I need to perform a find inside a find, or a query inside a query.
As per the comments in your code, you want to use multiple conditions in your find query, such that a result is returned when any one of those conditions is satisfied. You can use $or and $and to achieve this. Sample code with conditions similar to yours is given below.
find({
$or:[
{ isBuddyEnabled:true },
{ "interests": { "$all":req.body.interests }},
{ $and:[
{ "_id":req.params.id },
{ geoLib.distance...rest_of_the_condition }
]
}
]
});
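If the intent in the question's comments is "search must be enabled, and then either the interests or the distance condition matches", the conditions could also be nested the other way round. This is only a sketch: `buildSearchFilter` is a hypothetical helper, and the distance condition is passed in as-is because the original leaves it unspecified. Note that `$all` requires every listed interest to match, not "at least 3".

```javascript
// Hypothetical helper: "isBuddyEnabled AND (interests match OR otherCondition)".
function buildSearchFilter(interests, otherCondition) {
  var alternatives = [{ interests: { $all: interests } }];
  // otherCondition is whatever filter object expresses the distance rule;
  // it is optional and not invented here.
  if (otherCondition) {
    alternatives.push(otherCondition);
  }
  return {
    isBuddyEnabled: true, // top-level fields are combined with an implicit AND
    $or: alternatives
  };
}

var filter = buildSearchFilter(["chess", "hiking"], null);
console.log(JSON.stringify(filter));
```

The resulting object can be passed to find() in place of the literal shown above.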
I'm trying to implement a rating system, and I'm struggling to allow only one rating per user in a reasonable way.
Simply put, I have an array of ratings in my schema, containing the "rater" and the rating, as such:
var schema = new Schema({
//...
ratings: [{
by: {
type: Schema.Types.ObjectId
},
rating: {
type: Number,
min: 1,
max: 5,
validate: ratingValidator
}
}],
//...
});
var Model = mongoose.model('Model', schema);
When I get a request, I wish to add the user's rating to the array if the user has not already rated this document; otherwise I wish to update the rating (you should not be able to give more than one rating).
One way to do this is to find the document, "loop through" the array of ratings and search for the user. If the user already has a rating in the array, the rating is changed; otherwise a new rating is pushed. As such:
Model.findById(id)
.select('ratings')
.exec(function(err, doc) {
if(err) return next(err);
if(doc) {
var rated = false;
var ratings = doc.ratings;
for(var i = 0; i < ratings.length; i++) {
// "by" is an ObjectId, so compare with equals() rather than ===
if(ratings[i].by.equals(user.id)) {
ratings[i].rating = rating;
rated = true;
break;
}
}
if(!rated) {
ratings.push({
by: user.id,
rating: rating
});
}
doc.markModified('ratings');
doc.save();
} else {
//Not found
}
});
Is there an easier way? A way to let mongodb do this automatically?
The mongodb $addToSet operator could be an alternative; however, I have not managed to use it for this, since it could allow two ratings with different scores from the same user.
As you note, the $addToSet operator will not work in this case, as a userId with a different vote value would indeed be a different value and its own unique member of the set.
So the best way to do this is to actually issue two update statements with complementary logic. Only one will actually be applied depending on the state of the document:
async.series(
[
// Try to update a matching element
function(callback) {
Model.update(
{ "_id": id, "ratings.by": user.id },
{ "$set": { "ratings.$.rating": rating } },
callback
);
},
// Add the element where it does not exist
function(callback) {
Model.update(
{ "_id": id, "ratings.by": { "$ne": user.id } },
{ "$push": { "ratings": { "by": user.id, "rating": rating } }},
callback
);
}
],
function(err,result) {
// all done
}
);
The principle is simple, try to match the userId present in the ratings array for the document and update the entry. If that condition is not met then no document is updated. In the same way, try to match the document where there is no userId present in the ratings array, if there is a match then add the element, otherwise there will be no update.
This does bypass the built-in schema validation of mongoose, so you would have to apply your constraints manually (or inspect the schema validation rules and apply them manually), but it is better than your current approach in one very important aspect.
When you .find() the document and call it back to your client application to modify using code as you are, then there is no guarantee that the document has not changed on the server from another process or request. So when you issue .save() the document on the server may no longer be in the state that it was when it was read and any modifications can overwrite the changes made there.
Hence while there are two operations to the server and not one ( and your current code is two operations anyway ), it is the lesser of two evils to manually validate than to possibly cause a data inconsistency. The two update approach will respect any other updates issued to the document possibly occurring at the same time.
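As a variation on the same idea, the two complementary updates could be sent in one ordered bulk request. This is a sketch that assumes a Mongoose version providing `Model.bulkWrite`; `buildRatingOps` is a hypothetical helper, and the 1 to 5 range check is there only because the updates bypass the schema validators.

```javascript
// Hypothetical helper: builds the two complementary updates.
// At most one of them matches, depending on whether the user has already rated.
function buildRatingOps(id, userId, rating) {
  // manual validation, mirroring the min/max from the schema
  if (typeof rating !== "number" || rating < 1 || rating > 5) {
    throw new RangeError("rating must be a number between 1 and 5");
  }
  return [
    { updateOne: {
        // the user has already rated: overwrite the matched array element
        filter: { _id: id, "ratings.by": userId },
        update: { $set: { "ratings.$.rating": rating } }
    } },
    { updateOne: {
        // the user has not rated yet: append a new element
        filter: { _id: id, "ratings.by": { $ne: userId } },
        update: { $push: { ratings: { by: userId, rating: rating } } }
    } }
  ];
}

// Usage (assumption: Model is the mongoose model from the question):
// Model.bulkWrite(buildRatingOps(id, user.id, rating), { ordered: true });
```

Because the operations run in order, the second filter no longer matches once the first update has applied, so the rating is never written twice.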
I'm managing a MongoDB database for a building products store. The most obvious collection is products, right?
There are quite a few products; however, each belongs to one of 5-8 categories and then to one subcategory from a small set of subcategories.
For example:
-Electrical
*Wires
p1
p2
..
*Tools
p5
pn
..
*Sockets
p11
p23
..
-Plumber
*Pipes
..
*Tools
..
*PVC
..
I will use Angular on the client side of the web site to show the whole product catalog, and I'm thinking of using AJAX to query the subset of products I want.
So I wonder whether I should manage a single collection like:
{
  MainCategory1: {
    SubCategory1: [
      {},{},{},{},{},{},{}
    ],
    SubCategory2: [
      {},{},{},{},{},{},{}
    ],
    SubCategoryn: [
      {},{},{},{},{},{},{}
    ]
  },
  MainCategory2: {
    SubCategory1: [
      {},{},{},{},{},{},{}
    ],
    SubCategory2: [
      {},{},{},{},{},{},{}
    ],
    SubCategoryn: [
      {},{},{},{},{},{},{}
    ]
  },
  MainCategoryn: {
    SubCategory1: [
      {},{},{},{},{},{},{}
    ],
    SubCategory2: [
      {},{},{},{},{},{},{}
    ],
    SubCategoryn: [
      {},{},{},{},{},{},{}
    ]
  }
}
Or a single collection per category. The number of documents probably won't exceed 500. However, I care about balancing:
quick DB answer,
easy server side DB querying, and
client-side Angular code for rendering results to html.
I'm using the mongodb Node.js module for now, not Mongoose.
What CRUD operations will I do?
Inserts of products. I'd also like a way to obtain autogenerated ids (maybe sequential) for each new record, since, naturally, I wouldn't expose the _id to the user.
Querying the whole documents set of a subcategory. Maybe just obtaining a few attributes at first.
Querying whole or a specific subset of attributes of a document (product) in particular.
Modifying a product's attributes values.
I agree the client side should get the easiest result to render. However, nesting categories and products into one another is still a bad idea. The trade-off is that once you want to change, for example, the name of a category, it will be a disaster. And think about the possible use cases, for example:
list all categories
find all subcategories of a certain category
find all products in a certain category
You'll find it hard to do these things with your data structure.
I had the same situation in my current project, so here's what I do, for your reference.
First, categories should be in a separate collection. DON'T nest categories into each other, as it will complicate the procedure to find all subcategories. The traditional way for finding all subcategories is to maintain an idPath property. For example, your categories are divided into 3 levels:
{
_id: 100,
name: "level1 category",
parentId: 0, // means it's the top category
idPath: "0-100"
}
{
_id: 101,
name: "level2 category",
parentId: 100,
idPath: "0-100-101"
}
{
_id: 102,
name: "level3 category",
parentId: 101,
idPath: "0-100-101-102"
}
Note that with idPath, parentId is no longer strictly necessary; it is kept here to make the structure easier to understand.
Once you need to find all subcategories of category 100, simply do the query:
// match on idPath (not _id): every descendant's path starts with "0-100-"
db.collection("category").find({idPath: /^0-100-/}).toArray(function(err, docs) {
// whatever you want to do
});
With category stored in a separate collection, in your product you'll need to reference them by _id, just like when we use RDBMS. For example:
{
... // other fields of product
categories: [100, 101, 102, ...]
}
Now if you want to find all products in a certain category:
// idPath is the path of the chosen category, e.g. "0-100"
db.collection("category").find({idPath: new RegExp("^" + idPath + "-")}).toArray(function(err, categories) {
  var cateIds = _.pluck(categories, "_id"); // I'm using underscore to pluck category ids
  db.collection("product").find({categories: { $in: cateIds }}).toArray(function(err, products) {
    // products are here
  });
});
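To illustrate how the idPath prefix match picks out descendants only, here is a small standalone sketch. `descendantPattern` is a hypothetical helper; it escapes regex metacharacters in case an idPath ever contains any, and the trailing "-" is what keeps "0-1000-..." from matching "0-100".

```javascript
// Hypothetical helper: regex matching every idPath strictly below `idPath`.
function descendantPattern(idPath) {
  // escape regex metacharacters so the path is matched literally
  var escaped = String(idPath).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped + "-");
}

var pattern = descendantPattern("0-100");
var paths = ["0-100", "0-100-101", "0-100-101-102", "0-1000-7"];
var matches = paths.filter(function (p) { return pattern.test(p); });
console.log(matches); // only the two paths below category 100
```

Since the pattern is anchored at the start of the string, MongoDB can use an index on idPath for this kind of query.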
Fortunately, the category collection is usually very small, with only hundreds (or thousands) of records, and it doesn't change much. So you can always keep a live copy of the categories in memory, constructed as nested objects like:
[{
id: 100,
name: "level 1 category",
... // other fields
subcategories: [{
id: 101,
... // other fields
subcategories: [...]
}, {
id: 103,
... // other fields
subcategories: [...]
},
...]
}, {
// another top1 category
}, ...]
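One way such a nested copy might be built from the flat category documents is sketched below. It assumes each document carries `_id`, `name` and `parentId` as in the examples above; `buildCategoryTree` is a hypothetical name.

```javascript
// Hypothetical helper: turn flat category documents into nested objects.
function buildCategoryTree(categories) {
  var byId = {};
  var roots = [];
  // first pass: create one node per category
  categories.forEach(function (c) {
    byId[c._id] = { id: c._id, name: c.name, subcategories: [] };
  });
  // second pass: attach each node to its parent, or treat it as top level
  categories.forEach(function (c) {
    var parent = byId[c.parentId];
    if (parent) {
      parent.subcategories.push(byId[c._id]);
    } else {
      roots.push(byId[c._id]);
    }
  });
  return roots;
}

var tree = buildCategoryTree([
  { _id: 100, name: "level1 category", parentId: 0 },
  { _id: 101, name: "level2 category", parentId: 100 },
  { _id: 102, name: "level3 category", parentId: 101 }
]);
console.log(JSON.stringify(tree, null, 2));
```

Two passes keep the function independent of the order in which the documents come back from the database.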
You may want to refresh this copy periodically, for example every hour, so:
// setInterval repeats the refresh; the callback comes first, the delay second
setInterval(function() {
// refresh your memory copy of categories.
}, 3600000);
That's all I get in mind right now. Hope it helps.
EDIT:
To provide an integer ID for each user, $inc and findAndModify are very useful. You may have an idSeed collection:
{
_id: ...,
seedValue: 1,
forCollection: "user"
}
When you want to get a unique ID:
db.collection("idSeed").findAndModify({forCollection: "user"}, {}, {$inc: {seedValue: 1}}, {}, function(err, doc) {
var newId = doc.seedValue;
});
findAndModify is an atomic operation provided by mongodb: the find and the modify happen as a single step on the document, so two concurrent callers cannot receive the same seed value.
The second question is already covered in my answer above.
Querying a subset of properties is described in the mongodb manual; the NodeJS API is almost the same. Read the documentation for the projection parameter.
Updating a subset of properties is supported by mongodb's $set operator.
I have two models in my app: Item and Comment. An Item can have many Comments, and a Comment instance contains a reference to an Item instance with key 'item', to keep track of the relationship.
Now I have to send a JSON list of all Items with their Comment count when user requests on a particular URL.
function(req, res){
return Item.find()
.exec(function(err, items) {
return res.send(items);
});
};
I am not sure how can I "populate" comment count to the items. This seems to be a common problem and I tend to think there should be some nicer way of doing this job than brute force.
So please share your thoughts. How would you "populate" the Comment count to the Items?
Check the MongoDB documentation and look for the method findAndModify(). With it you can atomically update a document, e.g. add a comment and increment the document's counter at the same time.
findAndModify
The findAndModify command atomically modifies and returns a single document. By default, the returned document does not include the modifications made on the update. To return the document with the modifications made on the update, use the new option.
Example
Use the update option, with update operators $inc for the counter, and $addToSet for adding the actual comment to an embedded array of comments.
db.runCommand(
{
findAndModify: "item",
query: { name: "MyItem", state: "active", rating: { $gt: 10 } },
sort: { rating: 1 },
update: { $inc: { commentCount: 1 },
$addToSet: {comments: new_comment} }
}
)
See:
MongoDB: findAndModify
MongoDB: Update Operators
I did some research on this issue and came up with following results. First, MongoDB docs suggest:
In general, use embedded data models when:
you have “contains” relationships between entities.
you have one-to-many relationships where the “many” objects always appear with or are viewed in the context of their parent documents.
So in my situation, it makes much more sense if Comments are embedded into Items, instead of having independent existence.
Nevertheless, I was curious to know the solution without changing my data model. As mentioned in MongoDB docs:
Referencing provides more flexibility than embedding; however, to
resolve the references, client-side applications must issue follow-up
queries. In other words, using references requires more roundtrips to
the server.
As multiple roundtrips are kosher now, I came up with following solution:
var showList = function(req, res){
// first DB roundtrip: fetch all items
return Item.find()
.exec(function(err, items) {
// second DB roundtrip: fetch comment counts grouped by item ids
Comment.aggregate({
$group: {
_id: '$item',
count: {
$sum: 1
}
}
}, function(err, agg){
// iterate over comment count groups (yes, that little dash is underscore.js)
_.each(agg, function( itr ){
// for each aggregated group, search for corresponding item and put commentCount in it
var item = _.find(items, function( item ){
return item._id.toString() == itr._id.toString();
});
if ( item ) {
item.set('commentCount', itr.count);
}
});
// send items to the client in JSON format
return res.send(items);
})
});
};
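The merge step above searches the items once per group. For larger lists, a lookup table might be simpler; the sketch below assumes the items are plain objects (for example fetched with .lean()) and that `groups` has the `{ _id, count }` shape produced by the $group stage. `attachCommentCounts` is a hypothetical name.

```javascript
// Hypothetical helper: copy each item and add its comment count (0 if none).
function attachCommentCounts(items, groups) {
  var counts = new Map();
  // index the aggregated counts by the string form of the item id
  groups.forEach(function (g) {
    counts.set(String(g._id), g.count);
  });
  return items.map(function (item) {
    return Object.assign({}, item, {
      commentCount: counts.get(String(item._id)) || 0
    });
  });
}

var result = attachCommentCounts(
  [{ _id: "a1", title: "first" }, { _id: "b2", title: "second" }],
  [{ _id: "a1", count: 3 }]
);
console.log(result);
```

Items without any comments do not appear in the aggregation result at all, which is why the count defaults to 0 here.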
Agree? Disagree? Please enlighten me with your comments!
If you have a better answer, please post here, I'll accept it if I find it worthy.
I'm pretty new to couchDB and even after reading (latest archive as now deleted) http://wiki.apache.org/couchdb/How_to_store_hierarchical_data (via ‘Store the full path to each node as an attribute in that node's document’) it's still not clicking just yet.
Instead of using the full path pattern as described in the wiki I'm hoping to keep track of children as an array of UUIDs and the parent as a single UUID. I'm leaning towards this pattern so I can maintain the order of children by their positions in the children array.
Here are some sample documents in couch. Buckets can contain buckets and items; items can only contain other items. (UUIDs abbreviated for clarity):
{_id: 3944,
name: "top level bucket with two items",
type: "bucket",
parent: null,
children: [8989, 4839]
}
{_id: 8989,
name: "second level item with no sub items",
type: "item",
parent: 3944
}
{
_id: 4839,
name: "second level bucket with one item",
type: "bucket",
parent: 3944,
children: [5694]
}
{
_id: 5694,
name: "third level item (has one sub item)",
type: "item",
parent: 4839,
children: [5390]
}
{
_id: 5390,
name: "fourth level item",
type: "item",
parent: 5694
}
Is it possible to look up a document by an embedded document id within a map function?
function(doc) {
  if(doc.type == "bucket" || doc.type == "item") {
    emit(doc, null); // still working on my key value output structure
    if(doc.children) {
      for(var i in doc.children) {
        // can i look up a document here using ids from the children array?
        doc.children[i]; // pseudo code
        emit(); // the retrieved document would be emitted here
      }
    }
  }
}
In an ideal world, the final JSON output would look something like this:
{"_id":3944,
"name":"top level bucket with two items",
"type":"bucket",
"parent":"",
"children":[
{"_id":8989, "name":"second level item with no sub items", "type":"item", "parent":3944},
{"_id": 4839, "name":"second level bucket with one item", "type":"bucket", "parent":3944, "children":[
{"_id":5694, "name":"third level item (has one sub item)", "type":"item", "parent": 4839, "children":[
{"_id":5390, "name":"fourth level item", "type":"item", "parent":5694}
]}
]}
]
}
You can find a general discussion on the CouchDB wiki.
I have no time to test it right now, however your map function should look something like:
function(doc) {
  if (doc.type === "bucket" || doc.type === "item") {
    // the document itself sorts first (-1), before its children (0, 1, ...)
    emit([ doc._id, -1 ], 1);
    if (doc.children) {
      for (var i = 0, child_id; child_id = doc.children[i]; ++i) {
        emit([ doc._id, i ], { _id: child_id });
      }
    }
  }
}
You should query it with include_docs=true to get the documents, as explained in the CouchDB documentation: if your map function emits an object value which has {'_id': XXX} and you query view with include_docs=true parameter, then CouchDB will fetch the document with id XXX rather than the document which was processed to emit the key/value pair.
Add startkey=["3944"]&endkey=["3944",{}] to get only the document with id "3944" with its children.
EDIT: have a look at this question for more details.
Can you output a tree structure from a view? No. CouchDB view queries return a list of values, there is no way to have them output anything other than a list. So, you have to deal with your map returning the list of all descendants of a given bucket.
You can, however, plug a _list post-processing function after the view itself, to turn that list back into a nested structure. This is possible if your values know the _id of their parent — the algorithm is fairly straightforward, just ask another question if it gives you trouble.
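The sample documents in the question also carry an ordered children array, so one possible nesting step is to follow those ids recursively. The sketch below is a plain function over an already fetched list of documents; `nest` is a hypothetical name, and wiring it into a _list function or into application code is left open.

```javascript
// Hypothetical helper: rebuild the nested structure below `rootId`
// from a flat list of documents, keeping the order of each children array.
function nest(rootId, docs) {
  var byId = {};
  docs.forEach(function (d) { byId[d._id] = d; });

  function expand(id) {
    var doc = byId[id];
    if (!doc) { return null; } // the child was not part of the fetched list
    var copy = {};
    Object.keys(doc).forEach(function (k) { copy[k] = doc[k]; });
    if (doc.children) {
      // replace each child id with the expanded child document
      copy.children = doc.children.map(expand).filter(Boolean);
    }
    return copy;
  }
  return expand(rootId);
}

var docs = [
  { _id: 3944, type: "bucket", parent: null, children: [8989, 4839] },
  { _id: 8989, type: "item", parent: 3944 },
  { _id: 4839, type: "bucket", parent: 3944, children: [5694] },
  { _id: 5694, type: "item", parent: 4839, children: [5390] },
  { _id: 5390, type: "item", parent: 5694 }
];
var nested = nest(3944, docs);
console.log(JSON.stringify(nested, null, 2));
```

The function does not guard against cycles, so it assumes the stored hierarchy really is a tree.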
Can you grab a document by its id in the map function? No. There's no way to grab a document by its identifier from within a CouchDB map function. The request must come from the application, either in the form of a standard GET on the document identifier, or by adding include_docs=true to a view request.
The technical reason for this is pretty simple: CouchDB only runs the map function when the document changes. If document A was allowed to fetch document B, then the emitted data would become invalid when B changes.
Can you output all descendants without storing the list of parents of every node? No. CouchDB map functions emit a set of key-value-id pairs for every document in the database, so the correspondence between the key and the id must be determined based on a single document.
If you have a four-level tree structure A -> B -> C -> D but only let a node know about its parent and children, then none of the nodes above know that D is a descendant of A, so you will not be able to emit the id of D with a key based on A and thus it will not be visible in the output.
So, you have three choices:
Grab only three levels (this is possible because B knows that C is a descendant of A), and grab additional levels by running the query again.
Somehow store the list of descendants of every node within the node (this is costly).
Store the list of parents of every node within the node.
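For the third option, a map function could emit one row per ancestor. The sketch assumes a hypothetical `ancestors` field on each document (root first, direct parent last), which is not part of the documents shown in the question. `emit` is passed in as a parameter here only so the function can run outside CouchDB; in a real view it is the global provided by the view server.

```javascript
// Sketch of a map function for documents that store their ancestors.
// Key: [ancestorId, number of levels between the ancestor and this node].
function mapDescendants(doc, emit) {
  var ancestors = doc.ancestors || []; // hypothetical field, root first
  ancestors.forEach(function (ancestorId, index) {
    emit([ancestorId, ancestors.length - index], null);
  });
}

// Standalone demonstration with a stand-in for CouchDB's emit().
var rows = [];
mapDescendants({ _id: "D", ancestors: ["A", "B", "C"] }, function (key, value) {
  rows.push({ id: "D", key: key, value: value });
});
console.log(rows);
```

With such a view, querying with startkey=["A"]&endkey=["A",{}] and include_docs=true would return every descendant of A, ordered by how far below A it sits.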