I'm creating a system that users can write review about an item and rate it from 0-5. I'm using MongoDB for this. And my problem is to find the best solution to calculate the total rating in product schema. I don't think querying all comments to get the size and dividing it by total rating is a good solution. Here is my Schema. I appreciate any advice:
Comments:
var commentSchema = new Schema({
Rating : { type: Number, default:0 },
Helpful : { type: Number, default:0 },
User :{
type: Schema.ObjectId,
ref: 'users'
},
Content: String,
});
Here is my Item schema:
var productSchema = new Schema({
//id is barcode
_id : String,
Rating : { type: Number, default:0 },
Comments :[
{
type: Schema.ObjectId,
ref: 'comments'
}
],
});
EDIT: HERE is the solution I got from another topic : calculating average in Mongoose
You can get the total using the aggregation framework. First you use the $unwind operator to turn the comments into a document stream:
{ $unwind: "$Comments" }
The result is that for each product-document is turned into one product-document per entry in its Comments array. That comment-entry is turned into a single object under the field Comments, all other fields are taken from the originating product-document.
Then you use $group to rejoin the documents for each product by their _id, while you use the $avg operator to calculate the average of the rating-field:
{ $group: {
_id: "$_id",
average: { $avg: "$Comments.Rating" }
} }
Putting those two steps into an aggregation pipeline calculates the average rating for every product in your collection. You might want to narrow it down to one or a small subset of products, depending on what the user requested right now. To do this, prepend the pipeline with a $match step. The $match object works just like the one you pass to find().
The underlying question that it would be useful to understand is why you don't think that finding all of the ratings, summing them up, and dividing by the total number is a useful approach. Understanding the underlying reason would help drive a better solution.
Based on the comments below, it sounds like your main concern is performance and the need to run map-reduce (or another aggregation framework) each time a user wants to see total ratings.
This person addressed a similar issue here: http://markembling.info/2010/11/using-map-reduce-in-a-mongodb-app
The solution they identified was to separate out the execution of the map-reduce function from the need in the view to see the total value. In this case, the optimal solution would be to run the map-reduce periodically and store the results in another collection, and have the average rating based on the collection that stores the averages, rather than doing the calculation in real-time each time.
As I mentioned in the previous version of this answer, you can improve performance further by limiting the map-reduce to addresing ratings that were created or updated more recently, or since the last map-reduce aggregation.
Related
I'm currently learning some backend stuff using an Udemy course and I have an example website that lets you add campgrounds (campground name, picture, description, etc.) and review them. I'm using the Express framework for Node.js, and Mongoose to access the database.
My campground schema looks like:
const campgroundSchema = new mongoose.Schema({
name: String,
image: String,
description: String,
price: String,
comments: [
{
type: mongoose.Schema.Types.ObjectId,
ref: "Comment"
}
],
rating: {type: Number, default: 0}
});
And my comment/review schema looks like:
const commentSchema = new mongoose.Schema({
text: String,
rating: {
type: Number,
min: 1,
max: 5,
validate: {validator: Number.isInteger}
},
campground: {type: mongoose.Schema.Types.ObjectId, ref: "Campground"}
});
Campgrounds and Comments also have references to a User but I've left that out for simplicity.
I'm looking to know the best practice for updating and displaying the campground average rating.
The method used by the tutorial I'm following is to recalculate the average rating each time a comment is added, changed, or deleted. Here's how it would work for a new comment:
Campground.findById(campgroundId).populate("comments").exec(function(err, campground) {
Comment.create(newComment, function(err, comment) {
campground.comments.push(comment);
campground.rating = calculateRating(campground.comments);
campground.save();
});
});
"calculateRating" iterates through the comment array, gets the total sum, and returns the sum divided by the number of comments.
My gut instinct tells me that there should be a way to make the "rating" field of Campground perform the functionality of the "calculateRating" function, so that I don't have to update the rating every time a comment is added, changed, or removed. I've been poking around documentation for a while now, but since I'm pretty new to Mongoose and databases in general, I'm a bit lost on how to proceed.
In summary: I want to add functionality to my Campground model so that when I access its rating, it automatically accesses each comment referenced in the comments array, sums up their ratings, and returns the average.
My apologies if any of my terminology is incorrect. Any tips on how I would go about achieving this would be very much appreciated!
Love,
Cal
I think what you are trying to do is get a virtual property of the document that gets the average rating but it does not get persisted to the mongo database.
according to mongoosejs :- Virtuals are document properties that you can get and set but that do not get persisted to MongoDB. They are set on the schema.
You can do this:
CampgroundSchema.virtual('averageRating').get(function() {
let ratings = [];
this.comments.forEach((comment) => ratings.push(comment.rating));
return (ratings.reduce((a,b)=>a+b)/ratings.length).toFixed(2);
});
After that on your view engine after finding campgrounds or a campground, all you need to call is ; campground.averageRating;
Read more here : https://mongoosejs.com/docs/guide.html#virtuals
also note that you can not make any type of query on virtual properties.
I have a tricky query that hits my MongoDB know-how. Here the simplified szenario.
We have a collection Restaurant and a collection Subsidary.
They look roughly like this (simplified - using mongoose):
const restaurantSchema = new Schema(
{
name: { type: String, required: true },
categories: { type: [String], required: true },
...
})
const subsidarySchema = new Schema(
{
restaurant: { type: Schema.Types.ObjectId, ref: 'Restaurant' },
location: {
type: { type: String, enum: ['Point'], required: true },
coordinates: { type: [Number], required: true },
},
...
})
What is required:
Always: Find restaurants that have a subsidary within 3.5 KM radius and sort by distance.
Sometimes filter those restaurants also by a string that should fuzy-match the Restaurant name.
Apply further filters and pagination (e.g. filter by categories, ...)
I'm trying to tackle this with a mongodb aggregation. The problem:
The aggregation pipeline stages geoNear and text require each to be first in the pipeline - which means they exclude each other.
Here my thought so far:
Start aggregation with subsidary, $geoNear stage first. This cuts away already all restaurants outside the 3.5 KM.
$group the subsidaries by restaurant and keep the minimal distance value per cluster.
$lookup to get the matchin restaurant for each cluster. Maybe $unwind here.
??? Here the text/search match should be, fuzy-matching the restaurants' name. ???
$match for other values (category, openingHours, ...)
$sort and $limit and $skip for sorting andd pagination.
Here the same as illustration.
Question
Does this approach make sense? What would be a possible way to implement stage 4?
I was searching a lot but there seems no way to use something like { $match: { $text: { $search: req.query.name } } } as a 4th stage.
An alternative would be to run a second query before that just handles the text search and then build an intersection. This could lead to a massive amount of restaurant IDs being passed in that stage. Is that something mongodb could handle?
I'm very thankful for your comments!
Some ways around the requirement that both text search and geo query must be the first stage:
Use text search as the first stage, then manually calculate the distance using $set/$expr in a subsequent stage.
Use geo query as the first stage, then perform text filtering in your application (allowing you also to use any text matching/similarity algorithm you like).
I have following Mongo Schemas(truncated to hide project sensitive information) from a Healthcare project.
let PatientSchema = mongoose.Schema({_id:String})
let PrescriptionSchema = mongoose.Schema({_id:String, patient: { type: Number, ref: 'Patient', createdAt:Date }})
let ReportSchema = mongoose.Schema({_id:String, patient: { type: Number, ref: 'Patient', createdAt:Date }})
let EventsSchema = mongoose.Schema({_id:String, patient: { type: Number, ref: 'Patient', createdAt:Date }})
There is ui screen from the mobile and web app called Health history, where I need to paginate the entries from prescription, reports and events sorted based on createAt. So I am building a REST end point to get this heterogeneous data. How do I achieve this. Is it possible to create a "View" from multiple schema models so that I won't load the contents of all 3 schema to fetch one page of entries. The schema of my "View" should look like below so that I can run additional queries on it (e.g. find last report)
{recordType:String,/* prescription/report/event */, createdDate:Date, data:Object/* content from any of the 3 tables*/}
I can think of three ways to do this.
Imho the easiest way to achieve this is by using an aggregation something like this:
db.Patients.aggregate([
{$match : {_id: <somePatientId>},
{
$lookup:
{
from: Prescription, // replicate this for Report and Event,
localField: _id,
foreignField: patient,
as: prescriptions // or reports or events,
}
},
{ $unwind: prescriptions }, // or reports or events
{ $sort:{ $createDate : -1}},
{ $skip: <positive integer> },
{ $limit: <positive integer> },
])
You'll have to adapt it further, to also get the correct createdDate. For this, you might want to look at the $replaceRoot operator.
The second option is to create a new "meta"-collection, that holds your actual list of events, but only holds a reference to your patient as well as the actual event using a refPath to handle the three different event types. This solution is the most elegant, because it makes querying your data way easier, and probably also more performant. Still, it requires you to create and handle another collection, which is why I didn't want to recommend this as the main solution, since I don't know if you can create a new collection.
As a last option, you could create virtual populate fields in Patient, that automatically fetch all prescriptions, reports and events. This has the disadvantage that you can not really sort and paginate properly...
I have an article model like this:
var ArticleSchema = new Schema({
type: String
,title: String
,content: String
,hashtags: [String]
,comments: [{
type: Schema.ObjectId
,ref: 'Comment'
}]
,replies: [{
type: Schema.ObjectId
,ref: 'Reply'
}]
, status: String
,statusMeta: {
createdBy: {
type: Schema.ObjectId
,ref: 'User'
}
,createdDate: Date
, updatedBy: {
type: Schema.ObjectId
,ref: 'User'
}
,updatedDate: Date
,deletedBy: {
type: Schema.ObjectId,
ref: 'User'
}
,deletedDate: Date
,undeletedBy: {
type: Schema.ObjectId,
ref: 'User'
}
,undeletedDate: Date
,bannedBy: {
type: Schema.ObjectId,
ref: 'User'
}
,bannedDate: Date
,unbannedBy: {
type: Schema.ObjectId,
ref: 'User'
}
,unbannedDate: Date
}
}, {minimize: false})
When user creates or modify the article, I will create hashtags
ArticleSchema.pre('save', true, function(next, done) {
var self = this
if (self.isModified('content')) {
self.hashtags = helper.listHashtagsInText(self.content)
}
done()
return next()
})
For example, if user write "Hi, #greeting, i love #friday", I will store ['greeting', 'friday'] in hashtags list.
I am think about creating an index for hashtags to make queries on hashtags faster. But from mongoose manual, I found this:
When your application starts up, Mongoose automatically calls
ensureIndex for each defined index in your schema. Mongoose will call
ensureIndex for each index sequentially, and emit an 'index' event on
the model when all the ensureIndex calls succeeded or when there was
an error. While nice for development, it is recommended this behavior
be disabled in production since index creation can cause a significant
performance impact. Disable the behavior by setting the autoIndex
option of your schema to false.
http://mongoosejs.com/docs/guide.html
So is indexing faster or slower for mongoDB/Mongoose?
Also, even if I create index like
hashtags: { type: [String], index: true }
How can I make use of the index in my query? Or will it just magically become faster for normal queries like:
Article.find({hashtags: 'friday'})
You are reading it wrong
You are misreading the intent of the quoted block there as to what .ensureIndex() ( now deprecated, but still called by mongoose code ) actually does here in the context.
In mongoose, you define an index either at the schema or model level as is appropriate to your design. What mongoose "automatically" does for you is on connection it inpects each registered model and then calls the appropriate .ensureIndex() methods for the index definitions provided.
What does this actually do?
Well, in most cases, being after you have already started up your application before and the .ensureIndexes() method was run is Absolutely Nothing. That is a bit of an overstatement, but it more or less rings true.
Because the index definition has already been created on the server collection, a subsesquent call does not do anything. I.e, it does not drop the index and "re-create". So the real cost is basically nothing, once the index itself has been created.
Creating indexes
So since mongoose is just a layer on top of the standard API, the createIndex() method contains all the details of what is happening.
There are some details to consider here, such as that an index build can happen in the "background", and while this is less intrusive to your application it does come at it's own cost. Notably that the index size from "background" generation will be larger than if you built it n the foreground, blocking other operations.
Also all indexes come at a cost, notably in terms of disk usage as well as an additional cost of writing the additional information outside of the collection data itself.
The adavantages of an index are that it is much faster to "search" for values contained within an index than to seek through the whole collection and match the possible conditions.
These are the basic "trade-offs" associated with indexes.
Deployment Pattern
Back to the quoted block from the documentation, there is a real intent behind this advice.
It is typical in deployment patterns and particularly with data migrations to do things in this order:
Populate data to relevant collections/tables
Enable indexes on the collection/table data relevant to your needs
This is because there is a cost involved with index creation, and as mentioned earlier it is desirable to get the most optimum size from the index build, as well as avoid having each document insertion also having the overhead of writing an index entry when you are doing this "load" in bulk.
So that is what indexes are for, those are the costs and benefits and the message in the mongoose documentation is explained.
In general though, I suggest reading up on Database Indexes for what they are and what they do. Think of walking into a library to find a book. There is a card index there at the entrance. Do you walk around the library to find the book you want? Or do you look it up in the card index to find where it is? That index took someone time to create and also keep it updated, but it saves "you" the time of walking around the whole library just so you can find your book.
After being so used to SQL, I have came across this problem with mongoDB.
First, I am using mongoose.
Now, the problem. I have a collection named User.
var UserSchema = new Schema ({
id : ObjectId,
name : {type : String, trim : true, required : true},
email: {type:String, trim:true, required: true, index: { unique: true }},
password: {type:String, required: true, set: passwordToMD5},
age: {type:Number, min: 18, required: true, default: 18},
gender: {type: Number, default:0, required: true},
height: {type: Number, default:180, min: 140, max: 220},
_eye_color: {type: ObjectId, default: null},
location: {
lon: {type: Number, default: 0},
lat: {type: Number, default: 0}
},
status: {type:Number, required: true, default:0}
},{
toObject: { virtuals: true },
toJSON: { virtuals: true },
collection:"user"});
Now I need to select all users from this collection and sort them by special attribude (say "rank"). This rank is calculated with certain logic depending of their distance from a point, age compared with given age, etc...
So now I was wondering how to select this rank and then use it in sorting? I have tried to use virtuals, they are handy to count additional info, but unfortunately, it is not possible to sort the find() results by a virtual field.
Of course I can calculate this rank in a virtual, then select all records, and after that, in callback, do some javascript. But in this case, as I select all the users then sort and then limit, the javascript part might take too long...
I was thinking to use mapreduce, but I am not sure it will do what I want.
Can someone give me a hint if my task is possible to do in mongoDB/mongoose?
EDIT 1
I have also tried to use aggregation framework, and at first it seemed to be the best solution with the $project ability. But then, when I needed to do rank calculations, I found out that aggregation does not support a lot of mathematical functions like sin, cos and sqrt. And also it was impossible to use pre-defined usual javascript functions in projection. I mean,the function got called, but I was not able to pass current record fields to it.
{$project: {
distance_from_user: mUtils.getDistance(point, this.location)
}
Inside function the second attr was "undefined".
So I guess it is impossible to do my rank calculations with aggregation framework.
EDIT 2
Ok, I know everyone tells me not to use mapreduce as it is not good for realtime queries, but as I cannot use aggregation, I think I'll try mapreduce. So Let's say I have this map reduce.
function map() {
emit(1, // Or put a GROUP BY key here
{name: this.name, // the field you want stats for
age: this.age,
lat: this.location.lat,
lon: this.location.lon,
distance:0,
rank:0
});
}
function reduce(key, values) {
return val;
}
function finalize(key, value){
return value;
}
var command = {'mapreduce': "user", 'map': map.toString(), 'reduce': reduce.toString(), query:{$and: [{gender: user_params.gender}, {_id: {$ne: current_user_id}}]}, 'out': {inline:1}};
mongoose.connection.db.executeDbCommand(command, function(error, result){
if(error) {
log(error);
return;
}
log(result);
return;
});
What should I write in reduce (or maybe change map) to calculate rank for every user?
The only real solution is to calculate your rank for each document and storing it in the document. Since this value will be constant as long as the values in your document remain constant you can simply calculate this value whenever you update the fields that affect it.
Map/reduce certainly isn't a good solution for this nor is any other type of aggregation. Precalculating your rank and storing it with the document is the only option that scales if you're using MongoDB.
You are aware of amount of computations such thing would need - if you'd do it every time user logs in, you'll have interesting load peaks when lots of people would log in at shorter amount of time - and your page (interface) would be heavily resources-bound (which is not good).
I'd recommend you something a bit different - keeping ranking for every logged-on user and updating them in intervals: keeping "short session" and "long session" (long session - the one you use in web browser and short - "online, currently using the site") and generating ranks regularly only for "shortly-active" users and rarely for the logged on in the long session. Something like every five minutes. Much more scallable - and if user would be unhappy about him not having his rank counted - you may always tweak the sys to count his ranks on demand.
You might use mapredurce in such case - your map function should only emit the data you need for counting the rank for a given user (like age, lat, long, whatever you need) AND a result (rank) for a tested user (emit it empty). For reduce function you'd need to look at sorting with mapreduce (it highly depends on the way you create the rank) - also you'd count the rank (or some kind of a sub-value) for the other users.
It look like a good use case for MongoDB + Hadoop.
This presentation show some of the possibilities of this combination.