I have the following postSchema and would like to fetch data depending on the updatedAt field. When people comment, I increase numberOfReply by one and the post's updatedAt is updated. How should I fetch data for infinite scroll, and should I use indexing for this operation?
const postScheme = mongoose.Schema(
  {
    post: {
      type: String,
      trim: true,
    },
    numberOfReply: {
      type: Number,
      default: 0
    },
    owner: {
      type: mongoose.Schema.Types.ObjectId,
      ref: 'User'
    },
    hasImage: {
      type: Boolean,
    },
    image: {
      type: String,
      trim: true
    },
  },
  {timestamps: true}
)
This is what I use to fetch the first page:
Post.Post.find({}).sort({'updatedAt': -1}).limit(10).populate('owner').populate('coin').exec(function (err, posts) {
  res.send(posts)
})
and this is for infinite scroll:
Post.Post.find({isCoin: true, updatedAt: {$lt: req.body.last}}).sort({'updatedAt': -1}).limit(10)
  .populate('owner').exec(function (err, posts) {
    res.send(posts)
  })
The limit plus range-query syntax (updatedAt: {$lt: ...} with the last value seen) is Mongo's way of paginating through data, so you've got that worked out; from a code perspective you can't really change anything to work better.
should I use indexing for this operation
Most definitely yes: indexes are the way to make this operation efficient. Otherwise Mongo will do a full collection scan for each page, which is very inefficient.
So what kind of index should you build? You want a compound index that allows the query to satisfy both the filter and the sort conditions, and in your case that is on the isCoin and updatedAt fields, like so:
db.collection.createIndex( { isCoin: 1, updatedAt: -1 } )
A few tweaks that can make the index a bit more efficient (for this specific query):
Consider creating the index as a sparse index; this will only index documents that contain both fields. Obviously, if your data always includes both fields, this option changes nothing and you can ignore it.
This one has a few caveats, but partial indexes are designed for exactly this case: improving query performance by indexing a smaller subset of the data. In your case you can add this option:
{ partialFilterExpression: { isCoin: true } }
With that said, this will limit the index's usefulness for other queries, so it might not be the right choice for you.
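For completeness, the same index can also be declared at the schema level in Mongoose, so it is ensured when the app starts. A hedged sketch (note the question's schema doesn't actually show an isCoin field, so its presence is assumed from the query above):
// Assumed from the infinite-scroll query: the documents carry an isCoin field.
postScheme.index({ isCoin: 1, updatedAt: -1 })

// Or with the partial filter, trading generality for a smaller index:
postScheme.index(
  { isCoin: 1, updatedAt: -1 },
  { partialFilterExpression: { isCoin: true } }
)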
I have a tricky query that stretches my MongoDB know-how. Here's the simplified scenario.
We have a collection Restaurant and a collection Subsidary.
They look roughly like this (simplified - using mongoose):
const restaurantSchema = new Schema(
  {
    name: { type: String, required: true },
    categories: { type: [String], required: true },
    ...
  })

const subsidarySchema = new Schema(
  {
    restaurant: { type: Schema.Types.ObjectId, ref: 'Restaurant' },
    location: {
      type: { type: String, enum: ['Point'], required: true },
      coordinates: { type: [Number], required: true },
    },
    ...
  })
What is required:
Always: find restaurants that have a subsidary within a 3.5 km radius, and sort them by distance.
Sometimes: also filter those restaurants by a string that should fuzzy-match the restaurant name.
Apply further filters and pagination (e.g. filter by categories, ...)
I'm trying to tackle this with a mongodb aggregation. The problem:
The aggregation pipeline stages $geoNear and $text each require to be the first stage in the pipeline - which means they exclude each other.
Here are my thoughts so far:
Start the aggregation with subsidary, $geoNear stage first. This already cuts away all restaurants outside the 3.5 km.
$group the subsidaries by restaurant and keep the minimal distance value per cluster.
$lookup to get the matching restaurant for each cluster. Maybe $unwind here.
??? Here the text/search match should go, fuzzy-matching the restaurants' name. ???
$match for other values (category, openingHours, ...)
$sort, $limit, and $skip for sorting and pagination.
Question
Does this approach make sense? What would be a possible way to implement stage 4?
I was searching a lot, but there seems to be no way to use something like { $match: { $text: { $search: req.query.name } } } as a 4th stage.
An alternative would be to run a second query beforehand that handles just the text search and then build an intersection. That could mean a massive number of restaurant IDs being passed into that stage. Is that something MongoDB could handle?
I'm very thankful for your comments!
Some ways around the requirement that both text search and geo query must be the first stage:
Use text search as the first stage, then manually calculate the distance using $set/$expr in a subsequent stage.
Use geo query as the first stage, then perform text filtering in your application (allowing you also to use any text matching/similarity algorithm you like).
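As a rough illustration, here is a variant of the second approach that keeps the name filter inside the pipeline by using a case-insensitive $regex instead of $text ($regex, unlike $text, may appear in any $match stage). It assumes a Subsidary model compiled from subsidarySchema, the default 'restaurants' collection name for the Restaurant model, and that lng, lat, name, page, and perPage are supplied; $regex is only a crude stand-in for real fuzzy matching:
// Sketch only: $geoNear must be the first stage and needs a 2dsphere index on location.
const results = await Subsidary.aggregate([
  { $geoNear: {
      near: { type: 'Point', coordinates: [lng, lat] },
      distanceField: 'distance',
      maxDistance: 3500, // metres
      spherical: true
  } },
  // One entry per restaurant, keeping the closest subsidary's distance.
  { $group: { _id: '$restaurant', distance: { $min: '$distance' } } },
  { $lookup: { from: 'restaurants', localField: '_id', foreignField: '_id', as: 'restaurant' } },
  { $unwind: '$restaurant' },
  // Optional name filter; allowed here because it is $regex, not $text.
  ...(name ? [{ $match: { 'restaurant.name': { $regex: name, $options: 'i' } } }] : []),
  { $sort: { distance: 1 } },
  { $skip: page * perPage },
  { $limit: perPage }
])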
I know there are a lot of similar questions, but they're old, and since MongoDB has evolved a lot over the last 5-6 years I am looking for a good schema design.
Goal: I want to have a post with comments by users.
What I have designed so far is:
Separate post model:
const projectSchema = new mongoose.Schema({
  user: { type: mongoose.Schema.Types.ObjectId, required: true, ref: 'User' },
  title: { type: String, required: true },
  image: { type: String, default: undefined },
  description: { type: String, required: true, minLength: 200, maxlength: 500 },
  comments: [{
    type: mongoose.Schema.Types.ObjectId, ref: 'Comment'
  }],
  state: { type: Boolean, default: true },
  collaborators: { type: Array, default: [] },
  likes: { type: Array, default: [] }
})
And a separate comments model:
const commentSchema = new mongoose.Schema({
  comment: { type: String, required: true },
  project: { type: String, required: true, ref: 'Project' },
  user: { type: String, required: true, ref: 'User' }
})
The reason I am going for the relational approach is that if the comments grow to, say, 10,000 in number, they would bloat the post document considerably.
This way, no matter how many comments there are, we can populate them via their IDs, and the comments live in their own collection.
Reference : one-to-many
Is this a good approach for my project?
The way I am querying the comments from one single post:
const project = await Project.findById(
  new mongoose.Types.ObjectId(req.params.projectId)
).populate({
  path: 'comments',
  populate: { path: 'user' }
}).lean()
Whether it's a good design depends on how many comments per post you expect and on which queries your app will run.
There's a good blog post from mongodb.com on how to design your database schema.
The common design is:
One to Few (Use embed)
One to Many (Use an array of references)
One to squillions (The usual relational database one-to-many approach)
Summary is:
So, even at this basic level, there is more to think about when designing a MongoDB schema than when designing a comparable relational schema. You need to consider two factors:
Will the entities on the "N" side of the One-to-N ever need to stand alone?
What is the cardinality of the relationship: is it one-to-few; one-to-many; or one-to-squillions?
Based on these factors, you can pick one of the three basic One-to-N schema designs:
Embed the N side if the cardinality is one-to-few and there is no need to access the embedded object outside the context of the parent object
Use an array of references to the N-side objects if the cardinality is one-to-many or if the N-side objects should stand alone for any reasons
Use a reference to the One-side in the N-side objects if the cardinality is one-to-squillions
There is also a blog about advanced schema design which is worth the read.
You seem to be using the two-way referencing approach.
The difference between yours and one-to-squillions is that you are not only storing the post id as a reference on each comment document, but also storing comment ids as references in the post document, while one-to-squillions stores only the project id reference in the comment document.
Your approach is better if you need to get the comment ids of a post. The disadvantage is that you need to run two queries when creating or deleting a comment: one to add / remove the comment id on the post, and one to create / delete the comment document itself. It will also be slower to find which post a given comment id belongs to.
Using one-to-squillions, on the other hand, gives you worse performance when querying comments by post id, but you can mitigate that by properly indexing your comment collection, as sketched below.
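A hedged sketch of that mitigation, reusing commentSchema from the question (the Comment model and the text, projectId, and userId variables are assumed names):
// Index the foreign key so "comments of one post" doesn't scan the collection.
commentSchema.index({ project: 1 })

// One-to-squillions: creating a comment touches a single collection...
await Comment.create({ comment: text, project: projectId, user: userId })

// ...and fetching a post's comments is one indexed query instead of populate():
const comments = await Comment.find({ project: projectId }).populate('user').lean()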
So I'm new to programming and Mongoose, just learning the basics,
and now I've been given the task of building a dynamic query with criteria such as:
query => returns all the data whose values I put into the query parameter
example:
query: [ios, android] => I get all ios data and all android data
query_else => returns all the data other than the values in the query parameter
example:
query_else: [ios, android] => I get all the data OTHER than ios and android
If I try to use .find, I can only match one specific value I'm looking for; if I put two values into it, the result is [].
Maybe this isn't exactly the answer I should be looking for, but how should I think about solving this case? My lack of knowledge about Mongoose and coding in general has me deadlocked.
Thank you in advance.
account log schema in activity-log collection:
const actLogSchema = new Schema(
  {
    account_id: {
      type: Schema.Types.ObjectId,
      ref: 'account'
    },
    // Date.now without parentheses, so each document gets its own timestamp
    date_created: { type: String, default: Date.now },
    ip: String,
    location: String,
    device: String,
    type: String,
    label: String,
    action: String,
    description: String
  },
  { versionKey: false }
);
I assume query applies to one of {date_created, ip, location, device, type, action, label, description},
and query_else takes the same shape as query, but its values are excluded, like in the example above.
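If it helps, a hedged sketch of how those two behaviours usually map onto MongoDB's $in and $nin operators (the model name ActLog and the device field are assumptions based on the schema above):
const ActLog = mongoose.model('ActLog', actLogSchema)

// query: [ios, android] => documents whose device matches ANY of the values
const matching = await ActLog.find({ device: { $in: ['ios', 'android'] } })

// query_else: [ios, android] => documents whose device matches NONE of the values
const others = await ActLog.find({ device: { $nin: ['ios', 'android'] } })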
I was wondering if there is a way to force a unique collection entry, but only if the entry is not null.
Sample schema:
var UsersSchema = new Schema({
  name : {type: String, trim: true, index: true, required: true},
  email : {type: String, trim: true, index: true, unique: true}
});
'email' in this case is not required, but if 'email' is saved I want to make sure that the entry is unique (at the database level).
Empty entries seem to get the value null, so every entry with no email crashes with the 'unique' option (if there is another user with no email).
Right now I'm solving it on an application level, but would love to save that db query.
thx
As of MongoDB v1.8+ you can get the desired behavior of ensuring unique values but allowing multiple docs without the field by setting the sparse option to true when defining the index. As in:
email : {type: String, trim: true, index: true, unique: true, sparse: true}
Or in the shell:
db.users.ensureIndex({email: 1}, {unique: true, sparse: true});
Note that a unique, sparse index still does not allow multiple docs with an email field with a value of null, only multiple docs without an email field.
See http://docs.mongodb.org/manual/core/index-sparse/
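To make that caveat concrete, roughly what the shell would show with the sparse unique index above (a sketch of expected behaviour, not captured output):
db.users.insertOne({name: "a"})               // OK: no email field, skipped by the sparse index
db.users.insertOne({name: "b"})               // OK: also skipped
db.users.insertOne({name: "c", email: null})  // OK: null IS indexed by a sparse index
db.users.insertOne({name: "d", email: null})  // fails with E11000: duplicate null value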
tl;dr
Yes, it is possible to have multiple documents with a field set to null or not defined, while enforcing unique "actual" values.
requirements:
MongoDB v3.2+.
Knowing your concrete value type(s) in advance (e.g., always a string or object when not null).
If you're not interested in the details, feel free to skip to the implementation section.
longer version
To supplement #Nolan's answer, starting with MongoDB v3.2 you can use a partial unique index with a filter expression.
The partial filter expression has limitations. It can only include the following:
equality expressions (i.e. field: value or using the $eq operator),
$exists: true expression,
$gt, $gte, $lt, $lte expressions,
$type expressions,
$and operator at the top-level only
This means that the trivial expression { "yourField": { $ne: null } } cannot be used.
However, assuming that your field always uses the same type, you can use a $type expression.
{ field: { $type: <BSON type number> | <String alias> } }
MongoDB v3.6 added support for specifying multiple possible types, which can be passed as an array:
{ field: { $type: [ <BSON type1> , <BSON type2>, ... ] } }
which means the value may be of any one of several types when not null.
Therefore, if we want to allow the email field in the example below to accept either string or, say, binary data values, an appropriate $type expression would be:
{email: {$type: ["string", "binData"]}}
implementation
mongoose
You can specify it in a mongoose schema:
const UsersSchema = new Schema({
  name: {type: String, trim: true, index: true, required: true},
  email: {
    type: String, trim: true, index: {
      unique: true,
      partialFilterExpression: {email: {$type: "string"}}
    }
  }
});
or directly add it to the collection (which uses the native node.js driver):
User.collection.createIndex("email", {
  unique: true,
  partialFilterExpression: {
    "email": {
      $type: "string"
    }
  }
});
native mongodb driver
using collection.createIndex
db.collection('users').createIndex({
    "email": 1
  }, {
    unique: true,
    partialFilterExpression: {
      "email": {
        $type: "string"
      }
    }
  },
  function (err, results) {
    // ...
  }
);
mongodb shell
using db.collection.createIndex:
db.users.createIndex({
  "email": 1
}, {
  unique: true,
  partialFilterExpression: {
    "email": {$type: "string"}
  }
})
This will allow inserting multiple records with a null email, or without an email field at all, but not with the same email string.
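Roughly, with that index in place, you would expect shell behaviour like this (a sketch, not captured output):
db.users.insertOne({name: "a"})                   // OK: no email field at all
db.users.insertOne({name: "b", email: null})      // OK: null is not a string, so not indexed
db.users.insertOne({name: "c", email: null})      // OK: multiple nulls are fine under the partial index
db.users.insertOne({name: "d", email: "x@y.io"})  // OK: first use of this string
db.users.insertOne({name: "e", email: "x@y.io"})  // fails with E11000 duplicate key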
Just a quick update to those researching this topic.
The selected answer will work, but you might want to consider using partial indexes instead.
Changed in version 3.2: Starting in MongoDB 3.2, MongoDB provides the option to create partial indexes. Partial indexes offer a superset of the functionality of sparse indexes. If you are using MongoDB 3.2 or later, partial indexes should be preferred over sparse indexes.
More doco on partial indexes: https://docs.mongodb.com/manual/core/index-partial/
Actually, only the first document where the "email" field does not exist will save successfully. Subsequent saves without "email" will fail with an error (see the code snippet below). For the reason, see the MongoDB official documentation on unique indexes and missing keys at http://www.mongodb.org/display/DOCS/Indexes#Indexes-UniqueIndexes.
// NOTE: Code to executed in mongo console.
db.things.ensureIndex({firstname: 1}, {unique: true});
db.things.save({lastname: "Smith"});
// Next operation will fail because of the unique index on firstname.
db.things.save({lastname: "Jones"});
By definition, a unique index only allows each value to be stored once. If you consider null one such value, it can only be inserted once! You are correct in your approach of ensuring and validating it at the application level. That is how it can be done.
You may also like to read this http://www.mongodb.org/display/DOCS/Querying+and+nulls
After being so used to SQL, I have come across this problem with MongoDB.
First, I am using mongoose.
Now, the problem. I have a collection named User.
var UserSchema = new Schema({
  id: ObjectId,
  name: {type: String, trim: true, required: true},
  email: {type: String, trim: true, required: true, index: {unique: true}},
  password: {type: String, required: true, set: passwordToMD5},
  age: {type: Number, min: 18, required: true, default: 18},
  gender: {type: Number, default: 0, required: true},
  height: {type: Number, default: 180, min: 140, max: 220},
  _eye_color: {type: ObjectId, default: null},
  location: {
    lon: {type: Number, default: 0},
    lat: {type: Number, default: 0}
  },
  status: {type: Number, required: true, default: 0}
}, {
  toObject: {virtuals: true},
  toJSON: {virtuals: true},
  collection: "user"
});
Now I need to select all users from this collection and sort them by a special attribute (say "rank"). This rank is calculated with certain logic depending on their distance from a point, their age compared with a given age, etc...
So now I was wondering how to compute this rank and then use it for sorting? I have tried virtuals; they are handy for computing additional info, but unfortunately it is not possible to sort find() results by a virtual field.
Of course I can calculate this rank in a virtual, select all records, and then do some JavaScript in the callback. But in that case, since I would select all the users, then sort, then limit, the JavaScript part might take too long...
I was thinking of using mapreduce, but I am not sure it will do what I want.
Can someone give me a hint whether my task is possible to do in MongoDB/mongoose?
EDIT 1
I have also tried the aggregation framework, and at first it seemed to be the best solution, thanks to $project. But then, when I needed to do the rank calculations, I found out that aggregation does not support many mathematical functions like sin, cos and sqrt. It is also impossible to use ordinary pre-defined JavaScript functions in a projection: the function gets called, but I was not able to pass the current record's fields to it.
{$project: {
  // mUtils.getDistance runs once in Node while the pipeline object is being
  // built, so `this.location` is undefined; the server never calls it per-document.
  distance_from_user: mUtils.getDistance(point, this.location)
}}
Inside the function, the second argument was undefined.
So I guess it is impossible to do my rank calculations with aggregation framework.
EDIT 2
OK, I know everyone tells me not to use mapreduce since it is not good for realtime queries, but as I cannot use aggregation, I think I'll try mapreduce. So let's say I have this map/reduce:
function map() {
  emit(1, // Or put a GROUP BY key here
    {
      name: this.name, // the field you want stats for
      age: this.age,
      lat: this.location.lat,
      lon: this.location.lon,
      distance: 0,
      rank: 0
    });
}

function reduce(key, values) {
  return val;
}

function finalize(key, value) {
  return value;
}
var command = {
  'mapreduce': "user",
  'map': map.toString(),
  'reduce': reduce.toString(),
  query: {$and: [{gender: user_params.gender}, {_id: {$ne: current_user_id}}]},
  'out': {inline: 1}
};
mongoose.connection.db.executeDbCommand(command, function (error, result) {
  if (error) {
    log(error);
    return;
  }
  log(result);
  return;
});
What should I write in reduce (or maybe change map) to calculate rank for every user?
The only real solution is to calculate the rank for each document and store it in the document. Since this value stays constant as long as the values in your document remain constant, you can simply recalculate it whenever you update the fields that affect it.
Map/reduce certainly isn't a good solution for this, nor is any other type of aggregation. Precalculating your rank and storing it with the document is the only option that scales if you're using MongoDB.
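A minimal sketch of that idea, assuming a hypothetical computeRank() helper that implements the distance/age formula in plain Node:
// Store the precalculated rank on the document, indexed so it can drive sorts.
UserSchema.add({rank: {type: Number, default: 0, index: true}});

UserSchema.pre('save', function (next) {
  // Recompute only when the inputs to the rank change.
  if (this.isModified('location') || this.isModified('age')) {
    this.rank = computeRank(this); // computeRank is an assumed helper, not a library call
  }
  next();
});

// Sorting then becomes a plain indexed query:
// User.find().sort({rank: -1}).limit(20).exec(callback);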
You are aware of the amount of computation such a thing would need - if you did it every time a user logs in, you would get interesting load peaks when lots of people log in within a short span of time - and your page (interface) would be heavily resource-bound (which is not good).
I'd recommend something a bit different: keep a ranking for every logged-on user and update it in intervals. Keep a "short session" and a "long session" (the long session is the one you use in a web browser, and the short one means "online, currently using the site") and regenerate ranks regularly only for the recently-active users, and rarely for those logged on in a long session - something like every five minutes. Much more scalable - and if a user is unhappy about his rank not being counted, you can always tweak the system to count his rank on demand.
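A rough sketch of that interval idea; every name here (lastSeen, computeRank, the five-minute window) is a hypothetical stand-in:
// Hypothetical periodic job: recompute ranks only for recently-active users.
setInterval(function () {
  var cutoff = new Date(Date.now() - 5 * 60 * 1000);
  User.find({lastSeen: {$gt: cutoff}}, function (err, users) {
    if (err) return log(err);
    users.forEach(function (u) {
      u.rank = computeRank(u); // assumed helper, as above
      u.save(function (err) { if (err) log(err); });
    });
  });
}, 5 * 60 * 1000);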
You might use mapreduce in such a case - your map function should emit only the data you need for computing the rank for a given user (age, lat, lon, whatever you need) AND a result (rank) field for the tested user (emit it empty). For the reduce function you'd need to look into sorting with mapreduce (it depends heavily on how you build the rank) - there you'd compute the rank (or some kind of sub-value) against the other users.
It looks like a good use case for MongoDB + Hadoop.
This presentation shows some of the possibilities of this combination.