Can we create completely separate indexes for completely separate queries on the same collection?
I want an efficient query for users retrieving their activities using an index like so
index{ userDBID: 1 }
Example query
ActivityModel.find({ userDBID }).lean();
I also want a separate efficient query for app-wide statistics, which also fetches activities but needs to use a separate compound index like so
index{season: 1, matchID: 1}
Example queries
ActivityModel.find({ season, matchID }).lean()
ActivityModel.find({ season }).lean();
I am finding it hard to find a solid high-quality answer. I know hint() seems to be a solution, but I am sceptical about that one.
Daniel
Of course you can.
You can just add:
schema.index({ userDBID: 1 });
schema.index({ season: 1, matchID: 1 });
right after your schema declaration, before saving the Model with mongoose.model('Model', schema);.
You will see (after a while) the new indexes added in the DB. If you use an inspection tool like MongoDB Compass you'll even have a visual representation.
I am using this efficiently in a production app so I am certain of this (just today's usage):
http://prntscr.com/qj1n2o
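As a rough illustration of why the two indexes do not get in each other's way, here is a toy model of the prefix rule for equality filters: an index is a candidate when the query filters on its leading field(s). `usablePrefixLength` and `pickIndex` are hypothetical helpers, not part of Mongoose or MongoDB, and the real query planner weighs much more (sorts, ranges, selectivity).

```javascript
// Toy model: which index could serve an equality query?
// Hypothetical helpers for illustration; not a MongoDB or Mongoose API.
const indexes = [
  ['userDBID'],
  ['season', 'matchID'],
];

// Count how many leading index fields the query filters on.
function usablePrefixLength(indexFields, queryFields) {
  let n = 0;
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break;
    n += 1;
  }
  return n;
}

// Pick the index with the longest usable prefix, or null (collection scan).
function pickIndex(queryFields) {
  let best = null;
  let bestLen = 0;
  for (const idx of indexes) {
    const len = usablePrefixLength(idx, queryFields);
    if (len > bestLen) {
      best = idx;
      bestLen = len;
    }
  }
  return best;
}

console.log(pickIndex(['userDBID']));          // [ 'userDBID' ]
console.log(pickIndex(['season', 'matchID'])); // [ 'season', 'matchID' ]
console.log(pickIndex(['season']));            // [ 'season', 'matchID' ]
console.log(pickIndex(['matchID']));           // null
```

To check what the server actually chooses for a given query, running it with `.explain()` is usually a better first step than forcing an index with `hint()`.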
I am reading through the MongoDB docs for the Node.js driver, particularly this index section https://www.mongodb.com/docs/drivers/node/current/fundamentals/indexes/#geospatial-indexes and it looks like all of the indexes they mention are for sortable / searchable data. So I wanted to ask whether I need indexes for the following use case:
I have this user document structure
{
email: string,
version: number,
otherData: ...
}
As far as I understand, I can query each user by _id, and this already has a default unique index applied to it? I also want to query users by email, so I created the following unique index
collection.createIndex({ email: 1 }, { unique: true })
Is my understanding correct that by creating this index I guarantee that:
Email is always unique
My queries like collection.findOne({email: 'my#email.com'}) are optimised?
Next, I want to perform update operations on user documents, but only on specific versions, so:
collection.updateOne({email: '...', version: 2}, update)
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
Yes, the unique constraint is enforced at the DB layer, so by definition the email will be unique. It is worth mentioning that this can affect insert/update performance, as the check has to run on each of those operations. In my experience you only start feeling this overhead at larger scale (hundreds of millions of documents in a single collection plus thousands of inserts a minute).
Yes, there is no other way to optimize this further.
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
You want to create a compound index; the syntax will look like this:
collection.createIndex({ email: 1, version: 1 }, { unique: true })
I will just say that by definition the (first) email index ensures uniqueness, so any additional filtering you add to the query and index will not really affect anything, as there will always be only one document with a given email in the DB. So why bother adding a "version" field to the index? If you need it in the query for filtering that's fine, but then you won't need to alter the existing index.
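One detail worth spelling out: a unique compound index enforces uniqueness of the combination of fields, not of each field on its own. The in-memory sketch below is only a stand-in (the `UniqueIndex` class is hypothetical, not a driver API, and the sample documents are made up) that shows the difference between the two constraints.

```javascript
// In-memory stand-in for a unique index; hypothetical, not the MongoDB driver.
class UniqueIndex {
  constructor(fields) {
    this.fields = fields;
    this.keys = new Set();
  }

  // Build the index key from the indexed fields of a document.
  keyOf(doc) {
    return JSON.stringify(this.fields.map((f) => doc[f]));
  }

  // Returns true if the document was accepted, false on a duplicate key.
  insert(doc) {
    const key = this.keyOf(doc);
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

const byEmail = new UniqueIndex(['email']);
const byEmailAndVersion = new UniqueIndex(['email', 'version']);

const v1 = { email: 'a@example.com', version: 1 };
const v2 = { email: 'a@example.com', version: 2 };

console.log(byEmail.insert(v1), byEmail.insert(v2));                     // true false
console.log(byEmailAndVersion.insert(v1), byEmailAndVersion.insert(v2)); // true true
```

So if a compound index is added anyway, it is still the original `{ email: 1 }` unique index that guarantees one document per email.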
I have three collections in MongoDB
achievements
students
student_achievements
achievements is a list of achievements a student can achieve in an academic year, while
the students collection holds the list of students in the school.
student_achievements holds documents where each document contains a studentId and an achievementId.
I have an interface where I use a select2 multiselect to allocate one or more achievements from achievements to students from students and save the result to the student_achievements collection. Right now I populate select2 with the available achievements from the database. I have also arranged that if a student is allocated the same achievement again, the system throws an error.
What I am trying to achieve is that an achievement already allocated to a student should not be available in the list, i.e. it should be removed when fetching the list for that student ID.
What function in MongoDB or its aggregation framework can I use to achieve this, i.e. to compare two collections and remove the common entries?
Perhaps your data-structure could be made different to make the problem easier to solve. MongoDB is a NoSQL schemaless store, don't try to make it be like a relational database.
Perhaps we could do something like this:
var StudentSchema = new Schema({
name: String,
achievements: [{ type: Schema.Types.ObjectId, ref: 'Achivement' }]
});
Then you can do something like this which will only add the value if it is unique to the achievements array:
db.student.update(
{ _id: 1 },
{ $addToSet: { achievements: <achivement id> } }
)
If you are using something like Mongoose you can also write your own middleware to remove orphaned docs:
AchivementSchema.post('remove', function(doc) {
// Remove all references to achievements in Student schema
});
Also, if you need to verify that the achievement exists before adding it to the set, you can do a findOne query before updating/inserting to verify.
Even with the post remove hook in place, there are certain cases where you can still end up with orphaned relationships. The best thing to do in those situations is to have a regularly run cron task do the cleanup when needed. These are some of the tradeoffs you encounter when using a NoSQL store.
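For the original question (hiding achievements a student already holds), the embedded array makes the filter straightforward. Below is a plain JavaScript sketch of the two operations on in-memory data; `addToSet` and `availableFor` are hypothetical helpers that mimic `$addToSet` and a `$nin` filter on `_id`, and the sample documents are made up.

```javascript
// Sample data (made up for illustration).
const achievements = [
  { _id: 'a1', title: 'Perfect attendance' },
  { _id: 'a2', title: 'Science fair winner' },
  { _id: 'a3', title: 'Debate champion' },
];
const student = { _id: 1, name: 'Sam', achievements: ['a2'] };

// Mimics $addToSet: append only if the value is not already present.
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
  return array;
}

// Mimics find({ _id: { $nin: student.achievements } }) on the achievements collection.
function availableFor(studentDoc, allAchievements) {
  const taken = new Set(studentDoc.achievements);
  return allAchievements.filter((a) => !taken.has(a._id));
}

addToSet(student.achievements, 'a2'); // no-op, already allocated
addToSet(student.achievements, 'a3'); // newly allocated

console.log(availableFor(student, achievements).map((a) => a._id)); // [ 'a1' ]
```

Against the database this would be two steps: read the student's achievements array, then query the achievements collection with that array in a `$nin` condition to fill the select2 list.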
I'm new to mongodb and nosql databases. I would really appreciate some input/help with my schema design so I don't shoot myself in the foot.
Data: I need to model Quotes. A Quote contains many Items. Each Item contains many Orders. Each Order is tied to a specific fiscal quarter. Ex. I have a Quote containing an Item which has Orders in Q3-14, Q4-14, Q1-15. Orders only go max 12 quarters (3 years) into the future. Specifically, I'm having trouble with modelling the Order-quarter binding. I'm trying to denormalize the data and embed Quote <- Items <- Orders for performance.
Attempts/Ideas:
Have an Order schema containing year and qNum fields. Embed an array of Orders in every Item. Could also create virtual qKey field for setting/getting via string like Q1-14
Create a hash that embeds Orders into an Item using keys like Q1-14. This would be nice, but isn't supported natively in Mongoose.
Store the current (base) quarter in each Quote, and have each Item contain an array of Orders, but have them indexed by #quarters offset from the base quarter. I.e. if it's currently Q1-14, and an order comes in for Q4-14, store it in array position 2.
Am I totally off the mark? Any advice is appreciated as I struggle to use Mongo effectively. Thank you
Disclaimer: I've embarked on this simply as a challenge to myself. See the <rant> below for an explanation as to why I disagree with your approach.
First step to getting a solid grasp on No-SQL is throwing out terms like "denormalize" – they simply do not apply in a document based data store. Another important concept to understand is there are no JOINS in MongoDB, so you have to change the way you think about your data completely to adjust.
The best way to solve your problem with mongoose is to setup collections for Quotes and Items separately. Then we can set up references between these collections to "link" the documents together.
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var quoteSchema = new Schema({
items: [{ type: Schema.Types.ObjectId, ref: 'Item' }]
});
var itemSchema = new Schema({});
That handles your Quotes -> Items "relationship". To get the Orders setup, you could use an array of embedded documents like you've indicated, but if you ever decided to start querying/indexing Orders, you'd be up a certain creek without a paddle. Again, we can solve this with references:
var itemSchema = new Schema({
orders: [{ type: Schema.Types.ObjectId, ref: 'Order' }]
});
var orderSchema = new Schema({
quarter: String
});
Now you can use population to get what you need:
Item
.findById(id)
.populate({
path: 'orders',
match: { quarter: 'Q1-14' }
})
.exec(function (err, item) {
console.log(item.orders); // logs an array of orders from Q1-14
});
Trouble with references is that you are actually hitting the DB with a read instruction twice, once to find the parent document, and then once to populate its references.
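To make the two-read cost concrete, here is an in-memory imitation of what population with a `match` filter does. Everything in it (the `db` object, `findById`, `populateOrders`, the sample documents) is hypothetical and only mirrors the shape of the Mongoose call above.

```javascript
// In-memory imitation of populate(); hypothetical, not Mongoose itself.
const db = {
  reads: 0,
  items: new Map([['i1', { _id: 'i1', orders: ['o1', 'o2', 'o3'] }]]),
  orders: new Map([
    ['o1', { _id: 'o1', quarter: 'Q1-14' }],
    ['o2', { _id: 'o2', quarter: 'Q4-14' }],
    ['o3', { _id: 'o3', quarter: 'Q1-14' }],
  ]),
};

// First read: fetch the parent document.
function findById(id) {
  db.reads += 1;
  return db.items.get(id);
}

// Second read: fetch the referenced orders, keeping only those that match.
function populateOrders(item, match) {
  db.reads += 1;
  const orders = item.orders
    .map((orderId) => db.orders.get(orderId))
    .filter((order) => order && order.quarter === match.quarter);
  return { ...item, orders };
}

const item = populateOrders(findById('i1'), { quarter: 'Q1-14' });
console.log(item.orders.map((o) => o._id)); // [ 'o1', 'o3' ]
console.log(db.reads);                      // 2
```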
You can read more about refs and population here: http://mongoosejs.com/docs/populate.html
<rant>
I could go on for hours about why you should stick to an RDBMS for this kind of data. Especially when the defense for the choice is a lack of an ORM and Mongo being "all the rage." Engineers pick the best tech for the solution, not because a certain tech is trending. It's the difference between weekend hackery and creating Enterprise level products. Don't get me wrong, this is not to trash No-SQL – the largest codebase I maintain is built on NodeJS and MongoDB. However, I chose those technologies because they were the right technologies for my document based problem. If my data had been a relational ordering system like yours, I'd ditch Mongo in a heartbeat.
</rant>
I've got a model which contains an array of embedded documents. These embedded documents keep track of the points the user has earned in a given activity. Since a user can be part of several activities or just one, it makes sense to keep these activities in an array. Now I want to extract the hall of fame, the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
.desc("stats.points")
.limit(10)
.run (err, users) ->
(if you are wondering about the syntax, it's CoffeeScript)
where "stats" is the array of embedded documents/activities.
Now this actually works, but currently I'm only testing with accounts that have a single activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to return only the embedded document where "activity" == "soccer" alongside the top-level document?
Btw, I realize I can do this another way, by having stats in its own collection with a db-ref to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!
You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just an arbitrary subset of an array with the element, you'll get back all of it and the sort will apply across all points, not just the ones "paired" with "activity":"soccer".
There is a pretty simple tweak that you could make to your schema to get around this though. Don't store the activity name as a value, use it as the key.
{ _id: userId,
email: email,
stats: [
{soccer : points},
{rugby: points},
{dance: points}
]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that when you move to version 2.2 (currently only available as unstable development version 2.1) you would be able to use aggregation framework to get the exact results you want (only a particular subset of an array or subdocument that matches your query) without changing your schema.
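For reference, the aggregation approach amounts to unwinding the array, keeping only the matching entries, then sorting and limiting (the `$unwind`, `$match`, `$sort` and `$limit` stages). The sketch below reproduces that logic in plain JavaScript on made-up data using the original `stats: [{ activity, points }]` schema; `hallOfFame` is a hypothetical helper, not a Mongoose method.

```javascript
// Made-up users in the original schema: stats is an array of { activity, points }.
const users = [
  { email: 'ann@example.com', stats: [{ activity: 'soccer', points: 5 }, { activity: 'rugby', points: 90 }] },
  { email: 'bob@example.com', stats: [{ activity: 'soccer', points: 40 }] },
  { email: 'cat@example.com', stats: [{ activity: 'dance', points: 70 }] },
];

// Plain-JS equivalent of $unwind -> $match -> $sort -> $limit.
function hallOfFame(allUsers, activity, limit = 10) {
  return allUsers
    // $unwind: one row per (user, stats entry)
    .flatMap((u) => u.stats.map((s) => ({ email: u.email, ...s })))
    // $match: keep only the requested activity
    .filter((row) => row.activity === activity)
    // $sort: highest points first
    .sort((a, b) => b.points - a.points)
    // $limit: keep the top entries
    .slice(0, limit);
}

console.log(hallOfFame(users, 'soccer').map((r) => `${r.email}:${r.points}`));
// [ 'bob@example.com:40', 'ann@example.com:5' ]
```

Note that Ann's 90 rugby points do not push her above Bob here, which is exactly what sorting the whole document on "stats.points" cannot guarantee.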
I have a CouchDB database (we'll say it holds project time card related data: project code, person, person's job title, task, date, hours worked, their bill rate, etc.). I want to create summary views of the project by day... or by person, or by task, or by title, or by any single attribute.
I'm concerned that I'm heading down an unsustainable path and that my database size may end up far bigger than it needs to be.
I created a view with a map function that emits each document several times, once for each attribute. That works. But does that ever reach an end point where you should stop?
I have multiple emits:
emit([doc.project, 'day', doc.day], doc);
emit([doc.project, 'month', doc.month], doc);
emit([doc.project, 'person', doc.person], doc);
emit([doc.project, 'job title', doc['persons-job-title']], doc);
emit([doc.project, 'task', doc.task], doc);
Then always query with a start/end key of [project, ] to [project, , {}]
Will my database eventually just get so huge as to make it prohibitively expensive to add any new data? Is multi-emit() the preferred method for doing what I'm trying to do? Is there a better/different way out there?
Would creating the emit's dynamically based on the document just be asking for trouble in the case of some giant document coming through and creating huge storage requirements?
Basically, is there a point where I should just stop the madness?
First of all:
Don't emit the doc as a value... you can use &include_docs=true, if you need the data in the result sets.
Second:
Assuming that your docs cover more than one project:
Does it make sense to ask for the projects on a day without also giving the month?
If not, you can use emit([doc.project,'monthday',doc.month,doc.day],1)
then you can ask for all Projects in a Month:
startkey=["project1","monthday",3]&endkey=["project1","monthday",3,{}]
day of a month:
key=["project1","monthday",3,9]
If you're using a simple reduce-function (_sum) you would have the benefit of asking, how many days a project has (+in a month):
startkey=["project1","monthday"]&endkey=["project1","monthday",{}]&group_level=3
...
{"key":["project1","monthday",2],"value":1}, // 1 day in month 2
{"key":["project1","monthday",3],"value":2}  // 2 days in month 3
using group_level=4 (same as reduce=false):
{"key":["project1","monthday",2,20],"value":1},
{"key":["project1","monthday",2,21],"value":1},
{"key":["project1","monthday",3,1],"value":1},
of course you can combine the last case with &include_docs=true to get the data
Third:
It is ok to emit more than one Value per Document...
Of course you could separate the emits into different views, so you do not need the second key.
Try to figure out, which information belongs together and are useless without others (like day/month, person/jobtitle?)
Fourth:
it is not expensive adding data.. just building views ;-)
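Putting the first two points together, a map function along these lines emits a small value instead of the whole document. The harness around it (the in-memory `emit`, `sumByGroupLevel`, the sample docs) is hypothetical and only imitates what CouchDB does with `emit` and a `_sum` reduce; the field names `month`, `day` and `person` are assumed from the question.

```javascript
const rows = [];

// Stand-in for CouchDB's built-in emit(); collects rows in memory.
function emit(key, value) {
  rows.push({ key, value });
}

// Map function as it could appear in a design document: emit 1, not the doc.
function map(doc) {
  emit([doc.project, 'monthday', doc.month, doc.day], 1);
  emit([doc.project, 'person', doc.person], 1);
}

// Imitates a _sum reduce with group_level over rows whose key starts with prefix.
function sumByGroupLevel(allRows, prefix, level) {
  const totals = new Map();
  for (const { key, value } of allRows) {
    if (!prefix.every((part, i) => key[i] === part)) continue;
    const group = JSON.stringify(key.slice(0, level));
    totals.set(group, (totals.get(group) || 0) + value);
  }
  return [...totals].map(([group, value]) => ({ key: JSON.parse(group), value }));
}

// Made-up time card documents.
const docs = [
  { project: 'project1', month: 2, day: 20, person: 'ann' },
  { project: 'project1', month: 3, day: 1, person: 'ann' },
  { project: 'project1', month: 3, day: 9, person: 'bob' },
];
docs.forEach((doc) => map(doc));

console.log(JSON.stringify(sumByGroupLevel(rows, ['project1', 'monthday'], 3)));
// [{"key":["project1","monthday",2],"value":1},{"key":["project1","monthday",3],"value":2}]
```

Keep in mind that the sum counts emitted rows, so with several time card documents on the same day the value is the number of documents rather than the number of distinct days.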