Pull a document in one collection using a property from a document in another collection (Mongo, MEAN Stack) - node.js

Context:
I am working on a test-taking web-app, where users answer questions in an examination format.
I currently have two collections:
tests
questions
Each document in the tests collection has a questions array that contains the Mongo IDs of documents in the questions collections.
My Question...
Is it possible to (all at once / in one go): Retrieve a specific document in tests using a provided Mongo ID and then use the Mongo IDs saved in the questions array (within that document) to then pull documents from questions?
My closest guess is to use Mongoose's DBRef convention, but I can't quite understand how to use it in this context (and even if I did, I don't understand how I can retrieve multiple questions and save them under a single test).
I would appreciate any and all help with this!
P.S. The reason questions and tests are separate is so that we can randomize the order of questions when the user takes the exam in the web-app.

Go the other direction. Put the testId on the Question model:
var TestSchema = new mongoose.Schema({
  name: String
});
var Test = mongoose.model("Test", TestSchema);

var QuestionSchema = new mongoose.Schema({
  testId: {
    type: mongoose.Schema.ObjectId,
    ref: "Test"
  },
  text: String,
  answer: String
});
QuestionSchema.index({ testId: 1 });
var Question = mongoose.model("Question", QuestionSchema);
There are three situations. First, you have a question document in memory and you want to find the test it belongs to:
Test.findOne({_id: question.testId},callback);
Or you have a question document in memory and want to find all questions that belong to the same test:
Question.find({testId: question.testId}, callback);
Or, you have a test document in memory and want to find all of its questions:
Question.find({testId: test._id}, callback);
I see populate() pop up as an answer on any question resembling yours. I want to make sure people realize populate() isn't a SQL JOIN. From the docs:
Populated paths are no longer set to their original _id, their value is replaced with the mongoose document returned from the database by performing a separate query before returning the results.
populate() is just syntactic sugar for serializing a second find().
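If you still want the caller to receive a test and its questions in one call, here is a minimal sketch under the schemas above, using the same callback style as the queries shown earlier (getTestWithQuestions is just an illustrative name):
function getTestWithQuestions(testId, callback) {
  // First query: fetch the test itself.
  Test.findOne({ _id: testId }, function (err, test) {
    if (err || !test) return callback(err, null);
    // Second query: fetch every question that points back at this test.
    Question.find({ testId: test._id }, function (err, questions) {
      if (err) return callback(err, null);
      // Hand both back together; nothing extra is persisted.
      callback(null, { test: test, questions: questions });
    });
  });
}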

Related

how to set mongoose indexes correctly and test them

I want to set 2 indexes for now, perhaps a 3rd, but I wanted to know how I can test whether they are actually working. Do I need to use the mongo shell, or is there a way to check using Node.js during development? I also saw an example of the indexes being created in MongoDB Compass. I am using MongoDB Atlas, so I wondered whether I should just set the index in Compass or whether I still need to do it in my Mongoose schema?
Also, the mongoose docs say you should set autoIndex to false. Is the below then correct?
const mongoose = require("mongoose");
const Schema = mongoose.Schema;

const userSchema = new Schema({
  firstName: {
    type: String,
  },
  lastName: {
    type: String,
  },
});

userSchema.set("autoIndex", false);
userSchema.index({ firstName: 1, lastName: 1 });

module.exports = mongoose.model("User", userSchema);
There are a bunch of different questions here, let's see if we can tackle them in order.
I want to set 2 indexes for now, perhaps a 3rd
This isn't a question from your side, but rather from mine. What are the indexes that you are considering and what queries will you be running?
The reason I ask is because I only see a single index definition provided in the question ({ firstName: 1, lastName: 1 }) and no query. Normally indexes are designed specifically to support the queries, so the first step towards ensuring a successful indexing strategy is to make sure they align appropriately with the anticipated workload.
how I can test whether they are actually working? Do I need to use the mongo shell or is there a way to check using Node.js during development?
There are a few ways to approach this, which include:
Using the explain() method to confirm that the winningPlan is using the index as expected. This is often done via the MongoDB Shell or via Compass (see the sketch after this list).
Using the $indexStats aggregation stage to confirm that usage counters of the index are incrementing as expected when the application runs.
Taking a look at some of the tabs in the Atlas UI such as Performance Advisor or the Profiler which may help alert you to unoptimized operations and missing indexes.
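For the first two approaches, a rough sketch in the MongoDB shell, assuming a users collection holding documents from the User model in the question; the sample values are placeholders:
// Run a representative query and inspect the plan; an IXSCAN stage on
// firstName_1_lastName_1 in the winningPlan means the index was used.
db.users.find({ firstName: "Ada", lastName: "Lovelace" }).explain("executionStats")

// Check that the usage counters for each index increase as the application runs.
db.users.aggregate([{ $indexStats: {} }])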
I am using mongoDb Atlas so wondered if I must just set the index in Compass or do I still need to do it in my mongoose schema?
You can use Compass (or the Atlas UI, or the MongoDB Shell) to create your indexes. I would recommend against doing this in the application directly.
Also, the mongoose docs say you should set autoIndex to false. Is the below then correct?
As noted above, I would go further and remove index creation from the application code altogether. There can be some unintended side effects of making the application directly responsible for index management, which is one of the reasons that Mongoose no longer recommends using the autoIndex functionality.
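If you do keep Mongoose out of index management entirely, a minimal sketch of disabling automatic index builds for the whole connection instead of per schema (the connection string is a placeholder):
// Indexes are then created only through Atlas, Compass, or the shell.
mongoose.connect("mongodb+srv://<your-cluster>/<your-db>", { autoIndex: false });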

Mongoose, How to limit the query based on the sum of a field in the document

I have a document in the shape of
const Model = mongoose.Schema({
  something1: { type: String },
  someNumber1: { type: Number },
  something2: { type: String },
  someNumber2: { type: Number },
  aFloatNumber: { type: Number }
});
and after indexing the document like
Model.index({something1:1 , something2:1 , aFloatNumber:1})
for better performance, which I hope I am doing right (please correct me if I am doing it wrong).
I am trying to query using this syntax:
const model = await Model.find({
  $and: [{ something1: anInput }, { something2: anotherInput }] })
  .sort(aFloatNumber)
Now I want to limit the returned query, as it could be a very large list, to improve performance; however, this limit changes based on an input. Basically I want Mongoose to keep adding someNumber1 together and stop returning results after the sum gets larger than the input number. Something like the code below:
const model = await Model.find({
  $and: [{ something1: anInput }, { something2: anotherInput }] })
  .sort(aFloatNumber)
  .limit( sum(someNumber1) >= theInputNumber )
So basically my questions are:
Am I indexing the document correctly based on my query?
Does limiting the query make any difference to performance, since it is sorting the data and I think it will have to check all the documents in order to sort them?
If it makes a big difference to performance, what is the correct syntax for it, as I am going to run this query a lot in my application?
You're asking for the skip function of MongoDB, which is like OFFSET in SQL:
https://docs.mongodb.com/manual/reference/operator/aggregation/skip/
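If offset-style limiting is what you're after, a minimal sketch with Mongoose that reuses the query from the question (page and pageSize are illustrative variables):
// Skip the first page * pageSize matches, then return at most pageSize documents.
const docs = await Model.find({ something1: anInput, something2: anotherInput })
  .sort({ aFloatNumber: 1 })
  .skip(page * pageSize)
  .limit(pageSize);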

Should I create New Schema Model file for every route OR use already created Schema?

Suppose I have a User schema which has around 30 fields, and 3 other schemas as well.
UserSchema.js
user_schema = new Schema({
  user_id: { type: String },
  ......... // 30 properties
});
ctrs_schema = new Schema({
  ......... // 10 properties
});
ids_schema = new Schema({
  ......... // 5 properties
});
comments_schema = new Schema({
  ......... // 10 properties
});
Now I am writing a route which will change the gender of the user. To do this I can use UserSchema.js, but that will load all of the schemas into my route, whereas if I had created a new file with only one schema containing two fields, then not all of the schemas would be loaded into memory for that route.
UserGenderSchema.js
gender_schema = new Schema({
  user_id: { type: String },
  gender: { type: String }
});
I know there are pros and cons to both approaches.
Pros -
I only have to edit a single file if I need to change something for any field.
Cons -
All schemas are loaded for all routes, which is unnecessary and wastes memory.
Will there be any difference in memory usage between the two approaches?
Can anyone please tell me which architecture is better, or what you are implementing in your project and why?
Thanks
It's better to keep user-related fields in just one schema. Mongo exists because of its non-relational structure, and it gains its performance by keeping relational structures away. If you create a schema for each field and then add a reference in each of them pointing back to the user it belongs to, you are effectively using Mongo to build a heavily relational structure, and Mongo is not as good as it should be in that situation. If later on your application needs to show all of a user's information, update multiple user fields, or display more user information in one of your routes, you will end up with some serious performance issues. In conclusion, the cost of loading the whole schema to touch only one field is not as high as the cost of breaking down your data structure.
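For the gender route in the question, a minimal sketch of why the single schema is not a real cost: updateOne only sends the one changed field to the server, no matter how many fields the schema declares (someUserId and newGender are illustrative names):
// Only { gender: newGender } travels over the wire; the other ~30 fields are untouched.
User.updateOne(
  { user_id: someUserId },
  { $set: { gender: newGender } },
  function (err, result) {
    // handle err / inspect result here
  }
);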

How to find a sub document in mongoose without using _id fields but using multiple properties

I have a sample schema like this -
Comment.add({
  text: String,
  url: { type: String, unique: true },
  username: String,
  timestamp: { type: Date, default: Date }
});
Feed.add({
  url: { type: String, unique: true },
  username: String,
  message: { type: String, required: '{PATH} is required!' },
  comments: [Comment],
  timestamp: { type: Date, default: Date }
});
Now, I don't want to expose the _id fields to the outside world that's why I am not sending it to the clients anywhere.
Now, I have two important properties in my comment schema (username,url)
What I want to do is update the content of the sub document that satisfies
feed.url
comment.url
comment.username
If comment.username is the same as my client value req.user.username, then update the comment.text property of the record whose url was supplied by the client in the req.body.url variable.
One long and time-consuming approach I thought of is to first find the feed with the given url, then iterate over all the subdocuments to find the one that satisfies comment.url == req.body.url, then check whether comment.username == req.user.username and, if so, update the comment object.
But, I think there must be an easier way of doing this?
I already tried -
db.feeds.update({"username":"harshitladdha93#gmail.com","comments.username":"harshitladdha3#gmail.com","comments.url":"test"},{$set:{"comments.$.text":"updated text 2"}})
found from http://www.tagwith.com/question_305575_how-to-find-and-update-subdocument-within-array-based-on-parent-property
but this updates even when the comments.url or comments.username matches other sub documents
and I also tried
db.feeds.distinct("comments._id",{"comments.url":req.body.url})
to find the _id of the document associated with the url, but it returns all the _ids in the subdocuments
First off - you should not rely on _id not being seen by the outside world in terms of security. This is a very bad idea for a multitude of reasons (primarily REST and also the fact that it's returned by default with all your queries).
Now, to address your question, what you want is the $elemMatch operator. This says that you're looking for something where the specified sub-document within an array matches multiple queries.
E.g.
db.feeds.update({
  "username": "harshitladdha93#gmail.com",
  comments: {
    $elemMatch: {
      username: "harshitladdha3#gmail.com",
      url: "test"
    }
  }
}, {$set: {"comments.$.text": "updated text 2"}})
If you don't use $elemMatch you're saying that you're ok with the document if any of the comments match your query - i.e. if there is a comment by user "harshitladdha3#gmail.com", and a separate comment has the url "test", the document will match unless you use $elemMatch.
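The same update expressed through Mongoose, matching on the parent feed's url plus both comment properties from the question (Feed is assumed to be the compiled model, and req.body.feedUrl / req.body.text are illustrative request fields):
Feed.updateOne(
  {
    url: req.body.feedUrl, // identifies the parent feed
    comments: {
      $elemMatch: { username: req.user.username, url: req.body.url }
    }
  },
  // The positional $ refers to the first array element matched by $elemMatch.
  { $set: { "comments.$.text": req.body.text } },
  function (err, result) {
    // handle err / result
  }
);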

Denormalization with Mongoose: How to synchronize changes

What is the best way to propagate updates when you have a denormalized Schema? Should it be all done in the same function?
I have a schema like so:
var Authors = new Schema({
  ...
  name: {type: String, required: true},
  period: {type: Schema.Types.ObjectId, ref: 'Periods'},
  quotes: [{type: Schema.Types.ObjectId, ref: 'Quotes'}],
  active: Boolean,
  ...
})
Then:
var Periods = new Schema({
  ...
  name: {type: String, required: true},
  authors: [{type: Schema.Types.ObjectId, ref: 'Authors'}],
  active: Boolean,
  ...
})
Now say I want to denormalize Authors, since the period field will always just use the name of the period (which is unique, there can't be two periods with the same name). Say then that I turn my schema into this:
var Authors = new Schema({
  ...
  name: {type: String, required: true},
  period: String, // no longer a ref
  active: Boolean,
  ...
})
Now Mongoose doesn't know anymore that the period field is connected to the Period schema. So it's up to me to update the field when the name of a period changes. I created a service module that offers an interface like this:
exports.updatePeriod = function(id, changes) {...}
Within this function I go through the changes to update the period document that needs to be updated. So here's my question. Should I, then, update all authors within this method? Because then the method would have to know about the Author schema and any other schema that uses period, creating a lot of coupling between these entities. Is there a better way?
Perhaps I can emit an event that a period has been updated and all the schemas that have denormalized period references can observe it, is that a better solution? I'm not quite sure how to approach this issue.
Ok, while I wait for a better answer than my own, I will try to post what I have been doing so far.
Pre/Post Middleware
The first thing I tried was to use the pre/post middlewares to synchronize documents that referenced each other. (For instance, if you have Author and Quote, and an Author has an array of the type: quotes: [{type: Schema.Types.ObjectId, ref:'Quotes'}], then whenever a Quote is deleted, you'd have to remove its _id from the array. Or if the Author is removed, you may want all his quotes removed).
This approach has an important advantage: if you define each Schema in its own file, you can define the middleware there and have it all neatly organized. Whenever you look at the schema, right below you can see what it does, how its changes affect other entities, etc:
var Quote = new Schema({
  // fields in schema
})

// it's quite clear what happens when you remove an entity
Quote.pre('remove', function(next) {
  Author.update(
    // remove the quote from the Author's quotes array, then call next()
  )
})
The main disadvantage, however, is that these hooks are not executed when you call update() or any of the Model's static updating/removing functions. Rather, you need to retrieve the document and then call save() or remove() on it.
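To make that pitfall concrete, a quick sketch; it assumes the schema above has also been compiled into a model, quote is a fetched document instance, and callback is illustrative:
var QuoteModel = mongoose.model('Quote', Quote)

// This triggers the pre('remove') hook defined above...
quote.remove(callback)
// ...but this static call bypasses document middleware, so the hook never runs:
QuoteModel.remove({ _id: quote._id }, callback)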
Another, smaller disadvantage is that Quote now needs to be aware of everything that references it, so that it can update them whenever a Quote is updated or removed. So let's say that a Period has a list of quotes and an Author has a list of quotes as well; Quote will need to know about both of these in order to update them.
The reason the hooks are skipped is that these static functions send atomic queries to the database directly. While this is nice, I hate the inconsistency between using save() and Model.update(...). Maybe somebody else, or you in the future, accidentally uses the static update functions and your middleware isn't triggered, giving you headaches that you struggle to get rid of.
NodeJS Event Mechanisms
What I am currently doing is not really optimal, but it offers me enough benefits to actually outweigh the cons (or so I believe; if anyone cares to give me some feedback, that'd be great). I created a service that wraps around a model, say AuthorService, that extends events.EventEmitter and is a constructor function that looks roughly like this:
function AuthorService() {
  var self = this
  this.create = function() {...}
  this.update = function() {
    ...
    self.emit('AuthorUpdated', before, after)
    ...
  }
}
util.inherits(AuthorService, events.EventEmitter)
module.exports = new AuthorService()
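A hedged usage sketch of the consumer side (the file path and listener body are illustrative): another module requires the service and reacts to the emitted event.
var authorService = require('./AuthorService')

// Runs every time AuthorService.update() emits 'AuthorUpdated'.
authorService.on('AuthorUpdated', function(before, after) {
  // e.g. propagate a changed author name to documents that embed it
})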
The advantages:
Any interested function can register to the Service events and be notified. That way, for instance, when a Quote is updated, the AuthorService can listen to it and update the Authors accordingly. (Note 1)
Quote doesn't need to be aware of all the documents that reference it; the Service simply triggers the QuoteUpdated event, and all the documents that need to perform operations when this happens will do so.
Note 1: As long as this service is used whenever anyone needs to interact with mongoose.
The disadvantages:
Added boilerplate code, using a service instead of mongoose directly.
Now it isn't exactly obvious what functions get called when you trigger the event.
You decouple producer and consumer at the cost of legibility (since you just emit('EventName', args), it's not immediately obvious which Services are listening to this event).
Another disadvantage is that someone can retrieve a Model from the Service and call save(), in which case the events won't be triggered, though I'm sure this could be addressed with some kind of hybrid between these two solutions.
I am very open to suggestions in this field (which is why I posted this question in the first place).
I'm gonna speak more from an architectural point of view than a coding point of view since when it comes right down to it, you can pretty-much achieve anything with enough lines of code.
As far as I've been able to understand, your main concern has been keeping consistency across your database, mainly removing documents when their references are removed and vice-versa.
So in this case, rather than wrapping the whole functionality in extra code I'd suggest going for atomic Actions, where an Action is a method you define yourself that performs a complete removal of an entity from the DB (both document and reference).
So, for example, when you want to remove an author's quote, you remove the Quote document from the DB and then remove the reference from the Author document.
This sort of architecture ensures that each of these Actions performs a single task and performs it well, without having to tap into events (emitting, consuming) or any other stuff. It's a self-contained method for performing its own unique task.
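A minimal sketch of such an Action, under the Author/Quote schemas from the question and assuming both have been compiled into models (removeQuote is just an illustrative name; error handling is kept to a bare minimum):
function removeQuote(quoteId, authorId, callback) {
  // Step 1: delete the Quote document itself.
  Quote.remove({ _id: quoteId }, function (err) {
    if (err) return callback(err);
    // Step 2: pull the now-dangling reference out of the Author's quotes array.
    Author.update(
      { _id: authorId },
      { $pull: { quotes: quoteId } },
      callback
    );
  });
}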
