Compare two collections in MongoDB and remove the common entries - node.js

I have three collections in MongoDB:
achievements
students
student_achievements
achievements is a list of achievements a student can earn in an academic year, while students holds the list of students in the school.
student_achievements holds documents that each contain a studentId and an achievementId.
I have an interface where I use a select2 multiselect to allocate one or more achievements from achievements to students from students and save the result to student_achievements. Right now I populate select2 with the available achievements from the database. I have also arranged for the system to throw an error if a student is allocated the same achievement again.
What I am trying to achieve is this: once an achievement has been allocated to a student, it should no longer appear in the list, i.e. it should be filtered out when fetching the list for that student ID.
What function in MongoDB or its aggregation framework can I use to achieve this, i.e. to compare two collections and remove the common entries?

Perhaps your data structure could be changed to make the problem easier to solve. MongoDB is a schemaless NoSQL store; don't try to make it behave like a relational database.
Perhaps we could do something like this:
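var mongoose = require('mongoose');
var Schema = mongoose.Schema;
// Each student holds references to the achievements already allocated to them
var StudentSchema = new Schema({
  name: String,
  achievements: [{ type: Schema.Types.ObjectId, ref: 'Achievement' }]
});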
Then you can do something like this which will only add the value if it is unique to the achievements array:
db.student.updateOne(
  { _id: 1 },
  { $addToSet: { achievements: <achievement id> } } // no change if the id is already in the array
)
If you are using something like Mongoose you can also write your own middleware to remove orphaned docs:
AchievementSchema.post('remove', function (doc) {
  // Post hooks receive the removed document rather than next.
  // Remove all references to doc._id from the Student documents here.
});
Also, if you need to verify that the achievement exists before adding it to the set, you can do a findOne query before updating/inserting to verify.
Even with the post remove hook in place, there are cases where you can still end up with orphaned relationships. The best thing to do in those situations is to run a regular cron task that cleans up when needed. These are some of the tradeoffs you encounter when using a NoSQL store.
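To come back to the original question with the array-on-student layout: the list for the select2 dropdown is every achievement whose _id is not already in that student's achievements array. Here is a minimal sketch, assuming the standard $nin query operator; the helper names and the sample data are made up for illustration, and the in-memory function only mirrors what the query would return:

```javascript
// Hypothetical helper: builds the filter to pass to achievements.find(...)
// so that achievements the student already holds are excluded.
function availableAchievementsFilter(student) {
  return { _id: { $nin: student.achievements || [] } };
}

// In-memory equivalent of the same filter, handy for checking the logic
// without a database. IDs are compared as strings.
function availableAchievements(allAchievements, student) {
  const held = new Set((student.achievements || []).map(String));
  return allAchievements.filter((a) => !held.has(String(a._id)));
}

// Illustrative sample data (not from the question).
const achievements = [
  { _id: 'a1', title: 'Perfect attendance' },
  { _id: 'a2', title: 'Science fair winner' },
  { _id: 'a3', title: 'Debate finalist' },
];
const student = { _id: 1, name: 'Sample student', achievements: ['a2'] };

console.log(availableAchievementsFilter(student));
console.log(availableAchievements(achievements, student).map((a) => a._id));
```

If you keep the separate student_achievements collection instead, the same $nin filter should still work once you have collected the achievementId values for that studentId.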

Related

Can mongoose batch update based on an array of objects that matches the collection?

I am working on a project in Express/Node, using a MongoDB database with a collection of Course documents. Each document represents a course in my school system, which changes in real time. The Course documents in my database each look like this:
Course Document
{
  courseID: Number,
  restrictions: String,
  status: String,
}
My program has to check the school's course system for changes and apply any changes it sees to my private MongoDB database. To accomplish this, I currently have a script that reads all the courses in the school system and records them in an array of objects, with each object corresponding to a course.
var allCourses =
[
  {
    courseID: 123456,
    restrictions: "A and B",
    status: "OPEN"
  },
  {
    courseID: 678990,
    restrictions: "A",
    status: "FULL",
  }
]
The goal now is to go through my database, skip the documents that are identical to the corresponding JavaScript object in the array, and update those that are not.
Obviously, I could just iterate through my array with forEach, and update every single course by filtering by 'courseID' and updating both fields one document at a time, but I can foresee that this would take a large amount of time.
I was wondering if there was a batch update function, similar to the insertMany operation, that can take my array of objects and update my database documents that correspond to an object within the array?
These links may be helpful:
Trying to do a bulk upsert with Mongoose. What's the cleanest way to do this?
https://docs.mongodb.com/manual/reference/method/db.collection.insertMany/
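To make the bulk idea concrete: Mongoose models have a bulkWrite method that takes an array of operations and sends them in one batch, so the array of course objects can be mapped to updateOne operations first. This is only a sketch; the helper name buildCourseOps and the model name Course are assumptions, and the upsert flag is optional:

```javascript
// Hypothetical helper: turns the scraped course array into bulkWrite operations.
function buildCourseOps(allCourses) {
  return allCourses.map((course) => ({
    updateOne: {
      filter: { courseID: course.courseID },
      update: { $set: { restrictions: course.restrictions, status: course.status } },
      upsert: true, // insert the course if it is not in the collection yet
    },
  }));
}

const allCourses = [
  { courseID: 123456, restrictions: 'A and B', status: 'OPEN' },
  { courseID: 678990, restrictions: 'A', status: 'FULL' },
];

const ops = buildCourseOps(allCourses);
console.log(JSON.stringify(ops, null, 2));

// With a Mongoose model, assumed here to be called Course:
//   await Course.bulkWrite(ops);
```

As far as I know, a $set that does not change any value leaves the document as it is, so there should be no need to compare the fields yourself before sending the batch.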

How to Create Relationships Using MongoDB?

I am working on a web application that uses a MongoDB database and Express/Node.js. I want to create a project in which I have users, and users can have posts, which can have many attributes, such as title, creator, and date. I am confused about how to do this so that I avoid duplication in my database. I tried references, storing the IDs of all of a user's posts in a list like this: [postID1, postID2, postID3, etc...]. The problem is that I want to be able to query all of a user's posts and display them in an EJS template, but I don't know how to do that. How would I use references? What should I do to make this model optimal for relationships?
Any help would be greatly appreciated!
Thank you!
This is a classic parent-child relationship, and your problem is that you're storing the relationship in the wrong record :-). The parent should never contain the reference to the children. Instead, each child should have a reference to the parent. Why not the other way around? It's a bit of a historical quirk: it's done that way because a classic relational table can't have multiple values for a single field, which means you can't easily store multiple child IDs in a relational table, whereas since each child will only ever have one parent, it's easy to set up a single field in the child. A Mongo document can have multiple values within a single field by using arrays, but unless you really have a good reason to do so, it's just better to follow the historical paradigm.
How does this apply in your situation? What you're trying to do is to store references to all the children (i.e. the post IDs) as a list in the parent (i.e. an array in the user document). This is not the usual way to do this. Instead, in each child (i.e. in each post), have a field called user_id, and store the userID there.
Next, make sure you create an index on the user_id field.
With that setup, it's easy to take a post and figure out who the user was (just look at the user_id field). And if you want to find all of a user's posts, just do posts.find({user_id: 'XXXX'}). If you have an index on that field, the find will execute quickly.
Storing parent references in the child is almost always better than storing child references in the parent. Even though Mongo is flexible enough to let you structure it either way, storing child references in the parent is not preferred unless you have a real reason for it.
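To illustrate the shape described above, here is a small in-memory sketch of posts that each carry a user_id, with the two lookups you would normally do against the database. The sample data and function names are made up; in MongoDB the first lookup corresponds to posts.find({user_id: ...}), ideally backed by an index on user_id:

```javascript
// Illustrative sample data in the "child holds the parent reference" shape.
const users = [
  { _id: 'U1', name: 'alice' },
  { _id: 'U2', name: 'bob' },
];
const posts = [
  { _id: 'P1', title: 'First post', user_id: 'U1' },
  { _id: 'P2', title: 'Second post', user_id: 'U1' },
  { _id: 'P3', title: 'Hello', user_id: 'U2' },
];

// In-memory equivalent of posts.find({ user_id: userId }).
function postsOfUser(userId) {
  return posts.filter((p) => p.user_id === userId);
}

// Equivalent of reading post.user_id and then fetching that user by _id.
function authorOfPost(postId) {
  const post = posts.find((p) => p._id === postId);
  return post ? users.find((u) => u._id === post.user_id) : undefined;
}

console.log(postsOfUser('U1').map((p) => p.title));
console.log(authorOfPost('P3').name);
```

The array returned by the first lookup is what you would hand to your template to render a user's posts.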
EDIT
If you do have a valid reason for storing the child references in the parent, then assuming a structure like this:
user = {posts: [postID1, postID2, postID3, ...]}
You can find the user for a specific post with user.find({posts: "XXXX"}). MongoDB is smart enough to know that you're searching for a user whose posts array contains the element "XXXX". And if you create an index on the posts field, the query should be pretty quick.
I would like to mention that there is nothing wrong with the parent containing child references, at least in NoSQL databases. It all depends on what suits your needs.
You have a one-to-many relationship between users and posts, and you can model your data in the following 3 ways:
Embedded Data Model
{
  user: "username",
  post: [
    {
      title: "Title-1",
      creator: "creator",
      published_date: ISODate("2010-09-24")
    },
    {
      title: "Title-2",
      creator: "creator",
      published_date: ISODate("2010-09-24")
    }
  ]
}
Parent containing child references
{
  user: "username",
  posts: [123456789, 234567890, ...]
}
{
  _id: 123456789,
  title: "Title-1",
  creator: "creator",
  published_date: ISODate("2010-09-24")
}
{
  _id: 234567890,
  title: "Title-2",
  creator: "creator",
  published_date: ISODate("2010-09-24")
}
Child containing parent reference
{
  _id: "U123",
  name: "username"
}
{
  _id: 123456789,
  title: "Title-1",
  creator: "creator",
  published_date: ISODate("2010-09-24"),
  user: "U123"
}
{
  _id: 234567890,
  title: "Title-2",
  creator: "creator",
  published_date: ISODate("2010-09-24"),
  user: "U123"
}
According to the MongoDB docs (I have adapted the paragraph below to your case):
When using references, the growth of the relationships determine where
to store the reference. If the number of posts per user is small
with limited growth, storing the post reference inside the user
document may sometimes be useful. Otherwise, if the number of posts
per user is unbounded, this data model would lead to mutable,
growing arrays.
Reference: https://docs.mongodb.com/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
Now you have to decide what is best for your project keeping in mind that your model should satisfy all the test cases
Peace,

MongoDb and Storing Relationships Between Objects

I am currently planning the development of an application using Node and I am stuck on whether or not I should use MongoDB as a database. Ideally I would like to use it. I understand how it works in general, but what I don't understand is how to reference other objects within a document model.
For example, let's say I have two objects; a User and an Order object.
{
  Order : {
    Id: 1,
    Amount: 23.95
  }
}
{
  User: {
    Id: 1,
    Orders: [ ]
  }
}
Essentially, a User will place an order, and upon creation of that Order object, I would like for the User object to update the Orders array appropriately.
First of all, I hear a lot about MongoDB lacking relational functionality. So would I be able to store a reference to that order in the Orders array, perhaps by ID? Or should I just store a duplicate of the order object in the array?
If I were you, I would have a field named userId in Order to keep a reference to the user who created the order, because the relation between User and Order is one-to-many: a User may have many Orders, but an Order has only one User.

How to make a subdocument have two parent documents in mongodb?

I'm using mongoose with Node.js for an application. I have a Document class, which has a Review subdocument. I also have a User class.
I want the user to be able to see all the reviews they've done, while I also want the Document to be able to easily get all of its reviews. Searching through all the documents and all their reviews to find ones matching a user seems horribly inefficient. So, how do I allow the Review to be owned by both a Document and a User?
If this is impossible, how else can I efficiently have two documents know about one subdocument?
If you don't want to deal with consistency issues I don't think there's any way except for normalization to assign two parents for a document. Your issue is a common one for social networks, when developers have to deal with friends, followers, etc. Usually the best solution depends on what queries you are gonna run, what data is volatile and what is not and how many children a document might have. Usually it turns out to be a balance between embedding and referencing. Here's what I would do if I were you:
Let's assume Documents usually have 0-5 Reviews. That is only a few, so we might consider embedding Reviews in Documents. We would also often need to display the reviews every time a Document is queried, which is one more reason for embedding. Now we need a way to query all reviews by a User efficiently. Assume we don't run this query as often as the first one, but it is still important. Let's also assume that when we query for a User's Reviews we just want to display Review titles as links to the Review page or even the Document page, as it's probably hard to read a review without seeing the actual Document. So the best way here would be to store { document_id, review_id, reviewTitle } in the User. reviewTitle should not be volatile in this case. So now when you have a User object, you can easily query for reviews. Using document_id you will filter out most documents and it will be super fast. Then you can get the required Reviews either on the client side or by using MapReduce to turn Reviews into a separate list of documents.
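A rough in-memory sketch of that layout, with the full review embedded in the Document and only a small summary kept on the User. The helper name addReview and the sample values are hypothetical; the summary fields are the ones named above:

```javascript
// Hypothetical helper: records one review in both places.
// The full review is embedded in the document; the user keeps only a summary.
function addReview(doc, user, review) {
  doc.reviews.push(review);
  user.reviews.push({
    document_id: doc._id,
    review_id: review._id,
    reviewTitle: review.title,
  });
}

// Illustrative sample data.
const doc = { _id: 'D1', title: 'Design notes', reviews: [] };
const user = { _id: 'U1', name: 'alice', reviews: [] };

addReview(doc, user, {
  _id: 'R1',
  user_id: 'U1',
  title: 'Clear and concise',
  body: 'Well structured.',
});

console.log(doc.reviews.length);
console.log(user.reviews);
```

Against a real database these are two separate writes, one per collection, which is exactly where the consistency issues mentioned earlier come from.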
This example contains many assumptions, so it might not be exactly what you need, but my goal was to show you the most important things to consider while designing your collections and the logic you should follow. So just to sum up, consider QUERIES, HOW VOLATILE SOME DATA IS and HOW MANY CHILDREN A DOCUMENT IS GONNA HAVE, and find a balance between embedding and referencing.
Hope it helps!
This is an old question, but here is a solution that, I think, works well:
var mongoose = require('mongoose'),
    Schema = mongoose.Schema;
var DocumentSchema = new Schema({
  title: String,
  ...
});
mongoose.model('Document', DocumentSchema);
var UserSchema = new Schema({
  name: String,
  ...
});
mongoose.model('User', UserSchema);
// Each review points at both of its "parents"
var ReviewSchema = new Schema({
  created: {
    type: Date,
    default: Date.now
  },
  document: {
    type: Schema.ObjectId,
    ref: 'Document'
  },
  reviewer: {
    type: Schema.ObjectId,
    ref: 'User'
  },
  title: String,
  body: String
});
mongoose.model('Review', ReviewSchema);
Then, you could efficiently query the Reviews to get all reviews for a User, or for a Document.
Hope this helps someone.
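For completeness, the two lookups could look roughly like this. The helper names are made up, and the model is passed in as a parameter so the sketch runs without a database connection; with a real Mongoose model, find returns a query you can await or chain populate on:

```javascript
// Hypothetical helpers: both take the Review model as a parameter.
function reviewsByUser(Review, userId) {
  return Review.find({ reviewer: userId });
}

function reviewsForDocument(Review, documentId) {
  return Review.find({ document: documentId });
}

// Stand-in for the Mongoose model: find() simply returns the filter it was given.
const FakeReview = { find: (filter) => filter };

console.log(reviewsByUser(FakeReview, 'U1'));
console.log(reviewsForDocument(FakeReview, 'D1'));
```

For these to stay fast as the collection grows, you would probably want indexes on both the reviewer and document fields.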

Effective mongodb + mongoose. Schema design

I'm new to mongodb and nosql databases. I would really appreciate some input/help with my schema design so I don't shoot myself in the foot.
Data: I need to model Quotes. A Quote contains many Items. Each Item contains many Orders. Each Order is tied to a specific fiscal quarter. Ex. I have a Quote containing an Item which has Orders in Q3-14, Q4-14, Q1-15. Orders only go max 12 quarters (3 years) into the future. Specifically, I'm having trouble with modelling the Order-quarter binding. I'm trying to denormalize the data and embed Quote <- Items <- Orders for performance.
Attempts/Ideas:
Have an Order schema containing year and qNum fields. Embed an array of Orders in every Item. Could also create a virtual qKey field for setting/getting via a string like Q1-14.
Create a hash that embeds Orders into an Item using keys like Q1-14. This would be nice, but isn't supported natively in Mongoose.
Store the current (base) quarter in each Quote, and have each Item contain an array of Orders, but have them indexed by #quarters offset from the base quarter. I.e. if it's currently Q1-14, and an order comes in for Q4-14, store it in array position 2.
Am I totally off the mark? Any advice is appreciated as I struggle to use Mongo effectively. Thank you
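The quarter arithmetic behind the third idea can be sketched as follows, assuming keys always look like Q<1-4>-<two-digit year>. Both function names are made up for illustration. Whether the offset is used directly as the array index, or shifted by one so that the quarter after the base sits at position 0, is a convention you would need to fix up front:

```javascript
// Parses a key such as 'Q3-14' into { quarter: 3, year: 14 }.
function parseQuarterKey(key) {
  const m = /^Q([1-4])-(\d{2})$/.exec(key);
  if (!m) throw new Error('Invalid quarter key: ' + key);
  return { quarter: Number(m[1]), year: Number(m[2]) };
}

// Number of quarters from base to target (0 when they are the same quarter).
function quarterOffset(baseKey, targetKey) {
  const base = parseQuarterKey(baseKey);
  const target = parseQuarterKey(targetKey);
  return (target.year - base.year) * 4 + (target.quarter - base.quarter);
}

console.log(quarterOffset('Q1-14', 'Q1-14')); // 0
console.log(quarterOffset('Q3-14', 'Q1-15')); // 2
console.log(quarterOffset('Q1-14', 'Q4-14')); // 3
```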
Disclaimer: I've embarked on this simply as a challenge to myself. See the <rant> below for an explanation as to why I disagree with your approach.
The first step to getting a solid grasp on No-SQL is throwing out terms like "denormalize": they simply do not apply in a document based data store. Another important concept to understand is that MongoDB has no relational-style JOINs in ordinary queries (the aggregation $lookup stage is the closest equivalent), so you have to change the way you think about your data completely to adjust.
The best way to solve your problem with mongoose is to set up collections for Quotes and Items separately. Then we can set up references between these collections to "link" the documents together.
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var quoteSchema = new Schema({
  items: [{ type: Schema.Types.ObjectId, ref: 'Item' }]
});
var itemSchema = new Schema({});
That handles your Quotes -> Items "relationship". To get the Orders set up, you could use an array of embedded documents like you've indicated, but if you ever decided to start querying/indexing Orders, you'd be up a certain creek without a paddle. Again, we can solve this with references:
var itemSchema = new Schema({
  orders: [{ type: Schema.Types.ObjectId, ref: 'Order' }]
});
var orderSchema = new Schema({
  quarter: String
});
Now you can use population to get what you need:
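// Register the models; 'Order' must be registered for populate to resolve the ref
var Item = mongoose.model('Item', itemSchema);
var Order = mongoose.model('Order', orderSchema);
Item
  .findById(id)
  .populate({
    path: 'orders',
    match: { quarter: 'Q1-14' }
  })
  .exec(function (err, item) {
    if (err) return console.error(err);
    console.log(item.orders); // logs an array of orders from Q1-14
  });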
The trouble with references is that you actually hit the DB with a read twice: once to find the parent document, and then once to populate its references.
You can read more about refs and population here: http://mongoosejs.com/docs/populate.html
<rant>
I could go on for hours about why you should stick to an RDBMS for this kind of data. Especially when the defense for the choice is a lack of an ORM and Mongo being "all the rage." Engineers pick the best tech for the solution, not because a certain tech is trending. It's the difference between weekend hackery and creating Enterprise level products. Don't get me wrong, this is not to trash No-SQL: the largest codebase I maintain is built on NodeJS and MongoDB. However, I chose those technologies because they were the right technologies for my document based problem. If my data had been a relational ordering system like yours, I'd ditch Mongo in a heartbeat.
</rant>

Resources