It's my first time creating an application solo (back-end, front-end, design) and I could use some guidance on my back-end.
Let's say there is a table of Users, Jobs, and Applications (Job Applications). Right now I have it structured like so:
UserSchema {
// other attributes
_id: String,
application_ids: [String] // array of String id's
}
JobSchema {
// other attributes
_id: String,
application_ids: [String]
}
ApplicationSchema {
// other attributes
_id: String,
applicant_id: String, // the _id of a User
job_id: String // the _id of a Job
}
My current plan is this: when a new Application is created (via POST /api/applications, where I send the User's _id and the Job's _id), I would set the Application's applicant_id to the User's _id and then update the User's application_ids array as well as the Job's application_ids array.
Question: Am I going about this in a reasonable manner, or am I touching too many tables for one POST request? There are other tables/schemas in my application that will follow a similar relationship structure. Then there's the matter of deleting Applications and having to update application_ids again, but that's another matter.
Note: I am using Express.js and Mongoose.js, if that helps
No, you shouldn't do it this way. By storing the ID of the user and job in the application, you can use a query to get all the applications by user or all applications for a given job. No need to touch both.
If you really want to have the relationship on both sides, at least set it up as an ObjectId and use the "ref" declaration. Check out the populate docs in the mongoose docs.
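A minimal sketch of the first suggestion, using a plain in-memory array as a stand-in for the applications collection. The names createApplication, applicationsForUser and applicationsForJob are hypothetical helpers, not Mongoose API. Creating an application writes one record, and both sides of the relationship are read back with a filter. With Mongoose, the same lookups would be find calls filtered on applicant_id or job_id.

```javascript
// In-memory stand-in for the applications collection (assumption: illustration only).
const applications = [];

// Creating an application writes a single record; users and jobs are untouched.
function createApplication(id, applicantId, jobId) {
  const application = { _id: id, applicant_id: applicantId, job_id: jobId };
  applications.push(application);
  return application;
}

// Both directions of the relationship are recovered with a filter.
function applicationsForUser(userId) {
  return applications.filter((a) => a.applicant_id === userId);
}

function applicationsForJob(jobId) {
  return applications.filter((a) => a.job_id === jobId);
}

createApplication("a1", "u1", "j1");
createApplication("a2", "u1", "j2");
createApplication("a3", "u2", "j1");
```

Deleting an application is likewise a single removal, since no user or job document holds a copy of its id.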
I am building a REST API with MongoDB + nodeJS. All the documents are stored and are using _id as the primary key. I've read here that we should not expose the _id and we should use another ID which is not incremental.
In the DB, a document is represented as:
{
_id: ObjectId("5d2399b83e9148db977859ea")
bookName: "My book"
}
For the following endpoints, how should the documents be exposed?
GET /books
GET /books/{bookId}
Currently my API returns:
{
_id: "5d2399b83e9148db977859ea"
bookName: "My book"
}
but should it instead return something like:
{
id: "some-unique-id-generated-on-creation"
bookName: "My book"
}
Questions
Should I expose the _id so that one can make queries such as:
GET /books/5d2399b83e9148db977859ea
Should I use a UUID for my ID instead of ObjectId?
Should I keep the internal _id (but never expose it) and create another attribute id which would use UUID or another custom generated ID ?
Is it a good practice to work with _id in my backend or should I only make queries using my own custom ID? Example: find({ id: }) instead of find({ _id: })
To answer your questions.
You can expose _id so that authenticated users can make queries like GET, PUT and PATCH on that _id.
MongoDB lets you generate your own BSON ID and use it, instead of having MongoDB create its own _id during the insert.
There is no need to duplicate logic. The main purpose of _id is to identify each document uniquely, and having two ID fields means you are storing redundant data. Follow the DRY (Don't Repeat Yourself) principle wherever possible.
It's not a bad practice to work with _id in your backend.
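If _id is exposed in routes such as GET /books/{bookId}, it can help to reject malformed ids before querying. A small sketch, assuming the default ObjectId string form of 24 hexadecimal characters (it does not apply if you generate your own _id in another format). isObjectIdHex is a hypothetical helper, not part of MongoDB or Mongoose:

```javascript
// Hypothetical helper: true when the value looks like an ObjectId hex string
// (24 hexadecimal characters). It does not check that the document exists.
function isObjectIdHex(value) {
  return typeof value === "string" && /^[0-9a-fA-F]{24}$/.test(value);
}
```

A route handler could call this first and answer with a 400-style error when it returns false, rather than passing arbitrary input to the database.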
Hope this helps!
Given you're using Mongoose, you can use 'virtuals', which are essentially fake fields that Mongoose creates. They're not stored in the DB, they just get populated at run time:
// Duplicate the ID field.
Schema.virtual('id').get(function(){
return this._id.toHexString();
});
// Ensure virtual fields are serialised.
Schema.set('toJSON', {
virtuals: true
});
Any time toJSON is called on the Model you create from this Schema, it will include an 'id' field that matches the _id field Mongo generates. Likewise you can set the behaviour for toObject in the same way.
You can refer to the following docs:
1) https://mongoosejs.com/docs/api.html
2) toObject method
In my case, whether or not it's a security risk, my _id is a concatenation of the fields in my document that are semantically considered keys. For example, if First Name, Last Name, and Email are my identifier and a fourth field such as Age is an attribute, then _id is the concatenation of those three fields. It is not difficult to get and update such a record as long as I have the First Name, Last Name, and Email available.
I am working on a web application that uses a MongoDB database and Express/Node.js. I want to create a project in which I have users, and users can have posts, which can have many attributes, such as title, creator, and date. I am confused about how to do this while avoiding duplication in my database. I tried references, keeping the ids of all of a user's posts in a list like this: [postID1, postID2, postID3, etc...]. The problem is that I want to be able to query all of a user's posts and display them in an EJS template, but I don't know how to do that. How would I use references? What should I do to make this modeling system optimal for relationships?
Any help would be greatly appreciated!
Thank you!
This is a classic parent-child relationship, and your problem is that you're storing the relationship in the wrong record :-). The parent should never contain the reference to the children. Instead, each child should have a reference to the parent. Why not the other way around? It's a bit of a historical quirk: it's done that way because a classic relational table can't have multiple values for a single field, which means you can't store multiple child IDs easily in a relational table, whereas since each child will only ever have one parent, it's easy to set up a single field in the child. A Mongo document can have multiple values within a single field by using arrays, but unless you really have a good reason to do so, it's just better to follow the historical paradigm.
How does this apply in your situation? What you're trying to do is to store references to all the children (i.e. the post IDs) as a list in the parent (i.e. an array in the user document). This is not the usual way to do this. Instead, in each child (i.e. in each post), have a field called user_id, and store the userID there.
Next, make sure you create an index on the user_id field.
With that setup, it's easy to take a post and figure out who the user was (just look at the user_id field). And if you want to find all of a user's posts, just do posts.find({user_id: 'XXXX'}). If you have an index on that field, the find will execute quickly.
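To illustrate why the index matters, here is a sketch with in-memory data standing in for the posts collection (the sample posts and the buildUserIndex name are hypothetical). Without an index every post is scanned; the index groups posts by user_id once so a lookup is direct, which is roughly the job a database index on user_id does.

```javascript
const posts = [
  { _id: "p1", title: "Title-1", user_id: "U123" },
  { _id: "p2", title: "Title-2", user_id: "U123" },
  { _id: "p3", title: "Title-3", user_id: "U456" },
];

// Without an index: scan every post and keep the matches.
function findPostsByUser(userId) {
  return posts.filter((post) => post.user_id === userId);
}

// Hypothetical in-memory "index": user_id -> list of that user's posts.
function buildUserIndex(allPosts) {
  const index = new Map();
  for (const post of allPosts) {
    if (!index.has(post.user_id)) index.set(post.user_id, []);
    index.get(post.user_id).push(post);
  }
  return index;
}

const postsByUser = buildUserIndex(posts);
```

Both findPostsByUser("U123") and postsByUser.get("U123") return the same two posts; the difference is only in how much data has to be examined per lookup.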
Storing parent references in the child is almost always better than storing child references in the parent. Even though Mongo is flexible enough to allow you to structure it either way, it's not preferred unless you have a real reason for it.
EDIT
If you do have a valid reason for storing the child references in the parent, then assuming a structure like this:
user = {posts: [postID1, postID2, postID3, ...]}
You can find the user for a specific post with user.find({posts: "XXXX"}). MongoDB is smart enough to know that you're searching for a user whose posts array contains the element "XXXX". And if you create an index on the posts field, the query should be pretty quick.
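The array-membership match behaves like this in-memory sketch (plain JavaScript rather than a MongoDB call; the sample users and the findUserByPost name are hypothetical):

```javascript
const users = [
  { _id: "U123", posts: ["p1", "p2"] },
  { _id: "U456", posts: ["p3"] },
];

// Return the user whose posts array contains postId, or undefined if none does.
function findUserByPost(postId) {
  return users.find((user) => user.posts.includes(postId));
}
```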
I would like to mention that there is nothing wrong with a parent containing child references, at least in NoSQL databases. It all depends on what suits your needs.
You have a one-to-many relationship between users and posts, and you can model your data in the following three ways:
Embedded Data Model
{
user: "username",
post: [
{
title: "Title-1",
creator: "creator",
published_date: ISODate("2010-09-24")
},
{
title: "Title-2",
creator: "creator",
published_date: ISODate("2010-09-24")
}
]
}
Parent containing child references
{
user: "username",
posts: [123456789, 234567890, ...]
}
{
_id: 123456789,
title: "Title-1",
creator: "creator",
published_date: ISODate("2010-09-24")
}
{
_id: 234567890,
title: "Title-2",
creator: "creator",
published_date: ISODate("2010-09-24")
}
Child containing parent reference
{
_id: "U123",
name: "username"
}
{
_id: 123456789,
title: "Title-1",
creator: "creator",
published_date: ISODate("2010-09-24"),
user: "U123"
}
{
_id: 234567890,
title: "Title-2",
creator: "creator",
published_date: ISODate("2010-09-24"),
user: "U123"
}
According to the MongoDB docs (I have edited the below paragraph according to your case)
When using references, the growth of the relationships determine where
to store the reference. If the number of posts per user is small
with limited growth, storing the post reference inside the user
document may sometimes be useful. Otherwise, if the number of posts
per user is unbounded, this data model would lead to mutable,
growing arrays.
Reference: https://docs.mongodb.com/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
Now you have to decide what is best for your project, keeping in mind that your model should satisfy all the test cases.
Peace,
I have three collections in MongoDB
achievements
students
student_achievements
achievements is a list of achievements a student can earn in an academic year, while
the students collection holds the list of students in the school.
student_achievements holds documents, each of which contains a studentId and an achievementId.
I have an interface where I use a select2 multiselect to allocate one or more achievements from achievements to students from students and save the result to the student_achievements collection. Right now, to do this, I populate select2 with the available achievements from the database. I have also arranged for the system to throw an error if a student is allocated the same achievement again.
What I am trying to achieve is this: if an achievement has already been allocated to a student, it shouldn't be available in the list, i.e. it should be removed when fetching the list for that student id.
What function in MongoDB or its aggregation framework can I use to achieve this, i.e. to compare two collections and remove the common entries?
Perhaps your data structure could be changed to make the problem easier to solve. MongoDB is a schemaless NoSQL store; don't try to make it behave like a relational database.
Perhaps we could do something like this:
var StudentSchema = new Schema({
  name: String,
  // References to Achievement documents
  achievements: [{ type: Schema.Types.ObjectId, ref: 'Achievement' }]
});
Then you can do something like this which will only add the value if it is unique to the achievements array:
db.student.update(
{ _id: 1 },
{ $addToSet: { achievements: <achievement id> } }
)
If you are using something like Mongoose you can also write your own middleware to remove orphaned docs:
AchievementSchema.post('remove', function(doc) {
  // Post hooks receive the removed document.
  // Remove all references to this achievement from the Student documents here.
});
Also, if you need to verify that the achievement exists before adding it to the set, you can do a findOne query before updating/inserting to verify.
Even with the post remove hook in place, there are certain cases where you may end up with orphaned relationships. The best thing to do in those situations is to run a regular cron task to clean up when needed. These are some of the trade-offs you encounter when using a NoSQL store.
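For the original goal (hiding achievements a student already has), the filtering step can be sketched in plain JavaScript. The arrays below are made-up sample data, and availableAchievements and addAchievement are hypothetical helpers. In MongoDB the same filter could be expressed by querying achievements with a $nin condition on the ids the student already holds.

```javascript
const achievements = [
  { _id: "a1", name: "Perfect attendance" },
  { _id: "a2", name: "Science fair" },
  { _id: "a3", name: "Debate team" },
];

const student = { _id: "s1", name: "Sample student", achievements: ["a2"] };

// Achievements not yet allocated to the student.
function availableAchievements(allAchievements, studentDoc) {
  const taken = new Set(studentDoc.achievements);
  return allAchievements.filter((a) => !taken.has(a._id));
}

// Mirrors the $addToSet behaviour: append only when the id is not already present.
function addAchievement(studentDoc, achievementId) {
  if (!studentDoc.achievements.includes(achievementId)) {
    studentDoc.achievements.push(achievementId);
  }
  return studentDoc;
}
```

The result of availableAchievements is what would be fed to the select2 list for that student, so already-allocated entries never appear as options.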
I am currently planning the development of an application using Node, and I am stuck on whether or not I should use MongoDB as a database. Ideally I would like to use it. I understand how it works in general, but what I don't understand is how to reference other objects within a document model.
For example, let's say I have two objects; a User and an Order object.
{
Order : {
Id: 1,
Amount: 23.95
}
}
{
User: {
Id: 1,
Orders: [ ]
}
}
Essentially, a User will place an order, and upon creation of that Order object, I would like for the User object to update the Orders array appropriately.
First of all, I hear a lot about MongoDB lacking relational functionality. So would I be able to store a reference to that order in the Orders array, perhaps by ID? Or should I just store a duplicate of the order object in the array?
If I were you, I would add a field named userId to Order to keep a reference to the user who created the order. The relation between User and Order is one-to-many: a User may have many Orders, but an Order has only one User.
I have the following scenario:
A user can log in to a website. A user can add or delete a poll (a question with two options). Any user can give their opinion on a poll by selecting one of the options.
Considering the above scenario, I have three models: Users, Polls, and Options. They are as follows, in order of dependency:
Option Schema
var optionSchema = new Schema({
optionName : {
type : String,
required : true,
},
optionCount : {
type : Number,
default : 0
}
});
Poll Schema
var pollSchema = new Schema({
question : {
type : String,
required : true
},
options : [optionSchema]
});
User Schema: parent schema
var usersSchema = new Schema({
username : {
type : String,
required : true
},
email : {
type : String,
required : true,
unique : true
},
password : String,
polls : [pollSchema]
});
How do I implement the above relation between those documents? What exactly is Mongoose population? How is it different from subdocuments? Should I go for subdocuments or should I use Mongoose population?
MongoDB doesn't have joins in the way relational databases do, so population is something like a hidden join. It means that when you have the User model and populate its Polls, Mongoose will do something like this:
fetch User
fetch related Polls, by ObjectIds which are stored in User document
put fetched Polls documents into User document
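Those three steps can be sketched by hand with in-memory data. This illustrates the idea and is not Mongoose's actual implementation; populatePolls and the sample documents are hypothetical.

```javascript
const polls = [
  { _id: "poll1", question: "Tabs or spaces?" },
  { _id: "poll2", question: "Cats or dogs?" },
];
const users = [{ _id: "u1", username: "alice", polls: ["poll1", "poll2"] }];

function populatePolls(userId) {
  // 1. Fetch the user.
  const user = users.find((u) => u._id === userId);
  if (!user) return null;
  // 2. Fetch the related polls by the ids stored in the user document.
  const related = polls.filter((p) => user.polls.includes(p._id));
  // 3. Return a copy of the user with the ids replaced by the poll documents.
  return { ...user, polls: related };
}
```

The stored user still holds only ids; the poll documents are attached to the copy that is returned, which is why a change to a poll has to be made in one place only.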
When you make User the document and Polls a subdocument, you put all the data in a single document. On one hand, this means that to fetch a User's Polls, Mongoose doesn't need to run two queries (it only needs to fetch the User document, because the Polls data is already there).
But which is better to choose? It depends on the case.
If your Polls documents will be referenced from other documents (say you need access to Polls from documents User, A, B, and C), it could be better to populate, but not for sure. The advantage of populating is that when you need to change some Polls fields, you don't need to change that data in every document that refers to that Polls document (as you would if it were a subdocument); instead of touching User, A, B, and C, you only update the Polls document. As you can see, that's nice. I said it's not certain that populating will be better in that case because I don't know how you need to retrieve your Polls data. If you store your data the wrong way, you will get performance issues or have trouble fetching data easily.
Subdocuments are the basic way of storing data. They're great when Polls are only referenced by User. There is a performance advantage, since Mongoose needs to run one query instead of the two needed for population, and the update disadvantage mentioned earlier doesn't apply, because you store the Polls data in only one place, so there is no need to update other documents.
Basically, MongoDB was designed to mostly use subdocuments. As a matter of fact, it's a non-relational database, so in most cases I prefer to use subdocuments. I can't say which way will be better in your case, because I'm not sure what your DB looks like (in full) and how you want to retrieve your data.
There is some useful info in the official documentation:
http://mongoosejs.com/docs/subdocs.html
http://mongoosejs.com/docs/populate.html
Take a look at that.
Edit
Since I prefer to fetch data easily, care about performance, and know that data redundancy in MongoDB is common, I would choose to store this data as subdocuments.