I've been through several tutorials, but I'm still wondering what the best approach to my problem would be. I have the following schema:
var userSchema = new Schema({
_id : Number,
first_name : String,
last_name : String,
friends : [ Number ],
messages : [{
from: Number,
body : String,
date : { type : Date, default: Date.now}
}]
}, { collection : "user"});
In friends I want to store an array of the ids of the user's friends. In messages.from I want to store the id of the message's sender.
Ideally, the ids in friends and messages.from should only ever be ids of existing user entries.
Unfortunately, MongoDB doesn't enforce referential integrity.
This functionality must be provided by your application.
So in your case, when a user is deleted, your application must also remove references to that user from every other user's friends array and messages.
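A minimal sketch of that cleanup, assuming the field names from the schema above and assuming that messages sent by the deleted user should simply be dropped (you could instead keep them and clear from). The helper name buildUserDeleteCleanup is hypothetical; it only builds the filter and update documents, which would then be passed to User.updateMany(filter, update) alongside User.deleteOne({ _id: userId }).

```javascript
// Hypothetical helper: builds the filter and update documents that strip
// every reference to a deleted user from the remaining user documents.
// Assumption: messages sent by the deleted user are removed entirely.
function buildUserDeleteCleanup(userId) {
  // Only touch documents that actually reference the deleted user.
  const filter = {
    $or: [{ friends: userId }, { "messages.from": userId }],
  };
  // $pull removes matching array elements:
  //  - the id itself from `friends`
  //  - every embedded message whose `from` equals the deleted user's id
  const update = {
    $pull: {
      friends: userId,
      messages: { from: userId },
    },
  };
  return { filter, update };
}

// Usage with Mongoose (not executed here):
//   const { filter, update } = buildUserDeleteCleanup(deletedId);
//   await User.updateMany(filter, update);
//   await User.deleteOne({ _id: deletedId });
```

Doing the pull with one updateMany keeps the cleanup to a single round trip, but it still runs separately from the delete itself, so a crash between the two can leave dangling ids.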
I'm new to MongoDB and Mongoose, and I'm having some problems with relations.
My schema has 3 collections (User / Person / Family); you can see it below.
var mongoose = require('mongoose')
, Schema = mongoose.Schema
var userSchema = Schema({
_id : Schema.Types.ObjectId,
email : String,
person : [{ type: Schema.Types.ObjectId, ref: 'Person' }] // A user is linked to 1 person
});
var personSchema = Schema({
_id : Schema.Types.ObjectId,
name : String,
user : [{ type: Schema.Types.ObjectId, ref: 'User' }], // A person is linked to 1 user
families : [{ type: Schema.Types.ObjectId, ref: 'Family' }] // A person has 1,n families
});
var familySchema = Schema({
_id : Schema.Types.ObjectId,
name : String,
persons : [{ type: Schema.Types.ObjectId, ref: 'Person' }] // A family has 0,n persons
});
var User = mongoose.model('User', userSchema);
var Person = mongoose.model('Person', personSchema);
var Family = mongoose.model('Family', familySchema);
I don't know if my schema is good. Is the person field required in my userSchema? The information would be duplicated: in userSchema I would have the person ID, and in personSchema the user ID.
If I understand correctly, it's useful to have these duplicated values for my queries? But if the information is duplicated, do I need to execute two queries to update the two collections?
For example, if I have a person with a family (the families field in personSchema), and that family contains this person (the persons field in familySchema), what would the queries be to remove or update the documents in those collections?
Thanks
IMHO, your schema seems fine if it meets your needs!! (If you think your current schema fulfills your purpose without being bloated, then yes, it's fine.)
"Person" seems to be the only type of user and the entity connected to all the other entities. As long as this is the case, feel free to remove the person field from userSchema, since you can reach the user information from the person. But suppose there were another entity, "Aliens", that also has its own unique families: then it would be better to add both alien and person fields to the User schema to tell the types of users apart. (As long as there's only one type, i.e. Person, you may not need the field in userSchema.) If you still want to keep it, please also make the following change to your schema, because you are using an array although it seems to be a one-to-one relation!
var userSchema = Schema({
_id : Schema.Types.ObjectId,
email : String,
person : { type: Schema.Types.ObjectId, ref: 'Person' } // A user is linked to 1 person; here I have removed the []
});
var personSchema = Schema({
_id : Schema.Types.ObjectId,
name : String,
user : { type: Schema.Types.ObjectId, ref: 'User' }, // removed [] here too
families : [{ type: Schema.Types.ObjectId, ref: 'Family' }]
});
Yes, you will need to update both entities, Person and Family, if you want to keep them consistent. But it can be done in one request/mutation.
The order of the queries depends on the flow of your business logic. Let's say "Homer" is a Person who is a new member of the Simpson family.
In that case, you would add "Homer" to the Family collection and then push the
ObjectId of the Simpson family to the Person entity.
Below is a sample of adding Homer to the Simpson family. Hope this helps :)
addNewFamilyMember: async (_, { personID, familyID }) => {
  // Make sure both documents exist before linking them
  const [family, person] = await Promise.all([
    Family.findById(familyID),
    Person.findById(personID),
  ]);
  if (!family) throw new Error('Family not found!');
  if (!person) throw new Error('Person not found!');

  // "$addToSet" inserts into the array only if the value is not already there,
  // so calling this twice does not create duplicates
  const updatedFamily = await Family.findByIdAndUpdate(
    familyID,
    { $addToSet: { persons: personID } }, // add "Homer" to the Simpson family
    { new: true } // return the family as it is after the update
  );
  await Person.updateOne(
    { _id: personID },
    { $addToSet: { families: familyID } } // add the Simpson family's ObjectId to "Homer"
  );
  return updatedFamily;
}
If you want to perform an update, it depends on what you intend to do. For example, if you just want to update the name "Homer", you only have to update it in the Person collection: the Family collection only holds a reference to Homer's ObjectId, so every update to Homer is reflected whenever the family's persons are populated.
If you want to perform a deletion, the approach again depends on the scenario: do you want to remove a person document, just remove the person reference from a family, or remove the family reference from the person?
Let's say you want to delete a person. In that case, you would take the personId and look up that person. Since the person holds its families in person.families, you can remove the personId from each of those families. You can then also remove the associated user, because the same person object holds a reference to it.
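A sketch of that deletion flow, assuming the Person / Family / User models from the question with person.user as a single reference and person.families as an array. The models are passed in as a parameter only so the function can be exercised without a database; deletePerson is a hypothetical name. The three writes are not atomic; on a replica set they could be wrapped in a multi-document transaction.

```javascript
// Hypothetical helper: delete a person and clean up the references to it.
// `models` must provide Mongoose-style Person, Family and User models
// (findById, updateMany, deleteOne).
async function deletePerson({ Person, Family, User }, personId) {
  const person = await Person.findById(personId);
  if (!person) {
    throw new Error("Person not found!");
  }

  // 1. Remove the person's id from every family listed in person.families.
  await Family.updateMany(
    { _id: { $in: person.families } },
    { $pull: { persons: person._id } }
  );

  // 2. Remove the linked user, if there is one.
  if (person.user) {
    await User.deleteOne({ _id: person.user });
  }

  // 3. Finally remove the person document itself.
  await Person.deleteOne({ _id: person._id });
  return person;
}
```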
To sum up, it depends on which action you choose and how much sanitization you want in your schema. The process described above would simply look different if you took a different approach.
Assuming the case of a /login API, where, for a matching set of credentials, a user object from the collection should be returned, which approach would be more performant:
1) One model with projection queries:
var UserSchema = new Schema({
name : String,
email : String,
dob : Number,
phone : Number,
gender : String,
location : Object, // GeoJSON format { type: 'Point', coordinates: [102.23, 23.00] } (longitude first)
followers : [UserSchema],
following : [UserSchema],
posts : [PostSchema],
groups : [GroupSchema]
// ... and so on
});
2) Split models:
var UserMinimalSchema = new Schema({
name : String,
email : String,
phone : Number,
location : Object,
});
var UserDetailSchema = new Schema({
dob : Number,
gender : String,
followers : [UserSchema],
following : [UserSchema],
posts : [PostSchema],
groups : [GroupSchema]
// ... and so on
});
Let's say:
For a logged-in user, only id, name, email, phone and location are to be returned.
The first model will use a projection query to return the properties in (1).
In the second case, only the UserMinimalSchema would be used to query the entire document.
Essentially both queries return exactly the same amount of data as mentioned in (1).
Assume that the average user document is close to the 16MB limit and there are 1 million records.
If someone has performed such a test or can link to documentation, it would be a great help in seeing how much splitting matters.
I would not use split models:
You'll have to perform a populate query every time you want to look up all of the user's data.
You're increasing your data storage (you will now have to reference the user in your user details schema).
When MongoDB performs the lookup, it only returns the fields you specified in your projection query anyway. The rest of the document is not sent back or loaded into your application's memory unless you ask for it in your query.
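To illustrate what the projection returns, here is a plain-JavaScript stand-in for an inclusion projection such as User.findOne({ email }, 'name email phone location'). applyInclusionProjection is a hypothetical helper that only handles top-level fields; as in MongoDB, _id is kept unless it is explicitly excluded.

```javascript
// Hypothetical stand-in for a MongoDB inclusion projection (top-level
// fields only): keeps _id plus the listed fields, drops everything else.
function applyInclusionProjection(doc, fields) {
  const result = {};
  if ("_id" in doc) result._id = doc._id; // _id is included by default
  for (const field of fields) {
    if (field in doc) result[field] = doc[field];
  }
  return result;
}
```

With the single-model design, the large arrays such as followers or posts simply never leave the server for the login query.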
I'm trying to add private messaging between users to my data model. I've been going back and forth between two possible ways of doing this.
1) Each user has an array of user_id, chat_id pairs which correspond to chats they are participating in. Chat model just stores chat_id and array of messages.
2) Don't store chats with user at all and just have the Chat model store a pair of user_ids and array of messages.
The issue with option (1) is that whenever a user joins or starts a chat, I would first need to look through the user's array to see if the user_id, chat_id pair already exists, and then do a second find for the chat_id in Chat. If it doesn't exist, I would need to create the user_id, chat_id pair in two different places, once for each of the two participating users.
With option (2) I would search the Chat model for the user_id1, user_id2 pair; if I find it I'm done, and if not I would create a new Chat record for that pair and be done.
Based on this, option (2) does seem like the better way of handling this. However, I'm having trouble figuring out how to model the "pair" of user ids so that it is easily searchable in the Chat model, i.e. how do I make sure I can find the chat record even if the user_ids are passed in the other order (user_id2, user_id1)? What would be the best way to model this in Mongoose?
var chatSchema = mongoose.Schema({
messages: [{
text: {
type: String,
maxlength: 2000 // string length limit; "max" only applies to numbers and dates
},
sender: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
}
}],
participant1: [{
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
}],
participant2: [{
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
}]
});
If it's something like above, how would I search for a participant pair? Could I order the participant IDs in some way so that they are always participant1 < participant2 for example, making search simpler?
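A minimal sketch of the ordering idea from the question, assuming participant1 and participant2 are changed to single ObjectId fields rather than arrays, and that ids are compared as their 24-character hex strings. canonicalPair and chatLookupFilter are hypothetical names. Any deterministic order works, as long as the same function is used both when creating and when looking up a chat.

```javascript
// Always put the smaller id in participant1 and the larger in participant2,
// so (a, b) and (b, a) produce the same filter.
function canonicalPair(idA, idB) {
  const a = String(idA); // works for ObjectIds (hex string) and plain strings
  const b = String(idB);
  return a < b ? [a, b] : [b, a];
}

// Filter for Chat.findOne(...) that ignores the order the ids arrive in.
function chatLookupFilter(idA, idB) {
  const [participant1, participant2] = canonicalPair(idA, idB);
  return { participant1, participant2 };
}
```

To guarantee at most one chat per pair, a unique compound index such as chatSchema.index({ participant1: 1, participant2: 1 }, { unique: true }) could be added, with the same canonicalPair used when creating the chat.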
Well, there is no single correct answer to this question, but the approaches you have mentioned are definitely not the best!
First, when designing a "chat" model, you need to take into account that there could be millions of messages between users, so you need to care about performance when fetching chats.
Storing the messages in an array is not a good idea at all: the document will keep growing over time, and MongoDB's document size limit is currently 16 MB per document.
https://docs.mongodb.com/manual/reference/limits/
Second, you have to consider pagination, because it affects performance when a chat is large. When you retrieve the chat between 2 users, you won't request all messages since the beginning of time; you will request the most recent ones, and then request older ones as the user scrolls. This aspect is very important and can't be neglected because of its effect on performance.
My approach would be to store each message in a separate document.
Storing each message in its own document will boost your performance when fetching chats, and each document will be very small.
This is a very simple example; you need to change the model according to your needs. It is just meant to illustrate the idea:
const MessageSchema = mongoose.Schema({
message:{
text: { type:String, required:true }
// you can add any other properties to the message here.
// for example, the message can be an image ! so you need to tweak this a little
},
// if you want to make a group chat, you can have more than 2 users in this array
users:[{
user: { type:mongoose.Schema.Types.ObjectId, ref:'User', required:true }
}],
sender: { type:mongoose.Schema.Types.ObjectId, ref:'User', required:true },
read: { type:Date }
},
{
timestamps: true
});
You can fetch the latest messages between two users with this query:
Message.find({ "users.user": { $all: [user1Id, user2Id] } })
.sort({ createdAt: -1 })
.limit(20)
Easy and clean!
As you can see, pagination becomes very easy with this approach.
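One way to sketch the "load older messages" step for the one-document-per-message design, assuming direct (two-person) chats, the users.user field from MessageSchema above, and the createdAt field that timestamps: true adds. buildPageQuery is a hypothetical helper whose output maps onto Message.find(filter).sort(sort).limit(limit). Instead of skip(), it asks for messages older than the oldest one already shown, so later pages do not get slower as the offset grows.

```javascript
// Hypothetical helper: builds the query for one page of a direct chat.
// `before` is the createdAt of the oldest message already loaded;
// leave it out to get the most recent page.
function buildPageQuery(user1Id, user2Id, { before, pageSize = 20 } = {}) {
  // Both users must appear in the message's users array.
  const filter = { "users.user": { $all: [user1Id, user2Id] } };
  if (before) {
    // Only messages strictly older than what is already on screen.
    filter.createdAt = { $lt: before };
  }
  return { filter, sort: { createdAt: -1 }, limit: pageSize };
}
```

Messages that share the exact same createdAt can be skipped at a page boundary; adding _id as a tie-breaker is a common refinement.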
A few suggestions.
First, why store participant1 and participant2 as arrays? There is one specific sender and one (or more) recipients, depending on whether you want group messages.
Consider the following Schema:
var ChatSchema = new Schema({
sender : {
type : mongoose.Schema.Types.ObjectId,
ref : 'User'
},
messages : [
{
message : String,
meta : [
{
user : {
type : mongoose.Schema.Types.ObjectId,
ref : 'User'
},
delivered : Boolean,
read : Boolean
}
]
}
],
is_group_message : { type : Boolean, default : false },
participants : [
{
user : {
type : mongoose.Schema.Types.ObjectId,
ref : 'User'
},
delivered : Boolean,
read : Boolean,
last_seen : Date
}
]
});
This schema allows one chat document to store all messages, all participants, and all statuses related to each message and each participant.
The Boolean is_group_message is just a shorter way to filter direct and group messages, maybe for client-side viewing or server-side processing. Direct messages are obviously easier to work with query-wise, but both are pretty simple.
The meta array lists the delivered/read status, etc., for each participant of a single message. If we weren't handling group messages, this wouldn't need to be an array, but we are, so that's fine.
The delivered and read properties on the main document (not the meta subdocument) are just shorthand ways of telling whether the last message was delivered/read. They're updated on each write to the document.
This schema allows us to store everything about a chat in one document, even group chats.
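A sketch of how those shorthand flags could be refreshed on each write, assuming a plain object (or Mongoose document) shaped like ChatSchema above. syncParticipantStatus is a hypothetical name; with a Mongoose document you would call it before chat.save().

```javascript
// Hypothetical helper: copy each participant's delivered/read status for
// the chat's most recent message from that message's meta entries.
function syncParticipantStatus(chat) {
  const last = chat.messages[chat.messages.length - 1];
  if (!last) return chat; // no messages yet, nothing to summarise

  for (const participant of chat.participants) {
    // Compare via String() so ObjectIds and plain strings both work.
    const meta = (last.meta || []).find(
      (m) => String(m.user) === String(participant.user)
    );
    participant.delivered = Boolean(meta && meta.delivered);
    participant.read = Boolean(meta && meta.read);
  }
  return chat;
}
```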
I have following schemas in Mongoose:
UserSchema = new Schema({
ratings: [{ type: Schema.ObjectId, ref: 'Rating' }] });
ItemSchema = new Schema({
ratings: [{ type: Schema.ObjectId, ref: 'Rating' }] });
Rating = new Schema({
user: [{ type: Schema.ObjectId, ref: 'User' }],
venue: [{ type: Schema.ObjectId, ref: 'Venue' }]
});
Are they right? I need to query ratings by user and ratings for an item. I also want to check whether a user has already rated an item.
Here are two options you can go with.
You can maintain a separate Rating collection, quite similar to what you would do in SQL:
User: voter (reference to User object),
Item: item_voted (reference to Item object),
Rating: whatever the user rated,
Time: time_rated,
other fields as per your requirements...
Then maintain an index over User and Item to speed up the queries that check whether a user has already rated an item.
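A small sketch of the query filters and indexes for that separate Rating collection, assuming the field names voter and item_voted from the list above; ratingQueries and ratingIndexes are hypothetical names. Each index pair would be registered with RatingSchema.index(keys, options), and a filter passed to, for example, Rating.findOne(filter).

```javascript
// Hypothetical filter builders for the Rating collection.
const ratingQueries = {
  byUser: (userId) => ({ voter: userId }),
  forItem: (itemId) => ({ item_voted: itemId }),
  // "Has this user already rated this item?"
  byUserAndItem: (userId, itemId) => ({ voter: userId, item_voted: itemId }),
};

// Index specs as [keys, options] pairs.
const ratingIndexes = [
  // One rating per (user, item) pair; its { voter } prefix also
  // serves the "ratings by user" query.
  [{ voter: 1, item_voted: 1 }, { unique: true }],
  // "Ratings for an item" needs its own index.
  [{ item_voted: 1 }, {}],
];
```

With the unique index in place, inserting a second rating for the same user/item pair fails with a duplicate-key error, so the database itself enforces the "already rated" rule.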
Or you could maintain an array in the User collection of the items rated by that user, and index that array. Your data model for User could look like this:
items_rated: [item1, item2, item3]
other fields of User as per your requirements...
This second approach has a limitation: it fails if your BSON document exceeds the 16MB limit, although in practice it is very unlikely that you would actually hit that limit. Still, nothing is guaranteed. If your users turn out to be as obsessive as some top Stack Overflow users, you will hit that 16MB wall :P
If you opt for the second choice, you can check whether an item has been rated like this:
if (db.user.countDocuments({ items_rated: item_k, _id: 'user-id-1' }) == 0) { ... }
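To put the 16MB limit in numbers, here is a rough upper bound on how many ids an array such as items_rated can hold, assuming the items are stored as ObjectIds and ignoring every other field in the user document. In BSON each array element costs 1 type byte, its index as a NUL-terminated decimal string, and 12 bytes for the ObjectId; maxObjectIdsInArray is a hypothetical helper.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const BSON_MAX_BYTES = 16 * 1024 * 1024;

// Upper bound on ObjectIds in one array field, ignoring all other fields.
function maxObjectIdsInArray(limit = BSON_MAX_BYTES) {
  let used = 4 + 1; // embedded array document: int32 length + trailing NUL
  let count = 0;
  for (;;) {
    // type byte + index digits + NUL terminator + 12-byte ObjectId
    const cost = 1 + String(count).length + 1 + 12;
    if (used + cost > limit) break;
    used += cost;
    count += 1;
  }
  return count;
}

console.log(maxObjectIdsInArray()); // 844416
```

So a single user would need to rate roughly 844 thousand items, somewhat fewer once the document's other fields are counted, before hitting the wall.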
Is there a way to declare a model schema in Mongoose so that the _id field is auto-generated when the model is new'ed?
For example:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var ObjectIdSchema = Schema.ObjectId;
var ObjectId = mongoose.Types.ObjectId;
var PersonSchema = new Schema({
_id: ObjectIdSchema,
firstName: {type: String, default: 'N/A'},
lastName: {type: String, default: 'N/A'},
age: {type: Number, min: 1}
});
var Person = mongoose.model('Person', PersonSchema);
At first I thought, great! I'll just do
_id: {type:ObjectIdSchema, default: new ObjectId()}
But of course that doesn't work, because new ObjectId() is only called once, when the schema is initialized, so calling new Person() twice creates two objects with the same _id value.
So is there a way to make every "new Person()" generate a new ObjectId()?
The reason I'm trying to do this is that I need to know the new person's _id value for further processing.
I also tried:
var person = new Person({firstName: "Joe", lastName: "Baz"});
person.save(function(err, doc, num){
console.log(doc._id);
});
Even then, doc doesn't contain the ObjectId, but if I look in the database, it does contain it.
P.S. I'm using Mongoose 2.7.1.
P.P.S. I know I can manually create the ObjectId when creating the person, like this:
var person = new Person({_id: new ObjectId(), firstName: "Joe", lastName: "Baz"});
but I'd rather not have to import ObjectId and create a new one every time I create a Person. I guess I'm used to the Java driver for MongoDB, where I can just create the value for the _id field in the model constructor.
Add the auto flag:
_id: {
type: mongoose.Schema.Types.ObjectId,
index: true,
required: true,
auto: true,
}
source
The moment you call var person = new Person();,
person._id should give you the id, even before it has been saved. Instantiating it is enough to give it an id. You can still save it afterwards, and that will store the id along with the rest of the person object.
Instead of:
_id: {type:ObjectIdSchema, default: new ObjectId()}
You should do:
_id: {type:ObjectIdSchema, default: function () { return new ObjectId()} }
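The difference between the two versions can be shown in plain JavaScript, without Mongoose: a value is evaluated once, when the schema object is built, while a function is called again for every new document. makeId is a hypothetical stand-in for new ObjectId(), and buildDoc only mimics how a schema fills in defaults.

```javascript
// Hypothetical id generator standing in for `new ObjectId()`.
let nextValue = 0;
const makeId = () => `id-${++nextValue}`;

// Evaluated once, right here, like `default: new ObjectId()`.
const eagerDefaults = { _id: makeId(), firstName: "N/A" };
// Called per document, like `default: function () { return new ObjectId() }`.
const lazyDefaults = { _id: () => makeId(), firstName: "N/A" };

// Mimics default handling: fill only missing fields, calling functions.
function buildDoc(defaults, fields = {}) {
  const doc = { ...fields };
  for (const [key, def] of Object.entries(defaults)) {
    if (doc[key] === undefined) {
      doc[key] = typeof def === "function" ? def() : def;
    }
  }
  return doc;
}
```

Two buildDoc(eagerDefaults) calls share one _id, while two buildDoc(lazyDefaults) calls get different ones. Mongoose treats function defaults the same way, and if _id is not declared at all it adds an auto-generated ObjectId _id by itself.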
Taken from the official MongoDB Manual and Docs:
_id
A field required in every MongoDB document. The _id field must have a unique value. You can think of the _id field as the document’s primary key. If you create a new document without an _id field, MongoDB automatically creates the field and assigns a unique BSON ObjectId.
Source
ObjectID
ObjectIds are small, likely unique, fast to generate, and ordered. ObjectId values consist of 12 bytes, where the first four bytes are a timestamp that reflect the ObjectId’s creation. Specifically:
a 4-byte value representing the seconds since the Unix epoch,
a 5-byte random value, and
a 3-byte counter, starting with a random value.
In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field.
This also applies to documents inserted through update operations with upsert: true.
MongoDB clients should add an _id field with a unique ObjectId. Using ObjectIds for the _id field provides the following additional benefits:
in the mongo shell, you can access the creation time of the ObjectId, using the ObjectId.getTimestamp() method.
sorting on an _id field that stores ObjectId values is roughly equivalent to sorting by creation time.
IMPORTANT: While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
Source
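To make the quoted layout concrete, here is a sketch that builds and decodes ObjectId-style hex strings with Node's built-in crypto module. generateObjectIdHex and getTimestamp are hypothetical helpers for illustration; real code should use the ObjectId class from the driver or Mongoose.

```javascript
const crypto = require("crypto");

// Sketch of the 12-byte ObjectId layout; not a replacement for the driver.
const PROCESS_RANDOM = crypto.randomBytes(5); // 5-byte value, once per process
let counter = crypto.randomBytes(3).readUIntBE(0, 3); // 3-byte counter, random start

function generateObjectIdHex(date = new Date()) {
  const buf = Buffer.alloc(12);
  // Bytes 0-3: seconds since the Unix epoch, big-endian.
  buf.writeUInt32BE(Math.floor(date.getTime() / 1000), 0);
  // Bytes 4-8: the per-process random value.
  PROCESS_RANDOM.copy(buf, 4);
  // Bytes 9-11: the counter, wrapping around at 2^24.
  counter = (counter + 1) % 0x1000000;
  buf.writeUIntBE(counter, 9, 3);
  return buf.toString("hex");
}

// Same idea as ObjectId.getTimestamp(): decode the first 4 bytes.
function getTimestamp(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}
```

For example, getTimestamp("507f1f77bcf86cd799439011") returns the date 2012-10-17T21:13:27.000Z, and only one-second resolution survives, which is why ids created within the same second have no guaranteed order.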
Explicitly declaring _id:
When explicitly declaring the _id field, specify the auto option:
new Schema({ _id: { type: Schema.ObjectId, auto: true }})
ObjectId only - Adds an auto-generated ObjectId default if turnOn is true.
Source
TL;DR
If you create a new document without an _id field, MongoDB automatically creates the field and assigns a unique BSON ObjectId.
This is a good way to do it
in the model:
const schema = new Schema({
userCurrencyId: {
type: mongoose.Schema.Types.ObjectId,
index: true,
required: true,
auto: true
}
});