Denormalization with Mongoose: How to synchronize changes - node.js

What is the best way to propagate updates when you have a denormalized Schema? Should it be all done in the same function?
I have a schema like so:
var Authors = new Schema({
...
name: {type: String, required:true},
period: {type: Schema.Types.ObjectId, ref:'Periods'},
quotes: [{type: Schema.Types.ObjectId, ref: 'Quotes'}],
active: Boolean,
...
})
Then:
var Periods = new Schema({
...
name: {type: String, required:true},
authors: [{type: Schema.Types.ObjectId, ref:'Authors'}],
active: Boolean,
...
})
Now say I want to denormalize Authors, since the period field will always just use the name of the period (which is unique, there can't be two periods with the same name). Say then that I turn my schema into this:
var Authors = new Schema({
...
name: {type: String, required:true},
period: String, //no longer a ref
active: Boolean,
...
})
Now Mongoose doesn't know anymore that the period field is connected to the Period schema. So it's up to me to update the field when the name of a period changes. I created a service module that offers an interface like this:
exports.updatePeriod = function(id, changes) {...}
Within this function I go through the changes to update the period document that needs to be updated. So here's my question. Should I, then, update all authors within this method? Because then the method would have to know about the Author schema and any other schema that uses period, creating a lot of coupling between these entities. Is there a better way?
Perhaps I can emit an event that a period has been updated and all the schemas that have denormalized period references can observe it, is that a better solution? I'm not quite sure how to approach this issue.

Ok, while I wait for a better answer than my own, I will try to post what I have been doing so far.
Pre/Post Middleware
The first thing I tried was to use the pre/post middlewares to synchronize documents that referenced each other. (For instance, if you have Author and Quote, and an Author has an array of the type: quotes: [{type: Schema.Types.ObjectId, ref:'Quotes'}], then whenever a Quote is deleted, you'd have to remove its _id from the array. Or if the Author is removed, you may want all his quotes removed).
This approach has an important advantage: if you define each Schema in its own file, you can define the middleware there and have it all neatly organized. Whenever you look at the schema, right below you can see what it does, how its changes affect other entities, etc:
var Quote = new Schema({
//fields in schema
})
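// it's quite clear what happens when you remove an entity
Quote.pre('remove', function(next) {
// remove this quote's _id from the quotes array of any Author that holds it
Author.update({quotes: this._id}, {$pull: {quotes: this._id}}, {multi: true}, next)
})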
The main disadvantage however is that these hooks are not executed when you call update or any Model static updating/removing functions. Rather you need to retrieve the document and then call save() or remove() on them.
Another smaller disadvantage is that Quote now needs to be aware of anyone that references it, so that it can update them whenever a Quote is updated or removed. So let's say that a Period has a list of quotes, and Author has a list of quotes as well, Quote will need to know about these two to update them.
The reason the hooks are skipped is that the static update/remove functions send atomic queries to the database directly. While this is nice, I hate the inconsistency between using save() and Model.update(...). Maybe somebody else, or you in the future, accidentally uses the static update functions and your middleware isn't triggered, giving you headaches that you struggle to get rid of.
NodeJS Event Mechanisms
What I am currently doing is not really optimal, but it offers me enough benefits to outweigh the cons (or so I believe; if anyone cares to give me some feedback, that'd be great). I created a service that wraps around a model, say AuthorService, that extends events.EventEmitter and is a constructor function that looks roughly like this:
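var util = require('util')
var events = require('events')
function AuthorService() {
// initialize the EventEmitter part of this instance
events.EventEmitter.call(this)
var self = this
this.create = function() {...}
this.update = function() {
...
self.emit('AuthorUpdated', before, after)
...
}
}
util.inherits(AuthorService, events.EventEmitter)
module.exports = new AuthorService()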
The advantages:
Any interested function can register to the Service events and be notified. That way, for instance, when a Quote is updated, the AuthorService can listen to it and update the Authors accordingly. (Note 1)
Quote doesn't need to be aware of all the documents that reference it, the Service simply triggers the QuoteUpdated event and all the documents that need to perform operations when this happens will do so.
Note 1: As long as this service is used whenever anyone needs to interact with mongoose.
The disadvantages:
Added boilerplate code, using a service instead of mongoose directly.
Now it isn't exactly obvious what functions get called when you trigger the event.
You decouple producer and consumer at the cost of legibility (since you just emit('EventName', args), it's not immediately obvious which Services are listening to this event).
Another disadvantage is that someone can retrieve a Model from the Service and call save(), in which case the events won't be triggered, though I'm sure this could be addressed with some kind of hybrid between these two solutions.
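To make the listener side of this concrete, here is a minimal sketch applied to the period rename from the question. It uses only Node's events module; the in-memory periods and authors data, the PeriodService class, and the 'PeriodUpdated' event name are all assumptions made for illustration (following the 'AuthorUpdated' pattern above), and real code would issue Mongoose updates where the loop is.

```javascript
const { EventEmitter } = require('events');

// Hypothetical in-memory stand-ins for the Periods and Authors collections.
const periods = new Map([['p1', { name: 'Baroque' }]]);
const authors = [
  { name: 'Bach', period: 'Baroque' },
  { name: 'Mozart', period: 'Classical' }
];

// Hypothetical service: it only knows about periods and announces changes.
class PeriodService extends EventEmitter {
  update(id, changes) {
    const before = Object.assign({}, periods.get(id)); // snapshot before the change
    const after = Object.assign(periods.get(id), changes); // apply the change in place
    this.emit('PeriodUpdated', before, after);
    return after;
  }
}

const periodService = new PeriodService();

// The author side subscribes; PeriodService never references authors.
periodService.on('PeriodUpdated', (before, after) => {
  if (before.name === after.name) return; // nothing denormalized changed
  for (const author of authors) {
    if (author.period === before.name) author.period = after.name;
  }
});

periodService.update('p1', { name: 'Late Baroque' });
```

The coupling now points from the consumer to the service rather than the other way around, which is exactly the trade-off listed above: updatePeriod stays ignorant of Authors, but you have to search for 'PeriodUpdated' to find out who reacts to it.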
I am very open to suggestions in this field (which is why I posted this question in the first place).

I'm gonna speak more from an architectural point of view than a coding point of view, since when it comes right down to it, you can pretty much achieve anything with enough lines of code.
As far as I've been able to understand, your main concern has been keeping consistency across your database, mainly removing documents when their references are removed and vice-versa.
So in this case, rather than wrapping the whole functionality in extra code I'd suggest going for atomic Actions, where an Action is a method you define yourself that performs a complete removal of an entity from the DB (both document and reference).
So for example when you wanna remove an author's quote, you do something like removing the Quote document from the DB and then removing the reference from the Author document.
This sort of architecture ensures that each of these Actions performs a single task and performs it well, without having to tap into events (emitting, consuming) or any other stuff. It's a self-contained method for performing its own unique task.
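As a rough sketch of what such an Action could look like, the example below removes a quote and its reference in one self-contained function. The in-memory quotes and authors maps and the removeQuote name are assumptions for illustration only; with Mongoose the two steps would be a remove on the Quote model and a $pull on the Author model.

```javascript
// Hypothetical in-memory stand-ins for the Quotes and Authors collections.
const quotes = new Map([
  ['q1', { author: 'a1', text: 'First quote' }],
  ['q2', { author: 'a1', text: 'Second quote' }]
]);
const authors = new Map([
  ['a1', { name: 'Some Author', quotes: ['q1', 'q2'] }]
]);

// One Action: removes the quote document AND the reference to it.
// Returns false when the quote does not exist, true otherwise.
function removeQuote(quoteId) {
  const quote = quotes.get(quoteId);
  if (!quote) return false;
  quotes.delete(quoteId);
  const author = authors.get(quote.author);
  if (author) {
    author.quotes = author.quotes.filter((id) => id !== quoteId);
  }
  return true;
}

removeQuote('q1');
```

Callers never delete a quote any other way, so the document and its reference cannot drift apart through this code path.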

Related

Should I create New Schema Model file for every route OR use already created Schema?

Suppose I have a User Schema which has around 30 fields, and other 3 schemas also.
UserSchema.js
user_schema = new Schema({
user_id: { type: String},
.........//30 properties
});
ctrs_schema = new Schema({
.........10 properties
});
ids_schema = new Schema({
.........5 properties
});
comments_schema = new Schema({
.........10 properties
});
Now I am writing a route that changes the gender of a user. To do it I can use UserSchema.js, but that loads all of the schemas into my route. If I instead created a new file containing only one schema with two fields, the other schemas would not get loaded into memory for that route.
UserGenderSchema.js
gender_schema = new Schema({
user_id: { type: String},
gender: { type: String}
});
I know there are pros and cons to both approaches.
Pros -
I only have to edit a single file if I need to change something for any field.
Cons -
All schemas are loaded for all routes, which is unnecessary and wastes memory.
Would either approach use less memory on the threads?
Can anyone please tell me which architecture is better, or what you are using in your project and why?
Thanks
It's better to keep user-related fields in just one schema. MongoDB exists because of its non-relational structure, and it gets its performance from keeping relational structures away. If you create a schema for each field and then add a reference in each of them pointing to the user it belongs to, you are using MongoDB to build a heavily relational structure, which it does not handle as well as it should. Later on, if your application needs to show all of a user's information, update multiple fields of a user, or show more of the user's information in one of your routes, you will end up with serious performance issues. In conclusion, the cost of loading the whole schema to touch only one field is lower than the cost of breaking down your data structure.

Mongoose update middleware - need to create hooks for every single update middleware?

Let's say I have the following schema:
PersonSchema = {
name: String,
timesUpdated: {
type: Number,
default: 0
}
}
Every time that the given person is updated, I would want the timesUpdated field to increment by one. Now, I could use Mongoose's update middleware hook, which would be called by something like
PersonModel.update({_id: <id>}, {name: 'new name'})
and my timesUpdated field would be appropriately incremented. However, if I only wrote a hook for the update middleware, the following code would not update my timesUpdated field:
PersonModel.updateOne({_id: <id>}, {name: 'new name'})
In order for my count to be updated, I would have to write middleware for the updateOne query. This pattern repeats for several other similar middleware hooks, such as updateMany, replaceOne, save (if you want to update a document this way), findOneAndUpdate and I'm sure many others.
I use the example of an updated count for simplicity, but I could also have used an example where some other unrelated action happens upon changing my name. Am I missing something in how hooks should be used, or is this a limitation of mongoose hooks?
According to Mongoose's middleware documentation, document hooks such as pre save are only executed for the following functions:
init
validate
save
remove
However, update functions work directly with MongoDB, therefore there is no general-purpose hook that applies to all update functions. See related discussion on Github.
I'd suggest using a function to perform your task before/after all required calls (to update or updateOne) rather than a hook, because of the limitations mentioned in the other answer and the question.
Or perhaps limit the kinds of methods that can be called to the ones that have the hook set.
Or use a hook which will always get called in the middleware sequence, like a validate hook.
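One way to read the "use a function" suggestion is a small helper that every update call goes through, so the counter logic lives in one place regardless of which update method is used. The sketch below is an assumption of how that could look: withUpdateCount is a hypothetical name, and it only builds the update document using MongoDB's $set and $inc operators without touching a database.

```javascript
// Returns a new update document that also increments timesUpdated.
// Plain field updates are wrapped in $set so they can sit next to $inc.
function withUpdateCount(update) {
  const hasOperators = Object.keys(update).some((key) => key.startsWith('$'));
  const normalized = hasOperators ? Object.assign({}, update) : { $set: update };
  // Merge with any $inc the caller already supplied instead of overwriting it.
  normalized.$inc = Object.assign({}, normalized.$inc, { timesUpdated: 1 });
  return normalized;
}

const plainUpdate = withUpdateCount({ name: 'new name' });
const operatorUpdate = withUpdateCount({ $inc: { age: 1 } });
```

The result would then be passed to whichever call you use, for example PersonModel.updateOne({_id: id}, withUpdateCount({name: 'new name'})). This still relies on everyone remembering to use the helper, which is the same discipline problem the hooks have.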

How do I change this "schema" without the need for transactions (ACID)?

I have a model like the following one in MongoDB using Mongoose:
Stuff
{
_id: ObjectId...,
stuff: String..,
someBoolean: Boolean,
description: String,
transactionsOfThisStuff: [{
transactionNumber: objectId?,
date: Date.now(),
info: String
}]
}
As you can see, the idea is to move stuff, and I need to register every movement, so I made an array of "transactions" where I keep the history.
To make a "transaction" there are some requirements, for example, "someBoolean" must have a certain value, etc.
And when a transaction is made, some values of the stuff must be updated.
Also, I must be able to move multiple stuff at the same time (move a table, a plumbus, etc), so all of them will have the same "transactionNumber" in each document.
The problem I see with this model is that I can't easily, for example, list the last 10 movements, and I don't see an efficient way to get the Stuff that has been moved with a given "transactionNumber".
If I use two models:
Stuff
{
_id: ObjectId...,
stuff: String..,
someBoolean: Boolean,
description: String,
}
transaction
{
_id: objectId,
date: Date.now(),
info: String,
stuff: [{type:ObjectId, ref: 'Stuff', required: true}]
}
The problem with this idea is that I would need ACID, since if I move multiple Stuff, I need to update some values in each "Stuff" :/
Edit:
I save hardware parts, like CPU, Mouse, Keyboard, Monitor, etc. Each one of them is stored in "logical" warehouses. I can make different types of "transactions", like move it to another warehouse, give it to a person, take it back, etc. Each transaction must be tracked, so I can have a history of that item. Also, I can move many items at the same time within the same transaction, for example in the transaction number 21361764 I moved 10 different items. At the same time, I need to update some info in the item, like isStored: False, True.
The transaction itself must have the date of the execution, a client, a user (who is doing the transaction), and an array of items, etc.
The ideas above the edit are what I have come up with so far, but each one has problems since I would need transactions. There must be a way to solve this problem without falling back to a relational database.

using _.omit on mongoose User in node.js

I have a mongoose User schema built like this:
var UserSchema = new Schema({
username: { type: String, required: true, index: { unique: true } },
password: { type: String, required: true },
salt: { type: String, required: true}
});
I want to be able to send this user object to the client side of my application, but I don't want to send the password or salt fields.
So I added the following code to my user model module:
UserSchema.methods.forClientSide = function() {
console.log('in UserSchema.methods.forClientSide');
console.log(this);
//var userForClientSide=_.omit(this,'passsword','salt');
var userForClientSide={_id:this._id, username:this.username };
console.log(userForClientSide);
return userForClientSide;
}
I have required the underscore module (it's installed locally via a dependency in my package.json).
Note the commented-out line: I was expecting it to omit the password and salt fields of the user object, but it did not do anything :( The logged object had the full set of properties.
When replaced with the currently used line, var userForClientSide={_id:this._id, username:this.username };, it gets the results I want, but:
1) I want to know why does the _.omit not work.
2) I don't like my current workaround very much because it selects some properties instead of omitting the ones I don't want, so if I add any new properties to the schema I will have to add them here as well.
This is my first attempt at writing something using node.js/express/mongodb/mongoose etc., so it is very possible that I am missing some other, better solution to this issue (possibly some feature of mongoose). Feel free to educate me on the right way to do things like this.
So basically I want to know both what the right way to do this is and why my way did not work.
thanks
1) I want to know why does the _.omit not work.
Mongoose uses defineProperty and some heavy metaprogramming. If you want to use underscore, first call user.toJSON() to get a plain old javascript object that will work better with underscore without all the metaprogramming fanciness, functions, etc.
A better solution is to use mongo/mongoose's fields object and pass the string "-password -salt" and therefore just omit getting these back from mongo at all.
Another approach is to use the mongoose Transform (search for "transform" on that page). Your use case is the EXACT use case the documentation uses as an example.
You can also make your mongoose queries "lean" by calling .lean() on your query, in which case you will get back plain javascript objects instead of mongoose model instances.
However, after trying each of these things, I'm personally coming to the opinion that there should be a separate collection for Account that has the login details and a User collection, which will make leaking the hashes extremely unlikely even by accident, but any of the above will work.
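To illustrate the first point (omit on the document versus omit on the result of toJSON()), here is a dependency-free sketch. Both pieces are assumptions for illustration: omit is a small stand-in for _.omit, and FakeUserDocument is only a rough imitation of how a document can hide its field values behind getters; it is not Mongoose's actual implementation.

```javascript
// Small stand-in for _.omit so the sketch needs no dependencies:
// copies the object's own enumerable keys except the listed ones.
function omit(obj, ...keys) {
  const copy = {};
  for (const key of Object.keys(obj)) {
    if (!keys.includes(key)) copy[key] = obj[key];
  }
  return copy;
}

// Rough imitation of a model instance: values live in an internal object
// and are exposed through getters defined on the prototype.
class FakeUserDocument {
  constructor(fields) { this._doc = fields; }
  get username() { return this._doc.username; }
  get password() { return this._doc.password; }
  get salt() { return this._doc.salt; }
  toJSON() { return Object.assign({}, this._doc); }
}

const user = new FakeUserDocument({ username: 'alice', password: 'hash', salt: 'abc' });

// Omitting on the instance only sees its own key (_doc), so nothing is removed.
const fromDocument = omit(user, 'password', 'salt');

// Omitting on the plain object works as expected.
const fromPlainObject = omit(user.toJSON(), 'password', 'salt');
```

In this imitation, fromDocument still carries the password inside _doc, while fromPlainObject contains only username, which matches the behaviour described in the question.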

Node.js + Mongoose / Mongo & a shortened _id field

I'd like the unique _id field in one of my models to be relatively short: 8 letters/numbers, instead of the usual Mongo _id which is much longer. Having a short unique-index like this helps elsewhere in my code, for reasons I'll skip over here. I've successfully created a schema that does the trick (randomString is a function that generates a string of the given length):
new Schema('Activities', {
'_id': { type: String, unique: true, 'default': function(){ return randomString(8); } },
// ... other definitions
})
This works well so far, but I am concerned about duplicate IDs generated from the randomString function. There are 36^8 possible IDs, so right now it is not a problem... but as the set of possible IDs fills up, I am worried about insert commands failing due to a duplicate ID.
Obviously, I could do an extra query to check if the ID was taken before doing an insert... but that makes me cry inside.
I'm sure there's a better way to be doing this, but I'm not seeing it in the documentation.
The shortid library (https://github.com/dylang/shortid) is used by Doodle or Die and seems to be battle tested.
By creating a unique index on _id you'll get an error if you try to insert a document with a duplicate key. So wrap error handling around any inserts you do that looks for the error and then generates another ID and retries the insert in that case. You could add a method to your schema that implements this enhanced save to keep things clean and DRY.
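The retry idea can be sketched as follows. Everything here is an assumption for illustration: insert is a synchronous in-memory stand-in for a collection with a unique _id, insertWithRetry is a hypothetical helper, and the id generator is passed in so the example is deterministic. The error code 11000 is the one MongoDB reports for duplicate keys; a real version would be asynchronous and call save() instead.

```javascript
// Hypothetical in-memory stand-in for a collection with a unique index on _id.
const taken = new Set(['aaaaaaaa']);

function insert(doc) {
  if (taken.has(doc._id)) {
    const err = new Error('duplicate key');
    err.code = 11000; // MongoDB's duplicate key error code
    throw err;
  }
  taken.add(doc._id);
  return doc;
}

// Tries to insert with a freshly generated _id, retrying on duplicates only.
function insertWithRetry(fields, generateId, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return insert(Object.assign({}, fields, { _id: generateId() }));
    } catch (err) {
      // Rethrow anything that is not a duplicate key, or when out of attempts.
      if (err.code !== 11000 || attempt === maxAttempts) throw err;
    }
  }
}

// Deterministic generator for the example: the first id collides, the second is free.
const ids = ['aaaaaaaa', 'bbbbbbbb'];
const saved = insertWithRetry({ title: 'Example' }, () => ids.shift(), 5);
```

Capping the number of attempts matters: as the ID space fills up, collisions become more frequent, and an unbounded loop would hide that problem instead of surfacing it.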

Resources