Mongoose update middleware - need to create hooks for every single update middleware? - node.js

Let's say I have the following schema:
PersonSchema = {
name: String,
timesUpdated: {
type: Number,
default: 0
}
}
Every time that the given person is updated, I would want the timesUpdated field to increment by one. Now, I could use Mongoose's update middleware hook, which would be called by something like
PersonModel.update({_id: <id>}, {name: 'new name'})
and my timesUpdated field would be appropriately incremented. However, if I only wrote a hook for the update middleware, the following code would not update my timesUpdated field:
PersonModel.updateOne({_id: <id>}, {name: 'new name'})
For my count to be updated, I would also have to write middleware for the updateOne query. This pattern repeats for several other similar middleware hooks, such as updateMany, replaceOne, save (if you want to update a document this way), findOneAndUpdate, and I'm sure many others.
I use the example of an updated count for simplicity, but I could also have used an example where some other unrelated action happens upon changing my name. Am I missing something in how hooks should be used, or is this a limitation of mongoose hooks?

According to Mongoose's middleware documentation, document middleware such as a pre save hook is only executed for the following functions:
init
validate
save
remove
However, update functions work directly against MongoDB, so there is no general-purpose hook that applies to all update functions. See the related discussion on GitHub.
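One way to cut down the repetition described in the question is to register a single function under every update-style query hook. The following is only a sketch: the helper name addUpdateCounter and the list of hook names are assumptions made for illustration, and it relies on Query's getUpdate() and setUpdate() methods. It does not cover save() or replaceOne, which would need their own handling.

```javascript
// Hook names are an assumption: trim the list to the query methods your Mongoose version supports.
const UPDATE_HOOKS = ['update', 'updateOne', 'updateMany', 'findOneAndUpdate'];

// Hypothetical helper: registers one pre hook under every name in UPDATE_HOOKS.
function addUpdateCounter(schema, field) {
  function incrementCounter() {
    // In query middleware `this` is the Query, so we amend the pending update
    // instead of touching a document.
    const update = this.getUpdate() || {};
    update.$inc = Object.assign({}, update.$inc, { [field]: 1 });
    this.setUpdate(update);
  }
  UPDATE_HOOKS.forEach((name) => schema.pre(name, incrementCounter));
}
```

With a real Mongoose schema this would be called once, before the model is compiled, for example addUpdateCounter(personSchema, 'timesUpdated').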

I'd suggest using a function to perform your task before/after all required calls (to update or updateOne) rather than a hook, because of the limitations mentioned in the other answer and the question.
Or perhaps limit the kinds of methods that can be called to the ones that have the hook set.
Or use a hook that will always get called in the middleware sequence, like a validate hook.
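The first suggestion, a plain function that every caller goes through, could look roughly like the sketch below. The name updatePerson is hypothetical, and the model is passed in as an argument so that the example does not depend on how PersonModel is created.

```javascript
// Hypothetical helper: every caller updates a person through this one function.
// `PersonModel` is passed in so the sketch does not depend on how the model is built.
function updatePerson(PersonModel, id, changes) {
  // $set carries the caller's changes, $inc bumps the counter in the same write.
  return PersonModel.updateOne(
    { _id: id },
    { $set: changes, $inc: { timesUpdated: 1 } }
  );
}
```

This only helps as long as nobody bypasses the function and calls the model directly, which is the same trade-off as limiting the methods that may be called.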

Related

Mongoose pre hook middleware with typescript, how to set up types to access the query objects parameters?

I am using Mongoose 5+ and currently do not have the option of upgrading to Mongoose 6 (which seems to have fixed several issues concerning types and stuff).
I am refactoring from JS to TS, and I keep hitting a wall when dealing with pre hooks. In this particular case, I want to understand how to pass generic types to the pre hook so that TypeScript does not complain when I access certain fields of this.
My pre hook looks like this. It uses findOneAndUpdate, and in this case this is bound to the Query, which gives me some particular properties to access, such as this._update and this._conditions. I use this._update to access the information I am trying to update in this document, and I use that to modify another document in another collection before committing the change to this document. I do this so the operation will be atomic and no changes will be committed to the DB if any of the other writes fails. However, TypeScript does not like me accessing values from this, and the errors I get are noted in the code below.
unitsSchema.pre('findOneAndUpdate', async function(next){
const update = this._update; //TSError: Property '_update' does not exist on type 'Query<any, any>'
const conditions = this._conditions; //Property '_conditions' does not exist on type 'Query<any, any>'
if(update.isDeleted === true){
//remove the unit from the condo model
await Condos.updateOne({_id:conditions.condoID},
{$pull:{
units:conditions._id
}}).catch(e=>next(e));
await UnitSttmt.updateMany({unitID:conditions._id},
{isDeleted:true})
.catch(e=>next(e));
}
//I even get an error here for some reason, I don't understand why next expects a required argument here, but not in other similar hooks
next(); // Expected 1 arguments, but got 0
});
I have tried passing it my document interface, which extends the mongoose.Document type, and some other types too, but to no avail. Does anyone have any insight on how to get TypeScript to recognize the available Query parameters?
Some examples I have tried
unitsSchema.pre<Query<any, UnitsDocument>>(...)
// this one obviously works but kind of defeats the purpose, but at least it gets rid of my error
unitsSchema.pre<any>(...)
I also want to mention that the code works fine as JavaScript, so it must be an error or limitation in the type declarations... or maybe I'm just not supposed to be accessing those fields from the Query this?
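For what it's worth, Query exposes public accessors that return the same data as the underscore-prefixed fields: getUpdate() returns the pending update and getQuery() returns the filter conditions. Below is a rough JavaScript sketch of the hook rewritten around them. The factory name makeUnitDeleteHook is hypothetical, the two models are passed in explicitly, and it assumes isDeleted is passed at the top level of the update as in the question. Whether the typings of a particular Mongoose 5 release accept this without casts is not verified here.

```javascript
// Hypothetical factory: Condos and UnitSttmt are the models from the question, passed in explicitly.
function makeUnitDeleteHook(Condos, UnitSttmt) {
  // Async hook without `next`: a rejected promise is reported as the hook's error.
  return async function onFindOneAndUpdate() {
    const update = this.getUpdate() || {}; // public counterpart of this._update
    const conditions = this.getQuery();    // public counterpart of this._conditions
    if (update.isDeleted !== true) return;
    // Remove the unit from the condo, then flag its statements as deleted.
    await Condos.updateOne({ _id: conditions.condoID }, { $pull: { units: conditions._id } });
    await UnitSttmt.updateMany({ unitID: conditions._id }, { isDeleted: true });
  };
}
```

It would be registered as unitsSchema.pre('findOneAndUpdate', makeUnitDeleteHook(Condos, UnitSttmt)). Because the function is async and takes no next parameter, the "Expected 1 arguments" complaint about next() does not come up.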

what is the difference between document middleware, model middleware, aggregate middleware, and query middleware?

I am fairly new to MongoDB and Mongoose, and I am really confused about why some middleware works on documents and some works on queries. I am also confused about why some query methods return documents and some return queries. A query returning a document makes sense, but why would a query return a query, and what is that really?
In addition, what is a Document function versus a Model or Query function, given that they share some method names like updateOne?
All of these doubts come from reading the Mongoose documentation.
Tl;dr: the type of middleware most commonly defines what the this variable in a pre/post hook refers to:
| Middleware Hook | 'this' refers to the | methods |
|---|---|---|
| Document | Document | validate, save, remove, updateOne, deleteOne, init |
| Query | Query | count, countDocuments, deleteMany, deleteOne, estimatedDocumentCount, find, findOne, findOneAndDelete, findOneAndRemove, findOneAndReplace, findOneAndUpdate, remove, replaceOne, update, updateOne, updateMany |
| Aggregation | Aggregation object | aggregate |
| Model | Model | insertMany |
Long explanation:
Middlewares are nothing but built-in methods to interact with the database in different ways. However, as there are different ways to interact with the database, each with different advantages or preferred use cases, they also behave differently from each other, and therefore their middlewares can behave differently, even if they have the same name.
By themselves, middlewares are just shorthands/wrappers for MongoDB's native driver, which Mongoose uses under the hood. Therefore, you can usually use all middlewares as if they were regular object methods, without having to care whether it's a Model, Query, Aggregation or Document middleware, as long as it does what you want it to.
However, there are a couple of use cases where it is important to differentiate the context in which these methods are being called.
The most prominent use case is hooks, namely the *.pre() and *.post() hooks. These hooks are functions that you can "inject" into your Mongoose setup so that they are executed before or after specific events.
For example:
Let's assume I have the following Schema:
const productSchema = new Schema({
name: 'String',
version: {
type: 'Number',
default: 0
}
});
Now, let's say you always want to increase the version field with every save, so that it automatically increases the version field by 1.
The easiest way to do this would be to define a hook that takes care of this for us, so we don't have to care about this when saving an object. If we for example use .save() on the document we just created or fetched from the database, we'd just have to add the following pre-hook to the schema like this:
productSchema.pre('save', function(next) {
this.version = this.version + 1; // or this.version += 1;
next();
});
Now, whenever we call .save() on a document of this schema/model, it will always increment the version before it is actually being saved, even if we only changed the name.
However, what if we don't use the .save() or any other document-only middleware but e.g. a query middleware like findOneAndUpdate() to update an object?
Then, we won't be able to use the pre('save') hook, as .save() won't be called. In this case, we'd have to implement a similar hook for findOneAndUpdate().
Here, however, we finally come to the differences between the middleware types: the findOneAndUpdate() hook won't allow us to do that, because it is a query hook, meaning it has access only to the query itself and not to the actual document. So if we, for example, only change the name of the product, the following middleware would not work as expected:
productSchema.pre('findOneAndUpdate', function(next) {
// this.version is undefined on the query, so the result would be NaN
this.version = this.version + 1;
next();
});
The reason for this is, that the object is directly updated in the database and not first "downloaded" to nodejs, edited and "uploaded" again. This means, that in this hook this refers to the query and not the document, meaning, we don't know what the current state of version is.
If we were to increment the version in a query like this, we'd need to update the hook as follows, so that it automatically adds the $inc operator:
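productSchema.pre('findOneAndUpdate', function(next) {
// `this` is the Query: merge a $inc operator into the pending update
const update = this.getUpdate() || {};
update.$inc = Object.assign({}, update.$inc, { version: 1 });
this.setUpdate(update);
next();
});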
Alternatively, we could emulate the previous logic by manually fetching the target document and editing it using an async function. This would be less efficient in this case, as it would always call the db twice for every update, but would keep the logic consistent:
productSchema.pre('findOneAndUpdate', async function() {
// async hooks take no `next`; the returned promise signals completion
const productToUpdate = await this.model.findOne(this.getQuery());
if (productToUpdate) {
this.set({ version: productToUpdate.version + 1 });
}
});
For a more detailed explanation, please check the official documentation, which also has a dedicated paragraph on colliding method names (e.g. remove() being both a Document and a Query middleware method).

Sequelize.js - how to properly use get methods from associations (no sql query on each call)?

I'm using Sequelize.js as my ORM and have a few associations (which ones doesn't actually matter here). My models gain get and set methods from those associations. Like this (from the docs):
var User = sequelize.define('User', {/* ... */})
var Project = sequelize.define('Project', {/* ... */})
// One-way associations
Project.hasOne(User)
/*
...
Furthermore, Project.prototype will gain the methods getUser and setUser
according to the first parameter passed to define.
*/
So now, I have Project.getUser(), which returns a Promise. But if I call this twice on the very same object, I get SQL query executed twice.
My question is - am I missing something out, or this is an expected behavior? I actually don't want to make additional queries each time I call the same method on this object.
If this is expected - should I use custom getters with member variables which I manually populate and return if present? Or there is something more clever? :)
Update
As from DeBuGGeR's answer - I understand I can use includes when making a query in order to eager load everything, but I simply don't need it, and I can't do it all the time. It's waste of resources and a big overhead if I load my entire DB at the beginning, just to understand (by some criteria) that I won't need it. I want to make additional queries depending on situation. But I also can't afford to destroy all models (DAO objects) that I have and create new ones, with all the info inside them. I should be able to update parts of them, which are missing (from relations).
Calling getUser() always runs a query; it does not store the user on the project. You can save the result manually to project.user or project.users, depending on the association.
But you can try Eager Loading
Project.find({
include: [
{ model: User, as: 'user' } // here you HAVE to specify the same alias as you did in your association
]
}).success(function(project){
project.user // contains the user
});
Also, an example of getUser(). Don't expect it to cache the user automatically, and don't override it with something clever, as that will create side effects. getUser is expected to fetch from the database, and it should!
project.getUser().then(function(user){
// user is available and is a sequelize object
project.user = user; // save it on the project and reuse it for as long as you need
})
The first part is clear: every call to get[Association] (for example Project.getUser()) WILL result in a database query.
Sequelize does not maintain any kind of state or cache for the results. You can get the user in the promise result of the call, but if you want it again, you will have to make another query.
What @DeBuGGeR said about using accessors is also not true: accessors are present only immediately after a query, and are not preserved.
As this is sometimes not acceptable, you have to implement some kind of caching system yourself. Here comes the tricky part:
If you want to use the same get method, Project.getUser(), you won't be able to, as Sequelize overrides your instanceMethods. For example, if you have the association mentioned above, this won't work:
instanceMethods: {
getUser: function() {
// check if you have it, otherwise make a query
}
}
There are few possible ways to fix it - either change Sequelize core a little (to first check if the method exists), or use some kind of wrapper to those functions.
More details about this can be found here: https://github.com/sequelize/sequelize/issues/3707
Thanks to mickhansen for the cooperation on how to understand what to do :)
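The "wrapper" route mentioned above could be sketched as a small helper that remembers the promise returned by an association getter. The function getCached is hypothetical and not part of Sequelize; it only assumes that the instance has a getter method returning a promise, as getUser does.

```javascript
// Hypothetical helper, not part of Sequelize: remembers the promise returned by an
// association getter so repeated calls on the same instance reuse one query.
const associationCache = new WeakMap();

function getCached(instance, getterName) {
  let perInstance = associationCache.get(instance);
  if (!perInstance) {
    perInstance = new Map();
    associationCache.set(instance, perInstance);
  }
  if (!perInstance.has(getterName)) {
    const pending = Promise.resolve(instance[getterName]());
    // Forget failed lookups so a later call can retry.
    pending.catch(() => perInstance.delete(getterName));
    perInstance.set(getterName, pending);
  }
  return perInstance.get(getterName);
}
```

Usage would be getCached(project, 'getUser').then(function(user) { ... }). The cached value goes stale if the association changes later, so the entry would have to be cleared or bypassed after a setUser call.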

Denormalization with Mongoose: How to synchronize changes

What is the best way to propagate updates when you have a denormalized Schema? Should it be all done in the same function?
I have a schema like so:
var Authors = new Schema({
...
name: {type: String, required:true},
period: {type: Schema.Types.ObjectId, ref:'Periods'},
quotes: [{type: Schema.Types.ObjectId, ref: 'Quotes'}],
active: Boolean,
...
})
Then:
var Periods = new Schema({
...
name: {type: String, required:true},
authors: [{type: Schema.Types.ObjectId, ref:'Authors'}],
active: Boolean,
...
})
Now say I want to denormalize Authors, since the period field will always just use the name of the period (which is unique, there can't be two periods with the same name). Say then that I turn my schema into this:
var Authors = new Schema({
...
name: {type: String, required:true},
period: String, //no longer a ref
active: Boolean,
...
})
Now Mongoose doesn't know anymore that the period field is connected to the Period schema. So it's up to me to update the field when the name of a period changes. I created a service module that offers an interface like this:
exports.updatePeriod = function(id, changes) {...}
Within this function I go through the changes to update the period document that needs to be updated. So here's my question. Should I, then, update all authors within this method? Because then the method would have to know about the Author schema and any other schema that uses period, creating a lot of coupling between these entities. Is there a better way?
Perhaps I can emit an event that a period has been updated and all the schemas that have denormalized period references can observe it, is that a better solution? I'm not quite sure how to approach this issue.
Ok, while I wait for a better answer than my own, I will try to post what I have been doing so far.
Pre/Post Middleware
The first thing I tried was to use the pre/post middlewares to synchronize documents that referenced each other. (For instance, if you have Author and Quote, and an Author has an array of the type: quotes: [{type: Schema.Types.ObjectId, ref:'Quotes'}], then whenever a Quote is deleted, you'd have to remove its _id from the array. Or if the Author is removed, you may want all his quotes removed).
This approach has an important advantage: if you define each Schema in its own file, you can define the middleware there and have it all neatly organized. Whenever you look at the schema, right below you can see what it does, how its changes affect other entities, etc:
var Quote = new Schema({
//fields in schema
})
//its quite clear what happens when you remove an entity
Quote.pre('remove', function(next) {
Author.update(
//remove quote from Author quotes array.
)
})
The main disadvantage, however, is that these hooks are not executed when you call update or any of the Model's static updating/removing functions. Instead, you need to retrieve the document and then call save() or remove() on it.
Another smaller disadvantage is that Quote now needs to be aware of anyone that references it, so that it can update them whenever a Quote is updated or removed. So let's say that a Period has a list of quotes, and Author has a list of quotes as well, Quote will need to know about these two to update them.
The reason the hooks are skipped is that these static functions send atomic queries to the database directly. While this is nice, I hate the inconsistency between using save() and Model.update(...). Maybe somebody else, or you in the future, accidentally uses the static update functions and your middleware isn't triggered, giving you headaches that you struggle to get rid of.
NodeJS Event Mechanisms
What I am currently doing is not really optimal, but it offers me enough benefits to outweigh the cons (or so I believe; if anyone cares to give me some feedback, that'd be great). I created a service that wraps around a model, say AuthorService, which extends events.EventEmitter and is a constructor function that looks roughly like this:
function AuthorService() {
var self = this
this.create = function() {...}
this.update = function() {
...
self.emit('AuthorUpdated', before, after)
...
}
}
util.inherits(AuthorService, events.EventEmitter)
module.exports = new AuthorService()
The advantages:
Any interested function can register to the Service events and be notified. That way, for instance, when a Quote is updated, the AuthorService can listen to it and update the Authors accordingly. (Note 1)
Quote doesn't need to be aware of all the documents that reference it, the Service simply triggers the QuoteUpdated event and all the documents that need to perform operations when this happens will do so.
Note 1: As long as this service is used whenever anyone needs to interact with mongoose.
The disadvantages:
Added boilerplate code, using a service instead of mongoose directly.
Now it isn't exactly obvious what functions get called when you trigger the event.
You decouple producer and consumer at the cost of legibility (since you just emit('EventName', args), it's not immediately obvious which Services are listening to this event).
Another disadvantage is that someone can retrieve a Model from the Service and call save(), in which case the events won't be triggered, though I'm sure this could be addressed with some kind of hybrid between these two solutions.
I am very open to suggestions in this field (which is why I posted this question in the first place).
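For reference, a runnable variant of the service idea might look like the sketch below. It is only an illustration: the saveAuthor function is a hypothetical stand-in for whatever actually persists the author (for example a function wrapping a Mongoose update), and it is injected so that the service itself stays independent of the model.

```javascript
const { EventEmitter } = require('events');

// Runnable variant of the service idea; the persistence function is injected and hypothetical.
class AuthorService extends EventEmitter {
  constructor(saveAuthor) {
    super();
    this.saveAuthor = saveAuthor; // e.g. a function wrapping a Mongoose update
  }

  async update(before, changes) {
    const after = await this.saveAuthor(Object.assign({}, before, changes));
    // Listeners receive both states and decide what to synchronize.
    this.emit('AuthorUpdated', before, after);
    return after;
  }
}
```

A consumer would subscribe with service.on('AuthorUpdated', function(before, after) { ... }) and compare the two states to decide which denormalized fields to rewrite.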
I'm gonna speak more from an architectural point of view than a coding point of view since when it comes right down to it, you can pretty-much achieve anything with enough lines of code.
As far as I've been able to understand, your main concern has been keeping consistency across your database, mainly removing documents when their references are removed and vice-versa.
So in this case, rather than wrapping the whole functionality in extra code I'd suggest going for atomic Actions, where an Action is a method you define yourself that performs a complete removal of an entity from the DB (both document and reference).
So for example when you wanna remove an author's quote, you do something like removing the Quote document from the DB and then removing the reference from the Author document.
This sort of architecture ensures that each of these Actions performs a single task and performs it well, without having to tap into events (emitting, consuming) or any other stuff. It's a self-contained method for performing its own unique task.
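As a rough sketch of such an Action for the quote example: the function name removeQuote is hypothetical, and the two models are passed in by the caller. It assumes they expose deleteOne and updateMany the way Mongoose models do.

```javascript
// Hypothetical Action: Quote and Author are Mongoose-style models passed in by the caller.
async function removeQuote(Quote, Author, quoteId) {
  // Step 1: delete the quote document itself.
  await Quote.deleteOne({ _id: quoteId });
  // Step 2: pull the dangling reference out of every author that lists it.
  await Author.updateMany({ quotes: quoteId }, { $pull: { quotes: quoteId } });
}
```

Note that "atomic" here describes the Action as a unit of work. The two writes are still separate database operations, so a failure between them would need a transaction or a retry to clean up.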

How to obtain a MongoDb collection in NodeJS

There are two different methods to obtain a reference to a MongoDB collection - both of them are used throughout the official documentation.
There is
var mycollection = db.collection('mycollection')
and there is
db.collection('mycollection', function(err, collection){
//use collection
})
I tend to use the second one because it is consistent with "db.createCollection(collection, callback)"
What is the difference between these methods?
Is there any database interaction when using these methods?
If you look at the code for Database, currently around line 456, you'll see that the only difference between the two in the way you've used them is how the collection object is returned. If you specify a callback, then it's returned that way, otherwise, it's returned as the value to the function. If you set the options however and specifically the option strict to true, you need to use the callback. When strict is set to true, the collection is verified before continuing (asynchronously).
Given that collections can be created dynamically (and usually are upon first use), there often isn't need to use strict mode.
So otherwise, it's really a matter of personal coding preference. There is normally no database activity when creating a Collection object via db.collection('collectionname'), with the exception I mentioned above.

Resources