Preventing concurrent access to documents in Mongoose - node.js

My server application (using node.js, mongodb, mongoose) has a collection of documents for which it is important that two client applications cannot modify them at the same time without seeing each other's modification.
To prevent this I added a simple document versioning system: a pre-hook on the schema which checks if the version of the document is valid (i.e., not higher than the one the client last read). At first sight it works fine:
// Validate version number
UserSchema.pre("save", function(next) {
var user = this
user.constructor.findById(user._id, function(err, userCurrent) { // userCurrent is the user that is currently in the db
if (err) return next(err)
if (userCurrent == null) return next()
if(userCurrent.docVersion > user.docVersion) {
return next(new Error("document was modified by someone else"))
} else {
user.docVersion = user.docVersion + 1
return next()
}
})
})
The problem is the following:
When one User document is saved at the same time by two client applications, is it possible that these interleave between the pre-hook and the actual save operations? What I mean is the following, imagine time going from left to right and v being the version number (which is persisted by save):
App1: findById(pre)[v:1] save[v->2]
App2: findById(pre)[v:1] save[v->2]
Resulting in App1 saving something that has been modified meanwhile (by App2), and it has no way to notice that it was modified. App2's update is completely lost.
My question might boil down to: Do the Mongoose pre-hook and the save method happen in one atomic step?
If not, could you give me a suggestion on how to fix this problem so that no update ever gets lost?
Thank you!

MongoDB has findAndModify which, for a single matching document, is an atomic operation.
Mongoose has various methods that use this method, and I think that they will suit your use case:
Model.findOneAndUpdate()
Model.findByIdAndUpdate()
Model.findOneAndRemove()
Model.findByIdAndRemove()
Another solution (one that Mongoose itself uses as well for its own document versioning) is to use the Update Document if Current pattern.

Related

what is the difference between document middleware, model middleware, aggregate middleware, and query middleware?

I am fairly new to MongoDB and Mongoose, I am really confused about why some of the middleware works at the document and some works on query. I am also confused about why some of the query methods return documents and some return queries. If a query is returning document it is acceptable, but why a query return query and what really it is.
Adding more to my question what is a Document function and Model or Query function, because both of them have some common methods like updateOne.
Moreover, I have gathered all these doubts from the mongoose documentation.
Tl;dr: the type of middleware most commonly defines what the this variable in a pre/post hook refers to:
Middleware Hook
'this' refers to the
methods
Document
Document
validate, save, remove, updateOne, deleteOne, init
Query
Query
count, countDocuments, deleteMany, deleteOne, estimatedDocumentCount, find, findOne, findOneAndDelete, findOneAndRemove, findOneAndReplace, findOneAndUpdate, remove, replaceOne, update, updateOne, updateMany
Aggregation
Aggregation object
aggregate
Model
Model
insertMany
Long explanation:
Middlewares are nothing, but built-in methods to interact with the database in different ways. However, as there are different ways to interact with the database, each with different advantages or preferred use-cases, they also behave differently to each other and therefor their middlewares can behave differently, even if they have the same name.
By themselves, middlewares are just shorthands/wrappers for the mongodbs native driver that's being used under the hood of mongoose. Therefor, you can usually use all middlewares, as if you were using regular methods of objects without having to care if it's a Model-, Query-, Aggregation- or Document-Middleware, as long as it does what you want it to.
However, there are a couple of use-cases where it is important to differentiate the context in which these methods are being called.
The most prominent use-case being hooks. Namely the *.pre() and the *.post() hooks. These hooks are methods that you can "inject" into your mongoose setup, so that they are being executed before or after specific events.
For example:
Let's assume I have the following Schema:
const productSchema = new Schema({
name: 'String',
version: {
type: 'Number',
default: 0
}
});
Now, let's say you always want to increase the version field with every save, so that it automatically increases the version field by 1.
The easiest way to do this would be to define a hook that takes care of this for us, so we don't have to care about this when saving an object. If we for example use .save() on the document we just created or fetched from the database, we'd just have to add the following pre-hook to the schema like this:
productSchema.pre('save', function(next) {
this.version = this.version + 1; // or this.version += 1;
next();
});
Now, whenever we call .save() on a document of this schema/model, it will always increment the version before it is actually being saved, even if we only changed the name.
However, what if we don't use the .save() or any other document-only middleware but e.g. a query middleware like findOneAndUpdate() to update an object?
Then, we won't be able to use the pre('save') hook, as .save() won't be called. In this case, we'd have to implement a similar hook for findOneAndUpdate().
Here, however, we finally come to the differences in the middlewares, as the findOneAndUpdate() hook won't allow us to do that, as it is query hook, meaning it does not have access to the actual document, but only to the query itself. So if we e.g. only change the name of the product the following middleware would not work as expected:
productSchema.pre('findOneAndUpdate', function(next) {
// this.version is undefined in the query and would therefor be NaN
this.version = this.version + 1;
next();
});
The reason for this is, that the object is directly updated in the database and not first "downloaded" to nodejs, edited and "uploaded" again. This means, that in this hook this refers to the query and not the document, meaning, we don't know what the current state of version is.
If we were to increment the version in a query like this, we'd need to update the hook as follows, so that it automatically adds the $inc operator:
productSchema.pre('findOneAndUpdate', function(next) {
this.$inc = { version: 1 };
next();
});
Alternatively, we could emulate the previous logic by manually fetching the target document and editing it using an async function. This would be less efficient in this case, as it would always call the db twice for every update, but would keep the logic consistent:
productSchema.pre('findOneAndUpdate', async function() {
const productToUpdate = await this.model.findOne(this.getQuery());
this.version = productToUpdate.version + 1;
next();
});
For a more detailed explanation, please the check the official documentation that also has a designated paragraph for the problem of having colliding naming of methods (e.g. remove() being both a Document and Query middleware method)

Nodejs, MongoDB concurrent request creates duplicate record

Let me be real simple. I am running node.js server. When I receive data from patch request, first I need to check in database. If the row exists I will update it, otherwise I will create that record. Here is my code. This is what I am calling in the request handler callback.
let dbCollection = db$master.collection('users');
createUser(req.body).then(user => {
dbCollection.updateMany(
{ rmisId: req.body.rmisId },
{ $set: { ...user } },
{ upsert: true }
).then((result) => {
log.debug("RESULTS", user);
return result;
})
.catch((err => {
log.debug(err);
return err;
}));
})
This is working fine in sequential requests. But its creating duplicate record when I receive 10 concurrent request. I am running on my local machine and replicating concurrent request using Apache JMeter. Please help me if you have experienced this kind of problem.
Thank you !
UPDATE
I have tested another approach that reads the database like dbCollection.find({rmisId: req.body.rmisId}) the database for determine its existing or no. But it has no difference at all.
You cannot check-and-update. Mongodb operations are atomic at the document level. After you check and see that the record does not exist, another request may create the document you just checked, and after that you can recreate the same record if you don't have unique indexes or if you're generating the IDs.
Instead, you can use upsert, as you're already doing, but without the create. It looks like you're getting the ID from the request, so simply search using that ID, and upsert the user record. That way if some other thread inserts it before you do, you'll update what the previous thread inserted. If this is not something you prefer, add a unique index for that user ID field.

Express with pug, Postgres and proper MVC

I recently started using Node.js + Express.js (generated with pug) + pg-promise for handling db.
My first target is to obtain data from Postgres (already set up) and display it pretty using render and pug. Let's say it is user list from Users table.
On this restful tutorial I have learned how to get data and return it as JSON - it worked.
Based on Mozilla's tutorial I seperated my code:
routes/users.js: where for '/' I call user_controller.user_list method (using router.get)
controllers/userController.js I have exported user_list where I would like to ask model for data and call render if I have results
queries.js which is kinda my model? But I'm not sure. It has API: connection to db with promises and one function for every query I am going to use in Controllers. I believe I should have like one Model file per table (or any logical entity) but where to store pgp connections?
This file is based on first tutorial I mentioned
// queries.js (connectionString is set properly to my postgres)
var pgp = require('pg-promise')(options);
var db = pgp(connectionString);
function getUsers(req, res, next) {
db.any('SELECT (user_id, username) FROM public.users ORDER BY user_id ASC LIMIT 1000')
.then(function (data) {
res.json({ data: data });
})
.catch(function (err) {
return next(err);
});
}
module.exports = {
getUsers: getUsers
};
Here starts my problem as most tutorials uses mongoose which is very model-db-schema-friendly and what I have is simple 'SELECT ...' string I pass to pg-promise's any() function.
Therefore I have no model class like User.
In userControllers.js I don't know how to call getUsers() to handle its data. Returning JS object from getUsers() would be nice.
Also: where should I call render? In controller or only in
db.any(...).then(function (data) { <--here--> })
Before, I also tried to embed whole Postgres handling into Controller but from db.any() I got this array for handling:
[{ row: '(1,John)' },{ row: '(2,Amy)' },{ row: '(50,Peter)' } ]
Didn't know how go from there as I probably lost my API functionality as well ;-)
I am browsing through multiple tutorials how to handle MVC but usually they handle MongoDB and
satisfy readers with res.send() not render().
I am not sure that I understand what your question is exactly about, but since I do not have enough reputation to comment, I'll do my best to help you with your interrogations. :)
First, regarding the queries.js file, it is IMO not exactly a model, but rather a DAO (Data Access Object) file. DAO comes between you Model (which is actually you database) and your Controller layers. There usually is a DAO file per object (User, Pet, whatever you want) in your data model.
When the data model is rather complex, it can be useful to use an Object Relational Mapping (ORM) such as Mongoose to map your database and execute complexe processes on your objects. In such a case, you might need a specific file per object so as to describe your model and store your queries. But since you don't need an ORM, you DAO can directly interact with your database. That is why you do not have a User.js file.
Regarding the way the db object should be used, I think you should refer directly to pg-promise documentation on the matter.
IMPORTANT: For any given connection, you should only create a single
Database object in a separate module, to be shared in your application
(see the code example below). If instead you keep creating the
Database object dynamically, your application will suffer from loss in
performance, and will be getting a warning in a development
environment (when NODE_ENV = development)
As a matter of fact, a db object in pg-promise sort of represents the database itself and is actually designed for the simultaneous use of several databases, which does not seem to be your case for the moment.
Finally, when it comes to the render function, I believe it should be in the controller, as your DAO is not supposed to know how the data it has gathered is going to be used.
Modularity is always a time-saving choice on the long-term.
Furthermore, note that you might later need a Business Layer between your DAO and your controller, in order to preprocess and postprocess data you are going to persist or to display. In such a case, if you need for instance to ask for data from your database, you will need to render data after it is processed by the Business layer. If the render is made in the DAO layer, it will not be possible.
In the link I provided earlier to pg-promise's db object connection, you will also find documentation on the any() method. You might already have looked it up.
It specifically states that it returns
A promise object that represents the query result:
When no rows are returned, it resolves with an empty array.
When 1 or more rows are returned, it resolves with the array of rows.
so your returned data is a JS Array. If you want to make it a JS object, just use
JSON.stringify(yourArray) to process your data before rendering it in your controller.
But I wonder if Pug is not able to use your data directly.
Also, if you cannot get any data out of your DAO, maybe you should check that your data object is not empty, as such a case is tolerated by the any() method. If you expect your query to always return something, you might want to consider using the many() or the one() methods.
I hope this helps you.

How to Determine if a Document was Actually Changed During Update in MongoDB

I am using the Mongoose driver with NodeJS. I have quite a simple update call whose purpose is to sync an external source of meetings to my database:
collection.update({ meeting_id: doc.meeting_id}, newDoc, {upsert:true})
The object returned determines whether or not an update or an insert occurred. This works perfectly. My issue is that I must determine if an actual change occurred. When you update a document with itself, MongoDB treats this in exactly the same way as if all fields were changed.
So my question is: Is there any good way to tell if anything actually changed? I could search for each document then compare each field manually, but this seems like a poor (and slow) solution.
you can use findAndModify which will return updated results as compared to update which will return no of updated records.
collection.findAndModify(
{ meeting_id: doc.meeting_id},
newDoc,
{ new: true },
function (err, documents) {
res.send({ error: err, affected: documents });
}
);

Sequelize.js - how to properly use get methods from associations (no sql query on each call)?

I'm using Sequelize.js for ORM and have a few associations (which actually doesn't matter now). My models get get and set methods from those associations. Like this (from docs):
var User = sequelize.define('User', {/* ... */})
var Project = sequelize.define('Project', {/* ... */})
// One-way associations
Project.hasOne(User)
/*
...
Furthermore, Project.prototype will gain the methods getUser and setUser
according to the first parameter passed to define.
*/
So now, I have Project.getUser(), which returns a Promise. But if I call this twice on the very same object, I get SQL query executed twice.
My question is - am I missing something out, or this is an expected behavior? I actually don't want to make additional queries each time I call the same method on this object.
If this is expected - should I use custom getters with member variables which I manually populate and return if present? Or there is something more clever? :)
Update
As from DeBuGGeR's answer - I understand I can use includes when making a query in order to eager load everything, but I simply don't need it, and I can't do it all the time. It's waste of resources and a big overhead if I load my entire DB at the beginning, just to understand (by some criteria) that I won't need it. I want to make additional queries depending on situation. But I also can't afford to destroy all models (DAO objects) that I have and create new ones, with all the info inside them. I should be able to update parts of them, which are missing (from relations).
If you use getUser() it will make the query call, it dosent give you access to the user. You can manually save it to project.user or project.users depending on the association.
But you can try Eager Loading
Project.find({
include: [
{ model: User, as: 'user' } // here you HAVE to specify the same alias as you did in your association
]
}).success(function(project){
project.user // contains the user
});
Also e.g of getUser(). Dont expect it to automatically cache user and dont override this cleverly as it will create side effects. getUser is expected to get from database and it should!
Project.getUser().then(function(user){
// user is available and is a sequelize object
project.user = user; // save project.user and use it till u want to
})
The first part of things is clear - every call to get[Association] (for example Project.getUser()) WILL result in database query.
Sequelize does not maintain any kind of state nor cache for the results. You can get user in the Promisified result of the call, but if you want it again - you will have to make another query.
What #DeBuGGeR said - about using accessors is also not true - accessors are present only immediately after a query, and are not preserved.
As sometimes this is not ok, you have to implement some kind of caching system by yourself. Here comes the tricky part:
IF you want to use the same get method Project.getUser(), you won't be able to do it, as Sequelize overrides your instanceMethods. For example, if you have the association mentioned above, this won't work:
instanceMethods: {
getUser: function() {
// check if you have it, otherwise make a query
}
}
There are few possible ways to fix it - either change Sequelize core a little (to first check if the method exists), or use some kind of wrapper to those functions.
More details about this can be found here: https://github.com/sequelize/sequelize/issues/3707
Thanks to mickhansen for the cooperation on how to understand what to do :)

Resources