MongoDB: there is no info online about how transactions and read operations interact (node.js)

I'm trying to get the full picture...
When I start a transaction on a session, I know all write operations in that transaction will either commit together or roll back together.
I couldn't find any official MongoDB documentation that explains what exactly transactions lock and when the lock happens during the lifetime of the transaction (by "lock" I mean either a pessimistic or an optimistic lock).
One post here seems to assume that a lock on a document starts once the document has been updated and is released at the end of the session.
But does the document even need to be locked? Is it really locked at that instant? Where can I find documentation on that?
If so, that means that if I do:
const person = await Person.findOne({ id }, null, { session }).lean();
const updatedPerson = await Person.findOneAndUpdate({ id }, { $set: { ...person } }, { session, new: true });
Then there is absolutely no point in passing the session to findOne, because the specific person document doesn't get locked?
So if some other request updates the person between my find and my update, updatedPerson could actually differ from person, is that correct? Is there no built-in MongoDB way, using sessions, to ensure person is locked? (I know there is a schema option, optimisticConcurrency, but I want to understand sessions. Also, that option seems limited to throwing an error instead of retrying, which seems a bit odd, since the behavior you usually want with optimistic concurrency is to retry, or at least to have the option to.)
If that's correct, then the only reason to pass the session to purely read operations would be to see the results of writes made earlier in the same session:
const updatedPerson = await Person.findOneAndUpdate({ id }, { field1: 'changed' }, { session, new: true });
const person = await Person.findOne({ id }, null, { session });
Passing the session to findOne here lets me see the updated document after the update.
Am I correct in my understanding? If so, that leads me to my next question, specifically about Mongoose and .save(). According to the Mongoose documentation:
For example, if you're using save() to update a document, the document can change in MongoDB in between when you load the document using findOne() and when you save the document using save() as show below. For many use cases, the save() race condition is a non-issue. But you can work around it with findOneAndUpdate() (or transactions) if you need to.
Which raises my question: how can transactions fix the save() race condition, considering that transactions do not lock the documents they read?

I did some manual testing: inside the transaction I paused with await new Promise((r) => setTimeout(r, 5000)); and meanwhile updated the document from outside the session to observe the behavior.
My findings are as follows:
In this example:
const person = await Person.findOne({ id }, null, { session }).lean();
const updatedPerson = await Person.findOneAndUpdate({ id }, { $set: { ...person } }, { session, new: true });
There is a point to passing the session to findOne. Even though the findOne doesn't lock the document, it is the first operation of the transaction, so the transaction is already in progress when it runs. If the fetched document (person, in our case) is then changed by something outside the transaction, the update fails with a write conflict and the transaction is aborted.
That means you can trust that updatedPerson is based on the person you read, because the read is part of the transaction. This answer renders the rest of my question irrelevant.
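A common way to act on this finding is to rerun the whole read-modify-write when the update fails with a conflict. The same idea covers the optimisticConcurrency complaint in the question: catch the conflict error, then reload, modify and save again. Below is a minimal, hypothetical retry wrapper (runWithRetry, isTransientTxnError and isVersionError are illustrative names). It assumes driver errors expose hasErrorLabel(), as MongoDB Node driver errors do, and that Mongoose reports version conflicts with an error named VersionError. Each attempt of work should start and commit (or abort) its own transaction. Note that the driver's session.withTransaction() already retries its callback on TransientTransactionError, so a hand-written loop mostly matters for save()-based code or manually managed transactions.

```javascript
// Hypothetical retry wrapper: re-runs `work` while `isRetryable(err)` is true,
// up to `maxAttempts` attempts in total, then rethrows the last error.
async function runWithRetry(work, isRetryable, maxAttempts = 5) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await work(attempt);
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Retryable conflict: loop and run the whole read-modify-write again.
    }
  }
}

// Transaction write conflicts carry the TransientTransactionError label.
const isTransientTxnError = (err) =>
  typeof err?.hasErrorLabel === 'function' &&
  err.hasErrorLabel('TransientTransactionError');

// Assumption: Mongoose reports optimisticConcurrency conflicts as a VersionError.
const isVersionError = (err) => err?.name === 'VersionError';

// Usage (hypothetical Person model with optimisticConcurrency enabled):
// await runWithRetry(async () => {
//   const person = await Person.findById(id);
//   person.field1 = 'changed';
//   await person.save(); // throws VersionError if someone saved in between
// }, isVersionError);
```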

Related

In mongodb is it possible to create a transaction of 2 queries? Like find and update separately?

I use mongodb (with mongoose) and I have a case where I find a document, run a bunch of complicated conditional checks and then update the document.
This is fine but now I want to ensure that the document I'm updating in the end hasn't been updated by a different update query while my conditions have been running.
Can I create a lock or somehow contain all these actions within a transaction?

Add a simple editor field and make sure each process takes ownership of the document before it updates it.
Here is a simple example:
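const processId = uniqueID; // any ID unique to this process or request
// If doc is null, a different process "owns" the document; you need to decide how to handle that.
// (With Node driver v6+, findOneAndUpdate resolves to the matched document or null.)
const doc = await collection.findOneAndUpdate({ _id: docId, editor: { $exists: false } }, { $set: { editor: processId } });
// ... do calculations ...
const newValue = calculatedValue;
// Write the result and release ownership in one atomic update ("field" is a placeholder name).
const newDoc = await collection.findOneAndUpdate({ _id: docId, editor: processId }, { $set: { field: newValue }, $unset: { editor: "" } });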
I feel that using transactions, as suggested in the comments, is overkill. From the docs:
In MongoDB, an operation on a single document is atomic
And
For situations that require atomicity of reads and writes to multiple documents, MongoDB supports multi-document transactions.
Transactions are meant for more complex situations than this, and I feel this overhead is not needed as a simple solution suffices.
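The editor-ownership idea can be wrapped in a small helper. This is a minimal sketch, not the answerer's code: withEditorLock is a hypothetical name, and it assumes a MongoDB Node driver (v4+) Collection whose updateOne() resolves to a result with matchedCount. It uses updateOne instead of findOneAndUpdate to sidestep the return-shape change between driver versions, and it releases the claim in a finally block if the calculations throw. A process that crashes while holding the claim still leaves editor set, so some cleanup policy is needed in practice.

```javascript
// Hypothetical helper for the "editor" ownership pattern.
// Assumes `collection` is a MongoDB Node driver (v4+) Collection.
async function withEditorLock(collection, docId, processId, compute) {
  // 1. Claim the document only if no other process currently owns it.
  const claim = await collection.updateOne(
    { _id: docId, editor: { $exists: false } },
    { $set: { editor: processId } }
  );
  if (claim.matchedCount !== 1) return { ok: false, reason: 'locked' };

  let released = false;
  try {
    // 2. Read the document we now own and run the (possibly slow) checks.
    const doc = await collection.findOne({ _id: docId });
    const changes = await compute(doc); // non-empty object of fields to $set

    // 3. Write the result and give up ownership in one atomic update.
    const write = await collection.updateOne(
      { _id: docId, editor: processId },
      { $set: changes, $unset: { editor: '' } }
    );
    released = true;
    return { ok: write.matchedCount === 1 };
  } finally {
    // 4. If step 2 or 3 threw, release the claim so the document isn't stuck.
    if (!released) {
      await collection.updateOne(
        { _id: docId, editor: processId },
        { $unset: { editor: '' } }
      );
    }
  }
}
```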

Nodejs, MongoDB concurrent request creates duplicate record

Let me keep it simple. I am running a Node.js server. When I receive data from a PATCH request, I first need to check the database: if the row exists I update it, otherwise I create that record. Here is my code, which I call in the request handler callback.
let dbCollection = db$master.collection('users');
createUser(req.body).then(user => {
  // Return the inner promise so callers wait for the update to finish
  return dbCollection.updateMany(
    { rmisId: req.body.rmisId },
    { $set: { ...user } },
    { upsert: true }
  ).then((result) => {
    log.debug("RESULTS", user);
    return result;
  })
  .catch((err) => {
    log.debug(err);
    return err;
  });
})
This works fine for sequential requests, but it creates duplicate records when I receive 10 concurrent requests. I am running on my local machine and simulating concurrent requests with Apache JMeter. Please help me if you have experienced this kind of problem.
Thank you!
UPDATE
I have also tried an approach that first queries the database with dbCollection.find({rmisId: req.body.rmisId}) to determine whether the record exists, but it made no difference at all.

You cannot safely check-then-update. MongoDB operations are atomic at the single-document level: after you check and see that the record does not exist, another request may create the document you just checked, and you can then create the same record again if you don't have a unique index or if you're generating the IDs yourself.
Instead, use upsert, as you're already doing, but without the separate create step. It looks like you're getting the ID from the request, so simply filter on that ID and upsert the user record. That way, if some other request inserts it before you do, you'll update what it inserted. However, two upserts running at the same moment can both find no match and both insert, so also add a unique index on that user ID field.
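A sketch of that fix, assuming a unique index on rmisId: the upsert becomes the only write, and a duplicate key error (code 11000) from a concurrent upsert that lost the race is retried, at which point the filter matches the winner's document and the upsert acts as a plain update. upsertByRmisId and maxAttempts are hypothetical names; collection is assumed to be the driver collection (dbCollection in the question).

```javascript
// Hypothetical sketch: upsert keyed on rmisId, relying on a unique index
// created once at startup, e.g.:
//   await dbCollection.createIndex({ rmisId: 1 }, { unique: true });
const DUPLICATE_KEY_ERROR = 11000;

async function upsertByRmisId(collection, user, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await collection.updateOne(
        { rmisId: user.rmisId },
        { $set: user },
        { upsert: true }
      );
    } catch (err) {
      // A concurrent upsert inserted the same rmisId first and the unique index
      // rejected this insert. Retrying now matches that document and updates it.
      if (err?.code !== DUPLICATE_KEY_ERROR || attempt >= maxAttempts) throw err;
    }
  }
}
```

In the question's handler this would replace the updateMany call, e.g. createUser(req.body).then(user => upsertByRmisId(dbCollection, { ...user, rmisId: req.body.rmisId })).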

SetTimeout is not working in Mongoose schema post middleware

I am trying to set forgetpassword_token to null in a MongoDB document 24 hours after the forgetpassword_token was generated. I am using Mongoose schema middleware and setTimeout, but setTimeout is not working.
I have also tried async/await, but that doesn't work for me either.
CompanySchema.post('updateOne',true, function(doc,next){
next();
setTimeout(this.update({},{ $set: { forgetpassword_token: null } }).then(result=>{
console.log(result);
}),10000000000);
});

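The main problem here is that this implementation is flawed: if your Node application restarts during the 24-hour window, your timeout disappears (it is an in-memory object, not persisted) and the token remains active, exposing you to security risks. Separately, setTimeout expects a function as its first argument, but here this.update(...) is called immediately and its promise is passed instead; and 10000000000 ms is larger than the maximum delay Node accepts (2147483647 ms, about 24.8 days).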
Manual token verification
A very common solution is to save a token_expiration_date alongside the token and do a date comparison during the related password reset request. If the token_expiration_date has passed, the request returns an error and the server should delete the token from the database.
You can also do the opposite: store the token_creation_date and keep the maximum token TTL in your app code (e.g. 24 hours). Either way, you do the date comparison at request time.
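The request-time date comparison can be sketched as two small pure functions, one per variant. The names isTokenExpired, isTokenTooOld and TOKEN_TTL_MS are hypothetical, and the sketch assumes token_expiration_date and token_creation_date are stored as JavaScript Date values.

```javascript
// Hypothetical request-time checks for a password reset token.
const TOKEN_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Variant 1: token_expiration_date (a Date) is stored alongside the token.
function isTokenExpired(user, now = new Date()) {
  // Treat a missing or malformed expiry as expired, never as valid.
  if (!(user.token_expiration_date instanceof Date)) return true;
  return now.getTime() >= user.token_expiration_date.getTime();
}

// Variant 2: only token_creation_date is stored; the TTL lives in app code.
function isTokenTooOld(tokenCreationDate, now = new Date(), ttlMs = TOKEN_TTL_MS) {
  return now.getTime() - tokenCreationDate.getTime() >= ttlMs;
}
```

When either check fails, the handler would reject the request and clear the token, for example with updateOne({ _id: user._id }, { $set: { forgetpassword_token: null } }).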
#NikKyriakides suggested (see comments) a more sophisticated version of this approach: you create a single JWT that itself contains the expiration date. When the user requests the reset password page, you only need to check that the token is valid by calling a single method (no manual date comparison).
The Mongo expire option
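A more elegant and effective solution is to create a separate Mongoose schema for your forgetpassword_token and use MongoDB's native TTL (expire) index, which Mongoose exposes, to automatically delete documents a fixed time after their creation. The TTL background task runs roughly once every 60 seconds, so a document may survive slightly past its expiry.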
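const mongoose = require('mongoose');
const secondsInADay = 60 * 60 * 24;
// timestamps: true adds createdAt and updatedAt fields
const tokenSchema = new mongoose.Schema({
  value: String
}, { timestamps: true });
// TTL index: MongoDB deletes each token roughly secondsInADay after its createdAt
tokenSchema.index({ createdAt: 1 }, { expireAfterSeconds: secondsInADay });
const Token = mongoose.model('Token', tokenSchema);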
Then add to your existing CompanySchema a reference to this schema:
forgetpassword_token: {type: mongoose.Schema.Types.ObjectId, ref: 'Token'}
Many questions already exist on this topic, so please also check them along with the related Mongoose documentation.
The job scheduler
Another approach is to use a job scheduler like agenda to check for expired tokens every hour and delete them. Yes, you could write a setTimeout-based check as a module for your app, but if the right tool already exists, why not use it? Also check #NikKyriakides's comments below for potential drawbacks of this solution.

Preventing concurrent access to documents in Mongoose

My server application (using node.js, mongodb, mongoose) has a collection of documents for which it is important that two client applications cannot modify them at the same time without seeing each other's modification.
To prevent this I added a simple document versioning system: a pre-hook on the schema which checks if the version of the document is valid (i.e., not higher than the one the client last read). At first sight it works fine:
// Validate version number
UserSchema.pre("save", function(next) {
  var user = this
  user.constructor.findById(user._id, function(err, userCurrent) { // userCurrent is the user that is currently in the db
    if (err) return next(err)
    if (userCurrent == null) return next()
    if (userCurrent.docVersion > user.docVersion) {
      return next(new Error("document was modified by someone else"))
    } else {
      user.docVersion = user.docVersion + 1
      return next()
    }
  })
})
The problem is the following:
When one User document is saved at the same time by two client applications, is it possible for them to interleave between the pre-hook and the actual save operation? What I mean is the following (imagine time going from left to right, with v being the version number, which is persisted by save):
App1: findById(pre)[v:1] save[v->2]
App2: findById(pre)[v:1] save[v->2]
Resulting in App1 saving something that has been modified meanwhile (by App2), and it has no way to notice that it was modified. App2's update is completely lost.
My question might boil down to: Do the Mongoose pre-hook and the save method happen in one atomic step?
If not, could you give me a suggestion on how to fix this problem so that no update ever gets lost?
Thank you!

MongoDB has findAndModify which, for a single matching document, is an atomic operation.
Mongoose has various methods that use this method, and I think that they will suit your use case:
Model.findOneAndUpdate()
Model.findByIdAndUpdate()
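Model.findOneAndDelete() (older Mongoose versions also had findOneAndRemove(), removed in Mongoose 8)
Model.findByIdAndDelete() (older Mongoose versions also had findByIdAndRemove(), removed in Mongoose 8)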
Another solution (one that Mongoose itself uses as well for its own document versioning) is to use the Update Document if Current pattern.
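The question's pre-save hook does its version check (findById) and its write (save) as two separate operations, which is why they can interleave. The Update Document if Current pattern moves the check into the update's filter, so the check and the write happen in one atomic single-document operation. Here is a minimal sketch using the question's docVersion field; saveIfCurrent is a hypothetical name, and collection is assumed to be a driver Collection (e.g. User.collection) or a Mongoose 6+ model, both of whose updateOne() results expose matchedCount. Going through updateOne bypasses save() middleware, so validation in pre('save') hooks would not run.

```javascript
// Hypothetical "Update Document if Current" sketch using the docVersion field.
// `changes` must not contain docVersion (it is incremented separately).
async function saveIfCurrent(collection, id, readVersion, changes) {
  // Matches only if docVersion is still the version this client read, so the
  // check and the write are one atomic single-document update.
  const result = await collection.updateOne(
    { _id: id, docVersion: readVersion },
    { $set: changes, $inc: { docVersion: 1 } }
  );
  if (result.matchedCount === 0) {
    // Either someone else saved first or the document no longer exists.
    throw new Error('document was modified by someone else');
  }
  return readVersion + 1; // the new version, for the client's next save
}
```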

How to ensure two users can atomically confirm transaction has taken place in mongodb

I have a model called a Transaction which has the following schema
var transactionSchema = new mongoose.Schema({
amount: Number,
status: String,
_recipient: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
_sender: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
});
I want both the sender and the recipient of this transaction to be able to 'confirm' that the transaction took place. The status starts out as "initial". When only the sender has confirmed the transaction (but the recipient has not yet), I want to update the status to "senderConfirmed" or similar, and when the recipient has confirmed it (but the sender has not), I want to update the status to "recipientConfirmed". When they have both confirmed it, I want to update the status to "complete".
The problem is: how can I know when to update it to "complete" in a way that avoids race conditions? If the sender and the recipient confirm the transaction at the same time, both threads will see the status as "initial" and update it only to "senderConfirmed" or "recipientConfirmed", when it actually ought to go to "complete".
I read about MongoDB's two-phase commit approach here, but that doesn't quite fit my need: if another thread is currently modifying a transaction, I don't want to prevent the second thread from making its update. I just want it to wait until the first thread is finished, and then make the content of its update contingent on the latest status of the transaction.

The bottom line is that you need "two" update statements for each of the sender and the recipient. One tries to move the "partial" status to "complete", and the other only moves a document whose status is still "initial" to the "partial" state.
Bulk operations are the best way to implement multiple statements, so you should use these by accessing the underlying driver methods. Modern API releases have the .bulkWrite() method, which degrades nicely if the server version does not support the "bulk" protocol, and just falls back to issuing separate updates.
// sender confirmation
Transaction.collection.bulkWrite(
[
{ "updateOne": {
"filter": {
"_id": docId,
"_sender": senderId,
"status": "recipientConfirmed"
},
"update": {
"$set": { "status": "complete" }
}
}},
{ "updateOne": {
"filter": {
"_id": docId,
"_sender": senderId,
"status": "initial"
},
"update": {
"$set": { "status": "senderConfirmed" }
}
}}
],
{ "ordered": false }
).then(function(result) {
// result.modifiedCount will be 0 or 1: at most one of the two updates can match
});
And of course the same applies for the _recipient, except with the different status check and change. You could alternatively issue an $or condition on the _sender or _recipient and have a generic "partial" status instead of coding different update conditions, but the same basic "two update" process applies.
Of course, you "could" also just use the regular methods and send both updates to the server another way, possibly even in parallel, since each condition remains "atomic". That is also the reason for the { "ordered": false } option: there is no particular sequence that needs to be respected here.
Bulk operations though are better than separate calls, since the send and return is only one request and response, as opposed to "two" of each, so the overhead using bulk operations is far less.
But that is the general approach. No single statement could possibly leave a "status" in "deadlock" or mark as "complete" before the other party also issues their confirmation.
There is a slim "possibility" that the status was changed from "initial" between the first attempted update and the second, which would result in nothing being updated. In that case you can simply "retry" the action, and it "should" update on the subsequent attempt.
This should only ever need "one" retry at most though. And very very rarely.
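As the answer notes, the recipient side mirrors the sender side with the statuses swapped. A small, hypothetical builder (confirmationOps; the 'sender'/'recipient' party strings are assumptions) can produce either pair of operations for Transaction.collection.bulkWrite(ops, { ordered: false }). If the bulk write reports no modification because of the slim race described above, running it once more should succeed.

```javascript
// Hypothetical builder for the two conditional updates, for either party.
// party is 'sender' or 'recipient'; userId must match the _sender/_recipient field.
function confirmationOps(docId, party, userId) {
  const isSender = party === 'sender';
  const partyField = isSender ? '_sender' : '_recipient';
  const myStatus = isSender ? 'senderConfirmed' : 'recipientConfirmed';
  const otherStatus = isSender ? 'recipientConfirmed' : 'senderConfirmed';
  return [
    // The other party already confirmed: move straight to "complete".
    { updateOne: {
      filter: { _id: docId, [partyField]: userId, status: otherStatus },
      update: { $set: { status: 'complete' } },
    } },
    // Nobody has confirmed yet: record this party's confirmation.
    { updateOne: {
      filter: { _id: docId, [partyField]: userId, status: 'initial' },
      update: { $set: { status: myStatus } },
    } },
  ];
}

// Usage (inside an async function):
// const result = await Transaction.collection.bulkWrite(
//   confirmationOps(docId, 'recipient', recipientId), { ordered: false });
// if (result.modifiedCount === 0) { /* slim race: retry once */ }
```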
NOTE: Care should be taken when using the .collection accessor on Mongoose models. All the regular model methods have built-in logic to "ensure" the connection to the database is actually present before they do anything, and in fact "queue" operations until a connection is present.
It's generally good practice to wrap your application startup in an event handler to ensure the database connection is open:
mongoose.connection.once("open", function() {
// App startup and init here
});
So use the connection's "on" or "once" method for this case.
Generally though a connection is always present either after this event is fired, or after any "regular" model method has already been called in the application.
When this was written, Mongoose did not expose .bulkWrite() directly on models, so the .collection accessor was necessary to grab the underlying Collection object from the core driver. Current Mongoose versions do provide Model.bulkWrite(), which avoids the .collection accessor.

Update: I am clarifying my answer based on a comment that my original response did not provide an answer.
An alternative approach would be to keep track of the status as two separate properties:
senderConfirmed: true/false,
recipientConfirmed: true/false,
When the sender confirms you simply update the senderConfirmed field. When the recipient confirms you update the recipientConfirmed field. There is no way they will overwrite each other.
To determine if the transaction is complete you would merely query {senderConfirmed:true,recipientConfirmed:true}.
Obviously this is a change to the document schema, so it may not be ideal.
Original Answer:
Is a change to your schema possible? What if you had two properties - senderStatus and recipientStatus? Sender would only update senderStatus and recipient would only update recipientStatus. Then they couldn't overwrite each others changes.
You would still need some other way to mark it as complete, I assume. You could use a cron job or something...
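One way to mark it complete without a cron job, sketched here under the assumption that the schema gains the two boolean fields: each party sets its own flag, then issues a conditional update that only matches once both flags are true. Because every party sets its flag before issuing the conditional update, whichever party issues it last is guaranteed to see both flags. confirm is a hypothetical name, and collection is assumed to be a driver Collection (for example Transaction.collection) whose updateOne() result exposes modifiedCount.

```javascript
// Hypothetical sketch for the two-flag schema (senderConfirmed / recipientConfirmed).
async function confirm(collection, docId, party) {
  const flag = party === 'sender' ? 'senderConfirmed' : 'recipientConfirmed';

  // 1. Each party only writes its own flag, so the two writes never clobber each other.
  await collection.updateOne({ _id: docId }, { $set: { [flag]: true } });

  // 2. Try to complete. Matches only once both flags are set; the $ne guard
  //    means only one of two simultaneous callers actually flips the status.
  const done = await collection.updateOne(
    { _id: docId, senderConfirmed: true, recipientConfirmed: true, status: { $ne: 'complete' } },
    { $set: { status: 'complete' } }
  );
  return done.modifiedCount === 1; // true for the call that completed it
}
```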
