In a nested MongoDB call, how do I ensure atomicity? - node.js

Is it possible to atomically update/remove two documents in MongoDB by calling a new update/remove call from within the first update's callback? In the case below, I want to remove the second document from the collection, but only if the update to the first document succeeds:
db.collection.update(conditions1, {$set: set}, function (err,result){
db.collection.remove(conditions2, function(err,doc_num){
db.close();
)};
});
I'm coming across the $isolated query operator, but from what I understand in the documentation, this operator is used for performing a read/write lock on a single query which affects multiple documents, not on performing a read/write lock on one document after performing an update on another document through the first document update's callback, which is what I want to try and accomplish.

No it's not possible because. As documented here a lock would be aquired on a single query and not a whole transaction.
You can overcome atomicity problem by using this.

As Amir said, it's not possible, but you can mimic the behavior in mongo by following the two phase commit pattern. That link also links to how to perform rollback-like operations.

Related

NodeJS MongoDB locks on documents

I am using the mongodb driver and am concerned about possible concurrency issues that could duplicate objects. Reading a few questions and answers on stack overflows I believe that writes operations are atomic, but this may not solve my concurrency problem. Let's say there are two concurrent calls to doSomeAndDelete with the same id: operations in HERE might take some time but only one of these two functions should be able to handle result. How can I implement a lock?
async function doSomeAndDelete(id){
const result = await myCollection.findOne({ _id : id });
/*Some operations on result [HERE]*/
if(/*conditions*/)
await myCollection.deleteOne({ _id : id});
}
For deletion, only one of the operations will succeed and delete the document, while the other one will not delete anything because the document no longer exists. That, assuming, the _id will not be reused.
In general, write operations on a document are atomic, so if you have multiple threads writing to a document, you might want to use mongodb transactions, or use some form of optimistic locking. For example, you can use an ObjectId field in your documents as a version id, and use a new value for each update. When you read-and-update a document, you validate that the field has the same value you obtained from the read, meaning the record has not been modified since you read it.

Are MongoDB queries client-side operations?

Lets say I have a document
{ "_id" : ObjectId("544946347db27ca99e20a95f"), "nameArray": [{"id":1 , first_name: "foo"}]
Now i need to push a array into nameArray using $push . How does document update in that case. Does document get's retrieved on client and updates happens on client and changes are then reflected to Mongodb database server. Entire operation is carried out in Mongodb Database.
What you are asking here is if MongoDB operations are client-side operations. The short answer is NO.
In MongoDB a query targets a specific collection of documents as mentioned in the documentation and a collection is a group of MongoDB documents which exists within a single database. Collections are simply what tables are in RDBMS. So if query targets a specific collection then it means their are perform on database level, thus server-side. The same thing applies for data modification and aggregation operations.
Sometimes, your operations may involve a client-side processing because MongoDB doesn't provides a way to achieve what you want out of the box. Generally speaking, you only those type of processing when you want to modify your documents structure in the collection or change your fields' type. In such situation, you will need to retrieve your documents, perform your modification using bulk operations.
See the documentation:
Your array is inserted into the existing array as one element. If the array does not exists it is created. If the target is not an array the operation fails.
There is nothing stated like "retriving the element to the client and update it there". So the operation is completely done on the database server side. I don't know any operation that works in the way like you described it. Unless you are chaining a query, with a modify of the item in your client and an update. But these are two separated operations and not one single command.

Bulk operation by mongoose

I want to store bulk data (more than 1000 or 10000 records) in a single operation by MongoOSE. But MongoOSE does not support bulk operations so I will use the native driver (MongoDB, for insertion). I know that I will bypass all MongoOSE middlewares but its ok. (Please correct me If I am wrong! :) )
I have an option to store data by insert method. But MongoDB also provides Bulk class (ordered and unordered operations). Now I have the following questions:
Difference between insert and bulk operation (both can store bulk data) ?
Any specific difference between initializeUnorderedBulkOp() (performs operation in serially) and initializeOrderedBulkOp() (performs operations in parallel) ?
If I will use initializeUnorderedBulkOp then it will effect on by range search or any side-effects ?
Can I do it by Promisification (by BlueBird) ?? (I am trying to do it.)
Thanks
EDIT: I am talking about bulk vs insert regarding to multiple insertions. Which one is better? Insertion one by one by bulk builder OR insertion by batches (1000) in insert method. I hope now it will clear Mongoose (mongodb) batch insert? this link
If you are calling this from a mongoose model you need the .collection accessor
var bulk = Model.collection.initializeOrderedBulkOp();
// examples
bulk.insert({ "a": 1 });
bulk.find({ "a": 1 }).updateOne({ "$set": { "a": 2 } });
bulk.execute(function(err,result) {
// result contains stats of the operations
});
You need to be "careful" when doing this though. Apart from not being bound to the same checks and validation that can be attached to mongoose schemas, when you call .collection you need to be "sure" that the connection to the database has already been made. Mongoose methods look after this for you, but once you use the underlying driver methods you are all on your own.
As for diffferences it's all there in the naming:
Ordered: Means that the batched instructions are executed in the same order they are added. They execute one after the other in sequence and one at a time. If an error occurs at any point, the execution of the batch is halted and the error response returned. All operations up until then are "comitted". This is not a rollback.
UnOrdered: Means that batched operations can execute in "any" sequence and often in parallel. This can lead to faster updates, but of course cannot be used where one bulk operation in the batch is meant to occur before another ( example above ). Any errors that occur are merely "reported" in the result, and the whole batch will complete as sent to the server.
Of course the core difference for either type of execution from the standard methods is that the "whole batch" ( actually in lots of 1000 maximum ) is sent to the server and you only get one response back. This saves network traffic and waiting for each idividual .insert() or other like operation to complete.
As for can a "promise" be used, well anything else with a callback that you can convert to returning a promise follows the same rules as here. Remember though that the "callback/promise" is on the .execute() method, and that what you get back complies to the rules of what is returned from Bulk operations results.
For more information see "Bulk" in the core documentation.

Handling conflict in find, modify, save flow in MongoDB with Mongoose

I would like to update a document that involves reading other collection and complex modifications, so the update operators in findAndModify() cannot serve my purpose.
Here's what I have:
Collection.findById(id, function (err, doc) {
// read from other collection, validation
// modify fields in doc according to user input
// (with decent amount of logic)
doc.save(function (err, doc) {
if (err) {
return res.json(500, { message: err });
}
return res.json(200, doc);
});
}
My worry is that this flow might cause conflict if multiple clients happens to modify the same document.
It is said here that:
Operations on a single document are always atomic with MongoDB databases
I'm a bit confused about what Operations mean.
Does this means that the findById() will acquire the lock until doc is out of scope (after the response is sent), so there wouldn't be conflicts? (I don't think so)
If not, how to modify my code to support multiple clients knowing that they will modify Collection?
Will Mongoose report conflict if it occurs?
How to handle the possible conflict? Is it possible to manually lock the Collection?
I see suggestion to use Mongoose's versionKey (or timestamp) and retry for stale document
Don't use MongoDB altogether...
Thanks.
EDIT
Thanks #jibsales for the pointer, I now use Mongoose's versionKey (timestamp will also work) to avoid committing conflicts.
aaronheckmann — Mongoose v3 part 1 :: Versioning
See this sample code:
https://gist.github.com/anonymous/9dc837b1ef2831c97fe8
Operations refers to reads/writes. Bare in mind that MongoDB is not an ACID compliant data layer and if you need true ACID compliance, you're better off picking another tech. That said, you can achieve atomicity and isolation via the Two Phase Commit technique outlined in this article in the MongoDB docs. This is no small undertaking, so be prepared for some heavy lifting as you'll need to work with the native driver instead of Mongoose. Again, my ultimate suggestion is to not drink the NoSQL koolaid if you need transaction support which it sounds like you do.
When MongoDB receives a request to update a document, it will lock the database until it has completed the operation. Any other requests that MongoDB receives will wait until the locking operation has completed and the database is unlocked. This lock/wait behavior is automatic, so there aren't any conflicts to handle. You can find a lot more information about this behavior in the Concurrency section of the FAQ.
See jibsales answer for links to MongoDB's recommended technique for doing multi-document transactions.
There are a couple of NoSQL databases that do full ACID transactions, which would make your life a lot easier. FoundationDB is one such database. Data is stored as Key-Value but it supports multiple data models through layers.
Full disclosure: I'm an engineer at FoundationDB.
In my case I was wrong when "try to query the dynamic field with the upsert option". This guide helped me: How to solve error E11000 duplicate
In above guide, you're probably making one of two mistakes:
Upsert a document when findOneAndupdate() but the query finds a non-unique field.
Use insert many new documents in one go but don't use "ordered = false"

Mongoose: what's the differences between Model.create and Collection.insert

I want do a batch insert job in MongoDB and I found two ways in mongoose:
One way is use insert:
dataArr = [
{
id: "",
name: ""
}
{
id: "",
name: ""
}
]
Collection.insert(dataArr)
and another way is Model.create:
Model.create(dataArr)
both could complete the batch insert job, but what's the difference between them?
Which one is more efficiency?
In Mongoose, there is Model.create and Collection.insert (the latter isn't strictly part of Mongoose, but of the underlying MongoDB driver).
According to the Mongoose developer, they are basically the same when called with an array of documents, although looking at the code makes me think that there are subtle differences (warning: I haven't looked at the code that well so I might be mistaken about the following):
using Model.create will call any validators/hooks declared on your schema;
Model.create does a .save for each document in the array, resulting in N database calls (where N is the number of documents in the array); Collection.insert performs one large database call;
according to what i've read, Collection.insert is a function of mongoDB driver it's much faster when inserting big amounts of data like millions or such at the cost that it bypasses mongoose validations.
handle with care
They loosely mean the same thing. You can use either of them.
checking the documentation for insertMany, apparently it's faster that create as it only sends one operation to the server, rather than one for each document.
insertMany documentation, one thing to note is that insertMany does not trigger save middleware, while on the other hand, create does.

Resources