Mongoose: what's the difference between Model.create and Collection.insert - node.js

I want to do a batch insert job in MongoDB, and I found two ways in Mongoose:
One way is to use insert:
dataArr = [
  {
    id: "",
    name: ""
  },
  {
    id: "",
    name: ""
  }
]
Collection.insert(dataArr)
and another way is Model.create:
Model.create(dataArr)
Both can complete the batch insert job, but what's the difference between them?
Which one is more efficient?

In Mongoose, there are Model.create and Collection.insert (the latter isn't strictly part of Mongoose, but of the underlying MongoDB driver).
According to the Mongoose developer, they are basically the same when called with an array of documents, although looking at the code suggests there are subtle differences (warning: I haven't looked at the code that closely, so I might be mistaken about the following):
using Model.create will call any validators/hooks declared on your schema;
Model.create does a .save for each document in the array, resulting in N database calls (where N is the number of documents in the array), while Collection.insert performs one large database call.

According to what I've read, Collection.insert is a function of the MongoDB driver. It's much faster when inserting large amounts of data, like millions of records, at the cost of bypassing Mongoose validations.
Handle with care.

They loosely mean the same thing. You can use either of them.

Checking the documentation for insertMany, it's apparently faster than create, as it only sends one operation to the server rather than one for each document.
Per the insertMany documentation, one thing to note is that insertMany does not trigger save middleware, while create does.
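To make the contrast concrete, here is a minimal sketch of both calls (the User model and its pre-save hook are made up for illustration; it assumes mongoose.connect(...) has already been called):
const mongoose = require('mongoose');

// Hypothetical schema with a pre-save hook, to show the middleware difference.
const userSchema = new mongoose.Schema({ id: String, name: String });
userSchema.pre('save', function (next) {
  console.log('pre-save hook ran for', this.name);
  next();
});
const User = mongoose.model('User', userSchema);

const dataArr = [
  { id: '1', name: 'alice' },
  { id: '2', name: 'bob' }
];

async function run() {
  // create: validates and saves each document, running the pre-save hook N times.
  await User.create(dataArr);

  // insertMany: validates, then sends a single insert to the server;
  // the pre-save hook above does NOT run.
  await User.insertMany(dataArr);
}

run().catch(console.error);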

Related

NodeJS MongoDB locks on documents

I am using the MongoDB driver and am concerned about possible concurrency issues that could duplicate objects. Reading a few questions and answers on Stack Overflow, I believe that write operations are atomic, but this may not solve my concurrency problem. Let's say there are two concurrent calls to doSomeAndDelete with the same id: the operations at HERE might take some time, but only one of these two calls should be able to handle result. How can I implement a lock?
async function doSomeAndDelete(id) {
  const result = await myCollection.findOne({ _id: id });
  /* Some operations on result [HERE] */
  if (/* conditions */) {
    await myCollection.deleteOne({ _id: id });
  }
}
For deletion, only one of the operations will succeed and delete the document, while the other one will not delete anything because the document no longer exists. That is assuming the _id will not be reused.
In general, write operations on a document are atomic, so if you have multiple threads writing to a document, you might want to use MongoDB transactions, or use some form of optimistic locking. For example, you can use an ObjectId field in your documents as a version id, and use a new value for each update. When you read-and-update a document, you validate that the field has the same value you obtained from the read, meaning the record has not been modified since you read it.
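A minimal sketch of that optimistic-locking idea applied to the code in the question (the version field is hypothetical; it would have to be stored on the document and changed on every update):
async function doSomeAndDelete(id) {
  const result = await myCollection.findOne({ _id: id });
  if (!result) return; // another caller already deleted it

  /* Some operations on result [HERE] */

  // Delete only if the document still carries the version we read;
  // if a concurrent writer changed it, deletedCount will be 0.
  const res = await myCollection.deleteOne({ _id: id, version: result.version });
  if (res.deletedCount === 1) {
    // this caller "won" and is the only one allowed to handle result
  }
}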

bulkWrite vs initialize(Un)orderedBulkOp

What is the difference between these two methods, and which should I use?
What is the difference between initializeUnorderedBulkOp and bulkWrite with ordered: false?
What is the difference between initializeOrderedBulkOp and the default bulkWrite?
https://docs.mongodb.com/manual/reference/method/db.collection.initializeUnorderedBulkOp/
https://docs.mongodb.com/manual/reference/method/db.collection.initializeOrderedBulkOp/
https://docs.mongodb.com/manual/core/bulk-write-operations/
TL;DR
The difference is mainly in the usage. bulkWrite takes in an array of operations and executes them immediately.
initializeOrderedBulkOp and initializeUnorderedBulkOp return an instance which can be used to build queries gradually and execute them at the end using the execute function.
Late to the party, but I had similar confusion, so I did some digging.
The difference lies in the API implementation and usage.
bulkWrite
According to the API reference,
Perform a bulkWrite operation without a fluent API
In this method, you directly pass in an array of "write operations" as the first argument. See here for examples. I think by fluent API, they mean you don't exactly separate your update operations from your insert operations or delete operations. Every operation is in one array.
One crucial point: these operations are executed immediately.
As noted in the question, the execution is ordered by default but can be changed to unordered by setting { ordered: false } in the second argument which is a set of options.
The return value of the function is BulkWriteResult which contains information about the executed bulk operation.
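For illustration, a minimal sketch in the mongo-shell style used by the linked docs (collection and field names are made up):
const result = db.items.bulkWrite(
  [
    { insertOne: { document: { a: 1 } } },
    { updateOne: { filter: { a: 1 }, update: { $set: { a: 2 } } } },
    { deleteOne: { filter: { a: 3 } } }
  ],
  { ordered: false } // execute without guaranteed order
);
// result contains insertedCount, matchedCount, modifiedCount, deletedCount, ...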
initializeOrderedBulkOp and initializeUnorderedBulkOp
Referring to the API reference again,
Initiate an In order bulk write operation
As it says here, these methods initialize/return an instance which provides an API for building bulk operations. Those instances are of the class OrderedBulkOperation and UnorderedBulkOperation respectively.
const bulk = db.items.initializeUnorderedBulkOp();
// `bulk` is of the type UnorderedBulkOperation
This bulk variable provides a "fluent API" which allows you to build your queries across the application:
bulk.find( { /** foo **/ } ).update( { $set: { /** bar **/ } } );
Bear in mind, these queries are not executed in the above code. You can keep building the whole operation, and once all the write operations have been added, you can finally execute the query:
bulk.execute();
This execute function returns a BulkWriteResult instance which is basically what bulkWrite returns. Our database is finally changed.
Which one should you use?
It depends on your requirements.
If you want to update a lot of documents with separate queries and values from an existing array, bulkWrite seems a good fit. If you want to build your bulk operation through fairly complex business logic, the other options make sense. Note that you can achieve the same thing by gradually constructing a global array and passing it to bulkWrite at the end.
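A sketch of that last suggestion, accumulating operations in a plain array and sending them in a single bulkWrite at the end (sourceDocs and the business rule are placeholders):
const ops = [];

for (const doc of sourceDocs) {   // sourceDocs: an existing array of inputs
  if (doc.needsUpdate) {          // placeholder business rule
    ops.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { status: 'processed' } }
      }
    });
  }
}

if (ops.length > 0) {
  db.items.bulkWrite(ops, { ordered: false });
}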

In a nested MongoDB call, how do I ensure atomicity?

Is it possible to atomically update/remove two documents in MongoDB by calling a new update/remove call from within the first update's callback? In the case below, I want to remove the second document from the collection, but only if the update to the first document succeeds:
db.collection.update(conditions1, { $set: set }, function (err, result) {
  db.collection.remove(conditions2, function (err, doc_num) {
    db.close();
  });
});
I'm coming across the $isolated query operator, but from what I understand in the documentation, this operator is used for performing a read/write lock on a single query which affects multiple documents, not for performing a read/write lock on one document after performing an update on another document through the first update's callback, which is what I want to accomplish.
No, it's not possible. As documented here, a lock is acquired on a single query, not on a whole transaction.
You can overcome the atomicity problem by using the two phase commit pattern.
As Amir said, it's not possible, but you can mimic the behavior in mongo by following the two phase commit pattern. That link also links to how to perform rollback-like operations.
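For the specific flow in the question, here's a sketch rewritten with the promise-based driver API, issuing the remove only when the update reports that it actually modified the first document. Note this still isn't atomic; it merely avoids removing the second document after a failed update, while only two phase commit gives rollback-like behavior:
async function updateThenRemove(collection, conditions1, set, conditions2) {
  const res = await collection.updateOne(conditions1, { $set: set });
  // Only remove the second document if the first update succeeded.
  if (res.modifiedCount === 1) {
    await collection.deleteOne(conditions2);
  }
}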

Bulk operation by mongoose

I want to store bulk data (more than 1000 or 10000 records) in a single operation with Mongoose. But Mongoose does not support bulk operations, so I will use the native driver (MongoDB) for insertion. I know that I will bypass all Mongoose middleware, but that's OK. (Please correct me if I am wrong! :) )
I have the option to store data with the insert method, but MongoDB also provides the Bulk class (for ordered and unordered operations). Now I have the following questions:
What is the difference between insert and a bulk operation (both can store bulk data)?
Is there any specific difference between initializeUnorderedBulkOp() (which performs operations in parallel) and initializeOrderedBulkOp() (which performs operations serially)?
If I use initializeUnorderedBulkOp, will it affect range searches or have any other side effects?
Can I do it with promisification (with Bluebird)? (I am trying to do it.)
Thanks
EDIT: I am asking about bulk vs insert with regard to multiple insertions. Which one is better: insertion one by one through the bulk builder, or insertion in batches (of 1000) with the insert method? I hope this makes it clearer; see also Mongoose (mongodb) batch insert?
If you are calling this from a Mongoose model, you need the .collection accessor:
var bulk = Model.collection.initializeOrderedBulkOp();
// examples
bulk.insert({ "a": 1 });
bulk.find({ "a": 1 }).updateOne({ "$set": { "a": 2 } });
bulk.execute(function(err,result) {
// result contains stats of the operations
});
You need to be "careful" when doing this though. Apart from not being bound to the same checks and validation that can be attached to mongoose schemas, when you call .collection you need to be "sure" that the connection to the database has already been made. Mongoose methods look after this for you, but once you use the underlying driver methods you are all on your own.
As for differences, it's all there in the naming:
Ordered: means that the batched instructions are executed in the same order they are added. They execute one after the other in sequence, one at a time. If an error occurs at any point, the execution of the batch is halted and the error response returned. All operations up until then are "committed". This is not a rollback.
Unordered: means that batched operations can execute in "any" sequence and often in parallel. This can lead to faster updates, but of course cannot be used where one bulk operation in the batch is meant to occur before another (as in the example above). Any errors that occur are merely "reported" in the result, and the whole batch will complete as sent to the server.
Of course the core difference of either type of execution from the standard methods is that the "whole batch" (actually in lots of 1000 maximum) is sent to the server and you only get one response back. This saves network traffic and waiting for each individual .insert() or other like operation to complete.
As for whether a "promise" can be used: anything with a callback that you can convert into returning a promise follows the same rules here. Remember though that the "callback/promise" is on the .execute() method, and that what you get back complies with the rules of what is returned from Bulk operations results.
For more information see "Bulk" in the core documentation.
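For example, a minimal sketch of promisifying .execute() with Node's built-in util.promisify (Bluebird's promisify works the same way; the model and operations are from the example above):
const { promisify } = require('util');

const bulk = Model.collection.initializeOrderedBulkOp();
bulk.insert({ a: 1 });
bulk.find({ a: 1 }).updateOne({ $set: { a: 2 } });

// Bind execute to the bulk instance so `this` is preserved.
const execute = promisify(bulk.execute.bind(bulk));

execute()
  .then(function (result) {
    // result is a BulkWriteResult with stats of the operations
    console.log(result.nInserted, result.nModified);
  })
  .catch(function (err) {
    console.error(err);
  });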

Handling conflict in find, modify, save flow in MongoDB with Mongoose

I would like to update a document in a way that involves reading another collection and complex modifications, so the update operators in findAndModify() cannot serve my purpose.
Here's what I have:
Collection.findById(id, function (err, doc) {
  // read from other collection, validation
  // modify fields in doc according to user input
  // (with decent amount of logic)
  doc.save(function (err, doc) {
    if (err) {
      return res.json(500, { message: err });
    }
    return res.json(200, doc);
  });
});
My worry is that this flow might cause conflicts if multiple clients happen to modify the same document.
It is said here that:
Operations on a single document are always atomic with MongoDB databases
I'm a bit confused about what Operations means.
Does this mean that findById() will acquire a lock until doc is out of scope (after the response is sent), so there wouldn't be conflicts? (I don't think so.)
If not, how do I modify my code to support multiple clients, knowing that they will modify the same Collection?
Will Mongoose report a conflict if it occurs?
How do I handle the possible conflict? Is it possible to manually lock the Collection?
I have seen suggestions to use Mongoose's versionKey (or a timestamp) and retry for stale documents,
or to not use MongoDB altogether...
Thanks.
EDIT
Thanks @jibsales for the pointer; I now use Mongoose's versionKey (a timestamp will also work) to avoid committing conflicting updates.
aaronheckmann — Mongoose v3 part 1 :: Versioning
See this sample code:
https://gist.github.com/anonymous/9dc837b1ef2831c97fe8
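For reference, a minimal sketch of that optimistic retry loop (the helper is hypothetical; Mongoose's default versionKey is __v, and this assumes a Mongoose version whose updateOne result exposes modifiedCount):
async function findModifySave(id, modify, retries) {
  retries = retries || 3;
  for (let i = 0; i < retries; i++) {
    const doc = await Collection.findById(id);
    if (!doc) throw new Error('document not found');

    modify(doc); // the complex, caller-supplied modifications

    const { _id, __v, ...fields } = doc.toObject();
    // Match on the version we read; a concurrent writer will have
    // bumped __v, so the update matches nothing and we retry.
    const res = await Collection.updateOne(
      { _id: _id, __v: __v },
      { $set: fields, $inc: { __v: 1 } }
    );
    if (res.modifiedCount === 1) return doc;
  }
  throw new Error('too many conflicting writes, giving up');
}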
Operations refers to reads/writes. Bear in mind that MongoDB is not an ACID-compliant data layer, and if you need true ACID compliance, you're better off picking another tech. That said, you can achieve atomicity and isolation via the two phase commit technique outlined in this article in the MongoDB docs. This is no small undertaking, so be prepared for some heavy lifting, as you'll need to work with the native driver instead of Mongoose. Again, my ultimate suggestion is not to drink the NoSQL koolaid if you need transaction support, which it sounds like you do.
When MongoDB receives a request to update a document, it will lock the database until it has completed the operation. Any other requests that MongoDB receives will wait until the locking operation has completed and the database is unlocked. This lock/wait behavior is automatic, so there aren't any conflicts to handle. You can find a lot more information about this behavior in the Concurrency section of the FAQ.
See jibsales answer for links to MongoDB's recommended technique for doing multi-document transactions.
There are a couple of NoSQL databases that do full ACID transactions, which would make your life a lot easier. FoundationDB is one such database. Data is stored as Key-Value but it supports multiple data models through layers.
Full disclosure: I'm an engineer at FoundationDB.
In my case, my mistake was trying to query a dynamic field with the upsert option. This guide helped me: How to solve error E11000 duplicate.
According to that guide, you're probably making one of two mistakes:
Upserting a document with findOneAndUpdate() where the query matches a non-unique field.
Inserting many new documents in one go without using ordered: false.
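As a quick illustration of the second point, a small sketch of an insertMany call with ordered: false, so one duplicate doesn't abort the whole batch (docs is a placeholder array; writeErrors is how the driver typically reports the skipped duplicates on the error object):
try {
  // With ordered: false, a duplicate key in one document does not
  // abort the rest of the batch; valid documents are still inserted.
  await Model.insertMany(docs, { ordered: false });
} catch (err) {
  // err still reports the E11000 duplicates that were skipped
  console.error(err.writeErrors ? err.writeErrors.length : err);
}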
