bulkWrite vs initialize(Un)orderedBulkOp - node.js

What are the differences between these two methods, and which should I use?
What is the difference between initializeUnorderedBulkOp and bulkWrite with ordered: false?
What is the difference between initializeOrderedBulkOp and the default bulkWrite?
https://docs.mongodb.com/manual/reference/method/db.collection.initializeUnorderedBulkOp/
https://docs.mongodb.com/manual/reference/method/db.collection.initializeOrderedBulkOp/
https://docs.mongodb.com/manual/core/bulk-write-operations/

TL;DR
The difference is mainly in the usage. bulkWrite takes an array of operations and executes them immediately.
initializeOrderedBulkOp and initializeUnorderedBulkOp return an instance that you can use to build up the operations gradually and finally run them with the execute function.
Late to the party, but I had a similar confusion, so I did some digging.
The difference lies in the API implementation and usage.
bulkWrite
According to the API reference,
Perform a bulkWrite operation without a fluent API
In this method, you pass an array of "write operations" directly as the first argument. See here for examples. I think by "fluent API" they mean the chained builder calls used by the methods below: here you don't separate your update, insert, and delete operations, and every operation goes into one array.
One crucial point: these operations are executed immediately.
As noted in the question, the execution is ordered by default but can be changed to unordered by setting { ordered: false } in the second argument which is a set of options.
The return value of the function is BulkWriteResult which contains information about the executed bulk operation.
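A minimal sketch of the array-based style, assuming `collection` is a MongoDB Node.js driver `Collection` (e.g. from `db.collection("items")`). The `applyPrices` helper and the `{ sku, price }` input shape are hypothetical, chosen only to show an existing array being mapped to write operations and sent unordered:

```javascript
// Hypothetical helper: map an existing array onto bulkWrite operations and
// send them in a single call. `collection` is assumed to be a MongoDB
// Node.js driver Collection; `{ sku, price }` is an illustrative shape.
async function applyPrices(collection, prices) {
  const ops = prices.map(({ sku, price }) => ({
    updateOne: {
      filter: { sku },
      update: { $set: { price } },
      upsert: true, // insert a new document when nothing matches
    },
  }));

  // The driver rejects an empty batch, so skip the round trip entirely.
  if (ops.length === 0) return null;

  // ordered: false lets the server continue past individual failures;
  // leave it out (the default is ordered: true) to stop at the first error.
  return collection.bulkWrite(ops, { ordered: false });
}
```

Calling `await applyPrices(db.collection("items"), [{ sku: "a1", price: 10 }])` resolves to a BulkWriteResult whose counters (for example `upsertedCount` and `modifiedCount`) summarize what the single call did.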
initializeOrderedBulkOp and initializeUnorderedBulkOp
Referring to the API reference again,
Initiate an In order bulk write operation
As it says here, these methods initialize and return an instance that provides an API for building bulk operations. Those instances are of the classes OrderedBulkOperation and UnorderedBulkOperation respectively.
const bulk = db.collection("items").initializeUnorderedBulkOp();
// `bulk` is of the type UnorderedBulkOperation
This bulk variable provides a "fluent API" which allows you to build your queries across the application:
bulk.find( { /** foo **/ } ).update( { $set: { /** bar **/ } } );
Bear in mind that nothing is executed by the code above. You can keep building the whole operation, and once all the write operations have been queued, you finally execute it:
bulk.execute();
The execute function returns a BulkWriteResult, which is essentially what bulkWrite returns. Only now is the database actually changed.
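To make the "build across the application" idea concrete, here is a minimal sketch in which two separate, hypothetical helpers queue work on the same unordered builder and a single `execute()` call sends it. It assumes a MongoDB Node.js driver `Collection` and a recent driver where `execute()` returns a Promise; the user fields are made up for illustration:

```javascript
// Hypothetical helpers that each queue work on a shared bulk builder.
// Nothing is sent to the server while they run.
function queueNewUser(bulk, user) {
  bulk.insert({ ...user, active: true });
}

function queueDeactivation(bulk, userId) {
  bulk.find({ _id: userId }).updateOne({ $set: { active: false } });
}

// `collection` is assumed to be a MongoDB Node.js driver Collection.
async function syncUsers(collection, { newUsers = [], deactivate = [] }) {
  // The driver rejects an empty batch, so bail out early.
  if (newUsers.length === 0 && deactivate.length === 0) return null;

  // Unordered: the queued operations must not depend on each other.
  const bulk = collection.initializeUnorderedBulkOp();
  newUsers.forEach((user) => queueNewUser(bulk, user));
  deactivate.forEach((id) => queueDeactivation(bulk, id));

  // The database is only changed here.
  return bulk.execute();
}
```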
Which one should you use?
It depends on your requirements.
If you want to update a lot of documents with separate queries and values from an existing array, bulkWrite seems a good fit. If you want to build your bulk operation through fairly complex business logic, the other options make sense. Note that you can achieve the same by building up an array gradually and passing it to bulkWrite at the end.
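The "build an array gradually, then call bulkWrite" alternative might look like the following minimal sketch. The helpers are hypothetical and push plain operation objects instead of calling a builder. The array is passed around explicitly rather than living in a module-level global, which keeps concurrent requests from sharing it:

```javascript
// Hypothetical helpers that append plain bulkWrite operation objects to an
// array instead of calling a bulk builder. Nothing is sent yet.
function queueNewUser(ops, user) {
  ops.push({ insertOne: { document: { ...user, active: true } } });
}

function queueDeactivation(ops, userId) {
  ops.push({
    updateOne: { filter: { _id: userId }, update: { $set: { active: false } } },
  });
}

// `collection` is assumed to be a MongoDB Node.js driver Collection.
async function syncUsers(collection, { newUsers = [], deactivate = [] }) {
  const ops = [];
  newUsers.forEach((user) => queueNewUser(ops, user));
  deactivate.forEach((id) => queueDeactivation(ops, id));

  if (ops.length === 0) return null; // bulkWrite rejects an empty array
  return collection.bulkWrite(ops, { ordered: false });
}
```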

Related

NodeJS MongoDB locks on documents

I am using the mongodb driver and am concerned about possible concurrency issues that could duplicate objects. From a few questions and answers on Stack Overflow, I believe that write operations are atomic, but this may not solve my concurrency problem. Let's say there are two concurrent calls to doSomeAndDelete with the same id: the operations at [HERE] might take some time, but only one of these two calls should be able to handle result. How can I implement a lock?
async function doSomeAndDelete(id){
const result = await myCollection.findOne({ _id : id });
/*Some operations on result [HERE]*/
if(/*conditions*/)
await myCollection.deleteOne({ _id : id});
}
For deletion, only one of the operations will succeed and delete the document, while the other will not delete anything because the document no longer exists. That assumes the _id will not be reused.
Write operations on a single document are atomic, but a read followed by a separate write is not. So if you have multiple threads reading and then writing the same document, you might want to use MongoDB transactions or some form of optimistic locking. For example, you can use an ObjectId field in your documents as a version id and set a new value on each update. When you read and then update a document, you make the update conditional on that field still having the value you obtained from the read, meaning the record has not been modified since you read it.
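A minimal sketch of the optimistic-locking idea, assuming a MongoDB Node.js driver `Collection` whose documents carry a hypothetical numeric `version` field starting at 0. It swaps the ObjectId suggested above for an integer bumped with `$inc`; the principle is the same. The key point is that the version check sits inside the update filter, so check-and-write is a single atomic operation:

```javascript
// Read, do some work, then write only if nobody else wrote in between.
// `version` is a hypothetical field; documents are assumed to start at 0.
async function updateIfUnchanged(collection, id, computeChanges) {
  const doc = await collection.findOne({ _id: id });
  if (!doc) return { ok: false, reason: "not-found" };

  // Possibly slow work on the snapshot (the [HERE] step in the question).
  const changes = computeChanges(doc);

  const res = await collection.updateOne(
    { _id: id, version: doc.version }, // matches only if unchanged since the read
    { $set: changes, $inc: { version: 1 } }
  );

  // matchedCount 0 means another writer bumped the version first:
  // retry from the top or give up, depending on the use case.
  return res.matchedCount === 1 ? { ok: true } : { ok: false, reason: "conflict" };
}
```

For the delete-only flow in the question, no version field is needed: `deleteOne` resolves to a result whose `deletedCount` is 1 only for the caller that actually removed the document.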

Is it faster to use aggregation or manually filter through data with nodejs and mongoose?

I'm at a crossroads trying to decide which approach to use. Basically, I have a MongoDB collection and I want to query it with specific parameters provided by the user, then group the response according to the value of some of those parameters. For example, let's say my collection is animals; if I query all animals I get something like this:
[
{type:"Dog",age:3,name:"Kahla"},
{type:"Cat",age:6,name:"mimi"},
...
]
Now I would like to return to the user a response that is grouped by animal type, so that I end up with something like:
{
Dogs: [...dog docs],
Cats: [...cat docs],
Cows: [...],
}
So basically I have two ways of doing this. One is to use Model.find() to fetch all the animals that match my specific query (age or any other field), and then manually filter and format my JSON response before sending it back to the user with res.json({}) (I'm using Express, btw).
Or I can use Mongo's aggregation framework and $group to do this at the query level, so the DB returns an already grouped response. The only inconvenience I've found so far is how the response is formatted; it ends up looking more like this:
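The first approach (find, then reshape in JavaScript) comes down to a small grouping step. A minimal sketch, assuming `animals` is the plain array you would get from something like `Animal.find(query).lean()` (the `Animal` model is hypothetical). It keys the result by the raw `type` value (`Dog`, `Cat`); pluralizing to `Dogs`/`Cats` is a cosmetic extra:

```javascript
// Group an array of documents into { [type]: [docs...] }.
// A Map is used while grouping so unusual keys (e.g. "__proto__") become
// ordinary own properties when converted with Object.fromEntries.
function groupByType(animals) {
  const groups = new Map();
  for (const animal of animals) {
    if (!groups.has(animal.type)) groups.set(animal.type, []);
    groups.get(animal.type).push(animal);
  }
  return Object.fromEntries(groups);
}

// Usage in an Express handler (hypothetical model and query):
// res.json(groupByType(await Animal.find({ age: { $gte: 2 } }).lean()));
```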
[
{
"_id":"Dog",
"docs":[{dog docs...}]
},
{
"_id":"Cat",
"docs":[{...}]
}
]
The overall result is BASICALLY the same, but the format of the response is quite different, and my front-end client needs to adjust to how I'm sending it. I don't really like the array of objects from the aggregation, and prefer a JSON object whose keys correspond to the arrays as I see fit.
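If you go the aggregation route, the array shape is easy to convert. A minimal sketch, assuming a Mongoose model (the `Animal` name is hypothetical): the pipeline is the `$match` plus `$group` described above, and a one-line `Object.fromEntries` turns its output into the keyed object the front end wants. Note that Mongoose does not cast values inside aggregation stages the way it does for `find()`, so the `filter` should already use the right types:

```javascript
// The $match + $group pipeline from the question, built as plain data.
// `filter` is whatever user-supplied criteria you would pass to find().
function buildGroupPipeline(filter) {
  return [
    { $match: filter },
    { $group: { _id: "$type", docs: { $push: "$$ROOT" } } },
  ];
}

// [{ _id: "Dog", docs: [...] }, ...]  ->  { Dog: [...], ... }
function toKeyedObject(groups) {
  return Object.fromEntries(groups.map((g) => [g._id, g.docs]));
}

// Usage with a hypothetical Mongoose model `Animal`:
// const groups = await Animal.aggregate(buildGroupPipeline({ age: { $gte: 2 } }));
// res.json(toKeyedObject(groups));
```

One scaling caveat: with `$push: "$$ROOT"` each group becomes a single result document, and every document the pipeline emits must stay under MongoDB's 16 MB BSON limit, so very large groups can fail where a plain `find()` would not.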
So the real question here is whether there is one significant advantage of one way over the other? Is the aggregation framework so fast that it will scale well if my collection grows to huge numbers? Is filtering through the data with javascript and mapping the response so I can shape it to my liking a very inefficient process, and hence it's better to use aggregation and adapt the front end to this response shape?
I'm assuming that by "faster" you mean the least time to serve a request. With that in mind, let's divide the time required to process your request into:
Asynchronous Operations (Network Operations, File read/write etc)
Synchronous Operations
Synchronous operations are usually much faster than asynchronous ones (this also depends on the nature of the operation and the amount of data being processed). For example, looping over an iterable (e.g. an Array or Map) with fewer than 1000 elements won't take more than a few milliseconds.
Asynchronous operations, on the other hand, take more time. For example, an HTTP request can take a couple of milliseconds or more to return a response.
Querying MongoDB with Mongoose is an asynchronous call and takes more time, so the more database queries you run, the slower your API gets. MongoDB aggregation can help you reduce the total number of queries, which may make your APIs faster. The catch is that aggregations are usually slower than plain find requests.
The summary is, if you can manually filter data without adding any DB query it's going to be faster.

In a nested MongoDB call, how do I ensure atomicity?

Is it possible to atomically update/remove two documents in MongoDB by calling a new update/remove call from within the first update's callback? In the case below, I want to remove the second document from the collection, but only if the update to the first document succeeds:
db.collection.update(conditions1, {$set: set}, function (err, result) {
  // Only go on to the removal if the update succeeded.
  if (err) return db.close();
  db.collection.remove(conditions2, function (err, doc_num) {
    db.close();
  });
});
I'm coming across the $isolated query operator, but from what I understand in the documentation, this operator is used for performing a read/write lock on a single query which affects multiple documents, not on performing a read/write lock on one document after performing an update on another document through the first document update's callback, which is what I want to try and accomplish.
No, it's not possible. As documented here, a lock is acquired for a single query, not for a whole transaction.
You can work around the atomicity problem by using this.
As Amir said, it's not possible, but you can mimic the behavior in mongo by following the two phase commit pattern. That link also links to how to perform rollback-like operations.
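For completeness: MongoDB 4.0 added multi-document transactions (on replica sets; sharded clusters from 4.2), which give this all-or-nothing behavior directly instead of a hand-rolled two-phase commit. A minimal sketch with the promise-based Node.js driver, assuming `client` is a connected `MongoClient`. `conditions1`, `set` and `conditions2` are the question's placeholders, and the database and collection names are parameters you supply:

```javascript
// Update one document and remove another in a single transaction.
// Requires MongoDB 4.0+ on a replica set; `client` is a connected MongoClient.
async function updateThenRemove(client, dbName, collName, conditions1, set, conditions2) {
  const coll = client.db(dbName).collection(collName);
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      // Each operation must pass { session } to be part of the transaction.
      const res = await coll.updateOne(conditions1, { $set: set }, { session });
      if (res.matchedCount === 0) {
        // Throwing aborts the transaction, so nothing is removed.
        throw new Error("first document not found; nothing removed");
      }
      await coll.deleteOne(conditions2, { session });
    });
  } finally {
    await session.endSession();
  }
}
```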

Bulk operation by mongoose

I want to store bulk data (more than 1000 or 10000 records) in a single operation with Mongoose. But Mongoose does not support bulk operations, so I will use the native driver (MongoDB) for insertion. I know that I will bypass all Mongoose middleware, but that's OK. (Please correct me if I am wrong! :) )
I have an option to store data by insert method. But MongoDB also provides Bulk class (ordered and unordered operations). Now I have the following questions:
What is the difference between insert and a bulk operation (both can store bulk data)?
Is there any specific difference between initializeUnorderedBulkOp() (performs operations serially) and initializeOrderedBulkOp() (performs operations in parallel)?
If I use initializeUnorderedBulkOp, will it affect my range searches or have any other side effects?
Can I do it with promisification (using Bluebird)? (I am trying to.)
Thanks
EDIT: I am asking about bulk vs insert for multiple insertions. Which one is better: inserting one by one through the bulk builder, or inserting in batches (of 1000) with the insert method? I hope this makes it clear. See also: Mongoose (mongodb) batch insert?
If you are calling this from a Mongoose model, you need the .collection accessor:
var bulk = Model.collection.initializeOrderedBulkOp();
// examples
bulk.insert({ "a": 1 });
bulk.find({ "a": 1 }).updateOne({ "$set": { "a": 2 } });
bulk.execute(function(err,result) {
// result contains stats of the operations
});
You need to be "careful" when doing this though. Apart from not being bound to the same checks and validation that can be attached to mongoose schemas, when you call .collection you need to be "sure" that the connection to the database has already been made. Mongoose methods look after this for you, but once you use the underlying driver methods you are all on your own.
As for differences, it's all there in the naming:
Ordered: the batched instructions are executed in the same order they were added, one after the other and one at a time. If an error occurs at any point, execution of the batch is halted and the error response is returned. All operations up to that point are "committed"; there is no rollback.
Unordered: the batched operations can execute in "any" sequence, often in parallel. This can lead to faster updates, but of course cannot be used where one operation in the batch must happen before another (as in the example above). Any errors that occur are merely "reported" in the result, and the rest of the batch still completes as sent to the server.
Of course, the core difference of either type of execution from the standard methods is that the "whole batch" is sent to the server (split by the driver into batches of at most maxWriteBatchSize operations: 1000 on older servers, 100,000 since MongoDB 3.6) and you only get one response back. This saves network traffic and the wait for each individual .insert() or similar operation to complete.
As for whether a "promise" can be used: anything else with a callback that you can convert to return a promise follows the same rules here. Remember, though, that the "callback/promise" is on the .execute() method, and that what you get back follows the rules of what is returned from Bulk operation results.
For more information see "Bulk" in the core documentation.
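On the promise question: recent MongoDB Node.js drivers already return a Promise from `execute()` when no callback is passed (driver 5.x and later removed callbacks altogether), so Bluebird promisification should not be needed. A minimal sketch of the ordered example above in `async`/`await` style, assuming `Model` is a Mongoose model whose connection is already open:

```javascript
// Ordered bulk on the driver collection behind a Mongoose model.
// `Model` is assumed to be a Mongoose model with an open connection.
async function runOrderedBulk(Model) {
  const bulk = Model.collection.initializeOrderedBulkOp();
  bulk.insert({ a: 1 });
  // Ordered, so this runs after the insert and finds { a: 1 }.
  bulk.find({ a: 1 }).updateOne({ $set: { a: 2 } });

  // Resolves with the bulk result; on failure it rejects, and with an
  // ordered bulk any writes before the failing one stay applied.
  return bulk.execute();
}
```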

Mongoose: what's the differences between Model.create and Collection.insert

I want to do a batch insert job in MongoDB, and I found two ways to do it in Mongoose:
One way is to use insert:
const dataArr = [
  {
    id: "",
    name: ""
  },
  {
    id: "",
    name: ""
  }
];
Collection.insert(dataArr);
and another way is Model.create:
Model.create(dataArr)
Both can complete the batch insert job, but what's the difference between them?
Which one is more efficient?
In Mongoose, there is Model.create and Collection.insert (the latter isn't strictly part of Mongoose, but of the underlying MongoDB driver).
According to the Mongoose developer, they are basically the same when called with an array of documents, although looking at the code makes me think that there are subtle differences (warning: I haven't looked at the code that well so I might be mistaken about the following):
using Model.create will call any validators/hooks declared on your schema;
Model.create does a .save for each document in the array, resulting in N database calls (where N is the number of documents in the array); Collection.insert performs one large database call;
According to what I've read, Collection.insert is a function of the MongoDB driver; it's much faster when inserting large amounts of data (millions of documents or so), at the cost of bypassing Mongoose validation.
handle with care
They loosely mean the same thing. You can use either of them.
Checking the documentation for insertMany, it's apparently faster than create, as it sends only one operation to the server rather than one per document.
From the insertMany documentation, one thing to note is that insertMany does not trigger save middleware, while create does.
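If you settle on insertMany, the "batches of 1000" idea from the bulk question earlier might look like this minimal sketch. `Model` is any Mongoose model, and the chunk size is an arbitrary assumption, since the driver already splits oversized writes into server-sized batches on its own. Chunking mainly bounds how many documents and results each call holds in memory:

```javascript
// Insert a large array in fixed-size chunks, one insertMany call each.
// `Model` is assumed to be a Mongoose model; 1000 is an arbitrary default.
async function insertInChunks(Model, docs, chunkSize = 1000) {
  let inserted = 0;
  for (let i = 0; i < docs.length; i += chunkSize) {
    const chunk = docs.slice(i, i + chunkSize);
    // insertMany sends one operation per call and, unlike create(),
    // does not trigger save middleware. It resolves to the inserted docs.
    const saved = await Model.insertMany(chunk);
    inserted += saved.length;
  }
  return inserted;
}
```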
