Efficient multi-document upsert in mongo - node.js

I have a node.js app that updates a local mongo (3.0) data store from remote API, and I'm trying to make it as efficient as possible.
Every record in the collection has a unique remoteId property. After calling the API I get a set of records. Then I should update the local documents with new properties where ones with matching remoteId already exist, do inserts where they don't, and mark documents that exist locally but not in the remote data set as inactive.
My current solution is this (mongoose code, stripped out callbacks / promises for clarity, assume it runs synchronously):
timestamp = new Date
for item in remoteData
collection.findOneAndUpdate { remoteId: item.remoteId }, { updatedAt: timestamp, /* other properties */ }, { upsert: true }
collection.update { updatedAt: { $lt: timestamp} }, { active: false }, { multi: true }
Seems straightforward enough. But when dealing with tens of thousands of documents, it gets quite slow.
I looked at Bulk.upsert from mongo documentation, but that seems to work only when your document finding queries are static.
What could I do here?

Turns out I didn't fully grasp the mongo Bulk api - I had missed that it's basically an array of commands that gets sent to database when you call execute. In the end, this is what I had to do:
timestamp = new Date
bulkOp = collection.initializeUnorderedBulkOp()
for item in remoteData
bulkOp.find({ remoteId: item.remoteId }).upsert().updateOne { updatedAt: timestamp, /* other properties */ }
bulkOp.execute()
collection.update { updatedAt: { $lt: timestamp} }, { active: false }, { multi: true }
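With a more recent driver, the same batch can also be written as a list of operations handed to bulkWrite. The following is only a minimal sketch, not the poster's code: buildUpsertOps is a hypothetical helper, and wrapping the fields in $set is an assumption made here so that fields not present in the remote record are left untouched on existing documents.

```javascript
// Hypothetical helper: turn each remote record into one upsert operation.
// Assumes every record carries a unique remoteId, as in the question.
function buildUpsertOps(remoteData, timestamp) {
  return remoteData.map(function (item) {
    // Copy every property except remoteId into the $set document;
    // on insert, remoteId is taken from the filter itself.
    const fields = Object.assign({}, item, { updatedAt: timestamp });
    delete fields.remoteId;
    return {
      updateOne: {
        filter: { remoteId: item.remoteId },
        update: { $set: fields },
        upsert: true
      }
    };
  });
}

// Usage (not executed here; `collection` is assumed to be a connected driver collection):
//   const timestamp = new Date();
//   await collection.bulkWrite(buildUpsertOps(remoteData, timestamp), { ordered: false });
//   await collection.updateMany({ updatedAt: { $lt: timestamp } }, { $set: { active: false } });
```

All upserts travel to the server in batches instead of one round trip per record, which is where the speed-up over the loop of findOneAndUpdate calls comes from.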

Related

How to improve the performance of query in mongodb?

I have a collection in MongoDB with more than 5 million documents. Whenever I create a document in this collection, I have to check whether a document with the same title already exists; if it does, I must not add the new one to the database.
Example: here is my MongoDB document:
{
"_id":ObjectId("3a434sa3242424sdsdw"),
"title":"Lost in space",
"desc":"this is description"
}
Now whenever a new document is being created in the collection, I want to check whether the same title already exists in any of the documents, and add it to the database only if it does not exist.
Currently, I am using a findOne query to check for the title, and the document is added only if the title is not found. I am facing a performance issue with this: the process takes too much time. Please suggest a better approach.
async function addToDB(data){
let result= await db.collection('testCol').findOne({title:data.title});
if(result==null){
await db.collection('testCol').insertOne(data);
}else{
console.log("already exists in db");
}
}
You can reduce the network round-trip time, which is currently doubled because you execute two queries: one for the find and one for the insert. You can combine them into a single query as below.
db.collection.update(
<query>,
{ $setOnInsert: { <field1>: <value1>, ... } },
{ upsert: true }
)
If a matching document already exists, it is left unchanged.
db.test.update(
{"key1":"1"},
{ $setOnInsert: { "key":"2"} },
{ upsert: true }
)
It looks for a document where key1 is "1". If it finds one, it does nothing. If not, it inserts a new document built from the equality fields of the query plus the fields given in $setOnInsert.
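Applied to the addToDB function from the question, this could look roughly like the sketch below. It assumes `db` is a connected MongoDB driver Db instance passed in as a parameter (the question uses a global instead). Note also an assumption that goes beyond the answer: the lookup is only fast if title is indexed, and only a unique index on title rules out duplicates when two inserts race.

```javascript
// Sketch: insert `data` only when no document with the same title exists.
// `db` is assumed to be a connected MongoDB driver Db instance.
async function addToDB(db, data) {
  const result = await db.collection('testCol').updateOne(
    { title: data.title },   // match on the field that must stay unique
    { $setOnInsert: data },  // applied only when the upsert inserts
    { upsert: true }
  );
  // upsertedCount is 1 when a new document was inserted, 0 when one already existed
  if (result.upsertedCount === 0) {
    console.log('already exists in db');
  }
  return result.upsertedCount === 1;
}
```

The existence check and the insert now happen in one command on the server, so the application no longer waits for the findOne result before deciding what to do.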

How can I prevent Mongoose from allowing multiple updates of a document across parallel requests?

Given the schema:
{
widgets: [{
widget: {type: mongoose.Schema.Types.ObjectId, ref: 'widget'},
count: {type: Number}
}]
}
and an application route where:
1. User attempts to add a new widget into the widgets array of an existing document.
2. We check that the widget being added meets some requirements.
3. We add the widget to the document and save the document.
In this setup, if I have 2 parallel requests to perform the same action, both checks in number 2 pass and the document is saved with 2 copies of the widget, even though that's "illegal" in my business logic.
The async flow looks like:
Req1 hits route, loads existing doc
Req2 hits route, loads existing doc
Req1 checks doc conditions
Req2 checks doc conditions
Req1 embeds the new widget, saves doc
Req2 embeds the new widget, saves doc
I thought that document versioning (__v) would solve this for me as it has in the past, but apparently I never understood this to begin with because both requests are running sequentially and my debugger shows that the version of the document is X in both pre-save states, and X+1 in the post-save state. I don't understand why that doesn't throw a version error.
I think that this is an asynchronous problem to solve, not necessarily strictly Mongoose, and have tagged as such.
Edit: This works but seems remarkably verbose:
model
.findOneAndUpdate({
_id: doc._id,
__v: doc.__v
},
{
$push: {
widgets: {
widget: widget_id,
count: 1
}
},
$inc: {
__v: 1
}
},
function(err, doc) {
// ...
});
Also, it is unfortunate that I can't alter my existing doc and then run the save() method on it.
My searching found this bug where the versionKey didn't increment automatically when using this method. I suppose I really don't understand versionKey properly!
Have a look at this explanation of the versionKey property (if you haven't already):
http://aaronheckmann.tumblr.com/post/48943525537/mongoose-v3-part-1-versioning
The example in the article is similar to yours, except that it modifies existing array items (comments) while you push a new item onto the widgets array. As I understand it, if you use mongoose v3+ and call save(), mongoose handles the versionKey for you.
So if you do something like:
model.findById(doc._id, function(err, doc) {
// handle error
doc.widgets.push({ widget: widget_id, count: widget_count });
doc.save(callback);
});
then the save() operation should internally look like this:
doc.update(
{ _id: doc._id, __v: doc.__v },
{
$push: {
widgets: { widget: widget_id, count: widget_count }
},
$inc: {
__v: 1
}
}
);
So make sure you are using mongoose v3+ and calling push() followed by save(). Or is that what you were doing originally?
I haven't tested this, just wanted to share my search results and thoughts with you in case this can help somehow.
Alternatively, you can add a check on the widgets array to the update query itself:
model.findById(doc._id, function(err, doc) {
// handle error
// in the update query, match only if there is no such widget in the widgets array
model.update(
{ _id: doc._id, 'widgets.widget': { $nin: [widget_id] } },
{
$push: {
widgets: { widget: widget_id, count: widget_count }
}
},
function(err, result) {
// if no document was modified, the widget was already present
}
);
});
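The reason a check inside the query helps is that the match and the $push happen in a single server-side operation, so two parallel requests cannot both pass it. A minimal sketch of that idea follows; buildAddWidgetOnce is a hypothetical helper name, and $ne is used here as an equivalent of $nin with a single value.

```javascript
// Hypothetical helper: build the arguments for an atomic "add this widget once" update.
function buildAddWidgetOnce(docId, widgetId, count) {
  return {
    // matches only while no array element references this widget
    filter: { _id: docId, 'widgets.widget': { $ne: widgetId } },
    update: { $push: { widgets: { widget: widgetId, count: count } } }
  };
}

// Usage (not executed here; `model` is assumed to be the Mongoose model from the question):
//   const { filter, update } = buildAddWidgetOnce(doc._id, widget_id, 1);
//   const result = await model.updateOne(filter, update);
//   // a result reporting zero modified documents means the widget was already added
```

Whichever request reaches the server second finds that the filter no longer matches, so the duplicate is never written.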

Easy way to only allow one item per user in mongoose schema array

I'm trying to implement a rating system and I'm struggling to only allow one rating per user in a reasonable way.
Simply put, i have an array of ratings in my schema, containing the "rater" and the rating, as such:
var schema = new Schema({
//...
ratings: [{
by: {
type: Schema.Types.ObjectId
},
rating: {
type: Number,
min: 1,
max: 5,
validate: ratingValidator
}
}],
//...
});
var Model = mongoose.model('Model', schema);
When I get a request, I want to add the user's rating to the array if the user has not already rated this document; otherwise I want to update the existing rating (a user should not be able to give more than one rating).
One way to do this is to find the document, "loop through" the array of ratings and search for the user. If the user has got already a rating in the array, the rating is changed, otherwise a new rating is pushed. As such:
Model.findById(id)
.select('ratings')
.exec(function(err, doc) {
if(err) return next(err);
if(doc) {
var rated = false;
var ratings = doc.ratings;
for(var i = 0; i < ratings.length; i++) {
// compare as strings: "by" is an ObjectId, so === against user.id would not match
if(String(ratings[i].by) === String(user.id)) {
ratings[i].rating = rating;
rated = true;
break;
}
}
if(!rated) {
ratings.push({
by: user.id,
rating: rating
});
}
doc.markModified('ratings');
doc.save();
} else {
//Not found
}
});
Is there an easier way? A way to let mongodb do this automatically?
The mongodb $addToSet operator could be an alternative, however i have not managed to use it for this, since that could allow two ratings with different scores from the same user.
As you note, the $addToSet operator will not work in this case, since a userId with a different vote value would be a different value and therefore its own unique member of the set.
So the best way to do this is to actually issue two update statements with complementary logic. Only one will actually be applied depending on the state of the document:
async.series(
[
// Try to update a matching element
function(callback) {
Model.update(
{ "_id": id, "ratings.by": user.id },
{ "$set": { "ratings.$.rating": rating } },
callback
);
},
// Add the element where it does not exist
function(callback) {
Model.update(
{ "_id": id, "ratings.by": { "$ne": user.id } },
{ "$push": { "ratings": { "by": user.id, "rating": rating } }},
callback
);
}
],
function(err,result) {
// all done
}
);
The principle is simple. The first update tries to match the userId in the document's ratings array and updates that entry; if the condition is not met, no document is updated. The second update tries to match the document only where that userId is absent from the ratings array; if it matches, the element is added, otherwise nothing is updated.
This does bypass the built-in schema validation of mongoose, so you would have to apply your constraints manually (or inspect the schema validation rules and apply them manually), but it is better than your current approach in one very important aspect.
When you .find() the document and bring it back to your client application to modify in code, as you are doing, there is no guarantee that the document has not been changed on the server by another process or request. So when you issue .save(), the document on the server may no longer be in the state it was in when it was read, and your modifications can overwrite the changes made there.
Hence while there are two operations to the server and not one ( and your current code is two operations anyway ), it is the lesser of two evils to manually validate than to possibly cause a data inconsistency. The two update approach will respect any other updates issued to the document possibly occurring at the same time.
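The same two-update approach can be sketched with promises instead of async.series. This is an illustration under assumptions, not a drop-in replacement: setRating is a hypothetical name, and `Model` is assumed to be a Mongoose model that exposes updateOne().

```javascript
// Sketch: set or add a user's rating with two complementary updates.
// `Model` is assumed to be a Mongoose model exposing updateOne().
async function setRating(Model, id, userId, rating) {
  // 1. Overwrite the rating if this user already has an entry.
  await Model.updateOne(
    { _id: id, 'ratings.by': userId },
    { $set: { 'ratings.$.rating': rating } }
  );
  // 2. Otherwise append a new entry; this matches only when the user has none.
  await Model.updateOne(
    { _id: id, 'ratings.by': { $ne: userId } },
    { $push: { ratings: { by: userId, rating: rating } } }
  );
}
```

If the first update applies, the second filter no longer matches, so at most one of the two statements changes the document.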

Updating array within mongodb record with mongoose

What is the best way to update a value within an array saved in a mongodb record? Currently, I'm trying it this way:
Record.find({ 'owner': owner}, {}, {sort: { date: -1 }}, function(err, record){
if(!err){
for (var i = 0; i < record[0].array.length; i++){
record[0].array[i].score = 0;
record[0].array[i].changed = true;
record[0].save();
}
}
});
And the schema looks like this:
var recordSchema = mongoose.Schema({
owner: {type: String},
date: {type: Date, default: Date.now},
array: mongoose.Schema.Types.Mixed
});
Right now, I can see that the array updates, I get no error in saving, but when I query the database again, the array hasn't been updated.
It would help if you explained your intent here as naming a property "array" conveys nothing about its purpose. I guess from your code you hope to go and set the score of each item there to zero. Note your save is currently being ignored because you can only save top-level mongoose documents, not nested documents.
Certain find-and-modify operations on arrays can be done with a single database command using the array update operators like $push, $addToSet, etc. However, I don't see any operator that can directly make your desired change in a single operation. Thus I think you need to find your record, alter the array data, and save it. (Note that findOne is a convenience function you can use if you only care about the first match, which seems to be the case for you.)
Record.findOne({ 'owner': owner}, {}, {sort: { date: -1 }}, function(err, record){
if (err) {
//don't just ignore this, log or bubble forward via callbacks
return;
}
if (!record) {
//Record not found, log or send 404 or whatever
return;
}
record.array.forEach(function (item) {
item.score = 0;
item.changed = true;
});
//Now, mongoose can't automatically detect that you've changed the contents of
//record.array, so tell it
//see http://mongoosejs.com/docs/api.html#document_Document-markModified
record.markModified('array');
record.save();
});
If you have a mongoose document object, you can of course update the array as in the question, with the following caveat.
This is in fact a mongoose gotcha: mongoose cannot track changes inside a Mixed-type field, so you have to call markModified:
doc.mixed.type = 'changed';
doc.markModified('mixed.type');
doc.save() // changes to mixed.type are now persisted
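On MongoDB 3.6 or later, the all-positional operator $[] can update every element of an array in one command, an option that appears to postdate these answers. The following is a minimal sketch under stated assumptions: the server and the Mongoose version in use both support $[], the elements of array are objects, and resetLatestScores is a hypothetical name.

```javascript
// Sketch: zero the score of every array element in the owner's most recent record.
// `Record` is assumed to be the Mongoose model from the question.
async function resetLatestScores(Record, owner) {
  return Record.findOneAndUpdate(
    { owner: owner },
    { $set: { 'array.$[].score': 0, 'array.$[].changed': true } },
    { sort: { date: -1 } } // only the most recent record, as in the question
  );
}
```

Because the change is applied on the server, there is no document to load, no markModified call, and no window in which another request can overwrite the array between the read and the save.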

Mongoose Changing Schema Format

We're rapidly developing an application that's using Mongoose, and our schemas are changing often. I can't seem to figure out the proper way to update a schema for existing documents without blowing them away and completely re-creating them from scratch.
I came across http://mongoosejs.com/docs/api.html#schema_Schema-add, which looks to be right. There's little to no documentation on how to actually implement this, making it very hard for someone who is new to MongoDB.
I simply want to add a new field called enabled. My schema definition is:
var sweepstakesSchema = new Schema({
client_id: {
type: Schema.Types.ObjectId,
ref: 'Client',
index: true
},
name: {
type: String,
default: 'Sweepstakes',
},
design: {
images: {
type: [],
default: []
},
elements: {
type: [],
default: []
}
},
enabled: {
type: Boolean,
default: false
},
schedule: {
start: {
type: Date,
default: Date.now
},
end: {
type: Date,
default: Date.now
}
},
submissions: {
type: Number,
default: 0
}
});
Assuming your Mongoose model is named sweepstakesModel,
this code adds the enabled field with the boolean value false to all pre-existing documents in your collection:
db.sweepstakesModel.find( { enabled : { $exists : false } } ).forEach(
function (doc) {
doc.enabled = false;
db.sweepstakesModel.save(doc);
}
)
There's nothing built into Mongoose regarding migrating existing documents to comply with a schema change. You need to do that in your own code, as needed. In a case like the new enabled field, it's probably cleanest to write your code so that it treats a missing enabled field as if it was set to false so you don't have to touch the existing docs.
As far as the schema change itself, you just update your Schema definition as you've shown, but changes like new fields with default values will only affect new documents going forward.
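Treating a missing field as false can be as small as one helper that all reads go through. A minimal sketch, where isEnabled is a hypothetical name and the documents are assumed to be plain objects or Mongoose documents with an optional enabled property:

```javascript
// Hypothetical helper: treat a missing `enabled` field as false,
// so documents saved before the schema change need no migration.
function isEnabled(doc) {
  return doc.enabled === true;
}
```

With this in place, old documents without the field and new documents with enabled set to false behave identically.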
I was also searching for something like migrations, but didn't find it. As an alternative you could use defaults. If a key has a default and the key doesn't exist, it will use the default.
Mongoose Defaults
Default values are applied when the document skeleton is constructed. This means that if you create a new document (new MyModel) or if you find an existing document (MyModel.findById), both will have defaults provided that a certain key is missing.
I had the exact same issue, and found that using findOneAndUpdate() rather than calling save allowed us to update the schema file, without having to delete all the old documents first.
I can post a code snippet if requested.
You might use mongo shell to update the existing documents in a specific collection
db.SweeptakesModel.update({}, {$set: {"enabled": false}}, {upsert:false, multi:true})
I had a similar requirement to add to an existing schema when building an app with Node, and this (long ago posted) question was the only help I found.
I added the new field to the original schema definition and then ran something similar to the following line, just once, to update the existing records:
myModelObject.updateMany( { enabled : { $exists : false } }, { enabled : false } )
'updateMany' being the function I wanted to mention here.
Just an addition to what Vickar suggested; here is a Mongoose example written in JavaScript (Node.js):
const mongoose = require('mongoose');
const SweeptakesModel = mongoose.model(Constants.SWEEPTAKES,sweepstakesSchema);
SweeptakesModel.find( { enabled : { $exists : false } }).then(
function(docs){
// find() resolves to an array, so update and save each document
docs.forEach(function(doc){
doc.enabled = false;
doc.save();
});
}
)

Resources