"Race like" condition with Mongoose - node.js

I have a process that triggers a number of requests that in turn trigger off a number of webhooks. I know the process is complete when I've received all of my webhooks. My model looks something like this.
{
name: 'name',
status: 'PENDING',
children: [{
name: 'child1',
status: 'PENDING'
}, {
name: 'child2',
status: 'PENDING'
}]
}
The idea is as the webhooks come in, I update the subdoc to COMPLETE. At the end of each update I check if the others are complete and if they are, I set status='COMPLETE' on the parent. It looks like one invocation of the service is marking it as COMPLETE after another invocation has determined it was still PENDING but before that second invocation has saved. When the second one saves, it overwrites COMPLETE with PENDING on the parent.
Here is the code from my schema:
methods: {
doUpdate: function(data) {
var child = this.children.id(data.childId);
child.status = 'COMPLETE';
if (_.every(this.children.status, 'COMPLETE'))
this.status = 'COMPLETE';
return this.save();
}
}

I think you can solve your issue when you just modify and save your child objects instead of saving / overwriting the whole document each time.
To do that you can use the positional $ in your update statement.
Essentially it would look something like this:
db.yourChildrenCollection.update(
{ _id: this._id, 'children.name': childName },
{ $set: { 'children.$.status' : 'COMPLETE' } }
)
You have to modify the variable names as I do not know your collection name etc. but I think you get the point.

The solution is to first find and update the status in one operation, using a variant of mongo's findAndModify method, and then check if the other children have completed. I was trying to do two updates in one operation. By breaking it up into two steps, I know that when I check the statuses of the other children, all other invocations will have the most recent state of the document and not get a false reading while another invocation is waiting for save() to complete.

Related

mongoose.watch() fields' name changing by itself

I begin with mongoose and I have to use watch() method on a collection.
When i want to catch insert, there are no problems.
Nevertheless, when I want to retrieve the changes of an update, I don't know why, in some cases, mongoose changes the name of my fields?
registration.watch(). on('change', data => {
if(data.operationType == "update") {
console.log(data.updateDescription.updatedFields);
}
)};
my registration's collection is made up of persons who can accept or decline an invitation, and a person can change they answer. So it's basically a removal of the person from one array of data to be put in the other one.
The only problem I have is my array's name sometimes "change" :
{
__v: 100,
accepted: [
{
_id: 5faa76d048dd6e0017e631d4,
user: 5faa752848dd6e0017e631d2
},
{
_id: 5faa9ab06048a20017774610,
user: 5fa8fabc60260ec31606d71e
},
],
'declined.1': { _id: 5faf037a141f030017863484, user: 5faa74de48dd6e0017e631d0 },
for example here, my field declined change to "declined.1", why it's happening ? and how to avoid this ? or at least, how can i get declined's array in this situation ?
When you update a document in MongoDB, it only writes the deltas to the operations log, which is what the watch function pulls from.
The dot notation declined.1 means index 1 of the declined array. The change document you provided would be expected from pushing a new object onto the declined array. Essentially, it is saving space by not repeating all of the array elements that didn't change.
If you need to retrieve the entire document, you could set the fullDocument to updateLookup. See http://mongodb.github.io/node-mongodb-native/3.0/api/Collection.html#watch

Upsert and $inc Sub-document in Array

The following schema is intended to record total views and views for a very specific day only.
const usersSchema = new Schema({
totalProductsViews: {type: Number, default: 0},
productsViewsStatistics: [{
day: {type: String, default: new Date().toISOString().slice(0, 10), unique: true},
count: {type: Number, default: 0}
}],
});
So today views will be stored in another subdocument different from yesterday. To implement this I tried to use upsert so as subdocument will be created each day when product is viewed and counts will be incremented and recorded based on a particular day. I tried to use the following function but seems not to work the way I intended.
usersSchema.statics.increaseProductsViews = async function (id) {
//Based on day only.
const todayDate = new Date().toISOString().slice(0, 10);
const result = await this.findByIdAndUpdate(id, {
$inc: {
totalProductsViews: 1,
'productsViewsStatistics.$[sub].count': 1
},
},
{
upsert: true,
arrayFilters: [{'sub.day': todayDate}],
new: true
});
console.log(result);
return result;
};
What do I miss to get the functionality I want? Any help will be appreciated.
What you are trying to do here actually requires you to understand some concepts you may not have grasped yet. The two primary ones being:
You cannot use any positional update as part of an upsert since it requires data to be present
Adding items into arrays mixed with "upsert" is generally a problem that you cannot do in a single statement.
It's a little unclear if "upsert" is your actual intention anyway or if you just presumed that was what you had to add in order to get your statement to work. It does complicate things if that is your intent, even if it's unlikely give the finByIdAndUpdate() usage which would imply you were actually expecting the "document" to be always present.
At any rate, it's clear you actually expect to "Update the array element when found, OR insert a new array element where not found". This is actually a two write process, and three when you consider the "upsert" case as well.
For this, you actually need to invoke the statements via bulkWrite():
usersSchema.statics.increaseProductsViews = async function (_id) {
//Based on day only.
const todayDate = new Date().toISOString().slice(0, 10);
await this.bulkWrite([
// Try to match an existing element and update it ( do NOT upsert )
{
"updateOne": {
"filter": { _id, "productViewStatistics.day": todayDate },
"update": {
"$inc": {
"totalProductsViews": 1,
"productViewStatistics.$.count": 1
}
}
}
},
// Try to $push where the element is not there but document is - ( do NOT upsert )
{
"updateOne": {
"filter": { _id, "productViewStatistics.day": { "$ne": todayDate } },
"update": {
"$inc": { "totalProductViews": 1 },
"$push": { "productViewStatistics": { "day": todayDate, "count": 1 } }
}
}
},
// Finally attempt upsert where the "document" was not there at all,
// only if you actually mean it - so optional
{
"updateOne": {
"filter": { _id },
"update": {
"$setOnInsert": {
"totalProductViews": 1,
"productViewStatistics": [{ "day": todayDate, "count": 1 }]
}
}
}
])
// return the modified document if you really must
return this.findById(_id); // Not atomic, but the lesser of all evils
}
So there's a real good reason here why the positional filtered [<identifier>] operator does not apply here. The main good reason is the intended purpose is to update multiple matching array elements, and you only ever want to update one. This actually has a specific operator in the positional $ operator which does exactly that. It's condition however must be included within the query predicate ( "filter" property in UpdateOne statements ) just as demonstrated in the first two statements of the bulkWrite() above.
So the main problems with using positional filtered [<identifier>] are that just as the first two statements show, you cannot actually alternate between the $inc or $push as would depend on if the document actually contained an array entry for the day. All that will happen is at best no update will be applied when the current day is not matched by the expression in arrayFilters.
The at worst case is an actual "upsert" will throw an error due to MongoDB not being able to decipher the "path name" from the statement, and of course you simply cannot $inc something that does not exist as a "new" array element. That needs a $push.
That leaves you with the mechanic that you also cannot do both the $inc and $push within a single statement. MongoDB will error that you are attempting to "modify the same path" as an illegal operation. Much the same applies to $setOnInsert since whilst that operator only applies to "upsert" operations, it does not preclude the other operations from happening.
Thus the logical steps fall back to what the comments in the code also describe:
Attempt to match where the document contains an existing array element, then update that element. Using $inc in this case
Attempt to match where the document exists but the array element is not present and then $push a new element for the given day with the default count, updating other elements appropriately
IF you actually did intend to upsert documents ( not array elements, because that's the above steps ) then finally actually attempt an upsert creating new properties including a new array.
Finally there is the issue of the bulkWrite(). Whilst this is a single request to the server with a single response, it still is effectively three ( or two if that's all you need ) operations. There is no way around that and it is better than issuing chained separate requests using findByIdAndUpdate() or even updateOne().
Of course the main operational difference from the perspective of code you attempted to implement is that method does not return the modified document. There is no way to get a "document response" from any "Bulk" operation at all.
As such the actual "bulk" process will only ever modify a document with one of the three statements submitted based on the presented logic and most importantly the order of those statements, which is important. But if you actually wanted to "return the document" after modification then the only way to do that is with a separate request to fetch the document.
The only caveat here is that there is the small possibility that other modifications could have occurred to the document other than the "array upsert" since the read and update are separated. There really is no way around that, without possibly "chaining" three separate requests to the server and then deciding which "response document" actually applied the update you wanted to achieve.
So with that context it's generally considered the lesser of evils to do the read separately. It's not ideal, but it's the best option available from a bad bunch.
As a final note, I would strongly suggest actually storing the the day property as a BSON Date instead of as a string. It actually takes less bytes to store and is far more useful in that form. As such the following constructor is probably the clearest and least hacky:
const todayDate = new Date(new Date().setUTCHours(0,0,0,0))

MongoDB - two updates in sequence overlap each other

We are building size calculation mechanism for our system.
In order to calculate sizes, we start with the first atomic operation - findAndModify - to find the object and add lock properties to it (to prevent another calculations for this object to interact with it and wait till the end, as we could have many parallel calculations - in this case others should be postponed), then we calculate size of specific properties and after this operation - we add metadata to object and delete locks.
However, it seems that sometimes, when we have a lot of multiple calculations for single object (especially when we calculate a lot of objects in parallel), some updates aren't executed.
_size metadata during calculation looks like this:
{
_lockedAt: SomeDate,
_transactionId: 'abc'
}
And after calculation it should look like this:
{
somePropertySize: 123,
anotherPropertySize: 1245,
(...)
_total: 131431523 // Some number
// Notice that both _lockedAt and _transactionId should be missing
}
And this is how our update flow looks like:
return Promise.coroutine(function * () {
yield object.findOneAndUpdate({
'_id': gemId,
'_size._lockedAt': {
$exists: false
}
}, {
$set: {
'_size._lockedAt': moment.utc().toDate(),
'_size._transactionId': transactionId
}
}).then(results => results.value);
// Calculations are performed here, new _size object is built
yield object.findOneAndUpdate({
_id: gemId,
_lockedAt: {
$exists: true // We tried both with and without this property, does not change anything
}
}, {
$set: {
_size: newSizeObject
}
});
})()
Exemplary real-life object JUST before second update (truncated for brevity):
{
title: 11,
description: 2,
detailedSection: 0,
tags: 2
file: 5625898,
_total: 5625913
}
For some reason, when we have multiple calculations next to each other, sometimes (for new objects, without _size property at all), the objects stay with _size object looking exactly as after locking, despite the fact logs show us that everything went well (calculations were complete, new sizes object was calculated and second DB update was called).
We use MongoDB 3.0, two replicaSets. Any ideas on what is happening?
Put the second update after the then so it will wait until the promise resolves:
object.findOneAndUpdate({
'_id': gemId,
'_size._lockedAt': {
$exists: false
}
}, {
$set: {
'_size._lockedAt': moment.utc().toDate(),
'_size._transactionId': transactionId
}
}).then(results => {
// Calculations are performed here, new _size object is built
object.findOneAndUpdate({
_id: gemId,
_lockedAt: {
$exists: true // We tried both with and without this property, does not change anything
}
}, {
$set: {
_size: newSizeObject
}
});
}).catch(err => console.error);
Also make sure you have error handling for your promises using catch.
If you don't really need the lock or transaction fields then I would remove that stuff. If you do need them, something like RethinkDB may work a little better, or PostgresSQL could give real transactions.
All in all, I checked the code very carefully and what was happening in reality, was the fact that completely different part of the code was querying the object from the DB and then, after a few other operations (mine included), it wrote the object to the DB (hence, overwriting my changes).
So, important note for every MongoDB user - please do remember that MongoDB is not transactional, but still atomic, which means that it guarantees that your operation will be persisted, but does not guarantee that data between operations will be persisted.
To sum up, things I learned by this example:
NEVER update whole object in the database with the data obtained from it some time before (e.g. by querying, changing some properties and saving again)
USE $set, $inc, $unset and other special operators. If you have a lot of parameters, use e.g. mongo-dot-notation npm library to flatten your data into $set selector.
If something unexpected is happening with your data (e.g. missing properties after saving) the first thing to investigate is another pending operations on those specific entities
The least probable cause of your problems is MongoDB itself. It's usually code that does not follow atomicity rules (which happens probably with a lot of people used to transactional DBs :)).

Mongodb, incrementing value inside an array. (save() ? update() ?)

var Poll = mongoose.model('Poll', {
title: String,
votes: {
type: Array,
'default' : []
}
});
I have the above schema for my simple poll, and I am uncertain of the best method to change the value of the elements in my votes array.
app.put('/api/polls/:poll_id', function(req, res){
Poll.findById(req.params.poll_id, function(err, poll){
// I see the official website of mongodb use something like
// db.collection.update()
// but that doesn't apply here right? I have direct access to the "poll" object here.
Can I do something like
poll.votes[1] = poll.votes[1] + 1;
poll.save() ?
Helps much appreciated.
});
});
You can to the code as you have above, but of course this involves "retrieving" the document from the server, then making the modification and saving it back.
If you have a lot of concurrent operations doing this, then your results are not going to be consistent, as there is a high potential for "overwriting" the work of another operation that is trying to modify the same content. So your increments can go out of "sync" here.
A better approach is to use the standard .update() type of operations. These will make a single request to the server and modify the document. Even returning the modified document as would be the case with .findByIdAndUpdate():
Poll.findByIdAndUpdate(req.params.poll_id,
{ "$inc": { "votes.1": 1 } },
function(err,doc) {
}
);
So the $inc update operator does the work of modifying the array at the specified position using "dot notation". The operation is atomic, so no other operation can modify at the same time and if there was something issued just before then the result would be correctly incremented by that operation and then also by this one, returning the correct data in the result document.

Mongoose Aggregate doesn't work without find() first

Not sure exactly what's going on, but I have a Model#aggregate() call that works when it's preceded by a Model#find(), but not otherwise. This is the code I'm using (with dummy properties, Thing being my model object):
var query = { my_ids: { $in: _.pluck(this.related_ids, 'my_ids') } };
// This Thing.find is never executed and shouldn't ever be needed.
Thing.find(query);
Thing.aggregate([
{ $match: query },
{ $group: { _id: null, "amount": { "$sum": "$amount" } } },
]).exec(function(error, result) {
// blah blah blah
});
When run as-is, it works as expected - the callback's result is [{ _id: null, amount: <some number> }] as expected. However, if the Thing.find(query); is commented out, it is just a blank array [].
I know Thing.find() is returning a Query object and might be setting some state things in the background, which may be letting the aggregate() call finish, but it also doesn't work when the query is executed (which I'd assume would reset those state variables). I'm cool with leaving the Thing.find(query) call in to make it work for now, but it does make my eye twitch a little bit. Any thoughts?
Found it, bit of a silly solution, but it's good to have answers to similar problems up on SO.
The problem: race conditions. The second I saw "this thing seems to affect an unrelated thing" while using Node.js, that's where my mind should have gone. This code happens immediately after some code where the objects are populated in the database (this is in a test), and there were some validation errors happening that weren't being caught. In any case, the objects weren't done being saved to the database, so with the slight delay of the Thing.find() call, it was delaying the aggregation until they were saved.

Resources