Mongoose Aggregate doesn't work without find() first - node.js

Not sure exactly what's going on, but I have a Model#aggregate() call that works when it's preceded by a Model#find(), but not otherwise. This is the code I'm using (with dummy properties, Thing being my model object):
var query = { my_ids: { $in: _.pluck(this.related_ids, 'my_ids') } };

// This Thing.find is never executed and shouldn't ever be needed.
Thing.find(query);

Thing.aggregate([
    { $match: query },
    { $group: { _id: null, "amount": { "$sum": "$amount" } } }
]).exec(function (error, result) {
    // blah blah blah
});
When run as-is, it works: the callback's result is [{ _id: null, amount: <some number> }] as expected. However, if the Thing.find(query); line is commented out, the result is just an empty array [].
I know Thing.find() returns a Query object and might be setting some state in the background that lets the aggregate() call finish, but it also doesn't work when the query is actually executed (which I'd assume would reset that state). I'm fine with leaving the Thing.find(query) call in to make it work for now, but it does make my eye twitch a little. Any thoughts?

Found it. It's a bit of a silly solution, but it's good to have answers to similar problems up on SO.
The problem: a race condition. The second I saw "this thing seems to affect an unrelated thing" in Node.js, that's where my mind should have gone. This code runs immediately after code that populates the objects in the database (this is in a test), and there were some validation errors happening that weren't being caught. The objects weren't finished saving to the database, so the slight delay introduced by the Thing.find() call postponed the aggregation just long enough for the saves to complete.
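A minimal sketch of the fix, assuming a mocha-style test, the async library, and a things array holding the seeded documents (none of those names are from the original code). Waiting for every save() to finish before aggregating removes the race entirely:

before(function (done) {
    // Persist every seed document before any test runs
    async.each(things, function (thing, cb) {
        thing.save(cb);
    }, done);
});

it('sums the amounts', function (done) {
    Thing.aggregate([
        { $match: query },
        { $group: { _id: null, amount: { $sum: '$amount' } } }
    ]).exec(function (error, result) {
        // All documents are saved by now, so result includes every amount
        done(error);
    });
});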

Related

Upsert and $inc Sub-document in Array

The following schema is intended to record total views and views for a very specific day only.
const usersSchema = new Schema({
    totalProductsViews: { type: Number, default: 0 },
    productsViewsStatistics: [{
        day: { type: String, default: new Date().toISOString().slice(0, 10), unique: true },
        count: { type: Number, default: 0 }
    }],
});
So today's views will be stored in a different subdocument from yesterday's. To implement this I tried to use upsert, so that a subdocument is created for each day a product is viewed and the count for that particular day is incremented. I tried the following function, but it does not work the way I intended.
usersSchema.statics.increaseProductsViews = async function (id) {
    // Based on day only.
    const todayDate = new Date().toISOString().slice(0, 10);
    const result = await this.findByIdAndUpdate(id, {
        $inc: {
            totalProductsViews: 1,
            'productsViewsStatistics.$[sub].count': 1
        },
    },
    {
        upsert: true,
        arrayFilters: [{ 'sub.day': todayDate }],
        new: true
    });
    console.log(result);
    return result;
};
What am I missing to get the functionality I want? Any help will be appreciated.
What you are trying to do here actually requires you to understand some concepts you may not have grasped yet. The two primary ones being:
You cannot use any positional update as part of an upsert since it requires data to be present
Adding items into arrays mixed with "upsert" is generally a problem that you cannot solve in a single statement.
It's a little unclear if "upsert" is your actual intention anyway, or if you just presumed that was what you had to add in order to get your statement to work. It does complicate things if that is your intent, even though that seems unlikely given the findByIdAndUpdate() usage, which implies you actually expect the "document" to always be present.
At any rate, it's clear you actually expect to "update the array element when found, OR insert a new array element when not found". This is actually a two-write process, and three writes when you consider the "upsert" case as well.
For this, you actually need to invoke the statements via bulkWrite():
usersSchema.statics.increaseProductsViews = async function (_id) {
    // Based on day only.
    const todayDate = new Date().toISOString().slice(0, 10);
    await this.bulkWrite([
        // Try to match an existing element and update it ( do NOT upsert )
        {
            "updateOne": {
                "filter": { _id, "productsViewsStatistics.day": todayDate },
                "update": {
                    "$inc": {
                        "totalProductsViews": 1,
                        "productsViewsStatistics.$.count": 1
                    }
                }
            }
        },
        // Try to $push where the element is not there but the document is ( do NOT upsert )
        {
            "updateOne": {
                "filter": { _id, "productsViewsStatistics.day": { "$ne": todayDate } },
                "update": {
                    "$inc": { "totalProductsViews": 1 },
                    "$push": { "productsViewsStatistics": { "day": todayDate, "count": 1 } }
                }
            }
        },
        // Finally attempt upsert where the "document" was not there at all,
        // only if you actually mean it - so optional
        {
            "updateOne": {
                "filter": { _id },
                "update": {
                    "$setOnInsert": {
                        "totalProductsViews": 1,
                        "productsViewsStatistics": [{ "day": todayDate, "count": 1 }]
                    }
                },
                "upsert": true
            }
        }
    ]);

    // return the modified document if you really must
    return this.findById(_id); // Not atomic, but the lesser of all evils
};
There's a very good reason why the filtered positional [<identifier>] operator does not apply here: its intended purpose is to update multiple matching array elements, and you only ever want to update one. For that there is a specific operator, the positional $ operator, which does exactly that. Its condition, however, must be included within the query predicate ( the "filter" property in the updateOne statements ), just as demonstrated in the first two statements of the bulkWrite() above.
So the main problem with the filtered positional [<identifier>] is that, as the first two statements show, you cannot alternate between $inc and $push depending on whether the document actually contains an array entry for the day. At best, no update is applied when the current day is not matched by the expression in arrayFilters.
At worst, an actual "upsert" throws an error because MongoDB cannot decipher the "path name" from the statement, and of course you simply cannot $inc something that does not exist as a "new" array element. That requires a $push.
That leaves you with the mechanic that you also cannot do both the $inc and the $push within a single statement: MongoDB errors that you are attempting to "modify the same path", which is an illegal operation. Much the same applies to $setOnInsert, since whilst that operator only applies to "upsert" operations, it does not preclude the other operations from happening.
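For illustration, a hypothetical single statement that attempts both operators on the same array path; MongoDB rejects it server-side with a path-conflict error (exact wording varies by version):

// NOT valid - $inc and $push both target productsViewsStatistics,
// which MongoDB treats as modifying the same path twice
this.updateOne(
    { _id },
    {
        "$inc": { "productsViewsStatistics.$[sub].count": 1 },
        "$push": { "productsViewsStatistics": { "day": todayDate, "count": 1 } }
    },
    { "arrayFilters": [{ "sub.day": todayDate }] }
)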
Thus the logical steps fall back to what the comments in the code also describe:
Attempt to match where the document contains an existing array element, then update that element. Using $inc in this case
Attempt to match where the document exists but the array element is not present and then $push a new element for the given day with the default count, updating other elements appropriately
IF you actually did intend to upsert documents ( not array elements, because that's the above steps ) then finally actually attempt an upsert creating new properties including a new array.
Finally there is the issue of bulkWrite() itself. Whilst this is a single request to the server with a single response, it is still effectively three ( or two, if that's all you need ) operations. There is no way around that, and it is better than issuing chained separate requests using findByIdAndUpdate() or updateOne().
Of course, the main operational difference from the code you attempted to implement is that bulkWrite() does not return the modified document. There is no way to get a "document response" from any "Bulk" operation at all.
As such, the "bulk" process will only ever modify the document with one of the three statements submitted, based on the presented logic and, most importantly, the order of those statements. But if you actually want to "return the document" after modification, then the only way to do that is with a separate request to fetch it.
The only caveat here is that there is the small possibility that other modifications could have occurred to the document other than the "array upsert" since the read and update are separated. There really is no way around that, without possibly "chaining" three separate requests to the server and then deciding which "response document" actually applied the update you wanted to achieve.
So with that context it's generally considered the lesser of evils to do the read separately. It's not ideal, but it's the best option available from a bad bunch.
As a final note, I would strongly suggest storing the day property as a BSON Date instead of as a string. It actually takes fewer bytes to store and is far more useful in that form. For that, the following constructor is probably the clearest and least hacky:
const todayDate = new Date(new Date().setUTCHours(0,0,0,0))
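A sketch of what the schema field might look like with that change (a hypothetical adaptation of the question's schema, not code from the original). Note the default is a function, so it is evaluated per document rather than once at schema definition:

productsViewsStatistics: [{
    day: {
        type: Date,
        // Midnight UTC for "today", computed when the subdocument is created
        default: () => new Date(new Date().setUTCHours(0, 0, 0, 0))
    },
    count: { type: Number, default: 0 }
}]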

MongoDB - two updates in sequence overlap each other

We are building a size-calculation mechanism for our system.
To calculate sizes, we start with the first atomic operation, findAndModify, to find the object and add lock properties to it. The lock prevents other calculations for this object from interfering; since many calculations can run in parallel, the others should wait until this one finishes. We then calculate the sizes of specific properties, and after this operation we add metadata to the object and delete the locks.
However, it seems that sometimes, when we have a lot of multiple calculations for single object (especially when we calculate a lot of objects in parallel), some updates aren't executed.
_size metadata during calculation looks like this:
{
    _lockedAt: SomeDate,
    _transactionId: 'abc'
}
And after calculation it should look like this:
{
    somePropertySize: 123,
    anotherPropertySize: 1245,
    (...)
    _total: 131431523 // Some number
    // Notice that both _lockedAt and _transactionId should be missing
}
And this is how our update flow looks like:
return Promise.coroutine(function* () {
    yield object.findOneAndUpdate({
        '_id': gemId,
        '_size._lockedAt': {
            $exists: false
        }
    }, {
        $set: {
            '_size._lockedAt': moment.utc().toDate(),
            '_size._transactionId': transactionId
        }
    }).then(results => results.value);

    // Calculations are performed here, new _size object is built

    yield object.findOneAndUpdate({
        _id: gemId,
        _lockedAt: {
            $exists: true // We tried both with and without this property, does not change anything
        }
    }, {
        $set: {
            _size: newSizeObject
        }
    });
})()
An exemplary real-life object JUST before the second update (truncated for brevity):
{
    title: 11,
    description: 2,
    detailedSection: 0,
    tags: 2,
    file: 5625898,
    _total: 5625913
}
For some reason, when we have multiple calculations running close together, sometimes (for new objects without a _size property at all) the object is left with a _size that looks exactly as it does after locking, despite the fact that logs show everything went well (calculations completed, the new sizes object was built, and the second DB update was called).
We use MongoDB 3.0 with two replica sets. Any ideas on what is happening?
Put the second update inside the then callback so it waits until the first promise resolves:
object.findOneAndUpdate({
    '_id': gemId,
    '_size._lockedAt': {
        $exists: false
    }
}, {
    $set: {
        '_size._lockedAt': moment.utc().toDate(),
        '_size._transactionId': transactionId
    }
}).then(results => {
    // Calculations are performed here, new _size object is built
    return object.findOneAndUpdate({
        _id: gemId,
        _lockedAt: {
            $exists: true // We tried both with and without this property, does not change anything
        }
    }, {
        $set: {
            _size: newSizeObject
        }
    });
}).catch(err => console.error(err));
Also make sure you have error handling for your promises using catch.
If you don't really need the lock or transaction fields then I would remove them. If you do need them, something like RethinkDB may work a little better, or PostgreSQL could give you real transactions.
In the end, I checked the code very carefully, and what was happening in reality was that a completely different part of the code was querying the object from the DB and then, after a few other operations (mine included), writing the whole object back to the DB (hence overwriting my changes).
So, an important note for every MongoDB user - remember that MongoDB is not transactional, but single-document operations are atomic. That means each individual write is guaranteed to be applied in full, but nothing guarantees that data you read in one operation is still current when a later operation writes it back.
To sum up, things I learned from this example:
NEVER update a whole object in the database with data obtained from it some time before (e.g. by querying, changing some properties, and saving the object again)
USE $set, $inc, $unset and the other atomic update operators - see the sketch after this list. If you have a lot of parameters, a library such as the mongo-dot-notation npm package can flatten your data into a $set selector.
If something unexpected is happening to your data (e.g. missing properties after saving), the first thing to investigate is other pending operations on those specific entities
The least probable cause of your problems is MongoDB itself. It's usually code that does not respect the atomicity rules (which probably happens to a lot of people used to transactional DBs :)).
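As a concrete illustration of the second point, here is a minimal sketch (property names taken from the example above, the rest assumed) that updates only the touched fields instead of writing the whole object back:

// Instead of reading the doc, mutating it, and saving it back wholesale,
// target exactly the fields that changed:
object.findOneAndUpdate(
    { _id: gemId, '_size._transactionId': transactionId },
    {
        $set: {
            '_size.somePropertySize': 123,
            '_size._total': 131431523
        },
        $unset: {
            '_size._lockedAt': '',
            '_size._transactionId': ''
        }
    }
);

A concurrent writer can still modify other fields of the document, but it can no longer silently erase this update, because nothing outside _size is touched.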

"Race like" condition with Mongoose

I have a process that triggers a number of requests that in turn trigger off a number of webhooks. I know the process is complete when I've received all of my webhooks. My model looks something like this.
{
    name: 'name',
    status: 'PENDING',
    children: [{
        name: 'child1',
        status: 'PENDING'
    }, {
        name: 'child2',
        status: 'PENDING'
    }]
}
The idea is that as the webhooks come in, I update the matching subdoc to COMPLETE. At the end of each update I check whether the others are complete, and if they are, I set status = 'COMPLETE' on the parent. It looks like one invocation of the service marks the parent COMPLETE after another invocation has determined it is still PENDING, but before that second invocation has saved; when the second one saves, it overwrites COMPLETE with PENDING on the parent.
Here is the code from my schema:
methods: {
    doUpdate: function (data) {
        var child = this.children.id(data.childId);
        child.status = 'COMPLETE';
        if (_.every(this.children, { status: 'COMPLETE' }))
            this.status = 'COMPLETE';
        return this.save();
    }
}
I think you can solve your issue by modifying and saving just the child entry instead of saving / overwriting the whole document each time.
To do that you can use the positional $ operator in your update statement.
Essentially it would look something like this:
db.yourChildrenCollection.update(
    { _id: this._id, 'children.name': childName },
    { $set: { 'children.$.status': 'COMPLETE' } }
)
You have to modify the variable names as I do not know your collection name etc. but I think you get the point.
The solution is to first find and update the child's status in one atomic operation, using a variant of MongoDB's findAndModify method, and only then check whether the other children have completed. I was trying to do two updates in one operation. By breaking it into two steps, I know that when I check the statuses of the other children, every other invocation sees the most recent state of the document, and none gets a false reading while another invocation is waiting for save() to complete.
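A minimal sketch of that two-step flow, adapted from the schema method above (the exact filter conditions are assumptions):

methods: {
    doUpdate: function (data) {
        var Model = this.constructor;
        var parentId = this._id;
        // Step 1: atomically flip the one child to COMPLETE
        return Model.findOneAndUpdate(
            { _id: parentId, 'children._id': data.childId },
            { $set: { 'children.$.status': 'COMPLETE' } },
            { new: true }
        ).then(function () {
            // Step 2: mark the parent COMPLETE only if no child is still
            // PENDING; the condition lives in the filter, so a concurrent
            // invocation can never overwrite COMPLETE with PENDING
            return Model.findOneAndUpdate(
                { _id: parentId, 'children.status': { $ne: 'PENDING' } },
                { $set: { status: 'COMPLETE' } },
                { new: true }
            );
        });
    }
}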

Mongodb, incrementing value inside an array. (save() ? update() ?)

var Poll = mongoose.model('Poll', {
    title: String,
    votes: {
        type: Array,
        'default': []
    }
});
I have the above schema for my simple poll, and I am uncertain of the best method to change the value of the elements in my votes array.
app.put('/api/polls/:poll_id', function (req, res) {
    Poll.findById(req.params.poll_id, function (err, poll) {
        // I see the official website of mongodb use something like
        //   db.collection.update()
        // but that doesn't apply here right? I have direct access to the "poll" object here.
        // Can I do something like
        //   poll.votes[1] = poll.votes[1] + 1;
        //   poll.save() ?
        // Help much appreciated.
    });
});
You can do it as you have above, but of course this involves "retrieving" the document from the server, then making the modification and saving it back.
If you have a lot of concurrent operations doing this, then your results are not going to be consistent, as there is a high potential for "overwriting" the work of another operation that is trying to modify the same content. So your increments can go out of "sync" here.
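To make the hazard concrete, here is a hypothetical interleaving of two concurrent requests (not from the question):

// Request A reads poll.votes[1] === 5
// Request B reads poll.votes[1] === 5
// Request A saves votes[1] = 6
// Request B saves votes[1] = 6   <- A's increment is lost; the value should be 7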
A better approach is to use the standard .update() type of operations. These make a single request to the server and modify the document there, even returning the modified document, as is the case with .findByIdAndUpdate():
Poll.findByIdAndUpdate(req.params.poll_id,
    { "$inc": { "votes.1": 1 } },
    function (err, doc) {
        // "doc" is the modified document
    }
);
So the $inc update operator does the work of modifying the array at the specified position using "dot notation". The operation is atomic, so no other operation can modify the document at the same time; if another increment was issued just before, the result would be correctly incremented by that operation and then by this one, returning the correct data in the result document.
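If the array index comes in from the request, the dot-notation path can be built as a computed string key; a short sketch (the :index route parameter is an assumption, not part of the question):

app.put('/api/polls/:poll_id/votes/:index', function (req, res) {
    var update = { $inc: {} };
    // Builds e.g. { $inc: { 'votes.1': 1 } } for /votes/1
    update.$inc['votes.' + req.params.index] = 1;

    Poll.findByIdAndUpdate(req.params.poll_id, update, function (err, doc) {
        if (err) return res.status(500).send(err);
        res.json(doc);
    });
});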

Can I fetch last inserted subdocument in Mongoose using pop()?

I am doing a findOneAndUpdate call to add to a subdocument array using $addToSet. The operation returns the newly updated document. I would like to extract the subdocument that just got inserted. Is it safe to use pop() in this case? I know that browsers vary in how arrays/objects are ordered but not sure about the node/mongo case.
The general official line for "sets" in MongoDB is that they are not considered to be ordered in any way. There actually has been some discussion that "sets" may even constitute a different data type to that of an ordinary array at some point, but as yet there is no distinction made between types and they are internally treated just as another array.
It would appear however, that the general behavior actually is that any "new" item added to the set is indeed "appended" to the list of current elements, so anything new would indeed be the last element in an array. And arrays always do retain positions, which is the general point of the data type.
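A quick mongo-shell check of that behavior (collection name made up):

db.settest.insert({ "list": [ "c", "a" ] })
db.settest.update({}, { "$addToSet": { "list": "b" } })
db.settest.findOne().list
// [ "c", "a", "b" ] -- the new member lands at the end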
So whilst your point is basically asking:
"will the last added item be the last in the list?"
Then that would appear to be the case, despite the documented warnings about the order of elements. Those warnings generally appear, on surface value, to be more about the "internal" ordering, where adding "a" after "c" would not place the "a" element first in the list without additional modification.
But the question also comes off a little bit "funny" in that you are essentially asking:
"did my last update to the set appear on the end?"
Which IMHO really seems to be asking "Did it update or not?" - did the last member I added actually change the "set", or did the element already exist? In that case, rather than looking to see whether the last element is the same as the one you wanted to add, the better option is to check the WriteResult and see whether the document was actually modified.
Consider this document:
{
    "list": [ "c", "a" ]
}
And then issuing the update with the Bulk Operations API to get the result:
var bulk = Model.collection.initializeOrderedBulkOp();
bulk.find({}).updateOne({ "$addToSet": { "list": "a" } });
bulk.execute(function (err, result) {
    console.log(JSON.stringify(result, undefined, 4));
});
Would give you something like this:
BulkWriteResult({
    "writeErrors" : [ ],
    "writeConcernErrors" : [ ],
    "nInserted" : 0,
    "nUpserted" : 0,
    "nMatched" : 1,
    "nModified" : 0,
    "nRemoved" : 0,
    "upserted" : [ ]
})
The enhanced response tells you that even though 1 document was matched by the update statement, nothing was actually modified in the document itself since the $addToSet operation actually did not do anything where the element was already present in the set.
If you however considered the mongoose .findOneAndUpdate() method, which is derived from the basic underlying .findAndModify() method of the base driver, then the response you get back is the "modified" document by default:
Model.findOneAndUpdate(
    {},
    { "$addToSet": { "list": "a" } },
    function (err, doc) {
        console.log(JSON.stringify(doc, undefined, 4));
    }
);
So that is basically the same document as before, and there is no way to tell if the update operation actually did anything or not.
Now you could modify that to ask for the original document instead of the modified one:
Model.findOneAndUpdate(
    {},
    { "$addToSet": { "list": "a" } },
    { "new": false },
    function (err, doc) {
        console.log(JSON.stringify(doc, undefined, 4));
    }
);
But the only thing you really know about the result is that if the element you last added is not present in the original document, then that resulted in an update. Of course it is there "now" because you asked it to do that and the operation succeeded and returned you a document:
Model.findOneAndUpdate(
    {},
    { "$addToSet": { "list": "a" } },
    { "new": false },
    function (err, doc) {
        if (doc.list.indexOf("a") == -1)
            console.log("updated set");
    }
);
But is the "last element" in the list the one just added? As covered earlier, it would appear so, but whether that always remains true is the question. It really does not make much sense to rely on it anyway, as you cannot guarantee that the operation you just issued actually did anything without checking for it properly.
So in an atomic operation such as .findOneAndUpdate() where you get the document returned, or indeed in any "update" variant, the better answer to "did this do anything" is to check the WriteResult, or to compare the original document against what you expect the result of the operation to be.
In the end, if you need the whole document back then use .findOneAndUpdate() but with the "original document" option. You can safely assume that your operation succeeded and add the element to the list where it was not present originally if that is your desired result. The difference tells you if the update actually did anything.
If however you only want to know if a new element was indeed added, then just use the Bulk API approach, which gives you the correct response that the document was not updated since the element was already part of the "set".
