How to do mongoose aggregation with nested array documents - node.js

I have a MongoDB collection, Polls, with the following schema:
{
"options" : [
{
"_id" : ObjectId,
"option" : String,
"votes" : [ ObjectId ] // object ids of users who voted
}, ...
]
}
Assume I have, in Node.js, the userId of the user to whom I want to send this info.
My task is to:
(1) include an extra field in the above JSON object (which I get using Mongoose):
"myVote" : option._id
where option._id is the _id of the option for which options[someIndex].votes contains userId;
(2) change the existing "votes" field in each option to the number of votes on that option, as can be seen in the example.
Example:
{
"options" : [
{
"_id" : 1,
"option" : "A",
"votes" : [ 1,2,3 ]
},
{
"_id" : 2,
"option" : "B",
"votes" : [ 5 ]
},
{
"_id" : 3,
"option" : "C",
"votes" : [ ]
}
]
}
So if the user with user id 5 wants to see the poll, then I need to send the following info:
Expected Result :
{
"my_vote" : 2, // user with id 5 voted on option with id 2
"options" : [
{
"_id" : 1,
"option" : "A",
"votes" : 3 //num of votes on option "A"
},
{
"_id" : 2,
"option" : "B",
"votes" : 1 //num of votes on option "B"
},
{
"_id" : 3,
"option" : "C",
"votes" : 0 //num of votes on option "C"
}
]
}

Since the currently accepted answer neither really provides what you actually asked for, and also does some unnecessary things, here is another approach:
var userId = 5; // A variable to work into the submitted pipeline
db.sample.aggregate([
{ "$unwind": "$options" },
{ "$group": {
"_id": "$_id",
"my_vote": { "$min": {
"$cond": [
{ "$setIsSubset": [ [userId], "$options.votes" ] },
"$options._id",
false
]
}},
"options": { "$push": {
"_id": "$options._id",
"option": "$options.option",
"votes": { "$size": "$options.votes" }
}}
}}
])
Which of course will give you output per document like this:
{
"_id" : ObjectId("5573a0a8b67e246aba2b4b6e"),
"my_vote" : 2,
"options" : [
{
"_id" : 1,
"option" : "A",
"votes" : 3
},
{
"_id" : 2,
"option" : "B",
"votes" : 1
},
{
"_id" : 3,
"option" : "C",
"votes" : 0
}
]
}
So what you are doing here is using $unwind to break down the array for inspection first. The following $group stage (the only other stage you need) uses the $min and $push operators to reconstruct the document.
Inside the $min operation, the $cond operation tests the array content via $setIsSubset and returns either the matched _id value or false. Because false sorts after numbers and ObjectIds in BSON comparison order, $min returns the matched _id whenever there is one. When reconstructing the inner array element, specify each field individually in the argument to $push, and use the $size operator to count the elements in the votes array.
You also mentioned, with a link to another question, how to deal with an empty array in $unwind. The $size operator does the right thing here (it returns 0 for an empty array), so in this case there is no need to $unwind the votes array and project a "dummy" value where it is empty.
As a general note, unless you are actually "aggregating" across documents, it is generally advisable to do this operation in client code rather than in the aggregation framework. $unwind effectively creates a new document in the aggregation pipeline for each element of the array in each document, which produces significant overhead.
For an operation that acts on each document individually, processing each document in client code is more efficient.
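As a minimal sketch of that client-side approach, assuming the poll has already been loaded as a plain object (for example with Mongoose's lean()) and that ids can be compared via String() (plain numbers here to match the example; ObjectIds in practice), both changes can be made in a single pass over the options. The helper name toPollView is hypothetical.

```javascript
// Hypothetical helper: reshape one poll document for a given user.
// Assumes `poll` is a plain object (e.g. from a lean() query) and that
// ids compare correctly via String(), which works for numbers and ObjectIds.
function toPollView(poll, userId) {
  const uid = String(userId);
  let myVote = null; // stays null if the user has not voted

  const options = poll.options.map(function (o) {
    const voted = o.votes.some(function (v) { return String(v) === uid; });
    if (voted && myVote === null) {
      myVote = o._id; // remember the first option this user voted on
    }
    // Build a new object so the original poll is not modified.
    return { _id: o._id, option: o.option, votes: o.votes.length };
  });

  return { my_vote: myVote, options: options };
}

// Usage with the example poll from the question:
const poll = {
  options: [
    { _id: 1, option: "A", votes: [1, 2, 3] },
    { _id: 2, option: "B", votes: [5] },
    { _id: 3, option: "C", votes: [] }
  ]
};
console.log(JSON.stringify(toPollView(poll, 5)));
```

Returning null for a user who has not voted is a design choice of this sketch; the $group pipeline above yields false in that case instead.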
If you really must do this processing on the server, then using $map is probably the most efficient way:
db.sample.aggregate([
{ "$project": {
"my_vote": {
"$setDifference": [
{ "$map": {
"input": "$options",
"as": "o",
"in": { "$cond": [
{ "$setIsSubset": [ [userId], "$$o.votes" ] },
"$$o._id",
false
]}
}},
[false]
]
},
"options": { "$map": {
"input": "$options",
"as": "o",
"in": {
"_id": "$$o._id",
"option": "$$o.option",
"votes": { "$size": "$$o.votes" }
}
}}
}}
])
So this just "projects" the reworked results for each document. The my_vote field is different, though: it is an array with a single element (or several, if there are multiple matches), because the aggregation framework lacks operators to reduce it to a non-array value without further overhead:
{
"_id" : ObjectId("5573a0a8b67e246aba2b4b6e"),
"options" : [
{
"_id" : 1,
"option" : "A",
"votes" : 3
},
{
"_id" : 2,
"option" : "B",
"votes" : 1
},
{
"_id" : 3,
"option" : "C",
"votes" : 0
}
],
"my_vote" : [
2
]
}

Check out this question.
It's not asking the same thing, but there's no way to do what you're asking without multiple queries anyway. I would modify the JSON you get back directly, as you're just displaying extra info that is already contained in the result of the query.
1. Save the userID you're querying for.
2. Take the results of your query (the options array in an object) and search through the votes of each element in the array.
3. When you've found the right vote, attach the _id (perhaps add 'n/a' if you don't find a vote).
4. Write a function that does 2 and 3, so you can just pass it a userID and get back a new object with myVote attached.
I don't think doing it like this will be slower than doing another query in Mongoose.
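Those steps might be sketched like this, assuming the query result is a plain object with an options array shaped like the schema in the question. The function name withMyVote and the sample data are hypothetical.

```javascript
// Hypothetical helper: search each option's votes for the saved userId
// and return a copy of the result with myVote attached ('n/a' if not found).
function withMyVote(result, userId) {
  const uid = String(userId); // String() lets numbers and ObjectIds compare
  const match = result.options.find(function (o) {
    return o.votes.some(function (v) { return String(v) === uid; });
  });
  // Object spread copies the top-level fields, so `result` itself is unchanged.
  return { ...result, myVote: match ? match._id : "n/a" };
}

// Usage with hypothetical data in the question's shape:
const result = {
  options: [
    { _id: 1, option: "A", votes: [1, 2, 3] },
    { _id: 2, option: "B", votes: [5] }
  ]
};
console.log(withMyVote(result, 5).myVote);
```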

Related

MongoDB Schema to support concurrent update on a document

We were working on a project with 300 documents, each with a currentValue field, in a main collection. To track the history of each document in the main collection, we created another collection named history with approximately 6.5 million documents.
For each input to the system we have to add around 30 history items and update the currentValue field of the main collection. We tried the computed field design pattern for currentValue, which led to write conflicts (WriteConflict) under concurrency (around 1000 concurrent requests).
Then we tried computing the currentValue field by summing the amount field, grouped by the mainId field, on the history collection, which takes too long (> 3 s).
Main collection docs:
{
"_id" : ObjectId(...),
"stock" : [
{
"currentAmount" : -313430.0,
"lastPrice" : -10.0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
},
{
"currentAmount" : 30,
"lastPrice" : 0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
},
...
],
"name" : "name"
}
History collection docs:
{
"_id" : ObjectId("..."),
"mainId" : ObjectId("..."),
"amount" : 5
}
If you have any other ideas for handling this situation (at the application or DB level), I would be thankful.
UPDATE 1
If I use the computed pattern, the update query would be:
mainCollection.findOneAndUpdate(
{
$and: [
{ _id: id },
{ "stock.storage": fromId },
{ "stock.deletedAt": null }
],
},
{
$inc: {
"stock.$.currentAmount": -1 * amount,
}
},
{
session
}
)
And the aggregation pipeline, if I want to calculate currentAmount every time:
mainCollection.aggregate([
{
$match: {
branch: new ObjectId("...")
}
},
{
$group: {
_id: "$ingredient",
currentAmount: {
$sum: "$amount"
}
}
}])
To have a computed field, the MongoDB design patterns suggest the Computed Pattern:
"The Computed Pattern is utilized when we have data that needs to be computed repeatedly in our application."
Like below:
// your main collection will look like this
{
"_id" : ObjectId(...),
"stock" : [
{
"currentAmount" : -313430.0,
"lastPrice" : -10.0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
},
{
"currentAmount" : 30,
"lastPrice" : 0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
}
],
"totalAmount": 20000 // for example
}
But to handle concurrency, there is a better way to solve this problem: cumulative summation. In this approach, each new history document stores the sum of the previous documents' inputs plus the current input:
{
"_id" : ObjectId("..."),
"mainId" : ObjectId("..."),
"amount" : 5,
"cumulative": 15 // sum of last documents input
}
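A minimal in-memory sketch of the cumulative field, assuming the history entries for one mainId are appended in order: each new entry's cumulative is the previous entry's cumulative plus its own amount, so the current value is read from the latest entry instead of summing the whole collection. The function names and sample amounts are hypothetical, and this sketch does not by itself show how to make the append safe under concurrent writes.

```javascript
// Hypothetical sketch: build the next history entry for one mainId.
// `last` is the most recent history entry, or null if there is none yet.
function nextHistoryEntry(last, mainId, amount) {
  const previous = last ? last.cumulative : 0;
  return { mainId: mainId, amount: amount, cumulative: previous + amount };
}

// Append a sequence of amounts, storing the running total on each entry.
function appendAll(amounts, mainId) {
  const entries = [];
  for (const amount of amounts) {
    const last = entries.length ? entries[entries.length - 1] : null;
    entries.push(nextHistoryEntry(last, mainId, amount));
  }
  return entries;
}

const entries = appendAll([4, 6, 5], "main-1");
// The current value comes from the latest entry, not from a $group/$sum.
console.log(entries[entries.length - 1].cumulative);
```

With these sample amounts the last entry has amount 5 and cumulative 15, the same shape as the document shown above.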

Mongodb aggregate project string as number

I have a mongo script which retrieves a value from an array and creates a new document. However, the value which it retrieves is a string. I need the value to be added to the new document as a number instead of a string because it is read by a graphing engine which ignores the value if it is a string.
In the script below, it is "value": {$arrayElemAt: ["$accountBalances", 1]} that needs to be a number instead of a string. Thanks.
db.std_sourceBusinessData.aggregate(
{ $match : {objectType: "Account Balances"}},
{ $project: {_id: 1,entity_ID: 1,objectOrigin: 1,accountBalances: 1}},
{ $unwind: "$accountBalances" },
{ $match: {"accountBalances": "Sales"}}
,
{$project: {
_id: 1
, "value": {$arrayElemAt: ["$accountBalances", 1]}
,"key": {$literal: "sales"}
,"company": "$entity_ID"
,"objectOrigin" : "$objectOrigin"
}}
,{$out: "entity_datapoints"}
)
This is what I currently get:
{
"_id" : ObjectId("5670961f910e1f54662c1d9d"),
"objectOrigin" : "Xero",
"Value" : "500.00",
"key" : "grossprofit",
"company" : "e56e09ef-5c7c-423e-b699-21469bd2ea00"
}
what I want is:
{
"_id" : ObjectId("5670961f910e1f54662c1d9d"),
"objectOrigin" : "Xero",
"Value" : 500.0000000000000,
"key" : "grossprofit",
"company" : "e56e09ef-5c7c-423e-b699-21469bd2ea00"
}

Mongoose mapReduce : reduce returns object or array?

I have the following collection:
/* 0 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b4f1cb3d2eacb1300002b"),
"answers" : [],
"questions" : []
}
/* 1 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b6b9eb3d2eacb1300002c"),
"answers" : [
"1",
"8"
],
"questions" : [
"1",
"2",
"3"
]
}
/* 2 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b6baeb3d2eacb1300002d"),
"answers" : [
"1",
"8"
],
"questions" : [
"1",
"2",
"3"
]
}
/* 3 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("533b828146ca43634000002d"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
/* 4 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("5351be327b539a4d1a00002b"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
/* 5 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("5351be5ec89d717d1a00002b"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
I am running the following code in order to find how many times the (questions,answers) combination appears in the collection:
o.map= function(){
emit({"questions" : this.questions, "answers" :this.answers },this.clientID)
};
o.reduce = function(answers, collection){
return collection.length;
};
logSearchDB.mapReduce(o,function (err, results) {
results.sort(function(a, b){return b.value-a.value});
for (var i = 0; i < results.length; i++) {
console.log(JSON.stringify(results[i]))
};
})
The output is:
{"_id":{"questions":[],"answers":[]},"value":"51b9c10d91d1a3a52b0000b8"}
{"_id":{"questions":["Color"],"answers":["ORANGE"]},"value":3}
{"_id":{"questions":["1","2","3"],"answers":["1","8"]},"value":2}
I expected the first row to have "value" : 1.
I guess the 'reduce' function received "51b9c10d91d1a3a52b0000b8" as its 'collection' argument instead of the array ["51b9c10d91d1a3a52b0000b8"].
Why doesn't mapReduce collect everything into an array?
The reason you have just a plain value in that first row is that there was only one occurrence of that key. This is generally how mapReduce works, at least as it was specified in the original papers.
So the reduce function is not called for a key that has only a single emitted value. To work around this, use the finalize function in your mapReduce:
var finalize = function(key,value) {
if ( typeof(value) != "number" )
value = 1;
return value;
};
db.collection.mapReduce(
mapper,
reducer,
{
"finalize": finalize,
"out": { "inline": 1 }
}
);
That runs over all of the output, and when the value is not a number (that is, it is the clientID you emitted), it sets the value to 1, because that is how many documents are in the grouping.
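To make that behaviour concrete, here is a simplified in-memory model of mapReduce. It is not MongoDB's implementation: it passes emit to the map function as a parameter instead of providing a global, and it calls reduce at most once per key, whereas real mapReduce may call reduce repeatedly on partial results. It only illustrates why a single-value key skips reduce and how finalize fixes that. The names toyMapReduce and the sample docs are hypothetical.

```javascript
// Simplified model (NOT MongoDB's implementation): values emitted by `map`
// are grouped by key, `reduce` runs only for keys with more than one value,
// and `finalize` (if given) runs on every key's result.
function toyMapReduce(docs, map, reduce, finalize) {
  const groups = new Map(); // JSON key string -> { key, values }
  for (const doc of docs) {
    const emit = function (key, value) {
      const k = JSON.stringify(key);
      if (!groups.has(k)) groups.set(k, { key: key, values: [] });
      groups.get(k).values.push(value);
    };
    map.call(doc, emit); // `this` is the document, as in MongoDB's map
  }
  const out = [];
  for (const { key, values } of groups.values()) {
    let value = values.length > 1 ? reduce(key, values) : values[0];
    if (finalize) value = finalize(key, value);
    out.push({ _id: key, value: value });
  }
  return out;
}

// Hypothetical sample data in the shape of the question's collection.
const docs = [
  { clientID: "a", questions: [], answers: [] },
  { clientID: "b", questions: ["Color"], answers: ["ORANGE"] },
  { clientID: "b", questions: ["Color"], answers: ["ORANGE"] }
];
const map = function (emit) {
  emit({ questions: this.questions, answers: this.answers }, this.clientID);
};
const reduce = function (key, values) { return values.length; };
const finalize = function (key, value) {
  return typeof value !== "number" ? 1 : value;
};

console.log(toyMapReduce(docs, map, reduce));           // first value is still "a"
console.log(toyMapReduce(docs, map, reduce, finalize)); // first value becomes 1
```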
Really your query is better suited to the aggregation framework than mapReduce. The aggregation framework is a native code implementation as opposed to using a JavaScript interpreter. It runs much faster than mapReduce:
db.collection.aggregate([
{ "$group": {
"_id": {
"questions": "$questions",
"answers": "$answers"
},
"count": { "$sum": 1 }
}}
])
So it is the better option to use. It was introduced to MongoDB later than mapReduce, so people still tend to think in terms of mapReduce, or have legacy code from earlier versions of MongoDB. But the aggregation framework has been around for quite a while now.
Also see the operator reference for the aggregation framework.

Querying a property that is in a deeply nested array

So I have this document in the course collection:
{
"_id" : ObjectId("53580ff62e868947708073a9"),
"startDate" : ISODate("2014-04-23T19:08:32.401Z"),
"scoreId" : ObjectId("531f28fd495c533e5eaeb00b"),
"rewardId" : null,
"type" : "certificationCourse",
"description" : "This is a description",
"name" : "testingAutoSteps1",
"authorId" : ObjectId("532a121e518cf5402d5dc276"),
"steps" : [
{
"name" : "This is a step",
"description" : "This is a description",
"action" : "submitCategory",
"value" : "532368bc2ab8b9182716f339",
"statusId" : ObjectId("5357e26be86f746b68482c8a"),
"_id" : ObjectId("53580ff62e868947708073ac"),
"required" : true,
"quantity" : 1,
"userId" : [
ObjectId("53554b56e3a1e1dc17db903f")
]
},...
What I want to do is create a query that returns all courses that have a specific userId in the userId array inside the steps array. I've tried using $elemMatch like so:
Course.find({
"steps": {
"$elemMatch": {
"userId": {
"$elemMatch": "53554b56e3a1e1dc17db903f"
}
}
}
},
But it seems to return an empty result.
I think this will work for you; your syntax is a bit off, plus you need to use ObjectId():
db.Course.find({ steps : { $elemMatch: { userId:ObjectId("53554b56e3a1e1dc17db903f")} } })
Using $elemMatch is not necessary unless you actually have compound sub-documents in that nested array element, and even then it is only needed when the value being referenced could also appear in another compound document.
Since this is an ObjectId, it is going to be unique, at least within this array. So just use the "dot-notation" form:
Course.find({
"steps.userId": ObjectId("53554b56e3a1e1dc17db903f")
},
Go back and look at the $elemMatch documentation. In this case, the direct "dot-notation" form is all you need.
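As an in-memory illustration of what the dot-notation query matches (this mirrors the matching rule, it is not the MongoDB query engine): a course matches { "steps.userId": id } when any step's userId array contains id. Ids are compared as strings here; with real ObjectIds you would compare String(id) or use the ObjectId equals() method. The function name and the second sample course are hypothetical.

```javascript
// Illustration only: return the courses in which some step's userId
// array contains the given id (compared as strings).
function coursesWithUser(courses, userId) {
  const uid = String(userId);
  return courses.filter(function (course) {
    return course.steps.some(function (step) {
      return (step.userId || []).some(function (id) { return String(id) === uid; });
    });
  });
}

const courses = [
  { name: "testingAutoSteps1",
    steps: [ { name: "This is a step", userId: ["53554b56e3a1e1dc17db903f"] } ] },
  { name: "other", steps: [ { name: "Another step", userId: [] } ] }
];
console.log(coursesWithUser(courses, "53554b56e3a1e1dc17db903f").map(function (c) { return c.name; }));
```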

Compare two date fields in MongoDB

In my collection each document has two dates, modified and sync. I would like to find those where modified > sync, or where sync does not exist.
I tried
{'modified': { $gt : 'sync' }}
but it's not showing what I expected. Any ideas?
Thanks
You cannot compare a field with the value of another field using normal query matching. However, you can do this with the aggregation framework:
db.so.aggregate( [
{ $match: …your normal other query… },
{ $project: { modified: 1, sync: 1,
eq: { $cond: [ { $gt: [ '$modified', '$sync' ] }, 1, 0 ] } } },
{ $match: { eq: 1 } }
] );
I put …your normal other query… there because that part can use an index. So if you want to do this only for documents where the name field is charles, you can do:
db.so.ensureIndex( { name: 1 } );
db.so.aggregate( [
{ $match: { name: 'charles' } },
{ $project: {
modified: 1,
sync: 1,
name: 1,
eq: { $cond: [ { $gt: [ '$modified', '$sync' ] }, 1, 0 ] }
} },
{ $match: { eq: 1 } }
] );
With the input:
{ "_id" : ObjectId("520276459bf0f0f3a6e4589c"), "modified" : 73845345, "sync" : 73234 }
{ "_id" : ObjectId("5202764f9bf0f0f3a6e4589d"), "modified" : 4, "sync" : 4 }
{ "_id" : ObjectId("5202765b9bf0f0f3a6e4589e"), "modified" : 4, "sync" : 4, "name" : "charles" }
{ "_id" : ObjectId("5202765e9bf0f0f3a6e4589f"), "modified" : 4, "sync" : 45, "name" : "charles" }
{ "_id" : ObjectId("520276949bf0f0f3a6e458a1"), "modified" : 46, "sync" : 45, "name" : "charles" }
This returns:
{
"result" : [
{
"_id" : ObjectId("520276949bf0f0f3a6e458a1"),
"modified" : 46,
"sync" : 45,
"name" : "charles",
"eq" : 1
}
],
"ok" : 1
}
If you want any more fields, you need to add them in the $project.
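To see why only one document comes back, here is the same three-stage logic replayed in plain JavaScript over the sample input above. This is a sketch that mirrors the pipeline rather than running it: the ObjectIds are shortened to hypothetical string ids, and JavaScript's > stands in for $gt, which behaves the same for these plain numbers.

```javascript
// Sample input from above, with ObjectIds shortened to their last four hex digits.
const docs = [
  { _id: "589c", modified: 73845345, sync: 73234 },
  { _id: "589d", modified: 4, sync: 4 },
  { _id: "589e", modified: 4, sync: 4, name: "charles" },
  { _id: "589f", modified: 4, sync: 45, name: "charles" },
  { _id: "58a1", modified: 46, sync: 45, name: "charles" }
];

// Like { $match: { name: 'charles' } }
const named = docs.filter(function (d) { return d.name === "charles"; });

// Like the $project stage that computes eq with $cond and $gt
const projected = named.map(function (d) {
  return { _id: d._id, modified: d.modified, sync: d.sync, name: d.name,
           eq: d.modified > d.sync ? 1 : 0 };
});

// Like { $match: { eq: 1 } }
const result = projected.filter(function (d) { return d.eq === 1; });
console.log(result); // only the document with modified 46 and sync 45
```

Note that the first document also has modified > sync but is excluded by the name filter, which is why the pipeline output contains a single document.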
For MongoDB 3.6 and newer:
The $expr operator allows the use of aggregation expressions within the query language, thus you can do the following:
db.test.find({ "$expr": { "$gt": ["$modified", "$sync"] } })
or, using the aggregation framework, with a $match pipeline stage:
db.test.aggregate([
{ "$match": { "$expr": { "$gt": ["$modified", "$sync"] } } }
])
For MongoDB 3.0+:
You can also use the aggregation framework with the $redact pipeline operator, which lets you evaluate the logical condition with the $cond operator and use the special system variables $$KEEP to "keep" the document where the condition is true or $$PRUNE to "remove" the document where the condition is false.
Consider running the following aggregate operation which demonstrates the above concept:
db.test.aggregate([
{ "$redact": {
"$cond": [
{ "$gt": ["$modified", "$sync"] },
"$$KEEP",
"$$PRUNE"
]
} }
])
This operation is similar to having a $project stage that selects the fields in the collection and creates a new field holding the result of the logical condition, followed by a $match, except that $redact does it in a single pipeline stage, which is more efficient.
Simply use $where (note that $where evaluates JavaScript for every document and cannot use an index):
db.collection.find({$where:"this.modified>this.sync"})
Example
Kobkrits-MacBook-Pro-2:~ kobkrit$ mongo
MongoDB shell version: 3.2.3
connecting to: test
> db.time.insert({d1:new Date(), d2: new Date(new Date().getTime()+10000)})
WriteResult({ "nInserted" : 1 })
> db.time.find()
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1<this.d2"})
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1>this.d2"})
> db.time.find({$where:"this.d1==this.d2"})
>
Use JavaScript: iterate with forEach and convert each Date with toDateString():
db.ledgers.find({}).forEach(function(item){
if(item.fromdate.toDateString() == item.todate.toDateString())
{
printjson(item)
}
})
Right now your query is trying to return all results where the modified field is greater than the string 'sync'. Try getting rid of the quotes around sync and see if that fixes anything. Otherwise, I did a little research and found this question. What you're trying to do might not be possible in a single query, but you should be able to manipulate your data once you pull everything from the database.
To fix this issue without aggregation, change your query to this:
{'modified': { $gt : ISODate(this.sync) }}
