Possible? $add values of array to compare with object total value


I'm using the MongoDB native driver for Node.js (driver version 2.2.4; MongoDB shell version 3.2.9).
My collection has objects like this:
{x:[{v:0.002},{v:0.00002}],t:0.00202} //<this one has the full total in its values
{x:[{v:0.002},{v:0.002}],t:0.00202}
{x:[{v:0.002},{v:0.002}],t:0.3}
(shown here without their object ids)
I am unsure how to add up all the x.v values so that only objects whose x.v total is greater than or equal to the object's t are returned.
aggregate({"t":{"$gte":{"$add":["x.v"]}}})
returns every object. From reading the docs, I have no other idea about the order of the syntax.
Can mongodb even do this in a query?

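With MongoDB 3.2, there are a couple of approaches you can take here. The examples below use a strict "greater than" comparison; use >= (or $gte) instead if documents whose total exactly equals t should also be returned. First, you can query with the $where operator: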
db.collection.find({
    "$where": function() {
        return (this.x.reduce(function (a, b) {
            return a + b.v;
        }, 0) > this.t);
    }
})
Sample Output
/* 1 */
{
    "_id" : ObjectId("587107b3cbe62793a0f14e74"),
    "x" : [
        {
            "v" : 0.002
        },
        {
            "v" : 0.002
        }
    ],
    "t" : 0.00202
}
Note, however, that this is not an efficient solution: a query with the $where operator calls the JavaScript engine to evaluate JavaScript code on every document and checks the condition for each one.
This is very slow. MongoDB evaluates non-$where query operations before $where expressions, and those non-$where statements may use an index, so it is advisable to combine $where with indexed query conditions where you can to make the query faster. Even so, JavaScript expressions and the $where operator are strongly recommended only as a last resort, when you can't structure the data any other way or when you are dealing with a small subset of data.
A better approach is to use the aggregation framework: use the $unwind operator to flatten the x array, calculate the sum of x.v in a $group stage, and then filter the documents with the $redact stage. $redact lets you evaluate the logical condition with the $cond operator and uses the system variables $$KEEP to "keep" the document when the condition is true or $$PRUNE to "remove" it when the condition is false.
This is similar to having a $project stage that selects the fields and creates a new field holding the result of the logical condition, followed by a $match stage, except that $redact does it in a single pipeline stage, which is more efficient.
db.collection.aggregate([
    { "$unwind": "$x" },
    {
        "$group": {
            "_id": "$_id",
            "x": { "$push": "$x" },
            "t": { "$first": "$t" },
            "y": { "$sum": "$x.v" }
        }
    },
    {
        "$redact": {
            "$cond": [
                { "$gt": [ "$y", "$t" ] },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    }
])
Sample Output
/* 1 */
{
    "_id" : ObjectId("587107b3cbe62793a0f14e74"),
    "x" : [
        {
            "v" : 0.002
        },
        {
            "v" : 0.002
        }
    ],
    "t" : 0.00202,
    "y" : 0.004
}
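As an illustration only, the following plain-JavaScript sketch mimics what the three stages do to the question's sample documents, so it can run without a server. It is not how MongoDB executes the pipeline, and the helper names unwind, groupById and redact (and the _id values) are hypothetical. It also makes the cost of $unwind visible: three documents with two-element arrays become six intermediate documents.

```javascript
// Sample documents from the question; the _id values are hypothetical.
const docs = [
  { _id: 1, x: [{ v: 0.002 }, { v: 0.00002 }], t: 0.00202 },
  { _id: 2, x: [{ v: 0.002 }, { v: 0.002 }], t: 0.00202 },
  { _id: 3, x: [{ v: 0.002 }, { v: 0.002 }], t: 0.3 },
];

// $unwind: emit one copy of the document per element of the array field.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    doc[field].map((item) => ({ ...doc, [field]: item }))
  );
}

// $group by _id: $push the x elements back, keep t with $first, $sum x.v into y.
function groupById(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row._id)) {
      groups.set(row._id, { _id: row._id, x: [], t: row.t, y: 0 });
    }
    const group = groups.get(row._id);
    group.x.push(row.x);
    group.y += row.x.v;
  }
  return [...groups.values()];
}

// $redact with $cond: $$KEEP when y > t, $$PRUNE otherwise.
function redact(grouped) {
  return grouped.filter((doc) => doc.y > doc.t);
}

const unwound = unwind(docs, "x");
const kept = redact(groupById(unwound));
console.log(unwound.length); // 6
console.log(kept.map((doc) => doc._id));
```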
However, although this solution is better than the previous one that uses $where, bear in mind that the $unwind operator can also limit performance with larger datasets. It produces a copy of each document per array entry, which uses more memory (each aggregation pipeline stage is limited to 100 megabytes of RAM unless allowDiskUse is enabled) and takes time both to produce and to process the documents during the flattening.
Also, this solution requires knowing the document's fields in advance, because you have to retain each of them in the $group stage with accumulators such as $first or $last. That can be a major limitation if your query needs to be dynamic.
For the most efficient solution, I would suggest upgrading your MongoDB server to 3.4 and using the combination of the $redact pipeline stage and the new $reduce array operator to filter the documents in a single stage.
The $reduce operator calculates the sum of the x.v fields by applying an expression to each element in the array and combining the results into a single value.
You can then use it as an expression in the $redact stage's condition to get the desired result:
db.collection.aggregate([
    {
        "$redact": {
            "$cond": [
                {
                    "$gt": [
                        {
                            "$reduce": {
                                "input": "$x",
                                "initialValue": 0,
                                "in": { "$add": ["$$value", "$$this.v"] }
                            }
                        },
                        "$t"
                    ]
                },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    }
])
Sample Output
/* 1 */
{
    "_id" : ObjectId("587107b3cbe62793a0f14e74"),
    "x" : [
        {
            "v" : 0.002
        },
        {
            "v" : 0.002
        }
    ],
    "t" : 0.00202
}
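To show how $reduce combines $$value and $$this, here is a small plain-JavaScript emulation of the expression and of the $redact decision built on it. The function names reduceExpr and keepDocument are hypothetical, and this only mirrors the semantics for illustration; it does not call MongoDB.

```javascript
// Emulates the $reduce expression: start from initialValue and fold each element
// of input into the accumulator. `value` plays the role of $$value and `item`
// plays the role of $$this.
function reduceExpr({ input, initialValue, in: combine }) {
  let value = initialValue;
  for (const item of input) {
    value = combine(value, item);
  }
  return value;
}

// The $redact condition from the pipeline above, applied to one document.
function keepDocument(doc) {
  const total = reduceExpr({
    input: doc.x,
    initialValue: 0,
    in: (value, item) => value + item.v, // { "$add": ["$$value", "$$this.v"] }
  });
  return total > doc.t ? "$$KEEP" : "$$PRUNE";
}

console.log(keepDocument({ x: [{ v: 0.002 }, { v: 0.002 }], t: 0.00202 })); // $$KEEP
console.log(keepDocument({ x: [{ v: 0.002 }, { v: 0.002 }], t: 0.3 }));     // $$PRUNE
```

Because the sum is computed per document inside a single stage, no intermediate documents are created, which is why this avoids the $unwind and $group overhead.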

Related

updateOne nested Array in mongodb

I have a groups collection whose documents contain an array, order, that holds ids.
I would like to use updateOne to set multiple items in that order array.
I tried this which updates one value in the array:
db.groups.updateOne({
_id: '831e0572-0f04-4d84-b1cf-64ffa9a12199'
},
{$set: {'order.0': 'b6386841-2ff7-4d90-af5d-7499dd49ca4b'}}
)
That correctly updates (or sets) the array value with index 0.
However, I want to set more array values and updateOne also supports a pipeline so I tried this:
db.slides.updateOne({
_id: '831e0572-0f04-4d84-b1cf-64ffa9a12199'
},
[
{$set: {'order.0': 'b6386841-2ff7-4d90-af5d-7499dd49ca4b1'}}
]
)
This does NOTHING if the order array is empty. But if it's not, it replaces every element in the order array with an object { 0: 'b6386841-2ff7-4d90-af5d-7499dd49ca4b1' }.
I don't understand that behavior.
In the optimal case I would just do
db.slides.updateOne({
_id: '831e0572-0f04-4d84-b1cf-64ffa9a12199'
},
[
{$set: {'order.0': 'b6386841-2ff7-4d90-af5d-7499dd49ca4b1'}},
{$set: {'order.1': 'otherid'}},
{$set: {'order.2': 'anotherone'}},
]
)
And that would just update the order array with the values.
What is happening here and how can I achieve my desired behavior?

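Updating by array index position (dot notation such as 'order.0') is only supported in regular update queries, not in aggregation pipeline updates. Inside a pipeline, $set treats 'order.0' as a field named "0" to be set in each element of order, which is why every element was replaced by { 0: ... } and why nothing happened when the array was empty.
This is explained in the documentation for the regular update operator $set, but not for the aggregation $set stage.
The correct implementation, using a regular update query: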
db.slides.updateOne({
    _id: '831e0572-0f04-4d84-b1cf-64ffa9a12199'
},
{
    $set: {
        'order.0': 'b6386841-2ff7-4d90-af5d-7499dd49ca4b1',
        'order.1': 'otherid',
        'order.2': 'anotherone'
    }
})
If you specifically need an aggregation pipeline, it is a much longer process than the regular update query above. I don't recommend it; instead, build the $set input in your client-side language and use a regular update query.
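One way to "format your input client-side" is a small helper that turns an array of new values into the dot-notation $set document shown above. This is a sketch; buildIndexedSet and its startIndex parameter are hypothetical names, not part of any driver API.

```javascript
// Hypothetical helper: turn an array of new values into the dot-notation $set
// document that a regular (non-pipeline) update accepts, e.g.
// { $set: { "order.0": ..., "order.1": ..., "order.2": ... } }.
function buildIndexedSet(field, values, startIndex = 0) {
  const set = {};
  values.forEach((value, i) => {
    set[`${field}.${startIndex + i}`] = value;
  });
  return { $set: set };
}

const update = buildIndexedSet("order", [
  "b6386841-2ff7-4d90-af5d-7499dd49ca4b1",
  "otherid",
  "anotherone",
]);
console.log(JSON.stringify(update));
// Then, in the shell or with the driver (not run here):
// db.slides.updateOne({ _id: "831e0572-0f04-4d84-b1cf-64ffa9a12199" }, update)
```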

If you have to use the aggregation framework, try this (you will have to pass the array of indexes and the array of updated values separately):
Use $map and $range to iterate over the order array by index.
Use $cond and $arrayElemAt to check whether the current index is in the array of indexes to be updated. If it is, replace the element with the value at the same index in the array of new values; if it is not, keep the current value.
NOTE: This will work only if the array of indexes that you want to update starts from 0 and goes up consecutively (as in your example). Also, because $range only runs up to the current size of order, this cannot add elements beyond the array's current length, so an empty order array stays empty.
db.collection.update({
    _id: '831e0572-0f04-4d84-b1cf-64ffa9a12199'
},
[
    {
        "$set": {
            "order": {
                "$map": {
                    input: { $range: [ 0, { $size: "$order" } ] },
                    in: {
                        $cond: [
                            { $in: [ "$$this", [ 0, 1, 2 ] ] },
                            {
                                $arrayElemAt: [
                                    [ "b6386841-2ff7-4d90-af5d-7499dd49ca4b1", "otherid", "anotherone" ],
                                    "$$this"
                                ]
                            },
                            { $arrayElemAt: [ "$order", "$$this" ] }
                        ]
                    }
                }
            }
        }
    }
])
Here is the working example: https://mongoplayground.net/p/P4irM9Ouyza
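To see exactly what that pipeline computes, including the limitations in the NOTE, here is a plain-JavaScript emulation of the $map over $range. It is an illustration only; mapByIndex is a hypothetical name, and it says nothing about how MongoDB represents an out-of-range $arrayElemAt result.

```javascript
// Emulates the pipeline's $map over $range(0, $size of order): for each index,
// take the new value if the index is listed, otherwise keep the current element.
function mapByIndex(order, indexes, newValues) {
  return order.map((current, i) => // $range: [0, { $size: "$order" }]
    indexes.includes(i)            // $in: ["$$this", indexes]
      ? newValues[i]               // $arrayElemAt: [newValues, "$$this"]
      : current                    // $arrayElemAt: ["$order", "$$this"]
  );
}

console.log(mapByIndex(["a", "b", "c", "d"], [0, 1, 2], ["x", "y", "z"])); // [ 'x', 'y', 'z', 'd' ]
console.log(mapByIndex([], [0, 1, 2], ["x", "y", "z"]));                   // []
console.log(mapByIndex(["a", "b", "c"], [1], ["new"]));                    // [ 'a', undefined, 'c' ]
```

The second call shows that an empty array is never extended, and the third shows why the indexes must start at 0: index 1 is looked up at position 1 of the new values, not position 0.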

mongodb find with calculated field

I'm trying to create a mongodb query using the filtered value in the filter. For example:
var myIdVariable = '1jig23h34r34r30h';
var myVisibleVariable = false;
var myDistanceVariable = 100;
db.getCollection.find({
'_id': myIdVariable,
'isVisible': myVisibleVariable,
'distanceRange': {$lte: {myDistanceVariable - distanceRange}}
})
So I want to filter on distanceRange based on the calculation (myDistanceVariable - distanceRange), where distanceRange is the value stored in the same document.
I don't know if I've explained my problem clearly. Is this possible?
Thank you.

Use the $expr operator to build a query expression that can compare fields from the same document; here it lets you compare the distanceRange field with a calculation involving that same field and your variables.
You would need to use the $and operator to combine it with the other conditions, so your final query would look like the following:
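db.getCollection('collectionName').find({
    '$expr': {
        '$and': [
            // Inside $expr, compare with $eq: a plain { isVisible: value }
            // would be an object literal, which is always truthy.
            { '$eq': [ '$isVisible', myVisibleVariable ] },
            { '$lte': [
                '$distanceRange',
                { '$subtract': [ myDistanceVariable, '$distanceRange' ] }
            ] }
        ]
    }
})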
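The same condition can be checked in plain JavaScript on a few hypothetical in-memory documents, which also makes the arithmetic easy to see. This is only an illustration of the filter logic, with made-up data.

```javascript
// Hypothetical in-memory documents used only to illustrate the condition.
const docs = [
  { _id: "a", isVisible: false, distanceRange: 30 },
  { _id: "b", isVisible: false, distanceRange: 60 },
  { _id: "c", isVisible: true, distanceRange: 10 },
];
const myVisibleVariable = false;
const myDistanceVariable = 100;

// Same test as the $expr: isVisible equals the variable AND
// distanceRange <= myDistanceVariable - distanceRange.
const matches = docs.filter(
  (d) =>
    d.isVisible === myVisibleVariable &&
    d.distanceRange <= myDistanceVariable - d.distanceRange
);
console.log(matches.map((d) => d._id)); // [ 'a' ]
```

Algebraically, distanceRange <= myDistanceVariable - distanceRange is the same as distanceRange <= myDistanceVariable / 2 for numeric values, so an ordinary query such as { isVisible: myVisibleVariable, distanceRange: { $lte: myDistanceVariable / 2 } } should select the same documents without $expr and could use an index on distanceRange.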
If your MongoDB server doesn't support the $expr operator (it was introduced in MongoDB 3.6), go the aggregation framework route with $redact instead:
db.getCollection('collectionName').aggregate([
    { "$redact": {
        "$cond": [
            {
                '$and': [
                    { '$eq': [ '$isVisible', myVisibleVariable ] },
                    { '$lte': [
                        '$distanceRange',
                        { '$subtract': [ myDistanceVariable, '$distanceRange' ] }
                    ] }
                ]
            },
            "$$KEEP",
            "$$PRUNE"
        ]
    } }
])
Note
Including the _id in the query expressions narrows your selection down to a single document, so the query may not return any results: it looks for the specific document with that _id, and that same document must also satisfy the other query expressions.

Mongoose aggregation "$sum" of rows in sub document

I'm fairly good with SQL queries, but I can't seem to get my head around grouping and getting the sum of MongoDB documents.
With this in mind, I have a Job model with a schema like the one below:
{
    name: {
        type: String,
        required: true
    },
    info: String,
    active: {
        type: Boolean,
        default: true
    },
    all_service: [{
        price: {
            type: Number,
            min: 0,
            required: true
        },
        all_sub_item: [{
            name: String,
            price: { // << -- this is the price I want to calculate
                type: Number,
                min: 0
            },
            owner: {
                user_id: { // <<-- here is the filter I want to put
                    type: Schema.Types.ObjectId,
                    required: true
                },
                name: String,
                ...
            }
        }]
    }],
    date_create: {
        type: Date,
        default: Date.now
    },
    date_update: {
        type: Date,
        default: Date.now
    }
}
I would like to get the sum of the price field where the owner is present. I tried the following, but with no luck:
Job.aggregate(
[
{
$group: {
_id: {}, // not sure what to put here
amount: { $sum: '$all_service.all_sub_item.price' }
},
$match: {'not sure how to limit the user': given_user_id}
}
],
//{ $project: { _id: 1, expense: 1 }}, // you can only project fields from 'group'
function(err, summary) {
console.log(err);
console.log(summary);
}
);
Could someone point me in the right direction? Thank you in advance.

Primer
As is correctly noted earlier, it does help to think of an aggregation "pipeline" just as the "pipe" | operator from Unix and other system shells. One "stage" feeds input to the "next" stage and so on.
The thing you need to be careful with here is that you have "nested" arrays, one array within another, and this can make drastic differences to your expected results if you are not careful.
Your documents consist of an "all_service" array at the top level. Presumably there are often "multiple" entries here, all containing your "price" property as well as "all_sub_item". Then of course "all_sub_item" is an array in itself, also containing many items of its own.
You can think of these arrays as the "relations" between your tables in SQL, in each case a "one-to-many". But the data is in a "pre-joined" form, where you can fetch all data at once without performing joins. That much you should already be familiar with.
However, when you want to "aggregate" across documents, you need to "de-normalize" this in much the same way as in SQL by "defining" the "joins". This "transforms" the data into a de-normalized state that is suitable for aggregation.
So the same visualization applies. A master document's entries are replicated by the number of child documents, and a "join" to an "inner-child" will replicate both the master and initial "child" accordingly. In a "nutshell", this:
{
    "a": 1,
    "b": [
        {
            "c": 1,
            "d": [
                { "e": 1 }, { "e": 2 }
            ]
        },
        {
            "c": 2,
            "d": [
                { "e": 1 }, { "e": 2 }
            ]
        }
    ]
}
Becomes this:
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 2 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 2 } } }
And the operation to do this is $unwind, and since there are multiple arrays then you need to $unwind both of them before continuing any processing:
db.collection.aggregate([
{ "$unwind": "$b" },
{ "$unwind": "$b.d" }
])
So the first stage in the "pipe" unwinds the array "$b" like so:
{ "a" : 1, "b" : { "c" : 1, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
{ "a" : 1, "b" : { "c" : 2, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
This leaves a second array, referenced by "$b.d", to be further de-normalized into the final result "without any arrays", which other stages can then process.
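The de-normalization can be reproduced in plain JavaScript to check the four documents listed above. This is a simplified sketch of $unwind semantics (the helper names unwind and setPath are hypothetical), not MongoDB's implementation; in particular it drops documents whose path value is missing or not an array.

```javascript
// Emulates $unwind on a (possibly dotted) path: one output document per array element.
// Documents whose value at the path is missing or not an array are dropped here for
// simplicity (MongoDB treats a non-array, non-null value as a one-element array).
function unwind(docs, path) {
  const keys = path.split(".");
  return docs.flatMap((doc) => {
    let value = doc;
    for (const key of keys) value = value?.[key];
    return Array.isArray(value) ? value.map((item) => setPath(doc, keys, item)) : [];
  });
}

// Returns a copy of obj with the value at the given key path replaced.
function setPath(obj, keys, value) {
  if (keys.length === 0) return value;
  const [head, ...rest] = keys;
  return { ...obj, [head]: setPath(obj[head], rest, value) };
}

const doc = {
  a: 1,
  b: [
    { c: 1, d: [{ e: 1 }, { e: 2 }] },
    { c: 2, d: [{ e: 1 }, { e: 2 }] },
  ],
};

// Equivalent of [{ "$unwind": "$b" }, { "$unwind": "$b.d" }]
const rows = unwind(unwind([doc], "b"), "b.d");
rows.forEach((row) => console.log(JSON.stringify(row)));
```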
Solving
With just about "every" aggregation pipeline, the "first" thing you want to do is "filter" the documents down to only those that contain your results. This is especially important before operations such as $unwind, since you don't want to do that on documents that do not even match your target data.
So you need to match your "user_id" at the array depth. But this is only part of getting the result, since you should be aware of what happens when you query a document for a matching value in an array.
Of course, the "whole" document is still returned, because this is what you really asked for. The data is already "joined" and we haven't asked to "un-join" it in any way. You look at this just as a "first" document selection does, but once "de-normalized", every array element actually represents a "document" in itself.
So not "only" do you $match at the beginning of the "pipeline", you also $match after you have processed "all" $unwind statements, down to the level of the element you wish to match.
Job.aggregate(
    [
        // given_user_id must be an ObjectId here: Mongoose does not cast
        // aggregate() pipelines to the schema types
        // Match to filter possible "documents"
        { "$match": {
            "all_service.all_sub_item.owner.user_id": given_user_id
        }},
        // De-normalize arrays
        { "$unwind": "$all_service" },
        { "$unwind": "$all_service.all_sub_item" },
        // Match again to filter the array elements
        { "$match": {
            "all_service.all_sub_item.owner.user_id": given_user_id
        }},
        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_service.all_sub_item.price" }
        }}
    ],
    function(err, results) {
        // handle err / use results here
    }
)
Alternatively, MongoDB releases since 2.6 also support the $redact stage. It could be used in this case to "pre-filter" the array content before processing with $unwind:
Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": {
            "all_service.all_sub_item.owner.user_id": given_user_id
        }},
        // Filter arrays for matches in document
        { "$redact": {
            "$cond": {
                "if": {
                    "$eq": [
                        { "$ifNull": [ "$owner.user_id", given_user_id ] },
                        given_user_id
                    ]
                },
                "then": "$$DESCEND",
                "else": "$$PRUNE"
            }
        }},
        // De-normalize arrays
        { "$unwind": "$all_service" },
        { "$unwind": "$all_service.all_sub_item" },
        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_service.all_sub_item.price" }
        }}
    ],
    function(err, results) {
        // handle err / use results here
    }
)
That can "recursively" traverse the document and test for the condition, effectively removing any "un-matched" array elements before you even $unwind. This can speed things up a bit, since items that do not match do not need to be "un-wound". However, there is a "catch": if for some reason the "owner" did not exist on an array element at all, the $ifNull logic required here would count that element as another "match". You can always $match again to be sure, but there is still a more efficient way to do this:
Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": {
            "all_service.all_sub_item.owner.user_id": given_user_id
        }},
        // Filter arrays for matches in document
        { "$project": {
            "all_items": {
                "$setDifference": [
                    { "$map": {
                        "input": "$all_service",
                        "as": "A",
                        "in": {
                            "$setDifference": [
                                { "$map": {
                                    "input": "$$A.all_sub_item",
                                    "as": "B",
                                    "in": {
                                        "$cond": {
                                            "if": { "$eq": [ "$$B.owner.user_id", given_user_id ] },
                                            "then": "$$B",
                                            "else": false
                                        }
                                    }
                                }},
                                // both $setDifference arguments must be arrays
                                [false]
                            ]
                        }
                    }},
                    [[]]
                ]
            }
        }},
        // De-normalize the "two" level array. "Double" $unwind
        { "$unwind": "$all_items" },
        { "$unwind": "$all_items" },
        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_items.price" }
        }}
    ],
    function(err, results) {
        // handle err / use results here
    }
)
That process cuts down the size of the items in both arrays "drastically" compared to $redact. The $map operator processes each element of an array with the expression given in "in". In this case, each "outer" array element is sent to another $map to process the "inner" elements.
A logical test is performed here with $cond: if the "condition" is met, the "inner" array element is returned; otherwise the false value is returned.
The $setDifference is used to filter down any false values that are returned. Or as in the "outer" case, any "blank" arrays resulting from all false values being filtered from the "inner" where there is no match there. This leaves just the matching items, encased in a "double" array, e.g:
[[{ "_id": 1, "price": 1, "owner": "b" },{..}],[{..},{..}]]
As "all" array elements have an _id by default with mongoose (and this is a good reason why you keep that) then every item is "distinct" and not affected by the "set" operator, apart from removing the un-matched values.
Process $unwind "twice" to convert these into plain objects in their own documents, suitable for aggregation.
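The filtering in the $project stage and the double $unwind can be mirrored in plain JavaScript on a single hypothetical Job document. This is an illustrative sketch with made-up data and hypothetical function names (filterNested, totalPrice); unlike $setDifference it does not de-duplicate, which matches the assumption above that every sub-item has a distinct _id.

```javascript
// A hypothetical Job document; owner ids are plain strings for the illustration.
const job = {
  _id: 1,
  all_service: [
    { price: 50, all_sub_item: [
      { _id: "i1", price: 10, owner: { user_id: "u1" } },
      { _id: "i2", price: 20, owner: { user_id: "u2" } },
    ] },
    { price: 70, all_sub_item: [
      { _id: "i3", price: 5, owner: { user_id: "u2" } },
    ] },
  ],
};

// Emulates the $project stage: the inner $map/$cond turns non-matching sub-items
// into false and the inner $setDifference drops them; the outer $setDifference
// drops inner arrays left empty.
function filterNested(doc, userId) {
  return doc.all_service
    .map((service) =>
      service.all_sub_item
        .map((item) => (item.owner.user_id === userId ? item : false))
        .filter((item) => item !== false))
    .filter((items) => items.length > 0);
}

// The "double" $unwind flattens the nested arrays; $group then sums the prices.
function totalPrice(nested) {
  return nested.flat().reduce((total, item) => total + item.price, 0);
}

const forU1 = filterNested(job, "u1");
console.log(JSON.stringify(forU1.map((items) => items.map((i) => i._id)))); // [["i1"]]
console.log(totalPrice(forU1));                    // 10
console.log(totalPrice(filterNested(job, "u2"))); // 25
```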
So those are the things you need to know. As I stated earlier, be "aware" of how the data "de-normalizes" and what that implies towards your end totals.

It sounds like you want to, in SQL equivalent, do "sum (prices) WHERE owner IS NOT NULL".
On that assumption, you'll want to do your $match first, to reduce the input set to your sum. So your first stage should be something like
$match: { "all_service.all_sub_item.owner": { $exists: true } }
Think of this as then passing all matching documents to your second stage.
Now, because you are summing an array, you have to do another step. Aggregation operators work on documents - there isn't really a way to sum an array. So we want to expand your array so that each element in the array gets pulled out to represent the array field as a value, in its own document. Think of this as a cross join. This will be $unwind.
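$unwind: "$all_service"
$unwind: "$all_service.all_sub_item"
(Because all_sub_item is nested inside all_service, both array levels have to be unwound, outer one first.)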
Now you've just made a much larger number of documents, but in a form where we can sum them. Now we can perform the $group. In your $group, you specify a transformation. The line:
_id: {}, // not sure what to put here
is creating a field in the output document, and the output documents are not the same as the input documents. So you can make the _id here anything you'd like; think of it as the equivalent of your "GROUP BY" in SQL. The $sum operator will essentially create a sum for each group of documents that share that _id, so we'll be "re-collapsing" what we just did with $unwind by using the $group. But this will allow $sum to work.
I think you're looking for grouping on just your main document id, so I think your $sum statement in your question is correct.
$group: { _id: "$_id", totalAmount: { $sum: "$all_service.all_sub_item.price" } }
This will output documents with an _id field equivalent to your original document ID, and your sum.
I'll let you put it together, I'm not super familiar with node. You were close but I think moving your $match to the front and using an $unwind stage will get you where you need to be. Good luck!
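Putting the pieces together could look like the following sketch. The function name buildOwnerTotalsPipeline is hypothetical, and the second $match after unwinding is an addition (not in the stages above) so that individual sub-items without an owner are not summed.

```javascript
// Assembles the stages described above into one pipeline array.
// Field names follow the question's schema; the second $match is an addition
// so that sub-items without an owner are not summed after unwinding.
function buildOwnerTotalsPipeline() {
  return [
    { $match: { "all_service.all_sub_item.owner": { $exists: true } } },
    { $unwind: "$all_service" },
    { $unwind: "$all_service.all_sub_item" },
    { $match: { "all_service.all_sub_item.owner": { $exists: true } } },
    {
      $group: {
        _id: "$_id", // one total per Job document; use null for a single grand total
        totalAmount: { $sum: "$all_service.all_sub_item.price" },
      },
    },
  ];
}

const pipeline = buildOwnerTotalsPipeline();
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
// With Mongoose this could then be run as, e.g.: Job.aggregate(pipeline).exec()
```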

Mongo query getting totals

If I had a schema that looked something like this:
var person = new Schema({
active: {type: Boolean},
otherSetting: {type: Boolean}
});
Would it be possible with just one query to get the total count of all people, the total of active people, the total of inactive people, and the counts of people with otherSetting set to true and set to false? Would otherSetting and active have to be broken into two queries?
I've been playing around with the aggregate framework on this problem and although this seems like a simple problem, I can't seem to do it with just one query.
Is it even possible? Thanks for any help.

The aggregation framework has logical operators such as $cond that work well with your boolean condition here:
db.collection.aggregate([
{ "$group": {
"_id": null,
"active": { "$sum": { "$cond": [ "$active", 1, 0 ] } },
"inActive": { "$sum": { "$cond": [ "$active", 0, 1 ] } },
"total": { "$sum": 1 }
}}
])
The $cond operator is a "ternary" operator ( if/then/else ) that allows the evaluation of a logical condition to return the true ( then ) or false ( else ) values.
The "boolean" is evaluated as true/false in the first argument to $cond which passes the appropriate value to $sum in order to get the conditional totals.
Everything works within a single $group pipeline stage with a grouping key _id of null since you want to add up the whole collection. If grouping on the value of another field then replace that null with the field you want.
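The question also asks for the otherSetting counts; the same pattern extends to them within the same single $group stage. Below is a sketch of that extended stage (the field names otherSettingTrue and otherSettingFalse are hypothetical) plus a plain-JavaScript emulation on made-up data to show what the counts would be.

```javascript
// The $group stage from the answer, extended with otherSetting counts
// (the two extra output field names are hypothetical).
// Would be passed to db.collection.aggregate(pipeline).
const pipeline = [
  {
    $group: {
      _id: null,
      active: { $sum: { $cond: ["$active", 1, 0] } },
      inActive: { $sum: { $cond: ["$active", 0, 1] } },
      otherSettingTrue: { $sum: { $cond: ["$otherSetting", 1, 0] } },
      otherSettingFalse: { $sum: { $cond: ["$otherSetting", 0, 1] } },
      total: { $sum: 1 },
    },
  },
];

// Plain-JavaScript emulation of that single $group stage over in-memory documents.
// As with $cond, a missing field counts as false.
function countFlags(people) {
  const counts = { _id: null, active: 0, inActive: 0, otherSettingTrue: 0, otherSettingFalse: 0, total: 0 };
  for (const person of people) {
    if (person.active) counts.active += 1; else counts.inActive += 1;
    if (person.otherSetting) counts.otherSettingTrue += 1; else counts.otherSettingFalse += 1;
    counts.total += 1;
  }
  return counts;
}

console.log(countFlags([
  { active: true, otherSetting: false },
  { active: false, otherSetting: true },
  { active: true, otherSetting: true },
]));
```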

Mongodb aggregation in deep

I have a schema like the one below. I'm using the official MongoDB Node.js driver.
Could I use the aggregation pipeline framework to group by someprop.subprop.title?
I can do it with map/reduce, but aggregation is much faster than map/reduce. I couldn't find any example that goes deep into objects when grouping.
{
id_ : ObjectID("234bv123"),
username : "ugurozpinar",
someprop : {
subprop : [
{title:"Movies",count:5},
{title:"Sport",count:10}
]
}
},
{
id_ : ObjectID("234bv123"),
username : "otheruser",
someprop : {
subprop : [
{title:"Movies",count:9},
{title:"Theatre",count:8}
]
}
}
expected result
[
{id_:"Movies",total:14},
{id_:"Theatre",total:8},
{id_:"Sport",total:10}
]

Yes, you can use dot notation to reach inside of objects and use $unwind with $group to get total counts by title:
db.test.aggregate([
{$unwind: '$someprop.subprop'},
{$group: {
_id: '$someprop.subprop.title',
count: {$sum: '$someprop.subprop.count'}
}}
])
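For comparison, here is a plain-JavaScript emulation of that $unwind + $group on the question's sample data (without the invalid ObjectIDs). It is only an illustration; totalsByTitle is a hypothetical name, and note that MongoDB does not guarantee the order of $group output, whereas this sketch returns titles in first-seen order.

```javascript
// The question's sample documents, without the _id fields.
const users = [
  { username: "ugurozpinar", someprop: { subprop: [
    { title: "Movies", count: 5 },
    { title: "Sport", count: 10 },
  ] } },
  { username: "otheruser", someprop: { subprop: [
    { title: "Movies", count: 9 },
    { title: "Theatre", count: 8 },
  ] } },
];

function totalsByTitle(docs) {
  const totals = new Map();
  for (const doc of docs) {
    for (const sub of doc.someprop.subprop) { // $unwind: "$someprop.subprop"
      // $group on title with $sum of count
      totals.set(sub.title, (totals.get(sub.title) ?? 0) + sub.count);
    }
  }
  return [...totals].map(([title, count]) => ({ _id: title, count }));
}

console.log(totalsByTitle(users));
```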
