Combine Different Grouping Totals in Aggregate Output


Now that I've had a weekend of banging my head on $project, aggregate(), and $group, it's time for another round of throwing myself on your mercy. I'm trying to do a call where I get back the totals for users, grouped by sex (this was the easier part) and grouped by age range (this is defeating me).
I got it to work with one group:
Person.aggregate([
{
$match: {
user_id: id
}
},
{
$group: {
_id: '$gender',
total: { $sum: 1 }
}
}
])
.exec(function(err, result) {
etc...
From that, it'll give me how many men, how many women in a nice json output. But if I add a second group, it seems to skip the first and throw hissy fits about the second:
Person.aggregate([
{
$match: {
user_id: id
}
},
{
$group: {
_id: '$gender',
total: { $sum: 1 }
},
$group: {
_id: '$age',
age: { $gte: 21 },
age: { $lte: 30 },
total: { $sum: 1 }
}
}
])
.exec(function(err, result) {
etc...
It doesn't like the $gte or $lte. If I switch it to $project, then it'll do the gte/lte but throws fits about $sum or $count. On top of that, I can't find any examples anywhere of how to construct a multi-request return. It's all just "here's this one thing," but I don't want to make 12+ calls just to get all the Person age-groups. I was hoping for output that looks something like this:
[
{"_id":"male","total":49},
{"_id":"woman","total":42},
{"_id":"age0_10", "total": 1},
{"_id":"age11_20", "total": 5},
{"_id":"age21_30", "total": 15}
]
(I have no idea how to make the _id for age be something other than the actual age, which doesn't make sense, b/c I don't want an id of 1517191919 or whatever, I want a reliable name so I know where to output it in my template. So I do know that _id: "$age" won't give me what I want, but I don't know how to get what I want, either.)
The only time I've seen more than one thing, it was a $match, a $group, and a $project. But if $project means I can't use $sum or $count, can I do multiple $groups, and if I can, what's the trick to it?

As for the case of producing the results in different age groupings, the $cond operator of the aggregation framework can help here. As a ternary operator, it takes a logical result ( if condition ) and can return a value where true ( then ) or otherwise where false ( else ). In the case of varying age groups you would "nest" the calls in the else condition to meet each range until logically exhausted.
Producing both the "gender" and the "age" groupings in a single pass is not really practical. Whilst it "could" be done, the only method is basically to accumulate all the data in arrays and work through it again for the subsequent groupings. That is not a great idea, as keeping the data around like that can easily break the BSON document limit of 16MB. So a better approach is generally required.
As such, where the API supports it ( you are under nodejs, so it does ), it is usually best to run each query separately and combine the results. The node async library has just such a feature:
async.concat(
[
// Gender aggregator
[
{ "$group": {
"_id": "$gender",
"total": { "$sum": 1 }
}}
],
// Age aggregator
[
{ "$group": {
"_id": {
"$cond": {
"if": { "$lte": [ "$age", 10 ] },
"then": "age_0_10",
"else": {
"$cond": {
"if": { "$lte": [ "$age", 20 ] },
"then": "age_11_20",
"else": {
"$cond": {
"if": { "$lte": [ "$age", 30 ] },
"then": "age_21_30",
"else": "age_over_30"
}
}
}
}
}
},
"total": { "$sum": 1 }
}}
]
],
function(pipeline,callback) {
Person.aggregate(pipeline,callback);
},
function(err,results) {
if (err) throw err;
console.log(results);
}
);
By default, async.concat kicks off the tasks in parallel, so both pipelines can be running on the server at the same time. Each pipeline in the input array is passed to the aggregate method, and the output arrays are concatenated into the final result.
The end result is that not only are the results nicely keyed to age groups, but the two result sets also arrive in one combined response, with no other work required to merge the content.
This is not only convenient: the parallel execution is also more time efficient and far less taxing on the server than trying to force everything through a single aggregation pipeline.
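If there are many age ranges (the question mentions 12+), writing the nested $cond by hand gets tedious. Here is a minimal sketch of building the same expression from a list of upper bounds. The helper name buildAgeBucketExpr and the bucket labels are my own assumptions, not anything provided by MongoDB or mongoose, and it assumes a non-empty, ascending list of bounds.

```javascript
// Hypothetical helper: builds the nested $cond expression for age buckets.
// `bounds` is a non-empty, ascending list of inclusive upper limits.
function buildAgeBucketExpr(bounds, field = "$age") {
  // Start from the innermost "else" (the open-ended bucket) and wrap outwards.
  let expr = `age_over_${bounds[bounds.length - 1]}`;
  for (let i = bounds.length - 1; i >= 0; i--) {
    const lower = i === 0 ? 0 : bounds[i - 1] + 1;
    expr = {
      $cond: {
        if: { $lte: [field, bounds[i]] },
        then: `age_${lower}_${bounds[i]}`,
        else: expr,
      },
    };
  }
  return expr;
}

// The resulting pipeline has the same shape as the hand-written "Age aggregator".
const agePipeline = [
  { $group: { _id: buildAgeBucketExpr([10, 20, 30]), total: { $sum: 1 } } },
];
console.log(JSON.stringify(agePipeline, null, 2));
```

The generated pipeline can be dropped into the input array of async.concat in place of the literal one.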

Related

get result value as 0 if no data found using nodejs and mongodb

I have a collection of students with different marks. I want to count how many students got each of certain marks (100, 90, 80, 70, 60, etc.), and if no student has a specific mark, I want that mark to appear in the output with a count of 0.
Eg:
,
{ $group: { "_id": studentMarks, "studentMarksTotal": { $sum: 1 } } },
{ $sort: { _id: 1 } },
{
$project: {
_id: 0,
studentMarksTotal: 1,
studentMarks:{
$cond: { if: { $eq: ["$_id", "$studentMarks"] }, then: "$_id", else: "0" },
,
},
}}
]
this gives the result for values that have count
Eg: studentMarks:70,
studentMarksTotal:9
studentMarks:90,
studentMarksTotal:6
but I also want count data if no student got certain marks
eg:studentMarks:70,
studentMarksTotal:9
studentMarks:90,
studentMarksTotal:6
studentMarks:100,
studentMarksTotal:0
How can I achieve this? I have tried $ifNull and if/else as well, but I am unable to get the desired result.
If anyone could help me with this, Thanks in advance.

mongodb - find every last document from a property-type

I have a document-scheme like this:
{
"pair":"BTCUSDT",
"ask":{
"amount":33107101.800000004,
"total":507,
"high":72000,
"low":65132
},
"bid":{
"amount":32368164.399999995,
"total":498,
"high":65131.99,
"low":60200.2
},
"updateStamp":1636632371639
}
Now in my DB there are documents with different values in pair and also documents with the same value. Some of them have an updateStamp that is, let's say, a few seconds old, and some have an updateStamp that is a few minutes old or older.
(I wrote simpler values in updateStamp for simplicity)
{
"pair":"BTCUSDT",
"ask": ...,
"updateStamp": 100
},
{
"pair":"BTCUSDT",
"ask": ...,
"updateStamp": 200
},
{
"pair":"ETHUDST",
"ask": ...,
"updateStamp": 500
},
{
"pair":"ETHUDST",
"ask": ...,
"updateStamp": 200
},
{
"pair":"DOGEUSDT",
"ask": ...,
"updateStamp": 600
},
Now I want to take the latest document of every pair and, from those, find the 10 documents with the largest ask.total value. Simply said, a Top-10 from the latest of every pair.
But I don't get how to manage this. I have been fiddling around with aggregation and multiple finds for a while now. Maybe someone knows how to solve this?

You can first $group by pair, then use the result in a $lookup with a sub-pipeline to fetch the latest 10 documents for each pair.
In the sub-pipeline:
$match on pair: let assigns the grouped pair value to the variable p, which is later referred to as $$p. The $match keeps only documents whose pair equals $$p, so each lookup returns only the records for that specific pair
$sort by updateStamp, newest first
$limit to 10
db.collection.aggregate([
{
$group: {
_id: "$pair"
}
},
{
"$lookup": {
"from": "collection",
let: {
p: "$_id"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$pair",
"$$p"
]
}
}
},
{
$sort: {
updateStamp: -1
}
},
{
$limit: 10
}
],
"as": "output array field"
}
}
])
Here is the Mongo playground for your reference.
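As an alternative reading of the question (the newest document per pair first, then the 10 of those with the largest ask.total), here is a rough sketch. The function names latestTopPipeline and latestTop are made up for illustration. The first returns a pipeline you could pass to aggregate(), the second is a plain in-memory equivalent for checking what the pipeline is expected to return. It assumes every document has pair, updateStamp and ask.total.

```javascript
// Sketch of a pipeline: newest document per pair, then top N by ask.total.
function latestTopPipeline(n = 10) {
  return [
    { $sort: { updateStamp: -1 } }, // newest first
    { $group: { _id: "$pair", latest: { $first: "$$ROOT" } } }, // newest per pair
    { $replaceRoot: { newRoot: "$latest" } }, // back to the original shape
    { $sort: { "ask.total": -1 } }, // largest ask.total first
    { $limit: n },
  ];
}

// In-memory equivalent, handy for checking expectations without a database.
function latestTop(docs, n = 10) {
  const latest = new Map();
  for (const d of docs) {
    const seen = latest.get(d.pair);
    if (!seen || d.updateStamp > seen.updateStamp) latest.set(d.pair, d);
  }
  return [...latest.values()]
    .sort((a, b) => b.ask.total - a.ask.total)
    .slice(0, n);
}

// Made-up sample data in the shape from the question.
const sample = [
  { pair: "BTCUSDT", ask: { total: 507 }, updateStamp: 100 },
  { pair: "BTCUSDT", ask: { total: 300 }, updateStamp: 200 },
  { pair: "DOGEUSDT", ask: { total: 900 }, updateStamp: 600 },
];
console.log(latestTop(sample, 10));
```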

need help writing aggregated query with grouping multiple fields

I am new to using mongodb and mongoose for my backend stack and I'm having a hard time getting from SQL to NoSQL when it comes to query building.
I have an array of objects that look like this:
{
timestamp: "12313113",
symbol: "XY",
amount: 121212,
value: 24324234
}
I want to query the collection to get the following output grouped by symbol:
{
symbol: xy,
occurences: 1231
summedAmount: 2131231
summedValue: 23131313
}
Could anyone tell me how to do it using aggregate on the Model? My timestamp filtering works already, but the grouping throws errors
let result = await TransactionEvent.aggregate([
{
$match : {
timestamp : { $gte: new Date(Date.now() - INTERVALS[timeframe]) }
}
},
{
$group : {
// what to do in here?
}
}
]);
Let's say I have another field in my object with a key of "direction" that can be either "IN" or "OUT". How could I also count the occurrences of these values per group?
Expected output
{
symbol: xy,
occurences: 1231
summedAmount: 2131231
summedValue: 23131313
in: occurrences where direction property is "IN"
out: occurences where direction property is "OUT"
}

In MongoDB's $group stage, the _id key is mandatory and
it should be the key (or keys) you want to group by (it's symbol in your case).
Make sure that you prefix it with a $ sign, since you are referencing a key in your document.
Following the _id key, you can add all the additional operations to be performed for the required keys. In your specific use case, use $sum to add values to the user-defined key.
Note: Use "$sum": 1 to add 1 for each occurrence and "$sum": "$<Key-Name>" to add an existing key's value.
The code below should be your $group stage:
{
"$group": {
"_id": "$symbol", // Group by key (use a sub-object to group by multiple keys)
"occurences": {"$sum": 1}, // Add `1` for each occurrence
"summedAmount": {"$sum": "$amount"}, // Add `amount` values of grouped data
"summedValue": {"$sum": "$value"} // Add `value` values of grouped data
}
}
Comment if you have any additional doubts.

You use $group and $sum
db.collection.aggregate([
{
"$group": {
"_id": "$symbol",
"summedAmount": {
"$sum": "$amount"
},
"summedValue": {
"$sum": "$value"
},
"occurences": {
"$sum": 1
}
}
}
])
Working Mongo playground
Update 1
You can use $cond to check a condition. In its array form it takes three parameters:
First parameter - the condition to test
Second parameter - the value to use if the condition is true (here 1, to increase the count)
Third parameter - the value to use if the condition is false (here 0, so nothing is added)
Here is the code
db.collection.aggregate([
{
"$group": {
"_id": "$symbol",
"summedAmount": { "$sum": "$amount" },
"summedValue": { "$sum": "$value" },
"occurences": { "$sum": 1 },
// Count 1 for each document whose direction matches, otherwise 0
"in": {
"$sum": {
"$cond": [ { "$eq": [ "$direction", "IN" ] }, 1, 0 ]
}
},
"out": {
"$sum": {
"$cond": [ { "$eq": [ "$direction", "OUT" ] }, 1, 0 ]
}
}
}
}
])
Working Mongo playground
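If more direction values show up later, the conditional counters can be generated instead of typed out. A small sketch, assuming the values are stored in upper case as in the question. The helper names countWhere and directionCounters are hypothetical and not part of any driver API.

```javascript
// Hypothetical helper: a $sum that adds 1 when `field` equals `value`, else 0.
function countWhere(field, value) {
  return { $sum: { $cond: [{ $eq: [field, value] }, 1, 0] } };
}

// One counter per direction value, keyed by the lower-cased value.
function directionCounters(values) {
  const out = {};
  for (const v of values) out[v.toLowerCase()] = countWhere("$direction", v);
  return out;
}

const groupStage = {
  $group: {
    _id: "$symbol",
    occurences: { $sum: 1 },
    summedAmount: { $sum: "$amount" },
    summedValue: { $sum: "$value" },
    ...directionCounters(["IN", "OUT"]),
  },
};
console.log(JSON.stringify(groupStage, null, 2));
```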

MongoDB aggregation $group stage by already created values / variable from outside

Imagine I have an array of objects, available before the aggregate query:
const groupBy = [
{
realm: 1,
latest_timestamp: 1318874398, //Date.now() values, usually different to each other
item_id: 1234, //always the same
},
{
realm: 2,
latest_timestamp: 1312467986, //actually it's $max timestamp field from the collection
item_id: 1234,
},
{
realm: ..., //there are many of them
latest_timestamp: ...,
item_id: 1234,
},
{
realm: 10,
latest_timestamp: 1318874398, //but sometimes then can be the same
item_id: 1234,
},
]
And collection (example set available on MongoPlayground) with the following schema:
{
realm: Number,
timestamp: Number,
item_id: Number,
field: Number, //any other useless fields in this case
}
My problem is, how to $group the values from the collection via the aggregation framework by using the already available set of data (from groupBy) ?
What have been tried already.
Okay, let skip crap ideas, like:
for (const element of groupBy) {
//array of `find` queries
}
My current working aggregation query is something like that:
//first stage
{
$match: {
"item_id": 1234,
"realm": { $in: [1,2,3,4...,10] }
}
},
{
$group: {
_id: {
realm: '$realm',
},
latest_timestamp: {
$max: '$timestamp',
},
data: {
$push: '$$ROOT',
},
},
},
{
$unwind: '$data',
},
{
$addFields: {
'data.latest_timestamp': {
$cond: {
if: {
$eq: ['$data.timestamp', '$latest_timestamp'],
},
then: '$latest_timestamp',
else: '$$REMOVE',
},
},
},
},
{
$replaceRoot: {
newRoot: '$data',
},
},
//At last, after this stages I can do useful job
but I find it a bit cumbersome, and I have already heard that using mapReduce could solve my problem a bit faster than this query. (But the official docs don't sound promising about it.) Is that true?
As for now, I am using 4 or 5 stages, before start working with useful (for me) documents.
Recent update:
I have checked the $facet stage and I found it curious for this certain case. Probably it will help me out.
For what it's worth:
After receiving documents after the necessary stages I am building a representative cluster chart, that you may also know as a heatmap
After that I was iterating over each document (or array of objects) one by one to find its correct x and y coordinates, which should be:
[
{
x: x (number, actual $price),
y: y (number, actual $realm),
value: price * quantity,
quantity: sum_of_quantity_on_price_level
}
]
As for now, it's old awful code with for...loop inside each other, but in the future, I will be using $facet => $bucket operators for that kind of job.

So, I have found an answer to my question in another, but related, way.
I was thinking about using the $facet operator and, to be honest, it's still an option, but using it as below is bad practice.
//building $facet query before aggregation
const ObjectQuery = {}
for (const realm of realms) {
Object.assign(ObjectQuery, { [realm.name]: [ ... ] })
}
//mongoose query here
aggregation([{
$facet: ObjectQuery
},
...
])
So, I have chosen a $project stage with the $switch operator to filter the results, much as $group would.
Using MapReduce could also solve this problem, but the official Mongo docs recommend avoiding it and using aggregation operators such as $group and $merge instead.
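The answer does not show the actual $project / $switch stages, so what follows is only one possible reading of it, not the author's pipeline. It builds one $switch branch per entry of the groupBy array and flags the documents whose timestamp equals the known latest_timestamp for their realm. The helper name buildLatestSwitch and the is_latest field are assumptions for illustration, and it assumes groupBy is non-empty.

```javascript
// Hypothetical helper: one $switch branch per known (realm, latest_timestamp).
function buildLatestSwitch(groupBy) {
  return {
    $switch: {
      branches: groupBy.map((g) => ({
        case: {
          $and: [
            { $eq: ["$realm", g.realm] },
            { $eq: ["$timestamp", g.latest_timestamp] },
          ],
        },
        then: true,
      })),
      default: false, // any document that is not the latest for its realm
    },
  };
}

// Shortened version of the groupBy array from the question.
const groupBy = [
  { realm: 1, latest_timestamp: 1318874398, item_id: 1234 },
  { realm: 2, latest_timestamp: 1312467986, item_id: 1234 },
];

const stages = [
  { $match: { item_id: 1234, realm: { $in: groupBy.map((g) => g.realm) } } },
  {
    $project: {
      realm: 1,
      timestamp: 1,
      item_id: 1,
      is_latest: buildLatestSwitch(groupBy),
    },
  },
  { $match: { is_latest: true } }, // keep only the flagged documents
];
console.log(JSON.stringify(stages, null, 2));
```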

Mongoose aggregation "$sum" of rows in sub document

I'm fairly good with sql queries, but I can't seem to get my head around grouping and getting sum of mongo db documents,
With this in mind, I have a job model with schema like below :
{
name: {
type: String,
required: true
},
info: String,
active: {
type: Boolean,
default: true
},
all_service: [{
price: {
type: Number,
min: 0,
required: true
},
all_sub_item: [{
name: String,
price:{ // << -- this is the price I want to calculate
type: Number,
min: 0
},
owner: {
user_id: { // <<-- here is the filter I want to put
type: Schema.Types.ObjectId,
required: true
},
name: String,
...
}
}]
}],
date_create: {
type: Date,
default : Date.now
},
date_update: {
type: Date,
default : Date.now
}
}
I would like to have a sum of price column, where owner is present, I tried below but no luck
Job.aggregate(
[
{
$group: {
_id: {}, // not sure what to put here
amount: { $sum: '$all_service.all_sub_item.price' }
},
$match: {'not sure how to limit the user': given_user_id}
}
],
//{ $project: { _id: 1, expense: 1 }}, // you can only project fields from 'group'
function(err, summary) {
console.log(err);
console.log(summary);
}
);
Could someone guide me in the right direction. thank you in advance

Primer
As is correctly noted earlier, it does help to think of an aggregation "pipeline" just as the "pipe" | operator from Unix and other system shells. One "stage" feeds input to the "next" stage and so on.
The thing you need to be careful with here is that you have "nested" arrays, one array within another, and this can make drastic differences to your expected results if you are not careful.
Your documents consist of an "all_service" array at the top level. Presumably there are often "multiple" entries here, all containing your "price" property as well as "all_sub_item". Then of course "all_sub_item" is an array in itself, also containing many items of its own.
You can think of these arrays as the "relations" between your tables in SQL, in each case a "one-to-many". But the data is in a "pre-joined" form, where you can fetch all data at once without performing joins. That much you should already be familiar with.
However, when you want to "aggregate" across documents, you need to "de-normalize" this in much the same way as in SQL by "defining" the "joins". This is to "transform" the data into a de-normalized state that is suitable for aggregation.
So the same visualization applies. A master document's entries are replicated by the number of child documents, and a "join" to an "inner-child" will replicate both the master and initial "child" accordingly. In a "nutshell", this:
{
"a": 1,
"b": [
{
"c": 1,
"d": [
{ "e": 1 }, { "e": 2 }
]
},
{
"c": 2,
"d": [
{ "e": 1 }, { "e": 2 }
]
}
]
}
Becomes this:
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 2 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 2 } } }
And the operation to do this is $unwind, and since there are multiple arrays then you need to $unwind both of them before continuing any processing:
db.collection.aggregate([
{ "$unwind": "$b" },
{ "$unwind": "$b.d" }
])
So the "pipe" first unwinds the array from "$b", like so:
{ "a" : 1, "b" : { "c" : 1, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
{ "a" : 1, "b" : { "c" : 2, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
Which leaves a second array, referenced by "$b.d", to be further de-normalized into the final result "without any arrays". This allows other operations to process it.
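To see the "de-normalizing" without a database, here is a small in-memory imitation of $unwind in plain JavaScript. It is a simplified sketch and not how the server implements it: the function names unwind and setPath are my own, and unlike the real stage it simply skips documents where the path is not an array.

```javascript
// Returns a copy of `obj` with the value at `keys` replaced by `value`.
function setPath(obj, keys, value) {
  const [head, ...rest] = keys;
  return {
    ...obj,
    [head]: rest.length ? setPath(obj[head], rest, value) : value,
  };
}

// Simplified in-memory $unwind: one output document per array element.
// Documents where the path is not an array are skipped (a simplification);
// empty arrays produce no output.
function unwind(docs, path) {
  const keys = path.replace(/^\$/, "").split(".");
  const out = [];
  for (const doc of docs) {
    const value = keys.reduce((v, k) => (v == null ? undefined : v[k]), doc);
    if (!Array.isArray(value)) continue;
    for (const element of value) out.push(setPath(doc, keys, element));
  }
  return out;
}

const nested = {
  a: 1,
  b: [
    { c: 1, d: [{ e: 1 }, { e: 2 }] },
    { c: 2, d: [{ e: 1 }, { e: 2 }] },
  ],
};

// Same order as the pipeline above: "$b" first, then "$b.d".
const flat = unwind(unwind([nested], "$b"), "$b.d");
flat.forEach((d) => console.log(JSON.stringify(d)));
```

Running only the first unwind shows the intermediate state, where "d" is still an array inside each copy.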
Solving
With just about "every" aggregation pipeline, the "first" thing you want to do is "filter" the documents to only those that contain your results. This is a good idea, as especially when doing operations such as $unwind, then you don't want to be doing that on documents that do not even match your target data.
So you need to match your "user_id" at the array depth. But this is only part of getting the result, since you should be aware of what happens when you query a document for a matching value in an array.
Of course, the "whole" document is still returned, because this is what you really asked for. The data is already "joined" and we haven't asked to "un-join" it in any way. You can look at this as a "first" document selection, but once "de-normalized", every array element actually represents a "document" in itself.
So not "only" do you $match at the beginning of the "pipeline", you also $match after you have processed "all" $unwind statements, down to the level of the element you wish to match.
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner.user_id": given_user_id
}},
// De-normalize arrays
{ "$unwind": "$all_service" },
{ "$unwind": "$all_service.all_sub_item" },
// Match again to filter the array elements
{ "$match": {
"all_service.all_sub_item.owner.user_id": given_user_id
}},
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_service.all_sub_item.price" }
}}
],
function(err,results) {
}
)
Alternately, modern MongoDB releases since 2.6 also support the $redact operator. This could be used in this case to "pre-filter" the array content before processing with $unwind:
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner.user_id": given_user_id
}},
// Filter arrays for matches in document
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$ifNull": [ "$owner.user_id", given_user_id ] },
given_user_id
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
// De-normalize arrays
{ "$unwind": "$all_service" },
{ "$unwind": "$all_service.all_sub_item" },
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_service.all_sub_item.price" }
}}
],
function(err,results) {
}
)
That can "recursively" traverse the document and test for the condition, effectively removing any "un-matched" array elements before you even $unwind. This can speed things up a bit since items that do not match would not need to be "un-wound". However there is a "catch" in that if for some reason the "owner" did not exist on an array element at all, then the logic required here would count that as another "match". You can always $match again to be sure, but there is still a more efficient way to do this:
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner.user_id": given_user_id
}},
// Filter arrays for matches in document
{ "$project": {
"all_items": {
"$setDifference": [
{ "$map": {
"input": "$all_service",
"as": "A",
"in": {
"$setDifference": [
{ "$map": {
"input": "$$A.all_sub_item",
"as": "B",
"in": {
"$cond": {
"if": { "$eq": [ "$$B.owner.user_id", given_user_id ] },
"then": "$$B",
"else": false
}
}
}},
[false]
]
}
}},
[[]]
]
}
}},
// De-normalize the "two" level array. "Double" $unwind
{ "$unwind": "$all_items" },
{ "$unwind": "$all_items" },
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_items.price" }
}}
],
function(err,results) {
}
)
That process cuts down the size of the items in both arrays "drastically" compared to $redact. The $map operator processes each element of an array with the given statement within "in". In this case, each "outer" array element is sent to another $map to process the "inner" elements.
A logical test is performed here with $cond whereby if the "condition" is met then the "inner" array element is returned, otherwise the false value is returned.
The $setDifference is used to filter down any false values that are returned. Or as in the "outer" case, any "blank" arrays resulting from all false values being filtered from the "inner" where there is no match there. This leaves just the matching items, encased in a "double" array, e.g:
[[{ "_id": 1, "price": 1, "owner": "b" },{..}],[{..},{..}]]
As "all" array elements have an _id by default with mongoose (and this is a good reason why you keep that) then every item is "distinct" and not affected by the "set" operator, apart from removing the un-matched values.
Process $unwind "twice" to convert these into plain objects in their own documents, suitable for aggregation.
So those are the things you need to know. As I stated earlier, be "aware" of how the data "de-normalizes" and what that implies towards your end totals.
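For checking the end total against a handful of known documents, the same match / unwind / unwind / match / group logic can be written as plain loops. This is only a sanity-check sketch. It assumes the owner id lives at owner.user_id as in the question's schema, and the function name sumOwnedPrices is made up.

```javascript
// In-memory equivalent of: unwind all_service, unwind all_sub_item,
// keep the sub-items owned by `userId`, then sum their price.
function sumOwnedPrices(jobs, userId) {
  let total = 0;
  for (const job of jobs) {
    for (const service of job.all_service || []) {
      for (const item of service.all_sub_item || []) {
        // Compare as strings so ObjectId-like values and plain strings both work.
        if (item.owner && String(item.owner.user_id) === String(userId)) {
          total += item.price || 0;
        }
      }
    }
  }
  return total;
}

// Made-up sample data; only the second-level "price" is summed.
const jobs = [
  {
    name: "job 1",
    all_service: [
      {
        price: 100,
        all_sub_item: [
          { name: "a", price: 5, owner: { user_id: "u1" } },
          { name: "b", price: 7, owner: { user_id: "u2" } },
        ],
      },
      { price: 50, all_sub_item: [{ name: "c", price: 3, owner: { user_id: "u1" } }] },
    ],
  },
];
console.log(sumOwnedPrices(jobs, "u1"));
```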

It sounds like you want to, in SQL equivalent, do "sum (prices) WHERE owner IS NOT NULL".
On that assumption, you'll want to do your $match first, to reduce the input set to your sum. So your first stage should be something like
$match: { "all_service.all_sub_item.owner" : { $exists: true } }
Think of this as then passing all matching documents to your second stage.
Now, because you are summing an array, you have to do another step. Aggregation operators work on documents - there isn't really a way to sum an array. So we want to expand your array so that each element in the array gets pulled out to represent the array field as a value, in its own document. Think of this as a cross join. This will be $unwind.
$unwind: "$all_service" // outer array first
$unwind: "$all_service.all_sub_item" // then the nested array
Now you've just made a much larger number of documents, but in a form where we can sum them. Now we can perform the $group. In your $group, you specify a transformation. The line:
_id: {}, // not sure what to put here
is creating a field in the output document, which is not the same documents as the input documents. So you can make the _id here anything you'd like, but think of this as the equivalent to your "GROUP BY" in sql. The $sum operator will essentially be creating a sum for each group of documents you create here that match that _id - so essentially we'll be "re-collapsing" what you just did with $unwind, by using the $group. But this will allow $sum to work.
I think you're looking for grouping on just your main document id, so I think your $sum statement in your question is correct.
$group : { _id : "$_id", totalAmount : { $sum : "$all_service.all_sub_item.price" } }
This will output documents with an _id field equivalent to your original document ID, and your sum.
I'll let you put it together, I'm not super familiar with node. You were close but I think moving your $match to the front and using an $unwind stage will get you where you need to be. Good luck!
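Put together, the stages described in this answer might look roughly like the sketch below. It is an assembly of the steps above under my own assumptions: the field names come from the question's schema, the function name ownedSubItemTotals is hypothetical, and the second $match is added so that sub-items without an owner are dropped after unwinding.

```javascript
// Sketch: per-document total of sub-item prices where an owner is present.
function ownedSubItemTotals() {
  return [
    // Keep only documents that have at least one sub-item with an owner.
    { $match: { "all_service.all_sub_item.owner": { $exists: true } } },
    // One $unwind per array level.
    { $unwind: "$all_service" },
    { $unwind: "$all_service.all_sub_item" },
    // Drop the unwound sub-items that have no owner.
    { $match: { "all_service.all_sub_item.owner": { $exists: true } } },
    // "Re-collapse" by the original document id and sum the prices.
    {
      $group: {
        _id: "$_id",
        totalAmount: { $sum: "$all_service.all_sub_item.price" },
      },
    },
  ];
}

console.log(JSON.stringify(ownedSubItemTotals(), null, 2));
```

The returned array would be what you pass to Job.aggregate(...).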
