MongoDB aggregation: distinct, average and sum with group by across two collections - Node.js

I have a users collection with data like this:
{
"_id" : ObjectId("5da594c15324fec81d000027"),
"password" : "******",
"activation" : "Active",
"userType" : "Author",
"email" : "something#gmail.com",
"name" : "Something",
"profilePicture" : "profile_pictures/5da594c15324fec81d0000271607094354423image.png",
"__v" : 0
}
On the other hand, the userlogs collection has data like this:
{
"_id" : ObjectId("5fcb7bb4485c34a41900002b"),
"duration" : 2.54,
"page" : 1,
"activityDetails" : "Viewed Page for seconds",
"contentType" : "article",
"activityType" : "articlePageStayTime",
"label" : 3,
"bookId" : ObjectId("5f93e2cc74153f8c1800003f"),
"ipAddress" : "::1",
"creator" : ObjectId("5da594c15324fec81d000027"),
"created" : ISODate("2020-12-05T12:23:16.867Z"),
"__v" : 0
}
What I am trying to do is equivalent to this SQL query:
SELECT name, COUNT(page), SUM(duration), AVG(DISTINCT label), COUNT(DISTINCT bookId) FROM users JOIN userlogs ON users._id = userlogs.creator WHERE userlogs.activityType <> 'articleListeningTime' GROUP BY users._id
I can do a normal group by and sum together, but how do I get the average of distinct values and the count of distinct values with this? I am using MongoDB version 3.2.

I don't think this requires a $group stage; you can use the $setUnion, $size and $avg operators:
$lookup with userlogs collection
$project to show required fields, and filter userlogs as per your condition
$project to get statistics from userlogs
get total logs count using $size
get total duration sum using $sum
get average of unique label using $setUnion and $avg
get count of unique bookId using $setUnion and $size
db.users.aggregate([
{
$lookup: {
from: "userlogs",
localField: "_id",
foreignField: "creator",
as: "userlogs"
}
},
{
$project: {
name: 1,
userlogs: {
$filter: {
input: "$userlogs",
as: "u",
cond: { $ne: ["$$u.activityType", "articleListeningTime"] }
}
}
}
},
{
$project: {
name: 1,
totalCount: { $size: "$userlogs" },
durationSum: { $sum: "$userlogs.duration" },
labelAvg: { $avg: { $setUnion: "$userlogs.label" } },
bookIdCount: { $size: { $setUnion: "$userlogs.bookId" } }
}
}
])
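To make the DISTINCT part concrete, here is a plain-JavaScript sketch (not MongoDB code) of what the $filter and second $project stages compute for a single user. It assumes the user's logs are already in memory as the array $lookup would attach; the helper name userStats and the sample log entries are hypothetical.

```javascript
// Plain-JavaScript sketch of the $filter + $project stages for ONE user.
// Assumption: `logs` is the array $lookup attached; userStats is hypothetical.
function userStats(name, logs) {
  // Same condition as the $filter stage: drop "articleListeningTime" entries.
  const kept = logs.filter((u) => u.activityType !== "articleListeningTime");

  // $setUnion on "$userlogs.label" keeps each label once (the DISTINCT part).
  const uniqueLabels = [...new Set(kept.map((u) => u.label))];
  // $setUnion on "$userlogs.bookId" keeps each book once. Plain strings are
  // used here; real ObjectIds would have to be compared by their hex value.
  const uniqueBooks = new Set(kept.map((u) => u.bookId));

  return {
    name,
    totalCount: kept.length, // $size: "$userlogs"
    durationSum: kept.reduce((acc, u) => acc + u.duration, 0), // $sum
    // MongoDB's $avg returns null for an empty array; mirror that here.
    labelAvg: uniqueLabels.length
      ? uniqueLabels.reduce((acc, l) => acc + l, 0) / uniqueLabels.length
      : null,
    bookIdCount: uniqueBooks.size, // $size of the $setUnion result
  };
}

// Hypothetical log entries for one user.
const logs = [
  { activityType: "articlePageStayTime", duration: 2, label: 3, bookId: "b1" },
  { activityType: "articlePageStayTime", duration: 4, label: 3, bookId: "b1" },
  { activityType: "articlePageStayTime", duration: 1, label: 5, bookId: "b2" },
  { activityType: "articleListeningTime", duration: 9, label: 1, bookId: "b3" },
];
// totalCount 3, durationSum 7, labelAvg 4 (average of 3 and 5), bookIdCount 2
console.log(userStats("Something", logs));
```

One difference from the SQL: JOIN is an inner join, so users without matching logs are dropped, whereas the pipeline still returns them with totalCount 0 and labelAvg null. Appending { $match: { totalCount: { $gt: 0 } } } as a final stage would drop those users as well.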

Related

MongoDB query to get the sum of all documents' array field lengths

Below is a sample document from a collection, say "CollectionA":
{
"_id" : ObjectId("5ec3f19225701c4f7ab11a5f"),
"workshop" : ObjectId("5ebd37a3d33055331eb4730f"),
"participant" : ObjectId("5ebd382dd33055331eb47310"),
"status" : "analyzed",
"createdBy" : ObjectId("5eb7aa24d33055331eb4728c"),
"updatedBy" : ObjectId("5eb7aa24d33055331eb4728c"),
"results" : [
{
"analyze_by" : {
"user_name" : "m",
"user_id" : "5eb7aa24d33055331eb4728c"
},
"category_list" : [
"Communication",
"Controlling",
"Leading",
"Organizing",
"Planning",
"Staffing"
],
"analyzed_date" : ISODate("2020-05-19T14:48:49.993Z")
}
],
"summary" : [],
"isDeleted" : false,
"isActive" : true,
"updatedDate" : ISODate("2020-05-19T14:48:50.827Z"),
"createdDate" : ISODate("2020-05-19T14:47:46.374Z"),
"__v" : 0
}
I need to query all the documents to get the length of each "results" array and return the sum of those lengths across all documents.
For example:
document 1 has a "results" length of 5
document 2 has a "results" length of 6
so the output should be 11.
Can this be done in a single query, instead of fetching all the documents, iterating over them and adding up the results lengths?
If I understand correctly, you would like to project the length of the results attribute.
The $size operator should work for you:
https://docs.mongodb.com/manual/reference/operator/aggregation/size/
You can use $group and $sum to total a field that holds the size of each document's results array. To create that field, use $size inside $addFields to compute the size of results for each document (note that $addFields requires MongoDB 3.4 or later), as below:
db.getCollection('your_collection').aggregate([
{
$addFields: {
result_length: { $size: "$results"}
}
},
{
$group: {
_id: '',
total_result_length: { $sum: '$result_length' }
}
}
])
You can use a single $group stage with the $sum and $size aggregation operators to get the total number of array elements across all documents in the collection:
db.collection.aggregate( [
{
$group: {
_id: null,
total_count: { $sum: { $size: "$results" } }
}
}
] )
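As a sanity check of what that $group computes, here is a plain-JavaScript sketch over documents assumed to be already in memory; totalResultsLength is a hypothetical helper, and the two documents reproduce the 5 + 6 = 11 example from the question. The server returns the same total as a single document of the form { _id: null, total_count: ... }.

```javascript
// Plain-JavaScript equivalent of
//   { $group: { _id: null, total_count: { $sum: { $size: "$results" } } } }
// over in-memory documents. totalResultsLength is a hypothetical helper.
function totalResultsLength(docs) {
  // $size raises an error for a missing or non-array field; this sketch
  // counts such documents as 0 instead (a deliberate difference).
  return docs.reduce(
    (acc, doc) => acc + (Array.isArray(doc.results) ? doc.results.length : 0),
    0
  );
}

// The example from the question: results lengths 5 and 6.
const docs = [
  { results: Array.from({ length: 5 }, () => ({})) },
  { results: Array.from({ length: 6 }, () => ({})) },
];
console.log(totalResultsLength(docs)); // 11
```

If some documents may lack the field, the same tolerance can be had server-side with { $size: { $ifNull: ["$results", []] } } inside the $sum.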
Aggregation using Mongoose's Model.aggregate():
SomeModel.aggregate([
{
$group: {
_id: null,
total_count: { $sum: { $size: "$results" } }
}
}
]).then(function (result) {
console.log(result);
}).catch(function (err) {
console.error(err);
});

MongoDB aggregate $group stage takes a long time

I'm practicing MongoDB aggregation, but my queries take a really long time to run.
The problem seems to happen whenever I use $group. All other queries run just fine.
I have about 1.3 million dummy documents on which I need to perform two basic operations: count the IP addresses and count the unique IP addresses.
My schema looks something like this:
{
"_id":"5da51af103eb566faee6b8b4",
"ip_address":"...",
"country":"CL",
"browser":{
"user_agent":"..."
}
}
Running a basic $group query takes about 12s on average, which is much too slow.
I did a little research, and someone suggested creating an index on ip_address. That seems to have slowed it down: queries now take 13-15s.
I use MongoDB (through Mongoose), and the query I'm running looks like this:
visitorsModel.aggregate([
{
'$group': {
'_id': '$ip_address',
'count': {
'$sum': 1
}
}
}
]).allowDiskUse(true)
.exec(function (err, docs) {
if (err) throw err;
return res.send({
uniqueCount: docs.length
})
})
Any help is appreciated.
Edit: I forgot to mention that someone suggested it might be a hardware issue. I'm running the query on a Core i5 laptop with 8GB of RAM, if that helps.
Edit 2: The query plan:
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"ip_address" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "metrics.visitors",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1387324,
"executionTimeMillis" : 7671,
"totalKeysExamined" : 0,
"totalDocsExamined" : 1387324,
"executionStages" : {
"stage" : "COLLSCAN",
"nReturned" : 1387324,
"executionTimeMillisEstimate" : 9,
"works" : 1387326,
"advanced" : 1387324,
"needTime" : 1,
"needYield" : 0,
"saveState" : 10930,
"restoreState" : 10930,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 1387324
}
}
}
},
{
"$group" : {
"_id" : "$ip_address",
"count" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
Here is some information about the $group aggregation stage: whether it uses indexes, its limitations, and what you can try to work around them.
1. The $group stage doesn't use indexes:
See the related question "Mongodb Aggregation: Does $group use index?"
2. $group Operator and Memory:
The $group stage has a limit of 100 megabytes of RAM. By default, if
the stage exceeds this limit, $group returns an error. To allow for
the handling of large datasets, set the allowDiskUse option to true.
This flag enables $group operations to write to temporary files.
See the MongoDB docs on $group Operator and Memory.
3. An Example Using $group and Count:
A collection called cities:
{ "_id" : 1, "city" : "Bangalore", "country" : "India" }
{ "_id" : 2, "city" : "New York", "country" : "United States" }
{ "_id" : 3, "city" : "Canberra", "country" : "Australia" }
{ "_id" : 4, "city" : "Hyderabad", "country" : "India" }
{ "_id" : 5, "city" : "Chicago", "country" : "United States" }
{ "_id" : 6, "city" : "Amritsar", "country" : "India" }
{ "_id" : 7, "city" : "Ankara", "country" : "Turkey" }
{ "_id" : 8, "city" : "Sydney", "country" : "Australia" }
{ "_id" : 9, "city" : "Srinagar", "country" : "India" }
{ "_id" : 10, "city" : "San Francisco", "country" : "United States" }
Query the collection to count the cities by each country:
db.cities.aggregate( [
{ $group: { _id: "$country", cityCount: { $sum: 1 } } },
{ $project: { country: "$_id", _id: 0, cityCount: 1 } }
] )
The Result:
{ "cityCount" : 3, "country" : "United States" }
{ "cityCount" : 1, "country" : "Turkey" }
{ "cityCount" : 2, "country" : "Australia" }
{ "cityCount" : 4, "country" : "India" }
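To show what the $group stage does internally, here is a plain-JavaScript sketch of the same count over the cities collection held in memory; countByCountry is a hypothetical helper, not a MongoDB API. It illustrates that every input document has to be visited and that the working set grows with the number of distinct group keys, which is what the 100 MB limit and allowDiskUse relate to (with 1.3 million documents, that means one entry per distinct IP address).

```javascript
// Plain-JavaScript sketch of
//   { $group: { _id: "$country", cityCount: { $sum: 1 } } }
//   { $project: { country: "$_id", _id: 0, cityCount: 1 } }
// countByCountry is a hypothetical helper, not a MongoDB API.
function countByCountry(docs) {
  // One in-memory entry per distinct group key, like $group's working set.
  const counts = new Map();
  for (const doc of docs) {
    // $sum: 1 adds one per input document in the group.
    counts.set(doc.country, (counts.get(doc.country) || 0) + 1);
  }
  // $project: rename the group key to `country` and drop _id.
  return [...counts].map(([country, cityCount]) => ({ cityCount, country }));
}

const cities = [
  { _id: 1, city: "Bangalore", country: "India" },
  { _id: 2, city: "New York", country: "United States" },
  { _id: 3, city: "Canberra", country: "Australia" },
  { _id: 4, city: "Hyderabad", country: "India" },
  { _id: 5, city: "Chicago", country: "United States" },
  { _id: 6, city: "Amritsar", country: "India" },
  { _id: 7, city: "Ankara", country: "Turkey" },
  { _id: 8, city: "Sydney", country: "Australia" },
  { _id: 9, city: "Srinagar", country: "India" },
  { _id: 10, city: "San Francisco", country: "United States" },
];
// India 4, United States 3, Australia 2, Turkey 1. MongoDB does not guarantee
// the order of $group output, so don't rely on the order either way.
console.log(countByCountry(cities));
```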
4. Using allowDiskUse Option:
db.cities.aggregate( [
{ $group: { _id: "$country", cityCount: { $sum: 1 } } },
{ $project: { country: "$_id", _id: 0, cityCount: 1 } }
], { allowDiskUse : true } )
Note that in this case it makes no difference to query performance or output; this only shows the usage.
5. Some Options to Try (suggestions):
You can try a few things to get a result (for trial purposes only):
Use a $limit stage to restrict the number of documents processed and
see what the result is. For example, you can try { $limit: 1000 }.
Note that this stage needs to come before the $group stage.
You can also use $match and $project stages before the $group
stage to control the shape and size of the input. This may
return a result (instead of an error).
Notes on Distinct and Count:
Using the same cities collection, to get the unique countries and a count of them you can use the $group stage together with the $count stage (MongoDB 3.4+), as in the following two queries.
Distinct:
db.cities.aggregate( [
{ $match: { country: { $exists: true } } },
{ $group: { _id: "$country" } },
{ $project: { country: "$_id", _id: 0 } }
] )
The Result:
{ "country" : "United States" }
{ "country" : "Turkey" }
{ "country" : "India" }
{ "country" : "Australia" }
To get the above result as a single document with an array of unique values, use the $addToSet operator:
db.cities.aggregate( [
{ $match: { country: { $exists: true } } },
{ $group: { _id: null, uniqueCountries: { $addToSet: "$country" } } },
{ $project: { _id: 0 } },
] )
The Result: { "uniqueCountries" : [ "United States", "Turkey", "India", "Australia" ] }
Count:
db.cities.aggregate( [
{ $match: { country: { $exists: true } } },
{ $group: { _id: "$country" } },
{ $project: { country: "$_id", _id: 0 } },
{ $count: "uniqueCountryCount" }
] )
The Result: { "uniqueCountryCount" : 4 }
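In the above queries, the $match stage filters out documents that have no country field. Note that { $exists: true } still matches documents whose country is null; use { country: { $ne: null } } to exclude both missing and null values. The $project stage reshapes the result document(s).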
MongoDB Query Language:
Note that you get similar results from the MongoDB query language commands db.cities.distinct("country") and db.cities.distinct("country").length (distinct returns an array).
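The Distinct and Count pipelines boil down to "collect unique values, then count them". A plain-JavaScript sketch of that over in-memory documents follows; uniqueCountries and the document without a country are hypothetical, and the presence check mirrors $exists: true (an explicit null still passes).

```javascript
// Plain-JavaScript sketch of the "Distinct" and "Count" pipelines above.
// uniqueCountries is a hypothetical helper, not a MongoDB API.
function uniqueCountries(docs) {
  const unique = new Set(); // plays the role of $group / $addToSet
  for (const doc of docs) {
    // Mirrors { $match: { country: { $exists: true } } }: a missing field is
    // skipped, but an explicit null still passes, just like $exists: true.
    if ("country" in doc) unique.add(doc.country);
  }
  return [...unique];
}

const sample = [
  { city: "Bangalore", country: "India" },
  { city: "New York", country: "United States" },
  { city: "Hyderabad", country: "India" },
  { city: "Ankara", country: "Turkey" },
  { city: "Atlantis" }, // hypothetical document without a country
];
const countries = uniqueCountries(sample);
console.log(countries); // [ 'India', 'United States', 'Turkey' ]
console.log(countries.length); // 3, like { $count: "uniqueCountryCount" }
```

Applied to the IP-address question, the same idea suggests letting the server return only the number rather than every group (the docs.length approach transfers all groups to the client): [{ $group: { _id: "$ip_address" } }, { $count: "uniqueCount" }] returns a single document on MongoDB 3.4+.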
You can create a text index:
db.collectionname.createIndex( { ip_address: "text" } )
Try this; it may be faster.
I think it will help you.

How to use $subtract with array of document in mongoDB

We have two collections: users and organizations.
db.users.aggregate([
{$match: {"organization.metaInfo":{"$ne": "disabled"}}},
{"$unwind":"$organization"},
{"$group":{"_id":"$organization.organizationId", "count":{"$sum":1}}},
{$lookup: {from: "organisations",
localField: "_id",
foreignField: "_id", as: "orgDta"}},
{"$project":{"_id":1, "count":1,
"orgDta.organizationLicence.userCount":1
}}
])
This query returns a result like the following, which is what I want:
{
"_id" : "768d3090-d4f5-11e7-a503-9b68b90cdb4e",
"count" : 5.0,
"orgDta" : [{
"organizationLicence" : {
"userCount" : 50
}
}]
},
{
"_id" : "d9933740-c29c-11e7-9481-b52c5f3e2e70",
"count" : 1.0,
"orgDta" : [{
"organizationLicence" : {
"userCount" : 1
}
}]
},
{
"_id" : "5386ebc0-c29b-11e7-9481-b52c5f3e2e70",
"count" : 1.0,
"orgDta" : [{
"organizationLicence" : {
"userCount" : 1
}
}]
}
Now I want to subtract count from userCount, but I don't know how to do it here.
I tried adding this to the $project stage:
{"$project":{"_id":1, "count":1, "orgDta.dObjects":1, "orgDta.organizationLicence.userCount":1,
"remainingUser": { $subtract: [ "$orgDta.organizationLicence.userCount", "$count"]}}}
But Mongo returns an error:
{
"message" : "cant $subtract adouble from a array",
"stack" : "MongoError: cant $subtract adouble from a array" }
Use $unwind followed by $group instead, like this:
Aggregate pipeline
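db.users.aggregate([
// Put the question's stages ($match, $unwind, $group, $lookup, $project) here first:
// orgDta only exists after the $lookup stage.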
{
$unwind: '$orgDta'
}, {
$group: {
_id: '$_id',
remainingUser: {
$push: {
$subtract: ['$orgDta.organizationLicence.userCount', '$count']
}
}
}
}
])
What we are doing here is unwinding the target array, computing the subtraction for each element (in your case, the nested organizationLicence.userCount value minus count), and then grouping the results back into an array.
Add any other fields you want in the final result document; the above is just a sample MongoDB aggregation pipeline.
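A plain-JavaScript sketch of the unwind, subtract and re-group idea, run on the three result documents shown in the question, makes the per-element subtraction visible. The helper name remainingUsers is hypothetical; this is an in-memory illustration, not driver code.

```javascript
// Plain-JavaScript sketch of $unwind -> $subtract -> $group/$push,
// using the three result documents from the question.
// remainingUsers is a hypothetical helper, not a MongoDB API.
function remainingUsers(docs) {
  return docs.map((doc) => ({
    _id: doc._id,
    // $unwind yields one document per orgDta element; the subtraction runs
    // on each of them, and $push collects the results back into an array.
    remainingUser: doc.orgDta.map(
      (org) => org.organizationLicence.userCount - doc.count
    ),
  }));
}

const results = [
  { _id: "768d3090-d4f5-11e7-a503-9b68b90cdb4e", count: 5, orgDta: [{ organizationLicence: { userCount: 50 } }] },
  { _id: "d9933740-c29c-11e7-9481-b52c5f3e2e70", count: 1, orgDta: [{ organizationLicence: { userCount: 1 } }] },
  { _id: "5386ebc0-c29b-11e7-9481-b52c5f3e2e70", count: 1, orgDta: [{ organizationLicence: { userCount: 1 } }] },
];
// remainingUser: [45], [0] and [0]
console.log(remainingUsers(results));
```

The original error arises because $lookup always produces an array, so "$orgDta.organizationLicence.userCount" is an array, not a number. If each organization matches exactly one document and a plain number is wanted, taking the first element inside the existing $project avoids the $unwind: "remainingUser": { $subtract: [ { $arrayElemAt: [ "$orgDta.organizationLicence.userCount", 0 ] }, "$count" ] } ($arrayElemAt is available from MongoDB 3.2).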

Searching for a value in 2 different fields - MongoDB + Node.js

I am a newbie, but I am trying to learn the most logical way to write queries.
Assume I have a collection like this:
{
"id" : NumberInt(1),
"school" : [
{
"name" : "george",
"code" : "01"
},
{
"name" : "michelangelo",
"code" : "01"
}
],
"enrolledStudents" : [
{
"userName" : "elisabeth",
"code" : NumberInt(21)
}
]
}
{
"id" : NumberInt(2),
"school" : [
{
"name" : "leonarda da vinci",
"code" : "01"
}
],
"enrolledStudents" : [
{
"userName" : "michelangelo",
"code" : NumberInt(25)
}
]
}
I want to list the occurrences of a key with their corresponding code values.
For example, key: michelangelo
To find the occurrences of the key, I wrote two different aggregation queries:
db.test.aggregate([
{$unwind: "$school"},
{$match : {"school.name" : "michelangelo"}},
{$project: {_id: "$id", "key" : "$school.name", "code" : "$school.code"}}
])
and
db.test.aggregate([
{$unwind: "$enrolledStudents"},
{$match : {"enrolledStudents.userName" : "michelangelo"}},
{$project: {_id: "$id", "key" : "$enrolledStudents.userName", "code" : "$enrolledStudents.code"}}
])
The results of these two queries are what I want:
{ "_id" : 1, "key" : "michelangelo", "code" : "01" }
{ "_id" : 2, "key" : "michelangelo", "code" : 25 }
One of them searches the enrolledStudents field; the other searches the school field.
Can these two queries be reduced to a single, more logical query? Or is this the only way to do it?
PS: I am aware that the database structure is not logical; I am just simulating.
Edit:
I tried writing a query with find:
db.test.find({$or: [{"enrolledStudents.userName" : "michelangelo"} , {"school.name" : "michelangelo"}]}).pretty()
but this returns the whole documents:
{
"id" : 1,
"school" : [
{
"name" : "george",
"code" : "01"
},
{
"name" : "michelangelo",
"code" : "01"
}
],
"enrolledStudents" : [
{
"userName" : "elisabeth",
"code" : 21
}
]
}
{
"id" : 2,
"school" : [
{
"name" : "leonarda da vinci",
"code" : "01"
}
],
"enrolledStudents" : [
{
"userName" : "michelangelo",
"code" : 25
}
]
}
Mongo 3.4
$match - This stage keeps the documents where at least one embedded document in school or enrolledStudents matches the query condition.
$group - This stage merges the school and enrolledStudents arrays with $setUnion and pushes the result, producing a 2D array for each id.
$project - This stage uses $filter to keep the elements of the merged array that match the query condition, and $map to reshape them into a new values array with key and code labels.
$unwind - This stage flattens the array.
$addFields & $replaceRoot - These stages copy the document id into each values element and promote that element to the top level.
db.collection.aggregate([
{$match : {$or: [{"enrolledStudents.userName" : "michelangelo"} , {"school.name" : "michelangelo"}]}},
{$group: {_id: "$id", merge : {$push:{$setUnion:["$school", "$enrolledStudents"]}}}},
{$project: {
values: {
$map:
{
input: {
$filter: {
input: {"$arrayElemAt":["$merge",0]},
as: "onef",
cond: {
$or: [{
$eq: ["$$onef.userName", "michelangelo"]
}, {
$eq: ["$$onef.name", "michelangelo"]
}]
}
}
},
as: "onem",
in: {
key : { $ifNull: [ "$$onem.userName", "$$onem.name" ] },
code : "$$onem.code"}
}
}
}
},
{$unwind: "$values"},
{$addFields:{"values._id":"$_id"}},
{$replaceRoot: { newRoot:"$values"}}
])
Sample Response
{ "_id" : 2, "key" : "michelangelo", "code" : 25 }
{ "_id" : 1, "key" : "michelangelo", "code" : "01" }
Mongo <= 3.2
Replace the last two stages of the above aggregation ($addFields and $replaceRoot, which require 3.4) with a $project to format the response.
{$project: {_id: 1, key: "$values.key", code: "$values.code"}}
Sample Response
{ "_id" : 2, "key" : "michelangelo", "code" : 25 }
{ "_id" : 1, "key" : "michelangelo", "code" : "01" }
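The merge, $filter, $map and $unwind steps can be traced with a plain-JavaScript sketch on the two sample documents from the question. findKey is a hypothetical helper; unlike $setUnion, the simple concatenation here keeps duplicates and order.

```javascript
// Plain-JavaScript sketch of merge -> $filter -> $map -> $unwind.
// findKey is a hypothetical helper, not a MongoDB API.
function findKey(docs, key) {
  const out = [];
  for (const doc of docs) {
    // Merge both arrays. Unlike $setUnion, this keeps duplicates and order.
    const merged = [...(doc.school || []), ...(doc.enrolledStudents || [])];
    for (const e of merged) {
      // $filter condition: the key may be in either userName or name.
      if (e.userName === key || e.name === key) {
        // $map with $ifNull: prefer userName, fall back to name.
        out.push({ _id: doc.id, key: e.userName ?? e.name, code: e.code });
      }
    }
  }
  return out;
}

const docs = [
  {
    id: 1,
    school: [{ name: "george", code: "01" }, { name: "michelangelo", code: "01" }],
    enrolledStudents: [{ userName: "elisabeth", code: 21 }],
  },
  {
    id: 2,
    school: [{ name: "leonarda da vinci", code: "01" }],
    enrolledStudents: [{ userName: "michelangelo", code: 25 }],
  },
];
// _id 1 with code "01", then _id 2 with code 25
console.log(findKey(docs, "michelangelo"));
```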
You can use $redact instead of $match and $group, and add a $project with $map to format the response.
$redact goes through the document one level at a time and applies $$DESCEND or $$PRUNE based on the matching criteria.
The only thing to note is the use of $ifNull on id at the top document level, so that $redact can $$DESCEND into the embedded documents for further processing.
db.collection.aggregate([
{
$redact: {
$cond: [{
$or: [{
$eq: ["$userName", "michelangelo"]
}, {
$eq: ["$name", "michelangelo"]
}, {
$ifNull: ["$id", false]
}]
}, "$$DESCEND", "$$PRUNE"]
}
},
{
$project: {
id:1,
values: {
$map:
{
input: {$setUnion:["$school", "$enrolledStudents"]},
as: "onem",
in: {
key : { $ifNull: [ "$$onem.userName", "$$onem.name" ] },
code : "$$onem.code"}
}
}
}
},
{$unwind: "$values"},
{$project: {_id:0,id:"$id", key:"$values.key", code:"$values.code"}}
])

How to use Mongoose sum operation?

I have a simple schema like this:
{
"productName": "pppppp",
"sku" : {
"carted" : [
{
"_id" : ObjectId("56c6d606c0987668109a21f7"),
"timestamp" : ISODate("2016-02-19T08:44:54.043+0000"),
"cartId" : "56c6c1fd60c4491c157e433d",
"qty" : NumberInt(2)
},
{
"_id" : ObjectId("56c6d653172fb54817ec2356"),
"timestamp" : ISODate("2016-02-19T08:46:11.902+0000"),
"cartId" : "56c6c1fd60c4491c157e433d",
"qty" : NumberInt(2)
},
{
"_id" : ObjectId("56c6d6a7172fb54817ec2358"),
"timestamp" : ISODate("2016-02-19T08:47:35.652+0000"),
"cartId" : "56c6c1fd60c4491c157e433d",
"qty" : NumberInt(2)
}
],
"qty" : NumberInt(14)
}
}
How can I view the product "pppppp" and show its quantity as 20, i.e. sku.qty added to the sum of all sku.carted.qty values?
I want it to look like this:
{
"productName": "pppppp",
"qty" : 20
}
Please try this one with $unwind, $group, $sum and $add:
db.collection.aggregate([
// only the product asked about
{$match: {productName: 'pppppp'}},
{$unwind: '$sku.carted'},
// sum the `qty` in the carted array, put this result to `qt`
{$group: {
_id: {productName: '$productName', q: '$sku.qty'},
qt: {$sum: '$sku.carted.qty'}
}},
// add the `qt` and `sku.qty`
// and reshape the output result.
{$project: {
_id: 0,
productName: '$_id.productName',
qty: {$add: ['$_id.q', '$qt']}
}}
]);
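The arithmetic the pipeline performs is 14 + 2 + 2 + 2 = 20. Here is a plain-JavaScript sketch of it on the question's document (trimmed to the fields used); totalQty is a hypothetical helper, not Mongoose API.

```javascript
// Plain-JavaScript sketch: qty = sku.qty + sum of sku.carted[].qty.
// totalQty is a hypothetical helper, not a Mongoose or MongoDB API.
function totalQty(product) {
  // Like $sum over the unwound carted entries. An empty or missing carted
  // array contributes 0 here (the $unwind version would drop the product).
  const cartedSum = (product.sku.carted || []).reduce((acc, c) => acc + c.qty, 0);
  // Like $add: ['$_id.q', '$qt'] in the final $project.
  return { productName: product.productName, qty: product.sku.qty + cartedSum };
}

const product = {
  productName: "pppppp",
  sku: { carted: [{ qty: 2 }, { qty: 2 }, { qty: 2 }], qty: 14 },
};
console.log(totalQty(product)); // { productName: 'pppppp', qty: 20 }
```

Note that $unwind drops a product whose carted array is empty or missing. On MongoDB 3.2+ the same total can be computed without $unwind, which keeps such products: { $project: { _id: 0, productName: 1, qty: { $add: ["$sku.qty", { $sum: "$sku.carted.qty" }] } } }.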
