If I want to perform a $group and $sum on a MongoDB collection from my Node server (using Mongoose), is it possible to return 0 for the non-existent groups?
The collection has the following fields: ssn, name, gender, city.
model.aggregate([
{
$group : { _id : { city:"$city", gender:"$gender"}, count: { $sum:1 }}
}], function (err,result) {
if(err) {
//err
}
else{
//response
}
});
If there are people of both genders in the city, the query will return:
{
"_id" : {
"city" : "NY",
"gender" : "male"
},
"count" : 11
},
{
"_id" : {
"city" : "NY",
"gender" : "female"
},
"count" : 31
}
But if no people of one gender are present in a city, no document is returned for that group. For example, with no males in LA:
{
"_id" : {
"city" : "LA",
"gender" : "female"
},
"count" : 53
}
Is it possible to make the query return the following result for the given scenario without having a collection with cities and population quantities?
{
"_id" : {
"city" : "LA",
"gender" : "male"
},
"count" : 0
},
{
"_id" : {
"city" : "LA",
"gender" : "female"
},
"count" : 53
}
Thanks.
If the possible values are finite and known as in your example, you could use $cond to combine the counts for male and female into one document per city like this:
[
{
$group : {
_id: {
city:"$city"
},
males:{
$sum: {
$cond: {if: {$eq:["$gender", "male"]}, then: 1, else: 0}
}
},
females:{
$sum: {
$cond: {if: {$eq:["$gender", "female"]}, then: 1, else: 0}
}
}
}
}
]
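If the application still needs one document per city and gender, including the zero counts, the per-city rows produced by the pipeline above can be expanded in Node after the query returns. This is only a minimal sketch: the helper name expandCityCounts is hypothetical, and it assumes the rows have exactly the _id.city, males and females fields shown above.

```javascript
// Hypothetical helper (not part of Mongoose or MongoDB): expands
// one-row-per-city results into one-row-per-(city, gender) results,
// keeping groups whose count is 0.
function expandCityCounts(rows) {
  const out = [];
  for (const row of rows) {
    out.push({ _id: { city: row._id.city, gender: "male" }, count: row.males });
    out.push({ _id: { city: row._id.city, gender: "female" }, count: row.females });
  }
  return out;
}

// Sample input shaped like the output of the $cond pipeline above.
const cityRows = [{ _id: { city: "LA" }, males: 0, females: 53 }];
const expandedRows = expandCityCounts(cityRows);
console.log(JSON.stringify(expandedRows, null, 2));
```

Doing the expansion in application code keeps the aggregation simple; it only works because the set of genders is known in advance.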
I'm practicing MongoDB aggregation, but my queries seem to take a really long time to run.
The problem seems to happen whenever I use $group. All other queries run just fine.
I have some 1.3 million dummy documents that need to perform two basic operations: get a count of the IP addresses and unique IP addresses.
My schema looks something like this:
{
"_id":"5da51af103eb566faee6b8b4",
"ip_address":"...",
"country":"CL",
"browser":{
"user_agent":"..."
}
}
Running a basic $group query takes about 12s on average, which is much too slow.
I did a little research, and someone suggested creating an index on ip_address. That seems to have slowed it down: queries now take 13-15s.
I use MongoDB and the query I'm running looks like this:
visitorsModel.aggregate([
{
'$group': {
'_id': '$ip_address',
'count': {
'$sum': 1
}
}
}
]).allowDiskUse(true)
.exec(function (err, docs) {
if (err) throw err;
return res.send({
uniqueCount: docs.length
})
})
Any help is appreciated.
Edit: I forgot to mention, someone suggested it might be a hardware issue? I'm running the query on a core i5, 8GB RAM laptop if it helps.
Edit 2: The query plan:
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"ip_address" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "metrics.visitors",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1387324,
"executionTimeMillis" : 7671,
"totalKeysExamined" : 0,
"totalDocsExamined" : 1387324,
"executionStages" : {
"stage" : "COLLSCAN",
"nReturned" : 1387324,
"executionTimeMillisEstimate" : 9,
"works" : 1387326,
"advanced" : 1387324,
"needTime" : 1,
"needYield" : 0,
"saveState" : 10930,
"restoreState" : 10930,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 1387324
}
}
}
},
{
"$group" : {
"_id" : "$ip_address",
"count" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
Here is some information about the $group aggregation stage: whether it uses indexes, its limitations, and what you can try to work around them.
1. The $group Stage Doesn't Use Index:
Mongodb Aggregation: Does $group use index?
2. $group Operator and Memory:
The $group stage has a limit of 100 megabytes of RAM. By default, if
the stage exceeds this limit, $group returns an error. To allow for
the handling of large datasets, set the allowDiskUse option to true.
This flag enables $group operations to write to temporary files.
See MongoDb docs on $group Operator and Memory
3. An Example Using $group and Count:
A collection called cities:
{ "_id" : 1, "city" : "Bangalore", "country" : "India" }
{ "_id" : 2, "city" : "New York", "country" : "United States" }
{ "_id" : 3, "city" : "Canberra", "country" : "Australia" }
{ "_id" : 4, "city" : "Hyderabad", "country" : "India" }
{ "_id" : 5, "city" : "Chicago", "country" : "United States" }
{ "_id" : 6, "city" : "Amritsar", "country" : "India" }
{ "_id" : 7, "city" : "Ankara", "country" : "Turkey" }
{ "_id" : 8, "city" : "Sydney", "country" : "Australia" }
{ "_id" : 9, "city" : "Srinagar", "country" : "India" }
{ "_id" : 10, "city" : "San Francisco", "country" : "United States" }
Query the collection to count the cities by each country:
db.cities.aggregate( [
{ $group: { _id: "$country", cityCount: { $sum: 1 } } },
{ $project: { country: "$_id", _id: 0, cityCount: 1 } }
] )
The Result:
{ "cityCount" : 3, "country" : "United States" }
{ "cityCount" : 1, "country" : "Turkey" }
{ "cityCount" : 2, "country" : "Australia" }
{ "cityCount" : 4, "country" : "India" }
4. Using allowDiskUse Option:
db.cities.aggregate( [
{ $group: { _id: "$country", cityCount: { $sum: 1 } } },
{ $project: { country: "$_id", _id: 0, cityCount: 1 } }
], { allowDiskUse : true } )
Note, in this case it makes no difference in query performance or output. This is to show the usage only.
5. Some Options to Try (suggestions):
You can try a few things to get some result (for trial purposes only):
Use $limit stage and restrict the number of documents processed and
see what is the result. For example, you can try { $limit: 1000 }.
Note this stage needs to come before the $group stage.
You can also use the $match, $project stages before the $group
stage to control the shape and size of the input. This may
return a result (instead of an error).
[EDIT ADD]
Notes on Distinct and Count:
Using the same cities collection - to get unique countries and a count of them you can try using the aggregate stage $count along with $group as in the following two queries.
Distinct:
db.cities.aggregate( [
{ $match: { country: { $exists: true } } },
{ $group: { _id: "$country" } },
{ $project: { country: "$_id", _id: 0 } }
] )
The Result:
{ "country" : "United States" }
{ "country" : "Turkey" }
{ "country" : "India" }
{ "country" : "Australia" }
To get the above result as a single document with an array of unique values, use the $addToSet operator:
db.cities.aggregate( [
{ $match: { country: { $exists: true } } },
{ $group: { _id: null, uniqueCountries: { $addToSet: "$country" } } },
{ $project: { _id: 0 } },
] )
The Result: { "uniqueCountries" : [ "United States", "Turkey", "India", "Australia" ] }
Count:
db.cities.aggregate( [
{ $match: { country: { $exists: true } } },
{ $group: { _id: "$country" } },
{ $project: { country: "$_id", _id: 0 } },
{ $count: "uniqueCountryCount" }
] )
The Result: { "uniqueCountryCount" : 4 }
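In the above queries, the $match stage filters out documents in which the country field does not exist. Note that { $exists: true } still matches documents whose country is null; use { country: { $ne: null } } to exclude both missing and null values. The $project stage reshapes the result document(s).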
MongoDB Query Language:
Note that the two queries get results similar to the MongoDB query language commands db.cities.distinct("country") and db.cities.distinct("country").length (distinct returns an array).
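Applied to the visitors collection from the question, the same $group plus $count idea might look like the sketch below. It is not a guaranteed speed-up, since the collection is still scanned, but it lets the server return a single number instead of sending every distinct IP address to Node just to read docs.length. The pipeline variable name is an assumption, and $count requires MongoDB 3.4 or later.

```javascript
// Sketch only: count distinct ip_address values on the server.
const uniqueIpCountPipeline = [
  { $group: { _id: "$ip_address" } }, // one group per distinct IP address
  { $count: "uniqueCount" }           // collapse the groups into a single document
];

// Assumed usage with the Mongoose model from the question:
//   visitorsModel.aggregate(uniqueIpCountPipeline)
//     .allowDiskUse(true)
//     .exec(function (err, docs) { /* docs[0].uniqueCount */ });
console.log(JSON.stringify(uniqueIpCountPipeline));
```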
You can create an index:
db.collectionname.createIndex( { ip_address: "text" } )
Try this; it should be faster.
I think it will help you.
I have nested data as below,
{
"_id" : ObjectId("5a30ee450889c5f0ebc21116"),
"academicyear" : "2017-18",
"fid" : "be02",
"fname" : "ABC",
"fdept" : "Comp",
"degree" : "BE",
"class" : "1",
"sem" : "8",
"dept" : "Comp",
"section" : "Theory",
"subname" : "BDA",
"fbValueList" : [
{
"_id" : ObjectId("5a30eecd3e3457056c93f7af"),
"score" : 20,
"rating" : "Fair"
},
{
"_id" : ObjectId("5a30eefd3e3457056c93f7b0"),
"score" : 10,
"rating" : "Fair"
},
{
"_id" : ObjectId("5a337e53341bf419040865c4"),
"score" : 88,
"rating" : "Excellent"
},
{
"_id" : ObjectId("5a337ee2341bf419040865c7"),
"score" : 75,
"rating" : "Very Good"
},
{
"_id" : ObjectId("5a3380b583dde50ddcea350e"),
"score" : 72,
"rating" : "Very Good"
}
]
},
{
"_id" : ObjectId("5a3764f1bc19b77dd9fd9a57"),
"academicyear" : "2017-18",
"fid" : "be02",
"fname" : "ABC",
"fdept" : "Comp",
"degree" : "BE",
"class" : "1",
"sem" : "5",
"dept" : "Comp",
"section" : "Theory",
"subname" : "BDA",
"fbValueList" : [
{
"_id" : ObjectId("5a3764f1bc19b77dd9fd9a59"),
"score" : 88,
"rating" : "Excellent"
},
{
"_id" : ObjectId("5a37667aee64bce1b14747d2"),
"score" : 74,
"rating" : "Good"
},
{
"_id" : ObjectId("5a3766b3ee64bce1b14747dc"),
"score" : 74,
"rating" : "Good"
}
]
}
We are trying to perform aggregation using this,
db.fbresults.aggregate([{$match:{academicyear:"2017-18",fdept:'Comp'}},{$group:{_id: {fname: "$fname", rating:"$fbValueList.rating"},count: {"$sum":1}}}])
and we get result like,
{ "_id" : { "fname" : "ABC", "rating" : [ "Fair","Fair","Excellent","Very Good", "Very Good", "Excellent", "Good", "Good" ] }, "count" : 2 }
but we are expecting result like,
{ "_id" : { "fname" : "ABC", "rating_group" : [
{
rating: "Excellent",
count: 2
},
{
rating: "Very Good",
count: 2
},
{
rating: "Good",
count: 2
},
{
rating: "Fair",
count: 2
}
] }, "count" : 2 }
We want to group the documents by faculty name and, within each faculty group, by rating, with a count for each rating.
We have already tried the approach in the following question, but we did not get the expected result.
Mongodb Aggregate Nested Group
This should get you going:
db.collection.aggregate([{
$match: {
academicyear: "2017-18",
fdept:'Comp'
}
}, {
$unwind: "$fbValueList" // flatten the fbValueList array into multiple documents
}, {
$group: {
_id: {
fname: "$fname",
rating:"$fbValueList.rating"
},
count: {
"$sum": 1 // this will give us the count per combination of fname and fbValueList.rating
}
}
}, {
$group: {
_id: "$_id.fname", // we only want one bucket per fname
rating_group: {
$push: { // we push the exact structure you were asking for
rating: "$_id.rating",
count: "$count"
}
},
count: {
$avg: "$count" // this will be the average across all entries in the fname bucket
}
}
}])
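To sanity-check what the $unwind and the two $group stages above compute, the same logic can be mirrored in plain JavaScript on a trimmed copy of the sample data. This is only an illustrative sketch: groupRatings is a hypothetical helper, the sample keeps just the fname and rating fields, and it leaves out the averaged count field.

```javascript
// Hypothetical in-memory mirror of: $unwind fbValueList,
// $group by (fname, rating), then $group by fname with $push.
function groupRatings(docs) {
  const byName = new Map();
  for (const doc of docs) {
    if (!byName.has(doc.fname)) byName.set(doc.fname, new Map());
    const counts = byName.get(doc.fname);
    for (const fb of doc.fbValueList) { // plays the role of $unwind
      counts.set(fb.rating, (counts.get(fb.rating) || 0) + 1);
    }
  }
  return [...byName].map(([fname, counts]) => ({
    _id: fname,
    rating_group: [...counts].map(([rating, count]) => ({ rating, count }))
  }));
}

// Ratings copied from the two sample documents in the question.
const feedbackDocs = [
  { fname: "ABC", fbValueList: [
    { rating: "Fair" }, { rating: "Fair" }, { rating: "Excellent" },
    { rating: "Very Good" }, { rating: "Very Good" }
  ] },
  { fname: "ABC", fbValueList: [
    { rating: "Excellent" }, { rating: "Good" }, { rating: "Good" }
  ] }
];
const groupedRatings = groupRatings(feedbackDocs);
console.log(JSON.stringify(groupedRatings, null, 2));
```

Each of the four ratings appears twice across the two documents, which matches the counts in the expected result.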
This is a long aggregation pipeline; some of its stages may be unnecessary, so please check and discard whichever are irrelevant.
NOTE: This will only work with MongoDB 3.4+.
You need to use $unwind and then $group and $push ratings with their counts.
matchAcademicYear = {
$match: {
academicyear:"2017-18", fdept:'Comp'
}
}
groupByNameAndRating = {
$group: {
_id: {
fname: "$fname", rating:"$fbValueList.rating"
},
count: {
"$sum":1
}
}
}
unwindRating = {
$unwind: "$_id.rating"
}
addFullRating = {
$addFields: {
"_id.full_rating": "$count"
}
}
replaceIdRoot = {
$replaceRoot: {
newRoot: "$_id"
}
}
groupByRatingAndFname = {
$group: {
_id: {
"rating": "$rating",
"fname": "$fname"
},
count: {"$sum": 1},
full_rating: {"$first": "$full_rating"}
}
}
addFullRatingAndCount = {
$addFields: {
"_id.count": "$count",
"_id.full_rating": "$full_rating"
}
}
groupByFname = {
$group: {
_id: "$fname",
rating_group: { $push: {rating: "$rating", count: "$count"}},
count: { $first: "$full_rating"}
}
}
db.fbresults.aggregate([
matchAcademicYear,
groupByNameAndRating,
unwindRating,
addFullRating,
unwindRating,
replaceIdRoot,
groupByRatingAndFname,
addFullRatingAndCount,
replaceIdRoot,
groupByFname
])
I have a collection called "products" which has an array of "bids" objects.
I want to find the maximum bid for each product. For this I am aggregating products using $max on the $bids.bid_amount field. However, this only gives me the largest bid amount. How do I project all the bid fields for the maximum bid?
Here is a sample document
{
"_id" : ObjectId("58109a5138fe12215cfdc064"),
"product_id" : 2,
"item_name" : "Auction Item1",
"item_description" : "Test",
"seller_name" : "ak#gmail.com",
"item_price" : "20",
"item_quantity" : 7,
"sale_type" : "Auction",
"posted_at" : "2016:10:26 04:58:09",
"expires_at" : "2016:10:30 04:58:09",
"bids" : [
{
"bid_id" : 1,
"bidder" : "ak#gmail.com",
"bid_amount" : 300,
"bit_time" : "2016:10:26 22:36:29"
},
{
"bid_id" : 2,
"bidder" : "ak#gmail.com",
"bid_amount" : 100,
"bit_time" : "2016:10:26 22:37:29"
}
],
"orders" : [
{
"buyer" : "ak#gmail.com",
"quantity" : "2"
},
{
"buyer" : "ak#gmail.com",
"quantity" : "3"
}
]
}
Here is my mongo query:
db.products.aggregate([
{
$project: {
bidMax: { $max: "$bids.bid_amount"}
}
}
])
which gives the following result:
{
"_id" : ObjectId("58109a5138fe12215cfdc064"),
"bidMax" : 300
}
db.products.aggregate([{$unwind:"$bids"},{$group:{_id:"$_id", sum:{$sum:"$bids.bid_amount"}}},{$project:{doc:"$$ROOT", _id:1, sum:1}},{$sort:{"sum":-1}},{$limit:1}])
which returns something like { "_id" : ObjectId("5811b667c50fb1ec88227860"), "sum" : 600, doc:{your document....} }
This should do it:
db.products.aggregate([{
$unwind: '$bids'
}, {
$group: {
_id: '$product_id',
maxBid: {
$max: '$bids.bid_amount'
}
}
}])
db.collectionName.aggregate(
[
{
$group:
{
_id: "$product_id",
maxBidAmount: { $max: "$bids.bid_amount" }
}
}
]
)
Use this query and you should get the result.
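The answers above return only the highest amount. One common way to keep the whole winning bid sub-document is to sort the unwound bids in descending order and take the first one per product. The sketch below uses that technique, which is not taken from the answers above; the pipeline variable and the highestBid output field are assumed names.

```javascript
// Sketch only: return the full bid sub-document with the highest bid_amount.
const highestBidPipeline = [
  { $unwind: "$bids" },                 // one document per bid
  { $sort: { "bids.bid_amount": -1 } }, // highest bids first
  { $group: {
      _id: "$product_id",
      item_name: { $first: "$item_name" },
      highestBid: { $first: "$bids" }   // first bid after the sort is the maximum
  } }
];

// Assumed usage in the mongo shell: db.products.aggregate(highestBidPipeline)
console.log(JSON.stringify(highestBidPipeline, null, 2));
```

For the sample document this would keep the bid with bid_id 1, since 300 is the larger of the two amounts.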
I have a simple schema like this:
{
"productName": "pppppp",
"sku" : {
"carted" : [
{
"_id" : ObjectId("56c6d606c0987668109a21f7"),
"timestamp" : ISODate("2016-02-19T08:44:54.043+0000"),
"cartId" : "56c6c1fd60c4491c157e433d",
"qty" : NumberInt(2)
},
{
"_id" : ObjectId("56c6d653172fb54817ec2356"),
"timestamp" : ISODate("2016-02-19T08:46:11.902+0000"),
"cartId" : "56c6c1fd60c4491c157e433d",
"qty" : NumberInt(2)
},
{
"_id" : ObjectId("56c6d6a7172fb54817ec2358"),
"timestamp" : ISODate("2016-02-19T08:47:35.652+0000"),
"cartId" : "56c6c1fd60c4491c157e433d",
"qty" : NumberInt(2)
}
],
"qty" : NumberInt(14)
}
}
How can I view the product "pppppp" and show a quantity of 20, that is, sku.qty added to the sum of all the sku.carted.qty values?
I want it to look like this:
{
"productName": "pppppp",
"qty" : 20
}
Please try this one with $group, $sum and $add
db.collection.aggregate([
{$unwind: '$sku.carted'},
// sum the `qty` in the carted array, put this result to `qt`
{$group: {
_id: {productName: '$productName', q: '$sku.qty'},
qt: {$sum: '$sku.carted.qty'}
}},
// add the `qt` and `sku.qty`
// and reshape the output result.
{$project: {
_id: 0,
productName: '$_id.productName',
qty: {$add: ['$_id.q', '$qt']}
}}
]);
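As an alternative sketch, on MongoDB 3.2 or later $sum can also be used inside $project over an array of values, which avoids the $unwind and $group round trip. The pipeline variable below is an assumed name, and the small totalQty function is only a plain JavaScript mirror of the arithmetic for checking the expected value.

```javascript
// Sketch only: add sku.qty to the sum of sku.carted.qty in a single $project.
const totalQtyPipeline = [
  { $project: {
      _id: 0,
      productName: 1,
      qty: { $add: ["$sku.qty", { $sum: "$sku.carted.qty" }] }
  } }
];

// Plain JavaScript mirror of the same arithmetic (hypothetical helper).
function totalQty(product) {
  const carted = product.sku.carted.reduce((sum, item) => sum + item.qty, 0);
  return product.sku.qty + carted;
}

// Quantities copied from the sample document in the question.
const sampleProduct = {
  productName: "pppppp",
  sku: { qty: 14, carted: [{ qty: 2 }, { qty: 2 }, { qty: 2 }] }
};
const sampleTotal = totalQty(sampleProduct);
console.log(sampleTotal); // 14 + 2 + 2 + 2 = 20
```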
Let's say I have a collection of books like this:
{author:"john", category:"action", title:"foobar200"},
{author:"peter", category:"scifi" , title:"42test"},
{author:"peter", category:"novel", title:"whatever_t"},
{author:"jane", category:"novel", title:"the return"},
{author:"john", category:"action", title:"extreme test"},
{author:"peter", category:"scifi", title:"such title"},
{author:"jane", category:"action", title:"super book "}
I want to do a query similar to :
SELECT author,category, count(*) FROM books GROUP BY category, author
==> result :
john -> action -> 2
john -> novel -> 0
john -> scifi -> 0
jane -> action -> 1
etc...
The closest I've come to a solution is this:
db.books.aggregate(
{
$match: {category:"action"}
},
{
$group: { _id: '$author', result: { $sum: 1 } }
}
);
==> result
{ "_id" : "jane", "result" : 1 }
{ "_id" : "john", "result" : 2 }
{ "_id" : "peter", "result" : 0 }
But I can't understand how to perform the second "group by" on categories.
What is the best way to do this?
Thanks
You can include multiple fields in the _id used by $group to provide multi-field grouping:
db.books.aggregate([
{$group: {
_id: {category: '$category', author: '$author'},
result: {$sum: 1}
}}
])
Result:
{
"_id" : {
"category" : "action",
"author" : "jane"
},
"result" : 1
},
{
"_id" : {
"category" : "novel",
"author" : "jane"
},
"result" : 1
},
{
"_id" : {
"category" : "novel",
"author" : "peter"
},
"result" : 1
},
{
"_id" : {
"category" : "scifi",
"author" : "peter"
},
"result" : 2
},
{
"_id" : {
"category" : "action",
"author" : "john"
},
"result" : 2
}
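Note that $group only emits combinations that actually occur, so rows such as john -> novel -> 0 are absent from the result above. If they are needed, one option is to fill them in on the client. The sketch below assumes the full lists of authors and categories are already known (for example from distinct queries); fillMissingGroups is a hypothetical helper, not a MongoDB feature.

```javascript
// Hypothetical helper: build every (author, category) pair and use 0
// where the aggregation returned no row for that pair.
function fillMissingGroups(rows, authors, categories) {
  const counts = new Map(
    rows.map(r => [JSON.stringify([r._id.author, r._id.category]), r.result])
  );
  const out = [];
  for (const author of authors) {
    for (const category of categories) {
      const key = JSON.stringify([author, category]);
      out.push({ author, category, result: counts.get(key) || 0 });
    }
  }
  return out;
}

// Rows copied from the aggregation result above.
const groupedBooks = [
  { _id: { category: "action", author: "jane" }, result: 1 },
  { _id: { category: "novel", author: "jane" }, result: 1 },
  { _id: { category: "novel", author: "peter" }, result: 1 },
  { _id: { category: "scifi", author: "peter" }, result: 2 },
  { _id: { category: "action", author: "john" }, result: 2 }
];
const filledBooks = fillMissingGroups(
  groupedBooks,
  ["john", "jane", "peter"],
  ["action", "novel", "scifi"]
);
console.log(JSON.stringify(filledBooks, null, 2));
```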