How to get max values for distinct elements in mongodb? - node.js

I have records in my collection
{
"_id" : ObjectId("5c37a71c54956d08afb590ef"),
"user_id" : 45,
"result" : 9,
}
{
"_id" : ObjectId("5c37a7ad54956d08afb590f0"),
"user_id" : 1,
"result" : 3,
}
{
"_id" : ObjectId("5c37a80254956d08afb590f1"),
"user_id" : 45,
"result" : 10,
}
How can I get one record per user (distinct user_id) holding that user's maximum result?
I expect result like this:
{
"_id" : ObjectId("5c37a80254956d08afb590f1"),
"user_id" : 45, //distinct user_id
"result" : 10, //max result for user
}
{
"_id" : ObjectId("5c37a7ad54956d08afb590f0"),
"user_id" : 1, //distinct user_id
"result" : 3, //max result for user
}

You can use the aggregation below:
db.col.aggregate([
{
$sort: { result: -1 }
},
{
$group: {
_id: "$user_id",
result: { $first: "$result" },
o_id: { $first: "$_id" }
}
},
{
$project: {
_id: "$o_id",
user_id: "$_id",
result: 1
}
}
])
You need to $sort first so that $group with the $first operator captures both _id and result from the document with the highest result. Output:
{ "result" : 3, "_id" : ObjectId("5c37a7ad54956d08afb590f0"), "user_id" : 1 }
{ "result" : 10, "_id" : ObjectId("5c37a80254956d08afb590f1"), "user_id" : 45 }
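The effect of $sort followed by $group with $first can be sketched in plain JavaScript, without a database. This is only an in-memory illustration of the logic, not MongoDB code; the function name maxPerUser is made up for the example, and the _id values are plain strings rather than real ObjectIds.

```javascript
// In-memory sketch of "$sort by result descending, then $group with $first".
// Assumption: _id values are plain strings here, not ObjectId instances.
const records = [
  { _id: "5c37a71c54956d08afb590ef", user_id: 45, result: 9 },
  { _id: "5c37a7ad54956d08afb590f0", user_id: 1, result: 3 },
  { _id: "5c37a80254956d08afb590f1", user_id: 45, result: 10 },
];

// Hypothetical helper, not part of any MongoDB driver.
function maxPerUser(docs) {
  // Equivalent of { $sort: { result: -1 } }: highest result first.
  const sorted = [...docs].sort((a, b) => b.result - a.result);
  // Equivalent of $group + $first: keep the first document seen per user_id.
  const byUser = new Map();
  for (const doc of sorted) {
    if (!byUser.has(doc.user_id)) byUser.set(doc.user_id, doc);
  }
  return [...byUser.values()];
}

const best = maxPerUser(records);
console.log(best);
```

Because the documents are sorted before grouping, the first document kept for each user_id is the one with the highest result, which is why _id and result come from the same record.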

Related

Mongodb aggregate $group stage takes a long time

I'm practicing how to use MongoDB aggregation, but my queries seem to take a really long time to run.
The problem seems to happen whenever I use $group. All other queries run just fine.
I have some 1.3 million dummy documents on which I need to perform two basic operations: get a count per IP address and a count of unique IP addresses.
My schema looks something like this:
{
"_id":"5da51af103eb566faee6b8b4",
"ip_address":"...",
"country":"CL",
"browser":{
"user_agent":...",
}
}
Running a basic $group query takes about 12s on average, which is much too slow.
I did a little research, and someone suggested creating an index on ip_address. That seems to have slowed it down: queries now take 13-15s.
I use MongoDB and the query I'm running looks like this:
visitorsModel.aggregate([
{
'$group': {
'_id': '$ip_address',
'count': {
'$sum': 1
}
}
}
]).allowDiskUse(true)
.exec(function (err, docs) {
if (err) throw err;
return res.send({
uniqueCount: docs.length
})
})
Any help is appreciated.
Edit: I forgot to mention, someone suggested it might be a hardware issue? I'm running the query on a core i5, 8GB RAM laptop if it helps.
Edit 2: The query plan:
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"ip_address" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "metrics.visitors",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1387324,
"executionTimeMillis" : 7671,
"totalKeysExamined" : 0,
"totalDocsExamined" : 1387324,
"executionStages" : {
"stage" : "COLLSCAN",
"nReturned" : 1387324,
"executionTimeMillisEstimate" : 9,
"works" : 1387326,
"advanced" : 1387324,
"needTime" : 1,
"needYield" : 0,
"saveState" : 10930,
"restoreState" : 10930,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 1387324
}
}
}
},
{
"$group" : {
"_id" : "$ip_address",
"count" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
Here is some information about the $group aggregation stage: whether it uses indexes, its limitations, and what you can try to work around them.
1. The $group Stage Doesn't Use Index:
Mongodb Aggregation: Does $group use index?
2. $group Operator and Memory:
The $group stage has a limit of 100 megabytes of RAM. By default, if
the stage exceeds this limit, $group returns an error. To allow for
the handling of large datasets, set the allowDiskUse option to true.
This flag enables $group operations to write to temporary files.
See MongoDb docs on $group Operator and Memory
3. An Example Using $group and Count:
A collection called cities:
{ "_id" : 1, "city" : "Bangalore", "country" : "India" }
{ "_id" : 2, "city" : "New York", "country" : "United States" }
{ "_id" : 3, "city" : "Canberra", "country" : "Australia" }
{ "_id" : 4, "city" : "Hyderabad", "country" : "India" }
{ "_id" : 5, "city" : "Chicago", "country" : "United States" }
{ "_id" : 6, "city" : "Amritsar", "country" : "India" }
{ "_id" : 7, "city" : "Ankara", "country" : "Turkey" }
{ "_id" : 8, "city" : "Sydney", "country" : "Australia" }
{ "_id" : 9, "city" : "Srinagar", "country" : "India" }
{ "_id" : 10, "city" : "San Francisco", "country" : "United States" }
Query the collection to count the cities by each country:
db.cities.aggregate( [
{ $group: { _id: "$country", cityCount: { $sum: 1 } } },
{ $project: { country: "$_id", _id: 0, cityCount: 1 } }
] )
The Result:
{ "cityCount" : 3, "country" : "United States" }
{ "cityCount" : 1, "country" : "Turkey" }
{ "cityCount" : 2, "country" : "Australia" }
{ "cityCount" : 4, "country" : "India" }
4. Using allowDiskUse Option:
db.cities.aggregate( [
{ $group: { _id: "$country", cityCount: { $sum: 1 } } },
{ $project: { country: "$_id", _id: 0, cityCount: 1 } }
], { allowDiskUse : true } )
Note, in this case it makes no difference in query performance or output. This is to show the usage only.
5. Some Options to Try (suggestions):
Here are a few things to try, for trial purposes only:
Use a $limit stage to restrict the number of documents processed and
see what the result is. For example, you can try { $limit: 1000 }.
Note that this stage needs to come before the $group stage.
You can also use $match and $project stages before the $group
stage to control the shape and size of its input. This may
return a result (instead of an error).
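As a sketch of the suggestions above, the stages could be arranged as follows. A pipeline is an ordinary array, so it can be built and inspected before it is passed to aggregate(). The $match on country "CL" is only an assumed example filter taken from the sample document, and the variable names are hypothetical.

```javascript
// Trial pipeline: shrink the input before it reaches $group.
const trialPipeline = [
  // Assumed example filter; use whatever condition fits your data.
  { $match: { country: "CL" } },
  // Keep only the field that $group needs.
  { $project: { _id: 0, ip_address: 1 } },
  // Cap the number of documents while experimenting.
  { $limit: 1000 },
  // Count documents per IP address.
  { $group: { _id: "$ip_address", count: { $sum: 1 } } },
];

// List the stage names in order, to confirm $limit comes before $group.
const stageNames = trialPipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames);
```

In the shell this array could then be passed as db.visitors.aggregate(trialPipeline, { allowDiskUse: true }), assuming the collection is the visitors collection shown in the query plan.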
[EDIT ADD]
Notes on Distinct and Count:
Using the same cities collection - to get unique countries and a count of them you can try using the aggregate stage $count along with $group as in the following two queries.
Distinct:
db.cities.aggregate( [
{ $match: { country: { $exists: true } } },
{ $group: { _id: "$country" } },
{ $project: { country: "$_id", _id: 0 } }
] )
The Result:
{ "country" : "United States" }
{ "country" : "Turkey" }
{ "country" : "India" }
{ "country" : "Australia" }
To get the above result as a single document with an array of unique values, use the $addToSet operator:
db.cities.aggregate( [
{ $match: { country: { $exists: true } } },
{ $group: { _id: null, uniqueCountries: { $addToSet: "$country" } } },
{ $project: { _id: 0 } },
] )
The Result: { "uniqueCountries" : [ "United States", "Turkey", "India", "Australia" ] }
Count:
db.cities.aggregate( [
{ $match: { country: { $exists: true } } },
{ $group: { _id: "$country" } },
{ $project: { country: "$_id", _id: 0 } },
{ $count: "uniqueCountryCount" }
] )
The Result: { "uniqueCountryCount" : 4 }
In the above queries the $match stage filters out documents that have no country field (note that a document where the field exists but is null still passes an $exists check). The $project stage reshapes the result document(s).
MongoDB Query Language:
Note that you can get similar results with the MongoDB query language commands db.cities.distinct("country") and db.cities.distinct("country").length (distinct returns an array).
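What the count and distinct queries compute can be mirrored in plain JavaScript on the same cities data. This is an in-memory sketch to make the grouping logic concrete, not a replacement for the aggregation; countBy is a hypothetical helper name.

```javascript
// In-memory sketch of "group by country and count" plus "distinct countries".
const cities = [
  { _id: 1, city: "Bangalore", country: "India" },
  { _id: 2, city: "New York", country: "United States" },
  { _id: 3, city: "Canberra", country: "Australia" },
  { _id: 4, city: "Hyderabad", country: "India" },
  { _id: 5, city: "Chicago", country: "United States" },
  { _id: 6, city: "Amritsar", country: "India" },
  { _id: 7, city: "Ankara", country: "Turkey" },
  { _id: 8, city: "Sydney", country: "Australia" },
  { _id: 9, city: "Srinagar", country: "India" },
  { _id: 10, city: "San Francisco", country: "United States" },
];

// Hypothetical helper: counts documents per value of the given field.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    // Mirror the $exists check: skip documents that lack the field.
    if (!(field in doc)) continue;
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return counts;
}

const cityCount = countBy(cities, "country");
// The keys of the count map are the distinct values.
const uniqueCountries = [...cityCount.keys()];
console.log(cityCount, uniqueCountries.length);
```

The same shape applies to the original question: grouping on ip_address yields the per-address counts, and the number of groups is the number of unique addresses.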
You can create an index:
db.collectionname.createIndex( { ip_address: "text" } )
Try this and check whether it is faster.
I think it will help you.

MongoDB aggregate on Nested data

I have nested data as below,
{
"_id" : ObjectId("5a30ee450889c5f0ebc21116"),
"academicyear" : "2017-18",
"fid" : "be02",
"fname" : "ABC",
"fdept" : "Comp",
"degree" : "BE",
"class" : "1",
"sem" : "8",
"dept" : "Comp",
"section" : "Theory",
"subname" : "BDA",
"fbValueList" : [
{
"_id" : ObjectId("5a30eecd3e3457056c93f7af"),
"score" : 20,
"rating" : "Fair"
},
{
"_id" : ObjectId("5a30eefd3e3457056c93f7b0"),
"score" : 10,
"rating" : "Fair"
},
{
"_id" : ObjectId("5a337e53341bf419040865c4"),
"score" : 88,
"rating" : "Excellent"
},
{
"_id" : ObjectId("5a337ee2341bf419040865c7"),
"score" : 75,
"rating" : "Very Good"
},
{
"_id" : ObjectId("5a3380b583dde50ddcea350e"),
"score" : 72,
"rating" : "Very Good"
}
]
},
{
"_id" : ObjectId("5a3764f1bc19b77dd9fd9a57"),
"academicyear" : "2017-18",
"fid" : "be02",
"fname" : "ABC",
"fdept" : "Comp",
"degree" : "BE",
"class" : "1",
"sem" : "5",
"dept" : "Comp",
"section" : "Theory",
"subname" : "BDA",
"fbValueList" : [
{
"_id" : ObjectId("5a3764f1bc19b77dd9fd9a59"),
"score" : 88,
"rating" : "Excellent"
},
{
"_id" : ObjectId("5a37667aee64bce1b14747d2"),
"score" : 74,
"rating" : "Good"
},
{
"_id" : ObjectId("5a3766b3ee64bce1b14747dc"),
"score" : 74,
"rating" : "Good"
}
]
}
We are trying to perform aggregation using this,
db.fbresults.aggregate([{$match:{academicyear:"2017-18",fdept:'Comp'}},{$group:{_id: {fname: "$fname", rating:"$fbValueList.rating"},count: {"$sum":1}}}])
and we get result like,
{ "_id" : { "fname" : "ABC", "rating" : [ "Fair","Fair","Excellent","Very Good", "Very Good", "Excellent", "Good", "Good" ] }, "count" : 2 }
but we are expecting result like,
{ "_id" : { "fname" : "ABC", "rating_group" : [
{
rating: "Excellent"
count: 2
},
{
rating: "Very Good"
count: 2
},
{
rating: "Good"
count: 2
},
{
rating: "Fair"
count: 2
},
] }, "count" : 2 }
We want to get individual faculty group by their name and inside that group by their rating response and count of rating.
We have already tried the approach from "Mongodb Aggregate Nested Group", but we did not get the expected result.
This should get you going:
db.collection.aggregate([{
$match: {
academicyear: "2017-18",
fdept:'Comp'
}
}, {
$unwind: "$fbValueList" // flatten the fbValueList array into multiple documents
}, {
$group: {
_id: {
fname: "$fname",
rating:"$fbValueList.rating"
},
count: {
"$sum": 1 // this will give us the count per combination of fname and fbValueList.rating
}
}
}, {
$group: {
_id: "$_id.fname", // we only want one bucket per fname
rating_group: {
$push: { // we push the exact structure you were asking for
rating: "$_id.rating",
count: "$count"
}
},
count: {
$avg: "$count" // this will be the average across all entries in the fname bucket
}
}
}])
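The $unwind plus two $group stages above can be followed more easily with an in-memory sketch. The code below is plain JavaScript, not MongoDB code; the documents are trimmed to the fields the pipeline uses, and ratingGroups is a hypothetical helper name.

```javascript
// Sample documents reduced to fname and the ratings in fbValueList.
const fbresults = [
  {
    fname: "ABC",
    fbValueList: [
      { score: 20, rating: "Fair" },
      { score: 10, rating: "Fair" },
      { score: 88, rating: "Excellent" },
      { score: 75, rating: "Very Good" },
      { score: 72, rating: "Very Good" },
    ],
  },
  {
    fname: "ABC",
    fbValueList: [
      { score: 88, rating: "Excellent" },
      { score: 74, rating: "Good" },
      { score: 74, rating: "Good" },
    ],
  },
];

// Hypothetical helper mirroring $unwind -> $group -> $group with $push.
function ratingGroups(documents) {
  const perName = new Map(); // fname -> Map(rating -> count)
  for (const doc of documents) {
    // Equivalent of $unwind: handle each array element on its own.
    for (const fb of doc.fbValueList) {
      if (!perName.has(doc.fname)) perName.set(doc.fname, new Map());
      const counts = perName.get(doc.fname);
      // Equivalent of the first $group: count per (fname, rating) pair.
      counts.set(fb.rating, (counts.get(fb.rating) || 0) + 1);
    }
  }
  // Equivalent of the second $group: one bucket per fname with pushed entries.
  return [...perName].map(([fname, counts]) => ({
    _id: fname,
    rating_group: [...counts].map(([rating, count]) => ({ rating, count })),
  }));
}

const grouped = ratingGroups(fbresults);
console.log(JSON.stringify(grouped, null, 2));
```

For the two sample documents, each of the four ratings occurs twice for faculty "ABC", which matches the counts in the expected result from the question.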
This is a long aggregation pipeline, and some of its stages may be unnecessary, so please check and discard whichever are irrelevant.
NOTE: This will only work with MongoDB 3.4+.
You need to use $unwind, then $group, and $push the ratings with their counts.
matchAcademicYear = {
$match: {
academicyear:"2017-18", fdept:'Comp'
}
}
groupByNameAndRating = {
$group: {
_id: {
fname: "$fname", rating:"$fbValueList.rating"
},
count: {
"$sum":1
}
}
}
unwindRating = {
$unwind: "$_id.rating"
}
addFullRating = {
$addFields: {
"_id.full_rating": "$count"
}
}
replaceIdRoot = {
$replaceRoot: {
newRoot: "$_id"
}
}
groupByRatingAndFname = {
$group: {
_id: {
"rating": "$rating",
"fname": "$fname"
},
count: {"$sum": 1},
full_rating: {"$first": "$full_rating"}
}
}
addFullRatingAndCount = {
$addFields: {
"_id.count": "$count",
"_id.full_rating": "$full_rating" // the previous stage outputs full_rating; there is no full_count field
}
}
groupByFname = {
$group: {
_id: "$fname",
rating_group: { $push: {rating: "$rating", count: "$count"}},
count: { $first: "$full_rating"}
}
}
db.fbresults.aggregate([
matchAcademicYear,
groupByNameAndRating,
unwindRating,
addFullRating,
unwindRating,
replaceIdRoot,
groupByRatingAndFname,
addFullRatingAndCount,
replaceIdRoot,
groupByFname
])

Mongo Node driver how to get all fields of $max aggregate from an array of objects

I have a collection called "products" which has an array of "bids" objects.
I want to find the maximum bid for each product. For this I am aggregating products with $max on the $bids.bid_amount field. However, this only gives me the largest bid amount. How do I project all the fields of the bid that has the maximum amount?
Here is a sample document
{
"_id" : ObjectId("58109a5138fe12215cfdc064"),
"product_id" : 2,
"item_name" : "Auction Item1",
"item_description" : "Test",
"seller_name" : "ak#gmail.com",
"item_price" : "20",
"item_quantity" : 7,
"sale_type" : "Auction",
"posted_at" : "2016:10:26 04:58:09",
"expires_at" : "2016:10:30 04:58:09",
"bids" : [
{
"bid_id" : 1,
"bidder" : "ak#gmail.com",
"bid_amount" : 300,
"bit_time" : "2016:10:26 22:36:29"
},
{
"bid_id" : 2,
"bidder" : "ak#gmail.com",
"bid_amount" : 100,
"bit_time" : "2016:10:26 22:37:29"
}
],
"orders" : [
{
"buyer" : "ak#gmail.com",
"quantity" : "2"
},
{
"buyer" : "ak#gmail.com",
"quantity" : "3"
}
]
}
Here is my mongo query:
db.products.aggregate([
{
$project: {
bidMax: { $max: "$bids.bid_amount"}
}
}
])
which gives the following result:
{
"_id" : ObjectId("58109a5138fe12215cfdc064"),
"bidMax" : 300
}
db.products.aggregate([{$unwind:"$bids"},{$group:{_id:"$_id", sum:{$sum:"$bids.bid_amount"}}},{$project:{doc:"$$ROOT", _id:1, sum:1}},{$sort:{"sum":-1}},{$limit:1}])
which returns something like { "_id" : ObjectId("5811b667c50fb1ec88227860"), "sum" : 600, doc:{your document....} }
This should do it:
db.products.aggregate([{
$unwind: '$bids'
}, {
$group: {
_id: '$product_id',
maxBid: {
$max: '$bids.bid_amount'
}
}
}])
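db.collectionName.aggregate(
[
// In $group, $max does not look inside arrays, so unwind bids first
{ $unwind: "$bids" },
{
$group:
{
_id: "$product_id",
maxBidAmount: { $max: "$bids.bid_amount" }
}
}
]
)
Use this query and you will get the maximum bid amount per product.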
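The question asks for every field of the winning bid, not only its amount. A minimal sketch of that selection in plain JavaScript follows, together with one possible pipeline. Both are assumptions rather than something taken from the answers: highestBid is a hypothetical helper, and the pipeline relies on the $filter operator, which needs MongoDB 3.2 or later.

```javascript
// Sample product reduced to the fields relevant to bidding.
const product = {
  product_id: 2,
  bids: [
    { bid_id: 1, bidder: "ak#gmail.com", bid_amount: 300, bit_time: "2016:10:26 22:36:29" },
    { bid_id: 2, bidder: "ak#gmail.com", bid_amount: 100, bit_time: "2016:10:26 22:37:29" },
  ],
};

// Hypothetical helper: returns the whole bid object with the largest
// bid_amount, or null when there are no bids.
function highestBid(bids) {
  return bids.reduce(
    (best, bid) => (best === null || bid.bid_amount > best.bid_amount ? bid : best),
    null
  );
}

// Assumed pipeline (not from the answers above): $filter keeps the bids
// whose amount equals the maximum amount in the same array.
const maxBidPipeline = [
  {
    $project: {
      product_id: 1,
      maxBids: {
        $filter: {
          input: "$bids",
          as: "bid",
          cond: { $eq: ["$$bid.bid_amount", { $max: "$bids.bid_amount" }] },
        },
      },
    },
  },
];

const top = highestBid(product.bids);
console.log(top);
```

The pipeline variant returns an array because several bids can tie for the maximum amount, whereas the helper returns only the first such bid.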

Mongodb lookup + group by with mongoose

I have two different Mongoose collections, as follows:
{ "_id" : 1, "countryId" : 1, "price" : 12, "quantity" : 24 }
{ "_id" : 2, "countryId" : 2, "price" : 20, "quantity" : 1 }
{ "_id" : 3 }
{ "_id" : 4, "countryId" : 1, "price" : 12, "quantity" : 24 }
{ "_id" : 1, "id" : 1, description: "Colombia"}
{ "_id" : 3, "id" : 2, description: "Mexic" }
I'm trying to aggregate them so that I get a result like this:
{"country":"Colombia","total":48}
{"country":"Mexic","total":1}
I've tried many things, but it always fails. Here is the latest version of what I'm working on (I've changed the data, but you get the idea):
Model.aggregate([
{
$lookup:
{
from: "countryList",
localField: "countryId",
foreignField: "id",
as: "country"
},
{
$project: {
quantity:1, country:{$country:"$countryList.description"}
}
},{
$group:{
{ _id : null, qtyCountry: { $sum: "$quantity" } }
}
}
}],function (err, result) {
if (err) {
console.log(err);
} else {
console.log(result)
}
}
);
Is it even possible?
Yes, it is possible. You can try the following aggregation pipeline.
var pipeline = [
{"$match":{"countryId":{"$exists":true}}},
{"$group" : {"_id":"$countryId", "quantity":{"$sum":"$quantity"}}},
{"$lookup":{"from":"countryList","localField":"_id", "foreignField":"id","as":"country"}},
{"$unwind":"$country"},
{"$project": {"country":"$country.description", "total":"$quantity", _id:0}}
]
Sample output:
{ "country" : "Mexic", "total" : 1 }
{ "country" : "Colombia", "total" : 48 }
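To see what each stage of this pipeline contributes, the same steps can be mirrored in plain JavaScript on the sample data. This is an in-memory sketch only; totalsByCountry and the variable names are hypothetical.

```javascript
// First collection: sales-like documents, one of them without a countryId.
const sales = [
  { _id: 1, countryId: 1, price: 12, quantity: 24 },
  { _id: 2, countryId: 2, price: 20, quantity: 1 },
  { _id: 3 },
  { _id: 4, countryId: 1, price: 12, quantity: 24 },
];

// Second collection: the country list that $lookup joins against.
const countryList = [
  { _id: 1, id: 1, description: "Colombia" },
  { _id: 3, id: 2, description: "Mexic" },
];

// Hypothetical helper mirroring $match -> $group -> $lookup -> $unwind -> $project.
function totalsByCountry(orders, countries) {
  const totals = new Map();
  for (const order of orders) {
    // $match with $exists: skip documents without a countryId.
    if (order.countryId === undefined) continue;
    // $group: sum quantity per countryId.
    totals.set(order.countryId, (totals.get(order.countryId) || 0) + order.quantity);
  }
  const result = [];
  for (const [countryId, total] of totals) {
    // $lookup: find the country whose id equals the group key.
    const country = countries.find((c) => c.id === countryId);
    // $unwind: groups with no matching country are dropped.
    if (country) result.push({ country: country.description, total });
  }
  return result;
}

const totals = totalsByCountry(sales, countryList);
console.log(totals);
```

With Mongoose, the pipeline from the answer would typically be run as Model.aggregate(pipeline).exec(), where Model is the model for the first collection.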

How to use Mongoose sum operation?

I have a simple schema like this:
{
"productName": "pppppp"
"sku" : {
"carted" : [
{
"_id" : ObjectId("56c6d606c0987668109a21f7"),
"timestamp" : ISODate("2016-02-19T08:44:54.043+0000"),
"cartId" : "56c6c1fd60c4491c157e433d",
"qty" : NumberInt(2)
},
{
"_id" : ObjectId("56c6d653172fb54817ec2356"),
"timestamp" : ISODate("2016-02-19T08:46:11.902+0000"),
"cartId" : "56c6c1fd60c4491c157e433d",
"qty" : NumberInt(2)
},
{
"_id" : ObjectId("56c6d6a7172fb54817ec2358"),
"timestamp" : ISODate("2016-02-19T08:47:35.652+0000"),
"cartId" : "56c6c1fd60c4491c157e433d",
"qty" : NumberInt(2)
}
],
"qty" : NumberInt(14)
}
}
How can I view the product "pppppp" and show its quantity as 20, that is, sku.qty plus the sum of all sku.carted.qty values?
I want it to look like this:
{
"productName": "pppppp"
"qty" : 20
}
Please try this one with $group, $sum and $add
> db.collection.aggregate([
{$unwind: '$sku.carted'},
// sum the `qty` in the carted array, put this result to `qt`
{$group: {
_id: {productName: '$productName', q: '$sku.qty'},
qt: {$sum: '$sku.carted.qty'}
}},
// add the `qt` and `sku.qty`
// and reshape the output result.
{$project: {
_id: 0,
productName: '$_id.productName',
qty: {$add: ['$_id.q', '$qt']}
}}
]);
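The arithmetic the pipeline performs can be checked with a small in-memory sketch. This is plain JavaScript on a trimmed copy of the sample document, not MongoDB code, and totalQty is a hypothetical helper name.

```javascript
// Sample document reduced to the quantities involved.
const productDoc = {
  productName: "pppppp",
  sku: {
    carted: [{ qty: 2 }, { qty: 2 }, { qty: 2 }],
    qty: 14,
  },
};

// Hypothetical helper: sku.qty plus the sum of every sku.carted[].qty.
function totalQty(doc) {
  // Mirrors $unwind + $group with $sum over the carted items.
  const cartedTotal = doc.sku.carted.reduce((sum, item) => sum + item.qty, 0);
  // Mirrors the $add in the final $project stage.
  return { productName: doc.productName, qty: doc.sku.qty + cartedTotal };
}

const summary = totalQty(productDoc);
console.log(summary);
```

One difference worth knowing: $unwind drops documents whose array is empty by default, so the pipeline returns nothing for a product with no carted items, while this sketch returns sku.qty unchanged in that case.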
