MongoDB - get documents with max attribute per group in a collection - node.js

My data looks like this:
session, age, firstName, lastName
1, 28, John, Doe
1, 21, Donna, Keren
2, 32, Jenna, Haze
2, 52, Tommy, Lee
..
..
I'd like to get, for each session, the row with the largest age. So for the above input my output would look like:
sessionid, age, firstName, lastName
1, 28, John, Doe
2, 52, Tommy, Lee
because John has the largest age in the session=1 group and Tommy has the largest age in the session=2 group.
I need to export the result to a file (csv) and it may contain lots of records.
How can I achieve this?

MongoDB aggregation offers the $max operator, but in your case you want the "whole" record as it is. So the appropriate thing to do here is to $sort and then use the $first operator within a $group stage:
db.collection.aggregate([
    { "$sort": { "session": 1, "age": -1 } },
    { "$group": {
        "_id": "$session",
        "age": { "$first": "$age" },
        "firstName": { "$first": "$firstName" },
        "lastName": { "$first": "$lastName" }
    }}
])
So the $sort gets the order right, and the $group picks the first document it encounters for each grouping key (session) and takes its fields.
$first is used here because the $sort puts age in descending order. Equivalently, you can sort age in ascending order and use $last instead.
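To make the $sort + $group/$first behaviour concrete, here is a minimal in-memory sketch in plain JavaScript, with no database involved. It assumes the rows are already objects using the field names from the question; topByAge is a hypothetical helper, not a MongoDB driver API.

```javascript
// Hypothetical helper: mimics { $sort: { session: 1, age: -1 } } followed by
// { $group: { _id: "$session", <field>: { $first: "$<field>" } } } in memory.
function topByAge(rows) {
  // Sort by session ascending, then age descending (copy so the input is untouched).
  const sorted = [...rows].sort((a, b) => a.session - b.session || b.age - a.age);
  const seen = new Set();
  const result = [];
  for (const row of sorted) {
    // $first: keep only the first row encountered for each session.
    if (!seen.has(row.session)) {
      seen.add(row.session);
      result.push({ _id: row.session, age: row.age, firstName: row.firstName, lastName: row.lastName });
    }
  }
  return result;
}

const rows = [
  { session: 1, age: 28, firstName: "John", lastName: "Doe" },
  { session: 1, age: 21, firstName: "Donna", lastName: "Keren" },
  { session: 2, age: 32, firstName: "Jenna", lastName: "Haze" },
  { session: 2, age: 52, firstName: "Tommy", lastName: "Lee" },
];

console.log(topByAge(rows));
```

One caveat that carries over to MongoDB itself: $group does not guarantee the order of its output documents, so if the exported file should be ordered by session, append { "$sort": { "_id": 1 } } after the $group stage.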

You could try the aggregation query below, which uses the $max operator: http://docs.mongodb.org/manual/reference/operator/aggregation/max/
db.collection.aggregate([
    { $group: {
        "_id": "$session",
        "age": { $max: "$age" }
    }},
    { $out: "max_age" }
])
The results are written to a new collection, max_age, which you can then export to a CSV file (for example with the mongoexport tool).
Note: it will give only the session and max age and will not return other fields.
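Since only _id and age come back, turning the results into CSV is straightforward. This is a sketch assuming the grouped results fit in memory; maxAgeBySession and toCsv are hypothetical helpers. For a large max_age collection, MongoDB's mongoexport tool with --type=csv and --fields is likely the simpler route.

```javascript
// Hypothetical helper: mimics { $group: { _id: "$session", age: { $max: "$age" } } }.
// As the answer notes, only _id and age survive the grouping.
function maxAgeBySession(docs) {
  const maxBySession = new Map();
  for (const d of docs) {
    const current = maxBySession.get(d.session);
    if (current === undefined || d.age > current) maxBySession.set(d.session, d.age);
  }
  return [...maxBySession].map(([session, age]) => ({ _id: session, age }));
}

// Hypothetical helper: serialize flat objects to CSV, quoting values that
// contain commas, quotes or line breaks (inner quotes are doubled).
function toCsv(records, fields) {
  const escape = (value) => {
    const s = String(value);
    return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [fields.join(",")];
  for (const r of records) lines.push(fields.map((f) => escape(r[f])).join(","));
  return lines.join("\n");
}

const sessionDocs = [
  { session: 1, age: 28, firstName: "John", lastName: "Doe" },
  { session: 1, age: 21, firstName: "Donna", lastName: "Keren" },
  { session: 2, age: 32, firstName: "Jenna", lastName: "Haze" },
  { session: 2, age: 52, firstName: "Tommy", lastName: "Lee" },
];

console.log(toCsv(maxAgeBySession(sessionDocs), ["_id", "age"]));
// _id,age
// 1,28
// 2,52
```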

Related

Mongo DB sort by calculated field

I have a users collection in MongoDB.
I want to sort the documents in the collection by the difference between wins and losts.
For the document below, that would be 2 - 11 = -9.
How do I do that? (I am using MongoDB from Node.js.)
{
    "_id": {
        "$oid": "615ebae9f5b277c71b886906"
    },
    "name": "jack",
    "wins": {
        "$numberInt": "2"
    },
    "losts": {
        "$numberInt": "11"
    },
    "lastVisit": {
        "$date": {
            "$numberLong": "1633852348009"
        }
    }
}
$sort doesn't accept an expression to sort by, so the only solution I can think of is to add a temporary field, sort by it, and then remove it (the $set and $unset stages used here require MongoDB 4.2 or later).
Query
In $sort, use 1 for ascending or -1 for descending order.
db.users.aggregate([
    { "$set": { "dif": { "$subtract": ["$wins", "$losts"] } } },
    { "$sort": { "dif": 1 } },
    { "$unset": ["dif"] }
])
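To make the three stages concrete, here is a plain-JavaScript sketch of the same add-field, sort, remove-field sequence. sortByWinLossDelta and the players other than jack are made up for illustration, and wins/losts are assumed to be plain numbers once read from the database.

```javascript
// Hypothetical helper mimicking the three stages in memory:
//   { $set: { dif: { $subtract: ["$wins", "$losts"] } } }
//   { $sort: { dif: direction } }   (1 = ascending, -1 = descending)
//   { $unset: ["dif"] }
function sortByWinLossDelta(users, direction = 1) {
  return users
    .map((u) => ({ ...u, dif: u.wins - u.losts }))  // $set: add the computed field
    .sort((a, b) => direction * (a.dif - b.dif))    // $sort on the computed field
    .map(({ dif, ...rest }) => rest);               // $unset: drop it again
}

const players = [
  { name: "jack", wins: 2, losts: 11 }, // dif = -9, as in the question
  { name: "jill", wins: 7, losts: 3 },  // hypothetical, dif = 4
  { name: "joe", wins: 5, losts: 5 },   // hypothetical, dif = 0
];

console.log(sortByWinLossDelta(players).map((p) => p.name)); // [ 'jack', 'joe', 'jill' ]
```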

how to get the document details according to the field value in mongodb aggregate

I have a collection named users
var UserSchema = new Schema({
    name: String,
    age: Number,
    points: { type: Number, default: 0 }
})
All users have some points value, such as 5, 10, 20 or 50.
I want to count the number of users with 5 points, 10 points, etc., and also show the details of those users, i.e. who has 5 points, who has 10 points, and so on.
How do I write a query for that with aggregate?
You can write a $group stage that counts the users in each group with $sum and collects the values you need with the $push operator:
db.collection.aggregate([
    {
        "$group": {
            "_id": "$points",
            "count": { "$sum": 1 },
            "details": {
                "$push": {
                    "name": "$name",
                    "age": "$age"
                }
            }
        }
    }
])
In the above example, the documents are grouped by points, and for each group you get the number of users with those points (count) and an array containing the name and age of each of them (details).
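A $group by points can carry a count: { $sum: 1 } next to the $push, which covers both parts of the question. Here is an in-memory sketch of that grouping in plain JavaScript; groupByPoints and the sample users are hypothetical.

```javascript
// Hypothetical helper mimicking, in memory:
//   { $group: { _id: "$points", count: { $sum: 1 },
//               details: { $push: { name: "$name", age: "$age" } } } }
function groupByPoints(users) {
  const groups = new Map();
  for (const u of users) {
    if (!groups.has(u.points)) groups.set(u.points, { _id: u.points, count: 0, details: [] });
    const g = groups.get(u.points);
    g.count += 1;                                 // $sum: 1
    g.details.push({ name: u.name, age: u.age }); // $push
  }
  return [...groups.values()];
}

const members = [
  { name: "Ann", age: 30, points: 5 },
  { name: "Bob", age: 25, points: 10 },
  { name: "Cid", age: 41, points: 5 },
];

console.log(JSON.stringify(groupByPoints(members)));
```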

Mongoose - Aggregation of two queries with condition

I have two collections that are connected by the garden's id: a list of gardens and a list of allocations, each storing the start and end date of the allocation. I can tell whether a garden is allocated by checking if today falls between those two dates in the allocation collection.
Garden
{
"_id": "5b98df3c9275f2291c0d7dc3",
"id": "h1",
"size": 43
}
Allocation
{
"_id": "5b9bcb8ecb9dee0015150549",
"user": "5b9a2cd21eb58700141a3449",
"garden": "5b98df5c9275f2291c0d7dc6",
"start_date":"2018-09-14T00:00:00.000Z",
"end_date": "2018-11-14T00:00:00.000Z"
}
How can I return all the existing gardens with an additional field 'occupied' set to true or false, depending on whether there is an allocation for them where today falls between start_date and end_date?
I'd like to get an array of gardens with the following data
{
"_id": "5b98df3c9275f2291c0d7dc3",
"id": "h1",
"size": 43,
"occupied": true
}
You can do it in one of two ways.
var today = ISODate(); // current date in the mongo shell; from Node.js use new Date()
Using $lookup
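db.garden.aggregate([
    {"$lookup":{
        "from":"allocation",
        "localField":"_id",
        "foreignField":"garden",
        "as":"garden"
    }},
    // keep gardens that have no allocations at all
    {"$unwind":{"path":"$garden","preserveNullAndEmptyArrays":true}},
    {"$addFields":{
        "occupied":{
            "$and":[
                {"$lte":["$garden.start_date",today]},
                {"$gt":["$garden.end_date",today]}
            ]
        }
    }},
    {"$project":{"garden":0}}
])
Note that with this approach a garden that has several allocations is returned once per allocation; the $lookup with pipeline version below returns each garden exactly once.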
Using $lookup with pipeline
db.garden.aggregate([
    {"$lookup":{
        "from":"allocation",
        "let":{"garden_id":"$_id"},
        "pipeline":[
            {"$match":{
                "$expr":{"$eq":["$$garden_id","$garden"]},
                "start_date":{"$lte":today},
                "end_date":{"$gt":today}
            }}
        ],
        "as":"garden"
    }},
    {"$addFields":{
        "occupied":{"$gt":[{"$size":"$garden"},0]}
    }},
    {"$project":{"garden":0}}
])
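The occupancy rule both pipelines are meant to express (some allocation for the garden has start_date <= today < end_date) can be sketched in plain JavaScript. withOccupancy, the g1/g2 ids and the reference dates are hypothetical, and treating end_date as exclusive is an assumption. The sketch parses the dates with new Date(); in MongoDB itself these comparisons only behave this way if start_date and end_date are stored as BSON dates rather than strings.

```javascript
// Hypothetical helper: a garden is occupied if at least one allocation for it
// satisfies start_date <= today < end_date (end_date treated as exclusive).
function withOccupancy(gardens, allocations, today = new Date()) {
  return gardens.map((g) => ({
    ...g,
    occupied: allocations.some(
      (a) =>
        a.garden === g._id &&
        new Date(a.start_date) <= today &&
        new Date(a.end_date) > today
    ),
  }));
}

const gardens = [
  { _id: "g1", id: "h1", size: 43 },
  { _id: "g2", id: "h2", size: 20 },
];
const allocations = [
  { garden: "g1", start_date: "2018-09-14T00:00:00.000Z", end_date: "2018-11-14T00:00:00.000Z" },
];

console.log(withOccupancy(gardens, allocations, new Date("2018-10-01T00:00:00.000Z")));
```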

Sorting and placing matched values on top

I am using MongoDB and Node.js to display a record set in a page. I have got as far as displaying them on the page alphabetically, but I would like to display one row (the "default" row) at the top, and all the others alphabetically beneath it.
I know, I know, Mongo is definitely not SQL, but in SQL I would have done something like this:
SELECT *
FROM themes
ORDER BY name != "Default", name ASC;
or perhaps even
SELECT * FROM themes WHERE name = "Default"
UNION
SELECT * FROM themes WHERE name != "Default" ORDER BY name ASC;
I have tried a few variations of Mongo's sorting options, such as
"$orderby": {'name': {'$eq': 'Default'}, 'name': 1}}
but without any luck so far. I have searched a lot for approaches to this problem but haven't found anything. I am new to Mongo, so perhaps I'm going about this all wrong.
My basic code at the moment:
var db = req.db;
var collection = db.get('themes');
collection.find({"$query": {}, "$orderby": {'name': 1}}, function(e, results) {
res.render('themes-saved', {
title: 'Themes',
section: 'themes',
page: 'saved',
themes: results
});
});
You cannot do that directly in MongoDB, because a sort must be on a value already present in a field of your documents. What you "can" do is $project a "weight" field that gives the record(s) matching your condition a higher value. For example:
collection.aggregate(
[
{ "$project": {
"each": 1,
"field": 1,
"youWant": 1,
"name": 1,
"weight": {
"$cond": [
{ "$eq": [ "$name", "Default" ] },
10,
0
]
}
}},
{ "$sort": { "weight": -1, "name": 1 } }
],
function(err,results) {
}
);
So you test the field for the value you want to match (or apply other logic), assign a higher weight to the documents that match, and a lower weight (0 here) to those that do not.
You then $sort on that weight first (descending, from highest, in this case), so the matching documents are listed before the others, followed by name ascending for the rest.
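The weighting trick can be checked without a database: this sketch adds the same 10/0 weight in plain JavaScript and sorts by weight descending, then name ascending. defaultFirst and the sample theme names are hypothetical, and names are compared with < and >, which for ASCII names matches MongoDB's default (no collation) string order.

```javascript
// Hypothetical helper mimicking, in memory:
//   { $project: { name: 1, weight: { $cond: [ { $eq: ["$name", "Default"] }, 10, 0 ] } } }
//   { $sort: { weight: -1, name: 1 } }
function defaultFirst(themes) {
  return themes
    .map((t) => ({ ...t, weight: t.name === "Default" ? 10 : 0 }))
    .sort(
      (a, b) =>
        b.weight - a.weight || // weight descending
        (a.name < b.name ? -1 : a.name > b.name ? 1 : 0) // then name ascending
    );
}

const themeDocs = [{ name: "Ocean" }, { name: "Default" }, { name: "Autumn" }];
console.log(defaultFirst(themeDocs).map((t) => t.name)); // [ 'Default', 'Autumn', 'Ocean' ]
```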

Querying mongodb for dups but allow certain duplicates based on timestamps

So I have a set of data with timestamps associated with it. I want Mongo to aggregate the entries that have duplicates within a 3-minute window. Here is an example of what I mean:
Original Data:
[{"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z"},
{"fruit" : "apple", "timestamp": "2014-07-17T06:47:18Z"},
{"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z"}]
After querying, it would be:
[{"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z"},
{"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z"}]
The second entry is dropped because it falls within the 3-minute bubble created by the first entry. I already have code that aggregates and removes duplicates with the same fruit, but now I only want to combine the ones that fall within the same timestamp bubble.
We should be able to do this! First, let's split an hour into 3-minute 'bubbles':
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57]
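Now, to group these documents we need to modify the timestamp a little. As far as I know this isn't currently possible with the aggregation framework, so instead I will use the group() method. (Note that group() was deprecated in MongoDB 3.4 and removed in 4.2; on MongoDB 5.0 and later, $dateTrunc with unit "minute" and binSize 3 can do this rounding inside an aggregation pipeline.)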
In order to group fruits within the same time period, we round each timestamp down to the start of its 3-minute 'bubble'. After zeroing the seconds, we can do this with timestamp.minutes -= (timestamp.minutes % 3).
Here is the resulting query:
db.collection.group({
keyf: function (doc) {
var timestamp = new ISODate(doc.timestamp);
// seconds must be equal across a 'bubble'
timestamp.setUTCSeconds(0);
// round down to the nearest 3 minute 'bubble'
var remainder = timestamp.getUTCMinutes() % 3;
var bubbleMinute = timestamp.getUTCMinutes() - remainder;
timestamp.setUTCMinutes(bubbleMinute);
return { fruit: doc.fruit, 'timestamp': timestamp };
},
reduce: function (curr, result) {
result.sum += 1;
},
initial: {
sum : 0
}
});
Example results:
[
{
"fruit" : "apple",
"timestamp" : ISODate("2014-07-17T06:45:00Z"),
"sum" : 2
},
{
"fruit" : "apple",
"timestamp" : ISODate("2014-07-17T06:54:00Z"),
"sum" : 1
},
{
"fruit" : "banana",
"timestamp" : ISODate("2014-07-17T09:03:00Z"),
"sum" : 1
},
{
"fruit" : "orange",
"timestamp" : ISODate("2014-07-17T14:24:00Z"),
"sum" : 2
}
]
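Here is the same round-down-and-count logic as the group() call, sketched in plain JavaScript so it can be run without a MongoDB shell. bubbleStart and countPerBubble are hypothetical helpers; timestamps are assumed to be ISO-8601 strings in UTC, as in the question, and the returned timestamps keep toISOString()'s milliseconds.

```javascript
// Hypothetical helper: round a timestamp down to the start of its bubble,
// using the same arithmetic as the keyf function (UTC).
function bubbleStart(isoString, minutes = 3) {
  const t = new Date(isoString);
  t.setUTCSeconds(0, 0);              // zero seconds and milliseconds
  const m = t.getUTCMinutes();
  t.setUTCMinutes(m - (m % minutes)); // round down to the bubble start
  return t.toISOString();
}

// Count documents per (fruit, bubble), like group()'s reduce/initial pair.
function countPerBubble(docs) {
  const counts = new Map();
  for (const d of docs) {
    const key = JSON.stringify([d.fruit, bubbleStart(d.timestamp)]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, sum]) => {
    const [fruit, timestamp] = JSON.parse(key);
    return { fruit, timestamp, sum };
  });
}

const fruitDocs = [
  { fruit: "apple", timestamp: "2014-07-17T06:45:18Z" },
  { fruit: "apple", timestamp: "2014-07-17T06:47:18Z" },
  { fruit: "apple", timestamp: "2014-07-17T06:55:18Z" },
];

console.log(countPerBubble(fruitDocs));
```

One caveat, which applies to the group() query as well: these are fixed bubbles aligned to :00, :03, :06 and so on, not a window that starts at each first entry. For example, 06:44:50 and 06:45:10 are only 20 seconds apart but land in the 06:42 and 06:45 bubbles respectively.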
To make this easier you could precompute the 'bubble' timestamp and insert it into the document as a separate field. The documents you create would look something like this:
[
{"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z", "bubble": "2014-07-17T06:45:00Z"},
{"fruit" : "apple", "timestamp": "2014-07-17T06:47:18Z", "bubble": "2014-07-17T06:45:00Z"},
{"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z", "bubble": "2014-07-17T06:54:00Z"}
]
Of course this takes up more storage. However, with this document structure you can use the aggregate function[0].
db.collection.aggregate(
[
{ $group: { _id: { fruit: "$fruit", bubble: "$bubble"} , sum: { $sum: 1 } } },
]
)
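Computing the bubble field at write time could look like this. It is a sketch assuming documents arrive with ISO-8601 UTC timestamp strings; addBubble and BUBBLE_WIDTH_MINUTES are hypothetical names, and the output string format mirrors the example documents above.

```javascript
const BUBBLE_WIDTH_MINUTES = 3; // bubble size from the question

// Hypothetical helper: return a copy of the document with a precomputed
// "bubble" field, so a plain $group on { fruit, bubble } works later.
function addBubble(doc) {
  const t = new Date(doc.timestamp);
  t.setUTCSeconds(0, 0);
  const m = t.getUTCMinutes();
  t.setUTCMinutes(m - (m % BUBBLE_WIDTH_MINUTES));
  // toISOString() always prints milliseconds; drop them to match "...:00Z".
  const bubble = t.toISOString().replace(".000Z", "Z");
  return { ...doc, bubble };
}

const incoming = [
  { fruit: "apple", timestamp: "2014-07-17T06:45:18Z" },
  { fruit: "apple", timestamp: "2014-07-17T06:47:18Z" },
  { fruit: "apple", timestamp: "2014-07-17T06:55:18Z" },
];

console.log(incoming.map(addBubble));
```

The enriched documents could then be passed to the driver's insertMany, and the $group above runs directly against the stored bubble field.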
Hope that helps!
[0] MongoDB aggregation comparison: group(), $group and MapReduce
