I have a simple db layout like this:
client
id
sex (male/female)
birthday (date)
client
id
sex (male/female)
birthday (date)
(...)
I'm trying to write an aggregation command that outputs how many male and female clients I've got, and I'd also like to output the average age of males and females. I'm not sure whether I can do this in the same command or need two separate ones.
// Count of males/females, average age
Clients.aggregate({
$project : {"sex" : 1,
"sexCount" : 1,
"birthday" : 1,
"avgAge" : 1
}
},
{
$match: {"sex": {$exists: true}}
},
{
$group: {
_id : "$sex",
sexCount : { $sum: 1 },
avgAge : { $avg: "$birthday" },
}
},
{ $sort: { _id: 1 } }
, function(err, sex_dbres) {
if (err)
throw err;
else{
(...)
}
});
With the code above I get the counts of male/female, but avgAge comes as 0. Any ideas?
Many thanks
The answer would be much simpler if you were storing age in the original document (as Dmitry posted, you could then just do a straight avgAge:{$avg:"$age"} in your $group step).
Aggregation Framework is pretty nifty though and has many cool operators which allow you to compute this missing age field "on the fly".
I'm going to store each step of the aggregation in a variable so it's easier to see what's going on:
today = new Date();
// split today and bday into numerical year and numerical day-of-the-year
project1= {
"$project" : {
"sex" : 1,
"todayYear" : {
"$year" : today
},
"todayDay" : {
"$dayOfYear" : today
},
"by" : {
"$year" : "$bday"
},
"bd" : {
"$dayOfYear" : "$bday"
}
}
};
// calculate age in days by subtracting bday in days from today in days
project2 = {
"$project" : {
"sex" : 1,
"age" : {
"$subtract" : [
{
"$add" : [
{
"$multiply" : [
"$todayYear",
365
]
},
"$todayDay"
]
},
{
"$add" : [
{
"$multiply" : [
"$by",
365
]
},
"$bd"
]
}
]
}
}
};
// sum up for each sex the count and compute avg age (in days)
group = {
"$group" : {
"_id" : "$sex",
"total" : {
"$sum" : 1
},
"avgAge" : {
"$avg" : "$age"
}
}
};
// divide days by 365 to get age in years.
project3 = {
"$project" : {
"_id" : 0,
"sex" : "$_id",
"total" : 1,
"averageAge" : {
"$divide" : [
"$avgAge",
365
]
}
}
};
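To make the arithmetic in project1, project2 and project3 concrete, here is a minimal plain-JavaScript sketch of the same calculation for a single birthday. The helper names dayOfYear and approxAgeYears are made up for illustration and are not part of MongoDB. Like the pipeline, the sketch treats every year as 365 days, so the result is an approximation.

```javascript
// Day of the year (1-366) in UTC, analogous to what $dayOfYear extracts.
function dayOfYear(date) {
  const startOfYear = Date.UTC(date.getUTCFullYear(), 0, 1);
  return Math.floor((date.getTime() - startOfYear) / 86400000) + 1;
}

// Same formula as project2 followed by the division in project3:
// (todayYear*365 + todayDay) - (by*365 + bd), then divided by 365.
function approxAgeYears(today, bday) {
  const todayDays = today.getUTCFullYear() * 365 + dayOfYear(today);
  const bdayDays = bday.getUTCFullYear() * 365 + dayOfYear(bday);
  return (todayDays - bdayDays) / 365;
}

// Hypothetical "today"; the birthday is one of the sample birthdays used in this answer.
const sampleToday = new Date("2013-02-02T08:00:00Z");
console.log(approxAgeYears(sampleToday, new Date("2000-02-02T08:00:00Z"))); // 13
```

Both dates fall on day 33 of their year, so only the year difference remains.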
Now you can run the aggregation:
> db.client.find({},{_id:0})
{ "sex" : "male", "bday" : ISODate("2000-02-02T08:00:00Z") }
{ "sex" : "male", "bday" : ISODate("1987-02-02T08:00:00Z") }
{ "sex" : "female", "bday" : ISODate("1989-02-02T08:00:00Z") }
{ "sex" : "female", "bday" : ISODate("1993-11-02T08:00:00Z") }
> db.client.aggregate([ project1, project2, group, project3 ])
{
"result" : [
{
"sex" : "female",
"total" : 2,
"averageAge" : 21.34109589041096
},
{
"sex" : "male",
"total" : 2,
"averageAge" : 19.215068493150685
}
],
"ok" : 1
}
>
The reason this is not simple is that the Aggregation Framework currently does not support direct subtraction of dates. Please vote for https://jira.mongodb.org/browse/SERVER-6239, which is targeted for the next major release. Once it's implemented, it should allow subtraction of dates directly (though you will still need to convert the result to the appropriate granularity, probably years in this case).
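On server versions where $subtract accepts two dates (the behaviour the ticket asks for), the four stages can shrink to two. The following is only a sketch under that assumption: subtracting one date from another yields a difference in milliseconds, which is then divided by an approximate year length. The function name buildAgePipeline and the constant MS_PER_YEAR are illustrative, and the field name bday follows the sample data above.

```javascript
// Approximate length of a year in milliseconds (365.25 days).
const MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

// Build a pipeline that derives each client's age from the date difference.
function buildAgePipeline(today) {
  return [
    {
      $project: {
        sex: 1,
        // date minus date gives milliseconds; convert to years
        age: { $divide: [{ $subtract: [today, "$bday"] }, MS_PER_YEAR] }
      }
    },
    {
      // count and average age per sex
      $group: {
        _id: "$sex",
        total: { $sum: 1 },
        averageAge: { $avg: "$age" }
      }
    }
  ];
}

const agePipeline = buildAgePipeline(new Date());
// In the mongo shell: db.client.aggregate(agePipeline)
```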
The date object can't be "averaged", but numbers can. You can convert your dates to the timestamp value, and then find average from it. But still that won't be an average age, you'll need to subtract result from the current date outside of the aggregation function.
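A rough sketch of that idea in plain JavaScript, run on the client rather than inside the aggregation: average the birthdays as millisecond timestamps, then subtract the average from the current date. The sample documents and the names averageAgeYears and YEAR_MS are hypothetical. In practice the average birthday timestamp would come from your query results, grouped by sex.

```javascript
// Approximate length of a year in milliseconds (365.25 days).
const YEAR_MS = 365.25 * 24 * 60 * 60 * 1000;

// Hypothetical sample data; the field name follows the question.
const maleClients = [
  { sex: "male", birthday: new Date("2000-02-02T00:00:00Z") },
  { sex: "male", birthday: new Date("1990-02-02T00:00:00Z") }
];

// Average the birthdays as timestamps, then turn the result into an age.
function averageAgeYears(docs, now) {
  const sum = docs.reduce((acc, doc) => acc + doc.birthday.getTime(), 0);
  const averageBirthdayMs = sum / docs.length;
  return (now.getTime() - averageBirthdayMs) / YEAR_MS;
}

// Roughly 25 for the two sample clients on 2020-02-02.
console.log(averageAgeYears(maleClients, new Date("2020-02-02T00:00:00Z")));
```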
Another option is to assume that age can be calculated using only year part of the date (that is, if I was born on December 1, 2000, in today's report I'll be 12 years old, not 11). In this case you can use date operators to extract year value.
{
$project : {"sex" : 1,
"sexCount" : 1,
"year" : {$year: "$birthday"},
}
},
{
$project : {"sex" : 1,
"sexCount" : 1,
"age" : {$subtract: [2012, '$year']}, // 2012 = current year
}
},
We are working on a project with about 300 documents in a main collection, each with a currentValue field. To track the history of each document in the main collection, we created another collection named history, which holds approximately 6.5 million documents.
For each input to the system we have to add around 30 history items and update the currentValue field of the main collection. We tried the computed field design pattern for currentValue, which led to WriteConflict errors under concurrency (around 1000 concurrent requests).
Then we tried to compute the currentValue field with a sum (over the amount field) grouped by the mainId field on the history collection, but that takes too long (> 3s).
Main collection docs:
{
"_id" : ObjectId(...),
"stock" : [
{
"currentAmount" : -313430.0,
"lastPrice" : -10.0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
},
{
"currentAmount" : 30,
"lastPrice" : 0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
},
.
.
.
],
"name" : "name",
}
History collection docs:
{
"_id" : ObjectId("..."),
"mainId" : ObjectId("..."),
"amount" : 5,
}
If you have any other ideas for handling this situation (at the application or database level), I would be thankful.
UPDATE 1
The update query if I use computed pattern would be:
mainCollection.findOneAndUpdate(
{
$and: [
{ _id: id },
{ "stock.storage": fromId },
{ "stock.deletedAt": null }
],
},
{
$inc: {
"stock.$.currentAmount": -1 * amount,
}
},
{
session
}
)
And the aggregation pipeline, if I want to calculate currentAmount every time:
mainCollection.aggregate([
{
$match: {
branch: new ObjectId("...")
}
},
{
$group: {
_id: "$ingredient",
currentAmount: {
$sum: "$amount"
}
}
}])
For a computed field, the MongoDB design patterns suggest the Computed Pattern:
"The Computed Pattern is utilized when we have data that needs to be computed repeatedly in our application."
Applied to your main collection, it would look like this:
// your main collection will look like this
{
"_id" : ObjectId(...),
"stock" : [
{
"currentAmount" : -313430.0,
"lastPrice" : -10.0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
},
{
"currentAmount" : 30,
"lastPrice" : 0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
}
],
"totalAmount": 20000 // for example
}
For concurrent writes, however, there is a better way to solve this problem: cumulative summation. With this approach, each history document stores the sum of all previous inputs plus the current input:
{
"_id" : ObjectId("..."),
"mainId" : ObjectId("..."),
"amount" : 5,
"cumulative": 15 // sum of last documents input
}
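As an illustration of how the cumulative field relates to the amounts, here is a small in-memory sketch. It only shows the arithmetic. It does not address how to make reading the previous total and inserting the new document safe under concurrency, which is left open here. The function name withCumulative and the sample amounts are hypothetical.

```javascript
// Given the history entries for one mainId in insertion order,
// attach a running total to each entry.
function withCumulative(entries) {
  let running = 0;
  return entries.map((entry) => {
    running += entry.amount;
    return { ...entry, cumulative: running };
  });
}

const historyEntries = withCumulative([{ amount: 4 }, { amount: 6 }, { amount: 5 }]);
// The current value is simply the cumulative field of the newest entry.
console.log(historyEntries[historyEntries.length - 1].cumulative); // 15
```

With this layout, reading the current value means fetching only the most recent history document for a given mainId instead of summing all of them.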
I have a Mongoose Model for users. Each user has a certain amount of points. I'd like to create a field that is the user's rank, where:
rank = user position sorted by points / total users
Let's suppose the user model looks like this:
{
'name': 'bob',
'points': 15,
'rank': 9/15,
}
(I realize that the fraction would really be a decimal when stored).
Is there a way that I can update all of these users by:
1) Sorting them by points
2) Get a user's index in this sorted list
3) Divide that index by the total number of items in the list
I'm not sure what kind of mongo operators are out there for finding a doc's position in query results and for finding the total size of the query results.
Using the previous answer is not a good idea. It requires recalculating rank after each update of points values.
MongoDB 5.0 introduced the $rank window operator:
db.users.aggregate([
{
$setWindowFields: {
sortBy: { points: 1 },
output: {
rank: {
$rank: {}
}
}
}
}
])
will output
{ "points": 140, "rank": 1 },
{ "points": 160, "rank": 2 },
{ "points": 170, "rank": 3 },
{ "points": 180, "rank": 4 },
{ "points": 220, "rank": 5 }
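To get the fraction the question asks for (position divided by total users), the same stage can also count the documents. This is a sketch assuming MongoDB 5.0+, where $count is available as a window operator and an output field without an explicit window covers the whole partition. The output field names position and total, and the variable name rankFractionPipeline, are made up for this example.

```javascript
const rankFractionPipeline = [
  {
    $setWindowFields: {
      sortBy: { points: -1 },      // highest points gets position 1
      output: {
        position: { $rank: {} },   // 1-based position in the sorted list
        total: { $count: {} }      // number of users in the single partition
      }
    }
  },
  // rank = position / total, as defined in the question
  { $set: { rank: { $divide: ["$position", "$total"] } } },
  // drop the helper fields again
  { $unset: ["position", "total"] }
];
// In the shell: db.users.aggregate(rankFractionPipeline)
```

Because the rank is computed at query time, nothing has to be rewritten when a user's points change.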
You can do this using a couple of queries and a bit of JavaScript. Expanding on the steps you outlined, what you need to do is:
1) Find all of the user documents, sort them by points in descending order and assign the results to a cursor. You might want to ensure that you have an index on this field to make this query run faster.
2) Get the count for the number of documents returned.
3) Keep track of the position of the document within the results using an index.
4) Iterate through the documents, calculating the rank using the count and the index, and updating the corresponding user's rank with the result of that calculation.
In the mongo shell, the code would look something like the following.
var c = db.user.find().sort({ "points": -1 });
var count = c.count();
var i = 1;
while (c.hasNext()) {
var rank = i / count;
var user = c.next();
db.user.update(
{ "_id": user._id },
{ "$set": { "rank": rank } }
);
i++;
}
So if you had the following three users in your collection:
{
"_id" : ObjectId("54f0af63cfb269d664de0b4e"),
"name" : "bob",
"points" : 15,
"rank" : 0
}
{
"_id" : ObjectId("54f0af7fcfb269d664de0b4f"),
"name" : "arnold",
"points" : 20,
"rank" : 0
}
{
"_id" : ObjectId("54f0af95cfb269d664de0b50"),
"name" : "claus",
"points" : 10,
"rank" : 0
}
After the update their documents would look like this:
{
"_id" : ObjectId("54f0af63cfb269d664de0b4e"),
"name" : "bob",
"points" : 15,
"rank" : 0.6666666666666666
}
{
"_id" : ObjectId("54f0af7fcfb269d664de0b4f"),
"name" : "arnold",
"points" : 20,
"rank" : 0.3333333333333333
}
{
"_id" : ObjectId("54f0af95cfb269d664de0b50"),
"name" : "claus",
"points" : 10,
"rank" : 1
}
I have a collection with a sub-document consisting of more than 40K records.
My aggregate query takes about 300 seconds. I have tried optimizing it using compound as well as multi-key indexes, which brings it down to 180 seconds.
I still need to reduce the query execution time.
Here is a document from my collection:
{
"_id" : ObjectId("545b32cc7e9b99112e7ddd97"),
"grp_id" : 654,
"user_id" : 2,
"mod_on" : ISODate("2014-11-06T08:35:40.857Z"),
"crtd_on" : ISODate("2014-11-06T08:35:24.791Z"),
"uploadTp" : 0,
"tp" : 1,
"status" : 3,
"id_url" : [
{"mid":"xyz12793"},
{"mid":"xyz12794"},
{"mid":"xyz12795"},
{"mid":"xyz12796"}
],
"incl" : 1,
"total_cnt" : 25,
"succ_cnt" : 25,
"fail_cnt" : 0
}
And the following is my query:
db.member_id_transactions.aggregate([ { '$match':
{ id_url: { '$elemMatch': { mid: 'xyz12794' } } } },
{ '$unwind': '$id_url' },
{ '$match': { grp_id: 654, 'id_url.mid': 'xyz12794' } } ])
Has anyone faced the same issue?
Here's the output of the aggregate query with the explain option:
{
"result" : [
{
"_id" : ObjectId("546342467e6d1f4951b56285"),
"grp_id" : 685,
"user_id" : 2,
"mod_on" : ISODate("2014-11-12T11:24:01.336Z"),
"crtd_on" : ISODate("2014-11-12T11:19:34.682Z"),
"uploadTp" : 1,
"tp" : 1,
"status" : 3,
"id_url" : [
{"mid":"xyz12793"},
{"mid":"xyz12794"},
{"mid":"xyz12795"},
{"mid":"xyz12796"}
],
"incl" : 1,
"__v" : 0,
"total_cnt" : 21406,
"succ_cnt" : 21402,
"fail_cnt" : 4
}
],
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("545c8d37ab9cc679383a1b1b")
}
}
One way to further reduce the number of records being filtered is to include the field grp_id in the first $match operator.
db.member_id_transactions.aggregate([
{$match:{ "id_url.mid": 'xyz12794',"grp_id": 654 } },
{$unwind: "$id_url" },
{$match: { "id_url.mid": "xyz12794" } }
])
See how the performance is now. Add grp_id to the index to get better response time.
The above aggregation query, though it works, is unnecessary. Since you are not altering the structure of the document, and you expect only one element in the array to match the filter condition, you could just use a simple find with a projection.
db.member_id_transactions.find(
{ "id_url.mid": "xyz12794","grp_id": 654 },
{"_id":0,"grp_id":1,"id_url":{$elemMatch:{"mid":"xyz12794"}},
"user_id":1,"mod_on":1,"crtd_on":1,"uploadTp":1,
"tp":1,"status":1,"incl":1,"total_cnt":1,
"succ_cnt":1,"fail_cnt":1
}
)
I have a database with 800+ different bars, clubs and restaurants across Australia.
I want to build a list of links for my website counting the number of different venues across different suburbs and primary categories.
Like this:
Restaurants, Bowen Hills (15)
Restaurants, Dawes Point (6)
Clubs, Sydney (138)
I could do it the hard way by first getting all venues. Then run a Venue.distinct('details.location.suburb') to get all the unique suburbs.
From here I could run subsequent queries to get the count for the number of venues in that particular suburb and category.
It will be a lot of calls though. There's got to be a better way?
Can the Mongo aggregation framework help here?
It seems to be impossible to do this in a single query.
Here's the Venue model:
{
"name" : "Johnny's Bar & Grill",
"meta" : {
"category" : {
"all" : [
"restaurant",
"bar"
],
"primary" : "restaurant"
}
},
"details" : {
"location" : {
"streetNumber" : "180",
"streetName" : "abbotsford road",
"suburb" : "bowen hills",
"city" : "brisbane",
"postcode" : "4006",
"state" : "qld",
"country" : "australia"
},
"contact" : {
"phone" : [
"(07) 5555 5555"
]
}
}
}
Here's the prettified solution from BatScream that I ended up using:
Venue.aggregate([
{
$group: {
_id: {
primary: '$meta.category.primary',
suburb: '$details.location.suburb',
country: '$details.location.country',
state: '$details.location.state',
city: '$details.location.city'
},
count: {
$sum: 1
},
type: {
$first: '$meta.category.primary'
}
}
},
{
$sort: {
count: -1
}
},
{
$limit: 50
},
// Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document.
{
$project: {
_id: 0,
type : '$type',
location : '$_id.suburb',
count: 1
}
}
],
function(err, res){
next(err, res);
});
You can get a very useful and easily transformable output using the following aggregation operation.
Group the records based on their country, category, state, city and suburb.
Get the count of the records in each group.
Obtain the type of the group from the first record of the group.
Project the necessary fields.
Code:
db.collection.aggregate([
{$group:{"_id":{"primary":"$meta.category.primary",
"suburb":"$details.location.suburb",
"country":"$details.location.country",
"state":"$details.location.state",
"city":"$details.location.city"},
"count":{$sum:1},
"type":{$first:"$meta.category.primary"}}},
{$sort:{"count":-1}},
{$project:{"_id":0,
"type":"$type",
"location":"$_id.suburb",
"count":1}}
])
Sample output:
{ "count" : 1, "type" : "restaurant", "location" : "bowen hills" }
In my collection each document has two dates, modified and sync. I would like to find those where modified > sync, or where sync does not exist.
I tried
{'modified': { $gt : 'sync' }}
but it's not showing what I expected. Any ideas?
Thanks
You cannot compare a field with the value of another field using normal query matching. However, you can do this with the aggregation framework:
db.so.aggregate( [
{ $match: …your normal other query… },
{ $match: { $eq: [ '$modified', '$sync' ] } }
] );
I put …your normal other query… in there as you can make that bit use the index. So if you want to do this for only documents where the name field is charles you can do:
db.so.createIndex( { name: 1 } ); // ensureIndex in older shells
db.so.aggregate( [
{ $match: { name: 'charles' } },
{ $project: {
modified: 1,
sync: 1,
name: 1,
eq: { $cond: [ { $gt: [ '$modified', '$sync' ] }, 1, 0 ] }
} },
{ $match: { eq: 1 } }
] );
With the input:
{ "_id" : ObjectId("520276459bf0f0f3a6e4589c"), "modified" : 73845345, "sync" : 73234 }
{ "_id" : ObjectId("5202764f9bf0f0f3a6e4589d"), "modified" : 4, "sync" : 4 }
{ "_id" : ObjectId("5202765b9bf0f0f3a6e4589e"), "modified" : 4, "sync" : 4, "name" : "charles" }
{ "_id" : ObjectId("5202765e9bf0f0f3a6e4589f"), "modified" : 4, "sync" : 45, "name" : "charles" }
{ "_id" : ObjectId("520276949bf0f0f3a6e458a1"), "modified" : 46, "sync" : 45, "name" : "charles" }
This returns:
{
"result" : [
{
"_id" : ObjectId("520276949bf0f0f3a6e458a1"),
"modified" : 46,
"sync" : 45,
"name" : "charles",
"eq" : 1
}
],
"ok" : 1
}
If you want any more fields, you need to add them in the $project.
For MongoDB 3.6 and newer:
The $expr operator allows the use of aggregation expressions within the query language, thus you can do the following:
db.test.find({ "$expr": { "$gt": ["$modified", "$sync"] } })
or, using the aggregation framework with a $match pipeline stage:
db.test.aggregate([
{ "$match": { "$expr": { "$gt": ["$modified", "$sync"] } } }
])
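The question also asks for documents where sync does not exist. One way to cover both conditions explicitly is to combine $expr with an $exists check inside an $or. This is a sketch: the variable name staleFilter is only for illustration, and the collection name test follows the examples above.

```javascript
const staleFilter = {
  $or: [
    { sync: { $exists: false } },                 // sync was never set
    { $expr: { $gt: ["$modified", "$sync"] } }    // modified after the last sync
  ]
};
// In the shell: db.test.find(staleFilter)
```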
For MongoDB 3.0+:
You can also use the aggregation framework with the $redact pipeline operator that allows you to process the logical condition with the $cond operator and uses the special operations $$KEEP to "keep" the document where the logical condition is true or $$PRUNE to "remove" the document where the condition was false.
Consider running the following aggregate operation which demonstrates the above concept:
db.test.aggregate([
{ "$redact": {
"$cond": [
{ "$gt": ["$modified", "$sync"] },
"$$KEEP",
"$$PRUNE"
]
} }
])
This operation is similar to having a $project pipeline that selects the fields in the collection and creates a new field that holds the result from the logical condition query and then a subsequent $match, except that $redact uses a single pipeline stage, which is more efficient.
Simply
db.collection.find({$where:"this.modified>this.sync"})
Example
Kobkrits-MacBook-Pro-2:~ kobkrit$ mongo
MongoDB shell version: 3.2.3
connecting to: test
> db.time.insert({d1:new Date(), d2: new Date(new Date().getTime()+10000)})
WriteResult({ "nInserted" : 1 })
> db.time.find()
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1<this.d2"})
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1>this.d2"})
> db.time.find({$where:"this.d1==this.d2"})
>
Use JavaScript: iterate with forEach and compare the dates after converting them with toDateString().
db.ledgers.find({}).forEach(function(item){
if(item.fromdate.toDateString() == item.todate.toDateString())
{
printjson(item)
}
})
Right now your query is trying to return all results such that the modified field is greater than the word 'sync'. Try getting rid of the quotes around sync and see if that fixes anything. Otherwise, I did a little research and found this question. What you're trying to do just might not be possible in a single query, but you should be able to manipulate your data once you pull everything from the database.
To fix this issue without aggregation change your query to this:
{'modified': { $gt : ISODate(this.sync) }}