For a collection with 200k documents like the following:
{
name: 'Mario',
profession: 'plumber',
level: 8,
},
{
name: 'Luigi',
profession: 'plumber',
level: 5,
},
{
name: 'Walter',
profession: 'cook',
level: 10,
},
{
name: 'Jesse',
profession: 'cook',
level: 3,
}
What would be an efficient query to get only a single document per profession, with or without sorting by level?
To expand on @felix's answer, you can do this easily with a MongoDB group operation. For example, in the MongoDB shell:
db.yourCollection.aggregate([
{ $group: { _id: '$profession', doc: { $first: '$$ROOT' } } }
])
will give you results like:
{
_id: 'plumber',
doc: {
name: 'Mario',
profession: 'plumber',
level: 8
}
},
{
_id: 'cook',
doc: {
...
}
},
...
If more fine-grained control is required, you can filter the documents before or after the grouping with $match. Any MongoDB driver for Node should give you access to the aggregate method.
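The question also asks about sorting by level. Without a preceding $sort, $first returns whichever document the $group stage happens to see first, so the pick is not guaranteed. A minimal sketch, assuming you want the highest-level document per profession; the helper name onePerProfession and its sortByLevel flag are made up for illustration:

```javascript
// Builds a pipeline that returns one document per profession.
// `onePerProfession` and `sortByLevel` are hypothetical names for this sketch.
function onePerProfession(sortByLevel) {
  const stages = [];
  if (sortByLevel) {
    // Sort first so that $first picks the highest-level document in each group.
    stages.push({ $sort: { profession: 1, level: -1 } });
  }
  stages.push({ $group: { _id: '$profession', doc: { $first: '$$ROOT' } } });
  return stages;
}

// Shell usage: db.yourCollection.aggregate(onePerProfession(true))
```

With 200k documents, an index on { profession: 1, level: -1 } may help the sorted variant; that is an assumption worth checking with explain() on your own data.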
Use the $group stage with the aggregate() function to get your expected result.
Query:
db.getCollection('agg').aggregate([
{ $group: { _id: '$profession', doc: { $first: '$$ROOT' }, count: { $sum: 1 } } }
])
Output:
/* 1 */
{
"_id" : "cook",
"doc" : {
"_id" : ObjectId("59157ace263af55a862b78dc"),
"name" : "Walter",
"profession" : "cook",
"level" : 10.0
},
"count" : 2.0
}
/* 2 */
{
"_id" : "plumber",
"doc" : {
"_id" : ObjectId("59157ace263af55a862b78da"),
"name" : "Mario",
"profession" : "plumber",
"level" : 8.0
},
"count" : 2.0
}
Suppose I have these two collections:
sample1 :
{
"_id" : {
"date" : ISODate("2020-02-11T18:30:00Z"),
"price" : 4,
"offer" : 0,
"itemCode" : "A001",
"customerId" : ObjectId("5e43de778b57693cd46859eb"),
"sellerId" : ObjectId("5e43e5cdc11f750864f46820")
},
"charges" : 168
}
{
"_id" : {
"date" : ISODate("2020-02-11T18:30:00Z"),
"coverPrice" : 5.5,
"offer" : 38,
"itemCode" : "B001",
"customerId" : ObjectId("5e43de778b57693cd46859eb"),
"sellerId" : ObjectId("5e43e5cdc11f750864f46820")
},
"charges" : 209.5
}
NOTE: sample1's _id is an embedded document, not an ObjectId().
sample2 :
{
"paymentReceivedOnDate" : ISODate("2020-02-12T18:30:00Z"),
"customerId" : ObjectId("5e43de778b57693cd46859eb"),
"sellerId" : ObjectId("5e43e5cdc11f750864f46820"),
"amount" : 30,
}
{
"paymentReceivedOnDate" : ISODate("2020-02-12T18:30:00Z"),
"customerId" : ObjectId("5e43de778b57693cd46859eb"),
"sellerId" : ObjectId("5e43e5cdc11f750864f46820"),
"amount" : 160,
}
{
"paymentReceivedOnDate" : ISODate("2020-02-11T18:30:00Z"),
"customerId" : ObjectId("5e43de778b57693cd46859eb"),
"sellerId" : ObjectId("5e43e5cdc11f750864f46820"),
"amount" : 50,
}
My problem statement:
1: First, I need to calculate totalCharges from the sample1 collection, grouped by [date, customerId, sellerId].
2: Second, I need to calculate totalAmount from the sample2 collection.
3: Then I need to calculate the outstanding amount, i.e. [totalCharges - totalAmount].
4: Lastly, and most importantly, I need to save the projected result into a new collection, say "result", with the following fields: ['customerId', 'sellerId', 'date', 'totalCharges', 'outstanding' (i.e. [totalCharges - totalAmount]), 'totalAmount'].
You can try below query :
db.sample1.aggregate([
/** groups data & sum up charges */
{ $group: { _id: { date: '$_id.date', customerId: '$_id.customerId', sellerId: '$_id.sellerId' }, totalCharges: { $sum: '$charges' } } },
/** finds matching docs from sample2 */
{
$lookup:
{
from: "sample2",
let: { customerId: '$_id.customerId', sellerId: '$_id.sellerId' },
pipeline: [
{
$match:
{
$expr:
{
$and:
[
{ $eq: ["$customerId", "$$customerId"] },
{ $eq: ["$sellerId", "$$sellerId"] }
]
}
}
},
{ $project: { amount: 1, _id: 0 } }
],
as: "TotalAmount" // TotalAmount is an array of objects, each object will have just amount field in it.
}
},
/** retains only needed fields */
{
$project: {
totalCharges: 1, outstanding: {
$subtract: ['$totalCharges', {
$reduce: {
input: '$TotalAmount',
initialValue: 0,
in: { $add: ["$$value", "$$this.amount"] }
}
}]
}, TotalAmount: {
$reduce: {
input: '$TotalAmount',
initialValue: 0,
in: { $add: ["$$value", "$$this.amount"] }
}
}
}
}
])
Test : MongoDB-Playground
Ref : aggregation-pipeline
Note: To write the aggregation results into a new collection, end the pipeline with either a $merge or an $out stage. On MongoDB 4.2 or later, prefer $merge: it merges fields into existing documents or adds new documents to an existing collection, and it creates the collection if none exists with the given name. $out, by contrast, completely replaces an existing collection of that name, or creates a new collection if there is none.
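As a rough sketch of that last step, the helper below appends a write stage to an existing array of pipeline stages. The function name withWriteStage and the useMerge flag are illustrative and not part of the answer; "result" is the collection name from the question.

```javascript
// Appends a final write stage to a pipeline (an array of stage objects).
// `withWriteStage` is a hypothetical helper introduced for this sketch.
function withWriteStage(pipeline, target, useMerge) {
  const writeStage = useMerge
    // MongoDB >= 4.2: upsert the results into `target`, matching on _id.
    ? { $merge: { into: target, on: '_id', whenMatched: 'replace', whenNotMatched: 'insert' } }
    // Any version with $out: replace `target` entirely with the results.
    : { $out: target };
  return [...pipeline, writeStage];
}

// Shell usage, where `stages` is the pipeline array from the query above:
// db.sample1.aggregate(withWriteStage(stages, 'result', true))
```

Because the grouped _id is the { date, customerId, sellerId } document, matching on _id means a rerun should overwrite the same rows rather than duplicate them.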
This is my user collection
{
"_id" : ObjectId("58e8cb640f861e6c40627a06"),
"actorId" : "665991",
"login" : "petroav",
"gravatar_id" : "",
"url" : "https://api.github.com/users/petroav",
"avatar_url" : "https://avatars.githubusercontent.com/u/665991?"
}
This is my repo collection
{
"_id" : ObjectId("58e8cb640f861e6c40627a07"),
"repoId" : "28688495",
"name" : "petroav/6.828",
"url" : "https://api.github.com/repos/petroav/6.828"
}
This is my events collections
{
"_id" : ObjectId("58e8cb640f861e6c40627a08"),
"eventId" : "2489651045",
"type" : "CreateEvent",
"actorLogin" : "petroav",
"repoId" : "28688495",
"eventDate" : ISODate("2015-01-01T15:00:00.000+0000"),
"public" : true
}
I am trying to run the following queries on the above data:
1. Return a list of all repositories with their top contributor.
2. Find the repository with the highest number of events from an actor (by login). If multiple repos have the same number of events, return the one with the latest event.
3. Return actor details and the list of contributed repositories by login.
I tried the third one like this:
db.events.aggregate(
[ {
$match:{"actorLogin":"petroav"}
},
{
$lookup:{
from:"repos",
localField:"repoId",
foreignField:"repoId",
as:"Repostory"
}
},
{
$group:{ _id : "$Repostory", repo: { $push: "$$ROOT" } }
}
]
).pretty()
Please help. I am new to mongodb.
These should work, although you may have to update some of the field names if they don't match your data exactly. Because you are using actorLogin and repoId as references instead of _id, you likely want to create indexes on those fields to help with performance.
You may also want to add a $project stage at the end of these pipelines to clean up the final format, remove extra fields, rename fields, etc.
For Number 1
db.repos.aggregate(
[
{
$lookup:{
from:"events",
localField:"repoId",
foreignField:"repoId",
as:"Event"
}
},{
$unwind:"$Event"
},
{
$group:{
_id : {repo: "$_id", user: "$Event.actorLogin" },
contributionCount: { $sum:1 }, // number of events by this user in this repo
}
},
{
$sort: {
contributionCount: -1
}
},{
$group:{
_id: {repo:'$_id.repo'},
contributionCount: {$first: '$contributionCount' },
actorLogin: {$first: '$_id.user' }
}
}
]
).then(console.log)
For Number 2
db.events.aggregate(
[ {
$match:{"actorLogin":"petroav"}
},
{
$lookup:{
from:"repos",
localField:"repoId",
foreignField:"repoId",
as:"Repostory"
}
},{
$unwind:"$Repostory"
},
{
$group:{
_id : "$Repostory",
loginCount: { $sum:1 }, // number of events by this actor in the repo
lastLoginDate: {$max:'$eventDate'} // latest event date for the repo
}
},
{
$sort: {
loginCount: -1,
lastLoginDate: -1 // tie-breaker: must be the field computed in $group
}
},
{$limit:1}
]
).then(console.log)
For Number 3
db.user.aggregate(
[
{
$match:{"login":"petroav"} // the user collection stores the name in "login"
},
{
$lookup:{
from:"events",
localField:"login",
foreignField:"actorLogin",
as:"Events"
}
},{
$unwind:"$Events"
},
{
$lookup:{
from:"repos",
localField:"Events.repoId",
foreignField:"repoId",
as:"Repostory"
}
},{
$unwind:"$Repostory"
},{
$group: {
_id:'$login',
user: {$first:'$$ROOT'},
repos: {$addToSet:'$Repostory'}
}
}
]
).then(console.log)
Data:
{
"properties" : {
"user" : ObjectId("51d3053627f4169a52000005"),
"createdOn" : ISODate("2013-07-02T18:00:03.841Z")
},
"_id" : ObjectId("51d31a87716fb81a58000003"),
"geometry" : {
"type" : "Point",
"coordinates" : [ 10, 10 ]
}
},{
"properties" : {
"user" : ObjectId("51d3053627f4169a52000005"),
"createdOn" : ISODate("2013-07-02T18:23:03.841Z")
},
"_id" : ObjectId("51d31a87716fb81a58000003"),
"geometry" : {
"type" : "Point",
"coordinates" : [ 20, 20 ]
}
}
And I am trying the following query:
db.locations.aggregate(
{ $group: {
_id: "$properties.user",
locations: {
$push: {
location: {geometry: "$geometry", properties: "$properties"}
}
}
}},
{ $sort: { "properties.createdOn": 1 }}
);
No matter which direction I change the sort flag (1/-1) the order of my results does not change.
Any ideas?
After the $group, your pipeline docs only contain those fields defined in the $group, so properties.createdOn doesn't exist for the $sort operation.
Move your $sort before the $group in the pipeline instead:
db.locations.aggregate(
{ $sort: { "properties.createdOn": 1 }},
{ $group: {
_id: "$properties.user",
locations: {
$push: {
location: {geometry: "$geometry", properties: "$properties"}
}
}
}}
);
I've just got stuck with this problem. I've got two Mongoose schemas:
var childrenSchema = mongoose.Schema({
name: {
type: String
},
age: {
type: Number,
min: 0
}
});
var parentSchema = mongoose.Schema({
name : {
type: String
},
children: [childrenSchema]
});
Question is, how to fetch all subdocuments (in this case, childrenSchema objects) from every parent document? Let's suppose I have some data:
var parents = [
{ name: "John Smith",
children: [
{ name: "Peter", age: 2 }, { name: "Margaret", age: 20 }
]},
{ name: "Another Smith",
children: [
{ name: "Martha", age: 10 }, { name: "John", age: 22 }
]}
];
I would like to retrieve - in a single query - all children older than 18. Is it possible? Every answer will be appreciated, thanks!
You can use $elemMatch as a query-projection operator in the most recent MongoDB versions. From the mongo shell:
db.parents.find(
{'children.age': {$gte: 18}},
{children:{$elemMatch:{age: {$gte: 18}}}})
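This returns each matching parent with only the first child aged 18 or over in its children array (an $elemMatch projection returns at most one array element per document):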
{ "_id" : ..., "children" : [ { "name" : "Margaret", "age" : 20 } ] }
{ "_id" : ..., "children" : [ { "name" : "John", "age" : 22 } ] }
As you can see, children are still grouped inside their parent documents, because MongoDB queries return documents from collections. You can use the aggregation framework's $unwind stage to split them into separate documents:
> db.parents.aggregate({
$match: {'children.age': {$gte: 18}}
}, {
$unwind: '$children'
}, {
$match: {'children.age': {$gte: 18}}
}, {
$project: {
name: '$children.name',
age:'$children.age'
}
})
{
"result" : [
{
"_id" : ObjectId("51a7bf04dacca8ba98434eb5"),
"name" : "Margaret",
"age" : 20
},
{
"_id" : ObjectId("51a7bf04dacca8ba98434eb6"),
"name" : "John",
"age" : 22
}
],
"ok" : 1
}
I repeat the $match clause for performance: the first time through it eliminates parents with no children at least 18 years old, so the $unwind only considers useful documents. The second $match removes $unwind output that doesn't match, and the $project hoists children's info from subdocuments to the top level.
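To make the four stages concrete, here is a plain JavaScript emulation over the sample data from the question. It is only an in-memory illustration of what each stage does, not how MongoDB executes the pipeline, and it omits the _id field that the real $project would keep.

```javascript
const parents = [
  { name: 'John Smith',
    children: [{ name: 'Peter', age: 2 }, { name: 'Margaret', age: 20 }] },
  { name: 'Another Smith',
    children: [{ name: 'Martha', age: 10 }, { name: 'John', age: 22 }] }
];

const adults = parents
  // first $match: keep parents with at least one child aged 18 or over
  .filter(p => p.children.some(c => c.age >= 18))
  // $unwind: emit one copy of the parent per child
  .flatMap(p => p.children.map(c => ({ ...p, children: c })))
  // second $match: drop the unwound documents whose child is under 18
  .filter(d => d.children.age >= 18)
  // $project: hoist the child's fields to the top level
  .map(d => ({ name: d.children.name, age: d.children.age }));

console.log(adults);
// [ { name: 'Margaret', age: 20 }, { name: 'John', age: 22 } ]
```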
In Mongoose, you can also use the elegant .populate() function like this:
parents
.find({})
.populate({
path: 'children',
match: { age: { $gte: 18 }},
select: 'name age -_id'
})
.exec()
A. Jesse Jiryu Davis's response works like a charm, however for later versions of Mongoose (Mongoose 5.x) we get the error:
Mongoose 5.x disallows passing a spread of operators to Model.aggregate(). Instead of Model.aggregate({ $match }, { $skip }), do Model.aggregate([{ $match }, { $skip }])
So the code would simply now be:
> db.parents.aggregate([{
$match: {'children.age': {$gte: 18}}
}, {
$unwind: '$children'
}, {
$match: {'children.age': {$gte: 18}}
}, {
$project: {
name: '$children.name',
age:'$children.age'
}
}])
{
"result" : [
{
"_id" : ObjectId("51a7bf04dacca8ba98434eb5"),
"name" : "Margaret",
"age" : 20
},
{
"_id" : ObjectId("51a7bf04dacca8ba98434eb6"),
"name" : "John",
"age" : 22
}
],
"ok" : 1
}
(note the array brackets around the queries)
Hope this helps someone!
Consider the following document in the collection named 'CityAssociation'
{
"_id" : "MY_ID",
"ThisCityID" : "001",
"CityIDs" : [{
"CityID" : "001",
"CityName" : "Bangalore"
}, {
"CityID" : "002",
"CityName" : "Mysore"
}],
"CityUserDetails": {
"User" : "ABCD"
}
}
Now I have a User value (ABCD in the case above) and want to find it together with only the city where the top-level field ThisCityID matches the CityID field of an embedded array document. Finally, I need to project the result as follows (for the above case):
{
'UserName': 'ABCD',
'HomeTown':'Bangalore'
}
In Node.js with the MongoDB native driver, I wrote the following aggregation query, which does not work as expected.
collection.aggregate([
{ $match: { 'CityUserDetails.User': 'ABCD', 'CityIDs': { $elemMatch: { CityID: ThisCityID}}} },
{ $unwind: "$CityIDs" },
{ $group: {
_id: '$_id',
CityUserDetails: { $first: "$CityUserDetails" },
CityIDs: { $first: "$CityIDs" }
}
},
{ $project: {
_id: 0,
"UserName": "$CityUserDetails.User",
"HomeTown": "$CityIDs.CityName"
}
}
], function (err, doc) {
if (err) return console.error(err);
console.dir(doc);
}
);
Can anyone tell me how this can be done with a query?
Note: On MongoDB schema we don't have control to change it.
You can use the $eq operator to check if the first level's field ThisCityID matches embedded array document's field CityID.
db.city.aggregate([
{ $match : { "CityUserDetails.User" : "ABCD" }},
{ $unwind : "$CityIDs" },
{ $project : {
matches : { $eq: ["$CityIDs.CityID","$ThisCityID"]},
UserName : "$CityUserDetails.User",
HomeTown : "$CityIDs.CityName"
}},
{ $match : { matches : true }},
{ $project : {
_id : 0,
UserName : 1,
HomeTown : 1
}},
])
And the result is:
{
"result" : [
{
"UserName" : "ABCD",
"HomeTown" : "Bangalore"
}
],
"ok" : 1
}
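For a quick sanity check of the expected result, the same field-to-field comparison can be written in plain JavaScript against the sample document. This is a client-side illustration only; the function name homeTown is made up for this sketch.

```javascript
const doc = {
  _id: 'MY_ID',
  ThisCityID: '001',
  CityIDs: [
    { CityID: '001', CityName: 'Bangalore' },
    { CityID: '002', CityName: 'Mysore' }
  ],
  CityUserDetails: { User: 'ABCD' }
};

// Finds the embedded city whose CityID equals the document's own ThisCityID,
// mirroring the $eq comparison between two fields in the pipeline above.
function homeTown(d) {
  const city = d.CityIDs.find(c => c.CityID === d.ThisCityID);
  return city ? { UserName: d.CityUserDetails.User, HomeTown: city.CityName } : null;
}

console.log(homeTown(doc));
// { UserName: 'ABCD', HomeTown: 'Bangalore' }
```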