I have the following aggregation.
It scans all the messages, groups them by docId, and returns only the content of the most recently updated message in each group:
db.getCollection('Messages').aggregate([ { '$match': { docType: 'order' }}, { '$sort': { updatedAt: -1 } }, { '$group': { _id: '$docId', content: { '$first': '$content' }}}])
which returns:
[
{
"_id" : "some id1",
"content" : "some msg1"
},
{
"_id" : "some id2",
"content" : "some msg2"
}
...
]
It works as intended (though I am not sure it is optimal).
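For reference, the $sort + $group/$first combination keeps, for each docId, the content of the newest message. Below is a rough in-memory JavaScript equivalent, a sketch for checking expected results on small sample data (latestPerDoc is a hypothetical name). It assumes updatedAt holds Date values: if it is stored as a string such as "01/01/2020...", both this sketch and the real $sort compare it lexicographically, which does not match chronological order.

```javascript
// In-memory sketch of the pipeline:
//   $match {docType} -> $sort {updatedAt: -1} -> $group {_id: '$docId', content: {$first: '$content'}}
// Assumes updatedAt holds Date objects so that < and > compare chronologically.
function latestPerDoc(messages, docType) {
  // $match: keep only messages of the requested docType
  const matched = messages.filter((m) => m.docType === docType);
  // $sort: newest first (copying so the input array is not mutated)
  const sorted = [...matched].sort((a, b) =>
    a.updatedAt < b.updatedAt ? 1 : a.updatedAt > b.updatedAt ? -1 : 0
  );
  // $group + $first: the first message seen for a docId is its newest one
  const groups = new Map();
  for (const m of sorted) {
    if (!groups.has(m.docId)) {
      groups.set(m.docId, { _id: m.docId, content: m.content });
    }
  }
  return [...groups.values()];
}

const sampleMessages = [
  { docType: "order", docId: "d1", content: "old", updatedAt: new Date("2020-01-01") },
  { docType: "order", docId: "d1", content: "new", updatedAt: new Date("2020-02-01") },
  { docType: "order", docId: "d2", content: "only", updatedAt: new Date("2020-01-15") },
  { docType: "invoice", docId: "d3", content: "skipped", updatedAt: new Date("2020-03-01") },
];
console.log(latestPerDoc(sampleMessages, "order"));
```

Keep in mind that $group does not guarantee the order of its output documents, so add a $sort after it if the order matters.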
But now I need to add another thing on top of that.
In the UI I have a list of documents and need to show only the latest message for each one. The list is paged, though, so I don't need to fetch the latest message for XXXXXX documents, only for the documents on the current page.
So basically something like this (if the page had 3 items):
.find({'docId':{$in:['doc1', 'doc2', 'doc3'...]}})
But I am not sure how to combine the two.
Message sample:
{
"_id": "11111",
"docType": "order",
"docId": "12345", // not unique: there can be many messages for one docId
"content": "my message",
"updatedAt": "01/01/2020..."
}
Adding
{ '$match': { _id: { '$in': ["docId1", "docId2"]} }}
as the last stage (after $group, _id holds the docId) did the trick!
Edit:
Actually, I think it is better to add it as the first stage of the pipeline, so that only the messages of the requested documents get sorted and grouped:
db.getCollection('Messages').aggregate([ { '$match': { docId: { '$in': ["5d79cba1-925b-416b-9408-6f4429d7c107", "8e31c748-c86d-407e-8d83-9810c8e23e3e"]} }}, { '$match': { docType: 'order' }}, { '$sort': { updatedAt: -1 } }, { '$group': { _id: '$docId', content: { '$first': '$content' }}}])
Since I add these filters dynamically, I ended up with two $match stages. I'm actually not sure what difference it makes, optimization-wise, to use a single $match with $and versus two separate $match stages.
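On that last point: MongoDB's pipeline optimizer coalesces a $match that immediately follows another $match into a single $match that combines both conditions with $and, so the two forms should perform the same. If you still prefer to build one $match dynamically, a small helper could look like this. It is a sketch: buildLatestMessagesPipeline is a hypothetical name, and it assumes the sort field is updatedAt, as in the message sample.

```javascript
// Builds the "latest message per docId" pipeline for the ids on one page.
// Both filters go into a single $match; updatedAt as the sort field is an
// assumption taken from the message sample.
function buildLatestMessagesPipeline(docIds, docType = "order") {
  return [
    // An empty page gives { $in: [] }, which matches nothing (as it should)
    { $match: { docType, docId: { $in: docIds } } },
    { $sort: { updatedAt: -1 } },
    { $group: { _id: "$docId", content: { $first: "$content" } } },
  ];
}

// Usage with the Node.js driver, e.g.:
//   const latest = await db.collection("Messages")
//     .aggregate(buildLatestMessagesPipeline(["doc1", "doc2", "doc3"]))
//     .toArray();
console.log(JSON.stringify(buildLatestMessagesPipeline(["doc1", "doc2"])));
```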
My schema looks something like this:
{
_id: '1',
items: {
'id1': 'item1',
'id2': 'item2',
'id3': 'item3'
}
}
Here is the query:
ItemModel.find({}, {
items: 1,
_id: 0
});
And the result of the find query is:
{ "items" : { "21" : "item21", "22" : "item22", "23" : "item23" } }
{ "items" : { "31" : "item31", "32" : "item32", "33" : "item33" } }
{ "items" : { "11" : "item11", "12" : "item12", "13" : "item13" } }
What I want is:
["item21", "item22", "item23",
"item31", "item32", "item33",
"item11", "item12", "item13"]
Currently I do this processing on the Node.js side. I want to reduce the size of the payload coming from MongoDB: the "items" key is redundant, and I don't need the IDs either. The IDs here are short (21, 22, 13, etc.), but the real ones are 50 characters long.
If that isn't possible, any other efficient alternative is also welcome.
One example of how to achieve that is the following aggregation:
[
{
$project: {
items: {
$objectToArray: '$items',
},
},
},
{ $unwind: '$items' },
{
$project: {
items: '$items.v',
},
},
{
$group: {
_id: null,
items: {
$push: '$items',
},
},
}
];
What this does: first, $project with $objectToArray converts the items object into an array of { k: <key>, v: <value> } pairs, so that we can use $unwind on it; this gives one document per item. Next, another $project replaces each { k, v } object with just its value, a string. Finally, $group (with _id: null) pushes all the values into a single array.
The final result is a single document of the form { _id: null, items: [...] }. To get exactly the list above, read its items field in your code, e.g. result[0].items ([0] because the aggregation returns an array).
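To sanity-check what the pipeline produces, here is a plain JavaScript sketch that mirrors its stages on the sample documents ($objectToArray, $unwind, $project of v, $group with $push). flattenItemValues is a hypothetical name, and in MongoDB the order in which documents reach $group is not guaranteed without a $sort.

```javascript
// Mirrors the aggregation stage by stage on plain objects.
function flattenItemValues(docs) {
  // $project + $objectToArray: { items: [{ k, v }, ...] }
  const projected = docs.map((d) => ({
    items: Object.entries(d.items).map(([k, v]) => ({ k, v })),
  }));
  // $unwind: one document per array element
  const unwound = projected.flatMap((d) => d.items.map((item) => ({ items: item })));
  // $project: keep only the value, dropping the key
  const values = unwound.map((d) => ({ items: d.items.v }));
  // $group with _id: null and $push: one document holding every value
  return [{ _id: null, items: values.map((d) => d.items) }];
}

const itemDocs = [
  { items: { "21": "item21", "22": "item22", "23": "item23" } },
  { items: { "31": "item31", "32": "item32", "33": "item33" } },
  { items: { "11": "item11", "12": "item12", "13": "item13" } },
];
const flatResult = flattenItemValues(itemDocs);
console.log(flatResult[0].items); // the flat list lives in result[0].items
```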
I am working on an API with MongoDB and Node (Express and Mongoose).
I have a document with more or less this structure:
// npcs
{
"_id" : ObjectId("5ea6c0f88e8ecfd3cdc39eae"),
"flavor" : {
"gender" : "...",
"description" : "...",
"imageUrl" : "...",
"class" : "...",
"campaign" : [
{
"campaignId" : "5eac9dfe8e8ecfd3cdc41aa0",
"unlocked" : true
}
]
},
},
// ...
And a second document, in a separate collection, that looks like this:
// user
{
"_id" : ObjectId("5e987f8e4b88382a98c84042"),
"username" : "KuluGary",
"campaigns" : [
"5eac9dfe8e8ecfd3cdc41aa0",
"5eac9e458e8ecfd3cdc41ac1",
"5eac9e978e8ecfd3cdc41adb",
"5eac9eae8e8ecfd3cdc41ae3"
]
}
What I want to do is write a query that returns all the NPCs that belong to a campaign the user is part of and are unlocked in it. The second part seems fairly easy (once I retrieve the NPCs, I could just filter out those with unlocked: false), but I'm having a hard time visualizing the query since I'm fairly unfamiliar with MongoDB's syntax and usage.
Any help would be greatly appreciated.
If I understand correctly, you want to "join" a user with all of their relevant NPCs.
A simple aggregation with $lookup would work:
db.userCollection.aggregate([
{
$match: {
// match relevant users with whatever condition you want
}
},
{
$lookup: {
from: "npc_collection",
let: {campaigns: "$campaigns"},
pipeline: [
{
$match: {
$expr: {
$gt: [
{
$size: {
$filter: {
input: "$flavor.campaign",
as: "campaign",
cond: {
$and: [
{$in: ["$$campaign.campaignId", "$$campaigns"]},
{$eq: ["$$campaign.unlocked", true]}
]
}
}
}
},
0
]
}
}
}
],
as: "relevant_npcs"
}
}
])
Note that because an NPC must be unlocked in one of the user's campaigns specifically, and not just unlocked in any campaign, we need $filter so that both conditions are checked against the same campaign entry.
If you only need to look up a single user, I recommend splitting this into two calls, since I expect $elemMatch to give better performance:
let campaigns = await db.userCollection.distinct("campaigns", {_id: userId})
let results = await db.npcCollection.find({"flavor.campaign": {$elemMatch: { campaignId: {$in: campaigns}, unlocked: true}}})
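Both versions rely on the same rule: an NPC qualifies only if a single element of flavor.campaign has a campaignId the user belongs to and unlocked: true. A plain JavaScript sketch of that rule (npcsForUser is a hypothetical name) shows why checking the two conditions independently would be wrong: NPC "b" below is unlocked somewhere, but not in the user's campaign.

```javascript
// Keeps NPCs where ONE campaign entry satisfies both conditions,
// which is what $elemMatch (or $filter over $$campaign) expresses.
function npcsForUser(npcs, userCampaigns) {
  const allowed = new Set(userCampaigns);
  return npcs.filter((npc) =>
    (npc.flavor.campaign || []).some(
      (c) => allowed.has(c.campaignId) && c.unlocked === true
    )
  );
}

const npcs = [
  // unlocked in a campaign the user is part of: kept
  { name: "a", flavor: { campaign: [{ campaignId: "c1", unlocked: true }] } },
  // locked in the user's campaign, unlocked in another one: excluded
  {
    name: "b",
    flavor: {
      campaign: [
        { campaignId: "c1", unlocked: false },
        { campaignId: "c9", unlocked: true },
      ],
    },
  },
];
console.log(npcsForUser(npcs, ["c1", "c2"]).map((n) => n.name));
```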
This is my user collection
{
"_id" : ObjectId("58e8cb640f861e6c40627a06"),
"actorId" : "665991",
"login" : "petroav",
"gravatar_id" : "",
"url" : "https://api.github.com/users/petroav",
"avatar_url" : "https://avatars.githubusercontent.com/u/665991?"
}
This is my repo collection
{
"_id" : ObjectId("58e8cb640f861e6c40627a07"),
"repoId" : "28688495",
"name" : "petroav/6.828",
"url" : "https://api.github.com/repos/petroav/6.828"
}
This is my events collection
{
"_id" : ObjectId("58e8cb640f861e6c40627a08"),
"eventId" : "2489651045",
"type" : "CreateEvent",
"actorLogin" : "petroav",
"repoId" : "28688495",
"eventDate" : ISODate("2015-01-01T15:00:00.000+0000"),
"public" : true
}
I am trying to run the following queries on the above data:
1. Return a list of all repositories with their top contributor.
2. Find the repository with the highest number of events from an actor (by login). If multiple repos have the same number of events, return the one with the latest event.
3. Return the actor details and the list of contributed repositories by login.
I tried number 3 like this:
db.events.aggregate(
[ {
$match:{"actorLogin":"petroav"}
},
{
$lookup:{
from:"repos",
localField:"repoId",
foreignField:"repoId",
as:"Repostory"
}
},
{
$group:{ _id : "$Repostory", repo: { $push: "$$ROOT" } }
}
]
).pretty()
Please help. I am new to mongodb.
These should work, though you may have to adjust some of the field and collection names if they don't match your data exactly. Because you are using actorLogin and repoId as references instead of _id, you will likely want to create indexes on those fields to help with performance.
You may also want to add a $project stage at the end of these pipelines to clean up the final format: remove extra fields, rename fields, etc.
For Number 1
db.repos.aggregate(
[
{
$lookup:{
from:"events",
localField:"repoId",
foreignField:"repoId",
as:"Event"
}
},{
$unwind:"$Event"
},
{
$group:{
_id : {repo: "$_id", user: "$Event.actorLogin" },
contributionCount: { $sum:1 },//number of events by this actor in this repo
}
},
{
$sort: {
contributionCount: -1
}
},{
$group:{
_id: {repo:'$_id.repo'},
contributionCount: {$first: '$contributionCount' },
actorLogin: {$first: '$_id.user' }
}
}
]
).then(console.log)
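To check the expected output of Number 1 on small sample data, here is a plain JavaScript sketch that mirrors the pipeline: count events per (repo, actor), then keep the highest count per repo. topContributorPerRepo is a hypothetical name, it keys repos by repoId instead of _id, and ties are resolved by first occurrence here, whereas the pipeline's $sort on contributionCount alone leaves ties unordered.

```javascript
// Mirrors "Number 1": events per (repo, actor), then the top actor per repo.
function topContributorPerRepo(repos, events) {
  const result = [];
  for (const repo of repos) {
    // $lookup + $unwind + first $group: count events per actor for this repo
    const counts = new Map();
    for (const e of events) {
      if (e.repoId === repo.repoId) {
        counts.set(e.actorLogin, (counts.get(e.actorLogin) || 0) + 1);
      }
    }
    // $sort + second $group with $first: keep the highest count
    let best = null;
    for (const [actorLogin, contributionCount] of counts) {
      if (best === null || contributionCount > best.contributionCount) {
        best = { actorLogin, contributionCount };
      }
    }
    // Repos without events are dropped by $unwind, so skip them here too
    if (best !== null) {
      result.push({ repo: repo.repoId, ...best });
    }
  }
  return result;
}

const sampleRepos = [{ repoId: "r1" }, { repoId: "r2" }, { repoId: "r3" }];
const sampleEvents = [
  { repoId: "r1", actorLogin: "petroav" },
  { repoId: "r1", actorLogin: "petroav" },
  { repoId: "r1", actorLogin: "other" },
  { repoId: "r2", actorLogin: "other" },
];
console.log(topContributorPerRepo(sampleRepos, sampleEvents));
```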
For Number 2
db.events.aggregate(
[ {
$match:{"actorLogin":"petroav"}
},
{
$lookup:{
from:"repos",
localField:"repoId",
foreignField:"repoId",
as:"Repostory"
}
},{
$unwind:"$Repostory"
},
{
$group:{
_id : "$Repostory",
loginCount: { $sum:1 },//number of events by this actor in the repo
lastLoginDate: {$max:'$eventDate'} //latest eventDate for the repo
}
},
{
$sort: {
loginCount: -1,
lastLoginDate: -1
}
},
{$limit:1}
]
).then(console.log)
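The tie-break in Number 2 depends on sorting by the count first and the latest event date second. A plain JavaScript sketch of the same logic (busiestRepoForActor is a hypothetical name; it groups by repoId rather than by the whole looked-up repo document) makes that order explicit:

```javascript
// Mirrors "Number 2": per repo, count the actor's events and track the
// latest eventDate; sort by count, then by latest date, and take one.
function busiestRepoForActor(events, login) {
  const perRepo = new Map();
  for (const e of events) {
    if (e.actorLogin !== login) continue; // $match
    const entry = perRepo.get(e.repoId) || { repoId: e.repoId, loginCount: 0, lastLoginDate: null };
    entry.loginCount += 1; // $sum: 1
    if (entry.lastLoginDate === null || e.eventDate > entry.lastLoginDate) {
      entry.lastLoginDate = e.eventDate; // $max
    }
    perRepo.set(e.repoId, entry);
  }
  // $sort { loginCount: -1, lastLoginDate: -1 } followed by $limit: 1
  const sorted = [...perRepo.values()].sort(
    (a, b) => b.loginCount - a.loginCount || b.lastLoginDate - a.lastLoginDate
  );
  return sorted.slice(0, 1);
}

const actorEvents = [
  { repoId: "r1", actorLogin: "petroav", eventDate: new Date("2015-01-01T10:00:00Z") },
  { repoId: "r1", actorLogin: "petroav", eventDate: new Date("2015-01-02T10:00:00Z") },
  { repoId: "r2", actorLogin: "petroav", eventDate: new Date("2015-01-01T12:00:00Z") },
  { repoId: "r2", actorLogin: "petroav", eventDate: new Date("2015-01-05T12:00:00Z") },
  { repoId: "r3", actorLogin: "petroav", eventDate: new Date("2015-02-01T12:00:00Z") },
  { repoId: "r3", actorLogin: "other", eventDate: new Date("2015-02-02T12:00:00Z") },
];
console.log(busiestRepoForActor(actorEvents, "petroav"));
```

Here r1 and r2 both have two events from petroav, so the later last event decides.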
For number 3
db.user.aggregate(
[
{
$match:{"login":"petroav"}
},
{
$lookup:{
from:"events",
localField:"login",
foreignField:"actorLogin",
as:"Events"
}
},{
$unwind:"$Events"
},
{
$lookup:{
from:"repos",
localField:"Events.repoId",
foreignField:"repoId",
as:"Repostory"
}
},{
$unwind:"$Repostory"
},{
$group: {
_id:'$login',
user: {$first:'$$ROOT'},
repos: {$addToSet:'$Repostory'}
}
}
]
).then(console.log)
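Number 3 joins the user's login to the events' actorLogin and then collects the distinct repos ($addToSet). A plain JavaScript sketch of the same result shape (userWithRepos is a hypothetical name) can help when checking the pipeline's output on sample data:

```javascript
// Mirrors "Number 3": the user document plus the distinct repos the user
// has events in. Like the pipeline, where $unwind drops a user with no
// events, it returns null when nothing matches.
function userWithRepos(users, events, repos, login) {
  const user = users.find((u) => u.login === login);
  if (!user) return null;
  // Distinct repoIds from this user's events (the $addToSet step)
  const repoIds = new Set(
    events.filter((e) => e.actorLogin === login).map((e) => e.repoId)
  );
  const contributed = repos.filter((r) => repoIds.has(r.repoId));
  if (contributed.length === 0) return null;
  return { _id: login, user, repos: contributed };
}

const ghUsers = [{ login: "petroav", actorId: "665991" }];
const ghRepos = [
  { repoId: "28688495", name: "petroav/6.828" },
  { repoId: "1", name: "petroav/other" },
  { repoId: "2", name: "someone/else" },
];
const ghEvents = [
  { actorLogin: "petroav", repoId: "28688495" },
  { actorLogin: "petroav", repoId: "28688495" },
  { actorLogin: "petroav", repoId: "1" },
  { actorLogin: "someone", repoId: "2" },
];
console.log(userWithRepos(ghUsers, ghEvents, ghRepos, "petroav"));
```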
I have the following collection, in which each document represents a swipe record made when a member goes to the gym.
{
"_id" : ObjectId(""),
"content" : {
"Date_Key" : "",
"TRANSACTION_EVENT_KEY" : "",
"SITE_NAME" : "",
"Swipe_DateTime" : "",
"Gender" : "",
"Post_Out_Code" : "",
"Year_Of_Birth" : "",
"Time_Key" : "",
"MemberID_Hash" : "",
"Member_Key_Hash" : "",
"Swipes" : ""
},
"collection" : "observations"
}
I want to return the number of members for every number of gym swipes in a given month.
For example:
[
{"nrOfGymSwipes": 0, "nrOfMembers": 10}, // 10 members who swiped 0 times
{"nrOfGymSwipes": 1, "nrOfMembers": 15}, // 15 members who swiped once
{"nrOfGymSwipes": 2, "nrOfMembers": 17},
...
]
I have tried the following:
collection
.aggregate(
[{$match: {"content.Swipe_DateTime": {$regex:"201602"}}},
{$group: {_id: "$content.MemberID_Hash", "nrOfGymSwipes":{$sum: 1}}},
{$sort: {"nrOfGymSwipes": 1}}],
which returns for each member the number of swipes in the given month.
...
{ _id: '111', nrOfGymSwipes: 16 },
{ _id: '112', nrOfGymSwipes: 16 },
{ _id: '113', nrOfGymSwipes: 17 },
...
Next I thought of grouping by the number of gym swipes and counting the IDs. I tried this, but it doesn't return what I expected:
collection
.aggregate(
[{$match: {"content.Swipe_DateTime": {$regex:"201602"}}},
{$group: {_id: "$content.MemberID_Hash", "nrOfGymSwipes":{$sum: 1}}},
{$group: {_id: "nrOfGymSwipes", "nrOfMembers":{$sum: 1}}}, // <-- added this
{$sort: {"nrOfGymSwipes": 1}}],
Any idea how i can solve this?
Also, is there a way to change the shape of the JSON output? For example, instead of showing the _id: "32131" part, output nrOfMembers: "312321".
You were almost there with your final group: you only needed to prefix the _id value with $ so that it refers to the nrOfGymSwipes field. The other problem is the $sort stage, because the field you are trying to sort on does not exist at that point. The aggregation pipeline works on the premise that the results of one stage are passed on to the next as modified documents (whose structure depends on the stage), and your last $group produces only two fields, "_id" and "nrOfMembers".
You can add a $project stage so that the $sort stage works: it creates the "nrOfGymSwipes" field from the previous _id key, and it also gives you the final output in the desired structure. So your final aggregation should be:
collection.aggregate([
{ "$match": { "content.Swipe_DateTime": { "$regex":"201602" } } },
{ "$group": { "_id": "$content.MemberID_Hash", "nrOfGymSwipes": { "$sum": 1 } } },
{ "$group": { "_id": "$nrOfGymSwipes", "nrOfMembers": { "$sum": 1 } } },
{ "$project": { "_id": 0, "nrOfGymSwipes": "$_id", "nrOfMembers": 1 } },
{ "$sort": { "nrOfGymSwipes": 1 } }
], function (err, result) { ... });
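The two $group stages effectively build a histogram: first swipes per member, then members per swipe count. A plain JavaScript sketch of the same two-step counting (swipeHistogram is a hypothetical name; it takes the documents left after the $match) also makes one limitation visible: a member with no swipe records in the month never enters the pipeline, so a row like nrOfGymSwipes: 0 can only come from a separate list of all members.

```javascript
// Step 1 ($group by member): number of swipes per member.
// Step 2 ($group by that number): how many members share each count.
function swipeHistogram(swipes) {
  const perMember = new Map();
  for (const s of swipes) {
    const id = s.content.MemberID_Hash;
    perMember.set(id, (perMember.get(id) || 0) + 1);
  }
  const perCount = new Map();
  for (const count of perMember.values()) {
    perCount.set(count, (perCount.get(count) || 0) + 1);
  }
  // $project + $sort: rename _id and order by the number of swipes
  return [...perCount.entries()]
    .map(([nrOfGymSwipes, nrOfMembers]) => ({ nrOfGymSwipes, nrOfMembers }))
    .sort((a, b) => a.nrOfGymSwipes - b.nrOfGymSwipes);
}

const monthSwipes = ["m1", "m1", "m2", "m2", "m3"].map((id) => ({
  content: { MemberID_Hash: id },
}));
console.log(swipeHistogram(monthSwipes));
```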
I have a collection db.activities, each item of which has a dueDate. I need to present the data in the following format, which is basically a list of activities due today and this week:
{
"today": [
{ _id: 1, name: "activity #1" ... },
{ _id: 2, name: "activity #2" ... }
],
"thisWeek": [
{ _id: 3, name: "activity #3" ... }
]
}
I managed to accomplish this by simply querying for the last week's activities as a flat list and then grouping them with JavaScript on the client, but I suspect this is a very dirty solution and would like to do this on the server.
Look up the MongoDB aggregation pipeline.
Your aggregation has a $match by date, a $group by date, and maybe a $sort stage, also by date.
Lacking the data schema, it will be along the lines of:
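db.collection.aggregate([
{ $match: { "dueDate": { "$gte": start_dt, "$lte": end_dt } } },
// group by calendar day; $push collects the activities of that day
{ $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$dueDate" } }, activities: { $push: { _id: "$_id", name: "$name" } } } },
{ "$sort": { "_id": 1 } }
]);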
If you want 'all' records, remove the $match or use { $match: {} }, as one does with find.
In my opinion, you cannot aggregate both by day and by week with a single $group. The weekly grouping may be achieved by projecting dueDate with MongoDB's $dayOfWeek operator, along the lines of:
db.collection.aggregate([
{ $match: { "dueDate": { "$gte": start_dt, "$lte": end_dt } } },
// $dayOfWeek returns 1 (Sunday) to 7 (Saturday); keep name for the $group
{ $project: { name: 1, dayOfWeek: { $dayOfWeek: "$dueDate" } } },
{ $group: { _id: "$dayOfWeek", activities: { $push: { _id: "$_id", name: "$name" } } } },
{ "$sort": { "_id": 1 } }
]);
Check out http://docs.mongodb.org/manual/reference/operator/aggregation/dayOfWeek/
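Since MongoDB 3.4, the $facet stage can run several sub-pipelines over the same input in one aggregation, which maps directly onto the { today: [...], thisWeek: [...] } shape from the question. Below is a sketch of a pipeline builder under stated assumptions: dueDate is stored as a Date, "today" and "this week" are computed in UTC, the week starts on Monday, and thisWeek covers the remaining days after today so the two lists don't overlap. buildDueActivitiesPipeline is a hypothetical name. The plain $match in front of $facet is there because stages inside $facet cannot use indexes.

```javascript
// Start of the UTC day containing `date`.
function startOfUtcDay(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

function buildDueActivitiesPipeline(now) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const todayStart = startOfUtcDay(now);
  const tomorrowStart = new Date(todayStart.getTime() + DAY_MS);
  // getUTCDay(): 0 = Sunday ... 6 = Saturday; days until next Monday (1..7)
  const daysUntilMonday = (8 - todayStart.getUTCDay()) % 7 || 7;
  const nextWeekStart = new Date(todayStart.getTime() + daysUntilMonday * DAY_MS);
  return [
    // Outside $facet, this $match can use an index on dueDate
    { $match: { dueDate: { $gte: todayStart, $lt: nextWeekStart } } },
    {
      $facet: {
        today: [
          { $match: { dueDate: { $gte: todayStart, $lt: tomorrowStart } } },
          { $sort: { dueDate: 1 } },
        ],
        thisWeek: [
          { $match: { dueDate: { $gte: tomorrowStart, $lt: nextWeekStart } } },
          { $sort: { dueDate: 1 } },
        ],
      },
    },
  ];
}

// With the Node.js driver, e.g.:
//   const [grouped] = await db.collection("activities")
//     .aggregate(buildDueActivitiesPipeline(new Date())).toArray();
//   // grouped.today and grouped.thisWeek are arrays of activities
console.log(JSON.stringify(buildDueActivitiesPipeline(new Date("2020-01-01T15:30:00Z"))));
```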