Getting the number of unique values of a query - node.js

I have some documents with the following structure:
{
"_id": "53ad76d70ddd13e015c0aed1",
"action": "login",
"actor": {
"name": "John",
"id": 21337037
}
}
How can I write a query in Node.js that returns the number of unique actors that have performed a specific action? For example, I have an activity stream log that shows all the actions performed by the actors, and an actor can perform a specific action multiple times. How can I get the number of unique actors that have performed the "login" action? The actors are identified by actor.id.

db.collection.distinct()
db.collection.distinct("actor.id", { action: "login"})
This returns an array of all unique occurrences of actor.id; the length of that array is the count you need.
PS: do not forget to index the field you filter on:
db.collection.createIndex({action: 1})
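To make the "distinct, then count" idea concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB at all; `countUniqueActors` and `sampleLog` are hypothetical names used only for illustration, mirroring what distinct("actor.id", { action: "login" }) followed by a length check would compute.

```javascript
// In-memory illustration only: mimics distinct("actor.id", { action }) plus a count.
// `countUniqueActors` and `sampleLog` are hypothetical names, not part of any driver.
function countUniqueActors(docs, action) {
  const ids = new Set();
  for (const doc of docs) {
    if (doc.action === action) {
      ids.add(doc.actor.id); // a Set keeps each id only once
    }
  }
  return ids.size;
}

const sampleLog = [
  { action: "login", actor: { name: "John", id: 21337037 } },
  { action: "login", actor: { name: "john", id: 21337037 } },
  { action: "logout", actor: { name: "foo", id: 10000 } },
  { action: "login", actor: { name: "foo", id: 10000 } },
];

console.log(countUniqueActors(sampleLog, "login")); // 2
```

In the shell, the equivalent would be taking the length of the array that distinct returns.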

You can use aggregation framework for this:
db.coll.aggregate([
/* Filter only actions you're looking for */
{ $match : { action : "login" }},
/* finally group the documents by actors to calculate the num. of actions */
{ $group : { _id : "$actor", numActions: { $sum : 1 }}}
]);
This query will group the documents by the entire actor sub-document and calculate the number of actions by using $sum. The $match operator will filter only documents with specific action.
However, that query will work only if your actor sub-documents are the same. You said that you're identifying your actors by id field. So if, for some reason, actor sub-documents are not exactly the same, you will have problems with your results.
Consider these three documents:
{
...
"actor": {
"name": "John",
"id": 21337037
}
},
{
...
"actor": {
"name": "john",
"id": 21337037
}
},
{
...
"actor": {
"surname" : "Nash",
"name": "John",
"id": 21337037
}
}
They will be grouped in three different groups, even though the id field is the same.
To overcome this problem, you will need to group by actor.id.
db.coll.aggregate([
/* Filter only actions you're looking for */
{ $match : { action : "login" }},
/* finally group the documents to calculate the num. of actions */
{ $group : { _id : "$actor.id", numActions: { $sum : 1 }}}
]);
This query will correctly group your documents by looking only at the actor.id field.
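Since the question asks for the number of unique actors rather than a per-actor tally, one possible extension is to count the groups inside the pipeline itself. The sketch below only builds the pipeline as a plain array; the helper name `uniqueActorCountPipeline` and the output field `uniqueActors` are made up for this example, and the $count stage assumes MongoDB 3.4 or newer.

```javascript
// Hypothetical helper: builds a pipeline that counts distinct actor.id values
// for one action. It only constructs data; nothing here contacts a database.
function uniqueActorCountPipeline(action) {
  return [
    // keep only the log entries for the requested action
    { $match: { action: action } },
    // one group per distinct actor.id
    { $group: { _id: "$actor.id" } },
    // count the groups (assumes a server that supports $count)
    { $count: "uniqueActors" },
  ];
}

const loginPipeline = uniqueActorCountPipeline("login");
console.log(JSON.stringify(loginPipeline, null, 2));
```

Passing this array to aggregate should yield a single document with a `uniqueActors` field; when nothing matches, the $count stage produces no document at all, so the caller would need to treat an empty result as zero.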
Edit
You didn't specify what driver you were using so I wrote the examples for MongoDB shell.
Aggregation with the Node.js driver is very similar, with one difference: Node.js is asynchronous, so the results of the aggregation are returned in a callback. You can check the Node.js aggregation documentation for more examples.
So the aggregation command in Node.js will look like this:
var MongoClient = require('mongodb').MongoClient;
MongoClient.connect('mongodb://127.0.0.1:27017/test', function(err, db) {
if(err) throw err;
var collection = db.collection('auditlogs');
collection.aggregate([
{ $match : { action : "login" }},
{ $group : { _id : "$actor.id", numActions: { $sum : 1 }}} ],
function(err, docs) {
if (err) return console.error(err); // stop here so we do not log undefined results
console.log(docs);
// do something with results
}
);
});
For these test documents:
{
"_id" : ObjectId("53b162ea698171cc1677fab8"),
"action" : "login",
"actor" : {
"name" : "John",
"id" : 21337037
}
},
{
"_id" : ObjectId("53b162ee698171cc1677fab9"),
"action" : "login",
"actor" : {
"name" : "john",
"id" : 21337037
}
},
{
"_id" : ObjectId("53b162f7698171cc1677faba"),
"action" : "login",
"actor" : {
"name" : "john",
"surname" : "nash",
"id" : 21337037
}
},
{
"_id" : ObjectId("53b16319698171cc1677fabb"),
"action" : "login",
"actor" : {
"name" : "foo",
"id" : 10000
}
}
It will return the following result:
[ { _id: 10000, numActions: 1 },
{ _id: 21337037, numActions: 3 } ]
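Because each element of that result stands for one distinct actor.id, the number the question asks for is simply the length of the array. A small sketch using the result shown above (the variable names are illustrative):

```javascript
// Result array copied from the output above.
const docs = [
  { _id: 10000, numActions: 1 },
  { _id: 21337037, numActions: 3 },
];

// One group per distinct actor.id, so the array length is the unique-actor count.
const uniqueActors = docs.length;

// Summing numActions gives the total number of matching log entries.
const totalLogins = docs.reduce((sum, d) => sum + d.numActions, 0);

console.log(uniqueActors, totalLogins); // 2 4
```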

The aggregation framework is your answer:
db.actors.aggregate([
// If you really need to filter
{ "$match": { "action": "login" } },
// Then group
{ "$group": {
"_id": {
"action": "$action",
"actor": "$actor"
},
"count": { "$sum": 1 }
}}
])
Your "actor" combination is "unique", so all you need to do is put the common "grouping keys" under the _id value for the $group pipeline stage and count those "distinct" combinations with $sum.

Related

Query nested array

I have the below User document. I want to return a list of all 'friends' where friends.name is equal to "Bob".
{
"_id" : ObjectId("5a4be9f200471a49d2e23ce4"),
"name": "James",
"friends" : [
{
"_id" : ObjectId("5a4be9f200471a49d2e23ce6"),
"dob" : ISODate("2018-01-02T00:00:00.000Z"),
"name" : "Bob"
},
{
"_id" : ObjectId("5a4be9f200471a49d2e23ce5"),
"dob" : ISODate("2018-01-02T00:00:00.000Z"),
"name" : "Fred"
}
],
"__v" : 0
}
When I query using the code below, it works, but it returns the whole friends list, not just Bob.
User.findOne({ "friends.name": "Bob" }, function(err, friends) {
if(err) return next(err);
res.send(friends);
});
How can I query so that only the Bob object is returned, and not Fred?
Your query is correct, but it matches whole user documents that have at least one friend satisfying your condition, so the complete friends array comes back.
If you just want the matching items from the friends array, you could do something like this:
db.User.aggregate([
{ $unwind: "$friends" },
{ $replaceRoot: { newRoot: "$friends" } },
{ $match: { name: "Bob" }}
])
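A possible refinement, sketched below, is to put a $match on the array field before the $unwind so that only users who actually have a matching friend get unwound. The helper name `friendsByNamePipeline` is hypothetical; the stages themselves are the same ones used above, and $replaceRoot assumes MongoDB 3.4 or newer.

```javascript
// Hypothetical helper: builds the pipeline as plain data, nothing is executed here.
function friendsByNamePipeline(name) {
  return [
    // 1. skip users that have no friend with this name at all
    { $match: { "friends.name": name } },
    // 2. emit one document per friend
    { $unwind: "$friends" },
    // 3. keep only the friends that match
    { $match: { "friends.name": name } },
    // 4. promote the friend sub-document to the top level
    { $replaceRoot: { newRoot: "$friends" } },
  ];
}

const bobPipeline = friendsByNamePipeline("Bob");
console.log(JSON.stringify(bobPipeline, null, 2));
```

With Mongoose, something like User.aggregate(friendsByNamePipeline("Bob")) should then return only the Bob sub-documents.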

How can I check for duplicate documents in Mongoose?

Here is an example of a nested document that I have in my collection:
"person" : [
{
"title" : "front-end developer",
"skills" : [
{
"name" : "js",
"project" : "1",
},
{
"name" : "CSS",
"project" : "5",
}
]
},
{
"title" : "software engineer",
"skills" : [
{
"name" : "Java",
"project" : "1",
},
{
"name" : "c++",
"project" : "5",
}
]
}
]
Is there a simple way of determining whether other documents are identical to this object, i.e. have the same keys, values and array indexes? Currently my method of checking for duplicates is very long and requires multiple nested loops. Any help would be greatly appreciated. Thanks!
If you want to get a list of identical (except for the _id field, obviously) documents in your collection, here is how you can do that:
collection.aggregate([{
    $project: {
        "_id": 1, // keep the _id field where it is anyway
        "doc": "$$ROOT" // store the entire document in the "doc" field
    }
}, {
    $project: {
        "doc._id": 0 // remove the _id from the stored document because we do not want to compare it
    }
}, {
    $group: {
        "_id": "$doc", // group by the entire document's contents as in "compare the whole document"
        "ids": { $push: "$_id" }, // create an array of all IDs that form this group
        "count": { $sum: 1 } // count the number of documents in this group
    }
}, {
    $match: {
        "count": { $gt: 1 } // only show what's duplicated
    }
}])
As always with the aggregation framework, you can try to make sense of what exactly is going on in each step by commenting out all steps and then activating everything again stage by stage.
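To see what the grouping does without a database, here is a rough in-memory equivalent in plain JavaScript. It is only a sketch: `findDuplicates` and `sample` are hypothetical names, and JSON.stringify is used as a stand-in for "compare the whole document", which means two documents with the same fields in a different key order are treated as different.

```javascript
// In-memory sketch of "group by everything except _id, keep groups larger than 1".
// `findDuplicates` is a hypothetical helper, not a MongoDB or Mongoose API.
function findDuplicates(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const { _id, ...rest } = doc;      // drop _id, as the second $project does
    const key = JSON.stringify(rest);  // key-order sensitive comparison key
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(_id);         // like { $push: "$_id" }
  }
  // like { $match: { count: { $gt: 1 } } }
  return [...groups.values()].filter((ids) => ids.length > 1);
}

const sample = [
  { _id: 1, person: [{ title: "front-end developer", skills: [{ name: "js", project: "1" }] }] },
  { _id: 2, person: [{ title: "front-end developer", skills: [{ name: "js", project: "1" }] }] },
  { _id: 3, person: [{ title: "software engineer", skills: [{ name: "Java", project: "1" }] }] },
];

console.log(findDuplicates(sample)); // [ [ 1, 2 ] ]
```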

Category hierarchy aggregation using mongodb and nodejs

My document structure is as follows:
{
"_id" : ObjectId("54d81827e4a4449d023b4e34"),
"cat_id" : 1,
"description" : "Refridgerator",
"image" : "refridgerator",
"parent" : null,
"slug" : "refridgerator"
}
{
"_id" : ObjectId("54dc38bce4a4449d023b4e58"),
"name" : "Ice Cream",
"description" : "Ice Cream",
"image" : "ice-cream.jpg",
"slug" : "ice-cream",
"parent" : "54d81827e4a4449d023b4e34"
}
{
"_id" : ObjectId("54dc3705e4a4449d023b4e56"),
"name" : "Chocolate",
"description" : "Chocolate",
"image" : "chocolate.jpg",
"slug" : "chocolate",
"parent" : "54d81827e4a4449d023b4e34"
}
I’m making a category hierarchy using mongodb and nodejs.
Now I want to query for _id = ‘54d81827e4a4449d023b4e34’ (Refridgerator) and get back all of its child categories.
How can I achieve this in Node.js?
Also, since Node.js makes asynchronous calls to the database, I’m unable to build JSON structured with parent-child relations.
How would I do the async call for this?
You want the refridgerator and all the subcategories?
And async is also a problem?
I think you can use aggregation here.
Say you're looking for a category, with an _id variable holding the ObjectId of the one you want, and its subcategories.
db.yourCollection.aggregate({
// get stuff where you have the parent or subcats.
$match: {
$or: [
{_id: ObjectId("54de8b9f022ff38bbf5e0530")},
{parent: ObjectId("54de8b9f022ff38bbf5e0530")}
]
}
},
// reshape the data you'll need further on from each matched doc
{
$project: {
_id: false,
data: {
id: '$_id',
name: '$name'
// I guess you'll also want the `slug` and `image` here.
// but that's homework :)
},
parent: '$parent'
}
},
// now put a common _id so you can group them, and also put stuff into arrays
{
$project: {
id: {$literal: 'id'},
mainCategory: {
// if our parent is null, put our data.
// otherwise put null here.
$cond: [{$eq: [null, '$parent']}, {_id: '$data.id', name: '$data.name'}, null]
},
subcat: {
// here is the other way around.
$cond: [{$ne: [null, '$parent']}, {_id: '$data.id', name: '$data.name'}, null]
}
}
// that stage produces for each doc either a mainCat or subcat
// (and the other prop equals to null)
},
// finally, group the things so you can have them together
{
$group: {
_id: '$id',
// a bit hacky, but mongo will yield to it
mainCategory: {$max: '$mainCategory'},
subCategories: {
// this will, unfortunately, also add the `null` we have
// assigned to main category up there
$addToSet: '$subcat'
}
}
},
// so we get rid of the unwanted _id = 'id' and the null from subcats.
{
$project: {
_id: false,
mainCategory: 1,
subCategories: {
$setDifference: ['$subCategories', [null]]
}
}
})
Given this data set:
[{
"_id" : ObjectId("54de8b9f022ff38bbf5e0530"),
"name" : "Fridge",
"parent" : null
},
{
"_id" : ObjectId("54de8bba022ff38bbf5e0531"),
"name" : "choco",
"parent" : ObjectId("54de8b9f022ff38bbf5e0530")
},
{
"_id" : ObjectId("54de8bc8022ff38bbf5e0532"),
"name" : "apple",
"parent" : ObjectId("54de8b9f022ff38bbf5e0530")
}]
I get this result:
{
"result" : [
{
"mainCategory" : {
"_id" : ObjectId("54de8b9f022ff38bbf5e0530"),
"name" : "Fridge"
},
"subCategories" : [
{
"_id" : ObjectId("54de8bc8022ff38bbf5e0532"),
"name" : "apple"
},
{
"_id" : ObjectId("54de8bba022ff38bbf5e0531"),
"name" : "choco"
}
]
}
],
"ok" : 1
}
As for async, typically you'd do something like this:
db.collection.aggregate(thePipeLineAbove, function(err, results) {
// handle err
if (err) {
// deal with it
} else {
console.log(results);
}
});
But that depends a bit on your MongoDB driver.
You could extend this even if you have a deeper hierarchy.
This has nothing to do with Node.js; it's your data structure that matters.
Refer to my answer to this question; the first part is about how to implement it efficiently.
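If the pipeline feels heavy, an alternative (not the approach taken in the answer above) is to fetch the parent and its children with a simple $or query and shape them in application code. The sketch below covers only the shaping step on an in-memory array; `buildCategory` and `categories` are hypothetical names, and ids are plain strings here, whereas real documents would carry ObjectId values.

```javascript
// Hypothetical helper: turns a flat list of category docs into
// { mainCategory, subCategories } for one parent id. Ids are compared as strings.
function buildCategory(docs, parentId) {
  const main = docs.find((d) => String(d._id) === String(parentId));
  if (!main) return null; // the requested category was not in the list

  const subCategories = docs
    .filter((d) => d.parent != null && String(d.parent) === String(parentId))
    .map((d) => ({ _id: d._id, name: d.name }));

  return { mainCategory: { _id: main._id, name: main.name }, subCategories };
}

const categories = [
  { _id: "54de8b9f022ff38bbf5e0530", name: "Fridge", parent: null },
  { _id: "54de8bba022ff38bbf5e0531", name: "choco", parent: "54de8b9f022ff38bbf5e0530" },
  { _id: "54de8bc8022ff38bbf5e0532", name: "apple", parent: "54de8b9f022ff38bbf5e0530" },
];

console.log(JSON.stringify(buildCategory(categories, "54de8b9f022ff38bbf5e0530"), null, 2));
```

In a real application this function would be called inside the callback (or after the promise resolves) that delivers the query results, which is how the asynchronous part is usually handled.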

MongoDb - $match filter not working in subdocument

This is the collection structure:
[{
"_id" : "....",
"name" : "aaaa",
"level_max_leaves" : [
{
level : "ObjectIdString 1",
max_leaves : 4,
}
]
},
{
"_id" : "....",
"name" : "bbbb",
"level_max_leaves" : [
{
level : "ObjectIdString 2",
max_leaves : 2,
}
]
}]
I need to find the level_max_leaves.level subdocument value that matches a given input value.
This is what I tried, for example:
var empLevelId = 'ObjectIdString 1' ;
MyModel.aggregate(
{$unwind: "$level_max_leaves"},
{$match: {"$level_max_leaves.level": empLevelId } },
{$group: { "_id": "$level_max_leaves.level",
"total": { "$sum": "$level_max_leaves.max_leaves" }}},
function (err, res) {
console.log(res);
});
But here the $match filter is not working. I can't get the exact results for ObjectIdString 1.
If I filter on the name field, it works fine, like this:
{$match: {"$name": "aaaa" } },
But at the subdocument level it returns 0:
{$match: {"$level_max_leaves.level": "ObjectIdString 1"} },
My expected result was:
{
"_id" : "ObjectIdString 1",
"total" : 4,
}
You have typed the $match incorrectly. Names with a $ prefix are either operators or "variable" references to field content. In the key of a $match condition you just type the field name:
MyModel.aggregate(
[
{ "$match": { "level_max_leaves.level": "ObjectIdString 1" } },
{ "$unwind": "$level_max_leaves" },
{ "$match": { "level_max_leaves.level": "ObjectIdString 1" } },
{ "$group": {
"_id": "$level_max_leaves.level",
"total": { "$sum": "$level_max_leaves.max_leaves" }
}}
],
function (err, res) {
console.log(res);
}
);
Which on the sample you provide produces:
{ "_id" : "ObjectIdString 1", "total" : 4 }
It is also good practice to $match first in your pipeline, since that is the point at which an index can be used. Beyond that, without the initial $match statement your aggregation pipeline would perform an $unwind operation on every document in the collection, whether it met the conditions or not.
So generally what you want to do here is:
1. Match the documents that contain the required elements in the array
2. Unwind the array of the matching documents
3. Match the required array content, excluding all others
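The same sequence of steps can be walked through on the sample data in plain JavaScript. This is only an in-memory illustration of what each stage does, not driver code; `totalMaxLeaves` and `employees` are hypothetical names.

```javascript
// In-memory walk-through of match -> unwind -> match -> group on the sample data.
function totalMaxLeaves(docs, levelId) {
  const totals = new Map();
  docs
    // 1. keep only documents whose array contains the wanted level
    .filter((doc) => doc.level_max_leaves.some((e) => e.level === levelId))
    // 2. "unwind": flatten to one entry per array element
    .flatMap((doc) => doc.level_max_leaves)
    // 3. keep only the array elements that match
    .filter((e) => e.level === levelId)
    // 4. group by level and sum max_leaves
    .forEach((e) => totals.set(e.level, (totals.get(e.level) || 0) + e.max_leaves));
  return [...totals].map(([level, total]) => ({ _id: level, total }));
}

const employees = [
  { name: "aaaa", level_max_leaves: [{ level: "ObjectIdString 1", max_leaves: 4 }] },
  { name: "bbbb", level_max_leaves: [{ level: "ObjectIdString 2", max_leaves: 2 }] },
];

console.log(totalMaxLeaves(employees, "ObjectIdString 1"));
// [ { _id: 'ObjectIdString 1', total: 4 } ]
```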

Search for most common "data" with mongoose, mongodb

My structure.
User:
{
name: "One",
favoriteWorkouts: [ids of workouts],
workouts: [ { name: "My workout 1" },...]
}
I want to get a list of the favorite/hottest workouts from the database.
db.users.aggregate(
{ $unwind : "$favorite" },
{ $group : { _id : "$favorite" , number : { $sum : 1 } } },
{ $sort : { number : -1 } }
)
This returns
{
"hot": [
{
"_id": "521f6c27145c5d515f000006",
"number": 1
},
{
"_id": "521f6c2f145c5d515f000007",
"number": 1
},...
]}
But I want
{
hot: [
{object of hottest workout 1, object of hottest workout 2,...}
]}
How do I sort by popularity and fill the result with the workout objects, not just their ids?
You are correct to want to use MongoDB's aggregation framework. Aggregation will give you the output you are looking for if used correctly. If you are looking for just a list of the _ids of all users' favorite workouts, then I believe you would need to add an additional $group stage to your pipeline:
db.users.aggregate([
    { $unwind : "$favoriteWorkouts" },
    { $group : { _id : "$favoriteWorkouts", number : { $sum : 1 } } },
    { $sort : { number : -1 } },
    { $group : { _id : "oneDocumentWithWorkoutArray", hot : { $push : "$_id" } } }
])
This will yield a document of the following form, with the workout ids listed by popularity:
{
"_id" : "oneDocumentWithWorkoutArray",
"hot" : [
"workout6",
"workout1",
"workout5",
"workout4",
"workout3",
"workout2"
]
}
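The pipeline above yields ids only. To end up with workout objects, as the question asks, the ids still have to be resolved to documents somewhere. The sketch below shows the whole idea in memory, under the assumption that the workouts are available as a lookup keyed by id; `hottestWorkouts`, `sampleUsers` and `workoutsById` are hypothetical names and the data is invented for illustration.

```javascript
// In-memory sketch: count favorites, sort by popularity, map ids to objects.
function hottestWorkouts(users, workoutsById) {
  const counts = new Map();
  for (const user of users) {
    for (const id of user.favoriteWorkouts) {
      counts.set(id, (counts.get(id) || 0) + 1); // like { $sum: 1 } per id
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])        // most popular first
    .map(([id]) => workoutsById[id])    // replace each id with its object
    .filter(Boolean);                   // drop ids with no known workout
}

const sampleUsers = [
  { name: "One", favoriteWorkouts: ["w1", "w2"] },
  { name: "Two", favoriteWorkouts: ["w2"] },
  { name: "Three", favoriteWorkouts: ["w2", "w1", "w3"] },
];

const workoutsById = {
  w1: { name: "My workout 1" },
  w2: { name: "My workout 2" },
  w3: { name: "My workout 3" },
};

console.log(hottestWorkouts(sampleUsers, workoutsById).map((w) => w.name));
// [ 'My workout 2', 'My workout 1', 'My workout 3' ]
```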

Resources