How can I combine the matched documents' subdocuments together and return them as one array of objects? I have tried $group but it doesn't seem to work.
My query (this returns an array of objects; in this case there are two):
User.find({
  'business_details.business_location': {
    $near: coords,
    $maxDistance: maxDistance
  },
  'deal_details.deals_expired_date': {
    $gte: new Date()
  }
}, {
  'deal_details': 1
}).limit(limit).exec(function(err, locations) {
  if (err) {
    return res.status(500).json(err)
  }
  console.log(locations)
});
The console.log(locations) result:
// give me the result below
[{
_id: 55c0b8c62fd875a93c8ff7ea, // first document
deal_details: [{
deals_location: '101.6833,3.1333',
deals_price: 12.12 // 1st deal
}, {
deals_location: '101.6833,3.1333',
deals_price: 34.3 // 2nd deal
}],
business_details: {}
}, {
_id: 55a79898e0268bc40e62cd3a, // second document
deal_details: [{
deals_location: '101.6833,3.1333',
deals_price: 12.12 // 3rd deal
}, {
deals_location: '101.6833,3.1333',
deals_price: 34.78 // 4th deal
}, {
deals_location: '101.6833,3.1333',
deals_price: 34.32 // 5th deal
}],
business_details: {}
}]
What I want to do is combine both deal_details fields together and return them as a single array of objects, so it contains all 5 deals in one array instead of two separate arrays.
I have tried to do it in my backend (Node.js) using concat or push; however, when there are more than 2 matched documents I have problems concatenating them together. Is there any way to combine all matched documents and return them as one, like I mentioned above?
What you are probably missing here is the $unwind pipeline stage, which is what you typically use to "de-normalize" array content, particularly when your grouping operation intends to work across documents in your query result:
User.aggregate(
[
// Your basic query conditions
{ "$match": {
"business_details.business_location": {
"$near": coords,
"$maxDistance": maxDistance
},
"deal_details.deals_expired_date": {
"$gte": new Date()
}},
// Limit query results here
{ "$limit": limit },
// Unwind the array
{ "$unwind": "$deal_details" },
// Group on the common location
{ "$group": {
"_id": "$deal_details.deals_location",
"prices": {
"$push": "$deal_details.deals_price"
}
}}
],
function(err,results) {
if (err) throw err;
console.log(JSON.stringify(results,undefined,2));
}
);
Which gives output like:
{
"_id": "101.6833,3.1333",
"prices": [
12.12,
34.3,
12.12,
34.78,
34.32
]
}
Depending on how many documents actually match the grouping.
Alternately, you might want to look at the $geoNear pipeline stage, which gives a bit more control, especially when dealing with content in arrays.
Also beware that with "location" data in an array, only the "nearest" result is being considered here and not "all" of the array content. So other items in the array may not actually be "near" the queried point. That is more of a design consideration though, as any query operation you do will need to take this into account.
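For reference, a minimal sketch of what that could look like with $geoNear instead, assuming coords is a legacy [lng, lat] pair backed by a 2d index (note that $geoNear must be the first stage in the pipeline):
User.aggregate(
    [
        // $geoNear replaces the $near query and must come first
        { "$geoNear": {
            "near": coords,                  // assumed legacy [lng, lat] pair matching the 2d index
            "distanceField": "calcDistance", // required: the computed distance is written here
            "maxDistance": maxDistance,
            "query": { "deal_details.deals_expired_date": { "$gte": new Date() } }
        }},
        { "$limit": limit },
        { "$unwind": "$deal_details" },
        { "$group": {
            "_id": "$deal_details.deals_location",
            "prices": { "$push": "$deal_details.deals_price" }
        }}
    ],
    function(err, results) {
        if (err) throw err;
        console.log(JSON.stringify(results, undefined, 2));
    }
);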
You can merge them with reduce:
locations = locations.reduce(function(prev, location) {
  return prev.concat(location.deal_details);
}, []);
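Applied to the two sample documents in the question, that reduce yields one flat array of all five deals, and it works the same no matter how many documents matched. A sketch of the expected result:
[
  { deals_location: '101.6833,3.1333', deals_price: 12.12 },
  { deals_location: '101.6833,3.1333', deals_price: 34.3 },
  { deals_location: '101.6833,3.1333', deals_price: 12.12 },
  { deals_location: '101.6833,3.1333', deals_price: 34.78 },
  { deals_location: '101.6833,3.1333', deals_price: 34.32 }
]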
Related
I have a pretty simple $lookup aggregation query like the following:
{'$lookup':
{'from': 'edge',
'localField': 'gid',
'foreignField': 'to',
'as': 'from'}}
When I run this on a match with enough documents I get the following error:
Command failed with error 4568: 'Total size of documents in edge
matching { $match: { $and: [ { from: { $eq: "geneDatabase:hugo" }
}, {} ] } } exceeds maximum document size' on server
All attempts to limit the number of documents fail. allowDiskUse: true does nothing. Sending a cursor in does nothing. Adding a $limit to the aggregation also fails.
How could this be?
Then I see the error again. Where did that $match and $and and $eq come from? Is the aggregation pipeline behind the scenes farming out the $lookup call to another aggregation, one it runs on its own that I have no ability to provide limits for or use cursors with??
What is going on here?
As stated earlier in the comments, the error occurs because when performing the $lookup, which by default produces a target "array" within the parent document from the results of the foreign collection, the total size of the documents selected for that array causes the parent to exceed the 16MB BSON Limit.
The counter to this is to process with an $unwind which immediately follows the $lookup pipeline stage. This actually alters the behavior of $lookup in such a way that instead of producing an array in the parent, the results are instead a "copy" of each parent for every document matched.
Pretty much just like regular usage of $unwind, with the exception that instead of processing as a "separate" pipeline stage, the unwinding action is actually added to the $lookup pipeline operation itself. Ideally you also follow the $unwind with a $match condition, which creates a matching argument that is likewise added to the $lookup. You can actually see this in the explain output for the pipeline.
The topic is actually covered (briefly) in a section of Aggregation Pipeline Optimization in the core documentation:
$lookup + $unwind Coalescence
New in version 3.2.
When a $unwind immediately follows another $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.
Best demonstrated with a listing that puts the server under stress by creating "related" documents that would exceed the 16MB BSON limit. Done as briefly as possible to both break and work around the BSON Limit:
const MongoClient = require('mongodb').MongoClient;
const uri = 'mongodb://localhost/test';
function data(data) {
console.log(JSON.stringify(data, undefined, 2))
}
(async function() {
let db;
try {
db = await MongoClient.connect(uri);
console.log('Cleaning....');
// Clean data
await Promise.all(
["source","edge"].map(c => db.collection(c).remove() )
);
console.log('Inserting...')
await db.collection('edge').insertMany(
Array(1000).fill(1).map((e,i) => ({ _id: i+1, gid: 1 }))
);
await db.collection('source').insert({ _id: 1 })
console.log('Fattening up....');
await db.collection('edge').updateMany(
{},
{ $set: { data: "x".repeat(100000) } }
);
// The full pipeline. Failing test uses only the $lookup stage
let pipeline = [
{ $lookup: {
from: 'edge',
localField: '_id',
foreignField: 'gid',
as: 'results'
}},
{ $unwind: '$results' },
{ $match: { 'results._id': { $gte: 1, $lte: 5 } } },
{ $project: { 'results.data': 0 } },
{ $group: { _id: '$_id', results: { $push: '$results' } } }
];
// List and iterate each test case
let tests = [
'Failing.. Size exceeded...',
'Working.. Applied $unwind...',
'Explain output...'
];
for (let [idx, test] of Object.entries(tests)) {
console.log(test);
try {
let currpipe = (( +idx === 0 ) ? pipeline.slice(0,1) : pipeline),
options = (( +idx === tests.length-1 ) ? { explain: true } : {});
await new Promise((end,error) => {
let cursor = db.collection('source').aggregate(currpipe,options);
for ( let [key, value] of Object.entries({ error, end, data }) )
cursor.on(key,value);
});
} catch(e) {
console.error(e);
}
}
} catch(e) {
console.error(e);
} finally {
db.close();
}
})();
After inserting some initial data, the listing will attempt to run an aggregate merely consisting of $lookup which will fail with the following error:
{ MongoError: Total size of documents in edge matching pipeline { $match: { $and : [ { gid: { $eq: 1 } }, {} ] } } exceeds maximum document size
Which is basically telling you the BSON limit was exceeded on retrieval.
By contrast, the next attempt adds the $unwind and $match pipeline stages.
The Explain output:
{
"$lookup": {
"from": "edge",
"as": "results",
"localField": "_id",
"foreignField": "gid",
"unwinding": { // $unwind now is unwinding
"preserveNullAndEmptyArrays": false
},
"matching": { // $match now is matching
"$and": [ // and actually executed against
{ // the foreign collection
"_id": {
"$gte": 1
}
},
{
"_id": {
"$lte": 5
}
}
]
}
}
},
// $unwind and $match stages removed
{
"$project": {
"results": {
"data": false
}
}
},
{
"$group": {
"_id": "$_id",
"results": {
"$push": "$results"
}
}
}
And that result of course succeeds, because since the results are no longer being placed into the parent document, the BSON limit cannot be exceeded.
This really happens just as a result of adding the $unwind alone, but the $match is added as an example to show that this too is added into the $lookup stage, and that the overall effect is to "limit" the results returned in an effective way, since it is all done in that $lookup operation and no results other than those matching are actually returned.
By constructing the pipeline this way you can query for "referenced data" that would otherwise exceed the BSON limit, and then, if you want, $group the results back into an array format once they have been effectively filtered by the "hidden query" actually being performed by the $lookup.
MongoDB 3.6 and Above - Additional for "LEFT JOIN"
As all the content above notes, the BSON Limit is a "hard" limit that you cannot breach, and this is generally why the $unwind is necessary as an interim step. There is however the limitation that the "LEFT JOIN" becomes an "INNER JOIN" by virtue of the $unwind, because it cannot preserve the content. Also, even preserveNullAndEmptyArrays would negate the "coalescence" and still leave the intact array, causing the same BSON Limit problem.
MongoDB 3.6 adds new syntax to $lookup that allows a "sub-pipeline" expression to be used in place of the "local" and "foreign" keys. So instead of using the "coalescence" option as demonstrated, as long as the produced array does not also breach the limit it is possible to put conditions in that pipeline which returns the array "intact", and possibly with no matches as would be indicative of a "LEFT JOIN".
The new expression would then be:
{ "$lookup": {
"from": "edge",
"let": { "gid": "$gid" },
"pipeline": [
{ "$match": {
"_id": { "$gte": 1, "$lte": 5 },
"$expr": { "$eq": [ "$$gid", "$to" ] }
}}
],
"as": "from"
}}
In fact this would be basically what MongoDB is doing "under the covers" with the previous syntax since 3.6 uses $expr "internally" in order to construct the statement. The difference of course is there is no "unwinding" option present in how the $lookup actually gets executed.
If no documents are actually produced as a result of the "pipeline" expression, then the target array within the master document will in fact be empty, just as a "LEFT JOIN" actually does and would be the normal behavior of $lookup without any other options.
However, the output array MUST NOT cause the document in which it is being created to exceed the BSON Limit. So it really is up to you to ensure that any "matching" content from the conditions stays under this limit, or the same error will persist, unless of course you actually use $unwind to effect the "INNER JOIN".
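As a hedged sketch (reusing the field names from the earlier listing, and worth verifying against your own explain output, since whether the optimizer coalesces the $unwind into the sub-pipeline form can depend on server version), the two ideas can be combined: keep the filtering conditions inside the sub-pipeline, and still follow with $unwind and $group when the filtered array could itself approach the limit:
let pipeline = [
  { "$lookup": {
    "from": "edge",
    "let": { "sourceId": "$_id" },          // "sourceId" is an assumed variable name
    "pipeline": [
      { "$match": {
        "_id": { "$gte": 1, "$lte": 5 },
        "$expr": { "$eq": [ "$gid", "$$sourceId" ] }
      }}
    ],
    "as": "results"
  }},
  // Only needed if the filtered "results" array might still breach the 16MB limit
  { "$unwind": "$results" },
  { "$project": { "results.data": 0 } },
  { "$group": { "_id": "$_id", "results": { "$push": "$results" } } }
];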
I had the same issue with the following Node.js query because the 'redemptions' collection has more than 400,000 documents. I am using MongoDB server 4.2 and Node.js driver 3.5.3.
db.collection('businesses').aggregate([
  {
    $lookup: { from: 'redemptions', localField: "_id", foreignField: "business._id", as: "redemptions" }
  },
  {
    $project: {
      _id: 1,
      name: 1,
      email: 1,
      "totalredemptions": { $size: "$redemptions" }
    }
  }
])
I modified the query as below to make it work much faster.
db.collection('businesses').aggregate([
  query,
  {
    $lookup:
    {
      from: 'redemptions',
      let: { "businessId": "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$business._id", "$$businessId"] } } },
        { $group: { _id: "$_id", totalCount: { $sum: 1 } } },
        { $project: { "_id": 0, "totalCount": 1 } }
      ],
      as: "redemptions"
    }
  },
  {
    $project: {
      _id: 1,
      name: 1,
      email: 1,
      "totalredemptions": { $size: "$redemptions" }
    }
  }
])
I am using aggregates to query my schema for counts over date ranges. My problem is that I am not getting any response from the server (it times out every time). Other Mongoose queries are working fine (find, save, etc.), and with aggregates it depends on the pipeline: when I only use $match I get a response, but when I add $unwind I don't get any.
Connection Code:
var promise = mongoose.connect('mongodb://<username>:<password>@<db>.mlab.com:<port>/<db-name>', {
useMongoClient: true,
replset: {
ha: true, // Make sure the high availability checks are on
haInterval: 5000 // Run every 5 seconds
}
});
promise.then(function(db){
console.log('DB Connected');
}).catch(function(e){
console.log('DB Not Connected');
console.errors(e.message);
process.exit(1);
});
Schema:
var ProspectSchema = new Schema({
contact_name: {
type: String,
required: true
},
company_name: {
type: String,
required: true
},
contact_info: {
type: Array,
required: true
},
description:{
type: String,
required: true
},
product:{
type: Schema.Types.ObjectId, ref: 'Product'
},
progression:{
type: String
},
creator:{
type: String
},
sales: {
type: Schema.Types.ObjectId,
ref: 'User'
},
technical_sales: {
type: Schema.Types.ObjectId,
ref: 'User'
},
actions: [{
type: {type: String},
description: {type: String},
date: {type: Date}
}],
sales_connect_id: {
type: String
},
date_created: {
type: Date,
default: Date.now
}
});
Aggregation code:
exports.getActionsIn = function(start_date, end_date) {
var start = new Date(start_date);
var end = new Date(end_date);
return Prospect.aggregate([
{
$match: {
// "actions": {
// $elemMatch: {
// "type": {
// "$exists": true
// }
// }
// }
"actions.date": {
$gte: start,
$lte: end
}
}
}
,{
$project: {
_id: 0,
actions: 1
}
}
,{
$unwind: "actions"
}
,{
$group: {
_id: "actions.date",
count: {
$sum: 1
}
}
}
// ,{
// $project: {
// _id: 0,
// date: {
// $dateToString: {
// format: "%d/%m/%Y",
// date: "actions.date"
// }
// }
// // ,
// // count : "$count"
// }
// }
]).exec();
}
Calling the Aggregation:
router.get('/test',function(req, res, next){
var start_date = req.query.start_date;
var end_date = req.query.end_date;
ProspectCont.getActionsIn(start_date,end_date).then(function(value, err){
if(err)console.log(err);
res.json(value);
});
})
My main problem is that I get no response at all. I can work with an error message; the issue is that I am not getting any, so I don't know what is wrong.
Mongoose version: 4.11.8
P.S. I have tried multiple variations of the aggregation pipeline, so this isn't my first try; I have an aggregation working on the main Prospect schema but not on the actions sub-document.
You have several problems here, mostly from missing concepts. Lazy readers can skip to the bottom for the full pipeline example, but the main body here is the explanation of why things are done the way they are.
You are trying to select on a date range. The very first thing to check on any long-running operation is that you have a valid index. You might have one, or you might not. But you should issue the following (from the shell):
db.prospects.createIndex({ "actions.date": 1 })
Just to be sure. You probably really should add this to the schema definition so you know this should be deployed. So add to your defined schema:
ProspectSchema.index({ "actions.date": 1 })
When querying with a "range" on elements of an array, you need to understand that those are "multiple conditions" which you are expecting to match elements "between". Whilst you generally can get away with querying a "single property" of an array using "Dot Notation", you are missing that the application of $gte and $lte is like specifying the property several times with $and explicitly.
Whenever you have such "multiple conditions" you always mean to use $elemMatch. Without it, you are simply testing every value in the array to see if it is greater than or less than ( being some may be greater and some may be lesser ). The $elemMatch operator makes sure that "both" are applied to the same "element", and not just all array values as "Dot notation" exposes them:
{ "$match": {
"actions": {
"$elemMatch": { "date": { "$gte": start, "$lte: end } }
}
}}
That will now only match documents where the "array elements" fall between the specified dates. Without it, you are selecting and processing a lot more data which is irrelevant to the selection.
Array Filtering: Marked in bold because its prominence cannot be ignored. Any initial $match works just like any "query" in that its "job" is to "select documents" valid to the expression. This however does not have any effect on the contents of the array in the documents returned.
Whenever you have such a condition for document selection, you nearly always intend to "filter" such content from the array itself. This is a separate process, and really should be performed before any other operations that work with the content, especially $unwind.
So you really should add a $filter in either an $addFields or $project, as is appropriate to your intent, "immediately" following any document selection:
{ "$project": {
"_id": 0,
"actions": {
"$filter": {
"input": "$actions",
"as": "a",
"in": {
"$and": [
{ "$gte": [ "$$a.date", start ] },
{ "$lte": [ "$$a.date", end ] }
]
}
}
}
}}
Now the array content, which you already know "must" have contained at least one valid item due to the initial query conditions, is "reduced" down to only those entries that actually match the date range that you want. This removes a lot of overhead from later processing.
Note the different "logical variants" of $gte and $lte in use within the $filter condition. These evaluate to return a boolean for expressions that require them.
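To make the distinction concrete, here is a minimal side-by-side sketch of the two forms (field names taken from the pipeline above):
// Query operator form: selects documents/elements, used in $match and $elemMatch
{ "actions.date": { "$gte": start } }

// Aggregation expression form: returns true/false per evaluated value,
// used inside $filter, $cond and similar expressions
{ "$gte": [ "$$a.date", start ] }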
Grouping: It's probably just an attempt at getting a result, but the code you have does not really do anything with the dates in question. Since typical date values should be provided with millisecond precision, you generally want to reduce them.
Commented code suggests usage of $dateToString within a $project. It is strongly recommended that you do not do that. If you intend such a reduction, then supply that expression directly to the grouping key within $group instead:
{ "$group": {
"_id": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$actions.date"
}
},
"count": { "$sum": 1 }
}}
I personally don't like returning a "string" when a natural Date object serializes properly for me already. So I like to use the "math" approach to "round" dates instead:
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$actions.date", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$actions.date", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]},
new Date(0)
]
},
"count": { "$sum": 1 }
}}
That returns a valid Date object "rounded" to the current day. Mileage may vary on preferred approaches, but it's the one I like. And it takes the least bytes to transfer.
The usage of new Date(0) represents the "epoch date". So when you $subtract one BSON Date from another, you end up with the millisecond difference between the two as an integer. When you $add an integer value to a BSON Date, you get a new BSON Date representing the sum of that millisecond value and the date. This is the basis of converting to numeric, rounding to the nearest start of day, and then converting back to a Date value.
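The same arithmetic in plain JavaScript, purely as an illustration of what the pipeline expression computes (the sample timestamp is made up):
const dayMs = 1000 * 60 * 60 * 24;
const d = new Date("2017-08-15T13:45:12.345Z");

const ms = d - new Date(0);                  // $subtract: milliseconds since the epoch
const rounded = new Date(ms - (ms % dayMs)); // drop the remainder, then $add back to the epoch

console.log(rounded.toISOString());          // "2017-08-15T00:00:00.000Z"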
By making that statement directly within the $group rather than $project, you are basically saving what actually gets interpreted as "go through all the data and return this calculated value, then go and do...". Much the same as working through a pile of objects, marking them with a pen first and then actually counting them as a separate step.
As a single pipeline stage it saves considerable resources as you do the accumulation at the same time as calculating the value to accumulate on. When you think it though much like the provided analogy, it just makes a lot of sense.
As a full pipeline example you would put the above together as:
Prospect.aggregate([
{ "$match": {
"actions": {
"$elemMatch": { "date": { "$gte": start, "$lte: end } }
}
}},
{ "$project": {
"_id": 0,
"actions": {
"$filter": {
"input": "$actions",
"as": "a",
"in": {
"$and": [
{ "$gte": [ "$$a.date", start ] },
{ "$lte": [ "$$a.date", end ] }
]
}
}
}
}},
{ "$unwind": "$actions" },
{ "$group": {
"_id": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$actions.date"
}
},
"count": { "$sum": 1 }
}}
])
And honestly if after making sure an index is in place, and following that pipeline you still have timeout problems, then reduce the date selection down until you get a reasonable response time.
If it's still taking too long ( or the date reduction is not reasonable ) then your hardware simply is not up to the task. If you really have a lot of data then you have to be reasonable with expectations. So scale up or scale out, but those things are outside the scope of any question here.
As it stands those improvements should make a significant difference over any attempt shown so far. Mostly due to a few fundamental concepts that are being missed.
I tried something like the below but didn't get the expected result. This would work if the fields were not arrays:
// union of includes and excludes as includesAndExcludes
getIncludesAndExcludes: (req, res)=>{
console.log('called setunion');
experienceModel.aggregate([
{ $group: {_id : {includes:"$includes", excludes:"$excludes"}}},
{ $project: { includesAndExcludes: { $setUnion: [ "$_id.includes", "$_id.excludes" ] }, _id:0 } }
], (err, data) => {
if (err) {
res.status(500).send(err);
} else {
res.json(data);
}
})
},
Most efficiently, exclude documents where neither array has any elements via $exists, and most importantly you are going to need $ifNull to substitute a [null] placeholder where the arrays don't exist.
experienceModel.aggregate([
  // Don't include documents that have no arrays
  { "$match": {
    "$or": [
      { "includes.0": { "$exists": true } },
      { "excludes.0": { "$exists": true } }
    ]
  }},
  // Join the arrays and exclude the nulls
  { "$project": {
    "_id": 0,
    "list": {
      "$setDifference": [
        { "$setUnion": [
          { "$ifNull": [ "$includes", [null] ] },
          { "$ifNull": [ "$excludes", [null] ] }
        ]},
        [null]
      ]
    }
  }},
  // Unwind. By earlier conditions the array must have some entries
  { "$unwind": "$list" },
  // And $group on the list values as the key to produce distinct results
  { "$group": { "_id": "$list" } }
], (err, data) => {
  // rest of code
})
So the first $match is to filter so that at least one array must be present as a logic rule, which also possibly speeds things up. Next you join the arrays with $setUnion, being careful to replace with an array of a single element [null] using $ifNull if the field was not present. If you did not do this then any $setUnion result would be null rather than listing entries of either array. So that is quite important.
Since it is possible for the output of $setUnion to have a null item in its list, you can remove that with $setDifference, which is the shortest form of filter you can write and works well with "sets".
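To make the set logic concrete, a hypothetical walk-through with made-up field values showing what the expression produces for two documents:
// Document with both arrays:
//   includes: ["Breakfast"], excludes: ["Dinner"]
//   $ifNull        -> ["Breakfast"], ["Dinner"]   (both present, left untouched)
//   $setUnion      -> ["Breakfast", "Dinner"]
//   $setDifference -> ["Breakfast", "Dinner"]     (no null to remove)

// Document with only "includes":
//   includes: ["Accomodation"], excludes: missing
//   $ifNull        -> ["Accomodation"], [null]    (missing field becomes [null])
//   $setUnion      -> ["Accomodation", null]
//   $setDifference -> ["Accomodation"]            (the null placeholder is stripped)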
All that really remains is to "de-normalize" the array in each document to single documents of each element using $unwind, and we don't need new options like preserveNullAndEmptyArrays because all the logic above already took care of that. And then the final $group is simply done on those values in order to produce "unique" output, which is what the _id key in a $group statement is for.
If you want, then you can even just strip down the result from aggregate to a simple list of strings for your response using .map():
data = data.map( d => d._id );
Then all your service returns is an array of strings, and no embedded structures.
[
  "Breakfast",
  "Airport Pickup",
  "Dinner",
  "Accomodation"
]
I have a Comments collection in Mongoose, and a query that returns the most recent five (an arbitrary number) Comments.
Every Comment is associated with another document. What I would like to do is make a query that returns the most recent 5 comments, with comments associated with the same other document combined.
So instead of a list like this:
results = [
{ _id: 123, associated: 12 },
{ _id: 122, associated: 8 },
{ _id: 121, associated: 12 },
{ _id: 120, associated: 12 },
{ _id: 119, associated: 17 }
]
I'd like to return a list like this:
results = [
{ _id: 124, associated: 3 },
{ _id: 125, associated: 19 },
[
{ _id: 123, associated: 12 },
{ _id: 121, associated: 12 },
{ _id: 120, associated: 12 },
],
{ _id: 122, associated: 8 },
{ _id: 119, associated: 17 }
]
Please don't worry too much about the data format: it's just a sketch to try to show the sort of thing I want. I want a result set of a specified size, but with some results grouped according to some criterion.
Obviously one way to do this would be to just make the query, crawl and modify the results, then recursively make the query again until the result set is as long as desired. That way seems awkward. Is there a better way to go about this? I'm having trouble phrasing it in a Google search in a way that gets me anywhere near anyone who might have insight.
Here's an aggregation pipeline query that will do what you are asking for:
db.comments.aggregate([
{ $group: { _id: "$associated", maxID: { $max: "$_id"}, cohorts: { $push: "$$ROOT"}}},
{ $sort: { "maxID": -1 } },
{ $limit: 5 }
])
Lacking any other fields from the sample data to sort by, I used $_id.
If you'd like results that are a little closer in structure to the sample result set you provided you could add a $project to the end:
db.comments.aggregate([
{ $group: { _id: "$associated", maxID: { $max: "$_id"}, cohorts: { $push: "$$ROOT"}}},
{ $sort: { "maxID": -1 } },
{ $limit: 5 },
{ $project: { _id: 0, cohorts: 1 }}
])
That will print only the result set. Note that even comments that do not share an association object will be in an array; it will simply be an array of length 1.
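For instance, run against only the five sample comments in the question, the output would look roughly like this (a sketch, not actual server output, and ordering inside each array may vary):
[
  { "cohorts": [ { "_id": 123, "associated": 12 },
                 { "_id": 121, "associated": 12 },
                 { "_id": 120, "associated": 12 } ] },
  { "cohorts": [ { "_id": 122, "associated": 8 } ] },
  { "cohorts": [ { "_id": 119, "associated": 17 } ] }
]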
If you are concerned about limiting the results in the grouping as Neil Lunn is suggesting, perhaps a $match in the beginning is a smart idea.
db.comments.aggregate([
{ $match: { createDate: { $gte: new Date(new Date() - 5 * 60000) } } },
{ $group: { _id: "$associated", maxID: { $max: "$_id"}, cohorts: { $push: "$$ROOT"}}},
{ $sort: { "maxID": -1 } },
{ $limit: 5 },
{ $project: { _id: 0, cohorts: 1 }}
])
That will only include comments made in the last 5 minutes assuming you have a createDate type field. If you do, you might also consider using that as the field to sort by instead of "_id". If you do not have a createDate type field, I'm not sure how best to limit the comments that are grouped as I do not know of a "current _id" in the way that there is a "current time".
I honestly think you are asking a lot here and cannot really see the utility myself, but I'm always happy to have that explained to me if there is something useful I have missed.
Bottom line is you want comments from the last five distinct users by date, and then some sort of grouping of additional comments by those users. The last part is where I see difficulty in the rules no matter how you want to attack this, but I'll try to keep this to the briefest form.
No way this happens in a single query of any sort. But there are things that can be done to make it an efficient server response:
var async = require('async'),
    DataStore = require('nedb'),
    store = new DataStore();
async.waterfall(
[
function(callback) {
Comment.aggregate(
[
{ "$match": { "postId": thisPostId } },
{ "$sort": { "associated": 1, "createdDate": -1 } },
{ "$group": {
"_id": "$associated",
"date": { "$first": "$createdDate" }
}},
{ "$sort": { "date": -1 } },
{ "$limit": 5 }
],
callback);
},
function(docs,callback) {
async.each(docs,function(doc,callback) {
Comment.aggregate(
[
{ "$match": { "postId": thisPostId, "associated": doc._id } },
{ "$sort": { "createdDate": -1 } },
{ "$limit": 5 },
{ "$group": {
"_id": "$associated",
"docs": {
"$push": {
"_id": "$_id", "createdDate": "$createdDate"
}
},
"firstDate": { "$first": "$createdDate" }
}}
],
function(err,results) {
if (err) callback(err);
async.each(results,function(result,callback) {
store.insert( result, function(err, result) {
callback(err);
});
},function(err) {
callback(err);
});
}
);
},
callback);
}],
function(err) {
if (err) throw err;
store.find({}).sort({ "firstDate": -1 }).exec(function(err,docs) {
if (err) throw err;
console.log( JSON.stringify( docs, undefined, 4 ) );
});
}
);
Now I stuck more document properties in both the document and the array, but the simplified form based on your sample would then come out like this:
results = [
{ "_id": 3, "docs": [124] },
{ "_id": 19, "docs": [125] },
{ "_id": 12, "docs": [123,121,120] },
{ "_id": 8, "docs": [122] },
{ "_id": 17, "docs": [119] }
]
So the essential idea is to first find your distinct "users" who were the last to comment, basically by chopping off the last 5. Without filtering on some kind of range here, that would go over the entire collection to get those results, so it would be best to restrict this in some way, such as the last hour or the last few hours or something sensible as required. Just add those conditions to the $match along with the current post that is associated with the comments.
Once you have those 5, then you want to get any possible "grouped" details for multiple comments by those users. Again, some sort of limit is generally advised for a timeframe, but as a general case this is just looking for the most recent comments by the user on the current post and restricting that to 5.
The execution here is done in parallel, which will use more resources but is fairly effective considering there are only 5 queries to run anyway. In contrast to your example output, the array here is inside the document result, and it contains the original document id values for each comment for reference. Any other content related to the document would be pushed into the array as well, as required (i.e. the content of the comment).
The other little trick here is using nedb as a means for storing the output of each query in an "in memory" collection. This need only really be a standard hash data structure, but nedb gives you a way of doing that while maintaining the MongoDB statement form that you may be used to.
Once all results are obtained you just return them as your output, and sorted as shown to retain the order of who commented last. The actual comments are grouped in the array for each item and you can traverse this to output how you like.
Bottom line here is that you are asking for a compounded version of the "top N results problem", which is something often asked of MongoDB. I've written about ways to tackle this before to show how it's possible in a single aggregation pipeline stage, but it really is not practical for anything more than a relatively small result set.
If you really want to join in the insanity, then you can look at Mongodb aggregation $group, restrict length of array for one of the more detailed examples. But for my money, I would run on parallel queries any day. Node.js has the right sort of environment to support them, so you would be crazy to do it otherwise.
I have the following mongodb query in node.js which gives me a list of unique zip codes with a count of how many times the zip code appears in the database.
collection.aggregate( [
{
$group: {
_id: "$Location.Zip",
count: { $sum: 1 }
}
},
{ $sort: { _id: 1 } },
{ $match: { count: { $gt: 1 } } }
], function ( lookupErr, lookupData ) {
if (lookupErr) {
res.send(lookupErr);
return;
}
res.send(lookupData.sort());
});
});
How can this query be modified to return one specific zip code? I've tried the condition clause but have not been able to get it to work.
Aggregations that require filtered results can be done with the $match operator. Without tweaking what you already have, I would suggest just sticking in a $match for the zip code you want returned at the top of the aggregation list.
collection.aggregate( [
{
$match: {
"Location.Zip": 47421
}
},
{
$group: {
...
This example will result in every aggregation operation after the $match working on only the data set that is returned by the $match of the Location.Zip key to the value 47421.
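Put together with the original pipeline, a hypothetical complete version (assuming the zip code lives under Location.Zip as in the question's $group, and is stored as a number) would look like:
collection.aggregate([
  { $match: { "Location.Zip": 47421 } },   // narrow the input set to one zip code first
  { $group: {
      _id: "$Location.Zip",
      count: { $sum: 1 }
  }},
  { $sort: { _id: 1 } },
  { $match: { count: { $gt: 1 } } }
], function (lookupErr, lookupData) {
  if (lookupErr) {
    res.send(lookupErr);
    return;
  }
  res.send(lookupData);
});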
In the $match pipeline stage, add:
{ $match: {
  count: { $gt: 1 },
  _id: "10002" // replace 10002 with the zip code you want
}}
As a side note, you should put the $match operator first and in general as high in the aggregation chain as you can.