I have the following MongoDB query in Node.js which gives me a list of unique zip codes with a count of how many times each zip code appears in the database:
collection.aggregate([
    {
        $group: {
            _id: "$Location.Zip",
            count: { $sum: 1 }
        }
    },
    { $sort: { _id: 1 } },
    { $match: { count: { $gt: 1 } } }
], function (lookupErr, lookupData) {
    if (lookupErr) {
        res.send(lookupErr);
        return;
    }
    // Already ordered by the $sort stage; calling Array#sort() with no comparator would not sort these objects meaningfully
    res.send(lookupData);
});
How can this query be modified to return one specific zip code? I've tried the condition clause but have not been able to get it to work.
Aggregations that require filtered results can be done with the $match operator. Without tweaking what you already have, I would suggest just sticking in a $match for the zip code you want returned at the top of the aggregation list.
collection.aggregate( [
    {
        $match: {
            "Location.Zip": 47421
        }
    },
    {
        $group: {
...
This example will result in every aggregation operation after the $match working only on the documents whose Location.Zip field equals 47421. Note that $match equality is type-sensitive, so use "47421" instead if your zip codes are stored as strings.
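In the existing $match stage, add an _id condition:
{ $match: { count: { $gt: 1 },
            _id: "10002" // replace 10002 with the zip code you want
}}
As a side note, you should generally put $match as early in the aggregation pipeline as you can. Filtering on _id after the $group works, but logically every zip code is grouped first; a $match on Location.Zip before the $group reduces the documents that have to be grouped and can use an index.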
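To make the ordering concrete, here is a minimal sketch of the whole pipeline with the zip filter applied before the $group, so only one zip code is ever grouped. The helper name buildZipCountPipeline is hypothetical, the Location.Zip path comes from the question, and the $sort is dropped because a single zip yields at most one group. Pass the zip as a string or a number to match however it is stored.

```javascript
// Hypothetical helper: builds the aggregation pipeline for a single zip code.
// The $match on Location.Zip comes first so the server can use an index and
// only the matching documents reach $group.
function buildZipCountPipeline(zip) {
  return [
    { $match: { 'Location.Zip': zip } },
    { $group: { _id: '$Location.Zip', count: { $sum: 1 } } },
    // Keep the original "appears more than once" condition
    { $match: { count: { $gt: 1 } } }
  ];
}

// Usage with the callback style from the question:
// collection.aggregate(buildZipCountPipeline('10002'), function (err, data) { ... });
console.log(JSON.stringify(buildZipCountPipeline('10002')));
```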
I have a pretty simple $lookup aggregation query like the following:
{'$lookup':
{'from': 'edge',
'localField': 'gid',
'foreignField': 'to',
'as': 'from'}}
When I run this on a match with enough documents I get the following error:
Command failed with error 4568: 'Total size of documents in edge matching { $match: { $and: [ { from: { $eq: "geneDatabase:hugo" } }, {} ] } } exceeds maximum document size' on server
All attempts to limit the number of documents fail. allowDiskUse: true does nothing. Sending a cursor in does nothing. Adding in a $limit into the aggregation also fails.
How could this be?
Then I look at the error again. Where did that $match, $and and $eq come from? Is the aggregation pipeline, behind the scenes, farming out the $lookup call to another aggregation, one it runs on its own, that I have no ability to provide limits for or use cursors with?
What is going on here?
As stated earlier in a comment, the error occurs because $lookup by default produces a target "array" within the parent document from the results of the foreign collection, and the total size of the documents selected for that array causes the parent to exceed the 16MB BSON limit.
The counter for this is to process with an $unwind that immediately follows the $lookup pipeline stage. This alters the behavior of $lookup so that, instead of producing an array in the parent, the results are a "copy" of the parent for every document matched.
This is much like regular usage of $unwind, except that instead of being processed as a "separate" pipeline stage, the unwinding action is added to the $lookup operation itself. Ideally you also follow the $unwind with a $match condition on the as field, which adds a "matching" argument to the $lookup as well. You can see this in the explain output for the pipeline.
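To picture what the coalesced $unwind produces, here is a plain JavaScript sketch of $unwind semantics over in-memory objects (an illustration, not a driver call): each element of the array field yields its own copy of the parent, so no single output document holds the whole joined array. The function name unwind and the sample documents are illustrative assumptions; the sketch only handles a top-level field and, like the default $unwind, drops documents whose field is missing, null or an empty array.

```javascript
// Illustrative in-memory version of { $unwind: '$<field>' }.
// Emits one shallow copy of the parent per array element. A missing/null
// field or an empty array drops the parent (default preserveNullAndEmptyArrays: false);
// a non-array value is treated as a one-element array.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) continue;
    const values = Array.isArray(value) ? value : [value];
    for (const item of values) {
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

const parents = [
  { _id: 1, results: [{ _id: 10 }, { _id: 11 }] },
  { _id: 2, results: [] }
];
console.log(JSON.stringify(unwind(parents, 'results')));
```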
The topic is actually covered (briefly) in a section of Aggregation Pipeline Optimization in the core documentation:
$lookup + $unwind Coalescence
New in version 3.2.
When a $unwind immediately follows another $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.
This is best demonstrated with a listing that puts the server under stress by creating "related" documents that would exceed the 16MB BSON limit, kept as brief as possible to both break and then work around the limit:
const { MongoClient } = require('mongodb');

const uri = 'mongodb://localhost/test';

// Print a result document (or explain output) as indented JSON
function data(doc) {
  console.log(JSON.stringify(doc, undefined, 2));
}

(async function() {
  let client;
  try {
    client = await MongoClient.connect(uri);
    const db = client.db();   // the database named in the URI ("test")

    console.log('Cleaning....');
    // Clean data
    await Promise.all(
      ["source","edge"].map(c => db.collection(c).deleteMany({}))
    );

    console.log('Inserting...');
    // 1000 "edge" documents, all related to source _id 1
    await db.collection('edge').insertMany(
      Array(1000).fill(1).map((e,i) => ({ _id: i+1, gid: 1 }))
    );
    await db.collection('source').insertOne({ _id: 1 });

    console.log('Fattening up....');
    // ~100KB per edge document: far more than 16MB once gathered into one array
    await db.collection('edge').updateMany(
      {},
      { $set: { data: "x".repeat(100000) } }
    );

    // The full pipeline. Failing test uses only the $lookup stage
    const pipeline = [
      { $lookup: {
        from: 'edge',
        localField: '_id',
        foreignField: 'gid',
        as: 'results'
      }},
      { $unwind: '$results' },
      { $match: { 'results._id': { $gte: 1, $lte: 5 } } },
      { $project: { 'results.data': 0 } },
      { $group: { _id: '$_id', results: { $push: '$results' } } }
    ];

    // List and iterate each test case
    const tests = [
      'Failing.. Size exceeded...',
      'Working.. Applied $unwind...',
      'Explain output...'
    ];

    for (const [idx, test] of tests.entries()) {
      console.log(test);
      try {
        const currpipe = (idx === 0) ? pipeline.slice(0,1) : pipeline;
        const cursor = db.collection('source').aggregate(currpipe);
        if (idx === tests.length - 1) {
          data(await cursor.explain());   // last test: show the explain plan
        } else {
          for await (const doc of cursor) data(doc);
        }
      } catch(e) {
        console.error(e);
      }
    }
  } catch(e) {
    console.error(e);
  } finally {
    if (client) await client.close();
  }
})();
After inserting some initial data, the listing attempts to run an aggregate consisting only of the $lookup stage, which fails with the following error:
{ MongoError: Total size of documents in edge matching pipeline { $match: { $and : [ { gid: { $eq: 1 } }, {} ] } } exceeds maximum document size
This is telling you that the BSON limit was exceeded on retrieval.
By contrast, the next attempt adds the $unwind and $match pipeline stages.
The explain output:
{
"$lookup": {
"from": "edge",
"as": "results",
"localField": "_id",
"foreignField": "gid",
"unwinding": { // $unwind now is unwinding
"preserveNullAndEmptyArrays": false
},
"matching": { // $match now is matching
"$and": [ // and actually executed against
{ // the foreign collection
"_id": {
"$gte": 1
}
},
{
"_id": {
"$lte": 5
}
}
]
}
}
},
// $unwind and $match stages removed
{
"$project": {
"results": {
"data": false
}
}
},
{
"$group": {
"_id": "$_id",
"results": {
"$push": "$results"
}
}
}
That pipeline succeeds because the results are no longer placed into the parent document, so the BSON limit cannot be exceeded.
Adding the $unwind alone is what makes this work, but the $match is included to show that it too is folded into the $lookup stage. The overall effect is to "limit" the results efficiently, since the filtering happens inside the $lookup operation and only the matching documents are actually returned.
Constructed this way, you can query for "referenced data" that would otherwise exceed the BSON limit and then, if you want, $group the results back into an array once they have been filtered by the "hidden query" that $lookup actually performs.
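The same pattern can be wrapped in a small helper so the $match always targets fields under the as path, which is what makes it eligible to be folded into the $lookup. This is a sketch: the helper name buildCoalescedLookup and its parameters are hypothetical, and the returned array is meant to be passed to collection.aggregate() as in the listing above.

```javascript
// Hypothetical helper that builds the $lookup + $unwind + $match (+ $group)
// shape described above. Filter keys are prefixed with the `as` field so the
// $match refers to the joined documents and can be coalesced into $lookup.
function buildCoalescedLookup({ from, localField, foreignField, as, filter, regroup = true }) {
  const prefixed = {};
  for (const [key, cond] of Object.entries(filter)) {
    prefixed[`${as}.${key}`] = cond;
  }
  const pipeline = [
    { $lookup: { from, localField, foreignField, as } },
    { $unwind: `$${as}` },   // must immediately follow the $lookup
    { $match: prefixed }
  ];
  if (regroup) {
    // Rebuild the (now filtered) array for each parent document
    pipeline.push({ $group: { _id: '$_id', [as]: { $push: `$${as}` } } });
  }
  return pipeline;
}

const pipeline = buildCoalescedLookup({
  from: 'edge', localField: '_id', foreignField: 'gid', as: 'results',
  filter: { _id: { $gte: 1, $lte: 5 } }
});
console.log(JSON.stringify(pipeline, null, 2));
```

The regrouped array still has to fit within 16MB, so the filter needs to be selective enough for that.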
MongoDB 3.6 and Above - Additional for "LEFT JOIN"
As all the content above notes, the BSON limit is a "hard" limit that you cannot breach, which is why the $unwind is generally necessary as an interim step. The limitation is that the "LEFT JOIN" becomes an "INNER JOIN" by virtue of the $unwind, since parents without matches are not preserved. Even preserveNullAndEmptyArrays would negate the "coalescence" and still leave the array intact, causing the same BSON limit problem.
MongoDB 3.6 adds new syntax to $lookup that allows a "sub-pipeline" expression to be used in place of the "local" and "foreign" keys. So instead of relying on "coalescence" as demonstrated, you can put the conditions in that pipeline and return the array "intact", possibly with no matches as would be indicative of a "LEFT JOIN", as long as the produced array does not itself breach the limit.
The new expression would then be:
{ "$lookup": {
"from": "edge",
"let": { "gid": "$gid" },
"pipeline": [
{ "$match": {
"_id": { "$gte": 1, "$lte": 5 },
"$expr": { "$eq": [ "$$gid", "$to" ] }
}}
],
"as": "from"
}}
In fact this is basically what MongoDB does "under the covers" with the previous syntax, since 3.6 uses $expr "internally" to construct the statement. The difference, of course, is that there is no "unwinding" option present in how this $lookup actually gets executed.
If no documents are produced by the "pipeline" expression, then the target array within the parent document will simply be empty, just as a "LEFT JOIN" would behave, and as is the normal behavior of $lookup without any other options.
However, the output array MUST NOT cause the document in which it is created to exceed the BSON limit. So it is up to you to ensure that any content matched by the conditions stays under this limit, or the same error will persist, unless of course you actually use $unwind to effect the "INNER JOIN".
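To make the "LEFT JOIN" versus "INNER JOIN" difference concrete, here is an in-memory JavaScript sketch (plain arrays, no server) of the two behaviors: $lookup alone keeps every parent and attaches a possibly empty array, while following it with a default $unwind drops parents that had no match. The function names and data are illustrative only, and the join uses simple strict equality rather than the full $lookup matching rules.

```javascript
// $lookup alone: every parent is kept; unmatched parents get an empty array (LEFT JOIN).
function lookupLeft(parents, foreign, localField, foreignField, as) {
  return parents.map(p => ({
    ...p,
    [as]: foreign.filter(f => f[foreignField] === p[localField])
  }));
}

// $lookup followed by a default $unwind: one row per match, unmatched parents vanish (INNER JOIN).
function lookupInner(parents, foreign, localField, foreignField, as) {
  return lookupLeft(parents, foreign, localField, foreignField, as)
    .flatMap(p => p[as].map(match => ({ ...p, [as]: match })));
}

const nodes = [{ gid: 'a' }, { gid: 'b' }];
const edges = [{ _id: 1, to: 'a' }, { _id: 2, to: 'a' }];

console.log(lookupLeft(nodes, edges, 'gid', 'to', 'from').length);  // 2 parents, 'b' has from: []
console.log(lookupInner(nodes, edges, 'gid', 'to', 'from').length); // 2 rows, both for 'a'
```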
I had the same issue with the following Node.js query because the 'redemptions' collection has more than 400,000 documents. I am using MongoDB server 4.2 and Node.js driver 3.5.3.
db.collection('businesses').aggregate([
    {
        $lookup: { from: 'redemptions', localField: "_id", foreignField: "business._id", as: "redemptions" }
    },
    {
        $project: {
            _id: 1,
            name: 1,
            email: 1,
            "totalredemptions" : {$size:"$redemptions"}
        }
    }
])
I modified the query as below, and it now works and runs super fast.
db.collection('businesses').aggregate([
    {
        $lookup: {
            from: 'redemptions',
            let: { "businessId": "$_id" },
            pipeline: [
                { $match: { $expr: { $eq: ["$business._id", "$$businessId"] } } },
                // Each matched redemption becomes a tiny { totalCount: 1 } document,
                // so $size below still counts redemptions while the array stays small
                { $group: { _id: "$_id", totalCount: { $sum: 1 } } },
                { $project: { "_id": 0, "totalCount": 1 } }
            ],
            as: "redemptions"
        }
    },
    {
        $project: {
            _id: 1,
            name: 1,
            email: 1,
            "totalredemptions" : {$size:"$redemptions"}
        }
    }
])
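If only the number of related documents is needed, a variant is to end the sub-pipeline in a $count stage, so each business gets back at most one tiny document instead of one entry per redemption. This is a sketch rather than the poster's code: it assumes MongoDB 3.6+ (for the let/pipeline form of $lookup) and the field names from the query above, and the helper name is hypothetical.

```javascript
// Hypothetical helper: counts related redemptions per business without
// materialising one array element per redemption.
function businessRedemptionCounts() {
  return [
    { $lookup: {
      from: 'redemptions',
      let: { businessId: '$_id' },
      pipeline: [
        { $match: { $expr: { $eq: ['$business._id', '$$businessId'] } } },
        { $count: 'total' }   // yields [] or [{ total: n }]
      ],
      as: 'redemptions'
    } },
    { $project: {
      _id: 1,
      name: 1,
      email: 1,
      // The array is empty when there are no redemptions, so fall back to 0
      totalredemptions: { $ifNull: [{ $arrayElemAt: ['$redemptions.total', 0] }, 0] }
    } }
  ];
}

// Usage (driver 3.x+): db.collection('businesses').aggregate(businessRedemptionCounts()).toArray()
console.log(JSON.stringify(businessRedemptionCounts(), null, 2));
```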
How can I combine the subdocuments of the matched documents into one and return them as a single array of objects? I have tried $group, but it doesn't seem to work.
My query (this returns an array of objects; in this case there are two):
User.find({
    'business_details.business_location': {
        $near: coords,
        $maxDistance: maxDistance
    },
    'deal_details.deals_expired_date': {
        $gte: new Date()
    }
}, {
    'deal_details': 1
}).limit(limit).exec(function(err, locations) {
    if (err) {
        return res.status(500).json(err)
    }
    console.log(locations)
});
The console.log(locations) output:
[{
    _id: 55c0b8c62fd875a93c8ff7ea, // first document
    deal_details: [{
        deals_location: '101.6833,3.1333',
        deals_price: 12.12 // 1st deal
    }, {
        deals_location: '101.6833,3.1333',
        deals_price: 34.3 // 2nd deal
    }],
    business_details: {}
}, {
    _id: 55a79898e0268bc40e62cd3a, // second document
    deal_details: [{
        deals_location: '101.6833,3.1333',
        deals_price: 12.12 // 3rd deal
    }, {
        deals_location: '101.6833,3.1333',
        deals_price: 34.78 // 4th deal
    }, {
        deals_location: '101.6833,3.1333',
        deals_price: 34.32 // 5th deal
    }],
    business_details: {}
}]
What I want to do is combine both of these deal_details fields and return them as one array of objects, so it contains 5 deals in one array instead of two separate arrays.
I have tried to do it in my backend (Node.js) using concat or push, but when more than 2 documents match I have trouble concatenating them together. Is there any way to combine all the matched documents and return them as one, as described above?
What you are probably missing here is the $unwind pipeline stage, which is what you typically use to "de-normalize" array content, particularly when your grouping operation intends to work across documents in your query result:
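User.aggregate(
    [
        // $near is not allowed inside $match in an aggregation, so the
        // geospatial filter is done by $geoNear, which must be the first stage.
        // Assumes a 2dsphere index on business_details.business_location;
        // if coords is a GeoJSON point, maxDistance is in meters.
        { "$geoNear": {
            "near": coords,
            "maxDistance": maxDistance,
            "distanceField": "distance",
            "spherical": true,
            // Your other query conditions
            "query": {
                "deal_details.deals_expired_date": { "$gte": new Date() }
            }
        }},
        // Limit query results here
        { "$limit": limit },
        // Unwind the array
        { "$unwind": "$deal_details" },
        // Group on the common location
        { "$group": {
            "_id": "$deal_details.deals_location",
            "prices": {
                "$push": "$deal_details.deals_price"
            }
        }}
    ],
    function(err,results) {
        if (err) throw err;
        console.log(JSON.stringify(results,undefined,2));
    }
);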
Which gives output like:
{
"_id": "101.6833,3.1333",
"prices": [
12.12,
34.3,
12.12,
34.78,
34.32
]
}
One result document is returned per distinct deals_location, so the exact output depends on which documents actually match.
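If, as in the question, you want literally one array containing every matching deal rather than prices grouped by location, the same $unwind can feed a $group on null that pushes each whole subdocument. A minimal sketch of just those trailing stages (the variable name is illustrative), meant to replace the $unwind and $group at the end of the pipeline above:

```javascript
// Trailing stages: flatten deal_details across all matched users, then
// collect every deal subdocument into one array in a single result document.
const combineDealsStages = [
  { $unwind: '$deal_details' },
  { $group: {
    _id: null,                         // null groups all documents together
    deals: { $push: '$deal_details' }  // push the whole subdocument
  } }
];

console.log(JSON.stringify(combineDealsStages));
```

For the sample data this gives a single document whose deals array holds all five deals.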
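Note that $near cannot be used inside a $match stage of an aggregation; geospatial proximity in a pipeline is handled by the $geoNear stage, which must be the first stage and gives a bit more control, especially when dealing with content in arrays.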
Also beware that with "location" data in an array, only the "nearest" result is being considered here and not "all" of the array content. So other items in the array may not be actually "near" the queried point. That is more of a design consideration though as any query operation you do will need to consider this.
You can merge them with reduce:
locations = locations.reduce(function(prev, location){
    // Append this document's deals to the accumulated array
    return prev.concat(location.deal_details)
}, [])
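On Node.js 11 or later the same client-side merge can be written with Array.prototype.flatMap, which concatenates each document's deal_details array into one flat array no matter how many documents matched. The function name is illustrative, and the sample data is an abbreviated copy of the output shown in the question.

```javascript
// Flatten the deal_details arrays of all matched documents into one array.
// Documents without deal_details contribute nothing.
function combineDeals(locations) {
  return locations.flatMap(location => location.deal_details || []);
}

const locations = [
  { deal_details: [{ deals_price: 12.12 }, { deals_price: 34.3 }] },
  { deal_details: [{ deals_price: 12.12 }, { deals_price: 34.78 }, { deals_price: 34.32 }] }
];
console.log(combineDeals(locations).length); // 5
```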
I'm having a lot of difficulty in solving this mongodb (mongoose) problem.
There is a schema 'Recommend' (username, roomId, ll and date), and its collection contains users' recommendations.
I need to get a list of the most recommended rooms (by roomId). Below is the schema and my attempted solution with a Mongoose query.
var recommendSchema = mongoose.Schema({
username: String,
roomId: String,
ll: { type: { type: String }, coordinates: [ ] },
date: Date
})
recommendSchema.index({ ll: '2dsphere' });
var Recommend = mongoose.model('Recommend', recommendSchema);
Recommend.aggregate(
{
$group:
{
_id: '$roomId',
recommendCount: { $sum: 1 }
}
},
function (err, res) {
if (err) return handleError(err);
var resultSet = res.sort({'recommendCount': 'desc'});
}
);
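The results returned from the aggregation pipeline are a plain array of plain objects, not a query, and Array.prototype.sort() expects a comparison function rather than a specification like { recommendCount: 'desc' }. So do the sorting as a pipeline stage, not as a separate operation: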
Recommend.aggregate(
[
// Grouping pipeline
{ "$group": {
"_id": '$roomId',
"recommendCount": { "$sum": 1 }
}},
// Sorting pipeline
{ "$sort": { "recommendCount": -1 } },
// Optionally limit results
{ "$limit": 5 }
],
function(err,result) {
// Result is an array of documents
}
);
So there are various pipeline operators, such as $group, $sort and $limit, among others. Most can be presented in any order and as many times as required (a few, such as $geoNear, must come first). The key thing to understand is that each "pipeline" stage passes its results on to the next stage to act on.
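To illustrate how each stage only sees the output of the previous one, here is a tiny in-memory model of the pipeline above written as plain JavaScript functions. It is a simplified imitation for illustration, not how the server executes the pipeline: a $group-style count per roomId, then a descending sort, then a limit. All names and the sample data are hypothetical.

```javascript
// Each "stage" takes an array of documents and returns a new array.
const groupCountBy = key => docs => {
  const counts = new Map();
  for (const doc of docs) counts.set(doc[key], (counts.get(doc[key]) || 0) + 1);
  return [...counts].map(([id, n]) => ({ _id: id, recommendCount: n }));
};
const sortDesc = field => docs => [...docs].sort((a, b) => b[field] - a[field]);
const limit = n => docs => docs.slice(0, n);

// Run the stages in order, feeding each stage's output into the next.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const recommends = [
  { username: 'ann', roomId: 'r1' },
  { username: 'bob', roomId: 'r2' },
  { username: 'cat', roomId: 'r2' },
  { username: 'dan', roomId: 'r3' },
  { username: 'eve', roomId: 'r2' },
  { username: 'fay', roomId: 'r3' }
];

console.log(runPipeline(recommends, [groupCountBy('roomId'), sortDesc('recommendCount'), limit(2)]));
```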
I want to fetch user_totaldocs and user_totalthings for all users and sum those values.
How is this possible? Here is the user schema:
var user_schema = mongoose.Schema({
local : {
...
...
user_id : String,
user_totaldocs : Number,
user_totalthings : Number
....
}
});
You can use the Aggregation Pipeline to add calculated fields to a result. There are some examples below using the mongo shell, but the syntax in Mongoose's Aggregate() helper is similar.
For example, to calculate sums (per user document) you can use the $add expression in a $project stage:
db.user.aggregate([
    // Limit to relevant documents and potentially take advantage of an index
    { $match: {
        user_id: "foo"
    }},
    { $project: {
        user_id: 1,
        total: { $add: ["$user_totaldocs", "$user_totalthings"] }
    }}
])
To calculate totals across multiple documents you need to use a $group stage with a $sum accumulator, for example:
db.user.aggregate([
    { $group: {
        _id: null,
        total: { $sum: { $add: ["$user_totaldocs", "$user_totalthings"] } },
        totaldocs: { $sum: "$user_totaldocs" },
        totalthings: { $sum: "$user_totalthings" }
    }}
])
You may want only the one total field; I've added in totaldocs and totalthings as examples of calculating multiple fields.
A group _id of null will combine values from all documents passed to the $group stage, but you can also use other criteria here (such as grouping by user_id).
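Because the schema in the question nests these fields under local, the field paths in the pipeline need that prefix. Here is a sketch of the per-user variant mentioned above, written as a pipeline array you could pass to Mongoose's Model.aggregate(). It assumes the fields are numeric, and the helper name is hypothetical.

```javascript
// Hypothetical helper: per-user totals for fields nested under `local`.
function userTotalsPipeline() {
  return [
    { $group: {
      _id: '$local.user_id',   // one result document per user_id
      totaldocs: { $sum: '$local.user_totaldocs' },
      totalthings: { $sum: '$local.user_totalthings' },
      // If either field is missing, $add yields null, which $sum ignores
      total: { $sum: { $add: ['$local.user_totaldocs', '$local.user_totalthings'] } }
    } }
  ];
}

// Usage, assuming a Mongoose model named User built from user_schema:
// const totals = await User.aggregate(userTotalsPipeline());
console.log(JSON.stringify(userTotalsPipeline()));
```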
You can use the aggregation framework provided by MongoDB. For your case:
If you want to fetch the sum of user_totaldocs and the sum of user_totalthings across the whole collection (that is, for all users), do:
db.user_schemas.aggregate(
[
    {
        $group : {
            _id : null, // $group requires an _id; null groups all documents together
            user_totaldocs: { $sum: "$user_totaldocs"}, // for your case use local.user_totaldocs
            user_totalthings: { $sum: "$user_totalthings" }, // for your case use local.user_totalthings
            count: { $sum: 1 } // for no. of documents count
        }
    }
])
To sum user_totaldocs and user_totalthings per user in a collection (assuming there are multiple documents for a user), which returns one sum for each user, do:
db.user_schemas.aggregate(
[
    {
        $group : {
            _id : "$user_id", // for your case use local.user_id
            user_totaldocs: { $sum: "$user_totaldocs"}, // for your case use local.user_totaldocs
            user_totalthings: { $sum: "$user_totalthings" }, // for your case use local.user_totalthings
            count: { $sum: 1 } // for no. of documents count
        }
    }
])
There is no need to provide an individual user id.
For more info read:
1. http://docs.mongodb.org/manual/reference/operator/aggregation/group/#pipe._S_group
2. http://docs.mongodb.org/manual/core/aggregation/
In my mongoose model I have an invoiceHeader and invoiceLine collection both relating to account object.
In a view I want to display the total invoiceAmount for the account.
In SQL something like select sum(amount) from invoiceHeader group by account.
How can I achieve something similar with Node.js and Mongoose?
Assuming amount is the property that you want to sum on, it would be like:
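InvoiceHeaderModel.aggregate([
    {
        $match: {
            // Mongoose does not cast values in aggregate pipelines: if account is
            // stored as an ObjectId, use new mongoose.Types.ObjectId('<Account_ID>')
            account: '<Account_ID>'
        }
    }, {
        $group: {
            _id: null,
            total: {
                $sum: "$amount"
            }
        }
    }, {
        $project: {
            _id: 0,
            total: 1
        }
    }
], function(err, res) {
    // res contains the result
});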
The $match operator is used to match certain documents. The $group operator is used to group the documents. The $project operator is used to select certain fields from the documents.
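The SQL statement in the question groups by account rather than filtering down to one, so the closer equivalent is to use the account field as the group _id. A minimal sketch, assuming the invoiceHeader documents carry account and amount fields as described; the helper name is hypothetical.

```javascript
// Equivalent of: SELECT account, SUM(amount) FROM invoiceHeader GROUP BY account
function totalsByAccountPipeline() {
  return [
    { $group: { _id: '$account', total: { $sum: '$amount' } } },
    // Rename _id back to account for readability
    { $project: { _id: 0, account: '$_id', total: 1 } }
  ];
}

// Usage: const rows = await InvoiceHeaderModel.aggregate(totalsByAccountPipeline());
console.log(JSON.stringify(totalsByAccountPipeline()));
```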