Can I put query inside lookup operator of mongoose? [duplicate] - node.js

I have a pretty simple $lookup aggregation query like the following:
{'$lookup':
{'from': 'edge',
'localField': 'gid',
'foreignField': 'to',
'as': 'from'}}
When I run this on a match with enough documents I get the following error:
Command failed with error 4568: 'Total size of documents in edge
matching { $match: { $and: [ { from: { $eq: "geneDatabase:hugo" }
}, {} ] } } exceeds maximum document size' on server
All attempts to limit the number of documents fail. allowDiskUse: true does nothing. Sending a cursor in does nothing. Adding in a $limit into the aggregation also fails.
How could this be?
Then I see the error again. Where did that $match and $and and $eq come from? Is the aggregation pipeline behind the scenes farming out the $lookup call to another aggregation, one it runs on its own that I have no ability to provide limits for or use cursors with??
What is going on here?

As stated earlier in comment, the error occurs because when performing the $lookup which by default produces a target "array" within the parent document from the results of the foreign collection, the total size of documents selected for that array causes the parent to exceed the 16MB BSON Limit.
The counter for this is to process with an $unwind which immediately follows the $lookup pipeline stage. This actually alters the behavior of $lookup in such that instead of producing an array in the parent, the results are instead a "copy" of each parent for every document matched.
Pretty much just like regular usage of $unwind, with the exception that instead of processing as a "separate" pipeline stage, the unwinding action is actually added to the $lookup pipeline operation itself. Ideally you also follow the $unwind with a $match condition, which also creates a matching argument to also be added to the $lookup. You can actually see this in the explain output for the pipeline.
The topic is actually covered (briefly) in a section of Aggregation Pipeline Optimization in the core documentation:
$lookup + $unwind Coalescence
New in version 3.2.
When a $unwind immediately follows another $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.
Best demonstrated with a listing that puts the server under stress by creating "related" documents that would exceed the 16MB BSON limit. Done as briefly as possible to both break and work around the BSON Limit:
const MongoClient = require('mongodb').MongoClient;
const uri = 'mongodb://localhost/test';
function data(data) {
console.log(JSON.stringify(data, undefined, 2))
}
(async function() {
let db;
try {
db = await MongoClient.connect(uri);
console.log('Cleaning....');
// Clean data
await Promise.all(
["source","edge"].map(c => db.collection(c).remove() )
);
console.log('Inserting...')
await db.collection('edge').insertMany(
Array(1000).fill(1).map((e,i) => ({ _id: i+1, gid: 1 }))
);
await db.collection('source').insert({ _id: 1 })
console.log('Fattening up....');
await db.collection('edge').updateMany(
{},
{ $set: { data: "x".repeat(100000) } }
);
// The full pipeline. Failing test uses only the $lookup stage
let pipeline = [
{ $lookup: {
from: 'edge',
localField: '_id',
foreignField: 'gid',
as: 'results'
}},
{ $unwind: '$results' },
{ $match: { 'results._id': { $gte: 1, $lte: 5 } } },
{ $project: { 'results.data': 0 } },
{ $group: { _id: '$_id', results: { $push: '$results' } } }
];
// List and iterate each test case
let tests = [
'Failing.. Size exceeded...',
'Working.. Applied $unwind...',
'Explain output...'
];
for (let [idx, test] of Object.entries(tests)) {
console.log(test);
try {
let currpipe = (( +idx === 0 ) ? pipeline.slice(0,1) : pipeline),
options = (( +idx === tests.length-1 ) ? { explain: true } : {});
await new Promise((end,error) => {
let cursor = db.collection('source').aggregate(currpipe,options);
for ( let [key, value] of Object.entries({ error, end, data }) )
cursor.on(key,value);
});
} catch(e) {
console.error(e);
}
}
} catch(e) {
console.error(e);
} finally {
db.close();
}
})();
After inserting some initial data, the listing will attempt to run an aggregate merely consisting of $lookup which will fail with the following error:
{ MongoError: Total size of documents in edge matching pipeline { $match: { $and : [ { gid: { $eq: 1 } }, {} ] } } exceeds maximum document size
Which is basically telling you the BSON limit was exceeded on retrieval.
By contrast the next attempt adds the $unwind and $match pipeline stages
The Explain output:
{
"$lookup": {
"from": "edge",
"as": "results",
"localField": "_id",
"foreignField": "gid",
"unwinding": { // $unwind now is unwinding
"preserveNullAndEmptyArrays": false
},
"matching": { // $match now is matching
"$and": [ // and actually executed against
{ // the foreign collection
"_id": {
"$gte": 1
}
},
{
"_id": {
"$lte": 5
}
}
]
}
}
},
// $unwind and $match stages removed
{
"$project": {
"results": {
"data": false
}
}
},
{
"$group": {
"_id": "$_id",
"results": {
"$push": "$results"
}
}
}
And that result of course succeeds, because as the results are no longer being placed into the parent document then the BSON limit cannot be exceeded.
This really just happens as a result of adding $unwind only, but the $match is added for example to show that this is also added into the $lookup stage and that the overall effect is to "limit" the results returned in an effective way, since it's all done in that $lookup operation and no other results other than those matching are actually returned.
By constructing in this way you can query for "referenced data" that would exceed the BSON limit and then if you want $group the results back into an array format, once they have been effectively filtered by the "hidden query" that is actually being performed by $lookup.
MongoDB 3.6 and Above - Additional for "LEFT JOIN"
As all the content above notes, the BSON Limit is a "hard" limit that you cannot breach and this is generally why the $unwind is necessary as an interim step. There is however the limitation that the "LEFT JOIN" becomes an "INNER JOIN" by virtue of the $unwind where it cannot preserve the content. Also even preserveNulAndEmptyArrays would negate the "coalescence" and still leave the intact array, causing the same BSON Limit problem.
MongoDB 3.6 adds new syntax to $lookup that allows a "sub-pipeline" expression to be used in place of the "local" and "foreign" keys. So instead of using the "coalescence" option as demonstrated, as long as the produced array does not also breach the limit it is possible to put conditions in that pipeline which returns the array "intact", and possibly with no matches as would be indicative of a "LEFT JOIN".
The new expression would then be:
{ "$lookup": {
"from": "edge",
"let": { "gid": "$gid" },
"pipeline": [
{ "$match": {
"_id": { "$gte": 1, "$lte": 5 },
"$expr": { "$eq": [ "$$gid", "$to" ] }
}}
],
"as": "from"
}}
In fact this would be basically what MongoDB is doing "under the covers" with the previous syntax since 3.6 uses $expr "internally" in order to construct the statement. The difference of course is there is no "unwinding" option present in how the $lookup actually gets executed.
If no documents are actually produced as a result of the "pipeline" expression, then the target array within the master document will in fact be empty, just as a "LEFT JOIN" actually does and would be the normal behavior of $lookup without any other options.
However the output array to MUST NOT cause the document where it is being created to exceed the BSON Limit. So it really is up to you to ensure that any "matching" content by the conditions stays under this limit or the same error will persist, unless of course you actually use $unwind to effect the "INNER JOIN".

I had same issue with fllowing Node.js query becuase 'redemptions' collection has more then 400,000 of data. I am using Mongo DB server 4.2 and Node JS driver 3.5.3.
db.collection('businesses').aggregate(
{
$lookup: { from: 'redemptions', localField: "_id", foreignField: "business._id", as: "redemptions" }
},
{
$project: {
_id: 1,
name: 1,
email: 1,
"totalredemptions" : {$size:"$redemptions"}
}
}
I have modified query as below to make it work super fast.
db.collection('businesses').aggregate(query,
{
$lookup:
{
from: 'redemptions',
let: { "businessId": "$_id" },
pipeline: [
{ $match: { $expr: { $eq: ["$business._id", "$$businessId"] } } },
{ $group: { _id: "$_id", totalCount: { $sum: 1 } } },
{ $project: { "_id": 0, "totalCount": 1 } }
],
as: "redemptions"
},
{
$project: {
_id: 1,
name: 1,
email: 1,
"totalredemptions" : {$size:"$redemptions"}
}
}
}

Related

MongoDB get total count aggregation pipeline with $search

I have to implement search using search indexes in MongoDB Atlas as well as normal browse feature. This includes filtering, match, sort, skip, limit (pagination). I have made an aggregation pipeline to achieve all this.
First I push the search query to my pipeline, then match, then sort, then skip and finally the limit query.
Here's how it goes:
query = [];
query.push({
$search: {
index: 'default'
text: {
query: searchQuery
path: { }
}
}
});
query.push({
$sort: sort,
});
query.push({
$match: {
type: match
},
query.push({
$skip: skip
});
query.push({
$limit: perPage
});
let documents = await collection.aggregate(query);
The results I get so far are correct. However, for pagination, I also want to get the total count of documents. The count must take the "match" and "searchQuery" (if any) take into account.
I have tried $facet but it gives error $_internalSearchMongotRemote is not allowed to be used within a $facet stage
So, I see a few challenges with this query.
The $sort stage may not be needed. All search queries are sorted by relevance score by default. If you need to sort on some other criterion, then it may be appropriate.
The $match stage is probably not needed. What most people are looking for when they are trying to match is a compound filter. As you can see from the docs, a filter behaves like a $match in ordinary MongoDB. From where I am, your query should probably look something like:
If you would like a fast count of the documents returned, you need to use the new count operator. It is available on 4.4.11 and 5.0.4 clusters. You can read about it here.
query = [];
query.push({
$search: {
index: 'default'
"compound": {
"filter": [{
"text": {
"query": match,
"path": type
}
}],
"must": [{
"text": {
"query": searchQuery,
"path": { }
}
}]
},
"count": {
"type": "total"
}
}
});
query.push({
$skip: skip
});
query.push({
$limit: perPage
});
/* let's you use the count, can obviously project more fields */
query.push({
$project: "$$SEARCH_META"
});
let documents = await collection.aggregate(query);
[1]: https://docs.atlas.mongodb.com/reference/atlas-search/compound/#mongodb-data-filter
[2]: https://docs.atlas.mongodb.com/reference/atlas-search/counting/#std-label-count-ref

Use $lookup with a Conditional Join

provided I have following documents
User
{
uuid: string,
isActive: boolean,
lastLogin: datetime,
createdOn: datetime
}
Projects
{
id: string,
users: [
{
uuid: string,
otherInfo: ...
},
{... more users}
]
}
And I want to select all users that didn't login since 2 weeks and are inactive or since 5 weeks that don't have projects.
Now, the 2 weeks is working fine but I cannot seem to figure out how to do the "5 weeks and don't have projects" part
I came up with something like below but the last part does not work because $exists obviously is not a top level operator.
Anyone ever did anything like this?
Thanks!
return await this.collection
.aggregate([
{
$match: {
$and: [
{
$expr: {
$allElementsTrue: {
$map: {
input: [`$lastLogin`, `$createdOn`],
in: { $lt: [`$$this`, twoWeeksAgo] }
}
}
}
},
{
$or: [
{
isActive: false
},
{
$and: [
{
$expr: {
$allElementsTrue: {
$map: {
input: [`$lastLogin`, `$createdOn`],
in: { $lt: [`$$this`, fiveWeeksAgo] }
}
}
}
},
{
//No projects exists on this user
$exists: {
$lookup: {
from: _.get(Config, `env.collection.projects`),
let: {
currentUser: `$$ROOT`
},
pipeline: [
{
$project: {
_id: 0,
users: {
$filter: {
input: `$users`,
as: `user`,
cond: {
$eq: [`$$user.uuid`, `$currentUser.uuid`]
}
}
}
}
}
]
}
}
}
]
}
]
}
]
}
}
])
.toArray();
Not certain why you thought $expr was needed in the initial $match, but really:
const getResults = () => {
const now = Date.now();
const twoWeeksAgo = new Date(now - (1000 * 60 * 60 * 24 * 7 * 2 ));
const fiveWeeksAgo = new Date(now - (1000 * 60 * 60 * 24 * 7 * 5 ));
// as long a mongoDriverCollectionReference points to a "Collection" object
// for the "users" collection
return mongoDriverCollectionReference.aggregate([
// No $expr, since you can actually use an index. $expr cannot do that
{ "$match": {
"$or": [
// Active and "logged in"/created in the last 2 weeks
{
"isActive": true,
"$or": [
{ "lastLogin": { "$gte": twoWeeksAgo } },
{ "createdOn": { "$gte": twoWeeksAgo } }
]
},
// Also want those who...
// Not Active and "logged in"/created in the last 5 weeks
// we'll "tag" them later
{
"isActive": false,
"$or": [
{ "lastLogin": { "$gte": fiveWeeksAgo } },
{ "createdOn": { "$gte": fiveWeeksAgo } }
]
}
]
}},
// Now we do the "conditional" stuff, just to return a matching result or not
{ "$lookup": {
"from": _.get(Config, `env.collection.projects`), // there are a lot cleaner ways to register models than this
"let": {
"uuid": {
"$cond": {
"if": "$isActive", // this is boolean afterall
"then": null, // don't really want to match
"else": "$uuid" // Okay to match the 5 week results
}
}
},
"pipeline": [
// Nothing complex here as null will return nothing. Just do $in for the array
{ "$match": { "$in": [ "$$uuid", "$users.uuid" ] } },
// Don't really need the detail, so just reduce any matches to one result of [null]
{ "$group": { "_id": null } }
],
"as": "projects"
}},
// Now test if the $lookup returned something where it mattered
{ "$match": {
"$or": [
{ "active": true }, // remember we selected the active ones already
{
"projects.0": { "$exists": false } // So now we only need to know the "inactive" returned no array result.
}
]
}}
]).toArray(); // returns a Promise
};
It's pretty simple as calculated expressions via $expr are actually really bad and not what you want in a first pipeline stage. Also "not what you need" since createdOn and lastLogin really should not have been merged into an array for $allElementsTrue which would just be an AND condition, where you described logic would really mean OR. So the $or does just fine here.
So does the $or on the separation of conditions for the isActive of true/false. Again it's either "two weeks" OR "five weeks". And this certainly does not need $expr since standard inequality range matching works fine, and uses an "index".
Then you really just want to do the "conditional" things in the let for $lookup instead of your "does it exist" thinking. All you really need to know ( since the range selection of dates is actually already done ) is whether active is now true or false. Where it's active ( meaning by your logic you don't care about projects ) simply make the $$uuid used within the $match pipeline stage a null value so it will not match and the $lookup returns an empty array. Where false ( also already matching the date conditions from earlier ) then you use the actual value and "join" ( where there are projects of course ).
Then it's just a simple matter of keeping the active users, and then only testing the remaining false values for active to see if the "projects" array from the $lookup actually returned anything. If it did not, then they just don't have projects.
Probably should note here is since users is an "array" within the projects collection, you use $in for the $match condition against the single value to the array.
Note that for brevity we can use $group inside the inner pipeline to only return one result instead of possibly many matches to actual matched projects. You don't care about the content or the "count", but simply if one was returned or nothing. Again following the presented logic.
This gets you your desired results, and it does so in a manner that is efficient and actually uses indexes where available.
Also return await certainly does not do what you think it does, and in fact it's an ESLint warning message ( I suggest you enable ESLint in your project ) since it's not a smart thing to do. It does nothing really, as you would need to await getResults() ( as per the example naming ) anyway, as the await keyword is not "magic" but just a prettier way of writing then(). As well as hopefully being easier to understand, once you understand what async/await is really for syntactically that is.

MongoDB request: Filter only specific field from embedded documents in embedded array

I have got some troubles with a MongoDB request that I would like to execute. I am using MongoDB 3.2 with Mongoose in a node.js context. Here is the document:
{
_id: ObjectId('12345'),
name: "The name",
components: [
{
name: "Component 1",
description: "The description",
container: {
parameters: [
{
name: "Parameter 1",
value: "1"
},
// other parameters
],
// other information...
}
},
// other components
]
}
And I would like to list all the parameters name for a specific component (using component name) in the specific document (using _id) with this output:
['Parameter 1', 'Parameter 2', ...]
I have got a mongoose Schema to handle the find or distinct methods in my application. I tried many operations using $ positioning operator. Here is my attempt but return all parameters form all components:
listParameters(req, res) {
DocumentModel.distinct(
"components.container.parameters.name",
{_id: req.params.modelId, "components.name": req.params.compId},
(err, doc) => {
if (err) {
return res.status(500).send(err);
}
res.status(200).json(doc);
}
);
}
but the output is the list of parameter name but without the filter of the specific component. Can you help me to find the right request? (if possible in mongoose JS but if it is a Mongo command line, it will be very good :))
You would need to run an aggregation pipeline that uses the $arrayElemAt and $filter operators to get the desired result.
The $filter operator will filter the components array to return the element satisfying the given condition whilst the $arrayElemAt returns the document from the array at a given index position. With that document you can then project the nested parameters array elements to another array.
Combining the above you ideally want to have the following aggregate operation:
DocumentModel.aggregate([
{ "$match": { "_id": mongoose.Types.ObjectId(req.params.modelId) } },
{ "$project": {
"component": {
"$arrayElemAt": [
{
"$filter": {
"input": "$components",
"as": "el",
"cond": { "$eq": ["$$el.name", req.params.compId] }
}
}, 0
]
}
} },
{ "$project": {
"parameters": "$component.container.parameters.name"
} }
], (err, doc) => {
if (err) {
return res.status(500).send(err);
}
const result = doc.length >= 1 ? doc[0].parameters : [];
res.status(200).json(result);
})
i don`t know how to do it with mongoose but this will work for you with mongo
db.getCollection('your collaction').aggregate([ // change to youe collaction
{$match: {_id: ObjectId("5a97ff4cf832104b76d29af7")}}, //change to you id
{$unwind: '$components'},
{$match: {'components.name': 'Component 1'}}, // change to the name you want
{$unwind: '$components.container.parameters'},
{
$group: {
_id: '$_id',
params: {$push: '$components.container.parameters.name'}
}
}
]);

how to combine array of object result in mongodb

how can i combine match document's subdocument together as one and return it as an array of object ? i have tried $group but don't seem to work.
my query ( this return array of object in this case there are two )
User.find({
'business_details.business_location': {
$near: coords,
$maxDistance: maxDistance
},
'deal_details.deals_expired_date': {
$gte: new Date()
}
}, {
'deal_details': 1
}).limit(limit).exec(function(err, locations) {
if (err) {
return res.status(500).json(err)
}
console.log(locations)
the console.log(locations) result
// give me the result below
[{
_id: 55 c0b8c62fd875a93c8ff7ea, // first document
deal_details: [{
deals_location: '101.6833,3.1333',
deals_price: 12.12 // 1st deal
}, {
deals_location: '101.6833,3.1333',
deals_price: 34.3 // 2nd deal
}],
business_details: {}
}, {
_id: 55 a79898e0268bc40e62cd3a, // second document
deal_details: [{
deals_location: '101.6833,3.1333',
deals_price: 12.12 // 3rd deal
}, {
deals_location: '101.6833,3.1333',
deals_price: 34.78 // 4th deal
}, {
deals_location: '101.6833,3.1333',
deals_price: 34.32 // 5th deal
}],
business_details: {}
}]
what i wanted to do is to combine these both deal_details field together and return it as an array of object. It will contain 5 deals in one array of object instead of two separated array of objects.
i have try to do it in my backend (nodejs) by using concat or push, however when there's more than 2 match document i'm having problem to concat them together, is there any way to combine all match documents and return it as one ? like what i mentioned above ?
What you are probably missing here is the $unwind pipeline stage, which is what you typically use to "de-normalize" array content, particularly when your grouping operation intends to work across documents in your query result:
User.aggregate(
[
// Your basic query conditions
{ "$match": {
"business_details.business_location": {
"$near": coords,
"$maxDistance": maxDistance
},
"deal_details.deals_expired_date": {
"$gte": new Date()
}},
// Limit query results here
{ "$limit": limit },
// Unwind the array
{ "$unwind": "$deal_details" },
// Group on the common location
{ "$group": {
"_id": "$deal_details.deals_location",
"prices": {
"$push": "$deal_details.deals_price"
}
}}
],
function(err,results) {
if (err) throw err;
console.log(JSON.stringify(results,undefined,2));
}
);
Which gives output like:
{
"_id": "101.6833,3.1333",
"prices": [
12.12,
34.3,
12.12,
34.78,
34.32
]
}
Depending on how many documents actually match the grouping.
Alternately, you might want to look at the $geoNear pipeline stage, which gives a bit more control, especially when dealing with content in arrays.
Also beware that with "location" data in an array, only the "nearest" result is being considered here and not "all" of the array content. So other items in the array may not be actually "near" the queried point. That is more of a design consideration though as any query operation you do will need to consider this.
You can merge them with reduce:
locations = locations.reduce(function(prev, location){
previous = prev.concat(location.deal_details)
return previous
},[])

MongoDB aggregate query with a where in node.js

I have the following mongodb query in node.js which gives me a list of unique zip codes with a count of how many times the zip code appears in the database.
collection.aggregate( [
{
$group: {
_id: "$Location.Zip",
count: { $sum: 1 }
}
},
{ $sort: { _id: 1 } },
{ $match: { count: { $gt: 1 } } }
], function ( lookupErr, lookupData ) {
if (lookupErr) {
res.send(lookupErr);
return;
}
res.send(lookupData.sort());
});
});
How can this query be modified to return one specific zip code? I've tried the condition clause but have not been able to get it to work.
Aggregations that require filtered results can be done with the $match operator. Without tweaking what you already have, I would suggest just sticking in a $match for the zip code you want returned at the top of the aggregation list.
collection.aggregate( [
{
$match: {
zip: 47421
}
},
{
$group: {
...
This example will result in every aggregation operation after the $match working on only the data set that is returned by the $match of the zip key to the value 47421.
in the $match pipeline operator add
{ $match: { count: { $gt: 1 },
_id : "10002" //replace 10002 with the zip code you want
}}
As a side note, you should put the $match operator first and in general as high in the aggregation chain as you can.

Resources