In the database I have a job document that I request with a GET route:
When I populate the candidates I get this response format:
My problem is that I don't need that "id" object; I just need a "candidates_selected" array with the users inside as objects. As it stands, each user is an object nested inside another object inside an array.
Here is the code from my controller (the populate is in the jobsService):
If I change the data format of the job like this:
...it works great (with path: "candidates_selected") as expected, BUT I no longer have that "status" string (which makes sense, since it's not in the database anymore: the array now only stores ObjectIds):
I would like a solution that gives me both, but maybe that's a limitation of NoSQL?
A solution without populate but with a loop (I don't think it's a good idea):
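Roughly, such a loop could look like the sketch below (illustration only: Job and User are hypothetical model names here, and the { id, status } shape of candidates_selected follows the aggregation answer underneath):

// One extra query per selected candidate, which is why I'm not keen on it.
const job = await Job.findById(jobId).lean();
job.candidates_selected = await Promise.all(
  job.candidates_selected.map(async ({ id, status }) => {
    const user = await User.findById(id).lean(); // fetch the candidate manually
    return { ...user, status };                  // merge the stored status back in
  })
);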
I think there is no convenient way to achieve this with populate. However, you may try the aggregation framework exposed through the native MongoDB driver.
Let your Mongoose models be ASchema (the jobs) and BSchema (the candidates); aggregate() is called on the model:
const result = await ASchema.aggregate([
{$addFields: {org_doc: '$$ROOT'}}, // save original document to retrieve later
{$unwind: '$candidates_selected'},
{
$lookup: {
from: BSchema.collection.name,
let: {
selected_id: '$candidates_selected.id',
status: '$candidates_selected.status',
},
pipeline: [
{
$match: {$expr: {$eq: ['$$selected_id', '$_id']}}, // find candidate by id
},
{
$addFields: {status: '$$status'} // attach status
}
],
as: 'found_candidate'
}
},
{
$group: { // regroup the very first $unwind stage
_id: '$_id',
org_doc: {$first: '$org_doc'},
found_candidates: {
$push: {$arrayElemAt: ['$found_candidate', 0]} // result of $lookup is an array, concat them to reform the original array
}
}
},
{
$addFields: {'org_doc.candidates_selected': '$found_candidates'} // attach found_candidates to the saved original document
},
{
$replaceRoot: {newRoot: '$org_doc'} // recover the original document
}
])
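Note that the result of aggregate() consists of plain JavaScript objects rather than Mongoose documents, so schema virtuals, getters and toJSON transforms won't run on it; if you need full documents you can hydrate the results afterwards.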
I have two collections: one for messages to be logged and one with a document for each member of the group. When a message is upvoted, I want to add an entry for the message and push each reaction on that message into an array.
Every time a reaction is added I want to update the member collection to reflect the sum of all the reaction.value fields.
This is the code I have written to do so. When it runs from a sandbox I made in Visual Studio using a MongoDB add-on it works fine; however, when run from my app, only the message document is added and, without any error, it appears to skip the aggregation into the member document.
Here is the code I am using to insert the documents to the database and to aggregate once the document is inserted:
await mongodb.collection("Messages").updateOne({ _id: reaction.message.id.toString()},
{
$set:{
authorid: reaction.message.author.id,
author: reaction.message.author.username
},
$push: {
reactions: {
reauth: reAuth,
reaction: reaction.emoji.name,
removed: false,
value: actualKarmaDB,
}
}
}, {safe: true, "upsert": true})
await mongodb.collection("Messages").aggregate([
{
$match: {
_id: reaction.message.id
}
},
{
$project: {
_id: "$authorid",
username: "$author",
messageKarma: {$sum: "$reactions.value"},
}
},
{ $merge: {
into: "Members",
on: "_id",
whenMatched: "replace",
whenNotMatched: "insert"
}
}])
Also here is a look at what the insertion into “Messages” looks like:
In this case the answer was that Mongoose did not support $merge in aggregations; I had to use $out instead.
Additionally, this is a known issue with Mongoose/Node.js, see here:
Mongodb node.js $out with aggregation only working if calling toArray()
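In other words, the aggregation is lazy: a pipeline ending in $out (or $merge) only executes once the cursor is actually consumed. A minimal sketch of the workaround, reusing the pipeline from the question with $out and forcing execution via toArray():

await mongodb.collection("Messages").aggregate([
  { $match: { _id: reaction.message.id } },
  {
    $project: {
      _id: "$authorid",
      username: "$author",
      messageKarma: { $sum: "$reactions.value" },
    },
  },
  { $out: "Members" }, // $out instead of $merge, per the answer above
]).toArray(); // without iterating the cursor, the pipeline never runs

(Keep in mind that $out replaces the whole target collection, unlike $merge, which upserts per document.)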
I have an endpoint that does an operation such as this:
const pipeline = [
{
$match: {
$and: [
{
$or: [...],
},
],
},
},
{
$group: {
_id : '$someProp',
anotherProp: { $push: '$$ROOT' },
},
},
{ $sort: { date: -1 } },
{ $limit: 10 },
]
const groupedDocs = await MyModel.aggregate(pipeline);
The idea here is that the returned documents look like this:
[
{
_id: 'some value',
anotherProp: [ /* ... array of documents where "someProp" === "some value" */ ],
},
{
_id: 'another value',
anotherProp: [ /* ... array of documents where "someProp" === "another value" */ ],
},
...
]
After getting these results, the endpoint responds with an array containing all the members of anotherProp, like this:
const response = groupedDocs.reduce((docs, group) => docs.concat(group.anotherProp), []);
res.status(200).json(response);
My problem is that the final documents in the response contain the _id field, but I want to rename that field to id. This question addresses this issue, and specifically this answer is what should work, but for some reason the transform function doesn't get invoked. To put it differently, I've tried doing this:
schema.set('toJSON', {
virtuals: true,
transform: function (doc, ret) {
console.log(`transforming toJSON for document ${doc._id}`);
delete ret._id;
},
});
schema.set('toObject', {
virtuals: true,
transform: function (doc, ret) {
console.log(`transforming toObject for document ${doc._id}`);
delete ret._id;
},
});
But the console.log statements are not executed, meaning that the transform function is not getting invoked. So I still get the _id in the response instead of id.
So my question is how can I get id instead of _id in this scenario?
It's worth mentioning that toJSON and toObject are invoked (the console.logs show up) in other places where I read properties from the documents. For example, if I do:
const doc = await MyModel.findById('someId');
const name = doc.name;
res.status(200).json(doc);
The response contains id instead of _id. It's almost like the transform function is invoked once I do anything with the documents, but if I pass the documents directly as they arrive from the database, neither toJSON nor toObject is invoked.
Thanks in advance for your insights. :)
The toJSON and toObject transforms won't work here because they don't apply to the results of an aggregation pipeline: Mongoose doesn't convert aggregation results into Mongoose documents, it returns the raw objects produced by the pipeline operation. I ultimately achieved this by adding pipeline stages to first add an id field with the same value as the _id field, then a second stage to remove the _id field. So essentially my pipeline became:
const pipeline = [
{
$match: {
$and: [
{
$or: [...],
},
],
},
},
// change the "_id" to "id"
{ $addFields: { id: '$_id' } },
{ $unset: ['_id'] },
{
$group: {
_id : '$someProp',
anotherProp: { $push: '$$ROOT' },
},
},
{ $sort: { date: -1 } },
{ $limit: 10 },
]
const groupedDocs = await MyModel.aggregate(pipeline);
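(Note that $unset as an aggregation stage requires MongoDB 4.2 or newer; on older servers an equivalent is { $project: { _id: 0 } }.)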
It is possible to recast the raw objects into Mongoose documents after getting them from aggregate(). You just need to transform them back one by one; they will then trigger toJSON when returned.
const document = Model.hydrate(rawObject);
Answer found here:
Cast plain object to mongoose document
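Applied to this question's endpoint, a minimal sketch (assuming the documents pushed into anotherProp via $$ROOT are complete MyModel documents) could be:

const groupedDocs = await MyModel.aggregate(pipeline);
// Flatten the groups, then rehydrate each plain object into a Mongoose document
// so the schema's toJSON transform (_id -> id) runs when res.json() serializes it.
const response = groupedDocs
  .reduce((docs, group) => docs.concat(group.anotherProp), [])
  .map((raw) => MyModel.hydrate(raw));
res.status(200).json(response);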
I'm using an aggregate query to retrieve data from multiple collections; however, there is a strange behavior that I don't seem to understand.
I need to look up across two collections, hence the $lookup inside the pipeline, and I also need to use the _id from the collection I'm aggregating on (campaignadgroups) to match against the second, nested collection (broadcastplans).
This is my query:
db.getCollection('campaignadgroups').aggregate([
{
$match: { "campaign_id": ObjectId("5fc8f7125148d7d0a19dcbcb")} // hardcoded just for tests
},
{
$lookup: {
from: "broadcastreports",
let: {campaignadgroupid: "$_id"},
pipeline: [
{
$match: {"reported_at": { $gte:ISODate("2020-12-01T15:56:58.743Z"), $lte: ISODate("2020-12-03T15:56:58.743Z")} }
},
{
$lookup: {
from: "broadcastplans",
localField: "broadcast_plan_id",
foreignField: "_id",
as: "broadcastplan"
}
},
{$unwind: "$broadcastplan"},
{
$match: { "broadcastplan.campaign_ad_group_id": {$eq: "$$campaignadgroupid"} // The problem happens here
}
}
],
as: "report"
}
},
])
The issue is that when matching with $$campaignadgroupid, the report array comes back empty.
However, if I replace the variable with a hardcoded id like ObjectId("5fc8f7275148d7d0a19dcbcc"), I get the documents that I expect.
For reference, I'm debugging this issue in Robo 3T so I can translate it to Mongoose later.
I already tried using $toObjectId, but the _ids are not strings; they are ObjectIds already.
Thank you very much
OK, this is why I love and hate to code. After three hours of debugging, right after asking here I discovered the issue... I just needed to change from
$match: { "broadcastplan.campaign_ad_group_id": { $eq: "$$campaignadgroupid" } }
to
$match: { $expr: { $eq: ["$broadcastplan.campaign_ad_group_id", "$$campaignadgroupid"] } }
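This is expected behavior: variables defined with let in a $lookup are only accessible through $expr. In a plain $match query, "$$campaignadgroupid" is treated as a literal string, so nothing matched.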
I have two collections: a 'users' collection and an 'events' collection. There is a reference field on the events collection (userId) which indicates which user the event belongs to.
I would like to count how many events a user has matching a certain condition.
Currently, I am performing this like:
db.users.find({ usersMatchingACondition }).forEach(user => {
const eventCount = db.events.find({
title: 'An event title that I want to find',
userId: user._id
}).count();
print(`This user has ${eventCount} events`);
});
Ideally what I would like returned is an array or object with the UserID and how many events that user has.
With 10,000 users - this is obviously producing 10,000 queries and I think it could be made a lot more efficient!
I presume this is easy with some kind of aggregate query - but I'm not familiar with the syntax and am struggling to wrap my head around it.
Any help would be greatly appreciated!
You need $lookup to get the data from events matched by userId. Then you can use $filter to apply your event-level condition, and to get a count you can use the $size operator:
db.users.aggregate([
{
$match: { /* users matching condition */ }
},
{
$lookup:
{
from: 'events',
localField: '_id', // the users collection's _id
foreignField: 'userId',
as: 'user_events'
}
},
{
$addFields: {
user_events: {
$filter: {
input: "$user_events",
cond: {
$eq: [
'$$this.title', 'An event title that I want to find'
]
}
}
}
}
},
{
$project: {
_id: 1,
// other fields you want to retrieve: 1,
totalEvents: { $size: "$user_events" }
}
}
])
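On MongoDB 3.6+ there is also a variant worth considering (a sketch only, assuming the same userId field and title condition): push the filtering and counting into the $lookup sub-pipeline, so that only a count crosses the join instead of the full event documents.

db.users.aggregate([
  { $match: { /* users matching condition */ } },
  {
    $lookup: {
      from: 'events',
      let: { userId: '$_id' },
      pipeline: [
        { $match: {
            title: 'An event title that I want to find',
            $expr: { $eq: ['$userId', '$$userId'] }
        } },
        { $count: 'total' } // collapse the matches to a single { total: n } document
      ],
      as: 'eventCounts'
    }
  },
  {
    $project: {
      _id: 1,
      totalEvents: { $ifNull: [ { $arrayElemAt: ['$eventCounts.total', 0] }, 0 ] }
    }
  }
])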
There isn't much optimization that can be done without the aggregation framework, but since you said you're not familiar with its syntax, here are two improvements to the loop approach.
First, instead of
const eventCount = db.events.find({
title: 'An event title that I want to find',
userId: user._id
}).count();
Do
const eventCount = db.events.count({
title: 'An event title that I want to find',
userId: user._id
});
This will greatly speed up your queries because the find query actually fetches the documents first and then does the counting.
Second, for returning an array you can just initialize an array at the start and push { userId, count } objects to it, as in the sketch below.
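A rough sketch of that shape, reusing the names from the question (shell-style syntax):

// Build the result array while looping: one count per matching user.
const results = [];
db.users.find({ /* usersMatchingACondition */ }).forEach(user => {
  const eventCount = db.events.count({
    title: 'An event title that I want to find',
    userId: user._id
  });
  results.push({ userId: user._id, count: eventCount });
});
printjson(results);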
I have a pretty simple $lookup aggregation query like the following:
{'$lookup':
{'from': 'edge',
'localField': 'gid',
'foreignField': 'to',
'as': 'from'}}
When I run this on a match with enough documents I get the following error:
Command failed with error 4568: 'Total size of documents in edge
matching { $match: { $and: [ { from: { $eq: "geneDatabase:hugo" }
}, {} ] } } exceeds maximum document size' on server
All attempts to limit the number of documents fail. allowDiskUse: true does nothing. Sending a cursor in does nothing. Adding a $limit to the aggregation also fails.
How could this be?
Even then, I see the error again. Where did that $match with $and and $eq come from? Is the aggregation pipeline behind the scenes farming out the $lookup call to another aggregation, one it runs on its own that I have no ability to provide limits for or use cursors with?
What is going on here?
As stated earlier in the comments, the error occurs because $lookup, which by default produces a target "array" within the parent document from the results of the foreign collection, selects documents whose total size causes that parent to exceed the 16MB BSON limit.
The counter to this is to process with an $unwind which immediately follows the $lookup pipeline stage. This actually alters the behavior of $lookup such that, instead of producing an array in the parent, the results become a "copy" of each parent for every document matched.
Pretty much just like regular usage of $unwind, except that instead of processing as a "separate" pipeline stage, the unwinding action is added to the $lookup pipeline operation itself. Ideally you also follow the $unwind with a $match condition, which likewise creates a matching argument to be added to the $lookup. You can actually see this in the explain output for the pipeline.
The topic is actually covered (briefly) in a section of Aggregation Pipeline Optimization in the core documentation:
$lookup + $unwind Coalescence
New in version 3.2.
When a $unwind immediately follows another $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.
Best demonstrated with a listing that puts the server under stress by creating "related" documents that would exceed the 16MB BSON limit. Done as briefly as possible to both break and work around the BSON Limit:
const MongoClient = require('mongodb').MongoClient;
const uri = 'mongodb://localhost/test';
function data(data) {
console.log(JSON.stringify(data, undefined, 2))
}
(async function() {
let db;
try {
db = await MongoClient.connect(uri);
console.log('Cleaning....');
// Clean data
await Promise.all(
["source","edge"].map(c => db.collection(c).remove() )
);
console.log('Inserting...')
await db.collection('edge').insertMany(
Array(1000).fill(1).map((e,i) => ({ _id: i+1, gid: 1 }))
);
await db.collection('source').insert({ _id: 1 })
console.log('Fattening up....');
await db.collection('edge').updateMany(
{},
{ $set: { data: "x".repeat(100000) } }
);
// The full pipeline. Failing test uses only the $lookup stage
let pipeline = [
{ $lookup: {
from: 'edge',
localField: '_id',
foreignField: 'gid',
as: 'results'
}},
{ $unwind: '$results' },
{ $match: { 'results._id': { $gte: 1, $lte: 5 } } },
{ $project: { 'results.data': 0 } },
{ $group: { _id: '$_id', results: { $push: '$results' } } }
];
// List and iterate each test case
let tests = [
'Failing.. Size exceeded...',
'Working.. Applied $unwind...',
'Explain output...'
];
for (let [idx, test] of Object.entries(tests)) {
console.log(test);
try {
let currpipe = (( +idx === 0 ) ? pipeline.slice(0,1) : pipeline),
options = (( +idx === tests.length-1 ) ? { explain: true } : {});
await new Promise((end,error) => {
let cursor = db.collection('source').aggregate(currpipe,options);
for ( let [key, value] of Object.entries({ error, end, data }) )
cursor.on(key,value);
});
} catch(e) {
console.error(e);
}
}
} catch(e) {
console.error(e);
} finally {
db.close();
}
})();
After inserting some initial data, the listing will attempt to run an aggregate merely consisting of $lookup which will fail with the following error:
{ MongoError: Total size of documents in edge matching pipeline { $match: { $and : [ { gid: { $eq: 1 } }, {} ] } } exceeds maximum document size
Which is basically telling you the BSON limit was exceeded on retrieval.
By contrast, the next attempt adds the $unwind and $match pipeline stages.
The Explain output:
{
"$lookup": {
"from": "edge",
"as": "results",
"localField": "_id",
"foreignField": "gid",
"unwinding": { // $unwind now is unwinding
"preserveNullAndEmptyArrays": false
},
"matching": { // $match now is matching
"$and": [ // and actually executed against
{ // the foreign collection
"_id": {
"$gte": 1
}
},
{
"_id": {
"$lte": 5
}
}
]
}
}
},
// $unwind and $match stages removed
{
"$project": {
"results": {
"data": false
}
}
},
{
"$group": {
"_id": "$_id",
"results": {
"$push": "$results"
}
}
}
And that result of course succeeds, because as the results are no longer being placed into the parent document then the BSON limit cannot be exceeded.
This really happens just as a result of adding the $unwind, but the $match is included in the example to show that it, too, is added into the $lookup stage, and that the overall effect is to "limit" the results returned in an effective way, since it is all done inside that $lookup operation and no results other than those matching are actually returned.
By constructing the pipeline in this way, you can query for "referenced data" that would otherwise exceed the BSON limit, and then, if you want, $group the results back into an array format once they have been effectively filtered by the "hidden query" that $lookup actually performs.
MongoDB 3.6 and Above - Additional for "LEFT JOIN"
As all the content above notes, the BSON limit is a "hard" limit that you cannot breach, and this is generally why the $unwind is necessary as an interim step. There is, however, the limitation that the "LEFT JOIN" becomes an "INNER JOIN" by virtue of the $unwind, since it cannot preserve the content. Also, even preserveNullAndEmptyArrays would negate the "coalescence" and still leave the array intact, causing the same BSON limit problem.
MongoDB 3.6 adds new syntax to $lookup that allows a "sub-pipeline" expression to be used in place of the "local" and "foreign" keys. So instead of using the "coalescence" option as demonstrated, as long as the produced array does not also breach the limit it is possible to put conditions in that pipeline which returns the array "intact", and possibly with no matches as would be indicative of a "LEFT JOIN".
The new expression would then be:
{ "$lookup": {
"from": "edge",
"let": { "gid": "$gid" },
"pipeline": [
{ "$match": {
"_id": { "$gte": 1, "$lte": 5 },
"$expr": { "$eq": [ "$$gid", "$to" ] }
}}
],
"as": "from"
}}
In fact this would be basically what MongoDB is doing "under the covers" with the previous syntax since 3.6 uses $expr "internally" in order to construct the statement. The difference of course is there is no "unwinding" option present in how the $lookup actually gets executed.
If no documents are actually produced as a result of the "pipeline" expression, then the target array within the master document will in fact be empty, just as a "LEFT JOIN" actually does and would be the normal behavior of $lookup without any other options.
However, the output array MUST NOT cause the document where it is being created to exceed the BSON limit. So it really is up to you to ensure that any content "matched" by the conditions stays under this limit, or the same error will persist, unless of course you actually use $unwind to effect the "INNER JOIN".
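One way to keep the joined array bounded is to do the trimming inside the sub-pipeline itself; a rough sketch along those lines (the cap of 5 and the projected-out data field are arbitrary choices for this example):

{ "$lookup": {
  "from": "edge",
  "let": { "gid": "$gid" },
  "pipeline": [
    { "$match": { "$expr": { "$eq": [ "$$gid", "$to" ] } } },
    { "$limit": 5 },               // cap how many joined documents come back
    { "$project": { "data": 0 } }  // and/or drop large fields before they land in the array
  ],
  "as": "from"
}}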
I had the same issue with the following Node.js query, because the 'redemptions' collection has more than 400,000 documents. I am using MongoDB server 4.2 and Node.js driver 3.5.3.
db.collection('businesses').aggregate([
  {
    $lookup: { from: 'redemptions', localField: "_id", foreignField: "business._id", as: "redemptions" }
  },
  {
    $project: {
      _id: 1,
      name: 1,
      email: 1,
      "totalredemptions": { $size: "$redemptions" }
    }
  }
])
I modified the query as below to make it super fast.
db.collection('businesses').aggregate([
  query, // stage(s) from my original call
  {
    $lookup:
    {
      from: 'redemptions',
      let: { "businessId": "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$business._id", "$$businessId"] } } },
        { $group: { _id: "$_id", totalCount: { $sum: 1 } } },
        { $project: { "_id": 0, "totalCount": 1 } }
      ],
      as: "redemptions"
    }
  },
  {
    $project: {
      _id: 1,
      name: 1,
      email: 1,
      "totalredemptions": { $size: "$redemptions" }
    }
  }
])
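The speedup comes from doing the filtering and grouping inside the $lookup sub-pipeline: only lightweight { totalCount } documents end up in the redemptions array, instead of hundreds of thousands of full redemption documents being joined into each business first.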