MongoDB group by aggregation query - node.js

The data I have is:
[
{ type: 'software' },
{ type: 'hardware' },
{ type: 'software' },
{ type: 'network' },
{ type: 'test' },
...
]
I want to create a MongoDB group by aggregation pipeline to return the data like this:
I only want 3 objects in result
the third object in result {_id: 'other', count: 2}, This should be the sum of counts of type other that software and hardware
[
{_id: 'software', count: 2},
{_id: 'hardware', count: 1},
{_id: 'other', count: 2},
]

This is the exact query (MongoPlayground) that you need if those data are separate documents. Just add $project stage before group and then $switch operator. (If those field data are number, you might wanna check $bucket
db.collection.aggregate([
{
"$project": {
type: {
"$switch": {
"branches": [
{
"case": {
"$eq": [
"$type",
"software"
]
},
"then": "software"
},
{
"case": {
"$eq": [
"$type",
"hardware"
]
},
"then": "hardware"
}
],
default: "other"
}
}
}
},
{
"$group": {
"_id": "$type",
"count": {
"$sum": 1
}
}
}
])
Also, I'd like to recommend avoiding field name type. Actually it doesn't reserve in MongoDB, but however it could bring conflicts with some drivers since, in schema/model files, type fields are referred to the exact BSON type of the field.

Related

how to calculate total for each enum of a field in aggregate?

hello I have this function where I want to calculate the number of orders for each status in one array, the code is
let statusEnum = ["pending", "canceled", "completed"];
let userOrders = await Orders.aggregate([
{
$match: {
$or: [
{ senderId: new mongoose.Types.ObjectId(req.user._id) },
{ driverId: new mongoose.Types.ObjectId(req.user._id) },
{ reciverId: new mongoose.Types.ObjectId(req.user._id) },
],
},
},
{
$group: {
_id: null,
totalOrders: { $sum: 1 },
totalPendingOrders: "??", //I want to determine this for each order status
totalCompletedOrders: "??",
totalCanceledOrders: "??",
},
},
]);
so I could add add a $match and use {status : "pending"} but this will filter only the pending orders, I could also map the status enum and replace each element instead of the "pending" above and then push each iteration in another array , but that just seems so messy, is there any other way to calculate total for each order status with using only one aggregate?
thanks
You can use group as you used, but with condition
db.collection.aggregate([
{
$group: {
_id: null,
totalPendingOrders: {
$sum: { $cond: [ { $eq: [ "$status", "pending" ] }, 1, 0 ] }
},
totalCompletedOrders: {
$sum: { $cond: [ { $eq: [ "$status", "completed" ] }, 1, 0 ] }
},
totalCanceledOrders: {
$sum: { $cond: [ { $eq: [ "$status", "canceled" ] }, 1, 0 ] }
}
}
}
])
Working Mongo playground

How to add foreign fields to objects inside an array based on a value of an object?

I have the following collection (sectors):
[
{
sector: "IT",
organizations: [
{
org: "ACME",
owners: [
"Josh",
"Fred"
]
}
]
}
]
I also have another collection (owners):
[
{
name: "Josh",
age: 65,
male: true,
location: "LA"
}
]
I want the aggregation query to do the following:
For each sector document, go though each organization.
Find an owner document corresponding to index 0 of the owners array.
Add the { name, age, male } fields to the organization.
I want to get this result:
[
{
sector: "IT",
organizations: [
{
org: "ACME",
owners: [
"Josh",
"Fred"
],
name: "Josh",
age: 65,
male: true
}
]
}
]
I am writing this in Node.js. This is my current code:
await Sector.aggregate([
// Perhaps something with $lookup?
{ $match: query },
{ $skip: skip },
{ $limit: limit }
]);
I am totally new to aggregation with MongoDB. Can anyone tell me how it's done?
Thanks in advance.
You can use aggregation
$addFields and $arrayElementAt helps to get the first element by looking with $map
$unwind to deconstruct the array
$lookup to join collections
$group to reconstruct the array
Here is the code
db.collection1.aggregate([
{
$addFields: {
organizations: {
$map: {
input: "$organizations",
in: {
firstName: { "$arrayElemAt": [ "$$this.owners", 0 ] },
org: "$$this.org",
owners: "$$this.owners"
}
}
}
}
},
{ $unwind: "$organizations},
{
"$lookup": {
"from": "collection2",
"localField": "organizations.firstName",
"foreignField": "name",
"as": "join"
}
},
{ $addFields: { join: { "$arrayElemAt": [ "$join", 0 } } },
{
$addFields: {
"organizations.age": "$join.age",
"organizations.location": "$join.location",
"organizations.male": "$join.male",
"join": "$$REMOVE"
}
},
{
"$group": {
"_id": "$_id",
"organizations": { "$push": "$organizations" },
"sector": { $first: "$sector" }
}
}
])
Working Mongo playground

MongoDB Mongoose aggregate query deeply nested array remove empty results and populate references

This question is a follow up to a previous question for which I have accepted an answer already. I have an aggregate query that returns the results of a deeply nested array of subdocuments based on a date range. The query returns the correct results within the specified date range, however it also returns an empty array for the results that do not match the query.
Technologies: MongoDB 3.6, Mongoose 5.5, NodeJS 12
Question 1:
Is there any way to remove the results that don't match the query?
Question 2:
Is there any way to 'populate' the Person db reference in the results? For example to get the Person Display Name I usually use 'populate' such as find().populate({ path: 'Person', select: 'DisplayName'})
Records schema
let RecordsSchema = new Schema({
RecordID: {
type: Number,
index: true
},
RecordType: {
type: String
},
Status: {
type: String
},
// ItemReport array of subdocuments
ItemReport: [ItemReportSchema],
}, {
collection: 'records',
selectPopulatedPaths: false
});
let ItemReportSchema = new Schema({
// ObjectId reference
ReportBy: {
type: Schema.Types.ObjectId,
ref: 'people'
},
ReportDate: {
type: Date,
required: true
},
WorkDoneBy: [{
Person: {
type: Schema.Types.ObjectId,
ref: 'people'
},
CompletedHours: {
type: Number,
required: true
},
DateCompleted: {
type: Date
}
}],
});
Query
Works but also returns empty results and also need to populate the Display Name property of the Person db reference
db.records.aggregate([
{
"$project": {
"ItemReport": {
$map: {
input: "$ItemReport",
as: "ir",
in: {
WorkDoneBy: {
$filter: {
input: "$$ir.WorkDoneBy",
as: "value",
cond: {
"$and": [
{ "$ne": [ "$$value.DateCompleted", null ] },
{ "$gt": [ "$$value.DateCompleted", new Date("2017-01-01T12:00:00.000Z") ] },
{ "$lt": [ "$$value.DateCompleted", new Date("2018-12-31T12:00:00.000Z") ] }
]
}
}
}
}
}
}
}
}
])
Actual Results
{
"_id": "5dcb6406e63830b7aa5427ca",
"ItemReport": [
{
"WorkDoneBy": [
{
"_id": "5dcb6406e63830b7aa53d8ea",
"PersonID": 111,
"ReportID": 8855,
"CompletedHours": 3,
"DateCompleted": "2017-01-20T05:00:00.000Z",
"Person": "5dcb6409e63830b7aa54fdba"
}
]
}
]
},
{
"_id": "5dcb6406e63830b7aa5427f1",
"ItemReport": [
{
"WorkDoneBy": [
{
"_id": "5dcb6406e63830b7aa53dcdc",
"PersonID": 4,
"ReportID": 9673,
"CompletedHours": 17,
"DateCompleted": "2017-05-18T04:00:00.000Z",
"Person": "5dcb6409e63830b7aa54fd69"
},
{
"_id": "5dcb6406e63830b7aa53dcdd",
"PersonID": 320,
"ReportID": 9673,
"CompletedHours": 3,
"DateCompleted": "2017-05-18T04:00:00.000Z",
"Person": "5dcb6409e63830b7aa54fe88"
}
]
}
]
},
{
"_id": "5dcb6406e63830b7aa5427f2",
"ItemReport": [
{
"WorkDoneBy": []
}
]
},
{
"_id": "5dcb6406e63830b7aa5427f3",
"ItemReport": [
{
"WorkDoneBy": []
}
]
},
{
"_id": "5dcb6406e63830b7aa5427f4",
"ItemReport": [
{
"WorkDoneBy": []
}
]
},
{
"_id": "5dcb6406e63830b7aa5427f5",
"ItemReport": [
{
"WorkDoneBy": []
}
]
},
Desired results
Note the results with an empty "WorkDoneBy" array are removed (question 1), and the "Person" display name is populated (question 2).
{
"_id": "5dcb6406e63830b7aa5427f1",
"ItemReport": [
{
"WorkDoneBy": [
{
"_id": "5dcb6406e63830b7aa53dcdc",
"CompletedHours": 17,
"DateCompleted": "2017-05-18T04:00:00.000Z",
"Person": {
_id: "5dcb6409e63830b7aa54fe88",
DisplayName: "Joe Jones"
}
},
{
"_id": "5dcb6406e63830b7aa53dcdd",
"CompletedHours": 3,
"DateCompleted": "2017-05-18T04:00:00.000Z",
"Person": {
_id: "5dcb6409e63830b7aa54fe88",
DisplayName: "Alice Smith"
}
}
]
}
]
},
First question is relatively easy to answer and there are multiple ways to do that. I would prefer using $anyElementTrue along with $map as those operators are pretty self-explanatory.
{
"$match": {
$expr: { $anyElementTrue: { $map: { input: "$ItemReport", in: { $gt: [ { $size: "$$this.WorkDoneBy" }, 0 ] } } } }
}
}
MongoPlayground
Second part is a bit more complicated but still possible. Instead of populate you need to run $lookup to bring the data from other collection. The problem is that your Person values are deeply nested so you need to prepare a list of id values before using $reduce and $setUnion. Once you get the data you need to merge your nested objects with people entities using $map and $mergeObjects.
{
$addFields: {
people: {
$reduce: {
input: "$ItemReport",
initialValue: [],
in: { $setUnion: [ "$$value", "$$this.WorkDoneBy.Person" ] }
}
}
}
},
{
$lookup: {
from: "people",
localField: "peopleIds",
foreignField: "_id",
as: "people"
}
},
{
$project: {
_id: 1,
ItemReport: {
$map: {
input: "$ItemReport",
as: "ir",
in: {
WorkDoneBy: {
$map: {
input: "$$ir.WorkDoneBy",
as: "wdb",
in: {
$mergeObjects: [
"$$wdb",
{
Person: { $arrayElemAt: [{ $filter: { input: "$people", cond: { $eq: [ "$$this._id", "$$wdb.Person" ] } } } , 0] }
}
]
}
}
}
}
}
}
}
}
Complete Solution

Query by data already in the object

I'm writing a query that gets data from "coll2" based on data that is inside "coll1".
Coll1 has the following data structure:
{
"_id": "asdf",
"name": "John",
"bags": [
{
"type": "typ1",
"size": "siz1"
},
{
"type": "typ2",
"size": "siz2"
}
]
}
Coll2 has the following data structure:
{
_id: "qwer",
coll1Name: "John",
types: ["typ1", "typ3"],
sizes: ["siz1", "siz4"]
}
{
_id: "zxcv",
coll1Name: "John",
types: ["typ2", "typ3"],
sizes: ["siz1", "siz2"]
}
{
_id: "fghj",
coll1Name: "John",
types: ["typ2", "typ3"],
sizes: ["siz1", "siz4"]
}
I want to get all the documents in coll2 that have the same Type+Size combo as in coll1 using the $lookup stage of the aggregation pipeline. I understand that this can be achieved by using the $lookup pipeline and $expr but I cant seem to figure out how to dynamically make a query to pass into the $match stage.
The output I would like to get for the above data would be:
{
_id: "qwer",
coll1Name: "John",
types: ["typ1", "typ3"],
sizes: ["siz1", "siz4"]
}
{
_id: "zxcv",
coll1Name: "John",
types: ["typ2", "typ3"],
sizes: ["siz1", "siz2"]
}
You can use $lookup to get the data from Col2. Then you need to check if there's any element in Col2 ($anyElemenTrue) that matches with Col1. $map and $in can be used here. Then you just need to $unwind and promote Col2 to root level using $replaceRoot
db.Col1.aggregate([
{
$lookup: {
from: "Col2",
localField: "name",
foreignField: "coll1Name",
as: "Col2"
}
},
{
$project: {
Col2: {
$filter: {
input: "$Col2",
as: "c2",
cond: {
$anyElementTrue: {
$map: {
input: "$bags",
as: "b",
in: {
$and: [
{ $in: [ "$$b.type", "$$c2.types" ] },
{ $in: [ "$$b.size", "$$c2.sizes" ] },
]
}
}
}
}
}
}
}
},
{
$unwind: "$Col2"
},
{
$replaceRoot: {
newRoot: "$Col2"
}
}
])
You are correct in your approach to use $lookup with the pipeline field to filter the input documents in the $match pipeline
The $expr expression should typically follow
"$expr": {
"$and": [
{ "$eq": [ "$name", "$$coll1_name" ] },
{ "$setEquals": [ "$bags.type", "$$types" ] },
{ "$setEquals": [ "$bags.size", "$$sizes" ] }
]
}
where the first match expression in the $and conditional { "$eq": [ "$name", "$$coll1_name" ] } checks to see if the name field in coll1 collection matches the coll1Name field in the input documents from coll2.
Of course the fields from coll2 should be defined in a variable in the pipeline with the let field for the $lookup pipeline to access them.
The other match filters are basically checking if the arrays are equal where "$bags.type" from coll1 resolves to an array of types i.e. [ "typ1", "typ3" ] for example.
On getting the output field from $lookup which happens to be an array, you can filter the documents in coll2 on that array field where there can be some empty lists as a resul of the above $lookup pipeline $match filter:
{ "$match": { "coll1Data.0": { "$exists": true } } }
Overall your aggregate pipeline operation would be as follows:
db.getCollection('coll2').aggregate([
{ "$lookup" : {
"from": "coll1",
"let": { "coll1_name": "$coll1Name", "types": "$types", "sizes": "$sizes" },
"pipeline": [
{ "$match": {
"$expr": {
"$and": [
{ "$eq": [ "$name", "$$coll1_name" ] },
{ "$setEquals": [ "$bags.type", "$$types" ] },
{ "$setEquals": [ "$bags.size", "$$sizes" ] }
]
}
} }
],
"as": "coll1Data"
} },
{ "$match": { "coll1Data.0": { "$exists": true } } },
{ "$project": { "coll1Data": 0 } }
])

MongoDB: Concatenate Multiple Arrays

I have 3 arrays of ObjectIds I want to concatenate into a single array, and then sort by creation date. $setUnion does precisely what I want, but I'd like to try without using it.
Schema of object I want to sort:
var chirpSchema = new mongoose.Schema({
interactions: {
_liked : ["55035390d3e910505be02ce2"] // [{ type: $oid, ref: "interaction" }]
, _shared : ["507f191e810c19729de860ea", "507f191e810c19729de860ea"] // [{ type: $oid, ref: "interaction" }]
, _viewed : ["507f1f77bcf86cd799439011"] // [{ type: $oid, ref: "interaction" }]
}
});
Desired result: Concatenate _liked, _shared, and _viewed into a single array, and then sort them by creation date using aggregate pipeline. See below
["507f1f77bcf86cd799439011", "507f191e810c19729de860ea", "507f191e810c19729de860ea", "55035390d3e910505be02ce2"]
I know I'm suppose to use $push, $each, $group, and $unwind in some combination or other, but I'm having trouble piecing together the documenation to make this happen.
Update: Query
model_user.aggregate([
{ $match : { '_id' : { $in : following } } }
, { $project : { 'interactions' : 1 } }
, { $project : {
"combined": { $setUnion : [
"$interactions._liked"
, "$interactions._shared"
, "$interactions._viewed"
]}
}}
])
.exec(function (err, data) {
if (err) return next(err);
next(data); // Combined is returning null
})
If all the Object _id values are "unique" then $setUnion is your best option. It is of course not "ordered" in any way as it works with a "set", and that does not guarantee order. But you can always unwind and $sort.
[
{ "$project": {
"combined": { "$setUnion": [
{ "$ifNull": [ "$interactions._liked", [] ] },
{ "$ifNull": [ "$interactions._shared", [] ] },
{ "$ifNull", [ "$interactions._viewed", [] ] }
]}
}},
{ "$unwind": "$combined" },
{ "$sort": { "combined": 1 } },
{ "$group": {
"_id": "$_id",
"combined": { "$push": "$combined" }
}}
]
Of course again since this is a "set" of distinct values you can do the old way instead with $addToSet, after processing $unwind on each array:
[
{ "$unwind": "$interactions._liked" },
{ "$unwind": "$interactions._shared" },
{ "$unwind": "$interactions._viewed" },
{ "$project": {
"interactions": 1,
"type": { "$const": [ "liked", "shared", "viewed" ] }
}}
{ "$unwind": "$type" },
{ "$group": {
"_id": "$_id",
"combined": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "liked" ] },
"$interactions._liked",
{ "$cond": [
{ "$eq": [ "$type", "shared" ] },
"$interactions._shared",
"$interactions._viewed"
]}
]
}
}
}},
{ "$unwind": "$combined" },
{ "$sort": { "combined": 1 } },
{ "$group": {
"_id": "$_id",
"combined": { "$push": "$combined" }
}}
]
But still the same thing applies to ordering.
Future releases even have the ability to concatenate arrays without reducing to a "set":
[
{ "$project": {
"combined": { "$concatArrays": [
"$interactions._liked",
"$interactions._shared",
"$interactions._viewed"
]}
}},
{ "$unwind": "$combined" },
{ "$sort": { "combined": 1 } },
{ "$group": {
"_id": "$_id",
"combined": { "$push": "$combined" }
}}
]
But still there is no way to re-order the results without procesing $unwind and $sort.
You might therefore consider that unless you need this grouped across multiple documents, that the basic "contenate and sort" operation is best handled in client code. MongoDB has no way to do this "in place" on the array at present, so per document in client code is your best bet.
But if you do need to do this grouping over multiple documents, then the sort of approaches as shown here are for you.
Also note that "creation" here means creation of the ObjectId value itself and not other properties from your referenced objects. If you need those, then you perform a populate on the id values after the aggregation or query instead, and of course sort in client code.

Resources