This question already has answers here: Add some kind of row number to a mongodb aggregate command / pipeline (3 answers). Closed 4 years ago.
I have the following collection:
{
"_id" : ObjectId("5b16405a8832711234bcfae7"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Bruce",
"lastName": "Wayne"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae8"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Clerk",
"lastName": "Kent"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae9"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Peter",
"lastName": "Parker"
}
I need to $project one more key, index, built with $concat from 'INV-00' plus the position of each root document.
My output should look something like this:
{
"_id" : ObjectId("5b16405a8832711234bcfae7"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Bruce",
"lastName": "Wayne",
"index": "INV-001"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae8"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Clerk",
"lastName": "Kent",
"index": "INV-002"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae9"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Peter",
"lastName": "Parker",
"index": "INV-003"
}
Also, can I change the createdAt format to something like Thu Jan 18 2018 using $dateToString or something else?
Thanks in advance!
While I would certainly recommend doing this on the client side rather than inside MongoDB, here is how you could get what you want. It is pretty brute-force, but it works:
db.collection.aggregate([
// you should add a $sort stage here to make sure you get the right indexes
{
$group: {
_id: null, // group all documents into the same bucket
docs: { $push: "$$ROOT" } // just to create an array of all documents
}
}, {
$project: {
docs: { // transform the "docs" field
$map: { // into something
input: { $range: [ 0, { $size: "$docs" } ] }, // an array from 0 to n - 1 where n is the number of documents
as: "this", // which shall be accessible using "$$this"
in: {
$mergeObjects: [ // we join two documents
{ $arrayElemAt: [ "$docs", "$$this" ] }, // one is the nth document in our "docs" array
{ "index": { $concat: [ 'INV-00', { $substr: [ { $add: [ "$$this", 1 ] }, 0, -1 ] } ] } } // and the second document is the one with our "index" field
]
}
}
}
}
}, {
$unwind: "$docs" // flatten the result structure
}, {
$replaceRoot: {
newRoot: "$docs" // restore the original document structure
}
}])
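If you do follow the advice to handle this on the client side, a minimal sketch might look like the following. It assumes the documents have already been fetched into an array in the order you want (for example with .sort({ createdAt: 1 })), and it formats createdAt like Thu Jan 18 2018 from the UTC date fields, similar to what Date.prototype.toDateString() prints but independent of the local time zone. The helper names formatDateUTC and addInvoiceNumbers are hypothetical, and padding the number to three digits (INV-010 rather than the pipeline's INV-0010 for the tenth document) is my own choice.

```javascript
// Hypothetical client-side helpers: number the fetched documents and reformat createdAt.
const DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Format a Date like "Thu Jan 18 2018", reading the UTC fields so the
// result does not depend on the local time zone.
function formatDateUTC(date) {
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `${DAYS[date.getUTCDay()]} ${MONTHS[date.getUTCMonth()]} ${day} ${date.getUTCFullYear()}`;
}

// Return copies of the documents with an "index" of INV-001, INV-002, ...
// (position in the array, padded to three digits) and a formatted createdAt.
// Assumes `docs` is already sorted in the order the numbers should follow.
function addInvoiceNumbers(docs) {
  return docs.map((doc, i) => ({
    ...doc,
    createdAt: formatDateUTC(doc.createdAt),
    index: "INV-" + String(i + 1).padStart(3, "0"),
  }));
}

// Sample data from the question (_id omitted for brevity).
const docs = [
  { createdAt: new Date("2018-06-05T07:48:45.248Z"), firstName: "Bruce", lastName: "Wayne" },
  { createdAt: new Date("2018-06-05T07:48:45.248Z"), firstName: "Clerk", lastName: "Kent" },
  { createdAt: new Date("2018-06-05T07:48:45.248Z"), firstName: "Peter", lastName: "Parker" },
];

const result = addInvoiceNumbers(docs);
console.log(result);
```

Because the numbering is just the array position, it is only meaningful if the cursor was sorted, which is the same caveat as the $sort comment in the pipeline above.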
Related
I have a data sample like this:
"diagnostics" : {
"_ID" : "554bbf7b761e06f02fef3561",
"tests" : [
{
"_id" : "59d678064e4645ec562a37e2",
"name" : "RBC"
},
{
"_id" : "59d678064e4645ec562a37e1",
"name" : "Calcium"
}
]
}
I want to get every distinct _ID along with the count of each test in it, grouped by test name,
something like this:
{ "_ID" : "554bbf7b761e06f02fef3561", "tests" : [ {"name" : "Calcium", "count" : (count of Calcium)}, {"name" : "RBC", "count" : (count of RBC)} ] }
Keep in mind that tests is nested inside diagnostics and can contain any number of entries with a name field; the same name can appear once, twice, or any number of times, and I want an individual count for each distinct name.
db.collection('transactions').aggregate([
{ $unwind : '$diagnostics.tests' },
{ $group : {
_id: {
"Test_Name" : '$diagnostics.tests.name',
"ID" : '$diagnostics._id'
},
test_count: { $sum: 1 }
}
}
])
and I am getting a result like this:
[
{
"_id": {
"Test_Name": "Fasting Blood Sugar",
"ID": "554bbf7b761e06f02fef3561"
},
"test_count": 76
},
{
"_id": {
"Test_Name": "Fasting Blood Sugar",
"ID": "566726c35dc18d13242fffcc"
},
"test_count": 1
},
{
"_id": {
"Test_Name": "CBC - 7 Part",
"ID": "566726c35dc18d13242fffcc"
},
"test_count": 1
},
{
"_id": {
"Test_Name": "RBC",
"ID": "554bbf7b761e06f02fef3561"
},
"test_count": 1
},
{
"_id": {
"Test_Name": "Fasting Blood Sugar",
"ID": "5a2c9edfe0d0ec71aef1e526"
},
"test_count": 6
},
{
"_id": {
"Test_Name": "Calcium",
"ID": "554bbf7b761e06f02fef3561"
},
"test_count": 77
}
]
Can anybody help me with the query?
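You need to use multiple $group stages here.
First $unwind the tests and $group them by "name" (together with the diagnostic ID) to get a count per name. Then $group again by "diagnostics._ID" to rebuild the original shape, pushing each name/count pair into a "tests" array. For the number of distinct tests, you can take the $size of that "tests" array.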
db.collection.aggregate([
{ "$unwind": "$diagnostics.tests" },
{ "$group": {
"_id": {
"_id": "$diagnostics.tests.name",
"diagnosticID": "$diagnostics._ID"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": {
"_ID": "$_id.diagnosticID"
},
"tests": {
"$push": {
"name": "$_id._id",
"count": "$count"
}
}
}},
{ "$project": {
"diagnostics._ID": "$_id._ID",
"diagnostics.tests": "$tests",
"_id": 0,
"testCount": { "$size": "$tests" }
}}
])
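To see what the two $group stages compute, here is a rough client-side equivalent in plain JavaScript. It is a sketch only, assuming documents shaped like the sample above with diagnostics._ID and tests[].name; the function name countTestsByDiagnostic is hypothetical, and unlike $group (whose output order is not guaranteed) the order here simply follows first appearance.

```javascript
// Hypothetical client-side equivalent of: $unwind -> $group (ID + name) -> $group (ID) -> $project.
function countTestsByDiagnostic(docs) {
  // diagnostic ID -> Map(test name -> count)
  const counts = new Map();
  for (const doc of docs) {
    const id = doc.diagnostics._ID;
    // Like $unwind, a document with an empty tests array contributes nothing.
    for (const test of doc.diagnostics.tests || []) {
      if (!counts.has(id)) counts.set(id, new Map());
      const byName = counts.get(id);
      byName.set(test.name, (byName.get(test.name) || 0) + 1);
    }
  }
  // Second $group + $project: one entry per diagnostic ID with its tests array and size.
  return [...counts].map(([id, byName]) => {
    const tests = [...byName].map(([name, count]) => ({ name, count }));
    return { diagnostics: { _ID: id, tests }, testCount: tests.length };
  });
}

// Two hypothetical documents sharing the same _ID from the question.
const sample = [
  { diagnostics: { _ID: "554bbf7b761e06f02fef3561", tests: [
      { _id: "59d678064e4645ec562a37e2", name: "RBC" },
      { _id: "59d678064e4645ec562a37e1", name: "Calcium" } ] } },
  { diagnostics: { _ID: "554bbf7b761e06f02fef3561", tests: [
      { _id: "59d678064e4645ec562a37e1", name: "Calcium" } ] } },
];

const result = countTestsByDiagnostic(sample);
console.log(JSON.stringify(result, null, 2));
```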
I have the following collection:
{
"_id" : ObjectId("5b18d14cbc83fd271b6a157c"),
"status" : "pending",
"description" : "You have to complete the challenge..."
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5773"),
"status" : "completed",
"description" : "completed..."
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5775"),
"status" : "pending",
"description" : "pending..."
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5776"),
"status" : "inProgress",
"description" : "inProgress..."
}
I need to group by status and get each status value dynamically as a key, like this:
[
{
"completed": [
{
"_id": "5b18d31a27a37696ec8b5773",
"status": "completed",
"description": "completed..."
}
]
},
{
"pending": [
{
"_id": "5b18d14cbc83fd271b6a157c",
"status": "pending",
"description": "You have to complete the challenge..."
},
{
"_id": "5b18d31a27a37696ec8b5775",
"status": "pending",
"description": "pending..."
}
]
},
{
"inProgress": [
{
"_id": "5b18d31a27a37696ec8b5776",
"status": "inProgress",
"description": "inProgress..."
}
]
}
]
Not that I think it's a good idea (mostly because I don't see any actual "aggregation" here at all), but it can be done: after "grouping" by the "status" key and using $push to collect the documents into an array, you similarly $push all of those groups into a single array of key/value pairs, and then convert them into the keys of one document in a $replaceRoot with $arrayToObject:
db.collection.aggregate([
{ "$group": {
"_id": "$status",
"data": { "$push": "$$ROOT" }
}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": "$_id",
"v": "$data"
}
}
}},
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$data" }
}}
])
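The k/v pairs built by the second $group are exactly the shape $arrayToObject expects. As a sketch of the same three steps in plain JavaScript, Object.fromEntries plays the role of $arrayToObject; the function name groupByStatus is hypothetical and _id values are plain strings for brevity.

```javascript
// Hypothetical plain-JavaScript walk-through of the pipeline above.
function groupByStatus(docs) {
  // First $group: status -> array of whole documents ($push: "$$ROOT").
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d.status)) groups.set(d.status, []);
    groups.get(d.status).push(d);
  }
  // Second $group: a single array of { k, v } pairs.
  const pairs = [...groups].map(([k, v]) => ({ k, v }));
  // $arrayToObject inside $replaceRoot: [{ k, v }, ...] -> { k: v, ... }
  return Object.fromEntries(pairs.map(({ k, v }) => [k, v]));
}

const docs = [
  { _id: "5b18d14cbc83fd271b6a157c", status: "pending", description: "You have to complete the challenge..." },
  { _id: "5b18d31a27a37696ec8b5773", status: "completed", description: "completed..." },
  { _id: "5b18d31a27a37696ec8b5775", status: "pending", description: "pending..." },
  { _id: "5b18d31a27a37696ec8b5776", status: "inProgress", description: "inProgress..." },
];

const result = groupByStatus(docs);
console.log(JSON.stringify(result, null, 2));
```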
Returns:
{
"inProgress" : [
{
"_id" : ObjectId("5b18d31a27a37696ec8b5776"),
"status" : "inProgress",
"description" : "inProgress..."
}
],
"completed" : [
{
"_id" : ObjectId("5b18d31a27a37696ec8b5773"),
"status" : "completed",
"description" : "completed..."
}
],
"pending" : [
{
"_id" : ObjectId("5b18d14cbc83fd271b6a157c"),
"status" : "pending",
"description" : "You have to complete the challenge..."
},
{
"_id" : ObjectId("5b18d31a27a37696ec8b5775"),
"status" : "pending",
"description" : "pending..."
}
]
}
That might be okay IF you actually "aggregated" beforehand, but on any practically sized collection all this does is try to force the whole collection into a single document, which is likely to break the BSON Limit of 16MB. So I would not recommend even attempting this without "grouping" on something else before this step.
Frankly, the following code does the same thing, with no aggregation tricks and no BSON limit problem:
var obj = {};
// Using forEach as a premise for representing "any" cursor iteration form
db.collection.find().forEach(d => {
if (!obj.hasOwnProperty(d.status))
obj[d.status] = [];
obj[d.status].push(d);
})
printjson(obj);
Or a bit shorter:
var obj = {};
// Using forEach as a premise for representing "any" cursor iteration form
db.collection.find().forEach(d =>
obj[d.status] = [
...(obj.hasOwnProperty(d.status)) ? obj[d.status] : [],
d
]
)
printjson(obj);
Aggregations are used for "data reduction", and anything that is simply "reshaping results" without actually reducing the data returned from the server is usually better handled in client code anyway. You're still returning all the data no matter what you do, and client processing of the cursor has considerably less overhead, with NO restrictions.
I have the following users collection:
[{
"_id" : ObjectId("5afadfdf08a7aa6f1a27d986"),
"firstName" : "bruce",
"friends" : [ ObjectId("5afd1c42af18d985a06ac306"),ObjectId("5afd257daf18d985a06ac6ac") ]
},
{
"_id" : ObjectId("5afbfe21daf4b13ddde07dbe"),
"firstName" : "clerk",
"friends" : []
}]
and a friends collection:
[{
"_id" : ObjectId("5afd1c42af18d985a06ac306"),
"recipient" : ObjectId("5afaab572c4ec049aeb0bcba"),
"requester" : ObjectId("5afadfdf08a7aa6f1a27d986"),
"status" : 2
},
{
"_id" : ObjectId("5afd257daf18d985a06ac6ac"),
"recipient" : ObjectId("5afadfdf08a7aa6f1a27d986"),
"requester" : ObjectId("5afbfe21daf4b13ddde07dbe"),
"status" : 1
}]
Suppose I have a user logged in with _id "5afaab572c4ec049aeb0bcba", and this _id matches the recipient field in the friends collection.
Now I need to add a field friendsStatus containing the status from the friends collection. If the user does not match any recipient in the array, the status should be 0.
So when I get all users, my output should be:
[{
"_id" : ObjectId("5afadfdf08a7aa6f1a27d986"),
"firstName" : "bruce",
"friends" : [ ObjectId("5afd1c42af18d985a06ac306") ],
"friendsStatus": 2
},
{
"_id" : ObjectId("5afbfe21daf4b13ddde07dbe"),
"firstName" : "clerk",
"friends" : [],
"friendsStatus": 0
}]
Thanks in advance!
If you have MongoDB 3.6 or later, you can use $lookup with a "sub-pipeline":
User.aggregate([
{ "$lookup": {
"from": Friend.collection.name,
"let": { "friends": "$friends" },
"pipeline": [
{ "$match": {
"recipient": ObjectId("5afaab572c4ec049aeb0bcba"),
"$expr": { "$in": [ "$_id", "$$friends" ] }
}},
{ "$project": { "status": 1 } }
],
"as": "friends"
}},
{ "$addFields": {
"friends": {
"$map": {
"input": "$friends",
"in": "$$this._id"
}
},
"friendsStatus": {
"$ifNull": [ { "$min": "$friends.status" }, 0 ]
}
}}
])
For earlier versions, it's ideal to actually use $unwind in order to ensure you don't breach the BSON Limit:
User.aggregate([
{ "$lookup": {
"from": Friend.collection.name,
"localField": "friends",
"foreignField": "_id",
"as": "friends"
}},
{ "$unwind": { "path": "$friends", "preserveNullAndEmptyArrays": true } },
{ "$match": {
"$or": [
{ "friends.recipient": ObjectId("5afaab572c4ec049aeb0bcba") },
{ "friends": null }
]
}},
{ "$group": {
"_id": "$_id",
"firstName": { "$first": "$firstName" },
"friends": { "$push": "$friends._id" },
"friendsStatus": {
"$min": {
"$ifNull": ["$friends.status",0]
}
}
}}
])
There is "one difference" from the most optimal form here: the pipeline optimizer does not actually "roll up" the $match condition into the $lookup itself, as the explain output shows:
{
"$lookup" : {
"from" : "friends",
"as" : "friends",
"localField" : "friends",
"foreignField" : "_id",
"unwinding" : {
"preserveNullAndEmptyArrays" : true
}
}
},
{
"$match" : { // <-- stays outside because the array is preserved
Because the preserveNullAndEmptyArrays option is true, the "fully optimized" action, where the condition would be applied to the foreign collection "before" results are returned, does not happen.
So the only purpose of unwinding here is to keep what would normally be the target "array" of the $lookup result from growing the parent document beyond the BSON Limit. The additional conditions of the $match are then applied "after" this stage. A default $unwind without the option presumes false for the preservation, and in that case a matching condition is added into the $lookup instead. That, of course, would exclude the documents with no foreign matches.
It is not really advisable because of that BSON Limit, but you can also apply $filter to the array resulting from the $lookup:
And not really advisable because of that BSON Limit, but there is also applying $filter to the resulting array of $lookup:
User.aggregate([
{ "$lookup": {
"from": Friend.collection.name,
"localField": "friends",
"foreignField": "_id",
"as": "friends"
}},
{ "$addFields": {
"friends": {
"$map": {
"input": {
"$filter": {
"input": "$friends",
"cond": {
"$eq": [
"$$this.recipient",
ObjectId("5afaab572c4ec049aeb0bcba")
]
}
}
},
"in": "$$this._id"
}
},
"friendsStatus": {
"$ifNull": [
{ "$min": {
"$map": {
"input": {
"$filter": {
"input": "$friends",
"cond": {
"$eq": [
"$$this.recipient",
ObjectId("5afaab572c4ec049aeb0bcba")
]
}
}
},
"in": "$$this.status"
}
}},
0
]
}
}}
])
In each case we're basically adding an "additional condition" to the join, so that it is not just on the directly related field but is also constrained by the queried ObjectId value for "recipient".
I'm not really sure what you expect for "friendsStatus", since the result is an array and there could possibly be more than one match (as far as I know), so in each case I'm just applying $min to extract one value from the array.
The governing condition in each case is $ifNull, which applies when there is nothing in the "friends" output array to extract from; in that case you simply return 0.
All of them output the same thing:
{
"_id" : ObjectId("5afadfdf08a7aa6f1a27d986"),
"firstName" : "bruce",
"friends" : [
ObjectId("5afd1c42af18d985a06ac306")
],
"friendsStatus" : 2
}
{
"_id" : ObjectId("5afbfe21daf4b13ddde07dbe"),
"firstName" : "clerk",
"friends" : [ ],
"friendsStatus" : 0
}
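The join logic itself (match on the friends array, keep only entries for the given recipient, take the $min status, and fall back to 0 as $ifNull does) can also be sketched client-side, which may help when checking the pipelines. This assumes ObjectIds have been converted to hex strings so they compare with ===; addFriendsStatus is a hypothetical name.

```javascript
// Hypothetical client-side version of the join with the extra "recipient" condition.
// ObjectIds are represented as hex strings so they can be compared with ===.
function addFriendsStatus(users, friends, recipientId) {
  // Look up friend documents by _id (the foreignField side of the join).
  const byId = new Map(friends.map(f => [f._id, f]));
  return users.map(user => {
    // Join friends[] -> _id, then keep only entries for the given recipient.
    const matched = user.friends
      .map(id => byId.get(id))
      .filter(f => f !== undefined && f.recipient === recipientId);
    return {
      ...user,
      friends: matched.map(f => f._id),
      // Like { $ifNull: [ { $min: ... }, 0 ] }: smallest status, or 0 if none matched.
      friendsStatus: matched.length > 0 ? Math.min(...matched.map(f => f.status)) : 0,
    };
  });
}

const users = [
  { _id: "5afadfdf08a7aa6f1a27d986", firstName: "bruce",
    friends: ["5afd1c42af18d985a06ac306", "5afd257daf18d985a06ac6ac"] },
  { _id: "5afbfe21daf4b13ddde07dbe", firstName: "clerk", friends: [] },
];
const friends = [
  { _id: "5afd1c42af18d985a06ac306", recipient: "5afaab572c4ec049aeb0bcba",
    requester: "5afadfdf08a7aa6f1a27d986", status: 2 },
  { _id: "5afd257daf18d985a06ac6ac", recipient: "5afadfdf08a7aa6f1a27d986",
    requester: "5afbfe21daf4b13ddde07dbe", status: 1 },
];

const result = addFriendsStatus(users, friends, "5afaab572c4ec049aeb0bcba");
console.log(JSON.stringify(result, null, 2));
```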
Given a collection named users with this structure:
{
"_id" : ObjectId("57653dcc533304a40ac504fc"),
"username" : "XYZ",
"followers" : [
{
"count" : 31,
"ts" : ISODate("2016-06-17T18:30:00.996Z")
},
{
"count" : 31,
"ts" : ISODate("2016-06-18T18:30:00.288Z")
}
]
}
I want to query this collection by the username field and have ts returned in 'yyyy-mm-dd' format.
Expected Output:
{
"_id" : ObjectId("57653dcc533304a40ac504fc"),
"username" : "XYZ",
"followers" : [
{
"count" : 31,
"date" : "2016-06-17"
},
{
"count" : 31,
"date" : "2016-06-18"
}
]
}
I have tried something like this:
db.users.aggregate([
{$match:{"username":"xyz"}},
{$project:{ "followers":{"count":1,
"date":"$followers.ts.toISOString().slice(0,10).replace(/-/g,'-')"
}}
}
])
But it doesn't seem to work. Can anyone please help?
Thanks much.
Consider running an aggregation pipeline that will allow you to flatten the data list first, project the new field using the $dateToString operator, then regroup the flattened docs to get your desired result.
The above can be expressed as a single pipeline with three distinct stages ($unwind, $project and $group) after the $match:
db.users.aggregate([
{ "$match": { "username": "XYZ" } }, // string matching is case-sensitive
{ "$unwind": "$followers" },
{
"$project": {
"username": 1,
"count": "$followers.count",
"date": { "$dateToString": { "format": "%Y-%m-%d", "date": "$followers.ts" } }
}
},
{
"$group": {
"_id": "$_id",
"username": { "$first": "$username" },
"followers": { "$push": {
"count": "$count",
"date": "$date"
}}
}
}
])
With MongoDB 3.4 and newer, you can use the new $addFields pipeline step together with $map to create the array field without the need to unwind and group:
db.users.aggregate([
{ "$match": { "username": "XYZ" } }, // string matching is case-sensitive
{
"$addFields": {
"followers": {
"$map": {
"input": "$followers",
"as": "follower",
"in": {
"count": "$$follower.count",
"date": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$$follower.ts"
}
}
}
}
}
}
}
])
The best and easiest way to do this is to transform each element in the array with the $map operator. In the "in" expression, you then use $dateToString to convert the "ts" date to a string using format specifiers.
db.coll.aggregate(
[
{ "$match": { "username": "XYZ" } },
{ "$project": {
"username": 1,
"followers": {
"$map": {
"input": "$followers",
"as": "f",
"in": {
"count": "$$f.count",
"date": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$$f.ts"
}
}
}
}
}
}}
]
)
which produces:
{
"_id" : ObjectId("57653dcc533304a40ac504fc"),
"username" : "XYZ",
"followers" : [
{
"count" : 31,
"date" : "2016-06-17"
},
{
"count" : 31,
"date" : "2016-06-18"
}
]
}
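JavaScript methods such as toISOString() cannot be called inside a pipeline expression, which is why the attempt in the question does not work there: a string starting with "$" is interpreted as a field path rather than executed. They do work on the client, though. A minimal sketch, assuming the document has already been fetched with ts values as Date objects (formatFollowers is a hypothetical name):

```javascript
// Hypothetical client-side reshaping of a document returned by find().
function formatFollowers(user) {
  return {
    ...user,
    followers: user.followers.map(f => ({
      count: f.count,
      // toISOString() is always in UTC (as is $dateToString by default),
      // so its first 10 characters are the "YYYY-MM-DD" date.
      date: f.ts.toISOString().slice(0, 10),
    })),
  };
}

const user = {
  _id: "57653dcc533304a40ac504fc",
  username: "XYZ",
  followers: [
    { count: 31, ts: new Date("2016-06-17T18:30:00.996Z") },
    { count: 31, ts: new Date("2016-06-18T18:30:00.288Z") },
  ],
};

const result = formatFollowers(user);
console.log(JSON.stringify(result, null, 2));
```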
For each city, I'd like to get the "population" from the document with the latest timestamp, using the aggregate function.
In a MongoDB collection like this:
{
"_id": {"$oid": "55354bc97b5dfd021f2be661"},
"timestamp": {"$date": "2015-04-20T18:56:09.000Z"},
"city": "Roma",
"population": [
{"age": 90,"count": 1000},
{"age": 25,"count": 25}
]
},
{
"_id": {"$oid": "55354c357b5dfd021f2be663"},
"timestamp": {"$date": "2015-04-20T18:57:57.000Z"},
"city": "Madrid",
"population": [
{"age": 90,"count": 10},
{"age": 75,"count": 2343},
{"age": 50,"count": 500},
{"age": 70,"count": 5000}
]
},
{
"_id": {"$oid": "55362da541c37aef07d4ea9a"},
"timestamp": {"$date": "2015-04-21T10:59:49.000Z"},
"city": "Roma",
"population": [
{"age": 90,"count": 5}
]
}
I'd like to retrieve all the cities, but for each one only the latest timestamp:
{
"city": "Roma",
"population": [
{"age": 90,"count": 5}
]
},
{
"city": "Madrid",
"population": [
{"age": 90,"count": 10},
{"age": 75,"count": 2343},
{"age": 50,"count": 500},
{"age": 70,"count": 5000}
]
}
I have tried something like this answer, but I don't know how to "unwind" the populations after getting the latest timestamp for each city:
db.collection('population').aggregate([
{ $unwind: '$population' },
{ $group: { _id: '$city', timestamp: { $max: '$timestamp' } } },
{ $sort: { _id : -1 } }
], function(err, results) {
res.send(results)
});
The following aggregation pipeline will give you the desired result. The first stage orders the documents by the timestamp field (descending), and the next $group stage groups the ordered documents by the city field. Within the $group stage, you can capture each whole document (and with it the population array) through the $$ROOT system variable. The $first accumulator returns the value of the $$ROOT expression for the first document in each group of documents that share the same city key, which after the sort is the one with the latest timestamp. The final $project stage reshapes the fields from the previous stage into the desired fields:
db.population.aggregate([
{
"$sort": { "timestamp": -1 }
},
{
"$group": {
"_id": "$city",
"doc": { "$first": "$$ROOT" }
}
},
{
"$project": {
"_id": 0,
"city": "$_id",
"population": "$doc.population"
}
}
]);
Output:
/* 0 */
{
"result" : [
{
"city" : "Madrid",
"population" : [
{
"age" : 90,
"count" : 10
},
{
"age" : 75,
"count" : 2343
},
{
"age" : 50,
"count" : 500
},
{
"age" : 70,
"count" : 5000
}
]
},
{
"city" : "Roma",
"population" : [
{
"age" : 90,
"count" : 5
}
]
}
],
"ok" : 1
}
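The $sort followed by $group with $first is easy to mirror client-side, which can help when checking the pipeline's output. This is a sketch assuming the extended-JSON {"$date": ...} values have already been turned into Date objects; latestPerCity is a hypothetical name.

```javascript
// Hypothetical client-side check of the "latest document per city" logic.
function latestPerCity(docs) {
  // Like { $sort: { timestamp: -1 } }: newest first (copy so the input is untouched).
  const sorted = [...docs].sort((a, b) => b.timestamp - a.timestamp);
  // Like $group with { $first: "$$ROOT" }: keep the first (newest) doc seen per city.
  const latest = new Map();
  for (const doc of sorted) {
    if (!latest.has(doc.city)) latest.set(doc.city, doc);
  }
  // Like the final $project: keep only city and population.
  return [...latest.values()].map(d => ({ city: d.city, population: d.population }));
}

const docs = [
  { timestamp: new Date("2015-04-20T18:56:09.000Z"), city: "Roma",
    population: [{ age: 90, count: 1000 }, { age: 25, count: 25 }] },
  { timestamp: new Date("2015-04-20T18:57:57.000Z"), city: "Madrid",
    population: [{ age: 90, count: 10 }, { age: 75, count: 2343 },
                 { age: 50, count: 500 }, { age: 70, count: 5000 }] },
  { timestamp: new Date("2015-04-21T10:59:49.000Z"), city: "Roma",
    population: [{ age: 90, count: 5 }] },
];

const result = latestPerCity(docs);
console.log(JSON.stringify(result, null, 2));
```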
I think that you want to use $project instead of $unwind:
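db.collection('population').aggregate([{
    $sort: { timestamp: -1 }                  // newest documents first
}, {
    $group: {
        _id: '$city',
        timestamp: {$max: '$timestamp'},
        population: {$first: '$population'}   // population of the newest document per city
    }
}, {
    $project: {
        _id: 0,
        city: '$_id',
        population: 1
    }
}, {
    $sort: {
        city: -1                              // _id was projected away, so sort on city
    }
}], function(err, results) {
    res.send(results)
});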
I use this to sort on any timestamp field in an aggregation; here I am sorting by the latest update time of the document. If you need to, you can group it afterwards. You can learn more about [aggregate sorting here.][1]
aggregate.push({ $sort: { updated_at: -1 } });
What I do is build the pipeline as blocks of aggregate stages, push them into an array, and execute it all together. I find this easier to debug when something is not working properly.
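A small sketch of that style: each optional stage is pushed onto an array only when needed, and the finished array is what you pass to aggregate(). The function buildPipeline and its options (status, sortBy, limit) are hypothetical; the example only builds and prints the stages, so it runs without a database.

```javascript
// Hypothetical pipeline builder: push stages onto an array, then run them together.
function buildPipeline({ status, sortBy = "updated_at", limit } = {}) {
  const pipeline = [];
  // Optional filter stage.
  if (status !== undefined) pipeline.push({ $match: { status } });
  // Always sort newest first on the chosen timestamp field.
  pipeline.push({ $sort: { [sortBy]: -1 } });
  // Optional cap on the number of results.
  if (limit !== undefined) pipeline.push({ $limit: limit });
  return pipeline;
}

const pipeline = buildPipeline({ status: "pending", limit: 10 });
// Print each stage while debugging; then run it, e.g. db.collection.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Keeping the stages in a plain array means you can log or slice it (for example, run only the first N stages) to see where the results stop matching your expectations.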
[1]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/sort/