I have a variable var correctAnswers;
In my MongoDB I have the following document (below). I am trying to write a query that takes all of the "correct" fields from the "quiz" field and puts them into their own array, so I can assign that array to var correctAnswers;.
"title" : "Economics questions"
"quiz": "[{
"question": "Which of these involves the analysis of of a business's financial statements, often used in stock valuation?",
"choices": ["Fundamental analysis", "Technical analysis"],
"correct": 0
}, {
"question": "What was the name of the bond purchasing program started by the U.S. Federal Reserve in response to the 2008 financial crisis?",
"choices": ["Stimulus Package", "Mercantilism", "Quantitative Easing"],
"correct": 2
}, {
"question": "Which term describes a debt security issued by a government, company, or other entity?",
"choices": ["Bond", "Stock", "Mutual fund"],
"correct": 0
}, {
"question": "Which of these companies has the largest market capitalization (as of October 2015)?",
"choices": ["Microsoft", "General Electric", "Apple", "Bank of America"],
"correct": 2
}, {
"question": "Which of these is a measure of the size of an economy?",
"choices": ["Unemployment rate", "Purchasing power index", "Gross Domestic Product"],
"correct": 2
}]"
How should I go about that, or can someone point me in the right direction? I have tried projections, but should I do an aggregation? Thank you for any help.
Edit for clarity: the output I am looking for in this example is an array, [0,2,0,2,2]
You can get this result:
[{correct:0},{correct:2},{correct:0},{correct:2},{correct:2}], but a [0,2,0,2,2] type of result is not possible unless we use distinct.
db.quiz.aggregate([
  // Initial document match (uses index, if a suitable one is available)
  { $match: {
    "title" : "Economics questions"
  }},
  // Convert embedded array into stream of documents
  { $unwind: '$quiz' },
  // Note: Could add a `$group` by _id here if multiple matches are expected
  // Final projection: exclude fields with 0, include fields with 1
  { $project: {
    _id: 0,
    score: "$quiz.correct"
  }}
])
db.users.find( { }, { "quiz.correct": 1,"_id":0 } )
// above query will return following output :
{
"quiz" : [
{
"correct" : 0
},
{
"correct" : 2
},
{
"correct" : 0
},
{
"correct" : 2
},
{
"correct" : 2
}
]
}
Process this output as required in Node.js.
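A minimal sketch of that processing step in plain JavaScript, assuming `doc` holds the single document returned by the find() call above (the variable names are illustrative, not part of any driver API):

```javascript
// Shape of the document returned by the projection above.
const doc = {
  quiz: [
    { correct: 0 },
    { correct: 2 },
    { correct: 0 },
    { correct: 2 },
    { correct: 2 }
  ]
};

// Pull the "correct" value out of every quiz entry.
const correctAnswers = doc.quiz.map((entry) => entry.correct);

console.log(correctAnswers);
```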
Try this:
db.getCollection('quize').aggregate([
{$match:{_id: id }},
{$unwind:'$quiz'},
{$group:{
_id:null,
score: {$push:"$quiz.correct"}
}}
])
It will give you the expected output.
One way to achieve this is through aggregation:
db.collectionName.aggregate([
// use index key in match pipeline,just for e.g using title here
{ $match: { "title" : "Economics questions" }},
{ $unwind: "$quiz" },
{ $group: {
_id:null,
quiz: { $push: "$quiz.correct" }
}
},
//this is not required, use projection only if you want to exclude/include fields
{
$project: {_id: 0, quiz: 1}
}
])
Above query will give you the following output
{
"quiz" : [ 0, 2, 0, 2, 2 ]
}
Then simply process this output as per your need.
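To see what the $unwind and $group stages do to the data, here is a rough in-memory imitation in plain JavaScript. It is only a sketch of the idea, not how the server implements it, and the sample document is trimmed to the fields that matter:

```javascript
// In-memory stand-in for the document selected by $match.
const matched = [
  {
    title: "Economics questions",
    quiz: [{ correct: 0 }, { correct: 2 }, { correct: 0 }, { correct: 2 }, { correct: 2 }]
  }
];

// $unwind: one output document per element of the "quiz" array.
const unwound = matched.flatMap((d) => d.quiz.map((q) => ({ ...d, quiz: q })));

// $group with _id: null and $push: collect every "quiz.correct" into one array.
const grouped = { quiz: unwound.map((d) => d.quiz.correct) };

console.log(grouped);
```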
Related
This is my sample data. It has a userId and an array "watchHistory", which contains the list of videos watched by the user:
{
"_id": "62821344445c30b35b441f11",
"userId": 579,
"__v": 0,
"watchHistory": [
{
"seenTime": "2022-05-23T08:29:19.781Z",
"videoId": 789456,
"uploadTime": "2022-03-29T12:33:35.312Z",
"description": "Biography of Indira Gandhi",
"speaker": "andrews",
"title": "Indira Gandhi",
"_id": "628b45df775e3973f3a670ec"
},
{
"seenTime": "2022-05-23T08:29:39.867Z",
"videoId": 789455,
"uploadTime": "2022-03-31T07:37:39.712Z",
"description": "What are some healthy food habits to stay healthy",
"speaker": "morris",
"title": "Healthy Food Habits",
"_id": "628b45f3775e3973f3a670"
},
]
}
I need to match the userId and then sort by "watchHistory.seenTime". The seenTime field indicates when the user saw the video, so the last watched video should come first in the list.
I don't have permission to use $unwind, so can anyone help me with this? Thank you.
If you are using MongoDB version 5.2 and above, you can use $sortArray operator in an aggregation pipeline. Your pipeline should look something like this:
db.collection.aggregate(
[
{"$match":
{ _id: '62821344445c30b35b441f11' }
},
{
"$project": {
_id: 1,
"userId": 1,
"__v": 1,
"watchHistory": {
"$sortArray": { input: "$watchHistory", sortBy: { seenTime: -1 }}
}
}
}
]
);
Please modify the filter in the "$match" stage according to the key and value you need to filter on. See the MongoDB documentation for $sortArray for details.
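If the server is older than 5.2, one fallback is to sort the array in application code after fetching the document. The sketch below does that in plain JavaScript; `sortBySeenTimeDesc` is a hypothetical helper name, and it assumes seenTime is an ISO 8601 string or a Date:

```javascript
// Hypothetical helper: returns a copy of the array sorted by seenTime, newest first.
function sortBySeenTimeDesc(watchHistory) {
  return [...watchHistory].sort(
    (a, b) => new Date(b.seenTime) - new Date(a.seenTime)
  );
}

// Trimmed-down entries from the sample document.
const history = [
  { videoId: 789456, seenTime: "2022-05-23T08:29:19.781Z" },
  { videoId: 789455, seenTime: "2022-05-23T08:29:39.867Z" }
];

const sorted = sortBySeenTimeDesc(history);
console.log(sorted.map((v) => v.videoId));
```

The helper copies the array first, so the object returned by the driver is left untouched.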
Without using $unwind, it's not possible to do this via an aggregation pipeline, but as a workaround you can use the update method with the $push operator, which sorts the array as it is stored in the document:
db.collection.update({
_id: "62821344445c30b35b441f11"
},
{
$push: {
watchHistory: {
"$each": [],
"$sort": {
seenTime: -1
},
}
}
})
Please see the working example here
I'm building a ranking system for my book collection, ordered by award count. Below is the result returned from my database:
{
'awardCount' : 12000,
'year' : 1967,
'by' : 'IEEE Computer Society'
},
{
'awardCount' : 11230,
'year' : 1993,
'by' : 'National Academy of Engineering'
},
{
'awardCount' : 10600,
'year' : 1993,
'by' : 'National Academy of Engineering'
}
........about 10000+ more and sorted by awardCount
I use this query to get the result above
.find()
.sort({awardCount: -1 })
My question is whether it is possible to add a 'rank' field to each item to show the ranking order. For example:
{
'awardCount' : 12000,
'year' : 1967,
'by' : 'IEEE Computer Society',
'rank': 1
},
{
'awardCount' : 11230,
'year' : 1993,
'by' : 'National Academy of Engineering',
'rank': 2
}
If not, what would be the best solution to get a ranking in this situation? Thank you very much!
You could use a MongoDB aggregation to achieve this. You basically sort the documents by awardCount and then push each document into a new array while preserving the order; each document's index in that array is its rank. After that you can simply use $unwind with includeArrayIndex on this newly created array, which attaches the index to every document as "rank". Note that the index is zero-based, so the top document gets rank 0.
Here is a sample aggregate pipeline:
[{
"$sort": {
"awardCount": -1
}
},
{
"$group": {
"_id": false,
"books": {
"$push": {
"_id": "$_id",
"by": "$by",
"year": "$year",
"awardCount": "$awardCount"
}
}
}
},
{
"$unwind": {
"path": "$books",
"includeArrayIndex": "rank"
}
}]
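Because includeArrayIndex is zero-based, the first document comes back with rank 0 rather than 1. The following plain JavaScript sketch imitates the sort, group and unwind steps in memory and adds 1 to the index; the sample values come from the question, everything else is illustrative:

```javascript
const books = [
  { awardCount: 11230, year: 1993, by: "National Academy of Engineering" },
  { awardCount: 12000, year: 1967, by: "IEEE Computer Society" },
  { awardCount: 10600, year: 1993, by: "National Academy of Engineering" }
];

// Sort descending by awardCount, then turn the array position into a 1-based rank.
const ranked = [...books]
  .sort((a, b) => b.awardCount - a.awardCount)
  .map((book, index) => ({ ...book, rank: index + 1 }));

console.log(ranked);
```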
I'm fairly good with SQL queries, but I can't seem to get my head around grouping and summing MongoDB documents.
With this in mind, I have a job model with a schema like the one below:
{
name: {
type: String,
required: true
},
info: String,
active: {
type: Boolean,
default: true
},
all_service: [{
price: {
type: Number,
min: 0,
required: true
},
all_sub_item: [{
name: String,
price:{ // << -- this is the price I want to calculate
type: Number,
min: 0
},
owner: {
user_id: { // <<-- here is the filter I want to put
type: Schema.Types.ObjectId,
required: true
},
name: String,
...
}
}]
}],
date_create: {
type: Date,
default : Date.now
},
date_update: {
type: Date,
default : Date.now
}
}
I would like to get the sum of the price column where the owner is present. I tried the code below, but with no luck:
Job.aggregate(
[
{
$group: {
_id: {}, // not sure what to put here
amount: { $sum: '$all_service.all_sub_item.price' }
},
$match: {'not sure how to limit the user': given_user_id}
}
],
//{ $project: { _id: 1, expense: 1 }}, // you can only project fields from 'group'
function(err, summary) {
console.log(err);
console.log(summary);
}
);
Could someone guide me in the right direction? Thank you in advance.
Primer
As is correctly noted earlier, it does help to think of an aggregation "pipeline" just as the "pipe" | operator from Unix and other system shells. One "stage" feeds input to the "next" stage and so on.
The thing you need to be careful with here is that you have "nested" arrays, one array within another, and this can make drastic differences to your expected results if you are not careful.
Your documents consist of an "all_service" array at the top level. Presumably there are often "multiple" entries here, all containing your "price" property as well as "all_sub_item". Then of course "all_sub_item" is an array in itself, also containing many items of its own.
You can think of these arrays as the "relations" between your tables in SQL, in each case a "one-to-many". But the data is in a "pre-joined" form, where you can fetch all data at once without performing joins. That much you should already be familiar with.
However, when you want to "aggregate" across documents, you need to "de-normalize" this in much the same way as in SQL by "defining" the "joins". This is to "transform" the data into a de-normalized state that is suitable for aggregation.
So the same visualization applies. A master document's entries are replicated by the number of child documents, and a "join" to an "inner-child" will replicate both the master and initial "child" accordingly. In a "nutshell", this:
{
"a": 1,
"b": [
{
"c": 1,
"d": [
{ "e": 1 }, { "e": 2 }
]
},
{
"c": 2,
"d": [
{ "e": 1 }, { "e": 2 }
]
}
]
}
Becomes this:
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 2 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 2 } } }
And the operation to do this is $unwind, and since there are multiple arrays then you need to $unwind both of them before continuing any processing:
db.collection.aggregate([
{ "$unwind": "$b" },
{ "$unwind": "$b.d" }
])
So the "pipe" first unwinds the array from "$b" like so:
{ "a" : 1, "b" : { "c" : 1, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
{ "a" : 1, "b" : { "c" : 2, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
Which leaves a second array referenced by "$b.d" to be further de-normalized into the final de-normalized result "without any arrays". This allows other operations to process.
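The same two-step de-normalization can be imitated in plain JavaScript. This is only a sketch: `unwind` and `replaceAtPath` are hypothetical helpers, and unlike the real stage they simply drop any document whose path does not resolve to an array:

```javascript
// Hypothetical in-memory stand-in for a $unwind stage on a dotted path.
function unwind(docs, path) {
  const keys = path.split(".");
  return docs.flatMap((doc) => {
    // Walk down the path; stop early if a segment is missing.
    const value = keys.reduce(
      (current, key) => (current == null ? undefined : current[key]),
      doc
    );
    // Simplification: anything that is not an array is dropped here.
    if (!Array.isArray(value)) return [];
    // Emit one copy of the document per array element.
    return value.map((element) => replaceAtPath(doc, keys, element));
  });
}

// Returns a copy of obj with the value at the given key path replaced.
function replaceAtPath(obj, keys, value) {
  const [head, ...rest] = keys;
  return {
    ...obj,
    [head]: rest.length > 0 ? replaceAtPath(obj[head], rest, value) : value
  };
}

const input = [
  {
    a: 1,
    b: [
      { c: 1, d: [{ e: 1 }, { e: 2 }] },
      { c: 2, d: [{ e: 1 }, { e: 2 }] }
    ]
  }
];

// Unwind the outer array first, then the inner one.
const flattened = unwind(unwind(input, "b"), "b.d");
console.log(JSON.stringify(flattened));
```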
Solving
With just about "every" aggregation pipeline, the "first" thing you want to do is "filter" the documents to only those that contain your results. This is a good idea, especially with operations such as $unwind, since you don't want to be doing that on documents that do not even match your target data.
So you need to match your "user_id" at the array depth. But this is only part of getting the result, since you should be aware of what happens when you query a document for a matching value in an array.
Of course, the "whole" document is still returned, because this is what you really asked for. The data is already "joined" and we haven't asked to "un-join" it in any way. You look at this just as a "first" document selection does, but then when "de-normalized", every array element now actually represents a "document" in itself.
So not "only" do you $match at the beginning of the "pipeline", you also $match after you have processed "all" $unwind statements, down to the level of the element you wish to match.
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// De-normalize arrays
{ "$unwind": "$all_service" },
{ "$unwind": "$all_service.all_sub_item" },
// Match again to filter the array elements
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_service.all_sub_item.price" }
}}
],
function(err,results) {
}
)
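As a rough mental model, the unwind, match and group stages compute the same number as the plain JavaScript below. It is a sketch under assumptions: `sumPricesForOwner` is a hypothetical helper, and "owner" is treated as a plain value that can be compared with ===, as in the pipeline above. With the schema from the question you would compare owner.user_id instead, and an ObjectId needs its own equality check.

```javascript
// Hypothetical helper mirroring the pipeline stages on in-memory data.
function sumPricesForOwner(jobs, ownerId) {
  return jobs
    .flatMap((job) => job.all_service)               // first $unwind
    .flatMap((service) => service.all_sub_item)      // second $unwind
    .filter((item) => item.owner === ownerId)        // second $match
    .reduce((total, item) => total + item.price, 0); // $group with $sum
}

// Made-up sample data for illustration only.
const jobs = [
  {
    all_service: [
      { price: 10, all_sub_item: [{ price: 5, owner: "u1" }, { price: 7, owner: "u2" }] },
      { price: 20, all_sub_item: [{ price: 3, owner: "u1" }] }
    ]
  },
  {
    all_service: [
      { price: 1, all_sub_item: [{ price: 100, owner: "u2" }] }
    ]
  }
];

console.log(sumPricesForOwner(jobs, "u1"));
```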
Alternately, modern MongoDB releases since 2.6 also support the $redact operator. This could be used in this case to "pre-filter" the array content before processing with $unwind:
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// Filter arrays for matches in document
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$ifNull": [ "$owner", given_user_id ] },
given_user_id
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
// De-normalize arrays
{ "$unwind": "$all_service" },
{ "$unwind": "$all_service.all_sub_item" },
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_service.all_sub_item.price" }
}}
],
function(err,results) {
}
)
That can "recursively" traverse the document and test for the condition, effectively removing any "un-matched" array elements before you even $unwind. This can speed things up a bit since items that do not match would not need to be "un-wound". However there is a "catch" in that if for some reason the "owner" did not exist on an array element at all, then the logic required here would count that as another "match". You can always $match again to be sure, but there is still a more efficient way to do this:
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// Filter arrays for matches in document
{ "$project": {
"all_items": {
"$setDifference": [
{ "$map": {
"input": "$all_service",
"as": "A",
"in": {
"$setDifference": [
{ "$map": {
"input": "$$A.all_sub_item",
"as": "B",
"in": {
"$cond": {
"if": { "$eq": [ "$$B.owner", given_user_id ] },
"then": "$$B",
"else": false
}
}
}},
false
]
}
}},
[[]]
]
}
}},
// De-normalize the "two" level array. "Double" $unwind
{ "$unwind": "$all_items" },
{ "$unwind": "$all_items" },
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_items.price" }
}}
],
function(err,results) {
}
)
That process cuts down the size of the items in both arrays "drastically" compared to $redact. The $map operator processes each element of an array to the given statement within "in". In this case, each "outer" array element is sent to another $map to process the "inner" elements.
A logical test is performed here with $cond whereby if the "condition" is met then the "inner" array element is returned, otherwise the false value is returned.
The $setDifference is used to filter down any false values that are returned. Or as in the "outer" case, any "blank" arrays resulting from all false values being filtered from the "inner" where there is no match there. This leaves just the matching items, encased in a "double" array, e.g:
[[{ "_id": 1, "price": 1, "owner": "b" },{..}],[{..},{..}]]
As "all" array elements have an _id by default with mongoose (and this is a good reason why you keep that) then every item is "distinct" and not affected by the "set" operator, apart from removing the un-matched values.
Process $unwind "twice" to convert these into plain objects in their own documents, suitable for aggregation.
So those are the things you need to know. As I stated earlier, be "aware" of how the data "de-normalizes" and what that implies towards your end totals.
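The nested $map / $setDifference filtering can be pictured with ordinary array methods. The sketch below is an analogy only: `filterItemsByOwner` is a hypothetical helper, "owner" is again assumed to be a plain comparable value, and Array.prototype.filter does not de-duplicate the way a set operator does.

```javascript
// Hypothetical helper imitating the $project stage on one in-memory document.
function filterItemsByOwner(job, ownerId) {
  return job.all_service
    // inner $map + $setDifference: keep only the sub-items owned by ownerId
    .map((service) => service.all_sub_item.filter((item) => item.owner === ownerId))
    // outer $setDifference against [[]]: drop services with nothing left
    .filter((items) => items.length > 0);
}

// Made-up sample document for illustration only.
const job = {
  all_service: [
    { price: 10, all_sub_item: [{ _id: 1, price: 5, owner: "u1" }, { _id: 2, price: 7, owner: "u2" }] },
    { price: 20, all_sub_item: [{ _id: 3, price: 9, owner: "u2" }] },
    { price: 30, all_sub_item: [{ _id: 4, price: 3, owner: "u1" }] }
  ]
};

// Result is a "double" array, one inner array per service that still has matches.
const allItems = filterItemsByOwner(job, "u1");

// Flattening plays the role of the two $unwind stages before summing.
const total = allItems.flat().reduce((sum, item) => sum + item.price, 0);

console.log(JSON.stringify(allItems), total);
```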
It sounds like you want to, in SQL equivalent, do "sum (prices) WHERE owner IS NOT NULL".
On that assumption, you'll want to do your $match first, to reduce the input set to your sum. So your first stage should be something like
{ $match: { "all_service.all_sub_item.owner": { $exists: true } } }
Think of this as then passing all matching documents to your second stage.
Now, because you are summing an array, you have to do another step. Aggregation operators work on documents - there isn't really a way to sum an array. So we want to expand your array so that each element in the array gets pulled out to represent the array field as a value, in its own document. Think of this as a cross join. This will be $unwind.
{ $unwind: "$all_service" },
{ $unwind: "$all_service.all_sub_item" }
Now you've just made a much larger number of documents, but in a form where we can sum them. Now we can perform the $group. In your $group, you specify a transformation. The line:
_id: {}, // not sure what to put here
is creating a field in the output document, which is not the same documents as the input documents. So you can make the _id here anything you'd like, but think of this as the equivalent to your "GROUP BY" in sql. The $sum operator will essentially be creating a sum for each group of documents you create here that match that _id - so essentially we'll be "re-collapsing" what you just did with $unwind, by using the $group. But this will allow $sum to work.
I think you're looking for grouping on just your main document id, so I think your $sum statement in your question is correct.
{ $group: { _id: "$_id", totalAmount: { $sum: "$all_service.all_sub_item.price" } } }
This will output documents with an _id field equivalent to your original document ID, and your sum.
I'll let you put it together, I'm not super familiar with node. You were close but I think moving your $match to the front and using an $unwind stage will get you where you need to be. Good luck!
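Putting the stages described above together might look like the following sketch. It uses the field name all_sub_item from the question's schema and unwinds both array levels. The second $match is an addition, included so that sub-items without an owner are left out of the sum, and `buildTotalPerJobPipeline` is a hypothetical helper that only builds the pipeline array without contacting a database:

```javascript
// Hypothetical helper: returns the pipeline as a plain array of stage objects.
function buildTotalPerJobPipeline() {
  const hasOwner = { "all_service.all_sub_item.owner": { $exists: true } };
  return [
    // Keep only documents that contain at least one owned sub-item
    { $match: hasOwner },
    // Expand both array levels into separate documents
    { $unwind: "$all_service" },
    { $unwind: "$all_service.all_sub_item" },
    // After unwinding, keep only the sub-items that actually have an owner
    { $match: hasOwner },
    // Re-collapse per original document and sum the sub-item prices
    { $group: { _id: "$_id", totalAmount: { $sum: "$all_service.all_sub_item.price" } } }
  ];
}

const pipeline = buildTotalPerJobPipeline();
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array could then be passed to an aggregate call such as Job.aggregate(pipeline), assuming Job is the Mongoose model from the question.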
I have a database with 800+ different bars, clubs and restaurants across Australia.
I want to build a list of links for my website counting the number of different venues across different suburbs and primary categories.
Like this:
Restaurants, Bowen Hills (15)
Restaurants, Dawes Point (6)
Clubs, Sydney (138)
I could do it the hard way by first getting all venues. Then run a Venue.distinct('details.location.suburb') to get all the unique suburbs.
From here I could run subsequent queries to get the count for the number of venues in that particular suburb and category.
It will be a lot of calls though. There's got to be a better way?
Can the Mongo aggregation framework help here?
It seems to be impossible to do this in a single query.
Here's the Venue model:
{
"name" : "Johnny's Bar & Grill",
"meta" : {
"category" : {
"all" : [
"restaurant",
"bar"
],
"primary" : "restaurant"
}
},
"details" : {
"location" : {
"streetNumber" : "180",
"streetName" : "abbotsford road",
"suburb" : "bowen hills",
"city" : "brisbane",
"postcode" : "4006",
"state" : "qld",
"country" : "australia"
},
"contact" : {
"phone" : [
"(07) 5555 5555"
]
}
}
}
Here's the prettified solution from BatScream that I ended up using:
Venue.aggregate([
{
$group: {
_id: {
primary: '$meta.category.primary',
suburb: '$details.location.suburb',
country: '$details.location.country',
state: '$details.location.state',
city: '$details.location.city'
},
count: {
$sum: 1
},
type: {
$first: '$meta.category.primary'
}
}
},
{
$sort: {
count: -1
}
},
{
$limit: 50
},
// Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document.
{
$project: {
_id: 0,
type : '$type',
location : '$_id.suburb',
count: 1
}
}
],
function(err, res){
next(err, res);
});
You can get a very useful and easily transformable output using the following aggregation operation.
Group the records based on their country, category, state, city and suburb.
Get the count of the records in each group.
Obtain the type of the group from the first record of the group.
Project the necessary fields.
Code:
db.collection.aggregate([
{$group:{"_id":{"primary":"$meta.category.primary",
"suburb":"$details.location.suburb",
"country":"$details.location.country",
"state":"$details.location.state",
"city":"$details.location.city"},
"count":{$sum:1},
"type":{$first:"$meta.category.primary"}}},
{$sort:{"count":-1}},
{$project:{"_id":0,
"type":"$type",
"location":"$_id.suburb",
"count":1}}
])
Sample output:
{ "count" : 1, "type" : "restaurant", "location" : "bowen hills" }
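For intuition, the grouping and counting can be reproduced on in-memory data with a Map. This sketch is a simplification: it keys only on the primary category and suburb (the pipeline above also includes country, state and city in the group key), and `countByCategoryAndSuburb` is a hypothetical helper name.

```javascript
// Hypothetical helper imitating $group + $sort + $project on an array of venues.
function countByCategoryAndSuburb(venues) {
  const counts = new Map();
  for (const venue of venues) {
    // A JSON string serves as a composite group key.
    const key = JSON.stringify([
      venue.meta.category.primary,
      venue.details.location.suburb
    ]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, count]) => {
      const [type, location] = JSON.parse(key);
      return { type, location, count };
    })
    .sort((a, b) => b.count - a.count); // highest count first
}

// Made-up sample data for illustration only.
const venues = [
  { meta: { category: { primary: "bar" } }, details: { location: { suburb: "sydney" } } },
  { meta: { category: { primary: "restaurant" } }, details: { location: { suburb: "bowen hills" } } },
  { meta: { category: { primary: "restaurant" } }, details: { location: { suburb: "bowen hills" } } }
];

const summary = countByCategoryAndSuburb(venues);
console.log(summary);
```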
I'm trying to aggregate data by date in Mongo, but I can't quite achieve what I want.
Right now, I'm using this:
db.aggregData.aggregate( { $group: {_id: "$Date".toString(),
tweets: { $sum: "$CrawledTweets"} } },
{ $match:{ _id: {$gte: ISODate("2013-03-19T12:31:00.247Z") }}},
{ $sort: {Date:-1} }
)
It results with this:
"result" : [
{
"_id" : ISODate("2013-03-19T12:50:00.641Z"),
"tweets" : 114
},
{
"_id" : ISODate("2013-03-19T12:45:00.631Z"),
"tweets" : 114
},
{
"_id" : ISODate("2013-03-19T12:55:00.640Z"),
"tweets" : 123
},
{
"_id" : ISODate("2013-03-19T12:40:00.628Z"),
"tweets" : 91
},
{
"_id" : ISODate("2013-03-19T12:31:00.253Z"),
"tweets" : 43
},
{
"_id" : ISODate("2013-03-19T13:20:00.652Z"),
"tweets" : 125
},
{
"_id" : ISODate("2013-03-19T12:31:00.252Z"),
"tweets" : 30
}
],
"ok" : 1
It seems to do the job, but with further inspection, we see that there is repetition:
ISODate("2013-03-19T12:31:00.253Z") and ISODate("2013-03-19T12:31:00.252Z").
The only thing that changes is the last bit before the Z.
So here is my question: what is this part, and how can I ignore it in the aggregation?
Thank you in advance.
EDIT: I want to aggregate by date, meaning the whole year/month/day plus hour and minute. I don't care about the rest.
EDIT: My db is on mongolab, so I'm on 2.2.
Well, I did it another way: I save all my dates with seconds/milliseconds at 0. So I can keep a simple aggregate, with only a little more code server side, thanks to moment.js.
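The truncation itself does not strictly need a library. Here is a minimal sketch using the built-in Date object instead of moment.js; `truncateToMinute` is a hypothetical helper name:

```javascript
// Hypothetical helper: returns a new Date with seconds and milliseconds set to 0.
function truncateToMinute(date) {
  const truncated = new Date(date.getTime()); // copy, so the input is not mutated
  truncated.setUTCSeconds(0, 0);              // seconds = 0, milliseconds = 0
  return truncated;
}

// The two timestamps from the question that only differ in milliseconds.
const first = truncateToMinute(new Date("2013-03-19T12:31:00.252Z"));
const second = truncateToMinute(new Date("2013-03-19T12:31:00.253Z"));

console.log(first.toISOString(), second.toISOString());
```

After truncation both values are identical, so a plain $group on the stored date puts them in the same bucket.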
You are trying to aggregate by "whole" date, in other words to drop the time from ISODate(), right? There are several ways to do it; I describe them in detail on my blog in the post called "Stupid Date Tricks with Aggregation Framework".
You can see the full step-by-step breakdown there, but to summarize you have two choices:
If you don't care whether the aggregated-on value is an ISODate(), you can use the {$year}, {$month} and {$dayOfMonth} operators in a {$project} phase to pull out just Y-M-D and then {$group} on that.
If you do care about the grouped-on value staying an ISODate, you can {$subtract} the time part in a {$project} phase and be left with an ISODate() type. The caveat is that this method requires MongoDB 2.4 (just released), which adds support for date arithmetic and for the $millisecond operator (see exact code in the blog post).
Here is probably what you want:
db.aggregData.aggregate([
{
$project:{
CrawledTweets: 1,
newDate: {
year:{$year:"$Date"},
month: {$month:"$Date"},
day: {$dayOfMonth:"$Date"},
hour: {$hour: "$Date"},
min: {$minute: "$Date"}
}
}
},
{
$group: {
_id: "$newDate",
tweets: { $sum: "$CrawledTweets"}
}
}
])
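To check what this grouping does with the two near-duplicate timestamps from the question, the same logic can be imitated in plain JavaScript. This is an illustrative sketch with hypothetical names; it uses the UTC getters because the date operators work in UTC by default:

```javascript
// Three rows taken from the question's output (date and tweet count).
const rows = [
  { Date: new Date("2013-03-19T12:31:00.252Z"), CrawledTweets: 30 },
  { Date: new Date("2013-03-19T12:31:00.253Z"), CrawledTweets: 43 },
  { Date: new Date("2013-03-19T12:40:00.628Z"), CrawledTweets: 91 }
];

// Imitates the $project stage: break the date into its parts down to the minute.
function minuteKey(d) {
  return {
    year: d.getUTCFullYear(),
    month: d.getUTCMonth() + 1, // $month is 1-based, getUTCMonth() is 0-based
    day: d.getUTCDate(),
    hour: d.getUTCHours(),
    min: d.getUTCMinutes()
  };
}

// Imitates the $group stage: sum CrawledTweets per minute key.
const totals = new Map();
for (const row of rows) {
  const key = JSON.stringify(minuteKey(row.Date));
  totals.set(key, (totals.get(key) || 0) + row.CrawledTweets);
}

const perMinute = [...totals].map(([key, tweets]) => ({ _id: JSON.parse(key), tweets }));
console.log(perMinute);
```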
Without being a Mongo expert and without knowing your db fields I'd come up with something like this. Perhaps you can build upon this:
db.aggregData.aggregate([
  {
    $project: {
      CrawledTweets: 1,
      // Assumes the timestamp lives in the "Date" field, as in the question
      groupedTime: {
        year: {$year: "$Date"},
        month: {$month: "$Date"},
        day: {$dayOfMonth: "$Date"},
        hour: {$hour: "$Date"},
        min: {$minute: "$Date"}
      }
    }
  },
  {
    $group: {
      // Group on the truncated time and sum the tweet counts
      _id: "$groupedTime",
      tweets: { $sum: "$CrawledTweets" }
    }
  }
])
You can now use the MongoDB date aggregation operators. I have a post on my blog that goes over the schema setup, using it in Node.js, etc.:
http://smyl.es/how-to-use-mongodb-date-aggregation-operators-in-node-js-with-mongoose-dayofmonth-dayofyear-dayofweek-etc/