Dynamic keys after $group by - node.js

I have the following collection
{
"_id" : ObjectId("5b18d14cbc83fd271b6a157c"),
"status" : "pending",
"description" : "You have to complete the challenge...",
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5773"),
"status" : "completed",
"description" : "completed...",
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5775"),
"status" : "pending",
"description" : "pending...",
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5776"),
"status" : "inProgress",
"description" : "inProgress...",
}
I need to group by status and dynamically get all the keys that appear in status
[
{
"completed": [
{
"_id": "5b18d31a27a37696ec8b5773",
"status": "completed",
"description": "completed..."
}
]
},
{
"pending": [
{
"_id": "5b18d14cbc83fd271b6a157c",
"status": "pending",
"description": "You have to complete the challenge..."
},
{
"_id": "5b18d31a27a37696ec8b5775",
"status": "pending",
"description": "pending..."
}
]
},
{
"inProgress": [
{
"_id": "5b18d31a27a37696ec8b5776",
"status": "inProgress",
"description": "inProgress..."
}
]
}
]

Not that I think it's a good idea, mostly because I don't see any "aggregation" here at all, but the approach is: after "grouping" the documents into an array under each "status" key, $push those groups again as "k"/"v" pairs and convert them into the keys of a single document using $arrayToObject within a $replaceRoot:
db.collection.aggregate([
{ "$group": {
"_id": "$status",
"data": { "$push": "$$ROOT" }
}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": "$_id",
"v": "$data"
}
}
}},
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$data" }
}}
])
Returns:
{
"inProgress" : [
{
"_id" : ObjectId("5b18d31a27a37696ec8b5776"),
"status" : "inProgress",
"description" : "inProgress..."
}
],
"completed" : [
{
"_id" : ObjectId("5b18d31a27a37696ec8b5773"),
"status" : "completed",
"description" : "completed..."
}
],
"pending" : [
{
"_id" : ObjectId("5b18d14cbc83fd271b6a157c"),
"status" : "pending",
"description" : "You have to complete the challenge..."
},
{
"_id" : ObjectId("5b18d31a27a37696ec8b5775"),
"status" : "pending",
"description" : "pending..."
}
]
}
That might be okay IF you actually "aggregated" beforehand, but on any practically sized collection all that is doing is trying to force the whole collection into a single document, which is likely to break the BSON limit of 16MB, so I just would not recommend even attempting this without "grouping" something else before this step.
Frankly, the following code does the same thing, without aggregation tricks and with no BSON limit problem:
var obj = {};
// Using forEach as a premise for representing "any" cursor iteration form
db.collection.find().forEach(d => {
if (!obj.hasOwnProperty(d.status))
obj[d.status] = [];
obj[d.status].push(d);
})
printjson(obj);
Or a bit shorter:
var obj = {};
// Using forEach as a premise for representing "any" cursor iteration form
db.collection.find().forEach(d =>
obj[d.status] = [
...(obj.hasOwnProperty(d.status)) ? obj[d.status] : [],
d
]
)
printjson(obj);
Aggregations are used for "data reduction" and anything that is simply "reshaping results" without actually reducing the data returned from the server is usually better handled in client code anyway. You're still returning all data no matter what you do, and the client processing of the cursor has considerably less overhead. And NO restrictions.
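The client-side grouping reads the same in any language; here is a minimal Python sketch over the sample documents, with a plain list standing in for whatever cursor iteration you use:

```python
from collections import defaultdict

# Sample documents standing in for a cursor over the collection.
docs = [
    {"_id": "5b18d14cbc83fd271b6a157c", "status": "pending",
     "description": "You have to complete the challenge..."},
    {"_id": "5b18d31a27a37696ec8b5773", "status": "completed",
     "description": "completed..."},
    {"_id": "5b18d31a27a37696ec8b5775", "status": "pending",
     "description": "pending..."},
    {"_id": "5b18d31a27a37696ec8b5776", "status": "inProgress",
     "description": "inProgress..."},
]

# Group documents into lists keyed by their "status" value, exactly
# like the forEach/push code above.
grouped = defaultdict(list)
for d in docs:
    grouped[d["status"]].append(d)

grouped = dict(grouped)
```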

Related

Pymongo query to extract value of a matching key

I have the following document:
{
"_id": "61f7d5cfd0c32b744d3f81c2",
"_form": "61e66b8fd0c32b744d3e24a0",
"_workflow": "61e54fe2d0c32b744d3e0b7c",
"_appUser": "61e6b098d0c32b744d3e3808",
"sectionResponse": [{
"_id": "61f7d5cfd0c32b744d3f81c3",
"name": "Project Details & Goals",
"order": 2,
"fieldResponse": [{
"_id": "61f7d5cfd0c32b744d3f81c4",
"fieldType": "Text",
"name": "Project Name",
"value": "TRT",
"order": 0
},
{
"_id": "61f7d5cfd0c32b744d3f81c5",
"fieldType": "Number",
"name": "Amount Requested",
"value": "20",
"order": 1
},
{
"_id": "61f7d5cfd0c32b744d3f81c6",
"fieldType": "Number",
"name": "Project Cost",
"value": "50",
"order": 1
},
{
"_id": "61f7d5cfd0c32b744d3f81c7",
"fieldType": "Comment",
"name": "Project Goals",
"value": "TRT",
"order": 3
}
]
},
{
"_id": "61f7d5cfd0c32b744d3f81c8",
"name": "Section Heading",
"order": 2,
"fieldResponse": [{
"_id": "61f7d5cfd0c32b744d3f81c9",
"fieldType": "Multiselectdropdown",
"name": "Multiselectdropdown",
"value": "Y",
"order": 0
},
{
"_id": "61f7d5cfd0c32b744d3f81ca",
"fieldType": "Image_Upload",
"name": "Image Upload",
"value": "Y",
"order": 1
}
]
}
],
"order": 2,
"status": "Reviewed",
"updatedAt": "2022-01-31T12:27:59.541Z",
"createdAt": "2022-01-31T12:27:59.541Z",
"__v": 0
}
Inside the document there is a sectionResponse, which contains the responses for multiple sections. Within each section there is a fieldResponse, which contains a name and a value. I have to extract the value from all the documents where the name is Amount Requested.
How can I write a query for such a situation?
Here is a solution that returns only matching material and requires no $unwind.
db.foo.aggregate([
// This stage alone is enough to give you the basic info.
// You will get not only doc _id but also an array of arrays
// (fieldResponse within sectionResponse) containing the whole
// fieldResponse doc. It is slightly awkward, but if you need structural data
// other than *just* the value, it is a good start:
{$project: {
// outer filter removes inner filter results where size is 0
// i.e. no "Amount Requested" found.
XX: {$filter: {input:
{$map: {
input: "$sectionResponse", as: "z1", in:
// inner filter gets only name = Amount Requested
{$filter: {input: "$$z1.fieldResponse",
as: "z1",
cond: {$eq:["$$z1.name","Amount Requested"]}
}}
}},
as: "z2",
cond: {$ne: ["$$z2", [] ]}
}}
}}
which yields (given a slightly expanded input set where subdocs were copied but the value and order changed for clarity):
{
"_id" : 0,
"XX" : [
[
{
"_id" : "61f7d5cfd0c32b744d3f81c5",
"fieldType" : "Number",
"name" : "Amount Requested",
"value" : "20",
"order" : 1
},
{
"_id" : "61f7d5cfd0c32b744d3f81c5",
"fieldType" : "Number",
"name" : "Amount Requested",
"value" : "77",
"order" : 18
}
],
[
{
"_id" : "61f7d5cfd0c32b744d3f81c5",
"fieldType" : "Number",
"name" : "Amount Requested",
"value" : "99",
"order" : 818
}
]
]
}
{
"_id" : 1,
"XX" : [
[
{
"_id" : "61f7d5cfd0c32b744d3f81c5",
"fieldType" : "Number",
"name" : "Amount Requested",
"value" : "333",
"order" : 1
}
]
]
}
{ "_id" : 2, "XX" : [ ] }
If you don't want to know about top level docs that contained
NO fieldResponses where name = "Amount Requested" then append this stage:
{$match: {XX: {$ne: [] }}}
Finally, if you really want just the values, append this reduce stage:
,{$addFields: {XX: {$reduce: {
input: "$XX",
initialValue: [],
in: {$concatArrays: ["$$value",
{$map: {input: "$$this",
as:"z",
in: "$$z.value"
}} ] }
}}
}}
which yields:
{ "_id" : 0, "XX" : [ "20", "77", "99" ] }
{ "_id" : 1, "XX" : [ "333" ] }
If you want a little more than just value (like order, for example), then have $map return a doc instead of a scalar, e.g.:
{$map: {input: "$$this",
as:"z",
in: {v:"$$z.value",o:"$$z.order"}
}} ] }
to yield:
{
"_id" : 0,
"XX" : [
{
"v" : "20",
"o" : 1
},
{
"v" : "77",
"o" : 18
},
{
"v" : "99",
"o" : 818
}
]
}
{ "_id" : 1, "XX" : [ { "v" : "333", "o" : 1 } ] }
Again, the input set provided by the OP was expanded with additional {name:"Amount Requested"} subdocs tossed into different sectionResponse arrays to generate a more complex structure.
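For comparison, the same double-filter logic can be sketched client-side in Python over a cut-down document (a hypothetical sample; field names from the question). The inner comprehension plays the role of the inner $filter, the outer condition drops empty groups like the outer $filter, and the final line is the $reduce/$map flattening step:

```python
doc = {
    "_id": 0,
    "sectionResponse": [
        {"fieldResponse": [
            {"name": "Project Name", "value": "TRT"},
            {"name": "Amount Requested", "value": "20"},
        ]},
        {"fieldResponse": [
            {"name": "Image Upload", "value": "Y"},
        ]},
    ],
}

# Outer condition keeps only sections with at least one match;
# the walrus expression is the inner name == "Amount Requested" filter.
xx = [
    matches
    for section in doc["sectionResponse"]
    if (matches := [f for f in section["fieldResponse"]
                    if f["name"] == "Amount Requested"])
]

# Flatten the array-of-arrays and keep only the values.
values = [f["value"] for group in xx for f in group]
```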
Maybe something like this, which you can easily adapt to Python, supposing you need only the value from sectionResponse.$[].fieldResponse.$[] elements having the name "Amount Requested":
db.collection.aggregate([
{
$match: {
"sectionResponse.fieldResponse.name": "Amount Requested"
}
},
{
"$project": {
"sectionResponse": {
"$filter": {
"input": {
"$map": {
"input": "$sectionResponse",
"as": "somesub",
"in": {
"_id": "$$somesub._id",
"fieldResponse": {
"$filter": {
"input": "$$somesub.fieldResponse",
"as": "sub",
"cond": {
"$eq": [
"$$sub.name",
"Amount Requested"
]
}
}
}
}
}
},
"as": "some",
"cond": {
"$gt": [
{
"$size": "$$some.fieldResponse"
},
0
]
}
}
}
}
},
{
$unwind: "$sectionResponse"
},
{
$unwind: "$sectionResponse.fieldResponse"
},
{
$project: {
value: "$sectionResponse.fieldResponse.value"
}
}
])
Match the documents containing at least one element with sectionResponse.fieldResponse.name: "Amount Requested"
Project/map all sectionResponse.fieldResponse elements containing name = "Amount Requested" (non-empty elements only)
Unwind the sectionResponse array
Unwind the fieldResponse array
Project only the value field.
playground
For best results, an index on "sectionResponse.fieldResponse.name" should be added.

How to define a default value when creating an index in Elasticsearch

I need to create an index in Elasticsearch, assigning a default value for a field. For example, in Python 3:
request_body = {
"settings":{
"number_of_shards":1,
"number_of_replicas":1
},
"mappings":{
"properties":{
"name":{
"type":"keyword"
},
"school":{
"type":"keyword"
},
"pass":{
"type":"keyword"
}
}
}
}
from elasticsearch import Elasticsearch
es = Elasticsearch(['https://....'])
es.indices.create(index="test-index", ignore=400, body= request_body)
In the above scenario the index will be created with those fields, but I need to give "pass" a default value of True. Can I do that here?
Elasticsearch is schema-less. It allows any number of fields, with any content in those fields, without any logical constraints.
In a distributed system integrity checking can be expensive, so RDBMS-style constraint checks are not available in Elasticsearch.
The best way is to do validations on the client side.
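A minimal sketch of that client-side approach in Python, merging defaults into each document before indexing (the with_defaults helper is my own, not an Elasticsearch API; you would pass the result to es.index()):

```python
def with_defaults(doc, defaults):
    """Return a copy of doc with missing fields filled from defaults."""
    out = dict(defaults)   # start from the defaults...
    out.update(doc)        # ...and let the document's own fields win
    return out

defaults = {"pass": "true"}
doc = {"name": "a", "school": "aa"}
indexed = with_defaults(doc, defaults)
```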
Another approach is to use ingest
Ingest pipelines let you perform common transformations on your data before indexing. For example, you can use pipelines to remove fields, extract values from text, and enrich your data.
**For testing**
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
"script": {
"lang": "painless",
"source": "if (ctx.pass ===null) { ctx.pass='true' }"
}
}
]
},
"docs": [
{
"_index": "index",
"_type": "type",
"_id": "2",
"_source": {
"name": "a",
"school":"aa"
}
}
]
}
PUT _ingest/pipeline/default-value_pipeline
{
"description": "Set default value",
"processors": [
{
"script": {
"lang": "painless",
"source": "if (ctx.pass ===null) { ctx.pass='true' }"
}
}
]
}
**Indexing document**
POST my-index-000001/_doc?pipeline=default-value_pipeline
{
"name":"sss",
"school":"sss"
}
**Result**
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "hlQDGXoB5tcHqHDtaEQb",
"_score" : 1.0,
"_source" : {
"school" : "sss",
"pass" : "true",
"name" : "sss"
}
},

$concat field with index in $map mongodb? [duplicate]

This question already has answers here:
Add some kind of row number to a mongodb aggregate command / pipeline
(3 answers)
Closed 4 years ago.
I have the following collection
{
"_id" : ObjectId("5b16405a8832711234bcfae7"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Bruce",
"lastName": "Wayne"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae8"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Clerk",
"lastName": "Kent"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae9"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Peter",
"lastName": "Parker"
}
I need to $project one more key, index, built with $concat from 'INV-00' plus the position of the document in the result
My output should be something like that
{
"_id" : ObjectId("5b16405a8832711234bcfae7"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Bruce",
"lastName": "Wayne",
"index": "INV-001"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae8"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Clerk",
"lastName": "Kent",
"index": "INV-002"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae9"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Peter",
"lastName": "Parker",
"index": "INV-003"
}
Also, can I change the createdAt format to this, Thu Jan 18 2018, using $dateToString or something else?
Thanks in advance!!!
While I would certainly recommend doing this on the client side rather than inside MongoDB, here is how you could get what you want - pretty brute-force, but working:
db.collection.aggregate([
// you should add a $sort stage here to make sure you get the right indexes
{
$group: {
_id: null, // group all documents into the same bucket
docs: { $push: "$$ROOT" } // just to create an array of all documents
}
}, {
$project: {
docs: { // transform the "docs" field
$map: { // into something
input: { $range: [ 0, { $size: "$docs" } ] }, // an array from 0 to n - 1 where n is the number of documents
as: "this", // which shall be accessible using "$$this"
in: {
$mergeObjects: [ // we join two documents
{ $arrayElemAt: [ "$docs", "$$this" ] }, // one is the nth document in our "docs" array
{ "index": { $concat: [ 'INV-00', { $substr: [ { $add: [ "$$this", 1 ] }, 0, -1 ] } ] } } // and the second document is the one with our "index" field
]
}
}
}
}
}, {
$unwind: "$docs" // flatten the result structure
}, {
$replaceRoot: {
newRoot: "$docs" // restore the original document structure
}
}])
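Since the client side really is the better place for this, here is a minimal Python sketch over hypothetical in-memory documents (field names from the question). It also shows a strftime format that produces the "Thu Jan 18 2018" style asked about for createdAt:

```python
from datetime import datetime

# Sample documents standing in for a sorted query result.
docs = [
    {"firstName": "Bruce", "lastName": "Wayne",
     "createdAt": datetime(2018, 6, 5, 7, 48, 45)},
    {"firstName": "Clerk", "lastName": "Kent",
     "createdAt": datetime(2018, 6, 5, 7, 48, 45)},
]

for i, d in enumerate(docs, start=1):
    # Matches the question's 'INV-00' prefix; like the pipeline above,
    # there is no extra zero-padding logic past index 9.
    d["index"] = "INV-00%d" % i
    # "%a %b %d %Y" yields the "Thu Jan 18 2018" style of date string.
    d["createdAtFormatted"] = d["createdAt"].strftime("%a %b %d %Y")
```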

Converting a MongoDB aggregate into an ArangoDB COLLECT

I'm migrating data from Mongo to Arango and I need to reproduce a $group aggregation. I have successfully reproduced the results but I'm concerned that my approach maybe sub-optimal. Can the AQL be improved?
I have a collection of data that looks like this:
{
"_id" : ObjectId("5b17f9d85b2c1998598f054e"),
"department" : [
"Sales",
"Marketing"
],
"region" : [
"US",
"UK"
]
}
{
"_id" : ObjectId("5b1808145b2c1998598f054f"),
"department" : [
"Sales",
"Marketing"
],
"region" : [
"US",
"UK"
]
}
{
"_id" : ObjectId("5b18083c5b2c1998598f0550"),
"department" : "Development",
"region" : "Europe"
}
{
"_id" : ObjectId("5b1809a75b2c1998598f0551"),
"department" : "Sales"
}
Note that the value can be a string, an array, or not present at all
In Mongo I'm using the following code to aggregate the data:
db.test.aggregate([
{
$unwind:{
path:"$department",
preserveNullAndEmptyArrays: true
}
},
{
$unwind:{
path:"$region",
preserveNullAndEmptyArrays: true
}
},
{
$group:{
_id:{
department:{ $ifNull: [ "$department", "null" ] },
region:{ $ifNull: [ "$region", "null" ] },
},
count:{$sum:1}
}
}
])
In Arango I'm using the following AQL:
FOR i IN test
LET FIELD1=(FOR a IN APPEND([],NOT_NULL(i.department,"null")) RETURN a)
LET FIELD2=(FOR a IN APPEND([],NOT_NULL(i.region,"null")) RETURN a)
FOR f1 IN FIELD1
FOR f2 IN FIELD2
COLLECT id={department:f1,region:f2} WITH COUNT INTO counter
RETURN {_id:id,count:counter}
Edit:
The APPEND is used to convert string values into an Array
Both produce results that look like this;
{
"_id" : {
"department" : "Marketing",
"region" : "US"
},
"count" : 2.0
}
{
"_id" : {
"department" : "Development",
"region" : "Europe"
},
"count" : 1.0
}
{
"_id" : {
"department" : "Sales",
"region" : "null"
},
"count" : 1.0
}
{
"_id" : {
"department" : "Marketing",
"region" : "UK"
},
"count" : 2.0
}
{
"_id" : {
"department" : "Sales",
"region" : "UK"
},
"count" : 2.0
}
{
"_id" : {
"department" : "Sales",
"region" : "US"
},
"count" : 2.0
}
Your approach seems alright. I would suggest to use TO_ARRAY() instead of APPEND() to make it easier to understand though.
Both functions skip null values, thus it is unavoidable to provide some placeholder, or test for null explicitly and return an array with a null value (or whatever works best for you):
FOR doc IN test
FOR field1 IN doc.department == null ? [ null ] : TO_ARRAY(doc.department)
FOR field2 IN doc.region == null ? [ null ] : TO_ARRAY(doc.region)
COLLECT department = field1, region = field2
WITH COUNT INTO count
RETURN { _id: { department, region }, count }
Collection test:
[
{
"_key": "5b17f9d85b2c1998598f054e",
"department": [
"Sales",
"Marketing"
],
"region": [
"US",
"UK"
]
},
{
"_key": "5b18083c5b2c1998598f0550",
"department": "Development",
"region": "Europe"
},
{
"_key": "5b1808145b2c1998598f054f",
"department": [
"Sales",
"Marketing"
],
"region": [
"US",
"UK"
]
},
{
"_key": "5b1809a75b2c1998598f0551",
"department": "Sales"
}
]
Result:
[
{
"_id": {
"department": "Development",
"region": "Europe"
},
"count": 1
},
{
"_id": {
"department": "Marketing",
"region": "UK"
},
"count": 2
},
{
"_id": {
"department": "Marketing",
"region": "US"
},
"count": 2
},
{
"_id": {
"department": "Sales",
"region": null
},
"count": 1
},
{
"_id": {
"department": "Sales",
"region": "UK"
},
"count": 2
},
{
"_id": {
"department": "Sales",
"region": "US"
},
"count": 2
}
]
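As a cross-check of the grouping semantics, here is a small Python simulation (my own sketch, not part of either database) that reproduces the same counts over the sample documents, including the handling of scalars and missing fields:

```python
from collections import Counter
from itertools import product

docs = [
    {"department": ["Sales", "Marketing"], "region": ["US", "UK"]},
    {"department": ["Sales", "Marketing"], "region": ["US", "UK"]},
    {"department": "Development", "region": "Europe"},
    {"department": "Sales"},
]

def to_list(value):
    """Mimic TO_ARRAY()/$unwind: wrap scalars, keep lists, map missing to [None]."""
    if value is None:
        return [None]
    return value if isinstance(value, list) else [value]

# Cartesian product of the two (normalized) arrays per document,
# counted across the whole collection - the COLLECT/$group step.
counts = Counter(
    (dept, region)
    for d in docs
    for dept, region in product(to_list(d.get("department")),
                                to_list(d.get("region")))
)
```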

paginating search results in mongoDB

I am trying to paginate my search results in MongoDB, shown below:
{
"data": [
{
"_id": "538037b869a1ca1c1ffc96e3",
"jobs": "america movie"
},
{
"_id": "538037a169a1ca1c1ffc96e0",
"jobs": "superman movie"
},
{
"_id": "538037a769a1ca1c1ffc96e1",
"jobs": "spider man movie"
},
{
"_id": "538037af69a1ca1c1ffc96e2",
"jobs": "iron man movie"
},
{
"_id": "538037c569a1ca1c1ffc96e4",
"jobs": "social network movie"
}
],
"Total_results": 5,
"author": "Solomon David"
}
which has been indexed and sorted by textScore, so I implemented pagination like this:
app.get('/search/:q/limit/:lim/skip/:skip',function(req,res){
var l = parseInt(req.params.lim);
var s = parseInt(req.params.skip);
db.jobs.aggregate({$match:{$text:{$search:req.params.q}}},
{$sort:{score:{$meta:"textScore"}}},
{$skip:s},
{$limit:l},
function(err,docs){
res.send({data:docs,Total_results:docs.length,author:"Solomon David"});
});
});
but when I tried localhost:3000/search/movie/limit/1/skip/0
I limited my results to 1 and skipped none, so I expected results like this:
{
"data": [
{
"_id": "538037b869a1ca1c1ffc96e3",
"jobs": "america movie"
}
]}
but I am getting this:
{
"data": [
{
"_id": "538037a169a1ca1c1ffc96e0",
"jobs": "superman movie"
}
],
"Total_results": 1,
"author": "Solomon David"
}
Please help me figure out what I am doing wrong.
There seem to be a few things to explain here, so I'll try to step through them in turn. The first thing to address is the document structure you are presenting. Arrays are not going to produce the results you want, so here is a basic collection structure, calling it "movies" for now:
{
"_id" : "538037b869a1ca1c1ffc96e3",
"jobs" : "america movie",
"author" : "Solomon David"
}
{
"_id" : "538037a169a1ca1c1ffc96e0",
"jobs" : "superman movie",
"author" : "Solomon David"
}
{
"_id" : "538037a769a1ca1c1ffc96e1",
"jobs" : "spider man movie",
"author" : "Solomon David"
}
{
"_id" : "538037af69a1ca1c1ffc96e2",
"jobs" : "iron man movie",
"author" : "Solomon David"
}
{
"_id" : "538037c569a1ca1c1ffc96e4",
"jobs" : "social network movie",
"author" : "Solomon David"
}
So there are all of the items in separate documents, each with its own details and an "author" key as well. Let us now consider the basic text search statement, still using aggregation:
db.movies.aggregate([
{ "$match": {
"$text": {
"$search": "movie"
}
}},
{ "$sort": { "score": { "$meta": "textScore" } } }
])
That will search the created "text" index for the term provided and return the results ranked by "textScore" from that query. The form used here is shorthand for these stages which you might use to actually see the "score" values:
{ "$project": {
"jobs": 1,
"author": 1,
"score": { "$meta": "textScore" }
}},
{ "$sort": { "score": 1 }}
But the results produced on the sample will be this:
{
"_id" : "538037a169a1ca1c1ffc96e0",
"jobs" : "superman movie",
"author" : "Solomon David"
}
{
"_id" : "538037b869a1ca1c1ffc96e3",
"jobs" : "america movie",
"author" : "Solomon David"
}
{
"_id" : "538037c569a1ca1c1ffc96e4",
"jobs" : "social network movie",
"author" : "Solomon David"
}
{
"_id" : "538037af69a1ca1c1ffc96e2",
"jobs" : "iron man movie",
"author" : "Solomon David"
}
{
"_id" : "538037a769a1ca1c1ffc96e1",
"jobs" : "spider man movie",
"author" : "Solomon David"
}
Actually, everything there has the same "textScore", but this is the order in which MongoDB will return them. Unless you provide some other weighting or an additional sort field, that order does not change.
That essentially covers the first part of what is meant to happen with text searches. A text search cannot modify the order or filter the contents of an array contained inside a document so this is why the documents are separated.
Paging these results is a simple process, even if $skip and $limit are not the most efficient ways to go about it; generally, though, you won't have much other option when using a "text search".
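The skip/limit arithmetic itself is independent of the database; a minimal Python illustration (the page helper and sample list are my own, purely for the slicing math):

```python
def page(results, page_num, page_size):
    """Translate a 1-based page number into skip/limit slicing."""
    skip = (page_num - 1) * page_size          # what $skip would be
    return results[skip:skip + page_size]      # what $limit would return

results = ["a", "b", "c", "d", "e"]
```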
What you seem to be trying to achieve though is producing some "statistics" about your search within your result somehow. At any rate, storing documents with items within arrays is not the way to go about this. So the first thing to look at is a combined aggregation example:
db.movies.aggregate([
{ "$match": {
"$text": {
"$search": "movie"
}
}},
{ "$sort": { "score": { "$meta": "textScore" } } },
{ "$group": {
"_id": null,
"data": {
"$push": {
"_id": "$_id",
"jobs": "$jobs",
"author": "$author"
}
},
"Total_Results": { "$sum": 1 },
"author": {
"$push": "$author"
}
}},
{ "$unwind": "$author" },
{ "$group": {
"_id": "$author",
"data": { "$first": "$data" },
"Total_Results": { "$first": "$Total_Results" },
"authorCount": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"data": { "$first": "$data" },
"Total_Results": { "$first": "$Total_Results" },
"Author_Info": {
"$push": {
"author": "$_id",
"count": "$authorCount"
}
}
}},
{ "$unwind": "$data" },
{ "$skip": 0 },
{ "$limit": 2 },
{ "$group": {
"_id": null,
"data": { "$push": "$data" },
"Total_Results": { "$first": "$Total_Results" },
"Author_Info": { "$first": "$Author_Info" }
}}
])
What you see here in many stages is that you are getting some "statistics" about your total search results in "Total_Results" and "Author_Info" as well as using $skip and $limit to select a "page" of two entries to return:
{
"_id" : null,
"data" : [
{
"_id" : "538037a169a1ca1c1ffc96e0",
"jobs" : "superman movie",
"author" : "Solomon David"
},
{
"_id" : "538037b869a1ca1c1ffc96e3",
"jobs" : "america movie",
"author" : "Solomon David"
}
],
"Total_Results" : 5,
"Author_Info" : [
{
"author" : "Solomon David",
"count" : 5
}
]
}
The problem here is that, as you can see, this will become very impractical when you have a large result set. The key point is that in order to get these "statistics", you need to use $group to $push all of the results into an array on a single document. That might be fine for a few hundred results or so, but for thousands there would be a significant performance drop, not to mention the memory usage and the very real possibility of breaking the 16MB BSON limit for an individual document.
So doing everything in aggregation is not the most practical solution, and if you really need the "statistics" then your best option is to separate this into two queries. So, first, the aggregate for the "statistics":
db.movies.aggregate([
{ "$match": {
"$text": {
"$search": "movie"
}
}},
{ "$group": {
"_id": "$author",
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"Total_Results": { "$sum": "$count" },
"Author_Info": {
"$push": {
"author": "$_id",
"count": "$count"
}
}
}}
])
That is basically the same thing except this time we are not storing "data" with the actual search results and not worrying about paging as this is a single record of results just providing the statistics. It very quickly gets down to a single record and more or less stays there, so this is a solution that scales.
It should also be apparent that you would not need to do this for every "page" and only need to run this with the initial query. The "statistics" can be easily cached so you can just retrieve that data with each "page" request.
All that is left to do now is run the query for each page of results desired, without the "statistics", and this can be done simply using the .find() form:
db.movies.find(
{ "$text": { "$search": "movie" } },
{ "score": { "$meta": "textScore" } }
).sort({ "score": { "$meta": "textScore" } }).skip(0).limit(2)
The short lesson here is that if you want "statistics" from your search, do that in a separate step from the actual paging of results. That is pretty common practice for general database paging, even for a "statistic" as simple as "Total Results".
Beyond that, another option is to look at full-text search solutions external to MongoDB. These are more feature-laden than the "toe in the water" implementation that MongoDB offers out of the box, and will also likely offer better-performing solutions for "paging" large result sets than $skip and $limit can provide.

Resources