paginating search results in mongoDB - node.js

I am trying to paginate my search results in MongoDB. The full result set looks like this:
{
"data": [
{
"_id": "538037b869a1ca1c1ffc96e3",
"jobs": "america movie"
},
{
"_id": "538037a169a1ca1c1ffc96e0",
"jobs": "superman movie"
},
{
"_id": "538037a769a1ca1c1ffc96e1",
"jobs": "spider man movie"
},
{
"_id": "538037af69a1ca1c1ffc96e2",
"jobs": "iron man movie"
},
{
"_id": "538037c569a1ca1c1ffc96e4",
"jobs": "social network movie"
}
],
"Total_results": 5,
"author": "Solomon David"
}
The collection has a text index and the results are sorted by textScore, so I implemented pagination like this:
app.get('/search/:q/limit/:lim/skip/:skip', function (req, res) {
  var l = parseInt(req.params.lim);
  var s = parseInt(req.params.skip);
  db.jobs.aggregate(
    { $match: { $text: { $search: req.params.q } } },
    { $sort: { score: { $meta: "textScore" } } },
    { $skip: s },
    { $limit: l },
    function (err, docs) {
      res.send({ data: docs, Total_results: docs.length, author: "Solomon David" });
    });
});
But when I request localhost:3000/search/movie/limit/1/skip/0, limiting the result to 1 and skipping none, I expect to get this:
{
"data": [
{
"_id": "538037b869a1ca1c1ffc96e3",
"jobs": "america movie"
}
]}
but I am getting this instead:
{
"data": [
{
"_id": "538037a169a1ca1c1ffc96e0",
"jobs": "superman movie"
}
],
"Total_results": 1,
"author": "Solomon David"
}
Please help; what am I doing wrong?

There seem to be a few things to explain here so I'll try and step through them in turn. But the first thing to address is the document structure you are presenting. Arrays are not going to produce the results you want, so here is a basic collection structure, calling it "movies" for now:
{
"_id" : "538037b869a1ca1c1ffc96e3",
"jobs" : "america movie",
"author" : "Solomon David"
}
{
"_id" : "538037a169a1ca1c1ffc96e0",
"jobs" : "superman movie",
"author" : "Solomon David"
}
{
"_id" : "538037a769a1ca1c1ffc96e1",
"jobs" : "spider man movie",
"author" : "Solomon David"
}
{
"_id" : "538037af69a1ca1c1ffc96e2",
"jobs" : "iron man movie",
"author" : "Solomon David"
}
{
"_id" : "538037c569a1ca1c1ffc96e4",
"jobs" : "social network movie",
"author" : "Solomon David"
}
So here all of the items are in separate documents, each with its own details and "author" key as well. Let us now consider the basic text search statement, still using aggregation:
db.movies.aggregate([
{ "$match": {
"$text": {
"$search": "movie"
}
}},
{ "$sort": { "score": { "$meta": "textScore" } } }
])
That will search the created "text" index for the term provided and return the results ranked by "textScore" from that query. The form used here is shorthand for these stages which you might use to actually see the "score" values:
{ "$project": {
"jobs": 1,
"author": 1,
"score": { "$meta": "textScore" }
}},
{ "$sort": { "score": -1 }}
But the results produced on the sample will be this:
{
"_id" : "538037a169a1ca1c1ffc96e0",
"jobs" : "superman movie",
"author" : "Solomon David"
}
{
"_id" : "538037b869a1ca1c1ffc96e3",
"jobs" : "america movie",
"author" : "Solomon David"
}
{
"_id" : "538037c569a1ca1c1ffc96e4",
"jobs" : "social network movie",
"author" : "Solomon David"
}
{
"_id" : "538037af69a1ca1c1ffc96e2",
"jobs" : "iron man movie",
"author" : "Solomon David"
}
{
"_id" : "538037a769a1ca1c1ffc96e1",
"jobs" : "spider man movie",
"author" : "Solomon David"
}
Actually everything there has the same "textScore" but this is the order in which MongoDB will return them. Unless you are providing some other weighting or additional sort field then that order does not change.
That essentially covers the first part of what is meant to happen with text searches. A text search cannot modify the order or filter the contents of an array contained inside a document so this is why the documents are separated.
Paging these results is a simple process, even if $skip and $limit are not the most efficient ways to go about it, but generally you won't have much other option when using a "text search".
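As a rough sketch of how those paging stages could be generated from a page number, the helper below turns a 1-based page and a page size into `$skip` and `$limit` stages. The name `buildSearchPage` is hypothetical and not part of any MongoDB API. The `_id` tie-breaker in the sort is also an assumption added for illustration, so that documents with equal scores keep a stable order between pages; the pipeline above sorts on the score alone.

```javascript
// Hypothetical helper: builds the aggregation stages for one page of results.
function buildSearchPage(term, page, pageSize) {
  // Fall back to 1 when the inputs are missing, zero, or not numeric.
  const size = Math.max(1, Math.trunc(pageSize) || 1);
  const skip = (Math.max(1, Math.trunc(page) || 1) - 1) * size;
  return [
    { $match: { $text: { $search: term } } },
    // Rank by text score; _id is an assumed tie-breaker for a stable order.
    { $sort: { score: { $meta: "textScore" }, _id: 1 } },
    { $skip: skip },
    { $limit: size }
  ];
}

// Third page with two results per page: skips 4 documents, returns 2.
const stages = buildSearchPage("movie", 3, 2);
console.log(JSON.stringify(stages));
```

The returned array could then be passed to something like `db.movies.aggregate(...)`, assuming the "movies" collection used in this answer.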
What you seem to be trying to achieve though is producing some "statistics" about your search within your result somehow. At any rate, storing documents with items within arrays is not the way to go about this. So the first thing to look at is a combined aggregation example:
db.movies.aggregate([
{ "$match": {
"$text": {
"$search": "movie"
}
}},
{ "$sort": { "score": { "$meta": "textScore" } } },
{ "$group": {
"_id": null,
"data": {
"$push": {
"_id": "$_id",
"jobs": "$jobs",
"author": "$author"
}
},
"Total_Results": { "$sum": 1 },
"author": {
"$push": "$author"
}
}},
{ "$unwind": "$author" },
{ "$group": {
"_id": "$author",
"data": { "$first": "$data" },
"Total_Results": { "$first": "$Total_Results" },
"authorCount": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"data": { "$first": "$data" },
"Total_Results": { "$first": "$Total_Results" },
"Author_Info": {
"$push": {
"author": "$_id",
"count": "$authorCount"
}
}
}},
{ "$unwind": "$data" },
{ "$skip": 0 },
{ "$limit": 2 },
{ "$group": {
"_id": null,
"data": { "$push": "$data" },
"Total_Results": { "$first": "$Total_Results" },
"Author_Info": { "$first": "$Author_Info" }
}}
])
What you see here in many stages is that you are getting some "statistics" about your total search results in "Total_Results" and "Author_Info" as well as using $skip and $limit to select a "page" of two entries to return:
{
"_id" : null,
"data" : [
{
"_id" : "538037a169a1ca1c1ffc96e0",
"jobs" : "superman movie",
"author" : "Solomon David"
},
{
"_id" : "538037b869a1ca1c1ffc96e3",
"jobs" : "america movie",
"author" : "Solomon David"
}
],
"Total_Results" : 5,
"Author_Info" : [
{
"author" : "Solomon David",
"count" : 5
}
]
}
The problem is that this becomes very impractical when you have a large set of results. In order to get these "statistics", you need to use $group to $push all of the results into an array within a single document. That might be fine for a few hundred results, but for thousands there would be a significant performance drop, not to mention memory usage and the very real possibility of breaking the 16MB BSON limit for an individual document.
So doing everything in aggregation is not the most practical solution, and if you really need the "statistics" then your best option is to separate this into two queries. So first the aggregate for "statistics":
db.movies.aggregate([
{ "$match": {
"$text": {
"$search": "movie"
}
}},
{ "$group": {
"_id": "$author",
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"Total_Results": { "$sum": "$count" },
"Author_Info": {
"$push": {
"author": "$_id",
"count": "$count"
}
}
}}
])
That is basically the same thing except this time we are not storing "data" with the actual search results and not worrying about paging as this is a single record of results just providing the statistics. It very quickly gets down to a single record and more or less stays there, so this is a solution that scales.
It should also be apparent that you would not need to do this for every "page" and only need to run this with the initial query. The "statistics" can be easily cached so you can just retrieve that data with each "page" request.
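A minimal sketch of that caching idea, assuming an in-memory `Map` keyed by the search term is good enough (no expiry, single process). The names `statsCache`, `getStats` and `computeStats` are hypothetical; `computeStats` stands for whatever function runs the "statistics" aggregation and returns a promise.

```javascript
// In-memory cache of search "statistics", keyed by the search term.
const statsCache = new Map();

// computeStats is a caller-supplied function that returns a promise (for
// example, one that runs the statistics aggregation). It is only invoked
// on a cache miss; later calls for the same term reuse the stored promise.
function getStats(term, computeStats) {
  if (!statsCache.has(term)) {
    const pending = Promise.resolve(computeStats(term));
    // Forget failed lookups so that a later request can retry.
    pending.catch(() => statsCache.delete(term));
    statsCache.set(term, pending);
  }
  return statsCache.get(term);
}
```

Each "page" request would then await `getStats(term, computeStats)` and run only the paged query against the database.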
All that is left to do now is run the query for each page of results desired, without the "statistics", and this can be done simply using the .find() form:
db.movies.find(
{ "$text": { "$search": "movie" } },
{ "score": { "$meta": "textScore" } }
).sort({ "score": { "$meta": "textScore" } }).skip(0).limit(2)
The short lesson here is that if you want "statistics" from your search, do that in a separate step from the actual paging of results. That is pretty common practice for general database paging, even for a "statistic" as simple as "Total Results".
Beyond that, other options are to look at full text search solutions external to MongoDB. These are more feature laden than the "toe in the water" implementation that MongoDB offers out of the box and will also likely offer better performance for "paging" large sets of results than $skip and $limit can offer.

Is there any way to get date from ObjectId from mongoose using aggregate?

I have a users collection in which each document holds an array of device objects.
[{
"_id" : ObjectId("5c66a979e109fe0f537c7e37"),
"devices": [{
"dev_token" : "XXXX",
"_id" : ObjectId("5ccc0fa5f7778412173d22bf")
}]
},{
"_id" : ObjectId("5c66b6382b18fc4ff0276dcc"),
"devices": [{
"dev_token" : "XXXX",
"_id" : ObjectId("5c93316cc33c622bdcfaa4be")
}]
}]
I need to query the documents, adding a new date field to each element of devices, like this:
"devices": [{
"dev_token" : "XXXX",
"_id" : ObjectId("5c93316cc33c622bdcfaa4be"),
"date": ISODate("2012-10-15T21:26:17Z")
}]
The date value should come from devices._id.getTimestamp().
I tried the aggregation below, but I don't know how to use getTimestamp() in it:
db.getCollection('users').aggregate([ {
"$unwind": "$devices"
}, {
"$group": {
"_id": "$_id",
"devices": {
"$push": "$devices._id.getTimestamp()"
}
}
}])
I used $devices._id.getTimestamp(), which is probably the error. How should I handle this? Thanks in advance.

You can use $toDate to get the timestamp from an ObjectId field such as devices._id.
Add the date field to each devices element after the $unwind stage, using $addFields.
Try this:
db.getCollection('users').aggregate([ {
"$unwind": "$devices"
},{
$addFields : {
"devices.date": { $toDate: "$devices._id" }
}
}, {
"$group": {
"_id": "$_id",
"devices": {
"$push": "$devices"
}
}
}])
You can check the result at Mongo Playground (just press "run")
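For reference, the timestamp that $toDate reads from an ObjectId is stored in its first 4 bytes as seconds since the Unix epoch. The sketch below decodes it from the 24-character hex string in plain JavaScript, which can help when checking results outside the database. The function name `objectIdToDate` is hypothetical; with a driver, the ObjectId's own getTimestamp() method does the same job.

```javascript
// Reads the creation time from an ObjectId's 24-character hex string.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new TypeError("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// One of the devices._id values from the question.
console.log(objectIdToDate("5ccc0fa5f7778412173d22bf").toISOString());
// → 2019-05-03T09:53:41.000Z
```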

Using MongoDB 3.6:
The $dateFromParts operator comes in handy here, used in conjunction with the other date operators. You won't need to $unwind the array, as you can use $map to map over the devices array and add the extra date field with the above expression.
An example pipeline follows:
db.getCollection('users').aggregate([
{ "$addFields": {
"devices": {
"$map": {
"input": "$devices",
"in": {
"dev_token": "$$this.dev_token",
"_id": "$$this._id",
"date": {
"$dateFromParts": {
'year': { "$year": "$$this._id"},
'month': { "$month": "$$this._id"},
'day':{ "$dayOfMonth": "$$this._id"},
'hour': { "$hour": "$$this._id"},
'minute': { "$minute": "$$this._id"},
'second': { "$second": "$$this._id"},
'millisecond': { "$millisecond": "$$this._id"}
}
}
}
}
}
} }
])
Output
/* 1 */
{
"_id" : ObjectId("5c66a979e109fe0f537c7e37"),
"devices" : [
{
"dev_token" : "XXXX",
"_id" : ObjectId("5ccc0fa5f7778412173d22bf"),
"date" : ISODate("2019-05-03T09:53:41.000Z")
}
]
}
/* 2 */
{
"_id" : ObjectId("5c66b6382b18fc4ff0276dcc"),
"devices" : [
{
"dev_token" : "XXXX",
"_id" : ObjectId("5c93316cc33c622bdcfaa4be"),
"date" : ISODate("2019-03-21T06:38:36.000Z")
}
]
}
Using MongoDB 4.0 and newer:
The pipeline can be tweaked slightly to use the new $toDate or $convert operators. Their respective uses follow:
$toDate
db.getCollection('users').aggregate([
{ "$addFields": {
"devices": {
"$map": {
"input": "$devices",
"in": {
"dev_token": "$$this.dev_token",
"_id": "$$this._id",
"date": { "$toDate": "$$this._id" }
}
}
}
} }
])
$convert
db.getCollection('users').aggregate([
{ "$addFields": {
"devices": {
"$map": {
"input": "$devices",
"in": {
"dev_token": "$$this.dev_token",
"_id": "$$this._id",
"date": {
"$convert": { "input": "$$this._id", "to": "date" }
}
}
}
}
} }
])

$concat field with index in $map mongodb? [duplicate]

I have the following collection:
{
"_id" : ObjectId("5b16405a8832711234bcfae7"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Bruce",
"lastName": "Wayne"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae8"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Clerk",
"lastName": "Kent"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae9"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Peter",
"lastName": "Parker"
}
I need to $project one more key, index, built with $concat from 'INV-00' plus the position of the document in the result set.
My output should look something like this:
{
"_id" : ObjectId("5b16405a8832711234bcfae7"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Bruce",
"lastName": "Wayne",
"index": "INV-001"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae8"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Clerk",
"lastName": "Kent",
"index": "INV-002"
},
{
"_id" : ObjectId("5b16405a8832711234bcfae9"),
"createdAt" : ISODate("2018-06-05T07:48:45.248Z"),
"firstName": "Peter",
"lastName": "Parker",
"index": "INV-003"
}
Also, can I change the createdAt format to something like Thu Jan 18 2018 using $dateToString or something else?
Thanks in advance!

While I would certainly recommend you to do that on the client side as opposed to inside MongoDB, here is how you could get what you want - pretty brute-force but working:
db.collection.aggregate([
// you should add a $sort stage here to make sure you get the right indexes
{
$group: {
_id: null, // group all documents into the same bucket
docs: { $push: "$$ROOT" } // just to create an array of all documents
}
}, {
$project: {
docs: { // transform the "docs" field
$map: { // into something
input: { $range: [ 0, { $size: "$docs" } ] }, // an array from 0 to n - 1 where n is the number of documents
as: "this", // which shall be accessible using "$$this"
in: {
$mergeObjects: [ // we join two documents
{ $arrayElemAt: [ "$docs", "$$this" ] }, // one is the nth document in our "docs" array
{ "index": { $concat: [ 'INV-00', { $substr: [ { $add: [ "$$this", 1 ] }, 0, -1 ] } ] } } // and the second document is the one with our "index" field
]
}
}
}
}
}, {
$unwind: "$docs" // flatten the result structure
}, {
$replaceRoot: {
newRoot: "$docs" // restore the original document structure
}
}])
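As for doing this on the client side, a minimal sketch in plain JavaScript might look like the following, assuming the documents have already been fetched into an array in the desired order. The helper names `formatDay` and `addInvoiceIndex` are hypothetical. Note one deliberate difference from the pipeline above: the number is zero-padded to three digits, so the tenth document gets "INV-010" rather than "INV-0010". The date is formatted from UTC fields so the result does not depend on the machine's time zone.

```javascript
const DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Formats a Date as e.g. "Thu Jan 18 2018", using UTC fields.
function formatDay(date) {
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `${DAYS[date.getUTCDay()]} ${MONTHS[date.getUTCMonth()]} ${day} ${date.getUTCFullYear()}`;
}

// Adds a 1-based, zero-padded "index" to each document, in array order,
// and replaces createdAt with its formatted string.
function addInvoiceIndex(docs) {
  return docs.map((doc, i) => ({
    ...doc,
    index: "INV-" + String(i + 1).padStart(3, "0"),
    createdAt: formatDay(doc.createdAt)
  }));
}
```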

Dynamic keys after $group by

I have the following collection:
{
"_id" : ObjectId("5b18d14cbc83fd271b6a157c"),
"status" : "pending",
"description" : "You have to complete the challenge...",
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5773"),
"status" : "completed",
"description" : "completed...",
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5775"),
"status" : "pending",
"description" : "pending...",
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5776"),
"status" : "inProgress",
"description" : "inProgress...",
}
I need to group by status and have each distinct status value become a key dynamically:
[
{
"completed": [
{
"_id": "5b18d31a27a37696ec8b5773",
"status": "completed",
"description": "completed..."
}
]
},
{
"pending": [
{
"_id": "5b18d14cbc83fd271b6a157c",
"status": "pending",
"description": "You have to complete the challenge..."
},
{
"_id": "5b18d31a27a37696ec8b5775",
"status": "pending",
"description": "pending..."
}
]
},
{
"inProgress": [
{
"_id": "5b18d31a27a37696ec8b5776",
"status": "inProgress",
"description": "inProgress..."
}
]
}
]

Not that I think it's a good idea, mostly because I don't see any "aggregation" here at all, but you can do it: after the first "grouping" collects documents into an array per "status", you similarly $push all of that content into a single array of "k" and "v" pairs, and then convert those into keys of a document in a $replaceRoot with $arrayToObject:
db.collection.aggregate([
{ "$group": {
"_id": "$status",
"data": { "$push": "$$ROOT" }
}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": "$_id",
"v": "$data"
}
}
}},
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$data" }
}}
])
Returns:
{
"inProgress" : [
{
"_id" : ObjectId("5b18d31a27a37696ec8b5776"),
"status" : "inProgress",
"description" : "inProgress..."
}
],
"completed" : [
{
"_id" : ObjectId("5b18d31a27a37696ec8b5773"),
"status" : "completed",
"description" : "completed..."
}
],
"pending" : [
{
"_id" : ObjectId("5b18d14cbc83fd271b6a157c"),
"status" : "pending",
"description" : "You have to complete the challenge..."
},
{
"_id" : ObjectId("5b18d31a27a37696ec8b5775"),
"status" : "pending",
"description" : "pending..."
}
]
}
That might be okay IF you actually "aggregated" beforehand, but on any practically sized collection all that is doing is trying to force the whole collection into a single document, and that's likely to break the BSON limit of 16MB, so I just would not recommend even attempting this without "grouping" something else before this step.
Frankly, the following code does the same thing, without aggregation tricks and with no BSON limit problem:
var obj = {};
// Using forEach as a premise for representing "any" cursor iteration form
db.collection.find().forEach(d => {
if (!obj.hasOwnProperty(d.status))
obj[d.status] = [];
obj[d.status].push(d);
})
printjson(obj);
Or a bit shorter:
var obj = {};
// Using forEach as a premise for representing "any" cursor iteration form
db.collection.find().forEach(d =>
obj[d.status] = [
...(obj.hasOwnProperty(d.status)) ? obj[d.status] : [],
d
]
)
printjson(obj);
Aggregations are used for "data reduction" and anything that is simply "reshaping results" without actually reducing the data returned from the server is usually better handled in client code anyway. You're still returning all data no matter what you do, and the client processing of the cursor has considerably less overhead. And NO restrictions.

Mongoose format datetime field in find query retrieving result [duplicate]

Given a collection (name: users) with this structure:
{
"_id" : ObjectId("57653dcc533304a40ac504fc"),
"username" : "XYZ",
"followers" : [
{
"count" : 31,
"ts" : ISODate("2016-06-17T18:30:00.996Z")
},
{
"count" : 31,
"ts" : ISODate("2016-06-18T18:30:00.288Z")
}
]
}
I want to query this collection by the username field and have ts returned in 'yyyy-mm-dd' format.
Expected Output:
{
"_id" : ObjectId("57653dcc533304a40ac504fc"),
"username" : "XYZ",
"followers" : [
{
"count" : 31,
"date" : "2016-06-17"
},
{
"count" : 31,
"date" : "2016-06-18"
}
]
}
I have tried something like this:
db.users.aggregate([
{$match:{"username":"xyz"}},
{$project:{ "followers":{"count":1,
"date":"$followers.ts.toISOString().slice(0,10).replace(/-/g,'-')"
}}
}
])
But it doesn't seem to be working. Can anyone please help?
Thanks much.

Consider running an aggregation pipeline that flattens the followers array first, projects the new field using the $dateToString operator, then regroups the flattened documents to get your desired result.
This can be done with three pipeline stages after the initial $match:
db.users.aggregate([
{ "$match": { "username": "XYZ" } },
{ "$unwind": "$followers" },
{
"$project": {
"username": 1,
"count": "$followers.count",
"date": { "$dateToString": { "format": "%Y-%m-%d", "date": "$followers.ts" } }
}
},
{
"$group": {
"_id": "$_id",
"username": { "$first": "$username" },
"followers": { "$push": {
"count": "$count",
"date": "$date"
}}
}
}
])
With MongoDB 3.4 and newer, you can use the new $addFields pipeline step together with $map to create the array field without the need to unwind and group:
db.users.aggregate([
{ "$match": { "username": "XYZ" } },
{
"$addFields": {
"followers": {
"$map": {
"input": "$followers",
"as": "follower",
"in": {
"count": "$$follower.count",
"date": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$$follower.ts"
}
}
}
}
}
}
}
])

The best and easiest way to do this is to transform each element in the array with the $map operator. In the "in" expression, use $dateToString to convert each "ts" date to a string with a format specifier.
db.users.aggregate(
[
{ "$match": { "username": "XYZ" } },
{ "$project": {
"username": 1,
"followers": {
"$map": {
"input": "$followers",
"as": "f",
"in": {
"count": "$$f.count",
"date": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$$f.ts"
}
}
}
}
}
}}
]
)
which produces:
{
"_id" : ObjectId("57653dcc533304a40ac504fc"),
"username" : "XYZ",
"followers" : [
{
"count" : 31,
"date" : "2016-06-17"
},
{
"count" : 31,
"date" : "2016-06-18"
}
]
}
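If the formatting is done in application code rather than in the pipeline, the same 'yyyy-mm-dd' string can be taken from the start of a Date's ISO string. The following is a sketch under that assumption; `formatFollowers` is a hypothetical helper name, and it expects "ts" to already be a JavaScript Date, as a driver would return it. Like $dateToString without a timezone option, it reports the date in UTC.

```javascript
// Converts each follower's "ts" Date into a "date" string (YYYY-MM-DD, UTC).
function formatFollowers(user) {
  return {
    ...user,
    followers: user.followers.map(({ count, ts }) => ({
      count,
      date: ts.toISOString().slice(0, 10)
    }))
  };
}

const formatted = formatFollowers({
  username: "XYZ",
  followers: [{ count: 31, ts: new Date("2016-06-17T18:30:00.996Z") }]
});
console.log(formatted.followers[0].date); // → 2016-06-17
```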

Aggregate mongodb by latest timestamp

I'd like to get the "population" for each city's latest timestamp using the aggregate function.
Given a MongoDB collection like this:
{
"_id": {"$oid": "55354bc97b5dfd021f2be661"},
"timestamp": {"$date": "2015-04-20T18:56:09.000Z"},
"city": "Roma",
"population": [
{"age": 90,"count": 1000},
{"age": 25,"count": 25}
]
},
{
"_id": {"$oid": "55354c357b5dfd021f2be663"},
"timestamp": {"$date": "2015-04-20T18:57:57.000Z"},
"city": "Madrid",
"population": [
{"age": 90,"count": 10},
{"age": 75,"count": 2343},
{"age": 50,"count": 500},
{"age": 70,"count": 5000}
]
},
{
"_id": {"$oid": "55362da541c37aef07d4ea9a"},
"timestamp": {"$date": "2015-04-21T10:59:49.000Z"},
"city": "Roma",
"population": [
{"age": 90,"count": 5}
]
}
I'd like to retrieve all the cities, but for each one only the latest timestamp:
{
"city": "Roma",
"population": [
{"age": 90,"count": 5}
]
},
{
"city": "Madrid",
"population": [
{"age": 90,"count": 10},
{"age": 75,"count": 2343},
{"age": 50,"count": 500},
{"age": 70,"count": 5000}
]
}
I have tried something like this answer, but I don't know how to "unwind" the populations after getting the latest timestamp for each city:
db.collection('population').aggregate([
{ $unwind: '$population' },
{ $group: { _id: '$city', timestamp: { $max: '$timestamp' } } },
{ $sort: { _id : -1 } }
], function(err, results) {
res.send(results)
});

The following aggregation pipeline will give you the desired result. The first stage orders the documents by the timestamp field (descending), and the following $group stage groups the ordered documents by the city field. Within the $group stage, the $$ROOT system variable refers to the whole document being processed, so applying the $first operator to it keeps the first document in each group of documents sharing the same city key, which after the sort is the latest one. The final stage projects the city and the population array from that document into the desired output fields:
db.population.aggregate([
{
"$sort": { "timestamp": -1 }
},
{
"$group": {
"_id": "$city",
"doc": { "$first": "$$ROOT" }
}
},
{
"$project": {
"_id": 0,
"city": "$_id",
"population": "$doc.population"
}
}
]);
Output:
/* 0 */
{
"result" : [
{
"city" : "Madrid",
"population" : [
{
"age" : 90,
"count" : 10
},
{
"age" : 75,
"count" : 2343
},
{
"age" : 50,
"count" : 500
},
{
"age" : 70,
"count" : 5000
}
]
},
{
"city" : "Roma",
"population" : [
{
"age" : 90,
"count" : 5
}
]
}
],
"ok" : 1
}
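To make the "sort descending, then take the first document per group" idea concrete, here is a plain JavaScript sketch that computes the same result over an in-memory array. It only illustrates the logic and says nothing about how MongoDB executes the pipeline; `latestPerCity` is a hypothetical name, and timestamps are assumed to be Date objects.

```javascript
// Keeps, for each city, the document with the greatest timestamp,
// then returns only the city and population fields.
function latestPerCity(docs) {
  const latest = new Map();
  for (const doc of docs) {
    const seen = latest.get(doc.city);
    if (!seen || doc.timestamp > seen.timestamp) {
      latest.set(doc.city, doc);
    }
  }
  return [...latest.values()].map(({ city, population }) => ({ city, population }));
}
```

Instead of sorting first, this version compares timestamps while iterating, which gives the same "latest per city" outcome in a single pass.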

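I think that you want to use $project instead of $unwind. Note that the $group stage has to keep the latest document per city (here as doc) for $doc.population to resolve:
db.collection('population').aggregate([{
  $sort: {
    timestamp: -1 // newest documents first
  }
}, {
  $group: {
    _id: '$city',
    timestamp: {$max: '$timestamp'},
    doc: {$first: '$$ROOT'} // the newest document for this city
  }
}, {
  $project: {
    population: '$doc.population'
  }
}, {
  $sort: {
    _id : -1
  }
}], function(err, results) {
  res.send(results)
});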

I use this to sort on any timestamp field in an aggregation; here I sort by the latest update time of the document, and you can group afterwards if you need to. You can learn more about aggregate sorting here: https://www.mongodb.com/docs/manual/reference/operator/aggregation/sort/
aggregate.push({ $sort: { updated_at: -1 } });
I build the aggregate stages as separate blocks, push them into an array, and execute them all together. I find that easier to debug when something is not working properly.
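A small sketch of that "build the stages in an array" approach is shown below. The function name `buildPipeline` and its `city` and `limit` options are hypothetical, chosen only to show stages being added conditionally; the `updated_at` field is the one from the snippet above.

```javascript
// Builds the pipeline as an array of stages, adding optional stages
// only when the (hypothetical) options ask for them.
function buildPipeline(options = {}) {
  const pipeline = [];
  if (options.city) {
    pipeline.push({ $match: { city: options.city } });
  }
  // Always sort by the latest update time first.
  pipeline.push({ $sort: { updated_at: -1 } });
  if (options.limit) {
    pipeline.push({ $limit: options.limit });
  }
  return pipeline;
}

// Each stage can be logged and inspected before the pipeline is executed.
console.log(JSON.stringify(buildPipeline({ city: "Roma", limit: 5 })));
```

The finished array is what would be passed to the collection's aggregate() call, and individual stages can be removed or printed while debugging.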
