mongoose query using sort and skip on populate is too slow - node.js

I'm using an AJAX request from the front end to load more comments on a post from the back end, which uses Node.js and Mongoose. I won't bore you with the front-end and route code, but here's the query code:
Post.findById(req.params.postId).populate({
    path: type, // type will contain either "comments" or "answers"
    populate: {
        path: 'author',
        model: 'User'
    },
    options: {
        sort: sortBy, // sortBy contains either "-date" or "-votes"
        skip: parseInt(req.params.numberLoaded, 10), // how many are already shown
        limit: 25 // I only load this many new comments at a time
    }
}).exec(function(err, foundPost) {
    console.log("query executed"); // code takes too long to get to this line
    if (err) {
        res.send("database error, please try again later");
    } else {
        res.send(foundPost[type]);
    }
});
As mentioned in the title, everything works fine; my only problem is that this is too slow: the request takes about 1.5-2.5 seconds. Surely Mongoose has a way of doing this that takes less time to load. I poked around the Mongoose docs and Stack Overflow, but didn't really find anything useful.

The skip-and-limit approach is inherently slow in MongoDB because, without a suitable index, the server normally has to retrieve all matching documents, sort them, and only then return the requested segment of the results.
To make it faster, define indexes on your collections.
According to MongoDB's official documentation:
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
-- https://docs.mongodb.com/manual/indexes/
Indexes take up additional storage and add some overhead to writes, but they can improve query efficiency a lot.
Indexes are commonly defined on fields that are frequently used in queries. In this case, you may want to define indexes on the date and/or votes fields.
Read the Mongoose documentation to find out how to define indexes in your schemas:
http://mongoosejs.com/docs/guide.html#indexes
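As a minimal sketch of what such indexes could look like, assume the comments live in their own collection and are fetched by a hypothetical `post` reference field rather than through `populate` (the question does not show the schema, so `post`, `date` and `votes` are assumed names). Each sort order then gets a compound index that starts with `post`, so the filter and the sort can be served by the same index. The helper only builds plain key objects; in Mongoose they would be handed to `schema.index()`.

```javascript
// Hypothetical field names: `post`, `date` and `votes` are assumptions,
// because the question does not show the Comment schema.

// Turn a Mongoose-style sort string ("-date", "votes") into an index key
// spec that starts with the parent reference.
function indexSpecFor(sortBy) {
  const descending = sortBy.startsWith('-');
  const field = descending ? sortBy.slice(1) : sortBy;
  return { post: 1, [field]: descending ? -1 : 1 };
}

// One compound index per sort order used by the "load more" request.
const commentIndexes = ['-date', '-votes'].map(indexSpecFor);
console.log(commentIndexes);

// With Mongoose, each spec would be registered on the schema, e.g.:
//   commentIndexes.forEach((spec) => commentSchema.index(spec));
```

Whether these indexes actually help depends on how the query reaches the comments, so it is worth checking the query plan with `explain()` before and after adding them.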

Related

Mongoose, How to limit the query based on the sum of a field in the document

I have a document in the shape of
const Model = mongoose.Schema({
    something1: { type: String },
    someNumber1: { type: Number },
    something2: { type: String },
    someNumber2: { type: Number },
    aFloatNumber: { type: Number }
});
I then index it like this for better performance (please correct me if I am doing it wrong):
Model.index({ something1: 1, something2: 1, aFloatNumber: 1 });
I am trying to query using this syntax:
const model = await Model.find({
    $and: [{ something1: anInput }, { something2: anotherInput }]
}).sort('aFloatNumber');
Now I want to limit the returned results to improve performance, as the list could be very large. However, the limit changes based on an input: basically, I want Mongoose to keep adding up someNumber1 and stop returning documents once the total reaches the input number. Something like the code below:
const model = await Model.find({
    $and: [{ something1: anInput }, { something2: anotherInput }]
})
    .sort('aFloatNumber')
    .limit( sum(someNumber1) >= theInputNumber ) // pseudocode, not valid Mongoose syntax
So basically my questions are:
Am I indexing the document correctly for my query?
Does limiting the query make any difference to performance, given that it sorts the data and so, I think, has to check every document in order to sort them?
If it makes a big difference to performance, what is the correct syntax for it? I am going to run this query a lot in my application.
You're asking for MongoDB's skip function, which works like OFFSET in SQL:
https://docs.mongodb.com/manual/reference/operator/aggregation/skip/
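Since `limit` only accepts a fixed count, one way to get the "stop once the sum reaches the input" behaviour from the question is to stream the sorted results and stop reading on the client. The following is a minimal sketch of that idea, not a built-in Mongoose feature: `takeUntilSum` is a hypothetical helper, and `fakeCursor` is a stand-in so that the example runs without a database.

```javascript
// Collect documents from any async iterable (for example a query cursor)
// until the running total of `field` reaches `target`.
async function takeUntilSum(docs, field, target) {
  const taken = [];
  let total = 0;
  for await (const doc of docs) {
    taken.push(doc);
    total += doc[field];
    if (total >= target) break; // stop pulling further documents
  }
  return taken;
}

// Stand-in for a cursor sorted by aFloatNumber (assumed sample data).
async function* fakeCursor() {
  yield { someNumber1: 4, aFloatNumber: 0.1 };
  yield { someNumber1: 3, aFloatNumber: 0.2 };
  yield { someNumber1: 5, aFloatNumber: 0.3 };
  yield { someNumber1: 9, aFloatNumber: 0.4 };
}

takeUntilSum(fakeCursor(), 'someNumber1', 10).then((docs) => {
  console.log(docs.length); // 3, because 4 + 3 + 5 = 12 >= 10
});
```

With Mongoose, the first argument could presumably be `Model.find(filter).sort('aFloatNumber').cursor()`, so that documents are fetched in batches and reading stops early instead of loading the whole result set.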

Speeding up my cloudant query

I was wondering whether someone could provide some advice on my cloudant query below. It is now taking upwards of 20 seconds to execute against a DB of 50,000 documents - I suspect I could be getting better speed than this.
The purpose of the query is to find all of my documents with the attribute "searchCode" equalling a specific value plus a further list of specific IDs.
Both searchCode and _id are indexed - any ideas why my query would be taking so long / what I could do to speed it up?
mydb.find({selector: {"$or": [{"searchCode": searchCode}, {"_id": {"$in": idList}}]}}, function (err, result) {
    if (!err) {
        fulfill(result.docs);
    } else {
        console.error(err);
    }
});
Thanks,
James
You could try making a separate call for each part of the query:
find me documents where the searchCode = 'some value'
find me documents whose ids match a list of ids
The first can be achieved with a find call and a query like so:
{ selector: {"searchCode": searchCode} }
The second can be achieved by hitting the database's _all_docs endpoint, passing in the list of ids as a keys parameter, e.g.
GET /db/_all_docs?keys=["a","b","c"]
You might find that running both requests in parallel and merging the results gives you better performance.
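A minimal sketch of the "run in parallel and merge" step follows. The two functions passed in are hypothetical wrappers around the `find` call and the `_all_docs` request; here they are replaced by stubs so the example runs on its own. Documents matched by both requests are kept once, keyed on `_id`.

```javascript
// Run both lookups at the same time and merge the results, dropping
// documents that were returned by both requests (matched on _id).
async function findBySearchCodeOrIds(findBySearchCode, findByIds) {
  const [byCode, byIds] = await Promise.all([findBySearchCode(), findByIds()]);
  const merged = new Map();
  for (const doc of [...byCode, ...byIds]) {
    merged.set(doc._id, doc);
  }
  return [...merged.values()];
}

// Stubs standing in for the two Cloudant requests (assumed sample data).
const stubFindBySearchCode = async () => [
  { _id: 'a', searchCode: 'x' },
  { _id: 'b', searchCode: 'x' }
];
const stubFindByIds = async () => [
  { _id: 'b', searchCode: 'x' },
  { _id: 'c', searchCode: 'y' }
];

findBySearchCodeOrIds(stubFindBySearchCode, stubFindByIds).then((docs) => {
  console.log(docs.map((d) => d._id)); // [ 'a', 'b', 'c' ]
});
```

Note that `_all_docs` returns only ids and revisions unless `include_docs=true` is also passed, so the real wrapper for the second request would need that parameter to get full documents back.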

Meteor last executed query in mongodb?

Meteor's Mongo API and native MongoDB queries are not the same. I am using an external MongoDB, so I need to debug my queries. Is there any way to find the last executed query in Mongo?
I don't know if this works in meteor mongo, but since you seem to be using an external Mongo, you could presumably set up profiling with a capped collection, so that the collection never grows over a certain size. If you only need the last operation, you can make the size much smaller than this:
db.createCollection( "system.profile", { capped: true, size:4000000 } )
The mongo doc is here: http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/
From the mongo docs:
To return the most recent 10 log entries in the system.profile
collection, run a query similar to the following:
db.system.profile.find().limit(10).sort( { ts : -1 } ).pretty()
Since it's sorted inversely by time, just take the first record from the result.
Otherwise you could roll your own with a temporary client-only mongo collection:
Queries = new Mongo.Collection(null);
Create an object containing your query, remove the previous record, and insert the new one.

MongoDB 2.6 Production Ready Text Search - How To Use Skip For Pagination

In MongoDB 2.6, text search is supposedly production ready and we can now use skip. I'd like to use text search and skip for pagination in my app, but I'm not yet sure how to implement it.
Right now, I'm using Mongoose and the Mongoose-text-search plugin, but I don't believe either of them supports skip in MongoDB's text search, so I guess I'll need to use the native MongoClient...
My app connects to the database via Mongoose using:
//Bootstrap db connection
var db = mongoose.connect(config.db, function(e) {
Now, how can I use the native MongoClient to execute a full text search for my Products model with a skip parameter? Here is what I had using Mongoose and Mongoose-text-search, but there is no way to add in skip:
Product = mongoose.model('Product')
var query = req.query.query;
var skip = req.query.skip;
var options = {
    project: '-created', // do not include the `created` property
    filter: filter, // casts queries based on schema
    limit: 20,
    language: 'english',
    lean: true
};
Product.textSearch(query, options, function (err, response) {
});
The main difference introduced in 2.6 versions of MongoDB is that you can issue a "text search" query using the standard .find() interface so the old methods for textSearch would no longer need to be applied. This is basically how modifiers such as limit and skip can be applied.
But keep in mind that, as of writing, the current Mongoose dependency is on an earlier version of the MongoDB node driver that existed prior to the release of MongoDB 2.6. As Mongoose wraps the main methods and does some syntax checking of its own, it is likely (though untried by me) that using the Mongoose methods will currently fail.
So what you will need to do is get the underlying driver method for .find(), and also now use the $text operator instead:
Product.collection.find(
    { "$text": { "$search": "term" } },
    { "sort": { "score": { "$meta": "textScore" } }, "skip": 25, "limit": 25 },
    function(err, docs) {
        // processing here
    }
);
Also note that the $text operator does not sort the results by "score" for relevance by default; instead, this is passed to the "sort" option using the new $meta operator, which is also introduced in MongoDB 2.6.
So alter your skip and limit values and you have paging on text search results with a cursor. Just be wary of large result sets, as skip and limit are not really efficient ways to move through a large cursor. It is better to have another key on which you can range match, even though that is counter-intuitive to "relevance matching".
So, text search facilities are a bit "better" but not "perfect". As always, if you really need more and/or more performance, look to an external solution.
Feel free to try a similar operation with the Mongoose implementation of .find() as well. But I have my reservations from past experience, as there is generally some masking and checking going on there; hence the description of usage with the "native" node driver.
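To turn the fixed `skip` and `limit` values above into paging, the two numbers can be derived from a page number. This is a small sketch under the assumption of 1-based pages; `textSearchPage` is a hypothetical helper, and it only builds the two plain objects that the `.find()` call above takes as its first and second arguments.

```javascript
// Build the arguments for a paged $text query. `page` is 1-based.
function textSearchPage(term, page, pageSize) {
  const query = { $text: { $search: term } };
  const options = {
    sort: { score: { $meta: 'textScore' } }, // order by relevance
    skip: (page - 1) * pageSize,             // results already shown
    limit: pageSize                          // results per page
  };
  return { query, options };
}

const secondPage = textSearchPage('term', 2, 25);
console.log(secondPage.options.skip, secondPage.options.limit); // 25 25
```

The second page with 25 results per page reproduces the `"skip": 25, "limit": 25` values used in the answer.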

Sails 0.10 association fails to populate

I'm working on a custom adapter in sails@0.10.0-rc4 which will support associations, but I am having trouble getting them working in conjunction with my adapter. My configuration is a one-to-many association between article and stats. My models and adapter are set up like this:
// api/models/article.js
module.exports = {
    connection: ['myadapter'],
    tableName: 'Knowledge_Base__kav',
    attributes: {
        KnowledgeArticleId: { type: 'string', primaryKey: true },
        stats: {
            collection: 'stats',
            via: 'parentId'
        }
    }
};
// api/models/stats.js
module.exports = {
    connection: ['myadapter'],
    tableName: 'KnowledgeArticleViewStat',
    attributes: {
        count: 'integer',
        ParentId: {
            model: 'article'
        }
    }
};
// adapter.js
find: function(connectionName, collectionName, options, cb) {
    console.dir(options);
    // output
    // {where: null}
    db.query(options, function(err, res) {
        cb(err, res);
    });
}
However, when I try to populate using Article.find().populate('stats').exec(console.log), my adapter gets {where: null} as options when I would expect it to receive {where: {parentId: [<some-article-id>]}}. It will return a list of articles to me, but the field which is supposed to be populated from another model (stats) is just an empty list.
I feel like this is related to the fact that my adapter is not getting the proper where param to search for the related model on the primary key. To test this further, I set up a test one-to-many relationship using the sails-mongo adapter. In this case the adapter did receive the params I expected and the association worked fine.
Does anyone have any idea on why .populate('stats') wouldn't be sending the proper "where" params to my adapter?
Update 3/7
So it seems like what happens in associations is that SomeModel.find() will hit the adapter once and then .populate('othermodel') hits the adapter again using the primary key of the first request. Then the results of both are joined together. In my case, the second hit against the adapter isn't happening for some unknown reason.
Update
The original issue was related to an attribute naming error that's mentioned in the comments below. However, there still appears to be some issue with the final population step mentioned by particlebanana:
Final step will: Take all of the query results from all the returned query operations
and combine them in-memory to build up a result set you can return in
the exec callback.
I'm seeing that all required queries are now firing but they are failing to actually populate the alias. Here's the call with some added debugging output in the form of a gist for easier consumption: https://gist.github.com/jasonsims/9423170
It looks like you are on the right track! The way the operation sets get built up, the .find() on the Article should run with the first log (empty where) and the second query should get run with the parentId criteria in the log. The second query isn't running because it can't build up that parentId array of primary keys when you don't return anything from the first query.
Short answer: you need to return something in the find callback to see the second log, which should match your expected criteria.
The query lifecycle looks something like this:
Check if all query pieces are on the same connection, if not break out which queries will run on which connections
For all queries on a single connection, check if the adapter supports native joins (that is, has a .join() method). If so, you can pass the criteria down and let the adapter handle the joins.
If no native join method is defined run the "parent" operation (in this case the Article.find())
Use the results of the parent operation to build up criteria for any populations that need to run (the parentId array in your criteria), and run the child queries.
Take all of the query results from all the returned query operations and combine them in-memory to build up a result set you can return in the exec callback.
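The non-native-join path of the lifecycle above can be sketched roughly as follows. This is a simplified illustration, not Waterline's actual implementation: `populateInMemory` and its `opts` fields are hypothetical names, and the two "adapters" are stubs that return fixed rows.

```javascript
// Run the parent query, build child criteria from its primary keys,
// run the child query, then combine both result sets in memory.
function populateInMemory(findParents, findChildren, opts) {
  const parents = findParents({ where: null });
  const parentIds = parents.map((p) => p[opts.parentKey]);
  // With no parent rows there are no keys to search for,
  // so the child query is never issued.
  const children = parentIds.length
    ? findChildren({ where: { [opts.foreignKey]: parentIds } })
    : [];
  return parents.map((p) => ({
    ...p,
    [opts.alias]: children.filter((c) => c[opts.foreignKey] === p[opts.parentKey])
  }));
}

// Stub adapters returning fixed rows (assumed sample data).
const findArticles = () => [{ id: 'a1' }, { id: 'a2' }];
const findStats = (criteria) =>
  [
    { count: 3, parentId: 'a1' },
    { count: 7, parentId: 'a1' },
    { count: 1, parentId: 'a2' }
  ].filter((row) => criteria.where.parentId.includes(row.parentId));

const joinOpts = { parentKey: 'id', foreignKey: 'parentId', alias: 'stats' };
console.log(JSON.stringify(populateInMemory(findArticles, findStats, joinOpts)));
```

The empty-parent branch mirrors the point made above: if the first `find` returns nothing, there is no array of primary keys to build the second query from.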
I hope that helps some. Shoot me the url of your repo and I will look through it, if it's able to be open sourced, and can help some more if you come across any issues.
Just to summarize, there were multiple issues going on here which were causing associations not to populate:
Custom primary keys
There was a problem with waterline when joining data from models using custom primary keys. @particlebanana fixed this in 8eff54b and it should be included in the next rc of waterline (waterline@0.10.0-rc5).
Malformed SOQL query
When waterline queries the adapter for a second time in order to acquire the child rows, it does so using { foreignKey: [ value ] }. Since the value was a list, jsforce was incorrectly generating the SOQL query since it expected all list values to be accompanied by either $in or $nin operators. I addressed this issue in github/jsforce#9 and it's now included in jsforce@1.1.2.
Model attributes are case sensitive
The models in my project were defined in snakeCase but the json response from Salesforce was using EveryWordCapitalized. This causes 1-to-many joins in waterline to reduce the many child records to one when it runs _.uniq(childRows, pk). Since the model has defined pk == id but the actual value returned from Salesforce is pk == Id, this call to uniq blows away all child records but one. I'm not entirely sure if this should be a waterline bug or not but fixing the capitalization in the model attribute definitions resolved this.
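The effect of the capitalization mismatch can be reproduced in a few lines. This is a sketch with a hand-written `uniqBy` helper standing in for the `_.uniq(childRows, pk)` call, and made-up rows shaped like the Salesforce response.

```javascript
// Minimal stand-in for a "unique by key" helper such as _.uniq(childRows, pk).
function uniqBy(rows, key) {
  const seen = new Set();
  return rows.filter((row) => {
    if (seen.has(row[key])) return false;
    seen.add(row[key]);
    return true;
  });
}

// Rows with a capitalized `Id`, as returned by the remote API (assumed sample data).
const childRows = [
  { Id: '1', count: 3 },
  { Id: '2', count: 7 },
  { Id: '3', count: 1 }
];

// Looking up the lowercase key yields undefined for every row,
// so all rows look like duplicates of the first one.
console.log(uniqBy(childRows, 'id').length); // 1
console.log(uniqBy(childRows, 'Id').length); // 3
```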
