find() operation takes a long time when a MongoDB collection contains around 200K docs - node.js

My MongoDB database contains a collection with 200K documents. I'm trying to fetch all of the documents in NodeJS as follows:
var cursor = collection.find({}, {
    "_id": false
}).toArray(function(err, docs) {
    if (err)
        throw err;
    callback(null, docs);
});
The above operation takes a long time and I am not able to get the results. Is there any way to optimize the find operation to get the result?
NodeJS driver version: 2.0
MongoDB version: 3.2.2
I can easily load the data from a raw JSON file, but I am not able to do it from MongoDB.

People can't do a lot with 200K items in the UI. Google shows only 10 results per page, for good reason. It sounds like pagination can help you. Here's an example: Range query for MongoDB pagination
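As a rough sketch of that range-query idea (not the linked answer itself), paging with the Node driver could look something like the following; it assumes an indexed, monotonically increasing field such as _id is used as the page boundary, and pageSize is a placeholder (note that _id must be kept in the projection for this to work):

// Fetch one page at a time instead of calling toArray() on 200K documents.
// lastId is the _id of the final document on the previous page (null for the first page).
function getPage(collection, lastId, pageSize, callback) {
    var query = lastId ? { _id: { $gt: lastId } } : {};
    collection.find(query)
        .sort({ _id: 1 })      // walk the collection in _id order
        .limit(pageSize)
        .toArray(function (err, docs) {
            if (err) return callback(err);
            var nextLastId = docs.length ? docs[docs.length - 1]._id : null;
            callback(null, docs, nextLastId);
        });
}

Each call returns one page plus the boundary _id to pass into the next call, so no request ever materializes the whole collection in memory.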

Related

mongoose query using sort and skip on populate is too slow

I'm using an ajax request from the front end to load more comments to a post from the back-end which uses NodeJS and mongoose. I won't bore you with the front-end code and the route code, but here's the query code:
Post.findById(req.params.postId).populate({
    path: type, // type will either contain "comments" or "answers"
    populate: {
        path: 'author',
        model: 'User'
    },
    options: {
        sort: sortBy, // sortBy contains either "-date" or "-votes"
        skip: parseInt(req.params.numberLoaded), // how many are already shown
        limit: 25 // I only load this many new comments at a time.
    }
}).exec(function(err, foundPost){
    console.log("query executed"); // code takes too long to get to this line
    if (err){
        res.send("database error, please try again later");
    } else {
        res.send(foundPost[type]);
    }
});
As mentioned in the title, everything works fine; my problem is just that it is too slow, with the request taking about 1.5-2.5 seconds. Surely mongoose has a way of doing this that takes less time. I poked around the mongoose docs and Stack Overflow, but didn't really find anything useful.
Using the skip-and-limit approach with MongoDB is inherently slow, because it normally needs to retrieve all the matching documents, sort them, and only then return the desired segment of the results.
What you need to do to make it faster is to define indexes on your collections.
According to MongoDB's official documentation:
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
-- https://docs.mongodb.com/manual/indexes/
Indexes increase the collection's storage size, but they improve query efficiency a lot.
Indexes are commonly defined on fields that are frequently used in queries. In this case, you may want to define indexes on the date and/or votes fields.
Read mongoose documentation to find out how to define indexes in your schemas:
http://mongoosejs.com/docs/guide.html#indexes
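For illustration, a schema with such indexes might look roughly like this (the exact shape of the Comment schema is a guess based on the question):

var mongoose = require('mongoose');

var commentSchema = new mongoose.Schema({
    author: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
    text:   String,
    date:   { type: Date, index: true }, // field-level index, helps "-date" sorts
    votes:  Number
});

// Descending index to help "-votes" sorts.
commentSchema.index({ votes: -1 });

module.exports = mongoose.model('Comment', commentSchema);

With these in place, the sort inside populate can use an index instead of sorting the documents in memory.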

Speeding up my cloudant query

I was wondering whether someone could provide some advice on my cloudant query below. It is now taking upwards of 20 seconds to execute against a DB of 50,000 documents - I suspect I could be getting better speed than this.
The purpose of the query is to find all of my documents with the attribute "searchCode" equalling a specific value plus a further list of specific IDs.
Both searchCode and _id are indexed - any ideas why my query would be taking so long / what I could do to speed it up?
mydb.find({selector: {"$or":[{"searchCode": searchCode},{"_id":{"$in":idList}}]}}, function (err, result) {
    if(!err){
        fulfill(result.docs);
    }
    else{
        console.error(err);
    }
});
Thanks,
James
You could try doing separate calls for the two queries:
find me documents where the searchCode = 'some value'
find me documents whose ids match a list of ids
The first can be achieved with a find call and a query like so:
{ selector: {"searchCode": searchCode} }
The second can be achieved by hitting the database's _all_docs endpoint, passing in the list of ids as a keys parameter e.g.
GET /db/_all_docs?keys=["a","b","c"]
You might find that running both requests in parallel and merging the results gives you better performance.
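As a rough sketch of that parallel approach (untested, and assuming a nano-style client where mydb.fetch({ keys: [...] }) posts the id list to the _all_docs endpoint with include_docs enabled; check your client's API):

// Run the two simpler queries in parallel and merge the results.
var codeQuery = new Promise(function (resolve, reject) {
    mydb.find({ selector: { "searchCode": searchCode } }, function (err, result) {
        return err ? reject(err) : resolve(result.docs);
    });
});

var idQuery = new Promise(function (resolve, reject) {
    // nano-style bulk fetch by _id via _all_docs (assumption: your client exposes this as fetch())
    mydb.fetch({ keys: idList }, function (err, result) {
        if (err) return reject(err);
        resolve(result.rows.map(function (row) { return row.doc; }));
    });
});

Promise.all([codeQuery, idQuery]).then(function (results) {
    // Merge and de-duplicate by _id, since a document could match both queries.
    var seen = {};
    var merged = results[0].concat(results[1]).filter(function (doc) {
        if (!doc || seen[doc._id]) return false;
        seen[doc._id] = true;
        return true;
    });
    fulfill(merged);
}).catch(console.error);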

How to fetch/count millions of records in mongodb with nodejs

We have a collection with millions of records in MongoDB. It takes a lot of time, and often times out, to count and paginate these records. What's the best way to do it using NodeJS? I want to create a page where I can see records with pagination, count, delete, and search. Below is the code that queries Mongo with different conditions.
crowdResult.find({ "auditId":args.audit_id,"isDeleted":false})
.skip(args.skip)
.limit(args.limit)
.exec(function (err, data) {
if (err)
return callback(err,null);
console.log(data);
return callback(null,data);
})
If the goal is to get through a large dataset without timing out then I use the following approach to get pages one after another and process the paged resultset as soon as it becomes available:
https://gist.github.com/pulkitsinghal/2f3806670439fa137210fc26b134237f
Please focus on the following lines to get a quick idea of what the code is doing before diving deeper:
Let getPage() handle the work, you can set the pageSize and query to your liking:
https://gist.github.com/pulkitsinghal/2f3806670439fa137210fc26b134237f#file-sample-js-L68
Method signature:
https://gist.github.com/pulkitsinghal/2f3806670439fa137210fc26b134237f#file-sample-js-L29
Process pagedResults as soon as they become available:
https://gist.github.com/pulkitsinghal/2f3806670439fa137210fc26b134237f#file-sample-js-L49
Move on to the next page:
https://gist.github.com/pulkitsinghal/2f3806670439fa137210fc26b134237f#file-sample-js-L53
The code will stop when there is no more data left:
https://gist.github.com/pulkitsinghal/2f3806670439fa137210fc26b134237f#file-sample-js-L41
Or it will stop when working on the last page of data:
https://gist.github.com/pulkitsinghal/2f3806670439fa137210fc26b134237f#file-sample-js-L46
I hope this offers some inspiration, even if it's not an exact solution for your needs.
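As a rough illustration of that page-after-page pattern (a sketch, not the gist itself), the mongoose-style query from the question could be wrapped like this; pageSize and processPage are placeholders:

var pageSize = 1000; // tune to your document size and timeout budget

function getPage(skip) {
    // crowdResult, args and callback come from your original snippet.
    crowdResult.find({ "auditId": args.audit_id, "isDeleted": false })
        .skip(skip)
        .limit(pageSize)
        .exec(function (err, data) {
            if (err) return callback(err, null);

            processPage(data); // handle each page as soon as it arrives

            if (data.length < pageSize) {
                // Last page reached: nothing more to fetch.
                return callback(null, 'done');
            }
            getPage(skip + pageSize); // move on to the next page
        });
}

getPage(0);

Keep in mind that large skip values still get progressively slower; a range query on an indexed field, as discussed earlier in this thread, avoids that cost.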

Meteor last executed query in mongodb?

Are Meteor Mongo and MongoDB queries the same? I am using an external MongoDB, so I need to debug my query. Is there any way to find the last executed query in Mongo?
I don't know if this works in Meteor Mongo, but since you seem to be using an external Mongo, you can presumably set up profiling with a capped collection, so that the collection never grows over a certain size. If you only need the last op, you can make the size much smaller than this.
db.createCollection( "system.profile", { capped: true, size:4000000 } )
The mongo doc is here: http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/
From the mongo docs:
To return the most recent 10 log entries in the system.profile collection, run a query similar to the following:
db.system.profile.find().limit(10).sort( { ts : -1 } ).pretty()
Since it's sorted inversely by time, just take the first record from the result.
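Putting those pieces together, a minimal shell sketch of the whole setup could look like this (the 4 MB size is just the example value from above; run it against the database you want to inspect):

// Disable profiling so the system.profile collection can be recreated.
db.setProfilingLevel(0)
db.system.profile.drop()

// Small capped collection: only the most recent operations are kept.
db.createCollection("system.profile", { capped: true, size: 4000000 })

// Profile all operations (level 2), then inspect the most recent one.
db.setProfilingLevel(2)
db.system.profile.find().sort({ ts: -1 }).limit(1).pretty()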
Otherwise you could roll your own with a temporary client-only mongo collection:
Queries = new Mongo.Collection(null);
Create an object containing your query, remove the previous record, and insert the new one.
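As a small sketch of that roll-your-own idea (the logQuery helper and its fields are invented for illustration):

// Client-only collection: lives in minimongo only, never synced to the server.
Queries = new Mongo.Collection(null);

function logQuery(selector, options) {
    // Keep only the most recent query: remove existing records by _id, then insert.
    Queries.find().forEach(function (doc) { Queries.remove(doc._id); });
    Queries.insert({
        selector: JSON.stringify(selector),
        options: JSON.stringify(options || {}),
        ts: new Date()
    });
}

// Call logQuery right before running a query you may want to inspect later:
logQuery({ userId: Meteor.userId() }, { sort: { createdAt: -1 } });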

MongoDB query executes in 1ms on mongo-shell but takes 400ms and more on NodeJS

I have a large MongoDB collection, containing more than 2GB of raw data, and I use a very simple query to fetch a specific document from the collection by its id. Document sizes currently range from 10KB to 4MB, and the id field is indexed.
This is the query I'm using (with the mongojs module):
db.collection('categories').find({ id: category_id },
    function(err, docs) {
        callback(err, docs.length ? docs[0] : false);
    }).limit(1);
When I execute this query using MongoDB shell or a GUI such as Robomongo it takes approximately 1ms to fetch the document, no matter what its physical size, but when I execute the exact same query on NodeJS the response time ranges from 2ms to 2s and more depending on the amount of data. I only measure the time it takes to receive a response and even in cases where NodeJS waits for more than 500ms the MongoDB profiler (.explain()) shows it took only a single millisecond to execute the query.
Now, I'm probably doing something wrong but I can't figure out what it is. I'm rather new to NodeJS but I had experience with MongoDB and PHP in the past and I never encountered such performance issues, so I tend to think I'm probably abusing NodeJS in some way.
I also tried profiling using SpyJS on WebStorm; I saw there are a lot of bson.deserialize calls which quickly add up into a large stack, but I couldn't investigate further because SpyJS always crashes at this point. It's probably related, but I still have no idea how to deal with it.
Please advise, any leads will be appreciated.
Edit:
This is the result of db.categories.getIndexes():
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "my_db.categories"
    },
    {
        "v" : 1,
        "key" : {
            "id" : 1
        },
        "name" : "id_1",
        "ns" : "my_db.categories"
    }
]
I also tried using findOne which made no difference:
db.collection('categories').findOne({ id: category_id },
    function(err, doc) {
        callback(err, doc || false);
    });
My guess is the .limit(1) is ignored because the callback is provided early. Once find sees a callback it's going to execute the query, and only after the query has been sent to mongo will the .limit modifier try to adjust it, but by then it's too late. Recode as follows and see if that solves it:
db.collection('categories').find({ id: category_id }).limit(1).exec(
    function(err, docs) {
        callback(err, docs.length ? docs[0] : false);
    });
Most likely you'll need to have a combination of normalized and denormalized data in your object. Sending 4MB across the wire at a time seems pretty heavy, and likely will cause problems for any browser that's going to be doing the parsing of the data.
Most likely you should store the top 100 products, the first page of products, or some smaller subset that makes sense for your application in the category. This may be the top alphabetically, most popular, newest, or some other app-specific metric you determine.
When you go about editing a category, you'll use the $push/$slice method to ensure you avoid unbounded array growth.
Then when you actually page through the results you'll do a separate query to the individual products table by category. (Index that.)
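For illustration, a rough sketch of that $push/$slice update might look like the following, assuming a products array embedded in each category document and capped at the 100 most recent entries (the field names are guesses based on the discussion above):

// Keep only the newest 100 products embedded in the category document.
db.collection('categories').update(
    { id: category_id },
    {
        $push: {
            products: {
                $each: [newProduct],
                $slice: -100   // trim the array to its last 100 elements
            }
        }
    },
    function (err, result) {
        if (err) return callback(err);
        callback(null, result);
    }
);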
I've written about this before here:
https://stackoverflow.com/a/27286612/68567
