How slow is MongoDB's skip() in Mongoose? (Node.js)

I've seen a lot of questions about this, and all the answers seem to imply that the .skip() method is expensive and that I should use ranges instead. What I'm trying to ask is: when does it get slower? I have a database of 11,000 records; will pagination with a limit of 25 be slow?
Also, is there an alternative or a better way of doing pagination in Mongoose? Is the mongoose-paginate module fast? How does it work (skip or something else)?
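From what I gather, the range-based approach people recommend looks something like this (my own sketch, with a made-up Item model, sorting by _id):

const mongoose = require('mongoose');
const Item = mongoose.model('Item', new mongoose.Schema({ name: String }));

// skip-based paging: the server still walks past every skipped document,
// so cost grows with the page number
async function pageBySkip(pageNumber, pageSize) {
  return Item.find().sort({ _id: 1 }).skip(pageNumber * pageSize).limit(pageSize).exec();
}

// range-based paging: seek past the last _id of the previous page,
// so cost stays roughly constant no matter how deep you go
async function pageByRange(lastSeenId, pageSize) {
  return Item.find({ _id: { $gt: lastSeenId } }).sort({ _id: 1 }).limit(pageSize).exec();
}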
Thanks :)

Related

Using Mongoose Rather than the MongoDB Driver to Find Documents

We are in the process of writing tests for our Node/MongoDB backend and I have a question about finding documents.
My understanding is that it's preferable to use Mongoose to get your documents as opposed to the MongoDB driver. In other words, doing Customer.findOne().exec() instead of setting up a db connection and then doing db.collection("customers").findOne().
Other than the first option (using Mongoose to find the doc) being slightly less verbose, I'm curious what the other reasons are. Is a straight MongoDB lookup a bigger drag on the database?
One of the great features of Mongoose is the built-in validation mechanism. The populate method, which pulls in data from multiple collections, is another great characteristic of Mongoose.
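A minimal sketch of both features (the Customer/Order schemas here are invented for illustration):

const mongoose = require('mongoose');

const Order = mongoose.model('Order', new mongoose.Schema({ total: Number }));

const customerSchema = new mongoose.Schema({
  // built-in validation: saving a customer without an email is rejected
  email: { type: String, required: true },
  // references another collection, resolvable with populate()
  orders: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Order' }]
});
const Customer = mongoose.model('Customer', customerSchema);

// populate() swaps the stored ObjectIds for the actual Order documents
async function findWithOrders(email) {
  return Customer.findOne({ email }).populate('orders').exec();
}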
In terms of query performance, here is a good read:
https://medium.com/@bugwheels94/performance-difference-in-mongoose-vs-mongodb-60be831c69ad
Hope this helps :)

Efficient pagination in Cosmos DB

I need to implement efficient pagination for Cosmos DB with the Node.js API. There are many examples of the implementation with .NET and LINQ, but I could not find anything good for Node.js. The idea is to send the pageSize and pageIndex and get the relevant result.
I already know we can always use dbClient.queryDocuments, get the queryIterator, and perform the pagination, but this requires always iterating from the first document in the DB. An example can be found here.
Any idea how to do it in an efficient way?
Unfortunately, Cosmos DB as an engine doesn't have skip-and-take pagination support yet.
It is, however, a planned feature.
The blogs you've read provide one of the few viable workarounds for now, which of course comes with a cost.
You could write something smarter: instead of iterating through every document from the beginning, keep the request's continuation token and use it with your next request. That way you can implement previous/next button logic.
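A rough sketch of that continuation-token approach with the newer @azure/cosmos SDK (the database, container, and query here are placeholders):

const { CosmosClient } = require('@azure/cosmos');

const client = new CosmosClient({
  endpoint: 'https://<account>.documents.azure.com',
  key: '<key>'
});
const container = client.database('mydb').container('items');

// fetch one page; pass the token returned by the previous page to continue from there
async function getPage(pageSize, continuationToken) {
  const iterator = container.items.query('SELECT * FROM c', {
    maxItemCount: pageSize,
    continuationToken // undefined on the first page
  });
  const { resources, continuationToken: nextToken } = await iterator.fetchNext();
  // hand nextToken back to the caller so it can request the next page
  return { items: resources, nextToken };
}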

Mongoose populate option and query time

I am working on a platform where I use Mongoose's .populate a number of times in all my queries. I turned on Mongoose debug mode and found that there is hardly any difference in query execution time with and without populate (for 100 documents now; there will be 100,000 in the future).
I know that populate is basically doing a findOne query internally. My question is: will using .populate increase my query time, and will it affect my performance once the number of records reaches millions? Also, is there any alternative I can choose to increase performance?
In general, you're correct - you want to avoid using populate() since it will issue another query for each row. Keep in mind that this is a full round-trip to the server. Mongo doesn't have any sort of concept for a join, so when you do populate you're issuing an additional query for each row in your returned set.
The technique to work around this is to denormalize your data - don't design a Mongo database like a relational one. The Mongo docs have lots of information on how to do this: https://docs.mongodb.org/manual/core/data-model-design/. One important thing to keep in mind with Mongo design is that you never want to have a subdocument with unbounded growth. Due to the way Mongo space allocation and paging works, this can cause severe performance problems, so if you're in a situation like this it's best to normalize.
Another very common technique is subdocument caching. This is where you take partial data from the "joined" collection and cache it on the collection you're querying. In this case, you're trading space for performance because you have duplicate data. Also, you'll have to make sure you keep the data updated whenever there's a change. With mongoose, it is easy to do this as a post-save hook on the model of the foreign collection.
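A rough sketch of such a hook (the Author/Post models and the cached authorName field are hypothetical):

const mongoose = require('mongoose');

const postSchema = new mongoose.Schema({
  title: String,
  authorId: mongoose.Schema.Types.ObjectId,
  authorName: String // cached copy, so reads need no second query
});
const Post = mongoose.model('Post', postSchema);

const authorSchema = new mongoose.Schema({ name: String });

// whenever an author is saved, refresh the cached name on their posts
authorSchema.post('save', async function (author) {
  await Post.updateMany({ authorId: author._id }, { authorName: author.name });
});
const Author = mongoose.model('Author', authorSchema);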

What is the best practice for MongoDB to handle 1-n and n-n relationships?

In a relational database, 1-n and n-n relationships mean two or more tables.
But in MongoDB, since it is possible to store those things directly in one model, like this:
Article {
  content: String,
  uid: String,
  comments: [Comment]
}
I am getting confused about how to manage those relations. For example, in the article-comments model, should I store all the comments directly in the article model and then read the entire article object out as JSON every time? But what if the comments grow really large? Say there are 1,000 comments in an article object: will such a strategy make the GET process very slow every time?
I am by no means an expert on this; however, I've worked through similar situations before.
From the few demos I've seen, yes, you should store all the comments directly inline. This is going to give you the best performance (unless you're expecting some ridiculous number of comments). This way you have everything in your document.
In the future, if things start going great and you notice things getting slower, you could do a few things. You could store the latest (insert arbitrary number) comments with a reference to where the rest are stored, then map-reduce old comments out into a "bucket" to keep loading times quick.
However, initially I'd store it in one document.
So you'd have a model that looked something like this:
Article {
  content: String,
  uid: String,
  comments: [
    { "comment": "hi", "user": "jack" },
    { "comment": "hi", "user": "jack" }
  ],
  "oldCommentsIdentifier": 12345
}
Then only have oldCommentsIdentifier populated if you did move comments out of your comments array; however, I really wouldn't do this for fewer than 1,000 comments, and maybe even more. It would take a bit of testing here to see what the "sweet spot" would be.
I think a large part of the answer depends on how many comments you are expecting. Having a document that contains an array that could grow to an arbitrarily large size is a bad idea, for a couple of reasons. First, the $push operator tends to be slow because it often increases the size of the document, forcing it to be moved. Second, there is a maximum BSON document size of 16 MB, so eventually you will not be able to grow the array any more.
If you expect each article to have a large number of comments, you could create a separate "comments" collection, where each document has an "article_id" field that contains the _id of the article that it is tied to (or the uid, or some other field unique to the article). This would make retrieving all comments for a specific article easy, by querying the "comments" collection for any documents whose "article_id" field matches the article's _id. Indexing this field would make the query very fast.
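A minimal sketch of that layout (the schema and field names are just one way to do it):

const mongoose = require('mongoose');

const commentSchema = new mongoose.Schema({
  // indexed so "all comments for this article" is a fast lookup
  article_id: { type: mongoose.Schema.Types.ObjectId, index: true },
  user: String,
  comment: String
});
const Comment = mongoose.model('Comment', commentSchema);

// one indexed query returns every comment for a given article
async function commentsFor(articleId) {
  return Comment.find({ article_id: articleId }).sort({ _id: 1 }).exec();
}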
The link that limelights posted as a comment on your question is also a great reference for general tips about schema design.
But if I solve this problem by linking articles and comments with an _id, won't that kind of go back to relational database design, and somehow lose the essence of being NoSQL?
Not really; NoSQL isn't all about embedding models. In fact, embedding should be considered carefully for your scenario.
It is true that the aggregation framework solves quite a few of the problems you can get from embedding objects that you need to use as documents themselves. I define subdocuments that need to be used as documents as:
Documents that need to be paged in the interface
Documents that might exist across multiple root documents
Documents that require advanced sorting within their group
Documents that, when in a group, will exceed the root document's 16 MB limit
As I said, the aggregation framework does solve this a little; however, you're still looking at performing a query that, in real time or close to it, would be much like performing the same query in SQL on the same number of documents.
This effect is not always desirable.
You can achieve paging (sort of) of subdocuments with normal querying using the $slice operator, but this carries pretty much the same problems as using skip() and limit() over large result sets, which again is undesirable since you cannot fix it so easily with a range query (the aggregation framework would be required again). Even with 1,000 subdocuments I have seen speed problems, and not just me but other people too.
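For reference, $slice paging looks roughly like this (a sketch; the Article schema and page size are assumptions):

const mongoose = require('mongoose');
const Article = mongoose.model('Article', new mongoose.Schema({
  content: String,
  comments: [{ comment: String, user: String }]
}));

// e.g. page index 2 with pageSize 25 skips 50 and takes 25 inside the array.
// the server still loads the whole document, which is why this degrades
// like skip()/limit() as the array grows
async function commentsPage(articleId, page, pageSize) {
  return Article.findOne(
    { _id: articleId },
    { comments: { $slice: [page * pageSize, pageSize] } }
  ).exec();
}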
So let's get back to the original question: how to manage the schema.
Now the answer, which you're not going to like, is: it all depends.
Do your comments meet the criteria above for being separated? If so, then that is probably a good bet.
There is no single best way to do this. In MongoDB you should design your collections according to the application that is going to use them.
If your application needs to display comments with the article, then I can say it is better to embed those comments in the article collection. Otherwise, you will end up with several round trips to your database.
There is one scenario where embedding does not work. As far as I know, document size is limited to 16 MB in MongoDB, which is actually quite large. However, if you think your document size can exceed this limit, it is better to have a separate collection.

NoSQL database with high read performance (write accesses are not significant)?

I'm working on a "real-time" website using Nodejs. Currently, I'm using Redis because I need high performance for read-access. The write accesses are not really significant for my use case.
In addition, Redis does not have a query language for searching, so I create my indexes manually and use unions/intersections/... to find values.
I think it would be easier to use MongoDB, with its built-in query system and an ORM-like layer (Mongoose, for example). The problem is that I'm not sure MongoDB is the best choice for my use case.
What is your advice about the NoSQL DB that I need? Redis? CouchDB? MongoDB? Cassandra? etc.
I repeat: I want really good performance for read accesses and for searches (the write accesses are not significant), and the simplest possible setup (ORM-like? built-in query system? etc.)
Thanks.
I believe that Redis would be the better solution, for the following reasons.
You require fast read access, and Redis provides the fastest solution, since the keys, if not most of the data, are held in memory.
Although MongoDB is easier to query in the general case, your problem domain is narrow, and once you decide how you would like to query the data, you can put the correct data structures and indexes in place.
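A sketch of that manual-index approach with Redis sets (the key names and the node-redis v4 client are assumptions):

const { createClient } = require('redis');

async function main() {
  const client = createClient();
  await client.connect();

  // maintain one set per attribute value as a manual secondary index
  await client.sAdd('idx:color:red', ['item:1', 'item:2']);
  await client.sAdd('idx:size:large', ['item:1']);

  // "search" = set intersection: items that are both red and large
  const ids = await client.sInter(['idx:color:red', 'idx:size:large']); // ['item:1']
  console.log(ids);

  await client.quit();
}

main();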
I would say that Redis is a good fit for your DB, and you should look at something like Solr or elasticsearch to provide your searching.
CouchDB will do better in a write-heavy environment, though I don't use it myself.
MongoDB will do better in a read-heavy environment.
For search and indexing:
MongoDB would require a separate index for each of your search criteria for better performance (at least, that is what I remember).
Proper indexing is important in MongoDB. And there are no joins!
Here are some links you might go through:
http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
http://www.snailinaturtleneck.com/blog/2009/06/29/couchdb-vs-mongodb-benchmark/
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Hope these will help you find the right DB.
Good luck!
