I am studying about pagination and I have some questions.
What is the difference between two approches?
Best use-case for a cursor based pagination?
Can cursor based pagination go to a specific page?
Can cursor based pagination go back to the previous page?
Are there any performance differences between the two?
My thoughts
I think cursor based is much more complex which makes offset based pagination more desirable. Only real-time data centric system needs a cursor based pagination.
Cursor pagination is most often used for real-time data due to the frequency new records are added and because when reading data you often see the latest results first. There different scenarios in which offset and cursor pagination make the most sense so it will depend on the data itself and how often new records are added. When querying static data, the performance cost alone may not be enough for you to use a cursor, as the added complexity that comes with it may be more than you need.
Quoted from this awesome blog post, happy coding!
Also, check this out:
Pagination is a solution to this problem that ensures that the server only sends data in small chunks. Cursor-based pagination is our recommended approach over numbered pages, because it eliminates the possibility of skipping items and displaying the same item more than once. In cursor-based pagination, a constant pointer (or cursor) is used to keep track of where in the data set the next items should be fetched from.
This explanation is from Appolo GraphQL docs.
This post explains the difference between them.
What is the difference between two approaches?
The difference is big. One paginates using offset, and the other paginates using cursors. There are multiple pros/cons of both approaches. For example, offset pagination allows you to jump to any page, while in Cursor-based pagination, you can only jump into the next/previous page.
There is also a substantial difference in the implementation underneath. The Offset will very likely have to load all the records from the first page to the page you want to get. There are techniques to avoid this.
For more pros and cons I suggest reading the article.
Best use-case for a cursor based pagination?
If you're using a relational Database and have millions of records. Querying high Offset will probably take a lot of time/timeout, while cursor pagination will be more performant.
Can cursor based pagination go to a specific page?
No, this is one of the disadvantages of the approach.
Can cursor based pagination go back to the previous page?
It's a very common technique to provide two cursors as a response, one containing the previous page and the other with the next page. If that's the case, you can go to the previous page. Otherwise, you cannot.
Are there any performance differences between the two?
Yes! see my comment above.
Related
Reading the documentation for the Azure Search .NET SDK, I see that the ContinuationToken property is not supposed to used for pagination (this is the same as the #odata.nextLink and #search.nextPageParameter properties in the REST API).
Note that this property is not meant to help you implement paging of search results. You can implement paging using the Top and Skip search parameters.
Source
Why can't I use it for pagination? I have a situation where I want to run a query and then step through a static copy of the results page by page. I don't want those query results to change beneath my feet, however, as I am navigating through them, as new documents are added to the underlying database. In my case, there could be hundreds or thousands of results that get added in the minute or two between submitting the initial query and navigating to another page. How could I accomplish this?
Your question can be addressed in two parts:
Why is it not recommended to use ContinuationToken to implement pagination?
How can pagination be implemented such that results remain completely stable from page to page?
These are actually unrelated questions, since nothing about ContinuationToken guarantees the stability of the search results. Azure Search makes no consistency guarantees around paging, whether you use $top and $skip or ContinuationToken.
For question #1, the reason ContinuationToken is not recommended for paging is that Azure Search controls when the token is returned, not your application code. If you make assumptions about how and when Azure Search decides to return you a token, there's a chance those assumptions may break with a future service update. The intent of ContinuationToken is to prevent requests for too many documents from overwhelming the service, so you should assume that it is entirely at the service's discretion whether it will return a token.
For question #2, since Azure Search doesn't provide consistency guarantees, you can't completely avoid issues like the same document showing up in multiple pages, missing documents, or documents that are deleted by the time they are seen in results. Even if you wanted to build your own snapshot of the results and page over them in your application code, building a consistent snapshot isn't possible in the first place. However, if your only concern is to avoid showing new documents in the results, you can include a created timestamp field in your index and filter on that in every search request.
Frankly, unless you're trying to export the entire contents of your index, I would question the need for such strong consistency guarantees around paging. Google and Bing make no such guarantees, so arguably user expectations are already set around this. If you are trying to export your data, this is unfortunately not easy with Azure Search today. In that case, please vote on this User Voice item to help the team prioritize this scenario.
I'm playing with cassandra for some time and the one thing I'm less satisfied with is the previous page pagination.
As far as I can understand cassandra has auto paging support. All I have to give is PageSize and the PageState and its returning the next set of rows.
I have no problem with the "Next" page link since everytime I query cassandra it returns the next PageState.
However I have no idea what is the right way to implement previous page link. Since my project is a web app, its very important to have previous page link.
At the moment the only way I can go back to previous page is by storing all past PageStates in Sessions.
This is fine for a few page site. But the reason I choose cassandra is for big data. I don't wanna keep track of all past PageStates.
I don't want to expose the page state in browser either because of security reasons. What is the proper way to implement paging with proper previous page link?
Please take a look at the following Backward paging in cassandra c# driver.
We have implemented similar thing with encryption though.
I am facing the following challenge in an XPage: There are three databases with exactly the same views in it. The goal is to unite these three views from the three databases in one XPage and one view component!
AFAIK, one can usually provide just one view per view component. Currently, I have a Java back end where the documents are fetched. They are then processed to HTML markup and made more beautiful / functional by using jQuery data tables.
I see (at least) three disadvantages:
It is quite some code and if you want to display another view from the databases you quickly run into boiler plate code...
It is not too fast as it takes up to 30 sec. to fetch and display all records.
I can hardly image that my way is best practice.
Has anyone ever faced this challenge? I would like to reduce Java code, make it faster and use some standard component if possible.
Tim has good questions in his comment. With your current approach make sure you use ViewNavigator cache which is the fastest way to retrieve view entries:
Notes/Domino Release 8.52 or greater
View.setAutoUpdate must be False
ViewNavigator cache must be enabled
ViewNavigator.getNext() (or getPrev) must be used
http://www-10.lotus.com/ldd/ddwiki.nsf/dx/Fast_Retrieval_of_View_Data_Using_the_ViewNavigator_Cache
When i first query a CouchDB/Couchbase view it needs to be calculated. This can take a good while if there are large number of docs and that for each single view..
Is there any way of replicate an already calculated view from one Couch to another?
Not directly through CouchDB replication, no, there's all sorts of practical complexities in how that would have to be implemented that make it impractical I'm afraid.
For starters it means that CouchDBs have to carefully manage replication of view calculation of changes simultaneously somehow exactly in sync with the actual data (so you don't ever get newer view calculations than data), and that then gets further complicated by the fact that views only get updated when requested, so view data on either end could be out of date (and if users are querying with stale=ok, it might even be required to stay out of date).
I believe you can do it by directly copying the view index files (in /var/lib/couchdb/.DBNAME_design/SOMEHASH.view by default I think), if you just need a once-off view sync. I'd recommend against doing that frequently as a general solution though, since it's not officially supported AFAIK and is likely to be pretty fragile.
This isn't directly the answer to your question, although as PimTerry pointed out, replicating the view index is not supported, especially between different implementation.
What you can do instead is follow the procedure described here:
http://wiki.apache.org/couchdb/How_to_deploy_view_changes_in_a_live_environment
This way you can have your couchdb calculate the new index "in background" without blocking the usage of your application.
Hope this helps.
I have some basic views and some map/reduce views with logic. Nothing too complex. Not too many documents. I've tried with 250k, 75k, and 10k documents. Seems like I'm always waiting for view indexing.
Does better, more efficient code in the view help? I'm assuming it's basically processing the view at all levels of aggregation. So there must be some improvement there.
Does emit()-ing less data help? emit(doc.id, doc) vs specifying fewer fields?
Do more or less complex keys impact view indexing?
Or is it all about memory, CPU cores, and processor speed?
There must be some documentation out there, but I can't find anything referencing ways to improve performance.
I would take a deeper look into the reduce function. Try to use the built-in Erlang functions like _sum, _count, instead of writing Javascript.
Complex views can take hours and more, that's normal.
Maybe post such not too complex map/reduce.
And don't forget: indexing all docs is only done once after changing the view (or pushing a whole bunch of new docs). Subsequent new docs are indexed incrementally.
Use a view with &stale=ok to retrieve the "old" data instantly, so you don't have to wait. (But pay attention: you always have to call a view without stale=ok at least once to trigger the indexing process). Or better: use stale=update_after.
The code you write in views is more like CREATE INDEX than SELECT. It should be irrelevant how long it takes, as long as the view builds keep up with the document change rate. Building a view is a sunk (one-time) cost.
When you query the view, that is always a binary tree scan, which operates against a static data set in logarithmic time. That is usually the performance people care about more (in production.)
If you are not seeing behavior like I describe, perhaps we could discuss your view functions and your general approach to your problem. CouchDB is very different from relational databases. In the latter, you have highly structured data and free-form queries. In CouchDB, you have free-form data but highly structured index definitions (views). Except during development, changing and rebuilding views should be rare.
not emitting anything will help, but doing the view creation in smaller batches ( there are scripts that do this automagically ) helps more than anything other than not emitting anything at all, which can't be helped sometimes.