Keyset Pagination for Spring Data JDBCs Pageable

Keyset Pagination for Spring Data JDBCs Pageable - pagination

AFAIK, the Pageable class supports only LIMIT/OFFSET based paging. However, while being a quite universal solution, it comes with some downsides as outlined here https://momjian.us/main/blogs/pgblog/2020.html#August_10_2020
Keyset Pagination (aka Seek Method or Cursor-based Pagination) has some benefits in terms of performance and behavior during concurrent data inserts and deletes. For details see
https://use-the-index-luke.com/no-offset
http://allyouneedisbackend.com/blog/2017/09/24/the-sql-i-love-part-1-scanning-large-table/
https://slack.engineering/evolving-api-pagination-at-slack-1c1f644f8e12
https://momjian.us/main/blogs/pgblog/2020.html#August_17_2020
So, are there any plans to support this pagination method, e.g. via Pageable<KeyType> and getKey() that then gets incorporated into the SQLs WHERE clause?

This possibility was discussed in the team and while not considered urgent it is something we would like to offer eventually.
The first step would be to provide support for this in Spring Data Commons, i.e. a persistence store independent API. The issue for this is DATACMNS-1729

Related

How to overwrite generic ODATA expand handling functionality

We are currently working on performance issues with our provided OData interface, since the UI5 issues a read request with multiple expand paths attached. Due to the generic handling of the request by the framework this leads to an additional processing per expand option, which we need to prevent.
Reading the blog about this topic there seems to be a way to overwrite the generic handling somehow:
https://blogs.sap.com/2018/03/19/sap-cloud-platform-sdk-for-service-development-create-odata-service-7-more-navigation-read-create-expand-sqo/
In this case it is us who need to decide if we can afford to rely on the FWK-functionality. Of course, such generic support cannot be performant. But for small amount of data it is just nice to get it for free.
Stay tuned to learn how to overwrite such generic FWK-functionality with own specific implementation.
However, there is no further blog post on this and looking through the framework, my only idea to overwrite this would be to configure and use an own com.sap.gateway.core.api.provider.data.IDataProvider implementation which handles the request in a custom way, although this would be an immense workaround.
So the questions is if there is some leaner or easier approach to overwriting this functionality which I missed?
UPDATE:
I was update to create a custom data provider and register it with the RuntimeDelegate after servlet initialization. This custom data provider would then check for a custom annotation on the mapped method handler to see if expand should be handled or not. If not it will just read the entity, but not perform he generic expanded read. This works more or less fine, but what is of course missing is a way to pass the properties to be expanded in the ReadRequest. So far only a static implementation is possible solving our performance problem, but I would gladly have a hint if there is another, better solution for this...

At the time of this writing, no better approach exists at the moment.

Why can't ContinuationToken be used for paging in Azure Search API?

Reading the documentation for the Azure Search .NET SDK, I see that the ContinuationToken property is not supposed to used for pagination (this is the same as the #odata.nextLink and #search.nextPageParameter properties in the REST API).
Note that this property is not meant to help you implement paging of search results. You can implement paging using the Top and Skip search parameters.
Source
Why can't I use it for pagination? I have a situation where I want to run a query and then step through a static copy of the results page by page. I don't want those query results to change beneath my feet, however, as I am navigating through them, as new documents are added to the underlying database. In my case, there could be hundreds or thousands of results that get added in the minute or two between submitting the initial query and navigating to another page. How could I accomplish this?

Your question can be addressed in two parts:
Why is it not recommended to use ContinuationToken to implement pagination?
How can pagination be implemented such that results remain completely stable from page to page?
These are actually unrelated questions, since nothing about ContinuationToken guarantees the stability of the search results. Azure Search makes no consistency guarantees around paging, whether you use $top and $skip or ContinuationToken.
For question #1, the reason ContinuationToken is not recommended for paging is that Azure Search controls when the token is returned, not your application code. If you make assumptions about how and when Azure Search decides to return you a token, there's a chance those assumptions may break with a future service update. The intent of ContinuationToken is to prevent requests for too many documents from overwhelming the service, so you should assume that it is entirely at the service's discretion whether it will return a token.
For question #2, since Azure Search doesn't provide consistency guarantees, you can't completely avoid issues like the same document showing up in multiple pages, missing documents, or documents that are deleted by the time they are seen in results. Even if you wanted to build your own snapshot of the results and page over them in your application code, building a consistent snapshot isn't possible in the first place. However, if your only concern is to avoid showing new documents in the results, you can include a created timestamp field in your index and filter on that in every search request.
Frankly, unless you're trying to export the entire contents of your index, I would question the need for such strong consistency guarantees around paging. Google and Bing make no such guarantees, so arguably user expectations are already set around this. If you are trying to export your data, this is unfortunately not easy with Azure Search today. In that case, please vote on this User Voice item to help the team prioritize this scenario.

sub partitioning or composite partitioning document db

In one article of msdn,
https://azure.microsoft.com/en-in/documentation/articles/documentdb-partition-data/,
there is a line which specifies that "sub-partitioning" or "complex partitioning" can be done. Does this mean :
There can be sub-partitioning inside a collection?
In a single DocumentDb, there can be more than one partitioning logic? For example, I will have four collections inside a single Document Db. Can two of them can be based on hash and the other two on range?
If either of those answers is YES, then can someone provide me a link that might lead me to an example of the same?

Answers:
There is no explicit method to sub-partition data within a collection. It's common to use a field to represent the type of document or to have isTypeA: true key value pairs on each document, but that's a convention that your application adopts. However, you can create multiple databases (default limit 5 but may be extended upon request) per account and each can have their own set of collections. I'm using that two-level hierarchy in (temporalize-api). TenantID determines my top-level partitioning (database) using a lookup table plus defaults. This allows me to pull critical or high value tenants into a less loaded database and leave everyone else in the default. I use a consistent hash on the EntityID for second-level partitioning (collection).
Sure, there is nothing preventing you from doing that. Pay particular attention to the excellent discussion in the last section (Developing a partitioned application) in the Aravind article you linked to. It includes a checklist of things you'll need to decide upon and implement. The partition resolvers provided for the .NET SDK do not take care of these issues for you.
I haven't yet seen open source examples of what I would consider a complete system including balancing when capacity is added, where to store the partition maps/meta-data, and query fan-out/aggregate optimization. I have a node.js one under way (temporalize-api) and actually in production. I've made decisions about how I'm going to do balancing and query fan-out and those are documented in the comments in that linked file, but I have not implemented all of them. I store the partition meta-data in the "first" collection of the "first" database.

RethinkDB Transaction to multiple documents/tables

I need to update 2 tables in one transaction.
In current version RethinkDB doesn't support transactions from the box. So, how can I achieve this?
I can make update in 2 ways:
Update 1st table. If success -> update second table.
Update 2nd tables async.
But how can I resolve case, when 1 of 2 updates was completed well, but another no? Yes, I can check result of update and revert update if error occured. But anyway, there can be case, when something happens with application (lost connection to Rethink, or just crash of script), but one of two updates was completed.
So, my data base will be in inconsistent state. And no way to resolve this.
So, is it possible to simulate transaction behavior in nodejs for RethinkDB?

The best you can do is two-phase commit. (MongoDB has a good document on how to do this, and the exact same technique should work in RethinkDB: http://docs.mongodb.org/master/tutorial/perform-two-phase-commits/ .)

RethinkDB supports per key linearizability and compare-and-set (document level atomicity) and it's known to be enough to implement application level transactions, more over you have several options to choose from:
If you need Serializable isolation level then you can follow the same algorithm which Google use for the Percolator system or Cockroach Labs for CockroachDB. I've blogged about it and create a step-by-step visualization, I hope it will help you to understand the main idea behind the algorithm.
If you expect high contention but it's fine for you to have Read Committed isolation level then please take a look on the RAMP transactions by Peter Bailis.
The third approach is to use compensating transactions also known as the saga pattern. It was described in the late 80s in the Sagas paper but became more actual with the raise of distributed systems. Please see the Applying the Saga Pattern talk for inspiration.

We had a similar requirement to implement transactional support in ReThinkDB, as we wanted to have transactions extending across MySQL and ReThinkDB DB boundaries. We had come-up with this micro library thinktrans https://github.com/jaladankisuresh/thinktrans, which is a promised based declarative javascript library for RethinkDB supporting Atomic transactions. However, It is still in its alpha stages
If you have a specific requirement and you may want to understand its approach Implementing Transactions in NoSQL Databases and implement your own.
Disclaimer: I am the author of this library

Does a B Tree work well for auto suggest/auto complete web forms?

Auto suggest/complete fields are used all over the web. Google has appeared to master it given that as soon as one types in a search query, suggestions are returned almost instantaneously.
I'm assuming the framework for achieving this involves a fast, in-memory data store on the web tier. We're building a Grails app based around retail products, so a user may search for Can which should suggest things like Canon, Cancun, etc, and wondering if a Java B-tree cached in memory would suffice for quick auto completes returned as JSON over AJAX. Outside of the jQuery AutoComplete field, do any frameworks and/or libraries exist to facilitate the development of this solution?

Autocomplete is a text matching, information retrieval problem. Implementing your own B-tree and writing your own logic to match words to other words is something you could do. But then you would have to implement Porter Stemming, a Vector Space Model, and a String-edit distance calculation.
...or you could use Lucene and its derivatives, which do a lot of this stuff already. If you really care about the data structures used to store this stuff, you could dive into its source. But I highly doubt writing your own and doing it all yourself would be more maintainable and efficient in the long run.
One of the more popular Grails ecosystem plugins for this is Searchable, which was mentioned in Ledbrook & Smith's Grails in Action. It uses Lucene under the covers, and make sit pretty easy to add full-text search to your domain classes. (For example, check out chapter 8 in GinA or the searchable docs).

The Grails Richui plugin has an autocomplete that I've used in the past. We had it hooked up to hit the database every keystroke (which I would not suggest but our data changed often enough that real-time data was required). If your list of things is pretty static though then it could probably work well for you.
http://grails.org/plugin/richui#AutoComplete

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string