How to power a windowed virtual list with cursor-based pagination?

Take a windowed virtual list with the capability of loading an arbitrary range of rows at any point in the list, as in the following example.
The virtual list provides a callback that is called whenever the user scrolls to rows that have not been fetched from the backend yet, passing the start and stop indexes, so that, with an offset-based pagination endpoint, I can fetch the required items without fetching any unnecessary data:
const loadMoreItems = (startIndex, stopIndex) => {
  fetch(`/items?offset=${startIndex}&limit=${stopIndex - startIndex}`);
};
I'd like to replace my offset-based pagination with a cursor-based one, but I can't figure out how to reproduce the above logic with it.
The main issue is that I feel like I would need to download all the items before startIndex just to obtain the cursor needed to fetch the items between startIndex and stopIndex.
What's the correct way to approach this?

After some investigation I found what seems to be the way MongoDB approaches the problem:
https://docs.mongodb.com/manual/reference/method/cursor.skip/#mongodb-method-cursor.skip
Obviously the same approach can be adopted by any other backend implementation.
They provide a skip method that allows skipping an arbitrary number of items after the provided cursor.
This means my sample endpoint would look like the following:
/items?cursor=${cursor}&skip=${skip}&limit=${stopIndex - startIndex}
I then need to figure out the cursor and the skip values.
The following code could work to find the closest available cursor, given I store them together with the items:
// Limit our search only to items before startIndex
const fragment = items.slice(0, startIndex);
// Find the closest cursor index
const cursorIndex = fragment.length - 1 - fragment.reverse().findIndex(item => item.cursor != null);
// Get the cursor
const cursor = items[cursorIndex];
And of course, I also have a way to know the skip value, i.e. the number of items between the cursor and startIndex:
const skip = startIndex - 1 - cursorIndex;
(Note that the count must be relative to startIndex rather than to the full items array, otherwise we would skip too far whenever items have already been loaded past startIndex.)
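Putting it all together, loadMoreItems could then look something like this (a sketch: the cursor+skip endpoint shape and the item.cursor field are the assumptions made above, and the no-cursor fallback assumes the endpoint accepts skip on its own):
const loadMoreItems = (startIndex, stopIndex) => {
  const fragment = items.slice(0, startIndex);
  const reversedIndex = fragment.reverse().findIndex(item => item.cursor != null);
  if (reversedIndex === -1) {
    // No cursor stored before startIndex yet: fetch from the start of the list
    return fetch(`/items?skip=${startIndex}&limit=${stopIndex - startIndex}`);
  }
  const cursorIndex = startIndex - 1 - reversedIndex;
  const cursor = items[cursorIndex].cursor;
  const skip = startIndex - 1 - cursorIndex;
  return fetch(`/items?cursor=${cursor}&skip=${skip}&limit=${stopIndex - startIndex}`);
};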

Related

Suitescript Pagination

I've been trying to create a Suitelet that allows a saved search to be run on a collection of item records in NetSuite using SuiteScript 1.0.
Pagination is quite easy everywhere else, but I can't get my head around how to do it in NetSuite.
For instance, we have 3,000 items and I'm trying to limit the results to 100 per page.
I'm struggling to understand how to apply a start row and a max row parameter as a filter so I can run the search to return the right number of records.
I've seen plenty of scripts that allow you to exceed the limit of 1,000 records, but I'm trying to throttle the amount shown on screen, and I'm at a loss to figure out how to do this.
Any tips greatly appreciated.
function searchItems(request, response)
{
    var start = request.getParameter('start');
    var max = request.getParameter('max');
    if (!start)
    {
        start = 1;
    }
    if (!max)
    {
        max = 100;
    }
    var filters = [];
    filters.push(new nlobjSearchFilter('category', null, 'is', currentDeptID));
    var productList = nlapiSearchRecord('item', 'customsearch_product_search', filters);
    if (productList)
    {
        response.write('stuff here for the items');
    }
}
You can approach this a couple of different ways. Either way, you will definitely need to sort your search results by something meaningful and consistent, like the internal ID. Make sure you've got your results sorted, either in your saved search definition or by adding a search column in your script.
You can continue building your search exactly like you are, and then just use the native slice method on the productList array, passing your start and max parameters (converted to zero-based indexes) as the arguments to slice, as sketched below.
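For instance, a minimal sketch of the slice approach, assuming it slots into the question's searchItems handler with its 1-based start parameter (and bearing in mind that nlapiSearchRecord returns at most 1,000 rows per call):
var productList = nlapiSearchRecord('item', 'customsearch_product_search', filters) || [];
// Convert the 1-based 'start' parameter to a zero-based array index
var pageStart = parseInt(start, 10) - 1;
var pageSize = parseInt(max, 10);
var page = productList.slice(pageStart, pageStart + pageSize);
for (var i = 0; i < page.length; i++)
{
    // the 'itemid' column is just an illustration of rendering a row
    response.write(page[i].getValue('itemid') + '<br/>');
}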
Another approach is to use the async API for searches. It will look similar to this:
var search = nlapiLoadSearch("item", "customsearch_product_search");
search.addFilter(new nlobjSearchFilter('category', null, 'is', currentDeptID));
// getResults takes a zero-based inclusive start index and an exclusive end
// index, and returns at most 1000 rows per call
var productList = search.runSearch().getResults(start, end);
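Wiring in the question's parameters might look like this (a sketch; it assumes the 1-based start parameter from the question's handler):
var startIndex = parseInt(start, 10) - 1;      // zero-based, inclusive
var endIndex = startIndex + parseInt(max, 10); // exclusive
var productList = search.runSearch().getResults(startIndex, endIndex);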
For more references on this approach, check out the NetSuite Help page titled "Search APIs" and the reference page for nlobjSearch.

Paginating a mongoose mapReduce, for a ranking algorithm

I'm using a MongoDB mapReduce to code a ranking feed algorithm; it almost works, but the last thing to implement is the pagination. mapReduce supports limiting the results, but how could I implement the offset (skipping), e.g. based on the latest viewed _id of the results, given that I'm using Mongoose?
This is the procedure I wrote:
o = {};
o.map = function() {
    // log10(likes + comments) / elapsed hours since the post creation
    emit(Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1), this);
};
o.reduce = function(key, values) {
    // sort the values when they have the same score
    values.sort(function(a, b) {
        return a.createdAt - b.createdAt;
    });
    // serialize the values, because mongoose does not support multiple returned values
    return JSON.stringify(values);
};
o.scope = {now: new Date()};
o.limit = 15;
Posts.mapReduce(o, function(err, results) {
    if (err) return console.log(err);
    console.log(results);
});
Also, if mapReduce is not the way to go, can you suggest another approach to implementing something like this?
What you need is a page delimiter, which is not the id of the latest viewed item as you say, but your sorting property. In this case, it seems to be the formula Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1).
So your mapReduce query needs to hold a where condition on the value of that formula, compared against the score of the last result already served. It also needs to hold the value of createdAt at the last page, since you don't sort by that (assuming createdAt is unique). So your mapReduce query would say where: theFormulaExpression, createdAt: { $lt: lastCreatedAt }.
If you do allow multiple identical createdAt values, you have to play a little outside of the database itself.
So you just search by the formula.
Ideally, that gives you one element with exactly that value, and the next ones sorted after it. So, in the reply to the module caller, remove this first element off the array (and make sure you actually ask for more results than you need because of this).
Now, since you allow multiple similar values, you need another identifying property, say the object id or created_at. Your consumer (the caller of this module) will have to provide both (the last value of the score and the createdAt of the last object). Say you have a page split exactly in the middle: one or more objects are on the previous page, another set on the next. You'd then have to remove not just the top value (because that same score was already served on the previous page), but possibly several of them from the top.
Then it gets really tricky, because potentially your whole page was already served: compare the _ids and look for the first one after the one your module caller has provided you with. Or look into the data and determine how many matching values like that there are, and try to get at least as many more values from mapReduce as you have on your actual page size.
Aside from that, I would do this with the aggregation framework instead; it should be much more performant.
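For illustration, here is a minimal sketch of that aggregation alternative: it recomputes the same score inside the pipeline and paginates with a keyset condition. lastScore and lastCreatedAt are hypothetical values carried over from the last item of the previous page, and the $addFields/$log10 operators assume a reasonably recent MongoDB version:
var now = new Date();
Posts.aggregate([
    // Recompute the ranking score from the map function inside the pipeline
    { $addFields: { score: {
        $divide: [
            { $log10: { $add: ['$likes', '$comments', 1] } },
            { $abs: { $add: [{ $divide: [{ $subtract: [now, '$createdAt'] }, 6e7] }, 1] } }
        ]
    } } },
    // Keyset pagination: keep only documents ranked strictly after the last
    // one served, breaking score ties with createdAt
    { $match: { $or: [
        { score: { $lt: lastScore } },
        { score: lastScore, createdAt: { $lt: lastCreatedAt } }
    ] } },
    { $sort: { score: -1, createdAt: -1 } },
    { $limit: 15 }
]).exec(function(err, page) {
    if (err) return console.log(err);
    console.log(page);
});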

Filtering Haystack (SOLR) results by django_id

With Django/Haystack/SOLR, I'd like to be able to restrict the result of a search to those records within a particular range of django_ids. Getting these IDs is not a problem, but trying to filter by them produces some unexpected effects. The code looks like this (extraneous code trimmed for clarity):
def view_results(request, arg):
    # django_ids list is first calculated using arg...
    sqs = SearchQuerySet().facet('example_facet')  # STEP_1
    sqs = sqs.filter(django_id__in=django_ids)     # STEP_2
    view = search_view_factory(
        view_class=SearchView,
        template='search/search-results.html',
        searchqueryset=sqs,
        form_class=FacetedSearchForm
    )
    return view(request)
At the point marked STEP_1 I get all the database records. At STEP_2 the records are successfully narrowed down to the number I'd expect for that list of django_ids. The problem comes when the search results are displayed in cases where the user has specified a search term in the form. Rather than returning all records from STEP_2 which match the term, I get all records from STEP_2 plus all from STEP_1 which match the term.
Presumably, therefore, I need to override one or more of the methods of SearchView in haystack/views.py, but which? Can anyone suggest a means of achieving what is required here?
After a bit more thought, I found a way around this. In the code above, the problem was occurring in the view = search_view_factory... line, so I needed to create my own SearchView class and override the get_results(self) method in order to apply the filtering after the search has been run with the user's search terms. The result is code along these lines:
class MySearchView(SearchView):
    def get_results(self):
        search = self.form.search()
        # The ID I need for the database search is at the end of the URL,
        # but this may have some search parameters on and need cleaning up.
        view_id = self.request.path.split("/")[-1]
        view_query = MyView.objects.filter(id=view_id.split("&")[0])
        # At this point the django_ids of the required objects can be found.
        if len(view_query) > 0:
            view_item = view_query[0]
            django_ids = []
            for thing in view_item.things.all():
                django_ids.append(thing.id)
            search = search.filter_and(django_id__in=django_ids)
        return search
Using search.filter_and rather than search.filter at the end was another thing which turned out to be essential, but which didn't do what I needed when the filtering was being performed before getting to the SearchView.

How to retrieve all documents in a CouchDB database without running out of memory

I have a CouchDB database which contains about 200,000 tweets; the keys are tweet IDs. I have a query which needs to retrieve all documents to look for some information. I'm using LightCouch to work with CouchDB in a Java web app. If I create a dbClient like this:
List<JsonObject> tweets = dbClient.view("_all_docs").query(JsonObject.class);
and then loop through tweets, using, for each JsonObject in tweets,
JsonObject tweetJson = dbClient.find(JsonObject.class, tweet.get("id").toString().replaceAll("\"", ""));
to retrieve each tweet one by one, it takes an extremely long time for 200,000 documents. If I load all documents in one single query using includeDocs(true)
List<JsonObject> allTweets = dbClient.view("_all_docs").includeDocs(true).query(JsonObject.class);
it causes an out-of-memory exception since the number of documents is too large. So how can I deal with this problem? I'm thinking about using limit(5000) to retrieve 5000 documents each time and loop through the whole database, but I don't know how to write the loop to continue retrieving the next 5000 after the first 5000 docs. One possible solution is using startKey and endKey, but I'm confused about how to use them when the key is a tweet ID.
Use queryPage, but make sure to use a String as the key.
See: https://github.com/lightcouch/LightCouch/issues/26#event-122327174
0.1.6 still seems to show this behaviour.
A workaround that I found for this goes something like this:
// Page through the changes feed instead of loading everything at once
Changes changes = dbClient.changes().includeDocs(true);
int size = 1;
String cursor = "0"; // or a saved sequence if you want an offset
while (size > 0) {
    ChangesResult resultSet = changes.since(cursor).limit(40000).getChanges();
    List<ChangesResult.Row> rowList = resultSet.getResults();
    for (ChangesResult.Row feed : rowList) {
        // instantiate your object via gson, e.g. from feed.getDoc()
    }
    cursor = resultSet.getLastSeq();
    size = rowList.size();
}

How to maintain counters with LinqToObjects?

I have the following C# code:
private XElement BuildXmlBlob(string id, Part part, out int counter)
{
    // return some unique xml particular to the parameters passed
    // remember to increment the counter also before returning.
}
Which is called by:
var counter = 0;
result.AddRange(from rec in listOfRecordings
                from par in rec.Parts
                let id = GetId("mods", rec.CKey + par.UniqueId)
                select BuildXmlBlob(id, par, counter));
The above code samples are symbolic of what I am trying to achieve.
According to Eric Lippert, the out keyword and LINQ do not mix. OK, fair enough, but can someone help me refactor the above so it does work? A colleague at work mentioned accumulators and aggregate functions, but I am a novice to LINQ and my Google searches weren't bearing any real fruit, so I thought I would ask here :).
To Clarify:
I am counting the number of parts, of which there could be any number each time the code is called. So every time the BuildXmlBlob() method is called, the resulting XML will have a unique element in it denoting the 'partNumber'.
So if the counter is currently on 7, that means we are processing the 7th part so far. The XML returned from BuildXmlBlob() will have the counter value embedded in there somewhere. That's why I need it to be passed in and incremented every time BuildXmlBlob() is called per run-through.
If you want to keep this purely in LINQ and you need to maintain a running count for use within your queries, the cleanest way to do so would be to make use of the Select() overload that includes the element's index in the query.
In this case, it would be cleaner to do a query which collects the inputs first, then use that overload to do the projection.
var inputs =
    from recording in listOfRecordings
    from part in recording.Parts
    select new
    {
        Id = GetId("mods", recording.CKey + part.UniqueId),
        Part = part,
    };
result.AddRange(inputs.Select((x, i) => BuildXmlBlob(x.Id, x.Part, i)));
Then you wouldn't need to use the out/ref parameter.
XElement BuildXmlBlob(string id, Part part, int counter)
{
    // implementation
}
Below is what I managed to figure out on my own:
result.AddRange(listOfRecordings.SelectMany(rec => rec.Parts, (rec, par) => new { rec, par })
    .Select(@t => new
    {
        @t,
        Id = GetStructMapItemId("mods", @t.rec.CKey + @t.par.UniqueId)
    })
    .Select((@t, i) => BuildPartsDmdSec(@t.Id, @t.@t.par, i)));
I used ReSharper to convert it into a method chain, which constructed the basics of what I needed, and then I simply tacked the Select statement on right at the end.
