Alfresco webscript (js) and pagination

I have a question about the right way to use pagination with Alfresco.
I know the documentation (https://wiki.alfresco.com/wiki/4.0_JavaScript_API#Search_API)
and I use the query part with success.
By that I mean that I use the maxItems and skipCount parameters and they work the way I want.
This is an example of a query that I am running:
var paging =
{
    maxItems: 100,
    skipCount: 0
};
var def =
{
    query: "cm:name:test*",
    page: paging
};
var results = search.query(def);
The problem is that, if I get the number of results I asked for (100 for example), I don't know how to get the total number of results for my query (I mean the total amount of results that Alfresco could give me for this query).
And I need this to:
know if there are more results
know how many pages of results remain
I'm using a workaround for the first need: I query for (maxItems + 1) results and show only maxItems. If I get maxItems + 1 results back, I know that there are more. But this doesn't give me the total number of results.
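Roughly, that workaround looks like this (an illustrative sketch, not my exact code):
var pageSize = 100;
var results = search.query({
    query: "cm:name:test*",
    page: { maxItems: pageSize + 1, skipCount: 0 }
});
var hasMoreItems = (results.length > pageSize);
// display only the first pageSize items (assuming a plain JS array is returned)
var pageItems = results.slice(0, pageSize);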
Do you have any idea?

With the JavaScript search object you can't know whether there are more items. This JavaScript object is backed by the class org.alfresco.repo.jscript.Search. As you can see, its query method only returns the query results without any extra information. Compare it with org.alfresco.repo.links.LinkServiceImpl, which gives you results wrapped in PagingResults.
So, as the JavaScript search object doesn't provide hasMoreItems info, you need to perform some workaround: for instance, first query without limits to learn the total, and then apply pagination as desired.
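A rough sketch of that workaround, reusing the query from the question (untested; note that the unconstrained query may be expensive and is still subject to any configured result limit):
// 1) run the query without paging to learn the total
var unlimited = search.query({ query: "cm:name:test*" });
var totalItems = unlimited.length;
// 2) then run the paged query as before
var pageSize = 100;
var results = search.query({
    query: "cm:name:test*",
    page: { maxItems: pageSize, skipCount: 0 }
});
var hasMoreItems = totalItems > pageSize;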

You can find out how many objects were found by your query simply by calling
results.length
paying attention to the fact that queries usually have a configured maximum result set of 1000 entries, to save resources.
You can change this value by editing the <alfresco>/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/repository.properties file.
So, as an alternative to your solution, you can launch a query with no constraints and obtain the real total (or the configured maximum).
Then you can use this value to work out how many pages are available, basing your calculation on the number of results per page.
Then dynamically pass the number of the current page to the builder of your query def, and the results variable will contain the corresponding chunk of data.
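For example, the page arithmetic could look something like this (variable names are illustrative):
var pageSize = 100;
var totalResults = results.length;   // length of the unconstrained query results
var totalPages = Math.ceil(totalResults / pageSize);
var currentPage = 2;                 // e.g. read from a request parameter
var def = {
    query: "cm:name:test*",
    page: { maxItems: pageSize, skipCount: (currentPage - 1) * pageSize }
};
var pageResults = search.query(def);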
In this SO post you can find more information about pagination.

Related

Converting the logic for cts:search to search:search

I have been working with cts:search in my project, but the response times feel a bit longer than expected. Can search:search help? If so, how?
For example, I have this query:
let $query := cts:and-query((cts:element-value-query(xs:QName("Applicability"),"Yes")))
and I want to fetch the document URIs. I was using:
cts:search(collection("abc"), $query)
and it returned the URIs, but how can the same thing be done using search:search?
Or is there something other than search:search that can help improve the execution time?
Are you interested in retrieving the documents, or just the URIs?
If you are only looking to retrieve the URIs of the documents that have an element with that value, then use cts:uris() instead of cts:search(). The cts:uris() function runs unfiltered and only returns URIs from the URI lexicon instead of retrieving the documents themselves, which can be a lot cheaper than cts:search() if you don't need the content.
cts:uris("", (), cts:and-query(( collection("abc"), $query)) )
When using cts:search, the first thing that I would try is to add the unfiltered option to your search and see if that helps.
By default cts:search executes filtered:
A filtered search (the default). Filtered searches eliminate any false-positive matches and properly resolve cases where there are multiple candidate matches within the same fragment. Filtered search results fully satisfy the specified cts:query.
So, try executing the same query with the "unfiltered" option:
cts:search(collection("abc"), $query, "unfiltered")
You could also look to create an index on that Applicability element, with either an element-range-index or a field-range-index, and then use the appropriate range-query instead of a value-query.

solr-api accessing document field is taking a lot of time

I am trying to access a field in a custom request handler. I am accessing it like this for each document:
Document doc;
doc = reader.document(id);
String[] docFields = doc.getValues("state");
There are around 600,000 documents in Solr. For a query running over all the docs, it takes more than 65 seconds.
I have also tried the SolrIndexSearcher.doc method, but it also takes around 60 seconds.
Removing the above lines of code brings the qtime down to milliseconds. But I need to access that field for my algorithm.
Is there a more optimised way to do this?
It seems that you are querying one document at a time, which is slow.
If you need to query all documents, try querying *:* (instead of asking for a specific id) and then iterate over the results.
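If the field is stored, one option along those lines is to let Solr return it directly in the response instead of loading each document yourself, for example (handler path and rows value are only placeholders):
/select?q=*:*&fl=id,state&rows=1000&start=0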

ember.js rest api with pagination

At the moment I have no clue how to achieve pagination with ember-data. I found out that I can return a 'meta' property in the response and ember-data does not throw an error. But I just don't know how to access it, nor what the intended purpose of this property is.
The few examples on the internet assume that I have the whole collection already loaded into Ember, or they do a little trick and build an infinite scroll, which doesn't require information about the page count.
I think that loading all records is OK if I have < 1k of them, but sometimes I'll be dealing with massive amounts of data (let's say Apache logs). What then?
So basically I'm at the point where I would like to use Ember and ember-data to build my first real-life application, but I just think that it is not a good idea.
OK, so does anybody have any idea how to solve this basic, yet complicated, problem? :)
OK, so here are some ideas to get you started.
First, you have to start with a route that takes a page number as a dynamic parameter.
this.resource('posts', { path: '/posts/:page' });
Then, since I have no experience with Silex, you need to support some kind of server-side parameters that can be used for pagination, for example offset and limit, where the first says how many records to skip and the second how many records to select from there. Ideally you should implement them as query parameters, like ?offset=0&limit=10.
Then you just implement your table route as follows:
App.TableRoute = Ember.Route.extend({
    model: function (params) {
        return App.Post.find({ offset: (params.page - 1) * 10, limit: 10 });
    }
});
You can then start doing some more magic and create your own items-per-page parameter, or validate the page number by fetching the total number of records in advance.
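For that last point, the arithmetic is simple once you know the total record count (how you obtain totalCount is up to your backend, e.g. a meta property or a separate count request; the names below are illustrative):
var perPage = 10;
var totalPages = Math.ceil(totalCount / perPage);
var pageIsValid = params.page >= 1 && params.page <= totalPages;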

Pagination in CouchDB using variable keys

There's a bunch of questions on here related to pagination using CouchDB, but none that quite fit what I'm wondering about.
Basically, I have a result set ranked by number of votes, and I want to page through the set in descending order.
Here's the map for reference.
function (doc) {
    emit(doc.votes, null);
}
Now, the problem. I found out that startkey_docid doesn't work on its own. You have to use it in combination with startkey. The thing is, for this query I don't use a startkey parameter (I'm not looking to restrict the results, just to get most->least). I was thinking I could just use startkey={{doc.votes}}&startkey_docid={{doc._id}} instead, but the number of votes for a document could have changed by the time someone clicks the "Next Page" link.
The way to solve this seemed obvious: just set startkey=99999999 so that it would return all documents in the database, and I could use startkey_docid to start at the one where we left off last time. Oddly, when I did that, startkey_docid stopped working and all results were returned again. Apparently startkey needs to exactly equal the key of the document whose _id is used in startkey_docid.
What I'm asking is whether anyone knows a workaround for using startkey_docid to page when the actual startkey could have changed by the time you want to use it. Should my application just look up the document by _id and immediately use the doc.votes value, hoping it hasn't changed in the few milliseconds between requests? Even that doesn't seem very reliable.
EDIT: Ended up switching to Mongo for the speed, so this question turned out to be kinda moot.
I have never done something like this, but I think I have some idea how to do it. What you can do is take a snapshot of the ratings and refer to it on every page. You probably want your view not to consume too much space, so you should not map separate copies of documents whose votes have not changed since the snapshot was taken. So, you can do the following:
1. Add a history of ratings, with timestamps, to your documents.
2. Map the ratings AND the history, like this.
3. In your app, get the current time (start_time = Date.now()) and query all pages.
4. Clean up history entries older than the oldest active session.
The problem is that if you emit [votes, date] and try to paginate, you will never know how many documents you have to fetch to get the desired number per page. There can always be some older version which you will have to skip, and then you will have to make another request to the DB. That's why you can consider emitting [date, votes] instead, reading the view twice -- for start_time and for the current time -- and merging and sorting the results (as in merge sort).
Ad.1:
{ ...,
  votes: 12,
  history: [
    {date: 1357390271342, votes: 10},
    {date: 1357390294682, votes: 11}
  ]
}
Ad.2:
function (doc) {
    emit([{}, doc.votes], null);
    doc.history && doc.history.forEach(function(h) {
        emit([h.date, h.votes], null);
    });
}
Ad.3:
?startkey=[start_time, votes]&limit=items_per_page_plus1
?startkey=[{}, votes]&limit=items_per_page_plus1
Merge the lists and sort by votes in your app (or in a list function).
If you have problems with using startkey_docid, you can emit [date, votes, id] and query with the ID explicitly. Even when this particular doc changes its votes, it will still be available in the history.
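A rough, untested sketch of that merge step, assuming both reads return rows keyed by [date, votes]:
function mergeByVotes(snapshotRows, currentRows) {
    var seen = {};
    var merged = [];
    snapshotRows.concat(currentRows).forEach(function(row) {
        if (!seen[row.id]) {    // keep each document only once, preferring the snapshot row
            seen[row.id] = true;
            merged.push(row);
        }
    });
    // sort by votes, descending; votes is the second element of the [date, votes] key
    merged.sort(function(a, b) { return b.key[1] - a.key[1]; });
    return merged;
}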
Ad.4:
If you emit [date, votes] then you can just get the outdated history with ?startkey=[0]&endkey=[oldest_active_session_time]&inclusive_end=false and update it with an update handler:
function(doc, req) {
    if (!doc || !doc.history) return [null, 'Error'];
    var history = new Array();
    var oldest = +(req.query.date);
    doc.history.forEach(function(h) {
        if (h.date >= oldest)
            history.push(h);
    });
    doc.history = history;
    return [doc, 'OK'];
}
Note: I have not tested it, so it is expected not to run without modifications :)
As far as I know, CouchDB uses B-tree shadowing for updates, so in principle it should be possible to access older revisions of the view. I am not deep into CouchDB's internals, though, so this is just a guess, and there seems to be no (documented) API for this.
I can't come up with any simple solution right now, but there are options:
Replicate your sorting list (not too often) to a small dedicated DB, so it will be much more stale than stale=ok.
Modify your schema so that you can sort by some more stable data. Look at the banking/ledger example in the CouchDB guide: http://guide.couchdb.org/draft/recipes.html#banking. Try to log every vote and reduce them hourly, for example. As a bonus you'll get a history/trends :)
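For that second option, a very rough sketch of the "log every vote" idea (document shape and field names are assumptions): store one small document per vote and let a built-in reduce do the counting.
// map: one row per vote, keyed by the post being voted on
function(doc) {
    if (doc.type === 'vote') {
        emit(doc.post_id, 1);
    }
}
// reduce: the built-in _sum, so querying with ?group=true returns the vote count per post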
I'm kind of surprised this question has been left unanswered, because CouchDB's Futon basically does this when you paginate through the results of a map function. I opened up Firebug to see what was happening in the JavaScript console as I paginated, and saw that for every set of paginated results it passes the startkey along with the startkey_docid. So although the question is how to paginate without including startkey, CouchDB specifies that the startkey is required and demonstrates how it can work. The endkey is not specified, so if there is only one result for the specified startkey, the next set of paginated results will also contain the next key of the sorted results that does not match the startkey.
So, to clarify a bit, the answer to this problem is that as you paginate and keep track of the startkey_docid, you also need to capture the startkey of that same document, which will be the start of the next set of results. When you request the next page, use both the captured startkey and startkey_docid, as CouchDB requires. Leave endkey off so that the results continue on into the next key of the sorted results.
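In other words, each "next page" request looks something like this (the values are whatever you captured from the first row of the next page):
?startkey=captured_votes_key&startkey_docid=captured_doc_id&limit=items_per_page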
The use case of wanting to paginate without specifying a key is kind of odd. Let's say the start docid of the next paginated result did change its key value drastically, from a 9 to a 3. And let's also assume that there is only one instance of the docid in the map results, even though it could potentially appear multiple times (which I believe is why the startkey needs to be specified). As the user clicks the next button, the user's paginated results will have moved from looking at rank 9 to rank 3. But if you include the startkey in addition to the startkey_docid, the paginated results will just start over at the beginning of the rank 9 results, which is a more logical progression than potentially jumping over a large set of results.

How can I configure Sitecore search to retrieve custom values from the search index

I am using the AdvancedDatabaseCrawler as a base for my search page. I have configured it so that I can search for what I want, and it is very fast. The problem is that as soon as you want to do anything with the search results that requires accessing field values, the performance degrades dramatically.
The main search results part is fine: even if there are 1000 results returned from the search, I am only showing 10 or 20 results per page, which means I only have to retrieve 10 or 20 items. However, in the sidebar I am listing various filtering options with the number of results associated with each option (eBay style). In order to retrieve these filter options I perform a relationship search based on the search results. Since the search results only contain SkinnyItems, it has to call GetItem() on every single result to get the actual item, in order to get the value that I'm filtering by. In other words, it will call Database.GetItem(id) 1000 times! Obviously that is not terribly efficient.
Am I missing something here? Is there any way to configure Sitecore search to retrieve custom values from the search index? If I can search on the values in the index, why can't I also retrieve them? If I can't, how else can I process the results without getting each individual item from the database?
Here is an idea of the functionality that I’m after: http://cameras.shop.ebay.com.au/Digital-Cameras-/31388/i.html
Klaus answered on SDN: use faceting with Apache Solr or similar.
http://sdn.sitecore.net/SDN5/Forum/ShowPost.aspx?PostID=35618
I've currently resolved this by defining dynamic fields for every field that I need to filter by or return in the search result collection. That way I can achieve the faceted searching that is required without needing to grab field values from the database. I'm assuming that adding the dynamic fields means taking a performance hit when rebuilding the index, but I can live with that.
In the future we'll probably look at utilizing a product like Apache Solr.
