Apache CMIS: Paging query result - cmis

Recently I've started using Apache CMIS and read the official documentation and examples. I haven't noticed anything about paging query results.
There is an example showing how to list folder items, setting maxItemsPerPage using operationContext, but it seems that operationContext can be used inside getChilder method:
int maxItemsPerPage = 5;
int skipCount = 10;
CmisObject object = session.getObject(session.createObjectId(folderId));
Folder folder = (Folder) object;
OperationContext operationContext = session.createOperationContext();
operationContext.setMaxItemsPerPage(maxItemsPerPage);
ItemIterable<CmisObject> children = folder.getChildren(operationContext);
ItemIterable<CmisObject> page = children.skipTo(skipCount).getPage();
This is ok when it comes to listing u folder. But my case is about getting results from custom search query. The basic approach is:
String myType = "my:documentType";
ObjectType type = session.getTypeDefinition(myType);
PropertyDefinition<?> objectIdPropDef = type.getPropertyDefinitions().get(PropertyIds.OBJECT_ID);
String objectIdQueryName = objectIdPropDef.getQueryName();
String queryString = "SELECT " + objectIdQueryName + " FROM " + type.getQueryName();
ItemIterable<QueryResult> results = session.query(queryString, false);
for (QueryResult qResult : results) {
String objectId = qResult.getPropertyValueByQueryName(objectIdQueryName);
Document doc = (Document) session.getObject(session.createObjectId(objectId));
}
This approach will retrieve all documents in a queryResult, but I would like to include startIndex and limit. The idea would be to type something like this:
ItemIterable<QueryResult> results = session.query(queryString, false).skipTo(startIndex).getPage(limit);
I'm not sure about this part: getPage(limit). Is this right approach for paging? Also I would like to retrieve Total Number of Items, so I could know how to set up the max items in grid where my items will be shown. There is a method, but something strange is written in docs, like sometimes the repository can't be aware of max items. This is that method:
results.getTotalNumItems();
I have tried something like:
SELECT COUNT(*)...
but that didn't do the trick :)
Please, could you give me some advice how to do a proper paging from a query result?
Thanks in advance.

Query returns the same ItemIterable that getChildren returns, so you can page a result set returned by a query just like you can page a result set returned by getChildren.
Suppose you have a result page that shows 20 items on the page. Consider this snippet which I am running in the Groovy Console in the OpenCMIS Workbench against a folder with 149 files named testN.txt:
int PAGE_NUM = 1
int PAGE_SIZE = 20
String queryString = "SELECT cmis:name FROM cmis:document where cmis:name like 'test%.txt'"
ItemIterable<QueryResult> results = session.query(queryString, false, operationContext).skipTo(PAGE_NUM * PAGE_SIZE).getPage(PAGE_SIZE)
println "Total items:" + results.getTotalNumItems()
for (QueryResult result : results) {
println result.getPropertyValueByQueryName("cmis:name")
}
println results.getHasMoreItems()
When you run it with PAGE_NUM = 1, you'll get 20 results and the last println statement will return true. Also note that the first println will print 149, the total number of documents that match the search query, but as you point out, not all servers know how to return that.
If you re-run this with PAGE_NUM = 7, you'll get 9 results and the last println returns false because you are at the end of the list.
If you want to see a working search page that leverages OpenCMIS and plain servlets and JSP pages, take a look at the SearchServlet class in The Blend, a sample web app that comes with the book CMIS & Apache Chemistry in Action.

Related

How can I setup Pagination in Excel Power Query?

I am importing financial data using JSON from the web into excel, but as the source uses pagination (giving 50 results per page I need to implement pagination in order to import all the results.
The data source is JSON:
https://localbitcoins.com//sell-bitcoins-online/VES/.json?page=1
or https://localbitcoins.com//sell-bitcoins-online/VES/.json?page=2
?page=1, ?page=2, ?page=3
I use the following code to implement pagination, but receive an error:
= (page as number) as table =>
let
Source = Json.Document(Web.Contents("https://localbitcoins.com//sell-bitcoins-online/VES/.json?page=" & Number.ToText(page) )),
Data1 = Source{1}[Data],
RemoveBottom = Table.RemoveLastN(Data1,3)
in
RemoveBottom
When I envoke a parameter (1 for page 1) to test it I get the following error and I can't seem to find out why?
An error occurred in the ‘GetData’ query. Expression.
Error: We cannot convert a value of type Record to type List.
Details:
Value=Record
Type=Type
For the record, I try to include page handling using ListGenerate:
= List.Generate( ()=>
[Result= try GetData(1) otherwise null, page = 1],
each [Result] <> null,
each [Result = try GetData([page]+1) otherwise null, Page = [Page]+1],
each [Result])
What is the default way to implement pagination using Power Query in MS Excel?
I realise you asked this nearly a month ago and may have since found an answer, but will respond anyway in case it helps someone else.
This line Data1 = Source{1}[Data] doesn't make sense to me, since I think Source will be a record and you can't use {1} positional lookup syntax with records.
The code below returns 7 pages for me. You may want to check if it's getting all the pages you need/expect.
let
getPageOfData = (pageNumber as number) =>
let
options = [
Query = [page = Number.ToText(pageNumber)]
],
url = "https://localbitcoins.com/sell-bitcoins-online/VES/.json",
response = Web.Contents(url, options),
deserialised = Json.Document(response)
in deserialised,
responses = List.Generate(
() => [page = 1, response = getPageOfData(page), lastPage = null],
each [lastPage] = null or [page] <= [lastPage],
each [
page = [page] + 1,
response = getPageOfData(page),
lastPage = if [lastPage] = null then if Record.HasFields(response[pagination], "next") then null else page else [lastPage]
],
each [response]
)
in
responses
In List.Generate, my selector only picks the [response] field to keep things simple. You could drill deeper into the data either within selector itself (e.g. each [response][data][ad_list]) or create a new step/expression and use List.Transform to do so.
After a certain amount of drilling down and transforming, you might see some data like:
but that depends on what you need the data to look like (and which columns you're interested in).
By the way, I used getPageOfData in the query above, but this particular API was including the URL for the next page in its responses. So pages 2 and thereafter could have just requested the URL in the response (rather than calling getPageOfData).

WHERE IN with Azure DocumentDB (CosmosDB) .Net SDK

Having seen this question I'm not sure why the following code for my DocDb instance isn't working:
var userApps = _docs.CreateDocumentQuery(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.USER_APPS),
new SqlQuerySpec(#"SELECT r.appId FROM ROOT r WHERE r.userId = #userId", (#"#userId", userId).ToSqlParameters()))
.ToList()
.Select(s => (string)s.appId);
var query = _docs.CreateDocumentQuery<Document>(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.APP_DEFINITIONS),
new SqlQuerySpec(#"SELECT r.id, r.appName FROM ROOT r WHERE r.appId IN (#userApps)", (#"#userApps", userApps.ToArray()).ToSqlParameters()),
new FeedOptions { EnableCrossPartitionQuery = true })
.AsDocumentQuery();
When I execute this, though I know the data should be returning me back a result set, it comes back empty every time.
Troubleshooting so far
Variants of .Select()
Return strings that I string.Join in to a comma-separated list.
Eg:
var userApps = string.Join(#",", _docs.CreateDocumentQuery(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.USER_APPS),
new SqlQuerySpec(#"SELECT r.appId FROM ROOT r WHERE r.userId = #userId", (#"#userId", userId).ToSqlParameters()))
.ToList()
.Select(s => $#"'{s.appId}'");
Don't encapsulate IN parameter
Removing the () around the parameter spec in the query thinking maybe the SqlParameter spec was doing the array specification?
Eg: #"SELECT r.id, r.appName FROM ROOT r WHERE r.appId IN #userApps")
Ends up throwing "Syntax error, incorrect syntax near '#userApps'."
Validate via Azure Portal queries
Ran the (expected) SQL that this code should be running.
I get back my expected results without issue (ie: I know there is a result set for these queries as-written).
Debugger output for Query 1
AppIds are coming back from query 1.
Unsatisfactory workaround
Change query 2 to not be parameterized. Rather, inject the comma-separated list of IDs from query 1 in to it:
var userApps = string.Join(#",", _docs.CreateDocumentQuery(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.USER_APPS),
new SqlQuerySpec(#"SELECT r.appId FROM ROOT r WHERE r.userId = #userId", (#"#userId", userId).ToSqlParameter()))
.ToList()
.Select(s => $#"'{s.appId}'"));
var query = _docs.CreateDocumentQuery<Document>(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.APP_DEFINITIONS),
new SqlQuerySpec($#"SELECT r.id, r.appName FROM ROOT r WHERE r.appId IN ({userApps})"),
new FeedOptions { EnableCrossPartitionQuery = true })
.AsDocumentQuery();
Works perfectly but I'm not going to accept it as an answer to this problem as it goes against a couple decades-worth of SQL best practices and, frankly, shouldn't be a solution.
Here's my ToSqlParameters() extension method in case it's the culprit (this works everywhere else I've used it, though. Maybe something special is needed for arrays?):
public static SqlParameterCollection ToSqlParameters(this (string, object) parmDef) => new SqlParameterCollection(new[] { new SqlParameter(parmDef.Item1, parmDef.Item2) });
Thanks!
If you use a parameterized IN list, then it will be considered as a single value when the parameter is expanded.
For instance, for this query:
SELECT * FROM r WHERE r.item IN (#items) And #items is defined as "['val1', 'val2', 'val3']" will be interpreted as such:
SELECT * FROM r WHERE r.item IN (['val1', 'val2', 'val3'])
which basically means that you're comparing r.item to a single value that is an array of three elements (i.e. equivalent to r.item = ['val1', 'val2', 'val3']).
To compare to multiple items, you need to use a parameter for each value. Something like this: SELECT * FROM r WHERE r.item IN (#val1, #val2, #val3])
A more convenient way to write this query is to use ARRAY_CONTAINS instead and pass the array of items as a single parameter. So the above query will be written like this:
SELECT * FROM r WHERE ARRAY_CONTAINS(#items, r.item)

Alfresco slingshot search numberFound and totalRecords number different

I'm using Alfresco 5.0.d
I see in the search json result (with firebug console panel) that in addition to result items , 2 other properties are returned : numberFound and totalRecords. It seems that Alfresco search engine considers numberFound as total items found.
So it display "numberFound results founded" to user.
The problem is that numberFound is not equals to totalRecords.
I see that totalRecords is the correct number of search result (in fact search always return totalRecords number of items).
So i decided to see in the webscript that performs search (alfresco-remote-api-5.0.d.jar\alfresco\templates\webscripts\org\alfresco\slingshot\search\search.lib.js).
We can easly see that the numberFound property comes from this statement
var rs = search.queryResultSet(queryDef);
var numberFound = rs.meta.numberFound ;
About totalRecords property, it comes from the same statement but a little bit different:
var totalRecords = rs.nodes.length
which is the correct value of number of item really found.
So is it an Alfresco api bug?
If no , is it possible that error comes from my query parameters?
Can someone explains me what does mean the numberFound property?
Thank you.
Below is the URL of java file which is getting called when you are executing search.queryResultSet(queryDef) code.
you can refer below method in the java file.It is adding all the things.
https://svn.alfresco.com/repos/alfresco-open-mirror/alfresco/HEAD/root/projects/repository/source/java/org/alfresco/repo/jscript/Search.java
public Scriptable queryResultSet() //This is java method which is getting called.
Below is the code which is written for what you are getting in result.
meta:
{
numberFound: long, // total number found in index, or -1 if not known or not supported by this resultset
facets:
{ // facets are returned for each field as requested in the SearchParameters fieldfacets
field:
{ // each field contains a map of facet to value
facet: value,
},
}
}

Liferay custom-sql using IN operator

I am using Liferay 6.1, Tomcat and MySQL. I have a custom-sql sentence for a list portlet. The custom-sql uses two parameters: an array of groupIds and a result limit.
SELECT
count(articleId) as count,
...
FROM comments
WHERE groupId IN (?)
GROUP BY articleId
ORDER BY count DESC
LIMIT 0, ?
My FinderImpl class has this method:
public List<Comment> findByMostCommented(String groupIds, long maxItems) {
Session session = null;
session = openSession();
String sql = CustomSQLUtil.get(FIND_MOST_COMMENTS);
SQLQuery query = session.createSQLQuery(sql);
query.addEntity("Comment", CommentImpl.class);
QueryPos queryPos = QueryPos.getInstance(query);
queryPos.add(groupIds);
queryPos.add(maxItems);
List<Comment> queryResult = query.list();
return queryResult;
}
This returns 0 results. If I remove the WHERE IN(), it works.
Is IN a valid operator? If not, how can search within different groups?
Perhaps hibernate is quoting your string of groupIds (presumably it is in the form of "1,2,3,4" and when hibernate translates this to sql it is putting quotes around it for you?
You may want to try something like this (from Liferay itself):
String sql = CustomSQLUtil.get(FIND_BY_C_C);
sql = StringUtil.replace(sql, "[$GROUP_IDS$]", groupIds);
And include ([$GROUP_IDS$]) in place of the (?) in your SQL

Difference between KeywordQuery, FullTextQuerySearch type for Object Model and Web service Query

Initially I believed these 3 to be doing more or less the same thing with just the notation being different. Until recently, when i noticed that their does exists a big difference between the results of the KeyWordQuery/FullTextQuerySearch and Web service Query.
I used both KeywordQuery and FullText method to search of the the value of a customColumn XYZ with value (ASDSADA-21312ASD-ASDASD):-
When I run this query as:- FullTextSqlQuery:-
FullTextSqlQuery myQuery = new FullTextSqlQuery(site);
{
// Construct query text
String queryText = "Select title, path, author, isdocument from scope() where freetext('ASDSADA-21312ASD-ASDASD') ";
myQuery.QueryText = queryText;
myQuery.ResultTypes = ResultType.RelevantResults;
};
// execute the query and load the results into a datatable
ResultTableCollection queryResults = myQuery.Execute();
ResultTable resultTable = queryResults[ResultType.RelevantResults];
// Load table with results
DataTable queryDataTable = new DataTable();
queryDataTable.Load(resultTable, LoadOption.OverwriteChanges);
I get the following result representing the document.
* Title: TestPDF
* path: http://SharepointServer/Shared Documents/Forms/DispForm.aspx?ID=94
* author: null
* isDocument: false
Do note the Path and isDocument fields of the above result.
Web Service Method
Then I tried a Web Service Query method. I used Sharepoint Search Service Tool available at http://sharepointsearchserv.codeplex.com/ and ran the same query i.e. Select title, path, author, isdocument from scope() where freetext('ASDSADA-21312ASD-ASDASD'). This time I got the following results:-
* Title: TestPDF
* path: http://SharepointServer/Shared Documents/TestPDF.pdf
* author: null
* isDocument: true
Again note the path. While the search results from 2nd method are useful as they provide me the file path exactly, I can't seem to understand why is the method 1 not giving me the same results?
Why is there a discrepancy between the two results?
The first item is a List item, not a document per se. A document library is essentially just another list that is specially made to hold documents in it. The list item likely has some extra metadata not held within the document and so on. The second result is the actual document, so the "isDocument" flag will be up for it.
At least that's my theory.

Resources