Converting the logic for cts:search to search:search - search

I have been working with cts:search in my project but somehow it feels the result time are taking a bit longer than expected. Can search:search help? If so, how?
For example I have the query as
let $query := cts:and-query((cts:element-value-query(xs:QName("Applicability"),"Yes")))
and I want to fetch the document URIs. I was using:
cts:search(collection("abc"), $query)
and it returned the URIs, but how can this be extracted using search:search?
Or is there something other than search can help for improving the execution time?

Are you interested in retrieving the documents, or just the URIs?
If you are only looking to retrieve the URIs of the documents that have an element with that value, then use cts:uris() instead of cts:search(). The cts:uris() function runs unfiltered and will only return URIs from the lexicon, instead of retrieving all of the documents, which can be a lot more expensive than cts:search if you don't need the content.
cts:uris("", (), cts:and-query(( collection("abc"), $query)) )
When using cts:search, the first thing that I would try is to add the unfiltered option to your search and see if that helps.
By default cts:search executes filtered:
A filtered search (the default). Filtered searches eliminate any false-positive matches and properly resolve cases where there are multiple candidate matches within the same fragment. Filtered search results fully satisfy the specified cts:query.
So, try executing the same query with the "unfiltered" option:
cts:search(collection("abc"), $query, "unfiltered")
You could also look to create an index on that Applicability element, with either an element-range-index or a field-range-index, and then use the appropriate range-query instead of a value-query.

Related

How do you construct an Azure Search query to return a wildcard search based solely on a specific field?

If I may have missed this in some other area of SO please redirect me but I don't think this is a duped question.
I am using Azure Search with an index field called Title which is searchable and filterable using a Standard Lucerne Analyzer.
When using the built-in explorer, if I want to return all Job Titles that are explicitly named Full Stack Developer I can achieve it this way:
$filter=Title eq 'Full Stack Developer'&$count=true
But if I want to retrieve all the Job Titles using a wildcard to return all records having Full Stack in the name this way:
$filter=Title eq 'Full Stack*'&$count=true
The first 20 or so records returned are spot on, but after that point I get a mix of records that have absolutely nothing in common with the Title I specified in the query. My initial assumption was that perhaps Azure was including my specified Title performing an inclusive keyword search on the text as well.
Though I found a few instances where that hypothesis seemed to prove out, so many more of the records returned invalidated that altogether.
Maybe I don't understand fully the mechanics under the hood of Azure Search and so though my query appears to be valid; my expectation of the result is way off.
So how should my query look to perform a wildcard resulting to guarantee the words specified in the search to be included in the Titles returned, if this should be possible? And what would be the correct syntax to condition the return to accommodate for OR operators to be inclusive?
Azure Cognitive Search allows you to perform wildcard searches limited to specific fields only. To do so, you will need to specify the name of the fields in which you want to perform the search in searchFields parameter.
Your search URL in this case would be something like:
https://accountname.search.windows.net/indexes/indexname/docs?api-version=2020-06-30&searchFields=Title&search=Full Stack*
From the link here:

wildcard searches on specific elements only

I am looking for a way to do wildcard search only on specific elements when executing a search:search. Specifically, I might have documents that look like the following:
<pdbe:person-envelope xmlns:pdbe="http://schemas.abbvienet.com/people-db/envelope">
<person xmlns="http://schemas.abbvienet.com/people-db/model">
<costcenter>
<code>0000601775</code>
<name>DISC-PLAT INFORM</name>
</costcenter>
<displayName>Tj Tang</displayName>
<upi>10025613</upi>
<firstName>
<preferred>TJ</preferred>
<given>Tze-John</given>
</firstName>
<lastName>
<preferred>Tang</preferred>
<given>Tang</given>
</lastName>
<title>Principal Research Scientist</title>
</person>
<pdbe:raw/>
</pdbe:person-envelope>
When searches happen, I want the search text to be automatically wildcarded, but only for certain elements like displayName, firstName, lastName, but NOT for upi or code. As I understand it, I would have certain wildcard related indexes enabled in the database, but then I would need to have a custom query parser that rewrite the query into multiple cts:element-query and cts:element-value-query statements for each element that I want to wildcard search on, OR'd with the originally parsed search query. Or I can create field constraints, and rewrite the query to use field contraints.
Is there another way to conditionally search using wildcard on some elements but not others, when the user is entering as simple search query?, i.e. partial first and last name, "TJ Tan", but no partial hits when I search "100256".
You are on the right track. Lets take an element (or maybe field) query on "TS Tan"
With cts:tokenize, you can break this up (read about cs:tokenize - it is not just a normal tokenizer).
Then I have "TS" and "Tan"
You can the do things like apply business rules on which word should be wild-carded and which not and build the appropriate cts query (probably individual word queries in an and statement - or a near query - tuning depends on your need).
Now with search phrase tokenized, you can also consider that you may find building your results relies not on a wildcard index, but on a an element word lexicon - where you do your term-expansion with word-matches and those terms are then sent to the query.
We sometimes take that further and combine the query building with xdmp:estimate and make the query less restrictive if we don't get enough results early on.
Where to put this logic?
You mention search:search, so in this case, I would suggest you package this into a custom constraint.

Alfresco webscript (js) and pagination

I have a question about the good way to use pagination with Alfresco.
I know the documentation (https://wiki.alfresco.com/wiki/4.0_JavaScript_API#Search_API)
and I use with success the query part.
I mean by that that I use the parameters maxItems and skipCount and they work the way I want.
This is an example of a query that I am doing :
var paging =
{
maxItems: 100,
skipCount: 0
};
var def =
{
query: "cm:name:test*"
page: paging
};
var results = search.query(def);
The problem is that, if I get the number of results I want (100 for example), I don't know how to get the maxResults of my query (I mean the total amount of result that Alfresco can give me with this query).
And I need this to :
know if there are more results
know how many pages of results are lasting
I'm using a workaround for the first need : I'm doing a query for (maxItems+1), and showing only maxItems. If I have maxItems+1, I know that there are more results. But this doesn't give me the total amount of result.
Do you have any idea ?
With the javascript search object you can't know if there are more items. This javascript object is backed by the class org.alfresco.repo.jscript.Search.java. As you can see the query method only returns the query results without any extra information. Compare it with org.alfresco.repo.links.LinkServiceImpl which gives you results wrapped in PagingResults.
So, as javacript search object doesn't provide hasMoreItems info, you need to perform some workaround, for instance first query without limits to know the total, and then apply pagination as desired.
You can find how many objects have been found by your query simply calling
results.length
paying attention to the fact that usually queries have a configured maximum result set of 1000 entries to save resources.
You can change this value by editing the <alfresco>/tomcat/webapps/alfresco/WEB_INF/classes/alfresco/repository.properties file.
So, but is an alternative to your solution, you can launch a query with no constraints and obtain the real value or the max results configured.
Then you can use this value to devise how many pages are available basing you calculation on the number of results for page.
Then dinamically pass the number of the current page to the builder of your query def and the results variable will contain the corresponding chunk of data.
In this SO post you can find more information about pagination.

How can I configure Sitecore search to retrieve custom values from the search index

I am using the AdvancedDatabaseCrawler as a base for my search page. I have configured it so that I can search for what I want and it is very fast. The problem is that as soon as you want to do anything with the search results that requires accessing field values the performance goes through the roof.
The main search results part is fine as even if there are 1000 results returned from the search I am only showing 10 or 20 results per page which means I only have to retrieve 10 or 20 items. However in the sidebar I am listing out various filtering options with the number or results associated with each filtering option (eBay style). In order to retrieve these filter options I perform a relationship search based on the search results. Since the search results only contain SkinnyItems it has to call GetItem() on every single result to get the actual item in order to get the value that I'm filtering by. In other words it will call Database.GetItem(id) 1000 times! Obviously that is not terribly efficient.
Am I missing something here? Is there any way to configure Sitecore search to retrieve custom values from the search index? If I can search for the values in the index why can't I also retrieve them? If I can't, how else can I process the results without getting each individual item from the database?
Here is an idea of the functionality that I’m after: http://cameras.shop.ebay.com.au/Digital-Cameras-/31388/i.html
Klaus answered on SDN: use facetting with Apache Solr or similar.
http://sdn.sitecore.net/SDN5/Forum/ShowPost.aspx?PostID=35618
I've currently resolved this by defining dynamic fields for every field that I will need to filter by or return in the search result collection. That way I can achieve the facetted searching that is required without needing to grab field values from the database. I'm assuming that by adding the dynamic fields we are taking a performance hit when rebuilding the index. But I can live with that.
In the future we'll probably look at utilizing a product like Apache Solr.

WildcardQuery error in Solr

I use solr to search for documents and when trying to search for documents using this query "id:*", I get this query parser exception telling that it cannot parse the query with * or ? as the first character.
HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
type Status report
message org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
description The request sent by the client was syntactically incorrect (org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery).
Is there any patch for getting this to work with just * ? Or is it very costly to do such a query?
If you want all documents, do a query on *:*
If you want all documents with a certain field (e.g. id) try id:[* TO *]
Lucene doesn't allow you to start WildcardQueries with an asterisk by default, because those are incredibly expensive queries and will be very, very, very slow on large indexes.
If you're using the Lucene QueryParser, call setAllowLeadingWildcard(true) on it to enable it.
If you want all of the documents with a certain field set, you are much better off querying or walking the index programmatically than using QueryParser. You should really only use QueryParser to parse user input.
id:[a* TO z*] id:[0* TO 9*] etc.
I just did this in lukeall on my index and it worked, therefore it should work in Solr which uses the standard query parser. I don't actually use Solr.
In base Lucene there's a fine reason for why you'd never query for every document, it's because to query for a document you must use a new indexReader("DirectoryName") and apply a query to it. Therefore you could totally skip applying a query to it and use the indexReader methods numDocs() to get a count of all the documents, and document(int n) to retrieve any of the documents.
If you are just trying to get all documents, Solr does support the *:* query. It's the only time I know of that Solr will let you begin a query with an *. I'm sure you've probably seen this as the default query in the Solr admin page.
If you are trying to do a more specific query with an * as the first character, like say id:*456 then one of the best ways I've seen is to index that field twice. Once normally (field name: id), and once with all the characters reversed (field name: reverse_id). Then you could essentially do the query id:456 by sending the query reverse_id:654 instead. Hope that makes sense.
You can also search the Solr user group mailing list at http://www.mail-archive.com/solr-user#lucene.apache.org/ where questions like this come up quite often.
The following Solr issue is a request to be able to configure the default lucene query parser.
https://issues.apache.org/jira/browse/SOLR-218
In this issue you can find the following description how to 'patch' Solr. This modification would allow you to start queries with a *.
Jonas Salk: I've basically updated only one Java file: SolrQueryParser.java.
public SolrQueryParser(IndexSchema schema, String defaultField) {
...
setAllowLeadingWildcard(true);
setLowercaseExpandedTerms(true);
...
}
...
public SolrQueryParser(QParser parser, String defaultField, Analyzer analyzer) {
...
setAllowLeadingWildcard(true);
setLowercaseExpandedTerms(true);
...
}
I'm not sure if setLowercaseExpandedTerms is needed...
I'm assuming with id:* you're just trying to match all documents, right?
I've never used solr before, but in my Lucene experience, when ingesting data, we've added a hidden field to every document, then when we need to return every record we do a search for the string constant in that field that's the same for every record.
If you can't add a field like that in your situation, you could use a RegexQuery with a regex that would match anything that could be found in the id field.
Edit: actually answering the question. I've never heard of a patch to get that to work, but I would be surprised if it could even be made to work reasonably well. See this question for a reason why unconstrained PrefixQuery's can cause a problem.
Actually, I have been using a workaround for this. I append a character to the id, eg: A1, A2, etc.
With such values in the field, it is possible to search using the query id:A*
But would love to find whether a true solution exists.

Resources