Solr Search Field Best Practices - search

I'm using solr for an enterprise application. So far it works well, as I am using a ngram field to search against. It works correctly for partial queries (match against indexed ngrams). But the problem I have is, how to enforce exact query matches?. For an example the query "Test 1" should match exactly the same text as it is when the user enter it with double quotation marks. Currently Since I have used some tokenizers and filters, the double quotation marks get filtered out, there's no difference in the queries "test 1", "tEst 1" or "TEST 1" (that is because of the analyzer chain I use, but it is needed to work with ngrams and partial search).
Currently I'm searching against a ngram query field. In order to enforce exact query match, what should I do? what is the best practice?. currently what I think is to identify the double quotation marks from client side and change the query field to the original field (with out ngrams). But I feel like there should be a better way of doing this, since the problem I have is generic and solr is a complete enterprise level search engine.

You can have another field for it and add string as the fieldType for the same and index it with same.
When you want to perform the exact match you can query on the above field.
And when you want to perform partial search ..you can query to the earlier field which is indexed by ngram.
OR.. Here is another way you can try.
You have defined the current field type using the ngram. In that while indexing you can define the ngram tokenizer and for the query you mention keywordTokenizer and lowercase filter factory only.
While indexing the text will be tokenized and while performing the query it will not.

Related

Azure-search: How to get documents which exactly contain search term

This question/answer dealt with a pretty similar topic, but I couldn't find the solution I was searching for.
How to practially use a keywordanalyzer in azure-search?
Starting situation:
I created a resource with multiple indexes. One of these indexes contains a Collection(Edm.String) field.
From this field i only want to get documents which exactly contain the search term. For example the field contains documents like these: "Hovercraft zero", "Hovercraft one", "Hovercraft two".
If the search term is "Hover" all three documents should be returned. If the search term is "craft zer" only the document "Hovercraft zero" should be returned. The document shouldn't get a higher score, the desired behaviour is that I only get the "Hovercraft zero" document as result.
Further information:
It is not possible to set the searchmode to all (like it was recommended in the question on the top) because I just want to set this behaviour for this specific field and not for all search queries. It also is not possible to let the responsibility on the user to enter the search term with quotes.
What I have tried so far:
Use the keyword analyzer like it was described in the question on
top: no success
Use an indexanalyzer with specific token filters (ngram,
lowercase) and a searchanalyzer as a keyword analyzer: no success
Use Charfilters to manipulate the search term and manually set the
quotes on the first and last position (craft zer -> "craft zer").
Like Yahnoosh explained in the question on top, the query parser
processes the query string before the analyzers are applied. So:
no success
Is there any solution for this issue?
Or is there a other approach to achieve the desired behaviour?
Hopefully someone can help.
Thanks in advance!
Using your example with three documents: "Hovercraft zero", "Hovercraft one", "Hovercraft two"
Issue a prefix query to find all documents that contain terms that start with "Hover"
search=Hover*
To match the term "craft zer", you need to use the keyword analyzer (or the keyword tokenizer with the lowercase token filter) at indexing time to make sure elements of your string collection are not tokenized. Then at query time you can issue a regex query (note regex queries are much slower than term or prefix queries)
search=/.craft zer./&queryType=full
Also, please use the Analyze API to test your custom analyzer configurations. It will help you make sure the analyzer produces the terms you expect.
Thanks #Yahnoosh for your answer, I found a solution that worked for me.
Short example:
I have an index including three fields (field1, field2, field3). From field3 I want a result where documents exactly contain the search term. From field1 and field2 I want do get a "standard" result.
Solution:
I manipulated the searchquery to ->
field1:{searchterm} || field2:{searchterm} || field3:"{searchterm}" &queryType=full
Using this searchquery field1 and field2 are queried in the "standard" way and field3 is queried with the behaviour i was searching for. Of course there are more efficient and elegant ways out there to solve this issue, but it worked for me.
If anybody has a better solution let me know ;)

Hybris: Use same field for search and facet

I have to use a field "manufacturerName" for both solr search and solr facet in Hybris. While the solr free text search requires the field type to be text, the facet only works properly in string type.
Is there any way to use this same field for both search and facet. I think there is one way by using "copyField" but I searched a lot, and still don't know how to use it?
Any help would be highly appreciated!
PS: On keeping the field type string, free text search doesn't fetch proper results. On keeping the field type text, facet shows truncated values.
Using a copyField instruction is the way to go, but that require you to define an alternative field - meaning you have one field with the type text and the associated tokenization, and one field of the type string which isn't processed in any way. There is no way in Solr to combine these in a single field that I know of.
You'll then use the name of the string field to generate the facets, while you use the other field when you're querying.
<copyField source="text_search_field" dest="string_facet_field" />
You'll then have to refer to the name string_facet_field when you're filtering or faceting on the field. You'll want to filter against the facet field after the user selects a facet, since you otherwise would end up with documents from other facets possibly leaking into your document result set (for example if the facet was "Foo Bar", you'd suddenly get documents that had "Baz Foo Bar Spam" as the facet, since both words are present in the search string.
I was not able to implement the "copyField" approach, but I found another easy way to do this. In solr.impex, I had already added my new field manufacturerNameFacet of type string, but there is a parameter "fieldValueProvider" and "valueProviderParameter". I provided these values as "springELValueProvider" and the field I wanted to use for search and facet "manufacturerName". After a solr full indexing, it worked like a charm. No other setting was required. The search and facet both were working as expected.

wildcard searches on specific elements only

I am looking for a way to do wildcard search only on specific elements when executing a search:search. Specifically, I might have documents that look like the following:
<pdbe:person-envelope xmlns:pdbe="http://schemas.abbvienet.com/people-db/envelope">
<person xmlns="http://schemas.abbvienet.com/people-db/model">
<costcenter>
<code>0000601775</code>
<name>DISC-PLAT INFORM</name>
</costcenter>
<displayName>Tj Tang</displayName>
<upi>10025613</upi>
<firstName>
<preferred>TJ</preferred>
<given>Tze-John</given>
</firstName>
<lastName>
<preferred>Tang</preferred>
<given>Tang</given>
</lastName>
<title>Principal Research Scientist</title>
</person>
<pdbe:raw/>
</pdbe:person-envelope>
When searches happen, I want the search text to be automatically wildcarded, but only for certain elements like displayName, firstName, lastName, but NOT for upi or code. As I understand it, I would have certain wildcard related indexes enabled in the database, but then I would need to have a custom query parser that rewrite the query into multiple cts:element-query and cts:element-value-query statements for each element that I want to wildcard search on, OR'd with the originally parsed search query. Or I can create field constraints, and rewrite the query to use field contraints.
Is there another way to conditionally search using wildcard on some elements but not others, when the user is entering as simple search query?, i.e. partial first and last name, "TJ Tan", but no partial hits when I search "100256".
You are on the right track. Lets take an element (or maybe field) query on "TS Tan"
With cts:tokenize, you can break this up (read about cs:tokenize - it is not just a normal tokenizer).
Then I have "TS" and "Tan"
You can the do things like apply business rules on which word should be wild-carded and which not and build the appropriate cts query (probably individual word queries in an and statement - or a near query - tuning depends on your need).
Now with search phrase tokenized, you can also consider that you may find building your results relies not on a wildcard index, but on a an element word lexicon - where you do your term-expansion with word-matches and those terms are then sent to the query.
We sometimes take that further and combine the query building with xdmp:estimate and make the query less restrictive if we don't get enough results early on.
Where to put this logic?
You mention search:search, so in this case, I would suggest you package this into a custom constraint.

sunspot solr attribute substring/wildcard searches

Is it possible to search solr attribute fields (non solr.TextField types) using a substring/wildcard/partial string match?
For instance if I have a solr.StrField field and documents that contain the string "1234567890" I want to be able to search on "456" and have that document returned.
From what I can see only textfields can be searched in this method using things like EdgeNGram and the like, but not attribute fields??
You can have the partial matches working for String as well Text fields for wildcards.
If the query parsers you are using, supports leading wildcard queries, you can easily search for *456*, and this should match 1234567890.
However, EdgeNGram would only work for solr.TextField, as solr.strField do not allow analysers to be added to it.
So you can only define fields with class as solr.TextField and have the EdgeNGram in the analysis chain, which would break down the indexed terms into shingles for partial matching.

One word phrase search to avoid stemming in Solr

I have stemming enabled in my Solr instance, I had assumed that in order to perform an exact word search without disabling stemming, it would be as simple as putting the word into quotes. This however does not appear to be the case?
Is there a simple way to achieve this?
There is a simple way, if what you're referring to is the "slop" (required similarity) as part of a fuzzy search (see the Lucene Query Syntax here).
For example, if I perform this search:
q=field_name:determine
I see results that contain "determine", "determining", "determined", etc.. If I then modify the query like so:
q=field_name:determine~1
I only see results that contain the word "determine". This is because I'm specifying a required similarity of 1, which means "exact match". I can specify this value anywhere from 0 to 1.
Another thing you can do is index the same text without stemming in one field, and with stemming in another. Boost the non-stemmed field & that should prefer exact versions of words to stemmed versions. Of course you could also write your own query parser that directs quoted phrases to the non-stemmed field only.

Resources