What is the meaning of "must match" in Solr?

I have a requirement to modify the "must match" parameter in Solr.
<str name="mm">
2<-1 5<-2 6<50%
</str>
I'm not able to understand the logic behind this syntax.

mm stands for the Minimum Should Match parameter.
It lets you configure how many of the optional query terms must match for a document to be returned in the search results.
You can configure it as a fixed number or as a percentage of the number of query terms, and you can make the value depend on the length of the query with conditional expressions of the form n<value.
For a detailed explanation and examples, check the link.
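To make the expression from the question concrete: each n<value clause reads "if the query has more than n optional clauses, use value". A negative value means "all but that many", and a percentage is rounded down. Here is a rough Python sketch of that documented behaviour. The function names are made up for illustration, the sketch assumes no spaces around "<", and it is not Solr's own code.

```python
def _simple(spec: str, n: int) -> int:
    """Resolve a non-conditional value such as '3', '-2', '75%' or '-25%'."""
    if spec.endswith("%"):
        percent = int(spec[:-1])
        calc = abs(n * percent) // 100            # percentages round down
        result = n - calc if percent < 0 else calc
    else:
        value = int(spec)
        result = n + value if value < 0 else value
    return max(0, min(n, result))                 # clamp to 0..n


def min_should_match(spec: str, n: int) -> int:
    """How many of n optional clauses must match under an mm spec (sketch)."""
    result = n                                    # default: all clauses required
    for part in spec.split():
        if "<" not in part:                       # plain value, no condition
            return _simple(part, n)
        bound, value = part.split("<", 1)
        if n <= int(bound):                       # condition not reached: keep last result
            return result
        result = _simple(value, n)
    return result


if __name__ == "__main__":
    for clauses in range(1, 9):
        print(clauses, min_should_match("2<-1 5<-2 6<50%", clauses))
```

For the expression in the question this gives 1, 2, 2, 3, 4, 4, 3, 4 for queries with 1 through 8 optional clauses: up to 2 terms must all match, 3 to 5 terms may miss one, 6 terms may miss two, and above 6 terms half of them (rounded down) must match.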

Related

Solr Fuzzy search (max 2 edits)

I am using Solr 6.0.0
I am using the data-driven configuration, and most of the configuration is standard.
I have a document in Solr with
name:"aquickbrownfox"
Now if I do a fuzzy search like:
name:aquickbrownfo~0.7
OR
name:aquickbrownf~0.7
It lists out the record in the results.
But if I do a search like:
name:aquickbrown~0.7
It does not list the record.
Does this have something to do with maxEdits in solrconfig.xml, which is set to 2?
I tried increasing it. But I could not create a collection with this configuration. It gave an error:
ERROR: Error CREATEing SolrCore 'my-search': Unable to create core
[my-search] Caused by: Invalid maxEdits
A maximum of 2 edits seems to be a serious limitation. I wonder what the use is of passing a fractional value after the ~ operator.
My use case:
I have a contact database. I am supposed to detect duplicates based on three parameters: Name, Email and Phone, so I rely on Solr for fuzzy search. Email and Phone are relatively easy to handle with simple assumptions. Name seems to be a bit tricky. For each word in the Name, I plan to do a fuzzy search. I expected the optional parameter after ~ to work without the maxEdits distance limitation.
The documentation no longer suggests using a fractional value after the tilde - see http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fuzzy_Searches for more information.
However, you are correct that only 2 changes are allowed to be made to the search string in order to carry out a fuzzy search. I would guess this limitation strikes a balance between efficiency and usefulness.
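To see why the third query in the question fails, it helps to count the edits. Below is a small Python sketch using plain Levenshtein distance; the function name is mine. Lucene's fuzzy matching by default also counts a transposition as a single edit, which does not matter for these examples since they only drop trailing characters.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    indexed = "aquickbrownfox"
    for query in ("aquickbrownfo", "aquickbrownf", "aquickbrown"):
        distance = edit_distance(query, indexed)
        verdict = "within 2 edits" if distance <= 2 else "more than 2 edits"
        print(query, distance, verdict)
```

The three queries are 1, 2 and 3 edits away from the indexed value, so only the first two fall within the 2-edit limit.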
The maxEdits parameter in solrconfig.xml applies to the DirectSpellChecker configuration, and doesn't affect your searching, unless you're using the spell checker.
For your use case, your best approach may be to index the name field twice, using different field configurations: one using a simple set of analyzers and filters (e.g. StandardTokenizerFactory, StandardFilterFactory, LowerCaseFilterFactory), and the other using a phonetic matcher such as the Beider-Morse filter. You can use the first field to carry out fuzzy searches, and the second to look for names that are spelled differently but sound the same as the name being checked.

Solr search with ranking and best match

I am new to this forum. I am looking for your suggestions on one of our search requirements.
We have names, addresses and other relevant data to search. The search input is going to be a free-form text string with more than one word. The search API should match the input string against the complete data set, including names, addresses and other data. To achieve this, I have used copyField to copy all the required fields into a single search field in the Solr config. I am using searchField as the field that is searched against the incoming input string. The input search string can contain partial words, as in the example below.
Name: Test Insurance company
Address: 123 Main Avenue, Galaxy city
Phone: 6781230000
After Solr creates the index, the searchable field will hold the document like this:
searchField {
Name: Test Insurance company
Address: 123 Main Avenue, Galaxy city
Phone: 6781230000
}
An end user can enter a search string like "Test Company Main Ave", and the search currently returns the above document, but not at the top; I see other documents being returned too.
I am framing the Solr query as "Test* Company Main Ave", adding a "*" after the first word and searching against searchField.
I followed this approach after searching a few forums on the internet. How can I get the best match at the top? I am not sure the above approach is right.
Any help appreciated.
Thanks,
Ram
You could index all fields separately and also use your searchField as a catch-all.
Use an Edismax search handler to query all fields with a scoring boost, and also query your catch-all field.
e.g.
<str name="qf">
Name^2.0
Address^1.5
.
.
.
searchField^1.0
</str>
To boost relevancy, you could also index each field twice, once with a string type and once with a text_en type, along these lines:
<str name="qf">
Name^2.0
Name_exact^5.0
Address^1.5
Address_exact^3.0
.
.
.
searchField^1.0
</str>
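If you would rather try the boosts per request before putting them in the handler config, the same qf can be sent as request parameters. A minimal Python sketch follows; the core name "contacts" and the host are placeholders I made up, and the field names are the ones from the config above.

```python
from urllib.parse import urlencode


def edismax_params(user_query: str, field_boosts: dict) -> dict:
    """Build edismax request parameters with a qf string like 'Name^2.0 Address^1.5'."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"q": user_query, "defType": "edismax", "qf": qf}


if __name__ == "__main__":
    params = edismax_params(
        "Test Company Main Ave",
        {
            "Name": 2.0,
            "Name_exact": 5.0,
            "Address": 1.5,
            "Address_exact": 3.0,
            "searchField": 1.0,
        },
    )
    # "contacts" is a hypothetical core name; replace it with your own.
    print("http://localhost:8983/solr/contacts/select?" + urlencode(params))
```

Once the boosts behave the way you want, move the qf string into the handler defaults so clients only need to send q.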
Technically, if there are documents above the one you want to match, then they are a better match, so it depends on why they are getting a higher relevancy score. Try turning debug on and see where the documents above your preferred document are getting the extra relevancy from.
Once you know why they score higher, you need to ask yourself why your preferred document should come first, and what makes it a "better" match in your eyes.
Once you've decided why it should come top, you need to work out how to index and search the content so that the documents you expect to come first actually do come first. As qux says in his answer, you may need to index multiple versions of the data to allow for better matching.
Si

Solr: How to specify field relevancy/weight

I'm currently indexing data using Solr that consists of about 10 fields. When I perform a search I would like certain fields to be weighted higher. Could anyone help point me in the right direction?
For example, searching across all fields for a term such as "superman" should return hits in the "Title" field before the "Description" field.
I've found documentation on how to make one field score higher from the query, but I would prefer to set this in a configuration file or similar. The following would require all searches to specify the weight. Is it possible to specify this in the solr config file?
q=title:superman^2 description:superman
Try using qf with ExtendedDisMax. Your query would then look like this:
q=superman
While your config will look like:
<str name="qf">title^2 description</str>
You can get some working examples here
"The qf parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field's importance in the query. For example, the query below:
qf="fieldOne^2.3 fieldTwo fieldThree^0.4"
assigns fieldOne a boost of 2.3, leaves fieldTwo with the default boost (because no boost factor is specified), and fieldThree a boost of 0.4. These boost factors make matches in fieldOne much more significant than matches in fieldTwo, which in turn are much more significant than matches in fieldThree."
Source: Apache Lucene
In your case: qf="title^100 description" may do the trick - if you're using Solr in a library I'd love to chat.
By using edismax you can achieve what you are looking for.
Try adding these two parameters to your request handler, changing the field names to your own.
You can remove a particular field completely if you don't want it.
<str name="defType">edismax</str>
<str name="qf">YourField^50 YourAnotherField^30 YetAnotherField</str>
The higher the boost value after the ^, the more weight that field gets.
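To get a feel for what the boost does, here is a toy Python sketch of dismax-style scoring: each field's score is multiplied by its boost, the best field wins, and the remaining fields contribute only through the tie parameter. The raw scores are invented numbers; real scores come from Lucene's similarity and will look different.

```python
def dismax_score(raw_scores: dict, boosts: dict, tie: float = 0.0) -> float:
    """Toy dismax: max boosted field score plus tie times the sum of the others."""
    boosted = [score * boosts.get(field, 1.0) for field, score in raw_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


if __name__ == "__main__":
    boosts = {"title": 2.0, "description": 1.0}   # i.e. qf="title^2 description"
    title_hit = {"title": 1.0}                    # made-up raw score, title match only
    description_hit = {"description": 1.2}        # made-up raw score, description match only
    print(dismax_score(title_hit, boosts))
    print(dismax_score(description_hit, boosts))
```

With these numbers the title match ends up ahead even though its raw score is lower, which is the effect the question asks for. A boost that is too small can still be outweighed by a strong match in another field.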

One word phrase search to avoid stemming in Solr

I have stemming enabled in my Solr instance. I had assumed that performing an exact word search without disabling stemming would be as simple as putting the word in quotes. However, this does not appear to be the case.
Is there a simple way to achieve this?
There is a simple way, if what you're referring to is the required similarity given as part of a fuzzy search (see the Lucene Query Syntax here).
For example, if I perform this search:
q=field_name:determine
I see results that contain "determine", "determining", "determined", etc. If I then modify the query like so:
q=field_name:determine~1
I only see results that contain the word "determine". This is because I'm specifying a required similarity of 1, which means "exact match". I can specify this value anywhere from 0 to 1. Note that this reading applies to older Lucene versions: from Lucene 4.0 on, a whole number after the tilde is a maximum edit distance, so ~1 allows one edit rather than requiring an exact match.
Another thing you can do is index the same text without stemming in one field, and with stemming in another. Boost the non-stemmed field & that should prefer exact versions of words to stemmed versions. Of course you could also write your own query parser that directs quoted phrases to the non-stemmed field only.

How to implement faceted search suggestion with number of relevant items in Solr?

Hi
I have a very specific need in my company for the system's search engine, and I can't seem to find a solution.
We have a Solr index of items, all of which have the same fields, with one of the fields being "Type" (and of course "Title", "Text", and so on).
What I need is: given an item type and a query string, return a list of search suggestions, each one also saying how many items of the correct type that suggested string will return.
Something like, if the original string is "goo" I'll get
Goo 10
Google 52
Goolag 2
and so on.
Now, how do I do it?
I don't want to re-query SOLR for each different suggestion, but if there is no other way, I just might.
Thanks in advance
You can try edge n-gram tokenization:
http://search.lucidimagination.com/search/document/CDRG_ch05_5.5.6
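In case the term is unfamiliar: an edge n-gram filter indexes every leading prefix of a token, so a partial input like "goo" matches the indexed prefix directly. A quick Python sketch of the idea follows; the function name and the gram sizes are my own choices, not Solr defaults.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> list:
    """Return the leading prefixes of a token, from min_gram to max_gram characters."""
    token = token.lower()
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(edge_ngrams("Google", min_gram=2, max_gram=5))
```

For "Google" with sizes 2 to 5 this produces go, goo, goog and googl, so the query "goo" would hit that document without needing a wildcard.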
You can try facets. Take a look at my more detailed description ('Autocompletion').
This was implemented at http://jetwick.com with Solr. It now uses ElasticSearch, but the Solr sources are still available and the idea is identical: https://github.com/karussell/Jetwick
The SpellCheckComponent of Solr (which gives the suggestions) has extended results that can give the frequency of every suggestion in the index - http://wiki.apache.org/solr/SpellCheckComponent#Extended_Results.
However, the .Net component SolrNet, does not currently seem to support the extendedResults option: "All of the SpellCheckComponent parameters are supported, except for the extendedResults option" - http://code.google.com/p/solrnet/wiki/SpellChecking.
This can be implemented using a facet field query with a prefix set. You can test this using the XML handler like this:
http://localhost:8983/solr/select/?q=*:*&rows=0&facet=true&facet.field=type&f.type.facet.prefix=goo
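For the original requirement (suggestions with counts, restricted to one item type), the idea is to filter by type and facet on the text field with the prefix. The Python sketch below imitates that in memory so you can see what comes back. The documents and the function name are invented, and it splits on whitespace instead of using a real analyzer.

```python
from collections import Counter


def prefix_facet(docs: list, item_type: str, field: str, prefix: str) -> Counter:
    """Count, per term starting with prefix, the documents of item_type containing it."""
    counts = Counter()
    prefix = prefix.lower()
    for doc in docs:
        if doc["type"] != item_type:              # plays the role of a type filter
            continue
        terms = {t for t in doc[field].lower().split() if t.startswith(prefix)}
        counts.update(terms)                      # each document counts once per term
    return counts


if __name__ == "__main__":
    # Made-up sample documents, for illustration only.
    docs = [
        {"type": "article", "title": "Google search tips"},
        {"type": "article", "title": "Goo removal guide"},
        {"type": "article", "title": "Google maps and Google mail"},
        {"type": "video", "title": "Google io keynote"},
    ]
    for term, count in prefix_facet(docs, "article", "title", "goo").most_common():
        print(term, count)
```

This gives one count per suggested term in a single pass, which is what lets you avoid re-querying for each suggestion.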
