Solr search using contains and sounds-like search

Problem:
I have movie information in Solr. Two string fields hold the movie title and the director name. A copyField populates another field that Solr searches by default.
I would like to have a Google-like search with a limited scope, as follows. How can I achieve it?
1) How to search Solr for "contains"
E.g.
a) If the movie director's name is "John Cream", searching for joh won't return anything. However, searching for John returns the correct result.
b) If there is a movie titled aaabbb and another titled aaa, searching for aaa returns only one result. I need both results to be returned.
2) How to account for misspellings
E.g.
If the movie director's name is "John Cream", searching for Jon returns no results. Is there a good sounds-like (Soundex) implementation for Solr? If so, how do I enable it?
You can use Solr query syntax.

Searching for "contains" is possible using wildcards (e.g. title:*aaa* will match 'aaabbb' and also 'cccaaabbb'), but be careful with it, because it doesn't use the index efficiently. Do you really need this?
A Soundex-like search is possible by applying the solr.PhoneticFilterFactory filter to both your index and query analyzers. To achieve this, define your fieldType like this in the schema:
<fieldType name="text_soundex" class="solr.TextField">
  <analyzer>
    <!-- a typical analyzer chain; any tokenizer and other filters can precede the phonetic filter -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="true"/>
  </analyzer>
</fieldType>
If you define your "director" field with the "text_soundex" type, you'll be able to search for "Jon" and find "John".
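For example (the field name simply mirrors the question), declaring the field with that type lets a plain query like q=director:Jon match "John Cream", since "Jon" and "John" encode to the same Soundex value:
<field name="director" type="text_soundex" indexed="true" stored="true"/>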
See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for more information.

Of the things you are asking, the first one is definitely achievable in Solr; I don't know about Soundex.
1) How to search Solr for "contains"
You can store the data in a string field or a text field. On a string field you can achieve this with wildcard searching (e.g. field1:John*). You should also look into the different types of analyzers (a sketch follows below). But before anything else, please look at the Solr reference: http://wiki.apache.org/solr/.
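As a rough sketch of the analyzer route (the type name and gram sizes are only illustrative), an edge n-gram filter applied at index time lets a plain query such as director:joh match "John" without wildcards:
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "john" is indexed as "jo", "joh", "john", so prefixes match directly -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>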

def self.get_search_deals(search_q, per = 50, page_no = 1)
  data = Sunspot.search(Deal) do
    # wrap the query in wildcards so partial matches on :title are returned
    fulltext "*#{search_q}*", fields: :title
    paginate page: page_no, per_page: per
  end
  data.results
end
# in the Deal model
searchable do
  text :title
end
Just pass the plain search string; calling get_search_deals("sam") effectively runs a fulltext query for "*sam*" against :title.


Azure-search: How to get documents which exactly contain search term

This question/answer dealt with a pretty similar topic, but I couldn't find the solution I was searching for.
How to practically use a keyword analyzer in Azure Search?
Starting situation:
I created a resource with multiple indexes. One of these indexes contains a Collection(Edm.String) field.
From this field I only want to get documents which exactly contain the search term. For example, the field contains values like these: "Hovercraft zero", "Hovercraft one", "Hovercraft two".
If the search term is "Hover", all three documents should be returned. If the search term is "craft zer", only the document "Hovercraft zero" should be returned. The document shouldn't just get a higher score; the desired behaviour is that I only get the "Hovercraft zero" document as a result.
Further information:
It is not possible to set the searchMode to all (as recommended in the question linked above) because I only want this behaviour for this specific field and not for all search queries. It is also not possible to put the responsibility on the user to enter the search term with quotes.
What I have tried so far:
- Use the keyword analyzer as described in the question linked above: no success.
- Use an index analyzer with specific token filters (ngram, lowercase) and a keyword analyzer as the search analyzer: no success.
- Use char filters to manipulate the search term and manually set quotes at the first and last position (craft zer -> "craft zer"). As Yahnoosh explained in the question linked above, the query parser processes the query string before the analyzers are applied, so: no success.
Is there any solution for this issue?
Or is there another approach to achieve the desired behaviour?
Hopefully someone can help.
Thanks in advance!
Using your example with three documents: "Hovercraft zero", "Hovercraft one", "Hovercraft two"
Issue a prefix query to find all documents that contain terms that start with "Hover"
search=Hover*
To match the term "craft zer", you need to use the keyword analyzer (or the keyword tokenizer with the lowercase token filter) at indexing time to make sure elements of your string collection are not tokenized. Then at query time you can issue a regex query (note that regex queries are much slower than term or prefix queries):
search=/.*craft zer.*/&queryType=full
Also, please use the Analyze API to test your custom analyzer configurations. It will help you make sure the analyzer produces the terms you expect.
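A minimal sketch of that index-time setup (the index, field and analyzer names are invented for illustration; the keyword_v2 tokenizer plus lowercase filter is the combination suggested above):
{
  "name": "hovercraft-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "labels", "type": "Collection(Edm.String)", "searchable": true, "analyzer": "keyword_lowercase" }
  ],
  "analyzers": [
    {
      "name": "keyword_lowercase",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "keyword_v2",
      "tokenFilters": [ "lowercase" ]
    }
  ]
}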
Thanks @Yahnoosh for your answer; I found a solution that worked for me.
Short example:
I have an index including three fields (field1, field2, field3). From field3 I want results where documents exactly contain the search term. From field1 and field2 I want to get a "standard" result.
Solution:
I changed the search query to:
field1:{searchterm} || field2:{searchterm} || field3:"{searchterm}" &queryType=full
Using this search query, field1 and field2 are queried in the "standard" way, and field3 is queried with the behaviour I was looking for. There are certainly more efficient and elegant ways to solve this, but it worked for me.
If anybody has a better solution let me know ;)

Hybris: Use same field for search and facet

I have to use a field "manufacturerName" for both Solr search and Solr faceting in Hybris. While Solr free-text search requires the field type to be text, faceting only works properly with the string type.
Is there any way to use the same field for both search and faceting? I think one way is to use "copyField", but I have searched a lot and still don't know how to use it.
Any help would be highly appreciated!
PS: With the field type set to string, free-text search doesn't fetch proper results. With the field type set to text, the facet shows truncated values.
Using a copyField instruction is the way to go, but that requires you to define an alternative field - meaning you have one field of type text with the associated tokenization, and one field of type string which isn't processed in any way. There is no way in Solr that I know of to combine these in a single field.
You'll then use the name of the string field to generate the facets, while you use the other field when you're querying.
<copyField source="text_search_field" dest="string_facet_field" />
You'll then have to refer to the name string_facet_field when you're filtering or faceting on the field. You'll want to filter against the facet field after the user selects a facet, since you would otherwise end up with documents from other facets leaking into your result set (for example, if the facet was "Foo Bar", you'd suddenly get documents that had "Baz Foo Bar Spam" as the facet, since both words are present in the search string).
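As a sketch of how that fits together (the text_general type name and the stored/indexed flags are assumptions; adjust to your schema), the field definitions and a facet/filter request could look like:
<field name="text_search_field" type="text_general" indexed="true" stored="true"/>
<field name="string_facet_field" type="string" indexed="true" stored="false"/>
<copyField source="text_search_field" dest="string_facet_field"/>
At query time, facet on the string field and filter against it once the user has picked a value:
q=text_search_field:(foo bar)&facet=true&facet.field=string_facet_field&fq=string_facet_field:"Foo Bar"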
I was not able to implement the "copyField" approach, but I found another easy way to do this. In solr.impex, I had already added my new field manufacturerNameFacet of type string, and there are parameters "fieldValueProvider" and "valueProviderParameter". I set these to "springELValueProvider" and "manufacturerName" (the field I wanted to use for both search and facet). After a full Solr indexing, it worked like a charm. No other setting was required. Both search and facet worked as expected.

Solr search with ranking and best match

I am new to this forum. I am looking for your suggestions on one of our search requirements.
We have names, addresses and other relevant data to search against. The search input is a free-form text string with more than one word. The search API should match the input string against the complete data set, including names, addresses and other data. To achieve this, I have used copyField to copy all the required fields into a search field in the Solr config. I search against this searchField with the incoming input string. The input search string can contain partial words, as in the example below.
Name: Test Insurance company
Address: 123 Main Avenue, Galaxy city
Phone: 6781230000
After Solr creates the index, the searchable field will contain a document like the one below:
searchField {
Name: Test Insurance company
Address: 123 Main Avenue, Galaxy city
Phone: 6781230000
}
An end user can enter a search string like "Test Company Main Ave", and the search currently returns the above document, but not at the top; I see other documents being returned too.
I am framing the Solr query as "Test* Company Main Ave", adding a "*" after the first word and querying against the searchField.
I followed this approach after reading a few forums on the internet. How can I get the best match at the top? I am not sure the above approach is right.
Any help appreciated.
Thanks,
Ram
You could index all fields separately and also use your searchField as a catchall.
Use an eDisMax search handler to query all the fields with scoring boosts, and also query your catch-all field, e.g.:
<str name="qf">
  Name^2.0
  Address^1.5
  ...
  searchField^1.0
</str>
To boost relevancy, you could also index each field twice, once with a string type and once with a text_en type, like this:
<str name="qf">
  Name^2.0
  Name_exact^5.0
  Address^1.5
  Address_exact^3.0
  ...
  searchField^1.0
</str>
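For context, a qf list like this normally sits inside a request handler's defaults in solrconfig.xml together with defType=edismax; a minimal sketch (the handler name and boosts are only illustrative):
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">
      Name^2.0
      Address^1.5
      searchField^1.0
    </str>
  </lst>
</requestHandler>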
Technically, if there are documents above the one you want to match, then they are a better match as far as Solr is concerned, so it depends on why they are getting a higher relevancy score. Try turning debugging on and see where the documents above your preferred document are getting the extra relevancy from (see the example below).
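For example (defType and qf as in the sketch above, the query string from the question, spaces URL-encoded in practice), adding debugQuery=true to the request returns an "explain" section showing how each document's score was computed:
q=Test Company Main Ave&defType=edismax&qf=Name^2.0 Address^1.5 searchField&debugQuery=true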
Once you know why they are ranked higher, you need to ask yourself why your preferred document should come first - what makes it a "better" match in your eyes?
Once you've decided why it should come top, you need to work out how to index and search the content so that the documents you expect to come first actually do. You may, as qux says in his answer, need to index multiple versions of the data to allow for better matching, etc.
Si

Solr, managing entities

I have the following situation when using Solr. My documents contain "entities", for example "peanut butter". I have a list of such entities. These are terms that go together and are not to be treated as two individual words. During indexing, I want Solr to realize this and treat "peanut butter" as a single entity. For example, if someone searches for
"peanut"
then documents that have just the word peanut should rank higher than documents that have the phrase "peanut butter". However, if someone searches for
"peanut butter"
then documents that have peanut butter should show up higher than ones that have just peanut. Is there a config setting somewhere that can be modified so that the entity list can be specified in a file and Solr handles this?
Configure that field to use a StrField type instead of a TextField. TextField is designed to handle tokenization and full-text search on textual content. StrField treats its contents as a keyword, and so does not tokenize.
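As a sketch of that configuration (the field name is invented for illustration; "string" is the stock StrField type in the default schema), the schema entries would look like:
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="entity" type="string" indexed="true" stored="true"/>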

Filtering results in Solr

I'm trying to build auto-suggest functionality using Solr. The index contains different locations within a city and looks something like:
id: unique id
name: the complete name
type: can be one of 'location_zone', 'location_subzone', 'location_city', 'outlet', 'landmark' ...
city: city id
Now, when the user types something, I want it to return suggestions only from the current city and only of type location_*, something similar to WHERE city_id = 1 AND type LIKE "location_%" in SQL.
I guess one way to do it is by faceting, but is that the right way? Will it still search all documents and then filter the results, or will it apply the condition first, as MySQL would?
PS: I'm new to Solr and would appreciate it if you could point out any mistakes in the approach.
Solr does provide filtering, using the fq parameter. What you're looking for should be something along the lines of:
&fq=city_id:1&fq=type:location_*&q=...
This page illustrates very well how and when to use filter queries in Solr.
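As a concrete sketch (the host, core name and example values are assumed), a full auto-suggest request could look like:
http://localhost:8983/solr/locations/select?q=name:piz*&fq=city_id:1&fq=type:location_*&rows=10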
