Solr search with ranking and best match - search

I am new to this forum. I am looking for your suggestions on one of our search requirements.
We have names, addresses and other relevant data to search. The search input will be a free-form text string with more than one word. The search API should match the input string against the complete data set, including names, addresses and other data. To achieve this, I have used copyField to copy all the required fields into one search field in the Solr config, and I search that searchField with the incoming input string. The input string can contain partial words, as in the example below.
Name: Test Insurance company
Address: 123 Main Avenue, Galaxy city
Phone: 6781230000
After Solr creates the index, the searchable field holds the document like this:
searchField {
Name: Test Insurance company
Address: 123 Main Avenue, Galaxy city
Phone: 6781230000
}
The end user can enter a search string like "Test Company Main Ave", and the search currently returns the above document, but not at the top; other documents are returned above it.
I am framing the Solr query as "Test* Company Main Ave", adding a "*" after the first word and searching against searchField.
I followed this approach after reading a few forums on the internet. How can I get the best match at the top? I am not sure the above approach is right.
Any help appreciated.
Thanks,
Ram

You could index all fields separately and also use your searchField as a catchall.
Use an Edismax search handler to query each field with its own scoring boost, and also query your catchall field.
e.g.
<str name="qf">
Name^2.0
Address^1.5
.
.
.
searchField^1.0
</str>
To boost relevancy, you could also index each field twice, once with a string type and once with a text_en type, and give the exact (string) variant the higher boost:
<str name="qf">
Name^2.0
Name_exact^5.0
Address^1.5
Address_exact^3.0
.
.
.
searchField^1.0
</str>
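The same boosts can also be sent as request parameters instead of being fixed in the handler config. A minimal sketch in Python, assuming the field names from the example above; the helper name build_edismax_params is made up for illustration and is not part of any Solr client library:

```python
from urllib.parse import urlencode


def build_edismax_params(user_query, field_boosts):
    """Build eDisMax request parameters from a {field: boost} mapping."""
    # qf is a space-separated list of field^boost entries.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "edismax", "q": user_query, "qf": qf}


# Field names follow the example config; the *_exact fields are assumed
# to be string-typed copies of the analyzed fields.
boosts = {
    "Name": 2.0,
    "Name_exact": 5.0,
    "Address": 1.5,
    "Address_exact": 3.0,
    "searchField": 1.0,
}
params = build_edismax_params("Test Company Main Ave", boosts)
query_string = urlencode(params)  # append to .../select? on your own core
```

Sending qf per request makes it easy to try different boosts while tuning, before settling on values in the handler defaults.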

Technically, if there are documents above the one you want to match, then Solr considers them a better match, so it depends on why they are getting a higher relevancy score. Try turning debug on (debugQuery=true) and see where the documents above your preferred document are getting the extra relevancy from.
Once you know why they are coming higher, ask yourself why your preferred document should come first, and what makes it a "better" match in your eyes.
Once you've decided why it should come top, you need to work out how to index and search the content so that the documents you expect to come first actually do. As qux says in his answer, you may need to index multiple versions of the data to allow for better matching.
Si
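To compare scores in practice, the debug output can be read per document. A small sketch in Python, assuming a JSON response (wt=json) where the score explanations sit under debug.explain keyed by document id; the sample response below is a made-up placeholder, not real Solr output:

```python
def with_debug(params):
    """Return a copy of the request parameters with score debugging enabled."""
    debug_params = dict(params)
    debug_params["debugQuery"] = "true"
    debug_params["wt"] = "json"
    return debug_params


def explain_for(response, doc_id):
    """Return the score explanation for one document, or None if absent."""
    return response.get("debug", {}).get("explain", {}).get(doc_id)


# Placeholder response, only to show the shape that is assumed here.
sample_response = {
    "debug": {
        "explain": {
            "doc-1": "(explanation text for doc-1)",
            "doc-2": "(explanation text for doc-2)",
        }
    }
}
request_params = with_debug({"q": "Test Company Main Ave"})
```

Reading the explanation of the expected document next to that of the document ranked above it usually shows which field or term is contributing the extra score.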

Related

Implementing search : Identifying known keywords

I have implemented search functionality for my e-commerce website using Elasticsearch. The basic structure is that each product has a title, and whatever the user enters, I search for the exact string in Elasticsearch and return the result.
Now I notice that most of the search phrases (almost 90%) follow a similar pattern. It contains:
Brand name of the product (Apple, Nokia etc.)
Category of the product (phone, mobile phone, smartphone etc.)
Model name of the product (iPhone 6S, Lumia 950 etc.)
I think that if I can identify these specific components, I can return better results than with a plain text match.
I have lists of brands, categories and models. If I can identify which of these terms are present, I can query Elasticsearch on the corresponding field specifically.
For example, for the search string "Apple iPhone 5S", I should be able to deduce that brand=Apple.
EDIT: More details as asked in comments
Structure of document:
I have a single index and each document ID is the SKU of the product and it contains the following fields
title (Apple iPhone 5S)
brand (Apple)
categ (Electronics)
sub_categ (Smartphones)
model (iPhone 5S)
attribs (dictionary of product attributes particular to each sub_categ like {"color": "gold", "memory": "32 GB", "battery": "1570 mAh"})
price
Use Case:
When the user searches for the phrase "iphone 5s battery", Elasticsearch returns results that include even the phone. (I agree that the battery gets the better relevance score.)
What I am trying to achieve is this: I have a master list of sub-categories. If any word from the search phrase is present in the master list, I would query Elasticsearch with ["must": {"sub_categ": "battery"}], so that results from the "Smartphones" sub-category are not fetched. I wish to replicate this across multiple fields such as brand, category, etc.
My question is: how do I quickly find out whether a brand or any other word from the master lists is present in the search phrase? The only option I could think of is looping through each master list and checking whether each entry is present in the search phrase, keeping note of the hits, doing the same for every master list field (brand, categ, sub_categ), and then generating the query with must clauses. I wish to know if there is a better way of accomplishing this.
The person in the Lucene world who has spoken the most on this topic is Ted Sullivan. (He calls this "auto-filtering", and has a component which does this available for Solr)
I realize you're using Elasticsearch, but Ted's component works by introspecting FieldCache data (exposed by Lucene), so it should be possible to implement something very similar with Elasticsearch (look at the code).
There is also a discussion in this article about how to create a separate index for providing pre-query intelligence like you've described (e.g. your term "Apple" is most frequently found in the company field).
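As a simpler alternative to looping over every master list entry per query, the lists can be turned into one hash lookup keyed by token tuples, so the query is scanned once. This is only a sketch of that idea in Python, not Ted Sullivan's component; the function names and the sample master lists are assumptions for illustration:

```python
def build_lookup(master_lists):
    """Map each lower-cased, tokenized master entry to (field, original value)."""
    lookup = {}
    max_len = 1
    for field, values in master_lists.items():
        for value in values:
            tokens = tuple(value.lower().split())
            lookup[tokens] = (field, value)
            max_len = max(max_len, len(tokens))
    return lookup, max_len


def tag_query(query, lookup, max_len):
    """Scan the query once, preferring the longest known phrase at each position."""
    tokens = query.lower().split()
    matches, leftover = {}, []
    i = 0
    while i < len(tokens):
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + size])
            if candidate in lookup:
                field, value = lookup[candidate]
                matches.setdefault(field, value)  # keep the first hit per field
                i += size
                break
        else:
            # No known phrase starts here; keep the word for the text query.
            leftover.append(tokens[i])
            i += 1
    return matches, leftover


# Hypothetical master lists, shaped like the fields in the question.
master = {
    "brand": ["Apple", "Nokia"],
    "sub_categ": ["Smartphones", "Battery"],
    "model": ["iPhone 5S", "Lumia 950"],
}
lookup, max_len = build_lookup(master)
filters, free_text = tag_query("Apple iPhone 5S battery", lookup, max_len)
```

The recognized fields in filters can then become must clauses, while free_text is what remains for the ordinary title match. The cost per query depends on the query length and the longest master entry, not on the size of the master lists.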

Solr search using contains, sound like

Problem:
I have movie information in Solr. Two string fields hold the movie title and the director name. A copyField populates another field, which Solr searches by default.
I would like a Google-like search with a limited scope, as follows. How can I achieve it?
1) How to search Solr for "contains"
E.g.
a) If the movie director name is "John Cream", searching for joh won't return anything. However, searching for John returns the correct result.
b) If there is a movie title called aaabbb and another one called aaa, searching for aaa returns only one result. I need it to return both results.
2) How to account for misspelling
E.g.
If the movie director name is "John Cream", searching for Jon returns no results. Is there a good sounds-like (Soundex) implementation for Solr? If so, how do I enable it?
You can use the Solr query syntax.
Searching for "contains" is possible using wildcards (e.g. title:*aaa* will match 'aaabbb' and also 'cccaaabbb'), but be careful with it, because a leading wildcard doesn't use the index efficiently. Do you really need this?
A Soundex-like search is possible by applying the solr.PhoneticFilterFactory filter at both index and query time. To achieve this, define your fieldType like this in the schema:
<fieldType name="text_soundex" class="solr.TextField">
...
<filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="true"/>
</fieldType>
If you define your "director" field as "text_soundex", you'll be able to search for "Jon" and find "John".
See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for more information.
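To see why "Jon" finds "John" with a Soundex encoder, here is a standalone sketch of the classic American Soundex rules in Python. It is an illustration only: Solr's own encoder is implemented in Java and may differ in edge cases, and the function name soundex is simply chosen here.

```python
# Consonant groups of classic American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' for empty input)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (encoded + "000")[:4]


same_code = soundex("John") == soundex("Jon")
```

Both names reduce to the same code, so once the filter has indexed that code, a query for either spelling matches. With inject="true" the original token is kept alongside the phonetic one, so exact matches still work as before.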
Of the things you are asking, the first one is definitely achievable with Solr. I don't know about Soundex.
1) How to search Solr for "contains"
You can store the data in a string type field or a text type field. On a string field you can get this result with a wildcard search (e.g. field1:John*; leave the term unquoted, because wildcards inside a quoted phrase are not expanded by the standard query parser). You should also look into the different types of analyzers. But before everything else, please look into the Solr reference: http://wiki.apache.org/solr/.
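# Wrap the search term in wildcards to get a "contains" style match on title.
# page_no was undefined in the original snippet; it is now a parameter.
def self.get_search_deals(search_q, page_no = 1, per = 50)
  data = Sunspot.search(Deal) do
    fulltext "*#{search_q}*", fields: :title
    paginate page: page_no, per_page: per
  end
  data.results
end

# In the Deal model: index title as a full-text field.
searchable do
  text :title
end
With search_q = "sam", the string passed to the search is "*sam*".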

Solr title search failing

I am indexing the title field for a few products in Solr.
But when I search, I do not get those titles in the response.
For example, I am storing the following as a title: Baboons Typing Tshirt
But when I search with any of the following, I do not get any results:
1)title:Baboons
2)title:(Baboons Typing Tshirt)
3)title:(Baboons*)
On the other hand, if I search like this, I get a lot of results:
1)title:(Tshirt)
I have indexed many titles containing the word Tshirt, but searching for one specific title fails.
I don't know whether Solr is ignoring the first words or doing something random.
My question is basically: if I have a search title with lots of words, I would like to match it with the title that has the most terms in common.
How can I do that?
Thanks
Solr works like that by itself. You don't have to change anything.
You have to be careful how you set up your fields in schema.xml, i.e. how analysis is done.
You can use Solr's admin > Analysis interface to see exactly how your title field (when indexing) and your query (when searching) are processed (tokenized, transformed).
Remember that a match only occurs when the term produced on the query side is identical (case and everything) to a term produced on the index side.
To open your index and see how Solr has actually indexed your data, use Luke.
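The point about identical terms on both sides can be shown with two toy analyzers. This is a plain Python illustration, not Solr code; the analyzer functions are simplified stand-ins for what a string field and a lower-cased text field might do, and the actual cause in the question still has to be checked in the Analysis screen:

```python
def keyword_analyzer(text):
    """Stand-in for a string field: the whole value is a single term."""
    return [text]


def lowercase_whitespace_analyzer(text):
    """Stand-in for a simple text field: split on whitespace, lower-case."""
    return text.lower().split()


def matches(index_analyzer, query_analyzer, stored_value, query):
    """A query term matches only if it equals an indexed term exactly."""
    indexed_terms = set(index_analyzer(stored_value))
    return any(term in indexed_terms for term in query_analyzer(query))


title = "Baboons Typing Tshirt"

# Whole title indexed as one term: a single word cannot match it.
as_string_field = matches(keyword_analyzer, keyword_analyzer, title, "Baboons")

# Same analysis on both sides: the word matches.
as_text_field = matches(lowercase_whitespace_analyzer,
                        lowercase_whitespace_analyzer, title, "Baboons")

# Lower-cased at index time but not at query time: "Baboons" != "baboons".
mismatched = matches(lowercase_whitespace_analyzer, keyword_analyzer,
                     title, "Baboons")
```

Only the second case matches, which is why the index and query analyzers of a field should normally produce compatible terms.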

Invalid Magento Search result

When Magento search uses the fulltext search engine with the "like" method, it stores the searchable data in the "data_index" field of the catalogsearch_fulltext table,
with each searchable attribute value separated by |.
e.g.
3003|Enabled|None||Product name|1.99|yellow|0
Here it stores the SKU, status, tax class, product name, price, color, etc.
It stores the values of all searchable attributes.
The issue is that for a configurable product, it also stores the names, prices and statuses of the associated products in the same field, like this:
3003|Enabled|Enabled|Enabled|Enabled|None|None|None|None|Product name|Product name|associted Product name1|associted Product name2|associted Product name3|1.99|2.00|2.99|3.99|yellow|black|yellow|green|0|0|0|0
So what happens is that if I search for any word from an associated product, the main configurable product is also listed, because its "data_index" field contains that word.
I need some suggestions on how to keep associated products out of data_index, so that I get accurate search results.
thanks
We are looking into our search as well and it has been surprising to see the inefficiencies included in the fulltext table. We have some configurable products as well that have MANY variations and their population in the fulltext search is downright horrendous.
As for solutions, I can only offer my own approach to fixing the problem (not completed, but in progress).
I am extending Magento with an event listener on the product indexing process (because catalog search indexing is when the fulltext table is populated). Once that process has run, my own module removes the duplicate entries coming from the associated products and also adds extra search keyword terms populated from a CSV file.
This should increase search speed considerably and also return more relevant search results, because as of now configurable products are getting a "search bias" in the results.
This isn't so much an answer as a comment, but it was too long to fit in the comments and I thought it might be useful to you. Once I get my module working, I can give you directions on how to implement a similar module yourself, if you would like.
Hope that helped (if only for moral support in Magento's search struggle)

How to implement faceted search suggestion with number of relevant items in Solr?

Hi
I have a very specific need in my company for the system's search engine, and I can't seem to find a solution.
We have a Solr index of items, all of which have the same fields, one of them being "Type" (and of course "Title", "Text", and so on).
What I need is this: given an item type and a query string, I need to return a list of search suggestions, each one also saying how many items of the given type that suggested string would return.
For example, if the original string is "goo", I'll get
Goo 10
Google 52
Goolag 2
and so on.
Now, how do I do it?
I don't want to re-query Solr for each different suggestion, but if there is no other way, I just might.
Thanks in advance
You can try edge n-gram tokenization:
http://search.lucidimagination.com/search/document/CDRG_ch05_5.5.6
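For readers unfamiliar with the term, an edge n-gram filter indexes every leading prefix of a token, so a typed prefix becomes an ordinary exact term match. A plain Python illustration of what gets indexed (the function name and the gram sizes are only example choices, not Solr defaults):

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading prefixes of a token, from min_gram to max_gram chars."""
    token = token.lower()
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


grams = edge_ngrams("Google", min_gram=2, max_gram=4)
```

Because "goo" is one of the indexed prefixes of "Google", the query goo matches that document without any wildcard. The n-grams are normally applied only at index time, with the query left un-grammed.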
You can try facets. Take a look at my more detailed description ('Autocompletion').
This was implemented at http://jetwick.com with Solr. It now uses ElasticSearch, but the Solr sources are still available and the idea is identical: https://github.com/karussell/Jetwick
The SpellCheckComponent of Solr (which gives the suggestions) has extended results that can give the frequency of every suggestion in the index - http://wiki.apache.org/solr/SpellCheckComponent#Extended_Results.
However, the .NET component SolrNet does not currently seem to support the extendedResults option: "All of the SpellCheckComponent parameters are supported, except for the extendedResults option" - http://code.google.com/p/solrnet/wiki/SpellChecking.
This can be implemented using a facet field query with a prefix set. The per-field form of the parameter is f.<field>.facet.prefix. You can test this using the XML handler like this:
http://localhost:8983/solr/select/?rows=0&facet=true&facet.field=type&f.type.facet.prefix=goo
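Applied to the original question, one request can return all suggestions for a prefix together with their counts, restricted to one item type through a filter query. A sketch in Python under some assumptions: title_suggest is a hypothetical lower-cased field to facet on, the response is requested as JSON, and facet values come back as a flat [term, count, term, count, ...] list, which is the default JSON layout. The sample response only reuses the numbers from the question.

```python
from urllib.parse import urlencode


def build_suggest_params(prefix, item_type, suggest_field="title_suggest"):
    """Parameters for one request returning suggestion terms with counts."""
    return [
        ("q", "*:*"),
        ("rows", "0"),                   # only the facet counts are needed
        ("wt", "json"),
        ("fq", f'Type:"{item_type}"'),   # count only items of this type
        ("facet", "true"),
        ("facet.field", suggest_field),  # hypothetical field name
        ("facet.prefix", prefix.lower()),  # assumes lower-cased indexed terms
        ("facet.mincount", "1"),
    ]


def parse_facet_counts(response, suggest_field="title_suggest"):
    """Turn the flat [term, count, ...] list into (term, count) pairs."""
    flat = response["facet_counts"]["facet_fields"][suggest_field]
    return list(zip(flat[0::2], flat[1::2]))


query_string = urlencode(build_suggest_params("Goo", "Book"))

# Illustrative response shape, using the counts from the question.
sample = {"facet_counts": {"facet_fields": {
    "title_suggest": ["goo", 10, "google", 52, "goolag", 2]}}}
suggestions = parse_facet_counts(sample)
```

Since the type restriction sits in fq, the counts already reflect only items of the requested type, so no second query per suggestion is needed.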

Resources