Solr searching within a dictionary file

I was wondering whether Solr offers something out of the box that would let me search within a dictionary file (containing words and phrases) and return all the phrases that contain my search terms.
For example, my dictionary file could have:
red car
blue bike
big bike tires
When I search 'bike', I expect to see
blue bike
big bike tires
And when I search 'big tires', I expect to see
big bike tires
Is there anything in Solr that could support this? I was looking into the SpellCheckComponent, but it only supports prefix searches.
Basically, I would like to run Solr-style searches (token matching) against a dictionary file (the same file would also be used for autosuggest).
Any advice or direction would be appreciated.

Why not store such phrases in the index itself? The schema can be:
type: suggest_phrase #other types are product or review_article
phrase: big bike tires
So your search for big tires would be:
...&fq=type:suggest_phrase&q=phrase:(big AND tires)
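To make the expected behaviour concrete, here is a sketch in plain Java (not Solr) of what the stored-phrase approach in the answer above does when all query terms are required: a dictionary phrase matches when it contains every query token, in any order or position. The class name and the whitespace-plus-lowercase tokenization are assumptions for illustration; Solr would apply the phrase field's analyzer and use an inverted index rather than a linear scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DictionaryPhraseSearch {

    // Whitespace split + lowercase: a stand-in for a Solr text analyzer.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Return every dictionary phrase that contains ALL query tokens,
    // regardless of order or position.
    static List<String> search(List<String> dictionary, String query) {
        Set<String> queryTokens = tokens(query);
        List<String> hits = new ArrayList<>();
        for (String phrase : dictionary) {
            if (tokens(phrase).containsAll(queryTokens)) hits.add(phrase);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("red car", "blue bike", "big bike tires");
        System.out.println(search(dictionary, "bike"));      // [blue bike, big bike tires]
        System.out.println(search(dictionary, "big tires")); // [big bike tires]
    }
}
```

The linear scan is only reasonable for a small dictionary; indexing each phrase as its own document, as suggested, lets Solr do the same matching through the index.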

Related

Implementing search: Identifying known keywords

I have implemented search for my e-commerce website using Elasticsearch. The basic structure is simple: each product has a title, and I search Elasticsearch for exactly the string the user enters and return the results.
I now notice that most search phrases (almost 90%) follow a similar pattern. They contain:
Brand name of the product (Apple, Nokia etc.)
Category of the product (phone, mobile phone, smartphone etc.)
Model name of the product (iPhone 6S, Lumia 950 etc.)
Now I think if I am able to identify the specific components, then I can return better results than just text match.
I have lists of brands, categories and models. If I can identify which of these terms are present, I can query Elasticsearch on that specific field.
For example, a search string of "Apple iPhone 5S", I should be able to deduce that brand=Apple.
EDIT: More details as asked in comments
Structure of document:
I have a single index and each document ID is the SKU of the product and it contains the following fields
title (Apple iPhone 5S)
brand (Apple)
categ (Electronics)
sub_categ (Smartphones)
model (iPhone 5S)
attribs (dictionary of product attributes particular to each sub_categ like {"color": "gold", "memory": "32 GB", "battery": "1570 mAh"})
price
Use Case:
Now, when a user searches for the phrase "iphone 5s battery", Elasticsearch returns results that include the phone itself (although I agree the battery gets the better relevance score).
What I am trying to achieve: I have a master list of sub-categories. If any word from the search phrase is in that master list, I would query Elasticsearch with a must clause on that field, e.g. {"bool": {"must": {"match": {"sub_categ": "battery"}}}}, so that results from the "Smartphones" sub-category are not returned. I want to do the same for other fields such as brand and category.
My question is: how can I quickly find whether a brand, or any other word from a master list, is present in the search phrase? The only option I can think of is to loop through the master list and check whether each entry appears in the search phrase; if it does, note it, repeat this for every master-list field (brand, categ, sub_categ), and then build and run a query with must clauses. Is there a better way to do this?
The person in the Lucene world who has spoken most on this topic is Ted Sullivan. He calls this "auto-filtering" and has a component available for Solr that does it.
I realize you're using Elasticsearch, but Ted's component works by introspecting FieldCache data (exposed by Lucene), so it should be possible to implement something very similar with Elasticsearch (look at the code).
There is also a discussion in this article about how to create a separate index for providing pre-query intelligence like you've described (e.g. your term "Apple" is most frequently found in the company field).
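The lookup step described in the question does not need a loop over every master list. A minimal sketch, assuming the master lists fit in memory: put every brand, category and model into one hash map from lowercased term to field name, then scan the query's word n-grams (longest first) and look each one up, so the cost depends on the query length rather than the size of the lists. This is a plain dictionary lookup, not Ted Sullivan's auto-filtering component; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class KeywordDetector {

    // Known term (lowercased) -> field it belongs to, e.g. "apple" -> "brand".
    // If the same term is added under two fields, the last addAll wins.
    private final Map<String, String> termToField = new HashMap<>();
    private int maxWords = 1; // length of the longest known term, in words

    public void addAll(String field, List<String> terms) {
        for (String term : terms) {
            String key = term.toLowerCase(Locale.ROOT).trim();
            termToField.put(key, field);
            maxWords = Math.max(maxWords, key.split("\\s+").length);
        }
    }

    // Scan left to right, trying the longest n-gram first, so that "iphone 5s"
    // is recognised as one model instead of two separate words.
    // Returns field -> matched term (a field matched twice keeps its last match).
    public Map<String, String> detect(String query) {
        String[] words = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Map<String, String> found = new LinkedHashMap<>();
        int i = 0;
        while (i < words.length) {
            int matched = 0;
            for (int n = Math.min(maxWords, words.length - i); n >= 1; n--) {
                String gram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                String field = termToField.get(gram);
                if (field != null) {
                    found.put(field, gram);
                    matched = n;
                    break;
                }
            }
            i += (matched > 0) ? matched : 1;
        }
        return found;
    }

    public static void main(String[] args) {
        KeywordDetector d = new KeywordDetector();
        d.addAll("brand", List.of("Apple", "Nokia"));
        d.addAll("model", List.of("iPhone 6S", "iPhone 5S", "Lumia 950"));
        d.addAll("sub_categ", List.of("Smartphones", "Battery"));
        System.out.println(d.detect("iphone 5s battery")); // {model=iphone 5s, sub_categ=battery}
    }
}
```

The resulting field/term pairs could then be turned into must clauses. Note that this exact lookup does no stemming, so "smartphone" would not match the entry "Smartphones".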

Lucene phrase query with terms in OR

Suppose that I have the following documents, each with a text field:
the red house is beautiful
the house is little
the red fish
the red and yellow house is big
What kind of query should I use to retrieve the documents ranked as follows when I search for "red house":
the red house is beautiful [matching: red house]
the red and yellow house is big [matching: red x x house]
the house is little [matching: house]
the red fish [matching: red]
What I need is to give a high rank to documents that match the searched phrase, and a lower score to documents that contain only part of it.
Note that the query string could also contain more than two terms.
It is like a PhraseQuery in which each term may or may not appear, and in which the closer the terms are, the higher the score.
I've tried composing a PhraseQuery with a TermQuery, but the result is not what I need.
How can I do this?
Thanks
Try creating a BooleanQuery composed of TermQuery objects, combined with OR (BooleanClause.Occur.SHOULD). This will match documents where only one term appears, but should give a higher score to those where both appear.
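import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
Query term1 = new TermQuery(new Term("text", "red"));
Query term2 = new TermQuery(new Term("text", "house"));
// Lucene 5.3+: BooleanQuery is immutable and is assembled with BooleanQuery.Builder
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(term1, BooleanClause.Occur.SHOULD); // optional clause: either term may match
builder.add(term2, BooleanClause.Occur.SHOULD);
BooleanQuery booleanQuery = builder.build();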
I think a PhraseQuery with a positive slop (setSlop), SHOULD-combined with a TermQuery for every term, should get you there. Maybe with a boost for the PhraseQuery.
"I've tried composing a PhraseQuery with a TermQuery, but the result is not what I need."
What do you get with this combination, and how is it not what you need?
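The scoring idea behind the two answers above can be illustrated with a toy ranker in plain Java rather than Lucene: each query term that occurs adds one point (like the SHOULD-combined TermQuery clauses), and a sloppy-phrase bonus that shrinks as the terms move apart is added on top (like a boosted PhraseQuery with slop). PHRASE_BOOST, MAX_SLOP and the 1/(1 + slop) decay are assumptions chosen for illustration; Lucene's real scoring also uses term statistics such as IDF, so absolute scores will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ProximityRanker {

    static final double PHRASE_BOOST = 2.0; // hypothetical boost for the phrase part
    static final int MAX_SLOP = 3;          // hypothetical maximum slop

    static List<String> tokens(String s) {
        return Arrays.asList(s.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Smallest number of extra words between the query terms when they occur
    // in query order, or -1 if they never all occur in that order.
    static int minSlop(List<String> doc, List<String> query) {
        int best = -1;
        for (int start = 0; start < doc.size(); start++) {
            if (!doc.get(start).equals(query.get(0))) continue;
            int pos = start;
            boolean complete = true;
            for (int q = 1; q < query.size() && complete; q++) {
                int next = doc.subList(pos + 1, doc.size()).indexOf(query.get(q));
                if (next < 0) complete = false; else pos = pos + 1 + next;
            }
            if (complete) {
                int slop = (pos - start + 1) - query.size();
                if (best < 0 || slop < best) best = slop;
            }
        }
        return best;
    }

    static double score(String text, String queryText) {
        List<String> doc = tokens(text);
        List<String> query = tokens(queryText);
        double total = 0;
        // Term part: one point per distinct query term present in the document.
        for (String term : new HashSet<>(query)) if (doc.contains(term)) total += 1;
        // Phrase part (multi-term queries only): closer terms earn a larger bonus.
        if (query.size() > 1) {
            int slop = minSlop(doc, query);
            if (slop >= 0 && slop <= MAX_SLOP) total += PHRASE_BOOST / (1 + slop);
        }
        return total;
    }

    // Highest score first; List.sort is stable, so ties keep their input order.
    static List<String> rank(List<String> docs, String query) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((String d) -> score(d, query)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the red house is beautiful", "the house is little",
                "the red fish", "the red and yellow house is big");
        rank(docs, "red house").forEach(System.out::println);
    }
}
```

With the question's documents this prints them in the requested order: "red house" (score 4), "red and yellow house" (about 2.67), then the two single-term matches (1 each), which tie here.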

Solr, managing entities

I have the following situation when using Solr. My document contains "entities", for example "peanut butter". I have a list of such entities. These are items that go together and are not to be treated as two individual words. During indexing, I want Solr to recognize this and treat "peanut butter" as an entity. For example, if someone searches for
"peanut"
then documents that have the word "peanut" should rank higher than documents that have the phrase "peanut butter". However, if someone searches for
"peanut butter"
then documents that have "peanut butter" should rank higher than ones that have just "peanut". Is there a config setting where I can specify the entity list in a file and have Solr handle this?
Configure that field to use a StrField type instead of a TextField. TextField is designed for tokenization and full-text search on textual content; StrField treats its contents as a single keyword and does not tokenize them.
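The StrField suggestion treats the entire field value as one keyword. If the field is free text with known entities embedded in it, another option is to glue each listed entity into a single token at analysis time. A minimal sketch of that idea in plain Java, assuming the entity list has already been loaded into memory (EntityTokenizer is a hypothetical class, not a Solr component): a greedy longest match over the words, emitting each listed entity as one token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class EntityTokenizer {

    private final Set<String> entities = new HashSet<>();
    private int maxWords = 1; // longest entity, in words

    public EntityTokenizer(List<String> entityList) {
        for (String e : entityList) {
            String key = e.toLowerCase(Locale.ROOT).trim();
            entities.add(key);
            maxWords = Math.max(maxWords, key.split("\\s+").length);
        }
    }

    // Greedy longest match: a listed multi-word entity becomes a single token,
    // every other word is emitted on its own.
    public List<String> tokenize(String text) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            int taken = 1;
            for (int n = Math.min(maxWords, words.length - i); n > 1; n--) {
                String gram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                if (entities.contains(gram)) { taken = n; break; }
            }
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + taken)));
            i += taken;
        }
        return out;
    }

    public static void main(String[] args) {
        EntityTokenizer t = new EntityTokenizer(Arrays.asList("peanut butter"));
        System.out.println(t.tokenize("Peanut butter and peanut brittle"));
        // [peanut butter, and, peanut, brittle]
    }
}
```

If documents and queries go through the same tokenizer, a query for "peanut" produces the token peanut, which does not match the token "peanut butter", while a query for "peanut butter" matches only the entity. This excludes peanut-butter documents from a "peanut" query rather than just ranking them lower; indexing the plain words into a second, lower-boosted field is one possible way to keep them as weaker matches.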

How do I deal with multiple word keyphrases and exact search in Solr?

Hi. I need some help on the following issue:
I am indexing a list of shops, each of which has a list of product words containing single-word keyphrases like 'bike' and multi-word keyphrases like 'red bike'.
Example: Store A sells 'ipad' and 'iphone'. Store B sells 'ipad accessories' and 'iphone'. Store C sells 'ipad' and 'ipad accessories'.
When a user searches for 'ipad', I want only the stores that have an exact match on 'ipad' (i.e. only stores A and C) to show up in the results.
In my current Solr setup, a keyphrase like 'ipad accessories' gets tokenized into 'ipad' and 'accessories', so when a user searches for 'ipad', store B shows up as well. How do I get a keyphrase like 'ipad accessories' into the index so that Solr treats it as a single keyphrase/token to match on?
Any help would be greatly appreciated!
If you are willing to try something different, look at the Percolate Query in Elasticsearch.
It is probably exactly what you are trying to do: reverse matching, something like Google AdWords.
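The behaviour asked for in the question, where each keyphrase must match as a whole, can be sketched in plain Java by keeping every keyphrase as one untokenized value, which is what an untokenized string field would give you in Solr. This is not the Percolate Query; it is a simpler exact-match illustration using the stores from the question, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ExactKeyphraseMatch {

    // Each keyphrase is stored whole (a single token), so "ipad accessories"
    // never matches a query for "ipad".
    static List<String> storesSelling(Map<String, Set<String>> stores, String query) {
        String wanted = query.trim().toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : stores.entrySet()) {
            if (e.getValue().contains(wanted)) hits.add(e.getKey());
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> stores = new LinkedHashMap<>();
        stores.put("A", Set.of("ipad", "iphone"));
        stores.put("B", Set.of("ipad accessories", "iphone"));
        stores.put("C", Set.of("ipad", "ipad accessories"));
        System.out.println(storesSelling(stores, "ipad")); // [A, C]
    }
}
```

The trade-off is that exact matching is all-or-nothing: a query for "accessories" alone would match no store unless the keyphrases are also indexed in a tokenized form.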

How can I add weight to a search term when searching with Sphinx?

Example:
Assume that the user types this search:
" I'd like to play soccer (A.K.A football) "
and assume that I do some processing on this sentence to simplify it to "soccer football"
Now I want to give more weight to search results that contain "soccer" than to those containing "football".
How do I do that in Sphinx? Please note that I've seen answers here saying it's not possible, but I've also seen an article describing such a possibility starting from Sphinx 2.01, without any further detail.
Duplicate the term you would like to amplify.
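$cl = new SphinxClient();
$cl->SetMatchMode(SPH_MATCH_EXTENDED);
$cl->SetRankingMode(SPH_RANK_WORDCOUNT);
// "soccer" is repeated so it counts more than "football"; "/1" is the quorum operator (at least 1 word must match)
$result = $cl->Query('"soccer soccer soccer soccer football"/1', $index);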
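Repeating "soccer" in the query is meant to give it a larger weight than "football". The arithmetic behind that idea can be sketched with a toy scorer in plain Java that sums, for each query term, its weight times the number of times it occurs in the document. This is a simplified model for illustration only, not Sphinx's actual SPH_RANK_WORDCOUNT implementation; the weights 4.0 and 1.0 merely mirror the four copies of "soccer" and the single "football" in the query.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class WeightedTermScore {

    // Score = sum over query terms of (term weight x occurrences in the document).
    static double score(String text, Map<String, Double> weights) {
        List<String> words = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
        double total = 0;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            for (String word : words) if (word.equals(w.getKey())) total += w.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("soccer", 4.0);   // mirrors four copies of "soccer" in the query
        weights.put("football", 1.0); // mirrors one copy of "football"
        System.out.println(score("We play soccer on Sundays", weights));   // 4.0
        System.out.println(score("We play football on Sundays", weights)); // 1.0
    }
}
```

Under this model a document mentioning "soccer" once outranks one mentioning "football" up to three times, which is the effect the repeated-keyword query is aiming for.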
