I use Solr to run short queries against a brand field. I want an exact match, but as I understand it that is impossible in Lucene.
I tried some hardcoded queries just for testing:
myBrand:2\+2 and myBrand:\+
I get 2:2 back, so it seems the and condition is not working, or not working the way I expect.
I also tried a filter query:
myBrand:2\+2 with a fq of myBrand:\+
Now I get no results at all.
I use Solr 5 and run all tests in the Solr web interface.
Is there some way to get the best match for short brands, nicknames, etc. when I don't need much in the way of heuristics and want strict exact matching? Or do I have to filter the results in my own code after the Solr query has executed?
UPDATED
Changes to the schema resolved my issue.
Queries like 2+2 now work like a charm:
<fieldType name="text_general" class="solr.TextField" multiValued="true" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory" pattern="\s*"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.PatternTokenizerFactory" pattern="\s*"/>
</analyzer>
</fieldType>
Exact matches are not impossible. You just have to use a field type that retains the exact value, such as StrField, or a TextField with KeywordTokenizer (plus a lowercase filter if you want it to be case insensitive).
Matching would be field:"Exact value" or any regular query syntax. The reason Solr/Lucene doesn't do an "exact" match here is that the regular TextField definitions in the example schema break the text into separate tokens.
For filters you usually want the exact value (both for a facet and for the fq, so you can filter the results exactly), so this is not a Lucene limitation but a consequence of the field types you're working with.
The solution might be to have the same content in several fields: one to search against for regular text queries, and one to filter and facet on. Use copyField to get the same values into several fields from the same source field.
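A minimal sketch of that layout, assuming the schema defines the stock `string` (StrField) type alongside `text_general`. The name `myBrand_exact` is hypothetical, and the attribute values are only one plausible choice:

```xml
<!-- Tokenized field for regular text queries -->
<field name="myBrand" type="text_general" indexed="true" stored="true"/>
<!-- Hypothetical untokenized copy, intended for exact filters and facets -->
<field name="myBrand_exact" type="string" indexed="true" stored="false"/>
<!-- Copy the raw source value into the exact field at index time -->
<copyField source="myBrand" dest="myBrand_exact"/>
```

With something like this in place, a filter such as fq=myBrand_exact:"2+2" should only match documents whose brand is exactly that value (the + has to be URL-encoded when the request is sent over HTTP). Existing documents need to be reindexed before the new field is populated.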
Related
I am trying to search in Solr for an exact match. The problem is this kind of data:
test score
test-score
test_score
test+score
If I run an exact search with the query test score, it returns only one record.
I need to find all four.
One way is to copy this field and replace these special characters, but that requires a new indexed field so that the original content is kept separately.
<fieldType name="text_exact_dehyphen" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s*-\s*" replacement=" "/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
Is there any way I can use my exact query to also find all the variants with special characters in them?
Thanks
Use copyField to copy the content into multiple fields: one that doesn't change the content (i.e. just a string field, or a TextField with KeywordTokenizer if you need to lowercase the content) and one field with a StandardTokenizer or similar, allowing you to match "test score" against "test+score" etc.
You can then weight these fields differently by using the edismax query parser and its qf parameter: qf=field_exact^5 field boosts matches in the exact field by a factor of five relative to matches in the other field.
Use q.op=AND to ensure that all terms are present in the resulting document.
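As a rough sketch, these parameters could also be set as defaults on a request handler in solrconfig.xml instead of being passed with every request. The handler name `/select_exact` is hypothetical, and the field names simply reuse the ones from the qf example above:

```xml
<!-- solrconfig.xml: hypothetical handler; adjust the field names to your schema -->
<requestHandler name="/select_exact" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the extended dismax parser so that qf is honoured -->
    <str name="defType">edismax</str>
    <!-- Boost the untokenized field over the tokenized one -->
    <str name="qf">field_exact^5 field</str>
    <!-- Require every query term to be present -->
    <str name="q.op">AND</str>
  </lst>
</requestHandler>
```

The same values can be sent as plain request parameters (defType, qf, q.op) if you would rather not touch the configuration.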
When I search for company in Solr, the results should include similar forms such as com pany, comp-any and company. How can I get that using Solr?
For the use case you provided, you can use n-grams.
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="7"/>
</analyzer>
This filter breaks the tokens into parts of the specified sizes. For example, for the word "company" it will produce the following tokens: "com", "omp", "mpa", "pan", "any", "comp", "ompa", "mpan", "pany", "compa", "ompan", "mpany", "compan", "ompany", "company"
TAKE CARE: this filter may degrade performance and make your index grow considerably, and may run Solr out of memory depending on the size of the fields you use it on (e.g. if you use it for content extraction). So choose wisely which field to use it on :)
Here is some useful information about it, with examples:
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-N-GramFilter
I use Solr 6.1 and have just finished indexing my documents.
For some reason I now need searches to be case insensitive.
I found that a copy field can make this work, but it needs an additional field type, like the one below:
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
Does anybody know whether I can apply this solution after the index has already been built?
Or is there another solution that would fix it?
No. You'll have to reindex the content (at least the field in question) to change the case of the generated tokens. You can do this either from the original source, or by writing a script that retrieves each document from Solr and re-indexes the single field, as long as all your fields are set as stored. If they're not stored (and do not have docValues that can be used in place of a stored value), you'll have to reindex from the original source. Solr has no way to get the original text back from the processed tokens.
Also remember that a KeywordTokenizer will keep the value as a single token and not split on whitespace etc.
Make sure you get the correct result before indexing by using the Analysis page under Solr's admin interface.
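For illustration, the schema side of the copy-field approach might look roughly like this. The field names `title` and `title_ci` are hypothetical, while `string_ci` is the field type from the question:

```xml
<!-- Original field, kept as-is (hypothetical name) -->
<field name="title" type="string" indexed="true" stored="true"/>
<!-- Lowercased single-token copy used for case-insensitive matching -->
<field name="title_ci" type="string_ci" indexed="true" stored="false"/>
<copyField source="title" dest="title_ci"/>
```

copyField is applied while documents are being indexed, so documents that are already in the index will not have a value in `title_ci` until they are indexed again.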
I have a file containing integers and strings delimited by pipes, like below:
abc|182|2rt|jd
yre|123|7yd|op
ifs|132|24d|oe
I have created a new field type, pipedelimited, as below:
<fieldType name="pipedelimited" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern="|"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
The problem is that when I search for an integer, the search takes too long to respond,
but if I search for a string the response comes back in milliseconds.
Please help me find the reason for this.
Both of your search examples are text as far as Solr is concerned, so they should be treated identically.
So either you left something out of your description of the situation, or there is something very unusual about particular records. Have you tried searching for string and "integer" values that are supposed to return the same record? Do you get the same speed? You should.
Try using the debug flag (debugQuery=true) and see what differences you notice.
Basically, side-by-side comparisons should make all other parameters as equal as possible and then focus on the visible differences.
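One thing worth checking in the field type from the question: PatternTokenizerFactory takes a regular expression, and an unescaped | is the alternation operator rather than a literal pipe, so the pattern as written is unlikely to split on the delimiter. Whether that explains the difference in speed is only a guess, but a version with the pipe escaped would look like this:

```xml
<fieldType name="pipedelimited" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- "\|" matches a literal pipe; a bare "|" is regex alternation between two empty patterns -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis page in the admin interface shows which tokens each version actually produces for a line such as abc|182|2rt|jd. Changing the tokenizer requires reindexing.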
I have been trying to enable fuzzy searching for our Solr 4.1-powered search, but all I can find online is:
1- how to do it in the default Lucene query syntax, which doesn't help in my case,
2- that dismax does not support it and
3- that edismax is going to or should support it
However, I can't find any documentation on how to use it in the edismax query format, not even on the default edismax page for query syntax, which uses the ~ operator for defining the slop factor instead. I did try specifying it in the qf parameter as some links online suggest, but that didn't work. I am also assuming here that Solr 4.1 uses edismax by default.
So if someone knows how it's supposed to work, or whether it's even supported, any pointers would be greatly appreciated.
This worked for me.
Try adding this fieldType to your schema.xml.
It is case insensitive and meant for Spanish words, but changing that should work for you too.
<fieldType name="text_es_fuzzy" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball" enablePositionIncrements="true"/>
</analyzer>
</fieldType>
After this, when you perform a query, just add "~0.5" at the end of the search string. The "0.5" is a custom value you can choose. This value determines how "fuzzy" your search is: the closer the value is to zero, the "fuzzier" the results will be, and vice versa.