I have a field "brand" that is defined with Field-Type "org.apache.solr.schema.TextField" and with Index-Analyzer and Query-Analyzer "org.apache.solr.analysis.TokenizerChain".
I want to search for one text and only that text. For example, a search for "Lego" also returns the results for "Lego Mio", "Lego Plus" and "Lego Plus Sub", but I only want the results for "Lego". Likewise, when I search for "Lego Plus" I also find "Lego Plus Sub", but I only want "Lego Plus".
I tried this with some regular expressions, but it's not working.
Do you have some ideas?
Thanks in advance :)
Best, Rudolf
You can either use a StrField, which will only give you a result if there's an exact match, or a TextField with a KeywordTokenizer. The KeywordTokenizer keeps the input string as a single token, so in effect it only matches if there's an exact match.
The difference is that you can apply filters to the result from the KeywordTokenizer, such as a LowerCaseFilter, enabling you to match regardless of case.
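To make the difference concrete, here is a small Python sketch. It is only a toy model of the two analysis styles, not Solr code: `whitespace_tokens`, `keyword_token` and `search` are hypothetical helpers, and the match rule (every query token must appear in the document) is an assumption chosen to mirror the behavior described in the question.

```python
def whitespace_tokens(text):
    """Toy stand-in for a word-splitting analyzer: one lowercased token per word."""
    return [word.lower() for word in text.split()]


def keyword_token(text):
    """Toy stand-in for KeywordTokenizer + LowerCaseFilter: the whole value is one token."""
    return [text.strip().lower()]


def search(brands, query, analyze):
    """Return the brands whose tokens contain every token of the query."""
    query_tokens = set(analyze(query))
    return [brand for brand in brands if query_tokens <= set(analyze(brand))]


BRANDS = ["Lego", "Lego Mio", "Lego Plus", "Lego Plus Sub"]

# Word-level tokens: every brand containing the word "lego" matches.
word_hits = search(BRANDS, "lego", whitespace_tokens)

# Single-token analysis: only the exact value matches, regardless of case.
exact_hits = search(BRANDS, "lego", keyword_token)
exact_plus_hits = search(BRANDS, "Lego Plus", keyword_token)
```

With the single-token analysis the lowercase query "lego" still finds "Lego", which is the benefit of adding a lowercasing filter on top of the keyword tokenizer.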
I currently use Typesense to search in an HTML database. When I search for a term, I would like to retrieve N characters before and N characters after the term found in search.
For example, I search for "query" and this is the sentence that matches:
Let's repeat the query we made earlier with a group_by parameter
I would like to easily retrieve a fixed number of letters (or words) before and after the term, without breaking any words, so I can show it in the presumably small area where the search results are displayed.
For this particular example, I would be showing:
..repeat the query we made earlier..
Is there a feature like this in Typesense?
I have checked Typesense's documentation, without any luck.
The feature you're referring to is called snippets/highlights and it's enabled by default. You can control how many words are returned on either side of the matched text using the highlight_affix_num_tokens search parameter, documented under the table here: https://typesense.org/docs/0.23.1/api/search.html#results-parameters
highlight_affix_num_tokens
The number of tokens that should surround the highlighted text on each side. This controls the length of the snippet.
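As a rough sketch of how the parameter could be passed, the Python snippet below only builds the search request path. The collection name "pages" and the field "content" are made up for illustration, and `build_search_path` is a hypothetical helper; the API key header and the HTTP call itself are left out.

```python
from urllib.parse import urlencode


def build_search_path(collection, query, query_by, affix_tokens):
    """Build a Typesense search path that asks for `affix_tokens` tokens
    on each side of the highlighted match."""
    params = {
        "q": query,
        "query_by": query_by,
        "highlight_affix_num_tokens": affix_tokens,
    }
    return f"/collections/{collection}/documents/search?{urlencode(params)}"


# Hypothetical collection and field names, used only for illustration.
path = build_search_path("pages", "query", "content", 3)
```

The snippet then comes back in the highlight section of each hit, so no extra string handling should be needed on the client side.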
I am trying to search for a term in Solr in the Title that contains only the string 1604-04. But the results come back with anything that contains 1604 or 04. What would the syntax be to force solr to search on the exact string of 1604-04?
You can also use the Classic Tokenizer. The Classic Tokenizer preserves the same behavior as the Standard Tokenizer, with some exceptions, including this one:
Words are split at hyphens, unless there is a number in the word, in which case the token is not split and the numbers and hyphen(s) are preserved.
This means that if someone searches for 1604-04, this tokenizer won't break the search string into two tokens.
If you want exact matches only, use a string field or a text field with a KeywordTokenizer as the tokenizer. These will keep your tokens intact as one single entry, and won't break it up into multiple tokens.
The difference is that if you use a TextField with a KeywordTokenizer, you can still apply other filters, such as a LowerCaseFilter, while a string field will store anything verbatim without any further processing possible.
Your analyzer is splitting "1604-04" into two terms, "1604" and "04". You've received answers on how to change your analysis to stop doing that.
Changing your analysis may not be the best solution (I can't be entirely sure based on what you've written). Using a phrase query would be the usual way to do this. You can use a phrase query by wrapping the term in quotes:
field:"1604-04"
This will still analyze and split it into two terms, but it will look for those terms in sequence. So, that query would match "1604-04" and "1604 04", but not "1604 some other stuff 04".
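The matching rule described here can be sketched in a few lines of Python. This is a toy model, not Lucene: `analyze` is a hypothetical stand-in for an analyzer that splits on anything that is not a letter or digit, and `phrase_matches` checks that the phrase terms appear next to each other and in order.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[0-9a-z]+", text.lower())


def phrase_matches(document, phrase):
    """True if the phrase terms occur consecutively, in order, in the document."""
    doc_tokens = analyze(document)
    phrase_tokens = analyze(phrase)
    if not phrase_tokens:
        return False
    size = len(phrase_tokens)
    return any(
        doc_tokens[start:start + size] == phrase_tokens
        for start in range(len(doc_tokens) - size + 1)
    )


hyphen_hit = phrase_matches("Part 1604-04 rev B", "1604-04")
space_hit = phrase_matches("Part 1604 04 rev B", "1604-04")
gap_hit = phrase_matches("1604 some other stuff 04", "1604-04")
```

The first two documents match because both analyze to the adjacent terms "1604" and "04"; the third does not because other terms sit between them.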
I'm fairly new to Expression Engine and I feel this is a really simple question, I just can't find a straight-forward answer from the documentation.
I have a list of restaurants and an alphabetized menu (A B C D etc...)
I want to search only the listings that start with the letter "A".
In a traditional MySQL search that would be WHERE Title LIKE 'A%'
Any ideas?
I do not believe the Channel Entries module's search parameter allows LIKE matching.
You'll save time by grabbing the Low Alphabet module in this specific case for sure.
Expression Engine doesn't have an exact "LIKE" option but they do have something similar.
I can search a field to see if it "contains" a string but there isn't anything specifically to determine if it starts with or ends with a specific string (such as would be easily available in MySQL).
I ended up using the "contains" search parameter and then excluding, inside the exp:channel:entries loop, any results that didn't match my exact criteria.
I have stemming enabled in my Solr instance. I had assumed that performing an exact word search without disabling stemming would be as simple as putting the word in quotes. However, this does not appear to be the case.
Is there a simple way to achieve this?
There is a simple way, if what you're referring to is the "slop" (required similarity) as part of a fuzzy search (see the Lucene Query Syntax here).
For example, if I perform this search:
q=field_name:determine
I see results that contain "determine", "determining", "determined", etc. If I then modify the query like so:
q=field_name:determine~1
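I only see results that contain the word "determine". This is because I'm specifying a required similarity of 1, which means "exact match". I can specify this value anywhere from 0 to 1. (This is the older Lucene fuzzy syntax; since Lucene 4 the number after ~ is a maximum edit distance of 0, 1 or 2.)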
Another thing you can do is index the same text without stemming in one field, and with stemming in another. Boost the non-stemmed field & that should prefer exact versions of words to stemmed versions. Of course you could also write your own query parser that directs quoted phrases to the non-stemmed field only.
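A minimal sketch of the two-field idea, assuming the eDisMax query parser and two made-up field names: "text_exact" (indexed without stemming) and "text_stemmed" (indexed with stemming). `build_boosted_params` is a hypothetical helper, and the boost value is arbitrary.

```python
from urllib.parse import urlencode


def build_boosted_params(user_query, exact_field="text_exact",
                         stemmed_field="text_stemmed", exact_boost=2):
    """Search both fields, weighting the non-stemmed field higher.

    The field names and boost value are assumptions for illustration only.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": f"{exact_field}^{exact_boost} {stemmed_field}",
    }


params = build_boosted_params("determine")
query_string = urlencode(params)
```

Documents that contain the literal word match in both fields and should therefore tend to score above documents that only match through the stemmed field.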
I use Solr's proximity search quite often to search for words within a specified range of each other, like so
"Government Spending" ~2
I was wondering whether there is a way to perform a proximity search using a phrase and a word, or two phrases. Is this possible? If so, what is the syntax?
This appears to be "somewhat" doable. Consider this text:
This is more about traffic between Solr servers themselves
These queries match it:
"more traffic between solr" ~2
"more about between solr" ~2
Even if you change the order it works:
"more about solr between" ~2
But too far apart and it stops working:
"more about servers themselves" ~2
I think if that doesn't work, it would probably not be TOO hard to make a custom request handler that does this. I think you might need to define a new syntax, perhaps something like ("phrase one" "phrase two") ~2. I would guess that if you are shingling, and you create a Lucene query where there is a token of just "phrase one" and another of "phrase two" that have a certain proximity, it will work. (Of course you will need to actually make the Lucene Java call; you can't just hand the query over. Read this: http://lucene.apache.org/java/2_2_0/api/index.html)
Out of the box, I have discovered a way to perform a Solr proximity search using more than one word, or phrases; see below.
E.g. with 3 words:
"(word1) (word2) (word3)"~10
E.g. with 2 phrases (note that the inner double quotes need to be escaped):
"(\"phrase1\") (\"phrase2\")"~10
Since Solr 4 it is possible with SurroundQueryParser.
E.g. to query where "phrase two" follows "phrase one" not further than 3 words after:
3W(phrase W one, phrase W two)
To query "phrase two" in proximity of 5 words of "phrase one":
5N(phrase W one, phrase W two)
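If these query strings are assembled in code, two small helpers keep the quoting straight. This is only a sketch of string building for the two syntaxes shown in the answers above: `quoted_proximity` and `surround_proximity` are hypothetical names, and whether a given Solr version and query parser accepts the result still has to be checked against a real index.

```python
def quoted_proximity(terms, distance):
    """Build the nested-quote form, e.g. "(word1) (word2)"~10.

    Multi-word terms are wrapped in escaped double quotes.
    """
    groups = []
    for term in terms:
        if " " in term:
            groups.append('(\\"' + term + '\\")')
        else:
            groups.append("(" + term + ")")
    return '"' + " ".join(groups) + '"~' + str(distance)


def surround_proximity(first, second, distance, ordered=True):
    """Build the Surround form: W keeps the order, N ignores it."""
    def as_surround_phrase(phrase):
        # Adjacent words of a phrase are joined with the W operator.
        return " W ".join(phrase.split())

    operator = "W" if ordered else "N"
    return (f"{distance}{operator}("
            f"{as_surround_phrase(first)}, {as_surround_phrase(second)})")


three_words = quoted_proximity(["word1", "word2", "word3"], 10)
two_phrases = quoted_proximity(["phrase one", "phrase two"], 10)
ordered_query = surround_proximity("phrase one", "phrase two", 3)
unordered_query = surround_proximity("phrase one", "phrase two", 5, ordered=False)
```

The Surround helper reproduces the two example queries from the last answer, and the quoted helper reproduces the three-word example from the answer before it.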