How can I easily get search context around search term with Typesense? - search

I currently use Typesense to search in an HTML database. When I search for a term, I would like to retrieve N characters before and N characters after the term found in search.
For example, I search for "query" and this is the sentence that matches:
Let's repeat the query we made earlier with a group_by parameter
I would like to easily retrieve a fixed number of letters (or words) before and after the term, without breaking any words, so I can show it in the presumably small area where the search results are displayed.
For this particular example, I would be showing:
..repeat the query we made earlier..
Is there a feature like this in Typesense?
I have checked Typesense's documents, without any luck.

The feature you're referring to is called snippets/highlights and it's enabled by default. You can control how many words are returned on either side of the matched text using the highlight_affix_num_tokens search parameter, documented under the table here: https://typesense.org/docs/0.23.1/api/search.html#results-parameters
highlight_affix_num_tokens
The number of tokens that should surround the highlighted text on each side. This controls the length of the snippet.
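As a rough sketch of how this could look from Python, the snippet below builds a search URL with highlight_affix_num_tokens set and pulls the snippets out of a response. The host, the collection name "pages" and the field name "content" are assumptions for illustration, and sample_response is a hand-written stand-in shaped like a search response, not real server output. An actual request would also need the X-TYPESENSE-API-KEY header.

```python
from urllib.parse import urlencode


def build_search_url(base_url, collection, query, query_by, affix_tokens=4):
    """Build a Typesense search URL that asks for short snippets."""
    params = {
        "q": query,
        "query_by": query_by,
        # Number of tokens kept on each side of the highlighted match.
        "highlight_affix_num_tokens": affix_tokens,
    }
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


def extract_snippets(response):
    """Collect the snippet strings from a search response dictionary."""
    snippets = []
    for hit in response.get("hits", []):
        for highlight in hit.get("highlights", []):
            if "snippet" in highlight:
                snippets.append(highlight["snippet"])
    return snippets


# Hypothetical host, collection and field names.
url = build_search_url("http://localhost:8108", "pages", "query", "content", affix_tokens=2)

# Hand-written example of the response shape, for illustration only.
sample_response = {
    "hits": [
        {
            "highlights": [
                {"field": "content", "snippet": "repeat the <mark>query</mark> we made"}
            ]
        }
    ]
}

print(url)
print(extract_snippets(sample_response))
```

The matched term comes back wrapped in mark tags inside the snippet, so the ".." decoration on either side would be added by the client when rendering.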

Related

Using ArangoSearch LIKE to search for a string with space

I created an ArangoSearch view over a collection and am using the SEARCH keyword with LIKE and wildcards to search a field containing spaces, similar to how I would in MySQL. The problem I am running into is that I keep getting an empty set even though records with the Star Wars title definitely exist.
Note, searching for '%star%' works and returns results... as soon as I add a space and search for '%star wars%' the query returns empty set.
This is the query
FOR d IN imdb_norm
SEARCH ANALYZER(d.name LIKE "%Star Wars%", "text_en")
RETURN d.name
This is the structure, running arango version 3.7.2
The thing is, since you're using text_en, it breaks strings up into individual words. Because all spaces are treated as break characters, there is not a single term (a word stored in the ArangoSearch index) that contains whitespace.
If you don't need tokenization, please consider indexing the value as it is (i.e. without case conversion, accent removal, etc.) using the identity analyzer, or check out the norm analyzer instead.
https://www.arangodb.com/docs/stable/arangosearch-analyzers.html#analyzer-types
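To illustrate why the pattern with a space finds nothing, here is a small Python model of the behavior described above. It is not ArangoDB code: text_terms is a simplified stand-in for text_en (the real analyzer also stems and removes accents), identity_terms stands in for the identity analyzer, and the sample title is made up.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern (% and _ wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def text_terms(value):
    """Simplified stand-in for text_en: lowercase and split into words."""
    return [t for t in re.split(r"\W+", value.lower()) if t]


def identity_terms(value):
    """Stand-in for the identity analyzer: the whole value is one term."""
    return [value]


def like_matches(terms, pattern):
    """LIKE is evaluated against each indexed term, never across terms."""
    rx = like_to_regex(pattern)
    return any(rx.fullmatch(t) for t in terms)


title = "Star Wars: Episode IV"  # made-up sample value

print(like_matches(text_terms(title), "%star%"))           # single word
print(like_matches(text_terms(title), "%star wars%"))      # spans two terms
print(like_matches(identity_terms(title), "%Star Wars%"))  # whole string as one term
```

In this model the second check fails because no individual term contains a space, which is the situation the answer describes.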

Required number of characters in azure search

I've made an azure search service and it's up and working. I would like for users to be able to search with 3 characters or more.
I have the following texts in different documents:
Paracet 200mg
Paracet 150mg
Kodein/paracetamol SA
When I search for 'par' I get no results. I have to type 5 characters (parac) and I get 1 & 2 as a result. I want this result for 'par' as well. Is this possible? I can't find anything in the documentation on setting the required number of characters for a search.
For the best performance, you could enable the "fast" prefix analyzer in your index, which will break down every token into a list of prefixes at indexing time. Here's some additional information on how to do that : https://azure.microsoft.com/en-us/blog/custom-analyzers-in-azure-search/
This would require you to re-index your data, so if you are creating a brand new index, this is an option.
If re-indexing is not an option, you can instead use the suffix operation '*' in your query. Here's more information on the suffix operator : https://learn.microsoft.com/en-us/rest/api/searchservice/Simple-query-syntax-in-Azure-Search?redirectedfrom=MSDN
I suspect searching using the suffix operator (or re-indexing while using the fast prefix analyzer) will also work with the 3rd document you listed (Kodein/paracetamol SA). If it still does not work, it might be because you are using a tokenizer that does not split on the '/' character. The default analyzer should correctly split on '/', but if you are using a custom analyzer it's possible that the whole "Kodein/paracetamol" expression gets tokenized into a single term, which would explain why a search for parace* does not return the document, since the prefix of the term is "kode…".
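The idea of breaking every token into prefixes at indexing time can be sketched in plain Python. This is a toy model, not Azure Search code: the function names and the min_len and split_pattern parameters are hypothetical, and the three documents are the ones from the question.

```python
import re


def edge_ngrams(token, min_len=3):
    """Return every prefix of the token with at least min_len characters."""
    return [token[:n] for n in range(min_len, len(token) + 1)]


def build_prefix_index(documents, min_len=3, split_pattern=r"[^0-9a-z]+"):
    """Map each prefix to the set of document ids containing such a token."""
    index = {}
    for doc_id, text in documents.items():
        for token in re.split(split_pattern, text.lower()):
            for prefix in edge_ngrams(token, min_len):
                index.setdefault(prefix, set()).add(doc_id)
    return index


documents = {
    1: "Paracet 200mg",
    2: "Paracet 150mg",
    3: "Kodein/paracetamol SA",
}

# Tokenizer that splits on '/' as well as whitespace.
index = build_prefix_index(documents)
print(sorted(index["par"]))

# Tokenizer that only splits on whitespace: "kodein/paracetamol" stays one term.
index_ws = build_prefix_index(documents, split_pattern=r"\s+")
print(sorted(index_ws["par"]))
```

With the whitespace-only tokenizer, document 3 is indexed under prefixes of "kodein/paracetamol" only, so a lookup for "par" does not reach it.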

Solr exact search with a hyphen

I am trying to search for a term in Solr in the Title that contains only the string 1604-04. But the results come back with anything that contains 1604 or 04. What would the syntax be to force solr to search on the exact string of 1604-04?
You can also use the Classic Tokenizer. The Classic Tokenizer preserves the same behavior as the Standard Tokenizer, with some exceptions, including:
Words are split at hyphens, unless there is a number in the word, in which case the token is not split and the numbers and hyphen(s) are preserved.
This means that if someone searches for 1604-04, this tokenizer won't break the search string into two tokens.
If you want exact matches only, use a string field or a text field with a KeywordTokenizer as the tokenizer. These will keep your tokens intact as one single entry, and won't break it up into multiple tokens.
The difference is that if you use a Textfield with a KeywordTokenizer, you can still apply other filters, such as a LowercaseFilter, while a string field will store anything verbatim without any further processing possible.
Your analyzer is splitting "1604-04" into two terms, "1604" and "04". You've already received answers on how to change your analysis to stop it from doing that.
Changing your analysis may not be the best solution (I can't be entirely sure based on what you've written). Using a phrase query would be the usual way to do this. You can use a phrase query by wrapping the term in quotes:
field:"1604-04"
This will still analyze and split it into two terms, but it will look for those terms in sequence. So, that query would match "1604-04" and "1604 04", but not "1604 some other stuff 04".
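The phrase behavior described above can be modeled in a few lines of Python. This is only a sketch of the matching logic, not Solr code: analyze is a simplified tokenizer that splits on anything that is not a letter or digit, and phrase_match requires the query terms to appear next to each other in order.

```python
import re


def analyze(text):
    """Simplified analyzer: lowercase and keep runs of letters and digits."""
    return re.findall(r"[0-9A-Za-z]+", text.lower())


def phrase_match(doc_text, phrase):
    """True if the analyzed phrase terms occur consecutively in the document."""
    doc_terms = analyze(doc_text)
    query_terms = analyze(phrase)
    n = len(query_terms)
    if n == 0:
        return False
    return any(doc_terms[i:i + n] == query_terms for i in range(len(doc_terms) - n + 1))


print(analyze("1604-04"))
print(phrase_match("Report 1604-04", "1604-04"))
print(phrase_match("Report 1604 04", "1604-04"))
print(phrase_match("1604 some other stuff 04", "1604-04"))
```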

FTSearch that looks for '-'

does anybody know if there is a possibility to search for '-' using FTSearch?
Set col = db.ftsearch({ [services] = "-"}, 0)
That request does not work and instead returns:
Notes error: Full text error; see log for more information ([services] = "-")
Short answer is no.
The full text search treats most symbol characters as a white space. The exception is if the search term itself is wrapped in quotes.
The FT search engine also uses 3-grams for searching. This means that a search term of fewer than 3 characters will not return the results you expect. White space would be taken into account in that search, but only in the context of the found text.
For example: "ce " would find "space " but not "space." or "space" or "spaced".
If you are looking for the field that only contains "-", then a better solution is to create a view with a column containing that field value, and/or filter by that field being that value.
It looks like you are trying to do a full text search in a view. If you are working with a view, you would probably get better response times and less server impact by using @Formula language.
I try to keep away from doing full text searches on the entire database. You can run the search on a view's collection for faster results. There is no restriction on how many views you can have in a database, although everything has a cost. There are many little tricks that can be used to get better results. Please give us more details on what you are trying to do.

Solr title search failing

I am indexing the title field for few products in Solr.
But when I am searching, I am not getting those titles in response.
For eg. I am storing following as title : Baboons Typing Tshirt
But when I am searching following I am not getting any result !!!
1)title:Baboons
2)title:(Baboons Typing Tshirt)
3)title:(Baboons*)
On the other hand, if I search like this, I get a lot of results:
1)title:(Tshirt)
I have indexed many titles containing word Tshirt but I want to search a specific title which is failing..!!
I don't know whether Solr is ignoring the first words or doing something random.
My question is basically: if I have a search title with lots of words, I would like to match it with the title that contains the most terms in common.
How to do it?
Thanks
Solr works like that by itself. You don't have to change anything.
You have to be careful how you set up your fields in schema.xml, i.e. how analysis is done.
You can use Solr's admin > Analysis interface to see how exactly your title field (when indexing) and query (when searching) is processed (tokenized, transformed).
Remember that a match only occurs when the term is identical (case and everything) on both sides (index and query).
To open your index and see how Solr has actually indexed your data, use Luke.
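As a toy illustration of the point about identical terms on both sides, the Python sketch below indexes the title with lowercasing and then looks a term up with and without lowercasing the query. It is not a diagnosis of the schema in the question, and the function names are hypothetical. The Analysis screen shows what the real field type does.

```python
def index_terms(text, lowercase):
    """Split on whitespace and optionally lowercase, as an index analyzer might."""
    terms = text.split()
    return {t.lower() for t in terms} if lowercase else set(terms)


def term_matches(indexed_terms, query_term, lowercase_query):
    """A term query only matches when the processed term is identical."""
    term = query_term.lower() if lowercase_query else query_term
    return term in indexed_terms


indexed = index_terms("Baboons Typing Tshirt", lowercase=True)

# Query side does not lowercase: "Baboons" is not equal to "baboons".
print(term_matches(indexed, "Baboons", lowercase_query=False))

# Query side applies the same processing as the index side.
print(term_matches(indexed, "Baboons", lowercase_query=True))
```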
