Azure Search, return non relevant results - azure

Here are 2 search examples,
you can see the first search for "yael0079"
return an object where the email filed is yael0079#gmail.com as top score.
The second search for "yael0079#gmail.com"
return the object from before somewhere far below.
Now, I know the '#' tag consider as space, but still, I would expect the same object will get the higher score.

In the second case, since the # sign is considered punctuation, your query becomes yael0079 OR gmail.com. The term gmail.com matches also in other fields of the documents returned what adds to the overall relevance score. To learn more about query processing and scoring in Azure Search, please read: How full text search works in Azure Search.

Related

How can I easily get search context around search term with Typesense?

I currently use Typesense to search in an HTML database. When I search for a term, I would like to retrieve N characters before and N characters after the term found in search.
For example, I search for "query" and this is the sentence that matches:
Let's repeat the query we made earlier with a group_by parameter
I would like to easy retrieve a fixed number of letters (or words) before and after the term to show it in a presumably small area where the search results is retrieved, without breaking any words.
For this particular example, I would be showing:
..repeat the query we made earlier..
Is there a feature like this in Typesense?
I have checked Typesense's documents, without any luck.
The feature you're referring to is called snippets/highlights and it's enabled by default. You can control how many words are returned on either side of the matched text using the highlight_affix_num_tokens search parameter, documented under the table here: https://typesense.org/docs/0.23.1/api/search.html#results-parameters
highlight_affix_num_tokens
The number of tokens that should surround the highlighted text on each side. This controls the length of the snippet.

How do you construct an Azure Search query to return a wildcard search based solely on a specific field?

If I may have missed this in some other area of SO please redirect me but I don't think this is a duped question.
I am using Azure Search with an index field called Title which is searchable and filterable using a Standard Lucerne Analyzer.
When using the built-in explorer, if I want to return all Job Titles that are explicitly named Full Stack Developer I can achieve it this way:
$filter=Title eq 'Full Stack Developer'&$count=true
But if I want to retrieve all the Job Titles using a wildcard to return all records having Full Stack in the name this way:
$filter=Title eq 'Full Stack*'&$count=true
The first 20 or so records returned are spot on, but after that point I get a mix of records that have absolutely nothing in common with the Title I specified in the query. My initial assumption was that perhaps Azure was including my specified Title performing an inclusive keyword search on the text as well.
Though I found a few instances where that hypothesis seemed to prove out, so many more of the records returned invalidated that altogether.
Maybe I don't understand fully the mechanics under the hood of Azure Search and so though my query appears to be valid; my expectation of the result is way off.
So how should my query look to perform a wildcard resulting to guarantee the words specified in the search to be included in the Titles returned, if this should be possible? And what would be the correct syntax to condition the return to accommodate for OR operators to be inclusive?
Azure Cognitive Search allows you to perform wildcard searches limited to specific fields only. To do so, you will need to specify the name of the fields in which you want to perform the search in searchFields parameter.
Your search URL in this case would be something like:
https://accountname.search.windows.net/indexes/indexname/docs?api-version=2020-06-30&searchFields=Title&search=Full Stack*
From the link here:

Required number of characters in azure search

I've made an azure search service and it's up and working. I would like for users to be able to search with 3 characters or more.
I have the following texts in different documents:
Paracet 200mg
Paracet 150mg
Kodein/paracetamol SA
When I search for 'par' I get no results. I have to type 5 characters (parac) and I get 1 & 2 as a result. I want this result for 'par' as well. Is this possible? I can't find anything in the documentation on setting the required number of characters for a search.
For the best performance, you could enable the "fast" prefix analyzer in your index, which will break down every token into a list of prefixes at indexing time. Here's some additional information on how to do that : https://azure.microsoft.com/en-us/blog/custom-analyzers-in-azure-search/
This would require you to re-index your data, so if you are creating a brand new index, this is an option.
If re-indexing is not an option, you can instead use the suffix operation '*' in your query. Here's more information on the suffix operator : https://learn.microsoft.com/en-us/rest/api/searchservice/Simple-query-syntax-in-Azure-Search?redirectedfrom=MSDN
I suspect searching using the suffix operator (or re-indexing while using fast prefix analyzer) will also work with the 3rd document you listed (Kodein/paracetamol SA). If it still does not work, it might be due to you using a tokenizer that does not split on the '/' character. The default analyzer should correctly split on '/', but if you are using a custom analyzer it's possible the whole "Kodein/paracetamol" expression get tokenized into a single term, which would explain why a search for parace* does not return the document, since the prefix of the document is "kodeā€¦".

Implementing a location search in ElasticSearch

I have a problem with location queries returning erroneous results in ElasticSearch.
In our system, a business search engine, every search takes two inputs: a location, and a query-string, e.g.
q=sushi
location=Greenwich Village, New York, New York
I want the search to show me sushi in Greenwich Village first, then sushi outside of Greenwich Village, but to never show me non-sushi results.
The problem is, because of the location query, anything in Greenwich Village gets matched -- lawyers, doctors, whatever. I'd like say the following to ElasticSearch:
If q matches, then location doesn't have to (it's OK to return sushi outside of Greenwich Village), but if location matches, don't return it unless q matches also (not OK to return non-sushi businesses in Greenwich Village).
Anyone have any thoughts on how to do this?
It sounds like you want to search for "sushi" (you don't want non-sushi results) but sort your results by location (you want Greenwich Village results first).
If you are storing locations as geo points, you can simply use distance to sort your results.
If location is just a field, and you can only know if the business is inside or outside of a location, you can use Custom Filters Score query to boost relevancy of the results in the desired location. The query part should contain the search for "sushi" and the filters part should contain the search for location.
I incorporated the information on this post and here to to come up with the following solution.
Index every 'place' (neighborhood, city, etc) with a center-point, and also index the coordinates of every business.
Index the place ids attached to the businesses that contain them.
Use a sub-search to convert the text entered into the location bar to a place record.
Use a CustomScoreQuery to modify every result's score by the following formula, which was worked out by trial and error:
new_score = old_score / (1 + distance_between_place_centerpoint_and_result)^3
Also query the place id that results from 3 against the place_ids field as a 'should' boolean query. This gives a flat boost to everything that actually falls within the confines of the specified place.
A side effect of this strategy is that businesses near the center point of the place are considered more relevant -- it is arguable, in my opinion, whether this is correct or not. But other than that it has worked quite well.
Thanks to imitov for his insight that helped me come up with this solution.

Solr title search failing

I am indexing the title field for few products in Solr.
But when I am searching, I am not getting those titles in response.
For eg. I am storing following as title : Baboons Typing Tshirt
But when I am searching following I am not getting any result !!!
1)title:Baboons
2)title:(Baboons Typing Tshirt)
3)title:(Baboons*)
On the otherhand, if I am searching like this, I am getting lot of results
1)title:(Tshirt)
I have indexed many titles containing word Tshirt but I want to search a specific title which is failing..!!
I dont know whether Solr is ignoring first words, or it is doing something random.
My Question is basically: If I have a search title with lots of words, I will like to match it with the title which contains maximum common terms.
How to do it?
Thanks
Solr works like that by itself. You don't have to change anything.
You have to be careful how you set up your fields in schema.xml, i.e. how analysis is done.
You can use Solr's admin > Analysis interface to see how exactly your title field (when indexing) and query (when searching) is processed (tokenized, transformed).
Remember, match, in order to occur, requires identical word (case and everything) on both sides (index & query).
To open your index and see how Solr has actually indexed your data, use Luke.

Resources