In Azure Search, how do you run a "contains" search with multiple terms in searchtext? - azure

I am using Azure Search in full query mode on top of CosmosDB and I want to run a query for any documents with a field that contains the string "azy do". This should match, for example, a document containing "lazy dog".
Reading the Azure Search documentation, it looks like this is impossible due to the term-based indexes it uses.
Rejected solutions
0 matches since it is looking for whole words:"azy do"
Doesn't work since regexes are not allowed to span multiple terms:/.azy do./
This "works", to the extent that it will match "lazy dog", but this does not respect the ordering of the query and will also match "dog lazy", for example /.azy./ AND /.do./
Is there any way of doing this correctly in Azure Search?

If you cannot achieve that via regular expression in the Lucene Query syntax, then is not possible. You may want to vote for supporting contains here.

It should be /.lazy|dog./
So, split the terms based on whitespaces and add a pipe(|) delimiter which stands for OR.

Shortly, Azure Search is not designed to support this scenario. You might be better off using the CONTAINS function in Cosmos DB or its equivalent, depending on what query language you use.
Azure Search is designed for finding terms or phrases that occur in unstructured content (documents) and returning the most relevant documents. The process of extracting and indexing those searchable terms is customizable and described here: How full text search works in Azure Search.

Related

Azure Cognitive Search - When would you use different search and index analyzers?

I'm trying to understand what is the purpose of configuring a different analyzer for searching and indexing in Azure Search. See: https://learn.microsoft.com/en-us/rest/api/searchservice/create-index#-field-definitions-
According to my understanding, the job of the indexing analyzer is to breakup the input document into individual tokens. Through this process, it might apply multiple transformations like lower-casing the content, removing punctuation and white-spaces, and even removing entire words.
If the tokens are already processed, what is the use of the search analyzer?
Initially, I thought it would apply a similar process on the search query itself, but wouldn't setting a different analyzer than the one used to index the document at this stage completely breaks the search results? If the indexing analyzer lower-cased everything, but the search analyzer doesn't lower-case the query, wouldn't that means you'll never get matches for queries with upper case characters? What if the search analyzer doesn't split tokens on white-spaces? Won't you ever get a match the moment the query includes a space?
Assuming that this is indeed how the two analyzers works together, then why would you ever want to set two different ones?
Your understanding of the difference between index and search analyzer is correct. An example scenario where that's valuable is using ngrams for indexing but not for search terms. So this would allow a document with "cat" to produce "c", "ca", "cat" but you wouldn't necessarily want to apply ngrams on the search term as that would make the query less performant and isn't necessary since the documents already produced the ngrams. Hopefully that makes sense!

Azure Cognitive Search - how to prevent searching for plural form also returning singular matches

We have an Azure Cognitive Search index that we use for full text searches.
Currently, when the user searches for a plural word (e.g. Buildings), the singular forms are also being matched (building).
We want to restrict this behaviour so that only the plural matches are returned.
I've read through the odata documentation but cannot find any reference to how we could accomplish this either through parameters in the search.ismatch in the filter or in the index config.
Plural and singular forms are likely both matching because the field is configured with the default language analyzer, which performs stemming of terms. If you're looking for an exact match, you can use the 'eq' operator in a filter. If you want a case-insensitive (but otherwise exact) match, you can try normalizers (note that this feature is in preview at the time of this writing.)
If you need matching behaviour that is somewhat more sophisticated than a case-insensitive match, you should look into custom analyzers. They allow you to customize the behaviour of tokenization, as well as selectively use (or not use) stemming and other lexical analysis techniques.
To add onto Bruce's answer,
Custom normalizers support many of the same token and character filters as custom analyzers do. In order to decide which one best fits your needs, consider the following questions:
Will this plural/singular matching behaviour be used in filtering/sorting/faceting operations? If so, pre-configured or custom normalizers will enable you to refine what results are returned by your search filter. You can build your own or choose from the list of pre-configured ones, and it supports more than case insensitivity. See the list of supported char and token filters.
Will you need this plural/singular matching behaviour in full-text search, regardless of whether a filter is used? If so, consider using the custom analyzer Bruce suggested above.
Afaik, please note that normalizers will only affect filtering/sorting/faceting results. Also, normalizers are the only way to perform this "normalization" to filter/sort/facet queries. Setting an analyzer will not affect these types of queries.

Azure Cognitive Search: search query with two search profiles

Our search service uses Azure Cognitve Search in the following way:
Search non-fuzzy (i.e. with full match of query string).
Search fuzzy (i.e. it's allowed to change 1-2 letters in a query string)
Join results by certain rule.
This way we want to achieve that full match results will always be on the top.
But now we want to introduce a pagination. And to do it with two separate queries is a difficult and not effective task.
An alternative would be to somehow create a single query which will combine in itself both fuzzy and non-fuzzy search but with different scoring profiles, one with higher weights for full-match search and another with lower weights for fuzzy search.
Like
search=rabbit&scoringProfile=highWeightsProfile | seacrh=rabbit~&scoringProfile=lowWeightsProfile
Is there any way to do this, either in API or in SDK?
Is there any other alternative solutions to the problem of fuzzy search but with higher scores for full-match?
Boosting individual subqueries with Lucene query syntax worked for me as a good solution. Maybe not that flexible as separate search profiles for fuzzy and non-fuzzy parts, but still good.

How to create a partial search in Meteor that only returns permitted data

I'm trying to create a search feature in Meteor 1.8.1 that does the following:
returns partial matches, e.g. "fish" will find "fish", "fishcake" and "dogfish"
has server-side control of which documents are returned, so search results don't include documents that are not published to the user
is reasonably efficient
returns a limited number of results
This seems like it should be a common requirement, but I'm failing to find any solution.
MongoDB full text search will only return on whole words, so will only find "fish".
Easy search doesn't support server-side permissions, as far as I can tell.
I could try a regex solution but I think it would be expensive?
Thank you for any solutions!
Edit: From discussion it seems that Easy Search does support server-side filtering using a selector, and this would be the best solution. However, I can't get a selector working from the examples and documentation. For clarity, I've created a new question for that issue
The documentation explicitly states that for advanced use cases you may want to use elastic search and offers you a pluggable extension to ease the burden of integration.
https://matteodem.github.io/meteor-easy-search/docs/recipes/#advanced-search
You might wish that a search for cafe returns documents with the text café in them (special character). Or that your search string is split up by whitespace and those terms used to search across multiple fields.
You should consider using a search engine like ElasticSearch for your search if you have these usecases. ElasticSearch allows you to configure precisely how your fields are being searched. One way you can do that is by analyzing your data, so that searching itself is as fast as possible.

Tagging and Analysing a Search Query

I'm developing a search engine which functions taking the semantics of data into account, unlike the usual keyword based index. I managed to develop a reasonable index for the search using metadata extraction methods and RDF, but I have difficulty in using such methods on the search query itself since the search query is very much shorter that the actual data. any idea how to perform a successful tagging of a search query, using similar methods, natural language processing, etc. ?
Thank You!
Yes, the sample size of a typical query is too small for semantic analysis to be of any value.
One approach might be to constrain or expand your query using drop-down menus for things like "Named Entities" or "Subject Verb Object" tuples.
Another approach would be to expand simple keywords using rules created from your metadata so that, for example, a query for 'car' might be expanded to the tuple pattern
(*,[drive,operate,sell],[car,automobile,vehicle])
before submission.
Finally, you might try expanding the query with a non-semantically valuable prefix and/or suffix to get the query size large enough to trigger OpenCalais' recognizer.
Something like 'The user has specified the following terms in her query: one, two, three.'.
And once the results are returned, filter out all results that match only the added prefix/suffix.
Just a few quick thoughts.
You need to build semantic tree. It will based on the combination of keywords.
For example, automobile -->vehicle --> car this relation technical aspect of car. travel --
hire/rent-->vehicle-->car this is something related to travel and rent a car.
In this case MongoDB will help you a lot.

Resources