Multiple analyzers for a single field in a search index of Azure Cognitive Search


We need two types of search, partial and exact, chosen based on user input, for a few of our fields. To support this, each field needs two different analyzers.
The problem is that I'm not able to configure two analyzers for a single field. The only option I see is to create two separate indexes and query the appropriate one based on the user input, but that is clearly not the right solution: it is not scalable, the data is mostly redundant, and it takes almost double the space.
I'm trying to create a duplicate field in the same index with a different analyzer and use whichever field matches the user input, but I'm not sure how to configure that in the index. The field name is what is used to search at query time. Is it possible to have two fields with different names that point to the same source field but use different analyzers?

Yes, you can have two fields with different names that are populated from the same source field but use different analyzers. This is done with field mappings in the indexer definition.
I created an index with two new fields, cont01 and cont02.
Both fields are populated from the field merged_content, and each is assigned a different analyzer.
In the indexer definition, I configured a field mapping from merged_content to each of the two fields.
I then ran the indexer, and both fields were populated as expected.
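The setup described above can be sketched as plain Python dictionaries shaped like the REST payloads for an index and an indexer. This is only an illustration: the index, indexer, and data source names are hypothetical, and the two analyzer names (standard.lucene and keyword) are placeholders for whichever analyzers give you the partial and exact behaviour you need. Only the field names cont01, cont02, and merged_content come from the answer.

```python
import json

# Index definition: two searchable fields that will hold the same content
# but are analyzed differently. Names other than cont01/cont02 are hypothetical.
index_definition = {
    "name": "my-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "cont01", "type": "Edm.String", "searchable": True,
         "analyzer": "standard.lucene"},  # placeholder analyzer
        {"name": "cont02", "type": "Edm.String", "searchable": True,
         "analyzer": "keyword"},          # placeholder analyzer
    ],
}

# Indexer definition: one source field mapped to both target fields.
indexer_definition = {
    "name": "my-indexer",              # hypothetical
    "dataSourceName": "my-datasource", # hypothetical
    "targetIndexName": index_definition["name"],
    "fieldMappings": [
        {"sourceFieldName": "merged_content", "targetFieldName": "cont01"},
        {"sourceFieldName": "merged_content", "targetFieldName": "cont02"},
    ],
}


def build_query(text: str, mode: str) -> dict:
    """Pick the field to search based on the user's chosen mode."""
    field = {"partial": "cont01", "exact": "cont02"}[mode]
    return {"search": text, "searchFields": field}


print(json.dumps(indexer_definition["fieldMappings"], indent=2))
print(build_query("contoso", "exact"))
```

At query time the application chooses the field through searchFields, so a single index serves both search modes.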

Related

Solr default search field for multiple fields which has different analyzers

I have a document which has title, stockCode, category fields.
I have different field types (and analysis chains) for each. For instance, title uses EdgeNGram from 2 to 20, category uses EdgeNGram from 3 to 10, and stockCode just has a lowercase filter.
I don't want to search for the keyword "sample" by building a query like title:sample OR stockCode:sample OR category:sample.
I'd like to search with just "q=sample".
I copied my fields to a text field, but that does not work because all the copied fields are then analyzed the same way. I don't want to index stockCode with EdgeNGram or any other filters. I'd like to index my fields as I configured them and search a keyword across all of them, using each field's own analysis.
I've been researching this for three days, and Solr's documentation is a bit poor.

You can use the edismax handler, as this will allow you to give a list of fields to query and supply the query by itself. You can also give separate weights to each field for scoring them differently.
defType=edismax&q=sample&qf=title^10 stockCode category
This will search for sample in each of the three fields, giving a 10x boost to any hits in the title field.
You can find the documentation about the edismax query parser under Searching in the reference guide.
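As a rough sketch, the same request can be assembled in Python with the standard library. The Solr host and core name in the URL are hypothetical; the defType, q, and qf parameters are the ones from the answer.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; replace host and core name with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def build_edismax_url(keyword: str, field_boosts: dict) -> str:
    """Build an edismax query URL that searches several fields at once.

    field_boosts maps a field name to its boost; a boost of 1 is left implicit.
    """
    qf = " ".join(
        f"{field}^{boost}" if boost != 1 else field
        for field, boost in field_boosts.items()
    )
    params = {"defType": "edismax", "q": keyword, "qf": qf}
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = build_edismax_url("sample", {"title": 10, "stockCode": 1, "category": 1})
print(url)
```

Because each field keeps its own field type, the keyword is analyzed per field at query time, which is what the copy-to-text approach could not provide.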

Azure search - how to implement multiple facet search?

For example, if we have a category facet that returns five different categories, clicking the first category means the other categories are no longer available in the response. I want to implement multi-select facet search.
I'd appreciate your response.
For more info, I am referring to the same scenario as described here:
https://feedback.azure.com/forums/263029-azure-search/suggestions/7762452-provide-multiselect-facets

The facets in the response are limited to the selected value, and multi-select faceting is not supported. I'd suggest voting for it here: https://feedback.azure.com/forums/263029-azure-search/suggestions/7762452-provide-multiselect-facets
A workaround is to send multiple queries to get facets and filtered results separately.
For example:
1. Keep all facets in the UI (or make another query to get all facets) after the first search query.
2. Make another search query after another facet is selected, provided that the application tracks which facets the user has selected.

If you want to filter results by multiple facet values, you can modify your filter as below:
$filter = search.in(country, 'USA,Canada,Mexico,Brasil,Chile,Argentina', ',')
The first parameter to the search.in function is the string field reference (or a range variable over a string collection field in the case where search.in is used inside an any or all expression). The second parameter is a string containing the list of values, separated by spaces and/or commas. If you need to use separators other than spaces and commas because your values include those characters, you can specify an optional third parameter to search.in.
This third parameter is a string where each character of the string, or subset of this string is treated as a separator when parsing the list of values in the second parameter.
For more information about OData expression syntax for filters and order-by clauses in Azure Search, please refer to this tutorial.
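A small helper can build such a filter from a list of selected values. This is a minimal sketch: the function name is made up for illustration, and it only produces the filter string described above (single quotes inside values are doubled, as OData string literals require).

```python
def search_in_filter(field: str, values: list, delimiter: str = ",") -> str:
    """Build a search.in(...) filter for one field and several allowed values."""
    for value in values:
        if delimiter in value:
            # A value containing the delimiter would be split incorrectly.
            raise ValueError(f"value {value!r} contains delimiter {delimiter!r}")
    joined = delimiter.join(values).replace("'", "''")
    return f"search.in({field}, '{joined}', '{delimiter}')"


# Default comma delimiter.
print(search_in_filter("country", ["USA", "Canada", "Mexico"]))

# Values that contain commas need a different delimiter (third parameter).
print(search_in_filter("city", ["Portland, OR", "Austin, TX"], delimiter="|"))
```

The second call shows why the optional third parameter exists: the values themselves contain commas and spaces, so a pipe is used as the only separator.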

I've recently run into this limitation, and my workaround was to run a separate query for each facet, as suggested by @rudin above.
Let's say for example that your application has facets for Colour, Brand and Size. Your primary search query includes all three filters but doesn't return any facets. You then run an additional query ignoring any selected Colours, which will give you all available colour values for the chosen brands and sizes, and you do the same for the brand and size facets.
For the additional queries it's important to set the 'Size' property to 0 so no search results are returned - just the relevant facet.
By doing this and running these queries asynchronously the performance overhead is minimal in my case with 6 facets.
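The query plan described in this answer can be sketched as follows. The function names are hypothetical, and the dictionaries only mirror the shape of a search request body (search, filter, facets, top); here top set to 0 plays the role of the 'Size' property mentioned above. Sending the requests, and running them concurrently, is left out.

```python
def build_filter(selections: dict, skip: str = None) -> str:
    """Combine the selected facet values into one filter, optionally
    ignoring the selections made for one facet."""
    clauses = []
    for field, values in selections.items():
        if field == skip or not values:
            continue
        joined = "|".join(values)
        clauses.append(f"search.in({field}, '{joined}', '|')")
    return " and ".join(clauses)


def plan_queries(search_text: str, selections: dict) -> dict:
    """Return one results query plus one facet-only query per facet."""
    plan = {"results": {"search": search_text,
                        "filter": build_filter(selections)}}
    for facet in selections:
        plan[facet] = {
            "search": search_text,
            # Ignore this facet's own selections so all its values stay visible.
            "filter": build_filter(selections, skip=facet),
            "facets": [facet],
            "top": 0,  # no documents needed, only the facet counts
        }
    return plan


selected = {"Colour": ["Red", "Blue"], "Brand": ["Acme"], "Size": []}
for name, body in plan_queries("shoes", selected).items():
    print(name, body)
```

With three facets this yields four requests: one for the result list and one per facet.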

The implication of @search.score in Azure Search Service

I understand the reason for having a scoring profile and boosting results based on fields such as distance or rating. To me, that is most applicable to structured documents such as JSON files. The scenario I cannot make sense of is when an indexer has the search service index, say, an MS Word or PDF document in Azure Blob storage. We get two fields, "id" and "content", and I don't know how the search score applies to them.
For example, there are two documents with different contents. I searched for a keyword that appears in both documents, and the two MS Word documents came back with different scores. Why should the scores differ when both documents contain the same keyword?

The score is determined by many factors, for example, the count of terms in each document, and the number of searchable fields in which query terms were found. In your example, the documents have different lengths, so naturally they'll have different scores. HTH.
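To illustrate the effect of document length, here is a sketch of a textbook BM25-style term score. It is not presented as the service's exact formula; the parameter values and the document statistics are assumptions chosen only to show that the same keyword, occurring once, scores differently in a short and a long document.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of a single term in a single document."""
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Longer-than-average documents get a larger denominator, so a lower score.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Two documents, each containing the keyword once (tf = 1).
short_doc = bm25_term_score(tf=1, doc_len=100, avg_doc_len=550,
                            n_docs=2, docs_with_term=2)
long_doc = bm25_term_score(tf=1, doc_len=1000, avg_doc_len=550,
                           n_docs=2, docs_with_term=2)

print(f"short document: {short_doc:.4f}")
print(f"long document:  {long_doc:.4f}")
```

A single hit in a short document is treated as stronger evidence of relevance than a single hit in a long one, so the short document scores higher.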

Matching "fuzzy" data based on several inputs

I have a search and matching problem:
Inputs
In my database, I have thousands of names, in addition to some other matching characteristics: a few columns of numerical data, and a few columns of other text that helps identify this specific company.
A prospective client has about 500 company names, and then sparsely populated additional characteristics as mentioned above for each of the names.
Current Process
In the past, the process has been manual: try to match each name given by the client by searching through the database, find a name "like" the one reported to me, and then verify that the additional characteristics match up. The main issue is that the reported names are often not identical to mine: they can contain abbreviations or only parts of the name stored in my database, and the additional characteristics may be incomplete or only partially matching as well.
Automation
I want to automate this process since it happens frequently. The optimal solution would input one company from the client list along with any of the additional characteristics they filled in for it, and then try to find the top 5 matches in my database.
I've never used Lucene or Sphinx, but they seem to be more document driven. Is there a way to format these inputs so those libraries work for this problem, or instead, what other software tools exist that would work?

To Lucene, a 'document' can easily be a row in a table and I think you will like the fuzzy~ search and search hit scoring capabilities.
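To give a feel for fuzzy matching with ranked results, here is a small sketch that uses Python's standard difflib module as a stand-in. It is not Lucene and does not use Lucene's scoring; the company names and the cutoff value are made up for illustration.

```python
import difflib


def top_matches(query: str, names: list, n: int = 5, cutoff: float = 0.4) -> list:
    """Return up to n stored names most similar to the query, best first."""
    # Compare case-insensitively, then map back to the stored spelling.
    by_lowercase = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(query.lower(), list(by_lowercase),
                                     n=n, cutoff=cutoff)
    return [by_lowercase[hit] for hit in hits]


# Hypothetical rows from the database.
database_names = [
    "Acme Corporation",
    "Acme Holdings Ltd",
    "Globex Inc",
    "Initech LLC",
]

print(top_matches("Acme Corp", database_names))
```

The candidate list could then be narrowed by checking the additional characteristics, much as in the manual process, before a match is accepted.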

How can I delete records from a table that have certain criteria

Rookie question I know.
I have a table with about 10 fields, one of the fields is a category field. I need this field to exist because of the multiple types of categories. However, one category in this field is wrong and is duplicating results.
So can I delete all records in the table that have "Type320" in the CatDescription field, and how? I want to keep everything else as it is in this table; I just need to get rid of the records that have that value in that one field.
Thanks very much!
EDIT: Thanks for the answer, I did not know how to do this so this is very helpful
However, this is more complicated than I thought. The raw data that I am supplied carries these duplicate records (only duplicate in certain circumstances but they are easy to isolate). This raw data is given to me on a monthly basis in several spreadsheet forms.
It all relates to these ID numbers, and there are about 10 fields (xls columns). As I said before, one of these is the Category Description field (sorry, this is not a lookup). In certain places a record automatically duplicates itself on output because, in the database this comes from, it has to have this subcategory for one particular "type".
So every time there is a duplication, every bit of information in all fields is exactly the same, with the exception of CatDescription (one is Type320, and the duplicated record is Type321). However, there are some instances where Type321 is valid on its own (in which case there is no matching data row with a Type320 CatDescription). By matching I mean all data in all fields of a particular record.
The clear rule is this: if all fields of a record with a Type320 CatDescription match all fields of a record with a Type321 CatDescription, then I can delete the record containing the Type321 CatDescription. This holds because it is the only situation where this duplication occurs; normally not all of the fields would match.
This allows all unique records with Type320 and Type321 data (that do not match exactly) to stay, just as they should. This makes sense to me (and hopefully to you too :/), but can it be done, and how?
Thanks, because this is way over my head. I would rather know how to do it in Access, but an xls solution would be equally appreciated. Heck, I would do it in ppt if it would get the job done! :)

I would try one of these two queries:
DELETE FROM table WHERE CatDescription LIKE '%Type320%';
DELETE FROM table WHERE CatDescription LIKE '*Type320*';
That is because the Access database engine could be using * (ANSI-89 Query Mode, e.g. DAO) instead of % (ANSI-92 Query Mode, e.g. OLE DB/ADO) for the wildcards.
Alternatively, this works regardless of ANSI Query Mode:
DELETE FROM table WHERE CatDescription ALIKE '%Type320%';
Note the Access database engine's ALIKE keyword is not officially supported.
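For the follow-up case in the question's edit (delete a Type321 record only when an otherwise identical Type320 record exists), the logic can be sketched with a correlated subquery. The example below uses Python's built-in sqlite3 module rather than Access, and the table name and the IdNumber and Amount columns are hypothetical stand-ins for the real fields; in practice every field other than CatDescription would be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (IdNumber TEXT, Amount REAL, CatDescription TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("A1", 10.0, "Type320"),
        ("A1", 10.0, "Type321"),  # duplicate of the Type320 row: delete
        ("B2", 5.0, "Type321"),   # no Type320 twin: valid on its own, keep
        ("C3", 7.5, "Type100"),   # unrelated category: keep
    ],
)

# Delete a Type321 row only if a Type320 row matches it on every other field.
conn.execute(
    """
    DELETE FROM records
    WHERE CatDescription = 'Type321'
      AND EXISTS (
          SELECT 1 FROM records AS twin
          WHERE twin.CatDescription = 'Type320'
            AND twin.IdNumber = records.IdNumber
            AND twin.Amount = records.Amount
      )
    """
)

remaining = conn.execute(
    "SELECT IdNumber, CatDescription FROM records "
    "ORDER BY IdNumber, CatDescription"
).fetchall()
print(remaining)
```

Note that fields containing NULL never compare as equal with =, so rows with empty fields would need extra handling.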

Does the CatDescription field look up another table? Is it a query of those tables that creates what you call duplicate results?
If so, be careful about blaming the table that has CatDescription. Check the look-up table to see if Type320 is found there in duplicate.
If you don't have the problem isolated correctly, then you're likely to delete good records while not fixing the problem.
