I am currently using Azure Search to bring back images stored in blob storage, based on filters that are passed in by the user. Below is my Azure Search query, which I thought should filter on all of the content specified in the tags field as an AND:
search=foreignId:d0c41422-acfa-4e4b-a9db-8c06b6860f3f, tags:SiteRef +\"TY0033\" + BlockRef + \"00\" + Disipline + \"FABRIC\"&searchMode=all&queryType=full
and what it brings back (which is wrong, as you can see from the BlockRef, though if I pass CN0001 it brings back the correct values):
"foreignId": "d0c41422-acfa-4e4b-a9db-8c06b6860f3f",
"description": "Health & Safety Eire - Site Photo - TY0033-01-
FABRIC-005",
"fileName": "TY0033-01-FABRIC-005",
"fileExtension": ".jpg",
"createdAt": "26/11/2018 02:00:24",
"tags": "[{\"TagName\":\"SiteRef\",\"Value\":\"TY0033\"},{\"TagName\":\"BlockRef\",\"Value\":\"01\"},{\"TagName\":\"Disipline\",\"Value\":\"FABRIC\"},{\"TagName\":\"PhotoNumber\",\"Value\":\"005\"}]",
"longitude": 0,
"latitude": 0
95% of the time this works perfectly, however the other 5% of the time the images come back incorrect, as Azure Search has returned the wrong details.
I have checked, and it seems to be because it is not respecting the pairing of the search terms. I am new to Azure Search, so I am wondering if I am doing this correctly?
Any help would be greatly appreciated
Index Definition:
Edit: Updated Post with index definition
In your query you check whether foreignId is equal to d0c41422-acfa-4e4b-a9db-8c06b6860f3f, whether the tags field contains SiteRef, and whether any searchable field contains TY0033, BlockRef, 00, Disipline and FABRIC. In your case all fields are searchable. Thus:
foreignId matches
tags contains SiteRef
TY0033, BlockRef, Disipline and FABRIC are in tags field
00 is in createdAt field, as standard Lucene analyzer tokenizes "26/11/2018 02:00:24" into 26,11,2018,02,00,24
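You can verify this tokenization with the Analyze API (the index name and api-version below are placeholders):
POST /indexes/yourindex/analyze?api-version=2016-09-01
{
"analyzer": "standard.lucene",
"text": "26/11/2018 02:00:24"
}
The response lists 26, 11, 2018, 02, 00 and 24 as separate tokens, which is why the bare 00 in your query can match the createdAt field.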
In order to search in the tags field you should rewrite your query as follows:
search=foreignId:d0c41422-acfa-4e4b-a9db-8c06b6860f3f AND tags:(SiteRef AND \"TY0033\" AND BlockRef AND \"00\" AND Disipline AND \"FABRIC\")&searchMode=all&queryType=full
It might be worthwhile to use proximity search to make sure you correlate occurrences of field/value pairs, e.g. BlockRef and 00: "BlockRef 00"~1
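For example, a complete request using proximity phrases could look roughly like this (the index name and api-version are placeholders, and the slop value of ~1 may need adjusting depending on how your tags string is tokenized):
POST /indexes/yourindex/docs/search?api-version=2016-09-01
{
"search": "foreignId:d0c41422-acfa-4e4b-a9db-8c06b6860f3f AND tags:(\"SiteRef TY0033\"~1 AND \"BlockRef 00\"~1 AND \"Disipline FABRIC\"~1)",
"queryType": "full",
"searchMode": "all"
}
This keeps each value next to its tag name, so a document whose BlockRef is 01 can no longer be matched by a 00 appearing elsewhere in the tags string or in another field.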
Related
I've created an Azure Search Suggester for the "full_name" index field in order to support autocomplete functionality. Now when I use the Azure autocomplete REST endpoint with the "search" parameter set to, let's say, "Lor", I only get back the result "Lorem", not "Lorem Ipsum". Is there any way to disable tokenization for the suggester and get back the full name, like "Lorem Ipsum", for the search term "Lor"?
The Autocomplete API is meant to suggest search terms based on incomplete terms one is typing into the search box (type-ahead). It supports three modes:
oneTerm – Only one term is suggested. If the query has two terms, only the last term is completed. For example: "washington medic" -> "medicaid", "medicare", "medicine"
twoTerms – Matching two-term phrases in the index will be suggested, for example: "medic" -> "medicare coverage", "medical assistant"
oneTermWithContext – Completes the last term in a query with two or more terms, where the last two terms are a phrase that exists in the index, for example: "washington medic" -> "washington medicaid", "washington medical"
The twoTerms mode might work for you. If you're looking for an API that suggests documents based on an incomplete query term, try the Suggestions API. It returns the entire contents of a field that has a Suggester enabled for all documents that matched the query.
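As a rough sketch, the two calls look like this (index name, suggester name, and api-versions are placeholders; the Autocomplete API requires a newer/preview api-version than the Suggestions API):
POST /indexes/yourindex/docs/autocomplete?api-version=2017-11-11-Preview
{
"search": "Lor",
"suggesterName": "sg",
"autocompleteMode": "twoTerms"
}
POST /indexes/yourindex/docs/suggest?api-version=2016-09-01
{
"search": "Lor",
"suggesterName": "sg",
"select": "full_name",
"top": 5
}
The suggest call returns the whole full_name value (e.g. "Lorem Ipsum") for each matching document, while autocomplete only completes query terms.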
How do you structure an Azure Search POST REST call to match a value in a comma-separated string?
For Example:
I want to search for "GWLAS" or "SAMGV" within the Azure field "ProductCategory".
The "ProductCategory" field in the documents will have a comma-separated value string such as "GWLAS, EXDEB, SAMGV, AMLKYC".
Any ideas?
If you use the default analyzer for your ProductCategory field (assuming it is searchable), it should word-break on commas by default. This means all you should need to do is search for the terms you're interested in and limit it to the right field:
POST /indexes/yourindex/docs/search?api-version=2016-09-01
{
"search": "GWLAS SAMGV",
"searchFields": [ "ProductCategory" ]
}
There are other ways to do this, but this is the simplest. If you already scope parts of your search query to other fields, here is how you can scope just the desired terms to ProductCategory:
POST /indexes/yourindex/docs/search?api-version=2016-09-01
{
"search": "(Name:\"Anderson John\"~3 OR Text:\"Anderson John\"~3) AND ProductCategory:GWLAS SAMGV",
"queryType": "full"
}
Please consult the Azure Search REST API documentation for details on other options you can set in the Search request, the article on how Azure Search handles search requests, and the reference for the full Lucene query syntax.
I'm working on a product search engine with a big set of undefined products which is constantly growing. Each product has different attributes and at this time they're saved in an array of string key-value pairs like this:
"attributes": [
{
"key": "Producttype",
"value": "Headphones - 3.5 mm plug"
},
{
"key": "Weight",
"value": "280 g"
},
{
"key": "Soundmode",
"value": "Stereo"
},
....
]
Each product also has a category. I'm using Elasticsearch 2.4.x to persist the data that I want to search on, via spring-data-elasticsearch. It's possible to upgrade to the newest Elasticsearch version if needed.
As you can see, the attributes are really generic. It's also necessary to use nested objects to be able to search on these attributes. I'm also thinking about preprocessing the attributes into a standardized format. For example, the "Weight" key might be written in different forms like "Productweight" or "Weight of product". Because there are a lot of attributes and I wouldn't like to create a custom property/field for each one, I thought about mapping only the important ones (like weight) to their own custom fields and mapping the other attributes as described above.
Now if someone searches for example "iphone", I would like to show some facets on the left of the search result page. The facets should differ if someone searches for "Adidas shoes". Is this possible with the given format above using nested objects? Is it possible to build the facets dynamically based on the result set Elasticsearch is returning? E.g. the most common properties that all result products contain should be used to create facets. Or do I have to persist some predefined filters/facets for each category? I think that would be too much work, and it also doesn't work for search results where products can have different categories. What's the best practice for building a search feature with faceting on entities with n different properties that can grow in the future?
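To make it concrete, something like the following nested aggregation is roughly what I have in mind (the index name "products", the "name" field I match on, and not_analyzed key/value fields are assumptions on my side):
POST /products/_search
{
  "query": { "match": { "name": "iphone" } },
  "size": 0,
  "aggs": {
    "attribute_facets": {
      "nested": { "path": "attributes" },
      "aggs": {
        "keys": {
          "terms": { "field": "attributes.key", "size": 20 },
          "aggs": {
            "values": { "terms": { "field": "attributes.value", "size": 10 } }
          }
        }
      }
    }
  }
}
The idea would be that the keys buckets tell me which attributes are common in the current result set and the values sub-buckets give the options for each facet.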
I have a question about a peculiar behavior I noticed in my custom analyzer (as well as in the fr.microsoft analyzer). The below Analyze API tests are shown using the “fr.microsoft” analyzer, but I saw the same exact behavior when I use my “text_contains_search_custom_analyzer” custom analyzer (which makes sense as I base it off the fr.microsoft analyzer).
UAT reported that when they search for “femme” (singular) they expect documents with “femmes” (plural) to also be found. But when I tested with the Analyze API, it appears that the Azure Search service only tokenizes plural -> plural + singular, but when tokenizing singular, only singular tokens are used. See below for examples.
Is there a way I can allow a user to search for the singular version of a word, but still include the plural version of that word in the search results? Or will I need to use synonyms to overcome this issue?
Request with “femme”
{
"analyzer": "fr.microsoft",
"text": "femme"
}
Response from “femme”
{
"#odata.context": "https://EXAMPLESEARCHINSTANCE.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
"tokens": [
{
"token": "femme",
"startOffset": 0,
"endOffset": 5,
"position": 0
}
]
}
Request with “femmes”
{
"analyzer": "fr.microsoft",
"text": "femmes"
}
Response from “femmes”
{
"#odata.context": "https://EXAMPLESEARCHINSTANCE.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
"tokens": [
{
"token": "femme",
"startOffset": 0,
"endOffset": 6,
"position": 0
},
{
"token": "femmes",
"startOffset": 0,
"endOffset": 6,
"position": 0
}
]
}
You are using the Analyze API, which uses text analyzers; that is not the same as searching using the Search API.
Text analyzers are what supports the search engine when building the indexes, which is really what sits at the bottom of a search engine. In order to build a search index, the documents that go into it need to be analyzed; this is where the analyzers come in. They are what understands different languages and can parse a text and make sense of it, i.e. splitting up words, removing stop words, understanding sentences and so on. Or as they put it in the docs: https://learn.microsoft.com/en-us/rest/api/searchservice/language-support
Searchable fields undergo analysis that most frequently involves word-breaking, text normalization, and filtering out terms. By default, searchable fields in Azure Search are analyzed with the Apache Lucene Standard analyzer (standard lucene) which breaks text into elements following the "Unicode Text Segmentation" rules. Additionally, the standard analyzer converts all characters to their lower case form.
So what you are seeing is actually perfectly correct; the French analyzer breaks down the word you send in and returns the possible tokens from the text. For the first text it cannot find any other possible tokens than 'femme' (I guess there are no other words like 'fem' or 'femm' in French?), but for the second one it can find both 'femme' and 'femmes' in there.
So, what you are seeing is a natural function of a text analyzer.
Searching for the same text using the Search API, on the other hand, should return documents with both 'femme' and 'femmes' in them, if you have set the right analyzer (for instance fr.microsoft) on the searchable field. The default 'standard' analyzer does not handle plurals and other inflections of the same word.
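For example, the relevant field in the index definition would look something like this (the field name is just an example):
{
"name": "description",
"type": "Edm.String",
"searchable": true,
"analyzer": "fr.microsoft"
}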
Just to add to yoape's response, the fr.microsoft analyzer reduces inflected words to their base form. In your case, the word femmes is reduced to its singular form femme. All cases that you described will work:
Searching with the base form of a word if an inflected form was in the document. Let's say you're indexing a document containing Vive and Femmes. The search engine will index the following terms: vif, vivre, vive, femme, femmes. If you search with any of these terms, e.g. femme, the document will match.
Searching with an inflected form of a word if the base form was in the document.
Let's say you're indexing a document with the text Femme fatale. The search engine will index the following terms: femme, fatal, fatale. If you search with the term femmes, the analyzer will also produce its base form. Your query will become femmes OR femme. Documents with any of these terms will match.
Searching with an inflected form if another inflected form of that word was in the document. If you have a document with allez, the terms allez and aller will be indexed. If you search for allé, the query becomes allé OR aller. Since both inflected forms are reduced to the same base form, the document will match.
The key learning here is that the analyzer processes not only the documents but also the query terms. Terms are normalized according to language-specific rules.
I hope that explains it.
I have a tags Solr collection with 100k records. It has a simple structure; an example document:
{
"id": "57301",
"name": "Roof repair"
}
The task is to automatically bind a tag list to any input text using the Solr search engine. Our current algorithm is:
First we send the whole text as a query to the tags collection, searching the whole text against the "name" field. We receive a big list of tags.
Then we send requests in a loop (over the tags received at step 1) to another collection, which contains the document with the input text (its id is known). Example query:
id:38373 AND _text_:"Roof repair". If this query returns any results, we add Roof repair to the matched tags.
Finally, we have a checked tag list for the given input text. The quality of this automatic tag binding is good (for our purposes, of course).
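To make the steps concrete, the two requests look roughly like this (collection names and the rows limit are just examples, and the query text would be URL-encoded in practice):
# Step 1: search the whole input text against the "name" field of the tags collection
GET /solr/tags/select?q=name:(<whole input text>)&fl=id,name&rows=10000
# Step 2: for each tag from step 1, check it against the known document
GET /solr/documents/select?q=id:38373 AND _text_:"Roof repair"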
But we have a performance problem: some texts produce 10k tags at step 1, and each tag is then checked at step 2 with a separate HTTP request to Solr. 10k requests is far too many. We could cap the number of tags to analyse, but the tag-linking quality becomes much worse.
Is there a way to match the Solr tag collection against a text without a separate request for each tag?
Please elaborate on your question. I didn't get the first part, and for the second one, how did this happen: id:38373 AND _text_:"Roof repair"?
"First we send the whole text as a query to the tags collection. We receive a big list of tags." – does that mean you are searching the whole text against the "name" field?