How to match this query in Azure Search - azure

I have this index:
{
"name": "testentities",
"fields": [
{
"name": "id",
"type": "Edm.String",
"key": true,
"retrievable": true,
"filterable": true,
"sortable": true
},
{
"name": "entity_id",
"type": "Edm.String",
"searchable": true,
"sortable": true,
"facetable": false,
"retrievable": true,
"filterable": true,
"searchAnalyzer":"standard",
"indexAnalyzer": "custom_analyzer"
},
{
"name": "description",
"type": "Edm.String",
"searchable": true,
"sortable": false,
"facetable": false,
"retrievable": true,
"filterable": true
},
{
"name": "name",
"type": "Edm.String",
"searchable": true,
"sortable": true,
"facetable": false,
"retrievable": true,
"filterable": true
},
{
"name": "entity_type",
"type": "Edm.String",
"searchable": true,
"sortable": true,
"facetable": true,
"retrievable": true,
"filterable": true
},
{
"name": "ancestors",
"type": "Collection(Edm.String)",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": true,
"filterable": true
},
{
"name": "calendar_id",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "currency",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "timezone",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "active",
"type": "Edm.Boolean",
"retrievable": true,
"facetable": true,
"filterable": true
},
{
"name": "kpi_collection",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "rid",
"type": "Edm.String"
}
],
"scoringProfiles": [
{
"name": "boostEntity",
"text": {
"weights": {
"entity_id": 9,
"name": 8,
"description": 1
}
}
}
],
"analyzers": [
{
"name": "custom_analyzer",
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"token1",
"tokenFilters": [
"lowercase",
"entityID_stopWords",
"entityID_edgeNGram"
]
}
],
"tokenizers":[
{
"name":"token1",
"@odata.type":"#Microsoft.Azure.Search.StandardTokenizerV2"
}
],
"tokenFilters": [
{
"name": "entityID_edgeNGram",
"@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"minGram": 1,
"maxGram": 6
},
{
"name": "entityID_stopWords",
"@odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
"stopwords": [
"store",
"region",
"zone",
"field_org",
":"
]
}
]
}
If I execute this query:
{
"search": "0001",
"filter": "entity_type eq 'store' ",
"select":"name,entity_id,entity_type,description,active,ancestors",
"count": "true"
}
I get this result, which is correct, because it matches on name, the field with the highest weight after entity_id:
"@odata.count": 1,
"value": [
{
"@search.score": 1.6654625,
"name": "LensCrafters 0001",
"entity_id": "store:1",
"entity_type": "store",
"description": "2130 Mall Road, Florence, 41042, KY, US",
"active": true,
"ancestors": [
"region:1021",
"zone:1123",
"field_org:lenscrafters_na",
"ROOT"
]
}
]
}
But if I run this query:
{
"search": "1",
"filter": "entity_type eq 'store' ",
"select":"name,entity_id,entity_type,description,active,ancestors",
"count": "true"
}
I get this result, which is not what I expect:
{
"@search.score": 1.4522386,
"name": "LensCrafters 1622",
"entity_id": "store:1622",
"entity_type": "store",
"description": "31625 Pacific Hwy S, Spc #E-1, Federal Way, 98003-5645, WA, US",
"active": true,
"ancestors": [
"region:1024",
"zone:1107",
"field_org:lenscrafters_na",
"ROOT"
]
},
{
"@search.score": 1.3403159,
"name": "LensCrafters 1178",
"entity_id": "store:1178",
"entity_type": "store",
"description": "1 W FlatIron Crossing Dr #1104, Broomfield, 80021-8881, CO, US",
"active": true,
"ancestors": [
"region:1019",
"zone:1122",
"field_org:lenscrafters_na",
"ROOT"
]
},
{
...............
Why is the result not the following, even though entity_id has a weight of 9 in the scoring profile?
"@odata.count": 1,
"value": [
{
"@search.score": 1.6654625,
"name": "LensCrafters 0001",
"entity_id": "store:1",
"entity_type": "store",
"description": "2130 Mall Road, Florence, 41042, KY, US",
"active": true,
"ancestors": [
"region:1021",
"zone:1123",
"field_org:lenscrafters_na",
"ROOT"
]
}
]
}
Here is the scoring profile:
"scoringProfiles": [
{
"name": "boostEntity",
"text": {
"weights": {
"entity_id": 9,
"name": 8,
"description": 1
}
},
"functions": [],
"functionAggregation": null
}
],.............

You are using a custom analyzer on the entity_id field that produces the following tokens for the text store:1178: 1, 11, 117, 1178 (you can test your analyzer configuration with the Analyze API). This means that the documents LensCrafters 1622 and LensCrafters 1178 match the query just as the document LensCrafters 0001 does: they all have the token 1 in entity_id. However, LensCrafters 1622 and LensCrafters 1178 also match 1 in description, so they get a higher score than LensCrafters 0001.
To learn more about query processing and custom analyzers in Azure Search, please read: How full text search works in Azure Search.
Do you want to keep the edgeNGram token filter in your analysis chain? Why?
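The token expansion described in this answer can be reproduced with a rough simulation. This is a minimal sketch, not Azure Search's implementation: the regex tokenizer is a simplified stand-in for the standard tokenizer, and the function names are hypothetical. The stopword list and the minGram/maxGram values are taken from the index definition in the question.

```python
import re

# Values copied from the index definition in the question.
STOPWORDS = {"store", "region", "zone", "field_org", ":"}
MIN_GRAM, MAX_GRAM = 1, 6


def simple_tokens(text):
    """Simplified stand-in for the standard tokenizer plus lowercase filter."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def entity_id_index_tokens(text):
    """Approximate custom_analyzer: drop stopwords, then emit edge n-grams."""
    tokens = []
    for word in simple_tokens(text):
        if word in STOPWORDS:
            continue
        longest = min(MAX_GRAM, len(word))
        tokens.extend(word[:n] for n in range(MIN_GRAM, longest + 1))
    return tokens


# entity_id -> description, taken from the results shown in the question.
docs = {
    "store:1": "2130 Mall Road, Florence, 41042, KY, US",
    "store:1622": "31625 Pacific Hwy S, Spc #E-1, Federal Way, 98003-5645, WA, US",
    "store:1178": "1 W FlatIron Crossing Dr #1104, Broomfield, 80021-8881, CO, US",
}

for entity_id, description in docs.items():
    in_entity_id = "1" in entity_id_index_tokens(entity_id)
    in_description = "1" in simple_tokens(description)
    print(entity_id, entity_id_index_tokens(entity_id), in_entity_id, in_description)
```

In this simulation all three entity_id values contain the token 1, while only the descriptions of store:1622 and store:1178 do, which is consistent with the ranking explained in the answer.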

Related

Pimcore: New product class not visible in e-commerce product list

Goal
Data objects of my data object class Product should be visible in the e-commerce Pimcore site.
Current Setup
Current Demo and Blue Print Application for Pimcore
I created a new data object class called Product. The parent PHP class is set to \App\Model\Product\AbstractProduct (the complete class definition export is attached below).
Created a new data object based on the Product class.
Result
The new product is not visible in the shop. No error is shown either.
What I also tried
Based on the Index Service documentation I manually updated the index, without any effect.
$ php bin/console ecommerce:indexservice:bootstrap --update-index
Processing 1 Product in segments of 50, batches of 50, 1 round, 1 batch in 1 process
1/1 [============================] 100% < 1 sec/< 1 sec 48.5 MiB
Processed 1 Product.
Complete class definition export:
{
"id": "PROD",
"description": "",
"modificationDate": 1669880184,
"parentClass": "\\App\\Model\\Product\\AbstractProduct",
"implementsInterfaces": "",
"listingParentClass": "",
"useTraits": "",
"listingUseTraits": "",
"allowInherit": true,
"allowVariants": true,
"showVariants": true,
"layoutDefinitions": {
"name": "pimcore_root",
"type": null,
"region": null,
"title": null,
"width": 0,
"height": 0,
"collapsible": false,
"collapsed": false,
"bodyStyle": null,
"datatype": "layout",
"permissions": null,
"children": [
{
"name": "Layout",
"type": null,
"region": null,
"title": "",
"width": "",
"height": "",
"collapsible": false,
"collapsed": false,
"bodyStyle": "",
"datatype": "layout",
"permissions": null,
"children": [
{
"name": "Base Data",
"type": null,
"region": null,
"title": "Base Data",
"width": "",
"height": "",
"collapsible": false,
"collapsed": false,
"bodyStyle": "",
"datatype": "layout",
"permissions": null,
"children": [
{
"name": "productName",
"title": "Product Name",
"tooltip": "",
"mandatory": true,
"noteditable": false,
"index": true,
"locked": false,
"style": "",
"permissions": null,
"datatype": "data",
"fieldtype": "input",
"relationType": false,
"invisible": false,
"visibleGridView": false,
"visibleSearch": false,
"width": "",
"defaultValue": null,
"columnLength": 190,
"regex": "",
"regexFlags": [],
"unique": true,
"showCharCount": false,
"defaultValueGenerator": ""
},
{
"name": "localizedfields",
"title": "",
"tooltip": "",
"mandatory": false,
"noteditable": false,
"index": null,
"locked": false,
"style": "",
"permissions": null,
"datatype": "data",
"fieldtype": "localizedfields",
"relationType": false,
"invisible": false,
"visibleGridView": true,
"visibleSearch": true,
"children": [
{
"name": "description",
"title": "Description",
"tooltip": "",
"mandatory": false,
"noteditable": false,
"index": false,
"locked": false,
"style": "",
"permissions": null,
"datatype": "data",
"fieldtype": "textarea",
"relationType": false,
"invisible": false,
"visibleGridView": false,
"visibleSearch": false,
"width": "",
"height": "",
"maxLength": null,
"showCharCount": false,
"excludeFromSearchIndex": false
},
{
"name": "packaging",
"title": "Packaging",
"tooltip": "",
"mandatory": false,
"noteditable": false,
"index": false,
"locked": false,
"style": "",
"permissions": null,
"datatype": "data",
"fieldtype": "input",
"relationType": false,
"invisible": false,
"visibleGridView": false,
"visibleSearch": false,
"width": "",
"defaultValue": null,
"columnLength": 190,
"regex": "",
"regexFlags": [],
"unique": false,
"showCharCount": false,
"defaultValueGenerator": ""
}
],
"region": null,
"layout": null,
"width": "",
"height": "",
"maxTabs": null,
"border": false,
"provideSplitView": false,
"tabPosition": null,
"hideLabelsWhenTabsReached": null,
"fieldDefinitionsCache": null,
"permissionView": null,
"permissionEdit": null,
"labelWidth": 0,
"labelAlign": "left"
},
{
"name": "image",
"title": "Image",
"tooltip": "",
"mandatory": false,
"noteditable": false,
"index": false,
"locked": false,
"style": "",
"permissions": null,
"datatype": "data",
"fieldtype": "image",
"relationType": false,
"invisible": false,
"visibleGridView": false,
"visibleSearch": false,
"width": "",
"height": "",
"uploadPath": ""
},
{
"name": "group",
"title": "Group",
"tooltip": "",
"mandatory": false,
"noteditable": false,
"index": false,
"locked": false,
"style": "",
"permissions": null,
"datatype": "data",
"fieldtype": "manyToOneRelation",
"relationType": true,
"invisible": false,
"visibleGridView": false,
"visibleSearch": false,
"classes": [
{
"classes": "ProductGroup"
}
],
"pathFormatterClass": "",
"width": "",
"assetUploadPath": "",
"objectsAllowed": true,
"assetsAllowed": false,
"assetTypes": [],
"documentsAllowed": false,
"documentTypes": []
},
{
"name": "categories",
"title": "Categories",
"tooltip": "",
"mandatory": false,
"noteditable": false,
"index": false,
"locked": false,
"style": "",
"permissions": null,
"datatype": "data",
"fieldtype": "manyToManyObjectRelation",
"relationType": true,
"invisible": false,
"visibleGridView": false,
"visibleSearch": false,
"classes": [
{
"classes": "Category"
}
],
"pathFormatterClass": "",
"width": "",
"height": "",
"maxItems": null,
"visibleFields": "id,fullpath,name",
"allowToCreateNewObject": false,
"optimizedAdminLoading": false,
"enableTextSelection": false,
"visibleFieldDefinitions": []
}
],
"locked": false,
"fieldtype": "panel",
"layout": null,
"border": false,
"icon": "",
"labelWidth": 0,
"labelAlign": "left"
},
{
"name": "Attributes",
"type": null,
"region": null,
"title": "Attributes",
"width": "",
"height": "",
"collapsible": false,
"collapsed": false,
"bodyStyle": "",
"datatype": "layout",
"permissions": null,
"children": [
{
"name": "attributes",
"title": "Attributes",
"tooltip": "",
"mandatory": false,
"noteditable": false,
"index": false,
"locked": false,
"style": "",
"permissions": null,
"datatype": "data",
"fieldtype": "objectbricks",
"relationType": false,
"invisible": false,
"visibleGridView": false,
"visibleSearch": false,
"allowedTypes": [
"EdgebandingAttributes"
],
"maxItems": null,
"border": false
},
{
"name": "saleInformation",
"title": "Sale Information",
"tooltip": "",
"mandatory": false,
"noteditable": false,
"index": false,
"locked": false,
"style": "",
"permissions": null,
"datatype": "data",
"fieldtype": "objectbricks",
"relationType": false,
"invisible": false,
"visibleGridView": false,
"visibleSearch": false,
"allowedTypes": [
"SaleInformation"
],
"maxItems": null,
"border": false
}
],
"locked": false,
"fieldtype": "panel",
"layout": null,
"border": false,
"icon": "",
"labelWidth": 0,
"labelAlign": "left"
}
],
"locked": false,
"fieldtype": "tabpanel",
"border": false,
"tabPosition": null
}
],
"locked": false,
"fieldtype": "panel",
"layout": null,
"border": false,
"icon": null,
"labelWidth": 100,
"labelAlign": "left"
},
"icon": "",
"previewUrl": "",
"group": "Product Data",
"showAppLoggerTab": false,
"linkGeneratorReference": "",
"previewGeneratorReference": "",
"compositeIndices": [],
"generateTypeDeclarations": true,
"showFieldLookup": false,
"propertyVisibility": {
"grid": {
"id": true,
"key": false,
"path": true,
"published": true,
"modificationDate": true,
"creationDate": true
},
"search": {
"id": true,
"key": false,
"path": true,
"published": true,
"modificationDate": true,
"creationDate": true
}
},
"enableGridLocking": false
}
I finally got it to work (my previous answer was supposed to be a comment, my bad).
Did you check the following?
1. Class override
https://pimcore.com/docs/pimcore/current/Development_Documentation/Extending_Pimcore/Overriding_Models.html
In /config/ecommerce/base-ecommerce.yaml:
pimcore:
    models:
        class_overrides:
            Pimcore\Model\DataObject\YourClass: App\Model\Product\YourClass
Make sure you clear the cache as described in the documentation:
./bin/console cache:clear --no-warmup && ./bin/console pimcore:cache:clear
2. Check all Car class names in Model / Controller
For example:
src/controller/productController.php
src/Model/Adminstyle/Car --> to your Class
src/Model/Car --> to your Class
3. Make sure that saving one of your products puts it in the index
I saw that when saving the object in the backend, I got a log entry saying that the object was not indexed.
https://pimcore.com/docs/pimcore/current/Development_Documentation/E-Commerce_Framework/Index_Service/Product_Index_Configuration/Data_Architecture_and_Indexing_Process.html
I had some other issues that were definitely not meant to be fixed the way I fixed them, so see how far you get with this.

Adding analyzers to Azure Search index using REST API not saving

I'm having trouble getting the analyzers to save or update on the index. When creating the index, everything else (the tokenFilters, tokenizers, fields) saves fine, but the analyzers array is always empty.
await client.createOrUpdateIndex(index, { allowIndexDowntime: true });
Creating a new index:
let index = {
name: "test-index",
tokenizers: [{
"odatatype": "#Microsoft.Azure.Search.StandardTokenizerV2",
"name": "test_standard_v2",
"maxTokenLength": 255
}],
fields: [{
"name": "metadata_storage_path",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": true,
"retrievable": true,
"searchable": false,
"sortable": false,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}, {
'name': 'metadata_storage_name',
'type': 'Edm.String',
'facetable': false,
'filterable': false,
'key': false,
'retrievable': true,
'searchable': true,
'sortable': false,
'synonymMaps': [],
'fields': [],
},
{
"name": "partialName",
"type": "Edm.String",
"retrievable": false,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"key": false,
"searchAnalyzer": "standardCmAnalyzer",
"indexAnalyzer": "filename_analyzer"
}],
tokenFilters: [{
"name": "nGramCmTokenFilter",
"odatatype": "#Microsoft.Azure.Search.NGramTokenFilterV2",
"minGram": 3,
"maxGram": 20
}],
analyzers: [{
"name": "standardCmAnalyzer",
"odatatype": "#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer": "test_standard_v2",
"tokenFilters": ["lowercase", "asciifolding"]
},
{
"name": "filename_analyzer",
"odatatype": "#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer": "test_standard_v2",
"tokenFilters": [
"nGramCmTokenFilter"
]
}],
};
Then creating it:
await client.createOrUpdateIndex(index, { allowIndexDowntime: true });
I noticed no error messages being returned.
EDIT:
Using the SDK @azure/search-documents ^11.1.0

Azure Cognitive Search: How to index json custom metadata of a blob

I have a blob with a custom metadata property of jsonmd.
The custom metadata looks something like:
{
"ResourceName": "ipso factum...",
"ResourceVariations": [{
"Description": "ipso factum...",
"Name": "R4.mp4",
"Thumbnail": "R4.jpg",
"URL": ""
},
...
I was able to capture the full JSON by including a field in the index:
{
"name": "jsonmd",
"type": "Edm.String",
"facetable": true,
"filterable": true,
...
I want to capture the Thumbnail property and have added this field to the index:
{
"name": "Thumbnail",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": "standard.lucene",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
I can't figure out how to use the custom metadata (jsonmd) to populate the Thumbnail field of the index.
You can define complex types in your index schema. Below is an example of how I collect metadata from PDF documents that we index. I extract the properties from the PDF using regular C# code, populate a Dictionary and then submit the objects using the Azure Cognitive Search SDK.
For more examples, see Model complex data types in Azure Cognitive Search.
{
"name": "Metadata",
"type": "Edm.ComplexType",
"analyzer": null,
"synonymMaps": [],
"fields": [
{
"name": "Properties",
"type": "Collection(Edm.ComplexType)",
"analyzer": null,
"synonymMaps": [],
"fields": [
{
"name": "Name",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": false,
"analyzer": "pattern",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "Values",
"type": "Collection(Edm.String)",
"facetable": true,
"filterable": true,
"retrievable": true,
"searchable": true,
"analyzer": "pattern",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
]
}
]
}
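The extraction step described in this answer (done there in C#) could look roughly like the following in Python. This is only a sketch using the standard library: the function name build_index_document, the id key, and the choice to take the first variation's thumbnail are assumptions, and uploading the resulting document to the index is deliberately left out.

```python
import json


def build_index_document(doc_id, jsonmd):
    """Parse the jsonmd metadata string and pull out a Thumbnail value.

    Hypothetical helper: takes the thumbnail of the first resource
    variation, or None if there are no variations.
    """
    metadata = json.loads(jsonmd)
    variations = metadata.get("ResourceVariations") or []
    thumbnail = variations[0].get("Thumbnail") if variations else None
    return {
        "id": doc_id,        # assumed key field name
        "jsonmd": jsonmd,    # full JSON string, as already indexed
        "Thumbnail": thumbnail,
    }


# Sample metadata shaped like the snippet in the question (values are made up).
sample = json.dumps({
    "ResourceName": "example resource",
    "ResourceVariations": [
        {"Description": "example", "Name": "R4.mp4", "Thumbnail": "R4.jpg", "URL": ""}
    ],
})

document = build_index_document("1", sample)
print(document["Thumbnail"])
```

If every variation's thumbnail is needed rather than just the first, the Thumbnail field would have to be a Collection(Edm.String) or part of a complex collection like the one shown in the answer.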

How to search on complex fields in Azure Cognitive Search

Consider the following model, where Address has a nested City property:
{
"HotelId": "1",
"HotelName": "Secret Point Motel",
"Description": "Ideally located on the main commercial artery of the city in the heart of New York.",
"Tags": ["Free wifi", "on-site parking", "indoor pool", "continental breakfast"],
"Address": {
"StreetAddress": "677 5th Ave",
"City": "New York",
"StateProvince": "NY"
},
"Rooms": [
{
"Description": "Budget Room, 1 Queen Bed (Cityside)",
"RoomNumber": 1105,
"BaseRate": 96.99
},
{
"Description": "Deluxe Room, 2 Double Beds (City View)",
"Type": "Deluxe Room",
"BaseRate": 150.99
}
. . .
]
}
The model is indexed in Azure Cognitive Search as follows, where Address is defined as Edm.ComplexType:
{
"name": "hotels",
"fields": [
{ "name": "HotelId", "type": "Edm.String", "key": true, "filterable": true },
{ "name": "HotelName", "type": "Edm.String", "searchable": true, "filterable": false },
{ "name": "Description", "type": "Edm.String", "searchable": true, "analyzer": "en.lucene" },
{ "name": "Address", "type": "Edm.ComplexType",
"fields": [
{ "name": "StreetAddress", "type": "Edm.String", "filterable": false, "sortable": false, "facetable": false, "searchable": true },
{ "name": "City", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true },
{ "name": "StateProvince", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true }
]
},
{ "name": "Rooms", "type": "Collection(Edm.ComplexType)",
"fields": [
{ "name": "Description", "type": "Edm.String", "searchable": true, "analyzer": "en.lucene" },
{ "name": "Type", "type": "Edm.String", "searchable": true },
{ "name": "BaseRate", "type": "Edm.Double", "filterable": true, "facetable": true }
]
}
]
}
Now I am trying to find documents where City equals New York using the following queries, but none of them work:
city eq 'new york' // returns no results
address/city eq 'new york' // returns the error: The property 'address/city' does not exist
address.city eq 'new york' // returns the error: The property 'address.city' does not exist
So how do I search on an Edm.ComplexType field in Azure Cognitive Search?
N.B.: I am using the Azure .NET SDK (10.1.0)
The correct approach is to put the OData expression in the $filter clause. If you were using the REST API, your $filter clause would be:
Address/City eq 'New York'
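Your code is failing because the actual field path is Address/City, whereas you are specifying address/city; field names are case-sensitive. String comparisons in filters are case-sensitive as well, so the value should be 'New York' rather than 'new york'. Once you use the proper field names, your code should work just fine.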
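The filter string itself can be assembled with a small helper. The following is a minimal sketch with hypothetical function names; it only builds the OData expression (using / to reach the sub-field and doubling single quotes inside string literals) and does not call any SDK.

```python
def odata_string_literal(value):
    """Quote a string for an OData filter; single quotes are escaped by doubling."""
    return "'" + value.replace("'", "''") + "'"


def eq_filter(field_path, value):
    """Build '<field path> eq <literal>', using '/' to reach sub-fields."""
    return f"{field_path} eq {odata_string_literal(value)}"


# The field path must match the index definition's casing exactly.
print(eq_filter("Address/City", "New York"))
print(eq_filter("Address/City", "Coeur d'Alene"))
```

The resulting string is what would be passed as the filter in a REST request or through the filter option of whichever SDK is in use.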

Why the field is not subject to lexical analysis

I'm trying to make my searches ignore accents in words.
To do this, I decided to use the language analyzer es.microsoft.
I tested the analyzer with the word "Lámpara" in the Analyze API and got the following results:
{
"token": "lampara",
"startOffset": 0,
"endOffset": 7,
"position": 0
},
{
"token": "lámpara",
"startOffset": 0,
"endOffset": 7,
"position": 0
}
I have only 2 documents in my test index:
{
"@search.score": 1,
"Id": "2",
"Nombre": "Lampara"
},
{
"@search.score": 1,
"Id": "1",
"Nombre": "Lámpara"
}
When I search the index with search=Lámpara, I get the following result:
{
"@search.score": 0.30685282,
"Id": "1",
"Nombre": "Lámpara"
}
Why is only the document with Nombre = "Lámpara" returned, and not the one with Nombre = "Lampara" (without the accent)? I have the impression that the Nombre field was not subjected to lexical analysis.
The definition of my index is as follows:
{
"name": "test",
"fields": [
{
"name": "Id",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"key": true,
"retrievable": true,
"searchable": false,
"sortable": false,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "Nombre",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": false,
"analyzer": "es.microsoft",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
],
"suggesters": [],
"scoringProfiles": [],
"defaultScoringProfile": null,
"corsOptions": null,
"analyzers": [],
"charFilters": [],
"tokenFilters": [],
"tokenizers": []
}
I would appreciate any help, and I apologize for my bad English.
Sorry for the delay in getting you an answer. Indeed, the Microsoft Spanish analyzer currently only folds the accents in documents, so that they can be matched by queries that omit the accents (as you mentioned, a search for Lampara will match documents that contain Lámpara). But if you explicitly include the accents in the query (for example, searching for Lámpara), it won't match documents that don't have any accents.
If this behavior is important to you, you can instead use the es.lucene analyzer, which performs "ASCII folding" (removal of accents) at both indexing and search time.
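To illustrate why folding on both sides makes the two documents match, here is a minimal sketch. It is not the es.lucene analyzer; it only imitates the idea of accent removal with Python's standard library, and the function name fold_accents is hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Lowercase and strip combining marks (a rough form of ASCII folding)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The two test documents from the question.
documents = {"1": "Lámpara", "2": "Lampara"}
query = "Lámpara"

# Folding on both sides: index terms and the query term are normalized alike.
matches = [doc_id for doc_id, name in documents.items()
           if fold_accents(name) == fold_accents(query)]
print(matches)
```

Because the same normalization is applied to the indexed values and to the query, both documents compare equal to the query term regardless of whether the accent was typed.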

Resources