How to search on complex fields in Azure Cognitive Search - azure

Consider the following model, where Address has nested property of City
{
"HotelId": "1",
"HotelName": "Secret Point Motel",
"Description": "Ideally located on the main commercial artery of the city in the heart of New York.",
"Tags": ["Free wifi", "on-site parking", "indoor pool", "continental breakfast"],
"Address": {
"StreetAddress": "677 5th Ave",
"City": "New York",
"StateProvince": "NY"
},
"Rooms": [
{
"Description": "Budget Room, 1 Queen Bed (Cityside)",
"RoomNumber": 1105,
"BaseRate": 96.99,
},
{
"Description": "Deluxe Room, 2 Double Beds (City View)",
"Type": "Deluxe Room",
"BaseRate": 150.99,
}
. . .
]
}
The model is indexed in Azure Cognitive Search as the following, where the Address is set as Edm.ComplexType
{
"name": "hotels",
"fields": [
{ "name": "HotelId", "type": "Edm.String", "key": true, "filterable": true },
{ "name": "HotelName", "type": "Edm.String", "searchable": true, "filterable": false },
{ "name": "Description", "type": "Edm.String", "searchable": true, "analyzer": "en.lucene" },
{ "name": "Address", "type": "Edm.ComplexType",
"fields": [
{ "name": "StreetAddress", "type": "Edm.String", "filterable": false, "sortable": false, "facetable": false, "searchable": true },
{ "name": "City", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true },
{ "name": "StateProvince", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true }
]
},
{ "name": "Rooms", "type": "Collection(Edm.ComplexType)",
"fields": [
{ "name": "Description", "type": "Edm.String", "searchable": true, "analyzer": "en.lucene" },
{ "name": "Type", "type": "Edm.String", "searchable": true },
{ "name": "BaseRate", "type": "Edm.Double", "filterable": true, "facetable": true }
]
}
]
}
Now I am trying to search on the data for City equals New York using the following queries, but none of them works
city eq 'new york' // return no result
address/city eq 'new york' // return error The property 'address/city' does not exist
address.city eq 'new york' // return error The property 'address.city' does not exist
So then how to search on Edm.ComplexType filed in Azure Cognitive Search?
N.B: I am using Azure Dotnet SDK (10.1.0)

The correct syntax is to define the OData expression in $filter clause. If you were using REST API, your $filter clause would be:
Address/City eq 'New York'
The reason your code is failing is because the actual field path is Address/City whereas you are specifying it as address/city. Once you use the proper field names, your code should work just fine.

Related

Receiving RequestMalformed error when doing Typesense upsert

I have the following interface in typescript:
export interface TypesenseAtlistedProEvent {
// IDs
id: string;
proId: string;
eventId: string;
startTime: Number;
stopTime: Number;
eventRate: Number;
remainingSlots: Number;
displayName: string;
photoURL: string;
indexOptions: string;
location: Number[];
}
and the following schema in Typesense:
{
"created_at": 1665530883,
"default_sorting_field": "location",
"fields": [
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "proId",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "eventId",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "startTime",
"optional": false,
"sort": true,
"type": "int64"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "stopTime",
"optional": false,
"sort": true,
"type": "int64"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "eventRate",
"optional": false,
"sort": true,
"type": "float"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "remainingSlots",
"optional": false,
"sort": true,
"type": "int32"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "displayName",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "photoURL",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "indexOptions",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "location",
"optional": false,
"sort": true,
"type": "geopoint"
}
],
"name": "atlistedProEventIndex",
"num_documents": 0,
"symbols_to_index": [],
"token_separators": []
}
I look to upsert like the in the following:
const indexedDoc: TypesenseAtlistedProEvent = {
id: proId + eventId,
proId: proId,
eventId: eventId,
startTime: publicEvent.startTime.seconds,
stopTime: publicEvent.stopTime.seconds,
eventRate: publicEvent.eventRate,
remainingSlots: publicEvent.remainingSlots,
displayName: tpi.displayName,
photoURL: tpi.photoURL,
indexOptions: tpi.indexOptions,
location: [tpi.lat, tpi.lng],
};
return await typesenseClient
.collections("atlistedProEventIndex")
.documents()
.upsert(indexedDoc)
.then(() => {
return {success: true, exit: 0};
})
I am getting the following upon the query:
RequestMalformed: Request failed with HTTP code 400 | Server said: [json.exception.type_error.302] type must be number
I am passing it location as Number[], and trying to get that to update the geopoint in typesense. This is not working and thus it would be useful if:
I was able to locate the logs to go through. I would particularly like the logs given by the Typesense Cloud, and am feeling at a loss that I cannot find these.
I would like to pass in the geopoint as the right type in typescript. Right now, as you can see above, the location is of type Number[], which, from the examples I saw, assumed was right. It also may be the case that another field is off and I'm just missing it. Either way, I could really use some kind of server side logging coming from Typesense Cloud.
The error message is a little confusing, but the core of the issue is that the default_sorting_field can only be a numeric field, but it's currently set as a geopoint field (location), which is what that error is trying to convey.
So if you create a new collection without default_sorting_field, the error should not show up.
If you want to sort by geo location, you want to use the sort_by parameter: https://typesense.org/docs/0.23.1/api/geosearch.html#searching-within-a-radius
let searchParameters = {
'q' : '*',
'query_by' : 'title',
'filter_by' : 'location:(48.90615915923891, 2.3435897727061175, 5.1 km)',
'sort_by' : 'location(48.853, 2.344):asc'
}
client.collections('companies').documents().search(searchParameters)

adding analyzers to Azure Search Index using REST API not saving

Having trouble getting the analyzers to save / update on the index. When creating, everything else (the tokenFilters, tokenizers, fields) saves fine, but the analyzers array is always empty?
await client.createOrUpdateIndex(index, { allowIndexDowntime: true });
Creating a new index:
let index = {
name: "test-index",
tokenizers: [{
"odatatype": "#Microsoft.Azure.Search.StandardTokenizerV2",
"name": "test_standard_v2",
"maxTokenLength": 255
}],
fields: [{
"name": "metadata_storage_path",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": true,
"retrievable": true,
"searchable": false,
"sortable": false,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}, {
'name': 'metadata_storage_name',
'type': 'Edm.String',
'facetable': false,
'filterable': false,
'key': false,
'retrievable': true,
'searchable': true,
'sortable': false,
'synonymMaps': [],
'fields': [],
},
{
"name": "partialName",
"type": "Edm.String",
"retrievable": false,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"key": false,
"searchAnalyzer": "standardCmAnalyzer",
"indexAnalyzer": "filename_analyzer"
}],
tokenFilters: [{
"name": "nGramCmTokenFilter",
"odatatype": "#Microsoft.Azure.Search.NGramTokenFilterV2",
"minGram": 3,
"maxGram": 20
}],
analyzers: [{
"name": "standardCmAnalyzer",
"odatatype": "#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer": "test_standard_v2",
"tokenFilters": ["lowercase", "asciifolding"]
},
{
"name": "filename_analyzer",
"odatatype": "#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer": "test_standard_v2",
"tokenFilters": [
"nGramCmTokenFilter"
]
}],
};
Then creating it:
await client.createOrUpdateIndex(index, { allowIndexDowntime: true });
I noticed no error messages being returned.
EDIT:
Using the sdk #azure/search-documents ^11.1.0

Azure Cognitive Search: How to index json custom metadata of a blob

I have a blob with a custom metadata property of jsonmd.
The custom metadata looks something like:
{
"ResourceName": "ipso factum...",
"ResourceVariations": [{
"Description": "ipso factum...",
"Name": "R4.mp4",
"Thumbnail": "R4.jpg",
"URL": ""
},
...
I was able to capture the full json in the index by including a filed in the index:
{
"name": "jsonmd",
"type": "Edm.String",
"facetable": true,
"filterable": true,
...
I want to capture the Thumbnail property and have added this field to the index:
{
"name": "Thumbnail",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": "standard.lucene",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
I can't figure out how to use the custom metadata (jsonmd) to populate the Thumbnail property of the index?
You can define complex types in your index schema. Below is an example of how I collect metadata from PDF documents that we index. I extract the properties from the PDF using regular C# code, populate a Dictionary and then submit the objects using the Azure Cognitive Search SDK.
For more examples, see Model complex data types in Azure Cognitive Search.
{
"name": "Metadata",
"type": "Edm.ComplexType",
"analyzer": null,
"synonymMaps": [],
"fields": [
{
"name": "Properties",
"type": "Collection(Edm.ComplexType)",
"analyzer": null,
"synonymMaps": [],
"fields": [
{
"name": "Name",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": false,
"analyzer": "pattern",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "Values",
"type": "Collection(Edm.String)",
"facetable": true,
"filterable": true,
"retrievable": true,
"searchable": true,
"analyzer": "pattern",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
]
}
]
}

How to index complex types into Edm.ComplexType with Azure Cognitive Search

I am indexing data into an Azure Search Index that is produced by a custom skill. This custom skill produces complex data which I want to preserve into the Azure Search Index.
Source data is coming from blob storage and I am constrained to using the REST API without a very solid argument for using the .NET SDK.
Current code
The following is a brief rundown of what I currently have. I cannot change the index's field or the format of data produced by the endpoint used by the custom skill.
Complex data
The following is an example of complex data produced by the custom skill (in the correct value/recordId/etc. format):
{
"field1": 0.135412,
"field2": 0.123513,
"field3": 0.243655
}
Custom skill
Here is the custom skill which creates said data:
{
"#odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"uri": "https://myfunction.azurewebsites.com/api,
"httpHeaders": {},
"httpMethod": "POST",
"timeout": "PT3M50S",
"batchSize": 1,
"degreeOfParallelism": 5,
"name": "MySkill",
"context": "/document/mycomplex
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "field1",
"targetName": "field1"
},
{
"name": "field2",
"targetName": "field2"
},
{
"name": "field3",
"targetName": "field3"
}
]
}
I have attempted several variations, notable using the ShaperSkill with each field as an input and the output "targetName" as "mycomplex" (with the appropriate context).
Indexer
Here is the indexer's output field mapping for the skill:
{
"sourceFieldName": "/document/mycomplex,
"targetFieldName": "mycomplex"
}
I have tried several variations such as "sourceFieldName": "/document/mycomplex/*.
Search index
And this is the targeted index field:
{
"name": "mycomplex",
"type": "Edm.ComplexType",
"fields": [
{
"name": "field1",
"type": "Edm.Double",
"retrievable": true,
"filterable": true,
"sortable": true,
"facetable": false,
"searchable": false
},
{
"name": "field2",
"type": "Edm.Double",
"retrievable": true,
"filterable": true,
"sortable": true,
"facetable": false,
"searchable": false
},
{
"name": "field3",
"type": "Edm.Double",
"retrievable": true,
"filterable": true,
"sortable": true,
"facetable": false,
"searchable": false
}
]
}
Result
My result is usually similar to Could not map output field 'mycomplex' to search index. Check your indexer's 'outputFieldMappings' property..
This may be a mistake with the context of your skill. Instead of setting the context to /document/mycomplex, can you try setting it to /document? You can then add a ShaperSkill with the context also set to /document and the output field being mycomplex to generate the expected complex type shape
Example skills:
"skills":
[
{
"#odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"uri": "https://myfunction.azurewebsites.com/api,
"httpHeaders": {},
"httpMethod": "POST",
"timeout": "PT3M50S",
"batchSize": 1,
"degreeOfParallelism": 5,
"name": "MySkill",
"context": "/document"
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "field1",
"targetName": "field1"
},
{
"name": "field2",
"targetName": "field2"
},
{
"name": "field3",
"targetName": "field3"
}
]
},
{
"#odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"context": "/document",
"inputs": [
{
"name": "field1",
"source": "/document/field1"
},
{
"name": "field2",
"source": "/document/field2"
},
{
"name": "field3",
"source": "/document/field3"
}
],
"outputs": [
{
"name": "output",
"targetName": "mycomplex"
}
]
}
]
Please refer to the documentation on shaper skill for specifics.

How to match this query in Azure Search

I have this INDEX
{
"name": "testentities",
"fields": [
{
"name": "id",
"type": "Edm.String",
"key": true,
"retrievable": true,
"filterable": true,
"sortable": true
},
{
"name": "entity_id",
"type": "Edm.String",
"searchable": true,
"sortable": true,
"facetable": false,
"retrievable": true,
"filterable": true,
"searchAnalyzer":"standard",
"indexAnalyzer": "custom_analyzer"
},
{
"name": "description",
"type": "Edm.String",
"searchable": true,
"sortable": false,
"facetable": false,
"retrievable": true,
"filterable": true
},
{
"name": "name",
"type": "Edm.String",
"searchable": true,
"sortable": true,
"facetable": false,
"retrievable": true,
"filterable": true
},
{
"name": "entity_type",
"type": "Edm.String",
"searchable": true,
"sortable": true,
"facetable": true,
"retrievable": true,
"filterable": true
},
{
"name": "ancestors",
"type": "Collection(Edm.String)",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": true,
"filterable": true
},
{
"name": "calendar_id",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "currency",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "timezone",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "active",
"type": "Edm.Boolean",
"retrievable": true,
"facetable": true,
"filterable": true
},
{
"name": "kpi_collection",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "rid",
"type": "Edm.String"
}
],
"scoringProfiles": [
{
"name": "boostEntity",
"text": {
"weights": {
"entity_id": 9,
"name": 8,
"description": 1
}
}
}
],
"analyzers": [
{
"name": "custom_analyzer",
"#odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"token1",
"tokenFilters": [
"lowercase",
"entityID_stopWords",
"entityID_edgeNGram"
]
}
],
"tokenizers":[
{
"name":"token1",
"#odata.type":"#Microsoft.Azure.Search.StandardTokenizerV2"
}
],
"tokenFilters": [
{
"name": "entityID_edgeNGram",
"#odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"minGram": 1,
"maxGram": 6
},
{
"name": "entityID_stopWords",
"#odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
"stopwords": [
"store",
"region",
"zone",
"field_org",
":"
]
}
]
}
and if i execute this query :
{
"search": "0001",
"filter": "entity_type eq 'store' ",
"select":"name,entity_id,entity_type,description,active,ancestors",
"count": "true"
}
i get this result, that is correct , because it matches with name that have hight score after entity id.
"#odata.count": 1,
"value": [
{
"#search.score": 1.6654625,
"name": "LensCrafters 0001",
"entity_id": "store:1",
"entity_type": "store",
"description": "2130 Mall Road, Florence, 41042, KY, US",
"active": true,
"ancestors": [
"region:1021",
"zone:1123",
"field_org:lenscrafters_na",
"ROOT"
]
}
]
}
But if i run this query
{
"search": "1",
"filter": "entity_type eq 'store' ",
"select":"name,entity_id,entity_type,description,active,ancestors",
"count": "true"
}
I got this result that is not correct
{
"#search.score": 1.4522386,
"name": "LensCrafters 1622",
"entity_id": "store:1622",
"entity_type": "store",
"description": "31625 Pacific Hwy S, Spc #E-1, Federal Way, 98003-5645, WA, US",
"active": true,
"ancestors": [
"region:1024",
"zone:1107",
"field_org:lenscrafters_na",
"ROOT"
]
},
{
"#search.score": 1.3403159,
"name": "LensCrafters 1178",
"entity_id": "store:1178",
"entity_type": "store",
"description": "1 W FlatIron Crossing Dr #1104, Broomfield, 80021-8881, CO, US",
"active": true,
"ancestors": [
"region:1019",
"zone:1122",
"field_org:lenscrafters_na",
"ROOT"
]
},
{
...............
Why the resulat is not this despite inside scoring profile entity_is has value 9?
"#odata.count": 1,
"value": [
{
"#search.score": 1.6654625,
"name": "LensCrafters 0001",
"entity_id": "store:1",
"entity_type": "store",
"description": "2130 Mall Road, Florence, 41042, KY, US",
"active": true,
"ancestors": [
"region:1021",
"zone:1123",
"field_org:lenscrafters_na",
"ROOT"
]
}
]
}
Here the scoring profile?
"scoringProfiles": [
{
"name": "boostEntity",
"text": {
"weights": {
"entity_id": 9,
"name": 8,
"description": 1
}
},
"functions": [],
"functionAggregation": null
}
],.............
You are using a custom analyzer on the entity_id field that produces the following tokens for text store:1178: 1, 11, 117, 1178 (you can test your analyzer configuration with the Analyze API). This means, the documents LensCrafters 1622 and LensCrafters 1178 match the query as well as the document LensCrafters 0001 - they all have 1 in entity_id. However, the documents LensCrafters 1622 and LensCrafters 1178 also match 1 in description. Thus, they have a higher score than LensCrafters 0001.
To learn more about query processing and custom analyzers in Azure Search, please read: How full text search works in Azure Search.
Do you want to keep the edgeNGram token filter in your analysis chain? Why?

Resources