I'm having trouble getting the analyzers to save or update on the index. When I create the index, everything else (the tokenFilters, tokenizers, fields) saves fine, but the analyzers array is always empty.
Creating a new index:
let index = {
name: "test-index",
tokenizers: [{
"odatatype": "#Microsoft.Azure.Search.StandardTokenizerV2",
"name": "test_standard_v2",
"maxTokenLength": 255
}],
fields: [{
"name": "metadata_storage_path",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": true,
"retrievable": true,
"searchable": false,
"sortable": false,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}, {
'name': 'metadata_storage_name',
'type': 'Edm.String',
'facetable': false,
'filterable': false,
'key': false,
'retrievable': true,
'searchable': true,
'sortable': false,
'synonymMaps': [],
'fields': [],
},
{
"name": "partialName",
"type": "Edm.String",
"retrievable": false,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"key": false,
"searchAnalyzer": "standardCmAnalyzer",
"indexAnalyzer": "filename_analyzer"
}],
tokenFilters: [{
"name": "nGramCmTokenFilter",
"odatatype": "#Microsoft.Azure.Search.NGramTokenFilterV2",
"minGram": 3,
"maxGram": 20
}],
analyzers: [{
"name": "standardCmAnalyzer",
"odatatype": "#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer": "test_standard_v2",
"tokenFilters": ["lowercase", "asciifolding"]
},
{
"name": "filename_analyzer",
"odatatype": "#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer": "test_standard_v2",
"tokenFilters": [
"nGramCmTokenFilter"
]
}],
};
Then I create it:
await client.createOrUpdateIndex(index, { allowIndexDowntime: true });
No error message is returned.
EDIT:
Using the SDK @azure/search-documents ^11.1.0
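If the analyzers are being lost on the client side, one thing worth checking is property naming. This is an assumption, not something confirmed in this thread: as far as I know, the v11 @azure/search-documents typings expect tokenizerName on a CustomAnalyzer, and analyzerName, searchAnalyzerName, indexAnalyzerName, synonymMapNames and hidden (the inverse of retrievable) on fields, rather than the REST API names used above. The hypothetical helpers below only rename those keys in plain objects; check the names against the SDK reference for your version before relying on them.

```typescript
// Hypothetical helpers: rename REST-style keys (as used in the question) to the
// names the v11 @azure/search-documents typings are ASSUMED to expect.
// Verify these property names against the SDK reference for your version.

interface RestStyleCustomAnalyzer {
  name: string;
  odatatype: string;
  tokenizer: string;
  tokenFilters?: string[];
  charFilters?: string[];
}

interface RestStyleField {
  name: string;
  type: string;
  retrievable?: boolean;
  analyzer?: string | null;
  searchAnalyzer?: string | null;
  indexAnalyzer?: string | null;
  synonymMaps?: string[];
  [other: string]: unknown;
}

// CustomAnalyzer: `tokenizer` -> `tokenizerName` (assumed v11 name).
function toV11CustomAnalyzer(a: RestStyleCustomAnalyzer) {
  const { tokenizer, ...rest } = a;
  return { ...rest, tokenizerName: tokenizer };
}

// Field: analyzer/searchAnalyzer/indexAnalyzer/synonymMaps -> *Name / *Names,
// and `retrievable` -> its inverse, `hidden` (assumed v11 names).
function toV11Field(f: RestStyleField): Record<string, unknown> {
  const { retrievable, analyzer, searchAnalyzer, indexAnalyzer, synonymMaps, ...rest } = f;
  const out: Record<string, unknown> = { ...rest };
  if (typeof retrievable === "boolean") out.hidden = !retrievable;
  if (analyzer) out.analyzerName = analyzer;
  if (searchAnalyzer) out.searchAnalyzerName = searchAnalyzer;
  if (indexAnalyzer) out.indexAnalyzerName = indexAnalyzer;
  if (synonymMaps && synonymMaps.length > 0) out.synonymMapNames = synonymMaps;
  return out;
}
```

For example, passing the filename_analyzer object above through toV11CustomAnalyzer yields tokenizerName: "test_standard_v2" and drops the tokenizer key.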
I have the following interface in TypeScript:
export interface TypesenseAtlistedProEvent {
// IDs
id: string;
proId: string;
eventId: string;
startTime: Number;
stopTime: Number;
eventRate: Number;
remainingSlots: Number;
displayName: string;
photoURL: string;
indexOptions: string;
location: Number[];
}
and the following schema in Typesense:
{
"created_at": 1665530883,
"default_sorting_field": "location",
"fields": [
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "proId",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "eventId",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "startTime",
"optional": false,
"sort": true,
"type": "int64"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "stopTime",
"optional": false,
"sort": true,
"type": "int64"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "eventRate",
"optional": false,
"sort": true,
"type": "float"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "remainingSlots",
"optional": false,
"sort": true,
"type": "int32"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "displayName",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "photoURL",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "indexOptions",
"optional": false,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "location",
"optional": false,
"sort": true,
"type": "geopoint"
}
],
"name": "atlistedProEventIndex",
"num_documents": 0,
"symbols_to_index": [],
"token_separators": []
}
I upsert a document as follows:
const indexedDoc: TypesenseAtlistedProEvent = {
id: proId + eventId,
proId: proId,
eventId: eventId,
startTime: publicEvent.startTime.seconds,
stopTime: publicEvent.stopTime.seconds,
eventRate: publicEvent.eventRate,
remainingSlots: publicEvent.remainingSlots,
displayName: tpi.displayName,
photoURL: tpi.photoURL,
indexOptions: tpi.indexOptions,
location: [tpi.lat, tpi.lng],
};
return await typesenseClient
.collections("atlistedProEventIndex")
.documents()
.upsert(indexedDoc)
.then(() => {
return {success: true, exit: 0};
})
I get the following error from the request:
RequestMalformed: Request failed with HTTP code 400 | Server said: [json.exception.type_error.302] type must be number
I am passing location as Number[] and trying to use it to populate the geopoint field in Typesense. This is not working, so it would be useful if:
I could locate the logs to go through. I would particularly like the logs from Typesense Cloud, and I am at a loss that I cannot find them.
I could pass in the geopoint as the right type in TypeScript. Right now, as you can see above, location is of type Number[], which I assumed was right from the examples I saw. It may also be that another field is wrong and I'm just missing it. Either way, I could really use some kind of server-side logging from Typesense Cloud.
The error message is a little confusing, but the core of the issue is that default_sorting_field can only be a numeric field, and it is currently set to a geopoint field (location). That is what the error is trying to convey.
So if you create a new collection without default_sorting_field, the error should not show up.
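A minimal TypeScript sketch of the point above, using hypothetical helper names: a pre-flight check that rejects a non-numeric default_sorting_field (numeric is assumed here to mean int32, int64 or float), plus a document type that uses the primitive number rather than the boxed Number, and a [lat, lng] tuple for the geopoint. It only validates a local schema object; it does not call the Typesense API.

```typescript
// Minimal model of a Typesense collection schema (only the parts used above).
type TypesenseFieldType = "string" | "int32" | "int64" | "float" | "geopoint";

interface FieldSchema {
  name: string;
  type: TypesenseFieldType;
  sort?: boolean;
}

interface CollectionSchema {
  name: string;
  fields: FieldSchema[];
  default_sorting_field?: string; // optional: omit it, or point it at a numeric field
}

// Assumed set of numeric types accepted for default_sorting_field.
const NUMERIC_TYPES = new Set<TypesenseFieldType>(["int32", "int64", "float"]);

// Hypothetical pre-flight check: returns an error message, or null if the schema is OK.
function checkDefaultSortingField(schema: CollectionSchema): string | null {
  const fieldName = schema.default_sorting_field;
  if (fieldName === undefined) return null; // no default sorting field: nothing to check
  const field = schema.fields.find((f) => f.name === fieldName);
  if (!field) return `default_sorting_field "${fieldName}" is not defined in fields`;
  if (!NUMERIC_TYPES.has(field.type)) {
    return `default_sorting_field "${fieldName}" is ${field.type}, but a numeric field is required`;
  }
  return null;
}

// Document shape using primitive `number` (not `Number`) and a [lat, lng] tuple.
interface ProEventDoc {
  id: string;
  eventRate: number;
  location: [number, number];
}
```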
If you want to sort by geolocation, use the sort_by parameter: https://typesense.org/docs/0.23.1/api/geosearch.html#searching-within-a-radius
let searchParameters = {
'q' : '*',
'query_by' : 'title',
'filter_by' : 'location:(48.90615915923891, 2.3435897727061175, 5.1 km)',
'sort_by' : 'location(48.853, 2.344):asc'
}
client.collections('companies').documents().search(searchParameters)
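The same parameters can be wrapped in a small hypothetical helper so that the radius filter and the distance sort share one center point (the example above uses two different points, which is allowed but easy to do by accident). The field name location and the query_by value are assumptions carried over from the example.

```typescript
// Hypothetical helper: build the geosearch parameters shown above for any center point.
interface GeoSearchParams {
  q: string;
  query_by: string;
  filter_by: string;
  sort_by: string;
}

function geoSearchParams(
  lat: number,
  lng: number,
  radiusKm: number,
  queryBy: string,
  geoField = "location",
): GeoSearchParams {
  return {
    q: "*",
    query_by: queryBy,
    // Keep only documents within radiusKm of (lat, lng).
    filter_by: `${geoField}:(${lat}, ${lng}, ${radiusKm} km)`,
    // Nearest first, measured from the same center point.
    sort_by: `${geoField}(${lat}, ${lng}):asc`,
  };
}
```

For instance, client.collections('companies').documents().search(geoSearchParams(48.853, 2.344, 5.1, 'title')) would filter to a 5.1 km radius around (48.853, 2.344) and sort by distance from that same point.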
I have a blob with a custom metadata property named jsonmd.
The custom metadata looks something like:
{
"ResourceName": "ipso factum...",
"ResourceVariations": [{
"Description": "ipso factum...",
"Name": "R4.mp4",
"Thumbnail": "R4.jpg",
"URL": ""
},
...
I was able to capture the full JSON in the index by including a field in the index:
{
"name": "jsonmd",
"type": "Edm.String",
"facetable": true,
"filterable": true,
...
I want to capture the Thumbnail property and have added this field to the index:
{
"name": "Thumbnail",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": "standard.lucene",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
I can't figure out how to use the custom metadata (jsonmd) to populate the Thumbnail field of the index.
You can define complex types in your index schema. Below is an example of how I collect metadata from PDF documents that we index. I extract the properties from the PDF using regular C# code, populate a Dictionary and then submit the objects using the Azure Cognitive Search SDK.
For more examples, see Model complex data types in Azure Cognitive Search.
{
"name": "Metadata",
"type": "Edm.ComplexType",
"analyzer": null,
"synonymMaps": [],
"fields": [
{
"name": "Properties",
"type": "Collection(Edm.ComplexType)",
"analyzer": null,
"synonymMaps": [],
"fields": [
{
"name": "Name",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": false,
"analyzer": "pattern",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "Values",
"type": "Collection(Edm.String)",
"facetable": true,
"filterable": true,
"retrievable": true,
"searchable": true,
"analyzer": "pattern",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
]
}
]
}
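The same idea, extracting the value in application code before uploading, can be sketched in TypeScript rather than C#. This is a sketch under assumptions: the interfaces mirror only the keys shown in the jsonmd sample, the function names are hypothetical, and because Thumbnail is a single Edm.String while ResourceVariations can hold several entries, it simply takes the first thumbnail. A Collection(Edm.String) field would let you keep them all.

```typescript
// Shape of the jsonmd custom metadata (only the keys shown in the question).
interface ResourceVariation {
  Description?: string;
  Name?: string;
  Thumbnail?: string;
  URL?: string;
}

interface JsonMetadata {
  ResourceName?: string;
  ResourceVariations?: ResourceVariation[];
}

// Parse the jsonmd string and collect every non-empty Thumbnail value.
function extractThumbnails(jsonmd: string): string[] {
  let raw: unknown;
  try {
    raw = JSON.parse(jsonmd);
  } catch {
    return []; // malformed metadata: no thumbnails
  }
  if (typeof raw !== "object" || raw === null) return [];
  const variations = (raw as JsonMetadata).ResourceVariations;
  if (!Array.isArray(variations)) return [];
  return variations
    .map((v) => v.Thumbnail)
    .filter((t): t is string => typeof t === "string" && t.length > 0);
}

// Hypothetical document builder: the single-valued Thumbnail field gets the first thumbnail.
function toSearchDocument(jsonmd: string): { jsonmd: string; Thumbnail: string | null } {
  const thumbnails = extractThumbnails(jsonmd);
  return { jsonmd, Thumbnail: thumbnails.length > 0 ? thumbnails[0] : null };
}
```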
Consider the following model, where Address has a nested City property:
{
"HotelId": "1",
"HotelName": "Secret Point Motel",
"Description": "Ideally located on the main commercial artery of the city in the heart of New York.",
"Tags": ["Free wifi", "on-site parking", "indoor pool", "continental breakfast"],
"Address": {
"StreetAddress": "677 5th Ave",
"City": "New York",
"StateProvince": "NY"
},
"Rooms": [
{
"Description": "Budget Room, 1 Queen Bed (Cityside)",
"RoomNumber": 1105,
"BaseRate": 96.99
},
{
"Description": "Deluxe Room, 2 Double Beds (City View)",
"Type": "Deluxe Room",
"BaseRate": 150.99
}
. . .
]
}
The model is indexed in Azure Cognitive Search with the following schema, where Address is an Edm.ComplexType:
{
"name": "hotels",
"fields": [
{ "name": "HotelId", "type": "Edm.String", "key": true, "filterable": true },
{ "name": "HotelName", "type": "Edm.String", "searchable": true, "filterable": false },
{ "name": "Description", "type": "Edm.String", "searchable": true, "analyzer": "en.lucene" },
{ "name": "Address", "type": "Edm.ComplexType",
"fields": [
{ "name": "StreetAddress", "type": "Edm.String", "filterable": false, "sortable": false, "facetable": false, "searchable": true },
{ "name": "City", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true },
{ "name": "StateProvince", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true }
]
},
{ "name": "Rooms", "type": "Collection(Edm.ComplexType)",
"fields": [
{ "name": "Description", "type": "Edm.String", "searchable": true, "analyzer": "en.lucene" },
{ "name": "Type", "type": "Edm.String", "searchable": true },
{ "name": "BaseRate", "type": "Edm.Double", "filterable": true, "facetable": true }
]
}
]
}
Now I am trying to search the data for City equal to New York using the following queries, but none of them work:
city eq 'new york' // returns no results
address/city eq 'new york' // returns error: The property 'address/city' does not exist
address.city eq 'new york' // returns error: The property 'address.city' does not exist
So how do you search on an Edm.ComplexType field in Azure Cognitive Search?
N.B: I am using Azure Dotnet SDK (10.1.0)
The correct syntax is to define the OData expression in the $filter clause. If you were using the REST API, your $filter clause would be:
Address/City eq 'New York'
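Your code fails because field names are case-sensitive: the actual field path is Address/City, whereas you are specifying address/city. Once you use the proper field names, your code should work. Also note that string comparisons in filters are case-sensitive, so compare against 'New York' rather than 'new york'.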
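If you build filters in code, a small helper keeps the path separator and quoting consistent. The helper names are hypothetical; the any() lambda is OData's syntax for filtering on a Collection(Edm.ComplexType) such as Rooms. With the .NET SDK 10.x, the resulting string would go into SearchParameters.Filter.

```typescript
// Build OData filter strings for Azure Cognitive Search field paths.
// Sub-fields of an Edm.ComplexType are addressed with "/", and names are case-sensitive.

// OData string literal: wrap in single quotes and double any embedded quote.
function odataString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// e.g. ["Address", "City"] -> Address/City eq '...'
function eqFilter(path: string[], value: string): string {
  return `${path.join("/")} eq ${odataString(value)}`;
}

// For Collection(Edm.ComplexType) fields such as Rooms, use a lambda with any().
function anyEqFilter(collection: string, subField: string, value: string, rangeVar = "r"): string {
  return `${collection}/any(${rangeVar}: ${rangeVar}/${subField} eq ${odataString(value)})`;
}
```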
I'm trying to make my searches ignore word accents
To do this I decided to use the language analyzer: es.microsoft
I was testing the analyzer with the word "Lámpara" in the analyzer API and I got the following results:
{
"token": "lampara",
"startOffset": 0,
"endOffset": 7,
"position": 0
},
{
"token": "lámpara",
"startOffset": 0,
"endOffset": 7,
"position": 0
}
I have only 2 documents in my test index:
{
"@search.score": 1,
"Id": "2",
"Nombre": "Lampara"
},
{
"@search.score": 1,
"Id": "1",
"Nombre": "Lámpara"
}
When I search the index with search=Lámpara, I get only this result:
{
"@search.score": 0.30685282,
"Id": "1",
"Nombre": "Lámpara"
}
Why is only the document with Nombre = "Lámpara" returned, and not the one with Nombre = "Lampara" (without the accent)? I have the impression that the Nombre field is not going through lexical analysis.
The definition of my index is as follows
{
"name": "test",
"fields": [
{
"name": "Id",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"key": true,
"retrievable": true,
"searchable": false,
"sortable": false,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "Nombre",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": false,
"analyzer": "es.microsoft",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
],
"suggesters": [],
"scoringProfiles": [],
"defaultScoringProfile": null,
"corsOptions": null,
"analyzers": [],
"charFilters": [],
"tokenFilters": [],
"tokenizers": []
}
I would appreciate any help, and I apologize for my English.
Sorry for the delay in getting you an answer. Indeed, the Microsoft Spanish analyzer currently folds accents only in documents, so they can be matched by queries that omit the accents: a search for Lampara matches documents that contain Lámpara. But, as you saw, if the query explicitly includes the accent (for example, searching for Lámpara), it won't match documents that have no accents.
If this behavior is important to you, you can use the es.lucene analyzer instead, which does "ASCII folding" (removing accents) at both indexing and search time.
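A toy TypeScript model of the behavior described above may make the asymmetry easier to see. It is not how either analyzer is implemented: it only assumes that es.microsoft indexes both lámpara and lampara (as the Analyze API output shows) while leaving the query term accented, and that es.lucene folds on both sides, as the answer states.

```typescript
// Accent folding: decompose to NFD, then drop combining diacritical marks (U+0300-U+036F).
function foldAccents(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Lowercase in canonical (NFC) form so precomposed and decomposed input compare equal.
function lower(s: string): string {
  return s.normalize("NFC").toLowerCase();
}

interface AnalyzerModel {
  indexTokens(text: string): Set<string>;
  queryToken(text: string): string;
}

// Toy model of es.microsoft as described: documents get the original AND the folded
// token (matching the Analyze API output for "Lámpara"); the query keeps its accent.
const foldOnIndexOnly: AnalyzerModel = {
  indexTokens: (text) => new Set([lower(text), foldAccents(lower(text))]),
  queryToken: (text) => lower(text),
};

// Toy model of folding at both index and query time, as described for es.lucene.
const foldBothSides: AnalyzerModel = {
  indexTokens: (text) => new Set([foldAccents(lower(text))]),
  queryToken: (text) => foldAccents(lower(text)),
};

interface Doc {
  Id: string;
  Nombre: string;
}

// Ids of the documents whose indexed tokens contain the analyzed query term.
function matchingIds(docs: Doc[], query: string, analyzer: AnalyzerModel): string[] {
  const term = analyzer.queryToken(query);
  return docs.filter((d) => analyzer.indexTokens(d.Nombre).has(term)).map((d) => d.Id);
}

// The two documents from the question's test index.
const testDocs: Doc[] = [
  { Id: "2", Nombre: "Lampara" },
  { Id: "1", Nombre: "Lámpara" },
];
```

With foldOnIndexOnly, searching Lámpara returns only document 1 while searching Lampara returns both; with foldBothSides, either spelling returns both.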
I have this index:
{
"name": "testentities",
"fields": [
{
"name": "id",
"type": "Edm.String",
"key": true,
"retrievable": true,
"filterable": true,
"sortable": true
},
{
"name": "entity_id",
"type": "Edm.String",
"searchable": true,
"sortable": true,
"facetable": false,
"retrievable": true,
"filterable": true,
"searchAnalyzer":"standard",
"indexAnalyzer": "custom_analyzer"
},
{
"name": "description",
"type": "Edm.String",
"searchable": true,
"sortable": false,
"facetable": false,
"retrievable": true,
"filterable": true
},
{
"name": "name",
"type": "Edm.String",
"searchable": true,
"sortable": true,
"facetable": false,
"retrievable": true,
"filterable": true
},
{
"name": "entity_type",
"type": "Edm.String",
"searchable": true,
"sortable": true,
"facetable": true,
"retrievable": true,
"filterable": true
},
{
"name": "ancestors",
"type": "Collection(Edm.String)",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": true,
"filterable": true
},
{
"name": "calendar_id",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "currency",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "timezone",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "active",
"type": "Edm.Boolean",
"retrievable": true,
"facetable": true,
"filterable": true
},
{
"name": "kpi_collection",
"type": "Edm.String",
"searchable": false,
"sortable": false,
"facetable": false,
"retrievable": false,
"filterable": false
},
{
"name": "rid",
"type": "Edm.String"
}
],
"scoringProfiles": [
{
"name": "boostEntity",
"text": {
"weights": {
"entity_id": 9,
"name": 8,
"description": 1
}
}
}
],
"analyzers": [
{
"name": "custom_analyzer",
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"token1",
"tokenFilters": [
"lowercase",
"entityID_stopWords",
"entityID_edgeNGram"
]
}
],
"tokenizers":[
{
"name":"token1",
"@odata.type":"#Microsoft.Azure.Search.StandardTokenizerV2"
}
],
"tokenFilters": [
{
"name": "entityID_edgeNGram",
"@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"minGram": 1,
"maxGram": 6
},
{
"name": "entityID_stopWords",
"@odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
"stopwords": [
"store",
"region",
"zone",
"field_org",
":"
]
}
]
}
If I execute this query:
{
"search": "0001",
"filter": "entity_type eq 'store' ",
"select":"name,entity_id,entity_type,description,active,ancestors",
"count": "true"
}
I get this result, which is correct, because it matches on name, which has the highest weight after entity_id:
{
"@odata.count": 1,
"value": [
{
"@search.score": 1.6654625,
"name": "LensCrafters 0001",
"entity_id": "store:1",
"entity_type": "store",
"description": "2130 Mall Road, Florence, 41042, KY, US",
"active": true,
"ancestors": [
"region:1021",
"zone:1123",
"field_org:lenscrafters_na",
"ROOT"
]
}
]
}
But if I run this query:
{
"search": "1",
"filter": "entity_type eq 'store' ",
"select":"name,entity_id,entity_type,description,active,ancestors",
"count": "true"
}
I get this result, which is not what I expected:
{
"@search.score": 1.4522386,
"name": "LensCrafters 1622",
"entity_id": "store:1622",
"entity_type": "store",
"description": "31625 Pacific Hwy S, Spc #E-1, Federal Way, 98003-5645, WA, US",
"active": true,
"ancestors": [
"region:1024",
"zone:1107",
"field_org:lenscrafters_na",
"ROOT"
]
},
{
"@search.score": 1.3403159,
"name": "LensCrafters 1178",
"entity_id": "store:1178",
"entity_type": "store",
"description": "1 W FlatIron Crossing Dr #1104, Broomfield, 80021-8881, CO, US",
"active": true,
"ancestors": [
"region:1019",
"zone:1122",
"field_org:lenscrafters_na",
"ROOT"
]
},
{
...............
Why isn't the result the following, given that entity_id has a weight of 9 in the scoring profile?
{
"@odata.count": 1,
"value": [
{
"@search.score": 1.6654625,
"name": "LensCrafters 0001",
"entity_id": "store:1",
"entity_type": "store",
"description": "2130 Mall Road, Florence, 41042, KY, US",
"active": true,
"ancestors": [
"region:1021",
"zone:1123",
"field_org:lenscrafters_na",
"ROOT"
]
}
]
}
Here is the scoring profile:
"scoringProfiles": [
{
"name": "boostEntity",
"text": {
"weights": {
"entity_id": 9,
"name": 8,
"description": 1
}
},
"functions": [],
"functionAggregation": null
}
],.............
You are using a custom analyzer on the entity_id field that produces the following tokens for the text store:1178: 1, 11, 117, 1178 (you can test your analyzer configuration with the Analyze API). This means that the documents LensCrafters 1622 and LensCrafters 1178 match the query, as does the document LensCrafters 0001: they all have the token 1 in entity_id. However, LensCrafters 1622 and LensCrafters 1178 also match 1 in description, so they get a higher score than LensCrafters 0001.
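To see why search=1 hits every store whose id starts with 1, here is a rough TypeScript model of the custom_analyzer chain. It assumes the tokenizer splits store:1178 into store and 1178, which is consistent with the tokens reported above; the regex tokenizer is a simplification, not the real StandardTokenizerV2 rules, and the Analyze API remains the authoritative check.

```typescript
// Stopwords configured in entityID_stopWords.
const STOPWORDS = new Set(["store", "region", "zone", "field_org", ":"]);

// Rough stand-in for StandardTokenizerV2 (NOT the real Unicode word-break rules):
// split on any run of characters other than ASCII letters, digits, and underscore.
function tokenize(text: string): string[] {
  return text.split(/[^A-Za-z0-9_]+/).filter((t) => t.length > 0);
}

// EdgeNGramTokenFilterV2-style prefixes of length minGram..maxGram.
function edgeNGrams(token: string, minGram: number, maxGram: number): string[] {
  const grams: string[] = [];
  const longest = Math.min(maxGram, token.length);
  for (let n = minGram; n <= longest; n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

// custom_analyzer chain: tokenizer -> lowercase -> stopwords -> edge n-grams (1..6).
function customAnalyzer(text: string): string[] {
  const out: string[] = [];
  for (const raw of tokenize(text)) {
    const token = raw.toLowerCase();
    if (STOPWORDS.has(token)) continue;
    out.push(...edgeNGrams(token, 1, 6));
  }
  return out;
}
```

In this model, store:1, store:1178 and store:1622 all produce the token 1, so the query 1 matches all three on entity_id equally, and the ranking is decided by the other fields that also match.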
To learn more about query processing and custom analyzers in Azure Search, please read: How full text search works in Azure Search.
Do you want to keep the edgeNGram token filter in your analysis chain? Why?