How do I query CosmosDB ignoring accents?
Example:
{"id": "1", "name": "Émpresa 1" }
{"id": "2", "name": "empresa 2" }
SELECT *
FROM container_name
WHERE name LIKE 'empresa%'
I want to retrieve both records from this query, how can I do that?
If it's not possible "out of the box" then is there any workaround?
Cosmos DB will save data in the specified encoding of your text. Since 'É' and 'e' are not the same character, I believe you have two options:
1- Replace special characters with their root form (see the sketch below): https://stackoverflow.com/a/2086575/1384539
2- Use a search engine (e.g. Azure Cognitive Search), which will do the previous work for you.
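The linked answer does the folding in C#; here is a minimal sketch of the same idea in Python, assuming you control writes and can add a shadow field (the field name name_normalized is made up):

import unicodedata

def strip_accents(text):
    # Decompose accented characters (NFD), then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Fold the value into a shadow field at write time, then query that field:
doc = {"id": "1", "name": "Émpresa 1"}
doc["name_normalized"] = strip_accents(doc["name"]).lower()  # "empresa 1"
# SELECT * FROM c WHERE c.name_normalized LIKE "empresa%"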
What you need here is just '%mpresa%', since the first item starts with 'É' while the second one starts with 'e':
SELECT * FROM c WHERE c.name LIKE "%mpresa%"
Demo
[
{
"ItemId": "1",
"name": "Émpresa 1",
"id": "c599d43a-a88b-441f-b6df-361380f05eac",
},
{
"ItemId": "2",
"name": "empresa 2",
"id": "ed894a28-f439-4a23-88e2-80d29ff106e0",
}
]
If you want to perform free-text search, you may consider Azure Cognitive Search, which is supported for the SQL API; see here.
Related
I'm working on Hyperledger Fabric. I need a particular value from an array, not the full document, in CouchDB.
Example
{
"f_id": "1",
"History": [
{
"amount": "1",
"contactNo": "-",
"email": "i2#mail.com"
},
{
"amount": "5",
"contactNo": "-",
"email": "i#gmail.com",
}
],
"size": "12"
}
I want only the object in the History array whose email is "i2@mail.com", not the full History array.
Mango query:
{
"selector": {
"History": {
"$elemMatch": {
"email": "i2#mail.com"
}
}
}
}
Output:
{
"f_id": "1",
"History": [
{
"amount": "1",
"contactNo": "-",
"email": "i2#mail.com"
},
{
"amount": "5",
"contactNo": "-",
"email": "i#gmail.com",
}
],
"size": "12"
}
This returns the full History array, but I need only the first object of the History array.
Can anyone guide me?
Thanks.
I think it's not possible, because rich queries retrieve complete records (key-value pairs) that match the given selector.
You may want to reconsider your design. For example, if you want to hold a history and query it, this approach may work out:
GetState of your special key my_record.
If the key exists:
- Enrich the old value with additional attributes: {"DocType": "my_history", "time": "789546"}. With the help of these new attributes, it will be possible to create indexes and search via querying.
- PutState the enriched old value under a new key my_record_<uniqueId>.
- PutState the new value under the key my_record.
If the key doesn't exist, just put your value under the key my_record without any new attributes.
With this approach, the my_record key will always hold the latest value. You can query the history by any attribute, with or without pagination, using indexes (or not, depending on your performance concerns).
This approach also consumes less space: if you accumulate the history under a single key, the existing history is copied into the next version on every update, so each entry costs previous_size + delta instead of just delta.
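A minimal sketch of this pattern, written here in Python for readability (real Fabric chaincode would be Go or Node; the stub interface get_state/put_state/get_tx_id/get_tx_timestamp and the helper name put_with_history are assumptions):

import json

def put_with_history(stub, key, new_value):
    # Move the current value (if any) into a queryable history entry,
    # then store the new value under the plain key.
    existing = stub.get_state(key)
    if existing:
        old_value = json.loads(existing)
        # Enrich the old value so indexes/selectors can find history entries.
        old_value["DocType"] = "my_history"
        # Use the transaction timestamp, not wall-clock time, so the
        # chaincode stays deterministic across endorsing peers.
        old_value["time"] = str(stub.get_tx_timestamp())
        stub.put_state(key + "_" + stub.get_tx_id(), json.dumps(old_value))
    # The plain key always holds the latest value.
    stub.put_state(key, json.dumps(new_value))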
I would like to run a query that matches against two properties of the same item in a sub-collection.
Example:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "person.1#xpto.org" },
{ "type": "phone", "value": "555-12345" },
]
}
]
I would like to be able to search for emails that contain xpto.org, but
doing something like the following doesn't work:
search.ismatchscoring('email','contacts/type','full','all') and search.ismatchscoring('/.*xpto.org/','contacts/value','full','all')
Instead, it evaluates each condition in the context of the whole document, so objects like the following will also match:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "555-12345" },
{ "type": "phone", "value": "person.1#xpto.org" },
]
}
]
Is there any way around this without having an additional field that concatenates type and value?
Just saw the official docs. At the moment, there's no support for correlated search:
"This happens because each clause applies to all values of its field in the entire document, so there's no concept of a 'current sub-document'."
https://learn.microsoft.com/en-us/azure/search/search-howto-complex-data-types
and https://learn.microsoft.com/en-us/azure/search/search-query-understand-collection-filters
The solution I've implemented was to create a different collection per contact type.
This way I'm able to search directly in, let's say, the email collection without the need for correlated search. It might not be the solution for all cases, but it works well in this one.
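For illustration, a hypothetical reshaped document under that design (the field names emails and phones are assumptions):

person = {
    "name": "Person 1",
    "emails": ["person.1@xpto.org"],  # one collection per contact type
    "phones": ["555-12345"],
}
# A query over emails can no longer match a value that actually lives in
# phones, so no correlated search is needed.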
I created an Azure index for my DocumentDB collection, and it seems to be working fine. The index has properties for a user account like FirstName, LastName, and Username. The problem is the default tokenizer seems to be tokenizing the Username field. While I want token matches for the first two fields, I'd like character matching for the usernames. Is there an easy way to achieve this through the Azure portal? If not, how can I achieve this?
Adding another answer based on your comments above. Essentially, what you want is prefix, suffix, and wildcard search. If the username were user246392, you could find it by typing "use", "392", or even "er246". The prefix is easy, because you could search use* and it would find it.
Kendra Little did a really nice blog post on how to leverage RegEx with Azure Search, which can allow you to do the full wildcard part of your ask (i.e. search for "392").
If you wanted to do the suffix search, you can use a trick that is quite efficient: create a new field with a custom analyzer that indexes the term in reverse order. Here is an example of an index schema that would allow this (over the suffixName field):
{
"name":"people",
"fields": [
{ "name":"id", "type":"Edm.String", "key":true, "searchable":false },
{"name": "suffixName", "type": "Edm.String", "searchable":true, "indexAnalyzer":"suffixIndexingAnalyzer", "searchAnalyzer":"reverseText"}
],
"analyzers": [
{
"#odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "suffixIndexingAnalyzer",
"tokenizer": "keyword_v2",
"tokenFilters": [
"asciifolding",
"lowercase",
"reverse",
"my_edgeNGramForSuffix"
],
"charFilters": []
},
{
"#odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "reverseText",
"tokenizer": "classic",
"tokenFilters": [
"lowercase",
"reverse"
],
"charFilters": []
}
],
"tokenFilters":[
{
"#odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"name": "my_edgeNGramForSuffix",
"minGram": 2,
"maxGram": 25,
"side": "front"
}
]
}
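To see why this works, here is a small Python simulation of the two pipelines (asciifolding omitted for brevity): the indexed grams are prefixes of the reversed term, i.e. suffixes of the original, and the query side reverses the search text so a suffix like "392" lines up with one of those grams:

def suffix_index_tokens(term, min_gram=2, max_gram=25):
    # suffixIndexingAnalyzer: lowercase -> reverse -> front edge n-grams.
    reversed_term = term.lower()[::-1]
    top = min(max_gram, len(reversed_term))
    return [reversed_term[:n] for n in range(min_gram, top + 1)]

def suffix_query_token(query):
    # reverseText at query time: lowercase -> reverse.
    return query.lower()[::-1]

tokens = suffix_index_tokens("user246392")  # ["29", "293", "2936", ...]
print(suffix_query_token("392") in tokens)  # True: "392" is a suffix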
Can you give us an example of what you would want to do over this username field? I am not sure what you mean by character matching. Is it a RegEx-based character match? If so, perhaps a custom analyzer that enables RegEx search might help for this field. Please note that RegEx is not as performant as typical indexing, since we would need to scan the entire content rather than going to the inverted index to find token matches.
I have the following problem: I have a field mapping update for an index, and the payload is complex. I have:
{
"type": "abc",
"Party": [{
"Type": "abc",
"Id": "123",
"Name": "manasa",
"Phone": [{
"Type": "Office",
"Number": "12345"
}]
}]
}
Now I want to create a field for the index. The field name is phonenumber, of type Collection(Edm.String),
where the mapping is:
{
"sourceFieldName" : "/Party/Phone/Number",
"targetFieldName" : "phonenumber",
"mappingFunction" : { "name" : "jsonArrayToStringCollection" }
}
in the HTTP POST body.
But still, after indexing, I get the phonenumber result as null, which means the mapping went wrong. If you look at the phone number in the source JSON, it is inside a JSON array that is itself nested in an array, and the result needs to be stored in a collection of strings. Is this possible, and how can I achieve it?
If this is not possible, I at least want a field mapping down to the Phone array, i.e. /Party/Phone/.
If I index the complete Party array as text, I get an error while running the indexer saying:
"Field 'partydetails' contains a term that is too large to process. The max length for UTF-8 encoded terms is 32766 bytes. The most likely cause of this error is that filtering, sorting, and/or faceting are enabled on this field, which causes the entire field value to be indexed as a single term. Please avoid the use of these options for large fields."
Can someone please help!
If Party had been a JSON object rather than an array, and Phone had been just a string array, for example:
{
"type": "abc",
"Party": {
"Type": "abc",
"Id": "123",
"Name": "manasa",
"Phone": [{
"12345",
"23463"
}]
}
}
Then I could have mapped
{
"sourceFieldName" : "Party/Phonenumber",
"targetFieldName" : "phonenumbers",
"mappingFunction" : { "name" : "jsonArrayToStringCollection" }
}
It would map as a collection of type Edm.String.
So, to put this in a better and more straightforward way, either:
1- Transform your JSON into something flatter (the example I gave above), or
2- Index into the specific element if you know the position beforehand, as @Luis Cabrera said:
"sourceFieldName": "/Party/0/Phone/0/Type"
It is a limitation on the Azure Search side.
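If you need more than the first element, a hypothetical workaround is to generate one explicit mapping per array slot up to a fixed bound; the target field names below are made up, and each one must also be defined in the index schema:

# Sketch: field mappings cannot fan out over nested arrays, so emit one
# mapping per (party, phone) slot up to an assumed maximum.
MAX_PARTIES = 3
MAX_PHONES = 3
field_mappings = [
    {
        "sourceFieldName": f"/Party/{p}/Phone/{n}/Number",
        "targetFieldName": f"party{p}Phone{n}Number",
    }
    for p in range(MAX_PARTIES)
    for n in range(MAX_PHONES)
]
print(field_mappings[0])  # {'sourceFieldName': '/Party/0/Phone/0/Number', ...}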
Note that Party and Phone are arrays, so the field mapping you mention won't work.
You will need to index into the specific element. For example:
{
"sourceFieldName": "/Party/0/Phone/0/Type",
"targetFieldName": "firstPhoneNumberTypeOfFirstParty"
}
You may want to give that a shot.
Thanks!
Luis Cabrera | Program Manager | Azure Search
I'm using Elasticsearch for classic queries like "search all documents with G4 in name and LG in manufacturer". This is OK. But what if I have a lot of documents and a database with a lot of search terms, and I need to know which documents match some specific multi-column terms? For example:
Documents:
[
{
"id": 5787,
"name": "Smartphone G4",
"manufacturer": "LG",
"description": "The revolutionary LG G4 design can only be described as forward thinking—with a classic touch."
},
{
"id": 68779,
"name": "Smartphone S6",
"manufacturer": "Samsung",
"description": "The Samsung Galaxy S6 is powerful to use and beautiful to behold."
}
]
...
Terms:
[
{
"id": "587",
"name": "G4",
"manufacturer": "LG",
"description": "classic touch"
},
{
"id": "364",
"manufacturer": "Samsung",
"description": "galaxy s6"
}
]
...
Result:
{
"587": [5787],
"364": [68779]
}
OR:
{
"5787": [587],
"68779": [364]
}
I need a list of documents and the list of terms that correspond to them (or the opposite). With a small number of terms, it would be possible to apply all the rules one by one and save the matching documents. But I have millions of documents and thousands of terms, so applying them one by one is not possible. Is there another way?
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html is exactly what I wanted. It can store your queries and execute them against documents.
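A minimal sketch of the percolator flow in Python (the index name terms is made up, and a local Elasticsearch 7.x instance is assumed):

import requests

ES = "http://localhost:9200"  # assumed local Elasticsearch

# 1. Create an index whose "query" field stores queries instead of values.
requests.put(ES + "/terms", json={
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "name": {"type": "text"},
            "manufacturer": {"type": "text"},
            "description": {"type": "text"},
        }
    }
})

# 2. Register each term set as a stored query.
requests.put(ES + "/terms/_doc/587?refresh=true", json={
    "query": {
        "bool": {
            "must": [
                {"match": {"name": "G4"}},
                {"match": {"manufacturer": "LG"}},
                {"match_phrase": {"description": "classic touch"}},
            ]
        }
    }
})

# 3. Percolate a document: the hits are the stored queries it matches.
resp = requests.post(ES + "/terms/_search", json={
    "query": {
        "percolate": {
            "field": "query",
            "document": {
                "name": "Smartphone G4",
                "manufacturer": "LG",
                "description": "The revolutionary LG G4 design...",
            }
        }
    }
})
print([hit["_id"] for hit in resp.json()["hits"]["hits"]])  # ["587"]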