Cosmos DB SQL query: how to count sub-properties? - azure

I have JSON documents like this in a Cosmos DB database:
{
  "Version": 0,
  "Entity": {
    "ID": "xxxxxxx",
    "EventHistory": {
      "2020-04-28T16:30:35.6887561Z": "NEW",
      "2020-04-28T16:35:21.1811993Z": "PROCESSED"
    },
    "SourceSystem": "xxxx",
    "SourceSystemIdentifier": "xxxx",
    "PCC": "xxx",
    "StorageReference": "xxxxxxxxxxxx",
    "SupplementaryData": {
      "eTicketCount": "2"
    }
  }
}
The number of sub-properties within the EventHistory node is dynamic. In the example there are two, but there can be any number.
I couldn't find a way to count how many sub-properties the node contains. At a minimum, I need to query the documents whose EventHistory has only one property.
FYI: I'm not able to change the format of the documents. I know it would be more convenient to store them as an array.
I tried the ARRAY_LENGTH and COUNT functions, but since EventHistory is not an array, neither could be applied.
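If the query can't count an object's properties directly, the same filter can at least be applied on the client once the documents are fetched (or ported to a Cosmos DB user-defined function, which is written in JavaScript). The Python sketch below only illustrates that logic. It assumes the documents have already been deserialized into dicts, and the helper names `event_history_count` and `with_single_event` are hypothetical. Because EventHistory is a JSON object, it becomes a dict, and `len()` counts its timestamp keys.

```python
from typing import Any, Iterable


def event_history_count(doc: dict[str, Any]) -> int:
    """Number of properties in Entity.EventHistory (0 if it is missing)."""
    history = doc.get("Entity", {}).get("EventHistory", {})
    # A JSON object deserializes to a dict, so len() counts its timestamp keys.
    return len(history) if isinstance(history, dict) else 0


def with_single_event(docs: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only documents whose EventHistory has exactly one property."""
    return [d for d in docs if event_history_count(d) == 1]


docs = [
    {"Version": 0, "Entity": {"ID": "a", "EventHistory": {
        "2020-04-28T16:30:35.6887561Z": "NEW",
        "2020-04-28T16:35:21.1811993Z": "PROCESSED",
    }}},
    {"Version": 0, "Entity": {"ID": "b", "EventHistory": {
        "2020-04-28T16:30:35.6887561Z": "NEW",
    }}},
]
print([d["Entity"]["ID"] for d in with_single_event(docs)])  # prints ['b']
```

Filtering on the client means reading every candidate document, so it costs more RU/s than a server-side filter would; treat it as a fallback.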

Related

How can I obtain a document from a Cosmos DB using a field in an array as a filter?

I have a Cosmos DB with documents that look like the following:
{
  "name": {
    "productName": "someProductName"
  },
  "identifiers": [
    {
      "identifierCode": "1234",
      "identifierLabel": "someLabel1"
    },
    {
      "identifierCode": "432",
      "identifierLabel": "someLabel2"
    }
  ]
}
I would like to write a SQL query that returns the entire document, using "identifierLabel" as the filter.
I attempted to write a query based on an example I found in a blog post:
SELECT c,t AS identifiers
FROM c
JOIN t in c.identifiers
WHERE t.identifierLabel = "someLabel2"
However, the result contains the document with the matching array element appended after it:
{
  "name": {
    "productName": "someProductName"
  },
  "identifiers": [
    {
      "identifierCode": "1234",
      "identifierLabel": "someLabel1"
    },
    {
      "identifierCode": "432",
      "identifierLabel": "someLabel2"
    }
  ]
},
{
  "identifierCode": "432",
  "identifierLabel": "someLabel2"
}
How can I avoid this and get the result that I desire, i.e. the entire document with nothing appended to it?
Thanks in advance.
Using ARRAY_CONTAINS(), you should be able to do something like this to retrieve the entire document, without any need for a self-join:
SELECT *
FROM c
where ARRAY_CONTAINS(c.identifiers, {"identifierLabel":"someLabel2"}, true)
Note that ARRAY_CONTAINS() can search for either scalar values or objects. Passing true as the third parameter enables a partial match: an object in the array matches as long as it contains the specified properties, even if it has others. So the query above matches any element whose identifierLabel is "someLabel2", whatever its identifierCode, and returns the original document unchanged, avoiding the issue you ran into with the self-join.
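To make the partial-match behaviour concrete, here is a rough local model of ARRAY_CONTAINS in Python. It is not how Cosmos DB implements the function: `array_contains` is a hypothetical helper that only compares the top-level keys of the searched object, which is enough for the identifiers example.

```python
from typing import Any


def array_contains(arr: list[Any], value: Any, partial: bool = False) -> bool:
    """Rough local model of ARRAY_CONTAINS(arr, value, partial)."""
    for item in arr:
        if partial and isinstance(value, dict) and isinstance(item, dict):
            # Partial match: every key/value pair in `value` must appear in `item`;
            # extra keys in `item` (such as identifierCode) are ignored.
            if all(k in item and item[k] == v for k, v in value.items()):
                return True
        elif item == value:
            # Exact match: scalars, or objects when partial is False.
            return True
    return False


doc = {
    "name": {"productName": "someProductName"},
    "identifiers": [
        {"identifierCode": "1234", "identifierLabel": "someLabel1"},
        {"identifierCode": "432", "identifierLabel": "someLabel2"},
    ],
}
print(array_contains(doc["identifiers"], {"identifierLabel": "someLabel2"}, True))  # prints True
print(array_contains(doc["identifiers"], {"identifierLabel": "someLabel2"}))        # prints False
```

Without the third argument, the object must equal an array element exactly, which is why the label alone does not match.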

How to prevent entering duplicate data in Cosmos DB?

I have a container with id as the partition key. Based on a certain condition, I do not want duplicate data to be inserted into the container, but I am not sure how to do that in Cosmos DB. I tried to create unique keys, but that didn't help.
Condition:
A record is a duplicate if name + addresses[].city + addresses[].state + addresses[].zipCode are the same.
Document:
{
  "isActive": false,
  "id": "d94d7350-8a5c-4300-b4e4-d4528627ffbe",
  "name": "test name",
  "addresses": [
    {
      "address1": "718 Old Greenville Rd",
      "address2": "",
      "city": "Montrose",
      "state": "PA",
      "zipCode": "18801",
      "audit": {}
    }
  ]
}
Findings:
Per https://stackoverflow.com/a/61317715, unique keys cannot include arrays. Unfortunately, I cannot change the document structure, so the unique key approach is not an option.
Questions:
Do I need to change the partition key? I'm not sure whether I can use something like /id#name in Cosmos DB, as people do in DynamoDB.
Is there any other way of handling this at the DB level?
As a last resort, I can add the logic to my code, but that would be expensive in terms of RU/s.
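One common way to enforce a rule like this, assuming you are free to choose the `id` value when inserting (which may not hold if ids are assigned elsewhere), is to derive the id deterministically from the fields that define a duplicate. With /id as the partition key, a second create with the same derived id would be rejected with a conflict (HTTP 409) instead of inserting a duplicate. The sketch below only computes such a key: `dedupe_key` is a hypothetical helper, and sorting the addresses and hashing canonical JSON with SHA-256 are assumptions, not something from the question.

```python
import hashlib
import json
from typing import Any


def dedupe_key(doc: dict[str, Any]) -> str:
    """Hash of name + each address's (city, state, zipCode); other fields are ignored."""
    addresses = sorted(
        (a.get("city", ""), a.get("state", ""), a.get("zipCode", ""))
        for a in doc.get("addresses", [])
    )
    # Compact JSON of a sorted structure: the same content always yields the same
    # bytes, regardless of the order of addresses in the array.
    payload = json.dumps([doc.get("name", ""), addresses], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first = {
    "id": "d94d7350-8a5c-4300-b4e4-d4528627ffbe",
    "name": "test name",
    "addresses": [{"address1": "718 Old Greenville Rd", "city": "Montrose",
                   "state": "PA", "zipCode": "18801"}],
}
# Same name/city/state/zip but a different id and street line: a duplicate by the rule.
second = {**first, "id": "another-guid",
          "addresses": [{"address1": "Other street", "city": "Montrose",
                         "state": "PA", "zipCode": "18801"}]}
print(dedupe_key(first) == dedupe_key(second))  # prints True
```

Whether values should also be normalized (case, whitespace, ZIP+4 versus 5-digit codes) depends on what you consider a duplicate; the sketch compares them exactly.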

How to do field mapping in Azure Search for complex JSON objects, for example nested arrays

I have the following problem. I am adding a field mapping to an index, and the payload is complex:
{
  "type": "abc",
  "Party": [{
    "Type": "abc",
    "Id": "123",
    "Name": "manasa",
    "Phone": [{
      "Type": "Office",
      "Number": "12345"
    }]
  }]
}
Now I want to create an index field named phonenumber, of type Collection(Edm.String), with this mapping:
{
  "sourceFieldName": "/Party/Phone/Number",
  "targetFieldName": "phonenumber",
  "mappingFunction": { "name": "jsonArrayToStringCollection" }
}
I send this in the HTTP POST body.
But after indexing, I still get null for the phone number, which means the mapping went wrong. In the source JSON, Number is inside the Phone array, which is itself inside the Party array, and the result needs to be stored in a collection of strings. Is this possible, and if so, how can I achieve it?
If this is not possible, I at least want a field mapping down to the Phone array, i.e. /Party/Phone/.
If I index the complete Party array as text, I get this error when running the indexer:
"Field 'partydetails' contains a term that is too large to process. The max length for UTF-8 encoded terms is 32766 bytes. The most likely cause of this error is that filtering, sorting, and/or faceting are enabled on this field, which causes the entire field value to be indexed as a single term. Please avoid the use of these options for large fields."
Can someone please help!
If Party had been a JSON object rather than an array, and Phone had been a plain string array, for example:
{
  "type": "abc",
  "Party": {
    "Type": "abc",
    "Id": "123",
    "Name": "manasa",
    "Phone": ["12345", "23463"]
  }
}
then I could have mapped it like this:
{
  "sourceFieldName": "/Party/Phone",
  "targetFieldName": "phonenumbers",
  "mappingFunction": { "name": "jsonArrayToStringCollection" }
}
It would then map to a collection of type Edm.String, i.e. Collection(Edm.String).
To put it more directly: either transform your JSON into something flatter (like the example above), or, if you know the array positions in advance, index into a specific element, as Luis Cabrera said:
"sourceFieldName": "/Party/0/Phone/0/Type"
This is a limitation on the Azure Search side.
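As a sketch of the "flatter JSON" option, assuming you can preprocess each payload before it reaches the indexer, the Python function below pulls every Party[].Phone[].Number into a single top-level string array. The name `flatten_phone_numbers` is hypothetical; the output field `phonenumber` matches the index field from the question, so it could then be mapped from /phonenumber.

```python
from typing import Any


def flatten_phone_numbers(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of payload with a top-level 'phonenumber' string list."""
    numbers = [
        phone["Number"]
        for party in payload.get("Party", [])
        for phone in party.get("Phone", [])
        if "Number" in phone
    ]
    flat = dict(payload)  # shallow copy: the input dict itself is not modified
    flat["phonenumber"] = numbers
    return flat


payload = {
    "type": "abc",
    "Party": [{
        "Type": "abc", "Id": "123", "Name": "manasa",
        "Phone": [{"Type": "Office", "Number": "12345"}],
    }],
}
print(flatten_phone_numbers(payload)["phonenumber"])  # prints ['12345']
```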
Note that Party and Phone are arrays, so the field mapping you mention won't work.
You will need to index into the specific element. For example:
{
"sourceFieldName": "/Party/0/Phone/0/Type",
"targetFieldName": "firstPhoneNumberTypeOfFirstParty"
}
You may want to give that a shot.
Thanks!
Luis Cabrera | Program Manager | Azure Search
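To see why "/Party/Phone/Number" comes back null while "/Party/0/Phone/0/Type" works, it helps to treat the source field name as a JSON Pointer-style path (an assumption about how the indexer resolves it, not a description of Azure Search internals). In the minimal resolver below, every array step needs a numeric index, so a name like "Phone" applied to the Party array resolves to nothing. `resolve_pointer` is a hypothetical helper.

```python
from typing import Any


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Minimal JSON Pointer (RFC 6901) lookup; returns None if a step does not resolve."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = doc
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: ~1 -> '/', then ~0 -> '~'.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            # Arrays can only be stepped into with a numeric index.
            if not token.isdecimal() or int(token) >= len(node):
                return None
            node = node[int(token)]
        elif isinstance(node, dict) and token in node:
            node = node[token]
        else:
            return None
    return node


payload = {
    "type": "abc",
    "Party": [{
        "Type": "abc", "Id": "123", "Name": "manasa",
        "Phone": [{"Type": "Office", "Number": "12345"}],
    }],
}
print(resolve_pointer(payload, "/Party/Phone/Number"))    # prints None
print(resolve_pointer(payload, "/Party/0/Phone/0/Type"))  # prints Office
```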

Query with multiple values for the same attribute in DynamoDB (Node.js)

Is there any way to query a dynamodb table with multiple values for a single attribute?
{
  TableName: "sdfdsgfdg",
  IndexName: "username-category-index",
  KeyConditions: {
    "username": {
      "AttributeValueList": [{ "S": "aaaaaaa#gmail.com" }],
      "ComparisonOperator": "EQ"
    },
    // A second "username" key: in a JS object literal this silently
    // overwrites the first one, so only one username is ever sent.
    "username": {
      "AttributeValueList": [{ "S": "hhhhh#gmail.com" }],
      "ComparisonOperator": "EQ"
    },
    "category": {
      "AttributeValueList": [{ "S": "Coupon" }],
      "ComparisonOperator": "EQ"
    }
  }
}
The BatchGetItem API can be used to get multiple items from a DynamoDB table. However, it can't be used in your case, because you are reading from an index.
The BatchGetItem operation returns the attributes of one or more items from one or more tables. You identify requested items by primary key.
From an API perspective, there is no other solution. You may need to look at this from a data-modelling perspective and design the table/index to satisfy your query access pattern (QAP).
Also, note that querying the index multiple times, once per partition key value, shouldn't hurt performance as long as there are only a handful of values.
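Following the "query the index once per key" suggestion, here is a Python sketch that builds one low-level Query request per username, each still constrained by category. It assumes username is the index's partition key and category its sort key (as the index name suggests), and the helper `build_queries` is hypothetical. The parameter names (KeyConditionExpression, ExpressionAttributeNames, ExpressionAttributeValues) are those of the DynamoDB Query API, so the same request shape should carry over to the Node.js SDK's low-level client. Attribute names go through placeholders to stay clear of reserved words.

```python
from typing import Any


def build_queries(table: str, index: str, usernames: list[str],
                  category: str) -> list[dict[str, Any]]:
    """One low-level DynamoDB Query request per username, all for the same category."""
    return [
        {
            "TableName": table,
            "IndexName": index,
            # Placeholders (#u, #c) avoid clashes with DynamoDB reserved words.
            "KeyConditionExpression": "#u = :u AND #c = :c",
            "ExpressionAttributeNames": {"#u": "username", "#c": "category"},
            "ExpressionAttributeValues": {
                ":u": {"S": username},
                ":c": {"S": category},
            },
        }
        for username in usernames
    ]


query_requests = build_queries("sdfdsgfdg", "username-category-index",
                               ["aaaaaaa#gmail.com", "hhhhh#gmail.com"], "Coupon")
print(len(query_requests))  # prints 2
```

Each request would then be sent with something like boto3's `client.query(**params)` (or `dynamodb.query(params)` in Node.js), concatenating the returned Items. Large results are paginated via LastEvaluatedKey, which this sketch does not handle.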

Count duplicate values via Elasticsearch terms aggregation

I am trying to run an Elasticsearch terms aggregation on multiple fields of the documents in my index. Each document contains multiple fields with hashtags, which can be extracted using a custom hashtag analyzer. The goal is to find the most common hashtags in the system.
As stated in the Elasticsearch documentation, it is not possible to run a terms aggregation on multiple fields of a document, so I am trying to use a copy_to field. The problem is that if a document contains the same hashtag in multiple fields, the term should be counted multiple times. This is not the case with the default terms aggregation:
Given Mapping:
{
  "properties": {
    "field_one": {
      "type": "string",
      "copy_to": "hashtags"
    },
    "field_two": {
      "type": "string",
      "copy_to": "hashtags"
    }
  }
}
Given Document:
{
  "field_one": "Hello #World",
  "field_two": "One #World"
}
The aggregation will return a single bucket {"key": "#World", "doc_count": 1}. What I need is a single bucket {"key": "#World", "doc_count": 2}.
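A terms aggregation reports doc_count, the number of documents in each bucket, so a hashtag copied into the hashtags field from two source fields still counts once per document. The Python sketch below only shows the counting you are after, done client-side on the raw documents: each field contributes at most one count per hashtag. The regex is a stand-in for your custom hashtag analyzer, and `count_hashtags` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Any, Iterable

# Stand-in for the custom hashtag analyzer: '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")


def count_hashtags(docs: Iterable[dict[str, Any]], fields: list[str]) -> Counter:
    """Count each hashtag once per field it appears in, summed over all documents."""
    counts: Counter = Counter()
    for doc in docs:
        for field in fields:
            # set() collapses repeats within one field; separate fields still add up.
            counts.update(set(HASHTAG.findall(doc.get(field, ""))))
    return counts


doc = {"field_one": "Hello #World", "field_two": "One #World"}
print(count_hashtags([doc], ["field_one", "field_two"])["#World"])  # prints 2
```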
