How to prevent entering duplicate data in Cosmos DB? - node.js

I have a container with id as the partition key. I want to prevent duplicate data from being entered into the container based on a condition, but I am not sure how to do that in Cosmos DB. I tried creating unique keys, but that didn't help.
Condition:
A record is a duplicate if name + addresses[].city + addresses[].state + addresses[].zipCode are the same.
Document:
{
"isActive": false,
"id": "d94d7350-8a5c-4300-b4e4-d4528627ffbe",
"name": "test name",
"addresses": [
{
"address1": "718 Old Greenville Rd",
"address2": "",
"city": "Montrose",
"state": "PA",
"zipCode": "18801",
"audit": {}
}
]
}
Findings:
Per https://stackoverflow.com/a/61317715, unique keys cannot include arrays. Unfortunately, I cannot change the document structure, so the unique key approach is not an option.
Questions:
Do I need to change the partition key? I am not sure whether Cosmos DB supports something like /id#name, as DynamoDB does.
Is there any other way of handling this at the DB level?
As a last resort, I can add the logic in my code to do this but that would be expensive in terms of RU/s.
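If the check does end up in application code, one way to make the comparison explicit is to derive a single key from the fields named in the condition. The sketch below is only an illustration: the helper name duplicateKey is hypothetical, and treating values as case-insensitive and ignoring address order are assumptions the question does not state.

```javascript
// Hypothetical helper: builds one canonical string from name + each address's
// city/state/zipCode, so two documents can be compared with a plain equality check.
function duplicateKey(doc) {
  // Assumption: comparison ignores case and surrounding whitespace.
  const norm = (value) => String(value ?? "").trim().toLowerCase();
  const addresses = (doc.addresses ?? [])
    .map((a) => [a.city, a.state, a.zipCode].map(norm).join("|"))
    .sort(); // Assumption: the order of addresses does not matter.
  return JSON.stringify([norm(doc.name), addresses]);
}

const existing = {
  id: "d94d7350-8a5c-4300-b4e4-d4528627ffbe",
  name: "test name",
  addresses: [{ address1: "718 Old Greenville Rd", city: "Montrose", state: "PA", zipCode: "18801" }],
};
const incoming = {
  id: "another-id",
  name: "Test Name",
  addresses: [{ address1: "", city: "Montrose", state: "PA", zipCode: "18801" }],
};

// Fields outside the condition (id, address1) do not affect the key.
console.log(duplicateKey(existing) === duplicateKey(incoming));
```

Comparing keys this way still requires looking up existing documents first, which is where the RU cost mentioned above comes from.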

Related

Cosmos DB SQL Query how to count sub properties?

I have this kind of JSON document in a Cosmos DB database.
{
"Version": 0,
"Entity": {
"ID": "xxxxxxx",
"EventHistory": {
"2020-04-28T16:30:35.6887561Z": "NEW",
"2020-04-28T16:35:21.1811993Z": "PROCESSED"
},
"SourceSystem": "xxxx",
"SourceSystemIdentifier": "xxxx",
"PCC": "xxx",
"StorageReference": "xxxxxxxxxxxx",
"SupplementaryData": {
"eTicketCount": "2"
}
}
}
The number of sub-properties within the EventHistory node is dynamic. In the example there are two but it can be any number.
I couldn't find a way to count how many sub-properties the node contains. At a minimum, I need to query the documents that have only one property declared.
FYI: I'm not able to change the format of the documents. I know that it would be more convenient to store them as an array.
I tried to use the ARRAY_LENGTH and COUNT functions, but since EventHistory is not an array, neither could be applied.
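Since EventHistory is an object rather than an array, counting its keys is straightforward in JavaScript. The sketch below shows that counting logic on its own; the function name countProperties is hypothetical. Cosmos DB's SQL API supports JavaScript user-defined functions, so a function like this could in principle be registered as one and called with the udf. prefix, though filtering through a UDF may cost more RUs than an indexed filter.

```javascript
// Counts the own enumerable properties of a plain object.
// Returns 0 for anything that is not a non-array object.
function countProperties(obj) {
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) {
    return 0;
  }
  return Object.keys(obj).length;
}

const sample = {
  Version: 0,
  Entity: {
    ID: "xxxxxxx",
    EventHistory: {
      "2020-04-28T16:30:35.6887561Z": "NEW",
      "2020-04-28T16:35:21.1811993Z": "PROCESSED",
    },
  },
};

// Client-side equivalent of "documents with exactly one EventHistory entry".
const docs = [sample, { Entity: { EventHistory: { "2020-04-28T16:30:35.6887561Z": "NEW" } } }];
const singleEvent = docs.filter((d) => countProperties(d.Entity.EventHistory) === 1);
console.log(countProperties(sample.Entity.EventHistory), singleEvent.length);
```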

Azure CosmosDb partition key - different schema

I have an Azure Cosmos DB SQL API account with one container, "EmployeeContainer", with the partition key "personId". I have three different types of collections in this container. Their schemas are shown below:
Person Collection:
{
"Id": "1234569",
"personId" : "P1241234",
"FirstName": "The first name",
"LastName": "The last name"
}
Person-Department Collection:
{
"Id": "923456757",
"personId" : "P1241234",
"departmentId": "dept01",
"unitId": "unit01",
"dateOfJoining": "joining date"
}
Department-Employees Collection:
{
"id": "678678",
"departmentId" : "dept01",
"departmentName": "IT",
"employees" : [
{ "personId": "P1241234" },
{ "personId": "P1241235" },
{ "personId": "P1241236" }
]
}
How will the data be stored in the logical partitions? personId is the partition key, and the Person and Person-Department documents both contain personId. So, will the document in the Person collection with person id "P1241234" and the document in the Person-Department collection with person id "P1241234" be stored in the same logical partition? And how will the data in Department-Employees be stored?
This design is not optimal. You should combine Person and Person-Department into a single collection using personId as the partition key, then have a second container for departments that has departmentId as its partition key, with each person as another row along with any other properties you need when querying that collection. Do not write code that queries both containers. Each should have all the data it needs to satisfy any request you make. Then use the change feed to keep them in sync.
You can get more details on how to model this in the article here
Yes, that is true: documents with the same personId will be stored in the same logical partition (regardless of their type/schema). I'm not sure you can create documents without the partition key in a collection that has a partition key, but if you can, all of them should end up in the same logical partition (though I don't think you can create them).
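The grouping rule described here can be modelled in a few lines. This is only a sketch of the idea (documents are bucketed by their partition key value, whatever their schema), not of how Cosmos DB stores data; the function name and the placeholder label for documents lacking personId are made up for illustration.

```javascript
// Groups documents by the value of the partition key property.
// Documents without that property all fall into one placeholder bucket.
function groupByPartitionKey(docs, keyName) {
  const partitions = new Map();
  for (const doc of docs) {
    const key = keyName in doc ? doc[keyName] : "(no partition key value)";
    if (!partitions.has(key)) {
      partitions.set(key, []);
    }
    partitions.get(key).push(doc);
  }
  return partitions;
}

const documents = [
  { Id: "1234569", personId: "P1241234", FirstName: "The first name" },
  { Id: "923456757", personId: "P1241234", departmentId: "dept01", unitId: "unit01" },
  { id: "678678", departmentId: "dept01", departmentName: "IT", employees: [{ personId: "P1241234" }] },
];

const partitions = groupByPartitionKey(documents, "personId");
for (const [key, group] of partitions) {
  console.log(key, group.length);
}
```

The Person and Person-Department documents share a bucket because they share a top-level personId; the Department-Employees document only has personId nested inside its employees array, so it does not join them.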

Apply a filter on array field of couchDB

I'm working on Hyperledger Fabric. I need a particular value from an array, not the full document, in CouchDB.
Example
{
"f_id": "1",
"History": [
{
"amount": "1",
"contactNo": "-",
"email": "i2#mail.com"
},
{
"amount": "5",
"contactNo": "-",
"email": "i#gmail.com"
}
],
"size": "12"
}
I want only the object with email "i2#mail.com" from the History array, not the full History array.
Mango query:
{
"selector": {
"History": {
"$elemMatch": {
"email": "i2#mail.com"
}
}
}
}
Output:
{
"f_id": "1",
"History": [
{
"amount": "1",
"contactNo": "-",
"email": "i2#mail.com"
},
{
"amount": "5",
"contactNo": "-",
"email": "i#gmail.com"
}
],
"size": "12"
}
This returns the full History array, but I need only the first object of the History array.
Can anyone guide me?
Thanks.
I think it's not possible, because rich queries retrieve complete records (key-value pairs) that match the given selector.
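If the query can only return whole records, the matching element can still be picked out in code after the document comes back. A minimal sketch, assuming the document has already been fetched and parsed; the function name matchingHistory is hypothetical.

```javascript
// Returns only the History entries whose email matches, leaving the rest behind.
function matchingHistory(doc, email) {
  return (doc.History ?? []).filter((entry) => entry.email === email);
}

const record = {
  f_id: "1",
  History: [
    { amount: "1", contactNo: "-", email: "i2#mail.com" },
    { amount: "5", contactNo: "-", email: "i#gmail.com" },
  ],
  size: "12",
};

console.log(matchingHistory(record, "i2#mail.com"));
```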
You may want to reconsider your design. For example, if you want to keep a history and query it, this approach may work:
GetState of your special key my_record.
If the key exists:
PutState the new value with key my_record.
Enrich the old value with additional attributes: {"DocType": "my_history", "time": "789546"}. With the help of these new attributes, it will be possible to create indexes and search via querying.
PutState the enriched old value with a new key my_record_<uniqueId>.
If the key doesn't exist, just put your value with key my_record without any new attributes.
With this approach, the my_record key will always hold the latest value. You can query the history by any attribute, with or without pagination, using indexes (or not, depending on your performance concerns).
This approach also consumes less space. If you accumulate the history on a single key, the existing history is copied to the next version every time, which means every entry consumes previous_size + delta instead of just delta.
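The steps above can be sketched with an in-memory Map standing in for the ledger state. This is an illustration only: state.get and state.set are stand-ins for GetState and PutState, and the function name, uniqueId, and time parameters are hypothetical.

```javascript
// Writes newValue under `key`, first archiving any previous value under
// `${key}_${uniqueId}` with the extra history attributes.
function putWithHistory(state, key, newValue, uniqueId, time) {
  const oldValue = state.get(key); // stand-in for GetState
  if (oldValue !== undefined) {
    // Enrich the old value so it can be indexed and queried as history.
    const archived = { ...oldValue, DocType: "my_history", time };
    state.set(`${key}_${uniqueId}`, archived); // stand-in for PutState
  }
  state.set(key, newValue); // the main key always holds the latest value
}

const state = new Map();
putWithHistory(state, "my_record", { amount: "1", email: "i2#mail.com" }, "0001", "789546");
putWithHistory(state, "my_record", { amount: "5", email: "i#gmail.com" }, "0002", "789547");

console.log(state.get("my_record"));
console.log(state.get("my_record_0002"));
```

Each archived entry stores only the value it replaced, which is the "just delta" storage behaviour described above.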

query with multiple values for same attribute in dynamodb nodejs

Is there any way to query a DynamoDB table with multiple values for a single attribute?
TableName: "sdfdsgfdg",
IndexName: "username-category-index",
KeyConditions: {
"username": {
"AttributeValueList": { "S": "aaaaaaa#gmail.com" },
"ComparisonOperator": "EQ"
},
"username": {
"AttributeValueList": { "S": "hhhhh#gmail.com" },
"ComparisonOperator": "EQ"
},
"category": {
"AttributeValueList": { "S": "Coupon" },
"ComparisonOperator": "EQ"
}
}
The BatchGetItem API can be used to get multiple items from a DynamoDB table. However, it can't be used in your case because you are getting the data from an index.
The BatchGetItem operation returns the attributes of one or more items
from one or more tables. You identify requested items by primary key.
From an API perspective, there is no other solution. You may need to look at this from a data modelling perspective and design the table/index to satisfy your Query Access Pattern (QAP).
Also, please note that querying the index once per partition key value wouldn't hurt performance much, as long as there are only a handful of values.
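Querying the index once per partition key value and merging the results could look like the sketch below. To keep it runnable without AWS credentials, the actual DynamoDB call is passed in as queryFn, a hypothetical wrapper that runs one Query against the index for a given username and category and resolves to an array of items.

```javascript
// Runs one query per username in parallel and concatenates the results.
// `queryFn` is a hypothetical function: ({ username, category }) => Promise<item[]>.
async function queryForUsernames(queryFn, usernames, category) {
  const resultsPerUser = await Promise.all(
    usernames.map((username) => queryFn({ username, category }))
  );
  return resultsPerUser.flat();
}

// Fake data and query function standing in for the real index.
const fakeIndex = [
  { username: "aaaaaaa#gmail.com", category: "Coupon", code: "A1" },
  { username: "hhhhh#gmail.com", category: "Coupon", code: "H1" },
  { username: "hhhhh#gmail.com", category: "Other", code: "H2" },
];
const fakeQuery = async ({ username, category }) =>
  fakeIndex.filter((item) => item.username === username && item.category === category);

const merged = queryForUsernames(fakeQuery, ["aaaaaaa#gmail.com", "hhhhh#gmail.com"], "Coupon");
merged.then((items) => console.log(items.map((item) => item.code)));
```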

How to delete specific object under document in DocumentDB?

I am using DocumentDB as the backend for my project.
I have created a collection named ResellerCollection.
Under it, I added a Reseller document with an Id assigned to it.
Under the Reseller document, I added a list of Customers, and now I want to delete a customer from the Reseller document by a specific Id.
The JSON generated in DocumentDB is as follows.
{
"id": "73386791-5895-4a56-9108-df4a773331fe",
"Name": "Nadeem",
"PrimaryContact": "1234",
"Address": "bhusari clny",
"City": "pune",
"State": "maharashtra",
"Country": "india",
"ZipCode": "222",
"Telephone": "45234343",
"Email": "abc#xyz.com",
"Website": "asdfsd.com",
"Customer": [
{
"id": "4acf3ca9-f9e4-4117-a471-7ce8f905baec",
"FullName": "Test Cust1",
"Company": "safds",
"JobTitle": "sadf",
"Email": "abcd#xyz.com",
"Address": "asdfsaf",
"City": "sdf",
"State": "sdf",
"Country": "sadf",
"ZipCode": "2343",
"Telephone": "45234343",
"MerchantID": "232",
"IdentificationNo": "2342343",
"IsActive": true,
"CustomerGroupID": "34",
"ResellerID": "73386791-5895-4a56-9108-df4a773331fe"
},
{
"id": "e0d6d099-3d5d-4776-9b84-14b7ae0b9911",
"FullName": "Test Cust2",
"Company": "safds",
"JobTitle": "sadf",
"Email": "abcd#xyz.com",
"Address": "asdfsaf",
"City": "sdf",
"State": "sdf",
"Country": "sadf",
"ZipCode": "2343",
"Telephone": "sadf",
"MerchantID": "232",
"IdentificationNo": "2342343",
"IsActive": true,
"CustomerGroupID": "34",
"ResellerID": "73386791-5895-4a56-9108-df4a773331fe"
}
],
"UserId": "f807f027-2e21-45b1-b786-e4d2b3d677cb",
"_rid": "+JBQAOQWHQENAAAAAAAAAA==",
"_self": "dbs/+JBQAA==/colls/+JBQAOQWHQE=/docs/+JBQAOQWHQENAAAAAAAAAA==/",
"_etag": "\"0a004764-0000-0000-0000-583bd8b50000\"",
"_attachments": "attachments/",
"_ts": 1480317104
}
Please suggest how to write a delete function for a customer in MVC. Should I write a delete that removes the specific customer, or do I have to update the whole customer list?
There are two ways of doing this.
Implement "AddCustomer" and "RemoveCustomer" stored procedures in DocumentDB. These stored procedures read the reseller document, then append or remove the customer passed in as an argument. Then just call ExecuteStoredProcedureAsync within your controller.
Instead of the stored procedure approach, perform three steps within the controller: read the document, make changes, and replace the document.
In both implementations, you'll probably want to use the ETag to perform a conditional write to avoid any clobbering across multiple clients/writers.
As Larry and David pointed out, you should also consider different JSON modeling options, such as storing the customer data as separate documents, or storing only the IDs within the array instead of all relevant customer data.
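The read, modify, and replace sequence with an ETag check can be sketched as follows. The in-memory store is hypothetical and exists only so the example runs on its own; with a real client, the ETag read alongside the document would be sent as a precondition on the replace call.

```javascript
// Pure step: returns a copy of the reseller document without the given customer.
function removeCustomer(reseller, customerId) {
  const customers = reseller.Customer ?? [];
  return { ...reseller, Customer: customers.filter((c) => c.id !== customerId) };
}

// Hypothetical in-memory store that rejects a replace if the ETag is stale.
class InMemoryStore {
  constructor() {
    this.docs = new Map();
    this.version = 0;
  }
  put(doc) {
    this.docs.set(doc.id, { doc, etag: String(++this.version) });
  }
  read(id) {
    return this.docs.get(id);
  }
  replace(doc, expectedEtag) {
    const current = this.docs.get(doc.id);
    if (!current || current.etag !== expectedEtag) {
      throw new Error("precondition failed: document changed since it was read");
    }
    this.put(doc);
  }
}

// Read the document, drop the customer, and write back only if nobody else changed it.
function deleteCustomer(store, resellerId, customerId) {
  const { doc, etag } = store.read(resellerId);
  store.replace(removeCustomer(doc, customerId), etag);
}

const store = new InMemoryStore();
store.put({ id: "reseller-1", Name: "Nadeem", Customer: [{ id: "c1" }, { id: "c2" }] });
deleteCustomer(store, "reseller-1", "c1");
console.log(store.read("reseller-1").doc.Customer);
```

If another writer updates the reseller between the read and the replace, the replace throws, and the caller can re-read and retry instead of overwriting the other change.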
The only way to delete an element of an array (or any other change to the document) is to fetch the entire document, modify it, then create/upsert the entire document. You can do this client-side or in a stored procedure.
Try not to think of collections as tables in traditional databases or collections in MongoDB. I used to say, think of them as "partitions" but with partitioned collections, even that distinction is not useful. I use a single partitioned collection for everything now. I distinguish document types by having an element in each document: isReseller = true or isCustomer = true for your example. Alternatively, you can do type = 'Reseller' or type = 'Customer'. However, if the customer of one reseller is ever a reseller themselves, the former will allow you to add both is___ = true and the latter will not.
What David Makogon says is definitely a worry. Unless Resellers are restricted to a small number of Customers, you are better off storing them separately and having a foreign key link them. That way, deleting or adding one is a single step. However, getting the Reseller and all its Customers will be two round trips.
