Azure CosmosDb partition key - different schema - azure

I have an Azure Cosmos DB SQL API account with one container, "EmployeeContainer", with the partition key "personId". I store three different types of documents in this container. Their schemas are shown below:
Person Collection:
{
    "Id": "1234569",
    "personId": "P1241234",
    "FirstName": "The first name",
    "LastName": "The last name"
}
Person-Department Collection:
{
    "Id": "923456757",
    "personId": "P1241234",
    "departmentId": "dept01",
    "unitId": "unit01",
    "dateOfJoining": "joining date"
}
Department-Employees Collection:
{
    "id": "678678",
    "departmentId": "dept01",
    "departmentName": "IT",
    "employees": [
        { "personId": "P1241234" },
        { "personId": "P1241235" },
        { "personId": "P1241236" }
    ]
}
How will the data be stored in the logical partitions? personId is the partition key and all the collections have personId in them. So, will the document in the Person collection with the personId "P1241234" and the document in the Person-Department collection with personId "P1241234" be stored in the same logical partition? And how will the data in Department-Employees be stored?

This design is not optimal. You should combine Person and Person-Department into a single container using personId as the partition key, then have a second container for departments that has departmentId as its partition key, with each person as another row along with any other properties you need when querying that container. Do not write code that queries both containers; each should have all the data it needs to satisfy any request you make. Then use the change feed to keep them in sync.
You can get more details on how to model this in the article here.
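As a rough illustration of that model (this is only a sketch, not part of the answer above: the database and container names are hypothetical, and the change feed consumer that keeps the departments container in sync is not shown), using the @azure/cosmos Node SDK:

const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient({ endpoint: process.env.COSMOS_ENDPOINT, key: process.env.COSMOS_KEY });
const db = client.database("hr");                 // hypothetical database
const people = db.container("people");            // partition key: /personId
const departments = db.container("departments");  // partition key: /departmentId

async function writeModel() {
  // The person profile and the person-department membership share a partition
  // key, so they are co-located and can be read together cheaply.
  await people.items.upsert({
    id: "1234569", personId: "P1241234", type: "person",
    firstName: "The first name", lastName: "The last name",
  });
  await people.items.upsert({
    id: "923456757", personId: "P1241234", type: "personDepartment",
    departmentId: "dept01", unitId: "unit01", dateOfJoining: "joining date",
  });

  // The departments container carries one row per member plus whatever
  // department properties the department-side queries need; a change feed
  // consumer (e.g. an Azure Function) would keep it in sync with "people".
  await departments.items.upsert({
    id: "dept01-P1241234", departmentId: "dept01",
    departmentName: "IT", personId: "P1241234",
  });
}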

Yes, that is true: documents with the same personId will be stored in the same logical partition, regardless of their type/schema. I'm not sure you can create documents without the partition key in a container that has a partition key defined, but if you can, all of them would end up in the same logical partition (I don't think you can create them, though).
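For illustration only (a sketch assuming the @azure/cosmos Node SDK and a hypothetical database name), a single-partition query against the existing EmployeeContainer returns every document type that shares a personId:

const { CosmosClient } = require("@azure/cosmos");

const container = new CosmosClient({ endpoint: process.env.COSMOS_ENDPOINT, key: process.env.COSMOS_KEY })
  .database("mydb")                  // hypothetical database name
  .container("EmployeeContainer");

async function allDocsForPerson(personId) {
  // Scoped to one logical partition: both the Person and the Person-Department
  // documents for this personId come back from a single cheap query.
  const { resources } = await container.items
    .query("SELECT * FROM c", { partitionKey: personId })
    .fetchAll();
  return resources;
}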

Related

How to prevent entering duplicate data in Cosmos DB?

I have a container with id as the partition key. Based on some condition, I do not want to enter duplicate data into my container, but I am not sure how to do that in Cosmos; e.g., I tried creating unique keys, but that didn't help me.
Condition:
A record is a duplicate if name + addresses[].city + addresses[].state + addresses[].zipCode are the same.
Document:
{
    "isActive": false,
    "id": "d94d7350-8a5c-4300-b4e4-d4528627ffbe",
    "name": "test name",
    "addresses": [
        {
            "address1": "718 Old Greenville Rd",
            "address2": "",
            "city": "Montrose",
            "state": "PA",
            "zipCode": "18801",
            "audit": {}
        }
    ]
}
Findings:
Per https://stackoverflow.com/a/61317715, unique keys cannot include arrays. Unfortunately, I cannot change the document structure, so the unique key approach is not an option.
Questions:
Do I need to change the partition key? I'm not sure if I can have something like /id#name as a composite key in Cosmos, the way I can in DynamoDB.
Is there any other way of handling this at the DB level?
As a last resort, I can add the logic in my code to do this, but that would be expensive in terms of RU/s.
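One workaround worth mentioning (a sketch only, not from the original thread): since id is already the partition key, the id value itself can be derived deterministically from the fields that define uniqueness, so a second create of the same logical record fails with a 409 conflict and no extra read is needed. This changes the id value, not the document structure. The database and container names below are hypothetical, and the @azure/cosmos SDK is assumed:

const crypto = require("crypto");
const { CosmosClient } = require("@azure/cosmos");

const container = new CosmosClient({ endpoint: process.env.COSMOS_ENDPOINT, key: process.env.COSMOS_KEY })
  .database("mydb")        // hypothetical
  .container("records");   // hypothetical, partition key /id

// Derive a stable id from name + addresses[].city/state/zipCode.
function dedupeId(doc) {
  const parts = [doc.name, ...doc.addresses.map(a => `${a.city}|${a.state}|${a.zipCode}`)];
  return crypto.createHash("sha256").update(parts.join("|")).digest("hex");
}

async function insertIfNew(doc) {
  doc.id = dedupeId(doc);
  try {
    await container.items.create(doc);   // single write, no read-before-write
    return true;
  } catch (err) {
    if (err.code === 409) return false;  // same id already exists => duplicate
    throw err;
  }
}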

Cloudant Sorting on a nullable field

I want to sort on a field, let's say name, which is indexed in Cloudant DB. Using the index without a sort, I get all the documents, both those that have the name field and those that don't. But when I try to sort on the name field, I no longer get the documents that don't have the name field.
Is there any way to do this using query indexes? I want all the documents in sorted order, including the ones that don't have the name field.
For example, below are some documents:
{
    "_id": 1234,
    "classId": "abc",
    "name": "Happa"
}
{
    "_id": 12345,
    "classId": "abc",
    "name": "Prasanth"
}
{
    "_id": 123456,
    "classId": "abc"
}
Below is the query I am trying to execute:
{
    "selector": {
        "classId": "abc",
        "name": {
            "or": [
                { "$exists": true }, { "$exists": false }
            ]
        }
    },
    "sort": [{ "classId": "asc" }, { "name": "asc" }],
    "use_index": "idx-classId_name"
}
I am expecting all the documents to be returned in sorted order, including the document which doesn't have the name field.
Your query makes no sense to me as it stands. You're requesting a listing of documents which either have, or don't have a specific field (meaning every document), and expecting to sort those on this field that may or may not exist. Such an order isn't defined out of the box.
I'd remove the name clause from the selector, sorting only on the classId field, which appears in every document, and then do the secondary partial ordering on the client side, so you can decide how you intend to mix the documents without the name field in with those that have it.
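A minimal sketch of that approach, assuming the nano Node.js client and a hypothetical index idx-classId that covers classId alone:

const nano = require("nano")(process.env.COUCH_URL);  // assumed connection URL
const db = nano.db.use("mydb");                        // hypothetical database name

async function classMembersSorted(classId) {
  const { docs } = await db.find({
    selector: { classId: classId },
    sort: [{ classId: "asc" }],
    use_index: "idx-classId",            // hypothetical index on classId only
  });
  // Secondary, partial ordering on the client: here docs without "name" go last.
  return docs.sort((a, b) => {
    if (a.name == null && b.name == null) return 0;
    if (a.name == null) return 1;
    if (b.name == null) return -1;
    return a.name.localeCompare(b.name);
  });
}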
Another solution is to use a view instead of a Cloudant Query index. I've not tested this, but hopefully the intent is clear:
function (doc) {
    if (doc && doc.classId) {
        var name = doc.name || "[notfound]";
        emit(doc.classId + "-" + name, 1);
    }
}
which will key the docs on "classId-name" and, for docs with no name, use a specified sentinel value.
Querying the view should return the documents lexicographically ordered on this compound key (which you can reverse with a query parameter if you wish).
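For example (an untested sketch, in the spirit of the answer above), if the map function is saved in a design document _design/sorting under the view name by_class_and_name (both names hypothetical), it could be read with the nano client like this:

const nano = require("nano")(process.env.COUCH_URL);
const db = nano.db.use("mydb");                        // hypothetical database name

async function classMembersFromView(classId) {
  const result = await db.view("sorting", "by_class_and_name", {
    startkey: classId + "-",           // keys have the form "classId-name"
    endkey: classId + "-\ufff0",       // high sentinel to close the key range
    include_docs: true,
  });
  return result.rows.map(r => r.doc);  // already ordered lexicographically by key
}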

How to archive old CosmosDB data to Azure Table using Azure Data Factory when CosmosDB collection documents have different properties?

I'm trying to archive old data from CosmosDB into Azure Tables but I'm very new to Azure Data Factory and I'm not sure what would be a good approach to do this. At first, I thought that this could be done with a Copy Activity but because the properties from my documents stored in the CosmosDB source vary, I'm getting mapping issues. Any idea on what would be a good approach to tackle this archiving process?
Basically, the way I want to store the data is to copy the document root properties as they are, and store the nested JSON as a serialized string.
For example, if I wanted to archive these 2 documents :
[
    {
        "identifier": "1st Guid here",
        "Contact": {
            "Name": "John Doe",
            "Age": 99
        }
    },
    {
        "identifier": "2nd Guid here",
        "Distributor": {
            "Name": "Jane Doe",
            "Phone": {
                "Number": "12345",
                "IsVerified": true
            }
        }
    }
]
I'd like these documents to be stored in Azure Table like this:
identifier | Contact | Distributor
"Ist Guid here" | "{ \"Name\": \"John Doe\", \"Age\": 99 }" | null
"2nd Guid here" | null | "{\"Name\":\"Jane Doe\",\"Phone\":{\"Number\":\"12345\",\"IsVerified\":true}}"
Is this possible with the Copy Activity?
I tried using the Mapping tab inside the Copy Activity, but when I try to run it I get an error saying that the data type for one of the nested JSON columns that is not present in the first row cannot be inferred.
Please follow my configuration in the Mapping tab; the screenshots of the mapping setup and of the test output with your sample data are not reproduced here.
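If the Copy Activity mapping stays too brittle for documents with varying properties, the same target shape can also be produced in code (for example from an Azure Function) before writing to Table storage. This is only a sketch, assuming the @azure/data-tables SDK, a hypothetical table name, and a fixed partition key:

const { TableClient } = require("@azure/data-tables");

const table = TableClient.fromConnectionString(process.env.STORAGE_CONNECTION, "ArchivedDocs");

async function archive(doc) {
  const entity = { partitionKey: "archive", rowKey: doc.identifier };
  for (const [key, value] of Object.entries(doc)) {
    if (key === "identifier") continue;
    // Root scalars are copied as-is; nested objects become serialized JSON strings.
    entity[key] = typeof value === "object" && value !== null ? JSON.stringify(value) : value;
  }
  await table.createEntity(entity);
}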

query with multiple values for same attribute in dynamodb nodejs

Is there any way to query a dynamodb table with multiple values for a single attribute?
TableName: "sdfdsgfdg"
IndexName: 'username-category-index',
KeyConditions: {
"username": {
"AttributeValueList": { "S": "aaaaaaa#gmail.com" }
,
"ComparisonOperator": "EQ"
},
"username": {
"AttributeValueList": { "S": "hhhhh#gmail.com" }
,
"ComparisonOperator": "EQ"
},
"category": {
"AttributeValueList": { "S": "Coupon" }
,
"ComparisonOperator": "EQ"
}
}
The BatchGetItem API can be used to get multiple items from a DynamoDB table. However, it can't be used in your use case because you are getting the data from an index.
The BatchGetItem operation returns the attributes of one or more items
from one or more tables. You identify requested items by primary key.
From an API perspective, there is no other solution. You may need to look at it from a data modelling perspective and design the table/index to satisfy your Query Access Pattern (QAP).
Also, please note that querying the index multiple times with the partition key values (i.e. a small number of them) wouldn't impact performance as long as it is a handful of items.
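As a sketch of that query-per-value approach (assuming the AWS SDK v2 DocumentClient in Node.js and the table/index names from the question):

const AWS = require("aws-sdk");
const docClient = new AWS.DynamoDB.DocumentClient();

async function couponsForUsers(usernames) {
  // One Query per username against the GSI, results merged afterwards.
  const pages = await Promise.all(usernames.map(username =>
    docClient.query({
      TableName: "sdfdsgfdg",
      IndexName: "username-category-index",
      KeyConditionExpression: "username = :u AND category = :c",
      ExpressionAttributeValues: { ":u": username, ":c": "Coupon" },
    }).promise()
  ));
  return pages.flatMap(p => p.Items);
}

// couponsForUsers(["aaaaaaa@gmail.com", "hhhhh@gmail.com"]).then(console.log);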

DynamoDB requires all fields for ExclusiveStartKey on GSI, why?

I'm trying to implement cursor-based pagination in DynamoDB (definitely not easy to do pagination in DynamoDB...) using a Query request with ExclusiveStartKey.
My table's primary key is "id", and I have a GSI on "owner" (partition key) and "created_at" (range key).
I can easily retrieve the first 10 records using a Query request, by specifying the GSI index and the "owner" property.
However, on subsequent requests, the ExclusiveStartKey only works if I specify all THREE elements from both indexes (so "id", "owner" AND "created_at").
While I understand it for "id" and "owner", as those are the partition keys and are needed to "locate" the record, I don't see why DynamoDB requires me to specify "created_at". This is annoying because it means the consumer must submit not only the "id" as the cursor, but also the "created_at".
Since DynamoDB could find the record using the "id" (which is guaranteed unique), why do I need to specify this created_at?
Thanks
GSI primary keys are not necessarily unique. Base table keys are necessary to answer the question, "For a given owner and creation date, up to which id did I read in this page?". Put another way, you could have multiple items with the same owner and creation date.
In my testing, querying a GSI on a table resulted in a LastEvaluatedKey containing all the key properties (essentially the GSI key plus the table key). I needed to pass all elements of the LastEvaluatedKey in the next request as the ExclusiveStartKey to get the next page. If I excluded any element of the LastEvaluatedKey from the next request, I received an exclusive start key error.
The following request:
aws dynamodb query --table-name MyTable --index-name MyIndex --key-condition-expression "R = :type" --expression-attribute-values '{":type":{"S":"Blah"}}' --exclusive-start-key '{"I":{"S":"9999"},"R":{"S":"Blah"},"S":{"S":"Bluh_999"},"P":{"S":"Blah_9999~Sth"}}' --limit 1
Resulted in the following response:
{
    "Items": [
        {
            "I": { "S": "9999" },
            "R": { "S": "Blah" },
            "S": { "S": "Bluh_999" },
            "P": { "S": "Blah_9999~Sth" }
        }
    ],
    "Count": 1,
    "ScannedCount": 1,
    "LastEvaluatedKey": {
        "I": { "S": "9999" },
        "R": { "S": "Blah" },
        "S": { "S": "Bluh_999" },
        "P": { "S": "Blah_9999~Sth" }
    }
}
If I left off some elements of the last evaluated key, for example (same request as above minus the table partition/sort keys):
aws dynamodb query --table-name MyTable --index-name MyIndex --key-condition-expression "R = :type" --expression-attribute-values '{":type":{"S":"Blah"}}' --exclusive-start-key '{"I":{"S":"9999"},"R":{"S":"Blah"}}' --limit 1
I get the following error:
An error occurred (ValidationException) when calling the Query operation: The provided starting key is invalid
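Putting it together, a sketch of cursor-style pagination over a GSI in Node.js (AWS SDK v2 DocumentClient; the table and index names are hypothetical): the whole LastEvaluatedKey is treated as the opaque cursor and handed back unchanged as the ExclusiveStartKey.

const AWS = require("aws-sdk");
const docClient = new AWS.DynamoDB.DocumentClient();

async function pageByOwner(owner, cursor) {
  const params = {
    TableName: "MyTable",                     // hypothetical
    IndexName: "owner-created_at-index",      // hypothetical GSI name
    KeyConditionExpression: "#o = :owner",
    ExpressionAttributeNames: { "#o": "owner" },
    ExpressionAttributeValues: { ":owner": owner },
    Limit: 10,
  };
  // The cursor is the previous LastEvaluatedKey and must contain id, owner AND created_at.
  if (cursor) params.ExclusiveStartKey = cursor;

  const result = await docClient.query(params).promise();
  return { items: result.Items, nextCursor: result.LastEvaluatedKey };
}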
