DynamoDB requires all fields for ExclusiveStartKey on GSI, why? - pagination

I'm trying to implement a cursor-based pagination using DynamoDB (definitely not easy to do pagination in DynamoDB...) using a query request using ExclusiveStartKey.
My table index is made of an "id", while I have a GSI on "owner" (partition key) and "created_at" (range key).
I can easily retrieve the 10 first records using a query request, by specifying the GSI index and using the "owner" property.
However, on next requests, the ExclusiveStartKey only works if I specify the THREE elements from both indices (so "id", "owner" AND "created_at").
While I understand for "id", and "owner" as those are both partitioned key and are needed to "locate" the record, I don't see why DynamoDb requires me to specify the "created_at". This is annoying because this means that the consumer must not only submit the "id" as cursor, but also the "created_at".
As DynamoDb could find the record using the "id" (which is guarantees unique), why do I need to specify this created_at?
Thanks

GSI primary keys are not necessarily unique. Base table keys are necessary to answer the question, "For a given owner and creation date, up to which id did I read in this page?". Put another way, you could have multiple items with the same owner and creation date.

In my testing, querying a gsi on a table resulted in a last evaluated key with all the item properties (essentially gsi key + table key). I needed to add all elements of the last evaluated key to the next request as exclusive start key to get the next page. If I excluded any elements of the last evaluated key in the next request, I received an exclusive start key error.
The following request:
aws dynamodb query --table-name MyTable --index-name MyIndex --key-condition-expression "R = :type" --expression-attribute-values '{\":type\":{\"S\":\"Blah\"}}' --exclusive-start-key '{\"I\":{\"S\":\"9999\"},\"R\":{\"S\":\"Blah\"},\"S\":{\"S\":\"Bluh_999\"},\"P\":{\"S\":\"Blah_9999~Sth\"}}' --limit 1
Resulted in the following response:
{
"Items": [
{
"I": {
"S": "9999"
},
"R": {
"S": "Blah"
},
"S": {
"S": "Bluh_999"
},
"P": {
"S": "Blah_9999~Sth"
}
}
],
"Count": 1,
"ScannedCount": 1,
"LastEvaluatedKey": {
"I": {
"S": "9999"
},
"R": {
"S": "Blah"
},
"S": {
"S": "Bluh_999"
},
"P": {
"S": "Blah_9999~Sth"
}
}
}
If I left off some elements of the last evaluated key, for example (same request as above minus the table partition/sort keys):
aws dynamodb query --table-name MyTable --index-name MyIndex --key-condition-expression "R = :type" --expression-attribute-values '{\":type\":{\"S\":\"Blah\"}}' --exclusive-start-key '{\"I\":{\"S\":\"9999\"},\"R\":{\"S\":\"Blah\"}}' --limit 1
I get the following error:
An error occurred (ValidationException) when calling the Query operation: The provided starting key is invalid

Related

How to prevent entering duplicate data in Cosmos DB?

I have a container with id as partition key. Based on some condition, I do not want to enter duplicate data in my container, but I am not sure how to do that in Cosmos. i.e., I tried to create unique keys, but that didn't help me.
Condition:
Record will be duplicate if name + addresses[].city + addresses[].state + addresses[].zipCode are same.
Document:
{
"isActive": false,
"id": "d94d7350-8a5c-4300-b4e4-d4528627ffbe",
"name": "test name",
"addresses": [
{
"address1": "718 Old Greenville Rd",
"address2": "",
"city": "Montrose",
"state": "PA",
"zipCode": "18801",
"audit": {}
}
]
}
Findings:
Per https://stackoverflow.com/a/61317715, unique keys cannot include arrays. Unfortunately, I cannot change the document structure. So unique key approach is not the option.
Questions:
Do I need to change partition key? Not sure if I can have /id#name (or something like that) in Cosmos like Dynamo?
Is there any other way of handling this at DB level?
As a last resort, I can add the logic in my code to do this but that would be expensive in terms of RU/s.

Append string to existing string values in AWS dynamo db table using boto3

I am using boto3 to update AWS dynamo db table.
Requirement is to append 'abctest1.sh' to existing value for key 'init_script_name'. That means the updated value for 'init_script_name' should be 'abctest.sh,abctest1.sh' for all items in the dynamo db table.
"system_configuration": {
"L": [
{
"M": {
"name": {
"S": "Desktop"
},
"configuration": {
"L": [
{
"M": {
"key": {
"S": "DiskSize"
},
"value": {
"S": "RootVolume"
}
}
},
{
"M": {
"key": {
"S": "init_script_name"
},
"value": {
"S": "abctest.sh"
}
}
}
]
}
}
}
]
}
I tried below code block to update the key. But it is replacing existing 'abctest.sh' with abctest1.sh, instead of appending or concatenating existing value with new value
updResponse=dyd.update_item(TableName='test1',
Key={
"name":{"S":"Linux"}
},
UpdateExpression="SET #sys_config[0].#config[1].#val= :updVal",
ExpressionAttributeNames={"#sys_config": "system_configuration",
"#config":"configuration",
"#val":"value" },
ExpressionAttributeValues={":updVal": {"S":"abctest1.sh"}}
)
Dynamo doesn't have a concat string functionality like this. You can increment numbers or add items to a list or map but not a string. Strings are replace only
see: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.UpdateExpressions.html
You will need to get a copy of the item first then update that copy and push the updated copy. Ie: get the item, store as a dict. Update the key you need to update with the new value + old value, then UpdateItem with this new concatted key.
If you have to do more than a few you definitely want to look into the batch writing: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/dynamodb.html#batch-writing
If this is a task you will be performing a lot, you might be better served by changing that from a "S" string to a "L" list
{"L": [ { "S": "abctest1.sh" },{ "S": "abctest2.sh" }]}}

Cloudant Sorting on a nullable field

I want to sort on a field lets say name which is indexed in Cloudant DB. I am getting all the documents both which has this name field and which doesn't by using the index without sort . But when i try to sort with the name field I am not getting the documents which doesn't have this name field in the doc.
Is there any way to do this by using the query indexes. I want all the documents in sorted order which doesn't have the name field too.
For Example :
Below are some documents:
{
"_id": 1234,
"classId": "abc",
"name": "Happa"
}
{
"_id": 12345,
"classId": "abc",
"name": "Prasanth"
}
{
"_id": 123456,
"classId": "abc",
}
Below is the Query what i am trying to execute:
{
"selector": {
"classId": "abc",
"name" :{
"or" : [
{"$exists": true},{"$exists": false}
]
}
},
"sort": [{ "classId": "asc" }, { "name": "asc" }],
"use_index": "idx-classId_name"
},
I am expecting all the documents to be returned in a sorted order including the document which doesn't have that name field.
Your query makes no sense to me as it stands. You're requesting a listing of documents which either have, or don't have a specific field (meaning every document), and expecting to sort those on this field that may or may not exist. Such an order isn't defined out of the box.
I'd remove the name clause from the selector, sorting only on the classId field which appear in every document, and then do the secondary partial ordering on the client side, so you can decide how you intend to mix in the documents without the name field with those that have it.
Another solution is to use a view instead of a Cloudant Query index. I've not tested this, but hopefully the intent is clear:
function(doc) {
if (doc && doc.classId) {
var name = doc.name || "[notfound]";
emit(doc.classId+"-"+name, 1);
}
}
which will key the docs on "classId-name" and for docs with no name, a specified sentinel value.
Querying the view should return the documents lexicographically ordered on this compound key (which you can reverse with a query parameter if you wish).

Azure CosmosDb partition key - different schema

I have an Azure CosmosDB SQP API account with one container "EmployeeContainer", with the partition key "personId". I have three different type of collections in this container. Their schema are as shown below:
Person Collection:
{
"Id": "1234569",
"personId" : "P1241234",
"FirstName": "The first name",
"LarstName": "The last name"
}
Person-Department Collection:
{
"Id": "923456757",
"personId" : "P1241234",
"departmentId": "dept01",
"unitId": "unit01",
"dateOfJoining": "joining date"
}
Department-Employees
{
"id": "678678",
"departmentId" : "dept01",
"departmentName": "IT",
"employees" : [
{ "personId": "P1241234" },
{ "personId": "P1241235" },
{ "personId": "P1241236" },
]
}
How will the data be stored in the logical partitions? PersonId is the partition key and all the collections have personId in it. So, the document in the person collection with the person id "P1241234" and the document in the Person-Department collection with person id "P1241234" will be stored in the same logical partition? How will be the data in the Department-Employees be stored?
This design is not optimal. You should combine Person and Person-Department into a single collection using personId as the partition key, then have a second container for departments that has departmentId as it's partition key with each person as another row and any other properties that you would need when querying that collection. Do not write code that queries both containers. Each should have all the data it needs to satisfy any request you make. Then use change feed to keep them in sync.
You can get more details on how to model this article here
Yes, that is true, documents with the same personId will be stored under the same logical partition (regardless of their type\schema). I'm not sure you can create documents without the partition key on a collection with a partition key, but if you can - all of them should be under the same logical partition (but I dont think you can create them).

query with multiple values for same attribute in dynamodb nodejs

Is there any way to query a dynamodb table with multiple values for a single attribute?
TableName: "sdfdsgfdg"
IndexName: 'username-category-index',
KeyConditions: {
"username": {
"AttributeValueList": { "S": "aaaaaaa#gmail.com" }
,
"ComparisonOperator": "EQ"
},
"username": {
"AttributeValueList": { "S": "hhhhh#gmail.com" }
,
"ComparisonOperator": "EQ"
},
"category": {
"AttributeValueList": { "S": "Coupon" }
,
"ComparisonOperator": "EQ"
}
}
BachGetItem API can be used to get multiple items from DynamoDB table. However, it can't be used in your use case as you are getting the data from index.
The BatchGetItem operation returns the attributes of one or more items
from one or more tables. You identify requested items by primary key.
In API perspective, there is no other solution. You may need to look at data modelling perspective and design the table/index to satisfy your Query Access Pattern (QAP).
Also, please note that querying the index multiple times with partition key values (i.e. some small number) wouldn't impact the performance as long as it is handful of items.

Resources