Azure Cosmos DB - Can I use a JSON field that doesn't exist in all documents as my partition key?

I am trying to set up a new Cosmos DB and it's asking me to set a partition key. I think I understand the concept: I should select a JSON field that can group my documents efficiently.
Is it possible to configure the collection to use a JSON field that may not exist in every incoming document?
For example:
    {
      "name": "Robin",
      "DOB": "01/01/1969",
      "scans": {
        "bloodType": "O"
      }
    }

    {
      "name": "Bill",
      "DOB": "01/01/1969"
    }
Can I use /scans/bloodType as the partition key? For documents that don't have a scans field, I still want to keep the data, as I can update the document later.

You can, indeed, specify a partition key path that might not exist in every document. When you save a document that's missing the property specified by your partition key, the document is assigned an "undefined" value for its partition key.
In the future, if you wanted to provide a value for the partition key of such a document, you'd have to delete, and then re-add, the document. You cannot modify a property's value when it happens to be the partition key within that document's container (nor can you add the property to an existing document that doesn't explicitly have that property defined, since it's already been assigned the "undefined" value).
See this answer for more details on undefined partition key properties.
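For illustration, a minimal sketch with the azure-cosmos Java SDK v4 (the endpoint, key, and database/container names are placeholders, and the Patient/Scans POJOs are invented to mirror the documents above):

    import com.azure.cosmos.CosmosClient;
    import com.azure.cosmos.CosmosClientBuilder;
    import com.azure.cosmos.CosmosContainer;
    import com.azure.cosmos.CosmosDatabase;
    import com.azure.cosmos.models.CosmosContainerProperties;
    import com.azure.cosmos.models.PartitionKey;

    public class UndefinedPartitionKeyDemo {

        // Hypothetical POJOs mirroring the documents above; 'scans' may be absent.
        public static class Scans { public String bloodType; }
        public static class Patient {
            public String id;
            public String name;
            public String DOB;
            public Scans scans;
        }

        public static void main(String[] args) {
            CosmosClient client = new CosmosClientBuilder()
                    .endpoint("https://YOURACCOUNT.documents.azure.com:443/")
                    .key("YOURKEY")
                    .buildClient();
            CosmosDatabase database = client.getDatabase("demo-db");

            // Partition key paths use '/' separators, so the nested property
            // is addressed as /scans/bloodType.
            database.createContainerIfNotExists(
                    new CosmosContainerProperties("patients", "/scans/bloodType"));
            CosmosContainer container = database.getContainer("patients");

            // Bill has no scans property; his partition key value is "undefined".
            Patient bill = new Patient();
            bill.id = "2";
            bill.name = "Bill";
            bill.DOB = "01/01/1969";
            container.createItem(bill);

            // Reading Bill back later requires the special "none" partition key.
            Patient read = container
                    .readItem("2", PartitionKey.NONE, Patient.class)
                    .getItem();
        }
    }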

Unfortunately you can't do that.
As per the official docs, neither the partition key path (/scans/bloodType) nor the partition key value can be changed:
Be a property that has a value which does not change. If a property is your partition key, you can't update that property's value.
In terms of solutions, you could either find another partition key property path and ensure it has a value at creation time, or use a secondary collection to store your incomplete documents and use the change feed to "move" them to the final collection once all the data becomes available.
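As a rough sketch of that second option, assuming the change feed processor from the azure-cosmos Java SDK v4, async containers for the staging, final, and lease collections, and a made-up host name:

    import com.azure.cosmos.ChangeFeedProcessor;
    import com.azure.cosmos.ChangeFeedProcessorBuilder;
    import com.azure.cosmos.CosmosAsyncContainer;
    import com.fasterxml.jackson.databind.JsonNode;

    public class IncompleteDocMover {
        static ChangeFeedProcessor buildProcessor(CosmosAsyncContainer staging,
                                                  CosmosAsyncContainer finalColl,
                                                  CosmosAsyncContainer leases) {
            return new ChangeFeedProcessorBuilder()
                    .hostName("doc-mover-1")
                    .feedContainer(staging)
                    .leaseContainer(leases)
                    .handleChanges(docs -> {
                        for (JsonNode doc : docs) {
                            // Move a document only once the partition key
                            // property has been filled in.
                            if (doc.at("/scans/bloodType").isTextual()) {
                                finalColl.createItem(doc).block();
                            }
                        }
                    })
                    .buildChangeFeedProcessor();
        }
    }

You would call buildProcessor(...).start().subscribe() at application startup; deleting the staged copy after a successful move is omitted here.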

Something like blood type wouldn't be a good partition key, assuming it refers to A/B/O types etc., since there are only a small number of possible values. What you can generally fall back on is the unique id property of the item, which Cosmos creates automatically. It's fine to have one item per partition if nothing else is a better candidate. Per the docs:
If your container has a property that has a wide range of possible values, it is likely a great partition key choice. One possible example of such a property is the item ID. For small read-heavy containers or write-heavy containers of any size, the item ID is naturally a great choice for the partition key.
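In SDK terms (reusing the Java v4 CosmosDatabase from the sketch above; the container name is a placeholder), that is simply:

    // One logical partition per item: the partition key path is the id itself.
    database.createContainerIfNotExists(
            new CosmosContainerProperties("events", "/id"));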

Related

Name the primary key of a Container in Cosmos DB SQL API something different than "id"

I am working on Azure and I have created a database with the Cosmos DB SQL API. After creating a container I see that the primary key is always named "id". Is there any way to create a container with a PK named something other than "id"?
Every document has a unique id property, named id. This cannot be changed. If you don't set a value, a GUID is assigned automatically.
When using methods such as ReadDocument() (or equivalent, based on the SDK) for direct reads instead of queries, a document's id property must be specified.
Now, as far as partitioning goes, you can choose any property you want to use as the partition key.
And if you have additional domain-specific identifiers (maybe a part number), you can always store those in their own properties as well. Just remember that, while you can query for documents containing a specific part number, you can only do a direct read via id. If direct reads will be the majority of your read operations, it's worth considering using id to store such a value.
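To make the distinction concrete, here's a minimal sketch with the azure-cosmos Java SDK v4, assuming a container partitioned on /id; the partNumber property and its value are invented:

    import com.azure.cosmos.CosmosContainer;
    import com.azure.cosmos.models.CosmosQueryRequestOptions;
    import com.azure.cosmos.models.PartitionKey;
    import com.azure.cosmos.models.SqlParameter;
    import com.azure.cosmos.models.SqlQuerySpec;
    import com.azure.cosmos.util.CosmosPagedIterable;
    import com.fasterxml.jackson.databind.JsonNode;

    static void lookups(CosmosContainer container) {
        // Direct (point) read: needs the id plus the partition key value
        // (identical here, because the container is partitioned on /id).
        JsonNode byId = container
                .readItem("item-42", new PartitionKey("item-42"), JsonNode.class)
                .getItem();

        // Lookup by any other property (e.g. a part number) must be a query.
        SqlQuerySpec spec = new SqlQuerySpec(
                "SELECT * FROM c WHERE c.partNumber = @pn",
                new SqlParameter("@pn", "PN-12345"));
        CosmosPagedIterable<JsonNode> matches = container.queryItems(
                spec, new CosmosQueryRequestOptions(), JsonNode.class);
    }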
PK = primary key, by the way ;) It's the "main" key for the record, not a private one :)
And no, as far as I know you cannot change the primary key name. It is always "id". Specifically, it must be lowercase; "Id" is not accepted.

Using a set as the Primary Key in a DynamoDB Index

I have created a DynamoDB model where I have set an attribute to a set, using a combination of three separate ids, and another attribute which takes a timestamp. The idea was to create a GSI on these two, with the set attribute as the partition key and the timestamp as the sort key. When using the equality operator in the KeyConditionExpression, I am unable to fetch the data, and I am not sure what the issue is. Can somebody tell me whether I am following the right approach or whether I am missing something?
Below is the set attribute value sample
{ "291447cb-f7a5-4627-9a7e-ac7b4adf9xce", "21", "d2e5723a-437a-4517-9f4b-1a62575224d6" }
DynamoDB can only use keys of scalar types (a single-valued string, number, or binary). What you could do is concatenate the values into a string for your key (e.g. "291447cb-f7a5-4627-9a7e-ac7b4adf9xce:21:d2e5723a-437a-4517-9f4b-1a62575224d6").
Don't forget you'd need to store this concatenated key in your table so it can be used in your GSI, and you'd need to make sure it's updated and kept in sync with the set as per your requirements.
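A sketch with the AWS SDK for Java v2; the table and index names are made up:

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
    import software.amazon.awssdk.services.dynamodb.model.QueryResponse;
    import java.util.Map;

    public class QueryByCompositeKey {
        public static void main(String[] args) {
            // Concatenate the three set members into one scalar string key.
            String compositeKey = String.join(":",
                    "291447cb-f7a5-4627-9a7e-ac7b4adf9xce", "21",
                    "d2e5723a-437a-4517-9f4b-1a62575224d6");

            DynamoDbClient ddb = DynamoDbClient.create();

            // Equality on the GSI hash key now works, since it's a plain string.
            QueryResponse response = ddb.query(QueryRequest.builder()
                    .tableName("MyTable")                      // placeholder
                    .indexName("compositeKey-timestamp-index") // placeholder
                    .keyConditionExpression("compositeKey = :ck")
                    .expressionAttributeValues(Map.of(":ck",
                            AttributeValue.builder().s(compositeKey).build()))
                    .build());
            System.out.println(response.items());
        }
    }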

DocumentDB REST API: x-ms-documentdb-partitionkey is invalid

I am attempting to get a Document from DocumentDB using the REST API. I am using a partitioned collection and therefore need to add the "x-ms-documentdb-partitionkey" header. If I add this, I get "Partition key abc is invalid". I can't find anywhere in the documentation that expects the key to be in a specific format, and simply supplying the expected string value does not work. Does anyone know the expected format?
Partition key must be specified as an array (with a single element). For example:
    x-ms-documentdb-partitionkey: [ "abc" ]
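For example, with Java's built-in HttpClient (the account, database, collection, and document ids in the URI are placeholders, and the required authorization and x-ms-date headers are omitted):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ReadDocDemo {
        public static void main(String[] args) throws Exception {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://YOURACCOUNT.documents.azure.com/dbs/YOURDB/colls/YOURCOLL/docs/YOURDOC"))
                    // The VALUE of the partition key, wrapped in a JSON array.
                    .header("x-ms-documentdb-partitionkey", "[\"abc\"]")
                    .header("x-ms-version", "2018-12-31")
                    // A real call also needs authorization and x-ms-date headers.
                    .GET()
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }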
Partition key for a partitioned collection is actually the path to a property in DocumentDB. Thus you would need to specify it in the following format:
/{path to property name} e.g. /department
From Partitioning and scaling in Azure DocumentDB:
You must pick a JSON property name that has a wide range of values and is likely to have evenly distributed access patterns. The partition key is specified as a JSON path, e.g. /department represents the property department.
More examples are listed in the link as well.

Hazelcast - How to insert records through a Map into a table with an auto-increment field as PK

I am using Hazelcast to do read/write operations on an MS SQL Server database.
I have a database table whose primary key is an auto-increment column.
I read the existing data in this table into a Map when my application starts.
The Map has an underlying MapStore implementation.
The key of the map is the auto-increment column value; the value of the map is an object containing a few fields from the table.
Users can insert records into this table through the Hazelcast layer.
This insert is performed in a transaction (a TransactionalMap is obtained from Hazelcast).
It is at this point that I face a problem, as I do not have the primary key value (the auto-increment value) to use as the key in the TransactionalMap instance.
The insert-into-the-table logic lives in the MapStore's store() method, and store() is called only after commit is called on the transaction.
I cannot figure out how to set the key value (the auto-increment value) in the map first, since the auto-increment value is only available after the insert into the table.
Ideas/Points are most welcome.
You can implement the PostProcessingMapStore interface on your MapStore, which lets you update the stored entry inside the store() method. You can obtain the auto-generated fields from the database and then reflect those changes in your entry.
See the post-processing documentation: http://docs.hazelcast.org/docs/latest-dev/manual/html-single/index.html#post-processing-objects-in-map-store
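A minimal sketch, assuming Hazelcast 4+/5 package names (3.x used com.hazelcast.core), a JDBC DataSource, an IDENTITY-style column, and an invented CacheRecord value type:

    import com.hazelcast.map.MapStore;
    import com.hazelcast.map.PostProcessingMapStore;
    import javax.sql.DataSource;
    import java.sql.*;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Map;

    // Hypothetical cached value type.
    class CacheRecord {
        private Long id;
        private String name;
        public String getName() { return name; }
        public void setId(Long id) { this.id = id; }
    }

    public class RecordMapStore
            implements MapStore<Long, CacheRecord>, PostProcessingMapStore {

        private final DataSource dataSource;

        public RecordMapStore(DataSource dataSource) { this.dataSource = dataSource; }

        @Override
        public void store(Long key, CacheRecord value) {
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO records (name) VALUES (?)",
                         Statement.RETURN_GENERATED_KEYS)) {
                ps.setString(1, value.getName());
                ps.executeUpdate();
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    if (keys.next()) {
                        // Because this class implements PostProcessingMapStore,
                        // mutating the value here is reflected in the cached entry.
                        value.setId(keys.getLong(1));
                    }
                }
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }

        @Override public void storeAll(Map<Long, CacheRecord> map) { map.forEach(this::store); }
        @Override public void delete(Long key) { /* omitted */ }
        @Override public void deleteAll(Collection<Long> keys) { /* omitted */ }
        @Override public CacheRecord load(Long key) { return null; /* omitted */ }
        @Override public Map<Long, CacheRecord> loadAll(Collection<Long> keys) { return Collections.emptyMap(); }
        @Override public Iterable<Long> loadAllKeys() { return Collections.emptyList(); }
    }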
The only way I can see to do what you ask is to have your code write a placeholder record into the database, read it back, and only then put it into Hazelcast, making the MapStore do an update rather than an insert.
Of course this would be slow and cumbersome.
The best solution would be to turn off the auto-increment. Alternatively, you could use a different field as the cache key: say, add a member called cacheKey, store it in your database record, and when you do an insert in the MapStore, insert where databaserecord.cacheKey == cacheKey.

Creating a version of the table items in Amazon DynamoDB

I have an item such as:

    "Item": {
      "model": 32590038899,
      "date": "10-09-2015",
      "price": 100
    }
    "hash_key_attribute_name": "model"
    "range_key_attribute_name": "date"
My problem is that new items get inserted frequently, so there is a chance that items with the same model number may arrive that are not the same product. This may be due to the regions in which the product is available. So I need a setup where I keep a copy of the product if such a case arises, so that in future, after inspection or on requirement, I can bring the item back. I am looking for a kind of versioning system. Currently the product gets overwritten due to the same primary key.
Put simply: if your primary key isn't unique, then you aren't using the correct field as your primary key.
...there is a chance that items with the same model number may arrive that are not the same product. This may be due to the regions in which the product is available...
It sounds like your primary key needs to be a composite of region and product ID. Or perhaps you need a separate table for each region.
I would add a unique id attribute to your item; a UUID works well. For example:

    {
      "id": "53f382ae-94b8-4910-b8f1-384f46dc10d8",
      "model": 32590038899,
      "date": "10-09-2015",
      "price": 100
    }
Change your table schema to have just a hash key attribute: id.
Add a global secondary index with hash key model and range key date.
Global secondary indexes can contain items whose keys collide. With this schema you prevent items from being overwritten, and you can still query for items with a given model number.
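A sketch with the AWS SDK for Java v2; the table name Products and the GSI name model-date-index are made up:

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;
    import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
    import software.amazon.awssdk.services.dynamodb.model.QueryResponse;
    import java.util.Map;
    import java.util.UUID;

    public class VersionedPutDemo {
        public static void main(String[] args) {
            DynamoDbClient ddb = DynamoDbClient.create();

            // A fresh uuid per write means same-model items never overwrite each other.
            ddb.putItem(PutItemRequest.builder()
                    .tableName("Products") // placeholder
                    .item(Map.of(
                            "id",    AttributeValue.builder().s(UUID.randomUUID().toString()).build(),
                            "model", AttributeValue.builder().n("32590038899").build(),
                            "date",  AttributeValue.builder().s("10-09-2015").build(),
                            "price", AttributeValue.builder().n("100").build()))
                    .build());

            // Fetch every stored "version" of a model through the GSI.
            QueryResponse versions = ddb.query(QueryRequest.builder()
                    .tableName("Products")
                    .indexName("model-date-index") // placeholder GSI name
                    .keyConditionExpression("model = :m")
                    .expressionAttributeValues(Map.of(":m",
                            AttributeValue.builder().n("32590038899").build()))
                    .build());
            System.out.println(versions.items());
        }
    }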
