Partition Key Vs Document Id in CosmosDB - azure

If I store documents without providing a partition key, will the documentId be treated as the partition key of the logical partition?
If yes: what about billions of logical partitions in that collection? My queries only look up by documentId.
Now, inside the document JSON:
I have multiple fields, and I have provided /asset as the partitionKey. Is this now a composite partition key: /asset/documentId?
Or will /asset tell Cosmos DB which partition to search for the documentId in?

"If I store documents without providing partition key, In this case documentId will be treated as Partition Key of Logical Partition?"
No. If you create a document without a partition key, the document id will not be treated as the partition key. The Cosmos DB engine puts all documents without a partition key value in a hidden logical partition. This particular partition can be accessed by specifying the partition key as {}.

You define the partition key when you create the collection (according to the screenshot, /asset is the partition key in your case). If you don't provide a partition key when you create a collection, it will be limited to 10 GB of data (because it cannot be sharded without a partition key).
Only the partition key is used to determine the partition of a document. Other fields are irrelevant when deciding which partition the document belongs to.
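
As a rough illustration of the two answers above, here is a minimal Python sketch that groups documents into logical partitions using only the value at the partition key path. It is a toy model, not the Cosmos DB SDK or its real storage engine. The helper name logical_partition_of, the UNDEFINED sentinel, and the sample documents are all made up for this example; the sentinel stands in for the hidden partition that holds documents with no partition key value.

```python
from collections import defaultdict

# Hypothetical sentinel standing in for the hidden logical partition
# that holds documents with no partition key value.
UNDEFINED = "<undefined>"


def logical_partition_of(doc, partition_key_path="/asset"):
    """Return the partition key value of a document, or UNDEFINED if missing.

    Only the property named by the partition key path is consulted;
    the document id and all other fields are ignored.
    """
    field = partition_key_path.lstrip("/")
    return doc.get(field, UNDEFINED)


# Made-up sample documents.
docs = [
    {"id": "1", "asset": "pump", "site": "A"},
    {"id": "2", "asset": "pump", "site": "B"},
    {"id": "3", "asset": "valve", "site": "A"},
    {"id": "4", "site": "C"},  # no /asset value
]

# Group document ids by logical partition.
partitions = defaultdict(list)
for doc in docs:
    partitions[logical_partition_of(doc)].append(doc["id"])

print(dict(partitions))
```

In this model, documents 1 and 2 land in the same logical partition because they share the /asset value, even though their ids differ. The id is not folded into the partition key; it only has to be unique within its logical partition.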

Related

Azure table storage partition key and row key for one key entity

What is the best practice when choosing partition & row key for entities with one important key?
Sample entities:
Device1:
ID: AB1234567
IsRunning: Yes
IsUpdating: No
Device2:
ID: AB7654321
IsRunning: Yes
IsUpdating: Yes
I saw this post that suggests splitting the ID as partition key and row key.
But the Azure documentation actually recommends using only the partition key when the entity has just one key property. It doesn't say what the row key should be set to, though. Should it be empty? Or maybe a default value like '0'?
The expected number of records is in the tens of thousands: currently ~10k, but growing.
PartitionKey in Table Storage
In Table Storage, you need to decide on the PartitionKey yourself. Ultimately, you are responsible for the throughput you get on your system. If you put every entity in the same partition, the amount of storage you can use is limited to the size of the storage machines. You also constrain the maximum throughput, because many entities sit in the same partition.
RowKey in Table Storage
A RowKey in Table Storage is very important: it is the "primary key" within a partition. The combination of PartitionKey and RowKey forms the composite unique identifier for an entity. Within one PartitionKey, RowKeys must be unique. If you use multiple partitions, the same RowKey can be reused in every partition.
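
To make the composite-key rule concrete, here is a small Python sketch that models a table as a dictionary keyed by (PartitionKey, RowKey). It is a toy in-memory model of a plain insert, not the Azure Table Storage SDK. The insert_entity helper and the DuplicateEntityError class are hypothetical, and the fixed RowKey of "0" is only a placeholder taken from the question, not a recommendation.

```python
class DuplicateEntityError(Exception):
    """Raised when (PartitionKey, RowKey) already exists in the table."""


def insert_entity(table, partition_key, row_key, properties):
    """Insert an entity, enforcing uniqueness of (PartitionKey, RowKey)."""
    key = (partition_key, row_key)
    if key in table:
        raise DuplicateEntityError(f"entity {key} already exists")
    table[key] = dict(properties)


table = {}
# Each device ID is its own partition; the RowKey "0" is a placeholder value.
insert_entity(table, "AB1234567", "0", {"IsRunning": True, "IsUpdating": False})
# The same RowKey is fine in a different partition.
insert_entity(table, "AB7654321", "0", {"IsRunning": True, "IsUpdating": True})

try:
    # Same PartitionKey and RowKey as the first entity: rejected.
    insert_entity(table, "AB1234567", "0", {"IsRunning": False})
    duplicate_rejected = False
except DuplicateEntityError:
    duplicate_rejected = True

print(len(table), duplicate_rejected)
```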
This article by Maarten Balliauw will help you decide on the best practice when choosing partition and row keys for entities.

RUs same for primary key and non-primary key - cosmos

I have a Cosmos DB container that currently holds just 100 documents. When I query by the indexed id (primary key) or by a non-primary key, the RUs remain the same.
So, as I add more data, the RUs will change, right? Or does Cosmos calculate based on more data and give an average?
My id (primary key) values are unique ids, and I am setting the partition key to be the same as the id, because I sometimes need to query by id. But now there is a new requirement to also query by a non-primary key. Should I make that the partition key (again, unique ids) or add a secondary index?
The data is not yet in production.
At small sizes the choices you are making here likely won't have any impact either way on RU/s costs. It's only when you want to scale your workload that design decisions you make will have an impact.
If you are new to Cosmos DB, I would recommend you watch this presentation on how to model and partition data in Cosmos DB. https://youtu.be/utdxvAhIlcY

choose partition key and clustering key for data to be stored in Cassandra

I am trying to store data in Cassandra, but I am confused about what to choose as my partition key and clustering key. I eventually want to be able to do lookups on the guest token. I am new to Cassandra and am still trying to fully understand partition and clustering keys. Any help would be appreciated. See the data below:
"guestToken": "a5vd72860v1575a3g9s1c92314f91r48",
"event": "visit",
"data_pipeline": "Spooline",
"performers": "Busta Rhymes",
"timestamp": "2020-03-20T09:40:25.328972V",
"timeuuid": "bc578m1-c468-08ea-88af-0242ac120003",
"glinkId": "gfcgu44a3-62qf-b0ns-612e563fe88"
It depends on your queries against this table and the amount of data being stored.
If you want to be able to query by "guestToken" only, you could use "guestToken" as the partition key. In that case the guestToken must be unique; otherwise you will overwrite entries that have the same partition key.
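
The overwrite behaviour can be pictured with a short Python sketch. This is a toy model of a table as a dictionary keyed by primary key, not the Cassandra driver. The write_row helper is hypothetical, the "purchase" event and the "t1"/"t2" values are placeholders, and using timeuuid as a clustering column is only an assumption about what a suitable second key component might be for this data.

```python
def write_row(table, primary_key_columns, row):
    """Write a row keyed by its primary key; an existing key is overwritten."""
    key = tuple(row[col] for col in primary_key_columns)
    table[key] = row


TOKEN = "a5vd72860v1575a3g9s1c92314f91r48"

# Two events for the same guest (second event and timeuuid values are placeholders).
events = [
    {"guestToken": TOKEN, "event": "visit", "timeuuid": "t1"},
    {"guestToken": TOKEN, "event": "purchase", "timeuuid": "t2"},
]

# Primary key = guestToken only: the second write replaces the first.
by_token = {}
for e in events:
    write_row(by_token, ["guestToken"], e)

# Primary key = (guestToken, timeuuid): both rows are kept under one guestToken.
by_token_and_time = {}
for e in events:
    write_row(by_token_and_time, ["guestToken", "timeuuid"], e)

print(len(by_token), len(by_token_and_time))
```

With the second layout, a lookup by guestToken alone would still address a single partition, and the clustering column keeps the individual events apart inside it.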

Relationship between node and partition key in cassandra

What is the relationship between a node and a partition key in Cassandra? The data is stored on a node according to the partition key's hash value. Does that mean there is a one-to-one relationship between a node and a partition key, i.e. one node contains only one hashed partition key value, or can a node contain multiple hashed partition key values?
I'm new to Cassandra and got confused on this basic point.
Partition keys determine the locality of the data. In a Cassandra cluster with RF=1, there is only a single copy of every item, and all items with the same partition key are stored on the same node. Depending on your use case, this can be good or bad.
Back to your question: it is NOT true that "one node contains only one value of hashed value of partition key". It is the other way around: all items with the same partition key are stored on one node (potentially along with other partition keys).
Each node in Cassandra is responsible for a range of hash values of the partition key (consistent hashing).
By default, Cassandra uses the Murmur3 partitioner.
So each node in Cassandra holds multiple partition keys. For a given partition key, a node holds only one copy of the data; the other copies are on other nodes, according to the replication factor. See: Consistent Hashing in Cassandra
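
The idea of nodes owning hash ranges can be sketched in a few lines of Python. This is a simplified model, not Cassandra's implementation: MD5 is used as a stand-in for Murmur3, each node gets a single token instead of many virtual nodes, replicas are simply the next nodes clockwise on the ring, and the Ring class, node names, and keys are all invented for the example.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Map a partition key to a token. MD5 is a stand-in here, not Murmur3."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor=1):
        self.rf = replication_factor
        # One token per node for simplicity.
        self.ring = sorted((token_for(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key):
        """Return the owner node plus the next rf - 1 nodes clockwise."""
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token_for(partition_key))
        start %= len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


keys = ["user-1", "user-2", "user-3", "user-4", "user-5", "user-6"]
ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
placement = {key: ring.replicas(key) for key in keys}

for key, nodes in placement.items():
    print(key, "->", nodes)
```

With six keys and three nodes, at least one node necessarily ends up holding several partition keys, while each key always maps to the same set of replicas.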

DocumentDB Search on specific collection using EnableCrossPartitionQuery set True?

I created a collection with "/countryId" as the partition key. When I read multiple documents using a SQL query, I specify FeedOptions { EnableCrossPartitionQuery = true } and a query like the following:
select * from collection c where (c.countryId=1 or c.countryId=2 or c.countryId=3)
I would like to know how this executes internally. That is:
Since I specified countryId (the partition key) in the WHERE condition, will it go only to those particular partitions to get the documents?
or
Will it go to all partitions of the collection and check the countryId (partition key) on each document?
Thanks in advance!
The DocumentDB query will execute against only the partitions that match the filter, not all partitions:
The DocumentDB SDK/gateway retrieves the partition key metadata for the collection, so it knows that the partition key is countryId, which physical partitions exist, and which ranges of partition key hashes map to which physical partitions.
During query execution, the SDK/gateway parses the SQL query and detects that there are filters against the partition key. It hashes the values and finds the matching partitions based on the partition key ranges that own those hashes. For example, countries 1, 2, and 3 may all be in one physical partition, or in three different partitions.
The query is executed in series or in parallel by the SDK/gateway, based on the configured degree of parallelism. If any post-processing such as ORDER BY or aggregation is required, it is performed by the SDK/gateway.
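
The routing steps above can be sketched roughly as follows. This Python snippet is an illustration only: the four equal hash ranges, the partition names P0 to P3, the SHA-256 stand-in hash, and the function names are all assumptions, not the actual DocumentDB hash function or SDK internals.

```python
import hashlib

HASH_SPACE = 2 ** 32

# Hypothetical layout: four physical partitions, each owning an equal hash range.
PARTITION_RANGES = [
    ("P0", 0, HASH_SPACE // 4),
    ("P1", HASH_SPACE // 4, HASH_SPACE // 2),
    ("P2", HASH_SPACE // 2, 3 * HASH_SPACE // 4),
    ("P3", 3 * HASH_SPACE // 4, HASH_SPACE),
]


def hash_partition_key(value):
    """Hash a partition key value into [0, HASH_SPACE). Stand-in hash function."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def partition_for(value):
    """Find the physical partition whose range [low, high) contains the hash."""
    h = hash_partition_key(value)
    for name, low, high in PARTITION_RANGES:
        if low <= h < high:
            return name
    raise ValueError("hash outside configured ranges")


def target_partitions(filter_values):
    """Partitions a query must visit when it filters on these partition key values."""
    return sorted({partition_for(v) for v in filter_values})


# WHERE c.countryId = 1 OR c.countryId = 2 OR c.countryId = 3
targets = target_partitions([1, 2, 3])
print(targets)
```

Depending on how the three values hash, the query touches between one and three of the four partitions; the remaining partitions are never contacted.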
