Does a Cosmos DB physical partition hold 50GB or 100GB? - azure

According to Microsoft's documentation on physical partitions, each one can hold 50GB and supply 10,000 RU/s of throughput. Provisioning physical partitions involves initialising the container in increments of 10,000 RU/s of max scale.
However, their documentation on autoscale and storage limits states that a max scale of 50,000 RU/s can hold 500GB of data, which is double what five partitions should be able to hold according to the first statement.
These two statements seem to be in conflict with each other. Does each partition actually hold 100GB, not 50GB?

It is still 50GB. I believe what the second link does not mention is the number of physical partitions that would be created.
Each physical partition has a limit of 10,000 RU/s and 50GB, so if your storage is 500GB (and max throughput is 50,000 RU/s), there would be 10 physical partitions, each with 5,000 RU/s and 50GB.
From this link:
When you first select the max RU/s, Azure Cosmos DB will provision:
Max RU/s / 10,000 RU/s = # of physical partitions. Each physical partition can support up to 10,000 RU/s and 50 GB of storage. As storage size grows, Azure Cosmos DB will automatically split the partitions to add more physical partitions to handle the storage increase, or increase the max RU/s if storage exceeds the associated limit.
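As a rough sketch of that arithmetic (assuming the documented 10,000 RU/s and 50 GB per physical partition, with hypothetical numbers):

```python
# Rough sketch of the provisioning arithmetic described above.
# Assumes the documented limits of 10,000 RU/s and 50 GB per physical partition.
import math

MAX_RU_PER_PARTITION = 10_000
MAX_GB_PER_PARTITION = 50

def physical_partitions(max_rus: int, storage_gb: float) -> int:
    """Partition count is driven by whichever dimension needs more partitions."""
    by_throughput = math.ceil(max_rus / MAX_RU_PER_PARTITION)
    by_storage = math.ceil(storage_gb / MAX_GB_PER_PARTITION)
    return max(by_throughput, by_storage)

# The example from the answer: 500 GB of data with a 50,000 RU/s max.
count = physical_partitions(50_000, 500)   # -> 10 partitions
print(count, 50_000 / count)               # -> 10 partitions at 5,000 RU/s each
```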

The maximum size of a physical partition is 50GB.
What your second article describes is how the minimum RU/s is automatically adjusted based on the size of your container or database. The minimum RU/s is the largest of: 400 RU/s, the size in GB * 10 RU/s, and 1/100 of the highest max RU/s you have ever provisioned.
A few examples of what that means for different database sizes:
1GB -> 400RU/s
30GB -> 400RU/s
50GB -> 500RU/s
500GB -> 5000RU/s
600GB -> 6000RU/s
(these examples assume you don't participate in a special program)
When using autoscale you can define a lower/upper limit with a 10x difference between them. However, you still have to consider the rule above: once you grow from 500GB to 600GB in size, your autoscale rule automatically adjusts its lower and upper limits following the ruleset above, based on the new lower limit.
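As a minimal sketch of the minimum-RU rule and its effect on the autoscale range (an illustration only, not the official billing logic):

```python
# Illustrative only: the minimum-RU/s rule described above.
def minimum_rus(storage_gb: float, highest_max_rus_ever: float) -> float:
    return max(
        400,                          # absolute floor
        storage_gb * 10,              # 10 RU/s per GB of storage
        highest_max_rus_ever / 100,   # 1% of the highest max RU/s ever set
    )

print(minimum_rus(30, 4_000))    # -> 400
print(minimum_rus(50, 4_000))    # -> 500
print(minimum_rus(600, 4_000))   # -> 6000

# With autoscale the upper limit is 10x the lower one, so at 600GB the range
# would shift to 6,000 - 60,000 RU/s under these assumptions.
```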

Related

Azure table storage: Partitioning strategy and performance targets

I am trying to come up with a partitioning strategy for Azure Table storage.
I use Scalability and performance targets for Table storage as a primary source for my performance estimation. The document provides 2 numbers:
Maximum request rate per storage account: 20,000 transactions per second, assuming a 1-KiB entity size
Target throughput for a single table partition (1-KiB entities): up to 2,000 entities per second
Does it mean that I won't get much throughput benefit from having more than 10-ish partitions, since the account TPS is capped at 10x the single-partition TPS limit?
Actually I am also wondering if they use "transactions" and "entities" interchangeably in the doc. Does the first number apply to a batch transaction, or for batches should I divide 20,000 by the number of operations in the batch?
It is true that having more than 10 partitions will not give you extra throughput, but only if all 10 partitions are running at max throughput, i.e. 2,000 TPS. If any partition is not running at max throughput, then you will be under-utilizing your table storage. This is why it is recommended HERE that
For optimal load balancing of traffic, you should use more partitions so that Azure Table storage can distribute the partitions to more partition servers.
We just worry about the partitioning/partition key, and Azure will handle the load balancing up to the max throughput per partition/storage account.
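A back-of-the-envelope sketch of that ceiling, assuming the documented targets quoted in the question:

```python
# Back-of-the-envelope estimate using the documented targets for 1-KiB entities:
# 20,000 transactions/sec per storage account, 2,000 entities/sec per partition.
ACCOUNT_TPS_LIMIT = 20_000
PARTITION_TPS_LIMIT = 2_000

def achievable_tps(active_partitions: int, per_partition_load_tps: float) -> float:
    """Throughput is capped per partition first, then by the account limit."""
    per_partition = min(per_partition_load_tps, PARTITION_TPS_LIMIT)
    return min(active_partitions * per_partition, ACCOUNT_TPS_LIMIT)

print(achievable_tps(10, 2_000))   # 20000 - the account cap is hit only if all 10 are saturated
print(achievable_tps(50, 300))     # 15000 - more partitions, none saturated, no cap hit
```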

Partition key for a lookup table in Azure Cosmos DB Tables

I have a very simple lookup table I want to call from an Azure function.
Schema is incredibly simple:
Name | Value 1 | Value 2
Name will be unique, but value 1 and value 2 will not be. There is no other data in the lookup table.
For an Azure Table you need a partition key and a row key. Obviously the RowKey would be the Name field.
What exactly should I use for the PartitionKey?
Right now, I'm using a constant because there won't be a ton of data (maybe a couple hundred rows at most), but using a constant seems to go against the point.
This answer applies to all Cosmos DB containers, including Tables.
When does it make sense to store your Cosmos DB container in a single partition (use a constant as the partition key)?
If you are sure the data size of your container will always remain well under 10GB.
If you are sure the throughput requirement for your container will always remain under 10,000 RU/s (RU per second).
If either of the above conditions is false, or if you are not sure about the future growth of data size or throughput requirements, then using a partition key based on the guidelines below will allow the container to scale.
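For a lookup table like the one in the question, the single-partition option looks roughly like the sketch below (it assumes the azure-data-tables Python package; the connection string, table name and the constant "lookup" key are placeholders):

```python
# Sketch of the constant-partition-key approach for a small lookup table.
# Uses the azure-data-tables package; connection string and names are placeholders.
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-connection-string>", table_name="Lookup"
)

# A constant PartitionKey keeps the whole table in one logical partition;
# RowKey is the unique Name, so reads stay cheap point lookups.
table.create_entity({
    "PartitionKey": "lookup",
    "RowKey": "some-name",
    "Value1": "foo",
    "Value2": "bar",
})

entity = table.get_entity(partition_key="lookup", row_key="some-name")
print(entity["Value1"], entity["Value2"])
```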
How partitioning works in Cosmos DB
Cosmos groups container items into a set of logical partitions based on the partition key. These logical partitions are then mapped to physical partitions. A physical partition is the unit of compute/storage which makes up the underlying database infrastructure.
You can determine how your data is split into logical partitions by your choice of partition key. You have no control over how your logical partitions are mapped to physical partitions; Cosmos handles this automatically and transparently.
Distributing your container across a large number of physical partitions is the way Cosmos allows the container to scale to virtually unlimited size and throughput.
Each logical partition can contain a maximum of 10GB of data. An unpartitioned container can have a maximum throughput of 10,000 RU/s, which implies there is a limit of 10,000 RU/s per logical partition.
The RU/s allocated to your container are evenly split across all physical partitions hosting the container's data. For instance, if your container has 4,000 RU/s allocated and its logical partitions are spread across 4 physical partitions, then each physical partition will have 1,000 RU/s allocated to it. This also means that if one of your physical partitions is under heavy load or 'hot', it will get rate-limited at 1,000 RU/s, not at 4,000. This is why it is very important to choose a partition key that spreads your data, and access to the data, evenly across partitions.
If your container is in a single logical partition, it will always be mapped to a single physical partition and the entire allocation of RU/s for the container will always be available.
All Cosmos DB transactions are scoped to a single logical partition, and the execution of a stored procedure or trigger is also scoped to a single logical partition.
How to choose a good partition key
Choose a partition key that will evenly distribute your data across logical partitions, which in turn will help ensure the data is evenly mapped across physical partitions. This prevents 'bottleneck' or 'hot' partitions, which cause rate-limiting and may increase your costs.
Choose a partition key that will be the filter criterion for a high percentage of your queries. By providing the partition key as a filter to your query, Cosmos can efficiently route the query to the correct partition. If the partition key is not supplied, the result is a 'fan-out' query that is sent to all partitions, which increases your RU cost and may hinder performance. If you frequently filter based on multiple fields, see this article for guidance.
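To illustrate the routing difference, here is a sketch assuming the azure-cosmos Python SDK and hypothetical container/field names:

```python
# Sketch: in-partition vs. fan-out queries with the azure-cosmos Python SDK.
# The endpoint, key, names and the /tenantId partition key are placeholders.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("appdb").get_container_client("users")

# Routed to a single partition because the partition key value is supplied.
in_partition = container.query_items(
    query="SELECT * FROM c WHERE c.tenantId = @t AND c.status = @s",
    parameters=[{"name": "@t", "value": "tenant-42"}, {"name": "@s", "value": "active"}],
    partition_key="tenant-42",
)

# No partition key: the query fans out to every partition and costs more RUs.
fan_out = container.query_items(
    query="SELECT * FROM c WHERE c.status = @s",
    parameters=[{"name": "@s", "value": "active"}],
    enable_cross_partition_query=True,
)

print(len(list(in_partition)), len(list(fan_out)))
```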
Summary
The primary purpose of partitioning your containers in Cosmos DB is to allow the containers to scale in terms of both storage and throughput.
Small containers which will not grow significantly in data size or throughput requirements can use a single partition.
Large containers, or containers expected to grow in data size or throughput requirements should be partitioned using a well chosen partition key.
The choice of partition key is critical and may significantly impact your ability to scale, your RU cost and the performance of your queries.

How to obtain full collection throughput in a CosmosDB Multi-Master environment where partitionKey already has high cardinality?

Overview
I have a user registration / onboarding flow that I am currently trying to optimise and better understand before scaling out to much larger load tests.
Test Collection: (500 RU)
PartitionKey: tenant_email
Multi-Master: 5 Regions
Below is the single region statistics on a database with only one region.
Step 1 - Register new user (10.17 RU)
Step 2 - Update some data (3.4 RU)
Step 3 - Create a subscription (13.23 RU)
Step 4 - Update some data (3.43 RU)
Step 4 - Update some data (3.43 RU)
Step 5 - Update some data (3.83 RU)
Step 6 - Refresh access token (3.13 RU)
Total: ~40.5 RU per onboard
Problem
Expected throughput: ~12 registrations/sec (84 req/sec)
Actual throughput: Heavy rate limiting at ~3 registrations per second (21 req/sec). At ~40 RU per onboard, this seems like I'm only using about 120 RU of the 500?
The storage distribution is shown below, and the partitionKey should be unique enough to evenly distribute load over the collection and maximise throughput, so I'm not sure why the Max Consumed RU/s is so high.
Storage distribution for the collection and chosen partitionKey looks to be evenly distributed.
Update - Under utilisation
Here is a screenshot showing a collection with a single 500 RU partition. You can clearly see from it that the max consumed RU per partition sat at around ~350 the whole time, yet notice the heavy rate limiting even though we never hit 500 RU/s.
Your rate-limiting is likely because you don't have access to all 500 RU in a single physical partition.
Take a close look at your second graph, which has a hint as to what's likely going on:
Collection UsersTest has 5 partition key ranges. Provisioned throughput is evenly distributed across these partitions (100 RU/s per partition).
Under the covers, Cosmos DB creates a set of physical partitions, and your RU are divided across those physical partitions. In your case, Cosmos DB created 5 physical partitions.
Logical partitions may be mapped to any of your 5 physical partitions. So it's possible that, during your test, more than one logical partition mapped to the same physical partition. And given that each physical partition would top out at roughly 2-3 registrations per second, this likely explains why you're seeing throttling.
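The arithmetic behind that estimate, as a quick sketch using the numbers from the question:

```python
# Quick sketch of why ~2-3 registrations/sec per physical partition is the ceiling,
# using the figures from the question above.
provisioned_rus = 500
physical_partitions = 5
ru_per_onboard = 40.5

ru_per_partition = provisioned_rus / physical_partitions      # 100 RU/s each
onboards_per_second = ru_per_partition / ru_per_onboard        # ~2.5 per partition
print(ru_per_partition, round(onboards_per_second, 1))
```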

Throughput value for Azure Cosmos db

I am confused about how partitioning affects the size limit and throughput value for Azure Cosmos DB (in our case, we are using DocumentDB). If I understand the documentation correctly:
for a partitioned collection, does the 10GB storage limit apply to each partition?
does the throughput value, e.g. 400 RU/s, apply to each partition rather than the collection?
Whether you use single-partition collections or multi-partition collections, each partition can be up to 10 GB. This means that a single-partition collection cannot exceed that size, whereas a multi-partition collection can.
Taken from Azure Cosmos DB FAQ:
What is a collection?
A collection is a group of documents and their associated JavaScript application logic. A collection is a billable entity, where the cost is determined by the throughput and used storage. Collections can span one or more partitions or servers and can scale to handle practically unlimited volumes of storage or throughput.
Collections are also the billing entities for Azure Cosmos DB. Each collection is billed hourly, based on the provisioned throughput and used storage space. For more information, see Azure Cosmos DB Pricing.
Billing is per collection, where one collection can have one or more partitions. Since Azure allocates partitions to host your collection, the amount of RUs needs to be per collection. Otherwise a customer with lots and lots of partitions would get far more RUs than a different customer who has an equivalent collection but far fewer partitions.
For more info, see the bold text in the quote below:
Taken from Azure Cosmos DB Pricing:
Provisioned throughput
At any scale, you can store data and provision throughput capacity. Each container is billed hourly based on the amount of data stored (in GBs) and throughput reserved in units of 100 RUs/second, with a minimum of 400 RUs/second. Unlimited containers have a minimum of 100 RUs/second per partition.
Taken from Request Units in Azure Cosmos DB:
When starting a new collection, table or graph, you specify the number of request units per second (RU per second) you want reserved. Based on the provisioned throughput, Azure Cosmos DB allocates physical partitions to host your collection and splits/rebalances data across partitions as it grows.
The other answers here provide a great starting point on throughput provisioning but fail to touch on an important point that doesn't get mentioned often in the docs.
Your throughput is actually divided across the number of physical partitions in your collection. So for a multi-partition collection provisioned for 1,000 RU/s with 10 physical partitions, it's actually 100 RU/s per partition. So if you have hot partitions that get accessed more frequently, you'll receive throttling errors even though you haven't exceeded the total RU assigned to the collection.
For a single partition collection you obviously get the full RU assigned for that partition since it's the only one.
If you're using a multi-partition collection, you should strive to pick a partition key with an even access pattern so that your workload is evenly distributed across the underlying partitions without bottlenecking.
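One practical way to see this is to watch per-request charges and 429 responses. A rough sketch, assuming the azure-cosmos Python SDK, a container partitioned on a hypothetical customerId, and placeholder names:

```python
# Sketch: spotting hot-partition throttling even when the collection-level RU
# budget looks fine. Account, key and names are placeholders.
from azure.cosmos import CosmosClient, exceptions

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("appdb").get_container_client("orders")

try:
    container.create_item({"id": "order-1", "customerId": "hot-customer"})
    headers = container.client_connection.last_response_headers
    print("request charge:", headers.get("x-ms-request-charge"), "RU")
except exceptions.CosmosHttpResponseError as err:
    if err.status_code == 429:
        # Throttled: this partition's share of the RUs is exhausted,
        # even though the collection-wide budget may not be.
        print("rate limited on a hot partition")
    else:
        raise
```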
for a partitioned collection, does the 10GB storage limit apply to each partition?
That is correct. Each partition in a partitioned collection can be a maximum of 10GB in size.
does the throughput value, e.g. 400 RU/s, apply to each partition rather than the collection?
The throughput is at the collection level and not at the partition level. Furthermore, the minimum RU/s for a partitioned collection is 2,500 RU/s and not 400 RU/s; 400 RU/s is the default for a non-partitioned collection.

Microsoft Azure DocumentDb Maximum Storage Capacity

So I am hitting the 10 GB maximum storage capacity for Azure DocumentDb every few months.
I noticed recently that Microsoft has added a partitioned mode that raises the maximum storage capacity to 250 GB, but the problem is that the minimum throughput (RU/s) is 10,100, which jumps the price to ~$606 a month from around $25 a month.
Is there any way to increase storage capacity while keeping the throughput around 400?
Without using partitioned collections, you'd need to create multiple non-partitioned collections as needed. The SDKs have partitioning support (or you can shard data across those collections as you see fit).
EDIT: Please note that the minimum RU for partitioned collections is no longer 10,100 RU (as mentioned in the original question). It's now 400 RU (as of late 2018).
