Microsoft Azure DocumentDb Maximum Storage Capacity

So I am hitting the 10 GB maximum storage capacity for Azure DocumentDb every few months.
I noticed recently that Microsoft has added a partitioned mode that raises the maximum storage capacity to 250 GB, but the problem is that the minimum throughput is 10,100 RU/s, which jumps the price to ~$606 a month from around $25 a month.
Is there any way to increase storage capacity while keeping the throughput around 400?

Without using partitioned collections, you'd need to create multiple non-partitioned collections as your data grows. The SDKs have partitioning support (or you can shard data across collections yourself, as you see fit).
EDIT: Please note that the minimum RU for partitioned collections is no longer 10,100 RU (as mentioned in the original question). It's now 400 RU (as of late 2018).
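If you do end up sharding across multiple non-partitioned collections yourself (as suggested above), a minimal client-side routing sketch might look like this. The collection names, the helper name, and the choice of MD5 hashing are all illustrative, not part of any SDK:

```python
# Client-side sharding sketch: pick one of several fixed-size collections by
# hashing the partition key. Collection names here are hypothetical.
import hashlib

COLLECTIONS = ["users-00", "users-01", "users-02"]

def collection_for(partition_key: str) -> str:
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return COLLECTIONS[int(digest, 16) % len(COLLECTIONS)]

print(collection_for("tenant42@example.com"))  # every document for this key routes to the same collection
```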

Related

Does a Cosmos DB physical partition hold 50GB or 100GB?

According to Microsoft's documentation on physical partitions, each one can hold 50 GB and supply 10,000 RU/s of throughput, and provisioning physical partitions involves initialising the container in increments of 10,000 RU/s of max scale.
However, their documentation on autoscale and storage limits claims that a max scale of 50,000 RU/s can hold 500 GB of data, double what the first statement says 5 partitions should be able to hold.
These two statements seem to be in conflict with each other. Does each partition actually hold 100GB, not 50GB?
It is still 50 GB. I believe what the second link does not spell out is the number of physical partitions that would be created.
Each physical partition has a limit of 10,000 RU and 50 GB, so if your storage is 500 GB (and max throughput is 50,000 RU), there would be 10 physical partitions, with each partition getting 5,000 RU and holding up to 50 GB.
From this link:
When you first select the max RU/s, Azure Cosmos DB will provision: Max RU/s / 10,000 RU/s = # of physical partitions. Each physical partition can support up to 10,000 RU/s and 50 GB of storage. As storage size grows, Azure Cosmos DB will automatically split the partitions to add more physical partitions to handle the storage increase, or increase the max RU/s if storage exceeds the associated limit.
The maximum size of a physical partition is 50GB.
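A quick arithmetic sketch of that rule, using the 50,000 RU/s / 500 GB example from the question (plain Python, nothing Cosmos-specific; the variable names are just for illustration):

```python
# Rough arithmetic for autoscale physical-partition provisioning, based on the
# rule quoted above: 10,000 RU/s and 50 GB per physical partition.
max_rus = 50_000          # autoscale max RU/s chosen for the container
storage_gb = 500          # current data size

# 50,000 / 10,000 = 5 partitions by throughput, but 500 GB / 50 GB = 10 by storage,
# so Cosmos DB splits until both limits are respected.
physical_partitions = max(max_rus // 10_000, -(-storage_gb // 50))
rus_per_partition = max_rus / physical_partitions   # throughput is spread evenly

print(physical_partitions, rus_per_partition)        # 10 5000.0
```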
What your second article is describing is how the minimum RU/s is automatically adjusted based on the size of your container or database. The minimum RU/s is the largest of 400 RU/s, the size in GB * 10 RU/s, and 1/100 of the maximum throughput you've ever set.
A few examples of what that means for different database sizes:
1GB -> 400RU/s
30GB -> 400RU/s
50GB -> 500RU/s
500GB -> 5000RU/s
600GB -> 6000RU/s
(examples assume you don't participate in a special program)
When using autoscale you can define a lower/upper limit with a 10x difference between them. However, you still have to consider the previous rule: once you grow from 500 GB to 600 GB in size, your autoscale rule also automatically adjusts its lower and upper limits, following the ruleset above based on the lower limit.
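Here is a small sketch of that minimum-throughput rule; the 10 RU/s-per-GB figure and the 1/100 factor are the ones described above, and the function name is just for illustration:

```python
# Minimum RU/s for a container, per the rule described above: the largest of
# 400, storage-in-GB * 10, and 1/100 of the highest RU/s ever provisioned.
def minimum_rus(storage_gb: float, highest_ever_provisioned: float = 400) -> float:
    return max(400, storage_gb * 10, highest_ever_provisioned / 100)

for size in (1, 30, 50, 500, 600):
    print(size, "GB ->", minimum_rus(size), "RU/s")
# 1 GB -> 400, 30 GB -> 400, 50 GB -> 500, 500 GB -> 5000, 600 GB -> 6000
```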

How to obtain full collection throughput in a CosmosDB Multi-Master environment where partitionKey already has high cardinality?

Overview
I have a user registration / onboarding flow that I am currently trying to optimise and better understand before scaling out to much larger load tests.
Test Collection: (500 RU)
PartitionKey: tenant_email
Multi-Master: 5 Regions
Below are the single-region statistics on a database with only one region.
Step 1 - Register new user (10.17 RU)
Step 2 - Update some data (3.4 RU)
Step 3 - Create a subscription (13.23 RU)
Step 4 - Update some data (3.43 RU)
Step 4 - Update some data (3.43 RU)
Step 5 - Update some data (3.83 RU)
Step 6 - Refresh access token (3.13RU)
Total: ~40.5 RU per onboard
Problem
Expected throughput: ~12 registrations per second (84 req/sec)
Actual throughput: Heavy rate limiting at ~3 registrations per second (21 req/sec). At ~40 RU per onboard, this seems like I'm only getting 120 RU of utilisation out of the 500?
The storage distribution is shown below, and the partitionKey should be unique enough to evenly distribute load over the collection and maximise throughput, so I'm not sure why the Max Consumed RU/s is so high.
Storage distribution for the collection and chosen partitionKey looks to be evenly distributed.
Update - Under utilisation
Here is a screenshot showing a collection with a single 500 RU partition. You can clearly see from this that the max consumed RU per partition sat around ~350 the whole time, yet there was heavy rate limiting even though we never hit 500 RU/s.
Your rate-limiting is likely because you don't have access to all 500 RU in a single physical partition.
Take a close look at your 2nd graph, which has a hint to what's likely going on:
Collection UsersTest has 5 partition key ranges. Provisioned throughput is evenly distributed across these partitions (100 RU/s per partition).
Under the covers, Cosmos DB creates a set of physical partitions, and your RU are divided across those physical partitions. In your case, Cosmos DB created 5 physical partitions.
Logical partitions may be mapped to any of your 5 physical partitions. So it's possible that, during your test, more than one logical partition mapped to the same physical partition. And given that each physical partition would top out at roughly 2-3 registrations per second, this likely explains why you're seeing throttling.
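Some back-of-the-envelope arithmetic with the numbers from the question (the ~40.5 RU per onboard figure is the questioner's own measurement):

```python
# Per-partition throughput arithmetic for the scenario above.
provisioned_rus = 500
physical_partitions = 5            # "5 partition key ranges" from the portal message
ru_per_onboard = 40.5              # measured cost of the whole onboarding flow

ru_per_partition = provisioned_rus / physical_partitions        # 100 RU/s per physical partition
onboards_per_partition = ru_per_partition / ru_per_onboard      # ~2.5 per second per partition
onboards_whole_collection = provisioned_rus / ru_per_onboard    # ~12.3 per second if load were perfectly spread

print(round(onboards_per_partition, 1), round(onboards_whole_collection, 1))  # 2.5 12.3
```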

DocumentDB Throughput Scale down from 1000 to 400

At first, I created a DocumentDB collection with a throughput of 2000. After I realized that it was too high, I tried to reduce it by clicking the (-) button. It went down to 1000, but beyond that it doesn't work. I want to set it to 400.
Unfortunately it is not possible to do so, considering your collection is of unlimited size. Minimum throughput for such a collection is 1000 RU/s (earlier it was 2500 RU/s).
If you want to have a collection with 400 RU/s, please create a new collection with fixed storage capacity (currently 10GB) and migrate the data from your existing collection to this new collection.
There were some updates to the performance levels of Cosmos DB recently. The previous S1, S2, S3 performance levels have been retired. Now there are only two types of containers, fixed and unlimited. As Gaurav mentioned, the RU/s range for fixed is 400 - 10,000, and for unlimited it is 1,000 - 100,000. And unlimited must have a partition key, while fixed doesn't have to.
More details can be found in this newly updated document: Partition and scale in Azure Cosmos DB. Based on this document, my suggestion would be that even if you go and create a fixed container, you should still give it 1,000 RU/s and a partition key, so you don't have to do a migration in the future in case your data or load grows beyond the capacity of the fixed container.
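For reference, creating a partitioned container with explicit throughput looks roughly like this with the current azure-cosmos (v4) Python SDK, which postdates the SDKs in use when this question was asked; the account URL, key, and ids below are placeholders:

```python
# Minimal sketch: create a database and a partitioned container with 1,000 RU/s
# using the azure-cosmos (v4) Python SDK. All names and credentials are placeholders.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
database = client.create_database_if_not_exists(id="appdb")

container = database.create_container_if_not_exists(
    id="items",
    partition_key=PartitionKey(path="/tenantId"),  # pick a key with an even access pattern
    offer_throughput=1000,                         # RU/s provisioned on the container
)
```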

Throughput value for Azure Cosmos db

I am confused about how partitioning affects the size limit and throughput value for Azure Cosmos DB (in our case, we are using DocumentDB). If I understand the documentation correctly:
for a partitioned collection, the 10G storage limit applies to each partition?
the throughput value ex. 400RU/S applies to each partition, not collection?
Whether you use single-partition collections or multi-partition collections, each partition can be up to 10 GB. This means that a single-partition collection cannot exceed that size, whereas a multi-partition collection can.
Taken from Azure Cosmos DB FAQ:
What is a collection?
A collection is a group of documents and their associated JavaScript application logic. A collection is a billable entity, where the cost is determined by the throughput and used storage. Collections can span one or more partitions or servers and can scale to handle practically unlimited volumes of storage or throughput.
Collections are also the billing entities for Azure Cosmos DB. Each collection is billed hourly, based on the provisioned throughput and used storage space. For more information, see Azure Cosmos DB Pricing.
Billing is per collection, where one collection can have one or more partitions. Since Azure allocates partitions to host your collection, the amount of RUs needs to be per collection. Otherwise a customer with lots and lots of partitions would get way more RUs than a different customer who has an equal collection but far fewer partitions.
For more info, see the bold text in the quote below:
Taken from Azure Cosmos DB Pricing:
Provisioned throughput
At any scale, you can store data and provision throughput capacity. Each container is billed hourly based on the amount of data stored (in GBs) and throughput reserved in units of 100 RUs/second, with a minimum of 400 RUs/second. Unlimited containers have a minimum of 100 RUs/second per partition.
Taken from Request Units in Azure Cosmos DB:
When starting a new collection, table or graph, you specify the number of request units per second (RU per second) you want reserved. Based on the provisioned throughput, Azure Cosmos DB allocates physical partitions to host your collection and splits/rebalances data across partitions as it grows.
The other answers here provide a great starting point on throughput provisioning but fail to touch on an important point that doesn't get mentioned often in the docs.
Your throughput is actually divided across the number of physical partitions in your collection. So for a multi-partition collection provisioned for 1000 RU/s with 10 physical partitions, it's actually 100 RU/s per partition. So if you have hot partitions that get accessed more frequently, you'll receive throttling errors even though you haven't exceeded the total RU assigned to the collection.
For a single-partition collection you obviously get the full RU assigned to that partition, since it's the only one.
If you're using a multi-partition collection you should strive to pick a partition key that has an even access pattern, so that your workload can be evenly distributed across the underlying partitions without bottlenecking.
for a partitioned collection, the 10G storage limit applies to each partition?
That is correct. Each partition in a partitioned collection can be a maximum of 10GB in size.
the throughput value ex. 400 RU/s applies to each partition, not collection?
The throughput is at the collection level and not at the partition level. Further, the minimum RU/s for a partitioned collection is 2500 RU/s and not 400 RU/s; 400 RU/s is the default for a non-partitioned collection.

DocumentDB User Defined Performance Pricing

I'm looking into moving to the new partitioned collections for DocumentDB and have a few questions that the documentation and pricing calculator seem to be a little unclear on.
PRICING:
In the below scenario my partitioned collection will be charged $30.02/mo at 1GB of data with a constant hourly RU use of 500:
So does this mean that if my users only hit the data at an average of 500 RUs for about 12 hours per day, which means that HALF the time my collection goes UNUSED but is still RUNNING and AVAILABLE (not shut down), the price goes down to $15.13/mo as the calculator indicates here:
Or will I be billed the full $30.01/mo since my collection was up and running?
I get confused when I go to the portal and see an estimate for $606/mo with no details behind it when I attempt to spin up the lowest options on a partition collection:
Is the portal just indicating the MAXIMUM that I COULD be billed that month if I use all my allotted 10,100 RU's a second every second of the hour for 744 consecutive hours?
If billing is based on hourly use, and the average RUs used go down to 100 in some of the hours in the second scenario, does the cost go down even further? Does Azure billing for partitioned collections fluctuate based on hourly usage, rather than total up time like the existing S1/S2/S3 tiers?
If so, then how does the system determine what is billed for that hour? If for most of the hour the RUs used are 100/sec, but for a few seconds that hour it spikes to 1,000, does it average out by the seconds across that entire hour and only charge me for something like 200-300 RUs for that hour, or will I be billed for the highest RUs used that hour?
PERFORMANCE:
Will I see a performance hit by moving to this scenario since my data will be on separate partitions and require partition id/key to access? If so what can I expect, or will it be so minimal that it would be undetected by my users?
RETRIES & FAULT HANDLING:
I'm assuming the TransientFaultHandling Nuget package I use in my current scenario will still work on the new scenario, but may not be used as much since my RU capacity is much larger, or do I need to rethink how I handle requests that go over the RU cap?
So the way that pricing works for Azure DocumentDB is that you pay to reserve a certain amount of data storage (in GB) and/or throughput (in request units, RU). These charges apply for every hour that the reservation is in place (usage is not required). Additionally, just having a DocumentDB account active is deemed to be an active S1 subscription until a documentDB gets created, and then the pricing of your db takes over. There are two options available:
Option 1 (Original Pricing)
You can choose between S1, S2 or S3, each offering the same 10 GB of storage but varying in throughput: 250 RU / 1000 RU / 2500 RU.
Option 2 (User-defined performance)
This is the new pricing structure, which better decouples size and throughput. This option additionally provides for partitioning. Note that with user-defined performance you are charged per GB of data storage used (pay-as-you-go storage).
With user-defined performance levels, storage is metered based on consumption, but with pre-defined performance levels, 10 GB of storage is reserved at the time of collection creation.
Single Partition Collection
The minimum is set at 400RU and 1GB of data storage.
The maximum is set at 10,000RU and 250GB of data storage.
Partitioned Collections
The minimum is set at 10,000RU and 1GB of data storage.
The maximum is set at 250,000RU and 250GB of data storage (EDIT: you can request more).
So at a minimum you will be paying the cost per hour related to the option you selected. The only way to not pay for an hour would be to delete the db and the account, unfortunately.
Cost of Varying RU
In terms of varying your RU within the time frame of one hour, you are charged for that hour at the cost of the peak reserved RU you requested. So if you were at 400 RU and you up it to 1000 RU for one second, you will be charged at the 1000 RU rate for that hour, even if for the other 59 minutes and 59 seconds you set it back to 400 RU.
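As a rough illustration of where the ~$606/mo portal estimate for 10,100 RU/s comes from, and of the peak-per-hour rule: the ~$0.008 per 100 RU/s per hour rate below is inferred from the question's own numbers, not taken from an official price sheet.

```python
# Back-of-the-envelope hourly billing for reserved throughput. The rate constant
# is an assumption inferred from the ~$606/month figure in the question.
rate_per_100_rus_per_hour = 0.008   # USD, assumed
hours_in_month = 744                # a 31-day month

def monthly_cost(peak_rus_per_hour):
    # Each hour is billed at that hour's peak reserved RU/s.
    return sum(peak / 100 * rate_per_100_rus_per_hour for peak in peak_rus_per_hour)

# Reserving 10,100 RU/s around the clock:
print(round(monthly_cost([10_100] * hours_in_month), 2))   # 601.15, roughly the portal's estimate

# Spiking to 1,000 RU/s for one second in a single hour bills that whole hour at 1,000:
print(round(monthly_cost([1_000] + [400] * (hours_in_month - 1)), 2))  # 23.86
```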
Will I see a performance hit by moving to this scenario since my data will be on separate partitions and require partition id/key to access?
On the topic of a performance hit, there are a few things to think about, but in general no.
If you have a sane partition key with enough values you should not see a performance penalty. This means that you need to partition data so that you have the partition key available when querying, and you need to keep the data you want from a query in the same partition by using the same partition key.
If you do queries without a partition key, you will see a severe penalty, as the query is parsed and executed per partition.
One thing to keep in mind when selecting a partition key is the limits for each partition, which are 10GB and 10K RU. This means that you want an even distribution over the partitions in order to avoid a "hot" partition, which means that even if you scale to more than enough RU in total, you may receive 429s for a specific partition.
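Relatedly, for the retry question above: a rough sketch of handling 429s manually with the current azure-cosmos (v4) Python SDK might look like the following. The SDK already retries a configurable number of times before surfacing the error, so this is illustrative rather than required, and production code should honor the retry-after interval returned by the service instead of a fixed back-off.

```python
# Illustrative retry-on-429 wrapper, assuming the azure-cosmos (v4) Python SDK.
import time
from azure.cosmos import exceptions

def upsert_with_retry(container, item, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return container.upsert_item(item)
        except exceptions.CosmosHttpResponseError as err:
            # Only retry "request rate too large" responses; re-raise everything else.
            if err.status_code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(0.5 * (attempt + 1))  # simple linear back-off between attempts
```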
