Azure Table storage: Partitioning strategy and performance targets

I am trying to come up with a partitioning strategy for Azure Table storage.
I use Scalability and performance targets for Table storage as the primary source for my performance estimates. The document provides two numbers:
Maximum request rate per storage account: 20,000 transactions per second, which assumes a 1-KiB entity size
Target throughput for a single table partition (1-KiB entities): up to 2,000 entities per second
Does this mean that I won't get much throughput benefit from having more than 10-ish partitions, since total TPS is capped at 10x the single-partition TPS limit?
I am also wondering whether the doc uses "transactions" and "entities" interchangeably. Does the first number apply to a batch transaction as a whole, or for batches should I divide 20,000 by the number of entities in a batch?

It is true that having more than 10 partitions will not give you throughput benefits, but that is only if all 10 partitions are running at max throughput, i.e., 2,000 TPS each. If any partition is not running at max throughput, then you will be under-utilizing your table storage. This is why it is recommended HERE that:
For optimal load balancing of traffic, you should use more partitions so that Azure Table storage can distribute the partitions to more partition servers.
You only need to worry about the partitioning/partition key; Azure handles the load balancing up to the max throughput per partition/storage account.
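As a rough illustration, here is a minimal Python sketch (assuming the azure-data-tables package; the connection string, table name, and sample data are all hypothetical) of spreading writes across many partition keys so the service can balance them across partition servers:

from azure.data.tables import TableServiceClient

# Hypothetical connection string; substitute your storage account's.
service = TableServiceClient.from_connection_string("<connection-string>")
table = service.create_table_if_not_exists("Telemetry")

# Illustrative sample data.
readings = [
    {"device": 1, "ts": 1700000000, "value": 21.5},
    {"device": 2, "ts": 1700000001, "value": 19.8},
]

# Spread entities across many partition keys (here, by device) rather than
# a single constant key, so writes are not capped by one partition's
# ~2,000 entities/sec target.
for r in readings:
    table.create_entity({
        "PartitionKey": f"device-{r['device']}",  # many distinct values
        "RowKey": str(r["ts"]),
        "value": r["value"],
    })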

Related

Throughput Units and Partition Count

I have a question regarding partition count in relation to TUs. We have the configuration below and 3 TUs for the namespace. Will the number of partitions for each event hub have an impact, and should we just set the partition count to 32 for better performance? FYI, we are using the Standard plan and kept the partition count higher for the first one because it receives more messages. We also use the batch method to send messages to Event Hubs.
There is a potential issue with having 3 TUs: if the namespace has 3 TUs, then in one minute the maximum ingress is 1 MB * 60 * 3 = 180 MB/minute, but in the table you posted the total size is larger than 180 MB (109 + 58 + 39).
As for TUs and partition count, you should take a look at How many partitions do I need? and Partitions. You can follow the guidance below from those articles:
We recommend that you balance 1:1 throughput units and partitions to achieve optimal scale. A single partition has a guaranteed ingress and egress of up to one throughput unit. While you may be able to achieve higher throughput on a partition, performance is not guaranteed. This is why we strongly recommend that the number of partitions in an event hub be greater than or equal to the number of throughput units.
Plan for a max of 1 MB/sec per partition. In other words, think of each partition as an individual stream that can process at most 1 MB/sec of traffic. That said, your current configuration looks alright to me. However, you can still consider increasing the partition count depending on your traffic growth trajectory.
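As a minimal illustration of the send side, here is a Python sketch (assuming the azure-eventhub package; the connection string and hub name are placeholders):

from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection string and event hub name.
producer = EventHubProducerClient.from_connection_string(
    "<connection-string>", eventhub_name="hub1")

with producer:
    # With no partition_id/partition_key set, the service spreads events
    # across partitions; each partition sustains ~1 MB/sec, so aggregate
    # throughput scales with the partition count up to your TUs.
    batch = producer.create_batch()
    for i in range(100):
        batch.add(EventData(f"message {i}"))
    producer.send_batch(batch)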

Partition key for a lookup table in Azure Cosmos DB Tables

I have a very simple lookup table I want to call from an Azure function.
Schema is incredibly simple:
Name | Value 1 | Value 2
Name will be unique, but value 1 and value 2 will not be. There is no other data in the lookup table.
For an Azure Table you need a partition key and a row key. Obviously the RowKey would be the Name field.
What exactly should I use for Partition Key?
Right now, I'm using a constant because there won't be a ton of data (maybe a couple hundred rows at most), but using a constant seems to go against the point.
This answer applies to all Cosmos DB containers, including Tables.
When does it make sense to store your Cosmos DB container in a single partition (use a constant as the partition key)?
If you are sure the data size of your container will always remain well under 10GB.
If you are sure the throughput requirement for your container will always remain under 10,000 RU/s (RU per second).
If either of the above conditions is false, or if you are not sure about future growth in data size or throughput requirements, then using a partition key based on the guidelines below will allow the container to scale.
How partitioning works in Cosmos DB
Cosmos groups container items into a set of logical partitions based on the partition key. These logical partitions are then mapped to physical partitions. A physical partition is the unit of compute/storage which makes up the underlying database infrastructure.
You determine how your data is split into logical partitions by your choice of partition key. You have no control over how your logical partitions are mapped to physical partitions; Cosmos handles this automatically and transparently.
Distributing your container across a large number of physical partitions is the way Cosmos allows the container to scale to virtually unlimited size and throughput.
Each logical partition can contain a maximum of 10 GB of data. An unpartitioned container can have a maximum throughput of 10,000 RU/s, which implies a limit of 10,000 RU/s per logical partition.
The RU/s allocated to your container are evenly split across all physical partitions hosting the container's data. For instance, if your container has 4,000 RU/s allocated and its logical partitions are spread across 4 physical partitions, then each physical partition will have 1,000 RU/s allocated to it. This also means that if one of your physical partitions is under heavy load, or 'hot', it will get rate-limited at 1,000 RU/s, not at 4,000. This is why it is very important to choose a partition key that spreads your data, and access to the data, evenly across partitions.
If your container is in a single logical partition, it will always be mapped to a single physical partition and the entire allocation of RU/s for the container will always be available.
All Cosmos DB transactions are scoped to a single logical partition, and the execution of a stored procedure or trigger is also scoped to a single logical partition.
How to choose a good partition key
Choose a partition key that will evenly distribute your data across logical partitions, which in turn will help ensure the data is evenly mapped across physical partitions. This will prevent 'bottleneck' or 'hot' partitions which will cause rate-limiting and may increase your costs.
Choose a partition key that will be the filter criteria for a high percentage of your queries. By providing the partition key as a filter to your query, Cosmos can efficiently route the query to the correct partition, as shown in the sketch below. If the partition key is not supplied, the result is a 'fan out' query, which is sent to all partitions, increasing your RU cost and potentially hindering performance. If you frequently filter based on multiple fields, see this article for guidance.
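To make the routing difference concrete, here is a minimal Python sketch (assuming the azure-cosmos package; the endpoint, key, database, and container names are placeholders):

from azure.cosmos import CosmosClient

# Placeholder endpoint, key, and names for illustration.
client = CosmosClient("<endpoint>", credential="<key>")
container = client.get_database_client("appdb").get_container_client("users")

# Supplying the partition key lets Cosmos route the query to a single
# partition, keeping the RU charge low.
scoped = container.query_items(
    query="SELECT * FROM c WHERE c.tenant = @t",
    parameters=[{"name": "@t", "value": "tenant-42"}],
    partition_key="tenant-42",
)

# Without it, the query fans out to every partition, costing more RUs.
fanned_out = container.query_items(
    query="SELECT * FROM c WHERE c.email = 'a@b.com'",
    enable_cross_partition_query=True,
)

# Both are lazy iterables; iterating them executes the queries.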
Summary
The primary purpose of partitioning your containers in Cosmos DB is to allow the containers to scale in terms of both storage and throughput.
Small containers which will not grow significantly in data size or throughput requirements can use a single partition.
Large containers, or containers expected to grow in data size or throughput requirements should be partitioned using a well chosen partition key.
The choice of partition key is critical and may significantly impact your ability to scale, your RU cost and the performance of your queries.

Why does varying blob size give different performance?

My Cassandra table looks like this:
CREATE TABLE cs_readwrite.cs_rw_test (
part_id bigint,
s_id bigint,
begin_ts bigint,
end_ts bigint,
blob_data blob,
PRIMARY KEY (part_id, s_id, begin_ts, end_ts)
) WITH CLUSTERING ORDER BY (s_id ASC, begin_ts DESC, end_ts DESC)
When I insert 1 million rows per client, with an 8 KB blob per row, and test the insertion speed from different client hosts, the speed is almost constant at ~100 Mbps. But with the same table definition, from the same client hosts, if I insert rows with 16 bytes of blob data, my speed numbers are dramatically lower, ~4 to 5 Mbps. Why is there such a speed difference? I am only measuring write speeds for now. My main concern is not speed (though some inputs would help): when I add more clients, the speed stays almost constant for the bigger blob size, but for the 16-byte blob the speed increases only by 10-20% per added client before it becomes constant.
I have also looked at the bin/nodetool tablehistograms output and adjusted the number of partitions in my test data so that no partition is > 100 MB.
Any insights/links to documentation would be helpful. Thanks!
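For context, the write loop in this kind of test is essentially the following minimal Python sketch (assuming the cassandra-driver package; the contact point and row counts are illustrative):

from cassandra.cluster import Cluster
import os
import time

cluster = Cluster(["127.0.0.1"])  # illustrative contact point
session = cluster.connect("cs_readwrite")

insert = session.prepare(
    "INSERT INTO cs_rw_test (part_id, s_id, begin_ts, end_ts, blob_data) "
    "VALUES (?, ?, ?, ?, ?)")

blob = os.urandom(8 * 1024)  # 8 KB payload; use os.urandom(16) to compare
rows = 100_000               # illustrative row count
start = time.time()
for i in range(rows):
    session.execute(insert, (i % 1000, i, i, i + 1, blob))
elapsed = time.time() - start
print(f"{rows / elapsed:,.0f} rows/sec, "
      f"{rows * len(blob) / elapsed / 1e6:.1f} MB/sec of payload")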
I think you are measuring throughput in the wrong way. Throughput should be measured in transactions per second, not in data written per second.
Although the amount of data written can play a role in determining the write throughput of a system, throughput usually depends on many other factors:
Compaction strategy: STCS is write-optimized, whereas LCS is read-optimized.
Connection speed and latency, both between the client and the cluster and between machines in the cluster.
CPU usage of the node that is processing data, sending data to other replicas, and waiting for their acknowledgment.
Most writes are immediately written to memory instead of going directly to disk, which makes the impact of the amount of data being written on final write throughput almost negligible, whereas fixed costs like network delay and the CPU needed to coordinate the processing of data across nodes have a bigger impact.
The way you should see it is that with an 8 KB payload you get X transactions per second and with 16 bytes you get Y transactions per second. Y will always be better than X, but it will not be linearly proportional to the size difference.
You can find how writes are handled in cassandra explained in detail here.
There's management overhead in Cassandra per row/partition; the more data (in bytes) you have in each row, the less that overhead impacts throughput in bytes/sec. The reverse is true if you look at rows per second as your throughput metric: the larger the payloads, the worse your rows/sec throughput gets.
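A back-of-the-envelope model of that overhead effect, in Python (the per-row overhead and bandwidth constants are invented purely for illustration):

# Model: time per row = fixed per-row overhead + payload transfer time.
PER_ROW_OVERHEAD_S = 1e-4    # assumed coordination cost per row (made up)
PAYLOAD_BANDWIDTH = 100e6    # assumed bytes/sec on the payload path (made up)

def modelled_throughput(payload_bytes: int) -> tuple[float, float]:
    """Return (rows/sec, payload MB/sec) under the simple model."""
    seconds_per_row = PER_ROW_OVERHEAD_S + payload_bytes / PAYLOAD_BANDWIDTH
    rows_per_sec = 1 / seconds_per_row
    return rows_per_sec, rows_per_sec * payload_bytes / 1e6

for size in (16, 8 * 1024):
    rows_s, mb_s = modelled_throughput(size)
    print(f"{size:>5} B payload: {rows_s:>7,.0f} rows/sec, {mb_s:6.2f} MB/sec")

# Small payloads achieve more rows/sec but far fewer bytes/sec, because
# the fixed per-row cost dominates the transfer time.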

How to obtain full collection throughput in a CosmosDB Multi-Master environment where partitionKey already has high cardinality?

Overview
I have a user registration / onboarding flow that I am currently trying to optimise and better understand before scaling out to much larger load tests.
Test Collection: (500 RU)
PartitionKey: tenant_email
Multi-Master: 5 Regions
Below are the single-region statistics, from a database with only one region.
Step 1 - Register new user (10.17 RU)
Step 2 - Update some data (3.4 RU)
Step 3 - Create a subscription (13.23 RU)
Step 4 - Update some data (3.43 RU)
Step 4 - Update some data (3.43 RU)
Step 5 - Update some data (3.83 RU)
Step 6 - Refresh access token (3.13RU)
Total: ~40.5 RU per onboard
Problem
Expected throughput: ~12 registrations per second (84 req/sec)
Actual throughput: heavy rate limiting at ~3 registrations per second (21 req/sec). At ~40 RU per onboard, it seems like I'm only getting ~120 RU/s of utilisation out of the 500?
Given the storage distribution below, the partitionKey should be unique enough to evenly distribute load over the collection and maximise throughput, so I am not sure why the Max Consumed RU/s is so high.
Storage distribution for the collection and chosen partitionKey looks to be evenly distributed.
Update - Under utilisation
Here is a screenshot showing a collection with a single 500 RU partition. You can clearly see that the max consumed RU per partition sat around ~350 the whole time, yet notice the heavy rate limiting even though we never hit 500 RU/s.
Your rate-limiting is likely because you don't have access to all 500 RU in a single physical partition.
Take a close look at your 2nd graph, which has a hint to what's likely going on:
Collection UsersTest has 5 partition key ranges. Provisioned throughput is evenly distributed across these partitions (100 RU/s per partition).
Under the covers, Cosmos DB creates a set of physical partitions, and your RUs are divided across those physical partitions. In your case, Cosmos DB created 5 physical partitions.
Logical partitions may be mapped to any of your 5 physical partitions. So it's possible that, during your test, more than one logical partition mapped to the same physical partition. And given that each physical partition would top out at roughly 2-3 registrations per second, this likely explains why you're seeing throttling.
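The arithmetic behind that estimate, as a quick Python check (numbers taken from the question above):

provisioned_ru = 500
physical_partitions = 5
ru_per_partition = provisioned_ru / physical_partitions  # 100 RU/s each

ru_per_onboard = 40.5  # total RU of the onboarding requests

# What one hot physical partition can sustain:
print(ru_per_partition / ru_per_onboard)  # ~2.5 registrations/sec
# What the full 500 RU/s would allow if one partition could use it all:
print(provisioned_ru / ru_per_onboard)    # ~12.3 registrations/sec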

Throughput value for Azure Cosmos db

I am confused about how partitioning affects the size limit and throughput value for Azure Cosmos DB (in our case, we are using DocumentDB). If I understand the documentation correctly:
for a partitioned collection, does the 10 GB storage limit apply to each partition?
does the throughput value, e.g. 400 RU/s, apply to each partition rather than the collection?
Whether you use single-partition collections or multi-partition collections, each partition can be up to 10 GB. This means that a single-partition collection cannot exceed that size, whereas a multi-partition collection can.
Taken from Azure Cosmos DB FAQ:
What is a collection?
A collection is a group of documents and their associated JavaScript application logic. A collection is a billable entity, where the cost is determined by the throughput and used storage. Collections can span one or more partitions or servers and can scale to handle practically unlimited volumes of storage or throughput.
Collections are also the billing entities for Azure Cosmos DB. Each collection is billed hourly, based on the provisioned throughput and used storage space. For more information, see Azure Cosmos DB Pricing.
Billing is per collection, where one collection can have one or more partitions. Since Azure allocates partitions to host your collection, the number of RUs needs to be per collection. Otherwise a customer with lots and lots of partitions would get far more RUs than another customer with an equivalent collection but far fewer partitions.
For more info, see the bold text in the quote below:
Taken from Azure Cosmos DB Pricing:
Provisioned throughput
At any scale, you can store data and provision throughput capacity. Each container is billed hourly based on the amount of data stored (in GBs) and throughput reserved in units of 100 RUs/second, with a minimum of 400 RUs/second. Unlimited containers have a minimum of 100 RUs/second per partition.
Taken from Request Units in Azure Cosmos DB:
When starting a new collection, table or graph, you specify the number of request units per second (RU per second) you want reserved. Based on the provisioned throughput, Azure Cosmos DB allocates physical partitions to host your collection and splits/rebalances data across partitions as it grows.
The other answers here provide a great starting point on throughput provisioning but fail to touch on an important point that doesn't get mentioned often in the docs.
Your throughput is actually divided across the number of physical partitions in your collection. So for a multi-partition collection provisioned at 1,000 RU/s with 10 physical partitions, it's actually 100 RU/s per partition. If you have hot partitions that get accessed more frequently, you'll receive throttling errors even though you haven't exceeded the total RU assigned to the collection.
For a single-partition collection, you obviously get the full RU allocation, since it's the only partition.
If you're using a multi-partition collection, you should strive to pick a partition key with an even access pattern, so that your workload can be evenly distributed across the underlying partitions without bottlenecking.
for a partitioned collection, the 10G storage limit applies to each partition?
That is correct. Each partition in a partitioned collection can be a maximum of 10GB in size.
the throughput value ex. 400 RU/s applies to each partition, not collection?
The throughput is at the collection level and not at the partition level. Furthermore, the minimum RU/s for a partitioned collection is 2,500 RU/s, not 400 RU/s; 400 RU/s is the default for a non-partitioned collection.
