How does Cosmos DB distribute throughput when provisioned per database (shared RUs)?

I have a problem understanding how the distribution of provisioned throughput works if I set up Cosmos DB to use shared RUs, i.e. setting RUs at the database level.
I know that when throughput is set at the container (collection) level, it is divided between logical partitions, e.g. if a collection has a provisioned throughput of 400 RU/s and 10 logical partitions, then the throughput for each partition is 400/10 = 40 RU/s.
But what about when throughput is set per database?
The only documentation I found is https://learn.microsoft.com/en-us/azure/cosmos-db/set-throughput#set-throughput-on-a-database
As far as I can tell, the difference is that physical partitions are not dedicated to a single container but can host logical partitions from different containers. Does this mean that the throughput is divided between all logical partitions of all collections/containers?
For example: I have a database with a throughput of 1000 RU/s and 2 collections, one with 3 logical partitions and a second with 7 logical partitions. Is the throughput divided as 1000 / (3 + 7) = 100 RU/s for each logical partition?
OR
Is the throughput reserved for all collections/partitions in total? E.g. there is a database with 1000 RU/s, and some logical partitions use 800 RU/s while others use 200 RU/s (no matter which collection); is that OK as long as they don't exceed 1000 RU/s in total?
Maybe the question, in short, is: is shared throughput distributed evenly between logical partitions (the same as when it is set at the collection level), or not (somehow differently)?

Thanks for your feedback. If you are configuring the throughput at the database level, you cannot guarantee the throughput for each container in it, unless you configure the throughput for the containers as well.
All containers created inside a database with provisioned throughput must be created with a partition key. At any given point of time, the throughput allocated to a container within a database is distributed across all the logical partitions of that container. When you have containers that share provisioned throughput configured on a database, you can't selectively apply the throughput to a specific container or a logical partition.
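For illustration only, here is roughly what that looks like with the azure-cosmos Python SDK; the endpoint, key, database name, and container names below are placeholders, not anything from the question:
```python
# Sketch: database-level (shared) throughput with the azure-cosmos Python SDK.
# Endpoint, key, and all names are placeholders.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                      credential="<your-key>")

# 1000 RU/s are provisioned on the database and shared by its containers.
database = client.create_database_if_not_exists(id="shared-db",
                                                offer_throughput=1000)

# Containers created in a shared-throughput database must define a partition key.
# Neither container gets a guaranteed slice of the 1000 RU/s.
orders = database.create_container_if_not_exists(
    id="orders", partition_key=PartitionKey(path="/customerId"))
events = database.create_container_if_not_exists(
    id="events", partition_key=PartitionKey(path="/deviceId"))
```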
You can follow this URL to understand the effect of setting throughput at the database and container levels.
Set throughput on a database and a container
Hope it helps.

Maybe the question, in short, is: is shared throughput distributed evenly between logical partitions (the same as when it is set at the collection level), or not (somehow differently)?
The answer is 'it is not'
The answer you're looking for is on the same page you have been looking at (I have also confirmed this with MS Support).
In this section
https://learn.microsoft.com/en-us/azure/cosmos-db/set-throughput#comparison-of-models
RUs assigned or available to a specific container (with database level provisioned throughput, manual or autoscaled):
No guarantees. RUs assigned to a given container depend on the properties. Properties can be the choice of partition keys of containers that share the throughput, the distribution of the workload, and the number of containers.
Basically, Cosmos has an algorithm that works out how to allocate the provisioned database-level throughput across the containers in the database.
I would expect there is some performance cost to the algorithm, similar to standard autoscaling, where if there is a burst of RU demand the autoscaling won't scale up unless the demand is sustained for a certain period.
There is an exception if you combine database and container throughput provisioning, where one or more containers have a fixed provisioned throughput. They are simply separate from the other containers that share the database throughput. These containers behave the way you described for standard container-level throughput, i.e. their throughput is divided among their partitions.
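To make that exception concrete, here is a hedged sketch with the azure-cosmos Python SDK (the names and RU values are made up):
```python
# Sketch: mixing shared database throughput with one dedicated-throughput container.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                      credential="<your-key>")
database = client.create_database_if_not_exists(id="shared-db",
                                                offer_throughput=1000)

# Shares the database's 1000 RU/s with the other shared containers.
shared = database.create_container_if_not_exists(
    id="shared-container", partition_key=PartitionKey(path="/pk"))

# Gets its own 400 RU/s, separate from the database's 1000 RU/s; those 400 RU/s
# are divided across this container's logical partitions, as with standard
# container-level provisioning.
dedicated = database.create_container_if_not_exists(
    id="dedicated-container",
    partition_key=PartitionKey(path="/pk"),
    offer_throughput=400)
```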

What is the expected ingestion pace for a Cassandra cluster?

I am running a project that requires loading millions of records into Cassandra.
I am using Kafka Connect with partitioning and 24 workers, and I only get around 4,000 rows per second.
I did a test with Pentaho PDI inserting straight into Cassandra with the JDBC driver and I get slightly fewer rows per second: 3,860 (avg).
The Cassandra cluster has 24 nodes. What is the expected insertion pace by default? How can I fine-tune the ingestion of big loads of data?
There is no magical "default" rate at which a Cassandra cluster can ingest data. One cluster can take 100K ops/sec, another can do 10M ops/sec. In theory, it can be limitless.
A cluster's throughput is determined by a lot of moving parts which include (but are NOT limited to):
hardware configuration
number of cores, type of CPU
amount of memory, type of RAM
disk bandwidth, disk configuration
network capacity/bandwidth
data model
client/driver configuration
access patterns
cluster topology
cluster size
The only way you can determine the throughput of your cluster is by running your own tests with loads as close to production as you can simulate. Cheers!
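As a rough, hedged starting point, something like this with the DataStax Python driver will give you your own numbers; the contact points, keyspace, table, and schema are assumptions you would replace, and a real test should use production-like data and run much longer:
```python
# Rough insert-throughput probe with the DataStax Python driver.
# Contact points, keyspace, table, and schema are assumptions.
import time
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

cluster = Cluster(["10.0.0.1", "10.0.0.2"])
session = cluster.connect("my_keyspace")

insert = session.prepare("INSERT INTO my_table (id, payload) VALUES (?, ?)")

rows = [(i, "x" * 1024) for i in range(100_000)]  # sample rows, not production-like
start = time.time()
execute_concurrent_with_args(session, insert, rows, concurrency=200)
elapsed = time.time() - start

print(f"{len(rows) / elapsed:,.0f} rows/sec")
cluster.shutdown()
```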

In GCP Spanner, can I set a size limit on a table?

I'm exploring multi-tenancy with Spanner and was looking at the table approach (each customer gets a table), but I'm not sure how to enforce limits on a table. Is that possible? For example, can I say table A can be 5 GB and table B can be 20 GB?
I tried to find it in the API docs but I am not sure whether it exists or I was just looking up the wrong things.
Is anyone aware?
Setting a size limit for a table is not possible, so that is something you would have to enforce in your application (see the sketch after the list below).
The best practice for multi-tenancy in Cloud Spanner is, however, that you should in most cases use a single table for all your customers and segregate them using a separate customer key in that table. See https://cloud.google.com/spanner/docs/schema-and-data-model#multitenancy for more information. There are two important reasons to follow that best practice:
It will automatically use the database and load splitting built into Cloud Spanner with zero maintenance on your side.
There is a limit on the number of tables per database, which would limit the number of customers that you could have per database.
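If you still want a per-table cap, one hedged, application-side option (not a built-in Spanner feature) is to poll the table-size statistics and refuse writes over your threshold. The instance, database, table name, and limit below are placeholders, and the SPANNER_SYS.TABLE_SIZES_STATS_1HOUR view is assumed to be available in your database:
```python
# Sketch: application-side size check using Spanner's table-size statistics.
# Instance/database/table names and the 5 GB limit are placeholders.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

LIMIT_BYTES = 5 * 1024 ** 3  # e.g. 5 GB for "table A"

with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        """
        SELECT used_bytes
        FROM spanner_sys.table_sizes_stats_1hour
        WHERE table_name = @table
        ORDER BY interval_end DESC
        LIMIT 1
        """,
        params={"table": "CustomerA"},
        param_types={"table": spanner.param_types.STRING},
    )
    for (used_bytes,) in rows:
        if used_bytes > LIMIT_BYTES:
            raise RuntimeError("CustomerA exceeded its 5 GB quota; reject new writes")
```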
It is not possible to place a size limit in Google Cloud Spanner, whether on a table or on a database.
Google Cloud Spanner uses nodes for database instances. The maximum storage for each node is 2 TB, and one node corresponds to 1000 processing units. If your workload does not need a full node, you can provision less than 1 node, which works out to around 205 GB per 100 processing units. Storage and network usage are measured in binary gigabytes (gibibytes), where 2^40 bytes is 1024 GiB or 1 TiB.
The maximum number of tables Cloud Spanner can handle is 5,000 per database, and the maximum number of databases is 100 per instance.
Rather than limiting tables, databases, or storage, Google Cloud Spanner gives you the ability to increase quotas by following the steps on increasing quotas.
In addition, backups are stored separately and are not included in the storage limit.
You can check the full documentation on database limits under Quotas and Limits.

Too many connected disks to AKS node

I read that there is a limitation on the number of data disks that can be bound to a node in a cluster. Right now I'm using a small node which can only hold up to 4 data disks. If I exceed this amount I get this error: 0/1 nodes are available: 1 node(s) exceed max volume count.
The main question I have is how to handle this. I have some apps that just need a small amount of persistent storage in my cluster, but I can only attach a few data disks. If I bind 4 data disks of 100 MB each, I have already reached the max limit.
Could someone advise me on how to handle these scenarios? I can easily scale up the machines, and then I will have more power and more disks, but the ratio of disks to server power is completely off at that point.
Best
Pim
You should look at using Azure Files instead of Azure Disks. With Azure Files you can use ReadWriteMany, so a single mount on the VM (node) allows multiple pods to access the mounted volume.
https://github.com/kubernetes/examples/blob/master/staging/volumes/azure_file/README.md
https://kubernetes.io/docs/concepts/storage/storage-classes/#azure-file
https://learn.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
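As a hedged illustration (the claim name, namespace, and size are made up), a ReadWriteMany claim against AKS's built-in azurefile storage class could be created with the official Kubernetes Python client along these lines:
```python
# Sketch: ReadWriteMany PVC backed by Azure Files, via the Kubernetes Python client.
# Claim name, namespace, size, and storage class are assumptions for your cluster.
from kubernetes import client, config

config.load_kube_config()

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "shared-small-storage"},
    "spec": {
        "accessModes": ["ReadWriteMany"],   # many pods can mount the same share
        "storageClassName": "azurefile",    # AKS's built-in Azure Files class
        "resources": {"requests": {"storage": "100Mi"}},
    },
}

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc)
```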
4 PV per node
30 pods per node
Those are the limits on AKS nodes right now.
You can handle it by adding more nodes (and more money), or by finding a provider with different limits.
On one of those, as an example, the limits are 127 volumes and 110 pods for the same node size.

Frequent Compaction of OpsCenter.rollup_state on all the nodes consuming CPU cycles

I am using DataStax Cassandra 4.8.16, with a cluster of 8 DCs and 5 nodes in each DC, running on VMs. For the last couple of weeks we have observed the performance issues below:
1) Increased drop count on the VMs.
2) LOCAL_QUORUM not achieved for some write operations.
3) Frequent compaction of OpsCenter.rollup_state and system.hints visible in OpsCenter.
Appreciate any help finding the root cause for this.
The presence of dropped mutations means that the cluster is heavily overloaded. It could be an increase in the main load, which together with the load from OpsCenter overloads the system. You need to look into statistics about the number of requests, latencies, etc. per node and per table to see where the increase happened. Please also check the I/O statistics on the machines (for example, with iostat): sizes of the queues, read/write latencies, etc.
Also, it's recommended to use a dedicated OpsCenter cluster to store metrics; it can be a smaller size and doesn't require an additional license for DSE. As the OpsCenter documentation says:
Important: In production environments, DataStax strongly recommends storing data in a separate DataStax Enterprise cluster.
Regarding VMs: it's usually not a recommended setup, but it heavily depends on the underlying hardware: number of CPUs, RAM, disk system.
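If it helps, a small, hedged helper along these lines can collect the statistics mentioned above on each node; it assumes nodetool and iostat are installed and on the PATH:
```python
# Quick helper to gather per-node stats; run it on every node and compare over time.
# Assumes nodetool and iostat are available on the PATH.
import subprocess

def run(cmd):
    print(f"==== {' '.join(cmd)} ====")
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)

run(["nodetool", "tpstats"])                  # dropped mutations, pending tasks
run(["nodetool", "tablestats", "OpsCenter"])  # per-table latencies for the rollup tables
run(["iostat", "-x", "1", "3"])               # disk queue sizes and await times
```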

Increasing the number of VMs decreases Cassandra throughput. What can be the reason?

I am using the YCSB benchmarking tool to benchmark a Cassandra cluster.
I am varying the number of virtual machines in the cluster.
I am using 1 physical host and 1, 2, 3, or 4 virtual machines for benchmarking (as shown in the attached figure).
The generated workload is the same every time: Workload C, 1,000,000 operations, 10,000 records.
Each VM has 2 GB RAM, 20GB drive
Cassandra: 1 seed node, endpoint_snitch: GossipingPropertyFileSnitch
Keyspace YCSB: replication factor 3.
The problem is that when I increase the number of virtual machines in the cluster, the throughput decreases. What can be the reason?
By definition, by increasing compute resources (i.e. virtual machines), the cluster should offer better performance, but the opposite is happening, as shown in the attached figure. Kindly explain what the probable reason for this could be. I am writing my thesis on this topic but I am unable to figure out the reason; please help, I will be grateful to you.
Throughput observed by varying number of VMs in Cassandra cluster:
You are very likely hitting a disk I/O bottleneck. Especially with non-SSD drives this is completely expected. Unless you have a dedicated disk/CPU per VM, the competition for resources will cause contention like this. Also, 2 GB per VM is not enough to do any kind of performance benchmark with Cassandra, since the minimum recommended JVM heap size is 8 GB.
Cassandra is great at horizontal scaling (nearly linear), but that doesn't mean that simply adding VMs to one physical host will increase throughput. A single VM on the physical host will have less contention for resources (disk, CPU, memory, network) than 4, so it's likely one VM would perform better than 4.
By definition, if you WERE increasing resources, you SHOULD see it perform better, but you're not; you're simply adding contention to existing resources. If you want to scale Cassandra, you need to test it with additional physical resources: more physical machines, not more VMs on the same machine.
Finally, as Chris Lohfink mentions, your VMs are too small to do meaningful tests - 8GB JVM heap is recommended, with another 8GB of vm page cache to support reads - running Cassandra with less than 16G of RAM is typically non-ideal in production.
You're trying to test a jet engine (a distributed database designed for hundreds or thousands of physical nodes) with gas-station-level equipment. Your benchmark hardware isn't viable for a real production environment, so your benchmark results aren't meaningful.
