In GCP Spanner, can I set a size limit on a table? - google-cloud-spanner

I'm exploring multi-tenancy with Spanner and was looking at the table approach (each customer gets a table), but I'm not sure how to enforce limits on a table. Is that possible? For example, can I say table A can be 5GB and table B can be 20GB?
I tried to find it in the API docs, but I am not sure whether it exists or I was just looking up the wrong things.
Is anyone aware?

Setting a size limit for a table is not possible, so that would be something you would have to enforce in your application.
However, the best practice for multi-tenancy in Cloud Spanner is, in most cases, to use a single table for all your customers and segregate them using a separate customer key in that table (a minimal schema sketch follows after these two points). See https://cloud.google.com/spanner/docs/schema-and-data-model#multitenancy for more information. There are two important reasons to follow that best practice:
It will automatically use the data and load splitting built into Cloud Spanner, with zero maintenance on your side.
There is a limit on the number of tables per database, which would limit the number of customers that you could have per database.
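As a rough illustration of that single-table approach, here is a minimal sketch using the Python client; the instance, database, table, and column names are made-up placeholders, not anything prescribed by Spanner:
    from google.cloud import spanner

    # Placeholder instance and database IDs.
    client = spanner.Client()
    database = client.instance("my-instance").database("my-database")

    # One table for all tenants, keyed by CustomerId first, so Spanner can
    # split data and load across customers automatically.
    operation = database.update_ddl([
        """
        CREATE TABLE TenantData (
            CustomerId STRING(36) NOT NULL,
            ObjectId   STRING(36) NOT NULL,
            Payload    JSON
        ) PRIMARY KEY (CustomerId, ObjectId)
        """
    ])
    operation.result()  # wait for the schema change to complete
Per-customer size limits would then be something you track and enforce in your own application code, for example by recording usage per CustomerId.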

It is not possible to place a size limit on Google Cloud Spanner, whether on a table or on a database.
Google Cloud Spanner instances are provisioned in nodes. The maximum storage for each node is 2 TB, and one node corresponds to 1000 processing units. If your workload needs fewer than 1000 processing units, you can provision less than one node, which works out to around 205 GB per 100 processing units. Storage and network usage are measured in binary gigabytes, or gibibytes, where 2^40 bytes is 1024 GB, or 1 TB.
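As a toy sanity check of those numbers (assuming the 2 TB per 1000 processing units figure above, which may change over time):
    # Rough storage ceiling implied by "2 TB per 1000 processing units".
    def approx_storage_limit_gib(processing_units: int) -> float:
        gib_per_1000_pu = 2 * 1024  # 2 TiB expressed in GiB
        return gib_per_1000_pu * processing_units / 1000

    print(approx_storage_limit_gib(100))   # ~204.8 GiB, the "around 205 GB" above
    print(approx_storage_limit_gib(1000))  # 2048 GiB, i.e. 2 TiB for a full node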
The maximum number of tables that Cloud Spanner can handle is 5000 tables per database, and the maximum number of databases it can handle is 100 per instance.
Instead of limiting tables, databases, or storage, Google Cloud Spanner lets you request a quota increase by following the steps on increasing quotas.
In addition, backups are stored separately and are not included in the storage limit.
You can check the full documentation on database limits under Quotas and Limits.

Related

Scaling Elasticsearch for read-heavy applications

We have Node.js (NestJS / Express) based application services on the GCP cloud.
We are using Elasticsearch to support full-text search on a blog/news website.
Our requirement is to support 2000 reads per second minimum.
While performing load testing, we observed that up to a concurrency of 300, Elasticsearch performs well and response times are acceptable.
CPU usage also spikes under this load. But when the load is increased to 500 or 1000, CPU usage drops and response times increase drastically.
What we don't understand is why CPU usage is 80% for a load of 300 and just 30-40% when the load increases. Shouldn't CPU pressure increase with load?
What is the right way to tune Elasticsearch for read-heavy usage? (Our write frequency is just 1 document every 2-3 hours.)
We have one single index with approx 2 million documents. The index size is just 6GB.
Elastic cluster is deployed on Kubernetes using helm charts with:
- 1 dedicated master node
- 1 dedicated coordinating node
- 5 dedicated data nodes
Considering the small data size, the index is not sharded and the number of read replicas is set to 4.
The index refresh rate is set to 30 sec.
RAM allocated to each data node is 2GB and the heap size is 1GB
CPU allocated to each data node is 1 vCPU
We tried increasing the search thread pool size up to 20 and queue_size to 10000, but that didn't help much either.
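For reference, the replica and refresh settings described above were applied roughly like this with the Python Elasticsearch client (host and index name are placeholders; 7.x clients take body= instead of settings=):
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # placeholder host

    # Read-heavy, rarely written index: spread query load over more replica
    # shards and refresh infrequently, since new documents arrive every few hours.
    es.indices.put_settings(
        index="blog-posts",  # placeholder index name
        settings={"index": {"number_of_replicas": 4, "refresh_interval": "30s"}},
    )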

Azure Data Explorer High Ingestion Latency with Streaming

We are using streaming ingestion from Event Hubs to Azure Data Explorer. The documentation states the following:
The streaming ingestion operation completes in under 10 seconds, and your data is immediately available for query after completion.
I am also aware of limitations such as:
Streaming ingestion performance and capacity scales with increased VM and cluster sizes. The number of concurrent ingestion requests is limited to six per core. For example, for 16 core SKUs, such as D14 and L16, the maximal supported load is 96 concurrent ingestion requests. For two core SKUs, such as D11, the maximal supported load is 12 concurrent ingestion requests.
But we are currently experiencing an ingestion latency of 5 minutes (as shown in the Azure metrics) and see that data is actually available for querying 10 minutes after ingestion.
Our dev environment is the cheapest SKU, Dev(No SLA)_Standard_D11_v2, but given that we only ingest ~5000 events per day (per the "Events Received" metric) in this environment, this latency is very high and not usable in a streaming scenario where we need the data to be available for queries in under 1 minute.
Is this the latency we have to expect from the dev environment, or are there any tweaks we can apply in order to achieve lower latency in those environments as well?
How will latency behave with a production environment like Standard_D12_v2? Do we have to expect those high numbers there as well, or is there a fundamental difference in behavior between dev/test and production environments in this regard?
Did you follow the two steps needed to enable streaming ingestion for the specific table, i.e. enabling streaming ingestion on the cluster and on the table?
In general, this is not expected. The Dev/Test cluster should exhibit the same behavior as a production cluster, with the expected limitations around the size and scale of operations. If you test it with a few events and still see the same latency, it means that something is wrong.
If you did follow these steps and it still does not work, please open a support ticket.
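If it helps, the table-level step can be applied and verified with the Python Kusto SDK roughly as follows; the cluster URI, database, and table names are placeholders, and cluster-level streaming ingestion itself is enabled in the Azure portal or via an ARM template, which is not shown here:
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    # Placeholder cluster URI; Azure CLI auth is just one of the available options.
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
        "https://mycluster.westeurope.kusto.windows.net"
    )
    client = KustoClient(kcsb)

    # Enable the streaming ingestion policy on the target table.
    client.execute_mgmt("MyDatabase", ".alter table MyTable policy streamingingestion enable")

    # Verify the policy is actually set.
    response = client.execute_mgmt("MyDatabase", ".show table MyTable policy streamingingestion")
    print(response.primary_results[0])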

Too many connected disks to AKS node

I read that there is a limit on the number of data disks that can be bound to a node in a cluster. Right now I'm using a small node size which can only hold up to 4 data disks. If I exceed this amount I get this error: 0/1 nodes are available: 1 node(s) exceed max volume count.
The question I mainly have is how to handle this. I have some apps that just need a small amount of persistent storage in my cluster, but I can only attach a few data disks. If I bind 4 data disks of 100M each, I have already reached the limit.
Could someone advise me on how to handle these scenarios? I can easily scale up the machines, which gives me more power and more disks, but the ratio of disks to server power is completely off at that point.
Best
Pim
You should look at using Azure Files instead of Azure Disks. With Azure Files you can use the ReadWriteMany access mode, so a single mount on the VM (node) allows multiple pods to access the mounted volume.
https://github.com/kubernetes/examples/blob/master/staging/volumes/azure_file/README.md
https://kubernetes.io/docs/concepts/storage/storage-classes/#azure-file
https://learn.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
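As a minimal sketch of what that looks like from the Kubernetes API side (claim name, namespace, and size are placeholders; it assumes AKS's built-in azurefile storage class):
    from kubernetes import client, config

    config.load_kube_config()  # assumes kubectl is already configured for the AKS cluster

    # A ReadWriteMany claim backed by Azure Files, so many pods on many nodes can
    # share one mount instead of each consuming a per-node data disk slot.
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "shared-data"},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "azurefile",
            "resources": {"requests": {"storage": "100Mi"}},
        },
    }
    client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace="default", body=pvc)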
- 4 PVs per node
- 30 pods per node
Those are the limits on AKS nodes right now.
You can handle it by adding more nodes (and more money), or by finding a provider with different limits.
On one of those, as an example, the limits are 127 volumes and 110 pods for the same node size.

How does CosmosDB distribute throughput when provisioned per database (shared RUs)

I have a problem understanding how the distribution of provisioned throughput works if I set up Cosmos DB to use shared RUs, i.e. setting RUs at the database level.
I know that when it is set at the container (collection) level, throughput is divided between logical partitions, e.g. if a collection has a provisioned throughput of 400 RU/s and 10 logical partitions, then the throughput for each partition is 400/10 = 40 RU/s.
But what about when throughput is set per database?
The only documentation I found is https://learn.microsoft.com/en-us/azure/cosmos-db/set-throughput#set-throughput-on-a-database
As far as I can tell, the difference is that physical partitions are not dedicated to a single container but can host logical partitions from different containers. Does this mean that throughput is divided between all logical partitions of all collections/containers?
For example: I have a database with a throughput of 1000 RU/s and 2 collections, one with 3 logical partitions and the second with 7 logical partitions. Is the throughput divided 1000 / (3 + 7) = 100 RU/s for each logical partition?
OR
Is the throughput reserved for all collections/partitions in total? E.g. there is a database with 1000 RU/s, and some logical partitions use 800 RU/s while others use 200 RU/s (no matter which collection); is that OK as long as they don't exceed 1000 RU/s in total?
Maybe the short version of the question is: is shared throughput distributed evenly between logical partitions (the same as when set at the collection level), or is it distributed differently?
Thanks for your feedback. If you are configuring the throughput at the database level, you cannot guarantee the throughput for each container in it unless you configure throughput for the containers as well.
All containers created inside a database with provisioned throughput must be created with a partition key. At any given point of time, the throughput allocated to a container within a database is distributed across all the logical partitions of that container. When you have containers that share provisioned throughput configured on a database, you can't selectively apply the throughput to a specific container or a logical partition.
You can follow this URL for understanding the effect of setting throughput at the database and container level.
Set throughput on a database and a container
Hope it helps.
Maybe the short version of the question is: is shared throughput distributed evenly between logical partitions (the same as when set at the collection level), or is it distributed differently?
The answer is: it is not.
The answer you're looking for is on the same page you have been looking at (I have also confirmed this with MS Support).
In this section
https://learn.microsoft.com/en-us/azure/cosmos-db/set-throughput#comparison-of-models
RUs assigned or available to a specific container (with database-level provisioned throughput, manual or autoscaled): No guarantees. RUs assigned to a given container depend on the properties. Properties can be the choice of partition keys of containers that share the throughput, the distribution of the workload, and the number of containers.
Basically, Cosmos has an algorithm that works out how to allocate the provisioned database-level throughput across the containers in the database.
I would expect there is some performance cost to the algorithm, similar to standard autoscaling, where a burst of RU demand won't trigger a scale-up unless it is sustained for a certain period.
There is an exception if you mix database and container throughput provisioning, where one or more containers have a fixed provisioned throughput. They are simply separate from the other containers that share the database throughput. These containers behave the way you described for standard container-level throughput, e.g. throughput divided among their partitions.
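A small sketch of that mixed model with the Python SDK, where the account URL, key, and all names/RU values are made up for illustration: the first two containers draw from the shared 1000 RU/s pool, while the third gets its own dedicated 400 RU/s.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<account-key>")

    # Database-level (shared) throughput: 1000 RU/s spread across its containers.
    db = client.create_database_if_not_exists("appdb", offer_throughput=1000)

    # These two share the database's 1000 RU/s pool.
    db.create_container_if_not_exists(id="orders", partition_key=PartitionKey(path="/customerId"))
    db.create_container_if_not_exists(id="invoices", partition_key=PartitionKey(path="/customerId"))

    # This one gets its own dedicated 400 RU/s, separate from the shared pool.
    db.create_container_if_not_exists(
        id="events", partition_key=PartitionKey(path="/deviceId"), offer_throughput=400
    )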

MongoDB limit storage size?

What is MongoDB's storage size limit on 64-bit platforms? Can MongoDB store 500-900 GB of data within one instance (node)? What was the largest amount of data you've stored in MongoDB, and what was your experience?
The "production deployments" page on MongoDB's site may be of interest to you. Lots of presentations listed with infrastructure information. For example:
http://blog.wordnik.com/12-months-with-mongodb says they're storing 3 TB per node.
MongoDB's storage limits on different operating systems are tabulated below, as per the MongoDB 3.0 MMAPv1 storage engine limits.
The MMAPv1 storage engine limits each database to no more than 16000 data files. This means that a single MMAPv1 database has a maximum size of 32TB. Setting the storage.mmapv1.smallFiles option reduces this limit to 8TB.
Using the MMAPv1 storage engine, a single mongod instance cannot manage a data set that exceeds maximum virtual memory address space provided by the underlying operating system.
Virtual Memory Limitations

Operating System                          Journaled      Not Journaled
Linux                                     64 terabytes   128 terabytes
Windows Server 2012 R2 and Windows 8.1    64 terabytes   128 terabytes
Windows (otherwise)                       4 terabytes    8 terabytes
Reference: MongoDB Database Limits.
Note: The WiredTiger storage engine is not subject to this limitation.
Another way to get past the roughly 2GB per-process limit of 32-bit builds on a single node is to run multiple mongod processes, so sharding is one option, as is doing some manual partitioning across processes.
Hope this helps.
You won't come anywhere near the cap with 1 TB on 64-bit systems. However, MongoDB does keep indexes in memory, so a smooth experience depends on your index size and how much memory you have. If you have a beefy enough system it won't be a problem.
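If you want to keep an eye on how close a deployment is to those limits, and on index size relative to RAM, here is a quick sketch with pymongo (connection string and database name are placeholders):
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder connection string

    stats = client["mydb"].command("dbStats")  # per-database storage statistics
    print("data size:   ", stats["dataSize"], "bytes")
    print("storage size:", stats["storageSize"], "bytes")
    print("index size:  ", stats["indexSize"], "bytes")  # compare this to available RAM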
