Maximum table size in YugabyteDB? - yugabytedb

[Question posted by a user on YugabyteDB Community Slack]
Is there a maximum table size in YugabyteDB, like in PostgreSQL?
In PostgreSQL, table size is limited by the maximum number of blocks in a table, which is 2^32 blocks. PostgreSQL has a configurable block size that allows up to 32768 bytes per block, which gives a maximum table size of 128TB.
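The 128TB figure is simply the block-count limit multiplied by the largest block size (2^32 blocks × 2^15 bytes = 2^47 bytes = 128 TiB in binary units). A small sketch of that arithmetic follows; the helper name is our own, not a PostgreSQL API. With PostgreSQL's default 8192-byte block size the same formula gives 32 TiB.

```python
def max_table_size_tib(block_size_bytes: int, max_blocks: int = 2 ** 32) -> float:
    """Upper bound on table size in TiB: maximum block count times block size."""
    return max_blocks * block_size_bytes / 2 ** 40

print(max_table_size_tib(32768))  # 128.0  (largest configurable block size)
print(max_table_size_tib(8192))   # 32.0   (default block size)
```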

No, there isn't.
The lower half of YugabyteDB (the storage/transaction/distribution/etc. layers, which we refer to as DocDB) is a ground-up implementation that doesn't reuse Postgres' lower half.
DocDB is elastic and scales to multiple nodes. You are basically limited by the aggregate storage capacity across the nodes in the cluster at any given time, but you can add more nodes to get more capacity, and data will be redistributed/rebalanced automatically.
See the Architecture section in the docs for more info.
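To make the "add nodes, data gets rebalanced" idea concrete, here is a toy model. The table is split into tablets (YugabyteDB's term for shards), and a hypothetical `rebalance` helper moves tablets from the busiest node to the idlest one until the load is even. This is a simplified sketch, not YugabyteDB's actual load balancer, which also has to place tablet replicas and leaders.

```python
from collections import Counter

def initial_assignment(num_tablets: int, nodes: list[str]) -> dict[int, str]:
    # Spread tablets round-robin across the starting nodes.
    return {t: nodes[t % len(nodes)] for t in range(num_tablets)}

def rebalance(assignment: dict[int, str], nodes: list[str]) -> dict[int, str]:
    # Repeatedly move one tablet from the busiest node to the idlest node
    # until no node holds more than one tablet more than any other node.
    new = dict(assignment)
    while True:
        load = {n: 0 for n in nodes}
        for owner in new.values():
            load[owner] += 1
        busiest = max(nodes, key=load.__getitem__)
        idlest = min(nodes, key=load.__getitem__)
        if load[busiest] - load[idlest] <= 1:
            return new
        tablet = next(t for t, owner in new.items() if owner == busiest)
        new[tablet] = idlest

nodes = ["node-1", "node-2", "node-3"]
before = initial_assignment(12, nodes)
after = rebalance(before, nodes + ["node-4"])  # add a node for more capacity

print(Counter(after.values()))
moved = sum(before[t] != after[t] for t in before)
print(moved)  # 3
```

Adding a fourth node to a 3-node cluster holding 12 tablets moves only 3 tablets, leaving 3 on each node, so capacity grows without reshuffling everything.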

Related

What exactly is a partition size in Cassandra?

I am new to Cassandra. I have a Cassandra cluster with 6 nodes, and I am trying to find the partition size.
I tried to fetch it with this basic command:
nodetool tablehistograms keyspace.tablename
Now I am wondering how it is calculated, and why the result has only 5 rows besides min and max while the number of nodes is 6. Do the number of nodes and the number of partitions for a table have any relation?
Fundamentally, what I know is that the partition key is hashed to distribute the data to be persisted across the nodes.
When exactly should we go for bucketing? I am assuming that Cassandra has a partitioner that takes care of distributing the data across nodes.
The number of entries in this column is not related to the number of nodes. It shows the distribution of the values: min, max, and percentiles (50/75/95/98/99).
Most nodetool commands don't show anything about other nodes; they provide information about the current node only.
P.S. This document would be useful in explaining how to interpret this information.
As the name of the command suggests, tablehistograms reports the distribution of metadata for the partitions held by a node.
To add to what Alex Ott has already stated, the percentiles (not percentages) provide insight into the range of metadata values. For example:
50% of the partitions for the given table have a size of 74KB or less
95% are 263KB or less
98% are 455KB or less
These metadata don't have any correlation with the number of partitions or the number of nodes in your cluster.
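For intuition about what the percentile rows mean, here is a nearest-rank percentile computed over a hypothetical list of partition sizes held by a single node. Cassandra itself derives these numbers from bucketed histograms, so the values nodetool prints are bucket boundaries rather than exact sizes; this sketch only illustrates the definition.

```python
def percentile(values: list[int], p: int) -> int:
    """Nearest-rank percentile: smallest value such that at least p% of values are <= it."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integer math
    return ordered[max(rank, 1) - 1]

# Hypothetical partition sizes (KB) held by ONE node; nodetool only sees local data.
sizes_kb = [12, 30, 45, 60, 74, 90, 150, 263, 400, 455]

for p in (50, 75, 95, 98, 99):
    print(f"{p}%: <= {percentile(sizes_kb, p)} KB")
print("min:", min(sizes_kb), "max:", max(sizes_kb))
```

The number of percentile rows is fixed by the tool, not by the cluster: it would be the same for a 1-node or a 100-node cluster.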
You are correct in that the partition key gets hashed and the resulting value determines where the partition (and its associated rows) gets stored (distributed among the nodes in the cluster). If you're interested, I've explained this in a bit more detail with some examples in this post: https://community.datastax.com/questions/5944/.
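A simplified token ring showing how hashing a partition key picks a node. Assumptions: one token per node (no vnodes), no replication, and MD5 as the hash, similar in spirit to Cassandra's legacy RandomPartitioner; the default Murmur3Partitioner uses a different hash and token range. `TokenRing` and `token` are illustrative names only.

```python
import bisect
import hashlib

def token(value: str) -> int:
    # Hash a string to a token (MD5 here for simplicity; not Murmur3).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

class TokenRing:
    def __init__(self, nodes: list[str]):
        # Give each node one token derived from its name, sorted around the ring.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        # The first node whose token is >= the key's token owns the partition;
        # past the last token we wrap around to the first node.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[i % len(self._ring)][1]

ring = TokenRing([f"node-{i}" for i in range(1, 7)])
for key in ("user:42", "user:43", "user:42"):
    print(key, "->", ring.owner(key))
```

The same key always lands on the same node, which is why all rows of a partition live together.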
As far as bucketing is concerned, you would typically do it to reduce the number of rows in a partition and thereby reduce its size. The general recommendation is to keep partition sizes under 100MB for optimal performance, but it's not a hard rule: you can have larger partitions as long as you are aware of the tradeoffs.
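A sketch of bucketing for a hypothetical time-series table: adding a day bucket to the partition key, i.e. (sensor_id, day), caps how many rows land in one partition. The row size and write rate are made-up numbers chosen only to show the arithmetic against the ~100MB guideline.

```python
from datetime import datetime, timezone

def partition_key(sensor_id: str, ts: datetime) -> tuple[str, str]:
    # Bucketed partition key: one partition per sensor per UTC day,
    # instead of a single ever-growing partition per sensor.
    return (sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))

# Hypothetical workload: one 100-byte row per sensor per second.
ROW_BYTES = 100
ROWS_PER_DAY = 24 * 60 * 60
MB = 1_000_000

daily_partition_mb = ROWS_PER_DAY * ROW_BYTES / MB   # 8.64 MB per (sensor, day)
unbucketed_one_year_mb = daily_partition_mb * 365    # ~3,154 MB in one partition

print(partition_key("sensor-7", datetime(2024, 3, 1, 23, 59, tzinfo=timezone.utc)))
print(partition_key("sensor-7", datetime(2024, 3, 2, 0, 1, tzinfo=timezone.utc)))
print(round(daily_partition_mb, 2), round(unbucketed_one_year_mb))
```

Without the bucket, a year of data for one sensor would sit in a single partition far above 100MB; with it, each partition stays under 10MB.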
In your case, the largest partition is only 455KB, so size is not a concern. Cheers!

Is RethinkDB a good fit for a generic Real-time aggregation platform? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I need your help to verify if RethinkDB fits my use case.
Use case
My team is building a generic real-time aggregation platform that needs to:
join data from many Kafka topics
Joins need to be done on raw data
Topics have the same key
Data in topics is sometimes a “snapshot” (updatable) and sometimes an “event” (non-updatable)
The joined data will be written to an analytical OLAP DB (ClickHouse, Druid, etc., depending on the case). These systems work with “deltas” (SCDs). Because of the “snapshots”, I need stateful processing.
Updates for snapshots can arrive up to 7 days later
Topics receive around 20k msg/s, with peaks up to 200k msg/s
Data in topics is JSON, from 100 bytes to 5kB per message
Data in topics can have duplicates
Duplicates are removed using a “version” JSON field that is part of every topic. Data should be processed only if new_version > old_version, or if old_version doesn't exist.
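The deduplication rule above (process only if new_version > old_version, or if there is no old_version) can be sketched as follows. Tracking the last version per (topic, key) pair is our assumption about the granularity, and the in-memory dict stands in for the durable state store a real pipeline would need; `VersionFilter` is a hypothetical name.

```python
from typing import Optional

def should_process(new_version: int, old_version: Optional[int]) -> bool:
    # Rule from the question: process if there is no old version, or the new one is newer.
    return old_version is None or new_version > old_version

class VersionFilter:
    """Tracks the last processed version per (topic, key); an in-memory stand-in for real state."""

    def __init__(self) -> None:
        self._last: dict[tuple[str, str], int] = {}

    def accept(self, topic: str, key: str, version: int) -> bool:
        old = self._last.get((topic, key))
        if not should_process(version, old):
            return False  # duplicate or out-of-date message: drop it
        self._last[(topic, key)] = version
        return True

f = VersionFilter()
print(f.accept("orders", "k1", 1))  # True
print(f.accept("orders", "k1", 1))  # False (duplicate)
print(f.accept("orders", "k1", 3))  # True
print(f.accept("orders", "k1", 2))  # False (older than 3)
```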
I already have a POC with Cassandra with five stages:
Cassandra Inserter - consumes from all Kafka topics and does inserts only, for all topics, into the same Cassandra table. Sharding is done on a column holding the key, which is the same across all the Kafka topics, so all messages with the same key end up in the same shard.
For every Cassandra insert an InsertEvent is produced to Kafka
Delta calculator - consumes InsertEvents and queries Cassandra by the sharding key. It gets all the raw data, then deduplicates it and creates deltas. The state is saved in another Cassandra cluster by storing all the processed “versions”. The next time an InsertEvent arrives, we use the saved “version” to fetch only two events, previous and current, so we can create a DeltaEvent
DeltaEvent is produced to Kafka
ClickHouse / Druid ingest the data
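A minimal sketch of the delta calculator's core step, under stated assumptions: the raw events for one key are dicts carrying a `version` field, `last_version` is the version saved in the state store, and the hypothetical `compute_delta` returns field-level changes between the previous and current event. It is not the author's code, and fields removed in the newer version are not reported here.

```python
from typing import Any, Optional

def compute_delta(events: list[dict[str, Any]],
                  last_version: Optional[int]) -> Optional[dict[str, Any]]:
    # Index raw events by version; exact duplicates collapse onto one entry.
    by_version = {e["version"]: e for e in events}
    current = by_version[max(by_version)]
    if last_version is not None and current["version"] <= last_version:
        return None  # nothing newer than what was already emitted
    # Previous event: the one at the saved version ({} on first sight of the key).
    previous = by_version.get(last_version, {})
    changes = {
        field: {"old": previous.get(field), "new": value}
        for field, value in current.items()
        if field != "version" and previous.get(field) != value
    }
    return {"from_version": last_version,
            "to_version": current["version"],
            "changes": changes}

events = [
    {"version": 1, "status": "NEW", "amount": 10},
    {"version": 2, "status": "PAID", "amount": 10},
    {"version": 2, "status": "PAID", "amount": 10},  # duplicate
]
print(compute_delta(events, last_version=1))
```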
So it's basically a 50/50 insert/read workload without updates to Cassandra.
With 14 Cassandra data nodes and 8 state nodes, it works OK up to 20k InsertEvent/s. At 25k InsertEvent/s the system begins to lag.
Nodes have 16GB RAM, and disks are network storage backed by SSD (not ideal, I know, but I can't change it now). Network is 10 Gbit.
RethinkDB idea
I would like to do a new POC to try RethinkDB and use changefeeds to create deltas and to deduplicate. For this I would use a single table. Primary key / sharding key would be the Kafka key and all Kafka data from all topics with the same key would be joined/upserted in a single document.
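A rough in-memory picture of the proposed single-table layout: one document per Kafka key, stored under `id` (RethinkDB's default primary key field), with each topic's latest payload kept under its own field so the join happens on write. The field layout and the `upsert` helper are assumptions for illustration, not RethinkDB API calls.

```python
from typing import Any

def upsert(table: dict[str, dict[str, Any]], key: str, topic: str,
           payload: dict[str, Any]) -> dict[str, Any]:
    # One document per Kafka key (the primary key); create it on first sight.
    doc = table.setdefault(key, {"id": key})
    existing = doc.get(topic)
    # Keep only the newest version of each topic's payload.
    if existing is None or payload["version"] > existing["version"]:
        doc[topic] = payload
    return doc

table: dict[str, dict[str, Any]] = {}
upsert(table, "trade-1", "orders", {"version": 1, "qty": 5})
upsert(table, "trade-1", "payments", {"version": 1, "paid": False})
upsert(table, "trade-1", "orders", {"version": 0, "qty": 3})  # stale, ignored
print(table["trade-1"])
```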
The workload would probably be 10/90 insert/update. I would use squash: true to avoid excessive reads and reduce the number of DeltaEvents.
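Roughly what `squash: true` promises: when several changes to the same document occur before a batch of notifications is sent, the client receives a single change that brings it up to date. The sketch below imitates that on a list of (id, old_val, new_val) tuples by keeping the first old_val and the last new_val per document; it illustrates the idea and is not RethinkDB's implementation.

```python
from typing import Optional

Change = tuple[str, Optional[dict], Optional[dict]]  # (doc id, old_val, new_val)

def squash(batch: list[Change]) -> list[Change]:
    """Collapse several changes to the same document into one (first old_val, last new_val)."""
    merged: dict[str, Change] = {}
    for doc_id, old_val, new_val in batch:
        if doc_id in merged:
            first_old = merged[doc_id][1]
            merged[doc_id] = (doc_id, first_old, new_val)
        else:
            merged[doc_id] = (doc_id, old_val, new_val)
    return list(merged.values())  # dict keeps first-seen order

batch = [("k1", None, {"v": 1}), ("k2", None, {"v": 1}), ("k1", {"v": 1}, {"v": 2})]
print(squash(batch))  # [('k1', None, {'v': 2}), ('k2', None, {'v': 1})]
```

Three changes become two notifications here, which is where the reduction in DeltaEvents would come from.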
Do you think this is a good use case for RethinkDB?
Will it scale up to 200k msg/s, which would be 20k inserts/s, 180k updates/s and around 150k reads/s via changefeeds?
I will need to delete data older than 7 days. How will that affect the insert/update/query workload?
Do you have a proposal for a system that would be a better fit for this use case?
Thanks a lot,
Davor
PS: if you prefer reading a document, here it is: RethinkDB use case question.
IMHO, RethinkDB is a good fit for your use case.
From the RethinkDB docs:
...RethinkDB scales to perform 1.3 million individual reads per second. ...RethinkDB performs well above 100 thousand operations per second in a mixed 50:50 read/write workload - while at the full level of durability and data integrity guarantees. ...performed all benchmarks across a range of cluster sizes, scaling up from one to 16 nodes.
Folks at RethinkDB have tested a similar scenario using workloads from the YCSB benchmark suite and reported their results.
We found that in a mixed read/write workload, RethinkDB with two servers was able to perform nearly 16K queries per second (QPS) and scaled to almost 120K QPS while in a 16-node cluster. Under a read only workload and synchronous read settings, RethinkDB was able to scale from about 150K QPS on a single node up to over 550K QPS on 16 nodes. Under the same workload, in an asynchronous “outdated read” setting, RethinkDB went from 150K QPS on one server to 1.3M in a 16-node cluster.
Selecting workloads and hardware
...Out of the YCSB workload options, we chose to run workload A which comprises 50% reads and 50% update operations, and workload C which performs strictly read operations. All documents stored by the YCSB tests contain 10 fields with randomized 100 byte strings as values, with each document totaling about 1 KB in size.
Given the ease of scaling RethinkDB clusters across multiple instances, we deemed it necessary to observe performance when moving from a single RethinkDB instance to a larger cluster. We tested all of our workloads on a single instance of RethinkDB up to a 16-node cluster in varying increments of cluster size.
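Some arithmetic on the mixed-workload figures quoted above ("nearly 16K" QPS on 2 nodes, "almost 120K" on 16), treating them as 16,000 and 120,000. The naive node estimate counts only the 200k msg/s peak, assumes per-node throughput stays flat beyond 16 nodes, and assumes YCSB workload A (1KB documents, 50/50 read/update) resembles this use case; none of that is established, so treat it as a sanity check, not a sizing.

```python
import math

# Mixed read/write QPS from the quoted benchmark, keyed by cluster size.
mixed_qps = {2: 16_000, 16: 120_000}

per_node = {nodes: qps / nodes for nodes, qps in mixed_qps.items()}
scaling_efficiency = per_node[16] / per_node[2]  # 1.0 would be perfectly linear

# Naive linear extrapolation to the 200k msg/s peak from the question.
PEAK_OPS = 200_000
naive_nodes = math.ceil(PEAK_OPS / per_node[16])

print(per_node)            # {2: 8000.0, 16: 7500.0}
print(scaling_efficiency)  # 0.9375
print(naive_nodes)         # 27
```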
Additionally, I suggest reading through the RethinkDB limitations. I've copied some of them here:
There is a hard limit of 64 shards.
While there is no hard limit on the size of a single document, there is a recommended limit of 16MB for memory performance reasons.
The maximum size of a JSON query is 64M.
Primary keys are limited to 127 characters.
Secondary indexes do not store objects or null values.
Primary key strings may not include the null codepoint (U+0000).
By default, arrays on the RethinkDB server have a size limit of 100,000 elements. This can be changed on a per-query basis with the arrayLimit (or array_limit) option to run.
RethinkDB does not support Unicode collations, and does not normalize for identical characters with multiple codepoints (e.g., \u0065\u0301 and \u00e9 both represent the character “é” but RethinkDB treats them, and sorts them as, distinct characters).
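Since the server does not normalize strings, one option is to normalize and validate keys client-side before writing. A minimal sketch, assuming NFC is an acceptable canonical form for your keys and assuming the 127-character limit is counted in code points (as Python's `len` does); `normalize_key` is a hypothetical helper, so check the RethinkDB docs if your keys contain non-ASCII text.

```python
import unicodedata

MAX_PK_CHARS = 127  # primary key limit quoted above

def normalize_key(raw: str) -> str:
    # Compose characters (NFC) so visually identical keys compare equal.
    key = unicodedata.normalize("NFC", raw)
    if "\x00" in key:
        raise ValueError("primary key may not contain U+0000")
    if len(key) > MAX_PK_CHARS:
        raise ValueError(f"primary key longer than {MAX_PK_CHARS} characters")
    return key

decomposed = "\u0065\u0301"  # 'e' + combining acute accent
precomposed = "\u00e9"       # 'é'
print(decomposed == precomposed)                                # False
print(normalize_key(decomposed) == normalize_key(precomposed))  # True
```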
Since yours is a real-time system, the RethinkDB docs on memory requirements and crash recovery are also worth a read.
Furthermore, the benchmarks above do not cover delete performance.

Batch insert overflow

I am using Cassandra 3.10 and am trying to follow best practice by having a table per query, so I am using batch inserts to write to multiple tables as a single transaction. However, I get the following warning in the Cassandra log:
Batch for [zed.payment, zed.trade_party_b_ref, zed.trade_product_type, zed.trade, zed.fx_variance_swap, zed.trade_party_a_ref, zed.trade_party_b_trade_id, zed.market_value] is of size 5.926KiB, exceeding specified threshold of 5.000KiB by 0.926KiB.
The log is saying that you are sending a batch of almost 6MB when the limit is 5MB.
You should send smaller batches of data to avoid going over that batch size limit.
You can also change the batch size limit in cassandra.yaml, but I would not recommend changing it.
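If you do follow the advice to send smaller batches, a greedy splitter like the hypothetical one below keeps each batch under the 5KiB warning threshold. Two caveats: Cassandra measures the size of the serialized mutations, so the UTF-8 length of the CQL text used here is only a rough estimate, and each resulting batch is applied independently, so splitting gives up the guarantee that all statements in a single logged batch are eventually applied together.

```python
LIMIT_BYTES = 5 * 1024  # batch_size_warn_threshold_in_kb: 5

def split_into_batches(statements: list[str], limit_bytes: int = LIMIT_BYTES) -> list[list[str]]:
    """Greedily pack statements into batches whose estimated size stays under limit_bytes."""
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for stmt in statements:
        n = len(stmt.encode("utf-8"))  # rough size estimate, not Cassandra's own measure
        if current and size + n > limit_bytes:
            batches.append(current)  # close the current batch before it overflows
            current, size = [], 0
        current.append(stmt)  # an oversized single statement still gets its own batch
        size += n
    if current:
        batches.append(current)
    return batches
```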
Thanks for the info. The parameter in cassandra.yaml is:
# Log WARN on any multiple-partition batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
This is in KB, not MB, so my batch is really about 6KB, not 6MB.
After 30 years working with Oracle, this is my first venture into Cassandra, so I have tried to follow the guideline of having a separate table for each query. Where I have a financial trade table that has to be queried in up to 8 different ways, I have 8 tables. That implies that an insert into these tables must be done in a batch to create what would be a single transaction in Oracle. The master table of the eight has a significant number of sibling tables that must also be included in the batch, so here is my point:
If Cassandra does not support transactions but relies on the batch functionality to achieve the same effect, it must not impose a limit on the size of the batch. If that is not possible, then Cassandra is really limited to applications with VERY simple data structures.

Azure Cosmos DB each partition size limit

In an Azure Cosmos DB partitioned collection, does each partition have a size limit?
According to this old document, partitions have a size limit of 10 GB. Is that still the case?
https://azure.microsoft.com/en-in/blog/10-things-to-know-about-documentdb-partitioned-collections/
Regards,
Karthikeyan V.
A partitioned collection has individual 10GB partition spaces. For a given partition key, you cannot exceed 10GB of data. This has not changed.
You'll need to pick a partition key which distributes your data across many partitions, vs creating "hot" partitions which could fill up (where you'd then get an error when attempting to write content).
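One way to sanity-check a candidate partition key against the 10GB limit described above is to total the data stored under each key value. The records and the `data_gb` field below are hypothetical aggregate volumes, used only to show how a low-cardinality key (tenant) can create a hot partition that a higher-cardinality key (user) avoids.

```python
from collections import defaultdict

PARTITION_LIMIT_GB = 10.0  # per-partition-key limit from the answer above

def totals_by_key(records: list[dict], key_field: str) -> dict[str, float]:
    # Sum the data volume stored under each partition key value.
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec[key_field]] += rec["data_gb"]
    return dict(totals)

def hot_partitions(totals: dict[str, float], limit_gb: float = PARTITION_LIMIT_GB) -> list[str]:
    # Key values whose data would exceed the limit (writes would start failing).
    return sorted(k for k, gb in totals.items() if gb > limit_gb)

# Hypothetical data volumes per (tenant, user).
records = [
    {"tenant": "acme", "user": "u1", "data_gb": 6.0},
    {"tenant": "acme", "user": "u2", "data_gb": 5.0},
    {"tenant": "globex", "user": "u3", "data_gb": 2.0},
]

print(hot_partitions(totals_by_key(records, "tenant")))  # ['acme']
print(hot_partitions(totals_by_key(records, "user")))    # []
```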
There are two types of collections:
Single Partition Collection (10GB and 10,000 RU/s)
Partitioned Collection (250GB and 250,000 RU/s); you can increase the limit as needed by contacting the Azure team.
For a partitioned collection you must specify a partition key; choose it based on your query filters for better read performance. If you don't specify one, the collection defaults to a single partition collection.
Note: a collection is a logical space, and in the background it can span multiple nodes (hence quorum) based on the RU and other parameters. In short, it's a PaaS: infrastructure handling is automated behind the scenes, and you will not have much control over it.
More info here:
Partitioning and horizontal scaling in Azure Cosmos DB

What is the maximum number of keyspaces in Cassandra?

What is the maximum number of keyspaces allowed in a Cassandra cluster? The wiki page on limitations doesn't mention one. Is there such a limit?
A keyspace is basically just a Map entry to Cassandra... you can have as many as you have memory for. Millions, easily.
ColumnFamilies are more expensive, since Cassandra will reserve a minimum of 1MB for each CF's memtable: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
You should have a look at: https://community.datastax.com/questions/12579/limit-on-number-of-cassandra-tables.html
We recommend a maximum of 200 tables total per cluster across all keyspaces (regardless of the number of keyspaces). Each table uses 1MB of memory to hold metadata about the tables so in your case where 1GB is allocated to the heap, 500-600MB is used just for table metadata with hardly any heap space left for other operations.
It is a recommendation, and there is no hard limit on the number of tables you can create in a cluster. You can create thousands if you were so inclined.
More importantly, applications take a long time to start up since the drivers request the cluster metadata (including the schema) during the initialisation/discovery phase. Retrieving the schema for 200 tables takes significantly less time than loading 500, 1000 or 3000. This may not be important to you, but there are lots of use cases where short startup times are crucial, most notably short-lived serverless functions, where execution time costs money and reducing execution time where possible results in thousands of dollars in savings.
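The heap figures in the quoted answer are simple multiplication: about 1MB of metadata per table against a 1GB heap. A sketch of that arithmetic, taking 1GB as 1024MB; the helper name is ours.

```python
TABLE_METADATA_MB = 1  # per-table heap overhead quoted above
HEAP_MB = 1024         # 1GB heap from the quoted example

def metadata_share(num_tables: int) -> float:
    """Fraction of the heap taken by table metadata alone."""
    return num_tables * TABLE_METADATA_MB / HEAP_MB

for tables in (200, 500, 600):
    print(f"{tables} tables: {metadata_share(tables):.0%} of the heap")
```

At the recommended 200 tables, metadata takes roughly a fifth of a 1GB heap; at 500-600 tables it takes about half or more, which matches the quoted warning.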
