Is Cassandra unable to store relationships that cross the partition size limit?

I've noticed that relationships cannot be properly stored in C* because of its 100 MB partition size limit; denormalization doesn't help here, and neither does the fact that C* can hold 2 billion cells per partition, because 2 billion cells of just longs already amount to 16 GB. Doesn't that cross the 100 MB partition size limit?
That's what I don't understand in general: C* proclaims it can hold 2 billion cells per partition, yet partition sizes should not cross 100 MB?
What is the idiomatic way to do this? People say this is an ideal use case for TitanDB or JanusGraph, which scale well to billions of vertices and edges. How do these databases, which use C* under the hood, model this data?
My use case is described here: https://groups.google.com/forum/#!topic/janusgraph-users/kF2amGxBDCM
Note that I'm fully aware that the answer to this question is "use an extra partition key to decrease the partition size", but honestly, which of us has that option? Especially when modeling relationships ... I'm not interested in relationships that happened in a particular hour...

The maximum number of cells (rows × columns) in a partition is 2 billion, and a single column value can be up to 2 GB (1 MB is recommended).
Source: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html
A partition size of 100 MB is not a hard upper limit. If you check the DataStax docs:
For efficient operation, partitions must be sized within certain limits in Apache Cassandra™. Two measures of partition size are the number of values in a partition and the partition size on disk. Sizing the disk space is more complex, and involves the number of rows and the number of columns, primary key columns and static columns in each table. Each application will have different efficiency parameters, but a good rule of thumb is to keep the maximum number of rows below 100,000 items and the disk size under 100 MB
You can see that, for efficient operation and low heap pressure, this is just a rule of thumb: keep a single partition below roughly 100,000 rows and 100 MB on disk. The 2-billion-cell figure, by contrast, is a hard limit of the storage engine; 2 billion 8-byte values alone would be about 16 GB, far beyond what performs well, which is why the guideline and the hard limit differ so much.
Titan and JanusGraph store graphs in adjacency-list format, which means a graph is stored as a collection of vertices, each with its adjacency list. The adjacency list of a vertex contains all of the vertex's incident edges (and its properties).
When Cassandra is the storage backend, they use the vertex ID as the partition key, the property key ID or edge ID as the clustering key, and the property value or edge properties as regular columns.
In Titan and JanusGraph the same rule of thumb therefore applies: for efficient operation and low heap pressure, keep the number of edges and properties of a single vertex around 100,000 and its partition size around 100 MB.
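As a rough, hypothetical sketch (this is not JanusGraph's actual internal schema, just the general shape of an adjacency-list layout on Cassandra; all names below are made up):

-- All edges and properties of a vertex live in that vertex's partition
CREATE TABLE graph.adjacency (
    vertex_id    bigint,             -- partition key: the owning vertex
    edge_id      bigint,             -- clustering key: one row per incident edge or property
    other_vertex bigint,             -- vertex on the other end (unused for a property row)
    props        map<text, text>,    -- edge or property values
    PRIMARY KEY (vertex_id, edge_id)
);

-- Reading a vertex's adjacency list is a single-partition query
SELECT edge_id, other_vertex, props FROM graph.adjacency WHERE vertex_id = 42;

A "super node" with many millions of incident edges would concentrate all of that data in a single vertex_id partition, which is exactly where the 100,000-row / 100 MB guideline starts to hurt; JanusGraph mitigates this with features such as vertex-centric indexes and partitioned vertex labels, but the underlying Cassandra guideline still applies.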

Related

Cassandra partition key grow limit?

What does "partitions may grow too large" mean? I thought Cassandra could handle very large partitions. Why do they use two partition key columns in this example?
And what do I do if even both partition key columns together still produce partitions that are too large?
The example you gave is one of the ways of preventing partitions from becoming too large. In Cassandra, the partition key (part of the primary key) is used for grouping a related set of rows.
In the left-hand data model, user_id is the partition key, which means every video interaction by that user is placed in the same partition. As the example's comment mentions, if a user is active and has 1,000 interactions daily, then in 60 days (2 months) you will have 60,000 rows for that user. This may breach Cassandra's recommended partition size (in terms of data stored in a single partition).
To avoid this situation there are several ways to keep a partition from growing too big. For example (see the CQL sketch after this list), you can:
Make another column from the table part of the partition key. This is what the example above does: video_id is made part of the partition key along with user_id.
Bucketing - this strategy is generally used with time-series data, where you split one partition key into multiple buckets. For example, if date is your partition key, you can create 24 buckets as date_1, date_2, ..., date_24. You have now divided one big partition into 24 smaller partitions.
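Here is a minimal CQL sketch of both approaches; the table and column names are made up, based on the video example above:

-- Option 1: composite partition key - one partition per (user, video) pair
CREATE TABLE video_interactions_by_user_video (
    user_id    uuid,
    video_id   uuid,
    event_time timestamp,
    event_type text,
    PRIMARY KEY ((user_id, video_id), event_time)
);

-- Option 2: bucketing - one partition per (user, month), so a user's history is
-- spread over many month-sized partitions instead of one ever-growing partition
CREATE TABLE video_interactions_by_user_month (
    user_id    uuid,
    month      text,                  -- the bucket, e.g. '2017-06'
    event_time timestamp,
    video_id   uuid,
    event_type text,
    PRIMARY KEY ((user_id, month), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

With option 2, reading a user's recent activity touches one or a few month buckets instead of their entire history.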
The main idea is to prevent a partition from growing too big. This is a data modeling technique you should be aware of when creating a data model for Cassandra.
If you still end up with large partitions, you need to remodel using the various data modeling techniques available. For that I would recommend understanding your data, estimating its rate of growth, calculating the estimated partition size, and refining your data model if it does not meet the partition size target.

What exactly is a partition size in Cassandra?

I am new to Cassandra and have a cluster with 6 nodes. I am trying to find the partition size, and I tried to fetch it with this basic command:
nodetool tablehistograms keyspace.tablename
Now I am wondering how this is calculated, and why the result has only 5 records other than min and max while the number of nodes is 6. Do the node count and the number of partitions of a table have any relation?
Fundamentally, what I know is that the partition key is hashed and used to distribute the data to be persisted across the various nodes.
When exactly should we go for bucketing? I am assuming Cassandra's partitioner takes care of distributed persistence across nodes.
The number of entries in this column is not related to the number of nodes. It shows the distribution of the values - you have min, max, and percentiles (50/75/95/98/99).
Most nodetool commands don't show anything about other nodes - they only report information about the current node.
P.S. This document would be useful in explaining how to interpret this information.
As the name of the command suggests, tablehistograms reports the distribution of metadata for the partitions held by a node.
To add to what Alex Ott has already stated, the percentiles (not percentages) provide insight into the range of metadata values. For example:
50% of the partitions for the given table have a size of 74KB or less
95% are 263KB or less
98% are 455KB or less
These metadata don't have any correlation with the number of partitions or the number of nodes in your cluster.
You are correct in that the partition key gets hashed and the resulting value determines where the partition (and its associated rows) get stored (distributed among nodes in the cluster). If you're interested, I've explained in a bit more detail with some examples in this post -- https://community.datastax.com/questions/5944/.
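If you want to see that hashing directly, CQL exposes the token() function (the keyspace, table, and column names below are placeholders, not from the question):

-- The partitioner (Murmur3 by default) hashes the partition key to a token;
-- each node owns ranges of tokens, so this value determines which replicas store the row.
SELECT token(user_id), user_id FROM myks.users LIMIT 5;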
As far as bucketing is concerned, you would typically do that to reduce the number of rows in a partition and therefore reducing its size. The general recommendation is to keep your partition sizes less than 100MB for optimal performance but it's not a hard rule -- you can have larger partitions as long as you are aware of the tradeoffs.
In your case, the largest partition is only 455KB, so size is not a concern. Cheers!

Partition key for a lookup table in Azure Cosmos DB Tables

I have a very simple lookup table I want to call from an Azure function.
Schema is incredibly simple:
Name | Value 1 | Value 2
Name will be unique, but value 1 and value 2 will not be. There is no other data in the lookup table.
For an Azure Table you need a partition key and a row key. Obviously the rowkey would be the Name field.
What exactly should I use for Partition Key?
Right now, I'm using a constant because there won't be a ton of data (maybe a couple hundred rows at most), but using a constant seems to go against the point.
This answer applies to all Cosmos DB containers, including Tables.
When does it make sense to store your Cosmos DB container in a single partition (use a constant as the partition key)?
If you are sure the data size of your container will always remain well under 10GB.
If you are sure the throughput requirement for your container will always remain under 10,000 RU/s (RU per second).
If either of the above conditions is false, or if you are not sure about future growth of data size or throughput requirements, then using a partition key based on the guidelines below will allow the container to scale.
How partitioning works in Cosmos DB
Cosmos groups container items into a set of logical partitions based on the partition key. These logical partitions are then mapped to physical partitions. A physical partition is the unit of compute/storage which makes up the underlying database infrastructure.
You can determine how your data is split into logical partitions by your choice of partition key. You have no control over how your logical partitions are mapped to physical partitions, Cosmos handles this automatically and transparently.
Distributing your container across a large number of physical partitions is the way Cosmos allows the container to scale to virtually unlimited size and throughput.
Each logical partition can contain a maximum of 10GB of data. An unpartitioned container can have a maximum throughput of 10,000 RU/s which implies there is a limit of 10,000 RU/s per logical partition.
The RU/s allocated to your container are evenly split across all physical partitions hosting the container's data. For instance, if your container has 4,000 RU/s allocated and its logical partitions are spread across 4 physical partitions, then each physical partition will have 1,000 RU/s allocated to it. This also means that if one of your physical partitions is under heavy load or 'hot', it will get rate-limited at 1,000 RU/s, not at 4,000. This is why it is very important to choose a partition key that spreads your data, and access to the data, evenly across partitions.
If your container is in a single logical partition, it will always be mapped to a single physical partition and the entire allocation of RU/s for the container will always be available.
All Cosmos DB transactions are scoped to a single logical partition, and the execution of a stored procedure or trigger is also scoped to a single logical partition.
How to choose a good partition key
Choose a partition key that will evenly distribute your data across logical partitions, which in turn will help ensure the data is evenly mapped across physical partitions. This will prevent 'bottleneck' or 'hot' partitions which will cause rate-limiting and may increase your costs.
Choose a partition key that will be the filter criteria for a high percentage of your queries. By providing the partition key as filter to your query, Cosmos can efficiently route your query to the correct partition. If the partition key is not supplied it will result in a 'fan out' query, which is sent to all partitions which will increase your RU cost and may hinder performance. If you frequently filter based on multiple fields see this article for guidance.
Summary
The primary purpose of partitioning your containers in Cosmos DB is to allow the containers to scale in terms of both storage and throughput.
Small containers which will not grow significantly in data size or throughput requirements can use a single partition.
Large containers, or containers expected to grow in data size or throughput requirements, should be partitioned using a well-chosen partition key.
The choice of partition key is critical and may significantly impact your ability to scale, your RU cost and the performance of your queries.

Does having a lot of small, unique partitions in a table affect performance or create extra load in Cassandra?

I have a table with 4 million unique partition keys:
select count(*) from "KS".table;

 count
---------
 4355748

(1 rows)
I have read that the cardinality of the partition key should be neither too high nor too low, which means the partition key should not be too unique. Is that correct?
The table does not have any clustering key. Will changing the data partitioning help with the load?
It really depends on the use case... If you don't have natural clustering within a partition, there may be little sense in introducing it. Also, what are the read patterns? Do you need to read multiple rows in one go, or not?
The number of partitions has an effect on the size of the bloom filter, the key cache, etc., so as you increase the number of partitions, the bloom filter grows and the key cache gets fewer hits (until you increase its size).
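If the bloom filter's memory footprint does become a concern as the partition count grows, it is tunable per table; as a sketch (the table name is a placeholder, the value only illustrative):

-- Accept a higher false-positive rate in exchange for a smaller bloom filter
ALTER TABLE ks.tbl WITH bloom_filter_fp_chance = 0.1;

The key cache size, for its part, is a node-level setting in cassandra.yaml.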
As far as I know, Cassandra uses consistent hashing to map a partition key to the nodes that own it, so high cardinality in itself should not matter.

One bigger partition, or a few smaller but more distributed partitions, for range queries in Cassandra?

We have a table that stores our data partitioned by file. One file is 200 MB to 8 GB of JSON - but there's obviously a lot of overhead. Compacting the raw data will lower this drastically. I ingested about 35 GB of JSON data and only one node got slightly more than 800 MB of data. This is possibly due to "write hotspots" -- but we write once and then only read; we do not update data. Currently, we have one partition per file.
Using secondary indexes, we search for partitions in the database that contain a specific geolocation (= first query) and then use the result of that query to run a range query over a time range within the partitions we found (= second query). This might even be the whole file if needed, but in 95% of the queries only chunks of a partition are read.
We have a replication factor of 2 on a 6-node cluster. Data is fairly evenly distributed; every node owns 31.9% to 35.7% (effective) of the data according to nodetool status *tablename*.
Good read performance is key for us.
My questions:
How big is too big for a partition, in terms of data volume or number of rows? Is there a rule of thumb for this?
For range query performance: is it better to split up our "big" partitions into more, smaller partitions? We built our schema with "big" partitions because we thought that when we do range queries on a partition, it would be good to have it all on one node so the data can be fetched easily. Note that with RF 2 the data is also available on a second replica.
C* supports very large partitions, but that doesn't mean it is a good idea to go to that level. The right limit depends on the specific use case, but a good ballpark could be between 10k and 50k rows per partition. Of course, everything is a compromise: if you have "huge" (in terms of bytes) rows, then strictly limit the number of rows in each partition; if you have "small" (in terms of bytes) rows, you can relax that limit a bit. This matters because a partition lives only on its replicas (two nodes with your RF of 2), so all queries for a specific partition will hit only those nodes.
Range queries should ideally go to one partition only. A range query means a sequential scan over your partition on the node receiving the query; however, you then limit yourself to the throughput of that node. If you split your range queries across more nodes (that is, you change the way you partition your data by adding something like a bucket), you fetch data from different nodes as well by running parallel queries, directly increasing the total throughput. Of course, you lose the ordering of your records across different buckets, so if the global order within your partition matters, that may not be feasible.
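As a hedged sketch of that bucketing idea (the table and column names are made up, not the asker's schema): split each file's data into time buckets so a time-range query can be fanned out across several partitions in parallel.

-- One partition per (file, day) instead of one huge partition per file
CREATE TABLE measurements_by_file_day (
    file_id    text,
    day        date,          -- the bucket
    event_time timestamp,
    payload    blob,
    PRIMARY KEY ((file_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time ASC);

-- A time-range read targets one bucket per day in the range; the client issues
-- these queries in parallel and merges the (already ordered per-bucket) results.
SELECT event_time, payload
FROM measurements_by_file_day
WHERE file_id = 'file-42'
  AND day = '2020-03-01'
  AND event_time >= '2020-03-01 00:00:00'
  AND event_time <  '2020-03-01 06:00:00';

Whether day, week, or something else is the right bucket depends on how much data a single file produces per unit of time; the goal is simply to keep each (file_id, day) partition comfortably under the ~100 MB guideline.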

Resources