Maximum number of Partition in a Cassandra Table and How it depends on Disk space? - cassandra

I am using Cassandra DB, i want to know what could be the maximum number of partitions that can be made ?
Will more number of partition impact on the performance of Cassandra ?
how number of partition depend on the disk space ?

There is no maximum number of partitions in Cassandra (that I know of). The number of partitions does have an impact on performance, and the more the better (this implies a good distribution of data which is exactly what you want for scalability). Disk space has nothing to do with it. You'll use whatever you're going to store.

Related

How to design a big scale VoltDB cluser with dozens of nodes and hundreds of partitions?

If I have 32 phsical servers which have 32 cores CPU and 128G memory inside, I want to build a VoltDB cluster with all of those 32 servers with K-Safefy=2 and 32 partitions in each server, so we will get VoltDB cluster with 256 available partitions to save data.
Looks there are too many partitions to split tables especially when some tables don't have a lot of records. But there will be too many copies of table if we choice replica of table.
If we build a much smaller cluster with a couple of servers from the beginning, there's a worry that the cluster will have to scale-out soon along with the business grows. Actually I don't konw how the VoltDB will re-organize data when a cluster expand to more nodes horizontally.
Do you have comments? Appreciated.
It may be more optimal to set the sitesperhost to less than 32, so that some % of cores are free to run threads for subsystems like export or database replication, or to handle non-VoltDB processes. Typically somewhere from 8 - 24 is the optimal number.
VoltDB creates the logical partitions based on the sitesperhost, the number of hosts, and the kfactor. If you need to scale out later, you can add additional nodes to the cluster which will increase the number of partitions, and VoltDB will gradually and automatically rebalance data from existing partitions to the new ones. You must add multiple servers together if you have kfactor > 0. For kfactor=2, you would add servers in sets of 3 so that they provide their own redundancy for the new partitions.
Your data is distributed across the logical partitions based on a hash of the partition key value of a record, or the corresponding input parameter for routing the execution of a procedure to a partition. In this way, the client application code does not need to be aware of the # of partitions. It doesn't matter so much which partition each record goes to, but you can assume that any records that share the same partition key value will be located in the same partition.
If you choose partition keys well, they should be columns with high cardinality, such as ID columns. This will evenly distribute the data and procedure execution work across the partitions.
Typically a VoltDB cluster is sized based on the RAM requirements, rather than the need for performance, since the performance is very high on even a very small cluster.
You can contact VoltDB at info#voltdb.com or ask more questions at http://chat.voltdb.com if you'd like to get help with an evaluation or discuss cluster sizing and planning with an expert.
Disclaimer: I work for VoltDB.

How Cassandra Handle a select query?

I'm working on designing Cassandra column family.
I met with a situation of higher GC while SELECTing, after loading a higher density of data. That is, amount of data in a partition increased. Also for low density data, it works fine.
I want to know how Cassandra does the SELECT query (with both partition and cluster key specified)?
Is the whole set of data in a partition is loaded into memory while we execute SELECT?
Will large number of partition keys affect performance?
Cassandra does not load the entire partition into memory, but it does load IndexInfo objects which help Cassandra find the relevant CQL rows within the partition. These are short lived java objects which can create quite a bit of heap pressure (GC pauses) - this is a design issue that will be addressed in CASSANDRA-9754 (known as Birch, a b-tree implementation of the index data structure).
Until cassandra-4.0 is released, you should target 100MB for your max partition size, and break larger partitions into smaller pieces.

How Cassandra In-memory feature works

I have few questions with regards to the In-Memory feature in Cassandra
1.) I have a 4 node datacenter and in Opscenter, under memory usage , it shows there is 100GB of in-memory available. Does it mean that each of the 4 nodes have 100GB memory available or is the 100Gb the total in memory capacity for my datacenter?
2.) If really 100GB is available for In-Memory for a datacenter, is it advisable to use the full capacity? Do I need to factor replication factor as well? Say I have a 15GB data which I want to store it in In-Memory, if the replication factor is 2, will it be like we have 30GB of data in In-memory for the datacenter?
3.) In dse.yaml file, there is a property which has the value like percentage of system memory "max_memory_to_lock_fraction" and by default it is 20%. As per the guidelines from Datastax Cassandra, we need to ensure that the in memory usage does not exceed 45% of total available system memory for each node. Is this "max_memory_to_lock_fraction" the parameter that needs to be set for 45%?
4.) Datastax documentation says compression needs to be removed for In-memory table. If compression is indeed set, will it affect the read/write performance?
5.) Output of dsetool inmemorystatus has a parameter called "Current Total memory not able to lock". Is the value present in this parameter denote the available memory. Like say if the value is 1024MB, does it mean that still 1GB In-memory is available for use.
I am using DSE 4.8.11 version. Please help me as I am trying to understand this feature so as to leverage it best.
Thanks in advance.
1) It depends on how you configure it it can be per cluster (all of the available memory) or you can view graphs of individual nodes
2) Yes, replication factor increases data by factor times in total. You will have to factor that in on the cluster level. Very nice tool to help you start: https://www.ecyrd.com/cassandracalculator/
3) Yes max_memory_to_lock_fraction is what you are looking for
4) It will increase processing time, since writes in cassandra are actually cpu bound this might not be best performance wise idea.
5) Yes this means there is still memory (of specified amount), but due to settings cassandra is unable to lock it.

Does spark keep all elements of an RDD[K,V] for a particular key in a single partition after "groupByKey" even if the data for a key is very huge?

Consider I have a PairedRDD of,say 10 partitions. But the keys are not evenly distributed, i.e, all the 9 partitions having data belongs to a single key say a and rest of the keys say b,c are there in last partition only.This is represented by the below figure:
Now if I do a groupByKey on this rdd, from my understanding all data for same key will eventually go to different partitions or no data for the same key will not be in multiple partitions. Please correct me if I am wrong.
If that is the case then there can be a chance that the partition for key a can be of size that may not fit in a worker's RAM. In that case what spark will do ? My assumption is like it will spill the data to worker's disk.
Is that correct?
Or how spark handle such situations
Does spark keep all elements (...) for a particular key in a single partition after groupByKey
Yes, it does. This is a whole point of the shuffle.
the partition for key a can be of size that may not fit in a worker's RAM. In that case what spark will do
Size of a particular partition is not the biggest issue here. Partitions are represented using lazy Iterators and can easily store data which exceeds amount of available memory. The main problem is non-lazy local data structure generated in the process of grouping.
All values for the particular key are stored in memory as a CompactBuffer so a single large group can result in OOM. Even if each record separately fits in memory you may still encounter serious GC issues.
In general:
It is safe, although not optimal performance wise, to repartition data where amount of data assigned to a partition exceeds amount of available memory.
It is not safe to use PairRDDFunctions.groupByKey in the same situation.
Note: You shouldn't extrapolate this to different implementations of groupByKey though. In particular both Spark Dataset and PySpark RDD.groupByKey use more sophisticated mechanisms.

how to achieve even distribution of partitions keys in Cassandra

We modeled our data in cassandra table with partition key, lets say "pk". We have a total of 100 unique values for pk and our cluster size is 160. We are using random partitioner. When we add data to Cassandra (with replication factor of 3) for all 100 partitions, I noticed that those 100 partitions are not distributed evenly. One node has as many as 7 partitions and lot of nodes only has 1 or no partition. Given that we are using random partitioner, I expected the distribution to be reasonably even. Because 7 partitions are in the same node, thats creating a hot partition for us. Is there a better way to distribute partitions evenly?
Any input is appreciated.
Thanks
I suspect the problem is the low cardinality of your partition key. With only 100 possible values, it's not unexpected that several values end up hashing to the same nodes.
If you have 160 nodes, then only having 100 possible values for your partition key will mean you aren't using all 160 nodes effectively. An even distribution of data comes from inserting a lot of data with a high cardinality partition key.
So I'd suggest you figure out a way to increase the cardinality of your partition key. One way to do this is to use a compound partition key by including some part of your clustering columns or data fields into your partition key.
You might also consider switching to the Murmur3Partitioner, which generally gives better performance and is the current default partitioner on the newest releases. But you'd still need to address the low cardinality problem.

Resources