Cassandra Cell Number Limitation

Is this 2 billion cells per partition limit still valid?
http://wiki.apache.org/cassandra/CassandraLimitations
Let's say you save 16 bytes on average per cell. Then you "just" can persist 16*2e9 bytes = 32 GB of data (plus column name) on one machine!?
Or, if you imagine a square table, you will be able to store 44721 rows with 44721 columns each!?
Doesn't really sound like Big Data.
Is this correct?
Thanks!
Malte

The 2 billion cell limit is still valid, and you most likely want to remodel your data if you start seeing that many cells per partition.
The maximum number of cells (rows x columns) in a single partition is 2 billion.
A partition is defined by the partition key in CQL, which determines where a particular piece of data will live. For example, suppose I had two nodes with fictional ranges of 0-100 and 100-200. Partition keys that hashed to between 0 and 100 would reside on the first node, and those with a hashed value between 100 and 200 would reside on the second node. In reality, Cassandra uses the Murmur3 algorithm to hash partition keys, generating values between -2^63 and 2^63-1.
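The fictional two-node example above can be sketched in Python. This is only an illustration: the node names and the `owning_node` helper are hypothetical, the hashed value is passed in directly rather than computed with Murmur3, and sending the boundary value 100 to the second node is an assumption, since the example leaves it open.

```python
# Fictional token ranges from the example; a real cluster uses Murmur3
# tokens between -2**63 and 2**63 - 1.
NODE_RANGES = {
    "node1": range(0, 100),    # hashed values 0..99
    "node2": range(100, 200),  # hashed values 100..199 (boundary choice is an assumption)
}


def owning_node(hashed_value: int) -> str:
    """Return the name of the node whose range contains the hashed partition key."""
    for node, token_range in NODE_RANGES.items():
        if hashed_value in token_range:
            return node
    raise ValueError(f"hashed value {hashed_value} is outside every node's range")


print(owning_node(42))   # a key hashing to 42 lives on the first node
print(owning_node(150))  # a key hashing to 150 lives on the second node
```

Every row that shares a partition key hashes to the same value, so the whole partition lands on one node; that is why the cell limit is counted per partition.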
The real limitation tends to be based on how many unique values you have for your partition key. If you don't have a good deal of uniqueness within a single column, you can combine columns to generate more uniqueness, as many users do (a composite partition key).
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/create_table_r.html
More info on hashing and how C* holds data.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePartitionerAbout_c.html

Related

cassandra: `sstabledump` output questions

I'm inspecting the output of sstabledump, to gain better understanding of the cassandra data model, and I have some questions
From the output of sstabledump it seems that
a table is a list of partitions (split by partition key)
a partition is a list of rows (split according to clustering key)
a row is a map of key-value pairs, where the keys belong in a predefined list
Question 1: For each partition, as well as for each row inside a partition, there is a position key. What does this value correspond to? Physical storage details? And how exactly?
Question 2: Each row inside each partition has a type: row key-value pair. Could this type be anything else? If yes, what? If not, why have a value that is always the same?
Why is Cassandra classified as wide-column and other similar terms? It looks more like a two-level row storage.
The partition's token is the murmur3 hash of whatever you assigned as the partition key. Consistent hashing is used with that hash to determine which node in the cluster the partition and its replicas belong to. Within each partition, data is sorted by clustering key, and then by cell name within the row. The structure is used so that redundant things like timestamps, if inserted for a row at once, are only written once, as a vint delta sequence relative to the partition, to save space.
On disk the partitions are sorted in order of this hashed key. The position value just refers to where in the sstable's data file the entry is located (decompressed byte offset). The type in that spot can also be a static block, which is located at the beginning of each partition for any static cells, or a range tombstone marker (beginning or end). Note that values are sometimes repeated in the sstabledump JSON for readability even if not physically written on disk (i.e. repeated timestamps).
You can have many of these rows inside a partition; a common data model for time series, for example, is to use a timestamp as the clustering key, which makes very wide partitions with millions of rows. Before 3.0, the data storage was also closer to Bigtable's design. It was essentially a Map<byte[], SortedMap<byte[], Cell>> where the Comparator of the sorted map was changed based on the schema. It did not differentiate rows and columns within a partition, which led to massive amounts of redundant data, so it was redesigned to fit the query language better.
Some more references:
Explanation of motivation of 3.0 change by DataStax here
Blog post by TLP has a good detailed explanation of the new disk format
CASSANDRA-8099

How to increase the Cassandra row size above 64KB

I read that the maximum row size Apache Cassandra supports is 64KB. But I need to save a record with a size of 560 KB. Is that possible?
Yes, it is possible if you store data in a column value instead of column key.
In Cassandra, the 64KB limitation is only for column keys, which determine the ordering of data in a partition. For column values, the size limitation is 2GB.
This page describes the difference between clustering columns (aka column keys) and regular columns (aka column values).

What are the maximum number of columns allowed in Cassandra

Cassandra published its technical limitations but did not mention the max number of columns allowed. Is there a maximum number of columns? I have a need to store 400+ fields. Is this possible in Cassandra?
The maximum number of columns per row (or a set of rows, which is called "partition" in Cassandra's CQL) is 2 billion (but the partition must also fit on a physical node, see docs).
400+ fields is not a problem.
As per the Cassandra technical limitations page, the total number of cells cannot exceed 2 billion (rows x columns).
You can have a partition with (1 row x 2 billion columns), and no more rows will be allowed in that partition, so the limit is not 2 billion columns per row; the limit is on the total number of cells in a partition.
https://wiki.apache.org/cassandra/CassandraLimitations
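The rows x columns arithmetic above can be checked with a small Python sketch. The function names are hypothetical, the cell count is the simplified rows x columns product used in this answer, and treating exactly 2 billion as still allowed is an assumption.

```python
# Simplified model: cells per partition = rows x columns.
MAX_CELLS_PER_PARTITION = 2_000_000_000


def cells_in_partition(rows: int, columns: int) -> int:
    """Count cells under the simplified rows x columns model."""
    return rows * columns


def fits_cell_limit(rows: int, columns: int) -> bool:
    """True if the partition stays within the 2 billion cell limit."""
    return cells_in_partition(rows, columns) <= MAX_CELLS_PER_PARTITION


def max_rows_for(columns: int) -> int:
    """Largest number of rows one partition can hold for a given column count."""
    return MAX_CELLS_PER_PARTITION // columns


print(fits_cell_limit(1, 2_000_000_000))  # one row with 2 billion columns
print(fits_cell_limit(2, 2_000_000_000))  # a second row pushes past the limit
print(max_rows_for(400))                  # rows per partition with 400 columns
```

With 400 columns per row, this model allows 5,000,000 rows in a single partition, so the column count itself is far from the limit.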
Rajmohan's answer is technically correct. On the other hand, if you have 400 CQL columns, you most likely aren't optimizing your data model. You want to generate cassandra wide rows using partition keys and clustering columns in CQL.
Moreover, you don't want to have rows that are too wide from a practical (performance) perspective. A conservative rule of thumb is to keep your partitions under hundreds of megabytes or hundreds of thousands of cells.
Take a look at these two links to help wrap your head around this.
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
http://www.sestevez.com/sestevez/CASTableSizer/

cassandra primary key design

I am using DataStax Enterprise 4.5. Is there any performance disadvantage to defining a composite partition key rather than a single-column partition key? What if one column of the composite partition key has high cardinality but the other column has low cardinality?
A composite key is used to increase the cardinality of your partitions. For example, a key like PRIMARY KEY ((x,y)) with 5 values of x and 10 values of y will end up creating 50 different partitions. This is useful if you need to distribute your data more, but it is unnecessary if you have a single variable with high enough cardinality.
A more realistic example might be creating a composite key of PRIMARY KEY ((Gender, ZipCode), age, userid). If you used only Gender as the partition key you would end up with only 2 partitions to store your data! Adding zipcode allows for a total of all 99999 zipcodes (or zip+4 to get even more) while still allowing you to segregate your data by gender. This would be ideal for looking up demographic information by location or something like that.
Basically the rule of thumb is that you want a large number of partitions to avoid hotspots in your cluster and composite keys allow an easy way of increasing the number of partitions by combining the cardinality of your fields.
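The 5 x 10 = 50 figure from the PRIMARY KEY ((x,y)) example can be reproduced with a short Python sketch. The concrete values for x and y are made up for illustration; only the counts come from the answer.

```python
from itertools import product

# Hypothetical values: 5 distinct x values and 10 distinct y values.
x_values = ["x0", "x1", "x2", "x3", "x4"]
y_values = [f"y{i}" for i in range(10)]

# With PRIMARY KEY ((x, y)), every distinct (x, y) pair is its own partition.
composite_partitions = set(product(x_values, y_values))

# With PRIMARY KEY (x), only the distinct x values form partitions.
single_column_partitions = set(x_values)

print(len(single_column_partitions))  # partitions when x alone is the key
print(len(composite_partitions))      # partitions when (x, y) is the key
```

The partition count is the product of the cardinalities, which is why a low-cardinality column such as Gender is still useful once it is combined with a high-cardinality one.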

Why many refer to Cassandra as a Column oriented database?

Reading several papers and documents on the internet, I found a lot of contradictory information about the Cassandra data model. Many identify it as a column-oriented database, others as row-oriented, and some define it as a hybrid of both.
According to what I know about how Cassandra stores files, it uses the *-Index.db file to find the right position in the *-Data.db file, where the bloom filter, the column index, and then the columns of the required row are stored.
In my opinion, this is strictly row-oriented. Is there something I'm missing?
If you take a look at the Readme file at Apache Cassandra git repo, it says that,
Cassandra is a partitioned row store. Rows are organized into tables
with a required primary key.
Partitioning means that Cassandra can distribute your data across
multiple machines in an application-transparent matter. Cassandra will
automatically repartition as machines are added and removed from the
cluster.
Row store means that like relational databases, Cassandra organizes
data by rows and columns.
Column-oriented or columnar databases are stored on disk column-wise.
e.g. Bonuses table:
ID  Last   First  Bonus
1   Doe    John   8000
2   Smith  Jane   4000
3   Beck   Sam    1000
In a row-oriented database management system, the data would be stored like this:
1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;
In a column-oriented database management system, the data would be stored like this:
1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;
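The two layouts can be reproduced from the Bonuses table with a few lines of Python. This is only a sketch of the serialization order shown above; the function names are hypothetical and no real storage engine writes data as delimited strings like this.

```python
# The Bonuses table from the example, one tuple per row.
bonuses = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]


def row_oriented(rows):
    """Write each row's values together, one row after another."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)


def column_oriented(rows):
    """Write each column's values together, one column after another."""
    columns = zip(*rows)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)


print(row_oriented(bonuses))
print(column_oriented(bonuses))
```

The only difference between the two functions is the transpose step, which is the whole distinction between the two storage orders.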
Cassandra is basically a column-family store
Cassandra would store the above data as,
"Bonuses" : {
row1 : { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000},
row2 : { "ID":2, "Last":"Smith", "First":"Jane", "Bonus":4000}
...
}
Also, the number of columns in each row doesn't have to be the same. One row can have 100 columns and the next row can have only 1 column.
Read this for more details.
Yes, the "column-oriented" terminology is a bit confusing.
The model in Cassandra is that rows contain columns. To access the smallest unit of data (a column) you have to specify first the row name (key), then the column name.
So in a columnfamily called Fruit you could have a structure like the following example (with 2 rows), where the fruit types are the row keys, and the columns each have a name and value.
apple  -> colour    weight  price  variety
          "red"     100     40     "Cox"
orange -> colour    weight  price  origin
          "orange"  120     50     "Spain"
One difference from a table-based relational database is that one can omit columns (orange has no variety), or add arbitrary columns (orange has origin) at any time. You can still imagine the data above as a table, albeit a sparse one where many values might be empty.
However, a "column-oriented" model can also be used for lists and time series, where every column name is unique (and here we have just one row, but we could have thousands or millions of columns):
temperature -> 2012-09-01  2012-09-02  2012-09-03  ...
               40          41          39          ...
which is quite different from a relational model, where one would have to model the entries of a time series as rows not columns. This type of usage is often referred to as "wide rows".
You both make good points and it can be confusing. In the example where
apple -> colour  weight  price  variety
         "red"   100     40     "Cox"
apple is the key value and the column is the data, which contains all 4 data items. From what was described it sounds like all 4 data items are stored together as a single object then parsed by the application to pull just the value required. Therefore from an IO perspective I need to read the entire object. IMHO this is inherently row (or object) based not column based.
Column based storage became popular for warehousing, because it offers extreme compression and reduced IO for full table scans (DW) but at the cost of increased IO for OLTP when you needed to pull every column (select *). Most queries don't need every column and due to compression the IO can be greatly reduced for full table scans for just a few columns. Let me provide an example
apple -> colour  weight  price  variety
         "red"   100     40     "Cox"
grape -> colour  weight  price  variety
         "red"   100     40     "Cox"
We have two different fruits, but both have colour = red. If we store colour in a separate disk page (block) from weight, price and variety, so the only thing stored is colour, then when we compress the page we can achieve extreme compression due to a lot of de-duplication. Instead of storing 100 rows (hypothetically) in a page, we can store 10,000 colours. Now reading everything with the colour red might take 1 IO instead of thousands of IOs, which is really good for warehousing and analytics, but bad for OLTP if I need to update the entire row, since the row might have hundreds of columns and a single update (or insert) could require hundreds of IOs.
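To make the de-duplication point concrete, here is a minimal Python sketch that uses run-length encoding as a stand-in for page compression. Run-length encoding is my assumption for illustration; the text above does not say which compression scheme a columnar store uses, and the 10,000 figure is the hypothetical one from the paragraph.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# A page that holds only the colour column: 10,000 fruits, all red.
colour_column = ["red"] * 10_000
print(run_length_encode(colour_column))

# The same values interleaved with other columns, as in a row layout,
# produce no long runs, so this encoding saves nothing.
row_layout = ["red", 100, 40, "Cox"] * 3
print(len(run_length_encode(row_layout)))
```

Storing a column's values next to each other is what creates the long runs; a row layout interleaves them with unrelated values and loses that benefit.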
Unless I'm missing something, I wouldn't call this columnar; I'd call it object-based. It's still not clear how objects are arranged on disk. Are multiple objects placed into the same disk page? Is there any way of ensuring objects with the same metadata go together? Given that one fruit might contain different data than another fruit, since it's just metadata or XML or whatever you want to store in the object itself, is there a way to ensure certain matching fruit types are stored together to increase efficiency?
Larry
The most unambiguous term I have come across is wide-column store.
It is a kind of two-dimensional key-value store, where you use a row key and a column key to access data.
The main difference between this model and the relational ones (both row-oriented and column-oriented) is that the column information is part of the data.
This implies data can be sparse. That means different rows don't need to share the same column names nor number of columns. This enables semi-structured data or schema free tables.
You can think of wide-column stores as tables that can hold an unlimited number of columns, and thus are wide.
Here's a couple of links to back this up:
This mongodb article
This Datastax article mentions it too, although it classifies Cassandra as a key-value store.
This db-engines article
This 2013 article
Wikipedia
Column family does not mean column-oriented. Cassandra is a column-family store but not column-oriented. It stores a row with all its column families together.
HBase is a column-family store as well, and it stores column families in a column-oriented fashion. Different column families are stored separately on a node, or they can even reside on different nodes.
IMO that's the wrong term for Cassandra. Instead, it is more appropriate to call it a row-partition store. Let me provide some details:
Primary Key, Partitioning Key, Clustering Columns, and Data Columns:
Every table must have a primary key with unique constraint.
Primary Key = Partition key + Clustering Columns
# Example
Primary Key: ((col1, col2), col3, col4)   # primary key uniquely identifies a row
                                          # we need to choose its components, partition key
                                          # and clustering columns, so that each row can be
                                          # uniquely identified
Partition Key: (col1, col2)               # decides on which node to store the data
                                          # partitioning key is mandatory, and it
                                          # can be made up of one column or multiple
Clustering Columns: col3, col4            # decides arrangement within a partition
                                          # clustering columns are optional
The partition key is the first component of the primary key. Its hashed value is used to determine the node that stores the data. The partition key can be a compound key consisting of multiple columns. We want an almost equal spread of data, and we keep this in mind while choosing the primary key.
Any fields listed after the partition key in the primary key are called clustering columns. These store data in ascending order within the partition. The clustering column component also helps in making sure the primary key of each row is unique.
You can use as many clustering columns as you would like. You cannot use the clustering columns out of order in a SELECT statement. You may choose to omit a clustering column in your SELECT statement. That's OK. Just remember to use them in order when you are using the SELECT statement. But note that, in your CQL query, you cannot access a column or a clustering column if you have not used the other defined clustering columns. For example, if the primary key is (year, artist_name, album_name) and you want to use the city column in your query's WHERE clause, then you can use it only if your WHERE clause makes use of all of the columns that are part of the primary key.
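The "use them in order" rule can be modelled as a prefix check. The Python sketch below is a simplified illustration with a hypothetical helper name; it ignores ALLOW FILTERING, secondary indexes, and range restrictions, and it does not reproduce Cassandra's actual query validation.

```python
def restricts_clustering_prefix(clustering_columns, restricted_columns):
    """Check that the restricted clustering columns form a leading prefix.

    clustering_columns: clustering columns in declaration order.
    restricted_columns: clustering columns named in the WHERE clause.
    """
    restricted = set(restricted_columns)
    unknown = restricted - set(clustering_columns)
    if unknown:
        raise ValueError(f"not clustering columns: {sorted(unknown)}")
    # A valid restriction covers exactly the first len(restricted) columns.
    return set(clustering_columns[:len(restricted)]) == restricted


# PRIMARY KEY (year, artist_name, album_name): year is the partition key,
# artist_name and album_name are the clustering columns.
clustering = ["artist_name", "album_name"]

print(restricts_clustering_prefix(clustering, ["artist_name"]))
print(restricts_clustering_prefix(clustering, ["artist_name", "album_name"]))
print(restricts_clustering_prefix(clustering, ["album_name"]))  # skips artist_name
```

Omitting trailing clustering columns is fine, but skipping a leading one is not, which matches the ordering rule described above.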
Tokens:
Cassandra uses tokens to determine which node holds what data. A token is a 64-bit integer, and Cassandra assigns ranges of these tokens to nodes so that each possible token is owned by a node. Adding more nodes to the cluster or removing old ones leads to redistributing these tokens among nodes.
A row's partition key is used to calculate a token using a given partitioner (a hash function for computing the token of a partition key) to determine which node owns that row.
Cassandra is Row-partition store:
Row is the smallest unit that stores related data in Cassandra.
Don't think of Cassandra's column family (that is, table) as a RDBMS table, but think of it as a dict of a dict (here dict is data structure similar to Python's OrderedDict):
the outer dict is keyed by a row key (primary key): this determines which partition and which row in partition
the inner dict is keyed by a column key (data columns): this is data in dict with column names as keys
both dicts are ordered (by key) and sorted: the outer dict is sorted by primary key
This model allows you to omit columns or add arbitrary columns at any time, as it allows you to have different data columns for different rows.
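The dict-of-dicts picture can be written out directly in Python. This is a conceptual sketch, not Cassandra's storage format: the fruit data is borrowed from the earlier example in this thread, the helper names are hypothetical, and sorting happens at read time here because a plain Python dict keeps insertion order rather than key order.

```python
# Outer dict: row key -> row. Inner dict: column name -> value.
fruit = {
    "orange": {"colour": "orange", "weight": 120, "price": 50, "origin": "Spain"},
    "apple": {"colour": "red", "weight": 100, "price": 40, "variety": "Cox"},
}


def get_cell(table, row_key, column_name):
    """Address the smallest unit of data by row key, then column name."""
    return table[row_key][column_name]


def rows_in_key_order(table):
    """Return row keys sorted by key, mimicking the ordered outer dict."""
    return sorted(table)


print(get_cell(fruit, "orange", "origin"))
print(rows_in_key_order(fruit))
print("variety" in fruit["orange"])  # rows need not share the same columns
```

Because each inner dict carries its own column names, adding a column to one row changes nothing for the others.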
Cassandra has a concept of column families (tables), which originally comes from Bigtable. However, it is really misleading to call them column-oriented, as you mentioned. Within each column family, all columns from a row are stored together, along with a row key, and column compression is not used. Thus, the Bigtable model is still mostly row-oriented.

Resources