Cassandra Composite Columns - How are CompositeTypes chosen? - cassandra

I'm trying to understand the type used when I create composite columns.
I'm using CQL3 (via cqlsh) to create the CF and then the CLI to issue a describe command.
The Types in the Columns sorted by: ...CompositeType(Type1,Type2,...) are not the ones I'm expecting.
I'm using Cassandra 1.1.6.
CREATE TABLE CompKeyTest1 (
KeyA int,
KeyB int,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(Int32,Int32,UTF8)
Shouldn't it be (Int32,Int32,Int32)?
CREATE TABLE CompKeyTest2 (
KeyA int,
KeyB varchar,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(UTF8,Int32,UTF8)
Why isn't it the same as the types used when I define the table? I'm probably missing something basic in the type assignment...
Thanks!

The composite column name is composed of the values of primary keys 2...n and the name of the non-primary key column being saved.
(So if you have 5 non-key fields then you'll have five such columns and their column names will differ only in the last composed value which would be the non-key field name.)
So in both examples the composite column is made up of the values of KeyB, KeyC and the name of the column being stored ("MyData", in both cases). That's why you're seeing those CompositeTypes being returned.
(btw, the first key in the primary key is the partitioning key and its value is only used as the row key (if you're familiar with Cassandra under the covers). It is not used as part of any of the composite column names.)

Related

Cassandra add column after particular column

I need to alter the table to add a new column after a particular column or as last column, I have been through the document but no luck.
Let's say I'm starting with a table that has this definition:
CREATE TABLE mykeyspace.letterstable (
column_n TEXT,
column_b TEXT,
column_c TEXT,
column_z TEXT,
PRIMARY KEY (column_n));
1- Adding a column is a simple matter.
ALTER TABLE mykeyspace.letterstable ADD column_j TEXT;
2- After adding the new column, my table definition will look like this:
desc table mykeyspace.letterstable;
CREATE TABLE mykeyspace.letterstable (
column_n TEXT,
column_b TEXT,
column_c TEXT,
column_j TEXT,
column_z TEXT,
PRIMARY KEY (column_n));
This is because columns in Cassandra are stored by ASCII-betical order, after the keys (so column_n will always be first, because it is the only key). I can't tell Cassandra that I want my new column_j to go after column_z. It's going to put it between column_c and column_z on its own.
Cassandra will store table data based on partition & clustering key.
Standard CQL for adding column:
ALTER TABLE keyspace.table ADD COLUMN column1 columnType;
Running DESC table for a given table via CQLSH does not portray how the data is stored. It will always list the partition key & clustering key first; then the remaining columns in alphabetical order.
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html
Cassandra create table won't keep column order

Create a super column using CQL3

I am upgrading my thrift api to cql3. My data contains SuperColumns as follows:
- User //column family
- Division/name //my row key
-DivHead //SuperColumn
- name //Columns
- address //Columns
I understand all the column families to be changed to tables. And the primary key becomes the rowkey. So rest are the columns.
But my data has supercolumns. how do I create supercolumns using CQL3?
CREATE TABLE user (
rowkey varchar,
division text,
head_name text,
address text,
PRIMARY KEY (rowkey, division)
)
OR
CREATE TABLE user (
rowkey varchar,
division text,
head_name text,
head_address text,
PRIMARY KEY ((rowkey, division))
)
Under the covers the first example will have each rowkey assigned to the same partition. Each rowkey will have a set of logical rows, one for each division. Those rows will contain two columns: head_name and head_address. You can query based on the rowkey and get all divisions (sorted!). Or you can query a rowkey with a range of divisions or a single division and get a subset of the divisions with their division head and address.
The second example will have one partition for each rowkey and division combination. Each such partition will be one logical row as well. The single row for each composite key will have two columns: head_name and head_address. To make a query, you must provide BOTH the rowkey and the division.
EDIT: Cleared up some bad grammar.

Cassandra range slicing on composite key

I have columnfamily with composite key like this
CREATE TABLE sometable(
keya varchar,
keyb varchar,
keyc varchar,
keyd varchar,
value int,
date timestamp,
PRIMARY KEY (keya,keyb,keyc,keyd,date)
);
What I need to do is to
SELECT * FROM sometable
WHERE
keya = 'abc' AND
keyb = 'def' AND
date < '2014-01-01'
And that is giving me this error
Bad Request: PRIMARY KEY part date cannot be restricted (preceding part keyd is either not restricted or by a non-EQ relation)
What's the best way to solve this? Do I need to alter my columnfamily?
I also need to query those table with all keya, keyb, keyc, and date.
You cannot do it in cassandra. Moreover, such a range slicing is costlier too. You are trying to slice through a set of equalities that have the lower priority according to your schema.
I also need to query those table with all keya, keyb, keyc, and date.
If you are considering to solve this problem, considering having this schema. What i would suggest is to have the keys in a separate schema
create table (
timeuuid id,
keyType text,
primary key (timeuuid,keyType))
Use the timeuuid to store the values and do a range scan based on that.
create table(
timeuuid prevTableId,
value int,
date timestamp,
primary key(prevTableId,date))
Guess , in this way, your table is normalized for better scalability in your use case and may save a lot of disk space if keys are repetitive too.

need to search data using part of composite row key in cassandra

I am new to Cassandra and just playing around with it. I have created a Column family having composite key and composite column. Following is the script for same:
create column family TestCompositeKey with key_validation_class='CompositeType(UTF8Type, TimeUUIDType)' and comparator='CompositeType(UTF8Type, UTF8Type, UTF8Type, UTF8Type)' and default_validation_class='UTF8Type';
After inserting data in the column family using Hector following is the view I am getting on CLI:
RowKey: AB:e9a87550-c84b-11e2-8236-180373b60c1a
=> (column=0007:TAR:PUB:BD_2013_01_11_0125094813, value=TESTSEARCH, timestamp=1369823914277000)
Now I want to search for data just by 'AB' given in row key as second part of key will be dynamic. It works fine when I give complete row key. Please tell me how can this be done. I am supplying search criteria on column too along with specifying key.
Thanks
Harish Kumar
You can't do this (efficiently, at least): to lookup by row key you need the whole key. In general, using TimeUUIDs as row keys should be avoided, unless you have some other table acting as an index to retrieve TimeUUIDs for a query.
If you want to lookup just by the first component of the key you should move the second component to the column composite and just have a single component as the row key. The definition would be
create column family TestCompositeKey with key_validation_class='UTF8Type' and comparator='CompositeType(TimeUUIDType, UTF8Type, UTF8Type, UTF8Type, UTF8Type)' and default_validation_class='UTF8Type';
If you used the CQL3 definition:
CREATE TABLE TestCompositeKey (
a varchar,
b timeuuid varchar,
c varchar,
d varchar,
e varchar,
f varchar,
PRIMARY KEY (a, b, c, d, e, f)
);
you would get essentially the same schema as I described. The row key (partition key in CQL language) is a, and the column names are a composite of b:c:d:e:f.

How does a CQL3 composite index with 3 fields map in the thrift column family world?

After reading this blog at planetcassandra, I'm wondering how does a CQL3 composite index with 3 fields map in the thrift column family word, For e.g.:
CREATE TABLE comments (
article_id uuid,
posted_at timestamp,
author text,
karma int,
content text,
PRIMARY KEY (article_id, posted_at)
)
Here the column article_id will be mapped to the internal row key and posted_at will be mapped to (the first part of) the cell name.
What if the table design will be
CREATE TABLE comments (
author_id varchar,
posted_at timestamp,
article_id uuid,
author text,
karma int,
content text,
PRIMARY KEY (author_id, posted_at, article_id)
)
And will the internal row key mapped to 1st 2 fields of the composite index with article_id mapped to cell name, essentially slicing for as many articles upto 2 billion entries and any query on author_id and posted_at combination is one seek on the disk?
Is the behavior same for any number of fields in a composite key?
Your answers much appreciated.
The above observation is incorrect and the correct one is here
I've personally verified:
In the first case:
article_id = partition key, posted_at = cluster key
In the second case:
author_id = partition key, posted_at:article_id = cluster key
First part of composite key (author_id) is called "Partition Key",
rest (posted_at,article_id) are remaining keys.
Cassandra stores columns differently when composite keys are used. Partition key
becomes row key. Remaining keys are concatenated with each column
name (":" as separator) to form column names. Column values remain
unchanged.
Remaining keys (other than partition keys) are ordered,
and it's not allowed to search on any random column, you have to
start with the first one and then you can move to the second one and
so on. This is evident from "Bad Request" error.
There's an excellent explanation by Aaron Morton # his site thelastpickle.
In the first case:
article_id = partition key, posted_at = cluster key
In the second case:
author_id + posted_at = partition key, article_id = cluster key
hence be mindful of the disk seeks as you go by second method and see the row is not getting too wide and gives real benefit compared to the first case.
If you aren't crossing the 2 billion and well within the limits, don't overdo by adopting the 2nd method, as the dispersion of records happens on the combo key.

Resources