Cassandra Composite Column Family

I have a simple requirement. In the SQL world I would create:
CREATE TABLE event_tracking (
key text,
trackingid timeuuid,
entityId bigint,
entityType text,
userid bigint,
PRIMARY KEY (key, trackingid)
)
I need the equivalent cassandra-cli create command, which I have not been able to write. I need to create the column family through the CLI because Pig cannot read column families created through cqlsh (duh).
Here is what I tried, and it didn't work:
create column family event_tracking
WITH comparator='CompositeType(TimeUUIDType)'
AND key_validation_class=UTF8Type
AND default_validation_class = UTF8Type;
1) I don't know why a value column is added to it when I look at it in cqlsh:
CREATE TABLE event_tracking (
key text,
trackingid timeuuid,
value text,
PRIMARY KEY (key, trackingid)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
2) I am using Astyanax to insert the row:
OperationResult<CqlResult<Integer, String>> result = keyspace.prepareQuery(CQL3_CF)
.withCql("INSERT INTO event_tracking (key, column1, value) VALUES ('" + System.currentTimeMillis() + "', " + TimeUUIDUtils.getTimeUUID(System.currentTimeMillis()) + ", '23232323');").execute();
But as soon as I try to add dynamic columns, they are not recognized:
OperationResult<CqlResult<Integer, String>> result = keyspace.prepareQuery(CQL3_CF)
.withCql("INSERT INTO event_tracking (key, column1, value, userId, event) VALUES ('" + System.currentTimeMillis() + "', " + TimeUUIDUtils.getTimeUUID(System.currentTimeMillis()) + ", '23232323', 123455, 'view');").execute();
It looks like I cannot add dynamic columns through CQL3.
3) If I try to add a new column through CQL3:
alter table event_tracking add eventid bigint;
it gives me:
Bad Request: Cannot add new column to a compact CF

0) If you create the table WITH COMPACT STORAGE, Pig should be able to see it, even if you create it from CQL3. But you would need to put entityId and entityType into the primary key too for that to work (compact storage basically means that the first column in the primary key becomes the row key, the following ones become a composite type used as the column key, and then there is only room for one more column, which will be the value).
1) When you create tables the old way there will always be a value: it's the value of the column, and in CQL3 that is represented as a column called value. This is just how CQL3 maps the underlying storage model onto tables.
2) You have created a table whose column names are of the type CompositeType(TimeUUIDType), so you can only add columns whose names are TimeUUIDs. You can't tell C* to save a string as a TimeUUID column key.
3) Looping back to 0, use this table:
CREATE TABLE event_tracking (
key text,
trackingid timeuuid,
entityId bigint,
entityType text,
userid bigint,
PRIMARY KEY (key, trackingid, entityId, entityType)
) WITH COMPACT STORAGE
This one assumes that there can only be one trackingId/entityId/entityType combination for each userid (what's up with your inconsistent capitalization, btw?). If that's not the case, you need to go the full dynamic columns route, but then you can't have different data types for entityId and entityType (though this would have been the case before CQL3 too). See this question for an example of how to do dynamic columns: Inserting arbitrary columns in Cassandra using CQL3
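To make points 0 and 3 more concrete, here is a minimal in-memory Java sketch of how one CQL row of a COMPACT STORAGE table maps onto the old storage model, as described above: the first primary key column becomes the row key, the remaining primary key columns form the composite column name, and the single non-key column becomes the cell value. It uses plain collections only (no Cassandra or Pig API), ignores serialization, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of where one CQL row of a COMPACT STORAGE table lands
// in the legacy (row key -> column name -> value) storage model.
class CompactStorageMapping {

    // One legacy storage cell: row key, composite column name, column value.
    static final class Cell {
        final Object rowKey;
        final List<Object> columnName;
        final Object value;

        Cell(Object rowKey, List<Object> columnName, Object value) {
            this.rowKey = rowKey;
            this.columnName = columnName;
            this.value = value;
        }

        @Override
        public String toString() {
            return rowKey + " -> " + columnName + " = " + value;
        }
    }

    // primaryKey: primary key columns in declaration order.
    // valueColumn: the single non-key column a compact table has room for.
    static Cell toCell(List<String> primaryKey, String valueColumn, Map<String, Object> cqlRow) {
        // The first primary key column becomes the row key.
        Object rowKey = cqlRow.get(primaryKey.get(0));
        // The remaining primary key columns, in order, form the composite column name.
        List<Object> name = new ArrayList<>();
        for (String column : primaryKey.subList(1, primaryKey.size())) {
            name.add(cqlRow.get(column));
        }
        // The one remaining column is stored as the cell value.
        return new Cell(rowKey, List.copyOf(name), cqlRow.get(valueColumn));
    }
}
```

Because each CQL row collapses into exactly one cell, there is nowhere to put a second non-key column, which is consistent with the "Cannot add new column to a compact CF" error in the question.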

Related

Cassandra add column after particular column

I need to alter the table to add a new column after a particular column, or as the last column. I have been through the documentation but had no luck.
Let's say I'm starting with a table that has this definition:
CREATE TABLE mykeyspace.letterstable (
column_n TEXT,
column_b TEXT,
column_c TEXT,
column_z TEXT,
PRIMARY KEY (column_n));
1- Adding a column is a simple matter.
ALTER TABLE mykeyspace.letterstable ADD column_j TEXT;
2- After adding the new column, my table definition will look like this:
desc table mykeyspace.letterstable;
CREATE TABLE mykeyspace.letterstable (
column_n TEXT,
column_b TEXT,
column_c TEXT,
column_j TEXT,
column_z TEXT,
PRIMARY KEY (column_n));
This is because columns in Cassandra are stored in ASCII-betical order, after the keys (so column_n will always be first, because it is the only key). I can't tell Cassandra that I want my new column_j to go after column_z; it's going to put it between column_c and column_z on its own.
Cassandra will store table data based on partition & clustering key.
Standard CQL for adding a column:
ALTER TABLE keyspace.table ADD column1 columnType;
Running DESC table for a given table via CQLSH does not portray how the data is stored. It will always list the partition key & clustering key first; then the remaining columns in alphabetical order.
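A small Java sketch of the listing order described above (primary key columns first, then the remaining columns sorted by name) shows why column_j ends up between column_c and column_z. This only mimics what DESC prints; it is not Cassandra code, the class and method names are hypothetical, and it assumes ASCII column names, for which Java's natural String order matches ASCII order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper mimicking the column order shown by DESC TABLE.
class DescColumnOrder {

    // primaryKey: partition and clustering columns in declaration order.
    // regularColumns: all other columns, in any order (e.g. the order they were added).
    static List<String> describeOrder(List<String> primaryKey, List<String> regularColumns) {
        // Key columns keep their declared position at the front.
        List<String> out = new ArrayList<>(primaryKey);
        // Remaining columns are listed sorted by name, regardless of when they were added.
        out.addAll(new TreeSet<>(regularColumns));
        return out;
    }
}
```

With primary key [column_n] and regular columns added in the order column_b, column_c, column_z, column_j, the result is [column_n, column_b, column_c, column_j, column_z], the same order as the DESC output in the question.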
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html
Cassandra create table won't keep column order

CQL IN set query

I have a table:
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set<text>, PRIMARY KEY (uuidHotel, uuidRoom));
I tried to select with IN:
select * from tabletest where uuidhotel = 'uuidHotel' and bookedtimestampset IN ('1460710800000');
and got:
'bookedtimestampset' (set<text>) cannot be restricted by a 'IN' relation
Can I select elements by IN Set filter?
No, but you can put a secondary index on bookedtimestampset and use the CONTAINS operator:
aploetz#cqlsh:stackoverflow> CREATE INDEX timeset_idx ON tabletest(bookedtimestampset);
aploetz#cqlsh:stackoverflow> SELECT uuidhotel,uuidroom FROM tabletest
WHERE uuidhotel = 'uuidHotel1' and bookedtimestampset CONTAINS '1460710800000';
uuidhotel | uuidroom
------------+----------
uuidHotel1 | uuidroom1
(1 rows)
Normally I wouldn't recommend a secondary index, but as long as you are filtering by a partition key (uuidhotel) it should perform ok.
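Semantically, the CONTAINS query above keeps the rows of one partition whose set includes the given element, which is different from IN (matching a scalar column against a list of candidate values). The hypothetical Java helper below models only those semantics over an in-memory partition; it says nothing about how the secondary index is stored or consulted.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.stream.Collectors;

// Hypothetical model of "WHERE uuidhotel = ? AND bookedtimestampset CONTAINS ?".
class ContainsFilter {

    // partition: one uuidhotel partition, mapping uuidroom (clustering key) -> bookedtimestampset.
    // Returns the uuidroom values, in clustering order, whose set contains the element.
    static List<String> roomsContaining(SortedMap<String, Set<String>> partition, String element) {
        return partition.entrySet().stream()
                .filter(e -> e.getValue().contains(element)) // set membership, not equality
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```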
Can I select elements by IN Set filter?
You can't use an IN clause on bookedtimestampset, because it is not part of your primary key. It is very important to understand how significantly the data model influences query performance. Of course, you can add a secondary index on bookedtimestampset, but in that case be prepared for performance degradation.
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set<text>, PRIMARY KEY (uuidHotel, uuidRoom));
Your compound primary key consists of one partition key, uuidHotel, and one clustering key, uuidRoom, which means that all the rooms of a given hotel are physically stored together on the same node, sorted by uuidRoom; as a result, retrieving those rows is very efficient. bookedTimeStampSet is a regular column whose values are spread across every partition in the cluster, so it is simply impossible to restrict by it without a secondary index on it.
Consequently, I would recommend designing your primary key around your future queries, even if you need to duplicate some data, which is common practice for NoSQL databases such as Cassandra.
E.g.:
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text,
uuidRoom text, uuidGuest text, bookedTimeStamp timestamp, PRIMARY KEY
(uuidHotel, bookedTimeStamp , uuidRoom))
It allows you to make a query like:
select * from tabletest where uuidhotel = 'uuidHotel' and
bookedtimestamp > 1460710800000 and bookedtimestamp < 1460710900000;
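This model supports the range query because, within one partition, rows are kept sorted by the clustering columns, so bookedtimestamp > x and bookedtimestamp < y reads one contiguous slice. A minimal Java sketch of that idea, using a TreeMap as a stand-in for the sort order (an assumption for illustration, not Cassandra's implementation) and assuming one row per booked timestamp; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedSet;

// Hypothetical model of one uuidhotel partition of the query-driven table,
// clustered by (bookedtimestamp, uuidroom).
class BookingSlice {

    // partition: bookedtimestamp -> rooms booked at that time (sorted, like the second clustering column).
    // Returns uuidroom values with lo < bookedtimestamp < hi, in clustering order.
    static List<String> roomsBookedBetween(NavigableMap<Long, ? extends SortedSet<String>> partition,
                                           long lo, long hi) {
        List<String> rooms = new ArrayList<>();
        if (lo >= hi) {
            return rooms; // empty range; subMap would reject lo > hi
        }
        // Exclusive bounds on both ends, matching ">" and "<" in the query.
        for (SortedSet<String> atTime : partition.subMap(lo, false, hi, false).values()) {
            rooms.addAll(atTime);
        }
        return rooms;
    }
}
```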

How is Cassandra sorting static column families

As far as I know, a comparator is specified on the column family level. So far I have used it with dynamic columns (wide rows). Which type of comparator is Cassandra using when you create a static column family using CQL?
CREATE TABLE songs (
id uuid PRIMARY KEY,
title text,
album text,
artist text,
data blob
);
and what happens if you throw a composite key into the mix.
CREATE TABLE songs (
id uuid,
title text,
album text,
artist text,
data blob,
PRIMARY KEY ((id, title), album)
);
http://cassandra.apache.org/doc/cql3/CQL.html#createTablepartitionClustering
http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_compound_keys_c.html
On a given physical node, rows for a given partition key are stored in the order induced by the clustering columns.
So in the 2nd case your partition key is (id, title) and your clustering key is album, meaning all the rows for a given partition key will be stored ordered by album.
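A hypothetical in-memory Java sketch of the second table's layout: rows are grouped by the composite partition key (id, title) and, inside each partition, sorted by album. For simplicity id is a String rather than a uuid, only artist is kept as the row value, and the TreeMap's String order matches Cassandra's text ordering only for ASCII values; which node owns a partition is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of songs with PRIMARY KEY ((id, title), album).
class SongsLayout {

    // (id, title) -> rows of that partition, kept sorted by the clustering column album.
    private final Map<List<String>, TreeMap<String, String>> partitions = new HashMap<>();

    // An insert with an existing (id, title, album) overwrites the row (upsert).
    void insert(String id, String title, String album, String artist) {
        partitions.computeIfAbsent(List.of(id, title), k -> new TreeMap<>()).put(album, artist);
    }

    // Albums stored in one partition, in clustering order.
    List<String> albumsIn(String id, String title) {
        TreeMap<String, String> rows = partitions.get(List.of(id, title));
        if (rows == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rows.keySet());
    }
}
```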

Choosing the right schema for cassandra "table" in CQL3

We are trying to store lots of attributes for a particular profile_id inside a table (using CQL3) and cannot wrap our heads around which approach is the best:
a. create table mytable (profile_id, a1 int, a2 int, a3 int, a4 int ... a3000 int) primary key (profile_id);
OR
b. create MANY tables, eg.
create table mytable_a1(profile_id, value int) primary key (profile_id);
create table mytable_a2(profile_id, value int) primary key (profile_id);
...
create table mytable_a3000(profile_id, value int) primary key (profile_id);
OR
c. create table mytable (profile_id, a_all text) primary key (profile_id);
and just store 3000 "columns" inside a_all, like:
insert into mytable (profile_id, a_all) values (1, "a1:1,a2:5,a3:55, .... a3000:5");
OR
d. none of the above
The type of query we would be running on this table:
select * from mytable where profile_id in (1,2,3,4,5423,44)
We tried the first approach and the queries keep timing out and sometimes even kill cassandra nodes.
The answer would be to use a clustering column. A clustering column allows you to create dynamic columns that you could use to hold the attribute name (col name) and its value (col value).
The table would be:
create table mytable (
profile_id text,
attr_name text,
attr_value int,
PRIMARY KEY(profile_id, attr_name)
)
This allows you to do inserts like:
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a1', 3);
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a2', 1031);
.....
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'an', 2);
This would be the optimal solution, because you then want to run the following:
'The type of query we would be running on this table: select * from mytable where profile_id in (1,2,3,4,5423,44)'
This would require 6 queries under the hood, but Cassandra should be able to do this in no time, especially if you have a multi-node cluster.
Also, if you use the DataStax Java Driver, you can run these requests asynchronously and concurrently on your cluster.
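A rough Java sketch of that fan-out pattern with CompletableFuture: one single-partition query per profile_id is started before waiting on any of them, instead of one IN query. The queryOne function is a hypothetical stand-in for an asynchronous driver call; check the driver's own documentation for its actual async API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper: concurrent per-partition queries instead of one IN query.
class ProfileFanOut {

    // queryOne stands in for an async call such as
    // "SELECT attr_name, attr_value FROM mytable WHERE profile_id = ?".
    // If any query fails, join() throws a CompletionException.
    static <T> Map<String, T> fetchAll(List<String> profileIds,
                                       Function<String, CompletableFuture<T>> queryOne) {
        Map<String, CompletableFuture<T>> pending = new LinkedHashMap<>();
        for (String id : profileIds) {
            pending.put(id, queryOne.apply(id)); // every request is in flight before we block
        }
        CompletableFuture.allOf(pending.values().toArray(new CompletableFuture<?>[0])).join();
        Map<String, T> results = new LinkedHashMap<>();
        pending.forEach((id, f) -> results.put(id, f.join())); // already complete, no extra waiting
        return results;
    }
}
```

Results come back keyed by profile_id in request order, so the caller can merge them exactly as it would have consumed the IN result.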
For more on data modelling and the DataStax Java Driver, check out DataStax's free online training. It's worth a look.
http://www.datastax.com/what-we-offer/products-services/training/virtual-training
Hope it helps.

Cassandra Composite Columns - How are CompositeTypes chosen?

I'm trying to understand the type used when I create composite columns.
I'm using CQL3 (via cqlsh) to create the CF and then the CLI to issue a describe command.
The Types in the Columns sorted by: ...CompositeType(Type1,Type2,...) are not the ones I'm expecting.
I'm using Cassandra 1.1.6.
CREATE TABLE CompKeyTest1 (
KeyA int,
KeyB int,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(Int32,Int32,UTF8)
Shouldn't it be (Int32,Int32,Int32)?
CREATE TABLE CompKeyTest2 (
KeyA int,
KeyB varchar,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(UTF8,Int32,UTF8)
Why isn't it the same as the types used when I define the table? I'm probably missing something basic in the type assignment...
Thanks!
The composite column name is composed of the values of primary key columns 2..n and the name of the non-primary-key column being saved.
(So if you have 5 non-key fields then you'll have five such columns and their column names will differ only in the last composed value which would be the non-key field name.)
So in both examples the composite column is made up of the values of KeyB, KeyC and the name of the column being stored ("MyData", in both cases). That's why you're seeing those CompositeTypes being returned.
(btw, the first key in the primary key is the partitioning key and its value is only used as the row key (if you're familiar with Cassandra under the covers). It is not used as part of any of the composite column names.)
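The rule described in this answer can be sketched in Java as follows: drop the first primary key column (it is the row key), map the remaining primary key columns to their storage types, and append UTF8 for the CQL column name. The type mapping covers only int and varchar, with names abbreviated as in the question's describe output; everything here is a hypothetical illustration, not Cassandra's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of how the composite comparator is derived.
class CompositeComparator {

    // Only the two CQL types used in the question, abbreviated as in its describe output.
    private static final Map<String, String> STORAGE_TYPE = Map.of("int", "Int32", "varchar", "UTF8");

    // pkTypes: CQL types of the primary key columns in declaration order.
    static String compositeType(List<String> pkTypes) {
        List<String> parts = new ArrayList<>();
        // Skip the first column: it is the partition (row) key, not part of the column name.
        for (String cqlType : pkTypes.subList(1, pkTypes.size())) {
            String storageType = STORAGE_TYPE.get(cqlType);
            if (storageType == null) {
                throw new IllegalArgumentException("type not covered by this sketch: " + cqlType);
            }
            parts.add(storageType);
        }
        parts.add("UTF8"); // the name of the non-key column being stored, e.g. "MyData"
        return "CompositeType(" + String.join(",", parts) + ")";
    }

    // Storage column name for one non-key value: clustering values followed by the column name.
    static List<Object> cellName(List<?> clusteringValues, String columnName) {
        List<Object> name = new ArrayList<>(clusteringValues);
        name.add(columnName);
        return name;
    }
}
```

compositeType(List.of("int", "int", "int")) gives CompositeType(Int32,Int32,UTF8) and compositeType(List.of("int", "varchar", "int")) gives CompositeType(UTF8,Int32,UTF8), matching the two outputs in the question.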
