Need to search data using part of a composite row key in Cassandra

I am new to Cassandra and am just playing around with it. I have created a column family with a composite key and composite columns. The following is the script for it:
create column family TestCompositeKey with key_validation_class='CompositeType(UTF8Type, TimeUUIDType)' and comparator='CompositeType(UTF8Type, UTF8Type, UTF8Type, UTF8Type)' and default_validation_class='UTF8Type';
After inserting data into the column family using Hector, this is the view I get in the CLI:
RowKey: AB:e9a87550-c84b-11e2-8236-180373b60c1a
=> (column=0007:TAR:PUB:BD_2013_01_11_0125094813, value=TESTSEARCH, timestamp=1369823914277000)
Now I want to search for data using just the 'AB' part of the row key, because the second part of the key is dynamic. It works fine when I give the complete row key. Please tell me how this can be done. Along with the key, I am also supplying search criteria on the column.
Thanks
Harish Kumar

You can't do this (efficiently, at least): to look up by row key you need the whole key. In general, using TimeUUIDs as row keys should be avoided, unless you have some other table acting as an index to retrieve the TimeUUIDs for a query.
If you want to look up by just the first component of the key, you should move the second component into the column composite and keep a single component as the row key. The definition would be
create column family TestCompositeKey with key_validation_class='UTF8Type' and comparator='CompositeType(TimeUUIDType, UTF8Type, UTF8Type, UTF8Type, UTF8Type)' and default_validation_class='UTF8Type';
If you used the CQL3 definition:
CREATE TABLE TestCompositeKey (
a varchar,
b timeuuid,
c varchar,
d varchar,
e varchar,
f varchar,
PRIMARY KEY (a, b, c, d, e, f)
);
you would get essentially the same schema as I described. The row key (partition key in CQL language) is a, and the column names are a composite of b:c:d:e:f.
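To see why moving the TimeUUID into the column name makes a lookup by 'AB' alone possible, here is a minimal Python sketch of the storage layout. It is a toy in-memory model, not Cassandra or Hector code; the helper names insert and columns_for are hypothetical, and only the first sample row comes from the question.

```python
import uuid
from collections import defaultdict

# Toy model: row key -> {composite column name (tuple) -> value}.
rows = defaultdict(dict)

def insert(row_key, time_uuid, rest, value):
    """Store a value under a composite column name that starts with the TimeUUID."""
    rows[row_key][(time_uuid,) + tuple(rest)] = value

def columns_for(row_key):
    """Return all columns of one row, ordered by the timestamp of the leading TimeUUID."""
    cols = rows.get(row_key, {})
    return sorted(cols.items(), key=lambda kv: (kv[0][0].time, kv[0][1:]))

# First row is taken from the question; the other two are made-up sample data.
t1 = uuid.UUID("e9a87550-c84b-11e2-8236-180373b60c1a")
insert("AB", t1, ("0007", "TAR", "PUB", "BD_2013_01_11_0125094813"), "TESTSEARCH")
insert("AB", uuid.uuid1(), ("0008", "TAR", "PUB", "BD_2013_01_12"), "OTHER")
insert("CD", uuid.uuid1(), ("0001", "TAR", "PUB", "BD_2013_01_13"), "UNRELATED")

# Only the single-component row key is needed to reach every 'AB' entry.
for name, value in columns_for("AB"):
    print(name[1:], value)
```

Because the row key is now just the first component, every entry for 'AB' lives in one row and can be read without knowing any TimeUUID in advance.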

Related

Cassandra add column after particular column

I need to alter a table to add a new column after a particular column or as the last column. I have been through the documentation but had no luck.
Let's say I'm starting with a table that has this definition:
CREATE TABLE mykeyspace.letterstable (
column_n TEXT,
column_b TEXT,
column_c TEXT,
column_z TEXT,
PRIMARY KEY (column_n));
1- Adding a column is a simple matter.
ALTER TABLE mykeyspace.letterstable ADD column_j TEXT;
2- After adding the new column, my table definition will look like this:
desc table mykeyspace.letterstable;
CREATE TABLE mykeyspace.letterstable (
column_n TEXT,
column_b TEXT,
column_c TEXT,
column_j TEXT,
column_z TEXT,
PRIMARY KEY (column_n));
This is because columns in Cassandra are stored by ASCII-betical order, after the keys (so column_n will always be first, because it is the only key). I can't tell Cassandra that I want my new column_j to go after column_z. It's going to put it between column_c and column_z on its own.
Cassandra will store table data based on partition & clustering key.
Standard CQL for adding column:
ALTER TABLE keyspace_name.table_name ADD column_name column_type;
Running DESC table for a given table via CQLSH does not portray how the data is stored. It will always list the partition key & clustering key first; then the remaining columns in alphabetical order.
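The listing order described above can be sketched in a few lines of Python. This only imitates the ordering the answers describe (keys first, then the remaining columns alphabetically); the function name describe_order is hypothetical and is not part of cqlsh.

```python
def describe_order(partition_keys, clustering_keys, all_columns):
    """Return columns in the order described: keys first, the rest alphabetically."""
    keys = list(partition_keys) + list(clustering_keys)
    others = sorted(c for c in all_columns if c not in keys)
    return keys + others

# The table from the question, with column_j added last.
columns = ["column_n", "column_b", "column_c", "column_z"]
columns.append("column_j")

print(describe_order(["column_n"], [], columns))
# prints ['column_n', 'column_b', 'column_c', 'column_j', 'column_z']
```

The position at which a column was added plays no part in the result, which is why column_j cannot be made to appear after column_z.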
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html
Cassandra create table won't keep column order

Create a super column using CQL3

I am upgrading my Thrift API to CQL3. My data contains SuperColumns as follows:
- User //column family
- Division/name //my row key
-DivHead //SuperColumn
- name //Columns
- address //Columns
I understand that all column families become tables, the row key becomes the primary key, and the rest become columns.
But my data has SuperColumns. How do I create SuperColumns using CQL3?
CREATE TABLE user (
rowkey varchar,
division text,
head_name text,
head_address text,
PRIMARY KEY (rowkey, division)
)
OR
CREATE TABLE user (
rowkey varchar,
division text,
head_name text,
head_address text,
PRIMARY KEY ((rowkey, division))
)
Under the covers, the first example puts all rows that share a rowkey into the same partition. Each rowkey will have a set of logical rows, one for each division. Those rows will contain two columns: head_name and head_address. You can query by rowkey and get all divisions (sorted!). Or you can query a rowkey with a range of divisions, or a single division, and get a subset of the divisions with their division head and address.
The second example will have one partition for each rowkey and division combination. Each such partition will be one logical row as well. The single row for each composite key will have two columns: head_name and head_address. To make a query, you must provide BOTH the rowkey and the division.
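The difference between the two primary keys can be illustrated with a small Python sketch. It is a toy grouping model, not how Cassandra is implemented; the function partitions and the sample rows are made up for illustration.

```python
from collections import defaultdict

# (rowkey, division, head_name, head_address): made-up sample data.
rows = [
    ("acme", "sales", "Bob", "2 Side St"),
    ("acme", "engineering", "Ann", "1 Main St"),
    ("globex", "sales", "Cid", "3 High St"),
]

def partitions(rows, partition_key_len):
    """Group rows by their first `partition_key_len` fields (the partition key)."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[:partition_key_len]].append(row[partition_key_len:])
    for clustered in parts.values():
        clustered.sort()  # rows inside a partition are kept sorted
    return dict(parts)

single = partitions(rows, 1)     # models PRIMARY KEY (rowkey, division)
composite = partitions(rows, 2)  # models PRIMARY KEY ((rowkey, division))

print(len(single), len(composite))  # prints 2 3
print(single[("acme",)])            # both acme divisions, sorted by division
```

With the single-column partition key, one lookup on "acme" returns every division in order. With the composite partition key, each (rowkey, division) pair is its own partition, so both values are needed to find it.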

Cassandra Composite Column Family

I have a requirement that would be simple in the SQL world. I want to create:
CREATE TABLE event_tracking (
key text,
trackingid timeuuid,
entityId bigint,
entityType text,
userid bigint,
PRIMARY KEY (key, trackingid)
)
I need a CLI create command for this, which I am not able to write. I need to create the column family through the CLI because Pig cannot read a column family created through cqlsh (duh).
Here is what I tried, which didn't work:
create column family event_tracking
... WITH comparator='CompositeType(TimeUUIDType)'
... AND key_validation_class=UTF8Type
... AND default_validation_class = UTF8Type;
1) I don't know why it adds the value column when I view it in cqlsh:
CREATE TABLE event_tracking (
key text,
trackingid timeuuid,
value text,
PRIMARY KEY (key, trackingid)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
2) I am using Astyanax to insert the row.
OperationResult<CqlResult<Integer, String>> result = keyspace.prepareQuery(CQL3_CF)
.withCql("INSERT INTO event_tracking (key, column1, value) VALUES ("+System.currentTimeMillis()+","+TimeUUIDUtils.getTimeUUID(System.currentTimeMillis())+",'23232323');").execute();
But as soon as I try to add dynamic columns, it does not recognize them:
OperationResult<CqlResult<Integer, String>> result = keyspace.prepareQuery(CQL3_CF)
.withCql("INSERT INTO event_tracking (key, column1, value, userId, event) VALUES ("+System.currentTimeMillis()+","+TimeUUIDUtils.getTimeUUID(System.currentTimeMillis())+",'23232323', 123455, 'view');").execute();
It looks like I cannot add dynamic columns through CQL3.
3) If I try to add a new column through CQL3
alter table event_tracking add eventid bigint;
it gives me
Bad Request: Cannot add new column to a compact CF
0) If you create the table with COMPACT STORAGE Pig should be able to see it, even if you create it from CQL3. But you would need to put entityId and entityType into the primary key too for that to work (compact storage basically means that the first column in the primary key becomes the row key and the following become a composite type used as the column key, and then there is only room for one more column which will be the value).
1) When you create tables the old way there will always be a value, it's the value of the column, and in CQL3 that is represented as a column called value. This is just how CQL3 maps the underlying storage model onto tables.
2) You have created a table whose columns are of the type CompositeType(TimeUUIDType), so you can only add columns that are TimeUUIDs. You can't tell C* to save a string as a TimeUUID column key.
3) Looping back to 0, use this table:
CREATE TABLE event_tracking (
key text,
trackingid timeuuid,
entityId bigint,
entityType text,
userid bigint,
PRIMARY KEY (key, trackingid, entityId, entityType)
) WITH COMPACT STORAGE
This one assumes that there can only be one trackingId/entityId/entityType combination for each userid (what's up with your inconsistent capitalization, btw?). If that's not the case, you need to go the full dynamic columns route, but then you can't have different data types for entityId and entityType (this would have been the case before CQL3 too). See this question for an example of how to do dynamic columns: Inserting arbitrary columns in Cassandra using CQL3

How does a CQL3 composite index with 3 fields map in the thrift column family world?

After reading this blog at planetcassandra, I'm wondering how a CQL3 composite index with 3 fields maps in the Thrift column family world. For example:
CREATE TABLE comments (
article_id uuid,
posted_at timestamp,
author text,
karma int,
content text,
PRIMARY KEY (article_id, posted_at)
)
Here the column article_id will be mapped to the internal row key and posted_at will be mapped to (the first part of) the cell name.
What if the table design will be
CREATE TABLE comments (
author_id varchar,
posted_at timestamp,
article_id uuid,
author text,
karma int,
content text,
PRIMARY KEY (author_id, posted_at, article_id)
)
Will the internal row key be mapped to the first 2 fields of the composite index, with article_id mapped to the cell name, essentially allowing slices over as many articles as fit (up to 2 billion entries), so that any query on an author_id and posted_at combination is one seek on disk?
Is the behavior the same for any number of fields in a composite key?
Your answers are much appreciated.
The above observation is incorrect and the correct one is here
I've personally verified:
In the first case:
article_id = partition key, posted_at = cluster key
In the second case:
author_id = partition key, posted_at:article_id = cluster key
First part of composite key (author_id) is called "Partition Key",
rest (posted_at,article_id) are remaining keys.
Cassandra stores columns differently when composite keys are used. Partition key
becomes row key. Remaining keys are concatenated with each column
name (":" as separator) to form column names. Column values remain
unchanged.
Remaining keys (other than partition keys) are ordered,
and it's not allowed to search on any random column, you have to
start with the first one and then you can move to the second one and
so on. This is evident from "Bad Request" error.
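The mapping quoted above can be made concrete with a short Python sketch. It is a simplified model under stated assumptions: the function to_cells is hypothetical, the sample values are invented, and a short string stands in for the uuid article_id.

```python
def to_cells(row, partition_key, clustering_keys):
    """Map one CQL row (a dict) to (row_key, {cell_name: value}) as described above."""
    row_key = row[partition_key]
    # Remaining key values are joined with ":" and prefixed to each column name.
    prefix = ":".join(str(row[k]) for k in clustering_keys)
    cells = {
        f"{prefix}:{name}": value
        for name, value in row.items()
        if name != partition_key and name not in clustering_keys
    }
    return row_key, cells

# Made-up sample comment for the second table design.
comment = {
    "author_id": "alice",
    "posted_at": "2013-05-29",
    "article_id": "a1",  # stand-in for a uuid
    "author": "Alice",
    "karma": 7,
    "content": "Nice post",
}

row_key, cells = to_cells(comment, "author_id", ["posted_at", "article_id"])
print(row_key)
for name, value in cells.items():
    print(name, "=>", value)
```

Only author_id ends up as the row key; posted_at and article_id become a shared prefix of the three cell names, and the values are stored unchanged.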
There's an excellent explanation by Aaron Morton at his site, thelastpickle.
In the first case:
article_id = partition key, posted_at = cluster key
In the second case, only if the key is declared as PRIMARY KEY ((author_id, posted_at), article_id):
author_id + posted_at = partition key, article_id = cluster key
With that layout, be mindful of disk seeks: check that the row is not getting too wide and that the composite partition key gives a real benefit compared to the first case.
If you are well within the 2 billion limit, don't overdo it by adopting the second method, because records are then dispersed by the combined key.

Cassandra Composite Columns - How are CompositeTypes chosen?

I'm trying to understand the type used when I create composite columns.
I'm using CQL3 (via cqlsh) to create the CF and then the CLI to issue a describe command.
The Types in the Columns sorted by: ...CompositeType(Type1,Type2,...) are not the ones I'm expecting.
I'm using Cassandra 1.1.6.
CREATE TABLE CompKeyTest1 (
KeyA int,
KeyB int,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(Int32,Int32,UTF8)
Shouldn't it be (Int32,Int32,Int32)?
CREATE TABLE CompKeyTest2 (
KeyA int,
KeyB varchar,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(UTF8,Int32,UTF8)
Why isn't it the same as the types used when I define the table? I'm probably missing something basic in the type assignment...
Thanks!
The composite column name is composed of the values of primary keys 2...n and the name of the non-primary key column being saved.
(So if you have 5 non-key fields then you'll have five such columns and their column names will differ only in the last composed value which would be the non-key field name.)
So in both examples the composite column is made up of the values of KeyB, KeyC and the name of the column being stored ("MyData", in both cases). That's why you're seeing those CompositeTypes being returned.
(btw, the first key in the primary key is the partitioning key and its value is only used as the row key (if you're familiar with Cassandra under the covers). It is not used as part of any of the composite column names.)
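The rule in this answer can be checked against both tables with a small Python sketch. It only reproduces the rule as stated here (types of primary keys 2...n, then UTF8 for the column name); the function comparator and the type table CQL_TO_MARSHAL are hypothetical names, not Cassandra APIs.

```python
# Assumed mapping from the CQL types used in the question to the names the CLI printed.
CQL_TO_MARSHAL = {"int": "Int32", "varchar": "UTF8", "text": "UTF8"}

def comparator(primary_key, column_types):
    """Build the CompositeType from primary keys 2...n plus the column name."""
    clustering = primary_key[1:]  # the first key is the row key and is left out
    parts = [CQL_TO_MARSHAL[column_types[k]] for k in clustering]
    parts.append("UTF8")  # the name of the non-key column, e.g. "MyData"
    return "CompositeType(" + ",".join(parts) + ")"

keys = ["KeyA", "KeyB", "KeyC"]
test1 = {"KeyA": "int", "KeyB": "int", "KeyC": "int", "MyData": "varchar"}
test2 = {"KeyA": "int", "KeyB": "varchar", "KeyC": "int", "MyData": "varchar"}

print(comparator(keys, test1))  # prints CompositeType(Int32,Int32,UTF8)
print(comparator(keys, test2))  # prints CompositeType(UTF8,Int32,UTF8)
```

Both results match what the CLI reported in the question: the type of KeyA never appears, and the trailing UTF8 is the column name rather than the type of KeyC.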
