Will a composite partition key & compound key affect performance in Cassandra?

Given below is the CQL for 3 tables.
All three have the same column structure but differ in how the PRIMARY KEY is defined.
tab1: no composite partition key (key1 alone is the partition key)
CREATE TABLE tab1
(
key1 text,
key2 text,
key3 text,
key4 int,
data1 text,
data2 text,
data3 text,
PRIMARY KEY(key1,key2,key3,key4));
tab2: (key1,key2) forms a composite partition key
CREATE TABLE tab2
(
key1 text,
key2 text,
key3 text,
key4 int,
data1 text,
data2 text,
data3 text,
PRIMARY KEY((key1,key2),key3,key4));
tab3: (key1,key2,key3) forms a composite partition key
CREATE TABLE tab3
(
key1 text,
key2 text,
key3 text,
key4 int,
data1 text,
data2 text,
data3 text,
PRIMARY KEY((key1,key2,key3),key4));
When querying, value1, value2 and value3 are known and key4 is specified as a range.
Sample CQL query:
select data1,data2,data3 from tab3 where key1='value1' and key2='value2' and key3='value3' and key4 > 1000 and key4 < 1000000 ;
There may be some 50,000 key4 records.
Which table design is better?
Which design has better read/write performance?

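If you need to support range queries over key4, then it needs to be a clustering column. All three tables satisfy that (in tab1, key4 is the last clustering column), so the real difference is how the data is partitioned. Since you're always specifying an exact value for key3, there's no need to make it a clustering column, so tab3 is a better choice than tab2. tab1 is the weakest option: its partition key is key1 alone, so every row sharing a key1 value ends up in a single partition. Leaving key3 in the partition key will partition your data more evenly around the cluster and keep individual partitions smaller.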
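To make the partitioning argument concrete, here is a small in-memory Python model (not Cassandra code) that groups the same rows by each table's partition key and reports how many partitions result and how large the biggest one is. The sample data, its sizes, and the helper name partition_sizes are hypothetical; the model ignores token hashing and replication and only shows how the partition key width changes partition count and size.

```python
from collections import Counter
from itertools import product

# Hypothetical sample data (not from the question): 2 key1 values x 3 key2 values
# x 4 key3 values, each combination holding 100 key4 values.
rows = [
    (k1, k2, k3, k4)
    for k1, k2, k3 in product(["a", "b"], ["x", "y", "z"], ["p", "q", "r", "s"])
    for k4 in range(100)
]

# How many leading key columns form the partition key in each design.
designs = {"tab1": 1, "tab2": 2, "tab3": 3}


def partition_sizes(rows, n_partition_cols):
    """Count the rows that land in each partition for a given partition-key width."""
    return Counter(row[:n_partition_cols] for row in rows)


for name, width in designs.items():
    sizes = partition_sizes(rows, width)
    print(f"{name}: {len(sizes)} partitions, largest holds {max(sizes.values())} rows")
```

With this data, tab1 yields 2 partitions of 1200 rows, tab2 yields 6 of 400, and tab3 yields 24 of 100. The sample query reads a key4 slice of a single partition in every design; what changes is how big that partition is and how evenly the partitions can spread over the nodes.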

Related

c* schema - denormalization vs materialized view

Schema is:
col0 int,
col1 text,
col2 text,
stamp timestamp,
somemap map<text, int>
I want to query somemap:
- using col0, col1 and a range of stamp
- using col0, col1, col2 and a range of stamp
I need every value of somemap for each distinct combination of col0, col1, col2 and stamp to be present in either query (i.e. for the first query I want all the values of col2 to be there).
I've tried various combinations of columns for primary key but I can't find one that permits both types of queries.
I can denormalize this and create both types of tables:
- primary key ((col0, col1), stamp, col2)
- primary key ((col0, col1), col2, stamp)
What I'm hoping for is a way to use a materialized view to accomplish this.
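To see why each of the two denormalized tables serves exactly one of the queries, here is a minimal Python sketch (an in-memory model, not Cassandra or a materialized view) of a single (col0, col1) partition. Rows inside a partition are kept sorted by the clustering columns, so a query is efficient when its restrictions select one contiguous slice of that order. The sample rows, integer stamps, and the function names query_a and query_b are hypothetical, and only the key columns are modeled (somemap is left out).

```python
from bisect import bisect_left

# Hypothetical key columns of ONE (col0, col1) partition: (col2, stamp).
# Integer stamps stand in for timestamps to keep the sketch short.
rows = [("x", 1), ("y", 2), ("x", 3), ("y", 4), ("x", 5)]

# Table A: PRIMARY KEY ((col0, col1), stamp, col2) -> rows ordered by (stamp, col2)
keys_a = sorted((stamp, col2) for col2, stamp in rows)
# Table B: PRIMARY KEY ((col0, col1), col2, stamp) -> rows ordered by (col2, stamp)
keys_b = sorted((col2, stamp) for col2, stamp in rows)


def query_a(lo, hi):
    """Query 1: every col2 value with lo <= stamp < hi, one contiguous slice of table A."""
    # (lo,) sorts before every (lo, col2) tuple, so it marks the start of the range.
    return keys_a[bisect_left(keys_a, (lo,)):bisect_left(keys_a, (hi,))]


def query_b(col2, lo, hi):
    """Query 2: one col2 value with lo <= stamp < hi, one contiguous slice of table B."""
    return keys_b[bisect_left(keys_b, (col2, lo)):bisect_left(keys_b, (col2, hi))]


print(query_a(2, 5))       # [(2, 'y'), (3, 'x'), (4, 'y')]
print(query_b("x", 2, 6))  # [('x', 3), ('x', 5)]
```

Swapping the tables fails in the model for the same reason it fails in CQL: in table B a stamp range across all col2 values is scattered over several slices, and in table A a fixed col2 within a stamp range is interleaved with other col2 values.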

CQL IN set query

I have a table:
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set<text>, PRIMARY KEY (uuidHotel, uuidRoom));
I tried to select with IN:
select * from tabletest where uuidhotel = 'uuidHotel' and bookedtimestampset IN ('1460710800000');
and got:
'bookedtimestampset' (set<text>) cannot be restricted by a 'IN' relation
Can I select elements by IN Set filter?
No, but you can put a secondary index on bookedtimestampset and use the CONTAINS operator:
aploetz#cqlsh:stackoverflow> CREATE INDEX timeset_idx ON tabletest(bookedtimestampset);
aploetz#cqlsh:stackoverflow> SELECT uuidhotel,uuidroom FROM tabletest
WHERE uuidhotel = 'uuidHotel1' and bookedtimestampset CONTAINS '1460710800000';
uuidhotel | uuidroom
------------+----------
uuidHotel1 | uuidroom1
(1 rows)
Normally I wouldn't recommend a secondary index, but as long as you are filtering by a partition key (uuidhotel) it should perform ok.
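The semantics of CONTAINS, and why restricting the partition key keeps it cheap, can be sketched with a tiny in-memory Python model. This does not reproduce how Cassandra's secondary index works internally; it only shows that CONTAINS is a set-membership test and that, with uuidhotel fixed, only that one partition's rows need to be examined. The sample data and the function name select_contains are hypothetical.

```python
# Hypothetical data, grouped by partition key (uuidhotel) the way Cassandra stores it.
table = {
    "uuidHotel1": [
        {"uuidroom": "uuidroom1", "bookedtimestampset": {"1460710800000", "1460797200000"}},
        {"uuidroom": "uuidroom2", "bookedtimestampset": {"1460883600000"}},
    ],
    "uuidHotel2": [
        {"uuidroom": "uuidroom1", "bookedtimestampset": {"1460710800000"}},
    ],
}


def select_contains(table, uuidhotel, value):
    """Mimic: SELECT uuidroom ... WHERE uuidhotel = ? AND bookedtimestampset CONTAINS ?"""
    # Only the requested partition is examined; other hotels are never touched.
    return [
        row["uuidroom"]
        for row in table.get(uuidhotel, [])
        if value in row["bookedtimestampset"]
    ]


print(select_contains(table, "uuidHotel1", "1460710800000"))  # ['uuidroom1']
```

Without the uuidhotel restriction, the same lookup would have to consult every partition in the cluster, which is where secondary indexes usually get expensive.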
You can't use an IN clause on bookedtimestampset, because it is not part of your primary key. It is very important to understand how significantly the data model influences query performance. You can add a secondary index on bookedtimestampset, but in that case be prepared for some performance degradation.
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set<text>, PRIMARY KEY (uuidHotel, uuidRoom));
Your compound primary key consists of one partition key (uuidHotel) and one clustering column (uuidRoom). This means that all the rooms of a given hotel are physically stored together in one partition, sorted by uuidRoom, so retrieving them is very efficient. bookedTimeStampSet is a regular column: Cassandra has no way to look up rows by its value, so it is impossible to restrict on it without a secondary index.
Consequently, I would recommend designing your primary key around your future queries, even if that means duplicating some data, which is common practice for NoSQL databases such as Cassandra.
For example:
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text,
uuidRoom text, uuidGuest text, bookedTimeStamp timestamp, PRIMARY KEY
(uuidHotel, bookedTimeStamp, uuidRoom));
It allows you to make a query like:
select * from tabletest where uuidhotel = 'uuidHotel' and
bookedtimestamp > 1460710800000 and bookedtimestamp < 1460710900000;
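The suggested schema replaces the set with one row per booking, so the booking time becomes a clustering column and a time range turns into one contiguous slice of the hotel's partition. The Python sketch below models that transformation in memory; it is not Cassandra code, and the sample data and the function name booked_between are hypothetical (timestamps are epoch milliseconds).

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows in the original shape: (uuidhotel, uuidroom, bookedtimestampset).
original = [
    ("uuidHotel1", "uuidroom1", {1460710800000, 1460797200000}),
    ("uuidHotel1", "uuidroom2", {1460710850000}),
    ("uuidHotel2", "uuidroom1", {1460710800000}),
]

# Denormalize into the suggested shape: one row per booking, grouped by the
# partition key (uuidhotel) and sorted by the clustering columns
# (bookedtimestamp, uuidroom), mirroring the order of rows inside a partition.
partitions = {}
for hotel, room, stamps in original:
    for stamp in stamps:
        partitions.setdefault(hotel, []).append((stamp, room))
for rows in partitions.values():
    rows.sort()


def booked_between(hotel, lo, hi):
    """Mimic: WHERE uuidhotel = hotel AND bookedtimestamp > lo AND bookedtimestamp < hi."""
    rows = partitions.get(hotel, [])
    stamps = [stamp for stamp, _ in rows]
    # Strict bounds on both sides; the result is one contiguous slice.
    return rows[bisect_right(stamps, lo):bisect_left(stamps, hi)]


print(booked_between("uuidHotel1", 1460710800000, 1460710900000))
# [(1460710850000, 'uuidroom2')]
```

The cost of this design is the duplication the answer mentions: a room booked at several times now occupies several rows.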

Columns ordering in Cassandra

When I create a table in CQL, does the order of the columns that are NOT in the primary key (and are NOT clustering columns) have to be exact?
CREATE TABLE user (
a ascii,
b ascii,
c ascii,
PRIMARY KEY (a)
);
Is it equivalent to ?
CREATE TABLE user (
a ascii,
c ascii, <-- switched
b ascii, <-- switched
PRIMARY KEY (a)
);
Thank you for your help
As written, the second statement will fail, because the "<-- switched" annotations are not valid CQL (only the "--" part starts a comment, so the "<" is left behind as a syntax error).
Assuming you fix that, the answer is "yes, they are the same."
Cassandra applies its own order to your columns at table creation time. Consider this table as I have typed it:
CREATE TABLE testorder (
acolumn text,
jcolumn text,
dcolumn text,
bcolumn text,
apkey text,
bpkey text,
ackey text,
bckey text,
PRIMARY KEY ((bpkey,apkey),bckey,ackey));
After creating it, I'll describe the table so you can see the order that Cassandra has applied to the columns.
aploetz#cqlsh:stackoverflow> desc table testorder ;
CREATE TABLE stackoverflow.testorder (
bpkey text,
apkey text,
bckey text,
ackey text,
acolumn text,
bcolumn text,
dcolumn text,
jcolumn text,
PRIMARY KEY ((bpkey, apkey), bckey, ackey)
) WITH CLUSTERING ORDER BY (bckey ASC, ackey ASC)
Essentially, Cassandra orders the partition key columns and then the clustering columns (each in their order of precedence in the PRIMARY KEY definition), and then the remaining regular columns follow in ascending alphabetical order.
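The ordering rule just described can be written down as a short Python function. This is an approximation of what desc table prints, based only on the behavior shown above; the function name describe_order is hypothetical, and it ignores details such as static columns and quoted mixed-case identifiers.

```python
def describe_order(columns, partition_keys, clustering_keys):
    """Approximate the column order shown by DESCRIBE TABLE:
    partition key columns, then clustering columns (both in PRIMARY KEY order),
    then the remaining regular columns sorted by name."""
    key_cols = list(partition_keys) + list(clustering_keys)
    regular = sorted(c for c in columns if c not in key_cols)
    return key_cols + regular


# The testorder table, with columns listed in the order they were typed.
typed = ["acolumn", "jcolumn", "dcolumn", "bcolumn", "apkey", "bpkey", "ackey", "bckey"]
print(describe_order(typed, ["bpkey", "apkey"], ["bckey", "ackey"]))
# ['bpkey', 'apkey', 'bckey', 'ackey', 'acolumn', 'bcolumn', 'dcolumn', 'jcolumn']

# The two user tables from the question end up with the same order.
print(describe_order(["a", "b", "c"], ["a"], []) == describe_order(["a", "c", "b"], ["a"], []))
# True
```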

Cassandra wide row with every column as composite key

I have a wide-row table with the columns
page_id int, user_id int, session_tid timeuuid and end_time timestamp,
with partition key = user_id.
I need to run multiple queries on the table, some based on one column and some on another, and it turns out that I have cases with a WHERE clause on every column.
As Cassandra doesn't allow me to use a WHERE clause on a non-indexed, non-key column, is it OK if I make all of the columns part of my composite key? (Currently all but the end_time column are already in the composite key, with user_id as the partition key.)
Making all columns as part of the primary key will not allow you to perform where conditions to each column in the way you're thinking.
To give a simple example, if you create a primary key such as
PK(key1, key2, key3, key4)
you won't be able to perform a query like
select * from mytable where key2 = 'xyz';
This is because, to build a multi-column WHERE condition, you have to restrict the key columns in the order they appear in the primary key.
So the following multi-column WHERE queries are valid:
select * from mytable where key1 = 'xyz' and key2 = 'abc';
select * from mytable where key1 = 'xyz' and key2 = 'abc' and key3 = 11;
select * from mytable where key1 = 'xyz' and key2 = 'abc' and key3 = 11 and key4 = 2014;
You can restrict keyN only if you also restrict keyN-1 (and therefore every key column before it).
HTH,
Carlo
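The ordering rule from the answer above can be expressed as a small Python check. It is a simplified model covering only equality restrictions on a primary key with a single-column partition key; it ignores secondary indexes, ALLOW FILTERING and range restrictions, and the function name is_valid_restriction is hypothetical.

```python
def is_valid_restriction(primary_key, restricted):
    """Simplified model of the rule: equality restrictions on key columns are
    accepted only if they form a prefix of the primary key, i.e. keyN is
    restricted only when key1 .. keyN-1 are restricted as well."""
    restricted = set(restricted)
    prefix_len = 0
    while prefix_len < len(primary_key) and primary_key[prefix_len] in restricted:
        prefix_len += 1
    # The partition key (first column) must be restricted, and nothing may be
    # restricted beyond the unbroken prefix (this also rejects non-key columns).
    return prefix_len > 0 and restricted == set(primary_key[:prefix_len])


pk = ["key1", "key2", "key3", "key4"]
print(is_valid_restriction(pk, ["key2"]))                  # False
print(is_valid_restriction(pk, ["key1", "key2"]))          # True
print(is_valid_restriction(pk, ["key1", "key3"]))          # False (key2 skipped)
```

This is why putting every column into the primary key does not help the original question: each query still has to follow one fixed column order, so different access patterns usually call for different tables.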

PRIMARY KEY part colname cannot be restricted by IN relation

My CQL3 table looks like this:
CREATE TABLE stringindice (
id text,
colname text,
colvalue blob,
PRIMARY KEY (id, colname, colvalue)
) WITH COMPACT STORAGE
I have inserted some values into it. Now, when I try to do something like this:
QueryBuilder.select().all().from(keySpace, indTable).where().and(QueryBuilder.eq("id", "rowKey")).and(QueryBuilder.in("colname", "string1", "string2"));
which is essentially:
select * from stringindice where id = 'rowkey' and colname IN ('string1', 'string2');
I am getting the following exception:
com.datastax.driver.core.exceptions.InvalidQueryException: PRIMARY KEY part colname cannot be restricted by IN relation
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214)
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169)
at com.datastax.driver.core.Session.execute(Session.java:110)
In the documentation of CQL3, it is written that
"Moreover, the IN relation is only allowed on the last column of the
partition key and on the last column of the full primary key."
So it seems that this is not supported. If so, how can I do something like IN when I need to match many values at once?
This is because you are using compact storage, so the composite column name is colname:colvalue (and the cell value is empty). This means colname is not the last column of the full primary key.
If you don't use compact storage (avoiding it is recommended for all new data models), you have the equivalent schema:
CREATE TABLE stringindice (
id text,
colname text,
colvalue blob,
PRIMARY KEY (id, colname)
);
Then your IN query will work:
cqlsh:ks> insert into stringindice (id, colname, colvalue) VALUES ('rowkey', 'string1', 0x01);
cqlsh:ks> insert into stringindice (id, colname, colvalue) VALUES ('rowkey', 'string2', 0x02);
cqlsh:ks> insert into stringindice (id, colname, colvalue) VALUES ('rowkey', 'string3', 0x03);
cqlsh:ks> select * from stringindice where id = 'rowkey' and colname IN ('string1', 'string2');
id | colname | colvalue
--------+---------+----------
rowkey | string1 | 0x01
rowkey | string2 | 0x02
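The documentation rule quoted in the question explains both outcomes. Here it is as a small Python check, a model of that quoted rule only (later Cassandra releases may be more permissive); the function name in_allowed is hypothetical.

```python
def in_allowed(partition_key, clustering_columns, column):
    """Model of the quoted CQL3 rule: IN is only allowed on the last column of
    the partition key and on the last column of the full primary key."""
    full_primary_key = list(partition_key) + list(clustering_columns)
    return column in (partition_key[-1], full_primary_key[-1])


# COMPACT STORAGE table: PRIMARY KEY (id, colname, colvalue)
print(in_allowed(["id"], ["colname", "colvalue"], "colname"))  # False
# Suggested table: PRIMARY KEY (id, colname)
print(in_allowed(["id"], ["colname"], "colname"))  # True
```

Dropping colvalue from the primary key is what makes colname the last column of the full key, which is exactly the change the answer makes.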
