Columns ordering in Cassandra - cassandra

When I create a table in CQL, is it necessary to be exact for the order of column that are NOT in the primary_key and NOT clustering columns :
CREATE TABLE user (
a ascii,
b ascii,
c ascii,
PRIMARY KEY (a)
);
Is it equivalent to ?
CREATE TABLE user (
a ascii,
c ascii, <-- switched
b ascii, <-- switched
PRIMARY KEY (a)
);
Thank you for your help

Both of those statements will fail, because of:
The extra comma.
You have not provided a primary key definition.
Assuming you had those fixed, then the answer is still "yes they are the same."
Cassandra applies its own order to your columns at table creation time. Consider this table as I have typed it:
CREATE TABLE testorder (
acolumn text,
jcolumn text,
dcolumn text,
bcolumn text,
apkey text,
bpkey text,
ackey text,
bckey text,
PRIMARY KEY ((bpkey,apkey),bckey,ackey));
After creating it, I'll describe the table so you can see the order that Cassandra has applied to the columns.
aploetz#cqlsh:stackoverflow> desc table testorder ;
CREATE TABLE stackoverflow.testorder (
bpkey text,
apkey text,
bckey text,
ackey text,
acolumn text,
bcolumn text,
dcolumn text,
jcolumn text,
PRIMARY KEY ((bpkey, apkey), bckey, ackey)
) WITH CLUSTERING ORDER BY (bckey ASC, ackey ASC)
Essentially, Cassandra will order the partition keys and the clustering keys (ordered by their precedence in the PRIMARY KEY definition), and then the columns follow in ascending order.

Related

Not able to run multiple where clause without Cassandra allow filtering

Hi I am new to Cassandra.
We are working on IOT project where car sensor data will be stored in cassandra.
Here is the example of one table where I am going to store one of the sensor data.
This is some sample data.
The way I want to partition the data is based on the organization_id so that different organization data is partitioned.
Here is the create table command:
CREATE TABLE IF NOT EXISTS engine_speed (
id UUID,
engine_speed_rpm text,
position int,
vin_number text,
last_updated timestamp,
organization_id int,
odometer int,
PRIMARY KEY ((id, organization_id), vin_number)
);
This works fine. However all my queries will be as bellow:
select * from engine_speed
where vin_number='xyz'
and organization_id = 1
and last_updated >='from time stamp' and last_updated <='to timestamp'
Almost all queries in all the table will have similar / same where clause.
I am getting error and it is asking to add "Allow filtering".
Kindly let me know how do I partition the table and define right primary key and indexs so that I don't have to add "allow filtering" in the query.
Apologies for this basic question but I'm just starting using cassandra.(using apache cassandra:3.11.12 )
The order of where clause should match with the order of partition and clustering keys you have defined in your DDL and you cannot skip any part of primary key while applying the WHERE clause before using the next key. So as per the query pattern u have defined, you can try the below DDL:
CREATE TABLE IF NOT EXISTS autonostix360.engine_speed (
vin_number text,
organization_id int,
last_updated timestamp,
id UUID,
engine_speed_rpm text,
position int,
odometer int,
PRIMARY KEY ((vin_number, organization_id), last_updated)
);
But remember,
PRIMARY KEY ((vin_number, organization_id), last_updated)
PRIMARY KEY ((vin_number), organization_id, last_updated)
above two are different in Cassandra, In case 1 your data will be partitioned by combination of vin_number and organization_id while last_updated will act as ordering key. In case 2, your data will be partitioned only by vin_number while organization_id and last_updated will act as ordering key. So you need to figure out which case suits your use case.

How to find last and first entry in cassandra (date is part of partition key)

Is it possible to find first and last entry in Cassandra database if my partition key contains text date as a part of partition key to avoid large partitions?
CREATE TABLE trades (
stockexchange text,
symbol text,
ts timestamp,
date text,
tid text,
price decimal,
side text,
size decimal,
PRIMARY KEY ((stockexchange, symbol, date), ts, tid)
) WITH CLUSTERING ORDER BY (ts ASC, tid ASC)
The one solution is - to create the second table and store separately.
only: stockexchange, symbol, timestamp
This gives you ability to find the first and last timestamp by your key (stockexchange:symbol)
Please pay attention, that you have to store the data in the same moment and Cassandra is not ACID database type.
CREATE TABLE trades (
stockexchange text,
symbol text,
ts timestamp,
date text,
tid text,
price decimal,
side text,
size decimal,
PRIMARY KEY ((stockexchange, symbol, date), ts, tid)
) WITH CLUSTERING ORDER BY (ts ASC, tid ASC)
CREATE TABLE trades_timestampts (
stockexchange text,
symbol text,
tid text,
ts timestamp,
PRIMARY KEY ((stockexchange, symbol), ts, tid)) WITH CLUSTERING ORDER BY (ts asc, tid asc);

CQL IN set query

Have a table
REATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set<text>, PRIMARY KEY (uuidHotel, uuidRoom));
Tried to select with IN:
select * from tabletest where uuidhotel = 'uuidHotel' and bookedtimestampset IN ('1460710800000');
Got
'bookedtimestampset' (set<text>) cannot be restricted by a 'IN' relation"
Can I select elements by IN Set filter?
Can I select elements by IN Set filter?
No, but you can put a secondary index on bookedtimestampset and use the CONTAINS operator:
aploetz#cqlsh:stackoverflow> CREATE INDEX timeset_idx ON tabletest(bookedtimestampset);
aploetz#cqlsh:stackoverflow> SELECT uuidhotel,uuidroom FROM tabletest
WHERE uuidhotel = 'uuidHotel1' and bookedtimestampset CONTAINS '1460710800000';
uuidhotel | uuidroom
------------+----------
uuidHotel1 | uuidroom1
(1 rows)
Normally I wouldn't recommend a secondary index, but as long as you are filtering by a partition key (uuidhotel) it should perform ok.
Can I select elements by IN Set filter?
you can't use clause IN with your primary key. It is highly important to understand how significantly data model influences on query performance. Of course, you can add secondary index for column bookedtimestampset but in this case be ready to for performance degradation.
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set, PRIMARY KEY (uuidHotel, uuidRoom));
your compound primary key consists of one partition key uuidHotel and one clustering key uuidRoom which means that all your hotels and rooms would physically stored on same node in order as result retrieval of rows is very efficient. bookedTimeStampSet is different column which would be spread through whole cluster and it is just impossible to restrict by this column without secondary indexing one.
Consequently. I would recommend you to create primary key according to your future queries even if you need to duplicate some data which is common practice for NoSql database such Cassandra is.
e.q.
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text,
uuidRoom text, uuidGuest text, bookedTimeStamp timestamp, PRIMARY KEY
(uuidHotel, bookedTimeStamp , uuidRoom))
it allows you to make a query like
select * from tabletest where uuidhotel = 'uuidHotel' and
bookedtimestamp > '1460710800000 and bookedtimestamp < '1460710900000'

Cassandra non counter family

I attempted to create a table with counter as one of the column type in cassandra but getting the following error:
ConfigurationException: ErrorMessage code=2300 [Query invalid because
of configuration issue] message="Cannot add a counter column
(transaction_count) in a non counter column family"
My table schema is as follows:
CREATE TABLE MARKET_DATA_TRANSACTION_COUNT (
TRADE_DATE TIMESTAMP,
SECURITY_EXCHANGE TEXT,
PRODUCT_CODE TEXT,
SYMBOL TEXT,
SPREAD_TYPE TEXT,
USER_DEFINED TEXT,
PRODUCT_GUID TEXT,
CHANNEL_ID INT,
SECURITY_TYPE TEXT,
INSTRUMENT_GUID TEXT,
SECURITY_ID INT,
TRANSACTION_COUNT COUNTER,
PRIMARY KEY (TRADE_DATE));
That's a limitation of the current counter implementation. You can't mix counters and regular columns in the same table. So you need a separate table for counters.
They are thinking of removing this limitation in Cassandra 3.x. See this Jira ticket.
This is not exactly the answer to the question, might help some people with the similar error.
If you can make other columns as PRIMARY KEY then its possible.
Eg: CREATE TABLE rate_data (ts varchar, type varchar, rate counter, PRIMARY KEY (ts, type));

Cassandra range slicing on composite key

I have columnfamily with composite key like this
CREATE TABLE sometable(
keya varchar,
keyb varchar,
keyc varchar,
keyd varchar,
value int,
date timestamp,
PRIMARY KEY (keya,keyb,keyc,keyd,date)
);
What I need to do is to
SELECT * FROM sometable
WHERE
keya = 'abc' AND
keyb = 'def' AND
date < '2014-01-01'
And that is giving me this error
Bad Request: PRIMARY KEY part date cannot be restricted (preceding part keyd is either not restricted or by a non-EQ relation)
What's the best way to solve this? Do I need to alter my columnfamily?
I also need to query those table with all keya, keyb, keyc, and date.
You cannot do it in cassandra. Moreover, such a range slicing is costlier too. You are trying to slice through a set of equalities that have the lower priority according to your schema.
I also need to query those table with all keya, keyb, keyc, and date.
If you are considering to solve this problem, considering having this schema. What i would suggest is to have the keys in a separate schema
create table (
timeuuid id,
keyType text,
primary key (timeuuid,keyType))
Use the timeuuid to store the values and do a range scan based on that.
create table(
timeuuid prevTableId,
value int,
date timestamp,
primary key(prevTableId,date))
Guess , in this way, your table is normalized for better scalability in your use case and may save a lot of disk space if keys are repetitive too.

Resources