Is there a performance hit with simplifying a partition key? - cassandra

Let's say I have the following unsimplified column family:
CREATE TABLE emp (
empID int,
deptID int,
first_name varchar,
last_name varchar,
PRIMARY KEY ((empID, deptID)));
The partition key is both empID and deptID.
Under the assumption I will only search this table using both of these fields, can I simplify the table and rewrite it as follows?
CREATE TABLE emp2 (
empID_deptID text,
first_name varchar,
last_name varchar,
PRIMARY KEY (empID_deptID));

Yes you can, but I don't see any added value in doing it. In your first code example, Cassandra concatenates empID and deptID for you.
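To make the comparison concrete, here is a rough sketch of how a lookup might look against each table. The literal values and the ':' separator are assumptions for illustration only. With emp2 the application has to build (and later parse) the key string itself.

```cql
-- Composite partition key: supply both columns with equality.
SELECT first_name, last_name FROM emp WHERE empID = 42 AND deptID = 7;

-- Hand-built key: the application concatenates the two IDs itself.
-- The '42:7' format is an assumption, not something Cassandra defines.
SELECT first_name, last_name FROM emp2 WHERE empID_deptID = '42:7';
```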

In the exact example you provided, there will be no difference. In fact, that is how it was done in earlier versions, before composite partition keys were supported.

Related

Not able to run multiple where clause without Cassandra allow filtering

Hi, I am new to Cassandra.
We are working on an IoT project where car sensor data will be stored in Cassandra.
Here is the example of one table where I am going to store one of the sensor data.
This is some sample data.
I want to partition the data by organization_id so that each organization's data is kept in separate partitions.
Here is the create table command:
CREATE TABLE IF NOT EXISTS engine_speed (
id UUID,
engine_speed_rpm text,
position int,
vin_number text,
last_updated timestamp,
organization_id int,
odometer int,
PRIMARY KEY ((id, organization_id), vin_number)
);
This works fine. However, all my queries will look like the one below:
select * from engine_speed
where vin_number='xyz'
and organization_id = 1
and last_updated >='from time stamp' and last_updated <='to timestamp'
Almost all queries on all the tables will have a similar or identical WHERE clause.
I am getting an error, and it asks me to add "ALLOW FILTERING".
Kindly let me know how to partition the table and define the right primary key and indexes so that I don't have to add "ALLOW FILTERING" to the query.
Apologies for this basic question, but I'm just starting to use Cassandra (Apache Cassandra 3.11.12).
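Every partition key column must be restricted with an equality in the WHERE clause, and clustering columns can only be restricted in the order they are declared in the DDL, without skipping any of them. Your query does not restrict id, which is part of the partition key, and it filters on last_updated, which is not part of the primary key at all, so Cassandra asks for ALLOW FILTERING. Given the query pattern you have described, you can try the DDL below: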
CREATE TABLE IF NOT EXISTS autonostix360.engine_speed (
vin_number text,
organization_id int,
last_updated timestamp,
id UUID,
engine_speed_rpm text,
position int,
odometer int,
PRIMARY KEY ((vin_number, organization_id), last_updated)
);
But remember,
PRIMARY KEY ((vin_number, organization_id), last_updated)
PRIMARY KEY ((vin_number), organization_id, last_updated)
The two definitions above are different in Cassandra. In case 1, your data will be partitioned by the combination of vin_number and organization_id, while last_updated will act as the clustering (ordering) key. In case 2, your data will be partitioned only by vin_number, while organization_id and last_updated will act as clustering keys. So you need to figure out which case suits your use case.
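As a sketch, the query from the question should run against the DDL suggested above without ALLOW FILTERING, because the partition key columns are restricted by equality and the range applies to the clustering column. The timestamp literals are placeholder values, not data from the question.

```cql
-- Partition key columns (vin_number, organization_id) use equality;
-- the clustering column last_updated takes the range restriction.
SELECT * FROM autonostix360.engine_speed
WHERE vin_number = 'xyz'
  AND organization_id = 1
  AND last_updated >= '2022-01-01 00:00:00+0000'
  AND last_updated <= '2022-01-31 23:59:59+0000';
```

The same query shape also fits case 2, since organization_id is then the first clustering column and is restricted by equality before the range on last_updated.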

Cassandra non counter family

I attempted to create a table with a counter as one of the column types in Cassandra, but I am getting the following error:
ConfigurationException: ErrorMessage code=2300 [Query invalid because
of configuration issue] message="Cannot add a counter column
(transaction_count) in a non counter column family"
My table schema is as follows:
CREATE TABLE MARKET_DATA_TRANSACTION_COUNT (
TRADE_DATE TIMESTAMP,
SECURITY_EXCHANGE TEXT,
PRODUCT_CODE TEXT,
SYMBOL TEXT,
SPREAD_TYPE TEXT,
USER_DEFINED TEXT,
PRODUCT_GUID TEXT,
CHANNEL_ID INT,
SECURITY_TYPE TEXT,
INSTRUMENT_GUID TEXT,
SECURITY_ID INT,
TRANSACTION_COUNT COUNTER,
PRIMARY KEY (TRADE_DATE));
That's a limitation of the current counter implementation. You can't mix counters and regular columns in the same table. So you need a separate table for counters.
They are thinking of removing this limitation in Cassandra 3.x. See this Jira ticket.
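One way to apply this, as a rough sketch: keep the descriptive columns in a regular table and move the counter into its own table with the same key. The name market_data_transaction and the abbreviated column list are assumptions for illustration.

```cql
-- Regular table: descriptive columns only (abbreviated here).
CREATE TABLE market_data_transaction (
    trade_date timestamp,
    security_exchange text,
    symbol text,
    PRIMARY KEY (trade_date)
);

-- Counter table: every column outside the primary key is a counter.
CREATE TABLE market_data_transaction_count (
    trade_date timestamp,
    transaction_count counter,
    PRIMARY KEY (trade_date)
);

-- Counters are changed with UPDATE, not INSERT.
UPDATE market_data_transaction_count
SET transaction_count = transaction_count + 1
WHERE trade_date = '2014-01-01 00:00:00+0000';
```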
This is not exactly an answer to the question, but it might help some people with a similar error.
If all the other columns can be made part of the PRIMARY KEY, then it is possible.
Eg: CREATE TABLE rate_data (ts varchar, type varchar, rate counter, PRIMARY KEY (ts, type));

Cassandra range slicing on composite key

I have a column family with a composite key like this:
CREATE TABLE sometable(
keya varchar,
keyb varchar,
keyc varchar,
keyd varchar,
value int,
date timestamp,
PRIMARY KEY (keya,keyb,keyc,keyd,date)
);
What I need to do is to
SELECT * FROM sometable
WHERE
keya = 'abc' AND
keyb = 'def' AND
date < '2014-01-01'
And that is giving me this error
Bad Request: PRIMARY KEY part date cannot be restricted (preceding part keyd is either not restricted or by a non-EQ relation)
What's the best way to solve this? Do I need to alter my columnfamily?
I also need to query that table with all of keya, keyb, keyc, and date.
You cannot do this in Cassandra with the schema as it stands. Such a range slice would also be costly. You are trying to apply a range to date while skipping keyc and keyd, which come before it in the clustering order of your schema.
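If changing the schema is an option, one common workaround is a second table whose clustering order matches the query. The following is only a sketch under that assumption. The table name sometable_by_date is hypothetical, and the application would have to write to both tables.

```cql
-- date is moved ahead of keyc and keyd in the clustering order,
-- so it can take a range once keya and keyb are fixed by equality.
CREATE TABLE sometable_by_date (
    keya varchar,
    keyb varchar,
    date timestamp,
    keyc varchar,
    keyd varchar,
    value int,
    PRIMARY KEY (keya, keyb, date, keyc, keyd)
);

SELECT * FROM sometable_by_date
WHERE keya = 'abc' AND keyb = 'def' AND date < '2014-01-01';
```

A query on keya, keyb, keyc, and a date range would need yet another ordering (keyc before date), so this approach trades extra storage for each query pattern.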
I also need to query that table with all of keya, keyb, keyc, and date.
If you want to solve this problem, consider the following schema. What I would suggest is to keep the keys in a separate table:
-- Table name is a placeholder; the original answer did not name it.
CREATE TABLE key_lookup (
id timeuuid,
keyType text,
PRIMARY KEY (id, keyType));
Use the timeuuid to store the values and do a range scan based on that.
-- Table name is a placeholder; the original answer did not name it.
CREATE TABLE key_values (
prevTableId timeuuid,
value int,
date timestamp,
PRIMARY KEY (prevTableId, date));
I would guess that, this way, your table is normalized for better scalability in your use case, and it may also save a lot of disk space if the keys are repetitive.

Set/Query columns from secondary compound key with CQL3

Say I have the following CF with a compound primary key:
CREATE TABLE dpt (
empID int,
deptID int,
PRIMARY KEY (deptID, empID));
Because of the compound PK, cassandra will create one row for each dept, and the employee IDs that are members of the department will be stored as columns on that row with the :empID as the column name.
Question #1: Can I set a value on that column (e.g. the employee name) with CQL3? If so, how?
Question #2: Can I see the value of the <individual_employ_ID>:empID column, if it exists, with CQL3?
thanks
Question #1:
CREATE TABLE dpt (
empID int,
deptID int,
empName text,
PRIMARY KEY (deptID, empID));
Question #2:
Please take a look at the examples:
http://www.datastax.com/docs/1.1/references/cql/INSERT
http://www.datastax.com/docs/1.1/references/cql/UPDATE
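As a brief sketch of what those pages describe, assuming the table definition with the empName column shown under Question #1 (the IDs and names below are made-up sample values):

```cql
-- Set the value for one employee within a department.
INSERT INTO dpt (deptID, empID, empName) VALUES (10, 1, 'Alice');

-- Change it later; the full primary key identifies the cell.
UPDATE dpt SET empName = 'Alice B.' WHERE deptID = 10 AND empID = 1;

-- Read it back for a single employee.
SELECT empName FROM dpt WHERE deptID = 10 AND empID = 1;
```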

Can a Cassandra / CQL3 column family have a composite partition key?

CQL 3 allows for a "compound" primary key using a definition like this:
CREATE TABLE timeline (
user_id varchar,
tweet_id uuid,
author varchar,
body varchar,
PRIMARY KEY (user_id, tweet_id)
);
With a schema like this, the partition key (storage engine row key) will consist of the user_id value, while the tweet_id will be compounded into the column name. What I am looking for, instead, is for the partition key (storage engine row key) to have a composite value like user_id:tweet_id. Obviously I could do something like key = user_id + ':' + tweet_id in my application, but is there any way to have CQL 3 do this for me?
Actually, yes you can. That functionality was added in this ticket:
https://issues.apache.org/jira/browse/CASSANDRA-4179
The format for you would be:
CREATE TABLE timeline (
user_id varchar,
tweet_id uuid,
author varchar,
body varchar,
PRIMARY KEY ((user_id, tweet_id))
);
Until 1.2 comes out, the answer is no. The partition key will always be the first component. As you said, the way to do this would be to create the composite key yourself. You shouldn't shy away from this as it's actually quite common.

Resources