How to use a static column in ScyllaDB and Cassandra?

I am new to ScyllaDB and Cassandra, and I am having trouble querying data from a table. This is the schema I have created:
CREATE TABLE usercontacts (
userID bigint, -- userID
contactID bigint, -- Contact ID lynkApp userID
contactDeviceToken text, -- Device Token
modifyDate timestamp static ,
PRIMARY KEY (contactID,userID)
);
CREATE MATERIALIZED VIEW usercontacts_by_contactid
AS SELECT userID, contactID, contactDeviceToken
FROM usercontacts WHERE
contactID IS NOT NULL AND userID IS NOT NULL AND modifyDate IS NOT NULL
-- Need to not null as these are the primary keys in main
-- table same structure as the main table
PRIMARY KEY(userID,contactID);
CREATE MATERIALIZED VIEW usercontacts_by_modifyDate
AS SELECT userID,contactID,contactDeviceToken,modifyDate
FROM usercontacts WHERE
contactID IS NOT NULL AND userID IS NOT NULL AND modifyDate IS NOT NULL
-- Need to not null as these are the primary keys in main
-- table same structure as the main table
PRIMARY KEY (modifyDate,contactID);
I want to create two materialized views on the usercontacts table: usercontacts_by_contactid and usercontacts_by_modifyDate.
With modifyDate (timestamp) declared static, I need the following queries to work:
update usercontacts set modifydate="newdate" where contactid="contactid"
select * from usercontacts_by_modifydate where modifydate="modifydate"
delete from usercontacts where contactid="contactid"

It is not currently possible to create a materialized view that includes a static column, either as part of the view's primary key or as a regular column.
Including a static column would require reading every row of the affected base table partition (in usercontacts) whenever the static column changes, so that the view rows could be recalculated. This carries a significant performance penalty.
Making the static column the view's partition key would also mean that the view holds only one entry for all the rows of a base partition. Secondary indexes do work in this case, however, so you can use one instead.
This applies to both Scylla and Cassandra at the moment.
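As a rough sketch of the secondary index route, assuming a Cassandra or Scylla version that supports indexes on static columns, the table from the question could be indexed as shown here. The index name usercontacts_modifydate_idx and the timestamp literal are made up for illustration.

```cql
-- Base table from the question, without the materialized views.
CREATE TABLE usercontacts (
    userID bigint,
    contactID bigint,
    contactDeviceToken text,
    modifyDate timestamp static,
    PRIMARY KEY (contactID, userID)
);

-- Hypothetical index name; indexes the static column instead of using a view.
CREATE INDEX usercontacts_modifydate_idx ON usercontacts (modifyDate);

-- Looks up the rows of every partition whose static modifyDate matches.
-- The timestamp value is a placeholder.
SELECT * FROM usercontacts WHERE modifyDate = '2018-05-01 10:00:00+0000';
```

The UPDATE and DELETE from the question only restrict contactID, the partition key, so they need neither the index nor a view.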

Related

Not able to run multiple where clause without Cassandra allow filtering

Hi, I am new to Cassandra.
We are working on an IoT project where car sensor data will be stored in Cassandra.
Here is an example of one table, which will store the data from one of the sensors.
This is some sample data.
I want to partition the data by organization_id, so that each organization's data is kept in separate partitions.
Here is the create table command:
CREATE TABLE IF NOT EXISTS engine_speed (
id UUID,
engine_speed_rpm text,
position int,
vin_number text,
last_updated timestamp,
organization_id int,
odometer int,
PRIMARY KEY ((id, organization_id), vin_number)
);
This works fine. However, all my queries will look like the one below:
select * from engine_speed
where vin_number='xyz'
and organization_id = 1
and last_updated >='from time stamp' and last_updated <='to timestamp'
Almost all queries on all the tables will have a similar or identical WHERE clause.
I am getting an error that asks me to add "ALLOW FILTERING".
Kindly let me know how I should partition the table and define the right primary key and indexes so that I don't have to add "ALLOW FILTERING" to the query.
Apologies for this basic question, but I'm just starting to use Cassandra (Apache Cassandra 3.11.12).
The WHERE clause must restrict the primary key columns in the order they are declared in your DDL: first every partition key column, then the clustering columns. You cannot skip one part of the primary key and restrict a later one. Given the query pattern you have described, you can try the DDL below:
CREATE TABLE IF NOT EXISTS autonostix360.engine_speed (
vin_number text,
organization_id int,
last_updated timestamp,
id UUID,
engine_speed_rpm text,
position int,
odometer int,
PRIMARY KEY ((vin_number, organization_id), last_updated)
);
But remember that
PRIMARY KEY ((vin_number, organization_id), last_updated)
PRIMARY KEY ((vin_number), organization_id, last_updated)
are two different keys in Cassandra. In case 1, your data will be partitioned by the combination of vin_number and organization_id, while last_updated will act as the clustering (ordering) key. In case 2, your data will be partitioned by vin_number only, while organization_id and last_updated will act as clustering keys. So you need to figure out which case suits your use case.
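For illustration, with either of these keys the query from the question restricts the whole partition key by equality and only ranges over last_updated, so it should run without ALLOW FILTERING. The timestamp literals are placeholders, not values from the question.

```cql
-- Case 1: vin_number and organization_id form the partition key.
-- Case 2: vin_number is the partition key and organization_id is the
--         first clustering column, restricted by equality.
-- In both cases the range applies to the last restricted clustering column.
SELECT * FROM autonostix360.engine_speed
WHERE vin_number = 'xyz'
  AND organization_id = 1
  AND last_updated >= '2022-01-01 00:00:00+0000'
  AND last_updated <= '2022-01-31 23:59:59+0000';
```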

Materialised view error in Cassandra

I am new to Cassandra. I am trying to create a table and a materialized view, but it is not working.
My queries are:
-- all_orders
create table all_orders (
id uuid,
order_number bigint,
country text,
store_number bigint,
supplier_number bigint,
flow_type int,
planned_delivery_date timestamp,
locked boolean,
primary key ( order_number,store_number,supplier_number,planned_delivery_date ));
-- orders_by_date
CREATE MATERIALIZED VIEW orders_by_date AS
SELECT
id,
order_number,
country,
store_number,
supplier_number,
flow_type,
planned_delivery_date,
locked,
FROM all_orders
WHERE planned_delivery_date IS NOT NULL AND order_number IS NOT NULL
PRIMARY KEY ( planned_delivery_date )
WITH CLUSTERING ORDER BY (store_number,supplier_number);
I am getting an exception like this:
SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query]
message="line 1:7 no viable alternative at input 'MATERIALIZED' ([CREATE] MATERIALIZED...)">
Materialized views in Cassandra remove the need to maintain additional tables by hand for querying by different partition keys, but they come with the following restrictions:
Use all base table primary key columns in the materialized view's primary key.
Optionally, add one non-PRIMARY KEY column from the base table to the materialized view's PRIMARY KEY.
Static columns are not supported as a PRIMARY KEY.
More documentation reference here.
So the correct syntax for creating the materialized view in your case would be:
CREATE MATERIALIZED VIEW orders_by_date AS
SELECT id,
order_number,
country,
store_number,
supplier_number,
flow_type,
planned_delivery_date,
locked
FROM all_orders
WHERE planned_delivery_date IS NOT NULL AND order_number IS NOT NULL AND store_number IS NOT NULL AND supplier_number IS NOT NULL
PRIMARY KEY ( planned_delivery_date, store_number, supplier_number, order_number );
Here planned_delivery_date is the partition key, and the rows are ordered by store_number, supplier_number, order_number (the clustering columns). So adding a "CLUSTERING ORDER BY" clause is not mandatory here.
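If a non-default sort direction is ever needed, the clause takes the clustering columns together with a direction. A minimal sketch, assuming you want the highest store_number first (the view name orders_by_date_desc is hypothetical):

```cql
-- Same view as above, but with an explicit clustering order.
-- Every clustering column is listed, in primary key order, with its direction.
CREATE MATERIALIZED VIEW orders_by_date_desc AS
SELECT id, order_number, country, store_number, supplier_number,
       flow_type, planned_delivery_date, locked
FROM all_orders
WHERE planned_delivery_date IS NOT NULL AND order_number IS NOT NULL
  AND store_number IS NOT NULL AND supplier_number IS NOT NULL
PRIMARY KEY (planned_delivery_date, store_number, supplier_number, order_number)
WITH CLUSTERING ORDER BY (store_number DESC, supplier_number ASC, order_number ASC);
```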

Cassandra Update/Upsert does not set Static Column?

I am trying to "upsert" data into my table with CQLSSTableWriter. Everything works fine, except that my static column is not set correctly: it ends up null every time. My static column is defined as brand TEXT static.
After failing with CQLSSTableWriter, I went into cqlsh and tried to update the static column manually:
update keyspace.data set brand='Nestle' where id = 'whatever' and date = '2015-10-07';
and with a batch as well (even though it should not matter)
begin batch
update keyspace.data set brand='Nestle' where id = 'whatever' and date = '2015-10-07';
apply batch;
My "brand" column still shows null when I retrieve some of my data (select * from keyspace.data LIMIT 100;)
My entire schema:
CREATE TABLE keyspace.data (
id text,
date text,
ts timestamp,
id_two text,
brand text static,
latitude double,
longitude double,
signals_double map<text, double>,
signals_string map<text, text>,
name text static,
PRIMARY KEY ((id, date), ts, id_two)
) WITH CLUSTERING ORDER BY (ts ASC, id_two ASC);
I chose UPDATE instead of INSERT because I have collections that I do not want to overwrite, but rather add more elements to. Using INSERT would overwrite the previously stored elements of my collections.
Why can I not set a static column with an Update query?

Cassandra order by on combination of composite keys

I originally wrote a table that tracks feeds that have been assigned to a user for review.
create table user_feed
(
userid uuid,
languageid uuid,
topicid uuid,
dateinserted timeuuid,
primary key (userid, languageid, topicid, dateinserted)
);
Soon after creating this table I realized that I couldn't sort it by dateinserted (ORDER BY ... DESC) because, for some weird reason, in Cassandra I can only order by the second (and last) column of a composite key (as in, the table has to have a two-column key, and ORDER BY can only be applied to the second column of that key), so I changed my table to this:
create table user_feed
(
userid uuid,
languageid uuid,
topicid uuid,
dateinserted timeuuid,
primary key (userid, dateinserted)
);
and now I was able to run a query to get the latest feeds for the user, using order by.
However, I have a new requirement that requires me to sort the feeds by a combination of (languageid + userid) or (topicid + userid) or (languageid + topicid + userid).
I had an idea to create three new tables and have the keys combined into one key column. For example, for userid + topic query, I would use:
create table user_feed_by_topic
(
usertopicidkey text,
dateinserted timeuuid,
primary key (usertopicidkey, dateinserted)
);
where usertopicidkey = userid.toString() + topicid.toString().
Of course, this solution requires 4 separate inserts whenever I need to insert a new feed row, since I have 4 tables tracking identical data but partitioned differently to allow sorting.
My question is, is there a better way to do this? Is there any way to achieve what I want (query by a combination of columns and order by another column) or am I stuck with my 4 table design approach?
Many thanks,
Cassandra orders all rows within a partition by the primary key's clustering columns. If your primary key is (userid, languageid, topicid, dateinserted), all rows will be sorted by languageid, topicid and dateinserted in ascending order. This means rows are sorted by date only within a specific language and topic. You'd have to make the date the first clustering column to change this behaviour.
It's common practice to denormalize your data across multiple tables to implement different ordering strategies.
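As a sketch of that denormalization for the "userid + topicid" case, one option (an assumption, not the asker's design) is a composite partition key instead of a concatenated string key, with dateinserted as the only clustering column. The UUID literals in the query are placeholders.

```cql
-- One table per access pattern; this one serves "latest feeds for a user and topic".
CREATE TABLE user_feed_by_topic (
    userid uuid,
    topicid uuid,
    languageid uuid,
    dateinserted timeuuid,
    PRIMARY KEY ((userid, topicid), dateinserted)
) WITH CLUSTERING ORDER BY (dateinserted DESC);

-- Rows come back newest first because of the clustering order,
-- so no ORDER BY is needed in the query.
SELECT * FROM user_feed_by_topic
WHERE userid = 11111111-1111-1111-1111-111111111111
  AND topicid = 22222222-2222-2222-2222-222222222222
LIMIT 20;
```

The same pattern would apply to the language and language + topic variants, each with its own table and its own partition key.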

Is it possible to filter by 2nd cluster key without specifying 1st?

I am just starting to get the hang of Cassandra, and I am curious whether I can do the following:
session.execute("""CREATE TABLE IF NOT EXISTS mykeyspace.newtable
(user_id int,
item_id int,
tstamp timestamp,
PRIMARY KEY(user_id, item_id, tstamp))""")
My Understanding:
I know that user_id is the partition key, and I can only filter on it with =. item_id and tstamp are clustering keys. I can filter on item_id by range (using inequality operators), and for a given item_id I can filter on tstamp by range, but if item_id is not specified, I cannot filter on tstamp at all.
My question:
Is there a way to specify my PRIMARY KEY such that I can:
1) filter by = for user_id (I already have it)
2) filter by = item_id (I already have it)
3) filter by range for tstamp (I don't have it)
If I change my PRIMARY KEY specification to:
PRIMARY KEY(user_id, tstamp, item_id)
Then I would no longer be able to filter by item_id. So if there was a way to somehow filter by item_id without specifying tstamp I would have my answer.
Is there a way to achieve what I am looking for?
The answer to this is to build two tables, one for each primary key scheme you want. Duplicating data in tables with different schemas is relatively cheap, since writes are so fast in Cassandra.
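A minimal sketch of the two-table approach follows. The table names items_by_user_item and items_by_user_time and all literal values are hypothetical, chosen only for illustration.

```cql
-- Table 1: filter by user_id and item_id, then range over tstamp.
CREATE TABLE IF NOT EXISTS mykeyspace.items_by_user_item (
    user_id int,
    item_id int,
    tstamp timestamp,
    PRIMARY KEY (user_id, item_id, tstamp)
);

-- Table 2: filter by user_id, then range over tstamp across all items.
CREATE TABLE IF NOT EXISTS mykeyspace.items_by_user_time (
    user_id int,
    tstamp timestamp,
    item_id int,
    PRIMARY KEY (user_id, tstamp, item_id)
);

-- Every write goes to both tables; a logged batch keeps them in step.
BEGIN BATCH
  INSERT INTO mykeyspace.items_by_user_item (user_id, item_id, tstamp)
  VALUES (1, 42, '2020-01-01 12:00:00+0000');
  INSERT INTO mykeyspace.items_by_user_time (user_id, tstamp, item_id)
  VALUES (1, '2020-01-01 12:00:00+0000', 42);
APPLY BATCH;

-- Range over tstamp for one item.
SELECT * FROM mykeyspace.items_by_user_item
WHERE user_id = 1 AND item_id = 42
  AND tstamp >= '2020-01-01 00:00:00+0000' AND tstamp < '2020-02-01 00:00:00+0000';

-- Range over tstamp without specifying item_id.
SELECT * FROM mykeyspace.items_by_user_time
WHERE user_id = 1
  AND tstamp >= '2020-01-01 00:00:00+0000' AND tstamp < '2020-02-01 00:00:00+0000';
```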
