Cassandra: Set collection boolean, how to? - cassandra

I am making a columnfamily that will save values of different sensors.
Sensor_04 will have boolean values of 4 different doors. door1, door2, door3,door4.
The goal is to be able to query and ask for is door1,2,3 or 4 true or false?
How is the syntax done for this? Cause i know my example is false:
CREATE COLUMNFAMILY lockSystem (sID int, sNamn text, doors set<boolean>, PRIMARY KEY(sID));
Wrong --->
INSERT INTO lockSystem (sID, sNamn, doors) VALUES (4,'Sensor_04' {'door1:True','door2:False','door3:False','door4:false'});
I hope my question makes sense, my goal is something like this:
Sensor_4: sID int, sName text, set: door1 bool, door2 bool, door3 bool, door4 bool

You could assume that if a door is present in your set then its value is true, and then yuo can use the filtering capabilities of C* to query your data.
So I'd change the model to something like:
CREATE TABLE lockSystem (
sID int,
sNamn text,
doors set<text>,
PRIMARY KEY(sID)
);
where the doors set has been changed to a set of text. You then add data into your set when one door alarm gets "high" with:
UPDATE lockSystem SET doors = doors + { '1' } WHERE sID = ?;
To filter you data you can then use:
SELECT * FROM lockSystem WHERE doors CONTAINS '1';
Have a look at the using the set type documentation, and on how to filter data in a collection.

Related

Cassandra 3.11, Filter Results by list (searching for alternatives)

I want to ask cassandra to filter its results by a list of arguments. But this is not possible on a "normal" column. Or in cassandra's words:
IN predicates on non-primary-key columns (access_right_id) is not yet supported
My table looks like this:
CREATE TABLE "service"
(
course_id uuid,
type text,
access_token uuid,
name_de text,
name_en text,
url text,
edit_right_id uuid,
access_right_id uuid,
PRIMARY KEY (course_id, type, access_token)
);
I want to execute a query like this:
SELECT * FROM service WHERE
course_id = :courseId
AND type = :type
AND access_right_id IN :rights
I am now searching for a solution to my problem. I am thinking of three possible solutions:
Send N times the query (maybe with a materialized view and access_right_id as a primary key part (clustering key))
SELECT * FROM service WHERE
course_id = :courseId
AND type = :type
AND access_right_id = :right
Send a generated query like this:
SELECT * FROM service WHERE
course_id = :courseId
AND type = :type
AND (access_right_id = :right1 OR access_right_id = :right2 OR access_right_id = :right ...)
Send a query without filtering and filter the result in code.
What do you this is best in this case? What is more cassandra "compliant"?
Thank you in advance for your input.

Yugabyte YCQL check if a set contain a value?

Is there there any way to query on a SET type(or MAP/LIST) to find does it contain a value or not?
Something like this:
CREATE TABLE test.table_name(
id text,
ckk SET<INT>,
PRIMARY KEY((id))
);
Select * FROM table_name WHERE id = 1 AND ckk CONTAINS 4;
Is there any way to reach this query with YCQL api?
And can we use a SET type in SECONDRY INDEX?
Is there any way to reach this query with YCQL api?
YCQL does not support the CONTAINS keyword yet (feel free to open an issue for this on the YugabyteDB GitHub).
One workaround can be to use MAP<INT, BOOLEAN> instead of SET<INT> and the [] operator.
For instance:
CREATE TABLE test.table_name(
id text,
ckk MAP<int, boolean>,
PRIMARY KEY((id))
);
SELECT * FROM table_name WHERE id = 'foo' AND ckk[4] = true;
And can we use a SET type in SECONDRY INDEX?
Generally, collection types cannot be part of the primary key, or an index key.
However, "frozen" collections (i.e. collections serialized into a single value internally) can actually be part of either primary key or index key.
For instance:
CREATE TABLE table2(
id TEXT,
ckk FROZEN<SET<INT>>,
PRIMARY KEY((id))
) WITH transactions = {'enabled' : true};
CREATE INDEX table2_idx on table2(ckk);
Another option is to use with compound primary key and defining ckk as clustering key:
cqlsh> CREATE TABLE ybdemo.tt(id TEXT, ckk INT, PRIMARY KEY ((id), ckk)) WITH CLUSTERING ORDER BY (ckk DESC);
cqlsh> SELECT * FROM ybdemo.tt WHERE id='foo' AND ckk=4;

How to avoid Cassandra ALLOW FILTERING?

I have Following Data Model :-
campaigns {
id int PRIMARY KEY,
scheduletime text,
SchduleStartdate text,
SchduleEndDate text,
enable boolean,
actionFlag boolean,
.... etc
}
Here i need to fetch the data basing on start date and end data with out ALLOW FILTERING .
I got more suggestions to re-design schema to full fill the requirement But i cannot filter the data basing on id since i need the data in b/w the dates .
Some one give me a good suggestion to full fill this scenario to execute Following Query :-
select * from campaings WHERE startdate='XXX' AND endDate='XXX' ; // With out Allow Filtering thing
CREATE TABLE campaigns (
SchduleStartdate text,
SchduleEndDate text,
id int,
scheduletime text,
enable boolean,
PRIMARY KEY ((SchduleStartdate, SchduleEndDate),id));
You can make the below queries to the table,
slect * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx'; -- to get the answer to above question.
slect * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx' and id = 1; -- if you want to filter the data again for specific ids
Here the SchduleStartdate and SchduleEndDate is used as the Partition Key and the ID is used as the Clustering key to make sure the entries are unique.
By this way, you can filter based on start, end and then id if needed.
One downside with this will be if you only need to filter by id that wont be possible as you need to first restrict the partition keys.

Cassandra - how to update a record with a compound key

In the process of learning Cassandra and using it on a small pilot project at work. I've got one table that is filtered by 3 fields:
CREATE TABLE webhook (
event_id text,
entity_type text,
entity_operation text,
callback_url text,
create_timestamp timestamp,
webhook_id text,
last_mod_timestamp timestamp,
app_key text,
status_flag int,
PRIMARY KEY ((event_id, entity_type, entity_operation))
);
Then I can pull records like so, which is exactly the query I need for this:
select * from webhook
where event_id = '11E7DEB1B162E780AD3894B2C0AB197A'
and entity_type = 'user'
and entity_operation = 'insert';
However, I have an update query to set the record inactive (soft delete), which would be most convenient by partition key in the same table. Of course, this isn't possible:
update webhook
set status_flag = 0
where webhook_id = '11e8765068f50730ac964b31be21d64e'
An example of why I'd want to do this, is a simple DELETE from an API endpoint:
http://myapi.com/webhooks/11e8765068f50730ac964b31be21d64e
Naturally, if I update based on the composite key, I'd potentially inactivate more records than I intend to.
Seems like my only choice, doing it the "Cassandra Way", is to use two tables; the one I already have and one to track status_flag by webhook_id, so I can update based on that id. I'd then have to select by webhook_id in the first table and disable it there as well? Otherwise, I'd have to force users to pass all the compound key values in the URL of the API's DELETE request.
Simple things you take for granted in relational data, seem to get complex very quickly in Cassandraland. Is this the case or am I making it more complicated than it really is?
You can add webhook to your primary key.
So your table defination becomes somethign like this.
CREATE TABLE webhook (
event_id text,
entity_type text,
entity_operation text,
callback_url text,
create_timestamp timestamp,
webhook_id text,
last_mod_timestamp timestamp,
app_key text,
status_flag int,
PRIMARY KEY ((event_id, entity_type, entity_operation),webhook_id)
Now lets say you insert 2 records.
INSERT INTO dev_cybs_rtd_search.webhook(event_id,entity_type,entity_operation,status_flag,webhook_id) VALUES('11E7DEB1B162E780AD3894B2C0AB197A','user','insert',1,'web_id');
INSERT INTO dev_cybs_rtd_search.webhook(event_id,entity_type,entity_operation,status_flag,webhook_id) VALUES('12313131312313','user','insert',1,'web_id_1');
And you can update like following
update webhook
set status_flag = 0
where webhook_id = 'web_id' AND event_id = '11E7DEB1B162E780AD3894B2C0AB197A' AND entity_type = 'user'
AND entity_operation = 'insert';
It will only update 1 record.
However you have to send all the things defined in your primary key.

Does using all fields as a partitioning keys in a table a drawback in cassandra?

my aim is to get the msgAddDate based on below query :
select max(msgAddDate)
from sampletable
where reportid = 1 and objectType = 'loan' and msgProcessed = 1;
Design 1 :
here the reportid, objectType and msgProcessed may not be unique. To add the uniqueness I have added msgAddDate and msgProcessedDate (an additional unique value).
I use this design because I don't perform range query.
Create table sampletable ( reportid INT,
objectType TEXT,
msgAddDate TIMESTAMP,
msgProcessed INT,
msgProcessedDate TIMESTAMP,
PRIMARY KEY ((reportid ,msgProcessed,objectType,msgAddDate,msgProcessedDate));
Design 2 :
create table sampletable (
reportid INT,
objectType TEXT,
msgAddDate TIMESTAMP,
msgProcessed INT,
msgProcessedDate TIMESTAMP,
PRIMARY KEY ((reportid ,msgProcessed,objectType),msgAddDate, msgProcessedDate))
);
Please advice which one to use and what will be the pros and cons between two based on performance.
Design 2 is the one you want.
In Design 1, the whole primary key is the partition key. Which means you need to provide all the attributes (which are: reportid, msgProcessed, objectType, msgAddDate, msgProcessedDate) to be able to query your data with a SELECT statement (which wouldn't be useful as you would not retrieve any additional attributes than the one you already provided in the WHERE statemenent)
In Design 2, your partition key is reportid ,msgProcessed,objectType which are the three attributes you want to query by. Great. msgAddDate is the first clustering column, which will be automatically sorted for you. So you don't even need to run a max since it is sorted. All you need to do is use LIMIT 1:
SELECT msgAddDate FROM sampletable WHERE reportid = 1 and objectType = 'loan' and msgProcessed = 1 LIMIT 1;
Of course, make sure to define a DESC sorted order on msgAddDate (I think by default it is ascending...)
Hope it helps!

Resources