Yugabyte YCQL check if a set contain a value? - yugabytedb

Is there there any way to query on a SET type(or MAP/LIST) to find does it contain a value or not?
Something like this:
CREATE TABLE test.table_name(
id text,
ckk SET<INT>,
PRIMARY KEY((id))
);
Select * FROM table_name WHERE id = 1 AND ckk CONTAINS 4;
Is there any way to reach this query with YCQL api?
And can we use a SET type in SECONDRY INDEX?

Is there any way to reach this query with YCQL api?
YCQL does not support the CONTAINS keyword yet (feel free to open an issue for this on the YugabyteDB GitHub).
One workaround can be to use MAP<INT, BOOLEAN> instead of SET<INT> and the [] operator.
For instance:
CREATE TABLE test.table_name(
id text,
ckk MAP<int, boolean>,
PRIMARY KEY((id))
);
SELECT * FROM table_name WHERE id = 'foo' AND ckk[4] = true;
And can we use a SET type in SECONDRY INDEX?
Generally, collection types cannot be part of the primary key, or an index key.
However, "frozen" collections (i.e. collections serialized into a single value internally) can actually be part of either primary key or index key.
For instance:
CREATE TABLE table2(
id TEXT,
ckk FROZEN<SET<INT>>,
PRIMARY KEY((id))
) WITH transactions = {'enabled' : true};
CREATE INDEX table2_idx on table2(ckk);

Another option is to use with compound primary key and defining ckk as clustering key:
cqlsh> CREATE TABLE ybdemo.tt(id TEXT, ckk INT, PRIMARY KEY ((id), ckk)) WITH CLUSTERING ORDER BY (ckk DESC);
cqlsh> SELECT * FROM ybdemo.tt WHERE id='foo' AND ckk=4;

Related

nested map in cassandra data modelling

I have following requirement of my dataset, need to unserstand what datatype should I use and how to save my data accordingly :-
CREATE TABLE events (
id text,
evntoverlap map<text, map<timestamp,int>>,
PRIMARY KEY (id)
)
evntoverlap = {
'Dig1': {{'2017-10-09 04:10:05', 0}},
'Dig2': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0}},
'Dig3': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0},{'2017-10-09 04:11:05', 0}}
}
This gives an error :-
Error from server: code=2200 [Invalid query] message="Non-frozen collections are not allowed inside collections: map<text, map<timestamp, int>>"
How should I store this type of data in single column . Please suggest datatype and insert command for the same.
Thanks,
There is limitation of Cassandra - you can't nest collection (or UDT) inside collection without making it frozen. So you need to "froze" one of the collections - either nested:
CREATE TABLE events (
id text,
evntoverlap map<text, frozen<map<timestamp,int>>>,
PRIMARY KEY (id)
);
or top-level:
CREATE TABLE events (
id text,
evntoverlap frozen<map<text, map<timestamp,int>>>,
PRIMARY KEY (id)
);
See documentation for more details.
CQL collections limited to 64kb, if putting things like maps in maps you might push that limit. Especially with frozen maps you are deserializing the entire map, modifying it, and re inserting. Might be better off with a
CREATE TABLE events (
id text,
evnt_key, text
value map<timestamp, int>,
PRIMARY KEY ((id), evnt_key)
)
Or even a
CREATE TABLE events (
id text,
evnt_key, text
evnt_time timestamp
value int,
PRIMARY KEY ((id), evnt_key, evnt_time)
)
It would be more efficient and safer while giving additional benefits like being able to order the event_time's in ascending or descending order.

Does using all fields as a partitioning keys in a table a drawback in cassandra?

my aim is to get the msgAddDate based on below query :
select max(msgAddDate)
from sampletable
where reportid = 1 and objectType = 'loan' and msgProcessed = 1;
Design 1 :
here the reportid, objectType and msgProcessed may not be unique. To add the uniqueness I have added msgAddDate and msgProcessedDate (an additional unique value).
I use this design because I don't perform range query.
Create table sampletable ( reportid INT,
objectType TEXT,
msgAddDate TIMESTAMP,
msgProcessed INT,
msgProcessedDate TIMESTAMP,
PRIMARY KEY ((reportid ,msgProcessed,objectType,msgAddDate,msgProcessedDate));
Design 2 :
create table sampletable (
reportid INT,
objectType TEXT,
msgAddDate TIMESTAMP,
msgProcessed INT,
msgProcessedDate TIMESTAMP,
PRIMARY KEY ((reportid ,msgProcessed,objectType),msgAddDate, msgProcessedDate))
);
Please advice which one to use and what will be the pros and cons between two based on performance.
Design 2 is the one you want.
In Design 1, the whole primary key is the partition key. Which means you need to provide all the attributes (which are: reportid, msgProcessed, objectType, msgAddDate, msgProcessedDate) to be able to query your data with a SELECT statement (which wouldn't be useful as you would not retrieve any additional attributes than the one you already provided in the WHERE statemenent)
In Design 2, your partition key is reportid ,msgProcessed,objectType which are the three attributes you want to query by. Great. msgAddDate is the first clustering column, which will be automatically sorted for you. So you don't even need to run a max since it is sorted. All you need to do is use LIMIT 1:
SELECT msgAddDate FROM sampletable WHERE reportid = 1 and objectType = 'loan' and msgProcessed = 1 LIMIT 1;
Of course, make sure to define a DESC sorted order on msgAddDate (I think by default it is ascending...)
Hope it helps!

Using MATERIALIZED VIEW in Cassandra gives error

I have following table.
CREATE TABLE test_x (id text PRIMARY KEY, type frozen<mycustomtype>);
mycustomtype is defined as follows,
CREATE TABLE mycustomtype (
id uuid PRIMARY KEY,
name text
)
And i have created following materialized view for queries based on mycustometype filed.
CREATE MATERIALIZED VIEW test_x_by_mycustomtype_name AS
SELECT id, type
FROM test_x
WHERE type IS NOT NULL
PRIMARY KEY (id, type)
WITH CLUSTERING ORDER BY (type ASC)
With above view i hope to execute following query.
select id from test_x_by_mycustomtype_name where type =
{id: a3e64f8f-bd44-4f28-b8d9-6938726e34d4, name: 'Sample'};
But the query fails saying i need to use 'ALLOW FILTERING'. I created the view not to use ALLOW FILTERING. Why this error is happening here since i have used the part of primary key of the view ?
In you view, the type column is still clustering key. Hence, ALLOW FILTER should be used. You can change the view as per below and retry
CREATE MATERIALIZED VIEW test_x_by_mycustomtype_name_2 AS
SELECT id, type
FROM test_x
WHERE type IS NOT NULL
PRIMARY KEY (type, id)
WITH CLUSTERING ORDER BY (id ASC);
cqlsh:test> select id from test_x_by_mycustomtype_name_2 where type = {id: a3e64f8f-bd44-4f28-b8d9-6938726e34d4, name: 'Sample'};
id
----
Change the order of the primary key of materialized view
CREATE MATERIALIZED VIEW test_x_by_mycustomtype_name AS
SELECT id, type
FROM test_x
WHERE type IS NOT NULL
PRIMARY KEY (type, id)
WITH CLUSTERING ORDER BY (type ASC);

com.datastax.driver.core.exceptions.InvalidQueryException: Invalid operator IN for PRIMARY KEY part

I have cassandra 2.1.15.
I have this table
CREATE TABLE ks_mobapp.messages (
pair_id text,
belong_to text,
message_id timeuuid,
cli_time bigint,
sender text,
text text,
time bigint,
PRIMARY KEY ((pair_id, belong_to), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
I was trying to delete multiple record as
instances.getCqlSession().execute(QueryBuilder.delete()
.from(AppConstants.KEYSPACE, "messages")
.where(QueryBuilder.eq("pair_id", pairId))
.and(QueryBuilder.eq("belong_to", currentUser.value("userId")))
.and(QueryBuilder.in("message_id", msgId)));
I am getting error:
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid operator IN for PRIMARY KEY part message_id
Then I tried:
Session session = instances.getCqlSession();
PreparedStatement statement = session.prepare("DELETE FROM ks_mobApp.messages WHERE pair_id = ? AND belong_to = ? AND message_id = ?;");
Iterator<String> iterator = msgId.iterator();
while(iterator.hasNext()) {
try {
session.executeAsync(statement.bind(pairId, currentUser.value("userId"), UUID.fromString(iterator.next())));
} catch(Exception ex) {
}
}
Its working nice. Is this the correct way? I can't use IN for same partition key ?
DELETE in Query only supported for partition key.
Delete IN relation is only supported for partition key)
There are some WHERE clause restrictions for the UPDATE and DELETE statements in cassandra 2.x
more specifically you can only use the IN operator on the last partition key column. So in your case the last partition column is belong_to. so IN can only be used on that column.
However these limitation are removed in cassandra 3.0. and it will allow
IN to be specified on any partition key column
IN to be specified on any clustering column
Here is the patch https://issues.apache.org/jira/browse/CASSANDRA-6237
Read this also http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause

Way to handle autoincrement ID with counter on Cassandra?

This is not a question about using an autincrement integer for primary key instead of UUIDs on Cassandra, in this case I want to generate an autoincrement effect like PostgreSQL on Cassandra that doesn't need to be necessarily scalable. I'm using UUID as primary key for entries in a table, but I need to generate a shortid like bitly for those entries. So I came up trying to make an application that grabs an index for a specific entry and generates a shortid based on that index and then set the shortid to the entry.
So I'm trying to do something like this on Cassandra:
CREATE TABLE photo (
id uuid,
shortid text,
title text,
PRIMARY KEY (id)
);
CREATE TABLE shortid (
shortid text,
family text,
longid uuid,
index bigint,
created_at timestamp,
PRIMARY KEY ((shortid, family))
) WITH COMPACT STORAGE;
CREATE TABLE shortid_reverse (
longid uuid,
family text,
shortid text
PRIMARY KEY ((longid, family))
) WITH COMPACT STORAGE;
CREATE TABLE shortid_last_index (
family text,
last_index counter,
last_long_id uuid,
PRIMARY KEY (family)
);
So in this application that will handle the shortid, when the application initiates It'll get the last index for that family, and then it'll increase the value on the application itself, as this application will run on Nodejs and Nodejs can scale that.
Application.js
var index = lastIndexFromCassandra++ //5
, hashids = new Hashids("this is my salt")
, shortid = hashids.encrypt(index); //dDae3KDDj4Q
After the application increase the index and generate the shortid, It'll persist on Cassandra:
UPDATE shortid_last_index SET last_index = last_index+1, last_long_id = fabac1f0-7f88-11e3-baa7-0800200c9a66 WHERE family = 'photo';
INSERT INTO shortid (shortid, family, longid, index, created_at) VALUES ('dDae3KDDj4Q', 'photo', fabac1f0-7f88-11e3-baa7-0800200c9a66, 5, NOW());
INSERT INTO shortid_reverse (longid, family, shortid) VALUES (fabac1f0-7f88-11e3-baa7-0800200c9a66, 'photo', 'dDae3KDDj4Q');
UPDATE photo SET shortid = 'dDae3KDDj4Q' WHERE id = fabac1f0-7f88-11e3-baa7-0800200c9a66;
So, it really there isn't a better way to do this in Cassandra without creating an application that will just do that? Couldn't I just do something like PostgreSQL on Cassandra:
UPDATE shortid_last_index SET last_index = last_index+1, last_long_id = ? WHERE family = 'photo' RETURNING last_index;
In comparison, if the statement above worked it would probably lock the row, but increasing and grabbing the index in the application itself and then safely increase the counter in Cassandra wouldn't lock the row too? How scale would be the application?
If you need short incremental id generation please take a look at Snowflake or one of the other countless clones/inspirations.
What you are attempting to do is a bad idea on multiple counts.

Resources