SyntaxException: line 2:10 no viable alternative at input 'UNIQUE' > (...NOT EXISTS books ( id [UUID] UNIQUE...) - cassandra

I am trying the following code to create a keyspace and a table inside it:
CREATE KEYSPACE IF NOT EXISTS books WITH REPLICATION = { 'class': 'SimpleStrategy',
'replication_factor': 3 };
CREATE TABLE IF NOT EXISTS books (
id UUID PRIMARY KEY,
user_id TEXT UNIQUE NOT NULL,
scale TEXT NOT NULL,
title TEXT NOT NULL,
description TEXT NOT NULL,
reward map<INT,TEXT> NOT NULL,
image_url TEXT NOT NULL,
video_url TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
But I get:
SyntaxException: line 2:10 no viable alternative at input 'UNIQUE'
(...NOT EXISTS books ( id [UUID] UNIQUE...)
What is the problem and how can I fix it?

I see three syntax issues. They mainly stem from the fact that CQL is not SQL.
The first is that NOT NULL is not valid in a column definition. Cassandra doesn't enforce constraints like that at all (the only columns that can never be null are the primary key columns), so in this case just remove all of them.
Next, Cassandra CQL does not allow default values, so this won't work:
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
Providing the current timestamp for created_at is something that needs to be done at write time. Fortunately, CQL has a few built-in functions to make this easier:
INSERT INTO books (id, user_id, created_at)
VALUES (uuid(), 'userOne', toTimestamp(now()));
In this case, I've invoked the uuid() function to generate a version 4 (random) UUID. I've also invoked now() for the current time. However, now() returns a TimeUUID (a version 1 UUID), so I've nested it inside the toTimestamp() function to convert it to a TIMESTAMP.
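For readers curious how a TimeUUID carries a timestamp, here is a small Python sketch, outside Cassandra and using only the standard uuid module, of the conversion that toTimestamp(now()) performs conceptually. The helper name timeuuid_to_timestamp is hypothetical and this is not Cassandra's implementation; it only decodes the version 1 UUID time field down to millisecond precision, which matches the resolution of a CQL TIMESTAMP.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeuuid_to_timestamp(u: uuid.UUID) -> datetime:
    """Hypothetical helper: extract a millisecond-precision UTC time from a version 1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 (time-based) UUIDs carry a timestamp")
    # u.time is the 60-bit count of 100 ns intervals since the UUID epoch.
    millis = (u.time - UUID_EPOCH_OFFSET) // 10_000  # CQL TIMESTAMP has millisecond precision
    return UNIX_EPOCH + timedelta(milliseconds=millis)


random_id = uuid.uuid4()  # comparable to CQL uuid(): random, no embedded time
time_id = uuid.uuid1()    # comparable to CQL now(): a time-based TimeUUID
print(random_id.version, time_id.version)  # 4 1
print(timeuuid_to_timestamp(time_id))      # roughly the current UTC time
```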
Finally, UNIQUE is not valid.
user_id TEXT UNIQUE NOT NULL,
It looks like you're trying to make sure that duplicate user_ids are not stored with each id. Because every write in Cassandra is an upsert (writing an existing primary key overwrites that row), you can help ensure uniqueness of the data in each partition by adding user_id to the end of the primary key definition as a clustering key:
CREATE TABLE IF NOT EXISTS books (
id UUID,
user_id TEXT,
...
PRIMARY KEY (id, user_id));
This primary key definition partitions the books data by id, with each partition holding one row per user_id.
I'm not sure what the relationship between books and users is, though. If one book can have many users, then this will work. If one user can have many books, then you'll want to switch the order of the keys to this:
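A toy Python model can make the upsert behaviour visible. ToyTable is a hypothetical in-memory class, not part of any Cassandra driver: rows are grouped by the partition key (id) and keyed within the partition by the clustering key (user_id), so writing the same (id, user_id) twice merges into one row, while a new user_id adds a second row. Note that, as in Cassandra, nothing stops the same user_id from appearing under a different id.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of a table with PRIMARY KEY (partition_col, clustering_col)."""

    def __init__(self, partition_col: str, clustering_col: str):
        self.partition_col = partition_col
        self.clustering_col = clustering_col
        # partition key value -> {clustering key value -> row}
        self.partitions = defaultdict(dict)

    def insert(self, row: dict) -> None:
        # Writes are upserts: a row with an existing primary key is merged into, not duplicated.
        part = self.partitions[row[self.partition_col]]
        ck = row[self.clustering_col]
        part[ck] = {**part.get(ck, {}), **row}

    def select_partition(self, key) -> list:
        # Rows in a partition come back sorted by the clustering column (ascending by default).
        part = self.partitions.get(key, {})
        return [part[ck] for ck in sorted(part)]


books = ToyTable("id", "user_id")
books.insert({"id": "book-1", "user_id": "userOne", "title": "First title"})
books.insert({"id": "book-1", "user_id": "userOne", "title": "Revised title"})  # same key: overwrite
books.insert({"id": "book-1", "user_id": "userTwo", "title": "First title"})
print([r["user_id"] for r in books.select_partition("book-1")])  # ['userOne', 'userTwo']
```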
PRIMARY KEY (user_id, id));
In summary, a working table definition for this problem looks like this:
CREATE TABLE IF NOT EXISTS books (
id UUID,
user_id TEXT,
scale TEXT,
title TEXT,
description TEXT,
reward map<INT,TEXT>,
image_url TEXT,
video_url TEXT,
created_at TIMESTAMP,
PRIMARY KEY (id, user_id));

Related

Yugabyte YCQL check if a set contains a value?

Is there any way to query a SET type (or MAP/LIST) to find out whether it contains a value?
Something like this:
CREATE TABLE test.table_name(
id text,
ckk SET<INT>,
PRIMARY KEY((id))
);
SELECT * FROM table_name WHERE id = '1' AND ckk CONTAINS 4;
Is there any way to reach this query with YCQL api?
And can we use a SET type in a SECONDARY INDEX?
Is there any way to reach this query with YCQL api?
YCQL does not support the CONTAINS keyword yet (feel free to open an issue for this on the YugabyteDB GitHub).
One workaround is to use MAP<INT, BOOLEAN> instead of SET<INT> and query it with the [] operator.
For instance:
CREATE TABLE test.table_name(
id text,
ckk MAP<int, boolean>,
PRIMARY KEY((id))
);
SELECT * FROM table_name WHERE id = 'foo' AND ckk[4] = true;
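The MAP<INT, BOOLEAN> encoding can be sketched in plain Python to show how the membership test works. The helper names below are hypothetical and the code only mimics the query's semantics; it does not talk to YugabyteDB.

```python
def set_to_membership_map(values):
    """Encode a SET<INT> as the MAP<INT, BOOLEAN> workaround: each element becomes a key set to True."""
    return {v: True for v in values}


def select_where_has(rows, row_id, element):
    """Mimic: SELECT * FROM table_name WHERE id = row_id AND ckk[element] = true;"""
    # A key that is absent from the map never matches "= true"; .get() returns None here.
    return [r for r in rows if r["id"] == row_id and r["ckk"].get(element) is True]


table = [
    {"id": "foo", "ckk": set_to_membership_map([1, 4, 7])},
    {"id": "bar", "ckk": set_to_membership_map([2])},
]
print(len(select_where_has(table, "foo", 4)))  # 1
print(len(select_where_has(table, "foo", 5)))  # 0
```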
And can we use a SET type in a SECONDARY INDEX?
Generally, collection types cannot be part of the primary key or an index key.
However, "frozen" collections (i.e. collections serialized internally into a single value) can be part of either the primary key or an index key.
For instance:
CREATE TABLE table2(
id TEXT,
ckk FROZEN<SET<INT>>,
PRIMARY KEY((id))
) WITH transactions = {'enabled' : true};
CREATE INDEX table2_idx on table2(ckk);
Another option is to use a compound primary key and define ckk as a clustering key, storing one row per element:
cqlsh> CREATE TABLE ybdemo.tt(id TEXT, ckk INT, PRIMARY KEY ((id), ckk)) WITH CLUSTERING ORDER BY (ckk DESC);
cqlsh> SELECT * FROM ybdemo.tt WHERE id='foo' AND ckk=4;
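To illustrate the clustering-key option, this hypothetical Python sketch turns one set value into one row per element, ordered the way CLUSTERING ORDER BY (ckk DESC) would keep them, and then checks membership with a full primary key lookup. It is an in-memory illustration only, not a YCQL client.

```python
def explode_set(row_id, values):
    """Turn one SET<INT> value into rows for PRIMARY KEY ((id), ckk) WITH CLUSTERING ORDER BY (ckk DESC)."""
    # (id, ckk) is the whole primary key, so repeated elements collapse into one row,
    # just as duplicates collapse in a SET. Rows are kept in descending ckk order.
    return [{"id": row_id, "ckk": v} for v in sorted(set(values), reverse=True)]


def row_exists(rows, row_id, element):
    """Mimic: SELECT * FROM ybdemo.tt WHERE id = row_id AND ckk = element; (a full primary key lookup)."""
    return any(r["id"] == row_id and r["ckk"] == element for r in rows)


tt = explode_set("foo", [1, 4, 7, 4])
print([r["ckk"] for r in tt])                              # [7, 4, 1]
print(row_exists(tt, "foo", 4), row_exists(tt, "foo", 5))  # True False
```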

Cassandra - how to update a record with a compound key

I'm in the process of learning Cassandra and using it on a small pilot project at work. I've got one table that is filtered by three fields:
CREATE TABLE webhook (
event_id text,
entity_type text,
entity_operation text,
callback_url text,
create_timestamp timestamp,
webhook_id text,
last_mod_timestamp timestamp,
app_key text,
status_flag int,
PRIMARY KEY ((event_id, entity_type, entity_operation))
);
Then I can pull records like so, which is exactly the query I need for this:
select * from webhook
where event_id = '11E7DEB1B162E780AD3894B2C0AB197A'
and entity_type = 'user'
and entity_operation = 'insert';
However, I have an update query to set the record inactive (soft delete), which would be most convenient to run by webhook_id alone, in the same table. Of course, this isn't possible, because webhook_id is not part of the primary key:
update webhook
set status_flag = 0
where webhook_id = '11e8765068f50730ac964b31be21d64e'
An example of why I'd want to do this is a simple DELETE from an API endpoint:
http://myapi.com/webhooks/11e8765068f50730ac964b31be21d64e
Naturally, if I update based on the composite key, I'd potentially inactivate more records than I intend to.
It seems like my only choice, doing it the "Cassandra Way", is to use two tables: the one I already have, and one to track status_flag by webhook_id so I can update based on that id. I'd then have to select by webhook_id and disable the record in the first table as well? Otherwise, I'd have to force users to pass all the compound key values in the URL of the API's DELETE request.
Simple things you take for granted in relational data, seem to get complex very quickly in Cassandraland. Is this the case or am I making it more complicated than it really is?
You can add webhook_id to your primary key as a clustering column.
So your table definition becomes something like this:
CREATE TABLE webhook (
event_id text,
entity_type text,
entity_operation text,
callback_url text,
create_timestamp timestamp,
webhook_id text,
last_mod_timestamp timestamp,
app_key text,
status_flag int,
PRIMARY KEY ((event_id, entity_type, entity_operation), webhook_id)
);
Now let's say you insert two records:
INSERT INTO dev_cybs_rtd_search.webhook(event_id,entity_type,entity_operation,status_flag,webhook_id) VALUES('11E7DEB1B162E780AD3894B2C0AB197A','user','insert',1,'web_id');
INSERT INTO dev_cybs_rtd_search.webhook(event_id,entity_type,entity_operation,status_flag,webhook_id) VALUES('12313131312313','user','insert',1,'web_id_1');
And you can update like this:
update webhook
set status_flag = 0
where webhook_id = 'web_id' AND event_id = '11E7DEB1B162E780AD3894B2C0AB197A' AND entity_type = 'user'
AND entity_operation = 'insert';
It will update only one record.
However, you have to supply values for every column of your primary key.
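The two-table idea from the question can be sketched in Python to show the flow. The lookup table name webhook_by_id and the helper functions are hypothetical, and plain dicts stand in for the tables: every insert also records which partition a webhook lives in, so a DELETE that carries only webhook_id can find the full primary key before issuing the UPDATE. In Cassandra itself you would probably send the two inserts together in a logged BATCH so that both are eventually applied.

```python
# Hypothetical in-memory stand-ins for two tables:
#   webhook:       PRIMARY KEY ((event_id, entity_type, entity_operation), webhook_id)
#   webhook_by_id: PRIMARY KEY (webhook_id) -> partition key of the matching webhook row
webhook = {}
webhook_by_id = {}


def insert_webhook(event_id, entity_type, entity_operation, webhook_id, status_flag=1):
    partition = (event_id, entity_type, entity_operation)
    webhook.setdefault(partition, {})[webhook_id] = {"status_flag": status_flag}
    # Denormalized write: remember where this webhook lives so it can be found by id alone.
    webhook_by_id[webhook_id] = partition


def soft_delete(webhook_id):
    """Set status_flag = 0 given only webhook_id (e.g. from DELETE /webhooks/<webhook_id>)."""
    partition = webhook_by_id.get(webhook_id)
    if partition is None:
        return False
    # Equivalent of: UPDATE webhook SET status_flag = 0
    #   WHERE event_id = ? AND entity_type = ? AND entity_operation = ? AND webhook_id = ?
    webhook[partition][webhook_id]["status_flag"] = 0
    return True


insert_webhook("11E7DEB1B162E780AD3894B2C0AB197A", "user", "insert", "web_id")
insert_webhook("12313131312313", "user", "insert", "web_id_1")
soft_delete("web_id")
print(webhook[("11E7DEB1B162E780AD3894B2C0AB197A", "user", "insert")]["web_id"])  # {'status_flag': 0}
```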

nested map in cassandra data modelling

I have the following requirement for my dataset and need to understand what data type I should use and how to save my data accordingly:
CREATE TABLE events (
id text,
evntoverlap map<text, map<timestamp,int>>,
PRIMARY KEY (id)
)
evntoverlap = {
'Dig1': {{'2017-10-09 04:10:05', 0}},
'Dig2': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0}},
'Dig3': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0},{'2017-10-09 04:11:05', 0}}
}
This gives an error:
Error from server: code=2200 [Invalid query] message="Non-frozen collections are not allowed inside collections: map<text, map<timestamp, int>>"
How should I store this type of data in a single column? Please suggest a data type and an insert command for it.
There is a limitation in Cassandra: you can't nest a collection (or UDT) inside a collection without making it frozen. So you need to freeze one of the collections, either the nested one:
CREATE TABLE events (
id text,
evntoverlap map<text, frozen<map<timestamp,int>>>,
PRIMARY KEY (id)
);
or the top-level one:
CREATE TABLE events (
id text,
evntoverlap frozen<map<text, map<timestamp,int>>>,
PRIMARY KEY (id)
);
See the documentation for more details.
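To illustrate what "frozen" means in the first option, here is a hypothetical Python sketch in which types.MappingProxyType (a read-only view) stands in for frozen<map<timestamp,int>>. The outer map can gain or replace a single key independently, but the inner frozen value can only be replaced as a whole, so adding one entry means copying, modifying, and writing back the full inner map. Timestamps are plain strings here for brevity.

```python
from types import MappingProxyType


def frozen(mapping):
    """Stand-in for a frozen map: a read-only snapshot that can only be replaced as a whole."""
    return MappingProxyType(dict(mapping))


def add_overlap(evntoverlap, dig, ts, value):
    """Add one (timestamp -> value) entry under `dig` in map<text, frozen<map<timestamp, int>>>."""
    # The inner frozen map cannot be edited in place: copy it, modify the copy...
    inner = dict(evntoverlap.get(dig, {}))
    inner[ts] = value
    # ...and write the whole inner map back under its (non-frozen) outer key.
    evntoverlap[dig] = frozen(inner)


overlap = {}  # the outer, non-frozen map
add_overlap(overlap, "Dig2", "2017-10-09 04:11:05", 0)
add_overlap(overlap, "Dig2", "2017-10-09 04:15:05", 0)
print(dict(overlap["Dig2"]))  # {'2017-10-09 04:11:05': 0, '2017-10-09 04:15:05': 0}
```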
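CQL collections are meant to be kept small (depending on the native protocol version, each collection item may be limited to 64 KB), and putting maps inside maps can push that limit. With frozen maps in particular, every change means deserializing the entire map, modifying it, and re-inserting it. You might be better off with a table like this: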
CREATE TABLE events (
id text,
evnt_key text,
value map<timestamp, int>,
PRIMARY KEY ((id), evnt_key)
);
Or even one like this:
CREATE TABLE events (
id text,
evnt_key text,
evnt_time timestamp,
value int,
PRIMARY KEY ((id), evnt_key, evnt_time)
);
This would be more efficient and safer, with additional benefits such as being able to order the evnt_time values in ascending or descending order.
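The last layout can be sketched in Python to show how the nested map turns into clustering rows. The flatten helper and the id "event-1" are hypothetical; the sample values come from the question. Reading one evnt_key newest-first corresponds to reversing the partition order, which in CQL means ORDER BY evnt_key DESC, evnt_time DESC (all clustering columns reversed together).

```python
from datetime import datetime


def flatten(event_id, evntoverlap):
    """Turn {evnt_key: {evnt_time: value}} into rows for PRIMARY KEY ((id), evnt_key, evnt_time)."""
    rows = [
        (event_id, evnt_key, evnt_time, value)
        for evnt_key, inner in evntoverlap.items()
        for evnt_time, value in inner.items()
    ]
    # Inside a partition, rows are kept sorted by the clustering columns: evnt_key, then evnt_time.
    rows.sort(key=lambda r: (r[1], r[2]))
    return rows


# Sample values adapted from the question; "event-1" is a hypothetical id.
data = {
    "Dig1": {datetime(2017, 10, 9, 4, 10, 5): 0},
    "Dig2": {datetime(2017, 10, 9, 4, 15, 5): 0, datetime(2017, 10, 9, 4, 11, 5): 0},
}
rows = flatten("event-1", data)

# Roughly: SELECT * FROM events WHERE id = 'event-1' AND evnt_key = 'Dig2'
#          ORDER BY evnt_key DESC, evnt_time DESC;
dig2_newest_first = [r[2] for r in reversed(rows) if r[1] == "Dig2"]
print([t.strftime("%H:%M:%S") for t in dig2_newest_first])  # ['04:15:05', '04:11:05']
```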

Reference logical aggregates - CQL

I have a university assignment that consists of working with CQL and Cassandra.
In the first part of the assignment, I have to "identify the reference logical aggregates, thus providing a row-oriented view of schema information, and point out attributes which model relationships", but I don't understand what a "reference logical aggregate" is.
The schema is the following (I didn't create it; it came with the assignment):
DROP KEYSPACE gameindustry;
CREATE KEYSPACE gameindustry WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE gameindustry;
CREATE TYPE fullname (
firstname text,
lastname text
);
CREATE TABLE games (
name text PRIMARY KEY,
description text,
tags set<text>
);
CREATE TABLE tags (
tag text PRIMARY KEY,
games set<text>
);
CREATE TABLE users (
email text PRIMARY KEY,
password text,
name fullname,
notes text,
games set<text>
);
CREATE TABLE user_games (
user text,
game text,
PRIMARY KEY (user, game)
);
CREATE TABLE friends (
user text,
friend text,
PRIMARY KEY (user, friend)
);
CREATE TABLE matches (
user text,
game text,
when timestamp,
score int,
PRIMARY KEY (game, when, user),
);

Cassandra Schema for a Chat Application

I have gone through this article, and here is the schema I got from it. It is helpful for maintaining user statuses in my application, but how can I extend it to maintain a one-to-one chat archive and relations between users? By relations, I mean people belonging to a specific group. I am new to this and need an approach.
Requirements:
I want to store messages between user-user in a table.
Whenever a user wants to load messages from a user, I want to retrieve them and send them to the user.
I want to retrieve all the messages from different users to the user when the user requests them.
I also want to store classes of users. For example, user1 and user2 belong to "family", while user3, user4, and user1 belong to "friends", etc. The group can be a custom name given by the user.
This is what I have tried so far:
CREATE TABLE chatarchive (
chat_id uuid PRIMARY KEY,
username text,
body text
)
CREATE TABLE chatseries (
username text,
time timeuuid,
chat_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC)
CREATE TABLE chattimeline (
to text,
username text,
time timeuuid,
chat_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC)
Below is the schema that I currently have:
CREATE TABLE users (
username text PRIMARY KEY,
password text
)
CREATE TABLE friends (
username text,
friend text,
since timestamp,
PRIMARY KEY (username, friend)
)
CREATE TABLE followers (
username text,
follower text,
since timestamp,
PRIMARY KEY (username, follower)
)
CREATE TABLE tweets (
tweet_id uuid PRIMARY KEY,
username text,
body text
)
CREATE TABLE userline (
username text,
time timeuuid,
tweet_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC)
CREATE TABLE timeline (
username text,
time timeuuid,
tweet_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC)
With C* you need to store data in the way you'll use it.
So let's see what this would look like for this case:
I want to store messages between user-user in a table.
Whenever a user wants to load messages from a user, I want to retrieve them and send them to the user.
CREATE TABLE chat_messages (
message_id uuid,
from_user text,
to_user text,
body text,
class text,
time timeuuid,
PRIMARY KEY ((from_user, to_user), time)
) WITH CLUSTERING ORDER BY (time ASC);
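This will allow you to retrieve a timeline of messages between two users. Note that a composite partition key (from_user, to_user) is used, so that a wide partition is created for each pair of users. The key is directional: messages from john to mike live in a different partition than messages from mike to john.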
SELECT * FROM chat_messages WHERE from_user = 'mike' AND to_user = 'john' ORDER BY time DESC ;
I want to retrieve all the messages from different users to the user when the user requests them.
CREATE INDEX chat_messages_to_user ON chat_messages (to_user);
This allows you to do:
SELECT * FROM chat_messages WHERE to_user = 'john';
I also want to store classes of users. For example, user1 and user2 belong to "family", while user3, user4, and user1 belong to "friends", etc. The group can be a custom name given by the user.
CREATE INDEX chat_messages_class ON chat_messages (class);
This will allow you to do:
SELECT * FROM chat_messages WHERE class = 'family';
Note that in this kind of database, DENORMALIZED DATA IS A GOOD PRACTICE. This means that repeating the class name again and again is not a bad practice.
Also note that I haven't used a 'chat_id' or a 'chats' table. We could easily add them, but your use case, as described, doesn't seem to require them. In general, you cannot do joins in C*, so using a chat id would imply two queries.
EDIT: Secondary indexes can be inefficient. A materialised view would be a better implementation in C* 3.0 and later.
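One way to act on the denormalization advice without secondary indexes is to write each message to one table per query. The Python sketch below is hypothetical (the table names messages_by_pair and messages_by_recipient are made up, and dicts stand in for tables); it shows the same row fanned out to a conversation table and a per-recipient table, with uuid.uuid1() playing the role of a TimeUUID.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for two query tables holding the same messages:
#   messages_by_pair:      PRIMARY KEY ((from_user, to_user), time)
#   messages_by_recipient: PRIMARY KEY (to_user, time, from_user)
messages_by_pair = defaultdict(list)
messages_by_recipient = defaultdict(list)


def send_message(from_user, to_user, body, group):
    """Write one message to every table that serves a query (application-side denormalization)."""
    row = {"from_user": from_user, "to_user": to_user, "body": body,
           "class": group, "time": uuid.uuid1()}  # uuid1() acts like CQL now()
    messages_by_pair[(from_user, to_user)].append(row)
    messages_by_recipient[to_user].append(row)


def conversation(from_user, to_user):
    """Like: SELECT * FROM messages_by_pair WHERE from_user = ? AND to_user = ?;"""
    return list(messages_by_pair[(from_user, to_user)])


def inbox(to_user):
    """Like: SELECT * FROM messages_by_recipient WHERE to_user = ?; (no secondary index needed)"""
    # Sort by the timestamp embedded in the TimeUUID; sorted() is stable, so ties keep write order.
    return sorted(messages_by_recipient[to_user], key=lambda r: r["time"].time)


send_message("mike", "john", "hi john", "friends")
send_message("anna", "john", "dinner at 7?", "family")
print([r["from_user"] for r in inbox("john")])  # ['mike', 'anna']
```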
There is a chat application created by Alan Chandler on GitHub that has the features you request:
MBchat
It uses two-phase authentication: first the user is validated against the forum, and then against the chat database.
Here's the first validation part of the schema (schema located in inc/user.sql):
BEGIN;
CREATE TABLE users (
uid integer primary key autoincrement NOT NULL,
time bigint DEFAULT (strftime('%s','now')) NOT NULL,
name character varying NOT NULL,
role text NOT NULL DEFAULT 'R', -- A (CEO), L (DIRECTOR), G (DEPT HEAD), H (SPONSOR) R(REGULAR)
cap integer DEFAULT 0 NOT NULL, -- 1 = blind, 2 = committee secretary, 4 = admin, 8 = mod, 16 = speaker 32 = can't whisper( OR of capabilities).
password character varying NOT NULL, -- raw password
rooms character varying, -- a ":" separated list of rooms nos which define which rooms the user can go in
isguest boolean DEFAULT 0 NOT NULL
);
CREATE INDEX userindex ON users(name);
-- Below here you can add the specific users for your set up in the form of INSERT Statements
-- This list is test users to cover the complete range of functions. Note names are converted to lowercase, so only put lowercase names in here
INSERT INTO users(uid,name,role,cap,password,rooms,isguest) VALUES
(1,'alice','A',4,'password','7',0), -- CEO class user alice
(2,'bob','L',3,'password','8',0), -- DIRECTOR class user bob
(3,'carol','G',2,'password','7:8:9',0); -- DEPT HEAD class user carol
And here's the second validation part of the schema (schema located in data/chat.sql):
CREATE TABLE users (
uid integer primary key NOT NULL,
time bigint DEFAULT (strftime('%s','now')) NOT NULL,
name character varying NOT NULL,
role char(1) NOT NULL default 'R',
rid integer NOT NULL default 0,
mod char(1) NOT NULL default 'N',
question character varying,
private integer NOT NULL default 0,
cap integer NOT NULL default 0,
rooms character varying
);
The following is the schema of the chat rooms, where you can see the user classes and examples of them:
CREATE TABLE rooms (
rid integer primary key NOT NULL,
name varchar(30) NOT NULL,
type integer NOT NULL -- 0 = Open, 1 = meeting, 2 = guests can't speak, 3 moderated, 4 members(adult) only, 5 guests(child) only, 6 creaky door
) ;
INSERT INTO rooms (rid, name, type) VALUES
(1, 'The Forum', 0),
(2, 'Operations Gallery', 2), -- Guests Can't Speak
(3, 'Dungeon Club', 6), -- creaky door
(4, 'Auditorium', 3), -- Moderated Room
(5, 'Blue Room', 4), -- Members Only (in Melinda's Backups this is Adults)
(6, 'Green Room', 5), -- Guest Only (in Melinda's Backups this is Juveniles AKA Baby Backups)
(7, 'The Board Room', 1); -- Various meeting rooms - need to be on users' room list
Another table records which users participate in which conversation:
CREATE table wid_sequence ( value integer);
INSERT INTO wid_sequence (value) VALUES (1);
CREATE TABLE participant (
uid integer NOT NULL REFERENCES users (uid) ON DELETE CASCADE ON UPDATE CASCADE,
wid integer NOT NULL,
primary key (uid,wid)
);
And the archives are recorded as follows:
CREATE TABLE chat_log (
lid integer primary key,
time bigint DEFAULT (strftime('%s','now')) NOT NULL,
uid integer NOT NULL REFERENCES users (uid) ON DELETE CASCADE ON UPDATE CASCADE,
name character varying NOT NULL,
role char(1) NOT NULL,
rid integer NOT NULL,
type char(2) NOT NULL,
text character varying
);
Edit: However, this type of data modeling is not very suitable for Cassandra. Cassandra spreads your data across many machines, so joins are not available, and denormalizing data is the practical choice. Below is a denormalized version of the chat_log table:
CREATE TABLE chat_log (
lid uuid,
time timestamp,
sender text,
receiver text,
room text,
sender_role varchar,
receiver_role varchar,
rid decimal,
status varchar,
message text,
PRIMARY KEY (sender, time, lid)
-- time is the first clustering column so each sender's messages are stored in time order;
-- lid keeps two messages sent in the same millisecond from overwriting each other.
-- Use PRIMARY KEY ((sender, room), time, lid) if you want the messages separated by room
-- (queries must then also specify the room).
) WITH CLUSTERING ORDER BY (time ASC, lid ASC);
Now in order to retrieve data you'd use the following queries:
Whenever a user wants to load messages from a user, I want to retrieve them and send them to the user.
SELECT * FROM chat_log WHERE sender = 'bob' ORDER BY time ASC
I want to retrieve all the messages from different users to the user when the user requests them.
SELECT * FROM chat_log WHERE receiver = 'alice'; -- relies on the receiver index; Cassandra does not support ORDER BY on secondary-index queries
I want to store and retrieve classes of users.
SELECT * FROM chat_log WHERE sender_role = 'A'; -- messages sent by CEOs
SELECT * FROM chat_log WHERE receiver_role = 'A'; -- messages received by CEOs
After modeling the data, you'd need secondary indexes so that Cassandra accepts the queries that don't restrict the partition key. Keep in mind that such queries may have to contact every node, so they are not especially fast:
For retrieving all messages sent to a user (sender is already the partition key, so it needs no index; Cassandra also refuses to index the only partition key column):
CREATE INDEX chat_log_receiver ON chat_log (receiver);
For retrieving messages by user class:
CREATE INDEX chat_log_sender_role ON chat_log (sender_role);
CREATE INDEX chat_log_receiver_role ON chat_log (receiver_role);
I believe these examples will give you the approach you need.
If you'd like to learn more about Cassandra data modeling, you can check the resources below:
Cassandra Data Modeling Best Practices, Part 1
Cassandra Data Modeling Best Practices, Part 2
Cassandra Data Modeling Best Practices Slide
Data Modeling Example
