Cassandra select all keys from map

I am trying to select the keys from the courses map in the student table below:
CREATE TABLE student (
studentid TEXT,
courses map<TEXT, TEXT>,
PRIMARY KEY (studentid)
);
with this data inserted:
UPDATE student SET courses = courses + { 'bio101': 'Intro to Bio'} where studentid='xyz' ;
UPDATE student SET courses = courses + { 'bio102': 'Advanced Bio'} where studentid='xyz' ;
UPDATE student SET courses = courses + { 'chem101': 'Intro to Chem'} where studentid='xyz' ;
I would like to select only the keys in the courses map, so that the result is:
courseId
--------
'bio101'
'bio102'
'chem101'
I've tried variations of
select courses from student where studentid='xyz';
but that returns the whole map, and I can't find a way to select just its keys.

You can use a user-defined function (UDF) for this:
CREATE OR REPLACE FUNCTION keys (input Map<Text, Text>)
RETURNS NULL ON NULL INPUT
RETURNS Set<Text>
LANGUAGE java AS 'return input.keySet();';
UDFs are disabled by default, so you have to enable them first: find enable_user_defined_functions in cassandra.yaml, change false to true, and restart Cassandra.
Then change your query slightly:
select keys(courses) from student where studentid='xyz';
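If enabling UDFs on the cluster isn't an option, the same keys can be extracted on the client after the plain `select courses from student where studentid='xyz'` query. A minimal Python sketch, assuming the driver hands the map column back as a dict-like mapping and a null map as None (in Cassandra an empty map reads back as null, which is why None is handled the way the UDF's RETURNS NULL ON NULL INPUT does). The function name `map_keys` and the sample row are illustrative only.

```python
from typing import List, Mapping, Optional


def map_keys(courses: Optional[Mapping[str, str]]) -> Optional[List[str]]:
    """Client-side counterpart of the keys() UDF: return only the map's keys.

    None in gives None out, like RETURNS NULL ON NULL INPUT.
    Keys are sorted to match the order in which Cassandra returns map keys.
    """
    if courses is None:
        return None
    return sorted(courses.keys())


# Illustrative row, shaped like the result of
# select courses from student where studentid='xyz';
row = {"courses": {"chem101": "Intro to Chem", "bio101": "Intro to Bio", "bio102": "Advanced Bio"}}
print(map_keys(row["courses"]))  # ['bio101', 'bio102', 'chem101']
```

The UDF keeps the work on the server and sends less data; the client-side version needs no cluster configuration change but still transfers the map values.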

Related

How to add attributes to database columns

I'm currently working on creating the correct columns for my database. I have created two tables and used ALTER TABLE to add a foreign key:
CREATE TABLE stores (
id SERIAL PRIMARY KEY,
store_name TEXT
-- add more fields if needed
);
CREATE TABLE products (
id SERIAL,
store_id INTEGER NOT NULL,
title TEXT,
image TEXT,
url TEXT UNIQUE,
added_date timestamp without time zone NOT NULL DEFAULT NOW(),
PRIMARY KEY(id, store_id)
);
ALTER TABLE products
ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
REFERENCES stores (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE RESTRICT;
Now I am trying to use these tables with Peewee, and I have managed a small first step:
from peewee import Model, IntegerField, TextField
class Stores(Model):
    id = IntegerField(column_name='id')
    store_name = TextField(column_name='store_name')
class Products(Model):
    id = IntegerField(column_name='id')
    store_id = IntegerField(column_name='store_id')
    title = TextField(column_name='title')
    url = TextField(column_name='url')
    image = TextField(column_name='image')
However, my problem is that the ALTER TABLE above adds a foreign key, and I am not sure how to use that foreign key with Peewee. How can I do that?
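You need to add a ForeignKeyField to Products and remove the plain store_id field:
from peewee import Model, IntegerField, TextField, ForeignKeyField
class Products(Model):
    id = IntegerField(column_name='id')
    title = TextField(column_name='title')
    url = TextField(column_name='url')
    image = TextField(column_name='image')
    # Maps to the existing store_id column; product.store gives the related Stores row
    # and store.products gives the reverse relation.
    store = ForeignKeyField(Stores, backref='products', column_name='store_id')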
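With the tables already created by hand, the ForeignKeyField mainly tells Peewee how the columns relate; the ON DELETE RESTRICT rule from the ALTER TABLE is still enforced by the database. As a stdlib-only sketch of what that rule does, here is the same pair of tables in SQLite, swapped in for PostgreSQL so it runs without a server. Assumptions: SERIAL becomes INTEGER PRIMARY KEY, products is simplified to a single-column key, and the store names are made up.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create stores/products in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on (per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE stores (id INTEGER PRIMARY KEY, store_name TEXT)")
    conn.execute(
        """CREATE TABLE products (
               id INTEGER PRIMARY KEY,
               store_id INTEGER NOT NULL
                   REFERENCES stores (id) ON UPDATE NO ACTION ON DELETE RESTRICT,
               title TEXT,
               image TEXT,
               url TEXT UNIQUE
           )"""
    )
    return conn


def try_delete_store(conn: sqlite3.Connection, store_id: int) -> bool:
    """Return True if the store row was deleted, False if the FK blocked it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM stores WHERE id = ?", (store_id,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
with conn:
    conn.execute("INSERT INTO stores (id, store_name) VALUES (1, 'Store A'), (2, 'Store B')")
    conn.execute("INSERT INTO products (store_id, title) VALUES (1, 'Widget')")

print(try_delete_store(conn, 1))  # False: a product still references store 1
print(try_delete_store(conn, 2))  # True: nothing references store 2
```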

Cassandra : Key Level access in Map type columns

In Cassandra, suppose we need to access a single key of a map-type column. How can we do it?
Create statement:
create table collection_tab2(
empid int,
emploc map<text,text>,
primary key(empid));
Insert statement:
insert into collection_tab2 (empid, emploc ) VALUES ( 100,{'CHE':'Tata Consultancy Services','CBE':'CTS','USA':'Genpact LLC'} );
select:
select * from collection_tab2;
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
In that case, if I want to access only the 'USA' key, what should I do?
I tried using an index, but all the values are returned:
CREATE INDEX fetch_index ON killrvideo.collection_tab2 (keys(emploc));
select * from collection_tab2 where emploc CONTAINS KEY 'CBE';
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
But I expected:
'CHE': 'Tata Consultancy Services'
As a data model change, I would strongly recommend:
create table collection_tab2(
empid int,
emploc_key text,
emploc_value text,
primary key(empid, emploc_key));
Then you can query and page through the entries simply, because emploc_key is a clustering key rather than part of a CQL collection, which has several limits and negative performance impacts.
Then:
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE', 'CTS');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
You can also put these in an unlogged batch; it will still be applied efficiently and atomically because all the rows are in the same partition.
To do it with your current model, you can use [] element selectors from Cassandra 4.0 onward (CASSANDRA-7396), like:
SELECT emploc['USA'] FROM collection_tab2 WHERE empid = 100;
But I would still strongly recommend the data model change, as it's significantly more efficient and works in existing versions:
SELECT * FROM collection_tab2 WHERE empid = 100 AND emploc_key = 'USA';
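To make the clustering-key argument concrete, here is a toy in-memory model of the redesigned table in Python: each partition (empid) keeps its rows sorted by emploc_key, so a lookup by (empid, emploc_key) touches a single row instead of reading back a whole map. This only illustrates the layout; it is not how Cassandra's storage engine is implemented, and the class name `CollectionTab2` is hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str]  # (emploc_key, emploc_value)


class CollectionTab2:
    """Toy model of PRIMARY KEY (empid, emploc_key)."""

    def __init__(self) -> None:
        # empid (partition key) -> rows kept sorted by emploc_key (clustering key)
        self.partitions: Dict[int, List[Row]] = {}

    def insert(self, empid: int, key: str, value: str) -> None:
        rows = self.partitions.setdefault(empid, [])
        i = bisect_left(rows, (key,))
        if i < len(rows) and rows[i][0] == key:
            rows[i] = (key, value)  # same primary key: a CQL INSERT overwrites (upsert)
        else:
            rows.insert(i, (key, value))

    def select(self, empid: int, key: str) -> Optional[Row]:
        """WHERE empid = ? AND emploc_key = ?: one partition, one binary search."""
        rows = self.partitions.get(empid, [])
        i = bisect_left(rows, (key,))
        if i < len(rows) and rows[i][0] == key:
            return rows[i]
        return None


tab = CollectionTab2()
tab.insert(100, "CHE", "Tata Consultancy Services")
tab.insert(100, "CBE", "CTS")
tab.insert(100, "USA", "Genpact LLC")
print(tab.select(100, "USA"))  # ('USA', 'Genpact LLC')
```

Because the rows of a partition stay sorted, paging through them is just a slice of that list, which is the property the answer relies on.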

How to avoid Cassandra ALLOW FILTERING?

I have Following Data Model :-
campaigns {
id int PRIMARY KEY,
scheduletime text,
SchduleStartdate text,
SchduleEndDate text,
enable boolean,
actionFlag boolean,
.... etc
}
Here I need to fetch the data based on the start date and end date without ALLOW FILTERING.
I got several suggestions to redesign the schema to fulfill the requirement, but I cannot filter the data by id, since I need the data between the dates.
Can someone give me a good suggestion for this scenario, so that I can execute the following query:
select * from campaigns WHERE SchduleStartdate='XXX' AND SchduleEndDate='XXX'; // without ALLOW FILTERING
One option is to make the two dates the partition key and id the clustering key:
CREATE TABLE campaigns (
SchduleStartdate text,
SchduleEndDate text,
id int,
scheduletime text,
enable boolean,
PRIMARY KEY ((SchduleStartdate, SchduleEndDate),id));
You can then run the following queries against the table:
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx'; -- to answer the question above
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx' and id = 1; -- if you also want to filter for specific ids
Here SchduleStartdate and SchduleEndDate are used as the partition key, and id is used as the clustering key to make sure the entries are unique.
This way, you can filter on the start and end dates, and then on id if needed.
One downside is that filtering by id alone won't be possible, because you must first restrict the partition key columns.
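The downside above follows from how the key is laid out. A toy Python model of this table (dicts standing in for partitions; the helper names and the ISO-style date strings are made up for the example): with both dates given, the query goes straight to one partition, while an id-only query has to visit every partition, which is the full scan ALLOW FILTERING would permit.

```python
from typing import Dict, List, Tuple

# Toy model of PRIMARY KEY ((SchduleStartdate, SchduleEndDate), id):
# partition key (start, end) -> {id (clustering key) -> row}
table: Dict[Tuple[str, str], Dict[int, dict]] = {}


def insert(start: str, end: str, cid: int, enable: bool) -> None:
    table.setdefault((start, end), {})[cid] = {"id": cid, "enable": enable}


def by_dates(start: str, end: str) -> List[dict]:
    """Both partition key columns given: read exactly one partition."""
    rows = table.get((start, end), {})
    return [rows[cid] for cid in sorted(rows)]  # rows come back in clustering (id) order


def by_id_only(cid: int) -> List[dict]:
    """Only the clustering key given: every partition has to be checked."""
    return [rows[cid] for rows in table.values() if cid in rows]


insert("2017-01-01", "2017-01-31", 2, True)
insert("2017-01-01", "2017-01-31", 1, False)
insert("2017-02-01", "2017-02-28", 1, True)
print([r["id"] for r in by_dates("2017-01-01", "2017-01-31")])  # [1, 2]
print(len(by_id_only(1)))  # 2
```

If id-only lookups are also needed, the usual Cassandra approach is a second table keyed by id, written alongside this one.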

Cassandra + Fetch the last records using in query

I am new to Cassandra and am using it with Node.js.
I have a user_activity table, into which data is inserted based on user activity.
I also have a list of users, and I need to fetch the last record for each of those users.
I don't want to run the query in a for loop. Is there another way to achieve this?
Example Code:
var userlist = ["12", "34", "56"];
var query = 'SELECT * FROM user_activity WHERE userid IN ?';
server.user.execute(query, [userlist], {
prepare : true
}, function(err, result) {
if (err) return console.error(err);
console.log(result.rows);
});
How can I get only the last record for each user in the list?
Example:
user id = 12 - need to get last record;
user id = 34 - need to get last record;
user id = 56 - need to get last record;
I need to get these 3 records.
Table Schema:
CREATE TABLE test.user_activity (
userid text,
ts timestamp,
clientid text,
clientip text,
status text,
PRIMARY KEY (userid, ts)
)
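It is not possible with the IN filter and a plain LIMIT, because LIMIT applies to the whole result set, not to each user.
If you filter on a single userid, you can order by the clustering column ts (the insert time) in descending order and take the first row:
SELECT * FROM user_activity WHERE userid = '12' ORDER BY ts DESC LIMIT 1;
In Cassandra 3.6 and later, PER PARTITION LIMIT N returns the first N rows of each partition, so if the table is created WITH CLUSTERING ORDER BY (ts DESC), one query returns the latest record for every user in the list:
SELECT * FROM user_activity WHERE userid IN ? PER PARTITION LIMIT 1;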
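Another way to keep a single IN query is to reduce the rows to one per user on the client. The question uses the Node.js driver, so this Python sketch only illustrates the logic; it assumes each row arrives as a dict with userid and ts keys, and the sample rows and status values are made up. Note that it reads every row of every listed user, so it only suits users with modest activity histories.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def latest_per_user(rows: Iterable[dict]) -> List[dict]:
    """Keep only the newest row (largest ts) for each userid."""
    latest: Dict[str, dict] = {}
    for row in rows:
        best = latest.get(row["userid"])
        if best is None or row["ts"] > best["ts"]:
            latest[row["userid"]] = row
    return [latest[uid] for uid in sorted(latest)]


# Made-up rows, shaped like the result of
# SELECT * FROM user_activity WHERE userid IN ('12', '34', '56')
rows = [
    {"userid": "12", "ts": datetime(2017, 5, 1, 9, 0), "status": "login"},
    {"userid": "12", "ts": datetime(2017, 5, 1, 9, 30), "status": "logout"},
    {"userid": "34", "ts": datetime(2017, 5, 2, 8, 0), "status": "login"},
    {"userid": "56", "ts": datetime(2017, 5, 3, 7, 0), "status": "login"},
]
for row in latest_per_user(rows):
    print(row["userid"], row["status"])
```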

Cassandra Schema for a Chat Application

I have gone through this article, and here is the schema I got from it. It is helpful for maintaining users' statuses in my application, but how can I extend it to maintain a one-to-one chat archive and relations between users? By relations I mean that people belong to specific groups. I am new to this and need an approach.
Requirements:
I want to store messages between users in a table.
Whenever a user wants to load the messages from another user, I want to retrieve them and send them to the user.
I want to retrieve all the messages from different users to a user when that user requests them.
I also want to store classes of users. For example, user1 and user2 belong to "family", while user3, user4 and user1 belong to "friends", etc. The group name can be a custom name given by the user.
This is what I have tried so far:
CREATE TABLE chatarchive (
chat_id uuid PRIMARY KEY,
username text,
body text
)
CREATE TABLE chatseries (
username text,
time timeuuid,
chat_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC)
CREATE TABLE chattimeline (
to text,
username text,
time timeuuid,
chat_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC)
Below is the schema that I currently have:
CREATE TABLE users (
username text PRIMARY KEY,
password text
)
CREATE TABLE friends (
username text,
friend text,
since timestamp,
PRIMARY KEY (username, friend)
)
CREATE TABLE followers (
username text,
follower text,
since timestamp,
PRIMARY KEY (username, follower)
)
CREATE TABLE tweets (
tweet_id uuid PRIMARY KEY,
username text,
body text
)
CREATE TABLE userline (
username text,
time timeuuid,
tweet_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC)
CREATE TABLE timeline (
username text,
time timeuuid,
tweet_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC)
With C* you need to store data in the way you'll use it.
So let's see what this would look like in this case:
I want to store messages between users in a table.
Whenever a user wants to load the messages from another user, I want to retrieve them and send them to the user.
CREATE TABLE chat_messages (
message_id uuid,
from_user text,
to_user text,
body text,
class text,
time timeuuid,
PRIMARY KEY ((from_user, to_user), time)
) WITH CLUSTERING ORDER BY (time ASC);
This will allow you to retrieve a timeline of messages between two users. Note that a composite partition key is used so that a wide row is created for each pair of users.
SELECT * FROM chat_messages WHERE from_user = 'mike' AND to_user = 'john' ORDER BY time DESC ;
I want to retrieve all the messages from different users to a user when that user requests them.
CREATE INDEX chat_messages_to_user ON chat_messages (to_user);
This allows you to do:
SELECT * FROM chat_messages WHERE to_user = 'john';
I also want to store classes of users. For example, user1 and user2 belong to "family", while user3, user4 and user1 belong to "friends", etc. The group name can be a custom name given by the user.
CREATE INDEX chat_messages_class ON chat_messages (class);
This will allow you to do:
SELECT * FROM chat_messages WHERE class = 'family';
Note that in this kind of database, denormalized data is good practice. This means that repeating the class name in many rows is not a problem.
Also note that I haven't used a 'chat_id' or a 'chats' table. We could easily add them, but your use case, as described, doesn't seem to require them. In general, you cannot do joins in C*, so using a chat id would imply two queries.
EDIT: Secondary indexes are inefficient. A materialised view will be a better implementation with C* 3.0
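Besides a materialized view, the other common C* approach is to write each message once per query table, one table per read pattern. A toy Python model of that write path, assuming dicts and lists as stand-ins for partitions and clustering order; the table names messages_by_pair and messages_by_recipient are hypothetical.

```python
import uuid
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical query tables, one per read pattern. Lists keep write order,
# standing in for CLUSTERING ORDER BY (time ASC).
messages_by_pair: DefaultDict[Tuple[str, str], List[Dict]] = defaultdict(list)
messages_by_recipient: DefaultDict[str, List[Dict]] = defaultdict(list)


def send(from_user: str, to_user: str, body: str, cls: str) -> Dict:
    """Write the same message to every table that has to serve it."""
    msg = {
        "time": uuid.uuid1(),  # plays the role of the timeuuid column
        "from_user": from_user,
        "to_user": to_user,
        "body": body,
        "class": cls,
    }
    messages_by_pair[(from_user, to_user)].append(msg)
    messages_by_recipient[to_user].append(msg)
    return msg


def conversation(from_user: str, to_user: str) -> List[Dict]:
    """Like WHERE from_user = ? AND to_user = ? ORDER BY time DESC."""
    return list(reversed(messages_by_pair.get((from_user, to_user), [])))


def inbox(to_user: str) -> List[Dict]:
    """Like WHERE to_user = ?, served by its own table instead of an index."""
    return list(messages_by_recipient.get(to_user, []))


send("mike", "john", "hi", "friends")
send("anna", "john", "dinner?", "family")
send("mike", "john", "see you", "friends")
print([m["body"] for m in conversation("mike", "john")])  # ['see you', 'hi']
print([m["body"] for m in inbox("john")])  # ['hi', 'dinner?', 'see you']
```

The cost is one extra write per message; the benefit is that each read touches a single partition.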
There is a chat application created by Alan Chandler on GitHub that has the features you request:
MBchat
It uses two-phase authentication: the user is first validated in the forums and then validated against the chat database.
Here's the first validation part of the schema (schema located in inc/user.sql):
BEGIN;
CREATE TABLE users (
uid integer primary key autoincrement NOT NULL,
time bigint DEFAULT (strftime('%s','now')) NOT NULL,
name character varying NOT NULL,
role text NOT NULL DEFAULT 'R', -- A (CEO), L (DIRECTOR), G (DEPT HEAD), H (SPONSOR) R(REGULAR)
cap integer DEFAULT 0 NOT NULL, -- 1 = blind, 2 = committee secretary, 4 = admin, 8 = mod, 16 = speaker 32 = can't whisper( OR of capabilities).
password character varying NOT NULL, -- raw password
rooms character varying, -- a ":" separated list of rooms nos which define which rooms the user can go in
isguest boolean DEFAULT 0 NOT NULL
);
CREATE INDEX userindex ON users(name);
-- Below here you can add the specific users for your set up in the form of INSERT Statements
-- This list is test users to cover the complete range of functions. Note names are converted to lowercase, so only put lowercase names in here
INSERT INTO users(uid,name,role,cap,password,rooms,isguest) VALUES
(1,'alice','A',4,'password','7',0), -- CEO class user alice
(2,'bob','L',3,'password','8',0), -- DIRECTOR class user bob
(3,'carol','G',2,'password','7:8:9',0); -- DEPT HEAD class user carol
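Two columns in this users table pack several values into one field: cap is a bitwise OR of capability flags, and rooms is a ':'-separated list of room numbers. A Python sketch of decoding both; the flag names are labels taken from the column comments, and MBchat's own code may handle these fields differently.

```python
from enum import IntFlag
from typing import Optional


class Cap(IntFlag):
    """Capability bits, named after the comments on the cap column."""
    BLIND = 1
    COMMITTEE_SECRETARY = 2
    ADMIN = 4
    MOD = 8
    SPEAKER = 16
    CANT_WHISPER = 32


def has_cap(cap: int, flag: Cap) -> bool:
    """cap is the OR of capabilities, so a single bit is tested with AND."""
    return bool(cap & flag)


def can_enter(rooms: Optional[str], rid: int) -> bool:
    """True if rid is listed in a ':'-separated rooms value such as '7:8:9'."""
    if not rooms:
        return False
    # Splitting (rather than a substring test) keeps '17' from matching room 7.
    return str(rid) in rooms.split(":")


# Values from the test users above: bob has cap 3 (= 1 | 2).
print(has_cap(3, Cap.BLIND), has_cap(3, Cap.COMMITTEE_SECRETARY), has_cap(3, Cap.ADMIN))  # True True False
print(can_enter("7:8:9", 8), can_enter("7", 8))  # True False (carol, alice)
```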
And here's the second validation part of the schema (schema located in data/chat.sql):
CREATE TABLE users (
uid integer primary key NOT NULL,
time bigint DEFAULT (strftime('%s','now')) NOT NULL,
name character varying NOT NULL,
role char(1) NOT NULL default 'R',
rid integer NOT NULL default 0,
mod char(1) NOT NULL default 'N',
question character varying,
private integer NOT NULL default 0,
cap integer NOT NULL default 0,
rooms character varying
);
The following is the schema of the chat rooms, where you can see how room types relate to user classes, with examples:
CREATE TABLE rooms (
rid integer primary key NOT NULL,
name varchar(30) NOT NULL,
type integer NOT NULL -- 0 = Open, 1 = meeting, 2 = guests can't speak, 3 moderated, 4 members(adult) only, 5 guests(child) only, 6 creaky door
) ;
INSERT INTO rooms (rid, name, type) VALUES
(1, 'The Forum', 0),
(2, 'Operations Gallery', 2), -- Guests Can't Speak
(3, 'Dungeon Club', 6), -- creaky door
(4, 'Auditorium', 3), -- Moderated Room
(5, 'Blue Room', 4), -- Members Only (in Melinda's Backups this is Adults)
(6, 'Green Room', 5), -- Guest Only (in Melinda's Backups this is Juveniles AKA Baby Backups)
(7, 'The Board Room', 1); -- Various meeting rooms - need to be on users room list
Another table records the users' participation in conversations:
CREATE table wid_sequence ( value integer);
INSERT INTO wid_sequence (value) VALUES (1);
CREATE TABLE participant (
uid integer NOT NULL REFERENCES users (uid) ON DELETE CASCADE ON UPDATE CASCADE,
wid integer NOT NULL,
primary key (uid,wid)
);
And the archives are recorded as follows:
CREATE TABLE chat_log (
lid integer primary key,
time bigint DEFAULT (strftime('%s','now')) NOT NULL,
uid integer NOT NULL REFERENCES users (uid) ON DELETE CASCADE ON UPDATE CASCADE,
name character varying NOT NULL,
role char(1) NOT NULL,
rid integer NOT NULL,
type char(2) NOT NULL,
text character varying
);
Edit: However, this type of data modeling is not very suitable for Cassandra. Cassandra spreads your data across many machines, so joins are not available, and denormalizing the data is the practical choice. Check below for a denormalized version of the chat_log table:
CREATE TABLE chat_log (
lid uuid,
time timestamp,
sender text,
receiver text,
room text,
sender_role varchar,
receiver_role varchar,
rid decimal,
status varchar,
message text,
PRIMARY KEY ((sender, receiver, room), time)
-- PRIMARY KEY ((sender, receiver), time) if you don't want the messages to be separated by the rooms
) WITH CLUSTERING ORDER BY (time ASC);
Now, in order to retrieve data, you'd use the following queries:
Whenever a user wants to load the messages from another user, I want to retrieve them and send them to the user.
SELECT * FROM chat_log WHERE sender = 'bob';
I want to retrieve all the messages from different users to a user when that user requests them.
SELECT * FROM chat_log WHERE receiver = 'alice';
I want to store and retrieve classes of users.
SELECT * FROM chat_log WHERE sender_role = 'A'; -- messages sent by CEOs
SELECT * FROM chat_log WHERE receiver_role = 'A'; -- messages received by CEOs
Cassandra does not allow ORDER BY on queries served by a secondary index, so sort these results by time on the client.
After modeling the data, you'd need secondary indexes to serve these queries:
For retrieving the messages sent by a user, or sent to a user:
CREATE INDEX chat_log_sender ON chat_log (sender);
CREATE INDEX chat_log_receiver ON chat_log (receiver);
For retrieving messages by user class:
CREATE INDEX chat_log_sender_role ON chat_log (sender_role);
CREATE INDEX chat_log_receiver_role ON chat_log (receiver_role);
I believe these examples will give you the approach you need.
If you'd like to learn more about Cassandra data modeling, see:
Cassandra Data Modeling Best Practices, Part 1
Cassandra Data Modeling Best Practices, Part 2
Cassandra Data Modeling Best Practices Slide
Data Modeling Example
