Is there there any way to query on a SET type(or MAP/LIST) to find does it contain a value or not?
Something like this:
CREATE TABLE test.table_name(
id text,
ckk SET<INT>,
PRIMARY KEY((id))
);
Select * FROM table_name WHERE id = 1 AND ckk CONTAINS 4;
Is there any way to reach this query with YCQL api?
And can we use a SET type in SECONDRY INDEX?
Is there any way to reach this query with YCQL api?
YCQL does not support the CONTAINS keyword yet (feel free to open an issue for this on the YugabyteDB GitHub).
One workaround can be to use MAP<INT, BOOLEAN> instead of SET<INT> and the [] operator.
For instance:
CREATE TABLE test.table_name(
id text,
ckk MAP<int, boolean>,
PRIMARY KEY((id))
);
SELECT * FROM table_name WHERE id = 'foo' AND ckk[4] = true;
And can we use a SET type in SECONDRY INDEX?
Generally, collection types cannot be part of the primary key, or an index key.
However, "frozen" collections (i.e. collections serialized into a single value internally) can actually be part of either primary key or index key.
For instance:
CREATE TABLE table2(
id TEXT,
ckk FROZEN<SET<INT>>,
PRIMARY KEY((id))
) WITH transactions = {'enabled' : true};
CREATE INDEX table2_idx on table2(ckk);
Another option is to use with compound primary key and defining ckk as clustering key:
cqlsh> CREATE TABLE ybdemo.tt(id TEXT, ckk INT, PRIMARY KEY ((id), ckk)) WITH CLUSTERING ORDER BY (ckk DESC);
cqlsh> SELECT * FROM ybdemo.tt WHERE id='foo' AND ckk=4;
Through postgresql, postgis, pgrouting and nodejs I am working on a project which basically finds a path between shops.
There are three tables in my database
1.CREATE TABLE public."edges" (id int, name varchar(100), highway varchar(100), oneway varchar(100), surface varchar(100), the_geom geometry, source int, target int);
2.CREATE TABLE public."edges_noded" (id bigint, old_id int, sub_id int, source bigint, target bigint, the_geom geometry, name varchar(100), type varchar(100), distance double precision);
3.CREATE TABLE public."edges_noded_vertices_pgr" (id bigint, cnt int, chk int, ein int, eout int, the_geom geometry); –
And the query by which I am finding path
client.query( "WITH dijkstra AS (SELECT * FROM pgr_dijkstra('SELECT id,source,target,distance AS cost FROM
edges_noded',"+source+","+target+",FALSE)) SELECT seq, CASE WHEN
dijkstra.node = edges_noded.source THEN
ST_AsGeoJSON(edges_noded.the_geom) ELSE
ST_AsGeoJSON(ST_Reverse(edges_noded.the_geom)) END AS
route_geom_x,CASE WHEN dijkstra.node = edges_noded.source THEN
ST_AsGeoJSON(edges_noded.the_geom) ELSE
ST_AsGeoJSON(ST_Reverse(edges_noded.the_geom)) END AS route_geom_y
FROM dijkstra JOIN edges_noded ON(edge = id) ORDER BY
seq",(err,res)=>{ })
This query works for me but taking too much time for example, If I want to find a path between 30 shops then it is taking almost 25 to 30 sec which is too much.
After searching about this problem I found this link
https://gis.stackexchange.com/questions/16886/how-can-i-optimize-pgrouting-for-speed/16888
In this link Délawenis is saying that use a st_buffer so it doesn't get all ways, but just the "nearby" ways:
So I tried to apply st_buffer in above query but not got any success.
If someone has any idea plz help me with this problem.
If this approach is wrong please also tell me the right way.
I have following requirement of my dataset, need to unserstand what datatype should I use and how to save my data accordingly :-
CREATE TABLE events (
id text,
evntoverlap map<text, map<timestamp,int>>,
PRIMARY KEY (id)
)
evntoverlap = {
'Dig1': {{'2017-10-09 04:10:05', 0}},
'Dig2': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0}},
'Dig3': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0},{'2017-10-09 04:11:05', 0}}
}
This gives an error :-
Error from server: code=2200 [Invalid query] message="Non-frozen collections are not allowed inside collections: map<text, map<timestamp, int>>"
How should I store this type of data in single column . Please suggest datatype and insert command for the same.
Thanks,
There is limitation of Cassandra - you can't nest collection (or UDT) inside collection without making it frozen. So you need to "froze" one of the collections - either nested:
CREATE TABLE events (
id text,
evntoverlap map<text, frozen<map<timestamp,int>>>,
PRIMARY KEY (id)
);
or top-level:
CREATE TABLE events (
id text,
evntoverlap frozen<map<text, map<timestamp,int>>>,
PRIMARY KEY (id)
);
See documentation for more details.
CQL collections limited to 64kb, if putting things like maps in maps you might push that limit. Especially with frozen maps you are deserializing the entire map, modifying it, and re inserting. Might be better off with a
CREATE TABLE events (
id text,
evnt_key, text
value map<timestamp, int>,
PRIMARY KEY ((id), evnt_key)
)
Or even a
CREATE TABLE events (
id text,
evnt_key, text
evnt_time timestamp
value int,
PRIMARY KEY ((id), evnt_key, evnt_time)
)
It would be more efficient and safer while giving additional benefits like being able to order the event_time's in ascending or descending order.
my aim is to get the msgAddDate based on below query :
select max(msgAddDate)
from sampletable
where reportid = 1 and objectType = 'loan' and msgProcessed = 1;
Design 1 :
here the reportid, objectType and msgProcessed may not be unique. To add the uniqueness I have added msgAddDate and msgProcessedDate (an additional unique value).
I use this design because I don't perform range query.
Create table sampletable ( reportid INT,
objectType TEXT,
msgAddDate TIMESTAMP,
msgProcessed INT,
msgProcessedDate TIMESTAMP,
PRIMARY KEY ((reportid ,msgProcessed,objectType,msgAddDate,msgProcessedDate));
Design 2 :
create table sampletable (
reportid INT,
objectType TEXT,
msgAddDate TIMESTAMP,
msgProcessed INT,
msgProcessedDate TIMESTAMP,
PRIMARY KEY ((reportid ,msgProcessed,objectType),msgAddDate, msgProcessedDate))
);
Please advice which one to use and what will be the pros and cons between two based on performance.
Design 2 is the one you want.
In Design 1, the whole primary key is the partition key. Which means you need to provide all the attributes (which are: reportid, msgProcessed, objectType, msgAddDate, msgProcessedDate) to be able to query your data with a SELECT statement (which wouldn't be useful as you would not retrieve any additional attributes than the one you already provided in the WHERE statemenent)
In Design 2, your partition key is reportid ,msgProcessed,objectType which are the three attributes you want to query by. Great. msgAddDate is the first clustering column, which will be automatically sorted for you. So you don't even need to run a max since it is sorted. All you need to do is use LIMIT 1:
SELECT msgAddDate FROM sampletable WHERE reportid = 1 and objectType = 'loan' and msgProcessed = 1 LIMIT 1;
Of course, make sure to define a DESC sorted order on msgAddDate (I think by default it is ascending...)
Hope it helps!
I have gone though this article and here is the schema I have got from it. This is helpful for my application for maintaining statuses of a user, but how can I extend this to maintain one to one chat archive and relations between users, relations mean people belong to specific group for me. I am new to this and need an approach for this.
Requirements :
I want to store messages between user-user in a table.
Whenever a user want to load messages by a user. I want to retrieve them back and send it to user.
I want to retrieve all the messages from different users to the user when user has requested.
And also want to store class of users. I mean for example user1 and user2 belong to "family" user3, user4, user1 belong to friends etc... This group can be custom name given by the user.
This is what I have tried so far:
CREATE TABLE chatarchive (
chat_id uuid PRIMARY KEY,
username text,
body text
)
CREATE TABLE chatseries (
username text,
time timeuuid,
chat_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC)
CREATE TABLE chattimeline (
to text,
username text,
time timeuuid,
chat_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time ASC)
Below is the schema that I currently have:
CREATE TABLE users (
username text PRIMARY KEY,
password text
)
CREATE TABLE friends (
username text,
friend text,
since timestamp,
PRIMARY KEY (username, friend)
)
CREATE TABLE followers (
username text,
follower text,
since timestamp,
PRIMARY KEY (username, follower)
)
CREATE TABLE tweets (
tweet_id uuid PRIMARY KEY,
username text,
body text
)
CREATE TABLE userline (
username text,
time timeuuid,
tweet_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC)
CREATE TABLE timeline (
username text,
time timeuuid,
tweet_id uuid,
PRIMARY KEY (username, time)
) WITH CLUSTERING ORDER BY (time DESC)
With C* you need to store data in the way you'll use it.
So let's see how this would look like for this case:
I want to store messages between user-user in a table.
Whenever a user want to load messages by a user. I want to retrieve them back and send it to user.
CREATE TABLE chat_messages (
message_id uuid,
from_user text,
to_user text,
body text,
class text,
time timeuuid,
PRIMARY KEY ((from_user, to_user), time)
) WITH CLUSTERING ORDER BY (time ASC);
This will allow you to retrieve a timeline of messages between two users. Note that a composite primary key is used so that wide rows are created for each pair of users.
SELECT * FROM chat_messages WHERE from_user = 'mike' AND to_user = 'john' ORDER BY time DESC ;
I want to retrieve all the messages from different users to the user when user has requested.
CREATE INDEX chat_messages_to_user ON chat_messages (to_user);
This allows you to do:
SELECT * FROM chat_messages WHERE to_user = 'john';
And also want to store class of users. I mean for example user1 and user2 belong to "family" user3, user4, user1 belong to friends etc... This group can be custom name given by the user.
CREATE INDEX chat_messages_class ON chat_messages (class);
This will allow you to do:
SELECT * FROM chat_messages WHERE class = 'family';
Note that in this kind of database, DENORMALIZED DATA IS A GOOD PRACTICE. This means that using the name of the class again and again is not a bad practice.
Also note that I haven't used a 'chat_id' nor a 'chats' table. We could easily add this but I feel that your use case didn't require it as it has been put forward. In general, you cannot do joins in C*. So, using a chat id would imply two queries.
EDIT: Secondary indexes are inefficient. A materialised view will be a better implementation with C* 3.0
There is a chat application created by Alan Chandler on github that has the features you request:
MBchat
It uses a 2-phase authentication. First the user is validated in the forums and then, the user is validated on the chat database.
Here's the first validation part of the schema (schema located in inc/user.sql):
BEGIN;
CREATE TABLE users (
uid integer primary key autoincrement NOT NULL,
time bigint DEFAULT (strftime('%s','now')) NOT NULL,
name character varying NOT NULL,
role text NOT NULL DEFAULT 'R', -- A (CEO), L (DIRECTOR), G (DEPT HEAD), H (SPONSOR) R(REGULAR)
cap integer DEFAULT 0 NOT NULL, -- 1 = blind, 2 = committee secretary, 4 = admin, 8 = mod, 16 = speaker 32 = can't whisper( OR of capabilities).
password character varying NOT NULL, -- raw password
rooms character varying, -- a ":" separated list of rooms nos which define which rooms the user can go in
isguest boolean DEFAULT 0 NOT NULL
);
CREATE INDEX userindex ON users(name);
-- Below here you can add the specific users for your set up in the form of INSERT Statements
-- This list is test users to cover the complete range of functions. Note names are converted to lowercase, so only put lowercase names in here
INSERT INTO users(uid,name,role,cap,password,rooms,isguest) VALUES
(1,'alice','A',4,'password','7',0), -- CEO class user alice
(2,'bob','L',3,'password','8',0), -- DIRECTOR class user bob
(3,'carol','G',2,'password','7:8:9',0), -- DEPT HEAD class user carol
And here's the second validation part of the schema (schema located in data/chat.sql):
CREATE TABLE users (
uid integer primary key NOT NULL,
time bigint DEFAULT (strftime('%s','now')) NOT NULL,
name character varying NOT NULL,
role char(1) NOT NULL default 'R',
rid integer NOT NULL default 0,
mod char(1) NOT NULL default 'N',
question character varying,
private integer NOT NULL default 0,
cap integer NOT NULL default 0,
rooms character_varying
);
The following is the schema of the chat rooms you can see the user classes and the examples of it:
CREATE TABLE rooms (
rid integer primary key NOT NULL,
name varchar(30) NOT NULL,
type integer NOT NULL -- 0 = Open, 1 = meeting, 2 = guests can't speak, 3 moderated, 4 members(adult) only, 5 guests(child) only, 6 creaky door
) ;
INSERT INTO rooms (rid, name, type) VALUES
(1, 'The Forum', 0),
(2, 'Operations Gallery', 2), -- Guests Can't Speak
(3, 'Dungeon Club', 6), -- creaky door
(4, 'Auditorium', 3), -- Moderated Room
(5, 'Blue Room', 4), -- Members Only (in Melinda's Backups this is Adults)
(6, 'Green Room', 5), -- Guest Only (in Melinda's Backups this is Juveniles AKA Baby Backups)
(7, 'The Board Room', 1), -- Various meeting rooms - need to be on users room list
The users have another table to indicate the participation of the conversation:
CREATE table wid_sequence ( value integer);
INSERT INTO wid_sequence (value) VALUES (1);
CREATE TABLE participant (
uid integer NOT NULL REFERENCES users (uid) ON DELETE CASCADE ON UPDATE CASCADE,
wid integer NOT NULL,
primary key (uid,wid)
);
And the archives are recorded as follows:
CREATE TABLE chat_log (
lid integer primary key,
time bigint DEFAULT (strftime('%s','now')) NOT NULL,
uid integer NOT NULL REFERENCES user (uid) ON DELETE CASCADE ON UPDATE CASCADE,
name character varying NOT NULL,
role char(1) NOT NULL,
rid integer NOT NULL,
type char(2) NOT NULL,
text character varying
);
Edit: However this type of data modeling is not very suitable for Cassandra. Because, in Cassandra your data does not fit on one machine so joins are not available. So, in Cassandra denormalizing data is the practical choice. Check below for the denormalized version of chat_log table:
CREATE TABLE chat_log (
lid uuid,
time timestamp,
sender text NOT NULL,
receiver text NOT NULL,
room text NOT NULL,
sender_role varchar NOT NULL,
receiver_role varchar NOT NULL,
rid decimal NOT NULL,
status varchar NOT NULL,
message text,
PRIMARY KEY (sender, receiver, room)
-- PRIMARY KEY (sender, receiver) if you don't want the messages to be separated by the rooms
) WITH CLUSTERING ORDER BY (time ASC);
Now in order to retrieve data you'd use the following queries:
Whenever a user want to load messages by a user. I want to retrieve them back and send it to user.
SELECT * FROM chat_log WHERE sender = 'bob' ORDER BY time ASC
I want to retrieve all the messages from different users to the user when user has requested.
SELECT * FROM chat_log WHERE receiver = 'alice' ORDER BY time ASC
I want to store and retrieve class of users.
SELECT * FROM chat_log WHERE sender_role = 'A' ORDER BY time ASC -- messages sent by CEOs
SELECT * FROM chat_log WHERE receiver_role = 'A' ORDER BY time ASC -- messages received by CEOs
After modeling the data. You'd need to create indexes for quick and efficient querying as follows:
For retrieving all messages from different users to the user efficiently
CREATE INDEX chat_log_uid ON chat_log (sender);
CREATE INDEX chat_log_uid ON chat_log (receiver);
For retrieving all messages from user classes efficiently
CREATE INDEX chat_log_class ON chat_log (sender_role);
CREATE INDEX chat_log_class ON chat_log (receiver_role);
I believe these examples will give you the approach you need.
If you'd like to learn more about Cassandra data modeling you can check down below:
Cassandra Data Modeling Best Practices, Part 1
Cassandra Data Modeling Best Practices, Part 2
Cassandra Data Modeling Best Practices Slide
Data Modeling Example