Hard time understanding Cassandra query - cassandra

In Cassandra, I understand that tables are supposed to be created according to what needs to be queried. For example, I currently have a Users and Users_By_Status table.
##Users##
CREATE TABLE Users (
user_id uuid,
name text,
password text,
status int,
username text,
PRIMARY KEY (user_id)
);
CREATE INDEX user_username_idx ON Users (username);
##Users_By_Status##
CREATE TABLE Users_By_Status (
username text,
status int,
user_id uuid,
PRIMARY KEY (username, status, user_id)
);
In this case, if a user leaves, their record won't be deleted. Instead, status will be changed from 1 to 0.
If I insert data into the Users table, do I need to manually insert the data into Users_By_Status table too? What happens if I update the status in Users? Do I need to manually update the record in Users_By_Status table too?
I have a feeling I'm understanding Cassandra wrongly. Appreciate all the help I can get.

Short answer: yes, in your case you need to maintain Users_By_Status manually, including deleting the old row when the status changes.
In Cassandra you need to write more code in your application layer to handle scenarios like that.
There are other options, though, such as materialized views or BATCH statements.
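For your case, I think a materialized view is the best option. You can create one from your Users table, replacing the hand-maintained table of the same name. Keep in mind that a view's primary key must contain every primary key column of the base table plus at most one other column, and that each key column needs an IS NOT NULL restriction, so the view below is keyed by status and user_id:
CREATE MATERIALIZED VIEW Users_By_Status AS
SELECT username, status, user_id
FROM Users
WHERE status IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (status, user_id);
And yes, when you update the Users table, the change is applied to the materialized view Users_By_Status too.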
Reference: https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useCreateMV.html

Do I need to manually update the record in Users_By_Status table too?
So CoutinhoBR alluded to it, but I'll come right out and say it. You cannot update primary key values in Cassandra. So that's where a DELETE is required to get the old status value out of there, and then a write for the new one.
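A minimal sketch of that delete-then-write, assuming the Users and Users_By_Status tables from the question. The uuid and the username 'andreas' are placeholder values, not data from the original post. A logged batch is used here so that all three statements are eventually applied together; it does not provide isolation.

```cql
-- Placeholder values: the uuid and username are made up for illustration.
BEGIN BATCH
  -- Flip the status in the base table.
  UPDATE Users SET status = 0
  WHERE user_id = 77059e45-5fac-460b-9c4f-47528c292be0;

  -- status is part of the primary key in Users_By_Status, so the old row
  -- has to be removed explicitly.
  DELETE FROM Users_By_Status
  WHERE username = 'andreas' AND status = 1
    AND user_id = 77059e45-5fac-460b-9c4f-47528c292be0;

  -- Then the row is written again under the new status.
  INSERT INTO Users_By_Status (username, status, user_id)
  VALUES ('andreas', 0, 77059e45-5fac-460b-9c4f-47528c292be0);
APPLY BATCH;
```

The application has to know the old status value (1 here) to address the row being deleted, which usually means reading it from Users first.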

Related

Set Kentico primary key value when inserting TreeNode

With Kentico 13, I'm looking for a way to specify the primary key value when inserting a TreeNode via API. Something like:
var node = TreeNode.New("MyPageType");
node.SetValue("MyPageTypeID", 1234);
node.Insert(parentNode);
This needs to set the primary key in the MyPageType table, so it needs SQL identity insert turned on, and it also needs to set the DocumentForeignKeyValue in the CMS_Document table.
The only way I have thought of doing it is with some custom SQL after the node is created, but that feels like a hack. Is there a better way?
This is for a content migration task of thousands of documents. After the content migration the default SQL & primary key behavior will be used.
In case anyone finds this, the solution I came up with was to run the content migration script with the old primary key value in a temporary column. After migration I ran SQL to update Kentico references to the old primary key, remove the old primary key, and change the primary key to the temporary column. A bit nasty, but got the job done.

Cassandra Nested Data Model/Structure

I have a problem with the nested data model in my project.
Below are the scripts that create the table and the user-defined types.
// main table for keeping journey information
CREATE TABLE journey (
journeyid uuid,
journeyname text,
createdate timestamp,
journeyassetdetail LIST<FROZEN<assettype>>, // this is materials for journey
journeylist LIST<FROZEN<subjourneylist>>, // any journey can be a sub journey in other journey (a journey can have one or more sub-journey)
PRIMARY KEY (journeyid)
);
CREATE TYPE subjourneylist (
action FROZEN<actions>,
product FROZEN<products>,
suborderjourney int,
subjourneyid uuid,
createdate timestamp
);
CREATE TYPE assettype (
type text,
file LIST<FROZEN<file>>
);
CREATE TYPE file (
assetfileid uuid,
filename text,
url text
);
As you can see, there are 2 UDTs on my journey table (assettype and subjourneylist), which means a journey row can hold one or many sub-journeys and asset details. I designed the data model like this because I am concerned about READ performance: my developers need to get everything in a single round trip to the database.
But looking at UPDATE, the problem is that when I need to update something in any asset or sub-journey, I have to apply the updated data to the main journey table, and we don't know an easy way to do that.
Right now, I have to use other tools or a self-developed program to prepare a script that re-creates the whole journey.
Do you have any suggestions for my data model, or do I have to consider another data model to support both my reads and writes?
Please feel free to give me an example or any suggestions.
Thank you very much.

Can we add primary key to collection datatypes?

When I tried to query a table using the CONTAINS keyword, it reported "Cannot use CONTAINS relation on non collection column col1", but when I tried to create a table using
CREATE TABLE test (id int,address map<text, int>,mail list<text>,phone set<int>,primary key (id,address,mail,phone));
it reported "Invalid collection type for PRIMARY KEY component phone"
One of the basics in Cassandra is that you can't modify primary keys. Always keep that in mind.
You can't use a collection in a primary key unless it is frozen, meaning you can't modify it.
This will work:
CREATE TABLE test (id int,address frozen<map<text, int>>,mail frozen<list<text>>,phone frozen<set<int>>,primary key (id,address,mail,phone));
However, I think you should take a look at this document: http://www.datastax.com/dev/blog/cql-in-2-1
Since Cassandra 2.1 you can put secondary indexes on collections. You may want to use that functionality.
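As a rough sketch of that alternative, assuming the collections stay out of the primary key and remain unfrozen: the index names and the literal values below are hypothetical, chosen only for illustration.

```cql
-- Collections are regular (non-key, non-frozen) columns here.
CREATE TABLE test (
  id int PRIMARY KEY,
  address map<text, int>,
  mail list<text>,
  phone set<int>
);

-- Index the elements of the list and the set, and the keys of the map.
CREATE INDEX test_mail_idx ON test (mail);
CREATE INDEX test_phone_idx ON test (phone);
CREATE INDEX test_address_keys_idx ON test (KEYS(address));

-- CONTAINS matches list/set elements; CONTAINS KEY matches map keys.
SELECT * FROM test WHERE mail CONTAINS 'someone@example.org';
SELECT * FROM test WHERE phone CONTAINS 5551234;
SELECT * FROM test WHERE address CONTAINS KEY 'home';
```

With this layout the collections can still be modified after the row is written, which is not possible when they are frozen into the primary key.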

Data modelling for consistent secondary keys with Cassandra

With Cassandra,
I want to represent each user object with a unique uuid, but also keep a set of zero or more secondary user keys that map to a user. Each secondary key should map to one and only one user (id). Because I need to be able to quickly look up a user by secondary key, I maintain a separate lookup table instead of a secondary INDEX.
I've modelled the data like this, but I am open to alternatives:
CREATE TABLE users (
userid uuid PRIMARY KEY,
name text,
secondarykeys set<text>
);
CREATE TABLE user_secondarykeys (
secondarykey text,
userid uuid,
PRIMARY KEY(secondarykey)
);
A typical use case is this:
I have a user with the secondary key mail:andreas#example.org, and I would like to see whether any user with that secondary key exists; if none exists, I would like to create a new user object.
I can look for the secondary key:
SELECT * FROM "user_secondarykeys" WHERE secondarykey = 'mail:andreas#example.org';
and if I do not find any matches, I can insert a new user:
BEGIN BATCH
INSERT INTO users (userid, name, secondarykeys) VALUES (77059e45-5fac-460b-9c4f-47528c292be0, 'Andreas', {'mail:andreas#example.org'});
INSERT INTO user_secondarykeys (secondarykey, userid) VALUES ('mail:andreas#example.org', 77059e45-5fac-460b-9c4f-47528c292be0);
APPLY BATCH;
My problem is that this can lead to inconsistent data, because another user can be inserted with that secondary key between my select and my inserts.
I'm thinking that if I can make my INSERT transaction fail if the secondary key already exists in user_secondarykeys, that would work, because it should then also revert the insert into the users table, because of the atomic property of the transaction. However, I do not know any way to make the INSERT fail if the secondary key exists. If I add IF NOT EXISTS to the second insert, it will not revert the transaction; it will just avoid inserting into user_secondarykeys, but it will still insert into users.
Any suggestions on how to implement this use case in a reliable way are appreciated. Thanks.
First of all, I think your model is pretty complicated, and I'm not sure I correctly understand all of your requirements.
If you get the secondary key first and then have to decide whether to add a user or not, the following will work for you.
Instead of checking the user_secondarykeys table with a SELECT statement for a particular secondary key, go with the following:
INSERT INTO user_secondarykeys (secondarykey, userid) VALUES ('mail:andreas#example.org', 77059e45-5fac-460b-9c4f-47528c292be0) IF NOT EXISTS;
If it applies, this secondary key is not yet connected to any user, so there are two cases: the user doesn't exist, or the user exists and someone wants to add a new secondary key for them. The following will do the job in both cases:
UPDATE users SET name = 'Andreas', secondarykeys = secondarykeys + {'mail:andreas#example.org'} WHERE userid = 77059e45-5fac-460b-9c4f-47528c292be0;
Because an UPDATE in Cassandra is an upsert and writes are idempotent (except for counters), this works even if there is already a user with that id in the users table: it just adds another secondary key for them.
The advantage of this solution is that it removes the gap in time that can make your data inconsistent. You have a guarantee that no one will insert two users with the same secondary key. You specified that a user can have no secondary keys at all; in that situation you can add them straight to the users table.
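Put together, the two steps might look like the sketch below. It assumes the users and user_secondarykeys tables from the question; the check of the [applied] column happens in application code, which is not shown.

```cql
-- Step 1: try to claim the secondary key. Only one concurrent writer can win.
INSERT INTO user_secondarykeys (secondarykey, userid)
VALUES ('mail:andreas#example.org', 77059e45-5fac-460b-9c4f-47528c292be0)
IF NOT EXISTS;
-- The result row has an [applied] column. When it is false, the row also
-- carries the existing values, i.e. the userid that already owns the key.

-- Step 2: run this only when [applied] was true.
UPDATE users
SET name = 'Andreas',
    secondarykeys = secondarykeys + {'mail:andreas#example.org'}
WHERE userid = 77059e45-5fac-460b-9c4f-47528c292be0;
```

If the client fails between the two steps, the key stays claimed without a matching entry in users. Since step 2 is idempotent, retrying it with the same userid is safe.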
I'm thinking that if I can make my INSERT transaction fail if the secondary key already exists in user_secondarykeys, that would work, because it should then also revert the insert into the users table, because of the atomic property of the transaction. However, I do not know any way to make the INSERT fail if the secondary key exists. If I add IF NOT EXISTS to the second insert, it will not revert the transaction; it will just avoid inserting into user_secondarykeys, but it will still insert into users.
Since Cassandra 2.0.6 you can use conditional statements inside a batch, and if any of the conditions is not met, none of the statements in that batch is applied. This sounds great, but there is a limitation: all of the statements inside the batch have to operate on a single partition. Consequently, it is impossible to do a cross-partition or cross-table conditional insert/update/delete. So in your case this:
BEGIN BATCH
INSERT INTO users (userid, name, secondarykeys) VALUES (77059e45-5fac-460b-9c4f-47528c292be0, 'Andreas', {'mail:andreas#example.org'});
INSERT INTO user_secondarykeys (secondarykey, userid) VALUES ('mail:andreas#example.org', 77059e45-5fac-460b-9c4f-47528c292be0) IF NOT EXISTS;
APPLY BATCH;
would not even pass query validation, because it tries to operate on two different tables.
I'm not sure whether this suits your other requirements; I would need more information about your queries and the velocity/volume of the data. There are certainly other ways to model this.
It would greatly simplify the problem if every user had to have at least one secondary key (e.g. email would be a great unique key for your users table), but those are your requirements, so unless you can change them there is nothing to discuss.
Hope this will help you a bit.
Good luck!

Cassandra secondary index using collection type

Here is a cassandra table:
CREATE TABLE Account(
id uuid,
userRef uuid,
name map<text, text>,
dataStatus text,
dataVisibility text,
...
PRIMARY KEY( id, dataStatus, dataVisibility, userRef)
);
CREATE INDEX idx_xxx_account_name ON Account (name);
'name' is a cql3 column of (collection) type 'map'. My question is: is it possible to create secondary index on a map type, i.e., name?
Thanks.
As of Cassandra 1.2.6, custom indexes on collections are supported.
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-1.2.6
Your question is quite old. In Cassandra 2.1 the valid syntax is
CREATE INDEX on Account(keys(name));
No response? I have decided to rewrite the table as follows:
CREATE TABLE Account(
id uuid,
userRef uuid,
main_name text,
other_name map<text, text>,
dataStatus text,
dataVisibility text,
...
PRIMARY KEY( id, dataStatus, dataVisibility, userRef)
);
CREATE INDEX idx_xxx_account_name ON Account (main_name);
*_name could be anything, e.g., email, phone, etc. For example, main_name could be mandatory, whereas other_name could be optional.
Anyway, now I can index main_name as a 'text' type instead of the map of text values.
To answer your initial question:
There is no support for secondary indexes on collections yet. Concretely, you could associate a set of tags to a user, but you cannot automatically index users by their tags yet. Adding that support is definitively on the roadmap but remains to be implemented.
Coming in 1.2: Collections support in CQL3
Also, I don't quite see why you use a map? Why not a simple set or list? Have a look at the reference provided below.
CREATE INDEX idx_name ON Account (ENTRIES(name));
This lets you query for rows that have a particular entry (key and value) in the map.
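A short sketch of how such an index might be queried, assuming a Cassandra release that supports ENTRIES indexes. The index name, the map key 'first', and the value 'Alice' are hypothetical.

```cql
-- Index every (key, value) pair of the map column.
CREATE INDEX idx_account_name_entries ON Account (ENTRIES(name));

-- Find rows whose map contains the entry 'first' -> 'Alice'.
SELECT * FROM Account WHERE name['first'] = 'Alice';
```

A KEYS(name) index, by contrast, is queried with CONTAINS KEY and matches on the map key alone.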
