Is it possible to write a comment for a UDT in Cassandra?

In Cassandra, for tables we can write a comment as follows:
CREATE TABLE company.address(
id int PRIMARY KEY,
street text,
...
) WITH COMMENT = 'Table containing the address of company
id - unique identifier of a company,
street - street of the company';
But for a UDT (user-defined type), I can't find a way to write a comment that describes each field of the UDT. Is that possible in Cassandra?

Comments on columns are not possible in Cassandra 3.x (the latest available version at the time of writing).
There is a Jira ticket for this: CASSANDRA-9836.
For now, the best bet is to use self-explanatory column names.

Related

In Cassandra, why dropping a column from tables defined with compact storage not allowed?

According to the DataStax documentation, we cannot drop a column from a table defined with the COMPACT STORAGE option. What is the reason for this?
This goes back to the original implementation of CQL3, and changes which were made to allow it to abstract a "SQL-like," wide-row structure on top of the original Thrift-based storage engine. Ultimately, managing the schema comes down to whether or not the underlying structure is a table or a column_family.
As an example, I'll create two tables using an old install of Apache Cassandra (2.1.19):
CREATE TABLE student (
studentid TEXT PRIMARY KEY,
fname TEXT,
lname TEXT);
CREATE TABLE studentcomp (
studentid TEXT PRIMARY KEY,
fname TEXT,
lname TEXT)
WITH COMPACT STORAGE;
I'll insert one row into each table:
INSERT INTO student (studentid, fname, lname) VALUES ('janderson','Jordy','Anderson');
INSERT INTO studentcomp (studentid, fname, lname) VALUES ('janderson','Jordy','Anderson');
And then I'll look at the tables with the old cassandra-cli tool:
[default@stackoverflow] list student;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: janderson
=> (name=, value=, timestamp=1599248215128672)
=> (name=fname, value=4a6f726479, timestamp=1599248215128672)
=> (name=lname, value=416e646572736f6e, timestamp=1599248215128672)
[default@stackoverflow] list studentcomp;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: janderson
=> (name=fname, value=Jordy, timestamp=1599248302715066)
=> (name=lname, value=Anderson, timestamp=1599248302715066)
Do you see the empty/"ghost" column value in the first result? That empty column value was CQL3's link between the column values and the table's meta data. If it's not there, then CQL cannot be used to manage a table's columns.
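The hex strings in the first listing are the UTF-8 bytes of the same values that the second listing shows as plain text. A quick way to check this in Python (the helper name decode_cell is my own, and it assumes the columns hold text):

```python
# Hex-encoded cell values copied from the cassandra-cli listing of the CQL3 table.
cells = {
    "fname": "4a6f726479",
    "lname": "416e646572736f6e",
}


def decode_cell(hex_value: str) -> str:
    """Decode a hex string as UTF-8 text (assumes a text-typed column)."""
    return bytes.fromhex(hex_value).decode("utf-8")


decoded = {name: decode_cell(value) for name, value in cells.items()}
print(decoded)
```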
The comparator used for type conversion was all that was really exposed via Thrift. This lack of meta data control/exposure is what allowed Cassandra to be considered "schemaless" in the pre-CQL days. If I run a describe studentcomp from within the cassandra-cli, I can see the comparators (validation class) used:
Column Metadata:
Column Name: lname
Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Column Name: fname
Validation Class: org.apache.cassandra.db.marshal.UTF8Type
But if I try describe student, I see this:
WARNING: CQL3 tables are intentionally omitted from 'describe' output.
See https://issues.apache.org/jira/browse/CASSANDRA-4377 for details.
Sorry, no Keyspace nor (non-CQL3) ColumnFamily was found with name: student (if this is a CQL3 table, you should use cqlsh instead)
Basically, tables and column families were different entities forced into the same bucket. Adding WITH COMPACT STORAGE essentially made a table a column family.
With that came the lack of any schema management (adding or removing columns), outside of access to the comparators.
Edit 20200905
Can we somehow / someway (hack) drop the columns from table?
You might be able to accomplish this. Sylvain Lebresne wrote A Thrift to CQL3 Upgrade Guide which will have some necessary details for you. I also advise reading through the Jira ticket mentioned above (CASSANDRA-4377), as that covers many of the in-depth technical challenges that make this difficult.

How to create an efficient Cassandra Data model?

I'm new to Cassandra and trying to create an application in which I have an entity 'student' consisting of 4 columns, as given below:
student_id
student_name
dob
course_name
create table student(student_id uuid, student_name text, dob date, course_name text, PRIMARY KEY(student_id));
I have to search students by course_name. According to Cassandra data modeling, to search students by course name I need to create another table, student_by_course_name, which consists of two columns:
course_name
student_id
where course_name will be the partition key and student_id will be the clustering key, as given below:
create table student_by_course_name(course_name text, student_id uuid, PRIMARY KEY(course_name, student_id));
The problem arises when a student changes course. I want to update the course name in the student_by_course_name table, but it throws an error because the course_name column is a partition key. How can I resolve this? Or please tell me if I'm using Cassandra data modeling wrongly.
In this case you have to delete the old entry first and then add a new entry to student_by_course_name with the new course.
Your model looks good.
The best way is indeed as Alex suggested: delete and then insert.
There are a couple of problems that you might need to be aware of.
If your courses have a LOT of students, this will generate big partitions (for this specific case it might not be an issue).
Deleting entries creates tombstones, so you should be prepared to handle them (e.g. use a low gc_grace_seconds; if you think a lot will be generated, set unchecked_tombstone_compaction in the table's compaction options).
Cassandra isn't the best for deleting data or updating data in-place. I believe that you have to use a batch statement to keep the tables in sync.
You can take two approaches. The first would be to delete the existing student ID/course name combination. This will create a tombstone, but if it doesn't happen often, it won't be a big deal. The second option would be to use the original table and create a secondary index on course name. This allows the course name to be both updated and queried by, but it may not perform well over time.
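A minimal sketch of the delete-then-insert approach, written as a plain Python helper that only builds the statements. The function name and the %s placeholder style are assumptions for illustration; a real application would hand these to its driver, possibly inside a batch as one of the answers above suggests.

```python
from typing import List, Tuple
from uuid import UUID, uuid4

# A statement is a CQL string plus the values bound to its placeholders.
Statement = Tuple[str, tuple]


def change_course_statements(
    student_id: UUID, old_course: str, new_course: str
) -> List[Statement]:
    """Build the statements needed to move a student to another course."""
    return [
        # The partition key cannot be updated, so remove the old entry...
        (
            "DELETE FROM student_by_course_name "
            "WHERE course_name = %s AND student_id = %s",
            (old_course, student_id),
        ),
        # ...and write a new entry under the new partition.
        (
            "INSERT INTO student_by_course_name (course_name, student_id) "
            "VALUES (%s, %s)",
            (new_course, student_id),
        ),
        # Keep the main table in sync with the lookup table.
        (
            "UPDATE student SET course_name = %s WHERE student_id = %s",
            (new_course, student_id),
        ),
    ]


sid = uuid4()
stmts = change_course_statements(sid, "math", "physics")
for cql, params in stmts:
    print(cql, params)
```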

Understanding Cassandra Data Model

I have recently started learning No-SQL and Cassandra through this article. The author explains the data model through this diagram:
The author also gives the below column family example:
Book {
key: 9352130677 { name: "Hadoop The Definitive Guide", author: "Tom White", publisher: "Oreilly", priceInr: 650, category: "hadoop", edition: 4 },
key: 8177228137 { name: "Hadoop in Action", author: "Chuck Lam", publisher: "manning", priceInr: 590, category: "hadoop" },
key: 8177228137 { name: "Cassandra: The Definitive Guide", author: "Eben Hewitt", publisher: "Oreilly", priceInr: 600, category: "cassandra" },
}
But in that tutorial and every other tutorial I have gone through, they end up creating regular tables in Cassandra. I am unable to connect the Cassandra model with what I am creating.
For example, I created a column family called Employee as below:
create columnfamily Employee(empid int primary key,empName text,age int);
Now I inserted some data and my column family looks as this:
To me this looks like a regular relational table and not like the data model the author has explained. How do I create an Employee column family where each row represents an employee with different attributes? Something like:
Employee{
101:{name:Emp1,age:20}
102:{name:Emp2,salary:1000}
102:{manager_name:Emp3,age:45}
}
You need to understand that in the CQL representation it may look like a regular relational table, but the internal structure of the rows in Cassandra is completely different. It saves a different set of attributes for each employee, and the nulls you see while querying with CQL are just a representation of empty/nonexistent cells.
What you are trying to achieve is an unstructured data model. Cassandra started with this model, and everything worked as described in the tutorial you've read, but there is an opinion that unstructured data design is unhealthy for development and creates more problems than it solves. So, after some time, Cassandra moved to a "structured" data model (and from Thrift to CQL). It doesn't mean that you have to store all attributes for all keys/rows, and it doesn't mean that all rows have the same number of attributes; it just means that you have to declare attributes before you use them.
You can achieve some kind of unstructured data modeling using the Map, List and Set data types, UDTs (user-defined types), or by saving your data as a JSON string and parsing it on the application side.
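To make the point about nulls concrete, here is a small Python model of the idea: each row stores only the cells that were written, and the tabular view fills in None for the rest. The data and the function name are hypothetical, and this is only an illustration of the concept, not of Cassandra's actual storage format.

```python
# Internal view: each row key maps only to the cells that were actually written.
stored_rows = {
    101: {"empname": "Emp1", "age": 20},
    102: {"empname": "Emp2"},
    103: {"age": 45},
}

# Columns declared in the (hypothetical) table definition.
declared_columns = ["empname", "age"]


def as_result_set(rows, columns):
    """Render sparse rows as a table, using None for cells that do not exist."""
    return [
        (key, *(cells.get(col) for col in columns))
        for key, cells in sorted(rows.items())
    ]


result = as_result_set(stored_rows, declared_columns)
for row in result:
    print(row)
```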
What you have understood is correct. Just believe it. Internally cassandra stores columns exactly like the image in your question.
Now, what you are expecting is to insert a column that was not defined when creating the Employee table. For dynamic columns, you can always use the Map data type.
For example
create table Employee(
empid int primary key,
empName text,
age int,
attributes Map<text,text>);
To add new attributes you can use below queries.
UPDATE Employee SET attributes = attributes + { 'manager_name' : 'Emp3', 'age' : '45' } WHERE empid = 102;
Update -
Another way to create a dynamic column model is as below:
create table Employee(
empid int,
empName text,
attribute text,
attributevalue text,
primary key (empid,empName,attribute)
);
Let's take a few inserts:
insert into Employee (empid,empName,attribute,attributevalue) values (102,'Emp1','age','25') ;
insert into Employee (empid,empName,attribute,attributevalue) values (102,'Emp1','manager','emp2') ;
insert into Employee (empid,empName,attribute,attributevalue) values (102,'Emp1','department','hr') ;
This data structure creates a wide row and behaves like dynamic columns. You can see that the primary key parts empid and empName are common to all three rows; only the attribute and its value change.
Hope this will help
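The wide-row idea from the inserts above can be modeled in a few lines of Python: rows that share the partition key (empid) end up in one partition, and the clustering columns identify each cell within it. The function name is my own, and this is a conceptual sketch rather than Cassandra's real on-disk layout.

```python
from collections import defaultdict

# The three rows inserted above: (empid, empName, attribute, attributevalue).
rows = [
    (102, "Emp1", "age", "25"),
    (102, "Emp1", "manager", "emp2"),
    (102, "Emp1", "department", "hr"),
]


def group_by_partition(cql_rows):
    """Group rows by partition key (empid); clustering columns key each cell."""
    partitions = defaultdict(dict)
    # Sorting mimics cells being ordered by clustering columns in a partition.
    for empid, emp_name, attribute, value in sorted(cql_rows):
        partitions[empid][(emp_name, attribute)] = value
    return dict(partitions)


wide_rows = group_by_partition(rows)
print(wide_rows)
```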
Cassandra uses a special kind of primary key called a composite key. Part of it, the partition key, identifies the partition and is used to determine the nodes on which the rows are stored. This is also one reason why Cassandra scales well.
The result in your console may be a result set of rows, but the internal organization of Cassandra is different from that. Have you ever tried to query a table without specifying the primary key? You will quickly see that you can't query that flexibly (because of the partitioning).
After that you will understand why we have to use a query-first design approach for Cassandra. This is completely different from an RDBMS.

Text based selection of records using CQL

I have a text field like 'address' in my Cassandra table. I want to search records based on a piece of text from the 'address' field, like a city or street name.
For example: I have an address like 'House No. 18, Shehzad Colony, M.D.A. Chowk Lahore'. Here I want to search for records having the partial string 'M.D.A. Chowk Lahore' in the address field.
How can I do this using the CQL shell? Can anyone guide me?
Thanks.
There really isn't a way to do this out-of-the-box. In Cassandra, you need to design your tables to fit your query patterns. So if searching for addresses by city (or whatever) is a pattern you need to support, then there are a couple of ways to do this.
You can create a new query table, and partition by city:
CREATE TABLE userAddressesByCity (
userID uuid,
firstName text,
lastName text,
street text,
city text,
province text,
postalCode text,
PRIMARY KEY (city,userID));
This table structure would support querying by city as a partition key, and it also has userID as a clustering key to ensure uniqueness.
If you're working with addresses, a useful technique is to create a User Defined Type (UDT). UDTs are useful if you want to store a user's address in a single column. But you would still want to create a table specifically-designed to serve a query by whichever column you require.
Note: You could try one table and create a secondary index on one of the columns, but secondary indexes perform poorly at scale, so I don't recommend that.
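As a rough mental model of the query table above, think of it as a lookup keyed first by city and then by userID. The Python sketch below uses made-up sample data and a hypothetical helper name; it only illustrates why a query by city becomes a direct lookup once the data is organized that way.

```python
from collections import defaultdict

# Hypothetical sample data standing in for rows of the query table.
users = [
    {"user_id": 1, "street": "House No. 18, Shehzad Colony", "city": "Lahore"},
    {"user_id": 2, "street": "12 Mall Road", "city": "Lahore"},
    {"user_id": 3, "street": "5 Clifton Block 2", "city": "Karachi"},
]


def build_by_city(rows):
    """Index rows by city (partition key), then by user_id (clustering key)."""
    by_city = defaultdict(dict)
    for row in rows:
        by_city[row["city"]][row["user_id"]] = row
    return by_city


addresses_by_city = build_by_city(users)

# Querying by city touches one "partition" instead of scanning every row.
lahore_ids = sorted(addresses_by_city["Lahore"])
print(lahore_ids)
```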

Cassandra DataModel Designing, Composite Key vs Super Column

While designing a data model in Cassandra, I am stuck on the scenario below.
One API/web service can have multiple parameters (input/output). I don't know the number of parameters or their column names in advance.
How should I design the Cassandra data model for this? I am aware that super columns are not good to use and that the good alternative is composite keys. But in my scenario I don't have fixed column names or a fixed count that I could specify as composite keys.
Please see the pic below, which shows what I want to model.
Secondly, how do I write the create table statement so that I can specify the parameter name as the column name?
Please let me know if anything is unclear.
Thanks,
Why not use a map?
http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use_map_t.html
create table foo(
name text,
owner text,
version text,
params map<text, text>,
primary key (name, owner, version)
);
If you're on 2.1, you can create secondary indexes on the map keys/values, which gives you more flexibility if needed.
