Convert a dynamic Cassandra column family to a static one

Let's say I have a column family in Cassandra that was created using cassandra-cli like this:
create column family users with key_validation_class = UTF8Type and comparator = UTF8Type;
In terms of the Thrift-to-CQL3 migration guide from DataStax, this is a dynamic column family.
When viewed from a CQL3 client using DESCRIBE TABLE users, it looks like this:
CREATE TABLE users (
key text,
column1 text,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC);
That is the expected behavior. What I want is to add column metadata so that the column family is viewed as static.
So I tried this using cassandra-cli:
update column family users
with column_metadata = [{column_name: email, validation_class: UTF8Type}];
However the end result in CQL3 is not what I wanted:
CREATE TABLE users (
key text,
column1 text,
value blob,
email text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC);
What I expected is the same result as when I create the column family with the metadata from the beginning:
create column family users2
with key_validation_class = UTF8Type
and comparator = UTF8Type
and column_metadata = [{column_name: email, validation_class: UTF8Type}];
In that case the CQL3 view of this is what I want:
CREATE TABLE users2 (
key text PRIMARY KEY,
email text
) WITH COMPACT STORAGE;
Is there some way to add column metadata to a column family that was created without any, so that CQL3 views it the same way as if the metadata had been provided when the column family was created? Without re-creating the column family, of course.
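To make the difference between the two CQL3 views concrete, here is a toy Python model (an illustration, not Cassandra's code) of how the same Thrift cells are presented. Without column_metadata every cell becomes its own (key, column1, value) row; with metadata each row key becomes one CQL row with a named column per declared column. The sample data and function names are hypothetical, and hiding undeclared cells in the static view is an assumption of the sketch. It only shows what the two views mean, not why the update did not switch between them.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical Thrift-level contents: row key -> {column name -> raw bytes}.
Cells = Dict[str, Dict[str, bytes]]


def dynamic_view(cells: Cells) -> List[Tuple[str, str, bytes]]:
    """No column_metadata: every cell becomes one CQL row (key, column1, value).
    Keys are sorted only to make the output deterministic (real partitions are
    ordered by token); column1 is sorted like the UTF8Type comparator would."""
    rows = []
    for key in sorted(cells):
        for column1 in sorted(cells[key]):
            rows.append((key, column1, cells[key][column1]))
    return rows


def static_view(cells: Cells, metadata: Dict[str, str]) -> List[Dict[str, Optional[object]]]:
    """With column_metadata: one CQL row per row key and one named column per
    declared column. Undeclared cells are ignored (an assumption of this toy)."""
    rows = []
    for key in sorted(cells):
        row: Dict[str, Optional[object]] = {"key": key}
        for name, validation_class in metadata.items():
            raw = cells[key].get(name)
            if raw is not None and validation_class == "UTF8Type":
                row[name] = raw.decode("utf-8")  # UTF8Type is shown as text
            else:
                row[name] = raw  # missing -> None (null); other types stay bytes
        rows.append(row)
    return rows


users = {
    "alice": {"email": b"alice@example.com"},
    "bob": {"email": b"bob@example.com", "age": b"42"},
}
print(dynamic_view(users))
print(static_view(users, {"email": "UTF8Type"}))
```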

It's not possible to create a static column using the old Thrift API. In fact, a static column is just a trick, i.e. a column whose clustering value is NULL, so there is only one instance of it per partition key.
See these two slides for the explanation (sorry, the text is in French):
http://www.slideshare.net/doanduyhai/cassandra-techniques-de-modlisation-avance/218
http://www.slideshare.net/doanduyhai/cassandra-techniques-de-modlisation-avance/219
You should take this opportunity to migrate to CQL. Thrift is deprecated, and it is even disabled by default starting with Cassandra 3.x.
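The static-column trick described in this answer can be pictured with a toy Python model (hypothetical names, not Cassandra's storage code): the static value is stored once per partition, and reads repeat it on every clustered row.

```python
from typing import Dict, List, Optional, Tuple


class Partition:
    """Toy model of one CQL partition that has a single static column.

    The static value is stored once per partition (the answer describes it as
    a cell with a NULL clustering value); regular values are stored once per
    clustering key."""

    def __init__(self) -> None:
        self.static_value: Optional[str] = None
        self.rows: Dict[str, str] = {}  # clustering key -> regular column value

    def select_all(self) -> List[Tuple[str, Optional[str], str]]:
        """Like SELECT * on the partition: the single static value is repeated
        on every returned row, ordered by clustering key."""
        return [(ck, self.static_value, value) for ck, value in sorted(self.rows.items())]


p = Partition()
p.rows["a"] = "x"
p.rows["b"] = "y"
p.static_value = "shared"  # one write changes what every row shows
print(p.select_all())
```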

OK, I see what you mean. Look at the system keyspace, table schema_columnfamilies.
I think the labels of the partition keys and clustering columns are stored there.
It may be possible to change them, but I don't know if it's a good idea to hack those metadata tables directly.
If you have n nodes, you'll probably need to update the labels on all of them, since the system keyspace uses LocalStrategy (its data is not replicated between nodes).
Execute this query to see the actual labels:
SELECT key_aliases,key_validator,column_aliases,comparator
FROM system.schema_columnfamilies
WHERE keyspace_name='xxx'
AND columnfamily_name='users';

Related

Cassandra dynamic column family

I am new to Cassandra and have read some articles about static and dynamic column families.
They mention that from Cassandra 3 onward, a table and a column family are the same thing.
I created a keyspace and some tables, and inserted data into one of them.
CREATE TABLE subscribers(
id uuid,
email text,
first_name text,
last_name text,
PRIMARY KEY(id,email)
);
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test@123.com','Test1','User1');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test2@222.com','Test2','User2');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test3@333.com','Test3','User3');
It all seems to work fine.
But what I need is to create a dynamic column family with only data types and no predefined columns.
With an INSERT query I could then pass different columns each time, and the row should still be inserted.
The articles say that for a dynamic column family there is no need to create a schema (predefined columns).
I am not sure if this is possible in Cassandra or if my understanding is wrong.
Is this possible or not?
If it is, kindly provide some examples.
Thanks in advance.
I think the articles you're referring to were written in the early years of Cassandra, when it was based on the Thrift protocol. Cassandra Query Language (CQL) was introduced many years ago, and it is now the way to work with Cassandra: Thrift is deprecated in Cassandra 3.x and fully removed in 4.0 (not released yet).
If you really need fully dynamic columns, you can try to emulate them with a table whose columns are maps from text to a specific type, like this:
create table abc (
id int primary key,
imap map<text,int>,
tmap map<text,text>
-- ... more map columns, one per value type
);
But be careful: there are limitations and performance costs when using collections, especially if you want to store more than a few hundred elements.
Another approach is to store each value as an individual row:
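A minimal sketch of how application code might route arbitrary name/value pairs into the int and text maps of the abc table above. The helper names are hypothetical, only int and text values are handled, and literals are inlined purely to show the resulting CQL; real code should use bound parameters through a driver.

```python
from typing import Dict, Union

Value = Union[int, str]


def cql_literal(value: Value) -> str:
    """Render an int or str as a CQL literal; single quotes are doubled."""
    if isinstance(value, int):
        return str(value)
    return "'" + value.replace("'", "''") + "'"


def build_insert(row_id: int, attrs: Dict[str, Value]) -> str:
    """Route each attribute into imap (int values) or tmap (text values)."""
    imap = {k: v for k, v in attrs.items() if isinstance(v, int) and not isinstance(v, bool)}
    tmap = {k: v for k, v in attrs.items() if isinstance(v, str)}
    unsupported = sorted(set(attrs) - set(imap) - set(tmap))
    if unsupported:
        raise TypeError(f"no map column for: {unsupported}")

    def render(m: Dict[str, Value]) -> str:
        pairs = ", ".join(f"{cql_literal(k)}: {cql_literal(v)}" for k, v in sorted(m.items()))
        return "{" + pairs + "}"

    return f"INSERT INTO abc (id, imap, tmap) VALUES ({row_id}, {render(imap)}, {render(tmap)});"


print(build_insert(1, {"age": 42, "name": "O'Brien", "visits": 7}))
```

Note that an INSERT with a map literal replaces the whole map; to add or change single entries you would use something like UPDATE abc SET imap = imap + {'age': 43} WHERE id = 1;.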
create table xxxx (
id int,
col_name text,
ival int,
tval text,
-- ... more value columns, one per type
primary key(id, col_name));
Then you can insert individual values as separate columns:
insert into xxxx(id, col_name, ival) values (1, 'col1', 1);
insert into xxxx(id, col_name, tval) values (1, 'col2', 'text');
And select all columns with:
select * from xxxx where id = 1;
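The same idea, sketched in Python with hypothetical helper names (int and text values only): each logical column becomes one INSERT into xxxx, and the rows returned by select * are folded back into one record. The sketch assumes select * returns the columns as (id, col_name, ival, tval).

```python
from typing import Dict, List, Optional, Tuple, Union

Value = Union[int, str]


def quote(text: str) -> str:
    """CQL string literal with single quotes doubled."""
    return "'" + text.replace("'", "''") + "'"


def to_inserts(row_id: int, attrs: Dict[str, Value]) -> List[str]:
    """One INSERT per logical column, writing either ival or tval."""
    statements = []
    for name, value in sorted(attrs.items()):
        if isinstance(value, int) and not isinstance(value, bool):
            statements.append(
                f"insert into xxxx(id, col_name, ival) values ({row_id}, {quote(name)}, {value});")
        elif isinstance(value, str):
            statements.append(
                f"insert into xxxx(id, col_name, tval) values ({row_id}, {quote(name)}, {quote(value)});")
        else:
            raise TypeError(f"unsupported type for {name!r}")
    return statements


def from_rows(rows: List[Tuple[int, str, Optional[int], Optional[str]]]) -> Dict[str, Optional[Value]]:
    """Rebuild the dynamic record from (id, col_name, ival, tval) rows."""
    record: Dict[str, Optional[Value]] = {}
    for _id, col_name, ival, tval in rows:
        record[col_name] = ival if ival is not None else tval
    return record


for stmt in to_inserts(1, {"col1": 1, "col2": "text"}):
    print(stmt)
print(from_rows([(1, "col1", 1, None), (1, "col2", None, "text")]))
```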

Cassandra CLI create column family with primary key

I am trying to create a column family in Cassandra CLI version 1.1.6, but I am not sure how to specify movieid as the primary key.
CREATE COLUMN FAMILY movies
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND column_metadata = [
{column_name: movieid, validation_class: UTF8Type}
{column_name: title, validation_class: UTF8Type}
{column_name: genres, validation_class: UTF8Type}];
Creating a column family via the CLI doesn't create a schema that you have to stick to; how you insert the data is what defines the primary key. When you create the column family via the CLI, you only have to define what kind of value the primary key will contain, i.e. whether it is a string (UTF8Type), an integer (IntegerType), etc.
Also, you can't actually set an alias for the KEY column (aka the primary key in your table) via the CLI; you have to use CQL for that. If you want defined schemas and structured queries rather than wide rows, you should target a newer version of Cassandra (why not 1.2.x?) and use CQL3.
To show more visually what I mean, here is what your cassandra-cli statement creates when viewed from cqlsh with the DESCRIBE TABLE command:
CREATE TABLE movies (
KEY text PRIMARY KEY, <------- pk
genres text,
title text,
movieid text
) WITH // and a bunch of cf options
But this doesn't mean you can't insert another column that isn't defined there, because Thrift doesn't really care about the CF's schema.

Creating a column family or table in Cassandra while working with the DataStax API (which uses the new binary protocol)

I have started working with the Cassandra database. I am planning to use the DataStax API to upsert/read data into/from Cassandra. I am totally new to the DataStax API (which uses the new binary protocol), and I have not been able to find much documentation with proper examples.
When I was working with the Cassandra CLI and the Netflix client (Astyanax), I created the column family like this:
create column family profile
with key_validation_class = 'UTF8Type'
and comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and column_metadata = [
{column_name : crd, validation_class : 'DateType'}
{column_name : lmd, validation_class : 'DateType'}
{column_name : account, validation_class : 'UTF8Type'}
{column_name : advertising, validation_class : 'UTF8Type'}
{column_name : behavior, validation_class : 'UTF8Type'}
{column_name : info, validation_class : 'UTF8Type'}
];
Now I am trying to do the same thing using the DataStax API. To start working with the DataStax API, do I need to create the column family in some different way than shown above? Or will the above column family work fine when I try to insert data into Cassandra using the DataStax API?
If the above column family will not work, then:
First of all, I created the keyspace like below:
CREATE KEYSPACE USERS WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = '1';
Now I am confused about how to create the table, and I am not sure which is the right way to do it.
Should I create it like this?
CREATE TABLE profile (
id varchar,
account varchar,
advertising varchar,
behavior varchar,
info varchar,
PRIMARY KEY (id)
);
Or should I create it like this?
CREATE COLUMNFAMILY profile (
id varchar,
account varchar,
advertising varchar,
behavior varchar,
info varchar,
PRIMARY KEY (id)
);
And also, how do I add crd and lmd as DateType columns to the above table or column family while working with the DataStax API?
Any help will be appreciated.
The keywords TABLE and COLUMNFAMILY are synonyms, so you can use either one in your statements. I guess the keyword TABLE was introduced with CQL3.
For your second question (adding DateType columns), use the timestamp type.
CREATE COLUMNFAMILY sample (rowkey text, ts timestamp, PRIMARY KEY(rowkey));
INSERT INTO sample (rowkey, ts ) VALUES ( '1','1366354711797');
// the ts value is the number of milliseconds since the epoch, like System.currentTimeMillis(), i.e. a long value
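A small Python sketch (hypothetical helper names) of converting between a datetime and the milliseconds-since-epoch number used in the INSERT above. Integer timedelta arithmetic is used to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, from which CQL timestamps count milliseconds.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_cql_millis(dt: datetime) -> int:
    """Milliseconds since the epoch for a timezone-aware datetime."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # timedelta // timedelta is exact integer division (no float rounding).
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_cql_millis(ms: int) -> datetime:
    """Inverse conversion, returning a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


print(from_cql_millis(1366354711797).isoformat())  # 2013-04-19T06:58:31.797000+00:00
print(to_cql_millis(datetime.now(timezone.utc)))
```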
In Cassandra, a keyspace is the equivalent of a database, and likewise a column family is the same as a table.
Cassandra's syntax is rather like MySQL's, and it supports CQL (similar to SQL).
A table in cassandra can be created like:
CREATE TABLE users (
user_name varchar,
password varchar,
gender varchar,
session_token varchar,
state varchar,
birth_year bigint,
PRIMARY KEY (user_name));
More information here: Cassandra Tutorials
@neel4soft: With Cassandra, things simply evolve. To make the transition from SQL to CQL easier for newcomers, a gradual renaming process is under way. However, CQL should not be thought of as a close relative of SQL; it is more like a third cousin on its mother's side. Therefore, comparing it to MySQL gives a misleading picture of its capabilities.
There are a few differences between creating a table/column family with cassandra-cli and with cqlsh.
One of them is that a table created with cassandra-cli uses the compact storage format, which cannot be altered further.
In cqlsh, a table is not created in this format unless you specify it explicitly when creating the table/column family.
A column family is a container of rows, where each row holds a collection of ordered columns. Column families are stored in a Cassandra keyspace, and you can create multiple column families in one keyspace.
A column family is often compared to an RDBMS table, but column families are not the same as relational tables.
Each column family is stored in separate files on disk. Each row has a unique key, called the row key.
Cassandra also has the concept of a super column family, which allows nested access by holding sets of columns (super columns) within a row.
On a column family you can set the default ordering of data, enable compression, use compact storage, and set the expiry of data.

Does CQL3 require a schema for Cassandra now?

I've just had a crash course in Cassandra over the last week and went from the Thrift API to CQL, to grokking SuperColumns, to learning I shouldn't use them and should use Composite Keys instead.
I'm now trying out CQL3, and it would appear that I can no longer insert into columns that are not defined in the schema, or see those columns in a select *.
Am I missing some option to enable this in CQL3, or does it expect me to define every column in the schema (defeating the purpose of wide, flexible rows, IMHO)?
Yes, CQL3 does require columns to be declared before they are used.
But you can do as many ALTERs as you want; no locking or performance hit is entailed.
That said, most of the places where you'd use "dynamic columns" in earlier C* versions are better served by a Map in C* 1.2.
I suggest you explore composite columns with "WITH COMPACT STORAGE".
A "COMPACT STORAGE" column family lets you define practically only the key columns:
Example:
CREATE TABLE entities_cargo (
entity_id ascii,
item_id ascii,
qt ascii,
PRIMARY KEY (entity_id, item_id)
) WITH COMPACT STORAGE;
Actually, when you insert different item_id values, you don't add a row with entity_id, item_id and qt; instead you add a column whose name is the item_id content and whose value is the qt content.
So:
insert into entities_cargo (entity_id,item_id,qt) values('100','oggetto 1','3');
insert into entities_cargo (entity_id,item_id,qt) values('100','oggetto 2','3');
Now, here is how you see these rows in CQL3:
cqlsh:goh_master> select * from entities_cargo where entity_id = '100';
entity_id | item_id | qt
-----------+-----------+----
100 | oggetto 1 | 3
100 | oggetto 2 | 3
And here is how they look if you check them from the CLI:
[default@goh_master] get entities_cargo[100];
=> (column=oggetto 1, value=3, timestamp=1349853780838000)
=> (column=oggetto 2, value=3, timestamp=1349853784172000)
Returned 2 results.
You can access a single column with:
select * from entities_cargo where entity_id = '100' and item_id = 'oggetto 1';
Hope it helps
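The CQL-row-to-cell mapping shown in this answer can be modeled in a few lines of Python. This is a toy illustration with hypothetical names, not Cassandra's storage code; values are strings to match the ascii columns, and cells are sorted by name the way the comparator on item_id would order them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_cells(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map CQL rows (entity_id, item_id, qt) of the COMPACT STORAGE table onto
    Thrift-style storage: one partition per entity_id holding one cell per row,
    named by item_id with qt as its value, sorted by cell name."""
    partitions: Dict[str, Dict[str, str]] = defaultdict(dict)
    for entity_id, item_id, qt in rows:
        partitions[entity_id][item_id] = qt  # same item_id again overwrites the cell
    return {key: sorted(cells.items()) for key, cells in partitions.items()}


rows = [("100", "oggetto 1", "3"), ("100", "oggetto 2", "3")]
for key, cells in to_cells(rows).items():
    print(f"get entities_cargo[{key}];")
    for name, value in cells:
        print(f"=> (column={name}, value={value})")
```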
Cassandra still allows using wide rows. This answer draws on a DataStax blog entry, written after the question was asked, which details the links between CQL and the underlying storage architecture.
Legacy support
A dynamic column family defined through Thrift with the following command (notice there is no column-specific metadata):
create column family clicks
with key_validation_class = UTF8Type
and comparator = DateType
and default_validation_class = UTF8Type;
Here is the exact equivalent in CQL:
CREATE TABLE clicks (
key text,
column1 timestamp,
value text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE;
Both of these commands create a wide-row column family that stores records ordered by date.
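Why a wide row suits time-ordered data can be sketched with a toy Python model of one clicks partition (hypothetical class, not Cassandra's implementation): cells are kept sorted by their timestamp name, so a time-range query is a contiguous slice. Real Cassandra resolves overwrites by write timestamp; the toy simply replaces the value.

```python
import bisect
from datetime import datetime, timezone
from typing import List, Tuple


class WideRow:
    """Toy model of one partition of `clicks`: cells sorted by their timestamp
    name (as the DateType comparator orders them), each holding a URL value."""

    def __init__(self) -> None:
        self._times: List[datetime] = []
        self._urls: List[str] = []

    def insert(self, time: datetime, url: str) -> None:
        i = bisect.bisect_left(self._times, time)
        if i < len(self._times) and self._times[i] == time:
            self._urls[i] = url  # same cell name: replace the existing value
        else:
            self._times.insert(i, time)
            self._urls.insert(i, url)

    def slice(self, start: datetime, end: datetime) -> List[Tuple[datetime, str]]:
        """Cells with start <= time < end, in time order."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._urls[lo:hi]))


def at(minute: int) -> datetime:
    return datetime(2013, 1, 1, 12, minute, tzinfo=timezone.utc)


row = WideRow()
row.insert(at(30), "/checkout")
row.insert(at(5), "/home")
row.insert(at(10), "/products")
print(row.slice(at(0), at(15)))
```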
CQL Extras
In addition, CQL provides the ability to assign labels to the row id, column and value elements to indicate what is being stored. The following alternative way of defining the same structure in CQL highlights this feature using DataStax's example: a column family used for storing users' clicks on a website, ordered by time:
CREATE TABLE clicks (
user_id text,
time timestamp,
url text,
PRIMARY KEY (user_id, time)
) WITH COMPACT STORAGE;
Notes
a Table in CQL is always mapped to a Column Family in Thrift
the CQL driver uses the first element of the primary key definition as the row key
Composite Columns are used to implement the extra columns that one can define in CQL
using WITH COMPACT STORAGE is not recommended for new designs because it fixes the number of possible columns. In other words, ALTER TABLE ... ADD is not possible on such a table. Just leave it out unless it's absolutely necessary.
Interesting, something I didn't know about CQL3. In PlayOrm, the idea is that you must define a "partial" schema, and in the WHERE clause of a select you can only use what is defined in that partial schema, BUT it returns ALL the data of the rows, EVEN the data it does not know about. I would have expected CQL to do the same :( I need to look into this now.
thanks,
Dean

Cassandra CLI: specify name of primary key

Is it possible to specify the name of the primary key via the Cassandra CLI, as in CQL:
create columnfamily test (
my_key_name varchar primary key,
value varchar);
By default, cassandra-cli names the primary key 'KEY'.
The attribute you're looking for is key_alias. Unfortunately, you can't currently set it through cassandra-cli, only cqlsh. I've opened CASSANDRA-4158 to fix this.
When creating or updating a column family via the CLI, you can specify the column_metadata to identify the type (validation class) and/or whether the column has an index.
For example, assuming you have created the test column family and wish to declare the column my_key_name as an indexed string type:
update column family test
with column_metadata =
[
{column_name: 'my_key_name', validation_class: UTF8Type, index_type: KEYS}
];
If you later wanted to drop the index:
update column family test with column_metadata = [];
Here is a CQL example from a schema-related Cassandra 1.1 blog post on the DataStax website:
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
CREATE TABLE users (
id uuid PRIMARY KEY,
name varchar,
state varchar
);
I have only used 0.7.x, where you can specify the data type of the key. The following is the output of the "help assume;" command in the 0.7.6 cassandra-cli:
assume <column_family> keys as <type>;
Assume one of the attributes (comparator, sub_comparator, validator or keys)
of the given column family to match specified type. Available types: bytes, integer, long, lexicaluuid, timeuuid, utf8, ascii.
