Cassandra CLI create column family with primary key

I am trying to create a column family in Cassandra CLI version 1.1.6, but I am not sure how to specify movieid as the primary key.
CREATE COLUMN FAMILY movies
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND column_metadata = [
{column_name: movieid, validation_class: UTF8Type}
{column_name: title, validation_class: UTF8Type}
{column_name: genres, validation_class: UTF8Type}];

Creating a column family via the CLI doesn't create a schema that you have to stick to. The primary key is defined by how you insert the data. When you create the column family via the CLI, you only define what type of value the primary key will contain, e.g. a string (UTF8Type) or an integer (IntegerType).
Also, you can't have an alias for the KEY column (the primary key of your table) via the CLI; you have to use CQL for that. If you want defined schemas and structured queries rather than wide rows, you should target a newer version of Cassandra (why not 1.2.x?) and use CQL3.
As a more visual representation of what I mean, your cassandra-cli statement creates the following when viewed from cqlsh via the describe table command:
CREATE TABLE movies (
KEY text PRIMARY KEY, <------- pk
genres text,
title text,
movieid text
) WITH // and a bunch of cf options
But this doesn't mean you can't insert another column that isn't defined there, because Thrift doesn't really care about the CF's schema.
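To make the point concrete, here is a minimal Python sketch of how a Thrift-era column family behaves. It is a toy in-memory model, not the Cassandra API: the class name `ColumnFamily` and its methods are hypothetical, and the sample values are made up. It only illustrates that the row key is whatever you pass on insert and that `column_metadata` does not restrict which columns a row may hold.

```python
class ColumnFamily:
    """Toy in-memory stand-in for a Thrift column family (hypothetical, not a Cassandra API)."""

    def __init__(self, name, column_metadata):
        self.name = name
        # Declared columns only carry validation hints; they are not a closed schema.
        self.column_metadata = set(column_metadata)
        self.rows = {}  # row key -> {column name -> value}

    def insert(self, row_key, columns):
        # The row key is supplied at insert time; nothing in the definition names it.
        self.rows.setdefault(row_key, {}).update(columns)

    def undeclared_columns(self, row_key):
        # Columns present in the row that were never listed in column_metadata.
        return sorted(set(self.rows[row_key]) - self.column_metadata)


movies = ColumnFamily("movies", ["movieid", "title", "genres"])
# Use the movie id as the row key; 'year' was never declared in column_metadata.
movies.insert("m1", {"title": "Toy Story", "genres": "Animation", "year": "1995"})
print(movies.undeclared_columns("m1"))
```

In this model the insert succeeds even though `year` is undeclared, which mirrors the schema-optional behaviour described above.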

Related

Convert dynamic Cassandra column family to static one

Let's say I have a column family in Cassandra that was created using cassandra-cli like this:
create column family users with key_validation_class = UTF8Type and comparator = UTF8Type;
In the terminology of the Thrift-to-CQL3 migration guide from DataStax, this is a dynamic column family.
When viewed from CQL3 client using DESCRIBE TABLE users it looks like this:
CREATE TABLE users (
key text,
column1 text,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC);
That is the expected behavior. What I want is to add column metadata so that the column family is viewed as static.
So I tried this using cassandra-cli:
update column family users
with column_metadata = [{column_name: email, validation_class: UTF8Type}];
However the end result in CQL3 is not what I wanted:
CREATE TABLE users (
key text,
column1 text,
value blob,
email text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC);
What I expected is the same result as when I create the column family with the metadata from the beginning:
create column family users2
with key_validation_class = UTF8Type
and comparator = UTF8Type
and column_metadata = [{column_name: email, validation_class: UTF8Type}];
In that case the CQL3 view of this is what I want:
CREATE TABLE users2 (
key text PRIMARY KEY,
email text
) WITH COMPACT STORAGE;
Is there some way to add column metadata to a column family that was created without any, so that it is viewed from CQL3 the same way as if the metadata had been provided when the column family was created? Without re-creating the column family, of course.
It's not possible to create a static column using the old Thrift API. In fact, a static column is just a trick, i.e. a column with clustering value = NULL, so there is only one instance of it for each partition key.
See these two slides for the explanation (sorry, the text is in French):
http://www.slideshare.net/doanduyhai/cassandra-techniques-de-modlisation-avance/218
http://www.slideshare.net/doanduyhai/cassandra-techniques-de-modlisation-avance/219
You should take this opportunity to migrate to CQL. Thrift is deprecated and even disabled by default starting with Cassandra 3.x.
OK, I see what you mean. Look at the system keyspace, table schema_columnfamilies.
I think the labels of the partition keys and clustering columns are stored there.
It may be possible to change them, but I don't know if it's a good idea to hack into those meta tables directly.
If you have n nodes, you'll probably need to update the label on all of them, since the system keyspace uses LocalStrategy.
Execute this query to see the actual labels:
SELECT key_aliases,key_validator,column_aliases,comparator
FROM system.schema_columnfamilies
WHERE keyspace_name='xxx'
AND columnfamily_name='users';

Cassandra - What is meant by - "cannot rename non primary key part"

I have created a table users as follows:
create table users (user_id text primary key, email text, first_name text, last_name text, session_token int);
I am referring to the CQL help documentation on the DataStax website.
I now want to rename the email column to "emails". But when I execute the command -
alter table users rename email to emails;
I am getting the error -
Bad Request: cannot rename non primary key part email
I am using CQL 3. My CQLSH is 3.1.6 and C* is 1.2.8.
Why can't I rename the above column? If I run help alter table, it shows the option to rename the column. How do I rename the column?
In CQL, you can rename the column used as the primary key, but not any others. This seems the opposite of what it should be: one would think that the primary key would need to stay the same and the others would be easy to change! The reason comes from implementation details.
The name of the primary key is not written into each row, rather it is stored in a different place that's easily changeable. But for non-primary key fields, the names of the fields are written into each row. In order to rename the column, the system would have to rewrite every single row.
This article has some fantastic examples and a much longer discussion of Cassandra's internals.
To borrow an example directly from the article, consider this example column family:
cqlsh:test> CREATE TABLE example (
... field1 int PRIMARY KEY,
... field2 int,
... field3 int);
Insert a little data:
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES ( 1,2,3);
And then the Cassandra-CLI output (not CQLSH) from querying this column family:
[default@test] list example;
-------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
The name of the primary key, "field1" is not stored in any of the rows, but "field2" and "field3" are written out, so changing those names would require rewriting every row.
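The cost difference can be sketched with a toy Python model. This is an illustration only, assuming a storage layout like the CLI listing above; the `Table` class and its methods are hypothetical and are not part of any Cassandra driver.

```python
class Table:
    """Toy model: the key name lives in metadata, other column names live in every row."""

    def __init__(self, key_alias):
        self.key_alias = key_alias  # stored once, alongside the schema
        self.rows = {}              # row key -> {column name -> value}

    def insert(self, key, **columns):
        self.rows.setdefault(key, {}).update(columns)

    def rename_key(self, new_name):
        # One metadata update, regardless of how many rows exist.
        self.key_alias = new_name
        return 1

    def rename_column(self, old, new):
        # Every row that holds the column has to be rewritten.
        rewritten = 0
        for cols in self.rows.values():
            if old in cols:
                cols[new] = cols.pop(old)
                rewritten += 1
        return rewritten


example = Table(key_alias="field1")
for k in range(1, 4):
    example.insert(k, field2=k * 2, field3=k * 3)

print(example.rename_key("id"))                   # writes needed for a key rename
print(example.rename_column("field2", "amount"))  # rows rewritten for a column rename
```

The key rename touches a single piece of metadata, while the column rename scales with the number of rows, which is the trade-off the answer describes.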
So if you still want to rename a non-primary-key column, there are basically two strategies, and neither is very desirable.
Drop the column and add it back, as another poster mentioned. This has the big downside of dropping all the data in that column.
or
Create a new column family that is a copy of the old one but with the column in question renamed, and rewrite your data there. This is, of course, computationally very expensive.
The only way I got a rename of the field working was to drop the field first and then add it back under the new name, like this:
alter table users drop email;
alter table users add emails text;
The main purpose of the RENAME clause is to change the names of CQL 3-generated primary key and column names that are missing from a legacy table (table created with COMPACT STORAGE).

Creating column family or table in Cassandra while working with Datastax API (which uses new Binary protocol)

I have started working with the Cassandra database. I am planning to use the Datastax API to upsert into and read from Cassandra. I am totally new to this Datastax API (which uses the new binary protocol), and I am not able to find much documentation with proper examples.
When I was working with the Cassandra CLI and the Netflix client (Astyanax), I created the column family like this-
create column family profile
with key_validation_class = 'UTF8Type'
and comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and column_metadata = [
{column_name : crd, validation_class : 'DateType'}
{column_name : lmd, validation_class : 'DateType'}
{column_name : account, validation_class : 'UTF8Type'}
{column_name : advertising, validation_class : 'UTF8Type'}
{column_name : behavior, validation_class : 'UTF8Type'}
{column_name : info, validation_class : 'UTF8Type'}
];
Now I am trying to do the same thing using the Datastax API. To start working with the Datastax API, do I need to create the column family in some different way from the one above? Or will the above column family work fine when I insert data into Cassandra using the Datastax API?
If the above column family will not work, then-
First of all, I have created the KEYSPACE like below-
CREATE KEYSPACE USERS WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = '1';
Now I am confused about how to create the table. I am not sure which is the right way to do that.
Should I create like this?
CREATE TABLE profile (
id varchar,
account varchar,
advertising varchar,
behavior varchar,
info varchar,
PRIMARY KEY (id)
);
or should I create like this?
CREATE COLUMN FAMILY profile (
id varchar,
account varchar,
advertising varchar,
behavior varchar,
info varchar,
PRIMARY KEY (id)
);
And also how to add-
crd as DateType
lmd as DateType
in above table or column family while working with Datastax API?
Any help will be appreciated.
Whether you use the keyword TABLE or COLUMNFAMILY, both are the same (synonyms). I guess the keyword TABLE was introduced with CQL3. So you can use either one in your statements.
As for the second question, adding DateType columns: you should use timestamp.
CREATE COLUMNFAMILY sample (rowkey text, ts timestamp, PRIMARY KEY(rowkey));
INSERT INTO sample (rowkey, ts ) VALUES ( '1','1366354711797');
// ts value is basically the System.currentTimeMillis(), I mean a long value
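For readers producing that value outside Java, here is a small Python sketch of the same idea: the value is a count of milliseconds since the Unix epoch. The helper names `to_epoch_millis` and `from_epoch_millis` are hypothetical; only the standard library is used.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert an aware datetime to whole milliseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms):
    """Convert epoch milliseconds back to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


ts = 1366354711797  # the value used in the INSERT above
print(from_epoch_millis(ts).isoformat())
print(to_epoch_millis(from_epoch_millis(ts)) == ts)
```

Integer `timedelta` arithmetic is used instead of dividing by 1000.0 so that the millisecond part survives the round trip without floating-point error.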
In Cassandra, a keyspace is the same thing as a database; likewise, a column family and a table are the same thing.
Cassandra's syntax resembles MySQL's, and it supports CQL (similar to SQL).
A table in Cassandra can be created like this:
CREATE TABLE users (
user_name varchar,
password varchar,
gender varchar,
session_token varchar,
state varchar,
birth_year bigint,
PRIMARY KEY (user_name));
More information here : Cassandra Tutorials
@neel4soft - With Cassandra, things simply evolve. To make life easier for people, a renaming process is steadily under way to ease the transition from SQL to CQL for newcomers. However, CQL should not be thought of as a relative of SQL, but rather as a third cousin on its mother's side; in other words, not a close relative. Therefore, comparing it to MySQL gives an improper picture of its capabilities.
There are a few differences between creating a table/column family using cassandra-cli and using cqlsh.
One of them is that a table created with cassandra-cli uses the compact storage format, which cannot be altered afterwards.
In cqlsh, the table is not created in this format unless we specify it explicitly when creating the table/column family.
A column family is a collection of ordered columns and a container of rows. It is stored in a Cassandra keyspace, and a keyspace can contain multiple column families.
A column family is often compared to an RDBMS table, but the two are not equivalent.
Each column family is stored in separate files on disk. Each row has a unique key, called the row key.
Cassandra also has the concept of a super column family, which allows nested access by holding a different set of columns.
For a column family we can set the default ordering of data, enable compression, use compact storage, and set the expiry of data.

Does CQL3 require a schema for Cassandra now?

I've just had a crash course in Cassandra over the last week and went from the Thrift API to CQL to grokking SuperColumns to learning I shouldn't use them and should use composite keys instead.
I'm now trying out CQL3, and it would appear that I can no longer insert into columns that are not defined in the schema, or see those columns in a select *.
Am I missing some option to enable this in CQL3, or does it expect me to define every column in the schema (defeating the purpose of wide, flexible rows, imho)?
Yes, CQL3 does require columns to be declared before used.
But, you can do as many ALTERs as you want, no locking or performance hit is entailed.
That said, most of the places that you'd use "dynamic columns" in earlier C* versions are better served by a Map in C* 1.2.
I suggest you explore composite columns with "WITH COMPACT STORAGE".
A "COMPACT STORAGE" column family allows you to practically only define key columns:
Example:
CREATE TABLE entities_cargo (
entity_id ascii,
item_id ascii,
qt ascii,
PRIMARY KEY (entity_id, item_id)
) WITH COMPACT STORAGE
In practice, when you insert rows with different item_id values, you don't add a row with entity_id, item_id and qt; you add a column whose name is the item_id content and whose value is the qt content.
So:
insert into entities_cargo (entity_id,item_id,qt) values(100,'oggetto 1',3);
insert into entities_cargo (entity_id,item_id,qt) values(100,'oggetto 2',3);
Now, here is how you see these rows in CQL3:
cqlsh:goh_master> select * from entities_cargo where entity_id = 100;
 entity_id | item_id   | qt
-----------+-----------+----
 100       | oggetto 1 | 3
 100       | oggetto 2 | 3
And here is how they look if you check them from the CLI:
[default@goh_master] get entities_cargo[100];
=> (column=oggetto 1, value=3, timestamp=1349853780838000)
=> (column=oggetto 2, value=3, timestamp=1349853784172000)
Returned 2 results.
You can access a single column with
select * from entities_cargo where entity_id = 100 and item_id = 'oggetto 1';
Hope it helps
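The mapping described in this answer can be sketched in Python. This is a toy model of the COMPACT STORAGE layout, assuming a two-part primary key (partition key, clustering column) and a single value column; the function name `to_storage_rows` is hypothetical, and values are modelled as plain strings.

```python
from collections import defaultdict


def to_storage_rows(cql_rows, partition_key, clustering, value):
    """Fold CQL rows into the wide-row shape the CLI shows: row key -> {column name: value}."""
    storage = defaultdict(dict)
    for row in cql_rows:
        # The clustering column's content becomes the stored column name.
        storage[row[partition_key]][row[clustering]] = row[value]
    return dict(storage)


cql_rows = [
    {"entity_id": "100", "item_id": "oggetto 1", "qt": "3"},
    {"entity_id": "100", "item_id": "oggetto 2", "qt": "3"},
]
wide = to_storage_rows(cql_rows, "entity_id", "item_id", "qt")
print(wide)
```

Two CQL rows collapse into one stored row with two columns, matching the difference between the cqlsh and CLI listings above.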
Cassandra still allows using wide rows. This answer references that DataStax blog entry, written after the question was asked, which details the links between CQL and the underlying architecture.
Legacy support
A dynamic column family defined through Thrift with the following command (notice there is no column-specific metadata):
create column family clicks
with key_validation_class = UTF8Type
and comparator = DateType
and default_validation_class = UTF8Type
Here is the exact equivalent in CQL:
CREATE TABLE clicks (
key text,
column1 timestamp,
value text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
Both of these commands create a wide-row column family that stores records ordered by date.
CQL Extras
In addition, CQL provides the ability to assign labels to the row id, column and value elements to indicate what is being stored. The following, alternative way of defining this same structure in CQL, highlights this feature on DataStax's example - a column family used for storing users' clicks on a website, ordered by time:
CREATE TABLE clicks (
user_id text,
time timestamp,
url text,
PRIMARY KEY (user_id, time)
) WITH COMPACT STORAGE
Notes
a Table in CQL is always mapped to a Column Family in Thrift
the CQL driver uses the first element of the primary key definition as the row key
Composite Columns are used to implement the extra columns that one can define in CQL
using WITH COMPACT STORAGE is not recommended for new designs because it fixes the number of possible columns. In other words, ALTER TABLE ... ADD is not possible on such a table. Just leave it out unless it's absolutely necessary.
Interesting, something I didn't know about CQL3. In PlayOrm, the idea is that you define a "partial" schema, and in the WHERE clause of the select you can only use what is defined in that partial schema, but it returns all the data of the rows, even the data it does not know about. I would have expected CQL to do the same :( I need to look into this now.
thanks,
Dean

Cassandra CLI: specify name of primary key

Is it possible to specify the name of the primary key via the Cassandra CLI, as you can via CQL:
create columnfamily test (
my_key_name varchar primary key,
value varchar);
By default, the Cassandra CLI creates the primary key with the name 'KEY'.
The attribute you're looking for is key_alias. Unfortunately, you can't currently set it through cassandra-cli, only cqlsh. I've opened CASSANDRA-4158 to fix this.
When creating or updating a column family via the CLI, you can specify the column_metadata to identify the type (validation class) and/or if the column has an index.
e.g., assuming you have created the test column family, and wish to specify the column my_key_name as string type which is indexed:
update column family test
with column_metadata =
[
{column_name: 'my_key_name', validation_class: UTF8Type, index_type: KEYS}
];
If you later want to drop the index:
update column family test with column_metadata = [];
Here is a CQL example from a Cassandra 1.1 schema-related blog post on the DataStax website:
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
CREATE TABLE users (
id uuid PRIMARY KEY,
name varchar,
state varchar
);
I have used only 0.7.x where you can specify the data type of the key. Following is from 0.7.6 cassandra-cli "help assume;" command
assume <column_family> keys as <type>;
Assume one of the attributes (comparator, sub_comparator, validator or keys)
of the given column family to match specified type. Available types: bytes, integer, long, lexicaluuid, timeuuid, utf8, ascii.
