How to change the type of a column name in Cassandra?

I read somewhere that Cassandra supports different types for column names, unlike RDBMSs, which support only strings.
How do I go about changing the column name of a table in Cassandra? Or how do I create a table FOO with a column named 1985-12-05?

You can use the ALTER TABLE command. See https://docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html
However, 1985-12-05 won't work as a column name, because keyspace, column, and table names created using CQL can only contain alphanumeric and underscore characters.

You can use a map column type:
CREATE TABLE FOO(
id int,
mymap map<timestamp, text>,
PRIMARY KEY (id));
This will serve as a dynamic column for you.

Related

Cassandra dynamic column family

I am new to Cassandra and have read some articles about static and dynamic column families.
They mention that, as of Cassandra 3, a table and a column family are the same thing.
I created key space, some tables and inserted data into that table.
CREATE TABLE subscribers(
id uuid,
email text,
first_name text,
last_name text,
PRIMARY KEY(id,email)
);
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test@123.com','Test1','User1');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test2@222.com','Test2','User2');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test3@333.com','Test3','User3');
It all seems to work fine.
But what I need is a dynamic column family that defines only data types and no predefined columns.
An insert query could then pass different arguments each time, and the row should still be inserted.
The articles say that a dynamic column family does not need a schema (predefined columns).
I am not sure whether this is possible in Cassandra or whether I have misunderstood.
Please let me know whether this is possible.
If it is, kindly provide some examples.
Thanks in advance.
I think the articles you're referring to were written in the early years of Cassandra, when it was based on the Thrift protocol. Cassandra Query Language (CQL) was introduced many years ago and is now the way to work with Cassandra - Thrift is deprecated in Cassandra 3.x and fully removed in 4.0 (not yet released at the time of writing).
If you really need fully dynamic columns, you can emulate them with a table whose columns are maps from text to a specific type, like this:
create table abc (
id int primary key,
imap map<text,int>,
tmap map<text,text>,
... more types
);
But you need to be careful - collections have limitations and performance costs, especially if you want to store more than a few hundred elements.
Another approach is to store the data as individual rows:
create table xxxx (
id int,
col_name text,
ival int,
tval text,
... more types
primary key(id, col_name));
Then you can insert each value as a separate row, one row per emulated column:
insert into xxxx(id, col_name, ival) values (1, 'col1', 1);
insert into xxxx(id, col_name, tval) values (1, 'col2', 'text');
and select all of the emulated columns for a given id with:
select * from xxxx where id = 1;
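On the client side, the rows returned by that query can be folded back into one record per id. The following is only a sketch in Python: it assumes each row arrives as a dict with the keys id, col_name, ival and tval (how your driver returns rows is not covered here), and the function name pivot_rows is made up for illustration.

```python
def pivot_rows(rows, typed_columns=("ival", "tval")):
    """Collapse one-row-per-column results into {id: {col_name: value}}.

    Assumption: each row is a dict, and exactly one of the typed
    columns (ival, tval, ...) is non-null for any given row.
    """
    result = {}
    for row in rows:
        # Pick the first typed column that actually holds a value.
        value = next(
            (row[col] for col in typed_columns if row.get(col) is not None),
            None,
        )
        result.setdefault(row["id"], {})[row["col_name"]] = value
    return result


# Rows shaped like the two INSERT statements above.
sample_rows = [
    {"id": 1, "col_name": "col1", "ival": 1, "tval": None},
    {"id": 1, "col_name": "col2", "ival": None, "tval": "text"},
]
record = pivot_rows(sample_rows)
```

If you add more typed columns to the table, pass their names in typed_columns so the helper knows where to look.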

Cassandra 2.2.11 add new map column from text column

Let's say I have a table with 2 columns:
primary key: id - type varchar
non-primary-key: data - type text
The data column contains only JSON values, for example:
{
"name":"John",
"age":30
}
I know that I cannot alter this column to a map type, but maybe I can add a new map column filled with the values from the data column, or maybe you have some other idea?
What can I do about it? I want to end up with a map column in this table that holds the values from data.
You might want to make use of the CQL COPY command to export all your data to a CSV file.
Then alter your table and create a new column of type map.
Convert the exported data to another file containing UPDATE statements that set only the newly created column, with the values converted from JSON to a map. For the conversion, use a tool or language of your choice (bash, Python, Perl or whatever).
Also be aware that with a map you specify the data type of the map's keys and the data type of the map's values. So you will most probably be limited to strings if you want to be generic, i.e. a map<text, text>. Consider whether this is appropriate for your use case.
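The conversion step could look roughly like the sketch below. It is only an illustration under several assumptions: the export is a two-column CSV (id, data) with standard CSV quoting, every JSON value is a flat object, and all values are stringified to fit a map<text, text>. The table name mytable, the column name data_map and the function names are placeholders, not anything from your schema.

```python
import csv
import io
import json


def cql_text(value):
    """Render a value as a single-quoted CQL text literal."""
    # CQL escapes a single quote inside a string by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def json_to_update(table, map_column, row_id, json_text):
    """Build one UPDATE that fills a map<text, text> column from a flat JSON object."""
    data = json.loads(json_text)
    entries = ", ".join(
        f"{cql_text(key)}: {cql_text(val)}" for key, val in data.items()
    )
    return (
        f"UPDATE {table} SET {map_column} = {{{entries}}} "
        f"WHERE id = {cql_text(row_id)};"
    )


def updates_from_csv(csv_text, table="mytable", map_column="data_map"):
    """Turn exported 'id,data' CSV rows into a list of UPDATE statements."""
    reader = csv.reader(io.StringIO(csv_text))
    return [
        json_to_update(table, map_column, row_id, data)
        for row_id, data in reader
    ]
```

Check how your export actually quotes and escapes the JSON text before relying on this; the csv.reader call may need different dialect options. Nested JSON objects or arrays would need their own handling, since a map<text, text> cannot hold them directly.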

Cassandra valid column names

I'm creating an API which will work on either Mongo or Cassandra, and for that reason I'm using '_id' as a column name.
This should be a valid name according to the docs:
Keyspace, column, and table names created using CQL can only contain alphanumeric and underscore characters. User-defined data type names and field names, user-defined function names, and user-defined aggregate names created using CQL can only contain alphanumeric and underscore characters. If you enter names for these objects using anything other than alphanumeric characters or underscores, Cassandra will issue an invalid syntax message and fail to create the object.
However, when I run this statement:
CREATE TABLE users(_id: bigint, entires: map<timestamp, text>, PRIMARY KEY(_id));
I get the following error:
Invalid syntax at line 1, char 20
Is it possible to use underscores in column names?
Underscores in column names? Yes. Column names starting with underscores? No.
From the CREATE TABLE documentation:
Valid table names are strings of alphanumeric characters and underscores, which begin with a letter.
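You can create a column name starting with an underscore if you put it in double quotes. Note also that CQL column definitions do not take a colon between the name and the type:
CREATE TABLE users("_id" bigint, entires map<timestamp, text>, PRIMARY KEY("_id"));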
The column name will be _id
Although you can do this, it does not mean that you should: you will need to keep quoting the name in every query, which makes it cumbersome:
SELECT "_id" FROM users;
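If the column names come from code (for example an API layer that targets both stores), a small helper can decide when quoting is needed. This is a rough Python sketch based on the rule quoted above; the name cql_identifier is made up, it treats only lowercase names as safe because unquoted identifiers are case-insensitive, and it does not check for reserved CQL keywords, which would also need quoting.

```python
import re

# Unquoted form assumed here: a lowercase letter followed by lowercase
# letters, digits, or underscores.
_UNQUOTED = re.compile(r"[a-z][a-z0-9_]*")


def cql_identifier(name):
    """Return name unchanged if it is safe unquoted, else wrap it in double quotes."""
    if _UNQUOTED.fullmatch(name):
        return name
    # Inside a quoted identifier, a double quote is escaped by doubling it.
    return '"' + name.replace('"', '""') + '"'


query = f"SELECT {cql_identifier('_id')} FROM users;"
```

With this, query holds the same statement as the SELECT above, while an ordinary name such as user_id is passed through without quotes.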

Why does Cassandra/CQL restrict the use of a WHERE clause on a column that is not indexed?

I have a table as follows in Cassandra 2.0.8:
CREATE TABLE emp (
empid int,
deptid int,
first_name text,
last_name text,
PRIMARY KEY (empid, deptid)
)
When I try to search with "select * from emp where first_name='John';"
the CQL shell says:
"Bad Request: No indexed columns present in by-columns clause with Equal operator"
I searched for the issue, and everywhere it says to add a secondary index on the column 'first_name'.
But I need to know the exact reason why that column needs to be indexed.
Only thing I can figure out is performance.
Any other reasons?
Cassandra does not support searching by an arbitrary column, because that would involve scanning all the rows.
The data are internally organised into something which one can compare to HashMap[X, SortedMap[Y, Z]]. The key of the outer map is a partition key value and the key of the inner map is a kind of concatenation of all clustering columns values and a name of some regular column.
Unless you have an index on a column, you need to provide the full (preferred) or a partial path to the data you want the query to collect. Therefore, you should design your schema so that queries contain the partition key value and, optionally, a range on the clustering columns.
You may read about what is allowed and what is not here
Alternatively you can create an index in Cassandra, but that will hamper your write performance.
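To make the HashMap[X, SortedMap[Y, Z]] comparison concrete, here is a toy Python model. It is only an analogy, not Cassandra's actual storage code: the class ToyTable and its methods are invented for illustration, and the counter simply shows how many cells each kind of lookup has to touch.

```python
class ToyTable:
    """Toy model: partition key -> {(clustering value, column name): cell value}."""

    def __init__(self):
        self.partitions = {}
        self.cells_scanned = 0

    def insert(self, partition_key, clustering, column, value):
        cells = self.partitions.setdefault(partition_key, {})
        cells[(clustering, column)] = value

    def by_partition(self, partition_key):
        """Lookup by partition key: one hash lookup, then only that partition's cells."""
        cells = self.partitions.get(partition_key, {})
        self.cells_scanned += len(cells)
        # Inner map is kept in (clustering, column) order, like a SortedMap.
        return sorted(cells.items())

    def by_value(self, column, value):
        """Lookup by a regular column's value: every cell of every partition is visited."""
        hits = []
        for partition_key, cells in self.partitions.items():
            for (clustering, name), cell_value in cells.items():
                self.cells_scanned += 1
                if name == column and cell_value == value:
                    hits.append((partition_key, clustering))
        return hits


# Mirrors the emp table: empid is the partition key, deptid the clustering column.
emp = ToyTable()
for empid, deptid, first, last in [(1, 10, "John", "Doe"), (2, 10, "Jane", "Roe"), (3, 20, "Jim", "Poe")]:
    emp.insert(empid, deptid, "first_name", first)
    emp.insert(empid, deptid, "last_name", last)
```

Calling emp.by_partition(1) touches only the two cells of that partition, while emp.by_value("first_name", "John") has to walk all six cells. On a real cluster those partitions are spread across nodes, which is why the second kind of query is rejected unless an index exists.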

Cassandra - What is meant by - "cannot rename non primary key part"

I have created a table users as follows:
create table users (user_id text primary key, email text, first_name text, last_name text, session_token int);
I am referring to the CQL help documentation on the DataStax website.
I now want to rename the email column to "emails". But when I execute the command -
alter table users rename email to emails;
I am getting the error -
Bad Request: cannot rename non primary key part email
I am using CQL 3 . My CQLSH is 3.1.6 and C* is 1.2.8.
Why can't I rename the above column? If I run help alter table, it shows the option to rename a column. How do I rename the column?
In CQL, you can rename the column used as the primary key, but not any others. This seems opposite from what it should be, one would think that the primary key would need to stay the same and the others would be easy to change! The reason comes from implementation details.
The name of the primary key is not written into each row, rather it is stored in a different place that's easily changeable. But for non-primary key fields, the names of the fields are written into each row. In order to rename the column, the system would have to rewrite every single row.
This article has some fantastic examples and a much longer discussion of Cassandra's internals.
To borrow an example directly from the article, consider this example column family:
cqlsh:test> CREATE TABLE example (
... field1 int PRIMARY KEY,
... field2 int,
... field3 int);
Insert a little data:
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES ( 1,2,3);
And then the Cassandra-CLI output (not CQLSH) from querying this column family:
[default@test] list example;
-------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
The name of the primary key, "field1" is not stored in any of the rows, but "field2" and "field3" are written out, so changing those names would require rewriting every row.
So if you really still want to rename a non-primary-key column, there are basically two strategies, and neither is very desirable:
1. Drop the column and add it back, as another poster mentioned. This has the big downside of dropping all the data in that column.
or
2. Create a new column family that is basically a copy of the old one but with the column in question renamed, and rewrite your data there. This is, of course, very computationally expensive.
In order to RENAME the field, the only way I got it working was dropping the field first and then adding it in. So it is like this:
alter table users drop email;
alter table users add emails text;
The main purpose of the RENAME clause is to change the names of CQL 3-generated primary key and column names that are missing from a legacy table (table created with COMPACT STORAGE).
