How to Rename Column name using cassandra table - cassandra

I have a question in cassandra db. I want to rename the column name. But its showing syntax error. Because my column name contain space. So how can I change column name:
Ex: sample column into samplecolumn?

You can use alter table to rename a column but theres a lot of restrictions on it. As sstables are immutable in order to change state of things on disk everything must be rewritten.
The main purpose of RENAME is to change the names of CQL-generated primary key and column names that are missing from a legacy table. The following restrictions apply to the RENAME operation:
You can only rename clustering columns, which are part of the primary key.
You cannot rename the partition key.
You can index a renamed column.
You cannot rename a column if an index has been created on it.
You cannot rename a static column (since you cannot use a static column in the table's primary key).
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html

Related

Cassandra Altering the table

I have a table in Cassandra say employee(id, email, role, name, password) with only id as my primary key.
I want to ...
1. Add another column (manager_id) in with a default value in it
I know that I can add a column in the table but there is no way i can provide a default value to that column through CQL. I can also not update the value for manager_id later since I need to know the id (Partition key and the values are randomly generated unique values which i don't know) to update the row. Is there any way I can achieve this?
2. Rename this table to all_employee.
I also know that its not allowed to rename a table in cassandra. So I am trying to copy the data of table(employee) to csv and copy from csv to new table (all_employee) and deleting the old table(employee). I am doing this through an automated script with cql queries in it and script works fine but will fail if it gets executed again(Which i can not restrict) since the table employee will not be there once its deleted. Essentially I am looking for "If exists" clause in COPY query which is not supported in cql. Is there any other way I can achieve the outcome?
Please note that the amount of data in the table is very small so performance in not an issue.
For #1
I dont think cassandra support default column . You need to do that from your appliaction. Write some default value every time you insert a row.
For #2
you can check if the table exists before trying to copy from it.
SELECT your_table_name FROM system_schema.tables WHERE keyspace_name='your_keyspace_name';

How to delete a row in cql based on Set<text> content

I have a Cassandra table and one column is defined as Set<text>. I want to delete rows that contain specific elements in that set.
For example if the table had a column names contained random values like ["Alice","Bob","Eve"],
I want a command to delete all the rows that contain the word Eve.
If namewas of type text then the command would go something like:
delete from keyspace.table where name='Eve';
however that does not work since name is not text but Set<text>. What would be an equivalent command here?
delete from keyspace.table where name CONTAINS 'Eve';
however you need to have secondary index on name column.

Cassandra alter column type: best way with non-compatible types?

I have a large table in Cassandra with a column of type int but no values are outside the range 0-10. I want to reduce the table size by changing the type of the column to tinyint.
This is the error I get
[Query invalid because of configuration issue] message="Cannot change COLUMN_NAME from type int to type tinyint: types are not order-compatible.">
Is there a nice way to handle this with a cast or other such query trickery?
If not ... and without taking the database down, is there a better way to solve this than doing the following?
make a new column of type tinyint
update my code to duplicate data to this column during write operations
copy old data to the new column [will take a while probably]
swap the names of the columns
revert my code change (only update one column)
delete the old int column
I would say deleting old columns and copying data to new columns is not ideal.
If your cassandra column family is accessed by a single entry point (service), my suggestion would be,
Add a new column.
Retain the old column. (You can rename it like COLUMNNAME_OBSOLETE).
After updating your code, only populate the data against new column in your code.
While reading data into domain object, if your new column is null then fill it with old column.
In one of our project, we followed the above steps against prod data and it worked fine. After few months, when we weren't need of COLUMNNAME_OBSOLETE we dropped that column.

Do specific rows in a partition have to be specified in order to update and/or delete static columns?

The CQL3 specification description of the UPDATE statement begins with the following paragraph:
The UPDATE statement writes one or more columns for a given row in a
table. The (where-clause) is used to select the row to update and must
include all columns composing the PRIMARY KEY (the IN relation is only
supported for the last column of the partition key). Other columns
values are specified through after the SET keyword.
The description in the specification of the DELETE statement begins with a similar paragraph:
The DELETE statement deletes columns and rows. If column names are provided
directly after the DELETE keyword, only those columns are deleted from the row
indicated by the (where-clause) (the id[value] syntax in (selection) is for
collection, please refer to the collection section for more details).
Otherwise whole rows are removed. The (where-clause) allows to specify the
key for the row(s) to delete (the IN relation is only supported for the last
column of the partition key).
The bolded portions of each of these descriptions state, in layman's terms, that these statements can be used to modify data in a solely row-based manner.
However, given the nature of the relationship (or lack thereof) between the rows and the static columns (which exist independent of any particular row) of a table, it seems as though there should be a way to modify such columns given only the keys of the partitions they're respectively contained in. According to the specification however, that does not seem to be possible, and I'm not sure if that is a product of the difficulty to allow such in the CQL3 syntax, or something else.
If a static column cannot be updated or deleted independent of any row in its table, then such operations become coupled with their non-static-column-based counterparts, making the set of columns targeted by such operations, difficult to determine. For example, given a populated table with the following definition:
CREATE TABLE IF NOT EXISTS example_table
(
partitionKeyColumn int
clusteringColumn int
nonPrimaryKeyColumn int
staticColumn varchar static
PRIMARY KEY (partitionKeyColumn, clusteringColumn)
)
... it is not immediately obvious if the following DELETE statements are equivalent:
//#1 (Explicitly specifies all of the columns in and "in" the target row)
DELETE partitionKeyColumn, clusteringColumn, nonPrimaryKeyColumn, staticColumn FROM example_table WHERE partitionKeyColumn = 1 AND clusteringColumn = 2
//#2 (Implicitly specifies all of the columns in (but not "in"?) the target row)
DELETE FROM example_table WHERE partitionKeyColumn = 1 AND clusteringColumn = 2
So, phrasing my observations in question form:
Are the above DELETE statements equivalent?
Does the primary key of at least one row in a CQL3 table have to be supplied in order to update or delete a static column in said table? If so, why?
I do not know about specification but in the real cassandra world, your two DELETE statements are not equivalent.
The first statement deletes the static_column whereas the second one does not. The reason of this is that static columns are shared by rows. You have to specify it explicitly to actually delete it.
Furthermore, I do not think its a good idea to DELETE static columns and non-static columns at the same time. By the way, this statement won't work :
DELETE staticColumn FROM example_table WHERE partitionKeyColumn = 1 AND clusteringColumn = 2
The error output is :
Bad Request: Invalid restriction on clustering column priceable_name since the DELETE statement modifies only static columns

Cassandra - What is meant by - "cannot rename non primary key part"

I have created a table users as follows:
create table users (user_id text primary key, email text, first_name text, last_name text, session_token int);
I am referring to the CQL help documentation on the DataStax website.
I now want to rename the email column to "emails". But I when I execute the command -
alter table users rename email to emails;
I am getting the error -
Bad Request: cannot rename non primary key part email
I am using CQL 3 . My CQLSH is 3.1.6 and C* is 1.2.8.
Why cannot I rename the above column? If I run help alter table, it shows the option to rename the column. How do I rename the column?
In CQL, you can rename the column used as the primary key, but not any others. This seems opposite from what it should be, one would think that the primary key would need to stay the same and the others would be easy to change! The reason comes from implementation details.
The name of the primary key is not written into each row, rather it is stored in a different place that's easily changeable. But for non-primary key fields, the names of the fields are written into each row. In order to rename the column, the system would have to rewrite every single row.
This article has some fantastic examples and a much longer discussion of Cassandra's internals.
To borrow an example directly from the article, consider this example column family:
cqlsh:test> CREATE TABLE example (
... field1 int PRIMARY KEY,
... field2 int,
... field3 int);
Insert a little data:
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES ( 1,2,3);
And then the Cassandra-CLI output (not CQLSH) from querying this column family:
[default#test] list example;
-------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
The name of the primary key, "field1" is not stored in any of the rows, but "field2" and "field3" are written out, so changing those names would require rewriting every row.
So if you really still want to rename a non-primary column, there are basically two different strategies and neither of them are very desirable.
Drop the column and add it back, as another poster mentioned. This has the big downside of dropping all the data in that column.
or
Create a new column family that is basically a copy of the old but with the column in question renamed and rewrite your data there. This is, of course, very computationally expensive.
In order to RENAME the field, the only way I got it working was dropping the field first and then adding it in. So it is like this:
alter table users drop email;
alter table users add emails text;
The main purpose of the RENAME clause is to change the names of CQL 3-generated primary key and column names that are missing from a legacy table (table created with COMPACT STORAGE).

Resources