Cassandra alter column type: best way with non-compatible types? - cassandra

I have a large table in Cassandra with a column of type int but no values are outside the range 0-10. I want to reduce the table size by changing the type of the column to tinyint.
This is the error I get:
[Query invalid because of configuration issue] message="Cannot change COLUMN_NAME from type int to type tinyint: types are not order-compatible.">
Is there a nice way to handle this with a cast or other such query trickery?
If not ... and without taking the database down, is there a better way to solve this than doing the following?
1. Make a new column of type tinyint.
2. Update my code to duplicate data to this column during write operations.
3. Copy old data to the new column (this will probably take a while).
4. Swap the names of the columns.
5. Revert my code change (only update one column).
6. Delete the old int column.

I would say deleting old columns and copying data to new columns is not ideal.
If your Cassandra column family is accessed through a single entry point (a service), my suggestion would be:
1. Add a new column.
2. Retain the old column. (You can rename it to something like COLUMNNAME_OBSOLETE.)
3. After updating your code, write data only to the new column.
4. When reading data into your domain object, if the new column is null, fill it from the old column.
In one of our projects, we followed the above steps against prod data and it worked fine. After a few months, when we no longer needed COLUMNNAME_OBSOLETE, we dropped that column.
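The read-side fallback in the last step can be sketched as below. This is only an illustration under assumptions: the row is represented as a plain dict, and the column names `score_tiny` and `score_obsolete` are hypothetical stand-ins for the new and old columns.

```python
from typing import Optional


def resolve_score(row: dict) -> Optional[int]:
    """Prefer the new column; fall back to the retained old column."""
    new_value = row.get("score_tiny")  # hypothetical new tinyint column
    if new_value is not None:
        return new_value
    return row.get("score_obsolete")  # hypothetical retained int column
```

The check is `is not None` rather than a truthiness test because 0 is a legitimate value in the 0-10 range and must not trigger the fallback.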

Related

How to Rename Column name using cassandra table

I have a question about Cassandra. I want to rename a column, but I get a syntax error because the column name contains a space. How can I change the column name?
Ex: sample column into samplecolumn?
You can use ALTER TABLE to rename a column, but there are a lot of restrictions on it. Because SSTables are immutable, changing the state of things on disk means everything must be rewritten.
The main purpose of RENAME is to change the names of CQL-generated primary key and column names that are missing from a legacy table. The following restrictions apply to the RENAME operation:
You can only rename clustering columns, which are part of the primary key.
You cannot rename the partition key.
You can index a renamed column.
You cannot rename a column if an index has been created on it.
You cannot rename a static column (since you cannot use a static column in the table's primary key).
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html
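On the syntax error itself: a CQL identifier that contains a space has to be wrapped in double quotes, and an embedded double quote is escaped by doubling it. A minimal sketch of building such a statement follows; the helper names and the table name `my_table` are hypothetical, and the rename will still be rejected if the column does not meet the restrictions listed above.

```python
def quote_identifier(name: str) -> str:
    """Wrap a CQL identifier in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def build_rename(table: str, old: str, new: str) -> str:
    """Build an ALTER TABLE ... RENAME statement with both column names quoted."""
    return (
        f"ALTER TABLE {table} "
        f"RENAME {quote_identifier(old)} TO {quote_identifier(new)};"
    )
```

Calling `build_rename("my_table", "sample column", "samplecolumn")` yields `ALTER TABLE my_table RENAME "sample column" TO "samplecolumn";`. Quoting the all-lowercase new name is harmless, since it refers to the same identifier as the unquoted form.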

Replacing integer column in Cassandra table

In a table, the clustering key is an int column holding a system-generated number - chrg. The issue is:
Since it is defined with the int data type, it can only store values up to about 2 billion (2,147,483,647).
Since the table holds a huge amount of data, within the next two months of loads we will hit the maximum value the column can store, beyond which loads will fail.
Hence the requirement is to change the data type of the column to a 64-bit integer (bigint in CQL) with the least impact.
How can this be achieved with minimal downtime?
You cannot change the type of a primary key column.
One approach I can think of is:
1. Create a separate table with the modified data type.
2. Modify your application to write data to both tables.
3. Use Spark with Cassandra to read data from the older table and write it to the new table.
4. Finally, stop writing to the old table in your application.
With the above approach, I don't think you will see a major impact.
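The ordering of those steps can be illustrated with a small in-memory simulation. This is a sketch under assumptions, not a driver or Spark job: the two dicts stand in for the old and new tables, and all names are hypothetical.

```python
from typing import Dict

old_table: Dict[int, str] = {}  # stands in for the table keyed by the int column
new_table: Dict[int, str] = {}  # stands in for the table keyed by the wider column


def dual_write(key: int, value: str) -> None:
    """Write every new row to both tables while the migration is running."""
    old_table[key] = value
    new_table[key] = value


def backfill() -> int:
    """Copy rows that predate dual writes, without overwriting newer data."""
    copied = 0
    for key, value in old_table.items():
        if key not in new_table:
            new_table[key] = value
            copied += 1
    return copied
```

Dual writes are switched on before the backfill starts, so no row written during the copy is missed; the backfill skips keys that already exist in the new table so it cannot replace a fresher value with an older one.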

Change the type of a column in Cassandra

I have created a table my_table with a column phone, which has been declared as of type varint. After entering some data, I realized that it would have been better if I had declared this column as list<int>.
I tried to:
ALTER TABLE my_table
ALTER phone TYPE list<int>
but unfortunately I am not allowed to do so. Hopefully, there is a way to make this change.
UPDATE: Assume that I make a new column phonelist of type list<int>. Is there any efficient way to move the data in the phone column into the phonelist column?
You cannot change the type of an existing column to a map or collection.
The table shows the allowed alterations for data types

Cassandra: Push the filtered rows to a new column family using CQL and delete them from existing column family

I'm new to Cassandra and I'm confused about archiving data. This is the approach I am trying to implement:
1. Filter the records to be archived.
2. Create a new column family.
3. Move the filtered records to the new column family.
4. Delete the filtered records from the existing column family.
Where I am with each step:
1. Filter the records to be archived: achieved with the use of secondary indexes.
2. Create a new column family: done with a CREATE query.
3. Move the filtered records to the new column family: I thought of implementing this using the approach described in "cassandra copy data from one columnfamily to another columnfamily", but that copies all the data from column family 1 to 2. Is it possible to move only the filtered rows to the new column family?
4. Delete the filtered records from the existing column family: I am not sure how to achieve this in CQL. Please help me.
Additionally, let me know if there is a better approach.
COPY then DELETE sounds like a valid strategy here.
For deleting rows, take a look at the DELETE command, it takes the same WHERE condition as a SELECT does.
Unfortunately this won't work for a query that requires "ALLOW FILTERING", although there is an enhancement request to add this.
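One way to work around that limitation is to collect the primary keys of the matching rows first and then copy and delete by key. The sketch below only simulates this with dicts standing in for the two column families; the function and parameter names are hypothetical, and a real implementation would issue the equivalent SELECT, INSERT and DELETE statements through a driver.

```python
from typing import Callable, Dict

Row = Dict[str, object]


def archive_rows(
    source: Dict[str, Row],
    archive: Dict[str, Row],
    should_archive: Callable[[Row], bool],
) -> int:
    """Copy matching rows to the archive, then delete them from the source."""
    # Collect the keys up front so the source is not mutated while iterating.
    keys = [key for key, row in source.items() if should_archive(row)]
    for key in keys:
        archive[key] = source[key]  # copy first, so a failure never loses data
    for key in keys:
        del source[key]  # delete by key only; no filtering is needed here
    return len(keys)
```

Copying everything before deleting anything means an interruption part-way through leaves duplicates at worst, which can be cleaned up by rerunning the job.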

How to retrieve all the values of a super column for a set of row IDs from a column family in Hector Cassandra

I want to retrieve the different row id values depending on super column name.
For that purpose I have used this code:
SuperColumnQuery<String, String, String, String> superColumnQuery =
    HFactory.createSuperColumnQuery(keyspaceOperator, se, se, se, se);
superColumnQuery.setColumnFamily(COLUMN_FAMILY).setKey(rowID).setSuperName(superColumnName);
QueryResult<HSuperColumn<String, String, String>> result = superColumnQuery.execute();
// rowID contains a list of rows separated by ','
But it's not working.
Given that you're trying to select row keys based on column names, I'd venture to guess that your data model is backwards. You should generally be moving from the outside in -- select on row key, then on supercolumn name, then on column name. Otherwise you're going to be stuck iterating over rows in your code trying to match a column name, instead of using the Cassandra engine to select what you need. This approach is never going to scale.
So I'd suggest redoing your data model -- or if you need to have it this way, consider adding another ColumnFamily that serves as an index for the first. Contrary to old-school SQL databases, the credo in NoSQL dbs like Cassandra is "If you're denormalizing -- you're doing it right".
