Are there any issues with adding a new column to a big table in Cassandra?

In my project, I am using Cassandra to store a huge amount of data. With a big MySQL table it takes a long time to add a new column or index. Does Cassandra solve that problem?

Yes, it is relatively easy to add a column in Cassandra, and to index that column afterwards.
Adding a column is a schema-metadata change, so it propagates to all nodes very quickly. Existing rows are not rewritten; they simply have no value for the new column, so reads return null for it by default.
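As a minimal sketch in CQL (the `users` table and column names here are illustrative, not from the question):

```cql
-- Adding a column only changes the table's metadata;
-- no existing rows are rewritten.
ALTER TABLE users ADD last_login timestamp;

-- A secondary index on the new column. Use secondary indexes
-- sparingly; they work best on columns of moderate cardinality.
CREATE INDEX users_last_login_idx ON users (last_login);
```

Both statements complete almost instantly regardless of table size, because neither touches the existing data files.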

Related

Cassandra DB: change many rows where column = null

I have a table with many millions of rows. I added a new field, and it is null in the old rows, so I need to update it to 0. Can I do that?
Yes, you can. You can update the value of the new column, but for this you will need a utility that scans the complete table and updates the records one by one, because Cassandra's UPDATE requires the full primary key. If you are familiar with Spark, using it will make this easier and faster.
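The per-row statements are plain CQL. As a sketch (assuming a hypothetical `items` table with primary key `id` and a new column `counter_val`):

```cql
-- 1. Scan the table for keys (the driver pages through this
--    result set automatically).
SELECT id FROM items;

-- 2. For each key returned, set the new column. UPDATE in
--    Cassandra is an upsert and must name the full primary key,
--    which is why the scan above is needed first.
UPDATE items SET counter_val = 0 WHERE id = 42;
```

In Spark, step 1 and step 2 become a single read-transform-write job over the table.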

Replacing an integer column in a Cassandra table

In a table, the clustering key is an int column holding a system-generated number (chrg). The issue is:
Since it is defined as the int datatype, it can store values only up to about 2 billion (2^31 − 1 = 2,147,483,647).
And since the table's data volume is huge, within the next two months the load will hit the maximum value the column can store, beyond which loads will fail.
Hence the requirement is to change the datatype of the column to something like bigint with the least impact.
How can this be achieved with minimal downtime?
You cannot change the type of a primary key column.
So one approach I can think of is:
1. Create a separate table with the modified datatype (bigint).
2. Modify your application to write data to both tables.
3. Use Spark with the Cassandra connector to read data from the old table and write it to the new table.
4. Once the backfill is complete, stop writing to the old table in your application.
With the above approach I don't think you will have a major impact.
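Step 1 above might look like this in CQL (the table name, partition key, and extra columns are illustrative, since the question only names the `chrg` column):

```cql
-- Same layout as the old table, but chrg is bigint (64-bit),
-- which raises the ceiling from 2^31 - 1 to 2^63 - 1.
CREATE TABLE loads_v2 (
    batch_id  text,
    chrg      bigint,
    payload   text,
    PRIMARY KEY (batch_id, chrg)
);
```

Because int values fit losslessly into bigint, the Spark backfill in step 3 can copy `chrg` values straight across without conversion logic.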

Update the column I searched for

Is there any way to update a column value in Cassandra that I searched for and that is part of my primary key?
I have a (huge) list of items with a field called "LastUpdateDateTime", and from time to time I search for rows that haven't been updated for a while.
The reason I search for these rows is that I want to update them, and after updating them I want to set the timestamp to the current date.
How can I do this with Cassandra?
You can't update a primary key column; writing the row with a new key value just inserts another record.
That's how Cassandra works.
You may have to use the Spark Cassandra connector, or delete the records with the old values and insert new ones.
Note: deleting and inserting is not recommended if you have many records, as it will create a corresponding number of tombstones.
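The delete-and-insert pattern looks like this as a sketch (assuming a hypothetical `items` table where `last_update` is a clustering column, which is what makes it part of the primary key):

```cql
-- A logged batch keeps the delete and the re-insert atomic for
-- the same partition. The DELETE still produces a tombstone,
-- which is the downside noted above.
BEGIN BATCH
  DELETE FROM items
   WHERE item_id = 'a1' AND last_update = '2016-01-01 00:00:00+0000';
  INSERT INTO items (item_id, last_update, payload)
  VALUES ('a1', toTimestamp(now()), 'refreshed');
APPLY BATCH;
```

`toTimestamp(now())` sets the new clustering value to the current time, matching what the question asks for.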

Cassandra: Push the filtered rows to a new column family using CQL and delete them from existing column family

I'm a newbie to Cassandra, and I'm confused about how to archive data. Here is the approach I am trying to implement:
1. Filter the records to be archived.
2. Create a new column family.
3. Move the filtered records to the new column family.
4. Delete the filtered records from the existing column family.
Step 1 (filter the records to be archived) I achieved with secondary indexes.
Step 2 (create a new column family) is a plain CREATE query.
For step 3 (move the filtered records), I thought of the approach mentioned in "cassandra copy data from one columnfamily to another columnfamily", but that copies all of the data from column family 1 to 2. Is it possible to move only the filtered rows to the new column family?
For step 4 (delete the filtered records), I am not sure how to do this in CQL. Please help me.
Additionally, let me know if there is a better approach.
COPY then DELETE sounds like a valid strategy here.
For deleting rows, take a look at the DELETE command; it takes the same kind of WHERE condition as a SELECT does.
Unfortunately this won't work for a query that requires ALLOW FILTERING, although there is an enhancement request to add this.
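Concretely, the SELECT/DELETE pairing might look like this (the `events` table, its `status` index from step 1, and its key column are illustrative):

```cql
-- Find the rows to archive via the secondary index from step 1.
SELECT event_id FROM events WHERE status = 'closed';

-- DELETE requires primary-key restrictions, so issue one delete
-- per key returned by the SELECT above.
DELETE FROM events WHERE event_id = 'e-1001';
```

This is also why a filtering-only predicate doesn't carry over to DELETE: the delete must be addressed by primary key, not by an arbitrary filtered condition.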

How to define dynamic columns in a column family in Cassandra?

We don't want to fix the column definitions when creating a column family, as we might have to insert new columns into it later. Is it possible to achieve this? I am wondering whether we can skip defining the column metadata when creating the column family and instead specify the columns when the client writes data, for example:
```
CREATE COLUMN FAMILY products
  WITH default_validation_class = UTF8Type
  AND key_validation_class = UTF8Type
  AND comparator = UTF8Type;

set products['1001']['brand'] = 'Sony';
```
Thanks,
Fan
Yes... it is possible to achieve this without taking any special effort. Per the DataStax documentation of the Cassandra data model (a good read, by the way, along with the CQL spec):
The Cassandra data model is a schema-optional, column-oriented data model. This means that, unlike a relational database, you do not need to model all of the columns required by your application up front, as each row is not required to have the same set of columns. Columns and their metadata can be added by your application as they are needed without incurring downtime to your application.
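In CQL terms, the closest equivalents of the Thrift-era dynamic-column pattern in the question are a clustering key or a map collection. As a sketch (table and column names are illustrative):

```cql
-- Option 1: model the "dynamic" column name as a clustering key,
-- so each attribute becomes its own row under the partition.
CREATE TABLE products (
    product_id text,
    attr_name  text,
    attr_value text,
    PRIMARY KEY (product_id, attr_name)
);
INSERT INTO products (product_id, attr_name, attr_value)
VALUES ('1001', 'brand', 'Sony');

-- Option 2: a map collection, for small per-row attribute sets.
CREATE TABLE products_map (
    product_id text PRIMARY KEY,
    attrs      map<text, text>
);
UPDATE products_map SET attrs['brand'] = 'Sony'
 WHERE product_id = '1001';
```

Option 1 scales to many attributes per product; option 2 keeps everything in one row but reads the whole map back on each query.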
