Purging cassandra column family - cassandra

I have a column family defined as follows:
create column family Message with key_validation_class ='UTF8Type' and default_validation_class = 'UTF8Type'
The row key is a unique id, and each row stores two columns:
message : a string message
created_dt : the date and time when the row was created in Cassandra
My requirement is to move out and delete all messages that are more than a year old. I do not want to discard that data completely; rather, I want to move it from the working Cassandra cluster to another one that is used for archival.
Are there any tools or scripts that can help achieve this?
If I have to write code using Hector, how can this be done efficiently? How do I find the keys that have created_dt < current_dt - 1 year?
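The cutoff test in the last question can be sketched independently of any client library. The following is a minimal Python sketch, not Hector code. It assumes that created_dt is stored as an ISO-8601 string (the column family uses UTF8Type values) and that the rows have already been paged out of the cluster as (key, columns) pairs; the function name keys_older_than and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def keys_older_than(rows, now, max_age=timedelta(days=365)):
    """Return the row keys whose created_dt is earlier than now - max_age.

    rows is an iterable of (row_key, columns) pairs, where columns is a dict
    holding the 'message' and 'created_dt' values read from the column family.
    Assumption: created_dt is an ISO-8601 string with a UTC offset.
    """
    cutoff = now - max_age
    old_keys = []
    for key, columns in rows:
        created = datetime.fromisoformat(columns["created_dt"])
        if created < cutoff:
            old_keys.append(key)
    return old_keys

# Hypothetical sample data standing in for rows scanned from the cluster.
rows = [
    ("id-1", {"message": "hello", "created_dt": "2011-01-15T10:00:00+00:00"}),
    ("id-2", {"message": "world", "created_dt": "2012-11-01T09:30:00+00:00"}),
]
now = datetime(2012, 11, 18, tzinfo=timezone.utc)
print(keys_older_than(rows, now))  # ['id-1']
```

The keys returned this way would then be written to the archive cluster first and deleted from the working cluster only after that write succeeds.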

Related

Cassandra update query to append data to existing value in a column

Can you please provide the query to append data to an existing value in a column of type text? Something similar to this:
UPDATE cycling.upcoming_calendar SET events = events + ['Tour de France Stage 10'] WHERE year = 2015 AND month = 06;
The above query will update a list. My column datatype is text.
In my case, if the column "events" has the value "Test", I want to update it to "Test , Test1".
Appending data to a text column is not possible in Cassandra. The only options I can think of are:
Option 1 : Change the column data type to List
Option 2 : Fetch the data from the column in your application and then append the new value to the existing value, and finally update the DB.
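Option 2 amounts to a read-modify-write in the application. Below is a minimal Python sketch of the "modify" step only; the helper name append_text and the " , " separator are assumptions taken from the example in the question, and the actual read and UPDATE calls depend on the driver in use.

```python
def append_text(existing, addition, separator=" , "):
    """Return the new column value after appending addition to existing.

    If the column is empty or missing (None), the addition becomes the value.
    """
    if not existing:
        return addition
    return existing + separator + addition

# Value read from the "events" column, then the value to write back.
current = "Test"
print(append_text(current, "Test1"))  # Test , Test1
```

Note that this pattern is not atomic: two clients that read the same old value at the same time can overwrite each other's append.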

Cassandra is inserting null values in skipped column

Can anybody please help me understand why Cassandra is inserting null values in columns that were skipped? Isn't it supposed to skip the column? Shouldn't it insert no value at all (not even null) if I skip the column entirely while inserting data? I am a bit confused because, according to the following tutorial, data is stored by row key with its columns (see the column family diagram); if that is true, I should not get null for the column.
Or is the whole concept I learned about the Cassandra column family wrong?
http://www.tutorialspoint.com/cassandra/cassandra_data_model.htm
Here is the CQL script:
create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
create table users (firstname text, lastname text, age int, gender ascii, primary key(firstname));
insert into users(firstname,age,gender,lastname) values('Michael',30,'male','smith');
Here I am skipping a column, but when I run the select query, it shows null for that column. Why is Cassandra filling in null for that column?
insert into users(firstname,age,gender) values('Jane',23,'female');
select * from users;
Why don't you go to the most comprehensive source of documentation and learning for Cassandra: http://academy.datastax.com ? And it's free. The content on tutorialspoint.com is very old and has not been updated in ages (SuperColumns have been deprecated since 2011 - 2012 ...)
"Here I am skipping a column, but when I run the select query, it shows null for that column. Why is Cassandra filling in null for that column?"
In CQL, null means that the value is not present or that it has been deleted.
Since you did not insert any value for the column lastname, Cassandra returns null for it (meaning "not present" in this case).
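The difference between "nothing stored" and "null returned" can be illustrated with a toy model. This Python sketch is only an analogy and not Cassandra's storage engine: the stored row is a dict that holds just the columns that were written, and the hypothetical select_all function projects it onto the table's full column list.

```python
# Column list of the users table from the question.
TABLE_COLUMNS = ["firstname", "lastname", "age", "gender"]

def select_all(stored_row):
    """Project a sparse stored row onto every table column.

    Columns that were never written come back as None, which plays
    the role of CQL's null in this toy model.
    """
    return {col: stored_row.get(col) for col in TABLE_COLUMNS}

# Only the inserted columns are stored; lastname is simply absent.
jane = {"firstname": "Jane", "age": 23, "gender": "female"}

print("lastname" in jane)            # False
print(select_all(jane)["lastname"])  # None
```

In this model the insert never stores a null; the null only appears when the result is rendered against the table's column list.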

Copying a value from table to another using cassandra

I have a very large table in Cassandra that consists of the columns (caseid, timestamp, activity), with caseid and timestamp forming the primary key. The values of caseid repeat, and I want to extract the first value of activity for each caseid and put it into another table (named initialActivity) that contains only activity. Can someone please help me with how to achieve this using a CQL query? Thanks.
Please try this
Insert into initialActivity() values
(select activity from preActivity where caseId = 111 LIMIT 1 );
Only the activity column of the first row with caseId = 111 will be inserted into the initialActivity table.
Please refer to this for more info:
CQL
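As far as I know, CQL has no INSERT ... SELECT form and no subqueries, so one alternative is to pick the first activity per caseid in client code and then issue ordinary INSERT statements. Below is a minimal Python sketch of the selection step. It assumes the rows have already been read as (caseid, timestamp, activity) tuples; the function name first_activities and the sample data are hypothetical.

```python
def first_activities(rows):
    """Map each caseid to the activity with the smallest timestamp.

    rows is an iterable of (caseid, timestamp, activity) tuples in any order.
    """
    earliest = {}
    for case_id, timestamp, activity in rows:
        # Keep the row only if it is the first seen for this caseid
        # or is earlier than the one kept so far.
        if case_id not in earliest or timestamp < earliest[case_id][0]:
            earliest[case_id] = (timestamp, activity)
    return {case_id: activity for case_id, (_, activity) in earliest.items()}

# Hypothetical sample rows; timestamps are plain integers for brevity.
rows = [(111, 3, "review"), (111, 1, "open"), (222, 5, "close"), (222, 2, "assign")]
print(first_activities(rows))  # {111: 'open', 222: 'assign'}
```

Each resulting activity would then be written to initialActivity with a separate INSERT.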

Incrementing Cassandra Counter Column Family in NodeJs

I am trying to do an insert/increment in a Cassandra database through Node.js...
Suppose I have this table:
CREATE COLUMN FAMILY MsCounter
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND default_validation_class = CounterColumnType;
Then let's say I want to insert/increment a row key and a value on MsCounter:
rowKey: 'Tim', columnName1: columnName1 + 1
Is there any way to do this programmatically in Node.js using cassandra-client?
I am aware they show an example of inserting into and updating a regular column family, but does the same apply to a counter column family?
After reading this sample, I realized that it is possible to run the CQL without the optional parameter. Here is how I do it in my case:
con.execute('UPDATE MsCounter SET columnName = columnName + 1 WHERE key=?', ['Tim'], function(err) {});

Reading the most recent updated row in cassandra

I have a use case and would like suggestions on the following.
Structure:
Rowkey_1:
Column1 = value1;
Column2 = value2;
Rowkey_2:
Column1 = value1;
Column2 = value2;
Suppose I write 1000 rows into Cassandra, each row having a couple of columns. After some time I update only 100 of those rows, changing their column values.
When I read data from Cassandra, I want to get only these 100 updated rows, not every row key.
Is there a way to ask Cassandra for all row keys from start to end where the update time falls between "Time_start" and "Time_end"?
In SQL terms: select from "" to "" where time between "time_start" and "time_end".
P.S. I read Basic Time Series with Cassandra, which says you can annotate the row key as shown below:
Inserting data: {:key => 'server1-load-20110306', :column_name => TimeUUID(now), :column_value => 0.75}
Here the column family has TimeUUID columns.
My question is: can you annotate your row key with a date and time like this: { :key ==> 2012-11-18 16:00:15 }
Or is there any other way to get only the recently updated rows?
Any suggestions or guidance would be really appreciated.
You can't do range queries on keys unless you use ByteOrderedPartitioner, which you shouldn't use. The way to do this is by writing known sentinel values as keys, such as a timestamp representing the beginning of the day. Then you can do the column slice by time.
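One way to read the sentinel-key suggestion: every update is also recorded in a row whose key is the start of its day, with a time-ordered column per update. A reader then computes the bucket keys that cover the requested interval and slices each bucket's columns by time. The Python sketch below covers only the key computation; the "YYYYMMDD" key format and both function names are assumptions for illustration.

```python
from datetime import datetime, timedelta

def day_bucket_key(moment):
    """Return the sentinel row key for the day that contains moment."""
    return moment.strftime("%Y%m%d")

def bucket_keys_between(time_start, time_end):
    """List the day-bucket row keys covering [time_start, time_end]."""
    day = time_start.replace(hour=0, minute=0, second=0, microsecond=0)
    keys = []
    while day <= time_end:
        keys.append(day_bucket_key(day))
        day += timedelta(days=1)
    return keys

start = datetime(2012, 11, 17, 22, 0, 0)
end = datetime(2012, 11, 18, 16, 0, 15)
print(bucket_keys_between(start, end))  # ['20121117', '20121118']
```

Because the bucket keys are known in advance, each one can be fetched directly by key, so no range query over row keys is needed.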
