I have a table in Cassandra, say employee(id, email, role, name, password), with only id as the primary key.
I want to ...
1. Add another column (manager_id) with a default value in it.
I know that I can add a column to the table, but there is no way to provide a default value for that column through CQL. I also can't update the value of manager_id later, since I need to know the id to update a row (it is the partition key, and its values are randomly generated unique values that I don't know). Is there any way I can achieve this?
2. Rename this table to all_employee.
I also know that renaming a table is not allowed in Cassandra. So I am trying to copy the data of the table (employee) to a CSV file, copy from the CSV file into a new table (all_employee), and delete the old table (employee). I am doing this through an automated script with CQL queries in it. The script works fine, but it will fail if it gets executed again (which I cannot prevent), since the table employee will no longer be there once it is deleted. Essentially I am looking for an "IF EXISTS" clause in the COPY query, which is not supported in CQL. Is there any other way I can achieve this outcome?
Please note that the amount of data in the table is very small, so performance is not an issue.
For #1
I don't think Cassandra supports default column values. You need to handle that in your application: write a default value every time you insert a row.
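Since the table is small, one way to do the backfill from the application is a full scan that reads every id and then updates each row. This is only a sketch, assuming the column was already added with ALTER TABLE, that `session` is a connected DataStax Python driver session with the keyspace set, and that `backfill_manager_id` and `default_manager_id` are illustrative names that are not from the question:

```python
def backfill_manager_id(session, default_manager_id):
    """Set manager_id on every employee row that does not have one yet.

    `session` is assumed to be a connected cassandra-driver Session whose
    keyspace contains the employee table.
    """
    # A full-table scan is acceptable here only because the table is small.
    rows = session.execute("SELECT id, manager_id FROM employee")
    update = session.prepare("UPDATE employee SET manager_id = ? WHERE id = ?")
    updated = 0
    for row in rows:
        if row.manager_id is None:  # leave rows that already have a value
            session.execute(update, (default_manager_id, row.id))
            updated += 1
    return updated
```

Running it a second time should be harmless, because rows that already have a manager_id are skipped.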
For #2
You can check whether the table exists before trying to copy from it (Cassandra 3.0 and later):
SELECT table_name FROM system_schema.tables WHERE keyspace_name='your_keyspace_name' AND table_name='your_table_name';
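If the script can go through a driver instead of cqlsh COPY, the existence check and the copy can live in one re-runnable step. A rough sketch, assuming a connected DataStax Python driver `session` with the keyspace set, that all_employee was already created with the same columns, and that the function names are made up for illustration:

```python
def table_exists(session, keyspace, table):
    """Return True if keyspace.table is listed in system_schema.tables."""
    rows = session.execute(
        "SELECT table_name FROM system_schema.tables "
        "WHERE keyspace_name = %s AND table_name = %s",
        (keyspace, table),
    )
    return any(True for _ in rows)


def copy_employee_once(session, keyspace):
    """Copy employee into all_employee and drop employee; no-op on re-runs."""
    if not table_exists(session, keyspace, "employee"):
        return 0  # already migrated, nothing to copy
    insert = session.prepare(
        "INSERT INTO all_employee (id, email, role, name, password) "
        "VALUES (?, ?, ?, ?, ?)"
    )
    copied = 0
    for row in session.execute(
        "SELECT id, email, role, name, password FROM employee"
    ):
        session.execute(
            insert, (row.id, row.email, row.role, row.name, row.password)
        )
        copied += 1
    # Drop the old table only after every row has been written.
    session.execute("DROP TABLE IF EXISTS employee")
    return copied
```

The second run finds no employee table and returns 0 instead of failing.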
Related
Is there any possibility to update a column value in Cassandra that I searched for (it is part of my primary key)?
I have a (huge) list of items with a field called "LastUpdateDateTime", and from time to time I search for rows that haven't been updated for a while.
The reason I search for these rows is that I want to update them, and after I update them I want to set the timestamp to the current date.
How to do this with cassandra?
You can't update a primary key column. Writing a row with a different key value inserts another record.
That's how Cassandra works.
You may have to use the spark-cassandra connector, or delete the records with the old values and insert new ones.
Note: Deleting and inserting is not recommended if you have many records, as it will create a corresponding number of tombstones.
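For a single row, the delete-and-insert approach could look roughly like this. The question does not show its schema, so the table items(id, last_update, payload) with PRIMARY KEY (id, last_update) is purely hypothetical, as is the function name. `session` is assumed to be a connected DataStax Python driver session:

```python
def move_to_new_timestamp(session, item_id, old_stamp, new_stamp):
    """Re-key one row: write it under new_stamp, then remove the old row.

    Hypothetical schema: items(id, last_update, payload),
    PRIMARY KEY (id, last_update).
    """
    if old_stamp == new_stamp:
        return False  # nothing to move; deleting would remove the row itself
    rows = list(session.execute(
        "SELECT payload FROM items WHERE id = %s AND last_update = %s",
        (item_id, old_stamp),
    ))
    if not rows:
        return False  # the old row does not exist
    # Insert first, so a failure between the two writes never loses data.
    session.execute(
        "INSERT INTO items (id, last_update, payload) VALUES (%s, %s, %s)",
        (item_id, new_stamp, rows[0].payload),
    )
    # This delete leaves one tombstone per moved row.
    session.execute(
        "DELETE FROM items WHERE id = %s AND last_update = %s",
        (item_id, old_stamp),
    )
    return True
```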
I have two tables one is users and other is expired_users.
users columns-> id, name, age
expired_users columns -> id, name
I want to execute the following query.
delete from users where id in (select id from expired_users);
This query works fine with SQL databases. I want to find a way to do the same in Cassandra.
PS: I don't want to add any extra columns in the tables.
When designing a Cassandra data model, we cannot think exactly as we would for an RDBMS.
Design it like this --
create table users (
id int,
name text,
age int,
expired boolean static,
primary key (id,name)
);
To mark a user as expired -- just insert the same row again:
insert into users (id,name,age,expired) values (100,'xyz',80,true);
You don't have to update or delete the row. Just insert it again, and the previous column values will be overwritten.
What you want is to use a join as a filter for your delete statement, and that is not what the Cassandra model is built for.
AFAIK there is no way to do this in CQL. If you want to do it without changing the schema, run an external script in any language that has a driver for Cassandra.
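Such an external script can be quite short. A minimal sketch with the DataStax Python driver, assuming `session` is already connected with the keyspace set and that id is the partition key of users. The function name is illustrative only:

```python
def delete_expired_users(session):
    """Delete every users row whose id appears in expired_users.

    Does client-side what the SQL subquery would do on the server.
    """
    delete = session.prepare("DELETE FROM users WHERE id = ?")
    deleted = 0
    # Read the ids from one table and issue one delete per id on the other.
    for row in session.execute("SELECT id FROM expired_users"):
        session.execute(delete, (row.id,))
        deleted += 1
    return deleted
```

Each delete writes a tombstone, so this is better suited to modest numbers of expired users.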
Just thinking about this so please correct my understanding if any of this isn't right.
Environment: Apache Cassandra v3.0.0
Say you have a table and a materialized view created on it:
create table source(
id text, field text, stamp timestamp, data text,
primary key(id, field));
create materialized view myview as
select * from source
where data is not null and id is not null and field is not null
primary key (data, field, id);
My understanding is that myview.data would essentially be the partition key for the view here (and data in source is automatically replicated by the server into myview?).
If that is true, what happens internally when a table update is performed on source table and the source.data column is updated?
I posted this to Cassandra's user mailing list and got the following two useful replies that answered the question.
It should all just work as expected, as if by magic. That's the whole point of having MV, so that Cassandra does all the bookkeeping for you. Yes, the partition key can change, so an update to the base table can cause one (or more) MV rows to be deleted and one (or more) new MV rows to be created. It does not change the partition key per se, but it is as if it were changed and the row moved. This can in fact result in the row moving from one node to another if the column(s) used in the MV partition key change in the base table row.
-- Jack Krupansky
In the case of an update to the source table where data is changed, a tombstone will be generated for the old value and an insert will be generated for the new value. This happens serially for the source partition, so if there are multiple updates to the same partition, a tombstone will be generated for each intermediate value.
This blog post has more details: http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
-Carl Yeksigian
I have created a Cassandra table with 20 million records. Now I want to delete the expired data, where expiry is determined by a column that is not part of the primary key, but deleting by that column is not supported. So I tried to read the whole table and go through the data row by row to delete it. Unfortunately, the table is too large to retrieve in one go. I also can't simply delete the whole table. How can I achieve my goal?
Your question is actually how to get the data from the table in batches (also called pagination).
You can do that by selecting different slices of your primary key. For example, if your primary key is some sort of ID, select a range of IDs each time, process the results and do whatever you want to do with them, then get the next range, and so on.
Another way, which depends on the driver you're working with, is to use fetch_size. You can see a Python example here and a Java example here.
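With the DataStax Python driver, the fetch-size approach might look like the sketch below. The question does not give the table layout, so the table records(id, expires_at) and the `should_delete` callback are hypothetical. `session` is assumed to be a connected driver session, whose `default_fetch_size` attribute controls how many rows each page holds:

```python
def delete_where(session, should_delete, fetch_size=1000):
    """Scan a large table page by page and delete the rows that match.

    Hypothetical schema: records(id, expires_at) with id as the primary key.
    `should_delete` is a caller-supplied predicate taking one row.
    """
    # The driver fetches this many rows per page and requests the next
    # page transparently as the result set is iterated.
    session.default_fetch_size = fetch_size
    delete = session.prepare("DELETE FROM records WHERE id = ?")
    deleted = 0
    for row in session.execute("SELECT id, expires_at FROM records"):
        if should_delete(row):
            session.execute(delete, (row.id,))
            deleted += 1
    return deleted
```

For example, `delete_where(session, lambda r: r.expires_at < cutoff)` would remove everything older than a `cutoff` value chosen by the caller.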
Suppose we have such table:
create table users (
id text,
roles set<text>,
PRIMARY KEY ((id))
);
I want all the values of this table to be stored on the same Cassandra node (OK, not really one node but the same 3 replicas, with all the data mirrored, but you get the point). To achieve that, I want to change the table to look like this:
create table users_v2 (
partition int,
id text,
roles set<text>,
PRIMARY KEY ((partition), id)
);
How can I do that without losing the data from the first table?
It seems to be impossible to ALTER TABLE in order to add such a column. I'm OK with that.
What I am trying to do is copy the data from the first table and insert it into the second table.
When I do it as is, the partition column is missing, which is expected.
I can ALTER the first table and add a 'partition' column at the end, and then COPY in the correct order, but I can't update all the rows in the first table to set them all to the same partition value, and there seems to be no "default" value when a column is added.
You simply cannot alter the primary key of a Cassandra table. You need to create another table with your new schema and perform a data migration. I would suggest using Spark for that, since it makes a migration between two tables easy with only a few lines of code.
This also answers the question about altering the primary key.
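For a table small enough to scan from a single client, the same migration can also be sketched with a plain driver script instead of Spark. This assumes users_v2 has already been created, that `session` is a connected DataStax Python driver session with the keyspace set, and that every row goes into one constant partition value. The function name and the default value 0 are illustrative:

```python
def migrate_users(session, partition=0):
    """Copy every row of users into users_v2 under one partition value."""
    insert = session.prepare(
        "INSERT INTO users_v2 (partition, id, roles) VALUES (?, ?, ?)"
    )
    copied = 0
    for row in session.execute("SELECT id, roles FROM users"):
        # The constant partition value stands in for the missing default.
        session.execute(insert, (partition, row.id, row.roles))
        copied += 1
    return copied
```

Because inserts are upserts, re-running the script rewrites the same rows in users_v2 rather than duplicating them.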
If you don't have a lot of data in the table, there is another way.
In the DataStax DevCenter utility, select the table and use the command "Export All result to file as INSERT". It will save all the data from the table to a file as CQL INSERT statements.
Then drop the table, create a new one with the new PARTITION KEY, and finally fill it by running the statements from the file via CQL.