Alter Cassandra column family primary key using cassandra-cli or CQL

I am using Cassandra 1.2.5. After creating a column family in Cassandra using cassandra-cli, is it possible to modify the primary key on the column family using either cassandra-cli or CQL?
Specifically, I currently have the following table (from CQL):
CREATE TABLE "table1" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
);
I would like the table to be as follows, without having to drop and recreate the table:
CREATE TABLE "table1" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key)
);
Is this possible through either cassandra-cli or CQL?

The primary key directly determines how and where Cassandra stores the data contained in a table (column family). The primary key consists of a partition key and an optional clustering key.
The partition key determines which node stores the data; it is responsible for data distribution across the nodes. The additional columns determine per-partition clustering (see the compound key documentation).
So changing the primary key will always require all data to be migrated. I do not think that either cqlsh or cassandra-cli has a command for this (as of 2015).
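If you do end up migrating by hand, a minimal sketch could look like the following (the name table1_new is made up, and cqlsh COPY is generally only practical for modest data volumes; sstableloader is the usual choice for bigger tables):
-- Create a parallel table with the desired primary key (hypothetical name)
CREATE TABLE "table1_new" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key)
);
-- Copy the data across with cqlsh COPY
COPY "table1" (key, column1, value) TO '/tmp/table1.csv';
COPY "table1_new" (key, column1, value) FROM '/tmp/table1.csv';
-- Note: with PRIMARY KEY (key) alone, rows that shared a key but had
-- different column1 values collapse into a single row (last write wins).
-- Once verified, drop the old table; CQL has no table rename,
-- so clients have to switch to the new name.
DROP TABLE "table1";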

Related

Retrieve rows based on column of type "time" in cassandra db

How can we retrieve rows based on a column of type "time" in a Cassandra DB?
We tried this query:
select *
from payment_transactions_by_transactiondate
where transaction_time>='00:00:00'
and transaction_time<='23:59:59'
and transaction_date='2018-03-21'
allow filtering;
but it's not fetching the rows (transaction_time is part of the primary key).
You cannot do a range query on the partition key, because Cassandra distributes data across nodes based on the hash of the partition key. What you can do instead is make transaction_time a clustering key (see the difference between a partition key and a clustering key). From the above query, it seems you need the transactions for a particular date (transaction_date), so make transaction_date the partition key and transaction_time the clustering key.
For example:
create table payment_transactions_by_transactiondate(
....
....
....
primary key (transaction_date, transaction_time)
);
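With a table shaped like that, the date becomes an equality restriction on the partition key and the time range moves onto the clustering column, so ALLOW FILTERING is no longer needed (column types assumed to be date and time, as in the question):
SELECT *
FROM payment_transactions_by_transactiondate
WHERE transaction_date = '2018-03-21'
  AND transaction_time >= '00:00:00'
  AND transaction_time <= '23:59:59';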

Cassandra: Is partition key also used in clustering?

Let's say I have a primary key like this: primary key (PK, CK).
Based on what I read (see refs), I think I can loosely describe the way Cassandra uses PK and CK as follows: PK will be used to decide which node(s) the data should go to, and CK will be used for clustering (i.e. ordering) of data within that node.
Then, it seems PK is not used in clustering data within the node, and that sounds wrong. What if I have a simple primary key with just PK? Will Cassandra only distribute data across nodes and not order data within each node, since there is no clustering column?
refs:
https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html
Difference between partition key, composite key and clustering key in Cassandra?
Then, it seems PK is not used in clustering data within the node and
that sounds wrong. What if I have a simple primary key with just PK?
Will Cassandra only distribute data across nodes and not order data
within each node since there is no clustering column?
Good question. Let's try this out. I'll create a simple table and INSERT some data:
aploetz@cqlsh:stackoverflow> CREATE TABLE programs
(name text PRIMARY KEY, data text);
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Tron');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Yori');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Quorra');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Clu');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Flynn');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Zuze');
Now, let's run a query that should answer your question:
aploetz@cqlsh:stackoverflow> SELECT name, token(name) FROM programs;
name | system.token(name)
--------+----------------------
Flynn | -1059892732813900311
Zuze | 1815531347795840810
Yori | 2854211700591734382
Quorra | 3079126743186967718
Tron | 6359222509420865788
Clu | 8304850648940574176
(6 rows)
As you can see, they are definitely not in order by name, which is the partition key and lone PRIMARY KEY. But, my query runs the token() function on name, which shows the hashed value of the partition key (name in this case). The results are ordered by that.
So to answer your question, Cassandra orders its partitions by the hashed value of the partition key. Note that this order is maintained throughout the cluster, not just on a single node. Therefore, results for an unbound query (not recommended to be run in a multi-node configuration) will be ordered by the hashed value of the partition key, regardless of the number of nodes in the cluster.
All data for a table is written to its SSTables in order of the (hashed) partition key, so yes, they are sorted.
I think what you're asking is why you can't use a partition key the same way you use a clustering key, for example why you can't do less-than (<) or greater-than (>) comparisons on it. Since one node doesn't have all the partition keys, that type of query would have to check with every node in your cluster to see whether it has any partition key matching your query.
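A small illustration of that limitation against the programs table above (the exact error message varies by version, so it is only paraphrased in the comments):
-- Rejected: a range predicate directly on the partition key
SELECT * FROM programs WHERE name > 'Flynn';
-- Accepted: a range on the token of the partition key,
-- which follows the hashed order shown in the output above
SELECT * FROM programs WHERE token(name) > token('Flynn');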

Cassandra - Internal data storage when no clustering key is specified

I'm trying to understand the scenario when no clustering key is specified in a table definition.
If a table has only a partition key and no clustering key, in what order are the rows under the same partition stored? Is it even allowed to have multiple rows in the same partition when no clustering key exists? I tried searching for it online but couldn't find a clear explanation.
I got the below explanation from Cassandra user group so posting it here in case someone else is looking for the same info:
"Note that a table always has a partition key, and that if the table has
no clustering columns, then every partition of that table is only
comprised of a single row (since the primary key uniquely identifies
rows and the primary key is equal to the partition key if there is no
clustering columns)."
http://cassandra.apache.org/doc/latest/cql/ddl.html#the-partition-key
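A quick sketch of what that means in practice (table and values here are invented): with only a partition key, writing the same key twice is just an upsert, so each partition holds exactly one row:
CREATE TABLE users_by_id (
  user_id text PRIMARY KEY,  -- partition key only, no clustering columns
  name text
);
INSERT INTO users_by_id (user_id, name) VALUES ('u1', 'Alice');
INSERT INTO users_by_id (user_id, name) VALUES ('u1', 'Bob');  -- same partition: upsert
-- Returns a single row ('u1', 'Bob'); there is no second row to order within the partition
SELECT * FROM users_by_id WHERE user_id = 'u1';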

Convert dynamic Cassandra column family to static one

Let's say I have a column family in Cassandra that was created using cassandra-cli like this:
create column family users with key_validation_class = UTF8Type and comparator = UTF8Type;
In terms of the Thrift-to-CQL3 migration guide from DataStax, this is a dynamic column family.
When viewed from a CQL3 client using DESCRIBE TABLE users, it looks like this:
CREATE TABLE users (
  key text,
  column1 text,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
  AND CLUSTERING ORDER BY (column1 ASC);
That is the expected behavior. What I want is to add column metadata so that the column family is viewed as static.
So I tried this using cassandra-cli:
update column family users
with column_metadata = [{column_name: email, validation_class: UTF8Type}];
However the end result in CQL3 is not what I wanted:
CREATE TABLE users (
  key text,
  column1 text,
  value blob,
  email text,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
  AND CLUSTERING ORDER BY (column1 ASC);
What I expected is the same result as when I create the column family with the metadata from the beginning:
create column family users2
with key_validation_class = UTF8Type
and comparator = UTF8Type
and column_metadata = [{column_name: email, validation_class: UTF8Type}];
In that case the CQL3 view of this is what I want:
CREATE TABLE users2 (
  key text PRIMARY KEY,
  email text
) WITH COMPACT STORAGE;
Is there some way I can add column metadata to a column family that was created without any, so that it is viewed from CQL3 the same way as if the metadata had been provided when the column family was created? Without re-creating the column family, of course.
It's not possible to create a static column using the old Thrift API. In fact, a static column is just a trick: a column with a NULL clustering value, so that there is only one instance of it for each partition key.
See these two slides for the explanation (sorry, the text is in French):
http://www.slideshare.net/doanduyhai/cassandra-techniques-de-modlisation-avance/218
http://www.slideshare.net/doanduyhai/cassandra-techniques-de-modlisation-avance/219
You should take this opportunity to migrate to CQL. Thrift is deprecated and even disabled by default starting with Cassandra 3.x.
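For reference, this is what that answer means by a static column when it is declared natively in CQL (the table and column names here are made up); the STATIC column is stored once per partition rather than once per clustering row:
CREATE TABLE user_events (
  user_id text,
  event_id text,
  user_email text STATIC,  -- one shared value for the whole partition
  payload blob,
  PRIMARY KEY (user_id, event_id)
);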
OK, I see what you mean. Look at the system keyspace, table schema_columnfamilies.
I think the labels of the partition keys and clustering columns are stored there.
It may be possible to change them, but I don't know if it's a good idea to hack into those meta tables directly.
If you have n nodes, you'll probably need to update the labels on all of those nodes, since the system keyspace uses LocalStrategy.
Execute this query to see the actual labels:
SELECT key_aliases, key_validator, column_aliases, comparator
FROM system.schema_columnfamilies
WHERE keyspace_name = 'xxx'
  AND columnfamily_name = 'users';

How to add multiple columns as primary keys in Cassandra?

I have an existing table with millions of records. Initially we had two columns as the partitioning key and clustering key, and now I want to add two more columns to the table as partitioning keys.
How?
If you make a change to the partition key, you will need to create a new table and import the existing data. This is due, in part, to the fact that a partition key is not the same as a primary key in a relational database. The partition key is hashed by Cassandra, and that hash is used to find partitions on disk. If you change the partition key, you change the hash value and can no longer look up the partition!
CREATE TABLE KEYSPACE_NAME.AMAR_EXAMPLE (
  COLUMN_1 TYPE,
  COLUMN_2 TYPE,
  COLUMN_3 TYPE,
  ...
  COLUMN_N TYPE,
  // Here we declare the partition key columns and clustering columns
  PRIMARY KEY ((COLUMN_1, COLUMN_2, COLUMN_3, COLUMN_4), CLUSTERING_COLUMN)
)
// If you need to change the default clustering order, declare that here
WITH CLUSTERING ORDER BY (CLUSTERING_COLUMN DESC);
You could export the data to CSV using COPY and then import the data into the new table via COPY, or use sstableloader. There is plenty of documentation and walkthroughs on how to use those tools; for example, this DataStax blog post talks about the changes made to the updated sstableloader. If you create a new table and import the existing data, you will create new partitions and new hashes. Cassandra will not let you simply add additional columns to the partition key after the table has been created.
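A rough sketch of that COPY round trip (the target table name AMAR_EXAMPLE_NEW, the column list, and the file path are all made up for illustration):
// Export the relevant columns from the existing table (run in cqlsh)
COPY KEYSPACE_NAME.AMAR_EXAMPLE (COLUMN_1, COLUMN_2, COLUMN_3, COLUMN_4, CLUSTERING_COLUMN) TO '/tmp/amar_example.csv';
// Import into the new table that declares the expanded partition key
COPY KEYSPACE_NAME.AMAR_EXAMPLE_NEW (COLUMN_1, COLUMN_2, COLUMN_3, COLUMN_4, CLUSTERING_COLUMN) FROM '/tmp/amar_example.csv';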
Understanding your data and Cassandra data modeling techniques will help mitigate the amount of work you may find yourself doing when changing partition keys. Check out the self-paced courses provided by DataStax; DS220: Data Modeling could really help.
