CQL ALTER TABLE not working (add column + WITH properties)

I'm trying to run the following CQL command:
ALTER TABLE keyspace_name_abc.table_name
ADD (field1 text, field2 text)
WITH default_time_to_live = 15552000;
But it is not working; the error I'm getting is:
SyntaxException: line 1:NN mismatched input 'WITH' expecting EOF (...field1 text, field2 text [WITH] ...)
If I run the commands separately, they work. Is there a limitation in CQL on combining multiple changes in one statement?
ALTER TABLE keyspace_name_abc.table_name
ADD (field1 text, field2 text);
ALTER TABLE keyspace_name_abc.table_name
WITH default_time_to_live = 15552000;
This way, the commands are accepted.

CQL's ALTER TABLE grammar doesn't let you combine a column change (ADD) with a table-option change (WITH) in a single statement, and default_time_to_live is a table-level option rather than something you can set for a subset of columns. So you'll have to separate out those statements and run them like:
ALTER TABLE keyspace_name_abc.table_name ADD (field1 text, field2 text);
ALTER TABLE keyspace_name_abc.table_name WITH default_time_to_live = 15552000;
If you require a non-default TTL for particular writes, you'll need to set it on the insert, for example:
INSERT INTO keyspace_name_abc.table_name (col1, col2, field1, field2) VALUES ('col1text', 'col2text', 'field1text', 'field2text') USING TTL 86400;
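To verify what was written, you can read the remaining TTL back with the ttl() function (a quick check; this assumes col1 and col2 make up the primary key, which the statements above imply but don't spell out):

SELECT ttl(field1), ttl(field2)
FROM keyspace_name_abc.table_name
WHERE col1 = 'col1text' AND col2 = 'col2text';  -- remaining TTL in seconds, counting down from 86400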

Is it possible to count no. of rows belonging to a partition if I use clustering columns

Updated after a comment from Jim.
I have a database with schema
field1 //partition key
field2 //clustering column
field3
I suppose Cassandra will calculate a hash on field1, decide which node this data entry will go to, and store it there. As I am using a clustering column, two data entries with the same value of field1 but different values of field2 will be stored as two rows.
field1, field2.1, field3
field1, field2.2, field3
Is it possible to create a query that would return the value 2 (the count of rows), as there are two rows belonging to the partition key field1?
Do a
select count(*) from table where field1 = 'x';
You should get 2 for the example shown in your question.
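As a worked example (a minimal sketch; the keyspace ks and table name mytable are placeholders, the schema follows the question):

CREATE TABLE ks.mytable (
    field1 text,   -- partition key
    field2 text,   -- clustering column
    field3 text,
    PRIMARY KEY (field1, field2)
);

INSERT INTO ks.mytable (field1, field2, field3) VALUES ('x', 'a', 'v1');
INSERT INTO ks.mytable (field1, field2, field3) VALUES ('x', 'b', 'v2');

SELECT count(*) FROM ks.mytable WHERE field1 = 'x';  -- returns 2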

Cassandra select distinct and order by cqlsh

I am new to Cassandra and to this forum. I'm executing Cassandra queries using cqlsh, but I don't know how to execute a query like the SQL select distinct a, b, c from table order by d asc in Cassandra. How can I do this? What would be the structure of the table?
Your primary key consists of partition keys and clustering columns.
DISTINCT queries must only request partition keys.
ORDER BY is supported on clustering columns.
Suppose we have a sample table like the following:
CREATE TABLE Sample (
    field1 text,
    field2 text,
    field3 text,
    field4 text,
    PRIMARY KEY ((field1, field2), field3)
);
DISTINCT requires all the partition key columns to be selected, comma separated.
So you can't run the query select distinct field1 from Sample;. A valid query would be select distinct field1, field2 from Sample;.
It internally hits all the nodes in the cluster to find all the partition keys, so if you have millions of partitions in your table, I would expect a performance drop on a multi-node cluster.
By default, records will be in ascending order of field3. The query below will return records in descending order of field3.
select * from Sample where field1 = 'a' and field2 = 'b' order by field3 desc;
If you already know your query patterns and the way you would require data to be ordered, you can design the table that way. Suppose you always require records in descending order of field3; you could have designed your table like this:
CREATE TABLE Sample (
    field1 text,
    field2 text,
    field3 text,
    field4 text,
    PRIMARY KEY ((field1, field2), field3)
) WITH CLUSTERING ORDER BY (field3 DESC);
Now querying without an ORDER BY clause will return the same descending result.
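For example, against the redefined table above:

SELECT * FROM Sample WHERE field1 = 'a' AND field2 = 'b';
-- rows come back in descending order of field3, no ORDER BY needed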
You can use order by with multiple clustering columns, but you can't skip the order. To understand that, let's use a sample table like the one below:
CREATE TABLE Sample1 (
    field1 text,
    field2 text,
    field3 text,
    field4 int,
    field5 int,
    PRIMARY KEY ((field1, field2), field3, field4)
);
I added a few dummy records.
You can order by multiple columns like this: select * from Sample1 where field1 = 'a' and field2 = 'b' order by field3 desc, field4 desc;
NOTE: All fields need to be either in ascending order (field3 asc, field4 asc) or descending order (field3 desc, field4 desc). You can't do (field3 asc, field4 desc) or vice versa.
The query above returns rows ordered by field3 descending, then field4 descending.
By writing that we can't skip the order in order by, I meant we can't do something like select * from Sample1 where field1 = 'a' and field2 = 'b' order by field4 desc;
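To summarize the valid and invalid combinations against Sample1 (a sketch; Cassandra accepts either the declared clustering order or its exact reverse, nothing else):

-- valid: both clustering columns, same direction, in declared order
SELECT * FROM Sample1 WHERE field1 = 'a' AND field2 = 'b' ORDER BY field3 DESC, field4 DESC;

-- invalid: mixed directions
SELECT * FROM Sample1 WHERE field1 = 'a' AND field2 = 'b' ORDER BY field3 ASC, field4 DESC;

-- invalid: skips field3
SELECT * FROM Sample1 WHERE field1 = 'a' AND field2 = 'b' ORDER BY field4 DESC;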
I hope this helps!

Cassandra Update/Upsert does not set Static Column?

I am trying to "upsert" data into my table with CQLSSTableWriter. Everything works fine, except that my static column is not being set correctly; it ends up null in every case. My static column is defined as brand TEXT static.
After failing with CQLSSTableWriter, I went into cqlsh and tried to update the static column manually:
update keyspace.data set brand='Nestle' where id = 'whatever' and date = '2015-10-07';
and with a batch as well (even though it should not matter)
begin batch
update keyspace.data set brand='Nestle' where id = 'whatever' and date = '2015-10-07';
apply batch;
My "brand" column still shows null when I retrieve some of my data (select * from keyspace.data LIMIT 100;)
My entire schema:
CREATE TABLE keyspace.data (
    id text,
    date text,
    ts timestamp,
    id_two text,
    brand text static,
    latitude double,
    longitude double,
    signals_double map<text, double>,
    signals_string map<text, text>,
    name text static,
    PRIMARY KEY ((id, date), ts, id_two)
) WITH CLUSTERING ORDER BY (ts ASC, id_two ASC);
The reason I chose UPDATE instead of INSERT is that I have collections that I do not want to overwrite, but rather add more elements to. Using INSERT would overwrite the previously stored elements of my collections.
Why can I not set a static column with an Update query?
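For reference, the append-vs-overwrite difference mentioned above looks like this in CQL (a sketch using the question's anonymized names; the map key 'speed' and the clustering values are made up):

-- UPDATE with "+" adds entries to the existing map:
UPDATE keyspace.data
SET signals_double = signals_double + {'speed': 12.5}
WHERE id = 'whatever' AND date = '2015-10-07'
AND ts = '2015-10-07 12:00:00' AND id_two = 'abc';

-- INSERT with a map literal replaces the whole map:
INSERT INTO keyspace.data (id, date, ts, id_two, signals_double)
VALUES ('whatever', '2015-10-07', '2015-10-07 12:00:00', 'abc', {'speed': 12.5});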

Cassandra does not support DELETE on indexed columns

Say I have a Cassandra table xyz with the following schema:
create table xyz (
    xyzid uuid,
    name text,
    fileid int,
    sid int,
    PRIMARY KEY (xyzid)
);
I create indexes on columns fileid and sid:
CREATE INDEX file_index ON xyz (fileid);
CREATE INDEX sid_index ON xyz (sid);
I insert data:
INSERT INTO xyz (xyzid, name, fileid, sid) VALUES (now(), 'p120', 1, 100);
INSERT INTO xyz (xyzid, name, fileid, sid) VALUES (now(), 'p120', 1, 101);
INSERT INTO xyz (xyzid, name, fileid, sid) VALUES (now(), 'p122', 2, 101);
I want to delete data using the indexed columns:
DELETE from xyz WHERE fileid=1 and sid=101;
Why do I get this error?
InvalidRequest: code=2200 [Invalid query] message="Non PRIMARY KEY fileid found in where clause"
Is it mandatory to specify the primary key in the WHERE clause for delete queries?
Does Cassandra support deletes using secondary indexes?
What has to be done to delete data using secondary indexes?
Any suggestions would help.
I am using DataStax Community Cassandra 2.1.8, but I also want to know whether deletes using indexed columns are supported in DataStax Community Cassandra 3.2.1.
Thanks
Let me try and answer your questions in order:
1) Yes, if you are going to use a WHERE clause in a CQL statement, the partition key must be restricted with an equality operator. Other than that, you are only allowed to filter on the clustering columns specified in your primary key (unless you have a secondary index).
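To make that concrete against the xyz table (a sketch; the uuid is a placeholder):

-- valid: the partition key restricted by equality
DELETE FROM xyz WHERE xyzid = 00000000-0000-0000-0000-000000000000;

-- invalid: a non-primary-key column in the WHERE clause
DELETE FROM xyz WHERE fileid = 1;  -- InvalidRequest, even though fileid is indexed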
2) No, it does not. See this post for more information, as it is essentially the same problem:
Why can cassandra "select" on secondary key, but not update using secondary key? (1.2.8+)
3) Why not add sid as a clustering column in your primary key? This would allow you to do the delete or the query using both, as you have shown:
create table xyz (
    xyzid uuid,
    name text,
    fileid int,
    sid int,
    PRIMARY KEY (xyzid, sid)
);
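With that schema, a delete by the full primary key works (a sketch; the uuid is a placeholder for a real xyzid):

DELETE FROM xyz WHERE xyzid = 00000000-0000-0000-0000-000000000000 AND sid = 101;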
4) In general, using secondary indexes is considered an anti-pattern (a bit less so with SASI indexes in C* 3.4), so my question is: can you add these fields as clustering columns to your primary key? How are you querying these secondary indexes?
I suppose you can perform the delete in two steps:
1) Select data by the secondary index and get the primary key column values (xyzid) from the query result.
2) Perform the delete by those primary key values.
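A sketch of those two steps (the uuid is a placeholder for whatever step 1 returns; restricting two indexed columns at once requires ALLOW FILTERING):

-- step 1: look up the primary keys via the indexed columns
SELECT xyzid FROM xyz WHERE fileid = 1 AND sid = 101 ALLOW FILTERING;

-- step 2: delete each returned row by its primary key
DELETE FROM xyz WHERE xyzid = 00000000-0000-0000-0000-000000000000;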

Choosing the right schema for cassandra "table" in CQL3

We are trying to store lots of attributes for a particular profile_id inside a table (using CQL3) and cannot wrap our heads around which approach is best:
a. create table mytable (profile_id, a1 int, a2 int, a3 int, a4 int ... a3000 int) primary key (profile_id);
OR
b. create MANY tables, e.g.
create table mytable_a1(profile_id, value int) primary key (profile_id);
create table mytable_a2(profile_id, value int) primary key (profile_id);
...
create table mytable_a3000(profile_id, value int) primary key (profile_id);
OR
c. create table mytable (profile_id, a_all text) primary key (profile_id);
and just store 3000 "columns" inside a_all, like:
insert into mytable (profile_id, a_all) values (1, "a1:1,a2:5,a3:55, .... a3000:5");
OR
d. none of the above
The type of query we would be running on this table:
select * from mytable where profile_id in (1,2,3,4,5423,44)
We tried the first approach, and the queries keep timing out and sometimes even kill Cassandra nodes.
The answer would be to use a clustering column. A clustering column allows you to create dynamic columns that you can use to hold the attribute name (column name) and its value (column value).
The table would be:
create table mytable (
    profile_id text,
    attr_name text,
    attr_value int,
    PRIMARY KEY (profile_id, attr_name)
);
This allows you to do inserts like:
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a1', 3);
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a2', 1031);
.....
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'an', 2);
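Reading all attributes for a profile back is then a single-partition query (a sketch against the table above):

select attr_name, attr_value from mytable where profile_id = '131';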
This would be the optimal solution.
Because you then want to do the following:
'The type of query we would be running on this table: select * from mytable where profile_id in (1,2,3,4,5423,44)'
This would require 6 queries under the hood, but Cassandra should be able to handle this in no time, especially if you have a multi-node cluster.
Also, if you use the DataStax Java Driver, you can run these requests asynchronously and concurrently on your cluster.
For more on data modelling and the DataStax Java Driver, check out DataStax's free online training. It's worth a look:
http://www.datastax.com/what-we-offer/products-services/training/virtual-training
Hope it helps.
