I have a partition key: A
Clustering columns: B, C
I do understand I can query like this
Select * from table where A = ?
Select * from table where A = ? and B = ?
Select * from table where A = ? and B = ? and C = ?
Now I have a scenario where I need to fetch results using only B and C. Is there a way to do this without using ALLOW FILTERING?
You cannot fetch on the basis of 'B' and 'C' (the clustering columns) without the partition key unless you use ALLOW FILTERING. You can, however, use Spark and the spark-cassandra-connector to filter the results on 'B' and 'C'. Behind the scenes it still effectively reads the whole table (much like ALLOW FILTERING would), but the connector has an efficient mechanism to scan the table the right way: the work is split by token range and done in parallel across the cluster.
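For example, here is a rough sketch of that approach using the connector's Java API, doing the B/C filtering on the Spark side after the parallel scan. The keyspace and table names ("ks", "my_table"), the contact point, and the lowercase column names "b" and "c" are placeholders, not taken from your schema.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.rdd.CassandraJavaRDD;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

public class FilterByClusteringColumns {
    public static void main(String[] args) {
        // Placeholder Spark/Cassandra connection settings.
        SparkConf conf = new SparkConf()
                .setAppName("filter-by-b-and-c")
                .set("spark.cassandra.connection.host", "127.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Full table scan, split by token range across the Spark workers.
        CassandraJavaRDD<CassandraRow> rows = javaFunctions(sc).cassandraTable("ks", "my_table");

        // Keep only the rows whose clustering columns match the wanted values.
        long matches = rows
                .filter(row -> "someB".equals(row.getString("b"))
                            && "someC".equals(row.getString("c")))
                .count();

        System.out.println("Rows matching B and C: " + matches);
        sc.stop();
    }
}

The connector can also push predicates down with its where() method, but either way every node still has to be scanned, because without the partition key Cassandra cannot narrow the read down to specific partitions.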
I'm trying to select everything where two columns contain equal values. Here is my CQL query:
select count(someColumn) from somekeySpace.sometable where columnX = columnY
This doesn't work. How can I do this?
You can't query like that; Cassandra doesn't support comparing one column to another in a WHERE clause.
You can do this in a different way.
First you have to create a separate counter table.
CREATE TABLE match_counter(
partition int PRIMARY KEY,
count counter
);
At the time of insertion into your main table, if columnX = columnY, increment the value here. Since you only need a single count, you can use a static value for partition:
UPDATE match_counter SET count = count + 1 WHERE partition = 1;
Now you can get the count of matching rows:
SELECT * FROM match_counter WHERE partition = 1;
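If you are writing through a driver, the conditional increment is just an if around the counter update. Here is a minimal sketch with the DataStax Java driver (my choice, not something from your setup), assuming the keyspace is somekeyspace and that columnX/columnY are ints:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class MatchCounterExample {
    public static void main(String[] args) {
        // Placeholder contact point and keyspace name.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("somekeyspace");

        int columnX = 42;
        int columnY = 42;

        // Insert into your main table as usual (statement omitted), then bump
        // the counter only when the two column values are equal.
        if (columnX == columnY) {
            session.execute("UPDATE match_counter SET count = count + 1 WHERE partition = 1");
        }

        // Read the running total back.
        long total = session.execute("SELECT count FROM match_counter WHERE partition = 1")
                .one().getLong("count");
        System.out.println("Matching rows so far: " + total);

        cluster.close();
    }
}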
I have Cassandra version 2.0 and I am totally new to it, so here is the question...
I have table T1, with columns named 1, 2, 3, ..., 14 (for simplicity);
The partition key is columns 1, 2;
The clustering key is columns 3, 1, 5;
I need to perform the following query:
SELECT 1,2,7 FROM T1 where 2='A';
Column 2 is a flag, so values are repeating.
I get the following error:
Unable to execute CQL query: Partitioning column 2 cannot be restricted because the preceding column 1 is either not restricted or is restricted by a non-EQ relation
So what is the right way to do it? I really need to get the data already filtered. Thanks.
So, to make sure I understand your schema, you have defined a table T1:
CREATE TABLE T1 (
1 INT,
2 INT,
3 INT,
...
14 INT,
PRIMARY KEY ((1, 2), 3, 1, 5)
);
Correct?
If this is the case, then Cassandra cannot find the data to answer your CQL query:
SELECT 1,2,7 FROM T1 where 2 = 'A';
because your query does not provide a value for column "1". Without that value, Cassandra cannot compute the partition key (which, per your composite PRIMARY KEY definition, requires both columns "1" and "2"), and without the partition key it cannot determine which nodes in the ring hold the data. By including "2" in your partition key, you are telling Cassandra that that value is required to determine where to store (and thus where to read) the row.
For example, given your schema, this query should work:
SELECT 7 FROM T1 WHERE 1 = 'X' AND 2 = 'A';
since you are providing both values of your partition key.
@Caleb Rockcliffe has good advice, though, regarding the need for other, secondary/supplemental lookup mechanisms if the above table definition covers a big part of your workload. You may need some way to first look up the values for "1" and "2", and then issue your query. E.g.:
CREATE TABLE T1_index (
1 INT,
2 INT,
PRIMARY KEY (1, 2)
);
Given a value for "1", the above will provide all of the possible "2" values, through which you can then iterate:
SELECT 2 FROM T1_index WHERE 1 = 'X';
And then, for each "1" and "2" combination, you can then issue your query against table T1:
SELECT 7 FROM T1 WHERE 1 = 'X' AND 2 = 'A';
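If it helps, here is a rough sketch of that two-step lookup with the DataStax Java driver (my choice; the contact point and keyspace name are placeholders). The numeric column names from the simplified schema have to be written as quoted identifiers, and I kept the text values used in the example queries:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TwoStepLookup {
    public static void main(String[] args) {
        // Placeholder connection settings.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        // Step 1: find all "2" values for a known "1".
        ResultSet keys = session.execute("SELECT \"2\" FROM T1_index WHERE \"1\" = ?", "X");

        // Step 2: for each ("1", "2") pair, query the main table.
        for (Row key : keys) {
            ResultSet data = session.execute(
                    "SELECT \"7\" FROM T1 WHERE \"1\" = ? AND \"2\" = ?",
                    "X", key.getString("2"));
            for (Row row : data) {
                System.out.println(row.getString("7"));
            }
        }

        cluster.close();
    }
}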
Hope this helps!
Your WHERE clause needs to include the first element of the partition key.
I have a column family whose definition is as follows:
create column family Message with key_validation_class ='UTF8Type' and default_validation_class = 'UTF8Type'
Its row key is a unique id, and two columns are stored in each row:
message : a string message
created_dt : the date time when this row was created in cassandra
Now my requirement is to move and delete all messages that have been there for more than a year. I do not want to completely delete that data, but rather move it from the working Cassandra cluster to another one, which is used for archival.
Are there any tools/scripts that can help achieve this?
If I have to write code using Hector, how can this be done efficiently? How do I figure out the keys that have created_dt < current_dt - 1 year?
I have the following Cassandra column family:
create column family cfn
with comparator = UTF8Type
and key_validation_class = UUIDType
and column_metadata = [
{column_name: email, validation_class: UTF8Type, index_type: KEYS},
{column_name: full_name, validation_class: UTF8Type}
];
I want to update "full_name" for a given "email", but I don't know the row key; I only have the "email". How can I do that using the Hector Thrift API?
I know that I will have to insert a new column, as there is no in-place update in Cassandra. Will it be necessary to get the row key before inserting the new column for the same row?
Inserting using cassandra-cli or Hector is the most basic thing. Maybe you need to brush up on your Cassandra basics; Read This.
Try this using the CLI:
SET cfn[RowKey]['full_name']='XYZ';
/*
 * Remember you have defined your key as UUIDType, so the RowKey you provide
 * must be a UUID, e.g. 8aff3820-1e55-11b2-a248-41825ac3edd8.
 */
SET cfn[RowKey]['email']='XYZ@GMAIL.COM';
To retrieve the data:
list cfn;
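Since the question asks about Hector specifically, here is a rough sketch (not from the answer above) that uses the KEYS index on email to find the row key and then re-inserts full_name under that key. The Keyspace object is assumed to be already set up, and the class/method names here are mine:

import java.util.UUID;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.IndexedSlicesQuery;

public class UpdateFullNameByEmail {
    public static void updateFullName(Keyspace keyspace, String email, String newFullName) {
        // Look up the row key(s) through the secondary index on "email".
        IndexedSlicesQuery<UUID, String, String> query = HFactory.createIndexedSlicesQuery(
                keyspace, UUIDSerializer.get(), StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily("cfn");
        query.addEqualsExpression("email", email);
        query.setColumnNames("email", "full_name");

        OrderedRows<UUID, String, String> rows = query.execute().get();

        // Writing the "full_name" column again under the same row key overwrites
        // the old value, which is how an "update" works here.
        Mutator<UUID> mutator = HFactory.createMutator(keyspace, UUIDSerializer.get());
        for (Row<UUID, String, String> row : rows) {
            mutator.addInsertion(row.getKey(), "cfn",
                    HFactory.createStringColumn("full_name", newFullName));
        }
        mutator.execute();
    }
}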