How to make the query work? - cassandra

I am using Cassandra version 2.0 and am totally new to it, so here is my question...
I have table T1, with columns with names: 1,2,3...14 (for simplicity);
The partition key is columns 1, 2;
The clustering key is columns 3, 1, 5;
I need to perform following query:
SELECT 1,2,7 FROM T1 where 2='A';
Column 2 is a flag, so values are repeating.
I get the following error:
Unable to execute CQL query: Partitioning column 2 cannot be restricted because the preceding column 1 is either not restricted or is restricted by a non-EQ relation
So what is the right way to do it? I really need to get data that is already filtered. Thanks.

So, to make sure I understand your schema, you have defined a table T1:
CREATE TABLE T1 (
1 INT,
2 INT,
3 INT,
...
14 INT,
PRIMARY KEY ((1, 2), 3, 1, 5)
);
Correct?
If this is the case, then Cassandra cannot find the data to answer your CQL query:
SELECT 1,2,7 FROM T1 where 2 = 'A';
because your query has not provided a value for column "1". Without it, Cassandra cannot compute the partition key (which, per your composite PRIMARY KEY definition, requires both columns "1" and "2"), and so it cannot determine which nodes in the ring hold the data. By including "2" in your partition key, you are telling Cassandra that its value is required to determine where to store (and thus where to read) that data.
For example, given your schema, this query should work:
SELECT 7 FROM T1 WHERE 1 = 'X' AND 2 = 'A';
since you are providing both values of your partition key.
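To make the reasoning concrete, here is a minimal toy model in Python of how a composite partition key maps to a node. It is only a sketch: `toy_token` and `toy_replica` are hypothetical names, MD5 stands in for whatever hash the configured partitioner uses, and the modulo placement is a simplification of the real token ring.

```python
import hashlib

def toy_token(*partition_key_values):
    """Hash ALL partition key components into one integer token (toy stand-in)."""
    if any(value is None for value in partition_key_values):
        # With a component missing there is nothing complete to hash,
        # so no token (and therefore no node) can be computed.
        raise ValueError("every partition key component must be supplied")
    joined = "\x00".join(str(value) for value in partition_key_values)
    digest = hashlib.md5(joined.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def toy_replica(token, node_count=4):
    """Simplified placement: pick a node index from the token."""
    return token % node_count
```

Calling `toy_token('X', 'A')` always gives the same token, so the read goes to the same place as the write, while `toy_token(None, 'A')` fails, which mirrors the error in the question.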
@Caleb Rockcliffe has good advice, though, regarding the need for other, secondary/supplemental lookup mechanisms if the above table definition is a big part of your workload. You may need to find some way to first look up the values for "1" and "2", then issue your query. E.g.:
CREATE TABLE T1_index (
1 INT,
2 INT,
PRIMARY KEY (1, 2)
);
Given a value for "1", the above will provide all of the possible "2" values, through which you can then iterate:
SELECT 2 FROM T1_index WHERE 1 = 'X';
And then, for each "1" and "2" combination, you can then issue your query against table T1:
SELECT 7 FROM T1 WHERE 1 = 'X' AND 2 = 'A';
Hope this helps!

Your WHERE clause needs to include the first element of the partition key.

Related

How do I select all rows from two clustering columns in cassandra database

I have a Partition key: A
Clustering columns: B, C
I do understand I can query like this
Select * from table where A = ?
Select * from table where A = ? and B = ?
Select * from table where A = ? and B = ? and C = ?
Now I have a scenario where I need to fetch results using only B and C. Is there a way to do this without using ALLOW FILTERING?
You cannot query on 'B' and 'C' (the clustering columns) alone, without the partition key, unless you use ALLOW FILTERING. You can, however, use Spark with the spark-cassandra-connector to filter the results on 'B' and 'C'. Behind the scenes it also uses ALLOW FILTERING, but it has an efficient mechanism for scanning the table.

Cassandra schema - select by frequently updated column

Given table:
CREATE TABLE T (
a int,
last_modification_time timestamp,
b int,
PRIMARY KEY (a)
);
I'm frequently updating records. With each update last_modification_time is set to now() and also other fields are set.
What is the right cassandra approach to be able to query by last_modification_time range? I need to query like this:
select * from .. where a=Z and last_modification_time < X and last_modification_time > Y;
One way would be to create materialized view with PRIMARY KEY (a, last_modification_time) but I want to avoid this since materialized views are buggy in 3.X cassandra versions.
What would be alternative way of querying by last_modification_time range given last_modification_time is frequently updated?
How about having two tables? One could hold the current snapshot where you're updating the last_modification_time field and another one which holds the changes over time (something like a history table)? You could write to both of them using BATCH statements.
CREATE TABLE t_modifications (
a int,
last_modification_time timestamp,
b int,
PRIMARY KEY (a, last_modification_time)
) WITH CLUSTERING ORDER BY (last_modification_time DESC);
BEGIN BATCH
UPDATE T SET last_modification_time = 123, b = 4 WHERE a = 2;
INSERT INTO t_modifications (a, last_modification_time, b) values (2, 123, 4);
APPLY BATCH;
If you're interested in the latest snapshot within a given modification range, you can select from the t_modifications table and limit the result:
SELECT * FROM t_modifications WHERE a = 2 AND last_modification_time < 136 LIMIT 1;
In general, to do range queries like this, the field you want to range on has to be part of the composite key, has to be the right-most element of the composite key, and all other elements in the composite key have to be specified. In your case, you would modify your PRIMARY KEY to (a, last_modification_time). You can then run:
SELECT * from t_modifications
WHERE a = aval
AND last_modification_time > beg
AND last_modification_time < end;
This will get you all records for aval between beg and end.
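The reason a range on the clustering column is cheap can be illustrated with a small Python toy model. It is a sketch under assumptions: `range_scan` is a hypothetical helper, and a sorted list of tuples stands in for the rows of one partition, which Cassandra keeps ordered by clustering column.

```python
import bisect

def range_scan(rows, beg, end):
    """Return rows whose clustering value v satisfies beg < v < end.

    rows: list of (clustering_value, payload) tuples, already sorted
    by clustering_value, as the rows of a single partition would be.
    """
    keys = [clustering_value for clustering_value, _ in rows]
    lo = bisect.bisect_right(keys, beg)  # first index with value > beg
    hi = bisect.bisect_left(keys, end)   # first index with value >= end
    return rows[lo:hi]                   # one contiguous slice, no filtering
```

Because the rows are sorted, the result is a single contiguous slice found by binary search. That only works once the partition (the value of `a`) has been pinned down by an equality restriction.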

How do I select all rows for a clustering column in cassandra?

I have a Partition key: A
Clustering columns: B, C
I do understand I can query like this
Select * from table where A = ?
Select * from table where A = ? and B = ?
Select * from table where A = ? and B = ? and C = ?
In certain cases, I want the B value to be any value in that column.
Is there a way I can query like the following?
Select * from table where A = ? and B = 'any value' and C = ?
Option 1:
In Cassandra, you should design your data model to suit your queries. Therefore the proper way to support your fourth query (queries by A and C, but not necessarily knowing B value), is to create a new table to handle that specific query. This table will be pretty much the same, except the CLUSTERING COLUMNS will be in slightly different order:
PRIMARY KEY (A, C, B)
Now this query will work:
Select * from table where A = ? and C = ?
Option 2:
Alternatively you can create a materialized view, with a different clustering order. Now Cassandra will keep the MV in sync with your table data.
create materialized view mv_acbd as
select A, B, C, D
from TABLE1
where A is not null and B is not null and C is not null
primary key (A, C, B);
Now the query against this MV will work like a charm:
Select * from mv_acbd where A = ? and C = ?
Option 3:
Not the best, but you could use the following query with your table as it is
Select * from table where A = ? and C = ? ALLOW FILTERING
Relying on ALLOW FILTERING is never a good idea, and is certainly not something that you should do in a production cluster. In this particular case the scan stays within a single partition, so performance will vary with how many rows (clustering key combinations) each partition holds in your use case.
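The difference between Option 3 and the reordered key of Options 1 and 2 can be sketched with a Python toy model. The function names and the in-memory lists are assumptions for illustration only; they model one partition's rows, not Cassandra's storage engine.

```python
import bisect

def filter_by_c(rows_sorted_by_b_c, c):
    """Model of ALLOW FILTERING: rows are ordered by (B, C), so every row is read."""
    scanned = 0
    matches = []
    for b_value, c_value in rows_sorted_by_b_c:
        scanned += 1
        if c_value == c:
            matches.append((b_value, c_value))
    return matches, scanned

def slice_by_c(rows_sorted_by_c_b, c):
    """Model of PRIMARY KEY (A, C, B): matching rows form one contiguous slice."""
    keys = [c_value for c_value, _ in rows_sorted_by_c_b]
    lo = bisect.bisect_left(keys, c)
    hi = bisect.bisect_right(keys, c)
    return rows_sorted_by_c_b[lo:hi]
```

With the (B, C) ordering the number of rows read equals the partition size regardless of how many match. With the (C, B) ordering only the matching rows are touched.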

Updating a Column in Cassandra based on Where Clause

I have a very simple table
cqlsh:hell> describe columnfamily info ;
CREATE TABLE info (
nos int,
value map<text, text>,
PRIMARY KEY (nos)
)
The following is the query with which I am trying to update the value.
update info set value = {'count' : '0' , 'onget' : 'function onget(value,count) { count++ ; return {"value": value, "count":count} ; }' } where nos <= 1000 ;
Bad Request: Invalid operator LTE for PRIMARY KEY part nos
Whichever operator I use to specify the constraint, it complains that the operator is invalid. I am not sure what I am doing wrong here; according to the cassandra 3.0 cql doc, there are similar update queries.
The following is my version
[cqlsh 4.1.0 | Cassandra 2.0.3 | CQL spec 3.1.1 | Thrift protocol 19.38.0]
I have no idea what's going wrong.
The answer is really in my comment, but it needs a bit of elaboration. To restate from the comment...
The first predicate of the where clause has to uniquely identify the partition key. In your case, since the primary key is only one column the partition key == the primary key.
Cassandra can't do range scans over partitions. In the language of CQL, a partition is a potentially wide storage row that is uniquely identified by a key. In this case, the values in your nos column. The values of the partition keys are hashed into tokens which explicitly identify where that data lives in the cluster. Since that hash has no order to it, Cassandra cannot use any operator other than equality to route a statement to the correct destination. This isn't a primary key index that could potentially be updated, it is the fundamental partitioning mechanism in Cassandra. So, you can't use inequality operators as the first clause of a predicate. You can use them in subsequent clauses because the partition has been identified and now you're dealing with an ordered set of columns.
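A small Python sketch shows why hashing rules out range predicates on the partition key. It is a toy model: `toy_token` is a hypothetical helper, and MD5 is only a stand-in for the cluster's actual partitioner hash.

```python
import hashlib

def toy_token(partition_key):
    """Map a partition key to an integer token (toy stand-in for the partitioner)."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def keys_in_token_order(keys):
    """Order keys the way they would be laid out around the ring: by token."""
    return sorted(keys, key=toy_token)
```

Running `keys_in_token_order(range(1, 1001))` returns the same thousand keys, but shuffled relative to their numeric order. Keys satisfying `nos <= 1000` are therefore scattered across the whole ring instead of forming one contiguous range.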
You can't use a non-equality condition on the partition key (nos is your partition key).
http://cassandra.apache.org/doc/cql3/CQL.html#selectWhere
Cassandra currently does not support user defined functions inside a query such as the following.
update info set value = {'count' : '0' , 'onget' : 'function onget(value,count) { count++ ; return {"value": value, "count":count} ; }' } where nos <= 1000 ;
First, can you push this onget function into the application layer? You can first query all the rows for which nos < 1000, then update those rows via some batch query.
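One way to do this from the application layer is to issue one equality-restricted UPDATE per key. The sketch below assumes the DataStax Python driver, whose `Session.execute(query, parameters)` accepts `%s` placeholders for simple statements, and it assumes the application already knows which `nos` values exist. The helper name `reset_values` is hypothetical.

```python
def reset_values(session, nos_values, new_value):
    """Apply the same map value to each known partition key, one UPDATE per key.

    session: an object with an execute(query, parameters) method, assumed
    here to be a cassandra-driver Session connected to the right keyspace.
    """
    query = "UPDATE info SET value = %s WHERE nos = %s"
    for nos in nos_values:
        # Each statement restricts nos with '=', so it can be routed by token.
        session.execute(query, (new_value, nos))
```

A call such as `reset_values(session, range(0, 1001), {'count': '0'})` would then replace the single range UPDATE from the question.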
Otherwise, you can make nos a counter column rather than an int. Note, though, that you cannot mix a map column with counter columns in the same table unless the non-counter columns are part of the primary key.
Also, you probably do not want nos, a column whose value changes, as the primary key.
CREATE TABLE info (
id UUID,
value map<text, text>,
PRIMARY KEY (id)
);
CREATE TABLE nos_counter (
info_id UUID,
nos COUNTER,
PRIMARY KEY (info_id)
);
Now you can update the nos counter like this.
update nos_counter set nos = nos + 1 where info_id = 'SOME_UUID';

CQL3 and millions of columns composite key use case

How in CQL3 do we do millions of columns? We have one special table where all rows are basically composite keys and very very wide.
I was reading this question that implied two ways
Does collections in CQL3 have certain limits?
Also, the types of our composite keys are String.bytes and ordered by String
We have an exact matching table that is Decimal.bytes and ordered by decimal.
How would one handle this in CQL3?
thanks,
Dean
"oh, and part of my question was missing since SO formatted it out of the question. I was looking for Decimal.bytes and String.bytes as my composite key....there is no "value", just a col name and I want all columns were decimal > 10 and decimal < 20 so to speak and the column name = 10 occurs multiple times as in 10.a, 10.b 11.c, 11.d, 11.e"
CREATE TABLE widerow
(
row_key text, //whatever
column_composite1 decimal,
column_composite2 text,
PRIMARY KEY(row_key,column_composite1,column_composite2)
)
SELECT * FROM widerow WHERE row_key=...
AND column_composite1>=10.0
AND column_composite1<=20.0
In that case, you can query with range over column_composite1 and have for EACH column_composite1, different values of column_composite2 (10.a, 10.b 11.c, 11.d, 11.e...)
"How do I get all the columns where row_composite1 > "a" and row_composite1 < "b" in that use case? ie. I dont' care about the second half of the composite name. "
2 possible solutions here:
1. Make row_composite1 a composite component of the column
2. Use OrderPreservingPartitioner (this is indeed strongly discouraged)
For solution 1
CREATE TABLE widerow
(
row_key text, // a placeholder ("fake") row key
column_composite1 text, // previously row_composite1
column_composite2 decimal,
column_composite3 text,
PRIMARY KEY(row_key,column_composite1,column_composite2,column_composite3)
)
SELECT * FROM widerow WHERE row_key=...
AND column_composite1>='a'
AND column_composite1<='b'
This modeling has a drawback, though. To be able to run a range query over the decimal values, you first need to provide column_composite1:
SELECT * FROM widerow WHERE row_key=...
AND column_composite1='a'
AND column_composite2>=10.0
AND column_composite2<=20.0
CREATE TABLE widerow
(
row_composite1 text,
row_composite2 text,
column_name decimal,
value text,
PRIMARY KEY((row_composite1,row_composite2),column_name)
)
SELECT * FROM widerow WHERE row_composite1=...
AND row_composite2=...
AND column_name>=10.0
AND column_name<=20.0
ORDER BY column_name DESC
