Cassandra error: "Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive" - cassandra

I am running a Cassandra query that I have run successfully before, but now I can't execute it. It throws this error:
Cassandra error: InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"
My query is:
CREATE TABLE statistics(country_name text, dt date, confirmed_cases bigint, deaths bigint,
PRIMARY KEY(deaths)) with clustering order by (deaths DESC);
Please Help!

This is happening because you have specified only a single PRIMARY KEY column. A single-column PRIMARY KEY becomes the partition key, so there are two problems here:
You need to define a clustering key.
You're trying to enforce order by your partition key.
There are a couple of options here. But as you want to order by deaths, you probably should specify a different column as your partition key. Maybe partition by country_name?
...
PRIMARY KEY (country_name,deaths))
WITH CLUSTERING ORDER BY (deaths DESC);
The caveat is that you would then always need to filter by country_name in your WHERE clause as well.
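The following small Python sketch makes the shape of that model concrete by imitating such a table in memory. It is only an illustration under simplifying assumptions: the class StatisticsTable and its methods are hypothetical, and rows are sorted at read time instead of being stored sorted on disk. The partition key selects a group of rows, and the clustering column decides their order inside that group.

```python
class StatisticsTable:
    """Toy in-memory model of a table declared with
    PRIMARY KEY (country_name, deaths) WITH CLUSTERING ORDER BY (deaths DESC).
    It only mimics the shape of the data, not how Cassandra stores it."""

    def __init__(self):
        # partition key value (country_name) -> {clustering value (deaths): row}
        self._partitions = {}

    def insert(self, country_name, dt, confirmed_cases, deaths):
        partition = self._partitions.setdefault(country_name, {})
        # The full primary key is (country_name, deaths): writing the same pair
        # again replaces the earlier row instead of adding a new one (upsert).
        partition[deaths] = {
            "country_name": country_name,
            "dt": dt,
            "confirmed_cases": confirmed_cases,
            "deaths": deaths,
        }

    def select(self, country_name):
        # A query must name the partition (country_name); rows inside it come
        # back in clustering order, here deaths descending.
        partition = self._partitions.get(country_name, {})
        return [partition[d] for d in sorted(partition, reverse=True)]
```

The sketch also makes one consequence visible. Because the full primary key is (country_name, deaths), a second row for the same country with the same death count replaces the first, even if dt differs. If that matters, adding dt as another clustering column, for example PRIMARY KEY (country_name, deaths, dt), keeps both rows. Going by the error in the question, the CLUSTERING ORDER BY clause would then need to list dt as well.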

A primary key in Cassandra consists of one or more partition key columns and zero or more clustering key columns. The partition key always comes first, followed by the clustering key.
In your case, the deaths column is a partition key, not a clustering key.
For example, in the table structure below, name1 is the partition key and name2 is the clustering key.
-- Hypothetical table name and column types, filled in for illustration
CREATE TABLE IF NOT EXISTS
example_table(name1 text,
name2 text,
name3 text,
PRIMARY KEY(name1,name2))
with clustering order by (name2 DESC);
You can find more information here: Cassandra keys
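The rule behind this error can be sketched as a small Python check. This is a simplified model, not Cassandra's own validation code. The functions split_primary_key and check_clustering_order are hypothetical, the key definition is passed as Python values instead of parsed CQL, and the exact checks and error wording differ between Cassandra versions. The sketch models the strict rule that the "must exactly match" message describes: CLUSTERING ORDER BY may name only clustering columns, and it must list all of them in primary key order.

```python
def split_primary_key(primary_key):
    """Split a PRIMARY KEY definition into (partition key, clustering columns).

    primary_key is a list mirroring the CQL syntax: the first element is the
    partition key, either one column name or a tuple of names for a composite
    partition key such as ((company, day), ...); every later element is a
    clustering column.
    """
    first, *clustering = primary_key
    partition = (first,) if isinstance(first, str) else tuple(first)
    return partition, tuple(clustering)


def check_clustering_order(primary_key, clustering_order):
    """Raise ValueError if CLUSTERING ORDER BY does not fit the primary key.

    clustering_order is a list of (column, "ASC" | "DESC") pairs.
    """
    partition, clustering = split_primary_key(primary_key)
    ordered = tuple(column for column, _ in clustering_order)
    if not ordered:
        return  # no clause given: clustering columns default to ASC
    strays = [c for c in ordered if c not in clustering]
    if strays:
        raise ValueError(
            f"{strays} are not clustering columns; partition key is "
            f"{partition}, clustering columns are {clustering}"
        )
    if ordered != clustering:
        raise ValueError(
            f"CLUSTERING ORDER BY {ordered} must match the clustering "
            f"columns {clustering} exactly, in order"
        )
```

For the question's table, check_clustering_order(["deaths"], [("deaths", "DESC")]) raises because deaths is the partition key and there are no clustering columns. The name1/name2 example with [("name2", "DESC")] passes.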

Related

Cassandra creating table with order by command fails

I'm trying to create a new table using the command:
create table schema2(city varchar, loc list, pop int, zip varchar, state varchar,
primary key (city, zip)) WITH CLUSTERING ORDER BY (city ASC, zip DESC);
But I get the error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"
I specified the primary key I want and used only primary key columns in CLUSTERING ORDER BY, but it still gives an error. How do I fix this?
In this definition, (city, zip) is the PRIMARY KEY, city is the partition key, and zip is the clustering key. Data is distributed among the nodes based on the partition key and ordered within each partition based on the clustering key. So you cannot order by city, which is exactly what the error message says. If you remove city from your CLUSTERING ORDER BY clause, your DDL will be accepted.
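Here is a minimal Python sketch of what this answer describes, assuming a toy in-memory layout. The function build_partitions and the sample rows are hypothetical. Rows are grouped by the partition key city, and each group is sorted by the clustering key zip in descending order. Every row inside a group has the same city, so an ordering on city would have nothing to sort.

```python
from collections import defaultdict


def build_partitions(rows):
    """Toy model of PRIMARY KEY (city, zip) WITH CLUSTERING ORDER BY (zip DESC).

    Rows are grouped by the partition key (city); inside each group they are
    kept sorted by the clustering key (zip), descending. The dict keeps no
    meaningful order between groups (in Cassandra, partitions are placed by
    hashing the partition key).
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["city"]].append(row)
    for city_rows in partitions.values():
        city_rows.sort(key=lambda r: r["zip"], reverse=True)
    return dict(partitions)
```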

cassandra partition and clustering key

Is it possible to have a column as both a partition and clustering key? For example,
Create table citylist2 ( city varchar, loc list, pop int, zip varchar, state varchar,
primary key (city,city,zip)) WITH CLUSTERING ORDER BY (city ASC, zip DESC);
results in:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Unknown definition city referenced in PRIMARY KEY"
I might be doing this wrong, but can anyone tell me if it is possible to have the column "city" as the partition and clustering key and how to do so if it is possible?
The issue is likely that you are trying to reference city twice in the Primary key definition.
As far as I understand, this is not possible. The partition key splits your data across partitions, and the clustering key then sorts the data within each partition. So it doesn't make sense to have a partition key that is also a clustering key. You may need to rethink your data model for what you are attempting.
Create table citylist2 ( city varchar,citycopy varchar, loc list, pop int, zip varchar, state varchar, primary key (city,citycopy,zip)) WITH CLUSTERING ORDER BY (citycopy ASC, zip DESC);
The above can be used if you really want to do what you are trying to do: it duplicates the same data in two columns.
If you can provide more details on why you want to use the same data as both the partition and the clustering key, maybe the answer will change.
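A short Python sketch can check the claim that a duplicated column adds no ordering. It assumes a toy representation of one partition as a list of dicts (sort_partition and the sample values are hypothetical) and applies a multi-column clustering order with one stable sort per column. Within a partition of the citycopy table, citycopy holds the same value as city as long as the application always writes it that way. Ordering by (citycopy ASC, zip DESC) therefore gives the same result as ordering by zip DESC alone.

```python
def sort_partition(rows, order):
    """Sort the rows of one partition by clustering columns.

    order is a list of (column, "ASC" | "DESC") pairs, most significant first.
    Python's sort is stable, so sorting by the least significant column first
    and the most significant column last gives the combined ordering.
    """
    result = list(rows)
    for column, direction in reversed(order):
        result.sort(key=lambda r: r[column], reverse=(direction == "DESC"))
    return result
```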

Retrieve rows based on column of type "time" in cassandra db

How can I retrieve rows based on a column of type "time" in Cassandra?
We tried this query:
select *
from payment_transactions_by_transactiondate
where transaction_time>='00:00:00'
and transaction_time<='23:59:59'
and transaction_date='2018-03-21'
allow filtering;
but it's not fetching any rows (transaction_time is a primary key).
You cannot do a range query on the partition key, because Cassandra distributes data across nodes based on the partition key. Instead, you can make transaction_time a clustering key. See the difference between partition key and clustering key. From the above query, it seems you need the transactions for a particular date (transaction_date). So, to run this query, make transaction_date the partition key and transaction_time the clustering key.
For example:
create table payment_transactions_by_transactiondate(
....
....
....
primary key (transaction_date, transaction_time)
);
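With transaction_date as the partition key and transaction_time as the clustering key, the query can name the partition with an equality condition on transaction_date and restrict transaction_time with a range, without ALLOW FILTERING. The following Python sketch models why that is cheap: the partition is looked up by its key, and the time range is a slice of a list that is already sorted. It is a simplified illustration (TransactionsByDate and its methods are hypothetical), not how Cassandra actually stores data.

```python
from bisect import bisect_left, bisect_right
from datetime import date, time


class TransactionsByDate:
    """Toy model of PRIMARY KEY (transaction_date, transaction_time)."""

    def __init__(self):
        # transaction_date -> list of (transaction_time, payload), sorted by time
        self._partitions = {}

    def insert(self, transaction_date, transaction_time, payload):
        rows = self._partitions.setdefault(transaction_date, [])
        times = [t for t, _ in rows]
        i = bisect_left(times, transaction_time)
        if i < len(rows) and rows[i][0] == transaction_time:
            rows[i] = (transaction_time, payload)  # same primary key: overwrite
        else:
            rows.insert(i, (transaction_time, payload))

    def select_range(self, transaction_date, start, end):
        """Rows of one partition with start <= transaction_time <= end.

        Only the partition named by transaction_date is read, and the range is
        taken as a slice of its already-sorted rows.
        """
        rows = self._partitions.get(transaction_date, [])
        times = [t for t, _ in rows]
        return rows[bisect_left(times, start):bisect_right(times, end)]
```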

Cassandra:Only clustering key columns can be defined in CLUSTERING ORDER directive

cqlsh> CREATE TABLE mykeyspace.counts ( company text, day bigint, type text, host inet, eventcount counter, PRIMARY KEY ((company, day), type, host) ) WITH CLUSTERING ORDER BY (company ASC, day ASC, ftype ASC, host ASC);
InvalidRequest: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"
Why? How to fix?
Thanks
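You cannot order by partition key columns, because these keys determine how the data is spread around the cluster and cannot be used for range queries. Partition keys are hashed before the data is placed on a node, so unless you are using the ByteOrderedPartitioner (do not use this), their ordering is pretty much irrelevant: you would need to do a full scan of all nodes to actually run a range query on those columns. To fix the statement, list only the clustering columns, in primary key order: WITH CLUSTERING ORDER BY (type ASC, host ASC). Note that the original statement also says ftype where the column is named type.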
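The following Python sketch of hashed placement shows why ordering by partition key is of little use. It rests on several assumptions. token, place and range_query_on_partition_key are hypothetical helpers. MD5 stands in for the Murmur3 hash of Cassandra's default Murmur3Partitioner, only because MD5 ships with Python. Nodes are chosen by token modulo node count rather than by token ranges on a ring. The partition key is a single string rather than the composite (company, day). Because a token says nothing about where a key falls in sorted order, a range condition on the partition key has to visit every node.

```python
import hashlib


def token(partition_key):
    """Hypothetical stand-in for a partitioner's hash: a signed 64-bit integer.
    Cassandra's default Murmur3Partitioner uses Murmur3; MD5 is used here only
    because it is in Python's standard library."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def place(partition_keys, num_nodes):
    """Assign each partition to a node from its token (simplified: token modulo
    node count, not Cassandra's token ranges on a ring)."""
    nodes = [[] for _ in range(num_nodes)]
    for key in partition_keys:
        nodes[token(key) % num_nodes].append(key)
    return nodes


def range_query_on_partition_key(nodes, low, high):
    """Find partition keys with low <= key <= high.

    Tokens carry no information about key order, so no node can be skipped:
    every node is scanned. Returns (matching keys, number of nodes scanned).
    """
    hits = []
    scanned = 0
    for node in nodes:
        scanned += 1
        hits.extend(key for key in node if low <= key <= high)
    return sorted(hits), scanned
```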
I would recommend looking through some basic data modeling talks to get more information on this:
Data Modeling at Cassandra Day Chicago 2015

Error creating table in Cassandra - Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive

I get the above error when I try to use the following CQL statement, and I'm not sure what's wrong with it.
CREATE TABLE Stocks(
  id uuid,
  market text,
  symbol text,
  value text,
  time timestamp,
  PRIMARY KEY(id)
) WITH CLUSTERING ORDER BY (time DESC);
Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive
But this works fine. Can't I use a column that is not part of the primary key to arrange my rows?
CREATE TABLE timeseries (
  event_type text,
  insertion_time timestamp,
  event blob,
  PRIMARY KEY (event_type, insertion_time)
)
WITH CLUSTERING ORDER BY (insertion_time DESC);
"can't I use some column which is not part of primary key to arrange my rows?"
No, you cannot. From the DataStax documentation on the SELECT command:
ORDER BY clauses can select a single column only. That column has to be the second column in a compound PRIMARY KEY. This also applies to tables with more than two column components in the primary key.
Therefore, for your first CREATE to work, you will need to adjust your PRIMARY KEY to this:
PRIMARY KEY(id,time)
The second column in a compound primary key is known as the "clustering column." This is the column that determines the on-disk sort order of data within a partitioning key. Note that last part, because it is important. When you query your Stocks column family (table) by id, all "rows" of column values for that id will be returned, sorted by time. In Cassandra you can only specify order within a partitioning key (and not for your entire table), and your partitioning key is the first key listed in a compound primary key.
Of course, the problem with this is that you probably want id to be unique (which means that CQL will only ever return one "row" of column values per partitioning key). Requiring time to be part of the primary key negates that and makes it possible to store multiple values for the same id. This is the problem with partitioning your data by a unique id. It might be a good idea in the RDBMS world, but it can make querying in Cassandra more difficult.
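A short Python sketch of Cassandra's write behavior shows the difference between the two primary keys. In Cassandra, writing a row whose full primary key already exists overwrites that row (an upsert). The helpers load and rows_for_id and the sample ticks are hypothetical, and the model keeps rows in a plain dict rather than on disk.

```python
def load(rows, primary_key_columns):
    """Toy model of a sequence of INSERTs: a row whose full primary key already
    exists replaces the earlier row, so the table holds one row per distinct
    primary key value."""
    table = {}
    for row in rows:
        key = tuple(row[column] for column in primary_key_columns)
        table[key] = row
    return table


def rows_for_id(table, stock_id):
    """All rows of one partition (first primary key column == stock_id),
    newest first, as CLUSTERING ORDER BY (time DESC) would return them."""
    matching = [row for key, row in table.items() if key[0] == stock_id]
    return sorted(matching, key=lambda row: row["time"], reverse=True)
```

With PRIMARY KEY(id), the three ticks for one id collapse into a single row holding the last value written. With PRIMARY KEY(id, time), all three survive and come back newest first.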
Essentially, you are going to need to revisit your data model here. For instance, if you wanted to query prices over time, you could name the table something like "StockPriceEvents" with a primary key of (id,time) or (symbol,time). Querying that table would give you the prices recorded for each id or symbol, sorted by time. Now that may or may not be of any value to your use case. Just trying to explain how primary keys and sort order work in Cassandra.
Note: You should really use column names that have more meaning. Names like "id," "time," and "timeseries" are pretty vague and don't really describe anything about the context in which they are used.
When creating a table in Cassandra with the "CLUSTERING ORDER BY" option, make sure the clustering column is part of the primary key.
The table below specifies a clustering order, but the clustering column "Datetime" is not a primary key column, hence the error below.
ERROR_SCRIPT
CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
  IP text,
  URL text,
  Status text,
  UserAgent text,
  Datetime timestamp,
  PRIMARY KEY (IP)
) WITH CLUSTERING ORDER BY (Datetime DESC);
ERROR:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"
CORRECTED_SCRIPT (where "Datetime" is added to the primary key columns)
CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
  IP text,
  URL text,
  Status text,
  UserAgent text,
  Datetime timestamp,
  PRIMARY KEY (IP,Datetime)
) WITH CLUSTERING ORDER BY (Datetime DESC);
