Error creating table in cassandra - Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive

I get the above error when I try to use the following CQL statement; I'm not sure what's wrong with it.
CREATE TABLE Stocks(
  id uuid,
  market text,
  symbol text,
  value text,
  time timestamp,
  PRIMARY KEY(id)
) WITH CLUSTERING ORDER BY (time DESC);
Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive
But this works fine. Can't I use a column that is not part of the primary key to arrange my rows?
CREATE TABLE timeseries (
... event_type text,
... insertion_time timestamp,
... event blob,
... PRIMARY KEY (event_type, insertion_time)
... )
... WITH CLUSTERING ORDER BY (insertion_time DESC);

"can't I use some column which is not part of primary key to arrange my rows?"
No, you cannot. From the DataStax documentation on the SELECT command:
ORDER BY clauses can select a single column only. That column has to be the second column in a compound PRIMARY KEY. This also applies to tables with more than two column components in the primary key.
Therefore, for your first CREATE to work, you will need to adjust your PRIMARY KEY to this:
PRIMARY KEY(id,time)
The second column in a compound primary key is known as the "clustering column." This is the column that determines the on-disk sort order of data within a partitioning key. Note the last part, "within a partitioning key," because it is important. When you query your Stocks column family (table) by id, all "rows" of column values for that id will be returned, sorted by time. In Cassandra you can only specify order within a partitioning key (and not for your entire table), and your partitioning key is the first key listed in a compound primary key.
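To make the "order within a partitioning key" point concrete, here is a rough Python model of a table with PRIMARY KEY (id, time) and CLUSTERING ORDER BY (time DESC). It is only a sketch, not Cassandra or driver code: the class name, method names, and sample values are made up for illustration, and real Cassandra keeps this order on disk rather than sorting at read time.

```python
from collections import defaultdict


class ClusteredTable:
    """Toy model of PRIMARY KEY (partition_key, clustering_key) with DESC clustering order."""

    def __init__(self):
        # partition key -> {clustering key -> row values}
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, **values):
        # Writing the same (partition_key, clustering_key) again overwrites the row,
        # because CQL inserts are upserts.
        self._partitions[partition_key][clustering_key] = values

    def select(self, partition_key):
        # Rows come back sorted by the clustering key, newest first,
        # but only within the one partition that was asked for.
        rows = self._partitions.get(partition_key, {})
        return [(ck, rows[ck]) for ck in sorted(rows, reverse=True)]


stocks = ClusteredTable()
stocks.insert("id-1", 3, value="10.5")
stocks.insert("id-1", 1, value="10.1")
stocks.insert("id-2", 2, value="99.0")
stocks.insert("id-1", 2, value="10.3")

times_for_id1 = [ck for ck, _ in stocks.select("id-1")]
print(times_for_id1)  # [3, 2, 1]
```

Note that there is no method that returns all rows of the table sorted by time; the sort only exists per partition.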
Of course, the problem with this is that you probably want id to be unique (which means that CQL will only ever return one "row" of column values per partitioning key). Requiring time to be part of the primary key negates that, and makes it possible to store multiple values for the same id. This is the problem with partitioning your data by a unique id. It might be a good idea in the RDBMS world, but it can make querying in Cassandra more difficult.
Essentially, you are going to need to revisit your data model here. For instance, if you wanted to query prices over time, you could name the table something like "StockPriceEvents" with a primary key of (id,time) or (symbol,time). Querying that table would give you the prices recorded for each id or symbol, sorted by time. Now that may or may not be of any value to your use case. Just trying to explain how primary keys and sort order work in Cassandra.
Note: You should really use column names that have more meaning. Things like "id," "time," and "timeseries" are pretty vague and don't really describe anything about the context in which they are used.

When creating a table in Cassandra with the CLUSTERING ORDER BY option, make sure the clustering column is part of the primary key.
The table below is created with a clustering order on "Datetime", but "Datetime" is not a primary key column, hence the error below.
ERROR_SCRIPT
cqlsh> CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
... IP text,
... URL text,
... Status text,
... UserAgent text,
... Datetime timestamp,
... PRIMARY KEY (IP)
... ) WITH CLUSTERING ORDER BY (Datetime DESC);
ERROR:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"
CORRECTED_SCRIPT (where "Datetime" is added to the primary key columns)
cqlsh> CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
... IP text,
... URL text,
... Status text,
... UserAgent text,
... Datetime timestamp,
... PRIMARY KEY (IP,Datetime)
... ) WITH CLUSTERING ORDER BY (Datetime DESC);

Related

Can I change the order of rows without specifying a column as the clustering key?

I know I can change the on-disk sorting to descending by defining a table like this:
create table timeseries (
event_type text,
insertion_time timestamp,
event blob,
PRIMARY KEY (event_type, insertion_time)
)
WITH CLUSTERING ORDER BY (insertion_time DESC);
But the problem is that insertion_time is now part of a composite unique constraint (event_type + insertion_time), which is not desired.
Is there any way to change the order without making the column a primary key? Something like this:
create table timeseries (
event_type text,
insertion_time timestamp,
event blob,
PRIMARY KEY (event_type)
)
WITH CLUSTERING ORDER BY (insertion_time DESC);
What you want is not possible because in the second table schema, there will only ever be one row in each partition. Put another way, there are no rows to sort since there is only ever one row.
You can only specify the CLUSTERING ORDER if there is a clustering column to sort. Cheers!
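A small Python sketch of why there is nothing to sort in the second schema: if the primary key is just event_type, every write for the same event_type replaces the previous one. Plain dicts stand in for the two tables here, and the function names and sample values are made up; this is not driver code.

```python
# Toy model of PRIMARY KEY (event_type): the whole primary key is the partition key,
# so a second write with the same event_type replaces the first (an upsert).
table_single_key = {}


def insert_single_key(event_type, insertion_time, event):
    table_single_key[event_type] = (insertion_time, event)


# Toy model of PRIMARY KEY (event_type, insertion_time): a row is identified by both columns.
table_compound_key = {}


def insert_compound_key(event_type, insertion_time, event):
    table_compound_key[(event_type, insertion_time)] = event


for t in (1, 2, 3):
    insert_single_key("login", t, b"payload")
    insert_compound_key("login", t, b"payload")

print(len(table_single_key))    # one row survives for "login"
print(len(table_compound_key))  # one row per insertion_time
```

With the single-column key only the last write is left, so an ordering on insertion_time would have no effect.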

Cassandra error: "Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"

I am running a Cassandra query that I have run successfully before, but now I can't execute it. It throws this error:
Cassandra error: InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"
My query is:
CREATE TABLE statistics(country_name text, dt date, confirmed_cases bigint, deaths bigint,
PRIMARY KEY(deaths)) WITH CLUSTERING ORDER BY (deaths DESC);
Please Help!
This is happening because your PRIMARY KEY contains only a single column. A single-column PRIMARY KEY becomes the partition key. So there are two problems here:
You need to define a clustering key.
You're trying to enforce order by your partition key.
There are a couple of options here. But as you want to order by deaths, you probably should specify a different column as your partition key. Maybe partition by country_name?
...
PRIMARY KEY (country_name,deaths))
WITH CLUSTERING ORDER BY (deaths DESC);
The caveat is that you would then always need to filter by country_name in your WHERE clause.
A primary key in Cassandra consists of one or more partition keys and zero or more clustering key components. The order of these components always puts the partition key first and then the clustering key.
The deaths column in this case is a partition key and not a clustering key.
For example, in the query structure below, name1 is the partition key and name2 is the clustering key.
CREATE TABLE IF NOT EXISTS table_name (
    name1 text,  -- partition key (the column types here are placeholders)
    name2 text,  -- clustering key
    name3 text,
    PRIMARY KEY (name1, name2)
) WITH CLUSTERING ORDER BY (name2 DESC);
Find more information here Cassandra keys
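To illustrate the "partition key first, then clustering key" rule, here is a minimal Python sketch that splits the text inside PRIMARY KEY (...) into its two parts. The function name is made up and this is not how Cassandra parses CQL; it only handles the simple forms "a", "a, b" and "(a, b), c", where the inner parentheses mark a composite partition key.

```python
def split_primary_key(definition):
    """Split the text inside PRIMARY KEY ( ... ) into (partition_key, clustering_columns)."""
    text = definition.strip()
    if text.startswith("("):
        # Composite partition key: (a, b), c, d
        close = text.index(")")
        partition = [c.strip() for c in text[1:close].split(",") if c.strip()]
        rest = text[close + 1:]
    else:
        # Simple partition key: only the first column
        first, _, rest = text.partition(",")
        partition = [first.strip()]
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering


print(split_primary_key("deaths"))
print(split_primary_key("country_name, deaths"))
print(split_primary_key("(name1, name2), name3"))
```

For PRIMARY KEY(deaths) the clustering list comes back empty, which is why a CLUSTERING ORDER BY (deaths DESC) has nothing valid to refer to.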

cassandra partition and clustering key

Is it possible to have a column as both a partition and clustering key? For example,
Create table citylist2 ( city varchar, loc list, pop int, zip varchar, state varchar,
primary key (city,city,zip))
WITH CLUSTERING ORDER BY (city ASC, zip DESC);
results in:
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Unknown definition city referenced in PRIMARY KEY"
I might be doing this wrong, but can anyone tell me if it is possible to have the column "city" as the partition and clustering key and how to do so if it is possible?
The issue is likely that you are trying to reference city twice in the Primary key definition.
As far as I understand, this is not possible. The partition key splits your data over partitions, and the cluster key will then sort the data within each partition. So it doesn't make sense to have a partition key which is also a clustering key. You may need to rethink your data model for what it is you are attempting.
Create table citylist2 ( city varchar,citycopy varchar, loc list, pop int, zip varchar, state varchar, primary key (city,citycopy,zip)) WITH CLUSTERING ORDER BY (citycopy ASC, zip DESC);
The above can be used if you really want to do what you are trying to do - by duplicating the same data in two columns.
If you can provide more details on why you want to use the same data as both partition and clustering key, maybe the answer will change.

Retrieve rows based on column of type "time" in cassandra db

How can I retrieve rows based on a column of type "time" in Cassandra?
We tried this query:
select *
from payment_transactions_by_transactiondate
where transaction_time>='00:00:00'
and transaction_time<='23:59:59'
and transaction_date='2018-03-21'
allow filtering;
but it's not fetching the rows (where transaction_time is a primary key).
You cannot do an efficient range query on the partition key, because Cassandra distributes data across nodes based on the partition key. What you can do instead is make transaction_time a clustering key. See the difference between a partition key and a clustering key. From the above query, it seems you need the transactions on a particular date (transaction_date). So for this query, make transaction_date the partition key and transaction_time the clustering key.
For example:
create table payment_transactions_by_transactiondate(
....
....
....
primary key (transaction_date, transaction_time)
);
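As a rough illustration of why the range works once transaction_time is a clustering key, here is a Python sketch in which each transaction_date partition holds its transaction_time values in sorted order, so a range is just a slice. The dict, the function name, and the sample times are assumptions for the example, not anything from the real table.

```python
from bisect import bisect_left, bisect_right
from datetime import date, time

# partition key (transaction_date) -> clustering keys (transaction_time), kept sorted
partitions = {
    date(2018, 3, 21): [time(0, 15), time(9, 30), time(17, 45), time(23, 59, 59)],
    date(2018, 3, 22): [time(8, 0)],
}


def times_in_range(transaction_date, start, end):
    """Return the times in [start, end] for one date, as a slice of a sorted partition."""
    times = partitions.get(transaction_date, [])
    lo = bisect_left(times, start)
    hi = bisect_right(times, end)
    return times[lo:hi]


morning_to_evening = times_in_range(date(2018, 3, 21), time(9, 0), time(18, 0))
print(morning_to_evening)
```

The equality on transaction_date picks one partition, and the range on transaction_time only has to look at a contiguous run inside it.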

Order by created date In Cassandra

I have a problem with ordering data in a Cassandra database.
This is my table structure:
CREATE TABLE posts (
id uuid,
created_at timestamp,
comment_enabled boolean,
content text,
enabled boolean,
meta map<text, text>,
post_type tinyint,
summary text,
title text,
updated_at timestamp,
url text,
user_id uuid,
PRIMARY KEY (id, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC)
When I run this query, I get the following message:
Query:
select * from posts order by created_at desc;
message:
ORDER BY is only supported when the partition key is restricted by an EQ or an IN.
Or this query returns data without sorting:
select * from posts
There are a couple of things you need to understand.
In your case the partition key is "id" and the clustering key is "created_at".
What that essentially means is that any row will be stored in a partition based on the hash of "id" (depending on your partitioner; by default it is Murmur3). Inside that partition, the data is sorted by your clustering key, in your case "created_at".
So if you query data from that table, the results are by default sorted according to the clustering order you specified when creating the table. However, there is a gotcha.
If you do not specify the partition key in the WHERE clause, the order of the result set depends on the hashed values of the partition key (in your case id).
So in order to get the posts in that specific order, you have to specify the partition key like this (the uuid is only a placeholder, since id is a uuid column):
select * from posts WHERE id=123e4567-e89b-12d3-a456-426614174000 order by created_at desc;
Note:
It is not necessary to specify the ORDER BY clause on a query if your desired sort direction (“ASCending/DESCending”) already matches the CLUSTERING ORDER in the table definition.
So essentially the above query is the same as
select * from posts WHERE id=123e4567-e89b-12d3-a456-426614174000
You can read more about this here http://www.datastax.com/dev/blog/we-shall-have-order
The error message is pretty clear: you cannot ORDER BY unless the WHERE clause restricts the partition key with an EQ or an IN. This is by design.
The data you get when running without a WHERE clause actually is ordered, not by your clustering key, but by the token function applied to your partition key. You can verify the order by issuing:
SELECT token(id), id, created_at, user_id FROM posts;
where the token function arguments exactly match your PARTITION KEY.
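If it helps, the resulting order can be sketched in Python: partitions come back in token order, and rows inside each partition follow the clustering order. The toy_token function below is only a stand-in (it uses MD5 because that ships with Python, while Cassandra's default partitioner is Murmur3), so the token values and the partition order will not match a real cluster; the ids and created_at values are made up as well.

```python
import hashlib


def toy_token(partition_key):
    # Stand-in for Cassandra's partitioner; NOT Murmur3, so real tokens will differ.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


# (id, created_at) pairs; created_at is a plain integer to keep the sketch short.
rows = [("post-a", 1), ("post-b", 5), ("post-a", 3), ("post-c", 2), ("post-b", 4)]


def full_scan(rows):
    # Partitions are visited in token order; inside each partition the rows follow
    # CLUSTERING ORDER BY (created_at DESC).
    return sorted(rows, key=lambda r: (toy_token(r[0]), -r[1]))


for post_id, created_at in full_scan(rows):
    print(toy_token(post_id), post_id, created_at)
```

The created_at values are descending inside each id, but across the whole result they follow no useful order, which matches what an unrestricted select on posts returns.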
I suggest you read this and this to understand what you can and can't do.

Resources