cassandra partition and clustering key

Is it possible to have a column as both a partition key and a clustering key? For example,
CREATE TABLE citylist2 (city varchar, loc list, pop int, zip varchar, state varchar, PRIMARY KEY (city, city, zip))
WITH CLUSTERING ORDER BY (city ASC, zip DESC);
results in:
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Unknown definition city referenced in PRIMARY KEY"
I might be doing this wrong, but can anyone tell me if it is possible to have the column "city" as the partition and clustering key and how to do so if it is possible?

The issue is likely that you are trying to reference city twice in the Primary key definition.

As far as I understand, this is not possible. The partition key determines how your data is split across partitions, and the clustering key then sorts the data within each partition. Every row in a partition has the same partition key value, so it doesn't make sense for a partition key column to also be a clustering key. You may need to rethink your data model for what you are attempting.
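
One way the data model could be rethought, sketched under assumptions: if the goal is to have rows sorted by city, a coarser column can serve as the partition key while city becomes a clustering column. The table name citylist_by_state, the choice of state as the partition key, and the sample value 'MA' are hypothetical and not from the question. The loc column is left out because its element type is not shown.

```cql
-- Hypothetical remodel: state is the partition key,
-- city and zip are clustering columns.
CREATE TABLE citylist_by_state (
    state varchar,
    city  varchar,
    zip   varchar,
    pop   int,
    PRIMARY KEY (state, city, zip)
) WITH CLUSTERING ORDER BY (city ASC, zip DESC);

-- Rows inside one partition come back ordered by city,
-- then by zip descending.
SELECT city, zip, pop FROM citylist_by_state WHERE state = 'MA';
```

Whether state is a sensible partition key depends on how the data is actually queried, so treat this as an illustration of the key layout rather than a recommendation.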

Create table citylist2 ( city varchar,citycopy varchar, loc list, pop int, zip varchar, state varchar, primary key (city,citycopy,zip)) WITH CLUSTERING ORDER BY (citycopy ASC, zip DESC);
The above can be used if you really want to do what you are describing, by duplicating the same data in two columns.
If you can provide more details on why you want to use the same data as both partition and clustering key, the answer may change.

Related

Cassandra error: "Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"

I am running a Cassandra query that I have run successfully before, but now it fails with this error:
Cassandra error: InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"
My query is:
CREATE TABLE statistics(country_name text, dt date, confirmed_cases bigint, deaths bigint,
PRIMARY KEY(deaths))with clustering order by (deaths DESC);
Please Help!

This is happening because your PRIMARY KEY contains only a single column. A single-column primary key is treated as the partition key. So there are two problems here:
You need to define a clustering key.
You're trying to enforce order on your partition key.
There are a couple of options here. But since you want to order by deaths, you should probably specify a different column as your partition key. Maybe partition by country_name?
...
PRIMARY KEY (country_name,deaths))
WITH CLUSTERING ORDER BY (deaths DESC);
The caveat is that you would then always need to filter by country_name in your WHERE clause.
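
Put together, the suggestion above might look like the sketch below. The column list comes from the question; the sample country value is made up. Keep in mind that the primary key identifies a row, so with this key two rows for the same country with the same deaths value would overwrite each other.

```cql
-- country_name is the partition key, deaths is the clustering key.
CREATE TABLE statistics (
    country_name    text,
    dt              date,
    confirmed_cases bigint,
    deaths          bigint,
    PRIMARY KEY (country_name, deaths)
) WITH CLUSTERING ORDER BY (deaths DESC);

-- The partition key must be restricted; rows for that country
-- come back with the highest deaths value first.
SELECT dt, confirmed_cases, deaths
FROM statistics
WHERE country_name = 'Italy';
```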

A primary key in Cassandra consists of one or more partition key columns and zero or more clustering key columns. The partition key always comes first, followed by the clustering key.
In your query, the deaths column is the partition key, not a clustering key.
For example, in the query structure below, name1 is the partition key and name2 is the clustering key.
CREATE TABLE IF NOT EXISTS
table_name(name1 data_type,
name2 data_type,
name3 data_type,
PRIMARY KEY(name1,name2))
WITH CLUSTERING ORDER BY (name2 DESC);
Find more information here Cassandra keys

How to order by with only one primary key column in Cassandra?

I'm trying to use the ORDER BY feature of Cassandra, but with only one primary key column. This is the statement I use to create my table:
CREATE TABLE user_classement
(
user_name set<text>,
score float,
PRIMARY KEY (score)
) WITH CLUSTERING ORDER BY (score DESC);
But cassandra throws this error:
Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive
If I add another column so that there are two primary key columns, it works, but with only one I get this error.
Do you know if it is possible to order by with only one primary key column?

A primary key in Cassandra consists of a partition key and a clustering key. The first part of the primary key is the partition key. So in your example score is the partition key, and ordering can only be applied to clustering keys. If you had a primary key like PRIMARY KEY (score, rank), you could apply ordering on rank. For partition ordering you could try ByteOrderedPartitioner, but I have not tried it, so I cannot comment further.
Edit 1: As Aaron added in the comments, only the Murmur3 partitioner should be used. ByteOrderedPartitioner exists only for backward compatibility when upgrading from old versions.
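
As a rough illustration of the PRIMARY KEY (score, rank) idea mentioned above: the rank column is hypothetical and does not exist in the question's table.

```cql
-- score stays the partition key; the hypothetical rank column
-- is the clustering key and is therefore allowed in CLUSTERING ORDER BY.
CREATE TABLE user_classement (
    score     float,
    rank      int,
    user_name set<text>,
    PRIMARY KEY (score, rank)
) WITH CLUSTERING ORDER BY (rank DESC);
```

Note that this only sorts rows that share the same score by rank. It does not sort the whole table by score, because score is still the partition key.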

Cassandra creating table with order by command fails

I'm trying to create a new table using the command:
CREATE TABLE schema2 (city varchar, loc list, pop int, zip varchar, state varchar, PRIMARY KEY (city, zip))
WITH CLUSTERING ORDER BY (city ASC, zip DESC);
But I get the error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"
I specified the primary keys I want and used only primary key columns in the clustering order, but I still get an error. How do I fix this?

CREATE TABLE schema2 (city varchar, loc list, pop int, zip varchar, state varchar, PRIMARY KEY (city, zip))
WITH CLUSTERING ORDER BY (city ASC, zip DESC);
In this definition (city, zip) is the PRIMARY KEY, city is the partition key, and zip is the clustering key. Data is distributed among all the nodes based on the partition key. Data is ordered within the partition based on the clustering key. So you cannot perform ordering on city, and the error you mentioned states exactly that. If you leave city out of your clustering order, your DDL will be accepted.
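
A sketch of the corrected statement, with city removed from the clustering order. The element type of loc is an assumption: the statement in the question shows only "list", and list<double> is used here purely as a placeholder.

```cql
-- ASSUMPTION: list<double> stands in for the element type of loc,
-- which is not shown in the original statement.
CREATE TABLE schema2 (
    city  varchar,
    loc   list<double>,
    pop   int,
    zip   varchar,
    state varchar,
    PRIMARY KEY (city, zip)
) WITH CLUSTERING ORDER BY (zip DESC);
```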

Cassandra: Only clustering key columns can be defined in CLUSTERING ORDER directive

cqlsh> CREATE TABLE mykeyspace.counts ( company text, day bigint, type text, host inet, eventcount counter, PRIMARY KEY ((company, day), type, host) ) WITH CLUSTERING ORDER BY (company ASC, day ASC, ftype ASC, host ASC);
InvalidRequest: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"
Why? How to fix?
Thanks

You cannot define an ordering on partition keys, because these keys determine how the data is spread around the cluster and cannot be used for range queries. The partition keys are hashed before the data is placed on a node, so unless you are using the ByteOrderedPartitioner (do not use this), their ordering is pretty much irrelevant: you would need to do a full scan of all nodes to actually do a range query on those columns.
I would recommend looking through some basic data modeling talks to get more information on this:
Data Modeling at Cassandra Day Chicago 2015
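
For the table in the question, the fix probably amounts to listing only the clustering columns in the directive. The sketch below assumes that "ftype" in the original statement is a typo for the type column, since no ftype column is declared.

```cql
-- (company, day) is the composite partition key and cannot be ordered.
-- type and host are the clustering columns.
CREATE TABLE mykeyspace.counts (
    company    text,
    day        bigint,
    type       text,
    host       inet,
    eventcount counter,
    PRIMARY KEY ((company, day), type, host)
) WITH CLUSTERING ORDER BY (type ASC, host ASC);
```

Since ASC is the default direction, leaving out the CLUSTERING ORDER BY clause entirely should give the same ordering.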

Error creating table in cassandra - Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive

I get the above error when I try to use the following CQL statement, and I'm not sure what's wrong with it.
CREATE TABLE Stocks(
  id uuid,
  market text,
  symbol text,
  value text,
  time timestamp,
  PRIMARY KEY(id)
) WITH CLUSTERING ORDER BY (time DESC);
Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive
But this works fine. Can't I use some column which is not part of primary key to arrange my rows?
CREATE TABLE timeseries (
... event_type text,
... insertion_time timestamp,
... event blob,
... PRIMARY KEY (event_type, insertion_time)
... )
... WITH CLUSTERING ORDER BY (insertion_time DESC);

"can't I use some column which is not part of primary key to arrange my rows?"
No, you cannot. From the DataStax documentation on the SELECT command:
ORDER BY clauses can select a single column only. That column has to be the second column in a compound PRIMARY KEY. This also applies to tables with more than two column components in the primary key.
Therefore, for your first CREATE to work, you will need to adjust your PRIMARY KEY to this:
PRIMARY KEY(id,time)
The second column in a compound primary key is known as the "clustering column." This is the column that determines the on-disk sort order of data within a partitioning key. Note that last part, "within a partitioning key," because it is important. When you query your Stocks column family (table) by id, all "rows" of column values for that id will be returned, sorted by time. In Cassandra you can only specify order within a partitioning key (and not for your entire table), and your partitioning key is the first key listed in a compound primary key.
Of course, the problem with this is that you probably want id to be unique (which means that CQL will only ever return one "row" of column values per partitioning key). Requiring time to be part of the primary key negates that, and makes it possible to store multiple values for the same id. This is the problem with partitioning your data by a unique id. It might be a good idea in the RDBMS world, but it can make querying in Cassandra more difficult.
Essentially, you are going to need to revisit your data model here. For instance, if you wanted to query prices over time, you could name the table something like "StockPriceEvents" with a primary key of (id,time) or (symbol,time). Querying that table would give you the prices recorded for each id or symbol, sorted by time. Now that may or may not be of any value to your use case. Just trying to explain how primary keys and sort order work in Cassandra.
Note: You should really use column names that have more meaning. Things like "id," "time," and "timeseries" are pretty vague and don't really describe anything about the context in which they are used.
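
To make the (symbol,time) suggestion concrete, here is a minimal sketch. The table name stock_price_events, the column names price_time and price, the decimal type, and the symbol 'ABC' are all hypothetical choices for illustration, not taken from the question.

```cql
-- Hypothetical table: symbol is the partition key,
-- price_time is the clustering column.
CREATE TABLE stock_price_events (
    symbol     text,
    price_time timestamp,
    market     text,
    price      decimal,
    PRIMARY KEY (symbol, price_time)
) WITH CLUSTERING ORDER BY (price_time DESC);

-- Newest prices first for one symbol.
SELECT price_time, price
FROM stock_price_events
WHERE symbol = 'ABC'
LIMIT 10;

-- The order can be reversed at query time, but only within
-- a single partition and only on the clustering column.
SELECT price_time, price
FROM stock_price_events
WHERE symbol = 'ABC'
ORDER BY price_time ASC;
```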

When creating a table in Cassandra with the "CLUSTERING ORDER BY" option, make sure the clustering column is part of the primary key.
The table below is created with a clustering order, but the column "Datetime" is not a primary key column, hence the error below.
ERROR_SCRIPT
cqlsh> CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
... IP text,
... URL text,
... Status text,
... UserAgent text,
... Datetime timestamp,
... PRIMARY KEY (IP)
... ) WITH CLUSTERING ORDER BY (Datetime DESC);
ERROR:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"
CORRECTED_SCRIPT (Where the "Datetime" is added into the Primary Key columns)
cqlsh> CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
... IP text,
... URL text,
... Status text,
... UserAgent text,
... Datetime timestamp,
... PRIMARY KEY (IP,Datetime)
... ) WITH CLUSTERING ORDER BY (Datetime DESC);
