Retrieve rows based on a column of type "time" in Cassandra DB

How can we retrieve rows based on a column of type "time" in Cassandra?
We tried this query:
select *
from payment_transactions_by_transactiondate
where transaction_time>='00:00:00'
and transaction_time<='23:59:59'
and transaction_date='2018-03-21'
allow filtering;
but it's not fetching any rows (transaction_time is the primary key).

You cannot do a range query on the partition key, because Cassandra distributes data across nodes based on the hashed partition key, so rows are not stored in partition-key order. Instead, make transaction_time a clustering key; see the difference between the partition key and the clustering key. From the query above, it seems you need the transactions for a particular date (transaction_date), so make transaction_date the partition key and transaction_time the clustering key. Your query then restricts the partition key with = and the clustering key with a range, which Cassandra supports without ALLOW FILTERING.
For example:
create table payment_transactions_by_transactiondate(
....
....
....
primary key (transaction_date, transaction_time)
);
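To see why this layout makes the time-range query cheap, here is a toy in-memory model in Python. It is not the Cassandra API or storage engine (`insert` and `select_range` are hypothetical helpers); it only mimics two documented behaviors: rows are grouped by partition key, and within a partition they are kept sorted by the clustering column.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date, time

# Toy model of PRIMARY KEY (transaction_date, transaction_time).
# Hypothetical helpers; this is NOT the Cassandra driver API.
# partition key (transaction_date) -> list of (transaction_time, payload),
# kept sorted by the clustering column transaction_time.
partitions = defaultdict(list)


def insert(transaction_date, transaction_time, payload):
    rows = partitions[transaction_date]
    times = [t for t, _ in rows]
    i = bisect_left(times, transaction_time)
    if i < len(rows) and rows[i][0] == transaction_time:
        # Same full primary key: Cassandra writes are upserts, so overwrite.
        rows[i] = (transaction_time, payload)
    else:
        rows.insert(i, (transaction_time, payload))


def select_range(transaction_date, start, end):
    # Equality on the partition key selects exactly one partition; the
    # range on the clustering column is then one contiguous, sorted slice.
    rows = partitions.get(transaction_date, [])
    times = [t for t, _ in rows]
    return rows[bisect_left(times, start):bisect_right(times, end)]


insert(date(2018, 3, 21), time(23, 0), "tx-2")
insert(date(2018, 3, 21), time(9, 15), "tx-1")
insert(date(2018, 3, 22), time(8, 0), "tx-3")

result = select_range(date(2018, 3, 21), time(0, 0, 0), time(23, 59, 59))
print(result)
```

Rows inserted out of order still come back sorted by time, and rows for other dates are never touched, which is the access pattern the suggested primary key is designed for.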

Related

Cassandra DB misunderstanding partition key and primary key

Good Evening,
My problem is that, as I currently understand it, the partition key distributes the data between the nodes, and the primary key ALWAYS contains the partition key. I want a partition key that groups rows sharing the same partition key value, and within each group I want a primary key that identifies unique rows. My first understanding of Cassandra was that this would be possible if I could separate the partition key from the primary key. Is this possible?
An example to illustrate my idea:
country | state | unique_id
USA     | TEXAS | 123
USA     | TEXAS | 114
country and state as the partition key and the unique id as the primary key.
If I create the primary key like this: PRIMARY KEY ((country, state, unique_id)), I can't filter without using the unique_id, but I want, e.g., a query like SELECT unique_id FROM table WHERE state = 'TEXAS' AND country = 'USA'.
If I create the primary key this way: PRIMARY KEY ((country, state)), it obviously overwrites the data every time an entry is inserted with the same country and state; that's why I need the unique primary key.
The primary key always includes the partition key, which is always the first item of the primary key. The partition key can consist of multiple columns; that's why there are parentheses around the first item in your example. I believe that in your case the primary key should be as follows:
PRIMARY KEY ((country, state),unique_id)
In this case, the partition key is the combination of country + state, and inside that partition the unique IDs (the clustering column) identify the specific rows you select. The general syntax for a primary key is:
partition key, clustering column1, clustering column2, ...
where the partition key can be either:
column - single column
(column1, column2, ...) - multiple columns
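The general syntax above can be made concrete with a small Python helper that splits the inside of a PRIMARY KEY (...) clause into its partition key and clustering columns. `split_primary_key` is a hypothetical illustration, not part of any Cassandra tool, and it assumes plain, unquoted column names.

```python
def split_primary_key(spec):
    """Split the text inside PRIMARY KEY (...) into (partition key, clustering columns).

    Assumes plain, unquoted column names (a simplification of real CQL).
    """
    spec = spec.strip()
    if spec.startswith("("):
        # Composite partition key: everything up to the first ')'.
        close = spec.index(")")
        partition = [col.strip() for col in spec[1:close].split(",")]
        rest = spec[close + 1:]
    else:
        # Single-column partition key: the first item only.
        first, _, rest = spec.partition(",")
        partition = [first.strip()]
    # Everything after the partition key is a clustering column.
    clustering = [col.strip() for col in rest.split(",") if col.strip()]
    return partition, clustering


print(split_primary_key("(country, state), unique_id"))  # (['country', 'state'], ['unique_id'])
print(split_primary_key("(country, state, unique_id)"))  # (['country', 'state', 'unique_id'], [])
print(split_primary_key("deaths"))                       # (['deaths'], [])
```

The second call shows why PRIMARY KEY ((country, state, unique_id)) leaves no clustering columns: all three columns form the partition key, so every one of them must be given to find a row.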

Cassandra error: "Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"

I am running a Cassandra query that I have successfully run before, but now I can't execute it; it throws this error:
Cassandra error: InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"
My query is:
CREATE TABLE statistics(country_name text, dt date, confirmed_cases bigint, deaths bigint,
PRIMARY KEY(deaths))with clustering order by (deaths DESC);
Please Help!
This is happening because you have specified a single-column PRIMARY KEY. A single-column primary key is only a partition key; the table has no clustering columns. So there are two problems here:
You need to define a clustering key.
You're trying to enforce order by your partition key.
There are a couple of options here. But as you want to order by deaths, you probably should specify a different column as your partition key. Maybe partition by country_name?
...
PRIMARY KEY (country_name,deaths))
WITH CLUSTERING ORDER BY (deaths DESC);
The caveat is that you would then always need to filter by country_name in your WHERE clause.
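A toy Python model of the suggested table shows both points: rows are ordered by deaths only inside one country_name partition, and a read has to name that partition. This is not the Cassandra API; `insert` and `select_by_country` are hypothetical helpers and the data are made up.

```python
from collections import defaultdict

# Toy model of:
#   PRIMARY KEY (country_name, deaths) WITH CLUSTERING ORDER BY (deaths DESC)
# Not the Cassandra API; insert/select_by_country are hypothetical helpers.
partitions = defaultdict(dict)  # country_name -> {deaths: row}


def insert(row):
    # Upsert: a row with the same (country_name, deaths) replaces the old one.
    partitions[row["country_name"]][row["deaths"]] = row


def select_by_country(country_name):
    # The partition must be named; the deaths DESC order exists only inside it.
    rows = partitions.get(country_name, {})
    return [rows[d] for d in sorted(rows, reverse=True)]


insert({"country_name": "CountryA", "dt": "2020-04-01", "deaths": 10})
insert({"country_name": "CountryA", "dt": "2020-04-02", "deaths": 25})
insert({"country_name": "CountryB", "dt": "2020-04-01", "deaths": 40})

ranked = [row["deaths"] for row in select_by_country("CountryA")]
print(ranked)  # [25, 10]
```

Because deaths is part of the primary key, two rows for the same country with the same deaths count (on different dates) would overwrite each other, in this model and in Cassandra; adding dt as a further clustering column would keep them apart.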
A primary key in Cassandra consists of a partition key (one or more columns) and zero or more clustering columns. The partition key always comes first, followed by the clustering columns.
In this case the deaths column is the partition key, not a clustering key.
For example, in the statement below name1 is the partition key and name2 is the clustering key.
CREATE TABLE IF NOT EXISTS example_table (  -- table name and column types are placeholders
  name1 text,   -- partition key
  name2 int,    -- clustering column
  name3 text,   -- regular column
  PRIMARY KEY (name1, name2)
) WITH CLUSTERING ORDER BY (name2 DESC);
For more information, see Cassandra keys.
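Both errors quoted on this page appear to stem from the same rule, which can be sketched as a small check in Python. `check_clustering_order` is a hypothetical helper, not Cassandra code: it assumes, from the "must exactly match" message, that the directive has to list exactly the clustering columns in primary-key order, and its reason strings are its own wording rather than the server's. Server behavior and messages may differ between Cassandra versions.

```python
def check_clustering_order(clustering, order_by):
    """Check a CLUSTERING ORDER BY directive against a table's clustering columns.

    clustering: clustering column names, in primary-key order.
    order_by: (column, direction) pairs as written in the directive.
    Returns None if the directive passes this toy check, else a reason string.
    """
    named = [col for col, _ in order_by]
    for col in named:
        if col not in clustering:
            # e.g. a partition key column, or a column outside the primary key
            return f"{col} is not a clustering column"
    if named != clustering:
        # Assumption taken from the "must exactly match" error text.
        return "directive must list exactly the clustering columns, in order"
    for col, direction in order_by:
        if direction.upper() not in ("ASC", "DESC"):
            return f"invalid direction {direction!r} for {col}"
    return None


# PRIMARY KEY (deaths): deaths is the partition key, there are no clustering columns.
print(check_clustering_order([], [("deaths", "DESC")]))
# PRIMARY KEY (country_name, deaths): deaths is the clustering column.
print(check_clustering_order(["deaths"], [("deaths", "DESC")]))
```

The first call reports that deaths is not a clustering column and the second returns None, mirroring the failing statement in the question and the fix suggested above.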

Partition key only in Cassandra

In Cassandra, I understand that by default, given PRIMARY KEY(id1, id2), id1 will be partition key and id2 will be clustering key.
I want to know if I can define two partition keys without any clustering key, as follows:
PRIMARY KEY ((id1, id2));
Your understanding is correct.
Your PRIMARY KEY ((id1, id2)) is correct and you are specifying one partition key consisting of two columns.
With this key, you can query the data only by specifying the values of both columns, e.g.:
SELECT * FROM mytable WHERE id1=1 AND id2=3;
Queries like:
SELECT * FROM mytable WHERE id1=1;
will fail because id2 is also part of the partition key, and Cassandra needs every partition key column to locate the partition.
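To illustrate why both columns are required, here is a toy Python model of routing a query to a node. It is not how the driver or Cassandra is implemented: MD5 stands in for the default Murmur3 partitioner, `NUM_NODES` and `route` are hypothetical, and real clusters map tokens to nodes through token ranges rather than a modulo. The point it keeps is that the token is computed from all partition key columns together.

```python
import hashlib

# Toy model of PRIMARY KEY ((id1, id2)): one partition key made of two columns,
# no clustering columns. Cassandra hashes all partition key columns together
# (Murmur3 by default); MD5 is only a stand-in here, NUM_NODES is hypothetical,
# and real clusters assign token ranges to nodes rather than using a modulo.
PARTITION_KEY = ("id1", "id2")
NUM_NODES = 3


def route(where):
    """Return the toy node that owns the partition selected by `where`."""
    missing = [col for col in PARTITION_KEY if col not in where]
    if missing:
        raise ValueError(f"partition key column(s) {missing} not restricted; "
                         "cannot compute the token")
    raw = "|".join(str(where[col]) for col in PARTITION_KEY).encode()
    token = int.from_bytes(hashlib.md5(raw).digest()[:8], "big", signed=True)
    return token % NUM_NODES


node = route({"id1": 1, "id2": 3})  # WHERE id1=1 AND id2=3: routable
print("id1=1 AND id2=3 ->", node)
try:
    route({"id1": 1})               # WHERE id1=1: token cannot be computed
except ValueError as err:
    print("id1=1 ->", err)
```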

Error creating table in Cassandra - Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive

I get the above error when I try to use the following CQL statement, and I'm not sure what's wrong with it:
CREATE TABLE Stocks(
  id uuid,
  market text,
  symbol text,
  value text,
  time timestamp,
  PRIMARY KEY(id)
) WITH CLUSTERING ORDER BY (time DESC);
Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive
But the following works fine. Can't I use a column that is not part of the primary key to order my rows?
CREATE TABLE timeseries (
  event_type text,
  insertion_time timestamp,
  event blob,
  PRIMARY KEY (event_type, insertion_time)
)
WITH CLUSTERING ORDER BY (insertion_time DESC);
"can't I use some column which is not part of primary key to arrange my rows?"
No, you cannot. From the DataStax documentation on the SELECT command:
ORDER BY clauses can select a single column only. That column has to be the second column in a compound PRIMARY KEY. This also applies to tables with more than two column components in the primary key.
Therefore, for your first CREATE to work, you will need to adjust your PRIMARY KEY to this:
PRIMARY KEY(id,time)
The second column in a compound primary key is known as the "clustering column." This is the column that determines the on-disk sort order of data within a partitioning key. Note that last part, because it is important. When you query your Stocks column family (table) by id, all "rows" of column values for that id will be returned, sorted by time. In Cassandra you can only specify order within a partitioning key (not across your entire table), and your partitioning key is the first key listed in a compound primary key.
Of course, the problem with this is that you probably want id to be unique (which means that CQL will only ever return one "row" of column values per partitioning key). Requiring time to be part of the primary key negates that and makes it possible to store multiple values for the same id. This is the problem with partitioning your data by a unique id: it might be a good idea in the RDBMS world, but it can make querying in Cassandra more difficult.
Essentially, you are going to need to revisit your data model here. For instance, if you wanted to query prices over time, you could name the table something like "StockPriceEvents" with a primary key of (id,time) or (symbol,time). Querying that table would give you the prices recorded for each id or symbol, sorted by time. Now that may or may not be of any value to your use case. Just trying to explain how primary keys and sort order work in Cassandra.
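The difference between the two designs can be sketched with a toy Python model (hypothetical helpers and made-up values, not the Cassandra API): a table keyed only by id keeps one row per id because writes are upserts, while a table like the suggested StockPriceEvents, keyed by (symbol, time), keeps every price for a symbol, sorted by time.

```python
from collections import defaultdict
from datetime import datetime

# Toy comparison (not the Cassandra API) of two hypothetical tables:
#   stocks              PRIMARY KEY (id)           -> one row per id
#   stock_price_events  PRIMARY KEY (symbol, time) -> many rows per symbol
stocks = {}                             # id -> last written row (upserts overwrite)
stock_price_events = defaultdict(dict)  # symbol -> {time: value}


def record(stock_id, symbol, at, value):
    stocks[stock_id] = {"symbol": symbol, "time": at, "value": value}
    stock_price_events[symbol][at] = value


def prices(symbol):
    # All rows of one partition, in clustering (time) order.
    events = stock_price_events.get(symbol, {})
    return [(at, events[at]) for at in sorted(events)]


record("s1", "ACME", datetime(2014, 1, 2, 10, 0), "101.5")
record("s1", "ACME", datetime(2014, 1, 2, 9, 30), "100.0")

print(len(stocks))  # 1: the second write replaced the first
print(prices("ACME"))
```

The id-keyed table silently loses the earlier price, while the symbol partition returns both, earliest first.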
Note: You should really use column names that have more meaning. Names like "id," "time," and "timeseries" are pretty vague and don't really describe the context in which they are used.
When creating a table in Cassandra with the "CLUSTERING ORDER BY" option, make sure the column you order by is a clustering column, i.e. part of the primary key after the partition key.
The table below uses CLUSTERING ORDER BY on "Datetime", but "Datetime" is not a primary key column, hence the error below.
ERROR_SCRIPT
cqlsh> CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
... IP text,
... URL text,
... Status text,
... UserAgent text,
... Datetime timestamp,
... PRIMARY KEY (IP)
... ) WITH CLUSTERING ORDER BY (Datetime DESC);
ERROR:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"
CORRECTED_SCRIPT (Where the "Datetime" is added into the Primary Key columns)
cqlsh> CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
... IP text,
... URL text,
... Status text,
... UserAgent text,
... Datetime timestamp,
... PRIMARY KEY (IP,Datetime)
... ) WITH CLUSTERING ORDER BY (Datetime DESC);

Alter cassandra column family primary key using cassandra-cli or CQL

I am using Cassandra 1.2.5. After creating a column family in Cassandra using cassandra-cli, is it possible to modify the primary key on the column family using either cassandra-cli or CQL?
Specifically, I currently have the following table (from CQL):
CREATE TABLE "table1" (
key blob,
column1 blob,
value blob,
PRIMARY KEY (key, column1)
);
I would like the table to be as follows, without having to drop and recreate the table:
CREATE TABLE "table1" (
key blob,
column1 blob,
value blob,
PRIMARY KEY (key)
);
Is this possible through either cassandra-cli or CQL?
The primary key directly determines how and where Cassandra stores the data contained in a table (column family). The primary key consists of the partition key and an optional clustering key.
The partition key determines which node stores the data; it is responsible for distributing data across the nodes. The additional columns determine per-partition clustering (see the compound key documentation).
So changing the primary key always requires migrating all of the data. I do not think that either cqlsh or cassandra-cli has a command for this (as of 2015).
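Besides the migration itself, the narrower key changes what the table can hold: with PRIMARY KEY (key), all rows that shared a key but differed in column1 collapse into one. A toy Python sketch of copying the rows into a table with the narrower key (`old_table` and `migrate` are hypothetical, the values are made up, and this is not a Cassandra tool):

```python
# Toy model (not a Cassandra tool) of copying table1 from
# PRIMARY KEY (key, column1) into a new table with PRIMARY KEY (key).
# old_table and migrate are hypothetical; values are made up.
old_table = [
    {"key": b"k1", "column1": b"a", "value": b"v1"},
    {"key": b"k1", "column1": b"b", "value": b"v2"},
    {"key": b"k2", "column1": b"a", "value": b"v3"},
]


def migrate(rows):
    new_table = {}
    for row in rows:
        # With PRIMARY KEY (key), rows sharing `key` have the same primary key,
        # so each copy upserts over the previous one; here the last copied row
        # wins (as sequential INSERTs with increasing write timestamps would).
        new_table[row["key"]] = dict(row)
    return new_table


new_table = migrate(old_table)
lost = len(old_table) - len(new_table)
print(f"{len(old_table)} rows copied, {lost} overwritten")  # 3 rows copied, 1 overwritten
```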

Resources