I am new to Cassandra and to this forum. I'm executing Cassandra queries using cqlsh, but I don't know how to express a SQL query like select distinct a, b, c from table order by d asc in Cassandra. How can I do this, and what would the structure of the table be?
Your primary key consists of partition keys and clustering columns.
DISTINCT queries must only request partition keys.
ORDER BY is supported on clustering columns.
Suppose we have a sample table like the following:
CREATE TABLE Sample (
field1 text,
field2 text,
field3 text,
field4 text,
PRIMARY KEY ((field1, field2), field3));
DISTINCT requires all the partition keys to be listed, comma separated.
So you can't run the query select distinct field1 from Sample;. A valid query would be select distinct field1, field2 from Sample;.
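As a sketch against the Sample table above, the invalid and valid forms side by side:

```sql
-- Invalid: DISTINCT must list every column of the partition key
SELECT DISTINCT field1 FROM Sample;

-- Valid: both partition key columns (field1, field2) are listed
SELECT DISTINCT field1, field2 FROM Sample;
```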
Internally it hits all the nodes in the cluster to find all the partition keys, so if you have millions of partitions in your table, I would expect a performance drop, especially with multiple nodes.
By default, records within a partition are in ascending order of field3. The query below returns records in descending order of field3.
select * from Sample where field1 = 'a' and field2 = 'b' order by field3 desc;
If you already know your query patterns and the order in which you need the data, you can design the table accordingly. Suppose you always need records in descending order of field3; you could have designed your table this way.
CREATE TABLE Sample (
field1 text,
field2 text,
field3 text,
field4 text,
PRIMARY KEY ((field1, field2), field3))
WITH CLUSTERING ORDER BY (field3 DESC);
Now querying without ORDER BY returns records in descending order of field3, the same result as before.
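With the DESC clustering order baked into the table definition, a plain query (a sketch, using the same sample values) already comes back newest-first:

```sql
-- No ORDER BY needed; rows arrive with field3 descending
-- because of WITH CLUSTERING ORDER BY (field3 DESC)
SELECT * FROM Sample WHERE field1 = 'a' AND field2 = 'b';
```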
You can use ORDER BY with multiple clustering columns, but you can't skip a column in the declared order. To see why, consider a sample table like the one below:
CREATE TABLE Sample1 (
field1 text,
field2 text,
field3 text,
field4 int,
field5 int,
PRIMARY KEY ((field1, field2), field3, field4));
I added a few dummy records.
You may order by multiple columns like this: select * from Sample1 where field1 = 'a' and field2 = 'b' order by field3 desc, field4 desc;
NOTE: All fields must be ordered in the same direction, either ascending (field3 asc, field4 asc) or descending (field3 desc, field4 desc). You can't mix them, e.g. (field3 asc, field4 desc) or vice versa.
By saying we can't skip the order in ORDER BY, I meant we can't do something like select * from Sample1 where field1 = 'a' and field2 = 'b' order by field4 desc;
I hope this helps!
I'm trying to run the following CQL command
ALTER TABLE keyspace_name_abc.table_name
ADD (field1 text, field2 text)
WITH default_time_to_live = 15552000;
But it is not working, the error I'm getting is:
SyntaxException: line 1:NN mismatched input 'WITH' expecting EOF (...field1 text, field2 text [WITH] ...)
If I run the commands separately, it works. Is there a limitation in CQL on combining multiple changes in one statement?
ALTER TABLE keyspace_name_abc.table_name
ADD (field1 text, field2 text);
ALTER TABLE keyspace_name_abc.table_name
WITH default_time_to_live = 15552000;
This way, the commands are accepted.
I don't believe you'll be able to combine a column addition and a table-property change in one statement, so you'll have to separate those out and run them like:
ALTER TABLE keyspace_name_abc.table_name ADD (field1 text, field2 text);
ALTER TABLE keyspace_name_abc.table_name WITH default_time_to_live = 15552000;
If you require some non-default TTL for a particular column, then you'll need to add that on the insert, for example:
INSERT INTO keyspace_name_abc.table_name (col1, col2, field1, field2) VALUES ('col1text', 'col2text', 'field1text', 'field2text') USING TTL 86400;
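To confirm a per-column TTL took effect, CQL's ttl() function reports the remaining seconds for a column's value (a sketch; it assumes the table and columns from the question exist as shown):

```sql
-- ttl() returns the remaining time-to-live, in seconds, for field1's
-- value in each row (NULL if no TTL was set on that value)
SELECT col1, TTL(field1) FROM keyspace_name_abc.table_name;
```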
Updated after a comment from Jim.
I have a database with schema
field1 //partition key
field2 //clustering column
field3
I suppose Cassandra will calculate a hash of field1, decide which node this data entry goes to, and store it there. Since I am using a clustering column, two data entries with the same value of field1 but different values of field2 will be stored as two rows.
field1, field2.1, field3
field1, field2.2, field3
Is it possible to create a query which would return value 2 (count of rows) as there are two rows belonging to partition key field1?
Do a
select count(*) from table where field1 = 'x';
You should get 2 for the example shown in your question.
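A quick sketch of the whole round trip, with hypothetical table and partition-key values (the table name and data are illustrative, not from the question):

```sql
-- two rows sharing the partition key value 'x'
INSERT INTO my_table (field1, field2, field3) VALUES ('x', 'a', 'v1');
INSERT INTO my_table (field1, field2, field3) VALUES ('x', 'b', 'v2');

-- counts rows within the single partition 'x'; returns 2 here
SELECT COUNT(*) FROM my_table WHERE field1 = 'x';
```

Restricting the count to one partition keeps the query cheap; an unrestricted count(*) would scan the whole cluster.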
I am not able to perform GROUP BY on a partition key column. I am using Cassandra 3.10. When I group by, I get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Group by currently only support groups of columns following their declared order in the Primary Key". My column is part of the primary key, yet I am still facing the problem.
My schema is
Table trends{
name text,
price int,
quantity int,
code text,
code_name text,
cluster_id text
uitime timeuuid,
primary key((name,price),code,uitime))
with clustering order by (code DESC, uitime DESC)
And the command that I run is: select sum(quantity) from trends group by code;
For starters your schema is invalid. You cannot set clustering order on code because it is the partition key. The order is going to be determined by the hash of it (unless using byte order partitioner - but don't do that).
The kind of query you're talking about does work, though. For example, you can run
> SELECT keyspace_name, sum(partitions_count) AS approx_partitions FROM system.size_estimates GROUP BY keyspace_name;
keyspace_name | approx_partitions
--------------------+-------------------
system_auth | 128
basic | 4936508
keyspace1 | 870
system_distributed | 0
system_traces | 0
where the schema is:
CREATE TABLE system.size_estimates (
keyspace_name text,
table_name text,
range_start text,
range_end text,
mean_partition_size bigint,
partitions_count bigint,
PRIMARY KEY ((keyspace_name), table_name, range_start, range_end)
) WITH CLUSTERING ORDER BY (table_name ASC, range_start ASC, range_end ASC)
Perhaps the pseudo-schema you provided differs from the actual one. Can you add the output of describe table xxxxx to your question?
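In the meantime, a hedged sketch against the schema as posted: in Cassandra 3.10, GROUP BY must follow the primary key's declared order starting with the partition key columns, so queries like these should be accepted (assuming the trends table exists as shown):

```sql
-- group by the full partition key (name, price)
SELECT name, price, sum(quantity) FROM trends GROUP BY name, price;

-- grouping may extend into clustering columns, in declared order
SELECT name, price, code, sum(quantity) FROM trends GROUP BY name, price, code;

-- this is what fails: GROUP BY code skips the partition key columns
-- SELECT sum(quantity) FROM trends GROUP BY code;
```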
This is the query I used to create the table:
CREATE TABLE test.comments (msguuid timeuuid, page text, userid text, username text, msg text, timestamp int, PRIMARY KEY (timestamp, msguuid));
then I create a materialized view:
CREATE MATERIALIZED VIEW test.comments_by_page AS
SELECT *
FROM test.comments
WHERE page IS NOT NULL AND msguuid IS NOT NULL
PRIMARY KEY (page, timestamp, msguuid)
WITH CLUSTERING ORDER BY (msguuid DESC);
I want to get the last 50 rows sorted by timestamp in ascending order.
This is the query I'm trying:
SELECT * FROM test.comments_by_page WHERE page = 'test' AND timestamp < 1496707057 ORDER BY timestamp ASC LIMIT 50;
which then gives this error: InvalidRequest: code=2200 [Invalid query] message="Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY"
How can I accomplish this?
Materialized View rules are basically the same as those for "standard" tables. If you want a specific order, you must specify it in the clustering key.
So you have to put your timestamp column into the clustering section.
The CLUSTERING ORDER BY clause should be modified as below, listing the clustering columns in their declared order with timestamp first:
WITH CLUSTERING ORDER BY (timestamp ASC, msguuid DESC)
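Putting it together, a sketch of the full materialized view (note that every primary key column, including timestamp, must appear in the IS NOT NULL filter):

```sql
CREATE MATERIALIZED VIEW test.comments_by_page AS
    SELECT *
    FROM test.comments
    WHERE page IS NOT NULL AND timestamp IS NOT NULL AND msguuid IS NOT NULL
    PRIMARY KEY (page, timestamp, msguuid)
    WITH CLUSTERING ORDER BY (timestamp ASC, msguuid DESC);
```

With timestamp as the first clustering column in ascending order, the original SELECT with ORDER BY timestamp ASC should then be accepted.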
In the following scenario:
CREATE TABLE temperature_by_day (
weatherstation_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ((weatherstation_id,date),event_time)
)WITH CLUSTERING ORDER BY (event_time DESC);
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 08:01:00','74F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 07:01:00','73F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 08:01:00','76F');
If I do the following query:
SELECT *
FROM temperature_by_day
WHERE weatherstation_id='1234ABCD'
AND date in ('2013-04-04', '2013-04-03') limit 2;
I noticed that Cassandra's result is ordered in the same sequence as the partition keys in the IN clause. In this case, I'd like to know: is the expected result ALWAYS the two records of the day '2013-04-04'? I.e., does Cassandra respect the order of the IN clause when ordering the result, even in a scenario with multiple nodes?