ORDER BY with 2ndary indexes is not supported - cassandra

I am using cassandra 2.1 with latest CQL.
Here is my table & indexes:
CREATE TABLE mydata.chats_new (
id bigint,
adid bigint,
fromdemail text,
fromemail text,
fromjid text,
messagebody text,
messagedatetime text,
messageid text,
messagetype text,
todemail text,
toemail text,
tojid text,
PRIMARY KEY(messageid,messagedatetime)
);
CREATE INDEX user_fromJid ON mydata.chats_new (fromjid);
CREATE INDEX user_toJid ON mydata.chats_new (tojid);
CREATE INDEX user_adid ON mydata.chats_new (adid);
When i execute this query:
select * from chats_new WHERE fromjid='test' AND toJid='test1' ORDER BY messagedatetime DESC;
I got this error:
code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."
So how should fetch this data?

select * from chats_new
WHERE fromjid='test' AND toJid='test1'
ORDER BY messagedatetime DESC;
code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."
To get the WHERE clause of this query to work, I would build a specific query table, like this:
CREATE TABLE mydata.chats_new_by_fromjid_and_tojid (
id bigint,
adid bigint,
fromdemail text,
fromemail text,
fromjid text,
messagebody text,
messagedatetime text,
messageid text,
messagetype text,
todemail text,
toemail text,
tojid text,
PRIMARY KEY((fromjid, tojid), messagedatetime, messageid)
);
Note the primary key definition. This creates a partitioning key out of fromjid and tojid. While this will allow you to query on both fields, it will also require both fields to be specified in all queries on this table. But that's why they call it a "query table", as it is generally designed to serve one particular query.
As for the remaining fields in the primary key, I kept messagedatetime as the first clustering column, to assure on-disk sort order. Default ordering in Cassandra is ascending, so if you want to change that at query time, that's where your ORDER BY messagedatetime DESC comes into play. And lastly, I made sure that the messageid was the second clustering column, to help ensure primary key uniqueness (assuming that messageid is unique).
Now, this query will work:
select * from chats_new_by_fromjid_and_tojid
WHERE fromjid='test' AND toJid='test1'
ORDER BY messagedatetime DESC;
If you need to query this data by additional criteria, I highly recommend that you create additional query table(s). Remember, Cassandra works best with tables that are specifically designed for each query they serve. It's ok to replicate your data a few times, because disk space is cheap...operation time is not.
Also, DataStax has a great article on when not to use a secondary index. It's definitely worth a read.

Related

Not able to run multiple where clause without Cassandra allow filtering

Hi I am new to Cassandra.
We are working on IOT project where car sensor data will be stored in cassandra.
Here is the example of one table where I am going to store one of the sensor data.
This is some sample data.
The way I want to partition the data is based on the organization_id so that different organization data is partitioned.
Here is the create table command:
CREATE TABLE IF NOT EXISTS engine_speed (
id UUID,
engine_speed_rpm text,
position int,
vin_number text,
last_updated timestamp,
organization_id int,
odometer int,
PRIMARY KEY ((id, organization_id), vin_number)
);
This works fine. However all my queries will be as bellow:
select * from engine_speed
where vin_number='xyz'
and organization_id = 1
and last_updated >='from time stamp' and last_updated <='to timestamp'
Almost all queries in all the table will have similar / same where clause.
I am getting error and it is asking to add "Allow filtering".
Kindly let me know how do I partition the table and define right primary key and indexs so that I don't have to add "allow filtering" in the query.
Apologies for this basic question but I'm just starting using cassandra.(using apache cassandra:3.11.12 )
The order of where clause should match with the order of partition and clustering keys you have defined in your DDL and you cannot skip any part of primary key while applying the WHERE clause before using the next key. So as per the query pattern u have defined, you can try the below DDL:
CREATE TABLE IF NOT EXISTS autonostix360.engine_speed (
vin_number text,
organization_id int,
last_updated timestamp,
id UUID,
engine_speed_rpm text,
position int,
odometer int,
PRIMARY KEY ((vin_number, organization_id), last_updated)
);
But remember,
PRIMARY KEY ((vin_number, organization_id), last_updated)
PRIMARY KEY ((vin_number), organization_id, last_updated)
above two are different in Cassandra, In case 1 your data will be partitioned by combination of vin_number and organization_id while last_updated will act as ordering key. In case 2, your data will be partitioned only by vin_number while organization_id and last_updated will act as ordering key. So you need to figure out which case suits your use case.

Cassandra Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY

This is the query I used to create the table:
CREATE TABLE test.comments (msguuid timeuuid, page text, userid text, username text, msg text, timestamp int, PRIMARY KEY (timestamp, msguuid));
then I create a materialized view:
CREATE MATERIALIZED VIEW test.comments_by_page AS
SELECT *
FROM test.comments
WHERE page IS NOT NULL AND msguuid IS NOT NULL
PRIMARY KEY (page, timestamp, msguuid)
WITH CLUSTERING ORDER BY (msguuid DESC);
I want to get the last 50 rows sorted by timestamp in ascending order.
This is the query I'm trying:
SELECT * FROM test.comments_by_page WHERE page = 'test' AND timestamp < 1496707057 ORDER BY timestamp ASC LIMIT 50;
which then gives this error: InvalidRequest: code=2200 [Invalid query] message="Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY"
How can I accomplish this?
Materialized View rules are basically the same of "standard" tables ones. If you want a specific order you must specify that in the clustering key.
So you have to put your timestamp into the clustering section.
clustering order statement should be modified as below:
//Don't forget to put the primary key before timestamp into ()
CLUSTERING ORDER BY ((msguuid DESC), timestamp ASC)

cassandra, select via a non primary key

I'm new with cassandra and I met a problem. I created a keyspace demodb and a table users. This table got 3 columns: id (int and primary key), firstname (varchar), name (varchar).
this request send me the good result:
SELECT * FROM demodb.users WHERE id = 3;
but this one:
SELECT * FROM demodb.users WHERE firstname = 'francois';
doesn't work and I get the following error message:
InvalidRequest: code=2200 [Invalid query] message="No secondary indexes on the restricted columns support the provided operators: "
This request also doesn't work:
SELECT * FROM users WHERE firstname = 'francois' ORDER BY id DESC LIMIT 5;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."
Thanks in advance.
This request also doesn't work:
That's because you are mis-understanding how sort order works in Cassandra. Instead of using a secondary index on firstname, create a table specifically for this query, like this:
CREATE TABLE usersByFirstName (
id int,
firstname text,
lastname text,
PRIMARY KEY (firstname,id));
This query should now work:
SELECT * FROM usersByFirstName WHERE firstname='francois'
ORDER BY id DESC LIMIT 5;
Note, that I have created a compound primary key on firstname and id. This will partition your data on firstname (allowing you to query by it), while also clustering your data by id. By default, your data will be clustered by id in ascending order. To alter this behavior, you can specify a CLUSTERING ORDER in your table creation statement:
WITH CLUSTERING ORDER BY (id DESC)
...and then you won't even need an ORDER BY clause.
I recently wrote an article on how clustering order works in Cassandra (We Shall Have Order). It explains this, and covers some ordering strategies as well.
There is one constraint in cassandra: any field you want to use in the where clause has to be the primary key of the table or there must be a secondary index on it. So you have to create an index to firstname and only after that you can use firstname in the where condition and you will get the result you were expecting.

Apache Cassandra table not sorting by name or title correctly

I have the following Apache Cassandra Table working.
CREATE TABLE user_songs (
member_id int,
song_id int,
title text,
timestamp timeuuid,
album_id int,
album_title text,
artist_names set<text>,
PRIMARY KEY ((member_id, song_id), title)
) WITH CLUSTERING ORDER BY (title ASC)
CREATE INDEX user_songs_member_id_idx ON music.user_songs (member_id);
When I try to do a select * FROM user_songs WHERE member_id = 1; I thought the Clustering Order by title would have given me a sorted ASC of the return - but it doesn't
Two questions:
Is there something with the table in terms of ordering or PK?
Do I need more tables for my needs in order to have a sorted title by member_id.
Note - my Cassandra queries for this table are:
Find all songs with member_id
Remove a song from memeber_id given song_id
Hence why the PK is composite
UPDATE
It is simialr to: Query results not ordered despite WITH CLUSTERING ORDER BY
However one of the suggestion in the comments is to put member_id,song_id,title as primary instead of the composite that I currently have. When I do that It seems that I cannot delete with only song_id and member_id which is the data that I get for deleting (hence title is missing when deleting)

Non-EQ relation error Cassandra - how fix primary key?

I created a one table posts. When I make request SELECT:
return $this->db->query('SELECT * FROM "posts" WHERE "id" IN(:id) LIMIT '.$this->limit_per_page, ['id' => $id]);
I get error:
PRIMARY KEY column "id" cannot be restricted (preceding column
"post_at" is either not restricted or by a non-EQ relation)
My table dump is:
CREATE TABLE posts (
id uuid,
post_at timestamp,
user_id bigint,
name text,
category set<text>,
link varchar,
image set<varchar>,
video set<varchar>,
content map<text, text>,
private boolean,
PRIMARY KEY (user_id,post_at,id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
I read some article about PRIMARY AND CLUSTER KEYS, and understood, when there are some primary keys - I need use operator = with IN. In my case, i can not use a one PRIMARY KEY. What you advise me to change in table structure, that error will disappear?
My dummy table structure
CREATE TABLE posts (
id timeuuid,
post_at timestamp,
user_id bigint,
PRIMARY KEY (id,post_at,user_id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
And after inserting some dummy data
I ran query select * from posts where id in (timeuuid1,timeuuid2,timeuuid3);
I was using cassandra 2.0 with cql 3.0

Resources