cassandra, select via a non primary key - cassandra

I'm new with cassandra and I met a problem. I created a keyspace demodb and a table users. This table got 3 columns: id (int and primary key), firstname (varchar), name (varchar).
this request send me the good result:
SELECT * FROM demodb.users WHERE id = 3;
but this one:
SELECT * FROM demodb.users WHERE firstname = 'francois';
doesn't work and I get the following error message:
InvalidRequest: code=2200 [Invalid query] message="No secondary indexes on the restricted columns support the provided operators: "
This request also doesn't work:
SELECT * FROM users WHERE firstname = 'francois' ORDER BY id DESC LIMIT 5;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."
Thanks in advance.

This request also doesn't work:
That's because you are mis-understanding how sort order works in Cassandra. Instead of using a secondary index on firstname, create a table specifically for this query, like this:
CREATE TABLE usersByFirstName (
id int,
firstname text,
lastname text,
PRIMARY KEY (firstname,id));
This query should now work:
SELECT * FROM usersByFirstName WHERE firstname='francois'
ORDER BY id DESC LIMIT 5;
Note, that I have created a compound primary key on firstname and id. This will partition your data on firstname (allowing you to query by it), while also clustering your data by id. By default, your data will be clustered by id in ascending order. To alter this behavior, you can specify a CLUSTERING ORDER in your table creation statement:
WITH CLUSTERING ORDER BY (id DESC)
...and then you won't even need an ORDER BY clause.
I recently wrote an article on how clustering order works in Cassandra (We Shall Have Order). It explains this, and covers some ordering strategies as well.

There is one constraint in cassandra: any field you want to use in the where clause has to be the primary key of the table or there must be a secondary index on it. So you have to create an index to firstname and only after that you can use firstname in the where condition and you will get the result you were expecting.

Related

How can I run SELECT DISTINCT for any column in CQLSH?

Every time I try to run SELECT DISTINCT %column_name from %table_name I receive
InvalidRequest: Error from server: code=2200 [Invalid query] message="SELECT DISTINCT queries must only request partition key columns and/or static columns (not specified %column_name)"
You can run SELECT DISTINCT only on your partition key column. For example, if your schema looks like:
CREATE TABLE artist (
id int PRIMARY KEY,
band_name text,
name text,
role text
);
Then query will be:
SELECT DISTINCT id FROM artist;

Cassandra Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY

This is the query I used to create the table:
CREATE TABLE test.comments (msguuid timeuuid, page text, userid text, username text, msg text, timestamp int, PRIMARY KEY (timestamp, msguuid));
then I create a materialized view:
CREATE MATERIALIZED VIEW test.comments_by_page AS
SELECT *
FROM test.comments
WHERE page IS NOT NULL AND msguuid IS NOT NULL
PRIMARY KEY (page, timestamp, msguuid)
WITH CLUSTERING ORDER BY (msguuid DESC);
I want to get the last 50 rows sorted by timestamp in ascending order.
This is the query I'm trying:
SELECT * FROM test.comments_by_page WHERE page = 'test' AND timestamp < 1496707057 ORDER BY timestamp ASC LIMIT 50;
which then gives this error: InvalidRequest: code=2200 [Invalid query] message="Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY"
How can I accomplish this?
Materialized View rules are basically the same of "standard" tables ones. If you want a specific order you must specify that in the clustering key.
So you have to put your timestamp into the clustering section.
clustering order statement should be modified as below:
//Don't forget to put the primary key before timestamp into ()
CLUSTERING ORDER BY ((msguuid DESC), timestamp ASC)

How can I use the second column in this primary key both in ORDER BY clause as well as update it using an UPDATE command

Below is the table.
CREATE TABLE threadpool(
threadtype int,
threadid bigint,
jobcount bigint,
valid boolean,
PRIMARY KEY (threadtype, jobcount, threadid)
);
I want to run the below 2 queries on this table.
SELECT * FROM threadpool WHERE threadtype = 1 ORDER BY jobcount ASC LIMIT 1;
UPDATE threadpool SET valid = false WHERE threadtype = 1 and threadid = 4;
The second query fails with the below reason.
InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column "threadid" cannot be restricted (preceding column "jobcount" is either not restricted or by a non-EQ relation)"
Can any body please help me in modelling the data to support both the above queries.
Your described data model can't work, as
only values of datatype counter can be incremented using a CQL statement
counter tables can only have the counter as a single column beside the PK
you cannot sort by counter values

cassandra error when using select and where in cql

I have a cassandra table defined like this:
CREATE TABLE test.test(
id text,
time bigint,
tag text,
mstatus boolean,
lonumb int,
PRIMARY KEY (id, time, tag)
)
And I want to select one column using select.
I tried:
select * from test where lonumb = 4231;
It gives:
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"
Also I cannot do
select * from test where mstatus = true;
Doesn't cassandra support where as a part of CQL? How to correct this?
You can only use WHERE on the indexed or primary key columns. To correct your issue you will need to create an index.
CREATE INDEX iname
ON keyspacename.tablename(columname)
You can see more info here.
But you have to keep in mind that this query will have to run against all nodes in the cluster.
Alternatively you might rethink your table structure if the lonumb is something you'll do the most queries on.
Jny is correct in that WHERE is only valid on columns in the PRIMARY KEY, or those where a secondary index has been created for. One way to solve this issue is to create a specific query table for lonumb queries.
CREATE TABLE test.testbylonumb(
id text,
time bigint,
tag text,
mstatus boolean,
lonumb int,
PRIMARY KEY (lonumb, time, id)
)
Now, this query will work:
select * from testbylonumb where lonumb = 4231;
It will return all CQL rows where lonumb = 4231, sorted by time. I put id on the PRIMARY KEY to ensure uniqueness.
select * from test where mstatus = true;
This one is trickier. Indexes and keys on low-cardinality columns (like booleans) are generally considered a bad idea. See if there's another way you could model that. Otherwise, you could experiment with a secondary index on mstatus, but only use it when you specify a partition key (lonumb in this case), like this:
select * from testbylonumb where lonumb = 4231 AND mstatus = true;
Maybe that wouldn't perform too badly, as you are restricting it to a specific partition. But I definitely wouldn't ever do a SELECT * on mstatus.

ORDER BY with 2ndary indexes is not supported

I am using cassandra 2.1 with latest CQL.
Here is my table & indexes:
CREATE TABLE mydata.chats_new (
id bigint,
adid bigint,
fromdemail text,
fromemail text,
fromjid text,
messagebody text,
messagedatetime text,
messageid text,
messagetype text,
todemail text,
toemail text,
tojid text,
PRIMARY KEY(messageid,messagedatetime)
);
CREATE INDEX user_fromJid ON mydata.chats_new (fromjid);
CREATE INDEX user_toJid ON mydata.chats_new (tojid);
CREATE INDEX user_adid ON mydata.chats_new (adid);
When i execute this query:
select * from chats_new WHERE fromjid='test' AND toJid='test1' ORDER BY messagedatetime DESC;
I got this error:
code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."
So how should fetch this data?
select * from chats_new
WHERE fromjid='test' AND toJid='test1'
ORDER BY messagedatetime DESC;
code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."
To get the WHERE clause of this query to work, I would build a specific query table, like this:
CREATE TABLE mydata.chats_new_by_fromjid_and_tojid (
id bigint,
adid bigint,
fromdemail text,
fromemail text,
fromjid text,
messagebody text,
messagedatetime text,
messageid text,
messagetype text,
todemail text,
toemail text,
tojid text,
PRIMARY KEY((fromjid, tojid), messagedatetime, messageid)
);
Note the primary key definition. This creates a partitioning key out of fromjid and tojid. While this will allow you to query on both fields, it will also require both fields to be specified in all queries on this table. But that's why they call it a "query table", as it is generally designed to serve one particular query.
As for the remaining fields in the primary key, I kept messagedatetime as the first clustering column, to assure on-disk sort order. Default ordering in Cassandra is ascending, so if you want to change that at query time, that's where your ORDER BY messagedatetime DESC comes into play. And lastly, I made sure that the messageid was the second clustering column, to help ensure primary key uniqueness (assuming that messageid is unique).
Now, this query will work:
select * from chats_new_by_fromjid_and_tojid
WHERE fromjid='test' AND toJid='test1'
ORDER BY messagedatetime DESC;
If you need to query this data by additional criteria, I highly recommend that you create additional query table(s). Remember, Cassandra works best with tables that are specifically designed for each query they serve. It's ok to replicate your data a few times, because disk space is cheap...operation time is not.
Also, DataStax has a great article on when not to use a secondary index. It's definitely worth a read.

Resources