I have a Cassandra table defined like this:
CREATE TABLE test.test(
id text,
time bigint,
tag text,
mstatus boolean,
lonumb int,
PRIMARY KEY (id, time, tag)
)
I want to select rows by filtering on a single column.
I tried:
select * from test where lonumb = 4231;
It gives:
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"
Also I cannot do
select * from test where mstatus = true;
Doesn't Cassandra support WHERE as part of CQL? How can I fix this?
You can only use WHERE on the indexed or primary key columns. To correct your issue you will need to create an index.
CREATE INDEX iname
ON keyspacename.tablename(columnname);
You can see more info here.
But you have to keep in mind that this query will have to run against all nodes in the cluster.
Alternatively you might rethink your table structure if the lonumb is something you'll do the most queries on.
Jny is correct that WHERE is only valid on columns in the PRIMARY KEY, or on columns for which a secondary index has been created. One way to solve this issue is to create a specific query table for lonumb queries.
CREATE TABLE test.testbylonumb(
id text,
time bigint,
tag text,
mstatus boolean,
lonumb int,
PRIMARY KEY (lonumb, time, id)
)
Now, this query will work:
select * from testbylonumb where lonumb = 4231;
It will return all CQL rows where lonumb = 4231, sorted by time. I put id on the PRIMARY KEY to ensure uniqueness.
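To see why the query table helps, here is a minimal Python sketch. It is an in-memory model, not Cassandra itself; the sample rows and the helper name select_by_lonumb are made up for illustration. Rows are grouped by the partition key lonumb and each partition is kept sorted by the clustering columns (time, id), so a lookup touches exactly one partition and returns rows already in order.

```python
from collections import defaultdict

# Hypothetical sample rows: (id, time, tag, mstatus, lonumb)
rows = [
    ("a", 30, "t1", True, 4231),
    ("b", 10, "t2", False, 4231),
    ("c", 20, "t3", True, 77),
]

# Model of testbylonumb: partition key -> rows sorted by (time, id)
partitions = defaultdict(list)
for row in rows:
    partitions[row[4]].append(row)
for part in partitions.values():
    part.sort(key=lambda r: (r[1], r[0]))  # clustering order: time, then id


def select_by_lonumb(lonumb):
    """Return the rows of one partition, already in clustering order."""
    return list(partitions.get(lonumb, []))


result = select_by_lonumb(4231)
print([r[0] for r in result])
```

The lookup never scans other partitions, which is roughly what makes the equivalent CQL query cheap.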
select * from test where mstatus = true;
This one is trickier. Indexes and keys on low-cardinality columns (like booleans) are generally considered a bad idea. See if there's another way you could model that. Otherwise, you could experiment with a secondary index on mstatus, but only use it when you specify a partition key (lonumb in this case), like this:
select * from testbylonumb where lonumb = 4231 AND mstatus = true;
Maybe that wouldn't perform too badly, as you are restricting it to a specific partition. But I definitely wouldn't ever do a SELECT * on mstatus.
I have a table:
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set<text>, PRIMARY KEY (uuidHotel, uuidRoom));
Tried to select with IN:
select * from tabletest where uuidhotel = 'uuidHotel' and bookedtimestampset IN ('1460710800000');
I got:
"'bookedtimestampset' (set<text>) cannot be restricted by a 'IN' relation"
Can I select elements by IN Set filter?
No, but you can put a secondary index on bookedtimestampset and use the CONTAINS operator:
aploetz#cqlsh:stackoverflow> CREATE INDEX timeset_idx ON tabletest(bookedtimestampset);
aploetz#cqlsh:stackoverflow> SELECT uuidhotel,uuidroom FROM tabletest
WHERE uuidhotel = 'uuidHotel1' and bookedtimestampset CONTAINS '1460710800000';
uuidhotel | uuidroom
------------+----------
uuidHotel1 | uuidroom1
(1 rows)
Normally I wouldn't recommend a secondary index, but as long as you are filtering by a partition key (uuidhotel) it should perform ok.
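As a rough illustration of why restricting by the partition key keeps this cheap, here is a small Python sketch. It models the table as a dict and is only an analogy for what the index lookup has to cover; the second and third rows and the function name rooms_containing are invented for the example.

```python
# Hypothetical rows of tabletest: partition key uuidhotel -> list of rooms
table = {
    "uuidHotel1": [
        {"uuidroom": "uuidroom1", "bookedtimestampset": {"1460710800000"}},
        {"uuidroom": "uuidroom2", "bookedtimestampset": {"1460797200000"}},
    ],
    "uuidHotel2": [
        {"uuidroom": "uuidroom9", "bookedtimestampset": {"1460710800000"}},
    ],
}


def rooms_containing(uuidhotel, timestamp):
    """Look only at the given hotel's partition; test set membership per row."""
    return [
        row["uuidroom"]
        for row in table.get(uuidhotel, [])
        if timestamp in row["bookedtimestampset"]
    ]


print(rooms_containing("uuidHotel1", "1460710800000"))
```

Without the uuidhotel restriction, every partition would have to be checked, which is the case the answer warns about.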
Can I select elements by IN Set filter?
You can't use an IN clause on this column with your current primary key. It is important to understand how strongly the data model influences query performance. Of course, you can add a secondary index on the bookedtimestampset column, but in that case be ready for performance degradation.
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set<text>, PRIMARY KEY (uuidHotel, uuidRoom));
Your compound primary key consists of one partition key, uuidHotel, and one clustering key, uuidRoom. This means that all rooms of a given hotel are physically stored on the same node, in uuidRoom order, so retrieving those rows is very efficient. bookedTimeStampSet is a regular column whose values are spread across the whole cluster, and it is impossible to restrict by this column without a secondary index.
Consequently, I would recommend designing the primary key around your future queries, even if you need to duplicate some data, which is common practice for a NoSQL database such as Cassandra.
e.g.
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text,
uuidRoom text, uuidGuest text, bookedTimeStamp timestamp, PRIMARY KEY
(uuidHotel, bookedTimeStamp, uuidRoom));
It allows you to run a query like:
select * from tabletest where uuidhotel = 'uuidHotel' and
bookedtimestamp > 1460710800000 and bookedtimestamp < 1460710900000;
I have a column family defined like this:
CREATE TABLE sr_number_callrecord (
id int,
callerph text,
sr_number text,
callid text,
start_time text,
plan_id int,
PRIMARY KEY((sr_number), start_time, callerph)
);
I want to do the query like :
a) select * from dummy where sr_number='+919xxxx8383'
and start_time >='2014-12-02 08:23:18' limit 10;
b) select * from dummy where sr_number='+919xxxxxx83'
and start_time >='2014-12-02 08:23:18'
and callerph='+9120xxxxxxxx0' limit 10;
The first query works fine, but the second query gives an error like:
Bad Request: PRIMARY KEY column "callerph" cannot be restricted
(preceding column "start_time" is either not restricted or by a non-EQ
relation)
The first query returns results. In the second query I am only adding one
more clustering key to filter further, so it should just return fewer rows.
Just like you cannot skip PRIMARY KEY components, you may only use a non-equals operator on the last component that you query (which is why your 1st query works).
If you do need to serve both of the queries you have listed above, then you will need to have separate query tables for each. To serve the second query, a query table (with the same columns) will work if you define it with a PRIMARY KEY like this:
PRIMARY KEY((sr_number), callerph, start_time)
That way you are still specifying the parts of your PRIMARY KEY in order, and your non-equals condition is on the last PRIMARY KEY component.
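The reasoning behind this ordering can be sketched in Python. This is only an in-memory analogy with made-up phone numbers and timestamps: with rows sorted by (callerph, start_time), an equality on callerph plus a range on start_time selects one contiguous slice, which a binary search can locate.

```python
from bisect import bisect_left, bisect_right

# One partition (sr_number fixed), rows sorted by (callerph, start_time).
# start_time is text, as in the original table, so it compares lexicographically.
partition = sorted([
    ("+911", "2014-12-01 10:00:00"),
    ("+911", "2014-12-02 09:00:00"),
    ("+911", "2014-12-03 07:30:00"),
    ("+922", "2014-12-02 08:30:00"),
])


def slice_rows(callerph, start_time_from):
    """callerph = ? AND start_time >= ? expressed as one contiguous slice."""
    lo = bisect_left(partition, (callerph, start_time_from))
    # "\uffff" is a sentinel that sorts after any start_time for this callerph.
    hi = bisect_right(partition, (callerph, "\uffff"))
    return partition[lo:hi]


print(slice_rows("+911", "2014-12-02 08:23:18"))
```

With the original order (start_time, callerph), the same condition would pick rows scattered through the partition, so no single slice would cover them.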
There are certain restrictions on how primary key columns can be used in the WHERE clause: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/select_r.html
One solution that will work in your situation is to change the order of the clustering columns in the primary key:
CREATE TABLE sr_number_callrecord (
id int,
callerph text,
sr_number text,
callid text,
start_time text,
plan_id int,
PRIMARY KEY((sr_number), callerph, start_time)
);
Now you can use a range query on the last column:
select * from sr_number_callrecord where sr_number = '1234' and callerph = '+91123' and start_time >= '1234';
I'm currently trying to model a column family that has two timestamps specifying whether an entry is valid (or 'active') at a given date (typically execution time).
No big issue with traditional SQL, 64 gigs of RAM and some indices, we're doing that quite often with our SQL server.
However, in CQL I haven't managed to model this scenario and write valid queries for it.
My basic model is (I skipped the PK definition!)
create table myTable(
id uuid,
validFrom timeuuid,
validTo timeuuid,
someInformationalData varChar
);
Some explanations:
Because a validity date is not unique, I need a combined key. In my final application this is going to be a usergroup reference (which would be an ideal partition key).
validFrom/validTo are designed to be optional, but I could deal with that by using boundary values (1970, 2038) for 'null' values passed through the persistence layer.
I tried various combinations of partitioning/clustering keys, but none of them resulted in valid CQL.
-- only active results
select *
from
myTable
where
validFrom < now()
and
validTo > now()
I'm quite new to the NoSQL/CQL world and am struggling a bit with converting some of our applications. I could do it in memory, but I'm afraid this could become a bottleneck at some point...
Not sure if this kind of 'I have no idea what I'm doing' yell is appropriate, but any kind of help would be appreciated. :)
edit Here's one of the approaches I've been messing around with
drop table if exists myTable;
create table myTable(
id int,
datefrom timeuuid,
dateto timeuuid,
someColumns varChar,
primary key((id,datefrom),dateto)
);
create index if not exists my_idx on myTable(datefrom);
insert into myTable(id, datefrom,dateto,somecolumns)
values(0,minTimeuuid('1970-01-01 00:00:00'),minTimeuuid('2020-01-01 00:00:00'),'test');
insert into myTable(id,datefrom,dateto,somecolumns)
values(1,minTimeuuid('1970-01-01 00:00:00'),minTimeuuid('2012-01-01 00:00:00'),'test2');
select * from myTable where dateto > now() allow filtering;
-- invalid ("A column of a partition key can be restricted only if the preceding one is restricted by an Equal relation.")
select * from myTable where datefrom < now() and dateto > now() allow filtering;
The first query is limiting my result, the row with 'validTo=2012-01-01' is filtered, but I wasn't able to work out a scheme that worked on both limitations in the where clause.
If I understand your problem, what you are looking for is a way to run a range query based on the timestamp. Basically to be able to do this, your model will have to have the timestamp component as part of the clustering key:
create table myTable(
eventType uuid,
ts timestamp,
val text,
PRIMARY KEY (eventType, ts)
);
The above will allow you to run a query like: SELECT eventType, val from myTable where eventType = 'your_event' and ts >= 'start_ts' and ts < 'end_ts'.
What you need to remember is that the clustering keys dictate the order on disk, which makes it possible to run queries like the one above efficiently. You can read more details about this in the CQL spec SELECT section.
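Applied to the validity window from the question, one possible approach can be sketched in Python. This is my assumption about how to combine the two conditions, not something the answer states: use the clustering order for the range on the start date, then check the end date row by row on the narrowed slice. The first two rows mirror the inserts in the question; the third row and the name active_at are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# One partition: (valid_from, valid_to, data), sorted by valid_from (clustering key)
partition = sorted([
    (datetime(1970, 1, 1), datetime(2020, 1, 1), "test"),
    (datetime(1970, 1, 1), datetime(2012, 1, 1), "test2"),
    (datetime(2015, 6, 1), datetime(2038, 1, 1), "test3"),  # hypothetical extra row
])
valid_from_keys = [row[0] for row in partition]


def active_at(moment):
    """Return the data of rows with valid_from <= moment < valid_to."""
    # Range part: rows with valid_from <= moment form a prefix of the partition.
    end = bisect_right(valid_from_keys, moment)
    # The second condition is checked row by row on the narrowed slice.
    return [row[2] for row in partition[:end] if row[1] > moment]


print(active_at(datetime(2014, 1, 1)))
```

Only the first condition benefits from the on-disk order here; the second is a plain filter, which stays manageable as long as a partition is reasonably small.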
There is no NOW() in Cassandra that works like the one in SQL databases: CQL's now() returns a timeuuid, not a timestamp. For a timestamp column it is simplest to pass today's date explicitly instead of now().
In the WHERE clause you can only use columns that are part of the primary key or that have a secondary index.
I have a column family with a composite key like this:
CREATE TABLE sometable(
keya varchar,
keyb varchar,
keyc varchar,
keyd varchar,
value int,
date timestamp,
PRIMARY KEY (keya,keyb,keyc,keyd,date)
);
What I need to do is to
SELECT * FROM sometable
WHERE
keya = 'abc' AND
keyb = 'def' AND
date < '2014-01-01'
And that is giving me this error
Bad Request: PRIMARY KEY part date cannot be restricted (preceding part keyd is either not restricted or by a non-EQ relation)
What's the best way to solve this? Do I need to alter my columnfamily?
I also need to query this table with all of keya, keyb, keyc, and date.
You cannot do this in Cassandra with your current schema. Such a range slice would also be costly: you are trying to slice by date across keyc and keyd, which come before date in the clustering order and are left unrestricted.
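A small Python sketch, using invented sample rows, shows what goes wrong. Within one partition the rows are sorted by (keyb, keyc, keyd, date), and the helper is_contiguous (a hypothetical name) checks whether the rows matching a condition form one unbroken run in that order.

```python
from itertools import groupby

# One partition (keya fixed), rows sorted by (keyb, keyc, keyd, date)
partition = sorted([
    ("def", "c1", "d1", "2013-05-01"),
    ("def", "c1", "d1", "2014-03-01"),
    ("def", "c2", "d1", "2013-07-01"),
    ("def", "c2", "d1", "2014-09-01"),
])


def is_contiguous(predicate):
    """True if the rows matching predicate form at most one unbroken run."""
    flags = [predicate(row) for row in partition]
    runs = [flag for flag, _ in groupby(flags)]
    return runs.count(True) <= 1


# keyb restricted, keyc and keyd skipped, range on date
skips_keyc_keyd = is_contiguous(lambda r: r[0] == "def" and r[3] < "2014-01-01")
# keyb, keyc and keyd all restricted by equality, range on date
full_prefix = is_contiguous(
    lambda r: r[:3] == ("def", "c1", "d1") and r[3] < "2014-01-01"
)
print(skips_keyc_keyd, full_prefix)
```

When keyc and keyd are skipped, the matching rows are interleaved with non-matching ones, so they cannot be read as a single slice.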
I also need to query this table with all of keya, keyb, keyc, and date.
If you want to solve this problem, consider a schema like the following. What I would suggest is to keep the keys in a separate table:
create table keys_by_id ( -- table name is a placeholder; the original omitted it
id timeuuid,
keyType text,
primary key (id, keyType));
Use the timeuuid to store the values and do a range scan based on that.
create table values_by_key ( -- table name is a placeholder; the original omitted it
prevTableId timeuuid,
value int,
date timestamp,
primary key (prevTableId, date));
I guess that this way your table is normalized for better scalability in your use case, and it may also save a lot of disk space if the keys are repetitive.
We are trying to store lots of attributes for a particular profile_id inside a table (using CQL3) and cannot wrap our heads around which approach is the best:
a. create table mytable (profile_id, a1 int, a2 int, a3 int, a4 int ... a3000 int) primary key (profile_id);
OR
b. create MANY tables, eg.
create table mytable_a1(profile_id, value int) primary key (profile_id);
create table mytable_a2(profile_id, value int) primary key (profile_id);
...
create table mytable_a3000(profile_id, value int) primary key (profile_id);
OR
c. create table mytable (profile_id, a_all text) primary key (profile_id);
and just store 3000 "columns" inside a_all, like:
insert into mytable (profile_id, a_all) values (1, "a1:1,a2:5,a3:55, .... a3000:5");
OR
d. none of the above
The type of query we would be running on this table:
select * from mytable where profile_id in (1,2,3,4,5423,44)
We tried the first approach and the queries keep timing out and sometimes even kill cassandra nodes.
The answer would be to use a clustering column. A clustering column allows you to create dynamic columns that you could use to hold the attribute name (col name) and its value (col value).
The table would be
create table mytable (
profile_id text,
attr_name text,
attr_value int,
PRIMARY KEY(profile_id, attr_name)
)
This allows you to insert rows like:
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a1', 3);
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a2', 1031);
.....
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'an', 2);
This would be the optimal solution.
Because you then want to do the following
'The type of query we would be running on this table: select * from mytable where profile_id in (1,2,3,4,5423,44)'
This would require 6 queries under the hood, but Cassandra should be able to do this in no time, especially if you have a multi-node cluster.
Also, if you use the DataStax Java Driver, you can run these requests asynchronously and concurrently on your cluster.
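The idea of one request per partition, issued concurrently, can be sketched with Python's asyncio. This is a stand-in only: TABLE is an in-memory dict with made-up values, and fetch_profile is a hypothetical placeholder for a real asynchronous driver call, not the DataStax API.

```python
import asyncio

# In-memory stand-in for mytable: profile_id -> {attr_name: attr_value}
TABLE = {
    "1": {"a1": 3, "a2": 1031},
    "2": {"a1": 7},
    "5423": {"a2": 5},
}


async def fetch_profile(profile_id):
    """Hypothetical single-partition read; a real driver call would go here."""
    await asyncio.sleep(0)  # yield control, as a network round trip would
    return profile_id, TABLE.get(profile_id, {})


async def fetch_profiles(profile_ids):
    # One request per partition, all issued concurrently.
    results = await asyncio.gather(*(fetch_profile(p) for p in profile_ids))
    return dict(results)


profiles = asyncio.run(fetch_profiles(["1", "2", "44"]))
print(profiles)
```

Each request hits a single partition, so a slow or missing profile does not hold up the reads of the others.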
For more on data modelling and the DataStax Java Driver, check out DataStax's free online training. It's worth a look.
http://www.datastax.com/what-we-offer/products-services/training/virtual-training
Hope it helps.