I have created a table with this schema:
CREATE TABLE iplocation (
"idIPLocation" uuid,
"fromIP" bigint,
"toIP" bigint,
"idCity" uuid,
"idCountry" uuid,
"idProvince" uuid,
"isActive" boolean,
PRIMARY KEY ("idIPLocation", "fromIP", "toIP")
);
and inserted some records into it.
Now I want to fetch a record like this:
select * from iplocation where "toIP" <= 3065377522 and "fromIP" >= 3065377522 ALLOW FILTERING;
but it's giving me this error:
A column of a clustering key can be restricted only if the preceding one is restricted by an Equal relation.
You need to restrict fromIP before restrict toIP.
But even if I only run
select * from iplocation where "toIP" <= 3065377522 ALLOW FILTERING;
It still says
A column of a clustering key can be restricted only if the preceding one is restricted by an Equal relation.
You need to restrict fromIP before restrict toIP.
I can't figure out what the problem is.
You are misusing the partition key concept. In your case the partition key is idIPLocation: Cassandra uses this key to determine which partition data is written to or read from. So in your SELECT statement you have to provide the partition key. Then you can narrow the data within that partition using the clustering columns fromIP and toIP.
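To make the routing idea concrete, here is a minimal Python sketch of how a partition key value can be hashed to a token and mapped onto a ring of nodes. It is a simplification under stated assumptions: the three-node ring and its token values are hypothetical, and the hash is MD5 in the spirit of the older RandomPartitioner (the default Murmur3Partitioner uses a different hash and token range).

```python
import hashlib
from bisect import bisect_left

# Hypothetical three-node ring: each node owns the tokens up to and including its own.
RING = [(2**125, "node-a"), (2**126, "node-b"), (2**127 - 1, "node-c")]

def token(partition_key: str) -> int:
    """MD5-based token, in the spirit of RandomPartitioner (range 0 .. 2**127 - 1)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 2**127

def owner(partition_key: str) -> str:
    """Return the node whose token range contains the key's token."""
    tokens = [tok for tok, _ in RING]
    i = bisect_left(tokens, token(partition_key))
    return RING[i % len(RING)][1]  # wrap around the ring

# A query that supplies the partition key can go straight to its owner;
# a query without it has no token to route on and must ask every node.
key = "76080f76-92f7-4d25-a531-a44c38ff38a7"
print(owner(key))
```

The point of the sketch is only that the partition key is the input to routing; without it, there is nothing to hash.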
You have four options:
1) Choose a better partition key: for example, you could use the clause PRIMARY KEY ("toIP"). But in your case I guess this solution won't work, because you want to query data by idIPLocation too. Also note that a partition key column only accepts = and IN, so this would let you look up an exact "toIP" value rather than a range.
2) Denormalize: add a new table with the same data structure but a different partition key, like so:
CREATE TABLE backup_advertyze.iplocation (
"idIPLocation" uuid,
"fromIP" bigint,
"toIP" bigint,
"idCity" uuid,
"idCountry" uuid,
"idProvince" uuid,
"isActive" boolean,
PRIMARY KEY ("idIPLocation", "fromIP", "toIP")
);
CREATE TABLE backup_advertyze.iplocationbytoip (
"idIPLocation" uuid,
"fromIP" bigint,
"toIP" bigint,
"idCity" uuid,
"idCountry" uuid,
"idProvince" uuid,
"isActive" boolean,
PRIMARY KEY ("toIP", "fromIP")
);
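With this structure you can query by "toIP" directly, for example select * from iplocationbytoip where "toIP" = 3065377522 and "fromIP" >= 3065377522; (a range such as "toIP" <= 3065377522 is still rejected, because a partition key column only accepts = and IN).
The drawback of this solution is that you have to keep duplicate data in sync across two tables.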
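In practice the application writes every row to both tables. The following Python sketch models the two tables as in-memory dictionaries keyed the way their primary keys are laid out. The function and variable names are hypothetical; in a real application you would issue the two INSERT statements through a driver (possibly grouped in a logged BATCH) instead.

```python
from collections import defaultdict
from uuid import uuid4

# iplocation: partition key "idIPLocation" -> {("fromIP", "toIP"): row}
iplocation = defaultdict(dict)
# iplocationbytoip: partition key "toIP" -> {"fromIP": row}
iplocation_by_toip = defaultdict(dict)

def insert_location(row: dict) -> None:
    """Hypothetical helper: write the same row into both query tables (denormalization)."""
    iplocation[row["idIPLocation"]][(row["fromIP"], row["toIP"])] = row
    iplocation_by_toip[row["toIP"]][row["fromIP"]] = row

def find_by_toip(to_ip: int, min_from_ip: int) -> list:
    """Equality on the partition key, range on the clustering column."""
    partition = iplocation_by_toip.get(to_ip, {})
    return [partition[f] for f in sorted(partition) if f >= min_from_ip]

row = {"idIPLocation": uuid4(), "fromIP": 3065377000, "toIP": 3065377522, "isActive": True}
insert_location(row)
print(len(find_by_toip(3065377522, 3065370000)))  # 1
```

Every write path must call the same helper; if one of the two writes is skipped, the tables silently diverge.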
3) Use a materialized view:
This is the same concept as 2), but you only maintain data in one table instead of two; Cassandra keeps the view in sync for you:
CREATE TABLE backup_advertyze.iplocation (
"idIPLocation" uuid,
"fromIP" bigint,
"toIP" bigint,
"idCity" uuid,
"idCountry" uuid,
"idProvince" uuid,
"isActive" boolean,
PRIMARY KEY ("idIPLocation", "fromIP", "toIP")
);
CREATE MATERIALIZED VIEW backup_advertyze.iplocationbytoip
AS
SELECT *
FROM backup_advertyze.iplocation
WHERE "idIPLocation" IS NOT NULL
AND "fromIP" IS NOT NULL
AND "toIP" IS NOT NULL
PRIMARY KEY ("toIP", "fromIP", "idIPLocation");
4) The simplest solution, which I don't recommend because of query performance issues, is to use a secondary index:
CREATE INDEX iplocationfromindex ON backup_advertyze.iplocation("fromIP");
Then you can run your query: select * from iplocation where "toIP" <= 3065377522 and "fromIP" >= 3065377522 ALLOW FILTERING;
Hope this helps.
First of all, the ALLOW FILTERING directive is horribly inefficient, and its use is considered an anti-pattern. If you find yourself having to use it to satisfy a query requirement, you should build a new table that better suits your query instead, perhaps one that makes better use of your partition keys for data retrieval.
select * from iplocation
where "toIP" <= 3065377522 and "fromIP" >= 3065377522 ALLOW FILTERING;
This fails because Cassandra only allows non-equality conditions (>, >=, <, <=) on a single clustering column, and it has to be the last one you restrict.
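The rule behind the error messages in the question can be written down as a small check. This Python sketch is a simplified model, not Cassandra's actual validation code, and it ignores ALLOW FILTERING (some later Cassandra versions can filter on clustering columns with it). It encodes two conditions: restricted clustering columns must form a prefix of the clustering order, and only the last restricted column may use a non-equality operator.

```python
from typing import Dict, List, Optional

def check_clustering_restrictions(clustering: List[str], restrictions: Dict[str, str]) -> Optional[str]:
    """Return None if the restrictions are accepted, otherwise an error message.

    clustering   -- clustering column names in declared order
    restrictions -- column name -> operator, e.g. {"fromIP": "=", "toIP": "<="}
    """
    non_eq_column = None
    for i, column in enumerate(clustering):
        if column not in restrictions:
            # Restricting any later column after a gap is rejected.
            later = [c for c in clustering[i + 1:] if c in restrictions]
            if later:
                return f"cannot restrict {later[0]}: preceding column {column} is not restricted"
            return None
        if non_eq_column is not None:
            # Only the last restricted column may use a range operator.
            return f"cannot restrict {column}: preceding column {non_eq_column} is restricted by a non-EQ relation"
        if restrictions[column] != "=":
            non_eq_column = column
    return None

cols = ["fromIP", "toIP"]
print(check_clustering_restrictions(cols, {"toIP": "<=", "fromIP": ">="}))
print(check_clustering_restrictions(cols, {"toIP": "<="}))
print(check_clustering_restrictions(cols, {"fromIP": "=", "toIP": "<="}))  # None
```

The first two calls correspond to the two failing queries in the question; the third shows the shape Cassandra accepts (equality on fromIP, then a range on toIP).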
select * from iplocation
where "toIP" <= 3065377522 ALLOW FILTERING;
This fails with the same error message, because "fromIP" precedes "toIP" in the clustering order and you are skipping it. In effect, you are trying to prevent Cassandra from doing what it does best, which is reading a single row or a contiguous range of ordered rows off the disk. Instead, you are asking it to perform random reads, and because no partition key is given, it would have to check every node in your cluster to satisfy this query. As Cassandra is designed to operate at large scale, that could add a lot of network time to your query, which is something it is trying to save you from.
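The "contiguous range" point can be illustrated with a sorted list standing in for one partition's rows, ordered by ("fromIP", "toIP"). This is a sketch with made-up rows: fixing fromIP by equality lets a binary search find one contiguous slice, while a condition on toIP alone matches rows scattered across the ordering, so every row would have to be read and filtered.

```python
from bisect import bisect_left, bisect_right

# Rows of one hypothetical partition, sorted by clustering order ("fromIP", "toIP").
rows = sorted([(100, 200), (100, 900), (300, 400), (500, 50_000)])

def eq_then_range(rows, from_ip, max_to_ip):
    """fromIP = from_ip AND toIP <= max_to_ip: one contiguous slice via binary search."""
    lo = bisect_left(rows, (from_ip,))             # first row with fromIP >= from_ip
    hi = bisect_right(rows, (from_ip, max_to_ip))  # just past the last matching row
    return rows[lo:hi]

def positions_matching_to_ip(rows, max_to_ip):
    """toIP <= max_to_ip alone: matches are not adjacent, so all rows must be scanned."""
    return [i for i, (_, to_ip) in enumerate(rows) if to_ip <= max_to_ip]

print(eq_then_range(rows, 100, 900))        # [(100, 200), (100, 900)]
print(positions_matching_to_ip(rows, 400))  # [0, 2]
```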
To solve this issue, I would rework the table with an appropriate partition key (as mentioned above), a single IP address column, and a from/to column, all as part of the key. It would look something like this:
CREATE TABLE iplocation (
idIPLocation uuid,
IP bigint,
fromTo text,
idCity uuid,
idCountry uuid,
idProvince uuid,
isActive boolean,
PRIMARY KEY (idIPLocation, IP, fromTo)
);
Now you essentially store each range twice: one row for the starting IP and one for the ending IP. The rows are differentiated by an F or T in the fromTo clustering column, which tells you which is the "From IP" and which is the "To IP."
aploetz@cqlsh:stackoverflow> SELECT * FROM iplocation
WHERE idiplocation=76080f76-92f7-4d25-a531-a44c38ff38a7
AND IP>=10000 AND IP<=3065377522;
idiplocation | ip | fromto | idcity | idcountry | idprovince | isactive
--------------------------------------+----------+--------+--------------------------------------+--------------------------------------+--------------------------------------+----------
76080f76-92f7-4d25-a531-a44c38ff38a7 | 10001 | F | 6921a08b-c156-428e-8d4f-b371ff13f073 | f33bd5ed-b9b3-419b-99ab-ac2a7c87ba55 | 5a13cfcc-382e-418a-aeae-309f43671336 | True
76080f76-92f7-4d25-a531-a44c38ff38a7 | 10480101 | T | 6921a08b-c156-428e-8d4f-b371ff13f073 | f33bd5ed-b9b3-419b-99ab-ac2a7c87ba55 | 5a13cfcc-382e-418a-aeae-309f43671336 | True
(2 rows)
This is similar to how I model problems where data points have a range of both a starting and ending time. While your end solution will probably be different, the modeling mechanism here is something that may work for you.
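The F/T layout from the answer above can be sketched in Python, assuming tuples for rows and a hypothetical helper range_to_rows. Each IP range becomes two rows in the same partition, sorted by (IP, fromTo), so a slice on IP returns the matching F and T rows in order, mirroring the cqlsh output shown.

```python
from bisect import bisect_left, bisect_right

def range_to_rows(from_ip: int, to_ip: int) -> list:
    """Hypothetical helper: store one IP range as a From row and a To row."""
    return [(from_ip, "F"), (to_ip, "T")]

# One partition (a single idIPLocation), kept sorted by the clustering key (IP, fromTo).
partition = sorted(range_to_rows(10001, 10480101))

def slice_ip(partition, low, high):
    """IP >= low AND IP <= high, as one contiguous slice of the sorted partition."""
    lo = bisect_left(partition, (low,))
    hi = bisect_right(partition, (high, "\uffff"))  # "\uffff" sorts after "F" and "T"
    return partition[lo:hi]

print(slice_ip(partition, 10000, 3065377522))  # [(10001, 'F'), (10480101, 'T')]
```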
Related
Hi, I am new to Cassandra.
We are working on an IoT project where car sensor data will be stored in Cassandra.
Here is an example of one table where I am going to store one type of sensor data.
This is some sample data.
I want to partition the data by organization_id so that each organization's data is stored in separate partitions.
Here is the create table command:
CREATE TABLE IF NOT EXISTS engine_speed (
id UUID,
engine_speed_rpm text,
position int,
vin_number text,
last_updated timestamp,
organization_id int,
odometer int,
PRIMARY KEY ((id, organization_id), vin_number)
);
This works fine. However, all my queries will look like the one below:
select * from engine_speed
where vin_number='xyz'
and organization_id = 1
and last_updated >='from time stamp' and last_updated <='to timestamp'
Almost all queries on all the tables will have a similar or identical WHERE clause.
I am getting an error, and it is asking me to add ALLOW FILTERING.
Kindly let me know how I should partition the table and define the right primary key and indexes so that I don't have to add ALLOW FILTERING to the query.
Apologies for this basic question, but I'm just starting with Cassandra (using Apache Cassandra 3.11.12).
The order of your WHERE clause must match the order of the partition and clustering keys defined in your DDL: you have to restrict every partition key column, and you cannot skip a clustering column and then restrict the next one. So, for the query pattern you have described, you can try the following DDL:
CREATE TABLE IF NOT EXISTS autonostix360.engine_speed (
vin_number text,
organization_id int,
last_updated timestamp,
id UUID,
engine_speed_rpm text,
position int,
odometer int,
PRIMARY KEY ((vin_number, organization_id), last_updated)
);
But remember,
PRIMARY KEY ((vin_number, organization_id), last_updated)
PRIMARY KEY ((vin_number), organization_id, last_updated)
The above two are different in Cassandra. In case 1, your data will be partitioned by the combination of vin_number and organization_id, while last_updated acts as the clustering (ordering) key. In case 2, your data will be partitioned only by vin_number, while organization_id and last_updated act as clustering keys. So you need to figure out which case suits your use case.
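To see the difference, here is a small Python sketch that groups some made-up rows into partitions under each of the two primary keys. The rows and the partition helper are hypothetical. With case 1, every (vin_number, organization_id) pair gets its own partition; with case 2, all organizations for one VIN share a partition, ordered by (organization_id, last_updated).

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample rows: (vin_number, organization_id, last_updated)
rows = [
    ("xyz", 1, datetime(2022, 1, 1, 10, 0)),
    ("xyz", 1, datetime(2022, 1, 1, 9, 0)),
    ("xyz", 2, datetime(2022, 1, 1, 8, 0)),
    ("abc", 1, datetime(2022, 1, 1, 7, 0)),
]

def partition(rows, partition_cols, clustering_cols):
    """Group rows by partition key tuple; sort each partition by its clustering columns."""
    parts = defaultdict(list)
    for vin, org, ts in rows:
        values = {"vin_number": vin, "organization_id": org, "last_updated": ts}
        pk = tuple(values[c] for c in partition_cols)
        parts[pk].append(tuple(values[c] for c in clustering_cols))
    return {pk: sorted(v) for pk, v in parts.items()}

case1 = partition(rows, ["vin_number", "organization_id"], ["last_updated"])
case2 = partition(rows, ["vin_number"], ["organization_id", "last_updated"])
print(len(case1), len(case2))  # 3 2
```

In case 1 a query must supply both vin_number and organization_id to find its partition; in case 2 vin_number alone is enough, and organization_id becomes an optional (but ordered) filter inside it.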
I am not able to perform a GROUP BY on a primary key column. I am using Cassandra 3.10. When I use GROUP BY, I get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Group by currently only support groups of columns following their declared order in the Primary Key"
My column is part of the primary key, yet I am still facing the problem.
My schema is
Table trends{
name text,
price int,
quantity int,
code text,
code_name text,
cluster_id text
uitime timeuuid,
primary key((name,price),code,uitime))
with clustering order by (code DESC, uitime DESC)
And the command that I run is: select sum(quantity) from trends group by code;
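For starters, the schema you posted is not valid CQL, so it may not be exactly what you are running. More importantly, GROUP BY only accepts primary key columns in their declared order, starting with the full partition key. With primary key((name,price),code,uitime), grouping by code alone skips the partition key columns name and price, which is what the error message is complaining about; GROUP BY name, price, code would be accepted.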
GROUP BY on primary key columns does work, though, when the columns follow the declared primary key order. For example, you can run:
> SELECT keyspace_name, sum(partitions_count) AS approx_partitions FROM system.size_estimates GROUP BY keyspace_name;
keyspace_name | approx_partitions
--------------------+-------------------
system_auth | 128
basic | 4936508
keyspace1 | 870
system_distributed | 0
system_traces | 0
where the schema is:
CREATE TABLE system.size_estimates (
keyspace_name text,
table_name text,
range_start text,
range_end text,
mean_partition_size bigint,
partitions_count bigint,
PRIMARY KEY ((keyspace_name), table_name, range_start, range_end)
) WITH CLUSTERING ORDER BY (table_name ASC, range_start ASC, range_end ASC)
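The GROUP BY restriction can be modeled as follows. This is a simplified Python sketch based on the rule described in the CQL documentation (GROUP BY accepts primary key columns in primary key order, grouping either at the partition key level or at a clustering column level); it is not Cassandra's actual validation code.

```python
def group_by_allowed(partition_key, clustering, group_cols):
    """True if group_cols is the full partition key followed by a prefix of the clustering columns."""
    n = len(partition_key)
    if list(group_cols[:n]) != list(partition_key):
        return False
    rest = list(group_cols[n:])
    return rest == list(clustering[:len(rest)])

# trends: PRIMARY KEY ((name, price), code, uitime)
print(group_by_allowed(["name", "price"], ["code", "uitime"], ["code"]))                   # False
print(group_by_allowed(["name", "price"], ["code", "uitime"], ["name", "price", "code"]))  # True
# system.size_estimates: PRIMARY KEY ((keyspace_name), table_name, range_start, range_end)
print(group_by_allowed(["keyspace_name"], ["table_name", "range_start", "range_end"], ["keyspace_name"]))  # True
```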
Perhaps the pseudo-schema you provided differs from the actual one. Can you add the output of DESCRIBE TABLE xxxxx to your question?
I have a table:
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set<text>, PRIMARY KEY (uuidHotel, uuidRoom));
I tried to select with IN:
select * from tabletest where uuidhotel = 'uuidHotel' and bookedtimestampset IN ('1460710800000');
and got:
'bookedtimestampset' (set<text>) cannot be restricted by a 'IN' relation
Can I select elements by IN Set filter?
Can I select elements by IN Set filter?
No, but you can put a secondary index on bookedtimestampset and use the CONTAINS operator:
aploetz@cqlsh:stackoverflow> CREATE INDEX timeset_idx ON tabletest(bookedtimestampset);
aploetz@cqlsh:stackoverflow> SELECT uuidhotel,uuidroom FROM tabletest
WHERE uuidhotel = 'uuidHotel1' and bookedtimestampset CONTAINS '1460710800000';
uuidhotel | uuidroom
------------+----------
uuidHotel1 | uuidroom1
(1 rows)
Normally I wouldn't recommend a secondary index, but as long as you are filtering by a partition key (uuidhotel) it should perform ok.
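The difference between IN and CONTAINS can be sketched in Python: CONTAINS tests whether one element is a member of the set stored in each row, whereas IN would compare the whole column value against a list. The rows below are made up, and the filter is restricted to one partition (uuidhotel), mirroring the query above.

```python
# Made-up rows of tabletest: (uuidhotel, uuidroom, bookedtimestampset)
rows = [
    ("uuidHotel1", "uuidroom1", {"1460710800000", "1460714400000"}),
    ("uuidHotel1", "uuidroom2", {"1460718000000"}),
    ("uuidHotel2", "uuidroom1", {"1460710800000"}),
]

def select_contains(rows, hotel, timestamp):
    """uuidhotel = hotel AND bookedtimestampset CONTAINS timestamp."""
    return [(h, r) for h, r, booked in rows if h == hotel and timestamp in booked]

print(select_contains(rows, "uuidHotel1", "1460710800000"))  # [('uuidHotel1', 'uuidroom1')]
```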
Can I select elements by IN Set filter?
You can't use an IN clause on bookedTimeStampSet: it is a collection column and not part of your primary key. It is very important to understand how strongly the data model influences query performance. Of course, you can add a secondary index on bookedtimestampset, but in that case be ready for performance degradation.
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text, uuidRoom text, uuidGuest text, bookedTimeStampSet set<text>, PRIMARY KEY (uuidHotel, uuidRoom));
Your compound primary key consists of one partition key, uuidHotel, and one clustering key, uuidRoom. This means that all rooms of a given hotel are physically stored together in one partition, sorted by uuidRoom, so retrieving those rows is very efficient. bookedTimeStampSet is a regular column: it plays no part in locating or ordering rows, so it is just impossible to restrict on it without a secondary index.
Consequently, I would recommend designing your primary key around your future queries, even if you need to duplicate some data, which is common practice for NoSQL databases such as Cassandra.
For example:
CREATE TABLE IF NOT EXISTS tabletest (uuid text, uuidHotel text,
uuidRoom text, uuidGuest text, bookedTimeStamp timestamp, PRIMARY KEY
(uuidHotel, bookedTimeStamp, uuidRoom));
This allows you to make a query like:
select * from tabletest where uuidhotel = 'uuidHotel' and
bookedtimestamp > 1460710800000 and bookedtimestamp < 1460710900000;
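Moving from a set column to one row per booking is the data duplication mentioned above. This is a hypothetical Python sketch of that transformation (the explode and booked_between helpers and the sample rows are illustrative): each element of the old set becomes its own clustering-key entry, so a time range within one hotel becomes a contiguous slice. Timestamps are kept as epoch milliseconds for simplicity.

```python
from bisect import bisect_left, bisect_right

# Made-up rows in the old shape: (uuidHotel, uuidRoom, bookedTimeStampSet)
old_rows = [
    ("uuidHotel", "room1", {1460710800000, 1460797200000}),
    ("uuidHotel", "room2", {1460710850000}),
]

def explode(old_rows):
    """One row per booking, ordered like PRIMARY KEY (uuidHotel, bookedTimeStamp, uuidRoom)."""
    return sorted((hotel, ts, room) for hotel, room, booked in old_rows for ts in booked)

def booked_between(rows, hotel, start, end):
    """uuidHotel = hotel AND bookedTimeStamp > start AND bookedTimeStamp < end."""
    lo = bisect_right(rows, (hotel, start, "\uffff"))  # skip rows with ts == start
    hi = bisect_left(rows, (hotel, end))               # stop before rows with ts >= end
    return rows[lo:hi]

rows = explode(old_rows)
print(booked_between(rows, "uuidHotel", 1460710800000, 1460710900000))
# [('uuidHotel', 1460710850000, 'room2')]
```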
I have a cassandra table defined like this:
CREATE TABLE test.test(
id text,
time bigint,
tag text,
mstatus boolean,
lonumb int,
PRIMARY KEY (id, time, tag)
)
And I want to select rows by filtering on a single column.
I tried:
select * from test where lonumb = 4231;
It gives:
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"
Also I cannot do
select * from test where mstatus = true;
Doesn't Cassandra support WHERE as part of CQL? How can I correct this?
You can only use WHERE on the indexed or primary key columns. To correct your issue you will need to create an index.
CREATE INDEX iname
ON keyspacename.tablename(columname)
But you have to keep in mind that this query will have to run against all nodes in the cluster.
Alternatively, you might rethink your table structure if lonumb is the column you'll run most of your queries on.
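To see why an indexed query "has to run against all nodes", here is a simplified Python model. Cassandra's native secondary indexes are local: each node indexes only the rows it stores. The three-node layout and the round-robin placement below are hypothetical (real placement is by token); the point is that a lookup by lonumb alone cannot know which node holds matches, so it asks all of them.

```python
from collections import defaultdict

class Node:
    """One hypothetical node holding some rows plus a local index on lonumb."""
    def __init__(self, name):
        self.name = name
        self.local_index = defaultdict(list)  # lonumb -> rows stored on this node

    def store(self, row):
        self.local_index[row["lonumb"]].append(row)

nodes = [Node("node-a"), Node("node-b"), Node("node-c")]
rows = [{"id": f"id{i}", "time": i, "tag": "t", "lonumb": 4231 if i % 2 else 7} for i in range(6)]
for i, row in enumerate(rows):
    nodes[i % len(nodes)].store(row)  # placement simplified to round-robin

def query_by_lonumb(value):
    """No partition key, so every node's local index must be consulted."""
    contacted, results = [], []
    for node in nodes:
        contacted.append(node.name)
        results.extend(node.local_index.get(value, []))
    return contacted, results

contacted, results = query_by_lonumb(4231)
print(len(contacted), len(results))  # 3 3
```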
Jny is correct in that WHERE is only valid on columns in the PRIMARY KEY, or on columns that have a secondary index. One way to solve this issue is to create a specific query table for lonumb queries.
CREATE TABLE test.testbylonumb(
id text,
time bigint,
tag text,
mstatus boolean,
lonumb int,
PRIMARY KEY (lonumb, time, id)
)
Now, this query will work:
select * from testbylonumb where lonumb = 4231;
It will return all CQL rows where lonumb = 4231, sorted by time. I put id in the PRIMARY KEY to ensure uniqueness.
select * from test where mstatus = true;
This one is trickier. Indexes and keys on low-cardinality columns (like booleans) are generally considered a bad idea. See if there's another way you could model that. Otherwise, you could experiment with a secondary index on mstatus, but only use it when you specify a partition key (lonumb in this case), like this:
select * from testbylonumb where lonumb = 4231 AND mstatus = true;
Maybe that wouldn't perform too badly, as you are restricting it to a specific partition. But I definitely wouldn't ever run a SELECT * that filters on mstatus alone.
I'm working with Cassandra, trying to learn how it works. I encountered something strange while using the IN operator. Example:
Table:
CREATE TABLE test_time (
name text,
age int,
time timeuuid,
"timestamp" timestamp,
PRIMARY KEY ((name, age), time)
)
I inserted some dummy data and used the IN operator as follows:
SELECT * from test_time
where name='9' and age=81
and time IN (c7c88000-190e-11e4-8000-000000000000, c7c88000-190e-11e4-7000-000000000000);
It worked properly.
Then I added a column of type map. The table now looks like this:
CREATE TABLE test_time (
name text,
age int,
time timeuuid,
name_age map<text, int>,
"timestamp" timestamp,
PRIMARY KEY ((name, age), time)
)
When executing the same query, I got the following error:
Bad Request: Cannot restrict PRIMARY KEY part time by IN relation as a collection is selected by the query
From the above examples, it seems the IN operator doesn't work if there is any column of collection type (map or list) in the table.
I don't understand why it behaves like this. Please let me know if I'm missing anything here. Thanks in advance.
Yup...that is a limitation. You can do the following:
select * from ...where name='9' and age=81 and time > x and time < y
select [columns except collection] from ...where name='9' and age=81 and time in (...)
You can then filter client side, or do another query.
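The first workaround, filtering client side, could look like this in Python. The rows stand in for a hypothetical result of the range query (name='9' AND age=81 AND time > x AND time < y, including the map column), and the wanted set plays the role of the IN list; filter_in is an illustrative helper name.

```python
from uuid import UUID

# Hypothetical result of: SELECT * ... WHERE name='9' AND age=81 AND time > x AND time < y
range_rows = [
    {"time": UUID("c7c88000-190e-11e4-8000-000000000000"), "name_age": {"a": 1}},
    {"time": UUID("c7c88000-190e-11e4-7000-000000000000"), "name_age": {"b": 2}},
    {"time": UUID("d1d2e000-190e-11e4-8000-000000000000"), "name_age": {}},
]

# The values that would have gone into the IN (...) list.
wanted = {
    UUID("c7c88000-190e-11e4-8000-000000000000"),
    UUID("c7c88000-190e-11e4-7000-000000000000"),
}

def filter_in(rows, wanted):
    """Client-side equivalent of 'time IN (...)' applied to an already-fetched range."""
    return [row for row in rows if row["time"] in wanted]

print(len(filter_in(range_rows, wanted)))  # 2
```

The trade-off is that the range query may transfer rows the client then discards, so x and y should bracket the wanted values as tightly as possible.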
You can either make time part of the partition key in the primary key:
CREATE TABLE test_time (
name text,
age int,
time timeuuid,
"timestamp" timestamp,
PRIMARY KEY ((name, time), age)
);
or create a separate Materialized View to satisfy your query requirements:
CREATE MATERIALIZED VIEW test_time_mv AS
SELECT * FROM test_time
WHERE name IS NOT NULL AND time IS NOT NULL AND age IS NOT NULL
PRIMARY KEY ((name, time), age);
Now use the Materialized View in your query instead of the base table:
SELECT * from test_time_mv
where name='9'
and age=81
and time IN (c7c88000-190e-11e4-8000-000000000000,
c7c88000-190e-11e4-7000-000000000000);