Cassandra (Amazon Keyspaces) query error on clustering columns - cassandra

I am trying to execute a query on clustering columns in Amazon Keyspaces. Since I don't want to use ALLOW FILTERING with my native query, I have created 4-5 clustering columns for better performance.
But when I filter with >= and <= on 2 clustering columns, I get an error with the message below:
message="Clustering column "start_date" cannot be restricted (preceding column "segment_id" is restricted by a non-EQ relation)"
I also tried a multi-column relation, but I get a not-supported error:
message="MultiColumn relation is not yet supported."
Queries for reference:
select * from table_name where shard_id = 568 and division = '10' and customer_id = 568113 and (segment_id, start_date,end_date)>= (-1, '2022-05-16','2017-03-28') and flag = 1;
or
select * from table_name where shard_id = 568 and division = '10' and customer_id = 568113 and segment_id > -1 and start_date >='2022-05-16';

I am assuming that your table has the following primary key:
CREATE TABLE table_name (
...
PRIMARY KEY(shard_id, division, customer_id, segment_id, start_date, end_date)
)
In any case, your CQL query is invalid because you can only apply an inequality operator on the last clustering column in your query. For example, these are valid queries based on your table schema:
SELECT * FROM table_name
WHERE shard_id = ? AND division = ?
AND customer_id <= ?
SELECT * FROM table_name
WHERE shard_id = ? AND division = ?
AND customer_id = ? AND segment_id > ?
SELECT * FROM table_name
WHERE shard_id = ? AND division = ?
AND customer_id = ? AND segment_id = ? AND start_date >= ?
All preceding columns must be filtered by an equality operator except for the very last clustering column in your query.
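To make the rule concrete, here is a minimal Python sketch of a checker that walks the clustering columns in key order and reports the first restriction that breaks it. This is a simplified model for illustration only, not Cassandra's actual validation logic: the function name and the dict format are hypothetical, and operators such as IN are ignored. The column list follows the primary key assumed above, with shard_id as the partition key.

```python
def first_invalid_restriction(clustering_columns, restrictions):
    """Return the first clustering column whose restriction is not allowed,
    or None if every restriction follows the rule.

    clustering_columns: column names in PRIMARY KEY order (hypothetical input).
    restrictions: dict mapping column name -> operator ("=", ">", ">=", "<", "<=").
    """
    seen_gap = False    # an earlier clustering column has no restriction
    seen_range = False  # an earlier clustering column uses a non-EQ operator
    for column in clustering_columns:
        operator = restrictions.get(column)
        if operator is None:
            seen_gap = True
            continue
        if seen_gap or seen_range:
            # Nothing may be restricted after a gap or after a range restriction.
            return column
        if operator != "=":
            seen_range = True
    return None


# Clustering columns from the assumed primary key (shard_id is the partition key).
CLUSTERING_COLUMNS = ["division", "customer_id", "segment_id", "start_date", "end_date"]

# The second query from the question: a range on segment_id followed by start_date.
question_query = {"division": "=", "customer_id": "=", "segment_id": ">", "start_date": ">="}
print(first_invalid_restriction(CLUSTERING_COLUMNS, question_query))  # prints: start_date
```

The column it flags, start_date, is the same one named in the error message from the question.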
If you require a complex predicate for your queries, you will need to index your Cassandra data with tools such as Elasticsearch or Apache Solr. They will allow you to run complex search parameters to retrieve data from your database. Cheers!

ALLOW FILTERING gets a bad rap sometimes. It all depends on how many rows you end up scanning. It's good to understand how many rows per partition will be scanned and work backwards from there. Only the last restricted clustering column can carry an inequality to bound a range. Try to order your columns so that the restrictions eliminate the most rows first, which reduces the number of rows that have to be filtered.
In the example below, the key is used up to start_date, and the query filters on end_date, segment_id, and flag.
select * from table_name where shard_id = 568 and division = '10' and customer_id = 568113 and start_date >= '2022-05-16' and end_date > '2017-03-28' and segment_id > -1 and flag = 1 ALLOW FILTERING;

Related

How do I select all rows from two clustering columns in cassandra database

I have a Partition key: A
Clustering columns: B, C
I do understand I can query like this
Select * from table where A = ?
Select * from table where A = ? and B = ?
Select * from table where A = ? and B = ? and C = ?
Now I have a scenario where I need to fetch results based only on B and C. Is there a way to do this without using ALLOW FILTERING?
You cannot query on 'B' and 'C' (the clustering columns) alone, without the partition key, unless you use ALLOW FILTERING. You can, however, use Spark with the spark-cassandra-connector to filter the results on 'B' and 'C'. Behind the scenes it also uses ALLOW FILTERING, but it has an efficient mechanism to scan the table the right way.

how to check all null values in a particular column in cassandra table?

Using the query below, I am not getting any records:
SELECT * FROM test_table where prty_ad_line3_tx = '' and prty_role_type_cd='Co-Borrower' and prty_role_sq_nb =5 limit 10 ALLOW FILTERING;
Any help would be appreciated.

How do I select all rows for a clustering column in cassandra?

I have a Partition key: A
Clustering columns: B, C
I do understand I can query like this
Select * from table where A = ?
Select * from table where A = ? and B = ?
Select * from table where A = ? and B = ? and C = ?
In certain cases, I want the B value to be any value in that column.
Is there a way I can query like the following?
Select * from table where A = ? and B = 'any value' and C = ?
Option 1:
In Cassandra, you should design your data model to suit your queries. Therefore the proper way to support your fourth query (queries by A and C, but not necessarily knowing B value), is to create a new table to handle that specific query. This table will be pretty much the same, except the CLUSTERING COLUMNS will be in slightly different order:
PRIMARY KEY (A, C, B)
Now this query will work:
Select * from table where A = ? and C = ?
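As a rough illustration of why the reordered key helps, the sketch below models a single partition (fixed A) as a sorted Python list. It is only an analogy under stated assumptions: the sample rows, the tuple layout, and both function names are hypothetical, and real Cassandra storage is more involved. With rows sorted by (C, B), all rows for a given C sit next to each other and can be located with a binary search. With rows sorted by (B, C), the whole partition has to be scanned.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows of one partition (same A), stored as (B, C, payload) tuples.
rows = [("b1", "c1", 1), ("b1", "c2", 2), ("b2", "c1", 3), ("b3", "c2", 4)]

# PRIMARY KEY (A, B, C): rows are ordered by B first, so matches for one C are scattered.
by_b_then_c = sorted(rows, key=lambda r: (r[0], r[1]))

# PRIMARY KEY (A, C, B): rows are ordered by C first, so matches for one C are contiguous.
by_c_then_b = sorted(rows, key=lambda r: (r[1], r[0]))


def scan_for_c(partition, c):
    """Check every row in the partition, similar in spirit to ALLOW FILTERING."""
    return [row for row in partition if row[1] == c]


def slice_for_c(partition_sorted_by_c, c):
    """Binary-search the contiguous run of rows where C == c."""
    c_values = [row[1] for row in partition_sorted_by_c]
    start = bisect_left(c_values, c)
    end = bisect_right(c_values, c)
    return partition_sorted_by_c[start:end]


print(slice_for_c(by_c_then_b, "c1"))  # prints: [('b1', 'c1', 1), ('b2', 'c1', 3)]
```

Both functions return the same rows. The difference is how much of the partition each one has to touch to find them.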
Option 2:
Alternatively you can create a materialized view, with a different clustering order. Now Cassandra will keep the MV in sync with your table data.
create materialized view mv_acbd as
select A, B, C, D
from TABLE1
where A is not null and B is not null and C is not null
primary key (A, C, B);
Now the query against this MV will work like a charm
Select * from mv_acbd where A = ? and C = ?
Option 3:
Not the best, but you could use the following query with your table as it is
Select * from table where A = ? and C = ? ALLOW FILTERING
Relying on ALLOW FILTERING is never a good idea, and is certainly not something that you should do in a production cluster. For this particular case, the scan stays within a single partition, so performance will vary with how many rows each partition holds in your use case.

Cassandra cql: select N “most recent” rows in ascending order

I understand that the best way to fetch the most recent rows in Cassandra is to create my table as following:
CREATE TABLE IF NOT EXISTS data1(
asset_id int,
date timestamp,
value decimal,
PRIMARY KEY ((asset_id), date)
) WITH CLUSTERING ORDER BY (date desc);
Then select 1000 recent data items via:
select * from data1 where asset_id = 8 limit 1000;
The client requires the data in ascending order.
Server side is python.
Is there a way to reverse the results in CQL and not in code (i.e. python)?
Have you tried using the ORDER BY clause?
select * from data1 where asset_id = 8 ORDER BY date asc limit 1000;
More information available here:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/useColumnsSort.html
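One caveat worth checking: LIMIT is applied to the rows in the requested order, so an ascending ORDER BY with limit 1000 returns the 1000 oldest rows for the asset, not the 1000 most recent. If the goal is the most recent rows presented oldest-first, a small client-side reversal of the descending result is a possible fallback. A minimal Python sketch, assuming rows_desc is whatever newest-first iterable the driver returns (the function name is hypothetical):

```python
from itertools import islice


def most_recent_ascending(rows_desc, n):
    """Take the first n rows of a newest-first iterable and return them oldest-first."""
    newest = list(islice(rows_desc, n))  # the n most recent rows, newest first
    newest.reverse()                     # flip in place so the oldest of those comes first
    return newest


# Stand-in data: integers play the role of rows ordered by date descending.
print(most_recent_ascending([5, 4, 3, 2, 1], 3))  # prints: [3, 4, 5]
```

Reversing a list of 1000 rows in memory is cheap compared with the query itself, and the original descending query stays unchanged.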

CQLSH - Check for null in where clause for MAP Data type

CASSANDRA Version : 2.1.10
CREATE TABLE customer_raw_data (
id uuid,
hash_prefix bigint,
profile_data map<varchar,varchar>,
PRIMARY KEY (hash_prefix,id));
I have an index on profile_data and I have rows where profile_data is null.
How do I write a select query to retrieve the rows where profile_data is null?
I tried the following:
select count(*) from customer_raw_data where profile_data=null;
select count(*) from customer_raw_data where profile_data CONTAINS KEY null;
With reference to https://issues.apache.org/jira/browse/CASSANDRA-3783:
There is currently no select support for indexed nulls, and given the design of Cassandra, it is considered a difficult/prohibitive problem.
Basic problem: the column in the WHERE condition has to be either part of the primary key or covered by a secondary index, so set up your column whichever way is suitable and then try the query below.
select count(*) from customer_raw_data where profile_data='';
SELECT * FROM TableName WHERE colName > 5000 ALLOW FILTERING; // works fine
SELECT * FROM TableName WHERE colName > 5000 limit 10 ALLOW FILTERING;
https://cassandra.apache.org/doc/old/CQL-3.0.html
Check the "ALLOW FILTERING" Part.
