Cassandra query on Map - Contains Clause [duplicate]

This question already has answers here:
SELECT Specific Value from map
(3 answers)
Closed 7 years ago.
I am trying to query a table that contains a map column. Is it possible to apply a CONTAINS clause to a map column?
CREATE TABLE data.Table1 (
fetchDataMap map<text, frozen<Config>>,
userId text ,
PRIMARY KEY(userId)
);
I get the following error:
cqlsh> SELECT * FROM data.Table1 WHERE fetchDataMap CONTAINS '233322554843924';
InvalidRequest: code=2200 [Invalid query] message="No secondary indexes on
the restricted columns support the provided operators: "
Please suggest a better query approach for this requirement.

For this to work, you have to create a secondary index on the map. But, you first have to ask yourself if you want to index your map keys or values (cannot do both). Given your CQL statement, I'll assume that you want to index your map key (and we'll go from there).
CREATE INDEX table1_fetchMapKey ON table1(KEYS(fetchDataMap));
After inserting some data (making a guess as to what your Config UDT looks like), I can SELECT with a slightly modified version of your CQL query above:
aploetz@cqlsh:stackoverflow> SELECT * FROM table1 WHERE
fetchDataMap CONTAINS KEY '233322554843924';

 userid | fetchdatamap
--------+------------------------------------------------------------
 B26354 | {'233322554843924': {key: 'location', value: '~/scripts'}}

(1 rows)
Note that I cannot in good conscience provide you with this solution, without passing along a link to the DataStax doc When To Use An Index. Secondary indexes are known to not perform well. So I can only imagine that a secondary index on a collection would perform worse, but I suppose that really depends on the relative cardinality. If it were me, I would re-model my table to avoid using a secondary index, if at all possible.
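To make the difference between CONTAINS and CONTAINS KEY concrete, here is a minimal pure-Python sketch. It does not talk to Cassandra; it only models one row's map column as a dict, and the helper names contains_key and contains are hypothetical. On a map, CONTAINS tests the values and CONTAINS KEY tests the keys, which is why the original query needed to be modified.

```python
# Toy model of one row's map column: map key -> UDT value (shown as a dict).
# The Config fields (key, value) mirror the guess made in the answer above.
fetch_data_map = {
    "233322554843924": {"key": "location", "value": "~/scripts"},
}


def contains_key(column, wanted):
    """Rough analogue of CONTAINS KEY: test the map's keys."""
    return wanted in column


def contains(column, wanted):
    """Rough analogue of CONTAINS: test the map's values."""
    return wanted in column.values()


print(contains_key(fetch_data_map, "233322554843924"))  # True
print(contains(fetch_data_map, "233322554843924"))      # False
```

The ID is a map key, not a map value, so only the CONTAINS KEY analogue matches.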

Related

IN operator for non prime attributes

Can I use the IN operator for non-primary-key attributes in Cassandra? If not, is there an alternative to using IN in the query?
SELECT * FROM abc WHERE domain IN ('domain1','domain2') allow filtering;
Error from server: code=2200 [Invalid query] message="IN predicates on non-primary-key columns (domain) is not yet supported"

Can you recommend an alternative to the IN operator in Cassandra for my purpose?
Remember, it is still possible to use the IN operator on a partition key. I wouldn't say it's recommended, but it should be fine with a small number of parameters. First though, you'd have to rebuild your table, or build a new table to support that query.
As I don't know your exact table definition, I'm going to make some assumptions (like abc_id).
If you recreated your table similar to this:
CREATE TABLE abc_by_domain (
domain TEXT,
abc_id TEXT,
value TEXT,
PRIMARY KEY (domain,abc_id));
Now write some data to it, and then this works:
SELECT * FROM abc_by_domain WHERE domain IN ('domain1','domain2');
 domain  | abc_id | value
---------+--------+---------
 domain1 |      1 | 1st row
 domain2 |      2 | 2nd row

(2 rows)
Notes:
I assumed a current, single primary key of abc_id. Essentially, I made that a clustering key to ensure that the underlying rows, now partitioned by domain, would still be unique. In your case, take whatever key column enforces uniqueness in the abc table, and use that as a clustering key to accomplish the same thing.
As per my warning above, this is known as a "multi-key query", which is an anti-pattern in Cassandra. The problem is that Cassandra cannot guarantee that data on two partitions will be on the same node, so it essentially picks a coordinator and runs two queries behind the scenes. For two parameters, it's probably not too bad, but I would try to keep the number of parameters in single digits.
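The "two queries behind the scenes" point can be pictured with a small pure-Python model. This is only a sketch under simplifying assumptions: the table is a dict keyed by partition key, the function select_in is hypothetical, and the domain3 row is made-up data that is not in the example above. Real coordinators do considerably more work than this.

```python
# Toy model: partition key -> rows in that partition.
# In a real cluster each partition may live on a different node.
abc_by_domain = {
    "domain1": [{"domain": "domain1", "abc_id": "1", "value": "1st row"}],
    "domain2": [{"domain": "domain2", "abc_id": "2", "value": "2nd row"}],
    # Hypothetical extra partition, only here to show it is not touched.
    "domain3": [{"domain": "domain3", "abc_id": "3", "value": "3rd row"}],
}


def select_in(table, domains):
    """Simulate WHERE domain IN (...): one partition read per key, merged."""
    rows = []
    lookups = 0
    for domain in domains:
        lookups += 1  # one read per requested partition
        rows.extend(table.get(domain, []))
    return rows, lookups


rows, lookups = select_in(abc_by_domain, ["domain1", "domain2"])
print(lookups)  # 2
for row in rows:
    print(row["domain"], row["abc_id"], row["value"])
```

The number of partition reads grows with the number of values in the IN list, which is the reason to keep that list short.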

Filtering by `list<double>` column's element value range

I'd like to filter rows of the following table in Cassandra.
CREATE TABLE mids_test_db.defect_data (
wafer_id text,
defect_id text,
document_id text,
fields list<double>,
PRIMARY KEY (wafer_id, defect_id)
)
...
CREATE INDEX defect_data_fields_idx ON mids_test_db.defect_data (values(fields));
I first tried something like fields[0] > 0.5, but it failed.
cqlsh:mids_test_db> select fields from defect_data where wafer_id = 'MIDS_1_20170101_023000_30000_1548100671' and fields[0] > 0.5;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Indexes on list entries (fields[index] = value) are not currently supported."
After searching Google for a while, I get the feeling that this kind of job cannot easily be done in Cassandra. The data model is essentially a collection of field values. Mostly I want to query defect data by its field values, as above, which is quite important for my business.
What approach should I consider? Application-side filtering? Any hint or advice will be appreciated.

It's not possible to do this directly in Cassandra, but you have the following alternatives:
if your Cassandra is DataStax Enterprise, then you can use DSE Search;
you can add an additional table to perform lookup:
CREATE TABLE mids_test_db.defect_data_lookup (
wafer_id text,
defect_id text,
field double,
PRIMARY KEY (wafer_id, field, defect_id)
);
After that, you should be able to do a range scan inside the partition to fetch at least the defect_id values, and then fetch all field values via a second query.
Depending on your Cassandra version, you may be able to use a materialized view to maintain that lookup table for you.
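The range scan on the lookup table can be pictured with a pure-Python model. This is a sketch, not driver code: the partition is a list of (field, defect_id) tuples kept sorted to mimic clustering order, the function defects_with_field_above is hypothetical, and all sample values are invented.

```python
from bisect import bisect_right

# Toy model of defect_data_lookup: partition key (wafer_id) -> list of
# (field, defect_id) tuples kept sorted, mimicking clustering order.
# All values below are made up for illustration.
lookup = {
    "wafer_A": sorted([
        (0.2, "d1"),
        (0.7, "d2"),
        (0.9, "d3"),
        (0.7, "d4"),
    ]),
}


def defects_with_field_above(table, wafer_id, threshold):
    """Rough analogue of WHERE wafer_id = ? AND field > ?."""
    rows = table.get(wafer_id, [])
    fields = [field for field, _ in rows]
    # First position whose field value is strictly greater than threshold.
    start = bisect_right(fields, threshold)
    return [defect_id for _, defect_id in rows[start:]]


print(defects_with_field_above(lookup, "wafer_A", 0.5))  # ['d2', 'd4', 'd3']
```

The returned IDs would then drive the second query against defect_data. With this schema a defect appears once per matching field value, so the caller may want to deduplicate the IDs first.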

Cassandra insert value disappear

I want to use the Cassandra database system to create tables. The original data is in the picture.
So I create these tables and insert the value
Create table course(
Course_ID text PRIMARY KEY,
Course_Name text,
student_id text
);
However, when I try to select all the student IDs for one course (for example, American History), the query fails:
select * from course where Course_Name = 'Biology';
Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
Then, when I printed the whole table, I found that rows sharing some duplicate values were missing. Is it because my table design is wrong? How can I change it so that I can select all the student IDs for one course?
Thanks!!

The issue is that your query on the course table does not use the primary key. Unlike in relational databases, tables in Cassandra are designed around the queries you are going to execute. In this case, you can include the course name as part of a composite key:
Create table course(
Course_ID text,
Course_Name text,
student_id text,
PRIMARY KEY (Course_Name, Course_ID)
);
There are already answers explaining the difference between the kinds of keys, like this one. You may also want to read this article from DataStax.
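One likely explanation for the disappearing values, which the answer above does not spell out, is that a Cassandra INSERT is an upsert: a row whose primary key already exists replaces the earlier row. The pure-Python sketch below models that behaviour. The insert helper, the sample rows, and the key that includes student_id are all assumptions made for illustration, not part of the original schema.

```python
def insert(table, row, primary_key):
    """Toy upsert: a row whose primary key already exists is overwritten."""
    key = tuple(row[col] for col in primary_key)
    table[key] = row


# Hypothetical sample data: two students enrolled in the same course.
rows = [
    {"course_id": "C1", "course_name": "Biology", "student_id": "S1"},
    {"course_id": "C1", "course_name": "Biology", "student_id": "S2"},
]

by_course_id = {}           # PRIMARY KEY (course_id), as in the question
by_course_and_student = {}  # hypothetical key that also includes student_id
for row in rows:
    insert(by_course_id, row, ("course_id",))
    insert(by_course_and_student, row,
           ("course_name", "course_id", "student_id"))

print(len(by_course_id))           # 1
print(len(by_course_and_student))  # 2
```

Any two rows that share every primary key column collapse into one, so the key has to contain whatever makes a row unique.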

Why Secondary Index ( = ?) and Clustering Columns (order by) CANNOT be used together for CQL Query?

EDIT: a related jira ticket
A query in pattern select * from <table> where <partition_keys> = ? and <secondary_index_column> = ? order by <first_clustering_column> desc does not work, with error msg:
InvalidRequest: Error from server: code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."
Judging from the structure of the index table, the above query includes the partition key and the first two clustering columns of the index table. Also, note that without the ORDER BY clause, the result is sorted by the clustering column according to CLUSTERING ORDER.
Is there any way to make the query work? If not, why?

Data in Cassandra is naturally stored in the sort order of the clustering columns.
A secondary index in Cassandra is very different from the corresponding index in a relational database. It is local to each node, which means that its contents aren't known to the other nodes of the cluster, so sorting by this index is not possible. Also, within a node, the secondary index holds just pointers to the corresponding partition keys.
If you need sorting to be performed by Cassandra, make the columns you sort on clustering columns. Otherwise, you can sort the results in code after you retrieve them.
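Sorting in code after retrieval can be as small as the sketch below. It uses invented rows rather than a real driver result, and created_at is a hypothetical stand-in for the first clustering column.

```python
from operator import itemgetter

# Hypothetical rows, as they might come back from a query that filters on a
# secondary-index column; created_at stands in for the first clustering column.
rows = [
    {"id": "a", "created_at": 3},
    {"id": "b", "created_at": 1},
    {"id": "c", "created_at": 2},
]

# Client-side equivalent of ORDER BY created_at DESC.
rows_desc = sorted(rows, key=itemgetter("created_at"), reverse=True)
print([r["id"] for r in rows_desc])  # ['a', 'c', 'b']
```

This requires the whole result set to be in memory, so it is only reasonable when the query returns a bounded number of rows.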
Also, secondary indexes aren't ideal for Cassandra, and a better model is definitely to not have them in the first place, which saves some headaches later.

Why cassandra/cql restrict to use where clause on a column that not indexed?

I have a table as follows in Cassandra 2.0.8:
CREATE TABLE emp (
empid int,
deptid int,
first_name text,
last_name text,
PRIMARY KEY (empid, deptid)
)
When I try to search with "select * from emp where first_name='John';", the CQL shell says:
"Bad Request: No indexed columns present in by-columns clause with Equal operator"
I searched for the issue, and everywhere it says to add a secondary index on the column 'first_name'.
But I need to know the exact reason why that column needs to be indexed.
The only thing I can figure out is performance.
Are there any other reasons?

Cassandra does not support searching by an arbitrary column, because that would involve scanning all the rows, which is not supported.
The data is internally organised into something that one can compare to HashMap[X, SortedMap[Y, Z]]. The key of the outer map is a partition key value, and the key of the inner map is a kind of concatenation of all clustering column values and the name of a regular column.
Unless you have an index on a column, you need to provide a full (preferred) or partial path to the data you want to retrieve with the query. Therefore, you should design your schema so that queries contain the partition key value and some range on the clustering columns.
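The HashMap[X, SortedMap[Y, Z]] picture can be sketched in pure Python using the emp table from the question. This is a simplified model, not how Cassandra is implemented: the inner map is keyed by the clustering column (deptid) only, the function names are hypothetical, and the sample rows are invented.

```python
from bisect import bisect_left, bisect_right

# partition key (empid) -> list of (clustering key (deptid), row),
# kept sorted by deptid to play the role of the inner SortedMap.
emp = {
    1: [(10, {"first_name": "John", "last_name": "Doe"}),
        (20, {"first_name": "John", "last_name": "Doe"})],
    2: [(10, {"first_name": "Jane", "last_name": "Roe"})],
}


def by_partition_and_range(table, empid, dept_lo, dept_hi):
    """Cheap: one hash lookup, then a slice of the sorted inner entries."""
    entries = table.get(empid, [])
    depts = [deptid for deptid, _ in entries]
    start = bisect_left(depts, dept_lo)
    stop = bisect_right(depts, dept_hi)
    return entries[start:stop]


def by_first_name(table, first_name):
    """Expensive: no path to the data, so every partition is visited."""
    visited = 0
    hits = []
    for empid, entries in table.items():
        visited += 1
        for deptid, row in entries:
            if row["first_name"] == first_name:
                hits.append((empid, deptid))
    return hits, visited


print(by_partition_and_range(emp, 1, 10, 10))
print(by_first_name(emp, "John"))  # ([(1, 10), (1, 20)], 2)
```

The first function follows a path through the outer and inner maps. The second has to walk every partition, which is the full scan that Cassandra refuses to perform without an index.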
You may read about what is allowed and what is not here

Alternatively you can create an index in Cassandra, but that will hamper your write performance.
