CQLSH - Check for null in where clause for MAP Data type - cassandra

CASSANDRA Version : 2.1.10
CREATE TABLE customer_raw_data (
id uuid,
hash_prefix bigint,
profile_data map<varchar,varchar>,
PRIMARY KEY (hash_prefix,id));
I have an index on profile_data and I have rows where profile_data is null.
How do I write a select query to retrieve the rows where profile_data is null?
I tried the following:
select count(*) from customer_raw_data where profile_data=null;
select count(*) from customer_raw_data where profile_data CONTAINS KEY null;

With reference to https://issues.apache.org/jira/browse/CASSANDRA-3783:
There is currently no select support for indexed nulls and, given the design of Cassandra, this is considered a difficult/prohibitive problem.
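Since the server cannot filter on null, one workaround is to read the rows and count the nulls on the client. The following is only a minimal sketch under stated assumptions: the rows have already been fetched and are dict-like, the helper name count_null_profiles is hypothetical, and a null map is assumed to arrive as None or as an empty map (Cassandra is generally understood not to distinguish the two for collections).

```python
from typing import Any, Iterable, Mapping


def count_null_profiles(rows: Iterable[Mapping[str, Any]]) -> int:
    """Count rows whose profile_data map is missing (None) or empty."""
    # Assumption of this sketch: a null map and an empty map are treated
    # the same, so both are counted as "null".
    return sum(1 for row in rows if not row.get("profile_data"))


# Hypothetical rows, shaped like the result of
# SELECT id, profile_data FROM customer_raw_data WHERE hash_prefix = ...;
rows = [
    {"id": "a", "profile_data": {"name": "x"}},
    {"id": "b", "profile_data": None},
    {"id": "c", "profile_data": {}},
]
print(count_null_profiles(rows))
```

Reading one partition at a time (by hash_prefix) keeps each read bounded; scanning the whole table this way could be expensive on a large cluster.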

Basic problem: a column used in a where condition has to be either part of the primary key or covered by a secondary index, so set up your column whichever way is suitable and then try the query below.
select count(*) from customer_raw_data where profile_data='';

SELECT * FROM TableName WHERE colName > 5000 ALLOW FILTERING; // works fine
SELECT * FROM TableName WHERE colName > 5000 limit 10 ALLOW FILTERING;
https://cassandra.apache.org/doc/old/CQL-3.0.html
See the "ALLOW FILTERING" section.


SELECT with yb_hash_code() and DELETE in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
We have the schema below in YugabyteDB 2.8.3, using YSQL:
CREATE TABLE IF NOT EXISTS public.table1
(
customer_id uuid NOT NULL ,
item_id uuid NOT NULL ,
kind character varying(100) NOT NULL ,
details character varying(100) NOT NULL ,
created_date timestamp without time zone NOT NULL,
modified_date timestamp without time zone NOT NULL,
CONSTRAINT table1_pkey PRIMARY KEY (customer_id, kind, item_id)
);
CREATE UNIQUE INDEX IF NOT EXISTS unique_item_id ON table1(item_id);
CREATE UNIQUE INDEX IF NOT EXISTS unique_item ON table1(customer_id, kind) WHERE kind='NEW' OR kind='BACKUP';
CREATE TABLE IF NOT EXISTS public.item_data
(
item_id uuid NOT NULL,
id2 integer NOT NULL,
create_date timestamp without time zone NOT NULL,
modified_date timestamp without time zone NOT NULL,
CONSTRAINT item_data_pkey PRIMARY KEY (item_id, id2)
);
Goal:
Step 1) Select the item_ids from table1 WHERE modified_date < someDate
Step 2) DELETE FROM item_data WHERE item_id = any of the item_ids from step 1
Currently we use the query:
SELECT item_id FROM table1 WHERE modified_date < $1
Can yb_hash_code(item_id) be applied in this SELECT query to improve its performance, given that table1 is indexed on item_id?
Currently we perform:
DELETE FROM item_data x WHERE x.item_id IN the listOfItemIds (provided in Step 1 above).
With the given listOfItemIds, can we use yb_hash_code(item_id) to enhance the performance of the DELETE operation?
Yes, it should work out. Something like:
SELECT item_id FROM table1 WHERE yb_hash_code(customer_id, kind, item_id) <= 128 AND yb_hash_code(customer_id, kind, item_id) >= 0 AND modified_date < x;
While you can combine the SELECT + DELETE in 1 query (like a subselect), this is probably better because it will result in smaller transactions.
Also, no need to use yb_hash_code. The db should be able to find the correct rows since you’re sending the columns that are used for partitioning.
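The "smaller transactions" idea above can be sketched by splitting the item_ids from step 1 into batches and issuing one DELETE per batch. This is a sketch under assumptions, not the poster's actual code: delete_batches and the batch size are hypothetical, and the %s placeholder with a list parameter assumes a psycopg2-style driver that adapts a Python list to a PostgreSQL array.

```python
from typing import Iterator, List, Sequence, Tuple

# One parameterized statement, reused for every batch (placeholder style
# is an assumption: psycopg2-like, list adapted to an array).
DELETE_SQL = "DELETE FROM item_data WHERE item_id = ANY(%s::uuid[])"


def delete_batches(
    item_ids: Sequence[str], batch_size: int = 500
) -> Iterator[Tuple[str, Tuple[List[str]]]]:
    """Yield (sql, params) pairs, one per batch of item_ids."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(item_ids), batch_size):
        # Each slice becomes the single array parameter of one DELETE.
        yield DELETE_SQL, (list(item_ids[start:start + batch_size]),)


for sql, params in delete_batches(["id-1", "id-2", "id-3"], batch_size=2):
    print(sql, params)
```

Each pair is meant to be executed and committed on its own, so that no single transaction has to cover the whole list.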

how to check all null values in a particular column in cassandra table?

Using the query below, I am not getting any records:
SELECT * FROM test_table where prty_ad_line3_tx = '' and prty_role_type_cd='Co-Borrower' and prty_role_sq_nb =5 limit 10 ALLOW FILTERING;
Any help would be appreciated.

Cassandra does not support DELETE on indexed columns

Say I have a cassandra table xyz with the following schema :
create table xyz(
xyzid uuid,
name text,
fileid int,
sid int,
PRIMARY KEY(xyzid));
I create indexes on the columns fileid and sid:
CREATE INDEX file_index ON xyz (fileid);
CREATE INDEX sid_index ON xyz (sid);
I insert data:
INSERT INTO xyz (xyzid, name, fileid, sid) VALUES (now(), 'p120', 1, 100);
INSERT INTO xyz (xyzid, name, fileid, sid) VALUES (now(), 'p120', 1, 101);
INSERT INTO xyz (xyzid, name, fileid, sid) VALUES (now(), 'p122', 2, 101);
I want to delete data using the indexed columns:
DELETE FROM xyz WHERE fileid=1 AND sid=101;
Why do I get this error?
InvalidRequest: code=2200 [Invalid query] message="Non PRIMARY KEY fileid found in where clause"
Is it mandatory to specify the primary key in the where clause for delete queries?
Does Cassandra support deletes using secondary indexes?
What has to be done to delete data using secondary indexes?
Any suggestions would help.
I am using DataStax Community Cassandra 2.1.8, but I also want to know whether deleting by indexed columns is supported in DataStax Community Cassandra 3.2.1.
Thanks
Let me try and answer your questions in order:
1) Yes, if you are going to use a where clause in a CQL statement then the PARTITION KEY must be restricted with an equality operator in the where clause. Other than that, you are only allowed to filter on clustering columns specified in your primary key (unless you have a secondary index).
2) No it does not. See this post for some more information as it is essentially the same problem.
Why can cassandra "select" on secondary key, but not update using secondary key? (1.2.8+)
3) Why not add sid as a clustering column in your primary key? This would allow you to do the delete or query using both, as you have shown.
create table xyz(
xyzid uuid,
name text,
fileid int,
sid int,
PRIMARY KEY(xyzid, sid));
4) In general using secondary indexes is considered an anti-pattern (a bit less so with SASI indexes in C* 3.4) so my question is can you add these fields as clustering columns to your primary key? How are you querying these secondary indexes?
I suppose you can perform the delete in two steps:
1) Select data by secondary index and get the primary key column values (xyzid) from the query result.
2) Perform the delete by primary key values.
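The two-step delete could look roughly like the sketch below. It assumes a session object in the style of the DataStax Python driver (an execute(query, parameters) method with %s placeholders, and rows that expose columns as attributes); delete_by_index is a hypothetical helper name, and ALLOW FILTERING is included on the assumption that restricting two secondary indexes at once requires it.

```python
def delete_by_index(session, fileid, sid):
    """Delete xyz rows matching fileid and sid; return how many were deleted."""
    # Step 1: look up the primary key values through the secondary indexes.
    select_cql = (
        "SELECT xyzid FROM xyz WHERE fileid = %s AND sid = %s ALLOW FILTERING"
    )
    keys = [row.xyzid for row in session.execute(select_cql, (fileid, sid))]

    # Step 2: delete each row by its primary key.
    for key in keys:
        session.execute("DELETE FROM xyz WHERE xyzid = %s", (key,))
    return len(keys)
```

The two steps are not atomic: a row inserted between the SELECT and the DELETEs would not be removed by this call.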

cassandra error when using select and where in cql

I have a cassandra table defined like this:
CREATE TABLE test.test(
id text,
time bigint,
tag text,
mstatus boolean,
lonumb int,
PRIMARY KEY (id, time, tag)
)
And I want to select one column using select.
I tried:
select * from test where lonumb = 4231;
It gives:
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"
Also I cannot do
select * from test where mstatus = true;
Doesn't cassandra support where as a part of CQL? How to correct this?
You can only use WHERE on the indexed or primary key columns. To correct your issue you will need to create an index.
CREATE INDEX iname
ON keyspacename.tablename(columnname);
You can see more info here.
But you have to keep in mind that this query will have to run against all nodes in the cluster.
Alternatively you might rethink your table structure if the lonumb is something you'll do the most queries on.
Jny is correct in that WHERE is only valid on columns in the PRIMARY KEY, or on those for which a secondary index has been created. One way to solve this issue is to create a specific query table for lonumb queries.
CREATE TABLE test.testbylonumb(
id text,
time bigint,
tag text,
mstatus boolean,
lonumb int,
PRIMARY KEY (lonumb, time, id)
)
Now, this query will work:
select * from testbylonumb where lonumb = 4231;
It will return all CQL rows where lonumb = 4231, sorted by time. I put id on the PRIMARY KEY to ensure uniqueness.
select * from test where mstatus = true;
This one is trickier. Indexes and keys on low-cardinality columns (like booleans) are generally considered a bad idea. See if there's another way you could model that. Otherwise, you could experiment with a secondary index on mstatus, but only use it when you specify a partition key (lonumb in this case), like this:
select * from testbylonumb where lonumb = 4231 AND mstatus = true;
Maybe that wouldn't perform too badly, as you are restricting it to a specific partition. But I definitely wouldn't ever do a SELECT * on mstatus.
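A query table only helps if it is kept in step with the original table. The answer does not say how to do that; one common approach, assumed here, is for the application to write every row to both tables. The helper insert_statements below is hypothetical and only builds the statements; the %s placeholders assume a DataStax-Python-driver-style execute call.

```python
from typing import Any, List, Mapping, Tuple

COLUMNS = ("id", "time", "tag", "mstatus", "lonumb")
TABLES = ("test.test", "test.testbylonumb")


def insert_statements(row: Mapping[str, Any]) -> List[Tuple[str, tuple]]:
    """Build one INSERT per table so both copies receive the same row."""
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    params = tuple(row[column] for column in COLUMNS)
    return [
        (f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", params)
        for table in TABLES
    ]


row = {"id": "a1", "time": 1000, "tag": "t", "mstatus": True, "lonumb": 4231}
for cql, params in insert_statements(row):
    print(cql, params)
```

Note that testbylonumb as defined above does not include tag in its primary key, so two rows of test that differ only in tag would overwrite each other there; adding tag as a further clustering column may be worth considering.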

Cassandra CQL: Filter the rows between a range of values

The structure of my column family is something like
CREATE TABLE product (
id UUID PRIMARY KEY,
product_name text,
product_code text,
status text,//in stock, out of stock
mfg_date timestamp,
exp_date timestamp
);
Secondary Index is created on status, mfg_date, product_code and exp_date fields.
I want to select the list of products whose status is IS (In Stock) and the manufactured date is between timestamp xxxx to xxxx.
So I tried the following query.
SELECT * FROM product where status='IS' and mfg_date>= xxxxxxxxx and mfg_date<= xxxxxxxxxx LIMIT 50 ALLOW FILTERING;
It throws an error like: No indexed columns present in by-columns clause with "equals" operator.
Is there anything I need to change in the structure? Please help me out. Thanks in Advance.
Cassandra is not supporting >=, so you have to change the values and use only > (greater than) and < (less than) when executing the query.
You should have at least one "equals" operator on one of the indexed or primary key column fields in your where clause, i.e. "mfg_date = xxxxx"
