I'm new to Cassandra and I'm having difficulty using a simple select query on a very basic table. For example,
SELECT * FROM cars WHERE date > '2015-10-10';
on this given table:
CREATE TABLE cars (id int primary key, name varchar, type varchar, date varchar);
I'm able to use the = operator, but not >, <, >=, or <=.
I have read on this subject, including this article and this Stack Overflow question on the different key types, but it is still unclear to me. In the table above, date is a simple column; why can't I use the WHERE clause like I would in a regular RDBMS?
In Cassandra, you can only use the WHERE clause on key columns (or indexed columns), which is why your query doesn't work.
Take a look at this article about a similar problem; you'll see that Cassandra's data model isn't the same as the relational one.
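If date ranges are a core access pattern, the usual Cassandra approach is to design a table around that query, with date as a clustering column. A minimal sketch, assuming you can always supply some grouping column at query time (here your type column is reused as the partition key purely for illustration, and 'SUV' is a made-up value):
CREATE TABLE cars_by_type_and_date (
    type varchar,    -- partition key; reused from your table only as an example
    date varchar,    -- clustering column; ISO-formatted strings sort chronologically
    id int,
    name varchar,
    PRIMARY KEY ((type), date, id)
) WITH CLUSTERING ORDER BY (date DESC, id DESC);
-- range predicates on a clustering column are allowed once the partition is pinned down
SELECT * FROM cars_by_type_and_date WHERE type = 'SUV' AND date > '2015-10-10';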
I am a newbie to Cassandra.
I have a table (table1) with data like:
ch1,ch2,ch3,ch4
LD,9813970,1484914,'T03103','T04014'
LD,1008203,1486104,'T03103','T04024'
I want to find a string in this Cassandra table (table1). Is there any option to search for a given string in column ch4 using only the IN operator (not the LIKE operator)? A sample query would be like
select * from table1 where 'T04014' IN (ch4)
If required, the ch4 column may be included in the partition or clustering keys.
You didn't post the table schema so I'm going to assume that ch4 is not part of the primary key.
You cannot include a column in the filter unless it is part of the primary key or you have a secondary index defined on it. Be aware that secondary indexes are not always a good fit. Have a look at when to use an index for details.
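For example, if ch4 is a regular text column, an index lets you do equality lookups; note that IN on a non-key indexed column is still not supported, so the IN form from your question won't work this way:
CREATE INDEX ON table1 (ch4);                 -- secondary index on a regular column
SELECT * FROM table1 WHERE ch4 = 'T04014';    -- equality lookup served by the index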
The general recommendation is to denormalise and create a table specifically designed for each app query so you get the best performance out of your cluster; see the sketch below. Cheers!
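To illustrate the denormalisation idea, here is a rough sketch assuming ch4 is text and that the remaining columns make a row unique (the table name and column types are guesses; adjust them to your real schema):
CREATE TABLE table1_by_ch4 (
    ch4 text,
    ch1 text,
    ch2 text,
    ch3 text,
    PRIMARY KEY ((ch4), ch1, ch2, ch3)
);
-- IN is supported on the partition key, so this query works without indexes or filtering
SELECT * FROM table1_by_ch4 WHERE ch4 IN ('T04014', 'T04024');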
I am new to Cassandra and I read some articles about static and dynamic column families.
It is mentioned that from Cassandra 3, table and column family are the same.
I created a keyspace and some tables, and inserted data into them.
CREATE TABLE subscribers(
id uuid,
email text,
first_name text,
last_name text,
PRIMARY KEY(id,email)
);
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test#123.com','Test1','User1');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test2#222.com','Test2','User2');
INSERT INTO subscribers(id,email,first_name,last_name)
VALUES(now(),'Test3#333.com','Test3','User3');
It all seems to work fine.
But what I need is to create a dynamic column family with only data types and no predefined columns.
With an insert query I should be able to pass different arguments, and the data should be inserted accordingly.
In articles, it is mentioned that for a dynamic column family there is no need to create a schema (predefined columns).
I am not sure if this is possible in Cassandra or if my understanding is wrong.
Let me know if this is possible or not.
If possible, kindly provide some examples.
Thanks in advance.
I think the articles you're referring to were written in the first years of Cassandra, when it was based on the Thrift protocol. The Cassandra Query Language (CQL) was introduced many years ago and is now the way to work with Cassandra; Thrift is deprecated in Cassandra 3.x and fully removed in 4.0 (not released yet).
If you really need fully dynamic columns, you can try to emulate this by using a table with map columns from text to a specific type, like this:
create table abc (
id int primary key,
imap map<text,int>,
tmap map<text,text>,
... more types
);
but you need to be careful: there are limitations and performance effects when using collections, especially if you want to store more than hundreds of elements.
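For example, with the map-based table above, each "dynamic column" becomes a map entry (the values are just illustrative):
insert into abc (id, imap, tmap) values (1, {'age': 30}, {'name': 'Test1'});
-- add or overwrite a single entry without rewriting the whole map
update abc set tmap['city'] = 'Paris' where id = 1;
select * from abc where id = 1;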
Another approach is to store the data as individual rows:
create table xxxx (
id int,
col_name text,
ival int,
tval text,
... more types
primary key(id, col_name));
then you can insert individual values as separate columns:
insert into xxxx(id, col_name, ival) values (1, 'col1', 1);
insert into xxxx(id, col_name, tval) values (1, 'col2', 'text');
and select all columns as:
select * from xxxx where id = 1;
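and, because col_name is a clustering column, you can also fetch a single emulated "column" of that row directly:
select * from xxxx where id = 1 and col_name = 'col1';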
I have a question about querying a Cassandra collection.
I want to make a query that works with collection search.
CREATE TABLE rd_db.test1 (
testcol3 frozen<set<text>> PRIMARY KEY,
testcol1 text,
testcol2 int
)
That is the table structure, and this is the table contents.
In this situation, I want to make a CQL query that matches alternative values on the set column.
If it were SQL and testcol3 weren't a collection, I would write:
select * from rd_db.test1 where testcol3 = 4 or testcol3 = 5
But it is CQL and a collection, so I tried:
select * from test1 where testcol3 contains '4' OR testcol3 contains '5' ALLOW FILTERING ;
select * from test1 where testcol3 IN ('4','5') ALLOW FILTERING ;
But these two queries didn't work...
Please help.
This won't work for you for multiple reasons:
there is no OR operation in CQL
you can only do a full match on the value of the partition key (testcol3)
although you may create secondary indexes for columns with a collection type, it's impossible to create an index on the values of the partition key
You need to change the data model, and you need to know the queries that you're executing in advance. From a brief look at your data model, I would suggest rolling the set field out into multiple rows, with individual values corresponding to individual partitions, as sketched below.
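A possible sketch of that remodel (the table name is made up, and testcol1 is added to the primary key only to keep rows unique; adjust to your data): each element of the old set gets its own partition, with the other columns repeated:
CREATE TABLE rd_db.test1_by_element (
    element text,      -- one value from the old testcol3 set
    testcol1 text,
    testcol2 int,
    PRIMARY KEY ((element), testcol1)
);
-- IN on the partition key expresses the "4 or 5" condition without ALLOW FILTERING
SELECT * FROM rd_db.test1_by_element WHERE element IN ('4', '5');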
But I also want to suggest taking the DS201 & DS220 courses on the DataStax Academy site for a better understanding of how Cassandra works and how to model data for it.
Sorry, the title might not give an exact description of what I intended.
Here is the problem: I need to select data based on date ranges, and most of our queries use the 'id' field.
So I have created a data model with id as the partition key and date as a clustering key.
Essentially like below (I am just using fake/sample statements, as I cannot give actual details).
create table tab1(
id text,
col1 text,
... coln text,
rec_date date,
rec_time timestamp,
PRIMARY KEY((id),rec_date,rec_time)
) WITH CLUSTERING ORDER BY (rec_date DESC, rec_time DESC);
It works fine for most of the queries.
However, I was trying to optimize the scenario below:
-> All the records with a date greater than abcd-xy-kl
Which of the approaches below would be good for me? Or is there anything better than these two?
1) A very basic or simple approach. Use the query:
select * from tab1 where id > '0' AND rec_date > 'abcd-xy-kl'
Every record will essentially be greater than '0', so it might still do a full table scan.
2) Create a secondary index on rec_date and simply use the query:
select * from tab1 where rec_date > 'abcd-xy-kl'
Also, one key thing is that I am using Spark and using cassandraSqlContext.sql to get the DataFrame.
So, considering all the above details, which approach would be better?
I don't see the point of filtering by id as in your first example. The following should work and would be a better approach from my perspective:
select * from tab1 where rec_date > 'abcd-xy-kl' ALLOW FILTERING;
Note that it won't work without ALLOW FILTERING at the end.
You cannot use > '0' on the partition key; it is not supported by Cassandra. Check the documentation for more information on the limitations of the WHERE part of queries.
In order to query by your clustering keys efficiently, you really need to use a secondary index. Refrain from using ALLOW FILTERING unless you know what you're doing, because it could trigger a "distributed" scan and perform very poorly. Check the documentation for more information.
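A minimal sketch of option 2, assuming the table from the question (the index name is arbitrary). Note that, depending on the Cassandra version, range predicates (>) on an indexed column may still fall back to filtering, so test the range form against your data:
create index rec_date_idx on tab1 (rec_date);
-- equality lookups on rec_date are now served by the index
select * from tab1 where rec_date = 'abcd-xy-kl';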
I have a table as follows in Cassandra 2.0.8:
CREATE TABLE emp (
empid int,
deptid int,
first_name text,
last_name text,
PRIMARY KEY (empid, deptid)
)
When I try to search with: "select * from emp where first_name='John';"
the CQL shell says:
"Bad Request: No indexed columns present in by-columns clause with Equal operator"
I searched for the issue, and everywhere it says to add a secondary index for the column 'first_name'.
But I need to know the exact reason why that column needs to be indexed.
The only thing I can figure out is performance.
Any other reasons?
Cassandra does not support searching by an arbitrary column, because that would involve scanning all the rows, which is not supported.
The data is internally organised into something which one can compare to HashMap[X, SortedMap[Y, Z]]. The key of the outer map is a partition key value, and the key of the inner map is a kind of concatenation of all clustering column values and the name of some regular column.
Unless you have an index on a column, you need to provide the full (preferred) or a partial path to the data you want to collect with the query. Therefore, you should design your schema so that queries contain the partition key value and, optionally, a range on clustering columns.
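For the emp table above, that means queries like these can be served directly (the values are made up for illustration):
-- full path: partition key plus equality on the clustering column
select * from emp where empid = 100 and deptid = 10;
-- partial path: partition key plus a range over the clustering column
select * from emp where empid = 100 and deptid > 10;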
You may read about what is allowed and what is not here
Alternatively you can create an index in Cassandra, but that will hamper your write performance.
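If you do decide to add one, a minimal sketch looks like this (the index name is arbitrary):
CREATE INDEX emp_first_name_idx ON emp (first_name);
-- the original query now works, at the cost of extra work on every write to first_name
select * from emp where first_name='John';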