I have a secondary index on a table:
CREATE NULL_FILTERED INDEX RidesByPassenger ON Rides(
passenger_id,
start_time,
)
If I run the following query:
SELECT start_time FROM Rides#{FORCE_INDEX=RidesByPassenger}
WHERE passenger_id='someid' AND start_time IS NOT NULL;
Can I be sure the base table won't be accessed for it? In other words, if I query a secondary index using only the first part of the primary key (in this case passenger_id), will it use only the secondary index? Or the base table as well? Also, is there a way to ask Spanner exactly which tables it's accessing when I run a query?
Since this query only uses columns that are covered by the index, it will not join the base table.
You can always run (EXPLAIN/PROFILE SQL_QUERY for the query plan) in gcloud tool to be sure.
Related
I am a new bee to Cassandra.
I have a Table(table1) and the Data like
ch1,ch2,ch3,ch4
LD,9813970,1484914,'T03103','T04014'
LD,1008203,1486104,'T03103','T04024'
Want to find a string in this cassandra table : table1. Is there any option to search a given string in this table's column ch4 using only IN operator (not LIKE operator). Sample query is like
select * from table1 where 'T04014' IN (ch4)
if required ch4 column may included in the partition or clustering keys.
You didn't post the table schema so I'm going to assume that ch4 is not part of the primary key.
You cannot include a column in the filter unless it is part of the primary key or you have a secondary index defined on it. Be aware that secondary indexes are not always a good fit. Have a look at when to use an index for details.
The general recommendation is to denormalise and create a table specifically designed for each app query so you get the best performance out of your cluster. Cheers!
I have a question to query to cassandra collection.
I want to make a query that work with collection search.
CREATE TABLE rd_db.test1 (
testcol3 frozen<set<text>> PRIMARY KEY,
testcol1 text,
testcol2 int
)
table structure is this...
and
this is the table contents.
in this situation, I want to make a cql query has alternative option values on set column.
if it is sql and testcol3 isn't collection,
select * from rd.db.test1 where testcol3 = 4 or testcol3 = 5
but it is cql and collection.. I try
select * from test1 where testcol3 contains '4' OR testcol3 contains '5' ALLOW FILTERING ;
select * from test1 where testcol3 IN ('4','5') ALLOW FILTERING ;
but this two query didn't work...
please help...
This won't work for you for multiple reasons:
there is no OR operation in CQL
you can do only full match on the value of partition key (testcol3)
although you may create secondary indexes for fields with collection type, it's impossible to create an index for values of partition key
You need to change data model, but you need to know the queries that you're executing in advance. From brief looking into your data model, I would suggest to rollout the set field into multiple rows, with individual fields corresponding individual partitions.
But I want to suggest to take DS201 & DS220 courses on DataStax Academy site for better understanding how Cassandra works, and how to model data for it.
I have just started working on Cassandra.
I am bit confuse with the concept of secondary key.
From the definition I understood is indexing on the non key attribute of a table which is not sorted is secondary index.
So I have this table
CREATE TABLE IF NOT EXISTS userschema.user (id int,name text, address text, company text, PRIMARY KEY (id, name))
So If I create index like this
CREATE INDEX IF NOT EXISTS user_name_index ON userschema.user (name)
this should be secondary index.
But my requirement is to create index containing columns name , id , company.
How can I create a secondary index like this in Cassandra ?
I got this link which defines something of this short, but how come are these secondary indexes aren't they just table ?
These above user table is just the example not the actual one.
I am using Cassandra 3.0.9
id and name are already part of primary key.
So following queries will work
SELECT * FROM table WHERE id=1
SELECT * FROM table WHERE id=1 and name='some value'
SELECT * FROM table WHERE name='some value' ALLOW FILTERING (This is inefficeint)
You can create secondary index on company column
CREATE INDEX IF NOT EXISTS company_index ON userschema.user (company)
Now once secondary index is defined, it can be used in where clause along with primary key.
SELECT * FROM table WHERE id=1 and name='some value' and company='some value'
Though SELECT * FROM table WHERE company='some value' ALLOW FILTERING works it will be highly inefficient.
Before creating secondary index have look at When to use secondary index in cassandra
The link which you have referred mainly focuses on materialized views, in which we create virtual tables to execute the queries with non-primary keys.
Moreover, it seems you are creating secondary key on a Primary Key, which you have already defined in the creation of the table. Always remember that Secondary Index should be Non-Primary key.
To have a clear idea about the Secondary Indexes- Refer this https://docs.datastax.com/en/cql/3.3/cql/cql_using/useSecondaryIndex.html
Now, Pros and cons of the alternative methods for the secondary index
1.Materialized views:
It will create new virtual tables and you should run the queries in a virtual table using the old Primary keys in old and original tables and new virtual Primary keys in the new materialized table. Any changes in data modification in the original old table will be reflected at materialized table. If you drop the materialized table, but the data will be created as tombstones whose gcc_graceseconds is 864000(10 days) default. Dropping the materialized table will not have any effect on original table.
2.ALLOW FILTERING:
It is highly inefficient and is not at all advised to use allow filtering as the latencies will be high and performance will be degraded.
If you want much more information, refer this link too How do secondary indexes work in Cassandra?
Correct me if I am wrong
I have two tables one is users and other is expired_users.
users columns-> id, name, age
expired_users columns -> id, name
I want to execute the following query.
delete from users where id in (select id from expired_users);
This query works fine with SQL related databases. I want find a solution to solve this in cassandra.
PS: I don't want to add any extra columns in the tables.
While designing cassandra data model, we cannot think exactly like RDBMS .
Design like this --
create table users (
id int,
name text,
age int,
expired boolean static,
primary key (id,name)
);
To mark a user as expired -- Just insert the same row again
insert into users (id,name,age,expired) values (100,'xyz',80,true);
you don't have to update or delete the row, just insert it again, previous column values will get overridden.
What you want to is to use join as a filter for your delete statement, and this is not what the Cassandra model is built for.
AFAIK there is no way to perform this using cql. If you want to perform this action without changing the schema - run external script in any language that has drivers for Cassandra.
Very new to Cassandra so apologies if the question is simple.
I created a table:
create table ApiLog (
LogId uuid,
DateCreated timestamp,
ClientIpAddress varchar,
primary key (LogId, DateCreated));
This work fine:
select * from apilog
If I try to add a where clause with the DateCreated like this:
select * from apilog where datecreated <= '2016-07-14'
I get this:
Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
From other questions here on SO and from the tutorials on datastax it is my understanding that since the datecreated column is a clustering key it can be used to filter data.
I also tried to create an index but I get the same message back. And I tried to remove the DateCreated from the primary key and have it only as an index and I still get the same back:
create index ApiLog_DateCreated on dotnetdemo.apilog (datecreated);
The partition key LogId determines on which node each partition will be stored. So if you don't specify the partition key, then Cassandra has to filter all the partitions of this table on all the nodes to find matching data. That's why you have to say ALLOW FILTERING, since that operation is very inefficient and is discouraged.
If you specify a specific LogId, then Cassandra can find the partition on a single node and efficiently do a range query by the clustering key.
So you need to plan your schema such that you can do your range queries within a single partition and not have to do a full table scan like you're trying to do.
When your query is rejected by Cassandra because it needs filtering, you should resist the urge to just add ALLOW FILTERING to it. You should think about your data, your model and what you are trying to do. You always have multiple options.
You can change your data model, add an index, use another table or use ALLOW FILTERING.
You have to make the right choice for your specific use case.
Anyway you want to make it work.
select * from dev."3" where "column" = '' limit 1000 ALLOW FILTERING;