In my Cassandra Java driver code, I am creating a query and then I print the consitency level of the query
val whereClause = whereConditions(tablename, id); cassandraRepositoryLogger.trace("getRowsByPartitionKeyId: looking in table "+tablename+" with partition key "+partitionKeyColumns +" and values "+whereClause +" fetch size "+fetchSize)
cassandraRepositoryLogger.trace("where clause is "+whereClause)
cassandraRepositoryLogger.trace(s"consistency level ${whereClause.getConsistencyLevel}")
But the print shows taht consistency level is null. Why? Shouldn't it be One by default?
2020-06-10 07:16:44,146 [TRACE] from repository.UsersRepository in scala-execution-context-global-115 - where clause is SELECT * FROM users WHERE bucket=109 AND email='manu.chadha#hotmail.com';
2020-06-10 07:16:44,146 [TRACE] from repository.UsersRepository in scala-execution-context-global-115 - getOneRowByPartitionKeyId: looking in table users with partition key List(bucket, email) and values SELECT * FROM users WHERE bucket=109 AND email='manu.chadha#hotmail.com';
2020-06-10 07:16:44,146 [TRACE] from repository.UsersRepository in scala-execution-context-global-115 - consistency level null <-- Why is this null?
The query if build like follows
def whereConditions(tableName:String,id: UserKeys):Where= {
QueryBuilder.select().from(tableName).where(QueryBuilder.eq("bucket", id.bucket))
.and(QueryBuilder.eq("email", id.email))
}
This is how getConsistencyLevel method is implemented. This method returns consistencylevel of the query or null if no consistency level has been set using setConsistencyLevel.
Related
I am trying to filter data using a where clause having OR condition from Greenplum. I am using “Greenplum” connector in spark.
Snippet -
Df1 = Df.filter(col(‘id’)==‘1’ & (col(‘Name’)==‘abc’ | col(‘Name’).isNull()))
The connector translates this into a sql query internally and it looks like this -
Select * from df where
id=‘1’ and Name=‘abc’ or Name is null;
This is an incorrect query because I want to fetch all the records where id is 1 and name is abc or null. With this query the data fetched has records where id not equal to 1 but name is null.
I have following table in cassandra:
CREATE TABLE article (
id text,
price int,
validFrom timestamp,
PRIMARY KEY (id, validFrom)
) WITH CLUSTERING ORDER BY (validFrom DESC);
With articles and historical price information (validFrom is a timestamp of new price). Article price changes often. I want to query for
Historic price for a particular article.
Last price for an article.
From my understanding, I can solve both problems with following query:
select id, price from article where id = X validFrom < Y limit 1;
This query uses article id as restriction, query uses the partition key. Since the clustering order is based on the validFrom timestamp in reversed order, cassandra can efficient perform this query.
Am I getting this right?
What is the best approach to delete old data (house-keeping). Let's assume, I want delete all articles with validFrom > 20150101 and validFrom < 20151231. Since I don't have a primary key, this would be inefficient, even if I use an index on validFrom, right? How can I achieve this?
You can use external tools for that:
Spark with Spark Cassandra Connector (even in the local mode). Code could look as following (note that I'm using validfrom as name, not validFrom, as it's not escaped in your schema):
import com.datastax.spark.connector._
val data = sc.cassandraTable("test", "article")
.where("validfrom >= '2020-07-28T11:50:00Z' AND validfrom < '2020-07-28T12:50:00Z'")
.select("id", "validfrom")
data.deleteFromCassandra("test", "article", keyColumns=SomeColumns("id", "validfrom"))
use DSBulk to do find the matching entries and output them into the file (output.csv in my case), and then perform their deletion:
bin/dsbulk unload -url output.csv \
-query "SELECT id, validfrom FROM test.article WHERE token(id) > :start AND token(id) <= :end AND validFrom >= '2020-07-28T11:50:00Z' AND validFrom < '2020-07-28T12:50:00Z' ALLOW FILTERING"
bin/dsbulk load -query "DELETE from test.article WHERE id = :id and validfrom = :validfrom" \
-url output.csv
To add to Alex Ott's answer, this comment of yours is incorrect:
This query uses article id as restriction, query uses the partition key. Since the clustering order is based on price, cassandra can efficient perform this query.
The rows are not ordered by price. They are ordered by validFrom in reverse-chronological order. Cheers!
I have one table TestTable and partition Key TestColumn.
Inputs Dates:
from_date= "2017-04-20T16:31:54.451071+00:00"
to_date = "2018-04-20T16:31:54.451071+00:00"
when I use equal query the date then it is working.
key_expr = Key('TestColumn').eq(to_date)
query_resp = table.query(KeyConditionExpression=key_expr)
but when I use between query then is not working.
key_expr = Key('TestColumn').between(from_date, to_date)
query_resp = table.query(KeyConditionExpression=key_expr)
Error:
Unknown err_msg while querying dynamodb: An error occurred (ValidationException) when calling the Query operation: Query key condition not supported
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
DynamoDB Query will return data from one and only one partition, meaning you have to supply a single partition key in the request.
KeyConditionExpression
The condition that specifies the key value(s)
for items to be retrieved by the Query action.
The condition must perform an equality test on a single partition key
value.
You can optionally use a BETWEEN operator on a sort key (but you still have to supply a single partition key).
If you use a Scan you can use an ExpressionFilter and use the BETWEEN operator on TestColumn
I have many tables per keyspace, therefore I would like to filter the tables based on restriction criteria. I tried this query but it is not really giving the intended result that I want:
SELECT table_name FROM system_schema.tables
WHERE keyspace_name = 'test'
and table_name >= 'test_001_%';
The output shown is:
'table_name'
---------------------
'test_001_metadata'
'test_001_time1'
'test_001_time2'
'test_001_time3'
'test_001_time4'
'test_002_metadata'
'test_002_time1'
'test_002_time2'
'test_002_time3'
What I really want is:
The output shown is:
'table_name'
---------------------
'test_001_metadata'
'test_001_time1'
'test_001_time2'
'test_001_time3'
'test_001_time4'
The other way out is to use LIKE keyword by creating secondary index on table_name. But I am a bit skeptical if it might cause problem as it is a system table. Another concern is, does clustering column ACTUALLY support secondary index?
Create a SASI index with mode contains on the table_name column after removing the previous index and try the query as
SELECT table_name FROM system_schema.tables
WHERE keyspace_name = 'test'
and table_name LIKE '%test_001_%';
The command to create a SASI index with mode contains is as follows:
CREATE CUSTOM INDEX ON system_schema.tables(table_name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'case_sensitive': 'false', 'tokenization_normalize_uppercase': 'true', 'mode': 'CONTAINS'}
And for your second question, you cannot create secondary index on anything which is part of PRIMARY KEY.
I am trying out Cassandra for the first time and running it locally for simple session management db. [Cassandra-2.0.4, CQL3, datastax driver 2.0.0-rc2]
The following count query works fine when there is no data in the table:
select count(*) from session_data where app_name=? and account=? and last_access > ?
But after even a single row is inserted into the table, the query fails with the following error:
java.lang.AssertionError
at org.apache.cassandra.db.filter.ExtendedFilter$WithClauses.getExtraFilter(ExtendedFilter.java:258)
at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1719)
at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1674)
at org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:111)
at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1418)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1931)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Here is the schema I am using:
CREATE KEYSPACE session WITH replication= {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE session_data (
username text,
session_id text,
app_name text,
account text,
last_access timestamp,
created_on timestamp,
PRIMARY KEY (username, session_id, app_name, account)
);
create index sessionIndex ON session_data (session_id);
create index sessionAppName ON session_data (app_name);
create index lastAccessIndex ON session_data (last_access);
I am wondering if there is something wrong in the table definition/indexes or the query itself. Any help/insight would be greatly appreciated.
It looks like you're tripping over a bug in Cassandra. Here is the assertion and related comments in the Cassandra sources:
/*
* This method assumes the IndexExpression names are valid column names, which is not the
* case with composites. This is ok for now however since:
* 1) CompositeSearcher doesn't use it.
* 2) We don't yet allow non-indexed range slice with filters in CQL3 (i.e. this will never be
* called by CFS.filter() for composites).
*/
assert !(cfs.getComparator() instanceof CompositeType);
This code was modified between cassandra-2.0.4 and trunk as part of ticket CASSANDRA-5417, but it's not clear to me that the author was aware of this issue. The assertion was removed, but the comment was not. I would recommend submitting a bug report to the Cassandra project.