Select first N rows of Cassandra table - cassandra

As stated in this doc, to select a range of rows I have to write this:
select first 100 col1..colN from table;
But when I run this in the CQL shell, I get this error:
<ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:13 no viable alternative at input '100' (select [first] 100...)">
What's wrong?

According to the docs, the keyword FIRST limits the number of columns, not rows.
To limit the number of rows, use the keyword LIMIT:
select col1..colN from table limit 100;
The default limit is 10000.

Related

How can I efficiently query a Cassandra table to retrieve various counts?

I have a Cassandra table FeedCount with partition key (PKey) and clustering keys (filetype, status, time).
I need to get data for a chart that shows:
TOTAL COUNT: 100
PASSED : 80
FAILED : 20
How should I query the above table efficiently?
Option 1: query COUNT(*) for the total and COUNT(*) where the status is passed, then calculate failed programmatically as
Failed = Total - Passed;
Total = select count(*) from FeedCount where Pkey='any';
Passed = select count(*) from FeedCount where Pkey='any' and filetype='abc' and status=true;
Option 2: query only the statuses for a given file type and calculate the total,
i.e. Passed + Failed = Total.
Passed = select count(*) from FeedCount where Pkey='any' and filetype='abc' and status=true;
Failed = select count(*) from FeedCount where Pkey='any' and filetype='abc' and status=false;
The question is whether counting over all rows is efficient, or whether I should use the second approach to find the total.
IMHO, there shouldn't be a very big difference between the two approaches, because you basically read all the data either way: the status field has only 2 possible values, so you effectively read all the data in the 2nd case as well.
The only difference that I can imagine is that in the first case you're doing select count(*) from FeedCount where Pkey='any';, while in the 2nd case you're effectively doing select count(*) from FeedCount where Pkey='any' AND filetype = 'abc';, so if you have multiple file types, your results won't be the same.
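Since both variants end up reading the whole partition anyway, one alternative (not something proposed above, just a sketch) is to read the rows once and tally them on the client. The function name tally_status and the dict-shaped rows with a boolean status are assumptions made for illustration; the example also shows why the total over all file types differs from Passed + Failed for a single file type.

```python
from collections import Counter


def tally_status(rows, filetype=None):
    """Count passed/failed rows in one pass; optionally restrict to one filetype.

    `rows` is assumed to be an iterable of dict-like rows with a `filetype`
    value and a boolean `status` (hypothetical shape, for illustration only).
    """
    counts = Counter()
    for row in rows:
        if filetype is not None and row["filetype"] != filetype:
            continue
        counts["passed" if row["status"] else "failed"] += 1
    return {
        "total": counts["passed"] + counts["failed"],
        "passed": counts["passed"],
        "failed": counts["failed"],
    }


# Example rows standing in for the result of a single partition read.
rows = [
    {"filetype": "abc", "status": True},
    {"filetype": "abc", "status": False},
    {"filetype": "xyz", "status": True},
]
all_types = tally_status(rows)         # counts across every filetype
only_abc = tally_status(rows, "abc")   # counts for filetype 'abc' only
```

Whether this beats two server-side counts depends on how many rows the partition holds, so treat it as an option to measure rather than a recommendation.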

Check if a table is empty in Cassandra DB

I am trying to find a way to determine if the table is empty in Cassandra DB.
cqlsh> SELECT * from examples.basic ;
key | value
-----+-------
(0 rows)
I am running count(*) to get the number of rows, but I am getting a warning message, so I wanted to know if there is a better way to check whether the table is empty (zero rows).
cqlsh> SELECT count(*) from examples.basic ;
count
-------
0
(1 rows)
Warnings :
Aggregation query used without partition key
cqlsh>
Aggregations like count can be overkill for what you are trying to accomplish, especially with the star wildcard: if there is any data in your table, the query will need to do a full table scan. This can be quite expensive if you have many records.
One way to get the result you are looking for is the query
cqlsh> SELECT key FROM keyspace1.table1 LIMIT 1;
Empty table:
The resultset will be empty
cqlsh> SELECT key FROM keyspace1.table1 LIMIT 1;
key
-----
(0 rows)
Table with data:
The resultset will have a record
cqlsh> SELECT key FROM keyspace1.table1 LIMIT 1;
key
----------------------------------
uL24bhnsHYRX8wZItWM6xKdS0WLvDsgi
(1 rows)
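The same check can be wrapped in a small helper on the client side. This is a minimal sketch, assuming a session object that behaves like the one from the DataStax Python driver, where execute() returns a result whose one() method gives the first row or None. The helper name table_is_empty and its parameters are hypothetical.

```python
def table_is_empty(session, keyspace, table, key_column):
    """Return True when the table holds no rows, reading at most one row.

    `session` is assumed to expose execute(query) returning an object with
    a one() method (first row or None), as the DataStax Python driver does.
    Table and column names cannot be bound as parameters in CQL, so they are
    interpolated here; only pass trusted identifiers.
    """
    query = f"SELECT {key_column} FROM {keyspace}.{table} LIMIT 1"
    return session.execute(query).one() is None
```

Usage would look like table_is_empty(session, "keyspace1", "table1", "key"), which issues the LIMIT 1 query shown above instead of a full count.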

SELECT COLUMN which has null values (Cassandra 3.11.3)

I have a table (table1) with 14 columns, of which 10 columns have data, and I need to import data into the remaining 4 columns from another table (for now these 4 columns have empty/null values).
My teammate has written code to import the data from the other table, but he is facing some issues and has asked me for a select query that displays the columns that have null/empty values (in this example, the 4 columns with null/empty values).
I have tried the select query below, where I used DISTINCT with the partition key.
SELECT distinct host_name from table1 WHERE empty_column = '' ;
Note: the column host_name is the primary or partition key, and empty_column is a column that has no value (null/empty).
Getting error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="SELECT DISTINCT with WHERE clause only supports restriction by partition key and/or static columns."
Please help...

How to check for all null values in a particular column in a Cassandra table?

Using the query below, I am not getting any records:
SELECT * FROM test_table where prty_ad_line3_tx = '' and prty_role_type_cd='Co-Borrower' and prty_role_sq_nb =5 limit 10 ALLOW FILTERING;
Any help would be appreciated.

Count rows in table

I am having trouble counting the rows of a very large table in Cassandra DB.
A simple statement:
SELECT COUNT(*) FROM my.table;
causes a timeout error:
OperationTimedOut: errors={}, ...
I have increased client_timeout in ~/.cassandra/cqlshrc file:
[connection]
client_timeout = 900
This time the statement runs, but it fails with the OperationTimedOut error again. How can I count the rows in the table?
You could count in several steps by splitting the token range.
Cassandra uses a token range from -2^63 to +2^63-1. So by splitting up this range you could do queries like this:
select count(*) from my.table where token(partitionKey) > -9223372036854775808 and token(partitionKey) < 0;
select count(*) from my.table where token(partitionKey) >= 0 and token(partitionKey) <= 9223372036854775807;
Add those two counts and you'll have the total count.
If those queries still do not go through, you can split them again into smaller token ranges.
Check out this tool, which does basically exactly that: https://github.com/brianmhess/cassandra-count
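If you want to generate the per-range queries yourself, the splitting can be sketched as below. This is only an illustration of the idea, not how the linked tool is implemented. The helper names split_token_ranges and count_queries are made up for the example, and each range is written with inclusive bounds so that adjacent ranges neither overlap nor leave gaps.

```python
MIN_TOKEN = -2**63       # lowest token in the range quoted above
MAX_TOKEN = 2**63 - 1    # highest token in the range quoted above


def split_token_ranges(n):
    """Split [MIN_TOKEN, MAX_TOKEN] into n contiguous inclusive (lo, hi) ranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # number of tokens, 2**64
    # n + 1 boundaries; range i covers bounds[i] .. bounds[i + 1] - 1.
    bounds = [MIN_TOKEN + (total * i) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n)]


def count_queries(table, partition_key, n):
    """Build one COUNT(*) statement per token sub-range (hypothetical helper)."""
    return [
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE token({partition_key}) >= {lo} AND token({partition_key}) <= {hi};"
        for lo, hi in split_token_ranges(n)
    ]


# Print 4 sub-range queries for the table from the question.
for query in count_queries("my.table", "partitionKey", 4):
    print(query)
```

Run each generated statement separately and sum the results; if a sub-range still times out, increase n.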
