I have trouble counting the rows of a very large table in Cassandra.
Simple statement:
SELECT COUNT(*) FROM my.table;
This fails with a timeout error:
OperationTimedOut: errors={}, ...
I have increased client_timeout in the ~/.cassandra/cqlshrc file:
[connection]
client_timeout = 900
The statement now runs for that long and then fails with an OperationTimedOut error again. How can I count the rows in this table?
You could run the count in pieces by using split token ranges.
Cassandra's Murmur3 partitioner uses a token range from -2^63 to 2^63-1. By splitting up this range you can run queries like these:
select count(*) from my.table where token(partitionKey) > -9223372036854775808 and token(partitionKey) < 0;
select count(*) from my.table where token(partitionKey) >= 0 and token(partitionKey) <= 9223372036854775807;
Add those two counts and you'll have the total count.
If those queries still don't go through, you can split them again into smaller token ranges.
Check out this tool, which does basically exactly that: https://github.com/brianmhess/cassandra-count
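The splitting step can be sketched in plain Python (the function names here are hypothetical, not part of any driver): divide the full Murmur3 token range into n contiguous sub-ranges and build one count statement per range:

```python
# Murmur3Partitioner token bounds; -2**63 itself is never assigned to data,
# so half-open (start, end] ranges beginning at MIN_TOKEN cover every row.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_subranges(n):
    """Split the full token range into n contiguous (start, end] sub-ranges."""
    span = (MAX_TOKEN - MIN_TOKEN) // n
    bounds = [MIN_TOKEN + i * span for i in range(n)]
    bounds.append(MAX_TOKEN)  # last range absorbs the rounding remainder
    return list(zip(bounds, bounds[1:]))

def count_statements(table, key, n):
    """One SELECT COUNT(*) per sub-range; sum the results for the total."""
    return [
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE token({key}) > {lo} AND token({key}) <= {hi};"
        for lo, hi in token_subranges(n)
    ]
```

Run the generated statements one by one and add up the counts; if one range still times out, split just that range further.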
I have a Cassandra table FeedCount with partition key (PKey) and clustering keys (filetype, status, time).
I need to get data for a chart where I need to show:
TOTAL COUNT: 100
PASSED: 80
FAILED: 20
How should I query the above table efficiently?
Option 1: query the count of all rows for the total, and the count of passed rows, then programmatically calculate failed as
Failed = Total - Passed;
Total = select count(*) from FeedCount where PKey='any';
Passed = select count(*) from FeedCount where PKey='any' and filetype='abc' and status=true;
Option 2: query just the statuses for the given filetype and calculate the total,
i.e. Passed + Failed = Total.
Passed = select count(*) from FeedCount where PKey='any' and filetype='abc' and status=true;
Failed = select count(*) from FeedCount where PKey='any' and filetype='abc' and status=false;
The point is: is counting over all rows efficient, or is it better to use the second approach to find the total?
IMHO, there shouldn't be a very big difference between the two approaches, as you basically read all the data either way - the status field has only 2 possible values, so you effectively read all the data in the 2nd case too.
The only difference I can see is that in the first case you're doing select count(*) from FeedCount where PKey='any';, while in the 2nd case you're effectively doing select count(*) from FeedCount where PKey='any' AND filetype='abc';, so if you have multiple file types, the results won't be the same.
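That caveat can be shown with made-up numbers (all counts here are hypothetical):

```python
# Hypothetical row counts inside partition PKey='any', keyed by (filetype, status).
counts = {
    ("abc", True): 80,
    ("abc", False): 20,
    ("xyz", True): 5,   # a second file type in the same partition
}

# Option 1: whole-partition total, passed for one filetype, derive failed.
total_partition = sum(counts.values())         # counts every filetype: 105
passed_abc = counts[("abc", True)]             # 80
failed_derived = total_partition - passed_abc  # 25, not the 20 'abc' failures

# Option 2: count both statuses for the same filetype.
failed_abc = counts[("abc", False)]            # 20
total_abc = passed_abc + failed_abc            # 100
```

With a single file type per partition both options agree; the extra ("xyz", True) rows are exactly what makes them diverge.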
I have this code:
select DOLFUT from [DATABASE $]
How do I get it to return data starting from the 2nd row (skip only the first row of data and collect all the rest)?
You can use LIMIT with OFFSET to skip any number of rows you want. Something like
SELECT * FROM table
LIMIT 1 OFFSET 10
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
MySQL docs
In Access, which you seem to use, you can use:
Select DOLFUT
From [DATABASE $]
Where DOLFUT Not In
(Select Top 1 T.DOLFUT
From [DATABASE $] As T
Order By 1)
Data in tables have no inherent order. To get data from the 2nd row, you have to set up a sort order and then bypass the first record of the set - as Gustav has shown.
I am trying to find a way to determine if the table is empty in Cassandra DB.
cqlsh> SELECT * from examples.basic ;
key | value
-----+-------
(0 rows)
I ran count(*) to get the number of rows, but I get a warning message, so I wanted to know if there is a better way to check whether the table is empty (zero rows).
cqlsh> SELECT count(*) from examples.basic ;
count
-------
0
(1 rows)
Warnings :
Aggregation query used without partition key
cqlsh>
Aggregations like count can be overkill for what you are trying to accomplish: without a partition key restriction, the query has to do a full table scan, which can be quite expensive if you have many records.
One way to get the result you are looking for is with this query:
cqlsh> SELECT key FROM keyspace1.table1 LIMIT 1;
Empty table:
The resultset will be empty
cqlsh> SELECT key FROM keyspace1.table1 LIMIT 1;
key
-----
(0 rows)
Table with data:
The resultset will have a record
cqlsh> SELECT key FROM keyspace1.table1 LIMIT 1;
key
----------------------------------
uL24bhnsHYRX8wZItWM6xKdS0WLvDsgi
(1 rows)
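In application code the same emptiness check reduces to "did the LIMIT 1 query return a row?". A minimal sketch, assuming rows is whatever iterable of rows your driver hands back for SELECT key FROM ... LIMIT 1:

```python
def table_is_empty(rows):
    """True when a `SELECT key FROM ... LIMIT 1` result contains no rows."""
    return next(iter(rows), None) is None

# Simulated resultsets for the two cases shown above.
assert table_is_empty([])                                           # (0 rows)
assert not table_is_empty([("uL24bhnsHYRX8wZItWM6xKdS0WLvDsgi",)])  # (1 rows)
```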
Currently I'm doing this (get pagination and count) in Informix:
select a.*, b.total from (select skip 0 first 10 * from TABLE) a,(select count(*) total from TABLE) b
The problem is that I'm repeating the same pattern - I get the first ten results and then I count all the results.
I want to make something like this:
select *, count(*) from TABLE;
so I can make my query much faster. Is that possible?
As stated in this doc, to select a range of rows I have to write this:
select first 100 col1..colN from table;
but when I launch this on cql shell I get this error:
<ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:13 no viable alternative at input '100' (select [first] 100...)">
What's wrong?
According to the docs, the keyword first limits the number of columns, not rows.
To limit the number of rows, you just use the keyword limit:
select col1..colN from table limit 100;
The default limit is 10000.