I have a Cassandra table structure as follows:
CREATE TABLE demo (user_id text, comment_id text, timestamp timeuuid, PRIMARY KEY (user_id, comment_id, timestamp));
Now in the UI I want pagination such that on each click of the next button I get rows 10 to 20, then 20 to 30, and so on.
I know we can't issue a query in Cassandra like
select * from demo limit 10,20
So if I create a query as
select * from demo where timestamp > sometimestampvalue limit 10;
this will give the 10 values that come after sometimestampvalue. Then I store the timestamp of the last returned row in a variable (say X) and issue the next query as
select * from demo where timestamp > X limit 10;
And so on. Will this work, or can something better be done? I'm also ready to change the table structure, adding counter columns if needed, because basically I should be able to paginate on any column.
See this answer:
Cassandra Limit 10,20 clause
Basically you will have to handle it in your app code. Your suggestion looks like it will work, with a little tweaking.
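The tweaked version can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: a sorted in-memory list stands in for one partition ordered by its clustering column, and fetch_page emulates the `WHERE timestamp > X LIMIT 10` query; the hypothetical `ts-NNN` values stand in for timeuuids.

```python
# Sorted in-memory list standing in for a Cassandra partition
# ordered by its clustering column (hypothetical values).
rows = [f"ts-{i:03d}" for i in range(35)]

def fetch_page(last_seen, limit=10):
    """Emulates: SELECT ... WHERE timestamp > last_seen LIMIT limit."""
    if last_seen is None:
        return rows[:limit]
    return [r for r in rows if r > last_seen][:limit]

# Walk all pages: remember the last row of each page as the next cursor.
cursor, pages = None, []
while True:
    page = fetch_page(cursor)
    if not page:
        break
    pages.append(page)
    cursor = page[-1]
```

Against a real table you would run the equivalent CQL per page and carry the last row's clustering value forward in application state between "next" clicks.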
Pagination is more easily done in the driver where you can set the fetch size. For example, in Java:
cluster.getConfiguration().getQueryOptions().setFetchSize(10);
See DataStax Java Driver for Apache Cassandra, Features, Paging
Related
I have a table that has millions of records, and I would like to purge the old data from this Cassandra table.
The following is my table definition:
CREATE TABLE "Openmind".mep_notification (
id uuid PRIMARY KEY,
campaign uuid,
created timestamp,
flight uuid,
read boolean,
type text,
user uuid
);
CREATE INDEX mep_notification_user_idx ON "Openmind".mep_notification (user);
How can I get the first X rows at a time using CQL, then the next X, and so on until I have all the rows from the table?
I'd appreciate your help. Thank you.
You can use a limit to get X rows at a time. See the example below:
SELECT *
FROM cycling.rank_by_year_and_name
PER PARTITION LIMIT 2;
DataStax documentation:
https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useQueryColumnsSort.html
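To make the semantics concrete, here is a small Python sketch (with made-up partition and clustering values) of what PER PARTITION LIMIT 2 returns: the first two rows of each partition, in clustering order.

```python
from itertools import groupby

# Rows of a table like cycling.rank_by_year_and_name, as
# (partition_key, clustering_key, value) tuples in storage order.
rows = [
    ("2015", 1, "A"), ("2015", 2, "B"), ("2015", 3, "C"),
    ("2016", 1, "D"), ("2016", 2, "E"),
]

def per_partition_limit(rows, n):
    """Emulates PER PARTITION LIMIT n: keep the first n rows of each partition."""
    out = []
    for _, group in groupby(rows, key=lambda r: r[0]):
        out.extend(list(group)[:n])
    return out

limited = per_partition_limit(rows, 2)
```

Note that this caps rows per partition; a plain LIMIT caps the total row count across the whole result.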
If you are trying to do pagination, use the driver's paging support. Depending on the programming language you are using, check the right paging documentation:
DataStax documentation:
https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlshPaging.html
In CQL prompt: https://docs.datastax.com/en/dse/6.8/cql/cql/cql_using/search_index/cursorsDeepPaging.html
In Java:
https://docs.datastax.com/en/developer/java-driver/3.11/manual/paging/
In PHP: https://datastax.github.io/php-driver/features/result_paging/
In Spring Boot:
https://docs.spring.io/spring-data/cassandra/docs/current/reference/html/#reference
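Whichever driver you use, the loop has the same shape: execute with a fetch size, keep the returned paging state, and re-execute until the state is exhausted. A hedged Python sketch, with a FakeSession standing in for a real driver session (the real APIs are in the links above):

```python
# Stand-in for a driver session: returns fetch_size rows at a time plus an
# opaque paging state, mimicking the paging behaviour of the linked drivers.
class FakeSession:
    def __init__(self, rows, fetch_size):
        self.rows, self.fetch_size = rows, fetch_size

    def execute(self, paging_state=None):
        start = paging_state or 0
        page = self.rows[start:start + self.fetch_size]
        next_state = start + len(page)
        # A real driver returns None when there are no more pages.
        return page, (next_state if next_state < len(self.rows) else None)

def all_rows(session):
    """Drain every page by re-issuing the query with the previous paging state."""
    state, out = None, []
    while True:
        page, state = session.execute(paging_state=state)
        out.extend(page)
        if state is None:
            return out

session = FakeSession(list(range(25)), fetch_size=10)
fetched = all_rows(session)
```

For a purge job you would delete (or collect keys from) each page inside the loop instead of accumulating everything in memory.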
I have a question about querying a Cassandra collection.
I want to write a query that searches within a collection.
CREATE TABLE rd_db.test1 (
testcol3 frozen<set<text>> PRIMARY KEY,
testcol1 text,
testcol2 int
)
That is the table structure, and those are the table contents.
In this situation, I want to write a CQL query that matches any of several values in the set column.
If this were SQL and testcol3 weren't a collection, I could write:
select * from rd_db.test1 where testcol3 = 4 or testcol3 = 5
But it is CQL and a collection, so I tried:
select * from test1 where testcol3 contains '4' OR testcol3 contains '5' ALLOW FILTERING ;
select * from test1 where testcol3 IN ('4','5') ALLOW FILTERING ;
But neither query works. Please help.
This won't work for you, for multiple reasons:
there is no OR operation in CQL
you can only do a full match on the value of the partition key (testcol3)
although you may create secondary indexes on columns with collection types, it's impossible to create an index on the values of a partition key
You need to change the data model, and you need to know the queries you will execute in advance. From a brief look at your data model, I would suggest rolling the set field out into multiple rows, with individual elements corresponding to individual partitions.
I also suggest taking the DS201 & DS220 courses on the DataStax Academy site for a better understanding of how Cassandra works and how to model data for it.
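The roll-out suggested above can be sketched in Python with hypothetical data: each set element becomes its own partition key, and the impossible "contains '4' OR contains '5'" becomes two single-partition lookups merged client-side.

```python
# Original rows keyed by a frozen<set<text>> partition key (hypothetical data).
original = [
    ({"4", "5"}, "row-a", 10),
    ({"6"},      "row-b", 20),
]

# Roll the set out: one row per element, with the element as the partition key.
by_element = {}
for elements, col1, col2 in original:
    for e in elements:
        by_element.setdefault(e, []).append((col1, col2))

# "testcol3 contains '4' OR contains '5'" becomes two equality lookups,
# merged in the application (CQL has no OR). A row whose set contains both
# values appears twice, so dedupe client-side if that matters.
result = by_element.get("4", []) + by_element.get("5", [])
```

In Cassandra terms, the rolled-out table would have the element as its partition key, with the other columns denormalized into each row.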
We have a requirement to load the last 30 days of updated data from the table.
The potential solutions below do not allow us to do so:
select * from XYZ_TABLE where WRITETIME(lastupdated_timestamp) > (TOUNIXTIMESTAMP(now())-42300000);
select * from XYZ_TABLE where lastupdated_timestamp > (TOUNIXTIMESTAMP(now())-42300000);
The table has columns as
lastupdated_timestamp (with an index on this field)
lastupdated_userid (with an index on this field)
Any pointers ...
Unless your table was built with this query in mind, your query will search every partition of the database, which will become very costly once your dataset has become large and will probably result in a timeout.
To efficiently complete this query, the XYZ_TABLE should have a primary key something like so:
PRIMARY KEY ((update_month, update_day), lastupdated_timestamp)
This is so Cassandra knows right where to go find the data. It has month and day buckets it can quickly find, then you can run queries like this to find updates on a certain day.
SELECT * FROM XYZ_TABLE WHERE update_month = '07-18' AND update_day = '06';
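On the client side you then enumerate the buckets for the last 30 days and issue one single-partition query per bucket. A sketch, assuming the text 'MM-YY' / 'DD' bucket form used in the example query:

```python
from datetime import date, timedelta

def last_n_day_buckets(today, n=30):
    """(update_month, update_day) pairs for the last n days, newest first,
    in the 'MM-YY' / 'DD' text form assumed by the example query."""
    days = [today - timedelta(days=i) for i in range(n)]
    return [(d.strftime("%m-%y"), d.strftime("%d")) for d in days]

buckets = last_n_day_buckets(date(2018, 7, 6))
queries = [
    f"SELECT * FROM XYZ_TABLE WHERE update_month = '{m}' AND update_day = '{d}'"
    for m, d in buckets
]
```

Each query hits exactly one partition, so the 30-day load becomes 30 cheap lookups instead of a full-table scan.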
I have recently started working with Cassandra. We have a Cassandra cluster which uses DSE 4.0 and has vnodes enabled. We have tables like this.
Below is my first table:
CREATE TABLE customers (
customer_id int PRIMARY KEY,
last_modified_date timeuuid,
customer_value text
)
The read query pattern on the above table is as follows, since we need to get everything from it and load it into our application memory every x minutes:
select customer_id, customer_value from datakeyspace.customers;
We have a second table like this:
CREATE TABLE client_data (
client_name text PRIMARY KEY,
client_id text,
creation_date timestamp,
is_valid int,
last_modified_date timestamp
)
CREATE INDEX idx_is_valid_clnt_data ON client_data (is_valid);
Right now the above table has 500 records, and all of them have the "is_valid" column set to 1. The read query pattern is again to load everything into application memory every x minutes, so the query below returns all 500 records, since everything has is_valid set to 1:
select client_name, client_id from datakeyspace.client_data where is_valid=1;
Since our cluster has vnodes enabled, the above query pattern is not efficient at all, and it takes a lot of time to get the data from Cassandra: around 50 seconds from the cqlsh client. We read from these tables with consistency level QUORUM.
Is there any possibility of improving our data model by using wide rows concept or anything else?
Any suggestions will be greatly appreciated.
I am using Cassandra 1.2.3 and can execute select query with Limit 10.
If I want records from 10 to 20, I cannot do "Limit 10,20".
Below query gives me an error.
select * from page_view_counts limit 10,20
How can this be achieved?
Thanks
Nikhil
You can't do skips like this in CQL. You have to do paging by specifying a start place, e.g.
select * from page_view_counts where field >= 'x' limit 10;
to get the next 10 elements starting from x.
I wrote a full example in this answer: Cassandra pagination: How to use get_slice to query a Cassandra 1.2 database from Python using the cql library.
For that you first have to plan your data model so that it can fetch records according to your requirement.
Can you tell us what sort of example you are working on?
And are you using the Hector client or some other one?
Sorry mate, I did it using the Hector client and Java, but seeing your requirement I can suggest planning your data model like this:
Use the time span as the row key, in yyyyMMddHH format, and in that row store each column under a composite name made up of a UTF8Type and a TimeUUID (e.g. C1+timeUUID).
Note: the first component of the composite name (e.g. C1) is also a column name in a counter column family.
Now store only a limited number of records, say 20, under C1, and set the c1 counter to 20. If a new record arrives for the same time span, insert it with the name C2+timeUUID and increment the counter column c2, again up to 20 records, and so on.
To fetch records you just pass C1, C2, etc. along with a row key like 2013061116.
It will give you 20 records, then another 20, and so on. You have to implement this programmatically. Hope you got this and it helps.
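That scheme can be sketched in Python, with in-memory dictionaries standing in for the two column families (the names here, like PAGE_SIZE and the C1/C2 page labels, follow the description above):

```python
PAGE_SIZE = 20

# store[row_key][page] -> list of (composite_column_name, value);
# counters[row_key][page] tracks how full each page is.
store, counters = {}, {}

def insert(row_key, time_uuid, value):
    """Append under the first page (C1, C2, ...) with room, bumping its counter."""
    pages = store.setdefault(row_key, {})
    counts = counters.setdefault(row_key, {})
    n = 1
    while counts.get(f"C{n}", 0) >= PAGE_SIZE:
        n += 1
    page = f"C{n}"
    pages.setdefault(page, []).append((f"{page}+{time_uuid}", value))
    counts[page] = counts.get(page, 0) + 1

def fetch(row_key, page):
    """One page of up to PAGE_SIZE records, e.g. fetch('2013061116', 'C2')."""
    return store.get(row_key, {}).get(page, [])

# 45 hypothetical records in one hourly time span bucket.
for i in range(45):
    insert("2013061116", f"uuid-{i:02d}", i)
```

The pagination then reduces to asking for C1, then C2, then C3 of the same row key until a fetch comes back short or empty.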