CQL - find related rows in the same table? - cassandra

I am new to CQL and am using Cassandra 3.x.
I have a basic university class table defined as:
Class_ID INT
Class_Name VARCHAR
Class_Date TIMESTAMP
Class_TimeHour INT
Sample entries are:
{1,"Bio 1","01/01/2018","700"}
{2,"MC 1","01/01/2018","700"}
{3,"Bio 2","01/01/2018","815"}
{3,"MC 2","01/01/2018","1100"}
700 represents 07:00 in 24-hour notation.
I need to answer some basic queries; please advise on how best to set up the table and queries.
1. Can I get a list of classes ordered by Class_TimeHour, descending?
2. Can I get a list of a particular class ordered by Class_TimeHour, descending? Does this mean I need to set up the partition key differently from #1?
3. Can I get a list of all Class_Name values that are within a 60-minute window of each other? From the sample above, my results should be:
   {1,"Bio 1","01/01/2018","700"}
   {2,"MC 1","01/01/2018","700"}
   {3,"Bio 2","01/01/2018","815"}
4. How many times does "Bio 1" occur per day?
5. How many Class_Name values contain the literal "MC"?
Thanks!
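CQL has no self-joins, so comparing rows with each other (the 60-minute window query) has to happen in the application after reading one day's rows, for example from a table partitioned by Class_Date. Below is a minimal Python sketch. It assumes the window is measured between start times, and the helper names are hypothetical. One pitfall it makes visible: HHMM integers cannot be subtracted directly. 815 - 700 is 115, but 08:15 is only 75 minutes after 07:00.

```python
from itertools import combinations

# Sample rows from the question: (Class_ID, Class_Name, Class_Date, Class_TimeHour)
ROWS = [
    (1, "Bio 1", "01/01/2018", 700),
    (2, "MC 1", "01/01/2018", 700),
    (3, "Bio 2", "01/01/2018", 815),
    (3, "MC 2", "01/01/2018", 1100),
]


def to_minutes(hhmm: int) -> int:
    """Convert an HHMM integer such as 815 into minutes after midnight (495)."""
    hours, minutes = divmod(hhmm, 100)
    return hours * 60 + minutes


def within_window(rows, window_minutes: int) -> set:
    """Names of classes that start within window_minutes of another class on the same date."""
    names = set()
    for a, b in combinations(rows, 2):
        if a[2] != b[2]:
            continue  # only compare classes held on the same day
        if abs(to_minutes(a[3]) - to_minutes(b[3])) <= window_minutes:
            names.update((a[1], b[1]))
    return names


if __name__ == "__main__":
    print(sorted(within_window(ROWS, 60)))
```

With a 60-minute window this returns Bio 1 and MC 1. Bio 2 joins them only when the window is 75 minutes or more, so the expected result above implies either a wider window or a different definition of "within 60 minutes."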

Related

Cassandra DB Query for System Date

I have a table, customer_info, in a Cassandra DB. It contains a column, billing_due_date, which is a date field (dd-MMM-yy, e.g. 17-AUG-21). I need to fetch certain fields from the customer_info table where billing_due_date equals the system date + 1.
Can anyone suggest a Cassandra DB query for this?
transaction_id is the primary key; it is just generated through uuid().
Unfortunately, there really isn't going to be a good way to do this. Right now, the data in the customer_info table is distributed across all nodes in the cluster based on a hash of the transaction_id. Essentially, any query based on something other than transaction_id is going to read from multiple nodes, which is a query anti-pattern in Cassandra.
In Cassandra, you need to design your tables based on the queries that they need to support. For example, choosing transaction_id as the sole primary key may distribute well, but it doesn't offer much in the way of query flexibility.
Therefore, the best way to support this query is to create a query table containing the data from customer_info with a key definition of PRIMARY KEY (billing_due_date, transaction_id). Then, a query like this should work:
> SELECT * FROM customer_info_by_date
WHERE billing_due_date = toDate(now()) + 2d;
 billing_due_date | transaction_id                       | name
------------------+--------------------------------------+---------
 2021-08-20       | 2fe82360-e314-4d5b-aa33-5deee9f03811 | Rinzler
 2021-08-20       | 92cb9ee5-dee6-47fe-b372-0829f2e384cd | Clu
(2 rows)
Note that in this example I am using the system date plus 2 days, so in your case you'll want to change the duration from 2d to 1d. Cassandra 4.0 supports date arithmetic, so this should work fine if you are on that version. If you are not, you'll have to do the "system date plus one" calculation on the application side.
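On versions before 4.0, the "system date plus one" step can be done in the application. Here is a minimal Python sketch with hypothetical helper names. It shows two formats: the ISO form a CQL date column accepts, and the dd-MMM-yy text form from the question. The second applies only if billing_due_date is actually stored as text, which is an assumption.

```python
from datetime import date, timedelta


def next_billing_date(today: date) -> date:
    """System date plus one day, computed client-side."""
    return today + timedelta(days=1)


def as_cql_date_literal(d: date) -> str:
    """ISO format accepted by a CQL `date` column, e.g. '2021-08-20'."""
    return d.isoformat()


def as_dd_mmm_yy(d: date) -> str:
    """Text form like '17-AUG-21' (only relevant if the column is text)."""
    # %b follows the C locale by default, giving 'Aug'; upper() makes it 'AUG'.
    return d.strftime("%d-%b-%y").upper()


if __name__ == "__main__":
    due = next_billing_date(date.today())
    # Bind this value to the billing_due_date restriction of the query.
    print(as_cql_date_literal(due), as_dd_mmm_yy(due))
```

Computing the date in one place on the client also avoids ambiguity about which time zone "today" refers to, since that choice is explicit in the application.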
Another way to go about this would be to create a secondary index on billing_due_date, but I don't recommend that path because it will query multiple nodes to build the result set.
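The query-table approach means the application writes each row twice: once to customer_info and once to the date-keyed table. Below is a minimal in-memory Python sketch of that idea, in which dictionaries stand in for the two tables and all names are hypothetical. Looking up one billing date touches a single "partition" instead of every transaction_id.

```python
import uuid
from collections import defaultdict
from datetime import date

# customer_info: PRIMARY KEY (transaction_id)
customer_info = {}
# customer_info_by_date: PRIMARY KEY (billing_due_date, transaction_id)
customer_info_by_date = defaultdict(dict)


def save_customer(name: str, billing_due_date: date) -> uuid.UUID:
    """Write the same row to both tables, as the application would."""
    transaction_id = uuid.uuid4()  # CQL uuid() also produces a random (type 4) UUID
    row = {"transaction_id": transaction_id, "name": name,
           "billing_due_date": billing_due_date}
    customer_info[transaction_id] = row
    customer_info_by_date[billing_due_date][transaction_id] = row
    return transaction_id


def customers_due_on(day: date):
    """Single-partition lookup: only the rows stored under `day` are read."""
    return list(customer_info_by_date.get(day, {}).values())
```

In a real cluster the two writes could be grouped in a logged BATCH so that both are eventually applied together.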

How can I search a table for a timestamp x hours old in Cassandra?

I am trying to search for timestamps in a Cassandra table that fall within a given length of time. For example, "all timestamps that are 4 hours or less old".
I have tried using DATESUB(), TIMEDIFF() and other ways to find a time delta, but just haven't had any luck. I am not sure if I am just looking at this from a relational DB mindset.
EDIT: Adding an example below
SELECT *
FROM events
WHERE event_timestamp < (now() - 4 hours) -- This part is giving issues
ORDER BY region DESC;
Thanks!
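CQL in Cassandra 3.x has no DATESUB() or TIMEDIFF(). One common approach is to compute the cutoff in the application and bind it in a range restriction. Below is a minimal Python sketch with hypothetical helper names. "4 hours or less old" means on or after the cutoff, so the comparison is >=, not the < used in the draft query. Also, CQL generally allows range restrictions and ORDER BY only on clustering columns within a restricted partition, so the table may need to be modeled around event_timestamp as well.

```python
from datetime import datetime, timedelta, timezone


def cutoff(now: datetime, hours: float) -> datetime:
    """Oldest timestamp that still counts as 'hours or less old'."""
    return now - timedelta(hours=hours)


def is_recent(event_timestamp: datetime, now: datetime, hours: float) -> bool:
    """True when the event is at most `hours` old and not in the future."""
    return cutoff(now, hours) <= event_timestamp <= now


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # This value would be bound to a placeholder in a query such as
    #   ... WHERE <partition key> = ? AND event_timestamp >= ?
    print(cutoff(now, 4).isoformat())
```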

how to do a query with cassandradb counter table

I have a table in Cassandra as shown below:
CREATE TABLE remaining (owner varchar,buddy varchar,remain counter,primary key(owner,buddy));
Generally I increment or decrement the REMAIN field using CQL like this:
update remaining set remain=remain + 1 where owner='userA' and buddy='userB';
update remaining set remain=remain + 1 where owner='userA' and buddy='userC';
....
Now I need to find all buddies of userA whose REMAIN field is greater than 1. When I use:
select buddy,remain from remaining where owner='userA' and remain > 0;
gives me an error:
No indexed columns present in by-columns clause with Equal operator
How do I do this the Cassandra way?
The short answer to this is that you cannot do queries with conditionals on counter columns in Cassandra.
The reason is that all Cassandra queries need to be modeled around the primary key of the table in question. Counter columns are not allowed as part of a table's primary key (their changing values would cause constant reorganization of the data on disk). Counter columns are better suited to tracking the state of a known piece of data, for example the number of times a photo has been up-voted. That count can be recalled quickly as long as we know which photo we are interested in. To actually sort photos by number of votes, you would need to run an analytics-style query using Spark or Hadoop.
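Because owner is the partition key, SELECT buddy, remain FROM remaining WHERE owner = 'userA' is a valid single-partition read. The condition on the counter can then be applied in the application. Below is a minimal Python sketch, assuming the partition is small enough to read in full. The function name is hypothetical, and the rows stand in for what the driver would return.

```python
def buddies_above(rows, threshold):
    """Buddies whose counter value is greater than threshold, highest first.

    rows: (buddy, remain) pairs read from the owner's partition.
    """
    matches = [(buddy, remain) for buddy, remain in rows if remain > threshold]
    matches.sort(key=lambda pair: pair[1], reverse=True)  # largest counters first
    return [buddy for buddy, _ in matches]


if __name__ == "__main__":
    rows = [("userB", 2), ("userC", 5), ("userD", 1)]
    print(buddies_above(rows, 1))
```

This only works well within one partition. Ranking counters across the whole table is still the analytics-style job described above.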

Cassandra Limit 10,20 clause

I am using Cassandra 1.2.3 and can execute a select query with LIMIT 10.
If I want records 10 to 20, I cannot do "LIMIT 10,20".
The query below gives me an error:
select * from page_view_counts limit 10,20
How can this be achieved?
Thanks
Nikhil
You can't skip rows like this in CQL. You have to page by specifying a starting point, e.g.
select * from page_view_counts where field >= 'x' limit 10;
to get the next 10 elements starting from x.
I wrote a full example in this answer: Cassandra pagination: How to use get_slice to query a Cassandra 1.2 database from Python using the cql library.
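The "start from a known value" paging idea can be emulated in Python over a sorted list of keys. Each page starts strictly after the last key returned, so that key is not repeated (a >= restriction would return it again). This is a sketch under assumptions: the keys are unique and come back sorted, as rows of one partition do by clustering column, and the helper names are hypothetical.

```python
from bisect import bisect_right


def fetch_page(sorted_keys, page_size, after=None):
    """Up to page_size keys strictly greater than `after` (from the start if None)."""
    start = 0 if after is None else bisect_right(sorted_keys, after)
    return sorted_keys[start:start + page_size]


def iter_pages(sorted_keys, page_size):
    """Yield successive pages, using the last key of each page as the next start."""
    last = None
    while True:
        page = fetch_page(sorted_keys, page_size, last)
        if not page:
            return
        yield page
        last = page[-1]


if __name__ == "__main__":
    keys = [f"k{i:02d}" for i in range(25)]
    for page in iter_pages(keys, 10):
        print(page[0], "...", page[-1])
```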
For that, you first have to plan your data model so that it can return records according to your requirements.
Can you tell me what sort of example you are working on?
And are you using the Hector client or another one?
Sorry, mate, I did it using the Hector client and Java, but given your requirement I can suggest planning your data model like this:
Use a time span as the row key, in yyyyMMddHH format. Within that row, store column names as a composite of UTF8Type and TimeUUID (e.g. C1+timeUUID).
Note: the first component of the composite key is the column name in the counter column family (e.g. C1).
Store only a limited number of records, say 20, under each prefix in your CF, and increment counter C1 up to 20. If a new record arrives for the same time span after that, insert it with the key C2+timeUUID and increment counter C2, again up to 20 records, and so on.
To fetch records, pass the prefix C1, C2, etc. together with a row key such as 2013061116.
This returns 20 records, then another 20, and so on. You have to implement this logic in your application. Hope this helps.
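Below is an in-memory Python emulation of this bucketing scheme, for illustration only. Dictionaries stand in for the column family and the counter column family, uuid.uuid1() stands in for TimeUUID, and the class and method names are hypothetical. This is not Hector's API.

```python
import uuid
from collections import defaultdict

BUCKET_SIZE = 20  # records per prefix, as suggested above


class TimeSpanBuckets:
    """Row key (yyyyMMddHH) -> columns named (prefix, TimeUUID).

    counters[row_key][prefix] plays the role of the counter column family.
    """

    def __init__(self):
        self.rows = defaultdict(dict)      # row_key -> {(prefix, timeuuid): value}
        self.counters = defaultdict(dict)  # row_key -> {prefix: count}

    def insert(self, row_key, value):
        counts = self.counters[row_key]
        n = len(counts)  # prefixes are created in order: C1, C2, ...
        # Reuse the newest prefix until it holds BUCKET_SIZE records, then open the next one.
        prefix = f"C{n}" if n and counts[f"C{n}"] < BUCKET_SIZE else f"C{n + 1}"
        counts[prefix] = counts.get(prefix, 0) + 1
        self.rows[row_key][(prefix, uuid.uuid1())] = value
        return prefix

    def fetch(self, row_key, prefix):
        """Values stored under one prefix, in insertion order."""
        columns = self.rows.get(row_key, {})
        return [v for (p, _), v in columns.items() if p == prefix]


if __name__ == "__main__":
    buckets = TimeSpanBuckets()
    for i in range(45):
        buckets.insert("2013061116", i)
    print(buckets.counters["2013061116"])
```

Cassandra itself would return the columns sorted by composite name (prefix, then TimeUUID). The dictionary here keeps insertion order instead, which matches that order only because the records are inserted over time.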

Cassandra super column structure

I'm new to Cassandra, and I'm not familiar with super columns.
Consider this scenario: suppose a customer entity has some fields such as
Name
Contact_no
address
We can store all these values in normal columns. When a person moves from one location to another (the location field could store the longitude and latitude), I want the successive values to be stored consecutively for that customer. I think we can do this with super columns, but I'm confused about how to design the schema to accomplish this.
Please help me to create this schema and come to understand the concepts behind super columns.
Super columns are really not recommended anymore. They are still used, but more and more people have switched to composite columns. For example, playOrm uses this concept for indexing. If I am indexing an integer, an index row may look like this:
rowkey = 10.pk56 10.pk39 11.pk50
Here the column name type is a composite of an integer and a string. These rows can hold up to about 10 million columns, though I have only run experiments up to 1 million myself. For example, playOrm uses this type of index for a query that took 60 ms on 1,000,000 rows.
With playOrm, you can build scalable relational models in NoSQL. You just need to figure out how to partition your data correctly: you can have as many partitions as you want in each table, but a partition should really not exceed 10 million rows.
Back to the example: if you have a table with columns numShares, price, username and age, you may want to index numShares, and the row above would be that index. You could then fetch the index row by key or, better yet, fetch only the column names with numShares > 20 and numShares < 50.
Once you have those columns, you can take the second half of each column name, which is the primary key. The primary key is part of the name rather than the value because, as in the example above, two rows (pk56 and pk39) share the value 10. You can't have two columns both named 10, but you can have 10.pk56 and 10.pk39.
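The index-row idea can be emulated in Python with a sorted list of (value, primary key) tuples standing in for composite column names. This is an illustrative sketch, not playOrm's implementation. The function name is hypothetical, and the sample keys other than pk56, pk39 and pk50 are made up. Slicing between two bisection points mirrors a column-slice query for numShares > 20 and numShares < 50.

```python
from bisect import bisect_left, bisect_right

# Index row for numShares: composite column names (value, primary key),
# sorted by value first and then by key, as composite names sort in Cassandra.
index_row = sorted([(10, "pk56"), (10, "pk39"), (11, "pk50"),
                    (25, "pk07"), (49, "pk12"), (50, "pk33")])


def range_lookup(index, low, high):
    """Primary keys whose indexed value v satisfies low < v < high."""
    values = [value for value, _ in index]
    start = bisect_right(values, low)  # first column with value > low
    end = bisect_left(values, high)    # first column with value >= high
    return [pk for _, pk in index[start:end]]


if __name__ == "__main__":
    print(range_lookup(index_row, 20, 50))
```

Both (10, "pk39") and (10, "pk56") coexist in the index because the primary key makes each column name unique. This is the reason the key lives in the name rather than in the value.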
later,
Dean
