How to get the current year in Cassandra

How can I get just a part of the current date in Cassandra? In my particular case I need to get just the year.
So far I have: select dateof(now()) from system.local;
But I could not find any function in the documentation to get just the year:
https://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/refCqlFunction.html#refCqlFunction__toTimestamp
I'm new with Cassandra so this maybe a silly question.

The safe way would be to return a timestamp and parse out the year client-side.
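For example, with a Python client the timestamp column comes back as a datetime, so extracting the year is trivial. A minimal sketch (the fixed timestamp below just stands in for a value fetched from a row; `year_of` is a hypothetical helper name):

```python
from datetime import datetime, timezone

def year_of(ts: datetime) -> int:
    """Pull the year out of a timestamp returned by the driver."""
    return ts.year

# Drivers deserialize Cassandra timestamps into datetime objects;
# this fixed value stands in for dateof(now()) fetched from a row.
ts = datetime(2017, 12, 20, 21, 18, 37, tzinfo=timezone.utc)
print(year_of(ts))  # 2017
```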
Natively, Cassandra does not have any functions that can help with this. However, you can write a user-defined function (UDF) to accomplish this:
First, user defined functions are disabled by default. You'll need to adjust that setting in your cassandra.yaml and restart your node(s).
enable_user_defined_functions=true
NOTE: This setting is defaulted this way for a reason. Although Cassandra 3.x has some safeguards in place to protect against malicious code, it is a good idea to leave this turned off unless you both need it and know what you are doing. And even then, you'll want to keep an eye on any UDFs that get defined.
Now I'll create my function using Java, from within cqlsh:
cassdba@cqlsh:stackoverflow> CREATE OR REPLACE FUNCTION year (input DATE)
RETURNS NULL ON NULL INPUT RETURNS TEXT
LANGUAGE java AS 'return input.toString().substring(0,4);';
Note that there are a number of ways (and types) to query the current date/time:
cassdba@cqlsh:stackoverflow> SELECT todate(now()) as date,
totimestamp(now()) as timestamp, now() as timeuuid FROM system.local;
date | timestamp | timeuuid
------------+---------------------------------+-------------------------------------
2017-12-20 | 2017-12-20 21:18:37.708000+0000 | 58167cc1-e5cb-11e7-9765-a98c427e8248
(1 rows)
To return just the year, I can call my year function on the todate(now()) column:
SELECT stackoverflow.year(todate(now())) as year FROM system.local;
year
------
2017
(1 rows)

Related

Use `make-series` operator without defining exact date range

I am using make-series to create an error dashboard showing events over a given period at a specified interval like so:
make-series dcount(id) default=0 on timestamp from ago(30d) to now() step 8h
This works great, and displays the data as expected. However this specifies an exact date range (30 days ago to now), and I would like to make this use the time range picked by the user on the dashboard (24 hours, 48 hours, etc.).
I know it is possible to get this behavior using summarize, however summarize does not easily allow for setting a default value of zero per timestamp bin (as far as I know).
Is it possible to use the make-series operator without defining a hardcoded date range, and instead use the time range set for a dashboard?
Unfortunately, this is not possible at the moment.
You can take a look at this user feedback and upvote it: Retrieve the portal time span and use it inside the kusto query.
Whilst this is not officially supported (i.e. there is no variable you can use to retrieve the values), you can work around this with a bit of a hack.
For context, I am displaying some aggregations from Azure Container Insights on my dashboards, and I wanted to use make-series instead of summarize: the latter does not return empty bins, so it leaves gaps in graphs wherever a bin has no data, whereas make-series requires explicit start/end times and a grain.
Given the nature of the above, I have access to a large table of data that is constantly updated (ContainerLog), which gives me a way to find a close approximation of the date range (and any inaccuracy is not a problem as I am reporting on the data of this table anyway).
// All tables with Timestamp or TimeGenerated columns are implicitly filtered, so we can retrieve a very close approximation of min and max here
let startDate = toscalar(ContainerLog | summarize min(TimeGenerated));
let endDate = toscalar(ContainerLog | summarize max(TimeGenerated));
// The regular query sits here, and the above variables can be passed in to make-series
MyLogFunction
| make-series Count=count() default=0 on Timestamp in range(startDate, endDate, 30m) by Severity
| render columnchart with ( legend=hidden )

How to get Last 6 Month data comparing with timestamp column using cassandra query?

How can I get the last 6 months of data by comparing a timestamp column in a Cassandra query?
I need to get all account statements from the last 3/6 months by comparing updatedTime (a timestamp column) with the current time.
For example, in SQL we would use the DateAdd() function for this; I don't know how to do the same in Cassandra.
If anyone knows, please reply. Thanks in advance.
Cassandra 2.2 and later allows users to define functions (UDFs) that can be applied to data stored in a table as part of a query result.
So if you are using Cassandra 2.2 or later, you can create your own method with a UDF:
CREATE FUNCTION monthadd(date timestamp, month int)
CALLED ON NULL INPUT
RETURNS timestamp
LANGUAGE java
AS $$
    java.util.Calendar c = java.util.Calendar.getInstance();
    c.setTime(date);
    c.add(java.util.Calendar.MONTH, month);
    return c.getTime();
$$;
This function receives two parameters:
date timestamp: the date to which you want to add or from which to subtract a number of months
month int: the number of months you want to add (+) or subtract (-)
It returns the resulting timestamp.
Here is how you can use this :
SELECT * FROM ttest WHERE id = 1 AND updated_time >= monthAdd(dateof(now()), -6) ;
Here the monthAdd function subtracts 6 months from the current timestamp, so this query will return the data of the last 6 months.
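If UDFs are disabled, or you prefer to keep the arithmetic client-side, you can compute the cutoff before binding it as a query parameter. A minimal Python sketch (`month_add` is a hypothetical helper that mirrors java.util.Calendar's day-clamping behavior, e.g. Mar 31 minus 1 month gives Feb 28):

```python
import calendar
from datetime import datetime

def month_add(date: datetime, months: int) -> datetime:
    """Add (positive) or subtract (negative) months, clamping the
    day-of-month like java.util.Calendar does."""
    total = date.month - 1 + months
    year = date.year + total // 12
    month = total % 12 + 1
    day = min(date.day, calendar.monthrange(year, month)[1])
    return date.replace(year=year, month=month, day=day)

print(month_add(datetime(2017, 7, 15), -6))  # 2017-01-15 00:00:00
```

The resulting value can then be passed as the bind parameter for `updated_time >= ?` in a prepared statement.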
Note: by default, user-defined functions are disabled in cassandra.yaml. Set enable_user_defined_functions=true to enable them, but only if you are aware of the security risks.
In Cassandra you have to build the queries upfront.
Also be aware that you will probably have to bucket the data, depending on the number of accounts you have within some period of time.
If your whole database doesn't contain more than, let's say, 100k entries, you are fine with a single generic partition, e.g. named 'all'. But usually people have a lot of data, which goes into buckets named after a month, week, or hour. This depends on the number of inserts you get.
The reason for creating buckets is that every node can find a partition by its partition key. This is the first part of the primary key definition. Then, on every node, the data is sorted by the second part of the primary key. Having the data sorted enables you to "scan" over it, i.e. you will be able to retrieve rows by giving a timestamp parameter.
Let's say you want to retrieve accounts from the last 6 months and that you are saving all the accounts from one month in the same bucket.
The schema might be something on the lines of:
create table accounts (
month text,
created_time timestamp,
account text,
PRIMARY KEY (month, created_time)
);
Usually you will do this at the application level; merging queries is an anti-pattern, but it is o.k. for a small number of queries:
select account
from accounts
where month = '201701';
You then run the same query for the other buckets ('201702', '201703', and so on) and merge the results in your application.
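Generating the month bucket names to query is easy to do client-side. A minimal Python sketch (the 'YYYYMM' bucket format matching the example above is an assumption about your naming scheme; `month_buckets` is a hypothetical helper):

```python
from datetime import datetime

def month_buckets(now: datetime, months_back: int) -> list:
    """Partition keys ('YYYYMM') covering the last months_back months,
    newest first, starting from the month of `now`."""
    buckets = []
    year, month = now.year, now.month
    for _ in range(months_back):
        buckets.append("%04d%02d" % (year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return buckets

print(month_buckets(datetime(2017, 3, 15), 6))
# ['201703', '201702', '201701', '201612', '201611', '201610']
```

Each bucket name is then used as the partition key in one query, and the per-bucket results are concatenated in the application.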
If you have something really simple with, let's say, an expected 100 000 entries, then you could use a single predefined bucket and a schema like:
create table accounts (
bucket text,
created_time timestamp,
account text,
PRIMARY KEY (bucket, created_time)
);
select account
from accounts
where bucket = 'some_predefined_name'
and created_time > '2016-10-04 00:00:00';
Once more, as a wrap-up: with Cassandra you always have to prepare the structures for the access pattern you are going to use.

cassandra lastupdated(auto_now), lastaccessed and created(auto_now_add)

Is there a way we can auto update the columns creation and last updated/accessed timestamp?
We can use toTimestamp(now()) function to store the creation time. But do we have a function like writetime(name), which is used to get the last modified time? Is there a similar function for reading the creation and accessed-time?
Is there a way I can get all the three timestamps lastupdated/lastaccessed and created timestamp auto-generated and stored?
Yes, there is a writetime function, but it only operates on non-primary key columns.
aploetz@cqlsh:stackoverflow> SELECT name,description,writetime(description)
FROM bookbyname WHERE name='Patriot Games';
name | writetime(description) | description
---------------+------------------------+------------------------------------------------------------------------------------------------
Patriot Games | 1442340092257821 | Jack Ryan saves England's next king, and becomes the target of an IRA splinter terrorism cell.
Cassandra does not keep track of last accessed/read, or anything like that.
In Cassandra the last write wins, so last updated and created are going to be the same. But if you had a column that you know had changed, and one that you know had not changed, you could get the write times of both, and then you'd have your updated and created times.
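Since writetime() returns microseconds since the Unix epoch, converting it to a readable timestamp client-side is a one-liner. A minimal Python sketch (the value below is the one from the example output above; `writetime_to_datetime` is a hypothetical helper):

```python
from datetime import datetime, timezone

def writetime_to_datetime(wt_micros: int) -> datetime:
    """writetime() values are microseconds since the Unix epoch."""
    return datetime.fromtimestamp(wt_micros / 1_000_000, tz=timezone.utc)

print(writetime_to_datetime(1442340092257821))
# 2015-09-15 18:01:32.257821+00:00
```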

Get Date Range for Cassandra - Select timeuuid with IN returning 0 rows

I'm trying to get data from a date range on Cassandra, the table is like this:
CREATE TABLE test6 (
time timeuuid,
id text,
checked boolean,
email text,
name text,
PRIMARY KEY ((time), id)
)
But when I select a data range I get nothing:
SELECT * FROM test6 WHERE time IN ( minTimeuuid('2013-01-01 00:05+0000'), now() );
(0 rows)
How can I get a date range from a Cassandra Query?
The IN condition is used to specify multiple keys for a SELECT query. To run a date range query on your table you're close, but you'll want to use greater-than and less-than instead.
Of course, you can't run a greater-than/less-than query on a partition key, so you'll need to flip your keys for this to work. This also means that you'll need to specify your id in the WHERE clause as well:
CREATE TABLE teste6 (
time timeuuid,
id text,
checked boolean,
email text,
name text,
PRIMARY KEY ((id), time)
)
INSERT INTO teste6 (time,id,checked,email,name)
VALUES (now(),'B26354',true,'rdeckard@lapd.gov','Rick Deckard');
SELECT * FROM teste6
WHERE id='B26354'
AND time >= minTimeuuid('2013-01-01 00:05+0000')
AND time <= now();
id | time | checked | email | name
--------+--------------------------------------+---------+-------------------+--------------
B26354 | bf0711f0-b87a-11e4-9dbe-21b264d4c94d | True | rdeckard@lapd.gov | Rick Deckard
(1 rows)
Now while this will technically work, partitioning your data by id might not work for your application. So you may need to put some more thought behind your data model and come up with a better partition key.
Edit:
Remember with Cassandra, the idea is to get a handle on what kind of queries you need to be able to fulfill. Then build your data model around that. Your original table structure might work well for a relational database, but in Cassandra that type of model actually makes it difficult to query your data in the way that you're asking.
Take a look at the modifications that I have made to your table (basically, I just reversed your partition and clustering keys). If you still need help, Patrick McFadin (DataStax's Chief Evangelist) wrote a really good article called Getting Started with Time Series Data Modeling. He has three examples that are similar to yours. In fact his first one is very similar to what I have suggested for you here.
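If you later need the wall-clock time back out of a timeuuid on the client, it can be recovered from the UUID's 60-bit timestamp field. A Python sketch (the constant is the standard offset, in 100 ns units, between the UUID epoch of 1582-10-15 and the Unix epoch; `timeuuid_to_datetime` is a hypothetical helper):

```python
import uuid
from datetime import datetime, timezone

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the embedded timestamp from a version-1 (time-based) UUID."""
    seconds = (u.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(timeuuid_to_datetime(uuid.uuid1()))  # roughly "now", in UTC
```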

CQL: Search a table in cassandra using '<' on a indexed column

My cassandra data model:
CREATE TABLE last_activity_tracker ( id uuid, recent_activity_time timestamp, PRIMARY KEY(id));
CREATE INDEX activity_idx ON last_activity_tracker (recent_activity_time) ;
The idea is to keep track of 'id's and their most recent activity of an event.
I need to find the 'id's whose last activity was a year ago.
So, I tried:
SELECT * from last_activity_tracker WHERE recent_activity_time < '2013-12-31' allow filtering;
I understand that I cannot use anything other than '=' on secondary indexed columns.
However, I cannot add 'recent_activity_time' to the key as I need to update this column with the most recent activity time of an event if any.
Any ideas in solving my problem are highly appreciated.
I can see an issue with your query. You're not hitting a partition. As such, the performance of your query will be quite bad. It'll need to query across your whole cluster (assuming you took measures to make this work).
If you're looking to query the last activity time for an id, think about storing it in a more query friendly format. You might try this:
create table tracker (dummy int, day timestamp, id uuid, primary key(dummy, day, id));
You can then insert with day set to the start of the date (midnight, ignoring the time), and dummy = 0.
That should enable you to do:
select * from tracker where dummy=0 and day > '2013-12-31';
You can set a ttl on insert so that old entries expire (maybe after a year in this case). The idea is that you're storing information in a way that suits your query.
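The "epoch for the date (ignoring the time)" part is just truncating the timestamp to midnight before inserting. A minimal Python sketch of that truncation (`day_bucket` is a hypothetical helper name):

```python
from datetime import datetime, timezone

def day_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to midnight, suitable for the 'day' column."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

print(day_bucket(datetime(2014, 1, 5, 13, 22, 7, tzinfo=timezone.utc)))
# 2014-01-05 00:00:00+00:00
```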
