Inserting date using yesterday/tomorrow into Cassandra - cassandra

I'm trying to insert a date into Cassandra based on the current date.
create table mobileTimeSeries (
deviceid text,
date date,
PRIMARY KEY(deviceid, date));
insert into mobileTimeSeries (deviceid, date) values ('test', toDate(now()));
That works, but I'm wondering if it's possible to do something like
insert into mobileTimeSeries (deviceid, date) values ('test', toDate(now()-1));
insert into mobileTimeSeries (deviceid, date) values ('test', toDate(now()+1));
I just get this error: mismatched input '+' expecting ')' (... 'tablet',toDate(now()) [+]...)
Not sure if this is possible at all. Thanks

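You can calculate the date in your application and insert it as a date literal instead of using now().
Since CASSANDRA-11936 (Cassandra 4.0+), CQL can add a duration to, or subtract one from, a date or timestamp, so expressions along the lines of now() - 1d become possible.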
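As a minimal sketch of the application-side approach, assuming a Python client: the helper name shifted_date is made up for illustration, and the UTC calendar date is used as the base. The resulting 'yyyy-mm-dd' string can be used as the date value in the insert.

```python
from datetime import date, datetime, timedelta, timezone


def shifted_date(days, today=None):
    """Return the ISO date string `days` days away from `today`.

    `today` defaults to the current UTC date (an assumption of this sketch).
    """
    base = today if today is not None else datetime.now(timezone.utc).date()
    return (base + timedelta(days=days)).isoformat()


yesterday = shifted_date(-1)
tomorrow = shifted_date(+1)

# Print statements that can be pasted into cqlsh; a driver would normally
# bind the value as a parameter instead of formatting it into the string.
for day in (yesterday, tomorrow):
    print(f"insert into mobileTimeSeries (deviceid, date) values ('test', '{day}');")
```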

Related

Retrieve rows from last 24 hours

I have a table with the following (with other fields removed)
CREATE TABLE if NOT EXISTS request_audit (
user_id text,
request_body text,
lookup_timestamp TIMESTAMP,
PRIMARY KEY ((user_id), lookup_timestamp)
) WITH CLUSTERING ORDER BY ( lookup_timestamp DESC);
I create a record with the following
INSERT INTO request_audit (user_id, lookup_timestamp, request_body) VALUES (?, toTimestamp(now()), ?)
I am trying to retrieve all rows within the last 24 hours, but I am having trouble with the timestamp,
I have tried
SELECT * from request_audit WHERE user_id = '1234' AND lookup_timestamp > toTimestamp(now() - "1 day" )
and various other ways of trying to take a day away from the query.
Cassandra has very limited support for date operations. What you need is a custom function to do the date arithmetic.
This is inspired by the question "How to get Last 6 Month data comparing with timestamp column using cassandra query?"
You can write a UDF (user-defined function) to perform the date operation.
CREATE FUNCTION dateAdd(date timestamp, day int)
CALLED ON NULL INPUT
RETURNS timestamp
LANGUAGE java
AS
$$java.util.Calendar c = java.util.Calendar.getInstance();
c.setTime(date);
c.add(java.util.Calendar.DAY_OF_MONTH, day);
return c.getTime();$$ ;
Remember that you have to enable UDFs in the configuration file, cassandra.yaml, assuming you are able to change it:
enable_user_defined_functions: true
Once that is done, this query works:
SELECT * from request_audit WHERE user_id = '1234' AND lookup_timestamp > dateAdd(dateof(now()), -1)
You can't do it directly in CQL, as it doesn't support this kind of expression. If you're running the query from cqlsh, you can generate the desired date with something like this and substitute it into the query:
date --date='-1 day' '+%F %T%z'
and then execute the query.
If you're invoking this from your program, use the corresponding date/time library to get the date one day back; the details depend on the language you're using.
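For example, a rough Python equivalent of the date command above might look like this. The function name one_day_before is hypothetical, and UTC is assumed as the reference clock.

```python
from datetime import datetime, timedelta, timezone


def one_day_before(now=None):
    """Format the instant 24 hours before `now` like `date '+%F %T%z'` does."""
    now = now if now is not None else datetime.now(timezone.utc)
    return (now - timedelta(days=1)).strftime("%Y-%m-%d %H:%M:%S%z")


cutoff = one_day_before()
# The string can be pasted into the WHERE clause as a timestamp literal.
print(
    "SELECT * from request_audit WHERE user_id = '1234' "
    f"AND lookup_timestamp > '{cutoff}';"
)
```

With a driver, passing the datetime object itself as a bound parameter is usually preferable to formatting it into the query text.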

Storing time specific data in cassandra

I am looking for a good way to store time specific data in cassandra.
Each entry can look like (start_time, value). Later, I would like to retrieve the current value.
Logic of retrieving current value is like following.
Find all rows with start_time<=current_time.
Then find the value with maximum start_time from the rows obtained in the first step.
PS:- Edited the question to make it more clear
The exact requirements cannot be met, but we can get close with one more column.
First, to be able to use the <= operator, your start_time column needs to be a clustering column of your table.
Then you need a different partition key. You could choose a fixed value, but that causes problems once the partition holds too many rows. It is better to use something like the year or the month of the start_time.
CREATE TABLE time_specific_table (
year bigint,
start_time timestamp,
value text,
PRIMARY KEY((year), start_time)
) WITH CLUSTERING ORDER BY (start_time DESC);
The drawback is that when you query the table, you need to know the value of the partition key.
Find all rows with start_time <= current_time:
SELECT * FROM time_specific_table
WHERE year = :year AND start_time <= :time;
Select the value with the maximum start_time (the first row, because of the descending clustering order):
SELECT * FROM time_specific_table
WHERE year = :year LIMIT 1;
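Because the partition key has to be supplied by the client, the lookup can be sketched in Python roughly as follows. This is only an illustration under assumptions: fetch_latest is a hypothetical callable standing in for a query such as "WHERE year = :year AND start_time <= :time LIMIT 1", and oldest_year is an assumed lower bound on the data. Older partitions are tried when the current year has no matching row yet.

```python
from datetime import datetime, timezone


def partitions_to_try(now, oldest_year):
    """Years to query, newest first, down to and including oldest_year."""
    return list(range(now.year, oldest_year - 1, -1))


def current_value(fetch_latest, now, oldest_year):
    """Return the first row found, scanning year partitions newest first.

    fetch_latest(year, now) is a hypothetical stand-in for the CQL query;
    it returns a row or None.
    """
    for year in partitions_to_try(now, oldest_year):
        row = fetch_latest(year, now)
        if row is not None:
            return row
    return None


# Usage with an in-memory stand-in for the table: one row per year.
fake_table = {2015: ("2015-11-20", "old value")}
now = datetime(2016, 1, 2, tzinfo=timezone.utc)
print(current_value(lambda year, _now: fake_table.get(year), now, 2014))
```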
Create two separate tables like this:
CREATE TABLE data (
start_time timestamp,
value int,
PRIMARY KEY(start_time, value)
);
CREATE TABLE current_value (
partition int PRIMARY KEY,
value int
);
Now you have to insert data into both tables. When inserting into the second table, use a static partition value such as 1:
INSERT INTO current_value(partition, value) VALUES(1, 10);
In the current_value table each insert overwrites the previous row (an upsert), so a select always returns the most recently written value.

Is it possible to insert ddmmyyhh to text column based on now() value of timeuuid column

I'm referring to one of the presentation slide from eBay - http://www.slideshare.net/jaykumarpatel/cassandra-data-modeling-best-practices
I want to try out the same thing. Hence, I create the following table.
CREATE TABLE ebay_event (
date text,
eventtype text,
time timeuuid,
payload text,
PRIMARY KEY((date, eventtype), time));
Then, in my PHP script, I will perform insert using the following insert statement.
insert into ebay_event(date, eventtype, time, payload) values('03031611', 'view', now(), 'additional data');
Instead of hard-coding the value '03031611', is there a way to tell Cassandra to generate ddmmyyhh based on the now() value of the timeuuid column?
No. There are no such functions available in cassandra. You will have to create it in the language you are using.
Values for the timestamp type are encoded as 64-bit signed integers
representing a number of milliseconds since the standard base time
known as the epoch: January 1 1970 at 00:00:00 GMT.
There are some functions available that can produce a date in YYYY-mm-dd format; see "Date from timeuuid".
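A minimal sketch of building the ddmmyyhh bucket on the client side is shown below in Python; the question's PHP script would do the equivalent with its own date functions. The name hour_bucket is made up for illustration, and UTC is assumed.

```python
from datetime import datetime, timezone


def hour_bucket(moment=None):
    """Return the ddmmyyhh string for `moment` (default: current UTC time)."""
    moment = moment if moment is not None else datetime.now(timezone.utc)
    return moment.strftime("%d%m%y%H")


bucket = hour_bucket()
print(
    "insert into ebay_event(date, eventtype, time, payload) "
    f"values('{bucket}', 'view', now(), 'additional data');"
)
```

Note that the bucket comes from the client clock while now() is evaluated by Cassandra, so the two can disagree around the turn of an hour.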

How to get current timestamp with CQL while using Command Line?

I am trying to insert into my CQL table from the command line. I am able to insert everything. But I am wondering if I have a timestamp column, then how can I insert into timestamp column from the command line? Basically, I want to insert current timestamp whenever I am inserting into my CQL table -
Currently, I am hardcoding the timestamp whenever I am inserting into my below CQL table -
CREATE TABLE TEST (ID TEXT, NAME TEXT, VALUE TEXT, LAST_MODIFIED_DATE TIMESTAMP, PRIMARY KEY (ID));
INSERT INTO TEST (ID, NAME, VALUE, LAST_MODIFIED_DATE) VALUES ('1', 'elephant', 'SOME_VALUE', 1382655211694);
Is there any way to get the current timestamp using some predefined functions in CQL so that while inserting into above table, I can use that method to get the current timestamp and then insert into above table?
You can use the timeuuid functions now() and dateof() (or in later versions of Cassandra, toTimestamp()), e.g.,
INSERT INTO TEST (ID, NAME, VALUE, LAST_MODIFIED_DATE)
VALUES ('2', 'elephant', 'SOME_VALUE', dateof(now()));
The now function takes no arguments and generates a new unique timeuuid (at the time where the statement using it is executed). The dateOf function takes a timeuuid argument and extracts the embedded timestamp. (Taken from the CQL documentation on timeuuid functions).
Cassandra >= 2.2.0-rc2
dateof() was deprecated in Cassandra 2.2.0-rc2. For later versions you should replace its use with toTimestamp(), as follows:
INSERT INTO TEST (ID, NAME, VALUE, LAST_MODIFIED_DATE)
VALUES ('2', 'elephant', 'SOME_VALUE', toTimestamp(now()));
In newer versions of Cassandra you can use toTimestamp(now()); note that the function dateof is deprecated.
For example:
insert into dummy(id, name, size, create_date) values (1, 'Eric', 12, toTimestamp(now()));
There are actually two different ways, for different purposes, to insert the current timestamp. From the docs:
Inserting the current timestamp
Use functions to insert the current date into date or timestamp fields as follows:
Current date and time into timestamp field: toTimestamp(now()) sets the timestamp to the current time of the coordinator.
Current date (midnight) into timestamp field: toTimestamp(toDate(now())) sets the timestamp to the current date beginning of day (midnight).
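If the value is computed on the client instead, as with the hard-coded 1382655211694 in the question, a sketch like the following (Python, with the hypothetical helper name to_epoch_millis) produces the millisecond count that a timestamp column accepts as an integer literal.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Whole milliseconds between the Unix epoch and a timezone-aware `moment`."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - UNIX_EPOCH) // timedelta(milliseconds=1)


now_ms = to_epoch_millis(datetime.now(timezone.utc))
print(
    "INSERT INTO TEST (ID, NAME, VALUE, LAST_MODIFIED_DATE) "
    f"VALUES ('1', 'elephant', 'SOME_VALUE', {now_ms});"
)
```

Unlike toTimestamp(now()), this uses the client's clock rather than the coordinator's.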

Selecting timeuuid columns corresponding to a specific date

Short version: Is it possible to query for all timeuuid columns corresponding to a particular date?
More details:
I have a table defined as follows:
CREATE TABLE timetest(
key uuid,
activation_time timeuuid,
value text,
PRIMARY KEY(key,activation_time)
);
I have populated this with a single row, as follows (f0532ef0-2a15-11e3-b292-51843b245f21 is a timeuuid corresponding to the date 2013-09-30 22:19:06+0100):
insert into timetest (key, activation_time, value) VALUES (7daecb80-29b0-11e3-92ec-e291eb9d325e, f0532ef0-2a15-11e3-b292-51843b245f21, 'some value');
And I can query for that row as follows:
select activation_time,dateof(activation_time) from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e
which results in the following (using cqlsh)
activation_time | dateof(activation_time)
--------------------------------------+--------------------------
f0532ef0-2a15-11e3-b292-51843b245f21 | 2013-09-30 22:19:06+0100
Now let's assume there's a lot of data in my table and I want to retrieve all rows where activation_time corresponds to a particular date, say 2013-09-30 22:19:06+0100.
I would have expected to be able to query for the range of all timeuuids between minTimeuuid('2013-09-30 22:19:06+0100') and maxTimeuuid('2013-09-30 22:19:06+0100') but this doesn't seem possible (the following query returns zero rows):
select * from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e and activation_time>minTimeuuid('2013-09-30 22:19:06+0100') and activation_time<=maxTimeuuid('2013-09-30 22:19:06+0100');
It seems I need to use a hack whereby I increment the second date in my query (by a second) to catch the row(s), i.e.,
select * from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e and activation_time>minTimeuuid('2013-09-30 22:19:06+0100') and activation_time<=maxTimeuuid('2013-09-30 22:19:07+0100');
This feels wrong. Am I missing something? Is there a cleaner way to do this?
The CQL documentation discusses timeuuid functions but it's pretty short on gte/lte expressions with timeuuids, beyond:
The min/maxTimeuuid example selects all rows where the timeuuid column, t, is strictly later than 2013-01-01 00:05+0000 but strictly earlier than 2013-02-02 10:00+0000. The t >= maxTimeuuid('2013-01-01 00:05+0000') does not select a timeuuid generated exactly at 2013-01-01 00:05+0000 and is essentially equivalent to t > maxTimeuuid('2013-01-01 00:05+0000').
p.s. the following query also returns zero rows:
select * from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e and activation_time<=maxTimeuuid('2013-09-30 22:19:06+0100');
and the following query returns the row(s):
select * from timetest where key=7daecb80-29b0-11e3-92ec-e291eb9d325e and activation_time>minTimeuuid('2013-09-30 22:19:06+0100');
I'm sure the problem is that cqlsh does not display the milliseconds of your timestamps.
So the real timestamp is something like '2013-09-30 22:19:06.123+0100'.
When you call maxTimeuuid('2013-09-30 22:19:06+0100'), the milliseconds are missing, so zero is assumed, and the call is the same as maxTimeuuid('2013-09-30 22:19:06.000+0100').
And since 22:19:06.123 > 22:19:06.000, the record is filtered out.
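One way to check this outside cqlsh is to decode the timestamp embedded in the timeuuid. The sketch below uses Python's standard uuid module; the helper name timeuuid_to_datetime is made up for illustration. A version 1 UUID stores a count of 100-nanosecond intervals since 1582-10-15.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by version 1 UUIDs.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(value):
    """Return the UTC datetime embedded in a version 1 UUID string."""
    ticks = uuid.UUID(value).time  # 100 ns intervals since 1582-10-15
    return GREGORIAN_EPOCH + timedelta(microseconds=ticks // 10)


stamp = timeuuid_to_datetime("f0532ef0-2a15-11e3-b292-51843b245f21")
print(stamp.isoformat())

# The whole second that maxTimeuuid('2013-09-30 22:19:06+0100') refers to.
whole_second = datetime(2013, 9, 30, 21, 19, 6, tzinfo=timezone.utc)
print(stamp > whole_second)
```

For the timeuuid in the question this yields a time 591 milliseconds past 22:19:06+0100, which is why the upper bound at exactly 22:19:06 excludes the row.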
Not directly related to the question, but as an addition to @dimas's answer:
cqlsh (version 5.0.1) seems to show the milliseconds now:
system.dateof(id)
---------------------------------
2016-06-03 02:42:09.990000+0000
2016-05-28 17:07:30.244000+0000
