I am performing a CQL query on a column that stores its values as UNIX timestamps, but I want the results to be output as datetimes. Is there a way to do this?
That is, something like the following:
select convertToDateTime(column) from table;
I'm trying to remember if there's an easier, more direct route. But if you have a table with a UNIX timestamp and want to show it in a datetime format, you can combine the dateOf and min/maxTimeuuid functions together, like this:
aploetz#cqlsh:stackoverflow2> SELECT datetimetext,unixtime,dateof(mintimeuuid(unixtime)) FROM unixtime;
datetimetext | unixtime | dateof(mintimeuuid(unixtime))
----------------+---------------+-------------------------------
2015-07-08 | 1436380283051 | 2015-07-08 13:31:23-0500
(1 rows)
aploetz#cqlsh:stackoverflow2> SELECT datetimetext,unixtime,dateof(maxtimeuuid(unixtime)) FROM unixtime;
datetimetext | unixtime | dateof(maxtimeuuid(unixtime))
----------------+---------------+-------------------------------
2015-07-08 | 1436380283051 | 2015-07-08 13:31:23-0500
(1 rows)
Note that a timeuuid stores greater precision than either a UNIX timestamp or a datetime, so you first need to convert the UNIX timestamp to a timeuuid using either the mintimeuuid or the maxtimeuuid function. You can then use dateof to convert that timeuuid to a datetime timestamp.
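If the conversion can happen on the client instead of in CQL, the same result can be computed directly from the millisecond value. This is only a minimal Python sketch of that alternative, not something from the answer above; the function name unix_ms_to_datetime is made up for illustration, and the result is shown in UTC rather than the -0500 offset cqlsh used.

```python
from datetime import datetime, timedelta, timezone


def unix_ms_to_datetime(unix_ms: int) -> datetime:
    """Convert milliseconds since the UNIX epoch to a timezone-aware UTC datetime."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(unix_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# The unixtime value from the table above.
print(unix_ms_to_datetime(1436380283051).isoformat())
# 2015-07-08T18:31:23.051000+00:00  (18:31:23 UTC is 13:31:23 at -0500)
```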
I need to convert the timestamp '1998/02/12 00:00:00' to the date '1998-02-12' using a Cassandra query. Can anyone help me with this?
Is it possible or not?
You can use the toDate function in CQL to get the date out of a datetime.
For example, if your table entry looks like:
id | datetime | value
-------------+---------------------------------+-------
22170825421 | 2018-02-15 14:06:01.000000+0000 | 50
You can run the following query:
select id, datetime, toDate(datetime) as day, value from datatable;
and it will give you:
id | datetime | day | value
-------------+---------------------------------+------------+-------
22170825421 | 2018-02-15 14:06:01.000000+0000 | 2018-02-15 | 50
You can't do it directly in Cassandra, because it accepts dates as YYYY-mm-dd, so you need to use some other method (depending on the language that you're using) to convert your string into this format.
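As one illustration of doing that conversion in client code, here is a small Python sketch. It assumes the input always uses the slash-separated layout from the question; the helper name to_cql_date is hypothetical.

```python
from datetime import datetime


def to_cql_date(text: str) -> str:
    """Reformat 'YYYY/MM/DD HH:MM:SS' as the 'YYYY-MM-DD' form that CQL accepts."""
    parsed = datetime.strptime(text, "%Y/%m/%d %H:%M:%S")
    return parsed.date().isoformat()


print(to_cql_date("1998/02/12 00:00:00"))  # 1998-02-12
```

strptime raises ValueError on input that does not match the layout, which is usually preferable to silently sending a malformed date to the database.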
I have the following table.
CREATE TABLE experiment(
id uuid,
country text,
data text,
insert_timestamp timestamp,
PRIMARY KEY(insert_timestamp));
I insert data via
INSERT INTO experiment(id, country, data, insert_timestamp) VALUES (uuid(), 'my', 'the data', dateof(now()));
When I
SELECT * from experiment;
I get
insert_timestamp | country | data | id
--------------------------+---------+----------+--------------------------------------
2016-03-03 03:04:36+0000 | my | the data | e08cddd2-b93d-4e39-b0f3-82b813f83a87
But, if I SELECT via insert_timestamp
SELECT * from experiment WHERE insert_timestamp = '2016-03-03 03:04:36+0000';
I get empty result.
insert_timestamp | country | data | id
------------------+---------+------+----
(0 rows)
Any idea why it is so?
A timestamp. Strings constant are allow to input timestamps as dates,
see Working with dates below for more information. Datestamps with
format YYYY-MM-DD HH:MM:SS.SSS are returned.
So when you query the data using 2016-03-03 03:04:36+0000, it is interpreted as 2016-03-03 03:04:36.000+0000, which may not match the stored value: dateof(now()) keeps the milliseconds, even though cqlsh only displays whole seconds.
Hence it is returning 0 rows.
Note: The date format visible in cql shell is configured in cqlshrc file's UI section.
Also, the dateOf function is deprecated. And based on your data model, if multiple threads write data at the same time, rows with the same insert_timestamp will overwrite each other.
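To make the mismatch concrete, the sketch below parses the literal from the query into epoch milliseconds and compares it with a stored value. It is only an illustration in Python: literal_to_epoch_ms is a made-up helper, and the stored value (123 ms past the second) is a hypothetical stand-in for whatever dateof(now()) actually produced.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def literal_to_epoch_ms(text: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS+ZZZZ' into whole milliseconds since the epoch."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")
    # timedelta // timedelta yields an exact integer, with no float rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


literal_ms = literal_to_epoch_ms("2016-03-03 03:04:36+0000")
stored_ms = literal_ms + 123  # hypothetical: the row was written 123 ms into that second

print(literal_ms)               # 1456974276000
print(literal_ms == stored_ms)  # False, so an equality match finds nothing
```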
I'm new to Cassandra and I'm trying to define a data model that fits my requirements.
I have a sensor that collects one value every millisecond and I have to store those data in Cassandra. The queries that I want to perform are:
1) Give me all the sensor values from - to these timestamp values
2) Tell me when this range of values was recorded
I'm not sure whether a common schema exists that can satisfy both queries, because I want to perform range queries on both values. For the first query I should use something like:
CREATE TABLE foo (
value text,
timestamp timestamp,
PRIMARY KEY (value, timestamp));
but then for the second query I need the opposite since I can't do range queries on the partition key without using a token that restricts the timestamp:
CREATE TABLE foo (
value text,
timestamp timestamp,
PRIMARY KEY (timestamp, value));
So do I need two tables for this? Or is there another way?
Thanks
PS: I need to be as fast as possible while reading
I have a sensor that collects one value every millisecond and I have to store those data in Cassandra.
The main problem I see here is that you're going to run into Cassandra's limit of 2 billion cells per partition fairly quickly. DataStax's Patrick McFadin has a good example for weather station data (Getting Started with Time Series Data Modeling) that seems to fit here. If I apply it to your model, it looks something like this:
CREATE TABLE fooByTime (
sensor_id text,
day text,
timestamp timestamp,
value text,
PRIMARY KEY ((sensor_id,day),timestamp)
);
This will partition on both sensor_id and day, while sorting rows within the partition by timestamp. So you could query like:
> SELECT * FROM fooByTime WHERE sensor_id='5' AND day='20151002'
AND timestamp > '2015-10-02 00:00:00' AND timestamp < '2015-10-02 19:00:00';
sensor_id | day | timestamp | value
-----------+----------+--------------------------+-------
5 | 20151002 | 2015-10-02 13:39:22-0500 | 24
5 | 20151002 | 2015-10-02 13:49:22-0500 | 23
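The day column has to be computed by whoever writes the row. The following Python sketch shows one way to derive it, together with the rough arithmetic behind the partition-size concern. It assumes the bucket is taken in UTC (a design choice not stated in the answer), and day_bucket is a hypothetical helper name.

```python
from datetime import datetime, timezone

# One reading per millisecond, as described in the question.
READINGS_PER_DAY = 24 * 60 * 60 * 1000
CELL_LIMIT = 2_000_000_000  # the per-partition limit mentioned above


def day_bucket(unix_ms: int) -> str:
    """Return the 'YYYYMMDD' bucket (in UTC) for a millisecond timestamp."""
    return datetime.fromtimestamp(unix_ms // 1000, tz=timezone.utc).strftime("%Y%m%d")


# 2015-10-02 13:39:22-0500 from the sample output, as epoch milliseconds.
print(day_bucket(1443811162000))  # 20151002

# Days until a single unbucketed partition would reach the limit at this rate.
print(CELL_LIMIT / READINGS_PER_DAY)
```

With a per-day bucket, each partition holds at most 86,400,000 readings, which stays well below the limit; without it, a single sensor would reach the limit in a little over three weeks.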
And yes, the way to model in Cassandra, is to have one table for each query pattern. So your second table where you want to range query on value might look something like this:
CREATE TABLE fooByValues (
sensor_id text,
day text,
timestamp timestamp,
value text,
PRIMARY KEY ((sensor_id,day),value)
);
And that would support queries like:
> SELECT * FROM foobyvalues WHERE sensor_id='5'
AND day='20151002' AND value > '20' AND value < '25';
sensor_id | day | value | timestamp
-----------+----------+-------+--------------------------
5 | 20151002 | 22 | 2015-10-02 14:49:22-0500
5 | 20151002 | 23 | 2015-10-02 13:49:22-0500
5 | 20151002 | 24 | 2015-10-02 13:39:22-0500
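One caveat with this second table: value is declared as text, so the range comparison is a string comparison rather than a numeric one. The Python sketch below imitates that ordering with plain string comparison, on the assumption that text clustering columns sort byte-wise; the sample values and the zero_pad helper are illustrative only.

```python
def in_text_range(value: str, low: str, high: str) -> bool:
    """Mimic `value > low AND value < high` on a text column (string ordering)."""
    return low < value < high


# Two-digit readings behave as expected...
print(in_text_range("23", "20", "25"))   # True
# ...but a three-digit reading slips into the range, since "210" sorts between "20" and "25".
print(in_text_range("210", "20", "25"))  # True


def zero_pad(value: int, width: int = 6) -> str:
    """Fixed-width encoding so that string order matches numeric order."""
    return str(value).zfill(width)


print(in_text_range(zero_pad(210), zero_pad(20), zero_pad(25)))  # False
```

If the readings are numeric, declaring value with a numeric type such as int avoids the problem altogether.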
I have a table like this:
CREATE TABLE mytable (
user_id int,
device_id ascii,
record_time timestamp,
timestamp timeuuid,
info_1 text,
info_2 int,
PRIMARY KEY (user_id, device_id, record_time, timestamp)
);
When I ask Cassandra to delete a record (an entry in the columnfamily) like this:
DELETE from mytable where user_id = X and device_id = Y and record_time = Z and timestamp = XX;
it returns without an error, but when I query again the record is still there. Now if I try to delete a whole row like this:
DELETE from mytable where user_id = X;
It works and removes the whole row, and querying again immediately doesn't return any more data from that row.
What am I doing wrong? How can you remove a single record in Cassandra?
Thanks
Ok, here is my theory as to what is going on. You have to be careful with timestamps, because they will store data down to the millisecond. But, they will only display data to the second. Take this sample table for example:
aploetz#cqlsh:stackoverflow> SELECT id, datetime FROM data;
id | datetime
--------+--------------------------
B25881 | 2015-02-16 12:00:03-0600
B26354 | 2015-02-16 12:00:03-0600
(2 rows)
The datetimes (of type timestamp) are equal, right? Nope:
aploetz#cqlsh:stackoverflow> SELECT id, blobAsBigint(timestampAsBlob(datetime)),
datetime FROM data;
id | blobAsBigint(timestampAsBlob(datetime)) | datetime
--------+-----------------------------------------+--------------------------
B25881 | 1424109603000 | 2015-02-16 12:00:03-0600
B26354 | 1424109603234 | 2015-02-16 12:00:03-0600
(2 rows)
As you are finding out, this becomes problematic when you use timestamps as part of your PRIMARY KEY. It is possible that your timestamp is storing more precision than it is showing you. If so, you will need to provide that hidden precision in order to delete that single row.
Anyway, you have a couple of options here. One, find a way to ensure that you are not entering more precision than necessary into your record_time. Or, you could define record_time as a timeuuid.
Again, it's a theory. I could be totally wrong, but I have seen people do this a few times. Usually it happens when they insert timestamp data using dateof(now()) like this:
INSERT INTO table (key, time, data) VALUES (1,dateof(now()),'blah blah');
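For the first option (not storing more precision than needed), one approach is to truncate the value in the client before binding it to the insert. This is a minimal Python sketch under that assumption; truncate_to_second is a hypothetical helper, and generating the time on the client instead of with now() is a change from the statement above.

```python
import time


def truncate_to_second(unix_ms: int) -> int:
    """Drop the millisecond part so the stored value matches what cqlsh displays."""
    return unix_ms - unix_ms % 1000


# Using the value from the sample table above:
print(truncate_to_second(1424109603234))  # 1424109603000

# Typical use: take the current time once, truncate it, and bind it to the INSERT.
record_time_ms = truncate_to_second(int(time.time() * 1000))
```

The trade-off is that two writes within the same second for the same key now collide, so this only suits data where second-level resolution is genuinely enough.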
CREATE TABLE worker_login_table (
worker_id text,
logged_in_time timestamp,
PRIMARY KEY (worker_id, logged_in_time)
);
INSERT INTO worker_login_table (worker_id, logged_in_time)
VALUES ('worker_1',toTimestamp(now()));
After one hour, the insert statement above was executed once more:
select * from worker_login_table;
worker_id| logged_in_time
----------+--------------------------
worker_1 | 2019-10-23 12:00:03+0000
worker_1 | 2019-10-23 13:00:03+0000
(2 rows)
Query the table to get absolute timestamp
select worker_id, blobAsBigint(timestampAsBlob(logged_in_time )), logged_in_time from worker_login_table;
worker_id | blobAsBigint(timestampAsBlob(logged_in_time)) | logged_in_time
--------+-----------------------------------------+--------------------------
worker_1 | 1524109603000 | 2019-10-23 12:00:03+0000
worker_1 | 1524209403234 | 2019-10-23 13:00:03+0000
(2 rows)
The command below will not delete the second entry, because it omits the milliseconds and the precise value of the timestamp is required to match the row:
DELETE from worker_login_table where worker_id='worker_1' and logged_in_time ='2019-10-23 13:00:03+0000';
By using the exact value obtained from the blob conversion, we can delete the entry from Cassandra:
DELETE from worker_login_table where worker_id='worker_1' and logged_in_time ='1524209403234';
I'm inserting into a Cassandra table with timestamp columns. The data I have comes with microsecond precision, so the time data string looks like this:
2015-02-16T18:00:03.234+00:00
However, when I run a select query in cqlsh, the microsecond data is not shown; I can only see the time down to second precision. The 234 microseconds are missing.
I guess I have two questions:
1) Does Cassandra capture microseconds with timestamp data type? My guess is yes?
2) How can I see that with cqlsh to verify?
Table definition:
create table data (
datetime timestamp,
id text,
type text,
data text,
primary key (id, type, datetime)
)
with compaction = {'class' : 'DateTieredCompactionStrategy'};
Insert query run with a Java PreparedStatement:
insert into data (datetime, id, type, data) values(?, ?, ?, ?);
Select query was simply:
select * from data;
In an effort to answer your questions, I did a little digging on this one.
Does Cassandra capture microseconds with timestamp data type?
Microseconds no, milliseconds yes. If I create your table, insert a row, and try to query it by the truncated time, it doesn't work:
aploetz#cqlsh:stackoverflow> INSERT INTO data (datetime, id, type, data)
VALUES ('2015-02-16T18:00:03.234+00:00','B26354','Blade Runner','Deckard- Filed and monitored.');
aploetz#cqlsh:stackoverflow> SELECT * FROM data
WHERE id='B26354' AND type='Blade Runner' AND datetime='2015-02-16 12:00:03-0600';
id | type | datetime | data
----+------+----------+------
(0 rows)
But when I query for the same id and type values while specifying milliseconds:
aploetz#cqlsh:stackoverflow> SELECT * FROM data
WHERE id='B26354' AND type='Blade Runner' AND datetime='2015-02-16 12:00:03.234-0600';
id | type | datetime | data
--------+--------------+--------------------------+-------------------------------
B26354 | Blade Runner | 2015-02-16 12:00:03-0600 | Deckard- Filed and monitored.
(1 rows)
So the milliseconds are definitely there. There was a JIRA ticket created for this issue (CASSANDRA-5870), but it was resolved as "Won't Fix."
How can I see that with cqlsh to verify?
One possible way to actually verify that the milliseconds are indeed there, is to nest the timestampAsBlob() function inside of blobAsBigint(), like this:
aploetz#cqlsh:stackoverflow> SELECT id, type, blobAsBigint(timestampAsBlob(datetime)),
data FROM data;
id | type | blobAsBigint(timestampAsBlob(datetime)) | data
--------+--------------+-----------------------------------------+-------------------------------
B26354 | Blade Runner | 1424109603234 | Deckard- Filed and monitored.
(1 rows)
While not optimal, here you can clearly see the millisecond value of "234" on the very end. This becomes even more apparent if I add a row for the same timestamp, but without milliseconds:
aploetz#cqlsh:stackoverflow> INSERT INTO data (id, type, datetime, data)
VALUES ('B25881','Blade Runner','2015-02-16T18:00:03+00:00','Holden- Fine as long as nobody unplugs him.');
aploetz#cqlsh:stackoverflow> SELECT id, type, blobAsBigint(timestampAsBlob(datetime)),
... data FROM data;
id | type | blobAsBigint(timestampAsBlob(datetime)) | data
--------+--------------+-----------------------------------------+---------------------------------------------
B25881 | Blade Runner | 1424109603000 | Holden- Fine as long as nobody unplugs him.
B26354 | Blade Runner | 1424109603234 | Deckard- Filed and monitored.
(2 rows)
You can configure the output format of datetime objects in the .cassandra/cqlshrc file, using python's 'strftime' syntax.
Unfortunately, the %f directive for microseconds (there does not seem to be a directive for milliseconds) does not work for older python versions, which means you have to fall back to the blobAsBigint(timestampAsBlob(date)) solution.
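To see what %f produces, and how a millisecond rendering can be derived from it, here is a short Python sketch. It runs on a current Python 3 interpreter and does not reproduce cqlsh's own formatting code; format_millis is a hypothetical helper.

```python
from datetime import datetime, timezone

dt = datetime(2015, 2, 16, 18, 0, 3, 234000, tzinfo=timezone.utc)

# %f always prints six digits (microseconds), zero-padded.
print(dt.strftime("%Y-%m-%d %H:%M:%S.%f%z"))  # 2015-02-16 18:00:03.234000+0000


def format_millis(value: datetime) -> str:
    """Render a datetime with exactly three fractional digits (milliseconds)."""
    millis = value.microsecond // 1000
    return value.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis:03d}" + value.strftime("%z")


print(format_millis(dt))  # 2015-02-16 18:00:03.234+0000
```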
I think by "microseconds" (e.g. 03.234567) you mean "milliseconds" (e.g. 03.234).
The issue here was a cqlsh bug that failed to support fractional seconds when dealing with timestamps.
So, while your millisecond value was preserved in the actual persistence layer (cassandra), the shell (cqlsh) failed to display them.
This was true even if you were to change time_format in .cqlshrc to display fractional seconds with an %f directive (e.g. %Y-%m-%d %H:%M:%S.%f%z). In this configuration cqlsh would render 3.000000 for our 3.234 value, since the issue was in how cqlsh loaded the datetime objects without loading the partial seconds.
That all being said, this issue was fixed in CASSANDRA-10428, and released in Cassandra 3.4.
It is impossible to show microseconds (1 millionth of a second) using the Cassandra datatype 'timestamp' because the greatest precision available for that datatype is milliseconds (1 thousandth of a second).
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/timestamp_type_r.html
Values for the timestamp type are encoded as 64-bit signed integers
representing a number of milliseconds since the standard base time
known as the epoch
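That encoding is also why nesting timestampAsBlob() inside blobAsBigint() reveals the millisecond count. The Python sketch below mimics the round trip with the struct module; the two function names merely echo the CQL ones and are not part of any Cassandra driver, and big-endian byte order is assumed.

```python
import struct


def timestamp_as_blob(unix_ms: int) -> bytes:
    """Encode milliseconds since the epoch as an 8-byte big-endian signed integer."""
    return struct.pack(">q", unix_ms)


def blob_as_bigint(blob: bytes) -> int:
    """Decode an 8-byte big-endian signed integer."""
    return struct.unpack(">q", blob)[0]


blob = timestamp_as_blob(1424109603234)
print(len(blob))             # 8
print(blob_as_bigint(blob))  # 1424109603234
```

Because the integer is signed, dates before 1970 are representable as negative values.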
Some related code:
cqlsh> CREATE KEYSPACE udf
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};
cqlsh> USE udf;
cqlsh:udf> CREATE OR REPLACE FUNCTION udf.timeuuid_as_us ( t timeuuid )
RETURNS NULL ON NULL INPUT
RETURNS bigint LANGUAGE JAVA AS '
long msb = t.getMostSignificantBits();
return
( ((msb >> 32) & 0x00000000FFFFFFFFL)
| ((msb & 0x00000000FFFF0000L) << 16)
| ((msb & 0x0000000000000FFFL) << 48)
) / 10
- 12219292800000000L;
';
cqlsh:udf> SELECT
toUnixTimestamp(now()) AS now_ms
, udf.timeuuid_as_us(now()) AS now_us
FROM system.local;
now_ms | now_us
---------------+------------------
1525995892841 | 1525995892841000
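For checking the bit manipulation in that function outside of Cassandra, the same arithmetic can be written in Python and compared against the standard uuid module. This is a sketch, not a driver API: timeuuid_as_us mirrors the UDF's name for readability, and timeuuid_from_unix_ms is a hypothetical helper that builds a version 1 UUID with zeroed clock sequence and node fields.

```python
import uuid

# Microseconds between 1582-10-15 (the UUID epoch) and 1970-01-01.
UUID_EPOCH_OFFSET_US = 12219292800000000


def timeuuid_as_us(t: uuid.UUID) -> int:
    """Port of the UDF: reassemble the 60-bit timestamp, convert 100 ns ticks to microseconds."""
    msb = t.int >> 64  # most significant 64 bits, like getMostSignificantBits()
    ticks = (
        ((msb >> 32) & 0xFFFFFFFF)       # time_low
        | ((msb & 0xFFFF0000) << 16)     # time_mid
        | ((msb & 0x0FFF) << 48)         # time_hi, with the version nibble masked off
    )
    return ticks // 10 - UUID_EPOCH_OFFSET_US


def timeuuid_from_unix_ms(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: a version 1 UUID for a given millisecond timestamp."""
    ticks = unix_ms * 10000 + UUID_EPOCH_OFFSET_US * 10
    fields = (ticks & 0xFFFFFFFF, (ticks >> 32) & 0xFFFF,
              ((ticks >> 48) & 0x0FFF) | 0x1000, 0x80, 0x00, 0)
    return uuid.UUID(fields=fields)


print(timeuuid_as_us(timeuuid_from_unix_ms(1525995892841)))  # 1525995892841000

# Cross-check against the standard library's own timestamp extraction.
u = uuid.uuid1()
print(timeuuid_as_us(u) == u.time // 10 - UUID_EPOCH_OFFSET_US)  # True
```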