Convert timestamp to date using a Cassandra query

I need to convert the timestamp '1998/02/12 00:00:00' to the date 1998-02-12 using a Cassandra query. Can anyone help me with this?
Is it possible or not?

You can use the toDate function in CQL to get the date out of a timestamp (datetime) column.
For example, if your table entry looks like:
 id          | datetime                        | value
-------------+---------------------------------+-------
 22170825421 | 2018-02-15 14:06:01.000000+0000 |    50
You can run the following query:
select id, datetime, toDate(datetime) as day, value from datatable;
and it will give you:
 id          | datetime                        | day        | value
-------------+---------------------------------+------------+-------
 22170825421 | 2018-02-15 14:06:01.000000+0000 | 2018-02-15 |    50

You can't do this directly in Cassandra for an arbitrary input string: CQL accepts dates as YYYY-mm-dd, so you need to use some other method (depending on the language you're using) to convert your string into that format first.
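For instance, a minimal Python sketch of that client-side step (purely illustrative, not tied to any particular driver):

from datetime import datetime

# Parse the source string, then re-format it as the YYYY-MM-DD form that CQL's date type accepts
raw = "1998/02/12 00:00:00"
parsed = datetime.strptime(raw, "%Y/%m/%d %H:%M:%S")
date_str = parsed.strftime("%Y-%m-%d")
print(date_str)  # 1998-02-12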

Related

How to get AUD price of bitcoin at a particular date

I am trying to find out how to get the price of bitcoin, ethereum, litecoin and bitcoin cash in AUD at a particular date.
I have a table as follows
+------------+-------+
| Date | Price |
+------------+-------+
| 16/03/2016 | |
| 19/04/2016 | |
| 03/12/2017 | |
+------------+-------+
I have tried using =IMPORTXML("http://coinmarketcap.com/currencies/bitcoin/","//span[@id='quote_price']") in the price column, but it doesn't seem to work.
I would use:
https://min-api.cryptocompare.com/documentation?key=Historical&cat=dataPriceHistorical
In order to use this service you need an API key; register at their website: https://www.cryptocompare.com/cryptopian/api-keys
You also need to convert your date to unix timestamp:
https://www.unixtimestamp.com/index.php
e.g. 16/03/2016 => 1458086400
so ts=1458086400
The complete api call would resemble:
https://min-api.cryptocompare.com/data/pricehistorical?fsym=BTC&tsyms=AUD&ts=1458086400&api_key=your-key
Executing this call gives: AUD: 582.52
The value in 'Kolom1' is the resulting value in AUD. This value can be retrieved by executing the URL in the 'URL' field with: =WEBSERVICE([#URL])
Remark: to convert your dates (e.g. 16/03/2016) to Unix timestamps in Excel, see this post:
Excel date to Unix timestamp
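Outside of a spreadsheet, a minimal Python sketch of the same flow (assuming the third-party requests library; the API key is a placeholder):

from datetime import datetime, timezone
import requests

# 16/03/2016 at midnight UTC -> Unix timestamp
ts = int(datetime(2016, 3, 16, tzinfo=timezone.utc).timestamp())  # 1458086400

# Historical BTC price in AUD at that timestamp
resp = requests.get(
    "https://min-api.cryptocompare.com/data/pricehistorical",
    params={"fsym": "BTC", "tsyms": "AUD", "ts": ts, "api_key": "your-key"},
)
print(resp.json())  # e.g. {'BTC': {'AUD': 582.52}}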

Kusto Query to the earliest timestamp grouped by user_Id

I'm just starting with kusto, and my journey was abruptly stopped by the problem of getting the list of user_Ids with the timestamp of the very first customEvent sent by a user in the given time frame.
How should I modify my query to get these results (let's assume that the limiting timespan is 30 days)?
customEvents
| where timestamp >= ago(30d)
| summarize min(timestamp)
If you just want the minimum timestamp per user, add a "by" clause:
customEvents
| where timestamp >= ago(30d)
| summarize min(timestamp) by user_Id
If you want to get the full row, use arg_min() function, for example:
customEvents
| where timestamp >= ago(30d)
| summarize arg_min(timestamp, *) by user_Id

Convert UTC timestamp to local time based on time zone in PySpark

I have a PySpark DataFrame, df, with some columns as shown below. The hour column is in UTC time and I want to create a new column that has the local time based on the time_zone column. How can I do that in PySpark?
df
+-------------------------+------------+
| hour | time_zone |
+-------------------------+------------+
|2019-10-16T20:00:00+0000 | US/Eastern |
|2019-10-15T23:00:00+0000 | US/Central |
+-------------------------+------------+
#What I want:
+-------------------------+------------+---------------------+
| hour | time_zone | local_time |
+-------------------------+------------+---------------------+
|2019-10-16T20:00:00+0000 | US/Eastern | 2019-10-16T15:00:00 |
|2019-10-15T23:00:00+0000 | US/Central | 2019-10-15T17:00:00 |
+-------------------------+------------+---------------------+
You can use the built-in from_utc_timestamp function. Note that the hour column needs to be passed to the function as a string without the timezone suffix.
The code below works for Spark versions 2.4 and later.
from pyspark.sql.functions import *
df.select(from_utc_timestamp(split(df.hour,'\+')[0],df.time_zone).alias('local_time')).show()
For Spark versions before 2.4, you have to pass a constant string representing the time zone as the second argument to the function.
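For example, a sketch of that pre-2.4 form, hard-coding a single zone instead of reading it from the time_zone column:

from pyspark.sql.functions import from_utc_timestamp, split

# Pre-2.4: the time zone must be a literal string, so every row is
# converted to the same zone (here US/Eastern).
df.select(
    from_utc_timestamp(split(df.hour, r'\+')[0], 'US/Eastern').alias('local_time')
).show()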
Documentation
pyspark.sql.functions.from_utc_timestamp(timestamp, tz)
This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in UTC, and renders that timestamp as a timestamp in the given time zone.
However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not timezone-agnostic. So in Spark this function just shifts the timestamp value from the UTC timezone to the given timezone.
This function may return confusing result if the input is a string with timezone, e.g. ‘2018-03-13T06:18:23+00:00’. The reason is that, Spark firstly cast the string to timestamp according to the timezone in the string, and finally display the result by converting the timestamp to string according to the session local timezone.
Parameters
timestamp – the column that contains timestamps
tz – a string that has the ID of timezone, e.g. “GMT”, “America/Los_Angeles”, etc
Changed in version 2.4: tz can take a Column containing timezone ID strings.
You should also be able to use a spark UDF.
from pytz import timezone
from datetime import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def mytime(x, y):
    # Parse the UTC string, convert it to the target zone, and return a formatted string
    dt = datetime.strptime(x, "%Y-%m-%dT%H:%M:%S%z")
    return dt.astimezone(timezone(y)).strftime("%Y-%m-%dT%H:%M:%S")

mytimeUDF = udf(mytime, StringType())
df = df.withColumn('local_time', mytimeUDF("hour", "time_zone"))

CQLSH: Converting unix timestamp to datetime

I am performing a CQL query on a column that stores values as unix timestamps, but I want the results output as datetimes. Is there a way to do this?
i.e. something like the following:
select convertToDateTime(column) from table;
I'm trying to remember if there's an easier, more direct route. But if you have a table with a UNIX timestamp and want to show it in a datetime format, you can combine the dateOf and min/maxTimeuuid functions together, like this:
aploetz@cqlsh:stackoverflow2> SELECT datetimetext,unixtime,dateof(mintimeuuid(unixtime)) FROM unixtime;

 datetimetext | unixtime      | dateof(mintimeuuid(unixtime))
--------------+---------------+-------------------------------
   2015-07-08 | 1436380283051 |      2015-07-08 13:31:23-0500

(1 rows)

aploetz@cqlsh:stackoverflow2> SELECT datetimetext,unixtime,dateof(maxtimeuuid(unixtime)) FROM unixtime;

 datetimetext | unixtime      | dateof(maxtimeuuid(unixtime))
--------------+---------------+-------------------------------
   2015-07-08 | 1436380283051 |      2015-07-08 13:31:23-0500

(1 rows)
Note that a timeuuid stores greater precision than either a UNIX timestamp or a datetime, so you'll need to first convert your UNIX timestamp to a TimeUUID using either the mintimeuuid or maxtimeuuid function. Then you'll be able to use dateof to convert it to a datetime timestamp.
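If you only need the conversion for display, you could also do it client-side instead; a small Python sketch, assuming the stored value is in milliseconds as in the example above:

from datetime import datetime, timezone

unixtime_ms = 1436380283051  # value from the unixtime column above

# The column stores milliseconds, so strip the sub-second part and treat
# the rest as seconds since the epoch (UTC).
dt = datetime.fromtimestamp(unixtime_ms // 1000, tz=timezone.utc)
print(dt.isoformat())  # 2015-07-08T18:31:23+00:00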

Cassandra `COPY FROM` unable to coerce GMT date string to a formatted date (long)

I have been trying to use COPY FROM to insert into a Cassandra table that has a timestamp type column. However, I encountered the following error:
code=2200 [Invalid query] message="unable to coerce '2015-03-06 18:11:33GMT' to a formatted date (long)"
Aborting import at record #3. Previously-inserted values still present.
0 rows imported in 0.211 seconds.
The content of the CSV file was actually created with a COPY TO command. My TZ environment variable has been set to GMT.
I did some searching and found a post here that mentioned using Z instead of GMT as the timezone in the date string, i.e. '2015-03-06 18:11:33Z'. If I replace all the GMT in my CSV with Z, COPY FROM works. Link for the post:
unable to coerce '2012/11/11' to a formatted date (long)
When I run a SELECT on this table, the datetime column shows up in the format of: 2015-03-06 17:53:23GMT.
Further info: there was a bug about the 'Z' timezone, but it was fixed. Link: https://issues.apache.org/jira/browse/CASSANDRA-6973
So my question is, is there a way that I can run COPY TO so that it writes Z instead of GMT for time zone?
Alternatively, is there a way I can make COPY FROM work with GMT?
Thanks.
Note: The solution is in the comment from @Aaron on this post. Yes, it's a hack, but it works.
I think what is happening here is that you are getting bitten by the time_format property in your ~/.cassandra/cqlshrc file. COPY uses this setting when exporting your timestamp data during a COPY TO. CQLSH uses the Python strftime formats, and the difference between the lowercase %z and uppercase %Z directives appears to be the source of your problem.
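A quick Python check (just a sketch, not part of the original answer) illustrates the difference between the two directives:

from datetime import datetime
from zoneinfo import ZoneInfo  # zoneinfo ships with Python 3.9+

dt = datetime(2015, 3, 12, 14, 10, tzinfo=ZoneInfo("America/Chicago"))

print(dt.strftime("%Y-%m-%d %H:%M:%S%Z"))  # 2015-03-12 14:10:00CDT   (zone abbreviation)
print(dt.strftime("%Y-%m-%d %H:%M:%S%z"))  # 2015-03-12 14:10:00-0500 (numeric offset)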
When I SELECT timestamp data with %Z (upper), it looks like this:
aploetz@cqlsh:stackoverflow> SELECT * FROM posts1;

 userid | posttime               | postcontent  | postid
--------+------------------------+--------------+--------------------------------------
      1 | 2015-01-25 13:25:00CST | blahblah5    | 13218139-991c-4ddc-a11a-86992f6fed66
      1 | 2015-01-25 13:22:00CST | blahblah2    | eacdebcc-35c5-45f7-9374-d5fd987e699f
      0 | 2015-03-12 14:10:00CDT | sdgfjdsgojr  | 82766df6-4cca-4ad1-ae59-ba4488103da4
      0 | 2015-03-12 13:56:00CDT | kdsjfsdjflds | bd5c2be8-be66-41da-b9ff-98e9a4836000
      0 | 2015-03-12 09:10:00CDT | sdgfjdsgojr  | 6865216f-fc4d-431c-8067-c27cf20b6be7
When I try to INSERT a record using that date format, it fails:
aploetz@cqlsh:stackoverflow> INSERT INTO posts1 (userid,posttime,postcontent,postid) VALUES (0,'2015-03-12 14:27CST','sdgfjdsgojr',uuid());
code=2200 [Invalid query] message="unable to coerce '2015-03-12 14:27CST' to a formatted date (long)"
But when I alter time_format to use the (lowercase) %z the same query produces this:
aploetz@cqlsh:stackoverflow> SELECT * FROM posts1;

 userid | posttime                 | postcontent  | postid
--------+--------------------------+--------------+--------------------------------------
      1 | 2015-01-25 13:25:00-0600 | blahblah5    | 13218139-991c-4ddc-a11a-86992f6fed66
      1 | 2015-01-25 13:22:00-0600 | blahblah2    | eacdebcc-35c5-45f7-9374-d5fd987e699f
      0 | 2015-03-12 14:10:00-0500 | sdgfjdsgojr  | 82766df6-4cca-4ad1-ae59-ba4488103da4
      0 | 2015-03-12 13:56:00-0500 | kdsjfsdjflds | bd5c2be8-be66-41da-b9ff-98e9a4836000
      0 | 2015-03-12 09:10:00-0500 | sdgfjdsgojr  | 6865216f-fc4d-431c-8067-c27cf20b6be7
I can also INSERT data in this format:
INSERT INTO posts1 (userid,posttime,postcontent,postid)
VALUES (0,'2015-03-12 14:27-0500','sdgfjdsgojr',uuid());
It also appears in this way when I run a COPY TO, and a COPY FROM of the same data/file also works.
In summary, check your ~/.cassandra/cqlshrc and make sure that you are either using the default setting, or this setting in the [ui] section:
[ui]
time_format = %Y-%m-%d %H:%M:%S%z
It won't get you the 'Z' like you asked for, but it will allow you to COPY TO/FROM your data without having to muck with the CSV file.
Edit
For those of you poor souls out there using CQLSH (or Cassandra, God help you) on Windows, the default location of the cqlshrc file is c:\Users\%USERNAME%\.cassandra\cqlshrc.
Edit - 20150903
Inspired by this question, I submitted a patch (CASSANDRA-8970) to allow users to specify a custom time format with COPY, and it was marked as "Ready To Commit" yesterday. Basically, this patch will allow this problem to be solved by doing the following:
COPY posts1 TO '/home/aploetz/posts1.csv' WITH DELIMITER='|' AND HEADER=true
AND TIME_FORMAT='%Y-%m-%d %H:%M:%SZ';
Edit - 20161010
The COPY command was improved in Cassandra 2.2.5, and the TIMEFORMAT option has been renamed to DATETIMEFORMAT.
From New options and better performance in cqlsh copy:
DATETIMEFORMAT, which used to be called TIMEFORMAT, a string containing the Python strftime format for date and time values, such as ‘%Y-%m-%d %H:%M:%S%z’. It defaults to the time_format value in cqlshrc.
