How to typecast timestamp_ntz to CST format using Spark/PySpark while writing to Snowflake - apache-spark

As we know, Snowflake has three timestamp types:
TIMESTAMP_NTZ
TIMESTAMP_LTZ
TIMESTAMP_TZ
When writing a timestamp to a Snowflake table, it is taken as TIMESTAMP_NTZ by default.
How can Snowflake take the timestamp in the CST timezone while writing to a Snowflake table?

First it's important to know what timezone has been set as the default for your account/session:
SHOW PARAMETERS LIKE 'TIMEZONE';
Then change the default for your session to CST:
ALTER SESSION SET TIMEZONE = 'America/Chicago';
After that, any SELECT of CURRENT_TIMESTAMP will return the data in the right timezone:
SELECT CURRENT_TIMESTAMP;
This is a great article for reference:
Snowflake timestamp datatype ref
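If the write itself happens from Spark/PySpark, the same session setting can be pushed through the Snowflake Spark connector. The sketch below is only a starting point: it assumes the spark-snowflake connector is on the classpath, that your connector version supports the preactions option (SQL run against the Snowflake session before the write), and that you replace the placeholder connection values with your own.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("snowflake-cst-write").getOrCreate()

# Hypothetical connection details -- replace with your own account settings.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    # Run before the write so the Snowflake session treats timestamps as CST.
    "preactions": "ALTER SESSION SET TIMEZONE = 'America/Chicago'",
}

df = (spark.createDataFrame([(1, "2019-12-10 07:50:00")], ["rn", "ts_str"])
           .withColumn("ts", F.to_timestamp("ts_str"))
           .drop("ts_str"))

(df.write
   .format("net.snowflake.spark.snowflake")
   .options(**sf_options)
   .option("dbtable", "TS_TEST")
   .mode("append")
   .save())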
Assuming you have control over the precise column type in your table, I found that TIMESTAMP_TZ is how you want to define the column. Here's a working example of everything I did:
alter session set timezone = 'America/Los_Angeles';
create or replace table ts_test(rn number, ts timestamp_tz);
insert into ts_test values(1, current_timestamp());
insert into ts_test values(2, '2019-12-10 07:50:00 -06:00');
insert into ts_test values(3, CONVERT_TIMEZONE('America/Chicago', CURRENT_TIMESTAMP()));
select * from ts_test;
If the timestamp is generated in code, then make sure you include the UTC offset when inserting (rn 2). If you're using current_timestamp(), which here is in LA time, then make sure you convert it to CST (rn 3).
If the table is being generated and you don't control the default timestamp type mapping, then issue this first:
alter session set timestamp_type_mapping = timestamp_tz;
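On the Spark side you can also prepare the value so it already carries the Chicago offset before the connector sees it. A minimal PySpark sketch, assuming Spark 3.x datetime patterns (the XXX token emits the offset); it mirrors the rn 2 insert above by producing a string like 2019-12-10 07:50:00 -06:00 that a TIMESTAMP_TZ column can keep.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cst-offset").getOrCreate()

# Parse and format timestamps in Chicago time for this Spark session.
spark.conf.set("spark.sql.session.timeZone", "America/Chicago")

df = (spark.createDataFrame([(2, "2019-12-10 07:50:00")], ["rn", "ts_str"])
           .withColumn("ts", F.to_timestamp("ts_str")))  # read as Chicago wall-clock time

# Render the instant with its Chicago offset (e.g. "-06:00"), matching rn 2 above.
df = df.withColumn("ts_with_offset", F.date_format("ts", "yyyy-MM-dd HH:mm:ss XXX"))
df.select("rn", "ts_with_offset").show(truncate=False)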

Related

Hive TimeStamp column with TimeZone

I have created a table in Hive with one column of timestamp datatype. While inserting into Hive, I am getting a value different from the existing one.
My column's expected value: 2021-11-03 16:57:10.842 UTC (I am getting this as a string). How can I store the above output in a Hive table (column with datatype timestamp)?
You need to use cast to convert this to a timestamp after removing the word UTC. Since Hive intentionally doesn't care about timezones and displays data in UTC, you should be in good shape.
select cast(substr('2021-11-03 16:57:10.842 UTC',1,23) as timestamp) as ts
Please note you need to have the data in the yyyy-MM-dd HH:mm:ss.SSS format shown above.
Also note you cannot use from_unixtime(unix_timestamp(string_col, 'dd-MM-yyyy HH:mm:ss.SSS')) because we would lose the millisecond part.
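If the same string lands in a Spark DataFrame rather than a Hive query, a roughly equivalent conversion can be done with PySpark functions. A minimal sketch, assuming a column named raw_ts and Spark 3.x timestamp patterns:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hive-ts-cast").getOrCreate()

df = spark.createDataFrame([("2021-11-03 16:57:10.842 UTC",)], ["raw_ts"])

# Keep the first 23 characters (drops the trailing " UTC"), then parse,
# mirroring the Hive substr + cast shown above.
df = df.withColumn(
    "ts",
    F.to_timestamp(F.substring("raw_ts", 1, 23), "yyyy-MM-dd HH:mm:ss.SSS"),
)
df.show(truncate=False)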

Cassandra Timestamp behavior with Select query

I have a column "postingdate" with datatype timestamp in Cassandra. I am using spring data Cassandra to save current date/time in this column when posting happens (Instant.now()). This is inserting date/time in UTC.
I have to select records which got posted on "2018-11-06". In table I have one record posted on this date and postingdate column is showing that as "2018-11-07 04:25:24+0000" in UTC.
I am running following query -
select * from mytable where id='5' and postingdate >=
'2018-11-06 00:00:00' and postingdate <= '2018-11-06 23:59:59';
Running this query in the DevCenter console (or cqlsh) gives me the same results irrespective of timezone. I tried it in PST as well as IST and got the same result. Is Cassandra doing a PST -> UTC or IST -> UTC conversion before executing the query? If yes, then how?
Per documentation:
When timezone is excluded, it's set to the client or coordinator timezone.
You can configure the default timezone for cqlsh either by setting the TZ environment variable or by specifying the timezone parameter in the cqlshrc configuration file.
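If the query runs from application code rather than cqlsh, you can sidestep the client-timezone question entirely by binding timezone-aware UTC datetimes. A sketch using the DataStax Python driver, where the contact point and keyspace are assumptions:
from datetime import datetime, timezone
from cassandra.cluster import Cluster  # DataStax Python driver

cluster = Cluster(["127.0.0.1"])         # assumed contact point
session = cluster.connect("mykeyspace")  # assumed keyspace

# Timezone-aware UTC bounds, so the range does not depend on the client's local timezone.
start = datetime(2018, 11, 6, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2018, 11, 6, 23, 59, 59, tzinfo=timezone.utc)

rows = session.execute(
    "SELECT * FROM mytable WHERE id = %s AND postingdate >= %s AND postingdate <= %s",
    ("5", start, end),
)
for row in rows:
    print(row)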

Cassandra inserts timestamp in UTC time

I have JSON logs with timestamps (UTC time) in them. I map the keys and values to Cassandra table columns and insert the record. However, Cassandra converts the already-UTC timestamps to UTC again by subtracting 5 hours from the timestamp. The local timezone here is GMT+5.
cqlsh> INSERT INTO myTable (id, time) VALUES ('abc123', '2018-01-12T12:32:31');
The time is already UTC, yet it still inserts a timestamp from 5 hours earlier.
How can I resolve this?
If you're using cqlsh to insert data, you can specify the default timezone in the cqlshrc file using the timezone parameter (see the default cqlshrc as an example).
If you insert dates programmatically, then you need to convert your time into the corresponding type that maps to Cassandra's timestamp type (java.util.Date for Java, for example). In your case the change could be simple: just append Z to the timestamp string, as pointed out by Ralf.
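For illustration, here is what that programmatic conversion could look like with the DataStax Python driver (the contact point and keyspace are assumptions); the key point is binding a timezone-aware UTC value instead of a bare local string:
from datetime import datetime, timezone
from cassandra.cluster import Cluster  # DataStax Python driver

cluster = Cluster(["127.0.0.1"])         # assumed contact point
session = cluster.connect("mykeyspace")  # assumed keyspace

# The log timestamp is already UTC; make that explicit so it is not
# reinterpreted in the local (GMT+5) timezone.
ts = datetime(2018, 1, 12, 12, 32, 31, tzinfo=timezone.utc)

session.execute(
    "INSERT INTO myTable (id, time) VALUES (%s, %s)",
    ("abc123", ts),
)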

Cassandra time not saved in UTC

I need to split my timestamp into date and time separately and insert them into DB columns with 'date' and 'time' CQL types.
I was trying to insert a time value as a string into a Cassandra table. The time was converted to UTC (05:27:00). But when I checked the table using DataStax DevCenter, the column was populated with the value '09:37:54.935541808'. When I tried to retrieve the value in Spring using a repository, it returned the value as '3473746674935541808'.
How do I get the correct time value from the table?
It looks like a limitation of Spring Data. In Cassandra, a time value is encoded as a 64-bit signed integer representing the number of nanoseconds since midnight. But I don't see the time type listed as supported in the spring-data-cassandra documentation, so you may need to write a custom converter for it, as described in the documentation.
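To make that encoding concrete, here is a small stdlib-only sketch that decodes such a raw value into a readable time; the number used is constructed for the example (it corresponds to 09:37:54.935541808), not taken from the question:
from datetime import time

# A CQL 'time' value read back raw: nanoseconds since midnight (example value).
raw_nanos = 34674935541808

seconds, nanos = divmod(raw_nanos, 1_000_000_000)
hours, rem = divmod(seconds, 3600)
minutes, secs = divmod(rem, 60)

# datetime.time only carries microsecond precision, so the last three
# nanosecond digits are dropped.
t = time(hours, minutes, secs, nanos // 1000)
print(t)  # 09:37:54.935541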

Cassandra Timestamp : Incorrect time value

I am new to Cassandra. I have a Cassandra (v3.11) table named data. It has a column timeStampCol of timestamp type, and I am inserting a value into it.
insert into data (timeStampCol) values('2017-05-02 17:33:03');
When accessing the data from the table
select * from data;
I get a result like:
# Row 1
----------+------------------------------------
timeStampCol | 2017-05-02 08:33:03.000000+0000
The inserted and retrieved values differ in time.
The reason might be the timezone; how can I get it correct?
Your selected timestamp value is correct; it's just shown in a different timezone.
If you insert data into a timestamp column without providing a timezone, like this:
insert into data (timeStampCol) values('2017-05-02 17:33:03');
Cassandra will use the coordinator's timezone:
If no time zone is specified, the time zone of the Cassandra coordinator node handling the write request is used. For accuracy, DataStax recommends specifying the time zone rather than relying on the time zone configured on the Cassandra nodes.
You need to convert the string date into a java.util.Date and set the timezone of the coordinator node before parsing; in my case it was GMT+6.
// Set the coordinator's timezone BEFORE parsing so the offset is actually applied
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); // java.text
dateFormat.setTimeZone(TimeZone.getTimeZone("GMT+6")); // change to your coordinator's timezone
Date date = dateFormat.parse("2017-05-02 17:33:03");   // java.util.Date
Source : https://docs.datastax.com/en/cql/3.0/cql/cql_reference/timestamp_type_r.html
Cassandra assumes incoming data is in the timezone it is configured with. For example, if Cassandra is set up in IST, then even though the incoming data is UTC, Cassandra will convert it to UTC again, treating the data as IST.
You might have to set the Cassandra coordinator timezone in code, or calculate the time difference between the incoming data's timezone and Cassandra's timezone and add/subtract that from the incoming data before it is written to Cassandra. That way the exact timestamps are written to Cassandra.
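As an illustration of that adjustment, one way to re-express an incoming UTC value as the coordinator's wall-clock time before writing (assuming an IST coordinator and Python 3.9+ for zoneinfo) could be:
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Incoming value is UTC; the coordinator in this example runs in IST.
incoming_utc = datetime(2017, 5, 2, 17, 33, 3, tzinfo=timezone.utc)
coordinator_tz = ZoneInfo("Asia/Kolkata")  # assumed coordinator timezone

# Same instant expressed as the coordinator's local time, so a client that
# drops the offset still ends up storing the intended instant.
adjusted = incoming_utc.astimezone(coordinator_tz)
print(adjusted.strftime("%Y-%m-%d %H:%M:%S%z"))  # 2017-05-02 23:03:03+0530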
