Hive TimeStamp column with TimeZone - apache-spark

I have created a table in Hive with one column of timestamp datatype. When I insert into it, the stored value differs from the value I expect.
The expected column value is 2021-11-03 16:57:10.842 UTC (I receive this as a string). How can I store it in a Hive table column whose datatype is timestamp?

You need to use cast to convert the string to a timestamp after removing the word UTC. Since Hive timestamps intentionally carry no time zone and your data is already in UTC, you should be in good shape.
select cast(substr('2021-11-03 16:57:10.842 UTC', 1, 23) as timestamp) as ts;
Please note that the data must be in the yyyy-MM-dd HH:mm:ss.SSS format shown above.
Also note that you cannot use from_unixtime(unix_timestamp(string_col , 'dd-MM-yyyy HH:mm:ss.SSS')), because unix_timestamp returns whole seconds and the millisecond part is lost.
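The same idea (cut off the zone suffix, then parse the rest so that the milliseconds survive) can be sketched outside Hive. This is only an illustration in Python, not Hive or Spark code, and the helper name parse_utc_string is hypothetical:

```python
from datetime import datetime

def parse_utc_string(value: str) -> datetime:
    """Parse 'yyyy-MM-dd HH:mm:ss.SSS UTC' into a zone-less datetime (hypothetical helper)."""
    # Keep the first 23 characters, mirroring substr(col, 1, 23) in the Hive query.
    trimmed = value[:23]
    # %f reads the fractional seconds, so the millisecond part is preserved.
    return datetime.strptime(trimmed, "%Y-%m-%d %H:%M:%S.%f")

ts = parse_utc_string("2021-11-03 16:57:10.842 UTC")
print(ts.isoformat(sep=" ", timespec="milliseconds"))  # 2021-11-03 16:57:10.842
```

The result carries no time zone, which matches how a Hive timestamp column treats the value.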

Related

Specify datetime2 format in Azure SQL data warehouse (synapse)

What is the correct way to specify the format of a datetime2 field when creating a table in Azure SQL data warehouse? I don't seem to be able to find an example in the documentation.
The data looks like this:
"2020-09-14T20:50:48.000Z"
CREATE TABLE [Foo].[Bar](
...
MyDateTime datetime2(['YYYY-MM-DDThh:mm:ss[.fractional seconds]')
)
As Panagiotis notes, the underlying representation is an int/long for the actual date value. This is how RDBMS engines can quickly compute the delta between two dates (the number of days between Monday and Friday is a simple subtraction). To answer your question, a datetime2 column takes no format specifier, so you would simply write your create table as:
CREATE TABLE [Foo].[Bar](
...
MyDateTime datetime2
)
If you're interested in formatting the result in a query, you can look to the CONVERT or FORMAT functions. For example, if you wanted the format dd-mm-yyyy (Italian date), you could use either of the following:
SELECT
CONVERT(VARCHAR, CURRENT_TIMESTAMP, 105)
, FORMAT(CURRENT_TIMESTAMP, 'dd-MM-yyyy')
Note: CONVERT is generally faster than FORMAT and is the recommended approach if your date format is one it supports. This is because the FORMAT function relies on the CLR, which involves a context/process switch.
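To illustrate why the column itself needs no format: the value is stored as a point in time, and a textual format only appears when the value is rendered. A rough analogy in Python (not Synapse code; the variable names are illustrative only):

```python
from datetime import datetime, timezone

raw = "2020-09-14T20:50:48.000Z"
# %z accepts a literal 'Z' (Python 3.7+), so the value parses as UTC.
stored = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")

# Date arithmetic works on the stored value, not on its text form.
monday = datetime(2020, 9, 14, tzinfo=timezone.utc)
friday = datetime(2020, 9, 18, tzinfo=timezone.utc)
days_between = (friday - monday).days

# A display format is chosen only at output time, similar to CONVERT(..., 105).
italian = stored.strftime("%d-%m-%Y")
print(days_between, italian)  # 4 14-09-2020
```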

How to Convert a column having one timestamp to another timestamp in Azure Data Factory

I have a column ABC whose timestamps are in the format dd/MM/yyyy HH:mm:SS (for example 11/04/2020 1:17:40). I want to create another column ABC_NEW with the same data as the old column but in the format 'yyyy-MM-dd HH:mm:SS'. I tried this in an Azure Data Factory derived column using
toTimestamp(column_name,'yyyy-MM-dd HH:mm:SS'), but it did not work: the result is NULL. Can anyone help?
It's a 2-step process. You first need to tell ADF what each field in your timestamp column represents, then you can use string conversions to manipulate that timestamp into the output string as you like:
toString(toTimestamp('11/04/2020 1:17:40','MM/dd/yyyy HH:mm:ss'),'yyyy-MM-dd HH:mm:ss')
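The two-step idea (parse with the pattern the source actually uses, then format with the target pattern) can be sketched outside ADF. The Python below is an illustration only; reformat_timestamp is a hypothetical helper, and it assumes the day-first dd/MM/yyyy layout stated in the question:

```python
from datetime import datetime

def reformat_timestamp(value: str, source_pattern: str, target_pattern: str) -> str:
    """Parse with the source pattern, then render with the target pattern (hypothetical helper)."""
    # Step 1: describe what each field of the incoming string represents.
    parsed = datetime.strptime(value, source_pattern)
    # Step 2: render the parsed value in the desired output layout.
    return parsed.strftime(target_pattern)

converted = reformat_timestamp(
    "11/04/2020 1:17:40",
    "%d/%m/%Y %H:%M:%S",   # day-first, as described in the question (assumption)
    "%Y-%m-%d %H:%M:%S",
)
print(converted)  # 2020-04-11 01:17:40
```

Parsing with the target pattern instead of the source pattern is what makes the conversion fail, which is consistent with the NULL seen in the question.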
Data Factory doesn't support the date format 'dd/mm/yyyy', so we cannot convert it to 'YYYY-MM-DD' directly.
I use DerivedColumn to generate a new column ABC_NEW from the original column DateTime and enter the expression below:
toTimestamp(concat(split(substring(DateTime,1, 10), '/')[3], '-',split(substring(DateTime,1, 10), '/')[2],'-',split(substring(DateTime,1, 10), '/')[1],substring(DateTime,11, length(DateTime))))
This was a blocker for me, but try this trick:
Go to sink
Mapping
Click on output format
Select the date format or time format in which you prefer to store the data in the sink.

Specifying timestamp or date format in Athena Table

I have a timestamp in ISO-8601 format and want to specify it as either a timestamp or a datetime format when creating a table in Athena. Any clues on how to do this?
Thanks!
When you create a table in Athena, you can set a column as date or timestamp only in the Unix format, as follows:
DATE, in the UNIX format, such as YYYY-MM-DD.
TIMESTAMP. Instant in time and date in the UNIX format, such as yyyy-mm-dd hh:mm:ss[.f...]. For example, TIMESTAMP '2008-09-15 03:04:05.324'. This format uses the session time zone.
If the format is different, define it as a String and when you query the data use the date function:
from_iso8601_date(string) → date
You can convert the data to make it easier and cheaper for specific use cases by using a CTAS (create table as select) query, which generates a new copy of the data in a simpler and more efficient (compressed and columnar) Parquet format.
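If the data can be pre-processed before it reaches Athena, another option is to rewrite ISO-8601 strings into the yyyy-mm-dd hh:mm:ss.fff layout quoted above. A minimal sketch in Python, assuming every input has fractional seconds and a zone designator; iso8601_to_athena is a hypothetical helper, not an Athena function:

```python
from datetime import datetime, timezone

def iso8601_to_athena(value: str) -> str:
    """Rewrite an ISO-8601 string as 'yyyy-mm-dd hh:mm:ss.fff' in UTC (hypothetical helper)."""
    # %z accepts 'Z' as well as offsets such as '+02:00' (Python 3.7+).
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
    utc = parsed.astimezone(timezone.utc)
    # Trim microseconds to milliseconds to match the three-digit fraction.
    return utc.strftime("%Y-%m-%d %H:%M:%S.") + f"{utc.microsecond // 1000:03d}"

print(iso8601_to_athena("2008-09-15T03:04:05.324Z"))       # 2008-09-15 03:04:05.324
print(iso8601_to_athena("2008-09-15T05:04:05.324+02:00"))  # 2008-09-15 03:04:05.324
```

Normalizing to UTC is a design choice made for this sketch; the quoted text says the TIMESTAMP format uses the session time zone.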

How to typecast timestamp_ntz to CST format using Spark/PySpark while writing to Snowflake

As we know, Snowflake has three timestamp types:
TIMESTAMP_NTZ
TIMESTAMP_LTZ
TIMESTAMP_TZ
So when writing a timestamp to a Snowflake table, it is stored as TIMESTAMP_NTZ by default.
How can Snowflake store the timestamp in the CST time zone when writing to a Snowflake table?
First it's important to know what timezone has been set as the default for your account/session:
SHOW PARAMETERS LIKE 'TIMEZONE';
Then change the default for your session to CST:
ALTER SESSION SET TIMEZONE = 'America/Chicago';
Thereafter, any select of current_timestamp will return the data in the right time zone:
SELECT CURRENT_TIMESTAMP;
This is a great article for reference:
Snowflake timestamp datatype ref
Assuming you have control over the precise column type in your table, I found that TIMESTAMP_TZ is how you want to define the column. Here's a working example of everything I did:
alter session set timezone = 'America/Los_Angeles';
create or replace table ts_test(rn number, ts timestamp_tz);
insert into ts_test values(1, current_timestamp());
insert into ts_test values(2, '2019-12-10 07:50:00 -06:00');
insert into ts_test values(3, CONVERT_TIMEZONE('America/Chicago', CURRENT_TIMESTAMP()));
select * from ts_test;
If the timestamp is being generated in code, make sure you include the UTC offset when inserting (rn 2). If you're using current_timestamp(), which is in LA time here, make sure you convert to CST (rn 3).
If the table is being generated and you don't control the timezone default, issue this first:
alter session set timestamp_type_mapping = timestamp_tz;
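When the timestamp is generated in client code (the rn 2 case), the value needs an explicit UTC offset before it is written. A minimal sketch in Python, assuming a fixed -06:00 offset for CST; America/Chicago observes daylight saving time, which this simplification ignores, and to_cst_literal is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for CST (assumption: daylight saving is ignored in this sketch).
CST = timezone(timedelta(hours=-6), "CST")

def to_cst_literal(instant: datetime) -> str:
    """Render an aware datetime as 'YYYY-MM-DD HH:MM:SS -06:00' (hypothetical helper)."""
    local = instant.astimezone(CST)
    offset = local.strftime("%z")  # '-0600'
    # Insert the colon so the offset reads '-06:00', as in the rn 2 insert.
    return local.strftime("%Y-%m-%d %H:%M:%S ") + offset[:3] + ":" + offset[3:]

utc_instant = datetime(2019, 12, 10, 13, 50, 0, tzinfo=timezone.utc)
literal = to_cst_literal(utc_instant)
print(literal)  # 2019-12-10 07:50:00 -06:00
```

The resulting string has the same shape as the literal used for rn 2 in the example.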

Cassandra inserts timestamp in UTC time

I have JSON logs with a timestamp (UTC time) in them. I map keys and values to Cassandra table keys and insert the record. However, Cassandra converts the timestamps, which are already in UTC, to UTC again by subtracting 5 hours. The local time zone here is GMT+5.
cqlsh> INSERT INTO myTable (id, time) VALUES ('abc123', '2018-01-12T12:32:31');
The time is already in UTC, yet Cassandra still stores a timestamp that is 5 hours earlier.
How can I resolve this?
If you're using cqlsh to insert data, you can specify the default time zone in the cqlshrc file using the timezone parameter (see the default cqlshrc as an example).
If you insert dates programmatically, you need to convert your time into the type that corresponds to Cassandra's timestamp type (java.util.Date for Java, for example). In your case the change could be simple: just append Z to the timestamp string, as pointed out by Ralf.
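The effect of the missing Z can be reproduced without Cassandra. The Python sketch below is an illustration under the assumption that a zone-less string is read in the client's local zone (GMT+5 in the question); it does not call any Cassandra API:

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=5))  # the GMT+5 client zone from the question

raw = "2018-01-12T12:32:31"

# Without a zone designator the string is read as local time, then shifted to UTC.
as_local = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=LOCAL)
stored_without_z = as_local.astimezone(timezone.utc)

# With a trailing 'Z' the string is read as UTC and kept unchanged.
stored_with_z = datetime.strptime(raw + "Z", "%Y-%m-%dT%H:%M:%S%z")

print(stored_without_z.isoformat())  # 2018-01-12T07:32:31+00:00
print(stored_with_z.isoformat())     # 2018-01-12T12:32:31+00:00
```

The 5-hour gap between the two results matches the shift described in the question.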
