AWS Athena / Presto: string to timestamp

How can I parse the string 2020-07-21 11:19:00.874+00:00, where +00:00 is the time zone, to a datetime? I have tried the parse_datetime() function, but it does not seem to work.

You can use this format:
SELECT parse_datetime('2020-07-21 11:19:00.874+00:00', 'yyyy-MM-dd HH:mm:ss.SSSZ');
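Alternatively (a sketch, assuming only standard Presto/Athena built-ins), you can convert the string to ISO 8601 form first and use from_iso8601_timestamp:
-- Swap the space for the ISO 8601 'T' separator, then parse:
SELECT from_iso8601_timestamp(replace('2020-07-21 11:19:00.874+00:00', ' ', 'T'));
Both approaches return a timestamp with time zone.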

Related

How to convert timestamp from datatype string to datatype timestamp in Hive?

In my Hive table_one, CREATEDDATE has datatype STRING, and I want to convert CREATEDDATE to datatype TIMESTAMP in time zone 'Europe/Berlin'.
table_one field CREATEDDATE (STRING), UTC+0: "2022-07-13T09:30:19.000+0000"
This is my Spark SQL expression where I load CREATEDDATE from table_one:
, to_utc_timestamp(from_unixtime(unix_timestamp(CREATEDDATE , "yyyy MM dd HH:mm:ss Z")),'CET') AS CREATEDDATE
But after inserting into table_two, where the field CREATEDDATE is a TIMESTAMP, the field is still null.
What's wrong with my function? Is the format wrong?
Replace the T with a space and you're good to go.
Your code should be
select to_utc_timestamp(from_unixtime(unix_timestamp(replace(createddate, 'T', ' '))), 'CET') time_in_cet
Example below -
select to_utc_timestamp(from_unixtime(unix_timestamp(replace('2022-07-13T09:30:19.000+0000', 'T', ' '))), 'CET')
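As an aside: in Hive/Spark SQL, to_utc_timestamp(ts, tz) treats ts as being in tz and converts it to UTC, while from_utc_timestamp(ts, tz) goes the other way. Since the source string here is UTC (+0000) and the target is 'Europe/Berlin', a sketch closer to that intent might be:
-- Hedged sketch (Spark SQL): parse the UTC string, then render it in Berlin time.
select from_utc_timestamp(
         to_timestamp('2022-07-13T09:30:19.000+0000', "yyyy-MM-dd'T'HH:mm:ss.SSSZ"),
         'Europe/Berlin') as created_berlin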

to_timestamp/unix_timestamp is unable to parse a string datetime to timestamp in Spark for daylight saving datetimes

I am using Spark 2.4 and the code below to cast the string datetime column (rec_dt) in a dataframe (df1) to timestamp (rec_date), creating another dataframe (df2).
All the datetime values are parsed correctly except for the values that fall in the daylight saving transition.
The time zone of my session is 'Europe/London'. I do not want to store the data in UTC; ultimately I have to write the data in the 'Europe/London' time zone.
spark_session.conf.get("spark.sql.session.timeZone")
# Europe/London
Code:
df2 = df1.withColumn("rec_date", to_timestamp("rec_dt","yyyy-MM-dd-HH.mm.ss"))
Output:
Please help.

Date type displaying with timezone on node-postgres module

I have stored input data in date format in a Postgres database, but when I display the date in the browser it is shown with a time zone, converted from UTC. For example, I stored the date as 2020-07-16, but when I display it, it becomes 2020-07-15T18:00:00.000Z. I have tried select mydate::DATE from table to get only the date, but it still shows the date with a time zone. I am using the node-postgres module in my Node app. I suspect it's some configuration in the node-postgres module? From their docs:
node-postgres converts DATE and TIMESTAMP columns into the local time
of the node process set at process.env.TZ
Is there any way I can configure it to parse only the date? If I query like SELECT TO_CHAR(mydate::DATE, 'yyyy-mm-dd') from table, I get 2020-07-16, but that's a lot of work just to get a date.
You can make your own date and time type parser:
const pg = require('pg');

// 1114 is the OID for TIMESTAMP WITHOUT TIME ZONE
pg.types.setTypeParser(1114, function(stringValue) {
  return stringValue; // return the raw string instead of a JS Date
});

// 1082 is the OID for DATE
pg.types.setTypeParser(1082, function(stringValue) {
  return stringValue;
});
The type id can be found in the file: node_modules/pg-types/lib/textParsers.js
It is spelled out here:
https://node-postgres.com/features/types
date / timestamp / timestamptz
console.log(result.rows)
// {
// date_col: 2017-05-29T05:00:00.000Z,
// timestamp_col: 2017-05-29T23:18:13.263Z,
// timestamptz_col: 2017-05-29T23:18:13.263Z
// }
bmc=# select * from dates;
date_col | timestamp_col | timestamptz_col
------------+-------------------------+----------------------------
2017-05-29 | 2017-05-29 18:18:13.263 | 2017-05-29 18:18:13.263-05
(1 row)
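A lighter-weight sketch on the SQL side: casting the column to text keeps node-postgres from converting it at all, since text values come back as plain strings (my_table is a placeholder name):
-- Hedged alternative: return the date as a plain string.
SELECT mydate::text FROM my_table;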

Google BigQuery API turning dates into integers

I am running a load job, and the date format from the Python DataFrame is turning into an integer. The dates are in UK format, e.g. 01/01/1970 00:00:00.
I have tried to transform it using:
appended_data['DateOfBirth'] = pd.to_datetime(appended_data['DateOfBirth'], dayfirst=True,
                                              format='%d/%m/%Y %H:%M:%S', errors='coerce')
The values then come out in the format 1938-09-01; however, BigQuery will then store them as an integer like 1012262400000.
Any ideas on the best way to do this? Is it just to load it as a STRING in BigQuery? I have tried that with parsing and formatting the date, and it's not working.
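If the integer is epoch milliseconds (an assumption, but consistent with how datetime64[ns] values often land when no explicit schema is given), it can be turned back into a date on the BigQuery side. A minimal SQL sketch, where my_dataset.my_table stands in for the real table:
-- Assumes DateOfBirth landed as INT64 epoch milliseconds.
SELECT DATE(TIMESTAMP_MILLIS(DateOfBirth)) AS date_of_birth
FROM my_dataset.my_table;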

Change Unix (epoch) time to local time in PySpark

I have a dataframe in Spark which contains Unix (epoch) times and also time zone names. I want to convert the epoch time to local time according to each row's time zone name.
Here is what my data looks like:
data = [
(1420088400, 'America/New_York'),
(1420088400, 'America/Los_Angeles'),
(1510401180, 'America/New_York'),
(1510401180, 'America/Los_Angeles')]
df = spark.createDataFrame(data, ["epoch_time", "tz_name"])
df.createOrReplaceTempView("df")
df1 = spark.sql("""select *, from_unixtime(epoch_time) as gmt_time,
                   from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time
                   from df""")
df1.show(truncate= False)
Here is the result:
+----------+-------------------+-------------------+---------------------+
|epoch_time|tz_name |gmt_time |local_time |
+----------+-------------------+-------------------+---------------------+
|1420088400|America/New_York |2015-01-01 05:00:00|2015-01-01 00:00:00.0|
|1420088400|America/Los_Angeles|2015-01-01 05:00:00|2014-12-31 21:00:00.0|
|1510401180|America/New_York |2017-11-11 11:53:00|2017-11-11 06:53:00.0|
|1510401180|America/Los_Angeles|2017-11-11 11:53:00|2017-11-11 03:53:00.0|
+----------+-------------------+-------------------+---------------------+
I'm not quite sure if this conversion is right, but it seems daylight saving time has been taken care of.
Should I first change the epoch time to a time string using from_unixtime, then change it to a UTC timestamp using to_utc_timestamp, and finally change this UTC timestamp to local time with tz_name? I tried this but got an error:
df2 = spark.sql("""select *, from_unixtime(epoch_time) as gmt_time,
from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time,
from_utc_timestamp(to_utc_timestamp(from_unixtime(epoch_time),from_unixtime(unix_timestamp(), 'z')), tz_name) as newtime from df""")
How could I check my EMR server's time zone? I tried the following; is this the server time zone?
spark.sql("select from_unixtime(unix_timestamp(), 'z')").show()
which gave me:
+--------------------------------------------------------------------------+
|from_unixtime(unix_timestamp(current_timestamp(), yyyy-MM-dd HH:mm:ss), z)|
+--------------------------------------------------------------------------+
| UTC|
+--------------------------------------------------------------------------+
Thank you for your clarification.
When you call from_unixtime it will format the date based on your Java runtime's time zone, since it is just using the default time zone for SimpleDateFormat. In your case that is UTC, so to convert the values to local time you only need to call from_utc_timestamp with the tz_name value passed in. If you were to change your system time zone, however, you would need to call to_utc_timestamp first.
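In other words, a minimal sketch under the assumption that the runtime zone is UTC, as shown above:
-- from_unixtime formats in the (UTC) runtime zone, so one call suffices:
select from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time
from df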
Spark 2.2 introduces a time zone setting, so you can set the time zone for your SparkSession like so:
spark.conf.set("spark.sql.session.timeZone", "GMT")
In that case the time functions will use GMT rather than your system time zone (see the Spark source).
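With the session time zone pinned, the conversion no longer depends on the EMR host's setting; a sketch:
-- Assumes Spark 2.2+: pin the session zone, then convert per row.
set spark.sql.session.timeZone = GMT;
select epoch_time, tz_name,
       from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time
from df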
