Spark getting the current date from another country - apache-spark

I need to get the date and time of another country:
dateFormat = "%Y%m%d_%H%M"
ts = spark.sql("""select current_timestamp() as ctime""").collect()[0]["ctime"]
ts.strftime(dateFormat)

You don't need pyspark for such a task, especially when you call .collect():
import pytz
from datetime import datetime
tz = pytz.timezone('Asia/Shanghai')
ts = datetime.now(tz)
ts.strftime('%Y%m%d_%H%M')
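On Python 3.9+ the same thing works without pytz - a sketch using the standard-library zoneinfo module (on Windows you may need the tzdata package for the zone database):
from datetime import datetime
from zoneinfo import ZoneInfo

# current time in the target time zone, no third-party dependency
ts = datetime.now(ZoneInfo("Asia/Shanghai"))
print(ts.strftime("%Y%m%d_%H%M"))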

The session time zone is set with the configuration 'spark.sql.session.timeZone' and defaults to the JVM system's local time zone. You can change it; setting your target time zone will give you the proper date:
spark.conf.set("spark.sql.session.timeZone", "UTC")
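Putting that together with the original query - a sketch, assuming 'Asia/Shanghai' as the target zone: current_timestamp() is rendered in the session time zone when collected, so the strftime output follows it.
spark.conf.set("spark.sql.session.timeZone", "Asia/Shanghai")
# collected timestamps are now converted using the session time zone
ts = spark.sql("select current_timestamp() as ctime").collect()[0]["ctime"]
print(ts.strftime("%Y%m%d_%H%M"))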

Related

to_timestamp/unix_timestamp is unable to parse string datetime to timestamp in spark for daylight saving datetime

I am using Spark 2.4 and the below code to cast the string datetime column (rec_dt) in a dataframe (df1) to a timestamp (rec_date) and create another dataframe (df2).
All the datetime values are getting parsed correctly except for the values where there are daylight saving datetime values.
The time zone of my session is 'Europe/London'. I do not want to store the data in the UTC time zone; in the end I have to write the data in the 'Europe/London' time zone only.
spark_session.conf.get("spark.sql.session.timeZone")
# Europe/London
Code :
df2 = df1.withColumn("rec_date", to_timestamp("rec_dt","yyyy-MM-dd-HH.mm.ss"))
Please help.
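A minimal way to isolate the affected rows - a sketch, assuming the daylight-saving values come back as null after the cast (rec_dt and the pattern are taken from the question):
from pyspark.sql.functions import col, to_timestamp

df2 = df1.withColumn("rec_date", to_timestamp("rec_dt", "yyyy-MM-dd-HH.mm.ss"))
# strings that could not be parsed come back as null - check whether
# they all fall inside the DST transition window
df2.filter(col("rec_date").isNull()).select("rec_dt").show(truncate=False)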

Python convert a str date into a datetime with timezone object

In my Django project I have to convert a str variable passed as a date ("2021-11-10") to a timezone-aware datetime object in order to execute an ORM filter on a DateTime field.
In my db values are stored as for example:
2021-11-11 01:18:04.200149+00
I try:
# test date
df = "2021-11-11"
df = df + " 00:00:00+00"
start_d = datetime.strptime(df, '%Y-%m-%d %H:%M:%S%Z')
but I get an error saying the str format and the datetime representation are different.
How can I convert a single date string into a timezone-aware datetime object starting at midnight of the date value?
So many thanks in advance.
That's not the way datetime.strptime works here. Read a little bit more in the strptime documentation; I believe it will help you.
The format string has to match the input exactly, and the trailing '+00' offset is what trips it up.
Good luck.
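A sketch of one approach that should work, assuming Python 3.7+: pad the bare '+00' to a full offset and parse it with the lowercase %z directive, which matches numeric UTC offsets (unlike %Z, which expects a zone name):
from datetime import datetime

df = "2021-11-11"
# %z needs a complete offset such as '+00:00', so pad the bare '+00'
start_d = datetime.strptime(df + " 00:00:00+00:00", "%Y-%m-%d %H:%M:%S%z")
print(start_d)  # 2021-11-11 00:00:00+00:00, timezone-aware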

PyMongo/Python/Airflow - Convert CST Date/DateTime to UTC and store in MongoDB in ISO format?

Which is the correct, ideal, or preferred method to convert a CST date and/or datetime field to UTC, with DST-aware settings, and store it in MongoDB in ISO format in Python/PyMongo? The source date/datetime field can come from any timezone (right now we know it's CST); I need to convert all of them to UTC and store them in the target MongoDB.
As per MongoDB docs, MongoDB stores times in UTC by default, and will convert any local time representations into this form. Applications that must operate or report on some unmodified local time value may store the time zone alongside the UTC timestamp, and compute the original local time in their application logic.
Examples:
Method#1: with Timestamp (local timezone defined)
from datetime import datetime
import pytz
local_timezone = pytz.timezone("US/Central")
# note: %I (12-hour clock) pairs with %p; %H would silently ignore am/pm
utc_datetime = local_timezone.localize(datetime.strptime("1/2/2017 12:43 pm", '%m/%d/%Y %I:%M %p'), is_dst=True).astimezone(pytz.utc)
print(utc_datetime)
print(type(utc_datetime))
2017-01-02 18:43:00+00:00
<class 'datetime.datetime'>
Without a timestamp, i.e. just a date: it adds an offset of 6 hours to the timestamp (5 hours during DST). Removing astimezone(pytz.utc), it returns a date/time like 2017-01-02 00:00:00-06:00, i.e. showing the -6 hours offset difference. Should we really be using astimezone(pytz.utc)?
from datetime import datetime
import pytz
local_timezone = pytz.timezone("US/Central")
utc_datetime = local_timezone.localize(datetime.strptime("1/2/2017", '%m/%d/%Y'), is_dst=True).astimezone(pytz.utc)
print(utc_datetime)
print(type(utc_datetime))
2017-01-02 06:00:00+00:00
<class 'datetime.datetime'>
Method#2: with Timestamp (local timezone NOT defined)
from datetime import datetime, timezone
utc_datetime = datetime.utcfromtimestamp(datetime.strptime("1/2/2017 12:43 pm", '%m/%d/%Y %I:%M %p').replace(tzinfo=timezone.utc).timestamp())
print(utc_datetime)
print(type(utc_datetime))
2017-01-02 12:43:00
<class 'datetime.datetime'>
Without a timestamp, i.e. just the date part - no offset:
from datetime import datetime, timezone
utc_datetime = datetime.utcfromtimestamp(datetime.strptime("1/2/2017", '%m/%d/%Y').replace(tzinfo=timezone.utc).timestamp())
print(utc_datetime)
print(type(utc_datetime))
2017-01-02 00:00:00
<class 'datetime.datetime'>
After loading into MongoDB it adds a "Z" at the end of the date/timestamp. Should I also add tz_aware=True when initiating the connection with MongoClient?
ISO format: changing the above UTC timestamp with isoformat() returns a string, which gets loaded into MongoDB as a string instead of a Date. So how do we ensure it is still stored in ISO Date format in MongoDB?
utc_datetime_iso = datetime.utcfromtimestamp(datetime.strptime("1/2/2017", '%m/%d/%Y').replace(tzinfo=timezone.utc).timestamp()).isoformat()
print(utc_datetime_iso)
print(type(utc_datetime_iso))
2017-01-02T00:00:00
<class 'str'>
I have never worked with Python, so I can give only some general notes.
Never store date/time values as string, use proper Date object. Storing date/time values as strings is usually a design failure.
All Date values in MongoDB are stored in UTC - always and only. Some client applications implicitly convert UTC to local times and display local values; however, internally in MongoDB it is always UTC.
If you run db.collection.insertOne({ts: ISODate("2020-09-07T14:00:00+02:00")}) then MongoDB stores ISODate("2020-09-07T12:00:00Z"), the original time zone information is lost. If you need to preserve the original time zone, then you have to store it in a separate field.
ISODate is just an alias for new Date. However, there is a difference: if you don't specify any time zone (e.g. "2020-09-07T14:00:00") then new Date() assumes local time but ISODate() assumes UTC time. I don't know which method is internally used by Python.
So, new Date("2020-09-07T14:00:00") results in 2020-09-07 12:00:00Z whereas ISODate("2020-09-07T14:00:00") results in 2020-09-07 14:00:00Z
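To make that concrete on the Python side - a sketch, assuming PyMongo and a local MongoDB (database and collection names are placeholders): store a timezone-aware datetime, which PyMongo saves as a BSON Date in UTC, keep the original zone name in a separate field, and pass tz_aware=True to get aware datetimes back when reading.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", tz_aware=True)
coll = client.testdb.events

# aware datetime -> stored as a BSON Date in UTC; zone kept separately
coll.insert_one({"ts": datetime.now(timezone.utc), "tz": "US/Central"})

doc = coll.find_one()
print(doc["ts"])  # timezone-aware UTC datetime, not a string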

How to convert zulu datetime format to user defined time format

Hi, I have this DateTime format in our log: "2019-09-19T15:12:59.943Z".
I want to convert this to the custom DateTime format 2019-09-19 15:12:59.
from datetime import datetime
timestamp = "2019-09-19T15:12:59.943Z"
dt_object = datetime.fromtimestamp(timestamp)
print("dt_object =", dt_object)
print("type(dt_object) =", type(dt_object))
Which function shall I use for this? Thanks.
Okay. This issue is related to custom DateTime formatting, not to timestamps, because a timestamp in Python is a numeric value, not a string value.
So you have a DateTime string in Zulu (UTC) format, and you need to convert this Zulu format to your custom DateTime format.
Try this Python script; it works fine on Python 3.6:
import datetime
# parse the Zulu (UTC) string, then re-format it as desired
d = datetime.datetime.strptime("2019-09-19T15:12:59.943Z", "%Y-%m-%dT%H:%M:%S.%fZ")
new_format = "%Y-%m-%d %H:%M:%S"
print(d.strftime(new_format))
or you can use this online fiddle to check the result
https://pyfiddle.io/fiddle/c7b8e849-c31a-41ba-8bc9-5436d6faa4e9/?i=true
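Alternatively - a sketch, assuming Python 3.7+ - you can skip the format string entirely by rewriting the trailing 'Z' into an offset that datetime.fromisoformat understands:
from datetime import datetime

ts = "2019-09-19T15:12:59.943Z"
# fromisoformat() on 3.7-3.10 does not accept 'Z', so swap it for '+00:00'
d = datetime.fromisoformat(ts.replace("Z", "+00:00"))
print(d.strftime("%Y-%m-%d %H:%M:%S"))  # 2019-09-19 15:12:59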

change Unix(Epoch) time to local time in pyspark

I have a dataframe in Spark which contains Unix (epoch) times and also a timezone name. I want to convert the epoch time to local time according to the different tz names.
Here is how my data looks:
data = [
    (1420088400, 'America/New_York'),
    (1420088400, 'America/Los_Angeles'),
    (1510401180, 'America/New_York'),
    (1510401180, 'America/Los_Angeles')]
df = spark.createDataFrame(data, ["epoch_time", "tz_name"])
df.createOrReplaceTempView("df")
df1 = spark.sql("""select *, from_unixtime(epoch_time) as gmt_time,
    from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time
    from df""")
df1.show(truncate=False)
Here is the result:
+----------+-------------------+-------------------+---------------------+
|epoch_time|tz_name |gmt_time |local_time |
+----------+-------------------+-------------------+---------------------+
|1420088400|America/New_York |2015-01-01 05:00:00|2015-01-01 00:00:00.0|
|1420088400|America/Los_Angeles|2015-01-01 05:00:00|2014-12-31 21:00:00.0|
|1510401180|America/New_York |2017-11-11 11:53:00|2017-11-11 06:53:00.0|
|1510401180|America/Los_Angeles|2017-11-11 11:53:00|2017-11-11 03:53:00.0|
+----------+-------------------+-------------------+---------------------+
I'm not quite sure if this conversion is right, but it seems the daylight saving has been taken care of.
Should I first change the epoch time to a time string using from_unixtime, then change it to a UTC timestamp using to_utc_timestamp, and finally change that UTC timestamp to local time with tz_name? I tried this but got an error:
df2 = spark.sql("""select *, from_unixtime(epoch_time) as gmt_time,
from_utc_timestamp(from_unixtime(epoch_time), tz_name) as local_time,
from_utc_timestamp(to_utc_timestamp(from_unixtime(epoch_time),from_unixtime(unix_timestamp(), 'z')), tz_name) as newtime from df""")
How can I check my EMR server's timezone? I tried the following - is this the server timezone?
spark.sql("select from_unixtime(unix_timestamp(), 'z')").show()
which gave me:
+--------------------------------------------------------------------------+
|from_unixtime(unix_timestamp(current_timestamp(), yyyy-MM-dd HH:mm:ss), z)|
+--------------------------------------------------------------------------+
| UTC|
+--------------------------------------------------------------------------+
Thank you for your clarification.
When you call from_unixtime it will format the date based on your Java runtime's time zone, since it just uses the default time zone for SimpleDateFormat here. In your case that is UTC. So to convert the values to local time you only need to call from_utc_timestamp with the tz_name value passed in. However, if you were to change your system time zone, then you would need to call to_utc_timestamp first.
Spark 2.2 introduced a time zone setting, so you can set the time zone for your SparkSession like so:
spark.conf.set("spark.sql.session.timeZone", "GMT")
In that case the time functions will use GMT instead of your system time zone; see the source here.
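For completeness - a sketch of the same conversion through the DataFrame API, assuming Spark 2.4+, where from_utc_timestamp accepts the time zone as a column:
from pyspark.sql.functions import col, from_unixtime, from_utc_timestamp

df1 = (df
       .withColumn("gmt_time", from_unixtime("epoch_time"))
       .withColumn("local_time",
                   from_utc_timestamp(from_unixtime("epoch_time"), col("tz_name"))))
df1.show(truncate=False)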
