Getting Millisecond in Hive timestamp with offset Timezone - string

I want to convert timestamp strings in several formats to a millisecond-precision timestamp in Hive.
Currently I'm able to convert a string to the correct timestamp using the code below, but I want to store a TIMESTAMP parsed from the format YYYYMMDD-HH:MM:SS[.sss][Z | [ + | - hh[:mm]]], where:
YYYY = 0000 to 9999
MM = 01-12
DD = 01-31
HH = 00-23 hours
MM = 00-59 minutes
SS = 00-59 seconds
sss = milliseconds
hh = 01-12 offset hours
mm = 00-59 offset minutes
Example: 20060901-02:39-05 is five hours behind UTC, thus Eastern Time on 1st of September 2006. The resulting timestamp will be in the yyyy-MM-dd HH:mm:ss.SSS format.
What I have for UTC timestamp of YYYYMMDD-HH:MM:SS.sss is as follows:
cast(concat(concat_ws('-',substr(tag[52],1,4), substr(tag[52],5,2), substr(tag[52],7,2)),
space(1),
concat_ws(':',substr(tag[52],10,2), substr(tag[52],13,2), substr(tag[52],16,2)),
'.', substr(tag[52],19,3)) AS TIMESTAMP)
This takes a tag and builds a string from its parts that can be cast to the Timestamp datatype, resulting in yyyy-MM-dd HH:mm:ss.SSS.
I would like something similar that produces a Timestamp in Hive from the variant that carries an offset.
Is this even possible?
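As a way to check what the Hive expression should produce, the format above can be sketched in Python. This is not Hive code: the regular expression and the names to_utc and format_ms are assumptions made for illustration. Seconds are treated as optional because the example 20060901-02:39-05 omits them, and a missing offset is assumed to mean UTC.

```python
import re
from datetime import datetime, timedelta, timezone

# YYYYMMDD-HH:MM[:SS[.sss]][Z | +hh[:mm] | -hh[:mm]]
_PATTERN = re.compile(
    r"^(\d{4})(\d{2})(\d{2})-(\d{2}):(\d{2})(?::(\d{2})(?:\.(\d{1,3}))?)?"
    r"(Z|[+-]\d{2}(?::?\d{2})?)?$"
)


def to_utc(text):
    """Parse the string and return an aware datetime normalised to UTC."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("unrecognised timestamp: " + repr(text))
    year, month, day, hour, minute = (int(g) for g in match.groups()[:5])
    second = int(match.group(6) or 0)
    # Pad ".5" to "500" so short fractions are read as milliseconds.
    millis = int((match.group(7) or "0").ljust(3, "0"))
    zone = match.group(8)
    if zone in (None, "Z"):
        offset = timedelta(0)  # assumption: no offset means UTC
    else:
        sign = -1 if zone[0] == "-" else 1
        digits = zone[1:].replace(":", "")
        offset = sign * timedelta(hours=int(digits[:2]), minutes=int(digits[2:] or 0))
    local = datetime(year, month, day, hour, minute, second, millis * 1000,
                     tzinfo=timezone(offset))
    return local.astimezone(timezone.utc)


def format_ms(moment):
    """Render as yyyy-MM-dd HH:mm:ss.SSS."""
    return moment.strftime("%Y-%m-%d %H:%M:%S.") + "{:03d}".format(moment.microsecond // 1000)


print(format_ms(to_utc("20060901-02:39-05")))
```

The same steps (split the fields, read the sign and size of the offset, shift to UTC) are what a Hive expression would have to reproduce with substr and arithmetic on the offset part.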

Non-standard Julian day time stamp

I have a timestamp in a non-standard format; it's a concatenation of a number of elements. I'd like to convert at least the last part of the string into hours/minutes/seconds/decimal seconds so I can calculate the time gap between two timestamps (typically of the order of 2-5 seconds).
I have looked at the question "How to convert Julian date to standard date?", but it assumes a 'proper' Julian time.
My time stamp looks like this
1380643373
It is set up as ddd hh mm ss.s
This timestamp represents the 138th day, 06:43:37.3
Is there a datetime method of working with this, or do I need to strip out the various parts (hh, mm, ss.s) and concatenate them in some way? As I am only interested in the seconds, if I can just extract them I could deal with that by adding 60 if the second timestamp is smaller than the first, i.e. when the event passes over a minute boundary.
If you're only interested in seconds, you can do:
timestamp = 1380643373
seconds = (timestamp % 1000) / 10 # Gives 37.3
timestamp % 1000 gives you the last three digits of timestamp. Then you divide that by 10 to get seconds.
If it's a string, you can take the last three characters by slicing it.
timestamp = "1380643373"
seconds = int(timestamp[-3:]) / 10 # Gives 37.3
It's pretty easy to convert the timestamp to a datetime using the divmod() function repeatedly:
import datetime
base_date = datetime.datetime(2000, 1, 1, 0, 0, 0) # Midnight on Jan 1 2000
timestamp = 1380643373
timestamp, seconds = divmod(timestamp, 1000) # Gives 1380643, 373
seconds = seconds / 10 # Gives 37.3
timestamp, minutes = divmod(timestamp, 100) # Gives 13806, 43
days, hours = divmod(timestamp, 100) # Gives 138, 6
tdelta = datetime.timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds) # Gives datetime.timedelta(days=138, seconds=24217, microseconds=300000)
new_date = base_date + tdelta
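Since the goal in the question is the gap between two stamps, the same divmod() steps can be wrapped in a helper and the results subtracted, which avoids handling the minute boundary by hand. A minimal sketch; parse_stamp and the second sample value 1380644012 are made up for illustration.

```python
from datetime import timedelta


def parse_stamp(stamp):
    """Turn a 'ddd hh mm ss.s' stamp such as 1380643373 into a timedelta."""
    rest, tenths = divmod(int(stamp), 10)   # last digit: tenths of a second
    rest, seconds = divmod(rest, 100)
    rest, minutes = divmod(rest, 100)
    days, hours = divmod(rest, 100)
    return timedelta(days=days, hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=tenths * 100)


first = parse_stamp(1380643373)    # day 138, 06:43:37.3
second = parse_stamp("1380644012") # day 138, 06:44:01.2 (hypothetical second event)
gap = (second - first).total_seconds()
print(gap)  # about 23.9 seconds, across the minute change
```

Because the subtraction works on full timedeltas, hour and day rollovers are handled the same way as the minute rollover.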

convert the h:m:s in minutes format

I have the following data. The idea is to multiply all the data.
However, the minute column is in h:m:s format, so whenever I try to multiply I get an error.
Moreover, I need to convert the h:m:s values to minutes before I can multiply.
I tried the following to convert the column to minutes:
time1 = df['time']
time2 = time1.hour * 60 + time1.minute + time1.second
Create timedeltas with to_timedelta, convert them to seconds with Series.dt.total_seconds, and divide by 60:
df['Minutes'] = pd.to_timedelta(df['(MIN)']).dt.total_seconds().div(60)
If the input values are Python time objects, convert them to strings first:
df['Minutes'] = pd.to_timedelta(df['(MIN)'].astype(str)).dt.total_seconds().div(60)
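The arithmetic behind that one-liner can be sketched without pandas, which may help when checking individual values. This assumes each value is an 'h:m:s' string; hms_to_minutes is a name invented for the example.

```python
from datetime import timedelta


def hms_to_minutes(text):
    """Convert an 'h:m:s' string to a number of minutes (with fraction)."""
    hours, minutes, seconds = (float(part) for part in text.split(":"))
    duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return duration.total_seconds() / 60


print(hms_to_minutes("1:30:30"))  # 90.5
print(hms_to_minutes("0:02:15"))  # 2.25
```

Note that the seconds contribute a fraction of a minute, which is the part the attempt in the question (hour * 60 + minute + second) gets wrong.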

How to Convert POSIX time as regular date and time in Spark 2?

I am new to Spark and PySpark, and have just started with Spark 2.0. I am trying to convert a timestamp from a server (in POSIX/Unix format) into a regular date and time (such as yyyy-mm-dd plus the time), but have been unable to do so. I have used the following two commands:
df_new = df.withColumn('fromTimestamp', f.from_unixtime(df['timestamp'], 'yyyy-mm-dd HH:mm:ss'))
and
df.select("timestamp",from_unixtime(f.col("timestamp"))).show()
where f is alias for pyspark.sql.functions API. They both produce the following result:
+--------------------+--------------------+
|#RequiredResult     |ActualResult        |
+--------------------+--------------------+
|2020-06-01 00:00:03 |52385-52-27 00:52:14|
|2020-06-01 00:00:02 |52385-35-27 00:35:19|
+--------------------+--------------------+
Furthermore, I want to aggregate the data into time intervals (of 30 min or 60 min duration). Any leads on how to do it?
The Unix timestamp is defined as the number of seconds since 1 January 1970 (UTC). However, some systems use the number of milliseconds since this date, producing values 1000 times higher.
For example, the date 2020-06-01 00:00:03 would be represented by the timestamp 1590962403. If the timestamp 1590962403000 were used instead, this would result in a date in the year 52385:
spark.sql("""select from_unixtime(1590962403) as seconds,
from_unixtime(1590962403000) as ms""")\
.show(truncate=False)
prints
+-------------------+---------------------+
|seconds |ms |
+-------------------+---------------------+
|2020-06-01 00:00:03|+52385-08-04 18:50:00|
+-------------------+---------------------+
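So you should divide the timestamp column by 1000 before applying from_unixtime. Note also that in the format string 'yyyy-mm-dd HH:mm:ss' the lowercase mm means minutes, which is why the month position in ActualResult repeats the minute value (52 and 35); use 'yyyy-MM-dd HH:mm:ss' to get the month.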
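The division by 1000, and the 30 or 60 minute grouping asked about in the question, can be illustrated in plain Python. This is a sketch of the arithmetic only, not Spark code; from_unix_ms and bucket_start are hypothetical helper names. It prints UTC, so the wall-clock value differs from the session-local output shown above.

```python
from datetime import datetime, timezone


def from_unix_ms(ms):
    """Interpret a millisecond Unix timestamp as a UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def bucket_start(ms, minutes=30):
    """Floor a millisecond timestamp to the start of its interval."""
    width = minutes * 60 * 1000
    return from_unix_ms(ms - ms % width)


stamp = 1590964203000
print(from_unix_ms(stamp))              # 2020-05-31 22:30:03+00:00
print(bucket_start(stamp, minutes=30))  # 2020-05-31 22:30:00+00:00
print(bucket_start(stamp, minutes=60))  # 2020-05-31 22:00:00+00:00
```

Grouping rows by the floored value then gives one group per interval; the same floor-by-modulo idea carries over to a column expression.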

How to extract or validate date format from a text using python?

I'm trying to execute this code:
import datefinder
string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises, having beaten the trade Consensus EPS estimate in each of the last four quarters. In its last earnings report on May 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the consensus revenue estimate by 4.93%.'
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    print(match)
The output is:
2009-04-30 00:00:00
2005-09-01 00:00:00
2018-05-08 00:00:00
2019-02-04 00:00:00
The last date comes from the percentage value 4.93%. How can I avoid this?
I cannot fix the datefinder module issue. You stated that you needed a solution, so I put this together for you. It's a work in progress, which means that you can adjust it as needed. Also, some of the regexes could have been consolidated, but I wanted to break them out for you. Hopefully, this answer helps you until you find another solution that works better for your needs.
import re
string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises having beaten the trade Consensus EPS estimate in each of the last ' \
                    'four quarters In its last earnings report on March 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the ' \
                    'consensus revenue estimate by 4.93%. The next trading day will occur at 2019-02-15T12:00:00-06:30'
def find_dates(text):
    '''
    This function is used to extract date strings from the provided text.
    Symbol references:
    YYYY = four-digit year
    MM = two-digit month (01=January, etc.)
    DD = two-digit day of month (01 through 31)
    hh = two digits of hour (00 through 23) (am/pm NOT allowed)
    mm = two digits of minute (00 through 59)
    ss = two digits of second (00 through 59)
    s = one or more digits representing a decimal fraction of a second
    TZD = time zone designator (Z or +hh:mm or -hh:mm)
    :param text: text to search
    :return: None; the first match of each pattern is printed
    '''
    # Raw strings keep the regex backslashes intact
    date_formats = [
        # Matches date format MM/DD/YYYY
        r'(\d{2}\/\d{2}\/\d{4})',
        # Matches date format MM-DD-YYYY
        r'(\d{2}-\d{2}-\d{4})',
        # Matches date format YYYY/MM/DD
        r'(\d{4}\/\d{1,2}\/\d{1,2})',
        # Matches ISO 8601 format (YYYY-MM-DD)
        r'(\d{4}-\d{1,2}-\d{1,2})',
        # Matches ISO 8601 format YYYYMMDD
        r'(\d{4}\d{2}\d{2})',
        # Matches full_month_name dd, YYYY or full_month_name dd[suffixes], YYYY
        r'(January|February|March|April|May|June|July|August|September|October|November|December)(\s\d{1,2}\W\s\d{4}|\s\d(st|nd|rd|th)\W\s\d{4})',
        # Matches abbreviated_month_name dd, YYYY or abbreviated_month_name dd[suffixes], YYYY
        r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec)(\s\d{1,2}\W\s\d{4}|\s\d(st|nd|rd|th)\W\s\d{4})',
        # Matches ISO 8601 format with time and time zone
        # yyyy-mm-ddThh:mm:ss+|-hh:mm
        r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\+|-)\d{2}:\d{2}',
        # Matches ISO 8601 format Datetime with timezone
        # yyyymmddThhmmssZ
        r'\d{8}T\d{6}Z',
        # Matches ISO 8601 format Datetime with timezone
        # yyyymmddThhmmss+|-hhmm
        r'\d{8}T\d{6}(\+|-)\d{4}'
    ]
    for item in date_formats:
        # Word boundaries stop a pattern from matching inside a longer token
        date_format = re.compile(r'\b{}\b'.format(item), re.IGNORECASE | re.MULTILINE)
        find_date = re.search(date_format, text)
        if find_date:
            print(find_date.group(0))
find_dates(string_with_dates)
# outputs
04/30/2009
March 8, 2018
Sept 1st, 2005
2019-02-15T12:00:00-06:30
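The regexes above only check the shape of a string, so something like 02/30/2009 would still pass. One way to validate a candidate, sketched here as an assumption rather than part of the original answer, is to hand it to datetime.strptime and treat a ValueError as "not a date". The format list and the name parse_date are illustrative, and %B depends on the active locale.

```python
from datetime import datetime

# Formats to try, in order; extend as needed (illustrative list).
CANDIDATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y")


def parse_date(text):
    """Return a datetime if text is a real date in a known format, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong shape or impossible date; try the next format
    return None


print(parse_date("04/30/2009"))  # 2009-04-30 00:00:00
print(parse_date("02/30/2009"))  # None: February has no 30th
print(parse_date("4.93%"))       # None
```

Running each regex match through a check like this filters out strings that look like dates but are not.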

Subtracting minutes from a timestamp in msexcel

I am extracting a timestamp from a cell in a format like Tue Nov 06 07:33:00 UTC 2018. Now, using VBScript or VBA code, I want to subtract some minutes from that timestamp. How can I achieve that?
split(split("Tue Nov 06 07:33:00 UTC 2018",":")(1),":")(0)
This will give you the minutes on their own. You can assign the splits to arrays, then rejoin them with the corrected minutes using Join.
Something like so
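Public Function addToTimeStamp(strTimeStamp As String, lngMinutes As Long) As String
    Dim a() As String
    a = Split(strTimeStamp, " ")
    ' a(3) holds the time part; pass a negative lngMinutes to subtract.
    ' Format keeps the hh:mm:ss layout regardless of regional settings.
    ' The date part is not adjusted if the time crosses midnight.
    a(3) = Format(DateAdd("n", lngMinutes, CDate(a(3))), "hh:nn:ss")
    addToTimeStamp = Join(a, " ")
End Function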
Without going into VBA, you could separate the time from the date, then work on the time and adjust the date accordingly.
This assumes your timestamps are in column A, starting from row 2, and always have the same structure.
First, create a column that holds only the time, here column B:
=MID(A2,FIND("UTC",A2)-9,8)
07:33:00
This first finds the position of "UTC" within the string, then extracts 8 characters starting 9 characters to the left (accounting for the space between the time and "UTC").
From there you can already work on the hours, minutes and seconds.
Hours:
=NUMBERVALUE(LEFT(B2,2))
07
Minutes:
=NUMBERVALUE(MID(B2,4,2))
33
Seconds:
=NUMBERVALUE(RIGHT(B2,2))
00
You can also extract the date part of the time stamp using the same logic. Column C:
=MID(A2,FIND(B2,A2)-11,10)
Tue Nov 06
Finally, you can combine all of that into an Excel date and do your operations directly on the resulting number. This ensures that you get a valid new date, with hours, days, months and years rolling over correctly and leap years accounted for.
Final date, including the time stamp, Column D:
=DATEVALUE(MID(A2,FIND(B2,A2)-3,2)&"-"&MID(A2,FIND(B2,A2)-7,3)&"-"&RIGHT(A2,4))+NUMBERVALUE(LEFT(B2,2))/24+NUMBERVALUE(MID(B2,4,2))/(24*60)+NUMBERVALUE(RIGHT(B2,2))/(24*60*60)
43410.3145833333
You can add or subtract minutes directly on this final number. Its unit is days, so one minute is equal to 1/(24*60) and one hour is 1/24. Example of removing 33 minutes:
=D2 - 33/(24*60)
43410.2916666667
Changing the cell format to dd/mm/yyyy hh:mm:ss will then result in:
06/11/2018 07:00:00
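For comparison, the same parse, shift and re-format sequence can be sketched outside Excel. This Python version is only an illustration of the logic, not VBA; shift_minutes and excel_serial are names invented here, and the day and month names are assumed to be English.

```python
from datetime import datetime, timedelta

# Layout of "Tue Nov 06 07:33:00 UTC 2018"; "UTC" is matched literally.
STAMP_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def shift_minutes(stamp, minutes):
    """Add minutes (negative to subtract) and return the same layout."""
    moment = datetime.strptime(stamp, STAMP_FORMAT) + timedelta(minutes=minutes)
    return moment.strftime(STAMP_FORMAT)


def excel_serial(stamp):
    """Excel day number of the stamp, counted from 1899-12-30."""
    moment = datetime.strptime(stamp, STAMP_FORMAT)
    return (moment - datetime(1899, 12, 30)).total_seconds() / 86400


print(shift_minutes("Tue Nov 06 07:33:00 UTC 2018", -33))
# Tue Nov 06 07:00:00 UTC 2018
print(shift_minutes("Tue Nov 06 00:10:00 UTC 2018", -33))
# Mon Nov 05 23:37:00 UTC 2018
print(round(excel_serial("Tue Nov 06 07:33:00 UTC 2018"), 6))
# 43410.314583
```

The second call shows the rollover to the previous day, which is the case the date-serial approach handles automatically.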
