Convert Excel Serial Date in Log Parser

I'm converting Excel serial dates (OA dates) in Log Parser.
I have this query:
SELECT TOP 10
Field1 as ExcelSerialDate, /* days since 1900-01-01 */
to_timestamp('1900-01-01 00:00:00', 'yyyy-MM-dd hh:mm:ss') as BaseDate,
mul(sub(to_real(Field1),2.0),86400.0) as SecondsFromBase,
add(to_localtime(to_timestamp('1900-01-01 00:00:00', 'yyyy-MM-dd hh:mm:ss')),
to_timestamp(mul(sub(to_real(Field1),2.0),86400.0))) as Date
FROM 'MyData.txt'
where MyData.txt just contains an Excel serial date per line:
ExcelSerialDate
42397.6668676968
42397.6663989236
42397.664126875
42397.6668321065
42397.6668733565
42397.6668907523
42397.6668711921
42397.6657181597
42397.666233044
42397.6654758681 ...
It gives this output:
ExcelSerialDate BaseDate SecondsFromBase Date
42397.6668676968 01/01/00 12:00:00 AM 3662985617.369 28/01/16 04:00:17 PM
42397.6663989236 01/01/00 12:00:00 AM 3662985576.867 28/01/16 03:59:36 PM
42397.664126875 01/01/00 12:00:00 AM 3662985380.562 28/01/16 03:56:20 PM
42397.6668321065 01/01/00 12:00:00 AM 3662985614.294 28/01/16 04:00:14 PM
42397.6668733565 01/01/00 12:00:00 AM 3662985617.858 28/01/16 04:00:17 PM
42397.6668907523 01/01/00 12:00:00 AM 3662985619.361 28/01/16 04:00:19 PM
42397.6668711921 01/01/00 12:00:00 AM 3662985617.671 28/01/16 04:00:17 PM
42397.6657181597 01/01/00 12:00:00 AM 3662985518.049 28/01/16 03:58:38 PM
42397.666233044 01/01/00 12:00:00 AM 3662985562.535 28/01/16 03:59:22 PM
42397.6654758681 01/01/00 12:00:00 AM 3662985497.115 28/01/16 03:58:17 PM
which is correct.
Is this the best way? I have to subtract 2 days from the serial date before converting it to a timestamp and adding it to the base date of 1900-01-01, which seems counterintuitive.
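The 2-day offset follows from how Excel numbers days. Serial 1 is 1900-01-01, and Excel also counts a non-existent 29 February 1900 (kept for Lotus 1-2-3 compatibility). For dates from 1 March 1900 onward, a serial value is therefore simply the number of days since 1899-12-30, the epoch used by OLE Automation dates, so adding the serial to that epoch needs no correction. A minimal Python sketch of the idea, assuming time-zone-free serials of 61 or more (excel_serial_to_datetime is a hypothetical helper name):

```python
from datetime import datetime, timedelta

# Epoch of OLE Automation dates. Counting from 1899-12-30 absorbs both
# "serial 1 = 1900-01-01" and Excel's phantom 1900-02-29, so it is only
# correct for serial 61 (1900-03-01) and later.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial date (days plus a fraction of a day) to a naive datetime."""
    if serial < 61:
        raise ValueError("serials below 61 are shifted by Excel's phantom 1900-02-29")
    return EXCEL_EPOCH + timedelta(days=serial)


serial = 42397.6668676968
print(excel_serial_to_datetime(serial))                   # epoch 1899-12-30, no offset
print(datetime(1900, 1, 1) + timedelta(days=serial - 2))  # the query's approach
```

Both lines give 2016-01-28 16:00:17 and a fraction of a second, matching the Log Parser output. Whether Log Parser's to_timestamp accepts an 1899 base date, which would let the query drop the sub(..., 2.0), has not been checked here.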

Related

Extract YYYY-MM-DD HH:MM:SS and convert to different time zone

I am exploring date formats and trying to convert between them. Currently, I'm stuck in a scenario where I have input dates and times as below:
I was able to combine the date and time into a single string using concatenation:
concat_ws(' ',new_df.transaction_date,new_df.Transaction_Time)
When I then try
withColumn("date_time2", F.to_date(col('date_time'), "MMM d yyyy hh:mmaa")) with ('spark.sql.legacy.timeParserPolicy','LEGACY'),
the result is displayed as 'undefined'.
I am looking for pointers/code snippets to extract YYYY-MM-DD HH:MM:SS in CET (the input is in PST), as below:
input_date_time               output (in CET)
Mar 1, 2022 01:00:00 PM PST   2022-03-01 22:00:00
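Parse the PST string into a timestamp, which is displayed in UTC here, then convert it to "CET" time:
import pyspark.sql.functions as F
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.session.timeZone", "UTC")  # the output below assumes a UTC session
df = spark.createDataFrame(data=[["Mar 1, 2022 01:00:00 PM PST"]], schema=["input_date_time_pst"])
df = df.withColumn("input_date_time_pst", F.to_timestamp("input_date_time_pst", format="MMM d, yyyy hh:mm:ss a z"))  # honours the "PST" zone name
df = df.withColumn("output_cet", F.from_utc_timestamp("input_date_time_pst", "CET"))  # render the UTC instant as CET wall-clock time
df.show(truncate=False)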
[Out]:
+-------------------+-------------------+
|input_date_time_pst|output_cet |
+-------------------+-------------------+
|2022-03-01 21:00:00|2022-03-01 22:00:00|
+-------------------+-------------------+
Note: the value 2022-03-01 21:00:00 above is Mar 1, 2022 01:00:00 PM PST expressed in UTC.
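To check the expected values outside Spark, here is a plain-Python sketch of the same PST-to-CET conversion. It assumes every input ends in the literal abbreviation PST and uses fixed offsets (PST is UTC-8 and CET is UTC+1 by definition), so it does not switch to PDT or CEST during daylight saving time; pst_string_to_cet is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets. PST is UTC-8 and CET is UTC+1 by definition,
# so no daylight-saving switch (PDT/CEST) is applied.
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")


def pst_string_to_cet(text: str) -> str:
    """Turn 'Mar 1, 2022 01:00:00 PM PST' into 'YYYY-MM-DD HH:MM:SS' in CET."""
    stamp, zone = text.rsplit(" ", 1)  # split off the trailing zone abbreviation
    if zone != "PST":
        raise ValueError(f"expected a PST timestamp, got {zone!r}")
    naive = datetime.strptime(stamp, "%b %d, %Y %I:%M:%S %p")
    aware = naive.replace(tzinfo=PST)  # interpret the wall-clock time as PST
    return aware.astimezone(CET).strftime("%Y-%m-%d %H:%M:%S")


print(pst_string_to_cet("Mar 1, 2022 01:00:00 PM PST"))  # 2022-03-01 22:00:00
```

For daylight-saving-aware results, the fixed offsets could be replaced with zoneinfo.ZoneInfo("America/Los_Angeles") and ZoneInfo("Europe/Paris"), which requires time-zone data on the machine.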

Pandas - converting out of order string date time

I have a DataFrame column with string values for date/time (Input data). I need to convert it into a semi-timestamp format (Desired output data). Some rows are blank and need to remain blank; the quotes below are only there to show the blank value. I am using strptime but get an error (see below).
Input data (String):
Mar 8 12:00 PM 2020
' '
Mar 8 1:00 PM 2020
Mar 8 6:00 PM 2020
Mar 9 8:00 AM 2020
Desired output data:
3/8/2020 12:00:00
' '
3/8/2020 13:00:00
3/8/2020 18:00:00
3/9/2020 08:00:00
Code:
import datetime as dt
df['date'].apply(lambda x: dt.datetime.strptime(x, '%b %d %H:%M %p %Y'))
Error:
ValueError: time data '' does not match format '%b %d %H:%M %p %Y'
How can I rewrite this code to get the desired output?
pd.to_datetime works for me with a format similar to yours, but using %I to parse hours in 12-hour format. I also added errors='coerce' so that missing or non-matching values become NaT:
df['date'] = pd.to_datetime(df['date'], format='%b %d %I:%M %p %Y', errors='coerce')
print (df)
date
0 2020-03-08 12:00:00
1 NaT
2 2020-03-08 13:00:00
3 2020-03-08 18:00:00
4 2020-03-09 08:00:00
Finally, for the custom output format, use Series.dt.strftime and fill the missing values with an empty string:
df['date'] = (pd.to_datetime(df['date'], format='%b %d %I:%M %p %Y', errors='coerce')
.dt.strftime('%m/%d/%y %H:%M:%S')
.fillna(''))  # strftime leaves missing dates as NaN
print (df)
date
0 03/08/20 12:00:00
1
2 03/08/20 13:00:00
3 03/08/20 18:00:00
4 03/09/20 08:00:00
Or, if some strings contain runs of several spaces, collapse them to one space first:
df['date'] = (pd.to_datetime(df['date'].replace(r'\s+', ' ', regex=True), format='%b %d %I:%M %p %Y', errors='coerce')
.dt.strftime('%m/%d/%y %H:%M:%S')
.fillna(''))
print (df)
date
0 03/08/20 12:00:00
1
2 03/08/20 13:00:00
3 03/08/20 18:00:00
4 03/09/20 08:00:00
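The answers above produce 03/08/20-style strings. If the exact desired output (3/8/2020 13:00:00, no zero padding, four-digit year) is needed, one option is a plain-Python function applied per value. This is a sketch assuming the column holds strings; to_semi_timestamp is a hypothetical name, and blank or whitespace-only strings are returned unchanged.

```python
from datetime import datetime


def to_semi_timestamp(value: str) -> str:
    """Convert 'Mar 8 1:00 PM 2020' to '3/8/2020 13:00:00'; blank strings stay as they are."""
    cleaned = " ".join(value.split())  # collapse runs of spaces and trim the ends
    if not cleaned:
        return value  # keep blank rows blank
    parsed = datetime.strptime(cleaned, "%b %d %I:%M %p %Y")  # %I = 12-hour clock
    # Build month/day without zero padding; strftime's %-m/%-d are not portable
    return f"{parsed.month}/{parsed.day}/{parsed.year} {parsed:%H:%M:%S}"


for raw in ["Mar 8 12:00 PM 2020", " ", "Mar 8 1:00 PM 2020", "Mar 9 8:00 AM 2020"]:
    print(repr(to_semi_timestamp(raw)))
```

With pandas it would be applied as df['date'] = df['date'].apply(to_semi_timestamp).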

Pandas strftime with 24 hour format

QUESTION
How can I convert 24-hour time to 12-hour time when the time provided is only two characters long? For example, how do I format 45 as 12:45 AM?
ATTEMPT
I can get most of the time conversions to format properly with the following:
df=df.assign(newtime=pd.to_datetime(df['Time Occurred'], format='%H%M').dt.strftime("%I:%M %p"))
df.head()
Date Reported Date Occurred Time Occurred newtime
9/13/2010 9/12/2010 45 4:05 AM
8/9/2010 8/9/2010 1515 3:15 PM
1/8/2010 1/7/2010 2005 8:05 PM
1/9/2010 1/6/2010 2100 9:00 PM
1/15/2010 1/15/2010 245 2:45 AM
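In the output above, the newtime values are formatted correctly, except where the input time is "45", which gave 4:05 AM instead of 12:45 AM. How can I get the correct output?
Using to_datetime: split each value into hours and minutes with divmod by 100, then build zero-padded HH:MM:SS strings: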
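import pandas as pd

# 45 -> (0, 45) and 1515 -> (15, 15): hundreds are hours, the remainder is minutes
hours, minutes = divmod(df['Time Occurred'].astype(int), 100)
times = pd.to_datetime([f'{h:02d}:{m:02d}:00' for h, m in zip(hours, minutes)])
df.assign(newtime=times.strftime('%I:%M %p'))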
Time Occurred newtime
0 45 12:45 AM
1 1515 03:15 PM
2 2005 08:05 PM
3 2100 09:00 PM
4 245 02:45 AM
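The 4:05 AM result comes from the format '%H%M': with only two digits available, "45" is read as hour 4 and minute 5. Splitting with divmod(value, 100) removes that ambiguity. A small standard-library sketch of the same idea for a single value (hhmm_to_12h is a hypothetical helper):

```python
from datetime import time
from typing import Union


def hhmm_to_12h(value: Union[int, str]) -> str:
    """Format an HHMM clock value such as 45, 245 or 1515 as a 12-hour time string."""
    hours, minutes = divmod(int(value), 100)  # 45 -> (0, 45), 1515 -> (15, 15)
    # time() raises ValueError for impossible values such as 2460 or 1275
    return time(hours, minutes).strftime("%I:%M %p")


for raw in [45, 1515, 2005, 2100, 245]:
    print(raw, hhmm_to_12h(raw))
```

A vectorised pandas alternative is to zero-pad first, e.g. pd.to_datetime(df['Time Occurred'].astype(str).str.zfill(4), format='%H%M'), so that 45 becomes 0045 before parsing.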

Converting a date with a time zone to a UNIX timestamp in Shell/Bash

I need to convert a date string in the format "yyyy/mm/dd hh:mm:ss TZ" to UNIX time (TZ = time zone).
So far I can convert a date in the format "yyyy/mm/dd hh:mm:ss", without a time zone, to a timestamp using
dateYMD="2019/2/28 12:23:11.46"
newt=$(date -d "${dateYMD}" +"%s")
echo ${newt}
and I have the following result.
1551349391
What I can't work out is how to convert a date/time together with a time zone into a timestamp (Unix time). For example, I need 4 variables with the same date/time as dateYMD but in 4 different time zones, so that their timestamps differ.
Here is the latest I have tried:
dateYMD="2017/09/09 08:58:09"
timez=$(TZ=Australia/Sydney date -d @$(date +%s -d "${dateYMD}"))
unixTimez=$( date --date "${timez}" +"%s" )
echo ${unixTimez}
which gives the following error:
date: invalid date ‘чт фев 28 21:23:11 AEDT 2019’
(The date string is in the Russian locale; in English it reads "Thu Feb 28 21:23:11 AEDT 2019".)
You don't need to call date twice. Just call it once with TZ set to the timezone you want for that variable.
timesydney=$(TZ=Australia/Sydney date -d "$dateYMD" +%s)
timenyc=$(TZ=US/Eastern date -d "$dateYMD" +%s)
Either set the TZ= environment variable (see Barmar's answer), or include the time zone in the date string itself. A zone in the string takes priority over TZ=.
Examples:
TZ=UTC date -d '2019-01-01 12:00 CET' +'%s, %F %T %Z %z'
TZ=CET date -d '2019-01-01 12:00 CET' +'%s, %F %T %Z %z'
TZ=UTC date -d '2019-01-01 12:00 PDT' +'%s, %F %T %Z %z'
TZ=CET date -d '2019-01-01 12:00 PDT' +'%s, %F %T %Z %z'
TZ=UTC date -d '2019-01-01 12:00 +500' +'%s, %F %T %Z %z'
will print
1546340400, 2019-01-01 11:00:00 UTC +0000
1546340400, 2019-01-01 12:00:00 CET +0100
1546369200, 2019-01-01 19:00:00 UTC +0000
1546369200, 2019-01-01 20:00:00 CET +0100
1546326000, 2019-01-01 07:00:00 UTC +0000
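The same idea, one wall-clock time plus an explicit zone giving different Unix times, can be sketched in Python with numeric UTC offsets. This sketch assumes the yyyy/mm/dd hh:mm:ss layout without fractional seconds; to_unix is a hypothetical helper, and named zones such as Australia/Sydney would need zoneinfo.ZoneInfo plus system time-zone data rather than fixed offsets.

```python
from datetime import datetime


def to_unix(date_str: str, utc_offset: str) -> int:
    """Parse 'yyyy/mm/dd hh:mm:ss' plus a numeric offset such as '+0100' into Unix time."""
    aware = datetime.strptime(f"{date_str} {utc_offset}", "%Y/%m/%d %H:%M:%S %z")
    return int(aware.timestamp())  # an aware datetime ignores the machine's local zone


date_ymd = "2019/01/01 12:00:00"
# Hypothetical choice of four offsets; each one yields a different Unix time
for offset in ["+0100", "-0700", "+0500", "+0000"]:
    print(offset, to_unix(date_ymd, offset))
```

For +0100, -0700 and +0500 this gives 1546340400, 1546369200 and 1546326000, the same values as the date examples above for CET, PDT and +500.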

Finding out total worked hours and minutes formula

FROM     TO       FROM     TO       FROM     TO
7:30 AM  2:00 PM  2:40 PM  2:40 PM  6:30 PM  12:00 AM
7:30 AM  2:00 PM  2:40 PM  2:40 PM  6:30 PM  12:00 AM
7:30 AM  2:00 PM  2:40 PM  2:40 PM  6:30 PM  12:00 AM
What is wrong with this formula?
(HOUR(H2-C2)& "h"&MINUTE(H2-C2)&"m")-(HOUR(F2-E2)&"h"& MINUTE(F2-E2)&"M")
I want the total hours worked by each person in this format: 12h48m
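Your formula builds text with & (e.g. "6h30m") and then subtracts one text value from another, which Excel can't evaluate as a number; H2-C2 is also negative, because 12:00 AM is stored as 0. Besides, don't you need the time between C2 and D2 added to the time between G2 and H2? That's what your screenshot suggests. You can also add the difference between E2 and F2 if that is likely to be non-zero.
You can use the MOD function to handle times that reach or pass midnight, e.g.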
=MOD(D2-C2,1)+MOD(H2-G2,1)+MOD(F2-E2,1)
Custom-format the result cell as [h]:mm and you'll get 12:00; I'm not sure why it would be 12:48. If you actually want it to display as 12h00m, use this custom format for the result cell:
[h]"h"mm"m"
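The MOD approach works because MOD(end-start,1) turns a negative span (such as 6:30 PM to 12:00 AM) into the positive fraction of a day it represents. A Python sketch of the same arithmetic in minutes, assuming each span is shorter than 24 hours and times use the h:mm AM/PM layout from the sheet; shift_minutes and total_worked are hypothetical names.

```python
from datetime import datetime
from typing import List, Tuple

TIME_FORMAT = "%I:%M %p"  # e.g. "7:30 AM", "12:00 AM" (midnight)
MINUTES_PER_DAY = 24 * 60


def shift_minutes(start: str, end: str) -> int:
    """Minutes from start to end, wrapping past midnight like Excel's MOD(end-start,1)."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    # % with a positive modulus always returns 0..1439, just as MOD(x,1) returns 0..<1
    return int(delta.total_seconds() // 60) % MINUTES_PER_DAY


def total_worked(pairs: List[Tuple[str, str]]) -> str:
    """Sum several FROM/TO pairs and format the total like [h]"h"mm"m", e.g. '12h00m'."""
    total = sum(shift_minutes(start, end) for start, end in pairs)
    hours, minutes = divmod(total, 60)
    return f"{hours}h{minutes:02d}m"


row = [("7:30 AM", "2:00 PM"), ("2:40 PM", "2:40 PM"), ("6:30 PM", "12:00 AM")]
print(total_worked(row))  # 12h00m
```

Like MOD, equal start and end times count as 0 minutes rather than a full day.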
