I want to calculate the difference between two datetime values in Python 3. Both values are in the following format:
Date 1 :
Wed Jun 24 14:13:48 UTC 2020
Date 2 :
Thu Jun 25 12:13:48 UTC 2020
I want to calculate the difference between these two dates and verify whether it equals some particular number of minutes (say 240 min).
I am not able to figure out the code or approach for this scenario in Python 3.
Any help is appreciated :)
from dateutil import parser
date1 = parser.parse('Wed Jun 24 14:13:48 UTC 2020')
date2 = parser.parse('Thu Jun 25 12:13:48 UTC 2020')
diff = date2 - date1  # a timedelta
print("Difference [HH:MM:SS]:", diff)
# Use total_seconds(): .seconds alone ignores the days component of the timedelta
minutes = diff.total_seconds() / 60
print('Difference in minutes:', minutes)
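If you also need to check whether the difference equals a particular number of minutes (say 240), one simple way is to compare against a timedelta, reusing diff from the snippet above:
from datetime import timedelta
expected = timedelta(minutes=240)  # the target duration from the question
print('Difference equals 240 minutes:', diff == expected)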
I am exploring different date formats and trying to convert one format to another. Currently, I'm stuck in a scenario where I have input dates and times as below:
I was able to combine them into a single date-time string using concatenation:
concat_ws(' ',new_df.transaction_date,new_df.Transaction_Time)
When I try to use
withColumn("date_time2", F.to_date(col('date_time'), "MMM d yyyy hh:mmaa")) with ('spark.sql.legacy.timeParserPolicy', 'LEGACY')
the result is displayed as 'undefined'.
I am looking for pointers/code snippets to get YYYY-MM-DD HH:MM:SS in CET (the input is in PST), as below:
input_date_time: Mar 1, 2022 01:00:00 PM PST
output (in CET): 2022-03-01 22:00:00
Parse the PST string to a timestamp (held internally in UTC), then convert it to "CET" time:
import pyspark.sql.functions as F
df = spark.createDataFrame(data=[["Mar 1, 2022 01:00:00 PM PST"]], schema=["input_date_time_pst"])
# 'z' in the pattern parses the zone abbreviation (PST); the parsed value is shown below in UTC
df = df.withColumn("input_date_time_pst", F.to_timestamp("input_date_time_pst", format="MMM d, yyyy hh:mm:ss a z"))
# Render the UTC timestamp in the CET time zone
df = df.withColumn("output_cet", F.from_utc_timestamp("input_date_time_pst", "CET"))
[Out]:
+-------------------+-------------------+
|input_date_time_pst|output_cet |
+-------------------+-------------------+
|2022-03-01 21:00:00|2022-03-01 22:00:00|
+-------------------+-------------------+
Note - The 2022-03-01 21:00:00 above is Mar 1, 2022 01:00:00 PM PST displayed in UTC.
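If you also want the plain YYYY-MM-DD HH:MM:SS string shown in the expected output, date_format can render the CET timestamp as text (a small extra step using the column names above; output_cet_str is just an illustrative name):
df = df.withColumn("output_cet_str", F.date_format("output_cet", "yyyy-MM-dd HH:mm:ss"))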
Excel converts the date 03/11/2021 to the value 44503, which seems to be days since 1900. I'm trying to figure out a way to calculate this using my own Golang libs. Has anyone dealt with this kind of problem?
Thanks a lot for your help.
A not-so-fancy workaround for this problem would be AddDate:
d := time.Date(1900, 1, 1, 0, 0, 0, 0, time.UTC)
fmt.Println(d) // 1900-01-01 00:00:00 +0000 UTC
d2 := d.AddDate(0, 0, 44503)
fmt.Println(d2) // 2021-11-05 00:00:00 +0000 UTC
This would print 05/11/2021, which is 2 days more than what we want.
Here we can see the same using JavaScript:
date = new Date(1900, 0, 1)
// Mon Jan 01 1900 00:00:00 GMT-0338 (Amazon Standard Time)
date.setDate(date.getDate() + 44503)
// Fri Nov 05 2021 00:00:00 GMT-0400 (Amazon Standard Time)
After some research about these 2 extra days, I found this in a comment by @chux-reinstate-monica:
If you choose to use MS Excel to check your work, note 2 things: 1) Jan 1, 1900 is day 1 (not the number of days since Jan 1, 1900) and 2) according to Excel, Feb 29, 1900 exists (a bug in their code they refuse to fix).
So we can subtract 2 days from that to get 03/11/2021.
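The same arithmetic can be cross-checked quickly in Python, mirroring the JavaScript check above (using the serial 44503 and the 2-day correction discussed in this thread):
from datetime import date, timedelta
# Excel calls Jan 1, 1900 day 1 and also counts the nonexistent Feb 29, 1900,
# hence the 2-day correction for later dates
print(date(1900, 1, 1) + timedelta(days=44503 - 2))  # 2021-11-03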
So I have the Edition column, which contains data in an uneven pattern: some values have ',' followed by the date and some have a ',–' pattern.
df.head()
17 Paperback,– 1 Nov 2016
18 Mass Market Paperback,– 1 Jan 1991
19 Paperback,– 2016
20 Hardcover,– 24 Nov 2018
21 Paperback,– Import, 4 Oct 2018
How can I extract the date into a separate column? I tried using str.split() but can't find a specific pattern to extract. Is there any method to do this?
obj = df['Edition']
# Splitting on a capturing group keeps the trailing "[day month] year" as its own column
obj.str.split(r'((?:\d+\s+\w+\s+)?\d{4}$)', expand=True)
or
obj.str.split('[,–]+').str[0]   # the edition/format, e.g. "Paperback"
obj.str.split('[,–]+').str[-1]  # the date part (may still need .str.strip())
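If the goal is a proper datetime column rather than strings, a minimal sketch along the same lines (assuming the column is named Edition; rows that contain only a year do not match the fixed format and come out as NaT):
import pandas as pd
# Pull the trailing "[day month] year" piece into its own column
df['date_str'] = df['Edition'].str.extract(r'((?:\d{1,2}\s+\w+\s+)?\d{4})\s*$', expand=False)
df['date'] = pd.to_datetime(df['date_str'], format='%d %b %Y', errors='coerce')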
Try using dateutil:
from dateutil.parser import parse
# fuzzy parsing extracts the date from the surrounding text; [0] is the parsed datetime
df['Dt'] = [parse(i, fuzzy_with_tokens=True)[0] for i in df['Edition']]
System: WIN10
IDE: MS Visual Studio Code
Language: Python version 3.7.3
Library: pandas version 1.0.1
Data source: supplied in the example below
Dataset: supplied in the example below
Ask:
I need to split the date and time string out of a column in a data frame whose rows have an uneven number of delimiters, i.e. some with three and some with four commas.
I am trying to figure out how to strip the date and time values ('Nov 11 2013 12:00AM' and 'Apr 11 2013 12:00AM' respectively) off the back of these two records into a new column, given that the second row in the example below has fewer commas.
Code:
df['sample field'].head(2)
4457-I need, this, date, Nov 11 2013 12:00AM ,
2359-I need this, date, Apr 11 2013 12:00AM ,
The method below expands the data into different columns but staggers which column ends up holding the date, so it does not work for me. I need the date and time (or even just the date) in one column so that I can use the date values in further analysis (for example, time series).
Code:
df['sample field'].str.split(",", expand=True)
Data
df=pd.DataFrame({'Text':['4457-I need, this, date, Nov 11 2013 12:00AM ,','2359-I need this, date, Apr 11 2013 12:00AM ,']})
df
Use Series.str.extract with a regex expression:
df['Date'] = df.Text.str.extract(r'([A-Za-z]+\s+\d+\s+\d+\s+\d+:[0-9A-Z]+(?=\s+,+))')
df
df['Date'] = pd.to_datetime(df['Date'])
# pd.to_datetime(df['Date'], format='%b %d %Y %I:%M%p') also works. Because the times here
# are 12AM, use the 12-hour code %I (not %H). With every value at midnight pandas displays
# only the date; if a row had, say, 11:00AM, the time component would show up.
IIUC you need str.extract with a regular expression.
Regex Demo Here
print(df)
0
0 4457-I need, this, date, Nov 11 2013 12:00AM
1 2359-I need this, date, Apr 11 2013 12:00AM
df['date'] = df[0].str.extract(r'(\w{3}\s\d.*\d{4}\s\d{2}:\d{2}\w{2})')
df['date'] = pd.to_datetime(df['date'], format='%b %d %Y %H:%M%p')  # with %H the AM/PM marker is effectively ignored; use %I to read 12:00AM as midnight
print(df)
0 date
0 4457-I need, this, date, Nov 11 2013 12:00AM 2013-11-11 12:00:00
1 2359-I need this, date, Apr 11 2013 12:00AM 2013-04-11 12:00:00
I'll use @wwnde's data:
df=pd.DataFrame({'Text':['4457-I need, this, date, Nov 11 2013 12:00AM ,','2359-I need this, date, Apr 11 2013 12:00AM ,']})
df['Date'] = df.Text.str.strip(',').str.split(',').str[-1].str.strip()
df['Date_formatted'] = pd.to_datetime(df.Date, format='%b %d %Y %H:%M%p')  # %I instead of %H would map 12:00AM to 00:00
Text Date Date_formatted
0 4457-I need, this, date, Nov 11 2013 12:00AM , Nov 11 2013 12:00AM 2013-11-11 12:00:00
1 2359-I need this, date, Apr 11 2013 12:00AM , Apr 11 2013 12:00AM 2013-04-11 12:00:00
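For completeness, a small variant of the same idea that parses with %I so 12:00AM comes out as midnight (note this changes the parsed hour relative to the outputs shown above):
import pandas as pd
df = pd.DataFrame({'Text': ['4457-I need, this, date, Nov 11 2013 12:00AM ,',
                            '2359-I need this, date, Apr 11 2013 12:00AM ,']})
# Take the last non-empty comma-separated field, then parse on a 12-hour clock
last_field = df['Text'].str.strip(' ,').str.split(',').str[-1].str.strip()
df['Date'] = pd.to_datetime(last_field, format='%b %d %Y %I:%M%p')  # e.g. 2013-11-11 00:00:00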
df3 = pd.concat([dataframe0, dataframe1])
df3.shape
Out[13]: (29807, 11)
df3['Created Date'].dtype
Out[22]: dtype('O')
df3['Created Date'] = df3['Created Date'].astype('datetime64[ns]')
ValueError: ('Unknown string format:', 'Tue Jun 25 2019 00:13:23 GMT-0700 (Pacific Daylight Time)')
The 'Created Date' column contains dates in the format Fri Jun 28 2019 00:01:12 GMT-0700 (Pacific Daylight Time).
One problem is the extra text "(Pacific Daylight Time)", so I wrote a regex to remove it. The regex keeps everything up to the whitespace before the opening parenthesis.
From there you can use a function with dt.datetime.strptime() to convert your date string to a pandas datetime64. Please note that there is a call to .strip() in the date converter function. This is because I didn't see a way to remove the trailing whitespace within the regex.
import datetime as dt
import pandas as pd
def date_converter(date_str):
    # strip() removes the trailing space left after the "(...)" suffix is dropped
    return dt.datetime.strptime(date_str.strip(), '%a %b %d %Y %H:%M:%S GMT%z')
df = pd.DataFrame(['Tue Jun 25 2019 00:13:23 GMT-0700 (Pacific Daylight Time)'], columns=['a'])
# Group 1 captures everything before the "(...)" part; replacing the whole string with \1
# drops the parenthesized text (a trailing space remains, handled by strip() above)
pattern = r'^(.*?)((?:(?!\s\().)*)$'
df['b'] = df['a'].str.replace(pattern, r'\1', regex=True)  # regex=True is required on newer pandas
df['b'] = df['b'].apply(date_converter)
>>> print(df)
a b
0 Tue Jun 25 2019 00:13:23 GMT-0700 (Pacific Day... 2019-06-25 00:13:23-07:00
>>> df['b'].dtype
datetime64[ns, UTC-07:00]
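A vectorized alternative (just a sketch, not what the answer above does): strip the parenthesized suffix with a simpler regex and let pd.to_datetime handle the UTC offset directly; the column name c is only for illustration.
df['c'] = pd.to_datetime(df['a'].str.replace(r'\s*\(.*\)$', '', regex=True),
                         format='%a %b %d %Y %H:%M:%S GMT%z')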