Pandas - converting out of order string date time - python-3.x

I have a DataFrame column that has string values for date/time (Input data). I need to convert it into a semi-timestamp format (Desired output data). There are rows that are blank and need to remain blank. I use quotes for illustrative purposes. I am using strptime but getting an error (see below).
Input data (String):
Mar 8 12:00 PM 2020
' '
Mar 8 1:00 PM 2020
Mar 8 6:00 PM 2020
Mar 9 8:00 AM 2020
Desired output data:
3/8/2020 12:00:00
' '
3/8/2020 13:00:00
3/8/2020 18:00:00
3/9/2020 08:00:00
Code:
import datetime as dt
df['date'].apply(lambda x: dt.datetime.strptime(x, '%b %d %H:%M %p %Y'))
Error:
ValueError: time data '' does not match format '%b %d %H:%M %p %Y'
How can I rewrite this code to get the desired output?

For me, to_datetime works with a format similar to yours, but with %I to select hours in 12-hour format. I also added errors='coerce' so that any value that does not match the format becomes a missing value (NaT):
df['date'] = pd.to_datetime(df['date'], format='%b %d %I:%M %p %Y', errors='coerce')
print (df)
date
0 2020-03-08 12:00:00
1 NaT
2 2020-03-08 13:00:00
3 2020-03-08 18:00:00
4 2020-03-09 08:00:00
Finally, for a custom output format, use Series.dt.strftime together with Series.replace:
df['date'] = (pd.to_datetime(df['date'], format='%b %d %I:%M %p %Y', errors='coerce')
.dt.strftime('%m/%d/%y %H:%M:%S')
.replace('NaT', ''))
print (df)
date
0 03/08/20 12:00:00
1
2 03/08/20 13:00:00
3 03/08/20 18:00:00
4 03/09/20 08:00:00
Or, if the strings may contain repeated spaces, collapse them to a single space first (note the raw string for the regex):
df['date'] = (pd.to_datetime(df['date'].replace(r'\s+', ' ', regex=True), format='%b %d %I:%M %p %Y', errors='coerce')
.dt.strftime('%m/%d/%y %H:%M:%S')
.replace('NaT', ''))
print (df)
date
0 03/08/20 12:00:00
1
2 03/08/20 13:00:00
3 03/08/20 18:00:00
4 03/09/20 08:00:00
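The question asks for a four-digit year, while the snippets above print a two-digit one (%y). The following is a minimal sketch of the same approach with %Y instead, rebuilt on the sample data from the question. It blanks rows based on the parsed column rather than on the string 'NaT', on the assumption that strftime may return a missing value instead of that string depending on the pandas version. The names parsed and formatted are only illustrative, and the month and day stay zero-padded.

```python
import pandas as pd

# Sample data from the question; the second row is blank.
df = pd.DataFrame({"date": [
    "Mar 8 12:00 PM 2020",
    " ",
    "Mar 8 1:00 PM 2020",
    "Mar 8 6:00 PM 2020",
    "Mar 9 8:00 AM 2020",
]})

# %I reads the hour on a 12-hour clock and %p reads AM/PM;
# values that do not match the format become NaT.
parsed = pd.to_datetime(df["date"], format="%b %d %I:%M %p %Y", errors="coerce")

# %Y gives the four-digit year, %H the 24-hour hour.
formatted = parsed.dt.strftime("%m/%d/%Y %H:%M:%S")

# Keep the formatted text only where parsing succeeded, else an empty string.
df["date"] = formatted.where(parsed.notna(), "")

print(df)
```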

Related

Extract YYYY-MM-DD HH:MM:SS and convert to different time zone

I am exploring different date formats and trying to convert between them. Currently, I'm stuck in a scenario where I have input dates and times as below:
I was able to convert it to a date timestamp using concatenation
concat_ws(' ',new_df.transaction_date,new_df.Transaction_Time)
However, when I try to use
withColumn("date_time2", F.to_date(col('date_time'), "MMM d yyyy hh:mmaa")) with ('spark.sql.legacy.timeParserPolicy','LEGACY')
It is displayed as 'undefined'
I am looking for pointers/code snippets to extract YYYY-MM-DD HH:MM:SS in CET (input is in PST) as below
input_date_time              | output (in CET)
Mar 1, 2022 01:00:00 PM PST  | 2022-03-01 22:00:00
Parse the PST string into a timestamp (shown here in UTC), then convert that timestamp to CET:
import pyspark.sql.functions as F
df = spark.createDataFrame(data=[["Mar 1, 2022 01:00:00 PM PST"]], schema=["input_date_time_pst"])
df = df.withColumn("input_date_time_pst", F.to_timestamp("input_date_time_pst", format="MMM d, yyyy hh:mm:ss a z"))
df = df.withColumn("output_cet", F.from_utc_timestamp("input_date_time_pst", "CET"))
[Out]:
+-------------------+-------------------+
|input_date_time_pst|output_cet |
+-------------------+-------------------+
|2022-03-01 21:00:00|2022-03-01 22:00:00|
+-------------------+-------------------+
Note - The 2022-03-01 21:00:00 above is Mar 1, 2022 01:00:00 PM PST displayed in UTC.
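The arithmetic behind that output (13:00 PST is 21:00 UTC, which is 22:00 CET) can be checked outside Spark. This is only a sketch using the Python standard library, not the Spark API. It assumes fixed offsets of UTC-8 for PST and UTC+1 for CET, so daylight saving time is ignored, and pst_to_cet is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets (no daylight saving handling).
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")


def pst_to_cet(text):
    """Convert 'Mar 1, 2022 01:00:00 PM PST' to a CET wall-clock string."""
    # Split off the trailing zone abbreviation instead of relying on %Z.
    stamp, _, abbreviation = text.rpartition(" ")
    if abbreviation != "PST":
        raise ValueError(f"expected a PST timestamp, got {text!r}")
    local = datetime.strptime(stamp, "%b %d, %Y %I:%M:%S %p").replace(tzinfo=PST)
    return local.astimezone(CET).strftime("%Y-%m-%d %H:%M:%S")


print(pst_to_cet("Mar 1, 2022 01:00:00 PM PST"))  # 2022-03-01 22:00:00
```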

How to convert different date formats in pandas?

I have 2 columns with different date formats. In every row string dates are formatted differently.
I want to convert the columns to Date type. However, I am wondering if there is any built in method that will do the parsing for me:
What I tried
from datetime import datetime
newFrame = newDF.assign(Effective_Date=newDF['Effective_Date'].apply(lambda element: datetime.strptime(element,'%b %d %Y %H %M %S')), Paid_Off_Time=newDF['Paid_Off_Time'].apply(lambda element: datetime.strptime(element,'%b %d %Y %H %M %S')))
Error when I run the code above:
line 359, in _strptime
(data_string, format))
ValueError: time data '09/08/2016' does not match format '%b %d %Y %H %M %S'
Example Date formats in .csv:
10/07/2016
10/07/2016 09:00
Data
newDF=pd.DataFrame({'Effective_Date':['10/07/2016','10/07/2016 09:00','09 August 2016'],'Paid_Off_Time':['10 July 2016','10/08/2016','10/09/2016 01:00:30']})
Effective_Date Paid_Off_Time
0 10/07/2016 10 July 2016
1 10/07/2016 09:00 10/08/2016
2 09 August 2016 10/09/2016 01:00:30
Solution
newDF.assign(Effective_Date=pd.to_datetime(newDF['Effective_Date']).dt.date,Paid_Off_Time=pd.to_datetime(newDF['Paid_Off_Time']).dt.date)
Effective_Date Paid_Off_Time
0 2016-10-07 2016-07-10
1 2016-10-07 2016-10-08
2 2016-08-09 2016-10-09
Check out the pandas documentation on time series / date functionality for more details:
https://pandas.pydata.org/docs/user_guide/timeseries.html#converting-to-timestamps
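If format inference is not wanted, or a newer pandas release turns out to be stricter about mixed formats within one column, an explicit alternative is to try a short list of known formats per value. The sketch below uses only the standard library. CANDIDATE_FORMATS and parse_date are hypothetical names, and the format list is an assumption based on the sample values in the question (month-first for the slash-separated dates, matching the output above).

```python
from datetime import datetime

# Formats assumed from the sample data; extend as needed.
CANDIDATE_FORMATS = (
    "%m/%d/%Y %H:%M:%S",
    "%m/%d/%Y %H:%M",
    "%m/%d/%Y",
    "%d %B %Y",
)


def parse_date(text):
    """Return a datetime.date for the first matching format, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None


values = ["10/07/2016", "10/07/2016 09:00", "09 August 2016", "10/09/2016 01:00:30"]
parsed = [parse_date(v) for v in values]
print(parsed)
```

With the DataFrame from the question, this could be applied per column, for example newDF['Effective_Date'].apply(parse_date).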

Changing date time format into another format

I have an output "Wed Mar 1 00:00:00 2000". I want to convert this into the format '08/11/2019 05:45PM'. How to achieve this format?
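You could use something like the following:
import datetime
# Parse the original string, then re-format it. The result is '01/03/2000 12:00AM' (day/month/year); swap %d and %m if the month should come first.
datetime.datetime.strptime('Wed Mar 1 00:00:00 2000', '%a %b %d %H:%M:%S %Y').strftime('%d/%m/%Y %I:%M%p')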

Convert incomplete 12h datetime-like strings into appropriate datetime type

I've got a pandas Series containing datetime-like strings with 12h format, but without the am/pm abbreviations. It covers an entire month of data :
40 01/01/2017 11:51:00
41 01/01/2017 11:51:05
42 01/01/2017 11:55:05
43 01/01/2017 11:55:10
44 01/01/2017 11:59:30
45 01/01/2017 11:59:35
46 02/01/2017 12:00:05
47 02/01/2017 12:00:10
48 02/01/2017 12:13:20
49 02/01/2017 12:13:25
50 02/01/2017 12:24:50
51 02/01/2017 12:24:55
52 02/01/2017 12:33:30
Name: TS, dtype: object
(318621,) # shape
My goal is to convert it to datetime format, so as to obtain the appropriate Unix timestamp values and to do comparisons and arithmetic with other datetime data that is in 24h format. So I already tried this:
pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S') # %I for 12h format
Which outputs:
64 2017-01-02 00:46:50
65 2017-01-02 00:46:55
66 2017-01-02 01:01:00
67 2017-01-02 01:01:05
68 2017-01-02 01:05:00
But the am/pm information is not taken into account. I know that, as a rule, am/pm first has to be present in the strings, and then one can use datetime.strptime() or pd.to_datetime() to parse them with the %p directive.
So I wanted to know whether there is another way to deal with this issue through the datetime or pandas datetime modules. Or do I have to manually add the 'am/pm' abbreviations before parsing?
You have data in 5-second intervals spanning multiple days. The desired end result looks like this (the AM/PM part is what we need to add; pandas cannot guess it, because it parses one value at a time):
31/12/2016 11:59:55 PM
01/01/2017 12:00:00 AM
01/01/2017 12:00:05 AM
01/01/2017 11:59:55 AM
01/01/2017 12:00:00 PM
01/01/2017 12:59:55 PM
01/01/2017 01:00:00 PM
01/01/2017 01:00:05 PM
01/01/2017 11:59:55 PM
02/01/2017 12:00:00 AM
First, we can parse the whole thing without AM/PM info, as you already showed:
ts = pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S')
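One thing to check: an hour of 12 has to end up as hour 0, not hour 12. The 00:46:50 values in the question's output suggest the parser already does this when %p is absent, in which case the next line changes nothing; it is kept as a safeguard: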
ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')
Now we have times from 00:00:00 to 11:59:55, twice per day.
Next, note that the transitions are always at 00:00:00. We can easily detect these, as well as the first instance of each date:
import datetime  # needed for datetime.time below
twelve = ts.dt.time == datetime.time(0,0,0)
newdate = ts.dt.date.diff() > pd.Timedelta(0)
midnight = twelve & newdate
noon = twelve & ~newdate
Next, build an offset series, which should be easy to inspect for correctness:
import numpy as np  # needed for np.nan below
offset = pd.Series(np.nan, ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)
offset[noon] = pd.Timedelta(12, 'h')
offset = offset.ffill()  # fillna(method='ffill') is deprecated in recent pandas
And finally:
ts += offset
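The steps above can be put together into one runnable sketch on a small sample. It follows the same idea but makes a few assumptions of its own: the data contains an exact 12:00:00 sample at every midnight and noon, the first row is not itself such a sample, and rows before the first detected transition belong to the opposite half of the day. The variable names and the sample values are illustrative.

```python
import pandas as pd

# 12-hour strings without AM/PM, in chronological order.
raw = pd.Series([
    "31/12/2016 11:59:55",  # really 23:59:55
    "01/01/2017 12:00:00",  # midnight
    "01/01/2017 12:00:05",
    "01/01/2017 11:59:55",  # morning
    "01/01/2017 12:00:00",  # noon
    "01/01/2017 12:59:55",
    "01/01/2017 01:00:00",  # really 13:00:00
    "01/01/2017 11:59:55",  # really 23:59:55
    "02/01/2017 12:00:00",  # midnight
])

half_day = pd.Timedelta(12, "h")

ts = pd.to_datetime(raw, format="%d/%m/%Y %I:%M:%S")
# Fold hour 12 down to hour 0 (a no-op if the parser already did so).
ts = ts.where(ts.dt.hour != 12, ts - half_day)

day = ts.dt.normalize()
twelve = ts == day                       # rows at exactly 00:00:00
newdate = day.diff() > pd.Timedelta(0)   # first row of each new date
midnight = twelve & newdate
noon = twelve & ~newdate

# 12 h from each noon onward, 0 h from each midnight onward.
offset = pd.Series(half_day, index=ts.index).where(noon)
offset = offset.mask(midnight, pd.Timedelta(0)).ffill()
# Rows before the first transition get the opposite half of the day.
offset = offset.fillna(half_day - offset.dropna().iloc[0])

result = ts + offset
print(result)
```

A quick sanity check for real data is that result should be monotonically increasing if the input was in chronological order.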

Datetime Error: Time data does not match [duplicate]

I am trying to create date time functions using the following code:
d1 = datetime.strptime('1/1/1960 0:00 AM', '%m/%d/%Y %I:%M %p')
d2 = datetime.strptime('1/1/2000 0:00 AM', '%m/%d/%Y %I:%M %p')
I get the following error:
ValueError: time data '1/1/1960 0:00 AM' does not match format '%m/%d/%Y %I:%M %p'
I would appreciate help with this as I have tried tweaking the parameters to no avail.
0:00 AM doesn't match %I:%M %p, because %I doesn't include the hour 0, just 1 to 12 like on an analogue clock face:
%I Hour (12-hour clock) as a zero-padded decimal number. 01, 02, ..., 12
Midnight in a 12-hour clock is 12:00 AM:
>>> datetime.strptime('1/1/1960 12:00 AM', '%m/%d/%Y %I:%M %p')
datetime.datetime(1960, 1, 1, 0, 0)
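If the strings cannot be fixed at the source, one possible workaround is to rewrite an hour of 0 or 00 to 12 before parsing. This is only a sketch: parse_12h is a hypothetical helper, and it assumes the hour is preceded by a space and followed by minutes and an uppercase AM, as in the strings above.

```python
import re
from datetime import datetime


def parse_12h(text, fmt="%m/%d/%Y %I:%M %p"):
    """Parse a 12-hour timestamp, treating '0:MM AM' as '12:MM AM'."""
    # Replace a standalone hour of 0 or 00 (only before ':MM AM') with 12.
    fixed = re.sub(r"(?<= )0{1,2}(?=:\d{2} AM)", "12", text)
    return datetime.strptime(fixed, fmt)


d1 = parse_12h("1/1/1960 0:00 AM")
d2 = parse_12h("1/1/2000 0:00 AM")
print(d1, d2)
```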
