I am trying to convert a pandas dataframe containing date in YYYYMM format to YYYYQ format as below
import pandas as pd
dat = pd.DataFrame({'date' : ['200612']})
pd.PeriodIndex(pd.to_datetime(dat.date), freq='Q')
However this generates output as 2012Q2, whereas correct output should be 2006Q4
What is the right way to get correct Quarter?
Explicitly specify the input format:
dat = pd.DataFrame({'date' : ['200612']})
pd.PeriodIndex(pd.to_datetime(dat.date, format='%Y%m'), freq='Q')
Output:
PeriodIndex(['2006Q4'], dtype='period[Q-DEC]', name='date')
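Without a format, to_datetime has to guess how to read '200612', and here it guesses wrong. As a minimal sketch of the same fix on a few more values (only '200612' comes from the question, the other strings are made up for illustration), you can parse with an explicit format and then convert with Series.dt.to_period:

```python
import pandas as pd

# Sample YYYYMM strings; only '200612' is from the question, the rest are illustrative.
dat = pd.DataFrame({'date': ['200612', '200701', '200705']})

# Parse with an explicit format so each string is read as year + month.
parsed = pd.to_datetime(dat['date'], format='%Y%m')

# Convert each timestamp to its calendar quarter and keep it as a column.
dat['quarter'] = parsed.dt.to_period('Q')
print(dat['quarter'].astype(str).tolist())
```

Keeping the result as a column via .dt.to_period is just one option; pd.PeriodIndex as shown above gives the same quarters.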
I have a sample dataframe as given below.
import pandas as pd
import numpy as np
data = {'InsertedDate':['2022-01-21 20:13:19.000000', '2022-01-21 20:20:24.000000', '2022-02-02 16:01:49.000000', '2022-02-09 15:01:31.000000'],
'UTCOffset': ['-05:00','+02:00','-04:00','+06:00']}
df = pd.DataFrame(data)
df['InsertedDate'] = pd.to_datetime(df['InsertedDate'])
df
The 'InsertedDate' is a datetime column whereas the 'UTCOffset' is a string column.
I want to add the offset to the 'InsertedDate' column and display the final result in a new datetime column.
It should look something like this image shown below.
Any help is greatly appreciated. Thank you!
You can use pd.to_timedelta to convert the offset and then add it to the datetime column.
# to_timedelta needs to have [+-]HH:MM:SS format, so adding :00 to fill :SS part.
df['UTCOffset'] = pd.to_timedelta(df.UTCOffset + ':00')
df['CorrectTime'] = df.InsertedDate + df.UTCOffset
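For reference, here is a self-contained sketch of the same approach using the question's data. The only change is that the parsed offset goes into a separate column ('OffsetDelta' is a hypothetical name I picked) so the original string column is kept:

```python
import pandas as pd

df = pd.DataFrame({
    'InsertedDate': ['2022-01-21 20:13:19.000000', '2022-01-21 20:20:24.000000',
                     '2022-02-02 16:01:49.000000', '2022-02-09 15:01:31.000000'],
    'UTCOffset': ['-05:00', '+02:00', '-04:00', '+06:00'],
})
df['InsertedDate'] = pd.to_datetime(df['InsertedDate'])

# Append ':00' so each offset has the [+-]HH:MM:SS shape, then parse it.
# 'OffsetDelta' is a hypothetical column name, not from the question.
df['OffsetDelta'] = pd.to_timedelta(df['UTCOffset'] + ':00')

# Adding a timedelta column to a datetime column gives a datetime column.
df['CorrectTime'] = df['InsertedDate'] + df['OffsetDelta']
print(df[['InsertedDate', 'UTCOffset', 'CorrectTime']])
```

The first row, for example, moves from 20:13:19 to 15:13:19 because its offset is -05:00.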
I have one field in a pandas DataFrame that was imported as string format.
It should be a datetime variable. How do I convert it to a datetime column and then filter based on date?
Example:
df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000']})
Use the to_datetime function, specifying a format to match your data.
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
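To cover the filtering part of the question as well, here is a minimal sketch. It assumes an English locale for the %b month abbreviations, and the second and third sample values are made up for illustration:

```python
import pandas as pd

# Rows in the question's format; only the first value is from the question.
df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000',
                            '20OCT2014:08:30:00.000',
                            '01JAN2015:12:00:00.000']})

df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')

# Filter with a boolean mask against a Timestamp.
after_sep = df[df['date'] >= pd.Timestamp('2014-10-01')]

# Or keep only the rows inside a date range (both ends inclusive).
in_2014 = df[df['date'].between(pd.Timestamp('2014-01-01'), pd.Timestamp('2014-12-31'))]
print(after_sep)
print(in_2014)
```

Once the column has a datetime dtype, ordinary comparison operators work, so no string handling is needed for the filter.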
If you have more than one column to be converted you can do the following:
df[["col1", "col2", "col3"]] = df[["col1", "col2", "col3"]].apply(pd.to_datetime)
You can use the DataFrame method .apply() to operate on the values in Mycol:
>>> df = pd.DataFrame(['05SEP2014:00:00:00.000'],columns=['Mycol'])
>>> df
Mycol
0 05SEP2014:00:00:00.000
>>> import datetime as dt
>>> df['Mycol'] = df['Mycol'].apply(lambda x: dt.datetime.strptime(x, '%d%b%Y:%H:%M:%S.%f'))
>>> df
Mycol
0 2014-09-05
Use the pandas to_datetime function to parse the column as datetime. With infer_datetime_format=True, pandas tries to detect the format automatically and convert the column. (Since pandas 2.0 this argument is deprecated, because format inference is now the default behavior.)
import pandas as pd
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], infer_datetime_format=True)
chrisb's answer works:
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
however it results in a pandas warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I would guess this is due to chained indexing.
Time Saver:
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'])
To silence SettingWithCopyWarning
If you got this warning, then that means your dataframe was probably created by filtering another dataframe. Make a copy of your dataframe before any assignment and you're good to go.
df = df.copy()
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')
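Here is a small sketch of that situation, assuming the frame was produced by filtering another one. The names source and keep are hypothetical, chosen only for this example:

```python
import pandas as pd

# Hypothetical source frame; 'keep' is an illustrative filter column.
source = pd.DataFrame({'date': ['05SEP2014:00:00:00.000', '06SEP2014:00:00:00.000'],
                       'keep': [True, False]})

# Filtering produces a derived frame; .copy() makes it independent of `source`.
df = source[source['keep']].copy()

# Assigning to the copy does not touch `source`.
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')
print(df.dtypes)
```

Calling .copy() right where the filtered frame is created is usually easier to follow than copying later, since it makes clear which frame you intend to modify.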
errors='coerce' is useful
If some rows are not in the correct format or not datetime at all, errors= parameter is very useful, so that you can convert the valid rows and handle the rows that contained invalid values later.
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')
# for multiple columns
df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime, format='%d%b%Y:%H:%M:%S.%f', errors='coerce')
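As a rough sketch of "handle the invalid rows later": with errors='coerce', values that cannot be parsed become NaT, so isna() can pick them out afterwards. The invalid sample values and the date_parsed column name are made up for illustration:

```python
import pandas as pd

# The second and third values are deliberately invalid (illustrative data).
df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000',
                            'not a date',
                            '31FEB2014:00:00:00.000']})

parsed = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')

# Rows that failed to parse are NaT; select the original strings for inspection.
bad_rows = df[parsed.isna()]
print(bad_rows)

# Keep the parsed values next to the raw ones ('date_parsed' is a hypothetical name).
df['date_parsed'] = parsed
```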
Setting the correct format= is much faster than letting pandas find out [1]
Long story short, passing the correct format= from the beginning, as in chrisb's post, is much faster than letting pandas figure out the format, especially if the format contains a time component. The runtime difference for dataframes with more than 10k rows is huge (~25 times faster, so we're talking about a couple of minutes vs a few seconds). All valid format options can be found at https://strftime.org/.
[1] Code used to produce the timeit test plot.
import pandas as pd
import perfplot
from random import choices
from datetime import datetime

# value ranges for month, day, year, hour, minute, second, fractional second
mdYHMSf = range(1,13), range(1,29), range(2000,2024), range(24), *[range(60)]*2, range(1000)
perfplot.show(
    kernels=[lambda x: pd.to_datetime(x),
             lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M:%S.%f'),
             lambda x: pd.to_datetime(x, infer_datetime_format=True),
             lambda s: s.apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))],
    labels=["pd.to_datetime(df['date'])",
            "pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M:%S.%f')",
            "pd.to_datetime(df['date'], infer_datetime_format=True)",
            "df['date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))"],
    n_range=[2**k for k in range(20)],
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}:{S}.{f}"
                               for m,d,Y,H,M,S,f in zip(*[choices(e, k=n) for e in mdYHMSf])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)
Just as you would convert an object column to float or int, you can use astype():
raw_data['Mycol']=raw_data['Mycol'].astype('datetime64[ns]')
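A minimal sketch of the astype route, using ISO-formatted strings that I made up for illustration. Note that astype has no format= argument, so for a nonstandard layout like the one in the question, to_datetime with format= is probably the safer choice:

```python
import pandas as pd

# ISO-formatted date strings (illustrative data, not from the question).
s = pd.Series(['2014-09-05', '2014-09-06'])

# Cast the strings straight to a datetime dtype.
converted = s.astype('datetime64[ns]')
print(converted.dtype)
```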
Unable to convert DataFrame column to date time format.
from datetime import datetime
Holidays = pd.DataFrame({'Date':['2016-01-01','2016-01-06','2016-02-09','2016-02-10','2016-03-20'], 'Expenditure':[907.2,907.3,904.8,914.6,917.3]})
Holidays['Date'] = pd.to_datetime(Holidays['Date'])
type(Holidays['Date'])
Output: pandas.core.series.Series
Also tried
Holidays['Date'] = Holidays['Date'].astype('datetime64[ns]')
type(Holidays['Date'])
But same output
Output: pandas.core.series.Series
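I think you are getting a bit mixed up. type(Holidays['Date']) always reports pandas.core.series.Series, because a DataFrame column is a Series no matter what it holds. What you want to check is the dtype of Holidays['Date'], which is datetime64[ns].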
Here's how I am checking.
from datetime import datetime
import pandas as pd
Holidays = pd.DataFrame({'Date':['2016-01-01','2016-01-06','2016-02-09','2016-02-10','2016-03-20'], 'Expenditure':[907.2,907.3,904.8,914.6,917.3]})
print ('Before converting : ' , Holidays['Date'].dtypes)
Holidays['Date'] = pd.to_datetime(Holidays['Date'])
print ('After converting : ' ,Holidays['Date'].dtypes)
The output is:
Before converting : object
After converting : datetime64[ns]
I thought I would also share some additional information about types and dtypes. See this link for more on types and dtypes.
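To make the difference concrete, here is a short sketch that checks the container type, the column dtype, and the type of a single value. It uses pandas.api.types.is_datetime64_any_dtype as one possible programmatic check, and a shortened version of the question's data:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Shortened version of the question's frame.
Holidays = pd.DataFrame({'Date': ['2016-01-01', '2016-01-06'],
                         'Expenditure': [907.2, 907.3]})
Holidays['Date'] = pd.to_datetime(Holidays['Date'])

print(type(Holidays['Date']))          # the container: always a Series
print(Holidays['Date'].dtype)          # the dtype of the values it holds
print(type(Holidays['Date'].iloc[0]))  # a single value: a pandas Timestamp
print(is_datetime64_any_dtype(Holidays['Date']))  # True for datetime columns
```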
I'm importing a CSV file which contains a datetime column. After importing it, my data frame contains the Date column, whose type is pandas.Series. I need another column that contains the weekday:
import pandas as pd
from datetime import datetime
data = pd.read_csv("C:/Users/HP/Desktop/Fichiers/Proj/CONSOMMATION_1h.csv")
print(data.head())
All the data looks okay, but when I do the following:
data['WDay'] = pd.to_datetime(data['Date'])
print(type(data['WDay']))
# the output is
<class 'pandas.core.series.Series'>
the data is not converted to datetime, so I can't get the weekday.
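The column has been converted; type() just reports the Series container. To get the weekday from a Series you need the .dt accessor, i.e. dt.weekday: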
data['WDay'] = data['WDay'].dt.weekday
Without .dt, weekday is available on a DatetimeIndex (not your case), as DatetimeIndex.weekday:
data['WDay'] = data.index.weekday
Use data.dtypes to check the types of the columns.
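Putting it together, a minimal sketch with a small in-memory frame standing in for the CSV (the dates and the WDayName column are made up for illustration):

```python
import pandas as pd

# Illustrative data standing in for the CSV from the question.
data = pd.DataFrame({'Date': ['2016-01-01', '2016-01-04', '2016-01-10']})
data['Date'] = pd.to_datetime(data['Date'])

# .dt exposes datetime properties on a Series; weekday is Monday=0 ... Sunday=6.
data['WDay'] = data['Date'].dt.weekday

# day_name() gives the weekday as text ('WDayName' is a hypothetical column name).
data['WDayName'] = data['Date'].dt.day_name()

print(data.dtypes)
print(data)
```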