How to correctly convert Year-month to Year-quarter - python-3.x

I am trying to convert a pandas dataframe containing date in YYYYMM format to YYYYQ format as below
import pandas as pd
dat = pd.DataFrame({'date' : ['200612']})
pd.PeriodIndex(pd.to_datetime(dat.date), freq='Q')
However this generates output as 2012Q2, whereas correct output should be 2006Q4
What is the right way to get correct Quarter?

Explicitly specific the input format:
dat = pd.DataFrame({'date' : ['200612']})
pd.PeriodIndex(pd.to_datetime(dat.date, format='%Y%m'), freq='Q')
Output:
PeriodIndex(['2006Q4'], dtype='period[Q-DEC]', name='date')

Related

How to convert a string to pandas datetime

I have a column in my pandas data frame which is string and want to convert it to pandas date so that I will be able to sort
import pandas as pd
dat = pd.DataFrame({'col' : ['202101', '202212']})
dat['col'].astype('datetime64[ns]')
However this generates error. Could you please help to find the correct way to perform this
I think this code should work.
dat['date'] = pd.to_datetime(dat['col'], format= "%Y%m")
dat['date'] = dat['date'].dt.to_period('M')
dat.sort_values(by = 'date')
If you want to replace the sorted dataframe add in brackets inplace = True.
Your code didn't work because your wrong format to date. If you would have date in format for example 20210131 yyyy-mm-dd. This code would be enought.
dat['date'] = pd.to_datetime(dat['col'], format= "%Y%m%d")

Convert Dataframe Column from Series to Datetime

Unable to convert DataFrame column to date time format.
from datetime import datetime
Holidays = pd.DataFrame({'Date':['2016-01-01','2016-01-06','2016-02-09','2016-02-10','2016-03-20'], 'Expenditure':[907.2,907.3,904.8,914.6,917.3]})
Holidays['Date'] = pd.to_datetime(Holidays['Date'])
type(Holidays['Date'])
Output: pandas.core.series.Series
Also tried
Holidays['Date'] = Holidays['Date'].astype('datetime64[ns]')
type(Holidays['Date'])
But same output
Output: pandas.core.series.Series
I think you are getting a bit mixed up. The dtypes of Holidays['Date'] is datetime64[ns]
Here's how I am checking.
from datetime import datetime
import pandas as pd
Holidays = pd.DataFrame({'Date':['2016-01-01','2016-01-06','2016-02-09','2016-02-10','2016-03-20'], 'Expenditure':[907.2,907.3,904.8,914.6,917.3]})
print ('Before converting : ' , Holidays['Date'].dtypes)
Holidays['Date'] = pd.to_datetime(Holidays['Date'])
print ('After converting : ' ,Holidays['Date'].dtypes)
The output is:
Before converting : object
After converting : datetime64[ns]
Thought I will also share some addition information for you around types and dtypes. See more info in this link for types-and-dtypes

pandas read_csv function fails

I am trying to convert the following csv into a dataframe, using simply :
import pandas as pd
ticket = pd.read_csv("file.csv")
However, due to the missing quotation marks on the first column of the csv :
A,"B","C","D"
0,"1","2","3"
It fails to assign properly each rows to its rightful header.
import pandas as pd
df = pd.read_csv('your_csv.csv')
IN [1]: df
Out[1]:
A "B" "C" "D"
0 0 "1" "2" "3"
Seems to work for me with the given data placed into a .csv, can you be more specific about the error?

Datetime Conversion with ValueError

I have a pandas dataframe with columns containing start and stop times in this format: 2016-01-01 00:00:00
I would like to convert these times to datetime objects so that I can subtract one from the other to compute total duration. I'm using the following:
import datetime
df = df['start_time'] =
df['start_time'].apply(lambda x:datetime.datetime.strptime(x,'%Y/%m/%d/%T %I:%M:%S %p'))
However, I have the following ValueError:
ValueError: 'T' is a bad directive in format '%Y/%m/%d/%T %I:%M:%S %p'
This would convert the column into datetime64 dtype. Then you could process whatever you need using that column.
df['start_time'] = pd.to_datetime(df['start_time'], format="%Y-%m-%d %H:%M:%S")
Also if you want to avoid explicitly specifying datetime format you can use the following:
df['start_time'] = pd.to_datetime(df['start_time'], infer_datetime_format=True)
Simpliest is use to_datetime:
df['start_time'] = pd.to_datetime(df['start_time'])

Unable to Parse pandas Series to datetime

I'm importing a csv files which contain a datetime column, after importing the csv, my data frame will contain the Dat column which type is pandas.Series, I need to have another column that will contain the weekday:
import pandas as pd
from datetime import datetime
data =
pd.read_csv("C:/Users/HP/Desktop/Fichiers/Proj/CONSOMMATION_1h.csv")
print(data.head())
all the data are okay, but when I do the following:
data['WDay'] = pd.to_datetime(data['Date'])
print(type(data['WDay']))
# the output is
<class 'pandas.core.series.Series'>
the data is not converted to datetime, so I can't get the weekday.
Problem is you need dt.weekday with .dt:
data['WDay'] = data['WDay'].dt.weekday
Without dt is used for DataetimeIndex (not in your case) - DatetimeIndex.weekday:
data['WDay'] = data.index.weekday
use the command data.dtypes to check the type of the columns.

Resources