How do I use the .dt.hour accessor to get hours from a datetime object? - python-3.x

I have a dataframe that I'm trying to split into date and hour columns, so I can use the hour of day (1, 2, 3, ..., 22, 23, 24) as ID variables for a project.
I'm having trouble applying .dt.hour to my date column; it raises:
AttributeError: Can only use .dt accessor with datetimelike values
Currently, my date format is:
YYYY-MM-DD HH:MM:SS+00:00, and I'm assuming the error comes from the +00:00 offset.
Here is a sample of the dataframe:
date btc_open btc_close
0 2021-01-01 00:00:00+00:00 28905.984003808422 29013.059128535537
1 2021-01-01 01:00:00+00:00 29016.129189426065 29432.828723553906
2 2021-01-01 02:00:00+00:00 29436.647295100185 29212.8610969002
For reproducible code (with error message), look below.
data = pd.DataFrame({'date': ['2021-01-01 00:00:00+00:00','2021-01-01 01:00:00+00:00','2021-01-01 02:00:00+00:00'],
'btc_open': [28905.98, 29016.12, 29436.64],
'btc_close': [29013.05, 29432.82, 29212.86]})
data['date'] = pd.to_datetime(data['date'], format = '%Y-%m-%d %H:%M:%S')
df_subset_1 = data[['date','btc_open','btc_close']]
# Converting datehour to date and hour columns
df_subset_1['date'] = df_subset_1['date'].dt.date
df_subset_1['hour'] = df_subset_1['date'].dt.hour
Does anyone know how to make this work?

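Keep a column of pandas datetime dtype (see also Time series / date functionality in the pandas docs). In your code, .dt.date overwrites the 'date' column with plain Python date objects, so the following .dt.hour call no longer sees datetimelike values. Example: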
import pandas as pd
data = pd.DataFrame({'datetime': ['2021-01-01 00:00:00+00:00','2021-01-01 01:00:00+00:00','2021-01-01 02:00:00+00:00'],
'btc_open': [28905.98, 29016.12, 29436.64],
'btc_close': [29013.05, 29432.82, 29212.86]})
data['datetime'] = pd.to_datetime(data['datetime'])
# .copy() avoids a possible SettingWithCopyWarning when adding columns below
df_subset_1 = data[['datetime','btc_open','btc_close']].copy()
# extract date and hour from datetime column
df_subset_1['date'] = df_subset_1['datetime'].dt.date
df_subset_1['hour'] = df_subset_1['datetime'].dt.hour
df_subset_1
datetime btc_open btc_close date hour
0 2021-01-01 00:00:00+00:00 28905.98 29013.05 2021-01-01 0
1 2021-01-01 01:00:00+00:00 29016.12 29432.82 2021-01-01 1
2 2021-01-01 02:00:00+00:00 29436.64 29212.86 2021-01-01 2
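As a minimal sketch of the same idea, the snippet below derives an hour-of-day ID running from 1 to 24 (as asked in the question) and reproduces the original error on purpose. The column name hour_id and the shortened sample values are assumptions made for illustration only.

```python
import pandas as pd

data = pd.DataFrame({
    "datetime": [
        "2021-01-01 00:00:00+00:00",
        "2021-01-01 01:00:00+00:00",
        "2021-01-01 23:00:00+00:00",
    ],
    "btc_open": [28905.98, 29016.12, 29436.64],
})

# Parse once and keep this column as the datetime source of truth.
data["datetime"] = pd.to_datetime(data["datetime"], utc=True)

# .dt.hour yields 0..23; shift by one for a 1..24 ID (hypothetical column name).
data["hour_id"] = data["datetime"].dt.hour + 1

# .dt.date returns plain Python date objects, which are not datetimelike,
# so chaining .dt on the result fails exactly as in the question.
dates_only = data["datetime"].dt.date
try:
    dates_only.dt.hour
except AttributeError as exc:
    message = str(exc)

print(data["hour_id"].tolist())
print(message)
```

The +00:00 offset is not the problem here: pandas parses it as a UTC offset, and the .dt accessor works on timezone-aware columns as well.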

Related

How to set datetime format for pandas dataframe column labels?

IPNI_RNC PATHID 2020-11-11 00:00:00 2020-11-12 00:00:00 2020-11-13 00:00:00 2020-11-14 00:00:00 2020-11-15 00:00:00 2020-11-16 00:00:00 2020-11-17 00:00:00 Last Day Violation Count
Above are the column labels after reading the Excel file. The df variable has 10 columns after reading the file, and 7 of the column labels are dates.
My input data set is an Excel file which changes every day, and I want to update it automatically. In Excel, some column labels are dates like 11-Nov-2020, 12-Nov-2020, but after reading the file they become 2020-11-11 00:00:00, 2020-11-12 00:00:00. I want to keep the column labels as 11-Nov-2020, 12-Nov-2020 while reading the file with pd.read_excel if possible; otherwise I need to convert them afterwards.
I am very new to Python and look forward to your support.
Thanks to those who have already come forward to help me.
You can of course use the standard Python methods to parse the date values, but I would not recommend it, because you end up with Python datetime objects rather than the pandas representation of dates. That means the column consumes more space, is probably less efficient, and you can't use the pandas methods to access e.g. the year. I'll show what I mean below.
If you want to avoid the naming issue with your column names, you could prevent pandas from assigning the names automatically, read the first line as data, and fix the names yourself (see the column naming part below).
The type conversion part:
# create a test setup with a small dataframe
import pandas as pd
from datetime import date, datetime, timedelta
df= pd.DataFrame(dict(id=range(10), date_string=[str(datetime.now()+ timedelta(days=d)) for d in range(10)]))
# test the python way:
df['date_val_python']= df['date_string'].map(lambda dt: str(dt))
# use the pandas way (by the way, if you want to explicitly
# specify the format, you can use the format= keyword)
df['date_val_pandas']= pd.to_datetime(df['date_string'])
df.dtypes
The output is:
id int64
date_string object
date_val_python object
date_val_pandas datetime64[ns]
dtype: object
As you can see, date_val_python has dtype object because it holds plain Python objects (in this example, strings) rather than datetime values, while date_val_pandas uses the internal datetime representation of pandas. You can now try:
df['date_val_pandas'].dt.year
# this will return a series with the year part of the date
df['date_val_python'].dt.year
# this will result in the following error:
AttributeError: Can only use .dt accessor with datetimelike values
See the pandas doc for to_datetime for more details.
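A small sketch of the check described above: test whether a column already has a datetime dtype before reaching for .dt, and convert it with an explicit format= if it does not. The sample values are invented for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df = pd.DataFrame({"date_string": ["2020-11-11 08:30:00", "2020-11-12 09:45:00"]})

# The raw column holds strings, so it is not a datetime dtype yet.
is_datetime_before = is_datetime64_any_dtype(df["date_string"])

# An explicit format documents the expected layout of the strings.
df["date_val_pandas"] = pd.to_datetime(df["date_string"], format="%Y-%m-%d %H:%M:%S")
is_datetime_after = is_datetime64_any_dtype(df["date_val_pandas"])

# The .dt accessor is available only on the converted column.
years = df["date_val_pandas"].dt.year
print(is_datetime_before, is_datetime_after, years.tolist())
```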
The column naming part:
# read your dataframe as usual
df= pd.read_excel('c:/scratch/tmp/dates.xlsx')
rename_dict= dict()
for old_name in df.columns:
    # only date-like labels have a strftime method
    if hasattr(old_name, 'strftime'):
        new_name= old_name.strftime('%d-%b-%Y')
        rename_dict[old_name]= new_name
if len(rename_dict) > 0:
    df.rename(columns=rename_dict, inplace=True)
This works, in case your column titles are stored as usual dates, which I suppose is true, because you get a time part after importing them.
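The renaming step can be tried without an Excel file. The sketch below builds a small frame whose labels mix strings and timestamps, which is an assumption about what pd.read_excel returns for date headers, and passes a helper (format_label, a hypothetical name) to rename.

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_excel: two text labels, two date labels.
df = pd.DataFrame(
    [[1, "a", 5, 6]],
    columns=["IPNI_RNC", "PATHID", pd.Timestamp("2020-11-11"), pd.Timestamp("2020-11-12")],
)

def format_label(label):
    """Format date-like labels as DD-Mon-YYYY and leave other labels untouched."""
    if hasattr(label, "strftime"):
        return label.strftime("%d-%b-%Y")
    return label

df = df.rename(columns=format_label)
print(list(df.columns))
```

Note that %b produces the abbreviated month name of the current locale, so the labels are English only under an English (or default C) locale.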
strftime of the datetime module is the function you need:
If datetime is a datetime object, you can do
datetime.strftime("%d-%b-%Y")
Example:
>>> from datetime import datetime
>>> timestamp = 1528797322
>>> date_time = datetime.fromtimestamp(timestamp)
>>> print(date_time)
2018-06-12 11:55:22
>>> print(date_time.strftime("%d-%b-%Y"))
12-Jun-2018
In order to apply a function to certain dataframe columns, use:
datetime_cols_list = ['datetime_col1', 'datetime_col2', ...]
for col in dataframe.columns:
    if col in datetime_cols_list:
        dataframe[col] = dataframe[col].apply(lambda x: x.strftime("%d-%b-%Y"))
I am sure this can be done in multiple ways in pandas; this is just what came off the top of my head.
Example:
import pandas as pd
import numpy as np
np.random.seed(0)
# generate two datetime columns (one row per minute) and some random values
rng = pd.date_range('2015-02-24', periods=5, freq='min')
other_dt_col = pd.date_range('2016-02-24', periods=5, freq='min')
df = pd.DataFrame({ 'Date': rng, 'Date2': other_dt_col,'Val': np.random.randn(len(rng)) })
print (df)
# Output:
# Date Date2 Val
# 0 2015-02-24 00:00:00 2016-02-24 00:00:00 1.764052
# 1 2015-02-24 00:01:00 2016-02-24 00:01:00 0.400157
# 2 2015-02-24 00:02:00 2016-02-24 00:02:00 0.978738
# 3 2015-02-24 00:03:00 2016-02-24 00:03:00 2.240893
# 4 2015-02-24 00:04:00 2016-02-24 00:04:00 1.867558
datetime_cols_list = ['Date', 'Date2']
for col in df.columns:
    if col in datetime_cols_list:
        # format every timestamp in the column as DD-Mon-YYYY
        df[col] = df[col].apply(lambda x: x.strftime("%d-%b-%Y"))
print (df)
# Output:
# Date Date2 Val
# 0 24-Feb-2015 24-Feb-2016 1.764052
# 1 24-Feb-2015 24-Feb-2016 0.400157
# 2 24-Feb-2015 24-Feb-2016 0.978738
# 3 24-Feb-2015 24-Feb-2016 2.240893
# 4 24-Feb-2015 24-Feb-2016 1.867558
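One of the other ways hinted at above, as a sketch: pick the datetime columns by dtype instead of listing them by hand, and format them with the vectorized .dt.strftime. The column names and values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.date_range("2015-02-24", periods=3, freq="min"),
    "Val": [1.0, 2.0, 3.0],
})

# Select the timezone-naive datetime columns by dtype.
datetime_cols = list(df.select_dtypes(include="datetime").columns)

for col in datetime_cols:
    # .dt.strftime formats the whole column at once and returns strings.
    df[col] = df[col].dt.strftime("%d-%b-%Y")

print(df)
```

After this step the columns hold strings, so the .dt accessor no longer works on them; it is usually safer to format only at display or export time.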

How to read in unusual date\time format

I have a small df with a date\time column using a format I have never seen.
Pandas reads it in as an object even if I use parse_dates, and to_datetime() chokes on it.
The dates in the column are formatted as such:
2019/12/29 GMT+8 18:00
2019/12/15 GMT+8 05:00
I think the best approach is using a date parsing pattern. Something like this:
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)
But I simply do not know how to approach this format.
The %z directive is particular about how the UTC offset is written (see strftime() and strptime() Format Codes in the Python docs).
The offset has to start with + or - followed by a two-digit hour and the minutes, for example +08:00, -08:00, +10:00 or -10:00.
Use str.zfill to pad the hour with a zero between the sign and the integer.
import pandas as pd
# sample data
df = pd.DataFrame({'datetime': ['2019/12/29 GMT+8 18:00', '2019/12/15 GMT+8 05:00', '2019/12/15 GMT+10 05:00', '2019/12/15 GMT-10 05:00']})
# display(df)
datetime
2019/12/29 GMT+8 18:00
2019/12/15 GMT+8 05:00
2019/12/15 GMT+10 05:00
2019/12/15 GMT-10 05:00
# fix the format
df.datetime = df.datetime.str.split(' ').apply(lambda x: x[0] + x[2] + x[1][3:].zfill(3) + ':00')
# convert to a utc datetime
df.datetime = pd.to_datetime(df.datetime, format='%Y/%m/%d%H:%M%z', utc=True)
# display(df)
datetime
2019-12-29 10:00:00+00:00
2019-12-14 21:00:00+00:00
2019-12-14 19:00:00+00:00
2019-12-15 15:00:00+00:00
print(df.info())
[out]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 datetime 4 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1)
memory usage: 160.0 bytes
If every row uses GMT+8, you could pass a custom format with the literal GMT+8 in the middle and then subtract eight hours with timedelta(hours=8):
import pandas as pd
from datetime import datetime, timedelta
df['Date'] = pd.to_datetime(df['Date'], format='%Y/%m/%d GMT+8 %H:%M') - timedelta(hours=8)
df
Date
0 2019-12-29 10:00:00
1 2019-12-14 21:00:00
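When the offset varies from row to row, the hard-coded subtraction can be generalized. The following is a sketch under the assumption that every value looks like "YYYY/MM/DD GMT±N HH:MM" with a whole-hour offset; the regular expression and its group names are illustrative, not part of either answer.

```python
import pandas as pd

raw = pd.Series(["2019/12/29 GMT+8 18:00", "2019/12/15 GMT-10 05:00"])

# Split each string into date, offset sign, offset hours and time of day.
parts = raw.str.extract(
    r"^(?P<date>\S+) GMT(?P<sign>[+-])(?P<hours>\d{1,2}) (?P<time>\S+)$"
)

# Parse the local wall-clock time without any offset.
local = pd.to_datetime(parts["date"] + " " + parts["time"], format="%Y/%m/%d %H:%M")

# Build a signed offset and subtract it to obtain UTC.
sign = parts["sign"].map({"+": 1, "-": -1})
offset = pd.to_timedelta(parts["hours"].astype(int), unit="h")
utc = (local - sign * offset).dt.tz_localize("UTC")

print(utc)
```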

read date time column into pandas dataframe. retain seconds information in the dataframe

My csv file.
Timestamp
---------------------
1/4/2019 2:00:09 PM
1/4/2019 2:00:18 PM
I have a column of date-time information in a csv file. I want to read it as a timestamp column into a pandas dataframe and retain the seconds information.
Effort 1:
I tried
def dateparse (timestamp):
    return pd.datetime.strptime(timestamp, '%m/%d/%Y %H:%M:%S ')
df = pd.read_csv('file_name.csv', parse_dates=['Timestamp'],date_parser=dateparse)
The above rounds off the seconds, giving something like
1/4/2019 2:00:00
Effort 2:
I thought of reading the entire file with open() and converting it into a dataframe later.
with open('file name.csv') as f:
    for line in f:
        print(line)
But again here seconds information is rounded off.
edit 1:
The seconds info is truncated when I open this csv file in editors like sublime.
For me it works if I omit date_parser=dateparse:
import pandas as pd
from io import StringIO
temp=u"""Timestamp1
1/4/2019 2:00:09 PM
1/4/2019 2:00:18 PM"""
#after testing replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=['Timestamp1'])
print (df)
Timestamp1
0 2019-01-04 14:00:09
1 2019-01-04 14:00:18
print (df.dtypes)
Timestamp1 datetime64[ns]
dtype: object
EDIT1:
The format string should be changed to use %I (12-hour clock) and %p (AM/PM):
import pandas as pd
from datetime import datetime
from io import StringIO
def dateparse (timestamp):
    return datetime.strptime(timestamp, '%m/%d/%Y %I:%M:%S %p')
temp=u"""Timestamp1
1/4/2019 2:00:09 AM
1/4/2019 2:00:09 PM
1/4/2019 2:00:18 PM"""
#after testing replace 'StringIO(temp)' with 'filename.csv'
#note: date_parser is deprecated since pandas 2.0
df = pd.read_csv(StringIO(temp), parse_dates=['Timestamp1'],date_parser=dateparse)
print (df)
Timestamp1
0 2019-01-04 02:00:09
1 2019-01-04 14:00:09
2 2019-01-04 14:00:18
print (df.dtypes)
Timestamp1 datetime64[ns]
dtype: object
EDIT2:
df = pd.read_csv('send1.csv', parse_dates=['Timestamp'])
print (df)
Timestamp
0 2019-01-04 14:00:00
1 2019-01-04 14:00:00
2 2019-01-04 14:00:00
3 2019-01-04 14:00:00
4 2019-01-04 14:00:00
5 2019-01-04 14:00:00
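A sketch that does not rely on date_parser: read the column as text and convert it afterwards with pd.to_datetime and the 12-hour format from EDIT1. The in-memory CSV stands in for the real file, and this only helps if the file itself still contains the seconds.

```python
import io

import pandas as pd

csv_text = (
    "Timestamp\n"
    "1/4/2019 2:00:09 AM\n"
    "1/4/2019 2:00:09 PM\n"
    "1/4/2019 2:00:18 PM\n"
)

# Read the column as plain text first, then convert it explicitly.
df = pd.read_csv(io.StringIO(csv_text))

# %I is the 12-hour clock and %p the AM/PM marker; %S keeps the seconds.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%m/%d/%Y %I:%M:%S %p")

print(df)
print(df["Timestamp"].dt.second.tolist())
```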

Change all values of a column in pandas data frame

I have a pandas data frame with a column containing values like '01:00'. I want to subtract one hour from each value, so that '01:00' becomes '00:00'. Can anyone help?
You can use timedeltas:
df = pd.DataFrame({'col':['01:00', '02:00', '24:00']})
df['new'] = pd.to_timedelta(df['col'] + ':00') - pd.Timedelta(1, unit='h')
df['new'] = df['new'].astype(str).str[-18:-13]
print (df)
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00
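The string slice above depends on how a particular pandas version renders timedeltas as text. As a hedged alternative, the sketch below builds the HH:MM string from the numeric fields exposed by .dt.components, which should not depend on the text representation; it assumes the values stay within a single day after the subtraction.

```python
import pandas as pd

df = pd.DataFrame({"col": ["01:00", "02:00", "24:00"]})

# Append seconds so the strings parse as timedeltas, then subtract one hour.
shifted = pd.to_timedelta(df["col"] + ":00") - pd.Timedelta(hours=1)

# .dt.components exposes days, hours, minutes, ... as integer columns.
parts = shifted.dt.components
df["new"] = (
    parts["hours"].astype(str).str.zfill(2)
    + ":"
    + parts["minutes"].astype(str).str.zfill(2)
)

print(df)
```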
Another, faster solution uses map, if all strings are in the range 01:00 to 24:00:
L = ['{:02d}:00'.format(x) for x in range(25)]
d = dict(zip(L[1:], L[:-1]))
df['new'] = df['col'].map(d)
print (df)
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00
It seems that you want to subtract an hour from a time that is stored in your dataframe as a string. You can do the following (this assumes values that %H can parse, i.e. 00:00 to 23:59, so '24:00' would raise a ValueError):
from datetime import datetime, timedelta
subtract_one_hour = lambda x: (datetime.strptime(x, '%H:%M') - timedelta(hours=1)).strftime("%H:%M")
df['minus_one_hour'] = df['original_time'].apply(subtract_one_hour)

Give default datetime object value to pandas.to_datetime()

I have some dates stored as strings in different formats that I convert to datetime objects using to_datetime(). However, the list of strings also has some garbage values that I want to convert to a default date.
import pandas as pd
import datetime as dt
print(df)
dates
0 2018-02-12
1 2018-03-19
2 12-24-2018
3 garbage
I use errors='coerce' to avoid raising an exception. It produces NaT, which I want to convert to a default date, 2018-12-31 in my case.
df['dates'] = pd.to_datetime(df['dates'], errors='coerce')
Below result.
dates
0 2018-02-12
1 2018-03-19
2 2018-12-24
3 NaT
Approach:
I am checking if the given value is a valid datetime or not. If not, put the default datetime object. But for some reason, it produces all default values.
df['dates'].apply(lambda x: dt.datetime(2018,12,31) if x is not dt.datetime else x)
Current Output
dates
0 2018-12-31
1 2018-12-31
2 2018-12-31
3 2018-12-31
Expected Output:
dates
0 2018-02-12
1 2018-03-19
2 2018-12-24
3 2018-12-31
Is there a way to give a default date to to_datetime() function so that, it won't produce NaT? If not, how do I put default dates afterwards?
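You just need to add fillna after the pd.to_datetime call. (The apply attempt returns the default for every row because x is not dt.datetime compares each value with the datetime class itself, which is always true.)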
pd.to_datetime(df['dates'], errors='coerce').fillna(pd.to_datetime('2018-12-31'))
Out[217]:
0 2018-02-12
1 2018-03-19
2 2018-12-24
3 2018-12-31
Name: dates, dtype: datetime64[ns]
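For completeness, a sketch that puts the fillna approach next to a corrected element-wise version of the attempt from the question, using pd.isna to detect NaT. The sample deliberately uses a single date format, since mixed formats may be handled differently across pandas versions.

```python
import pandas as pd

dates = pd.Series(["2018-02-12", "2018-03-19", "garbage"])
default = pd.Timestamp("2018-12-31")

# Unparseable strings become NaT instead of raising an exception.
parsed = pd.to_datetime(dates, errors="coerce")

# Vectorized: replace every NaT with the default date.
filled = parsed.fillna(default)

# Element-wise equivalent of the question's attempt, testing for NaT
# rather than comparing each value with the datetime class.
filled_apply = parsed.apply(lambda x: default if pd.isna(x) else x)

print(filled)
```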

Resources