Set days since first occurrence based on multiple columns - python-3.x

I have a pandas dataset with this structure:
Date datetime64[ns]
Events int64
Location object
Day float64
I've used the following code to get the date of the first occurrence for location "A":
start_date = df[df['Location'] == 'A'][df.Events != 0].iat[0,0]
I now want to update all of the records after the start_date with the number of days since the start_date, where Day = df.Date - start_date.
I tried this code:
df.loc[df.Location == country, 'Day'] = (df.Date - start_date).days
However, that code returns an error:
AttributeError: 'Series' object has no attribute 'days'
The problem seems to be that the code recognizes df.Date as an object instead of a datetime. Anyone have any ideas on what is causing this problem?

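You need to add the .dt accessor. Subtracting start_date from df.Date gives a Series of timedeltas, and a Series only exposes .days through .dt: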
df.loc[df.Location == country, 'Day'] = (df.Date - start_date).dt.days
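A minimal end-to-end sketch of the fix, using made-up data with the same four columns as the question (the values and the choice of country = "A" are assumptions for illustration). It also builds start_date from a single combined mask instead of two chained ones:

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-01", "2020-03-02"]
    ),
    "Events": [0, 2, 5, 1, 3],
    "Location": ["A", "A", "A", "B", "B"],
    "Day": [float("nan")] * 5,
})

country = "A"
in_country = df["Location"] == country

# Date of the first row for this location that has a non-zero event count.
start_date = df.loc[in_country & (df["Events"] != 0), "Date"].iloc[0]

# The subtraction yields a timedelta Series, so .days is reached through .dt.
df.loc[in_country, "Day"] = (df["Date"] - start_date).dt.days
print(df)
```

Rows dated before start_date get a negative Day, and rows for other locations are left untouched.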

Related

AttributeError: 'RangeIndex' object has no attribute 'inferred_freq'

I'm trying to do forecasting in Python 3.x, so I wrote the following code:
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(ts_log)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
But I'm getting this error message:
AttributeError: 'RangeIndex' object has no attribute 'inferred_freq'
Can you please help me resolve the issue?
You need to make sure that your pandas object ts_log has a DatetimeIndex with a frequency that pandas can infer.
For example:
ts_log.index
>>> DatetimeIndex(['2014-01-01', ... '2017-12-31'],
dtype='datetime64[ns]', name='Date', length=1461, freq='D')
Notice the attribute freq='D': it means pandas treats the data as indexed daily (D = daily).
To achieve this, I assume ts_log is a DataFrame with a column called 'Date'. Here's the code to do it:
# Convert your daily column from just string to DateTime (skip if already done)
ts_log['Date'] = pd.to_datetime(ts_log['Date'])
# Set the column 'Date' as index (skip if already done)
ts_log = ts_log.set_index('Date')
# Specify datetime frequency
ts_log = ts_log.asfreq('D')
For frequency other than Daily, refer here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
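As a small illustration of those steps, here is a sketch on made-up data (the column names and values are assumptions). It shows that the index has no frequency until asfreq is called, and that asfreq inserts NaN rows for missing days, which need to be filled before decomposition:

```python
import pandas as pd

# Hypothetical daily data with one missing day (2014-01-03).
raw = pd.DataFrame({
    "Date": ["2014-01-01", "2014-01-02", "2014-01-04"],
    "value": [1.0, 2.0, 4.0],
})

raw["Date"] = pd.to_datetime(raw["Date"])
ts = raw.set_index("Date")
print(ts.index.freq)  # no frequency is attached yet

# asfreq reindexes onto a regular daily grid and attaches freq='D'.
ts = ts.asfreq("D")
print(ts.index.freqstr)

# The missing day is now an explicit NaN row; fill it (here by linear
# interpolation, one possible choice) before passing the data on.
filled = ts.interpolate()
print(filled)
```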
For statsmodels==0.10.1, when ts_log is not a DataFrame or is a DataFrame without a datetime index, pass the frequency explicitly (newer statsmodels releases name this argument period instead of freq):
decomposition = seasonal_decompose(ts_log, freq=1)

Getting the current age using datetime library

I have a column of users' DOB data in datetime64[ns] format and would like to calculate their current age. Every time I parse this date and try to subtract it from the present date, I get an error about incompatible str and datetime types.
from datetime import datetime
main_file['AD_DOB'] = pd.to_datetime(main_file['AD_DOB']).dt.date ##10/2/1943
now = datetime.now().strftime("%H:%M:%S")
main_file['Age'] = ((now - main_file['AD_DOB'])/365).dt.days
Error: TypeError: unsupported operand type(s) for -: 'str' and 'datetime.date'
Try this:
from datetime import datetime
import pandas as pd
# Keep the column as datetime64[ns]; .dt.date would turn it into plain date objects
main_file['AD_DOB'] = pd.to_datetime(main_file['AD_DOB']) ##10/2/1943
now = datetime.now()
main_file['Age'] = ((now - main_file['AD_DOB'])/365).dt.days
The error occurs because datetime.now().strftime("%H:%M:%S") returns a string, not a datetime.
The following should work, provided AD_DOB is still a datetime64[ns] column:
now = datetime.now()
main_file['Age'] = (now - main_file['AD_DOB']).dt.days/365.0
To subtract dates in pandas, both values must be datetimes. To drop the time of day, use Series.dt.floor on the AD_DOB column and Timestamp.floor on now:
main_file = pd.DataFrame({'AD_DOB':['10/2/1943','10/8/1946','10/12/1983']})
main_file['AD_DOB'] = pd.to_datetime(main_file['AD_DOB']).dt.floor('d')
now = pd.to_datetime('now').floor('d')
main_file['Age'] = ((now - main_file['AD_DOB'])/365).dt.days
print (main_file)
AD_DOB Age
0 1943-10-02 76
1 1946-10-08 73
2 1983-10-12 36
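Dividing by 365 ignores leap years, so the result can be off by one close to a birthday. A possible alternative is to count calendar years directly. In the sketch below, the helper name age_in_years and the fixed reference date are assumptions for illustration, not part of the answers above:

```python
import pandas as pd

def age_in_years(dob: pd.Series, today: pd.Timestamp) -> pd.Series:
    """Whole years between each date of birth and today, by calendar date."""
    years = today.year - dob.dt.year
    # Subtract one year when this year's birthday has not happened yet.
    before_birthday = (dob.dt.month > today.month) | (
        (dob.dt.month == today.month) & (dob.dt.day > today.day)
    )
    return years - before_birthday.astype(int)

main_file = pd.DataFrame({"AD_DOB": ["10/2/1943", "10/8/1946", "10/12/1983"]})
main_file["AD_DOB"] = pd.to_datetime(main_file["AD_DOB"])

# A fixed reference date keeps the example reproducible; in practice
# pd.Timestamp.now() could be used instead.
today = pd.Timestamp("2020-01-15")
main_file["Age"] = age_in_years(main_file["AD_DOB"], today)
print(main_file)
```

With this reference date the ages come out as 76, 73 and 36, matching the output shown above.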

Pandas - Can't cast string to datetime

I have a dataframe that stores information from text files; this information gives me details about my execution jobs.
I store all of it in a dataframe called "df_tmp". That dataframe has a column "end_date" where I want to store the end date taken from the last line of my file, but if the dataframe has no value there I want to store the current time.
Imagine that the information from my file is on the following variable:
string_from_my_file = 'Execution time at 2019/10/14 08:06:44'
What I need is:
If my file doesn't have a date on the last line, I want to store current_time.
For that I am trying with this code:
now = dt.datetime.now()
current_time = now.strftime('%H:%M:%S')
df_tmp['end_date'] = df_tmp['end_date'].fillna(current_time).apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else pd.to_datetime(re.search("([0-9]{4}\/[0-9]{2}\/[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2})", str(df_tmp['string_from_my_file']))[0]))
However, it gives me the following error:
builtins.AttributeError: 'str' object has no attribute 'strftime'
What am I doing wrong?
Thanks
Try this:
df_tmp['end_date'] = df_tmp['end_date'].fillna(current_time).apply(lambda x: pd.to_datetime(x).strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else pd.to_datetime(re.search("([0-9]{4}\/[0-9]{2}\/[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2})", str(df_tmp['string_from_my_file']))[0]))
In the part lambda x: pd.to_datetime(x).strftime('%Y-%m-%d %H:%M:%S'), x has to be converted to a datetime before strftime() can be applied.
Probable reason for your error:
Even if the end_date column is of type datetime, you are filling it with values of type str (current_time is the result of strftime). This changes the data type of the end_date column.
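One way to avoid mixing strings and datetimes altogether is to keep everything as datetimes until the very end. The sketch below is an alternative approach, not the code from the answer above: it assumes the raw text lives in a column named string_from_my_file (the second sample row is invented) and uses Series.str.extract instead of apply with re.search:

```python
import pandas as pd

# Hypothetical input: one line with a timestamp, one without.
df_tmp = pd.DataFrame({
    "string_from_my_file": [
        "Execution time at 2019/10/14 08:06:44",
        "no timestamp on the last line",
    ]
})

# Pull out the "YYYY/MM/DD HH:MM:SS" part; rows without a match become NaN.
pattern = r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
extracted = df_tmp["string_from_my_file"].str.extract(pattern, expand=False)

# Parse to datetimes (missing values become NaT), then fill the gaps with
# the current time as a Timestamp, so the column stays datetime-typed.
end_date = pd.to_datetime(extracted, format="%Y/%m/%d %H:%M:%S")
df_tmp["end_date"] = end_date.fillna(pd.Timestamp.now().floor("s"))
print(df_tmp)
```

If a string representation is needed afterwards, df_tmp['end_date'].dt.strftime('%Y-%m-%d %H:%M:%S') can be applied to the whole column at once.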

Convert pandas column from object type [] in python 3

I have read this Pandas: convert type of column and this How to convert datatype:object to float64 in python?
I have current output of df:
Day object
Time object
Open float64
Close float64
High float64
Low float64
Day Time Open Close High Low
0 ['2019-03-25'] ['02:00:00'] 882.2 882.6 884.0 882.1
1 ['2019-03-25'] ['02:01:00'] 882.9 882.9 883.4 882.9
2 ['2019-03-25'] ['02:02:00'] 882.8 882.8 883.0 882.7
So I can not use this:
day_=df.loc[df['Day'] == '2019-06-25']
My final purpose is to extract df by filtering the value of column "Day" by specific condition.
I think the df.loc call above fails because the dtype of Day is object,
so I am trying to convert the above df to something like this:
Day Time Open Close High Low
0 2019-03-25 ['02:00:00'] 882.2 882.6 884.0 882.1
1 2019-03-25 ['02:01:00'] 882.9 882.9 883.4 882.9
2 2019-03-25 ['02:02:00'] 882.8 882.8 883.0 882.7
I have tried:
df=pd.read_csv('output.csv')
df = df.convert_objects(convert_numeric=True)
#df['Day'] = df['CTR'].str.replace('[','').astype(np.float64)
df['Day'] = pd.to_numeric(df['Day'].str.replace(r'[,.%]',''))
But it does not work with error like this:
ValueError: Unable to parse string "['2019-03-25']" at position 0
I am a novice at pandas, and this may be a duplicate!
Please help me find a solution. Thanks a lot.
Try this; I hope it works.
First remove the list brackets from Day, then filter using .loc:
df = pd.DataFrame(data={'Day':[['2016-05-12']],
'day2':[['2016-01-01']]})
df['Day'] = df['Day'].apply(''.join)
df['Day'] = pd.to_datetime(df['Day']).dt.date.astype(str)
days_df=df.loc[df['Day'] == '2016-05-12']
Second Solution
If the list is stored as string
from ast import literal_eval
df2 = pd.DataFrame(data={'Day':["['2016-05-12']"],
'day2':["['2016-01-01']"]})
df2['Day'] = df2['Day'].apply(literal_eval)
df2['Day'] = df2['Day'].apply(''.join)
df2['Day'] = pd.to_datetime(df2['Day']).dt.date.astype(str)
days_df=df2.loc[df2['Day'] == '2016-05-12']
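When the bracketed values come from a CSV file, as in the question, they are plain strings, so a lighter option is to strip the surrounding characters directly. This is a sketch under that assumption, with sample rows modelled on the question's data (the third date is changed so the filter has something to exclude):

```python
import pandas as pd

# Hypothetical frame as it might look after pd.read_csv('output.csv').
df = pd.DataFrame({
    "Day": ["['2019-03-25']", "['2019-03-25']", "['2019-03-26']"],
    "Time": ["['02:00:00']", "['02:01:00']", "['02:02:00']"],
    "Open": [882.2, 882.9, 882.8],
})

# Remove the leading "['" and trailing "']" from both text columns.
for col in ["Day", "Time"]:
    df[col] = df[col].str.strip("[]'")

# A real datetime column makes date filtering straightforward.
df["Day"] = pd.to_datetime(df["Day"])
day_ = df.loc[df["Day"] == pd.Timestamp("2019-03-25")]
print(day_)
```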

Filter on month and date irrespective of year in python

I have several columns of data, one of them being a date, and I need to drop the rows that fall on leap days. The data covers a range of years, so I was hoping to drop any row that matches a 02-29 filter.
The one way I found is to add extra columns, extract the year and the month-day part separately, and then filter on the latter, as shown below. It serves the purpose but is obviously not good from an efficiency perspective.
df['Yr'], df['Mth-Dte'] = zip(*df['Date'].apply(lambda x: (x[:4], x[5:])))
df = df[df['Mth-Dte'] != '02-29']
Is there a better way to implement this by directly applying the filter on the column in the dataframe?
Adding the data
ID Date
22398 IDM00096087 1/1/2005
22586 IDM00096087 1/1/2005
21790 IDM00096087 1/2/2005
21791 IDM00096087 1/2/2005
14727 IDM00096087 1/3/2005
Thanks in advance
Convert to datetime and use boolean mask.
import pandas as pd
data = {'Date': {14727: '1/3/2005',
21790: '1/2/2005',
21791: '1/2/2005',
22398: '1/1/2005',
22586: '2/29/2008'},  # leap day, in the same month/day/year format as the other rows
'ID': {14727: 'IDM00096087',
21790: 'IDM00096087',
21791: 'IDM00096087',
22398: 'IDM00096087',
22586: 'IDM00096087'}}
df = pd.DataFrame(data)
Option1, convert + dt:
df.Date = pd.to_datetime(df.Date)
# Filter away february 29
df[~((df.Date.dt.month == 2) & (df.Date.dt.day == 29))] # ~ negates the boolean mask
Option2, convert + strftime:
df.Date = pd.to_datetime(df.Date)
# Filter away february 29
df[df.Date.dt.strftime('%m%d') != '0229']
Option3, without changing the Date column (the conversion is only used to build the mask):
mask = pd.to_datetime(df.Date).dt.strftime('%m%d') != '0229'
df[mask]
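A compact, self-contained sketch of the mask approach, with an explicit date format so that parsing does not depend on inference. The sample rows are invented and assume month/day/year strings as in the question's data:

```python
import pandas as pd

# Hypothetical rows around a leap day.
df = pd.DataFrame({
    "ID": ["IDM00096087"] * 4,
    "Date": ["1/1/2005", "2/28/2008", "2/29/2008", "3/1/2008"],
})

# Parse with an explicit format; the original string column is left as-is.
dates = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# True for rows that fall on February 29, in any year.
is_leap_day = (dates.dt.month == 2) & (dates.dt.day == 29)

filtered = df[~is_leap_day]
print(filtered)
```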
