I'm trying to do forecasting in Python 3.x, so I wrote the following code:
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(ts_log)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
But I'm getting this error message:
AttributeError: 'RangeIndex' object has no attribute 'inferred_freq'
Can you please help me resolve the issue?
You need to make sure that your pandas Series object ts_log has a DatetimeIndex with an inferred frequency.
For example:
ts_log.index
>>> DatetimeIndex(['2014-01-01', ... '2017-12-31'],
dtype='datetime64[ns]', name='Date', length=1461, freq='D')
Notice how there's an attribute freq='D'; it means that pandas inferred that the Series is indexed daily (D = daily).
To achieve this, I assume your data has a column called 'Date'. Here's the code to do it:
# Convert your 'Date' column from string to datetime (skip if already done)
ts_log['Date'] = pd.to_datetime(ts_log['Date'])
# Set the column 'Date' as index (skip if already done)
ts_log = ts_log.set_index('Date')
# Specify datetime frequency
ts_log = ts_log.asfreq('D')
For frequency other than Daily, refer here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
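Putting it together, here's a minimal end-to-end sketch; the data below is made up purely for illustration, so substitute your own ts_log:
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical daily data standing in for your log-transformed series
idx = pd.date_range('2014-01-01', '2015-12-31', freq='D')
ts_log = pd.Series(range(len(idx)), index=idx, dtype='float64')

# The index now carries freq='D', so seasonal_decompose can infer the period
decomposition = seasonal_decompose(ts_log)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid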
For statsmodels==0.10.1, or where ts_log is not a DataFrame (or is a DataFrame without a datetime index), use the following:
decomposition = seasonal_decompose(ts_log, freq=1)
I have a pandas DataFrame that contains month and year values in a yyyy-mm format. I am using df.to_sql to set the data type and send it to a .db file.
I keep getting this error:
sqlalchemy.exc.StatementError: (builtins.TypeError) SQLite Date type only accepts Python date objects as input.
Is there a way to set a 'Date' data type for the 'MonthYear' (yyyy-mm) column? Or should it be set as a VARCHAR? I tried changing it to different pandas datetime data types; none of them seem to work.
I don't have any issues with 'full_date'; it is assigned properly. The data type for 'full_date' is datetime64[ns] in pandas.
MonthYear    full_date
2015-03      2012-03-11
2015-04      2013-08-19
2010-12      2012-06-29
2012-01      2018-01-01
df.to_sql('MY_TABLE', con=some_connection,
          dtype={'MonthYear': sqlalchemy.types.Date(),
                 'full_date': sqlalchemy.types.Date()})
My opinion is that you shouldn't unnecessarily store the extra column in your database when you can derive it from the 'full_date' column.
One issue you'll run into is that SQLite doesn't have a DATE type. So, you need to parse the dates upon extraction with your query. Full example:
import datetime as dt
import numpy as np
import pandas as pd
import sqlite3
# I'm using datetime64[ns] because that's what you say you have
df = pd.DataFrame({'full_date': [np.datetime64('2012-03-11')]})
con = sqlite3.connect(":memory:")
df.to_sql("MY_TABLE", con, index=False)
new_df = pd.read_sql_query("SELECT * FROM MY_TABLE;", con,
                           parse_dates={'full_date': '%Y-%m-%d'})
Result:
In [111]: new_df['YearMonth'] = new_df['full_date'].dt.strftime('%Y-%m')
In [112]: new_df
Out[112]:
full_date YearMonth
0 2012-03-11 2012-03
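If you'd rather derive the month-year on the database side instead of in pandas, SQLite's built-in strftime can produce it inside the query; a sketch reusing the in-memory table from above:
# full_date is stored as ISO-formatted text, so SQLite's strftime can slice it
query = "SELECT full_date, strftime('%Y-%m', full_date) AS YearMonth FROM MY_TABLE;"
new_df2 = pd.read_sql_query(query, con, parse_dates=['full_date'])
print(new_df2)
#    full_date YearMonth
# 0 2012-03-11   2012-03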
I have a variable of type pandas.core.groupby.generic.SeriesGroupBy, which I got from grouping various fields of a pandas DataFrame. I would like to convert that variable into a pandas Series, but my attempt runs with a lot of errors.
Here is the code which I have tried:
w = data.groupby(['dt', 'b'])['w']
w = pd.Series(w)
When I try to run this code, it takes a lot of time to execute and also generates a lot of errors.
I am getting a pandas Series as follows:
But I am expecting something similar to this:
Is there any other way to group the column of a DataFrame and store it inside a pandas Series?
Pandas groupby objects are iterable. Using a list comprehension, you can extract the partitioned sub-series. Try:
list_of_series = [s for _, s in data.groupby(['dt', 'b'])['w']]
list_of_series is a list and should contain your desired pandas Series, one per group.
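For illustration, here's a small self-contained sketch; the column names dt, b and w are taken from your snippet, and the values are made up:
import pandas as pd

# Hypothetical data shaped like the columns in your code
data = pd.DataFrame({
    'dt': ['2019-01', '2019-01', '2019-02', '2019-02'],
    'b':  ['x', 'x', 'y', 'y'],
    'w':  [1, 2, 3, 4],
})

# One Series per (dt, b) group
list_of_series = [s for _, s in data.groupby(['dt', 'b'])['w']]
print(list_of_series[0])
# 0    1
# 1    2
# Name: w, dtype: int64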
I have a dataframe that stores some information from text files; this information gives me details about my execution jobs.
I store all this information in a dataframe called "df_tmp". In that dataframe I have a column "end_date" where I want to store the end date from the file (it is on the last line of the file), but if there is no value in the dataframe I want to store the current time.
Imagine that the information from my file is on the following variable:
string_from_my_file = 'Execution time at 2019/10/14 08:06:44'
What I need is:
In case my file doesn't have any date on the last line, I want to store the current_time.
For that I am trying with this code:
now = dt.datetime.now()
current_time = now.strftime('%H:%M:%S')
df_tmp['end_date'] = df_tmp['end_date'].fillna(current_time).apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else pd.to_datetime(re.search("([0-9]{4}\/[0-9]{2}\/[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2})", str(df_tmp['string_from_my_file']))[0]))
However, it gives me the following error:
builtins.AttributeError: 'str' object has no attribute 'strftime'
What am I doing wrong?
Thanks
Try this:
df_tmp['end_date'] = df_tmp['end_date'].fillna(current_time).apply(
    lambda x: pd.to_datetime(x).strftime('%Y-%m-%d %H:%M:%S')
    if not pd.isnull(x)
    else pd.to_datetime(
        re.search("([0-9]{4}\/[0-9]{2}\/[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2})",
                  str(df_tmp['string_from_my_file']))[0]))
In this part, lambda x: pd.to_datetime(x).strftime('%Y-%m-%d %H:%M:%S'), you need to convert x to a datetime before you can apply strftime().
Probable reason for your error:
Even though the end_date column is of type datetime, you are filling it with values of type str, which changes the data type of the end_date column.
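A minimal sketch of the same idea on made-up data (the frame and timestamps below are purely illustrative):
import datetime as dt
import pandas as pd

# current_time is a plain string here, just like in the question
now = dt.datetime.now()
current_time = now.strftime('%Y-%m-%d %H:%M:%S')

# Hypothetical frame: one row has an end date, one is missing
df_tmp = pd.DataFrame({'end_date': [pd.Timestamp('2019-10-14 08:06:44'), None]})

# fillna() inserts a *string*, so convert each value back to datetime
# with pd.to_datetime() before calling strftime()
df_tmp['end_date'] = df_tmp['end_date'].fillna(current_time).apply(
    lambda x: pd.to_datetime(x).strftime('%Y-%m-%d %H:%M:%S'))
print(df_tmp)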
I have a database I'm reading from Excel as a pandas DataFrame, and the dates come in as Timestamp dtype, but I need them to be np.datetime64 so that I can make calculations.
I am aware that pd.to_datetime() and the astype('datetime64[ns]') method do work. However, I am unable to update my DataFrame to yield this datatype, for whatever reason, using the code mentioned above.
I have also tried creating an accessory dataframe from the original one, with just the dates whose typing I wish to update, converting it to np.datetime64 and plugging it back into the original dataframe:
dfi = df['dates']
dfi = pd.to_datetime(dfi)
df['dates'] = dfi
But still it doesn't work. I have also tried updating values one by one:
arr_i = df.index
for i in range(len(arr_i)):
    df.at[arr_i[i], 'dates'].to_datetime64()
Edit
The root problem seems to be that the dtype of the column gets updated to np.datetime64, but somehow, when getting single values from within, they still have the dtype = Timestamp
Does anyone have a suggestion of a workaround that is fairly fast?
Pandas tries to standardize all forms of datetimes by storing them as NumPy datetime64[ns] values when you assign them to a DataFrame. But when you try to access individual datetime64 values, they are returned as Timestamps.
There is a way to prevent this automatic conversion from happening however: Wrap the list of values in a Series of dtype object:
import numpy as np
import pandas as pd
# create some dates, merely for example
dates = pd.date_range('2000-1-1', periods=10)
# convert the dates to a *list* of datetime64s
arr = list(dates.to_numpy())
# wrap the values you wish to protect in a Series of dtype object.
ser = pd.Series(arr, dtype='object')
# assignment with `df['datetime64s'] = ser` would also work
df = pd.DataFrame({'timestamps': dates,
'datetime64s': ser})
df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 10 entries, 0 to 9
# Data columns (total 2 columns):
# timestamps 10 non-null datetime64[ns]
# datetime64s 10 non-null object
# dtypes: datetime64[ns](1), object(1)
# memory usage: 240.0+ bytes
print(type(df['timestamps'][0]))
# <class 'pandas._libs.tslibs.timestamps.Timestamp'>
print(type(df['datetime64s'][0]))
# <class 'numpy.datetime64'>
But beware! Although with a little work you can circumvent Pandas' automatic conversion mechanism,
it may not be wise to do so. First, converting a NumPy array to a list is usually a sign you are doing something wrong, since it is bad for performance. Second, using object arrays is a bad sign, since operations on object arrays are generally much slower than equivalent operations on arrays of native NumPy dtypes.
You may be looking at an XY problem -- it may be more fruitful to find a way to (1)
work with Pandas Timestamps instead of trying to force Pandas to return NumPy
datetime64s, or (2) work with datetime64 array-likes (e.g. Series or NumPy arrays) instead of handling values individually (which causes the coercion to Timestamps).
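For example, option (2) might look like this (a sketch; to_numpy() assumes pandas >= 0.24, otherwise use .values):
import numpy as np
import pandas as pd

df = pd.DataFrame({'dates': pd.date_range('2000-01-01', periods=5)})

# Work on the whole column at once: the underlying array is datetime64[ns]
arr = df['dates'].to_numpy()
print(arr.dtype)  # datetime64[ns]

# Vectorized datetime arithmetic, with no per-value coercion to Timestamp
deltas = arr - np.datetime64('2000-01-01')
print(deltas)  # timedelta64[ns] offsets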
New here, but I've been searching for hours and can't seem to find the solution for this. What I'm trying to do is display an aggregate of a DataFrame in a Bokeh chart. I tried using a groupby object, but I get an error when passing the groupby object to the ColumnDataSource (as mentioned in the post below).
how use bokeh vbar chart parameter with groupby object?
Here's some sample code I'm using:
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource

df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
group = df.groupby("A")
source = ColumnDataSource(group)
Getting this error:
ValueError: expected a dict or pandas.DataFrame, got <pandas.core.groupby.DataFrameGroupBy object at 0x103f7bfd0>
Any ideas on how to plot the groupby object in a chart with Bokeh?
Thanks in advance!
I haven't used Bokeh; however, from what I see you are passing a pandas.core.groupby.DataFrameGroupBy while ColumnDataSource is expecting a pd.DataFrame. The problem is that when using groupby you create a data structure that resembles key-value storage: each group in the groups object has a key and a value, and that value is the DataFrame you are looking for. Running your code as shown below will help you understand the resulting data structure from applying groupby() to a DataFrame:
groups = df.groupby('A')
for group in groups:
    # get group key
    key = group[0]
    # get group DataFrame
    group_df = group[1]
Notice that I replaced group = df.groupby('A') with groups = df.groupby('A'), so the name reflects that the object holds multiple groups.
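If the end goal is to chart an aggregate, one approach (a sketch; I'm assuming you want the group means) is to aggregate first and pass the resulting plain DataFrame to ColumnDataSource, which it does accept:
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource

df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))

# Aggregate first, then hand a plain DataFrame to Bokeh
agg = df.groupby('A', as_index=False).mean()
source = ColumnDataSource(agg)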