How to Calculate Stochastic RSI using TA-Lib on a Multi-Index DataFrame? - pandas-groupby

I have been trying to calculate the Stochastic RSI on a multi-index DataFrame by grouping on 'Symbol' with groupby and then applying an inline (lambda) function. The code is:
df['fastk'],df['fastd'] = df.groupby('Symbol')['Close'].apply(lambda y: talib.STOCHRSI(y, timeperiod=14, fastk_period=5, fastd_period=3, fastd_matype=0))
The error is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\jafre\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2519, in __setitem__
self._set_item(key, value)
File "C:\Users\jafre\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2585, in _set_item
value = self._sanitize_column(key, value)
File "C:\Users\jafre\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2760, in _sanitize_column
value = _sanitize_index(value, self.index, copy=False)
File "C:\Users\jafre\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\series.py", line 3121, in _sanitize_index
raise ValueError('Length of values does not match length of ' 'index')
ValueError: Length of values does not match length of index
Can someone help with the changes in the code?

Related

Openpyxl recognize data: TypeError: 'method' object is not subscriptable and is not a valid coordinate or range error

I'm new to openpyxl. I need to copy several columns from one file and paste them into another file that has the same columns.
I've started on my code, but I'm getting an error:
file1 = load_workbook('PRODUCTION.xlsx')
ws = file1.active
column = ws.cell['ID']
print(column)
I get this error:
Traceback (most recent call last):
File "c:\Users\Ana\Documents\PRODUCTION Project\production.py", line 29, in <module>
column = ws.cell['ID']
TypeError: 'method' object is not subscriptable
And when I tried only column = ws['ID']
I get:
Traceback (most recent call last):
File "c:\Users\Ana\Documents\PRODUCTION Project\production.py", line 29, in <module>
column = ws ['ID']
File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\site-packages\openpyxl\worksheet\worksheet.py", line 290, in __getitem__
min_col, min_row, max_col, max_row = range_boundaries(key)
File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\site-packages\openpyxl\utils\cell.py", line 135, in range_boundaries
raise ValueError(msg)
ValueError: ID is not a valid coordinate or range
Thanks in advance.

ValueError after MinMaxScaler and Transform

I am running into a ValueError with the code below. I have tried solutions found online, but to no avail.
Here's my original code, which raises a string-to-float conversion error
(ValueError: could not convert string to float: '3,1,0,0,0,1,0,1,89874,49.99'):
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
training_data_df = pd.read_csv('./data/sales_data_training.csv')
scaler = MinMaxScaler(feature_range=(0,1))
scaled_training= scaler.fit_transform(training_data_df)
scaled_training_df = pd.DataFrame(scaled_training,columns= training_data_df.columns.values)
My CSV Data:
"critic_rating,is_action,is_exclusive_to_us,is_portable,is_role_playing,is_sequel,is_sports,suitable_for_kids,total_earnings,unit_price"
"3.5,1,0,1,0,1,0,0,132717,59.99"
"4.5,0,0,0,0,1,1,0,83407,49.99"...
'3,1,0,0,0,1,0,1,89874,49.99'
I have 9 columns of data across 1000 rows (~9999 data, with first row being the header).
Regards,
Yuki
The full error is as follows:
Traceback (most recent call last):
File "C:/Users/YukiKawaii/PycharmProjects/PandasTest/module2_NN/test.py", line 6, in <module>
scaled_training= scaler.fit_transform(training_data_df)
File "C:\Users\YukiKawaii\Python\Python35\lib\site-packages\sklearn\base.py", line 517, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "C:\Users\YukiKawaii\Python\Python35\lib\site-packages\sklearn\preprocessing\data.py", line 308, in fit
return self.partial_fit(X, y)
File "C:\Users\YukiKawaii\Python\Python35\lib\site-packages\sklearn\preprocessing\data.py", line 334, in partial_fit
estimator=self, dtype=FLOAT_DTYPES)
File "C:\Users\YukiKawaii\Python\Python35\lib\site-packages\sklearn\utils\validation.py", line 433, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: '3,1,0,0,0,1,0,1,89874,49.99'
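You should remove the quotes ("" and '') wrapped around each line in the CSV file.
By default pd.read_csv() splits each line on commas, but a line wrapped in double quotes is read as one quoted field. Each row then arrives as a single string such as '3,1,0,0,0,1,0,1,89874,49.99', which the scaler cannot convert to a float.
So the CSV file should look as follows.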
critic_rating,is_action,is_exclusive_to_us,is_portable,is_role_playing,is_sequel,is_sports,suitable_for_kids,total_earnings,unit_price
3.5,1,0,1,0,1,0,0,132717,59.99
4.5,0,0,0,0,1,1,0,83407,49.99
3,1,0,0,0,1,0,1,89874,49.99
I just verified by running your code after making the above change.
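The effect of the quotes can be reproduced without the original file. The following is a minimal sketch, assuming a made-up three-column sample held in memory (the column names a, b, c and the values are illustrative, not taken from sales_data_training.csv). It shows that double-quoted lines are parsed as a single string column, and that stripping the quotes before parsing restores separate numeric columns.

```python
import io

import pandas as pd

# Hypothetical sample: every line is wrapped in double quotes,
# like the CSV shown in the question.
quoted = '"a,b,c"\n"1.5,2,3"\n"4.5,5,6"\n'

# Each quoted line becomes one field, so the frame has a single
# column whose values are strings like "1.5,2,3".
df_bad = pd.read_csv(io.StringIO(quoted))
print(df_bad.shape)          # one column only
print(list(df_bad.columns))  # the whole header is one column name

# Strip the surrounding quotes from every line, then parse again.
cleaned = "\n".join(line.strip().strip('"') for line in quoted.splitlines())
df_ok = pd.read_csv(io.StringIO(cleaned))
print(df_ok.shape)           # three separate columns
print(df_ok.dtypes)          # numeric dtypes, ready for a scaler
```

For a file on disk, the same cleanup could be done once in a text editor or with a short script, after which the original MinMaxScaler code should receive numeric columns.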

I have a problem with writing to my Excel file

I am getting an error message when I try to write to my Excel file.
It happens on this line: df_Percent_Change.to_excel(writer, sheet_name=x, startcol=8)
Traceback (most recent call last):
File "/Users/david.soderstrom/Dropbox (Diagona)/DS/Python/Byggmarknad_index/Byggmarknad_index.py", line 125, in
df_Percent_Change.to_excel(writer, sheet_name=x, startcol=8)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 1766, in to_excel
engine=engine)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/pandas/io/formats/excel.py", line 652, in write
freeze_panes=freeze_panes)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 1742, in write_cells
wks = self.book.add_worksheet(sheet_name)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 179, in add_worksheet
return self._add_sheet(name, worksheet_class=worksheet_class)
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 666, in _add_sheet
name = self._check_sheetname(name, isinstance(worksheet, Chartsheet))
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 717, in _check_sheetname
if len(sheetname) > 31:
TypeError: object of type 'int' has no len()
Exception ignored in: >
Traceback (most recent call last):
File "/Users/david.soderstrom/anaconda3/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 154, in __del__
Exception: Exception caught in workbook destructor. Explicit close() may be required for workbook.
# Calculate and save the percent change for each asset
if 'Percent_Change' not in excel_db.columns:
    print('Percent_Change does not exist in excel file.')
    print('Calculating...')
    # Loop through and read each sheet
    x = 0
    for x in range(countSheets):
        # Read in data for the calculation
        data = pd.read_excel('databas.xlsx', sheet_name=x, index_col='Date')
        # Calculate the percent change from day to day
        Percent_Change = data['Adj Close'].pct_change()*100
        print(type(Percent_Change))
        df_Percent_Change = pd.DataFrame(Percent_Change)
        print(type(df_Percent_Change))
        writer = pd.ExcelWriter('databas.xlsx', engine='xlsxwriter')
        df_Percent_Change.to_excel(writer, sheet_name=x, startcol=8)
        # Save the result
        writer.save()
        writer.close()
        x += 1

dtypes returning AttributeError

I'm using pandas for the first time. I have read data from an Excel file into a data frame, but can't seem to do anything with it.
The dataframe is called ParsedData. It exists and contains data:
>>> ParsedData
>>> USD# County Name USD Name Density Area
>>> 0 D0101 Neosho Erie 0.847692 325.0
>>> ...
I can confirm that it is a dataframe:
>>> type(ParsedData)
>>> <class 'pandas.core.frame.DataFrame'>
However, attempting to determine the dtypes of my columns fails:
>>> ParsedData.dtypes
Traceback (most recent call last):
File "<pyshell#50>", line 1, in <module>
ParsedData.dtypes
File "C:\Python34\lib\idlelib\rpc.py", line 611, in displayhook
text = repr(value)
File "C:\Python34\lib\site-packages\pandas\core\base.py", line 72, in __repr__
return str(self)
File "C:\Python34\lib\site-packages\pandas\core\base.py", line 51, in __str__
return self.__unicode__()
File "C:\Python34\lib\site-packages\pandas\core\series.py", line 982, in __unicode__
width, height = get_terminal_size()
File "C:\Python34\lib\site-packages\pandas\io\formats\terminal.py", line 33, in get_terminal_size
return shutil.get_terminal_size()
File "C:\Python34\lib\shutil.py", line 1071, in get_terminal_size
size = os.get_terminal_size(sys.__stdout__.fileno())
AttributeError: 'NoneType' object has no attribute 'fileno'
What's going on here?

Python 3.4 pandas: sort market data by Date

I am trying to set up Python (3.4) code to sort a time series by date.
In the Python shell, I key in the following:
>>> data = quandl.get("YAHOO/INDEX_GSPC", start_date="2017-01-01", end_date="2017-01-20")
>>> print(data)
So I can load the data. But when I try to sort it with the command
>>> data = data.sort_values(by='Date')
I get the following error messages. I can't work out the syntax for sorting by date from http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_values.html
Many thanks in advance for any advice.
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\pandas\indexes\base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: 'Date'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell#37>", line 1, in <module>
data = data.sort_values(by='Date')
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 3230, in sort_values
k = self.xs(by, axis=other_axis).values
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1770, in xs
return self[key]
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 2059, in __getitem__
return self._getitem_column(key)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "C:\Python34\lib\site-packages\pandas\indexes\base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: 'Date'
quandl.get loads a DataFrame with the date as index.
So if you sort by index, you're good to go:
data = data.sort_index()
Make sure you look at the error. You are getting a KeyError, which means that the column Date does not exist in your DataFrame. It's likely that the dates are stored in the index, which requires the sort_index method instead. The 'Date' name that you see in your DataFrame is the name of the index, not a column.
data.sort_index()
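The difference between an index named Date and a column named Date can be checked on a small stand-in frame. The following is a minimal sketch, assuming a hand-built DataFrame with placeholder Close values rather than real Quandl output. It sorts by the index as the answers suggest, and also shows the alternative of turning the index into a regular column first.

```python
import pandas as pd

# Hypothetical stand-in for the quandl.get() result: the dates live in
# the index (named "Date"), not in a column. Values are placeholders.
data = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0]},
    index=pd.DatetimeIndex(
        ["2017-01-04", "2017-01-03", "2017-01-05"], name="Date"
    ),
)

print("Date" in data.columns)  # False: there is no such column
print(data.index.name)         # "Date" is the name of the index

# Sort by the index, as suggested in the answers above.
sorted_data = data.sort_index()
print(sorted_data)

# Alternative: move the index into a regular column, then sort by it.
by_column = data.reset_index().sort_values(by="Date")
print(by_column)
```

Note that sort_index() returns a new DataFrame, so the result has to be assigned (as in data = data.sort_index()) for the sorted order to be kept.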

Resources