Error message when appending data to pandas dataframe - python-3.x

Can someone give me a hand with this:
I created a loop to append successive intervals of historical price data from Coinbase.
My loop iterates successfully a few times then crashes.
Error message (under data_temp code line):
"ValueError: If using all scalar values, you must pass an index"
days = 10
end = datetime.now().replace(microsecond=0)
start = end - timedelta(days=days)
data_price = pd.DataFrame()
for i in range(1, 50):
    print(start)
    print(end)
    data_temp = pd.DataFrame(public_client.get_product_historic_rates(product_id='BTC-USD', granularity=3600, start=start, end=end))
    data_price = data_price.append(data_temp)
    end = start
    start = end - timedelta(days=days)
Would love to understand how to fix this and why this is happening in the first place.
Thank you!
Here's the full trace:
Traceback (most recent call last):
File "\coinbase_bot.py", line 46, in <module>
data_temp = pd.DataFrame(public_client.get_product_historic_rates(product_id='BTC-USD', granularity=3600, start=start, end=end))
File "D:\Program Files\Python37\lib\site-packages\pandas\core\frame.py", line 411, in __init__
mgr = init_dict(data, index, columns, dtype=dtype)
File "D:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 257, in init_dict
return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "D:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 77, in arrays_to_mgr
index = extract_index(arrays)
File "D:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 358, in extract_index
raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index
Here's the JSON returned via a simple URL call:
[[1454716800,370.05,384.54,384.44,375.44,6276.66473729],[1454630400,382.99,389.36,387.99,384.5,7443.92933224],[1454544000,368.74,390.63,368.87,387.99,8887.7572324],[1454457600,365.63,373.01,372.93,368.87,7147.95657328],[1454371200,371.17,374.41,371.33,372.93,6856.21815799],[1454284800,366.26,379,367.89,371.33,7931.22922922],[1454198400,365,382.5,378.46,367.95,5506.77681302]]
This looks very similar to the issue below, but I cannot put my finger on it:
When attempting to merge multiple dataframes, how to resolve "ValueError: If using all scalar values, you must pass an index"

-- Hi DashOfProgramming,
Your problem is that data_temp is initialised with only a single row, and pandas requires you to provide an index for it.
The following snippet should resolve this. I replaced your API call with a simple dictionary that resembles what I would expect the API to return, and used i as the index for the dataframe (this has the advantage that you can keep track of the iteration as well):
import pandas as pd
from datetime import datetime, timedelta

days = 10
end = datetime.now().replace(microsecond=0)
start = end - timedelta(days=days)
data_price = pd.DataFrame()
temp_dict = {'start': '2019-09-30', 'end': '2019-10-01', 'price': '-111.0928',
             'currency': 'USD'}
for i in range(1, 50):
    print(start)
    print(end)
    data_temp = pd.DataFrame(temp_dict, index=[i])
    data_price = data_price.append(data_temp)
    end = start
    start = end - timedelta(days=days)
print(data_price)
EDIT
Just saw that your API output is a nested list of rows. To make sure pd.DataFrame() interprets each inner list as one row with named columns, I suggest you store your column names in a separate variable and then do this:
cols = ['ts', 'low', 'high', 'open', 'close', 'sth_else']
v = [[...], [...], [...]] # your list of lists
data_temp = pd.DataFrame.from_records(v, columns=cols)
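Putting that together with the sample JSON from the question, a minimal sketch (the column names are guesses, as in the answer above; note that DataFrame.append was removed in pandas 2.0, so this collects chunks in a list and concatenates once with pd.concat):

```python
import pandas as pd

# Two rows from the sample JSON in the question
v = [[1454716800, 370.05, 384.54, 384.44, 375.44, 6276.66473729],
     [1454630400, 382.99, 389.36, 387.99, 384.5, 7443.92933224]]
cols = ['ts', 'low', 'high', 'open', 'close', 'sth_else']

chunks = []
chunks.append(pd.DataFrame.from_records(v, columns=cols))

# One concat at the end replaces repeated DataFrame.append calls
data_price = pd.concat(chunks, ignore_index=True)
print(data_price.shape)  # (2, 6)
```

Accumulating into a list and concatenating once is also much faster than growing a DataFrame inside the loop.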

Related

Error downloading historical stock data using pandas_datareader, Anaconda3, Spyder 5.3.3

I have a watch list of 30 stocks. The list is in a text file called "WatchList". I initialize the list as:
stock = []
and read the symbols line by line. I specify a location to store the data in csv format for each symbol.
I have the latest version of pandas_datareader, which is 0.10.0. I have used a while loop with pandas_datareader before; however, now I am experiencing problems. I receive the following error message:
runfile('E:/Stock_Price_Forecasting/NewStockPriceFetcher.py', wdir='E:/Stock_Price_Forecasting')
Enter the name of file to access WatchList
WatchList.txt
0 AAPL <class 'str'>
Traceback (most recent call last):
File "C:\Users\Om\anaconda3\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "e:\stock_price_forecasting\newstockpricefetcher.py", line 60, in <module>
df = web.DataReader(stock[i], data_source='yahoo', start=start_date, end=end_date)
File "C:\Users\Om\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\data.py", line 370, in DataReader
return YahooDailyReader(
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\base.py", line 253, in read
df = self._read_one_data(self.url, params=self._get_params(self.symbols))
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\yahoo\daily.py", line 153, in _read_one_data
data = j["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]
TypeError: string indices must be integers
The portion of my code that shows the while loop is shown below:
i = 0
while i < len(stock):
    print(i, stock[i], type(stock[i]))
    # Format the filename for each security to use in full path
    stock_data_file = stock[i] + '.csv'
    # Complete the path definition for stock data storage including filename
    full_file_path = (file_path/stock_data_file)
    # Specify the order for the columns
    columnTitles = ('Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close')
    # Pull the data for the stock from the Web
    df = web.DataReader(stock[i], data_source='yahoo', start=start_date,
                        end=end_date)  # ** error in this line!!
    # Reorder the columns for plotting Candlesticks
    df = df.reindex(columns=columnTitles)
    if i == 0:
        df.to_csv(full_file_path)
        print(i, stock[i], 'has data stored to csv file')
    else:
        df.to_csv(full_file_path, header=True)
        print(i, stock[i], 'has data stored to csv file')
    i += 1
I have looked at the parameter requirements for DataReader and Yahoo. I believe the first parameter is the ticker, as a string value. I have been unable to find where I am making a mistake. Any suggestions for solving this issue would be greatly appreciated. Thank you.
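No fix is shown in this thread, but the last traceback frame can be reproduced in isolation: j is evidently a plain string rather than the expected parsed JSON dict (an HTML response is an assumption here, consistent with widely reported breakage of pandas_datareader 0.10.0 after Yahoo changed its endpoint), and indexing a string with a key raises exactly this error:

```python
# Hypothetical: what the reader may actually have received, instead of parsed JSON
j = "<html><body>redirect</body></html>"

try:
    # Same access pattern as pandas_datareader's daily.py frame in the traceback
    data = j["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]
except TypeError as err:
    msg = str(err)
print(msg)  # string indices must be integers ...
```

So the bug is not in the ticker argument; the response body simply isn't the JSON structure the library expects.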

Reading a CSV file with graphviz

The current project I have to do involves reading data from a CSV file and using graphviz to show a visual representation of the data. This is what the code looks like:
import graphviz
import pandas
import os
import math

def save_graph_as_jpg(graph, filename):
    graph.save('temp.dot')
    src = graphviz.Source.from_file('temp.dot')
    src.render(filename, format="jpg")
    os.remove(filename)
    os.remove('temp.dot')

class Node:
    def __init__(self, data, left=None, right=None):
        self.left = left
        self.right = right
        self.data = data

df = pandas.read_csv('decisiontree.csv', index_col="ID")  # df is "data frame"
print(df.to_string())
print(df.info)
nodes = []
nodeMap = {None: None}
for index, row in df[::-1].iterrows():
    row = df.index(int[index])
    if isinstance(df.loc[row][3], float) and math.isnan(df.loc[row][3]):
        df.loc[row][3] = None
    if isinstance(df.loc[row][2], float) and math.isnan(df.loc[row][2]):
        df.loc[row][2] = None
    nodeMap[df.loc[row][0]] = Node(df.loc[row][1], nodeMap[df.loc[row][3]], nodeMap[df.loc[row][2]]), nodes.insert(0, df.loc[row][0])

graph = graphviz.Digraph('structs', filename='structs.gv', node_attr={'shape': 'plaintext', 'ordering': 'out'})
for nodeID in nodes:
    node = nodeMap[nodeID]
    if node.left:
        graph.edge(node.data, node.left.data)
    if node.right:
        graph.edge(node.data, node.right.data)
save_graph_as_jpg(graph, "Decisiontree")
When I run it using IDLE, it runs most of the code just fine, but it gets hung up on line 27:
row = df.index(int[index])
I get a traceback message saying the following:
Traceback (most recent call last):
File "C:\Users...... line 27, in <module>
row = df.index[index]
File "C:\Users......Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexes\base.py", line 5382, in __getitem__
result = getitem(key)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
I changed it to:
row = df.index(int[index])
and now I get this as a traceback and index error:
Traceback (most recent call last):
File "C:\Users.......CTML AI\Week 3\Lab3.py", line 27, in <module>
row = df.index(int[index])
TypeError: 'type' object is not subscriptable
You are receiving that error because you tried to use square brackets with the int type, which attempts to subscript int as if it were an array. That doesn't work, because int is a type and can't be subscripted. You probably want parentheses instead, to cast the index variable to an integer. Try changing line 27 to
row = df.index[int(index)]
(square brackets on df.index, since an Index is subscripted rather than called, and parentheses on int to do the conversion). The line as written would only work if you had an actual array named int, and naming your variables after builtin types or functions is not a good idea anyway.
As of Python 3.9, some builtin types can be subscripted in type hints, e.g. list[int], but int itself cannot.
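A standalone illustration of the difference, using a hypothetical index value:

```python
index = "5"  # a hypothetical index value read as a string

# Square brackets subscript the type object itself, which fails:
try:
    int[index]
except TypeError as err:
    msg = str(err)

# Parentheses call int() and convert the value:
value = int(index)
print(msg)    # 'type' object is not subscriptable
print(value)  # 5
```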

how to filter a particular column with python pandas?

I have an excel file with 2 columns: 'Name' and 'size'. The 'Name' column contains multiple file types, namely ".apk, .dat, .vdex, .ttc" etc. But I only want to populate the new excel file with entries whose extension is .apk; I do not want any other file type in it.
I have written the below code:
import pandas as pd
import json
def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
    df = pd.DataFrame(data)
    new_df = df[df.columns.difference(['SHA256'])]
    new_xl = new_df.to_excel('abc.xlsx')
    return new_xl

def filter_apk():
    old_xl = json_to_excel()
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel('zybg.xlsx')
The above program does the following:
json_to_excel() takes a JSON file, converts it to .xlsx format and saves it.
filter_apk() is supposed to create excel files based on the file extensions present in the "Name" column.
The 1st function is doing what I intend it to.
The 2nd function is not doing anything, and it isn't throwing any error either. I have followed this weblink.
Below are the few samples of the "name" column
/system/product/<Path_to>/abc.apk
/system/fonts/wwwr.ttc
/system/framework/framework.jar
/system/<Path_to>/icu.dat
/system/<Path_to>/Normal.apk
/system/<Path_to>/Tv.apk
How do I get this working? Or is there a better way to achieve the objective?
Please suggest.
ERROR
raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'NoneType'>
Note:
I have all the files at the same location.
modified code:
import pandas as pd
import json

def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
    df = pd.DataFrame(data)
    new_df = df[df.columns.difference(['SHA256'])]
    new_df.to_excel('abc.xlsx')

def filter_apk():
    json_to_excel()
    old_xl = pd.read_excel('abc.xlsx')
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel('zybg.xlsx')

t = filter_apk()
print(t)
New error:
Traceback (most recent call last):
File "C:/Users/amitesh.sahay/PycharmProjects/work_allocation/TASKS/Jenkins.py", line 89, in <module>
t = filter_apk()
File "C:/Users/amitesh.sahay/PycharmProjects/work_allocation/TASKS/Jenkins.py", line 84, in filter_apk
data = pd.read_excel(old_xl)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
self._reader = self._engines[engine](self._io)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
super().__init__(filepath_or_buffer)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 344, in __init__
filepath_or_buffer, _, _, _ = get_filepath_or_buffer(filepath_or_buffer)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 243, in get_filepath_or_buffer
raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>
There is a difference between your use-case and the one shown in the weblink. You want to apply a single filter (.apk files), whereas the example you saw had multiple filters applied one after another (multiple species).
This will do the trick.
def filter_apk():
    old_xl = json_to_excel()
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains(r"\.apk")]
    a.to_excel("<path_to_new_excel>\\new_excel_name.xlsx")
Regarding your updated question: I suspect your first function is not working the way you think it is.
new_xl = new_df.to_excel('abc.xlsx')
This writes an excel file, as you expect, and that part works.
However, assigning the result to new_xl does nothing, because DataFrame.to_excel() returns None. So when you return new_xl as the output of your json_to_excel function, you actually return None. Therefore, in your second function, old_xl = json_to_excel() leaves old_xl set to None.
So, your functions should be something like this:
def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
    df = pd.DataFrame(data)
    new_df = df[df.columns.difference(['SHA256'])]
    new_df.to_excel('abc.xlsx')

def filter_apk():
    json_to_excel()
    data = pd.read_excel('abc.xlsx')
    a = data[data["Name"].str.contains(r"\.apk")]
    a.to_excel('zybg.xlsx')
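As a side note (not part of the original answer), the Excel round-trip isn't needed for the filtering itself; a minimal sketch, with inline sample data standing in for installed-files.json:

```python
import pandas as pd

# Inline stand-in for the contents of installed-files.json (an assumption)
df = pd.DataFrame({
    "Name": ["/system/product/x/abc.apk",
             "/system/fonts/wwwr.ttc",
             "/system/y/Normal.apk"],
    "size": [10, 20, 30],
})

# str.endswith avoids the regex escaping needed for the "." in str.contains
apk_only = df[df["Name"].str.endswith(".apk")]
print(apk_only["Name"].tolist())  # ['/system/product/x/abc.apk', '/system/y/Normal.apk']
```

The filtered frame can then be written once with apk_only.to_excel('zybg.xlsx').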

Pandas & requests code for financial analysis giving KeyError:0

I've been looking for a few more tools to automate stock analysis, which is how I found the link to the code below. The author says he posted the whole code, but I've not seen it, so I am reconstructing it and can't quite get it running. Link below.
Requests, web scraping and pandas are areas where I'm not as proficient, so I figure the code Jedis on SO could help untangle or update this code.
https://medium.com/swlh/automating-your-stock-portfolio-research-with-python-for-beginners-912dc02bf1c2
Long term, I'm trying to learn Python by updating or building more features into tools that others have created, so this is also a learning experience. I would love for you to fix it, but I would prefer that you give hints and lead me towards possible solutions.
# FILENAME financial_analysis.py
# SOURCE https://medium.com/swlh/automating-your-stock-portfolio-research-with-python-for-beginners-912dc02bf1c2
import requests
import pandas as pd

def getdata(stock):
    """Company Quote Group of Items"""
    company_quote = requests.get(f"https://financialmodelingprep.com/api/v3/quote/{stock}")
    company_quote = company_quote.json()
    share_price = float("{0:.2f}".format(company_quote[0]['price']))
    # Balance Sheet Group of Items
    BS = requests.get(f"https://financialmodelingprep.com/api/v3/financials/balance-sheet-statement/{stock}?period=quarter")
    BS = BS.json()
    # print_data = getdata(aapl)
    # Total Debt
    debt = float("{0:.2f}".format(float(BS['financials'][0]['Total debt'])/10**9))
    # Total Cash
    cash = float("{0:.2f}".format(float(BS['financials'][0]['Cash and short-term investments'])/10**9))
    # Income Statement Group of Items
    IS = requests.get(f"https://financialmodelingprep.com/api/v3/financials/income-statement/{stock}?period=quarter")
    IS = IS.json()
    # Most Recent Quarterly Revenue
    qRev = float("{0:.2f}".format(float(IS['financials'][0]['Revenue'])/10**9))
    # Company Profile Group of Items
    company_info = requests.get(f"https://financialmodelingprep.com/api/v3/company/profile/{stock}")
    company_info = company_info.json()
    # Chief Executive Officer
    ceo = company_info['profile']['ceo']
    return (share_price, cash, debt, qRev, ceo)

tickers = {'AAPL', 'MSFT', 'GOOG', 'T', 'CSCO', 'INTC', 'ORCL', 'AMZN', 'FB', 'TSLA', 'NVDA'}
data = map(getdata, tickers)
df = pd.DataFrame(data,
                  columns=['Total Cash', 'Total Debt', 'Q3 2019 Revenue', 'CEO'],
                  index=tickers), print(df)
generates this error
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-d9759a746769>", line 1, in <module>
runfile('/Users/owner/sbox/Jamesmk6_3/toolbox/financial_analysis.py', wdir='/Users/owner/sbox/Jamesmk6_3/toolbox')
File "/Users/owner/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-1/193.7288.30/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Users/owner/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-1/193.7288.30/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/owner/sbox/Jamesmk6_3/toolbox/financial_analysis.py", line 44, in <module>
index=tickers), print(df)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 469, in __init__
data = list(data)
File "/Users/owner/sbox/Jamesmk6_3/toolbox/financial_analysis.py", line 12, in getdata
share_price = float("{0:.2f}".format(company_quote[0]['price']))
KeyError: 0
I've dug deeper and found the dev pages, but there seems to be a complication between what the author did and what their docs show.
The API sometimes returns a dict and sometimes a list. A simpler approach is to always extract with json_normalize().
Obviously, insert your API key to make this work. I've run out of allowed calls in the 24hr period to test further, but it worked well on its one run. Note that some tickers returned multiple rows for some of the API calls, i.e. the final dataset was > 11 rows.
import requests
import pandas as pd

tickers = {'AAPL', 'MSFT', 'GOOG', 'T', 'CSCO', 'INTC', 'ORCL', 'AMZN', 'FB', 'TSLA', 'NVDA'}
df = pd.DataFrame()
url = "https://financialmodelingprep.com/api/v3"
apikey = "xxx"
payload = {"apikey": apikey}
for stock in tickers:
    print(stock)
    # use params rather than manually building request parameters
    quote = requests.get(f"{url}/quote/{stock}", params=payload)
    bs = requests.get(f"{url}/balance-sheet-statement/{stock}", params={"period": "quarter", "limit": 1, **payload})
    IS = requests.get(f"{url}/income-statement/{stock}", params={"period": "quarter", "limit": 1, **payload})
    company_info = requests.get(f"{url}/company/profile/{stock}", params=payload)
    if "Error Message" in quote.json():
        print(f"Error: {quote.text}")
        break
    else:
        # join all the results together using json_normalize() rather than hand-coded extraction from JSON
        df = pd.concat([df, (pd.json_normalize(quote.json())
                             .merge(pd.json_normalize(bs.json()), on="symbol", suffixes=("", "_BS"))
                             .merge(pd.json_normalize(IS.json()), on="symbol", suffixes=("", "_IS"))
                             .merge(pd.json_normalize(company_info.json()), on="symbol", suffixes=("", "_info"))
                             )])

# df.columns.tolist()
if len(df) > 0:
    # the columns the question is interested in
    df.loc[:, ["symbol", "price", "totalDebt", "cashAndShortTermInvestments", "revenue", "profile.ceo"]]
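To illustrate the dict-vs-list point with made-up payloads (these are not real API responses):

```python
import pandas as pd

# Hypothetical payloads showing the two shapes the API can return
as_dict = {"symbol": "AAPL", "profile": {"ceo": "Tim Cook"}}
as_list = [{"symbol": "AAPL", "price": 190.0},
           {"symbol": "MSFT", "price": 410.0}]

df_dict = pd.json_normalize(as_dict)  # single dict -> one row, nested keys flattened
df_list = pd.json_normalize(as_list)  # list of dicts -> one row per element

print(df_dict.columns.tolist())  # ['symbol', 'profile.ceo']
print(len(df_list))              # 2
```

Indexing with [0], as the original getdata() does, only works for the list shape; json_normalize handles both, which is why the KeyError: 0 disappears.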

"NameError: name 'datetime' is not defined" with datetime imported

I know there are a lot of "datetime not defined" posts, but they all seem to involve forgetting the obvious import of datetime. I can't figure out why I'm getting this error. When I run each step in IPython it works well, but the method doesn't:
import requests
import datetime

def daily_price_historical(symbol, comparison_symbol, limit=1, aggregate=1, exchange='', allData='true'):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}&allData={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate, allData)
    if exchange:
        url += '&e={}'.format(exchange)
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
    datetime.datetime.fromtimestamp()
    return df
This code produces this error:
Traceback (most recent call last):
File "C:\Users\20115619\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-29-4f015e05113f>", line 1, in <module>
rv.get_prices(30, 'ETH')
File "C:\Users\20115619\Desktop\projects\testDash\Revas.py", line 161, in get_prices
for symbol in symbols:
File "C:\Users\20115619\Desktop\projects\testDash\Revas.py", line 50, in daily_price_historical
df = pd.DataFrame(data)
File "C:\Users\20115619\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 4372, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'time'
df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
I think that line is the problem.
Your DataFrame df at the end of that line doesn't have the attribute .time.
For what it's worth, I'm on Python 3.6.0 and this runs perfectly for me:
import requests
import datetime
import pandas as pd
def daily_price_historical(symbol, comparison_symbol, limit=1, aggregate=1, exchange='', allData='true'):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}&allData={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate, allData)
    if exchange:
        url += '&e={}'.format(exchange)
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
    # I don't have the following function, but it's not needed to run this
    # datetime.datetime.fromtimestamp()
    return df

df = daily_price_historical('BTC', 'ETH')
print(df)
Note, I commented out the line that calls an external function that I do not have. Perhaps you have a global variable causing a problem?
Update as per the comments:
I'd use join instead to make the URL:
url = "".join(["https://min-api.cryptocompare.com/data/histoday?fsym=", str(symbol.upper()), "&tsym=", str(comparison_symbol.upper()), "&limit=", str(limit), "&aggregate=", str(aggregate), "&allData=", str(allData)])
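As another option (not from the thread), an f-string builds the same URL arguably more readably than either format() or join():

```python
symbol, comparison_symbol = "btc", "eth"
limit, aggregate, allData = 1, 1, "true"

# Adjacent string literals concatenate, so the URL can be split across lines
url = (
    "https://min-api.cryptocompare.com/data/histoday"
    f"?fsym={symbol.upper()}&tsym={comparison_symbol.upper()}"
    f"&limit={limit}&aggregate={aggregate}&allData={allData}"
)
print(url)
```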