I've been looking for a few more tools to automate stock analysis, which is how I found the article linked below. The author says he posted the whole code, but I haven't seen it, so I am reconstructing it and can't quite get it running.
Requests, web scraping, and pandas are areas where I'm not as proficient, so I figure the code Jedis on SO could help untangle or update this code.
https://medium.com/swlh/automating-your-stock-portfolio-research-with-python-for-beginners-912dc02bf1c2
Long term, I'm trying to learn Python by updating or building more features into tools that others have created, so this is also a learning experience. I would love for you to fix it, but I would prefer that you give hints and lead me towards possible solutions.
# FILENAME financial_analysis.py
# SOURCE https://medium.com/swlh/automating-your-stock-portfolio-research-with-python-for-beginners-912dc02bf1c2
import requests
import pandas as pd
def getdata(stock):
    """Company Quote Group of Items"""
    company_quote = requests.get(f"https://financialmodelingprep.com/api/v3/quote/{stock}")
    company_quote = company_quote.json()
    share_price = float("{0:.2f}".format(company_quote[0]['price']))
    # Balance Sheet Group of Items
    BS = requests.get(f"https://financialmodelingprep.com/api/v3/financials/balance-sheet-statement/{stock}?period=quarter")
    BS = BS.json()
    # print_data = getdata(aapl)
    # Total Debt
    debt = float("{0:.2f}".format(float(BS['financials'][0]['Total debt'])/10**9))
    # Total Cash
    cash = float("{0:.2f}".format(float(BS['financials'][0]['Cash and short-term investments'])/10**9))
    # Income Statement Group of Items
    IS = requests.get(f"https://financialmodelingprep.com/api/v3/financials/income-statement/{stock}?period=quarter")
    IS = IS.json()
    # Most Recent Quarterly Revenue
    qRev = float("{0:.2f}".format(float(IS['financials'][0]['Revenue'])/10**9))
    # Company Profile Group of Items
    company_info = requests.get(f"https://financialmodelingprep.com/api/v3/company/profile/{stock}")
    company_info = company_info.json()
    # Chief Executive Officer
    ceo = company_info['profile']['ceo']
    return(share_price, cash, debt, qRev, ceo)
tickers = {'AAPL', 'MSFT', 'GOOG', 'T', 'CSCO', 'INTC', 'ORCL', 'AMZN', 'FB', 'TSLA', 'NVDA'}
data = map(getdata, tickers)
df = pd.DataFrame(data,
    columns=['Total Cash', 'Total Debt', 'Q3 2019 Revenue', 'CEO'],
    index=tickers), print(df)
This generates the following error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-d9759a746769>", line 1, in <module>
runfile('/Users/owner/sbox/Jamesmk6_3/toolbox/financial_analysis.py', wdir='/Users/owner/sbox/Jamesmk6_3/toolbox')
File "/Users/owner/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-1/193.7288.30/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Users/owner/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-1/193.7288.30/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/owner/sbox/Jamesmk6_3/toolbox/financial_analysis.py", line 44, in <module>
index=tickers), print(df)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 469, in __init__
data = list(data)
File "/Users/owner/sbox/Jamesmk6_3/toolbox/financial_analysis.py", line 12, in getdata
share_price = float("{0:.2f}".format(company_quote[0]['price']))
KeyError: 0
I've dug deeper and found the developer docs, but there seems to be a discrepancy between what the author did and what the docs show.
The API sometimes returns a dict and sometimes a list. A simpler approach is to always extract the data using json_normalize().
Insert your own API key to make this work. I've run out of allowed calls for the 24-hour period, so I can't test further, but it worked well when I ran it. Some of the tickers returned multiple rows for some of the API calls, i.e. the final dataset had more than 11 rows.
import requests
import pandas as pd
tickers = {'AAPL', 'MSFT', 'GOOG', 'T', 'CSCO', 'INTC', 'ORCL', 'AMZN', 'FB', 'TSLA', 'NVDA'}
df = pd.DataFrame()
url = "https://financialmodelingprep.com/api/v3"
apikey="xxx"
payload = {"apikey":apikey}
for stock in tickers:
    print(stock)
    # use params rather than manually building the request parameters
    quote = requests.get(f"{url}/quote/{stock}",params=payload)
    bs = requests.get(f"{url}/balance-sheet-statement/{stock}", params={"period":"quarter", "limit":1, **payload})
    IS = requests.get(f"{url}/income-statement/{stock}", params={"period":"quarter", "limit":1, **payload})
    company_info = requests.get(f"{url}/company/profile/{stock}", params=payload)
    if "Error Message" in quote.json():
        print(f"Error: {quote.text}")
        break
    else:
        # join all the results together using json_normalize() rather than hand-coded extraction from JSON
        df = pd.concat([df, (pd.json_normalize(quote.json())
                             .merge(pd.json_normalize(bs.json()), on="symbol", suffixes=("","_BS"))
                             .merge(pd.json_normalize(IS.json()), on="symbol", suffixes=("","_IS"))
                             .merge(pd.json_normalize(company_info.json()), on="symbol", suffixes=("","_info"))
                            )])
# df.columns.tolist()
if len(df)>0:
    # the columns the question is interested in
    df.loc[:,["symbol","price","totalDebt","cashAndShortTermInvestments","revenue","profile.ceo"]]
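To illustrate the dict-versus-list point without an API key, here is a minimal sketch. Both payloads are made-up stand-ins for API responses (the symbol XYZ and every value are hypothetical, not real data); the only thing it is meant to show is that json_normalize() accepts either shape and flattens nested keys into dotted column names:

```python
import pandas as pd

# Hypothetical payloads: one endpoint answers with a list of records,
# another with a single nested dict. All values are invented.
quote_payload = [{"symbol": "XYZ", "price": 12.34}]
profile_payload = {"symbol": "XYZ", "profile": {"ceo": "Jane Doe"}}

# json_normalize accepts both shapes and returns a DataFrame each time.
quote_df = pd.json_normalize(quote_payload)
profile_df = pd.json_normalize(profile_payload)

# Nested keys are flattened into dotted names such as "profile.ceo",
# so the two frames can be merged on the shared "symbol" column.
merged = quote_df.merge(profile_df, on="symbol")
print(merged)
```

Because both results are ordinary DataFrames, the merge step does not need to know which shape the API originally returned.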
Related
I have a watch list of 30 stocks. The list is in a text file called "WatchList". I initialize the list as:
stock = []
and read the symbols line by line. I specify a location to store the data in csv format for each symbol.
I have the latest version of pandas_datareader and it is 0.10.0. I have used a while loop and pandas_datareader before. However, now I am experiencing problems. I receive the following error message:
runfile('E:/Stock_Price_Forecasting/NewStockPriceFetcher.py', wdir='E:/Stock_Price_Forecasting')
Enter the name of file to access WatchList
WatchList.txt
0 AAPL <class 'str'>
Traceback (most recent call last):
File "C:\Users\Om\anaconda3\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "e:\stock_price_forecasting\newstockpricefetcher.py", line 60, in <module>
df = web.DataReader(stock[i], data_source='yahoo', start=start_date, end=end_date)
File "C:\Users\Om\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\data.py", line 370, in DataReader
return YahooDailyReader(
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\base.py", line 253, in read
df = self._read_one_data(self.url, params=self._get_params(self.symbols))
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\yahoo\daily.py", line 153, in _read_one_data
data = j["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]
TypeError: string indices must be integers
The portion of my code that shows the while loop is shown below:
i = 0
while i < len(stock):
    print(i, stock[i], type(stock[i]))
    # Format the filename for each security to use in full path
    stock_data_file = stock[i] + '.csv'
    # Complete the path definition for stock data storage including filename
    full_file_path = (file_path/stock_data_file)
    # Specify the order for the columns
    columnTitles = ('Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close')
    # Pull the data for the stock from the Web
    df = web.DataReader(stock[i], data_source='yahoo', start=start_date,
                        end=end_date)  # ** error in this line!!
    # Reorder the columns for plotting Candlesticks
    df=df.reindex(columns=columnTitles)
    if i == 0:
        df.to_csv(full_file_path)
        print(i, stock[i], 'has data stored to csv file')
    else:
        df.to_csv(full_file_path, header=True)
        print(i, stock[i], 'has data stored to csv file')
    i += 1
I have looked at the parameter requirements for DataReader and Yahoo. I believe the first parameter is the ticker, passed as a string. I have been unable to find out where I am making a mistake. Any suggestions for solving this issue would be greatly appreciated. Thank you.
Can someone give me a hand with this:
I created a loop to append successive intervals of historical price data from Coinbase.
My loop iterates successfully a few times then crashes.
Error message (under data_temp code line):
"ValueError: If using all scalar values, you must pass an index"
days = 10
end = datetime.now().replace(microsecond=0)
start = end - timedelta(days=days)
data_price = pd.DataFrame()
for i in range(1,50):
    print(start)
    print(end)
    data_temp = pd.DataFrame(public_client.get_product_historic_rates(product_id='BTC-USD', granularity=3600, start=start, end=end))
    data_price = data_price.append(data_temp)
    end = start
    start = end - timedelta(days=days)
Would love to understand how to fix this and why this is happening in the first place.
Thank you!
Here's the full trace:
Traceback (most recent call last):
File "\coinbase_bot.py", line 46, in <module>
data_temp = pd.DataFrame(public_client.get_product_historic_rates(product_id='BTC-USD', granularity=3600, start=start, end=end))
File "D:\Program Files\Python37\lib\site-packages\pandas\core\frame.py", line 411, in __init__
mgr = init_dict(data, index, columns, dtype=dtype)
File "D:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 257, in init_dict
return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "D:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 77, in arrays_to_mgr
index = extract_index(arrays)
File "D:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 358, in extract_index
raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index
Here's json returned via simple url call:
[[1454716800,370.05,384.54,384.44,375.44,6276.66473729],[1454630400,382.99,389.36,387.99,384.5,7443.92933224],[1454544000,368.74,390.63,368.87,387.99,8887.7572324],[1454457600,365.63,373.01,372.93,368.87,7147.95657328],[1454371200,371.17,374.41,371.33,372.93,6856.21815799],[1454284800,366.26,379,367.89,371.33,7931.22922922],[1454198400,365,382.5,378.46,367.95,5506.77681302]]
Very similar to this user's issue but cannot put my finger on it:
When attempting to merge multiple dataframes, how to resolve "ValueError: If using all scalar values, you must pass an index"
-- Hi DashOfProgramming,
Your problem is that data_temp is being built from a dict of scalar values, i.e. a single row, and pandas requires you to provide an index for that.
The following snippet should resolve this. I replaced your API call with a simple dictionary that resembles what I would expect the API to return, and used i as the index for the dataframe (this has the advantage that you can keep track of the iteration as well):
import pandas as pd
from datetime import datetime, timedelta
days = 10
end = datetime.now().replace(microsecond=0)
start = end - timedelta(days=days)
frames = []
temp_dict = {'start': '2019-09-30', 'end': '2019-10-01', 'price': '-111.0928',
             'currency': 'USD'}
for i in range(1,50):
    print(start)
    print(end)
    data_temp = pd.DataFrame(temp_dict, index=[i])
    # DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
    frames.append(data_temp)
    end = start
    start = end - timedelta(days=days)
data_price = pd.concat(frames)
print(data_price)
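As a small illustration of the error itself: the traceback in the question passes through init_dict, which suggests that the call returned a dict rather than the usual list of rows at that point. The dict below is a made-up stand-in for such a response (its key and text are assumptions, not the real API output). A dict whose values are all scalars cannot become a DataFrame without an index, while the same dict with an index, or wrapped in a list, gives a one-row frame:

```python
import pandas as pd

# Hypothetical stand-in for a response that is a dict instead of a list of rows.
response = {"message": "example error text"}

try:
    pd.DataFrame(response)
    raised = False
except ValueError as exc:
    # This is the "If using all scalar values, you must pass an index" case.
    raised = True
    print(f"ValueError: {exc}")

# Either of these gives pandas the row information it is missing.
with_index = pd.DataFrame(response, index=[0])
as_record = pd.DataFrame([response])
print(with_index)
```

In a real loop it may be worth checking isinstance(response, list) before building the frame, so that an unexpected dict is reported instead of silently stored as a row.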
EDIT
Just saw that your API output is a nested list, where each inner list is one row. I suggest you store your column names in a separate variable and then do this:
cols = ['ts', 'low', 'high', 'open', 'close', 'sth_else']
v = [[...], [...], [...]] # your list of lists
data_temp = pd.DataFrame.from_records(v, columns=cols)
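Here is the same idea as a runnable sketch, using the first three rows of the JSON from the question in place of the live API call. The column names are the ones suggested above; treating the first column as a Unix timestamp in seconds is an assumption based on the size of the values:

```python
import pandas as pd

# First three rows of the JSON shown in the question.
v = [
    [1454716800, 370.05, 384.54, 384.44, 375.44, 6276.66473729],
    [1454630400, 382.99, 389.36, 387.99, 384.5, 7443.92933224],
    [1454544000, 368.74, 390.63, 368.87, 387.99, 8887.7572324],
]
cols = ['ts', 'low', 'high', 'open', 'close', 'sth_else']

# Each inner list becomes one row; cols supplies the column names.
data_temp = pd.DataFrame.from_records(v, columns=cols)

# Assumption: 'ts' holds Unix timestamps in seconds.
data_temp['ts'] = pd.to_datetime(data_temp['ts'], unit='s')
print(data_temp)
```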
I am new to coding in Python and I'm absolutely loving it! Unfortunately, my limited knowledge has made me hit a roadblock with a piece of code from a tutorial I have been following; see the link below:
https://pythonprogramming.net/combining-stock-prices-into-one-dataframe-python-programming-for-finance/?completed=/sp500-company-price-data-python-programming-for-finance/
Quick summary of what I'm trying to do:
1) Copy the ticker list of all S&P 500 companies from Wikipedia using bs4 (DONE)
2) Get data from Yahoo on all tickers using pandas_datareader and save each S&P 500 company's OHLC data individually as a csv file in a folder called stock_dfs (DONE-ish)
Yahoo blocks me after about 70 of them, so a recommendation would be great! I've tried importing time and using time.sleep to create a 5 second delay, but no matter where I place it in the loop Yahoo cuts me off.
3) Combine all ticker data into one master file ready to be analyzed. I just can't combine them. I even tried creating the csv manually, but still nothing.
Note: in the code on the website he calls for Morningstar data instead of Yahoo, but in the video he uses Yahoo. I think this was done in error. Either way, when he runs it, it works on 3.5, so I assume it's a version issue.
Thanks in advance!
Below you will find the error message I get when running this, followed by the block of code.
Traceback (most recent call last):
File "C:/Users/harry/PycharmProjects/Tutorials/Finance with Python/SENTDEX_T7_sp500InOneDataframe.py", line 87, in <module>
compile_data()
File "C:/Users/harry/PycharmProjects/Tutorials/Finance with Python/SENTDEX_T7_sp500InOneDataframe.py", line 70, in compile_data
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
File "C:\Users\harry\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\harry\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\harry\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "C:\Users\harry\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Users\harry\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'stock_dfs/BRK.B.csv' does not exist: b'stock_dfs/BRK.B.csv'
Process finished with exit code 1
import bs4 as bs
import datetime as dt
import os
import pandas as pd
import pandas_datareader.data as web
import pickle
import requests
def save_sp500_tickers():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)
    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)
    return tickers
# save_sp500_tickers()
def get_data_from_yahoo(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')
    start = dt.datetime(2010, 1, 1)
    end = dt.datetime.now()
    for ticker in tickers:
        # just in case your connection breaks, we'd like to save our progress!
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, 'yahoo', start, end)
            df.reset_index(inplace=True)
            df.set_index("Date", inplace=True)
            df = df.drop("Symbol", axis=1)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))
def compile_data():
    with open("sp500tickers.pickle", "rb") as f:
        tickers = pickle.load(f)
    main_df = pd.DataFrame()
    for count, ticker in enumerate(tickers):
        df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
        df.set_index('Date', inplace=True)
        df.rename(columns={'Adj Close': ticker}, inplace=True)
        df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how='outer')
        if count % 10 == 0:
            print(count)
    print(main_df.head())
    main_df.to_csv('sp500_joined_closes.csv')
compile_data()
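One possible way to make the combining step tolerate a missing file such as stock_dfs/BRK.B.csv is to check for the file before reading it and to report what was skipped. The sketch below is only an illustration under assumptions: compile_closes is a hypothetical helper, not part of the tutorial, and the demo writes one tiny made-up CSV into a temporary folder instead of using real downloads. It does not explain why the download for that ticker produced no file.

```python
import os
import tempfile

import pandas as pd


def compile_closes(tickers, folder):
    """Join the 'Adj Close' column of every available CSV; list the rest."""
    series, missing = [], []
    for ticker in tickers:
        path = os.path.join(folder, "{}.csv".format(ticker))
        if not os.path.exists(path):
            # Remember the ticker instead of crashing on FileNotFoundError.
            missing.append(ticker)
            continue
        df = pd.read_csv(path, index_col="Date")
        series.append(df["Adj Close"].rename(ticker))
    # axis=1 aligns the series on their dates (an outer join).
    joined = pd.concat(series, axis=1) if series else pd.DataFrame()
    return joined, missing


# Demo with invented numbers: one ticker has a file, the other does not.
with tempfile.TemporaryDirectory() as folder:
    sample = pd.DataFrame({"Date": ["2019-01-02", "2019-01-03"],
                           "Adj Close": [10.0, 11.0]})
    sample.to_csv(os.path.join(folder, "AAA.csv"), index=False)
    joined, missing = compile_closes(["AAA", "BRK.B"], folder)

print(joined)
print("Missing:", missing)
```

The list of missing tickers can then be retried or inspected separately, which keeps one failed download from stopping the whole run.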
I know there are a lot of "datetime not defined" posts, but they all seem to forget the obvious import of datetime. I can't figure out why I'm getting this error. When I do each step in IPython it works well, but the method doesn't.
import requests
import datetime
def daily_price_historical(symbol, comparison_symbol, limit=1, aggregate=1, exchange='', allData='true'):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}&allData={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate, allData)
    if exchange:
        url += '&e={}'.format(exchange)
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
    datetime.datetime.fromtimestamp()
    return df
This code produces this error:
Traceback (most recent call last):
File "C:\Users\20115619\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-29-4f015e05113f>", line 1, in <module>
rv.get_prices(30, 'ETH')
File "C:\Users\20115619\Desktop\projects\testDash\Revas.py", line 161, in get_prices
for symbol in symbols:
File "C:\Users\20115619\Desktop\projects\testDash\Revas.py", line 50, in daily_price_historical
df = pd.DataFrame(data)
File "C:\Users\20115619\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 4372, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'time'
df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
I think that line is the problem.
Your DataFrame df has no time column, so the df.time lookup at the end of the line fails.
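A minimal sketch of how one might guard against that, using made-up records in place of the API response (the numbers are placeholders, not real prices). It checks for the column before converting and uses pd.to_datetime for the conversion, which is a swapped-in alternative to the list comprehension in the question:

```python
import pandas as pd

# Made-up records in the shape the code expects: one dict per day
# with a Unix "time" field in seconds.
data = [
    {"time": 1454630400, "close": 1.0},
    {"time": 1454716800, "close": 2.0},
]
df = pd.DataFrame(data)

if "time" in df.columns:
    # Note: unit="s" yields UTC-based timestamps, whereas
    # datetime.fromtimestamp() converts to local time.
    df["timestamp"] = pd.to_datetime(df["time"], unit="s")
else:
    raise KeyError("no 'time' column; got {}".format(list(df.columns)))

# An empty payload produces a frame without that column,
# which is the situation the AttributeError points to.
empty = pd.DataFrame([])
print("time" in empty.columns)
print(df)
```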
For what it's worth I'm on Python 3.6.0 and this runs perfectly for me:
import requests
import datetime
import pandas as pd
def daily_price_historical(symbol, comparison_symbol, limit=1, aggregate=1, exchange='', allData='true'):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}&allData={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate, allData)
    if exchange:
        url += '&e={}'.format(exchange)
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
    # The bare call below has no argument and would raise a TypeError; it's not needed to run this
    # datetime.datetime.fromtimestamp()
    return df
df = daily_price_historical('BTC', 'ETH')
print(df)
Note, I commented out the bare datetime.datetime.fromtimestamp() call, which has no argument and would raise a TypeError. Perhaps you have a global variable causing a problem?
Update as per the comments:
I'd use join instead to make the URL:
url = "".join(["https://min-api.cryptocompare.com/data/histoday?fsym=", str(symbol.upper()), "&tsym=", str(comparison_symbol.upper()), "&limit=", str(limit), "&aggregate=", str(aggregate), "&allData=", str(allData)])
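Another option, shown here only as a sketch, is to let the standard library encode the query string instead of concatenating the pieces by hand. build_histoday_url is a hypothetical helper name; the parameter names are the ones already used in the URL above:

```python
from urllib.parse import urlencode


def build_histoday_url(symbol, comparison_symbol, limit=1, aggregate=1,
                       exchange='', allData='true'):
    """Hypothetical helper: assemble the histoday URL from its parameters."""
    params = {
        'fsym': symbol.upper(),
        'tsym': comparison_symbol.upper(),
        'limit': limit,
        'aggregate': aggregate,
        'allData': allData,
    }
    if exchange:
        # Only add the exchange when one was requested.
        params['e'] = exchange
    # urlencode converts the values to strings and escapes them.
    return 'https://min-api.cryptocompare.com/data/histoday?' + urlencode(params)


url = build_histoday_url('btc', 'eth')
print(url)
```

The same dict could also be passed to requests.get through its params argument, which performs the encoding itself.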
I am getting this error when I try to run my feature engineering Python file on the Quora duplicate questions data.
The part of the code I am running is below:
data = pd.read_csv('train.csv', sep='\t')
data = data.drop(['id', 'qid1', 'qid2'], axis=1)
and the output is
runfile('/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py', wdir='/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master')
Traceback (most recent call last):
File "<ipython-input-31-e29a1095cc40>", line 1, in <module>
runfile('/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py', wdir='/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master')
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py", line 55, in <module>
data = data.drop(['id','qid1','qid2'], axis=1)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2530, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2562, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3744, in drop
labels[mask])
ValueError: labels ['id' 'qid1' 'qid2'] not contained in axis
My csv file looks like this:
"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"1","3","4","What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"
Please help me figure out the problem.
You need to remove the sep='\t' argument, because the content of the csv already uses , as a separator:
# sample.csv file contains following data
"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"1","3","4","What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"
df = pd.read_csv('sample.csv')
data = df.drop(['id', 'qid1', 'qid2'], axis=1)
print(data)
#output will be like this:
"question1","question2","is_duplicate"
"What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"
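The effect of the separator can be checked without any file on disk. This sketch reads a shortened, made-up excerpt in the same quoted, comma-separated layout from a string: with sep='\t' the whole header line ends up as a single column, which is why the labels were "not contained in axis", while the default separator yields the six expected columns:

```python
import io

import pandas as pd

# Shortened, made-up row in the same layout as train.csv.
csv_text = '''"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","What is A?","What is B?","0"
'''

# Tab separator: no tabs in the text, so every line is one field.
wrong = pd.read_csv(io.StringIO(csv_text), sep='\t')
print(wrong.shape)

# Default comma separator: six columns, so the drop succeeds.
right = pd.read_csv(io.StringIO(csv_text))
data = right.drop(['id', 'qid1', 'qid2'], axis=1)
print(list(data.columns))
```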