how to replace stock-prices symbols in a dataframe - python-3.x

I would like to get the S&P 500 ['Adj Close'] column and replace the column name with the corresponding stock symbol. However, I am not able to replace the dataframe column because it gives me an error: KeyError: '5'
What I would like to achieve is to loop through all the available stocks from the list and replace the Adj Close with the stock symbol.
This is what I did:
First I have scraped the stock symbols from Wikipedia and added them to a list.
import csv
import requests
import pandas as pd
from io import StringIO

data = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
symbols = data[0] # get first table on the page
symbols.head()
stock = symbols['Symbol'].to_list()
print(stock[0:5])
this gives me a list of stock symbols as below:
['MMM', 'ABT', 'ABBV', 'ABMD', 'ACN']
then I scraped Yahoo finance to get the daily financial data as below
stock_url = 'https://query1.finance.yahoo.com/v7/finance/download/{}?'
params = {
    'range': '1y',
    'interval': '1d',
    'events': 'history'
}
response = requests.get(stock_url.format(stock[0]), params=params)
file = StringIO(response.text)
reader = csv.reader(file)
data = list(reader)
df = pd.DataFrame(data)
stock_data = df['5']

Fix for key error
You are calling the URL with the whole list stock, which gave a 404 response when I tried it.
Call the URL with an individual stock instead, like below:
requests.get(stock_url.format(stock[0]), params=params)
Also change the column lookup as below. The column label 5 is stored as an integer, not a string; that is why you got the KeyError.
stock_data = df[5]
I tried it for stock 'MMM' (stock[0]) and it prints the following:
0 1 2 3 4 5 \
0 Date Open High Low Close Adj Close
1 2019-12-11 168.380005 168.839996 167.330002 168.740005 162.682480
2 2019-12-12 166.729996 170.850006 166.330002 168.559998 162.508926
3 2019-12-13 169.619995 171.119995 168.080002 168.789993 162.730667
4 2019-12-16 168.940002 170.830002 168.190002 170.750000 164.620316
.. ... ... ... ... ... ...
249 2020-12-04 172.130005 173.160004 171.539993 172.460007 172.460007
250 2020-12-07 171.720001 172.500000 169.179993 170.149994 170.149994
251 2020-12-08 169.740005 172.830002 169.699997 172.460007 172.460007
252 2020-12-09 172.669998 175.639999 171.929993 175.289993 175.289993
253 2020-12-10 174.869995 175.399994 172.690002 173.490005 173.490005
[254 rows x 7 columns]
Loop through stocks and replace Adj Close (Edited as per requirements from comments)
Code for looping through stocks and replacing Adj close with Stock symbol.
import csv
import io

import pandas as pd
import requests

stock_url = 'https://query1.finance.yahoo.com/v7/finance/download/{}?'
params = {
    'range': '1y',
    'interval': '1d',
    'events': 'history'
}
df = pd.DataFrame()
for i in stock:
    response = requests.get(stock_url.format(i), params=params)
    file = io.StringIO(response.text)
    reader = csv.reader(file)
    data = list(reader)
    df1 = pd.DataFrame(data)
    df1.loc[df1[5] == 'Adj Close', 5] = i
    df = df.append(df1)
I tried the code for the first 3 stocks, and the 'Adj Close' header was replaced with each stock symbol as expected.
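Note that DataFrame.append was later deprecated (and removed in pandas 2.0); collecting the per-stock frames in a list and calling pd.concat once does the same job. Below is a minimal offline sketch of the relabeling step — the frames dict of fake CSV rows stands in for the Yahoo downloads, and relabel_adj_close is a hypothetical helper, not part of the original answer:

```python
import pandas as pd

def relabel_adj_close(frames_by_symbol):
    """Replace the literal 'Adj Close' header cell with the stock symbol,
    then stack all frames with pd.concat (instead of DataFrame.append)."""
    relabeled = []
    for symbol, df1 in frames_by_symbol.items():
        df1 = df1.copy()
        # Column 5 holds 'Adj Close' in the header row; swap in the symbol.
        df1.loc[df1[5] == 'Adj Close', 5] = symbol
        relabeled.append(df1)
    return pd.concat(relabeled, ignore_index=True)

# Two tiny fake CSV payloads standing in for the Yahoo downloads.
raw = [['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'],
       ['2020-12-10', '174.9', '175.4', '172.7', '173.5', '173.5', '100']]
frames = {'MMM': pd.DataFrame(raw), 'ABT': pd.DataFrame(raw)}

combined = relabel_adj_close(frames)
print(combined[5].tolist())  # ['MMM', '173.5', 'ABT', '173.5']
```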

Related

Calculation of stock values with yfinance and python

I would like to make some calculations on stock prices in Python 3 and I have installed the module yfinance.
I try to get an individual value like this:
import yfinance as yf
#define the ticker symbol
tickerSymbol = 'MSFT'
#get data on this ticker
tickerData = yf.Ticker(tickerSymbol)
#get the historical prices for this ticker
tickerDf = tickerData.history(period='1d', start='2015-1-1', end='2020-12-30')
row_date = tickerDf[tickerDf['Date']=='2020-12-30']
value = row_date.Open.item()
#see your data
print (value)
But when I run this, it says:
KeyError: 'Date'
Which is strange because when I do this, it works well and I have the column Date:
import yfinance as yf
#define the ticker symbol
tickerSymbol = 'MSFT'
#get data on this ticker
tickerData = yf.Ticker(tickerSymbol)
#get the historical prices for this ticker
tickerDf = tickerData.history(period='1d', start='2015-1-1', end='2020-12-30')
#row_date = tickerDf[tickerDf['Date']=='2020-12-30']
#value = row_date.Open.item()
#see your data
print (tickerDf)
I get the following result:
G:\python> python test.py
Open High Low Close Volume Dividends Stock Splits
Date
2014-12-31 41.512481 42.143207 41.263744 41.263744 21552500 0.0 0
2015-01-02 41.450302 42.125444 41.343701 41.539135 27913900 0.0 0
2015-01-05 41.192689 41.512495 41.086088 41.157158 39673900 0.0 0
2015-01-06 41.201567 41.530255 40.455355 40.553074 36447900 0.0 0
2015-01-07 40.846223 41.272629 40.410934 41.068310 29114100 0.0 0
... ... ... ... ... ... ... ...
2020-12-22 222.690002 225.630005 221.850006 223.940002 22612200 0.0 0
2020-12-23 223.110001 223.559998 220.800003 221.020004 18699600 0.0 0
2020-12-24 221.419998 223.610001 221.199997 222.750000 10550600 0.0 0
2020-12-28 224.449997 226.029999 223.020004 224.960007 17933500 0.0 0
2020-12-29 226.309998 227.179993 223.580002 224.149994 17403200 0.0 0
[1510 rows x 7 columns]
Under the hood, yfinance uses a Pandas data frame to create a Ticker. In this dataframe, Date isn't an ordinary column, but is instead a name given to the index (see line 240 in base.py of yfinance). The index column behaves differently than other columns and actually can't be referenced by name. You can access it using TickerDf.index=='2020-12-30' or by turning it into a regular column using reset_index as explained in another question. Searching through an index is faster than searching a regular column, so if you are looking through a lot of data, it will be to your advantage to leave it as an index.
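A short offline sketch of both options, using a hand-built DataFrame with a 'Date'-named index in place of the yfinance result (the values are made up, so no download is needed):

```python
import pandas as pd

# Mimic the frame yfinance returns: 'Date' is the index name, not a column.
tickerDf = pd.DataFrame(
    {'Open': [41.45, 226.31]},
    index=pd.to_datetime(['2015-01-02', '2020-12-29']),
)
tickerDf.index.name = 'Date'

# Option 1: query the index directly.
value = tickerDf[tickerDf.index == '2020-12-29'].Open.item()

# Option 2: turn the index back into a regular 'Date' column.
flat = tickerDf.reset_index()
value2 = flat[flat['Date'] == '2020-12-29'].Open.item()

print(value, value2)  # 226.31 226.31
```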

Extract data from .OUT file between 2 strings and create a new csv file using Python

My data is like below - stored in a .OUT file:
{ID=ISIN Name=yes PROGRAM=abc START_of_FIELDS CODE END-OF-FIELDS TIMESTARTED=Mon Nov 30 20:45:56
START-OF-DATA
CODE|ERR CODE|NUM|EXCH_CODE|
912828U rp|0|1|BERLIN|
1392917 rp|0|1|IND|
3CB0248 rp|0|1|BRAZIL|
END-OF-DATA***}
I need to extract the lines between START-OF-DATA and END-OF-DATA from above .OUT file using Python and load it in CSV file.
CODE|ERR CODE|NUM|EXCH_CODE|
912828U rp|0|1|BERLIN|
1392917 rp|0|1|IND|
3CB0248 rp|0|1|FRANKFURT|
You can use a non-greedy quantifier in a regex to get the entries between the two strings.
import re

with open('file.txt', 'r') as file:
    data = file.read()

pattern = re.compile(r'(?:START-OF-DATA(.*?)END-OF-DATA)', re.MULTILINE | re.IGNORECASE | re.DOTALL)
g = re.findall(pattern, data)
O/P
[' \nCODE|ERR CODE|NUM|EXCH_CODE|\n912828U rp|0|1|BERLIN|\n1392917 rp|0|1|IND| \n3CB0248 rp|0|1|BRAZIL| \n']
# remove whitespace, split on newlines, and drop empty list entries
t = g[0].replace(" ","").split("\n")
new = list(filter(None, t))
O/P
['CODE|ERRCODE|NUM|EXCH_CODE|', '912828Urp|0|1|BERLIN|', '1392917rp|0|1|IND|', '3CB0248rp|0|1|BRAZIL|']
# create dataframe by splitting on the pipe delimiter
df = pd.DataFrame([i.split('|') for i in new])
O/P
0 1 2 3
0 CODE ERRCODE NUM EXCH_CODE
1 912828Urp 0 1 BERLIN
2 1392917rp 0 1 IND
3 3CB0248rp 0 1 BRAZIL
#create csv from df
df.to_csv('file.csv')
The regex pattern defined here captures everything between "START-OF-DATA" and "END-OF-DATA" whenever a match is found, and findall returns the captured text.
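For reference, here is the same approach as one self-contained snippet, run on an in-memory string instead of file.txt so it can be executed directly (using strip-based filtering rather than replace(" ", ""), so that fields like "ERR CODE" keep their internal spaces):

```python
import re

# In-memory stand-in for the .OUT file contents.
raw = (
    "{ID=ISIN Name=yes PROGRAM=abc START_of_FIELDS CODE END-OF-FIELDS\n"
    "START-OF-DATA\n"
    "CODE|ERR CODE|NUM|EXCH_CODE|\n"
    "912828U rp|0|1|BERLIN|\n"
    "1392917 rp|0|1|IND|\n"
    "END-OF-DATA***}"
)

# Non-greedy capture of everything between the two marker strings.
pattern = re.compile(r'START-OF-DATA(.*?)END-OF-DATA', re.DOTALL)
g = pattern.findall(raw)

# Split the captured block into lines and drop empty entries.
rows = [line for line in g[0].split('\n') if line.strip()]
print(rows)
```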

Writing dynamically-sized arrays to file in python3

I am trying to write ASCII data to a text file in the following format:
Time Heat Flux ...
0.023 1.793 ...
.
.
.
The text header comes from a list of tags with dimension 1 x n and the numeric data with dimension m x n. I usually print this information when I know the number of rows and columns a priori in this manner:
# ... open file object, etc.
# Print header
print('%16s \t %16s' % ('Time', 'Heat Flux'), file=fileObject)
for ii in range(0, len(heatFlux)):
    print('%16.3f \t %16.3f' % (heatFlux[ii][0], heatFlux[ii][1]), file=fileObject)
I want to have generic code that allows me to write these files with a dynamically-sized array (in terms of number of columns). I've tried to generate a string and insert the tags and spaces, which I then write to file, but I am not sure how to "format-print" the string itself.
For example, I was trying
tagHeader = ''
for tag in keyTags:
    tagHeader = tagHeader + tag + '\t'
# ...
print(tagHeader, file=fileObject)
Can someone help me with this? Thanks!
If you have a list with the header and a 2D data structure you can do something like this:
def table(header, data):
    print('\t'.join(['%16s'] * len(header)) % tuple(header))
    for row in data:
        print('\t'.join(['%16.3f'] * len(header)) % tuple(row))
Here are some tests:
from random import randint, uniform

header1 = ['Time', 'Heat Flux']
header2 = ['Time', 'Heat Flux', 'New col']
header3 = ['This', 'is', 'a', 'test']
data1 = [[uniform(-10, 10) for _ in range(len(header1))] for _ in range(randint(2, 10))]
data2 = [[uniform(-10, 10) for _ in range(len(header2))] for _ in range(randint(2, 10))]
data3 = [[uniform(-10, 10) for _ in range(len(header3))] for _ in range(randint(2, 10))]
table(header1, data1)
table(header2, data2)
table(header3, data3)
Output:
Time Heat Flux
7.037 -1.528
8.058 5.649
Time Heat Flux New col
-9.590 4.846 -4.024
-8.597 9.718 -8.174
9.260 -0.947 -6.675
3.401 -5.101 8.323
0.099 -6.582 3.951
This is a test
-2.126 -0.678 4.782 -7.849
-9.007 -0.019 -4.402 8.017
-7.399 -7.617 6.235 9.320
-0.486 -5.304 -4.723 1.946
2.743 -2.150 -6.779 -2.099
-7.499 -2.618 -9.918 0.674
8.912 -6.648 -7.865 -0.101
0.682 -0.414 7.677 7.167
-3.105 -6.562 6.970 -2.147
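The same table can also be built with f-strings instead of %-formatting; table_str below is a hypothetical variant that returns the table as a single string, which is convenient for writing to a file in one call:

```python
def table_str(header, data, width=16, precision=3):
    """Build the formatted table as a string using f-strings.
    width/precision mirror the %16.3f conversions used above."""
    lines = ['\t'.join(f'{h:>{width}}' for h in header)]
    for row in data:
        lines.append('\t'.join(f'{v:>{width}.{precision}f}' for v in row))
    return '\n'.join(lines)

out = table_str(['Time', 'Heat Flux'], [[0.023, 1.793]])
print(out)
```

Writing it to disk is then a single call, e.g. fileObject.write(table_str(header, data)).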

pdblp.BCon.bdh usage. inserting an array as the "list" argument

The usage for con.bdh is:
con.bdh('SPY US Equity', ['PX_LAST', 'VOLUME'], '20150629', '20150630', longdata=True)
I would like to get PX_LAST and VOLUME for a list of securities that I have on an array (strings with tickers). When I try to substitute SPY US Equity with the array "arrtickers" or [list(arrtickers)] I get the following error:
...eidData[] = {
}
sequenceNumber = 0
securityError = {
source = "3920::bbdbh4"
code = 15
category = "BAD_SEC"
message = "Security key is too longInvalid Security [nid:3920] "
subcategory = "INVALID_SECURITY"
}
fieldExceptions[] = {
}
fieldData[] = {
}}}
Am I using the correct syntax?
Without a reproducible example this is just a guess, but as the error message in your snippet suggests, this is likely because you are querying for an invalid security. The array syntax should work; for example, the following works fine:
In [1]: import pdblp
...: con = pdblp.BCon().start()
...: con.bdh(['SPY US Equity', 'IBM US Equity'], ['PX_LAST', 'VOLUME'],
'20150629', '20150630', longdata=True)
Out[1]:
date ticker field value
0 2015-06-29 SPY US Equity PX_LAST 2.054200e+02
1 2015-06-29 SPY US Equity VOLUME 2.026213e+08
2 2015-06-30 SPY US Equity PX_LAST 2.058500e+02
3 2015-06-30 SPY US Equity VOLUME 1.829251e+08
4 2015-06-29 IBM US Equity PX_LAST 1.629700e+02
5 2015-06-29 IBM US Equity VOLUME 3.314684e+06
6 2015-06-30 IBM US Equity PX_LAST 1.626600e+02
7 2015-06-30 IBM US Equity VOLUME 3.597288e+06
Whereas this does not
In [2]: con.bdh(['SPY US Equity', 'NOT_A_SECURITY Equity'], ['PX_LAST', 'VOLUME'],
'20150629', '20150630', longdata=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-f23344f8a6b3> in <module>()
----> 1 con.bdh(['SPY US Equity', 'NOT_A_SECURITY Equity'], ['PX_LAST', 'VOLUME'], '20150629', '20150630', longdata=True)
~/Projects/pdblp/pdblp/pdblp.py in bdh(self, tickers, flds, start_date, end_date, elms, ovrds, longdata)
268
269 data = self._bdh_list(tickers, flds, start_date, end_date,
--> 270 elms, ovrds)
271
272 df = pd.DataFrame(data, columns=["date", "ticker", "field", "value"])
~/Projects/pdblp/pdblp/pdblp.py in _bdh_list(self, tickers, flds, start_date, end_date, elms, ovrds)
305 .numValues() > 0)
306 if has_security_error or has_field_exception:
--> 307 raise ValueError(msg)
308 ticker = (msg.getElement('securityData')
309 .getElement('security').getValue())
ValueError: HistoricalDataResponse = {
securityData = {
security = "NOT_A_SECURITY Equity"
eidData[] = {
}
sequenceNumber = 1
securityError = {
source = "139::bbdbh3"
code = 15
category = "BAD_SEC"
message = "Unknown/Invalid securityInvalid Security [nid:139] "
subcategory = "INVALID_SECURITY"
}
fieldExceptions[] = {
}
fieldData[] = {
}
}
}
Thanks @mgilbert. I ended up creating a list and adding all the tickers to that list.

python - cannot make corr work

I'm struggling with getting a simple correlation done. I've tried all that was suggested under similar questions.
Here are the relevant parts of the code, the various attempts I've made and their results.
import numpy as np
import pandas as pd
try01 = data[['ESA Index_close_px', 'CCMP Index_close_px' ]].corr(method='pearson')
print (try01)
Out:
Empty DataFrame
Columns: []
Index: []
try04 = data['ESA Index_close_px'][5:50].corr(data['CCMP Index_close_px'][5:50])
print (try04)
Out:
AttributeError: 'float' object has no attribute 'sqrt'
using numpy
try05 = np.corrcoef(data['ESA Index_close_px'],data['CCMP Index_close_px'])
print (try05)
Out:
AttributeError: 'float' object has no attribute 'sqrt'
converting the columns to lists
ESA_Index_close_px_list = list()
start_value = 1
end_value = len(data['ESA Index_close_px']) + 1
for items in data['ESA Index_close_px']:
    ESA_Index_close_px_list.append(items)
    start_value = start_value + 1
    if start_value == end_value:
        break
    else:
        continue

CCMP_Index_close_px_list = list()
start_value = 1
end_value = len(data['CCMP Index_close_px']) + 1
for items in data['CCMP Index_close_px']:
    CCMP_Index_close_px_list.append(items)
    start_value = start_value + 1
    if start_value == end_value:
        break
    else:
        continue
try06 = np.corrcoef(['ESA_Index_close_px_list','CCMP_Index_close_px_list'])
print (try06)
Out:
TypeError: cannot perform reduce with flexible type
I also tried .astype, but it made no difference.
data['ESA Index_close_px'].astype(float)
data['CCMP Index_close_px'].astype(float)
Using Python 3.5, pandas 0.18.1 and numpy 1.11.1
Would really appreciate any suggestion.
edit1:
Data is coming from an excel spreadsheet
data = pd.read_excel('C:\\Users\\Ako\\Desktop\\ako_files\\for_corr_tool.xlsx')
Prior to the correlation attempts, there are only column renames and
data = data.drop(data.index[0])
to get rid of a line.
regarding the types:
print (type (data['ESA Index_close_px']))
print (type (data['ESA Index_close_px'][1]))
Out:
edit2:
parts of the data:
print (data['ESA Index_close_px'][1:10])
print (data['CCMP Index_close_px'][1:10])
Out:
2 2137
3 2138
4 2132
5 2123
6 2127
7 2126.25
8 2131.5
9 2134.5
10 2159
Name: ESA Index_close_px, dtype: object
2 5241.83
3 5246.41
4 5243.84
5 5199.82
6 5214.16
7 5213.33
8 5239.02
9 5246.79
10 5328.67
Name: CCMP Index_close_px, dtype: object
Well, I've encountered the same problem today.
Try using .astype('float64') to make the dtype correct.
data['ESA Index_close_px'][5:50].astype('float64').corr(data['CCMP Index_close_px'][5:50].astype('float64'))
This works well for me. Hope it can help you as well.
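A minimal reproduction of the fix, using made-up values shaped like the question's sample data; pd.to_numeric is an alternative to .astype('float64') that can also coerce or report bad cells:

```python
import pandas as pd

# Object-dtype columns (as produced by the Excel import) break corr,
# so convert them to numeric first.
data = pd.DataFrame({
    'ESA Index_close_px': ['2137', '2138', '2132', '2123'],
    'CCMP Index_close_px': ['5241.83', '5246.41', '5243.84', '5199.82'],
}, dtype=object)

esa = pd.to_numeric(data['ESA Index_close_px'])
ccmp = pd.to_numeric(data['CCMP Index_close_px'])

r = esa.corr(ccmp)
print(r)
```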
You can try the following:
Top15['Citable docs per capita']=(Top15['Citable docs per capita']*100000)
Top15['Citable docs per capita'].astype('int').corr(Top15['Energy Supply per Capita'].astype('int'))
It worked for me.