Yahoo Finance: unable to read URL - python-3.x

I have been using the following code for a while to extract stock prices from Yahoo Finance. It now raises an error saying it cannot read the URL.
import pandas_datareader.data as web
stock = web.DataReader(i_allStock+'.L', 'yahoo', start, end)
Has anyone had this problem and found a solution?

Try it like this.
from math import sqrt
from sklearn.cluster import MiniBatchKMeans
import pandas_datareader as dr
from matplotlib import pyplot as plt
import pandas as pd
import matplotlib.cm as cm
import seaborn as sn
start = '2019-1-1'
end = '2020-1-1'
tickers = ['AXP','AAPL','BA','CAT','CSCO','CVX','XOM','GS','HD','IBM','INTC','JNJ','KO','JPM','MCD', 'MMM', 'MRK', 'MSFT', 'NKE','PFE','PG','TRV','UNH','RTX','VZ','V','WBA','WMT','DIS','DOW']
prices_list = []
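for ticker in tickers:
    try:
        # Download the adjusted close for one ticker between start and end
        prices = dr.DataReader(ticker, 'yahoo', start, end)['Adj Close']
        prices = pd.DataFrame(prices)
        prices.columns = [ticker]
        prices_list.append(prices)
    except Exception:
        # Skip tickers that fail to download
        pass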
prices_df = pd.concat(prices_list,axis=1)
prices_df.sort_index(inplace=True)
prices_df.head()

You can put all the tickers in a single list. Yahoo Finance will retrieve them all at once.
import yfinance as yf
etf = ['AXP','AAPL','BA','CAT','CSCO','CVX','XOM','GS','HD','IBM','INTC','JNJ','KO']
tit = yf.download(tickers=etf, period='max')
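
For reference, a multi-ticker download like this comes back, by default, as one DataFrame whose columns form a two-level index (price field first, then ticker). The sketch below uses a small hand-built frame in place of the real download so that it runs offline; the numbers and the `closing_prices` helper are made up for illustration.

```python
import pandas as pd

def closing_prices(downloaded):
    """Return one 'Close' column per ticker from a multi-ticker frame."""
    return downloaded["Close"]

# Hypothetical stand-in for: tit = yf.download(tickers=etf, period='max')
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL", "AXP"]])
tit = pd.DataFrame(
    [[150.0, 170.0, 1000, 2000], [151.0, 171.0, 1100, 2100]],
    index=pd.to_datetime(["2023-01-03", "2023-01-04"]),
    columns=columns,
)

# Selecting the top-level field leaves a plain frame with one column per ticker
close = closing_prices(tit)
print(close)
```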

Related

Using pytrends library with PowerBI

I am trying to extract data from Google Trends by using the pytrends library to analyze it in MS PowerBI by using the following script:
import pandas as pd
from pytrends.request import TrendReq
pytrends = TrendReq()
data = pd.DataFrame()
kw_list = ["Bitcoin", "Ethereum"]
pytrends.build_payload(kw_list, timeframe='today 3-m')
data = pytrends.interest_over_time()
print(data)
When I run this simple script in Power BI, the date column disappears. How can I include the date column?
import pandas as pd
from pytrends.request import TrendReq
pytrends = TrendReq()
data = pd.DataFrame()
kw_list = ["Bitcoin", "Ethereum"]
pytrends.build_payload(kw_list, timeframe='today 3-m')
data = pytrends.interest_over_time()
data.reset_index(inplace=True)
print(data)
The date column is the index, so you just need to add the second-to-last line, data.reset_index(inplace=True), which turns the index into a regular column.
I hope this works.
Thanks!
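
To see what reset_index does here, a minimal sketch on made-up data. It assumes, as the answer implies, that the frame returned by interest_over_time() carries the dates in an index named `date`; the values below are invented.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pytrends.interest_over_time()
data = pd.DataFrame(
    {"Bitcoin": [50, 60], "Ethereum": [30, 35]},
    index=pd.DatetimeIndex(["2021-01-01", "2021-01-02"], name="date"),
)

# The dates live in the index, so they are not among the columns yet
print(list(data.columns))

# reset_index moves the index into an ordinary column called "date"
flat = data.reset_index()
print(list(flat.columns))
```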

Pandas error "No numeric data to plot" when using stock data from datareader

I have a dataframe with closing stock prices:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn; seaborn.set()
from pandas_datareader import data
import pandas_datareader.data as web
from pandas.tseries.offsets import BDay
f = web.DataReader('^DJI', 'stooq')
CLOSE = f['Close']
CLOSE.plot(alpha= 0.5,style='-')
CLOSE.resample('BA').mean().plot(style=':')
CLOSE.asfreq(freq='BA').plot(style='--')
plt.legend(['input','resample','asfreq'],loc='upper left')
With resample() I get the average of the previous year. This works.
With asfreq() I try to get the closing value at the end of the year. This doesn't work.
I get the following error in the asfreq() line: TypeError: no numeric data to plot
f.info() displays that close is a non-null float64 type.
What could be wrong?
The index was not sorted in ascending order:
f = f.sort_index(axis=0) solved it.
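
A minimal sketch of that fix on made-up data, assuming the downloaded frame arrives newest-first. The series below is invented; only the sort_index call comes from the answer.

```python
import pandas as pd

# Hypothetical closing prices delivered in descending date order
close = pd.Series(
    [3.0, 2.0, 1.0],
    index=pd.to_datetime(["2020-01-03", "2020-01-02", "2020-01-01"]),
)

# Sort the index so the dates run oldest to newest before resampling
close_sorted = close.sort_index()
print(close_sorted)
```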

Can't see the items when I print the list > frozenset({'nan'}

I am trying to print the rules after using apriori. Instead of printing the actual items my code always prints nan.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori
store_data = pd.read_csv('C:\\Users\\\\datasets\\popular_words_for_apriori.csv', header=None)
store_data.head()
num_records=len(store_data)
records = []
for i in range(0,99):
    records.append([str(store_data.values[i,j]) for j in range(0,54)])
association_rules = apriori(records, min_support=0.0053, min_confidence=0.2, min_lift=1, min_length=3)
association_results = list(association_rules)
print(association_results)
and this is the output:
[RelationRecord(items=frozenset({'nan'}), support=1.0, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'nan'}), confidence=1.0, lift=1.0)]), RelationRecord(items=frozenset({'nan', 'algorithm'}), support=0.010101010101010102
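
A likely cause, assuming the CSV rows have different lengths: pandas pads the shorter rows with NaN, and str() turns each NaN into the string 'nan', which apriori then counts as an item that appears in every record. The sketch below uses a small made-up inline CSV in place of the real file and skips missing cells when building the records.

```python
import io
import pandas as pd

# Hypothetical CSV with rows of different lengths (stand-in for the real file)
csv_text = "algorithm,data,model\nalgorithm,python\ndata\n"
store_data = pd.read_csv(io.StringIO(csv_text), header=None)

# Shorter rows are padded with NaN; drop those cells instead of stringifying them
records = [
    [str(item) for item in row if pd.notna(item)]
    for row in store_data.values
]
print(records)
```

Looping over store_data.values also avoids hard-coding the number of rows and columns.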

Python- iterating over multiple files to read into a data frame

Here is the working code:
from urllib.request import urlretrieve
import requests
import xlrd
import pandas as pd
# WORKING CODE
icd9_link = "https://www.cob.cms.hhs.gov/Section111/assets/section111/Section111ValidICD9-2017.xlsx"
icd9_map= pd.read_excel(icd9_link, sheet_name=0, header=0)
# NOT WORKING CODE
# Define function which will name ICD_9_map_ and use correct link
fx = icd"{0}"_map_= pd.read_excel(icd"{1}"_link, sheet_name=0, header=0)
#
y = [9,10]
for x in y:
    fx.format(x, x)
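
Variable names cannot be built with str.format, so the usual alternative is a dictionary keyed by version. A minimal sketch of that idea follows; `load_icd_maps` and its `reader` parameter are illustrative names, not part of the question, and only the ICD-9 link is known from the question, so other versions would have to be added to the dictionary by hand.

```python
import pandas as pd

# Only the ICD-9 link appears in the question; add further versions here
icd_links = {
    9: "https://www.cob.cms.hhs.gov/Section111/assets/section111/Section111ValidICD9-2017.xlsx",
}

def load_icd_maps(links, reader=pd.read_excel):
    """Read each workbook into a DataFrame, keyed by ICD version."""
    return {
        version: reader(link, sheet_name=0, header=0)
        for version, link in links.items()
    }

# Usage (downloads the workbooks, so it is left commented out):
# icd_maps = load_icd_maps(icd_links)
# icd9_map = icd_maps[9]
```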

Charting with Candlestick_OHLC

import pandas as pd
import numpy as np
from matplotlib.finance import candlestick_ohlc
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as mticker
import io
import datetime
import urllib
import urllib.request
%matplotlib notebook
urlToVisit = 'http://chartapi.finance.yahoo.com/instrument/1.0/GOOG/chartdata;type=quote;range=1y/csv'
with urllib.request.urlopen(urlToVisit) as response:
    sourcePage = response.read().decode('utf-8')
df = pd.read_csv(io.StringIO(sourcePage), skiprows=18, header=None, sep=",",
                 names=['date','closeP','highP','lowP','openP','volume'],
                 index_col=0, parse_dates=True)
if 'volume' not in df:
    df['volume'] = np.zeros(len(df))
DATA = df[['openP', 'highP', 'lowP', 'closeP','volume']].values
f1 = plt.subplot2grid((6,4), (1,0), rowspan=6, colspan=4, axisbg='#07000d')
candlestick_ohlc(f1, DATA, width=.6, colorup='#53c156', colordown='#ff1717')
f1.grid('on')
f1.xaxis.set_major_locator(mticker.MaxNLocator(15))
f1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.subplots_adjust(left=.09, bottom=.14, right=.94, top=.95, wspace=.20, hspace=0)
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.show()
Here is the problem: when I try to plot with candlestick_ohlc, it only plots the volume bar chart. Why is this happening? I suspect the problem has to do with my dates. I'm using IPython Notebook, and my data source is Yahoo Finance. Note that I skip the first 18 lines to get straight to the actual data, which looks like:
20150302,569.7757,570.5834,557.2202,558.9953,2129600
20150303,572.0694,573.8146,564.9689,568.8881,1704700
20150304,571.8001,575.5299,566.4548,570.3043,1876800
20150305,573.7548,576.3277,571.8400,573.4456,1389600
20150306,566.1307,575.1011,565.2082,573.3060,1659100
20150309,567.2925,568.7086,561.9921,565.3079,1062100
The columns are: date,close,high,low,open,volume
Any ideas? Would appreciate any help!!
So with the help of @DSM,
DATA = df[['openP', 'highP', 'lowP', 'closeP','volume']]
DATA = DATA.reset_index()
DATA["date"] = DATA["date"].apply(mdates.date2num)
f1 = plt.subplot2grid((6,4), (1,0), rowspan=6, colspan=4, axisbg='#07000d')
candlestick_ohlc(f1, DATA.values, width=.6, colorup='#53c156', colordown='#ff1717')
fixed the problem! Credits to him.
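
The fix works because candlestick_ohlc expects each row to start with the date as a matplotlib date number, followed by open, high, low and close; in the original code the date was the index, so it never reached the array. A minimal sketch of building such rows from two of the sample lines above. It stops before plotting, and the explicit date format and the `quotes` name are choices made for this example.

```python
import io
import pandas as pd
import matplotlib.dates as mdates

# Two of the sample rows from the question: date,close,high,low,open,volume
sample = (
    "20150302,569.7757,570.5834,557.2202,558.9953,2129600\n"
    "20150303,572.0694,573.8146,564.9689,568.8881,1704700\n"
)
df = pd.read_csv(
    io.StringIO(sample), header=None,
    names=['date', 'closeP', 'highP', 'lowP', 'openP', 'volume'],
    dtype={'date': str},
)
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

# Keep the date as the first column and convert it to a matplotlib date number
quotes = df[['date', 'openP', 'highP', 'lowP', 'closeP']].copy()
quotes['date'] = quotes['date'].map(mdates.date2num)
print(quotes.values)
```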
