Comparing two dataframes with a function (without common indices) - python-3.x

I have 2 pandas data frames, restaurants:
Lat Long Name
0 43.599503 1.440678 Le Filochard
1 43.602369 1.447368 Le Wallace
2 43.603838 1.435186 Chez Tonton, Pastis Ô Maître
and hotels:
Lat Long Name
0 43.603779 1.444004 Grand Hôtel de l'Opéra
1 43.599482 1.441207 Hôtel Garonne
2 43.549924 1.499821 Ibis Styles
I also have a function distance(origin, destination), where each argument is a (lat, long) coordinate pair.
I am trying to apply the function to the two data frames to calculate, for each row of the first data frame, the number of items from the second data frame for which the distance is less than 0.2 ...
I tried the apply and map functions but could not get them to work across the two data frames; do I need to merge them?

Would this work for you?
You just need to substitute the correct distance function.
hotels=pd.DataFrame([[43.599503, 1.440678, 'Le Filochard'],
[43.602369, 1.447368, 'Le Wallace'],
[43.603838, 1.435186, 'Chez Tonton, Pastis Ô Maître']],
columns=['Lat', 'Long', 'Name'])
restaurants=pd.DataFrame([[43.603779, 1.444004, "Grand Hôtel de l'Opéra"],
[43.599482, 1.441207, 'Hôtel Garonne'],
[43.549924, 1.499821, 'Ibis Styles']],
columns=['Lat', 'Long', 'Name'])
hotels['Nearby Resaurants'] = hotels.apply(lambda h: list(restaurants[((restaurants.Lat-h.Lat)**2+(restaurants.Long-h.Long)**2)<0.005].Name), axis=1)
print(hotels)
Lat Long Name \
0 43.599503 1.440678 Le Filochard
1 43.602369 1.447368 Le Wallace
2 43.603838 1.435186 Chez Tonton, Pastis Ô Maître
Nearby Resaurants
0 [Grand Hôtel de l'Opéra, Hôtel Garonne]
1 [Grand Hôtel de l'Opéra, Hôtel Garonne]
2 [Grand Hôtel de l'Opéra, Hôtel Garonne]
EDIT:
Modified to call a distance function; the filtering is now also done with a lambda function:
# X = (lat_a, lat_b), Y = (long_a, long_b); returns the squared difference in degrees
def distance(X,Y):
    return((X[0]-X[1])**2+(Y[0]-Y[1])**2)
hotels['Nearby Resaurants'] = hotels.apply(lambda h: list(restaurants.loc[restaurants.apply(lambda r: distance((r.Lat,h.Lat),(r.Long,h.Long))<0.005, axis=1)].Name), axis=1)
print(hotels)
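The question asks for a count per row and a real distance, so here is a minimal sketch of one way to do that without merging. It assumes that distance is meant to be the great-circle distance in kilometres and that the 0.2 threshold is in kilometres too; haversine_km and the 'Nearby hotels' column are hypothetical names, and the data frames follow the naming in the question rather than the swapped names in the answer above.

```python
import numpy as np
import pandas as pd

restaurants = pd.DataFrame(
    [[43.599503, 1.440678, 'Le Filochard'],
     [43.602369, 1.447368, 'Le Wallace'],
     [43.603838, 1.435186, 'Chez Tonton, Pastis Ô Maître']],
    columns=['Lat', 'Long', 'Name'])
hotels = pd.DataFrame(
    [[43.603779, 1.444004, "Grand Hôtel de l'Opéra"],
     [43.599482, 1.441207, 'Hôtel Garonne'],
     [43.549924, 1.499821, 'Ibis Styles']],
    columns=['Lat', 'Long', 'Name'])

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (hypothetical stand-in for distance())."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Broadcast a column of restaurants against a row of hotels:
# dist[i, j] is the distance from restaurant i to hotel j.
dist = haversine_km(
    restaurants['Lat'].to_numpy()[:, None],
    restaurants['Long'].to_numpy()[:, None],
    hotels['Lat'].to_numpy()[None, :],
    hotels['Long'].to_numpy()[None, :],
)

# Count, for each restaurant, the hotels closer than 0.2 km (assumed unit).
restaurants['Nearby hotels'] = (dist < 0.2).sum(axis=1)
print(restaurants)
```

Broadcasting computes every restaurant/hotel pair in one call, so no merge and no nested apply is needed; for very large frames the full distance matrix may not fit in memory and a spatial index would be a better fit.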

Related

Calculation of stock values with yfinance and python

I would like to make some calculations on stock prices in Python 3 and I have installed the module yfinance.
I try to get an individual value like this:
import yfinance as yf
#define the ticker symbol
tickerSymbol = 'MSFT'
#get data on this ticker
tickerData = yf.Ticker(tickerSymbol)
#get the historical prices for this ticker
tickerDf = tickerData.history(period='1d', start='2015-1-1', end='2020-12-30')
row_date = tickerDf[tickerDf['Date']=='2020-12-30']
value = row_date.Open.item()
#see your data
print (value)
But when I run this, it says:
KeyError: 'Date'
Which is strange because when I do this, it works well and I have the column Date:
import yfinance as yf
#define the ticker symbol
tickerSymbol = 'MSFT'
#get data on this ticker
tickerData = yf.Ticker(tickerSymbol)
#get the historical prices for this ticker
tickerDf = tickerData.history(period='1d', start='2015-1-1', end='2020-12-30')
#row_date = tickerDf[tickerDf['Date']=='2020-12-30']
#value = row_date.Open.item()
#see your data
print (tickerDf)
I get the following result:
G:\python> python test.py
Open High Low Close Volume Dividends Stock Splits
Date
2014-12-31 41.512481 42.143207 41.263744 41.263744 21552500 0.0 0
2015-01-02 41.450302 42.125444 41.343701 41.539135 27913900 0.0 0
2015-01-05 41.192689 41.512495 41.086088 41.157158 39673900 0.0 0
2015-01-06 41.201567 41.530255 40.455355 40.553074 36447900 0.0 0
2015-01-07 40.846223 41.272629 40.410934 41.068310 29114100 0.0 0
... ... ... ... ... ... ... ...
2020-12-22 222.690002 225.630005 221.850006 223.940002 22612200 0.0 0
2020-12-23 223.110001 223.559998 220.800003 221.020004 18699600 0.0 0
2020-12-24 221.419998 223.610001 221.199997 222.750000 10550600 0.0 0
2020-12-28 224.449997 226.029999 223.020004 224.960007 17933500 0.0 0
2020-12-29 226.309998 227.179993 223.580002 224.149994 17403200 0.0 0
[1510 rows x 7 columns]
Under the hood, yfinance uses a pandas data frame to hold a Ticker's history. In this data frame, Date isn't an ordinary column; it is the name given to the index (see line 240 in base.py of yfinance). The index behaves differently from other columns and can't be referenced by name with tickerDf['Date'], which is why you get the KeyError. You can access it using tickerDf.index=='2020-12-30' or by turning it into a regular column using reset_index as explained in another question. Searching through an index is faster than searching a regular column, so if you are looking through a lot of data, it will be to your advantage to leave it as an index.
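A minimal sketch of the lookup options, using a small hand-built frame as a stand-in for the result of history() so that it runs without network access. The two rows are copied from the output printed in the question, and the variable names are my own. Note that the printed output stops at 2020-12-29, so a lookup for 2020-12-30 would come back empty on that data; the sketch uses 2020-12-29 instead.

```python
import pandas as pd

# Stand-in for tickerData.history(...): "Date" is the index name, not a column.
ticker_df = pd.DataFrame(
    {"Open": [224.449997, 226.309998], "Close": [224.960007, 224.149994]},
    index=pd.DatetimeIndex(["2020-12-28", "2020-12-29"], name="Date"),
)

# Option 1: boolean mask on the index, closest to the original attempt.
row = ticker_df[ticker_df.index == "2020-12-29"]
open_by_mask = row.Open.item()

# Option 2: label-based lookup on the index.
open_by_loc = ticker_df.loc[pd.Timestamp("2020-12-29"), "Open"]

# Option 3: turn the index into a regular "Date" column first.
flat = ticker_df.reset_index()
open_by_column = flat[flat["Date"] == "2020-12-29"]["Open"].item()

print(open_by_mask, open_by_loc, open_by_column)
```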

how to replace stock-prices symbols in a dataframe

I would like to get the S&P 500 ['Adj Close'] column and replace the column with the corresponding stock symbol, however, I am not able to replace the dataframe columns because it gives me an error: KeyError: '5'
What I would like to achieve is to loop through all the available stocks from the list and replace the Adj Close with the stock symbol.
This is what I did:
First I have scraped the stock symbols from Wikipedia and added them to a list.
data = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
symbols = data[0] # get first table
symbols.head()
stock = symbols['Symbol'].to_list()
print(stock[0:5])
this gives me a list of stock symbols as below:
['MMM', 'ABT', 'ABBV', 'ABMD', 'ACN']
then I scraped Yahoo finance to get the daily financial data as below
stock_url = 'https://query1.finance.yahoo.com/v7/finance/download/{}?'
params = {
'range' : '1y',
'interval' : '1d',
'events' : 'history'
}
response = requests.get(stock_url.format(stock[0]), params=params)
file = StringIO(response.text)
reader = csv.reader(file)
data = list(reader)
df = pd.DataFrame(data)
stock_data = df['5']
Fix for key error
When I tried it, calling the URL with the whole list 'stock' gave a 404 response.
Call the URL with an individual stock symbol, as below:
requests.get(stock_url.format(stock[0]), params=params)
Also change the column lookup as below. The column label 5 is stored as an integer rather than a string, which is why you got the KeyError.
stock_data = df[5]
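To illustrate why df['5'] fails, here is a small sketch that builds a frame the same way from an in-memory CSV string, so no request is made. The two rows are copied from the columns visible in the output of this answer; the variable names are my own. The last line shows an alternative, which is to let pandas parse the header row so the column can be selected by name.

```python
import csv
import io

import pandas as pd

# Two rows copied from the printed output, limited to the columns visible there.
csv_text = (
    "Date,Open,High,Low,Close,Adj Close\n"
    "2019-12-11,168.380005,168.839996,167.330002,168.740005,162.682480\n"
)

# Building a frame from a list of lists gives integer column labels 0..5,
# and the header row ends up as ordinary data in row 0.
rows = list(csv.reader(io.StringIO(csv_text)))
df = pd.DataFrame(rows)
adj_by_int = df[5]

# The string '5' is not one of the labels, hence the KeyError.
try:
    df['5']
    raised = False
except KeyError:
    raised = True

# Alternative: have pandas read the header, then select by column name.
parsed = pd.read_csv(io.StringIO(csv_text))
adj_by_name = parsed['Adj Close']
print(list(df.columns), raised, adj_by_name.iloc[0])
```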
I tried for stock 'MMM' - stock[0] and it prints below:
0 1 2 3 4 5 \
0 Date Open High Low Close Adj Close
1 2019-12-11 168.380005 168.839996 167.330002 168.740005 162.682480
2 2019-12-12 166.729996 170.850006 166.330002 168.559998 162.508926
3 2019-12-13 169.619995 171.119995 168.080002 168.789993 162.730667
4 2019-12-16 168.940002 170.830002 168.190002 170.750000 164.620316
.. ... ... ... ... ... ...
249 2020-12-04 172.130005 173.160004 171.539993 172.460007 172.460007
250 2020-12-07 171.720001 172.500000 169.179993 170.149994 170.149994
251 2020-12-08 169.740005 172.830002 169.699997 172.460007 172.460007
252 2020-12-09 172.669998 175.639999 171.929993 175.289993 175.289993
253 2020-12-10 174.869995 175.399994 172.690002 173.490005 173.490005
[254 rows x 7 columns]
Loop through stocks and replace Adj Close (Edited as per requirements from comments)
Code for looping through stocks and replacing Adj close with Stock symbol.
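import csv
import io

import pandas as pd
import requests

stock_url = 'https://query1.finance.yahoo.com/v7/finance/download/{}?'
params = {
    'range' : '1y',
    'interval' : '1d',
    'events' : 'history'
}
frames = []
for i in stock:
    response = requests.get(stock_url.format(i), params=params)
    file = io.StringIO(response.text)
    reader = csv.reader(file)
    data = list(reader)
    df1 = pd.DataFrame(data)
    # replace the 'Adj Close' header cell (column 5) with the stock symbol
    df1.loc[df1[5] == 'Adj Close', 5] = i
    frames.append(df1)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
df = pd.concat(frames)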
Tried the code for first 3 stocks and here it is:

Pandas string encoding when retrieving cell value

I have the following Series:
s = pd.Series(['ANO DE LOS BÃEZ MH EE 3 201'])
When I print the series I get:
0 ANO DE LOS BÃEZ MH EE 3 201
But when I get the cell element, I get a hexadecimal escape in the string:
>>> s.iloc[0]
'ANO DE LOS BÃ\x81EZ MH EE 3 201'
Why does this happen, and how can I retrieve the cell value and get the string: 'ANO DE LOS BÃEZ MH EE 3 201'?
Even though I am not really sure where the issue arose, I could solve it by using the unidecode package.
from unidecode import unidecode
output_string = unidecode(s.iloc[0])
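As for why the two views differ: the string contains the character U+0081, a control character that is invisible when printed, while the interactive prompt shows the repr of the string, which escapes it as \x81. The sketch below shows this, and then tries a repair under one assumption that the thread does not confirm: that the text was originally UTF-8 and was decoded as Latin-1 somewhere upstream. If that assumption holds, re-encoding and decoding recovers an accented name.

```python
raw = 'ANO DE LOS BÃ\x81EZ MH EE 3 201'

# print() writes the control character as-is (usually invisible),
# repr() escapes it, which is what the interactive prompt displays.
shown = repr(raw)
print(raw)
print(shown)

# Assumption: UTF-8 bytes were mis-decoded as Latin-1.
# Undo that by encoding back to Latin-1 and decoding as UTF-8.
repaired = raw.encode('latin-1').decode('utf-8')
print(repaired)
```

If the assumption is wrong, the encode or decode step can raise a UnicodeError, so it is worth checking a few values by hand before applying it to a whole column.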

How do I read an SEC txt file into a pandas dataframe?

I am trying to use SEC (U.S. Securities and Exchange Commission) data. The SEC provides useful data in a txt format. I am using
Financial Statement Data Sets for the second quarter of 2017. You can find the data I use here.
I try to read the txt files into a pandas dataframe. I tried it the following ways:
sub = pd.read_fwf('sub.txt')
sub_1 = pd.read_csv('sub.txt')
I get no error with using Pandas' read_fwf function - but the output is utter rubbish. Here is the head of the dataframe:
adsh cik name sic countryba stprba cityba zipba bas1 bas2 baph countryma stprma cityma zipma mas1 mas2 countryinc stprinc ein former changed afs wksi fye form period fy fp filed accepted prevrpt detail instance nciks aciks Unnamed: 1
0 0000002178-17-000038\t2178\tADAMS RESOURCES & ... NaN
1 0000002488-17-000107\t2488\tADVANCED MICRO DEV... NaN
I do get an error when using read_csv: Error tokenizing data. C error: Expected 2 fields in line 7, saw 3
Any ideas on how to read the data into a pandas dataframe?
It looks like the files are tab separated - that's why you're seeing \t in the results. pandas read_csv defaults to comma separated values, so you have to change the separator. This is controlled by the sep parameter. In addition, you will need to provide the proper encoding (errors are thrown when trying to read the num, pre, and tag files). Generally ISO-8859-1 is a good choice.
#import pandas
import pandas as pd
#read in the .txt file and choose a separator and encoding standard
df = pd.read_csv('sub.txt', sep='\t', encoding='ISO-8859-1')
#output the results
print(df)
adsh cik name \
0 0000002178-17-000038 2178 ADAMS RESOURCES & ENERGY, INC.
1 0000002488-17-000107 2488 ADVANCED MICRO DEVICES INC
2 0000002969-17-000019 2969 AIR PRODUCTS & CHEMICALS INC /DE/
3 0000002969-17-000024 2969 AIR PRODUCTS & CHEMICALS INC /DE/
4 0000003499-17-000010 3499 ALEXANDERS INC
5 0000003545-17-000043 3545 ALICO INC
6 0000003570-17-000073 3570 CHENIERE ENERGY INC

Python not getting the right value in an Excel cell

I want to color the interior of a cell according to its content. However, when I access its value I always get '1.0'; the value is calculated by a formula.
Colorization code:
def _colorizeTop10RejetsSheet(self):
    """Colors the positions on the "Top 10 Rejets" sheet"""
    start_position = (5, 12)
    last_line = 47
    for x in range(start_position[0], last_line+1):
        current_cell = self.workbook.Sheets("Top 10 Rejets").Cells(x, start_position[1])
        current_cell.Interior.Color = self._computePositionColor(current_cell.Value)

def _computePositionColor(self, position):
    """Colors positions 1 to 5 red and 6 to 10 orange"""
    if position < 6:
        return self.RED
    elif position < 11:
        return self.ORANGE
    else:
        return self.WHITE
Excel cell code:
=SI(ESTNA(RECHERCHEV(CONCATENER(TEXTE($F23;0);TEXTE($G23;"00");$H23;$I23);Données!$J:$P;7;FAUX));MAX(Données!$P:$P);RECHERCHEV(CONCATENER(TEXTE($F23;0);TEXTE($G23;"00");$H23;$I23);Données!$J:$P;7;FAUX))
How could I get the calculated value?
I'm using python 2.7 and I'm communicating with Excel through win32com
Thanks
Adding this to the beginning of the _colorizeTop10RejetsSheet method did the trick:
self.xl.Calculate()
self.xl is the object returned by win32.Dispatch('Excel.Application')
