KeyError when adding a column to a DataFrame (Pandas) - python-3.x

My Pandas DataFrame will not accept a second added column, and I cannot troubleshoot the issue. I am trying to display moving averages. The code works fine for the first one (MA_9), but raises a KeyError as soon as I try to add an additional moving average (MA_20).
Is it not possible in this case to add more than one column?
The code:
import numpy as np
import pandas as pd
import pandas_datareader as pdr
import matplotlib.pyplot as plt
symbol = 'GOOG.US'
start = '20140314'
end = '20180414'
google = pdr.DataReader(symbol, 'stooq', start, end)
print(google.head())
google_close = pd.DataFrame(google.Close)
print(google_close.last_valid_index())
google_close['MA_9'] = google_close.rolling(9).mean()
google_close['MA_20'] = google_close.rolling(20).mean()
# google_close['MA_60'] = google_close.rolling(60).mean()
# print(google_close)
plt.figure(figsize=(15, 10))
plt.grid(True)
# display MA's
plt.plot(google_close['Close'], label='Google_Cls')
plt.plot(google_close['MA_9'], label='MA 9 day')
plt.plot(google_close['MA_20'], label='MA 20 day')
# plt.plot(google_close['MA_60'], label='MA 60 day')
plt.legend(loc=2)
plt.show()

Please update your code as below and then it should work:
google_close['MA_9'] = google_close.Close.rolling(9).mean()
google_close['MA_20'] = google_close.Close.rolling(20).mean()
Initially there was only one column of data (Close), so your old code google_close['MA_9'] = google_close.rolling(9).mean() worked. After that line the frame has two columns, so rolling no longer knows which column's data you mean to average. Specifying the column explicitly, as in google_close['MA_20'] = google_close.Close.rolling(20).mean(), makes it work.
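For illustration, here is a minimal sketch of the same behaviour with made-up data (the prices frame is hypothetical, standing in for google_close):
import pandas as pd

# Hypothetical one-column frame standing in for google_close
prices = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0, 14.0]})

# Works: the frame has a single column, so the rolling result is unambiguous
prices['MA_2'] = prices.rolling(2).mean()

# The frame now has two columns, so prices.rolling(3).mean() would return
# a two-column result that cannot be assigned to one column. Selecting the
# column first keeps the result one-dimensional:
prices['MA_3'] = prices['Close'].rolling(3).mean()
print(prices)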

Visual Studio Code autocomplete comma to '%%!'

While using the Jupyter extension in VS Code, for some reason, every time I type a comma, VS Code suggests %%!, which means I have to hit Esc every time in order to write comma-separated lists over multiple lines. Can anyone tell me why this is happening or how to stop it? It doesn't happen in a blank notebook, but after running two cells it's back again.
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import datetime as dt
%matplotlib inline
sns.set()
def open_hbal(file):
    df_balance = pd.read_excel(file)  # open xlsx
    t_0 = dt.datetime(2018, 1, 1)  # set start point for time series
    # add Date and Time in fractional steps starting from t_0
    df_balance["Date and Time"] = t_0 + pd.to_timedelta(df_balance["Time"],
                                                        unit='h')
    # convert to datetime object
    df_balance["Date and Time"] = pd.to_datetime(df_balance["Date and Time"])
    # replace index with Date and Time; inplace overwrites df
    df_balance.set_index("Date and Time", inplace=True)
    # remove Time column as it is no longer needed; axis 0 = row, 1 = column
    df_balance.drop("Time", axis=1, inplace=True)
    # df_balance = df_balance / 1000  # convert to kWh
    # replace units in all columns
    # df_balance.columns = df_balance.columns.str.replace(", W", ", kWh")
    df_balance.rename(columns={"Net losses, W": "_Net losses, W"},
                      inplace=True)
    return df_balance
In case anyone else lands here, other suggested solutions did not work for me on v2021.10.1101450599. As per this issue, rolling back to v2021.8.1236758218 removes the problem until it gets fixed.
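If you need to roll back, the gear menu on the Jupyter extension's entry in the Extensions view has an "Install Another Version..." option from which v2021.8.1236758218 can be picked; you may also want to turn off extension auto-update so it does not immediately upgrade again.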

Get Seaborn legend location

I want to add comments under my legend. Here is a sample code doing what I want:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)
df1 = pd.DataFrame(np.random.normal(size=100))
df2 = pd.DataFrame(np.random.uniform(size=100))
fig,ax=plt.subplots()
sns.distplot(df1,ax=ax,label='foo')
sns.distplot(df2,ax=ax,label='bar')
hardlocy = 0.92
xmargin=0.02
xmin,xmax = ax.get_xlim()
xtxt=xmax-(xmax-xmin)*xmargin
leg = ax.legend()
plt.text(xtxt, hardlocy, "Comment",
         horizontalalignment='right')
The result is the figure I want, but as you can see, I rely on manual position setting, at least for the y coordinate. I would like to do it automatically.
As per this thread and this one, I have tried to access the legend's position through p = leg.get_window_extent(), but I obtained the following error message:
AttributeError: 'NoneType' object has no attribute 'points_to_pixels'
(which is very similar to this closed issue)
I am running macOS Catalina 10.15.4, and a successful conda update --all a few minutes ago did not change anything.
How can I automatically place my comments?
Thanks to @JohanC, from this question:
The figure needs to be drawn before its legend's position is worked out. Therefore, working code here could be:
np.random.seed(0)
df1 = pd.DataFrame(np.random.normal(size=100))
df2 = pd.DataFrame(np.random.uniform(size=100))
fig,ax=plt.subplots()
sns.distplot(df1,ax=ax,label='foo')
sns.distplot(df2,ax=ax,label='bar')
ymargin=0.05
leg = ax.legend()
fig.canvas.draw()
bbox = leg.get_window_extent()
inv = ax.transData.inverted()
(xloc,yloc)=inv.transform((bbox.x1,bbox.y0))
ymin,ymax = ax.get_ylim()
yloc_margin=yloc-(ymax-ymin)*ymargin
ax.text(xloc,yloc_margin,"Comment",horizontalalignment='right')
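The reason fig.canvas.draw() is needed: Matplotlib only resolves the legend's final position at draw time, so get_window_extent() returns a meaningful bounding box only after the figure has been drawn. ax.transData.inverted() then maps that pixel-space box back into data coordinates, so the comment can be anchored just below the legend's lower-right corner.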

Pandas - comparing average of hour periods against each other for a given date range

I'm trying to get used to using datetime data in Pandas and plotting different comparisons for a given dataset. I'm using the London Air Quality dataset for Ozone to practice and am trying to replicate the chart below (that I've created using a pivot table in Excel) with Pandas and matplotlib.
The chart plots an average of each hour's Ozone reading for each location across the entire dataset, to see whether one location is consistently higher than the others or whether different locations have the highest Ozone levels at different times of day.
Essentially, I'm looking to plot the hourly average of Ozone for each location.
I've attempted to reshape the data into a multiindex format and then plot, similar to what I'd do in Excel before plotting, but am unsure if this is the correct way to approach the problem. The code for reshaping is below. I am still getting used to reshaping, so I am not sure whether I am approaching the problem in the right way, and I am open to other methods of accomplishing this task. Any assistance would be much appreciated!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('/Users/xx/Downloads/LaqnData.csv')
data['ReadingDateTime'] = pd.to_datetime(data['ReadingDateTime'])
data['Date'] = pd.to_datetime(data['ReadingDateTime']).dt.date
data['Time'] = pd.to_datetime(data['ReadingDateTime']).dt.time
data.set_index(['Date', 'Time'], inplace = True)
hourly_dataframe = data.pivot_table(columns = 'Site', values = 'Value', index = ['Date', 'Time'])
hourly_dataframe.fillna(method = 'ffill', inplace = True)
hourly_dataframe[hourly_dataframe < 0] = 0
I went to the site and downloaded a 24-hour reading for the following sites:
data.Site.unique()
array(['BX1', 'TH4', 'BT4', 'HI0', 'BL0', 'RD0'], dtype=object)
I adopted your code up to this point:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('/Users/xx/Downloads/LaqnData.csv')
data['ReadingDateTime'] = pd.to_datetime(data['ReadingDateTime'])
I then set ReadingDateTime as the index and use the index's hour in the groupby function:
data.set_index('ReadingDateTime', inplace=True)
data.groupby([data.index.hour, data['Site']])['Value'].mean().reset_index()  # convert back to a DataFrame
To plot, I instead chain unstack onto the groupby and plot directly:
data.groupby([data.index.hour, data['Site']])['Value'].mean().unstack().plot()
plt.xlabel('Hour of the day')
plt.ylabel('Ozone')
plt.title('Average hourly comparison')
plt.legend()  # if you want the legend in the default location
If you are fussy about the legend location, this post explains it very well. In your case:
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15),
           fancybox=True, shadow=True, ncol=6)
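Putting the pieces together, a minimal end-to-end sketch could look like the following (the CSV path and the ReadingDateTime/Site/Value column names are taken from the question and may need adjusting for your download):
import pandas as pd
import matplotlib.pyplot as plt

# Load the London Air Quality export (path and columns as in the question)
data = pd.read_csv('/Users/xx/Downloads/LaqnData.csv')
data['ReadingDateTime'] = pd.to_datetime(data['ReadingDateTime'])
data.set_index('ReadingDateTime', inplace=True)

# Average each hour of the day per site, then pivot sites into columns
hourly = data.groupby([data.index.hour, data['Site']])['Value'].mean().unstack()

hourly.plot()
plt.xlabel('Hour of the day')
plt.ylabel('Ozone')
plt.title('Average hourly comparison')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15),
           fancybox=True, shadow=True, ncol=6)
plt.show()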

'PathCollection' object has no attribute 'legend_elements'

I know this exact question has been asked here; however, the current solution does nothing for me. I can't seem to generate a legend that has a different color for each label. I have tried the current documentation on Matplotlib to no avail. I keep getting the error that my PathCollection object has no attribute legend_elements.
EDIT: Also, I want my legend to show just the unique years in the plot, not what it does right now, where each data point is mapped to its own legend entry.
Here's what I have
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from matplotlib.pyplot import legend
import os
%config InlineBackend.figure_format = 'retina'
path = None
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        path = os.path.join(dirname, filename)
# Indexes to be removed
early_demo_dividend = 13
high_income = 24
lower_middle_income = 40
north_america = 46
members = 50
post_demo = 56
_removals = [early_demo_dividend, high_income, lower_middle_income, north_america, members, post_demo]
#Read in data
df = pd.read_csv(path)
#Get the rows we want
df = df.loc[df['1960'] > 1]
df = df.drop(columns=["Code", "Type", "Indicator Name"])
#Remove the odd rows
for i in _removals:
    df = df.drop(df.index[i])
#Format the dataframe
df = df.melt('Name', var_name='Year', value_name='Budget')
#Plot setup
plt.figure().set_size_inches(16,6)
plt.xticks(rotation=90)
plt.grid(True)
#Plot labels
plt.title('Military Spending of Countries')
plt.xlabel('Countries')
plt.ylabel('Budget in Billions')
#Plot data
new_year = df['Year'].astype(int)
scatter = plt.scatter(df['Name'], df['Budget'], c=(new_year / 10000) , label=new_year)
#Legend setup produce a legend with the unique colors from the scatter
legend1 = plt.legend(*scatter.legend_elements(),
                     loc="lower left", title="Years")
plt.gca().add_artist(legend1)
plt.show()
Here's my plot:
I also encountered this problem.
Try to upgrade your matplotlib with pip3 install --upgrade matplotlib
Uninstalling matplotlib-3.0.3:
Successfully uninstalled matplotlib-3.0.3
Successfully installed matplotlib-3.1.2
It works for me.
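legend_elements was only added to PathCollection in Matplotlib 3.1, which is why upgrading from 3.0.3 makes the AttributeError go away. Once on a new enough version, a minimal sketch of the unique-entries legend could look like this (with made-up stand-in data, not the spending dataset):
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-in data: one point per (country, year) pair
rng = np.random.default_rng(0)
names = np.repeat(['A', 'B', 'C'], 4)
years = np.tile([1960, 1970, 1980, 1990], 3)
budgets = rng.uniform(1, 100, size=12)

# Colour by year; legend_elements() then yields one legend entry
# per unique colour rather than one per data point
scatter = plt.scatter(names, budgets, c=years)
plt.legend(*scatter.legend_elements(), loc="lower left", title="Years")
plt.show()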
Although my answer may not be relevant to the current question, I decided to leave it to describe my case; it might be useful to someone else:
When using matplotlib functions such as scatter or plot, incorrectly specifying the name of a keyword argument can produce the same kind of error.
Example:
import matplotlib.pyplot as plt

x = list(range(10))
y = list(range(10))
plt.scatter(x, y, labels='RESULT')
I get the error:
AttributeError: 'PathCollection' object has no property 'labels'
As the error message says (though it is not obvious to an inattentive developer :) ), the problem is that I used labels instead of label.
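With the keyword spelled correctly, the same call works:
plt.scatter(x, y, label='RESULT')
plt.legend()
plt.show()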

Reindexing A Pandas Data Frame gives only Nans

I wanted to do the tutorial Python for Finance, Part I: Yahoo & Google Finance API, pandas, and matplotlib. As the Google API is not working properly, I used IEX data. When I try to reindex the pandas DataFrame to have consecutive dates, all values that were available before are replaced with NaN. I think the problem has something to do with the two indexes having different types. I am quite new to programming and have no clue how to solve this problem after reading the pandas documentation.
The code looks like this.
Modules used
matplotlib==2.2.3
pandas_datareader==0.6.0
googlefinance.client==1.3.0
pandas==0.23.4
googlefinance==0.7
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader as pdr
import matplotlib.pyplot as plt
from datetime import datetime
# Define the instruments to download. We would like to see Apple, Microsoft and the S&P500 index.
#tickers = ["AAPL","MSFT","SPY"]
# We would like all available data from 01/01/2017 until 9/1/2018.
start_date = datetime(2017,1,1)
end_date = datetime(2018,9,1)
# Use pandas_datareader's DataReader to load the desired data.
aapl_data = pdr.DataReader("SPY",'iex',start_date,end_date, pause = 5)
#ms_data = pdr.DataReader('SPY','iex',start_date,end_date,pause = 5)
print(aapl_data.tail(9))
#print(ms_data.tail(9))
print(aapl_data.index.dtype)
close = aapl_data
# Getting all weekdays between start_date and end_date
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
print(all_weekdays.dtype)
# How do we align the existing prices in adj_close with our new set of dates?
# All we need to do is reindex close using all_weekdays as the new index
close = close.reindex(all_weekdays)
# Reindexing will insert missing values (NaN) for the dates that were not present
print(close)
You need to convert the DataFrame's index to datetime dtype.
Use pd.to_datetime (see the pandas docs):
aapl_data = pdr.DataReader("SPY",'iex',start_date,end_date, pause = 5)
aapl_data.index = pd.to_datetime(aapl_data.index) # converts index to datetime dtype
close = aapl_data
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
close = close.reindex(all_weekdays)
print(aapl_data.index.dtype)
# Output:
datetime64[ns]
print(all_weekdays.dtype)
# Output:
datetime64[ns]
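The reason for the all-NaN result is that reindex aligns by label equality: the IEX index holds date strings such as '2017-01-03', which never compare equal to the Timestamp labels in all_weekdays, so every row is treated as missing. A minimal sketch of the effect with made-up data:
import pandas as pd

# String-labelled index, as returned by the IEX reader
s = pd.Series([1.0, 2.0], index=['2017-01-03', '2017-01-04'])

# Timestamp labels: nothing matches, so everything becomes NaN
print(s.reindex(pd.date_range('2017-01-03', periods=2, freq='B')))

# After converting the index, the labels align again
s.index = pd.to_datetime(s.index)
print(s.reindex(pd.date_range('2017-01-03', periods=2, freq='B')))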
