'PathCollection' object has no attribute 'legend_elements' - python-3.x

I know this exact question has been asked here; however, the existing solution does nothing for me. I can't generate a legend that has a different color for each label. I have tried following the current Matplotlib documentation, to no avail. I keep getting the error that my PathCollection object has no attribute legend_elements.
EDIT: Also, I want my legend to show only the unique years in the plot, not what it does right now, which is map every data point to its own legend entry.
Here's what I have:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from matplotlib.pyplot import legend
import os
%config InlineBackend.figure_format = 'retina'
path = None
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        path = os.path.join(dirname, filename)
# Indexes to be removed
early_demo_dividend = 13
high_income = 24
lower_middle_income = 40
north_america = 46
members = 50
post_demo = 56
_removals = [early_demo_dividend, high_income, lower_middle_income, north_america, members, post_demo]
#Read in data
df = pd.read_csv(path)
#Get the rows we want
df = df.loc[df['1960'] > 1]
df = df.drop(columns=["Code", "Type", "Indicator Name"])
#Remove the odd rows
for i in _removals:
    df = df.drop(df.index[i])
#Format the dataframe
df = df.melt('Name', var_name='Year', value_name='Budget')
#Plot setup
plt.figure().set_size_inches(16,6)
plt.xticks(rotation=90)
plt.grid(True)
#Plot labels
plt.title('Military Spending of Countries')
plt.xlabel('Countries')
plt.ylabel('Budget in Billions')
#Plot data
new_year = df['Year'].astype(int)
scatter = plt.scatter(df['Name'], df['Budget'], c=(new_year / 10000) , label=new_year)
#Legend setup produce a legend with the unique colors from the scatter
legend1 = plt.legend(*scatter.legend_elements(),
                     loc="lower left", title="Years")
plt.gca().add_artist(legend1)  # pyplot itself has no add_artist; call it on the Axes
plt.show()
Here's my plot:
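A minimal sketch of one way to get one legend entry per unique year, assuming the melted dataframe has Name, Year and Budget columns as in the code above (the tiny dataframe below is made-up stand-in data). It calls scatter once per year, so each year gets its own color and exactly one legend entry. Because it does not use legend_elements at all, it should also run on matplotlib versions older than 3.1:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the melted dataframe (columns Name, Year, Budget)
df = pd.DataFrame({
    "Name": ["A", "B", "C"] * 3,
    "Year": ["1960"] * 3 + ["1961"] * 3 + ["1962"] * 3,
    "Budget": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 2.0, 3.0, 4.0],
})

fig, ax = plt.subplots(figsize=(8, 4))
# One scatter call per unique year: each call gets the next color in the cycle
# and contributes a single legend entry labelled with that year
for year, group in df.groupby("Year"):
    ax.scatter(group["Name"], group["Budget"], label=year)

legend = ax.legend(loc="lower left", title="Years")
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```

groupby sorts its keys, so the legend entries come out in chronological order.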

I also encountered this problem.
Try upgrading matplotlib with pip3 install --upgrade matplotlib:
Uninstalling matplotlib-3.0.3:
Successfully uninstalled matplotlib-3.0.3
Successfully installed matplotlib-3.1.2
It works for me.
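For reference, PathCollection.legend_elements only exists in matplotlib 3.1 and later, which is why moving from 3.0.3 to 3.1.2 makes the AttributeError go away. A minimal sketch of the intended call on a recent version, with made-up data: it colors the points by the raw year rather than year / 10000, and passes fmt="{x:.0f}" (my choice, not from the question) so that the legend labels read as whole years:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PathCollection

# legend_elements was added in matplotlib 3.1; fail early with a clear message
if not hasattr(PathCollection, "legend_elements"):
    raise RuntimeError(f"matplotlib {matplotlib.__version__} is too old; upgrade to 3.1+")

# Made-up data: three countries observed in three years
names = np.array(["A", "B", "C"] * 3)
years = np.repeat([1960, 1961, 1962], 3)
budget = np.arange(9, dtype=float)

fig, ax = plt.subplots(figsize=(8, 4))
scatter = ax.scatter(names, budget, c=years)  # color by the year itself
# One handle per unique color value; fmt turns 1960.0 into "1960"
handles, labels = scatter.legend_elements(fmt="{x:.0f}")
ax.legend(handles, labels, loc="lower left", title="Years")
print(labels)
```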

Although my answer may not be relevant to the current question, I decided to leave it to describe my case, since it might be useful to someone else:
If you misspell the name of an additional keyword argument when calling matplotlib functions such as scatter or plot, you can get a similar error.
Example:
x = list(range(10))
y = list(range(10))
plt.scatter(x, y, labels='RESULT')
I get the error:
AttributeError: 'PathCollection' object has no property 'labels'
As the error message says (though it is not obvious to an inattentive developer :) ),
the problem is that I used labels instead of label.
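A minimal sketch reproducing the typo and its fix. The exact message differs between matplotlib versions (newer releases word it as an unexpected keyword argument), so the sketch catches both AttributeError and TypeError and only checks that the bad keyword is named:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = list(range(10))
y = list(range(10))

fig, ax = plt.subplots()
message = ""
try:
    ax.scatter(x, y, labels="RESULT")  # misspelled keyword: "labels"
except (AttributeError, TypeError) as err:
    message = str(err)
    print(message)

ax.scatter(x, y, label="RESULT")  # correct keyword: "label"
legend = ax.legend()
```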

Related

Why can't seaborn.pairplot finish drawing this plot?

I have a dataframe central
Then I want to plot the pairwise relationships between the columns with sns.pairplot(central). Could you please explain why the process just runs forever? I tried on both my laptop and Colab, but the problem persists.
import urllib3
%matplotlib inline
%config InlineBackend.figure_format = 'svg' # Change the image format to svg for better quality
import networkx as nx
import pandas as pd
import seaborn as sns
from scipy import stats
import matplotlib.pyplot as plt
## Import dataset
http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET', url)
with open('airports.net', 'wb') as out:  # close the file so read_pajek sees all bytes
    out.write(f.data)
G = nx.read_pajek('airports.net', encoding = 'UTF-8')
G = nx.DiGraph(G)
## Compute measures of centrality
degree_central = nx.degree_centrality(G)
closeness_central = nx.closeness_centrality(G)
eigen_central = nx.eigenvector_centrality_numpy(G, max_iter = 200)
katz_central = nx.katz_centrality_numpy(G)
between_central = nx.betweenness_centrality(G)
pagerank = nx.pagerank(G)  # nx.pagerank_numpy was removed in NetworkX 3.0
[hub, authority] = nx.hits(G)
## Create a dataframe using with above calculated centralities
central = pd.DataFrame([degree_central, closeness_central, eigen_central, katz_central, between_central, hub, authority]).T
central.columns = ['degree', 'closeness', 'eigen', 'katz', 'between', 'hub', 'authority']
central
## Plot the pairwise relationships between centralities
sns.pairplot(central)
For reasons unknown to me, the histplot for the eigen column has trouble determining a reasonable number of bins. The pairplot works with KDE plots on the diagonal (sns.pairplot(central, diag_kind="kde")), and a histplot of the eigen column on its own also does not work as expected. You can work around the problem by setting the number of bins explicitly:
sns.pairplot(central, diag_kws = {"bins": 10})
Output:
I will upvote any answer that can explain why seaborn has trouble defining the bins. The problem is seaborn-specific: plt.hist(central.eigen) works as expected, but sns.histplot(central.eigen) does not.
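One plausible explanation, which I have not verified against this dataset: as far as I know, seaborn's histplot defaults to bins="auto" and passes that choice to NumPy's histogram_bin_edges, while plt.hist defaults to 10 bins. For "auto", NumPy takes the smaller of the Freedman-Diaconis and Sturges bin widths, and Freedman-Diaconis scales with the interquartile range, so a column whose values are packed into a tiny interval but with a few large outliers can produce an enormous number of bins. The sketch below uses made-up data with that shape and needs only NumPy; running the same comparison on central['eigen'] would confirm or rule this out:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up centrality-like column: almost all values in a tiny interval,
# plus a few large outliers forming a long right tail
values = np.concatenate([rng.uniform(0.0, 1e-4, size=1000), [0.5, 0.8, 1.0]])

auto_edges = np.histogram_bin_edges(values, bins="auto")  # data-driven rule
fixed_edges = np.histogram_bin_edges(values, bins=10)     # plt.hist default

print("bins with 'auto':", len(auto_edges) - 1)
print("bins with 10:", len(fixed_edges) - 1)
```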

X and Y label being cut in matplotlib plots

I have this code:
import datetime
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as web
start = datetime.date(2016,1,1)
end = datetime.date.today()
stock = 'fb'
data = web.DataReader(stock, 'yahoo', start, end)
fig, ax = plt.subplots(dpi=720)
data['vol_pct'] = data['Volume'].pct_change()
data.plot(y='vol_pct', ax=ax, title='this is the title \n second line')
ax.set(xlabel="Date")
ax.legend(loc='upper center', bbox_to_anchor=(0.32, -0.22), shadow=True, ncol=2)
plt.savefig('Test')
This is an example from other code, but the problem is the same:
At the bottom of the plot you can see that the legend is cut off. In another plot from different code that I am working on, even the ylabel is cut off when I save the plot using plt.savefig('Test'). How can I fix this?
It's a long-standing issue with .savefig() that it doesn't check legend and axis locations before setting bounds. As a rule, I solve this with the bbox_inches argument:
plt.savefig('Test', bbox_inches='tight')
This is similar to calling plt.tight_layout(), but takes all of the relevant artists into account, whereas tight_layout will often pull some objects into frame while cutting off new ones.
I have to tell pyplot to keep it tight more than half the time, so I'm not sure why this isn't the default behavior.
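A small self-contained sketch of the effect, with made-up sine and cosine data instead of the stock series but the same legend placement as in the question. It saves the figure twice to in-memory PNGs and reads the pixel size from the PNG header: the default save keeps the 600 x 400 canvas, so the legend (which sits entirely below the canvas) is lost, while bbox_inches='tight' enlarges the saved area to include it:

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def png_size(buf):
    """Return (width, height) read from a PNG's IHDR chunk."""
    data = buf.getvalue()
    return int.from_bytes(data[16:20], "big"), int.from_bytes(data[20:24], "big")


fig, ax = plt.subplots(figsize=(6, 4), dpi=100)
x = np.arange(50)
ax.plot(x, np.sin(x / 5), label="sin")
ax.plot(x, np.cos(x / 5), label="cos")
ax.set(xlabel="Date", title="this is the title \n second line")
# Same placement as in the question: the legend hangs below the axes
ax.legend(loc="upper center", bbox_to_anchor=(0.32, -0.22), shadow=True, ncol=2)

fig.canvas.draw()
legend_box = ax.get_legend().get_window_extent(fig.canvas.get_renderer())

default_buf = io.BytesIO()
fig.savefig(default_buf, format="png", dpi=100)  # canvas is figsize * dpi
tight_buf = io.BytesIO()
fig.savefig(tight_buf, format="png", dpi=100, bbox_inches="tight")  # fit all artists

print("legend top (px):", legend_box.y1)  # negative: below the canvas
print("default:", png_size(default_buf), "tight:", png_size(tight_buf))
```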
plt.subplots_adjust(bottom=0.4 ......)
I think this modification will satisfy you.
Or maybe you can relocate the legend to loc="upper left"
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots_adjust.html
Please also check this issue, which was raised 8 years ago:
Moving matplotlib legend outside of the axis makes it cutoff by the figure box
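A minimal sketch of the subplots_adjust route, with made-up data and the same legend placement as the question. The bottom=0.4 value comes from the answer above; the other arguments it leaves out stay at their defaults here. Reserving the bottom 40% of the figure moves the axes up far enough that the legend lands inside the canvas, so even a plain savefig keeps it:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 4), dpi=100)
x = np.arange(50)
ax.plot(x, np.sin(x / 5), label="sin")
ax.plot(x, np.cos(x / 5), label="cos")
ax.set(xlabel="Date")
ax.legend(loc="upper center", bbox_to_anchor=(0.32, -0.22), shadow=True, ncol=2)

# Leave 40% of the figure height free below the axes for labels and legend
fig.subplots_adjust(bottom=0.4)

fig.canvas.draw()
legend_box = ax.get_legend().get_window_extent(fig.canvas.get_renderer())
print("legend bottom (px):", legend_box.y0)  # > 0 means it is inside the canvas
```

Compared with bbox_inches='tight', this keeps the saved image at exactly the figure size, at the cost of picking the margin by hand.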

Get Seaborn legend location

I want to add comments under my legend. Here is a sample code doing what I want:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)
df1 = pd.DataFrame(np.random.normal(size=100))
df2 = pd.DataFrame(np.random.uniform(size=100))
fig,ax=plt.subplots()
sns.distplot(df1,ax=ax,label='foo')
sns.distplot(df2,ax=ax,label='bar')
hardlocy = 0.92
xmargin=0.02
xmin,xmax = ax.get_xlim()
xtxt=xmax-(xmax-xmin)*xmargin
leg = ax.legend()
plt.text(xtxt,hardlocy,"Comment",
horizontalalignment='right'
);
Result is:
As you can see, I rely on manual position setting, at least for y-axis. I would like to do it automatically.
As per this thread and this one, I have tried to access the legend's characteristics through p = leg.get_window_extent(), but I obtained the following error message:
AttributeError: 'NoneType' object has no attribute 'points_to_pixels'
(which is very similar to this closed issue)
I run macOS Catalina version 10.15.4, and I ran a successful conda update --all a few minutes ago, without any change.
How can I automatically place my comments?
Thanks to @JohanC, from this question:
The figure must be drawn before its legend's position can be worked out. Therefore, working code could be:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)
df1 = pd.DataFrame(np.random.normal(size=100))
df2 = pd.DataFrame(np.random.uniform(size=100))
fig,ax=plt.subplots()
sns.distplot(df1,ax=ax,label='foo')
sns.distplot(df2,ax=ax,label='bar')
ymargin=0.05
leg = ax.legend()
fig.canvas.draw()
bbox = leg.get_window_extent()
inv = ax.transData.inverted()
(xloc,yloc)=inv.transform((bbox.x1,bbox.y0))
ymin,ymax = ax.get_ylim()
yloc_margin=yloc-(ymax-ymin)*ymargin
ax.text(xloc,yloc_margin,"Comment",horizontalalignment='right')
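The same idea can be sketched without seaborn (distplot is deprecated in recent seaborn releases), using plain matplotlib histograms of made-up data. One design choice differs from the answer: the legend's corner is converted to axes coordinates with ax.transAxes rather than to data coordinates, so the margin is a fraction of the axes height and does not depend on the y-limits. The legend location is pinned to "upper right" to keep the example deterministic:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, ax = plt.subplots()
ax.hist(rng.normal(size=100), density=True, alpha=0.5, label="foo")
ax.hist(rng.uniform(size=100), density=True, alpha=0.5, label="bar")
leg = ax.legend(loc="upper right")

fig.canvas.draw()  # the legend has no position until the figure is drawn
renderer = fig.canvas.get_renderer()
bbox = leg.get_window_extent(renderer)  # display (pixel) coordinates

# Convert the legend's lower-right corner from pixels to axes coordinates
x_ax, y_ax = ax.transAxes.inverted().transform((bbox.x1, bbox.y0))
margin = 0.02  # gap below the legend, as a fraction of the axes height
comment = ax.text(x_ax, y_ax - margin, "Comment",
                  transform=ax.transAxes, ha="right", va="top")
comment_box = comment.get_window_extent(renderer)
```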

Timeserie datetick problems when using pandas.DataFrame.plot method

I just discovered something really strange when using the plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
fig, axe = plt.subplots()
x.plot(ax=axe)
plt.show()
xt = axe.get_xticks()
When I try to format my xticklabels I get strange behaviour, so I inspected the objects to understand why and found the following:
t[-1] - t[0] = Timedelta('7 days 00:00:00'), confirming the DatetimeIndex is what I expect;
xt = [175320, 175488]: the xticks are integers, but they are not a number of days since the epoch (I have no idea what they are);
xt[-1] - xt[0] = 168, so they behave more like an index: 168 steps for len(x) = 169 points.
This explains why I cannot succeed in formatting my axis using:
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0,6,12,18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
The first raises an error saying there are too many ticks to generate;
The second shows that my first tick is Fri 00:00, but it should be Mon 00:00 (in fact matplotlib assumes the first tick is 0481-01-03 00:00; oops, this is where my bug is).
It looks like there is some incompatibility between the pandas and matplotlib integer-to-date conversions, but I cannot find out how to fix it.
If I run instead:
fig, axe = plt.subplots()
axe.plot(x)
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
plt.show()
xt = axe.get_xticks()
Everything works as expected, but I miss all the cool features of the pandas.DataFrame.plot method, such as curve labeling. And here xt = [726468. 726475.].
How can I properly format my ticks using the pandas.DataFrame.plot method instead of axe.plot, avoiding this issue?
Update
The problem seems to be about the origin and scale (units) of the underlying numbers used for date representation. However, I cannot control it, even by forcing it to the correct type:
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H', origin='unix', units='D')
There is a discrepancy between the matplotlib and pandas representations, and I could not find any documentation of this problem.
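The numbers in the question can be reproduced with a little arithmetic, which supports the origin/units explanation. As far as I can tell, for a regular hourly index DataFrame.plot puts pandas Period ordinals on the x-axis, and an hourly Period ordinal counts hours since 1970-01-01 00:00. Before matplotlib 3.3, matplotlib date numbers counted days with 0001-01-01 as day 1, which is exactly what datetime.date.toordinal() returns. Reading the first hourly ordinal as such a day number lands on a Friday in the year 481:

```python
import datetime

import pandas as pd

# Hourly Period ordinals count hours since 1970-01-01 00:00
start = pd.Period("1990-01-01 00:00", freq="h")
end = pd.Period("1990-01-08 00:00", freq="h")
print(start.ordinal, end.ordinal, end.ordinal - start.ordinal)

# Pre-3.3 matplotlib date numbers: days counted from 0001-01-01 = 1
print(datetime.date(1990, 1, 1).toordinal(), datetime.date(1990, 1, 8).toordinal())

# Misreading the first hourly ordinal as a day number
misread = datetime.date.fromordinal(start.ordinal)
print(misread, misread.weekday())  # weekday() == 4 means Friday
```

These match the xt values quoted in the question for both plotting routes.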
Is this what you are going for? Note I shortened the date_range to make it easier to see the labels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
t = pd.date_range('1990-01-01', '1990-01-04', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
# resample the df to get the index at 6-hour intervals
l = x.resample('6H').first().index
# set the ticks when you plot. this appears to position them, but not set the label
ax = x.plot(xticks=l)
# set the display value of the tick labels
ax.set_xticklabels(l.strftime("%a %H:%M"))
# hide the labels from the initial pandas plot
ax.set_xticklabels([], minor=True)
# make pretty
ax.get_figure().autofmt_xdate()
plt.show()
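Another option, not part of the answer above: DataFrame.plot accepts x_compat=True, which the pandas visualization guide describes as suppressing pandas' own time-series tick handling. A minimal sketch, assuming that with this flag the x-axis holds ordinary matplotlib date numbers, so the HourLocator and DateFormatter from the question can be applied directly:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

t = pd.date_range("1990-01-01", "1990-01-04", freq="h")
x = pd.DataFrame(np.random.rand(len(t)), index=t)

fig, axe = plt.subplots()
# x_compat=True: keep matplotlib date numbers instead of Period ordinals
x.plot(ax=axe, x_compat=True)
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0, 6, 12, 18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
fig.autofmt_xdate()
fig.canvas.draw()
print([mdates.num2date(loc).strftime("%a %H:%M") for loc in axe.get_xticks()[:3]])
```

The line still gets pandas' legend and labeling; only the tick machinery changes.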

Errorbar plot for pandas DataFrame shows weird items in the legend

I use the latest version of the Anaconda package for Python 3.
python: v3.4.3 (Anaconda 2.4.1)
pandas: v0.17.1
matplotlib: v1.5.0
I ran into trouble when I tried to plot data with error bars stored in a pandas.DataFrame using matplotlib. Although the data and error bars were plotted correctly, an additional weird item, named after the column holding the y-axis data, was added to the legend.
Here, I show a simple code demonstrating this weird behavior. Would you tell me how to remove this additional weird item in the legend?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create test data: here, y = 2x + e
x = np.linspace(0,1,20)
y = 2*x + np.random.normal(size=20)
yerr = np.zeros(20)
yerr[:] = 1
# put data into DataFrame
data = pd.DataFrame()
data["x"] = x
data["y"] = y
data["yerr"] = yerr
# plot test data
plt.errorbar(data["x"],data["y"],data["yerr"],
ls="None",marker="o",label="test")
plt.legend(frameon=False,
numpoints=1,
loc="upper left")
plt.xlim(-0.05,1.05)
plt.show()
This code provides following figure in my python environment. You can see that there is an additional item "y" in the legend, which I'd like to remove.
Output of the above sample code
I found a solution just after posting this question; a similar question had been asked for pandas.Series. It can be solved by specifying barsabove=True in pyplot.errorbar() as follows:
# plot test data
plt.errorbar(data["x"],data["y"],data["yerr"],
barsabove=True,
ls="None",marker="o",label="test")
This modification provides the following image.
Output of the modified code
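Another workaround, in case drawing the error bars above the markers (which is what barsabove=True does) is not wanted, is to hand errorbar plain NumPy arrays instead of DataFrame columns. This rests on an assumption I have not confirmed in the matplotlib 1.5 source: that the stray "y" entry comes from matplotlib picking up the Series' name as a default label for an internal line. A NumPy array has no name to pick up. On a current matplotlib the sketch shows only the "test" entry:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
data = pd.DataFrame({"x": x, "y": 2 * x + rng.normal(size=20), "yerr": np.ones(20)})

fig, ax = plt.subplots()
# .to_numpy() (or .values on old pandas) drops the Series name "y",
# so there is no name left that could be used as a default label
ax.errorbar(data["x"].to_numpy(), data["y"].to_numpy(),
            yerr=data["yerr"].to_numpy(), ls="None", marker="o", label="test")
legend = ax.legend(frameon=False, numpoints=1, loc="upper left")
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```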
