Why can't seaborn.pairplot finish drawing this plot?

I have a dataframe central, built by the code below.
I want to plot the pairwise relationships between its columns with sns.pairplot(central), but the process just runs forever. Could you please explain why? I tried on both my laptop and Colab, and the problem persists.
import urllib3
%matplotlib inline
%config InlineBackend.figure_format = 'svg' # Change the image format to svg for better quality
import networkx as nx
import pandas as pd
import seaborn as sns
from scipy import stats
import matplotlib.pyplot as plt
## Import dataset
http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET', url)
open('airports.net', 'wb').write(f.data)
G = nx.read_pajek('airports.net', encoding = 'UTF-8')
G = nx.DiGraph(G)
## Compute measures of centrality
degree_central = nx.degree_centrality(G)
closeness_central = nx.closeness_centrality(G)
eigen_central = nx.eigenvector_centrality_numpy(G, max_iter = 200)
katz_central = nx.katz_centrality_numpy(G)
between_central = nx.betweenness_centrality(G)
pagerank = nx.pagerank(G)  # pagerank_numpy was removed in NetworkX 3.0
[hub, authority] = nx.hits(G)
## Create a dataframe from the centralities calculated above
central = pd.DataFrame([degree_central, closeness_central, eigen_central, katz_central, between_central, hub, authority]).T
central.columns = ['degree', 'closeness', 'eigen', 'katz', 'between', 'hub', 'authority']
central
## Plot the pairwise relationships between centralities
sns.pairplot(central)

For reasons unknown to me, the histplot for the eigen column has a problem determining a reasonable number of bins. The pairplot works with KDE plots on the diagonal, sns.pairplot(central, diag_kind="kde"), while sns.histplot on the eigen column alone also fails to finish. You can work around this problem by setting the number of bins explicitly:
sns.pairplot(central, diag_kws = {"bins": 10})
I will upvote any answer that can provide a reason why seaborn has problems defining the bins. This problem is seaborn-specific as plt.hist(central.eigen) works as expected but not sns.histplot(central.eigen).
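A likely explanation, offered as an assumption rather than a confirmed diagnosis: sns.histplot defaults to bins="auto", which it passes to numpy.histogram_bin_edges, whereas plt.hist defaults to 10 bins. Reference rules such as Freedman-Diaconis derive the bin width from the interquartile range, so a column whose values mostly sit in a very narrow band but span a much wider range overall can produce an enormous number of bins. The sketch below uses made-up data (not the airports dataset) to show the effect with NumPy alone.

```python
import numpy as np

# Made-up data: 200 values packed into [0, 1e-4] plus a single outlier at 1.0.
# It only mimics a column with a tiny interquartile range and a wide overall range.
data = np.concatenate([np.linspace(0.0, 1e-4, 200), [1.0]])

# Freedman-Diaconis rule: bin width = 2 * IQR / n**(1/3), so a tiny IQR means tiny bins.
fd_edges = np.histogram_bin_edges(data, bins="fd")

# A fixed bin count, which is what plt.hist uses by default.
fixed_edges = np.histogram_bin_edges(data, bins=10)

print(len(fd_edges) - 1)     # number of bins chosen by the reference rule (very large here)
print(len(fixed_edges) - 1)  # 10
```

If the eigen column behaves like this, drawing one bar per bin would explain why the plot never seems to finish, and why diag_kws = {"bins": 10} avoids it.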

Related

Get Seaborn legend location

I want to add comments under my legend. Here is a sample code doing what I want:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)
df1 = pd.DataFrame(np.random.normal(size=100))
df2 = pd.DataFrame(np.random.uniform(size=100))
fig,ax=plt.subplots()
sns.distplot(df1,ax=ax,label='foo')
sns.distplot(df2,ax=ax,label='bar')
hardlocy = 0.92
xmargin=0.02
xmin,xmax = ax.get_xlim()
xtxt=xmax-(xmax-xmin)*xmargin
leg = ax.legend()
plt.text(xtxt,hardlocy,"Comment",
         horizontalalignment='right'
         );
Result is:
As you can see, I rely on manual position setting, at least for y-axis. I would like to do it automatically.
As per this thread and this one, I tried to access the legend's characteristics through p = leg.get_window_extent(), but I obtained the following error message:
AttributeError: 'NoneType' object has no attribute 'points_to_pixels'
(which is very similar to this closed issue)
I run macOS Catalina version 10.15.4, and I ran a successful conda update --all a few minutes ago, but it changed nothing.
How can I automatically place my comments?
Thanks to @JohanC, from this question:
The figure has to be drawn before the legend's position can be worked out. Working code could therefore be:
np.random.seed(0)
df1 = pd.DataFrame(np.random.normal(size=100))
df2 = pd.DataFrame(np.random.uniform(size=100))
fig,ax=plt.subplots()
sns.distplot(df1,ax=ax,label='foo')
sns.distplot(df2,ax=ax,label='bar')
ymargin=0.05
leg = ax.legend()
fig.canvas.draw()
bbox = leg.get_window_extent()
inv = ax.transData.inverted()
(xloc,yloc)=inv.transform((bbox.x1,bbox.y0))
ymin,ymax = ax.get_ylim()
yloc_margin=yloc-(ymax-ymin)*ymargin
ax.text(xloc,yloc_margin,"Comment",horizontalalignment='right')
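Since sns.distplot is deprecated in recent seaborn releases, here is a minimal sketch of the same idea using plain Matplotlib histograms. The data, the Agg backend, and the 5% margin are assumptions chosen for illustration; the key steps (draw the canvas, read the legend's window extent, convert to data coordinates) are the ones from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, ax = plt.subplots()
ax.hist(rng.normal(size=100), alpha=0.5, label="foo")
ax.hist(rng.uniform(size=100), alpha=0.5, label="bar")
leg = ax.legend(loc="upper right")

# The legend has no final size or position until the figure has been rendered.
fig.canvas.draw()
bbox = leg.get_window_extent()  # legend bounding box in display (pixel) coordinates

# Convert the lower-right corner of the legend from pixels to data coordinates.
xloc, yloc = ax.transData.inverted().transform((bbox.x1, bbox.y0))

# Place the comment a small margin below the legend, right-aligned with it.
ymin, ymax = ax.get_ylim()
comment = ax.text(xloc, yloc - 0.05 * (ymax - ymin), "Comment",
                  horizontalalignment="right", verticalalignment="top")
```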

'PathCollection' object has no attribute 'legend_elements'

I know this exact question has been asked here; however, the current solution does nothing for me. I can't seem to generate a legend that has a different color for each label. I have tried following the current Matplotlib documentation, to no avail. I keep getting the error that my PathCollection object has no attribute legend_elements.
EDIT: Also, I want my legend to show just the unique years in the plot, not every data point as it does right now.
Here's what I have:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from matplotlib.pyplot import legend
import os
%config InlineBackend.figure_format = 'retina'
path = None
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        path = os.path.join(dirname, filename)
# Indexes to be removed
early_demo_dividend = 13
high_income = 24
lower_middle_income = 40
north_america = 46
members = 50
post_demo = 56
_removals = [early_demo_dividend, high_income, lower_middle_income, north_america, members, post_demo]
#Read in data
df = pd.read_csv(path)
#Get the rows we want
df = df.loc[df['1960'] > 1]
df = df.drop(columns=["Code", "Type", "Indicator Name"])
#Remove the odd rows
for i in _removals:
    df = df.drop(df.index[i])
#Format the dataframe
df = df.melt('Name', var_name='Year', value_name='Budget')
#Plot setup
plt.figure().set_size_inches(16,6)
plt.xticks(rotation=90)
plt.grid(True)
#Plot labels
plt.title('Military Spending of Countries')
plt.xlabel('Countries')
plt.ylabel('Budget in Billions')
#Plot data
new_year = df['Year'].astype(int)
scatter = plt.scatter(df['Name'], df['Budget'], c=(new_year / 10000) , label=new_year)
#Legend setup produce a legend with the unique colors from the scatter
legend1 = plt.legend(*scatter.legend_elements(),
                     loc="lower left", title="Years")
plt.gca().add_artist(legend1)  # add_artist is an Axes method; pyplot has no add_artist
plt.show()
Here's my plot
I also encountered this problem. legend_elements was added in Matplotlib 3.1, so older versions raise this error.
Try upgrading Matplotlib with pip3 install --upgrade matplotlib:
Uninstalling matplotlib-3.0.3:
Successfully uninstalled matplotlib-3.0.3
Successfully installed matplotlib-3.1.2
It works for me.
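With Matplotlib 3.1 or later, a legend showing one entry per unique year can be sketched as below. The country names, years, and budgets are made-up stand-ins for the Kaggle data, and the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: three countries observed in three years.
names = np.array(["A", "B", "C"] * 3)
years = np.repeat([1960, 1961, 1962], 3)
budget = np.arange(9, dtype=float)

fig, ax = plt.subplots()
# Color each point by its year; do not pass label=..., or every point gets mapped to the legend.
scatter = ax.scatter(names, budget, c=years)

# legend_elements() returns one handle and one label per distinct color value
# when there are only a few unique values.
handles, labels = scatter.legend_elements()
legend1 = ax.legend(handles, labels, loc="lower left", title="Years")
```

Calling ax.legend already attaches the legend to the axes; ax.add_artist(legend1) is only needed when a second legend is added afterwards.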
Although my answer may not be relevant to the current question, I decided to leave it to describe my case, as it might be useful to someone else:
If you misspell the name of a keyword argument when calling matplotlib functions such as scatter or plot, you can get a similar error.
Example:
x = list(range(10))
y = list(range(10))
plt.scatter(x, y, labels='RESULT')
I get the error:
AttributeError: 'PathCollection' object has no property 'labels'
As the error message says (though it is not obvious to an inattentive developer :) ),
the problem is that I used labels instead of label.

Import PDF Image From MatPlotLib to ReportLab

I am trying to insert a saved PDF image into a ReportLab flowable.
I have seen several answers to similar questions, and many involve using PyPDF2 like this:
import PyPDF2
import PIL
input1 = PyPDF2.PdfFileReader(open(path+"image.pdf", "rb"))
page0 = input1.getPage(0)
xObject = page0['/Resources']['/XObject'].getObject()
for obj in xObject:
    pass  # Do something here
The trouble I'm having is with a sample image I've saved from MatPlotLib as a PDF. When I try to access that saved image with the code above, it returns nothing under page0['/Resources']['/XObject'].
In fact, here's what I see when I look at page0 and /XObject:
'/XObject': {}
Here's the code I used to generate the PDF:
import matplotlib.pyplot as plt
import numpy as np
# Fixing random state for reproducibility
np.random.seed(19680801)
plt.rcdefaults()
fig, ax = plt.subplots()
# Example data
people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
y_pos = np.arange(len(people))
performance = 3 + 10 * np.random.rand(len(people))
error = np.random.rand(len(people))
ax.barh(y_pos, performance, xerr=error, align='center',
        color='green', ecolor='black')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.invert_yaxis() # labels read top-to-bottom
ax.set_xlabel('Performance')
ax.set_title('How fast do you want to go today?')
plt.savefig(path+'image.pdf',bbox_inches='tight')
Thanks in advance!

Timeserie datetick problems when using pandas.DataFrame.plot method

I just discovered something really strange when using plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
fig, axe = plt.subplots()
x.plot(ax=axe)
plt.show()
xt = axe.get_xticks()
When I try to format my xticklabels I get strange behaviours, so I inspected the objects to understand, and I found the following:
t[-1] - t[0] = Timedelta('7 days 00:00:00'), confirming the DateTimeIndex is what I expect;
xt = [175320, 175488], so the xticks are integers, but they are not equal to a number of days since the epoch (I have no idea what they are);
xt[-1] - xt[0] = 168, so they look more like an index; this matches len(x) = 169.
This explains why I cannot succeed in formatting my axis using:
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0,6,12,18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
The first raises an error saying that there are too many ticks to generate.
The second shows that my first tick is Fri 00:00 but it should be Mon 00:00 (in fact matplotlib assumes the first tick to be 0481-01-03 00:00; this is where my bug is).
It looks like there is some incompatibility between the pandas and matplotlib integer-to-date conversions, but I cannot find out how to fix this issue.
If I run instead:
fig, axe = plt.subplots()
axe.plot(x)
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
plt.show()
xt = axe.get_xticks()
Everything works as expected, but I lose all the nice features of the pandas.DataFrame.plot method, such as curve labeling. And here xt = [726468. 726475.].
How can I properly format my ticks using the pandas.DataFrame.plot method instead of axe.plot while avoiding this issue?
Update
The problem seems to be about origin and scale (units) of underlying numbers for date representation. Anyway I cannot control it, even by forcing it to the correct type:
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H', origin='unix', units='D')
There is a discrepancy between the matplotlib and pandas representations, and I could not find any documentation of this problem.
Is this what you are going for? Note I shortened the date_range to make it easier to see the labels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
t = pd.date_range('1990-01-01', '1990-01-04', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
# resample the df to get the index at 6-hour intervals
l = x.resample('6H').first().index
# set the ticks when you plot. this appears to position them, but not set the label
ax = x.plot(xticks=l)
# set the display value of the tick labels
ax.set_xticklabels(l.strftime("%a %H:%M"))
# hide the labels from the initial pandas plot
ax.set_xticklabels([], minor=True)
# make pretty
ax.get_figure().autofmt_xdate()
plt.show()
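A note on the numbers in the question: 175320 equals 7305 days × 24 hours, which is the number of hours between 1970-01-01 and 1990-01-01, so the ticks are consistent with pandas using its own hourly period-based axis units. 726468 is the proleptic ordinal of 1990-01-01, which matches the day-based units older Matplotlib versions used. As an alternative to setting the ticks by hand, pandas documents an x_compat=True option for DataFrame.plot that suppresses its tick-resolution adjustment, so the matplotlib.dates locators and formatters apply. The following is a sketch under that assumption; the Agg backend and the choice of DayLocator are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# One week of hourly samples (169 points), as in the question.
t = pd.date_range("1990-01-01", periods=169, freq=pd.Timedelta(hours=1))
x = pd.DataFrame(np.random.rand(len(t)), index=t)

fig, ax = plt.subplots()
# x_compat=True asks pandas to leave the x axis in Matplotlib's date units.
x.plot(ax=ax, x_compat=True)

# Standard matplotlib.dates locators and formatters can now be used.
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))

fig.canvas.draw()  # render so that the tick labels are populated
labels = [tick.get_text() for tick in ax.get_xticklabels()]

# The middle of the axis converts back to a date in January 1990.
axis_center = mdates.num2date(sum(ax.get_xlim()) / 2)
```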

Matplotlib.animation.FuncAnimation using pcolormesh

Python 3.5, windows 10 Pro.
I'm trying to continuously plot an 8x8 array of pixels (for the sake of the question I'll just use random data, but in the real thing I'm reading from a serial port).
I can do it using a while loop, but I need to switch over to matplotlib.animation.FuncAnimation and I can't get it to work. I've tried looking at the help files and tried to follow examples from matplotlib.org here, but I've not been able to follow it.
Can someone help me figure out how to continuously plot an 8x8 array of pixels using FuncAnimation and pcolormesh? Here is what I've got so far:
import scipy as sp
import matplotlib.pyplot as plt
from matplotlib import animation
plt.close('all')
y = sp.rand(64).reshape([8,8])
def do_something():
    y = sp.rand(64).reshape([8,8])
    fig_plot.set_data(y)
    return fig_plot,
fig1 = plt.figure(1,facecolor = 'w')
plt.clf()
fig_plot = plt.pcolormesh(y)
fig_ani = animation.FuncAnimation(fig1,do_something)
plt.show()
If you want to see the while loop code, just so you know exactly what I'm trying to reproduce, see below.
import scipy as sp
import matplotlib.pyplot as plt
plt.figure(1)
plt.clf()
while True:
    y = sp.rand(64).reshape([8,8])
    plt.pcolormesh(y)
    plt.show()
    plt.pause(.000001)
I was able to find a solution using imshow instead of pcolormesh. In case anyone else is struggling with the same issues I had, I've posted the working code below.
import scipy as sp
import matplotlib.pyplot as plt
import matplotlib.animation as animation
Hz = sp.rand(64).reshape([8,8]) # initialize with random data
fig = plt.figure(1,facecolor='w')
ax = plt.axes()
im = ax.imshow(Hz)
im.set_data(sp.zeros(Hz.shape))
def update_data(n):
    Hz = sp.rand(64).reshape([8,8]) # More random data
    im.set_data(Hz)
    return
ani = animation.FuncAnimation(fig, update_data, interval = 10, blit = False, repeat = False)
fig.show()
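For readers who want to keep pcolormesh, here is a minimal sketch of one way it could work. It rests on two observations about the question's code: the object returned by pcolormesh is updated with set_array rather than set_data, and the FuncAnimation callback receives a frame argument. The sketch uses NumPy's random generator directly instead of the deprecated sp.rand alias; the frame count, interval, and Agg backend are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to watch the animation in a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import animation

rng = np.random.default_rng()

fig, ax = plt.subplots()
# Fix vmin/vmax so the color scale stays the same from frame to frame.
mesh = ax.pcolormesh(rng.random((8, 8)), vmin=0.0, vmax=1.0)

def update(frame):
    # Stand-in for reading a new 8x8 frame from the serial port.
    data = rng.random((8, 8))
    # The mesh is updated in place with a flattened array of its 64 cell values.
    mesh.set_array(data.ravel())
    return (mesh,)

ani = animation.FuncAnimation(fig, update, frames=5, interval=100, blit=False)
```

Keep a reference to the animation object (ani here) for as long as the figure is displayed; otherwise it may be garbage-collected and the animation stops.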
