Using Pandas df.boxplot() in subplots - python-3.x

I am trying to create subplots of a column in pandas dataframe grouped by each of the other columns. Here I create and iterate through subplots and attempt to add a boxplot to each one.
fig, axes = plt.subplots(nrows=2, ncols=2) # create 2x2 array of subplots
axes[0,0] = df.boxplot(column='price') # add boxplot to 1st subplot
axes[0,1] = df.boxplot(column='price', by='bedrooms') # add boxplot to 2nd subplot
# etc.
plt.show()
This results in: [figure]
As you can see, the boxplots are not being added to the subplots. I am not sure where I am going wrong. All the documentation I have found says that [0,0] is the upper left corner, and the boxplots themselves work.
I need to use df.boxplot() specifically.

You should pass the target axes to the plotting function through its ax argument:
fig, axes = plt.subplots(nrows=2, ncols=2) # create 2x2 array of subplots
df.boxplot(column='price', ax=axes[0,0]) # add boxplot to 1st subplot
df.boxplot(column='price', by='bedrooms', ax=axes[0,1]) # add boxplot to 2nd subplot
# etc.
plt.show()
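A minimal, self-contained sketch of the same idea, assuming a made-up DataFrame with price and bedrooms columns (the data is hypothetical, not the asker's). Assigning the return value of df.boxplot() to axes[0,0] only overwrites an element of the axes array; it does not move the drawing. Passing ax= is what tells pandas which subplot to draw into:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real df is not shown in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.normal(300_000, 50_000, size=200),
    "bedrooms": rng.integers(1, 5, size=200),
})

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

# Each call draws into the subplot handed over via ax=.
df.boxplot(column="price", ax=axes[0, 0])
df.boxplot(column="price", by="bedrooms", ax=axes[0, 1])

# by= sets an automatic figure-wide title; clear it if it is not wanted.
fig.suptitle("")
fig.tight_layout()
# plt.show()  # uncomment to display in an interactive session
```

The two remaining subplots stay empty until something is drawn into axes[1,0] and axes[1,1] the same way.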

Related

How to align heights and widths subplot axes with gridspec and matplotlib?

I am trying to use matplotlib with gridspec to create a subplot such that the axes are arranged to look similar to the figure below; the figure was taken from this unrelated question.
My attempt at recreating this axes arrangement is below. Specifically, my problem is that the axes are not properly aligned. For example, the axis object for the blue histogram is taller than the axis object for the image with various shades of green; the orange histogram seems to properly align in terms of width, but I attribute this to luck. How can I properly align these axes? Unlike the original figure, I would like to add/pad extra empty space between axes such that their borders do not intersect; the slice notation in the code below does this by adding a blank row/column. (In the interest of not making this post longer than it has to be, I did not make the figures "pretty" by playing with axis ticks and the like.)
Unlike the original picture, the axes are not perfectly aligned. Is there a way to do this without using constrained layout? By this, I mean some derivative of fig, ax = plt.subplots(constrained_layout=True)?
The MWE code to recreate my figure is below; note that there was no difference between ax.imshow(...) and ax.matshow(...).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
## initialize figure and axes
fig = plt.figure()
gs = fig.add_gridspec(6, 6, hspace=0.2, wspace=0.2)
ax_bottom = fig.add_subplot(gs[4:, 2:])
ax_left = fig.add_subplot(gs[:4, :2])
ax_big = fig.add_subplot(gs[:4, 2:])
## generate data
x = np.random.normal(loc=50, scale=10, size=100)
y = np.random.normal(loc=500, scale=50, size=100)
## get singular histograms
x_counts, x_edges = np.histogram(x, bins=np.arange(0, 101, 5))
y_counts, y_edges = np.histogram(y, bins=np.arange(0, 1001, 25))
x_mids = (x_edges[1:] + x_edges[:-1]) / 2
y_mids = (y_edges[1:] + y_edges[:-1]) / 2
## get meshed histogram
sample = np.array([x, y]).T
xy_counts, xy_edges = np.histogramdd(sample, bins=(x_edges, y_edges))
## subplot histogram of x
ax_bottom.bar(x_mids, x_counts,
              width=np.diff(x_edges),
              color='darkorange')
ax_bottom.set_xlim([x_edges[0], x_edges[-1]])
ax_bottom.set_ylim([0, np.max(x_counts)])
## subplot histogram of y
ax_left.bar(y_mids, y_counts,
            width=np.diff(y_edges),
            color='steelblue')
ax_left.set_xlim([y_edges[0], y_edges[-1]])
ax_left.set_ylim([0, np.max(y_counts)])
## subplot histogram of xy-mesh
ax_big.imshow(xy_counts,
              cmap='Greens',
              norm=Normalize(vmin=np.min(xy_counts), vmax=np.max(xy_counts)),
              interpolation='nearest',
              origin='upper')
plt.show()
plt.close(fig)
EDIT:
One can initialize the axes by explicitly setting width_ratios and height_ratios per row/column; this is shown below. This doesn't affect the output, but maybe I'm using it incorrectly?
from matplotlib import gridspec
## initialize figure and axes
fig = plt.figure()
gs = gridspec.GridSpec(ncols=6, nrows=6, figure=fig, width_ratios=[1]*6, height_ratios=[1]*6)
ax_bottom = fig.add_subplot(gs[4:, 2:])
ax_left = fig.add_subplot(gs[:4, :2])
ax_big = fig.add_subplot(gs[:4, 2:])
The problem is with imshow, which resizes the axes automatically to maintain a square pixel aspect.
You can prevent this by calling:
ax_big.imshow(..., aspect='auto')
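As a rough check of this, the sketch below rebuilds the question's 6x6 gridspec, draws a random stand-in array with aspect='auto', and compares the final axes positions. The random counts array is an assumption standing in for xy_counts:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(20, 40))  # stand-in for xy_counts

fig = plt.figure()
gs = fig.add_gridspec(6, 6, hspace=0.2, wspace=0.2)
ax_bottom = fig.add_subplot(gs[4:, 2:])
ax_left = fig.add_subplot(gs[:4, :2])
ax_big = fig.add_subplot(gs[:4, 2:])

# aspect='auto' lets the image stretch to fill its axes instead of
# shrinking the axes to keep the pixels square.
ax_big.imshow(counts, cmap="Greens", interpolation="nearest",
              origin="upper", aspect="auto")

fig.canvas.draw()  # force a draw so the reported positions are final
big, left, bottom = (a.get_position() for a in (ax_big, ax_left, ax_bottom))
print("same height as left axes:", np.isclose(big.height, left.height))
print("same width as bottom axes:", np.isclose(big.width, bottom.width))
```

The trade-off is that the image pixels are no longer square; they are stretched to whatever shape the gridspec cell has.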

Using "hue" for a Seaborn visual: how to get legend in one graph?

I created a scatter plot in seaborn using seaborn.relplot, but am having trouble getting the plot and its legend into a single graph.
When I do it the simple way, everything works fine:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
df2 = df[df.ln_amt_000s < 700]
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', data=df2)
The result is a scatter plot as desired, with the legend on the right hand side.
However, when I try to generate a matplotlib figure and axes objects ahead of time to specify the figure dimensions I run into problems:
a4_dims = (10, 10) # generating a matplotlib figure and axes objects ahead of time to specify figure dimensions
df2 = df[df.ln_amt_000s < 700]
fig, ax = plt.subplots(figsize = a4_dims)
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is two graphs -- one that has the scatter plots as expected but missing the legend, and another one below it that is all blank except for the legend on the right hand side.
How do I fix this? My desired result is one graph where I can specify the figure dimensions and have the legend at the bottom in two rows, below the x-axis (if that is too difficult, or not supported, then the default legend position to the right on the same graph would work too). I know the problem lies with "ax=ax", and in the way I am specifying the dimensions as a matplotlib figure, but I'd like to know specifically why this causes a problem so I can learn from this.
Thank you for your time.
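The issue is that sns.relplot is a "Figure-level interface for drawing relational plots onto a FacetGrid" (see the API page). It therefore creates its own figure rather than drawing only into the axes you pass, which is why a second figure appears. With a simple sns.scatterplot (the default type of plot used by sns.relplot), which is an axes-level function, your code works (changed to use reproducible data):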
df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv", index_col=0)
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
                hue = 'Species', legend = 'brief',
                ax=ax, data = df)
plt.show()
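If you would rather keep sns.relplot, one option is to size the figure through its own height and aspect parameters instead of a pre-made axes. This is only a sketch on made-up data (the column names x, y and group are hypothetical); note that height and aspect describe each facet, so the final figure can end up somewhat wider once the legend is placed beside it:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data standing in for the asker's DataFrame.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "group": np.repeat(["a", "b", "c"], 20),
})

# relplot builds and returns a FacetGrid that owns its own figure;
# height is in inches per facet and aspect is width / height.
g = sns.relplot(data=demo, x="x", y="y", hue="group", height=5, aspect=1)

# The single underlying matplotlib axes is still reachable for tweaks.
g.ax.set_title("relplot sized with height/aspect")
```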
Further edits to legend
Seaborn's legends are a bit finicky. Some tweaks you may want to employ:
Remove the default seaborn title, which is actually a legend entry, by getting and slicing the handles and labels
Set a new title that is actually a title
Move the location and make use of bbox_to_anchor to move outside the plot area (note that the bbox parameters need some tweaking depending on your plot size)
Specify the number of columns
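fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
                hue = 'Species', legend = 'brief',
                ax=ax, data = df)
handles, labels = ax.get_legend_handles_labels()
# [1:] drops the first entry, which is the hue name in seaborn versions
# that add it as a legend entry; check labels before slicing in other versions
# loc=8 means 'lower center'; bbox_to_anchor pushes the legend below the axes
ax.legend(handles=handles[1:], labels=labels[1:], loc=8,
          ncol=2, bbox_to_anchor=[0.5,-.3,0,0])
plt.show()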

matplotlib.pyplot: create a subplot of stored plots

Python 3.6 on Mac
matplotlib 2.1.0
using matplotlib.pyplot (as plt)
Let's say I have a few figures created with plt.figure() that I appended as objects to a list called figures. When I type figures[0] at the command line, it produces the plot for index 0 of the list figures.
However, how can I arrange all the plots in figures as subplots of a single figure?
# Pseudo code:
plt.figure()
for i, fig in enumerate(figures): # figures contains the plots
    plt.subplot(2, 2, i+1)
    fig # location i+1 of the subplot is filled with the fig plot element
So as a result, I would get a 2 by 2 grid that contains each plot found in figures.
I hope this makes sense.
A figure is a figure. You cannot have a figure inside a figure. The usual approach is to create a figure, create one or several subplots, plot something in the subplots.
In case it may happen that you want to plot something in different axes or figures, it might make sense to wrap the plotting in a function which takes the axes as argument.
You could then use this function to plot to an axes of a new figure or to plot to an axes of a figure with many subplots.
import numpy as np
import matplotlib.pyplot as plt
def myplot(ax, data_x, data_y, color="C0"):
    # draw onto whichever axes is passed in; the label gives legend() an entry
    ax.plot(data_x, data_y, color=color, label=color)
    ax.legend()
x = np.linspace(0,10)
y = np.cumsum(np.random.randn(len(x),4), axis=0)
# create 4 figures
for i in range(4):
    fig, ax = plt.subplots()
    myplot(ax, x, y[:,i], color="C{}".format(i))
# create another figure with each plot as subplot
fig, ax = plt.subplots(2,2)
for i in range(4):
    myplot(ax.flatten()[i], x, y[:,i], color="C{}".format(i))
plt.show()

How to prevent from drawing overlapping axis ticks when adding a line to scatter plot?

fig, ax = plt.subplots()
ax = fig.add_subplot(111)
ax.scatter(X[1],y)
y_projection = X.dot(theta_after)
ax.plot(X[1], y_projection)
plt.show()
Above is my code. What I'm trying to do is basically fit a line to the data. I use gradient descent to find a suitable theta.
The problem I came across is that the code above creates two sets of x- and y-axes that overlap each other.
X is a 97*2 matrix in which the first column is all 1.
You are creating an extra Axes with your second line. Just remove the following line:
ax = fig.add_subplot(111)
You already have an Axes when you run fig, ax = plt.subplots()
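A self-contained sketch of the corrected version, under a few loudly stated assumptions: the data is synthetic, X is a NumPy array indexed as X[:, 1] (the question's X[1] suggests a DataFrame column), and np.linalg.lstsq stands in for the gradient-descent result theta_after:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the asker's data: 97 rows, first column all ones.
rng = np.random.default_rng(0)
x = rng.uniform(5, 25, size=97)
X = np.column_stack([np.ones_like(x), x])
y = 1.2 * x - 4 + rng.normal(scale=2, size=97)

# Assumption: least squares replaces the gradient-descent fit here.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

fig, ax = plt.subplots()  # creates the figure and its single Axes
ax.scatter(X[:, 1], y)

# Sort by x so the fitted line is drawn left to right.
order = np.argsort(X[:, 1])
ax.plot(X[order, 1], (X @ theta)[order], color="C1")

print(len(fig.axes))  # number of Axes in the figure
# plt.show()  # uncomment to display in an interactive session
```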

matplotlib dynamic number of subplot

I am trying to create subplots using matplotlib, with the number of subplots calculated at runtime (as pnum varies in the example below).
pnum = len(args.m)
f, (ax1, ax2) = plt.subplots(pnum, sharex=True, sharey=True)
ax1.plot(x,ptp, "#757578",label="Total")
ax2.fill_between(x,dxyp,facecolor="C0", label="$d_{xy}$")
This example, obviously, only works when pnum=2. So, I need to do something else.
I have checked the accepted answer of this question, but it plots the same thing in all the subplots.
To create a dynamic number of subplots, you can avoid unpacking the axes individually and keep them as an axes array instead:
pnum = len(args.m)
fig, ax_arr = plt.subplots(pnum, sharex=True, sharey=True)
ax_arr is then a numpy array of axes (as long as pnum > 1).
You can then do something for each axes in a loop:
for ax in ax_arr.flatten():
    ax.plot(...)
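One caveat: with the default squeeze=True, plt.subplots returns a single Axes object rather than an array when only one subplot is requested, so .flatten() would fail for pnum == 1. A small sketch that handles any count by passing squeeze=False (the helper name plot_series and the sample data are made up for illustration):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_series(series):
    """Draw each 1-D array in `series` on its own stacked subplot."""
    pnum = len(series)
    # squeeze=False always returns a 2-D array of Axes, even for pnum == 1.
    fig, ax_arr = plt.subplots(pnum, sharex=True, sharey=True, squeeze=False)
    # Pair every axes with its own data so each subplot shows something different.
    for ax, values in zip(ax_arr.flatten(), series):
        ax.plot(values)
    return fig, ax_arr

fig1, axes1 = plot_series([np.arange(5)])
fig3, axes3 = plot_series([np.arange(5), np.arange(5) ** 2, np.ones(5)])
print(axes1.shape, axes3.shape)
```

Zipping the axes with the data is also what avoids plotting the same thing in every subplot.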
