I have a data frame with 36 columns. I want to plot histograms for each feature in one go (6x6) using seaborn, basically reproducing df.hist() but with seaborn. My code below shows the plot for only the first feature, and all the others come out empty.
Test dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 36)), columns=range(0,36))
My code:
import matplotlib.pyplot as plt
import seaborn as sns
# plot
f, axes = plt.subplots(6, 6, figsize=(20, 20), sharex=True)
for feature in df.columns:
    sns.distplot(df[feature], color="skyblue", ax=axes[0, 0])

I guess it would make sense to loop over the axes and features simultaneously.
f, axes = plt.subplots(6, 6, figsize=(20, 20), sharex=True)
for ax, feature in zip(axes.flat, df.columns):
    sns.distplot(df[feature], color="skyblue", ax=ax)
NumPy arrays are flattened row-wise, i.e. you get the first 6 features (0 to 5) in the first row, features 6 to 11 in the second row, etc.
If this is not what you want, you can compute the index into the axes array manually:
f, axes = plt.subplots(6, 6, figsize=(20, 20), sharex=True)
for i, feature in enumerate(df.columns):
    sns.distplot(df[feature], color="skyblue", ax=axes[i%6, i//6])
For example, the code above fills the subplots column by column.
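A quick way to check the two fill orders without plotting anything: the sketch below uses a NumPy object array of "row,col" strings as a stand-in for the 6x6 axes array (an assumption for illustration; the real axes array is walked the same way) and records which cell each feature index lands in under each scheme.

```python
import numpy as np

nrows, ncols = 6, 6

# Stand-in for the 2-D axes array returned by plt.subplots(6, 6):
# cell (r, c) simply holds the label "r,c".
cells = np.empty((nrows, ncols), dtype=object)
for r in range(nrows):
    for c in range(ncols):
        cells[r, c] = f"{r},{c}"

# Row-wise: zip(axes.flat, df.columns) pairs feature i with the i-th cell of .flat
row_wise = dict(zip(range(nrows * ncols), cells.flat))

# Column-wise: feature i goes to axes[i % nrows, i // nrows]
col_wise = {i: cells[i % nrows, i // nrows] for i in range(nrows * ncols)}

print(row_wise[6])  # 1,0 -> feature 6 starts the second row
print(col_wise[6])  # 0,1 -> feature 6 starts the second column
```

For a non-square grid, the column-wise mapping should use nrows in both places (i % nrows, i // nrows); it coincides with i%6, i//6 here only because the grid is 6x6.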


Using a nested for loop in subplots [duplicate]

I am a little confused about how this code works:
fig, axes = plt.subplots(nrows=2, ncols=2)
How does the fig, axes work in this case? What does it do?
Also why wouldn't this work to do the same thing:
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=2)
There are several ways to do it. The subplots method creates the figure along with the subplots that are then stored in the ax array. For example:
import matplotlib.pyplot as plt
x = range(10)
y = range(10)
fig, ax = plt.subplots(nrows=2, ncols=2)
for row in ax:
    for col in row:
        col.plot(x, y)
However, something like this will also work, although it's not as "clean", since you create an empty figure and then add the subplots to it one by one:
fig = plt.figure()
plt.subplot(2, 2, 1)
plt.plot(x, y)
plt.subplot(2, 2, 2)
plt.plot(x, y)
plt.subplot(2, 2, 3)
plt.plot(x, y)
plt.subplot(2, 2, 4)
plt.plot(x, y)
import matplotlib.pyplot as plt
fig, ax = plt.subplots(2, 2)
ax[0, 0].plot(range(10), 'r') #row=0, col=0
ax[1, 0].plot(range(10), 'b') #row=1, col=0
ax[0, 1].plot(range(10), 'g') #row=0, col=1
ax[1, 1].plot(range(10), 'k') #row=1, col=1
You can also unpack the axes from the subplots call, and set whether the x and y axes are shared between the subplots, like this:
import matplotlib.pyplot as plt
# fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
ax1, ax2, ax3, ax4 = axes.flatten()
ax1.plot(range(10), 'r')
ax2.plot(range(10), 'b')
ax3.plot(range(10), 'g')
ax4.plot(range(10), 'k')
You might be interested in the fact that, as of matplotlib version 2.1, the second snippet from the question works fine as well.
From the change log:
Figure class now has subplots method
The Figure class now has a subplots() method which behaves the same as pyplot.subplots() but on an existing figure.
import matplotlib.pyplot as plt
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=2)
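To confirm that Figure.subplots gives the same kind of result as pyplot.subplots, here is a minimal check, assuming matplotlib 2.1 or newer (the Agg backend is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt

# Route 1: pyplot creates the figure and the axes grid together
fig1, axes1 = plt.subplots(nrows=2, ncols=2)

# Route 2: create the figure first, then the grid on it (matplotlib >= 2.1)
fig2 = plt.figure()
axes2 = fig2.subplots(nrows=2, ncols=2)

print(axes1.shape, axes2.shape)  # (2, 2) (2, 2)
print(all(ax.figure is fig2 for ax in axes2.flat))  # True
```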
Read the documentation: matplotlib.pyplot.subplots
pyplot.subplots() returns a tuple (fig, ax), which is unpacked into two variables using the notation
fig, axes = plt.subplots(nrows=2, ncols=2)
The code:
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=2)
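did not work in matplotlib versions before 2.1, because subplots() was then only a function in pyplot, not a method of the Figure object. Since 2.1, Figure.subplots() exists and this code works.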
Iterating through all subplots sequentially:
fig, axes = plt.subplots(nrows, ncols)
for ax in axes.flatten():
    ax.plot(x, y)
Accessing a specific index:
for row in range(nrows):
    for col in range(ncols):
        axes[row, col].plot(x[row], y[col])
Subplots with pandas
This answer is for subplots with pandas, which uses matplotlib as the default plotting backend.
Here are four options to create subplots, starting with a pandas.DataFrame:
Implementation 1. and 2. are for the data in a wide format, creating subplots for each column.
Implementation 3. and 4. are for data in a long format, creating subplots for each unique value in a column.
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
Imports and Data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # data, and the figure-level plot in option 4
# wide dataframe
df = sns.load_dataset('planets').iloc[:, 2:5]
orbital_period mass distance
0 269.300 7.10 77.40
1 874.774 2.21 56.95
2 763.000 2.60 19.84
3 326.030 19.40 110.62
4 516.220 10.50 119.47
# long dataframe
dfm = sns.load_dataset('planets').iloc[:, 2:5].melt()
variable value
0 orbital_period 269.300
1 orbital_period 874.774
2 orbital_period 763.000
3 orbital_period 326.030
4 orbital_period 516.220
1. subplots=True and layout, for each column
Use the parameters subplots=True and layout=(rows, cols) in pandas.DataFrame.plot
This example uses kind='density', but there are different options for kind, and this applies to them all. Without specifying kind, a line plot is the default.
axes is the array of AxesSubplot objects returned by pandas.DataFrame.plot.
See How to get a Figure object and How to save pandas subplots, if needed.
axes = df.plot(kind='density', subplots=True, layout=(2, 2), sharex=False, figsize=(10, 6))
# extract the figure object; only used for tight_layout in this example
fig = axes[0][0].get_figure()
# set the individual titles
for ax, title in zip(axes.ravel(), df.columns):
    ax.set_title(title)
fig.tight_layout()
plt.show()
2. plt.subplots, for each column
Create an array of Axes with matplotlib.pyplot.subplots and then pass axes[i, j] or axes[n] to the ax parameter.
This option uses pandas.DataFrame.plot, but other axes-level plot calls can be substituted (e.g. sns.kdeplot, plt.plot, etc.).
It's easiest to collapse the subplot array of Axes into one dimension with .ravel or .flatten. See .ravel vs .flatten.
Any variables that apply to each Axes and need to be iterated through (e.g. cols, axes, colors, palette, etc.) are combined with zip. zip stops at the shortest input, so every list needs at least as many entries as there are plots; here the 4th Axes has no column and is deleted afterwards.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6)) # define the figure and subplots
axes = axes.ravel() # array to 1D
cols = df.columns # create a list of dataframe columns to use
colors = ['tab:blue', 'tab:orange', 'tab:green'] # list of colors for each subplot, otherwise all subplots will be one color
for col, color, ax in zip(cols, colors, axes):
    df[col].plot(kind='density', ax=ax, color=color, label=col, title=col)
fig.delaxes(axes[3]) # delete the empty subplot
Result for 1. and 2.
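sns.load_dataset downloads the planets data, so the examples above need network access. A self-contained variant of option 2, assuming random lognormal data under the same three column names (not the real planets values) and kind='hist', so that SciPy is not required (kind='density' uses it for the KDE):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for sns.load_dataset('planets').iloc[:, 2:5]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "orbital_period": rng.lognormal(5, 1, 200),
    "mass": rng.lognormal(1, 1, 200),
    "distance": rng.lognormal(4, 1, 200),
})

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6))
axes = axes.ravel()  # 2-D array of Axes -> 1-D
colors = ["tab:blue", "tab:orange", "tab:green"]

# zip stops at the shortest input, so the 4th Axes is never touched
for col, color, ax in zip(df.columns, colors, axes):
    df[col].plot(kind="hist", ax=ax, color=color, title=col)

fig.delaxes(axes[3])  # remove the unused subplot
fig.tight_layout()
```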
3. plt.subplots, for each group in .groupby
This is similar to 2., except it zips color and axes to a .groupby object.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6)) # define the figure and subplots
axes = axes.ravel() # array to 1D
dfg = dfm.groupby('variable') # get data for each unique value in the first column
colors = ['tab:blue', 'tab:orange', 'tab:green'] # list of colors for each subplot, otherwise all subplots will be one color
for (group, data), color, ax in zip(dfg, colors, axes):
    data.plot(kind='density', ax=ax, color=color, title=group, legend=False)
fig.delaxes(axes[3]) # delete the empty subplot
4. seaborn figure-level plot
Use a seaborn figure-level plot, and use the col or row parameter. seaborn is a high-level API for matplotlib. See seaborn: API reference
p = sns.displot(data=dfm, kind='kde', col='variable', col_wrap=2, x='value', hue='variable',
                facet_kws={'sharey': False, 'sharex': False}, height=3.5, aspect=1.75)
sns.move_legend(p, "upper left", bbox_to_anchor=(.55, .45))
Convert the axes array to 1D
Generating subplots with plt.subplots(nrows, ncols), where both nrows and ncols are greater than 1, returns a nested (2-D) array of <AxesSubplot:> objects.
It’s not necessary to flatten axes in cases where either nrows=1 or ncols=1, because axes will already be 1-dimensional, which is a result of the default parameter squeeze=True.
The easiest way to access the objects is to convert the array to 1 dimension with .ravel(), .flatten(), or .flat.
.ravel vs. .flatten
flatten always returns a copy.
ravel returns a view of the original array whenever possible.
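The view-versus-copy difference is easy to see on a small numeric array. For an array of Axes it rarely matters, because both results hold references to the same Axes objects, so plotting on an element of either one draws on the same subplot.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
r = a.ravel()    # a view here, because a is contiguous
f = a.flatten()  # always a new copy

r[0] = 100       # writes through to a
f[1] = -1        # does not touch a

print(a[0, 0], a[0, 1])                                # 100 1
print(np.shares_memory(a, r), np.shares_memory(a, f))  # True False
```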
Once the array of axes is converted to 1-d, there are a number of ways to plot.
This answer is relevant to seaborn axes-level plots, which have the ax= parameter (e.g. sns.barplot(…, ax=ax[0])).
seaborn is a high-level API for matplotlib. See Figure-level vs. axes-level functions and seaborn is not plotting within defined subplots
import matplotlib.pyplot as plt
import numpy as np # sample data only
# example of data
rads = np.arange(0, 2*np.pi, 0.01)
y_data = np.array([np.sin(t*rads) for t in range(1, 5)])
x_data = [rads, rads, rads, rads]
# Generate figure and its subplots
fig, axes = plt.subplots(nrows=2, ncols=2)
# axes before
array([[<AxesSubplot:>, <AxesSubplot:>],
       [<AxesSubplot:>, <AxesSubplot:>]], dtype=object)
# convert the array to 1 dimension
axes = axes.ravel()
# axes after
array([<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>],
      dtype=object)
Iterate through the flattened array
If there are more subplots than data, this will result in IndexError: list index out of range
Try option 3. instead, or select a subset of the axes (e.g. axes[:-2])
for i, ax in enumerate(axes):
    ax.plot(x_data[i], y_data[i])
Access each axes by index
axes[0].plot(x_data[0], y_data[0])
axes[1].plot(x_data[1], y_data[1])
axes[2].plot(x_data[2], y_data[2])
axes[3].plot(x_data[3], y_data[3])
Index the data and axes
for i in range(len(x_data)):
    axes[i].plot(x_data[i], y_data[i])
zip the axes and data together and then iterate through the resulting tuples.
for ax, x, y in zip(axes, x_data, y_data):
    ax.plot(x, y)
An option is to assign each axes to a variable, fig, (ax1, ax2, ax3) = plt.subplots(1, 3). However, as written, this only works in cases with either nrows=1 or ncols=1. This is based on the shape of the array returned by plt.subplots, and quickly becomes cumbersome.
For a 2 x 2 array, it becomes fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2).
This option is most useful for two subplots (e.g.: fig, (ax1, ax2) = plt.subplots(1, 2) or fig, (ax1, ax2) = plt.subplots(2, 1)). For more subplots, it's more efficient to flatten and iterate through the array of axes.
You could use the following:
import numpy as np
import matplotlib.pyplot as plt
fig, _ = plt.subplots(nrows=2, ncols=2)
for i, ax in enumerate(fig.axes):
    ax.plot(np.sin(np.linspace(0,2*np.pi,100) + np.pi/2*i))
Or alternatively, using the second variable that plt.subplots returns:
fig, ax_mat = plt.subplots(nrows=2, ncols=2)
for i, ax in enumerate(ax_mat.flatten()):
    ax.plot(np.sin(np.linspace(0,2*np.pi,100) + np.pi/2*i))
ax_mat is a matrix of the axes. Its shape is nrows x ncols.
Here is a simple solution:
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=False)
for sp in fig.axes:
    sp.plot(range(10))  # e.g. plot something in each subplot
Go with the following if you really want to use a loop:
import matplotlib.pyplot as plt
def plot(data):
    fig = plt.figure(figsize=(100, 100))
    for idx, k in enumerate(data.keys(), 1):
        x, y = data[k].keys(), data[k].values
        plt.subplot(63, 10, idx)
        plt.plot(x, y)
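The hard-coded plt.subplot(63, 10, idx) grid only fits up to 630 keys. A sketch of the same loop that derives the number of rows from len(data) instead; the function name plot_grid, the ncols=10 default and the sample data are hypothetical, and data is assumed to be a dict of pandas Series (a DataFrame works the same way through .items()).

```python
import math

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd


def plot_grid(data, ncols=10):
    """Hypothetical helper: one subplot per key, grid sized to fit len(data)."""
    nrows = math.ceil(len(data) / ncols)
    # squeeze=False keeps axes 2-D even when nrows == 1
    fig, axes = plt.subplots(nrows, ncols, figsize=(2 * ncols, 2 * nrows), squeeze=False)
    axes = axes.ravel()
    for ax, (key, series) in zip(axes, data.items()):
        ax.plot(series.index, series.values)
        ax.set_title(str(key))
    for ax in axes[len(data):]:  # drop the leftover empty cells
        fig.delaxes(ax)
    return fig


# Usage with made-up data: 23 series -> 3 x 10 grid with 7 cells removed
data = {f"s{i}": pd.Series(range(5)) * i for i in range(23)}
fig = plot_grid(data)
```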
Another concise solution is:
import matplotlib.pyplot as plt
x = y = range(10)  # sample data
# set up structure of plots
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20,10))
# for plot 1
ax1.set_title('Title A')
ax1.plot(x, y)
# for plot 2
ax2.set_title('Title B')
ax2.plot(x, y)
# for plot 3
ax3.set_title('Title C')
ax3.plot(x, y)

Multiple figures with subplots from a dataframe

I have a dataframe for which I am plotting the sorted values from the columns as a line and then plotting and labeling various percentiles along that line.
I would like to have 12 subplots per figure and as many figures as I need depending on the number of columns (which will vary on my real datasets).
Here is a version of my script with a simple dataframe. This example produces 2 figures, each with 12 subplots. There are only 17 columns in the dataframe, so one figure has 7 empty subplots. This is the result that I need; however, the script is not elegant and not efficient.
I am learning Python and tend to revert to old scripting habits when I can't figure out the efficient Python way. Could someone show me how to produce the same figures with more elegant Python? If I don't have 7 empty subplots, that's fine too. I've tried various combinations. I think I'm getting tripped up by not understanding the inner workings of pandas versus numpy. I have a pandas DataFrame, but am manipulating the columns with numpy functions. In my trials I had many errors when attempting to apply np.sort to the dataframe columns, which is why I reverted to taking a single column at a time out of the dataframe to manipulate and then plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randint(0,100,size=(15, 17)), columns=list('ABCDEFGHIJKLMNOPQ'))
p = np.array([0.0, 25.0, 50.0, 75.0, 90.0, 95.0, 99.0, 100.0])
fig, axs = plt.subplots(4,3, figsize=(8.5, 11), facecolor='w', edgecolor='k')
fig.subplots_adjust(hspace=0.4, wspace=0.4)
i = 1
for f in df.columns:
    d = np.sort(df[f].values)
    perc = np.nanpercentile(d, p)
    if i > 12:
        fig, axs = plt.subplots(4,3, figsize=(8.5, 11), facecolor='w', edgecolor='k')
        i = 1
    plt.subplot(4, 3, i)
    plt.plot(d)
    plt.plot((len(df[f])-1) * p/100., perc, 'ro')
    plt.xticks((len(df[f])-1)* p/100., list(map(str, p)))
    i += 1
A slightly different approach
create all figures and axes upfront, and flatten the array of axes
then use the more up-to-date (object-oriented) Matplotlib API to plot on each Axes
I have used pandas instead of numpy, as I found it simpler to define the x-coordinates from the index of a Series rather than from an array
the x-axis is still somewhat ugly even after rotation, as you want ticks that are close together for the 90+ percentiles
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import math
df = pd.DataFrame(np.random.randint(0,100,size=(15, 17)), columns=list('ABCDEFGHIJKLMNOPQ'))
p = np.array([0.0, 25.0, 50.0, 75.0, 90.0, 95.0, 99.0, 100.0])
# create all the figures and axis that are going to be used...
axs = np.array([])
for _ in range(math.ceil(len(df.columns)/12)):
    fig, ax_t = plt.subplots(4,3, figsize=(8.5, 11), facecolor='w', edgecolor='k')
    fig.subplots_adjust(hspace=0.4, wspace=0.4)
    axs = np.concatenate((axs, np.array(ax_t).flatten()))
for i, ax in enumerate(np.array(axs)):
    # NB will have created empty axis where there is no column
    if i==len(df.columns): break
    d = df.loc[:,df.columns[i]].sort_values().reset_index(drop=True)
    ax.plot(d.index/(len(df)-1), d)
    q = d.quantile(p/100)
    ax.plot(q, "ro")
    ax.tick_params(axis='x', labelrotation = 90)
    # annotate 0th, 50th and 100th percentiles
    for a in q.loc[[0,.5,1]].index:
        ax.annotate(round(q[a],1), xy=(a, q[a]), xycoords='data', xytext=(3, 3), textcoords='offset points',)
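If the empty subplots are not wanted, one option (a sketch, not the answer's code) is to walk the columns in chunks of 12, build one 4x3 figure per chunk, and delete the unused Axes in the last figure. It reuses the answer's pandas sort_values/quantile approach; plotting the quantiles with explicit x and y values is a choice made here for clarity.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(15, 17)), columns=list("ABCDEFGHIJKLMNOPQ"))
p = np.array([0.0, 25.0, 50.0, 75.0, 90.0, 95.0, 99.0, 100.0])
per_fig = 12  # 4 x 3 grid

figures = []
for start in range(0, len(df.columns), per_fig):
    chunk = df.columns[start:start + per_fig]
    fig, axs = plt.subplots(4, 3, figsize=(8.5, 11))
    fig.subplots_adjust(hspace=0.4, wspace=0.4)
    axs = axs.ravel()
    for ax, col in zip(axs, chunk):
        d = df[col].sort_values().reset_index(drop=True)  # sorted column, index 0..n-1
        ax.plot(d.index / (len(d) - 1), d)                # x scaled to [0, 1]
        q = d.quantile(p / 100)                           # Series indexed by 0.0 ... 1.0
        ax.plot(q.index, q.values, "ro")
        ax.set_title(col)
    for ax in axs[len(chunk):]:  # only the last figure has leftovers
        fig.delaxes(ax)
    figures.append(fig)
```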

How do you add a legend to a 3D scatter plot in matplotlib when using a DataFrame?

I am trying to create a legend for my 3D plot. I do not completely understand the use of a handle when it comes to making a legend.
I have followed two previously posted questions Matplotlib: Annotating a 3D scatter plot and Matplotlib scatter plot legend
What I do not understand is how to recreate their workflow when using a Dataframe.
#Creating the Graph:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib
# %matplotlib notebook
threedee = plt.figure(figsize = (10,10)).add_subplot(projection='3d')
threedee.scatter(df_pca_test.iloc[:, 0], df_pca_test.iloc[:, 1], df_pca_test.iloc[:, 2],
                 c = y["y_num"])
threedee.set_xlabel('PC1', fontsize = 12)
threedee.set_ylabel('PC2', fontsize = 12)
threedee.set_zlabel('PC3', fontsize = 12)
#My data:
# df_pca_test contains three columns of PCA results
# y["y_num"] contains my labels in numerical format (1, 2, 3, 4, 0)
#First thing I tried:
plt.legend() # Returned: No handles with labels found to put in legend
#Second thing I tried:
plt.legend((1, 2, 3, 4, 0),
           ('A', 'B', 'C', 'Test', 'G'),
           loc='lower left')
# Returned: No handles with labels found to put in legend
Ideally, I would like my graph to have a legend that has 'A', 'B', 'C', 'Test', and 'G' instead of y_num which has the numbers 1, 2, 3, 4, 0.
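One common way to get named legend entries (a sketch, not taken from the linked questions) is to call scatter once per label value and give each call a label=, so that legend() finds labelled handles. df_pca_test and y below are made-up stand-ins with assumed column names PC1/PC2/PC3 and y_num, and the number-to-name mapping is taken from the desired legend text.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-ins for the question's df_pca_test and y
rng = np.random.default_rng(0)
df_pca_test = pd.DataFrame(rng.normal(size=(50, 3)), columns=["PC1", "PC2", "PC3"])
y = pd.DataFrame({"y_num": np.arange(50) % 5})  # labels 0..4

# Numeric label -> legend text (dict order sets the legend order)
names = {1: "A", 2: "B", 3: "C", 4: "Test", 0: "G"}

fig = plt.figure(figsize=(10, 10))
threedee = fig.add_subplot(projection="3d")
for num, name in names.items():
    mask = (y["y_num"] == num).to_numpy()
    sub = df_pca_test[mask]
    # one labelled artist per class, so legend() has handles to show
    threedee.scatter(sub["PC1"], sub["PC2"], sub["PC3"], label=name)

threedee.set_xlabel("PC1", fontsize=12)
threedee.set_ylabel("PC2", fontsize=12)
threedee.set_zlabel("PC3", fontsize=12)
leg = threedee.legend(loc="lower left")
```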

How to subplot two alternate x scales and two alternate y scales for more than one subplot?

I am trying to make a 2x2 subplot, with each of the inner subplots consisting of two x axes and two y axes; the first xy pair corresponds to a linear scale and the second xy pair to a logarithmic scale. Before you assume this question has been asked before: the matplotlib docs and examples show how to do multiple scales for either x or y, but not both. This post on stackoverflow is the closest thing to my question, and I have attempted to use this idea to implement what I want. My attempt is below.
Firstly, we initialize data, ticks, and ticklabels. The idea is that the alternate scaling will have the same tick positions with altered ticklabels to reflect the alternate scaling.
import numpy as np
import matplotlib.pyplot as plt
# xy data (global)
X = np.linspace(5, 13, 9, dtype=int)
Y = np.linspace(7, 12, 9)
# xy ticks for linear scale (global)
dtick = dict(X=X, Y=np.linspace(7, 12, 6, dtype=int))
# xy ticklabels for linear and logarithmic scales (global)
init_xt = 2**dtick['X']
dticklabel = dict(X1=dtick['X'], Y1=dtick['Y']) # linear scale
dticklabel['X2'] = ['{}'.format(init_xt[idx]) if idx % 2 == 0 else '' for idx in range(len(init_xt))] # log_2 scale
dticklabel['Y2'] = 2**dticklabel['Y1'] # log_2 scale
Borrowing from the linked SO post, I will plot the same thing in each of the 4 subplots. Since similar methods are used for both scalings in each subplot, the method is thrown into a for-loop. But we need the row number, column number, and plot number for each.
# 2x2 subplot
# fig.add_subplot(row, col, pnum); corresponding iterables = (irows, icols, iplts)
irows = (1, 1, 2, 2)
icols = (1, 2, 1, 2)
iplts = (1, 2, 1, 2)
ncolors = ('red', 'blue', 'green', 'black')
Putting all of this together, the function to output the plot is below:
def initialize_figure(irows, icols, iplts, ncolors, figsize=None):
    """Plot X vs Y in each subplot, with linear (ax1) and log_2 (ax2) tick labels."""
    fig = plt.figure(figsize=figsize)
    for row, col, pnum, color in zip(irows, icols, iplts, ncolors):
        ax1 = fig.add_subplot(row, col, pnum) # linear scale
        ax2 = fig.add_subplot(row, col, pnum, frame_on=False) # logarithmic scale ticklabels
        ax1.plot(X, Y, '-', color=color)
        # ticks in same positions
        for ax in (ax1, ax2):
            ax.set_xticks(dtick['X'])
            ax.set_yticks(dtick['Y'])
        # remove xaxis xtick_labels and labels from top row
        if row == 1:
            ax2.set_xlabel('X2', color='gray')
        # initialize xaxis xtick_labels and labels for bottom row
        else:
            ax1.set_xlabel('X1', color='black')
        # linear scale on left
        if col == 1:
            ax1.set_ylabel('Y1', color='black')
        # logarithmic scale on right
        else:
            ax2.set_ylabel('Y2', color='black')
        ax1.tick_params(axis='x', colors='black')
        ax1.tick_params(axis='y', colors='black')
        ax2.tick_params(axis='x', colors='gray')
        ax2.tick_params(axis='y', colors='gray')
        for ax in (ax1, ax2):
            ax.set_xlim([4, 14])
            ax.set_ylim([6, 13])
Calling initialize_figure(irows, icols, iplts, ncolors) produces the figure below.
I am applying the same xlim and ylim, so I do not understand why the subplots are all different sizes. Also, the axis labels and axis ticklabels are not in the specified positions (since fig.add_subplot(...) indexing starts from 1 instead of 0).
What is my mistake and how can I achieve the desired result?
(In case it isn't clear, I am trying to put the xticklabels and xlabels for the linear scale on the bottom row, the xticklabels and xlabels for the logarithmic scale on the top row, the yticklabels and ylabels for the linear scale on the left side of the left column, and the yticklabels and ylabels for the logarithmic scale on the right side of the right column. The color='black' kwarg corresponds to the linear scale and the color='gray' kwarg corresponds to the logarithmic scale.)
The irows and icols lists in the code do not serve any purpose. To create 4 subplots in a 2x2 grid, you would loop over range(1, 5):
for pnum in range(1,5):
    ax1 = fig.add_subplot(2, 2, pnum)
This might not be the only problem in the code, but as long as the subplots aren't created correctly it's not worth looking further down.
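Following the range(1, 5) suggestion, here is a sketch of a complete 2x2 version. It assumes the layout described in the question (linear labels bottom/left in black, log_2 labels top/right in gray) and uses xaxis.tick_top(), yaxis.tick_right() and tick_params(label...=False) to place and hide labels; those placement calls are a choice made for this sketch, not something from the question or the answer.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Data and tick positions as in the question
X = np.linspace(5, 13, 9, dtype=int)
Y = np.linspace(7, 12, 9)
xticks = X
yticks = np.linspace(7, 12, 6, dtype=int)
# log_2 labels for the same tick positions (every other x label blanked, as in the question)
x2labels = [str(2**x) if i % 2 == 0 else "" for i, x in enumerate(xticks)]
y2labels = [str(2**v) for v in yticks]

fig = plt.figure(figsize=(10, 8))
pairs = []
for pnum, color in zip(range(1, 5), ("red", "blue", "green", "black")):
    row, col = divmod(pnum - 1, 2)  # 0-based grid position of this subplot
    ax1 = fig.add_subplot(2, 2, pnum)                  # linear scale
    ax2 = fig.add_subplot(2, 2, pnum, frame_on=False)  # log_2 labels, same cell
    ax1.plot(X, Y, "-", color=color)
    for ax in (ax1, ax2):
        ax.set_xlim(4, 14)
        ax.set_ylim(6, 13)
        ax.set_xticks(xticks)
        ax.set_yticks(yticks)
    # ax2 shows its tick labels on the top and right edges
    ax2.xaxis.tick_top()
    ax2.yaxis.tick_right()
    ax2.xaxis.set_label_position("top")
    ax2.yaxis.set_label_position("right")
    ax2.set_xticklabels(x2labels)
    ax2.set_yticklabels(y2labels)
    ax2.tick_params(colors="gray")
    # x: log labels on the top row only, linear labels on the bottom row only
    if row == 0:
        ax1.tick_params(axis="x", labelbottom=False)
        ax2.set_xlabel("X2", color="gray")
    else:
        ax2.tick_params(axis="x", labeltop=False)
        ax1.set_xlabel("X1", color="black")
    # y: linear labels on the left column only, log labels on the right column only
    if col == 0:
        ax2.tick_params(axis="y", labelright=False)
        ax1.set_ylabel("Y1", color="black")
    else:
        ax1.tick_params(axis="y", labelleft=False)
        ax2.set_ylabel("Y2", color="gray")
    pairs.append((ax1, ax2))
```

Because ax1 and ax2 are created from the same 2x2 cell, they always get identical positions, which is what keeps the overlaid scales aligned.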

Second y-axis and overlapping labeling?

I am using Python for a simple time-series analysis of calorie intake. I am plotting the time series and the rolling mean/std over time. It looks like this:
Here is how I do it:
## packages & libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
## import data and set time series structure
data = pd.read_csv('time_series_calories.csv', parse_dates={'dates': ['year','month','day']}, index_col=0)
## check ts for stationarity
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):
    # Determine rolling statistics
    rolmean = timeseries.rolling(window=14).mean()
    rolstd = timeseries.rolling(window=14).std()
    # Plot rolling statistics:
    orig = plt.plot(timeseries, color='blue',label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label = 'Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
The plot doesn't look good, since the rolling std distorts the scale of variation and the x-axis labels overlap. I have two questions: (1) How can I plot the rolling std on a second y-axis? (2) How can I fix the overlapping x-axis labels?
With your help I managed to get the following:
But how do I get the legend sorted out?
1) Making a second (twin) axis can be done with ax2 = ax1.twinx(), see here for an example. Is this what you needed?
2) I believe there are several older answers to this question, e.g. here, here and here. According to the links provided, the easiest way is probably to use either plt.xticks(rotation=70), plt.setp(ax.xaxis.get_majorticklabels(), rotation=70) or fig.autofmt_xdate().
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
ax.set_xticks([1, 2, 3, 4, 5])
plt.xticks(rotation=70) # Either this
# fig.autofmt_xdate() # or this
# plt.setp( ax.xaxis.get_majorticklabels(), rotation=70 ) # or this works
plt.show()
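Putting 1) and 2) together for the calorie plot: a sketch with a made-up daily series in place of time_series_calories.csv, using Series.rolling(...).mean() and .std() (the current pandas replacements for pd.rolling_mean and pd.rolling_std), ax1.twinx() for the std, and fig.autofmt_xdate() for the dates.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily calorie series standing in for time_series_calories.csv
idx = pd.date_range("2016-01-01", periods=120, freq="D")
calories = pd.Series(np.random.default_rng(0).normal(2200, 300, len(idx)), index=idx)

rolmean = calories.rolling(window=14).mean()
rolstd = calories.rolling(window=14).std()

fig, ax1 = plt.subplots()
ax1.plot(calories.index, calories, color="blue", label="Original")
ax1.plot(rolmean.index, rolmean, color="red", label="Rolling Mean")
ax1.set_ylabel("Calories")

ax2 = ax1.twinx()  # second y-axis that shares the x-axis of ax1
ax2.plot(rolstd.index, rolstd, color="black", label="Rolling Std")
ax2.set_ylabel("Rolling std")

fig.autofmt_xdate()  # rotate and right-align the date labels
```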
Answer to Edit
To show lines from different axes in one legend, create some empty fake plots in the axis that should hold the legend:
ax1.plot(x, y1, 'r--') # one plot into ax1
ax2.plot(x, y2, 'gx') # another into ax2
# create two empty plots into ax1
ax1.plot([], [], 'r--', label='Line 1 from ax1') # empty fake-plot with the same line/marker style as the first line you want in the legend
ax1.plot([], [], 'gx', label='Line 2 from ax2') # empty fake-plot as line 2
ax1.legend()
In my silly example it is probably better to label the original plot in ax1, but I hope you get the idea. The important thing is to create the "legend-plots" with the same line and marker settings as the original plots. Note that the fake-plots will not be plotted since there is no data to plot.
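An alternative to the fake plots is to ask both axes for their labelled artists with Axes.get_legend_handles_labels() and pass the combined lists to a single legend call. A minimal sketch with made-up sine/cosine data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(x, np.sin(x), "r--", label="Line 1 from ax1")
ax2.plot(x, 10 * np.cos(x), "gx", label="Line 2 from ax2")

# Collect handles/labels from both axes and draw one legend on ax1
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
leg = ax1.legend(h1 + h2, l1 + l2, loc="upper right")
```

This keeps the real lines as the legend handles, so their styles cannot drift apart from the legend entries.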
