Matplotlib: how to plot one colorbar for four 2D histograms - python-3.x

Before I start, I want to say that I've tried to follow this post and this post on the same problem; however, they use imshow heatmaps, not 2D histograms like I do.
Here is my code (the actual data has been replaced by randomly generated data, but the gist is the same):
import matplotlib.pyplot as plt
import numpy as np

def subplots_hist_2d(x_data, y_data, x_labels, y_labels, titles):
    fig, a = plt.subplots(2, 2)
    a = a.ravel()
    for idx, ax in enumerate(a):
        image = ax.hist2d(x_data[idx], y_data[idx], bins=50, range=[[-2, 2],[-2, 2]])
        ax.set_title(titles[idx], fontsize=12)
    # hist2d returns (counts, xedges, yedges, QuadMesh); after the loop idx == 3,
    # so image[idx] happens to be the QuadMesh of the last subplot
    cb = fig.colorbar(image[idx])
    cb.set_label("Intensity", rotation=270)
    # pad = how big overall pic is
    # w_pad = how separate they're left to right
    # h_pad = how separate they're top to bottom
    plt.tight_layout(pad=-1, w_pad=-10, h_pad=0.5)
x1, y1 = np.random.uniform(-2, 2, 10000), np.random.uniform(-2, 2, 10000)
x2, y2 = np.random.uniform(-2, 2, 10000), np.random.uniform(-2, 2, 10000)
x3, y3 = np.random.uniform(-2, 2, 10000), np.random.uniform(-2, 2, 10000)
x4, y4 = np.random.uniform(-2, 2, 10000), np.random.uniform(-2, 2, 10000)
x_data = [x1, x2, x3, x4]
y_data = [y1, y2, y3, y4]
x_labels = ["x1", "x2", "x3", "x4"]
y_labels = ["y1", "y2", "y3", "y4"]
titles = ["1", "2", "3", "4"]
subplots_hist_2d(x_data, y_data, x_labels, y_labels, titles)
And this is what it's generating:
So my problem is that I could not, for the life of me, make the colorbar apply to all 4 histograms. Also, for some reason the bottom-right histogram behaves differently from the others. The methods in the posts I linked don't seem to use a = a.ravel(); I'm only using it here because it's the only way I found to plot my 4 histograms as subplots. Help?
Thomas Kuhn, your new method actually solved all of my problems, until I added my labels and tried to use plt.tight_layout() to sort out the overlaps. It seems that if I pass specific parameters to plt.tight_layout(pad=i, w_pad=0, h_pad=0), the colorbar starts to misbehave. I'll now explain my problem.
I have made some changes to your new method so that it suits what I want, like this:
def test_hist_2d(x_data, y_data, x_labels, y_labels, titles):
    nrows, ncols = 2, 2
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True)
    ##produce the actual data and compute the histograms
    mappables = []
    for (i, j), ax in np.ndenumerate(axes):
        H, xedges, yedges = np.histogram2d(x_data[i][j], y_data[i][j], bins=50, range=[[-2, 2],[-2, 2]])
        ax.set_title(titles[i][j], fontsize=12)
        mappables.append(H)
    ##the min and max values of all histograms
    vmin = np.min(mappables)
    vmax = np.max(mappables)
    ##second loop for visualisation
    for ax, H in zip(axes.ravel(), mappables):
        # histogram2d puts x along the first axis, so transpose for imshow and put the origin at the bottom
        im = ax.imshow(H.T, origin='lower', vmin=vmin, vmax=vmax, extent=[-2,2,-2,2])
    ##colorbar using solution from linked question
    # plt.tight_layout
    # plt.tight_layout(pad=i, w_pad=0, h_pad=0)
Now I generate my data, in this case like this:
phi, cos_theta = get_angles(runs)
detector_x1, detector_y1, smeared_x1, smeared_y1 = detection_vectorised(1.5, cos_theta, phi)
detector_x2, detector_y2, smeared_x2, smeared_y2 = detection_vectorised(1, cos_theta, phi)
detector_x3, detector_y3, smeared_x3, smeared_y3 = detection_vectorised(0.5, cos_theta, phi)
detector_x4, detector_y4, smeared_x4, smeared_y4 = detection_vectorised(0, cos_theta, phi)
Here detector_x, detector_y, smeared_x and smeared_y are all lists of data points.
I then put them into 2x2 nested lists so that my plotting function can unpack them properly:
data_x = [[detector_x1, detector_x2], [detector_x3, detector_x4]]
data_y = [[detector_y1, detector_y2], [detector_y3, detector_y4]]
x_labels = [["x positions(m)", "x positions(m)"], ["x positions(m)", "x positions(m)"]]
y_labels = [["y positions(m)", "y positions(m)"], ["y positions(m)", "y positions(m)"]]
titles = [["0.5m from detector", "1.0m from detector"], ["1.5m from detector", "2.0m from detector"]]
I now run my code with
test_hist_2d(data_x, data_y, x_labels, y_labels, titles)
with just turned on, it gives this:
This is great, because data- and visual-wise it is exactly what I want, i.e. the colormap corresponds to all 4 histograms. However, since the labels overlap with the titles, I ran the same thing again, this time with plt.tight_layout(pad=a, w_pad=b, h_pad=c), hoping to fix the overlapping labels. This time, no matter how I change the numbers a, b and c, the colorbar always ends up lying on the second column of graphs, like this:
Changing a only makes the overall subplots bigger or smaller, and the best I could do was plt.tight_layout(pad=-10, w_pad=-15, h_pad=0), which looks like this:
So it seems that whatever your new method is doing, it made the whole plot lose its adjustability. Your solution, as wonderful as it is at solving one problem, created another in return. So what would be the best thing to do here?
Edit 2:
Using fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True, constrained_layout=True) along with gives
As you can see, there's still a vertical gap between the columns of subplots, which not even plt.subplots_adjust() can get rid of.

As has been noted in the comments, the biggest problem here is actually making one colorbar meaningful for many histograms: by default, ax.hist2d scales the colors of each histogram to the counts it computes (via numpy) for that particular axes, so the color scales of different subplots are not comparable. It may therefore be best to first calculate the 2D histogram data using numpy and then use imshow to visualise it. This way, the solutions of the linked question can also be applied. To make the problem with the normalisation more visible, I put some effort into producing some qualitatively different 2D histograms using scipy.stats.multivariate_normal, which shows how the height of the histogram can change quite dramatically even though the number of samples is the same in each figure.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec as gs
from scipy.stats import multivariate_normal

nrows, ncols = 2, 2

##opening figure and axes
fig, axes = plt.subplots(nrows, ncols)

##generate some random data for the distributions
means = np.random.rand(nrows, ncols, 2)
sigmas = np.random.rand(nrows, ncols, 2)
thetas = np.random.rand(nrows, ncols)*np.pi*2

##produce the actual data and compute the histograms
mappables = []
for mean, sigma, theta in zip(means.reshape(-1, 2), sigmas.reshape(-1, 2), thetas.reshape(-1)):
    ##the data (only cosmetics): a rotated covariance matrix R @ diag(sigma) @ R.T
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array(((c, -s), (s, c)))
    cov = rot @ np.diag(sigma) @ rot.T
    rv = multivariate_normal(mean, cov)
    data = rv.rvs(size=10000)
    ##the 2d histogram from numpy
    H, xedges, yedges = np.histogram2d(data[:, 0], data[:, 1], bins=50, range=[[-2, 2], [-2, 2]])
    mappables.append(H)

##the min and max values of all histograms
vmin = np.min(mappables)
vmax = np.max(mappables)

##second loop for visualisation
for ax, H in zip(axes.ravel(), mappables):
    # histogram2d puts x along the first axis, so transpose for imshow and put the origin at the bottom
    im = ax.imshow(H.T, origin='lower', vmin=vmin, vmax=vmax, extent=[-2, 2, -2, 2])

##colorbar using solution from linked question
This code produces a figure like this:
Old Answer:
One way to solve your problem is to generate the space for your colorbar explicitly. You can use a GridSpec instance to define how wide your colorbar should be. Below is your subplots_hist_2d() function with a few modifications. Note that your use of tight_layout() shifted the colorbar into a funny place, hence the replacement. If you want the plots closer to each other, I'd rather recommend playing with the aspect ratio of the figure.
import matplotlib.pyplot as plt
from matplotlib import gridspec as gs

def subplots_hist_2d(x_data, y_data, x_labels, y_labels, titles):
    ## fig, a = plt.subplots(2, 2)
    fig = plt.figure()
    # two columns for the histograms plus a narrow third column reserved for the colorbar
    g = gs.GridSpec(nrows=2, ncols=3, width_ratios=[1,1,0.05])
    a = [fig.add_subplot(g[n,m]) for n in range(2) for m in range(2)]
    cax = fig.add_subplot(g[:,2])
    ## a = a.ravel()
    for idx, ax in enumerate(a):
        image = ax.hist2d(x_data[idx], y_data[idx], bins=50, range=[[-2, 2],[-2, 2]])
        ax.set_title(titles[idx], fontsize=12)
    ## cb = fig.colorbar(image[-1],ax=a)
    cb = fig.colorbar(image[-1], cax=cax)
    cb.set_label("Intensity", rotation=270)
    # pad = how big overall pic is
    # w_pad = how separate they're left to right
    # h_pad = how separate they're top to bottom
    ## plt.tight_layout(pad=-1, w_pad=-10, h_pad=0.5)
Using this modified function, I get the following output:
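If you would rather keep ax.hist2d, a variant of the numpy-first idea is to do one pass with np.histogram2d only to find the largest count over all panels, and then pass vmin/vmax to every hist2d call (hist2d forwards extra keyword arguments to pcolormesh). The sketch below uses made-up normal data, and fig.colorbar(..., ax=list_of_axes) for the shared colorbar; that placement is one common option, not necessarily the one from the linked question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# four made-up datasets with different spreads, so their peak counts differ
datasets = [rng.normal(0.0, s, size=(2, 10000)) for s in (0.3, 0.5, 0.8, 1.2)]
bins, lims = 50, [[-2, 2], [-2, 2]]

# first pass: numpy only, to find the largest bin count over all four histograms
vmax = max(np.histogram2d(x, y, bins=bins, range=lims)[0].max() for x, y in datasets)

fig, axes = plt.subplots(2, 2, constrained_layout=True)
meshes = []
for ax, (x, y) in zip(axes.ravel(), datasets):
    # second pass: hist2d forwards vmin/vmax to pcolormesh, so every panel uses the same scale
    counts, xedges, yedges, mesh = ax.hist2d(x, y, bins=bins, range=lims, vmin=0, vmax=vmax)
    meshes.append(mesh)

# the limits are identical, so any one mesh can drive a colorbar shared by all four axes
cb = fig.colorbar(meshes[-1], ax=axes.ravel().tolist())
cb.set_label("Intensity", rotation=270)
```

Because every QuadMesh has the same color limits, the single colorbar is valid for all four panels.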


Modify position of colorbar so that extend triangle is above plot

So, I have to make a bunch of contourf plots for different days that need to share colorbar ranges. That was easy, but sometimes the maximum value for a given date is above the colorbar range, and that changes the look of the plot in a way I don't want. When that happens, I want the extend triangle to be added above the "original" colorbar, as shown in the attached picture.
I need the code to run automatically: right now I only feed it the data and the colorbar range and it outputs the images, so fitting the colorbar has to be automatic too. I can't add padding by hand because the figure size changes depending on the area being plotted.
I need this behavior because eventually I want to make a .gif, and the colorbar can't move in that short video. The triangle has to be added, when needed, at the top (and bottom) without changing the "main" colorbar.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize, BoundaryNorm
from matplotlib import cm

## Finds the appropriate option for variable "extend" in fig colorbar
def find_extend(vmin, vmax, datamin, datamax):
    #extend{'neither', 'both', 'min', 'max'}
    if datamin >= vmin:
        if datamax <= vmax:
            extend = 'neither'
        else:
            extend = 'max'
    else:
        if datamax <= vmax:
            extend = 'min'
        else:
            extend = 'both'
    return extend

### Creating data
z_1=30*abs(np.random.rand(5, 5))
z_2=37*abs(np.random.rand(5, 5))
data={1:z_1, 2:z_2}

## Colorbar settings and grid: hypothetical values for illustration, not given in the post
vmin, vmax, nlevels = 0, 30, 10
cmap = plt.get_cmap('viridis')
x, y = np.meshgrid(np.arange(5), np.arange(5))

## Plot
for day in [1, 2]:
    z = data[day]
    fig = plt.figure(figsize=(4,4))
    ## Normally figsize=get_figsize(bounds) and bounds is retrieved from gdf.total_bounds
    ## The function creates the figure size based on the x/y ratio of the bounds
    ax = fig.add_subplot(1, 1, 1)
    norm=BoundaryNorm(np.linspace(vmin, vmax, nlevels+1), ncolors=cmap.N)
    # the norm already fixes the color limits, so vmin/vmax are not passed to contourf again
    cs=ax.contourf(x, y, z, cmap=cmap, norm=norm)
    extend=find_extend(vmin, vmax, np.nanmin(z), np.nanmax(z))
    fig.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, extend=extend)
You can do something like this, putting a triangle on top of the colorbar manually:
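import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np

fig, ax = plt.subplots()
pc = ax.pcolormesh(np.random.randn(20, 20))
cb = fig.colorbar(pc)
# triangle vertices in colorbar axes coordinates: the tip sits just above the top edge
trixy = np.array([[0, 1], [1, 1], [0.5, 1.05]])
p = mpatches.Polygon(trixy, transform=cb.ax.transAxes,
                     clip_on=False, edgecolor='k', linewidth=0.7,
                     facecolor='m', zorder=4, snap=True)
cb.ax.add_patch(p)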
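To make the triangle conditional, as the question asks, the manual approach above can be wrapped in a small helper that decides, like find_extend, whether a top and/or bottom triangle is needed. The helper name add_extend_triangles, the viridis colormap, the 0 to 30 range and the data are all hypothetical. The colorbar itself is created without extend, so its size stays the same whether or not a triangle is drawn, assuming no layout engine rearranges the figure afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np

def add_extend_triangles(cb, datamin, datamax, vmin, vmax, cmap):
    """Hypothetical helper: draw extend triangles outside an unchanged colorbar."""
    shapes = []
    if datamax > vmax:
        # tip 5% of the colorbar length above the top edge, filled with the top colour
        shapes.append(([[0, 1], [1, 1], [0.5, 1.05]], cmap(1.0)))
    if datamin < vmin:
        # mirror image below the bottom edge, filled with the bottom colour
        shapes.append(([[0, 0], [1, 0], [0.5, -0.05]], cmap(0.0)))
    added = []
    for xy, facecolor in shapes:
        p = mpatches.Polygon(np.array(xy), transform=cb.ax.transAxes, clip_on=False,
                             edgecolor='k', linewidth=0.7, facecolor=facecolor,
                             zorder=4, snap=True)
        cb.ax.add_patch(p)
        added.append(p)
    return added

vmin, vmax = 0, 30                            # fixed colorbar range shared by all days
cmap = plt.get_cmap("viridis")
z = np.linspace(5, 37, 25).reshape(5, 5)      # made-up data whose maximum exceeds vmax

fig, ax = plt.subplots(figsize=(4, 4))
pc = ax.pcolormesh(z, cmap=cmap, vmin=vmin, vmax=vmax)
cb = fig.colorbar(pc, ax=ax)                  # no extend here, so the bar's length never changes
triangles = add_extend_triangles(cb, z.min(), z.max(), vmin, vmax, cmap)
```

Here only the maximum exceeds the range, so a single triangle is added at the top; data below vmin would add the mirrored one at the bottom.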

How to draw vertical average lines for overlapping histograms in a loop

I'm trying to use matplotlib to draw a vertical mean line for each of two overlapping histograms, using a loop. I have managed to draw the first one, but I don't know how to draw the second one. I'm using two variables from a dataset to draw the histograms: one variable (feat) is categorical (0 or 1), and the other one (objective) is numerical. The code is the following:
for chas in df[feat].unique():
    plt.hist(df.loc[df[feat] == chas, objective], bins = 15, alpha = 0.5, density = True, label = chas)
    plt.axvline(df[objective].mean(), linestyle = 'dashed', linewidth = 2)
plt.legend(loc = 'upper right')
I also need to add the mean and standard deviation of each histogram to the legend.
How can I do it? Thank you in advance.
I recommend using an Axes object to plot your figure. Please see the code below and the Artist tutorial here.
import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility
np.random.seed(19680801)

mu1, sigma1 = 100, 8
mu2, sigma2 = 150, 15
x1 = mu1 + sigma1 * np.random.randn(10000)
x2 = mu2 + sigma2 * np.random.randn(10000)

fig, ax = plt.subplots(1, 1, figsize=(7.2, 7.2))
# the histogram of the data
lbs = ['a', 'b']
colors = ['r', 'g']
for i, x in enumerate([x1, x2]):
    n, bins, patches = ax.hist(x, 50, density=True, facecolor=colors[i], alpha=0.75, label=lbs[i])
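The code above stops after drawing the histograms. Continuing the same Axes-based approach, a sketch for the question's loop could compute each group's mean and standard deviation, put them in the histogram's label, and draw one axvline per group at that group's own mean (the question's code draws the overall mean every time). To stay self-contained it uses NumPy arrays named feat and objective as made-up stand-ins for df[feat] and df[objective]:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# made-up stand-ins for df[feat] (0/1 categories) and df[objective] (numeric values)
feat = np.repeat([0, 1], 500)
objective = np.concatenate([rng.normal(20, 5, 500), rng.normal(30, 8, 500)])

fig, ax = plt.subplots()
for chas in np.unique(feat):
    values = objective[feat == chas]  # plays the role of df.loc[df[feat] == chas, objective]
    # np.std defaults to ddof=0, while pandas' Series.std defaults to ddof=1
    mean, std = values.mean(), values.std()
    _, _, patches = ax.hist(values, bins=15, alpha=0.5, density=True,
                            label=f"{chas}: mean={mean:.2f}, std={std:.2f}")
    # reuse the histogram's colour (without its alpha) for this group's mean line
    ax.axvline(mean, color=patches[0].get_facecolor()[:3], linestyle="dashed", linewidth=2)
ax.legend(loc="upper right")
```

The lines carry no label, so the legend shows only the two histogram entries, each with its mean and standard deviation.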

How to iterate a list of list for a scatter plot and create a legend of unique elements

I have a list, list_of_x_and_y_list, containing x and y values, which looks like this:
[[(44800, 14888), (132000, 12500), (40554, 12900)], [(None, 193788), (101653, 78880), (3866, 160000)]]
I have another list, data_name_list = ["data_a","data_b"], so that
"data_a" = [(44800, 14888), (132000, 12500), (40554, 12900)]
"data_b" = [(None, 193788), (101653, 78880), (3866, 160000)]
The length of list_of_x_and_y_list (and of data_name_list) is > 20.
How can I create a scatter plot in which all points of each item in data_name_list have the same colour, with one legend entry per item?
What I have tried:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax = plt.axes(facecolor='#FFFFFF')
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']
for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    for x_and_y in x_and_y_list:
        x, y = x_and_y
        ax.scatter(x, y, label=data_name, color=color)  # "label=data_name" creates
                                                        # a huge list as a legend!
                                                        # :(
plt.title('Matplot scatter plot')
file_name = "3kstc.png"
fig.savefig(file_name, dpi=fig.dpi)
print("Generated: {}".format(file_name))
The Problem:
The legend appears as a very long list, which I don't know how to fix:
Relevant Research:
Matplotlib scatterplot
Scatter Plot
Scatter plot in Python using matplotlib
The reason you get a long, repeated list as a legend is that you are providing each point as a separate series, and matplotlib does not automatically group your data based on the labels.
A quick fix is to iterate over the list and zip together the x-values and the y-values of each series into two tuples, so that the x tuple contains all the x-values and the y tuple all the y-values.
Then you can feed these tuples to ax.scatter together with the labels.
I felt that names like list_of_x_and_y_list were unnecessarily long and complicated, so in my code I've used shorter names.
import matplotlib.pyplot as plt

data_series = [[(44800, 14888), (132000, 12500), (40554, 12900)],
               [(None, 193788), (101653, 78880), (3866, 160000)]]
data_names = ["data_a","data_b"]

fig = plt.figure()
# set the facecolor here instead of creating a second, overlapping Axes with plt.axes()
ax = fig.add_subplot(1, 1, 1, facecolor='#FFFFFF')
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']
for data, data_name, color in zip(data_series, data_names, colors):
    # turn [(x0, y0), (x1, y1), ...] into (x0, x1, ...) and (y0, y1, ...)
    x, y = zip(*data)
    ax.scatter(x, y, label=data_name, color=color)
plt.title('Matplot scatter plot')
ax.legend()
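To see what zip(*data) does to one series from the question, here it is on data_a alone: the star unpacks the list into separate (x, y) tuples, and zip then groups all first elements and all second elements together.

```python
# one series from the question, as a list of (x, y) pairs
data = [(44800, 14888), (132000, 12500), (40554, 12900)]

# zip(*data) is zip((44800, 14888), (132000, 12500), (40554, 12900))
x, y = zip(*data)
print(x)  # (44800, 132000, 40554)
print(y)  # (14888, 12500, 12900)
```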
To only get one entry per data_name, you should add data_name as a label only once; the remaining calls should use label=None.
The simplest way to achieve this with the current code is to set data_name to None at the end of the inner loop:
from matplotlib import pyplot as plt
from random import randint

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# create some random data, suppose the sublists have different lengths
list_of_x_and_y_list = [[(randint(1000, 4000), randint(2000, 5000)) for col in range(randint(2, 10))]
                        for row in range(10)]
data_name_list = list('abcdefghij')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    for x_and_y in x_and_y_list:
        x, y = x_and_y
        ax.scatter(x, y, label=data_name, color=color)
        # only the first point of each series keeps the label
        data_name = None
ax.legend()
Some things can be simplified, making the code 'more pythonic', for example:
for x_and_y in x_and_y_list:
    x, y = x_and_y
can be written as:
for x, y in x_and_y_list:
Another issue is that, with a lot of data, calling scatter for every point can be rather slow. All the x and y values belonging to the same list can be plotted together, for example using list comprehensions:
for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    xs = [x for x, y in x_and_y_list]
    ys = [y for x, y in x_and_y_list]
    ax.scatter(xs, ys, label=data_name, color=color)
scatter can even take a list of colors, one per point, but plotting all the points in one go wouldn't allow a label per data_name.
Very often, numpy is used to store numerical data. This has some advantages, such as vectorization for quick calculations. With numpy the code would look like:
import numpy as np
for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    xys = np.array(x_and_y_list)
    ax.scatter(xys[:,0], xys[:,1], label=data_name, color=color)
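A different technique, if you prefer to keep one scatter call per point: remove the duplicate legend entries afterwards by keeping one handle per label. This is a sketch with small made-up data in the question's nested-list layout, using Axes.get_legend_handles_labels:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# small made-up data in the question's layout
list_of_x_and_y_list = [[(1, 2), (3, 4), (5, 6)], [(2, 1), (4, 3)]]
data_name_list = ["data_a", "data_b"]
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

fig, ax = plt.subplots()
for x_and_y_list, data_name, color in zip(list_of_x_and_y_list, data_name_list, colors):
    for x, y in x_and_y_list:
        ax.scatter(x, y, label=data_name, color=color)  # every point still carries a label

handles, labels = ax.get_legend_handles_labels()
# a dict keeps one handle per label (the last one seen), in first-seen label order
unique = dict(zip(labels, handles))
legend = ax.legend(list(unique.values()), list(unique.keys()))
```

This leaves the plotting loop untouched, at the cost of still creating one collection per point.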

matplotlib, add legend for each line? [duplicate]

TL;DR -> How can one create a legend for a line graph in Matplotlib's PyPlot without creating any extra variables?
Please consider the graphing script below:
if __name__ == '__main__':
    PyPlot.plot(total_lengths, sort_times_bubble, 'b-',
                total_lengths, sort_times_ins, 'r-',
                total_lengths, sort_times_merge_r, 'g+',
                total_lengths, sort_times_merge_i, 'p-', )
    PyPlot.title("Combined Statistics")
    PyPlot.xlabel("Length of list (number)")
    PyPlot.ylabel("Time taken (seconds)")
As you can see, this is a very basic use of matplotlib's PyPlot. This ideally generates a graph like the one below:
Nothing special, I know. However, it is unclear which data is plotted where (I'm plotting the data of some sorting algorithms, length against time taken, and I'd like to make sure people know which line is which). Thus, I need a legend. However, take a look at the following example (from the official site):
import matplotlib.pyplot as plt

ax = plt.subplot(1,1,1)
p1, = ax.plot([1,2,3], label="line 1")
p2, = ax.plot([3,2,1], label="line 2")
p3, = ax.plot([2,3,1], label="line 3")
handles, labels = ax.get_legend_handles_labels()
# reverse the order
ax.legend(handles[::-1], labels[::-1])
# or sort them by labels
import operator
hl = sorted(zip(handles, labels),
            key=operator.itemgetter(1))
handles2, labels2 = zip(*hl)
ax.legend(handles2, labels2)
You will see that I need to create an extra variable ax. How can I add a legend to my graph without having to create this extra variable and retaining the simplicity of my current script?
Add a label= to each of your plot() calls, and then call legend(loc='upper left').
Consider this sample (tested with Python 3.8.0):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 20, 1000)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, "-b", label="sine")
plt.plot(x, y2, "-r", label="cosine")
plt.legend(loc="upper left")
plt.ylim(-1.5, 2.0)
Slightly modified from this tutorial:
You can access the Axes instance (ax) with plt.gca(). In this case, you can use plt.gca().legend().
You can do this either by using the label= keyword in each of your plt.plot() calls or by assigning your labels as a tuple or list within legend, as in this working example:
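import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-0.75,1,100)
y0 = np.exp(2 + 3*x - 7*x**3)
y1 = 7-4*np.sin(4*x)
plt.plot(x, y0, x, y1)
# labels passed as a tuple, in the same order as the plotted lines
plt.gca().legend(('y0', 'y1'))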
However, if you need to access the Axes instance more than once, I do recommend saving it to the variable ax with
ax = plt.gca()
and then calling ax instead of plt.gca().
Here's an example to help you out ...
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(111)
ax.set_title('ADR vs Rating (CS:GO)')
plt.plot(data[:,0], m*data[:,0] + b, color='red', label='Our Fitting')
plt.legend()
You can add a custom legend (see the documentation):
first = [1, 2, 4, 5, 4]
second = [3, 4, 2, 2, 3]
plt.plot(first, 'g--', second, 'r--')
plt.legend(['First List', 'Second List'], loc='upper left')
A simple plot of sine and cosine curves with a legend,
using matplotlib.pyplot:
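import math
import matplotlib.pyplot as plt
# x from -3.14 to 3.13 in steps of 0.01
x = [i / 100 for i in range(-314, 314)]
ysin = [math.sin(i) for i in x]
ycos = [math.cos(i) for i in x]
plt.plot(x, ysin, label='sin(x)')  #specify label for the corresponding curve
plt.plot(x, ycos, label='cos(x)')
plt.legend()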
Add a label to each plot call, corresponding to the series it is graphing, e.g. label = "series 1".
Then simply add PyPlot.legend() at the bottom of your script, and the legend will display these labels.
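Applied to the question's own script, that could look like the sketch below, with one PyPlot.plot call per series so each gets its own label. The timing numbers are made up and the label strings are guessed from the variable names. Note that 'p' in a format string is the pentagon marker rather than a colour, so the sketch uses 'm-' (magenta) for the fourth line:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as PyPlot

# made-up timings standing in for the question's measurements
total_lengths = [10, 100, 1000]
sort_times_bubble = [0.001, 0.05, 4.8]
sort_times_ins = [0.001, 0.03, 2.9]
sort_times_merge_r = [0.001, 0.004, 0.05]
sort_times_merge_i = [0.001, 0.003, 0.04]

# one plot call per series, each with its own label
PyPlot.plot(total_lengths, sort_times_bubble, 'b-', label="bubble sort")
PyPlot.plot(total_lengths, sort_times_ins, 'r-', label="insertion sort")
PyPlot.plot(total_lengths, sort_times_merge_r, 'g+', label="merge sort (recursive)")
PyPlot.plot(total_lengths, sort_times_merge_i, 'm-', label="merge sort (iterative)")
PyPlot.title("Combined Statistics")
PyPlot.xlabel("Length of list (number)")
PyPlot.ylabel("Time taken (seconds)")
legend = PyPlot.legend(loc="upper left")
```

No Axes variable is needed: PyPlot.legend() collects the labels from the current Axes.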

Plotting multiple matplotlib axes class object

My problem is the following: I'm trying to plot 6 different design matrices in a readable way.
The function creating the display for a design matrix is part of the nipy module and is described as follows:
class nipy.modalities.fmri.design_matrix.DesignMatrix
Function show(): Visualization of a design matrix
Parameters:
    rescale: bool, optional,
        rescale columns magnitude for visualization or not.
    ax: axis handle, optional
        Handle to axis onto which we will draw design matrix.
    cmap: colormap, optional
        Matplotlib colormap to use, passed to imshow.
Returns:
    ax: axis handle
Basically, I'm trying to make a figure of subplots with 3 rows and 2 columns, showing the 6 different matrices.
n_scans = 84
tr = 7
hrf_models = ['canonical', 'canonical with derivative', 'fir', 'spm', 'spm_time', 'spm_time_dispersion']
drift_model = 'cosine'
frametimes = np.arange(0, n_scans * tr,tr)
hfcut = 128
fig1 = plt.figure()

ax1 = fig1.add_subplot(3, 2, 1)
hrf_model = hrf_models[0]
design_matrix = make_dmtx(frametimes, paradigm, hrf_model=hrf_model, drift_model=drift_model, hfcut=hfcut)
ax1 = design_matrix.show()
ax1.set_position([.05, .25, .9, .65])
ax1.set_title('Design matrix with {} as hrf_model'.format(hrf_model))

ax2 = fig1.add_subplot(3, 2, 2)
hrf_model = hrf_models[1]
design_matrix = make_dmtx(frametimes, paradigm, hrf_model=hrf_model, drift_model=drift_model, hfcut=hfcut)
ax2 = design_matrix.show()
ax2.set_position([.05, .25, .9, .65])
ax2.set_title('Design matrix with {} as hrf_model'.format(hrf_model))

# ... the same block is repeated for ax3 to ax5 ...

ax6 = fig1.add_subplot(3, 2, 6)
hrf_model = hrf_models[5]
design_matrix = make_dmtx(frametimes, paradigm, hrf_model=hrf_model, drift_model=drift_model, hfcut=hfcut)
ax6 = design_matrix.show()
ax6.set_position([.05, .25, .9, .65])
ax6.set_title('Design matrix with {} as hrf_model'.format(hrf_model))
Currently, the output is a figure of 3 rows and 2 columns with blank graphs on it, followed by each design matrix displayed individually below.
Moreover, a loop over the list hrf_models would be much better than repeating the same block 6 times. I did that at some point, but sadly the output was exactly the same.
Current output (you need to scroll to see all the design matrices):
Thanks for the help!
Essentially, the excerpt from the docstring you put in the question already tells you the solution: you need to use the ax argument of design_matrix.show():
ax1 = fig1.add_subplot(3, 2, 1)
design_matrix = make_dmtx(...)
design_matrix.show(ax = ax1)
To use a loop, you may produce all axes first and then loop over them.
fig, axes = plt.subplots(nrows=3,ncols=2)
for i, ax in enumerate(axes.flatten()):
    hrf_model = hrf_models[i]
    design_matrix = make_dmtx(frametimes, paradigm, hrf_model=hrf_model,
                              drift_model=drift_model, hfcut=hfcut)
    # draw into the existing subplot instead of letting show() open a new figure
    design_matrix.show(ax = ax)
Note that I haven't tested anything here because I don't have nipy available.
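Since nipy isn't assumed here, the same pattern can be illustrated with a hypothetical stand-in, show_matrix, that follows the docstring's convention: it draws into the ax it is given and only opens a new figure when ax is None (which matches the separate figures seen in the question). The matrices are random placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def show_matrix(matrix, ax=None, cmap="gray"):
    """Hypothetical stand-in for DesignMatrix.show(): draw into `ax` if given."""
    if ax is None:
        # no axes passed: fall back to a fresh figure
        _, ax = plt.subplots()
    ax.imshow(matrix, aspect="auto", interpolation="nearest", cmap=cmap)
    return ax

rng = np.random.default_rng(0)
matrices = [rng.random((84, k + 3)) for k in range(6)]  # placeholders for the six design matrices

fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(8, 10))
for i, ax in enumerate(axes.flatten()):
    returned = show_matrix(matrices[i], ax=ax)
    ax.set_title("matrix {}".format(i))
```

All six matrices end up in the one 3x2 figure, because each call receives its target Axes.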
