How to mimic the Draw('same') from ROOT with matplotlib

I have a use case from ROOT that I have not been able to reproduce with matplotlib. Here is a minimal example
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
dist1x = np.random.normal(5, 0.05, 10_000)
dist1y = np.random.normal(5, 0.05, 10_000)
dist2x = np.random.normal(15, 0.05, 10_000)
dist2y = np.random.normal(15, 0.05, 10_000)
ax.hist2d(dist1x, dist1y, bins=100, cmap='viridis')
ax.hist2d(dist2x, dist2y, bins=100, cmap='viridis')
plt.show()
and the output is
With ROOT one can do:
TCanvas *c1 = new TCanvas("c1","c1");
TH1D *h1 = new TH1D("h1","h1",500,-5,5);
h1->FillRandom("gaus");
TH1D *h2 = new TH1D("h2","h2",500,-5,5);
h2->FillRandom("gaus");
h1->Draw();
h2->Draw("SAME");
and the two histograms will share the canvas, axes, etc. Why does plotting the two histograms in the same figure only show the last one? How can I reproduce the ROOT behavior?

I think the intended behavior is to draw the sum of both histograms. You can do this by concatenating the arrays before plotting:
ax.hist2d(np.concatenate([dist1x, dist2x]),
np.concatenate([dist1y, dist2y]),
bins=100, cmap='viridis')
(I've modified the numbers a bit to make sure the two blobs overlap.)
The default behavior in ROOT for SAME with TH2F is probably not desirable.
The second histogram is drawn over the other, overwriting the fill color of the bins. The information from the first histogram is discarded in every cell if there is at least one event from the second histogram.
To reproduce this behavior, I'd suggest using numpy.histogram2d. Set the bins of the first histogram to zero wherever the second one has entries, and then plot the sum of both.
bins = np.linspace(0, 20, 100), np.linspace(0, 20, 100)
hist1, _, _ = np.histogram2d(dist1x, dist1y, bins=bins)
hist2, _, _ = np.histogram2d(dist2x, dist2y, bins=bins)
hist1[hist2 > 0] = 0  # discard cells of the first histogram that the second one populates
sum_hist = hist1 + hist2
# histogram2d indexes its result as [x, y]; pcolormesh expects [y, x], hence the transpose
plt.pcolormesh(*bins, sum_hist.T)
If the two histograms don't have any populated bin in common, the two behaviors are identical.
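The masking step can also be written as a small reusable function. The following is only a sketch of that idea: the names overlay_like_root_same and plot_overlay are hypothetical, and the shift of the two blobs to 9 and 11 in the usage note is an assumption made so that they overlap.

```python
import numpy as np


def overlay_like_root_same(hist_first, hist_second):
    """Combine two 2D histograms so that every cell populated in
    hist_second replaces the corresponding cell of hist_first."""
    hist_first = np.asarray(hist_first, dtype=float)
    hist_second = np.asarray(hist_second, dtype=float)
    # Equivalent to: hist_first[hist_second > 0] = 0; hist_first + hist_second
    return np.where(hist_second > 0, hist_second, hist_first)


def plot_overlay(edges_x, edges_y, combined):
    """Draw the combined histogram (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    # np.histogram2d returns counts indexed as [x, y]; pcolormesh wants [y, x].
    mesh = ax.pcolormesh(edges_x, edges_y, combined.T, cmap="viridis")
    fig.colorbar(mesh, ax=ax)
    return fig, ax
```

A possible use: build both histograms on the same edges, e.g. edges = np.linspace(0, 20, 101), fill them with np.histogram2d from samples centred at 9 and 11, then call plot_overlay(edges, edges, overlay_like_root_same(h1, h2)) followed by plt.show().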

Related

Modify position of colorbar so that extend triangle is above plot

So, I have to make a bunch of contourf plots for different days that need to share colorbar ranges. That was easy, but sometimes the maximum value for a given date is above the colorbar range, and that changes the look of the plot in a way I don't want. When that happens, I want the extend triangle to be added above the "original colorbar", as shown in the attached picture.
I need the code to run automatically: right now I only feed in the data and the colorbar range and it outputs the images, so the fitting of the colorbar has to be automatic. I can't hard-code padding values because the figure size changes depending on the area being plotted.
The reason I need this behavior is that I eventually want to make a .gif, and the colorbar must not move between frames. I need the triangle to be added, when needed, to the top (and bottom) without disturbing the "main" colorbar.
Thanks!
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, BoundaryNorm
from matplotlib import cm
###############
## Finds the appropriate option for variable "extend" in fig colorbar
def find_extend(vmin, vmax, datamin, datamax):
    #extend{'neither', 'both', 'min', 'max'}
    if datamin >= vmin:
        if datamax <= vmax:
            extend="neither"
        else:
            extend="max"
    else:
        if datamax <= vmax:
            extend="min"
        else:
            extend="both"
    return extend
###########
vmin=0
vmax=30
nlevels=8
cmap=plt.get_cmap("rainbow")
### Creating data
z_1=30*abs(np.random.rand(5, 5))
z_2=37*abs(np.random.rand(5, 5))
data={1:z_1, 2:z_2}
x=range(5)
y=range(5)
## Plot
for day in [1, 2]:
    fig = plt.figure(figsize=(4,4))
    ## Normally figsize=get_figsize(bounds) and bounds is retrieved from gdf.total_bounds
    ## The function creates the figure size based on the x/y ratio of the bounds
    ax = fig.add_subplot(1, 1, 1)
    norm=BoundaryNorm(np.linspace(vmin, vmax, nlevels+1), ncolors=cmap.N)
    z=data[day]
    cs=ax.contourf(x, y, z, cmap=cmap, norm=norm, vmin=vmin, vmax=vmax)
    extend=find_extend(vmin, vmax, np.nanmin(z), np.nanmax(z))
    fig.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, extend=extend)
    plt.close(fig)
You can put a triangle on top of the colorbar manually, for example:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
fig, ax = plt.subplots()
pc = ax.pcolormesh(np.random.randn(20, 20))
cb = fig.colorbar(pc)
# triangle vertices in colorbar-axes coordinates; the tip sits just above the bar
trixy = np.array([[0, 1], [1, 1], [0.5, 1.05]])
p = mpatches.Polygon(trixy, transform=cb.ax.transAxes,
                     clip_on=False, edgecolor='k', linewidth=0.7,
                     facecolor='m', zorder=4, snap=True)
cb.ax.add_patch(p)
plt.show()
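To tie this back to the question, the triangle can be added only for the days whose maximum exceeds vmax, while the colorbar itself is always created with extend="neither" so that its axes are built the same way for every frame. This is a minimal sketch under assumptions: the helper names are hypothetical, pcolormesh stands in for the question's contourf, and the triangle colour "m" is a placeholder that should be chosen to match how out-of-range values are drawn.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.colors import BoundaryNorm


def add_over_triangle(cbar, facecolor, height=0.05):
    """Draw a triangle just above a vertical colorbar (axes coordinates)."""
    xy = np.array([[0.0, 1.0], [1.0, 1.0], [0.5, 1.0 + height]])
    patch = mpatches.Polygon(xy, transform=cbar.ax.transAxes, clip_on=False,
                             edgecolor="k", linewidth=0.7,
                             facecolor=facecolor, zorder=4)
    cbar.ax.add_patch(patch)
    return patch


def colorbar_with_optional_triangle(fig, ax, mappable, data_max, vmax,
                                    over_color="m"):
    """Create a fixed colorbar; add the triangle only if data exceed vmax."""
    cbar = fig.colorbar(mappable, ax=ax, extend="neither")
    patch = add_over_triangle(cbar, over_color) if data_max > vmax else None
    return cbar, patch


def plot_day(z, vmin=0, vmax=30, nlevels=8):
    """Plot one day's data with a fixed-range colorbar (hypothetical helper)."""
    cmap = plt.get_cmap("rainbow")
    fig, ax = plt.subplots(figsize=(4, 4))
    norm = BoundaryNorm(np.linspace(vmin, vmax, nlevels + 1), ncolors=cmap.N)
    mesh = ax.pcolormesh(z, cmap=cmap, norm=norm)
    cbar, patch = colorbar_with_optional_triangle(fig, ax, mesh,
                                                  np.nanmax(z), vmax)
    return fig, cbar, patch
```

Because the patch is drawn with clip_on=False outside the colorbar axes, it does not take part in the layout; a triangle for values below vmin could be added the same way with vertices below y = 0.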

How to align twin-axis of datetimes over invisible original axis of floats/ints in imshow?

I would like to show datetimes as ticklabels on the x-axis of a plot via ax.imshow(). I first tried putting the limits (as datetime objects) into extent, but it appears that extent only accepts arguments of type <float/int>. So instead, I would like to create the original plot via ax.imshow(...), then make the x-axis invisible, then add in the correct xticks and xlim.
I found a similar problem solved using a different approach in this example, but I think my use-case is slightly different; I don't need to convert any time-stamps, but I do know the xlim of the data (in terms of datetime objects). Also, I do not think the suggested use of matplotlib.dates.date2num fits my use-case since some of the data is spaced less than one day apart, but date2num uses days as a base-unit.
I am stuck trying to make this work using my alternate approach; a simple mini-example is below.
import numpy as np
import datetime
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
def f(x, y):
    return np.sqrt(np.square(x) + np.square(y))
## SAMPLE DATA
x = np.arange(10) ## elapsed minutes
y = np.square(x) ## arbitrary y-values
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
## DATETIMES FOR ALTERNATE AXIS
lower_dt = datetime.datetime(1999, 1, 1, 0, 0, 0)
# upper_dt = datetime.datetime(2001, 10, 31, 0, 0, 0)
upper_dt = datetime.datetime(1999, 1, 1, x.size-1, 0, 0)
## DO PLOT
fig, ax = plt.subplots()
ax.xaxis.set_visible(False)
# ax.xaxis.tick_top()
ax.imshow(
    Z,
    origin='lower',
    cmap='Oranges',
    norm=Normalize(vmin=np.nanmin(Z), vmax=np.nanmax(Z)),
    extent=(x[0], x[-1], y[0], y[-1]))
## CONVERT XTICKLABELS OF X-AXIS TO DATETIME
mirror_ax = ax.twiny()
# mirror_ax = ax.figure.add_subplot(ax.get_subplotspec(), frameon=False)
mirror_ax.set_xlim([lower_dt, upper_dt])
plt.show()
plt.close(fig)
The obtained plot can be seen here:
I notice that the xticks are shown at the top instead of the bottom of the plot, which is unwanted; using ax.xaxis.tick_top() (commented out above) does not change this. Even worse, the x-axis limits are not retained. I realize I could manually change the xticklabels via ax.get_xticks() and ax.set_xticklabels(...), but I would prefer to leave that to matplotlib's date formatters and date locators.
How can I use the approach outlined above to create a "mirror/alternate" x-axis of datetime units such that this x-axis is the same size/orientation of the "original/invisible" x-axis of float/integer units?
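One alternative worth noting, as a hedged sketch rather than a definitive answer: matplotlib.dates.date2num returns floating-point days, so spacings shorter than a day are represented as fractions of a day and the resulting numbers can be passed to extent directly. The helper names below are hypothetical, and aspect="auto" is an assumption made so that the very narrow x range (in days) does not squash the image.

```python
import datetime
import numpy as np
import matplotlib.dates as mdates


def datetime_extent(lower_dt, upper_dt, y_lower, y_upper):
    """Return an imshow-style extent whose x limits are Matplotlib date numbers."""
    x_lower, x_upper = mdates.date2num([lower_dt, upper_dt])
    return (x_lower, x_upper, y_lower, y_upper)


def plot_with_datetime_axis(Z, lower_dt, upper_dt, y_lower, y_upper):
    """Show Z with a datetime x-axis on the original axes (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    ax.imshow(Z, origin="lower", cmap="Oranges", aspect="auto",
              extent=datetime_extent(lower_dt, upper_dt, y_lower, y_upper))
    ax.xaxis_date()      # treat x values as dates (date locator and formatter)
    fig.autofmt_xdate()  # rotate the tick labels so they do not overlap
    return fig, ax
```

With this approach no twin axis is needed, so the ticks stay at the bottom and the limits come from extent; a custom mdates.DateFormatter can still be set on ax.xaxis afterwards.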

How to align heights and widths subplot axes with gridspec and matplotlib?

I am trying to use matplotlib with gridspec to create a subplot such that the axes are arranged to look similar to the figure below; the figure was taken from this unrelated question.
My attempt at recreating this axes arrangement is below. Specifically, my problem is that the axes are not properly aligned. For example, the axis object for the blue histogram is taller than the axis object for the image with various shades of green; the orange histogram seems to align properly in terms of width, but I attribute this to luck. How can I properly align these axes? Unlike the original figure, I would like to add/pad extra empty space between axes such that their borders do not intersect; the slice notation in the code below does this by adding a blank row/column. (In the interest of not making this post longer than it has to be, I did not make the figures "pretty" by playing with axis ticks and the like.)
Unlike the original picture, the axes are not perfectly aligned. Is there a way to do this without using constrained layout? By this, I mean some derivative of fig, ax = plt.subplots(constrained_layout=True)?
The MWE code to recreate my figure is below; note that there was no difference between ax.imshow(...) and ax.matshow(...).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
## initialize figure and axes
fig = plt.figure()
gs = fig.add_gridspec(6, 6, hspace=0.2, wspace=0.2)
ax_bottom = fig.add_subplot(gs[4:, 2:])
ax_left = fig.add_subplot(gs[:4, :2])
ax_big = fig.add_subplot(gs[:4, 2:])
## generate data
x = np.random.normal(loc=50, scale=10, size=100)
y = np.random.normal(loc=500, scale=50, size=100)
## get singular histograms
x_counts, x_edges = np.histogram(x, bins=np.arange(0, 101, 5))
y_counts, y_edges = np.histogram(y, bins=np.arange(0, 1001, 25))
x_mids = (x_edges[1:] + x_edges[:-1]) / 2
y_mids = (y_edges[1:] + y_edges[:-1]) / 2
## get meshed histogram
sample = np.array([x, y]).T
xy_counts, xy_edges = np.histogramdd(sample, bins=(x_edges, y_edges))
## subplot histogram of x
ax_bottom.bar(x_mids, x_counts,
width=np.diff(x_edges),
color='darkorange')
ax_bottom.set_xlim([x_edges[0], x_edges[-1]])
ax_bottom.set_ylim([0, np.max(x_counts)])
## subplot histogram of y
ax_left.bar(y_mids, y_counts,
width=np.diff(y_edges),
color='steelblue')
ax_left.set_xlim([y_edges[0], y_edges[-1]])
ax_left.set_ylim([0, np.max(y_counts)])
## subplot histogram of xy-mesh
ax_big.imshow(xy_counts,
cmap='Greens',
norm=Normalize(vmin=np.min(xy_counts), vmax=np.max(xy_counts)),
interpolation='nearest',
origin='upper')
plt.show()
plt.close(fig)
EDIT:
One can initialize the axes by explicitly setting width_ratios and height_ratios per row/column; this is shown below. This doesn't affect the output, but maybe I'm using it incorrectly?
from matplotlib import gridspec
## initialize figure and axes
fig = plt.figure()
gs = gridspec.GridSpec(ncols=6, nrows=6, figure=fig, width_ratios=[1]*6, height_ratios=[1]*6)
ax_bottom = fig.add_subplot(gs[4:, 2:])
ax_left = fig.add_subplot(gs[:4, :2])
ax_big = fig.add_subplot(gs[:4, 2:])
The problem is with imshow, which resizes the axes automatically to maintain a square pixel aspect.
You can prevent this by calling:
ax_big.imshow(..., aspect='auto')
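The effect can be checked numerically by comparing axes heights after a draw. This is a small sketch assuming the same 6x6 gridspec as in the question; build_layout is a hypothetical helper and the array of counts is a stand-in for xy_counts (20 x 40 bins).

```python
import numpy as np
import matplotlib.pyplot as plt


def build_layout(aspect):
    """Create the three-axes layout and return (fig, ax_left, ax_big)."""
    fig = plt.figure()
    gs = fig.add_gridspec(6, 6, hspace=0.2, wspace=0.2)
    ax_bottom = fig.add_subplot(gs[4:, 2:])
    ax_left = fig.add_subplot(gs[:4, :2])
    ax_big = fig.add_subplot(gs[:4, 2:])
    counts = np.arange(20 * 40).reshape(20, 40)  # stand-in for xy_counts
    ax_big.imshow(counts, cmap="Greens", interpolation="nearest",
                  origin="upper", aspect=aspect)
    fig.canvas.draw()  # the aspect adjustment is applied at draw time
    return fig, ax_left, ax_big


fig_auto, left_auto, big_auto = build_layout("auto")
fig_equal, left_equal, big_equal = build_layout("equal")
print("auto :", left_auto.get_position().height, big_auto.get_position().height)
print("equal:", left_equal.get_position().height, big_equal.get_position().height)
```

With aspect="auto" the image axes keep the height of their gridspec slot; with "equal" (the default for imshow) the axes box is shrunk to keep the pixels square, which is the misalignment described in the question.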

Matplotlib how to plot 1 colorbar for four 2d histogram

Before I start, I want to say that I've tried following this and this post on the same problem; however, they use imshow heatmaps rather than 2D histograms as I do.
Here is my code(the actual data has been replaced by randomly generated data but the gist is the same):
import matplotlib.pyplot as plt
import numpy as np
def subplots_hist_2d(x_data, y_data, x_labels, y_labels, titles):
    fig, a = plt.subplots(2, 2)
    a = a.ravel()
    for idx, ax in enumerate(a):
        image = ax.hist2d(x_data[idx], y_data[idx], bins=50, range=[[-2, 2],[-2, 2]])
        ax.set_title(titles[idx], fontsize=12)
        ax.set_xlabel(x_labels[idx])
        ax.set_ylabel(y_labels[idx])
        ax.set_aspect("equal")
    cb = fig.colorbar(image[idx])
    cb.set_label("Intensity", rotation=270)
    # pad = how big overall pic is
    # w_pad = how separate they're left to right
    # h_pad = how separate they're top to bottom
    plt.tight_layout(pad=-1, w_pad=-10, h_pad=0.5)
x1, y1 = np.random.uniform(-2, 2, 10000), np.random.uniform(-2, 2, 10000)
x2, y2 = np.random.uniform(-2, 2, 10000), np.random.uniform(-2, 2, 10000)
x3, y3 = np.random.uniform(-2, 2, 10000), np.random.uniform(-2, 2, 10000)
x4, y4 = np.random.uniform(-2, 2, 10000), np.random.uniform(-2, 2, 10000)
x_data = [x1, x2, x3, x4]
y_data = [y1, y2, y3, y4]
x_labels = ["x1", "x2", "x3", "x4"]
y_labels = ["y1", "y2", "y3", "y4"]
titles = ["1", "2", "3", "4"]
subplots_hist_2d(x_data, y_data, x_labels, y_labels, titles)
And this is what it's generating:
So now my problem is that I could not for the life of me make the colorbar apply for all 4 of the histograms. Also for some reason the bottom right histogram seems to behave weirdly compared with the others. In the links that I've posted their methods don't seem to use a = a.ravel() and I'm only using it here because it's the only way that allows me to plot my 4 histograms as subplots. Help?
EDIT:
Thomas Kuhn, your new method actually solved all of my problems until I put my labels down and tried to use plt.tight_layout() to sort out the overlaps. It seems that if I pass specific parameters, as in plt.tight_layout(pad=i, w_pad=0, h_pad=0), the colorbar starts to misbehave. I'll now explain my problem.
I have made some changes to your new method so that it suits what I want, like this
def test_hist_2d(x_data, y_data, x_labels, y_labels, titles):
    nrows, ncols = 2, 2
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True)
    ##produce the actual data and compute the histograms
    mappables=[]
    for (i, j), ax in np.ndenumerate(axes):
        H, xedges, yedges = np.histogram2d(x_data[i][j], y_data[i][j], bins=50, range=[[-2, 2],[-2, 2]])
        ax.set_title(titles[i][j], fontsize=12)
        ax.set_xlabel(x_labels[i][j])
        ax.set_ylabel(y_labels[i][j])
        ax.set_aspect("equal")
        mappables.append(H)
    ##the min and max values of all histograms
    vmin = np.min(mappables)
    vmax = np.max(mappables)
    ##second loop for visualisation
    for ax, H in zip(axes.ravel(), mappables):
        im = ax.imshow(H,vmin=vmin, vmax=vmax, extent=[-2,2,-2,2])
    ##colorbar using solution from linked question
    fig.colorbar(im,ax=axes.ravel())
    plt.show()
    # plt.tight_layout
    # plt.tight_layout(pad=i, w_pad=0, h_pad=0)
Now if I try to generate my data, in this case:
phi, cos_theta = get_angles(runs)
detector_x1, detector_y1, smeared_x1, smeared_y1 = detection_vectorised(1.5, cos_theta, phi)
detector_x2, detector_y2, smeared_x2, smeared_y2 = detection_vectorised(1, cos_theta, phi)
detector_x3, detector_y3, smeared_x3, smeared_y3 = detection_vectorised(0.5, cos_theta, phi)
detector_x4, detector_y4, smeared_x4, smeared_y4 = detection_vectorised(0, cos_theta, phi)
Here detector_x, detector_y, smeared_x, smeared_y are all lists of data points.
So now I put them into 2x2 lists so that they can be unpacked suitably by my plotting function, as such:
data_x = [[detector_x1, detector_x2], [detector_x3, detector_x4]]
data_y = [[detector_y1, detector_y2], [detector_y3, detector_y4]]
x_labels = [["x positions(m)", "x positions(m)"], ["x positions(m)", "x positions(m)"]]
y_labels = [["y positions(m)", "y positions(m)"], ["y positions(m)", "y positions(m)"]]
titles = [["0.5m from detector", "1.0m from detector"], ["1.5m from detector", "2.0m from detector"]]
I now run my code with
test_hist_2d(data_x, data_y, x_labels, y_labels, titles)
with just plt.show() turned on, it gives this:
which is great because, data-wise and visually, it is exactly what I want, i.e. the colormap corresponds to all 4 histograms. However, since the labels overlap with the titles, I thought I would run the same thing with plt.tight_layout(pad=a, w_pad=b, h_pad=c), hoping to fix the overlapping labels. This time, no matter how I change the numbers a, b and c, the colorbar always ends up lying on the second column of graphs, like this:
Changing a only makes the overall subplots bigger or smaller, and the best I could do was plt.tight_layout(pad=-10, w_pad=-15, h_pad=0), which looks like this
So it seems that whatever your new method is doing, it made the whole plot lose its adjustability. Your solution, as wonderful as it is at solving one problem, created another in return. So what would be the best thing to do here?
Edit 2:
Using fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True, constrained_layout=True) along with plt.show() gives
As you can see, there's still a vertical gap between the columns of subplots that not even plt.subplots_adjust() can get rid of.
Edit:
As has been noted in the comments, the biggest problem here is actually making the colorbar meaningful for many histograms, as ax.hist2d will always scale the histogram data it receives from numpy. It may therefore be best to first calculate the 2d histogram data using numpy and then use imshow to visualise it. This way, the solutions of the linked question can also be applied. To make the problem with the normalisation more visible, I put some effort into producing some qualitatively different 2d histograms using scipy.stats.multivariate_normal, which shows how the height of the histogram can change quite dramatically even though the number of samples is the same in each figure.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec as gs
from scipy.stats import multivariate_normal
##opening figure and axes
nrows=3
ncols=3
fig, axes = plt.subplots(nrows,ncols)
##generate some random data for the distributions
means = np.random.rand(nrows,ncols,2)
sigmas = np.random.rand(nrows,ncols,2)
thetas = np.random.rand(nrows,ncols)*np.pi*2
##produce the actual data and compute the histograms
mappables=[]
for mean,sigma,theta in zip( means.reshape(-1,2), sigmas.reshape(-1,2), thetas.reshape(-1)):
    ##the data (only cosmetics):
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array(((c,-s), (s, c)))
    ##rotate the diagonal covariance so that it stays symmetric
    cov = rot@np.diag(sigma)@rot.T
    rv = multivariate_normal(mean,cov)
    data = rv.rvs(size = 10000)
    ##the 2d histogram from numpy
    H,xedges,yedges = np.histogram2d(data[:,0], data[:,1], bins=50, range=[[-2, 2],[-2, 2]])
    mappables.append(H)
##the min and max values of all histograms
vmin = np.min(mappables)
vmax = np.max(mappables)
##second loop for visualisation
for ax,H in zip(axes.ravel(),mappables):
    im = ax.imshow(H,vmin=vmin, vmax=vmax, extent=[-2,2,-2,2])
##colorbar using solution from linked question
fig.colorbar(im,ax=axes.ravel())
plt.show()
This code produces a figure like this:
Old Answer:
One way to solve your problem is to generate the space for your colorbar explicitly. You can use a GridSpec instance to define how wide your colorbar should be. Below is your subplots_hist_2d() function with a few modifications. Note that your use of tight_layout() shifted the colorbar into a funny place, hence the replacement. If you want the plots closer to each other, I'd recommend adjusting the aspect ratio of the figure instead.
def subplots_hist_2d(x_data, y_data, x_labels, y_labels, titles):
    ## fig, a = plt.subplots(2, 2)
    fig = plt.figure()
    g = gs.GridSpec(nrows=2, ncols=3, width_ratios=[1,1,0.05])
    a = [fig.add_subplot(g[n,m]) for n in range(2) for m in range(2)]
    cax = fig.add_subplot(g[:,2])
    ## a = a.ravel()
    for idx, ax in enumerate(a):
        image = ax.hist2d(x_data[idx], y_data[idx], bins=50, range=[[-2, 2],[-2, 2]])
        ax.set_title(titles[idx], fontsize=12)
        ax.set_xlabel(x_labels[idx])
        ax.set_ylabel(y_labels[idx])
        ax.set_aspect("equal")
    ## cb = fig.colorbar(image[-1],ax=a)
    cb = fig.colorbar(image[-1], cax=cax)
    cb.set_label("Intensity", rotation=270)
    # pad = how big overall pic is
    # w_pad = how separate they're left to right
    # h_pad = how separate they're top to bottom
    ## plt.tight_layout(pad=-1, w_pad=-10, h_pad=0.5)
    fig.tight_layout()
Using this modified function, I get the following output:

Seaborn: lining up 2 distplots on same axes

I'm trying to create 2 distplots that overlap on same axes, but they seem to be offset. How can I adjust this so that their overlapping is exact? Please see the image link below for the issue I have.
plt.figure(figsize=(10,8))
ax1 = sns.distplot(loans['fico'][loans['credit.policy']==1], bins= 10, kde=False, hist_kws=dict(edgecolor='k', lw=1))
ax2 = sns.distplot(loans['fico'][loans['credit.policy']==0], bins= 10, color='Red', kde=False, hist_kws=dict(edgecolor='k', lw=1))
ax1.set_xlim([600, 850])
ax2.set_xlim([600, 850])
Problematic result
The plots aren't lining up because Seaborn (well, Matplotlib behind the scenes) is working out the best way to give you ten bins for each set of data you pass to it. But the two sets might not have the same range.
You can provide a sequence as the bins argument, which defines the edges of the bins. Assuming you have numpy available you can use its linspace function to easily create this sequence from the smallest and largest values in your data.
plt.figure(figsize=(10,8))
# shared bin edges: 11 edges give 10 bins spanning the full range of the data
bins = np.linspace(min(loans['fico']), max(loans['fico']), num=11)
ax1 = sns.distplot(loans['fico'][loans['credit.policy']==1], bins=bins,
                   kde=False, hist_kws=dict(edgecolor='k', lw=1))
ax2 = sns.distplot(loans['fico'][loans['credit.policy']==0], bins=bins,
                   color='Red', kde=False, hist_kws=dict(edgecolor='k', lw=1))
And then you shouldn't need to set the x limits.
An example with some randomly generated values:
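The shared-edges idea can be sketched without the loans data. The helper names below are hypothetical, the two samples in the usage note are made-up stand-ins for the two credit.policy groups, and plain Matplotlib hist is used in place of sns.distplot.

```python
import numpy as np


def shared_bin_edges(*samples, num_bins=10):
    """Return num_bins + 1 equally spaced edges spanning all samples."""
    lowest = min(np.min(s) for s in samples)
    highest = max(np.max(s) for s in samples)
    return np.linspace(lowest, highest, num_bins + 1)


def plot_overlapping(sample_a, sample_b, num_bins=10):
    """Draw two histograms on the same axes with identical bin edges."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    edges = shared_bin_edges(sample_a, sample_b, num_bins=num_bins)
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.hist(sample_a, bins=edges, edgecolor="k", alpha=0.5)
    ax.hist(sample_b, bins=edges, color="red", edgecolor="k", alpha=0.5)
    return fig, ax
```

For instance, with rng = np.random.default_rng(1), calling plot_overlapping(rng.normal(710, 30, 500), rng.normal(680, 40, 300)) should give bars that line up exactly, because both histograms are computed on the same edges.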
