Seaborn: lining up 2 distplots on same axes - python-3.x

I'm trying to create 2 distplots that overlap on same axes, but they seem to be offset. How can I adjust this so that their overlapping is exact? Please see the image link below for the issue I have.
plt.figure(figsize=(10,8))
ax1 = sns.distplot(loans['fico'][loans['credit.policy']==1], bins= 10, kde=False, hist_kws=dict(edgecolor='k', lw=1))
ax2 = sns.distplot(loans['fico'][loans['credit.policy']==0], bins= 10, color='Red', kde=False, hist_kws=dict(edgecolor='k', lw=1))
ax1.set_xlim([600, 850])
ax2.set_xlim([600, 850])
Problematic result

The plots aren't lining up because Seaborn (well, Matplotlib behind the scenes) is working out the best way to give you ten bins for each set of data you pass to it. But the two sets might not have the same range.
You can provide a sequence as the bins argument, which defines the edges of the bins. Assuming you have numpy available you can use its linspace function to easily create this sequence from the smallest and largest values in your data.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10,8))
# 11 shared edges give 10 bins that span the full range of both groups
bins = np.linspace(min(loans['fico']), max(loans['fico']), num=11)
ax1 = sns.distplot(loans['fico'][loans['credit.policy']==1], bins=bins,
                   kde=False, hist_kws=dict(edgecolor='k', lw=1))
ax2 = sns.distplot(loans['fico'][loans['credit.policy']==0], bins=bins,
                   color='Red', kde=False, hist_kws=dict(edgecolor='k', lw=1))
And then you shouldn't need to set the x limits.
An example with some randomly generated values:
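The original figure is not reproduced here, so the following is only a minimal sketch of such an example. It assumes two made-up samples (`group_a` and `group_b` are hypothetical names, not the `loans` data) and calls Matplotlib's `hist` directly instead of `sns.distplot`, which newer seaborn releases deprecate. The shared `bins` idea is the same.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical samples with different ranges (stand-ins for the two groups)
group_a = rng.normal(710, 35, size=2000)
group_b = rng.normal(680, 30, size=500)

# 11 shared edges -> 10 bins spanning the combined range of both samples
lo = min(group_a.min(), group_b.min())
hi = max(group_a.max(), group_b.max())
bins = np.linspace(lo, hi, num=11)

fig, ax = plt.subplots(figsize=(10, 8))
# ax.hist returns (counts, edges, patches); the edges are what must match
_, edges_a, _ = ax.hist(group_a, bins=bins, edgecolor="k", lw=1, alpha=0.6)
_, edges_b, _ = ax.hist(group_b, bins=bins, color="red", edgecolor="k", lw=1, alpha=0.6)
# fig.savefig("shared_bins.png")  # or plt.show() with an interactive backend
```

Because both calls receive the same edge array, the bars of the two histograms start and end at exactly the same x positions.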


Modify position of colorbar so that extend triangle is above plot

So, I have to make a bunch of contourf plots for different days that need to share colorbar ranges. That was easy to do, but sometimes the maximum value for a given date is above the colorbar range, and that changes the look of the plot in a way I don't need. When that happens, I want the extend triangle to be added above the "original" colorbar. It's clear in the attached picture.
I need the code to run automatically. Right now I only feed in the data and the colorbar range and it outputs the images, so the fitting of the colorbar has to be automatic. I can't hard-code padding values because the figure size changes depending on the area being plotted.
The reason I need this behavior is that eventually I want to make a .gif, and the colorbar can't move during that short video. I need the triangle to be added, when needed, to the top (and bottom) without messing with the "main" colorbar.
Thanks!
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, BoundaryNorm
from matplotlib import cm
###############
## Finds the appropriate option for variable "extend" in fig colorbar
def find_extend(vmin, vmax, datamin, datamax):
    #extend{'neither', 'both', 'min', 'max'}
    if datamin >= vmin:
        if datamax <= vmax:
            extend="neither"
        else:
            extend="max"
    else:
        if datamax <= vmax:
            extend="min"
        else:
            extend="both"
    return extend
###########
vmin=0
vmax=30
nlevels=8
colormap=plt.get_cmap("rainbow")
### Creating data
z_1=30*abs(np.random.rand(5, 5))
z_2=37*abs(np.random.rand(5, 5))
data={1:z_1, 2:z_2}
x=range(5)
y=range(5)
## Plot
for day in [1, 2]:
    fig = plt.figure(figsize=(4,4))
    ## Normally figsize=get_figsize(bounds) and bounds is retrieved from gdf.total_bounds
    ## The function creates the figure size based on the x/y ratio of the bounds
    ax = fig.add_subplot(1, 1, 1)
    norm=BoundaryNorm(np.linspace(vmin, vmax, nlevels+1), ncolors=colormap.N)
    z=data[day]
    cs=ax.contourf(x, y, z, cmap=colormap, norm=norm, vmin=vmin, vmax=vmax)
    extend=find_extend(vmin, vmax, np.nanmin(z), np.nanmax(z))
    fig.colorbar(cm.ScalarMappable(norm=norm, cmap=colormap), ax=ax, extend=extend)
    plt.close(fig)
You can do something like this, putting a triangle on top of the colorbar manually:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
fig, ax = plt.subplots()
pc = ax.pcolormesh(np.random.randn(20, 20))
cb = fig.colorbar(pc)
# triangle corners in colorbar-axes coordinates; y > 1 lies above the bar
trixy = np.array([[0, 1], [1, 1], [0.5, 1.05]])
p = mpatches.Polygon(trixy, transform=cb.ax.transAxes,
                     clip_on=False, edgecolor='k', linewidth=0.7,
                     facecolor='m', zorder=4, snap=True)
cb.ax.add_patch(p)
plt.show()
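To connect this with the question's `find_extend`, here is a rough sketch that adds a triangle at the top, the bottom, or both, depending on the data. `add_extend_triangles` is a hypothetical helper, not a Matplotlib function, and taking the triangle colors from the two ends of the colormap is an assumption. The colorbar itself is created without `extend`, so its main bar should keep the same size in every frame.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
from matplotlib.colors import Normalize


def find_extend(vmin, vmax, datamin, datamax):
    """Same decision as the question's find_extend, written compactly."""
    below, above = datamin < vmin, datamax > vmax
    return {(False, False): "neither", (False, True): "max",
            (True, False): "min", (True, True): "both"}[(below, above)]


def add_extend_triangles(cb, extend, height=0.05):
    """Hypothetical helper: draw triangles just outside the colorbar axes."""
    style = dict(transform=cb.ax.transAxes, clip_on=False,
                 edgecolor="k", linewidth=0.7, zorder=4)
    added = []
    if extend in ("max", "both"):
        top = np.array([[0, 1], [1, 1], [0.5, 1 + height]])
        added.append(mpatches.Polygon(top, facecolor=cb.cmap(1.0), **style))
    if extend in ("min", "both"):
        bottom = np.array([[0, 0], [1, 0], [0.5, -height]])
        added.append(mpatches.Polygon(bottom, facecolor=cb.cmap(0.0), **style))
    for patch in added:
        cb.ax.add_patch(patch)
    return added


vmin, vmax = 0, 30
z = np.linspace(0, 37, 25).reshape(5, 5)  # made-up data that exceeds vmax
norm = Normalize(vmin, vmax)
cmap = plt.get_cmap("rainbow")

fig, ax = plt.subplots(figsize=(4, 4))
ax.pcolormesh(z, cmap=cmap, norm=norm)
# No extend argument here, so the bar has the same geometry for every day
cb = fig.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax)
extend = find_extend(vmin, vmax, np.nanmin(z), np.nanmax(z))
added = add_extend_triangles(cb, extend)
```

The triangles are drawn outside the colorbar axes, so a layout step such as `tight_layout` may not reserve room for them; leaving some margin above and below the bar is probably needed.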

Legend overwritten by plot - matplotlib

I have a plot that looks as follows:
I want to put labels for both the lineplot and the markers in red. However, the legend is not appearing because the plot is taking up its space.
Update
it turns out I cannot put several strings in plt.legend()
I made the figure bigger by using the following:
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
However, now I have only one label in the legend, with the marker appearing on the line, whereas I want two: one for the marker alone and another for the line alone:
Updated code:
plt.plot(range(len(y)), y, '-bD', c='blue', markerfacecolor='red', markeredgecolor='k', markevery=rare_cases, label='%s' % target_var_name)
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
# changed this over here
plt.legend()
plt.savefig(output_folder + fig_name)
plt.close()
What you want to do (have two labels for a single object) is not completely impossible but it's MUCH easier to plot separately the line and the rare values, e.g.
# boilerplate
import numpy as np
import matplotlib.pyplot as plt
# synthesize some data
N = 501
t = np.linspace(0, 10, N)
s = np.sin(np.pi*t)
rare = np.zeros(N, dtype=bool); rare[:20]=True; np.random.shuffle(rare)
plt.plot(t, s, label='Curve')
plt.scatter(t[rare], s[rare], label='rare')
plt.legend()
plt.show()
Update
[...] it turns out I cannot put several strings in plt.legend()
Well, you can, as long as ① the several strings are in an iterable (a tuple or a list) and ② the number of strings (i.e., labels) equals the number of artists (i.e., thingies) in the plot.
plt.legend(('a', 'b', 'c'))
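For the "not completely impossible" route, one option is to keep the single plot call with `markevery` and build the legend from proxy artists. This is only a sketch: the data, the indices in `rare_cases`, and the handle names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

# Synthetic stand-ins for the question's y and rare_cases
y = np.sin(np.linspace(0, 10, 200))
rare_cases = [10, 50, 120, 180]  # indices of the points to mark

fig, ax = plt.subplots()
ax.plot(range(len(y)), y, "-D", color="blue", markerfacecolor="red",
        markeredgecolor="k", markevery=rare_cases)

# Proxy artists: never added to the axes, used only as legend entries
curve_handle = Line2D([], [], color="blue", label="target")
rare_handle = Line2D([], [], linestyle="none", marker="D",
                     markerfacecolor="red", markeredgecolor="k",
                     label="rare cases")
legend = ax.legend(handles=[curve_handle, rare_handle])
```

Plotting the line and the rare points separately, as above, is still simpler; the proxy handles mainly help when the single plot call has to stay.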

How to mimic the Draw('same') from ROOT with matplotlib

I have a use case from ROOT that I have not been able to reproduce with matplotlib. Here is a minimal example
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
dist1x = np.random.normal(5, 0.05, 10_000)
dist1y = np.random.normal(5, 0.05, 10_000)
dist2x = np.random.normal(15, 0.05, 10_000)
dist2y = np.random.normal(15, 0.05, 10_000)
ax.hist2d(dist1x, dist1y, bins=100, cmap='viridis')
ax.hist2d(dist2x, dist2y, bins=100, cmap='viridis')
plt.show()
and the output is
With ROOT one can do:
TCanvas *c1 = new TCanvas("c1","c1");
TH1D *h1 = new TH1D("h1","h1",500,-5,5);
h1->FillRandom("gaus");
TH1D *h2 = new TH1D("h2","h2",500,-5,5);
h2->FillRandom("gaus");
h1->Draw();
h2->Draw("SAME");
and the two histograms will share the canvas, axes, etc. Why does plotting the two histograms in the same figure show only the last one? How can I reproduce the ROOT behavior?
I think the intended behavior is to draw the sum of both histograms. You can do this by concatenating the arrays before plotting:
ax.hist2d(np.concatenate([dist1x, dist2x]),
          np.concatenate([dist1y, dist2y]),
          bins=100, cmap='viridis')
(I've modified the number a bit, to make sure the two blobs overlap.)
The default behavior in ROOT for SAME with TH2F is probably not desirable.
The second histogram is drawn over the other, overwriting the fill color of the bins. The information from the first histogram is discarded in every cell if there is at least one event from the second histogram.
To reproduce this behavior, I'd suggest using numpy.histogram2d. Set the bins of the first histogram to zero if there are entries in the second one, and then plot the sum of both.
bins = np.linspace(0, 20, 100), np.linspace(0, 20, 100)
hist1, _, _ = np.histogram2d(dist1x, dist1y, bins=bins)
hist2, _, _ = np.histogram2d(dist2x, dist2y, bins=bins)
hist1[hist2 > 0] = 0
sum_hist = hist1 + hist2
# histogram2d puts x along the first array axis; pcolormesh expects it along the second
plt.pcolormesh(*bins, sum_hist.T)
If the two histograms don't have any populated bin in common, the two behaviors are identical.
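That last claim can be checked numerically without plotting. The sketch below uses NumPy only; `compare` and `blob` are hypothetical helper names, and the blob positions are chosen just to produce one case without shared bins and one with.

```python
import numpy as np


def compare(h1, h2):
    """Return (bins populated in both, whether plain sum equals overwrite)."""
    plain_sum = h1 + h2
    # The second histogram wins wherever it has at least one entry
    overwritten = np.where(h2 > 0, h2, h1)
    shared = int(np.count_nonzero((h1 > 0) & (h2 > 0)))
    return shared, bool(np.array_equal(plain_sum, overwritten))


rng = np.random.default_rng(0)
edges = np.linspace(0, 20, 100)


def blob(center):
    """2D histogram of a narrow Gaussian blob on the shared bin edges."""
    x = rng.normal(center, 0.05, 10_000)
    y = rng.normal(center, 0.05, 10_000)
    hist, _, _ = np.histogram2d(x, y, bins=(edges, edges))
    return hist


far = compare(blob(5), blob(15))     # well separated blobs
near = compare(blob(5), blob(5.05))  # overlapping blobs
print(far, near)
```

For the separated blobs no bin is populated in both histograms and the two results agree; for the overlapping blobs the shared bins make them differ.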

How to plot fill_betweenx to fill the area between y1 and y2 with different scales using matplotlib.pyplot?

I am trying to fill the area between two vertical curves (RHOB and NPHI) using matplotlib.pyplot. RHOB and NPHI have different x-axis scales.
But when I try to plot, I notice that fill_betweenx fills the area between RHOB and NPHI as if they were on the same scale.
#well_data is the data frame i am reading to get my data
#creating my subplot
fig, ax=plt.subplots(1,2,figsize=(8,6),sharey=True)
ax[0].get_xaxis().set_visible(False)
ax[0].invert_yaxis()
#subplot 1:
#ax01 to house the NPHI curve (NPHI curve are having values between 0-45)
ax01=ax[0].twiny()
ax01.set_xlim(-15,45)
ax01.invert_xaxis()
ax01.set_xlabel('NPHI',color='blue')
ax01.spines['top'].set_position(('outward',0))
ax01.tick_params(axis='x',colors='blue')
ax01.plot(well_data.NPHI,well_data.index,color='blue')
#ax02 to house the RHOB curve (RHOB curve having values between 1.95,2.95)
ax02=ax[0].twiny()
ax02.set_xlim(1.95,2.95)
ax02.set_xlabel('RHOB',color='red')
ax02.spines['top'].set_position(('outward',40))
ax02.tick_params(axis='x',colors='red')
ax02.plot(well_data.RHOB,well_data.index,color='red')
ax03=ax[0].twiny()
ax03.set_xlim(0,50)
ax03.spines['top'].set_position(('outward',80))
ax03.fill_betweenx(well_data.index,well_data.RHOB,well_data.NPHI,alpha=0.5)
plt.show()
Above is the code that I tried, but the end result is not what I expected.
It fills the area between RHOB and NPHI as if RHOB and NPHI were on the same scale.
How can I fill the area between the blue and the red curve?
Since the data are on two different axes, but each artist needs to be on one axes alone, this is hard. What would need to be done here is to calculate all data in a single unit system. You might opt to transform both datasets to display-space first (meaning pixels), then plot those transformed data via fill_betweenx without transforming again (transform=None).
import numpy as np
import matplotlib.pyplot as plt
y = np.linspace(0, 22, 101)
x1 = np.sin(y)/2
x2 = np.cos(y/2)+20
fig, ax1 = plt.subplots()
ax2 = ax1.twiny()
ax1.tick_params(axis="x", colors="C0", labelcolor="C0")
ax2.tick_params(axis="x", colors="C1", labelcolor="C1")
ax1.set_xlim(-1,3)
ax2.set_xlim(15,22)
ax1.plot(x1,y, color="C0")
ax2.plot(x2,y, color="C1")
x1p, yp = ax1.transData.transform(np.c_[x1,y]).T
x2p, _ = ax2.transData.transform(np.c_[x2,y]).T
ax1.autoscale(False)
ax1.fill_betweenx(yp, x1p, x2p, color="C9", alpha=0.4, transform=None)
plt.show()
We might equally opt to transform the data from the second axes to the first. This has the advantage that it's not defined in pixel space and hence circumvents a problem that occurs when the figure size is changed after the figure is created.
x2p, _ = (ax2.transData + ax1.transData.inverted()).transform(np.c_[x2,y]).T
ax1.autoscale(False)
ax1.fill_betweenx(y, x1, x2p, color="grey", alpha=0.4)
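As a self-contained sketch of this second variant, the block below repeats the setup and then compares the transformed values with a plain linear rescale between the two x-ranges. The comparison assumes both axes are linear and occupy the same position (as twin axes do). The names `to_ax1`, `x2_in_ax1`, and `expected` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

y = np.linspace(0, 22, 101)
x1 = np.sin(y) / 2
x2 = np.cos(y / 2) + 20

fig, ax1 = plt.subplots()
ax2 = ax1.twiny()
ax1.plot(x1, y, color="C0")
ax2.plot(x2, y, color="C1")
ax1.set_xlim(-1, 3)
ax2.set_xlim(15, 22)

# ax2 data coordinates -> display coordinates -> ax1 data coordinates
to_ax1 = ax2.transData + ax1.transData.inverted()
x2_in_ax1, _ = to_ax1.transform(np.c_[x2, y]).T

ax1.autoscale(False)
ax1.fill_betweenx(y, x1, x2_in_ax1, color="grey", alpha=0.4)

# For linear twin axes this should match a direct rescale of the x-range
lo1, hi1 = ax1.get_xlim()
lo2, hi2 = ax2.get_xlim()
expected = lo1 + (x2 - lo2) / (hi2 - lo2) * (hi1 - lo1)
```

The transform has to be evaluated after the x limits are final; changing the limits of either axes later would leave the filled region out of date.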

How to subplot two alternate x scales and two alternate y scales for more than one subplot?

I am trying to make a 2x2 subplot, with each of the inner subplots consisting of two x axes and two y axes; the first xy correspond to a linear scale and the second xy correspond to a logarithmic scale. Before assuming this question has been asked before, the matplotlib docs and examples show how to do multiple scales for either x or y but not both. This post on stackoverflow is the closest thing to my question, and I have attempted to use this idea to implement what I want. My attempt is below.
Firstly, we initialize data, ticks, and ticklabels. The idea is that the alternate scaling will have the same tick positions with altered ticklabels to reflect the alternate scaling.
import numpy as np
import matplotlib.pyplot as plt
# xy data (global)
X = np.linspace(5, 13, 9, dtype=int)
Y = np.linspace(7, 12, 9)
# xy ticks for linear scale (global)
dtick = dict(X=X, Y=np.linspace(7, 12, 6, dtype=int))
# xy ticklabels for linear and logarithmic scales (global)
init_xt = 2**dtick['X']
dticklabel = dict(X1=dtick['X'], Y1=dtick['Y']) # linear scale
dticklabel['X2'] = ['{}'.format(init_xt[idx]) if idx % 2 == 0 else '' for idx in range(len(init_xt))] # log_2 scale
dticklabel['Y2'] = 2**dticklabel['Y1'] # log_2 scale
Borrowing from the linked SO post, I will plot the same thing in each of the 4 subplots. Since similar methods are used for both scalings in each subplot, the method is thrown into a for-loop. But we need the row number, column number, and plot number for each.
# 2x2 subplot
# fig.add_subplot(row, col, pnum); corresponding iterables = (irows, icols, iplts)
irows = (1, 1, 2, 2)
icols = (1, 2, 1, 2)
iplts = (1, 2, 1, 2)
ncolors = ('red', 'blue', 'green', 'black')
Putting all of this together, the function to output the plot is below:
def initialize_figure(irows, icols, iplts, ncolors, figsize=None):
    """ """
    fig = plt.figure(figsize=figsize)
    for row, col, pnum, color in zip(irows, icols, iplts, ncolors):
        ax1 = fig.add_subplot(row, col, pnum) # linear scale
        ax2 = fig.add_subplot(row, col, pnum, frame_on=False) # logarithmic scale ticklabels
        ax1.plot(X, Y, '-', color=color)
        # ticks in same positions
        for ax in (ax1, ax2):
            ax.set_xticks(dtick['X'])
            ax.set_yticks(dtick['Y'])
        # remove xaxis xtick_labels and labels from top row
        if row == 1:
            ax1.set_xticklabels([])
            ax2.set_xticklabels(dticklabel['X2'])
            ax1.set_xlabel('')
            ax2.set_xlabel('X2', color='gray')
        # initialize xaxis xtick_labels and labels for bottom row
        else:
            ax1.set_xticklabels(dticklabel['X1'])
            ax2.set_xticklabels([])
            ax1.set_xlabel('X1', color='black')
            ax2.set_xlabel('')
        # linear scale on left
        if col == 1:
            ax1.set_yticklabels(dticklabel['Y1'])
            ax1.set_ylabel('Y1', color='black')
            ax2.set_yticklabels([])
            ax2.set_ylabel('')
        # logarithmic scale on right
        else:
            ax1.set_yticklabels([])
            ax1.set_ylabel('')
            ax2.set_yticklabels(dticklabel['Y2'])
            ax2.set_ylabel('Y2', color='black')
        ax1.tick_params(axis='x', colors='black')
        ax1.tick_params(axis='y', colors='black')
        ax2.tick_params(axis='x', colors='gray')
        ax2.tick_params(axis='y', colors='gray')
        ax1.xaxis.tick_bottom()
        ax1.yaxis.tick_left()
        ax1.xaxis.set_label_position('top')
        ax1.yaxis.set_label_position('right')
        ax2.xaxis.tick_top()
        ax2.yaxis.tick_right()
        ax2.xaxis.set_label_position('top')
        ax2.yaxis.set_label_position('right')
        for ax in (ax1, ax2):
            ax.set_xlim([4, 14])
            ax.set_ylim([6, 13])
    fig.tight_layout()
    plt.show()
    plt.close(fig)
Calling initialize_figure(irows, icols, iplts, ncolors) produces the figure below.
I am applying the same xlim and ylim so I do not understand why the subplots are all different sizes. Also, the axis labels and axis ticklabels are not in the specified positions (since fig.add_subplot(...) indexing starts from 1 instead of 0).
What is my mistake and how can I achieve the desired result?
(In case it isn't clear, I am trying to put the xticklabels and xlabels for the linear scale on the bottom row, the xticklabels and xlabels for the logarithmic scale on the top row, the yticklabels and ylabels for the linear scale on the left side of the left column, and the yticklabels and ylabels for the logarithmic scale on the right side of the right column. The color='black' kwarg corresponds to the linear scale and the color='gray' kwarg corresponds to the logarithmic scale.)
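The irows and icols lists in the code do not serve any purpose. The first two arguments of fig.add_subplot are the number of rows and columns of the grid, not the position of the subplot; only the third argument selects the position. To create 4 subplots in a 2x2 grid you would loop over range(1,5):
for pnum in range(1,5):
    ax1 = fig.add_subplot(2, 2, pnum)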
This might not be the only problem in the code, but as long as the subplots aren't created correctly it's not worth looking further down.
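As a minimal sketch of that loop, the block below creates the 2x2 grid and recovers a 1-based row and column from `pnum`, which the per-row and per-column label logic in the question could then use. The `divmod` step and the titles are illustrative additions and do not address the twin-scale labelling.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(5, 13, 9, dtype=int)
Y = np.linspace(7, 12, 9)
ncolors = ("red", "blue", "green", "black")

fig = plt.figure()
axes = []
for pnum, color in zip(range(1, 5), ncolors):
    # The grid shape (2, 2) stays fixed; only the 1-based index changes
    ax = fig.add_subplot(2, 2, pnum)
    # Recover the 1-based row and column for any row/column specific logic
    row, col = divmod(pnum - 1, 2)
    row, col = row + 1, col + 1
    ax.plot(X, Y, "-", color=color)
    ax.set_title(f"row {row}, col {col}")
    axes.append(ax)
fig.tight_layout()
```

With a fixed grid shape, all four axes get the same size, and `row == 1` or `col == 1` tests then refer to the actual top row and left column.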
