Matplotlib distributing the markers on a semilogx plot - python-3.x

I want to plot a curve on a semilogx scale in Matplotlib. I have two vectors, fpr and tpr, each of size 17874, and I want to show markers on the curve. Since there are too many points, I used markevery=0.1, as shown e.g. in this example from the matplotlib page. However, with markevery set this way, no markers at all appeared in the semilog plot (right panel):
import matplotlib.pyplot as plt
# fpr, tpr: 1-D arrays of length 17874
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(1, 2, 1)
plt.plot(fpr, tpr, marker='o', markevery=0.1)
ax = fig.add_subplot(1, 2, 2)
plt.semilogx(fpr, tpr, marker='o', markevery=0.1)
plt.show()
Then I tried a slice object. As you can see in the plot below, the markers in the left plot are evenly distributed, but in the right semilogx plot they only appear on half of the curve. So I am wondering whether there is a way to use a variable slice interval that fixes this.
My current code is the following:
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(1, 2, 1)
plt.plot(fpr, tpr, marker='o', markevery=slice(0, 20000, 1000))
ax = fig.add_subplot(1, 2, 2)
plt.semilogx(fpr, tpr, marker='o', markevery=slice(0, 20000, 1000))
plt.show()

My suspicion is that the value in the data that appears close to zero actually is zero. Since the logarithm of zero is undefined, that point cannot be plotted on a logarithmic scale. With the first point missing, however, the spacings for markevery cannot be calculated: a float markevery measures distances along the drawn line, and any distance involving the undefined point is undefined as well (just as in "Calculate the mean of NaN and 1", which is impossible).
The solution, of course, is to leave out that point, which cannot be plotted on a log scale anyway:
plt.semilogx(fpr[1:], tpr[1:], marker='o', markevery=0.1)
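A minimal sketch of both ideas, assuming fpr is sorted in ascending order (as ROC false-positive rates usually are) and using made-up data in place of the real 17874-point vectors. Instead of slicing off only the first element, it masks every non-positive x value. It then also shows the "variable slice interval" asked about in the question: markevery accepts a list of integer indices, so the hypothetical helper log_spaced_indices picks the first point at or above each of n targets spaced evenly in log10(x).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def log_spaced_indices(x, n):
    """Hypothetical helper: indices into ascending, strictly positive x.

    For each of n targets spaced evenly in log10(x), return the index of the
    first value at or above that target (duplicates removed).
    """
    targets = np.geomspace(x[0], x[-1], n)
    idx = np.searchsorted(x, targets)      # first index with x[idx] >= target
    idx = np.clip(idx, 0, len(x) - 1)
    return np.unique(idx).tolist()


# Made-up ROC-like data: fpr is ascending and starts at exactly 0
fpr = np.concatenate(([0.0], np.logspace(-4, 0, 999)))
tpr = np.sqrt(fpr)

# Drop every point that cannot exist on a log x-axis (x <= 0)
positive = fpr > 0
fpr_pos, tpr_pos = fpr[positive], tpr[positive]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
# Option 1: float markevery works once the zero is gone
line1, = ax1.semilogx(fpr_pos, tpr_pos, marker='o', markevery=0.1)
ax1.set_title('markevery=0.1 (x <= 0 removed)')
# Option 2: explicit indices, evenly spaced in log(x)
marks = log_spaced_indices(fpr_pos, 10)
line2, = ax2.semilogx(fpr_pos, tpr_pos, marker='o', markevery=marks)
ax2.set_title('markevery=list of log-spaced indices')
fig.canvas.draw()
```

The boolean mask is slightly more robust than fpr[1:] because it also removes any later zeros or negative values, and the index list gives full control over where markers land.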

Related

Modify position of colorbar so that extend triangle is above plot

So, I have to make a bunch of contourf plots for different days that need to share colorbar ranges. That was easy, but sometimes the maximum value for a given date is above the colorbar range, and that changes the look of the plot in a way I don't want. When that happens, I want the extend triangle to be added above the "original" colorbar, as shown in the attached picture.
I need the code to run automatically: right now I only feed in the data and the colorbar range, and it outputs the images. So the fitting of the colorbar needs to be automatic; I can't add padding with fixed numbers because the figure size changes depending on the area being plotted.
I need this behavior because eventually I want to make a .gif, and the colorbar must not move in that short video. I need the triangle to be added, when needed, to the top (and bottom) without altering the "main" colorbar.
Thanks!
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
from matplotlib import cm
###############
## Finds the appropriate option for variable "extend" in fig colorbar
def find_extend(vmin, vmax, datamin, datamax):
    # extend: {'neither', 'both', 'min', 'max'}
    if datamin >= vmin:
        if datamax <= vmax:
            extend = "neither"
        else:
            extend = "max"
    else:
        if datamax <= vmax:
            extend = "min"
        else:
            extend = "both"
    return extend
###########
vmin = 0
vmax = 30
nlevels = 8
cmap = plt.get_cmap("rainbow")  # cm.get_cmap was removed in Matplotlib 3.9
### Creating data
z_1 = 30*abs(np.random.rand(5, 5))
z_2 = 37*abs(np.random.rand(5, 5))
data = {1: z_1, 2: z_2}
x = range(5)
y = range(5)
## Plot
for day in [1, 2]:
    fig = plt.figure(figsize=(4, 4))
    ## Normally figsize=get_figsize(bounds) and bounds is retrieved from gdf.total_bounds
    ## The function creates the figure size based on the x/y ratio of the bounds
    ax = fig.add_subplot(1, 1, 1)
    norm = BoundaryNorm(np.linspace(vmin, vmax, nlevels+1), ncolors=cmap.N)
    z = data[day]
    # norm already encodes vmin/vmax, so they are not passed a second time
    cs = ax.contourf(x, y, z, cmap=cmap, norm=norm)
    extend = find_extend(vmin, vmax, np.nanmin(z), np.nanmax(z))
    fig.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, extend=extend)
    plt.close(fig)
You can do something like this, putting a triangle on top of the colorbar manually:
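import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

fig, ax = plt.subplots()
pc = ax.pcolormesh(np.random.randn(20, 20))
cb = fig.colorbar(pc)
# Triangle in colorbar-axes coordinates: base on the top edge (y=1), tip just above it
trixy = np.array([[0, 1], [1, 1], [0.5, 1.05]])
p = mpatches.Polygon(trixy, transform=cb.ax.transAxes,
                     clip_on=False, edgecolor='k', linewidth=0.7,
                     facecolor='m', zorder=4, snap=True)
cb.ax.add_patch(p)
plt.show()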
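To make this automatic for the per-day loop in the question, one option is to always create the colorbar with the default extend='neither' (so its size is identical in every frame) and then draw the triangles only when the data exceed the range. A sketch under those assumptions; add_extend_triangles and its height parameter are hypothetical, the triangle colors are taken from the two ends of the colormap, and a vertical colorbar is assumed:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches


def add_extend_triangles(cb, extend, height=0.05):
    """Hypothetical helper: draw extend triangles outside a vertical colorbar.

    cb is a Colorbar created with extend='neither', so it never resizes.
    extend is 'neither', 'min', 'max' or 'both'. height is the triangle
    height as a fraction of the colorbar length. Returns the added patches.
    """
    cmap = cb.cmap
    shapes = []
    if extend in ("max", "both"):
        # base on the top edge (y=1), tip above it, top colour of the map
        shapes.append(([[0, 1], [1, 1], [0.5, 1 + height]], cmap(1.0)))
    if extend in ("min", "both"):
        # base on the bottom edge (y=0), tip below it, bottom colour of the map
        shapes.append(([[0, 0], [1, 0], [0.5, -height]], cmap(0.0)))
    added = []
    for xy, color in shapes:
        p = mpatches.Polygon(np.array(xy), closed=True, transform=cb.ax.transAxes,
                             clip_on=False, edgecolor='k', linewidth=0.7,
                             facecolor=color, zorder=4)
        cb.ax.add_patch(p)
        added.append(p)
    return added


vmin, vmax = 0, 30
rng = np.random.default_rng(0)
z = 37 * rng.random((5, 5))
z[0, 0] = 36.0  # force at least one value above vmax, like day 2 in the question

fig, ax = plt.subplots(figsize=(4, 4))
mesh = ax.pcolormesh(z, cmap="rainbow", vmin=vmin, vmax=vmax)
cb = fig.colorbar(mesh, ax=ax)  # extend='neither': fixed size in every frame
fig.canvas.draw()
before = cb.ax.get_position().bounds

# Same decision as find_extend() in the question, written inline
extend = {(False, False): "neither", (False, True): "max",
          (True, False): "min", (True, True): "both"}[
    (bool(z.min() < vmin), bool(z.max() > vmax))]
patches = add_extend_triangles(cb, extend)
fig.canvas.draw()
after = cb.ax.get_position().bounds  # unchanged: the triangles sit outside the axes
```

Because the triangles are unclipped patches outside the colorbar axes, they never take space from the colorbar itself; the trade-off is that the figure must leave a little margin above and below the colorbar so the triangles are not cut off at the figure edge.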

Matplotlib - Scatter Plot, How to fill in the space between each individual point?

I am plotting the following data:
fig, ax = plt.subplots()
im = ax.scatter(std_sorted[:, [1]], std_sorted[:, [2]], s=5, c=std_sorted[:, [0]])
With the following result:
My question is: can I fill the space between the points by interpolating, and then color that interpolated space accordingly, so that I get a uniform plot without any visible points?
So basically I'm looking for this result (This is simply me "squeezing" the above picture to show the desired result and not dealing with the space between the points):
The simplest thing to do in this case is to use a short vertical line as the marker (marker='|') and set the marker size (the s argument of scatter) large enough that no white space is left.
Another option is to use tricontourf to create a filled image of x, y and z.
Note that neither a scatter plot nor tricontourf need the points to be sorted in any order.
If your points do lie on an ordered grid, plt.imshow should give the best result.
Here is some code showing how it could look. First, some dummy data roughly similar to the example are generated. As x and y are random, they don't fill the complete space, which may leave some blank spots in the scatter plot. The spots are nicely interpolated by tricontourf, except possibly in the corners.
import numpy as np
import matplotlib.pyplot as plt
N = 50000
xmin = 0
xmax = 0.20
ymin = -0.01
ymax = 0.01
std_sorted = np.zeros((N, 3))
std_sorted[:,1] = np.random.uniform(xmin, xmax, N)
std_sorted[:,2] = np.random.choice(np.linspace(ymin, ymax, 80), N)
std_sorted[:,0] = np.cos(3*(std_sorted[:,1] - 0.04 - 100*std_sorted[:,2]**2))**10
fig, ax = plt.subplots(ncols=2)
# im = ax[0].scatter(std_sorted[:, 1], std_sorted[:, 2], s=20, c=std_sorted[:, 0], marker='|')
im = ax[0].scatter(std_sorted[:, 1], std_sorted[:, 2], s=5, c=std_sorted[:, 0], marker='.')
ax[0].set_xlim(xmin, xmax)
ax[0].set_ylim(ymin, ymax)
ax[0].set_title("scatter plot")
ax[1].tricontourf(std_sorted[:, 1], std_sorted[:, 2], std_sorted[:, 0], 256)
ax[1].set_title("tricontourf")
plt.tight_layout()
plt.show()
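For the imshow route, the points first have to be rearranged into a 2-D array. A minimal sketch, assuming the points cover every (x, y) combination of a regular grid exactly once (in any row order) and keeping the question's column layout (value, x, y); points_to_grid is a hypothetical helper and the data are made up:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def points_to_grid(values, x, y):
    """Hypothetical helper: rebuild a 2-D image from points on a full regular grid.

    Assumes every (x, y) pair occurs exactly once; row order does not matter.
    Returns the unique x values, unique y values and an array of shape (ny, nx).
    """
    ux, uy = np.unique(x), np.unique(y)
    order = np.lexsort((x, y))  # primary key y, secondary key x
    grid = values[order].reshape(len(uy), len(ux))
    return ux, uy, grid


# Made-up gridded data in the question's layout (value, x, y), rows shuffled
xs = np.linspace(0, 0.20, 40)
ys = np.linspace(-0.01, 0.01, 20)
X, Y = np.meshgrid(xs, ys)
Z = np.cos(3 * (X - 0.04 - 100 * Y**2)) ** 10
std_sorted = np.column_stack([Z.ravel(), X.ravel(), Y.ravel()])
std_sorted = std_sorted[np.random.default_rng(0).permutation(len(std_sorted))]

ux, uy, grid = points_to_grid(std_sorted[:, 0], std_sorted[:, 1], std_sorted[:, 2])
dx, dy = ux[1] - ux[0], uy[1] - uy[0]

fig, ax = plt.subplots()
# extent is padded by half a cell so each pixel is centred on its grid point
im = ax.imshow(grid, origin='lower', aspect='auto',
               extent=[ux[0] - dx / 2, ux[-1] + dx / 2, uy[0] - dy / 2, uy[-1] + dy / 2])
fig.colorbar(im, ax=ax)
```

If the points do not form a complete grid, this reshape fails, and tricontourf as shown above is the better fit.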

How to subplot two alternate x scales and two alternate y scales for more than one subplot?

I am trying to make a 2x2 grid of subplots in which each subplot has two x axes and two y axes: the first x/y pair uses a linear scale and the second a logarithmic scale. Before you assume this question has been asked before: the matplotlib docs and examples show how to use multiple scales for either x or y, but not both. This post on stackoverflow is the closest thing to my question, and I have attempted to use its idea to implement what I want. My attempt is below.
Firstly, we initialize data, ticks, and ticklabels. The idea is that the alternate scaling will have the same tick positions with altered ticklabels to reflect the alternate scaling.
import numpy as np
import matplotlib.pyplot as plt
# xy data (global)
X = np.linspace(5, 13, 9, dtype=int)
Y = np.linspace(7, 12, 9)
# xy ticks for linear scale (global)
dtick = dict(X=X, Y=np.linspace(7, 12, 6, dtype=int))
# xy ticklabels for linear and logarithmic scales (global)
init_xt = 2**dtick['X']
dticklabel = dict(X1=dtick['X'], Y1=dtick['Y']) # linear scale
dticklabel['X2'] = ['{}'.format(init_xt[idx]) if idx % 2 == 0 else '' for idx in range(len(init_xt))] # log_2 scale
dticklabel['Y2'] = 2**dticklabel['Y1'] # log_2 scale
Borrowing from the linked SO post, I plot the same thing in each of the 4 subplots. Since the same method is used for both scalings in each subplot, it goes into a for-loop, which needs the row number, column number, and plot number for each subplot.
# 2x2 subplot
# fig.add_subplot(row, col, pnum); corresponding iterables = (irows, icols, iplts)
irows = (1, 1, 2, 2)
icols = (1, 2, 1, 2)
iplts = (1, 2, 1, 2)
ncolors = ('red', 'blue', 'green', 'black')
Putting all of this together, the function to output the plot is below:
def initialize_figure(irows, icols, iplts, ncolors, figsize=None):
    """ """
    fig = plt.figure(figsize=figsize)
    for row, col, pnum, color in zip(irows, icols, iplts, ncolors):
        ax1 = fig.add_subplot(row, col, pnum) # linear scale
        ax2 = fig.add_subplot(row, col, pnum, frame_on=False) # logarithmic scale ticklabels
        ax1.plot(X, Y, '-', color=color)
        # ticks in same positions
        for ax in (ax1, ax2):
            ax.set_xticks(dtick['X'])
            ax.set_yticks(dtick['Y'])
        # remove xaxis xtick_labels and labels from top row
        if row == 1:
            ax1.set_xticklabels([])
            ax2.set_xticklabels(dticklabel['X2'])
            ax1.set_xlabel('')
            ax2.set_xlabel('X2', color='gray')
        # initialize xaxis xtick_labels and labels for bottom row
        else:
            ax1.set_xticklabels(dticklabel['X1'])
            ax2.set_xticklabels([])
            ax1.set_xlabel('X1', color='black')
            ax2.set_xlabel('')
        # linear scale on left
        if col == 1:
            ax1.set_yticklabels(dticklabel['Y1'])
            ax1.set_ylabel('Y1', color='black')
            ax2.set_yticklabels([])
            ax2.set_ylabel('')
        # logarithmic scale on right
        else:
            ax1.set_yticklabels([])
            ax1.set_ylabel('')
            ax2.set_yticklabels(dticklabel['Y2'])
            ax2.set_ylabel('Y2', color='black')
        ax1.tick_params(axis='x', colors='black')
        ax1.tick_params(axis='y', colors='black')
        ax2.tick_params(axis='x', colors='gray')
        ax2.tick_params(axis='y', colors='gray')
        ax1.xaxis.tick_bottom()
        ax1.yaxis.tick_left()
        ax1.xaxis.set_label_position('top')
        ax1.yaxis.set_label_position('right')
        ax2.xaxis.tick_top()
        ax2.yaxis.tick_right()
        ax2.xaxis.set_label_position('top')
        ax2.yaxis.set_label_position('right')
        for ax in (ax1, ax2):
            ax.set_xlim([4, 14])
            ax.set_ylim([6, 13])
    fig.tight_layout()
    plt.show()
    plt.close(fig)
Calling initialize_figure(irows, icols, iplts, ncolors) produces the figure below.
I am applying the same xlim and ylim, so I do not understand why the subplots all have different sizes. Also, the axis labels and axis ticklabels are not in the specified positions (since fig.add_subplot(...) indexing starts from 1 instead of 0).
What is my mistake and how can I achieve the desired result?
(In case it isn't clear, I am trying to put the xticklabels and xlabels for the linear scale on the bottom row, the xticklabels and xlabels for the logarithmic scale on the top row, the yticklabels and ylabels for the linear scale on the left side of the left column, and the yticklabels and ylabels for the logarithmic scale on the right side of the right column. The color='black' kwarg corresponds to the linear scale and the color='gray' kwarg corresponds to the logarithmic scale.)
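The irows and icols lists in the code do not do what you intend: they are passed to fig.add_subplot as the number of rows and columns, so the four subplots are placed in four different grids (1x1, 1x2, 2x1 and 2x2), which is why they all have different sizes. To create 4 subplots in a 2x2 grid, you would loop over range(1, 5):
for pnum in range(1, 5):
    ax1 = fig.add_subplot(2, 2, pnum)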
This might not be the only problem in the code, but as long as the subplots aren't created correctly it's not worth looking further down.
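Going one step further, here is a sketch of the whole layout under different assumptions from the question's code: the grid comes from plt.subplots(2, 2), and instead of stacking a second add_subplot axes on each cell, Axes.twiny and Axes.twinx provide the alternate (log2) tick labels on the top and right. Tick positions and labels are rebuilt from the question's X and Y definitions; the placement rules (linear labels bottom/left, log labels top/right) follow the description in the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Data and ticks as defined in the question
X = np.linspace(5, 13, 9, dtype=int)
Y = np.linspace(7, 12, 9)
xticks = X
yticks = np.linspace(7, 12, 6, dtype=int)
x2_labels = [str(2**int(v)) if i % 2 == 0 else '' for i, v in enumerate(xticks)]
y2_labels = [str(2**int(v)) for v in yticks]
colors = ('red', 'blue', 'green', 'black')

fig, axes = plt.subplots(2, 2, figsize=(8, 6))  # one grid -> equally sized cells
for (r, c), color in zip(np.ndindex(2, 2), colors):
    ax = axes[r, c]
    ax.plot(X, Y, '-', color=color)
    ax.set_xlim(4, 14)
    ax.set_ylim(6, 13)
    ax.set_xticks(xticks)
    ax.set_yticks(yticks)

    top = ax.twiny()    # shares y; its own x-axis is drawn at the top
    top.set_xlim(ax.get_xlim())
    top.set_xticks(xticks)
    top.set_xticklabels(x2_labels)
    right = ax.twinx()  # shares x; its own y-axis is drawn on the right
    right.set_ylim(ax.get_ylim())
    right.set_yticks(yticks)
    right.set_yticklabels(y2_labels)
    top.tick_params(axis='x', colors='gray')
    right.tick_params(axis='y', colors='gray')

    # Linear labels only on the bottom row / left column,
    # log2 labels only on the top row / right column
    ax.tick_params(labelbottom=(r == 1), labelleft=(c == 0))
    top.tick_params(labeltop=(r == 0))
    right.tick_params(labelright=(c == 1))
    if r == 1:
        ax.set_xlabel('X1')
    else:
        top.set_xlabel('X2', color='gray')
    if c == 0:
        ax.set_ylabel('Y1')
    else:
        right.set_ylabel('Y2', color='gray')

fig.tight_layout()
```

Since all four main axes come from the same gridspec, tight_layout adjusts them together and they keep identical sizes; the twin axes only carry the alternate tick labels.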

Setting ticks on matplotlib 3-D plots

I'm doing some cluster analysis and want to use matplotlib to visualise the results. For the most part this works fine. However, I'm struggling to control tick placement on the axes: the ticks on the y axis are overcrowded and I'd like to thin them out. I've tried supplying a range for the ticks using the numpy arange function, but this isn't working.
I don't know whether this is because I'm not familiar enough with matplotlib or because it's an issue with 3-D plotting. In any event, I've tried all the solutions I can find on Stack Overflow and nothing seems to work.
My code:
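import matplotlib.pyplot as plt
import matplotlib.cm as cm
from mpl_toolkits.mplot3d import Axes3D  # registers projection='3d' on matplotlib < 3.2
# data: a pandas DataFrame with columns col_1, col_2, col_3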
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['col_1'], data['col_2'], data['col_3'], c = data.index, cmap = cm.winter, s=60)
ax.view_init(15, 240)
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z- Axis')
plt.title('Sample Plot')
plt.show()
My attempted fix is to supply the ticks explicitly:
import numpy as np
ticks = np.arange(0.3, 0.7, 0.02)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['col_1'], data['col_2'], data['col_3'], c = data.index, cmap = cm.winter, s=60)
ax.view_init(15, 240)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_zticks(ticks)
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z- Axis')
plt.title('Sample Bad Plot')
plt.show()
However, this only produces the hot mess below. Any help to be had?
The problem is that your x-values lie approximately within the range 0.54-0.68, your y-values within 0.34-0.42, and your z-values within 0.55-0.63. In your second code you define ticks = np.arange(0.3, 0.7, 0.02), which creates 20 ticks from 0.3 to 0.68, and then assign these values to the x, y and z axes using ax.set_xticks(ticks) and so on. Setting ticks also expands the axis limits, if necessary, so that every given tick is visible, so each axis is stretched to 0.3-0.68 and carries 20 crowded labels while your data occupy only a small part of it. You get this mess because the supplied tick values extend well beyond the range of the actual x, y, z data points. Since you are only interested in refining the y axis ticks, you can just do
ticks = np.arange(0.34, 0.44, 0.02)
and then just set the ticks for the y axis as
ax.set_yticks(ticks)
If you don't want to specify the numbers 0.34 and 0.44 manually, you can find the maximum and minimum y value and use something like ticks = np.arange(min_value, max_value, 0.02).
Since I do not have access to your original data (data['col_1'] and so on), I can't test your code, but the tips above should help.
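A sketch of the data-driven version, using made-up data in roughly the ranges described above in place of the real DataFrame. The y ticks come from the data's own minimum and maximum, snapped outward to multiples of 0.02 so the outermost ticks still enclose the data. As an alternative that avoids hard-coded numbers entirely, the x and z axes use matplotlib's MaxNLocator to cap the number of ticks; nbins=4 is an arbitrary choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, needed only on matplotlib < 3.2

# Made-up data in roughly the ranges described in the answer
rng = np.random.default_rng(0)
x = rng.uniform(0.54, 0.68, 100)
y = rng.uniform(0.34, 0.42, 100)
z = rng.uniform(0.55, 0.63, 100)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c=np.arange(len(x)), cmap='winter', s=60)
ax.view_init(15, 240)

# Option 1: y ticks every 0.02, derived from the data and snapped outward
step = 0.02
lo = np.floor(y.min() / step) * step
hi = np.ceil(y.max() / step) * step
ax.set_yticks(np.arange(lo, hi + step / 2, step))  # + step/2 keeps hi itself

# Option 2: let a locator choose at most a few "nice" ticks
ax.xaxis.set_major_locator(MaxNLocator(nbins=4))
ax.zaxis.set_major_locator(MaxNLocator(nbins=4))

ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z- Axis')
fig.canvas.draw()
```

Adding step / 2 to the stop value keeps the upper tick despite np.arange excluding its stop and despite floating-point rounding.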

Seaborn: lining up 2 distplots on same axes

I'm trying to create 2 distplots that overlap on same axes, but they seem to be offset. How can I adjust this so that their overlapping is exact? Please see the image link below for the issue I have.
plt.figure(figsize=(10,8))
ax1 = sns.distplot(loans['fico'][loans['credit.policy']==1], bins= 10, kde=False, hist_kws=dict(edgecolor='k', lw=1))
ax2 = sns.distplot(loans['fico'][loans['credit.policy']==0], bins= 10, color='Red', kde=False, hist_kws=dict(edgecolor='k', lw=1))
ax1.set_xlim([600, 850])
ax2.set_xlim([600, 850])
Problematic result
The plots aren't lining up because Seaborn (well, Matplotlib behind the scenes) is working out the best way to give you ten bins for each set of data you pass to it. But the two sets might not have the same range.
You can provide a sequence as the bins argument, which defines the edges of the bins. Assuming you have numpy available you can use its linspace function to easily create this sequence from the smallest and largest values in your data.
import numpy as np
plt.figure(figsize=(10, 8))
# 11 edges -> 10 bins shared by both groups
bins = np.linspace(min(loans['fico']), max(loans['fico']), num=11)
ax1 = sns.distplot(loans['fico'][loans['credit.policy']==1], bins=bins,
                   kde=False, hist_kws=dict(edgecolor='k', lw=1))
ax2 = sns.distplot(loans['fico'][loans['credit.policy']==0], bins=bins,
                   color='Red', kde=False, hist_kws=dict(edgecolor='k', lw=1))
And then you shouldn't need to set the x limits.
An example with some randomly generated values:
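A minimal reconstruction with made-up values standing in for the two credit.policy groups. It uses plain matplotlib ax.hist, which is what distplot draws behind the scenes; distplot is deprecated since seaborn 0.11, and sns.histplot accepts the same sequence of edges through its bins argument. The only point is that both histograms receive one shared array of 11 edges.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Made-up FICO-like scores for the two groups, deliberately with different ranges
fico_policy_1 = rng.normal(720, 35, 1000).clip(640, 830)
fico_policy_0 = rng.normal(690, 30, 300).clip(610, 800)

# One set of 10 bins (11 edges) spanning both groups
all_fico = np.concatenate([fico_policy_1, fico_policy_0])
bins = np.linspace(all_fico.min(), all_fico.max(), num=11)

fig, ax = plt.subplots(figsize=(10, 8))
_, edges_1, _ = ax.hist(fico_policy_1, bins=bins, edgecolor='k', lw=1,
                        label='credit.policy == 1')
_, edges_0, _ = ax.hist(fico_policy_0, bins=bins, color='red', alpha=0.6,
                        edgecolor='k', lw=1, label='credit.policy == 0')
ax.legend()
```

Letting each call pick its own 10 bins (bins=10) would reproduce the offset from the question, because each group's bins would span only that group's own minimum and maximum.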
