Unable to set xticks for selecting appropriate cut-off threshold - python-3.x

During the model evaluation phase, I observed that my model's predicted values lie in the range 30.309 to 59.556.
I want to set a threshold so that values greater than the threshold are treated as correct and all other values as incorrect.
I generated the candidate thresholds as follows:
import numpy as np

k = []  # candidate thresholds
start, stop, step = min(y_pred), max(y_pred) + 1, (max(y_pred) - min(y_pred)) / 10
for i in np.arange(start, stop, step):
    k.append(round(i, 2))
This gives me 11 equally spaced thresholds (10 intervals):
Thresholds: [30.31, 33.23, 36.16, 39.08, 42.01, 44.93, 47.86, 50.78, 53.71, 56.63, 59.56]
I performed the model evaluation on these thresholds and the results are as follows:
Sensitivity: [1.0, 0.999, 0.983, 0.866, 0.497, 0.197, 0.093, 0.036, 0.007, 0.0, 0.0]
Specificity: [0.0, 0.0, 0.003, 0.027, 0.172, 0.576, 0.845, 0.915, 0.971, 0.997, 1.0]
Sensitivity goes from 1 to 0, while specificity goes from 0 to 1.
I plotted them as follows:
fig = plt.figure(facecolor='white', figsize=(18, 4))
plt.title("Sensitivity vs Specificity")
plt.plot(sensitivity, c='red')
plt.plot(specificity, c='Green')
plt.legend(['Sensitivity','Specificity'])
plt.grid(True)
plt.show()
But I want the x-axis to show the thresholds instead of the numbers from 0 to 10.
I tried setting the xticks as follows, but it did not give the desired result:
fig = plt.figure(facecolor='white', figsize=(18, 4))
plt.title("Sensitivity vs Specificity")
plt.plot(sensitivity, c='red')
plt.plot(specificity, c='Green')
plt.legend(['Sensitivity','Specificity'])
plt.xticks(k)
plt.grid(True)
plt.show()
Where am I making a mistake?

Instead of plotting sensitivity and specificity against their index (a 1D plot), plot them against the thresholds (a 2D plot). Just pass the thresholds as the x values of each plot call; here thresholds is the list k from the question.
fig = plt.figure(facecolor='white', figsize=(18, 4))
plt.title("Sensitivity vs Specificity")
plt.plot(thresholds, sensitivity, c='red')
plt.plot(thresholds, specificity, c='Green')
plt.legend(['Sensitivity','Specificity'])
plt.grid(True)
plt.show()
The x-axis of the resulting plot then shows the threshold values.
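For context on why the original attempt failed: plt.plot(sensitivity) places the points at x = 0, 1, ..., 10, while plt.xticks(k) puts the ticks at 30.31 to 59.56, far outside the plotted data. The following is a minimal sketch using the values from the question; the variable name thresholds and the object-oriented ax calls are choices made for this illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt

# Values copied from the question.
thresholds = [30.31, 33.23, 36.16, 39.08, 42.01, 44.93,
              47.86, 50.78, 53.71, 56.63, 59.56]
sensitivity = [1.0, 0.999, 0.983, 0.866, 0.497, 0.197,
               0.093, 0.036, 0.007, 0.0, 0.0]
specificity = [0.0, 0.0, 0.003, 0.027, 0.172, 0.576,
               0.845, 0.915, 0.971, 0.997, 1.0]

fig, ax = plt.subplots(facecolor='white', figsize=(18, 4))
ax.set_title("Sensitivity vs Specificity")
# The thresholds are the x values, so both curves are drawn at 30.31 ... 59.56.
ax.plot(thresholds, sensitivity, c='red', label='Sensitivity')
ax.plot(thresholds, specificity, c='green', label='Specificity')
ax.set_xticks(thresholds)  # one tick per evaluated threshold
ax.set_xlabel("Threshold")
ax.legend()
ax.grid(True)
```

If the index-based plot is preferred, plt.xticks(range(len(k)), k) should also work, since it keeps the tick positions at 0 to 10 and only swaps the labels.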

Related

using % range in plot labels

I want the x-axis tick labels expressed in %. In the attached figure, I want the x-axis labelled -1%, -0.05%, 0, 0.05% and 1%. Is there any way to do that directly in Python using the range function?
ax.set_xlim(-0.012, 0.012, 0.2)
You can modify the tick labels as shown below
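import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(-0.01, 0.01, 10)
ax.plot(x, -x/10, '-bo')
ax.set_xlim(-0.012, 0.012)  # set_xlim takes only the two limits; it has no step argument
# convert each tick position (a fraction) into a percentage label
labels = ['{:.2f}%'.format(item*100) for item in ax.get_xticks()]
ax.set_xticklabels(labels)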
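As an alternative to building the label strings by hand, matplotlib.ticker.PercentFormatter can format the ticks. This is only a sketch: the explicit tick positions and the choice decimals=1 are assumptions made for the illustration, not something the question specified.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

fig, ax = plt.subplots()
x = np.linspace(-0.01, 0.01, 10)
ax.plot(x, -x / 10, '-bo')
ax.set_xlim(-0.012, 0.012)
# Assumed tick positions for this sketch: -1%, -0.5%, 0, 0.5%, 1%.
ax.set_xticks([-0.01, -0.005, 0.0, 0.005, 0.01])
# xmax=1 means a data value of 1.0 corresponds to 100%.
formatter = PercentFormatter(xmax=1, decimals=1)
ax.xaxis.set_major_formatter(formatter)
```

Because the formatter is attached to the axis, the labels stay correct if the tick positions are changed later.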

It's related to ROC curve

I have no problem plotting the ROC curve, and it is plotted as I require. The problem is the y-axis: it runs from 0.0 to 1.05 but only shows ticks at every 0.2 (0.0, 0.2, 0.4, ...), whereas I want a tick at every 0.1 (0.0, 0.1, 0.2, 0.3, ...) up to 1.05. I want code that shows both the even and the odd tenths when plotting the ROC curve.
I searched the matplotlib documentation but did not find anything related to my problem.
lw = 2
plt.figure()
plt.plot(fpr11, tpr11, 'o-', ms=2, label='ROC_curve_APOE(AUC11 = %0.4f)' % roc_auc11,
         color='deeppink', linestyle=':', linewidth=2)
plt.plot(fpr51, tpr51, 'o-', ms=2, label='ROC_curve_Combined AUC5 = %0.4f)' % roc_auc51,
         color='cornflowerblue', linestyle=':', linewidth=2)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0, 1])
plt.ylim([0, 1.05])
plt.xlabel('1-Specificity(False Positive Rate)')
plt.ylabel('Sensitivity(True Positive Rate)')
# plt.title('ROC curve for MCIc vs MCIs')
plt.title('ROC curve for AD vs NC')
plt.legend(loc="lower right")
plt.show()
# plt.savefig('roc_auc.png')
plt.close()
My expected output should look like the figure at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#roc_curve_for_binary_svm
In that figure the y-axis has a tick at every 0.1, from 0.0 up to 1.
Please help me solve this.
I'm not sure whether you can set steps in ylim/xlim, but you can use xticks/yticks instead:
def frange(x, y, jump):
    while x < y:
        yield x
        x += jump
    yield y
plt.yticks(list(frange(0, 1.05, 0.1)))
How you choose to replace the frange is up to you, but you can also do something like
plt.yticks([0, 0.1, 0.2, 0.3, 0.4,....,1.0, 1.05])
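One possible replacement for frange is np.linspace, which avoids accumulating floating-point error from repeated addition. The sketch below is an illustration only: it draws just the diagonal reference line from the question and puts ticks at 0.0 to 1.0, leaving out the extra tick at 1.05.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], color='navy', linestyle='--')  # chance line only
ax.set_xlim(0, 1)
ax.set_ylim(0, 1.05)
# 11 evenly spaced ticks: 0.0, 0.1, ..., 1.0
yticks = np.linspace(0, 1, 11)
ax.set_yticks(yticks)
```

The axis limit stays at 1.05, so the curve still has some headroom above the top tick.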

How to subplot two alternate x scales and two alternate y scales for more than one subplot?

I am trying to make a 2x2 subplot, with each of the inner subplots consisting of two x axes and two y axes; the first xy correspond to a linear scale and the second xy correspond to a logarithmic scale. Before assuming this question has been asked before, the matplotlib docs and examples show how to do multiple scales for either x or y but not both. This post on stackoverflow is the closest thing to my question, and I have attempted to use this idea to implement what I want. My attempt is below.
Firstly, we initialize data, ticks, and ticklabels. The idea is that the alternate scaling will have the same tick positions with altered ticklabels to reflect the alternate scaling.
import numpy as np
import matplotlib.pyplot as plt
# xy data (global)
X = np.linspace(5, 13, 9, dtype=int)
Y = np.linspace(7, 12, 9)
# xy ticks for linear scale (global)
dtick = dict(X=X, Y=np.linspace(7, 12, 6, dtype=int))
# xy ticklabels for linear and logarithmic scales (global)
init_xt = 2**dtick['X']
dticklabel = dict(X1=dtick['X'], Y1=dtick['Y']) # linear scale
dticklabel['X2'] = ['{}'.format(init_xt[idx]) if idx % 2 == 0 else '' for idx in range(len(init_xt))] # log_2 scale
dticklabel['Y2'] = 2**dticklabel['Y1'] # log_2 scale
Borrowing from the linked SO post, I will plot the same thing in each of the 4 subplots. Since similar methods are used for both scalings in each subplot, the method is thrown into a for-loop. But we need the row number, column number, and plot number for each.
# 2x2 subplot
# fig.add_subplot(row, col, pnum); corresponding iterables = (irows, icols, iplts)
irows = (1, 1, 2, 2)
icols = (1, 2, 1, 2)
iplts = (1, 2, 1, 2)
ncolors = ('red', 'blue', 'green', 'black')
Putting all of this together, the function to output the plot is below:
def initialize_figure(irows, icols, iplts, ncolors, figsize=None):
    """ """
    fig = plt.figure(figsize=figsize)
    for row, col, pnum, color in zip(irows, icols, iplts, ncolors):
        ax1 = fig.add_subplot(row, col, pnum) # linear scale
        ax2 = fig.add_subplot(row, col, pnum, frame_on=False) # logarithmic scale ticklabels
        ax1.plot(X, Y, '-', color=color)
        # ticks in same positions
        for ax in (ax1, ax2):
            ax.set_xticks(dtick['X'])
            ax.set_yticks(dtick['Y'])
        # remove xaxis xtick_labels and labels from top row
        if row == 1:
            ax1.set_xticklabels([])
            ax2.set_xticklabels(dticklabel['X2'])
            ax1.set_xlabel('')
            ax2.set_xlabel('X2', color='gray')
        # initialize xaxis xtick_labels and labels for bottom row
        else:
            ax1.set_xticklabels(dticklabel['X1'])
            ax2.set_xticklabels([])
            ax1.set_xlabel('X1', color='black')
            ax2.set_xlabel('')
        # linear scale on left
        if col == 1:
            ax1.set_yticklabels(dticklabel['Y1'])
            ax1.set_ylabel('Y1', color='black')
            ax2.set_yticklabels([])
            ax2.set_ylabel('')
        # logarithmic scale on right
        else:
            ax1.set_yticklabels([])
            ax1.set_ylabel('')
            ax2.set_yticklabels(dticklabel['Y2'])
            ax2.set_ylabel('Y2', color='black')
        ax1.tick_params(axis='x', colors='black')
        ax1.tick_params(axis='y', colors='black')
        ax2.tick_params(axis='x', colors='gray')
        ax2.tick_params(axis='y', colors='gray')
        ax1.xaxis.tick_bottom()
        ax1.yaxis.tick_left()
        ax1.xaxis.set_label_position('top')
        ax1.yaxis.set_label_position('right')
        ax2.xaxis.tick_top()
        ax2.yaxis.tick_right()
        ax2.xaxis.set_label_position('top')
        ax2.yaxis.set_label_position('right')
        for ax in (ax1, ax2):
            ax.set_xlim([4, 14])
            ax.set_ylim([6, 13])
    fig.tight_layout()
    plt.show()
    plt.close(fig)
Calling initialize_figure(irows, icols, iplts, ncolors) produces the figure below.
I am applying the same xlim and ylim, so I do not understand why the subplots are all different sizes. Also, the axis labels and axis ticklabels are not in the specified positions (since fig.add_subplot(...) indexing starts from 1 instead of 0).
What is my mistake and how can I achieve the desired result?
(In case it isn't clear, I am trying to put the xticklabels and xlabels for the linear scale on the bottom row, the xticklabels and xlabels for the logarithmic scale on the top row, the yticklabels and ylabels for the linear scale on the left side of the left column, and the yticklabels and ylabels for the logarithmic scale on the right side of the right column. The color='black' kwarg corresponds to the linear scale and the color='gray' kwarg corresponds to the logarithmic scale.)
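The irows and icols tuples in the code do not serve any purpose. The first two arguments of add_subplot are the number of rows and columns of the whole grid, not the position of the subplot; only the third argument selects the position. To create 4 subplots in a 2x2 grid you would loop over range(1, 5):
for pnum in range(1, 5):
    ax1 = fig.add_subplot(2, 2, pnum)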
This might not be the only problem in the code, but as long as the subplots aren't created correctly it's not worth looking further down.
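To make the grid creation concrete, here is a minimal sketch that only builds the four subplots and derives each one's row and column from pnum. It deliberately leaves out the second (logarithmic) set of tick labels from the question; the divmod-based row and col variables and the subplot titles are additions for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(5, 13, 9)
Y = np.linspace(7, 12, 9)
colors = ('red', 'blue', 'green', 'black')

fig = plt.figure()
axes = []
for pnum, color in zip(range(1, 5), colors):
    # The grid shape is always (2, 2); only the position pnum changes.
    ax = fig.add_subplot(2, 2, pnum)
    # Zero-based row and column of this subplot, counted row by row.
    row, col = divmod(pnum - 1, 2)
    ax.plot(X, Y, '-', color=color)
    ax.set_xlim(4, 14)
    ax.set_ylim(6, 13)
    ax.set_title('row {}, col {}'.format(row, col))
    axes.append(ax)
fig.tight_layout()
```

With row and col available, conditions such as "top row" or "right column" can be tested per subplot without passing them to add_subplot.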

Matplotlib distributing the markers on a semilogx plot

I want to plot a curve on a semilogx scale in Matplotlib. I have two vectors fpr and tpr of size 17874. I want to show markers on the same curve. But since there are too many points, I used markevery=0.1, as shown e.g. in this example from the matplotlib page. However, using markevery in this case did not have any markers in the semilog plot (right panel):
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(1, 2, 1)
plt.plot(fpr, tpr, marker='o', markevery=0.1)
ax = fig.add_subplot(1, 2, 2)
plt.semilogx(fpr, tpr, marker='o', markevery=0.1)
plt.show()
Then I tried using a slice object. As you can see in the plot below, the markers in the left plot are evenly distributed, but in the right semilogx plot the markers are only shown on half of the curve. So I am wondering whether there is a way to use a variable slice interval to fix this issue.
My current code is the following:
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(1, 2, 1)
plt.plot(fpr, tpr, marker='o', markevery=slice(0, 20000, 1000))
ax = fig.add_subplot(1, 2, 2)
plt.semilogx(fpr, tpr, marker='o', markevery=slice(0, 20000, 1000))
plt.show()
My suspicion is that the value in the data which seems close to zero actually is zero. Since the logarithm of zero is undefined, that point cannot be plotted on a logarithmic scale. Without the first point of the curve, however, the spacings for markevery cannot be calculated (just as the mean of NaN and 1 cannot be calculated).
A solution is to leave out that point, which cannot be plotted on a log scale anyway:
plt.semilogx(fpr[1:], tpr[1:], marker='o', markevery=0.1)
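If it is not certain that only the first entry is zero, a boolean mask can drop every non-positive value instead of slicing with [1:]. The sketch below uses made-up stand-in data for fpr and tpr (the real vectors are not given in the question), so treat it as an illustration of the masking step only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in for the fpr/tpr vectors in the question.
fpr = np.linspace(0.0, 1.0, 1001)
tpr = np.sqrt(fpr)

# A log axis can only show strictly positive x values.
positive = fpr > 0
fig, ax = plt.subplots()
line, = ax.semilogx(fpr[positive], tpr[positive], marker='o', markevery=0.1)
```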

Create a colormap for a histogram based off a data array

I've looked in a variety of places for how to do this, but haven't been able to find exactly what I need or confirm whether it is even possible.
I have a 2d histogram of height on y axis (given by rdata1) and intensity on x axis (given by intensity). The histogram plots fine, but I'd like to scale the colorbar rather than have it normalised.
I have already defined a colourmap, and I simply want to scale it using my plotted data.
I would like to scale the data so that the colorbar (which currently runs from 0-1) scales from 0 to (number of points in bin)/(len(time)). This is so I can find out the probability of a point being in a certain intensity bin at a certain height. I was unsure whether to do this by making a second histogram (which I wouldn't plot) and exporting the max/min values from that to scale the colourbar, or using a number of for loops and lists to append values into bins for each height range and then max/min the number of values in those bins.
Code is attached below:
import numpy as np
import file_reader as fr
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import matplotlib as mpl
time = [0.01649999991059303, 0.02584999985992908]
rdata = [-600.020751953125, -570.04150390625, -540.062255859375, -510.0830078125, -480.1037902832031, -450.1245422363281, -420.1452941894531, -390.1660461425781, -360.1867980957031, -330.2075500488281]
intensity = [[-37.32464981079102, -38.3233528137207], [-37.70231628417969, -38.05134201049805], [-38.27889251708984, -38.82979583740234], [-28.01022720336914, -27.68825912475586], [-8.408446311950684, -8.440451622009277], [-8.749446868896484, -8.750232696533203], [-9.431790351867676, -9.41820240020752], [-10.09048461914062, -10.23848724365234], [-10.84317588806152, -10.84869194030762], [-11.61933135986328, -11.67543029785156]]
range_bins = np.linspace(rdata[0],rdata[-1],(len(rdata)+1))
intensity_bins = np.linspace(-70,30,100)
intensity = np.array(intensity).ravel()
rdata1 = np.repeat(rdata,len(time))
cdict = {'red': ((0.0, 1.0, 1.0),
(0.25, 0.0, 0.0),
(0.55, 0.35, 0.35),
(0.75, 0.75, 0.75),
(1.0, 1.0, 1.0)),
'green': ((0.0, 1.0, 1.0),
(0.25, 0.1, 0.1),
(0.55, 0.6, 0.6),
(0.75, 0.8, 0.8),
(1.0, 0.0, 0.0)),
'blue': ((0.0, 1.0, 1.0),
(0.25, 1.0, 1.0),
(0.55, 0.2, 0.2),
(0.75, 0.1, 0.1),
(1.0, 0.0, 0.0))
}
radar_map = LinearSegmentedColormap('radar_map', cdict)
H, range_bins, intensity_bins = np.histogram2d(rdata1,intensity,bins=(range_bins,intensity_bins))
fig = plt.figure()
X,Y = np.meshgrid(intensity_bins,range_bins)
plt.pcolormesh(X,Y,H, cmap=radar_map)
cax = fig.add_axes([0.95, 0.2, 0.02, 0.6])
cb = mpl.colorbar.ColorbarBase(cax, cmap=radar_map, spacing='proportional')
Any help with this would be massively appreciated. Sorry for the rather lengthy post.
At the moment, the colorbar is independent of the displayed pcolormesh.
If you link the colorbar to the pcolormesh, it will automatically scale to the minimum and maximum level of the plot. Use plt.colorbar(pc), where pc is the return value of pcolormesh.
To normalize the histogram counts you may divide the histogram by whatever quantity you want, e.g.
plt.pcolormesh( X,Y,H/float(len(time)) )
Example:
fig = plt.figure()
X,Y = np.meshgrid(intensity_bins,range_bins)
pc = plt.pcolormesh(X,Y,H/float(len(time)), cmap=radar_map)
cax = fig.add_axes([0.90, 0.2, 0.02, 0.6])
cb = plt.colorbar(pc, cax=cax)
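The following self-contained sketch shows the same idea end to end with synthetic data, since file_reader and the full dataset from the question are not available. The random data, the bin edges, the name n_times and the 'viridis' colormap are all assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_times = 50  # stands in for len(time) in the question
# Synthetic data: 10 heights, each observed n_times times.
heights = np.repeat(np.linspace(-600, -330, 10), n_times)
intensity = rng.normal(-20, 10, size=heights.size)

range_bins = np.linspace(-615, -315, 11)      # one bin per height
intensity_bins = np.linspace(-70, 30, 101)
H, range_edges, intensity_edges = np.histogram2d(
    heights, intensity, bins=(range_bins, intensity_bins))

# Fraction of the samples at each height that fall into each intensity bin.
P = H / n_times

fig, ax = plt.subplots()
X, Y = np.meshgrid(intensity_edges, range_edges)
pc = ax.pcolormesh(X, Y, P, cmap='viridis')
# Passing pc links the colorbar to the plotted data, so it spans P's range.
cb = fig.colorbar(pc, ax=ax)
```

Since each height contributes n_times samples, every row of P sums to at most 1, which matches the probability interpretation asked for in the question.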
