It's related to ROC curve - python-3.x

I have no problem plotting the ROC curve, and it is drawn as I want. The problem is the y-axis: its limits run from 0.0 to 1.05, but ticks are labelled only at every 0.2 (0.0, 0.2, 0.4, ..., 1.0). I want a tick at every 0.1 (0.0, 0.1, 0.2, 0.3, ..., 1.0). In other words, I want code that labels both the "even" and the "odd" tenths when plotting the ROC curve.
I searched the matplotlib documentation but didn't find anything related to my problem.
import matplotlib.pyplot as plt

lw = 2
plt.figure()
# fpr11, tpr11, roc_auc11, fpr51, tpr51 and roc_auc51 come from my earlier ROC computation
plt.plot(fpr11, tpr11, marker='o', ms=2, label='ROC_curve_APOE (AUC11 = %0.4f)' % roc_auc11,
         color='deeppink', linestyle=':', linewidth=lw)
plt.plot(fpr51, tpr51, marker='o', ms=2, label='ROC_curve_Combined (AUC5 = %0.4f)' % roc_auc51,
         color='cornflowerblue', linestyle=':', linewidth=lw)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0, 1])
plt.ylim([0, 1.05])
plt.xlabel('1-Specificity (False Positive Rate)')
plt.ylabel('Sensitivity (True Positive Rate)')
# plt.title('ROC curve for MCIc vs MCIs')
plt.title('ROC curve for AD vs NC')
plt.legend(loc="lower right")
# plt.savefig('roc_auc.png')  # call before plt.show(), otherwise the saved image may be blank
plt.show()
plt.close()
My expected output should look like the figure here: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#roc_curve_for_binary_svm
In that figure the y-axis has a tick at every step of 0.1, from 0.0 up to 1.0.
Please help me solve it.

You can't give ylim/xlim a step size (they only set the axis limits), but you can use yticks/xticks instead:
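import matplotlib.pyplot as plt

def frange(x, y, jump):
    # like range(), but for floats; the end point y is always yielded last
    while x < y:
        yield x
        x += jump
    yield y

plt.yticks(list(frange(0, 1.05, 0.1)))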
How you generate the list is up to you; you can also write it out by hand, something like
plt.yticks([0, 0.1, 0.2, 0.3, 0.4,....,1.0, 1.05])
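Another option, sketched below assuming you switch to the object-oriented API (fig, ax = plt.subplots()), is matplotlib.ticker.MultipleLocator. It places a tick at every multiple of a base value inside the current limits, so you don't have to build the list yourself. The ROC points here are placeholders, not the data from the question.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots()
# Placeholder ROC points, only for illustration
ax.plot([0, 0.1, 0.3, 1], [0, 0.6, 0.85, 1], 'o:', color='deeppink')
ax.plot([0, 1], [0, 1], color='navy', linestyle='--')
ax.set_xlim(0, 1)
ax.set_ylim(0, 1.05)
# A major tick at every multiple of 0.1 on both axes
ax.yaxis.set_major_locator(MultipleLocator(0.1))
ax.xaxis.set_major_locator(MultipleLocator(0.1))
```

Because the locator is recomputed from the axis limits, the 0.1 spacing is kept even if you later change ylim.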

Related

using % range in plot labels

I want to show the x-axis range as percentages. In the attached figure, I want to label the x-axis as -1%, -0.05%, 0, 0.05% and 1%. Is there any way to do that directly in Python using the range function?
ax.set_xlim(-0.012, 0.012, 0.2)
You can modify the tick labels as shown below
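import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(-0.01, 0.01, 10)
ax.plot(x, -x/10, '-bo')
ax.set_xlim(-0.012, 0.012)  # set_xlim takes the two limits only; it has no step argument
ticks = ax.get_xticks()
ax.set_xticks(ticks)  # pin the tick positions so the labels below stay aligned with them
ax.set_xticklabels(['{:.2f}%'.format(item*100) for item in ticks])
ax.set_xlim(-0.012, 0.012)  # set_xticks may widen the limits to include all ticks; restore them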
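Instead of rewriting the labels once, you can attach a formatter so the percent labels stay correct when the ticks change (for example after zooming). A minimal sketch with matplotlib.ticker.PercentFormatter, where xmax=1 means a data value of 1.0 is shown as 100%:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

fig, ax = plt.subplots()
x = np.linspace(-0.01, 0.01, 10)
ax.plot(x, -x/10, '-bo')
ax.set_xlim(-0.012, 0.012)
# Treat 1.0 as 100%, so a data value of 0.01 is labelled "1.00%"
ax.xaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=2))
```

With decimals=2 the labels match the '{:.2f}%' format used above.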

Matplotlib - Scatter Plot, How to fill in the space between each individual point?

I am plotting the following data:
fig, ax = plt.subplots()
im = ax.scatter(std_sorted[:, [1]], std_sorted[:, [2]], s=5, c=std_sorted[:, [0]])
With the following result:
My question is: can I fill the space between the points by interpolating, and then color that filled space accordingly, so that I get a uniform plot without any visible points?
So basically I'm looking for this result (This is simply me "squeezing" the above picture to show the desired result and not dealing with the space between the points):
The simplest thing to do in this case is to use a short vertical line as the marker (marker='|') and set the marker size (s in scatter) large enough that no white space is left.
Another option is to use tricontourf to create a filled image of x, y and z.
Note that neither a scatter plot nor tricontourf need the points to be sorted in any order.
If your points are already sorted into an ordered grid, plt.imshow should give the best result.
Here is some code to show how it could look. First, some dummy data loosely similar to the example are generated. As x and y are random, they don't fill the complete space, which may leave some blank spots in the scatter plot. tricontourf interpolates nicely over those spots, except possibly in the corners.
import numpy as np
import matplotlib.pyplot as plt
N = 50000
xmin = 0
xmax = 0.20
ymin = -0.01
ymax = 0.01
std_sorted = np.zeros((N, 3))
std_sorted[:,1] = np.random.uniform(xmin, xmax, N)
std_sorted[:,2] = np.random.choice(np.linspace(ymin, ymax, 80), N)
std_sorted[:,0] = np.cos(3*(std_sorted[:,1] - 0.04 - 100*std_sorted[:,2]**2))**10
fig, ax = plt.subplots(ncols=2)
# im = ax[0].scatter(std_sorted[:, 1], std_sorted[:, 2], s=20, c=std_sorted[:, 0], marker='|')
im = ax[0].scatter(std_sorted[:, 1], std_sorted[:, 2], s=5, c=std_sorted[:, 0], marker='.')
ax[0].set_xlim(xmin, xmax)
ax[0].set_ylim(ymin, ymax)
ax[0].set_title("scatter plot")
ax[1].tricontourf(std_sorted[:, 1], std_sorted[:, 2], std_sorted[:, 0], 256)
ax[1].set_title("tricontourf")
plt.tight_layout()
plt.show()
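For the ordered-grid case mentioned above, a rough sketch with imshow could look like this. It assumes the values can be arranged as a 2-D array Z with one row per y value and one column per x value; the grid sizes are arbitrary and the function is the same dummy one as in the example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical regular grid over the same x/y ranges as the dummy data
xs = np.linspace(0, 0.20, 200)
ys = np.linspace(-0.01, 0.01, 80)
X, Y = np.meshgrid(xs, ys)               # shape (80, 200): rows follow y, columns follow x
Z = np.cos(3*(X - 0.04 - 100*Y**2))**10  # same dummy function as above

fig, ax = plt.subplots()
im = ax.imshow(Z, origin='lower', aspect='auto',
               extent=(xs[0], xs[-1], ys[0], ys[-1]))  # map the pixel grid onto data coordinates
fig.colorbar(im, ax=ax)
ax.set_title("imshow")
```

extent gives the outer edges of the image in data coordinates, so pixel centres are offset by half a grid step; for a fine grid this is usually negligible.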

Unable to set xticks for selecting appropriate cut-off threshold

During the model evaluation phase, I observed that my model predicted values in the range of 30.309 - 59.556.
I wanted to set a threshold so that I can say that values greater than the threshold are correct and the rest are incorrect.
I generated the candidate thresholds as:
import numpy as np

k = []
start, stop, step = min(y_pred), max(y_pred) + 1, (max(y_pred) - min(y_pred)) / 10
for i in np.arange(start, stop, step):
    k.append(round(i, 2))
This gives me 11 equally spaced thresholds:
Thresholds: [30.31, 33.23, 36.16, 39.08, 42.01, 44.93, 47.86, 50.78, 53.71, 56.63, 59.56]
I performed the model evaluation on these thresholds and the results are as follows:
Sensitivity: [1.0, 0.999, 0.983, 0.866, 0.497, 0.197, 0.093, 0.036, 0.007, 0.0, 0.0]
Specificity: [0.0, 0.0, 0.003, 0.027, 0.172, 0.576, 0.845, 0.915, 0.971, 0.997, 1.0]
The sensitivity goes from 1 to 0, while the specificity goes from 0 to 1.
I plot them and the results are as follows:
fig = plt.figure(facecolor='white', figsize=(18, 4))
plt.title("Sensitivity vs Specificity")
plt.plot(sensitivity, c='red')
plt.plot(specificity, c='Green')
plt.legend(['Sensitivity','Specificity'])
plt.grid(True)
plt.show()
But I want the x-axis to show the thresholds instead of the indices 0 to 10.
I tried setting the xticks, but got the following result:
fig = plt.figure(facecolor='white', figsize=(18, 4))
plt.title("Sensitivity vs Specificity")
plt.plot(sensitivity, c='red')
plt.plot(specificity, c='Green')
plt.legend(['Sensitivity','Specificity'])
plt.xticks(k)  # the line I added
plt.grid(True)
plt.show()
Where am I making a mistake?
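plt.xticks(k) only places ticks at the positions 30.31 ... 59.56, while the curves are still drawn at x = 0 ... 10, so the axis stretches to include both and the curves get squeezed into the left part of the plot. Instead of plotting sensitivity and specificity against their index, plot them against the thresholds: just pass the thresholds as the x values.
thresholds = k  # the list of thresholds built in the question
fig = plt.figure(facecolor='white', figsize=(18, 4))
plt.title("Sensitivity vs Specificity")
plt.plot(thresholds, sensitivity, c='red')
plt.plot(thresholds, specificity, c='Green')
plt.legend(['Sensitivity','Specificity'])
plt.grid(True)
plt.show()
The result is below.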
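If you'd rather keep the original index-based plot, another option is to pass both positions and labels to plt.xticks: the ticks stay at 0 ... 10, where the curves are actually drawn, and each one is labelled with its threshold. A sketch using the values from the question (the two-decimal label format is an assumption):

```python
import matplotlib.pyplot as plt

# Values copied from the question
k = [30.31, 33.23, 36.16, 39.08, 42.01, 44.93, 47.86, 50.78, 53.71, 56.63, 59.56]
sensitivity = [1.0, 0.999, 0.983, 0.866, 0.497, 0.197, 0.093, 0.036, 0.007, 0.0, 0.0]
specificity = [0.0, 0.0, 0.003, 0.027, 0.172, 0.576, 0.845, 0.915, 0.971, 0.997, 1.0]

fig = plt.figure(facecolor='white', figsize=(18, 4))
plt.title("Sensitivity vs Specificity")
plt.plot(sensitivity, c='red')
plt.plot(specificity, c='green')
plt.legend(['Sensitivity', 'Specificity'])
# Keep the ticks at the index positions 0..10 where the curves are drawn,
# but label each one with the corresponding threshold
plt.xticks(range(len(k)), ['{:.2f}'.format(t) for t in k])
plt.grid(True)
```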

Setting ticks on matplotlib 3-D plots

I'm doing some cluster analysis and want to use matplotlib to visualise the results. For the most part, this is working out OK. However, I'm struggling to control tick placement on the axes: the ticks on the y axis are overcrowded and I'd like to thin them out. I've tried supplying a range for the ticks using the numpy arange function, but this isn't working.
I don't know if this is because I'm not familiar enough with matplotlib, or if it's an issue with 3-D plotting. In any event, I've tried all the solutions I can find on Stack Overflow and nothing seems to work.
My code:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['col_1'], data['col_2'], data['col_3'], c = data.index, cmap = cm.winter, s=60)
ax.view_init(15, 240)
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z- Axis')
plt.title('Sample Plot')
plt.show()
My attempt was to supply the ticks as follows:
import numpy as np
ticks = np.arange(0.3, 0.7, 0.02)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['col_1'], data['col_2'], data['col_3'], c = data.index, cmap = cm.winter, s=60)
ax.view_init(15, 240)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_zticks(ticks)
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z- Axis')
plt.title('Sample Bad Plot')
plt.show()
However, this only produces the hot mess below. Any help to be had?
The problem is that your x-values lie approximately within the range 0.54-0.68, your y-values within 0.34-0.42 and your z-values within 0.55-0.63. In your second code, ticks = np.arange(0.3, 0.7, 0.02) creates ticks from 0.3 to 0.68, and you then assign these values to the x, y and z axes with ax.set_xticks(ticks) and so on. You get this mess because most of the supplied tick values are outside the range of the actual x, y, z data points. Since you only want to thin out the y axis ticks, you can just do
ticks = np.arange(0.34, 0.44, 0.02)
and then set the ticks for the y axis only:
ax.set_yticks(ticks)
If you don't want to specify the numbers 0.34 and 0.44 manually, you can find the minimum and maximum y values and use something like ticks = np.arange(min_value, max_value, 0.02).
Since I do not have access to your original data (data['col_1'] and so on), I can't test your code, but the tips above should help.
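Following the last tip, here is a sketch that derives the y ticks from the data range instead of hard-coding them. Since the original data aren't available, x, y and z are hypothetical random values in roughly the ranges quoted above, and the 0.02 step is just an example.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers '3d' on older matplotlib)

rng = np.random.default_rng(0)
# Hypothetical stand-in data in roughly the ranges quoted in the answer
x = rng.uniform(0.54, 0.68, 100)
y = rng.uniform(0.34, 0.42, 100)
z = rng.uniform(0.55, 0.63, 100)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, s=60)

step = 0.02
# Round the data range outwards to multiples of step
lo = np.floor(y.min() / step) * step
hi = np.ceil(y.max() / step) * step
# hi + step/2 makes sure hi itself is included despite floating-point error
yticks = np.round(np.arange(lo, hi + step / 2, step), 2)
ax.set_yticks(yticks)  # only the y axis is changed
```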

Matplotlib distributing the markers on a semilogx plot

I want to plot a curve on a semilogx scale in Matplotlib. I have two vectors fpr and tpr of size 17874, and I want to show markers on the curve. Since there are too many points, I used markevery=0.1, as shown e.g. in this example from the matplotlib page. However, with markevery set this way, no markers appear in the semilog plot (right panel):
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(1, 2, 1)
plt.plot(fpr, tpr, marker='o', markevery=0.1)
ax = fig.add_subplot(1, 2, 2)
plt.semilogx(fpr, tpr, marker='o', markevery=0.1)
plt.show()
Then I tried using a slice object. As you can see in the plot below, the markers in the left plot are evenly distributed, but in the right semilogx plot they are only shown on half of the curve. So I am wondering if there is any way to have a variable slice interval that fixes this issue.
My current code is the following:
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(1, 2, 1)
plt.plot(fpr, tpr, marker='o', markevery=slice(0, 20000, 1000))
ax = fig.add_subplot(1, 2, 2)
plt.semilogx(fpr, tpr, marker='o', markevery=slice(0, 20000, 1000))
plt.show()
My suspicion is that the value in the data which appears close to zero actually is zero. Since the logarithm of zero is undefined, that point cannot be plotted on a logarithmic scale. With the first point missing, though, markevery cannot calculate the spacings between markers (just as in "Calculate the mean of NaN and 1", which is impossible).
A solution is, of course, to leave out that point, which cannot be drawn on a log scale anyway:
plt.semilogx(fpr[1:], tpr[1:], marker='o', markevery=0.1)
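If the zero is not guaranteed to be the first element, a slightly more general sketch is to keep only the strictly positive fpr values with a boolean mask. The data below are hypothetical, built so that fpr starts at exactly 0.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ROC-like data whose first fpr value is exactly 0
fpr = np.concatenate(([0.0], np.logspace(-4, 0, 999)))
tpr = np.sqrt(fpr)

# log(0) is undefined, so keep only the points that can be drawn on a log axis
positive = fpr > 0

fig, ax = plt.subplots()
line, = ax.semilogx(fpr[positive], tpr[positive], marker='o', markevery=0.1)
```

With a float markevery, matplotlib spaces the markers at roughly equal visual distances along the line, which is usually what you want on a log axis.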
