I am trying to use survival_probability_calibration to visualize the performance of a Cox model, but the calibration curve always stays flat, as shown in the following plot:
[Figure: calibration curve with Cox model]
What could be causing this? Could another package, such as matplotlib, affect this plot?
I have a neural network that predicts frequencies for four different features in my dataset. When I test the network, each input yields a vector of size four, where each entry is the frequency of a given feature. I now want to draw a 2D density plot of true frequencies vs. predicted frequencies.
true: [0.000345,0.99183,0,0] prediction: [0.0212,0.738,0.004,0.006]
true: [0,0.9937,0,0.00013] prediction: [0.005,0.983,0.04,0.01]
At the moment I draw a 2D density plot for each feature separately: true vs. predicted for feature 1, then for feature 2, and so on.
import seaborn as sns
sns.jointplot(x=feature_1_true, y=feature_1_pred, kind='scatter')
sns.jointplot(x=feature_2_true, y=feature_2_pred, kind='scatter')
My question: is there a way to visualise such a vector in its entirety on one density plot, instead of plotting the features separately? If you had vectors like these, how would you plot their density, and how would you plot true vs. predicted values?
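One possible approach, offered only as a sketch: pool the (true, predicted) pairs from all four features and bin them on a single shared grid, so that the whole vector contributes to one density. The helper name pooled_histogram2d, the bin count, and the assumption that all frequencies lie in [0, 1] are illustrative choices, not something stated in the question.

```python
# Sketch: pool every feature's (true, predicted) pair into one 2D histogram.
# Assumes all frequencies lie in [0, 1]; the bin count is arbitrary.

def pooled_histogram2d(true_rows, pred_rows, bins=5):
    """Count (true, pred) pairs from every feature in a bins x bins grid."""
    grid = [[0] * bins for _ in range(bins)]
    for t_row, p_row in zip(true_rows, pred_rows):
        for t, p in zip(t_row, p_row):
            # Clamp so that a value of exactly 1.0 falls in the last bin.
            i = min(int(t * bins), bins - 1)
            j = min(int(p * bins), bins - 1)
            grid[i][j] += 1
    return grid

# The two example vectors from the question.
true_rows = [[0.000345, 0.99183, 0, 0], [0, 0.9937, 0, 0.00013]]
pred_rows = [[0.0212, 0.738, 0.004, 0.006], [0.005, 0.983, 0.04, 0.01]]

grid = pooled_histogram2d(true_rows, pred_rows, bins=5)
for row in grid:
    print(row)  # rows index the true-value bin, columns the predicted-value bin
```

The same flattened pairs could presumably be handed to sns.jointplot (for example with kind='hex') to get a smoothed picture. Pooling hides which feature each point came from, so a per-feature colour may still be worth keeping.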
I am confused by this example here: https://scikit-learn.org/stable/visualizations.html
When we plot the ROC curve for a logistic regression classifier, the curve is parametrized by the decision threshold. But a standard SVM outputs binary labels rather than probabilities.
Consequently, there should be no threshold that can be varied to obtain an ROC curve.
So which parameter is varied in the example above?
SVMs do have a measure of confidence in their predictions: the distance from the separating hyperplane (measured in the kernel-induced feature space if you're not using a linear SVM). These distances are not probabilities, but they do rank-order the data points, and a ranking is all an ROC curve needs. In sklearn, they are exposed via the decision_function method. (You can also set probability=True in the SVC to calibrate the decision function values into probability estimates.)
See this section of the User Guide for some of the details on the decision function.
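To illustrate why a rank ordering is enough, here is a small sketch in plain Python. The distances are made-up numbers standing in for decision_function output, and the logistic squashing is only a stand-in for any monotonic map into (0, 1), not sklearn's actual calibration procedure. The area under the ROC curve is computed as the fraction of (positive, negative) pairs that are ranked correctly.

```python
import math

def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical signed distances from a separating hyperplane (not probabilities).
distances = [2.3, 0.7, -0.2, 1.1, -1.5, -0.4]
labels = [1, 1, 1, 0, 0, 0]

# Squash the distances into (0, 1) with a logistic function, a monotonic map.
squashed = [1.0 / (1.0 + math.exp(-d)) for d in distances]

auc_raw = auc_from_scores(distances, labels)
auc_squashed = auc_from_scores(squashed, labels)
print(auc_raw, auc_squashed)
```

Both calls give the same value, because the squashing changes the scores but not their order.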
I am reading the paper by Viola and Jones, in which they use an ROC curve to measure the accuracy of their classifier.
https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf
Could someone please explain how the ROC curve is plotted for a binary classifier such as face vs. non-face? That is, how are the data points obtained?
(X, Y) = (false positives, correct detection rate)
Do I have to calculate these points for every positive and negative example in my training set? My positive and negative sets are of different sizes, so I am a bit confused.
The ROC (receiver operating characteristic) curve is a measure of a classifier's accuracy: the larger the area under the curve, the more accurate the classifier. For the area to be large, the curve needs high values on the y-axis, that is, a high TPR (true positive rate).
To compute the ROC curve, first plot the number of instances as a function of the AdaBoost classifier's output score. Then move the classifier's threshold across that range and compute the TPR and FPR at each threshold; each threshold gives one point on the curve.
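The threshold sweep described above can be sketched as follows. The scores and labels are invented for illustration (3 faces, 5 non-faces) and do not come from the paper. Because TPR is divided by the number of positives and FPR by the number of negatives, the two sets do not need to be the same size.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) at each threshold, sweeping from strictest to loosest."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is called "face".
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical classifier outputs: 3 faces (label 1) and 5 non-faces (label 0).
scores = [0.9, 0.8, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the false positive count instead of the rate on the x-axis, as the paper's figure appears to do, would only rescale that axis.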
I'm trying to obtain an ROC curve for a GBTClassifier.
One way is to reuse BinaryClassificationMetrics; however, the approach given in the documentation (https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html) yields only 4 values for the ROC curve:
[0.0|0.0]
[0.0|0.9285714285714286]
[1.0|1.0]
[1.0|1.0]
Another way is to use the "probability" column instead of "prediction". However, GBTClassifier does not give me that column; this solution works mostly for RandomForestClassifier.
How to plot ROC curve and precision-recall curve from BinaryClassificationMetrics
So what is the general way to get an ROC curve with enough points for an arbitrary classifier?
I am trying to plot an ROC curve and a precision-recall curve. The points are generated by Spark MLlib's BinaryClassificationMetrics, following the Spark guide at https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html:
[(1.0,1.0), (0.0,0.4444444444444444)] - Precision
[(1.0,1.0), (0.0,1.0)] - Recall
[(1.0,1.0), (0.0,0.6153846153846153)] - F1Measure
[(0.0,1.0), (1.0,1.0), (1.0,0.4444444444444444)] - Precision-Recall curve
[(0.0,0.0), (0.0,1.0), (1.0,1.0), (1.0,1.0)] - ROC curve
It looks like you have a similar problem to what I experienced. You need to either flip your parameters to the Metrics constructor or perhaps pass in the probability instead of the prediction. So, for example, if you are using the BinaryClassificationMetrics and a RandomForestClassifier, then according to this page (under outputs) there is "prediction" and "probability".
Then initialize your Metrics thus:
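import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.functions.col
import org.apache.spark.ml.linalg.DenseVector  // assumes Spark 2.x+; on 1.x this lives in org.apache.spark.mllib.linalg
val metrics = new BinaryClassificationMetrics(predictionsWithResponse
  .select(col("probability"), col("myLabel"))
  .rdd.map(r => (r.getAs[DenseVector](0)(1), r.getDouble(1))))  // (P(class 1), label)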
The DenseVector access extracts the probability of class 1.
As for the actual plotting, that's up to you (there are many fine tools for that), but at least you will get more than one point on your curve (besides the endpoints).
And in case it's not clear:
metrics.roc().collect() will give you the data for the ROC curve: Tuples of: (false positive rate, true positive rate).
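To see why passing hard predictions produces so few points, here is a small Python sketch that imitates, as far as I understand the documented behaviour of roc(), one point per distinct score with (0.0, 0.0) prepended and (1.0, 1.0) appended. The function name roc_with_endpoints and the data are hypothetical; this is not Spark code.

```python
def roc_with_endpoints(score_label_pairs):
    """One (FPR, TPR) point per distinct score, plus (0, 0) first and (1, 1) last."""
    n_pos = sum(1 for _, y in score_label_pairs if y == 1.0)
    n_neg = len(score_label_pairs) - n_pos
    curve = [(0.0, 0.0)]
    for t in sorted({s for s, _ in score_label_pairs}, reverse=True):
        tp = sum(1 for s, y in score_label_pairs if s >= t and y == 1.0)
        fp = sum(1 for s, y in score_label_pairs if s >= t and y == 0.0)
        curve.append((fp / n_neg, tp / n_pos))
    curve.append((1.0, 1.0))
    return curve

labels = [1.0, 1.0, 0.0, 0.0, 0.0]
hard_predictions = [1.0, 1.0, 0.0, 0.0, 0.0]  # thresholded 0/1 output
probabilities = [0.9, 0.6, 0.7, 0.3, 0.1]     # hypothetical class-1 probabilities

hard_curve = roc_with_endpoints(list(zip(hard_predictions, labels)))
prob_curve = roc_with_endpoints(list(zip(probabilities, labels)))
print(len(hard_curve), hard_curve)
print(len(prob_curve), prob_curve)
```

With 0/1 predictions there are only two distinct scores, so the sketch returns four points, (0,0), (0,1), (1,1), (1,1), which has the same shape as the four-value output quoted in the questions. With five distinct probabilities it returns seven.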