Calculate probability MLLIB SVM multi-class - apache-spark

I would like to know how to calculate the probability using Spark MLLIB SVM in a multi-class classification problem.
The documentation shows there is no such function available.
LibSVM uses Platt-scaling.
My questions are:
Is there a function to calculate the probability somewhere?
If not, can anyone help me implement such functionality?
My idea is simply to take the average distance of the support vectors for each category after training, and compare it with the distance from a new data point to the hyperplane of each classifier.
I think SVMModel.predict() gives these distances, but I am not certain.
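As a rough illustration of the idea in the question, here is a minimal sketch in plain Python (not Spark code) of how per-class margins could be turned into probabilities. It assumes one binary linear model per class (one-vs-rest) and assumes the Platt-style sigmoid parameters a and b have already been fitted for each class; all weights and parameters below are made up. As far as I know, in MLlib SVMModel.predict() only returns the raw margin after clearThreshold() has been called; otherwise it returns a 0/1 label.

```python
import math

def margin(weights, intercept, x):
    """Raw linear SVM score w.x + b (signed, unnormalised distance to the hyperplane)."""
    return sum(w * xi for w, xi in zip(weights, x)) + intercept

def sigmoid(a, b, s):
    """Platt-style squashing of a margin s; a and b are assumed to be fitted already."""
    return 1.0 / (1.0 + math.exp(a * s + b))

def one_vs_rest_probabilities(models, platt_params, x):
    """models: {label: (weights, intercept)}; platt_params: {label: (a, b)}."""
    raw = {}
    for label, (weights, intercept) in models.items():
        a, b = platt_params[label]
        raw[label] = sigmoid(a, b, margin(weights, intercept, x))
    total = sum(raw.values())
    # Normalise so the per-class values sum to one.
    return {label: p / total for label, p in raw.items()}

# Hypothetical three-class example with made-up weights.
models = {
    "a": ([1.0, 0.0], 0.0),
    "b": ([0.0, 1.0], 0.0),
    "c": ([-1.0, -1.0], 0.0),
}
# a < 0 means a larger margin gives a higher probability.
platt_params = {label: (-1.0, 0.0) for label in models}
probs = one_vs_rest_probabilities(models, platt_params, [2.0, 0.5])
print(probs)
```

Normalising the per-class sigmoid outputs is only one simple way to combine the binary classifiers; it is a heuristic, not what LibSVM does internally.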

Related

Interpreting coefficientMatrix, interceptVector and Confusion matrix on multinomial logistic regression

Can anyone explain how to interpret the coefficientMatrix, interceptVector and confusion matrix of a multinomial logistic regression?
According to Spark documentation:
Multiclass classification is supported via multinomial logistic (softmax) regression. In multinomial logistic regression, the algorithm produces K sets of coefficients, or a matrix of dimension K×J where K is the number of outcome classes and J is the number of features. If the algorithm is fit with an intercept term then a length K vector of intercepts is available.
I ran an example using Spark ML 2.3.0 and got this result.
If I analyse what I get:
The coefficientMatrix has dimension 5 x 11.
The interceptVector has dimension 5.
If so, why does the confusion matrix have dimension 4 x 4?
Also, can anyone give an interpretation of coefficientMatrix and interceptVector?
Why do I get negative coefficients?
If 5 is the number of classes after classification, why do I get 4 rows in the confusion matrix?
EDIT
I forgot to mention that I am still a beginner in machine learning and that my search on Google didn't help, so maybe I get an Up Vote :)
Regarding the 4x4 confusion matrix: I imagine that when you split your data into test and train, there were 5 classes present in your training set and only 4 classes present in your test set. This can easily happen if the distribution of your response variable is imbalanced.
You'll want to try to perform some stratified split between test and train prior to modeling. If you are working with pyspark, you may find this library helpful: https://github.com/databricks/spark-sklearn
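To make the idea of a stratified split concrete, here is a minimal sketch in plain Python. It is not the library linked above and not Spark code; the function name stratified_split and the toy data are made up for illustration. It splits each label's rows separately so that a rare class still shows up on both sides of the split.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_of, test_fraction=0.2, seed=0):
    """Split rows so each label appears in train and test in similar proportion."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        # Keep at least one row per label in the test set when the group has more than one row.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Hypothetical imbalanced data: labels 0-3 have 24 rows each, label 4 has only 4.
rows = [(i, i % 4) for i in range(96)] + [(100 + i, 4) for i in range(4)]
train, test = stratified_split(rows, label_of=lambda r: r[1])
print(sorted({label for _, label in test}))
```

With a plain random 80/20 split of the same data, label 4 could easily be missing from the test set, which is the situation that would produce a 4x4 confusion matrix.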
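Now regarding negative coefficients for a multi-class logistic regression: as you mentioned, your returned coefficientMatrix has shape 5x11.
Each of the five rows holds the coefficients for one class. As the documentation you quoted says, Spark fits these rows jointly as a single softmax model rather than as five separate one-vs-all models, but a one-vs-all reading is a reasonable first approximation: treat the 1st row as a model where the positive class is the 1st label and the negative class is composed of all other labels. Let's say the 1st coefficient in this row is -2.23. To interpret this coefficient we take the exponential of -2.23, which is approximately 0.11. Rough interpretation: 'With a one-unit increase in the 1st feature, we expect the odds of the positive label to drop by about 89%.' The exact change also depends on the other rows' coefficients for that feature. A negative coefficient therefore simply means that larger values of the feature make that class less likely.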
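To see how a coefficient such as -2.23 acts inside a softmax model, which is what the documentation quoted in the question describes, here is a small sketch in plain Python. The 3-class, 2-feature matrix and the intercepts are made up; only the value -2.23 comes from the discussion above. In a softmax model the odds of one class against another change by the exponential of the difference between their coefficients.

```python
import math

def softmax_probabilities(coefficient_matrix, intercept_vector, features):
    """Class probabilities for a K x J coefficient matrix and a length-K intercept vector."""
    scores = [
        sum(c * f for c, f in zip(row, features)) + b
        for row, b in zip(coefficient_matrix, intercept_vector)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up 3-class, 2-feature model; -2.23 is the coefficient discussed above.
coefficient_matrix = [[-2.23, 0.5], [0.4, -0.1], [1.0, 0.2]]
intercept_vector = [0.1, 0.0, -0.1]

before = softmax_probabilities(coefficient_matrix, intercept_vector, [0.0, 1.0])
after = softmax_probabilities(coefficient_matrix, intercept_vector, [1.0, 1.0])

# Factor applied to class 0's unnormalised weight per unit of feature 0.
multiplier = math.exp(-2.23)
# Odds of class 0 against class 1 change by exp(beta_0 - beta_1) for a one-unit increase.
odds_ratio_0_vs_1 = (after[0] / after[1]) / (before[0] / before[1])
print(round(multiplier, 4), round(odds_ratio_0_vs_1, 4))
```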

sklearn: AUC score for LinearSVC and OneClassSVM

One option of the SVM classifier (SVC) is probability which is false by default. The documentation does not say what it does. Looking at libsvm source code, it seems to do some sort of cross-validation.
This option exists for neither LinearSVC nor OneClassSVM.
I need to calculate AUC scores for several SVM models, including these last two. Should I calculate the AUC score using decision_function(X) as the thresholds?
Answering my own question.
Firstly, it is a common "myth" that you need probabilities to draw the ROC curve. No, you need some kind of threshold in your model that you can change. The ROC curve is then drawn by changing this threshold. The point of the ROC curve being, of course, to see how well your model is reproducing the hypothesis by seeing how well it is ordering the observations.
In the case of SVM, there are two ways I see people drawing ROC curves:
1. Using the distance to the decision boundary, as I mentioned in my own question.
2. Using the bias term as your threshold in the SVM: http://researchgate.net/post/How_can_I_plot_determine_ROC_AUC_for_SVM. In fact, if you use SVC(probability=True) then probabilities will be calculated for you in this manner, by using CV, which you can then use to draw the ROC curve. But as mentioned in the link I provide, it is much faster if you draw the ROC curve directly by varying the bias.
I think #2 is the same as #1 if we are using a linear kernel, as in my own case, because varying the bias is varying the distance in this particular case.
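The point that only the ordering of the scores matters can be checked with a small sketch in plain Python. The helper auc_from_scores and the decision values below are made up for illustration; the helper computes AUC as the fraction of (positive, negative) pairs that are ranked correctly, which is equivalent to sweeping a threshold over the scores. In sklearn, roc_auc_score accepts such non-thresholded decision values directly.

```python
import math

def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up decision values (signed distances), as decision_function might return them.
labels = [0, 0, 1, 1, 0, 1]
decision_values = [-1.2, -0.3, 0.8, -0.5, 0.1, 2.0]

auc_raw = auc_from_scores(labels, decision_values)
# Squashing through a sigmoid is monotonic, so the ranking and the AUC are unchanged.
auc_squashed = auc_from_scores(labels, [1 / (1 + math.exp(-d)) for d in decision_values])
print(auc_raw, auc_squashed)
```

Both calls give the same value, which is why no probability calibration is needed just to draw the ROC curve or compute AUC.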
In order to calculate AUC using sklearn, you need a continuous score for each sample. One source is predict_proba, which is what the probability parameter on SVC enables (you are correct that it's calculated using cross-validation). From the docs:
probability : boolean, optional (default=False)
Whether to enable probability estimates. This must be enabled prior to calling fit, and will slow down that method.
The decision function is not a probability, but AUC depends only on how the scores rank the samples, so the output of decision_function can be passed to roc_auc_score directly, with no need to rescale it to [0,1]. This also covers LinearSVC and OneClassSVM, which have decision_function but no probability option. What you cannot do is read the decision values themselves as probabilities.

SVM Classification: Confidence Interval

Is it possible to get a Z-score from sklearn's svm implementation?
So, if it classifies inputs X as [0,1,0,1,1,1,0,0,0], could you get it to output: [0.5,0.78,0.95,0.11,0.34,...], where these are the estimated confidences the learner has in its predictions?
If I implemented it myself, would I be able to extract this info, or would it turn into a huge project?
As far as I know, SVMs don't have a closed-form Z-score. However, if you create your SVC with the parameter probability=True, it will include a probability model constructed using cross-validation, which you can access through predict_proba to get an estimate of the confidence of the predictions.

scikit 0.15 classifiers without predict_proba

In scikit some classifiers do not implement the "predict_proba" function.
While I understand that some classifiers do not predict probabilities, I would expect there to always be a confidence factor in the prediction of a classifier.
I would like to know how to get something equivalent to predict_proba for the Perceptron model (scikit 0.15).
Is there such a thing?
(I think there was predict_proba for older versions of scikit but there is not one in the version I need to use)
Some binary classifiers have an uncalibrated decision_function method that yields positive or negative values, with the threshold at zero. It is possible to use that to compute a calibrated probability estimate of correct classification; see this ongoing pull request for instance:
https://github.com/scikit-learn/scikit-learn/pull/1176

Regarding Probability Estimates predicted by LIBSVM

I am attempting 3-class classification using an SVM classifier. How do we interpret the probability estimates predicted by LIBSVM? Are they based on the perpendicular distance of the instance from the maximal-margin hyperplane?
Kindly throw some light on the interpretation of the probability estimates predicted by the LIBSVM classifier. Parameters C and gamma are tuned first, and then probability estimates are output by using the -b option for both training and testing.
Multiclass SVM is always decomposed into several binary classifiers (typically a set of one vs all classifiers). Any binary SVM classifier's decision function outputs a (signed) distance to the separating hyperplane. In short, an SVM maps the input domain to a one-dimensional real number (the decision value). The predicted label is determined by the sign of the decision value. The most common technique to obtain probabilistic output from SVM models is through so-called Platt scaling (paper of LIBSVM authors).
Is it based on perpendicular distance of the instance from the maximal margin hyperplane?
Yes. Any classifier that outputs such a one-dimensional real value can be post-processed to yield probabilities, by calibrating a logistic function on the decision values of the classifier. This is the exact same approach as in standard logistic regression.
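The calibration step described above can be sketched in a few lines of plain Python. This is a simplified illustration, not LIBSVM's implementation: it fits the two sigmoid parameters with plain gradient descent on the log-loss, whereas LIBSVM uses a more careful optimiser and decision values obtained by cross-validation. The function names and the decision values are made up.

```python
import math

def fit_platt(decision_values, labels, learning_rate=0.1, steps=5000):
    """Fit P(y=1 | s) = 1 / (1 + exp(a*s + b)) by gradient descent on the log-loss."""
    a, b = 0.0, 0.0
    n = len(labels)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(decision_values, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # For this parametrisation, d(log-loss)/d(a*s + b) = y - p.
            grad_a += (y - p) * s / n
            grad_b += (y - p) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b

def platt_probability(a, b, s):
    """Calibrated probability of the positive class for decision value s."""
    return 1.0 / (1.0 + math.exp(a * s + b))

# Made-up decision values and 0/1 labels that are not perfectly separable.
decision_values = [-2.0, -1.0, -0.5, 0.3, 1.0, 2.5]
labels = [0, 0, 1, 0, 1, 1]

a, b = fit_platt(decision_values, labels)
print(a, b, platt_probability(a, b, 2.5), platt_probability(a, b, -2.0))
```

The fitted a comes out negative, so a larger (more positive) decision value maps to a higher probability of the positive class.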
SVM performs binary classification. To achieve multiclass classification, LIBSVM splits the problem into binary subproblems; note that it uses the one-vs-one scheme rather than one-vs-all. What you get when you invoke -b is the probability derived from these binary classifiers, which is explained here.
