I am attempting three-class classification using an SVM classifier. How do we interpret the probability estimates predicted by LIBSVM? Are they based on the perpendicular distance of the instance from the maximal-margin hyperplane?
Kindly throw some light on the interpretation of the probability estimates predicted by the LIBSVM classifier. Parameters C and gamma are tuned first, and then probability estimates are output by using the -b option for both training and testing.
A multiclass SVM is usually decomposed into several binary classifiers (typically a set of one-vs-all or one-vs-one classifiers; LIBSVM uses one-vs-one). Any binary SVM classifier's decision function outputs a (signed) distance to the separating hyperplane. In short, an SVM maps the input domain to a one-dimensional real number (the decision value). The predicted label is determined by the sign of the decision value. The most common technique to obtain probabilistic output from SVM models is so-called Platt scaling (described in a paper by the LIBSVM authors).
Is it based on perpendicular distance of the instance from the maximal margin hyperplane?
Yes. Any classifier that outputs such a one-dimensional real value can be post-processed to yield probabilities, by calibrating a logistic function on the decision values of the classifier. This is the exact same approach as in standard logistic regression.
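To make the calibration step concrete, here is a minimal sketch in plain Python. It assumes Platt's sigmoid form P(y=1 | f) = 1 / (1 + exp(A*f + B)) and fits A and B by simple gradient descent on the log loss. The decision values and labels are made-up toy data, and the function names are hypothetical. LIBSVM's real implementation differs (it uses a more careful optimizer and decision values obtained by cross-validation), so treat this only as an illustration of the idea.

```python
import math

def sigmoid_prob(f, A, B):
    """Platt's sigmoid: P(y=1 | f) = 1 / (1 + exp(A*f + B)), computed stably."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(decision_values, labels, lr=0.1, n_iter=5000):
    """Fit A and B by gradient descent on the mean log loss (labels in {0, 1})."""
    A, B = 0.0, 0.0
    n = len(decision_values)
    for _ in range(n_iter):
        grad_A = grad_B = 0.0
        for f, y in zip(decision_values, labels):
            p = sigmoid_prob(f, A, B)
            # With z = A*f + B, the derivative of the log loss w.r.t. z is (y - p).
            grad_A += (y - p) * f
            grad_B += (y - p)
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B

# Toy (hypothetical) SVM decision values and true labels; not linearly separable.
decision_values = [-2.0, -1.5, -1.0, -0.3, 0.2, 0.4, 1.0, 1.8, 2.5]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]

A, B = fit_platt(decision_values, labels)
for f in decision_values:
    print(f"decision value {f:+.1f} -> P(y=1) = {sigmoid_prob(f, A, B):.3f}")
```

The fitted A comes out negative, so a larger (more positive) distance from the hyperplane maps to a higher probability for the positive class.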
An SVM performs binary classification. To achieve multiclass classification, LIBSVM combines several binary SVMs (it uses the one-vs-one scheme). What you get when you invoke -b is the probability related to this technique, which you can find explained here.
Related
I am confused by this example here: https://scikit-learn.org/stable/visualizations.html
If we plot the ROC curve for a Logistic Regression Classifier the ROC curve is parametrized by the threshold parameter. But a usual SVM spits out binary values instead of probabilities.
Consequently there should not be a threshold which can be varied to obtain an ROC curve.
But which parameter is then varied in the example above?
SVMs have a measure of confidence in their predictions: the distance from the separating hyperplane (in the kernel-induced feature space, if you're not using a linear SVM). These distances are obviously not probabilities, but they do rank-order the data points, and so you can get an ROC curve by varying a threshold on them. In sklearn, they are available via the decision_function method. (You can also set probability=True in the SVC to calibrate the decision function values into probability estimates.)
See this section of the User Guide for some of the details on the decision function.
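As a rough illustration of which parameter is varied, the sketch below sweeps a threshold over raw decision values and records one (FPR, TPR) point per threshold. It is plain Python with made-up scores and labels, and roc_points is a hypothetical helper, not sklearn's implementation.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping a threshold over the scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Predict positive whenever the score is at least the threshold t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical signed distances to the hyperplane and true labels.
scores = [-1.2, -0.4, 0.3, 0.8, 1.5]
labels = [0, 1, 0, 1, 1]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Only the ordering of the scores matters here: applying any strictly increasing transformation to the scores (for example, calibrating them into probabilities) yields the same curve.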
I'm using OneVsRestClassifier on a multiclass problem with svm.SVC as the base estimator.
The argmax from the predict_proba() does not match the predicted class:
Is there some normalization going on in the background? How do I get predict_proba() and predict() to match?
According to scikit-learn's SVC documentation on multi-class classification, there can be discrepancies between the output of predict and the argmax of predict_proba (emphasis mine):
The decision_function method of SVC and NuSVC gives per-class scores for each sample (or a single score per sample in the binary case). When the constructor option probability is set to True, class membership probability estimates (from the methods predict_proba and predict_log_proba) are enabled. In the binary case, the probabilities are calibrated using Platt scaling: logistic regression on the SVM’s scores, fit by an additional cross-validation on the training data. In the multiclass case, this is extended as per Wu et al. (2004).
Needless to say, the cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimates may be inconsistent with the scores, in the sense that the “argmax” of the scores may not be the argmax of the probabilities. (E.g., in binary classification, a sample may be labeled by predict as belonging to a class that has probability <½ according to predict_proba.) Platt’s method is also known to have theoretical issues. If confidence scores are required, but these do not have to be probabilities, then it is advisable to set probability=False and use decision_function instead of predict_proba.
You cannot get them to match using an SVC. You can try another model if you need the probabilities. If you do not need probabilities, as stated in the documentation, you can use decision_function (see here for more details).
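For the binary case mentioned in the documentation, a small numeric sketch shows how the mismatch can arise. It assumes the Platt sigmoid P(y=1 | f) = 1 / (1 + exp(A*f + B)); the values of A, B and the decision value f are invented for illustration and are not taken from any fitted model.

```python
import math

def platt_probability(f, A, B):
    """Calibrated probability of the positive class for decision value f."""
    return 1.0 / (1.0 + math.exp(A * f + B))

# Hypothetical calibration parameters; a nonzero B shifts the 0.5 crossing
# point away from the hyperplane (f = 0).
A, B = -2.0, 0.5
f = 0.1  # hypothetical decision value, slightly on the positive side

label_from_score = 1 if f > 0 else 0          # what a sign-based predict would return
p_pos = platt_probability(f, A, B)
label_from_proba = 1 if p_pos > 0.5 else 0    # argmax of the calibrated probabilities

print(label_from_score, round(p_pos, 3), label_from_proba)
```

Here the sample lies on the positive side of the hyperplane, yet its calibrated probability is below 0.5, because the fitted sigmoid does not have to cross 0.5 exactly at f = 0.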
I am making a machine learning program which classifies words in one of the following categories: Hardware, Software, None_of_these. I make use of the Multinomial Naive Bayes classifier from sklearn.
The function predict() gives me the prediction for every word; however, I can't see the actual probability (a float ranging from 0 to 1.0) that the word matches the predicted category. I didn't find this on sklearn's site either.
Is there a function which gives me the probability of every sample?
Never mind, I found the solution:
predict_proba(X) Returns probability estimates for the test vector X.
I would like to know how to calculate the probability using the Spark MLlib SVM in a multi-class classification problem.
The documentation shows there is no such function available.
LibSVM uses Platt-scaling.
My questions are:
Is there a function to calculate the probability somewhere?
If not, who can help me implement such functionality?
I would simply take the average distances from all the support vectors for each category after training and compare the distance from a new data-point to hyperplanes from all the classifiers.
I think the SVMModel.predict() gives these distances, but I am uncertain.
Is there any possibility to configure an SVM classifier from scikit-learn such that:
1.) the svm classifier is trained with examples from 0,...,n - 1
2.) If none of the single classifiers (one-vs-rest) delivers a positive result (class membership), then the output is a designated label n which means "none of them"
Thanks!
By construction, the OvR multiclass wrapper sklearn.multiclass.OneVsRestClassifier predicts the class whose binary classifier gives the maximum decision_function output (or the maximum predict_proba). This means that there will always be a predicted class.
If you wanted e.g. to predict "None of these" when decision_function / predict_proba all stay under a certain threshold (for all OvR problems), then you would have to write this estimator yourself, but could get inspiration from the code of sklearn.multiclass.OneVsRestClassifier and just modify the decision logic.
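The modified decision logic could look roughly like the sketch below. It is plain Python rather than a full sklearn estimator: predict_with_reject is a hypothetical helper, the scores are made-up one-vs-rest decision values for a single sample, and the threshold of 0.0 is an assumption that corresponds to "no binary classifier gives a positive result".

```python
def predict_with_reject(scores, threshold=0.0, none_label=None):
    """Pick the best-scoring class, or a 'none of them' label if no score is high enough.

    scores: per-class one-vs-rest decision values for one sample (classes 0..n-1).
    none_label: label returned when every score is <= threshold (defaults to n).
    """
    if none_label is None:
        none_label = len(scores)
    # Index of the class with the largest decision value.
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best if scores[best] > threshold else none_label

# Hypothetical decision values for a 3-class problem (classes 0, 1, 2).
print(predict_with_reject([-0.7, 0.4, -1.1]))   # class 1 scores positive
print(predict_with_reject([-0.7, -0.2, -1.1]))  # all negative: label 3, "none of them"
```

In practice the scores would come from the one-vs-rest classifiers' decision values (or probabilities, with a correspondingly different threshold), and the threshold is something you would tune on held-out data.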