scikit 0.15 classifiers without predict_proba - scikit-learn

In scikit some classifiers do not implement the "predict_proba" function.
While I understand that some classifiers do not predict probabilities, I would expect that there is always a confidence factor in a the prediction of a classifier.
I would like to know how to have something equivalent of predict_proba Perceptron model (scikit 0.15).
Is there such a thing?
(I think there was predict_proba for older versions of scikit but there is not one in the version I need to use)

Some binary classifiers have uncalibrated decision_function method that yields positive or negative values and a threshold at zero. It is possible to use that and compute a calibrated probability estimate of correct classification, see this on-going pull request for instance:
https://github.com/scikit-learn/scikit-learn/pull/1176

Related

OneVsRest predict_proba() and predict() don't match?

I'm using OneVsRestClassifier on a multiclass problem with svm.SVC as the base estimator.
The argmax from the predict_proba() does not match the predicted class:
Is there some normalization going on in the background? How do I get predict_proba() and predict()
to match?
According to the scikit learn's SVC documentation on multi-class classification, there can be discrepancies between the output of predict and the argmax of predict_proba (emphasis mine):
The decision_function method of SVC and NuSVC gives per-class scores for each sample (or a single score per sample in the binary case). When the constructor option probability is set to True, class membership probability estimates (from the methods predict_proba and predict_log_proba) are enabled. In the binary case, the probabilities are calibrated using Platt scaling: logistic regression on the SVM’s scores, fit by an additional cross-validation on the training data. In the multiclass case, this is extended as per Wu et al. (2004).
Needless to say, the cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimates may be inconsistent with the scores, in the sense that the “argmax” of the scores may not be the argmax of the probabilities. (E.g., in binary classification, a sample may be labeled by predict as belonging to a class that has probability <½ according to predict_proba.) Platt’s method is also known to have theoretical issues. If confidence scores are required, but these do not have to be probabilities, then it is advisable to set probability=False and use decision_function instead of predict_proba.
You cannot get them to match using a SVC. You can try another model if you need the probabilities. If you do not need probabilities, as stated in the documentation, you can use decision_function (see here for more details.)

Modelling probabilities in a regularized (logistic?) regression model in python

I would like to fit a regression model to probabilities. I am aware that linear regression is often used for this purpose, but I have several probabilities at or near 0.0 and 1.0 and would like to fit a regression model where the output is constrained to lie between 0.0 and 1.0. I want to be able to specify a regularization norm and strength for the model and ideally do this in python (but an R implementation would be helpful as well). All the logistic regression packages I've found seem to be only suited for classification whereas this is a regression problem (albeit one where I want to use the logit link function). I use scikits-learn for my classification and regression needs so if this regression model can be implemented in scikits-learn, that would be fantastic (it seemed to me that this is not possible), but I'd be happy about any solution in python and/or R.
The question has two issues, penalized estimation and fractional or proportions data as dependent variable. I worked on each separately but never tried the combination.
Penalization
Statsmodels has had L1 regularized Logit and other discrete models like Poisson for some time. In recent months there has been a lot of effort to support more penalization but it is not in statsmodels yet. Elastic net for linear and Generalized Linear Model (GLM) is in a pull request and will be merged soon. More penalized GLM like L2 penalization for GAM and splines or SCAD penalization will follow over the next months based on pull requests that still need work.
Two examples for the current L1 fit_regularized for Logit are here
Difference in SGD classifier results and statsmodels results for logistic with l1 and https://github.com/statsmodels/statsmodels/blob/master/statsmodels/examples/l1_demo/short_demo.py
Note, the penalization weight alpha can be a vector with zeros for coefficients like the constant if they should not be penalized.
http://www.statsmodels.org/dev/generated/statsmodels.discrete.discrete_model.Logit.fit_regularized.html
Fractional models
Binary and binomial models in statsmodels do not impose that the dependent variable is binary and work as long as the dependent variable is in the [0,1] interval.
Fractions or proportions can be estimated with Logit as Quasi-maximum likelihood estimator. The estimates are consistent if the mean function, logistic, cumulative normal or similar link function, is correctly specified but we should use robust sandwich covariance for proper inference. Robust standard errors can be obtained in statsmodels through a fit keyword cov_type='HC0'.
Best documentation is for Stata http://www.stata.com/manuals14/rfracreg.pdf and the references therein. I went through those references before Stata had fracreg, and it works correctly with at least Logit and Probit which were my test cases. (I don't find my scripts or test cases right now.)
The bad news for inference is that robust covariance matrices have not been added to fit_regularized, so the correct sandwich covariance is not directly available. The standard covariance matrix and standard errors of the parameter estimates are derived under the assumption that the model, i.e. the likelihood function, is correctly specified, which will not be the case if the data are fractions and not binary.
Besides using Quasi-Maximum Likelihood with binary models, it is also possible to use a likelihood that is defined for fractional data in (0, 1). A popular model is Beta regression, which is also waiting in a pull request for statsmodels and is expected to be merged within the next months.

Calculate probability MLLIB SVM multi-class

I would like to know how to calculate the probability using Spark MLLIB SVM in a multi-class classification problem.
The documentation shows there is no such function available.
LibSVM uses Platt-scaling.
My questions are:
Is there a function to calculate the probability somewhere?
If not, who can help me implementing such functionality?
I would simply take the average distances from all the support vectors for each category after training and compare the distance from a new data-point to hyperplanes from all the classifiers.
I think the SVMModel.predict() gives these distances, but I am uncertain.

sci-kit SVM multi-class classification with unseen label

is there any possibility to configure an svm classifier from sci-kit such that:
1.) the svm classifier is trained with examples from 0,...,n - 1
2.) If none of the single classifiers (one-vs-rest) delivers a positive result (class membership), then the output is a designated label n which means "none of them"
Thanks!
By construction, the OvR multiclass wrapper sklearn.multiclass.OneVsRestClassifier selects the maximum decision_function output or the maximum predict_proba to be decisive of predicted class. This means that there will always be a predicted class.
If you wanted e.g. to predict "None of these" when decision_function / predict_proba all stay under a certain threshold (for all OvR problems), then you would have to write this estimator yourself, but could get inspiration from the code of sklearn.multiclass.OneVsRestClassifier and just modify the decision logic.

Regarding Probability Estimates predicted by LIBSVM

I am attempting 3 class classification by using SVM classifier. How do we interpret the probabililty estimates predicted by LIBSVM. Is it based on perpendicular distance of the instance from the maximal margin hyperplane?.
Kindly through some light on the interpretation of probability estimates predicted by LIBSVM classifier. Parameters C and gamma are first tuned and then probability estimates are outputted by using -b option with both training and testing.
Multiclass SVM is always decomposed into several binary classifiers (typically a set of one vs all classifiers). Any binary SVM classifier's decision function outputs a (signed) distance to the separating hyperplane. In short, an SVM maps the input domain to a one-dimensional real number (the decision value). The predicted label is determined by the sign of the decision value. The most common technique to obtain probabilistic output from SVM models is through so-called Platt scaling (paper of LIBSVM authors).
Is it based on perpendicular distance of the instance from the maximal margin hyperplane?
Yes. Any classifier that outputs such a one-dimensional real value can be post-processed to yield probabilities, by calibrating a logistic function on the decision values of the classifier. This is the exact same approach as in standard logistic regression.
SVM performs binary classification. In order to achieve multiclass classification libsvm performs what it's called one vs all. What you get when you invoke -bis the probability related to this technique that you can found explained here .

Resources