Sklearn: Difference between using OneVsRestClassifier and build each classifier individually - scikit-learn

As far as I know, multi-label problem can be solved with one-vs-all scheme, for which Scikit-learn implements OneVsRestClassifier as a wrapper on classifier such as svm.SVC. I am wondering how would it be different if I literally train, say we have a multi-label problem with n classes, n individual binary classifiers for each label and thereby evaluate them separately.
I know it is like a "manual" way of implementing one-vs-all rather than using the wrapper, but are two ways actually different? If so, how are they different, like in execution time or performance of classifier(s)?

There would be no difference. For multi-label classification, sklearn one-versus-rest implements binary relevance which is what you have described.

Related

pos_weight in multilabel classification in pytorch

I am using pytorch for multilabel classification. I have used pos_weights in BCELoss since i have imbalanced data. FOr to use pos_weight, whether we need to take the entire dataset(train, validation, test) or only the training set for calculating the pos_Weight... Thanks...
While not a coding question and better suited for a different SE site, the quick answer is this:
You always assume you have never seen the test set before, so you cannot use it in any way to make decisions about the model design. For the validation set, a similar argument can be made in that you want to validate at regular intervals using unseen data. As such, you want to calculate class weights using the train data only.
Do keep in mind that if the class distribution is not a representation of the class distribution in unseen data (i.e. the real world, or your test set), then the model will optimize for the wrong class distribution. This should be solved by analyzing the task better, not by directly using the test set to determine class distribution.

How to convert a tf.estimator to a keras model?

In package tf.estimator, there's a lot of defined estimators. I want to use them in Keras.
I checked TF docs, there's only one converting method that could convert keras. Model to tf. estimator, but no way to convert from estimator to Model.
For example, if we want to convert the following estimator:
tf.estimator.DNNLinearCombinedRegressor
How could it be converted into Keras Model?
You cannot because estimators can run arbitrary code in their model_fn functions and Keras models must be much more structured, whether sequential or functional they must consist of layers, basically.
A Keras model is a very specific type of object that can therefore be easily wrapped and plugged into other abstractions.
Estimators are based on arbitrary Python code with arbitrary control flow and so it's quite tricky to force any structure onto them.
Estimators support 3 modes - train, eval and predict. Each of these could in theory have completely independent flows, with different weights, architectures etc. This is almost unthinkable in Keras and would essentially amount to 3 separate models.
Keras, in contrast, supports 2 modes - train and test (which is necessary for things like Dropout and Regularisation).

p-values of scikit-learn LogisticRegressionCV?

I'm using scikit-learn's LogisticRegressionCV. Looks like the coefs_ field are the logistic regression coefficients. Is there any way to get p-values, z-values, or some measure of uncertainty for each feature? (For example, as discussed here in R.)
Unfortunately, scikit-learn does not have any such methods for the logistic regression (nor for the linear regression as a matter of fact). I found this which might be of interest for you, but honestly, I would try to stick to R for such tasks if you can.
https://datascience.stackexchange.com/questions/15398/how-to-get-p-value-and-confident-interval-in-logisticregression-with-sklearn

Image Classification using Tensorflow

I am doing transfer-learning/retraining using Tensorflow Inception V3 model. I have 6 labels. A given image can be one single type only, i.e, no multiple class detection is needed. I have three queries:
Which activation function is best for my case? Presently retrain.py file provided by tensorflow uses softmax? What are other methods available? (like sigmoid etc)
Which Optimiser function I should use? (GradientDescent, Adam.. etc)
I want to identify out-of-scope images, i.e. if users inputs a random image, my algorithm should say that it does not belong to the described classes. Presently with 6 classes, it gives one class as a sure output but I do not want that. What are possible solutions for this?
Also, what are the other parameters that we may tweak in tensorflow. My baseline accuracy is 94% and I am looking for something close to 99%.
Since you're doing single label classification, softmax is the best loss function for this, as it maps your final layer logit values to a probability distribution. Sigmoid is used when it's multilabel classification.
It's always better to use a momentum based optimizer compared to vanilla gradient descent. There's a bunch of such modified optimizers like Adam or RMSProp. Experiment with them to see what works best. Adam is probably going to give you the best performance.
You can add an extra label no_class, so your task will now be a 6+1 label classification. You can feed in some random images with no_class as the label. However the distribution of your random images must match the test image distribution, else it won't generalise.

What does the CV stand for in sklearn.linear_model.LogisticRegressionCV?

scikit-learn has two logistic regression functions:
sklearn.linear_model.LogisticRegression
sklearn.linear_model.LogisticRegressionCV
I'm just curious what the CV stands for in the second one. The only acronym I know in ML that matches "CV" is cross-validation, but I'm guessing that's not it, since that would be achieved in scikit-learn with a wrapper function, not as part of the logistic regression function itself (I think).
You are right in guessing that the latter allows the user to perform cross validation. The user can pass the number of folds as an argument cv of the function to perform k-fold cross-validation (default is 10 folds with StratifiedKFold).
I would recommend reading the documentation for the functions LogisticRegression and LogisticRegressionCV
Yes, it's cross-validation. Excerpt from the docs:
For the grid of Cs values (that are set by default to be ten values in a logarithmic scale between 1e-4 and 1e4), the best hyperparameter is selected by the cross-validator StratifiedKFold, but it can be changed using the cv parameter.
The point here is the following:
yes: sklearn has general model-selection wrappers providing CV-functionality for all those classifiers/regressors
but: when the classifier/regressor is known/fixed a-priori (to some extent) or sometimes even some CV-model, one can gain advantages using these facts with specialized code bound to one classifier/regressor resulting in improved performance!
Typically:
CV already embedded in optimization-algorithm
Efficient warm-starting (instead of full re-optimization after just the change of one parameter like alpha)
It seems, at least the latter idea is used in sklearn's LogisticRegressionCV, as seen in this excerpt:
In the case of newton-cg and lbfgs solvers, we warm start along the path i.e guess the initial coefficients of the present fit to be the coefficients got after convergence in the previous fit, so it is supposed to be faster for high-dimensional dense data.
May I also refer you to this section in scikit-learn documentation which I beleive explains it well:
Some models can fit data for a range of values of some parameter
almost as efficiently as fitting the estimator for a single value of
the parameter. This feature can be leveraged to perform a more
efficient cross-validation used for model selection of this parameter.
The most common parameter amenable to this strategy is the parameter
encoding the strength of the regularizer. In this case we say that we
compute the regularization path of the estimator.
And logistic regression is one such model. That's why scikit-learn has the dedicated LogisticRegressionCV class that does this.
There are some things left out on other answers, e.g. about gridsearch functionality. See the docs:
cross-validation estimator
An estimator that has built-in cross-validation capabilities to automatically select the best hyper-parameters (see the User Guide). Some example of cross-validation estimators are ElasticNetCV and LogisticRegressionCV. Cross-validation estimators are named EstimatorCV and tend to be roughly equivalent to GridSearchCV(Estimator(), ...). The advantage of using a cross-validation estimator over the canonical estimator class along with grid search is that they can take advantage of warm-starting by reusing precomputed results in the previous steps of the cross-validation process. This generally leads to speed improvements. An exception is the RidgeCV class, which can instead perform efficient Leave-One-Out CV.
https://scikit-learn.org/stable/glossary.html#term-cross-validation-estimator
https://github.com/amueller/talks_odt/blob/master/2015/nyc-open-data-2015-andvanced-sklearn.pdf

Resources