SVM QP solver in sklearn - scikit-learn

I am studying SVMs and I am going to implement one in Python using sklearn.svm.SVC.
As far as I know, the SVM training problem can be expressed as a QP (quadratic program),
so I was wondering which QP solver is used to solve the SVM QP problem in sklearn's svm module.
I think it may be SMO or a coordinate descent algorithm.
Please let me know which exact algorithm is used in sklearn's SVM.

Off-the-shelf QP solvers were used in the past, but for many years now dedicated code has been used instead (it is much faster and more robust). These solvers are no longer general-purpose QP solvers; they are built for this one use case.
sklearn's SVC is a wrapper for libsvm (proof).
As the link says:
Since version 2.8, it implements an SMO-type algorithm proposed in this paper:
R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research 6, 1889-1918, 2005.
(link to paper)
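As a quick sanity check (a minimal sketch; the dataset and parameters below are arbitrary), fitting SVC runs libsvm's SMO-type solver under the hood, and the resulting dual solution is exposed through the support vectors:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy binary problem; the parameters here are illustrative, not prescriptive.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# SVC delegates the dual QP to libsvm's SMO-type solver (Fan et al., 2005).
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# The solution is expressed through the support vectors libsvm selected.
print(clf.n_support_)        # number of support vectors per class
print(clf.dual_coef_.shape)  # dual coefficients alpha_i * y_i of those vectors
```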

Related

Lasso with Coordinate Descent in Scikit-Learn

I've tried to implement lasso regression with coordinate descent. Later on, the objective function will also include the first derivative of the function; all derivatives are computed by an automatic differentiation tool. As a first step, I've implemented the lasso with simple cyclic coordinate descent, without including the derivative.
On a small example with 4 features and ~100 samples, the algorithm converges to the right solution. But on my real dataset, my solution and the solution of scikit-learn's lasso regression are different. Furthermore, scikit-learn's algorithm converges a lot faster. I used the default settings in the scikit-learn setup.
My question is: what is the difference between the default scikit-learn algorithm for lasso regression and simple coordinate descent? Is there a paper which describes the implemented algorithm?
BR
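For reference, the simple cyclic coordinate descent described above can be sketched for scikit-learn's lasso objective, (1/(2n))·||y − Xw||² + α·||w||₁ (this sketch assumes no intercept; the function and variable names are mine, not scikit-learn's):

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(rho, alpha):
    """Soft-thresholding operator: the closed-form solution of the 1-D lasso."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def cyclic_cd_lasso(X, y, alpha, n_sweeps=500):
    """Minimize (1/(2n))||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual with feature j's current contribution removed.
            r_j = y - X @ w + w[j] * X[:, j]
            rho = X[:, j] @ r_j / n
            z = X[:, j] @ X[:, j] / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(100)

w_cd = cyclic_cd_lasso(X, y, alpha=0.1)
w_sk = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_
print(np.max(np.abs(w_cd - w_sk)))  # typically very small on a problem this easy
```

On a well-conditioned toy problem like this, both reach essentially the same minimizer; scikit-learn's speed advantage on real data comes from its optimized Cython implementation and convergence heuristics, not from a different objective.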

automatic classification model selecion

I want to know whether there is any method by which the computer can decide which classification model to use (decision trees, logistic regression, KNN, etc.) just by looking at the training data.
Even just the math would be extremely helpful.
I am going to write this in Python 3, so if there is any built-in method in scikit-learn or TensorFlow for this purpose, it would be of great help.
The auto-sklearn toolkit, built on top of scikit-learn, solves this:
https://automl.github.io/auto-sklearn/stable/index.html
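auto-sklearn automates the whole search; if you only want the basic idea, a minimal hand-rolled version (my own sketch, not auto-sklearn's method) just cross-validates a few candidate models on the training data and keeps the best:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate models; in practice you would also search over hyperparameters.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Score each candidate by 5-fold cross-validation and keep the best mean score.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

auto-sklearn goes well beyond this by also searching preprocessing steps and hyperparameters, and by warm-starting the search with meta-learning.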

Scikit-learn’s implementation of Dirichlet Process Gaussian Mixture Model: Gibbs sampling or Variational inference?

Reading the scikit-learn docs, I had understood that the implementation behind the DPGMM class uses variational inference rather than the equally traditional Gibbs sampling.
Nevertheless, while going through Edwin Chen's popular post ("Infinite Mixture Models with Nonparametric Bayes and the Dirichlet Process"), he says he uses scikit-learn to run Gibbs sampling inference of a DPGMM.
So, is there a Gibbs sampling implementation of the DP-GMM in scikit-learn, did Chen get it wrong, or was there a Gibbs version that was replaced by the variational one?
As far as I know, there never was a Gibbs sampling implementation. (And I have been with the project for a couple of years.)
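In current scikit-learn versions the old DPGMM class has been removed and replaced by BayesianGaussianMixture, which is likewise fitted by variational inference; with a Dirichlet-process prior on the weights it plays the same role (a minimal sketch; the data and parameters are arbitrary):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in 2-D.
X = np.vstack([rng.normal(-5, 1, size=(150, 2)),
               rng.normal(5, 1, size=(150, 2))])

# Variational inference with a (truncated) Dirichlet-process prior on the
# weights; components beyond the two real clusters get negligible weight.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
)
dpgmm.fit(X)
print(np.sort(dpgmm.weights_)[::-1][:3])  # two dominant weights, rest near zero
```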

text classification using svm

I read this article: "A hybrid classification method of k nearest neighbor, Bayesian methods and genetic algorithm".
It proposes using a genetic algorithm in order to improve text classification.
I want to replace the genetic algorithm with an SVM, but I don't know if that works or not,
i.e., I do not know whether the new idea and its results would be better than this article's.
I read somewhere that GA is better than SVM, but I don't know if that's right or not.
SVM and genetic algorithms are in fact completely different methods. SVM is basically a classification tool, while genetic algorithms are meta-optimization heuristics. Unfortunately I do not have access to the cited paper, but I can hardly imagine how putting SVM in the place of GA could work.
I read somewhere that GA is better than SVM, but I don't know if that's right or not.
No, it is not true. These methods are not comparable as they are completely different tools.

auc_score in scikit-learn 0.14

I'm training a RandomForestClassifier on a binary classification problem in scikit-learn, and I want to maximize the model's AUC score. I understand this is not possible in the 0.13 stable version but is possible in the 0.14 bleeding-edge version.
I tried this but seemed to get a worse result:
ic = RandomForestClassifier(n_estimators=100, compute_importances=True, criterion='entropy', score_func=auc_score)
Does this work as a parameter for the model, or only in GridSearchCV?
If I use it in GridSearchCV, will it make the model fit the data better with respect to auc_score? I also want to try maximizing recall_score.
I am surprised the above does not raise an error. You can use AUC only for model selection, as in GridSearchCV.
If you use it there (scoring='roc_auc', iirc), the model with the best AUC will be selected. It does not make the individual models better with respect to this score.
It is still worth trying, though.
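A minimal sketch of that model-selection usage in a current scikit-learn (the parameter grid below is arbitrary; swap scoring="recall" to optimize recall instead):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# AUC enters only as the *selection* criterion: GridSearchCV cross-validates
# every candidate and keeps the one with the best ROC AUC.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "criterion": ["gini", "entropy"]},
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```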
I have found a journal article that addresses highly imbalanced classes with random forests. Although it is aimed at running RDF on Hadoop clusters, the same techniques seem to work well on smaller problems as well:
del Río, S., López, V., Benítez, J. M., & Herrera, F. (2014). On the use of MapReduce for imbalanced big data using Random Forest. Information Sciences, 285, 112-137.
http://sci2s.ugr.es/rf_big_imb/pdf/rio14_INS.pdf
