Is the scikit learn implementation version of LibSVM nearly as computationally efficient as the original LibSVM?
Does anyone have any stats on the comparison?
Thanks!
There is no point in comparison. Sklearn's SVC is just a wrapper around libsvm. So up to data manipulation "around" it (pre- and post processing), it is exactly the same.
Related
I'm translating a random forest using h20 and r into a random forest using SciKit Learn's Random Forest Classifier with python. H2o's randomForest model has an argument 'stopping_rounds'. Is there a way to do this in python using the SKLearn Random Forest Classifier model? I've looked through the documentation, so I'm afraid I might have to hard code this.
No, I don't believe scikit-learn algorithms have any sort of automatic early stopping mechanism (that's what stopping_rounds relates to in H2O algorithms). You will have to figure out the optimal number of trees manually.
Per the sklearn random forest classifier docs, early stopping is determined by the min_impurity_split (deprecated) and min_impurity_decrease arguments. It doesn't seem to have the same functionality as H2O, but it might be what you're looking for.
I am working on a binary logistic regression data set in python. I want to know if there are any numerical methods to calculate how well the model fits the data.
please don't include graphical methods like plotting etc.
Thanks :)
read through 3.3.2. Classification metrics in sklearn documentation.
http://scikit-learn.org/stable/modules/model_evaluation.html
hope it helps.
I've been using a bunch of scikit-learn Transformer classes to pipeline and combine features for point-wise ranking modeling and I'd like to convert these features into LibSVM format to experiment with XGBoost and other methods. Is there any easy way to dump scikit-learn features into LibSVM format? Thanks.
I believe you're looking for sklearn.datasets.dump_svmlight_file.
I have trained a SVM (svc) using scikit-learn over half a terabyte of data. The model is working fine and I need to port it to C, but I don't want to re-train the SVM from scratch because it takes way too long for me. Is there a way to easily export the model generated by scikit-learn and import it into LibSVM? Internally scikit-learn uses LibSVM so theoretically it should be possible, but I haven't been able to find anything in the documentation. Any suggestion?
Is there a way to easily export the model generated by scikit-learn and import it into LibSVM?
No. The scikit-learn version of LIBSVM has been hacked up severely to fit it into the Python environment and the model is stored as NumPy/SciPy data structures.
Your best shot is to study the SVM decision function and reimplement it in C. The support vectors can be obtained from the SVC object as NumPy arrays, which are easily translated to C arrays.
I need a library for naïve Bayes large scale, with millions of training examples and +100k binary features. It must be an online version (updatable after training). I also need top-k output, that is multiple classifications for a single instance. Accuracy is not very important.
The purpose is an automatic text categorization application.
Any suggestions for a good library is very appreciated.
EDIT: The library should preferably be in Java.
If a learning algorithm other than naïve Bayes is also acceptable, then check out Vowpal Wabbit (C++), which has the reputation of being one of the best scalable text classification algorithms (online stochastic gradient descent + LDA). I'm not sure if it does top-K output.