Convert scikit-learn SVM model to LibSVM - scikit-learn

I have trained a SVM (svc) using scikit-learn over half a terabyte of data. The model is working fine and I need to port it to C, but I don't want to re-train the SVM from scratch because it takes way too long for me. Is there a way to easily export the model generated by scikit-learn and import it into LibSVM? Internally scikit-learn uses LibSVM so theoretically it should be possible, but I haven't been able to find anything in the documentation. Any suggestion?

Is there a way to easily export the model generated by scikit-learn and import it into LibSVM?
No. The scikit-learn version of LIBSVM has been hacked up severely to fit it into the Python environment and the model is stored as NumPy/SciPy data structures.
Your best shot is to study the SVM decision function and reimplement it in C. The support vectors can be obtained from the SVC object as NumPy arrays, which are easily translated to C arrays.

Related

Is there any way to speed up the predicting process for tensorflow lattice?

I build my own model with Keras Premade Models in tensorflow lattice using python3.7 and save the trained model. However, when I use the trained model for predicting, the speed of predicting each data point is at millisecond level, which seems very slow. Is there any way to speed up the predicting process for tfl?
There are multiple ways to improve speed, but they may involve a tradeoff with prediction accuracy. I think the three most promising options are:
Reduce the number of features
Reduce the number of lattices per feature
Use an ensemble of lattice models where every lattice model only gets a subsets of the features and then average the predictions of the different models (like described here)
As the lattice model is a standard Keras model, I recommend trying OpenVINO. It optimizes your model by converting to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization in runtime. OpenVINO is optimized for Intel hardware, but it should work with any CPU.
It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Save your model as SavedModel
OpenVINO is not able to convert the HDF5 model, so you have to save it as SavedModel first.
import tensorflow as tf
from custom_layer import CustomLayer
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes from OpenVINO Development Package. It converts the Tensorflow model to IR, a default format for OpenVINO. You can also try the precision of FP16, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. If you care about latency, I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement. If you care about throughput, change the value to THROUGHPUT or CUMULATIVE_THROUGHPUT.
# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"})
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.

automatic classification model selecion

i want to know is there any method by which the computer can decide which classification model to use ( Decision trees, logistic regression, KNN, etc. ) by just looking at the training data.
even just the math will be extremely helpful.
I am going to be writing this in python 3, so if there's any built method in scikit-learn or tensorflow for this purpose,it would be of great help.
This scikit learn tool kit solves it :
https://automl.github.io/auto-sklearn/stable/index.html

How to make polynomial features using sparse matrix in Scikit-learn

I am using Scikit-learn for converting my train data to polynomials features and then fit it to a linear model.
model = Pipeline([('poly', PolynomialFeatures(degree=3)),
('linear', LinearRegression(fit_intercept=False))])
model.fit(X, y)
But it throws an error
TypeError: A sparse matrix was passed, but dense data is required
I know my data is sparse matrix format. So when I try to convert my data to dense matrix it shows memory error. Because my data is huge(50k~). Because of these large amounts of data I can't convert it to a dense matrix.
I also find Github Issues where this feature is requested. But still not implemented.
So please can someone tell how to use sparse data format in PolynomialFeatures in Scikit-learn without converting it to dense format?
This is a new feature in the upcoming 0.20 version of sklearn. See Release History - V0.20 - Enhancements If you really wanted to test it out you could install the development version by following the instructions in Sklean - Advanced Installation - Install Bleeding Edge.
Since version 0.21.0, the PolynomialFeatures class accepts CSR matrices for degrees 2 and 3. The method laid out here is used, and the computation is much, much faster than if the input is a CSC matrix or dense (assuming the data's sparse to any reasonable degree - even slightly).
While we are waiting for the latest update of Sklearn - you can find an implementation of sparse interaction here:
https://github.com/drivendataorg/box-plots-sklearn/blob/master/src/features/SparseInteractions.py

how to dump model from GradientBoostingRegressor to txt?

I am using gradient boosting regressor to build a predictive model.
After all the tuning/CV, finally I get my prediction right. I am now thinking about dump the model to a file, so that my production c++ program can load it and use it.
It seem that sklearn provides model persistence through pickle, but I am wondering if there is a way to convert pickle model into txt, like what xgboost has. My production code is c++ so having pickle as media is really not handy
Is there a 'dumpModel' function in the library?
Anyone has any experience ?
Thanks

sklearn: Regression models on sparse data?

Does python's scikit-learn have any regression models that work well with sparse data?
I was poking around and found this "sparse linear regression" module, but it seems outdated. (It's so old that scikit-learn was called 'scikits-learn' at the time, I think.)
Most scikit-learn regression models (linear such as Ridge, Lasso, ElasticNet or non-linear, e.g. with RandomForestRegressor) support both dense and sparse input data recent versions of scikit-learn (0.16.0 is the latest stable version at the time of writing).
Edit: if you are unsure, check the docstring of the fit method of the class of interest.

Resources