Is it possible to convert an ONNX ConvInteger operator to OpenVINO while keeping tensor precision? - onnx

I am using the OpenVINO model optimizer framework to convert an ONNX model containing a single ConvInteger operation to OpenVINO IR format.
mo --input_model {onnx_model}
The ONNX ConvInteger operator has input and weight tensors with INT8/UINT8 precision and an output tensor with INT32 precision; INT32 is the only supported output precision.
When the model is converted to OpenVINO, the input and weight tensors are automatically converted to INT32 precision, and Convert operators are added to the model to make this change in precision.
Is it possible to force INT8/UINT8 precision for the OpenVINO model? Alternatively, is there a simple way to convert the precisions to INT8/UINT8 once the OpenVINO model has been created?
Thanks

You can convert FP32 or FP16 precision into INT8 without model retraining or fine-tuning by using the OpenVINO Post-training Optimization Tool (POT). This tool supports the uniform integer quantization method.
There are two main quantization methods:
Default Quantization: the recommended method, which provides fast and accurate results in most cases. It requires only an unannotated dataset for quantization (a rough sketch of this flow follows the list).
Accuracy-aware Quantization: an advanced method that keeps accuracy within a predefined range, at the cost of some of the performance improvement, in cases where Default Quantization cannot guarantee it. It requires an annotated representative dataset and may take more time to quantize.
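For reference, here is a rough sketch of the Default Quantization flow with the POT Python API (around the 2022.x releases). The RandomDataLoader class, input shape, and model paths are placeholders, and the exact tuple ordering expected by DataLoader.__getitem__ can differ between POT versions, so check the documentation for the release you have installed.
import numpy as np
from openvino.tools.pot import DataLoader, IEEngine, load_model, save_model, create_pipeline

class RandomDataLoader(DataLoader):
    # Unannotated calibration data: Default Quantization only needs input samples.
    def __init__(self, num_samples, input_shape):
        self.num_samples = num_samples
        self.input_shape = input_shape

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        # (data, annotation) pair; the annotation can stay None for Default Quantization.
        return np.random.rand(*self.input_shape).astype(np.float32), None

model_config = {"model_name": "model", "model": "model_ir/model.xml", "weights": "model_ir/model.bin"}
engine_config = {"device": "CPU"}
algorithms = [{"name": "DefaultQuantization",
               "params": {"target_device": "CPU", "preset": "performance", "stat_subset_size": 300}}]

model = load_model(model_config)
engine = IEEngine(config=engine_config, data_loader=RandomDataLoader(300, (1, 3, 224, 224)))
pipeline = create_pipeline(algorithms, engine)
quantized_model = pipeline.run(model)
save_model(quantized_model, save_path="model_int8")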

Related

Is there any way to speed up the predicting process for tensorflow lattice?

I built my own model with Keras premade models in TensorFlow Lattice using Python 3.7 and saved the trained model. However, when I use the trained model for prediction, each data point takes on the order of milliseconds to predict, which seems very slow. Is there any way to speed up the prediction process for TFL?
There are multiple ways to improve speed, but they may involve a tradeoff with prediction accuracy. I think the three most promising options are:
Reduce the number of features
Reduce the number of lattices per feature
Use an ensemble of lattice models where every lattice model only gets a subset of the features, and then average the predictions of the different models (as described here; a rough configuration sketch follows this list)
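For the third option, here is a hedged configuration sketch using the TensorFlow Lattice premade ensemble model; the feature names, keypoints, and lattice counts are placeholders you would adapt to your data, so treat this as a starting point rather than a drop-in recipe.
import numpy as np
import tensorflow_lattice as tfl

# Each feature gets a small calibrator; the keypoints here assume inputs scaled to [0, 1].
feature_configs = [
    tfl.configs.FeatureConfig(
        name=f"feature_{i}",
        lattice_size=2,
        pwl_calibration_input_keypoints=np.linspace(0.0, 1.0, num=10),
    )
    for i in range(10)
]

# Random ensemble: every lattice only sees lattice_rank randomly chosen features,
# which keeps each individual lattice small and fast to evaluate.
ensemble_config = tfl.configs.CalibratedLatticeEnsembleConfig(
    feature_configs=feature_configs,
    lattices="random",
    num_lattices=5,
    lattice_rank=3,
    output_min=0.0,
    output_max=1.0,
)
model = tfl.premade.CalibratedLatticeEnsemble(ensemble_config)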
As the lattice model is a standard Keras model, I recommend trying OpenVINO. It optimizes your model by converting to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization in runtime. OpenVINO is optimized for Intel hardware, but it should work with any CPU.
It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Save your model as SavedModel
OpenVINO is not able to convert the HDF5 model, so you have to save it as SavedModel first.
import tensorflow as tf
from custom_layer import CustomLayer

# Load the HDF5 model, registering any custom layers it uses
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
# Export as SavedModel so the Model Optimizer can read it
tf.saved_model.save(model, 'model')
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
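For example, assuming the same paths, the FP16 variant of that command would look like this (the output directory name is just a suggestion):
mo --saved_model_dir "model" --data_type FP16 --output_dir "model_ir_fp16"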
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. If you care about latency, I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement. If you care about throughput, change the value to THROUGHPUT or CUMULATIVE_THROUGHPUT.
# Load the network
from openvino.runtime import Core
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"})
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image (input_image is assumed to be your preprocessed input array)
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.

Picking up the anomalies using autoencoder

Other than mean square error, are there other quantities that we can use to detect anomalies using autoencoder in keras?
Generally, the idea is to measure the reconstruction error and classify anomalies as those data points that deviate significantly from the input. Thus, one can use other norms such as MAE; however, the results will probably be very similar.
I would suggest different flavors of the autoencoder. First of all, if you are not already using it, the variational autoencoder (VAE) is better than a standard autoencoder in all respects.
Second, the performance of a variational autoencoder can be significantly improved by using the reconstruction probability. The idea is to output the parameters of probability distributions not only for the latent space but also for the feature space. This means that, when used with continuous data, the decoder outputs a mean and a variance that parameterize a normal distribution. The reconstruction probability is then basically the negative log-likelihood of the normal distribution N(x; decoder_mu, decoder_var). Following the 2-sigma rule, the variance can be interpreted as a confidence interval, and thus even small errors can lead to a high anomaly score.
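As a rough sketch (not tied to any particular VAE implementation), the per-sample reconstruction probability score can be computed from the decoder's outputs like this; x, decoder_mu, and decoder_var are assumed to be arrays of shape (batch, features):
import numpy as np

def reconstruction_nll(x, decoder_mu, decoder_var, eps=1e-8):
    # Negative log-likelihood of x under N(decoder_mu, decoder_var), summed over features.
    var = decoder_var + eps
    nll = 0.5 * (np.log(2.0 * np.pi * var) + (x - decoder_mu) ** 2 / var)
    return nll.sum(axis=-1)

# scores = reconstruction_nll(x, decoder_mu, decoder_var)
# anomalies = scores > threshold   # threshold chosen on held-out normal data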
Other than that, there are other flavors like VAE-GAN, which combines a VAE and a GAN and uses a combined anomaly score based on the reconstruction error and the discriminator prediction. Depending on your problem type, you can also go down the route of a VAE-SL, which adds an additional classifier in the bottleneck. The model is then trained on mixed data, which can be fully or sparsely labelled, and the classifier can be used for anomaly detection.

how to avoid upscaling / copy of data in scikit-learn Logistic Regression's fit() for dense input data

I have a large input data set of type float16. My data is dense, not sparse.
I use sklearn's LogisticRegression class to fit a model to that data.
I am running out of memory during model fitting.
I believe fit() will upscale to either float32 or float64, depending on the solver used and/or the version of sklearn.
Is there a way I can avoid this extra data copy and/or upscaling?
Would upscaling the input data myself to float64 avoid the data copy?
My preference is to use the liblinear solver, but will consider lbfgs or saga.
I'd prefer not using gradient descent.
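For what it's worth, the upcasting described above can be reproduced with scikit-learn's own input validation helper. This is only an illustration and assumes the estimator accepts the usual float64/float32 dtypes, so float16 input gets copied and upcast:
import numpy as np
from sklearn.utils.validation import check_array

X16 = np.zeros((4, 3), dtype=np.float16)
# float16 is not in the accepted dtype list, so a copy is made and cast to the first entry.
X_checked = check_array(X16, dtype=[np.float64, np.float32])
print(X_checked.dtype)  # float64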

How to convert a tf.estimator to a keras model?

The tf.estimator package defines a lot of estimators. I want to use them in Keras.
I checked the TF docs; there is only one conversion method, which converts a keras.Model to a tf.estimator, but no way to convert from an estimator to a Model.
For example, if we want to convert the following estimator:
tf.estimator.DNNLinearCombinedRegressor
How could it be converted into Keras Model?
You cannot, because estimators can run arbitrary code in their model_fn functions, while Keras models must be much more structured: whether sequential or functional, they must basically consist of layers.
A Keras model is a very specific type of object that can therefore be easily wrapped and plugged into other abstractions.
Estimators are based on arbitrary Python code with arbitrary control flow and so it's quite tricky to force any structure onto them.
Estimators support 3 modes - train, eval and predict. Each of these could in theory have completely independent flows, with different weights, architectures etc. This is almost unthinkable in Keras and would essentially amount to 3 separate models.
Keras, in contrast, supports 2 modes - train and test (which is necessary for things like Dropout and Regularisation).
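To make the contrast concrete, here is a hedged sketch (using deprecated tf.compat.v1 layers purely for illustration): an Estimator's model_fn can branch arbitrarily per mode, while the Keras model at the end is just a fixed stack of layers.
import tensorflow as tf

def model_fn(features, labels, mode):
    # Arbitrary Python control flow per mode; nothing forces the three branches
    # to share an architecture, which is what a single Keras model would require.
    net = tf.compat.v1.layers.dense(features["x"], 1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=net)
    loss = tf.reduce_mean(tf.square(net - labels))
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss)
    train_op = tf.compat.v1.train.AdamOptimizer().minimize(
        loss, global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

# A Keras model, in contrast, is a structured stack of layers shared by all modes:
keras_model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])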

How to make polynomial features using sparse matrix in Scikit-learn

I am using Scikit-learn to convert my training data to polynomial features and then fit a linear model.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                  ('linear', LinearRegression(fit_intercept=False))])
model.fit(X, y)
But it throws an error:
TypeError: A sparse matrix was passed, but dense data is required
I know my data is in sparse matrix format. When I try to convert it to a dense matrix, I get a memory error, because my data is huge (~50k rows), so I can't convert it to dense.
I also found a GitHub issue where this feature is requested, but it is still not implemented.
So can someone please tell me how to use the sparse data format with PolynomialFeatures in Scikit-learn without converting it to dense format?
This is a new feature in the upcoming 0.20 version of sklearn. See Release History - V0.20 - Enhancements. If you really want to test it out, you can install the development version by following the instructions in Sklearn - Advanced Installation - Install Bleeding Edge.
Since version 0.21.0, the PolynomialFeatures class accepts CSR matrices for degrees 2 and 3. The method laid out here is used, and the computation is much, much faster than if the input is a CSC matrix or dense (assuming the data is sparse to any reasonable degree - even slightly).
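A minimal sketch of that sparse path (assuming scikit-learn >= 0.21; the shapes and density are made up for illustration):
import numpy as np
from scipy import sparse
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# CSR input keeps the polynomial expansion sparse, so no dense 50k-row matrix is built.
X = sparse.random(50_000, 20, density=0.01, format='csr', random_state=0)
y = np.random.rand(50_000)

model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                  ('linear', LinearRegression(fit_intercept=False))])
model.fit(X, y)
print(model.named_steps['poly'].transform(X[:5]).format)  # 'csr'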
While we are waiting for the latest update of sklearn, you can find an implementation of sparse interactions here:
https://github.com/drivendataorg/box-plots-sklearn/blob/master/src/features/SparseInteractions.py
