Converting Pytorch model with NonZero operations to TensorRT - pytorch

I'm trying to convert a PyTorch model containing NonZero operations to TensorRT (TRT).
I've successfully exported it to ONNX, but neither TensorRT nor existing converters such as https://github.com/onnx/onnx-tensorrt support the operation natively.
Could you please advise how I can deal with this, perhaps by replacing the operation with Where or Equal + logical Not?
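To make the idea in the question concrete, here is a minimal, framework-free sketch of why a Where-style select can sometimes stand in for NonZero. It assumes the NonZero output is only used to gather elements that are then reduced (here, summed); the helper names are hypothetical and plain Python lists stand in for tensors. The point is that the NonZero result has a data-dependent length, while the Where result always has the input's length.

```python
def nonzero(mask):
    """Indices of the true entries; the output length depends on the data."""
    return [i for i, m in enumerate(mask) if m]


def where(mask, a, b):
    """Elementwise select; the output always has the same length as the inputs."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def masked_sum_with_nonzero(values, mask):
    # Gather the selected elements by index, then reduce them.
    return sum(values[i] for i in nonzero(mask))


def masked_sum_with_where(values, mask):
    # Replace unselected elements with 0 so they do not contribute to the sum.
    zeros = [0.0] * len(values)
    return sum(where(mask, values, zeros))


values = [1.5, -2.0, 4.0, 0.5]
mask = [v > 0 for v in values]

sum_a = masked_sum_with_nonzero(values, mask)
sum_b = masked_sum_with_where(values, mask)
```

This rewrite only applies when the downstream computation can ignore the masked-out positions; if the model really needs the index list itself, a fixed-shape mask is not a drop-in replacement.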

Related

ONNX model inference produces different results for the same input

I'm testing the ONNX model with one identical input for multiple inference calls, but it produces different results every time?
For details, please refer to the below Colab script.
https://colab.research.google.com/drive/1cBd0MkQ804FXjWtOME1EB1-UiTXe1elp#scrollTo=bRLuTOjO2YQU
This is expected, as ONNX Runtime does not guarantee deterministic computations (details).
The flag SessionOptions.use_deterministic_compute is intended for ONNX training; inference results are not guaranteed to be identical from run to run.
The conversion script provides a number of tests with configurable absolute and relative error.
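As a rough illustration of such a tolerance-based check, the sketch below compares two inference outputs with an absolute and a relative error bound. The function name and the default tolerance values are assumptions made for this example, not the ones used by the conversion script, and plain lists stand in for the output arrays.

```python
import math


def outputs_match(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """Return True if both outputs agree elementwise within the given tolerances."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


# Two hypothetical outputs from repeated inference calls on the same input.
run_1 = [0.1234567, 2.5, -3.0]
run_2 = [0.1234569, 2.5000001, -3.0]

same_within_tolerance = outputs_match(run_1, run_2)
clearly_different = outputs_match(run_1, [0.2, 2.5, -3.0])
```

Comparing with a tolerance rather than with exact equality lets small floating-point differences between runs pass while still catching real discrepancies.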

Is there any way to speed up the predicting process for tensorflow lattice?

I built my own model with the Keras Premade Models in TensorFlow Lattice using Python 3.7 and saved the trained model. However, when I use the trained model for prediction, each data point takes on the order of milliseconds, which seems very slow. Is there any way to speed up prediction with TFL?
There are multiple ways to improve speed, but they may involve a tradeoff with prediction accuracy. I think the three most promising options are:
Reduce the number of features
Reduce the number of lattices per feature
Use an ensemble of lattice models where every lattice model only gets a subset of the features, and then average the predictions of the different models (like described here)
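The third option can be sketched as follows. This is only an illustration of the averaging scheme, not TensorFlow Lattice code: make_submodel is a hypothetical stand-in that returns a simple weighted sum over its feature subset, where a real setup would use one small lattice model per subset.

```python
def make_submodel(feature_indices, weights):
    """Hypothetical stand-in for one small model that sees only a subset of features."""
    def predict(row):
        return sum(w * row[i] for i, w in zip(feature_indices, weights))
    return predict


def ensemble_predict(submodels, row):
    """Average the predictions of all submodels for a single input row."""
    predictions = [model(row) for model in submodels]
    return sum(predictions) / len(predictions)


# Three submodels, each restricted to two of the three features.
submodels = [
    make_submodel([0, 1], [1.0, 2.0]),
    make_submodel([1, 2], [0.5, 0.5]),
    make_submodel([0, 2], [2.0, 1.0]),
]

row = [1.0, 2.0, 4.0]
prediction = ensemble_predict(submodels, row)
```

Because each submodel sees only a few features, each one stays small; the cost is that interactions between features in different subsets are not modelled directly.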
As the lattice model is a standard Keras model, I recommend trying OpenVINO. It optimizes your model by converting to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization in runtime. OpenVINO is optimized for Intel hardware, but it should work with any CPU.
It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way to do it is with pip. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Save your model as SavedModel
OpenVINO is not able to convert the HDF5 model, so you have to save it as SavedModel first.
import tensorflow as tf
from custom_layer import CustomLayer
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. If you care about latency, I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement. If you care about throughput, change the value to THROUGHPUT or CUMULATIVE_THROUGHPUT.
from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input (input_image is assumed to be an already prepared array)
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.

Keras use multi-gpu without Model object (not for training)

I have a bunch of tensor operations (matmul, transpose, etc.) I would like to run on a large dataset.
Since they are still matrix operations, and since I am using Keras generators to load the data batches, it would make sense to use GPUs to compute them.
Now, I've searched a while and I can't seem to find which is the correct way to use Keras to do parallel GPU operations, using generators, outside of the standard Model object interface.
Does anyone know how to do it? Thanks!

TensorFlow hangs when initializing a large matrix with a variable. What's the best solution for handling large matrix multiplication in TensorFlow?

I am trying to model a neural network using tensorflow.
But the matrices are on the order of 800000x300000. When I initialize the variables using the global variables initializer in TensorFlow, the system freezes. How do I deal with this problem?
Would TensorFlow with GPU support be able to handle a matrix this large?
You can divide the data set into batches and then process them with your model, or you can use a TensorFlow queue.
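A minimal sketch of the batching idea, in plain Python rather than TensorFlow: the left-hand matrix is processed in row batches so that only one batch of rows is multiplied at a time. The function names and the tiny matrices are made up for this example; in practice each batch would be loaded from disk or produced by an input pipeline instead of being sliced from a list already held in memory.

```python
def row_batches(num_rows, batch_size):
    """Yield (start, stop) index pairs that cover all rows in order."""
    for start in range(0, num_rows, batch_size):
        yield start, min(start + batch_size, num_rows)


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def batched_matmul(a, b, batch_size):
    """Multiply a by b one row batch at a time and concatenate the results."""
    result = []
    for start, stop in row_batches(len(a), batch_size):
        result.extend(matmul(a[start:stop], b))
    return result


a = [[1, 2], [3, 4], [5, 6]]
b = [[2, 1], [0, 3]]

full = matmul(a, b)
batched = batched_matmul(a, b, batch_size=2)
```

Batching over rows leaves the result unchanged, since each output row depends only on the corresponding input row; it only bounds how much of the data is handled at once.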

Training Keras model with Dask Array is very slow

I want to use Dask to read a large dataset and feed with it a Keras model. The data consists of audio files and I am using a custom function to read them. I have tried to apply delayed to this function and I collect all of the files in a dask array, as:
x = da.stack([da.from_delayed(delayed(get_item_data)(fp, sr, mono, post_processing, data_shape), shape=data_shape, dtype=np.float32) for fp in df['path']])
(See the source)
To train the Keras model, I compute X and Y as above and I input them to the function fit.
However, the training is very slow. I have tried changing the chunk size, and it is still very slow.
Could you tell me if I am doing something wrong when creating the array? Or any good practices for it?
Thanks
As far as I know, Keras doesn't have any built-in support for Dask arrays, so I'm not sure what will happen when you provide a dask.array directly to Keras functions. My guess is that it will automatically convert the dask.array into a (possibly very large) NumPy array.
