Is it possible to create an ONNX convolution operator which has int8/uint8 precision for the input, weight and output tensors?
The ONNX Conv operator only supports floating-point precisions, and ConvInteger requires an int32 output tensor. Is there a simple way of converting one of these operators so that all tensors have int8/uint8 precision?
Related question:
I am using the OpenVINO model optimizer framework to convert an ONNX model containing a single ConvInteger operation to OpenVINO IR format.
mo --input_model {onnx_model}
The ONNX ConvInteger operator takes input and weight tensors with INT8/UINT8 precision and produces an output tensor with INT32 precision; INT32 is the only supported output precision.
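For reference, a minimal sketch of how such a model can be built with the onnx Python helper API (the tensor names and shapes below are made up for illustration):

import onnx
from onnx import helper, TensorProto

# UINT8 activation, INT8 weights; ConvInteger only allows an INT32 output
x = helper.make_tensor_value_info("x", TensorProto.UINT8, [1, 1, 8, 8])
w = helper.make_tensor_value_info("w", TensorProto.INT8, [4, 1, 3, 3])
y = helper.make_tensor_value_info("y", TensorProto.INT32, [1, 4, 6, 6])

conv = helper.make_node("ConvInteger", inputs=["x", "w"], outputs=["y"], kernel_shape=[3, 3])
graph = helper.make_graph([conv], "conv_integer_graph", [x, w], [y])
model = helper.make_model(graph)
onnx.checker.check_model(model)
onnx.save(model, "conv_integer.onnx")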
When the model is converted to OpenVINO IR, the input and weight tensors are automatically promoted to INT32 precision, and Convert operators are inserted into the model to make this change in precision.
Is it possible to force int8/uint8 precision for the OpenVINO model? Alternatively, is there a simple way to convert the precisions to int8/uint8 once the OpenVINO model has been created?
Thanks
You can convert FP32 or FP16 precision into INT8 without model retraining or fine-tuning by using the OpenVINO Post-training Optimization Tool (POT). This tool supports the uniform integer quantization method.
There are two main quantization methods:
Default Quantization: the recommended method; it provides fast and accurate results in most cases and requires only an unannotated dataset for quantization (see the sketch after this list).
Accuracy-aware Quantization: an advanced method that keeps accuracy within a predefined range, at the cost of some of the performance improvement, in cases where Default Quantization cannot guarantee it. This method requires an annotated, representative dataset and may take more time for quantization.
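For example, here is a minimal sketch of the Default Quantization flow using the POT Python API. The model paths, the my_samples dataset, and the MyDataLoader class are placeholders you would replace with your own, and the exact item format expected by DataLoader may differ between POT releases:

from openvino.tools.pot import DataLoader, IEEngine, load_model, save_model, create_pipeline

class MyDataLoader(DataLoader):
    # Placeholder loader that wraps an unannotated list of preprocessed inputs.
    def __init__(self, samples):
        self._samples = samples
    def __len__(self):
        return len(self._samples)
    def __getitem__(self, index):
        return self._samples[index], None   # (data, annotation); no annotation needed here

model = load_model({"model_name": "model", "model": "model.xml", "weights": "model.bin"})
engine = IEEngine({"device": "CPU"}, data_loader=MyDataLoader(my_samples))
algorithms = [{"name": "DefaultQuantization",
               "params": {"target_device": "CPU", "preset": "performance", "stat_subset_size": 300}}]
pipeline = create_pipeline(algorithms, engine)
quantized_model = pipeline.run(model)
save_model(quantized_model, "./quantized")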
I have a tensor and want to convert it into a quantized binary form.
x = torch.Tensor([-9.0387e-01, 1.4811e-01, 2.8242e-01, 3.6679e-01, 3.2012e-01])
PyTorch only supports qint8 type.
You can convert the tensor to a quantized version with torch.quantize_per_tensor; you can check the documentation here.
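For example, a minimal sketch using the tensor from your question; the scale and zero_point values below are arbitrary and would normally be derived from the range of your data:

import torch

x = torch.tensor([-9.0387e-01, 1.4811e-01, 2.8242e-01, 3.6679e-01, 3.2012e-01])

# Quantize to 8-bit signed integers: q = round(x / scale) + zero_point
xq = torch.quantize_per_tensor(x, scale=0.01, zero_point=0, dtype=torch.qint8)
print(xq.int_repr())     # the underlying int8 values
print(xq.dequantize())   # back to float32, with quantization error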
I am trying to do text classification using PyTorch and torchtext on Paperspace.
I get
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
My PyTorch version is 1.10.1+cu102
I just had this problem yesterday. In my case, the RNN pack_padded_sequence function wants the lengths to be on the CPU, so just move the lengths to the CPU in your function call, like this:
packed_sequences = nn.utils.rnn.pack_padded_sequence(padded_tensor, valid_frames.to('cpu'), batch_first=True, enforce_sorted=True)
This might not be the exact function you're using, but I think the same applies to most of the rnn utils functions.
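Here is a self-contained sketch of the same fix; the tensors are made-up data just to show where the .cpu() call goes:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

padded = torch.randn(3, 5, 8).cuda()        # (batch, max_len, features) on the GPU
lengths = torch.tensor([5, 3, 2]).cuda()    # lengths ended up on the GPU too

# pack_padded_sequence wants lengths as a 1D CPU int64 tensor,
# so move only the lengths back to the CPU:
packed = pack_padded_sequence(padded, lengths.cpu(), batch_first=True, enforce_sorted=True)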
The input shape of my Keras model is (batch_size, time_steps, spatial_dim, features), which is fed through a TimeDistributed 1D convolution with N filters. The output of the TimeDistributed convolution is (batch_size, time_steps, spatial_dim, N). I wanted to feed this output to an RNN; however, I did not want to lose the spatial_dim by flattening the last two dimensions. So I reshaped the convolution output to (batch_size, spatial_dim, time_steps, N) as the input to a TimeDistributed RNN layer. Is it valid to flip the time axis and apply a TimeDistributed RNN in order to preserve the spatial dimension? Is there any other alternative for preserving the spatial_dim while applying an RNN?
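For reference, a minimal sketch of the arrangement I am describing (the layer sizes and shapes below are made up):

import tensorflow as tf
from tensorflow.keras import layers

time_steps, spatial_dim, features, N = 10, 16, 4, 32

inputs = tf.keras.Input(shape=(time_steps, spatial_dim, features))
# 1D convolution applied independently to each time step
x = layers.TimeDistributed(layers.Conv1D(N, kernel_size=3, padding="same"))(inputs)
# (batch, time_steps, spatial_dim, N) -> (batch, spatial_dim, time_steps, N)
x = layers.Permute((2, 1, 3))(x)
# RNN applied independently to each spatial position, over the time axis
x = layers.TimeDistributed(layers.LSTM(64))(x)   # -> (batch, spatial_dim, 64)
model = tf.keras.Model(inputs, x)
model.summary()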
As an exercise, I need to use only dense layers to perform text classification. I want to leverage word embeddings; the issue is that the dataset is then 3D (samples, words of sentence, embedding dimension). Can I feed a 3D dataset into a Dense layer?
Thanks
As stated in the Keras documentation, you can use 3D (or higher-rank) data as input to a Dense layer, but the input gets flattened first:
Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.
This means that if your input has shape (batch_size, sequence_length, dim), the Dense layer will first flatten your data to shape (batch_size * sequence_length, dim), apply the dense transformation as usual, and produce an output of shape (batch_size, sequence_length, hidden_units). This is effectively the same as applying a Conv1D layer with kernel size 1, so it might be more explicit to use a Conv1D layer instead of a Dense layer.
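A small sketch of that equivalence (the shapes below are arbitrary):

import numpy as np
from tensorflow.keras import layers

batch_size, sequence_length, dim, hidden_units = 2, 7, 16, 32
x = np.random.rand(batch_size, sequence_length, dim).astype("float32")

dense = layers.Dense(hidden_units)
conv = layers.Conv1D(hidden_units, kernel_size=1)

print(dense(x).shape)   # (2, 7, 32): Dense is applied to the last axis
print(conv(x).shape)    # (2, 7, 32): same shape as a kernel-size-1 Conv1D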