ONNX to mlmodel conversion fails to generate .mlmodel file - pytorch

I'm trying to convert a PyTorch model to an MLModel via ONNX.
My code:
import torch
from onnx_coreml import convert
import coremltools
from model import BiSeNet  # assumed: BiSeNet is defined in model.py, per the traceback paths below

net = BiSeNet(19)
net.cuda()
net.load_state_dict(torch.load('model.pth'))
#net.eval()  # note: export is normally done in eval mode; this line is commented out
dummy = torch.rand(1, 3, 512, 512).cuda()
torch.onnx.export(net, dummy, "Model.onnx", input_names=["image"], output_names=["output"], opset_version=11)
finalModel = convert(model='Model.onnx', minimum_ios_deployment_target='12')
finalModel.save('ModelML.mlmodel')
After the code runs, Model.onnx is generated, but the .mlmodel file is not. There are no errors in the console. This is the output:
2020-04-15 21:49:32.367179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
WARNING:root:TensorFlow version 2.2.0-rc2 detected. Last version known to be fully compatible is 1.14.0 .
WARNING:root:Keras version 2.3.1 detected. Last version known to be fully compatible of Keras is 2.2.4 .
1.4.0
/content/drive/My Drive/Collab/fp/model.py:116: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size_array = [int(s) for s in feat32.size()[2:]]
/content/drive/My Drive/Collab/fp/model.py:80: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size_array = [int(s) for s in feat.size()[2:]]
/content/drive/My Drive/Collab/fp/model.py:211: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size_array = [int(s) for s in feat.size()[2:]]
What could be the issue?

Related

Attribute Error when trying to use PyTorch

I am getting this error when trying to use PyTorch:
import torch
z = torch.zeros(5, 3)
print(z)
print(z.dtype)  # note: the attribute is z.dtype; z.datatype does not exist
AttributeError: partially initialized module 'torch' has no attribute 'zeros' (most likely due to a circular import)
I am on Python 3.9 because PyTorch does not support more recent versions.
I tried reinstalling with pip3 and it says the package is already installed.
Show a minimal, reproducible example
In an empty Python program, show what is needed for any third party to replicate your problem.
This means:
import statement
just enough statements to trigger the error
And please state the version of PyTorch as well as the version of Python you are using.
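For instance, a minimal example along those lines might look like the sketch below (the torch.__file__ check is a guess at the "circular import" hint in the error; a local file named torch.py shadowing the installed package is a common cause):
import sys
import torch

print(sys.version)        # Python version
print(torch.__version__)  # PyTorch version
# If this prints a path inside your own project (e.g. a local torch.py),
# rename that file: it shadows the real package and causes the circular import.
print(torch.__file__)

z = torch.zeros(5, 3)     # just enough to trigger the reported AttributeError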

Example of LSTMCell usage in MXNet

I'm trying to make use of LSTMCell, following this tutorial:
https://mxnet.apache.org/versions/1.2.1/api/python/gluon/rnn.html
import mxnet as mx  # missing from the original snippet

model = mx.gluon.rnn.SequentialRNNCell()
model.add(mx.gluon.rnn.LSTMCell(20))
model.add(mx.gluon.rnn.LSTMCell(20))
states = model.begin_state(batch_size=32)
input = mx.nd.random.uniform(shape=(32, 10))
model.initialize()
model(input, states)
I'm getting the following error: ValueError: Deferred initialization failed because shape cannot be inferred. Operator FullyConnected registered in backend is known as FullyConnected in Python. This is a legacy operator which can only accept legacy ndarrays, while received an MXNet numpy ndarray. Please call as_nd_ndarray() upon the numpy ndarray to convert it to a legacy ndarray, and then feed the converted array to this operator.
What's wrong with it? I'm using MXNet 2.0.0 and can't find any up-to-date tutorial.
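The error message itself suggests a workaround: convert the numpy-style ndarray to a legacy ndarray before feeding it to the cell. An untested sketch that simply applies that suggestion (whether it resolves the deferred-initialization failure on MXNet 2.0.0 is an assumption):
import mxnet as mx

model = mx.gluon.rnn.SequentialRNNCell()
model.add(mx.gluon.rnn.LSTMCell(20))
model.add(mx.gluon.rnn.LSTMCell(20))
states = model.begin_state(batch_size=32)
model.initialize()

# Per the error's advice: build the input as an mx.np array, then convert it
# to a legacy ndarray with as_nd_ndarray() before calling the cell.
input = mx.np.random.uniform(size=(32, 10)).as_nd_ndarray()
model(input, states)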

Python XGBoost prediction discrepancies with DMatrix

I found there are 2 problems with xgboost predictions. I trained the model with XGBClassifier and then tried to load it with Booster for prediction. I found:
Predictions are slightly different between xgb.Booster and xgb.XGBClassifier, see below.
Predictions are different between a list and a numpy array when using DMatrix, see below.
Some of the differences are quite big. I am not sure why this is happening, and which prediction should be the source of truth?
For the second question, your data types could change when you convert a list to a numpy array (depending on the numpy version you're using). For example, on numpy 1.19.5, try converting the list ["1", 1] to a numpy array and see the result.
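A quick illustration of that coercion (the exact string dtype width may vary with the numpy version):
import numpy as np

# A mixed-type list is coerced to a single common dtype; here everything
# becomes a string, so the numeric 1 silently turns into "1".
arr = np.array(["1", 1])
print(arr)        # ['1' '1']
print(arr.dtype)  # a string dtype such as <U21, not a numeric type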

ONNX runtime is throwing TypeError when loading an onnx model

I have converted a SavedModel to an ONNX model, but when loading it via onnxruntime:
import onnxruntime as rt
sess = rt.InferenceSession('model.onnx')
it throws the error below:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from /mnt/model/io_files/convert/1606801475/model.onnx failed:This is an invalid model. Type Error: Type 'tensor(float)' of input parameter (const_fold_opt__342) of operator (Slice) in node (StatefulPartitionedCall/mobilenet_1.00_224/reshape_1/strided_slice) is invalid.
The SavedModel I used is the Keras pretrained MobileNet from the TensorFlow website: https://www.tensorflow.org/guide/saved_model.
I saw in Netron that the parameter is float, but I am unable to understand or address this issue.
Below is the snip from Netron (screenshot not reproduced here):
Looks like a bug in the Keras->ONNX converter.
The starts input of Slice must be int32 or int64: https://github.com/onnx/onnx/blob/master/docs/Operators.md#Slice
You can try to patch the model using the onnx Python interface: load the model, find the node, and change the input's type, as in the sketch below. But if the model has this issue, the Keras->ONNX converter is probably not very well-tested and there are likely other issues.
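A hedged sketch of that patch, assuming the offending tensor is the graph initializer named in the error (const_fold_opt__342) and that casting it to int64 is valid for this graph:
import onnx
import numpy as np
from onnx import numpy_helper

model = onnx.load("model.onnx")
for init in model.graph.initializer:
    if init.name == "const_fold_opt__342":  # name taken from the error message
        arr = numpy_helper.to_array(init)
        # Rebuild the initializer with an integer type, as Slice requires.
        init.CopyFrom(numpy_helper.from_array(arr.astype(np.int64), init.name))
onnx.save(model, "model_patched.onnx")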
Can you find an equivalent PyTorch model? The PyTorch->ONNX converter should be much better.

Pytorch: Convert 2D-CNN model to tflite

I'd like to convert a model (e.g. MobileNet V2) from PyTorch to tflite in order to run it on a mobile device.
Has anyone managed to do so?
All I found was a method that uses ONNX to convert the model into an intermediate state. However, this does not seem to work properly, as TensorFlow expects NHWC channel order whereas ONNX and PyTorch work with NCHW channel order.
There is a discussion on GitHub. In my case, however, the conversion worked without complaints up to a "frozen tensorflow graph model"; after trying to convert the model further to tflite, it complains about the channel order being wrong...
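For reference, the layout difference the two frameworks disagree on amounts to a transpose of the channel axis:
import numpy as np

# PyTorch/ONNX tensors are NCHW; TensorFlow expects NHWC.
x_nchw = np.zeros((1, 3, 224, 224))    # batch, channels, height, width
x_nhwc = x_nchw.transpose(0, 2, 3, 1)  # batch, height, width, channels
print(x_nhwc.shape)                    # (1, 224, 224, 3)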
Here is my code so far:
import torch
import torch.onnx
import onnx
from onnx_tf.backend import prepare
# Create random input
input_data = torch.randn(1,3,224,224)
# Create network
model = torch.hub.load('pytorch/vision:v0.6.0', 'mobilenet_v2', pretrained=True)
model.eval()
# Forward Pass
output = model(input_data)
# Export model to onnx
filename_onnx = "mobilenet_v2.onnx"
filename_tf = "mobilenet_v2.pb"
torch.onnx.export(model, input_data, filename_onnx)
# Export model to tensorflow
onnx_model = onnx.load(filename_onnx)
tf_rep = prepare(onnx_model)
tf_rep.export_graph(filename_tf)
Everything works without errors up to this point (ignoring many tf warnings). Then I look up the names of the input and output tensors using Netron ("input.1" and "473").
Finally I apply my usual tf-graph to tf-lite conversion script from bash:
tflite_convert \
--output_file=mobilenet_v2.tflite \
--graph_def_file=mobilenet_v2.pb \
--input_arrays=input.1 \
--output_arrays=473
My configuration:
torch 1.6.0.dev20200508 (needs pytorch-nightly to work with mobilenet V2 from torch.hub)
tensorflow-gpu 1.14.0
onnx 1.6.0
onnx-tf 1.5.0
Here is the exact error message I'm getting from tflite:
Unexpected value for attribute 'data_format'. Expected 'NHWC'
Fatal Python error: Aborted
UPDATE:
Updating my configuration:
torch 1.6.0.dev20200508
tensorflow-gpu 2.2.0
onnx 1.7.0
onnx-tf 1.5.0
using
tflite_convert \
--output_file=mobilenet_v2.tflite \
--graph_def_file=mobilenet_v2.pb \
--input_arrays=input.1 \
--output_arrays=473 \
--enable_v1_converter # <-- needed for conversion of frozen graphs
leading to another error:
Exception: <unknown>:0: error: loc("convolution"): 'tf.Conv2D' op is neither a custom op nor a flex op
Update:
Here is an ONNX model of MobileNet V2 loaded via Netron (screenshot not reproduced here).
Here is a gdrive link to my converted onnx and pb file
@Ahwar posted a nice solution to this using a Google Colab notebook.
It uses
torch 1.5.0+cu101
torchsummary 1.5.1
torchtext 0.3.1
torchvision 0.6.0+cu101
tensorflow 1.15.2
tensorflow-addons 0.8.3
tensorflow-estimator 1.15.1
onnx 1.7.0
onnx-tf 1.5.0
The conversion works and the model can be tested on my computer. However, when pushing the model to the mobile phone it only works in CPU mode and is much slower (almost 10-fold) than a corresponding model created directly in TensorFlow. GPU mode does not work on my mobile phone (in contrast to the corresponding model created directly in TensorFlow).
Update:
Apparently, after converting the MobileNet V2 model, the TensorFlow frozen graph contains many more convolution operations than the original PyTorch model (~38,000 vs ~180), as discussed in this GitHub issue.
You can try this project to convert the PyTorch model to tflite: https://github.com/alibaba/TinyNeuralNetwork
It supports all models in torchvision and can eliminate redundant operators, basically without performance loss.
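A sketch of what that conversion might look like, based on the project's README (the TFLiteConverter import path and arguments come from that project and may change between versions):
import torch
from tinynn.converter import TFLiteConverter  # from the TinyNeuralNetwork project

model = torch.hub.load('pytorch/vision:v0.6.0', 'mobilenet_v2', pretrained=True)
model.eval()
dummy_input = torch.rand(1, 3, 224, 224)

# Traces the model and writes a tflite file directly, without a frozen graph.
converter = TFLiteConverter(model, dummy_input, tflite_path='mobilenet_v2.tflite')
converter.convert()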
