I want to run inference in C++ using a yolo3 model I trained with pytorch. I am unable to make the conversions using tracing and scripting provided by pytorch. I have this error during conversion
First diverging operator:
Node diff:
- %2 : __torch__.torch.nn.modules.container.ModuleList = prim::GetAttr[name="module_list"](%self.1)
+ %2 : __torch__.torch.nn.modules.container.___torch_mangle_139.ModuleList = prim::GetAttr[name="module_list"](%self.1)
? ++++++++++++++++++++
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
Node:
%358 : Tensor = prim::Constant[value=<Tensor>](), scope: __module.module_list.16.yolo_16
Related
Update #1 (original question and details below):
As per the suggestion of #MatthijsHollemans below I've tried to run this by removing dynamic_axes from the initial create_onnx step below. This removed both:
Description of image feature 'input_image' has missing or non-positive width 0.
and
Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers.
Unfortunately this opens up two sub-questions:
I still want to have a functional ONNX model. Is there a more appropriate way to make H and W dynamic? Or should I be saving two versions of the ONNX model, one without dynamic_axes for the CoreML conversion, and one with for use as a valid ONNX model?
Although this solves the compilation error in xcode (specified below) it introduces the following runtime issues:
Finalizing CVPixelBuffer 0x282f4c5a0 while lock count is 1.
[espresso] [Espresso::handle_ex_plan] exception=Invalid X-dimension 1/480 status=-7
[coreml] Error binding image input buffer input_image: -7
[coreml] Failure in bindInputsAndOutputs.
I am calling this the same way I was calling the fixed size model, which does still work fine. The image dimensions are 640 x 480.
As specified below the model should accept any image between 64x64 and higher.
For flexible shape models, do I need to provide an input differently in xcode?
Original Question (parts still relevant)
I have been slowly working on converting a style transfer model from pytorch > onnx > coreml. One of the issues that has been a struggle is flexible/dynamic input + output shape.
This method (besides i/o renaming) has worked well on iOS 12 & 13 when using a static input shape.
I am using the following code to do the onnx > coreml conversion:
def create_coreml(name):
mlmodel = convert(
model="onnx/" + name + ".onnx",
preprocessing_args={'is_bgr': True},
deprocessing_args={'is_bgr': True},
image_input_names=['input_image'],
image_output_names=['stylized_image'],
minimum_ios_deployment_target='13'
)
spec = mlmodel.get_spec()
img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange()
img_size_ranges.add_height_range((64, -1))
img_size_ranges.add_width_range((64, -1))
flexible_shape_utils.update_image_size_range(
spec,
feature_name='input_image',
size_range=img_size_ranges)
flexible_shape_utils.update_image_size_range(
spec,
feature_name='stylized_image',
size_range=img_size_ranges)
mlmodel = coremltools.models.MLModel(spec)
mlmodel.save("mlmodel/" + name + ".mlmodel")
Although the conversion 'succeeds' there are a couple of warnings (spaces added for readability):
Translation to CoreML spec completed. Now compiling the CoreML model.
/usr/local/lib/python3.7/site-packages/coremltools/models/model.py:111:
RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was:
Error compiling model:
"Error reading protobuf spec. validator error: Description of image feature 'input_image' has missing or non-positive width 0.".
RuntimeWarning)
Model Compilation done.
/usr/local/lib/python3.7/site-packages/coremltools/models/model.py:111:
RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was:
Error compiling model:
"compiler error: Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers.
".
RuntimeWarning)
If I ignore these warnings and try to compile the model for latest targets (13.0) I get the following error in xcode:
coremlc: Error: compiler error: Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers.
Here is what the problematic area appears to look like in netron:
My main question is how can I get these two warnings out of the way?
Happy to provide any other details.
Thanks for any advice!
Below is my pytorch > onnx conversion:
def create_onnx(name):
prior = torch.load("pth/" + name + ".pth")
model = transformer.TransformerNetwork()
model.load_state_dict(prior)
dummy_input = torch.zeros(1, 3, 64, 64) # I wasn't sure what I would set the H W to here?
torch.onnx.export(model, dummy_input, "onnx/" + name + ".onnx",
verbose=True,
opset_version=10,
input_names=["input_image"], # These are being renamed from garbled originals.
output_names=["stylized_image"], # ^
dynamic_axes={'input_image':
{2: 'height', 3: 'width'},
'stylized_image':
{2: 'height', 3: 'width'}}
)
onnx.save_model(original_model, "onnx/" + name + ".onnx")
I'm training a LSTM and I'm defining parameters and regression layer. I get the error in the title with this code:
lstm_cells = [
tf.contrib.rnn.LSTMCell(num_units=num_nodes[li],
state_is_tuple=True,
initializer= tf.contrib.layers.xavier_initializer()
)
for li in range(n_layers)]
drop_lstm_cells = [tf.contrib.rnn.DropoutWrapper(
lstm, input_keep_prob=1.0,output_keep_prob=1.0-dropout, state_keep_prob=1.0-dropout
) for lstm in lstm_cells]
drop_multi_cell = tf.contrib.rnn.MultiRNNCell(drop_lstm_cells)
multi_cell = tf.contrib.rnn.MultiRNNCell(lstm_cells)
w = tf.get_variable('w',shape=[num_nodes[-1], 1], initializer=tf.contrib.layers.xavier_initializer())
b = tf.get_variable('b',initializer=tf.random_uniform([1],-0.1,0.1))
I'm using tensorflow2 and I have already read the https://www.tensorflow.org/guide/migrate guide and I think almost everything on the net.
But I'm not able to solve it.
How can I do it?
This error occurs because the contrib module has been removed from version 2 of tensorflow. There are two solutions to this problem:
You can delete the current package and install one of the Series 1 versions.
You can use this command, which is also compatible with the version two package: Use tf.compat.v1.nn.rnn_cell.LSTMCell instead of tf.contrib.rnn.LSTMCell and use tf.initializers.GlorotUniform () instead of tf.contrib.layers.xavier_initializer () in other command which include rnn you can use tf.compat.v1.nn.rnn_cell.
tf.contrib.rnn.LSTMCell -> tf.compat.v1.nn.rnn_cell.LSTMCell or tf.keras.layers.LSTMCell
tf.contrib.rnn.DropoutWrapper -> tf.compat.v1.nn.rnn_cell.DropoutWrapper or tf.keras.layers.DropOut
tf.contrib.rnn.MultiRNNCell -> tf.compat.v1.nn.rnn_cell.MultiRNNCell or tf.keras.layers.RNN
tf.contrib has moved out of TF starting TF 2.0 alpha.
Take a look at these tf 2.0 release notes https://github.com/tensorflow/tensorflow/releases/tag/v2.0.0-alpha0
You can upgrade your TF 1.x code to TF 2.x using the tf_upgrade_v2 script
https://www.tensorflow.org/alpha/guide/upgrade
I want to create an object-detection app based on a retrained ssd_mobilenet model I've retrained like the guy on youtube.
I chose the model ssd_mobilenet_v2_coco from the Tensorflow Model Zoo. After the retraining process I've got the model with the following structure:
- saved_model
- variables (empty folder)
- saved_model.pb
- checkpoint
- frozen_inverence_graph.pb
- model.ckpt.data-00000-of-00001
- model.ckpt.index
- model.ckpt.meta
- pipeline.config
In the same folder, I have the python script with the following code:
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model", input_shapes={"image_tensor":[1,300,300,3]})
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
After running this code, I got the following error:
...
2019-05-24 18:46:59.811289: I tensorflow/lite/toco/import_tensorflow.cc:1324] Converting unsupported operation: TensorArrayGatherV3
2019-05-24 18:46:59.811864: I tensorflow/lite/toco/import_tensorflow.cc:1373] Unable to determine output type for op: TensorArrayGatherV3
2019-05-24 18:46:59.908207: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] Before Removing unused ops: 1792 operators, 3033 arrays (0 quantized)
2019-05-24 18:47:00.089034: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] After Removing unused ops pass 1: 1771 operators, 2979 arrays (0 quantized)
2019-05-24 18:47:00.314681: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] Before general graph transformations: 1771 operators, 2979 arrays (0 quantized)
2019-05-24 18:47:00.453570: F tensorflow/lite/toco/graph_transformations/resolve_constant_slice.cc:59] Check failed: dim_size >= 1 (0 vs. 1)
Is there any solution for the "Check failed: dim_size >= 1 (0 vs. 1)"?
Conversion of MobileNet SSD is a little different due to some Custom ops that are needed in the graph.
Take a look at this Medium post for the end-to-end process of training and exporting the model as a TFLite graph. For conversion, you would need to use the export_tflite_ssd_graph script.
I'm implementing some RL in PyTorch and had to write my own mse_loss function (which I found on Stackoverflow ;) ).
The loss function is:
def mse_loss(input_, target_):
return torch.sum(
(input_ - target_) * (input_ - target_)) / input_.data.nelement()
Now, in my training loop, the first input is something like:
tensor([-1.7610e+10]), tensor([-6.5097e+10])
With this input I'll get the error:
Unable to get repr for <class 'torch.Tensor'>
Computing a = (input_ - target_) works fine, while b = a * a respectively b = torch.pow(a, 2) will fail with the error metioned above.
Does anyone know a fix for this?
Thanks a lot!
Update:
I just tried using torch.nn.functional.mse_loss which will result in the same error..
I had the same error,when I use the below code
criterion = torch.nn.CrossEntropyLoss().cuda()
output=output.cuda()
target=target.cuda()
loss=criterion(output, target)
but I finally found my wrong:output is like tensor([[0.5746,0.4254]]) and target is like tensor([2]),the number 2 is out of indice of output
when I not use GPU,this error message is:
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch-nightly_1547458468907/work/aten/src/THNN/generic/ClassNLLCriterion.c:93
Are you using a GPU ?
I had simillar problem (but I was using gather operations), and when I moved my tensors to CPU I could get a proper error message. I fixed the error, switched back to GPU and it was alright.
Maybe pytorch has trouble outputing the correct error when it comes from inside the GPU.
I am using Matlab 2013b and the econometrics toolbox to learn some ARIMA models.
When I want to specify AR lags in the ARIMA model as follows:
%Estimate simple ARMA model
model1 = arima('ARLags',[1 24],'MALags',0,'D',0);
EstMdl1 = estimate(model1,learningSet');
Then everything is fine when I estimate the model
If now I use
%Estimate simple ARMA model
model1 = arima('ARLags',[1 24],'MALags',1,'D',0);
EstMdl1 = estimate(model1,learningSet');
then the following error is issued:
Error using optimset (line 184)
Invalid value for OPTIONS parameter MaxNodes: must be a real non-negative integer.
Error in internal.econ.arma0 (line 195)
options = optimset('lsqlin');
Error in arima/estimate (line 864)
[AR0, MA0, constant, variance] = internal.econ.arma0(I(YData), LagOpAR0, LagOpMA0);
I am a bit puzzled about this and am looking for a workaround, if not an explanation of what is happening