Transferring pretrained pytorch model to onnx - pytorch

I am trying to convert a PyTorch model to ONNX so that I can use it later with TensorRT. I followed the tutorial, but my kernel dies every time.
This is the code that I implemented.
# Some standard imports
import io
import numpy as np
from torch import nn
import torch.onnx
from deepformer.nets.quicknat import quickNAT

param = {
    'num_channels': 64,
    'num_filters': 64,
    'kernel_h': 5,
    'kernel_w': 5,
    'kernel_c': 1,
    'stride_conv': 1,
    'pool': 2,
    'stride_pool': 2,
    'num_classes': 1,
    'padding': 'reflection'
}
net = quickNAT(param)
checkpoint_path = 'checkpoint_epoch36_loss0.78.t7'
# Map stored tensors to the CPU unless CUDA is available
map_location = lambda storage, loc: storage
if torch.cuda.is_available():
    map_location = None
# Input to the model
x = torch.rand(1, 64, 256, 1600, requires_grad=True)
# Export the model
torch_out = torch.onnx._export(net,  # model being run
                               x,  # model input (or a tuple for multiple inputs)
                               "quicknat.onnx",  # where to save the model (can be a file or file-like object)
                               export_params=True)  # store the trained parameter weights inside the model file

What is the output you get? It seems SuperResolution is supported with the export operators in pytorch as mentioned in the documentation
Are you sure the input to your model is:
x = torch.rand(1, 64, 256, 1600, requires_grad=True)
That could be the variable that you used for training. For deployment you run the network on one or more images, so the dummy input for the ONNX export is usually:
dummy_input = torch.randn(1, 3, 720, 1280, device='cuda')
Here 1 is the batch size, 3 is the number of image channels (RGB), and the rest is the image size, in this case 720x1280. Check that input: I guess you don't have a 64-channel image as input, right?
Also, it'd be helpful if you post the terminal output to see where it fails.
Good luck!
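To compare the two candidate dummy inputs before exporting, it can help to work out how large each one is. The sketch below is plain Python and only does the arithmetic; the helper name tensor_footprint is made up for illustration, and float32 (4 bytes per element) is assumed. It does not by itself explain the crash, since the activations inside the network take additional memory on top of the input.

```python
from math import prod


def tensor_footprint(shape, bytes_per_element=4):
    """Return (element count, size in MiB) for a dense tensor of this shape.

    bytes_per_element=4 assumes float32, the default dtype of torch.rand.
    """
    n_elements = prod(shape)
    return n_elements, n_elements * bytes_per_element / 2**20


question_input = (1, 64, 256, 1600)  # shape used in the question
answer_input = (1, 3, 720, 1280)     # shape suggested in the answer

for name, shape in [("question", question_input), ("answer", answer_input)]:
    n_elements, mib = tensor_footprint(shape)
    print(f"{name}: {n_elements} elements, {mib:.1f} MiB as float32")
```

The input from the question holds roughly ten times as many elements as the 3-channel image, so it is worth confirming that 64 channels is really what the network expects.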


I tried to divide resnet into two parts using pytorch children(), but it doesn't work

Here is a simple example. I tried to divide a network (ResNet50) into two parts, head and tail, using children(). Conceptually this should work, but it doesn't. Why not?
import torch
import torch.nn as nn
from torchvision.models import resnet50
resnet = resnet50()
head = nn.Sequential(*list(resnet.children())[:-2])
tail = nn.Sequential(*list(resnet.children())[-2:])
x = torch.zeros(1, 3, 160, 160)
resnet(x).shape # torch.Size([1, 1000])
head(x).shape # torch.Size([1, 2048, 5, 5])
tail(head(x)).shape # Error: RuntimeError: size mismatch, m1: [2048 x 1], m2: [2048 x 1000] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:136
For information, the tail is nothing but
(0): AdaptiveAvgPool2d(output_size=(1, 1))
(1): Linear(in_features=2048, out_features=1000, bias=True)
I know that I can make it work as shown below. But then, why is the reshaping function (view) not among the children?
pool = resnet._modules['avgpool']
fc = resnet._modules['fc']
fc(pool(head(x)).view(1, -1))
What you are looking to do is separate the feature extractor from the classifier.
What I should point out straight away is that ResNet is not a sequential model (as the name implies, a residual network has residual connections)!
Therefore compiling it down to an nn.Sequential will not necessarily be accurate. There's a difference between the model definition (the layers that appear in order with .children()) and the actual underlying implementation of that model's forward function.
The flattening you performed using view(1, -1) is not registered as a layer in any of the torchvision.models.resnet* models. Instead it is performed on this line in the forward definition:
x = torch.flatten(x, 1)
They could have registered it as a layer in the __init__ as self.flatten = nn.Flatten(), to be used in the forward implementation as x = self.flatten(x).
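Even so, do not assume in general that fc(pool(head(x)).view(1, -1)) equals resnet(x) (cf. first point): that holds only when forward calls the top-level children in exactly this order, with the residual additions contained inside the child blocks.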
Adding a nn.Flatten module into tail seems to solve your problem:
import torch
import torch.nn as nn
from torchvision.models import resnet50
resnet = resnet50()
head = nn.Sequential(*list(resnet.children())[:-2])
tail = nn.Sequential(*[list(resnet.children())[-2], nn.Flatten(start_dim=1), list(resnet.children())[-1]])
x = torch.zeros(1, 3, 160, 160)
resnet(x).shape # torch.Size([1, 1000])
head(x).shape # torch.Size([1, 2048, 5, 5])
tail(head(x)).shape # torch.Size([1, 1000])
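The shape bookkeeping behind the error can be followed without PyTorch. The sketch below is a pure-Python model of the shapes only; flatten_shape and linear_shape are illustrative helpers, not torch functions. It assumes that a fully connected layer acts on the last dimension of its input, which is why a (1, 2048, 1, 1) tensor has to be flattened first.

```python
from math import prod


def flatten_shape(shape, start_dim=1):
    """Shape after collapsing every dimension from start_dim onward into one."""
    return tuple(shape[:start_dim]) + (prod(shape[start_dim:]),)


def linear_shape(shape, in_features, out_features):
    """Shape after a fully connected layer, which acts on the last dimension."""
    if shape[-1] != in_features:
        raise ValueError(
            f"last dimension is {shape[-1]}, expected {in_features}"
        )
    return tuple(shape[:-1]) + (out_features,)


pooled = (1, 2048, 1, 1)       # shape after AdaptiveAvgPool2d((1, 1))
flat = flatten_shape(pooled)   # collapse C, H, W into one axis
print(flat)
print(linear_shape(flat, 2048, 1000))

try:
    linear_shape(pooled, 2048, 1000)  # what tail(head(x)) attempted
except ValueError as err:
    print("without flatten:", err)
```

Without the flatten step the last dimension is 1 rather than 2048, which mirrors the size mismatch reported in the question.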

PyTorch DataLoader adding extra dimension for TorchVision MNIST

I am fairly new to PyTorch and have been experimenting with the DataLoader class.
When I attempt to load the MNIST dataset, the DataLoader appears to add an additional dimension after the batch dimension. I am not sure what is causing this to occur.
import torch
from torchvision.datasets import MNIST
from torchvision import transforms

if __name__ == '__main__':
    mnist_train = MNIST(root='./data', train=True, download=True, transform=transforms.Compose([transforms.ToTensor()]))
    first_x = mnist_train.data[0]  # raw image tensor, no transform applied
    print(first_x.shape)  # expect to see [28, 28], actual [28, 28]
    train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=200)
    batch_x, batch_y = next(iter(train_loader))  # get first batch
    print(batch_x.shape)  # expect to see [200, 28, 28], actual [200, 1, 28, 28]
    # Where is the extra dimension of 1 from?
Can anyone shed some light on the issue?
That is the number of channels of the input image (1 for grayscale MNIST). So basically the layout is
batch_x.shape = [batch size, number of channels, image height, image width]
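A small shape-only sketch may make this concrete. It is plain Python and the three helpers are hypothetical names, not torchvision functions. It assumes that the ToTensor transform returns each image as (channels, height, width) and that the DataLoader stacks samples along a new leading batch axis.

```python
def to_tensor_shape(height, width, channels=1):
    """Shape of one image after a ToTensor-style conversion: (C, H, W)."""
    return (channels, height, width)


def batch_shape(sample_shape, batch_size):
    """Shape after stacking batch_size samples along a new leading axis."""
    return (batch_size,) + tuple(sample_shape)


def squeeze_channel(shape):
    """Drop the channel axis (index 1); only valid when it has size 1."""
    if shape[1] != 1:
        raise ValueError("channel axis has more than one entry")
    return shape[:1] + shape[2:]


sample = to_tensor_shape(28, 28)   # one transformed MNIST image
batch = batch_shape(sample, 200)   # what the DataLoader yields
print(sample, batch, squeeze_channel(batch))
```

In PyTorch itself, batch_x.squeeze(1) removes the size-1 channel axis if a [200, 28, 28] tensor is really what you need; convolutional layers expect the channel axis to be present.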

LibTorch, convert deeplabv3_resnet101 to c++

I am trying to use this example code from the PyTorch website to convert a Python model for use in the PyTorch C++ API (LibTorch).
Converting to Torch Script via Tracing
To convert a PyTorch model to Torch Script via tracing, you must pass an instance of your model along with an example input to the torch.jit.trace function. This will produce a torch.jit.ScriptModule object with the trace of your model evaluation embedded in the module’s forward method:
import torch
import torchvision
# An instance of your model.
model = torchvision.models.resnet18()
# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)
# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)
This example works fine, and saves out the file as expected.
When I switch to this model:
model = models.segmentation.deeplabv3_resnet101(pretrained=True)
It gives me the following error:
File "", line 14, in <module>
traced_script_module = torch.jit.trace(model, example)
File "C:\Python37\lib\site-packages\torch\jit\", line 636, in trace
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])
I assume this is because the example format is wrong, but how can I get the correct one?
Based on the comments below, my new code is:
import torch
import torchvision
from torchvision import models
model = models.segmentation.deeplabv3_resnet101(pretrained=True)
# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)
# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)
And I now get the error:
File "", line 15, in <module>
traced_script_module = torch.jit.trace(model, example)
File "C:\Python37\lib\site-packages\torch\jit\", line 636, in trace
var_lookup_fn, _force_outplace)
RuntimeError: Only tensors and (possibly nested) tuples of tensors are supported as inputs or outputs of traced functions (toIValue at C:\a\w\1\s\windows\pytorch\torch/csrc/jit/pybind_utils.h:91)
(no backtrace available)
(from pytorch forums)
trace only supports modules whose output is a tensor or a tuple of tensors.
According to the deeplabv3 implementation, its output is an OrderedDict. That is the problem.
To solve this, make a wrapper module:
class wrapper(torch.nn.Module):
    def __init__(self, model):
        super(wrapper, self).__init__()
        self.model = model

    def forward(self, input):
        # Collect the OrderedDict values so the traced output is a tuple of tensors
        results = []
        output = self.model(input)
        for k, v in output.items():
            results.append(v)
        return tuple(results)

model = wrapper(deeplab_model)
This has my model saving out.
Your problem originates in the BatchNorm layer. If it requires more than one value per channel, then your model is in training mode. Could you invoke model.eval() on the model and see if there's an improvement?
Otherwise you could also try to generate random data with more than one instance in a batch, i.e. example = torch.rand(5, 3, 224, 224).
Furthermore, you should take care to properly normalise your data, although this isn't what is causing the error here.
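The "more than 1 value per channel" condition can be reproduced with a few lines of plain Python. This is a sketch of the arithmetic only: values_per_channel and check_training_batch are illustrative names, and the check is an approximation of what batch norm does in training mode, where it needs several values per channel to estimate a variance.

```python
from math import prod


def values_per_channel(shape):
    """Number of values batch norm averages over for each channel.

    shape is (N, C, *spatial); every axis except the channel axis counts.
    """
    return shape[0] * prod(shape[2:])


def check_training_batch(shape):
    """Mimic the training-mode check that produced the error in the question."""
    if values_per_channel(shape) == 1:
        raise ValueError(
            "Expected more than 1 value per channel when training, "
            f"got input size {shape}"
        )


print(values_per_channel((1, 256, 1, 1)))  # the failing case from the traceback
print(values_per_channel((5, 256, 1, 1)))  # same layer with a batch of 5

try:
    check_training_batch((1, 256, 1, 1))
except ValueError as err:
    print(err)
```

With a batch of one and a 1x1 feature map there is exactly one value per channel, so a larger batch avoids the error. In evaluation mode batch norm uses its stored running statistics instead, which is why model.eval() also avoids it and is the usual choice before tracing a model for inference.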

Convert tensor to numpy without a session

I'm using the estimator library of TensorFlow in Python. I want to train a student network by using a pre-trained teacher. I'm facing the following issue.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={"x": train_data},
This code returns a generator object that is passed to a student classifier. Inside the generator, we have the inputs and labels (in batches of 100) as tensors. The problem is, I want to pass the same values to the teacher model and extract its softmax outputs. But unfortunately, the model input requires a numpy array as follows
student_classifier = tf.estimator.Estimator(
model_fn=student_model_fn, model_dir="./models/mnist_student")
def student_model_fn(features, labels, mode):
input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])
eval_teacher_fn = tf.estimator.inputs.numpy_input_fn(
This requires x and y to be numpy arrays, so I converted them with an ugly hack: running a session to convert each tensor to numpy. Is there a better way of doing this?
P.S. I tried tf.estimator.Estimator.get_variable_value() but it retrieves weights from the model, not the input and output
Convert the Tensor to a NumPy array using tf.make_ndarray.
tf.make_ndarray() creates a numpy ndarray with the same shape and data as the tensor.
Sample working code:
import tensorflow as tf
a = tf.constant([[1,2,3],[4,5,6]])
proto_tensor = tf.make_tensor_proto(a)
tf.make_ndarray(proto_tensor)
# array([[1, 2, 3],
#        [4, 5, 6]], dtype=int32)
# output has shape (2,3)

Base64 images with Keras and Google Cloud ML

I'm predicting image classes using Keras. It works in Google Cloud ML (GCML), but for efficiency I need to change it to pass base64 strings instead of a JSON array. Related Documentation
I can easily run python code to decode a base64 string into json array, but when using GCML I don't have the opportunity to run a preprocessing step (unless maybe use a Lambda layer in Keras, but I don't think that is the correct approach).
Another answer suggested adding tf.placeholder with type of tf.string, which makes sense, but how to incorporate that into the Keras model?
Here is complete code for training the model and saving the exported model for GCML...
import os
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.preprocessing import image
from tensorflow.python.platform import gfile
def preprocess(filename):
    # decode the image file starting from the filename
    # end up with pixel values that are in the -1, 1 range
    image_contents = tf.read_file(filename)
    image = tf.image.decode_png(image_contents, channels=1)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)  # 0-1
    image = tf.expand_dims(image, 0)  # resize_bilinear needs batches
    image = tf.image.resize_bilinear(image, [IMAGE_HEIGHT, IMAGE_WIDTH], align_corners=False)
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)  # -1 to 1
    image = tf.squeeze(image, [0])
    return image

filelist = gfile.ListDirectory("images")
sess = tf.Session()
with sess.as_default():
    x = np.array([np.array(preprocess(os.path.join("images", filename)).eval()) for filename in filelist])
input_shape = (IMAGE_HEIGHT, IMAGE_WIDTH, 1) # 1, because preprocessing made grayscale
# in our case the labels come from part of the filename
y = np.array([int(filename[filename.index('_')+1:-4]) for filename in filelist])
# convert class labels to numbers
y = keras.utils.to_categorical(y, NUM_CLASSES)
########## TODO: something here? ##########
image = K.placeholder(shape=(), dtype=tf.string)
decoded = tf.image.decode_jpeg(image, channels=3)
# scores = build_model(decoded)
model = Sequential()
# model.add(decoded)
model.add(Conv2D(32, kernel_size=(2, 2), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dense(64, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
predict_signature = tf.saved_model.signature_def_utils.build_signature_def(
########## TODO: something here? ##########
# inputs={'input': image }, # input name must have "_bytes" suffix to use base64.
outputs={'formId': tf.saved_model.utils.build_tensor_info(model.output)},
builder = tf.saved_model.builder.SavedModelBuilder("exported_model")
tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: predict_signature
},, name='legacy_init_op')
This is related to my previous question.
The heart of the question is how to incorporate the placeholder that calls decode into the Keras model. In other words, after creating the placeholder that decodes the base64 string to a tensor, how to incorporate that into what Keras runs? I assume it needs to be a layer.
image = K.placeholder(shape=(), dtype=tf.string)
decoded = tf.image.decode_jpeg(image, channels=3)
model = Sequential()
# Something like this, but this fails because it is a tensor, not a Keras layer. Possibly this is where a Lambda layer comes in?
model.add(Conv2D(32, kernel_size=(2, 2), activation='relu', input_shape=input_shape))
Update 2:
Trying to use a lambda layer to accomplish this...
import keras
from keras.models import Sequential
from keras.layers import Lambda
from keras import backend as K
import tensorflow as tf
image = K.placeholder(shape=(), dtype=tf.string)
model = Sequential()
model.add(Lambda(lambda image: tf.image.decode_jpeg(image, channels=3), input_shape=() ))
Gives the error: TypeError: Input 'contents' of 'DecodeJpeg' Op has type float32 that does not match expected type of string.
First of all, I use tf.keras, but this should not be a big problem.
So here is an example of how you can read a base64-encoded JPEG:
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda

def preprocess_and_decode(img_str, new_shape=[299,299]):
    img = tf.io.decode_base64(img_str)  # expects web-safe base64
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize_images(img, new_shape, method=tf.image.ResizeMethod.BILINEAR, align_corners=False)
    # if you need to squeeze your input range to [0,1] or [-1,1] do it here
    return img

InputLayer = Input(shape = (1,),dtype="string")
OutputLayer = Lambda(lambda img : tf.map_fn(lambda im : preprocess_and_decode(im[0]), img, dtype="float32"))(InputLayer)
base64_model = tf.keras.Model(InputLayer,OutputLayer)
The code above creates a model that takes a JPEG of any size, resizes it to 299x299, and returns it as a 299x299x3 tensor. This model can be exported directly to saved_model and used for Cloud ML Engine serving. It is a little bit silly on its own, since the only thing it does is convert base64 to a tensor.
If you need to redirect the output of this model to the input of an existing trained and compiled model (e.g inception_v3) you have to do the following:
base64_input = base64_model.input
final_output = inception_v3(base64_model.output)
new_model = tf.keras.Model(base64_input,final_output)
This new_model can be saved. It takes base64 jpeg and returns classes identified by the inception_v3 part.
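On the client side, the image bytes have to be turned into a base64 string before they are sent. The sketch below uses only the Python standard library. The function name, the stand-in bytes, and the "input" key in the request body are all assumptions made for illustration; the real key must match the input name in your exported signature. It uses the web-safe alphabet because that is what TensorFlow's tf.io.decode_base64 op expects.

```python
import base64
import json


def encode_image_for_request(image_bytes):
    """Encode raw image bytes as web-safe base64 text ('-' and '_' alphabet)."""
    return base64.urlsafe_b64encode(image_bytes).decode("ascii")


# Stand-in bytes for illustration; a real client would read a .jpg file
# with open(path, "rb").read() instead.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image \xfb\xff\xd9"
encoded = encode_image_for_request(fake_jpeg)

# Hypothetical request body: "input" is an assumed name, not taken from
# the original model.
payload = json.dumps({"instances": [{"input": [encoded]}]})

# Round trip to confirm nothing was lost in the encoding.
assert base64.urlsafe_b64decode(encoded) == fake_jpeg
print(payload)
```

The standard alphabet produced by base64.b64encode can contain '+' and '/', which a web-safe decoder does not accept, so using the wrong variant is a common source of decode errors.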
Another answer suggested adding tf.placeholder with type of tf.string, which makes sense, but how to incorporate that into the Keras model?
In Keras you can access your selected Backend (in this case Tensorflow) by doing:
from keras import backend as K
You already seem to import this in your code. It gives you access to some native methods and resources of the backend of your choice. The Keras backend includes a method for creating placeholders, among other utilities. Regarding placeholders, here is what the Keras docs say about them:
keras.backend.placeholder(shape=None, ndim=None, dtype=None, sparse=False, name=None)
Instantiates a placeholder tensor and returns it.
It also gives some example on its use:
>>> from keras import backend as K
>>> input_ph = K.placeholder(shape=(2, 4, 5))
>>> input_ph._keras_shape
(2, 4, 5)
>>> input_ph
<tf.Tensor 'Placeholder_4:0' shape=(2, 4, 5) dtype=float32>
As you can see, this is returning a Tensorflow tensor, with shape (2,4,5) and of dtype float. If you had another backend while doing the example you would get another tensor object (a Theano one surely). You can therefore use this placeholder() to adapt the solution you got on your previous question.
In conclusion, you can use your backend imported as K (or whatever you want) to call the methods and objects available on the backend of your choice. I suggest you read the Keras backend documentation to explore more things that can be useful to you in future situations.
Update: As per your edit. Yes, this placeholder should be a layer in your model. Specifically, it should be the Input Layer of your model, as it holds your decoded image (as Keras needs it that way) to classify.
