I have been training a model in PyTorch using multiple convolutional layers (3x3, stride 1, padding same). The model performs well and I want to use it in MATLAB for inference. For that, the ONNX format for exchanging neural networks between frameworks seems to be the (only?) solution. The model can be exported using the following command:
torch.onnx.export(net.to('cpu'), test_input,'onnxfile.onnx')
Here is my CNN architecture definition:
class Encoder_decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(2, 8, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(8, 8, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(8, 16, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(16, 16, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(16, 32, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(32, 32, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(32, 64, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(64, 64, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(64, 128, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(128, 128, (3, 3), stride=1, padding='same'),
            nn.ReLU(),
            nn.Conv2d(128, 1, (1, 1))
        )

    def forward(self, x):
        x = self.model(x)
        return x
However, when I run the torch.onnx.export command I get the following error:
RuntimeError: Exporting the operator _convolution_mode to ONNX opset version 9 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.
I have tried changing the opset, but that doesn't solve the problem. ONNX has full support for convolutional neural networks. Also, I am training the network in Google Colab.
Do you know other methods to transfer the model to MATLAB?
Currently, the _convolution_mode operator isn't supported by the PyTorch ONNX exporter. It is emitted because of padding='same'.
You need to change the padding to an integer value or to its explicit equivalent. Consult "Same padding equivalent in Pytorch".
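As a minimal sketch of that replacement, assuming a PyTorch version that accepts padding='same' (1.9 or later) and a hypothetical 64x64 input: for a 3x3 kernel at stride 1, the explicit equivalent is padding=1, and the two layers should produce the same output when they share weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Layer as defined in the question: 3x3 kernel, stride 1, padding='same'.
conv_same = nn.Conv2d(2, 8, (3, 3), stride=1, padding='same')

# Explicit equivalent for a 3x3 kernel at stride 1: (3 - 1) // 2 = 1.
conv_explicit = nn.Conv2d(2, 8, (3, 3), stride=1, padding=1)
conv_explicit.load_state_dict(conv_same.state_dict())  # reuse the same weights

x = torch.randn(1, 2, 64, 64)  # hypothetical input size
with torch.no_grad():
    y_same = conv_same(x)
    y_explicit = conv_explicit(x)

print(y_same.shape, y_explicit.shape)
print(torch.allclose(y_same, y_explicit, atol=1e-6))
```

Because the parameter shapes are identical, a model trained with padding='same' can load its weights into the padding=1 version before export.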
I made a workaround:
...
from collections.abc import Sequence

def calc_same_padding(kernel_size, stride, input_size):
    # For tuples/lists, use the first element (assumes square kernels, strides and inputs).
    if isinstance(kernel_size, Sequence):
        kernel_size = kernel_size[0]
    if isinstance(stride, Sequence):
        stride = stride[0]
    if isinstance(input_size, Sequence):
        input_size = input_size[0]
    pad = ((stride - 1) * input_size - stride + kernel_size) / 2
    return int(pad)

def replace_conv2d_with_same_padding(m: nn.Module, input_size=512):
    # Swap the string padding for an explicit integer so the exporter can handle it.
    if isinstance(m, nn.Conv2d):
        if m.padding == "same":
            m.padding = calc_same_padding(
                kernel_size=m.kernel_size,
                stride=m.stride,
                input_size=input_size
            )
...
model = MyModel()
model.apply(lambda m: replace_conv2d_with_same_padding(m, 512))
example_input = torch.ones((1, 3, 512, 512))
torch.onnx.export(model,
                  example_input,
                  input_names=["input"],
                  output_names=["output"],
                  f=save_path,
                  opset_version=12)
All my input/output tensors have even dimensions (512x512, 256x256, 128x128, etc.), so the input size doesn't matter here.
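To check the claim about input size, here is a small pure-Python sketch of the padding formula from the workaround, paired with the standard convolution output-size formula (dilation 1). The helper names are hypothetical. At stride 1 the input-size term is multiplied by zero, so the padding reduces to (kernel_size - 1) / 2.

```python
def same_padding(kernel_size: int, stride: int, input_size: int) -> int:
    """Padding formula from the workaround, for square inputs."""
    return int(((stride - 1) * input_size - stride + kernel_size) / 2)

def conv_output_size(input_size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Standard convolution output size (dilation 1)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

for n in (128, 256, 512):
    p = same_padding(kernel_size=3, stride=1, input_size=n)
    # With stride 1, p is the same for every n and the output size equals n.
    print(n, p, conv_output_size(n, 3, 1, p))
```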
Related
I have been working with a CNN model recently, and the loss is always NaN. How do I solve this?
My model:
def CNN_Model(inputshape):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(96, kernel_size=(7, 7), strides=2, activation='relu', kernel_initializer='glorot_uniform', input_shape=inputshape),
        tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2)),
        tf.keras.layers.ZeroPadding2D((2, 2), data_format="channels_last"),
        #tf.keras.layers.Lambda(lambda x: tf.image.per_image_standardization(x)),
        tf.keras.layers.Conv2D(256, kernel_size=(5, 5), strides=1, activation='relu'),
        tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2)),
        #tf.keras.layers.Lambda(lambda x: tf.image.per_image_standardization(x)),
        tf.keras.layers.Conv2D(384, kernel_size=(3, 3), activation='relu', strides=1),
        tf.keras.layers.Conv2D(256, kernel_size=(3, 3), activation='relu', strides=1),
        tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, kernel_regularizer=l2(0.0005), activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1024),
        tf.keras.layers.Dense(40, activation='softmax')
    ])
    return model
My loss function
def contrastive_loss(y_true, y_pred):
    '''Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    '''
    margin = 1
    return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
I tried to change the layer config but nothing worked. Here
I need to train an autoencoder on the Adaptiope dataset. I am using a ResNet18 backbone for the encoder part.
The issue I encounter is that even after many epochs, the reconstructed image is always completely black.
On the other hand, when I use a simpler autoencoder without the ResNet18 backbone, the reconstructed images turn out close to what I need them to be.
I am trying to understand why this is the case. I am a novice in the field and still cannot grasp the problem. It looks like an architectural problem but I cannot wrap my head around it.
This is my "vanilla" Encoder, with no resnet18 backbone:
`
class Encoder(nn.Module):
    def __init__(self,
                 num_input_channels : int,
                 base_channel_size : int,
                 latent_dim : int
                 ):
        """
        Inputs:
            - num_input_channels : Number of input channels of the image. For CIFAR, this parameter is 3
            - base_channel_size : Number of channels we use in the first convolutional layers. Deeper layers might use a duplicate of it.
            - latent_dim : Dimensionality of latent representation z
            - act_fn : Activation function used throughout the encoder network
        """
        super().__init__()
        c_hid = base_channel_size
        self.layer1 = nn.Sequential(
            nn.Conv2d(num_input_channels, c_hid, kernel_size=3, padding=1, stride=2), # 32x32 => 16x16
            nn.ReLU(),
            nn.Conv2d(c_hid, c_hid, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(c_hid, 2*c_hid, kernel_size=3, padding=1, stride=2), # 16x16 => 8x8
            nn.ReLU(),
            nn.Conv2d(2*c_hid, 2*c_hid, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(2*c_hid, 2*c_hid, kernel_size=3, padding=1, stride=2), # 8x8 => 4x4
            nn.ReLU(),
            nn.Flatten(), # Image grid to single feature vector
            nn.Linear(351232, latent_dim))
        self.linear2 = nn.Linear(latent_dim, 20*8)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        enc = self.layer1(x)
        lin_p = self.linear2(enc)
        p = self.softmax(lin_p)
        return enc, p
This is the Encoder with Resnet18 backbone:
class Encoder(nn.Module):
    def __init__(self,
                 num_input_channels : int,
                 base_channel_size : int,
                 latent_dim : int
                 ):
        """
        Inputs:
            - num_input_channels : Number of input channels of the image. For CIFAR, this parameter is 3
            - base_channel_size : Number of channels we use in the first convolutional layers. Deeper layers might use a duplicate of it.
            - latent_dim : Dimensionality of latent representation z
            - act_fn : Activation function used throughout the encoder network
        """
        super().__init__()
        c_hid = base_channel_size
        self.fc_hidden1, self.fc_hidden2, self.CNN_embed_dim = 224, 768, 224
        # CNN architectures
        self.ch1, self.ch2, self.ch3, self.ch4 = 16, 32, 64, 128
        self.k1, self.k2, self.k3, self.k4 = (5, 5), (3, 3), (3, 3), (3, 3) # 2d kernel size
        self.s1, self.s2, self.s3, self.s4 = (2, 2), (2, 2), (2, 2), (2, 2) # 2d strides
        self.pd1, self.pd2, self.pd3, self.pd4 = (0, 0), (0, 0), (0, 0), (0, 0) # 2d padding
        # encoding components
        model = models.resnet18(pretrained=True)
        for param in model.parameters():
            param.requires_grad = False
        modules = list(model.children())[:-1] # delete the last fc layer.
        self.resnet_modules = modules
        self.resnet = nn.Sequential(*modules)
        self.fc1 = nn.Linear(model.fc.in_features, self.fc_hidden1)
        self.bn1 = nn.BatchNorm1d(self.fc_hidden1, momentum=0.01)
        self.relu = nn.ReLU(inplace=True)
        self.layer = nn.Sequential(
            nn.Flatten(), # Image grid to single feature vector
            nn.Linear(224, latent_dim)) #8x224
        #self.flatten = nn.Flatten(), # Image grid to single feature vector
        #self.linear1 = nn.Linear(351232, latent_dim)
        self.linear2 = nn.Linear(latent_dim, 20*8)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        x = self.resnet(x)
        x = x.reshape(x.shape[0], 512)
        x = self.fc1(x)
        x = self.bn1(x)
        x = self.relu(x)
        enc = self.layer(x)
        #x = self.fc2(x)
        #x = self.bn(x)
        # enc = self.layer1(x)
        lin_p = self.linear2(enc)
        p = self.softmax(lin_p)
        return enc, p
The decoder is the same for both.
class Decoder_N(nn.Module):
    def __init__(self,
                 num_input_channels : int,
                 base_channel_size : int,
                 latent_dim : int,
                 act_fn : object = nn.GELU):
        """
        Inputs:
            - num_input_channels : Number of channels of the image to reconstruct. For CIFAR, this parameter is 3
            - base_channel_size : Number of channels we use in the last convolutional layers. Early layers might use a duplicate of it.
            - latent_dim : Dimensionality of latent representation z
            - act_fn : Activation function used throughout the decoder network
        """
        super().__init__()
        c_hid = 224
        self.linear = nn.Sequential(
            nn.Linear(latent_dim, 351232),
            nn.ReLU()
        )
        self.net = nn.Sequential(
            nn.ConvTranspose2d(2*c_hid, 2*c_hid, kernel_size=3, output_padding=1, padding=1, stride=2), # 4x4 => 8x8
            nn.ReLU(),
            nn.Conv2d(2*c_hid, 2*c_hid, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(2*c_hid, c_hid, kernel_size=3, output_padding=1, padding=1, stride=2), # 8x8 => 16x16
            nn.ReLU(),
            nn.Conv2d(c_hid, c_hid, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(c_hid, 3, kernel_size=3, output_padding=1, padding=1, stride=2), # 16x16 => 32x32
            nn.Tanh() # The input images are scaled between -1 and 1, hence the output has to be bounded as well
        )

    def forward(self, x):
        x = self.linear(x)
        x = x.reshape(x.shape[0], -1, 28, 28)
        x = self.net(x)
        return x
`
num_input_channels : 224,
base_channel_size : 3
latent_dim : 64
I expected the "advanced" autoencoder to extract my features better, but apparently this is not the case.
I solved the issue: there were issues with the normalization of images and the BatchNorm layer. I accidentally used mean and std of ImageNet for the dataset instead of the correct ones. Additionally, during training I forgot to add regularizers for the different components of my loss, leading my model to learn literally nothing.
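The answer does not spell out how the wrong statistics hurt training. One plausible contributing factor, offered here only as an assumption, is that the decoder ends in Tanh and can only emit values in [-1, 1], while images normalized with the commonly used ImageNet mean and std fall well outside that range. A small pure-Python sketch (the helper name is hypothetical):

```python
# Widely used ImageNet channel statistics (RGB), for images already scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalized_range(mean, std):
    """Smallest and largest value a pixel in [0, 1] can take after (x - mean) / std."""
    lows = [(0.0 - m) / s for m, s in zip(mean, std)]
    highs = [(1.0 - m) / s for m, s in zip(mean, std)]
    return min(lows), max(highs)

low, high = normalized_range(IMAGENET_MEAN, IMAGENET_STD)
print(f"ImageNet-normalized range: [{low:.2f}, {high:.2f}]")

# Scaling to [-1, 1] instead (mean = std = 0.5 per channel) matches a Tanh output.
low_t, high_t = normalized_range((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
print(f"mean=0.5/std=0.5 range: [{low_t:.2f}, {high_t:.2f}]")
```

Whatever statistics are chosen, the reconstruction target and the decoder's output range need to agree.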
def __init__(self):
    super().__init__()
    self.conv = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=5, stride=2, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=2, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=2, bias=False),
        nn.BatchNorm2d(64),
    )
How can I deal with this error? I think the error is with self.fc, but I can't say how to fix it.
The output from self.conv(x) has shape torch.Size([32, 64, 2, 2]): 32*64*2*2 = 8192 (this is equivalent to self.conv_out_size). The fully connected layer expects a one-dimensional feature vector per sample, i.e. you need to flatten the conv output before passing it to the fully connected layer in the forward function.
i.e.
class Network():
    ...
    def forward(self, x):
        ...
        conv_out = self.conv(x)
        print(conv_out.shape)
        conv_out = conv_out.view(-1, 32*64*2*2)
        print(conv_out.shape)
        x = self.fc(conv_out)
        return x
output
torch.Size([32, 64, 2, 2])
torch.Size([1, 8192])
EDIT:
I think you're using self._get_conv_out function wrong.
It should be
def _get_conv_out(self, shape):
    output = self.conv(torch.zeros(1, *shape)) # not (32, *size)
    return int(numpy.prod(output.size()))
then, in the forward pass, you can use
conv_out = self.conv(x)
# flatten the output of conv layers
conv_out = conv_out.view(conv_out.size(0), -1)
x = self.fc(conv_out)
For an input of (32, 1, 110, 110), the output should be torch.Size([32, 2]).
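The difference between the two flattening calls can be seen on a stand-in tensor with the conv output shape discussed above. This is a minimal sketch; the 2-output linear head is a hypothetical placeholder for self.fc.

```python
import torch

# Stand-in for the conv output: batch of 32, 64 channels, 2x2 feature maps.
conv_out = torch.zeros(32, 64, 2, 2)

# Folding the batch size into the feature dimension collapses the batch to 1.
wrong = conv_out.view(-1, 32 * 64 * 2 * 2)

# Keeping dimension 0 as the batch gives one feature vector per sample.
right = conv_out.view(conv_out.size(0), -1)

print(wrong.shape)
print(right.shape)

fc = torch.nn.Linear(right.size(1), 2)  # hypothetical 2-output head
print(fc(right).shape)
```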
I had the same problem; however, I solved it by using a batch size of 32 and an image tensor size of [3, 32, 32], with the following model configuration. I am using ResNet9 and looking for 4 outputs.
transform = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])
def conv_block(in_channels, out_channels, pool=False):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool: layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNet9(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.conv3 = conv_block(128, 256, pool=True)
        self.conv4 = conv_block(256, 512, pool=True)
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
        self.classifier = nn.Sequential(nn.MaxPool2d(4),
                                        nn.Flatten(),
                                        nn.Dropout(0.2),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out
I want to remove the decoder portion of the autoencoder and put a fully connected (FC) layer in its place.
In addition, the encoder should keep its pre-learned weights and not be trained further.
self.encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1),
    nn.ReLU(True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 8, 3, padding=1),
    nn.ReLU(True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(8, 8, 3, padding=1),
    nn.ReLU(True),
    nn.MaxPool2d(kernel_size=4, stride=1),
)
self.decoder = nn.Sequential(
    nn.Conv2d(8, 8, 3, padding=1),
    nn.ReLU(True),
    nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2),
    nn.Conv2d(8, 8, 3, padding=1),
    nn.ReLU(True),
    nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2),
    nn.Conv2d(8, 16, 3),
    nn.ReLU(True),
    nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2),
    nn.Conv2d(16, 1, 3, padding=1)
)

def forward(self, x):
    if self.training:
        x = self.encoder(x)
        x = self.decoder(x)
        return x
    else:
        x = classifier(x)
        return x
Is this possible?
Please help.
One easy and clean solution would be to define a stand-alone network as your decoder, then replace the decoder attribute of your model with this new network after pre-training is over. Easy example below:
class sillyExample(torch.nn.Module):
    def __init__(self):
        super(sillyExample, self).__init__()
        self.encoder = torch.nn.Linear(5, 5)
        self.decoder = torch.nn.Linear(5, 10)

    def forward(self, x):
        return self.decoder(self.encoder(x))

test = sillyExample()
test(torch.rand(30, 5)).shape
Out: torch.Size([30, 10])
test.decoder = torch.nn.Linear(5, 20) # replace the decoder
test(torch.rand(30, 5)).shape
Out: torch.Size([30, 20])
Just make sure to re-initialize your optimizers with the updated model (or anything else that might be referencing the model's parameters).
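A minimal sketch of that last step, using a hypothetical stand-in model: the pre-trained encoder is frozen, the decoder is swapped for a fully connected head (4 outputs is an arbitrary choice), and a fresh optimizer is built over only the parameters that still require gradients.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):  # hypothetical stand-in model
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(5, 5)
        self.decoder = nn.Linear(5, 10)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
# ... pre-training of encoder + decoder would happen here ...

# Freeze the pre-trained encoder so its weights are no longer updated.
for p in model.encoder.parameters():
    p.requires_grad = False

# Swap the decoder for a fully connected head (4 outputs is an arbitrary choice).
model.decoder = nn.Linear(5, 4)

# Build a fresh optimizer over the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

loss = model(torch.rand(30, 5)).sum()
loss.backward()
optimizer.step()
```

An optimizer created before the swap would still hold references to the old decoder's parameters, which is why it is rebuilt here.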
This is a convolutional neural network which I found on the web:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(500, 50)
        self.fc2 = nn.Linear(50, 64)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 500)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x)
and its summary
print(net)
Net(
  (conv1): Conv2d(3, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d(p=0.5)
  (fc1): Linear(in_features=500, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=64, bias=True)
)
What does x.view do? Is it similar to the Flatten function in Keras? My other question is about how PyTorch prints the summary of a model. Even though the model uses two dropouts, nn.Dropout2d() and F.dropout, when printing the model we see only one: (conv2_drop): Dropout2d(p=0.5). Why? The last question is why PyTorch doesn't print the F.max_pool2d layer.
1) x.view can do more than just flatten: it keeps the same data while reshaping the dimensions. So x.view(batch_size, -1) is equivalent to Flatten.
2) In the __repr__ function of nn.Module, the elements that are printed are the modules in self._modules.items(), which are its children.
F.dropout and F.max_pool2d are functions and not children of nn.Module, thus they are not layers and will not be printed. For pooling and dropout, however, there are module versions in torch.nn, which you already used for the first dropout.
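Both points can be checked on a small hypothetical model (the class name and layer sizes are made up for illustration): the module versions of dropout and pooling are registered as children and show up in the printout, the functional ReLU does not, and view(batch, -1) matches nn.Flatten() on a contiguous tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tiny(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3)
        self.drop = nn.Dropout2d()       # module: registered as a child
        self.pool = nn.MaxPool2d(2)      # module: registered as a child

    def forward(self, x):
        x = self.pool(self.drop(self.conv(x)))
        x = F.relu(x)                    # function: not registered, not printed
        return x.view(x.size(0), -1)     # flatten all non-batch dimensions

net = Tiny().eval()
x = torch.rand(2, 3, 10, 10)
features = net.pool(net.conv(x))

# view(batch, -1) and nn.Flatten() give the same result on a contiguous tensor.
same = torch.equal(features.view(features.size(0), -1), nn.Flatten()(features))
child_names = [name for name, _ in net.named_children()]
print(same, child_names)
print(net)
```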