I ran the code below to show images generated from the latent space, but the images are not displayed in RGB colour. Do you know what causes this result? I would like to output each image like
The problem code and result are:
random_latent_vectors = tf.random.normal(shape=(10, 128))
generator = make_generator(128)
images = generator(random_latent_vectors)
images *= 255
images = images.numpy()
for i in range(images.shape[0]):
    plt.subplot(2, 5, i+1)
    plt.imshow(images[i, :, :, 0].astype("int32"))
The generator is built from:
def make_generator(latent_dim):
    model = keras.Sequential([
        layers.Dense(8 * 8 * 128),
        layers.Reshape((8, 8, 128)),
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding="same"),
        layers.Conv2DTranspose(256, kernel_size=4, strides=2, padding="same"),
        layers.Conv2DTranspose(512, kernel_size=4, strides=2, padding="same"),
        layers.Conv2D(3, kernel_size=5, padding="same", activation="sigmoid"),
    ])
    return model
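One likely cause, offered as a hedged note rather than a confirmed diagnosis: `images[i, :, :, 0]` keeps only the first of the three channels, so `plt.imshow` receives a 2-D array and renders it with its default colormap instead of as RGB. A minimal sketch, using a random NumPy array as a stand-in for the generator output (the names `fake_images`, `single_channel` and `rgb` are illustrative, and the 64x64 size assumes the three stride-2 layers above upsample 8 -> 16 -> 32 -> 64):

```python
import numpy as np

# Stand-in for the generator output: 10 images, 64x64, 3 channels, values in [0, 1].
rng = np.random.default_rng(0)
fake_images = rng.random((10, 64, 64, 3)).astype("float32")

# Indexing channel 0 drops the colour axis, leaving a 2-D scalar image
# that imshow would draw with a colormap.
single_channel = fake_images[0, :, :, 0]

# Keeping all three channels and converting to uint8 in [0, 255] gives an
# array that imshow can interpret as RGB.
rgb = (fake_images[0] * 255).clip(0, 255).astype("uint8")

print(single_channel.shape)  # (64, 64)
print(rgb.shape, rgb.dtype)  # (64, 64, 3) uint8
```

Applied to the question's loop, this would correspond to something like `plt.imshow(images[i].astype("uint8"))`.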
I am unable to find the error. The input is 32*32 grayscale images:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=1,    # gray-scale images
                out_channels=16,  # must match conv2's in_channels
                kernel_size=5,    # 5x5 convolutional kernel
                stride=1,         # no. of pixels passed at a time
                padding=2,        # to preserve size of input image
            ),
            nn.MaxPool2d(kernel_size=2),  # halves height and width
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.MaxPool2d(kernel_size=2),
        )
        # fully connected layers
        self.out = nn.Linear(32*7*7, 3)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        # flatten the output of conv2
        x = x.view(x.size(0), -1)
        output = self.out(x)
        return output
Your linear layer expects a flattened input of size 32*7*7. Given that your conv1 and conv2 blocks each perform max pooling with stride=2, your network is configured for an input size of 28x28 (the usual MNIST input size) and not 32x32 as you expect: 28 -> 14 -> 7, whereas 32 -> 16 -> 8.
Moreover, considering the values in your error message (64x2304), I assume you are working with batch_size=64, but your images are NOT 32x32. A 32x32 input would give 32*8*8 = 2048 features, while 2304 = 32*8*9, so one side of your images is slightly larger than 32, resulting in a feature map of 32x8x9 after the pooling.
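The size arithmetic above can be checked with a small sketch. It assumes each block is a 5x5 convolution with stride 1 and padding 2 followed by 2x2 max pooling, as the answer describes; `conv_out` and `feature_map` are illustrative helper names, and the 32x36 input is just one hypothetical size that reproduces 2304.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_map(height, width):
    """Trace (height, width) through conv(5, pad 2) + maxpool(2), twice."""
    for _ in range(2):
        height, width = conv_out(height, 5, 1, 2), conv_out(width, 5, 1, 2)
        height, width = conv_out(height, 2, 2), conv_out(width, 2, 2)
    return height, width


for h, w in [(28, 28), (32, 32), (32, 36)]:
    fh, fw = feature_map(h, w)
    # 32 is the number of output channels of conv2.
    print((h, w), "->", (fh, fw), "flattened:", 32 * fh * fw)
```

Only the 28x28 input yields the 32*7*7 = 1568 features that `nn.Linear(32*7*7, 3)` expects.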
I have been training a model in PyTorch using multiple convolutional layers (3x3, stride 1, padding same). The model performs well and I want to use it in MATLAB for inference. For that, the ONNX format for exchanging neural networks between frameworks seems to be the (only?) solution. The model can be exported using the following command:
torch.onnx.export(net.to('cpu'), test_input,'onnxfile.onnx')
Here is my CNN architecture definition:
class Encoder_decoder(nn.Module):
    def __init__(self):
        super().__init__()  # required before assigning submodules
        self.model = nn.Sequential(
            nn.Conv2d(2, 8, (3, 3), stride=1, padding='same'),
            nn.Conv2d(8, 8, (3, 3), stride=1, padding='same'),
            nn.Conv2d(8, 16, (3, 3), stride=1, padding='same'),
            nn.Conv2d(16, 16, (3, 3), stride=1, padding='same'),
            nn.Conv2d(16, 32, (3, 3), stride=1, padding='same'),
            nn.Conv2d(32, 32, (3, 3), stride=1, padding='same'),
            nn.Conv2d(32, 64, (3, 3), stride=1, padding='same'),
            nn.Conv2d(64, 64, (3, 3), stride=1, padding='same'),
            nn.Conv2d(64, 128, (3, 3), stride=1, padding='same'),
            nn.Conv2d(128, 128, (3, 3), stride=1, padding='same'),
            nn.Conv2d(128, 1, (1, 1)),
        )

    def forward(self, x):
        x = self.model(x)
        return x
However, when I run the torch.onnx.export command I get the following error:
RuntimeError: Exporting the operator _convolution_mode to ONNX opset version 9 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.
I have tried changing the opset, but that doesn't solve the problem. ONNX has full support for convolutional neural networks. Also, I am training the network in Google Colab.
Do you know of other methods to transfer the model to MATLAB?
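Currently, the _convolution_mode operator isn't supported by PyTorch's ONNX exporter. The operator appears in the exported graph because of padding='same'.
You need to change the padding to an integer value or to its explicit equivalent; for the 3x3, stride-1 convolutions in your model, that is padding=1. See "Same padding equivalent in Pytorch".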
I made a workaround:
from collections.abc import Sequence

def calc_same_padding(kernel_size, stride, input_size):
    # Conv2d stores these as tuples; use the first entry (square case)
    if isinstance(kernel_size, Sequence):
        kernel_size = kernel_size[0]
    if isinstance(stride, Sequence):
        stride = stride[0]
    if isinstance(input_size, Sequence):
        input_size = input_size[0]
    pad = ((stride - 1) * input_size - stride + kernel_size) / 2
    return int(pad)

def replace_conv2d_with_same_padding(m: nn.Module, input_size=512):
    if isinstance(m, nn.Conv2d):
        if m.padding == "same":
            m.padding = calc_same_padding(
                kernel_size=m.kernel_size,
                stride=m.stride,
                input_size=input_size,
            )

model = MyModel()
model.apply(lambda m: replace_conv2d_with_same_padding(m, 512))
example_input = torch.ones((1, 3, 512, 512))
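All my input/output tensors have even dimensions (512x512, 256x256, 128x128, etc.), so the input size doesn't matter here. With stride 1, which is the only stride PyTorch accepts together with padding='same', the input_size term in the formula vanishes anyway.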
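As a quick sanity check of the padding formula used in the workaround, here is a minimal pure-Python sketch. It assumes stride 1 and odd kernel sizes (an even kernel would need asymmetric padding, which a single integer cannot express); `conv_out`, `same_padding` and `results` are illustrative names.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def same_padding(kernel, stride, size):
    """Same expression as in calc_same_padding, for scalar arguments."""
    return int(((stride - 1) * size - stride + kernel) / 2)


results = []
for kernel in (1, 3, 5, 7):
    for size in (128, 256, 512):
        pad = same_padding(kernel, stride=1, size=size)
        results.append((kernel, size, pad, conv_out(size, kernel, 1, pad)))
        print(f"kernel={kernel} size={size} pad={pad} out={results[-1][3]}")
```

For every combination the padding reduces to `(kernel - 1) // 2` and the output size equals the input size, independent of `size`.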
How does output_padding work in ConvTranspose2d? Please help me understand it.
nn.ConvTranspose2d(1024, 512, kernel_size=3, stride=2, padding=1, output_padding=1)
According to the documentation (https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html), when you apply a Conv2d operation with stride > 1, different input sizes can produce the same output size. For example, 7x7 and 8x8 inputs both give a 3x3 output with stride=2:
import torch
conv_inp1 = torch.rand(1,1,7,7)
conv_inp2 = torch.rand(1,1,8,8)
conv1 = torch.nn.Conv2d(1, 1, kernel_size = 3, stride = 2)
out1 = conv1(conv_inp1)
out2 = conv1(conv_inp2)
print(out1.shape) # torch.Size([1, 1, 3, 3])
print(out2.shape) # torch.Size([1, 1, 3, 3])
When applying the transposed convolution, it is therefore ambiguous which output shape to return, 7x7 or 8x8, for a stride=2 transposed convolution. The output_padding parameter tells PyTorch which of the two to produce. Note that it doesn't actually pad the output with zeros; it is only used to determine the output shape, and the transposed convolution is applied accordingly.
conv_t1 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)
conv_t2 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
transposed1 = conv_t1(out1)
transposed2 = conv_t2(out2)
print(transposed1.shape) # torch.Size([1, 1, 7, 7])
print(transposed2.shape) # torch.Size([1, 1, 8, 8])
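The shapes above follow from the output-size formula given in the ConvTranspose2d documentation. A minimal pure-Python sketch of that formula along one axis (`conv_transpose_out` is an illustrative helper name, not a PyTorch function):

```python
def conv_transpose_out(size, kernel, stride=1, padding=0,
                       output_padding=0, dilation=1):
    """Output length of a transposed convolution along one axis."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# The two layers from the answer, applied to a 3x3 input.
print(conv_transpose_out(3, kernel=3, stride=2))                    # 7
print(conv_transpose_out(3, kernel=3, stride=2, output_padding=1))  # 8

# The layer from the question doubles the spatial size, e.g. 16 -> 32.
print(conv_transpose_out(16, kernel=3, stride=2, padding=1, output_padding=1))  # 32
```

With kernel_size=3, stride=2, padding=1, output_padding=1 the formula simplifies to `2 * size`, which is why that combination is commonly used for exact 2x upsampling.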
I am using PyTorch 1.7 and Python 3.8 with CIFAR-10 dataset. I am trying to create a block with: conv -> conv -> pool -> fc. Fully connected layer (fc) has 256 neurons. The code for this is as follows:
# Testing-
conv1 = nn.Conv2d(
    in_channels = 3, out_channels = 64,
    kernel_size = 3, stride = 1,
    padding = 1, bias = True
)
conv2 = nn.Conv2d(
    in_channels = 64, out_channels = 64,
    kernel_size = 3, stride = 1,
    padding = 1, bias = True
)
pool = nn.MaxPool2d(
    kernel_size = 2, stride = 2
)
fc1 = nn.Linear(
    in_features = 64 * 16 * 16, out_features = 256,
    bias = True
)
# images: torch.Size([32, 3, 32, 32])
x = conv1(images)  # torch.Size([32, 64, 32, 32])
x = conv2(x)       # torch.Size([32, 64, 32, 32])
x = pool(x)        # torch.Size([32, 64, 16, 16])
# This line of code gives error-
x = fc1(x)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (32768x16 and
What is going wrong?
You are nearly there! As you will have noticed, nn.MaxPool2d returns a tensor of shape (32, 64, 16, 16), which is incompatible with the input your nn.Linear expects: a 2D tensor of shape (batch, in_features). You need to reshape it to (batch, 64*16*16).
I would recommend using an nn.Flatten layer rather than reshaping it yourself. It acts like x.view(x.size(0), -1) but is clearer. By default it preserves the first dimension:
conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
flatten = nn.Flatten()
fc1 = nn.Linear(in_features=64*16*16, out_features=256)
x = conv1(images)
x = conv2(x)
x = pool(x)
x = flatten(x)
x = fc1(x)
Alternatively, you could use the functional alternative torch.flatten, where you will have to provide the start_dim as 1: x = torch.flatten(x, start_dim=1).
When you're done debugging, you could assemble your layers with nn.Sequential:
model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(in_features=64*16*16, out_features=256)
)
x = model(images)
You need to flatten the output of the nn.MaxPool2d layer before passing it to the nn.Linear layer.
Try using x = x.view(x.size(0), -1) to flatten the tensor before passing it to the fc layer.
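To see where the 32768x16 in the error message comes from, here is a small pure-Python sketch of the shape bookkeeping. It relies on the fact that nn.Linear acts on the last axis and treats all leading axes as batch dimensions; the variable names are illustrative.

```python
import math

shape = (32, 64, 16, 16)  # output of the pooling layer

# Without flattening, the last axis (16) is taken as in_features and the
# remaining axes are collapsed into rows of the matrix product.
rows, cols = math.prod(shape[:-1]), shape[-1]
print(rows, cols)  # 32768 16

# After flattening everything except the batch axis, the shape matches
# in_features = 64 * 16 * 16.
flat = (shape[0], math.prod(shape[1:]))
print(flat)  # (32, 16384)
```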
I want to remove the decoder portion of the autoencoder
and put a fully connected (FC) layer in its place.
In addition, the encoder should keep its pre-trained weights and not be trained further.
self.encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 8, 3, padding=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(8, 8, 3, padding=1),
    nn.MaxPool2d(kernel_size=4, stride=1),
)
self.decoder = nn.Sequential(
    nn.Conv2d(8, 8, 3, padding=1),
    nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2),
    nn.Conv2d(8, 8, 3, padding=1),
    nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2),
    nn.Conv2d(8, 16, 3),
    nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2),
    nn.Conv2d(16, 1, 3, padding=1)
)

def forward(self, x):
    if self.training:
        x = self.encoder(x)
        x = self.decoder(x)
        return x
    else:
        x = classifier(x)
        return x
Is this possible?
Please help me.
One easy and clean solution would be to define a stand-alone network as your decoder, then replace the decoder attribute of your model with this new network after pre-training is over. Easy example below:
class sillyExample(torch.nn.Module):
    def __init__(self):
        super(sillyExample, self).__init__()
        self.encoder = torch.nn.Linear(5, 5)
        self.decoder = torch.nn.Linear(5, 10)

    def forward(self, x):
        return self.decoder(self.encoder(x))

test = sillyExample()
test(torch.rand(30, 5)).shape
Out: torch.Size([30, 10])
test.decoder = torch.nn.Linear(5, 20) # replace the decoder
test(torch.rand(30, 5)).shape
Out: torch.Size([30, 20])
Just make sure to re-initialize your optimizers with the updated model (or anything else that might be referencing the model's parameters).