I built the convolutional autoencoder below and am trying to tune it so that the encoder output (x_encoder) has N x H x W = 1024 without increasing the loss. Currently the output shape is [4, 64, 64]. Any ideas?
import torch
import torch.nn as nn
import torch.nn.functional as F

# define the NN architecture
class ConvAutoencoder(nn.Module):
    def __init__(self):
        super(ConvAutoencoder, self).__init__()
        ## encoder layers ##
        # conv layer (depth from in --> 16), 3x3 kernels
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        # conv layer (depth from 16 --> 4), 3x3 kernels
        self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
        # pooling layer to reduce x-y dims by two; kernel and stride of 2
        self.pool = nn.MaxPool2d(2, 2)
        ## decoder layers ##
        ## a kernel of 2 and a stride of 2 will increase the spatial dims by 2
        self.t_conv1 = nn.ConvTranspose2d(4, 16, 2, stride=2)
        self.t_conv2 = nn.ConvTranspose2d(16, 1, 2, stride=2)

    def forward(self, x):
        ## encode ##
        # add hidden layers with relu activation function
        # and maxpooling after
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        # add second hidden layer
        x = F.relu(self.conv2(x))
        x = self.pool(x)  # compressed representation
        x_encoder = x
        ## decode ##
        # add transpose conv layers, with relu activation function
        x = F.relu(self.t_conv1(x))
        # output layer (with sigmoid for scaling from 0 to 1; F.sigmoid is deprecated)
        x = torch.sigmoid(self.t_conv2(x))
        return x, x_encoder
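If you want to keep the number of parameters unchanged, you can add an nn.AdaptiveAvgPool2d or nn.AdaptiveMaxPool2d layer in place of, or after, the last pooling layer (self.pool); adaptive pooling has no parameters and forces the encoder output to a fixed spatial size. Note that it takes only the target (H, W), not (N, H, W): the channel count N is still set by conv2, so with 4 channels nn.AdaptiveAvgPool2d((16, 16)) gives [4, 16, 16] = 1024. The decoder's transposed convolutions then have to upsample by a larger factor to get back to the input size.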
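A quick check of the adaptive-pooling option, assuming the [1, 4, 64, 64] encoder output from the question (random values stand in for a real encoding). It shows that only the spatial size changes and that the layer adds no parameters:

```python
import torch
import torch.nn as nn

x_encoder = torch.rand(1, 4, 64, 64)  # stand-in for the current encoder output

# Adaptive pooling fixes only (H, W); the 4 channels still come from conv2
adaptive = nn.AdaptiveAvgPool2d((16, 16))
code = adaptive(x_encoder)

print(code.shape)       # torch.Size([1, 4, 16, 16])
print(code[0].numel())  # 1024 = 4 * 16 * 16
print(sum(p.numel() for p in adaptive.parameters()))  # 0 -> no new parameters
```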
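This should work assuming x_encoder has shape torch.Size([1, 4, 64, 64]). After the second pooling step, you can add a conv layer with stride 2, or a conv layer followed by a pooling layer. Either way, a 1x1 conv that reduces 4 channels to 1 and halves the spatial size gives [1, 32, 32], i.e. 1 * 32 * 32 = 1024 values per sample. Check the code below: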
# conv layer (depth from in --> 16), 3x3 kernels
self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
# conv layer (depth from 16 --> 4), 3x3 kernels
self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
# pooling layer to reduce x-y dims by two; kernel and stride of 2
self.pool = nn.MaxPool2d(2, 2)
# The changes
# Option 1: 1x1 conv (4 --> 1 channel) with stride 2: [4, 64, 64] -> [1, 32, 32]
self.conv3 = nn.Conv2d(4, 1, 1, 2)
# Option 2: 1x1 conv (4 --> 1 channel), then 2x2 max pooling: [4, 64, 64] -> [1, 32, 32]
self.conv3 = nn.Conv2d(4, 1, 1)
self.maxpool2d = nn.MaxPool2d((2, 2))
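A sketch of option 1 wired into the full model, assuming a 256x256 single-channel input (that is what produces the [4, 64, 64] encoder output after two 2x2 poolings). The class name and the extra transposed convolution t_conv0 are hypothetical additions: t_conv0 lets the decoder undo the 8x downsampling so the reconstruction matches the input size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAutoencoder1024(nn.Module):
    """Hypothetical variant: original encoder plus a 1x1, stride-2 conv."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(4, 1, 1, 2)  # [4, 64, 64] -> [1, 32, 32]
        # Assumed decoder: three x2 upsampling steps (32 -> 64 -> 128 -> 256)
        self.t_conv0 = nn.ConvTranspose2d(1, 4, 2, stride=2)
        self.t_conv1 = nn.ConvTranspose2d(4, 16, 2, stride=2)
        self.t_conv2 = nn.ConvTranspose2d(16, 1, 2, stride=2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x_encoder = self.conv3(x)
        x = F.relu(self.t_conv0(x_encoder))
        x = F.relu(self.t_conv1(x))
        x = torch.sigmoid(self.t_conv2(x))
        return x, x_encoder

model = ConvAutoencoder1024()
out, code = model(torch.rand(1, 1, 256, 256))
print(code.shape)       # torch.Size([1, 1, 32, 32])
print(code[0].numel())  # 1024
print(out.shape)        # torch.Size([1, 1, 256, 256])
```

Whether the loss stays the same with a 16x smaller code is something only training can tell; the sketch only checks the shapes.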
I am unable to find the error. The inputs are 32x32 grayscale images:
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=1,    # gray-scale images
                out_channels=16,
                kernel_size=5,    # 5x5 convolutional kernel
                stride=1,         # number of pixels the kernel moves at a time
                padding=2,        # to preserve size of input image
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # fully connected layers
        self.out = nn.Linear(32*7*7, 3)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        # flatten the output of conv2
        x = x.view(x.size(0), -1)
        output = self.out(x)
        return output

cnn = CNN()
print(cnn)
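Your linear layer expects an input of size 32x7x7. Since conv1 and conv2 each end with max pooling of stride 2, each halving the spatial size (28 -> 14 -> 7), your network is configured for an input size of 28x28 (the usual MNIST size), not 32x32 as you expect. For 32x32 inputs the feature map is 32x8x8 (32 -> 16 -> 8), so the linear layer would need nn.Linear(32*8*8, 3).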
Moreover, judging by the values in your error message (64x2304), I assume you are working with batch_size=64, but your images are NOT 32x32. They are rather 32 x W for some W between 36 and 39 (each of the two max-pool layers halves the width, rounding down), which results in a feature map of 32x8x9 = 2304 after the pooling.
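The arithmetic above can be checked with a dummy forward pass through the two conv blocks from the question. This is a sketch; flat_size is a hypothetical helper, and the (32, 36) case is just one width consistent with the 2304 in the error:

```python
import torch
import torch.nn as nn

# Same two conv blocks as CNN.conv1 and CNN.conv2
features = nn.Sequential(
    nn.Conv2d(1, 16, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2),
)

def flat_size(h, w):
    """Flattened feature size per sample for an h x w grayscale input."""
    with torch.no_grad():
        return features(torch.zeros(1, 1, h, w)).flatten(1).shape[1]

print(flat_size(28, 28))  # 1568 = 32*7*7
print(flat_size(32, 32))  # 2048 = 32*8*8
print(flat_size(32, 36))  # 2304 = 32*8*9
```

The printed value is what nn.Linear's in_features must be for that input size.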
I am using PyTorch 1.7 and Python 3.8 with the CIFAR-10 dataset. I am trying to create a block with: conv -> conv -> pool -> fc. The fully connected layer (fc) has 256 neurons. The code for this is as follows:
# Testing-
conv1 = nn.Conv2d(
in_channels = 3, out_channels = 64,
kernel_size = 3, stride = 1,
padding = 1, bias = True
)
conv2 = nn.Conv2d(
in_channels = 64, out_channels = 64,
kernel_size = 3, stride = 1,
padding = 1, bias = True
)
pool = nn.MaxPool2d(
kernel_size = 2, stride = 2
)
fc1 = nn.Linear(
in_features = 64 * 16 * 16, out_features = 256,
bias = True
)
images.shape
# torch.Size([32, 3, 32, 32])
x = conv1(images)
x.shape
# torch.Size([32, 64, 32, 32])
x = conv2(x)
x.shape
# torch.Size([32, 64, 32, 32])
x = pool(x)
x.shape
# torch.Size([32, 64, 16, 16])
# This line of code gives an error:
x = fc1(x)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (32768x16 and 16384x256)
What is going wrong?
You are nearly there! As you will have noticed, nn.MaxPool2d returns a tensor of shape (32, 64, 16, 16). nn.Linear multiplies along the last dimension only, so here it needs a 2-D tensor (batch, in_features); that is why the error reports 32768x16 (32*64*16 = 32768 rows of width 16). You need to reshape it to (batch, 64*16*16).
I would recommend using an nn.Flatten layer rather than reshaping manually. It acts like x.view(x.size(0), -1) but is clearer. By default it preserves the first (batch) dimension:
conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
flatten = nn.Flatten()
fc1 = nn.Linear(in_features=64*16*16, out_features=256)
x = conv1(images)
x = conv2(x)
x = pool(x)
x = flatten(x)
x = fc1(x)
Alternatively, you could use the functional alternative torch.flatten, where you will have to provide the start_dim as 1: x = torch.flatten(x, start_dim=1).
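A small sketch comparing the flattening options on a random tensor with the max-pool output shape from the question; it also shows what happens if start_dim is left at its default of 0:

```python
import torch
import torch.nn as nn

x = torch.rand(32, 64, 16, 16)  # shape of the max-pool output in the question

a = nn.Flatten()(x)                # nn.Flatten defaults to start_dim=1
b = torch.flatten(x, start_dim=1)  # functional form; start_dim must be given
c = x.view(x.size(0), -1)          # manual reshape that keeps the batch dim
d = torch.flatten(x)               # start_dim=0 also flattens the batch dim

print(a.shape)                                  # torch.Size([32, 16384])
print(torch.equal(a, b) and torch.equal(a, c))  # True
print(d.shape)                                  # torch.Size([524288])
```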
When you're done debugging, you could assemble your layers with nn.Sequential:
model = nn.Sequential(
nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1),
nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Flatten(),
nn.Linear(in_features=64*16*16, out_features=256)
)
x = model(images)
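While debugging, a helper that prints each layer's output shape makes mismatches like the one in the question easy to spot. A sketch, where trace_shapes is a hypothetical helper and images is a random batch shaped like the question's CIFAR-10 batch:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(in_features=64*16*16, out_features=256),
)

def trace_shapes(model, x):
    """Run x through each layer of an nn.Sequential, printing output shapes."""
    for layer in model:
        x = layer(x)
        print(f"{layer.__class__.__name__:10s} -> {tuple(x.shape)}")
    return x

images = torch.rand(32, 3, 32, 32)  # dummy CIFAR-10-sized batch
out = trace_shapes(model, images)
# Conv2d -> (32, 64, 32, 32), Conv2d -> (32, 64, 32, 32),
# MaxPool2d -> (32, 64, 16, 16), Flatten -> (32, 16384), Linear -> (32, 256)
```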
You need to flatten the output of the nn.MaxPool2d layer before passing it to the nn.Linear layer.
Try x = x.view(x.size(0), -1) before the fc layer to flatten the tensor.
I want to create a network based on the VGG16 network, but with linear layers (Gemm) added just after the Conv2d layers, for normalization purposes.
After that, I want to export the network to an ONNX file.
The first part seems to work: I took the PyTorch code for generating VGG16 and modified it as follows:
import torch.nn as nn

class VGG(nn.Module):
    def __init__(self, features, num_classes=8, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.Linear(4096, 4096),  # New shift layer
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.Linear(4096, 4096),  # New shift layer
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 8),
            nn.Linear(8, 8),  # New shift layer
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    n = 224
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            n = int(n / 2)
        elif v == 'B':
            layers += [nn.AdaptiveAvgPool2d(n)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            linear = nn.Linear(n, n, True)
            if batch_norm:
                layers += [conv2d, linear, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, linear, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

cfg = {'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M', 'B'],
       }

def vgg16(**kwargs):
    """VGG 16-layer model (configuration "D")
    """
    model = VGG(make_layers(cfg['D']), **kwargs)
    return model
But when I load the weights and export to ONNX, I see that my linear layers are not exported as Gemm but as {Transpose + MatMul + Add}.
The Transpose part is the weight matrix and the Add part is for the biases (which are all 0).
Am I wrong to think this is possible? Is there a way to get a real Gemm layer here, or another way to do this normalization (which simply multiplies all outputs by a single value)?
Here the input to nn.Linear is a 4-D tensor, so torch exports it as {Transpose, MatMul, Add}. The Gemm op is exported only when the input is 2-D.
You can look at the PyTorch source code for more information.
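If you want each linear layer to receive a 2-D input, one option is a hypothetical wrapper (Linear2D below) that folds all leading dimensions into the batch dimension before calling nn.Linear and restores them afterwards. This is a sketch: it only checks numerical equivalence with the plain 4-D call; whether the exporter then emits Gemm (plus Reshape nodes) for your PyTorch version is an assumption worth checking in the exported graph.

```python
import torch
import torch.nn as nn

class Linear2D(nn.Module):
    """Hypothetical wrapper: applies nn.Linear to the last dim via a 2-D view."""
    def __init__(self, n):
        super().__init__()
        self.linear = nn.Linear(n, n, bias=True)

    def forward(self, x):
        shape = x.shape
        # (N, C, H, W) -> (N*C*H, W): a 2-D input for the linear layer
        y = self.linear(x.reshape(-1, shape[-1]))
        # Restore the leading dimensions
        return y.reshape(*shape[:-1], y.shape[-1])

torch.manual_seed(0)
layer = Linear2D(7)
x = torch.rand(2, 512, 7, 7)
# Same result as calling the inner nn.Linear directly on the 4-D tensor
print(torch.allclose(layer(x), layer.linear(x)))  # True
```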
I'm working on an assignment with 1D signals and I'm having trouble finding the right input size for the linear layer (XXX). My signals have different lengths and are padded within each batch. I read that the linear layer must always have the same input size (XXX), but I'm not sure how to find it when each batch has a different length. Does anybody have any advice?
Thanks
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.features = nn.Sequential(nn.Conv1d(in_channels = 1, out_channels = 128, kernel_size = 7, stride = 3),
                                      nn.BatchNorm1d(128),
                                      nn.ReLU(),
                                      nn.MaxPool1d(2, 3),
                                      nn.Conv1d(128, 32, 5, 1),
                                      nn.BatchNorm1d(32),
                                      nn.ReLU(),
                                      nn.MaxPool1d(2, 2),
                                      nn.Conv1d(32, 32, 5, 1),
                                      nn.ReLU(),
                                      nn.Conv1d(32, 128, 3, 2),
                                      nn.ReLU(),
                                      nn.MaxPool1d(2, 2),
                                      nn.Conv1d(128, 256, 7, 1),
                                      nn.ReLU(),
                                      nn.MaxPool1d(2, 2),
                                      nn.Conv1d(256, 512, 3, 1),
                                      nn.ReLU(),
                                      nn.Conv1d(512, 128, 3, 1),
                                      nn.ReLU()
                                      )
        self.classifier = nn.Sequential(nn.Linear(XXX, 512),
                                        nn.ReLU(),
                                        nn.Dropout(p = 0.1),
                                        nn.Linear(512,2)
                                        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # start_dim=1 keeps the batch dimension
        x = self.classifier(x)
        return x
First, you need to decide on a fixed-length input. Let's assume each signal has length 2048. With the convolution and pooling layers above (none of them padded), the input must be at least 1279 samples long; anything shorter, including 1024, shrinks to nothing before the last convolution. If you would like to use shorter signals, you may need to remove a couple of convolutional layers, reduce the kernel_size, or remove some max-pool operations.
Assuming the fixed length is 2048, the conv layers output a [batch, 128, 6] tensor, so your nn.Linear layer takes 128 * 6 = 768 input features. To find this number, set the in_features of your nn.Linear layer to an arbitrary value (say 1000) and print the shape of the output of the conv layers. You could do something like this in your forward call:
def forward(self, x):
    x = self.features(x)
    print('Output shape of Conv. layers', x.shape)
    x = x.view(-1, x.size(1) * x.size(2))
    print('Shape required to pass to Linear Layer', x.shape)
    x = self.classifier(x)
    return x
This will throw an error because of the shape mismatch, but the printed shapes tell you the number of input features required by your first nn.Linear layer. With this approach, you can experiment with different input signal lengths (1536, 2048, 4096, etc.).
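A sketch of the same idea without deliberately triggering an error: run a dummy signal through a copy of the features stack and read off the flattened size. The helper name linear_in_features is hypothetical; the layer stack is copied from the question.

```python
import torch
import torch.nn as nn

features = nn.Sequential(  # same layers as NeuralNet.features
    nn.Conv1d(1, 128, 7, 3), nn.BatchNorm1d(128), nn.ReLU(), nn.MaxPool1d(2, 3),
    nn.Conv1d(128, 32, 5, 1), nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2, 2),
    nn.Conv1d(32, 32, 5, 1), nn.ReLU(),
    nn.Conv1d(32, 128, 3, 2), nn.ReLU(), nn.MaxPool1d(2, 2),
    nn.Conv1d(128, 256, 7, 1), nn.ReLU(), nn.MaxPool1d(2, 2),
    nn.Conv1d(256, 512, 3, 1), nn.ReLU(),
    nn.Conv1d(512, 128, 3, 1), nn.ReLU(),
)

def linear_in_features(length):
    """Flattened size of the conv output for a signal of the given length."""
    features.eval()  # BatchNorm in eval mode, so a single dummy signal is fine
    with torch.no_grad():
        out = features(torch.zeros(1, 1, length))
    return out.flatten(1).shape[1]

for length in (1536, 2048, 4096):
    print(length, linear_in_features(length))
# 1536 256
# 2048 768
# 4096 2560
```

The value for your chosen length is what replaces XXX in nn.Linear(XXX, 512).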
I have my training dataset as below, where X_train is 3D with 3 channels
Shape of X_Train: (708, 256, 3)
Shape of Y_Train: (708, 4)
Then I convert them into tensors and pass them to the DataLoader:
X_train=torch.from_numpy(X_data)
y_train=torch.from_numpy(y_data)
training_dataset = torch.utils.data.TensorDataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(training_dataset, batch_size=50, shuffle=False)
However when training the model, I get the following error:
RuntimeError: Given groups=1, weight of size 24 3 5, expected input[708, 256, 3] to have 3 channels, but got 256 channels instead
I suppose this is due to the position of the channel dimension? In TensorFlow the channels come last, but in PyTorch the format is "Batch Size x Channels x Height x Width" (for nn.Conv1d, "Batch Size x Channels x Length"). So how do I swap the dimensions of the X_train tensor to match the format the model expects?
import torch
import torch.nn as nn

class TwoLayerNet(torch.nn.Module):
    def __init__(self):
        super(TwoLayerNet, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv1d(3, 3*8, kernel_size=5, stride=1),
            nn.Sigmoid(),
            nn.AvgPool1d(kernel_size=2, stride=2))  # stride=0 is invalid; 2 assumed
        self.conv2 = nn.Sequential(
            nn.Conv1d(3*8, 12, kernel_size=5, stride=1),
            nn.Sigmoid(),
            nn.AvgPool1d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()  # must be defined because forward() uses it
        # in_features = 12 channels * 61 positions for a length-256 input
        self.fc1 = nn.Linear(12 * 61, 732)
        self.fc2 = nn.Linear(732, 4)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = out.reshape(out.size(0), -1)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out
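The in_features of fc1 must equal the flattened size of the conv output, not the number of training samples (708). A sketch that computes it, assuming the pooling stride is meant to be 2 (stride=0 is not a valid value) and using a random batch in (batch, channels, length) layout:

```python
import torch
import torch.nn as nn

# The conv/pool stack of TwoLayerNet, with pooling stride 2 assumed
features = nn.Sequential(
    nn.Conv1d(3, 24, kernel_size=5), nn.Sigmoid(), nn.AvgPool1d(kernel_size=2, stride=2),
    nn.Conv1d(24, 12, kernel_size=5), nn.Sigmoid(), nn.AvgPool1d(kernel_size=2, stride=2),
)

x = torch.rand(50, 3, 256)  # one batch: 256 -> 252 -> 126 -> 122 -> 61
out = features(x)
print(out.shape)                              # torch.Size([50, 12, 61])
print(out.reshape(out.size(0), -1).shape[1])  # 732 -> in_features for fc1
```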
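Use permute to move the channel dimension to position 1, so the tensor is (batch, channels, length) as nn.Conv1d expects:
import torch

X_train = torch.rand(708, 256, 3)
X_train = X_train.permute(0, 2, 1)
X_train.shape
# => torch.Size([708, 3, 256])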
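A sketch of the full data path from the question with the channel dimension moved to position 1, using random hypothetical stand-ins for X_data and y_data. It also converts to float32 with .float(), since torch.from_numpy keeps NumPy's float64 dtype while nn.Conv1d weights are float32 by default:

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins with the shapes from the question
X_data = np.random.rand(708, 256, 3)  # (samples, length, channels), float64
y_data = np.random.rand(708, 4)

# (N, L, C) -> (N, C, L), converted to float32
X_train = torch.from_numpy(X_data).float().permute(0, 2, 1)
y_train = torch.from_numpy(y_data).float()

training_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(training_dataset, batch_size=50, shuffle=False)

conv = nn.Conv1d(3, 24, kernel_size=5)  # same shape as TwoLayerNet's first conv
xb, yb = next(iter(train_loader))
print(xb.shape)        # torch.Size([50, 3, 256])
print(conv(xb).shape)  # torch.Size([50, 24, 252])
```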