I am using TF 2.5 and Python 3.8, where a conv layer is defined as:
conv1 = Conv2D(
    filters=64, kernel_size=(3, 3),
    activation='relu', kernel_initializer=tf.initializers.GlorotNormal(),
    strides=(1, 1), padding='same',
)
Using a batch of 60 CIFAR-10 images as input:
x.shape
# TensorShape([60, 32, 32, 3])
The output volume of this layer preserves the spatial width and height (32, 32) and has 64 feature maps (one per filter) for each of the 60 images in the batch:
conv1(x).shape
# TensorShape([60, 32, 32, 64])
I understand this output.
Can you explain the output of:
conv1.trainable_weights[0].shape
# TensorShape([3, 3, 3, 64])
The formula used to compute the number of trainable parameters in a conv layer is ((m x n x d) + 1) x k,
where,
m -> width of filter; n -> height of filter; d -> number of channels in input volume; k -> number of filters applied in current layer.
The 1 is the bias added for each filter. Note that trainable_weights[0] holds only the kernel, stored with shape (m, n, d, k) = (3, 3, 3, 64), which is why the bias does not appear in that tensor. In TF2.x the bias is enabled by default (use_bias=True) and shows up separately as trainable_weights[1] with shape (64,).
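A quick way to verify this in TF2 (a minimal sketch that rebuilds the layer above and checks both weight tensors):
import tensorflow as tf

conv1 = tf.keras.layers.Conv2D(
    filters=64, kernel_size=(3, 3),
    activation='relu', kernel_initializer=tf.initializers.GlorotNormal(),
    strides=(1, 1), padding='same',
)
conv1.build(input_shape=(60, 32, 32, 3))   # CIFAR-10 batch: 60 images of 32x32x3

print(conv1.trainable_weights[0].shape)  # (3, 3, 3, 64) -> kernel, i.e. (m, n, d, k)
print(conv1.trainable_weights[1].shape)  # (64,)         -> one bias per filter
print(conv1.count_params())              # ((3*3*3) + 1) * 64 = 1792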
Related
In a CNN-based autoencoder, would you increase or decrease the number of filters from layer to layer? Since we are compressing the information, I was thinking of decreasing it.
Here is an example of the encoder part, where the number of filters is decreased at each new layer, from 16 to 8 to 4.
x = Conv2D(filters = 16, kernel_size = 3, activation='relu', padding='same', name='encoder_1a')(inputs)
x = MaxPooling2D(pool_size = (2, 2), padding='same', name='encoder_1b')(x)
x = Conv2D(filters = 8, kernel_size = 3, activation='relu', padding='same', name='encoder_2a')(x)
x = MaxPooling2D(pool_size = (2, 2), padding='same', name='encoder_2b')(x)
x = Conv2D(filters = 4, kernel_size = 3, activation='relu', padding='same', name='encoder_3a')(x)
x = MaxPooling2D(pool_size = (2, 2), padding='same', name='encoder_3b')(x)
It is not always the case that the number of filters is reduced or increased as the encoder gets deeper. In most convolutional autoencoder architectures I have seen, the height and width are decreased through strided convolutions or pooling, while the depth (number of filters) is increased, kept the same as the previous layer, or varied from layer to layer. But there are also examples, like yours, where the number of output channels/filters decreases with depth.
Usually an autoencoder encodes the input into a latent representation (vector or embedding) whose dimension is lower than that of the input, while minimizing the reconstruction error. So both approaches can be used to build an undercomplete autoencoder, by varying the kernel sizes, the number of layers, adding an extra layer of a chosen dimension at the end of the encoder, and so on.
Filter increase example
As more layers are added to the encoder, the number of filters grows (see the sketch below). Because the flattened representation 3*3*128 = 1152 is larger than the 28*28*1 = 784 input features, an extra layer is added before the final embedding layer; this fully connected layer reduces the feature dimension to a predefined number of outputs. That final dense layer could also be replaced by further convolutions (varying the number of layers or the kernel sizes) that end in an output of shape (1, 1, NUM_FILTERS).
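A minimal sketch of such a filter-increasing encoder, loosely following the DCEC architecture in the references below (the embedding size of 10 is just an example):
import keras
from keras import layers

inputs = keras.Input(shape=(28, 28, 1))                                          # 28*28*1 = 784 input features
x = layers.Conv2D(32, 5, strides=2, padding='same', activation='relu')(inputs)   # -> 14x14x32
x = layers.Conv2D(64, 5, strides=2, padding='same', activation='relu')(x)        # -> 7x7x64
x = layers.Conv2D(128, 3, strides=2, padding='valid', activation='relu')(x)      # -> 3x3x128 = 1152
x = layers.Flatten()(x)
embedding = layers.Dense(10, name='embedding')(x)   # dense layer brings 1152 down to a small code
encoder = keras.Model(inputs, embedding)
encoder.summary()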
Filter decrease example
An easy example of the number of filters decreasing as the encoder gets deeper can be found in the Keras convolutional autoencoder tutorial (see the references below), much like your own code:
import keras
from keras import layers
input_img = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
References
https://www.deeplearningbook.org/contents/autoencoders.html
https://xifengguo.github.io/papers/ICONIP17-DCEC.pdf
https://blog.keras.io/building-autoencoders-in-keras.html
How does output_padding work in ConvTranspose2d? Please help me understand this:
nn.ConvTranspose2d(1024, 512, kernel_size=3, stride=2, padding=1, output_padding=1)
According to the documentation (https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html), when applying a Conv2d operation with stride > 1 you can get the same output dimensions from different input sizes. For example, 7x7 and 8x8 inputs both return a 3x3 output with stride=2:
import torch
conv_inp1 = torch.rand(1,1,7,7)
conv_inp2 = torch.rand(1,1,8,8)
conv1 = torch.nn.Conv2d(1, 1, kernel_size = 3, stride = 2)
out1 = conv1(conv_inp1)
out2 = conv1(conv_inp2)
print(out1.shape) # torch.Size([1, 1, 3, 3])
print(out2.shape) # torch.Size([1, 1, 3, 3])
When applying the transpose convolution, it is therefore ambiguous which output shape to return, 7x7 or 8x8, for a stride=2 transpose convolution. The output_padding parameter lets PyTorch resolve this ambiguity. Note that it doesn't actually pad the output with zeros or anything else; it is only used to determine the output shape, and the transpose convolution is applied accordingly.
conv_t1 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)
conv_t2 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
transposed1 = conv_t1(out1)
transposed2 = conv_t2(out2)
print(transposed1.shape) # torch.Size([1, 1, 7, 7])
print(transposed2.shape) # torch.Size([1, 1, 8, 8])
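For reference, the output size follows the formula in the PyTorch docs; here is a quick sketch of it (assuming dilation=1):
# H_out = (H_in - 1) * stride - 2 * padding + kernel_size + output_padding   (dilation = 1)
def conv_transpose_out_size(h_in, kernel_size, stride=1, padding=0, output_padding=0):
    return (h_in - 1) * stride - 2 * padding + kernel_size + output_padding

print(conv_transpose_out_size(3, kernel_size=3, stride=2))                    # 7
print(conv_transpose_out_size(3, kernel_size=3, stride=2, output_padding=1))  # 8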
I built the convolutional autoencoder below and am trying to tune it so that the encoder output (x_encoder) has shape [N x H x W] with N*H*W = 1024, without increasing the loss. Currently my output shape is [4, 64, 64]. Any ideas?
# define the NN architecture
class ConvAutoencoder(nn.Module):
    def __init__(self):
        super(ConvAutoencoder, self).__init__()
        ## encoder layers ##
        # conv layer (depth from in --> 16), 3x3 kernels
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        # conv layer (depth from 16 --> 4), 3x3 kernels
        self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
        # pooling layer to reduce x-y dims by two; kernel and stride of 2
        self.pool = nn.MaxPool2d(2, 2)

        ## decoder layers ##
        ## a kernel of 2 and a stride of 2 will increase the spatial dims by 2
        self.t_conv1 = nn.ConvTranspose2d(4, 16, 2, stride=2)
        self.t_conv2 = nn.ConvTranspose2d(16, 1, 2, stride=2)

    def forward(self, x):
        ## encode ##
        # add hidden layers with relu activation function
        # and maxpooling after
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        # add second hidden layer
        x = F.relu(self.conv2(x))
        x = self.pool(x)  # compressed representation
        x_encoder = x

        ## decode ##
        # add transpose conv layers, with relu activation function
        x = F.relu(self.t_conv1(x))
        # output layer (with sigmoid for scaling from 0 to 1)
        x = F.sigmoid(self.t_conv2(x))
        return x, x_encoder
If you want to keep the number of parameters unchanged, adding an nn.AdaptiveAvgPool2d((H, W)) or nn.AdaptiveMaxPool2d((H, W)) layer after the pooling layer (self.pool) can force the encoder output to shape [N x H x W]; for example, nn.AdaptiveAvgPool2d((16, 16)) turns [4, 64, 64] into [4, 16, 16], i.e. 1024 values.
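For example (a minimal sketch; the target size (16, 16) is chosen here because 4 * 16 * 16 = 1024):
import torch
import torch.nn as nn

x = torch.rand(1, 4, 64, 64)                     # current shape of x_encoder
adaptive_pool = nn.AdaptiveAvgPool2d((16, 16))   # adds no trainable parameters
print(adaptive_pool(x).shape)                    # torch.Size([1, 4, 16, 16]) -> 4*16*16 = 1024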
This should work, assuming the shape of x_encoder is torch.Size([1, 4, 64, 64]). You can either add a conv layer with stride set to 2, or a conv layer followed by a pooling layer. Check the code below:
# conv layer (depth from in --> 16), 3x3 kernels
self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
# conv layer (depth from 16 --> 4), 3x3 kernels
self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
# pooling layer to reduce x-y dims by two; kernel and stride of 2
self.pool = nn.MaxPool2d(2, 2)
# The changes: either a strided 1x1 conv ...
self.conv3 = nn.Conv2d(4, 1, 1, 2)       # [4, 64, 64] -> [1, 32, 32] = 1024 values
# ... or a 1x1 conv followed by a pooling layer
self.conv3 = nn.Conv2d(4, 1, 1)          # [4, 64, 64] -> [1, 64, 64]
self.maxpool2d = nn.MaxPool2d((2, 2))    # [1, 64, 64] -> [1, 32, 32] = 1024 values
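A quick shape check of the two options (a small sketch; either way the encoder output ends up with 1*32*32 = 1024 values):
import torch
import torch.nn as nn

x = torch.rand(1, 4, 64, 64)                              # current x_encoder shape
print(nn.Conv2d(4, 1, 1, 2)(x).shape)                     # torch.Size([1, 1, 32, 32])
print(nn.MaxPool2d((2, 2))(nn.Conv2d(4, 1, 1)(x)).shape)  # torch.Size([1, 1, 32, 32])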
I'm working on an assignment with 1D signals and I'm having trouble finding the right input size for the linear layer (XXX). My signals have different lengths and are padded within a batch. I read that the linear layer should always have the same input size (XXX), but I'm not sure how to find it when each batch has a different length. Does anybody have any advice?
Thanks
class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=128, kernel_size=7, stride=3),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.MaxPool1d(2, 3),
            nn.Conv1d(128, 32, 5, 1),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(2, 2),
            nn.Conv1d(32, 32, 5, 1),
            nn.ReLU(),
            nn.Conv1d(32, 128, 3, 2),
            nn.ReLU(),
            nn.MaxPool1d(2, 2),
            nn.Conv1d(128, 256, 7, 1),
            nn.ReLU(),
            nn.MaxPool1d(2, 2),
            nn.Conv1d(256, 512, 3, 1),
            nn.ReLU(),
            nn.Conv1d(512, 128, 3, 1),
            nn.ReLU()
        )
        self.classifier = nn.Sequential(
            nn.Linear(XXX, 512),
            nn.ReLU(),
            nn.Dropout(p=0.1),
            nn.Linear(512, 2)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x)
        x = self.classifier(x)
        return x
First, you need to decide on a fixed input length. Let's assume each signal is of length 2048. It did not work for any length below 1024, because the stacked convolution and pooling layers shrink the temporal dimension below the kernel sizes of the later layers. If you would like to use shorter signals, you need to either remove a couple of convolutional layers, reduce the kernel sizes, or drop a max-pool operation.
Assuming the fixed length is 2048, your first nn.Linear layer will take 768 input neurons. To find this number, set an arbitrary in_features for the nn.Linear layer (say 1000) and then print the shape of the output of the conv layers. You could do something like this in your forward call:
def forward(self, x):
    x = self.features(x)
    print('Output shape of Conv. layers', x.shape)
    x = x.view(-1, x.size(1) * x.size(2))
    print('Shape required to pass to Linear Layer', x.shape)
    x = self.classifier(x)
    return x
This will obviously throw an error because of the shape mismatch, but it tells you the number of input neurons required by your first nn.Linear layer. With this approach you can experiment with a number of different input signal lengths (1536, 2048, 4096, etc.).
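As a sanity check, here is a small standalone sketch that rebuilds just the feature extractor from your model and probes its output shape for a dummy length-2048 signal:
import torch
import torch.nn as nn

# the feature extractor from the question, rebuilt on its own so we can probe its output size
features = nn.Sequential(
    nn.Conv1d(1, 128, 7, 3), nn.BatchNorm1d(128), nn.ReLU(), nn.MaxPool1d(2, 3),
    nn.Conv1d(128, 32, 5, 1), nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2, 2),
    nn.Conv1d(32, 32, 5, 1), nn.ReLU(),
    nn.Conv1d(32, 128, 3, 2), nn.ReLU(), nn.MaxPool1d(2, 2),
    nn.Conv1d(128, 256, 7, 1), nn.ReLU(), nn.MaxPool1d(2, 2),
    nn.Conv1d(256, 512, 3, 1), nn.ReLU(),
    nn.Conv1d(512, 128, 3, 1), nn.ReLU(),
)

with torch.no_grad():
    out = features(torch.zeros(2, 1, 2048))   # dummy batch of 2 signals of length 2048
print(out.shape)                              # torch.Size([2, 128, 6])
print(out.size(1) * out.size(2))              # 768 -> in_features for the first nn.Linear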
I'm a bit new to Keras and deep learning. I'm currently trying to replicate this paper, but when compiling the first model (without the LSTMs) I get the following error:
"ValueError: Error when checking target: expected dense_3 to have shape (None, 120, 40) but got array with shape (8, 40, 1)"
The description of the model is this:
Input (length T is the appliance-specific window size)
Parallel 1D convolutions with filter sizes 3, 5, and 7 respectively, stride=1, number of filters=32, activation type=linear, border mode=same
Merge layer which concatenates the outputs of the parallel 1D convolutions
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim=T, activation type=linear
My code is this:
from keras import layers, Input
from keras.models import Model
# the window sizes (seq_length?) are 40, 1075, 465, 72 and 1246 for the kettle, dish washer,
# fridge, microwave, oven and washing machine, respectively.
def ae_net(T):
    input_layer = Input(shape=(T,))
    branch_a = layers.Conv1D(32, 3, activation='linear', padding='same', strides=1)(input_layer)
    branch_b = layers.Conv1D(32, 5, activation='linear', padding='same', strides=1)(input_layer)
    branch_c = layers.Conv1D(32, 7, activation='linear', padding='same', strides=1)(input_layer)
    merge_layer = layers.concatenate([branch_a, branch_b, branch_c], axis=1)
    dense_1 = layers.Dense(128, activation='relu')(merge_layer)
    dense_2 = layers.Dense(128, activation='relu')(dense_1)
    output_dense = layers.Dense(T, activation='linear')(dense_2)
    model = Model(input_layer, output_dense)
    return model
model = ae_net(40)
model.compile(loss= 'mean_absolute_error', optimizer='rmsprop')
model.fit(X, y, batch_size= 8)
where X and y are numpy arrays of 8 sequences of 40 values each, so X.shape and y.shape are both (8, 40, 1); it's effectively one batch of data. The thing is, I can't understand how the output could have shape (None, 120, 40), or what these sizes mean.
As you noted, your shapes contain batch_size, length and channels: (8,40,1)
Your three convolutions are, each one, creating a tensor like (8,40,32).
Your concatenation in the axis=1 creates a tensor like (8,120,32), where 120 = 3*40.
Now, the dense layers only work on the last dimension (the channels in this case), leaving the length (now 120) untouched.
Solution
Now, it seems you do want to keep the length dimension at the end, so you won't need any flatten or reshape layers; you do, however, need to keep the length at 40.
You're probably doing the concatenation in the wrong axis. Instead of the length axis (1), you should concatenate in the channels axis (2 or -1).
So, this should be your concatenate layer:
merge_layer = layers.Concatenate()([branch_a, branch_b, branch_c])
#or layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])
This will output (8, 40, 96), and the dense layers will then transform the 96 channels into something else.
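A quick shape check of the two concatenation axes (a minimal sketch; note it assumes an explicit channel axis, i.e. an input shape of (40, 1), so that Conv1D receives a 3D tensor):
import keras
from keras import layers

inp = keras.Input(shape=(40, 1))
a = layers.Conv1D(32, 3, padding='same')(inp)
b = layers.Conv1D(32, 5, padding='same')(inp)
c = layers.Conv1D(32, 7, padding='same')(inp)

print(layers.concatenate([a, b, c], axis=1).shape)   # (None, 120, 32) -- the length axis grows
print(layers.Concatenate(axis=-1)([a, b, c]).shape)  # (None, 40, 96)  -- channels grow, length stays 40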