Calculating shape of conv1d layer in Pytorch - pytorch

How we can calculate the shape of conv1d layer in PyTorch. IS there any command to calculate size and shape of these layers in PyTorch.
nn.Conv1d(depth_1, depth_2, kernel_size=kernel_size_2, stride=stride_size),
nn.ReLU(),
nn.MaxPool1d(kernel_size=2, stride=stride_size),
nn.Dropout(0.25)```

The output size can be calculated as shown in the documentation nn.Conv1d - Shape:
The batch size remains unchanged and you already know the number of channels, since you specified them when creating the convolution (depth_2 in this example).
Only the length needs to be calculated and you can do that with a simple function analogous to the formula above:
def calculate_output_length(length_in, kernel_size, stride=1, padding=0, dilation=1):
return (length_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
The default values specified are also the default values of nn.Conv1d, therefore you only need to specify what you also specify to create the convolution. It uses an integer division //, because the numerator might be not be divisible by stride, in which case it just gets rounded down (indicated by the brackets that are only closed at towards the bottom).
The same formula also applies to nn.MaxPool1d, but keep in mind that it automatically sets stride = kernel_size if stride is not specified.

Related

Convolutional layer: does the filter convolves also trough the nlayers_in or it take all the dimensions?

In the leading DeepLearning libraries, does the filter (aka kernel or weight) in the convolutional layer convolves also across the "channel" dimension or does it take all the channels at once?
To make an example, if the input dimension is (60,60,10) (where the last dimension is often referred as "channels") and the desired output number of channels is 5, can the filter be (5,5,5,5) or should it be (5,5,10,5) instead ?
It should be (5, 5, 10, 5). Conv2d operation is just like Linear if you ignore the spatial dimensions.
From TensorFlow documentation [link]:
Given an input tensor of shape batch_shape + [in_height, in_width, in_channels] and a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels], this op performs the following:
Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
For each patch, right-multiplies the filter matrix and the image patch vector.
It takes all channels at once, so 5×5×10×5 should be right.
julia> using Flux
julia> c = Conv((5,5), 10 => 5); # make a layer, 10 channels to 5
julia> c.weight |> summary
"5×5×10×5 Array{Float32, 4}"
julia> c(randn(Float32, 60, 60, 10, 1)) |> summary # check it works
"56×56×5×1 Array{Float32, 4}"
julia> Conv(rand(Float32, (5,5,5,5))) # different weight size
Conv((5, 5), 5 => 5) # 630 parameters

PyTorch high-dimensional tensor through linear layer

I have a tensor of size (32, 128, 50) in PyTorch. These are 50-dim word embeddings with a batch size of 32. That is, the three indices in my size correspond to number of batches, maximum sequence length (with 'pad' token), and the size of each embedding. Now, I want to pass this through a linear layer to get an output of size (32, 128, 1). That is, for every word embedding in every sequence, I want to make it one dimensional. I tried adding a linear layer to my network going from 50 to 1 dimension, and my output tensor is of the desired shape. So I think this works, but I would like to understand how PyTorch deals with this issue, since I did not explicitly tell it which dimension to apply the linear layer to. I played around with this and found that:
If I input a tensor of shape (32, 50, 50) -- thus creating ambiguity by having two dimensions along which the linear layer could be applied to (two 50s) -- it only applies it to the last dim and gives an output tensor of shape (32, 50, 1).
If I input a tensor of shape (32, 50, 128) it does NOT output a tensor of shape (32, 1, 128), but rather gives me an error.
This suggests that a linear layer in PyTorch applies the transformation to the last dimension of your tensor. Is that the case?
In the nn.Linear docs, it is specified that the input of this module can be any tensor of size (*, H_in) and the output will be a tensor of size (*, H_out), where:
* means any number of dimensions
H_in is the number of in_features
H_out is the number of out_features
To understand this better, for a tensor of size (n, m, 50) can be processed by a Linear module with in_features=50, while a tensor of size (n, 50, m) can be processed by a Linear module with in_features=m (in your case 128).

what the Conv1D with kernel size equal to 1 do?

I read an example of using LSTM with CONV1.
(Took it from: CNN LSTM)
Conv1D(filters=64, kernel_size=1, activation='relu')
I understand that the dimension of the convolutional is 1 (one dim with size 1))
what is the value of the convolution ? (what is the value of the matrix 1*1 ?)
I can't figure out what is the filters=64 ? what does it mean ?
Is the relu activation function work on the output of the convolutional ? (from what I read it seems like that, but I'm not sure)
what is the motivation to use convolutional with kernel_size = 1, as we do here ?
filters
filters = 64 means number of separate filters used is 64.
Each filter will output 1 channel. i.e. here 64 filters operate on input to produce 64 different channels(or vectors). Hence filters parameter determines number of output channels.
kernel_size
kernel_size determines the size of the convolution window. Suppose kernel_size = 1 then each kernel will have dimension of in_channels x 1. Hence each kernel weight will be in_channels x 1 dimension tensor.
activation = relu
That means relu activation will be applied on the output of convolution operation.
kernel_size = 1 convolution
Used to reduce depth channels with applying non-linearity. It will do something like weighted average across the channels while keeping receptive field.
In your eg: filters = 64, kernel_size = 1, activation = relu
Suppose input feature map has size of 100 x 10(100 channels). Then the layer weight will of dimension 64 x 100 x 1. The output size will be 64 x 10.

fix/freeze individual kernel weights in a convolutional operation

I'm trying out a workaround for fixing individual kernel weights in a convolutional operation in TensorFlow using Python 3.7. I do it by creating
a trainable variable,
an identical non-trainable variable and
a "mask" tensor consisting of 1s and 0s with the same shape as the created variables in step 1 and 2 above.
A 1 in the "mask" tensor indicates that I want to fix/freeze that specific weight during training, i.e. not update it in the backward pass.
Now, this workaround works perfectly fine when applied to a fully connected layer but fails when applied to a convolutional layer and I can't figure out why or how to make it work.
Something seems to be happening in the tf.nn.conv2d() function call (see code example below) and according to the documentation this is what they do:
Given an input tensor of shape [batch, in_height, in_width, in_channels]
and a filter / kernel tensor of shape
[filter_height, filter_width, in_channels, out_channels], this op
performs the following:
1. Flattens the filter to a 2-D matrix with shape
[filter_height * filter_width * in_channels, output_channels].
2. Extracts image patches from the input tensor to form a virtual
tensor of shape [batch, out_height, out_width,<br>
filter_height * filter_width * in_channels].
3. For each patch, right-multiplies the filter matrix and the image patch
vector.
But since I use weights_frozen which is a tensor and depends on the trainable variable, non-trainable variable and mask_weights it should get zero-valued gradients on the positions where I have a 1 in the mask_weights tensor.
def conv(input_, layer_name...):
weights = tf.get_variable(shape=[filter_height, filter_width, in_channels, out_channels], dtype=tf.float32, initializer=tf.glorot_uniform_initializer(), trainable=True)
weights_fixed = tf.Variable(tf.identity(weights), trainable=False)
mask_weights = tf.placeholder(tf.float32, weights.shape)
weights_frozen = tf.add(tf.multiply(mask_weights, weights_fixed), tf.multiply((1 - mask_weights), weights))
out_conv = tf.nn.conv2d(input=input_, filter=weights_frozen, strides=strides_, padding='SAME')
out_add = tf.nn.bias_add(value=out_conv, bias=biases_frozen)
out = tf.nn.relu(features=out_add)
return out
As mentioned, I expect to get zero-valued gradients on the positions where I have a 1 in the mask_weights tensor, but instead they are non-zero and therefore those weights are being trained, which is not the behavior I'm trying to achieve.

understanding output shape of keras Conv2DTranspose

I am having a hard time understanding the output shape of keras.layers.Conv2DTranspose
Here is the prototype:
keras.layers.Conv2DTranspose(
filters,
kernel_size,
strides=(1, 1),
padding='valid',
output_padding=None,
data_format=None,
dilation_rate=(1, 1),
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None
)
In the documentation (https://keras.io/layers/convolutional/), I read:
If output_padding is set to None (default), the output shape is inferred.
In the code (https://github.com/keras-team/keras/blob/master/keras/layers/convolutional.py), I read:
out_height = conv_utils.deconv_length(height,
stride_h, kernel_h,
self.padding,
out_pad_h,
self.dilation_rate[0])
out_width = conv_utils.deconv_length(width,
stride_w, kernel_w,
self.padding,
out_pad_w,
self.dilation_rate[1])
if self.data_format == 'channels_first':
output_shape = (batch_size, self.filters, out_height, out_width)
else:
output_shape = (batch_size, out_height, out_width, self.filters)
and (https://github.com/keras-team/keras/blob/master/keras/utils/conv_utils.py):
def deconv_length(dim_size, stride_size, kernel_size, padding, output_padding, dilation=1):
"""Determines output length of a transposed convolution given input length.
# Arguments
dim_size: Integer, the input length.
stride_size: Integer, the stride along the dimension of `dim_size`.
kernel_size: Integer, the kernel size along the dimension of `dim_size`.
padding: One of `"same"`, `"valid"`, `"full"`.
output_padding: Integer, amount of padding along the output dimension, can be set to `None` in which case the output length is inferred.
dilation: dilation rate, integer.
# Returns
The output length (integer).
"""
assert padding in {'same', 'valid', 'full'}
if dim_size is None:
return None
# Get the dilated kernel size
kernel_size = kernel_size + (kernel_size - 1) * (dilation - 1)
# Infer length if output padding is None, else compute the exact length
if output_padding is None:
if padding == 'valid':
dim_size = dim_size * stride_size + max(kernel_size - stride_size, 0)
elif padding == 'full':
dim_size = dim_size * stride_size - (stride_size + kernel_size - 2)
elif padding == 'same':
dim_size = dim_size * stride_size
else:
if padding == 'same':
pad = kernel_size // 2
elif padding == 'valid':
pad = 0
elif padding == 'full':
pad = kernel_size - 1
dim_size = ((dim_size - 1) * stride_size + kernel_size - 2 * pad + output_padding)
return dim_size
I understand that Conv2DTranspose is kind of a Conv2D, but reversed.
Since applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 200x200 image will output a 20x20 image,
I assume that applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 20x20 image will output a 200x200 image.
Also, applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 195x195 image will also output a 20x20 image.
So, I understand that there is kind of an ambiguity on the output shape when applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" (user might want output to be 195x195, or 200x200, or many other compatible shapes).
I assume that "the output shape is inferred." means that a default output shape is computed according to the parameters of the layer, and I assume that there is a mechanism to specify an output shape differnet from the default one, if necessary.
This said, I do not really understand
the meaning of the "output_padding" parameter
the interactions between parameters "padding" and "output_padding"
the various formulas in the function keras.conv_utils.deconv_length
Could someone explain this?
Many thanks,
Julien
I may have found a (partial) answer.
I found it in the Pytorch documentation, which appears to be much clearer than the Keras documentation on this topic.
When applying Conv2D with a stride greater than 1 to images which dimensions are close, we get output images with the same dimensions.
For instance, when applied a Conv2D with kernel size of 3x3, stride of 7x7 and padding "same", the following image dimensions
22x22, 23x23, ..., 28x28, 22x28, 28x22, 27x24, etc. (7x7 = 49
combinations)
will ALL yield an output dimension of 4x4.
That is because output_dimension = ceiling(input_dimension / stride).
As a consequence, when applying a Conv2DTranspose with kernel size of 3x3, stride of 7x7 and padding "same", there is an ambiguity about the output dimension.
Any of the 49 possible output dimensions would be correct.
The parameter output_padding is a way to resolve the ambiguity by choosing explicitly the output dimension.
In my example, the minimum output size is 22x22, and output_padding provides a number of lines (between 0 and 6) to add at the bottom of the output image and a number of columns (between 0 and 6) to add at the right of the output image.
So I can get output_dimensions = 24x25 if I use outout_padding = (2, 3)
What I still do not understand, however, is the logic that keras uses to choose a certain output image dimension when output_padding is not specified (when it 'infers" the output shape)
A few pointers:
https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d
https://discuss.pytorch.org/t/the-output-size-of-convtranspose2d-differs-from-the-expected-output-size/1876/5
https://discuss.pytorch.org/t/question-about-the-output-padding-in-nn-convtrasnpose2d/19740
https://discuss.pytorch.org/t/what-does-output-padding-exactly-do-in-convtranspose2d/2688
So to answer my own questions:
the meaning of the "output_padding" parameter: see above
the interactions between parameters "padding" and "output_padding": these parameters are independant
the various formulas in the function keras.conv_utils.deconv_length
For now, I do not understand the part when output_padding is None;
I ignore the case when padding == 'full' (not supported by Conv2DTranspose);
The formula for padding == 'valid' seems correct (can be computed by reversing the formula of Conv2D)
The formula for padding == 'same' seems incorrect to me, in case kernel_size is even. (As a matter of fact, keras crashes when trying to build a Conv2DTranspose layer with input_dimension = 5x5, kernel_size = 2x2, stride = 7x7 and padding = 'same'. It appears to me that there is a bug in keras, I will start another thread for this topic...)
Outpadding in Conv2DTranspose is also what I am concerned about when designing an autoencoder.
Assume stride is always 1. Along the encoder path, for each convolution layer, I chose padding='valid', which means that if my input image is HXW, and the filter is sized mXn, the output of the layer will be (H-(m-1))X(W-(n-1)).
In the corresponding Con2DTranspose layer along the decoder path, if I use Theano, in order to resume the input size of its corresponding Con2D, I have to chose padding='full', and out_padding = None or 0 (no difference), which implies the input size will be expanded by [m-1, n-1] around it, that is, (m-1)/2 for top and bottom, and (n-1)/2 for left and right.
If I use tensorflow, I will have to choose padding = 'same', and out_padding = 2*((filter_size-1)//2), I think that is Keras' intended behaviour.
If stride is not 1, then you will have to calculate carefully how many output paddings are to be added.
In Conv2D out_size = floor(in_size+2*padding_size-filter_size)/stride+1)
If we choose padding = 'same', Keras will automatically set padding = (filter_size-1)/2; whilst if we choose 'valid', padding_size will be set 0, which is the convention of any N-D convolutions.
Conversely, in Con2DTranspose out_size = (in_size-1)*stride+filter_size-2*padding_size
where padding_size refers to how many pixels will actually be padded caused by 'padding' option and out_padding together. Based upon the discussion above, there is no 'full' option on tensorflow, we will have to use out_padding to resume the input size of its corresponding Con2D.
Could you try and see if it works properly and let me know, please?
So in summary, I think out_padding is used for facilitating different backends.
When output_padding=None, Keras uses the deconv_output_length method to compute the output length, which sets it to:
if padding == 'valid':
length = input_length * stride + max(filter_size - stride, 0)
elif padding == 'same':
length = input_length * stride
Now in the documentation it says that if output_padding is set, the output length will be
((input_length - 1) * stride + filter_size - 2 * padding + output_padding
So using this we can figure out what the default output_padding is.
In the padding='valid' case, padding = 0 in the above, so solving for output_padding:
output_padding = max(stride - filter_size, 0)
padding='valid'
In this case, padding = 0 in the above, so solving for output_padding:
output_padding = max(stride - filter_size, 0)
and one can check that setting this results in the same as setting it to None
padding = 'same'
This case is much more mysterious, and in fact it seems to be impossible to get the same as output_padding=None by setting it to any integer. For example with strides=2 and kernel_size=2, for an output_padding larger than 1, it gives a warning that the stride must be larger than the output padding. For anything smaller than 1 it gives a warning that the size of out_backprop doesn't match computed. So the only value that works is 1, but this results in a different output shape from None.
In fact it is not implemented by setting output_padding to some default value, it is only used to compute the output shape, which then is used in the convolution method.

Resources