TensorFlow's tf.keras.layers.MaxPool1D has the option to set padding='same' to make the output shape the same as the input shape. Is there something equivalent for torch.nn.MaxPool1d? I see that torch.nn.Conv1d has the option to set padding='same', but this option seems to be missing from maxpool. What is the current workaround for this?
Note that the output of the keras version is only really the same shape as the input whenever you use it with stride and dilation set to 1, so I'll assume the same parameters in this answer.
For any odd kernel size, this is quite easily achievable in PyTorch by setting the padding to (kernel_size - 1) // 2.
For an even kernel size, the two sides of the input need to be padded by different amounts, and this does not seem to be possible in the current implementation of MaxPool1d itself; the usual workaround is to pad the input manually before pooling.
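A minimal sketch of that workaround, assuming stride and dilation of 1 (the helper name maxpool1d_same is made up for illustration): for an odd kernel you could just pass padding=(kernel_size - 1) // 2 to MaxPool1d, but padding asymmetrically with torch.nn.functional.pad covers both cases, using -inf so the padded positions never win the max:

import torch
import torch.nn.functional as F

def maxpool1d_same(x, kernel_size):
    # Emulate Keras' padding='same' for max pooling with stride=1, dilation=1.
    # x has shape (batch, channels, length).
    total_pad = kernel_size - 1
    left = total_pad // 2
    right = total_pad - left
    # Pad with -inf so the padded positions never affect the max.
    x = F.pad(x, (left, right), value=float('-inf'))
    return F.max_pool1d(x, kernel_size, stride=1)

x = torch.randn(8, 3, 50)
print(maxpool1d_same(x, 4).shape)  # torch.Size([8, 3, 50])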
I'm trying to build an autoencoder-like network, but I don't know how to specify the output shape of the network. I have an input of size m×n and a paired expected output of size p×q. I've seen
Calculate the Output size in Convolution layer
Getting the output shape of deconvolution layer using tf.nn.conv2d_transpose in tensorflow
but is there a way to force an output shape without having to work out a bunch of math for every input shape?
I really don't think there is a way to do this (though I will be happy to learn otherwise), since the output shape of a conv layer of any kind is the result of a mathematical operation (convolution) with several parameters. Because of that, the resulting shape has to be one of the possible shapes, given the input tensor and the parameters (stride, kernel size and so on).
This is in contrast to a dense (fully connected) layer, where you can get any output shape you want as long as it's a single number (4, 60 or 5000 - but not (60, 60)).
One small trick that can sometimes help in this kind of situation is to get the shape of the previous layer and print it, so you know what parameters you need for the next layer and can make sure your calculations are correct:
import keras.backend as K
from keras.layers import Conv2D

x = Conv2D(32, (3, 3))(x)  # or any other layer
shape = K.int_shape(x)     # e.g. (None, height, width, 32)
print(shape)
x = Conv2D(64, (3, 3))(x)  # choose these parameters based on the printed shape
Any way to know how many times a convolution kernel is used for one inference in Keras? Of course, this will be a 'high' number because the kernel is applied many times. Any way to know this given the model?
Maybe I'm misunderstanding your question, but isn't that just the height and width of the output shape of your convolutional layer (assuming 2d convolution)? So, if your conv2d output shape is (batch_size, height, width, features) it means that the kernel was applied height*width times, each time generating one new "pixel" with features channels.
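For example, a hedged sketch (the toy model and sizes here are made up purely for illustration) that reads this count off the output shape of a Conv2D layer:

import tensorflow as tf

# Toy model: a single 3x3 'valid' convolution on a 28x28 single-channel input.
inputs = tf.keras.Input(shape=(28, 28, 1))
outputs = tf.keras.layers.Conv2D(8, (3, 3), padding='valid')(inputs)
model = tf.keras.Model(inputs, outputs)

_, h, w, _ = model.output_shape  # (None, 26, 26, 8)
print(h * w)  # 676 positions, i.e. the kernel is applied 676 times per image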
Unfortunately, we don't have any rule of thumb for this; you have to try applying the kernel and see what fits your model.
As far as I know, the input tuple enters the network through the convolution blocks.
So if we want to change the input_tuple shape, modifying the convolutions would make sense.
Why do we need to include_top=False and remove the fully connected layers at the end?
On the other hand, if we have a different number of classes, Keras has an option to change the softmax layer using no_of_classes.
I know that I am the one missing something here. Please help me.
Example: For Inception Resnet V2
input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (299, 299, 3) (with 'channels_last' data format) or (3, 299, 299) (with 'channels_first' data format)). It should have exactly 3 input channels, and width and height should be no smaller than 139. E.g. (150, 150, 3) would be one valid value.
include_top: whether to include the fully-connected layer at the top of the network.
https://keras.io/applications/#inceptionresnetv2
This is simply because the fully connected layers at the end can only take fixed size inputs, which has been previously defined by the input shape and all processing in the convolutional layers. Any change to the input shape will change the shape of the input to the fully connected layers, making the weights incompatible (matrix sizes don't match and cannot be applied).
This is a problem specific to fully connected layers. If you use another layer for classification, such as global average pooling, you would not have this problem.
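For illustration, a sketch of the usual pattern (the 150x150 input size and the 10 classes here are assumptions, not from the question): include_top=False plus global average pooling and a new classification head, which is exactly what makes a different input shape and class count possible:

from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# Assumed values for illustration: a 150x150 RGB input and 10 target classes.
base = InceptionResNetV2(include_top=False, weights='imagenet', input_shape=(150, 150, 3))

x = GlobalAveragePooling2D()(base.output)  # collapses any spatial size to a fixed-length vector
outputs = Dense(10, activation='softmax')(x)
model = Model(base.input, outputs)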
Consider an input layer in keras as:
model.add(layers.Dense(32, input_shape=(784,)))
What this says is that the input is a 2D tensor where axis=0 (the batch dimension) is not specified, while axis=1 is 784. Axis=0 can take any value.
My question is: isn't this style confusing?
Ideally, should it not be
input_shape=(?,784)
This reflects that axis=0 is a wildcard while axis=1 should be 784.
Any particular reason why it is so? Am I missing something here?
The consistency in this case is between the sizes of the layers and the size of the input. In general, the shapes are assumed to represent the nature of the data; in that sense, the batch dimension is not part of the data itself, but rather how you group it for training or evaluation. So, in your code snippet, it is quite clear that you have inputs with 784 features and a first layer producing a vector of 32 features. If you want to explicitly include the batch dimension, you can use instead batch_input_shape=(None, 784) (this is sometimes necessary, for example if you want to give batches of a fixed size but with an additional time dimension of unknown size). This is explained in the Sequential model guide, but also matches the documentation of the Input layer, where you can give a shape or batch_shape parameter (analogous to input_shape or batch_input_shape).
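For example, a small sketch against the classic Keras 2 Sequential API showing that the two spellings describe the same input:

from keras.models import Sequential
from keras.layers import Dense

# Two equivalent ways of declaring the same input; the batch axis is left open.
model_a = Sequential([Dense(32, input_shape=(784,))])
model_b = Sequential([Dense(32, batch_input_shape=(None, 784))])

print(model_a.input_shape)  # (None, 784)
print(model_b.input_shape)  # (None, 784)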
This layer is not really documented very well, and I'm having a bit of trouble figuring out exactly how to use it.
I'm trying something like:
input_img = Input(shape=(1, h, w))
x = Convolution2D(16, 7, 7, activation='relu', border_mode='valid')(input_img)
d = Deconvolution2D(1, 7, 7, (None, 1, 2*h, 2*w))
x = d(x)
but when I try to write d.output_shape, I get the original shape of the image instead of twice that size (which is what I was expecting).
Any help will be greatly appreciated!
Short answer: you need to add subsample=(2,2) to Deconvolution2D if you wish the output to truly be twice as large as the input.
Longer answer: Deconvolution2D is severely undocumented and you have to go through its code to understand how to use it.
First, you must understand how the deconvolution layer works (skip this if you already know all the details). Deconvolution, unlike what its name suggests, is simply applying the back-propagation (gradient calculation) step of a standard convolution layer to the input of the deconvolution layer. The "kernel size" of the deconvolution layer is actually the kernel size of the virtual convolution layer in the backprop step mentioned above. Given the size of a convolution kernel and its stride, it is straightforward to compute the output shape of the convolution layer (assuming no padding it's (input - kernel) // stride + 1), but the reverse is not true. In fact, there can be more than one possible input shape that matches a given output shape of the convolution layer (this is because integer division isn't invertible). This means that for a deconvolution layer, the output shape cannot be determined simply from the input shape (which is implicitly known), the kernel size and the stride - this is why we need to specify the output shape when we initialize the layer. Of course, because of the way the deconvolution layer is defined, for some input shapes you'll get holes in its output which are undefined, and if we forbid these cases then we actually can deduce the output shape.
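A tiny worked example of that non-invertibility: with a kernel of 3 and a stride of 2, inputs of size 7 and 8 both produce an output of size 3, so the output size alone cannot tell you the input size:

def conv_output_size(input_size, kernel, stride):
    # Output size of a 'valid' (no padding) convolution.
    return (input_size - kernel) // stride + 1

print(conv_output_size(7, 3, 2))  # 3
print(conv_output_size(8, 3, 2))  # 3 as well: two different inputs, one output size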
Back to Keras and how the above is implemented. Confusingly, the output_shape parameter is actually not used for determining the output shape of the layer, and instead they try to deduce it from the input, the kernel size and the stride, while assuming only valid output_shapes are supplied (though it's not checked in the code to be the case). The output_shape itself is only used as input to the backprop step. Thus, you must also specify the stride parameter (subsample in Keras) in order to get the desired result (which could've been determined by Keras from the given input shape, output shape and kernel size).
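As an illustration of that last point, here is a small helper (not part of Keras, just the standard transposed-convolution arithmetic with no padding) that computes one valid output size from the input size, kernel size and stride; the 32x32 input size below is only an assumed example:

def deconv_output_size(input_size, kernel, stride):
    # One valid output size for a transposed convolution with no padding:
    # the input size of the 'virtual' convolution in the backprop step.
    return stride * (input_size - 1) + kernel

# If h = w = 32, the 7x7 'valid' convolution above gives a 26x26 feature map;
# a deconvolution with kernel 7 and subsample (stride) 2 can then legitimately
# produce an output of size 57, i.e. roughly twice as large.
print(deconv_output_size(26, 7, 2))  # 57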