Does conv2d or any other way by which we can form CNNs support stride, i.e., the amount of moving the filter over the input? the default value of stride for normal 2D convolution is 1 but I would like to change it.
The function T.nnet.conv2d has a parameter subsample which permits you to do this.
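For example, a stride of 2 in both spatial dimensions would look roughly like this (the variable names are just illustrative):

import theano
import theano.tensor as T

# Symbolic 4D tensors: (batch, channels, height, width) for the input
# and (n_filters, channels, filter_h, filter_w) for the filters.
inputs = T.tensor4('inputs')
filters = T.tensor4('filters')

# subsample=(2, 2) moves the filter two pixels at a time in each
# spatial dimension instead of the default stride of 1.
out = T.nnet.conv2d(inputs, filters, border_mode='valid', subsample=(2, 2))

f = theano.function([inputs, filters], out)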
I have a question about the function 'Convolution2D' in keras.
model.add(Convolution2D(
    nb_filter=32,
    nb_row=5,
    nb_col=5,
    border_mode='same',
    input_shape=(1, 28, 28),
))
By doing this, 32 5*5 filters will be used to convolve the input. But only the size of the filters is specified; what do these filters look like? Are they all the same, or do they each contain random numbers?
Each of the 32 filters is convolved over the layer input; they are not copies of each other. Every filter is initialized using the kernel_initializer, which is glorot_uniform by default.
From the documentation on glorot_uniform:
It draws samples from a uniform distribution within [-limit, limit] where limit is sqrt(6 / (fan_in + fan_out)) where fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor
Note that the filter changes during the training of the layer. It is optimized to recognize features that help your model make a correct classification.
I found a good explanation here.
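As a rough sketch of what that initialization means for the layer above (the shapes follow the Keras convention for a 2D convolution kernel; this is an illustration, not the actual Keras code):

import numpy as np

# Kernel shape for the layer above: 5x5 window, 1 input channel, 32 filters.
receptive_field = 5 * 5
fan_in = 1 * receptive_field    # input channels * kernel area
fan_out = 32 * receptive_field  # filters * kernel area

limit = np.sqrt(6.0 / (fan_in + fan_out))

# Every one of the 32 filters starts as independent draws from
# U(-limit, limit), so they are random and different from each other,
# and they keep changing as training updates them.
kernel = np.random.uniform(-limit, limit, size=(5, 5, 1, 32))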
From the documentation, I know SeparableConv2D is a combination of a depthwise and a pointwise operation. However, when I compare

model.add(SeparableConv2D(100, 5, input_shape=(416, 416, 10)))
# total parameters is 1350

with

model.add(DepthwiseConv2D(5, input_shape=(416, 416, 10)))
model.add(Conv2D(100, 1))
# total parameters is 1360

the second version has 10 more parameters. Does it mean SeparableConv2D does not use a bias in the depthwise phase by default?
Thanks.
Correct, checking the source code (I did this for tf.keras but I suppose it is the same for standalone keras) shows that in SeparableConv2D, the separable convolution works using only filters, no biases, and a single bias vector is added at the end. The second version, on the other hand, has biases for both DepthwiseConv2D and Conv2D.
Given that convolution is a linear operation and you are using no non-linearity in between the depthwise and the 1x1 convolution, I would suppose that having two biases is unnecessary in this case, similar to how you don't use biases in a layer that is followed by batch normalization, for example. As such, the extra 10 parameters wouldn't actually improve the model (nor should they really hurt it).
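A quick way to check this yourself is to build the variants and compare their parameter counts; disabling the depthwise bias in the second version with the standard use_bias argument brings it back to 1350 (a sketch, assuming the shapes from the question):

from keras.models import Sequential
from keras.layers import SeparableConv2D, DepthwiseConv2D, Conv2D

# Fused separable convolution: a single bias vector at the end.
m1 = Sequential([SeparableConv2D(100, 5, input_shape=(416, 416, 10))])
print(m1.count_params())  # 5*5*10 + 10*100 + 100 = 1350

# Explicit depthwise + pointwise, each with its own bias.
m2 = Sequential([
    DepthwiseConv2D(5, input_shape=(416, 416, 10)),
    Conv2D(100, 1),
])
print(m2.count_params())  # (5*5*10 + 10) + (10*100 + 100) = 1360

# Dropping the redundant depthwise bias makes the two match.
m3 = Sequential([
    DepthwiseConv2D(5, use_bias=False, input_shape=(416, 416, 10)),
    Conv2D(100, 1),
])
print(m3.count_params())  # 1350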
ResNet50 is here: https://github.com/fchollet/deep-learning-models/blob/master/resnet50.py
In a 'conv_block', the first layer is like this:
x = Conv2D(filters=64,          # number of filters
           kernel_size=(1, 1),  # height/width of filters
           strides=(2, 2)       # stride
           )(input_tensor)
My question is:
Isn't this layer going to miss some pixels?
These 1x1 convolutions only look at one pixel and then move two pixels (stride=2).
This was mentioned in the original ResNet paper:

The convolutional layers mostly have 3×3 filters and follow two simple design rules: (i) for the same output feature map size, the layers have the same number of filters; and (ii) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer. We perform downsampling directly by convolutional layers that have a stride of 2.
So you can think of it as a replacement for a pooling layer, and it also reduces the computational cost of the whole model compared to computing the full activation map and then pooling it.
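As a sketch of that idea (the shapes here are just an example, not taken from ResNet50), a strided convolution produces the downsampled map in one pass, whereas the alternative computes the full-resolution map and then pools it:

from keras.layers import Input, Conv2D, MaxPooling2D
from keras.models import Model

inp = Input(shape=(56, 56, 64))

# Downsampling done by the convolution itself: 28x28 outputs directly.
strided = Conv2D(128, (3, 3), strides=(2, 2), padding='same')(inp)

# The more expensive route: convolve at full 56x56 resolution,
# then discard three quarters of the positions with pooling.
full = Conv2D(128, (3, 3), strides=(1, 1), padding='same')(inp)
pooled = MaxPooling2D((2, 2))(full)

# Both branches end with a 28x28x128 feature map, but the strided
# branch does roughly a quarter of the convolution work.
Model(inp, [strided, pooled]).summary()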
Consider an input layer in keras as:
model.add(layers.Dense(32, input_shape=(784,)))
What this says is that the input is a 2D tensor where axis=0 (the batch dimension) is not specified while axis=1 is 784. Axis=0 can take any value.
My question is: isn't this style confusing?
Ideally, should it not be
input_shape=(?,784)
This would reflect that axis=0 is a wildcard while axis=1 must be 784.
Any particular reason why it is so? Am I missing something here?
The consistency in this case is between the sizes of the layers and the size of the input. In general, the shapes are assumed to represent the nature of the data; in that sense, the batch dimension is not part of the data itself, but rather how you group it for training or evaluation. So, in your code snippet, it is quite clear that you have inputs with 784 features and a first layer producing a vector of 32 features.

If you want to explicitly include the batch dimension, you can use batch_input_shape=(None, 784) instead (this is sometimes necessary, for example if you want to give batches of a fixed size but with an additional time dimension of unknown size). This is explained in the Sequential model guide, but it also matches the documentation of the Input layer, where you can give a shape or batch_shape parameter (analogous to input_shape or batch_input_shape).
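A minimal illustration of the two equivalent declarations, plus the fixed-batch-size variant mentioned above:

from keras.models import Sequential
from keras.layers import Dense

# Shape of a single sample; the batch dimension is implied as None.
m1 = Sequential([Dense(32, input_shape=(784,))])

# Explicitly including the batch dimension gives the same layer.
m2 = Sequential([Dense(32, batch_input_shape=(None, 784))])

# Fixing the batch size is also possible, e.g. for stateful models
# that require a constant batch of 16 samples.
m3 = Sequential([Dense(32, batch_input_shape=(16, 784))])

print(m1.layers[0].input_shape)  # (None, 784)
print(m2.layers[0].input_shape)  # (None, 784)
print(m3.layers[0].input_shape)  # (16, 784)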
This layer is not really documented very well, and I'm having a bit of trouble figuring out exactly how to use it.
I'm trying something like:
input_img = Input(shape=(1, h, w))
x = Convolution2D(16, 7, 7, activation='relu', border_mode='valid')(input_img)
d = Deconvolution2D(1, 7, 7, (None, 1, 2*h, 2*w))
x = d(x)
but when I check d.output_shape, I get the original shape of the image instead of twice that size (which is what I was expecting).
Any help will be greatly appreciated!
Short answer: you need to add subsample=(2,2) to Deconvolution2D if you wish the output to truly be twice as large as the input.
Longer answer: Deconvolution2D is severely undocumented and you have to go through its code to understand how to use it.
First, you must understand how the deconvolution layer works (skip this if you already know all the details). Deconvolution, unlike what its name suggests, is simply applying the back-propagation (gradient calculation) step of a standard convolution layer to the input of the deconvolution layer. The "kernel size" of the deconvolution layer is the kernel size of the virtual convolution layer in that backprop step.

Given the size of a convolution kernel and its stride, it is straightforward to compute the output shape of the convolution layer (assuming no padding it's (input - kernel) // stride + 1), but the reverse is not true. In fact, there can be more than one possible input shape that matches a given output shape of the convolution layer (because integer division isn't invertible). This means that for a deconvolution layer, the output shape cannot be determined directly from the input shape (which is implicitly known), the kernel size and the stride; this is why we need to specify the output shape when we initialize the layer. Of course, because of the way the deconvolution layer is defined, for some input shapes you'll get holes in its output which are undefined, and if we forbid these cases then we actually can deduce the output shape.
Back to Keras and how the above is implemented. Confusingly, the output_shape parameter is actually not used for determining the output shape of the layer; instead, Keras tries to deduce it from the input shape, the kernel size and the stride, while assuming only valid output_shapes are supplied (though this is not checked in the code). The output_shape itself is only used as an input to the backprop step. Thus, you must also specify the stride parameter (subsample in Keras) in order to get the desired result (which could have been determined by Keras from the given input shape, output shape and kernel size).
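Sticking with the code from the question, something along these lines should work (a sketch against the old Keras 1.x API used above; in current Keras this layer is Conv2DTranspose and the stride is its strides argument):

from keras.layers import Input, Convolution2D, Deconvolution2D

h, w = 32, 32  # example spatial size

input_img = Input(shape=(1, h, w))
x = Convolution2D(16, 7, 7, activation='relu', border_mode='valid')(input_img)

# subsample=(2, 2) is the stride of the virtual convolution being
# "undone"; without it Keras deduces an output the same size as the
# layer's input. Note that the output_shape you request still has to
# be one that is actually reachable from the layer's input shape with
# this kernel size and stride.
d = Deconvolution2D(1, 7, 7, output_shape=(None, 1, 2 * h, 2 * w),
                    subsample=(2, 2))
x = d(x)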