I am using Keras to make a CNN, and I want to visualize the model with plot_model().
When I look at the shapes of the Conv2D layers, there is something I can't figure out.
Let's say my Conv2D layer has kernel size [8 x 8], stride [4 x 4], padding 'same', and I want 16 feature maps.
Input shape to this layer is [None, 3, 160, 320] and output is [None,1,40,16].
'None' is the number of samples, but what are the 1 and 40? I guess 16 is the number of feature maps?
Since I used padding = 'same', shouldn't the output have the same width and height as the input, or isn't that what 'same' means?
Thanks!
Well, since you're using "strides", you'll never have the same shape.
Your convolutional filter (which can be seen as a sliding window) jumps four pixels at each step as it slides.
As a result, your output shape is the input shape divided by 4 (and rounded up).
3/4 rounded up = 1
160/4 = 40
16 is the number of feature maps, indeed.
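For reference, a minimal sketch of the 'same'-padding shape rule (output size = ceil(input size / stride)), which produces exactly those numbers:

import math

def same_output_size(size, stride):
    # With padding='same', Keras/TensorFlow compute ceil(input_size / stride).
    return math.ceil(size / stride)

print(same_output_size(3, 4), same_output_size(160, 4))   # 1 40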
In the leading deep learning libraries, does the filter (a.k.a. kernel or weights) in a convolutional layer also convolve across the "channel" dimension, or does it take all the channels at once?
For example, if the input dimension is (60, 60, 10) (where the last dimension is often referred to as "channels") and the desired number of output channels is 5, can the filter be (5, 5, 5, 5) or should it be (5, 5, 10, 5) instead?
It should be (5, 5, 10, 5). The Conv2d operation is just like Linear if you ignore the spatial dimensions.
From the TensorFlow documentation [link]:
Given an input tensor of shape batch_shape + [in_height, in_width, in_channels] and a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels], this op performs the following:
Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
For each patch, right-multiplies the filter matrix and the image patch vector.
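As a quick check (assuming TensorFlow 2), a (60, 60, 10) input with 5 output channels indeed needs a [5, 5, 10, 5] filter, i.e. the kernel spans all 10 input channels at once:

import tensorflow as tf

x = tf.random.normal([1, 60, 60, 10])   # batch of one (60, 60, 10) input
w = tf.random.normal([5, 5, 10, 5])     # [filter_height, filter_width, in_channels, out_channels]
y = tf.nn.conv2d(x, w, strides=1, padding="VALID")
print(y.shape)                          # (1, 56, 56, 5)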
It takes all channels at once, so 5×5×10×5 should be right.
julia> using Flux
julia> c = Conv((5,5), 10 => 5); # make a layer, 10 channels to 5
julia> c.weight |> summary
"5×5×10×5 Array{Float32, 4}"
julia> c(randn(Float32, 60, 60, 10, 1)) |> summary # check it works
"56×56×5×1 Array{Float32, 4}"
julia> Conv(rand(Float32, (5,5,5,5))) # different weight size
Conv((5, 5), 5 => 5) # 630 parameters
I'm building a many-to-many network in Keras, using an LSTM. I have sequences of varying length (the labels always have the same length as the sequence they describe). To handle the varying length, and after searching other SO posts, I found padding + masking to be the best solution.
This is my model:
So I have n (874) samples of max_len (24) padded sequences with 25 features each. But how do I handle my labels? Do I pad them too?
If I pad them in the same way as my X (with the same special value) I get this:
X_train shape : (873, 24, 25)
y_train shape : (873, 24)
All is fine except I get the following error :
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 24 for '{{node binary_crossentropy/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](Cast_1)' with input shapes: [1,24].
Searching for this error leads to posts about removing return_sequences=True from my LSTM layer, but I don't want that since each of my timesteps is labelled...
And if I don't pad the labels, they can't be converted to a tensor to be used by TensorFlow.
Edit:
Explanatory illustration of the architecture I want to achieve, courtesy of this answer: https://stackoverflow.com/a/52092176/7732923
Problem found:
X_train shape : (873, 24, 25)
y_train shape : (873, 24)
y_train contained 873 samples of length 24, with one label per timestep as I said. But, probably to allow for multi-label classification, each timestep's label must itself be wrapped in a list, so the correct shape for y_train is:
y_train shape : (873, 24, 1)
So it was just a matter of wrapping each label in [] during preprocessing. The architecture is sound and works (now I'm left to determine how well, but that's another beast ahah).
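For anyone hitting the same error, a minimal sketch of that fix (shapes taken from above, dummy labels used here):

import numpy as np

y_train = np.zeros((873, 24))               # one label per timestep, already padded
y_train = np.expand_dims(y_train, axis=-1)  # wrap each label: (873, 24) -> (873, 24, 1)
print(y_train.shape)                        # (873, 24, 1)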
I am trying to use a convolution layer to convolve a grayscale (single layer) image (stored as a numpy array). Here is the code:
conv1 = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = 33)
tensor1 = torch.from_numpy(img_gray)
out_2d_np = conv1(tensor1)
out_2d_np = np.asarray(out_2d_np)
I want my kernel to be 33x33, and the number of output layers should equal the number of input layers, which is 1 since the image's RGB channels have been summed into a single grayscale channel. When out_2d_np = conv1(tensor1) is run, it yields the following runtime error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 1 1 33 33, but got 2-dimensional input of size [246, 248] instead
Any idea on how I can solve this? I specifically want to use the torch.nn.Conv2d() class/function.
Thanks in advance for any help!
PyTorch's Conv2d expects its 2D inputs to actually have 4 dimensions: a mini-batch dimension, a channel dimension, and the two spatial dimensions.
Your input tensor has only the two spatial dimensions; it lacks the mini-batch and channel dimensions. In your case these two dimensions are actually singleton dimensions (dimensions with size = 1).
try:
conv1(tensor1[None, None, ...])
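A minimal runnable sketch of that fix, using a random float32 image of the size from the error message (246 x 248) in place of img_gray:

import numpy as np
import torch

img_gray = np.random.rand(246, 248).astype(np.float32)   # stand-in grayscale image
conv1 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=33)

tensor1 = torch.from_numpy(img_gray)        # shape [246, 248]
out = conv1(tensor1[None, None, ...])       # add batch and channel dims -> [1, 1, 214, 216]
out_2d_np = out.squeeze().detach().numpy()  # back to a 2-D numpy array
print(out_2d_np.shape)                      # (214, 216)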
In the paper 'Fully Convolutional Networks for Semantic Segmentation', the author distinguishes between input stride and output stride in the context of deconvolution.
How do these terms differ from each other?
Input stride is the stride of the filter: how much you shift the filter at each step.
Output stride is really a nominal value. We get a feature map in a CNN after several convolution and max-pooling operations. Say our input image is 224 x 224 and our final feature map is 7 x 7.
Then we say our output stride is 224 / 7 = 32 (roughly, what happened to the image size after downsampling).
This TensorFlow script describes what output stride is and how it is used in an FCN, i.e., for dense prediction:
one uses inputs with spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In this case the feature maps at the ResNet output will have spatial shape [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1] and corners exactly aligned with the input image corners, which greatly facilitates alignment of the features to the image. Using as input [225, 225] images results in [8, 8] feature maps at the output of the last ResNet block.
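As a rough sketch of that arithmetic (feature_map_size is just an illustrative helper, not part of the script):

def feature_map_size(height, width, output_stride=32):
    # Shape rule quoted above:
    # [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1]
    return (height - 1) // output_stride + 1, (width - 1) // output_stride + 1

print(feature_map_size(321, 321))   # (11, 11)
print(feature_map_size(225, 225))   # (8, 8)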
I have a Conv2D layer defined as:
Conv2D(96, kernel_size=(5, 5),
activation='relu',
input_shape=(image_rows, image_cols, 1),
kernel_initializer=initializers.glorot_normal(seed),
bias_initializer=initializers.glorot_uniform(seed),
padding='same',
name='conv_1')
This is the first layer in my network.
Input dimensions are 64 by 160, and the image has 1 channel.
I am trying to visualize the weights from this convolutional layer, but I am not sure how to get them.
Here is how I am doing this now:
1. Call
layer.get_weights()[0]
This returns an array of shape (5, 5, 1, 96). The 1 is because the images are 1-channel.
2. Take the 5 by 5 filters with
layer.get_weights()[0][:,:,:,j][:,:,0]
Very ugly, but I am not sure how to simplify this; any comments are very appreciated.
I am not sure about these 5 by 5 squares. Are they actually filters?
If not could anyone please tell how to correctly grab filters from the model?
I tried to display the weights like so (only the first 25). I have the same question as you: are these the filters, or something else? They don't seem to be the same kind of filters that are derived from deep belief networks or stacked RBMs.
Here are the untrained visualized weights:
and here are the trained weights:
Strangely there is no change after training! If you compare them, they are identical.
And then the DBN RBM filters, layer 1 on top and layer 2 on bottom:
If I set kernel_initializer="ones" then I get filters that look good, but the net loss never decreases, despite many trial-and-error changes:
Here is the code to display the 2D conv weights / filters:
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, Activation

ann = Sequential()
x = Conv2D(filters=64, kernel_size=(5, 5), input_shape=(32, 32, 3))
ann.add(x)
ann.add(Activation("relu"))
...
# Weights for input channel 0: shape (5, 5, 64), one 5x5 kernel per filter.
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(25):               # show the first 25 of the 64 filters
    plt.subplot(5, 5, i + 1)
    plt.imshow(x1w[:, :, i], interpolation="nearest", cmap="gray")
plt.show()

ann.fit(Xtrain, ytrain_indicator, epochs=5, batch_size=32)

# The same 25 filters after training.
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.imshow(x1w[:, :, i], interpolation="nearest", cmap="gray")
plt.show()
---------------------------UPDATE------------------------
So I tried it again with a learning rate of 0.01 instead of 1e-6, and normalized the images between 0 and 1 instead of 0 and 255 by dividing them by 255.0. Now the convolution filters are changing, and the output of the first convolutional filter looks like this:
You'll notice the trained filter has changed (though not by much) with a reasonable learning rate:
Here is image seven of the CIFAR-10 test set:
And here is the output of the first convolution layer:
And if I take the output of the last convolution layer (no dense layers in between) and feed it to an untrained classifier, it performs about the same as classifying the raw images in terms of accuracy; but if I train the convolution layers, the output of the last convolution layer increases the accuracy of the classifier (a random forest).
So I would conclude that the convolution layers are indeed filters as well as weights.
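A rough sketch of that comparison (not the exact code; it reuses ann, Xtrain and ytrain_indicator from the snippet above, and which layer counts as "last" depends on the elided part of the model, so ann.layers[-1] is an assumption):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from keras.models import Model

# Use the (assumed) last conv/activation layer's output as features.
feature_extractor = Model(inputs=ann.input, outputs=ann.layers[-1].output)
features = feature_extractor.predict(Xtrain).reshape(len(Xtrain), -1)
labels = np.argmax(ytrain_indicator, axis=1)   # back from one-hot labels

rf = RandomForestClassifier(n_estimators=100)
rf.fit(features, labels)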
In layer.get_weights()[0][:,:,:,:], the dimensions in [:,:,:,:] are, respectively: the x position of the weight, the y position of the weight, the n-th input (channel) to that conv layer (coming from the previous layer; note that if you get the weights of the first conv layer this number is 1, because only one channel is driven into it), and the k-th filter (kernel) of the layer. So the array shape returned by layer.get_weights()[0] can be interpreted as: only one input channel is driven into the layer, and 96 filters of size 5x5 are generated. If you want to reach one of the filters, say the one at index 6, you can type
print(layer.get_weights()[0][:,:,:,6].squeeze()).
However, if you need the filters of the 2nd conv layer (see the attached model image), note that for each of its 32 inputs (channels) there are 64 filters. If you want the weights of any of them, for example the weights of the 4th filter for the 8th input channel, you should type
print(layer.get_weights()[0][:,:,8,4].squeeze()).
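For reference, a minimal sketch (assuming plain Keras and the layer definition from the question) that reproduces the shapes discussed above:

from keras.models import Sequential
from keras.layers import Conv2D

# Rebuild just the first layer from the question: 96 filters of size 5x5
# on a 1-channel 64x160 input.
model = Sequential([Conv2D(96, kernel_size=(5, 5), padding='same',
                           input_shape=(64, 160, 1), name='conv_1')])
w = model.get_layer('conv_1').get_weights()[0]
print(w.shape)                         # (5, 5, 1, 96)
print(w[:, :, :, 6].squeeze().shape)   # (5, 5) -- a single filter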