Convolutional layer: does the filter also convolve through the channel dimension (nlayers_in), or does it take all the channels at once? - keras

In the leading deep learning libraries, does the filter (aka kernel or weight) in a convolutional layer also convolve across the "channel" dimension, or does it take all the channels at once?
For example, if the input dimension is (60, 60, 10) (where the last dimension is often referred to as "channels") and the desired number of output channels is 5, can the filter be (5, 5, 5, 5), or should it be (5, 5, 10, 5) instead?

It should be (5, 5, 10, 5). The Conv2d operation is just like a Linear layer if you ignore the spatial dimensions.
From the TensorFlow documentation [link]:
Given an input tensor of shape batch_shape + [in_height, in_width, in_channels] and a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels], this op performs the following:
Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
For each patch, right-multiplies the filter matrix and the image patch vector.
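As a quick sanity check in Keras (a small sketch, assuming the TensorFlow 2.x API), the kernel built for a (60, 60, 10) input with 5 output filters spans all 10 input channels:
import tensorflow as tf

# Conv2D with 5 output filters and a 5x5 spatial kernel
layer = tf.keras.layers.Conv2D(filters=5, kernel_size=(5, 5))
layer.build(input_shape=(None, 60, 60, 10))   # (batch, height, width, in_channels)
print(layer.kernel.shape)                     # (5, 5, 10, 5)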

It takes all channels at once, so 5×5×10×5 should be right.
julia> using Flux
julia> c = Conv((5,5), 10 => 5); # make a layer, 10 channels to 5
julia> c.weight |> summary
"5×5×10×5 Array{Float32, 4}"
julia> c(randn(Float32, 60, 60, 10, 1)) |> summary # check it works
"56×56×5×1 Array{Float32, 4}"
julia> Conv(rand(Float32, (5,5,5,5))) # different weight size
Conv((5, 5), 5 => 5) # 630 parameters

Related

PyTorch high-dimensional tensor through linear layer

I have a tensor of size (32, 128, 50) in PyTorch. These are 50-dim word embeddings with a batch size of 32. That is, the three indices in my size correspond to number of batches, maximum sequence length (with 'pad' token), and the size of each embedding. Now, I want to pass this through a linear layer to get an output of size (32, 128, 1). That is, for every word embedding in every sequence, I want to make it one dimensional.
I tried adding a linear layer to my network going from 50 to 1 dimension, and my output tensor is of the desired shape. So I think this works, but I would like to understand how PyTorch deals with this issue, since I did not explicitly tell it which dimension to apply the linear layer to. I played around with this and found that:
If I input a tensor of shape (32, 50, 50) -- thus creating ambiguity by having two dimensions along which the linear layer could be applied to (two 50s) -- it only applies it to the last dim and gives an output tensor of shape (32, 50, 1).
If I input a tensor of shape (32, 50, 128) it does NOT output a tensor of shape (32, 1, 128), but rather gives me an error.
This suggests that a linear layer in PyTorch applies the transformation to the last dimension of your tensor. Is that the case?
In the nn.Linear docs, it is specified that the input of this module can be any tensor of size (*, H_in) and the output will be a tensor of size (*, H_out), where:
* means any number of dimensions
H_in is the number of in_features
H_out is the number of out_features
To make this concrete: a tensor of size (n, m, 50) can be processed by a Linear module with in_features=50, while a tensor of size (n, 50, m) would need a Linear module with in_features=m (in your case 128).
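A minimal sketch illustrating this behaviour, with the shapes taken from the question:
import torch
import torch.nn as nn

x = torch.randn(32, 128, 50)   # (batch, max_seq_len, embedding_dim)
linear = nn.Linear(50, 1)      # in_features must match the last dimension
out = linear(x)                # applied independently to every (batch, position) pair
print(out.shape)               # torch.Size([32, 128, 1])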

PyTorch: Convolving a single channel image using torch.nn.Conv2d

I am trying to use a convolution layer to convolve a grayscale (single layer) image (stored as a numpy array). Here is the code:
conv1 = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = 33)
tensor1 = torch.from_numpy(img_gray)
out_2d_np = conv1(tensor1)
out_2d_np = np.asarray(out_2d_np)
I want my kernel to be 33x33, and the number of output layers should be equal to the number of input layers, which is 1 since the image's RGB channels have been summed. When out_2d_np = conv1(tensor1) is run, it yields the following runtime error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 1 1 33 33, but got 2-dimensional input of size [246, 248] instead
Any idea on how I can solve this? I specifically want to use the torch.nn.Conv2d() class/function.
Thanks in advance for any help!
PyTorch's Conv2d expects its 2D inputs to actually have 4 dimensions: a mini-batch dim, a channel dim, and the two spatial dimensions.
Your input tensor has only the two spatial dimensions; it lacks the mini-batch and channel dimensions. In your case these two dimensions are actually singleton dimensions (dimensions with size=1).
try:
conv1(tensor1[None, None, ...])
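Putting it together, a minimal sketch of the fix (img_gray is replaced here by a placeholder array; the .float() conversion is an extra step you will likely need, since a float64 numpy image does not match Conv2d's float32 weights):
import numpy as np
import torch

img_gray = np.random.rand(246, 248)        # placeholder for the grayscale image
conv1 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=33)

tensor1 = torch.from_numpy(img_gray).float()   # Conv2d expects float32
out = conv1(tensor1[None, None, ...])          # add batch and channel dims
out_2d_np = out.squeeze().detach().numpy()     # back to a 2-D numpy array
print(out_2d_np.shape)                         # (214, 216)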

Weird output for weights/filters in CNN

My task is to visualize the weights in a CNN layer. When I pass the parameters filters = 32 and kernel_size = (3, 3), I expect the output of the .get_weights() function (which extracts weights and biases) to be 32 matrices, each of size 3x3, but instead I am getting a very weirdly nested output.
The output is as follows:
a = model.layers[0].get_weights()
a[0][0][0]
array([[ 2.87332404e-02, -2.80513391e-02,
**... 32 values ...**,
-1.55516148e-01, -1.26494586e-01, -1.36454999e-01,
1.61165968e-02, 7.63138831e-02],
[-5.21791205e-02, 3.13560963e-02, **... 32 values ...**,
-7.63987377e-02, 7.28923678e-02, 8.98564830e-02,
-3.02852653e-02, 4.07049060e-02],
[-7.04478994e-02, 1.33816227e-02,
**... 32 values ...**, -1.99537817e-02,
-1.67200342e-01, 1.15980692e-02]], dtype=float32)
I want to know why I am getting this kind of output, and how I can get the weights in the expected shape. Thanks in advance.
Weights in a neural network are values that represent the connection strength between input nodes and output nodes (or nodes in the next layer).
A Conv2D layer's weights usually have the shape (H, W, I, O), where:
H is kernel height
W is kernel width
I is number of input channels
O is number of output channels
Conv2D weights can be interpreted as the connection strength between a patch of the input channels and the nodes in an output filter/feature map. This way you have weights of shape (H, W) between each input channel and each output channel. It should be noted that the weights are shared among different patches of the same channel.
Consider, for example, convolving an (8, 8, 1) input with a (2, 2) kernel to produce an (8, 8, 1) output. The weights of this layer have shape (2, 2, 1, 1).
The same input can be used to produce 2 feature maps using two (2, 2) filters. The shape of the weights would then be (2, 2, 1, 2).
Hope this clarifies how to interpret the shape of convolutional layer weights.
The shape of the kernel weights from a Conv2D layer is (kernel_size[0], kernel_size[1], n_input_channels, filters). So in your case
a = model.layers[0].get_weights()
print(a[0].shape)
# should print (3,3,z,32) if your input has shape (x, y, z)
If you want to print the weights from one of the filters, you can do
a[0][:,:,:,0]
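If the goal is to plot the 32 filters, one possible sketch (matplotlib and the choice of showing only the first input channel are assumptions, not part of the question):
import matplotlib.pyplot as plt

weights, biases = model.layers[0].get_weights()   # weights: (3, 3, in_channels, 32)
fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(weights[:, :, 0, i], cmap="gray")   # first input channel of filter i
    ax.axis("off")
plt.show()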

Pytorch - meaning of a command in a basic "forward" pass

I am new to PyTorch and would be glad if someone could help me understand the following (and correct me if I am wrong) regarding the meaning of the x.view command in the first PyTorch tutorial, and in general regarding the input of convolutional layers and the input of fully-connected layers:
def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(-1, 16 * 5 * 5)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
As far as I understand, a 256x256 input image is passed to a convolutional layer in its 2D form (i.e. a 256x256 matrix, or 256x256x3 in the case of a color image). Nevertheless, when we pass an image to a fully-connected linear layer, we need to first reshape the 2D image into a 1D vector (am I right? Is this true in general, or only in PyTorch?). Is this why we use the command x = x.view(-1, 16 * 5 * 5) before passing x to the fully-connected layers?
If the input image x were 3D (e.g. 256x256x256), would the syntax of the "forward" function given above remain the same?
Thanks a lot in advance
This is from Petteri Nevavuori's lecture notes and shows how a feature map is produced from an image I with a kernel K. With each application of the kernel a dot product is calculated, which is effectively the sum of element-wise multiplications between I and K in a K-sized area within I.
You could say that the kernel looks for diagonal features. It then searches the image and finds a perfectly matching feature in the lower left corner. Elsewhere the kernel is able to identify only parts of the feature it is looking for. This is why the result is called a feature map: it tells how well a kernel was able to identify a feature at each location of the image it was applied to.
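To make the dot-product view concrete, here is a tiny made-up example (not from the lecture notes): an anti-diagonal kernel produces its largest response exactly where the image patch contains the same anti-diagonal pattern.
import torch
import torch.nn.functional as F

I = torch.tensor([[0., 0., 1.],
                  [0., 1., 0.],
                  [1., 0., 0.]]).reshape(1, 1, 3, 3)   # image with an anti-diagonal
K = torch.tensor([[0., 1.],
                  [1., 0.]]).reshape(1, 1, 2, 2)       # anti-diagonal kernel

feature_map = F.conv2d(I, K)        # each entry is sum(patch * K)
print(feature_map.squeeze())
# tensor([[0., 2.],
#         [2., 0.]])   -> strongest response where the pattern matches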
Answer adapted from: https://discuss.pytorch.org/t/convolution-input-and-output-channels/10205/3
Let's say we consider an input image of shape (W x H x 3), where the input volume has 3 channels (an RGB image). Now we would like to create a ConvLayer for this image.
Each kernel in the ConvLayer will use all input channels of the input volume. Let's assume we would like to use a 3 by 3 kernel. This kernel will have 27 weights and 1 bias parameter, since kernel_height * kernel_width * in_channels = 3 * 3 * 3 = 27.
The number of output channels is the number of different kernels used in the ConvLayer. If we would like to output 64 channels, we need to define ConvLayer such that it uses 64 different 3x3 kernels.
If you check out the documentation of Conv2d, we can define a ConvLayer mimicking above scenario as follows.
nn.Conv2d(3, 64, 3, stride=1)
Where in_channels = 3, out_channels = 64, kernel_size = 3x3. Check out what is stride in the documentation.
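As a quick check (note that PyTorch stores the kernel as (out_channels, in_channels, kH, kW), unlike the Keras (H, W, I, O) convention):
import torch.nn as nn

conv = nn.Conv2d(3, 64, 3, stride=1)
print(conv.weight.shape)   # torch.Size([64, 3, 3, 3]) -> 27 weights per output channel
print(conv.bias.shape)     # torch.Size([64]) -> 1 bias per output channel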
If you check out the implementation of the Linear layer, you will see that the underlying mathematical operation it mimics is y = Ax + b.
According to the PyTorch documentation of the Linear layer, it expects an input of shape (N, *, in_features) and produces an output of shape (N, *, out_features). So, in your case, if the input image x is of shape 256 x 256 x 256 and you want to transform all the 256*256*256 features into a specific number of features, you can define a linear layer as:
llayer = nn.Linear(256*256*256, num_features)
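Coming back to the x.view line from the question, here is a minimal sketch (shapes assumed from the tutorial snippet above) of why the conv output is flattened before the fully-connected layers:
import torch
import torch.nn as nn

x = torch.randn(4, 16, 5, 5)       # pretend output of the second conv + pool
fc1 = nn.Linear(16 * 5 * 5, 120)   # Linear needs a flat feature dimension

x = x.view(-1, 16 * 5 * 5)         # (4, 400): one flat feature vector per sample
print(fc1(x).shape)                # torch.Size([4, 120])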

Does 1D Convolutional layer support variable sequence lengths?

I have a series of processed audio files I am using as input into a CNN using Keras. Does the Keras 1D Convolutional layer support variable sequence lengths? The Keras documentation makes this unclear.
https://keras.io/layers/convolutional/
At the top of the documentation it mentions you can use (None, 128) for variable-length sequences of 128-dimensional vectors. Yet at the bottom it declares that the input shape must be a
3D tensor with shape: (batch_size, steps, input_dim)
Given the following example how should I input sequences of variable length into the network
Let's say I have two examples (a and b), each containing X 1-dimensional vectors of length 100, that I want to feed into the Conv1D layer as input:
a.shape = (100, 100)
b.shape = (200, 100)
Can I use an input shape of (2, None, 100)? Do I need to concatenate these tensors into c where
c.shape = (300, 100)
Then reshape it to something like
c_reshape.shape = (3, 100, 100)
where 3 is the batch size, 100 is the number of steps, and the second 100 is the input size? The documentation on the input shape is not very clear.
Keras supports variable lengths by using None in the respective dimension when defining the model.
Notice that often input_shape refers to the shape without the batch size.
So, the 3D tensor with shape (batch_size, steps, input_dim) suits perfectly a model with input_shape=(steps, input_dim).
All you need to make this model accept variable lengths is use None in the steps dimension:
input_shape=(None, input_dim)
Numpy limitation
Now, there is a numpy limitation about variable lengths. You cannot create a numpy array with a shape that suits variable lengths.
A few solutions are available:
Pad your sequences with dummy values until they all reach the same size so you can put them into a numpy array of shape (batch_size, length, input_dim). Use Masking layers to make the model ignore the dummy values.
Train with separate numpy arrays of shape (1, length, input_dim), each array having its own length.
Group your sequences by size into smaller arrays.
Be careful with layers that don't support variable sizes
In convolutional models using variable sizes you can't, for instance, use Flatten: if it were possible, the result of flattening would have a variable size, and the following Dense layers would not be able to have a fixed number of weights.
So, instead of Flatten, you should use GlobalMaxPooling1D or GlobalAveragePooling1D layers.
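Putting this together, a minimal sketch (assuming the TensorFlow 2.x Keras API and the shapes from the question) of a Conv1D model that accepts variable sequence lengths:
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv1D(32, kernel_size=5, activation="relu",
                        input_shape=(None, 100)),    # steps dimension left variable
    keras.layers.GlobalMaxPooling1D(),               # collapses the variable steps axis
    keras.layers.Dense(1),
])

# numpy can't hold ragged lengths in one array, so feed each sample separately
a = np.random.rand(1, 100, 100).astype("float32")    # 100 steps
b = np.random.rand(1, 200, 100).astype("float32")    # 200 steps
print(model.predict(a).shape, model.predict(b).shape)  # (1, 1) and (1, 1)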
