Stacking of 2 convolutional layers - keras

In a convolutional layer with n neurons, trained on inputs of dimension h x w x c (height x width x channels), with c usually being 3 (RGB), one trains n x c kernels of size k x k (and n bias values). So for each neuron i in the layer and each channel j of the input, we have a weight matrix of size k x k, which we call weights_ij. The output of each neuron i = 1, ..., n (for input X) is:
out_i = sigma(tmp_i + bias_i)
with tmp_i = sum_{j=1,...,c} conv(X[:, :, j], weights_ij).
The output is then h_new x w_new x n. So basically the depth of the output coincides with the number of neurons in the first layer. h_new and w_new depend on padding and stride in the convolution.
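As a sanity check of this description, a minimal sketch along these lines (toy sizes, padding='valid', stride 1, no activation, and SciPy's correlate2d, since Conv2D actually cross-correlates) reproduces a single-layer Keras model:
import numpy as np
from scipy.signal import correlate2d
from tensorflow.keras.layers import Conv2D, Input
from tensorflow.keras.models import Model

h, w, c, n, k = 8, 8, 3, 4, 3
X = np.random.rand(1, h, w, c).astype("float32")

inp = Input(shape=(h, w, c))
model = Model(inp, Conv2D(n, (k, k), padding="valid", activation=None)(inp))
W, b = model.layers[-1].get_weights()        # W: (k, k, c, n), b: (n,)

out = model.predict(X)[0]                    # (h-k+1, w-k+1, n)
manual = np.zeros_like(out)
for i in range(n):                           # neuron i
    for j in range(c):                       # input channel j
        manual[:, :, i] += correlate2d(X[0, :, :, j], W[:, :, j, i], mode="valid")
    manual[:, :, i] += b[i]

print(np.allclose(out, manual, atol=1e-5))   # True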
This makes sense to me, and I also checked it by coding the convolution and the summation myself (much like the sketch above) and comparing the result with the output of a Keras model consisting of only this one layer. Now my actual question: when we add a second convolutional layer, my understanding was that the output of the first layer is now a "picture" with n channels, and we do exactly the same as before, just with c = n (and a new number n2 of neurons in the 2nd layer).
But I also coded that and compared it with the prediction of a Keras model with 2 convolutional layers, and now the results are not the same. So does anyone know how the 2nd convolutional layer treats the output of the first?

Ok, I solved my problem.
Actually the problem was already present for just one layer and by stacking 2 layers the errors accumulated.
I thought that when using stride=2 in the convolutional layer, one applies the convolution to the sections [0:N_k, 0:N_k], [2:2+N_k, 2:2+N_k], [4:4+N_k, 4:4+N_k], ... of the input, but Keras actually applies the convolution to [1:1+N_k, 1:1+N_k], [3:3+N_k, 3:3+N_k], ...
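If you want to check where the windows actually start for a given stride/padding combination rather than guess, one way is a small probe like the following (toy 1-D sizes; swap in whatever padding and stride you want to test). The kernel just copies the first element of each window, so the output lists the start index of every window Keras used:
import numpy as np
from tensorflow.keras.layers import Conv2D, Input
from tensorflow.keras.models import Model

N_k, stride, width = 3, 2, 8
X = np.arange(width, dtype="float32").reshape(1, 1, width, 1)   # a 1 x 8 "image" holding its own indices

inp = Input(shape=(1, width, 1))
conv = Conv2D(1, (1, N_k), strides=(1, stride), padding="valid", use_bias=False)
model = Model(inp, conv(inp))
conv.set_weights([np.array([1.0, 0.0, 0.0]).reshape(1, N_k, 1, 1)])   # copy window[0]

print(model.predict(X).ravel())   # start index of every window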

Related

How to code Pytorch to fit a different polynomial to every column/row in an image?

Fitting a single polynomial to a bunch of data is pretty easy in Pytorch using an nn.Linear layer. I've included a trivial example at the end of this post. But suppose I have tons of data split into groups, and I want to fit a different polynomial to each group. As an example, find the particular quadratic coefficients that fit each column in this image:
In other words, I want to simultaneously find the coefficients for N polynomials of order n, given m data per set to be fit:
In the image above, there are m=80 points per dataset, and N=100 sets to fit.
This perfectly lends itself to tensor manipulation and Pytorch on a gpu should make this blindingly fast by fitting all N at once. Problem is, I'm having a terrible brain fart, and haven't been able to wrap my head around the right layer configuration. Basically I need N nn.Linear layers, each operating on its own dataset. If this were convolution, I'd use a depthwise layer...
Example network to fit one polynomial where X are the m x p abscissa data, y are the m ordinate data, and we want to find the p coefficients.
import torch

class polyfit(torch.nn.Module):
    def __init__(self, n=2):
        super(polyfit, self).__init__()
        # one linear layer maps the n basis columns to a single output
        self.poly = torch.nn.Linear(n, 1, bias=False)
    def forward(self, x):
        print(x.shape, self.poly)
        return self.poly(x)

model = polyfit(n)
loss = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for epoch in range(100):  # or however I want to run the loops
    output = model(X)
    mse = loss(output, y)
    optimizer.zero_grad()
    mse.backward()
    optimizer.step()
Figured it out after thinking about my depthwise convolution comment. A Conv1d with just 3 parameters, applied to a tensor holding [1, x, x**2], gives a quadratic, the same as a Linear layer with n=3. So the layer needs to be:
self.poly = torch.nn.Conv1d(N,N,n+1,bias=False,groups=N)
Just have to make sure the X and y tensors have the right dimensions, [m, N, n+1] and [m, N, 1] respectively (an order-n polynomial has n+1 coefficients, so the last axis of X has to match the kernel size n+1).
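For reference, here is a minimal sketch of that grouped-Conv1d idea on random placeholder data (the tensor names and sizes are illustrative, not from the original post):
import torch

m, N, order = 80, 100, 2                 # samples per set, number of sets, polynomial order
x = torch.rand(m, N)                     # placeholder abscissa data, one column per set
X = torch.stack([x**p for p in range(order + 1)], dim=-1)   # basis [1, x, x^2]: shape (m, N, order+1)
y = torch.rand(m, N, 1)                  # placeholder ordinates, shape (m, N, 1)

# one grouped Conv1d holds N independent coefficient vectors (one kernel per group)
poly = torch.nn.Conv1d(N, N, kernel_size=order + 1, bias=False, groups=N)
optimizer = torch.optim.SGD(poly.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

for epoch in range(100):
    out = poly(X)                        # (m, N, order+1) -> (m, N, 1): one dot product per set
    mse = loss_fn(out, y)
    optimizer.zero_grad()
    mse.backward()
    optimizer.step()

# the fitted coefficients for set j end up in poly.weight[j, 0, :]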

Keras Not so Dense Layer

The previous layer is an embedding of size (V classes, K output dims). I want to introduce a weight matrix of size K x T. The weights will be trainable (as will the embeddings). Together they generate a V x T matrix that will be used downstream.
1) How might I go about this?
2) Will this mess with the gradients?
It's basically a vector times a matrix.
Example: embedding vocab = 10, dim K = 4. So for a particular member of the vocabulary, my embedding weights are a vector of size (1, 4) (think row vector).
I want to multiply each such row vector by a weight matrix of size 4 x 10, yielding a 1 x 10 vector (or layer). The weight matrix is common to all members of the vocabulary.
This 1 x 10 vector will be input for the next layer.
What you want is a Dense layer, just without a bias. A Dense layer internally holds a matrix that is common to all inputs; it does not vary with the input.
So this can be implemented as:
x = Dense(10, use_bias=False)(some_input_tensor)
No activation function is needed since you just want the matrix multiplication.
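Putting the whole thing together, a minimal sketch with the tf.keras API (vocab V=10, embedding dim K=4, output dim T=10; the names are illustrative):
from tensorflow.keras.layers import Input, Embedding, Dense
from tensorflow.keras.models import Model

token_in = Input(shape=(1,), dtype="int32")
emb = Embedding(input_dim=10, output_dim=4)(token_in)   # (batch, 1, 4): one embedding row per token
out = Dense(10, use_bias=False)(emb)                    # shared 4 x 10 matrix -> (batch, 1, 10)

model = Model(token_in, out)
model.summary()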

Pytorch - meaning of a command in a basic "forward" pass

I am new to Pytorch, and I would be glad if someone could help me understand the following (and correct me if I am wrong) regarding the meaning of the command x.view in the first Pytorch tutorial, and in general regarding the input of convolutional layers versus the input of fully-connected layers:
def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(-1, 16 * 5 * 5)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
As far as I understand, a 256x256 input image is fed to a convolutional layer in its 2D form (i.e. a 256x256 matrix, or 256x256x3 in the case of a color image). However, when we feed an image to a fully-connected linear layer, we first need to reshape the 2D image into a 1D vector (am I right? Is this true in general, or only in Pytorch?). Is this why we use the command "x = x.view(-1, 16 * 5 * 5)" before feeding x into the fully-connected layers?
If the input image x were 3D (e.g. 256x256x256), would the syntax of the forward function given above remain the same?
Thanks a lot in advance
It's from Petteri Nevavuori's lecture notes and shows how a feature map is produced from an image I with a kernel K. With each application of the kernel a dot product is calculated, which effectively is the sum of element-wise multiplications between I and K within a K-sized area of I.
You could say that the kernel looks for diagonal features. It then scans the image and finds a perfectly matching feature in the lower-left corner. Elsewhere the kernel is only able to identify parts of the feature it is looking for. This is why the product is called a feature map: it tells how well the kernel was able to identify its feature at each location of the image it was applied to.
Answer adapted from: https://discuss.pytorch.org/t/convolution-input-and-output-channels/10205/3
Let's say we consider an input image of shape (W x H x 3) where input volume has 3 channels (RGB image). Now we would like to create a ConvLayer for this image.
Each kernel in the ConvLayer will use all input channels of the input volume. Let's assume we would like to use a 3 by 3 kernel. This kernel will have 27 weights and 1 bias parameter, since kernel_width * kernel_height * input_channels = 3 * 3 * 3 = 27 weights.
The number of output channels is the number of different kernels used in the ConvLayer. If we would like to output 64 channels, we need to define ConvLayer such that it uses 64 different 3x3 kernels.
If you check out the documentation of Conv2d, we can define a ConvLayer mimicking above scenario as follows.
nn.Conv2d(3, 64, 3, stride=1)
where in_channels = 3, out_channels = 64 and kernel_size = 3x3. Check the documentation for what stride does.
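As a quick check of the kernel shapes and parameter counts described above (a standalone snippet, not from the original answer):
import torch.nn as nn

conv = nn.Conv2d(3, 64, 3, stride=1)
print(conv.weight.shape)   # torch.Size([64, 3, 3, 3]): 64 kernels, each 3x3 over the 3 input channels
print(conv.bias.shape)     # torch.Size([64]): one bias per output channel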
If you check out the implementation of Linear layer, you would see the underlying mathematical equation that a linear operation mimics is: y = Ax + b.
According to the pytorch documentation of the linear layer, it expects an input of shape (N, *, in_features) and produces an output of shape (N, *, out_features). So, in your case, if the input image x is of shape 256 x 256 x 256 and you want to transform all of the (256*256*256) features into a specific number of features, you can define a linear layer as:
llayer = nn.Linear(256*256*256, num_features)
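To tie this back to the forward pass above, here is a small sketch (toy batch size, reusing the tutorial's 16 * 5 * 5 conv output rather than the huge 256^3 example) of how x.view flattens the feature maps before a Linear layer:
import torch
import torch.nn as nn

x = torch.randn(4, 16, 5, 5)       # a batch of 4 feature maps from the conv/pool stack
fc1 = nn.Linear(16 * 5 * 5, 120)   # as in the tutorial's network

x = x.view(-1, 16 * 5 * 5)         # flatten each sample to a 400-element vector
out = fc1(x)
print(out.shape)                   # torch.Size([4, 120])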

What does Dense do?

What is the meaning of the two Dense in this code?
self.model.add(Flatten())
self.model.add(Dense(512))
self.model.add(Activation('relu'))
self.model.add(Dropout(0.5))
self.model.add(Dense(10))
self.model.add(Activation('softmax'))
self.model.summary()
Dense is the only actual network layer in that model.
A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer.
It's the most basic layer in neural networks.
A Dense(10) has ten neurons. A Dense(512) has 512 neurons.
Furthermore, a Dense layer applies a non-linear transform:
f(W.X + b)
As to the effect: when W is a 2D tensor and X a vector, W.X + b is a vector, and f is an element-wise non-linearity such as tanh, so the result is simply a vector whose size equals the number of neurons.
From the keras docs:
Dense implements the operation: output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
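For instance, a small numpy sketch (illustrative shapes) of what Dense(512) with a relu activation computes on a flattened input:
import numpy as np

x = np.random.rand(784)        # a flattened input vector
W = np.random.rand(784, 512)   # kernel: one column of weights per neuron
b = np.random.rand(512)        # one bias per neuron

relu = lambda v: np.maximum(v, 0)
out = relu(x @ W + b)          # output = activation(dot(input, kernel) + bias)
print(out.shape)               # (512,)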

How to correctly get layer weights from Conv2D in keras?

I have a Conv2D layer defined as:
Conv2D(96, kernel_size=(5, 5),
       activation='relu',
       input_shape=(image_rows, image_cols, 1),
       kernel_initializer=initializers.glorot_normal(seed),
       bias_initializer=initializers.glorot_uniform(seed),
       padding='same',
       name='conv_1')
This is the first layer in my network.
Input dimensions are 64 by 160, image is 1 channel.
I am trying to visualize weights from this convolutional layer but not sure how to get them.
Here is how I am doing this now:
1. Call
layer.get_weights()[0]
This returns an array of shape (5, 5, 1, 96). The 1 is there because the images have 1 channel.
2. Take the 5 by 5 filters with
layer.get_weights()[0][:,:,:,j][:,:,0]
Very ugly, but I am not sure how to simplify this; any comments are appreciated.
I am not sure about these 5 by 5 squares. Are they actually the filters?
If not could anyone please tell how to correctly grab filters from the model?
I tried to display the weights like so (only the first 25). I have the same question that you do: is this the filter, or something else? They don't seem to be the same kind of filters that are derived from deep belief networks or stacked RBMs.
Here is the untrained visualized weights:
and here are the trained weights:
Strangely there is no change after training! If you compare them they are identical.
and then the DBN RBM filters layer 1 on top and layer 2 on bottom:
If I set kernel_initializer="ones" then I get filters that look good, but the net loss never decreases, even after many trial-and-error changes:
Here is the code to display the 2D Conv Weights / Filters.
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, Activation

ann = Sequential()
x = Conv2D(filters=64, kernel_size=(5, 5), input_shape=(32, 32, 3))
ann.add(x)
ann.add(Activation("relu"))
...
# untrained weights: shape (5, 5, 3, 64); keep channel 0 of every 5x5 kernel
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(1, 26):
    plt.subplot(5, 5, i)
    plt.imshow(x1w[:, :, i], interpolation="nearest", cmap="gray")
plt.show()

ann.fit(Xtrain, ytrain_indicator, epochs=5, batch_size=32)

# trained weights, displayed the same way for comparison
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(1, 26):
    plt.subplot(5, 5, i)
    plt.imshow(x1w[:, :, i], interpolation="nearest", cmap="gray")
plt.show()
---------------------------UPDATE------------------------
So I tried it again with a learning rate of 0.01 instead of 1e-6 and used the images normalized between 0 and 1 instead of 0 and 255 by dividing the images by 255.0. Now the convolution filters are changing and the output of the first convolutional filter looks like so:
You'll notice the trained filters have changed (though not by much) with a reasonable learning rate:
Here is image seven of the CIFAR-10 test set:
And here is the output of the first convolution layer:
And if I take the output of the last convolution layer (no dense layers in between) and feed it to a classifier while the network is untrained, it performs about as well as classifying the raw images. But if I train the convolution layers, the output of the last convolution layer increases the accuracy of the classifier (a random forest).
So I would conclude the convolution layers are indeed filters as well as weights.
In layer.get_weights()[0][:,:,:,:], the dimensions in [:,:,:,:] are: the x position of the weight, the y position of the weight, the n-th input channel to the corresponding conv layer (coming from the previous layer; note that if you obtain the weights of the first conv layer here, this number is 1 because only one channel is driven into it), and the k-th filter or kernel in that layer, respectively. So the array shape returned by layer.get_weights()[0] can be interpreted as: only one input channel is driven into the layer, and 96 filters of size 5x5 are generated. If you want to reach one of the filters, let's say the 6th one, you can type:
print(layer.get_weights()[0][:,:,:,6].squeeze())
However, if you need the filters of the 2nd conv layer (see the model image linked below), then notice that for each of the 32 input channels you will have 64 filters. If you want the weights of any of them, for example the weights of the 4th filter applied to the 8th input channel, you should type:
print(layer.get_weights()[0][:,:,8,4].squeeze())
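As a self-contained illustration (an untrained, made-up model matching the shapes in the question), pulling one filter out of the weights array looks like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential([Conv2D(96, (5, 5), padding='same', input_shape=(64, 160, 1), name='conv_1')])
weights = model.get_layer('conv_1').get_weights()[0]   # shape (5, 5, 1, 96)

sixth = weights[:, :, :, 6].squeeze()                  # the 5x5 kernel of filter index 6
print(weights.shape, sixth.shape)                      # (5, 5, 1, 96) (5, 5)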

Resources