CNN: what if change some convolution layers with others - conv-neural-network

Generally at the end of CNN there is a linear(fully connected layer) layers. But I have recently thought what if i change some convolution layers in CNN with linear layers? Is it useful? if yes where? or if no why not? I would really like to listen your opinion.

Related

Conv2D filters and CNN architecture

I am currently pursuing undergraduation, I am working on CNN model to recognize Telegu characters.
This Questions has two parts,
I have a (32,32,1) shape Telegu character images, I want to train my CNN model to recognize the character. So, what should be my model architecture and how to decide the architecture, no of parameters and hidden layers. I know that my case is exactly same as handwritten digit recognition, but I want to know how to decide those parameters. Is there any common practice in building such architecture.
Operation Conv2D (32, (5,5)) means 32 filters of size 5x5 are applied on to the input, my question is are these filters all same or different, if different what kind of filters are initialized and who decides them?
I tried to surf internet but everywhere I go, the answer I get is Conv2D operation applies filters on input and does the convolution operation.
To decide which model architecture would be best, you need to experiment. Thats the only way. As you want to classify, VGG architecture would be a good starting point I believe. You need to experiment with number of parameters as it depends on your problem. You can use Keras Tuner for it: https://keras.io/keras_tuner/
For kernel initialization, as far as I know convolutional layers in Keras uses Glorot Uniform Initialization but you can change that by using kernel_initializer parameter. Long story short, convolutional layers are initialized with a distribution function and as training goes filters change the values inside, which is learning process. https://keras.io/api/layers/initializers
Edit: I forgot to inform you that I suggest VGG architecture but in a way you downsize the models a lot. Your input shape is little so if your model is too much deep, you will overfit really quickly.

Differences in encoder - decoder models between Keras and Pytorch

There seem to be significant, fundamental differences in construction of encoder-decoder models between keras and pytorch. Here is keras' enc-dec blog and here is pytorch's enc-dec blog.
Some differences I noticed are the following:
Keras' model directly feeds input to LSTM layer. Whereas Pytorch uses an embedding layer for both the encoder and decoder.
Pytorch uses an embedding layer with no activation in the encoder but uses relu activation for the embedding layer in the decoder.
Given these observations, my questions are the following:
My understanding is the following, is it correct? The embedding layer is not strictly required but it helps in finding a better and denser representation of the input. It is optional and you can still build a good model without the embedding layer (dependent on the problem). This is why Keras chose not to use it in this particular example. Is this a sound reason or is there more to the story?
Why use an activation for the embedding layer in the decoder but not the encoder?
Why use 'relu' as the activation instead of 'tanh', etc for the embedding layer? What's the intuition here? I've only seen 'relu' applied to data that has spatial relation, not temporal relation.
You have a wrong understanding of encoder-decoder models. First of all, please note Keras and Pytorch are two deep learning frameworks, while encoder-decoder is a type of neural network architecture. So, you need to understand how encoder-decoder works in the first place and then revise their architecture as per your need. Now, let me come back to your questions.
Embedding layer converts one-hot encoding representations into low-dimensional vector representations. For example, we have a sentence I love programming. We want to translate this sentence into German using an encoder-decoder network. So, the first step is to first convert the words in the input sentence into a sequence of vector representations, and this can be done using an embedding layer. Please note, the use of Keras or Pytorch doesn't matter. You can think, how would you give a natural language sentence as input to an LSTM? Obviously, you first need to convert them into vectors.
There is no such rule that you should use an activation layer in the embedding layer for the decoder, but not in the encoder. Remember, activation functions are non-linear functions. So, applying a non-linearity has different consequences but it has nothing to do with the encoder-decoder framework.
Again, the choice of activation function depends on other factors, not on encoder or decoder or a specific type of neural network architecture. I suggest you read the characteristics of the popular activation functions that are used in neural networks. Also, do not come into conclusions after observing a few use cases. Such conclusions are dangerous.

What is the difference between density of input layer and input_dim in Keras library?

I have a question about the terms of MLP in Keras.
what does the density of a layer mean?
is it the same as the number of neurons? if it is, so what's the role of input_dim?
I have never head of the "density" of a layer in the context of vanilla feed forward networks. I would assume it refers to the number of neurons, but really it depends on context.
Input layer with a certain dimension and the first hidden layer with input_dim argument are both equivalent ways to handle input in Keras.

Resolution preserving Fully Convolutional Network

I am new to ML and Pytorch and I have the following problem:
I am looking for a Fully Convolutional Network architecture in Pytorch, so that the input would be an RGB image (HxWxC or 480x640x3) and the output would be a single channel image (HxW or 480x640). In other words, I am looking for a network that will preserve the resolution of the input (HxW), and will loose the channel dimension. All of the networks that I've came across (ResNet, Densenet, ...) end with a fully connected layer (without any upsampling or deconvolution). This is problematic for two reasons:
I am restricted with the choice of the input size (HxWxC).
It has nothing to do with the output that I expect to get (a single channel image HxW).
What am I missing? Why is there even a FC layer? Why is there no up-sampling, or some deconvolution layers after feature extraction? Is there any build-in torchvision.model that might suit my requirements? Where can I find such pytorch architecture? As I said, I am new in this field so I don't really like the idea of building such a network from scratch.
Thanks.
You probably came across the networks that are used in classification. So they end up with a pooling and a fully connected layer to produce a fixed number of categorical output.
Have a look at Unet
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
Note: the original unet implementation use a lot of tricks.
You can simply downsample and then upsample symmetrically to do the work.
Your kind of task belongs to dense classification tasks, e.g. segmentation. In those tasks, we use fully convolution nets (see here for the original paper). In the FCNs you don't have any fully-connected layers, because when applying fully-connected layers you lose spatial information which you need for the dense prediction. Also have a look at the U-Net paper. All state-of-the art architectures use some kind of encoder-decoder architecture extended for example with a pyramid pooling module.
There are some implementations in the pytorch model zoo here. Search also Github for pytorch implementations for other networks.

How does a convolution kernel get trained in a CNN?

In a CNN, the convolution operation 'convolves' a kernel matrix over an input matrix. Now, I know how a fully connected layer makes use of gradient descent and backpropagation to get trained. But how does the kernel matrix change over time?
There are multiple ways in which the kernel matrix is initialized as mentioned here, in the Keras documentation. However, I am interested to know how it is trained? If it uses backpropagation too, then is there any paper that describes in detail the training process?
This post also raises a similar question, but it is unanswered.
Here you have a well explained post about backpropagation for Convolutional layer. In short, it is also gradient descent just like with FC layer. In fact, you can effectively turn a Convolutional layer into a Fuly Connected layer as explained here.

Resources