Where is the suitable location to use an upsampling layer in a CNN?

I am working on a multitask learning convolutional neural network that performs semantic segmentation and pixel-wise depth estimation. At the decoder output, the feature-map resolution is (240, 240), which I want to upsample to the input resolution of (480, 480). Can anyone help me understand which of the following placements of the upsampling layer could result in better performance? Does the placement of the upsampling layer have a significant impact on the result? If yes, could you please elaborate?
Apply upsampling before the final output layer and use a strided or padded convolution layer to keep the resolution equal to the input image (480, 480)?
Apply upsampling after the final output layer?
I have trained a network in which I used an upsampling layer before the final prediction and then padding in the convolution layer to maintain the desired resolution. I read that upsampling before the final prediction lets the network learn more spatial information and make finer predictions, since it operates on higher-resolution feature maps, but it increases the computational burden.
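For reference, a minimal Keras sketch of the two placements described above; the 64-channel decoder output, the number of classes, and the layer names are assumptions for illustration, not taken from the question:

from tensorflow.keras import layers

num_classes = 5  # assumed number of segmentation classes
# Assumed decoder output: (240, 240, 64) feature maps.
decoder_out = layers.Input(shape=(240, 240, 64))

# Option 1: upsample before the final prediction layers, then padded
# convolutions produce the (480, 480) segmentation and depth outputs.
x = layers.UpSampling2D(size=(2, 2), interpolation="bilinear")(decoder_out)
seg_before = layers.Conv2D(num_classes, 3, padding="same",
                           activation="softmax", name="seg")(x)
depth_before = layers.Conv2D(1, 3, padding="same", name="depth")(x)

# Option 2: predict at (240, 240) and upsample the outputs afterwards.
seg_low = layers.Conv2D(num_classes, 3, padding="same",
                        activation="softmax")(decoder_out)
seg_after = layers.UpSampling2D(size=(2, 2), interpolation="bilinear")(seg_low)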

Related

Conv2D filters and CNN architecture

I am currently an undergraduate working on a CNN model to recognize Telugu characters.
This question has two parts:
I have Telugu character images of shape (32, 32, 1) and I want to train my CNN model to recognize the characters. What should my model architecture be, and how do I decide on the architecture, the number of parameters, and the hidden layers? I know my case is essentially the same as handwritten digit recognition, but I want to know how to decide those parameters. Is there any common practice for building such an architecture?
The operation Conv2D(32, (5,5)) means 32 filters of size 5x5 are applied to the input. My question is: are these filters all the same or different, and if they are different, what kind of filters are initialized and who decides them?
I have searched the internet, but everywhere I go the answer I get is that the Conv2D operation applies filters to the input and performs the convolution operation.
To decide which model architecture is best, you need to experiment. That's the only way. Since you want to classify, a VGG-style architecture would be a good starting point, I believe. You also need to experiment with the number of parameters, as it depends on your problem. You can use Keras Tuner for that: https://keras.io/keras_tuner/
For kernel initialization, as far as I know convolutional layers in Keras use Glorot uniform initialization by default, but you can change that with the kernel_initializer parameter. Long story short, the filters are initialized from a distribution, and as training progresses their values change; that is the learning process. https://keras.io/api/layers/initializers
Edit: I forgot to mention that while I suggest a VGG-style architecture, you should scale it down a lot. Your input shape is small, so if your model is too deep you will overfit very quickly.
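As a small illustration of the kernel_initializer parameter mentioned above (the layer sizes are placeholders, not a recommendation for the Telugu model):

from tensorflow.keras import layers

# Default: Glorot (Xavier) uniform initialization of the 32 filters.
default_conv = layers.Conv2D(32, (5, 5), activation='relu')

# Overriding the default via the kernel_initializer argument.
he_conv = layers.Conv2D(32, (5, 5), activation='relu',
                        kernel_initializer='he_normal')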

Resolution preserving Fully Convolutional Network

I am new to ML and PyTorch and I have the following problem:
I am looking for a fully convolutional network architecture in PyTorch, so that the input is an RGB image (HxWxC, e.g. 480x640x3) and the output is a single-channel image (HxW, e.g. 480x640). In other words, I am looking for a network that preserves the resolution of the input (HxW) and loses the channel dimension. All of the networks I've come across (ResNet, DenseNet, ...) end with a fully connected layer (without any upsampling or deconvolution). This is problematic for two reasons:
I am restricted in the choice of the input size (HxWxC).
It has nothing to do with the output that I expect to get (a single channel image HxW).
What am I missing? Why is there even an FC layer? Why is there no up-sampling or deconvolution layer after feature extraction? Is there any built-in torchvision.models architecture that might suit my requirements? Where can I find such a PyTorch architecture? As I said, I am new to this field, so I don't really like the idea of building such a network from scratch.
Thanks.
You probably came across networks that are used for classification, so they end with a pooling layer and a fully connected layer to produce a fixed number of categorical outputs.
Have a look at Unet
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
Note: the original U-Net implementation uses a lot of tricks.
You can simply downsample and then upsample symmetrically to do the work.
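A minimal sketch of that idea in PyTorch (a toy two-stage encoder-decoder, not the original U-Net; the channel counts are arbitrary):

import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    # Downsample twice, upsample twice, end with a single-channel prediction.
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                    # H/2 x W/2
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                    # H/4 x W/4
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(), # H/2 x W/2
            nn.ConvTranspose2d(16, 16, 2, stride=2), nn.ReLU(), # H x W
            nn.Conv2d(16, 1, 1),                                # 1-channel output
        )

    def forward(self, x):             # x: (N, 3, H, W), channels-first
        return self.up(self.down(x))  # output: (N, 1, H, W)

out = TinyFCN()(torch.randn(1, 3, 480, 640))
print(out.shape)  # torch.Size([1, 1, 480, 640])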
Your kind of task belongs to the dense prediction tasks, e.g. segmentation. For those tasks we use fully convolutional networks (see here for the original paper). FCNs don't have any fully connected layers, because applying fully connected layers loses the spatial information you need for dense prediction. Also have a look at the U-Net paper. All state-of-the-art architectures use some kind of encoder-decoder architecture, extended for example with a pyramid pooling module.
There are some implementations in the PyTorch model zoo here. Also search GitHub for PyTorch implementations of other networks.
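For example, a sketch of loading one of the built-in torchvision segmentation models; note that the argument for requesting pretrained weights differs between torchvision versions (older: pretrained=True, newer: weights=...):

import torch
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(num_classes=21)  # FCN with a ResNet-50 backbone
model.eval()

with torch.no_grad():
    out = model(torch.randn(1, 3, 480, 640))["out"]
print(out.shape)  # (1, 21, 480, 640): one score map per class, input resolution preserved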

How should I change the model if accuracy is very low?

I am new to deep learning and I want to build an image classifier using a CNN (Keras). I have built a model with two convolution layers (filters = 32, kernel = 3x3) followed by a MaxPooling layer (2x2), repeated twice, and finally two fully connected layers. I am getting an accuracy of 50%. My question is: how do we choose the model to begin with? How do we decide whether there should be two convolution layers followed by a MaxPooling layer, or one convolution and one MaxPooling layer? Also, how do we choose the number of filters in each convolution layer and the kernel size?
If my model is not working, how do I decide what changes to make to it?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(280, 280, 3), activation='relu'))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
#model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
#model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
I am getting an accuracy of 50% after 5 epochs. What changes should I make to my model?
Let us first start with the more straightforward part. Knowing the number of input and output layers and the number of their neurons is the easiest part. Every network has a single input layer and a single output layer. The number of neurons in the input layer equals the number of input variables in the data being processed. The number of neurons in the output layer equals the number of outputs associated with each input.
But the challenge is knowing the number of hidden layers and their neurons.
The answer is you cannot analytically calculate the number of layers or the number of nodes to use per layer in an artificial neural network to address a specific real-world predictive modeling problem.
The number of layers and the number of nodes in each layer are model hyperparameters that you must specify.
You must discover the answer using a robust test harness and controlled experiments. Regardless of the heuristics you might encounter, all answers come back to the need for careful experimentation to see what works best for your specific dataset.
Again, the filter size is another such hyperparameter that you must specify before training your network.
For an image recognition problem, if you think that a large number of pixels is necessary for the network to recognize the object, use large filters (such as 11x11 or 9x9). If you think what differentiates objects are small, local features, use small filters (3x3 or 5x5).
These are some tips, but there are no fixed rules.
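As a rough illustration of that kind of controlled experiment, one could sweep the kernel size and filter count over a small grid and compare validation accuracy; the grid values and the build_model helper below are placeholders, and x_train, y_train, x_val, y_val are assumed to be already loaded:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_model(filters, kernel_size):
    # Small CNN whose capacity is controlled by two hyperparameters.
    model = Sequential([
        Conv2D(filters, kernel_size, activation='relu', input_shape=(280, 280, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(filters * 2, kernel_size, activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(256, activation='relu'),
        Dense(5, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Try a few settings and keep the one with the best validation accuracy.
for filters in (16, 32, 64):
    for kernel_size in ((3, 3), (5, 5)):
        model = build_model(filters, kernel_size)
        history = model.fit(x_train, y_train, epochs=5,
                            validation_data=(x_val, y_val), verbose=0)
        print(filters, kernel_size, max(history.history['val_accuracy']))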
There are many tricks to increase the accuracy of your deep learning model. Kindly refer to this link Improve deep learning model performance.
Hope this will help you.

keras RNN w/ local support and shared weights

I would like to understand how Keras sets up weights to be shared. Specifically, I would like to use a convolutional 1D layer for processing a time-frequency representation of an audio signal and feed it into an RNN (perhaps a GRU layer) that has:
local support (e.g. like the Conv1D layer with a specified kernel size). Things that are far away in frequency from an output are unlikely to affect the output.
Shared weights, that is I train only a single set of weights across all of the neurons in the RNN layer. Similar inferences should work at lower or higher frequencies.
Essentially, I'm looking for many of the properties that we find in the 2D RNN layers. I've been looking at some of the Keras source code for the convnets to try to understand how weight sharing is implemented, but when I see the weight allocation code in the layer build methods (e.g. in the _Conv class), it's not clear to me how the code is specifying that the weights for each filter are shared. Is this buried in the backend? I see that the backend call is to a specific 1D, 2D, or 3D convolution.
Any pointers in the right direction would be appreciated.
Thank you - Marie
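As a small sketch of the weight-sharing point in the question (not an answer to the RNN part): a Keras Conv1D layer stores a single kernel tensor that is reused at every position, so its parameter count is independent of the input length.

from tensorflow.keras import layers

# One Conv1D layer applied to a (time, features) input.
conv = layers.Conv1D(filters=8, kernel_size=5)
conv.build(input_shape=(None, 1000, 40))  # 1000 frames, 40 channels (assumed sizes)

# Only one kernel of shape (kernel_size, in_channels, filters) plus a bias,
# no matter how many output positions it is slid over.
kernel, bias = conv.weights
print(kernel.shape, bias.shape)  # (5, 40, 8) (8,)
print(conv.count_params())       # 5*40*8 + 8 = 1608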

Difference of filters in convolutional neural network

When creating a convolutional neural network (CNN) (e.g. as described in
https://cs231n.github.io/convolutional-networks/), the input layer is connected to one or several filters, each producing a feature map. Here, each neuron in a filter layer is connected to just a few neurons of the input layer.
In the simplest case, each of my n filters has the same dimensions and uses the same stride.
My (closely related) questions are:
How is it ensured that the filters learn different features, although they are trained on the same patches?
Does the learned feature of a filter depend on the randomly assigned initial values (for weights and biases) when initializing the network?
I'm not an expert, but I can speak a bit to your questions. To be honest, it sounds like you already have the right idea: it's specifically the initial randomization of weights/biases in the filters that fosters their tendencies to learn different features (although I believe randomness in the error backpropagated from higher layers of the network can play a role as well).
As #user2717954 indicated, there is no guarantee that the filters will learn unique features. However, each time the error of a training sample or batch is backpropagated to a given convolutional layer, the weights and biases of each filter are slightly modified to improve the overall accuracy of the network. Since the initial weights and biases are all different in each filter, it's possible (and likely, given a suitable model) for most of the filters to eventually stabilize to values representing a robust set of unique features.
In addition to proper randomization of weights, this also demonstrates why it's crucial to use convolutional layers with an adequate number of filters. Without enough filters, the network is fundamentally limited such that there are important, useful patterns at the given layer of abstraction that simply can't be represented by the network.
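As a quick check of the role of random initialization, one can build two Conv2D layers with identical architecture in Keras and confirm that their freshly initialized filters already differ before any training (the layer sizes here are arbitrary):

import numpy as np
from tensorflow.keras import layers

# Two layers with identical architecture but independent random initialization.
conv_a = layers.Conv2D(8, (3, 3))
conv_b = layers.Conv2D(8, (3, 3))
conv_a.build(input_shape=(None, 32, 32, 1))
conv_b.build(input_shape=(None, 32, 32, 1))

kernel_a = conv_a.kernel.numpy()  # shape (3, 3, 1, 8): 8 distinct 3x3 filters
kernel_b = conv_b.kernel.numpy()

# The filters within one layer differ from each other, and the two layers
# differ from one another, purely because of the random draws.
print(np.allclose(kernel_a[..., 0], kernel_a[..., 1]))  # False
print(np.allclose(kernel_a, kernel_b))                  # False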
