I am training a neural network with 3 convolutional layers and 1 fully connected layer which further connected to a regression layer. My training data set if TID: for image quality estimation task. I am pre-processing images by doing local contrast normalization on a small 6 * 6 window, which results in pixel values ranging from -4 to +4 approximately. I am using RELU activation function in all the layers.
The problem is that the gradient values for different layers produced by my network are very small, somewhere around 2e-6 to 2e-7, which I guess are not ideal for good convergence as it is not producing expected results. I have already tried altering initialization of network weights and biases, changing learning rates, employing L1 and L2 regularization, etc., but nothing seems to be help elevate this problem.
So my first question is that is it actually a problem? or is a common thing to have such small gradient values in convolutional network? If it is a problem then what could an appropriate solution for it?
Related
To implement the CNN model for classification images we need to use sigmoid and relu function. but I am confused what is the use of these.
If you are working with a conventional CNN for image classification, the output layer has N neurons, where N is the number of image classes you want to identify. You want each output neuron to represent the probability that you have observed each image class. The sigmoid function is good for representing a probability. Its domain is all real numbers, but its range is 0 to 1.
For network layers that are not output layers, you could also use the sigmoid. In theory, any non-linear transfer function will work in the inner layers of a neural network. However, there are practical reasons not to use the sigmoid. Some of those reasons are:
Sigmoid requires a fair amount of computation.
The slope of the sigmoid function is very shallow when the input is
far from zero, which slows gradient descent learning down.
Modern neural networks have many layers, and if you have several
layers in a neural network with sigmoid functions between them, it's
quite possible to end up with a zero learning rate.
The ReLU function solves many of sigmoid's problems. It is easy and fast to compute. Whenever the input is positive, ReLU has a slope of -1, which provides a strong gradient to descend. ReLU is not limited to the range 0-1, though, so if you used it it your output layer, it would not be guaranteed to be able to represent a probability.
I applied batch normalization technique to increase the accuracy of my cnn model.The accuracy of model without batch Normalization was only 46 % but after applying batch normalization it crossed 83% but a here arisen a bif overfitting problem that the model was giving validation Accuracy only 15%. Also please tell me how to decide no of filters strides in convolution layer and no of units in dence layer
Batch normalization has been shown to help in many cases but is not always optimal. I found that it depends where it resides in your model architecture and what you are trying to achieve. I have done a lot with different GAN CNNs and found that often BN is not needed and can even degrade performance. It's purpose is to help the model generalize faster but sometimes it increases training times. If I am trying to replicate images, I skip BN entirely. I don't understand what you mean with regards to the accuracy. Do you mean it achieved 83% accuracy with the training data but dropped to 15% accuracy on the validation data? What was the validation accuracy without the BN? In general, the validation accuracy is the more important metric. If you have a high training accuracy and a low validation accuracy, you are indeed overfitting. If you have several convolution layers, you may want to apply BN after each. If you still over-fit, try increasing your strides and kernel size. If that doesn't work you might need to look at the data again and make sure you have enough and that it is somewhat diverse. Assuming you are working with image data, are you creating samples where you rotate your images, crop them, etc. Consider synthetic data to augment your real data to help combat overfiiting.
I am new to deep learning and I want to build an image classifier using CNN(keras). I have built a model with 2 convolution layers (filters = 32 , kernel = 3x3) followed by a MaxPooling layer(2x2) and this repeated 2 times. Finally 2 fully connected layers. I am getting an accuracy of 50%. My question is how do we choose the model to begin with. Like how do we decide that there should be 2 convolution layers followed by a MaxPooling layer or 1 convolution and 1 MaxPooling layer. Also how do we choose the number of filters in each convolution layer and the kernel size.
If my model is not working then how to decide what changes to be made to the model .
model = Sequential()
model.add(Convolution2D(32,3,3,input_shape=
(280,280,3),activation='relu'))
model.add(Convolution2D(32,3,3,activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
#model.add(Dropout(0.25))
model.add(Convolution2D(64,3,3,activation='relu'))
model.add(Convolution2D(64,3,3,activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
#model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(output_dim=256 , activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(output_dim=5,activation='softmax'))
I am getting an accuracy of 50% after 5 epochs. What changes should i make in my model?
Let us first start with the more straightforward part. Knowing the number of input and output layers and the number of their neurons is the easiest part. Every network has a single input layer and a single output layer. The number of neurons in the input layer equals the number of input variables in the data being processed. The number of neurons in the output layer equals the number of outputs associated with each input.
But the challenge is knowing the number of hidden layers and their neurons.
The answer is you cannot analytically calculate the number of layers or the number of nodes to use per layer in an artificial neural network to address a specific real-world predictive modeling problem.
The number of layers and the number of nodes in each layer are model hyperparameters that you must specify and learn.
You must discover the answer using a robust test harness and controlled experiments. Regardless of the heuristics, you might encounter, all answers will come back to the need for careful experimentation to see what works best for your specific dataset.
Again the filter size is one such hyperparameter you should specify before training your network.
For an image recognition problem, if you think that a big amount of pixels are necessary for the network to recognize the object you will use large filters (as 11x11 or 9x9). If you think what differentiates objects are some small and local features you should use small filters (3x3 or 5x5).
These are some tips but do not exist any rules.
There are many tricks to increase the accuracy of your deep learning model. Kindly refer to this link Improve deep learning model performance.
Hope this will help you.
I have a question and I am not sure if it's a smart one. But I've been reading quite a lot about convolution neural networks. And so far I understand that the output layer could for example be a softmax layer for a classification problem or you could do regression in order to get a quantitative value. But I was wondering if it is possible to infer more than one parameter. For example, if I have a data and my output label is both price of the house and size of the house. I know it is not a smart example. But I just want to know if it's possible to predict two different output values in the same output layer in the convolution neural network. Or do I need to have two different convolution neural network where one predicts the size of the house and the one predicts price of the house. And how can we combine these two predictions then. And if we can do it in one convolution neural network, then how can we do that?
In your mentioned cases, the output layer is most likely a dense layer, not a convolutional one. But that's beside the point, if you want multiple outputs, then multiple output layers are often trained. So the same convolutional network can go to two separate output layers, which can be trained independently. Then you've one neural network, with two outputs. The convolutional part is often received by transfer learning, and are often frozen layers that can no longer be trained. Have a look at the figures of this paper, this shows how it can be done.
Main Problem
I cannot understand the Plot of the weights of a specific layer.
I used a method from no-learn : plot_conv_weights(layer, figsize=(6, 6))
Im using lasagne as my neural-network library.
The plot comes out fine, but I dont know how i should interpret it.
Neural Network Structure
The structure im using :
InputLayer 1x31x31
Conv2DLayer 20x3x3
Conv2DLayer 20x3x3
Conv2DLayer 20x3x3
MaxPool2DLayer 2x2
Conv2DLayer 40x3x3
Conv2DLayer 40x3x3
Conv2DLayer 40x3x3
MaxPool2DLayer 40x2x2
DropoutLayer
DenseLayer 96
DropoutLayer 96
DenseLayer 32
DropoutLayer 32
DenseLayer 1 as sigmoid
Here are the weights of the first 3 Layers :
** About the Images **
So for me, they look random and i cannot interpret them!
However, on Cs231, it says the following :
Conv/FC Filters. The second common strategy is to visualize the
weights. These are usually most interpretable on the first CONV layer
which is looking directly at the raw pixel data, but it is possible to
also show the filter weights deeper in the network. The weights are
useful to visualize because well-trained networks usually display nice
and smooth filters without any noisy patterns. Noisy patterns can be
an indicator of a network that hasn’t been trained for long enough, or
possibly a very low regularization strength that may have led to
overfitting
http://cs231n.github.io/understanding-cnn/
Then why mine are random?
The structure is trained and performs well for its task.
References
http://cs231n.github.io/understanding-cnn/
https://github.com/dnouri/nolearn/blob/master/nolearn/lasagne/visualize.py
Normally when you visualize the weights you want to check 2 things:
That they are smooth and cover a wide range of values, i.e. it's not a bunch of 1's and 0's. That would mean the non-linearity is being saturated.
That they have some kind of structure. Normally you tend to see oriented edges although this is more difficult to see when you have small filters like 3x3.
That being said, your weights do not appear to be saturated, but they indeed seem to be too random.
During training, did the network converge correctly?
I am also surprised at how big your filters are (30x30). Not sure what you are trying to accomplish with that.