Does MaxPooling reduce overfitting? - conv-neural-network

I have trained the following CNN model on a small data set, so it overfits:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(32, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer=Adam(), metrics=['accuracy'])
The model has a lot of trainable parameters (more than 3 million), which is why I wonder whether I should reduce the number of parameters with an additional MaxPooling layer, as follows?
Conv - BN - Act - MaxPooling - Conv - BN - Act - MaxPooling - Dropout - Flatten
or with an additional MaxPooling and Dropout, as follows?
Conv - BN - Act - MaxPooling - Dropout - Conv - BN - Act - MaxPooling - Dropout - Flatten
I am trying to understand the full sense of MaxPooling and whether it can help against overfitting.

Overfitting can happen when your dataset is not large enough to accommodate your number of features.
Max pooling uses a max operation to pool sets of features, leaving you with a smaller number of them.
Therefore, max pooling should logically reduce overfitting.
Dropout reduces reliance on any single feature by ensuring that the feature is not always available. This forces the model to look for different potential hints rather than sticking with one, which would easily allow the model to overfit on any apparently good hint.
Therefore, this should also help reduce overfitting.
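To make the max operation concrete, here is a minimal sketch of 2x2 max pooling with stride 2 in plain Python. The function name max_pool_2x2 and the sample values are made up for illustration; it only shows how each 2x2 window collapses to its largest value, so a 4x4 feature map becomes 2x2.

```python
def max_pool_2x2(fmap):
    """Keep the largest value of every non-overlapping 2x2 window."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Hypothetical 4x4 feature map (one channel).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [3, 7]]
```

Sixteen values go in and four come out, which is why any layer that follows the pooling has fewer inputs to connect to.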

You should not use max pooling in order to reduce overfitting. It does have a small effect, but that effect is not enough: max pooling is applied after the convolution, so the features in that layer are already learned. Because pooling reduces the height and width of the output, the next layer has fewer positions to convolve over, which has only a little effect on the overfitting problem and won't solve it.
In fact, pooling is not recommended as a fix for this kind of problem. Here are some tips:
1. Reduce the number of parameters, because it is very hard (though not impossible) to find enough data to train 3 million parameters without overfitting.
2. Use regularization techniques such as dropout, which is very effective, or L2 regularization.
3. Don't use max pooling for the purpose of reducing overfitting. It is meant to reduce the size of the representation and to make the network a bit more robust to some features; using it too much makes the network more and more robust to certain kinds of features.
Hope that helps!
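For the first tip, it helps to see where the 3 million parameters in the question come from. The rough count below assumes the 28x28 input and padding='same' from the question, so only the pooling layers shrink the feature map. The helper names and the alternative Dense size of 128 are my own, just for illustration.

```python
def flatten_size(side, channels, n_pools):
    """Number of values after Flatten, given n_pools 2x2 pooling layers."""
    for _ in range(n_pools):
        side //= 2
    return side * side * channels


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Model from the question: one pooling layer, then Dense(512).
one_pool = dense_params(flatten_size(28, 32, n_pools=1), 512)

# Same model with a second pooling layer before Flatten.
two_pools = dense_params(flatten_size(28, 32, n_pools=2), 512)

# Same model with one pooling layer but a smaller Dense(128) (hypothetical size).
smaller_dense = dense_params(flatten_size(28, 32, n_pools=1), 128)

print(one_pool)       # 3211776
print(two_pools)      # 803328
print(smaller_dense)  # 802944
```

Almost all of the parameters sit in the first Dense layer, so shrinking that layer cuts the count about as much as an extra pooling layer does, without throwing away spatial resolution.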

Related

Convolutionnal neural network for rare event using mel spectrogram

My goal was to create a model to detect a rare sound in audio recordings.
For example, a 2-hour recording could contain 2 or 3 of these rare events. I can't say more about the exact subject, sorry (it's private).
I had to work with several audio files that contained some rare event sounds, and I created my own dataset from them. All audio files are already annotated.
To achieve this, I built the following pipeline:
cut every audio file into 10-second segments and annotate them
convert each segment into a mel-spectrogram
save them to NumPy files
reload them before training the model
Then I standardize every "image"
And use a common CNN architecture:
model = Sequential()
model.add(Conv2D(128, kernel_size=(5, 5),activation='sigmoid',input_shape=inputShape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (5, 5), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
The model should tell whether a 10-second segment contains the rare event sound or not, so basically it's a binary classification.
Problem: the model has very bad recall and accuracy, and nothing I tried changed this. It's around 60% accuracy and 10% recall on the training data. I would like to improve this.
What I tried: transfer learning with the VGG16 model, under-sampling (because the data was unbalanced), and data augmentation. I also changed the optimizer, decreased and increased the learning rate, and used different loss functions.
Any ideas will be much appreciated.
In my opinion, MFCC features would have been a better approach, and using them with a sequence network rather than a convolutional network would have worked much better. For sequences, sequence models such as RNNs, LSTMs, or Transformers tend to work much better.
But if you still want to go with the above model: you have 5x5 filters, which are less efficient than two stacked layers of 3x3 filters, so try replacing them and see what happens with your network. The number of filters goes up and down through your network, whereas it generally follows an increasing sequence as you go deeper. You have also used sigmoid activations throughout the network, which is generally avoided in hidden layers. The sigmoid activation should only be on the prediction layer; the rest should use ReLU activations. Putting all this together should help your model a lot.
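On the 5x5 versus 3x3 point, a quick parameter count shows the difference. Two stacked 3x3 convolutions (stride 1) cover the same 5x5 receptive field as one 5x5 convolution. The channel count of 64 and the helper name conv_params are assumptions for the sake of the example, not taken from your model.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out


channels = 64  # assumed: same number of input and output channels

one_5x5 = conv_params(5, channels, channels)
two_3x3 = 2 * conv_params(3, channels, channels)

print(one_5x5)  # 102464
print(two_3x3)  # 73856
```

The stacked version has fewer weights and gets an extra non-linearity between the two layers.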
Your internship is probably over by now, and I hope you are doing better. Working on audio is really challenging. The main challenges are listed below; you should work on these and then check the accuracy of the model again.
1. Size of dataset: a rare sound that occurs 2 or 3 times in 2 hours of audio will give very few samples to train the model on. You have done augmentation, but what is the size of each class after augmentation? Please mention it.
2. Generating the mel-spectrogram: the parameters used to generate the mel-spectrogram are also very important. Here are sample parameters: n_mels=128, n_fft=512, win_length=400, hop_length=160
3. For the rest, you can follow the previous answer.
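To see what those sample parameters mean for a 10-second segment, here is a small back-of-the-envelope calculation. The 16 kHz sample rate is my assumption (it is not stated in the question), and the frame count assumes centered, padded framing, which is the default in libraries such as librosa.

```python
sample_rate = 16_000   # assumed sample rate in Hz
segment_seconds = 10   # segment length from the question

n_mels, n_fft, win_length, hop_length = 128, 512, 400, 160

n_samples = sample_rate * segment_seconds

# With centered framing there is one frame per hop, plus one.
n_frames = 1 + n_samples // hop_length

window_ms = 1000 * win_length / sample_rate
hop_ms = 1000 * hop_length / sample_rate
spectrogram_shape = (n_mels, n_frames)

print(window_ms, hop_ms)   # 25.0 10.0
print(spectrogram_shape)   # (128, 1001)
```

So under these assumptions each segment becomes a 128 x 1001 "image" built from 25 ms windows taken every 10 ms. With a different sample rate the same parameters give different window lengths, so they should be adjusted to your recordings.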

Batch Normalization when CNN with only 2 ConvLayer?

I wonder if it is a problem to use BatchNormalization when there are only 2 convolutional layers in a CNN.
Can this have adverse effects on classification performance? I don't mean the training time, but the actual accuracy. Is my network overloaded with unnecessary layers? I want to train the network with a small data set.
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding = 'same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer="Adam", loss='categorical_crossentropy', metrics=['accuracy'])
Many thanks.
Don’t Use With Dropout
Batch normalization offers some regularization effect, reducing generalization error, perhaps no longer requiring the use of dropout for regularization.
Removing Dropout from Modified BN-Inception speeds up training, without increasing overfitting.
— Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015.
Further, it may not be a good idea to use batch normalization and dropout in the same network.
The reason is that the statistics used to normalize the activations of the prior layer may become noisy given the random dropping out of nodes during the dropout procedure.
Batch normalization also sometimes reduces generalization error and allows dropout to be omitted, due to the noise in the estimate of the statistics used to normalize each variable.
— Page 425, Deep Learning, 2016.
Source - machinelearningmastery.com - batch normalization

Dropout & batch normalization - does the ordering of layers matter?

I am building a neural network model, and my question is whether the ordering of the dropout and batch normalization layers actually affects the model.
Will putting the dropout layer before the batch normalization layer (or vice versa) make any difference to the output of the model if I am using the ROC-AUC score as my metric?
I expect the output to have a high ROC-AUC score and want to know whether it will be affected in any way by the ordering of the layers.
The order of the layers affects the convergence of your model and hence your results. The authors of the Batch Normalization paper suggest applying Batch Normalization before the activation function. Since Dropout is applied after computing the activations, the right order of layers is:
Dense or Conv
Batch Normalization
Activation
Dropout
In code using keras, here is how you write it sequentially:
model = Sequential()
model.add(Dense(n_neurons, input_shape=your_input_shape, use_bias=False)) # the bias is redundant when Batch Normalization follows, so it can be disabled
model.add(BatchNormalization())
model.add(Activation('relu')) # for example
model.add(Dropout(rate=0.25))
Batch Normalization helps to avoid vanishing/exploding gradients when training your model, so it is especially important if you have many layers. You can read the Batch Normalization paper for more details.
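About use_bias=False in the code above: the normalization step subtracts the batch mean, so a constant bias added by the previous layer cancels out (BatchNormalization has its own learnable offset instead). Here is a small sketch of that in plain Python. The batch_norm helper below only does the normalization step, and the numbers are made up.

```python
from statistics import fmean, pvariance


def batch_norm(values, eps=1e-5):
    """Normalize one feature over a batch: subtract the mean, divide by the std."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


pre_activations = [0.5, -1.2, 3.0, 0.7]  # hypothetical outputs of one unit over a batch
bias = 2.5                               # hypothetical bias of that unit

without_bias = batch_norm(pre_activations)
with_bias = batch_norm([v + bias for v in pre_activations])

# The largest difference is only floating-point noise.
max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))
print(max_diff < 1e-9)  # True
```

Leaving the bias on does no harm to the result; it just adds parameters that have no effect.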

How do we compare the performance of different ConvNets?

I am currently training a net to play a game with a CNN having the following architecture:
model = Sequential()
model.add(Conv2D(100, kernel_size=(2, 2), strides=(2, 2), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(classifications, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
Now I wish to introduce some complexity in the architecture and make the net deep. How can I tabulate the performance of the CNNs of different complexities and ultimately conclude by giving the best choice for the particular task?
Am I going in the wrong direction? How do I decide the depth of a CNN, and how does it affect performance on the same dataset?
Thanks in advance (I am new to this site, kindly excuse the immaturity of this post)
Edit: Information about the dataset I am using: the dataset consists of images, and each image has 3 possible labels (0, 1, 2) stored in a CSV file, with each row corresponding to one image.
The simplest thing you can do is generate a few different model architectures, train them on a train set and evaluate them on the test set. Then compare their accuracies and the one with the highest accuracy should in theory be the best performing model.
To make the model deeper you can add extra dense or convolutional layers. For example:
changing this:
model.add(Dense(250, activation='relu'))
to this:
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
will add two extra dense layers, making the network deeper.
You can do the same with the convolutional layers by duplicating the Conv2D and MaxPooling2D lines.
The alternative to a 'trial and error' approach to finding the best architecture and hyperparameters is to use a search approach, such as the one explained in this tutorial that uses grid search. It will, however, take significantly longer than just trying out a few versions you come up with yourself.
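As a rough sketch of what a grid search enumerates, the snippet below lists every combination of two hyperparameters. The option values are hypothetical (only the 250 comes from your model), and the training itself is left out: each candidate would be built, trained, and scored on the same held-out data, and the scores tabulated side by side.

```python
from itertools import product

# Hypothetical search space; adjust to your task.
conv_blocks_options = [1, 2, 3]   # number of Conv2D + MaxPooling2D blocks
dense_units_options = [64, 250]   # size of the hidden Dense layer

candidates = [
    {"conv_blocks": blocks, "dense_units": units}
    for blocks, units in product(conv_blocks_options, dense_units_options)
]

for index, config in enumerate(candidates, start=1):
    print(index, config)
```

The number of candidates is the product of the option counts (six here), which is why a grid search gets slow quickly as you add more hyperparameters.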

Reduce over-fitting in neural network

I have written a small neural network for classifying cars and non-cars images. I need help with avoiding over-fitting. The model is shown below:
model = Sequential()
model.add(Conv2D(8, (3, 3), input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(16, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
I am using generators:
generator = ImageDataGenerator( featurewise_center=True,
samplewise_center=False,
featurewise_std_normalization=False,
samplewise_std_normalization=False,
zca_whitening=False,
rotation_range=20.,
width_shift_range=0.4,
height_shift_range=0.4,
shear_range=0.2,
zoom_range=0.2,
channel_shift_range=0.1,
fill_mode='nearest',
horizontal_flip=True,
vertical_flip=False,
rescale=1.2,
preprocessing_function=None)
Ultimately, training acc is 98% whereas valid acc is 70%. Can you suggest something?
I would suggest trying to reduce the size of the layers, as having too many parameters to train may be the reason for the overfitting.
For example, the layer model.add(Dense(256)) might be too large. You can try replacing the 256 with something in the range 50-70, see how it works, and continue from there. You may also try to decrease the size or number of convolutional layers.
I can see at least two things to try:
Try to increase the dropout.
It might be that your overfitting comes from under-representation in your training set of certain car patterns that appear in your validation set. You might try to change the train/validation split and check whether the loss values get closer to each other.
I would comment, but I am too new to the site to do so. I agree with Miriam: overfitting simply means "believing the training data too much". A neural net is essentially a function that outputs a classification (since you are doing classification rather than regression). In the simplest case this means you have a line, and everything under the line belongs to one class and everything above it to the other. By increasing the number of nodes per layer and the total number of layers, you allow your neural net to represent a more complex function. So by adding more layers or nodes you will generally get a better score on your training set, but not necessarily on other data. Imagine a bunch of points roughly along a line, not exactly on it, with some outliers. The proper function to represent them might be a straight line, but a huge neural network might fit the points perfectly with some crazy complex function. When new points are added, the straight line would give a better classification, because the neural network has fit your training data too closely. If you are overfitting, I would say the first place to look is the complexity of your neural network.
