How do we compare the performance of different ConvNets? - python-3.x

I am currently training a net to play a game with a CNN having the following architecture:
model = Sequential()
model.add(Conv2D(100, kernel_size=(2, 2), strides=(2, 2), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(classifications, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
Now I want to add some complexity to the architecture and make the net deeper. How can I tabulate the performance of CNNs of different complexity and ultimately pick the best one for this particular task?
Am I going in the wrong direction? How do I decide the depth of a CNN, and how does depth affect performance on the same dataset?
Thanks in advance (I am new to this site, kindly excuse the immaturity of this post)
Edit: Information about the dataset I am using: the dataset consists of images, and each image has one of 3 possible labels (0, 1, 2) stored in a CSV file, with each row corresponding to one image.

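The simplest thing you can do is build a few different model architectures, train each of them on the same training set and evaluate each of them on the same held-out set. Then compare their accuracies: the one with the highest held-out accuracy should, in theory, be the best-performing model. If you pick the winner this way, it is safer to select on a validation split and keep the test set for one final check.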
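To tabulate such a comparison, something along these lines could work. It is only a sketch: the model names, parameter counts and accuracies are made-up placeholders (not measurements), and rank_models and format_table are hypothetical helpers, not part of Keras.

```python
def rank_models(results):
    """Sort by validation accuracy, best first; on a tie prefer the smaller model."""
    return sorted(results, key=lambda r: (-r["val_acc"], r["params"]))


def format_table(rows):
    """Render the ranked rows as a plain-text table."""
    header = f"{'model':<12}{'params':>12}{'val_acc':>10}"
    lines = [header, "-" * len(header)]
    for r in rows:
        lines.append(f"{r['name']:<12}{r['params']:>12,}{r['val_acc']:>10.3f}")
    return "\n".join(lines)


# Placeholder numbers for illustration only; fill these in from your own runs.
results = [
    {"name": "shallow", "params": 100_000, "val_acc": 0.81},
    {"name": "deep", "params": 900_000, "val_acc": 0.86},
    {"name": "medium", "params": 250_000, "val_acc": 0.86},
]

ranked = rank_models(results)
print(format_table(ranked))
```

Breaking ties in favour of the smaller model is a design choice, not a rule: with equal validation accuracy, the cheaper network is usually the easier one to train and deploy.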
To make the model deeper you can add extra dense or convolutional layers. For example:
changing this:
model.add(Dense(250, activation='relu'))
to this:
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
turns one dense layer into three, i.e. it adds two extra dense layers, making the network deeper.
You can do the same for the convolutional part by duplicating the Conv2D and MaxPooling2D lines (keeping input_shape on the first layer only).
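To get a feeling for what that change costs, you can count the weights by hand. A minimal sketch: the Flatten size of 1000 is a hypothetical value (the real one depends on input_shape), and dense_params and head_params are illustrative helpers.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_params(flat, hidden_sizes, classes):
    """Total parameters of the dense part: Flatten -> hidden layers -> softmax."""
    sizes = [flat] + list(hidden_sizes) + [classes]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


flat = 1000   # hypothetical Flatten() output size, depends on input_shape
classes = 3   # labels 0, 1, 2 from the question

one_layer = head_params(flat, [250], classes)
three_layers = head_params(flat, [250, 250, 250], classes)

# Going from one Dense(250) to three adds two 250 -> 250 layers.
print(three_layers - one_layer)
```

The difference does not depend on the Flatten size, because only the first dense layer is connected to it.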
An alternative to the trial-and-error approach to finding the best architecture and hyperparameters is a search approach such as the one explained in this tutorial, which uses grid search. It will, however, take significantly longer than just trying out a few versions you come up with yourself.

Related

Convolutional neural network for rare event using mel spectrogram

My goal was to create a model that detects a rare sound in audio recordings.
For example, a 2-hour recording could contain only 2 or 3 occurrences of this rare event. I can't tell you more about the exact subject, sorry about that (it's private).
I had to work with several audio files that contained some rare event sounds, and I created my own dataset from them. All audio files are already annotated.
To achieve this, I built the following pipeline:
cut every audio file into 10-second segments and annotate them
convert each segment into a mel spectrogram
save the spectrograms to NumPy files
reload them before training the model
Then I standardize every "image"
and use a common CNN architecture:
model = Sequential()
model.add(Conv2D(128, kernel_size=(5, 5),activation='sigmoid',input_shape=inputShape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (5, 5), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
For every 10-second segment, the model should tell whether it contains the rare event sound or not. So basically it's a binary classification.
Problem: the model has very bad recall and accuracy, and nothing I tried changed this. It gets around 60% accuracy and 10% recall on the training data. I would like to improve this.
What I tried: transfer learning with the VGG16 model, under-sampling (because the data was unbalanced) and data augmentation. I also changed the optimizer, decreased and increased the learning rate, and used different loss functions.
Any ideas will be much appreciated.
In my opinion, MFCC features combined with a sequence network rather than a convolutional network would have been a better approach. For sequential data such as audio, sequence models like RNNs, LSTMs or Transformers tend to be a good fit.
But if you still want to go with the above model: you use 5x5 filters, which are less parameter-efficient than two stacked layers of 3x3 filters covering the same 5x5 receptive field, so reduce the kernel size and see what happens with your network. The number of filters goes up and down through your network (128, 64, 128, 64), whereas it generally follows an increasing sequence as you go deeper. You have also used sigmoid activations throughout the network, which is rarely done because sigmoids saturate and slow down training in deeper stacks. The sigmoid activation should only be on the prediction layer; the rest of the layers should use ReLU. Putting all this together should help your model a lot.
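The 5x5 versus two 3x3 point can be checked with a bit of arithmetic. This is just a sketch: the channel count of 64 is an arbitrary assumption, and conv2d_params and stacked_receptive_field are illustrative helpers, not Keras functions.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a square 2D convolution."""
    return kernel * kernel * c_in * c_out + c_out


def stacked_receptive_field(kernels):
    """Receptive field of stacked stride-1 convolutions without dilation."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf


c = 64  # assumed channel count, identical for input and output

one_5x5 = conv2d_params(5, c, c)
two_3x3 = 2 * conv2d_params(3, c, c)

print(stacked_receptive_field([5]), one_5x5)
print(stacked_receptive_field([3, 3]), two_3x3)
```

Both options see a 5x5 patch of the input, but the stacked version needs fewer weights and gets an extra non-linearity between the two layers.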
Your internship is probably over by now, and I hope you are doing better. Working with audio is really challenging. Some of the challenges are listed below; you should address them and then check the accuracy of the model again.
1. Size of dataset: a rare sound that occurs only 2 or 3 times in 2 hours of audio will definitely give you very few positive samples to train the model. You have done augmentation, but what is the size of each class after augmentation? Please mention it.
2. Generating the mel spectrogram: the parameters used for generating the mel spectrogram are also very important. Here are some sample parameters: n_mels=128, n_fft=512, win_length=400, hop_length=160
3. For the rest, you can follow the previous answer.
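To see what those sample parameters mean for the input "image", here is a rough sketch of the resulting spectrogram shape for one 10-second segment. The 16 kHz sample rate is my assumption (it is not given above), and n_frames is a hypothetical helper that follows the usual frame-count convention for a centered STFT, as used for example by librosa's defaults.

```python
def n_frames(n_samples, hop_length, n_fft, center=True):
    """Number of STFT frames.

    center=True assumes the signal is padded by n_fft // 2 on both sides,
    so every hop position gets a frame.
    """
    if center:
        return 1 + n_samples // hop_length
    return 1 + (n_samples - n_fft) // hop_length


sample_rate = 16_000        # assumption, not stated in the answer
segment_seconds = 10        # segment length from the question
n_mels, n_fft, win_length, hop_length = 128, 512, 400, 160

n_samples = sample_rate * segment_seconds
frames = n_frames(n_samples, hop_length, n_fft)
shape = (n_mels, frames)

print(shape)
print(1000 * win_length / sample_rate, "ms window")
print(1000 * hop_length / sample_rate, "ms hop")
```

Under that sample-rate assumption, win_length=400 and hop_length=160 correspond to the common 25 ms window with a 10 ms hop.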

Multioutput Multiclassification problem with Keras

I have a dataset with images containing one/two/three/... cards.
Since in total I have 52 different cards, I have 52 classes -> thus I have 52 neurons in my output layer.
Training the network with one card per image works well with CNN.
One label would look like this: [0,0,...,1,0,0] for example.
This is the last layer of my network for this task:
model.add(layers.Dense(52, activation='softmax'))
optimizer = keras.optimizers.Adam(lr=0.00001)
model.compile(loss='categorical_crossentropy',metrics=['accuracy'],optimizer=optimizer)
Training my network for two or more cards per image is more challenging for me.
Since one image now contains more than one card, a possible label for this image would look like: [0,1,0,...,1,0,0].
I would start with the same network architecture, but:
I think for this problem I now have to use sigmoid instead of softmax in the last layer (since each class is independent).
For the loss I would simply use something like mse = tf.keras.losses.MeanSquaredError()
For the accuracy metric I am not sure.
model.add(layers.Dense(52, activation='sigmoid'))
adam = keras.optimizers.Adam(lr=0.00001)
model.compile(loss=mse ,metrics=['__?__'],optimizer=adam)
How wrong am I with these settings?
I searched a lot, but confusingly I am not finding any helpful comments. People always suggest using YOLO, but I don't want to detect objects, I only want to classify: for example, the picture contains an ace of hearts and a king of hearts, and where they are doesn't matter.
One more confusion: I read several times that a CNN can only classify single-class problems. Is that true? I hope not, but if it is, why, and how can I still solve my problem using Keras?
Here is the total network:
model = models.Sequential()
model.add(layers.Conv2D(32, (5, 5), activation='relu',input_shape=(500, 500, 3)))
model.add(BatchNormalization())
model.add(layers.MaxPooling2D((4, 4)))
model.add(layers.Conv2D(64, (5, 5), activation='relu'))
model.add(layers.MaxPooling2D((4, 4)))
model.add(BatchNormalization())
model.add(layers.Conv2D(64, (5, 5), activation='relu'))
model.add(layers.MaxPooling2D((3, 3)))
model.add(BatchNormalization())
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(52, activation='softmax'))
"I read several times that CNN can only classify single class problems"
That's false. With a CNN you can train a binary classification problem, a multiclass problem and also a multilabel problem. A multilabel problem is actually what you have.
In a multilabel classification problem you can use [0,1,0,...,1,0,0] as a target output, so for a single input sample multiple classes can be true at the same time. The output of a well-trained network in this case could be [0.01, 0.99, 0.001, ..., 0.89, 0.001, 0.0001]. In effect, you run multiple independent binary classifications in one single network.
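As a small illustration of "independent binary classifications", here is a plain-Python sketch that scores a shortened, 6-class version of the vectors above (the real problem has 52 classes, and the elided middle values are simply left out). binary_cross_entropy and predict_labels are hand-written helpers for illustration; in Keras this setup usually corresponds to a sigmoid output layer trained with loss='binary_crossentropy'.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of the per-label binary cross-entropy terms."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def predict_labels(y_prob, threshold=0.5):
    """Each label is decided on its own, independent of the others."""
    return [1 if p >= threshold else 0 for p in y_prob]


y_true = [0, 1, 0, 1, 0, 0]                       # two cards present
y_prob = [0.01, 0.99, 0.001, 0.89, 0.001, 0.0001]  # sigmoid outputs

pred = predict_labels(y_prob)
loss = binary_cross_entropy(y_true, y_prob)

print(pred)
print(round(loss, 4))
```

The 0.5 threshold is an assumption as well; it can be tuned per class if some cards are harder to recognise than others.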
I will link another very similar question that I answered in more detail. I already addressed the specific metric, activation and loss function which you could use:
multilabel classification

Understanding nn.Sequential in convolutional layers

I am new to PyTorch/Deep learning and I am trying to understand the use of the following line to define a convolutional layer:
self.layer1 = nn.Sequential(nn.Conv1d(input_dim, n_conv_filters, kernel_size=7, padding=0), nn.ReLU(), nn.MaxPool1d(3))
I understand that it adds a 1D convolutional layer to the network, followed by max pooling with a window of 3. However, I don't understand the function of the Sequential module or of ReLU. How do these work in creating a layer?
For reference, the rest of the code can be found here: https://github.com/ArdalanM/nlp-benchmarks/blob/master/src/cnn/net.py
From your description, it seems you are developing a convolutional architecture for some problem. CNNs are most often associated with computer vision, but Conv1d layers like this one are typically applied to sequence data such as text, which matches the linked nlp-benchmarks repository.
As for the code: nn.Sequential is a container that chains the modules you pass to it, so the output of each module becomes the input of the next one, in the order given. Here that means the input goes through Conv1d, then ReLU, then MaxPool1d. ReLU is one of the most widely used activation functions in deep learning; it introduces non-linearity by replacing negative values with zero. Non-linearity lets CNNs represent complex decision boundaries and ultimately helps them perform better.
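To make the "chained in order" idea concrete without PyTorch, here is a toy sketch in plain Python. TinySequential, conv1d_valid, relu and max_pool1d are simplified stand-ins written for this example (single channel, no batch dimension, hand-picked kernel); they are not the real nn modules.

```python
class TinySequential:
    """Toy stand-in for nn.Sequential: calls each step in the order given."""

    def __init__(self, *steps):
        self.steps = steps

    def __call__(self, x):
        for step in self.steps:
            x = step(x)
        return x


def conv1d_valid(xs, kernel):
    """1D convolution without padding (cross-correlation, as in deep learning)."""
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, xs[i:i + k]))
            for i in range(len(xs) - k + 1)]


def relu(xs):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in xs]


def max_pool1d(xs, size=3):
    """Non-overlapping max pooling; an incomplete last window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


kernel = [-1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # hand-picked kernel of size 7
layer1 = TinySequential(lambda xs: conv1d_valid(xs, kernel), relu, max_pool1d)

signal = [float(v) for v in [0, 1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]]
out = layer1(signal)
print(out)
```

With 16 input values, the size-7 convolution without padding leaves 10 values, ReLU zeroes the negative ones, and pooling with a window of 3 leaves 3 values.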
PS: I recommend reading https://towardsdatascience.com/convolutional-neural-network-for-image-classification-with-implementation-on-python-using-pytorch-7b88342c9ca9 to get a better idea from a coder's perspective.
It is a common way of creating a model: with the Sequential class you simply create a linear stack of layers. You can also use a functional API, which allows you to create arbitrary architectures. The two examples below use Keras, but the idea is the same.
Sequential model,
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
Functional API,
input_tensor = layers.Input(shape=(784,))
x = layers.Dense(32, activation='relu')(input_tensor)
output_tensor = layers.Dense(10, activation='softmax')(x)
model = models.Model(inputs=input_tensor, outputs=output_tensor)

Does MaxPooling reduce overfitting?

I have trained the following CNN model on a fairly small data set, so it overfits:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(32, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer=Adam(), metrics=['accuracy'])
The model has a lot of trainable parameters (more than 3 million), which is why I wonder whether I should reduce the number of parameters with an additional MaxPooling, as follows:
Conv - BN - Act - MaxPooling - Conv - BN - Act - MaxPooling - Dropout - Flatten
or with an additional MaxPooling and Dropout, as follows:
Conv - BN - Act - MaxPooling - Dropout - Conv - BN - Act - MaxPooling - Dropout - Flatten
I am trying to understand the full purpose of MaxPooling and whether it can help against overfitting.
Overfitting can happen when your dataset is not large enough to accommodate your number of features.
Max pooling uses a max operation to pool sets of features, leaving you with a smaller number of them.
Therefore, max pooling should logically reduce overfitting.
Dropout reduces reliance on any single feature by ensuring that the feature is not always available. This forces the model to look for different potential hints rather than sticking with just one, which would easily allow the model to overfit on any apparently good hint.
Therefore, this should also help reduce overfitting.
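How much the extra pooling layer shrinks the model can be counted by hand. The sketch below follows the layer list from the question (28x28x1 input, two 3x3 'same' convolutions with 32 filters, Dense(512), Dense(10)); trainable_params is a hypothetical helper, and batch normalization is counted with its two trainable values per channel.

```python
def conv2d_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    return n_in * n_out + n_out


def bn_trainable(channels):
    return 2 * channels  # gamma and beta


def trainable_params(pool_after_first_conv):
    """Return (flatten size, trainable parameters) for the question's model."""
    side = 28
    total = conv2d_params(3, 1, 32) + bn_trainable(32)    # padding='same' keeps 28x28
    if pool_after_first_conv:
        side //= 2                                        # extra MaxPooling2D
    total += conv2d_params(3, 32, 32) + bn_trainable(32)
    side //= 2                                            # existing MaxPooling2D
    flat = side * side * 32
    total += dense_params(flat, 512) + bn_trainable(512)
    total += dense_params(512, 10)
    return flat, total


print(trainable_params(pool_after_first_conv=False))
print(trainable_params(pool_after_first_conv=True))
```

Almost all of the weights sit in the first dense layer, so halving the feature map once more cuts the total to roughly a quarter. Fewer parameters is not a guarantee of less overfitting, though; it only makes it harder for the network to memorise the training set.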
You should NOT rely on max pooling to reduce overfitting. It does have a small effect, but that effect is not enough: max pooling is applied after the convolution, so the features of that layer have already been learned, and pooling only reduces the height and width of the output. The next layer then has fewer positions to convolve over, which has only a LITTLE effect on the overfitting problem and won't solve it.
Using pooling is not the recommended fix for this kind of problem. Here are some tips:
1. Reduce the number of parameters, because it is very hard (not impossible) to find enough data to train 3 million parameters without overfitting.
2. Use regularization techniques like dropout, which is very effective by the way, or L2 regularization, etc.
3. DON'T use max pooling for the purpose of reducing overfitting, because its job is to reduce the size of the representation and to make the network a bit more robust to some features; moreover, using it a lot will make the network more and more invariant to certain kinds of features.
Hope that helps!

Reduce over-fitting in neural network

I have written a small neural network for classifying car and non-car images. I need help with avoiding over-fitting. The model is shown below:
model = Sequential()
model.add(Conv2D(8, 3, 3, input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(16, 3, 3, input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
I am using generators:
generator = ImageDataGenerator(
    featurewise_center=True,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    rotation_range=20.,
    width_shift_range=0.4,
    height_shift_range=0.4,
    shear_range=0.2,
    zoom_range=0.2,
    channel_shift_range=0.1,
    fill_mode='nearest',
    horizontal_flip=True,
    vertical_flip=False,
    rescale=1.2,
    preprocessing_function=None)
Ultimately, training acc is 98% whereas valid acc is 70%. Can you suggest something?
I would suggest trying to reduce the size of the layers, as this may be the reason for the overfitting (too many parameters to train).
For example, the layer model.add(Dense(256)) might be too large. You can try replacing the 256 with something in the range 50-70, see how it works, and continue from there. You may also try decreasing the size or number of convolutional layers.
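A rough way to see how much that saves is to walk the feature-map size through the network and count the dense weights. Everything specific here is an assumption for illustration: a 128x128 input (the real size is X.shape[1:], which is not shown), 3x3 kernels with stride 1 and no padding, and 64 as one value from the suggested 50-70 range. conv_pool_side and head_params are hypothetical helpers.

```python
def conv_pool_side(side, n_blocks, kernel=3, pool=2):
    """Side length after n_blocks of (conv without padding -> max pooling)."""
    for _ in range(n_blocks):
        side = side - kernel + 1   # 'valid' convolution, stride 1
        side = side // pool        # pooling floors odd sizes
    return side


def head_params(flat, hidden):
    """Dense(hidden) followed by the single sigmoid output unit."""
    return (flat * hidden + hidden) + (hidden + 1)


side = conv_pool_side(128, n_blocks=4)   # assumed 128x128 input
flat = side * side * 64                  # 64 filters in the last conv layer

print(flat)
print(head_params(flat, 256))
print(head_params(flat, 64))
```

Under these assumptions the dense part shrinks to about a quarter of its size, which is where most of the model's capacity to memorise sits.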
I can see at least two techniques:
1. Try to increase the dropout.
2. It might be that your overfitting comes from underrepresentation in your training set of certain car patterns that appear in your validation set. You might try changing the train/validation split ratio and check whether the training and validation loss values get closer to each other.
I would comment, but I am too new to the site to comment. I agree with Miriam: overfitting is simply "believing the training data too much". A neural net is essentially a function that outputs a classification (since you are doing classification rather than regression). It means that you have a line, and everything under the line belongs to one class and everything above it to another. By increasing the number of nodes per layer and the total number of layers, you allow your neural net to represent a more complex function. So by adding more layers or nodes you can almost always get a better score on your training set, but not necessarily on other data. Imagine a bunch of points that roughly follow a line without lying directly on it, plus some outliers. The proper function to represent them may be a straight line, but a huge neural network might fit the points perfectly with some crazy complex function. When new points are added, the straight line would give a better classification, because the neural network has fitted your training data too closely. If you are overfitting, I would say the first place to look is the complexity of your neural network.
