When using convolution, one often has recognize big objects in images like elephants or dogs. The first convolutional layers then recognize edges, the last layers focus on more abstract features like legs or body shape.
The task I'm currently working on is different: I have images of the size 512x512 with black/white scans of documents. I don't need to recognize text or some other complex patterns, but rather need to find things like inkblots. The network should just learn if there is anything like that or not.
Usually, I'm using structuring my networks like this:
model = Sequential()
model.add(InputLayer((512, 512, 1))
for size in [32, 64, 128, 256, 512, 512, 512]:
model.add(Dropout(0.1))
model.add(Conv2D(size, (3,3), 'relu'))
model.add(Conv2D(size, (3,3), 'relu'))
model.add(AveragePooling2D())
model.add(Flatten())
model.add(Dense(256, 'relu'))
model.add(Dense(1, 'sigmoid'))
This is the structure I commonly use and tried for this use case, but my results are not overwhelming, which is why I'm questioning my structure. If I only need to recognize the existance of patterns, which can have the size 5x5 or 30x30 tops, should I use a different structure?
Related
My purpose was to create a model to detect rare sound in audios
For example, an audio of 2 hours could contains 2 or 3 of this rare event I can't tell you more about the exact subject so sorry about that (because it's private).
I had to work with several audio file which contained some rare event sound and created my own dataset with it. All audio files are already annotated.
So to achieve this purpose I've done this pipeline:
cutting all audio file into 10 seconds segments and annotated them
compute them into a mel-spectrogram
save them into numpy file
reload them before the model
Then I standardize every "image"
And use a common architecture of CNN:
model = Sequential()
model.add(Conv2D(128, kernel_size=(5, 5),activation='sigmoid',input_shape=inputShape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (5, 5), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
Every step should tell if a segment of 10 second contains the rare event sound or not. So basically it's a binarry classification.
Problem: But the model have a very bad recall and accuracy. Nothing that I tried could change this. It's around 60% accuracy and 10% recall on the train data. I will like to improve this.
What I try: I already tried transfer learning with vgg16 model, under-sampling cause it was unbalanced and data augmentation. Also I changed the optimiser, decrease or increase the learning rate and use different loss functions
Any ideas will be much appreciated.
According to me, MFCC features would have been a better approach and using it with a sequence network rather than a convolutional network would have worked much better. For sequences, sequence networks like LSTMs, RNNs, Transformers etc. would work much better.
But, if you still want to go with the above model, you have 5x5 filters which will be less efficient than 2 layers of 3x3 filters. So, reduce these to see what happens with your network. You have then increased and decreased number of filters throughout your network, the number of filters generally follow an increasing sequence as you go deeper. You have used sigmoid activations throughout your network, which is never done. The sigmoid activation should only be on the prediction layer. Rest of them should have ReLU activations. Putting all this together should help your model a lot.
Although your internship would be over and hope you'd be doing better. Working on audio is really a challenging. Challenges may be as under and you should work/update these and then check accuracy of model.
1. Size of dataset: In this case rare sound which occur 2, 3 times in 2 hours audio, would definitely have very less number of samples to train the model. Although you have done augmentation but after augmentation what is the size of dataset of both classes? plz mention.
2. Generating Mel-Spec: Parameters which you used for generating mel-spec are also very important I am giving you sample parameters : n_mels=128, n_fft=512, win_length=400, hop_length=160
3. Rest you can follow previous answer.
I have a dataset with images containing one/two/three/... cards.
Since in total I have 52 different cards, I have 52 classes -> thus I have 52 neurons in my output layer.
Training the network with one card per image works well with CNN.
One label would look like this: [0,0,...,1,0,0] for example.
This is the last layer of my network for this task:
model.add(layers.Dense(52, activation='softmax'))
optimizer = keras.optimizers.Adam(lr=0.00001)
model.compile(loss='categorical_crossentropy',metrics=['accuracy'],optimizer=optimizer)
Training my network for two or more cards per image is more challenging for me.
Since one image contains now more than one card, a possible label for this image would look like: [0,1,0,...,1,0,0].
I would start with the same network architecture, but:
I think for this problem I have to use now sigmoid instead of softmax (since each class is independent) in the last layer.
For the loss I would simply use something like mse = tf.keras.losses.MeanSquaredError()
For the accuracy I am not sure.
model.add(layers.Dense(52, activation='sigmoid'))
adam = keras.optimizers.Adam(lr=0.00001)
model.compile(loss=mse ,metrics=['__?__'],optimizer=adam)
How wrong am I with these settings?
I searched a lot - but confusingly I am not finding some helpful comments. People give always some hints as using YOLO - but I wont detect objects - I only want to classify: In the picture there is a ace of hearts and a king of hearts for example - where they are doesnt matter.
One more confusion: I red several times that CNN can only classify single class problems - is that true? I hope not - but if it is, why and how can I still solve my problem using keras?
Here is the total network:
model = models.Sequential()
model.add(layers.Conv2D(32, (5, 5), activation='relu',input_shape=(500, 500, 3)))
model.add(BatchNormalization())
model.add(layers.MaxPooling2D((4, 4)))
model.add(layers.Conv2D(64, (5, 5), activation='relu'))
model.add(layers.MaxPooling2D((4, 4)))
model.add(BatchNormalization())
model.add(layers.Conv2D(64, (5, 5), activation='relu'))
model.add(layers.MaxPooling2D((3, 3)))
model.add(BatchNormalization())
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(52, activation='softmax'))
I red several times that CNN can only classify single class problems
That's false. With a CNN you can train a binary classification problem, a multiclass problem and also a multilabel problem. Actually a multilabel problem is what you are looking for.
In a multilabel classification problem you could use [0,1,0,...,1,0,0] as a target output. So for one single input sample multiple classes could be true at the same time! The output of a well trained network in this case could be [0.01, 0.99, 0.001, ..., 0.89, 0.001, 0.0001]. So you can use multiple independent binary classifications in one single network.
I will link another very similar question that I answered in more detail. I already addressed the specific metric, activation and loss function which you could use:
multilabel classification
I am new to PyTorch/Deep learning and I am trying to understand the use of the following line to define a convolutional layer:
self.layer1 = nn.Sequential(nn.Conv1d(input_dim, n_conv_filters, kernel_size=7, padding=0), nn.ReLU(), nn.MaxPool1d(3))
I understand that that it is creating a 1d convolutional layer to the network with max pooling 3 wide. However, I don't understand the function of the sequential module or RelU. How do these function in creating a layer?
For reference, the rest of the code can be found here: https://github.com/ArdalanM/nlp-benchmarks/blob/master/src/cnn/net.py
As per the description provided it seems you are in the process of developing a convolutional architecture for a problem (More likely a Computer Vision one as CNNs are usually targeted for solving CV problems).
Now talking about the code by using Sequential module you are telling the PyTorch that you are developing an architecture that will work in a sequential manner and by specifying ReLU you are bringing the concept of Non-Linearity in the picture (ReLU is one of the widely used activation functions in the Deep learning framework). Non-Linearity helps CNNs to generalize to complex decision boundaries and ultimately helps them to perform better.
PS: I recommend reviewing the https://towardsdatascience.com/convolutional-neural-network-for-image-classification-with-implementation-on-python-using-pytorch-7b88342c9ca9 for getting better idea from a coder perspective.
it is common way of creating model, simply using sequential class u are creating linear stacks of layers. You can also use functional API that allow you to create entirely arbitrary architectures.
Sequential model,
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
Functional API,
input_tensor = layers.Input(shape=(784,))
x = layers.Dense(32, activation='relu')(input_tensor)
output_tensor = layers.Dense(10, activation='softmax')(x)
model = models.Model(inputs=input_tensor, outputs=output_tensor)
I am currently training a net to play a game with a CNN having the following architecture:
model = Sequential()
model.add(Conv2D(100, kernel_size=(2, 2), strides=(2, 2), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(classifications, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
Now I wish to introduce some complexity in the architecture and make the net deep. How can I tabulate the performance of the CNNs of different complexities and ultimately conclude by giving the best choice for the particular task?
Am i going in the wrong direction? How to decide the depth of a CNN and how does it affect the performance on the same dataset?
Thanks in advance (I am new to this site, kindly excuse the immaturity of this post)
Edit: Information about the dataset I am using: dataset consists of images and each image has 3 possible lables (0, 1, 2) stored in a CSV file with each row corresponding to that particular image.
The simplest thing you can do is generate a few different model architectures, train them on a train set and evaluate them on the test set. Then compare their accuracies and the one with the highest accuracy should in theory be the best performing model.
To make the model deeper you can add extra dense or convolutional layers. For example:
changing this:
model.add(Dense(250, activation='relu'))
to this:
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
will add three extra dense layers. Hence making the network deeper.
You can do the same with duplicating the convolutional layers by duplicating the Conv2D and MaxPooling2D lines.
The alternative to 'trial and error' approach to finding the best architecture and hyperparameters is to use a search approach like explained in this tutorial that user grid search. It will, however, take significantly longer than just trying out a few versions you can up with yourself.
I have written a small neural network for classifying cars and non-cars images. I need help with avoiding over-fitting. The model is shown below:
model = Sequential()
model.add(Conv2D(8, 3, 3, input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(16, 3, 3, input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
I am using generators:
generator = ImageDataGenerator( featurewise_center=True,
samplewise_center=False,
featurewise_std_normalization=False,
samplewise_std_normalization=False,
zca_whitening=False,
rotation_range=20.,
width_shift_range=0.4,
height_shift_range=0.4,
shear_range=0.2,
zoom_range=0.2,
channel_shift_range=0.1,
fill_mode='nearest',
horizontal_flip=True,
vertical_flip=False,
rescale=1.2,
preprocessing_function=None)
Ultimately, training acc is 98% whereas valid acc is 70%. Can you suggest something?
I would suggest to try to reduce the size of the layers, as this may be the reason for the overfitting (having too many parameters to train).
For example, this layer model.add(Dense(256)) might be too large. You can try to replace the 256 with something in the range 50-70, see how it works, and continue from there. You may also try to decrease the size\amount of convolutional layers.
So I could see at least two techniques:
Try to increase the dropout.
It might be that your overfit comes from underrepresentation of certain car patterns from your valid set in your training set. You might try to increase the value of train - valid split and check if the loss values are closer to each other.
I would comment, but I am too new to the site to comment. I agree with Miriam, overfitting is simply saying "believing the training data too much". What is happening in a neural net is essentially a function that outputs a classification (since you are doing classification vs regression). It means that you have a line and everything under a line is of a class, and everything above is another. By increasing the nodes/layer and number of layers total, you are allowing your neural net to represent a more complex function. So by adding more layers/nodes you will always get a better score on your training set, but not necessarily on other data. Imagine a bunch of points in a line, but they are not directly on the line. and there are some outliers. maybe the proper function to represent it would be a straight line, but a huge neural network might fit the points perfectly with some crazy complex function. When adding new points, the line would give a better classification since the neural network is trying fitting your training data so closely. If you are overfitting, I would say the first place to look is the complexity of your neural network.