I am new to PyTorch/Deep learning and I am trying to understand the use of the following line to define a convolutional layer:
self.layer1 = nn.Sequential(nn.Conv1d(input_dim, n_conv_filters, kernel_size=7, padding=0), nn.ReLU(), nn.MaxPool1d(3))
I understand that that it is creating a 1d convolutional layer to the network with max pooling 3 wide. However, I don't understand the function of the sequential module or RelU. How do these function in creating a layer?
For reference, the rest of the code can be found here: https://github.com/ArdalanM/nlp-benchmarks/blob/master/src/cnn/net.py
As per the description provided it seems you are in the process of developing a convolutional architecture for a problem (More likely a Computer Vision one as CNNs are usually targeted for solving CV problems).
Now talking about the code by using Sequential module you are telling the PyTorch that you are developing an architecture that will work in a sequential manner and by specifying ReLU you are bringing the concept of Non-Linearity in the picture (ReLU is one of the widely used activation functions in the Deep learning framework). Non-Linearity helps CNNs to generalize to complex decision boundaries and ultimately helps them to perform better.
PS: I recommend reviewing the https://towardsdatascience.com/convolutional-neural-network-for-image-classification-with-implementation-on-python-using-pytorch-7b88342c9ca9 for getting better idea from a coder perspective.
it is common way of creating model, simply using sequential class u are creating linear stacks of layers. You can also use functional API that allow you to create entirely arbitrary architectures.
Sequential model,
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
Functional API,
input_tensor = layers.Input(shape=(784,))
x = layers.Dense(32, activation='relu')(input_tensor)
output_tensor = layers.Dense(10, activation='softmax')(x)
model = models.Model(inputs=input_tensor, outputs=output_tensor)
Related
I have a dataset with images containing one/two/three/... cards.
Since in total I have 52 different cards, I have 52 classes -> thus I have 52 neurons in my output layer.
Training the network with one card per image works well with CNN.
One label would look like this: [0,0,...,1,0,0] for example.
This is the last layer of my network for this task:
model.add(layers.Dense(52, activation='softmax'))
optimizer = keras.optimizers.Adam(lr=0.00001)
model.compile(loss='categorical_crossentropy',metrics=['accuracy'],optimizer=optimizer)
Training my network for two or more cards per image is more challenging for me.
Since one image contains now more than one card, a possible label for this image would look like: [0,1,0,...,1,0,0].
I would start with the same network architecture, but:
I think for this problem I have to use now sigmoid instead of softmax (since each class is independent) in the last layer.
For the loss I would simply use something like mse = tf.keras.losses.MeanSquaredError()
For the accuracy I am not sure.
model.add(layers.Dense(52, activation='sigmoid'))
adam = keras.optimizers.Adam(lr=0.00001)
model.compile(loss=mse ,metrics=['__?__'],optimizer=adam)
How wrong am I with these settings?
I searched a lot - but confusingly I am not finding some helpful comments. People give always some hints as using YOLO - but I wont detect objects - I only want to classify: In the picture there is a ace of hearts and a king of hearts for example - where they are doesnt matter.
One more confusion: I red several times that CNN can only classify single class problems - is that true? I hope not - but if it is, why and how can I still solve my problem using keras?
Here is the total network:
model = models.Sequential()
model.add(layers.Conv2D(32, (5, 5), activation='relu',input_shape=(500, 500, 3)))
model.add(BatchNormalization())
model.add(layers.MaxPooling2D((4, 4)))
model.add(layers.Conv2D(64, (5, 5), activation='relu'))
model.add(layers.MaxPooling2D((4, 4)))
model.add(BatchNormalization())
model.add(layers.Conv2D(64, (5, 5), activation='relu'))
model.add(layers.MaxPooling2D((3, 3)))
model.add(BatchNormalization())
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(52, activation='softmax'))
I red several times that CNN can only classify single class problems
That's false. With a CNN you can train a binary classification problem, a multiclass problem and also a multilabel problem. Actually a multilabel problem is what you are looking for.
In a multilabel classification problem you could use [0,1,0,...,1,0,0] as a target output. So for one single input sample multiple classes could be true at the same time! The output of a well trained network in this case could be [0.01, 0.99, 0.001, ..., 0.89, 0.001, 0.0001]. So you can use multiple independent binary classifications in one single network.
I will link another very similar question that I answered in more detail. I already addressed the specific metric, activation and loss function which you could use:
multilabel classification
I have trained the following CNN model with a smaller data set, therefore it does overfitting:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(32, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer=Adam(), metrics=['accuracy'])
The model has a lot of trainable parameters (more than 3 million, that's why I wonder if I should reduce the number of parameters with additional MaxPooling like follows?
Conv - BN - Act - MaxPooling - Conv - BN - Act - MaxPooling - Dropout - Flatten
or with an additional MaxPooling and Dropout like follows?
Conv - BN - Act - MaxPooling - Dropout - Conv - BN - Act - MaxPooling
- Dropout - Flatten
I am trying to understand the full sense of MaxPooling and whether it can help against overfitting.
Overfitting can happen when your dataset is not large enough to accomodate your number of features.
Max pooling uses a max operation to pool sets of features, leaving you with a smaller number of them.
Therefore, max-pooling should logically reduce overfit.
Drop-out reduces reliance on any single feature by ensuring that feature is not always available, forcing the model to look for different potential hints, rather than just sticking with one -- which would easily allow the model to overfit on any apparently good hint.
Therefore, this also should help reduce overfit.
You Should NOT Use Max-pooling in order to reduce overfitting, although it has a small effect on that, BUT this small effect is not enough because you are applying Max-Pooling after the convolutional operations, which means that the features are already trained in this layer and since max-pooling is used to reduce the hight and width of the output, this will make the features in the next layer has less convolutional operations to learn from, which means a LITTLE EFFECT on the overfitting problem, that won't solve it.
Actually it's not recommended at all using Pooling for this kind of problems, and here are some tips:
Reduce the number of your parameters because it's very hard(not impossible) to find enough data to train 3 millions parameters without overfitting.
Use regularization techniques like Drop-out which is very effective by the way, or L2-regularization,..etc.
3.DONT use max pooling for the purpose of reducing overfitting because it's is used to reduce the rapresentation and to make the
network a bit more robust to some features, further more using it so
much will make the network more and more robust to a some kind of
featuers.
Hope that helps!
I am currently training a net to play a game with a CNN having the following architecture:
model = Sequential()
model.add(Conv2D(100, kernel_size=(2, 2), strides=(2, 2), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(classifications, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
Now I wish to introduce some complexity in the architecture and make the net deep. How can I tabulate the performance of the CNNs of different complexities and ultimately conclude by giving the best choice for the particular task?
Am i going in the wrong direction? How to decide the depth of a CNN and how does it affect the performance on the same dataset?
Thanks in advance (I am new to this site, kindly excuse the immaturity of this post)
Edit: Information about the dataset I am using: dataset consists of images and each image has 3 possible lables (0, 1, 2) stored in a CSV file with each row corresponding to that particular image.
The simplest thing you can do is generate a few different model architectures, train them on a train set and evaluate them on the test set. Then compare their accuracies and the one with the highest accuracy should in theory be the best performing model.
To make the model deeper you can add extra dense or convolutional layers. For example:
changing this:
model.add(Dense(250, activation='relu'))
to this:
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
will add three extra dense layers. Hence making the network deeper.
You can do the same with duplicating the convolutional layers by duplicating the Conv2D and MaxPooling2D lines.
The alternative to 'trial and error' approach to finding the best architecture and hyperparameters is to use a search approach like explained in this tutorial that user grid search. It will, however, take significantly longer than just trying out a few versions you can up with yourself.
When using convolution, one often has recognize big objects in images like elephants or dogs. The first convolutional layers then recognize edges, the last layers focus on more abstract features like legs or body shape.
The task I'm currently working on is different: I have images of the size 512x512 with black/white scans of documents. I don't need to recognize text or some other complex patterns, but rather need to find things like inkblots. The network should just learn if there is anything like that or not.
Usually, I'm using structuring my networks like this:
model = Sequential()
model.add(InputLayer((512, 512, 1))
for size in [32, 64, 128, 256, 512, 512, 512]:
model.add(Dropout(0.1))
model.add(Conv2D(size, (3,3), 'relu'))
model.add(Conv2D(size, (3,3), 'relu'))
model.add(AveragePooling2D())
model.add(Flatten())
model.add(Dense(256, 'relu'))
model.add(Dense(1, 'sigmoid'))
This is the structure I commonly use and tried for this use case, but my results are not overwhelming, which is why I'm questioning my structure. If I only need to recognize the existance of patterns, which can have the size 5x5 or 30x30 tops, should I use a different structure?
I have written a small neural network for classifying cars and non-cars images. I need help with avoiding over-fitting. The model is shown below:
model = Sequential()
model.add(Conv2D(8, 3, 3, input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(16, 3, 3, input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
I am using generators:
generator = ImageDataGenerator( featurewise_center=True,
samplewise_center=False,
featurewise_std_normalization=False,
samplewise_std_normalization=False,
zca_whitening=False,
rotation_range=20.,
width_shift_range=0.4,
height_shift_range=0.4,
shear_range=0.2,
zoom_range=0.2,
channel_shift_range=0.1,
fill_mode='nearest',
horizontal_flip=True,
vertical_flip=False,
rescale=1.2,
preprocessing_function=None)
Ultimately, training acc is 98% whereas valid acc is 70%. Can you suggest something?
I would suggest to try to reduce the size of the layers, as this may be the reason for the overfitting (having too many parameters to train).
For example, this layer model.add(Dense(256)) might be too large. You can try to replace the 256 with something in the range 50-70, see how it works, and continue from there. You may also try to decrease the size\amount of convolutional layers.
So I could see at least two techniques:
Try to increase the dropout.
It might be that your overfit comes from underrepresentation of certain car patterns from your valid set in your training set. You might try to increase the value of train - valid split and check if the loss values are closer to each other.
I would comment, but I am too new to the site to comment. I agree with Miriam, overfitting is simply saying "believing the training data too much". What is happening in a neural net is essentially a function that outputs a classification (since you are doing classification vs regression). It means that you have a line and everything under a line is of a class, and everything above is another. By increasing the nodes/layer and number of layers total, you are allowing your neural net to represent a more complex function. So by adding more layers/nodes you will always get a better score on your training set, but not necessarily on other data. Imagine a bunch of points in a line, but they are not directly on the line. and there are some outliers. maybe the proper function to represent it would be a straight line, but a huge neural network might fit the points perfectly with some crazy complex function. When adding new points, the line would give a better classification since the neural network is trying fitting your training data so closely. If you are overfitting, I would say the first place to look is the complexity of your neural network.