Image Classification - Movie posters by genre: Keras model overfitting

Please note, everyone, that this is a multi-label problem, as opposed to plain multi-class: each poster can carry several genre labels at once. I am trying to classify movie posters by genre from the images alone. I have over 22,000 images overall. I have one-hot encoded the genres so that the Keras model can predict multiple genres per movie (sigmoid activation in the final Dense layer; see the encoding sketch after the genre counts below). I need a recommendation for a proper batch size and number of epochs. I am using VGG16. I would also like to know how to choose which layers to freeze.
I have tried training for 25 epochs with VGG16, with a batch size of 10, but the model tends to overfit.
Here's my training image count by genre (note: a movie can belong to more than one genre, which is more often than not the case):
{'Action': 2226.0,
'Thriller': 2788.0,
'Drama': 9283.0,
'Romance': 2184.0,
'Horror': 2517.0,
'Sci-Fi': 756.0,
'Mystery': 918.0,
'Adventure': 1105.0,
'Animation': 583.0,
'Crime': 1369.0,
'Comedy': 5524.0,
'Fantasy': 735.0,
'Family': 991.0,
'Music': 319.0,
'History': 359.0,
'War': 177.0,
'Musical': 191.0,
'Biography': 484.0,
'Sport': 190.0}
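For reference, here is a minimal sketch of the multi-hot encoding described above. This is my own illustration, assuming a list of genre strings per movie; it is not the asker's actual preprocessing code.
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical per-movie genre lists; the real dataset is not shown in the post.
genres_per_movie = [['Action', 'Thriller'], ['Drama', 'Romance'], ['Comedy']]

mlb = MultiLabelBinarizer()
y_train = mlb.fit_transform(genres_per_movie)  # shape (n_movies, n_genres), 0/1 entries
print(mlb.classes_)  # column order of the multi-hot matrix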
from keras.applications import VGG16
from keras.callbacks import EarlyStopping
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Note: this freezes every layer from index 6 onward and leaves the earliest
# layers trainable, which is the reverse of the usual fine-tuning convention
# (freeze the early, generic layers and tune the later, task-specific ones).
for layer in conv_base.layers[6:]:
    layer.trainable = False

# Monitors training accuracy; monitoring a validation metric such as
# 'val_acc' would better reflect overfitting.
early = EarlyStopping(monitor='acc', patience=7, mode='auto')

classifier = Sequential()
classifier.add(conv_base)
classifier.add(Flatten())  # flattens the (7, 7, 512) feature maps into a vector
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dropout(0.2))
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dropout(0.1))
classifier.add(Dense(units=19, activation='sigmoid'))  # one sigmoid per genre
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, epochs=15, batch_size=10, callbacks=[early])
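For comparison, the usual convention, shown here as a sketch rather than a tuned recommendation, is the reverse: freeze all of the convolutional base except the last block, so the low-level features stay fixed while the most task-specific layers adapt.
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze everything up to (but not including) the last conv block of VGG16.
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    layer.trainable = set_trainable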

Related

Problem with fine-tuning pretrained models like VGG16, ResNet50 - stuck at exactly 50%

My siamese network works with my custom layers and I get over 90% accuracy, but when I try to use pretrained models like VGG16 or ResNet50 I always get exactly 50% after every epoch. I even copied the VGG16 architecture and trained it from scratch, with good results. However, when I use the pretrained model I always get 50%. I guess I have done something wrong with loading the pretrained model? Everything else is OK; as I said, it works with my custom networks. I also tried solutions from other posts, like changing the learning rate, but nothing really helps.
Here's my code:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Flatten, Dense, Lambda
from tensorflow.keras.models import Model

def build_siamese_vgg_model():
    inputs = Input((64, 64, 3))
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(64, 64, 3))(inputs)
    flat = Flatten()(base_model)
    outputs = Dense(256)(flat)
    model = Model(inputs, outputs)
    return model

# Read data
(X_train, Y_train) = functions.read_data("data/train")
(X_val, Y_val) = functions.read_data("data/test")

# Generate pairs
(X_train_pairs, Y_train_pairs) = functions.generate_pairs(X_train, Y_train)
(X_val_pairs, Y_val_pairs) = functions.generate_pairs(X_val, Y_val)

imgA = Input(shape=(64, 64, 3))
imgB = Input(shape=(64, 64, 3))
featureExtractor = build_siamese_vgg_model()
featsA = featureExtractor(imgA)
featsB = featureExtractor(imgB)
distance = Lambda(functions.euclidean_distance)([featsA, featsB])
outputs = Dense(1, activation="sigmoid")(distance)
# Note: the original post built the model with outputs=distance, which
# silently drops the sigmoid head defined on the previous line; the model
# must end in the sigmoid unit for binary cross-entropy training.
model_1 = Model(inputs=[imgA, imgB], outputs=outputs)
model_1.compile(loss="binary_crossentropy", optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])
his = model_1.fit([X_train_pairs[:, 0], X_train_pairs[:, 1]], Y_train_pairs,
                  validation_data=([X_val_pairs[:, 0], X_val_pairs[:, 1]], Y_val_pairs),
                  batch_size=64, epochs=20)
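The post's functions.euclidean_distance is not shown. A typical implementation for a Keras Lambda layer, given here as an assumption rather than the asker's code, is:
import tensorflow.keras.backend as K

def euclidean_distance(vectors):
    # vectors is the list [featsA, featsB], each of shape (batch, 256)
    featsA, featsB = vectors
    sum_squared = K.sum(K.square(featsA - featsB), axis=1, keepdims=True)
    # the K.epsilon() floor avoids a NaN gradient from sqrt(0)
    return K.sqrt(K.maximum(sum_squared, K.epsilon()))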

How to add several binary classifiers at the end of an MLP with Keras?

Say I have an MLP that looks like:
import tensorflow as tf
from tensorflow.keras import models, layers, optimizers

model = models.Sequential()
model.add(layers.Dense(200, activation="relu", input_dim=250))
model.add(layers.Dense(100, activation="relu"))
model.add(layers.Dense(75, activation="relu"))
model.add(layers.Dense(50, activation="relu"))
model.add(layers.Dense(17, activation="softmax"))
model.compile(optimizer=optimizers.Adam(lr=0.001),
              loss="categorical_crossentropy",
              metrics=['MeanSquaredError', 'AUC', 'accuracy', tf.keras.metrics.Precision()])
history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_val, y_val))
Now I want, at the final layer, to add a binary classifier for each of the 17 classes, rather than outputting all 17 classes together through the softmax; that is, the binary classifiers should all branch from the last hidden layer. Is this possible in Keras? I am guessing it requires a different type of model, instead of Sequential()?
EDIT:
I understood that I can't use the Sequential API, and changed the model to:
from tensorflow.keras import Input
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import plot_model

def test_model(layer_in):
    dense1 = Dense(200, activation="relu")(layer_in)
    drop1 = Dropout(rate=0.02)(dense1)
    dense2 = Dense(100, activation="relu")(drop1)
    drop2 = Dropout(rate=0.02)(dense2)
    dense3 = Dense(75, activation="relu")(drop2)
    drop3 = Dropout(rate=0.02)(dense3)
    dense4 = Dense(50, activation="relu")(drop3)
    drop4 = Dropout(rate=0.01)(dense4)
    out = Dense(17, activation="softmax")(drop4)
    return out

layer_in = Input(shape=(250,))
layer_out = test_model(layer_in)
model = Model(inputs=layer_in, outputs=layer_out)
plot_model(model, show_shapes=True)
So I guess the end goal is to have 17 binary output layers at the end, each with its own sigmoid, all connected to drop4...
In your problem you are trying to use the Sequential API to create the model. The Sequential API has limitations: you can only build a model layer by layer, it can't handle multiple inputs or outputs, and it can't express branching.
Below is the text from the official Keras website: https://keras.io/guides/functional_api/
The functional API makes it easy to manipulate multiple inputs and outputs. This cannot be handled with the Sequential API.
Also, this Stack Overflow question will be useful: Keras' Sequential vs Functional API for Multi-Task Learning Neural Network
You can create such a model using the Functional API or model subclassing.
With the Functional API, your model would look like the following, assuming output_1 is classification with 17 classes, output_2 is classification with 3 classes, and output_3 is regression:
input_layer = Input(shape=(250,))
x = Dense(200, activation="relu")(input_layer)
x = Dense(100, activation="relu")(x)
x = Dense(75, activation="relu")(x)
x = Dense(50, activation="relu")(x)
output_1 = Dense(17, activation="softmax", name='output_1')(x)
output_2 = Dense(3, activation="softmax", name='output_2')(x)
output_3 = Dense(1, name='output_3')(x)  # linear unit for the regression output
model = Model(inputs=input_layer, outputs=[output_1, output_2, output_3])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss={'output_1': tf.keras.losses.CategoricalCrossentropy(),
                    'output_2': tf.keras.losses.CategoricalCrossentropy(),
                    'output_3': "mse"},
              metrics={'output_1': 'accuracy',
                       'output_2': 'accuracy',
                       'output_3': tf.keras.metrics.RootMeanSquaredError()})
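For completeness, training such a multi-output model passes one label array per named output. The y1_train/y2_train/y3_train names below are hypothetical placeholders, not variables from the question:
history = model.fit(X_train,
                    {'output_1': y1_train,   # shape (n, 17), one-hot
                     'output_2': y2_train,   # shape (n, 3), one-hot
                     'output_3': y3_train},  # shape (n, 1), regression targets
                    epochs=100,
                    validation_data=(X_val, {'output_1': y1_val,
                                             'output_2': y2_val,
                                             'output_3': y3_val}))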
Update
Below is the code assuming you have 6 classes; you can extend the same pattern to 17 classes (a loop-based sketch follows the code).
input_layer = Input(shape=(250,))
x = Dense(200, activation="relu")(input_layer)
x = Dense(100, activation="relu")(x)
x = Dense(75, activation="relu")(x)
x = Dense(50, activation="relu")(x)
# One single-unit binary head per class. Note: softmax over a single unit
# always outputs 1.0, so these heads use sigmoid activations with binary
# cross-entropy (the original snippet's softmax/sparse-categorical
# combination cannot learn here).
output_1 = Dense(1, activation='sigmoid', name='output_1')(x)
output_2 = Dense(1, activation='sigmoid', name='output_2')(x)
output_3 = Dense(1, activation='sigmoid', name='output_3')(x)
output_4 = Dense(1, activation='sigmoid', name='output_4')(x)
output_5 = Dense(1, activation='sigmoid', name='output_5')(x)
output_6 = Dense(1, activation='sigmoid', name='output_6')(x)
# All six heads must be wired into the model (the original listed only three).
model = Model(inputs=input_layer,
              outputs=[output_1, output_2, output_3, output_4, output_5, output_6])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss={'output_1': tf.keras.losses.BinaryCrossentropy(),
                    'output_2': tf.keras.losses.BinaryCrossentropy(),
                    'output_3': tf.keras.losses.BinaryCrossentropy(),
                    'output_4': tf.keras.losses.BinaryCrossentropy(),
                    'output_5': tf.keras.losses.BinaryCrossentropy(),
                    'output_6': tf.keras.losses.BinaryCrossentropy()},
              metrics={'output_1': 'accuracy',
                       'output_2': 'accuracy',
                       'output_3': 'accuracy',
                       'output_4': 'accuracy',
                       'output_5': 'accuracy',
                       'output_6': 'accuracy'})
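Writing 17 heads by hand gets unwieldy; a loop over head indices, sketched here building on the answer's code above, keeps it compact:
num_classes = 17
outputs = [Dense(1, activation='sigmoid', name='output_{}'.format(i + 1))(x)
           for i in range(num_classes)]
model = Model(inputs=input_layer, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss={'output_{}'.format(i + 1): tf.keras.losses.BinaryCrossentropy()
                    for i in range(num_classes)},
              metrics={'output_{}'.format(i + 1): 'accuracy'
                       for i in range(num_classes)})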

How to extract features from an image for training a CNN model

I am working on a project to classify waste as plastics and non-plastics, using only images to train the model. However, I still don't know what features the model takes into account while classifying them. I am using a CNN, but the prediction accuracy is still not up to the mark.
The reason I went with a CNN is that there is no specific hand-crafted feature that distinguishes plastics from everything else. Is there any other way to approach this problem?
For example, if I train on images of cats, my neural network learns what a cat is even though I never explicitly give it features. Is the same true here?
Suppose you want to extract features using a pre-trained convolutional neural network, such as VGG16.
The code to reuse the convolutional base is:
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))  # the size of your images
The final feature map has shape (4, 4, 512). That's the feature map on top of which you'll stick a densely connected classifier.
There are two ways to extract features:
FAST FEATURE EXTRACTION WITHOUT DATA AUGMENTATION: Run the convolutional base over your dataset, record its output to a NumPy array on disk, and then use this data as input to a standalone, densely connected classifier. This solution is fast and cheap to run, because it only requires running the convolutional base once per input image, and the convolutional base is by far the most expensive part of the pipeline. But for the same reason, this technique won't allow you to use data augmentation.
Code for extracting features with this method is shown below:
import os
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

base_dir = '/Users/fchollet/Downloads/cats_and_dogs_small'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
test_dir = os.path.join(base_dir, 'test')

datagen = ImageDataGenerator(rescale=1./255)
batch_size = 20

def extract_features(directory, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow_from_directory(directory, target_size=(150, 150),
                                            batch_size=batch_size, class_mode='binary')
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_count:
            break
    return features, labels

train_features, train_labels = extract_features(train_dir, 2000)
validation_features, validation_labels = extract_features(validation_dir, 1000)
test_features, test_labels = extract_features(test_dir, 1000)

# Flatten the (4, 4, 512) maps so they can feed a densely connected classifier
train_features = np.reshape(train_features, (2000, 4 * 4 * 512))
validation_features = np.reshape(validation_features, (1000, 4 * 4 * 512))
test_features = np.reshape(test_features, (1000, 4 * 4 * 512))

from keras import models
from keras import layers
from keras import optimizers

model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_dim=4 * 4 * 512))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
              loss='binary_crossentropy', metrics=['acc'])
history = model.fit(train_features, train_labels, epochs=30,
                    batch_size=20, validation_data=(validation_features, validation_labels))
Training is very fast, because you only have to deal with two Dense layers; an epoch takes less than a second even on a CPU.
FEATURE EXTRACTION WITH DATA AUGMENTATION: Extend the model you have (conv_base) by adding Dense layers on top, and run the whole thing end to end on the input data. This allows you to use data augmentation, because every input image goes through the convolutional base every time it's seen by the model. But for the same reason, this technique is far more expensive than the first.
Code for the same is shown below:
from keras import models
from keras import layers
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=40,
                                   width_shift_range=0.2, height_shift_range=0.2,
                                   shear_range=0.2, zoom_range=0.2,
                                   horizontal_flip=True, fill_mode='nearest')
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(150, 150),
                                                    batch_size=20, class_mode='binary')
validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                        target_size=(150, 150),
                                                        batch_size=20,
                                                        class_mode='binary')

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=2e-5),
              metrics=['acc'])
history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=30,
                              validation_data=validation_generator, validation_steps=50)
For more details, please refer to Section 5.3.1 of the book "Deep Learning with Python" by François Chollet, the author of Keras.

GradCam applied to video sequence classification with TimeDistributed CNN and LSTM

After days of working on this, I haven't found any reasonable way of doing it, so here I am.
I have a network that aims to predict the next video's class, given the features of the current one. Each video is composed of 30 frames. The idea is to apply a feature-extraction method to each frame, then feed the resulting sequence into an LSTM plus Dense layers to make the prediction.
Here is the code:
from keras.applications import InceptionV3
from keras.layers import Input, GlobalAveragePooling2D, TimeDistributed, LSTM, Dense
from keras.models import Model
from keras.initializers import he_uniform, glorot_normal
from keras.optimizers import Adam

video = Input(shape=(30, 299, 299, 3))
inc = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
cnn_out = GlobalAveragePooling2D()(inc.output)
cnn = Model(inputs=inc.input, outputs=cnn_out)
encoded_frames = TimeDistributed(cnn)(video)  # one feature vector per frame
encoded_sequence = LSTM(128, activation='relu', return_sequences=False,
                        kernel_initializer=he_uniform(), bias_initializer='zeros',
                        dropout=0.5)(encoded_frames)
hidden_layer = Dense(1024, activation='relu', kernel_initializer=he_uniform(),
                     bias_initializer='zeros')(encoded_sequence)
outputs = Dense(4, activation="softmax", kernel_initializer=glorot_normal(),
                bias_initializer='zeros')(hidden_layer)
model = Model(inputs=[video], outputs=outputs)
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
I would like to visualize the feature activations at the CNN stage for each frame. If I can look at the saliency map for each input image, I can understand which features matter most for this kind of prediction.
All the examples I can find online deal with a single CNN and a single input image; is there any way of doing this here?
Any help is really appreciated, thanks!
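One possible direction, sketched under assumptions rather than as a verified answer: rebuild the network so the per-frame convolutional maps are exposed as an extra output, then run a standard Grad-CAM computation per frame with tf.GradientTape. The rebuilt architecture and shapes below are my assumptions based on the question's code; in practice you would copy the trained weights into this rebuilt model before computing the heatmaps.
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import (Input, GlobalAveragePooling2D, TimeDistributed,
                                     LSTM, Dense)
from tensorflow.keras.models import Model

inc = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))

video = Input(shape=(30, 299, 299, 3))
conv_maps = TimeDistributed(inc)(video)                        # (None, 30, 8, 8, 2048)
pooled = TimeDistributed(GlobalAveragePooling2D())(conv_maps)  # (None, 30, 2048)
encoded_sequence = LSTM(128)(pooled)
hidden = Dense(1024, activation='relu')(encoded_sequence)
preds = Dense(4, activation='softmax')(hidden)

# Expose both the per-frame conv maps and the final predictions.
grad_model = Model(video, [conv_maps, preds])

def gradcam_per_frame(clip, class_idx):
    """clip: float array of shape (1, 30, 299, 299, 3)."""
    with tf.GradientTape() as tape:
        maps, predictions = grad_model(clip)
        score = predictions[:, class_idx]
    grads = tape.gradient(score, maps)                         # (1, 30, 8, 8, 2048)
    # Grad-CAM: channel weights from spatially averaged gradients,
    # then a ReLU over the weighted sum of the feature maps.
    weights = tf.reduce_mean(grads, axis=(2, 3), keepdims=True)
    cams = tf.nn.relu(tf.reduce_sum(weights * maps, axis=-1))  # (1, 30, 8, 8)
    return cams[0].numpy()                                     # one heatmap per frame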

How to deal with increasing loss and low validation accuracy in training with VGGNet?

I am trying to create a model to predict the art style of a painting. To do so I am using the dataset that Kaggle provides for their competition named Painter by Numbers. Though there are 137 art styles in the dataset, I am using only three of them: Impressionism, Expressionism, and Surrealism. I have taken 3000 images from each class to train the model, and I have used 300 images from each class, totaling 900 images, to validate the training.
I have planned to use a pre-trained VGGNet as the bottom of my model, and I have trained the model on Google Colab. Now the issue is: as the model trains, the loss keeps increasing and the validation accuracy stays near 0.33, which is no better than random guessing.
I created a model with a base of pre-trained VGGNet and added some fully connected layers: 1024 neurons in each of the first two layers, 512 neurons in the third layer, and 3 neurons in the last layer. The optimizer I used was SGD with a learning rate of 0.01, decay 1e-6, and momentum 0.9. My loss function is categorical_crossentropy, and the input image shape is (100, 100, 3).
For training, I set samples_per_epoch to 100 and the number of epochs to 30. All the code is provided below.
from keras.applications import VGG16
from keras.layers import Input, Flatten, Dense
from keras.models import Model
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator

model_vgg16_conv = VGG16(weights='imagenet', include_top=False)
input = Input(shape=(100, 100, 3), name='image_input')
output_vgg16_conv = model_vgg16_conv(input)
x = Flatten(name='flatten')(output_vgg16_conv)
x = Dense(1024, activation='relu', name='fc1')(x)
x = Dense(1024, activation='relu', name='fc2')(x)
x = Dense(512, activation='relu', name='fc3')(x)
x = Dense(3, activation='softmax', name='predictions')(x)
my_model = Model(input=input, output=x)

sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
my_model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.1,
                                   zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory(train_root, target_size=(100, 100),
                                                 batch_size=32, class_mode='categorical')
test_set = test_datagen.flow_from_directory(test_root, target_size=(100, 100),
                                            batch_size=32, class_mode='categorical')
my_model.fit_generator(training_set, samples_per_epoch=100, nb_epoch=30,
                       validation_data=test_set, nb_val_samples=300)
This setup produces low validation accuracy and an ever-increasing loss; the loss even climbs above 10. What can I do to improve the situation?
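A commonly suggested first adjustment for this symptom, offered here as a sketch of the usual advice rather than a verified fix for this dataset, is to freeze the pretrained base so only the new classifier head trains, and to lower the learning rate:
# Freeze the convolutional base so its pretrained weights are not wrecked
# by large gradients from the randomly initialized Dense layers.
for layer in model_vgg16_conv.layers:
    layer.trainable = False

# Recompile after changing trainable flags; a smaller learning rate also helps.
sgd = optimizers.SGD(lr=1e-4, momentum=0.9, nesterov=True)
my_model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])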
