Transfer Learning on Resnet50 doesn't go beyond 40% - keras

I'm using the Kaggle Animal-10 dataset to experiment with transfer learning in FastAI and Keras.
The base model is ResNet-50.
With FastAI I'm able to get accuracy of 95% in 3 epochs.
learn.fine_tune(3, base_lr=1e-2, cbs=[ShowGraphCallback()])
I believe it only trains the top layers.
With Keras:
If I train the complete ResNet, only then am I able to achieve an accuracy of 96%.
If I use the code below for transfer learning, at most I'm able to reach 40%.
num_classes = 10
# number of layers (from the end) to keep trainable
fine_tune = 33  # conv5 block
model = Sequential()
base_layer = ResNet50(include_top=False, pooling='avg', weights="imagenet")
# base_layer.trainable = False
# make only the last few layers trainable; freeze everything before them
for layer in base_layer.layers[:-fine_tune]:
    layer.trainable = False
model.add(base_layer)
model.add(layers.Flatten())
# model.add(layers.BatchNormalization())
# model.add(layers.Dense(2048, activation='relu'))
# model.add(layers.Dropout(rate=0.2))
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dense(num_classes, activation='softmax'))
I assume the cause is the same as in "Transfer learning with Keras, validation accuracy does not improve from outset (beyond naive baseline) while train accuracy improves", and that's why I'm now re-training the complete conv5 block of ResNet, yet it still doesn't add any value.
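For reference, here is a minimal sketch of a two-stage Keras setup that roughly mirrors what fastai's fine_tune does (train the new head with the base frozen, then unfreeze and continue at a lower learning rate). The learning rates, epoch counts, Dropout rate, and the train_ds / val_ds datasets are assumptions, not taken from the question:

from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import ResNet50

# Assumed placeholders: train_ds and val_ds are tf.data datasets of images
# already preprocessed for ResNet50, with one-hot labels.
base = ResNet50(include_top=False, pooling='avg', weights='imagenet')
base.trainable = False  # stage 1: train only the new head

model = models.Sequential([
    base,
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer=optimizers.Adam(1e-3),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=1)

# stage 2: unfreeze the base and fine-tune with a much lower learning rate
base.trainable = True
model.compile(optimizer=optimizers.Adam(1e-5),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=2)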

Related

LSTM Autoencoder producing poor results in test data

I'm applying an LSTM autoencoder for anomaly detection. Since anomalies are very few compared to normal data, only normal instances are used for training. The test data consists of both anomalies and normal instances. During training, the model loss looks good. However, on the test data the model produces poor accuracy, i.e. anomaly and normal points are not well separated.
The snippet of my code is below:
.............
.............
X_train = X_train.reshape(X_train.shape[0], lookback, n_features)
X_valid = X_valid.reshape(X_valid.shape[0], lookback, n_features)
X_test = X_test.reshape(X_test.shape[0], lookback, n_features)
.....................
......................
N = 1000
batch = 1000
lr = 0.0001
timesteps = 3
encoding_dim = int(n_features / 2)

lstm_model = Sequential()
lstm_model.add(LSTM(N, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=False))
lstm_model.add(RepeatVector(timesteps))
# Decoder
lstm_model.add(LSTM(timesteps, activation='relu', return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=True))
lstm_model.add(TimeDistributed(Dense(n_features)))
lstm_model.summary()

adam = optimizers.Adam(lr)
lstm_model.compile(loss='mse', optimizer=adam)

cp = ModelCheckpoint(filepath="lstm_classifier.h5",
                     save_best_only=True,
                     verbose=0)
tb = TensorBoard(log_dir='./logs',
                 histogram_freq=0,
                 write_graph=True,
                 write_images=True)

lstm_model_history = lstm_model.fit(X_train, X_train,
                                    epochs=epochs,
                                    batch_size=batch,
                                    shuffle=False,
                                    verbose=1,
                                    validation_data=(X_valid, X_valid),
                                    callbacks=[cp, tb]).history
.........................
test_x_predictions = lstm_model.predict(X_test)
mse = np.mean(np.power(preprocess_data.flatten(X_test) - preprocess_data.flatten(test_x_predictions), 2), axis=1)
error_df = pd.DataFrame({'Reconstruction_error': mse,
                         'True_class': y_test})
# Confusion Matrix
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]
conf_matrix = confusion_matrix(error_df.True_class, pred_y)
plt.figure(figsize=(5, 5))
sns.heatmap(conf_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
Please suggest what can be done in the model to improve the accuracy.
If your model is not performing well on the test set, I would check a few things:
The training set is not contaminated with anomalies or any information from the test set. If you use scaling, make sure you did not fit the scaler to the training and test set combined.
Based on my experience: if an autoencoder cannot discriminate well enough on the test data but has a low training loss, then (provided your training set is pure) it learned the particular details of the training set rather than the generalized idea.
Your threshold value might be off, and you may need a better thresholding procedure. One example can be found here: https://dl.acm.org/citation.cfm?doid=3219819.3219845
If the problem is the second one, the solution is to increase generalization. With autoencoders, one of the most effective generalization tools is the dimension of the bottleneck. Again, based on my experience with anomaly detection in flight radar data: lowering the bottleneck dimension significantly increased my multi-class classification accuracy. I was using 14 features with an encoding_dim of 7, but an encoding_dim of 4 gave even better results. The value of the training loss was not important in my case because I was only comparing reconstruction errors, but since you are classifying with a threshold on the reconstruction error, a more robust thresholding procedure may improve accuracy, as in the paper I've shared.
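A minimal sketch of the first and third points, assuming sklearn's StandardScaler and a percentile-based threshold chosen from validation reconstruction errors (both are assumptions, not part of the original answer):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the (anomaly-free) training data only, then apply the same
# transform everywhere; X_train_2d / X_valid_2d / X_test_2d are hypothetical
# 2-D arrays of shape (samples, n_features) before the reshape in the question.
scaler = StandardScaler().fit(X_train_2d)
X_train_2d = scaler.transform(X_train_2d)
X_valid_2d = scaler.transform(X_valid_2d)
X_test_2d = scaler.transform(X_test_2d)

# Pick the threshold from reconstruction errors on (normal) validation data
# instead of guessing, e.g. the 95th percentile.
valid_pred = lstm_model.predict(X_valid)   # X_valid shaped (samples, lookback, n_features)
valid_err = np.mean((X_valid - valid_pred) ** 2, axis=(1, 2))
threshold = np.percentile(valid_err, 95)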

Why is the validation accuracy constant at 20%?

I am trying to implement a 5-class animal classifier using Keras. I am building the CNN from scratch, and the weird thing is that the validation accuracy stays constant at 0.20 for all epochs. Any idea why this is happening? The dataset folder contains train, test, and validation folders, and each of them contains 5 folders corresponding to the 5 classes. What am I doing wrong?
I have tried multiple optimizers but the problem persists. I have included the code sample below.
import warnings
warnings.filterwarnings("ignore")
#First convolution layer
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Second convolution layer
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Flatten the outputs of the convolution layer into a 1D contiguous array
model.add(Flatten())
#Add a fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add another fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add the output layer containing 5 neurons, because we have 5 categories
model.add(Dense(5, activation='softmax',kernel_initializer='glorot_uniform'))
optim=RMSprop(lr=1e-6)
model.compile(loss='categorical_crossentropy',optimizer=optim,metrics=['accuracy'])
model.summary()
#We will use the below code snippet for rescaling the images to 0-1 for all the train and test images
train_datagen = ImageDataGenerator(rescale=1./255)
#We won't augment the test data. We will just use ImageDataGenerator to rescale the images.
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_data_dir,
                                                    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                    target_size=(img_width, img_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=False)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
                                                        classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                        target_size=(img_width, img_height),
                                                        batch_size=batch_size,
                                                        class_mode='categorical',
                                                        shuffle=False)
hist = History()
model.fit_generator(train_generator,
                    steps_per_epoch=nb_train_samples // batch_size,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=nb_validation_samples // batch_size,
                    callbacks=[hist])
# Save the model weights. Load later with:
#   from keras.models import load_model
#   model = load_model('cnn_from_scratch_weights.h5')
model.save('models/basic_cnn_from_scratch_model.h5')
print("Time taken to train the baseline model from scratch: ", datetime.now() - global_start)
Check the following for your data:
Shuffle the training data well (I see shuffle=False everywhere).
Properly normalize all data (I see you are doing rescale=1./255, which is probably fine).
Use a proper train/val split (you seem to be doing that too).
Suggestions for your model:
Use multiple Conv2D layers followed by a final Dense layer; that's what works best for image classification problems. You can also look at popular architectures that are tried and tested, e.g. AlexNet.
You can change the optimizer to Adam and try different learning rates (a minimal sketch of the shuffle and optimizer changes follows this answer).
Have a look at your training and validation loss graphs and see whether they look as expected.
Also, I guess you corrected the shape of the 2nd Conv2D layer as mentioned in the comments.
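A minimal sketch of those two suggestions (shuffling the training data and switching to Adam), reusing the variable names from the question; the learning rate is an assumption:

from keras.optimizers import Adam

# Let the training generator shuffle samples each epoch.
train_generator = train_datagen.flow_from_directory(train_data_dir,
                                                    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                    target_size=(img_width, img_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=True)

# Adam with a more typical learning rate than 1e-6.
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['accuracy'])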
It looks as if your output is always the same animal, which would explain the 20% accuracy. I highly recommend that you check your test outputs to see if they are all the same.
Also, you said that you were building a CNN, but in the code snippet you posted I see only dense layers; it is going to be hard for a dense architecture to do this task, and it is very small. What is the size of your pictures?
Hope it helps!
The model seems to be working now. I have removed the shuffle=False attribute, corrected the input shape for the 2nd convolution layer, and changed the optimizer to Adam. I have reached a validation accuracy of almost 94%. However, I have not yet tested the model on unseen data. There is a bit of overfitting in the model; I will have to use some aggressive dropout to reduce it. Thanks!

vgg16 model not converging

I tried to train Keras's VGG16 on my dataset of orca calls with 8 classes. The data contains 7 classes of orca calls and 1 negative class. It consists of spectrograms of orca calls and their corresponding labels.
Even after 50 epochs of training, the loss and accuracy are not changing at all. I have tried feature scaling and various learning rates, but the model is not learning anything. I have checked the data manually and it is fine.
code:
model = keras.applications.vgg16.VGG16(include_top=True, weights=None, input_tensor=None, input_shape=(320, 480, 3), pooling='max', classes=8)
opt = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8, decay=1e-6, amsgrad=False)
# Note: optimizer='adam' below uses Adam with default settings; the custom `opt` defined above is never passed to compile().
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x=X_train, y=Y_train, batch_size=16, epochs=2000, shuffle=True, validation_split=0.2)
I have read some research papers and have done some contrast adjustment and other preprocessing on the images according to those papers. The upper half of each image is turned white, as the frequencies I am interested in lie in the second half. I chose cmap='gray', as the frequencies stood out more with it. Should I go for another cmap?
Some images from my dataset: [images omitted]
Kaggle kernel: https://www.kaggle.com/sainimohit23/myorcanb

Emotion detection on text

I am a newbie in ML and was experimenting with emotion detection on text.
I have the ISEAR dataset, which contains tweets labeled with their emotion.
My current accuracy is 63% and I want to increase it to at least 70%, or even more if possible.
Here's the code:
inputs = Input(shape=(MAX_LENGTH,))
embedding_layer = Embedding(vocab_size,
                            64,
                            input_length=MAX_LENGTH)(inputs)
# x = Flatten()(embedding_layer)
x = LSTM(32, input_shape=(32, 32))(embedding_layer)
x = Dense(10, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.summary()

filepath = "weights-simple.hdf5"
checkpointer = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
history = model.fit([X_train], batch_size=64, y=to_categorical(y_train), verbose=1, validation_split=0.1,
                    shuffle=True, epochs=10, callbacks=[checkpointer])
That's a pretty general question; optimizing the performance of a neural network may require tuning many factors. For instance:
The optimizer chosen: in NLP tasks rmsprop is also a popular optimizer.
Tweaking the learning rate.
Regularization, e.g. dropout, recurrent_dropout, batch norm. This may help the model generalize better.
More units in the LSTM.
More dimensions in the embedding.
You can try grid search, e.g. using different optimizers, and evaluate on a validation set.
The data may also need some tweaking, such as:
Text normalization - a better representation of the tweets - removing unnecessary tokens (e.g. #).
Shuffling the data before the fit - keras's validation_split creates the validation set from the last data records.
A sketch of a couple of these tweaks follows this answer. There is no simple answer to your question.
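A minimal sketch of two of those suggestions (shuffling the data before fitting, and adding dropout/recurrent_dropout plus more LSTM units), reusing the layer imports and variable names from the question; the specific values are assumptions:

from sklearn.utils import shuffle  # assumption: scikit-learn is available

# Shuffle before fitting, because validation_split always takes the last records.
X_train, y_train = shuffle(X_train, y_train, random_state=42)

inputs = Input(shape=(MAX_LENGTH,))
x = Embedding(vocab_size, 128, input_length=MAX_LENGTH)(inputs)   # larger embedding
x = LSTM(64, dropout=0.2, recurrent_dropout=0.2)(x)               # more units + regularization
x = Dense(10, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])
history = model.fit(X_train, to_categorical(y_train), batch_size=64, epochs=10,
                    validation_split=0.1, callbacks=[checkpointer])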

Retraining the Inception V3 Model for Machine Learning

I'm doing image classification with two classes using the Inception V3 model. Since I'm using two new classes (Normal and Abnormal), I'm removing the top of the Inception V3 model and replacing it with my own classifier.
base_model = keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(img_width, img_height, 3))

# Classifier model on top of the convolutional base
model_top = keras.models.Sequential()
model_top.add(keras.layers.GlobalAveragePooling2D(input_shape=base_model.output_shape[1:], data_format=None))
model_top.add(keras.layers.Dense(400, activation='relu'))
model_top.add(keras.layers.Dropout(0.5))
model_top.add(keras.layers.Dense(1, activation='sigmoid'))
model = keras.models.Model(inputs=base_model.input, outputs=model_top(base_model.output))
Is freezing the convolutional layers this way in Inception V3 necessary for training?
# freeze the convolutional layers of InceptionV3
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer=keras.optimizers.Adam(
                  lr=0.00002,
                  beta_1=0.9,
                  beta_2=0.999,
                  epsilon=1e-08),
              loss='binary_crossentropy',
              metrics=['accuracy'])
No, it is not necessary to freeze the first layers of a CNN; you can just initialize the weights from a pre-trained model. However, in most cases it is recommended to freeze them, as the features they extract are generic enough to help in any image-related task, and doing so speeds up the training process.
That being said, you should experiment a bit with the number of layers you want to freeze. Allowing the later layers of your base_model to fine-tune on your task could improve performance. You can think of it as a hyper-parameter of your model. Say you want to freeze only the first 30 layers:
for layer in model.layers[:30]:
    layer.trainable = False
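For completeness, a slightly fuller sketch in the question's setup; note that in Keras, changes to a layer's trainable flag only take effect after (re)compiling the model. The cutoff of 30 layers and the learning rate follow the answer and question above:

for layer in model.layers[:30]:
    layer.trainable = False
for layer in model.layers[30:]:
    layer.trainable = True

# Re-compile so the new trainable flags take effect.
model.compile(optimizer=keras.optimizers.Adam(lr=0.00002),
              loss='binary_crossentropy',
              metrics=['accuracy'])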
