InceptionV3 transfer learning with Keras overfitting too soon - keras

I'm using a pre trained InceptionV3 on Keras to retrain the model to make a binary image classification (data labeled with 0's and 1's).
I'm reaching about 65% of accuracy on my k-fold validation with never seen data, but the problem is the model is overfitting to soon. I need to improve this average accuracy, and I guess there is something related to this overfitting problem.
Here are the loss values on epochs:
Here is the code. The dataset and label variables are Numpy Arrays.
dataset = joblib.load(path_to_dataset)
labels = joblib.load(path_to_labels)
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = to_categorical(labels, 2)
X_train, X_test, y_train, y_test = sk.train_test_split(dataset, labels, test_size=0.2)
X_train, X_val, y_train, y_val = sk.train_test_split(X_train, y_train, test_size=0.25) # 0.25 x 0.8 = 0.2
X_train = np.array(X_train)
y_train = np.array(y_train)
X_val = np.array(X_val)
y_val = np.array(y_val)
X_test = np.array(X_test)
y_test = np.array(y_test)
aug = ImageDataGenerator(
rotation_range=20,
zoom_range=0.15,
horizontal_flip=True,
fill_mode="nearest")
pre_trained_model = InceptionV3(input_shape = (299, 299, 3),
include_top = False,
weights = 'imagenet')
for layer in pre_trained_model.layers:
layer.trainable = False
x = layers.Flatten()(pre_trained_model.output)
x = layers.Dense(1024, activation = 'relu')(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(2, activation = 'softmax')(x) #already tried with sigmoid activation, same behavior
model = Model(pre_trained_model.input, x)
model.compile(optimizer = RMSprop(lr = 0.0001),
loss = 'binary_crossentropy',
metrics = ['accuracy']) #Already tried with Adam optimizer, same behavior
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=100)
mc = ModelCheckpoint('best_model_inception_rmsprop.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
history = model.fit(x=aug.flow(X_train, y_train, batch_size=32),
validation_data = (X_val, y_val),
epochs = 100,
callbacks=[es, mc])
The training dataset has 2181 images and validation has 727 images.
Something is wrong, but I can't tell what...
Any thoughts of what can be done to improve it?

One way to avoid overfitting is to use a lot of data. The main reason overfitting happens is because you have a small dataset and you try to learn from it. The algorithm will have greater control over this small dataset and it will make sure it satisfies all the datapoints exactly. But if you have a large number of datapoints, then the algorithm is forced to generalize and come up with a good model that suits most of the points.
Suggestions:
Use a lot of data.
Use less deep network if you have a small number of data samples.
If 2nd satisfies then don't use huge number of epochs - Using many epochs leads is kinda forcing your model to learn that and your model will learn it well but can not generalize.

From your loss graph , i see that the model is generalized at early epoch ( where there is intersection of both the train & val score) so plz try to use the model saved at that epoch ( and not the later epochs which seems to overfit)
Second option what you have is use lot of training samples..
If you have less no. of training samples then use data augmentations

Have you tried following?
Using a higher dropout value
Lower Learning Rate (lr=0.00001 or lr=0.000001 ...)
More data augmentation you can use.
It seems to me your data amount is low. You may use a lower ratio for test and validation (10%, 10%).

Related

LSTM Autoencoder producing poor results in test data

I'm applying LSTM autoencoder for anomaly detection. Since anomaly data are very few as compared to normal data, only normal instances are used for the training. Testing data consists of both anomalies and normal instances. During the training, the model loss seems good. However, in the test the data the model produces poor accuracy. i.e. anomaly and normal points are not well separated.
The snippet of my code is below:
.............
.............
X_train = X_train.reshape(X_train.shape[0], lookback, n_features)
X_valid = X_valid.reshape(X_valid.shape[0], lookback, n_features)
X_test = X_test.reshape(X_test.shape[0], lookback, n_features)
.....................
......................
N = 1000
batch = 1000
lr = 0.0001
timesteps = 3
encoding_dim = int(n_features/2)
lstm_model = Sequential()
lstm_model.add(LSTM(N, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=False))
lstm_model.add(RepeatVector(timesteps))
# Decoder
lstm_model.add(LSTM(timesteps, activation='relu', return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=True))
lstm_model.add(TimeDistributed(Dense(n_features)))
lstm_model.summary()
adam = optimizers.Adam(lr)
lstm_model.compile(loss='mse', optimizer=adam)
cp = ModelCheckpoint(filepath="lstm_classifier.h5",
save_best_only=True,
verbose=0)
tb = TensorBoard(log_dir='./logs',
histogram_freq=0,
write_graph=True,
write_images=True)
lstm_model_history = lstm_model.fit(X_train, X_train,
epochs=epochs,
batch_size=batch,
shuffle=False,
verbose=1,
validation_data=(X_valid, X_valid),
callbacks=[cp, tb]).history
.........................
test_x_predictions = lstm_model.predict(X_test)
mse = np.mean(np.power(preprocess_data.flatten(X_test) - preprocess_data.flatten(test_x_predictions), 2), axis=1)
error_df = pd.DataFrame({'Reconstruction_error': mse,
'True_class': y_test})
# Confusion Matrix
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]
conf_matrix = confusion_matrix(error_df.True_class, pred_y)
plt.figure(figsize=(5, 5))
sns.heatmap(conf_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
Please suggest what can be done in the model to improve the accuracy.
If your model is not performing good on the test set I would make sure to check certain things;
Training set is not contaminated with anomalies or any information from the test set. If you use scaling, make sure you did not fit the scaler to training and test set combined.
Based on my experience; if an autoencoder cannot discriminate well enough on the test data but has low training loss, provided your training set is pure, it means that the autoencoder did learn about the underlying details of the training set but not about the generalized idea.
Your threshold value might be off and you may need to come up with a better thresholding procedure. One example can be found here: https://dl.acm.org/citation.cfm?doid=3219819.3219845
If the problem is 2nd one, the solution is to increase generalization. With autoencoders, one of the most efficient generalization tool is the dimension of the bottleneck. Again based on my experience with anomaly detection in flight radar data; lowering the bottleneck dimension significantly increased my multi-class classification accuracy. I was using 14 features with an encoding_dim of 7, but encoding_dim of 4 provided even better results. The value of the training loss was not important in my case because I was only comparing reconstruction errors, but since you are making a classification with a threshold value of RE, a more robust thresholding may be used to improve accuracy, just as in the paper I've shared.

Neural network only predicts one class from binary class

My task is to learn defected items in a factory. It means, I try to detect defected goods or fine goods. This led a problem where one class dominates the others (one class is 99.7% of the data) as the defected items were very rare. Training accuracy is 0.9971 and validation accuracy is 0.9970. It sounds amazing.
But the problem is, the model only predicts everything is 0 class which is fine goods. That means, it fails to classify any defected goods.
How can I solve this problem? I have checked other questions and tried out, but I still have the situation. the total data points are 122400 rows and 5 x features.
In the end, my confusion matrix of the test set is like this
array([[30508, 0],
[ 92, 0]], dtype=int64)
which does a terrible job.
My code is as below:
le = LabelEncoder()
y = le.fit_transform(y)
ohe = OneHotEncoder(sparse=False)
y = y.reshape(-1,1)
y = ohe.fit_transform(y)
scaler = StandardScaler()
x = scaler.fit_transform(x)
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.25, random_state = 777)
#DNN Modelling
epochs = 15
batch_size =128
Learning_rate_optimizer = 0.001
model = Sequential()
model.add(Dense(5,
kernel_initializer='glorot_uniform',
activation='relu',
input_shape=(5,)))
model.add(Dense(5,
kernel_initializer='glorot_uniform',
activation='relu'))
model.add(Dense(8,
kernel_initializer='glorot_uniform',
activation='relu'))
model.add(Dense(2,
kernel_initializer='glorot_uniform',
activation='softmax'))
model.compile(loss='binary_crossentropy',
optimizer=Adam(lr = Learning_rate_optimizer),
metrics=['accuracy'])
history = model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))
y_pred = model.predict(x_test)
confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))
Thank you
it sounds like you have highly imbalanced dataset, the model is learning only how to classify fine goods.
you can try one of the approaches listed here:
https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
The best attempt would be to firstly take almost equal portions of data of both classes, split them into train-test-val, train the classifier and do thorough testing on your complete dataset. You can also try and use data augmentation techniques to your other set to get more data from the same set. Keep on iterating and maybe even try and change your loss function to suit your condition.

Keras - Issues using pre-trained word embeddings

I'm following Keras tutorials on word embeddings and replicated the code (with a few modifications) from this particular one:
Using pre-trained word embeddings in a Keras model
It's a topic classification problem in which they are loading pre-trained word vectors and use them via a fixed embedding layer.
When using the pre-trained embedding vectors I can, in fact, achieve their 95% accuracy. This is the code:
embedding_layer = Embedding(len(embed_matrix), len(embed_matrix.columns), weights=[embed_matrix],
input_length=data.shape[1:], trainable=False)
sequence_input = Input(shape=(MAXLEN,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Dropout(0.2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(35)(x) # global max pooling
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
output = Dense(target.shape[1], activation='softmax')(x)
model = Model(sequence_input, output)
model.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=2,
batch_size=128)
The issue happens when I remove the embedding vectors and use completely random vectors, surprisingly achieving higher accuracy: 96.5%.
The code is the same, with one modification: weighs=[random_matrix]. That's a matrix with the same shape of embed_matrix, but using random values. So this is the embedding layer now:
embedding_layer = Embedding(len(embed_matrix),
len(embed_matrix.columns), weights=[random_matrix],
input_length=data.shape[1:], trainable=False)
I experimented many times with random weights and the result is always similar. Notice that even though those weights are random, the trainable parameter is still False, so the NN is not updating them.
After that, I fully removed the embedding layer and used words sequences as the input, expecting that those weights were not contributing to the model's accuracy. With that, I got nothing more than 16% accuracy.
So, what is going on? How could random embeddings achieve the same or better performance than pre-trained ones?
And why using word indexes (normalized, of course) as inputs result in such a poor accuracy?

Emotion detection on text

I am a newbie in ML and was experimenting with emotion detection on the text.
So I have an ISEAR dataset which contains tweets with their emotion labeled.
So my current accuracy is 63% and I want to increase to at least 70% or even more maybe.
Heres the code :
inputs = Input(shape=(MAX_LENGTH, ))
embedding_layer = Embedding(vocab_size,
64,
input_length=MAX_LENGTH)(inputs)
# x = Flatten()(embedding_layer)
x = LSTM(32, input_shape=(32, 32))(embedding_layer)
x = Dense(10, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['acc'])
model.summary()
filepath="weights-simple.hdf5"
checkpointer = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
history = model.fit([X_train], batch_size=64, y=to_categorical(y_train), verbose=1, validation_split=0.1,
shuffle=True, epochs=10, callbacks=[checkpointer])
That's a pretty general question, optimizing the performance of a neural network may require tuning many factors.
For instance:
The optimizer chosen: in NLP tasks rmsprop is also a popular
optimizer
Tweaking the learning rate
Regularization - e.g dropout, recurrent_dropout, batch norm. This may help the model to generalize better
More units in the LSTM
More dimensions in the embedding
You can try grid search, e.g. using different optimizers and evaluate on a validation set.
The data may also need some tweaking, such as:
Text normalization - better representation of the tweets - remove unnecessary tokens (#, #)
Shuffle the data before the fit - keras validation_split creates a validation set using the last data records
There is no simple answer to your question.

Is it logical to loop on model.fit in Keras?

Is it logical to do as below in Keras in order not to run out of memory?
for path in ['xaa', 'xab', 'xac', 'xad']:
x_train, y_train = prepare_data(path)
model.fit(x_train, y_train, batch_size=50, epochs=20, shuffle=True)
model.save('model')
It is, but prefer model.train_on_batch if each iteration is generating a single batch. This eliminates some overhead that comes with fit.
You can also try to create a generator and use model.fit_generator():
def dataGenerator(pathes, batch_size):
while True: #generators for keras must be infinite
for path in pathes:
x_train, y_train = prepare_data(path)
totalSamps = x_train.shape[0]
batches = totalSamps // batch_size
if totalSamps % batch_size > 0:
batches+=1
for batch in range(batches):
section = slice(batch*batch_size,(batch+1)*batch_size)
yield (x_train[section], y_train[section])
Create and use:
gen = dataGenerator(['xaa', 'xab', 'xac', 'xad'], 50)
model.fit_generator(gen,
steps_per_epoch = expectedTotalNumberOfYieldsForOneEpoch
epochs = epochs)
I would suggest having a look at this thread on Github.
You could indeed consider using model.fit(), but it would make the training more stable to do it in such a way:
for epoch in range(20):
for path in ['xaa', 'xab', 'xac', 'xad']:
x_train, y_train = prepare_data(path)
model.fit(x_train, y_train, batch_size=50, epochs=epoch+1, initial_epoch=epoch, shuffle=True)
This way you are iterating over all your data once per epoch, and not iterating 20 epochs over part of your data before switching.
As discussed in the thread, another solution would be to develop your own data generator and use it with model.fit_generator().

Resources