TensorFlow is running out of GPU memory between runs of two models. The batch size doesn't seem to make a difference. I've tried the clear_session call seen in my example code below, as well as del model with gc.collect(), and tf.config.experimental.set_memory_growth(gpu, True).
A single model runs fine; it's only after one or two runs that the memory runs out.
import tensorflow as tf
import numpy as np
X_train = np.random.rand(1000000, 768)
X_test = np.random.rand(100, 768)
y_train = np.random.randint(0, 4, size=1000000)
y_test = np.random.randint(0, 4, size=100)
print(X_train.shape, y_train.shape)
y_train = tf.keras.utils.to_categorical(y_train, num_classes=5, dtype='float32')
y_test = tf.keras.utils.to_categorical(y_test, num_classes=5, dtype='float32')
for i in range(10):
    tf.keras.backend.clear_session()
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(768,), name='input'),
        tf.keras.layers.Dense(768, activation='gelu', name='dense1'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(96, activation='gelu', name='dense2'),
        tf.keras.layers.Dropout(0.05),
        tf.keras.layers.Dense(5, activation='softmax', name='output')
    ])
    opt = tf.keras.optimizers.Adam(learning_rate=0.0005)  # 0.0005 was best so far.
    es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)
    model.compile(optimizer=opt,
                  loss='categorical_crossentropy',
                  metrics=['categorical_accuracy'])
    model.fit(X_train, y_train, epochs=1, validation_split=0.1, callbacks=[es], batch_size=0)
    _, accuracy = model.evaluate(X_test, y_test, verbose=2)
Here's the beginning of the output:
2022-09-01 18:26:08.068402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6010 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2060 SUPER, pci bus id: 0000:2b:00.0, compute capability: 7.5
2022-09-01 18:26:09.076943: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 2764800000 exceeds 10% of free system memory.
28125/28125 [==============================] - 55s 2ms/step - loss: 1.3878 - categorical_accuracy: 0.2506 - val_loss: 1.3863 - val_categorical_accuracy: 0.2521
4/4 - 0s - loss: 1.3873 - categorical_accuracy: 0.2400 - 84ms/epoch - 21ms/step
2022-09-01 18:27:05.890074: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 2764800000 exceeds 10% of free system memory.
28106/28125 [============================>.] - ETA: 0s - loss: 1.3874 - categorical_accuracy: 0.24992022-09-01 18:28:07.971774: W tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.91MiB (rounded to 2000128)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Thanks for any help.
Edit: This is using TensorFlow 2.9.1 on Windows 10.
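For reference, this is roughly the cleanup pattern being attempted between runs, combined with the allocator suggestion from the warning above (a minimal sketch; build_and_train_model is a hypothetical helper wrapping the model definition and fit call shown earlier, and the environment variable must be set before TensorFlow initializes the GPU):
import os
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'  # suggested by the OOM warning; set before TF starts

import gc
import tensorflow as tf

# Grow GPU memory on demand instead of reserving it all up front.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

for i in range(10):
    model = build_and_train_model()           # hypothetical helper: build, compile, fit
    model.evaluate(X_test, y_test, verbose=2)
    # Cleanup between runs: drop the Python reference, reset the Keras state, force GC.
    del model
    tf.keras.backend.clear_session()
    gc.collect()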
Related
I wrote an MLP and want to start tuning it to get the best results, but I've gotten stuck on several different MSE values.
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn import metrics
import numpy
import joblib
# load dataset
#dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataframe = read_csv("100.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:6]
Y = dataset[:,6]
# define the model
def larger_model():
    # create model
    model = Sequential()
    model.add(Dense(20, input_dim=6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(50, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae','mse'])
    return model
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=100, batch_size=5, verbose=1)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=2)
results = cross_val_score(pipeline, X, Y, cv=kfold)
pipeline.fit(X, Y)
prediction = pipeline.predict(X)
result_test = Y
print("%.2f (%.2f) MSE" % (results.mean(), results.std()))
print('Mean Absolute Error:', metrics.mean_absolute_error(prediction, result_test))
print('Mean Squared Error:', metrics.mean_squared_error(prediction, result_test))
This gives me the following result:
Epoch 98/100
200/200 [==============================] - 0s 904us/step - loss: 0.0086 - mae: 0.0669 - mse: 0.0086
Epoch 99/100
200/200 [==============================] - 0s 959us/step - loss: 0.0032 - mae: 0.0382 - mse: 0.0032
Epoch 100/100
200/200 [==============================] - 0s 894us/step - loss: 0.0973 - mae: 0.2052 - mse: 0.0973
200/200 [==============================] - 0s 600us/step
21.959478
-0.03 (0.02) MSE
Mean Absolute Error: 0.1959771416462339
Mean Squared Error: 0.0705598179059006
So I see three different MSE results here. Why is that, and which one should I keep in mind as the overall model score when I tune it?
Basically, what I understood is that if you print the results variable you will get 2 MSE values, because you used n_splits=2.
-0.03 (0.02) MSE
The output above is the mean (average) of the results (MSE) and the standard deviation of the results (MSE).
Epoch 100/100
200/200 [==============================] - 0s 894us/step - loss: 0.0973 - mae: 0.2052 - mse: 0.0973
The output above gives mse = 0.0973; I think this is for split=2, and it only uses 50% of the whole data (X) because the remaining 50% is taken as validation data.
Mean Squared Error: 0.0705598179059006
The output above comes from predicting on the whole dataset (not 50% of it) using the fitted model, so you will obviously get 3 different MSEs from the 3 prints above.
I am also solving a very similar kind of problem, so do one thing: divide the dataset into train and test sets, use the train data for training, predict on the test set, and calculate MSE on the test data. Otherwise keep it as it is and take Mean Squared Error: 0.0705598179059006 as your final MSE.
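As a rough illustration of that suggestion, a minimal sketch of a held-out split (assuming X, Y and larger_model from the question; the split ratio is arbitrary and the scaling step from the pipeline is omitted):
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hold out 20% of the data purely for evaluation.
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=42)

model = larger_model()
model.fit(X_tr, Y_tr, epochs=100, batch_size=5, verbose=0)

# Report MSE only on data the model never saw during training.
test_mse = mean_squared_error(Y_te, model.predict(X_te).ravel())
print('Test MSE:', test_mse)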
I am trying to predict the forward dynamics of a physical system using neural networks. The equation is
f(X_t, A_t) = (X_{t+1} - X_t) / dt, where X is the state vector composed of angular positions and angular velocities, A_t is the control input, and dt is the small increment in time t. I have 10k data points.
My code:
import tensorflow as tf
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler, QuantileTransformer
np.set_printoptions(precision=3, suppress=True)
print(tf.__version__)
"""
Trying to find the increment in states, f_(theta), from the equation
x_{t+1} = x_t + dt * f_(theta)(x_t, a_t)
x: [q1, q2, q3, q4, q5, q6, q7, q1d, q2d, q3d, q4d, q5d, q6d, q7d] # q : angular positions, qd: ang velocities
a: [a1, a2, a3, a4, a5, a6, a7] # control input
"""
data = np.load('state_actions_10k.npy', allow_pickle=True)
X = np.hstack((data[:, :7], data[:, 7:14]))
U = data[:, 14:]
dX = np.diff(X, axis=0) # state residual
dt = 1
scalarX = StandardScaler() # MinMaxScaler(feature_range=(-1,1))#StandardScaler()# RobustScaler()
scalarU = MinMaxScaler(feature_range=(-1, 1))
scalardX = MinMaxScaler(feature_range=(-1, 1))
scalarX.fit(X)
scalarU.fit(U)
scalardX.fit(dX)
normX = scalarX.transform(X)
normU = scalarU.transform(U)
normdX = scalardX.transform(dX)
inputs = np.hstack((normX, normU))
inputs = inputs[:-1]
outputs = normdX
n, test_frac, train_frac = len(X), 0.2, 0.7
val_frac = 1 - test_frac - train_frac
X_train = inputs[:int(n*train_frac)]
X_test = inputs[int(n*train_frac):int(n*(train_frac+test_frac))]
X_val = inputs[int(n*(train_frac+test_frac)):]
Y_train = outputs[:int(n*train_frac)]
Y_test = outputs[int(n*train_frac):int(n*(train_frac+test_frac))]
Y_val = outputs[int(n*(train_frac+test_frac)):]
model = tf.keras.Sequential([
    tf.keras.Input(shape=(21,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(14),
])
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
train = True
if train:
    history = model.fit(
        X_train,
        Y_train,
        batch_size=64,
        epochs=100,
        # We pass some validation data for
        # monitoring validation loss and metrics
        # at the end of each epoch
        validation_data=(X_val, Y_val),
    )
    model.save_weights('save_weights/trained_weights')
else:
    model.load_weights('save_weights/trained_weights')
print("Evaluate on test data")
results = model.evaluate(X_test, Y_test, batch_size=128)
print("test loss, test acc:", results)
print("Generate predictions for 3 samples")
predictions = model.predict(X_test[:3])
print("predictions shape:", predictions.shape)
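To relate the network output back to the dynamics equation above, here is a rough sketch of a single prediction step (assuming the scalers, model, X, U and dt defined in the code; the index t = 0 is arbitrary):
t = 0
inp = np.hstack((scalarX.transform(X[t:t+1]), scalarU.transform(U[t:t+1])))  # shape (1, 21)
pred_norm_dX = model.predict(inp)                    # normalised state residual, shape (1, 14)
pred_dX = scalardX.inverse_transform(pred_norm_dX)   # back to raw units
x_next_pred = X[t] + dt * pred_dX[0]                 # x_{t+1} = x_t + dt * f(x_t, a_t)
print(np.abs(x_next_pred - X[t + 1]).max())          # compare against the true next state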
I see that the validation accuracy is 0.9 at the end, which I think is quite low. The state_actions.npy file and the saved network weights are available here.
Epoch 100/100
1094/1094 [==============================] - 1s 1ms/step - loss: 9.5670e-05 - accuracy: 0.9777 - val_loss: 0.0014 - val_accuracy: 0.9055
Evaluate on test data
157/157 [==============================] - 0s 759us/step - loss: 2.5453e-04 - accuracy: 0.9586
test loss, test acc: [0.0002545334573369473, 0.9585979580879211]
Can someone suggest a method to improve the accuracy?
I am relatively new to PyTorch and Huggingface Transformers and experimented with DistilBertForSequenceClassification on this Kaggle dataset.
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer
from transformers import get_linear_schedule_with_warmup
import torch
import torch.optim as optim
import torch.nn as nn

n_epochs = 5  # or whatever
batch_size = 32  # or whatever

bert_distil = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
#bert_distil.classifier = nn.Sequential(nn.Linear(in_features=768, out_features=1), nn.Sigmoid())
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(bert_distil.parameters(), lr=0.1)

X_train = []
Y_train = []
for row in train_df.iterrows():
    seq = tokenizer.encode(preprocess_text(row[1]['text']), add_special_tokens=True, pad_to_max_length=True)
    X_train.append(torch.tensor(seq).unsqueeze(0))
    Y_train.append(torch.tensor([row[1]['target']]).unsqueeze(0))
X_train = torch.cat(X_train)
Y_train = torch.cat(Y_train)

running_loss = 0.0
bert_distil.cuda()
bert_distil.train(True)
for epoch in range(n_epochs):
    permutation = torch.randperm(len(X_train))
    j = 0
    for i in range(0, len(X_train), batch_size):
        optimizer.zero_grad()
        indices = permutation[i:i+batch_size]
        batch_x, batch_y = X_train[indices], Y_train[indices]
        batch_x.cuda()
        batch_y.cuda()
        outputs = bert_distil.forward(batch_x.cuda())
        loss = criterion(outputs[0], batch_y.squeeze().cuda())
        loss.requires_grad = True
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        j += 1
        if j == 20:
            #print(outputs[0])
            print('[%d, %5d] running loss: %.3f loss: %.3f ' %
                  (epoch + 1, i*1, running_loss / 20, loss.item()))
            running_loss = 0.0
            j = 0
[1, 608] running loss: 0.689 loss: 0.687
[1, 1248] running loss: 0.693 loss: 0.694
[1, 1888] running loss: 0.693 loss: 0.683
[1, 2528] running loss: 0.689 loss: 0.701
[1, 3168] running loss: 0.690 loss: 0.684
[1, 3808] running loss: 0.689 loss: 0.688
[1, 4448] running loss: 0.689 loss: 0.692 etc...
Regardless of what I tried, the loss never decreased (or it even increased), nor did the predictions get better. It seems to me that I forgot something, so the weights are actually not updated. Does someone have an idea?
What I tried:
Different loss functions
BCE
CrossEntropy
even MSE-loss
One-Hot Encoding vs A single neuron output
Different learning rates, and optimizers
I even changed all the targets to only one single label, but even then the network didn't converge.
Looking at the running loss and minibatch loss is easily misleading. You should look at the epoch loss, because the inputs are the same for every epoch loss.
Besides, there are some problems in your code; after fixing all of them the behavior is as expected: the loss slowly decreases after each epoch, and it can also overfit a small minibatch. Please look at the code; the changes include using model(x) instead of model.forward(x), calling cuda() only once, a smaller learning rate, etc.
Tuning and fine-tuning ML models is difficult work.
n_epochs = 5
batch_size = 1
bert_distil = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(bert_distil.parameters(), lr=1e-3)

X_train = []
Y_train = []
for row in train_df.iterrows():
    seq = tokenizer.encode(row[1]['text'], add_special_tokens=True, pad_to_max_length=True)[:100]
    X_train.append(torch.tensor(seq).unsqueeze(0))
    Y_train.append(torch.tensor([row[1]['target']]))
X_train = torch.cat(X_train)
Y_train = torch.cat(Y_train)

running_loss = 0.0
bert_distil.cuda()
bert_distil.train(True)
for epoch in range(n_epochs):
    permutation = torch.randperm(len(X_train))
    for i in range(0, len(X_train), batch_size):
        optimizer.zero_grad()
        indices = permutation[i:i+batch_size]
        batch_x, batch_y = X_train[indices].cuda(), Y_train[indices].cuda()
        outputs = bert_distil(batch_x)
        loss = criterion(outputs[0], batch_y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print('[%d] epoch loss: %.3f' %
          (epoch + 1, running_loss / len(X_train) * batch_size))
    running_loss = 0.0
Output:
[1] epoch loss: 0.695
[2] epoch loss: 0.690
[3] epoch loss: 0.687
[4] epoch loss: 0.685
[5] epoch loss: 0.684
I would highlight two possible reasons for your "stable" results:
I agree that the learning rate is surely too high, which prevents the model from making any significant updates.
But what is important to know is that, based on state-of-the-art papers, fine-tuning has a very marginal effect on the core NLP abilities of Transformers. For example, the paper says that fine-tuning only applies really small weight changes. Citing it: "Finetuning barely affects accuracy on NEL, COREF and REL indicating that those tasks are already sufficiently covered by pre-training". Several papers suggest that fine-tuning for classification tasks is basically a waste of time. Thus, considering that DistilBert is actually a student model of BERT, maybe you won't get better results. Try pre-training with your data first. Generally, pre-training has a more significant impact.
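For illustration, continued pre-training with the masked-language-modeling objective can be sketched roughly as follows (a hedged sketch only, not the answer's exact setup; texts is a hypothetical list of raw domain strings, and the hyperparameters are placeholders):
import torch
from transformers import (DistilBertForMaskedLM, DistilBertTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
mlm_model = DistilBertForMaskedLM.from_pretrained('distilbert-base-uncased')

# texts: hypothetical list of raw strings from the target domain
encodings = tokenizer(texts, truncation=True, padding=True, max_length=128)

class DomainDataset(torch.utils.data.Dataset):
    def __init__(self, input_ids):
        self.input_ids = input_ids
    def __len__(self):
        return len(self.input_ids)
    def __getitem__(self, idx):
        return torch.tensor(self.input_ids[idx])

# Randomly mask 15% of the tokens; the model is trained to reconstruct them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
args = TrainingArguments(output_dir='mlm_ckpt', num_train_epochs=1, per_device_train_batch_size=16)
Trainer(model=mlm_model, args=args, data_collator=collator,
        train_dataset=DomainDataset(encodings['input_ids'])).train()
# The adapted encoder weights can then be loaded into DistilBertForSequenceClassification for fine-tuning.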
I got a similar problem when I tried to use xxxForSequenceClassification to fine-tune my downstream task.
In the end, I changed xxxForSequenceClassification to xxxModel and added Dropout - FC - Softmax. Magically it was solved; the loss decreased as expected.
I'm still trying to find out why.
Hope it may help you.
FYI, transformers version: 3.5.0
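The Dropout - FC head described above might look roughly like this (a minimal sketch with made-up names, assuming two target classes as in the Kaggle dataset):
import torch
import torch.nn as nn
from transformers import DistilBertModel

class DistilBertWithHead(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.distilbert = DistilBertModel.from_pretrained('distilbert-base-uncased')
        self.dropout = nn.Dropout(0.3)
        self.fc = nn.Linear(768, n_classes)        # 768 = DistilBERT hidden size

    def forward(self, input_ids):
        hidden = self.distilbert(input_ids)[0]     # (batch, seq_len, 768)
        cls = hidden[:, 0]                         # representation at the [CLS] position
        return self.fc(self.dropout(cls))          # raw logits

# nn.CrossEntropyLoss applies log-softmax internally, so the explicit Softmax from the
# description is left out here; the training loop stays the same as in the answer above.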
Maybe the poor performance is due to gradients being applied to the BERT backbone. Validate it like so:
print([p.requires_grad for p in bert_distil.distilbert.parameters()])
As an alternative solution, try freezing the weights of your trained model:
for param in bert_distil.distilbert.parameters():
    param.requires_grad = False
As you are trying to optimize the weights of a trained model during fine-tuning on your data, you face issues described, among other sources, in the ULMFiT paper (https://arxiv.org/abs/1801.06146).
I would like to compute R2 = 1 - residual_ss / y_ss after training a Keras model. I used model.predict() to compute residual_ss. However, residual_ss is much larger than y_ss, which results in a negative R2. Since residual_ss = n * mse, and mse is also the loss function, the code below shows the computation of mse after fitting the model:
import keras
keras.__version__
from keras.datasets import boston_housing
import pandas as pd
import numpy as np
(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
mean = train_data.mean(axis=0)
train_data -= mean
std = train_data.std(axis=0)
train_data /= std
test_data -= mean
test_data /= std
from keras import models
from keras import layers
def build_model():
    # Because we will need to instantiate
    # the same model multiple times,
    # we use a function to construct it.
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu',
                           input_shape=(train_data.shape[1],)))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model
model=build_model()
model.fit(train_data, train_targets, epochs=200, batch_size=32)
#try to get mse
y_pred = model.predict(train_data)
mse=np.mean((train_targets-y_pred)*(train_targets-y_pred))
print(mse)
Here are the last 3 epochs and the mse at the end:
Epoch 198/200
404/404 [=======] - 0s 17us/step - loss: 3.4695 - mean_absolute_error: 1.3338
Epoch 199/200
404/404 [=======] - 0s 22us/step - loss: 3.5412 - mean_absolute_error: 1.3260
Epoch 200/200
404/404 [=======] - 0s 20us/step - loss: 3.2775 - mean_absolute_error: 1.2858
162.25934358457062
I only use train_data and train_targets here. Why did I get an mse that is not even close to the loss (mse) reported in each epoch? It looks as if the predictions are not close to the targets. Please help.
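For reference, the R2 computation described at the top can be written out as follows (a minimal sketch, assuming model, train_data and train_targets from the code above; note that model.predict returns an array of shape (n, 1), so it is flattened before the element-wise arithmetic):
y_pred = model.predict(train_data).flatten()       # shape (n,), matching train_targets
residual_ss = np.sum((train_targets - y_pred) ** 2)
y_ss = np.sum((train_targets - train_targets.mean()) ** 2)

mse = residual_ss / len(train_targets)             # residual_ss = n * mse
r2 = 1 - residual_ss / y_ss
print(mse, r2)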
I am doing some image classification using the inception_v3 model in Keras; however, my training accuracy is lower than the validation accuracy during the whole training process, and my validation accuracy is above 0.95 from the first epoch. I also find that the training loss is much higher than the validation loss. In the end, the test accuracy is 0.5, which is pretty bad.
At first, my optimizer was Adam with a learning rate of 0.00001, and the result was bad. Then I changed it to SGD with a learning rate of 0.00001, which didn't change the bad result. I also tried increasing the learning rate to 0.1, but the test accuracy is still around 0.5.
import numpy as np
import pandas as pd
import keras
from keras import layers
from keras.applications.inception_v3 import preprocess_input
from keras.models import Model
from keras.layers.core import Dense
from keras.layers import GlobalAveragePooling2D
from keras.optimizers import Adam, SGD, RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical
from keras.utils import plot_model
from keras.models import model_from_json
from sklearn.metrics import confusion_matrix
import itertools
import matplotlib.pyplot as plt
import math
import copy
import pydotplus
train_path = 'data/train'
valid_path = 'data/validation'
test_path = 'data/test'
top_model_weights_path = 'model_weigh.h5'
# number of epochs to train top model
epochs = 100
# batch size used by flow_from_directory and predict_generator
batch_size = 2
img_width, img_height = 299, 299
fc_size = 1024
nb_iv3_layers_to_freeze = 172
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
rotation_range=30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
# this is the augmentation configuration we will use for testing:
# only rescaling
valid_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
rotation_range=30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
train_batches = train_datagen.flow_from_directory(train_path,
                                                  target_size=(img_width, img_height),
                                                  classes=None,
                                                  class_mode='categorical',
                                                  batch_size=batch_size,
                                                  shuffle=True)
valid_batches = valid_datagen.flow_from_directory(valid_path,
                                                  target_size=(img_width, img_height),
                                                  classes=None,
                                                  class_mode='categorical',
                                                  batch_size=batch_size,
                                                  shuffle=True)
test_batches = ImageDataGenerator().flow_from_directory(test_path,
                                                        target_size=(img_width, img_height),
                                                        classes=None,
                                                        class_mode='categorical',
                                                        batch_size=batch_size,
                                                        shuffle=False)
nb_train_samples = len(train_batches.filenames)
# get the size of the training set
nb_classes_train = len(train_batches.class_indices)
# get the number of classes
predict_size_train = int(math.ceil(nb_train_samples / batch_size))
nb_valid_samples = len(valid_batches.filenames)
nb_classes_valid = len(valid_batches.class_indices)
predict_size_validation = int(math.ceil(nb_valid_samples / batch_size))
nb_test_samples = len(test_batches.filenames)
nb_classes_test = len(test_batches.class_indices)
predict_size_test = int(math.ceil(nb_test_samples / batch_size))
def add_new_last_layer(base_model, nb_classes):
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(fc_size, activation='relu')(x)
    pred = Dense(nb_classes, activation='softmax')(x)
    model = Model(input=base_model.input, output=pred)
    return model

# freeze base_model layer in order to get the bottleneck feature
def setup_to_transfer_learn(model, base_model):
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(optimizer=Adam(lr=0.00001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
base_model = keras.applications.inception_v3.InceptionV3(weights='imagenet', include_top=False)
model = add_new_last_layer(base_model, nb_classes_train)
setup_to_transfer_learn(model, base_model)
model.summary()
train_labels = train_batches.classes
train_labels = to_categorical(train_labels, num_classes=nb_classes_train)
validation_labels = valid_batches.classes
validation_labels = to_categorical(validation_labels, num_classes=nb_classes_train)
history = model.fit_generator(train_batches,
epochs=epochs,
steps_per_epoch=nb_train_samples // batch_size,
validation_data=valid_batches,
validation_steps=nb_valid_samples // batch_size,
class_weight='auto')
# save model to json
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
# serialize model to HDF5
model.save_weights(top_model_weights_path)
print("Saved model to disk")
# model visualization
plot_model(model,
show_shapes=True,
show_layer_names=True,
to_file='model.png')
(eval_loss, eval_accuracy) = model.evaluate_generator(
valid_batches,
steps=nb_valid_samples // batch_size,
verbose=1)
print("[INFO] evaluate accuracy: {:.2f}%".format(eval_accuracy * 100))
print("[INFO] evaluate loss: {}".format(eval_loss))
test_batches.reset()
predictions = model.predict_generator(test_batches,
steps=nb_test_samples / batch_size,
verbose=0)
# print(predictions)
predicted_class_indices = np.argmax(predictions, axis=1)
# print(predicted_class_indices)
labels = train_batches.class_indices
labels = dict((v, k) for k, v in labels.items())
final_predictions = [labels[k] for k in predicted_class_indices]
# print(final_predictions)
# save as csv file
filenames = test_batches.filenames
results = pd.DataFrame({"Filename": filenames,
"Predictions": final_predictions})
results.to_csv("results.csv", index=False)
# evaluation test result
(test_loss, test_accuracy) = model.evaluate_generator(
test_batches,
steps=nb_train_samples // batch_size,
verbose=1)
print("[INFO] test accuracy: {:.2f}%".format(test_accuracy * 100))
print("[INFO] test loss: {}".format(test_loss))
Here is a brief summary of the training process:
Epoch 1/100
2000/2000 [==============================] - 146s 73ms/step - loss: 0.4941 - acc: 0.7465 - val_loss: 0.1612 - val_acc: 0.9770
Epoch 2/100
2000/2000 [==============================] - 140s 70ms/step - loss: 0.4505 - acc: 0.7725 - val_loss: 0.1394 - val_acc: 0.9765
Epoch 3/100
2000/2000 [==============================] - 139s 70ms/step - loss: 0.4505 - acc: 0.7605 - val_loss: 0.1643 - val_acc: 0.9560
......
Epoch 98/100
2000/2000 [==============================] - 141s 71ms/step - loss: 0.1348 - acc: 0.9467 - val_loss: 0.0639 - val_acc: 0.9820
Epoch 99/100
2000/2000 [==============================] - 140s 70ms/step - loss: 0.1495 - acc: 0.9365 - val_loss: 0.0780 - val_acc: 0.9770
Epoch 100/100
2000/2000 [==============================] - 138s 69ms/step - loss: 0.1401 - acc: 0.9458 - val_loss: 0.0471 - val_acc: 0.9890
Here is the result that I get:
[INFO] evaluate accuracy: 98.55%
[INFO] evaluate loss: 0.05201659869024259
2000/2000 [==============================] - 47s 23ms/step
[INFO] test accuracy: 51.70%
[INFO] test loss: 7.737395915810134
I hope someone can help me deal with this problem.
As the code stands now, you're not freezing the layers of the model for transfer learning. In setup_to_transfer_learn you're freezing the layers in base_model and then compiling the new model (containing layers from the base model), but not actually freezing them on the new model. Just change setup_to_transfer_learn:
def setup_to_transfer_learn(model):
    for layer in model.layers[:-3]:  # since you added three new layers (which should not freeze)
        layer.trainable = False
    model.compile(optimizer=Adam(lr=0.00001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
Then call the function like this:
model = add_new_last_layer(base_model, nb_classes_train)
setup_to_transfer_learn(model)
You should see a large difference in the number of trainable parameters when calling model.summary()
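To make that check concrete, the trainable parameters can be counted before and after freezing like this (a small sketch using the Keras backend, applied to the model variable above):
import numpy as np
from keras import backend as K

def count_trainable_params(model):
    # Sum the element counts of all weights that will actually be updated.
    return int(np.sum([K.count_params(w) for w in model.trainable_weights]))

print('trainable params before freezing:', count_trainable_params(model))
setup_to_transfer_learn(model)
print('trainable params after freezing:', count_trainable_params(model))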
Finally, I solved the problem. I forgot to apply image preprocessing to my test data. After I added this, everything worked fine.
I changed this:
test_batches = ImageDataGenerator().flow_from_directory(test_path,
target_size=(img_width, img_height),
classes=None,
class_mode='categorical',
batch_size=batch_size,
shuffle=False)
to this:
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
test_batches = test_datagen.flow_from_directory(test_path,
target_size=(img_width, img_height),
classes=None,
class_mode='categorical',
batch_size=batch_size,
shuffle=False)
And the test accuracy is 0.98, test loss is 0.06.
What actually happens is that when you train with a preprocessing function, the model learns features of the preprocessed inputs, so the same preprocessing has to be applied at test time. One way to check whether your model is learning good features is to use Grad-CAM.
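A rough Grad-CAM sketch in tf.keras, for checking what the network attends to (assumptions: a functional InceptionV3-based model whose last convolutional block is named 'mixed10', and img_array already preprocessed with preprocess_input and batched to shape (1, 299, 299, 3)):
import tensorflow as tf

def grad_cam(model, img_array, last_conv_name='mixed10'):
    # Map the input both to the last conv feature map and to the predictions.
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(img_array)
        class_idx = tf.argmax(preds[0])
        class_score = preds[:, class_idx]
    grads = tape.gradient(class_score, conv_out)         # d(score) / d(feature map)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))      # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)  # weighted sum over channels
    cam = tf.nn.relu(cam)
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()   # heatmap normalised to [0, 1]

# Upsampling the heatmap to 299x299 and overlaying it on the input image shows
# which regions drive the prediction.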