Is there a way to reload the weights from a certain epoch or the best weights from the model checkpoint files created by ModelCheckpoint once the training is over?
I have trained a model for 10 epochs with a checkpoint callback that only saved weights after each epoch. The final epoch's val_categorical_accuracy is a bit lower than that of epoch 5. I know I should have set save_best_only=True, but I missed that.
So now, is there a way to get the weights from the best epoch or the epoch number 5?
Also, does ModelCheckpoint overwrite the weights in the checkpoint file after each epoch?
What are my options here? Thanks for your help in advance.
Below is my implementation:
checkpoint_path = 'saved_model/cp.ckpt'
checkpoint_dir = os.path.dirname(checkpoint_path)
print(checkpoint_dir)

lstm_model.fit(X_train_seq_pad, y_train_cat,
               epochs=100,
               validation_data=(X_val_seq_pad, y_val_cat),
               callbacks=[callbacks.EarlyStopping(monitor='val_loss', patience=3),
                          callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                    save_weights_only=True,
                                                    verbose=1)])
If filepath doesn't contain formatting options like {epoch}, it will be overwritten by each newly saved model. In your case, that's why you can't get the weights from a specific epoch (e.g. epoch 5).
Your option, however, is to use such a formatting option in the ModelCheckpoint callback at training time, for example:
tf.keras.callbacks.ModelCheckpoint(
    filepath='model.{epoch:02d}-{val_loss:.4f}.h5',
    save_freq='epoch', verbose=1, monitor='val_loss',
    save_weights_only=True, save_best_only=False
)
This will save the model weights (in .h5 format) at each epoch, each to a differently named file. Additionally, if we set save_best_only to True, it will save only the best weights in the same way.
Code Example
Here is one end-to-end working example for reference. We will save the model weights at each epoch by using a formatting option in the filepath parameter, as follows:
import numpy as np
import tensorflow as tf

img = tf.random.normal([20, 32], 0, 1, tf.float32)
tar = np.random.randint(2, size=(20, 1))

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(10, input_dim=32, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

callback_list = [
    tf.keras.callbacks.ModelCheckpoint(
        filepath='model.{epoch:02d}-{val_loss:.4f}.h5',
        save_freq='epoch', verbose=1, monitor='val_loss',
        save_weights_only=True, save_best_only=False
    )
]

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(img, tar, epochs=5, verbose=2, validation_split=0.2,
          callbacks=callback_list)
It will save the model weights at each epoch, and afterwards I will find every weight file on my local disk:
# model.epoch_number_score.h5
model.01-0.8022.h5
model.02-0.8014.h5
model.03-0.8005.h5
model.04-0.7997.h5
model.05-0.7989.h5
However, note that I used save_best_only=False; if we set it to True, we only get the best weights, named in the same way. Something like this:
# model.epoch_number_score.h5
model.01-0.8022.h5
model.03-0.8005.h5
model.05-0.7989.h5
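Once those files are on disk, getting back the weights from any particular epoch (e.g. epoch 5 from the original question) is just a matter of calling load_weights on a model with the same architecture. A minimal sketch, reusing the model and data from the example above and the file names listed here:

# Rebuild (or reuse) a model with the same architecture as the one that was
# trained, then load the weights saved at the epoch you want.
model.load_weights('model.05-0.7989.h5')

# Sanity-check the restored weights, e.g. by re-evaluating them.
loss, acc = model.evaluate(img, tar, verbose=0)
print(loss, acc)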
Related
I am new to deep learning and doing some classification problems.
I use EarlyStopping and ModelCheckpoint in my callbacks list, but when training starts, the baseline of the model checkpoint is negative infinity and it overwrites 'best_model.h5'.
However, 'best_model.h5' already stores my last best model. I want to set the baseline of ModelCheckpoint to the performance of my last best model on the data.
Can anyone help me?
es = EarlyStopping(monitor='val_accuracy', mode='max', verbose=1, patience=3)
mc = ModelCheckpoint('best_model.h5', monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_valid, y_valid), batch_size=400,
          epochs=20, callbacks=[es, mc])
Do this:
mc = ModelCheckpoint('best_model-{epoch:04d}_{val_accuracy:.2f}.h5', monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
This will save each new best model with its epoch number and validation accuracy in the file name, without overwriting best_model.h5. This should later help you pick and compare the best models.
I think your problem is that you want a baseline val_accuracy before the first epoch. Going back to the mechanics of a general machine learning problem, I don't think the accuracy value before the first iteration makes sense as a comparison point (your model has not yet been trained on the given dataset). If you really want a baseline, you could monitor the validation loss (val_loss) instead, if possible.
But if you want to keep a log of your training process, you don't need to save the model at each epoch. You can use the history returned by fit() and plot it with matplotlib:
import matplotlib.pyplot as plt

results = model.fit(x_train, y_train, validation_data=(x_valid, y_valid),
                    batch_size=400, epochs=20, callbacks=[es, mc])

plt.figure(figsize=(8, 8))
plt.title("Learning curve")
plt.plot(results.history["loss"], label="loss")
plt.plot(results.history["val_loss"], label="val_loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.savefig('loss.png')

plt.figure(figsize=(8, 8))
plt.title("Learning curve")
# Depending on your Keras version, the keys may be "accuracy"/"val_accuracy"
plt.plot(results.history["acc"], label="accuracy")
plt.plot(results.history["val_acc"], label="val_accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.savefig('acc.png')
I need an explanation of the save_best_only option of ModelCheckpoint. If I have code like this
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
cp = [ModelCheckpoint(filepath=path+"/model-lstmMulti", verbose=1, save_best_only=True)]
history_callback = model.fit(X, y, epochs=350, verbose=1, callbacks=cp)
and then I want to see the accuracy of that best model:
acc_history = history_callback.history["acc"]
np.savetxt(path+"/acc_history.txt", np.asarray(acc_history))
I get an array, i.e. the accuracy of the model at every epoch. Why don't I get only one value, the accuracy of the best model?
ModelCheckpoint is a callback used to save the model file (h5) after each epoch. It doesn't affect the history returned by the fit() method. Just use np.max to get the best accuracy from the accuracy history; that will do your job.
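For example, a minimal sketch based on the history_callback above (np.argmax additionally tells you at which epoch the best value occurred):

import numpy as np

acc_history = history_callback.history["acc"]
best_acc = np.max(acc_history)             # best accuracy over all epochs
best_epoch = np.argmax(acc_history) + 1    # +1 because epochs are numbered from 1 in the logs
print("Best accuracy %.4f at epoch %d" % (best_acc, best_epoch))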
I am trying to implement a 5 class animal classifier using Keras. I am building the CNN from scratch and the weird thing is, the validation accuracy stays constant at 0.20 for all epochs. Any idea why this is happening? The dataset folder contains train, test and validation folders. And each of the folders contains 5 folders corresponding to the 5 classes. What am I doing wrong?
I have tried multiple optimizer but the problem persists. I have included the code sample below.
import warnings
warnings.filterwarnings("ignore")
#First convolution layer
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Second convolution layer
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Flatten the outputs of the convolution layer into a 1D contiguous array
model.add(Flatten())
#Add a fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add another fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add the output layer containing 5 neurons, because we have 5 categories
model.add(Dense(5, activation='softmax',kernel_initializer='glorot_uniform'))
optim=RMSprop(lr=1e-6)
model.compile(loss='categorical_crossentropy',optimizer=optim,metrics=['accuracy'])
model.summary()
#We will use the below code snippet for rescaling the images to 0-1 for all the train and test images
train_datagen = ImageDataGenerator(rescale=1./255)
#We won't augment the test data. We will just use ImageDataGenerator to rescale the images.
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_data_dir,
                                                    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                    target_size=(img_width, img_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=False)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
                                                        classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                        target_size=(img_width, img_height),
                                                        batch_size=batch_size,
                                                        class_mode='categorical',
                                                        shuffle=False)
hist=History()
model.fit_generator(train_generator,
                    steps_per_epoch=nb_train_samples // batch_size,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=nb_validation_samples // batch_size,
                    callbacks=[hist])
#Save the model. Load it back later with:
#  from keras.models import load_model
#  model = load_model('models/basic_cnn_from_scratch_model.h5')
model.save('models/basic_cnn_from_scratch_model.h5')
print("Time taken to train the baseline model from scratch: ", datetime.now() - global_start)
Check the following for your data:
Shuffle the training data well (I see shuffle=False everywhere)
Properly normalize all data (I see you are doing rescale=1./255, maybe okay)
Proper train/val split (you seem to be doing that too)
Suggestions for your model:
Use multiple Conv2D layers followed by a final Dense. That's what works best for image classification problems. You can also look at popular architectures that are tried and tested; e.g. AlexNet
You can change the optimizer to Adam and try different learning rates (see the sketch after these suggestions)
Have a look at your training and validation loss graphs and see if they look as expected
Also, I guess you corrected the shape of the 2nd Conv2D layer as mentioned in the comments.
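For instance, a minimal sketch that applies two of these suggestions (shuffling the training data and switching to Adam) to the question's code; the learning rate of 1e-4 is only an assumption to start experimenting with:

from keras.optimizers import Adam

# Shuffle the training data (the original generator used shuffle=False).
train_generator = train_datagen.flow_from_directory(train_data_dir,
                                                    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                    target_size=(img_width, img_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=True)

# Recompile with Adam and a hand-picked learning rate.
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['accuracy'])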
It looks as if your output is always the same animal, and thus you get 20% accuracy. I highly recommend that you check your test outputs to see whether they are all the same (a quick way to check is sketched below).
Also, you said you were building a CNN, but in the code snippet you posted I see only dense layers; it is going to be hard for a dense architecture to do this task, and it is very small. What is the size of your pictures?
Hope it helps!
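To check whether the predictions collapse to a single class, something like this sketch would do; validation_generator, nb_validation_samples and batch_size are taken from the question's code:

import numpy as np

# Predict on the validation set and look at the distribution of predicted classes.
preds = model.predict_generator(validation_generator,
                                steps=nb_validation_samples // batch_size)
pred_classes = np.argmax(preds, axis=1)
# If one class accounts for nearly all predictions, the model has collapsed.
print(np.unique(pred_classes, return_counts=True))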
The model seems to be working now. I have removed the shuffle=False attribute, corrected the input shape for the 2nd convolution layer, and changed the optimizer to adam. I have reached a validation accuracy of almost 94%. However, I have not yet tested the model on unseen data. There is a bit of overfitting, so I will have to use some aggressive dropout to reduce it. Thanks!
I'm using mobilenet v2 to train a model on my images. I've frozen all but a few layers and then added additional layers for training. I'd like to be able to train from an intermediate layer rather than from the beginning. My questions:
Is it possible to provide the output of the last frozen layer as the input for training (it would be a tensor of shape (?, 7, 7, 1280))?
How does one specify training to start from that first trainable (non-frozen) layer? In this case, mbnetv2_conv.layer[153].
What is y_train in this case? I don't quite understand how y_train is being used during the training process; in general, when does the CNN refer back to y_train?
# Load MobileNetV2
image_size = 224
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))
# Freeze all layers except the last 3 layers
for layer in mbnetv2_conv.layers[:-3]:
    layer.trainable = False
# Create the model
model = models.Sequential()
model.add(mbnetv2_conv)
model.add(layers.Flatten())
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(3, activation='softmax'))
model.summary()
# Build an array (?,224,224,3) from images
x_train = np.array(all_images)
# Get layer output
from keras import backend as K
get_last_frozen_layer_output = K.function([mbnetv2_conv.layers[0].input],
                                           [mbnetv2_conv.layers[152].output])
last_frozen_layer_output = get_last_frozen_layer_output([x_train])[0]
# Compile the model
from keras.optimizers import SGD
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['acc'])
# how to train from a specific layer and what should y_train be?
model.fit(last_frozen_layer_output, y_train, batch_size=2, epochs=10)
Yes, you can. Two different ways.
First, the hard way: build two new models, one with all your frozen layers and one with all your trainable layers, and add a Flatten() layer to the frozen-layers-only model. Copy the weights from MobileNetV2 layer by layer to populate the weights of the frozen-layers-only model. Then run your input images through the frozen-layers-only model, saving the output to disk in CSV or pickle form. That output is now the input for your trainable-layers model, which you train with the model.fit() command as you did above; save the weights when you're done training. Finally, rebuild the original model with both sets of layers, load the weights into each layer, and save the whole thing. You're done! (A sketch of this idea follows.)
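A minimal sketch of that idea, realized with a Keras sub-model instead of copying weights layer by layer; the layer index 152, the head layers, and the arrays x_train / y_train are taken from the question, so treat this as an illustration rather than a drop-in:

from keras.models import Model, Sequential
from keras.layers import Flatten, Dense, Dropout

# Sub-model covering only the frozen part, up to the last frozen layer.
feature_extractor = Model(inputs=mbnetv2_conv.input,
                          outputs=mbnetv2_conv.layers[152].output)

# Run the images through the frozen part once and cache the features
# (shape as described in the question, e.g. (?, 7, 7, 1280)).
frozen_features = feature_extractor.predict(x_train)

# Small trainable head that consumes the cached features.
head = Sequential()
head.add(Flatten(input_shape=frozen_features.shape[1:]))
head.add(Dense(16, activation='relu'))
head.add(Dropout(0.5))
head.add(Dense(3, activation='softmax'))
head.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['acc'])

# y_train holds one label per image in x_train (here, one-hot vectors over the 3 classes).
head.fit(frozen_features, y_train, batch_size=2, epochs=10)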
However, the easier way is to save the weights of your model separately from the architecture with:
model.save_weights(filename)
then modify the layer.trainable property of the layers in MobileNetV2 before you add it into a new empty model:
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))
for layer in mbnetv2_conv.layers[:153]:
    layer.trainable = False
model = models.Sequential()
model.add(mbnetv2_conv)
then reload the weights with
model.load_weights(filename)
This lets you adjust which layers in your mbnetv2_conv model you will be training on the fly, and then just call model.fit() to continue training.
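Put together, the whole flow looks roughly like this sketch; the weights file name is just a placeholder, and note that the rebuilt model needs the same head layers as the original before the saved weights can be loaded:

# 1. Save the weights of the model that was already trained above.
model.save_weights('mbnetv2_weights.h5')

# 2. Rebuild the same architecture, this time freezing layers up to index 153.
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False,
                           input_shape=(image_size, image_size, 3))
for layer in mbnetv2_conv.layers[:153]:
    layer.trainable = False

model = models.Sequential()
model.add(mbnetv2_conv)
model.add(layers.Flatten())
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(3, activation='softmax'))

# 3. Reload the saved weights and continue training end to end.
model.load_weights('mbnetv2_weights.h5')
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['acc'])  # sgd as defined in the question
model.fit(x_train, y_train, batch_size=2, epochs=10)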
I am a newbie in ML and was experimenting with emotion detection on text.
I have the ISEAR dataset, which contains tweets labeled with their emotion.
My current accuracy is 63% and I want to increase it to at least 70%, or even more.
Here's the code:
inputs = Input(shape=(MAX_LENGTH, ))
embedding_layer = Embedding(vocab_size,
                            64,
                            input_length=MAX_LENGTH)(inputs)
# x = Flatten()(embedding_layer)
x = LSTM(32, input_shape=(32, 32))(embedding_layer)
x = Dense(10, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.summary()
filepath="weights-simple.hdf5"
checkpointer = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
history = model.fit([X_train], batch_size=64, y=to_categorical(y_train), verbose=1, validation_split=0.1,
                    shuffle=True, epochs=10, callbacks=[checkpointer])
That's a pretty general question; optimizing the performance of a neural network may require tuning many factors.
For instance:
The optimizer chosen: in NLP tasks rmsprop is also a popular optimizer
Tweaking the learning rate
Regularization - e.g. dropout, recurrent_dropout, batch norm. This may help the model generalize better
More units in the LSTM
More dimensions in the embedding
You can try a grid search, e.g. using different optimizers, and evaluate on a validation set; a sketch with some of these tweaks applied is below.
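For instance, a minimal sketch of the model with some of these tweaks applied (more LSTM units, a larger embedding, dropout and recurrent_dropout, and rmsprop); the exact values are just assumptions to experiment with, and the imports are the same as in the question's code plus Dropout:

inputs = Input(shape=(MAX_LENGTH, ))
x = Embedding(vocab_size, 128, input_length=MAX_LENGTH)(inputs)  # larger embedding
x = LSTM(64, dropout=0.2, recurrent_dropout=0.2)(x)              # more units + regularization
x = Dense(10, activation='relu')(x)
x = Dropout(0.5)(x)                                              # extra dropout on the dense part
predictions = Dense(num_class, activation='softmax')(x)

model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['acc'])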
The data may also need some tweaking, such as:
Text normalization - a better representation of the tweets - remove unnecessary tokens such as @mentions and #hashtags (see the sketch below)
Shuffle the data before the fit - Keras' validation_split creates the validation set from the last records of the data
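And a minimal text-normalization sketch for the tweets; the exact patterns to strip are assumptions, so adapt them to your data:

import re

def normalize_tweet(text):
    text = text.lower()
    text = re.sub(r'https?://\S+', ' ', text)   # drop URLs
    text = re.sub(r'[@#]\w+', ' ', text)        # drop @mentions and #hashtags
    text = re.sub(r"[^a-z'\s]", ' ', text)      # keep letters and apostrophes only
    return re.sub(r'\s+', ' ', text).strip()    # collapse whitespace

print(normalize_tweet("Feeling great today!!! #happy @friend http://t.co/xyz"))
# -> "feeling great today"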
There is no simple answer to your question.