tf.keras how to save ModelCheckpoint object - keras

ModelCheckpoint can be used to save the best model based on a specific monitored metric. So it obviously has information about the best metric stored within its object. If you train on Google Colab, for example, your instance can be killed without warning and you would lose this info after a long training session.
I tried to pickle the ModelCheckpoint object so that I can reuse the same object when I bring my notebook back, but got:
TypeError: can't pickle _thread.lock objects
Is there a good way to do this? You can try to reproduce it with:
import pickle
import tensorflow as tf

chkpt_cb = tf.keras.callbacks.ModelCheckpoint('model.{epoch:02d}-{val_loss:.4f}.h5',
                                              monitor='val_loss',
                                              verbose=1,
                                              save_best_only=True)

# pickle needs binary mode; this still raises the TypeError above
with open('chkpt_cb.pickle', 'wb') as f:
    pickle.dump(chkpt_cb, f, protocol=pickle.HIGHEST_PROTOCOL)

If the callback object is not meant to be pickled (due to the thread lock issue, and it is not advisable anyway), I can pickle this instead:
best = chkpt_cb.best
This stores the best monitored metric the callback has seen so far. It is a plain float, which you can pickle, reload next time, and then do this:
chkpt_cb.best = best  # if chkpt_cb is a brand new object you create when Colab killed your session
This is my own setup:
# All paths should be on Google Drive, I omitted it here for simplicity.
chkpt_cb = tf.keras.callbacks.ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.4f}.h5',
                                              monitor='val_loss',
                                              verbose=1,
                                              save_best_only=True)

# Restore the best metric from a previous session, if any
if os.path.exists('chkpt_cb.best.pickle'):
    with open('chkpt_cb.best.pickle', 'rb') as f:
        best = pickle.load(f)
        chkpt_cb.best = best

# Persist the best metric after every epoch
def save_chkpt_cb():
    with open('chkpt_cb.best.pickle', 'wb') as f:
        pickle.dump(chkpt_cb.best, f, protocol=pickle.HIGHEST_PROTOCOL)

save_chkpt_cb_callback = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: save_chkpt_cb()
)

history = model.fit_generator(generator=train_data_gen,
                              validation_data=dev_data_gen,
                              epochs=5,
                              callbacks=[chkpt_cb, save_chkpt_cb_callback])
So even when your Colab session gets killed, you can still retrieve the last best metric, inform your new instance about it, and continue training as usual. This especially helps when re-compiling a stateful optimizer causes a regression in the loss/metric and you don't want to save those models for the first few epochs.
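For completeness, here is a minimal sketch of how a fresh instance could pick things up again. The glob pattern and the "latest file wins" rule are assumptions based on my naming scheme above, not part of the Keras API:
import glob, os, pickle
import tensorflow as tf

# Reload the most recent checkpoint written by ModelCheckpoint (assumes the filepath pattern above)
checkpoints = sorted(glob.glob('model.*.h5'), key=os.path.getmtime)
if checkpoints:
    model = tf.keras.models.load_model(checkpoints[-1])  # weights + optimizer state

# Re-create the callback and tell it the best metric seen before the session died
chkpt_cb = tf.keras.callbacks.ModelCheckpoint('model.{epoch:02d}-{val_loss:.4f}.h5',
                                              monitor='val_loss', verbose=1, save_best_only=True)
if os.path.exists('chkpt_cb.best.pickle'):
    with open('chkpt_cb.best.pickle', 'rb') as f:
        chkpt_cb.best = pickle.load(f)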

I think you might be misunderstanding the intended usage of the ModelCheckpoint object. It is a callback that periodically gets called during training at particular phases. The ModelCheckpoint callback in particular gets called after every epoch (if you keep the default period=1) and saves your model to disk to the filename you specify in the filepath argument. The model is saved in the same way described here. Then if you want to load that model later, you can do something like
from keras.models import load_model
model = load_model('my_model.h5')
Other answers on SO provide nice guidance and examples for continuing training from a saved model, for example: Loading a trained Keras model and continue training. Importantly, the saved H5 file stores everything about your model that is needed to continue training.
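As a rough sketch of what continuing looks like (x_train, y_train, and the resume point are placeholders, not values from the question):
from keras.models import load_model

model = load_model('my_model.h5')  # architecture, weights, and optimizer state all restored
# No need to compile again; just keep fitting
model.fit(x_train, y_train, epochs=10, initial_epoch=5)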
As suggested in the Keras documentation, you should not use pickle to serialize your model. Simply register the ModelCheckpoint callback with your 'fit' function:
chkpt_cb = tf.keras.callbacks.ModelCheckpoint('model.{epoch:02d}-{val_loss:.4f}.h5',
                                              monitor='val_loss',
                                              verbose=1,
                                              save_best_only=True)

model.fit(x_train, y_train,
          epochs=100,
          steps_per_epoch=5000,
          callbacks=[chkpt_cb])
Your model will be saved in an H5 file named as you have it, with the epoch number and loss value automatically formatted for you. For example, your saved file for the 5th epoch with loss 0.0023 would be model.05-0.0023.h5, and since you set save_best_only=True, the model will only be saved if your loss is better than the previously saved one, so you don't pollute your directory with a bunch of unneeded model files.

Related

What is the difference in saving the model as cnn.model or cnn.h5? How are these extensions different?

I am using model.save("cnn.model") and model.save("cnn.h5") to save the model after training.
What is the difference between saving the model with these two different extensions?
The file name, including the extension, doesn't matter. Whatever it is, Keras will save an HDF5-formatted model into that file.
Doc: How can I save a Keras model?
You can use model.save(filepath) to save a Keras model into a single
HDF5 file which will contain:
the architecture of the model, allowing to re-create the model
the weights of the model
the training configuration (loss, optimizer)
the state of the optimizer, allowing to resume training exactly where you left off.
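A short sketch of the round trip (the toy model definition is an assumption, just to make the example self-contained):
from keras.models import Sequential, load_model
from keras.layers import Dense

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])  # placeholder model
model.compile(optimizer='adam', loss='categorical_crossentropy')

model.save('cnn.h5')             # same result as model.save('cnn.model'): an HDF5 file
restored = load_model('cnn.h5')  # architecture, weights, training config, optimizer state
restored.summary()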

How to load weights from file and use them to predict test data in Keras

Last night I left a neural network model training, and that took time, so I thought to add a statement to save the weights: model.save_weights('first_try.h5')
Now that I have the file, I want to make use of it.
Prediction is like
pred=model.predict_generator(test_generator, steps=4124, verbose=1)
If you saved your model's weights, you can load them with the load_weights method. But first you have to define your model structure.
e.g.
model = method_to_create_the_model()
model.load_weights("path_to_weight_file")

Load and use saved Keras model.h5

I am trying to save a KerasClassifier (the scikit-learn wrapper) into final_model.h5:
validator = GridSearchCV(estimator=clf, param_grid=param_grid)
grid_result = validator.fit(train_images, train_labels)
best_estimator = grid_result.best_estimator_
best_estimator.model.save("final_model.h5")
And then I want to reuse the model
from keras.models import load_model
loaded_model = load_model("final_model.h5")
But it seems like loaded_model is now a Sequential object instead. In other words, it is different from the KerasClassifier object best_estimator.
I want to reuse methods like score, which are available on KerasClassifier but not on the Sequential model. What should I do?
Also, I would like to know more about how to continue the training process left off on final_model.h5. What can I do next?
Yes, in the end you saved the Keras model as HDF5, not the KerasClassifier, which is just an adapter for use with scikit-learn.
But you don't really need the KerasClassifier instance; you want the score function, and in Keras this is called evaluate, so just call model.evaluate(X, Y). This will return a list containing first the loss and then any metrics your model used (most likely accuracy).
To continue training the model, just load it and call model.fit with the new training set and that's it.
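A hedged sketch of both steps (test_images and test_labels are assumed to exist alongside the train_images and train_labels from the question):
from keras.models import load_model

model = load_model('final_model.h5')

# Equivalent of KerasClassifier.score: evaluate returns the loss followed by any metrics
scores = model.evaluate(test_images, test_labels, verbose=0)

# To continue training where you left off, just keep calling fit;
# the optimizer state was stored in the H5 file.
model.fit(train_images, train_labels, epochs=10)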

Keras: does save_model really save all optimizer weights?

Suppose you have a Keras model with an optimizer like Adam that you save via save_model.
If you load the model again with load_model, does it really load ALL optimizer parameters + weights?
Based on the code of save_model(Link), Keras saves the config of the optimizer:
f.attrs['training_config'] = json.dumps({
    'optimizer_config': {
        'class_name': model.optimizer.__class__.__name__,
        'config': model.optimizer.get_config()},
which, in the case of Adam for example (Link), is as follows:
def get_config(self):
    config = {'lr': float(K.get_value(self.lr)),
              'beta_1': float(K.get_value(self.beta_1)),
              'beta_2': float(K.get_value(self.beta_2)),
              'decay': float(K.get_value(self.decay)),
              'epsilon': self.epsilon}
As such, this only saves the fundamental parameters but no per-variable optimizer weights.
However, after dumping the config in save_model, it looks like some optimizer weights are saved as well (Link). Unfortunately, I can't really tell whether every weight of the optimizer is saved.
So if you want to continue training the model in a new session with load_model, is the state of the optimizer really 100% the same as in the last training session? E.g. in the case of SGD with momentum, does it save all per-variable momentums?
Or in general, does it make a difference in training if you stop and resume training with save/load_model?
It seems your links no longer point to the same lines they pointed to at the time of your question, so I don't know which lines you are referring to.
But the answer is yes, the entire state of the optimizer is saved along with the model. You can see this happening in save_model(). Also if you wish not to save the optimizer weights, you can do so by calling save_model(include_optimizer=False).
If you inspect the resulting *.h5 file, for example by means of h5dump | less, you can see those weights. (h5dump ships with the HDF5 command-line tools.)
Therefore saving a model and loading it again later should make no difference in many common cases. However there are exceptions not related to the optimizer. One that comes to my mind right now is an LSTM(stateful=True) layer which I believe does not save the internal LSTM states when calling save_model(). There are possibly many more reasons why interrupting the training with save/load might not produce the exact same results as training without interruption. But investigating this maybe makes sense only in the context of concrete code.
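A quick way to check this yourself with h5py; this is a hedged sketch that assumes a compiled model was saved as model.h5 after at least one training step (some optimizers only create their slot variables once training starts):
import h5py

with h5py.File('model.h5', 'r') as f:
    print(list(f.keys()))                    # typically ['model_weights', 'optimizer_weights']
    if 'optimizer_weights' in f:
        f['optimizer_weights'].visit(print)  # per-variable slots, e.g. Adam's m and v

# Saving without optimizer state drops that group entirely:
# model.save('model_no_opt.h5', include_optimizer=False)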

Is it possible to continue training from a specific epoch?

A resource manager I'm using to fit a Keras model limits the access to a server to 1 day at a time. After this day, I need to start a new job. Is it possible with Keras to save the current model at epoch K, and then load that model to continue training epoch K+1 (i.e., with a new job)?
You can save weights after every epoch by specifying a callback:
weight_save_callback = ModelCheckpoint('/path/to/weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', verbose=0, save_best_only=False, mode='auto')
model.fit(X_train,y_train,batch_size=batch_size,nb_epoch=nb_epoch,callbacks=[weight_save_callback])
This will save the weights after every epoch. You can then load them with:
model = Sequential()
model.add(...)
model.load_weights('path/to/weights.hdf5')
Of course your model needs to be the same in both cases.
You can add the initial_epoch argument. This will allow you to continue training from a specific epoch.
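A minimal sketch of that argument (the data variables are placeholders and the resume point is illustrative):
# Suppose the previous job stopped after completing 10 epochs
model.fit(X_train, y_train,
          initial_epoch=10,   # epochs already finished
          epochs=20,          # keep training until 20 epochs have been completed in total
          callbacks=[weight_save_callback])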
You can automatically start your training at the next epoch!
What you need is to keep track of your training with a training log file, as follows:
import sys
from keras.callbacks import ModelCheckpoint, CSVLogger

if len(sys.argv) == 1:
    model = ...   # you start training normally, no command line arguments
    model.compile(...)
    i_epoch = -1  # you need this to start at epoch 0
    app = False   # you want to start logging from scratch
else:
    from keras.models import load_model
    model = load_model(sys.argv[1])    # you give the saved model as input file
    with open(csvloggerfile) as f:     # you use your training log to get the right epoch number
        i_epoch = list(f)
        i_epoch = int(i_epoch[-2][:i_epoch[-2].find(',')])
    app = True                         # you want to append to the log file

checkpointer = ModelCheckpoint(savemodel...)
csv_logger = CSVLogger(csvloggerfile, append=app)

model.fit(X, Y, initial_epoch=i_epoch+1, callbacks=[checkpointer, csv_logger])
That's all folks!
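Assuming the script above is saved as, say, train.py (the file name is just an example), the first job is started with plain python train.py and every later job with python train.py your_saved_model.h5, so the epoch counter and the CSV log simply continue where the previous job stopped.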
