Suppose you have a Keras model with an optimizer like Adam that you save via save_model.
If you load the model again with load_model, does it really load ALL optimizer parameters + weights?
Based on the code of save_model(Link), Keras saves the config of the optimizer:
f.attrs['training_config'] = json.dumps({
    'optimizer_config': {
        'class_name': model.optimizer.__class__.__name__,
        'config': model.optimizer.get_config()},
which, in the case of Adam for example (Link), is as follows:
def get_config(self):
    config = {'lr': float(K.get_value(self.lr)),
              'beta_1': float(K.get_value(self.beta_1)),
              'beta_2': float(K.get_value(self.beta_2)),
              'decay': float(K.get_value(self.decay)),
              'epsilon': self.epsilon}
As such, this only saves the fundamental parameters but no per-variable optimizer weights.
However, after dumping the config in save_model, it looks like some optimizer weights are saved as well (Link). Unfortunately, I can't really tell whether every weight of the optimizer is saved.
So if you want to continue training the model in a new session with load_model, is the state of the optimizer really 100% the same as in the last training session? E.g. in the case of SGD with momentum, does it save all per-variable momentums?
Or in general, does it make a difference in training if you stop and resume training with save/load_model?
It seems your links no longer point to the same lines they pointed to at the time of your question, so I don't know which lines you are referring to.
But the answer is yes, the entire state of the optimizer is saved along with the model. You can see this happening in save_model(). Also, if you do not wish to save the optimizer weights, you can do so by calling save_model(include_optimizer=False).
If you inspect the resulting *.h5 file, for example by means of h5dump | less, you can see those weights. (h5dump is part of h5utils.)
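To illustrate, here is a minimal sketch (the tiny model, the layer sizes and the file names are made up, and it assumes tf.keras 2.x with the HDF5 format) showing that the per-variable optimizer weights end up in the file only when include_optimizer is left at its default:

import h5py
import numpy as np
from tensorflow import keras

# Tiny made-up model, trained for one epoch so Adam creates its per-variable slots.
model = keras.Sequential([keras.layers.Dense(4, input_shape=(8,)),
                          keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(np.random.rand(32, 8), np.random.rand(32, 1), epochs=1, verbose=0)

model.save('with_opt.h5')                               # optimizer state included by default
model.save('without_opt.h5', include_optimizer=False)   # only architecture + model weights

with h5py.File('with_opt.h5', 'r') as f:
    print('optimizer_weights' in f)   # True: per-variable Adam moments are in the file
with h5py.File('without_opt.h5', 'r') as f:
    print('optimizer_weights' in f)   # False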
Therefore, saving a model and loading it again later should make no difference in many common cases. However, there are exceptions that are not related to the optimizer. One that comes to mind right now is an LSTM(stateful=True) layer, which, I believe, does not save the internal LSTM states when calling save_model(). There are possibly many more reasons why interrupting the training with save/load might not produce the exact same results as training without interruption, but investigating this probably only makes sense in the context of concrete code.
Related
I was wondering, when reading up on model.save() and model.load_weights() (and its callback variant ModelCheckpoint), why the focus is so heavily on weights. I would have expected biases to be in the same league as weights, but they hardly get mentioned. The only way to save them as well seems to be setting save_weights_only to False, which would save the entire model. Why is this the case? What are the benefits of only saving the weights?
I am asking this because I am working on Physics-Informed Neural Networks and would like to swap out certain loss functions, which requires me to transfer the weights of one model to another (something like transfer learning). The results I am getting are not bad, but I am not sure if saving the biases as well would improve performance.
As Dr. Snoopy has pointed out, the term "weights" in Keras already includes the biases.
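A quick check of this, assuming tf.keras (the layer sizes are arbitrary): get_weights() on a Dense layer returns both the kernel matrix and the bias vector, so anything that saves or transfers "weights" carries the biases along.

from tensorflow import keras

# A single Dense layer: its "weights" are the kernel matrix *and* the bias vector.
layer = keras.layers.Dense(units=3)
layer.build(input_shape=(None, 5))

for w in layer.get_weights():
    print(w.shape)   # (5, 3) for the kernel, then (3,) for the bias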
Regarding fine-tuning CNNs in PyTorch, as per SAVING AND LOADING MODELS:
If you only plan to keep the best performing model (according to the acquired validation loss), … You must serialize best_model_state or use best_model_state = deepcopy(model.state_dict()) otherwise your best best_model_state will keep getting updated by the subsequent training iterations. As a result, the final model state will be the state of the overfitted model.
However, I have done something like this:
def train_model(model, ...):
    ...
    if validation_loss improves:
        delete previous best model
        torch.save(model.state_dict(), best_model_path)
    else:
        ....
    ...
    return model

def test_model(model, best_model_path, ...):
    model.load_state_dict(torch.load(best_model_path))
    model.eval()
    ...

...
my_model = train_model(my_model, ...)
test_model(my_model, my_path, ...)
In other words, the model returned by the training phase is the final one, which is likely to be overfitted (I did not use deepcopy). But since I saved the best model during training, I have no problem during the test/inference phase, because I load the best model's state from disk, overwriting the state of the final model obtained during training.
Is something wrong with this solution?
Thank you.
You’re still following the tutorial’s instructions. Note this part of the tutorial:
You must serialize best_model_state or use best_model_state = deepcopy(model.state_dict())
You serialized the best model’s state (wrote it to disk), so you don’t need to use deepcopy.
If you kept the model in memory, you’d use deepcopy to make sure it’s not altered during training. But because you’re keeping it on disk instead, it won’t be altered.
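To make the two options concrete, here is a minimal sketch (the tiny Linear model and the file name are just stand-ins for your real model and checkpoint path):

import copy
import torch

model = torch.nn.Linear(4, 2)        # stand-in for your real model
best_model_path = "best_model.pt"    # stand-in for your real checkpoint path

# Option A (what you did): serialize the best state to disk.
# Subsequent training steps cannot touch the file, so no deepcopy is needed.
torch.save(model.state_dict(), best_model_path)

# Option B (the tutorial's in-memory variant): keep the best state in memory.
# Without deepcopy the dict would hold references to the live parameters,
# which keep changing as training continues.
best_model_state = copy.deepcopy(model.state_dict())

# Either way, restore the best state later with:
model.load_state_dict(torch.load(best_model_path))   # Option A
model.load_state_dict(best_model_state)              # Option B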
I've been working with the MLPClassifier for a while and I think I had a wrong interpretation of what the function does the whole time. I think I've got it right now, but I am not sure about that. So I will summarize my understanding, and it would be great if you could add your thoughts on the correct understanding.
So with the MLPClassifier we are building a neural network based on a training dataset. Setting early_stopping = True it is possible to use a validation dataset within the training process in order to check whether the network is working on a new set as well. If early_stopping = False, no validation within the process is done. After one has finished building, we can use the fitted model in order to predict on a third dataset if we wish to.
What I was thinking before is that during the whole training process a validation dataset is being set aside anyway, with validation after every epoch.
I'm not sure if my question is understandable, but it would be great if you could help me to clear my thoughts.
The sklearn.neural_network.MLPClassifier uses (a variant of) Stochastic Gradient Descent (SGD) by default. Your question could be framed more generally as how SGD is used to optimize the parameter values in a supervised learning context. There is nothing specific to Multi-layer Perceptrons (MLP) here.
So with the MLPClassifier we are building a neural network based on a training dataset. Setting early_stopping = True it is possible to use a validation dataset within the training process
Correct, although it should be noted that this validation set is taken away from the original training set.
in order to check whether the network is working on a new set as well.
Not quite. The point of early stopping is to track the validation score during training and stop training as soon as the validation score stops improving significantly.
If early_stopping = False, no validation within the process is done. After one has finished building, we can use the fitted model in order to predict on a third dataset if we wish to.
Correct.
What I was thinking before is that during the whole training process a validation dataset is being set aside anyway, with validation after every epoch.
As you probably know by now, this is not so. The division of the learning process into epochs is somewhat arbitrary and has nothing to do with validation.
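To make the early-stopping behaviour concrete, here is a minimal sketch on synthetic data (the dataset and the parameter values are arbitrary): with early_stopping=True, scikit-learn splits off validation_fraction of the training data and stops once the validation score has not improved by at least tol for n_iter_no_change consecutive epochs.

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# 10% of the training data is set aside internally as a validation set;
# training stops early once the validation score stops improving.
clf = MLPClassifier(early_stopping=True,
                    validation_fraction=0.1,
                    n_iter_no_change=10,
                    max_iter=500,
                    random_state=0).fit(X, y)

print(clf.n_iter_, clf.best_validation_score_)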
I am trying to load the weights from a Keras 1.0 model into a Keras 2.0 model I created. I am sure the model architecture is exactly the same. The issue I am having is that the load_weights() function does not seem to be loading all the weights.
When I print the weights to a text file from the original model (loaded via load_model) and from the new model after load_weights(), the latter is missing many entries and the values are actually different. This also shows itself when making predictions, as the accuracy is lower.
This problem only occurs in my LSTM layers. The embedding layer is fine and the Dense layer is also fine.
Any thoughts? I cannot use load_model(), as the original model was saved in Keras 1.0 and I need to use Keras 2.0.
EDIT MORE:
I should note that I think the issue is the internal states not being loaded. Let me explain, though. When I use get_weights() on each layer and print it to the terminal or a file, the original model outputs a much larger matrix.
After using load_weights() and then get_weights() and printing, the weight matrix is missing many elements. I'm thinking it's the internal states.
The problem was that there were parameters for a compiled graph that were saved. I think it's safe to just port over the weights and continue training for a bit (maybe 1-2 epochs) to let it catch up, if you can.
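A hypothetical sketch of porting the weights layer by layer (old_model, new_model, the optimizer and the loss are placeholders, and it assumes both models have identical architectures):

def copy_weights(old_model, new_model):
    # Copy the trainable weights layer by layer; layer order and shapes must match.
    for old_layer, new_layer in zip(old_model.layers, new_model.layers):
        new_layer.set_weights(old_layer.get_weights())

# copy_weights(old_model, new_model)
# new_model.compile(optimizer='adam', loss='categorical_crossentropy')
# new_model.fit(x_train, y_train, epochs=2)   # let the fresh optimizer state catch up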
Gl
How can I test my pytorch model on validation data during training?
I know that there is the function myNet.eval(), which apparently switches off any dropout layers, but does it also prevent the gradients from being accumulated?
Also how would I undo the myNet.eval() command in order to continue with the training?
If anyone has some code snippet / toy example I would be grateful!
How can I test my pytorch model on validation data during training?
There are plenty of examples where there are train and test steps for every epoch during training. An easy one would be the official MNIST example. Since PyTorch does not offer any high-level training, validation or scoring framework, you have to write it yourself. Commonly this consists of:
a data loader (commonly based on torch.utils.data.DataLoader)
a main loop over the total number of epochs
a train() function that uses training data to optimize the model
a test() or valid() function to measure the effectiveness of the model given validation data and a metric
This is also what you will find in the linked example. A minimal, self-contained sketch of this structure follows below.
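For illustration, here is such a loop using random tensors in place of a real dataset such as MNIST (the layer sizes, learning rate and epoch count are arbitrary, and it assumes a recent PyTorch with torch.no_grad()):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Random tensors stand in for a real dataset such as MNIST.
train_loader = DataLoader(TensorDataset(torch.randn(512, 20), torch.randint(0, 2, (512,))),
                          batch_size=32, shuffle=True)
valid_loader = DataLoader(TensorDataset(torch.randn(128, 20), torch.randint(0, 2, (128,))),
                          batch_size=32)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def train():
    model.train()                       # training mode: dropout/batchnorm active
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

def valid(epoch):
    model.eval()                        # evaluation mode
    correct = 0
    with torch.no_grad():               # no gradients needed for validation
        for x, y in valid_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
    print(f"epoch {epoch}: validation accuracy {correct / len(valid_loader.dataset):.2f}")

for epoch in range(5):                  # main loop over the epochs
    train()
    valid(epoch)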
Alternatively you can use a framework that provides basic looping and validation facilities so you don't have to implement everything by yourself all the time.
tnt is torchnet for pytorch, supplying you with different metrics (such as accuracy) and abstraction of the train loop. See this MNIST example.
inferno and torchsample attempt to model things very similar to Keras and provide some tools for validation
skorch is a scikit-learn wrapper for pytorch that lets you use all the tools and metrics from sklearn
Also how would I undo the myNet.eval() command in order to continue with the training?
myNet.train() or, alternatively, supply a boolean to switch between eval and training: myNet.train(True) for train mode.
I know that there is the function myNet.eval(), which apparently switches off any dropout layers, but does it also prevent the gradients from being accumulated?
It doesn't prevent gradients from accumulating.
But I think during testing, you do want to ignore gradients. In that case, you should mark the variable input to the network as volatile=True, and it will save some time and space used in forward calculation.
Also how would I undo the myNet.eval() command in order to continue with the training?
myNet.train()
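Note that volatile=True belongs to the old Variable API; in PyTorch 0.4 and later the equivalent is wrapping the evaluation pass in torch.no_grad(). A minimal sketch of toggling between the two modes, assuming a recent PyTorch (the model and input are made up):

import torch
from torch import nn

model = nn.Sequential(nn.Linear(10, 5), nn.Dropout(0.5), nn.Linear(5, 2))  # stand-in for myNet
x = torch.randn(4, 10)

model.eval()                      # dropout/batchnorm switch to evaluation behaviour
with torch.no_grad():             # no graph is built, saving time and memory
    predictions = model(x)

model.train()                     # back to training mode to continue training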